The starting point
Most data platforms grow before they are governed. The first hard question is usually who can read what, and that question gets harder once a dozen teams have copied tables sideways.
GCP gives you tight primitives to fix this early: customer-managed encryption keys (CMEK) in Cloud KMS, column- and row-level policies in BigQuery, VPC Service Controls around the perimeter, and Workload Identity Federation for non-human access.
Pipeline
Sources land in Cloud Storage. A DLP scan runs on landing and tags each object with a sensitivity class. Dataflow normalises the data and writes it into the BigQuery raw zone, then Dataform shapes curated tables. Every step runs inside a VPC Service Controls perimeter.
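The tagging step can be pictured with a small stand-in. This is a sketch only: the regular expressions, the sensitivity classes, and the `classify` function are hypothetical and replace the real DLP scan, which would call the Cloud DLP service rather than match patterns locally.

```python
import re

# Hypothetical detectors standing in for Cloud DLP infoTypes; a real scan
# would call the DLP service rather than use regular expressions.
DETECTORS = {
    "EMAIL_ADDRESS": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "CREDIT_CARD_NUMBER": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

# Hypothetical mapping from finding type to sensitivity class.
SENSITIVITY = {"EMAIL_ADDRESS": "confidential", "CREDIT_CARD_NUMBER": "restricted"}
RANK = {"public": 0, "confidential": 1, "restricted": 2}


def classify(text: str) -> dict:
    """Return metadata labels for a landed object based on its content."""
    findings = sorted(name for name, rx in DETECTORS.items() if rx.search(text))
    level = "public"
    for name in findings:
        # Keep the highest sensitivity class seen across all findings.
        if RANK[SENSITIVITY[name]] > RANK[level]:
            level = SENSITIVITY[name]
    return {"sensitivity": level, "findings": ",".join(findings)}
```

The returned dictionary is shaped so it could be written as object metadata on landing, which lets downstream steps branch on `sensitivity` without rescanning the content.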
Encryption at rest uses CMEK in every service. Keys in a shared key ring rotate quarterly, with HSM-backed keys for the highest sensitivity classes.
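As a rough illustration of the key policy, the sketch below builds a request body shaped like the Cloud KMS `CryptoKey` resource. The 90-day approximation of "quarterly", the `restricted` class name, the `sensitivity` label, and the `crypto_key_body` helper are assumptions made for the example, and the field names should be checked against the current API reference before use.

```python
from datetime import datetime, timedelta, timezone

ROTATION_DAYS = 90  # assumption: "quarterly" approximated as 90 days


def crypto_key_body(sensitivity: str, now: datetime) -> dict:
    """Build a CryptoKey-shaped request body for one sensitivity class."""
    # Assumption: only the hypothetical "restricted" class is HSM-backed.
    protection = "HSM" if sensitivity == "restricted" else "SOFTWARE"
    next_rotation = now + timedelta(days=ROTATION_DAYS)
    return {
        "purpose": "ENCRYPT_DECRYPT",
        "versionTemplate": {
            "protectionLevel": protection,
            "algorithm": "GOOGLE_SYMMETRIC_ENCRYPTION",
        },
        # Durations are expressed in seconds with an "s" suffix.
        "rotationPeriod": f"{ROTATION_DAYS * 86400}s",
        "nextRotationTime": next_rotation.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "labels": {"sensitivity": sensitivity},
    }
```

Rotation is set per key rather than per key ring, so each sensitivity class gets its own key with its own schedule inside the shared ring.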
Access model
Access is granted to groups, never individuals. Column-level policies attach to policy tags in a taxonomy, so a single decision (marking a column as PII) propagates to every consumer.
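The propagation idea can be shown with a toy in-memory model. This is not the BigQuery or Data Catalog API: the `Taxonomy` class, the tag name, the group address, and the column names are all hypothetical and exist only to show why tagging scales better than per-table grants.

```python
from dataclasses import dataclass, field


@dataclass
class Taxonomy:
    """Toy model: policy tag -> groups allowed to read tagged columns."""

    readers: dict = field(default_factory=dict)  # tag -> set of groups
    columns: dict = field(default_factory=dict)  # "dataset.table.column" -> tag

    def tag_column(self, column: str, tag: str) -> None:
        self.columns[column] = tag

    def grant(self, tag: str, group: str) -> None:
        self.readers.setdefault(tag, set()).add(group)

    def can_read(self, group: str, column: str) -> bool:
        tag = self.columns.get(column)
        # Untagged columns fall back to table-level access in this sketch.
        return tag is None or group in self.readers.get(tag, set())


tax = Taxonomy()
tax.grant("PII", "privacy-analysts@example.com")
tax.tag_column("curated.customers.email", "PII")
tax.tag_column("curated.orders.contact_email", "PII")
```

One `grant` call on the tag covers both tagged columns, and any column tagged later inherits the same readers without a new access decision.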
- Workload Identity Federation for GitHub Actions and external services
- Access Approval required for support engineer reads
- IAM Conditions to scope role bindings to projects and time windows
- Audit logs streamed to BigQuery and pinned to Security Command Center (SCC)
- Break-glass procedure logged and reviewed weekly
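For the conditional role bindings in the list above, a minimal sketch of what one binding might look like follows. The `time_boxed_binding` helper, the role, the group, and the resource prefix are illustrative assumptions. Resource name formats differ between services, so the prefix should be confirmed for the service in question.

```python
from datetime import datetime, timezone


def time_boxed_binding(role: str, group: str, resource_prefix: str,
                       expires: datetime) -> dict:
    """Build an IAM policy binding limited to a resource prefix and an expiry.

    `expires` must be timezone-aware; it is normalised to UTC.
    """
    stamp = expires.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    # The condition is a CEL expression evaluated on each request.
    expression = (
        f'resource.name.startsWith("{resource_prefix}") && '
        f'request.time < timestamp("{stamp}")'
    )
    return {
        "role": role,
        "members": [f"group:{group}"],
        "condition": {
            "title": "time-boxed-access",
            "description": f"Expires {stamp}",
            "expression": expression,
        },
    }


binding = time_boxed_binding(
    "roles/bigquery.dataViewer",
    "analysts@example.com",
    "projects/example-project/datasets/curated",
    datetime(2025, 1, 1, tzinfo=timezone.utc),
)
```

A policy that carries conditional bindings has to be written with policy version 3. Because the member is a group, the expiry applies to everyone in it at once.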
Cost and operations
BigQuery slots are reserved for predictable workloads. Ad hoc and exploratory queries run in the on-demand pool with a per-user cap. Storage cost is controlled by table partitioning and by lifecycle rules on raw zones.
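The split between reserved and on-demand capacity can be sketched as a small routing rule. The `QueryRouter` class, the 1 TiB daily cap, and the pool names are hypothetical. In practice a cap like this would more likely be enforced through BigQuery custom quotas or a maximum-bytes-billed setting on each job than in application code.

```python
from collections import defaultdict

# Hypothetical daily cap on bytes scanned per user in the on-demand pool.
DAILY_CAP_BYTES = 1 * 1024**4  # 1 TiB, an illustrative figure


class QueryRouter:
    """Route scheduled jobs to reserved slots and cap ad hoc usage per user."""

    def __init__(self, cap_bytes: int = DAILY_CAP_BYTES):
        self.cap_bytes = cap_bytes
        self.used = defaultdict(int)  # user -> bytes scanned today

    def route(self, user: str, scheduled: bool, estimated_bytes: int) -> str:
        if scheduled:
            # Predictable workloads run on the slot reservation.
            return "reserved"
        if self.used[user] + estimated_bytes > self.cap_bytes:
            raise PermissionError(f"{user} would exceed the daily on-demand cap")
        self.used[user] += estimated_bytes
        return "on-demand"
```

Scheduled jobs never count against the cap, so a heavy pipeline cannot starve an analyst's ad hoc budget, and one analyst's exploration cannot consume another's.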
References
Official documentation and standards we draw on for this pattern.
- Google Cloud Security Foundations Guide (cloud.google.com)
- BigQuery customer-managed encryption keys (cloud.google.com)
- VPC Service Controls overview (cloud.google.com)
- Workload Identity Federation (cloud.google.com)
- Cloud Data Loss Prevention (cloud.google.com)
- Security Command Center (cloud.google.com)
Takeaway
Governance is cheap when you set it up before the data lands. It is a programme of work once the data is already everywhere.