Amit Malhotra

GKE Security: Why Monitoring Isn't Enough for Compliance

Your GKE Cluster Passed Security Review — And It's Still Not Audit-Ready

Most GKE clusters I audit look secure on paper. Workload Identity enabled. Private cluster networking. RBAC configured. The security checklist is complete, the platform team feels confident, and then the SOC 2 auditor asks a simple question: "Show me the control that prevents someone from deploying a cluster without these settings."

That's when the conversation gets uncomfortable.

The Gap Between Monitoring and Prevention

Here's the pattern I see repeatedly across SaaS companies running production workloads on GKE: teams implement security controls inside their clusters but leave the provisioning layer wide open.

Security Command Center is enabled. GKE Security Posture dashboard shows green. The team has even run the CIS GKE Benchmark manually and fixed the findings. Everything looks good.

But there's no preventive control at the organization level. Any engineer with the right IAM permissions can spin up a new cluster tomorrow with default settings — no Workload Identity, no Shielded Nodes, client certificates enabled. That cluster will appear in SCC findings eventually, but by then it's running production traffic.

The business risk here isn't theoretical. I've watched teams scramble during SOC 2 preparation when auditors ask for evidence that non-compliant infrastructure cannot be provisioned. "We monitor for drift" is not the same answer as "we prevent it at the platform layer."

What Most Teams Get Wrong About GKE Security

The CIS GKE Benchmark exists. GCP has native tooling to enforce it. Security Command Center can surface violations in real time. But teams implement these pieces in isolation rather than as a layered enforcement system.

In my experience working with B2B SaaS companies preparing for audits, the breakdown usually looks like this:

Detection without prevention. SCC is enabled, but findings pile up with no remediation workflow. The dashboard becomes noise. Nobody reviews it weekly because there's no escalation path.

Benchmark compliance as a point-in-time exercise. Teams run CIS scans before an audit, fix the violations, and move on. Six months later, a new cluster gets provisioned with the same issues because nothing prevents it.

Security posture without organizational enforcement. The GKE Security Posture dashboard now integrates beautifully with SCC to surface OS vulnerabilities, workload misconfigurations, and exposed secrets per workload. Most teams I work with don't know this integration exists — and even fewer use Organization Policies to enforce the same controls preventively.

The result is a cluster that looks secure but fails the audit question that actually matters: "What prevents this from happening in the first place?"

Org Policies Are the Missing Layer

Custom Organization Policies are where GKE security moves from reactive to preventive. Instead of detecting a non-compliant cluster after deployment, you block the creation of that cluster before it happens.

This is the Security by Design principle in the SCALE framework — your controls should prevent misconfiguration at provisioning time, not just detect it afterward.

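Here's a concrete example. To enforce Workload Identity across your entire organization, you reference a constraint like the one below. Treat the name as illustrative: the exact identifier depends on whether you use a constraint Google provides or define your own custom constraint, which always takes a custom. prefix.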

constraint: constraints/container.requireWorkloadIdentity

Apply this at the organization or folder level, and any gcloud container clusters create command that doesn't include Workload Identity configuration fails immediately. No cluster gets provisioned. No SCC finding to remediate later.
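If you manage policy in Terraform, the constraint and its enforcement could be sketched roughly as follows. This is a minimal sketch, not a verified production policy: the organization ID, the constraint name custom.requireWorkloadIdentity, and the CEL condition are assumptions made for illustration, and the condition should be checked against the cluster fields GKE currently supports for custom constraints.

```hcl
# Hypothetical organization ID; replace with your own.
variable "org_id" {
  type    = string
  default = "123456789012"
}

# Custom constraint: allow cluster create/update only when a Workload
# Identity pool is configured. Name and condition are illustrative.
resource "google_org_policy_custom_constraint" "require_workload_identity" {
  name           = "custom.requireWorkloadIdentity"
  parent         = "organizations/${var.org_id}"
  display_name   = "Require Workload Identity on GKE clusters"
  description    = "Blocks GKE clusters that do not set a workload pool."
  action_type    = "ALLOW"
  condition      = "resource.workloadIdentityConfig.workloadPool.endsWith('.svc.id.goog')"
  method_types   = ["CREATE", "UPDATE"]
  resource_types = ["container.googleapis.com/Cluster"]
}

# Enforce the constraint for everything under the organization.
resource "google_org_policy_policy" "require_workload_identity" {
  name   = "organizations/${var.org_id}/policies/${google_org_policy_custom_constraint.require_workload_identity.name}"
  parent = "organizations/${var.org_id}"

  spec {
    rules {
      enforce = "TRUE"
    }
  }
}
```

Keeping the constraint definition and the policy that enforces it as separate resources means the same constraint can later be enforced at a folder instead of the whole organization without redefining it.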

The same pattern applies to Shielded Nodes, client certificate issuance, and other CIS Benchmark controls. On the Terraform side, a cluster definition that satisfies all three controls looks like this:

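resource "google_container_cluster" "main" {
  # name, location, and node count are placeholder values for illustration
  name               = "main"
  location           = var.region
  initial_node_count = 1

  # Workload Identity: pods authenticate as IAM service accounts
  workload_identity_config {
    workload_pool = "${var.project_id}.svc.id.goog"
  }

  # Shielded Nodes: verified node identity and integrity
  enable_shielded_nodes = true

  # Disable legacy client certificate authentication
  master_auth {
    client_certificate_config {
      issue_client_certificate = false
    }
  }
}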

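When Org Policies are in place, a Terraform configuration that violates the policy fails when terraform apply sends the create request: the API rejects it, so the cluster never exists. Note that terraform plan does not evaluate Org Policies on its own, so the violation surfaces at apply time unless you add a separate pre-apply policy check to your pipeline.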
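The organization-level counterpart for Shielded Nodes and client certificates could be sketched as a pair of custom constraints. Everything named here is an assumption for illustration: the organization ID, the constraint names, and the CEL conditions, which are written against the GKE cluster API field names and should be verified against the fields currently supported for custom constraints.

```hcl
locals {
  # Hypothetical organization; replace with your own.
  org_parent = "organizations/123456789012"

  # Illustrative constraints keyed by the suffix of their custom. name.
  gke_constraints = {
    requireShieldedNodes = {
      display_name = "Require Shielded GKE Nodes"
      action_type  = "ALLOW"
      condition    = "resource.shieldedNodes.enabled == true"
    }
    denyClientCertificates = {
      display_name = "Deny client certificate issuance"
      action_type  = "DENY"
      condition    = "resource.masterAuth.clientCertificateConfig.issueClientCertificate == true"
    }
  }
}

# One custom constraint per entry in the map above.
resource "google_org_policy_custom_constraint" "gke" {
  for_each       = local.gke_constraints
  name           = "custom.${each.key}"
  parent         = local.org_parent
  display_name   = each.value.display_name
  action_type    = each.value.action_type
  condition      = each.value.condition
  method_types   = ["CREATE", "UPDATE"]
  resource_types = ["container.googleapis.com/Cluster"]
}

# Enforce every constraint defined above at the organization level.
resource "google_org_policy_policy" "gke" {
  for_each = google_org_policy_custom_constraint.gke
  name     = "${local.org_parent}/policies/${each.value.name}"
  parent   = local.org_parent

  spec {
    rules {
      enforce = "TRUE"
    }
  }
}
```

Driving the constraints from a single map keeps each additional CIS control to a few lines, and the DENY form for client certificates avoids depending on how an unset field is evaluated.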

The SCC Integration Most Teams Miss

The GKE Security Posture dashboard now integrates directly with Security Command Center. This surfaces top threats per workload in a unified view: OS vulnerabilities from base images, workload misconfigurations, and secrets accidentally exposed in workload configuration.

I've seen teams enable SCC and enable GKE Security Posture as separate activities, never realizing they feed the same dashboard. The integration matters because it gives you one place to track both infrastructure-level compliance (Org Policies) and workload-level threats (runtime security findings).

If you're using SCC Enterprise tier, those findings also feed into Chronicle for SIEM correlation. That's useful for threat detection, but for most SaaS companies, the Standard tier's vulnerability findings are the priority for SOC 2.

The Trade-offs Are Real

Org Policies are blunt instruments. They enforce at the organization, folder, or project level and apply to everything underneath, which means a policy that makes sense for production clusters might block legitimate experimentation in development environments.

The solution is folder-level scoping. Put production projects under a folder with strict enforcement. Put sandbox and development projects under a different folder with relaxed policies. This gives you preventive controls where they matter without blocking engineers from learning.
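A minimal sketch of that folder split in Terraform might look like the following. It assumes a custom constraint named custom.requireWorkloadIdentity has already been defined at the organization, and the two folder ID variables are hypothetical placeholders.

```hcl
# Hypothetical folder IDs (numeric); supply your own.
variable "production_folder_id" {
  type = string
}

variable "sandbox_folder_id" {
  type = string
}

# Strict: enforce the constraint for every project under production.
resource "google_org_policy_policy" "prod_workload_identity" {
  name   = "folders/${var.production_folder_id}/policies/custom.requireWorkloadIdentity"
  parent = "folders/${var.production_folder_id}"

  spec {
    rules {
      enforce = "TRUE"
    }
  }
}

# Relaxed: explicitly do not enforce under the sandbox folder, which
# overrides any enforcement inherited from higher in the hierarchy.
resource "google_org_policy_policy" "sandbox_workload_identity" {
  name   = "folders/${var.sandbox_folder_id}/policies/custom.requireWorkloadIdentity"
  parent = "folders/${var.sandbox_folder_id}"

  spec {
    rules {
      enforce = "FALSE"
    }
  }
}
```

If nothing above the sandbox folder enforces the constraint, the second resource is unnecessary. It is shown here for the case where the organization enforces by default and the sandbox is a deliberate exception.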

SCC tier matters more than most teams realize. Standard tier gives you vulnerability findings: container image CVEs, misconfigurations, exposed secrets. The paid tiers (Premium and Enterprise) add threat detection: suspicious network activity, potential lateral movement, compromise indicators. Most SaaS companies preparing for SOC 2 can start with Standard tier and upgrade when threat intelligence becomes a priority.

Retroactive enforcement is painful. If you have existing GKE clusters that violate the Org Policies you want to enforce, enabling those policies doesn't fix the existing clusters — it just blocks new non-compliant ones. You need a remediation plan for existing infrastructure, and that means scheduled maintenance windows.

What Auditors Actually Ask For

SOC 2 auditors increasingly ask for evidence of preventive controls, not just detective ones. This is the shift that catches teams off guard.

"We monitor for this" is no longer sufficient when the follow-up question is "What stops an engineer from bypassing that monitoring?"

Enforcing CIS Benchmark controls through Org Policies is the difference between those two answers. It's the evidence that your security posture is structural, not procedural. When an auditor asks how you ensure all GKE clusters use Workload Identity, you show them the constraint that blocks any cluster creation without it.

If you're heading into an audit with GKE clusters, this is the first place to look. Not because the technical implementation is complex — it isn't — but because it changes the nature of your control evidence from "we detect and respond" to "we prevent."

That distinction matters more than most teams realize until they're sitting across from an auditor explaining why a finding from six months ago is still open.


About the Author

Amit Malhotra is Principal GCP Architect at Buoyant Cloud Inc, where he helps B2B SaaS companies design audit-ready GKE platforms and implement the SCALE framework for cloud infrastructure.

Work with a GCP specialist: book a free discovery call at https://buoyantcloudtech.com
