<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Amit Malhotra</title>
    <description>The latest articles on DEV Community by Amit Malhotra (@buoyantcloudinc).</description>
    <link>https://dev.to/buoyantcloudinc</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3802617%2F9582f413-6a47-49ad-8fb7-bb2aa20db577.png</url>
      <title>DEV Community: Amit Malhotra</title>
      <link>https://dev.to/buoyantcloudinc</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/buoyantcloudinc"/>
    <language>en</language>
    <item>
      <title>Zero Trust Requires IAM Hygiene, Not Just Products</title>
      <dc:creator>Amit Malhotra</dc:creator>
      <pubDate>Tue, 14 Apr 2026 15:04:45 +0000</pubDate>
      <link>https://dev.to/buoyantcloudinc/zero-trust-requires-iam-hygiene-not-just-products-3286</link>
      <guid>https://dev.to/buoyantcloudinc/zero-trust-requires-iam-hygiene-not-just-products-3286</guid>
      <description>&lt;h1&gt;
  
  
  Zero Trust Isn't a Product — It's What Happens When You Actually Review IAM
&lt;/h1&gt;

&lt;p&gt;Most GCP organizations I assess have a zero trust problem they don't know about. They've configured VPC Service Controls. They've enabled BeyondCorp. They've checked the "zero trust" boxes on their security roadmap. But when I export their IAM bindings to BigQuery and run a simple query, I find service accounts with &lt;code&gt;roles/editor&lt;/code&gt; granted two years ago that have never been reviewed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Zero trust without IAM hygiene is security theater.&lt;/strong&gt; The perimeter controls are there, but inside the perimeter, every service account has the keys to the kingdom.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem Nobody Wants to Own
&lt;/h2&gt;

&lt;p&gt;Least privilege is the goal. Everyone agrees on this. The problem is that nobody achieves it manually across a GCP org with dozens of projects and hundreds of service accounts.&lt;/p&gt;

&lt;p&gt;Here's the pattern I see repeatedly in mid-market SaaS companies:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Initial platform setup happens fast — engineers grant &lt;code&gt;roles/owner&lt;/code&gt; to service accounts because it works and they're under deadline pressure&lt;/li&gt;
&lt;li&gt;Security reviews happen quarterly (if at all) and focus on project-level IAM, missing org-wide patterns&lt;/li&gt;
&lt;li&gt;Nobody has a clear owner for IAM hygiene, so recommendations pile up indefinitely&lt;/li&gt;
&lt;li&gt;SOC 2 auditors ask for evidence of periodic access reviews, and the team scrambles to produce manual spreadsheets&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The fundamental issue isn't technical capability. GCP gives you everything you need to operationalize least privilege. The issue is that IAM governance requires a workflow, an owner, and a system of record. Most organizations have none of these.&lt;/p&gt;

&lt;h2&gt;
  
  
  IAM Recommender Exists — But Nobody Uses It Properly
&lt;/h2&gt;

&lt;p&gt;IAM Recommender is one of the most underutilized tools in GCP. It automatically surfaces over-privileged bindings — roles granted that haven't been used in 90 days. It's doing the analysis work that would take a human weeks to do manually.&lt;/p&gt;

&lt;p&gt;But here's what I've seen: teams enable IAM Recommender, look at the recommendations once, feel overwhelmed by the volume, and never act on them.&lt;/p&gt;

&lt;p&gt;The recommendations pile up. Nothing changes. The audit comes around, and the team is in the same position they were in a year ago.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The missing piece is the analysis layer.&lt;/strong&gt; IAM Recommender gives you individual recommendations per principal per resource. That's useful for tactical fixes, but it doesn't give you the strategic view. You can't see patterns across your org. You can't prioritize by risk. You can't track remediation progress over time.&lt;/p&gt;

&lt;p&gt;This is where BigQuery changes the game.&lt;/p&gt;

&lt;h2&gt;
  
  
  Operationalizing Zero Trust with BigQuery
&lt;/h2&gt;

&lt;p&gt;Exporting IAM policy and Recommender data to BigQuery lets you run org-wide analysis at scale. Instead of reviewing recommendations one by one in the console, you can query your entire IAM posture programmatically.&lt;/p&gt;

&lt;p&gt;Start with Cloud Asset Inventory to export IAM bindings:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud asset &lt;span class="nb"&gt;export&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--organization&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;ORG_ID &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--billing-project&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;PROJECT_ID &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--asset-types&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"iam.googleapis.com/ServiceAccount"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--output-bigquery-table&lt;/span&gt; projects/PROJECT/datasets/DATASET/tables/iam_export
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then query for the highest-risk patterns — service accounts with &lt;code&gt;roles/editor&lt;/code&gt; or &lt;code&gt;roles/owner&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="n"&gt;resource&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;iam_policy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bindings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;role&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;iam_policy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bindings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;members&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="nv"&gt;`project.dataset.iam_export`&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;iam_policy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bindings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;role&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'roles/editor'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'roles/owner'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
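
&lt;p&gt;The same filter can be sketched outside BigQuery. The snippet below is a minimal Python illustration, assuming rows shaped like an IAM policy export (a resource &lt;code&gt;name&lt;/code&gt; plus nested bindings). The sample row, the &lt;code&gt;BASIC_ROLES&lt;/code&gt; set, and the function name are hypothetical and only show the flattening logic.&lt;/p&gt;

```python
# Hypothetical sample row, shaped like one record of an IAM policy export.
ROWS = [
    {
        "name": "//cloudresourcemanager.googleapis.com/projects/prod-data",
        "iam_policy": {
            "bindings": [
                {
                    "role": "roles/editor",
                    "members": [
                        "serviceAccount:etl@prod-data.iam.gserviceaccount.com",
                        "user:dev@example.com",
                    ],
                },
                {
                    "role": "roles/viewer",
                    "members": ["serviceAccount:ro@prod-data.iam.gserviceaccount.com"],
                },
            ]
        },
    }
]

# Basic roles treated as high risk in this sketch (an assumption).
BASIC_ROLES = {"roles/editor", "roles/owner"}


def basic_role_service_accounts(rows):
    """Yield (resource, role, member) for service accounts holding a basic role."""
    for row in rows:
        for binding in row.get("iam_policy", {}).get("bindings", []):
            if binding["role"] not in BASIC_ROLES:
                continue
            for member in binding["members"]:
                # Only service accounts are in scope; users and groups are skipped.
                if member.startswith("serviceAccount:"):
                    yield row["name"], binding["role"], member


for hit in basic_role_service_accounts(ROWS):
    print(hit)
```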



&lt;p&gt;In one SaaS company I worked with, this query revealed 47 service accounts with &lt;code&gt;roles/editor&lt;/code&gt; at the project level. Fifteen of those service accounts had additional roles — some with 15+ unused permissions going back two years. The platform team had no idea.&lt;/p&gt;

&lt;p&gt;For recommendations specifically, use the Recommender API:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud recommender recommendations list &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--recommender&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;google.iam.policy.Recommender &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--location&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;global
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can also integrate IAM Recommender with Security Command Center, where IAM recommendations surface as findings. Note that &lt;code&gt;google.iam.policy.Insight&lt;/code&gt; is the Recommender insight type behind these recommendations, not a Security Command Center finding category. Route the findings to your ticketing system, and you've got an automated workflow that didn't exist before.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Changes When You Have the Data
&lt;/h2&gt;

&lt;p&gt;Once you have IAM analysis in BigQuery, several things become possible:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Risk prioritization.&lt;/strong&gt; Not all over-privileged bindings are equal. A service account with &lt;code&gt;roles/owner&lt;/code&gt; on your production data project is more urgent than one with &lt;code&gt;roles/editor&lt;/code&gt; on a sandbox project. BigQuery lets you join IAM data with resource metadata to prioritize by blast radius.&lt;/p&gt;
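
&lt;p&gt;As a rough illustration of prioritizing by blast radius, the sketch below ranks bindings by a role weight multiplied by an environment weight. The weights, the environment labels, and the &lt;code&gt;rank_by_blast_radius&lt;/code&gt; helper are assumptions made for this example, not values from any GCP API.&lt;/p&gt;

```python
# Hypothetical weights: higher means a larger blast radius.
ROLE_WEIGHT = {"roles/owner": 3, "roles/editor": 2}
ENV_WEIGHT = {"prod": 3, "staging": 2, "sandbox": 1}


def rank_by_blast_radius(bindings, project_env):
    """Return bindings sorted from most to least urgent.

    bindings: list of dicts with "project" and "role" keys.
    project_env: dict mapping project id to an environment label.
    Projects with no known environment are treated as prod, the cautious default.
    """

    def score(binding):
        env = project_env.get(binding["project"], "prod")
        return ROLE_WEIGHT.get(binding["role"], 1) * ENV_WEIGHT.get(env, 3)

    return sorted(bindings, key=score, reverse=True)


bindings = [
    {"project": "sandbox-1", "role": "roles/editor"},
    {"project": "prod-data", "role": "roles/owner"},
]
project_env = {"sandbox-1": "sandbox", "prod-data": "prod"}

for binding in rank_by_blast_radius(bindings, project_env):
    print(binding["project"], binding["role"])
```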

&lt;p&gt;&lt;strong&gt;Remediation tracking.&lt;/strong&gt; Run the same query weekly. Track the count of high-risk bindings over time. Show the trend line to auditors. This is the evidence of continuous improvement that SOC 2 controls require.&lt;/p&gt;
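
&lt;p&gt;A trend line like this can be derived from repeated snapshots. The sketch below assumes each snapshot row carries a snapshot date (ISO format) and a role; the &lt;code&gt;HIGH_RISK_ROLES&lt;/code&gt; set, the sample rows, and the function name are hypothetical.&lt;/p&gt;

```python
# Roles counted as high risk in this sketch (an assumption).
HIGH_RISK_ROLES = {"roles/editor", "roles/owner"}


def weekly_high_risk_counts(snapshot_rows):
    """Count high-risk bindings per snapshot date.

    snapshot_rows: iterable of (snapshot_date, role) tuples, dates as ISO strings.
    Returns a list of (snapshot_date, count) sorted by date, including dates
    whose count is zero so the trend line has no gaps.
    """
    counts = {}
    for snapshot_date, role in snapshot_rows:
        counts.setdefault(snapshot_date, 0)
        if role in HIGH_RISK_ROLES:
            counts[snapshot_date] += 1
    return sorted(counts.items())


# Hypothetical rows from two weekly snapshots.
rows = [
    ("2026-03-02", "roles/editor"),
    ("2026-03-02", "roles/owner"),
    ("2026-03-02", "roles/viewer"),
    ("2026-03-09", "roles/editor"),
]

for snapshot_date, count in weekly_high_risk_counts(rows):
    print(snapshot_date, count)
```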

&lt;p&gt;&lt;strong&gt;Ownership visibility.&lt;/strong&gt; BigQuery analysis often reveals that nobody knows who created certain service accounts or why they exist. This visibility forces the conversation about IAM ownership that most orgs avoid.&lt;/p&gt;

&lt;p&gt;The Lifecycle Operations stage of the SCALE Framework is where most teams fall short. They have security controls in place, but no ongoing governance process. BigQuery + IAM Recommender gives you the operational layer that makes governance sustainable.&lt;/p&gt;

&lt;h2&gt;
  
  
  Trade-Offs You Need to Understand
&lt;/h2&gt;

&lt;p&gt;This approach isn't without complexity.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;90-day usage window limitations.&lt;/strong&gt; IAM Recommender looks at the last 90 days of activity. If you have seasonal workloads or jobs that run quarterly, they'll get flagged as unused. Review recommendations before auto-remediating. I've seen teams accidentally revoke permissions from their disaster recovery service accounts because those accounts only get used during DR tests.&lt;/p&gt;
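
&lt;p&gt;One way to guard against that failure mode is to hold back recommendations for accounts known to run infrequently. The sketch below assumes a hand-maintained set of such accounts and a simplified recommendation record with a &lt;code&gt;member&lt;/code&gt; field; both are hypothetical and not the Recommender API's own schema.&lt;/p&gt;

```python
def split_for_review(recommendations, infrequent_accounts):
    """Split recommendations into (auto_apply, manual_review).

    Any recommendation whose member is in infrequent_accounts (for example
    DR or quarterly-job service accounts) goes to manual review, because a
    90-day usage window can miss its legitimate activity.
    """
    auto_apply, manual_review = [], []
    for rec in recommendations:
        if rec["member"] in infrequent_accounts:
            manual_review.append(rec)
        else:
            auto_apply.append(rec)
    return auto_apply, manual_review


# Hypothetical inputs.
recommendations = [
    {"member": "serviceAccount:dr-restore@prod.iam.gserviceaccount.com", "role": "roles/editor"},
    {"member": "serviceAccount:web@prod.iam.gserviceaccount.com", "role": "roles/editor"},
]
infrequent_accounts = {"serviceAccount:dr-restore@prod.iam.gserviceaccount.com"}

auto_apply, manual_review = split_for_review(recommendations, infrequent_accounts)
print(len(auto_apply), "to apply;", len(manual_review), "held for review")
```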

&lt;p&gt;&lt;strong&gt;Custom role maintenance burden.&lt;/strong&gt; The proper remediation for over-privileged bindings is often a custom role scoped to actual API usage. But custom roles require maintenance. When GCP releases new APIs, custom roles don't automatically get new permissions. Someone has to own the role lifecycle, or you'll break workloads when GCP updates services.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Point-in-time exports.&lt;/strong&gt; A single BigQuery export gives you a snapshot. For continuous monitoring, schedule recurring exports, or use Cloud Asset Inventory feeds, which publish asset changes to Pub/Sub as they happen. Either option adds infrastructure to maintain, but without one of them IAM governance stays a periodic check rather than a continuous process.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Question You Need to Answer
&lt;/h2&gt;

&lt;p&gt;Zero trust is an architecture principle, not a product you buy. IAM Recommender gives you the data. BigQuery gives you the analysis layer. The tools exist.&lt;/p&gt;

&lt;p&gt;What's missing in most organizations is the remediation workflow and ownership. If nobody owns IAM hygiene, the recommendations pile up and nothing changes. You'll have all the visibility in the world and no improvement to show for it.&lt;/p&gt;

&lt;p&gt;The question isn't whether to implement this pattern. The question is: who in your organization owns IAM governance, and what happens when they find 200 over-privileged service accounts?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What's the oldest unused role binding you've found in your GCP org?&lt;/strong&gt; I've seen some that predate the company's SOC 2 certification by years.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Amit Malhotra, Principal GCP Architect, Buoyant Cloud Inc&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Work with a GCP specialist — &lt;a href="https://buoyantcloudtech.com/gcp-consulting-services-canada/" rel="noopener noreferrer"&gt;book a free discovery call&lt;/a&gt;&lt;/p&gt;





</description>
      <category>zerotrust</category>
      <category>iam</category>
      <category>gcp</category>
      <category>cloudsecurity</category>
    </item>
    <item>
      <title>Zero Trust Requires IAM Hygiene, Not Just Products</title>
      <dc:creator>Amit Malhotra</dc:creator>
      <pubDate>Tue, 07 Apr 2026 14:58:34 +0000</pubDate>
      <link>https://dev.to/buoyantcloudinc/zero-trust-requires-iam-hygiene-not-just-products-3d8m</link>
      <guid>https://dev.to/buoyantcloudinc/zero-trust-requires-iam-hygiene-not-just-products-3d8m</guid>
      <description>&lt;h1&gt;
  
  
  Zero Trust Isn't a Product — It's What Happens When You Actually Review IAM
&lt;/h1&gt;

&lt;p&gt;Most GCP organizations I assess have a zero trust problem they don't know about. They've configured VPC Service Controls. They've enabled BeyondCorp. They've checked the "zero trust" boxes on their security roadmap. But when I export their IAM bindings to BigQuery and run a simple query, I find service accounts with &lt;code&gt;roles/editor&lt;/code&gt; granted two years ago that have never been reviewed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Zero trust without IAM hygiene is security theater.&lt;/strong&gt; The perimeter controls are there, but inside the perimeter, every service account has the keys to the kingdom.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem Nobody Wants to Own
&lt;/h2&gt;

&lt;p&gt;Least privilege is the goal. Everyone agrees on this. The problem is that nobody achieves it manually across a GCP org with dozens of projects and hundreds of service accounts.&lt;/p&gt;

&lt;p&gt;Here's the pattern I see repeatedly in mid-market SaaS companies:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Initial platform setup happens fast — engineers grant &lt;code&gt;roles/owner&lt;/code&gt; to service accounts because it works and they're under deadline pressure&lt;/li&gt;
&lt;li&gt;Security reviews happen quarterly (if at all) and focus on project-level IAM, missing org-wide patterns&lt;/li&gt;
&lt;li&gt;Nobody has a clear owner for IAM hygiene, so recommendations pile up indefinitely&lt;/li&gt;
&lt;li&gt;SOC 2 auditors ask for evidence of periodic access reviews, and the team scrambles to produce manual spreadsheets&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The fundamental issue isn't technical capability. GCP gives you everything you need to operationalize least privilege. The issue is that IAM governance requires a workflow, an owner, and a system of record. Most organizations have none of these.&lt;/p&gt;

&lt;h2&gt;
  
  
  IAM Recommender Exists — But Nobody Uses It Properly
&lt;/h2&gt;

&lt;p&gt;IAM Recommender is one of the most underutilized tools in GCP. It automatically surfaces over-privileged bindings — roles granted that haven't been used in 90 days. It's doing the analysis work that would take a human weeks to do manually.&lt;/p&gt;

&lt;p&gt;But here's what I've seen: teams enable IAM Recommender, look at the recommendations once, feel overwhelmed by the volume, and never act on them.&lt;/p&gt;

&lt;p&gt;The recommendations pile up. Nothing changes. The audit comes around, and the team is in the same position they were in a year ago.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The missing piece is the analysis layer.&lt;/strong&gt; IAM Recommender gives you individual recommendations per principal per resource. That's useful for tactical fixes, but it doesn't give you the strategic view. You can't see patterns across your org. You can't prioritize by risk. You can't track remediation progress over time.&lt;/p&gt;

&lt;p&gt;This is where BigQuery changes the game.&lt;/p&gt;

&lt;h2&gt;
  
  
  Operationalizing Zero Trust with BigQuery
&lt;/h2&gt;

&lt;p&gt;Exporting IAM policy and Recommender data to BigQuery lets you run org-wide analysis at scale. Instead of reviewing recommendations one by one in the console, you can query your entire IAM posture programmatically.&lt;/p&gt;

&lt;p&gt;Start with Cloud Asset Inventory to export IAM bindings:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud asset &lt;span class="nb"&gt;export&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--organization&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;ORG_ID &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--billing-project&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;PROJECT_ID &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--asset-types&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"iam.googleapis.com/ServiceAccount"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--output-bigquery-table&lt;/span&gt; projects/PROJECT/datasets/DATASET/tables/iam_export
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then query for the highest-risk patterns — service accounts with &lt;code&gt;roles/editor&lt;/code&gt; or &lt;code&gt;roles/owner&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="n"&gt;resource&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;iam_policy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bindings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;role&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;iam_policy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bindings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;members&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="nv"&gt;`project.dataset.iam_export`&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;iam_policy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bindings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;role&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'roles/editor'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'roles/owner'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In one SaaS company I worked with, this query revealed 47 service accounts with &lt;code&gt;roles/editor&lt;/code&gt; at the project level. Fifteen of those service accounts had additional roles — some with 15+ unused permissions going back two years. The platform team had no idea.&lt;/p&gt;

&lt;p&gt;For recommendations specifically, use the Recommender API:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud recommender recommendations list &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--recommender&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;google.iam.policy.Recommender &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--location&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;global
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can also integrate IAM Recommender with Security Command Center, where IAM recommendations surface as findings. Note that &lt;code&gt;google.iam.policy.Insight&lt;/code&gt; is the Recommender insight type behind these recommendations, not a Security Command Center finding category. Route the findings to your ticketing system, and you've got an automated workflow that didn't exist before.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Changes When You Have the Data
&lt;/h2&gt;

&lt;p&gt;Once you have IAM analysis in BigQuery, several things become possible:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Risk prioritization.&lt;/strong&gt; Not all over-privileged bindings are equal. A service account with &lt;code&gt;roles/owner&lt;/code&gt; on your production data project is more urgent than one with &lt;code&gt;roles/editor&lt;/code&gt; on a sandbox project. BigQuery lets you join IAM data with resource metadata to prioritize by blast radius.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Remediation tracking.&lt;/strong&gt; Run the same query weekly. Track the count of high-risk bindings over time. Show the trend line to auditors. This is the evidence of continuous improvement that SOC 2 controls require.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ownership visibility.&lt;/strong&gt; BigQuery analysis often reveals that nobody knows who created certain service accounts or why they exist. This visibility forces the conversation about IAM ownership that most orgs avoid.&lt;/p&gt;

&lt;p&gt;The Lifecycle Operations stage of the SCALE Framework is where most teams fall short. They have security controls in place, but no ongoing governance process. BigQuery + IAM Recommender gives you the operational layer that makes governance sustainable.&lt;/p&gt;

&lt;h2&gt;
  
  
  Trade-Offs You Need to Understand
&lt;/h2&gt;

&lt;p&gt;This approach isn't without complexity.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;90-day usage window limitations.&lt;/strong&gt; IAM Recommender looks at the last 90 days of activity. If you have seasonal workloads or jobs that run quarterly, they'll get flagged as unused. Review recommendations before auto-remediating. I've seen teams accidentally revoke permissions from their disaster recovery service accounts because those accounts only get used during DR tests.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Custom role maintenance burden.&lt;/strong&gt; The proper remediation for over-privileged bindings is often a custom role scoped to actual API usage. But custom roles require maintenance. When GCP releases new APIs, custom roles don't automatically get new permissions. Someone has to own the role lifecycle, or you'll break workloads when GCP updates services.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Point-in-time exports.&lt;/strong&gt; A single BigQuery export gives you a snapshot. For continuous monitoring, schedule recurring exports, or use Cloud Asset Inventory feeds, which publish asset changes to Pub/Sub as they happen. Either option adds infrastructure to maintain, but without one of them IAM governance stays a periodic check rather than a continuous process.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Question You Need to Answer
&lt;/h2&gt;

&lt;p&gt;Zero trust is an architecture principle, not a product you buy. IAM Recommender gives you the data. BigQuery gives you the analysis layer. The tools exist.&lt;/p&gt;

&lt;p&gt;What's missing in most organizations is the remediation workflow and ownership. If nobody owns IAM hygiene, the recommendations pile up and nothing changes. You'll have all the visibility in the world and no improvement to show for it.&lt;/p&gt;

&lt;p&gt;The question isn't whether to implement this pattern. The question is: who in your organization owns IAM governance, and what happens when they find 200 over-privileged service accounts?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What's the oldest unused role binding you've found in your GCP org?&lt;/strong&gt; I've seen some that predate the company's SOC 2 certification by years.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Amit Malhotra, Principal GCP Architect, Buoyant Cloud Inc&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Work with a GCP specialist — &lt;a href="https://buoyantcloudtech.com/gcp-consulting-services-canada/" rel="noopener noreferrer"&gt;book a free discovery call&lt;/a&gt;&lt;/p&gt;





</description>
      <category>zerotrust</category>
      <category>iam</category>
      <category>gcp</category>
      <category>cloudsecurity</category>
    </item>
    <item>
      <title>Zero Trust Requires IAM Hygiene, Not Just Products</title>
      <dc:creator>Amit Malhotra</dc:creator>
      <pubDate>Tue, 31 Mar 2026 14:53:24 +0000</pubDate>
      <link>https://dev.to/buoyantcloudinc/zero-trust-requires-iam-hygiene-not-just-products-1113</link>
      <guid>https://dev.to/buoyantcloudinc/zero-trust-requires-iam-hygiene-not-just-products-1113</guid>
      <description>&lt;h1&gt;
  
  
  Zero Trust Isn't a Product — It's What Happens When You Actually Review IAM
&lt;/h1&gt;

&lt;p&gt;Most GCP organizations I assess have a zero trust problem they don't know about. They've configured VPC Service Controls. They've enabled BeyondCorp. They've checked the "zero trust" boxes on their security roadmap. But when I export their IAM bindings to BigQuery and run a simple query, I find service accounts with &lt;code&gt;roles/editor&lt;/code&gt; granted two years ago that have never been reviewed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Zero trust without IAM hygiene is security theater.&lt;/strong&gt; The perimeter controls are there, but inside the perimeter, every service account has the keys to the kingdom.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem Nobody Wants to Own
&lt;/h2&gt;

&lt;p&gt;Least privilege is the goal. Everyone agrees on this. The problem is that nobody achieves it manually across a GCP org with dozens of projects and hundreds of service accounts.&lt;/p&gt;

&lt;p&gt;Here's the pattern I see repeatedly in mid-market SaaS companies:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Initial platform setup happens fast — engineers grant &lt;code&gt;roles/owner&lt;/code&gt; to service accounts because it works and they're under deadline pressure&lt;/li&gt;
&lt;li&gt;Security reviews happen quarterly (if at all) and focus on project-level IAM, missing org-wide patterns&lt;/li&gt;
&lt;li&gt;Nobody has a clear owner for IAM hygiene, so recommendations pile up indefinitely&lt;/li&gt;
&lt;li&gt;SOC 2 auditors ask for evidence of periodic access reviews, and the team scrambles to produce manual spreadsheets&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The fundamental issue isn't technical capability. GCP gives you everything you need to operationalize least privilege. The issue is that IAM governance requires a workflow, an owner, and a system of record. Most organizations have none of these.&lt;/p&gt;

&lt;h2&gt;
  
  
  IAM Recommender Exists — But Nobody Uses It Properly
&lt;/h2&gt;

&lt;p&gt;IAM Recommender is one of the most underutilized tools in GCP. It automatically surfaces over-privileged bindings — roles granted that haven't been used in 90 days. It's doing the analysis work that would take a human weeks to do manually.&lt;/p&gt;

&lt;p&gt;But here's what I've seen: teams enable IAM Recommender, look at the recommendations once, feel overwhelmed by the volume, and never act on them.&lt;/p&gt;

&lt;p&gt;The recommendations pile up. Nothing changes. The audit comes around, and the team is in the same position they were in a year ago.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The missing piece is the analysis layer.&lt;/strong&gt; IAM Recommender gives you individual recommendations per principal per resource. That's useful for tactical fixes, but it doesn't give you the strategic view. You can't see patterns across your org. You can't prioritize by risk. You can't track remediation progress over time.&lt;/p&gt;

&lt;p&gt;This is where BigQuery changes the game.&lt;/p&gt;

&lt;h2&gt;
  
  
  Operationalizing Zero Trust with BigQuery
&lt;/h2&gt;

&lt;p&gt;Exporting IAM policy and Recommender data to BigQuery lets you run org-wide analysis at scale. Instead of reviewing recommendations one by one in the console, you can query your entire IAM posture programmatically.&lt;/p&gt;

&lt;p&gt;Start with Cloud Asset Inventory to export IAM bindings:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud asset &lt;span class="nb"&gt;export&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--organization&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;ORG_ID &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--billing-project&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;PROJECT_ID &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--asset-types&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"iam.googleapis.com/ServiceAccount"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--output-bigquery-table&lt;/span&gt; projects/PROJECT/datasets/DATASET/tables/iam_export
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then query for the highest-risk patterns — service accounts with &lt;code&gt;roles/editor&lt;/code&gt; or &lt;code&gt;roles/owner&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="n"&gt;resource&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;iam_policy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bindings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;role&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;iam_policy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bindings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;members&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="nv"&gt;`project.dataset.iam_export`&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;iam_policy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bindings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;role&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'roles/editor'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'roles/owner'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In one SaaS company I worked with, this query revealed 47 service accounts with &lt;code&gt;roles/editor&lt;/code&gt; at the project level. Fifteen of those service accounts had additional roles — some with 15+ unused permissions going back two years. The platform team had no idea.&lt;/p&gt;

&lt;p&gt;For recommendations specifically, use the Recommender API:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud recommender recommendations list &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--recommender&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;google.iam.policy.Recommender &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--location&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;global
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can also integrate IAM Recommender with Security Command Center, where IAM recommendations surface as findings. Note that &lt;code&gt;google.iam.policy.Insight&lt;/code&gt; is the Recommender insight type behind these recommendations, not a Security Command Center finding category. Route the findings to your ticketing system, and you've got an automated workflow that didn't exist before.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Changes When You Have the Data
&lt;/h2&gt;

&lt;p&gt;Once you have IAM analysis in BigQuery, several things become possible:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Risk prioritization.&lt;/strong&gt; Not all over-privileged bindings are equal. A service account with &lt;code&gt;roles/owner&lt;/code&gt; on your production data project is more urgent than one with &lt;code&gt;roles/editor&lt;/code&gt; on a sandbox project. BigQuery lets you join IAM data with resource metadata to prioritize by blast radius.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Remediation tracking.&lt;/strong&gt; Run the same query weekly. Track the count of high-risk bindings over time. Show the trend line to auditors. This is the evidence of continuous improvement that SOC 2 controls require.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ownership visibility.&lt;/strong&gt; BigQuery analysis often reveals that nobody knows who created certain service accounts or why they exist. This visibility forces the conversation about IAM ownership that most orgs avoid.&lt;/p&gt;

&lt;p&gt;The Lifecycle Operations stage of the SCALE Framework is where most teams fall short. They have security controls in place, but no ongoing governance process. BigQuery + IAM Recommender gives you the operational layer that makes governance sustainable.&lt;/p&gt;
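&lt;p&gt;The risk prioritization idea above can be sketched in a few lines. This is a minimal illustration, not a scoring standard: the weights, the &lt;code&gt;environment&lt;/code&gt; field, and the function names are all assumptions, and the rows stand in for a BigQuery result that has already been joined with project metadata.&lt;/p&gt;

```python
# Hypothetical weights: tune these to your own role risk and environment tiers.
ROLE_WEIGHT = {"roles/owner": 10, "roles/editor": 7}
ENV_WEIGHT = {"production": 5, "staging": 2, "sandbox": 1}


def blast_radius_score(binding):
    """Score one binding as role weight times environment weight."""
    role_weight = ROLE_WEIGHT.get(binding["role"], 1)
    env_weight = ENV_WEIGHT.get(binding["environment"], 1)
    return role_weight * env_weight


def prioritize(bindings):
    """Return bindings sorted from highest to lowest blast radius."""
    return sorted(bindings, key=blast_radius_score, reverse=True)


if __name__ == "__main__":
    # Illustrative rows only; member names are made up.
    rows = [
        {"member": "sa-ci", "role": "roles/editor", "environment": "sandbox"},
        {"member": "sa-etl", "role": "roles/owner", "environment": "production"},
        {"member": "sa-app", "role": "roles/editor", "environment": "production"},
    ]
    for row in prioritize(rows):
        print(blast_radius_score(row), row["member"], row["role"])
```

&lt;p&gt;A multiplicative score keeps the ranking easy to explain to auditors: a broad role only rises to the top when it sits on an environment that matters.&lt;/p&gt;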

&lt;h2&gt;
  
  
  Trade-Offs You Need to Understand
&lt;/h2&gt;

&lt;p&gt;This approach isn't without complexity.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;90-day usage window limitations.&lt;/strong&gt; IAM Recommender looks at the last 90 days of activity. If you have seasonal workloads or jobs that run quarterly, they'll get flagged as unused. Review recommendations before auto-remediating. I've seen teams accidentally revoke permissions from their disaster recovery service accounts because those accounts only get used during DR tests.&lt;/p&gt;
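&lt;p&gt;One way to guard against this is to hold back recommendations for accounts you know are seasonal or DR-only. The sketch below assumes a simplified recommendation shape (a dict with &lt;code&gt;member&lt;/code&gt; and &lt;code&gt;role&lt;/code&gt;) and a hand-maintained exemption list; both are hypothetical and not the Recommender API's own format.&lt;/p&gt;

```python
# Hypothetical exemption list: accounts whose usage is seasonal or DR-only.
REVIEW_REQUIRED = frozenset({
    "dr-failover@example-project.iam.gserviceaccount.com",
    "quarterly-billing@example-project.iam.gserviceaccount.com",
})


def triage(recommendations, review_required=REVIEW_REQUIRED):
    """Split recommendations into auto-apply candidates and manual-review items."""
    auto_apply, manual_review = [], []
    for rec in recommendations:
        if rec["member"] in review_required:
            # Never auto-remediate accounts that are expected to look idle.
            manual_review.append(rec)
        else:
            auto_apply.append(rec)
    return auto_apply, manual_review
```

&lt;p&gt;Anything in the manual-review bucket goes to a human who knows when the account was last needed, not to an automated revoke step.&lt;/p&gt;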

&lt;p&gt;&lt;strong&gt;Custom role maintenance burden.&lt;/strong&gt; The proper remediation for over-privileged bindings is often a custom role scoped to actual API usage. But custom roles require maintenance. When GCP releases new APIs, custom roles don't automatically get new permissions. Someone has to own the role lifecycle, or you'll break workloads when GCP updates services.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Point-in-time exports.&lt;/strong&gt; A single BigQuery export gives you a snapshot. For continuous monitoring, schedule recurring exports, or use Cloud Asset Inventory feeds to publish IAM policy changes to Pub/Sub as they happen. This adds infrastructure to maintain, but it's the only way to make IAM governance truly continuous.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Question You Need to Answer
&lt;/h2&gt;

&lt;p&gt;Zero trust is an architecture principle, not a product you buy. IAM Recommender gives you the data. BigQuery gives you the analysis layer. The tools exist.&lt;/p&gt;

&lt;p&gt;What's missing in most organizations is the remediation workflow and ownership. If nobody owns IAM hygiene, the recommendations pile up and nothing changes. You'll have all the visibility in the world and no improvement to show for it.&lt;/p&gt;

&lt;p&gt;The question isn't whether to implement this pattern. The question is: who in your organization owns IAM governance, and what happens when they find 200 over-privileged service accounts?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What's the oldest unused role binding you've found in your GCP org?&lt;/strong&gt; I've seen some that predate the company's SOC 2 certification by years.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Amit Malhotra, Principal GCP Architect, Buoyant Cloud Inc&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Work with a GCP specialist — &lt;a href="https://buoyantcloudtech.com/gcp-consulting-services-canada/" rel="noopener noreferrer"&gt;book a free discovery call&lt;/a&gt;&lt;/p&gt;





</description>
      <category>zerotrust</category>
      <category>iam</category>
      <category>gcp</category>
      <category>cloudsecurity</category>
    </item>
    <item>
      <title>Static Service Account Keys: Your Biggest GCP Identity Risk</title>
      <dc:creator>Amit Malhotra</dc:creator>
      <pubDate>Tue, 24 Mar 2026 14:49:12 +0000</pubDate>
      <link>https://dev.to/buoyantcloudinc/static-service-account-keys-your-biggest-gcp-identity-risk-210</link>
      <guid>https://dev.to/buoyantcloudinc/static-service-account-keys-your-biggest-gcp-identity-risk-210</guid>
      <description>&lt;h1&gt;
  
  
  Static Service Account Keys Are Still Your Biggest GCP Identity Risk
&lt;/h1&gt;

&lt;p&gt;Most GCP environments I audit have the same problem hiding in plain sight. Not misconfigured firewall rules. Not overly permissive IAM roles. Service account keys.&lt;/p&gt;

&lt;p&gt;I find them in GitHub repos, in CI/CD environment variables, stored on developer laptops, committed to private repos that "nobody external can access." The teams running these environments aren't careless. They're experienced engineers who set up keys years ago when it was the standard approach, and nobody has had the bandwidth to migrate.&lt;/p&gt;

&lt;p&gt;That key sitting in your Jenkins server is a ticking breach. And unlike a compromised password, a compromised GCP key doesn't trigger an account lockout after failed attempts. It just works — silently, indefinitely — until you notice the billing spike or the security incident.&lt;/p&gt;

&lt;h2&gt;
  
  
  The $450k Weekend
&lt;/h2&gt;

&lt;p&gt;One team I worked with learned this the hard way. A service account key leaked through a public GitHub commit. The commit was reverted within hours, but the key was already harvested by automated scrapers. Over a single weekend, attackers spun up Cloud Run instances across every available region, running crypto mining workloads.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The bill: $450,000.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;GCP support eventually provided credits, but the incident consumed weeks of engineering time, triggered their SOC 2 auditor's attention, and forced an emergency security review across their entire infrastructure.&lt;/p&gt;

&lt;p&gt;The key had been valid for three years. Nobody remembered creating it.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Most Teams Get Wrong
&lt;/h2&gt;

&lt;p&gt;The solution to this problem has existed for years: &lt;strong&gt;Workload Identity Federation&lt;/strong&gt;. External identities — GitHub Actions runners, GitLab CI, even AWS workloads — can exchange OIDC tokens for short-lived GCP credentials. No keys required.&lt;/p&gt;

&lt;p&gt;For GKE workloads, &lt;strong&gt;Workload Identity&lt;/strong&gt; lets Kubernetes Service Accounts impersonate GCP Service Accounts without any credentials stored in the cluster.&lt;/p&gt;

&lt;p&gt;These aren't new features. They're production-ready and well-documented. So why do I still find keys everywhere?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Because teams implement one piece without completing the migration.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I see this pattern constantly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;WIF configured for GitHub Actions, but old keys left active "just in case the new approach breaks"&lt;/li&gt;
&lt;li&gt;Workload Identity enabled on GKE, but legacy deployments still mounting key files as secrets&lt;/li&gt;
&lt;li&gt;Org policy blocking key creation, but dozens of existing keys still valid and in use&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The partial migration is almost worse than no migration. Your audit trail shows both authentication methods being used. Your security team can't tell which is legitimate. Your attackers now have two paths into your systems.&lt;/p&gt;
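&lt;p&gt;A first step out of a partial migration is knowing which user-managed keys still exist and how old they are. The sketch below assumes its input is the JSON printed by &lt;code&gt;gcloud iam service-accounts keys list --iam-account=SA_EMAIL --format=json&lt;/code&gt;, and that each entry carries &lt;code&gt;name&lt;/code&gt;, &lt;code&gt;keyType&lt;/code&gt;, and a &lt;code&gt;validAfterTime&lt;/code&gt; timestamp without fractional seconds. The function name is hypothetical.&lt;/p&gt;

```python
import json
from datetime import datetime, timezone


def user_managed_keys(keys_json, now=None):
    """Return (key name, age in days) for every user-managed key, oldest first."""
    now = now or datetime.now(timezone.utc)
    found = []
    for key in json.loads(keys_json):
        # System-managed keys are rotated by Google and are not the risk here.
        if key.get("keyType") != "USER_MANAGED":
            continue
        # Assumed timestamp shape: 2023-01-01T00:00:00Z
        created = datetime.strptime(key["validAfterTime"], "%Y-%m-%dT%H:%M:%SZ")
        created = created.replace(tzinfo=timezone.utc)
        found.append((key["name"], (now - created).days))
    return sorted(found, key=lambda item: item[1], reverse=True)
```

&lt;p&gt;Run it per service account and feed the oldest keys into the migration backlog first; a key that is both old and still in use is the one to map to a workload before anything gets disabled.&lt;/p&gt;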

&lt;h2&gt;
  
  
  The Identity Pattern That Actually Works
&lt;/h2&gt;

&lt;p&gt;Eliminating keys requires two components working together: &lt;strong&gt;Workload Identity Federation&lt;/strong&gt; and &lt;strong&gt;Service Account Impersonation&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;WIF handles machine-to-machine authentication. Your GitHub Actions workflow authenticates to GCP without storing any secrets:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;google-github-actions/auth@v2&lt;/span&gt;
  &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;workload_identity_provider&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;projects/123456/locations/global/workloadIdentityPools/github-pool/providers/github-provider&lt;/span&gt;
    &lt;span class="na"&gt;service_account&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;deploy-sa@project.iam.gserviceaccount.com&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No key to rotate. No secret to leak. The token expires automatically.&lt;/p&gt;

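&lt;p&gt;For GKE, an IAM policy binding lets the Kubernetes Service Account impersonate the GCP Service Account, and an annotation on the Kubernetes Service Account names the GCP Service Account it maps to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud iam service-accounts add-iam-policy-binding deploy-sa@project.iam.gserviceaccount.com &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role&lt;/span&gt; roles/iam.workloadIdentityUser &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--member&lt;/span&gt; &lt;span class="s2"&gt;"serviceAccount:project.svc.id.goog[production/app-ksa]"&lt;/span&gt;
&lt;span class="c"&gt;# Annotate the Kubernetes Service Account with the GCP Service Account it maps to&lt;/span&gt;
kubectl annotate serviceaccount app-ksa &lt;span class="nt"&gt;--namespace&lt;/span&gt; production &lt;span class="se"&gt;\&lt;/span&gt;
  iam.gke.io/gcp-service-account&lt;span class="o"&gt;=&lt;/span&gt;deploy-sa@project.iam.gserviceaccount.com
&lt;/code&gt;&lt;/pre&gt;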

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Service Account Impersonation&lt;/strong&gt; handles the human side. Instead of developers holding permanent credentials to a powerful service account, they impersonate a scoped service account on demand:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud config &lt;span class="nb"&gt;set &lt;/span&gt;auth/impersonate_service_account deploy-sa@project.iam.gserviceaccount.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The developer's identity is still the audit principal. You can see exactly who impersonated which account, when, and what they did. Compare that to five engineers sharing the same downloaded key file — your audit logs just show the service account, with no way to trace the actual human.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Org Policy That Creates Friction
&lt;/h2&gt;

&lt;p&gt;Once you're confident your workloads don't need keys, enforce it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;constraints/iam.disableServiceAccountKeyCreation
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This org policy prevents anyone from generating new keys. I've seen it implemented successfully — and I've seen it create chaos.&lt;/p&gt;

&lt;p&gt;The chaos happens when you enable the policy before educating your engineering team. Developers who don't know about WIF or &lt;code&gt;gcloud auth application-default login&lt;/code&gt; suddenly can't authenticate their local development environments. They file urgent tickets. They complain about "security blocking progress." Some creative ones figure out workarounds that are worse than the original keys.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The migration order matters.&lt;/strong&gt; Document the new authentication patterns. Train your developers. Set up WIF for CI/CD. Verify that no active workloads depend on keys. Then enable the org policy.&lt;/p&gt;

&lt;p&gt;This sequence aligns with the Security by Design phase of our SCALE framework — identity architecture has to be right before you build automation on top of it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Trade-offs Nobody Mentions
&lt;/h2&gt;

&lt;p&gt;WIF and impersonation aren't without friction.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Local development gets more complex.&lt;/strong&gt; With keys, developers could just set &lt;code&gt;GOOGLE_APPLICATION_CREDENTIALS&lt;/code&gt; and move on. With WIF, you need &lt;code&gt;gcloud auth application-default login&lt;/code&gt; workflows documented and understood. Some developers will resist this. Your platform team needs to make the secure path the easy path.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Audit configuration has to be correct.&lt;/strong&gt; Impersonation creates cleaner audit trails, but only if you're capturing the right logs. Data Access audit logs need to be enabled for &lt;code&gt;sts.googleapis.com&lt;/code&gt;, which handles the WIF token exchange, and for the IAM APIs that issue impersonated credentials. I've seen teams implement impersonation and then realize months later that they weren't logging the token exchanges.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cross-project impersonation gets complicated fast.&lt;/strong&gt; A service account in Project A impersonating a service account in Project B that accesses resources in Project C creates a chain that's hard to audit and easy to misconfigure. Keep impersonation chains to one hop maximum.&lt;/p&gt;
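&lt;p&gt;The one-hop rule is easy to check once you have a map of who can impersonate whom. The sketch below assumes you have already built that map (for example from exported IAM bindings that grant token creation on service accounts); the dict shape and function names are hypothetical.&lt;/p&gt;

```python
def longest_chain(can_impersonate, start, _seen=None):
    """Return the longest number of impersonation hops reachable from start."""
    seen = (_seen or set()) | {start}
    hops = [
        1 + longest_chain(can_impersonate, target, seen)
        for target in can_impersonate.get(start, [])
        if target not in seen  # skip cycles so the recursion terminates
    ]
    return max(hops, default=0)


def chains_over_limit(can_impersonate, max_hops=1):
    """List principals whose longest impersonation chain exceeds max_hops."""
    return sorted(
        principal
        for principal in can_impersonate
        if longest_chain(can_impersonate, principal) > max_hops
    )
```

&lt;p&gt;For a map such as &lt;code&gt;{"alice": ["sa-a"], "sa-a": ["sa-b"], "sa-b": []}&lt;/code&gt;, only &lt;code&gt;alice&lt;/code&gt; is reported, because she can reach &lt;code&gt;sa-b&lt;/code&gt; through two hops.&lt;/p&gt;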

&lt;h2&gt;
  
  
  What This Means for SOC 2
&lt;/h2&gt;

&lt;p&gt;Every SOC 2 audit I've supported in the last three years has flagged service account keys. The auditors aren't wrong — long-lived credentials with no rotation policy and unclear ownership are a control gap.&lt;/p&gt;

&lt;p&gt;The finding usually reads something like: "Service account keys exist without defined rotation schedules or ownership assignment."&lt;/p&gt;

&lt;p&gt;You can write a policy that says keys must be rotated every 90 days. You can assign ownership in a spreadsheet. You can build automation to rotate keys. Or you can eliminate keys entirely and remove the finding at its root.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Eliminating keys is not optional for regulated SaaS.&lt;/strong&gt; The migration path from keys to WIF is well-defined — the blocker is usually organizational, not technical. Someone has to own the project, inventory the existing keys, map them to workloads, and execute the migration without breaking production.&lt;/p&gt;

&lt;p&gt;That's the work. It's not glamorous. It doesn't involve new tools or exciting architecture diagrams. But it's the single highest-impact security improvement most GCP environments can make today.&lt;/p&gt;

&lt;p&gt;If identity boundaries are wrong, everything built on top of them inherits the risk.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Author:&lt;/strong&gt; Amit Malhotra, Principal GCP Architect, Buoyant Cloud Inc&lt;/p&gt;

&lt;p&gt;&lt;a href="https://buoyantcloudtech.com/gcp-consulting-services-canada/" rel="noopener noreferrer"&gt;Work with a GCP specialist — book a free discovery call&lt;/a&gt;&lt;/p&gt;





</description>
      <category>gcp</category>
      <category>cloudsecurity</category>
      <category>serviceaccounts</category>
      <category>identitymanagement</category>
    </item>
    <item>
      <title>VPC Service Controls Private IP Gap: A Security Risk</title>
      <dc:creator>Amit Malhotra</dc:creator>
      <pubDate>Tue, 17 Mar 2026 14:47:16 +0000</pubDate>
      <link>https://dev.to/buoyantcloudinc/vpc-service-controls-private-ip-gap-a-security-risk-271a</link>
      <guid>https://dev.to/buoyantcloudinc/vpc-service-controls-private-ip-gap-a-security-risk-271a</guid>
      <description>&lt;h1&gt;
  
  
  VPC Service Controls Without Private IP Coverage Is Security Theater
&lt;/h1&gt;

&lt;p&gt;Most GCP teams I work with have VPC Service Controls enabled. They check the compliance box, show auditors the perimeter configuration, and move on. What they don't realize is that their internal services can still exfiltrate data to external projects without triggering a single alert.&lt;/p&gt;

&lt;p&gt;The gap isn't in VPC-SC itself — it's in how teams deploy it. Private IP support in VPC-SC perimeters has been available for a while now, but I'd estimate fewer than 20% of the SaaS platforms I audit have actually implemented it. The rest have a perimeter that looks solid on paper but leaves the most common exfiltration path wide open.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Exfiltration Path Nobody Talks About
&lt;/h2&gt;

&lt;p&gt;VPC Service Controls were designed to prevent data from leaving your GCP organization through managed services like BigQuery, Cloud Storage, and Secret Manager. The original implementation worked well for public internet traffic — if someone tried to copy data from your BigQuery dataset to an external project over the public API, VPC-SC blocked it.&lt;/p&gt;

&lt;p&gt;But traffic originating from private IP ranges inside your VPC? That wasn't covered.&lt;/p&gt;

&lt;p&gt;Think about what that means in practice. An attacker compromises a service account on a GKE workload running in your private network. They have access to BigQuery through that identity. With VPC-SC but without private IP coverage, they can query your datasets and write the results to an external project they control — all from inside your "protected" perimeter.&lt;/p&gt;

&lt;p&gt;I've seen this exact scenario during penetration tests. The security team was confident their VPC-SC perimeter would catch any data exfiltration attempt. It didn't. The test showed data flowing out through internal services that the perimeter didn't inspect.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Gap Persists
&lt;/h2&gt;

&lt;p&gt;Three patterns explain why most teams haven't closed this gap:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dry-run mode paralysis.&lt;/strong&gt; VPC-SC is notoriously difficult to enforce without breaking production services. The safe approach is to run in dry-run mode first, watch the logs, and then switch to enforced. I've seen teams run in dry-run mode for six months or longer. At that point, dry-run becomes the permanent state — and dry-run mode doesn't actually block anything. It just logs what would have been blocked. That's monitoring, not protection.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Incomplete perimeter design.&lt;/strong&gt; Teams enable VPC-SC for BigQuery and Cloud Storage but skip Secret Manager, Cloud SQL Admin API, or other services that handle sensitive data. Attackers don't care which service holds your data — they'll take whatever path is open.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Misconfigured ingress and egress rules.&lt;/strong&gt; Once private IP support is enabled, you need explicit ingress rules allowing legitimate internal traffic. Most teams either make these rules too broad (defeating the purpose) or too narrow (breaking production). The operational burden pushes teams toward permissive configurations or abandoning the feature entirely.&lt;/p&gt;
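&lt;p&gt;The "too broad" failure mode can be caught with a simple lint over the perimeter's ingress rules before they ship. The sketch below assumes each rule has been loaded into a dict that mirrors the VPC-SC ingress rule fields (&lt;code&gt;ingressFrom&lt;/code&gt;, &lt;code&gt;ingressTo&lt;/code&gt;, &lt;code&gt;identityType&lt;/code&gt;, &lt;code&gt;accessLevel&lt;/code&gt;, &lt;code&gt;methodSelectors&lt;/code&gt;); the function name and the wording of the findings are hypothetical.&lt;/p&gt;

```python
def broad_rule_findings(rule):
    """Return human-readable reasons why an ingress rule looks too permissive."""
    findings = []
    ingress_from = rule.get("ingressFrom", {})
    ingress_to = rule.get("ingressTo", {})

    if ingress_from.get("identityType") == "ANY_IDENTITY":
        findings.append("allows any identity")
    for source in ingress_from.get("sources", []):
        if source.get("accessLevel") == "*":
            findings.append("allows any network origin")
    for operation in ingress_to.get("operations", []):
        if operation.get("serviceName") == "*":
            findings.append("allows every service")
        for selector in operation.get("methodSelectors", []):
            if selector.get("method") == "*":
                findings.append(
                    f"allows every method on {operation.get('serviceName')}"
                )
    if "*" in ingress_to.get("resources", []):
        findings.append("allows every project in the perimeter")
    return findings
```

&lt;p&gt;An empty list does not prove a rule is correct, only that it avoids the obvious wildcards. Running a check like this in CI gives reviewers a reason to ask why a wildcard is there.&lt;/p&gt;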

&lt;h2&gt;
  
  
  The Architecture That Actually Works
&lt;/h2&gt;

&lt;p&gt;In my experience, closing the data exfiltration gap requires three components working together:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;VPC-SC with private IP coverage.&lt;/strong&gt; Configure your perimeter to inspect traffic from internal CIDR ranges, not just public internet traffic. This is the foundation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Access Context Manager access levels tied to network origin and identity.&lt;/strong&gt; Don't just allow traffic from a private IP range — require that traffic to also come from a specific service account. Defense in depth means an attacker needs to compromise both the network position and the identity.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ingress rules scoped to specific services and methods.&lt;/strong&gt; If your data pipeline only needs to read from BigQuery, don't grant write access through the perimeter. Principle of least privilege applies to network perimeters just like it applies to IAM.&lt;/p&gt;

&lt;p&gt;Here's what a properly scoped ingress rule looks like in practice:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;ingressPolicies&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;ingressFrom&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;sources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;resource&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;projects/PROJECT_NUMBER"&lt;/span&gt;
      &lt;span class="na"&gt;identities&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;serviceAccount:data-pipeline@project.iam.gserviceaccount.com&lt;/span&gt;
    &lt;span class="na"&gt;ingressTo&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;operations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;serviceName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;bigquery.googleapis.com&lt;/span&gt;
          &lt;span class="na"&gt;methodSelectors&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;google.cloud.bigquery.v2.JobService.Query"&lt;/span&gt;
      &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;projects/PROJECT_NUMBER"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This rule allows one service account to run queries against BigQuery in one project. Ingress rules scope to projects, not individual datasets, so dataset-level restrictions still have to come from IAM. An attacker who compromises a different service account, or who uses the same service account to write data externally, gets blocked.&lt;/p&gt;

&lt;p&gt;The Terraform resource for managing this is &lt;code&gt;google_access_context_manager_service_perimeter&lt;/code&gt;. I'd recommend managing all perimeter configuration through infrastructure as code. Manual console changes to VPC-SC perimeters are a recipe for configuration drift and broken production services.&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Means for SOC 2 and Audit Readiness
&lt;/h2&gt;

&lt;p&gt;During SOC 2 audits, I've had auditors ask specifically about data exfiltration controls. "Show me how you prevent an insider or compromised credential from copying data outside your organization."&lt;/p&gt;

&lt;p&gt;VPC-SC without private IP coverage doesn't answer that question. You can show them the perimeter configuration, but if private IP traffic isn't covered, you have a control gap. Auditors who understand GCP will catch it. Auditors who don't will accept the checkbox — until you have an incident and the forensics reveal the gap.&lt;/p&gt;

&lt;p&gt;This is where the security-by-design principle from the SCALE framework matters most. If you build your perimeter architecture correctly from the start, SOC 2 evidence collection is straightforward. If you retrofit private IP coverage onto an existing perimeter, you're doing the work twice and risking production outages during the transition.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Trade-Offs Are Real
&lt;/h2&gt;

&lt;p&gt;I'm not going to pretend VPC-SC with full private IP coverage is easy to operate. It adds friction to every new service deployment. Your platform team will field tickets from developers asking why their new Cloud Function can't reach BigQuery. The answer will be "because you didn't add it to the perimeter ingress rules," and that will slow down their sprint.&lt;/p&gt;

&lt;p&gt;The CIDR planning requirements are also non-trivial. If you have overlapping IP ranges across projects — common in organizations that grew without central network governance — you'll hit routing issues that are painful to debug.&lt;/p&gt;

&lt;p&gt;And not all GCP services support VPC-SC yet. Before designing your perimeter architecture, check the &lt;a href="https://cloud.google.com/vpc-service-controls/docs/supported-products" rel="noopener noreferrer"&gt;supported services list&lt;/a&gt;. Building a perimeter around services that don't support it creates gaps you can't close with configuration.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Enforcement Question
&lt;/h2&gt;

&lt;p&gt;If your VPC-SC perimeter has been in dry-run mode for more than a month, you need to ask yourself an honest question: is it ever going to be enforced?&lt;/p&gt;

&lt;p&gt;Dry-run mode is valuable for the first two weeks. You watch the logs, identify legitimate traffic that would be blocked, and adjust your ingress and egress rules. After that, you either enforce the perimeter or admit that you're not actually protecting anything.&lt;/p&gt;

&lt;p&gt;I've worked with teams who ran dry-run mode for eight months because they were afraid of breaking production. During that time, they had zero data exfiltration protection. The perimeter existed on paper. In practice, it was monitoring, not security.&lt;/p&gt;

&lt;p&gt;Data exfiltration is the top concern in every regulated SaaS environment I work with. Your customers trust you with their data. VPC-SC with private IP support is one of the few controls that actually prevents data from leaving your organization through GCP's managed services.&lt;/p&gt;

&lt;p&gt;If you've been putting off this work, the gap is still open. What's your plan to close it?&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Work with a GCP specialist — &lt;a href="https://buoyantcloudtech.com/gcp-consulting-services-canada/" rel="noopener noreferrer"&gt;book a free discovery call&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Amit Malhotra&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Principal GCP Architect, Buoyant Cloud Inc&lt;/p&gt;





</description>
      <category>gcpsecurity</category>
      <category>vpcservicecontrols</category>
      <category>cloudsecurity</category>
      <category>dataexfiltration</category>
    </item>
    <item>
      <title>GKE Security: Why Monitoring Isn't Enough for Compliance</title>
      <dc:creator>Amit Malhotra</dc:creator>
      <pubDate>Tue, 10 Mar 2026 14:37:24 +0000</pubDate>
      <link>https://dev.to/buoyantcloudinc/gke-security-why-monitoring-isnt-enough-for-compliance-17ph</link>
      <guid>https://dev.to/buoyantcloudinc/gke-security-why-monitoring-isnt-enough-for-compliance-17ph</guid>
      <description>&lt;h1&gt;
  
  
  Your GKE Cluster Passed Security Review — And It's Still Not Audit-Ready
&lt;/h1&gt;

&lt;p&gt;Most GKE clusters I audit look secure on paper. Workload Identity enabled. Private cluster networking. RBAC configured. The security checklist is complete, the platform team feels confident, and then the SOC 2 auditor asks a simple question: "Show me the control that &lt;em&gt;prevents&lt;/em&gt; someone from deploying a cluster without these settings."&lt;/p&gt;

&lt;p&gt;That's when the conversation gets uncomfortable.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Gap Between Monitoring and Prevention
&lt;/h2&gt;

&lt;p&gt;Here's the pattern I see repeatedly across SaaS companies running production workloads on GKE: teams implement security controls &lt;em&gt;inside&lt;/em&gt; their clusters but leave the provisioning layer wide open.&lt;/p&gt;

&lt;p&gt;Security Command Center is enabled. GKE Security Posture dashboard shows green. The team has even run the CIS GKE Benchmark manually and fixed the findings. Everything looks good.&lt;/p&gt;

&lt;p&gt;But there's no preventive control at the organization level. Any engineer with the right IAM permissions can spin up a new cluster tomorrow with default settings — no Workload Identity, no Shielded Nodes, client certificates enabled. That cluster will appear in SCC findings eventually, but by then it's running production traffic.&lt;/p&gt;

&lt;p&gt;The business risk here isn't theoretical. I've watched teams scramble during SOC 2 preparation when auditors ask for evidence that non-compliant infrastructure &lt;em&gt;cannot&lt;/em&gt; be provisioned. "We monitor for drift" is not the same answer as "we prevent it at the platform layer."&lt;/p&gt;

&lt;h2&gt;
  
  
  What Most Teams Get Wrong About GKE Security
&lt;/h2&gt;

&lt;p&gt;The CIS GKE Benchmark exists. GCP has native tooling to enforce it. Security Command Center can surface violations in real time. But teams implement these pieces in isolation rather than as a layered enforcement system.&lt;/p&gt;

&lt;p&gt;In my experience working with B2B SaaS companies preparing for audits, the breakdown usually looks like this:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Detection without prevention.&lt;/strong&gt; SCC is enabled, but findings pile up with no remediation workflow. The dashboard becomes noise. Nobody reviews it weekly because there's no escalation path.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Benchmark compliance as a point-in-time exercise.&lt;/strong&gt; Teams run CIS scans before an audit, fix the violations, and move on. Six months later, a new cluster gets provisioned with the same issues because nothing prevents it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Security posture without organizational enforcement.&lt;/strong&gt; The GKE Security Posture dashboard now integrates beautifully with SCC to surface OS vulnerabilities, workload misconfigurations, and exposed secrets per workload. Most teams I work with don't know this integration exists — and even fewer use Organization Policies to enforce the same controls preventively.&lt;/p&gt;

&lt;p&gt;The result is a cluster that &lt;em&gt;looks&lt;/em&gt; secure but fails the audit question that actually matters: "What prevents this from happening in the first place?"&lt;/p&gt;

&lt;h2&gt;
  
  
  Org Policies Are the Missing Layer
&lt;/h2&gt;

&lt;p&gt;Custom Organization Policies are where GKE security moves from reactive to preventive. Instead of detecting a non-compliant cluster after deployment, you block the creation of that cluster before it happens.&lt;/p&gt;

&lt;p&gt;This is the &lt;strong&gt;Security by Design&lt;/strong&gt; principle in the SCALE framework — your controls should prevent misconfiguration at provisioning time, not just detect it afterward.&lt;/p&gt;

&lt;p&gt;Here's a concrete example. To enforce Workload Identity across your entire organization:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;constraint: constraints/container.requireWorkloadIdentity
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Apply this at the organization or folder level, and any &lt;code&gt;gcloud container clusters create&lt;/code&gt; command that doesn't include Workload Identity configuration fails immediately. No cluster gets provisioned. No SCC finding to remediate later.&lt;/p&gt;

&lt;p&gt;The same pattern applies to Shielded Nodes, client certificate issuance, and other CIS Benchmark controls. In Terraform, this enforcement happens before the cluster exists:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"google_container_cluster"&lt;/span&gt; &lt;span class="s2"&gt;"main"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;workload_identity_config&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;workload_pool&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"${var.project_id}.svc.id.goog"&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="nx"&gt;enable_shielded_nodes&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="nx"&gt;master_auth&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;client_certificate_config&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;issue_client_certificate&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



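&lt;p&gt;When Org Policies are in place, a Terraform configuration that violates the policy fails during &lt;code&gt;terraform apply&lt;/code&gt;, when the GKE API rejects the create request. Org Policy is evaluated at API call time, so &lt;code&gt;terraform plan&lt;/code&gt; alone will not surface the violation, but no non-compliant cluster is ever provisioned.&lt;/p&gt;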

&lt;h2&gt;
  
  
  The SCC Integration Most Teams Miss
&lt;/h2&gt;

&lt;p&gt;The GKE Security Posture dashboard now integrates directly with Security Command Center. This surfaces top threats per workload in a unified view: OS vulnerabilities from base images, workload misconfigurations, and secrets accidentally exposed in pods.&lt;/p&gt;

&lt;p&gt;I've seen teams enable SCC and enable GKE Security Posture as separate activities, never realizing they feed the same dashboard. The integration matters because it gives you one place to track both infrastructure-level compliance (Org Policies) and workload-level threats (runtime security findings).&lt;/p&gt;

&lt;p&gt;If you're using SCC Enterprise tier, those findings also feed into Chronicle for SIEM correlation. That's useful for threat detection, but for most SaaS companies, the Standard tier's vulnerability findings are the priority for SOC 2.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Trade-offs Are Real
&lt;/h2&gt;

&lt;p&gt;Org Policies are blunt instruments. They enforce at the organization or folder level, which means a policy that makes sense for production clusters might block legitimate experimentation in development environments.&lt;/p&gt;

&lt;p&gt;The solution is folder-level scoping. Put production projects under a folder with strict enforcement. Put sandbox and development projects under a different folder with relaxed policies. This gives you preventive controls where they matter without blocking engineers from learning.&lt;/p&gt;
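
&lt;p&gt;A minimal sketch of that folder split in Terraform follows. It assumes a custom constraint named &lt;code&gt;custom.gkeRequireWorkloadIdentity&lt;/code&gt; already exists at the organization level; the constraint name and both folder IDs are hypothetical placeholders.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;# Hypothetical folder IDs and constraint name, for illustration only.
locals {
  constraint = "custom.gkeRequireWorkloadIdentity"

  # enforce = true applies the constraint, false relaxes it for that folder.
  folders = {
    production = { id = "111111111111", enforce = true }
    sandbox    = { id = "222222222222", enforce = false }
  }
}

# One policy per folder, each with its own enforcement setting.
resource "google_org_policy_policy" "workload_identity_by_folder" {
  for_each = local.folders

  name   = "folders/${each.value.id}/policies/${local.constraint}"
  parent = "folders/${each.value.id}"

  spec {
    rules {
      enforce = each.value.enforce ? "TRUE" : "FALSE"
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Keeping both folders in one map makes the difference between production and sandbox enforcement visible in a single code review.&lt;/p&gt;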

&lt;p&gt;&lt;strong&gt;SCC tier matters more than most teams realize.&lt;/strong&gt; Standard tier gives you vulnerability findings: container image CVEs, misconfigurations, exposed secrets. The paid tiers (Premium and Enterprise) add threat detection: suspicious network activity, potential lateral movement, compromise indicators. Most SaaS companies preparing for SOC 2 can start with Standard tier and upgrade when threat intelligence becomes a priority.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Retroactive enforcement is painful.&lt;/strong&gt; If you have existing GKE clusters that violate the Org Policies you want to enforce, enabling those policies doesn't fix the existing clusters — it just blocks new non-compliant ones. You need a remediation plan for existing infrastructure, and that means scheduled maintenance windows.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Auditors Actually Ask For
&lt;/h2&gt;

&lt;p&gt;SOC 2 auditors increasingly ask for evidence of preventive controls, not just detective ones. This is the shift that catches teams off guard.&lt;/p&gt;

&lt;p&gt;"We monitor for this" is no longer sufficient when the follow-up question is "What stops an engineer from bypassing that monitoring?"&lt;/p&gt;

&lt;p&gt;Enforcing CIS Benchmark controls through Org Policies is the difference between those two answers. It's the evidence that your security posture is structural, not procedural. When an auditor asks how you ensure all GKE clusters use Workload Identity, you show them the constraint that blocks any cluster creation without it.&lt;/p&gt;

&lt;p&gt;If you're heading into an audit with GKE clusters, this is the first place to look. Not because the technical implementation is complex — it isn't — but because it changes the nature of your control evidence from "we detect and respond" to "we prevent."&lt;/p&gt;

&lt;p&gt;That distinction matters more than most teams realize until they're sitting across from an auditor explaining why a finding from six months ago is still open.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;About the Author&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Amit Malhotra is Principal GCP Architect at Buoyant Cloud Inc, where he helps B2B SaaS companies design audit-ready GKE platforms and implement the SCALE framework for cloud infrastructure.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://buoyantcloudtech.com/gcp-consulting-services-canada/" rel="noopener noreferrer"&gt;Work with a GCP specialist — book a free discovery call&lt;/a&gt;&lt;/p&gt;





</description>
      <category>gke</category>
      <category>kubernetessecurity</category>
      <category>soc2compliance</category>
      <category>cloudsecurity</category>
    </item>
    <item>
      <title>Why Cloud Platforms Fail: It's the Sequence, Not the Tools</title>
      <dc:creator>Amit Malhotra</dc:creator>
      <pubDate>Wed, 04 Mar 2026 00:40:26 +0000</pubDate>
      <link>https://dev.to/buoyantcloudinc/why-cloud-platforms-fail-its-the-sequence-not-the-tools-5f5f</link>
      <guid>https://dev.to/buoyantcloudinc/why-cloud-platforms-fail-its-the-sequence-not-the-tools-5f5f</guid>
      <description>&lt;h1&gt;
  
  
  Why Your Cloud Platform Keeps Breaking — It's Not the Tools, It's the Sequence
&lt;/h1&gt;

&lt;p&gt;Most cloud failures I get called in to fix weren't caused by picking the wrong technology. They were caused by doing the right things in the wrong order.&lt;/p&gt;

&lt;p&gt;I've spent years designing and securing GCP infrastructure for enterprises like RBC, Tangerine Bank, Telus Health, and Loblaws — plus dozens of high-growth B2B SaaS companies where security and speed both had to work at the same time. The pattern is consistent: platforms don't collapse because someone chose Cloud Run over GKE, or PostgreSQL over Spanner. They collapse because security got bolted on after architecture decisions were already locked in. Because infrastructure got provisioned manually "just this once" and stayed that way. Because scaling was something to figure out later.&lt;/p&gt;

&lt;p&gt;Later always arrives faster than you expect.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Five Problems Nobody Talks About Until the Audit
&lt;/h2&gt;

&lt;p&gt;Here's what I keep seeing across engagements, regardless of company size or industry:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Operational complexity growing faster than the team.&lt;/strong&gt; Every new service adds overhead nobody planned for. Your platform team that comfortably managed three services is now drowning in twelve, and the cognitive load has become unsustainable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Environment drift.&lt;/strong&gt; Dev, Staging, and Production are configured differently in ways nobody fully documented. Bugs appear only in prod. Deployments that passed every test still fail in ways that take days to diagnose.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Security gaps discovered during audits — not designed out from the start.&lt;/strong&gt; SOC 2 Type II with Drata sounds straightforward until the auditor asks why your service accounts have project-wide editor permissions, or why your VPC has no network segmentation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scaling surprises.&lt;/strong&gt; Platforms that work perfectly at 10,000 users fall over at 100,000. The architecture wasn't wrong — it just wasn't designed for what came next.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cloud costs nobody can fully explain.&lt;/strong&gt; Spend growing 40% quarter over quarter, but revenue only growing 15%. Finance asks what's driving it. The honest answer is "we're not sure."&lt;/p&gt;

&lt;p&gt;These aren't tool problems. They're sequence problems.&lt;/p&gt;

&lt;h2&gt;
  
  
  My Take: Order of Operations Matters More Than Technology Choice
&lt;/h2&gt;

&lt;p&gt;In my experience, the difference between platforms that scale gracefully and platforms that require emergency redesigns every 18 months comes down to one thing: whether the foundational decisions were made in the right order.&lt;/p&gt;

&lt;p&gt;This is why I built the SCALE Framework — not as a product, but as a diagnostic lens and design sequence that addresses the five failure modes I kept encountering.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Security by Design (S)&lt;/strong&gt; has to come first. Not because security is more important than everything else, but because security decisions made late are always weaker than security decisions made early. By the time most teams think seriously about IAM models, network segmentation, or workload identity, the architecture has already made those decisions for them — usually badly. Retrofitting Zero-Trust into a platform built on convenience-first permissions is painful, expensive, and never quite complete.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cloud-Native Architecture (C)&lt;/strong&gt; comes next, because how you structure workloads determines what's even possible for automation, scaling, and cost management later. Cloud-native doesn't mean "runs on cloud." It means the platform takes advantage of how cloud infrastructure actually works — managed services over self-managed VMs, containers over monoliths, regional redundancy over single points of failure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Automation and Infrastructure as Code (A)&lt;/strong&gt; locks in consistency. I've seen teams with brilliant architecture and solid security still suffer from environment drift because infrastructure was provisioned manually. If you can't reproduce your environment from code, you don't have infrastructure; you have a snowflake. The fix is Terraform-driven provisioning, where every environment is deployed from the same codebase, reviewed like code, and rolled out without manual steps.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lifecycle Operations (L)&lt;/strong&gt;, what most people call DevSecOps, treats deployment as part of development, not something that happens after. Security checks, automated testing, and policy validation run in the pipeline before anything reaches production. This is where the earlier decisions pay off: secure-by-design architecture, consistent infrastructure, and automated deployments combine into a release process that's routine instead of risky.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Elastic Scalability and Efficiency (E)&lt;/strong&gt; closes the loop. Scalability isn't just a technical requirement; it's a financial one. A platform that scales technically but doubles your cloud bill every time you grow 20% isn't a success. This pillar covers dynamic scaling, right-sized resources, and cost visibility tooling, so the team always knows what they're spending and why.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why All Five Pillars Matter Together
&lt;/h2&gt;

&lt;p&gt;Each pillar reinforces the others. A weakness in any one creates pressure on the rest.&lt;/p&gt;

&lt;p&gt;Security gaps usually trace back to missing automation — manual processes create inconsistency, inconsistency creates gaps. Cost problems usually trace back to missing scalability design — resources provisioned for peak load running 24/7 because nobody built the auto-scaling logic. Operational overhead usually traces back to missing cloud-native architecture — teams managing infrastructure that should be managed services.&lt;/p&gt;

&lt;p&gt;This is why I address all five in every engagement, even when a client comes to me with just one problem. The presenting issue is rarely the root cause.&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Looks Like in Practice
&lt;/h2&gt;

&lt;p&gt;A client came to me last year after a failed SOC 2 audit. The immediate problem was IAM — service accounts with too-broad permissions, no workload identity, manual access management. But the real problem was that their infrastructure had grown organically without automation. Every fix required touching multiple environments by hand. Implementing least-privilege IAM without infrastructure as code would have taken months and created more drift.&lt;/p&gt;

&lt;p&gt;We started with security architecture, yes — but we implemented it through Terraform modules that standardized IAM across all environments simultaneously. Security improvement and automation improvement happened together, because they had to.&lt;/p&gt;
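
&lt;p&gt;To make the idea concrete, here is a rough sketch of how one Terraform definition can apply the same least-privilege bindings to every environment. It is not the client's actual module: the project IDs, the service account naming, and the role list are all hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;# Hypothetical environment-to-project mapping, for illustration only.
variable "environments" {
  type = map(string)
  default = {
    dev     = "example-dev-project"
    staging = "example-staging-project"
    prod    = "example-prod-project"
  }
}

locals {
  # Narrow, explicit roles instead of roles/editor (example roles only).
  app_roles = ["roles/cloudsql.client", "roles/secretmanager.secretAccessor"]

  # One entry per environment and role pair, keyed like "dev/roles/cloudsql.client".
  bindings = {
    for pair in setproduct(keys(var.environments), local.app_roles) :
    "${pair[0]}/${pair[1]}" =&gt; { env = pair[0], role = pair[1] }
  }
}

# The same runtime service account is created in every environment.
resource "google_service_account" "app" {
  for_each     = var.environments
  project      = each.value
  account_id   = "app-${each.key}"
  display_name = "Application runtime (${each.key})"
}

# Identical role bindings everywhere, so environments cannot drift apart.
resource "google_project_iam_member" "app" {
  for_each = local.bindings
  project  = var.environments[each.value.env]
  role     = each.value.role
  member   = "serviceAccount:${google_service_account.app[each.value.env].email}"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Changing the role list in one place updates dev, staging, and prod in the same plan, which is what keeps a least-privilege rollout from creating new drift.&lt;/p&gt;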

&lt;p&gt;Six months later, they passed SOC 2 Type II with no findings. More importantly, their deployment frequency increased 3x because the team trusted the pipeline. That's what the right sequence produces.&lt;/p&gt;

&lt;h2&gt;
  
  
  Trade-Offs and Honest Limitations
&lt;/h2&gt;

&lt;p&gt;SCALE is opinionated. It assumes Terraform, GCP managed services, and automated CI/CD. Teams wedded to manual processes or different tooling will find friction.&lt;/p&gt;

&lt;p&gt;Addressing all five pillars takes time. Clients who want speed over structure sometimes push back on the full approach. I've learned to sequence pragmatically (you don't have to solve everything at once), but shortcuts in Security or Automation always surface later, usually during an audit or outage.&lt;/p&gt;

&lt;p&gt;It's a framework, not a product. Every engagement requires real architectural thinking. SCALE provides the lens, not the answer.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where to Start
&lt;/h2&gt;

&lt;p&gt;The best starting point is a 30-minute architecture review. We look at where you are against the five pillars, identify the most urgent gaps, and map out what a SCALE-driven platform looks like for your specific situation.&lt;/p&gt;

&lt;p&gt;If you're mid-growth, mid-audit, or mid-crisis — the sequence matters now.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://buoyantcloudtech.com/gcp-consulting-services-canada/" rel="noopener noreferrer"&gt;Work with a GCP specialist — book a free discovery call&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Amit Malhotra, Principal GCP Architect, Buoyant Cloud Inc&lt;/em&gt;&lt;/p&gt;





</description>
      <category>cloudinfrastructure</category>
      <category>devops</category>
      <category>cloudsecurity</category>
      <category>platformengineering</category>
    </item>
    <item>
      <title>What a Fractional GCP Architect Actually Does (And When You Need One)</title>
      <dc:creator>Amit Malhotra</dc:creator>
      <pubDate>Tue, 03 Mar 2026 02:46:30 +0000</pubDate>
      <link>https://dev.to/buoyantcloudinc/what-a-fractional-gcp-architect-actually-does-and-when-you-need-one-1pjj</link>
      <guid>https://dev.to/buoyantcloudinc/what-a-fractional-gcp-architect-actually-does-and-when-you-need-one-1pjj</guid>
      <description>&lt;p&gt;There's a hiring pattern I see a lot in B2B SaaS companies at the 50–200 employee stage: engineering is moving fast, the cloud infrastructure is getting complex, but it's not quite time to hire a full-time Principal Cloud Architect.&lt;/p&gt;

&lt;p&gt;That's exactly where fractional cloud architecture fits.&lt;/p&gt;

&lt;p&gt;I'm Amit Malhotra, a Principal GCP Architect and founder of Buoyant Cloud. I embed with SaaS engineering teams as a fractional GCP architect, bringing senior-level cloud expertise scoped to what the team actually needs, when they need it.&lt;/p&gt;

&lt;h2&gt;
  
  
  What fractional actually means in practice
&lt;/h2&gt;

&lt;p&gt;Fractional isn't a watered-down engagement. It's a scoped one.&lt;/p&gt;

&lt;p&gt;You get a senior architect who has designed and shipped production GCP infrastructure across industries, not someone learning on your dime. The difference is the model: instead of a full-time hire or a 12-month consulting contract, you get focused, high-impact work tied to your current priorities.&lt;/p&gt;

&lt;p&gt;That might look like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Designing and building a GCP landing zone from scratch&lt;/li&gt;
&lt;li&gt;Hardening a GKE cluster ahead of a SOC 2 audit&lt;/li&gt;
&lt;li&gt;Replacing service account keys with Workload Identity Federation across your CI/CD pipeline&lt;/li&gt;
&lt;li&gt;Investigating and resolving a sudden spike in cloud costs&lt;/li&gt;
&lt;li&gt;Setting up a DevSecOps pipeline that your team can actually own and maintain&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After the core engagement, many clients keep me on retainer — PR reviews, architecture advisory, and course-correcting before small decisions become expensive problems.&lt;/p&gt;

&lt;h2&gt;
  
  
  The stack I bring to every engagement
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;IaC:&lt;/strong&gt; Terraform with a modular structure, with separate modules for networking, GKE, IAM, and services. Terragrunt for DRY configuration across environments.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Identity &amp;amp; Access:&lt;/strong&gt; Workload Identity Federation over service account keys, always. WIF eliminates an entire class of credential exposure risk at the CI/CD layer.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Networking:&lt;/strong&gt; Shared VPC with host/service project separation. VPC Service Controls for clients touching regulated data or going through SOC 2. Private Google Access on by default.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Secrets:&lt;/strong&gt; Secret Manager with versioning and automatic rotation where possible, injected at runtime via WIF, never baked into images or plaintext environment variables.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;CI/CD:&lt;/strong&gt; Cloud Build or GitHub Actions depending on where the team already lives. Artifact Registry for container images. Binary Authorization for production workloads.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Observability:&lt;/strong&gt; Cloud Monitoring and Cloud Logging as the baseline. Prometheus + Grafana on GKE where teams want more control. OpenTelemetry for instrumentation.&lt;/li&gt;
&lt;/ul&gt;
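
&lt;p&gt;As an illustration of the keyless CI/CD setup described above, the Terraform below sketches Workload Identity Federation for GitHub Actions. It is a minimal sketch, not a production module: the pool and provider IDs, the repository &lt;code&gt;example-org/example-repo&lt;/code&gt;, and the &lt;code&gt;ci-deployer&lt;/code&gt; service account are hypothetical names.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;# Hypothetical project and repository, for illustration only.
variable "project_id" {
  type = string
}

variable "github_repository" {
  type    = string
  default = "example-org/example-repo"
}

resource "google_iam_workload_identity_pool" "github" {
  project                   = var.project_id
  workload_identity_pool_id = "github-actions"
}

resource "google_iam_workload_identity_pool_provider" "github" {
  project                            = var.project_id
  workload_identity_pool_id          = google_iam_workload_identity_pool.github.workload_identity_pool_id
  workload_identity_pool_provider_id = "github-oidc"

  # Map claims from the GitHub OIDC token to Google attributes.
  attribute_mapping = {
    "google.subject"       = "assertion.sub"
    "attribute.repository" = "assertion.repository"
  }

  # Reject tokens from any repository other than the expected one.
  attribute_condition = "assertion.repository == \"${var.github_repository}\""

  oidc {
    issuer_uri = "https://token.actions.githubusercontent.com"
  }
}

resource "google_service_account" "deployer" {
  project    = var.project_id
  account_id = "ci-deployer"
}

# Let workflows from that repository impersonate the deployer account.
resource "google_service_account_iam_member" "github_impersonation" {
  service_account_id = google_service_account.deployer.name
  role               = "roles/iam.workloadIdentityUser"
  member             = "principalSet://iam.googleapis.com/${google_iam_workload_identity_pool.github.name}/attribute.repository/${var.github_repository}"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With this in place the pipeline exchanges a short-lived OIDC token for Google credentials, so there is no long-lived service account key to store, rotate, or leak.&lt;/p&gt;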

&lt;h2&gt;
  
  
  How I structure every engagement: the SCALE Framework
&lt;/h2&gt;

&lt;p&gt;Every engagement I run is grounded in the SCALE Framework:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;S&lt;/strong&gt;: Security by Design (not bolted on post-launch)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;C&lt;/strong&gt;: Cloud-Native architecture (managed services over self-managed where it makes sense)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;A&lt;/strong&gt;: Automation &amp;amp; IaC (everything in code, nothing clicked in console)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;L&lt;/strong&gt;: Lifecycle Ops (Day 2 operations planned from Day 0)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;E&lt;/strong&gt;: Elastic Scalability (design for growth without redesign)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It's the pattern I kept seeing in every successful infrastructure build versus every one that needed to be rebuilt six months later.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'm writing about here
&lt;/h2&gt;

&lt;p&gt;Practical, implementation-level content from real engagements:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GKE hardening walkthroughs&lt;/li&gt;
&lt;li&gt;Terraform patterns for GCP landing zones&lt;/li&gt;
&lt;li&gt;Workload Identity Federation end-to-end&lt;/li&gt;
&lt;li&gt;VPC Service Controls deep dives&lt;/li&gt;
&lt;li&gt;Kong API Gateway on GKE, with real configs&lt;/li&gt;
&lt;li&gt;SOC 2 on GCP: mapping controls to actual infrastructure&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Long-form guides live on buoyantcloudtech.com/blog.&lt;/p&gt;

&lt;h2&gt;
  
  
  Let's connect
&lt;/h2&gt;

&lt;p&gt;If you're a CTO or engineering lead evaluating whether fractional cloud architecture is the right fit for your stage, or you're dealing with a specific GCP challenge right now, I'd love to talk. Reach me at buoyantcloudtech.com or connect on LinkedIn.&lt;/p&gt;

</description>
      <category>gcp</category>
      <category>devsecops</category>
      <category>cloudpractitioner</category>
      <category>startup</category>
    </item>
  </channel>
</rss>
