DEV Community

Lucas

Building a SOC2-Compliant Azure Multi-Subscription Architecture with Terraform

A deep dive into implementing Microsoft's Cloud Adoption Framework Landing Zones with Terraform, and why Azure's approach to workload isolation requires a fundamentally different mindset than AWS.


The Challenge: "Just Make It Work Like AWS"

When our team at Foo was tasked with building a SOC2-compliant infrastructure on Azure, the initial instinct was simple: "Let's just replicate our AWS multi-account strategy." After all, we already had a battle-tested AWS organization with:

  • 9+ AWS accounts (dev, staging, prod, security-tooling, log-archive, shared-services, etc.)
  • Service Control Policies (SCPs) for guardrails
  • IAM Identity Center (SSO) for centralized access
  • AWS Organizations for hierarchy

Spoiler alert: This approach would have been a costly mistake on Azure.

Here's why, and what we built instead.


🚨 The Azure Subscription Limit Reality Check

Unlike AWS, where you can spin up accounts almost endlessly (the default quota of ~10 is easily raised to hundreds), Azure limits how many subscriptions you can create, and raising that limit requires engaging Microsoft support:

| Cloud Provider | Isolation Unit | Typical Count | Limit Increase |
|---|---|---|---|
| AWS | Account | 10-100+ | Easy (support ticket) |
| Azure | Subscription | 3-10 | Requires Microsoft engagement |

This isn't just a technical limitation; it's a fundamental architectural constraint that shapes how you design your Azure landing zone.

💡 Key Insight: Azure's answer to "we need more isolation" isn't "create more subscriptions." It's Resource Groups with RBAC.


The Architecture: Microsoft CAF Landing Zone Pattern

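After evaluating several approaches, we landed on Microsoft's Cloud Adoption Framework (CAF) Landing Zone architecture. Here's our 4-tier Management Group hierarchy, with each subscription attached to its group:

foo (root)
├── Platform
│   ├── Connectivity      # Connectivity Sub
│   └── Management        # Management Sub
├── LandingZones
│   └── Corp
│       ├── Prod          # Corp-Production Sub
│       └── NonProd       # Corp-NonProduction Sub
└── Sandbox               # Sandbox Sub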

5 Subscriptions vs 9+ AWS Accounts

Here's how we consolidated:

| AWS Account | Azure Equivalent | Strategy |
|---|---|---|
| Dev Account | Corp-NonProduction Sub | Use Resource Groups: rg-foo-preview-* |
| Staging Account | Corp-NonProduction Sub | Use Resource Groups: rg-foo-staging-* |
| Prod Account | Corp-Production Sub | Dedicated subscription |
| Security-Tooling | Management Sub | Consolidated with monitoring |
| Log-Archive | Management Sub | Storage accounts with lifecycle policies |
| Shared-Services | Connectivity Sub | Hub networking |
| Network Account | Connectivity Sub | Azure Firewall, VPN, DNS |
| Monitoring | Management Sub | Log Analytics, Azure Monitor |
| Sandbox | Sandbox Sub | Developer experimentation |

Result: 5 subscriptions doing the work of 9 AWS accounts.


Show Me The Code: Terraform Implementation

Management Group Hierarchy

Here's how we define the 4-tier CAF hierarchy in Terraform:

# management_groups.tf
# Azure Management Groups - Microsoft CAF Landing Zone Architecture

# Tier 1: Root Management Group
resource "azurerm_management_group" "foo" {
  display_name = var.organization_name
  name         = var.organization_name

  timeouts {
    create = "30m"  # MGs can take a while!
    delete = "30m"
  }
}

# Tier 2: Category Management Groups (Platform, Landing Zones, Sandbox)
resource "azurerm_management_group" "tier2_groups" {
  for_each = {
    for k, v in var.management_groups : k => v
    if v.parent_id == "foo"
  }

  display_name               = each.key
  name                       = each.key
  parent_management_group_id = azurerm_management_group.foo.id

  depends_on = [azurerm_management_group.foo]
}

# Tier 3: Specialization (Connectivity, Management, Corp)
resource "azurerm_management_group" "tier3_groups" {
  for_each = {
    for k, v in var.management_groups : k => v
    if v.parent_id == "Platform" || v.parent_id == "LandingZones"
  }

  display_name = each.key
  name         = each.key
  parent_management_group_id = lookup(
    { for k, v in azurerm_management_group.tier2_groups : k => v.id },
    each.value.parent_id,
    azurerm_management_group.foo.id
  )
}

# Tier 4: Environment (Prod, Non-Prod)
resource "azurerm_management_group" "tier4_groups" {
  for_each = {
    for k, v in var.management_groups : k => v
    if v.parent_id == "Corp"
  }

  display_name = each.key
  name         = each.key
  parent_management_group_id = azurerm_management_group.tier3_groups["Corp"].id
}
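The tier resources above all filter `var.management_groups` on `parent_id`, but the variable itself isn't part of the excerpt. Here is a minimal sketch of a declaration whose default would produce the hierarchy, assuming each entry needs nothing beyond `parent_id` (that object shape is our inference from the `for` filters, not the author's definition):

```hcl
# Hypothetical declaration: each key is a management group name, and
# parent_id names the group it nests under. The root is "foo", matching
# the literal used in the tier2_groups filter.
variable "management_groups" {
  type = map(object({
    parent_id = string
  }))

  default = {
    # Tier 2: selected by tier2_groups (parent_id == "foo")
    Platform     = { parent_id = "foo" }
    LandingZones = { parent_id = "foo" }
    Sandbox      = { parent_id = "foo" }

    # Tier 3: selected by tier3_groups (parent is Platform or LandingZones)
    Connectivity = { parent_id = "Platform" }
    Management   = { parent_id = "Platform" }
    Corp         = { parent_id = "LandingZones" }

    # Tier 4: selected by tier4_groups (parent_id == "Corp")
    Prod    = { parent_id = "Corp" }
    NonProd = { parent_id = "Corp" }
  }
}
```

Because every tier filters the same map, adding a group is usually a one-line change. The trade-off is that the filters hard-code parent names, so introducing a new tier-3 parent would also mean extending the `tier3_groups` condition.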

Subscription to Management Group Association

The key to this architecture is associating existing subscriptions with the hierarchy:

# subscriptions.tf
locals {
  subscription_to_mg_mapping = {
    "management"   = "Management"
    "corp-nonprod" = "NonProd"
    "corp-prod"    = "Prod"
    "connectivity" = "Connectivity"
    "sandbox"      = "Sandbox"
  }
}

resource "azurerm_management_group_subscription_association" "assignments" {
  for_each = local.subscription_to_mg_mapping

  management_group_id = local.all_management_group_ids[each.value]
  subscription_id     = "/subscriptions/${var.subscription_ids[each.key]}"
}
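`local.all_management_group_ids` and `var.subscription_ids` are referenced above but not defined in the excerpt. One way they could look, assuming the four management group resources from `management_groups.tf` live in the same module (both definitions are our sketch, not the author's code):

```hcl
# Hypothetical: subscription GUIDs keyed the same way as
# local.subscription_to_mg_mapping, supplied via tfvars.
variable "subscription_ids" {
  type        = map(string)
  description = "Bare subscription GUIDs keyed by role, e.g. corp-prod"
}

locals {
  # Merge every tier into one name => resource ID map so a subscription
  # can target a management group at any depth of the hierarchy.
  all_management_group_ids = merge(
    { (azurerm_management_group.foo.name) = azurerm_management_group.foo.id },
    { for k, mg in azurerm_management_group.tier2_groups : k => mg.id },
    { for k, mg in azurerm_management_group.tier3_groups : k => mg.id },
    { for k, mg in azurerm_management_group.tier4_groups : k => mg.id }
  )
}
```

Indexing this map with `[each.value]` (rather than calling `lookup` without a default) makes a typo in the mapping fail loudly at plan time.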

SOC2 Compliance: Azure Policy as Your Guardrails

For SOC2 Type II compliance, we implemented 7 custom Azure Policies that enforce security controls across the entire organization:

Legend: 🔴 Deny policies (blocking) | 🟢 Audit policies (non-blocking)

Policy Definition Example: Baseline Security

Here's a real policy that enforces SOC2 CC6.6 (Encryption) and CC7.2 (Monitoring):

# modules/policy/main.tf
resource "azurerm_policy_definition" "baseline_security" {
  name         = "foo-soc2-baseline-security"
  policy_type  = "Custom"
  mode         = "All"
  display_name = "SOC2 Baseline Security Controls"
  description  = "Baseline security - denies disabled activity log alerts, log retention under 90 days, and storage without infrastructure encryption"

  metadata = jsonencode({
    category = "Security"
    version  = "1.0.0"
    SOC2     = "CC6.6, CC7.2, CC8.1"
  })

  management_group_id = var.management_group_id

  policy_rule = file("${path.module}/policies/baseline_security.json")
}

And the policy rule itself (policies/baseline_security.json):

{
  "if": {
    "anyOf": [
      {
        "allOf": [
          { "field": "type", "equals": "Microsoft.Insights/activityLogAlerts" },
          { "field": "Microsoft.Insights/activityLogAlerts/enabled", "equals": "false" }
        ]
      },
      {
        "allOf": [
          { "field": "type", "equals": "Microsoft.OperationalInsights/workspaces" },
          { "field": "Microsoft.OperationalInsights/workspaces/retentionInDays", "less": 90 }
        ]
      },
      {
        "allOf": [
          { "field": "type", "equals": "Microsoft.Storage/storageAccounts" },
          { "field": "Microsoft.Storage/storageAccounts/encryption.requireInfrastructureEncryption", "notEquals": "true" }
        ]
      }
    ]
  },
  "then": {
    "effect": "deny"
  }
}

This policy denies any create or update request that would result in:

  • A disabled activity log alert
  • A Log Analytics workspace with less than 90 days of retention
  • A storage account without infrastructure encryption
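To see what the policy accepts, here is a sketch of a workspace and storage account that should pass all three conditions. The resource group, resource names, and location are placeholders we made up; the two settings that matter are `retention_in_days` and `infrastructure_encryption_enabled`:

```hcl
# Placeholder resource group for the central logging resources.
resource "azurerm_resource_group" "logging" {
  name     = "rg-foo-logging"
  location = "eastus"
}

# Passes the retention check: 90 is not less than 90.
resource "azurerm_log_analytics_workspace" "central" {
  name                = "log-foo-central"
  location            = azurerm_resource_group.logging.location
  resource_group_name = azurerm_resource_group.logging.name
  sku                 = "PerGB2018"
  retention_in_days   = 90
}

# Passes the infrastructure-encryption check. This setting can only be
# chosen when the account is created, so changing it later forces a
# replacement of the storage account.
resource "azurerm_storage_account" "logs" {
  name                              = "stfoologarchive"
  resource_group_name               = azurerm_resource_group.logging.name
  location                          = azurerm_resource_group.logging.location
  account_tier                      = "Standard"
  account_replication_type          = "GRS"
  infrastructure_encryption_enabled = true
}
```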

Policy Assignment at Management Group Level

resource "azurerm_management_group_policy_assignment" "baseline_security" {
  name                 = "soc2-baseline-security"
  display_name         = "SOC2 Baseline Security Controls"
  management_group_id  = var.management_group_id
  policy_definition_id = azurerm_policy_definition.baseline_security.id

  metadata = jsonencode({
    Organization = "Foo"
    ManagedBy    = "Terraform"
    Purpose      = "SOC2 Compliance Guardrails"
  })
}
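The legend distinguishes Deny from Audit policies. A common way to get both from one definition is to expose the effect as a policy parameter and choose it per assignment. A sketch, assuming a parameter named `effect` and a hypothetical `var.sandbox_management_group_id`; the rule is a trimmed-down retention check only:

```hcl
resource "azurerm_policy_definition" "log_retention" {
  name                = "foo-soc2-log-retention"
  policy_type         = "Custom"
  mode                = "All"
  display_name        = "SOC2 Log Analytics Retention (parameterized effect)"
  management_group_id = var.management_group_id

  # The effect is resolved per assignment instead of being hard-coded.
  parameters = jsonencode({
    effect = {
      type          = "String"
      allowedValues = ["Audit", "Deny", "Disabled"]
      defaultValue  = "Deny"
      metadata      = { displayName = "Effect" }
    }
  })

  policy_rule = jsonencode({
    "if" = {
      allOf = [
        { field = "type", equals = "Microsoft.OperationalInsights/workspaces" },
        { field = "Microsoft.OperationalInsights/workspaces/retentionInDays", less = 90 }
      ]
    }
    "then" = { effect = "[parameters('effect')]" }
  })
}

# Same definition, audit-only on the sandbox branch (hypothetical scope).
resource "azurerm_management_group_policy_assignment" "log_retention_sandbox" {
  name                 = "soc2-retention-audit"
  management_group_id  = var.sandbox_management_group_id
  policy_definition_id = azurerm_policy_definition.log_retention.id

  parameters = jsonencode({
    effect = { value = "Audit" }
  })
}
```

Azure Policy evaluates every assignment that applies to a scope, so an Audit assignment on Sandbox does not override a Deny assignment inherited from the root. To actually relax enforcement there, assign the Deny variant below the root (for example at Platform and LandingZones) or add a policy exemption for the sandbox scope.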

RBAC: The Azure AD Groups Strategy

Instead of AWS SSO Permission Sets, we use Azure AD Groups with RBAC role assignments:

# modules/iam/groups.tf
resource "azuread_group" "groups" {
  for_each = var.groups

  display_name     = each.value.display_name
  description      = each.value.description
  security_enabled = true

  prevent_duplicate_names = true
}

# modules/iam/role_assignments.tf
resource "azurerm_role_assignment" "group_assignments" {
  for_each = local.all_role_assignments

  scope                = each.value.scope
  role_definition_name = each.value.role_definition_name
  principal_id         = azuread_group.groups[each.value.group_key].object_id

  description = "Managed by Terraform - Foo IAM Module"
}
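`local.all_role_assignments` isn't shown either. Because `for_each` needs a flat map, one plausible shape (our assumption, not the author's) is a per-group list of scope/role pairs inside `var.groups`, flattened into one entry per assignment:

```hcl
# Hypothetical input shape: each group lists the scopes it gets roles on.
variable "groups" {
  type = map(object({
    display_name = string
    description  = string
    role_assignments = list(object({
      scope                = string # management group, subscription, or RG ID
      role_definition_name = string # e.g. "Reader", "Contributor"
    }))
  }))
}

locals {
  # One map entry per (group, scope, role) so for_each gets stable keys.
  all_role_assignments = {
    for ra in flatten([
      for group_key, group in var.groups : [
        for a in group.role_assignments : {
          key                  = "${group_key}|${a.scope}|${a.role_definition_name}"
          group_key            = group_key
          scope                = a.scope
          role_definition_name = a.role_definition_name
        }
      ]
    ]) : ra.key => ra
  }
}
```

Keys built from group, scope, and role stay the same when entries are reordered, so Terraform won't destroy and recreate assignments the way it would with list-index keys.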

Our 7 Default Groups

| Group | Platform Subs | Workload Subs | Purpose |
|---|---|---|---|
| Platform-Team | Contributor | Reader | Platform engineers |
| Security-Team | Security Admin | Security Reader | Security engineers |
| BreakGlass-Admins | Owner (at root) | - | Emergency access |
| Finance-Team | Cost Mgmt Reader | Cost Mgmt Reader | Billing/finance |
| ReadOnly-Users | Reader | Reader | Auditors |
| Dev-Team | - | Contributor (nonprod) | Developers |
| DevOps-Team | Reader | Contributor | DevOps engineers |

The Resource Group Pattern: Your New Best Friend

Here's where Azure truly differs from AWS. Instead of creating separate subscriptions for staging vs preview, we use Resource Groups:

Corp-NonProduction Subscription
├── rg-foo-staging-api
├── rg-foo-staging-database
├── rg-foo-staging-network
├── rg-foo-preview-api
├── rg-foo-preview-database
└── rg-foo-preview-network

Each Resource Group gets its own RBAC assignments:

# Staging RGs: Only staging team
resource "azurerm_role_assignment" "staging_contributor" {
  scope                = azurerm_resource_group.staging_api.id
  role_definition_name = "Contributor"
  principal_id         = azuread_group.staging_team.object_id
}

# Preview RGs: All developers
resource "azurerm_role_assignment" "preview_contributor" {
  scope                = azurerm_resource_group.preview_api.id
  role_definition_name = "Contributor"
  principal_id         = azuread_group.dev_team.object_id
}
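Writing one block per RG and per team gets repetitive as components grow. A sketch that generates the six non-production RGs listed above with `setproduct` and picks the group by environment; `var.location` is a placeholder, and `azuread_group.staging_team` / `azuread_group.dev_team` are the groups referenced in the snippet above:

```hcl
locals {
  environments = ["staging", "preview"]
  components   = ["api", "database", "network"]

  # "staging-api" => { env = "staging", component = "api" }, and so on.
  nonprod_rgs = {
    for pair in setproduct(local.environments, local.components) :
    "${pair[0]}-${pair[1]}" => { env = pair[0], component = pair[1] }
  }
}

resource "azurerm_resource_group" "nonprod" {
  for_each = local.nonprod_rgs

  name     = "rg-foo-${each.key}"
  location = var.location
  tags     = { environment = each.value.env }
}

# Staging RGs go to the staging team; preview RGs go to all developers.
resource "azurerm_role_assignment" "nonprod_contributor" {
  for_each = local.nonprod_rgs

  scope                = azurerm_resource_group.nonprod[each.key].id
  role_definition_name = "Contributor"
  principal_id = (
    each.value.env == "staging"
    ? azuread_group.staging_team.object_id
    : azuread_group.dev_team.object_id
  )
}
```

Adding a component such as "cache" then creates both RGs and both role assignments from a single list change.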

SOC2 Control Mapping

Here's how our implementation maps to SOC2 Trust Services Criteria:

| SOC2 Control | Azure Implementation |
|---|---|
| CC6.1 - Logical Access | Azure AD groups, MFA via Conditional Access, PIM for JIT access |
| CC6.6 - Encryption | Azure Policy enforcing encryption at rest, TLS 1.2, Key Vault |
| CC7.2 - Monitoring | Log Analytics (90-day retention), Activity Logs, Azure Monitor |
| CC7.3 - Incident Response | Azure Sentinel (SIEM), Alert Rules, Action Groups |
| CC8.1 - Change Management | Activity Logs, Azure Policy audit, Git-based IaC |
| CC9.2 - Risk Mitigation | Region restrictions, Defender for Cloud, Budget alerts |
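The CC7.2 and CC8.1 rows both rely on Activity Logs reaching the central Log Analytics workspace. At subscription scope, that export is a diagnostic setting whose target is the subscription resource ID. A sketch, assuming a hypothetical `var.log_analytics_workspace_id` input and the `var.subscription_ids` map from `subscriptions.tf`; the category subset is our choice:

```hcl
# Export each subscription's Activity Log to the central workspace.
resource "azurerm_monitor_diagnostic_setting" "activity_log" {
  for_each = var.subscription_ids

  name                       = "export-activity-log"
  target_resource_id         = "/subscriptions/${each.value}"
  log_analytics_workspace_id = var.log_analytics_workspace_id

  # Activity Log categories covering changes, security events, and
  # policy evaluations (a subset; add others as evidence needs grow).
  enabled_log {
    category = "Administrative"
  }

  enabled_log {
    category = "Security"
  }

  enabled_log {
    category = "Policy"
  }
}
```

The `enabled_log` block is the form used by recent versions of the azurerm provider; older versions used a `log` block instead.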

Cost Impact: The Numbers

Running our 5-subscription architecture vs a 9-account AWS-style approach:

| Metric | Azure CAF (5 subs) | AWS-style (9+ subs) | Savings |
|---|---|---|---|
| Management overhead | Low | High | ~50% less ops time |
| Subscription costs | $0 | $0 | Same |
| Log Analytics | 1 central workspace | 9 workspaces | ~60% cost reduction |
| Policy management | 1 root assignment | 9 assignments | Simpler |

Lessons Learned

1. Management Groups Take Time

Azure Management Groups can take 10-15 minutes to create or delete. Set your Terraform timeouts accordingly:

timeouts {
  create = "30m"
  delete = "30m"
}

2. Subscription Association is the Key

Don't try to create subscriptions via Terraform in most cases. Instead, create them in the portal (or via billing APIs) and associate them:

resource "azurerm_management_group_subscription_association" "prod" {
  management_group_id = azurerm_management_group.tier4_groups["Prod"].id
  subscription_id     = "/subscriptions/${var.existing_subscription_id}"
}
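Because the association builds the resource ID as `/subscriptions/${...}`, passing a full resource ID instead of a bare GUID would produce a malformed path. A small guard on the input, plus a data source lookup that should fail the plan if the subscription isn't visible to the deployment identity (a sketch; the regex is ours):

```hcl
variable "existing_subscription_id" {
  type        = string
  description = "Bare subscription GUID for a subscription created outside Terraform"

  validation {
    condition = can(regex(
      "^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$",
      var.existing_subscription_id
    ))
    error_message = "Pass the bare subscription GUID, not /subscriptions/<id>."
  }
}

# Reads the subscription during plan; errors if it doesn't exist or the
# deployment identity can't see it.
data "azurerm_subscription" "existing" {
  subscription_id = var.existing_subscription_id
}
```

The association could then use `data.azurerm_subscription.existing.id`, which is already in `/subscriptions/<guid>` form.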

3. Azure Policy ≠ AWS SCPs

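While conceptually similar, the two work differently: SCPs cap which API actions principals in an account may call, whereas Azure Policy evaluates the properties of resources (on create and update, and in periodic compliance scans). That makes Azure Policy more granular but also more complex. Use the built-in policies where possible, and only create custom ones for specific compliance needs.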
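As an example of leaning on built-ins, region restriction (listed under CC9.2) can use the built-in "Allowed locations" policy, looked up by display name and assigned at the root. A sketch; the region list is an example, and `listOfAllowedLocations` is the parameter the built-in definition exposes:

```hcl
# Look up the built-in definition instead of maintaining custom JSON.
data "azurerm_policy_definition" "allowed_locations" {
  display_name = "Allowed locations"
}

resource "azurerm_management_group_policy_assignment" "allowed_locations" {
  name                 = "soc2-allowed-locations"
  display_name         = "SOC2 Allowed Locations"
  management_group_id  = azurerm_management_group.foo.id
  policy_definition_id = data.azurerm_policy_definition.allowed_locations.id

  # Example regions only; adjust to your data-residency requirements.
  parameters = jsonencode({
    listOfAllowedLocations = { value = ["eastus", "eastus2"] }
  })
}
```

Microsoft maintains the built-in rule, so the only thing to review in Git is the parameter list.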

4. Resource Groups Are First-Class Citizens

Unlike AWS, where you might create a new account for isolation, in Azure you create a new Resource Group. This is a fundamental mindset shift.


Module Structure

Our final Terraform module structure:

infra/azure/modules/organization/
├── main.tf
├── management_groups.tf       # 4-tier CAF hierarchy
├── subscriptions.tf           # Subscription associations
├── modules/
│   ├── policy/                # 7 SOC2 policy definitions
│   │   ├── policies/
│   │   │   ├── baseline_security.json
│   │   │   ├── region_restriction.json
│   │   │   ├── require_encryption.json
│   │   │   └── ...
│   │   └── main.tf
│   ├── iam/                   # Azure AD groups + RBAC
│   │   ├── groups.tf
│   │   └── role_assignments.tf
│   ├── monitoring/            # Log Analytics, alerts
│   ├── audit/                 # Activity log export
│   ├── budget/                # Cost management
│   └── delegation/            # Defender for Cloud
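The contents of `budget/` aren't shown. For the "Budget alerts" item under CC9.2, a subscription-level budget could be sketched as follows; the amount, start date, target subscription, and `var.finance_email` are placeholders we chose:

```hcl
resource "azurerm_consumption_budget_subscription" "monthly" {
  name            = "budget-foo-monthly"
  subscription_id = "/subscriptions/${var.subscription_ids["corp-prod"]}"

  amount     = 5000
  time_grain = "Monthly"

  time_period {
    # Budgets must start on the first day of a month.
    start_date = "2025-01-01T00:00:00Z"
  }

  # Email finance when actual spend crosses 80% of the budget.
  notification {
    enabled        = true
    threshold      = 80
    operator       = "GreaterThan"
    threshold_type = "Actual"
    contact_emails = [var.finance_email]
  }
}
```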

Conclusion

Building a SOC2-compliant Azure infrastructure isn't about replicating AWS patterns; it's about embracing Azure's native paradigms:

  1. Fewer subscriptions, more Resource Groups
  2. Management Groups for policy inheritance
  3. Azure AD as your single identity plane
  4. Azure Policy as your compliance engine

The result? A simpler, more cost-effective, and equally secure infrastructure that works with Azure rather than against it.




Have you implemented Azure Landing Zones? I'd love to hear about your approach in the comments! 👇


This article is part of our series on building enterprise-grade cloud infrastructure. Follow for more deep dives into Terraform, cloud architecture, and compliance automation.


Tags: #azure #terraform #devops #cloud #soc2 #compliance #infrastructure
