DEV Community

Oskar Mamrzynski

Automating high-privilege operations in Azure

Motivation

In a lot of traditional companies the IT department holds the keys to Azure Active Directory. Often these teams have little interest in making things easier for developers or in following a DevOps mentality of automation. For example, you log a ticket in your ITSM of choice and 3 days later someone manually creates an app registration for you. With luck, it may even be configured correctly.

Understandably, many of these operations require high privileges in AAD and you don't want to be giving these out left and right to developers to solve their own problems.

Examples of high-privilege operations we automated

  1. Creating AAD groups and managing memberships

    • We have a group-centric RBAC model in Azure. Using team-based and role-based groups we give out granular permissions to limited scopes.
    • Onboarding and offboarding users - managing group memberships helps when people join or leave teams, or need a subset of a team's permissions to collaborate.
    • Onboarding new product teams - We want to quickly spin up new Azure subscriptions and associated groups, roles etc.
    • Consistent structure - each team receives the same set of basic permissions, and any deviations are reviewed and approved.
  2. Creating service principals and app registrations

    • Often you will need a service principal to give a team access to Azure resources for automation. You may need to store its Client ID and Secret in a Key Vault or ADO service connection for them to use.
    • 3rd party software like Grafana Cloud, Elastic Cloud, GitHub etc. may need service principals configured for SAML / SSO. You can manage access to these by assigning roles in the Enterprise App.
    • Developers will want to secure their own APIs with App registrations, define roles, reply URLs and extra permissions.
  3. Creating role definitions and assignments

    • Custom role definitions require high privileges to create, but are essential for least-privilege systems.
    • Creating consistent permission structure across landing zone subscriptions requires creating role assignments for AAD groups.
    • You may need to create management-group level role assignments too.
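To illustrate the third category, a custom role definition in Terraform might look like the sketch below. This is only an example: the role name, permissions and management group ID are hypothetical placeholders, not something from our actual setup.

```hcl
# Hypothetical custom role: read everything, plus restart web apps.
# <root_mg_id> is a placeholder for your management group ID.
resource "azurerm_role_definition" "support_engineer" {
  name        = "Support Engineer (custom)"
  scope       = "/providers/Microsoft.Management/managementGroups/<root_mg_id>"
  description = "Read-only access plus the ability to restart web apps."

  permissions {
    actions = [
      "*/read",
      "Microsoft.Web/sites/restart/action",
    ]
    not_actions = []
  }

  # Allow the role to be assigned anywhere under the management group.
  assignable_scopes = [
    "/providers/Microsoft.Management/managementGroups/<root_mg_id>",
  ]
}
```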

Solution overview

  • Instead of creating things manually, use Terraform to create things in AAD.
  • Put Terraform into a source control repository with branch policies. Disallow direct commits to main branch.
  • Use pull requests as means to propose, review and approve changes.
  • Only reviewed and approved changes can be automatically deployed.
  • Use secure high-privilege service principals to execute Terraform pipeline.
  • Anyone can submit changes via Pull Requests, including developers. IT / DevOps become reviewers and advisers instead of doers of the work.

Setting up Terraform repo

We use Azure DevOps (ADO) to host our git repositories. I recommend having a dedicated project for centralised IT functions like AAD, firewall, DNS etc. This project can have a small subset of people with rights. We decided to create all 3 categories from above in the same Terraform repository, because it's easier to do onboarding of new teams and Azure subscriptions.

(Screenshot: rbac repo)

Importantly, you want to set up branch policies on the repo to prevent people from submitting changes directly to main branch. Branch policies can also enforce use of Pull Requests, comment resolution, running of a validation pipeline and number of reviewer votes.
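If you want to codify the branch policies themselves, the microsoft/azuredevops Terraform provider can manage them too. A minimal sketch of the minimum-reviewers policy, assuming hypothetical `azuredevops_project` and `azuredevops_git_repository` resources named `rbac`:

```hcl
# Require 2 reviewer approvals on main; submitters cannot approve their own PRs.
resource "azuredevops_branch_policy_min_reviewers" "rbac_main" {
  project_id = azuredevops_project.rbac.id

  enabled  = true
  blocking = true

  settings {
    reviewer_count     = 2
    submitter_can_vote = false

    scope {
      repository_id  = azuredevops_git_repository.rbac.id
      repository_ref = "refs/heads/main"
      match_type     = "Exact"
    }
  }
}
```

Similar resources exist for comment resolution and build validation policies.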

(Screenshot: branch policies)

  • Comment resolution ensures that all discussion is completed with both submitter and reviewers happy. We also have an auto-generated Terraform plan summary comment from this blog post.

  • Build validation will run a Terraform plan pipeline to check that the IaC is valid and will post the plan to the PR as a comment. We always sanity check the plan against the proposed changes.

  • DevOps team is automatically added as reviewers. You can add multiple teams or different teams automatically depending on which paths in the repo are modified.

The actual Terraform file setup is quite easy. We use the azuread and azurerm providers.

providers.tf

terraform {
  backend "azurerm" {}
  required_providers {
    azuread = {
      source  = "hashicorp/azuread"
      version = "<3.0.0"
    }
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "<4.0.0"
    }
  }
}

provider "azuread" {
  use_msi   = true
  client_id = var.group_admin_client_id
  tenant_id = local.tenant_id
}

provider "azuread" {
  alias     = "service-principal-creator"
  use_msi   = true
  client_id = var.service_principal_creator_client_id
  tenant_id = local.tenant_id
}

provider "azurerm" {
  alias                      = "access-admin"
  subscription_id            = "<subscription_id>"
  use_msi                    = true
  client_id                  = var.access_admin_client_id
  tenant_id                  = local.tenant_id
  skip_provider_registration = true
  features {
    key_vault {
      recover_soft_deleted_key_vaults = false
    }
  }
}

data "azuread_client_config" "sp_creator" {
  provider = azuread.service-principal-creator
}

data "azuread_client_config" "group_admin" {}

dynamic-groups.tf (example)

resource "azuread_group" "az-all-technology-staff" {
  display_name     = "az-all-technology-staff"
  security_enabled = true
  owners           = [data.azuread_client_config.group_admin.object_id]
  types            = ["DynamicMembership"]

  dynamic_membership {
    enabled = true
    rule    = "user.department -eq \"Technology\" and (user.accountEnabled -eq true)"
  }
  lifecycle {
    ignore_changes = [members]
  }
}

service-principals.tf (module example)

module "policy_contributor" {
  source         = "../modules/service-principal"
  principal_name = "<conventions_prefix>-azure-policy-contributor"
  providers = {
    azuread = azuread.service-principal-creator
  }
}

service-principals.tf (inside the module)

data "azuread_client_config" "identity" {}

resource "azuread_application" "app" {
  display_name = var.principal_name
  owners       = [data.azuread_client_config.identity.object_id]
  lifecycle {
    ignore_changes = [required_resource_access]
  }
}

resource "azuread_service_principal" "sp" {
  application_id = azuread_application.app.application_id
  owners         = [data.azuread_client_config.identity.object_id]
}

resource "azuread_application_password" "app_client_secret" {
  application_object_id = azuread_application.app.object_id
}

resource "azurerm_key_vault_secret" "client_id" {
  name         = "${var.principal_name}-client-id"
  value        = azuread_application.app.application_id
  key_vault_id = var.key_vault_id
}

resource "azurerm_key_vault_secret" "client_secret" {
  name         = "${var.principal_name}-client-secret"
  value        = azuread_application_password.app_client_secret.value
  key_vault_id = var.key_vault_id
}

role-assignments.tf (example)

resource "azurerm_role_assignment" "az-sub-readers-prod" {
  provider             = azurerm.access-admin
  scope                = "/subscriptions/${var.prod_subscription_id}"
  role_definition_name = "Reader"
  principal_id         = module.az-sub-readers-prod.object_id
}

resource "azurerm_role_assignment" "az-sub-contributors-prod" {
  provider             = azurerm.access-admin
  scope                = "/subscriptions/${var.prod_subscription_id}"
  role_definition_name = "Contributor"
  principal_id         = module.az-sub-contributors-prod.object_id
}

We can put all of these together into a Terraform module to onboard whole sets of product teams with default permissions. The module example below creates AAD groups for readers, contributors, AKS readers, AKS admins, SQL admins, SQL readers, KV readers, KV admins and monitoring contributors, creates a service principal for team-abc to use in ADO pipelines, and assigns roles on the relevant Azure subscriptions.

portfolio-abc.tf (example)

module "team-abc" {
  source = "../modules/group"
  name   = "team-abc"
  active_members = [
    local.user_ids["user1@domain.com"],
    local.user_ids["user2@domain.com"],
  ]
}

module "abc-default-groups" {
  source                  = "../modules/default-portfolio-groups"
  portfolio_code          = "abc"
  devtest_subscription_id = local.sub_ids["abc-devtest"]
  prod_subscription_id    = local.sub_ids["abc-prod"]
  default_team_object_id  = module.team-abc.object_id
  devops_team_object_id   = module.team-devops.object_id
  dba_team_object_id      = module.team-db-admins.object_id
  kingmakers              = true
  providers = {
    azurerm.kv-admin                  = azurerm
    azurerm.access-admin              = azurerm.access-admin
    azuread.service-principal-creator = azuread.service-principal-creator
  }
}

Setting up automation identities

You may have noticed we use 3 provider blocks in the providers.tf file. We manually created a separate user-assigned managed identity (service principal) for each scenario and assigned each the right set of privileges. You need to be a Global Admin in AAD to set these up.
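Although we created ours by hand, the identities themselves could also be codified. A sketch with the azurerm provider (the resource group and identity names here are hypothetical):

```hcl
# Hypothetical resource group holding the automation identities.
resource "azurerm_resource_group" "automation" {
  name     = "rg-rbac-automation"
  location = "westeurope"
}

# One user-assigned managed identity per scenario, e.g. the group admin.
resource "azurerm_user_assigned_identity" "group_admin" {
  name                = "mi-group-admin"
  location            = azurerm_resource_group.automation.location
  resource_group_name = azurerm_resource_group.automation.name
}

# The client_id attribute is what gets passed into the azuread provider block.
output "group_admin_client_id" {
  value = azurerm_user_assigned_identity.group_admin.client_id
}
```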

(Screenshot: managed identities)

  1. Access admin - We assigned it User Access Administrator role over the tenant root management group. This way it can create role definitions scoped across all subscriptions and manage default role assignments for each sub during landing zone onboarding.

(Screenshot: access-admin)

  2. Group admin - This identity needs permissions over the Microsoft Graph API to create/delete AAD groups and manage them.

We assigned it Directory.Read.All, Group.ReadWrite.All and RoleManagement.ReadWrite.Directory app roles on Microsoft Graph API. Required roles are described here.

I found this blog post about how to assign extra app roles to managed identities. Execute this Azure PowerShell script as Global Admin:

$roles = @('Directory.Read.All', 'Group.ReadWrite.All', 'RoleManagement.ReadWrite.Directory')
$managed_identity = Get-AzADServicePrincipal -ObjectId '<mi_object_id>'

$access_token = (Get-AzAccessToken -ResourceTypeName 'MSGraph').Token
$graph_sp = Get-AzADServicePrincipal -ApplicationId '00000003-0000-0000-c000-000000000000'

$roles | % {
    $role_name = $_
    $role = $graph_sp.AppRole | ? { $_.Value -eq $role_name }
    $body = @{
        'principalId' = $managed_identity.Id;
        'resourceId'  = $graph_sp.Id;
        'appRoleId'   = $role.Id
    } | ConvertTo-Json -Compress

    Invoke-RestMethod `
        -Method POST `
        -Headers @{Authorization = "Bearer $access_token" } `
        -ContentType 'application/json' `
        -Uri "https://graph.microsoft.com/v1.0/servicePrincipals/$($managed_identity.Id)/appRoleAssignments" `
        -Body $body
}

(Screenshot: group-admin)

  3. Service principal creator - This identity also needs MS Graph permissions, as described here and here.

We executed the same script as above, just replacing the roles list with Directory.Read.All, Application.ReadWrite.OwnedBy and AppRoleAssignment.ReadWrite.All.

(Screenshot: sp-creator)

Setting up pipeline

We use managed identities rather than service principals so we do not need to use and rotate client secrets. Our Terraform pipeline should execute on an ADO agent in a trusted location. See my other blog post about how we set up an ADO agent linked to managed identities. The agent can run on a normal VM too and be assigned these managed identities directly instead of using federated credentials. Bottom line is that our ADO project has an agent pool where the agent is able to obtain tokens from these 3 managed identities.

For Terraform to log in with 3 different managed identities with 3 different providers we need to pass in Client ID for each of them as a parameter.
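For completeness, the three client ID variables referenced in providers.tf need matching declarations; a minimal variables.tf along these lines would do (descriptions are my own wording):

```hcl
variable "group_admin_client_id" {
  type        = string
  description = "Client ID of the group-admin managed identity."
}

variable "service_principal_creator_client_id" {
  type        = string
  description = "Client ID of the service-principal-creator managed identity."
}

variable "access_admin_client_id" {
  type        = string
  description = "Client ID of the access-admin managed identity."
}
```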

pipeline.yaml (example)

name: $(Rev:rr)
trigger:
- main
pool: aks-azure-rbac

variables:
  tf_dir: $(Build.SourcesDirectory)/terraform
  tf_vars: |
    service_principal_creator_client_id = "<client_id_1>"
    group_admin_client_id = "<client_id_2>"
    access_admin_client_id = "<client_id_3>"

steps:
- checkout: self
  clean: true

- pwsh: |
    $tf_vars = '$(tf_vars)'
    [System.IO.File]::WriteAllText('terraform.tfvars', $tf_vars)
  displayName: set tf vars
  workingDirectory: $(tf_dir)

- task: TerraformTaskV2@2
  displayName: tf init
  inputs:
    provider: azurerm
    command: init
    workingDirectory: $(tf_dir)
    backendServiceArm: <terraform_storage_service_connection>
    backendAzureRmResourceGroupName: <storage_account_rg>
    backendAzureRmStorageAccountName: <storage_account_name>
    backendAzureRmContainerName: <storage_container>
    backendAzureRmKey: <blob_name>

- pwsh: |
    terraform plan -out tfplan
  displayName: tf plan
  workingDirectory: $(tf_dir)

- pwsh: |
    & '$(Build.SourcesDirectory)/scripts/set-tf-plan-pr-comments.ps1'
  workingDirectory: $(tf_dir)
  displayName: set pr comments
  condition: and(succeeded(), eq(variables['Build.Reason'], 'PullRequest'), ne(variables['Build.Repository.Provider'], 'GitHub'))
  env:
    SYSTEM_ACCESSTOKEN: $(System.AccessToken)

- pwsh: |
    terraform apply -auto-approve tfplan
  displayName: tf apply
  condition: and(succeeded(), eq(variables['Build.SourceBranch'], 'refs/heads/main'))
  workingDirectory: $(tf_dir)

- task: DeleteFiles@1
  displayName: clean up tf files
  condition: always()
  inputs: 
    Contents: |
      **/tfplan
      **/*.tfvars
      **/*.tfstate

We set up the pipeline to only run the apply step when running on main branch. Terraform will still plan changes and post PR comments with a summary when running a PR validation build.

Process for making changes

In our ADO project we allow everyone in the tech department to read and contribute to our repository. They must do so via pull requests. A contributor can either clone the repo locally, make changes on a branch and submit a pull request, or do it directly via the ADO UI.

All pull requests go to our Slack channel so our team can review them and approve if everything is OK.

Once approved, the PR is merged into main where automated pipeline picks it up, authenticates using managed identities and creates whatever is necessary via Terraform.

(Screenshot: pull request)

(Screenshot: app registration)

Our own team members raising changes are treated no differently than any other contributor. Someone else still has to review and approve.

Security considerations

  • You need a high-privilege role to set this up in the first place: at least Application Administrator in AAD and User Access Administrator over the management groups. I had to elevate my account to Global Administrator using Privileged Identity Management.
  • Creating things manually with user accounts can be safer because you would normally have to pass through Privileged Identity Management, approvals, conditional access policy, MFA etc. before you can execute a high privilege action.
  • Automated identities have fewer safety restrictions than user accounts (no MFA, for instance). You may be able to set up a conditional access policy that only allows obtaining access tokens from one trusted location: the ADO agent.
  • "Who guards the guardians" - whoever controls the system responsible for automation can elevate themselves to use these managed identities. This includes Azure DevOps project admins (on that particular project), Project Collection Admins, on Azure - Contributors and Managed Identity Operator roles.
  • If you have the Managed Identity Operator role over those managed identities (or an equivalent role on a service principal) then you can obtain access tokens with privileges potentially higher than your own. The Azure landing zone architecture recommends a dedicated Azure subscription where you create these managed identities and where a limited number of people have access. We follow this practice.
  • Our ADO agents run on an AKS cluster. There are a myriad of ways in which a cluster can become vulnerable and we try our best to secure it, but you also have to be careful who can execute kubectl exec on it.
  • The Terraform state file is stored in Azure blob storage in our case. We create Azure Key Vault secrets via Terraform to store the Client ID and Client Secret, but the plain-text values are also present in the Terraform state file. Described here. Whoever has access to the state file can read out these secrets. We will soon mitigate this by moving most of our service principals to managed identities.
  • You may want to set up Activity Log alerts in AAD and in Azure that fire whenever these identities do anything. If the timestamp doesn't coincide with a main-branch pipeline run, then something fishy may be going on.
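On the Azure side, such an alert could be sketched roughly like this. The variables and action group are hypothetical, and matching on `caller` assumes the identity shows up as the caller in Activity Log events:

```hcl
# Hypothetical alert: fire on any Administrative operation by the access-admin identity.
resource "azurerm_monitor_activity_log_alert" "access_admin_activity" {
  name                = "access-admin-activity"
  resource_group_name = var.alerts_resource_group_name
  location            = "global"
  scopes              = ["/subscriptions/${var.prod_subscription_id}"]
  description         = "Fires when the access-admin identity performs an Administrative operation."

  criteria {
    category = "Administrative"
    caller   = var.access_admin_object_id # object ID of the managed identity
  }

  action {
    action_group_id = var.alerts_action_group_id
  }
}
```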
