DEV Community

Oskar Mamrzynski

Azure DevOps agents on AKS with workload identity

Recently I deployed some Azure DevOps (ADO) agents on Azure Kubernetes Service (AKS) and used the new workload identity add-on to authenticate to Azure. We mainly use these agents to deploy other Azure resources with Terraform / Azure CLI / Azure PowerShell within our private network. Builds run on Microsoft-hosted agents.

Pros:

  • Resource utilisation is better than using VMs. Agents used for deployment are not very resource intensive on CPU/Memory as they mostly do network I/O to talk to various APIs and deploy things.

grafana cpu usage

grafana memory usage

  • You can use KEDA to auto-scale agents based on number of pending ADO jobs. This lets you scale down during quiet periods to reclaim resources.

  • You can deploy agents within your private network. Most of our deployment targets sit behind either Private Link or IP whitelists, and Microsoft-hosted agents cannot reach those. You must use self-hosted agents on, for example, a network-joined AKS cluster.

  • When running deployment pipelines on ADO agents, you will have to authenticate to Azure to deploy or manage resources. You can use service principals, but they are not the most secure option because they require a Client ID and Secret.

  • Managed identities are great for keyless authentication to Azure resources. Terraform and all Azure tools support managed identity authentication. You can get rid of the need for principal credential rotation.

  • The workload identity add-on for Kubernetes is the new way of assigning managed identities to pods. This works across multiple Azure subscriptions (and even cross-tenant), and the agent pod doesn't even have to live on Azure Kubernetes Service (it could be EKS or GKE). It is less clunky than aad-pod-identity.
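The KEDA auto-scaling mentioned above can be sketched with KEDA's azure-pipelines scaler. This is a sketch, not a tested manifest: the pool name and namespace mirror the Terraform later in this post, the replica bounds are assumptions, and the environment-variable names reuse the AZP_URL / AZP_TOKEN variables already set on the agent container.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: ado-agent-scaler
  namespace: ado-agents-<pool_name>
spec:
  scaleTargetRef:
    name: ado-agent-<pool_name>   # the agent deployment
  minReplicaCount: 1
  maxReplicaCount: 5
  triggers:
  - type: azure-pipelines
    metadata:
      poolName: "<pool_name>"
      # Reuse the env vars the agent container already has
      organizationURLFromEnv: "AZP_URL"
      personalAccessTokenFromEnv: "AZP_TOKEN"
```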

Cons:

  • Anyone who can execute jobs on that agent pool will be able to obtain an access token for the managed identity. If you share agent pools between multiple teams, you shouldn't use this method, as each team would be able to access the other teams' resources.

  • For multiple teams you should ideally have 1 managed identity linked to 1 agent pool, and 1 agent pool usable only by 1 team (ADO project).

This is more of a "fill in the blanks" guide than a step-by-step tutorial. I am going to only briefly mention some setup steps because they are better documented elsewhere.

Set up a basic ADO agent on AKS first:

  • Create a basic ADO agent image. This just involves a base Debian image, some dependencies and a pre-made script from Microsoft. My agents are all based on a Linux distro, so you can find the guide here.

  • Add your dependencies on top of the base ADO agent image. In my case, I just need Terraform and PowerShell, but you can also install Azure CLI, kubectl etc.

FROM <Base ADO agent image>

WORKDIR /tools

# To make it easier for build and release pipelines to run apt-get,
# configure apt to not require confirmation (assume the argument by default)
ENV DEBIAN_FRONTEND=noninteractive
RUN echo "APT::Get::Assume-Yes \"true\";" > /etc/apt/apt.conf.d/90assumeyes

# Updating paths to tools
ENV PATH="/tools/azure-powershell-core:/tools/terraform:/tools:${PATH}"

# Install basics
RUN apt-get update && apt-get install curl zip unzip

# Install Terraform
ADD https://releases.hashicorp.com/terraform/1.4.6/terraform_1.4.6_linux_amd64.zip /tmp/terraform.zip
RUN unzip /tmp/terraform.zip -d terraform
RUN chmod +x terraform/terraform

# Install PowerShell Core
ADD https://github.com/PowerShell/PowerShell/releases/download/v7.1.5/powershell_7.1.5-1.ubuntu.20.04_amd64.deb powershell.deb
# dpkg will complain about missing dependencies; the follow-up apt-get install -f pulls them in
RUN dpkg -i powershell.deb ; exit 0
RUN apt-get install -f

WORKDIR /azp
  • Push your ADO agent image to a registry of your choice, in my case ACR:
az acr login -n <acr_name>
docker build -t <acr_name>.azurecr.io/<ado_agent_image> .
docker push <acr_name>.azurecr.io/<ado_agent_image>
  • You need an AKS cluster with the workload identity add-on deployed. You can create one using Azure CLI as described here. In my case, clusters are deployed using Terraform, so I just set these properties to true. If using Terraform, you will need to output the OIDC issuer URL for use in subsequent steps.
resource "azurerm_kubernetes_cluster" "cluster" {
(...)
  oidc_issuer_enabled                 = true
  workload_identity_enabled           = true
(...)
}

output "aks_oidc_issuer_url" {
  value = azurerm_kubernetes_cluster.cluster.oidc_issuer_url
}
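For reference, the equivalent Azure CLI on an existing cluster would look roughly like this (a sketch; resource names are placeholders):

```shell
# Enable the OIDC issuer and workload identity add-on on an existing cluster
az aks update -g <resource_group> -n <cluster_name> \
  --enable-oidc-issuer --enable-workload-identity

# Grab the OIDC issuer URL for the federated credential later
az aks show -g <resource_group> -n <cluster_name> \
  --query "oidcIssuerProfile.issuerUrl" -o tsv
```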

  • Create an agent pool in Azure DevOps and generate a PAT token that the agent will use to register itself:

ado-agent-pool

pat-token

  • Deploy your ADO agent image as a Kubernetes deployment. You can use YAML, though I prefer to create everything in Terraform.
### variables.tf
variable "pool_name" {}
variable "ado_token" {}

variable "replicas" {
  default = 1
}

variable "requests" {
  default = {
    cpu    = "100m"
    memory = "512Mi"
  }
}

variable "limits" {
  default = {
    memory = "2Gi"
  }
}

variable "disk_space_limit" {
  default = "4Gi"
}

locals {
  work_dir  = "/mnt/work"
  ado_agent = "ado-agent-${var.pool_name}"
}

### providers.tf
terraform {
  <backend_configuration>
  required_providers {
    helm = {
      source  = "hashicorp/helm"
      version = "<3.0.0"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "<3.0.0"
    }
  }
}

### kubernetes.tf
resource "kubernetes_namespace_v1" "namespace" {
  metadata {
    name = "ado-agents-${var.pool_name}"
  }
}

resource "kubernetes_secret_v1" "env_vars" {
  metadata {
    name      = local.ado_agent
    namespace = kubernetes_namespace_v1.namespace.metadata[0].name
  }

  data = {
    # These variables are used by the script from Step 1. 
    # Have a look at the script Microsoft provides in their guide and what other config options they have available. 
    # Each of these secret Key-Value entries is another environment variable mounted to the pod.
    AZP_URL   = "https://dev.azure.com/<your_org_name>"
    # You can source this from input variable, Azure Key Vault, etc.
    AZP_TOKEN = var.ado_token 
    AZP_POOL  = var.pool_name
    AZP_WORK  = local.work_dir
  }
}

resource "kubernetes_deployment_v1" "ado_agent" {
  metadata {
    name      = local.ado_agent
    namespace = kubernetes_namespace_v1.namespace.metadata[0].name
  }

  spec {
    replicas = var.replicas

    selector {
      match_labels = {
        app = local.ado_agent
      }
    }

    template {
      metadata {
        labels = {
          app = local.ado_agent
        }
      }

      spec {
        container {
          name  = local.ado_agent
          image = "<acr_name>.azurecr.io/<ado_agent_image>"

          # We link all variables from a secret ref instead of listing them here directly,
          # so that your PAT token doesn't show up in kubectl describe pod.
          env_from {
            secret_ref {
              name = kubernetes_secret_v1.env_vars.metadata[0].name
            }
          }

          volume_mount {
            name       = "temp-data"
            mount_path = local.work_dir
          }

          resources {
            limits   = var.limits
            requests = var.requests
          }
        }

        volume {
          name = "temp-data"

          # Using the node's ephemeral disk space for agent storage.
          # Deployment agents do not use much disk space, and it can be wiped when the pod dies.
          empty_dir {
            size_limit = var.disk_space_limit
          }
        }
      }
    }
  }
}
  • Use terraform plan and apply while your kubectl context points at your AKS cluster. This should create a new agent and link it to the pool. Microsoft's startup script downloads the latest version of the agent each time a pod starts, registers it with the pool, and removes it on pod termination.
PS> kubectl get deployment
NAME                     READY   UP-TO-DATE   AVAILABLE   AGE
omtest1                  1/1     1            1           11d

ado-agent-in-pool

  • This setup is just a normal ADO agent, with no workload identity linked yet. You would still need a Client ID / Secret or an Azure DevOps service connection to use in your pipelines.

Adding workload identity to ADO agent:

  • Workload identity works by linking a Kubernetes service account to a managed identity via federated credentials. This establishes trust between a specific Kubernetes cluster + namespace + service account and a managed identity. An identity no longer has to be assigned to AKS nodes like in aad-pod-identity.

  • Create a user-assigned managed identity and give it a role assignment over the Azure resources or resource group it will manage. Here is the Terraform from above, modified.

### variables.tf
variable "pool_name" {}

variable "ado_token" {}

variable "replicas" {
  default = 1
}

variable "requests" {
  default = {
    cpu    = "100m"
    memory = "512Mi"
  }
}

variable "limits" {
  default = {
    memory = "2Gi"
  }
}

variable "disk_space_limit" {
  default = "4Gi"
}

variable "location" {
  default = "westeurope"
}

variable "managed_identity_scope" {}

# Take this from your AKS deployment
variable "aks_oidc_issuer_url" {}

locals {
  work_dir  = "/mnt/work"
  ado_agent = "ado-agent-${var.pool_name}"
}

### providers.tf
terraform {
  <backend_configuration>
  required_providers {
    helm = {
      source  = "hashicorp/helm"
      version = "<3.0.0"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "<3.0.0"
    }
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "<4.0.0"
    }
  }
}

provider "azurerm" {
  features {}
}

### azure.tf
resource "azurerm_resource_group" "group" {
  name     = "rg-identity-${var.pool_name}"
  location = var.location
}

resource "azurerm_user_assigned_identity" "identity" {
  name                = "mi-${var.pool_name}"
  resource_group_name = azurerm_resource_group.group.name
  location            = var.location
}

# This resource will establish trust between Kubernetes cluster, namespace and service account. 
# Only pods assigned that service account will be able to obtain an access token from Azure for this corresponding managed identity.
resource "azurerm_federated_identity_credential" "identity" {
  resource_group_name = azurerm_user_assigned_identity.identity.resource_group_name
  parent_id           = azurerm_user_assigned_identity.identity.id
  name                = "${kubernetes_namespace_v1.namespace.metadata[0].name}-${kubernetes_service_account_v1.service_account.metadata[0].name}"
  subject             = "system:serviceaccount:${kubernetes_namespace_v1.namespace.metadata[0].name}:${kubernetes_service_account_v1.service_account.metadata[0].name}"
  audience            = ["api://AzureADTokenExchange"]
  issuer              = var.aks_oidc_issuer_url
}

# Give the managed identity some role over a resource group it will manage. 
# This does not have to be done in Terraform if you don't have permissions to do it.
resource "azurerm_role_assignment" "deployment_group" {
  scope                = var.managed_identity_scope
  role_definition_name = "Contributor"
  principal_id         = azurerm_user_assigned_identity.identity.principal_id
}

### kubernetes.tf
resource "kubernetes_namespace_v1" "namespace" {
  metadata {
    name = "ado-agents-${var.pool_name}"
  }
}

resource "kubernetes_secret_v1" "env_vars" {
  metadata {
    name      = local.ado_agent
    namespace = kubernetes_namespace_v1.namespace.metadata[0].name
  }

  data = {
    AZP_URL   = "https://dev.azure.com/<org_name>"
    AZP_TOKEN = var.ado_token
    AZP_POOL  = var.pool_name
    AZP_WORK  = local.work_dir
  }
}

# Pods would normally use namespace default service account.
# Instead, we create a specific service account for these pods and link it to Azure. 
# You do not have to automount the service account token.
# Workload identity will still mount its own signed token file, separate from the normal Kubernetes service account token.
# This token can be exchanged with Azure for an access token.
resource "kubernetes_service_account_v1" "service_account" {
  metadata {
    namespace = kubernetes_namespace_v1.namespace.metadata[0].name
    name      = local.ado_agent
    # These annotations are optional but they will set the "default" managed identity for this service account. 
    # This is helpful if you plan to have 1:N SA->MI federated credentials.
    annotations = {
      "azure.workload.identity/client-id" = azurerm_user_assigned_identity.identity.client_id
      "azure.workload.identity/tenant-id" = azurerm_user_assigned_identity.identity.tenant_id
    }
  }
  automount_service_account_token = false
}

resource "kubernetes_deployment_v1" "ado_agent" {
  metadata {
    name      = local.ado_agent
    namespace = kubernetes_namespace_v1.namespace.metadata[0].name
  }

  spec {
    replicas = var.replicas

    selector {
      match_labels = {
        app = local.ado_agent
      }
    }

    template {
      metadata {
        # This label must be specified so the workload identity add-on is able to process the pod.
        labels = {
          app                           = local.ado_agent
          "azure.workload.identity/use" = "true"
        }
        # While workload identity uses a token-exchange method by default, this is not supported by the Azure Terraform provider.
        # If we want to use managed identity authentication in our agent, we must inject a sidecar.
        # This extra container spoofs a managed identity endpoint and provides us with an access token.
        annotations = {
          "azure.workload.identity/inject-proxy-sidecar" : "true"
        }
      }

      spec {
        # Link the service account to deployment, so all pods created by the deployment have the SA token.
        service_account_name            = kubernetes_service_account_v1.service_account.metadata[0].name
        automount_service_account_token = false
        container {
          name  = local.ado_agent
          image = "<acr_name>.azurecr.io/<ado_agent_image>"

          env_from {
            secret_ref {
              name = kubernetes_secret_v1.env_vars.metadata[0].name
            }
          }

          volume_mount {
            name       = "temp-data"
            mount_path = local.work_dir
          }

          resources {
            limits   = var.limits
            requests = var.requests
          }
        }

        volume {
          name = "temp-data"

          empty_dir {
            size_limit = var.disk_space_limit
          }
        }
      }
    }
  }
}
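Once applied, you can confirm the federated credential landed on the identity (a sketch using Azure CLI; the names follow the Terraform above):

```shell
# Lists federated credentials on the user-assigned identity,
# including the issuer and system:serviceaccount subject
az identity federated-credential list \
  --identity-name mi-<pool_name> \
  --resource-group rg-identity-<pool_name> -o table
```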
  • Once deployed, your agent pod will have an extra sidecar container and init container:
PS> kubectl get pod
NAME                                     READY   STATUS    RESTARTS   AGE
ado-agent-<pool_name>-7b8d79dd65-hwb22   2/2     Running   0          3d22h

PS> kubectl describe pod
Name:         ado-agent-<pool_name>-7b8d79dd65-hwb22
Namespace:    ado-agents-<pool_name>
(...)
Labels:       app=ado-agent-<pool_name>
              azure.workload.identity/use=true
Annotations:  azure.workload.identity/inject-proxy-sidecar: true
(...)
Init Containers:
  azwi-proxy-init:
    Image:          mcr.microsoft.com/oss/azure/workload-identity/proxy-init:v1.0.0
    (...)
    Environment:
      PROXY_PORT:                  8000
      AZURE_CLIENT_ID:             <managed_identity_client_id>
      AZURE_TENANT_ID:             <managed_identity_tenant_id>
      AZURE_FEDERATED_TOKEN_FILE:  /var/run/secrets/azure/tokens/azure-identity-token
      AZURE_AUTHORITY_HOST:        https://login.microsoftonline.com/
    Mounts:
      /var/run/secrets/azure/tokens from azure-identity-token (ro)
Containers:
  ado-agent-<pool_name>:
    Image:          <acr_name>.azurecr.io/<ado_agent_image>
    (...)
    Environment Variables from:
      ado-agent-<pool_name>  Secret  Optional: false
    Environment:
      AZURE_CLIENT_ID:             <managed_identity_client_id>
      AZURE_TENANT_ID:             <managed_identity_tenant_id>
      AZURE_FEDERATED_TOKEN_FILE:  /var/run/secrets/azure/tokens/azure-identity-token
      AZURE_AUTHORITY_HOST:        https://login.microsoftonline.com/
    Mounts:
      /mnt/work from temp-data (rw)
      /var/run/secrets/azure/tokens from azure-identity-token (ro)
  azwi-proxy:
    Image:         mcr.microsoft.com/oss/azure/workload-identity/proxy:v1.0.0
    (...)
    Port:          8000/TCP
    Host Port:     0/TCP
    Args:
      --proxy-port=8000
      --log-level=info
    Environment:
      AZURE_CLIENT_ID:             <managed_identity_client_id>
      AZURE_TENANT_ID:             <managed_identity_tenant_id>
      AZURE_FEDERATED_TOKEN_FILE:  /var/run/secrets/azure/tokens/azure-identity-token
      AZURE_AUTHORITY_HOST:        https://login.microsoftonline.com/
    Mounts:
      /var/run/secrets/azure/tokens from azure-identity-token (ro)
(...)

Obtaining an access token using workload identity:

  • Guides here and here describe how access tokens are obtained via workload identity. Since we use a sidecar that mimics the IMDS endpoint, getting an access token works as described here. Let's try it using kubectl exec.
PS> kubectl exec -it ado-agent-<pool_name>-7b8d79dd65-hwb22 -- pwsh
Defaulted container "ado-agent-<pool_name>" out of: ado-agent-<pool_name>, azwi-proxy, azwi-proxy-init (init)
PowerShell 7.2.6
Copyright (c) Microsoft Corporation.

https://aka.ms/powershell
Type 'help' to get help.

PS /azp> Invoke-WebRequest -Uri 'http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https%3A%2F%2Fmanagement.azure.com%2F' -Headers @{Metadata="true"}

StatusCode        : 200
StatusDescription : OK
Content           : {"access_token":"eyJ0eXA...
RawContent        : HTTP/1.1 200 OK
                    Server: azure-workload-identity/proxy/v1.0.0 (linux/amd64) 9893baf/2023-03-27-20:58
                    Date: Tue, 23 May 2023 09:17:48 GMT
                    Content-Type: application/json
                    Content-Length: 1707

                    {"access_to…
Headers           : {[Server, System.String[]], [Date, System.String[]], [Content-Type, System.String[]], [Content-Length, System.String[]]}
Images            : {}
InputFields       : {}
Links             : {}
RawContentLength  : 1707
RelationLink      : {}


PS /azp> (Invoke-RestMethod -Uri 'http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https%3A%2F%2Fmanagement.azure.com%2F' -Headers @{Metadata="true"}).access_token
eyJ0eXA...
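To sanity-check what a returned token contains (audience, object ID of the identity), you can decode the JWT's payload segment. A minimal sketch; the token below is fabricated for illustration, whereas a real one comes from the endpoint above:

```shell
# Fabricated payload standing in for a real access token's claims
payload_json='{"aud":"https://management.azure.com/","oid":"<managed_identity_object_id>"}'
# Assemble a fake header.payload.signature token, base64url-encoded like a real JWT
payload=$(printf '%s' "$payload_json" | base64 | tr '+/' '-_' | tr -d '=\n')
token="eyJhbGciOiJSUzI1NiJ9.${payload}.signature"

# Decode the middle segment: map base64url back to base64 and re-pad to a multiple of 4
seg=$(printf '%s' "$token" | cut -d. -f2 | tr '_-' '/+')
pad=$(( (4 - ${#seg} % 4) % 4 ))
[ "$pad" -gt 0 ] && seg="$seg$(printf '=%.0s' $(seq 1 $pad))"
printf '%s' "$seg" | base64 -d   # prints the decoded JSON payload
```

Pipe the `access_token` value from `Invoke-RestMethod` through the same decode to inspect a real token.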

workload-identity-get-token

We are now ready to run a Terraform pipeline against this new agent pool.

Running a pipeline:

  • Create a Terraform repo in Azure DevOps to create some example resources.
### main.tf
terraform {
  <backend_configuration>
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "<4.0.0"
    }
  }
}

provider "azurerm" {
  # We can just use this flag to authenticate
  use_msi         = true
  subscription_id = "<subscription_id>"
  features {}
}

resource "azurerm_application_insights" "example" {
  name                = "appins-omtest1"
  location            = "westeurope"
  resource_group_name = "rg-identity-omtest1"
  application_type    = "web"
}
  • Create YAML pipeline to run Terraform:
name: $(Rev:rr)
trigger:
- main
pool: omtest1

steps:
- pwsh: |
    terraform init
    terraform plan -out tfplan
    terraform apply -auto-approve tfplan
  workingDirectory: $(Build.SourcesDirectory)/terraform-deploy-with-managed-identity
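Instead of hard-coding use_msi in the provider block, the azurerm provider can also read the same settings from ARM_* environment variables, which keeps the Terraform code identical between pools. A sketch; the subscription ID is a placeholder:

```yaml
steps:
- pwsh: |
    terraform init
    terraform plan -out tfplan
    terraform apply -auto-approve tfplan
  workingDirectory: $(Build.SourcesDirectory)/terraform-deploy-with-managed-identity
  env:
    ARM_USE_MSI: 'true'
    ARM_SUBSCRIPTION_ID: '<subscription_id>'
```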
  • Result:

pipeline-result

pipeline-result-azure

In summary:
We have our ADO agents running in AKS, using the workload identity + managed identity combination to achieve keyless authentication to Azure for running Terraform pipelines.
