Hector Flores

Posted on • Originally published at htek.dev

GitHub Actions at Enterprise Scale: The Identity-First Platform That Took Us from 3 Teams to 1,000 Repos

Copy-Paste Workflows Don't Scale

Every platform team hits the same wall. You start with a handful of repos, each with bespoke CI/CD workflows. Twelve months later you have 200 repos, and every deployment pipeline is a snowflake. Engineers copy YAML from Slack threads. Secrets sprawl across repositories. Nobody can answer "who deployed what, and with which permissions?"

I hit this wall at a Fortune 500 energy company, managing CI/CD for an enterprise DevOps platform. We went from 2–3 teams to 300 teams across roughly 1,000 repositories — all on GitHub Actions — in under two years. The secret wasn't better YAML. It was treating Actions as a platform engineering problem, starting from identity.

GitHub Actions processed 11.5 billion minutes in 2025 alone — up 35% year-over-year — with 71 million jobs running per day on its re-architected backend. At that scale, the question isn't "does Actions work?" — it's "how do you govern it without becoming a bottleneck?"

Here's the recipe: identify bottlenecks → codify them → scale identity.

The Subject Claim Problem (And Why I Built an OIDC Broker)

GitHub Actions supports OpenID Connect (OIDC) federation for passwordless cloud authentication. In theory, every workflow gets a short-lived token scoped to its repo. No more long-lived secrets sitting in repository settings.

In practice? The sub (subject) claim in GitHub's OIDC token has a structural limitation: when you call a reusable workflow, the token's subject reflects the caller context, not the called workflow. This makes it difficult to enforce "only this approved deployment workflow can authenticate to production Azure resources" — because the subject claim doesn't consistently identify which reusable workflow is executing.

GitHub has since added job_workflow_ref as a custom claim and introduced immutable subject claims (enforced for new repos, renames, and transfers after June 18, 2026 — existing repos can opt in now). But when I was building this platform, those features didn't exist yet.
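To make the caller-versus-callee distinction concrete, here is a rough sketch of a team repository calling a central deployment workflow. The organization, repository, and workflow names are hypothetical. The claim values in the trailing comments follow GitHub's documented default formats for a push to main with no deployment environment, and are shown as what one would expect rather than as captured output.

```yaml
# .github/workflows/deploy.yml in the hypothetical repo my-org/payments-service
name: Deploy
on:
  push:
    branches: [main]

jobs:
  deploy:
    # The calling job has to grant id-token: write; a called workflow can
    # keep or reduce the permissions it receives, but never raise them.
    permissions:
      id-token: write
      contents: read
    # Hypothetical central repo and workflow path, pinned to a version tag.
    uses: my-org/platform-workflows/.github/workflows/deploy.yml@v1

# Expected claims in an OIDC token requested inside the called workflow:
#   sub:              repo:my-org/payments-service:ref:refs/heads/main
#   job_workflow_ref: my-org/platform-workflows/.github/workflows/deploy.yml@refs/tags/v1
# The default sub names only the caller; job_workflow_ref names the reusable workflow.
```

A cloud-side trust policy that matches on sub alone would therefore trust anything running in the caller repository, which is the gap the broker was built to close.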

My solution: a custom OIDC server acting as an identity broker.

The broker accepts a GitHub Actions OIDC token, validates its signature and claims to establish the caller's identity, checks the requested scope against a centralized policy, and issues a new scoped token for Azure. Think of it as an identity translation layer sitting between GitHub and your cloud provider.
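A centralized policy of this kind could be expressed as a small declarative file that maps token claims to the scopes a caller may request. The schema below is purely illustrative: the file name, the keys, and the scope names are assumptions and not the broker's actual format. Only the issuer URL and the claim names (repository, repository_owner, ref, environment) are real GitHub OIDC values.

```yaml
# broker-policy.yaml (hypothetical schema for illustration only)
issuer: https://token.actions.githubusercontent.com
audience: platform-oidc-broker   # must match the audience passed to core.getIDToken

rules:
  # Production deploys: one repository, default branch, production environment.
  - scope: payments-prod-deploy
    match:
      repository: my-org/payments-service
      ref: refs/heads/main
      environment: production

  # Docs publishing: any repository in the organization, default branch only.
  - scope: docs-publish
    match:
      repository_owner: my-org
      ref: refs/heads/main
```

Under this sketch the broker would reject an exchange request unless every key in a rule's match block equals the corresponding claim in the validated token.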

# Composite action: login-to-azure (simplified)
name: "Platform Login to Azure"
description: "Authenticate via OIDC broker with centrally managed scopes"
inputs:
  scope:
    description: "Requested permission scope"
    required: true
runs:
  using: composite
  steps:
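    - name: Get GitHub OIDC Token
      uses: actions/github-script@v7
      id: token
      with:
        script: |
          // Requires `permissions: id-token: write` in the caller workflow.
          // The runner injects ACTIONS_ID_TOKEN_REQUEST_URL and
          // ACTIONS_ID_TOKEN_REQUEST_TOKEN itself. Do not re-declare them from
          // the `env` context: it holds no runner-provided variables, so the
          // expression would resolve to an empty string.
          const token = await core.getIDToken('platform-oidc-broker');
          core.setOutput('id_token', token);

    - name: Exchange for Scoped Azure Token
      shell: bash
      env:
        # Pass values through env instead of inline expressions so that a
        # crafted input cannot inject shell commands.
        BROKER_URL: ${{ env.OIDC_BROKER_URL }}
        ID_TOKEN: ${{ steps.token.outputs.id_token }}
        SCOPE: ${{ inputs.scope }}
      run: |
        # jq (preinstalled on GitHub-hosted runners) builds a correctly escaped JSON body
        AZURE_TOKEN=$(curl -sS --fail -X POST "${BROKER_URL}/exchange" \
          -H "Authorization: Bearer ${ID_TOKEN}" \
          -H "Content-Type: application/json" \
          -d "$(jq -n --arg scope "$SCOPE" '{scope: $scope}')")
        echo "::add-mask::$AZURE_TOKEN"
        echo "AZURE_TOKEN=$AZURE_TOKEN" >> "$GITHUB_ENV"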

This single composite action became the foundation everything else was built on. Every team authenticates the same way. Every permission is centrally governed. No secrets in repos.

The Framework Stack: Each Framework = GitHub App + Identity + Reusable Workflow

With centralized identity solved, I layered five frameworks on top — each following the same architecture pattern:

| Framework | Purpose | What Teams Define |
|-----------|---------|-------------------|
| IAM | Identity and access management | RBAC roles in a YAML workflow file |
| Secrets | Central Key Vault management | Secret names and scopes |
| IAC | Infrastructure as Code (Bicep → Azure) | Bicep modules and parameters |
| Docs | Centralized documentation deployment | Markdown content |
| Config | Configuration management | Environment variables and app settings |

Each framework consists of three components:

  1. A GitHub App — provides the automation identity and webhook triggers
  2. An Entra ID (Azure AD) app — holds the federated credential with scoped permissions
  3. A reusable workflow — the actual pipeline logic teams call from their repos
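As a rough sketch of the third component, a framework's reusable workflow might look like the following. The repository names, the broker URL, and the input name are placeholders chosen for illustration; only the workflow_call syntax and the login action's scope input come from the material above.

```yaml
# .github/workflows/deploy.yml in a hypothetical central repo my-org/iac-framework
name: IaC Deploy
on:
  workflow_call:
    inputs:
      scope:
        description: "Permission scope to request from the broker"
        required: true
        type: string

permissions:
  id-token: write   # takes effect only if the calling job grants it too
  contents: read

jobs:
  deploy:
    runs-on: ubuntu-latest
    env:
      OIDC_BROKER_URL: https://oidc-broker.example.com   # placeholder URL
    steps:
      # In a called workflow, checkout fetches the caller's repository,
      # which is where the team's Bicep modules live.
      - uses: actions/checkout@v4

      # Hypothetical location of the login composite action.
      - uses: my-org/platform-actions/login-to-azure@v1
        with:
          scope: ${{ inputs.scope }}

      - name: Deploy
        shell: bash
        run: echo "Deploy here using the token exported as AZURE_TOKEN"
```

The GitHub App and the Entra ID app do not appear in the file itself: the first reacts to repository events and the second is what the exchanged token ultimately authenticates as.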

The IAM Framework: The Crown Jewel

The IAM framework is where this architecture pays off most dramatically. Here's the team experience:

# teams/my-team/iam.yaml
name: "payment-service"
rbac:
  - role: "Contributor"
    scope: "/subscriptions/xxx/resourceGroups/payments-prod"
  - role: "Key Vault Secrets User"
    scope: "/subscriptions/xxx/resourceGroups/payments-prod/providers/Microsoft.KeyVault/vaults/payments-kv"
environments:
  - production
  - staging

When a team pushes this file, the IAM framework:

  1. Creates an Entra ID application registration
  2. Configures federated credentials tied to their specific repo
  3. Stores the client ID as a repository variable
  4. Sets up RBAC assignments in Azure

The team then calls the login composite action with a version tag — that's it. Zero portal clicks. Zero tickets. Full auditability.
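From the team's side, the resulting workflow could be as small as the sketch below. The action path, scope name, broker URL, and the AZURE_CLIENT_ID variable name are assumptions made for illustration; the source only states that the client ID is stored as a repository variable and that the login action is called with a version tag.

```yaml
# .github/workflows/deploy.yml in the team repository
name: Deploy payment-service
on:
  push:
    branches: [main]

permissions:
  id-token: write   # lets the login action request a GitHub OIDC token
  contents: read

jobs:
  deploy:
    runs-on: ubuntu-latest
    environment: production
    env:
      OIDC_BROKER_URL: https://oidc-broker.example.com   # placeholder URL
    steps:
      - uses: actions/checkout@v4

      - name: Log in through the platform broker
        uses: my-org/platform-actions/login-to-azure@v1   # hypothetical path, pinned to a tag
        with:
          scope: payments-prod-deploy                     # hypothetical scope name

      - name: Deploy
        shell: bash
        env:
          AZURE_CLIENT_ID: ${{ vars.AZURE_CLIENT_ID }}    # variable written by the IAM framework
        run: echo "Deploying as client $AZURE_CLIENT_ID"
```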

Result: a new team goes from "we need Azure access" to "we're deploying to production" in a single PR review cycle.

The Scaling Arc: Patterns That Actually Matter

A 2025 practitioner survey of 419 GitHub Actions users found that while reusable actions see heavy adoption, reusable workflows remain underutilized — largely because teams fear versioning complexity and loss of control. This matches what I observed: teams resist reuse unless the abstraction is genuinely simpler than copy-paste.

The patterns that made reuse stick:

1. Composite Actions as the Building Block

Composite actions (not reusable workflows) are where you start. They're simpler to version, test, and compose. Our login-to-azure action is called by every framework's reusable workflow — it's the atomic unit.

2. Reusable Workflows as Contracts

Reusable workflows define the contract — "this is how you deploy infrastructure" or "this is how docs get published." GitHub recently expanded these to support 10 levels of nesting and 50 workflow calls per run, which validates the deep composition patterns we built early.

3. Trigger Type Literacy

The most underrated skill in Actions at scale: understanding trigger types deeply. workflow_call vs workflow_dispatch vs repository_dispatch each has fundamentally different trust boundaries and token behaviors. Most engineers treat them interchangeably — and then get bitten by permission escalation or silent failures.
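The differences are easier to see side by side. The workflow below is a minimal sketch that accepts all three triggers; the input name and the event type are made up for the example, and the comments summarize the documented behavior of each trigger.

```yaml
name: Trigger comparison
on:
  # Called from another workflow. Runs in the caller's context: the github
  # context, the GITHUB_TOKEN permissions ceiling, and the OIDC sub claim all
  # come from the caller. Secrets arrive only if the caller passes them,
  # either explicitly or with `secrets: inherit`.
  workflow_call:
    inputs:
      target:
        type: string
        required: true

  # Started manually from the UI, CLI, or REST API by a user with write
  # access. Runs against the branch or tag that user selects; inputs are typed.
  workflow_dispatch:
    inputs:
      target:
        type: string
        required: true

  # Started by POST /repos/{owner}/{repo}/dispatches. Only the workflow file
  # on the default branch runs, and the payload is free-form JSON.
  repository_dispatch:
    types: [platform-deploy]

jobs:
  show:
    runs-on: ubuntu-latest
    steps:
      - shell: bash
        env:
          # Under workflow_call this is the caller's event (for example push).
          EVENT: ${{ github.event_name }}
          # inputs is empty for repository_dispatch, so fall back to the payload.
          TARGET: ${{ inputs.target || github.event.client_payload.target }}
        run: echo "event=$EVENT target=$TARGET"
```

Because the repository_dispatch payload is untyped and supplied by whoever holds a token that can call the API, it deserves the same validation as any other external input.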

4. Central Repos as the Source of Truth

Each framework lives in a dedicated repo. Teams never fork; they call it by version tag, so a release reaches every caller through the tag they reference rather than through per-repo edits. Governance lives in one place.

From CI/CD to Intelligent System

The final evolution was adding intelligence on top of the platform. Using webhooks and GitHub Issues, we built:

  • AI-powered issue categorization: incoming platform issues get triaged automatically
  • Automated release notes: framework releases generate changelogs from PR descriptions
  • Policy drift detection: nightly runs compare actual Azure state against declared YAML

None of this required a separate tool. The identity layer, the reusable workflows, and the event system were already there. Intelligence was just another consumer of the same platform primitives.
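For the drift-detection item, a scheduled workflow is one plausible shape. The sketch below is an assumption-heavy stand-in, not the platform's implementation: it logs in with the public azure/login action instead of the broker, and it uses the Azure CLI what-if command to preview differences for a Bicep template rather than comparing against the team YAML. The variable names and the template path are hypothetical.

```yaml
name: Nightly drift check
on:
  schedule:
    - cron: "0 2 * * *"   # 02:00 UTC every day, always on the default branch
  workflow_dispatch: {}   # allow an on-demand run as well

permissions:
  id-token: write
  contents: read

jobs:
  what-if:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: azure/login@v2
        with:
          client-id: ${{ vars.AZURE_CLIENT_ID }}
          tenant-id: ${{ vars.AZURE_TENANT_ID }}
          subscription-id: ${{ vars.AZURE_SUBSCRIPTION_ID }}

      - name: Preview differences between declared and deployed state
        shell: bash
        run: |
          az deployment group what-if \
            --resource-group payments-prod \
            --template-file infra/main.bicep
```

The what-if step only prints the predicted changes; turning a non-empty result into a GitHub Issue would be a further step on top of this.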

Your Playbook: The Three-Step Recipe

If you're staring at 50+ repos with snowflake workflows, here's the path:

  1. Solve identity first. Whether you use GitHub's native OIDC (with the newer job_workflow_ref claims and repository custom properties) or build a broker — centralized, auditable identity is your foundation.

  2. Build frameworks, not pipelines. Each framework should be composable (composite action → reusable workflow → team YAML). Teams should define what they need, not how to get it.

  3. Scale the identity, not the humans. When a new team onboards, they shouldn't need a meeting. They define their requirements in YAML, the framework provisions everything, and identity flows through automatically.

AstraZeneca scaled 5,000 developers across 20,000 repositories on GitHub Enterprise using similar patterns — reusable Actions libraries with security baked in by default. The pattern works whether you're 50 engineers or 5,000.

The Bottom Line

GitHub Actions at enterprise scale isn't a YAML problem — it's a platform engineering problem. The organizations that scale are the ones that treat identity as infrastructure, workflows as contracts, and frameworks as products with versioned APIs.

I've written extensively about platform engineering with GitHub and how GitHub Actions debugging fits into this picture. If you're building internal developer platforms, the identity-first approach is the one architecture decision that makes everything else possible.

The recipe hasn't changed since I scaled to 1,000 repos: identify bottlenecks → codify them → scale identity. Everything else is implementation detail.
