<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Hector Flores</title>
    <description>The latest articles on DEV Community by Hector Flores (@htekdev).</description>
    <link>https://dev.to/htekdev</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2155191%2F4eb16de9-82ac-4486-b7cd-6c0ec2b33daf.png</url>
      <title>DEV Community: Hector Flores</title>
      <link>https://dev.to/htekdev</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/htekdev"/>
    <language>en</language>
    <item>
      <title>The 3 Pillars of Agentic DevOps: From Zero to Hero</title>
      <dc:creator>Hector Flores</dc:creator>
      <pubDate>Sat, 09 May 2026 13:02:22 +0000</pubDate>
      <link>https://dev.to/htekdev/the-3-pillars-of-agentic-devops-from-zero-to-hero-2fde</link>
      <guid>https://dev.to/htekdev/the-3-pillars-of-agentic-devops-from-zero-to-hero-2fde</guid>
      <description>&lt;h2&gt;
  
  
  Code Is No Longer the Asset — Workflows Are
&lt;/h2&gt;

&lt;p&gt;Today I'm doing a LinkedIn Live session called "Copilot Zero to Hero." The goal is to show how I went from basic Copilot autocomplete to a platform where 45+ agents, 63 skills, and 50 cron jobs autonomously maintain themselves — and how anyone can follow the same path.&lt;/p&gt;

&lt;p&gt;But here's what I realized while preparing: the journey from zero to hero isn't about learning more features. It's about building &lt;strong&gt;three continuous feedback loops&lt;/strong&gt; that compound on each other. I'm calling them the 3 Pillars of Agentic DevOps.&lt;/p&gt;

&lt;p&gt;If you've followed my writing on &lt;a href="https://htek.dev/articles/agentic-development-maturity-curve/" rel="noopener noreferrer"&gt;the agentic development maturity curve&lt;/a&gt;, you know the journey isn't linear. The 3 pillars map directly to three maturity levels: &lt;strong&gt;Builder&lt;/strong&gt;, &lt;strong&gt;Pro&lt;/strong&gt;, and &lt;strong&gt;Hero&lt;/strong&gt;. Each pillar unlocks the next. Skip one, and the whole thing wobbles.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The shift that matters: in traditional DevOps, code is the artifact you protect. In agentic DevOps, &lt;strong&gt;workflows are the artifact.&lt;/strong&gt; Your &lt;code&gt;copilot-instructions.md&lt;/code&gt;, your extension hooks, your cron schedules — these are the neural pathways that make autonomous development possible. Code is just the output.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Pillar 1: Continuous Instruction Improvement (Builder Level)
&lt;/h2&gt;

&lt;p&gt;The first pillar is the most accessible and the most underrated. It's also where every agentic DevOps journey should start: &lt;strong&gt;what does your agent actually know, and how do you improve that knowledge on demand?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Your &lt;code&gt;copilot-instructions.md&lt;/code&gt; is not a README. It's a neural pathway. It defines how the agent thinks about your project — your conventions, your architecture decisions, your hard-won lessons. I've written before about how &lt;a href="https://htek.dev/articles/context-engineering-key-to-ai-development/" rel="noopener noreferrer"&gt;context engineering&lt;/a&gt; is the real skill of agentic development — knowing &lt;em&gt;what to feed the agent&lt;/em&gt; matters more than prompt tricks. Every mistake the agent makes is a signal that this file is incomplete. Every correction you make should train it — permanently.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Context Is the Product
&lt;/h3&gt;

&lt;p&gt;Here's what most people get wrong: they treat &lt;code&gt;copilot-instructions.md&lt;/code&gt; as a one-time setup file. Write it once, forget it, wonder why the agent keeps making the same mistakes. I've called this the &lt;a href="https://htek.dev/articles/your-god-prompt-is-the-new-monolith/" rel="noopener noreferrer"&gt;god prompt anti-pattern&lt;/a&gt; — a static mega-document that tries to cover everything upfront and covers nothing well.&lt;/p&gt;

&lt;p&gt;The real power of Pillar 1 is treating your agent's context as a &lt;strong&gt;living system that you actively improve&lt;/strong&gt;. When I correct my agent — say, it commits directly to &lt;code&gt;main&lt;/code&gt; instead of creating a branch — I don't just say "don't do that." I tell it to persist the lesson:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# copilot-instructions.md — evolves with every correction&lt;/span&gt;

&lt;span class="gu"&gt;## Git Workflow&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; NEVER commit directly to main
&lt;span class="p"&gt;-&lt;/span&gt; Always create a feature branch via &lt;span class="sb"&gt;`git branch &amp;lt;name&amp;gt; main`&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Use git worktrees for parallel work
&lt;span class="p"&gt;-&lt;/span&gt; Every PR needs a descriptive title and body

&lt;span class="gu"&gt;## Extension Conventions&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Hook files live in .github/hooks/
&lt;span class="p"&gt;-&lt;/span&gt; Extension files live in .github/extensions/
&lt;span class="p"&gt;-&lt;/span&gt; Use .mjs extension for ES modules
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The correction flows into three complementary layers, each with a different scope:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;copilot-instructions.md&lt;/code&gt;&lt;/strong&gt; — file-based instructions checked into the repo. Every session loads this file automatically, so every contributor (human or AI) inherits your conventions. This is the most visible layer — version-controlled, reviewable, and shared across the team.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;store_memory&lt;/code&gt;&lt;/strong&gt; — the agent's cross-session repository memory. When the agent calls &lt;code&gt;store_memory&lt;/code&gt;, it persists a fact to the GitHub repository memory system. That fact is then available to &lt;em&gt;every future session&lt;/em&gt; working in the same repo — not just your sessions, but anyone's. This is how one developer's correction becomes institutional knowledge. The agent learns something at 2 PM, and a completely different session at 9 PM already knows it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;/chronicle&lt;/code&gt; and on-demand prompting&lt;/strong&gt; — active history mining (more on this below).&lt;/li&gt;
&lt;/ol&gt;
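
&lt;p&gt;To make the fan-out concrete, here is a pseudocode sketch of what "persist the lesson" means across the first two layers. The helper names and the &lt;code&gt;store_memory&lt;/code&gt; payload shape are illustrative, not the tool's exact schema:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Pseudocode: how one correction becomes permanent knowledge.
// appendToFile and the store_memory payload are illustrative only.

async function persistLesson(correction) {
  // Layer 1: version-controlled, reviewable, shared with the whole team
  await appendToFile(".github/copilot-instructions.md", `- ${correction.rule}`);

  // Layer 2: cross-session repository memory, available to every future session
  await store_memory({
    subject: correction.topic,   // e.g. "git-workflow"
    fact: correction.rule        // e.g. "never commit directly to main"
  });
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;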

&lt;p&gt;The result: &lt;strong&gt;the same mistake never happens twice.&lt;/strong&gt; After months of this, my &lt;code&gt;copilot-instructions.md&lt;/code&gt; has become a comprehensive operational manual — not because I wrote it top-down, but because every error carved a new neural pathway. And &lt;code&gt;store_memory&lt;/code&gt; has built a parallel layer of institutional knowledge that fills the gaps between documented conventions.&lt;/p&gt;

&lt;h3&gt;
  
  
  On-Demand Learning from History
&lt;/h3&gt;

&lt;p&gt;Here's where it gets powerful. The agent has access to the full transcript from all your sessions — every prompt, every response, every tool call. You can use that to improve context &lt;strong&gt;on demand&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://github.com/github/copilot-cli/releases" rel="noopener noreferrer"&gt;&lt;code&gt;/chronicle&lt;/code&gt; command&lt;/a&gt; (available to all users since &lt;a href="https://htek.dev/articles/copilot-cli-weekly-2026-05-01/" rel="noopener noreferrer"&gt;v1.0.40&lt;/a&gt;) is one way to do this — it analyzes session history and generates a summary of what happened, what worked, and what drifted. But &lt;code&gt;/chronicle&lt;/code&gt; is just a shortcut. The real power is that you can prompt the agent directly:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Look at my past session history. Find any learning patterns — mistakes I corrected, conventions I enforced, preferences I expressed. Update my memory and my &lt;code&gt;copilot-instructions.md&lt;/code&gt; file."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That's it. One prompt, and the agent mines your entire interaction history for lessons it should have learned. It finds the patterns you've been reinforcing manually and codifies them permanently.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cross-Session Learning
&lt;/h3&gt;

&lt;p&gt;This goes even further. You can teach the agent by pointing it at &lt;em&gt;other sessions&lt;/em&gt; — not just the current one:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Search through all my sessions for how I interact with GitHub Actions. Learn how I structure workflows, what patterns I follow, what mistakes I avoid. Create a skill from what you find."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Or even simpler:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Look at my last 10 sessions and update my &lt;code&gt;copilot-instructions.md&lt;/code&gt; with any conventions I've been consistently following but haven't documented yet."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is the Builder-level superpower: &lt;strong&gt;the agent learns from how you actually work, not just from what you explicitly tell it.&lt;/strong&gt; Every session becomes training data. Every correction compounds. The context gets richer over time — and the agent gets smarter with it.&lt;/p&gt;

&lt;p&gt;Combined with a &lt;strong&gt;nightly reflection agent&lt;/strong&gt; — a scheduled job that reviews the day's sessions and auto-updates instructions — Pillar 1 becomes a self-improving system. The agent literally gets smarter overnight. But you don't need automation to start. Just prompt it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pillar 2: Continuous CI/Feedback Integration (Pro Level)
&lt;/h2&gt;

&lt;p&gt;Pillar 1 taught you how to maintain your agent's context manually — correct mistakes, prompt for learning, mine session history. Pillar 2 is the natural next question: &lt;strong&gt;what if that context improved itself automatically, with real-world feedback?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That's the core idea. Once you understand that context is the product (Pillar 1), you start looking for places where you can &lt;em&gt;automatically consume context&lt;/em&gt; from the real world — test results, CI pipelines, runtime errors, deployment status — and feed it back into the agent's understanding. The format doesn't matter. Hooks, REPL loops, webhook listeners — the pattern is always the same: &lt;strong&gt;automatically bring real-world feedback into the agent's context.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://htek.dev/articles/github-copilot-cli-extensions-complete-guide/" rel="noopener noreferrer"&gt;GitHub Copilot CLI extensions&lt;/a&gt; make this concrete. Extensions use &lt;a href="https://docs.github.com/en/copilot/concepts/agents/cloud-agent/about-hooks" rel="noopener noreferrer"&gt;hooks&lt;/a&gt; — lifecycle events like &lt;code&gt;onPreToolUse&lt;/code&gt;, &lt;code&gt;onPostToolUse&lt;/code&gt;, and &lt;code&gt;onUserPromptSubmitted&lt;/code&gt; — to intercept agent actions and inject context from the outside world.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pattern 1: Pre-Push Gate (High-Quality Context Injection)
&lt;/h3&gt;

&lt;p&gt;This is the pattern that makes Pillar 2 click. Before the agent pushes code, an &lt;code&gt;onPreToolUse&lt;/code&gt; hook runs local tests. If they fail, the push is denied — and here's the key part — &lt;strong&gt;the agent receives the reason why&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// .github/hooks/pre-push-gate.json (simplified)&lt;/span&gt;
&lt;span class="c1"&gt;// Hook intercepts shell commands containing "git push"&lt;/span&gt;
&lt;span class="c1"&gt;// and runs "npm test" first — denying the push if tests fail.&lt;/span&gt;
&lt;span class="c1"&gt;//&lt;/span&gt;
&lt;span class="c1"&gt;// Actual hook config uses JSON + shell scripts.&lt;/span&gt;
&lt;span class="c1"&gt;// See the Extensions Complete Guide for full syntax.&lt;/span&gt;

&lt;span class="c1"&gt;// Pseudocode for the pattern:&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;onPreToolUse&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;toolName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;toolArgs&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;toolName&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;powershell&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// only intercept shell&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;toolArgs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;command&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nf"&gt;includes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;git push&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;testResult&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;execSync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;npm test&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;testResult&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;exitCode&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;permissionDecision&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;deny&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;reason&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`Tests failed — fix before pushing:\n&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;testResult&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;stderr&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is &lt;strong&gt;high-quality context&lt;/strong&gt; because of what happens next. The agent sees that it can't call the tool. It gets the &lt;em&gt;reason&lt;/em&gt; — the actual test failure output. And then it does something humans rarely do this fast: it augments its own context. "Oh, the test failed because I forgot to handle the null case. Let me fix that." It fixes the code, re-runs, and pushes clean. I explored this in depth in &lt;a href="https://htek.dev/articles/tests-are-everything-agentic-ai/" rel="noopener noreferrer"&gt;Tests Are Everything in Agentic AI&lt;/a&gt; — tests become the agent's guardrails, not just quality checks.&lt;/p&gt;

&lt;p&gt;But notice what just happened: the agent went right back to &lt;strong&gt;Pillar 1&lt;/strong&gt;. It updated its understanding of the problem. The test output &lt;em&gt;became context&lt;/em&gt;. If you've told the agent to persist lessons (Pillar 1), it might even update &lt;code&gt;copilot-instructions.md&lt;/code&gt; with "always handle null cases in this module." That's the feedback loop — &lt;strong&gt;Pillar 2 feeds Pillar 1 automatically.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I wrote about this pattern in depth in &lt;a href="https://htek.dev/articles/agent-proof-architecture-agentic-devops/" rel="noopener noreferrer"&gt;Agent-Proof Architecture&lt;/a&gt; and &lt;a href="https://htek.dev/articles/agent-hooks-controlling-ai-codebase/" rel="noopener noreferrer"&gt;Agent Hooks: Controlling Your AI Codebase&lt;/a&gt; — it makes agents &lt;strong&gt;structurally incapable&lt;/strong&gt; of pushing broken code.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pattern 2: Bringing CI Back Into the Inner Loop
&lt;/h3&gt;

&lt;p&gt;CI still matters; we're not getting rid of it. But here's the question Pillar 2 asks: &lt;strong&gt;how do we consume that CI context back into the agent's session?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Traditionally, CI runs in a separate world. The developer pushes, waits, checks a dashboard, reads logs, context-switches back to the code. With agents, we can close that loop entirely:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Conceptual pattern — the CI feedback loop&lt;/span&gt;
&lt;span class="c1"&gt;// Real implementation uses the extension SDK or hook shell scripts.&lt;/span&gt;
&lt;span class="c1"&gt;// See: /articles/ci-monitor-extension-agent-ci-feedback-loop&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;onPostToolUse&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;toolName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;toolName&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;powershell&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;stdout&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nf"&gt;includes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;git push&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="c1"&gt;// Poll CI status until resolved&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;pollCIUntilDone&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="c1"&gt;// Your CI API&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;failure&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;logs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;getCILogs&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`❌ CI failed:\n&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;logs&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;✅ CI passed&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I built a full version of this — the &lt;a href="https://htek.dev/articles/ci-monitor-extension-agent-ci-feedback-loop/" rel="noopener noreferrer"&gt;CI Monitor extension&lt;/a&gt; that let me walk away from my terminal entirely. The agent pushes, CI runs, the results flow back into the agent's conversation as context, and the agent fixes any failures. My role shifted from babysitter to reviewer.&lt;/p&gt;

&lt;p&gt;The innovation isn't replacing CI — it's &lt;strong&gt;consuming CI as context.&lt;/strong&gt; The CI pipeline becomes another source of automatic context improvement, just like test output from the pre-push hook.&lt;/p&gt;

&lt;h3&gt;
  
  
  Zero-Token CI Checks
&lt;/h3&gt;

&lt;p&gt;As of &lt;a href="https://htek.dev/articles/copilot-cli-weekly-2026-05-08/" rel="noopener noreferrer"&gt;v1.0.44&lt;/a&gt;, hooks can &lt;strong&gt;bypass the LLM entirely&lt;/strong&gt;. An &lt;code&gt;onUserPromptSubmitted&lt;/code&gt; hook can intercept a request like "what's the CI status?" and return the answer directly from your CI API — no model call, no token cost, instant response. This makes high-frequency operational queries essentially free.&lt;/p&gt;
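
&lt;p&gt;In the same pseudocode style as the earlier patterns (the return fields here are illustrative, not the real hook schema), the shape is:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Pseudocode: answer an operational query without ever calling the model.
// fetchCIStatus and the "handled" field are illustrative only.

async function onUserPromptSubmitted({ prompt }) {
  if (!/ci status/i.test(prompt)) return; // everything else goes to the LLM

  const status = await fetchCIStatus(); // your CI provider's API
  return {
    handled: true, // stop here: no model call, no tokens
    message: `CI status: ${status}`
  };
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;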

&lt;h3&gt;
  
  
  The Bigger Picture: Continuous Fault Analysis
&lt;/h3&gt;

&lt;p&gt;GitHub Next calls this &lt;strong&gt;Continuous Fault Analysis&lt;/strong&gt; — one of the disciplines in their &lt;a href="https://githubnext.com/projects/continuous-ai/" rel="noopener noreferrer"&gt;Continuous AI (CAI) concept&lt;/a&gt;. The idea: watch for failed CI runs, offer explanations with contextual insights, and close the feedback loop automatically. It's the same principle as &lt;a href="https://en.wikipedia.org/wiki/Shift-left_testing" rel="noopener noreferrer"&gt;shift-left testing&lt;/a&gt;, but shifted all the way left — into the agent's conversation, before the PR is even created.&lt;/p&gt;

&lt;p&gt;When I wrote &lt;a href="https://htek.dev/articles/agentic-devops-next-evolution-of-shift-left/" rel="noopener noreferrer"&gt;Agentic DevOps: The Next Evolution of Shift Left&lt;/a&gt;, I argued that agents create velocity so extreme we need DevOps designed for agents, not humans. Pillar 2 is how you build that: real-world feedback flowing automatically into the agent's context, creating a virtuous cycle with Pillar 1 that compounds with every iteration.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pillar 3: Continuous AI Maintenance (Hero Level)
&lt;/h2&gt;

&lt;p&gt;Here's the progression so far: Pillar 1 is manual context maintenance — you improve your agent's knowledge on demand. Pillar 2 is automated feedback integration — real-world signals flow back into context without you lifting a finger. Pillar 3 is the logical conclusion: &lt;strong&gt;Pillars 1 and 2 running continuously, autonomously, without human intervention.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is full agency. The system doesn't just consume feedback when triggered — it actively maintains itself. And it's not just code. Autonomous maintenance covers &lt;em&gt;everything&lt;/em&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Instruction files&lt;/strong&gt; — are they stale? Do they reflect how you actually work?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Skills&lt;/strong&gt; — are there contradictions? Extraction opportunities? Bloat?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test suites&lt;/strong&gt; — is coverage drifting? Are tests still relevant?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Codebase quality&lt;/strong&gt; — are PRs piling up? Are issues getting triaged?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SEO and content&lt;/strong&gt; — are links broken? Are descriptions current?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dependencies&lt;/strong&gt; — are packages outdated? Are there security advisories?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The question isn't "what should the agent maintain?" It's "what &lt;em&gt;shouldn't&lt;/em&gt; it maintain?"&lt;/p&gt;

&lt;h3&gt;
  
  
  Implementation: Cron Jobs and Agentic Workflows
&lt;/h3&gt;

&lt;p&gt;The simplest path to Pillar 3 is &lt;strong&gt;&lt;a href="https://htek.dev/articles/safe-openclaw-cron-iac-openshell/" rel="noopener noreferrer"&gt;cron jobs with GitHub Copilot CLI extensions&lt;/a&gt;&lt;/strong&gt; — scheduled agents that run autonomously on a cadence. The mechanism is straightforward: a Copilot CLI extension watches a schedule defined with cron expressions, and when a job is due, it spawns a fresh Copilot CLI agent to execute the task. The same extension machinery that powers Pillar 2 is what makes this possible. Here's what runs on my platform:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Skill Optimizer&lt;/strong&gt; — scans all 63 skills for extraction opportunities, contradictions, and bloat. Runs nightly.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context Auditor&lt;/strong&gt; — reviews agent memory files for staleness, redundancy, and drift. Creates tasks for anything it finds. This is Pillar 1 running on autopilot.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Repo Maintainer&lt;/strong&gt; — reviews open PRs, auto-merges safe ones, triages issues, and assigns work to other agents. Runs multiple times daily. This is Pillar 2 running on autopilot.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Nightly Reflection&lt;/strong&gt; — analyzes the day's sessions, identifies patterns, and updates &lt;code&gt;copilot-instructions.md&lt;/code&gt;. The ultimate Pillar 1 → Pillar 3 loop.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A cron-triggered agent looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json-doc"&gt;&lt;code&gt;&lt;span class="c1"&gt;// cron.json (simplified — actual schema may vary)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"jobs"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"context-auditor"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"schedule"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"0 2 * * *"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"agent"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"context-auditor"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Nightly audit of all agent context files"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"repo-maintainer"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"schedule"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"0 */4 * * *"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"agent"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"repo-maintainer"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Review PRs, triage issues every 4 hours"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each job launches a fresh agent instance with full context — the agent's memory files, the relevant skills, and a clear objective. It does its work, reports findings, and shuts down. No human trigger needed.&lt;/p&gt;

&lt;p&gt;But cron jobs aren't the only path. &lt;a href="https://htek.dev/articles/github-agentic-workflows-hands-on-guide/" rel="noopener noreferrer"&gt;GitHub Agentic Workflows&lt;/a&gt; offer another approach — defining maintenance tasks as GitHub Actions workflows that are triggered by events (PR opened, issue created, schedule) and executed by AI agents. Same principle, different mechanism. Both give you autonomous maintenance that runs without human intervention.&lt;/p&gt;
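
&lt;p&gt;An agentic workflow is declared as a markdown file with YAML frontmatter: the frontmatter defines the trigger, and the body is a natural-language task for the agent. A hypothetical sketch (the exact frontmatter keys depend on the tooling version):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;---
# .github/workflows/context-auditor.md (illustrative; keys vary by version)
on:
  schedule:
    - cron: "0 2 * * *"
permissions:
  contents: read
---

# Context Auditor

Review every agent memory file for staleness, redundancy, and drift.
Open an issue for anything that needs a human decision.
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;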

&lt;h3&gt;
  
  
  The Compounding Effect
&lt;/h3&gt;

&lt;p&gt;Here's what makes Pillar 3 a flywheel and not just automation. Every Pillar 3 agent &lt;em&gt;uses&lt;/em&gt; Pillars 1 and 2 in its own work. The context auditor maintains instructions (Pillar 1). The repo maintainer consumes CI feedback (Pillar 2). The skill optimizer extracts patterns that make future agents — including other Pillar 3 agents — more effective.&lt;/p&gt;

&lt;p&gt;The three pillars aren't independent — each one accelerates the others. After months of compounding:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;45+ specialized agents&lt;/strong&gt; covering everything from meal planning to code review&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;63 extracted skills&lt;/strong&gt; — reusable procedures any agent can invoke&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;22 extensions&lt;/strong&gt; with hooks enforcing quality at every operation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;50 cron jobs&lt;/strong&gt; running maintenance autonomously&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The system maintains itself. I review PRs, approve merges, and steer direction. The agents handle everything else. I &lt;a href="https://htek.dev/articles/copilot-home-assistant-ai-runs-my-household/" rel="noopener noreferrer"&gt;open-sourced the platform that runs my household&lt;/a&gt; — if you want to see what this looks like at scale, that's the reference implementation.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Zero-to-Hero Map
&lt;/h2&gt;

&lt;p&gt;Here's how the three pillars map to the maturity journey:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Level&lt;/th&gt;
&lt;th&gt;Pillar&lt;/th&gt;
&lt;th&gt;What Changes&lt;/th&gt;
&lt;th&gt;Key Skill&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Builder&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Continuous Instruction Improvement&lt;/td&gt;
&lt;td&gt;You maintain your agent's context on demand&lt;/td&gt;
&lt;td&gt;Evolving &lt;code&gt;copilot-instructions.md&lt;/code&gt; with every session&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Pro&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Continuous Feedback Integration&lt;/td&gt;
&lt;td&gt;Real-world signals flow into context automatically&lt;/td&gt;
&lt;td&gt;Building hooks that consume CI, tests, and runtime feedback&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Hero&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Continuous AI Maintenance&lt;/td&gt;
&lt;td&gt;Pillars 1+2 run autonomously — the system maintains itself&lt;/td&gt;
&lt;td&gt;Scheduling agents via cron jobs and agentic workflows&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;You don't need to jump to Hero on day one. Start with Pillar 1 — it takes 10 minutes to create a &lt;code&gt;copilot-instructions.md&lt;/code&gt; that captures your project's conventions. Every correction you make improves it. Within a week, you'll notice the agent making fewer mistakes. Within a month, it'll feel like a different tool.&lt;/p&gt;
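&lt;p&gt;To make that concrete, here's a minimal starting point. The section names and conventions below are illustrative placeholders, not a required schema; the file is free-form Markdown that the agent reads as context:&lt;/p&gt;

```markdown
# Project conventions (read before making changes)

## Stack
- Node 20, TypeScript strict mode, pnpm workspaces
- Tests: vitest. Run `pnpm test` before proposing a push.

## Conventions
- API handlers live in `src/api/`, one file per route.
- Never edit generated files under `src/gen/`.
- Prefer small PRs; split refactors from behavior changes.

## Known gotchas
- The dev DB stores timestamps in local time, not UTC.
```

&lt;p&gt;Every correction you give the agent becomes a candidate line for this file.&lt;/p&gt;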

&lt;p&gt;Then add Pillar 2 — a single &lt;code&gt;onPreToolUse&lt;/code&gt; hook that runs &lt;code&gt;npm test&lt;/code&gt; before allowing pushes. That one hook changes your entire relationship with the agent. Suddenly you can walk away.&lt;/p&gt;

&lt;p&gt;Pillar 3 is where it gets addictive. Once you've seen an agent review PRs at 2 AM, you start wondering what else it could do overnight.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;Agentic DevOps isn't about picking the right AI model or writing the perfect prompt. It's about building three continuous feedback loops — instruction improvement, CI integration, and autonomous maintenance — that compound on each other until the system runs itself.&lt;/p&gt;

&lt;p&gt;The code your agents write is ephemeral. The workflows you build around them are the real asset. Your &lt;code&gt;copilot-instructions.md&lt;/code&gt;, your extension hooks, your cron schedules — these are the infrastructure of autonomous development. Invest there.&lt;/p&gt;

&lt;p&gt;If you caught the LinkedIn Live, you saw these pillars in action. If you missed it, everything I demonstrated is built with &lt;a href="https://htek.dev/articles/github-copilot-cli-extensions-complete-guide/" rel="noopener noreferrer"&gt;GitHub Copilot CLI extensions&lt;/a&gt; and the patterns I've been writing about here on htek.dev. Start with Pillar 1, and build from there.&lt;/p&gt;

&lt;p&gt;If you want help implementing these pillars for your team — from setting up &lt;code&gt;copilot-instructions.md&lt;/code&gt; that actually evolves, to building CI feedback hooks, to deploying autonomous maintenance agents — &lt;a href="https://htek.dev/consulting" rel="noopener noreferrer"&gt;check out my consulting services&lt;/a&gt; or &lt;a href="https://htek.dev/services" rel="noopener noreferrer"&gt;explore how I can help&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>github</category>
      <category>devops</category>
      <category>automation</category>
    </item>
    <item>
      <title>Copilot CLI Weekly: Rubber Duck Goes Universal, Enterprise Plugins Launch</title>
      <dc:creator>Hector Flores</dc:creator>
      <pubDate>Fri, 08 May 2026 20:07:43 +0000</pubDate>
      <link>https://dev.to/htekdev/copilot-cli-weekly-rubber-duck-goes-universal-enterprise-plugins-launch-4k1g</link>
      <guid>https://dev.to/htekdev/copilot-cli-weekly-rubber-duck-goes-universal-enterprise-plugins-launch-4k1g</guid>
      <description>&lt;h2&gt;
  
  
  Rubber Duck Now Works With Any Model Family
&lt;/h2&gt;

&lt;p&gt;On May 7, GitHub expanded Rubber Duck — the cross-family review agent I &lt;a href="https://htek.dev/articles/copilot-cli-weekly-2026-04-10/" rel="noopener noreferrer"&gt;covered when it launched last month&lt;/a&gt; — to work bidirectionally across model families. When Rubber Duck first shipped in experimental mode, it was Claude-only: pick a Claude model as your orchestrator, and Rubber Duck would dispatch GPT-5.4 to review the work. That closed 74.7% of the performance gap between Sonnet and Opus on hard multi-file problems.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://github.blog/changelog/2026-05-07-rubber-duck-in-github-copilot-cli-now-supports-more-models/" rel="noopener noreferrer"&gt;May 7 changelog&lt;/a&gt; flips the script: &lt;strong&gt;if you're using a GPT model as your orchestrator, Rubber Duck now dispatches a Claude-powered critic&lt;/strong&gt;. The same architectural review, subtle bug catching, and cross-file conflict detection now applies to GPT-driven sessions.&lt;/p&gt;

&lt;p&gt;For Claude sessions, the update upgrades the reviewer model to GPT-5.5 (previously GPT-5.4), bringing a stronger second opinion to the table. The net effect is that Rubber Duck is no longer a Claude feature — it's a universal cross-family review layer that works regardless of which model you pick as your primary orchestrator.&lt;/p&gt;

&lt;p&gt;To use it, run &lt;code&gt;copilot&lt;/code&gt; and toggle &lt;code&gt;/experimental on&lt;/code&gt;. You'll see Rubber Duck critiques surface after file edits, during planning checkpoints, and when the agent requests a second opinion. The critiques are terse and high-signal: "This migration doesn't handle the foreign key cascade," "The batch size will OOM on inputs &amp;gt; 10K rows," "This assumes UTC but the DB stores local time." These are the kinds of things a second pair of eyes catches.&lt;/p&gt;


&lt;p&gt;I've been running with Rubber Duck enabled since April and the false positive rate is low. When it flags something, it's usually right. The feature still lives in &lt;code&gt;/experimental&lt;/code&gt;, but it feels production-ready. I expect it to graduate soon.&lt;/p&gt;

&lt;h2&gt;
  
  
  Enterprise-Managed Plugins Hit Public Preview
&lt;/h2&gt;

&lt;p&gt;On May 6, GitHub shipped &lt;a href="https://github.blog/changelog/2026-05-06-enterprise-managed-plugins-in-github-copilot-cli-are-now-in-public-preview/" rel="noopener noreferrer"&gt;enterprise-managed plugins in public preview&lt;/a&gt;, giving enterprise admins the ability to configure and distribute plugins across all Copilot CLI users in their org. This is a governance and onboarding win.&lt;/p&gt;

&lt;p&gt;Before this, if you wanted every engineer in your enterprise to have access to a custom agent, an internal MCP server, or a standard set of hooks, you distributed documentation and hoped people followed it. "Clone this repo, copy these files, run this install script, restart your CLI." Some people did it. Most didn't. Onboarding new hires meant repeating the process.&lt;/p&gt;

&lt;p&gt;With enterprise-managed plugins, admins commit a &lt;code&gt;settings.json&lt;/code&gt; file under &lt;code&gt;.github/copilot/&lt;/code&gt; in the organization's &lt;code&gt;.github-private&lt;/code&gt; repo, and Copilot CLI pulls it automatically for any user authenticated via Copilot Business or Enterprise. The settings file specifies plugin marketplaces, auto-installed plugins, and baseline configurations. When a user runs &lt;code&gt;copilot&lt;/code&gt;, the client syncs the enterprise config and applies it.&lt;/p&gt;
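&lt;p&gt;As a rough sketch, the file might look something like this. The field names are illustrative guesses at the shape, not the documented schema; check the changelog linked above for the real reference:&lt;/p&gt;

```json
{
  "$comment": "Illustrative only: field names are assumptions, not the documented schema",
  "marketplaces": [
    { "name": "internal", "url": "https://plugins.example.internal" }
  ],
  "autoInstall": ["deploy-agent", "codebase-navigator"],
  "defaults": {
    "hooks": ["org-context-hook"]
  }
}
```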

&lt;p&gt;This means you can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Standardize custom agents&lt;/strong&gt; — If your team built a deployment agent or a codebase navigator, you can ensure every CLI user has it without manual setup.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enforce hooks and MCP configs&lt;/strong&gt; — Define hooks that run on every session (e.g., to enforce context rules or inject environment-specific data) and they're always active across your enterprise.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reduce setup friction&lt;/strong&gt; — New hires authenticate with Copilot CLI and the enterprise baseline just works.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The configuration lives in the same &lt;code&gt;.github-private&lt;/code&gt; repo you'd use for custom agents (if you've already set one up under AI controls in your enterprise settings). If you haven't configured a source org yet, you'll need to do that first via the Agents page in your enterprise admin panel.&lt;/p&gt;

&lt;p&gt;The practical impact here is that enterprises can finally treat Copilot CLI customizations as infrastructure-as-code instead of tribal knowledge. If you're managing Copilot for a large org, this is the feature you've been waiting for since the CLI went GA.&lt;/p&gt;

&lt;h2&gt;
  
  
  Hooks Can Now Bypass the LLM Entirely
&lt;/h2&gt;

&lt;p&gt;In the &lt;a href="https://github.com/github/copilot-cli/releases" rel="noopener noreferrer"&gt;v1.0.44-3 prerelease&lt;/a&gt; (published May 8), GitHub added the ability for &lt;code&gt;userPromptSubmitted&lt;/code&gt; hooks to handle requests directly and return a response without making a model call. This is a significant architectural shift in what hooks can do.&lt;/p&gt;

&lt;p&gt;Previously, hooks were preprocessors: inspect the user's prompt, modify it, add context, or reject it, but you always handed control back to the LLM for the actual response. Now hooks can be full interceptors. If a hook decides it knows how to handle the request without involving the model, it can return the response itself.&lt;/p&gt;

&lt;p&gt;The use cases are obvious:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;FAQ or documentation lookup&lt;/strong&gt; — If the user asks "What's the deploy command?" and you have a knowledge base, the hook can return the answer instantly without burning a model call.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Canned responses&lt;/strong&gt; — Common requests like "Show me the latest logs" or "What's my account status?" can be handled by hitting an API directly.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Policy enforcement&lt;/strong&gt; — Block certain requests entirely and return a rejection message without consulting the LLM.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Latency-critical flows&lt;/strong&gt; — For requests where you know the answer and speed matters, skip the model inference latency.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The implementation is clean: hooks return a &lt;code&gt;response&lt;/code&gt; field in the result, and if it's present, the CLI surfaces it to the user and ends the turn. No model call, no token usage, instant feedback. This pairs nicely with the enterprise plugin system — you can distribute hooks that intercept high-frequency patterns and handle them locally, reducing load and improving response time.&lt;/p&gt;
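&lt;p&gt;A minimal sketch of the pattern, assuming the hook receives the prompt text and can short-circuit by returning a &lt;code&gt;response&lt;/code&gt; field. The event shape and the FAQ table are illustrative assumptions, not the documented API:&lt;/p&gt;

```javascript
// Hypothetical userPromptSubmitted hook that answers known FAQ prompts
// locally. Returning a `response` field ends the turn with no model call;
// returning an empty object hands the prompt to the LLM as usual.
// The event shape ({ prompt }) and FAQ contents are illustrative.
const FAQ = new Map([
  ["what's the deploy command?", "Run `make deploy ENV=staging`. See docs/deploy.md."],
  ["where are the runbooks?", "Runbooks live under ops/runbooks/ in the infra repo."],
]);

function userPromptSubmitted(event) {
  const answer = FAQ.get(event.prompt.trim().toLowerCase());
  if (answer !== undefined) return { response: answer };
  return {}; // no match: fall through to the model
}

module.exports = { userPromptSubmitted };
```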

&lt;p&gt;This is still in prerelease, so expect the API to stabilize before it hits the main release channel. But the direction is clear: hooks are evolving from passive filters to active participants in the request/response flow.&lt;/p&gt;

&lt;h2&gt;
  
  
  Other Highlights from v1.0.43 and v1.0.44
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Security fix for RCE&lt;/strong&gt; — v1.0.43 patched a vulnerability where malicious bare repositories nested inside a project could trigger remote code execution. If you're working in untrusted codebases, update immediately. (&lt;a href="https://github.com/github/copilot-cli/security/advisories/GHSA-9ccr-r5hg-74gf" rel="noopener noreferrer"&gt;CVE details&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Auto mode now uses server-side routing&lt;/strong&gt; — The &lt;code&gt;auto&lt;/code&gt; model selector delegates routing decisions to the server for more dynamic real-time model selection.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Faster multi-account switching&lt;/strong&gt; — &lt;code&gt;/user list&lt;/code&gt; and &lt;code&gt;/user switch&lt;/code&gt; are noticeably faster for users managing multiple GitHub accounts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Shell aliases work in ! commands&lt;/strong&gt; — v1.0.44-1 fixed a long-standing bug where shell aliases and rc file settings weren't respected in &lt;code&gt;!&lt;/code&gt; prefix commands.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prerelease update flag&lt;/strong&gt; — &lt;code&gt;copilot update --prerelease&lt;/code&gt; now lets you fetch the latest prerelease build without manually downloading from GitHub.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Username in statusline&lt;/strong&gt; — You can now toggle the active account display in the footer via &lt;code&gt;/statusline&lt;/code&gt;, useful when switching between personal and work accounts.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;This week's updates reinforce two themes: &lt;strong&gt;cross-model collaboration is table stakes&lt;/strong&gt; (Rubber Duck going universal), and &lt;strong&gt;enterprise adoption is accelerating&lt;/strong&gt; (centralized plugin distribution, hook-based governance). The fact that hooks can now bypass the LLM entirely is a quieter change, but it's architecturally significant — it shifts the CLI from a stateless LLM wrapper to a programmable agentic runtime.&lt;/p&gt;

&lt;p&gt;If you're running Copilot CLI in an enterprise context, the plugin management feature is worth setting up this week. If you're experimenting with &lt;code&gt;/experimental&lt;/code&gt;, turn on Rubber Duck and watch how often the second opinion catches things your primary model missed. And if you're building custom hooks, start thinking about what you can handle locally without a model call — the performance gains are substantial for high-frequency patterns.&lt;/p&gt;

&lt;p&gt;Next week I'll be digging into the security advisory in more detail and how the nested bare repo RCE could have been exploited. Until then, &lt;code&gt;copilot update&lt;/code&gt; and ship something.&lt;/p&gt;

</description>
      <category>github</category>
      <category>devex</category>
      <category>ai</category>
      <category>opensource</category>
    </item>
    <item>
      <title>CI/CD/...CAI? Continuous AI and the Evolution of DevOps in the Agentic Era</title>
      <dc:creator>Hector Flores</dc:creator>
      <pubDate>Fri, 08 May 2026 19:26:54 +0000</pubDate>
      <link>https://dev.to/htekdev/cicdcai-continuous-ai-and-the-evolution-of-devops-in-the-agentic-era-1p5j</link>
      <guid>https://dev.to/htekdev/cicdcai-continuous-ai-and-the-evolution-of-devops-in-the-agentic-era-1p5j</guid>
      <description>&lt;h2&gt;
  
  
  DevOps Has a New Branch — And It's Not Optional
&lt;/h2&gt;

&lt;p&gt;You know CI. You know CD. Now there's a new acronym muscling its way into the DevOps lexicon: &lt;strong&gt;CAI — Continuous AI&lt;/strong&gt;. And if you're a DevOps engineer, SRE, or platform engineer who hasn't started paying attention, you're already behind.&lt;/p&gt;

&lt;p&gt;This isn't hype. The &lt;a href="https://dora.dev/dora-report-2025/" rel="noopener noreferrer"&gt;2025 DORA Report&lt;/a&gt; — now titled &lt;strong&gt;"State of AI-assisted Software Development"&lt;/strong&gt; — surveyed &lt;a href="https://cloud.google.com/blog/products/ai-machine-learning/announcing-the-2025-dora-report" rel="noopener noreferrer"&gt;nearly 5,000 technology professionals&lt;/a&gt; and found that &lt;a href="https://hyperdev.matsuoka.com/p/dora-2025-ai-as-amplifier-not-magic" rel="noopener noreferrer"&gt;90% already use AI in their development workflow&lt;/a&gt;. But only 17% use autonomous agents. That gap is where the opportunity lives — and where the danger hides. Teams with strong DevOps foundations see amplified returns from AI adoption. Teams without them see a &lt;a href="https://hyperdev.matsuoka.com/p/dora-2025-ai-as-amplifier-not-magic" rel="noopener noreferrer"&gt;7.2% drop in delivery stability&lt;/a&gt;. AI doesn't fix broken processes. It magnifies them.&lt;/p&gt;

&lt;p&gt;In February 2026, GitHub launched &lt;a href="https://github.blog/ai-and-ml/automate-repository-tasks-with-github-agentic-workflows/" rel="noopener noreferrer"&gt;Agentic Workflows&lt;/a&gt; in technical preview — AI agents running inside GitHub Actions, authored in Markdown instead of YAML. Gartner projects &lt;a href="https://www.buildmvpfast.com/blog/ai-agents-ci-cd-pipeline-devops-automation-2026" rel="noopener noreferrer"&gt;90% of enterprise software engineers will use AI code assistants by 2028&lt;/a&gt;. The entire DevOps discipline is evolving, and Continuous AI is the branch that's driving that evolution.&lt;/p&gt;
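&lt;p&gt;To make "authored in Markdown instead of YAML" concrete, an agentic workflow file reads roughly like the sketch below: a small frontmatter block for the trigger and permissions, then a natural-language job description. The exact frontmatter keys are illustrative, not the documented schema:&lt;/p&gt;

```markdown
---
on:
  issues:
    types: [opened]
permissions:
  issues: write
---

# Issue triage

When a new issue is opened, read it and the repository's labels.
Apply the labels that fit, add a one-paragraph summary as a comment,
and flag anything that looks like a security report for a human.
```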

&lt;p&gt;I've been writing about this shift for months — from &lt;a href="https://htek.dev/articles/agentic-devops-next-evolution-of-shift-left/" rel="noopener noreferrer"&gt;the next evolution of shift left&lt;/a&gt; to &lt;a href="https://htek.dev/articles/agent-proof-architecture-agentic-devops/" rel="noopener noreferrer"&gt;building agent-proof architecture&lt;/a&gt; to &lt;a href="https://htek.dev/articles/github-agentic-workflows-hands-on-guide/" rel="noopener noreferrer"&gt;hands-on agentic workflows&lt;/a&gt;. But every article covered one piece. This guide is the whole map — a comprehensive walkthrough of how DevOps is evolving from deterministic pipelines to AI-augmented software delivery, and what that means for every DevOps engineer's career.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Six Concepts: A Layered Evolution
&lt;/h2&gt;

&lt;p&gt;Before diving deep, here's the landscape at a glance. These six concepts aren't competing alternatives — they're layers that build on each other:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Concept&lt;/th&gt;
&lt;th&gt;Core Question&lt;/th&gt;
&lt;th&gt;AI Direction&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Traditional DevOps&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;How do we unify dev and ops?&lt;/td&gt;
&lt;td&gt;No AI required&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;CI/CD&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;How do we automate build → deploy?&lt;/td&gt;
&lt;td&gt;No AI required&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Continuous AI&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;How do we systematically apply AI to collaboration?&lt;/td&gt;
&lt;td&gt;AI as continuous practice&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Agentic DevOps&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;How do we make pipelines intelligent?&lt;/td&gt;
&lt;td&gt;AI augments DevOps&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;DevOps for Agents&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;How do we govern AI agents?&lt;/td&gt;
&lt;td&gt;DevOps constrains AI&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;GitHub Agentic Workflows&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;How do we automate repos with AI?&lt;/td&gt;
&lt;td&gt;Platform convergence&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;The critical insight:&lt;/strong&gt; Concepts 4 and 5 look similar but face opposite directions. Agentic DevOps puts AI &lt;em&gt;inside&lt;/em&gt; your pipeline. DevOps for Agents wraps &lt;em&gt;your pipeline around&lt;/em&gt; AI. Continuous AI is the methodology that guides both. GitHub Agentic Workflows is the platform where all directions converge.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;These six concepts nest inside each other. DevOps culture is the outermost layer, the foundation everything else sits on. CI/CD lives inside that as the automation backbone. Continuous AI is the methodology for extending automation to tasks that require judgment. Its two sub-disciplines, &lt;strong&gt;Agentic DevOps&lt;/strong&gt; (making the pipeline smarter) and &lt;strong&gt;DevOps for Agents&lt;/strong&gt; (making agents safer), face the opposite directions described in the callout above, and GitHub Agentic Workflows sits at the convergence point where both directions meet on a single platform.&lt;/p&gt;

&lt;p&gt;You can't skip layers. Every team I've seen fail at agentic adoption tried to jump straight to autonomous agents without solid CI/CD and testing. The &lt;a href="https://dora.dev/dora-report-2025/" rel="noopener noreferrer"&gt;2025 DORA data&lt;/a&gt; confirms this — AI amplifies whatever you already have. The six-concept model ensures you build the floor before the ceiling.&lt;/p&gt;

&lt;p&gt;Let's walk through each layer in detail.&lt;/p&gt;

&lt;h2&gt;
  
  
  Traditional DevOps: The Cultural Foundation
&lt;/h2&gt;

&lt;p&gt;DevOps isn't a tool — it's a cultural and organizational philosophy. Coined around 2009 and formalized through the &lt;a href="https://itrevolution.com/product/the-phoenix-project/" rel="noopener noreferrer"&gt;Phoenix Project&lt;/a&gt;, DORA metrics, and the DevOps Handbook, it breaks down silos between development and operations through shared ownership, feedback loops, and continuous improvement.&lt;/p&gt;

&lt;p&gt;The core principles haven't changed in 15 years:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Break down silos&lt;/strong&gt; between development and operations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automate everything&lt;/strong&gt; that can be automated&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Measure and improve&lt;/strong&gt; continuously (DORA metrics: deployment frequency, lead time, change failure rate, MTTR)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Shift left&lt;/strong&gt; — move testing and validation earlier in the lifecycle&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Infrastructure as Code&lt;/strong&gt; — treat infrastructure with the same rigor as application code&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Blameless postmortems&lt;/strong&gt; — learn from failure, don't punish it&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Every modern software organization practices some form of DevOps. The &lt;a href="https://dora.dev/dora-report-2025/" rel="noopener noreferrer"&gt;2025 DORA Report&lt;/a&gt; — renamed from "Accelerate: State of DevOps" to &lt;strong&gt;"State of AI-assisted Software Development"&lt;/strong&gt; — confirms the formula still works: teams with strong DevOps practices ship faster, more reliably, and with fewer failures.&lt;/p&gt;

&lt;p&gt;The renaming itself is significant. DORA's research team, led by &lt;a href="https://cloud.google.com/blog/products/ai-machine-learning/announcing-the-2025-dora-report" rel="noopener noreferrer"&gt;Nathen Harvey and Derek DeBellis&lt;/a&gt;, deliberately reframed the entire report around AI because the data demanded it — 90% of the nearly 5,000 respondents now use AI tools in their workflow. AI isn't a feature anymore; it's the environment.&lt;/p&gt;

&lt;p&gt;The report reveals something crucial for the AI era — AI acts as a &lt;strong&gt;magnifying glass&lt;/strong&gt; for existing organizational health. The DORA team identified &lt;a href="https://hyperdev.matsuoka.com/p/dora-2025-ai-as-amplifier-not-magic" rel="noopener noreferrer"&gt;seven organizational capabilities that determine AI success&lt;/a&gt;: platform quality, data access, version control maturity, small batch sizes, user focus, clear AI policies, and organizational AI stance. Strong DevOps foundations see amplified returns from AI adoption. Weak foundations see amplified chaos — with a measurable 7.2% drop in delivery stability for struggling teams. My deep dive into the &lt;a href="https://htek.dev/articles/stanford-study-ai-roi-in-engineering/" rel="noopener noreferrer"&gt;Stanford study on AI ROI&lt;/a&gt; found the same pattern — the biggest productivity gains go to teams with the strongest engineering practices already in place.&lt;/p&gt;

&lt;p&gt;But DevOps itself didn't appear fully formed. It evolved through distinct waves, each one solving the previous era's pain while creating new complexity. The progression went like this: manual operations → shell scripts and cron jobs → &lt;strong&gt;configuration management&lt;/strong&gt; tools like &lt;a href="https://www.puppet.com/" rel="noopener noreferrer"&gt;Puppet&lt;/a&gt; and &lt;a href="https://www.chef.io/" rel="noopener noreferrer"&gt;Chef&lt;/a&gt; (2011) → &lt;strong&gt;Docker&lt;/strong&gt; containers (2013) → &lt;strong&gt;Kubernetes&lt;/strong&gt; orchestration (2015) → &lt;strong&gt;GitOps&lt;/strong&gt; with &lt;a href="https://fluxcd.io/" rel="noopener noreferrer"&gt;Flux&lt;/a&gt; and &lt;a href="https://argoproj.github.io/cd/" rel="noopener noreferrer"&gt;ArgoCD&lt;/a&gt; (2017) → &lt;strong&gt;Platform Engineering&lt;/strong&gt; (2022+). Each wave was a response to the shortcomings of the one before it.&lt;/p&gt;

&lt;p&gt;Configuration management solved "works on my machine" by codifying server state — but introduced its own language sprawl (Puppet DSL vs. Chef Ruby vs. Ansible YAML). Docker solved dependency hell by containerizing everything — but created image sprawl and a new layer of networking complexity. Kubernetes solved container orchestration at scale — but demanded a small army of YAML manifests to operate. GitOps solved configuration drift by making Git the single source of truth — but added yet another abstraction layer on top of already-deep stacks. And Platform Engineering emerged because teams realized they'd built so many layers that nobody could onboard without a dedicated internal platform team to smooth the sharp edges.&lt;/p&gt;

&lt;p&gt;The result? By 2023, the &lt;a href="https://dora.dev/" rel="noopener noreferrer"&gt;State of DevOps report&lt;/a&gt; identified &lt;strong&gt;configuration management complexity&lt;/strong&gt; as the top pain point for engineering teams. The industry had traded one kind of manual labor (SSH-ing into servers) for another (maintaining thousands of lines of declarative YAML across dozens of tools). The irony wasn't lost on anyone: DevOps was supposed to automate toil, but the automation itself had become toil. This is the context that makes Continuous AI feel less like a bolt-on and more like an inevitable next step — applying AI reasoning to the very configuration complexity that DevOps created.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why DevOps Alone Isn't Enough
&lt;/h3&gt;

&lt;p&gt;Traditional DevOps has a ceiling:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Deterministic automation&lt;/strong&gt; — it only does exactly what you script it to do&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Human-speed feedback loops&lt;/strong&gt; — PR reviews take hours, CI takes minutes, but the developer has already context-switched&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Brittle automation&lt;/strong&gt; — when environments drift or zero-days appear at 3 AM, the system waits for a human&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reactive posture&lt;/strong&gt; — responds to events rather than anticipating them&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These limitations didn't matter much at human development velocity. They matter enormously when AI agents generate hundreds of lines per minute.&lt;/p&gt;

&lt;h2&gt;
  
  
  CI/CD: The Automation Backbone
&lt;/h2&gt;

&lt;p&gt;CI/CD is the specific technical engine within DevOps that automates the build-test-deploy pipeline. It's worth separating from DevOps because it's the foundation that everything agentic builds upon.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Continuous Integration (CI):&lt;/strong&gt; Developers frequently merge code into a shared branch; automated builds and tests run on every change&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Continuous Delivery (CD):&lt;/strong&gt; Every code change that passes CI is automatically prepared for release&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Continuous Deployment:&lt;/strong&gt; Extends CD by deploying every passing change to production without a human gate&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The ecosystem is mature — GitHub Actions, Jenkins, CircleCI, ArgoCD, Flux — and the practices are industry-standard. CI/CD enables daily (or hourly) deployments, catches bugs before production, and provides reproducible, auditable builds.&lt;/p&gt;

&lt;p&gt;The evolution of CI/CD mirrors the broader DevOps wave pattern. Early CI servers like &lt;a href="https://www.jenkins.io/" rel="noopener noreferrer"&gt;Jenkins&lt;/a&gt; (2011) gave teams automated builds but required manual Groovy pipeline scripts. &lt;a href="https://www.travis-ci.com/" rel="noopener noreferrer"&gt;Travis CI&lt;/a&gt; introduced declarative YAML pipelines (~2013), which was liberating at first — until teams realized they were now debugging YAML indentation instead of shell scripts. GitHub Actions (2019) made CI/CD native to the repository, eliminating the "separate CI server" problem, but introduced its own complexity: composite actions, reusable workflows, matrix strategies, and OIDC federation.&lt;/p&gt;

&lt;p&gt;By 2024, the average enterprise repository had hundreds of lines of workflow YAML. The phenomenon known as &lt;strong&gt;"YAML hell"&lt;/strong&gt; became a running joke — and a real productivity drain. Pipeline configurations ballooned into sprawling, brittle manifests that nobody on the team fully understood. A single misplaced indent could silently break a deploy. The &lt;a href="https://dora.dev/" rel="noopener noreferrer"&gt;2023 State of DevOps survey&lt;/a&gt; found that &lt;strong&gt;configuration management&lt;/strong&gt; topped the list of pain points for engineering teams — more frustrating than testing, security, or even deployment. This is the world Continuous AI is stepping into: a world where the automation infrastructure itself has become the bottleneck.&lt;/p&gt;

&lt;h3&gt;
  
  
  Where CI/CD Hits Its Limits
&lt;/h3&gt;

&lt;p&gt;But CI/CD is deterministic by design, and that's simultaneously its strength and its limitation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Post-facto feedback&lt;/strong&gt; — by the time CI catches a bug, the developer has mentally moved on&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;YAML complexity&lt;/strong&gt; — large pipelines become nightmares to maintain ("YAML hell" is a real phenomenon)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cannot reason about intent&lt;/strong&gt; — CI/CD executes predefined steps; it can't figure out &lt;em&gt;why&lt;/em&gt; something failed or propose a fix&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Human bottleneck&lt;/strong&gt; — PR reviews, manual approvals, and environment promotions still require human time and attention&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No adaptive behavior&lt;/strong&gt; — when a pipeline fails in a new way, it can't investigate or self-correct&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;CI/CD is the backbone, but it needs intelligence. Enter Continuous AI.&lt;/p&gt;

&lt;h2&gt;
  
  
  Continuous AI: The Methodology for AI in the SDLC
&lt;/h2&gt;

&lt;p&gt;This is where the story gets interesting. &lt;strong&gt;Continuous AI&lt;/strong&gt; is a methodology and conceptual framework coined by &lt;a href="https://githubnext.com/projects/continuous-ai/" rel="noopener noreferrer"&gt;Idan Gazit, head of GitHub Next&lt;/a&gt;, for the systematic, continuous application of AI reasoning to tasks across the software development lifecycle that CI/CD was never designed to handle — tasks requiring judgment, interpretation, and context rather than deterministic execution.&lt;/p&gt;

&lt;p&gt;Continuous AI is &lt;strong&gt;not a product&lt;/strong&gt; — it's a category, a pattern, a way of thinking. As Gazit puts it: "Not a term GitHub owns, nor a technology GitHub builds: it's a term we use to focus our minds." GitHub expects Continuous AI to be "a story that runs for 30+ years at GitHub, just like CI/CD."&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;The analogy:&lt;/strong&gt; Continuous AI is to GitHub Agentic Workflows what CI/CD is to GitHub Actions. CI/CD is the concept; GitHub Actions is one implementation. Continuous AI is the concept; GitHub Agentic Workflows is one implementation.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  The Core Formula
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Continuous AI = natural-language rules + agentic reasoning, executed continuously inside your repository.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Four foundational principles:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Context Awareness&lt;/strong&gt; — AI understands your codebase, diffs, terminal outputs, configuration, and docs — what I call &lt;a href="https://htek.dev/articles/context-engineering-key-to-ai-development/" rel="noopener noreferrer"&gt;context engineering&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Seamless Integration&lt;/strong&gt; — AI lives inside your IDE and pipeline, rather than being copy-pasted to and from external tools&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Continuous Execution&lt;/strong&gt; — AI runs automatically on repository events, not only when manually invoked&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Developer Control&lt;/strong&gt; — developers remain the final authority over all AI-proposed changes&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Continuous AI Subcategories
&lt;/h3&gt;

&lt;p&gt;Continuous AI manifests as specialized, repeatable patterns — each applying AI to a specific aspect of software collaboration:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Subcategory&lt;/th&gt;
&lt;th&gt;What It Does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Continuous Documentation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Keep docs in sync with code changes automatically&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Continuous Code Review&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;AI-powered PR reviews for security, quality, architecture&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Continuous Triage&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Label, summarize, and respond to issues with AI&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Continuous Test Improvement&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Assess coverage gaps, generate targeted tests&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Continuous Security&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;AI-driven vulnerability scanning and analysis&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Continuous Fault Analysis&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Watch CI failures, offer explanations and fix proposals&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Continuous Quality&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;LLM-powered code quality analysis beyond static tools&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Continuous Summarization&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Generate and maintain up-to-date project summaries&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;(&lt;a href="https://github.com/githubnext/awesome-continuous-ai" rel="noopener noreferrer"&gt;Source: awesome-continuous-ai&lt;/a&gt;)&lt;/p&gt;

&lt;h3&gt;
  
  
  The Maturity Model
&lt;/h3&gt;

&lt;p&gt;The &lt;a href="https://blog.continue.dev/what-is-continuous-ai-a-developers-guide/" rel="noopener noreferrer"&gt;Continue team&lt;/a&gt; proposes a useful maturity model:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Level&lt;/th&gt;
&lt;th&gt;Stage&lt;/th&gt;
&lt;th&gt;Example&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Manual AI Assistance&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Copilot in the IDE, ChatGPT for code questions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Workflow Automation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Auto-triage issues, auto-generate changelogs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Zero-Intervention&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Auto-fix lint errors, auto-update deps, auto-label PRs&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Most teams are at Level 1. The teams I work with that are getting real value have pushed into Level 2. Level 3 is the frontier — and reaching it safely requires the governance models described in the next two sections.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Implementation Stack
&lt;/h3&gt;

&lt;p&gt;Continuous AI isn't just a concept — there's a concrete implementation stack emerging. Three layers work together to bring AI reasoning into your repository workflows:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 1: &lt;a href="https://github.com/actions/ai-inference" rel="noopener noreferrer"&gt;&lt;code&gt;actions/ai-inference&lt;/code&gt;&lt;/a&gt;&lt;/strong&gt; is a GitHub Action that calls AI models from &lt;a href="https://docs.github.com/en/github-models/about-github-models" rel="noopener noreferrer"&gt;GitHub Models&lt;/a&gt; directly inside your workflows. It supports inline prompts and structured &lt;code&gt;.prompt.yml&lt;/code&gt; files, needs only &lt;code&gt;permissions: models: read&lt;/code&gt;, and outputs model responses you can use in subsequent steps. It's the simplest on-ramp — add one action step and you've got AI reasoning in your pipeline.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Analyze failure&lt;/span&gt;
  &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;analysis&lt;/span&gt;
  &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/ai-inference@v2&lt;/span&gt;
  &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;prompt-file&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;.github/prompts/analyze-failure.prompt.yml'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
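&lt;p&gt;The &lt;code&gt;.prompt.yml&lt;/code&gt; referenced by that step might look like the following. This is an illustrative sketch using the GitHub Models prompt-file shape (name, model, templated messages); the model ID, variable name, and prompt text are placeholders, not part of any real workflow:&lt;/p&gt;

```yaml
# .github/prompts/analyze-failure.prompt.yml (illustrative sketch;
# model ID and the failure_log variable are placeholders)
name: Analyze CI failure
description: Explain why a CI run failed and suggest a fix
model: openai/gpt-4o-mini
messages:
  - role: system
    content: |
      You are a CI failure analyst. Given a failing job log,
      explain the likely root cause and propose a minimal fix.
  - role: user
    content: "{{failure_log}}"
```

&lt;p&gt;The calling workflow step supplies the template variables; the model's response then becomes a step output that later steps can post as a comment or summary.&lt;/p&gt;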



&lt;p&gt;&lt;strong&gt;Layer 2: &lt;a href="https://microsoft.github.io/genaiscript/" rel="noopener noreferrer"&gt;GenAIScript&lt;/a&gt;&lt;/strong&gt; is an open-source scripting framework from Microsoft that lets you write composable LLM-powered scripts. It's the power tool — it can &lt;a href="https://microsoft.github.io/genaiscript/getting-started/automating-scripts/" rel="noopener noreferrer"&gt;access git diffs&lt;/a&gt;, run in CI with &lt;code&gt;npx --yes genaiscript run&lt;/code&gt;, apply file edits, and output traces to &lt;code&gt;$GITHUB_STEP_SUMMARY&lt;/code&gt;. The &lt;a href="https://github.com/githubnext/awesome-continuous-ai" rel="noopener noreferrer"&gt;awesome-continuous-ai&lt;/a&gt; list is full of GenAIScript-based examples for issue labeling, duplicate detection, and code review.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 3: &lt;a href="https://github.com/github/gh-models" rel="noopener noreferrer"&gt;&lt;code&gt;gh models&lt;/code&gt;&lt;/a&gt;&lt;/strong&gt; is a CLI extension that brings GitHub Models to your terminal. Run &lt;code&gt;gh models run openai/gpt-4o-mini "why did this test fail?"&lt;/code&gt; for single-shot inference, or use REPL mode for interactive debugging. The &lt;code&gt;gh models eval&lt;/code&gt; command runs &lt;a href="https://github.blog/changelog/2025-06-06-you-can-now-run-model-evaluations-with-the-models-cli/" rel="noopener noreferrer"&gt;prompt evaluations from the command line&lt;/a&gt; — scoring prompts against expected outputs with similarity, string match, and custom LLM-as-a-judge evaluators. This makes it practical to test prompt quality in CI the same way you test code quality.&lt;/p&gt;
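&lt;p&gt;To make that concrete, a CI step for prompt evaluation could be as small as this sketch — the prompt-file path is carried over from the earlier example and is an assumption, as is installing the extension inline rather than via a setup action:&lt;/p&gt;

```yaml
# Illustrative sketch: prompt evaluation as a CI step
- name: Evaluate prompts
  run: |
    gh extension install github/gh-models
    gh models eval .github/prompts/analyze-failure.prompt.yml
  env:
    GH_TOKEN: ${{ github.token }}
```

&lt;p&gt;A failing evaluation fails the step, which is exactly the point: prompt regressions surface in CI the same way test regressions do.&lt;/p&gt;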

&lt;p&gt;Together, these three layers cover the full spectrum: &lt;code&gt;actions/ai-inference&lt;/code&gt; for simple one-step AI calls, &lt;code&gt;GenAIScript&lt;/code&gt; for complex multi-file scripting, and &lt;code&gt;gh models&lt;/code&gt; for developer-facing CLI workflows and evaluations. If you're evaluating which SDK to use for building custom agents beyond these, I broke down the options in &lt;a href="https://htek.dev/articles/choosing-the-right-ai-sdk/" rel="noopener noreferrer"&gt;my guide to choosing the right AI SDK&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Early Results
&lt;/h3&gt;

&lt;p&gt;Early Continuous AI adopters are reporting significant results:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Test coverage:&lt;/strong&gt; From ~5% to near 100% over 45 days, with 1,400+ tests for ~$80 in tokens&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dependency drift:&lt;/strong&gt; Semantic change detection catching breaking changes before merge&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Doc/code mismatch:&lt;/strong&gt; Automated detection and fixing of documentation that has drifted from implementation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;(&lt;a href="https://github.blog/ai-and-ml/generative-ai/continuous-ai-in-practice-what-developers-can-automate-today-with-agentic-ci/" rel="noopener noreferrer"&gt;Source: GitHub Blog — Continuous AI in Practice&lt;/a&gt;)&lt;/p&gt;

&lt;h2&gt;
  
  
  Agentic DevOps: AI Inside the Pipeline
&lt;/h2&gt;

&lt;p&gt;Agentic DevOps is the practice of &lt;strong&gt;embedding AI agents into the DevOps pipeline&lt;/strong&gt; to make decisions, triage issues, and automate tasks that traditionally required human judgment. This is AI &lt;em&gt;augmenting&lt;/em&gt; DevOps — the pipeline becomes intelligent.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Velocity Problem
&lt;/h3&gt;

&lt;p&gt;The thesis rests on a velocity problem. I wrote about this in &lt;a href="https://htek.dev/articles/agentic-ops-workflow-framework-for-ai-agents/" rel="noopener noreferrer"&gt;my agentic-ops article&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"DevOps was invented to protect teams from velocity. That worked when velocity meant shipping weekly instead of monthly. AI agents ship at machine speed. Old DevOps patterns can't keep up."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Each era in software delivery has responded to increased velocity by shifting governance earlier:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Era&lt;/th&gt;
&lt;th&gt;Velocity&lt;/th&gt;
&lt;th&gt;Testing Strategy&lt;/th&gt;
&lt;th&gt;Feedback Delay&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Waterfall&lt;/td&gt;
&lt;td&gt;Monthly releases&lt;/td&gt;
&lt;td&gt;QA phase before release&lt;/td&gt;
&lt;td&gt;Days to weeks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Agile&lt;/td&gt;
&lt;td&gt;Weekly releases&lt;/td&gt;
&lt;td&gt;Testing in sprints&lt;/td&gt;
&lt;td&gt;Days&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CI/CD&lt;/td&gt;
&lt;td&gt;Daily deploys&lt;/td&gt;
&lt;td&gt;Automated pipelines&lt;/td&gt;
&lt;td&gt;Minutes to hours&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pre-commit hooks&lt;/td&gt;
&lt;td&gt;Per commit&lt;/td&gt;
&lt;td&gt;Local hooks&lt;/td&gt;
&lt;td&gt;Seconds&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Agentic DevOps&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Per keystroke&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Real-time governance&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Milliseconds&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  What Agentic DevOps Looks Like in Practice
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;What It Does&lt;/th&gt;
&lt;th&gt;Example&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;AI-Powered Triage&lt;/td&gt;
&lt;td&gt;Agents analyze failures, categorize issues, propose fixes&lt;/td&gt;
&lt;td&gt;SRE agents monitoring CI failures&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Intelligent Code Review&lt;/td&gt;
&lt;td&gt;AI reviews PRs for security, quality, architecture&lt;/td&gt;
&lt;td&gt;Copilot code review, CodeRabbit&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://htek.dev/articles/self-healing-infrastructure-with-agentic-ai/" rel="noopener noreferrer"&gt;Self-Healing Infrastructure&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Agents detect drift and remediate autonomously&lt;/td&gt;
&lt;td&gt;Auto-scaling, config correction&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Adaptive Pipelines&lt;/td&gt;
&lt;td&gt;Pipelines that reason about what to test based on changes&lt;/td&gt;
&lt;td&gt;Selective test execution&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI-Driven Security&lt;/td&gt;
&lt;td&gt;Agents scan for vulnerabilities and propose patches&lt;/td&gt;
&lt;td&gt;Dependabot + AI fix proposals&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Autonomous Remediation&lt;/td&gt;
&lt;td&gt;Agents execute runbooks and escalate when needed&lt;/td&gt;
&lt;td&gt;PagerDuty AI, incident response bots&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Industry Convergence
&lt;/h3&gt;

&lt;p&gt;The industry is aligning around Agentic DevOps from multiple angles. &lt;a href="https://www.harness.io/blog/agentic-ai-in-devops-the-architects-guide-to-autonomous-infrastructure" rel="noopener noreferrer"&gt;Harness&lt;/a&gt; describes it as "the architect's guide to autonomous infrastructure." &lt;a href="https://opsera.ai/blog/what-agentic-devops-really-means-for-engineering-teams-in-2026/" rel="noopener noreferrer"&gt;Opsera&lt;/a&gt; focuses on reducing "coordination overhead that slows delivery long after code is written." &lt;a href="https://www.qovery.com/blog/integrating-agentic-ai-into-your-devops-workflow" rel="noopener noreferrer"&gt;Qovery&lt;/a&gt; has built specialized DevOps AI agents for FinOps, DevSecOps, Observability, and CI/CD. &lt;a href="https://hackernoon.com/cicd-is-dead-agentic-devops-is-taking-over" rel="noopener noreferrer"&gt;HackerNoon&lt;/a&gt; provocatively declared "CI/CD Is Dead. Agentic DevOps is Taking Over."&lt;/p&gt;

&lt;p&gt;My take: CI/CD isn't dead. It's the foundation. Agentic DevOps is the next layer built on top of it.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Real-World Gains
&lt;/h3&gt;

&lt;p&gt;Practitioners are reporting &lt;a href="https://medium.com/@rushabhkothari414/ai-agents-in-devops-pipelines-what-actually-moved-the-needle-in-2026-and-what-was-just-hype-437200a1e9a1" rel="noopener noreferrer"&gt;20–50% gains in velocity, MTTR, and cost&lt;/a&gt; from agentic DevOps patterns — but with an important caveat: most teams aren't running fully autonomous pipelines. The gains come from targeted applications: AI-powered triage that cuts incident response time, intelligent code review that catches what linters miss, and adaptive test selection that runs only relevant tests.&lt;/p&gt;

&lt;p&gt;There's a trust gap here that the DORA data confirms. While &lt;a href="https://dora.dev/dora-report-2025/" rel="noopener noreferrer"&gt;90% of developers now use AI&lt;/a&gt;, only &lt;a href="https://hyperdev.matsuoka.com/p/dora-2025-ai-as-amplifier-not-magic" rel="noopener noreferrer"&gt;17% use autonomous agents&lt;/a&gt;. And 30% of developers don't trust the AI-generated code they use daily. The METR study even found a &lt;a href="https://hyperdev.matsuoka.com/p/dora-2025-ai-as-amplifier-not-magic" rel="noopener noreferrer"&gt;19% slowdown in some contexts&lt;/a&gt; where AI was applied without proper workflow integration. The lesson? Agentic DevOps isn't about blind automation — it's about the right AI in the right place with the right guardrails. I wrote about this trust-vs-productivity tension in &lt;a href="https://htek.dev/articles/turning-ai-skeptics-into-believers/" rel="noopener noreferrer"&gt;my article on turning AI skeptics into believers&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  DevOps for Agents: Governing the AI
&lt;/h2&gt;

&lt;p&gt;This is where the conversation flips direction. Instead of AI augmenting your pipeline, you're building a pipeline &lt;em&gt;around&lt;/em&gt; AI to ensure it operates safely and predictably. This is the discipline I've spent the most time on, and it's the most underserved area in the industry.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Core Problem
&lt;/h3&gt;

&lt;p&gt;When your developer is an AI agent, the entire DevOps model needs rethinking:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Agents operate at machine speed.&lt;/strong&gt; A human developer writes 50 lines per hour. An AI agent generates hundreds of lines per minute. By the time CI catches a bug, the agent has changed 50 more files and built dependencies on the mistake.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Instructions aren't enforcement.&lt;/strong&gt; Telling an agent about architectural rules in &lt;code&gt;copilot-instructions.md&lt;/code&gt; is like writing a coding standards document for human developers. Some will follow it. Some won't. You need &lt;em&gt;systematic enforcement&lt;/em&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Unsanitized inputs are attack vectors.&lt;/strong&gt; The &lt;a href="https://snyk.io/blog/cline-supply-chain-attack-prompt-injection-github-actions/" rel="noopener noreferrer"&gt;Clinejection attack&lt;/a&gt; in February 2026 proved this definitively — an attacker opened a GitHub issue with a prompt injection payload, hijacked an AI triage bot, stole npm credentials, and published a malicious package to 4,000 developers. The entry point was a GitHub issue title. DevOps for Agents must treat all external input as untrusted, just like traditional web security treats user input.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Testing is the architecture blueprint.&lt;/strong&gt; In an agentic world, tests aren't just verification — they're the specification. I explored this principle with &lt;a href="https://htek.dev/articles/specs-equal-tests-terraform-ai-development/" rel="noopener noreferrer"&gt;specs-as-tests in Terraform&lt;/a&gt;. Without comprehensive test coverage, &lt;a href="https://htek.dev/articles/tests-are-everything-agentic-ai/" rel="noopener noreferrer"&gt;agentic AI will fail&lt;/a&gt;. I wrote about the specific failure modes in &lt;a href="https://htek.dev/articles/vibe-testing-when-ai-agents-goodhart-your-test-suite/" rel="noopener noreferrer"&gt;my article on vibe testing&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Governance Approaches
&lt;/h3&gt;

&lt;p&gt;Multiple frameworks are emerging for governing AI agents in the SDLC. One useful mental model is a three-layer approach I outlined in &lt;a href="https://htek.dev/articles/agent-hooks-controlling-ai-codebase/" rel="noopener noreferrer"&gt;my article on agent hooks&lt;/a&gt;: &lt;strong&gt;Enablement&lt;/strong&gt; (instructions, tools, context), &lt;strong&gt;Enforcement&lt;/strong&gt; (specs, hooks, architectural rules), and a &lt;strong&gt;Final Gate&lt;/strong&gt; (CI/CD tests, security scanning). The gap most teams have is in the enforcement layer — they tell agents what to do and verify after the fact, but nothing stops agents from violating rules in real time.&lt;/p&gt;

&lt;h3&gt;
  
  
  Agent Hooks: Pre-Tool-Use Enforcement
&lt;/h3&gt;

&lt;p&gt;The key innovation of DevOps for Agents is &lt;strong&gt;pre-tool-use hooks&lt;/strong&gt; — intercepting the agent &lt;em&gt;before&lt;/em&gt; it writes a file, runs a command, or makes a commit:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Traditional DevOps:
  Write → Commit → Push → CI → Feedback (minutes later)

DevOps for Agents:
  Write → [HOOK] → Feedback (milliseconds) → Continue or Stop
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When an agent tries to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Edit a file&lt;/strong&gt; → Hook validates layer boundaries, checks for secrets, runs lint&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Make a commit&lt;/strong&gt; → Hook requires accompanying tests, checks branch rules&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Run a command&lt;/strong&gt; → Hook blocks dangerous operations (&lt;code&gt;rm -rf&lt;/code&gt;, &lt;code&gt;DROP TABLE&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I built &lt;a href="https://github.com/htekdev/gh-hookflow" rel="noopener noreferrer"&gt;gh-hookflow&lt;/a&gt; to implement this pattern using familiar GitHub Actions YAML syntax:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# .github/hookflows/protect-secrets.yml&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Protect Secrets&lt;/span&gt;
&lt;span class="na"&gt;blocking&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;

&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;file&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;paths&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;**/*.env*'&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;**/secrets/**'&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;**/*.pem'&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;types&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;edit&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;create&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

&lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
      &lt;span class="s"&gt;echo "❌ Cannot modify sensitive files"&lt;/span&gt;
      &lt;span class="s"&gt;exit 1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# .github/hookflows/require-tests.yml&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Require Tests&lt;/span&gt;
&lt;span class="na"&gt;blocking&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;

&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;commit&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;paths&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;src/**'&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;paths-ignore&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;src/**/*.test.*'&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

&lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Check for test files&lt;/span&gt;
    &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
      &lt;span class="s"&gt;if ! echo "${{ event.commit.files }}" | grep -q '\.test\.'; then&lt;/span&gt;
        &lt;span class="s"&gt;echo "❌ Source changes require accompanying tests"&lt;/span&gt;
        &lt;span class="s"&gt;exit 1&lt;/span&gt;
      &lt;span class="s"&gt;fi&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
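&lt;p&gt;The third guardrail from the bullet list above — blocking dangerous commands — would follow the same shape. Note the hedge: the &lt;code&gt;command&lt;/code&gt; trigger and &lt;code&gt;patterns&lt;/code&gt; key below are assumptions extrapolated from the &lt;code&gt;file&lt;/code&gt; and &lt;code&gt;commit&lt;/code&gt; triggers shown, not confirmed gh-hookflow syntax:&lt;/p&gt;

```yaml
# .github/hookflows/block-dangerous-commands.yml
# NOTE: the `command` trigger and `patterns` key are assumed syntax,
# extrapolated from the file/commit triggers in the examples above
name: Block Dangerous Commands
blocking: true

on:
  command:
    patterns: ['rm -rf *', 'git push --force*', '* DROP TABLE *']

steps:
  - run: |
      echo "❌ Dangerous command blocked by hook"
      exit 1
```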



&lt;p&gt;The feedback is instant — milliseconds, not minutes. The agent sees the failure, self-corrects, and continues within the same session. Agents respond well to blocking feedback: they don't resist good constraints; they work within them. Chaos comes from poorly defined boundaries, not from enforcement.&lt;/p&gt;

&lt;h3&gt;
  
  
  Agent Harnesses: The Control Plane
&lt;/h3&gt;

&lt;p&gt;Beyond hooks, DevOps for Agents requires a control plane — the agent harness — that manages the agent's lifecycle. I wrote extensively about this in &lt;a href="https://htek.dev/articles/agent-harnesses-controlling-ai-agents-2026/" rel="noopener noreferrer"&gt;my agent harnesses article&lt;/a&gt;. The key stats are sobering:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Enterprises average 12 AI agents with only 27% connected. The real engineering challenge isn't building agents — it's the harness that governs them.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;A proper agent harness provides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Core loop ownership&lt;/strong&gt; — the harness owns the agentic loop, not just wraps it&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Iteration inspection&lt;/strong&gt; — every step tracked in &lt;code&gt;Result.iterations[]&lt;/code&gt; for observability&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-provider support&lt;/strong&gt; — OpenAI, Anthropic, GitHub Models, Copilot&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Safety boundaries&lt;/strong&gt; — tool access controls, context window management&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Testing at depth&lt;/strong&gt; — eval tests that verify guardrails actually block dangerous output&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Test Enforcement at Machine Speed
&lt;/h3&gt;

&lt;p&gt;DevOps for Agents introduces a radically different testing philosophy that I covered in depth in &lt;a href="https://htek.dev/articles/test-enforcement-architecture-ai-agents/" rel="noopener noreferrer"&gt;my test enforcement architecture article&lt;/a&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Coverage is line-level&lt;/strong&gt; — the hook analyzes &lt;em&gt;which specific lines changed&lt;/em&gt; and verifies tests cover &lt;em&gt;those exact lines&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Layer-aware thresholds&lt;/strong&gt; — core domain (L3) requires 90%, application services (L4) 80%, infrastructure (L5) 70%&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Coverage ratchets only go up&lt;/strong&gt; — thresholds increase as the project matures, never decrease&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI-generated test quality verification&lt;/strong&gt; — without enforcement, AI-generated tests achieve mutation scores of only ~20%, meaning roughly 80% of seeded faults go undetected&lt;/li&gt;
&lt;/ul&gt;
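&lt;p&gt;Expressed as configuration, the layer-aware, ratcheting thresholds above might look like this. This is a purely hypothetical sketch — the enforcement hook's actual config format isn't shown in this article, so every key name here is an invention for illustration:&lt;/p&gt;

```yaml
# Hypothetical sketch of layer-aware coverage enforcement config;
# all key names are illustrative, not real hook syntax
coverage:
  ratchet: true          # thresholds may only increase over time
  granularity: line      # verify tests cover the exact changed lines
  layers:
    core-domain:            # L3
      threshold: 90
    application-services:   # L4
      threshold: 80
    infrastructure:         # L5
      threshold: 70
```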

&lt;h2&gt;
  
  
  GitHub Agentic Workflows: Where Everything Converges
&lt;/h2&gt;

&lt;p&gt;GitHub Agentic Workflows is the &lt;strong&gt;platform-level implementation&lt;/strong&gt; where Agentic DevOps and DevOps for Agents converge. Announced in &lt;a href="https://github.blog/changelog/2026-02-13-github-agentic-workflows-are-now-in-technical-preview" rel="noopener noreferrer"&gt;February 2026 as a technical preview&lt;/a&gt;, it runs coding agents (Copilot, Claude, Codex) inside GitHub Actions, authored in Markdown instead of YAML, with built-in security layers, safe-outputs, and detection jobs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Markdown Instead of YAML
&lt;/h3&gt;

&lt;p&gt;The authoring model is the most visible change. Instead of YAML hell, you describe your automation in plain English:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;issues&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;types&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;opened&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;reopened&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;span class="na"&gt;permissions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;contents&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;read&lt;/span&gt;
  &lt;span class="na"&gt;issues&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;read&lt;/span&gt;
&lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;github&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;toolsets&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;issues&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;span class="na"&gt;engine&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;copilot&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gpt-5.2-codex&lt;/span&gt;
&lt;span class="na"&gt;safe-outputs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;add-labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;allowed&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;bug&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;feature&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;enhancement&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;documentation&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;add-comment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{}&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;

&lt;span class="gh"&gt;# Issue Triage Agent&lt;/span&gt;

Analyze new issues. Read the title and body carefully.
Classify as bug, feature, enhancement, or documentation.
Add the appropriate label and post a comment explaining
your reasoning.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. No step definitions, no shell scripts, no job matrices. The AI agent interprets the Markdown instructions and executes with context-aware reasoning. The YAML frontmatter defines the security boundaries — what the agent can read, what it can write, and what tools it can use.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Compilation Model
&lt;/h3&gt;

&lt;p&gt;What most people miss: that Markdown file doesn't run directly on GitHub Actions. There's a compilation step — &lt;code&gt;gh aw compile&lt;/code&gt; transforms your &lt;code&gt;.md&lt;/code&gt; file into a &lt;code&gt;.lock.yml&lt;/code&gt; file, which is a standard GitHub Actions workflow with security constraints, tool access, and agent configuration baked in. You commit both files. The Markdown is for humans; the lock file is for the runner. This means your agentic workflows are version-controlled, diffable, and reviewable — just like any other CI/CD configuration.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Security Architecture
&lt;/h3&gt;

&lt;p&gt;GitHub Agentic Workflows implements security at three distinct layers:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Substrate Isolation&lt;/strong&gt; — each workflow runs in an isolated environment with controlled tool access through an MCP Gateway and API Proxy&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Declarative Specification&lt;/strong&gt; — the YAML frontmatter explicitly declares permissions, safe-outputs, and tool access; anything not declared is denied&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Plan-Level Trust&lt;/strong&gt; — detection jobs analyze agent output for secrets, malicious patches, and anomalous behavior before any writes are committed. These detection jobs also create the &lt;strong&gt;audit trail&lt;/strong&gt; that enterprise compliance teams require — every agent action, every output decision, every blocked write is logged and reviewable, satisfying the evidence requirements for SOC 2, SOX, and HIPAA audits.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The &lt;code&gt;safe-outputs&lt;/code&gt; system is particularly elegant. The agent operates read-only by default. To write anything — add a label, create a PR, post a comment — the workflow must explicitly declare that output type. This is a fundamentally different security posture from that of traditional Actions, where &lt;code&gt;GITHUB_TOKEN&lt;/code&gt; permissions grant broad access. The architecture is designed so that even if an agent is tricked by a prompt injection, the &lt;code&gt;safe-outputs&lt;/code&gt; declaration limits the blast radius to the operations you've explicitly authorized.&lt;/p&gt;

&lt;h3&gt;
  
  
  Governance in Code: How gh-aw Puts You in Control
&lt;/h3&gt;

&lt;p&gt;What makes GitHub Agentic Workflows production-viable isn't just that it &lt;em&gt;has&lt;/em&gt; governance — it's that every governance decision is &lt;strong&gt;declarative, version-controlled, and auditable&lt;/strong&gt;. Let me walk through what that actually looks like in practice.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Minimal permissions vs. expanded permissions.&lt;/strong&gt; The simplest governance choice is what the agent can read and write. Compare these two frontmatter blocks:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="c1"&gt;# Minimal: read-only, no writes&lt;/span&gt;
&lt;span class="na"&gt;permissions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;contents&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;read&lt;/span&gt;
  &lt;span class="na"&gt;issues&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;read&lt;/span&gt;
&lt;span class="na"&gt;safe-outputs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{}&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;vs.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="c1"&gt;# Expanded: can create PRs and add comments&lt;/span&gt;
&lt;span class="na"&gt;permissions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;contents&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;read&lt;/span&gt;
  &lt;span class="na"&gt;pull-requests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;read&lt;/span&gt;
&lt;span class="na"&gt;safe-outputs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;create-pull-request&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{}&lt;/span&gt;
  &lt;span class="na"&gt;add-comment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{}&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The first agent can observe everything but touch nothing — ideal for analysis and reporting workflows. The second can create pull requests and add comments, but still can't push code directly, modify labels, or close issues. Nothing is implicit. If you don't declare it, the agent can't do it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scoped safe-outputs with constraints.&lt;/strong&gt; You can go beyond binary allow/deny and constrain &lt;em&gt;what values&lt;/em&gt; an agent can write:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;safe-outputs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;add-labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;allowed&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;bug&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;feature&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;enhancement&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;documentation&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;needs-triage&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;add-comment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{}&lt;/span&gt;
  &lt;span class="na"&gt;create-pull-request&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;allowed-branches&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;main&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This agent can add labels — but only from a predefined set. It can create PRs — but only targeting &lt;code&gt;main&lt;/code&gt;. If a prompt injection tries to make the agent apply a &lt;code&gt;deploy-to-production&lt;/code&gt; label or open a PR against a release branch, the platform blocks it regardless of what the LLM outputs. This is defense-in-depth at the declaration level.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Engine configuration with model selection.&lt;/strong&gt; You control which AI model powers the agent, which directly affects cost, speed, and capability:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;engine&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;copilot&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gpt-5.2-codex&lt;/span&gt;
&lt;span class="c1"&gt;# Or use Claude:&lt;/span&gt;
&lt;span class="c1"&gt;# engine:&lt;/span&gt;
&lt;span class="c1"&gt;#   id: claude&lt;/span&gt;
&lt;span class="c1"&gt;#   model: claude-sonnet-4&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This means you can run cheaper, faster models for routine triage workflows and reserve more capable models for complex code review. Model selection is a governance decision — and it belongs in version control alongside everything else.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MCP tool configuration and network rules.&lt;/strong&gt; For enterprise teams connecting agents to internal systems, tool access and network egress are explicitly declared:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;github&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;toolsets&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;issues&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;pull_requests&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;code_search&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;mcp&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;servers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;https://internal-api.company.com/mcp&lt;/span&gt;
        &lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;query_incidents&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;check_runbooks&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;span class="na"&gt;network&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;allowed-domains&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;api.github.com&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;internal-api.company.com&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The agent can call GitHub's issues and PR APIs, query your internal incident system via MCP, and access exactly two domains on the network. Try to reach any other endpoint and the request is blocked at the platform level. For enterprise teams managing SOC 2 or HIPAA compliance, this level of declarative network control creates the audit trail that compliance teams need — every permitted domain, every tool invocation, all reviewable in a single Markdown file checked into Git.&lt;/p&gt;

&lt;p&gt;The pattern across all four examples is the same: &lt;strong&gt;everything the agent can do is declared in code, reviewed in PRs, and enforced by the platform&lt;/strong&gt;. There's no hidden configuration, no runtime escalation, no ambient authority. This is what production-grade AI governance looks like.&lt;/p&gt;

&lt;h3&gt;
  
  
  Six Core Usage Patterns
&lt;/h3&gt;

&lt;p&gt;Based on &lt;a href="https://github.blog/ai-and-ml/automate-repository-tasks-with-github-agentic-workflows/" rel="noopener noreferrer"&gt;GitHub's documentation&lt;/a&gt; and my own experimentation, six patterns are emerging:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Issue Triage&lt;/strong&gt; — Auto-label, categorize, and comment on new issues&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Documentation Maintenance&lt;/strong&gt; — Keep docs in sync with code changes on a schedule&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CI Failure Analysis&lt;/strong&gt; — Investigate build failures and propose fixes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test Improvement&lt;/strong&gt; — Identify coverage gaps and generate targeted tests&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Code Review&lt;/strong&gt; — AI-powered PR reviews that catch what linters miss&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reporting&lt;/strong&gt; — Generate weekly digests, changelogs, or project status reports&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I built working demos of four of these patterns in &lt;a href="https://htek.dev/articles/github-agentic-workflows-hands-on-guide/" rel="noopener noreferrer"&gt;my hands-on guide&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Master Comparison
&lt;/h2&gt;

&lt;p&gt;Here's how all six concepts compare across key dimensions:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;Traditional DevOps&lt;/th&gt;
&lt;th&gt;CI/CD&lt;/th&gt;
&lt;th&gt;Continuous AI&lt;/th&gt;
&lt;th&gt;Agentic DevOps&lt;/th&gt;
&lt;th&gt;DevOps for Agents&lt;/th&gt;
&lt;th&gt;gh-aw&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Emerged&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~2009&lt;/td&gt;
&lt;td&gt;~2011&lt;/td&gt;
&lt;td&gt;~2025&lt;/td&gt;
&lt;td&gt;~2024&lt;/td&gt;
&lt;td&gt;~2025&lt;/td&gt;
&lt;td&gt;Feb 2026&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Authoring&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Scripts, configs&lt;/td&gt;
&lt;td&gt;YAML&lt;/td&gt;
&lt;td&gt;Natural language&lt;/td&gt;
&lt;td&gt;YAML + AI&lt;/td&gt;
&lt;td&gt;YAML (hookflow)&lt;/td&gt;
&lt;td&gt;Markdown&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Execution&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Human + automation&lt;/td&gt;
&lt;td&gt;Deterministic&lt;/td&gt;
&lt;td&gt;Event-triggered AI&lt;/td&gt;
&lt;td&gt;AI-augmented&lt;/td&gt;
&lt;td&gt;Real-time hooks&lt;/td&gt;
&lt;td&gt;AI in Actions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Decision Making&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Human&lt;/td&gt;
&lt;td&gt;Predetermined logic&lt;/td&gt;
&lt;td&gt;AI + human review&lt;/td&gt;
&lt;td&gt;AI + human oversight&lt;/td&gt;
&lt;td&gt;AI within boundaries&lt;/td&gt;
&lt;td&gt;AI + safe-outputs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Feedback Speed&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Hours–days&lt;/td&gt;
&lt;td&gt;Minutes&lt;/td&gt;
&lt;td&gt;Minutes&lt;/td&gt;
&lt;td&gt;Seconds–minutes&lt;/td&gt;
&lt;td&gt;Milliseconds&lt;/td&gt;
&lt;td&gt;Minutes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Security&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;RBAC, secrets&lt;/td&gt;
&lt;td&gt;Pipeline gates&lt;/td&gt;
&lt;td&gt;Auditable AI&lt;/td&gt;
&lt;td&gt;AI + scanning&lt;/td&gt;
&lt;td&gt;Pre-tool enforcement&lt;/td&gt;
&lt;td&gt;3-layer isolation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Maturity&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Mature (15+ yrs)&lt;/td&gt;
&lt;td&gt;Mature (13+ yrs)&lt;/td&gt;
&lt;td&gt;Emerging (~1 yr)&lt;/td&gt;
&lt;td&gt;Emerging (1–2 yrs)&lt;/td&gt;
&lt;td&gt;Emerging (&amp;lt; 1 yr)&lt;/td&gt;
&lt;td&gt;Tech Preview&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Security and Governance: A Deep Comparison
&lt;/h2&gt;

&lt;p&gt;Security is the axis that separates production-ready agentic DevOps from a vendor demo. Here's how each concept handles trust:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Concern&lt;/th&gt;
&lt;th&gt;DevOps&lt;/th&gt;
&lt;th&gt;CI/CD&lt;/th&gt;
&lt;th&gt;Agentic DevOps&lt;/th&gt;
&lt;th&gt;DevOps for Agents&lt;/th&gt;
&lt;th&gt;gh-aw&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Who is trusted?&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Authenticated humans&lt;/td&gt;
&lt;td&gt;Pipeline authors&lt;/td&gt;
&lt;td&gt;AI + supervisors&lt;/td&gt;
&lt;td&gt;AI within boundaries&lt;/td&gt;
&lt;td&gt;AI within safe-outputs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;What can write?&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Anyone with access&lt;/td&gt;
&lt;td&gt;Pipeline w/ creds&lt;/td&gt;
&lt;td&gt;AI with permissions&lt;/td&gt;
&lt;td&gt;AI through hooks only&lt;/td&gt;
&lt;td&gt;AI through safe-outputs only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Secret protection&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Vault, env vars&lt;/td&gt;
&lt;td&gt;Pipeline secrets&lt;/td&gt;
&lt;td&gt;AI-aware scanning&lt;/td&gt;
&lt;td&gt;Pre-tool hook scanning&lt;/td&gt;
&lt;td&gt;Detection job + firewall&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Rollback&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Manual or automated&lt;/td&gt;
&lt;td&gt;Pipeline rollback&lt;/td&gt;
&lt;td&gt;AI-assisted rollback&lt;/td&gt;
&lt;td&gt;Hook blocks before damage&lt;/td&gt;
&lt;td&gt;Detection blocks before output&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Audit trail&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Git log&lt;/td&gt;
&lt;td&gt;Build logs&lt;/td&gt;
&lt;td&gt;AI decision logs&lt;/td&gt;
&lt;td&gt;Hook execution logs&lt;/td&gt;
&lt;td&gt;MCP Gateway + API Proxy logs&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The key takeaway from the security comparison: the concepts that explicitly handle enforcement — DevOps for Agents with pre-tool hooks, and GitHub Agentic Workflows with &lt;code&gt;safe-outputs&lt;/code&gt; and detection jobs — are the only ones that address the governance gap where most teams struggle. Everything else relies on either telling agents what to do (instructions) or catching problems after the fact (CI/CD gates).&lt;/p&gt;

&lt;h2&gt;
  
  
  The Decision Framework: When to Use What
&lt;/h2&gt;

&lt;p&gt;These concepts are complementary, not competing. Here's how to think about adoption:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Need to automate build/test/deploy?&lt;/strong&gt; → CI/CD (baseline requirement)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Need cultural transformation + monitoring + IaC?&lt;/strong&gt; → Traditional DevOps&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Want AI to continuously handle judgment-heavy repo tasks?&lt;/strong&gt; → Continuous AI methodology&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Want AI to help manage your pipeline?&lt;/strong&gt; → Agentic DevOps (AI augments pipeline)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Do AI agents write code in your repos?&lt;/strong&gt; → DevOps for Agents (govern the AI)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Want AI-powered repo automation on GitHub?&lt;/strong&gt; → GitHub Agentic Workflows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The most sophisticated teams use &lt;strong&gt;all six simultaneously&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Traditional DevOps&lt;/strong&gt; provides the cultural foundation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CI/CD&lt;/strong&gt; provides the automated pipeline backbone&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Continuous AI&lt;/strong&gt; provides the methodology for applying AI systematically&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agentic DevOps&lt;/strong&gt; makes the pipeline intelligent&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DevOps for Agents&lt;/strong&gt; governs the AI agents doing the work&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GitHub Agentic Workflows&lt;/strong&gt; provides the platform that integrates it all&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  The Convergence Trajectory
&lt;/h2&gt;

&lt;p&gt;The trajectory is clear: these six concepts are converging toward a unified model:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Workflows are written in natural language&lt;/strong&gt; — gh-aw's markdown-first approach is the template&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Continuous AI becomes as foundational as CI/CD&lt;/strong&gt; — GitHub expects this model to hold for the next 30+ years&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Governance is embedded at every layer&lt;/strong&gt; — hooks at tool-use, safe-outputs at platform, CI at pipeline&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI agents are first-class participants&lt;/strong&gt; in the development lifecycle, not bolted-on assistants&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Repos host fleets of small, focused AI workflows&lt;/strong&gt; — not one monolithic agent, but many targeted automations&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  How Agentic DevOps Changes Your Team
&lt;/h2&gt;

&lt;p&gt;The tooling shift is real, but the bigger disruption is what happens to your &lt;em&gt;people&lt;/em&gt;. Agentic DevOps doesn't just change pipelines — it changes roles, career paths, and team dynamics in ways that most organizations haven't started thinking about.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;DevOps engineers evolve from "pipeline plumber" to "AI workflow architect."&lt;/strong&gt; The traditional DevOps engineer spent their day writing YAML, debugging CI failures, and managing infrastructure drift. In an agentic world, that same engineer designs agent workflows, defines governance boundaries, and architects the interaction between human developers and AI agents. The plumbing still matters — but the value shifts from &lt;em&gt;writing&lt;/em&gt; the pipeline to &lt;em&gt;designing what the pipeline should decide&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SREs evolve from "alert responder" to "agent governor."&lt;/strong&gt; Instead of getting paged at 3 AM to run a remediation playbook, the SRE defines what autonomous remediation looks like, sets the boundaries for when agents can self-heal versus when they must escalate, and validates that the agent's decisions align with reliability objectives. The SRE's judgment doesn't disappear — it gets codified into governance policies that run at machine speed. I explored this pattern in depth in &lt;a href="https://htek.dev/articles/self-healing-infrastructure-with-agentic-ai/" rel="noopener noreferrer"&gt;my article on self-healing infrastructure&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;New roles are emerging.&lt;/strong&gt; I'm seeing job titles that didn't exist 18 months ago: &lt;strong&gt;"Continuous AI Engineer"&lt;/strong&gt; — someone who designs and maintains the fleet of AI workflows across an organization's repositories. &lt;strong&gt;"Agentic DevOps Context Engineer"&lt;/strong&gt; — someone who specializes in crafting the prompts, instructions, and &lt;a href="https://htek.dev/articles/context-engineering-key-to-ai-development/" rel="noopener noreferrer"&gt;context&lt;/a&gt; that make agents effective within specific codebases. &lt;strong&gt;"Agent Governance Architect"&lt;/strong&gt; — someone who owns the enforcement layer: hookflows, safe-outputs, detection jobs, and the policies that determine what agents can and can't do.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The skills you need to add aren't optional.&lt;/strong&gt; If you're a DevOps engineer today, here's what's landing on your plate: &lt;strong&gt;prompt engineering&lt;/strong&gt; (writing instructions that agents actually follow), &lt;strong&gt;workflow authoring in Markdown&lt;/strong&gt; (the &lt;code&gt;gh-aw&lt;/code&gt; authoring model), understanding &lt;strong&gt;LLM behavior&lt;/strong&gt; (when models hallucinate, when they're reliable, what temperature settings actually do), and &lt;strong&gt;security around AI inputs&lt;/strong&gt; (treating every issue title, PR description, and commit message as a potential prompt injection vector). These aren't nice-to-haves. The &lt;a href="https://snyk.io/blog/cline-supply-chain-attack-prompt-injection-github-actions/" rel="noopener noreferrer"&gt;Cline prompt-injection supply-chain attack&lt;/a&gt; proved that AI-facing security is as critical as network security.&lt;/p&gt;

&lt;p&gt;Here's what I want to make explicit: just because "agentic development" has "development" in the name doesn't mean it excludes DevOps. In fact, &lt;strong&gt;DevOps engineers are uniquely positioned for this shift&lt;/strong&gt; because they already think in systems, pipelines, and governance. A developer might write a great prompt. But a DevOps engineer understands how that prompt interacts with CI triggers, branch protection, secret management, and deployment gates — the full system, not just the code. Enterprise teams need someone who understands both the pipeline AND the AI. That's the DevOps engineer's natural evolution.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Economics of Agentic DevOps
&lt;/h2&gt;

&lt;p&gt;Let's talk money — because everyone's excited about AI agents until the invoice arrives.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Token costs are real.&lt;/strong&gt; Running AI inference on every PR, issue, and push event isn't free. A typical &lt;code&gt;gh-aw&lt;/code&gt; workflow run costs somewhere between &lt;strong&gt;$0.01 and $0.50&lt;/strong&gt; depending on the model, prompt length, and context window size. A simple issue triage workflow using a smaller model might cost a penny. A complex code review workflow using &lt;code&gt;gpt-5.2-codex&lt;/code&gt; with full repository context could cost fifty cents or more.&lt;/p&gt;

&lt;p&gt;Those numbers sound trivial in isolation — but they compound. If you're running 10 agentic workflows across a repository that sees 50 PRs per day, that's &lt;strong&gt;500 AI invocations daily&lt;/strong&gt;. At $0.10–$0.25 each, you're looking at $50–$125/day, or roughly &lt;strong&gt;$1,500–$3,750/month&lt;/strong&gt; for a single active repository. Scale that across a 20-repo engineering org and the bill gets attention fast.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;But here's the comparison most teams don't make.&lt;/strong&gt; A senior engineer spending 30 minutes on a PR review costs roughly &lt;strong&gt;$50–$75&lt;/strong&gt; in loaded salary (at $200K–$300K total comp). An AI-powered code review of the same PR costs $0.10–$0.50. Even if the AI review only replaces &lt;em&gt;half&lt;/em&gt; of the human review time, the economics are overwhelming. The question isn't whether AI review is cheaper — it's whether you're measuring both sides of the equation.&lt;/p&gt;
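&lt;p&gt;The arithmetic in the last two paragraphs is easy to sanity-check. This sketch hard-codes the illustrative figures from this section; the workflow count, PR volume, per-invocation prices, and loaded review rates are assumptions, not measurements:&lt;/p&gt;

```python
# Back-of-envelope cost model using the illustrative numbers from this
# article. Every input here is an assumption, not measured data.
WORKFLOWS_PER_REPO = 10
PRS_PER_DAY = 50
COST_PER_INVOCATION = (0.10, 0.25)  # USD low/high estimate per agent run
DAYS_PER_MONTH = 30

invocations_per_day = WORKFLOWS_PER_REPO * PRS_PER_DAY
daily = [c * invocations_per_day for c in COST_PER_INVOCATION]
monthly = [d * DAYS_PER_MONTH for d in daily]

# Human comparison: a 30-minute senior review at a loaded rate of
# $100-$150/hour, versus a single AI review invocation.
human_review = (0.5 * 100, 0.5 * 150)
ai_review = (0.10, 0.50)

print(f"Invocations/day:     {invocations_per_day}")
print(f"Daily agent spend:   ${daily[0]:.0f}-${daily[1]:.0f}")
print(f"Monthly agent spend: ${monthly[0]:,.0f}-${monthly[1]:,.0f}")
print(f"One human review:    ${human_review[0]:.0f}-${human_review[1]:.0f}")
print(f"One AI review:       ${ai_review[0]:.2f}-${ai_review[1]:.2f}")
```

&lt;p&gt;Swap in your own repo counts and model prices; the point is that both columns of the ledger, token spend and displaced review time, fall out of the same five-line model.&lt;/p&gt;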

&lt;p&gt;&lt;strong&gt;Enterprise cost controls matter.&lt;/strong&gt; Smart teams are implementing these early: monitoring &lt;strong&gt;token usage per workflow&lt;/strong&gt; (the &lt;a href="https://github.com/actions/ai-inference" rel="noopener noreferrer"&gt;&lt;code&gt;actions/ai-inference&lt;/code&gt;&lt;/a&gt; action outputs token metadata), setting &lt;strong&gt;budget alerts&lt;/strong&gt; when monthly spend exceeds thresholds, and using &lt;strong&gt;smaller models for routine tasks&lt;/strong&gt; (issue labeling doesn't need a frontier model) while reserving larger models for complex analysis (architectural code review, security scanning). Some teams I've talked to run a tiered model strategy — &lt;code&gt;gpt-4.1&lt;/code&gt; for triage, &lt;code&gt;gpt-5.2-codex&lt;/code&gt; for code review — cutting costs by 60% without meaningful quality loss.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The ROI calculation.&lt;/strong&gt; The real math looks like this: compare the reduction in MTTR (mean time to recovery), faster PR cycle times, reduced manual triage hours, and fewer incidents caused by unreviewed code against the total token spend. In every team I've worked with that's actually measured this, &lt;strong&gt;agentic DevOps is cheaper than the human labor it replaces&lt;/strong&gt; — often by an order of magnitude. But only if you're measuring both sides. Teams that only track AI costs without measuring the human toil being displaced will always conclude it's "too expensive." The &lt;a href="https://dora.dev/dora-report-2025/" rel="noopener noreferrer"&gt;DORA data&lt;/a&gt; on delivery performance confirms the pattern: the productivity gains from AI-augmented workflows far exceed the infrastructure cost, provided the foundations are solid.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting Started: A Practical Roadmap
&lt;/h2&gt;

&lt;p&gt;The biggest question I get after presenting this framework is: "Okay, but where do I actually start?" The six-layer model makes sense architecturally, but teams need a concrete adoption path. Here's the roadmap I recommend, calibrated to real-world timelines I've seen work across teams of 5–50 engineers.&lt;/p&gt;

&lt;p&gt;The critical principle: &lt;strong&gt;don't skip layers.&lt;/strong&gt; Every team I've seen fail at agentic adoption tried to jump straight to autonomous agents without the foundations. Build the floor before the ceiling.&lt;/p&gt;

&lt;h3&gt;
  
  
  Phase 1: Foundation (Week 1–2)
&lt;/h3&gt;

&lt;p&gt;Get your house in order before inviting AI agents inside it.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Audit your CI/CD baseline.&lt;/strong&gt; If your builds are flaky, your tests are sparse, or your deploys are manual — fix that first. Agentic tools amplify whatever you already have, and &lt;a href="https://dora.dev/dora-report-2025/" rel="noopener noreferrer"&gt;the DORA data is clear&lt;/a&gt;: teams with weak foundations see a 7.2% drop in delivery stability when AI is introduced.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Establish test coverage reporting.&lt;/strong&gt; Measure where you are today. You can't ratchet coverage upward if you don't know your starting point. I wrote about why &lt;a href="https://htek.dev/articles/tests-are-everything-agentic-ai/" rel="noopener noreferrer"&gt;tests are the architecture blueprint for agentic AI&lt;/a&gt; — this isn't optional.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Configure DORA metrics.&lt;/strong&gt; Track deployment frequency, lead time for changes, change failure rate, and mean time to recovery. These four numbers tell you whether AI adoption is actually helping or just generating noise. The &lt;a href="https://dora.dev/quickcheck/" rel="noopener noreferrer"&gt;DORA team's quickcheck&lt;/a&gt; is a five-minute starting point.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Set up branch protection and required status checks.&lt;/strong&gt; This is your Pillar 3 baseline — the final gate that catches problems regardless of who (or what) wrote the code.&lt;/li&gt;
&lt;/ol&gt;
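
&lt;p&gt;If you've never computed the four DORA metrics by hand, they reduce to simple aggregations over your deployment history. This sketch uses invented sample data purely to show the shape of each calculation:&lt;/p&gt;

```python
# Minimal illustration of the four DORA metrics from deployment records.
# The sample data below is invented for illustration only.
from datetime import datetime, timedelta

deployments = [
    # (deployed_at, lead_time_from_commit, caused_failure, time_to_restore)
    (datetime(2026, 5, 1), timedelta(hours=6),  False, None),
    (datetime(2026, 5, 2), timedelta(hours=30), True,  timedelta(hours=2)),
    (datetime(2026, 5, 4), timedelta(hours=12), False, None),
    (datetime(2026, 5, 7), timedelta(hours=8),  False, None),
]

days_observed = 7
deployment_frequency = len(deployments) / days_observed  # deploys per day
lead_time = sum((d[1] for d in deployments), timedelta()) / len(deployments)
change_failure_rate = sum(d[2] for d in deployments) / len(deployments)
restores = [d[3] for d in deployments if d[3] is not None]
mttr = sum(restores, timedelta()) / len(restores)

print(f"Deployment frequency:  {deployment_frequency:.2f}/day")
print(f"Lead time for changes: {lead_time}")
print(f"Change failure rate:   {change_failure_rate:.0%}")
print(f"MTTR:                  {mttr}")
```

&lt;p&gt;Once these four numbers are scripted, re-running them after each phase of AI adoption tells you whether the agents are helping or just generating noise.&lt;/p&gt;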

&lt;h3&gt;
  
  
  Phase 2: First AI Touches (Week 3–4)
&lt;/h3&gt;

&lt;p&gt;Start small, measure everything, and build trust incrementally.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Add &lt;a href="https://github.com/actions/ai-inference" rel="noopener noreferrer"&gt;&lt;code&gt;actions/ai-inference&lt;/code&gt;&lt;/a&gt; for a single, low-risk task.&lt;/strong&gt; PR summarization is the ideal first use case — it's read-only, low-stakes, and immediately visible to the whole team. Add one workflow step that summarizes what a PR changes and posts it as a comment. You'll need &lt;code&gt;permissions: models: read&lt;/code&gt; and nothing else.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enable Copilot code review on your most active repository.&lt;/strong&gt; This is &lt;a href="https://htek.dev/articles/github-agentic-workflows-hands-on-guide/" rel="noopener noreferrer"&gt;Continuous Code Review&lt;/a&gt; in its simplest form — AI reviews PRs alongside your human reviewers. Watch what it catches that humans missed, and watch what it gets wrong. Both data points matter.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Try &lt;a href="https://github.com/github/gh-models" rel="noopener noreferrer"&gt;&lt;code&gt;gh models&lt;/code&gt;&lt;/a&gt; for interactive debugging.&lt;/strong&gt; When a CI failure confuses you, pipe the logs into &lt;code&gt;gh models run&lt;/code&gt; and ask it to explain. This builds muscle memory for AI-assisted workflows without any automation risk.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Measure the impact.&lt;/strong&gt; Compare PR cycle time before and after. Track how often Copilot review catches real issues versus false positives. Don't move to Phase 3 until you trust what you're seeing.&lt;/li&gt;
&lt;/ol&gt;
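
&lt;p&gt;A sketch of step 1 might look like the workflow below. The &lt;code&gt;prompt&lt;/code&gt; input and &lt;code&gt;response&lt;/code&gt; output names follow my reading of the &lt;code&gt;actions/ai-inference&lt;/code&gt; README and could drift; note that posting the comment needs &lt;code&gt;pull-requests: write&lt;/code&gt; on top of &lt;code&gt;models: read&lt;/code&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;name: PR Summary
on:
  pull_request:
    types: [opened]
permissions:
  contents: read
  models: read
  pull-requests: write  # required to post the summary comment
jobs:
  summarize:
    runs-on: ubuntu-latest
    steps:
      - id: ai
        uses: actions/ai-inference@v1
        with:
          prompt: |
            Summarize this pull request for reviewers, in five bullets or fewer.
            Title: ${{ github.event.pull_request.title }}
            Body: ${{ github.event.pull_request.body }}
      - uses: actions/github-script@v7
        with:
          script: |
            await github.rest.issues.createComment({
              ...context.repo,
              issue_number: context.issue.number,
              body: ${{ toJSON(steps.ai.outputs.response) }},
            });
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The value of starting here is that the failure mode is harmless: a bad summary is just a comment you ignore.&lt;/p&gt;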

&lt;h3&gt;
  
  
  Phase 3: Continuous AI Workflows (Month 2)
&lt;/h3&gt;

&lt;p&gt;Now you're ready for event-driven AI automation — but start with the safest patterns.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Deploy your first &lt;a href="https://github.blog/ai-and-ml/automate-repository-tasks-with-github-agentic-workflows/" rel="noopener noreferrer"&gt;GitHub Agentic Workflow&lt;/a&gt;.&lt;/strong&gt; Issue triage is the safest starting point because it's constrained to labeling and commenting — no code changes, no deploys, no infrastructure mutations. Use &lt;code&gt;safe-outputs&lt;/code&gt; to restrict the agent to only adding labels from a predefined set. I walked through this exact setup in &lt;a href="https://htek.dev/articles/github-agentic-workflows-hands-on-guide/" rel="noopener noreferrer"&gt;my hands-on guide&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Add Continuous Documentation.&lt;/strong&gt; Set up a scheduled workflow that scans for doc/code drift and opens PRs to fix it. This is a high-value, low-risk automation — the worst outcome is an unnecessary PR that you close. &lt;a href="https://microsoft.github.io/genaiscript/" rel="noopener noreferrer"&gt;GenAIScript&lt;/a&gt; is ideal for this pattern since it can access git diffs and apply file edits natively.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Implement CI failure analysis.&lt;/strong&gt; When builds break, have an AI agent post an analysis comment explaining the likely cause and suggesting a fix. This doesn't &lt;em&gt;change&lt;/em&gt; anything — it just speeds up the human developer's debugging cycle. The full potential of this pattern — where agents not only diagnose failures but &lt;a href="https://htek.dev/articles/ai-fixes-its-own-bugs/" rel="noopener noreferrer"&gt;autonomously fix their own bugs&lt;/a&gt; — is where teams graduate to once trust is established.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Set up prompt evaluations with &lt;code&gt;gh models eval&lt;/code&gt;.&lt;/strong&gt; Start testing your AI prompts the same way you test your code. Define expected outputs, run evaluations in CI, and catch prompt regressions before they reach production. This is &lt;a href="https://github.blog/changelog/2025-06-06-you-can-now-run-model-evaluations-with-the-models-cli/" rel="noopener noreferrer"&gt;quality engineering for your AI layer&lt;/a&gt;.&lt;/li&gt;
&lt;/ol&gt;
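&lt;p&gt;To make step 1 concrete, here's a minimal sketch of what an issue-triage workflow file can look like. The frontmatter keys below follow the shapes shown in GitHub's &lt;code&gt;gh-aw&lt;/code&gt; announcement, but the feature is still in preview and the schema may differ — treat the key names and the label set as illustrative, not canonical.&lt;/p&gt;

```markdown
---
# Illustrative gh-aw frontmatter — verify key names against the current docs.
on:
  issues:
    types: [opened, reopened]
permissions:
  contents: read
safe-outputs:
  add-labels:
    max: 3
---

# Issue Triage

Read the newly opened issue and choose up to three labels from this
predefined set: `bug`, `feature`, `question`, `docs`.
Do not edit code, close issues, or post comments — only apply labels.
```

&lt;p&gt;Note how the natural-language body carries the policy while &lt;code&gt;safe-outputs&lt;/code&gt; carries the enforcement: even if the prompt is ignored, the agent has no write path other than labeling.&lt;/p&gt;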

&lt;h3&gt;
  
  
  Phase 4: Enforcement Layer (Month 3)
&lt;/h3&gt;

&lt;p&gt;This is where most teams stall — and it's the phase that matters most. Without enforcement, everything you built in Phases 2–3 is running on trust alone.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Install &lt;a href="https://github.com/htekdev/gh-hookflow" rel="noopener noreferrer"&gt;&lt;code&gt;gh-hookflow&lt;/code&gt;&lt;/a&gt; and define your first hooks.&lt;/strong&gt; Start with three non-negotiable rules: block edits to sensitive files (&lt;code&gt;.env&lt;/code&gt;, secrets, credentials), require tests with source changes, and block dangerous shell commands. I covered the full setup in &lt;a href="https://htek.dev/articles/agent-hooks-controlling-ai-codebase/" rel="noopener noreferrer"&gt;my agent hooks article&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Add architectural boundary enforcement.&lt;/strong&gt; If your codebase has layers (domain → application → infrastructure), add hooks that prevent cross-layer violations. This catches the most expensive category of AI-generated bugs — structural mistakes that compile fine but violate your architecture.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Implement coverage ratchets.&lt;/strong&gt; Configure your test enforcement so coverage thresholds can only go up, never down. Layer-aware ratchets are ideal: 90% for core domain, 80% for application services, 70% for infrastructure. I detailed this approach in &lt;a href="https://htek.dev/articles/test-enforcement-architecture-ai-agents/" rel="noopener noreferrer"&gt;my test enforcement architecture article&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Validate your hooks are actually working.&lt;/strong&gt; Run &lt;code&gt;gh hookflow validate&lt;/code&gt; on every hookflow file. Then deliberately try to violate each rule and confirm the hook blocks it. Untested enforcement is worse than no enforcement — it gives false confidence.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Involve security and compliance stakeholders.&lt;/strong&gt; Enterprise teams operating under SOC 2, SOX, or HIPAA requirements should bring security and compliance leads into Phase 4 early. The enforcement layer you're building here — agent hooks, safe-outputs, detection jobs — is what produces the audit evidence those frameworks demand. Getting compliance buy-in now prevents painful retrofitting later.&lt;/li&gt;
&lt;/ol&gt;
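&lt;p&gt;As a sketch of what rule 1 might look like: &lt;code&gt;gh-hookflow&lt;/code&gt; uses GitHub Actions-style YAML, so a sensitive-file block could be expressed roughly as below. The field names here are illustrative guesses, not the actual &lt;code&gt;gh-hookflow&lt;/code&gt; schema — consult the repository's examples for the real syntax.&lt;/p&gt;

```yaml
# Illustrative only — field names are assumptions, not the real schema.
on: pre-tool-use            # fire before the agent executes a tool call
jobs:
  block-sensitive-files:
    if: tool == 'edit'
    steps:
      - name: Deny edits to secrets and env files
        deny:
          paths:
            - ".env*"
            - "**/secrets/**"
            - "**/*.pem"
          message: "Edits to credential files require human review."
```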

&lt;h3&gt;
  
  
  Phase 5: Full Agentic Stack (Month 4+)
&lt;/h3&gt;

&lt;p&gt;With the enforcement layer in place, you can safely scale up.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Deploy multiple &lt;code&gt;gh-aw&lt;/code&gt; workflows&lt;/strong&gt; across different repository events — issue triage, documentation maintenance, code review, and test improvement. Each workflow gets its own Markdown file, its own &lt;code&gt;safe-outputs&lt;/code&gt; constraints, and its own detection jobs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Build an agent harness&lt;/strong&gt; for complex multi-step automations. The harness owns the agentic loop, tracks every iteration, and provides observability into what agents are doing and why. I covered the architecture in &lt;a href="https://htek.dev/articles/agent-harnesses-controlling-ai-agents-2026/" rel="noopener noreferrer"&gt;my agent harnesses article&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Implement coverage ratchets that increase over time.&lt;/strong&gt; As your test suite grows, automatically tighten the thresholds. This creates a flywheel — more coverage enables more aggressive automation, which generates more coverage.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Set up audit trails and token cost monitoring.&lt;/strong&gt; Track every agent decision, every tool call, and every dollar spent on model inference. MCP Gateway logs and API Proxy logs are your primary data sources. If you can't answer "what did the agent do and why?" for any given workflow run, you don't have enough observability.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Run regular red-team exercises.&lt;/strong&gt; Attempt prompt injection through every input surface your agents read — issue titles, PR descriptions, commit messages, code comments. The &lt;a href="https://snyk.io/blog/cline-supply-chain-attack-prompt-injection-github-actions/" rel="noopener noreferrer"&gt;Clinejection post-mortem&lt;/a&gt; is your playbook for what to test.&lt;/li&gt;
&lt;/ol&gt;
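&lt;p&gt;The coverage-ratchet idea in step 3 is simple enough to sketch in a few lines: compare current coverage against a stored floor, fail the run if coverage dropped, and raise the floor whenever it improves. This is a generic illustration (the threshold-file format is made up), not tied to any particular coverage tool.&lt;/p&gt;

```python
import json
from pathlib import Path

def ratchet(current: float, threshold_file: Path) -> float:
    """Fail if coverage fell below the recorded floor; raise the floor if it improved."""
    floor = json.loads(threshold_file.read_text())["floor"] if threshold_file.exists() else 0.0
    if current < floor:
        raise SystemExit(f"Coverage {current:.1f}% is below the ratcheted floor of {floor:.1f}%")
    if current > floor:
        # Ratchet up: the new floor is the coverage we just achieved.
        threshold_file.write_text(json.dumps({"floor": current}))
        floor = current
    return floor
```

&lt;p&gt;Run it in CI after your coverage report, once per layer (e.g. a domain floor and a separate infrastructure floor), and you get the layer-aware, monotonically tightening thresholds described above.&lt;/p&gt;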

&lt;h3&gt;
  
  
  Common Mistakes to Avoid
&lt;/h3&gt;

&lt;p&gt;I've watched dozens of teams adopt agentic DevOps practices over the past year. The same mistakes show up repeatedly, and every one of them is preventable.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Skipping the enforcement layer.&lt;/strong&gt; This is mistake number one, and it's the most dangerous. Teams deploy AI workflows in Phase 2, see productivity gains, and assume they can skip Phase 4. Then an agent introduces a subtle architectural violation that doesn't surface for weeks — because it compiles, passes lint, and even passes the existing tests. Without pre-tool hooks enforcing structural rules, you're relying on AI to follow instructions it may not prioritize.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Treating AI output as trusted by default.&lt;/strong&gt; Every AI-generated artifact — code, labels, comments, documentation — should be treated as untrusted input until verified. This isn't paranoia; it's the same principle that web security has operated on for decades. The moment you pipe AI output directly into a shell command or database query without validation, you've created an injection surface. Use &lt;code&gt;safe-outputs&lt;/code&gt; declarations, detection jobs, and human review gates.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Not monitoring token costs.&lt;/strong&gt; AI inference isn't free, and costs compound fast when you're running multiple agentic workflows on every PR, issue, and push event. I've seen teams burn through thousands of dollars in a single month because they deployed AI-powered code review on high-frequency monorepos without estimating the token volume. Set billing alerts, track cost-per-workflow-run, and optimize prompts for token efficiency. The &lt;a href="https://github.com/actions/ai-inference" rel="noopener noreferrer"&gt;&lt;code&gt;actions/ai-inference&lt;/code&gt;&lt;/a&gt; action outputs token usage metadata — use it.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Deploying autonomous agents before measuring AI-assisted ones.&lt;/strong&gt; The DORA data shows only 17% of teams use autonomous agents, but 90% use AI-assisted tools. There's wisdom in that gap. Start with AI that &lt;em&gt;suggests&lt;/em&gt; (code review comments, failure analysis, coverage reports) before deploying AI that &lt;em&gt;acts&lt;/em&gt; (auto-fixing, auto-merging, auto-deploying). The suggestion phase builds institutional knowledge about where AI excels and where it hallucinates — knowledge you need before handing it the keys.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Writing hookflows but never testing them.&lt;/strong&gt; A hookflow that doesn't fire on violation is worse than no hookflow at all — it creates a false sense of security. Every enforcement rule needs a corresponding test that deliberately triggers it and confirms the block. Run &lt;code&gt;gh hookflow validate&lt;/code&gt; in CI, and include red-team scenarios in your test suite. I covered validation patterns in &lt;a href="https://htek.dev/articles/cryptographic-approval-gates-ai-agents/" rel="noopener noreferrer"&gt;my article on building cryptographic approval gates&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Using one monolithic agent instead of many focused ones.&lt;/strong&gt; The pattern that works is a fleet of small, scoped workflows — one for triage, one for docs, one for test improvement — each with minimal permissions and tight &lt;code&gt;safe-outputs&lt;/code&gt;. A single agent with broad access and a do-everything prompt is the AI equivalent of a &lt;a href="https://htek.dev/articles/your-god-prompt-is-the-new-monolith/" rel="noopener noreferrer"&gt;god prompt monolith&lt;/a&gt;. Decompose, constrain, and specialize.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Ignoring the AI amplification effect on weak foundations.&lt;/strong&gt; The &lt;a href="https://dora.dev/dora-report-2025/" rel="noopener noreferrer"&gt;2025 DORA Report&lt;/a&gt; found a 7.2% drop in delivery stability for teams with weak foundations that adopted AI. If your tests are unreliable, your deploys are manual, or your incident response is ad-hoc — AI will amplify those problems, not fix them. Shore up the foundation first. Phase 1 exists for a reason.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
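&lt;p&gt;Mistake 2 has a mechanical fix worth spelling out: never pass model output downstream until it survives an allowlist check. A minimal sketch (the label set is hypothetical):&lt;/p&gt;

```python
ALLOWED_LABELS = {"bug", "feature", "question", "docs"}

def validate_labels(proposed: list[str]) -> list[str]:
    """Keep only labels from the predefined set; drop everything else the model invented."""
    accepted = [label for label in proposed if label in ALLOWED_LABELS]
    rejected = set(proposed) - ALLOWED_LABELS
    if rejected:
        # Log, don't apply — rejected values may be hallucinations or injection payloads.
        print(f"Dropped untrusted labels: {sorted(rejected)}")
    return accepted
```

&lt;p&gt;The same pattern generalizes to any AI output: validate against a closed schema before it touches a shell, a database, or an API.&lt;/p&gt;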

&lt;h2&gt;
  
  
  Tool Ecosystem Reference
&lt;/h2&gt;

&lt;p&gt;Here's a compact reference of the key tools across the agentic DevOps stack. I've organized them by the layer where they primarily operate, with maturity indicators so you know what's production-ready versus what's still experimental.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Maturity levels:&lt;/strong&gt; 🟢 GA (production-ready) · 🟡 Preview (usable with caveats) · 🔵 Open Source (community-maintained)&lt;/p&gt;

&lt;h3&gt;
  
  
  Platform &amp;amp; Runtime
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Maturity&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://docs.github.com/en/actions" rel="noopener noreferrer"&gt;GitHub Actions&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;CI/CD automation platform — the backbone everything else runs on&lt;/td&gt;
&lt;td&gt;🟢 GA&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://github.blog/ai-and-ml/automate-repository-tasks-with-github-agentic-workflows/" rel="noopener noreferrer"&gt;GitHub Agentic Workflows (&lt;code&gt;gh-aw&lt;/code&gt;)&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Markdown-authored AI automations that run coding agents inside Actions&lt;/td&gt;
&lt;td&gt;🟡 Preview&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://docs.github.com/en/copilot/using-github-copilot/using-the-copilot-coding-agent" rel="noopener noreferrer"&gt;GitHub Copilot Coding Agent&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Autonomous agent that writes code, creates PRs, and iterates on review feedback&lt;/td&gt;
&lt;td&gt;🟡 Preview&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://docs.github.com/en/github-models/about-github-models" rel="noopener noreferrer"&gt;GitHub Models&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Model catalog for accessing AI models directly from GitHub&lt;/td&gt;
&lt;td&gt;🟢 GA&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  AI Integration &amp;amp; Scripting
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Maturity&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://github.com/actions/ai-inference" rel="noopener noreferrer"&gt;&lt;code&gt;actions/ai-inference&lt;/code&gt;&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;GitHub Action for calling AI models inside workflows with inline or file-based prompts&lt;/td&gt;
&lt;td&gt;🟡 Preview&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://microsoft.github.io/genaiscript/" rel="noopener noreferrer"&gt;GenAIScript&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Microsoft's open-source scripting framework for composable LLM-powered automations&lt;/td&gt;
&lt;td&gt;🔵 Open Source&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://github.com/github/gh-models" rel="noopener noreferrer"&gt;&lt;code&gt;gh models&lt;/code&gt;&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;CLI extension for model inference, REPL debugging, and prompt evaluations&lt;/td&gt;
&lt;td&gt;🟢 GA&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://githubnext.com/projects/copilot-sdk/" rel="noopener noreferrer"&gt;GitHub Copilot SDK&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Build Copilot-powered agents into any application&lt;/td&gt;
&lt;td&gt;🟡 Preview&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Governance &amp;amp; Enforcement
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Maturity&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://github.com/htekdev/gh-hookflow" rel="noopener noreferrer"&gt;&lt;code&gt;gh-hookflow&lt;/code&gt;&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Pre-tool-use enforcement hooks for AI agents using GitHub Actions YAML syntax&lt;/td&gt;
&lt;td&gt;🔵 Open Source&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://github.blog/ai-and-ml/automate-repository-tasks-with-github-agentic-workflows/" rel="noopener noreferrer"&gt;&lt;code&gt;safe-outputs&lt;/code&gt;&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Declarative write constraints in &lt;code&gt;gh-aw&lt;/code&gt; — agents are read-only unless explicitly granted output types&lt;/td&gt;
&lt;td&gt;🟡 Preview&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://modelcontextprotocol.io/" rel="noopener noreferrer"&gt;MCP Gateway&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Gateway layer that mediates tool access between AI agents and external services over the Model Context Protocol&lt;/td&gt;
&lt;td&gt;🟡 Preview&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Observability &amp;amp; Measurement
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Maturity&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://dora.dev/" rel="noopener noreferrer"&gt;DORA Metrics&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Four key metrics for software delivery performance — deployment frequency, lead time, change failure rate, MTTR&lt;/td&gt;
&lt;td&gt;🟢 GA&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://github.blog/changelog/2025-06-06-you-can-now-run-model-evaluations-with-the-models-cli/" rel="noopener noreferrer"&gt;&lt;code&gt;gh models eval&lt;/code&gt;&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;CLI command for running prompt evaluations with scoring and custom judges&lt;/td&gt;
&lt;td&gt;🟢 GA&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Security &amp;amp; Supply Chain
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Maturity&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://docs.github.com/en/get-started/learning-about-github/about-github-advanced-security" rel="noopener noreferrer"&gt;GitHub Advanced Security&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Code scanning, secret scanning, dependency review — your Pillar 3 security baseline&lt;/td&gt;
&lt;td&gt;🟢 GA&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://docs.github.com/en/code-security/code-scanning/managing-code-scanning-alerts/responsible-use-autofix-code-scanning" rel="noopener noreferrer"&gt;Copilot Autofix&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;AI-generated fix suggestions for code scanning alerts&lt;/td&gt;
&lt;td&gt;🟢 GA&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://docs.npmjs.com/generating-provenance-statements" rel="noopener noreferrer"&gt;npm provenance&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Supply chain attestation for published packages — verifiable build origins&lt;/td&gt;
&lt;td&gt;🟢 GA&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;My recommendation:&lt;/strong&gt; Start with &lt;code&gt;actions/ai-inference&lt;/code&gt; (low barrier, read-only), graduate to &lt;code&gt;gh-aw&lt;/code&gt; for event-driven automation, and install &lt;code&gt;gh-hookflow&lt;/code&gt; the moment any agent writes code. That sequence — observe, automate, enforce — mirrors the roadmap above and matches what I've seen work across teams adopting &lt;a href="https://htek.dev/articles/agentic-devops-next-evolution-of-shift-left/" rel="noopener noreferrer"&gt;agentic DevOps patterns&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Where We Go From Here
&lt;/h2&gt;

&lt;p&gt;What I've laid out in this guide isn't a five-year prediction — it's a snapshot of what's happening &lt;em&gt;right now&lt;/em&gt;. Continuous AI is the &lt;strong&gt;first glimpse&lt;/strong&gt; of how DevOps as an entire discipline is evolving. Not a feature bolted onto existing pipelines, but a fundamental expansion of what DevOps means and who practices it.&lt;/p&gt;

&lt;p&gt;The numbers leave no room for ambiguity. 90% of developers already use AI in their workflows. DORA renamed their flagship report around AI. GitHub shipped Agentic Workflows in technical preview. Gartner projects 90% enterprise adoption by 2028. This isn't future talk — it's present tense.&lt;/p&gt;

&lt;p&gt;New roles are opening up that didn't exist 18 months ago: &lt;strong&gt;Continuous AI Engineer&lt;/strong&gt;, &lt;strong&gt;Agentic DevOps Context Engineer&lt;/strong&gt;, &lt;strong&gt;Agent Governance Architect&lt;/strong&gt;. And here's what I want every DevOps practitioner reading this to internalize: just because "agentic development" has "development" in the name doesn't mean it's a developer-only discipline. DevOps engineers think in systems, pipelines, governance, and observability. That's &lt;em&gt;exactly&lt;/em&gt; the skill set this new era demands. You aren't being replaced — you're being promoted.&lt;/p&gt;

&lt;p&gt;If you take just one action after reading this guide, make it this: take a hard look at &lt;a href="https://github.blog/ai-and-ml/automate-repository-tasks-with-github-agentic-workflows/" rel="noopener noreferrer"&gt;GitHub Agentic Workflows&lt;/a&gt;. Deploy an issue triage workflow. Read the &lt;a href="https://htek.dev/articles/github-agentic-workflows-hands-on-guide/" rel="noopener noreferrer"&gt;hands-on guide&lt;/a&gt;. Study how &lt;code&gt;safe-outputs&lt;/code&gt;, detection jobs, and Markdown-authored agents work. It's the most concrete implementation of where all of this is heading — and it's available today, not someday.&lt;/p&gt;

&lt;p&gt;The teams that move now will define the standards. The teams that wait will inherit someone else's.&lt;/p&gt;

&lt;p&gt;Build your enforcement layer. Deploy your first agent. Own the governance. The pipeline was always yours — now it's time to make it intelligent.&lt;/p&gt;

&lt;h2&gt;
  
  
  Further Reading
&lt;/h2&gt;

&lt;h3&gt;
  
  
  From the htek.dev Archive
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://htek.dev/articles/agentic-devops-next-evolution-of-shift-left/" rel="noopener noreferrer"&gt;The Next Evolution of Shift Left&lt;/a&gt; — Why agentic DevOps is the natural successor to shift-left testing and how governance must move to the point of creation.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://htek.dev/articles/agent-hooks-controlling-ai-codebase/" rel="noopener noreferrer"&gt;Agent Hooks: Controlling AI in Your Codebase&lt;/a&gt; — The three-pillar framework for agent governance and how pre-tool-use hooks close the enforcement gap.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://htek.dev/articles/test-enforcement-architecture-ai-agents/" rel="noopener noreferrer"&gt;Test Enforcement Architecture for AI Agents&lt;/a&gt; — Layer-aware coverage ratchets and line-level enforcement that keeps AI-generated code honest.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://htek.dev/articles/agent-proof-architecture-agentic-devops/" rel="noopener noreferrer"&gt;Agent-Proof Architecture&lt;/a&gt; — How to design systems that remain structurally sound even when AI agents are writing the code.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://htek.dev/articles/tests-are-everything-agentic-ai/" rel="noopener noreferrer"&gt;Tests Are Everything in Agentic AI&lt;/a&gt; — Why comprehensive test suites are the single most important enabler for autonomous AI development.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://htek.dev/articles/vibe-testing-when-ai-agents-goodhart-your-test-suite/" rel="noopener noreferrer"&gt;Vibe Testing: When AI Agents Goodhart Your Test Suite&lt;/a&gt; — The failure modes that emerge when AI-generated tests optimize for coverage metrics instead of real quality.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://htek.dev/articles/github-agentic-workflows-hands-on-guide/" rel="noopener noreferrer"&gt;GitHub Agentic Workflows Hands-On Guide&lt;/a&gt; — Step-by-step walkthrough building four production &lt;code&gt;gh-aw&lt;/code&gt; workflows from scratch.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://htek.dev/articles/agent-harnesses-controlling-ai-agents-2026/" rel="noopener noreferrer"&gt;Agent Harnesses: Controlling AI Agents in 2026&lt;/a&gt; — The control plane architecture for managing agent lifecycles, iteration inspection, and multi-provider support.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://htek.dev/articles/self-healing-infrastructure-with-agentic-ai/" rel="noopener noreferrer"&gt;Self-Healing Infrastructure with Agentic AI&lt;/a&gt; — How AI agents detect drift, remediate autonomously, and close the loop on infrastructure incidents.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://htek.dev/articles/ai-fixes-its-own-bugs/" rel="noopener noreferrer"&gt;AI Fixes Its Own Bugs&lt;/a&gt; — The CI failure analysis pattern taken to its logical conclusion — agents that diagnose, fix, and verify their own mistakes.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://htek.dev/articles/cryptographic-approval-gates-ai-agents/" rel="noopener noreferrer"&gt;Cryptographic Approval Gates for AI Agents&lt;/a&gt; — Hardware-backed approval flows that ensure no agent action reaches production without verified human authorization.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://htek.dev/articles/context-engineering-key-to-ai-development/" rel="noopener noreferrer"&gt;Context Engineering: The Key to AI Development&lt;/a&gt; — Why the quality of context you feed AI agents matters more than the model you choose.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://htek.dev/articles/agentic-ops-workflow-framework-for-ai-agents/" rel="noopener noreferrer"&gt;The Agentic-Ops Workflow Framework&lt;/a&gt; — The operational framework for running AI agents at scale with proper lifecycle management.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://htek.dev/articles/specs-equal-tests-terraform-ai-development/" rel="noopener noreferrer"&gt;Specs Equal Tests: Terraform and AI Development&lt;/a&gt; — The specs-as-tests principle applied to infrastructure-as-code and why it unlocks agentic IaC.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://htek.dev/articles/stanford-study-ai-roi-in-engineering/" rel="noopener noreferrer"&gt;Stanford Study: AI ROI in Engineering&lt;/a&gt; — What Stanford's research reveals about which teams actually extract ROI from AI coding tools.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://htek.dev/articles/choosing-the-right-ai-sdk/" rel="noopener noreferrer"&gt;Choosing the Right AI SDK&lt;/a&gt; — A practical comparison of AI SDKs for building custom agents and agentic workflows.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://htek.dev/articles/your-god-prompt-is-the-new-monolith/" rel="noopener noreferrer"&gt;Your God Prompt Is the New Monolith&lt;/a&gt; — Why single monolithic agent prompts fail and how to decompose into focused, composable workflows.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://htek.dev/articles/turning-ai-skeptics-into-believers/" rel="noopener noreferrer"&gt;Turning AI Skeptics into Believers&lt;/a&gt; — Bridging the trust gap with incremental wins and measurable results.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://htek.dev/articles/copilot-developer-fulfillment/" rel="noopener noreferrer"&gt;Copilot and Developer Fulfillment&lt;/a&gt; — The human side of AI adoption — how developer satisfaction and creativity improve with the right tooling.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  External Resources
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.blog/ai-and-ml/automate-repository-tasks-with-github-agentic-workflows/" rel="noopener noreferrer"&gt;GitHub Blog: Automate Repository Tasks with GitHub Agentic Workflows&lt;/a&gt; — The official launch post with architecture details and usage patterns.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://githubnext.com/projects/continuous-ai/" rel="noopener noreferrer"&gt;GitHub Next: Continuous AI&lt;/a&gt; — Idan Gazit's foundational framing of Continuous AI as a 30-year category alongside CI/CD.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dora.dev/dora-report-2025/" rel="noopener noreferrer"&gt;2025 DORA Report: State of AI-assisted Software Development&lt;/a&gt; — The renamed DORA report confirming AI as amplifier for organizational health, with data from nearly 5,000 respondents.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://cloud.google.com/blog/products/ai-machine-learning/announcing-the-2025-dora-report" rel="noopener noreferrer"&gt;Google Cloud Blog: Announcing the 2025 DORA Report&lt;/a&gt; — The announcement covering DORA's seven organizational capabilities for AI success.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/githubnext/awesome-continuous-ai" rel="noopener noreferrer"&gt;awesome-continuous-ai&lt;/a&gt; — GitHub Next's curated list of Continuous AI tools, patterns, and GenAIScript examples.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://snyk.io/blog/cline-supply-chain-attack-prompt-injection-github-actions/" rel="noopener noreferrer"&gt;Snyk: Clinejection Supply Chain Attack Analysis&lt;/a&gt; — The definitive post-mortem on the prompt injection attack that compromised 4,000 developers.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/actions/ai-inference" rel="noopener noreferrer"&gt;actions/ai-inference&lt;/a&gt; — The GitHub Action for calling AI models inside workflows.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://microsoft.github.io/genaiscript/" rel="noopener noreferrer"&gt;GenAIScript&lt;/a&gt; — Microsoft's open-source scripting framework for composable LLM-powered automations.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://modelcontextprotocol.io/" rel="noopener noreferrer"&gt;Model Context Protocol (MCP)&lt;/a&gt; — The protocol standard for mediating tool access between AI agents and external services.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>devops</category>
      <category>github</category>
      <category>automation</category>
    </item>
    <item>
      <title>Azure Weekly: GPT-5.5 Lands in Foundry, MCP Comes to Copilot Studio</title>
      <dc:creator>Hector Flores</dc:creator>
      <pubDate>Wed, 06 May 2026 16:04:19 +0000</pubDate>
      <link>https://dev.to/htekdev/azure-weekly-gpt-55-lands-in-foundry-mcp-comes-to-copilot-studio-1m03</link>
      <guid>https://dev.to/htekdev/azure-weekly-gpt-55-lands-in-foundry-mcp-comes-to-copilot-studio-1m03</guid>
      <description>&lt;h2&gt;
  
  
  The Frontier Model Arrives
&lt;/h2&gt;

&lt;p&gt;OpenAI's &lt;a href="https://azure.microsoft.com/en-us/blog/openais-gpt-5-5-in-microsoft-foundry-frontier-intelligence-on-an-enterprise-ready-platform/" rel="noopener noreferrer"&gt;GPT-5.5 is now generally available in Microsoft Foundry&lt;/a&gt;, and this isn't just another incremental model release. GPT-5.5 represents a clear evolution from "smart chatbot" to "production agent"—built specifically for sustained, multi-step professional work where the cost of imprecision is high.&lt;/p&gt;

&lt;p&gt;The improvements matter for anyone building agentic systems. Better computer-use accuracy means fewer hallucinated UI actions. Deeper long-context reasoning means agents can hold architectural intent across large codebases. Token efficiency means lower costs at scale. GPT-5.5 Pro extends this further for the most complex enterprise workflows, though at a premium: $30/M input tokens and $180/M output tokens versus GPT-5.5's $5 and $30 respectively.&lt;/p&gt;
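&lt;p&gt;To put those prices in perspective, here's the arithmetic for a hypothetical agent run that consumes 200K input tokens and produces 20K output tokens (the token counts are made up for illustration):&lt;/p&gt;

```python
def run_cost(input_tokens: int, output_tokens: int, in_per_m: float, out_per_m: float) -> float:
    """Cost in dollars given per-million-token rates."""
    return input_tokens / 1e6 * in_per_m + output_tokens / 1e6 * out_per_m

# GPT-5.5: $5/M input, $30/M output; GPT-5.5 Pro: $30/M input, $180/M output
base = run_cost(200_000, 20_000, 5, 30)    # $1.60 per run
pro  = run_cost(200_000, 20_000, 30, 180)  # $9.60 per run
```

&lt;p&gt;At these rates Pro is 6x the cost of the base model, so reserve it for workflows where the extra reasoning depth demonstrably pays for itself.&lt;/p&gt;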

&lt;p&gt;But here's what actually matters: &lt;strong&gt;Microsoft Foundry is positioning itself as the operating system for agents at scale&lt;/strong&gt;. The blog post makes this explicit—you can define agents in YAML, LangGraph, Claude Agent SDK, OpenAI Agents SDK, GitHub Copilot SDK, or Microsoft Agent Framework, and &lt;a href="https://learn.microsoft.com/en-us/azure/ai-foundry/agent-service/" rel="noopener noreferrer"&gt;Foundry Agent Service&lt;/a&gt; runs them all with isolated sandboxes, persistent filesystems, distinct Entra identities, and scale-to-zero pricing. That's the infrastructure play that makes frontier models actually operationalizable.&lt;/p&gt;

&lt;h2&gt;
  
  
  MCP Arrives in Copilot Studio
&lt;/h2&gt;

&lt;p&gt;Speaking of infrastructure plays, &lt;a href="https://learn.microsoft.com/en-us/power-platform/release-plan/2026wave1/microsoft-copilot-studio/use-mcp-compliant-tools-agent-workflows" rel="noopener noreferrer"&gt;Copilot Studio is adding Model Context Protocol (MCP) support&lt;/a&gt; in public preview this month (May 2026), with general availability planned for October.&lt;/p&gt;

&lt;p&gt;If you've been following the agent ecosystem, you know MCP is Anthropic's open protocol for connecting AI systems to data sources and tools. It's becoming the standard way to extend agents without building bespoke connectors for every integration. With MCP in Copilot Studio, you can point agent workflows at any MCP-compliant server—proprietary systems, dynamic knowledge sources, custom actions—and they'll discover and invoke tools with structured inputs and outputs.&lt;/p&gt;
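&lt;p&gt;Under the hood, that discovery-and-invoke handshake is plain JSON-RPC. Per the MCP specification, a client lists available tools with &lt;code&gt;tools/list&lt;/code&gt; and invokes one with &lt;code&gt;tools/call&lt;/code&gt;; the tool name and arguments below are a hypothetical example, not a real server's API:&lt;/p&gt;

```python
import json

# A tools/call request as defined by the MCP spec (JSON-RPC 2.0).
# "lookup_order" and its arguments are a made-up example tool.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "lookup_order",
        "arguments": {"order_id": "A-1042"},
    },
}
wire = json.dumps(request)  # what actually goes over the transport
```

&lt;p&gt;Because every MCP server speaks this same shape, the connector you build once works across any agent runtime that supports the protocol — which is exactly the reuse story Copilot Studio is buying into.&lt;/p&gt;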

&lt;p&gt;This is a smart move. Instead of Microsoft building walled-garden integrations, they're embracing an emerging standard that already has ecosystem momentum. The same MCP server works across multiple agents and workflows, reducing duplication and accelerating extensibility while keeping workflow governance intact.&lt;/p&gt;

&lt;p&gt;For context, I wrote about &lt;a href="https://htek.dev/articles/github-copilot-sdk-agents-for-every-app/" rel="noopener noreferrer"&gt;how GitHub Copilot SDK enables agents for every app&lt;/a&gt;—MCP support in Copilot Studio follows the same pattern of making agent capabilities composable and reusable across platforms.&lt;/p&gt;

&lt;h2&gt;
  
  
  Databricks Goes Agent-First
&lt;/h2&gt;

&lt;p&gt;Azure Databricks shipped several updates this month that signal where data engineering workflows are headed. The &lt;a href="https://learn.microsoft.com/en-us/azure/databricks/release-notes/product/2026/may" rel="noopener noreferrer"&gt;Lakeflow Pipelines Editor is now GA&lt;/a&gt;, and it's explicitly built as an "agent-first experience" with Genie Code integrated directly into the pipeline development flow.&lt;/p&gt;

&lt;p&gt;You write ETL pipelines with AI assistance side-by-side with the pipeline graph and metrics. The &lt;a href="https://learn.microsoft.com/en-us/azure/databricks/release-notes/product/2026/may" rel="noopener noreferrer"&gt;GitHub connector for Lakeflow Connect&lt;/a&gt; hit beta, meaning you can now ingest GitHub data directly into Databricks. This matters if you're building data pipelines that pull from code repositories—issue tracking, PR metadata, code metrics, contributor activity.&lt;/p&gt;

&lt;p&gt;Databricks Runtime 18.2 also went GA, and notebooks gained native data profiling for result tables — small quality-of-life improvements that reduce context switching.&lt;/p&gt;

&lt;h2&gt;
  
  
  Storage Gets Smarter
&lt;/h2&gt;

&lt;p&gt;Azure Blob and Data Lake Storage's &lt;a href="https://azure.microsoft.com/en-us/blog/optimize-object-storage-costs-automatically-with-smart-tier-now-generally-available/" rel="noopener noreferrer"&gt;smart tier is now generally available&lt;/a&gt;. This is a fully managed auto-tiering capability that continuously optimizes data placement based on access patterns without operational overhead.&lt;/p&gt;

&lt;p&gt;Since the public preview at Ignite 2025, over 50% of smart-tier-managed capacity has automatically moved to cooler (cheaper) tiers. You pay standard hot, cool, and cold capacity rates with no additional charges for tier transitions, early deletion, or retrieval. The only extra cost is a monitoring fee for orchestration.&lt;/p&gt;

&lt;p&gt;If you're managing a large data estate and still hand-tuning blob lifecycle policies, smart tier is a no-brainer. Set it and forget it.&lt;/p&gt;
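&lt;p&gt;To make the tiering math concrete, here's a back-of-the-envelope sketch. The per-GB rates below are illustrative placeholders, not real Azure prices; plug in the published rates for your region and redundancy option.&lt;/p&gt;

```javascript
// Back-of-the-envelope smart-tier savings estimate.
// Rates are ILLUSTRATIVE placeholders, not real Azure pricing.
const ratesPerGbMonth = { hot: 0.018, cool: 0.01, cold: 0.0036 };

// Sum the monthly capacity cost for a given distribution of data
function monthlyCost(gbByTier) {
  return Object.entries(gbByTier).reduce(
    (sum, [tier, gb]) => sum + gb * ratesPerGbMonth[tier],
    0
  );
}

// 100 TB sitting entirely in hot, vs. half of it auto-tiered down
const allHot = monthlyCost({ hot: 102400 });
const tiered = monthlyCost({ hot: 51200, cool: 30720, cold: 20480 });
console.log(allHot.toFixed(2), tiered.toFixed(2));
```

&lt;p&gt;Smart tier's pitch is that the second number happens automatically, with no early-deletion or retrieval charges eating into the difference.&lt;/p&gt;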

&lt;h2&gt;
  
  
  Reserved VM Instances: Act Before July 1
&lt;/h2&gt;

&lt;p&gt;One pricing change to watch: Microsoft is &lt;a href="https://learn.microsoft.com/en-us/partner-center/announcements/2026-may" rel="noopener noreferrer"&gt;discontinuing new purchases and renewals of Reserved VM Instances&lt;/a&gt; for select VM series starting &lt;strong&gt;July 1, 2026&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;One-year RIs are ending for Av2, Amv2, Bv1, D, Ds, Dv2, Dsv2, F, Fs, Fsv2, G, Gs, Ls, and Lsv2. Both one-year and three-year RIs are ending for Dv3, Dsv3, Ev3, and Esv3. If you have workloads on these series and don't take action before July 1, you'll be billed at pay-as-you-go rates once your RI expires—even if auto-renew is enabled.&lt;/p&gt;

&lt;p&gt;Existing RIs will honor their full term, but new purchases are done. This is Microsoft nudging customers toward newer VM families and potentially Azure Savings Plans, which offer more flexibility across compute services.&lt;/p&gt;
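&lt;p&gt;If you want to size the exposure, the arithmetic is simple. The hourly rates below are hypothetical examples, not Azure list prices; substitute your actual RI and pay-as-you-go rates from the pricing calculator.&lt;/p&gt;

```javascript
// Rough annual cost impact of an RI lapsing to pay-as-you-go.
// Both rates are HYPOTHETICAL examples, not Azure list prices.
const paygRate = 0.192; // on-demand $/hr
const riRate = 0.118;   // effective $/hr under a one-year RI

function annualDelta(payg, ri, vmCount) {
  const hoursPerYear = 24 * 365;
  return (payg - ri) * hoursPerYear * vmCount;
}

// Ten always-on VMs whose reservation expires without a replacement
console.log(annualDelta(paygRate, riRate, 10).toFixed(0));
```

&lt;p&gt;Run the same math against an Azure Savings Plan quote before July 1, and the migration decision usually makes itself.&lt;/p&gt;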

&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;This week's Azure updates reveal a consistent direction: &lt;strong&gt;agent infrastructure is becoming a first-class platform concern&lt;/strong&gt;. GPT-5.5 in Foundry with hosted agent services, MCP support in Copilot Studio, and agent-first tooling in Databricks all point to the same shift—AI capabilities are moving from experimental notebooks to production systems with real governance, identity, and scale requirements.&lt;/p&gt;

&lt;p&gt;The storage and pricing changes are table stakes, but the agent story is where Azure is making its biggest bet. If you're building agentic systems, the infrastructure gap between "demo" and "production" just got a lot smaller. Foundry Agent Service, MCP extensibility, and isolated agent execution with Entra identities are the primitives that actually matter when you're running thousands of agents, not dozens.&lt;/p&gt;

&lt;p&gt;The race is on to see which cloud provider builds the best operating system for agentic AI. This week, Azure shipped real infrastructure.&lt;/p&gt;

</description>
      <category>azure</category>
      <category>ai</category>
      <category>devex</category>
      <category>devops</category>
    </item>
    <item>
      <title>The Agentic Development Maturity Curve: Why Experts Return to Simplicity</title>
      <dc:creator>Hector Flores</dc:creator>
      <pubDate>Wed, 06 May 2026 12:04:38 +0000</pubDate>
      <link>https://dev.to/htekdev/the-agentic-development-maturity-curve-why-experts-return-to-simplicity-2k2g</link>
      <guid>https://dev.to/htekdev/the-agentic-development-maturity-curve-why-experts-return-to-simplicity-2k2g</guid>
      <description>&lt;h2&gt;
  
  
  The Graph Nobody Draws
&lt;/h2&gt;

&lt;p&gt;There's a pattern I keep seeing in agentic development that almost nobody talks about. It looks like an inverted U:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Complexity
    │
    │         ╭──────╮
    │        ╱        ╲
    │       ╱          ╲
    │      ╱            ╲
    │     ╱              ╲
    │    ╱                ╲
    │───╱                  ╲───
    │
    └───────────────────────────→ Maturity
       Stage 1    Stage 2    Stage 3
       "Build me   "Multi-agent  "Just talk
        an app"    orchestration"  to it"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Stage 1&lt;/strong&gt;: Low maturity, low complexity. You throw one big prompt at an agent. "Build me an app." That's what you think agentic coding is.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stage 2&lt;/strong&gt;: Mid maturity, HIGH complexity. Multiple agents, hooks, hookflows, governance patterns, test-driven development with agents, skill extraction, orchestration layers. Everything is meticulously organized.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stage 3&lt;/strong&gt;: High maturity, LOW complexity. You go back to simple prompts and one agent. That's all you need. Simple planning, simple executing, proper steering.&lt;/p&gt;

&lt;p&gt;I saw this concept articulated perfectly in &lt;a href="https://youtu.be/wKy1_KLcxcs?si=9KSBK0fArnaaXlQz" rel="noopener noreferrer"&gt;Peter Steinberger's conversation with Lex Fridman&lt;/a&gt; about agentic engineering. Steinberger — creator of OpenClaw — described essentially this same curve. His blog post title says it all: &lt;a href="https://steipete.me/posts/just-talk-to-it" rel="noopener noreferrer"&gt;"Just Talk To It."&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It hit me because I've lived all three stages. And the insight that changed my workflow isn't a new framework or tool — it's the realization that &lt;strong&gt;the simplicity on the other side of complexity is earned, not lazy.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Stage 1: The God Prompt Era
&lt;/h2&gt;

&lt;p&gt;When I started with agentic coding, I did what everyone does. I tried to spec out an entire application in one massive prompt. 2,000 words of requirements, architecture decisions, and implementation details — all jammed into a single message.&lt;/p&gt;

&lt;p&gt;I've &lt;a href="https://htek.dev/articles/your-god-prompt-is-the-new-monolith/" rel="noopener noreferrer"&gt;written about this anti-pattern before&lt;/a&gt;. The god prompt is the new monolith. It feels productive because you're being "thorough." In reality, you're overwhelming the agent with conflicting instructions and getting mediocre results.&lt;/p&gt;

&lt;p&gt;Most developers stay here for a while. They conclude that "AI coding doesn't really work" and go back to writing everything by hand. They never see what's on the other side.&lt;/p&gt;

&lt;h2&gt;
  
  
  Stage 2: The Complexity Peak
&lt;/h2&gt;

&lt;p&gt;Once I got past the god prompt phase, I went &lt;em&gt;deep&lt;/em&gt;. I'm talking:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Test-driven development with agents&lt;/strong&gt; — writing comprehensive test suites first, then letting agents implement against them. I wrote about this in &lt;a href="https://htek.dev/articles/tests-are-everything-agentic-ai/" rel="noopener noreferrer"&gt;Tests Are Everything in Agentic AI&lt;/a&gt; and it works incredibly well for ensuring correctness.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Agent hooks&lt;/strong&gt; — filesystem-level governance that intercepts agent operations before they execute. I built &lt;a href="https://htek.dev/articles/agent-hooks-controlling-ai-codebase/" rel="noopener noreferrer"&gt;hook-based systems&lt;/a&gt; to enforce architecture boundaries, mock policies, and layer rules.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Hookflows&lt;/strong&gt; — multi-step validation pipelines that chain hooks together for complex governance. Pre-commit checks, lint enforcement, automated review gates.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Multi-agent orchestration&lt;/strong&gt; — dedicated agents for different domains, each with their own memory, skills, and communication protocols. I &lt;a href="https://htek.dev/articles/copilot-home-assistant-ai-runs-my-household/" rel="noopener noreferrer"&gt;open-sourced the home assistant&lt;/a&gt; that runs my family's entire life: 17+ agents, 16 extensions, 15 cron jobs, all coordinated through an agent mesh.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The Research → Plan → Implement pipeline&lt;/strong&gt; — a &lt;a href="https://htek.dev/articles/research-plan-implement-anti-vibe-coding-workflow/" rel="noopener noreferrer"&gt;structured anti-vibe-coding workflow&lt;/a&gt; with explicit human review gates between phases.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Skill extraction&lt;/strong&gt; — identifying repeatable agent capabilities and extracting them into portable, testable, composable skills that any agent can invoke.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Every one of these techniques is &lt;em&gt;valid&lt;/em&gt;. They solve real problems. TDD catches hallucinations. Hooks prevent architecture violations. Multi-agent patterns enable genuine specialization. I stand by all of it.&lt;/p&gt;

&lt;p&gt;But here's what nobody tells you: &lt;strong&gt;this is the peak of the complexity curve, not the destination.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Stage 3: Earned Simplicity
&lt;/h2&gt;

&lt;p&gt;Here's where I am now for actual software development work: I open GitHub Copilot, write a simple prompt, and let the agent plan and execute. That's it.&lt;/p&gt;

&lt;p&gt;No elaborate hook chains. No multi-agent orchestration for a single feature. No 47-step governance pipeline. Just:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Clear, simple prompt&lt;/strong&gt; — what I want, why, and any critical constraints&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Let the agent plan&lt;/strong&gt; — review its approach, steer if needed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Let it execute&lt;/strong&gt; — monitor, course-correct, done&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Peter Steinberger describes the same thing. He runs 3–8 parallel agent instances with simple prompts — the principle holds regardless of which tool you choose. No complex hook systems. He thinks about "blast radius" — how big the change is — and adjusts his prompts accordingly. When something goes sideways, he just stops and says "what's the status."&lt;/p&gt;

&lt;p&gt;The key insight: &lt;strong&gt;this simplicity only works because of what you learned in Stage 2.&lt;/strong&gt; You internalized the mental models. You know:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;When to break a task into smaller pieces (blast radius thinking)&lt;/li&gt;
&lt;li&gt;How to write prompts that prevent common failure modes&lt;/li&gt;
&lt;li&gt;When to stop the agent and course-correct vs. letting it finish&lt;/li&gt;
&lt;li&gt;What "good enough" context looks like without over-engineering it&lt;/li&gt;
&lt;/ul&gt;
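&lt;p&gt;"Blast radius thinking" can even be written down. Here's a toy heuristic with thresholds I made up purely for illustration; the point is the mental model, not the numbers.&lt;/p&gt;

```javascript
// A toy "blast radius" heuristic for deciding task granularity.
// The scoring and thresholds are MADE UP for illustration --
// estimate how much surface a change touches before prompting.
function blastRadius({ filesTouched, crossesModuleBoundary, touchesPublicApi }) {
  let score = filesTouched;
  if (crossesModuleBoundary) score += 5;
  if (touchesPublicApi) score += 5;
  if (score > 12) return "split it: plan first, review between steps";
  if (score > 4) return "one agent task, but review the plan before execution";
  return "just talk to it: single prompt, monitor the diff";
}

console.log(blastRadius({ filesTouched: 2, crossesModuleBoundary: false, touchesPublicApi: false }));
console.log(blastRadius({ filesTouched: 9, crossesModuleBoundary: true, touchesPublicApi: true }));
```

&lt;p&gt;At Stage 3 this runs in your head in half a second. At Stage 1 it doesn't run at all.&lt;/p&gt;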

&lt;p&gt;This is what Oliver Wendell Holmes Jr. &lt;a href="https://www.colemanm.org/post/simplicity-on-the-other-side-of-complexity/" rel="noopener noreferrer"&gt;famously expressed&lt;/a&gt;: &lt;em&gt;"I would not give a fig for the simplicity on this side of complexity, but I would give my life for the simplicity on the other side of complexity."&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Crucial Distinction: Form Factor Matters
&lt;/h2&gt;

&lt;p&gt;One thing I need to be clear about: &lt;strong&gt;there are legitimate use cases for Stage 2 complexity.&lt;/strong&gt; My &lt;a href="https://htek.dev/articles/copilot-home-assistant-ai-runs-my-household/" rel="noopener noreferrer"&gt;home assistant platform&lt;/a&gt; — with its multi-agent orchestration, cron jobs, and governance layers — is genuinely complex. And it &lt;em&gt;should&lt;/em&gt; be. It's a persistent assistant managing a family's daily life. Different form factor, different requirements.&lt;/p&gt;

&lt;p&gt;But for &lt;strong&gt;software development workflows&lt;/strong&gt; — writing features, fixing bugs, building applications — high maturity means returning to simplicity. The agent is your pair programmer, not a Rube Goldberg machine.&lt;/p&gt;

&lt;p&gt;The same principle applies in traditional software engineering. Junior developers write complex code because they don't know better. Senior developers write simple code because they've earned it. &lt;a href="https://htek.dev/articles/context-engineering-key-to-ai-development/" rel="noopener noreferrer"&gt;Context engineering&lt;/a&gt; matters more than prompt engineering — knowing &lt;em&gt;what to feed the agent&lt;/em&gt; is more valuable than knowing how to construct elaborate instruction sets.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where Are You on the Curve?
&lt;/h2&gt;

&lt;p&gt;Here's a quick diagnostic:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You're in Stage 1 if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You write one massive prompt and expect a complete application&lt;/li&gt;
&lt;li&gt;You think "AI can't code" because your mega-prompts produce garbage&lt;/li&gt;
&lt;li&gt;You haven't tried iterative agent steering — just fire-and-forget&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;You're in Stage 2 if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You have elaborate governance frameworks around your agents&lt;/li&gt;
&lt;li&gt;You spend more time configuring agent infrastructure than building features&lt;/li&gt;
&lt;li&gt;You feel like you &lt;em&gt;need&lt;/em&gt; 5+ agents and hooks for every project&lt;/li&gt;
&lt;li&gt;Your agent workflow has more moving parts than the code it produces&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;You're in Stage 3 if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You use hooks sparingly — to augment, not as the core workflow&lt;/li&gt;
&lt;li&gt;You trust simple prompts because you know how to write them well&lt;/li&gt;
&lt;li&gt;You think in terms of "blast radius" before deciding task granularity&lt;/li&gt;
&lt;li&gt;Your agent interactions look like conversations with a skilled colleague&lt;/li&gt;
&lt;li&gt;You only reach for complex orchestration when the &lt;em&gt;problem domain&lt;/em&gt; demands it&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How to Accelerate Through Stage 2
&lt;/h2&gt;

&lt;p&gt;You can't skip Stage 2, but you can move through it faster:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Build one complex system end-to-end.&lt;/strong&gt; TDD with agents, hooks, multi-agent — learn what each technique actually solves. Then you'll know when you don't need them.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Study people at Stage 3.&lt;/strong&gt; Watch how Steinberger works. Notice what's &lt;em&gt;absent&lt;/em&gt; from their workflows. The tools they &lt;em&gt;don't&lt;/em&gt; use tell you more than the tools they do.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Audit your complexity regularly.&lt;/strong&gt; Ask: "Is this hook solving a real problem, or am I over-engineering because I can?" If your governance layer is more complex than your application logic, you've over-indexed.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;a href="https://htek.dev/articles/research-plan-implement-anti-vibe-coding-workflow/" rel="noopener noreferrer"&gt;Plan before you implement&lt;/a&gt;.&lt;/strong&gt; The anti-vibe-coding workflow still applies — but at Stage 3, the "plan" is a 3-sentence description, not a 40-page spec. The discipline is internalized.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Let the agent think.&lt;/strong&gt; &lt;a href="https://htek.dev/articles/plan-mode-vs-custom-agents-discovery/" rel="noopener noreferrer"&gt;Plan mode&lt;/a&gt; exists for a reason. Simple prompt + agent planning = surprisingly good results without elaborate scaffolding.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;The maturity curve for agentic development isn't a straight line toward more complexity. It's an inverted U. The developers getting the most out of AI agents today aren't the ones with the most elaborate orchestration systems — they're the ones who went through that phase and came out the other side.&lt;/p&gt;

&lt;p&gt;Mastery in agentic development looks deceptively like what beginners do: simple prompts, one agent, clear communication. The difference is invisible — it lives in the mental models, the prompt intuition, and the earned judgment about when complexity actually serves you.&lt;/p&gt;

&lt;p&gt;If you're deep in Stage 2 right now, building hooks and multi-agent systems and governance frameworks — that's good. You're learning. Just don't mistake the peak of complexity for the summit of mastery. The summit is simpler than you think.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>github</category>
      <category>devex</category>
      <category>architecture</category>
    </item>
    <item>
      <title>GitHub Weekly: Security Gets Real with Code-to-Cloud Visibility</title>
      <dc:creator>Hector Flores</dc:creator>
      <pubDate>Tue, 05 May 2026 18:04:05 +0000</pubDate>
      <link>https://dev.to/htekdev/github-weekly-security-gets-real-with-code-to-cloud-visibility-45h5</link>
      <guid>https://dev.to/htekdev/github-weekly-security-gets-real-with-code-to-cloud-visibility-45h5</guid>
      <description>&lt;h2&gt;
  
  
  The Week Security Got Runtime Context
&lt;/h2&gt;

&lt;p&gt;This week GitHub shipped something I didn't expect to see this fast: code-to-cloud correlation at GA. &lt;a href="https://github.blog/changelog/2026-05-05-code-to-cloud-risk-visibility-with-microsoft-defender-for-cloud-is-now-generally-available" rel="noopener noreferrer"&gt;Microsoft Defender for Cloud integration&lt;/a&gt; is now generally available, connecting your source code to what's actually running in production. That's not just another security dashboard—it's runtime-aware filtering across GitHub Advanced Security alerts.&lt;/p&gt;

&lt;p&gt;But the bigger news for most teams is billing. Starting June 1, GitHub Copilot code review will consume Actions minutes from your org's plan. If you've been treating code review as "free" beyond your Copilot subscription, that assumption just expired.&lt;/p&gt;

&lt;h2&gt;
  
  
  Code-to-Cloud Correlation: What Actually Shipped
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://github.blog/changelog/2026-05-05-code-to-cloud-risk-visibility-with-microsoft-defender-for-cloud-is-now-generally-available" rel="noopener noreferrer"&gt;Microsoft Defender integration&lt;/a&gt; does something genuinely useful: it correlates container images running in your cloud environments back to the GitHub repos that built them. Defender uses signals like GitHub artifact attestations plus its own runtime intelligence to map deployed workloads to source code.&lt;/p&gt;

&lt;p&gt;Once that link exists, you get runtime context on your security alerts. Is this vulnerable dependency actually deployed? Is it internet-exposed? Processing sensitive data? These aren't hypothetical questions anymore—the answers show up as filters in your GitHub Advanced Security alert views:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;has:deployment&lt;/code&gt; — focus on what's actually running&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;runtime-risk:internet-exposed&lt;/code&gt; — prioritize what attackers can reach&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;runtime-risk:sensitive-data&lt;/code&gt; — protect what actually matters&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This applies across code scanning, Dependabot, and security campaigns. I've written before about &lt;a href="https://htek.dev/articles/context-engineering-key-to-ai-development/" rel="noopener noreferrer"&gt;how context engineering drives AI productivity&lt;/a&gt;—this is context engineering for security teams. The alert noise drops when you can filter by "deployed and exposed" vs "in the codebase somewhere."&lt;/p&gt;

&lt;h2&gt;
  
  
  The Actions Minutes Reality Check
&lt;/h2&gt;

&lt;p&gt;Late last month GitHub announced what many teams missed: &lt;a href="https://github.blog/changelog/2026-04-27-github-copilot-code-review-will-start-consuming-github-actions-minutes-on-june-1-2026" rel="noopener noreferrer"&gt;Copilot code review will start consuming GitHub Actions minutes on June 1&lt;/a&gt;. This applies to private repos only—public repos stay free—but if you're on Copilot Pro, Business, or Enterprise and running code reviews on private code, your Actions budget just got tighter.&lt;/p&gt;

&lt;p&gt;The architecture behind this makes sense: Copilot code review runs on &lt;a href="https://htek.dev/articles/github-agentic-workflows-hands-on-guide/" rel="noopener noreferrer"&gt;agentic tool-calling infrastructure&lt;/a&gt; that executes on GitHub Actions runners. Those runners cost minutes. A typical code review consumes 2-6 Actions minutes; heavy reviews (large diffs, full context) can hit 15 minutes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What you need to do before June 1:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Check your current Actions usage in billing settings—do you have headroom?&lt;/li&gt;
&lt;li&gt;Review your spending limits. Set budgets if you haven't already.&lt;/li&gt;
&lt;li&gt;Decide if larger runners or self-hosted runners make sense. Self-hosted runners don't consume Actions minutes from your plan.&lt;/li&gt;
&lt;li&gt;Share this with your billing admin. This isn't a trivial line item if your team merges 50+ PRs a week.&lt;/li&gt;
&lt;/ol&gt;
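&lt;p&gt;A quick sketch of the budgeting math, assuming the standard GitHub-hosted Linux runner rate of $0.008 per minute; confirm the current rate and your plan's included minutes before relying on it.&lt;/p&gt;

```javascript
// Ballpark monthly Actions-minutes bill for Copilot code review.
// Assumes the standard GitHub-hosted Linux runner rate; verify
// against your plan's current pricing and included minutes.
const perMinuteUsd = 0.008;

function monthlyReviewCost(prsPerWeek, minutesPerReview, reviewsPerPr) {
  const minutes = prsPerWeek * 4.33 * minutesPerReview * reviewsPerPr;
  return { minutes: Math.round(minutes), usd: minutes * perMinuteUsd };
}

// 50 PRs/week, ~4 min per review (mid-range), two review rounds each
const est = monthlyReviewCost(50, 4, 2);
console.log(est.minutes, est.usd.toFixed(2));
```

&lt;p&gt;The dollar amount is modest at this scale; the real risk is blowing through a capped Actions budget and stalling CI mid-month.&lt;/p&gt;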

&lt;p&gt;GitHub also reminded everyone that Copilot usage is &lt;a href="https://github.blog/news-insights/company-news/github-copilot-is-moving-to-usage-based-billing/" rel="noopener noreferrer"&gt;moving to usage-based billing with AI Credits&lt;/a&gt; on June 1. Code review will be billed in &lt;strong&gt;two&lt;/strong&gt; ways: AI Credits for token consumption, plus Actions minutes for the infrastructure. If you're still on an annual plan, model multipliers are changing June 1 as well. The billing preview tools went live in early May—use them.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cloud Agent Gets 20% Faster (Again)
&lt;/h2&gt;

&lt;p&gt;GitHub squeezed another &lt;a href="https://github.blog/changelog/2026-04-27-copilot-cloud-agent-starts-20-faster-with-actions-custom-images/" rel="noopener noreferrer"&gt;20% startup improvement&lt;/a&gt; out of Copilot cloud agent last week, thanks to &lt;a href="https://docs.github.com/en/actions/how-tos/manage-runners/larger-runners/use-custom-images" rel="noopener noreferrer"&gt;Actions custom images&lt;/a&gt;. The agent now spins up faster when you assign it an issue, start a task, or mention &lt;code&gt;@copilot&lt;/code&gt; in a PR.&lt;/p&gt;

&lt;p&gt;This builds on the 50% improvement shipped in March. The feedback loop between "I need the agent to do this" and "the agent is actually working on it" matters more than most teams realize. If it takes 90 seconds for Copilot to start, developers context-switch. At 20 seconds, they wait. Speed compounds.&lt;/p&gt;

&lt;p&gt;The mechanism is straightforward: GitHub prebuilds the runner environment with custom images, cutting down on package installs and dependency downloads. If you're running cloud agent tasks frequently, this adds up fast.&lt;/p&gt;

&lt;h2&gt;
  
  
  Model Deprecation: GPT-5.2 and GPT-5.2-Codex Exit June 1
&lt;/h2&gt;

&lt;p&gt;GitHub &lt;a href="https://github.blog/changelog/2026-05-01-upcoming-deprecation-of-gpt-5-2-and-gpt-5-2-codex" rel="noopener noreferrer"&gt;announced&lt;/a&gt; that GPT-5.2 and GPT-5.2-Codex are being deprecated across all Copilot experiences on June 1. The one exception: GPT-5.2-Codex stays available in Copilot code review.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Suggested alternatives:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GPT-5.2 → GPT-5.5 (already GA)&lt;/li&gt;
&lt;li&gt;GPT-5.2-Codex → GPT-5.3-Codex&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're a Copilot Enterprise admin, check your model policies now. Users need access enabled for the replacement models, or they'll lose access when the cutover happens. No action is required to remove the deprecated models—they'll just disappear from the picker on June 1.&lt;/p&gt;

&lt;p&gt;This is part of the broader model rotation GitHub's been doing. GPT-5.5 hit GA last month. The shift toward usage-based billing with model-specific credit multipliers means the model you pick directly affects your bill. Frontier models consume more credits per interaction than lightweight models. If you're burning through your included usage, start routing routine tasks to cheaper models.&lt;/p&gt;
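&lt;p&gt;Routing by cost can be as simple as a lookup. The model tiers and credit multipliers below are placeholders; map them to the models and multipliers actually published for your plan.&lt;/p&gt;

```javascript
// Sketch of cost-aware model routing. Tier names and multipliers
// are PLACEHOLDERS, not published Copilot values.
const multipliers = { frontier: 10, standard: 3, lightweight: 1 };

function pickModel(task) {
  // Route by how much reasoning the task actually needs
  if (task.kind === "architecture" || task.kind === "tricky-debug") return "frontier";
  if (task.kind === "feature") return "standard";
  return "lightweight"; // renames, boilerplate, docs, small fixes
}

function creditsFor(tasks) {
  return tasks.reduce((sum, t) => sum + multipliers[pickModel(t)], 0);
}

const day = [
  { kind: "architecture" },
  { kind: "feature" },
  { kind: "rename" },
  { kind: "rename" },
];
console.log(creditsFor(day)); // vs. 40 if everything hit the frontier model
```

&lt;p&gt;Even a crude split like this cuts credit burn substantially once routine tasks stop defaulting to the most expensive model.&lt;/p&gt;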

&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;The Defender for Cloud integration signals where GitHub is heading: security tools that understand what's actually deployed, not just what exists in your repo. That's the kind of context filtering that makes security campaigns actionable instead of aspirational.&lt;/p&gt;

&lt;p&gt;But the billing changes are what most teams will feel first. Copilot code review consuming Actions minutes is a real cost increase for orgs with tight Actions budgets. Self-hosted runners or larger runners with custom images might be worth the setup cost if you're running hundreds of reviews a month.&lt;/p&gt;

&lt;p&gt;June 1 is shaping up to be a significant transition date: usage-based billing goes live, model deprecations take effect, code review starts charging Actions minutes, and model multipliers change for annual plan holders. If you haven't audited your Copilot usage and Actions consumption yet, this week is the time.&lt;/p&gt;

&lt;p&gt;GitHub's making the agent infrastructure faster and more context-aware. But they're also making it clear that agentic workloads aren't free—they're compute, and compute costs money.&lt;/p&gt;

</description>
      <category>github</category>
      <category>devops</category>
      <category>devex</category>
      <category>ai</category>
    </item>
    <item>
      <title>I Taught My AI Agent to Restart Itself</title>
      <dc:creator>Hector Flores</dc:creator>
      <pubDate>Tue, 05 May 2026 13:13:59 +0000</pubDate>
      <link>https://dev.to/htekdev/i-taught-my-ai-agent-to-restart-itself-22ao</link>
      <guid>https://dev.to/htekdev/i-taught-my-ai-agent-to-restart-itself-22ao</guid>
      <description>&lt;h2&gt;
  
  
  The Moment Your Agent Outgrows Its Own Runtime
&lt;/h2&gt;

&lt;p&gt;Here's a scenario that will sound familiar if you're building autonomous agents with &lt;a href="https://docs.github.com/en/copilot/github-copilot-in-the-cli" rel="noopener noreferrer"&gt;GitHub Copilot CLI&lt;/a&gt;: your orchestrator agent creates a brand-new custom agent — writes the &lt;code&gt;.github/agents/budget-review.agent.md&lt;/code&gt; file, commits it, and then tries to delegate work to it via the &lt;code&gt;task&lt;/code&gt; tool. Except... it can't. The new agent doesn't exist yet, at least not in the running session's registry.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;task&lt;/code&gt; tool's &lt;code&gt;agent_type&lt;/code&gt; list is frozen at session start. Your new agent won't be discoverable until a fresh session begins. And there's no built-in way to restart from within the session.&lt;/p&gt;

&lt;p&gt;So you close the terminal. Reopen it. Resume. It works now. But if your agent platform does this ten times a day — creating specialized agents on the fly based on family needs, work context, or content pipelines — that manual restart becomes the single biggest bottleneck in your entire autonomous workflow.&lt;/p&gt;

&lt;p&gt;I solved this with &lt;a href="https://github.com/htekdev/copilot-self-restart" rel="noopener noreferrer"&gt;copilot-self-restart&lt;/a&gt;, an extension that gives any agent the ability to kill its own runtime and spawn a fresh session — with full conversation resume support.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the Extension Does
&lt;/h2&gt;

&lt;p&gt;The extension registers a single tool called &lt;code&gt;restart_session&lt;/code&gt; that orchestrates a controlled shutdown and respawn sequence:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Writes a temporary PowerShell script&lt;/strong&gt; containing the kill-wait-relaunch logic&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Spawns a fully independent PowerShell window&lt;/strong&gt; using &lt;code&gt;execSync → Start-Process&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;That new window kills the current Copilot runtime&lt;/strong&gt; via PID&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Waits for cleanup&lt;/strong&gt; (file handle release, graceful shutdown)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Launches a fresh Copilot CLI session&lt;/strong&gt; with &lt;code&gt;--resume&lt;/code&gt; to preserve conversation context&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The result: the agent says "restarting," the current terminal dies, and a new window spawns with a fully resumed session — same conversation, fresh registry. New agents, new extensions, clean context — all without human intervention.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// The core pattern in extension.mjs&lt;/span&gt;
&lt;span class="nf"&gt;execSync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="s2"&gt;`pwsh -NoProfile -Command "Start-Process -FilePath pwsh `&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
  &lt;span class="s2"&gt;`-ArgumentList @('-NoProfile','-NoExit','-File','&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;tmpScript&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;') `&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
  &lt;span class="s2"&gt;`-WorkingDirectory '&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;cwd&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;'"`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;cwd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;timeout&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;5000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;windowsHide&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The Windows Process Tree Trap
&lt;/h2&gt;

&lt;p&gt;This extension took me two days to build. The code is ~100 lines. The problem was discovering &lt;em&gt;why&lt;/em&gt; the obvious approach doesn't work on Windows.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Attempt 1: &lt;code&gt;child_process.spawn()&lt;/code&gt; with &lt;code&gt;detached: true&lt;/code&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is what every Node.js tutorial suggests for creating independent child processes. On Linux, it works beautifully. On Windows, it creates a &lt;strong&gt;headless process&lt;/strong&gt; that cannot spawn visible child windows. The spawned process inherits the console's window station but can't create new independent ones.&lt;/p&gt;

&lt;p&gt;What this means in practice: the "new" Copilot CLI session launches invisibly. You can't see it. You can't interact with it. It's running headless in some orphaned process tree. Useless.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Attempt 2: &lt;code&gt;spawn&lt;/code&gt; + &lt;code&gt;CREATE_NEW_CONSOLE&lt;/code&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Node.js does expose a &lt;code&gt;windowsHide: false&lt;/code&gt; option and various &lt;code&gt;stdio&lt;/code&gt; configurations. None of them actually create a visible, interactive terminal window that outlives the parent process. The child is still bound to the parent's window station.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Working Pattern: &lt;code&gt;execSync → Start-Process&lt;/code&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The solution is to delegate window creation entirely to PowerShell's native &lt;code&gt;Start-Process&lt;/code&gt; cmdlet. By calling &lt;code&gt;execSync&lt;/code&gt; with a &lt;code&gt;pwsh -Command "Start-Process ..."&lt;/code&gt; invocation, you create a &lt;strong&gt;fully independent visible window&lt;/strong&gt; that outlives the calling Node.js process — a new PowerShell terminal with no parent-child relationship to the original session.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="c"&gt;# The temp script that runs in the new window:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;Stop-Process&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;1234&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Force&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-ErrorAction&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;SilentlyContinue&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;Start-Sleep&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Seconds&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;3&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s1"&gt;'restart-copilot.ps1'&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Folder&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s1"&gt;'C:\Repos\myproject'&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-SessionId&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s1"&gt;'abc-123'&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;Remove-Item&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-LiteralPath&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s1"&gt;'tempscript.ps1'&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Force&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The new window kills the old runtime (which also kills the extension that created it), waits for cleanup, and launches a fresh session. The temporary script self-destructs after execution. Clean.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture: Two Files, Zero Dependencies
&lt;/h2&gt;

&lt;p&gt;The extension is deliberately minimal:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;extension.mjs&lt;/code&gt;&lt;/strong&gt; — Joins the Copilot CLI session via &lt;code&gt;@github/copilot-sdk&lt;/code&gt;, registers the &lt;code&gt;restart_session&lt;/code&gt; tool, and handles the spawn logic. Uses &lt;code&gt;process.ppid&lt;/code&gt; to identify the runtime's PID (the extension's parent process). Writes a temp &lt;code&gt;.ps1&lt;/code&gt; script to avoid PowerShell quoting hell, then fires and forgets via &lt;code&gt;execSync&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;restart-copilot.ps1&lt;/code&gt;&lt;/strong&gt; — Resolves the working directory, ensures the folder is trusted in &lt;code&gt;~/.copilot/config.json&lt;/code&gt; (so the CLI doesn't prompt for trust approval), and launches the new session with &lt;code&gt;--add-dir&lt;/code&gt;, &lt;code&gt;--yolo&lt;/code&gt;, &lt;code&gt;--autopilot&lt;/code&gt;, and optionally &lt;code&gt;--resume=&amp;lt;sessionId&amp;gt;&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;restart_session&lt;/code&gt; tool accepts two parameters:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Parameter&lt;/th&gt;
&lt;th&gt;Default&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;reason&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;Logged for debugging ("New agent created: budget-review")&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;new_session&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;false&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;If true, skips &lt;code&gt;--resume&lt;/code&gt; for a clean slate&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Why Not Just &lt;code&gt;process.exit()&lt;/code&gt;?
&lt;/h2&gt;

&lt;p&gt;I got this question immediately. Calling &lt;code&gt;process.exit()&lt;/code&gt; from an extension kills the &lt;em&gt;extension process&lt;/em&gt;, not the Copilot runtime. The runtime detects the extension died and either restarts it or continues without it — neither results in a session restart. You need to kill the runtime's actual PID, which is &lt;code&gt;process.ppid&lt;/code&gt; from the extension's perspective.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Safe Restart Workflow
&lt;/h2&gt;

&lt;p&gt;Raw restart is dangerous. If background agents are running tasks — writing files, making API calls, mid-computation — killing the runtime means their work is lost with no recovery. So this extension ships with a companion &lt;strong&gt;safe-restart skill&lt;/strong&gt; that wraps the restart in pre-flight checks:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;list_agents()&lt;/code&gt;&lt;/strong&gt; — verify no background agents are &lt;code&gt;running&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Wait for active agents&lt;/strong&gt; — &lt;code&gt;read_agent(id, wait=true)&lt;/code&gt; blocks until completion&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Close idle agents&lt;/strong&gt; — graceful shutdown via &lt;code&gt;write_agent()&lt;/code&gt; + wait&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Save all work&lt;/strong&gt; — commit pending changes, update memory files&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Notify the user&lt;/strong&gt; — "Restarting to discover new agent: budget-review"&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;restart_session(reason="...", new_session=true)&lt;/code&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Post-restart verification&lt;/strong&gt; — confirm the new agent appears in &lt;code&gt;task&lt;/code&gt; tool, run a smoke test delegation&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Note: the safe-restart workflow uses &lt;code&gt;new_session=true&lt;/code&gt; because after agent creation you want a clean registry refresh. Normal restarts (recovering from bloated context) use &lt;code&gt;new_session=false&lt;/code&gt; to resume the existing conversation.&lt;/p&gt;

&lt;p&gt;This workflow is codified as a reusable skill — a procedural guide that any agent can invoke when it needs to safely restart.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Matters: Self-Modifying Agent Platforms
&lt;/h2&gt;

&lt;p&gt;This extension isn't really about restarting a terminal. It's about &lt;strong&gt;agent lifecycle management&lt;/strong&gt; — the ability for an autonomous system to modify its own runtime topology without human intervention.&lt;/p&gt;

&lt;p&gt;In my &lt;a href="https://htek.dev/articles/copilot-home-assistant-ai-runs-my-household/" rel="noopener noreferrer"&gt;home assistant platform&lt;/a&gt;, agents create other agents dynamically. A realtor-team agent might spin up a credit-coach agent. A content pipeline might create a specialized editor agent for a new content format. These agents need to be discoverable &lt;em&gt;immediately&lt;/em&gt; — not after I manually restart a terminal.&lt;/p&gt;

&lt;p&gt;Combined with the &lt;a href="https://github.com/htekdev/agent-mesh" rel="noopener noreferrer"&gt;agent mesh&lt;/a&gt; (which enables &lt;a href="https://htek.dev/articles/agent-mesh-cross-session-communication-copilot-cli/" rel="noopener noreferrer"&gt;cross-session communication&lt;/a&gt;), this creates an infrastructure layer where agents can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Create new agents&lt;/strong&gt; → self-restart to discover them&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Communicate across sessions&lt;/strong&gt; → via the mesh's SQLite IPC&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Modify their own extensions&lt;/strong&gt; → restart to load new tools&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Recover from bloated context&lt;/strong&gt; → resume with a clean window&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is what &lt;a href="https://htek.dev/articles/self-healing-infrastructure-with-agentic-ai/" rel="noopener noreferrer"&gt;self-healing infrastructure&lt;/a&gt; looks like at the agent platform level. The system doesn't just detect problems — it restructures itself to solve them.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;

&lt;p&gt;Install per-project (recommended):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;New-Item&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-ItemType&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Directory&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Path&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;github&lt;/span&gt;&lt;span class="nx"&gt;\extensions\self-restart&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Force&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="c"&gt;# Copy extension.mjs and restart-copilot.ps1 into that directory&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or install user-level for all projects:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;New-Item&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-ItemType&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Directory&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Path&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="bp"&gt;$HOME&lt;/span&gt;&lt;span class="s2"&gt;\.copilot\extensions\self-restart"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Force&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="c"&gt;# Copy both files there&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The full source, architecture docs, and the safe-restart skill template are on GitHub: &lt;strong&gt;&lt;a href="https://github.com/htekdev/copilot-self-restart" rel="noopener noreferrer"&gt;htekdev/copilot-self-restart&lt;/a&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://www.npmjs.com/package/@github/copilot-sdk" rel="noopener noreferrer"&gt;Copilot CLI extension SDK&lt;/a&gt; is deceptively powerful. With one file and zero external dependencies, you can give agents the ability to restart their own runtime — something that sounds trivial until you hit the Windows process tree trap and realize why nobody else has shipped this.&lt;/p&gt;

&lt;p&gt;If you're building agent platforms where agents create other agents, this is table-stakes infrastructure. The alternative is manual restarts, which defeats the entire point of autonomous operation. Your agents should be able to evolve their own capabilities without waiting for a human to close a terminal.&lt;/p&gt;

&lt;p&gt;The pattern — &lt;code&gt;execSync → Start-Process → kill parent → relaunch&lt;/code&gt; — is weird. It's counterintuitive. And it's the only thing that works on Windows for creating visible, interactive, independent process trees from Node.js. Sometimes the best engineering is just finding the one path through a maze of platform limitations.&lt;/p&gt;

</description>
      <category>github</category>
      <category>ai</category>
      <category>opensource</category>
      <category>automation</category>
    </item>
    <item>
      <title>Visual Studio Weekly: Agents Go Cloud-Native and Cross-Project</title>
      <dc:creator>Hector Flores</dc:creator>
      <pubDate>Mon, 04 May 2026 18:04:20 +0000</pubDate>
      <link>https://dev.to/htekdev/visual-studio-weekly-agents-go-cloud-native-and-cross-project-5hgb</link>
      <guid>https://dev.to/htekdev/visual-studio-weekly-agents-go-cloud-native-and-cross-project-5hgb</guid>
      <description>&lt;p&gt;Last week, Visual Studio shipped its April update (18.5), and the theme is clear: the IDE is evolving from a text editor with AI suggestions into an &lt;strong&gt;agent orchestrator&lt;/strong&gt;. Cloud agents now launch directly from the IDE, custom agents travel with you across projects, and IntelliSense finally stops fighting Copilot for screen space. If you've been waiting for Visual Studio to feel less like "Copilot bolted on" and more like "AI-first tooling," this is the update that changes the trajectory.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cloud Agents: Offload Work Without Leaving the IDE
&lt;/h2&gt;

&lt;p&gt;The headline feature is cloud agent integration. Previously, if you wanted to use GitHub's cloud coding agent, you opened a browser, navigated to your repo, and started a session. Now you do it directly from Visual Studio.&lt;/p&gt;

&lt;p&gt;Here's the workflow: Open Copilot Chat, select &lt;strong&gt;Cloud&lt;/strong&gt; from the agent picker, and describe the task. The cloud agent asks permission to create a GitHub issue, then spins up a remote session on GitHub's infrastructure to implement the fix and open a pull request. You can close Visual Studio entirely and come back later—when the PR is ready, you get a notification with options to view or open in browser.&lt;/p&gt;

&lt;p&gt;From the &lt;a href="https://devblogs.microsoft.com/visualstudio/visual-studio-april-update-cloud-agent-integration/" rel="noopener noreferrer"&gt;April update announcement&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Cloud agents run on remote infrastructure for scalable, isolated execution, and you can now start new sessions directly from Visual Studio. This is a different way of working that frees you up to focus on the parts of your project that need your full attention.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Why this matters: Cloud agents are expensive to run. They require compute, sandboxing, and orchestration. By integrating them directly into Visual Studio, Microsoft is betting that developers want to &lt;strong&gt;delegate lower-value tasks&lt;/strong&gt; (bug fixes, boilerplate, dependency updates) while staying focused on architecture and complex features. This is the future of agentic development—not replacing developers, but letting them focus on the work that actually requires human judgment.&lt;/p&gt;

&lt;p&gt;The current implementation is powered by Copilot's coding agent and requires a GitHub repo with permissions to create issues. It's early, but the direction is right. I'm watching to see how this evolves when multiple cloud agents compete for the same task, or when agents need to coordinate across repos.&lt;/p&gt;

&lt;h2&gt;
  
  
  User-Level Custom Agents: Your Workflow, Every Project
&lt;/h2&gt;

&lt;p&gt;Last month, Visual Studio introduced custom agents via &lt;code&gt;.agent.md&lt;/code&gt; files in your repository. This update extends that with &lt;strong&gt;user-level agents&lt;/strong&gt; stored in &lt;code&gt;%USERPROFILE%/.github/agents/&lt;/code&gt; by default. These agents travel with you across all projects.&lt;/p&gt;

&lt;p&gt;Why this is a big deal: Repository-level agents are great for team conventions—"run our linter," "follow our API guidelines," "query our internal docs." But user-level agents are for &lt;em&gt;your&lt;/em&gt; workflow. Maybe you always want a "security review" agent that checks for credential leaks. Or a "migration helper" that converts legacy patterns to modern equivalents. Or a "performance profiler" that suggests optimizations based on your coding patterns.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://github.blog/changelog/2026-04-30-github-copilot-in-visual-studio-april-update/" rel="noopener noreferrer"&gt;GitHub Changelog&lt;/a&gt; confirms:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;User-level agents are stored in &lt;code&gt;%USERPROFILE%/.github/agents/&lt;/code&gt; by default. You can change this location in Tools &amp;gt; Options &amp;gt; GitHub &amp;gt; Copilot &amp;gt; Copilot Chat &amp;gt; Custom agents user directory.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Creating new agents is easier now too—click the &lt;strong&gt;+&lt;/strong&gt; button in the agent picker and follow the prompts. Everything you could do with repository-based agents still works: workspace awareness, code understanding, tools, model selection, and MCP connections to external knowledge sources like internal documentation, APIs, and databases.&lt;/p&gt;

&lt;p&gt;The community is already sharing agent configurations on the &lt;a href="https://github.com/jlorich/awesome-copilot" rel="noopener noreferrer"&gt;awesome-copilot repository&lt;/a&gt;. I expect we'll see a marketplace for custom agents within six months—something like VS Code extensions, but for agentic workflows.&lt;/p&gt;

&lt;h2&gt;
  
  
  IntelliSense vs. Copilot: The Priority Fight Is Over
&lt;/h2&gt;

&lt;p&gt;If you've used Visual Studio with Copilot, you've experienced this: IntelliSense pops up with a completion list, Copilot shows a multi-line suggestion, and suddenly you're trying to parse two overlapping UIs. It's cognitively noisy, and it slows you down.&lt;/p&gt;

&lt;p&gt;The April update fixes this. When IntelliSense is active, Visual Studio &lt;strong&gt;temporarily suppresses Copilot completions&lt;/strong&gt;. After you dismiss or commit the IntelliSense selection, Copilot resumes automatically. This behavior is enabled by default—just update and code as you normally do.&lt;/p&gt;

&lt;p&gt;From the &lt;a href="https://devblogs.microsoft.com/visualstudio/visual-studio-april-update-cloud-agent-integration/" rel="noopener noreferrer"&gt;Visual Studio blog&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;When IntelliSense is active, Visual Studio temporarily suppresses Copilot completions so you can focus on your current selection.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This seems obvious in hindsight, but it took Microsoft 18+ months to ship. The reason? IntelliSense and Copilot run in different subsystems. IntelliSense is fast, synchronous, and language-server-backed. Copilot is async, model-driven, and latency-sensitive. Coordinating them without breaking either experience required real engineering work.&lt;/p&gt;

&lt;p&gt;The result is subtle but meaningful. Your editor feels less like two tools competing for attention and more like one integrated experience. This is the kind of polish that doesn't show up in release notes but compounds over time.&lt;/p&gt;

&lt;h2&gt;
  
  
  C++ Gets Language-Aware Agent Tools (GA)
&lt;/h2&gt;

&lt;p&gt;For C++ developers, the April update brings &lt;strong&gt;C++ Code Editing Tools for GitHub Copilot agent mode&lt;/strong&gt; to general availability by default. These tools give Copilot language-aware navigation of your C++ codebase, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;get_symbol_call_hierarchy&lt;/code&gt; — trace function calls across translation units&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;get_symbol_class_hierarchy&lt;/code&gt; — map class inheritance and usage relationships&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Previously, when you asked Copilot to refactor a C++ function, it guessed from text patterns. Now it &lt;em&gt;sees&lt;/em&gt; your class hierarchy and call graph. The difference is like asking someone to reorganize a warehouse by looking at photos versus giving them a floor plan and inventory system.&lt;/p&gt;

&lt;p&gt;This only works with AI models that support tool-calling (check the &lt;a href="https://learn.microsoft.com/en-us/visualstudio/ide/visual-studio-github-copilot-chat#compare-models" rel="noopener noreferrer"&gt;model comparison page&lt;/a&gt; for compatibility). If you work with large C++ codebases, these tools make a real difference—especially for cross-file refactors that span multiple headers and implementation files.&lt;/p&gt;

&lt;h2&gt;
  
  
  Customizable Copilot Keyboard Shortcuts
&lt;/h2&gt;

&lt;p&gt;Finally, a quality-of-life improvement that power users will appreciate: you can now &lt;strong&gt;customize the keyboard shortcuts for accepting Copilot inline suggestions&lt;/strong&gt;. Want to change the key for accepting a full suggestion, the next word, or the next line? It's all configurable in Tools &amp;gt; Options &amp;gt; Environment &amp;gt; Keyboard.&lt;/p&gt;

&lt;p&gt;The relevant commands are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;Edit.AcceptSuggestion&lt;/code&gt; — accept the full suggestion&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Edit.AcceptNextWordInSuggestion&lt;/code&gt; — accept just the next word&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Edit.AcceptNextLineInSuggestion&lt;/code&gt; — accept just the next line&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Your new shortcut appears throughout the editor hint bar, so you always know which key to press. Small detail, but this is the kind of flexibility that makes tools feel like &lt;em&gt;yours&lt;/em&gt; instead of someone else's.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;Visual Studio's April update is about &lt;strong&gt;infrastructure for agentic workflows&lt;/strong&gt;. Cloud agents handle background tasks on remote infrastructure. User-level custom agents codify your personal workflow. IntelliSense and Copilot coordinate instead of competing. C++ agents have semantic understanding, not just text matching.&lt;/p&gt;

&lt;p&gt;None of these features are revolutionary on their own. But together, they signal a shift: Visual Studio is becoming a platform for orchestrating multiple AI agents—some running locally, some in the cloud, some built by Microsoft, some built by you. That's the future of IDEs. Not one AI assistant. An ecosystem of specialized agents, each doing what it's good at, all coordinated through a single interface.&lt;/p&gt;

&lt;p&gt;If you're still on Visual Studio 2022 stable, consider trying &lt;a href="https://visualstudio.microsoft.com/vs/preview/" rel="noopener noreferrer"&gt;Visual Studio 2026&lt;/a&gt; to experience these features. The pace of AI integration is accelerating, and the monthly release cadence Microsoft announced in the 17.14 launch is starting to show results. The gap between "AI suggestions" and "AI orchestration" is closing fast.&lt;/p&gt;

</description>
      <category>devex</category>
      <category>ai</category>
      <category>visualstudio</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Copilot CLI Weekly: Headless OAuth, Background Tasks, and /research Overhaul</title>
      <dc:creator>Hector Flores</dc:creator>
      <pubDate>Fri, 01 May 2026 19:33:49 +0000</pubDate>
      <link>https://dev.to/htekdev/copilot-cli-weekly-headless-oauth-background-tasks-and-research-overhaul-24bj</link>
      <guid>https://dev.to/htekdev/copilot-cli-weekly-headless-oauth-background-tasks-and-research-overhaul-24bj</guid>
      <description>&lt;h2&gt;
  
  
  Headless OAuth for MCP Servers
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://github.com/github/copilot-cli/releases/tag/v1.0.40" rel="noopener noreferrer"&gt;Copilot CLI v1.0.40 release&lt;/a&gt; shipped today with a feature that matters if you're running the CLI on remote servers, in CI, or anywhere you don't have a browser: &lt;strong&gt;&lt;code&gt;client_credentials&lt;/code&gt; OAuth grant type support for MCP servers&lt;/strong&gt;. This enables fully headless authentication without needing to spawn a browser for the OAuth flow.&lt;/p&gt;

&lt;p&gt;Before this, connecting an MCP server that required OAuth meant you needed an interactive browser session to complete the authorization flow. That's fine on your laptop. It's not fine on a headless build server, inside a container, or over SSH where you don't have X forwarding. The typical workaround was to pre-authenticate elsewhere and copy tokens manually, or skip OAuth-protected MCP servers entirely on those environments.&lt;/p&gt;

&lt;p&gt;Now MCP servers can use the &lt;code&gt;client_credentials&lt;/code&gt; grant type — a machine-to-machine OAuth flow that exchanges a client ID and secret for an access token without user interaction. This is the same flow services use to talk to other services. If your MCP server supports it (check its config), the CLI can now authenticate in environments with no browser, no display, and no user present.&lt;/p&gt;

&lt;p&gt;This extends the reach of MCP-powered workflows. My &lt;a href="https://htek.dev/articles/github-agentic-workflows-hands-on-guide/" rel="noopener noreferrer"&gt;GitHub Agentic Workflows setup&lt;/a&gt; runs entirely in GitHub Actions. Adding an MCP server that previously required OAuth meant either hacking around it or not using it. That limitation is gone.&lt;/p&gt;

&lt;h2&gt;
  
  
  Background Tasks with Ctrl+X → B
&lt;/h2&gt;

&lt;p&gt;v1.0.40 introduces a keybinding I've wanted for months: &lt;strong&gt;Ctrl+X → B to move the current running task or shell command to the background&lt;/strong&gt;. Press it while a long-running command is executing and it detaches, freeing your prompt while the task continues. You can queue another task, send more messages, or switch contexts without killing the running process.&lt;/p&gt;

&lt;p&gt;This is particularly useful when you've asked the agent to run something that takes longer than expected — a build, a test suite, a heavy search operation. Before, your options were to wait, cancel, or open a second CLI session. Now you background it and move on. The task keeps running. You see updates in the timeline. When it finishes, the output appears and you can review it.&lt;/p&gt;

&lt;p&gt;The implementation is solid. Backgrounded tasks don't block new input. They continue streaming output to the timeline as they run. If you background multiple tasks, they all run concurrently. The statusline shows the count of active background tasks so you know what's still executing.&lt;/p&gt;

&lt;p&gt;I've already used this a dozen times today. Run &lt;code&gt;npm test&lt;/code&gt;, realize it's slow, background it, ask for a file edit, come back to the test results when they're done. It's the workflow I wanted.&lt;/p&gt;

&lt;h2&gt;
  
  
  /research Now Uses Orchestrator Agents
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;/research&lt;/code&gt; command got a major architecture change in v1.0.40. Instead of a single-agent linear search, &lt;strong&gt;it now uses an orchestrator/subagent model for more thorough and reliable deep research results&lt;/strong&gt;. The orchestrator breaks the research task into subtasks, dispatches subagents to handle each part, aggregates their findings, and synthesizes a final answer.&lt;/p&gt;

&lt;p&gt;This addresses the biggest weakness of the old &lt;code&gt;/research&lt;/code&gt;: incomplete coverage. When you asked it to research something complex, it would often latch onto the first good source it found and stop. The orchestrator approach forces broader exploration. Subagents work in parallel on different aspects of the query. The orchestrator ensures coverage before synthesizing.&lt;/p&gt;

&lt;p&gt;I tested this on a question that previously gave shallow results: "Compare auth patterns in Next.js vs. Remix for server-rendered apps with session management." Old &lt;code&gt;/research&lt;/code&gt; gave me a summary of Next.js auth middleware and called it done. New &lt;code&gt;/research&lt;/code&gt; dispatched subagents to investigate Next.js patterns, Remix loader auth, session storage strategies, and then synthesized a comparison across the findings. The result was substantially more complete.&lt;/p&gt;

&lt;p&gt;The tradeoff is latency. Orchestration adds overhead. For quick factual lookups, the old linear search was faster. For anything requiring synthesis across multiple dimensions, the new architecture is worth the wait.&lt;/p&gt;

&lt;h2&gt;
  
  
  Autopilot Continuation Limits
&lt;/h2&gt;

&lt;p&gt;Autopilot mode — where the agent continues working autonomously until the task is complete — now has a &lt;strong&gt;default limit of 5 continuation messages&lt;/strong&gt; (configurable with &lt;code&gt;--max-autopilot-continues&lt;/code&gt;). This prevents runaway loops where the agent gets stuck in an unproductive cycle and burns through tokens without making progress.&lt;/p&gt;

&lt;p&gt;Before this, autopilot would continue indefinitely until it reached a terminal state or hit a model-level token limit. That's fine when it's working. It's expensive and frustrating when it's not. I've had sessions where autopilot spent 15+ messages trying variations of the same failing approach because it couldn't detect the loop.&lt;/p&gt;

&lt;p&gt;The new default stops after 5 continues. If the task isn't done, the agent reports its progress and returns control to you. You can assess, redirect, or let it continue with another &lt;code&gt;--max-autopilot-continues&lt;/code&gt; invocation. This makes autopilot safer to use on ambiguous tasks where the agent might spiral.&lt;/p&gt;

&lt;p&gt;If you're running autopilot in fully automated contexts (like &lt;a href="https://htek.dev/articles/github-agentic-workflows-hands-on-guide/" rel="noopener noreferrer"&gt;my article automation workflows&lt;/a&gt;) and you know the task scope, you can raise the limit. The default is conservative by design.&lt;/p&gt;

&lt;h2&gt;
  
  
  Session History and /chronicle for All Users
&lt;/h2&gt;

&lt;p&gt;Two features that were previously gated are now &lt;strong&gt;available to all users&lt;/strong&gt;: session history and the &lt;code&gt;/chronicle&lt;/code&gt; command. Session history records every message, tool call, and state change across your sessions. &lt;code&gt;/chronicle&lt;/code&gt; generates a summary of a session's activity — what you asked for, what the agent did, what changed, and what the outcome was.&lt;/p&gt;

&lt;p&gt;I use &lt;code&gt;/chronicle&lt;/code&gt; at the end of long refactoring sessions to document what happened. It's particularly useful when handing off work or returning to a session after days away. Instead of reading through the full timeline to reconstruct context, &lt;code&gt;/chronicle&lt;/code&gt; gives you a condensed narrative.&lt;/p&gt;

&lt;p&gt;The fact that this is now available to all users means you can use it in team workflows without worrying about whether everyone has the right tier. If you're building agents that need audit trails or summaries of their own actions, &lt;code&gt;/chronicle&lt;/code&gt; can provide that.&lt;/p&gt;

&lt;h2&gt;
  
  
  v1.0.39: ACP Extensions, Background Tasks, and Slash Commands
&lt;/h2&gt;

&lt;p&gt;Three days before v1.0.40, &lt;a href="https://github.com/github/copilot-cli/releases/tag/v1.0.39" rel="noopener noreferrer"&gt;v1.0.39&lt;/a&gt; shipped with its own set of meaningful changes. If you're using the &lt;a href="https://docs.github.com/copilot/agent-control-protocol" rel="noopener noreferrer"&gt;Agent Client Protocol (ACP)&lt;/a&gt; to integrate the CLI with other editors (like Zed), you got:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Four new slash commands for ACP sessions&lt;/strong&gt;: &lt;code&gt;/compact&lt;/code&gt;, &lt;code&gt;/context&lt;/code&gt;, &lt;code&gt;/usage&lt;/code&gt;, and &lt;code&gt;/env&lt;/code&gt;. These were previously CLI-only. Now they work when controlling the CLI via ACP, giving you the same introspection tools in any client.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Allow-all permission mode toggle&lt;/strong&gt;: ACP clients can now programmatically enable or disable the allow-all permission mode via session configuration. This is useful for workflows that start restricted and need to escalate permissions midway through a task.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And the &lt;strong&gt;Ctrl+X → B background task feature&lt;/strong&gt; actually shipped in v1.0.39. I mentioned it above under v1.0.40 because that's when I tested it extensively, but credit where due: v1.0.39 introduced the keybinding.&lt;/p&gt;

&lt;h2&gt;
  
  
  Secure-by-Default Prompt Mode
&lt;/h2&gt;

&lt;p&gt;One change that impacts certain workflows: &lt;strong&gt;prompt mode (&lt;code&gt;-p&lt;/code&gt;) now gates repo hooks and workspace MCP behind opt-in environment variables&lt;/strong&gt;. Specifically, &lt;code&gt;GITHUB_COPILOT_PROMPT_MODE_REPO_HOOKS&lt;/code&gt; and &lt;code&gt;GITHUB_COPILOT_PROMPT_MODE_WORKSPACE_MCP&lt;/code&gt;. If those aren't set, repo hooks and workspace MCP servers don't load in prompt mode.&lt;/p&gt;

&lt;p&gt;This is a security-by-default decision. Prompt mode is often used in scripting contexts where you're piping input directly to the CLI without interactive oversight. If that input is untrusted or comes from an external source, you don't want it triggering repo hooks or workspace MCP servers that might have privileged access.&lt;/p&gt;

&lt;p&gt;If you're using prompt mode with repos that have hooks or workspace MCP servers, and you trust the input, set those env vars. If you're piping arbitrary input, leave them unset.&lt;/p&gt;
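
&lt;p&gt;For a trusted scripted run, the opt-in is just two environment variables set before invoking prompt mode. A minimal sketch: the release notes say the gate checks whether the variables are set, so the value &lt;code&gt;1&lt;/code&gt; here is an assumption.&lt;/p&gt;

```shell
# Opt back in to repo hooks and workspace MCP for a trusted prompt-mode run
# (the value "1" is an assumption; the gate checks that the vars are set)
export GITHUB_COPILOT_PROMPT_MODE_REPO_HOOKS=1
export GITHUB_COPILOT_PROMPT_MODE_WORKSPACE_MCP=1
copilot -p "summarize the failing tests"

# For untrusted or piped input, leave both unset so hooks and MCP stay disabled
```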

&lt;h2&gt;
  
  
  Polish Across the Board
&lt;/h2&gt;

&lt;p&gt;The rest of v1.0.40 is dominated by UX polish and bug fixes. Highlights:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Smoother streaming&lt;/strong&gt;: Assistant responses stream with better text chunking, reducing the "stutter" effect during long outputs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Faster startup&lt;/strong&gt;: Custom CA certificates load asynchronously, shaving noticeable time off CLI initialization in environments with custom certs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Better session resume&lt;/strong&gt;: The resume session picker no longer shows duplicate entries for Mission Control-backed sessions. Summaries display on a single line, truncated to fit the column width.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Improved remote context&lt;/strong&gt;: Remote session statusline shows the remote working directory and branch instead of misleading local context.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MCP tool name sanitization&lt;/strong&gt;: MCP tool names with dots or invalid characters are now sanitized correctly instead of causing tool call failures.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And one small but appreciated change: &lt;strong&gt;Ctrl+C and double-Esc now remove pending queued messages one at a time&lt;/strong&gt; instead of all at once. If you've queued several messages and realize halfway through that the first one is wrong, you can selectively back out instead of losing the whole queue.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;Two releases in one week. Headless OAuth unlocks MCP servers in CI and remote environments. Background tasks via Ctrl+X → B fix a major workflow friction point. The &lt;code&gt;/research&lt;/code&gt; overhaul makes deep research actually useful. Autopilot limits prevent runaway loops. And the pile of UX polish removes a dozen small annoyances.&lt;/p&gt;

&lt;p&gt;If you're running the CLI in headless environments, v1.0.40's OAuth support changes what's possible. If you're using &lt;code&gt;/research&lt;/code&gt; for complex queries, the orchestrator model is a clear upgrade. And if you've ever been stuck waiting for a long task to finish before you can interact with the CLI again, Ctrl+X → B is the feature you didn't know you needed.&lt;/p&gt;

&lt;p&gt;The pace of meaningful iteration continues. Next week might bring model updates, agent framework improvements, or more MCP tooling. The team is shipping fast. The CLI is getting better every week.&lt;/p&gt;

</description>
      <category>github</category>
      <category>devex</category>
      <category>ai</category>
      <category>opensource</category>
    </item>
    <item>
      <title>I Automated Work-Life Calendar Sync With Two AI Agents That Talk to Each Other</title>
      <dc:creator>Hector Flores</dc:creator>
      <pubDate>Fri, 01 May 2026 14:40:06 +0000</pubDate>
      <link>https://dev.to/htekdev/i-automated-work-life-calendar-sync-with-two-ai-agents-that-talk-to-each-other-1bbb</link>
      <guid>https://dev.to/htekdev/i-automated-work-life-calendar-sync-with-two-ai-agents-that-talk-to-each-other-1bbb</guid>
      <description>&lt;h2&gt;
  
  
  The Two-Calendar Problem
&lt;/h2&gt;

&lt;p&gt;Every developer with a day job and a personal life has two calendars. My work Outlook has team syncs, 1:1s, and planning meetings. My personal Google Calendar has doctor appointments, NICU visits for &lt;a href="https://htek.dev/articles/coding-agent-as-life-assistant-nicu/" rel="noopener noreferrer"&gt;my premature twins&lt;/a&gt;, kid pickups, recording sessions, and the occasional oil change.&lt;/p&gt;

&lt;p&gt;The problem isn't having two calendars. The problem is that &lt;strong&gt;nobody at work can see the personal one.&lt;/strong&gt; So a coworker schedules a meeting at 10 AM on Tuesday — right on top of my wife's OB appointment. I catch it at 9:45, scramble to decline, and look unprofessional. Or worse, I don't catch it.&lt;/p&gt;

&lt;p&gt;The manual fix is tedious: open Google Calendar, find the event, open Outlook, create a matching "Out of Office" block, repeat for every new event, every change, every cancellation. I was doing this three or four times a week. Then I stopped doing it because humans are bad at repetitive cross-system data entry. Then I missed more meetings.&lt;/p&gt;

&lt;p&gt;So I built a system where two AI agents handle it automatically. My home assistant reads Google Calendar, talks to my work assistant through a mesh network, and the work assistant creates Out of Office blocks on Outlook. Zero manual effort. Five times a day, every weekday.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Setup: Two Agents, Two Worlds
&lt;/h2&gt;

&lt;p&gt;I've written about the &lt;a href="https://htek.dev/articles/copilot-home-assistant-ai-runs-my-household/" rel="noopener noreferrer"&gt;multi-agent home assistant&lt;/a&gt; that runs my household — over 30 agents managing tasks, meals, finances, health, and more, all running on &lt;a href="https://docs.github.com/en/copilot/github-copilot-in-the-cli" rel="noopener noreferrer"&gt;GitHub Copilot CLI&lt;/a&gt;. That system lives in one repo (&lt;code&gt;rocha-family&lt;/code&gt;), runs in one terminal, and talks to my family through Telegram.&lt;/p&gt;

&lt;p&gt;But I also have a &lt;strong&gt;work assistant&lt;/strong&gt; — a separate Copilot CLI session in a different repo (&lt;code&gt;msix-home&lt;/code&gt;) with its own agents, its own tools, and its own domain knowledge. It has access to Microsoft Graph for Outlook, MSX Dataverse for sales data, Power BI for analytics, and WorkIQ for M365 Copilot queries. It's a completely independent system.&lt;/p&gt;

&lt;p&gt;These two assistants couldn't talk to each other. They run in different terminals and different Git repositories. From each agent's perspective, the other one doesn't exist.&lt;/p&gt;

&lt;p&gt;Until the &lt;a href="https://github.com/htekdev/agent-mesh" rel="noopener noreferrer"&gt;agent mesh&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Agent Mesh: Cross-Session IPC for Copilot CLI
&lt;/h2&gt;

&lt;p&gt;The agent mesh is a &lt;a href="https://github.com/htekdev/agent-mesh" rel="noopener noreferrer"&gt;Copilot CLI extension&lt;/a&gt; I built that lets any number of CLI sessions discover each other and exchange messages. It's deliberately simple — a shared SQLite database on my machine, WAL mode for lock-free concurrency, and a polling loop that checks for new messages every 10 seconds.&lt;/p&gt;

&lt;p&gt;Here's the architecture:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌──────────────────┐         ┌──────────────────┐
│  Terminal 1       │         │  Terminal 2       │
│  rocha-family     │         │  msix-home        │
│  (Home Assistant) │         │  (Work Assistant)  │
└────────┬─────────┘         └────────┬─────────┘
         │                            │
         └──────────┬─────────────────┘
                    │
          ┌─────────┴─────────┐
          │  agent-mesh.db    │
          │  (SQLite, WAL)    │
          │  ┌─────────────┐  │
          │  │ agent_sessions │  │
          │  │ agent_messages │  │
          │  └─────────────┘  │
          └───────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each session auto-registers on startup with its workspace name (derived from the Git repo folder). Sessions heartbeat every 10 seconds. Messages are inserted into a queue table and picked up by the recipient's polling loop, which routes them via &lt;code&gt;session.send()&lt;/code&gt; for the LLM to process.&lt;/p&gt;
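
&lt;p&gt;The queue-and-poll cycle is simple enough to sketch. This is illustrative Python against the same kind of SQLite queue; the real extension is JavaScript on &lt;code&gt;node:sqlite&lt;/code&gt;, and the exact table and column names here are assumptions.&lt;/p&gt;

```python
import sqlite3

# Shared database; WAL mode lets multiple sessions read and write concurrently
db = sqlite3.connect("agent-mesh.db")
db.execute("PRAGMA journal_mode=WAL")
db.execute("""CREATE TABLE IF NOT EXISTS agent_messages (
    id        INTEGER PRIMARY KEY AUTOINCREMENT,
    recipient TEXT NOT NULL,      -- workspace name, e.g. 'msix-home'
    content   TEXT NOT NULL,
    read      INTEGER DEFAULT 0)""")

def poll(workspace: str) -> int:
    """One polling tick: deliver unread messages addressed to this workspace."""
    rows = db.execute(
        "SELECT id, content FROM agent_messages WHERE recipient = ? AND read = 0",
        (workspace,)).fetchall()
    for msg_id, content in rows:
        db.execute("UPDATE agent_messages SET read = 1 WHERE id = ?", (msg_id,))
        # In the real extension, delivery means routing to the LLM via session.send()
        print(f"[{workspace}] message {msg_id}: {content}")
    db.commit()
    return len(rows)
```

Each session runs a tick like this every 10 seconds; a quiet run returns 0 after a single SELECT.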

&lt;p&gt;The tools are minimal — four in total:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;get_agents&lt;/code&gt;&lt;/strong&gt; — discover who's online&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;send_message&lt;/code&gt;&lt;/strong&gt; — send to a workspace or session ID&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;reply_to_message&lt;/code&gt;&lt;/strong&gt; — threaded responses&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;get_message&lt;/code&gt;&lt;/strong&gt; — check for replies&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Installation is one step: clone the repo into &lt;code&gt;~/.copilot/extensions/agent-mesh/&lt;/code&gt; and restart your sessions. You'll need &lt;strong&gt;Node.js 22+&lt;/strong&gt; (for the built-in &lt;code&gt;node:sqlite&lt;/code&gt; module), but beyond that — no npm install, no config files, no environment variables. The database creates itself.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Sync Agent: Personal Calendar → Outlook OOF
&lt;/h2&gt;

&lt;p&gt;With the mesh in place, I created a dedicated &lt;code&gt;work-life-sync&lt;/code&gt; agent in my home assistant. Its job description fits in one sentence: &lt;em&gt;read Google Calendar, send OOF instructions to the work agent via mesh, track what's been synced.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Here's the actual flow, five times a day on weekdays:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Wake up on cron&lt;/strong&gt; (6 AM, 9 AM, noon, 3 PM, 6 PM CT)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Check Google OAuth&lt;/strong&gt; — if tokens expired, create a re-auth task and stop&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fetch upcoming events&lt;/strong&gt; from Google Calendar (next 3 days)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Filter&lt;/strong&gt; — weekday events only, time-bound or PTO-keyword all-day events, future only&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compute delta&lt;/strong&gt; against previously synced events in memory&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Send mesh message&lt;/strong&gt; to &lt;code&gt;msix-home&lt;/code&gt; with structured instructions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Update sync state&lt;/strong&gt; and log the run&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The cron entry in &lt;code&gt;cron.json&lt;/code&gt; is straightforward:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"work-life-sync"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"schedule"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"0 6,9,12,15,18 * * 1-5"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"enabled"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"agent"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"work-life-sync"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Five runs per weekday. Monday at 6 AM catches anything added over the weekend. The midday runs catch same-day changes. The 6 PM run catches tomorrow's additions.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Mesh Message
&lt;/h2&gt;

&lt;p&gt;When the sync agent detects new, changed, or cancelled events, it sends a structured message through the mesh. Here's what an actual message looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;WORK_LIFE_SYNC — Availability Block Request

BLOCKS:
1. [CREATE] "Personal — Medical" | 2026-05-02 10:00 AM – 11:00 AM CT
   | showAs=oof, private, no attendees, no Teams | block_key=evt_abc123
2. [CREATE] "Personal — Childcare" | 2026-05-02 3:00 PM – 3:30 PM CT
   | showAs=oof, private, no attendees, no Teams | block_key=evt_def456
3. [DELETE] block_key=evt_ghi789 | (event cancelled on personal calendar)

CONTEXT: Automated sync from Hector's personal Google Calendar.
Create/update/delete Outlook calendar events as specified.
All blocks: showAs=oof, sensitivity=private, no attendees,
no Teams/online meeting. Timezone: America/Chicago.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The work assistant receives this, parses the instructions, and creates the corresponding Outlook events via Microsoft Graph. Every block is marked &lt;strong&gt;Out of Office&lt;/strong&gt; and &lt;strong&gt;Private&lt;/strong&gt; — coworkers see I'm unavailable, but they don't see the details. A doctor appointment shows up as "Personal — Medical" on my work calendar. That's all anyone needs to know.&lt;/p&gt;

&lt;h2&gt;
  
  
  Smart Category Detection
&lt;/h2&gt;

&lt;p&gt;The agent maps event titles to categories using keyword matching, so the OOF blocks are descriptive without leaking details:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Category&lt;/th&gt;
&lt;th&gt;Keywords&lt;/th&gt;
&lt;th&gt;OOF Subject&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Medical&lt;/td&gt;
&lt;td&gt;doctor, dentist, NICU, OB, therapy&lt;/td&gt;
&lt;td&gt;Personal — Medical&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Family&lt;/td&gt;
&lt;td&gt;birthday party, graduation, wedding&lt;/td&gt;
&lt;td&gt;Personal — Family&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Childcare&lt;/td&gt;
&lt;td&gt;pickup, daycare, soccer, practice&lt;/td&gt;
&lt;td&gt;Personal — Childcare&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Errands&lt;/td&gt;
&lt;td&gt;repair, mechanic, DMV, delivery&lt;/td&gt;
&lt;td&gt;Personal — Errands&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Time Off&lt;/td&gt;
&lt;td&gt;vacation, PTO, travel, holiday&lt;/td&gt;
&lt;td&gt;Personal — Time Off&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Default&lt;/td&gt;
&lt;td&gt;&lt;em&gt;(no match)&lt;/em&gt;&lt;/td&gt;
&lt;td&gt;Personal&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A "Pediatrician — Leilani" event becomes "Personal — Medical." A "Soccer practice — HJ" becomes "Personal — Childcare." A "Family trip to San Antonio" becomes "Personal — Time Off" with OOF blocks on every weekday it spans.&lt;/p&gt;

&lt;h2&gt;
  
  
  Delta Sync, Not Full Replace
&lt;/h2&gt;

&lt;p&gt;The agent doesn't blindly recreate everything each run. It maintains a sync state table in its working memory — mapping Google event IDs to Outlook block IDs — and computes a delta:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;New event&lt;/strong&gt; in Google, not in sync table → &lt;strong&gt;CREATE&lt;/strong&gt; on Outlook&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Changed event&lt;/strong&gt; (time or title differs) → &lt;strong&gt;UPDATE&lt;/strong&gt; on Outlook&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deleted event&lt;/strong&gt; (in sync table, gone from Google) → &lt;strong&gt;DELETE&lt;/strong&gt; from Outlook&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Unchanged&lt;/strong&gt; → skip entirely&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most runs produce zero changes. The agent logs "zero delta" and exits silently. When a doctor appointment gets rescheduled from 10 AM to 2 PM, the next sync cycle catches the change and sends an UPDATE. When I cancel a haircut, the next cycle sends a DELETE. The Outlook calendar stays accurate without me touching it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Matters: Real Multi-Agent Orchestration
&lt;/h2&gt;

&lt;p&gt;There's no shortage of multi-agent demos — chatbots that delegate to sub-agents, retrieval pipelines with planning loops, code review chains. Most of them solve problems that exist only inside the demo.&lt;/p&gt;

&lt;p&gt;This solves a problem I had &lt;strong&gt;every week&lt;/strong&gt;. And the architecture that makes it work — two independent AI agents communicating asynchronously through a shared database — is the same pattern you'd use for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Frontend ↔ backend coordination&lt;/strong&gt; — "Hey API agent, I'm getting a 403 on &lt;code&gt;/api/users&lt;/code&gt;. What middleware guards that endpoint?"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-repo deploys&lt;/strong&gt; — "Tell the infra agent to update the Terraform config for the new service the API agent just added"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-team tooling&lt;/strong&gt; — any scenario where knowledge lives in different repos and different terminal sessions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The mesh doesn't care what the agents are doing. It's pure infrastructure — &lt;a href="https://github.com/htekdev/agent-mesh" rel="noopener noreferrer"&gt;a single-file extension&lt;/a&gt; that gives every Copilot CLI session the ability to discover peers and exchange messages. What you build on top of it is up to you.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Boring Parts That Make It Work
&lt;/h2&gt;

&lt;p&gt;A few design decisions that prevent this from being a fragile demo:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;One-way sync only.&lt;/strong&gt; Personal → Work. Never the reverse. I already see work meetings on Google via an ICS subscription. The system has one direction and no feedback loops.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Silent when healthy.&lt;/strong&gt; The agent only messages me on errors — expired OAuth tokens, mesh delivery failures, the work agent being offline for 24+ hours. Successful syncs produce a one-line log entry and nothing else. I don't need a notification that the system I built is working.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Graceful degradation.&lt;/strong&gt; If my work terminal is offline, the mesh queues the message. Unread messages to stopped sessions persist for up to 24 hours — plenty of time for &lt;code&gt;msix-home&lt;/code&gt; to come back online and pick up the queued instructions. If Google OAuth expires, the agent creates one task asking me to re-authenticate, then stops — no spam, no retries, no crashes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Privacy by default.&lt;/strong&gt; Every OOF block is marked &lt;code&gt;private&lt;/code&gt; with no attendees and no Teams link. Coworkers see "Personal — Medical" and know not to schedule over it. They don't see "Pediatrician — Leilani follow-up re: ROP screening."&lt;/p&gt;

&lt;h2&gt;
  
  
  Build Your Own
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://github.com/htekdev/agent-mesh" rel="noopener noreferrer"&gt;agent mesh&lt;/a&gt; is open source and takes two minutes to install. If you're running Copilot CLI in multiple terminals — which you probably are if you work across repos — you already have the foundation. The mesh just lets those sessions talk.&lt;/p&gt;

&lt;p&gt;The work-life sync agent is specific to my setup (Google Calendar + Outlook), but the pattern isn't. Any cross-system data flow that requires two different tool sets is a candidate: CRM to project tracker, personal notes to team wiki, monitoring alerts to incident response. Two agents, each with access to their own world, connected by a 10-second polling loop and a SQLite database.&lt;/p&gt;

&lt;p&gt;I covered the &lt;a href="https://htek.dev/articles/copilot-home-assistant-ai-runs-my-household/" rel="noopener noreferrer"&gt;full home assistant architecture&lt;/a&gt;, the &lt;a href="https://htek.dev/articles/copilot-cli-extensions-cookbook-examples/" rel="noopener noreferrer"&gt;extension system that powers it&lt;/a&gt;, and the &lt;a href="https://htek.dev/articles/coding-agent-as-life-assistant-nicu/" rel="noopener noreferrer"&gt;crisis that stress-tested it&lt;/a&gt;. The agent mesh is the next layer — the one that lets these systems stop being islands and start being a network.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;I spent months manually copying calendar events between Google and Outlook. I'd forget, miss meetings, look unprofessional. Now two AI agents handle it automatically — one reads my personal calendar, sends a structured message through the mesh, and the other creates Out of Office blocks on my work calendar. Five times a day, every weekday, zero effort.&lt;/p&gt;

&lt;p&gt;That's not a demo. That's Tuesday. And it's the kind of mundane, boring, life-improving automation that multi-agent systems should be solving — not generating blog posts about themselves, but quietly keeping two calendars in sync so I can focus on the things that actually matter.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>github</category>
      <category>automation</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Agent Mesh: How I Made My Copilot CLI Sessions Talk to Each Other</title>
      <dc:creator>Hector Flores</dc:creator>
      <pubDate>Fri, 01 May 2026 14:39:38 +0000</pubDate>
      <link>https://dev.to/htekdev/agent-mesh-how-i-made-my-copilot-cli-sessions-talk-to-each-other-26kg</link>
      <guid>https://dev.to/htekdev/agent-mesh-how-i-made-my-copilot-cli-sessions-talk-to-each-other-26kg</guid>
      <description>&lt;h2&gt;
  
  
  The Problem: Every Session Is an Island
&lt;/h2&gt;

&lt;p&gt;If you run &lt;a href="https://docs.github.com/en/copilot/github-copilot-in-the-cli" rel="noopener noreferrer"&gt;GitHub Copilot CLI&lt;/a&gt; in multiple terminals — say one for a frontend repo, one for an API, and another for infrastructure — those sessions have no idea the others exist. Each one is completely isolated. No shared context. No way to ask another session a question. No way to delegate work across repos.&lt;/p&gt;

&lt;p&gt;I hit this wall the moment my setup grew beyond one terminal. I have a &lt;a href="https://htek.dev/articles/copilot-home-assistant-ai-runs-my-household/" rel="noopener noreferrer"&gt;home assistant system&lt;/a&gt; managing my family's daily life in one repo, a work assistant handling Microsoft sales data in another, and a &lt;a href="https://htek.dev/articles/introducing-vidpipe-ai-video-pipeline/" rel="noopener noreferrer"&gt;video pipeline&lt;/a&gt; processing content in a third. These agents needed to coordinate — my personal calendar needed to block time on my work calendar, my content pipeline needed to notify my home assistant when a video was published, and I needed a single command to ask "who's online?"&lt;/p&gt;

&lt;p&gt;So I built &lt;a href="https://github.com/htekdev/agent-mesh" rel="noopener noreferrer"&gt;agent-mesh&lt;/a&gt;. It's a single-file &lt;a href="https://htek.dev/articles/github-copilot-cli-extensions-complete-guide/" rel="noopener noreferrer"&gt;Copilot CLI extension&lt;/a&gt; that creates a lightweight message bus between sessions using nothing but SQLite. No external dependencies. No config. No server to run. Copy one file and your sessions can talk to each other.&lt;/p&gt;

&lt;h2&gt;
  
  
  How It Works
&lt;/h2&gt;

&lt;p&gt;The concept is deliberately simple. Every Copilot CLI session that loads the extension automatically registers itself in a shared SQLite database the moment the extension is loaded — before any user message is sent. Each session gets a heartbeat, a workspace name (derived from your git repo), and a polling loop that checks for incoming messages every 10 seconds.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌──────────────┐    ┌──────────────┐    ┌──────────────┐
│  Terminal 1   │    │  Terminal 2   │    │  Terminal 3   │
│  my-frontend  │    │  my-api      │    │  infra       │
│  (Copilot CLI)│    │  (Copilot CLI)│    │  (Copilot CLI)│
└──────┬───────┘    └──────┬───────┘    └──────┬───────┘
       │                   │                   │
       └───────────┬───────┴───────────────────┘
                   │
          ┌────────┴────────┐
          │  agent-mesh.db  │
          │  SQLite · WAL   │
          └─────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. SQLite in &lt;a href="https://www.sqlite.org/wal.html" rel="noopener noreferrer"&gt;WAL mode&lt;/a&gt; handles concurrent reads and writes from multiple processes — readers never block writers. The database has two tables: &lt;code&gt;agent_sessions&lt;/code&gt; (who's online) and &lt;code&gt;agent_messages&lt;/code&gt; (the message queue). Messages are polled, routed to the LLM via &lt;code&gt;session.send()&lt;/code&gt;, and cleaned up automatically — read messages are purged after 24 hours, and unread messages to stopped sessions expire after 24 hours too.&lt;/p&gt;
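
&lt;p&gt;The cleanup rule is easy to express in SQL. An illustrative Python/SQLite sketch (the real column names are assumptions, and the real extension also checks session state before expiring unread messages):&lt;/p&gt;

```python
import sqlite3

db = sqlite3.connect(":memory:")  # stands in for the shared agent-mesh.db
db.execute("""CREATE TABLE agent_messages (
    id         INTEGER PRIMARY KEY,
    recipient  TEXT,
    read       INTEGER DEFAULT 0,
    created_at TEXT DEFAULT (datetime('now')))""")

def purge_expired() -> int:
    """Drop messages older than 24 hours, read or not, and report how many."""
    cur = db.execute(
        "DELETE FROM agent_messages WHERE created_at < datetime('now', '-24 hours')")
    db.commit()
    return cur.rowcount
```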

&lt;h2&gt;
  
  
  Setup: One File, Three Steps
&lt;/h2&gt;

&lt;p&gt;This is the kind of thing I want to be dead simple to set up — especially since an AI agent might be the one reading these instructions and doing the setup itself. Here's the full process:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Create the extension directory:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; ~/.copilot/extensions/agent-mesh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;2. Copy the extension file:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Clone the repo directly:&lt;/span&gt;
git clone https://github.com/htekdev/agent-mesh.git ~/.copilot/extensions/agent-mesh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;3. Restart your Copilot CLI sessions.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That's it. No &lt;code&gt;npm install&lt;/code&gt;. No config files. No environment variables. No API keys. The SQLite database is created automatically on first run. When you restart a session, you'll see:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;🌐 Agent mesh: registered as "my-repo" — polling every 10s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The only prerequisite is &lt;strong&gt;Node.js 22+&lt;/strong&gt; because the extension uses the built-in &lt;a href="https://nodejs.org/api/sqlite.html" rel="noopener noreferrer"&gt;&lt;code&gt;node:sqlite&lt;/code&gt;&lt;/a&gt; module — no third-party SQLite bindings needed. Note that &lt;code&gt;node:sqlite&lt;/code&gt; is still experimental in Node 22–23, but it works reliably for this workload and the API surface is stable enough for production use.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Tools
&lt;/h2&gt;

&lt;p&gt;The extension exposes four tools that become available in every session:&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;get_agents&lt;/code&gt; — See Who's Online
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;get_agents&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;active&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Returns a list of all registered sessions with their workspace name, status, and description. The description is auto-derived from the first line of each repo's &lt;code&gt;.github/copilot-instructions.md&lt;/code&gt;, so other agents can understand what each session does at a glance.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;send_message&lt;/code&gt; — Talk to Another Session
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;send_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;workspace&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;my-api&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What authentication middleware guards the /api/users endpoint?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;priority&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;normal&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Target by workspace name (stable across restarts) or session ID (exact targeting). Messages support four priority levels: &lt;code&gt;low&lt;/code&gt;, &lt;code&gt;normal&lt;/code&gt;, &lt;code&gt;high&lt;/code&gt;, and &lt;code&gt;urgent&lt;/code&gt;. The recipient's polling loop picks up the message, the LLM processes it with full context of the recipient's codebase, and a reply comes back through the mesh automatically.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;reply_to_message&lt;/code&gt; — Threaded Responses
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;reply_to_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The /api/users endpoint uses JWT validation from src/middleware/auth.ts&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Replies are linked to the original message, creating conversation threads. Follow-up messages within 10 minutes are auto-threaded.&lt;/p&gt;
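
&lt;p&gt;The auto-threading rule reduces to a timestamp comparison. A sketch with hypothetical parameters (the real extension works off message rows, not bare timestamps):&lt;/p&gt;

```python
from datetime import datetime, timedelta

AUTO_THREAD_WINDOW = timedelta(minutes=10)

def thread_id_for(new_msg_time, last_thread_msg_time, last_thread_id, next_id):
    """Attach a follow-up to the previous thread if it arrives within the
    10-minute window; otherwise it starts a new thread."""
    if (last_thread_msg_time is not None
            and new_msg_time - last_thread_msg_time <= AUTO_THREAD_WINDOW):
        return last_thread_id
    return next_id
```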

&lt;h3&gt;
  
  
  &lt;code&gt;get_message&lt;/code&gt; — Check for Replies
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;get_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Retrieves a message and all its replies. Useful for checking if someone answered your question.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Makes It Powerful
&lt;/h2&gt;

&lt;p&gt;The simplicity is the point. Because the mesh is just SQLite, it's fast, reliable, and zero-maintenance. But what you can &lt;em&gt;do&lt;/em&gt; with cross-session communication is where it gets interesting.&lt;/p&gt;

&lt;h3&gt;
  
  
  Instant Discovery
&lt;/h3&gt;

&lt;p&gt;A subtle but important detail: agents register the moment the extension loads — not when the first user message arrives. This means if you open three terminals, all three are immediately visible to each other via &lt;code&gt;get_agents()&lt;/code&gt; before anyone types a single prompt. Early registration makes the mesh feel alive from the instant you launch your sessions, and it means automated workflows (like cron jobs) can discover and message agents that haven't received any user input yet.&lt;/p&gt;

&lt;h3&gt;
  
  
  Multi-Repo Coordination
&lt;/h3&gt;

&lt;p&gt;Working on a full-stack feature? Your frontend agent can ask the backend agent about API contracts, shared types, or auth flows — and get answers grounded in the actual backend code, not guesses:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;send_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;workspace&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;my-api&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;I need the exact TypeScript type for the UserProfile response from GET /api/users/:id. Can you check src/types/ and give me the interface?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The backend agent reads its own codebase, finds the type definition, and sends it back. No copy-pasting between terminals. No losing context.&lt;/p&gt;

&lt;h3&gt;
  
  
  Work-Life Calendar Sync
&lt;/h3&gt;

&lt;p&gt;This is the use case that sold me on building the mesh in the first place. My personal home assistant (&lt;a href="https://github.com/htekdev/copilot-home-assistant" rel="noopener noreferrer"&gt;rocha-family&lt;/a&gt;) manages my family calendar. My work assistant (&lt;code&gt;msix-home&lt;/code&gt;) manages my Microsoft Outlook calendar. When a doctor's appointment gets added to Google Calendar, the home assistant can tell the work agent to block that time on Outlook:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;send_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;workspace&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;msix-home&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Block 2-3 PM on Thursday as Out of Office — personal appointment. Don&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;t include details.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No manual calendar juggling. The agents handle the cross-domain coordination because they can talk to each other.&lt;/p&gt;

&lt;h3&gt;
  
  
  Content Pipeline Orchestration
&lt;/h3&gt;

&lt;p&gt;When my &lt;a href="https://htek.dev/articles/introducing-vidpipe-ai-video-pipeline/" rel="noopener noreferrer"&gt;video pipeline agent&lt;/a&gt; finishes processing a video — transcription, captions, clips — it can notify my home assistant to create social media scheduling tasks:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;send_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;workspace&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rocha-family&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Video &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Copilot CLI Extensions Deep Dive&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; is processed. Transcript and clips are ready. Please create content scheduling tasks for TikTok, YouTube Shorts, and LinkedIn.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is real multi-agent orchestration happening on a single developer machine with zero cloud infrastructure.&lt;/p&gt;

&lt;h2&gt;
  
  
  Under the Hood: The SQLite Schema
&lt;/h2&gt;

&lt;p&gt;For developers who want to understand (or extend) the internals, here's the core schema:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;agent_sessions&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;session_id&lt;/span&gt;      &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;agent_name&lt;/span&gt;      &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;agent_description&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;cwd&lt;/span&gt;             &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;repo&lt;/span&gt;            &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;status&lt;/span&gt;          &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="s1"&gt;'active'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;registered_at&lt;/span&gt;   &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;last_heartbeat&lt;/span&gt;  &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;metadata&lt;/span&gt;        &lt;span class="nb"&gt;TEXT&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;agent_messages&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;message_id&lt;/span&gt;          &lt;span class="nb"&gt;INTEGER&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt; &lt;span class="n"&gt;AUTOINCREMENT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;sender_session_id&lt;/span&gt;   &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;recipient_session_id&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;content&lt;/span&gt;             &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;original_message_id&lt;/span&gt; &lt;span class="nb"&gt;INTEGER&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;priority&lt;/span&gt;            &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="s1"&gt;'normal'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;created_at&lt;/span&gt;          &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;read&lt;/span&gt;                &lt;span class="nb"&gt;INTEGER&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;read_at&lt;/span&gt;             &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;expires_at&lt;/span&gt;          &lt;span class="nb"&gt;TEXT&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two tables. That's the entire data model. The extension handles everything else — auto-registration at extension load time, heartbeats every 10 seconds, stale session cleanup after 10 minutes of no heartbeat, rate limiting (max 10 messages per pair per minute), and message purging after 24 hours.&lt;/p&gt;
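&lt;p&gt;Those maintenance passes boil down to timestamp comparisons. Here's a rough reconstruction of the stale-session and purge steps (my own sketch against a trimmed version of the schema above; the extension's actual queries may differ):&lt;/p&gt;

```python
# Sketch: stale-session cleanup (no heartbeat for 10 minutes) and the
# 24-hour message purge, reconstructed against a trimmed schema.
import sqlite3
from datetime import datetime, timedelta, timezone

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE agent_sessions (session_id TEXT PRIMARY KEY, "
           "status TEXT DEFAULT 'active', last_heartbeat TEXT NOT NULL)")
db.execute("CREATE TABLE agent_messages (message_id INTEGER PRIMARY KEY, "
           "content TEXT NOT NULL, created_at TEXT NOT NULL)")

def maintain(now):
    stale = (now - timedelta(minutes=10)).isoformat()
    old = (now - timedelta(hours=24)).isoformat()
    # Sessions silent for 10+ minutes are marked stopped, not deleted.
    # ISO-8601 strings with a fixed offset compare correctly as text.
    db.execute("UPDATE agent_sessions SET status = 'stopped' "
               "WHERE status = 'active' AND ? > last_heartbeat", (stale,))
    # Messages older than 24 hours are purged outright.
    db.execute("DELETE FROM agent_messages WHERE ? > created_at", (old,))

now = datetime.now(timezone.utc)
db.execute("INSERT INTO agent_sessions VALUES ('fresh', 'active', ?)",
           ((now - timedelta(seconds=5)).isoformat(),))
db.execute("INSERT INTO agent_sessions VALUES ('silent', 'active', ?)",
           ((now - timedelta(minutes=15)).isoformat(),))
db.execute("INSERT INTO agent_messages (content, created_at) VALUES ('old', ?)",
           ((now - timedelta(hours=25)).isoformat(),))
db.execute("INSERT INTO agent_messages (content, created_at) VALUES ('new', ?)",
           ((now - timedelta(minutes=1)).isoformat(),))
maintain(now)
```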

&lt;p&gt;The database runs in WAL mode with &lt;code&gt;busy_timeout = 5000&lt;/code&gt; and &lt;code&gt;synchronous = NORMAL&lt;/code&gt; — optimized for concurrent multi-process access where readers never block writers. The extension also creates indexes on the message queue for efficient polling — check &lt;a href="https://github.com/htekdev/agent-mesh/blob/main/extension.mjs" rel="noopener noreferrer"&gt;the source&lt;/a&gt; for the full DDL. SQLite handles this IPC workload trivially.&lt;/p&gt;
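&lt;p&gt;For reference, the same connection settings look like this from Python's stdlib &lt;code&gt;sqlite3&lt;/code&gt; (the pragma values are the ones stated above):&lt;/p&gt;

```python
# The concurrency settings described above, applied via stdlib sqlite3.
# WAL lets readers proceed while a writer holds the lock; busy_timeout
# makes contended writers wait instead of failing immediately.
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "agent-mesh.db")
db = sqlite3.connect(path)
db.execute("PRAGMA journal_mode = WAL")   # readers never block writers
db.execute("PRAGMA busy_timeout = 5000")  # wait up to 5s on lock contention
db.execute("PRAGMA synchronous = NORMAL") # fewer fsyncs, safe under WAL
mode = db.execute("PRAGMA journal_mode").fetchone()[0]
```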

&lt;h2&gt;
  
  
  Safety Built In
&lt;/h2&gt;

&lt;p&gt;This isn't a toy. The extension includes real safety guardrails:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Rate limiting&lt;/strong&gt; — Max 10 messages between any session pair within 60 seconds&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Message size cap&lt;/strong&gt; — 10KB per message, preventing context window floods&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Self-message prevention&lt;/strong&gt; — Can't accidentally message yourself&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stale session cleanup&lt;/strong&gt; — No heartbeat for 10 minutes → marked as stopped&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Queue depth warnings&lt;/strong&gt; — Console alert when unread messages exceed 50&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Graceful degradation&lt;/strong&gt; — If Node.js &amp;lt; 22, it loads as a no-op stub with a clear error message instead of crashing&lt;/li&gt;
&lt;/ul&gt;
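&lt;p&gt;To make the first three guardrails concrete, here's a sketch of what a send-side check could look like. This is reconstructed behavior against a simplified table, not the extension's actual code:&lt;/p&gt;

```python
# Sketch of three send-side guardrails: self-message prevention, the
# 10KB size cap, and the 10-messages-per-pair-per-60s rate limit.
# Reconstructed behavior, not the extension's actual implementation.
import sqlite3
from datetime import datetime, timedelta, timezone

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE agent_messages (message_id INTEGER PRIMARY KEY, "
           "sender TEXT, recipient TEXT, content TEXT, created_at TEXT)")

def guarded_send(sender, recipient, content):
    if sender == recipient:
        return "error: cannot message yourself"
    if len(content.encode("utf-8")) > 10 * 1024:
        return "error: message exceeds 10KB cap"
    window = (datetime.now(timezone.utc) - timedelta(seconds=60)).isoformat()
    # Count this pair's traffic in the last 60 seconds (one direction
    # shown here; the real rule may count both directions).
    (count,) = db.execute(
        "SELECT COUNT(*) FROM agent_messages "
        "WHERE sender = ? AND recipient = ? AND created_at > ?",
        (sender, recipient, window)).fetchone()
    if count >= 10:
        return "error: rate limited (10 messages per pair per 60s)"
    db.execute("INSERT INTO agent_messages (sender, recipient, content, "
               "created_at) VALUES (?, ?, ?, ?)",
               (sender, recipient, content,
                datetime.now(timezone.utc).isoformat()))
    return "ok"

result = guarded_send("vidpipe", "rocha-family", "clips are ready")
```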

&lt;h2&gt;
  
  
  The Bigger Picture
&lt;/h2&gt;

&lt;p&gt;I've been running this mesh in production for weeks now. My home assistant, work assistant, and content pipeline all communicate through it daily. The &lt;a href="https://htek.dev/articles/copilot-cli-extensions-cookbook-examples/" rel="noopener noreferrer"&gt;Copilot CLI extension system&lt;/a&gt; makes it possible — extensions are just JavaScript files that get loaded automatically, with full access to the Copilot SDK's session lifecycle hooks.&lt;/p&gt;

&lt;p&gt;What I like most about this pattern is that it's &lt;em&gt;composable&lt;/em&gt;. You don't need to adopt a framework. You don't need a cloud service. You don't need to restructure your workflow. Drop one file into your extensions directory and your existing sessions gain a new superpower. That's the beauty of building on top of a platform like Copilot CLI — the extension model makes it trivial to add capabilities without touching anything else.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://github.com/htekdev/agent-mesh" rel="noopener noreferrer"&gt;full source is on GitHub&lt;/a&gt; — one file, MIT licensed, zero dependencies. If you're running Copilot CLI across multiple repos, give it a try. And if you build something cool with it, I'd love to hear about it — find me on &lt;a href="https://linkedin.com/in/htekdev" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; or &lt;a href="https://x.com/htekdev" rel="noopener noreferrer"&gt;X&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>github</category>
      <category>ai</category>
      <category>opensource</category>
      <category>automation</category>
    </item>
    <item>
      <title>Azure Weekly: Microsoft and OpenAI Restructure Partnership as GPT-5.5 Lands in Foundry</title>
      <dc:creator>Hector Flores</dc:creator>
      <pubDate>Thu, 30 Apr 2026 11:04:05 +0000</pubDate>
      <link>https://dev.to/htekdev/azure-weekly-microsoft-and-openai-restructure-partnership-as-gpt-55-lands-in-foundry-1d7h</link>
      <guid>https://dev.to/htekdev/azure-weekly-microsoft-and-openai-restructure-partnership-as-gpt-55-lands-in-foundry-1d7h</guid>
      <description>&lt;h2&gt;
  
  
  The Partnership That Powers Enterprise AI Just Got More Flexible
&lt;/h2&gt;

&lt;p&gt;On Monday, Microsoft and OpenAI announced a &lt;a href="https://blogs.microsoft.com/blog/2026/04/27/the-next-phase-of-the-microsoft-openai-partnership/" rel="noopener noreferrer"&gt;restructured partnership agreement&lt;/a&gt; that fundamentally changes how both companies operate in the AI cloud market. The headline: &lt;strong&gt;OpenAI can now serve its products to customers across any cloud provider&lt;/strong&gt;, not just Azure. Microsoft remains the primary partner and still gets OpenAI products first—unless Microsoft can't or won't support the required capabilities. But the exclusivity is gone.&lt;/p&gt;

&lt;p&gt;This isn't a breakup. It's a pragmatic evolution that gives both companies room to scale without being joined at the hip. Microsoft keeps its non-exclusive license to OpenAI IP through 2032, continues as a major shareholder, and will still receive revenue share payments from OpenAI through 2030 (now capped and independent of AGI progress). Meanwhile, Microsoft stops paying revenue share to OpenAI entirely.&lt;/p&gt;

&lt;p&gt;Translation: Microsoft gets predictable payments, OpenAI gets multi-cloud flexibility, and enterprises building on Azure get confirmation that Foundry isn't betting everything on a single vendor relationship. The day after this announcement, &lt;a href="https://azure.microsoft.com/en-us/blog/openais-gpt-5-5-in-microsoft-foundry-frontier-intelligence-on-an-enterprise-ready-platform/" rel="noopener noreferrer"&gt;GPT-5.5 went generally available in Microsoft Foundry&lt;/a&gt;. That timing wasn't accidental.&lt;/p&gt;

&lt;h2&gt;
  
  
  GPT-5.5: Built for Agentic Work That Can't Afford to Fail
&lt;/h2&gt;

&lt;p&gt;GPT-5.5 is OpenAI's latest frontier model, and it's optimized for exactly the kind of high-stakes, multi-step workflows enterprises actually care about. Improved long-context reasoning, more reliable agentic execution, better computer-use accuracy, and crucially—&lt;strong&gt;token efficiency built for scale&lt;/strong&gt;. GPT-5.5 reaches higher-quality outputs with fewer tokens and fewer retries, which directly translates to lower cost and latency in production.&lt;/p&gt;

&lt;p&gt;The model is designed for domains where imprecision has real consequences: software engineering, DevOps automation, legal document generation, health sciences research, professional services. This is the model you'd deploy when an agent needs to hold context across a large codebase, diagnose ambiguous failures at the architectural level, reason through downstream impacts before making changes, and recover gracefully when execution hits an unexpected condition.&lt;/p&gt;

&lt;p&gt;GPT-5.5 Pro extends this further for the most demanding enterprise workloads—think sustained research tasks that require multiple passes, stress-testing analytical reasoning, and synthesizing across documents, data, and code to produce polished deliverables like reports, spreadsheets, and presentations.&lt;/p&gt;

&lt;p&gt;From where I sit, this is Microsoft doubling down on &lt;strong&gt;agentic AI as infrastructure&lt;/strong&gt;, not a feature. GPT-5.5 isn't just another model update—it's the engine that powers the next generation of &lt;a href="https://azure.microsoft.com/en-us/products/ai-foundry/agent-service" rel="noopener noreferrer"&gt;Foundry Agent Service&lt;/a&gt; deployments.&lt;/p&gt;

&lt;h2&gt;
  
  
  Foundry Agent Service: Where Agents Become Production Workloads
&lt;/h2&gt;

&lt;p&gt;Access to GPT-5.5 is table stakes. What Microsoft is really selling with Foundry is the platform layer that turns frontier models into governable, scalable systems. This week's updates reinforce that positioning:&lt;/p&gt;

&lt;h3&gt;
  
  
  Hosted Agents Are Now a Real Thing
&lt;/h3&gt;

&lt;p&gt;Foundry Agent Service now supports &lt;strong&gt;hosted agents in isolated sandboxes&lt;/strong&gt; with persistent filesystems, distinct Microsoft Entra identities, and scale-to-zero pricing. Whether you're using LangGraph, Claude Agent SDK, OpenAI Agents SDK, or the &lt;a href="https://htek.dev/articles/github-copilot-sdk-agents-for-every-app/" rel="noopener noreferrer"&gt;GitHub Copilot SDK&lt;/a&gt;, they all work the same way: define your agent in YAML or a harness, run one command, and land it in production with enterprise-grade isolation and governance.&lt;/p&gt;

&lt;p&gt;This is the &lt;a href="https://htek.dev/articles/agentic-devops-next-evolution-of-shift-left/" rel="noopener noreferrer"&gt;agentic DevOps architecture&lt;/a&gt; I've been writing about—agents as first-class infrastructure primitives, not bespoke scripts glued together with duct tape. Each agent gets its own identity, its own security boundary, its own lifecycle. You can run thousands of them in parallel without manually orchestrating VMs or worrying about credential sprawl.&lt;/p&gt;

&lt;h3&gt;
  
  
  Reinforcement Fine-Tuning Gets More Accessible
&lt;/h3&gt;

&lt;p&gt;The &lt;a href="https://devblogs.microsoft.com/foundry/whats-new-in-foundry-finetune-april-2026" rel="noopener noreferrer"&gt;April fine-tuning updates&lt;/a&gt; focus on making Reinforcement Fine-Tuning (RFT) easier to adopt and cheaper to scale:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Global Training for o4-mini&lt;/strong&gt;: You can now fine-tune o4-mini in 13+ Azure regions with lower per-token training rates. Global Training expands to all fine-tuning regions by end of month. For teams customizing reasoning models at scale, this is a meaningful cost reduction.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;New model graders&lt;/strong&gt;: GPT-4.1, GPT-4.1-mini, and GPT-4.1-nano are now available as graders in RFT pipelines. This gives you more flexibility when scoring outputs for open-ended tasks like summarization quality, tone adherence, or multi-step reasoning coherence. Start with GPT-4.1-nano for fast iteration, upgrade to GPT-4.1-mini for stable rubrics, and reserve GPT-4.1 for production grading where every decision counts.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;RFT best practices guide&lt;/strong&gt;: Microsoft published a &lt;a href="https://github.com/microsoft-foundry/fine-tuning/blob/main/Demos/Agentic_RFT_PrivatePreview/RFT_Best_Practice.md" rel="noopener noreferrer"&gt;distilled guide on GitHub&lt;/a&gt; covering when to use RFT, how to design graders, and how to avoid common pitfalls. If you're building tool-calling agents or enforcing policy adherence with fine-tuned models, this is required reading.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;RFT is particularly well-suited for agentic workloads where tool-calling accuracy and structured output matter more than creative language generation. With o4-mini global training, the economics of customizing reasoning models just improved significantly.&lt;/p&gt;

&lt;h2&gt;
  
  
  AKS: Kubernetes 1.35 Patch Releases and Cilium Updates
&lt;/h2&gt;

&lt;p&gt;On the infrastructure side, Azure Kubernetes Service shipped &lt;a href="https://github.com/Azure/AKS/releases/tag/2026-04-02" rel="noopener noreferrer"&gt;new patch releases&lt;/a&gt; for Kubernetes 1.35.1, 1.34.4, and 1.33.8. Key highlights:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Kubernetes 1.32 is deprecated&lt;/strong&gt; as of this release. If you're still running 1.32, plan your upgrade path—you've got until April 30 before standard support ends.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cilium updated to v1.17.9-1&lt;/strong&gt; for the agent and operator images, with v1.18.6 updates for Kubernetes 1.34 that include Gateway API support fixes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CSI driver updates&lt;/strong&gt;: Azure File CSI driver bumped to v1.35.1, Azure Blob CSI driver to v1.27.3 for 1.34/1.35 clusters.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Defender for Containers sensor upgraded&lt;/strong&gt; to v0.9.52 on AKS &amp;gt;= 1.35, addressing several CVEs in the low-level collector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Nothing groundbreaking here—just the steady cadence of security patches and component updates that keep production clusters healthy. If you're running AKS at scale, check the &lt;a href="https://learn.microsoft.com/en-us/azure/aks/release-tracker" rel="noopener noreferrer"&gt;AKS release tracker&lt;/a&gt; to see when these patches hit your region.&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Week Signals About Azure's AI Strategy
&lt;/h2&gt;

&lt;p&gt;The partnership restructuring tells you everything about where Microsoft thinks the AI platform market is headed. &lt;strong&gt;Multi-cloud interoperability is inevitable&lt;/strong&gt;, and trying to lock customers into a single cloud vendor is a losing strategy long-term. Instead, Microsoft is betting that Foundry—the platform layer that provides governance, identity, security, and agent orchestration—becomes the sticky layer enterprises can't replace.&lt;/p&gt;

&lt;p&gt;OpenAI gets the flexibility to serve customers wherever they are. Microsoft gets to position Azure as the best place to run OpenAI models, without needing an exclusivity clause to enforce it. If you're building production agents on Azure, this is good news: the partnership is now structurally designed for long-term stability, not strategic dependence.&lt;/p&gt;

&lt;p&gt;GPT-5.5 landing in Foundry the day after the partnership announcement reinforces that both companies are still aligned on shipping frontier capabilities to Azure first. But the non-exclusive license means you're not betting on a single-vendor future when you build on Foundry.&lt;/p&gt;

&lt;p&gt;For teams evaluating &lt;a href="https://htek.dev/articles/choosing-the-right-ai-sdk/" rel="noopener noreferrer"&gt;AI SDK choices&lt;/a&gt; or designing &lt;a href="https://htek.dev/articles/agent-proof-architecture-agentic-devops/" rel="noopener noreferrer"&gt;agent-proof architecture&lt;/a&gt;, this week's updates make Azure's multi-model, multi-framework positioning clearer. You're not locked into OpenAI. You're not locked into Azure. But if you want enterprise-grade agent orchestration with real isolation and governance, Foundry Agent Service is now a production-ready option worth serious evaluation.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;Microsoft and OpenAI just restructured their partnership to give both companies more flexibility while maintaining strategic alignment. OpenAI can now serve all clouds, but Azure remains the primary partner and gets models first. GPT-5.5 is now generally available in Foundry with improved agentic execution and token efficiency. Foundry Agent Service scales hosted agents to production with real isolation and governance. And RFT fine-tuning for o4-mini is now cheaper and available in 13+ regions.&lt;/p&gt;

&lt;p&gt;The AI platform wars didn't end this week—they just shifted from vendor lock-in to platform value. The question isn't "which cloud has exclusive access to the best models?" anymore. It's "which platform makes it easiest to build, govern, and scale agents in production?" Microsoft's bet is that Foundry wins that fight even without exclusivity. Based on this week's shipments, they might be right.&lt;/p&gt;

</description>
      <category>azure</category>
      <category>ai</category>
      <category>devops</category>
      <category>devex</category>
    </item>
  </channel>
</rss>
