<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Hector Flores</title>
    <description>The latest articles on DEV Community by Hector Flores (@htekdev).</description>
    <link>https://dev.to/htekdev</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2155191%2F4eb16de9-82ac-4486-b7cd-6c0ec2b33daf.png</url>
      <title>DEV Community: Hector Flores</title>
      <link>https://dev.to/htekdev</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/htekdev"/>
    <language>en</language>
    <item>
      <title>Platform Team Burnout Is Real — Here's How I Rescued Mine with AI</title>
      <dc:creator>Hector Flores</dc:creator>
      <pubDate>Fri, 29 May 2026 13:19:26 +0000</pubDate>
      <link>https://dev.to/htekdev/platform-team-burnout-is-real-heres-how-i-rescued-mine-with-ai-2j1f</link>
      <guid>https://dev.to/htekdev/platform-team-burnout-is-real-heres-how-i-rescued-mine-with-ai-2j1f</guid>
      <description>&lt;h2&gt;
  
  
  I Built the Perfect Platform — and It Nearly Broke Me
&lt;/h2&gt;

&lt;p&gt;Seventy-three percent of platform engineers work 50+ hour weeks. Nearly a third of organizations &lt;a href="https://platformengineering.org/events/state-of-platform-engineering-in-2026-salary-maturity-and-shifting-down-2026-01-20" rel="noopener noreferrer"&gt;report understaffed platform teams&lt;/a&gt;. And 58% of platform engineers are on-call for &lt;a href="https://www.ai-infra-link.com/platform-team-burnout-key-causes-and-solutions-in-2026/" rel="noopener noreferrer"&gt;more than 10 services&lt;/a&gt;. I know these numbers are real because I lived them — except my story was worse. I was one person responsible for 10 interconnected frameworks spanning 60+ repositories.&lt;/p&gt;

&lt;p&gt;This is the story of how I built a platform engineering ecosystem that became my company's greatest asset and my personal greatest liability — and how AI agents pulled me out of the burnout spiral.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Mandate: Unify Everything
&lt;/h2&gt;

&lt;p&gt;At a Fortune 500 energy company, I was brought in to lead a massive consolidation effort. The engineering org was scattered across Azure DevOps, Bitbucket, Stash, SVN, and a mess of legacy CI/CD tools. My mandate was simple: bring everything under one roof on GitHub.&lt;/p&gt;

&lt;p&gt;My approach was equally simple: find developer bottlenecks and fill them with frameworks. Every time I saw engineers struggling — with credentialing, infrastructure provisioning, documentation, runner management — I'd build a framework to solve it.&lt;/p&gt;

&lt;p&gt;Over time, I built roughly ten interconnected frameworks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Identity Management Framework&lt;/strong&gt; — CI/CD credentialing solved entirely. Developers add a reusable workflow; each job represents an identity they need. RBAC defined through file paths in a central identity repo. Federated credentials use base64-encoded metadata in the description field for state management — no Terraform state files needed. PR approval gates let the identity team review permissions. Merge triggers automatic provisioning via PowerShell.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Infrastructure as Code (IaC) Framework&lt;/strong&gt; — Centralized all infrastructure provisioning. Developers create Bicep or Terraform in their repo, add a config file referencing the IaC framework, and their repo becomes a fully instrumented IaC module with CI/CD pipelines and credentialing — all automated.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Documentation Framework&lt;/strong&gt; — Docs-as-code applied org-wide. Consolidated documentation into a unified, maintainable system.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Self-Hosted Runtime Framework&lt;/strong&gt; — Automated GitHub Actions self-hosted runners. Started as issue-based requests, evolved into demand-based auto-scaling — creating and destroying VMs dynamically based on pipeline demand.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Platform Meta-Framework&lt;/strong&gt; — The framework that maintains and discovers all other frameworks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use Framework&lt;/strong&gt; — Named after the &lt;code&gt;uses:&lt;/code&gt; keyword in GitHub Actions. Handled workflow inventory — repos register their workflows in a central repository, enabling org-wide discovery.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Release Framework&lt;/strong&gt; — Standardized release actions and processes across the organization.&lt;/li&gt;
&lt;li&gt;Plus additional specialized frameworks handling discovery, inventory, and integration patterns across the ecosystem.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These frameworks weren't isolated. They formed a web. Most consumed the Identity Framework for Azure access. Registration-based frameworks fed into the Documentation Framework. Frameworks needing Azure resources consumed the IaC Framework. A beautiful, complex web of internal tooling — and exactly what &lt;a href="https://wellarchitected.github.com/library/collaboration/recommendations/scaling-actions-reusability/" rel="noopener noreferrer"&gt;GitHub's Well-Architected guidance recommends&lt;/a&gt; for enterprise-scale reusable workflows.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdrnjurwhp0ox67crblvx.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdrnjurwhp0ox67crblvx.webp" alt="The Framework Web — 10 interconnected frameworks forming a dense dependency graph, with Identity Management at the center as the critical dependency" width="800" height="533"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;The Framework Web: 10 interconnected frameworks spanning 60+ repos, all maintained by a single engineer. Identity Management sits at the center — nearly every framework depends on it.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I wrote about the architectural patterns behind this approach in &lt;a href="https://htek.dev/articles/platform-engineering-github-internal-developer-platform/" rel="noopener noreferrer"&gt;Platform Engineering with GitHub: How to Build an Internal Developer Platform&lt;/a&gt;. The technical approach was sound. The organizational model was not.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Burnout Equation
&lt;/h2&gt;

&lt;p&gt;Here's where the beauty becomes the beast.&lt;/p&gt;

&lt;p&gt;Sixty-plus repositories of extremely high complexity. One person maintaining all of them. A backlog that grew to 500+ open issues. I became both a massive asset and a critical liability simultaneously.&lt;/p&gt;

&lt;p&gt;The support team couldn't keep up — nobody else had the depth to maintain these repos. Classic &lt;a href="https://devtron.ai/blog/the-hero-engineer-problem-in-platform-engineering/" rel="noopener noreferrer"&gt;hero engineer anti-pattern&lt;/a&gt;: "exceptional individuals who alone understand how these Lego blocks fit together become single points of failure, centralizing critical knowledge and leaving the broader system brittle and unsustainable."&lt;/p&gt;

&lt;p&gt;That was me. Textbook.&lt;/p&gt;

&lt;p&gt;Microsoft calls this &lt;a href="https://devblogs.microsoft.com/all-things-azure/the-human-scale-problem-in-platform-engineering/" rel="noopener noreferrer"&gt;the human scale problem&lt;/a&gt; — the fundamental mismatch between platform complexity and team capacity. My 10 frameworks were the right technical solution, but they exceeded human scale for a single maintainer.&lt;/p&gt;

&lt;p&gt;And here's the irony that Thoughtworks &lt;a href="https://www.thoughtworks.com/en-us/insights/blog/platforms/escaping-the-platform-labyrinth--a-product-guide-to-beating-cogn" rel="noopener noreferrer"&gt;nails perfectly&lt;/a&gt;: "Platform engineering often starts as a promise of freedom but devolves into a labyrinth — systems so complex and cognitively heavy that they become the very bottlenecks they were meant to solve." I built frameworks to remove developer bottlenecks, and those frameworks &lt;em&gt;became&lt;/em&gt; the bottleneck when I couldn't maintain them fast enough.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Platform engineering doesn't eliminate cognitive load. It redistributes the burden into an increasingly narrow cohort.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That "narrow cohort" was exactly one person. The 500-issue backlog was proof that the redistribution had reached its breaking point.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4hpfevgffy1f5n5497pi.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4hpfevgffy1f5n5497pi.webp" alt="The Burnout Equation — 1 engineer times 10 frameworks times 60+ repos equals 500+ open issues, resolved by adding AI agents for 100 PRs per day" width="800" height="533"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;The Burnout Equation: When platform scale exceeds human capacity, the math becomes unsustainable — until AI agents change the equation entirely.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Rescue: From Developer to Reviewer
&lt;/h2&gt;

&lt;p&gt;Then GitHub Copilot arrived, and everything changed.&lt;/p&gt;

&lt;p&gt;I went from &lt;strong&gt;developing&lt;/strong&gt; to &lt;strong&gt;reviewing&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Instead of writing code across 60+ repos myself, I was running six work streams simultaneously every day. Copilot agents would pick up issues, generate solutions, and open pull requests. My job shifted to cycling through reviews:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Review PR → leave comment → next PR → leave comment → next PR...&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;On peak days, I was reviewing close to &lt;strong&gt;100 PRs per day&lt;/strong&gt;. The 500-issue backlog started getting crushed. Work that would have taken me months to develop was being generated, reviewed, and merged in days.&lt;/p&gt;

&lt;p&gt;This wasn't just luck on my part. The data backs it up at scale:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GitHub's research with Accenture reports that Copilot helps developers &lt;a href="https://github.blog/news-insights/research/research-quantifying-github-copilots-impact-in-the-enterprise-with-accenture/" rel="noopener noreferrer"&gt;code up to 55% faster&lt;/a&gt;, and that 85% of developers felt more confident in their code quality&lt;/li&gt;
&lt;li&gt;Copilot's coding agent is now contributing approximately &lt;a href="https://github.blog/ai-and-ml/github-copilot/copilot-faster-smarter-and-built-for-how-you-work-now/" rel="noopener noreferrer"&gt;1.2 million PRs per month&lt;/a&gt; across the platform&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.blog/news-insights/octoverse/octoverse-a-new-developer-joins-github-every-second-as-ai-leads-typescript-to-1/" rel="noopener noreferrer"&gt;72.6% of Copilot code review users&lt;/a&gt; report improved effectiveness — validating the "reviewer, not developer" workflow&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://jellyfish.co/blog/2025-ai-metrics-in-review/" rel="noopener noreferrer"&gt;67% of enterprise engineers&lt;/a&gt; now use Copilot for AI-assisted code review, far ahead of any alternative&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The workflow shift is the key insight. I didn't just code faster — I changed &lt;em&gt;what my job was&lt;/em&gt;. The bottleneck dissolved because the constraint wasn't my technical skill. It was my typing speed multiplied by context-switching overhead across 60+ repos. AI agents eliminated both.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fztd8wps7lpa13a1tl6vd.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fztd8wps7lpa13a1tl6vd.webp" alt="The Workflow Shift — before and after comparison showing developer mode at 5 PRs per day transforming to reviewer mode at 100 PRs per day with 6 parallel streams" width="800" height="533"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;The Workflow Shift: From developer mode (writing code, context-switching, ~5 PRs/day) to reviewer mode (reviewing AI-generated PRs across 6 parallel streams, ~100 PRs/day).&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  You Don't Have to Be Solo for This to Matter
&lt;/h2&gt;

&lt;p&gt;My story is an extreme case — one person, ten frameworks, sixty repos. But the pattern repeats everywhere.&lt;/p&gt;

&lt;p&gt;WEX, a global fintech, &lt;a href="https://theapplied.co/use-cases/wex-github-developer-productivity" rel="noopener noreferrer"&gt;consolidated 300+ Azure DevOps organizations&lt;/a&gt; onto GitHub Enterprise and deployed Copilot across 1,700+ engineers. Result: 30% higher developer productivity, approximately 60% ROI on Copilot licenses, and a 99% reduction in deployment cycle times. Nearly the same journey as mine — Azure DevOps to GitHub, then layering AI on top — but at enterprise scale with a full team.&lt;/p&gt;

&lt;p&gt;A &lt;a href="https://www.giantswarm.io/blog/two-years-same-questions-what-platform-teams-told-us-at-kubecon" rel="noopener noreferrer"&gt;KubeCon survey of 143 platform professionals&lt;/a&gt; found four pain points reported at nearly equal rates: hiring the right people, too many tools for the team size, operational overload, and no time for automation. Two consecutive years of the same survey, same answers. "Too many tools for the team size" — that's the one-sentence summary of every platform engineer's reality.&lt;/p&gt;

&lt;p&gt;The success stories from companies like &lt;a href="https://medium.com/@volvogroup/how-volvo-group-scaled-backstage-from-100-to-1-000-users-a-developer-centric-transformation-94f2e1a33d78" rel="noopener noreferrer"&gt;Volvo&lt;/a&gt; (1,000+ weekly users on Backstage) and &lt;a href="https://www.prnewswire.com/news-releases/zepto-wins-cncf-end-user-case-study-contest-for-developer-platform-innovation-with-backstage-argo-and-kubernetes-302520291.html" rel="noopener noreferrer"&gt;Zepto&lt;/a&gt; (90% setup time reduction) all share one common thread: they had &lt;em&gt;teams&lt;/em&gt;. Dedicated platform engineering teams staffed to maintain what they built. When you don't have that luxury, AI becomes the team multiplier.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Platform Teams Should Do Right Now
&lt;/h2&gt;

&lt;p&gt;If you're drowning in a maintenance backlog — whether you're a team of one or a team of ten — here's what I learned:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Shift your identity from developer to reviewer.&lt;/strong&gt; The highest-leverage activity isn't writing code. It's reviewing AI-generated PRs and ensuring they meet your standards. Your deep domain knowledge becomes the quality gate, not the bottleneck.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Start with the backlog, not greenfield.&lt;/strong&gt; AI agents thrive on well-defined issues. Point them at your 500-item backlog, not ambiguous new features. Bug fixes, dependency updates, documentation — these are perfect candidates for AI-assisted PRs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Run multiple work streams in parallel.&lt;/strong&gt; The biggest unlock wasn't speed on any single task — it was running six work streams simultaneously. Each stream had its own set of issues and PRs. I cycled between them continuously.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Don't wait for perfect.&lt;/strong&gt; Your framework ecosystem doesn't need to be perfectly documented for AI to be useful. Start assigning issues and iterating on the generated code. You'll converge faster than writing it all yourself.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Measure the shift.&lt;/strong&gt; Track your ratio of code written vs. code reviewed. When that ratio flips — when you're reviewing more than you're writing — you've broken through the solo maintainer ceiling.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;Platform team burnout isn't a people problem. It's a scale problem. We build incredible infrastructure — &lt;a href="https://www.neojn.com/insights/reports/state-of-platform-engineering-2026" rel="noopener noreferrer"&gt;82% of enterprises now have dedicated platform teams&lt;/a&gt; — but the maintenance burden grows faster than headcount.&lt;/p&gt;

&lt;p&gt;The answer isn't always hiring more engineers. Sometimes it's giving the existing ones AI-powered development tools that multiply their output tenfold. I went from drowning in a 500-issue backlog to crushing it at 100 PRs a day. The developer becomes the reviewer. The backlog becomes manageable. The hero engineer becomes a scalable team of one.&lt;/p&gt;

&lt;p&gt;If one person with GitHub Copilot can maintain 60+ complex repos and review 100 PRs per day, then platform team burnout is solvable. That's not theory — I lived it.&lt;/p&gt;

&lt;p&gt;This experience is what convinced me to specialize in &lt;a href="https://htek.dev/articles/agentic-development-in-devops-complete-guide/" rel="noopener noreferrer"&gt;agentic development&lt;/a&gt;. Because the workflow shift from developer to reviewer isn't just a productivity hack. It's the future of platform engineering — and if you've been buried under a backlog you helped create, you should know: there's a way out.&lt;/p&gt;

</description>
      <category>platformengineering</category>
      <category>github</category>
      <category>devops</category>
      <category>leadership</category>
    </item>
    <item>
      <title>The Definitive GitHub Actions Debugging Guide: 65+ Real Errors and How to Fix Them</title>
      <dc:creator>Hector Flores</dc:creator>
      <pubDate>Fri, 29 May 2026 13:18:10 +0000</pubDate>
      <link>https://dev.to/htekdev/the-definitive-github-actions-debugging-guide-65-real-errors-and-how-to-fix-them-54p7</link>
      <guid>https://dev.to/htekdev/the-definitive-github-actions-debugging-guide-65-real-errors-and-how-to-fix-them-54p7</guid>
      <description>&lt;p&gt;GitHub Actions is the CI/CD backbone for millions of repositories. It's also the source of some of the most confusing, silent, and undocumented failure modes in modern DevOps.&lt;/p&gt;

&lt;p&gt;I've spent years debugging Actions workflows, first across &lt;a href="https://htek.dev/articles/lessons-from-500-github-migrations/" rel="noopener noreferrer"&gt;500+ repository migrations at enterprise scale&lt;/a&gt;, then while building &lt;a href="https://htek.dev/articles/agentic-devops-next-evolution-of-shift-left/" rel="noopener noreferrer"&gt;agentic DevOps platforms&lt;/a&gt; that push Actions to its limits. This guide is the result: every error message I've collected, every silent failure I've traced, and every workaround that actually works.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;This is a reference guide, not a tutorial.&lt;/strong&gt; Bookmark it. Search it when something breaks. Every section includes the actual error message (so you can Ctrl+F or Google it), the root cause, and the fix with copy-paste code.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick Diagnosis Flowchart
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl9hybi7hxzbxo8yhlxx5.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl9hybi7hxzbxo8yhlxx5.webp" alt="Quick diagnosis flowchart showing 6 debugging paths for GitHub Actions failures" width="800" height="800"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Start here: identify your failure category before diving into 65+ specific scenarios.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Match your symptom to one of these categories:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Workflow never appears in Actions tab?&lt;/strong&gt; → YAML Syntax Issues or Trigger Problems
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Workflow runs but a step fails?&lt;/strong&gt; → Check the error message against the sections below&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Workflow runs but produces wrong results silently?&lt;/strong&gt; → Silent Failures
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Secrets are empty or permissions denied?&lt;/strong&gt; → Secrets &amp;amp; Permissions
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cache miss or artifact not found?&lt;/strong&gt; → Caching &amp;amp; Artifacts
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Jobs cancelled unexpectedly?&lt;/strong&gt; → Concurrency Issues
&lt;/li&gt;
&lt;/ol&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Pro tip:&lt;/strong&gt; Install &lt;a href="https://github.com/rhysd/actionlint" rel="noopener noreferrer"&gt;&lt;code&gt;actionlint&lt;/code&gt;&lt;/a&gt; right now. It catches the majority of syntax and context issues in this guide &lt;em&gt;before&lt;/em&gt; you push. Run it locally or add it to your CI: &lt;code&gt;uses: raven-actions/actionlint@v2&lt;/code&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
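
&lt;p&gt;As a minimal sketch of the CI option, the workflow below runs the action from the tip on pull requests that touch workflow files. The file name, workflow name, and &lt;code&gt;paths&lt;/code&gt; filter are illustrative assumptions, not something the action requires.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# .github/workflows/lint-workflows.yml (hypothetical file name)
name: Lint workflows

on:
  pull_request:
    paths:
      - '.github/workflows/**'   # only run when workflow files change

jobs:
  actionlint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4           # actionlint needs the workflow files on disk
      - uses: raven-actions/actionlint@v2   # the action named in the tip above
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;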


&lt;h2&gt;
  
  
  YAML Syntax &amp;amp; Validation Errors
&lt;/h2&gt;

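&lt;p&gt;These errors stop a workflow before any job starts, and often no normal run appears at all. The messages quoted below come from two places: GitHub's own workflow validation, and &lt;code&gt;actionlint&lt;/code&gt;, whose lines end in a bracketed rule name such as &lt;code&gt;[syntax-check]&lt;/code&gt; or &lt;code&gt;[expression]&lt;/code&gt;.&lt;/p&gt;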
&lt;h3&gt;
  
  
  Unexpected or Typo'd YAML Keys
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Error:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;The workflow is not valid. .github/workflows/ci.yml (Line: 6, Col: 5):
Unexpected value 'default'

unexpected key "Shell" for step to run shell command. expected one of
"continue-on-error", "env", "id", "if", "name", "run", "shell",
"timeout-minutes", "working-directory" [syntax-check]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Root cause:&lt;/strong&gt; GitHub Actions matches workflow keys exactly, including case and pluralization, and rejects any key it does not recognize. &lt;code&gt;default:&lt;/code&gt; is not &lt;code&gt;defaults:&lt;/code&gt;. &lt;code&gt;Shell:&lt;/code&gt; is not &lt;code&gt;shell:&lt;/code&gt;. &lt;code&gt;branch:&lt;/code&gt; is not &lt;code&gt;branches:&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Use &lt;code&gt;actionlint&lt;/code&gt; to catch these before pushing. Common corrections:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;default:&lt;/code&gt; → &lt;code&gt;defaults:&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;branch:&lt;/code&gt; → &lt;code&gt;branches:&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Shell:&lt;/code&gt; → &lt;code&gt;shell:&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Standard YAML linters (&lt;code&gt;yamllint&lt;/code&gt;, Python &lt;code&gt;yaml.safe_load()&lt;/code&gt;) won't catch these because the YAML is syntactically valid — it's semantically wrong for GitHub Actions.&lt;/p&gt;
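
&lt;p&gt;For reference, here is a small sketch with all three keys spelled the way GitHub Actions expects. The job name and the &lt;code&gt;echo&lt;/code&gt; step are placeholders.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# Hypothetical ci.yml showing the corrected spellings
on:
  push:
    branches: [main]      # plural: "branches", not "branch"

defaults:                 # plural: "defaults", not "default"
  run:
    shell: bash

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - run: echo "ok"
        shell: bash       # lowercase: "shell", not "Shell"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;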

&lt;h3&gt;
  
  
  Missing Required Keys
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Error:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"runs-on" section is missing in job "test" [syntax-check]
"jobs" section should not be empty [syntax-check]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



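&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; The top-level &lt;code&gt;jobs:&lt;/code&gt; map needs at least one job, and every job needs &lt;code&gt;runs-on:&lt;/code&gt; plus at least one entry in &lt;code&gt;steps:&lt;/code&gt;. The exception is a job that calls a reusable workflow, which takes a job-level &lt;code&gt;uses:&lt;/code&gt; instead of both. Separately, matrix keys are compared case-insensitively, so &lt;code&gt;node&lt;/code&gt; and &lt;code&gt;NODE&lt;/code&gt; cannot coexist.&lt;/p&gt;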
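
&lt;p&gt;A minimal sketch of a job that satisfies these checks, assuming a Node.js test matrix purely for illustration:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;on: push

jobs:                       # must contain at least one job
  test:
    runs-on: ubuntu-latest  # required for a normal (non-reusable) job
    strategy:
      matrix:
        node: [18, 20]      # one spelling only; also adding "NODE" would collide
    steps:                  # at least one step
      - run: echo "Testing on Node ${{ matrix.node }}"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;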

&lt;h3&gt;
  
  
  Expression Syntax Errors
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Error:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;got unexpected character '"' while lexing expression...
do you mean string literals? only single quotes are available
for string delimiter [expression]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Root cause:&lt;/strong&gt; GitHub Actions expressions use a custom mini-language, not JavaScript. Double quotes are not valid string delimiters. The &lt;code&gt;+&lt;/code&gt; operator doesn't exist for concatenation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ❌ Wrong&lt;/span&gt;
&lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;echo "${{ "hello" }}"&lt;/span&gt;
&lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;echo "${{ var1 + var2 }}"&lt;/span&gt;

&lt;span class="c1"&gt;# ✅ Correct&lt;/span&gt;
&lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;echo "${{ 'hello' }}"&lt;/span&gt;
&lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;echo "${{ format('{0}{1}', var1, var2) }}"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Context Variable Type Errors
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Error:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;receiver of object dereference "owner" must be type of object but
got "string" [expression]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Root cause:&lt;/strong&gt; &lt;code&gt;github.repository&lt;/code&gt; is a string (&lt;code&gt;"owner/repo"&lt;/code&gt;), not an object. People try &lt;code&gt;github.repository.owner&lt;/code&gt; expecting the org name.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Use &lt;code&gt;github.repository_owner&lt;/code&gt; for the owner. Use &lt;code&gt;toJSON(env)&lt;/code&gt; to dump environment variables, not &lt;code&gt;${{ env }}&lt;/code&gt; (which outputs the string &lt;code&gt;'Object'&lt;/code&gt;).&lt;/p&gt;
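
&lt;p&gt;The sketch below shows both fixes in one step. The job name, the &lt;code&gt;APP_STAGE&lt;/code&gt; variable, and the step-level variable names are assumptions made for the example. Values are passed through &lt;code&gt;env:&lt;/code&gt; rather than interpolated straight into the script, so quotes inside the JSON cannot break the shell command.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;on: push

env:
  APP_STAGE: demo           # hypothetical workflow-level variable

jobs:
  inspect:
    runs-on: ubuntu-latest
    steps:
      - name: Show owner and env context
        env:
          REPO_OWNER: ${{ github.repository_owner }}   # string, e.g. the org name
          ENV_JSON: ${{ toJSON(env) }}                 # env context serialized as JSON
        run: |
          echo "owner: $REPO_OWNER"
          echo "$ENV_JSON"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;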

&lt;h3&gt;
  
  
  &lt;code&gt;secrets.*&lt;/code&gt; in Unexpected Contexts — Silent Failures
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Error:&lt;/strong&gt; No error. The workflow behaves unexpectedly or steps are silently skipped.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Root cause:&lt;/strong&gt; GitHub's documentation says secrets cannot be referenced directly in &lt;code&gt;if:&lt;/code&gt; conditionals, and the &lt;a href="https://docs.github.com/en/actions/learn-github-actions/contexts#context-availability" rel="noopener noreferrer"&gt;context availability table&lt;/a&gt; does not list &lt;code&gt;secrets&lt;/code&gt; for step &lt;code&gt;if:&lt;/code&gt; conditions. Where such a reference does get evaluated (particularly in composite actions, reusable workflows, or when the secret is undefined), it can resolve to an empty string, so the condition behaves differently than expected.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ⚠️ Can behave unexpectedly with undefined secrets&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;if&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.MY_SECRET != '' }}&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;echo "has secret"&lt;/span&gt;

&lt;span class="c1"&gt;# ✅ Map to env first, then check env (more reliable)&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;MY_SECRET&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.MY_SECRET }}&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;if [ -n "$MY_SECRET" ]; then&lt;/span&gt;
      &lt;span class="s"&gt;echo "has secret"&lt;/span&gt;
    &lt;span class="s"&gt;fi&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This pattern is especially dangerous because the failure mode is silence — no error, no notification. The env-mapping approach is more explicit and &lt;code&gt;actionlint&lt;/code&gt; can validate it.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;env&lt;/code&gt; Context Unavailable in Reusable Workflow &lt;code&gt;with:&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Error:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Unrecognized named-value: 'env'. Located at position 1 within
expression: env.SOMETHING
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Root cause:&lt;/strong&gt; The &lt;code&gt;env&lt;/code&gt; context is &lt;a href="https://github.com/actions/runner/issues/2372" rel="noopener noreferrer"&gt;not available&lt;/a&gt; in the &lt;code&gt;with:&lt;/code&gt; block when calling reusable workflows. This is a confirmed open bug with 226+ reactions.&lt;/p&gt;

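&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Only the &lt;code&gt;github&lt;/code&gt;, &lt;code&gt;needs&lt;/code&gt;, &lt;code&gt;strategy&lt;/code&gt;, &lt;code&gt;matrix&lt;/code&gt;, &lt;code&gt;inputs&lt;/code&gt;, and &lt;code&gt;vars&lt;/code&gt; contexts are available in the &lt;code&gt;with:&lt;/code&gt; block of a reusable workflow call. Pass values through one of those (for example &lt;code&gt;github.event.inputs&lt;/code&gt;, configuration variables via &lt;code&gt;vars&lt;/code&gt;, or an upstream job's outputs via &lt;code&gt;needs&lt;/code&gt;), or hardcode them. &lt;code&gt;secrets: inherit&lt;/code&gt; covers secrets only. This is a known platform limitation.&lt;/p&gt;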
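
&lt;p&gt;One commonly used, if verbose, way around the limitation is to have a small upstream job republish the value as a job output, since the &lt;code&gt;needs&lt;/code&gt; context is available in &lt;code&gt;with:&lt;/code&gt;. This is a sketch under assumptions: the &lt;code&gt;TARGET_ENV&lt;/code&gt; variable, the job names, and the called workflow &lt;code&gt;deploy.yml&lt;/code&gt; with its &lt;code&gt;target_env&lt;/code&gt; input are all hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;on: push

env:
  TARGET_ENV: staging       # the value we want to hand to the reusable workflow

jobs:
  export-env:
    runs-on: ubuntu-latest
    outputs:
      target_env: ${{ steps.export.outputs.target_env }}
    steps:
      - id: export
        # Workflow-level env is visible to the shell, so copy it into a step output
        run: echo "target_env=$TARGET_ENV" &amp;gt;&amp;gt; "$GITHUB_OUTPUT"

  deploy:
    needs: export-env
    uses: ./.github/workflows/deploy.yml   # hypothetical reusable workflow
    with:
      target_env: ${{ needs.export-env.outputs.target_env }}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The cost is an extra job, and the runner time that comes with it, just to move a value from one context to another.&lt;/p&gt;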

&lt;h3&gt;
  
  
  &lt;code&gt;if:&lt;/code&gt; Conditionals Always Evaluating to &lt;code&gt;true&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Error:&lt;/strong&gt; No error. The step always runs regardless of condition.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Root cause:&lt;/strong&gt; Using YAML block scalar &lt;code&gt;|&lt;/code&gt;, trailing spaces, or wrapping &lt;code&gt;${{ }}&lt;/code&gt; with extra characters makes the condition a non-empty string — which is always truthy.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ❌ Always true — trailing newline from |&lt;/span&gt;
&lt;span class="na"&gt;if&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
  &lt;span class="s"&gt;${{ github.event_name == 'push' }}&lt;/span&gt;

&lt;span class="c1"&gt;# ❌ Always true — trailing space&lt;/span&gt;
&lt;span class="na"&gt;if&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;${{&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;github.event_name&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;==&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;'push'&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;}}&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# ❌ Always true — extra characters between ${{ }} blocks&lt;/span&gt;
&lt;span class="na"&gt;if&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ github.event_name == 'push' }} &amp;amp;&amp;amp; ${{ github.ref_name == 'main' }}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ✅ Correct — no extra characters&lt;/span&gt;
&lt;span class="na"&gt;if&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;github.event_name == 'push'&lt;/span&gt;

&lt;span class="c1"&gt;# ✅ Correct — single expression, no wrapping needed&lt;/span&gt;
&lt;span class="na"&gt;if&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;github.event_name == 'push' &amp;amp;&amp;amp; github.ref_name == 'main'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Boolean Inputs Are Strings in Composite Actions
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# In composite action — this is ALWAYS false:&lt;/span&gt;
&lt;span class="na"&gt;if&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ inputs.realRun == &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="s"&gt; }}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Root cause:&lt;/strong&gt; Composite actions receive all inputs as strings, even when declared with &lt;code&gt;type: boolean&lt;/code&gt;. This is a &lt;a href="https://github.com/actions/runner/issues/2238" rel="noopener noreferrer"&gt;confirmed bug&lt;/a&gt; with 117+ reactions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Compare to the string &lt;code&gt;'true'&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;if&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ inputs.realRun == 'true' }}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
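
&lt;p&gt;If you would rather work with a real boolean, one option is to convert the string with &lt;code&gt;fromJSON&lt;/code&gt;. A minimal sketch, assuming a hypothetical composite action whose &lt;code&gt;realRun&lt;/code&gt; input defaults to &lt;code&gt;'false'&lt;/code&gt; (the description and step are illustrative only):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# action.yml (hypothetical composite action)
inputs:
  realRun:
    description: "Pass 'true' to perform a real run"
    default: 'false'   # keeps fromJSON from ever seeing an empty string

runs:
  using: composite
  steps:
    # fromJSON('true') yields the boolean true, fromJSON('false') yields false
    - if: ${{ fromJSON(inputs.realRun) }}
      run: echo "real run"
      shell: bash
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;code&gt;fromJSON&lt;/code&gt; fails on values that are not valid JSON (an empty string or &lt;code&gt;yes&lt;/code&gt;, for example), so the plain string comparison above remains the more forgiving choice.&lt;/p&gt;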



&lt;h3&gt;
  
  
  Composite Actions: No &lt;code&gt;defaults:&lt;/code&gt; Support
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Root cause:&lt;/strong&gt; Composite actions do not support the &lt;code&gt;defaults:&lt;/code&gt; key. You cannot set a default shell. Every &lt;code&gt;run:&lt;/code&gt; step must explicitly specify &lt;code&gt;shell:&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;runs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;using&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;composite&lt;/span&gt;
  &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;echo "hello"&lt;/span&gt;
      &lt;span class="na"&gt;shell&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;bash&lt;/span&gt;        &lt;span class="c1"&gt;# Required on EVERY step&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;echo "world"&lt;/span&gt;
      &lt;span class="na"&gt;shell&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;bash&lt;/span&gt;        &lt;span class="c1"&gt;# Must repeat&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Tab Characters in YAML
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Error:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;found a tab character where an indentation space is expected
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; YAML does not allow tabs for indentation. In VS Code, turn on View → Appearance → Render Whitespace to spot them, then add this to &lt;code&gt;.editorconfig&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="nn"&gt;[*.{yml,yaml}]&lt;/span&gt;
&lt;span class="py"&gt;indent_style&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;space&lt;/span&gt;
&lt;span class="py"&gt;indent_size&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;2&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Silent Failures: The Most Dangerous Category
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fffu7k2zaca3y69g1arlf.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fffu7k2zaca3y69g1arlf.webp" alt="Silent failures in CI/CD — everything looks green but hidden problems lurk beneath the surface" width="800" height="800"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;The most dangerous bugs are the ones your pipeline says passed.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;These are the scenarios where &lt;em&gt;nothing visibly breaks&lt;/em&gt; — your workflow just does the wrong thing.&lt;/p&gt;
&lt;h3&gt;
  
  
  Scheduled Workflows Silently Disabled After 60 Days
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Symptom:&lt;/strong&gt; A cron workflow that's been running for months just stops. No notification.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Root cause:&lt;/strong&gt; GitHub &lt;a href="https://github.com/orgs/community/discussions/86087" rel="noopener noreferrer"&gt;automatically disables&lt;/a&gt; &lt;code&gt;schedule&lt;/code&gt;-triggered workflows in public repositories after 60 days without repository activity such as new commits. Workflow runs themselves don't count as activity.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gautamkrishnar/keepalive-workflow@v2&lt;/span&gt;
  &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;time_elapsed&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;45'&lt;/span&gt;  &lt;span class="c1"&gt;# triggers 15 days before the 60-day cutoff&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or re-enable manually:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gh workflow &lt;span class="nb"&gt;enable&lt;/span&gt; &lt;span class="s2"&gt;"Workflow Name"&lt;/span&gt; &lt;span class="nt"&gt;--repo&lt;/span&gt; OWNER/REPO
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  &lt;code&gt;GITHUB_TOKEN&lt;/code&gt; Cannot Trigger Downstream Workflows
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Symptom:&lt;/strong&gt; A workflow pushes a commit or creates a tag, but the expected downstream workflow (triggered by &lt;code&gt;on: push&lt;/code&gt;) never fires.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Root cause:&lt;/strong&gt; This is &lt;a href="https://docs.github.com/en/actions/security-for-github-actions/security-guides/automatic-token-authentication#using-the-github_token-in-a-workflow" rel="noopener noreferrer"&gt;by design&lt;/a&gt;. Commits made with &lt;code&gt;GITHUB_TOKEN&lt;/code&gt; do not trigger further workflow runs — it's GitHub's recursion prevention mechanism.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Use a GitHub App installation token or a PAT:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/create-github-app-token@v1&lt;/span&gt;
  &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;app-token&lt;/span&gt;
  &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;app-id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ vars.APP_ID }}&lt;/span&gt;
    &lt;span class="na"&gt;private-key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.APP_PRIVATE_KEY }}&lt;/span&gt;

&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4&lt;/span&gt;
  &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;token&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ steps.app-token.outputs.token }}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
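
&lt;p&gt;GitHub's documentation lists two exceptions to this rule: &lt;code&gt;workflow_dispatch&lt;/code&gt; and &lt;code&gt;repository_dispatch&lt;/code&gt; events created with &lt;code&gt;GITHUB_TOKEN&lt;/code&gt; do start new runs. If a separate token is not an option, a sketch like the following dispatches the downstream workflow explicitly. It assumes a hypothetical &lt;code&gt;deploy.yml&lt;/code&gt; workflow that declares &lt;code&gt;on: workflow_dispatch&lt;/code&gt;; the job name is also made up:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;jobs:
  release:
    runs-on: ubuntu-latest
    permissions:
      contents: write   # push the commit or tag
      actions: write    # dispatch another workflow
    steps:
      - uses: actions/checkout@v4

      # ... push the commit or tag here ...

      - name: Dispatch downstream workflow
        run: gh workflow run deploy.yml --ref "$GITHUB_REF_NAME"
        env:
          GH_TOKEN: ${{ github.token }}
          GH_REPO: ${{ github.repository }}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This only helps when you control the downstream workflow and can add the &lt;code&gt;workflow_dispatch&lt;/code&gt; trigger. For plain &lt;code&gt;on: push&lt;/code&gt; triggers, the App token or PAT is still required.&lt;/p&gt;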



&lt;h3&gt;
  
  
  Cache Rate Limiting Falls Through as "Cache Not Found"
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Error:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Warning: Failed to restore: Failed to GetCacheEntryDownloadURL:
Rate limited: Failed request: (429) Too Many Requests
Cache not found for input keys: ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Root cause:&lt;/strong&gt; When the &lt;a href="https://github.com/actions/cache/issues/1758" rel="noopener noreferrer"&gt;cache API rate limits&lt;/a&gt; you, the action reports it as a cache miss — not a rate limit error. Your build proceeds without cache, silently slower.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Don't trigger hundreds of parallel matrix jobs all saving caches simultaneously. Stagger cache operations or use fewer, broader cache keys.&lt;/p&gt;
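
&lt;p&gt;One way to apply this, sketched under a few assumptions: cap matrix concurrency with &lt;code&gt;max-parallel&lt;/code&gt; and let the matrix jobs only restore the cache, so they never compete to save it. The shard count, cache path, and key below are placeholders, not values from a real project:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      max-parallel: 10            # limit how many jobs hit the cache API at once
      matrix:
        shard: [1, 2, 3, 4]
    steps:
      - uses: actions/checkout@v4

      # Restore only: no post-step save from every matrix job
      - uses: actions/cache/restore@v4
        with:
          path: ~/.npm
          key: npm-${{ runner.os }}-${{ hashFiles('**/package-lock.json') }}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A single upstream job can then populate the cache with &lt;code&gt;actions/cache/save@v4&lt;/code&gt; using the same key.&lt;/p&gt;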

&lt;h3&gt;
  
  
  Fork PR Secrets Evaluate to Empty Strings
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Symptom:&lt;/strong&gt; A contributor opens a PR from a fork. Secret-dependent steps fail or skip silently.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Root cause:&lt;/strong&gt; Secrets are &lt;a href="https://docs.github.com/en/actions/security-for-github-actions/security-guides/using-secrets-in-github-actions#using-secrets-in-a-workflow" rel="noopener noreferrer"&gt;not passed&lt;/a&gt; to workflows triggered by &lt;code&gt;pull_request&lt;/code&gt; from forks. This is a deliberate security boundary.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Design CI to not require secrets for tests. For deployment previews after code review, use &lt;code&gt;pull_request_target&lt;/code&gt; with a mandatory label gate:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;pull_request_target&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;types&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;labeled&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;deploy-preview&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;if&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;github.event.label.name == 'safe to test'&lt;/span&gt;
    &lt;span class="c1"&gt;# ...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;⚠️ &lt;strong&gt;Security warning:&lt;/strong&gt; Never check out fork code with &lt;code&gt;pull_request_target&lt;/code&gt; and then run it with repository secrets. This creates a &lt;a href="https://securitylab.github.com/resources/github-actions-preventing-pwn-requests/" rel="noopener noreferrer"&gt;pwn-request vulnerability&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
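
&lt;p&gt;For ordinary &lt;code&gt;pull_request&lt;/code&gt; CI, a simple guard keeps secret-dependent steps from failing on fork PRs. This is a sketch: the script name and the &lt;code&gt;API_KEY&lt;/code&gt; secret are hypothetical stand-ins for whatever your pipeline uses:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;steps:
  - uses: actions/checkout@v4

  # Runs on pushes and same-repo PRs; skipped when the PR head lives in a fork
  - name: Integration tests (needs secrets)
    if: github.event_name != 'pull_request' || github.event.pull_request.head.repo.full_name == github.repository
    run: ./run-integration-tests.sh
    env:
      API_KEY: ${{ secrets.API_KEY }}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The step shows up as skipped rather than failed, which makes the missing coverage visible in the run summary.&lt;/p&gt;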




&lt;h2&gt;
  
  
  Runner &amp;amp; Environment Problems
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Self-Hosted Runner Registration &amp;amp; Update Loops
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Error:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Runner update in progress, do not shutdown runner.
Downloading 2.277.1 runner... Generate and execute update script.
Runner will exit shortly for update, should back online within 10 seconds.
[...loops again...]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Root cause:&lt;/strong&gt; Containerized runners built on older Ubuntu images (18.04) hit glibc incompatibility when auto-update downloads a newer runner binary.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Rebuild container on Ubuntu 22.04+&lt;/li&gt;
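&lt;li&gt;Disable auto-update: set &lt;code&gt;DISABLE_AUTO_UPDATE=1&lt;/code&gt; if your runner image supports that variable, or pass &lt;code&gt;--disableupdate&lt;/code&gt; to &lt;code&gt;./config.sh&lt;/code&gt; when registering the runner yourself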
&lt;/li&gt;
&lt;li&gt;Add &lt;code&gt;rm -rf /home/runner/actions-runner&lt;/code&gt; to container entrypoint before &lt;code&gt;./config.sh&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Add watchdog cron polling &lt;code&gt;GET /orgs/{org}/actions/runners&lt;/code&gt; every 5 minutes&lt;/li&gt;
&lt;/ol&gt;
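
&lt;p&gt;The watchdog in step 4 could itself be a scheduled workflow. The sketch below assumes an organization named &lt;code&gt;my-org&lt;/code&gt; and a hypothetical &lt;code&gt;RUNNER_ADMIN_TOKEN&lt;/code&gt; secret holding a token that is allowed to list organization runners. Both are placeholders:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;on:
  schedule:
    - cron: '*/5 * * * *'   # every 5 minutes; scheduled runs can be delayed

jobs:
  check-runners:
    runs-on: ubuntu-latest   # GitHub-hosted, so it works while self-hosted runners are down
    steps:
      - name: Fail if any self-hosted runner is offline
        env:
          GH_TOKEN: ${{ secrets.RUNNER_ADMIN_TOKEN }}   # hypothetical secret
          ORG: my-org                                   # placeholder organization
        run: |
          offline=$(gh api "/orgs/$ORG/actions/runners" --paginate \
            --jq '.runners[] | select(.status == "offline") | .name')
          if [ -n "$offline" ]; then
            echo "Offline runners:"
            echo "$offline"
            exit 1
          fi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A failed run then triggers the usual workflow-failure notifications, which is often enough alerting for a small fleet.&lt;/p&gt;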

&lt;h3&gt;
  
  
  Runner Out of Disk Space
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Error:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;No space left on device (os error 28)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Root cause:&lt;/strong&gt; GitHub-hosted &lt;code&gt;ubuntu-latest&lt;/code&gt; runners have ~14GB usable, but pre-installed toolchains (Android SDK ~8GB, .NET ~1.5GB, Haskell ~5GB) consume most of it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Add a cleanup step before heavy builds:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Free Disk Space&lt;/span&gt;
  &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;jlumbroso/free-disk-space@main&lt;/span&gt;
  &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;tool-cache&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
    &lt;span class="na"&gt;android&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
    &lt;span class="na"&gt;dotnet&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
    &lt;span class="na"&gt;haskell&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
    &lt;span class="na"&gt;large-packages&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This reclaims ~10-15GB.&lt;/p&gt;

&lt;h3&gt;
  
  
  Environment Variables Not Persisting Between Steps
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Error:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Warning: The `set-output` command is deprecated and will be disabled soon.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



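&lt;p&gt;&lt;strong&gt;Root cause:&lt;/strong&gt; &lt;code&gt;::set-output&lt;/code&gt; and &lt;code&gt;::save-state&lt;/code&gt; &lt;a href="https://github.blog/changelog/2022-10-11-github-actions-deprecating-save-state-and-set-output-commands/" rel="noopener noreferrer"&gt;were deprecated&lt;/a&gt; in favor of the &lt;code&gt;$GITHUB_OUTPUT&lt;/code&gt; and &lt;code&gt;$GITHUB_STATE&lt;/code&gt; environment files. The older &lt;code&gt;::set-env&lt;/code&gt; and &lt;code&gt;::add-path&lt;/code&gt; commands were already disabled in 2020.&lt;/p&gt;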

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ❌ Deprecated&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;echo "::set-output name=dir::$(yarn cache dir)"&lt;/span&gt;

&lt;span class="c1"&gt;# ✅ Current&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;echo "dir=$(yarn cache dir)" &amp;gt;&amp;gt; $GITHUB_OUTPUT&lt;/span&gt;

&lt;span class="c1"&gt;# For multi-line values:&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;echo "MY_VAR&amp;lt;&amp;lt;EOF" &amp;gt;&amp;gt; $GITHUB_ENV&lt;/span&gt;
    &lt;span class="s"&gt;echo "$multiline_value" &amp;gt;&amp;gt; $GITHUB_ENV&lt;/span&gt;
    &lt;span class="s"&gt;echo "EOF" &amp;gt;&amp;gt; $GITHUB_ENV&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Tools Not Found in Next Step (PATH Issues)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Error:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/bin/bash: my-tool: command not found
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Root cause:&lt;/strong&gt; Each &lt;code&gt;run:&lt;/code&gt; step spawns a fresh shell. &lt;code&gt;export PATH=...&lt;/code&gt; is lost when that step ends.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Write to &lt;code&gt;$GITHUB_PATH&lt;/code&gt;, not &lt;code&gt;PATH&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Install tool&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;pip install my-cli-tool&lt;/span&gt;
    &lt;span class="s"&gt;echo "$HOME/.local/bin" &amp;gt;&amp;gt; $GITHUB_PATH&lt;/span&gt;

&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Use tool&lt;/span&gt;  &lt;span class="c1"&gt;# PATH is now updated&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-cli-tool --version&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Docker Not Available on Runner
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Error:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Cannot connect to the Docker daemon at unix:///var/run/docker.sock.
Is the docker daemon running?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Root cause:&lt;/strong&gt; &lt;code&gt;ubuntu-slim&lt;/code&gt;, ARC containers, and self-hosted runners without DinD don't expose Docker.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Standard &lt;code&gt;ubuntu-latest&lt;/code&gt;: Docker is available natively&lt;/li&gt;
&lt;li&gt;ARC/containerized: Use DinD sidecar or switch to JavaScript/composite actions&lt;/li&gt;
&lt;li&gt;For private registry pulls, add &lt;code&gt;docker/login-action&lt;/code&gt; before container actions&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Service Container Connectivity
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Error:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;connection to server at "localhost", port 5432 failed: Connection refused
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Root cause:&lt;/strong&gt; In containerized jobs (&lt;code&gt;container:&lt;/code&gt; at job level), service containers are on a Docker bridge network. &lt;code&gt;localhost&lt;/code&gt; doesn't work.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Always add health checks, and use the service label as hostname in containerized jobs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;postgres&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;postgres:15&lt;/span&gt;
    &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;POSTGRES_PASSWORD&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;password&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;5432:5432&lt;/span&gt;
    &lt;span class="na"&gt;options&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;&amp;gt;-&lt;/span&gt;
      &lt;span class="s"&gt;--health-cmd pg_isready&lt;/span&gt;
      &lt;span class="s"&gt;--health-interval 10s&lt;/span&gt;
      &lt;span class="s"&gt;--health-timeout 5s&lt;/span&gt;
      &lt;span class="s"&gt;--health-retries 5&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For containerized jobs, connect to &lt;code&gt;postgres:5432&lt;/code&gt; (the service label), not &lt;code&gt;localhost:5432&lt;/code&gt;.&lt;/p&gt;
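
&lt;p&gt;A minimal sketch of the containerized variant, assuming a Node.js test suite that reads the standard &lt;code&gt;PGHOST&lt;/code&gt; and &lt;code&gt;PGPORT&lt;/code&gt; variables (the image and test command are illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;jobs:
  test:
    runs-on: ubuntu-latest
    container: node:20            # the job itself runs inside a container
    services:
      postgres:
        image: postgres:15
        env:
          POSTGRES_PASSWORD: password
        options: --health-cmd pg_isready --health-interval 10s --health-timeout 5s --health-retries 5
    steps:
      - uses: actions/checkout@v4
      - run: npm test
        env:
          PGHOST: postgres        # the service label, not localhost
          PGPORT: 5432
          PGPASSWORD: password
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;No &lt;code&gt;ports:&lt;/code&gt; mapping is needed here, because the job container and the service share the same Docker network.&lt;/p&gt;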

&lt;h3&gt;
  
  
  Runner Image Deprecation
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Error:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;No hosted runners with requested label(s): 'ubuntu-18.04' can be found.
sudo: docker-compose: command not found
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ❌ Removed&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sudo docker-compose up -d&lt;/span&gt;

&lt;span class="c1"&gt;# ✅ Docker Compose v2 plugin syntax&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sudo docker compose up -d&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Track upcoming removals at the &lt;a href="https://github.com/actions/runner-images/releases" rel="noopener noreferrer"&gt;&lt;code&gt;actions/runner-images&lt;/code&gt; releases&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Windows Runner Gotchas
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Error:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;AssertionError: expected '40-learnings\\passesdefaultgate.md' to contain '40-learnings/'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Root cause:&lt;/strong&gt; Path separators (&lt;code&gt;\&lt;/code&gt; vs &lt;code&gt;/&lt;/code&gt;), missing POSIX tools (&lt;code&gt;jq&lt;/code&gt;, &lt;code&gt;sed&lt;/code&gt;), shebangs not honored, CRLF line endings.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;defaults&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;shell&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;bash&lt;/span&gt;  &lt;span class="c1"&gt;# uses Git Bash on Windows&lt;/span&gt;

&lt;span class="c1"&gt;# Install missing tools&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;if&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;runner.os == 'Windows'&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;choco install jq -y&lt;/span&gt;
  &lt;span class="na"&gt;shell&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pwsh&lt;/span&gt;

&lt;span class="c1"&gt;# Disable CRLF auto-conversion&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;git config --global core.autocrlf &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Node.js Runtime Deprecation
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Error:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Node.js 16 actions are deprecated. Please update the following actions
to use Node.js 20: actions/checkout@v3, actions/cache@v3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Bump every action to its latest major version. For your own JavaScript actions, update &lt;code&gt;action.yml&lt;/code&gt; to &lt;code&gt;runs.using: node24&lt;/code&gt;. Emergency workaround:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;FORCE_JAVASCRIPT_ACTIONS_TO_NODE24&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;true'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Deprecation timeline:&lt;/strong&gt; node12 (cutoff mid-2023) → node16 (mid-2024) → node20 (enforcement rolling out 2025-2026). Check the &lt;a href="https://github.blog/changelog/label/actions/" rel="noopener noreferrer"&gt;GitHub Actions changelog&lt;/a&gt; for the latest timeline.&lt;/p&gt;




&lt;h2&gt;
  
  
  Secrets, Permissions &amp;amp; Authentication
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzxz8mcxz9ce4q86it7zh.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzxz8mcxz9ce4q86it7zh.webp" alt="GitHub Actions permission model — nested security layers from repository settings to GITHUB_TOKEN to OIDC federation" width="800" height="800"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;The GitHub Actions permission model: repo defaults → workflow permissions block → GITHUB_TOKEN scope. The #1 source of 403 errors.&lt;/em&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  &lt;code&gt;GITHUB_TOKEN&lt;/code&gt; Permission Denied (403)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Error:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;remote: Permission to org/repo.git denied to github-actions[bot].
fatal: unable to access '...': The requested URL returned error: 403
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Root cause:&lt;/strong&gt; The default &lt;code&gt;GITHUB_TOKEN&lt;/code&gt; has been read-only since GitHub &lt;a href="https://github.blog/changelog/2023-02-02-github-actions-updating-the-default-github_token-permissions-to-read-only/" rel="noopener noreferrer"&gt;tightened defaults for new repos and orgs in February 2023&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Add explicit &lt;code&gt;permissions:&lt;/code&gt; to the job:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;permissions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;contents&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;write&lt;/span&gt;       &lt;span class="c1"&gt;# git push&lt;/span&gt;
  &lt;span class="na"&gt;pull-requests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;write&lt;/span&gt;  &lt;span class="c1"&gt;# PR creation&lt;/span&gt;
  &lt;span class="na"&gt;packages&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;write&lt;/span&gt;       &lt;span class="c1"&gt;# GHCR push&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Critical:&lt;/strong&gt; The &lt;code&gt;permissions:&lt;/code&gt; block completely replaces defaults. Any permission not listed becomes &lt;code&gt;none&lt;/code&gt;. Listing only &lt;code&gt;contents: write&lt;/code&gt; drops all other permissions including &lt;code&gt;pull-requests&lt;/code&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
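
&lt;p&gt;Because of that replace-everything behavior, one pattern worth considering is to deny everything at the workflow level and grant permissions per job. A sketch with made-up job names:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;permissions: {}             # workflow default: no permissions at all

jobs:
  lint:
    runs-on: ubuntu-latest
    permissions:
      contents: read        # checkout only
    steps:
      - uses: actions/checkout@v4

  release:
    runs-on: ubuntu-latest
    permissions:
      contents: write       # git push
      pull-requests: write  # listed explicitly, otherwise it becomes none
    steps:
      - uses: actions/checkout@v4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Each job then documents exactly what it needs, and a 403 points straight at the job whose list is incomplete.&lt;/p&gt;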

&lt;h3&gt;
  
  
  OIDC Federation Failures with AWS
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Error:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Could not assume role with OIDC: Not authorized to perform
sts:AssumeRoleWithWebIdentity
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Root causes and fixes:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Reusable workflows change the &lt;code&gt;sub&lt;/code&gt; claim.&lt;/strong&gt; The OIDC JWT subject reflects the &lt;em&gt;calling&lt;/em&gt; repo, not the reusable workflow's repo. IAM trust policies must match the caller.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Missing &lt;code&gt;permissions: id-token: write&lt;/code&gt;&lt;/strong&gt; on the calling job.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Audience mismatch:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;aws-actions/configure-aws-credentials@v4&lt;/span&gt;
  &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;audience&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sts.amazonaws.com&lt;/span&gt;  &lt;span class="c1"&gt;# must match trust policy&lt;/span&gt;
    &lt;span class="na"&gt;role-to-assume&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;arn:aws:iam::123456789012:role/MyRole&lt;/span&gt;
    &lt;span class="na"&gt;aws-region&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;us-east-1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
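
&lt;p&gt;For the second cause, a minimal sketch of a job that grants &lt;code&gt;id-token: write&lt;/code&gt;. The role ARN and region are the same placeholders as above, and the job name is hypothetical:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;jobs:
  deploy:
    runs-on: ubuntu-latest
    permissions:
      id-token: write   # lets the job request an OIDC token
      contents: read    # still needed for checkout once permissions is set
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/MyRole
          aws-region: us-east-1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;When the AWS step lives in a reusable workflow, the job in the calling workflow needs the same &lt;code&gt;id-token: write&lt;/code&gt; grant.&lt;/p&gt;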



&lt;h3&gt;
  
  
  Cross-Repo Access (403)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Error:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;remote: Permission to other-org/other-repo.git denied to github-actions[bot].
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Root cause:&lt;/strong&gt; &lt;code&gt;GITHUB_TOKEN&lt;/code&gt; is scoped to a single repository. It cannot access other repos — this is a &lt;a href="https://docs.github.com/en/actions/security-for-github-actions/security-guides/automatic-token-authentication" rel="noopener noreferrer"&gt;security boundary by design&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Use a GitHub App installation token (recommended) or a PAT:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/create-github-app-token@v1&lt;/span&gt;
  &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;app-token&lt;/span&gt;
  &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;app-id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ vars.APP_ID }}&lt;/span&gt;
    &lt;span class="na"&gt;private-key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.APP_PRIVATE_KEY }}&lt;/span&gt;
    &lt;span class="na"&gt;repositories&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;target-repo"&lt;/span&gt;

&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4&lt;/span&gt;
  &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;token&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ steps.app-token.outputs.token }}&lt;/span&gt;
    &lt;span class="na"&gt;repository&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;org/target-repo&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Environment Protection Rules Blocking Deployments
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Error:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;This deployment was rejected
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Root cause:&lt;/strong&gt; The triggering ref doesn't match the environment's allowed branches/tags filter, or the only required reviewer is the person who triggered the run and the environment has &lt;em&gt;Prevent self-review&lt;/em&gt; enabled.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Ensure the triggering ref matches the environment's branch filter pattern. If self-review is blocked, add a second reviewer so that someone other than the triggering user can approve.&lt;/p&gt;

&lt;h3&gt;
  
  
  GitHub App Token Generation Failures
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Error:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;error:0909006C:PEM routines:get_name:no start line
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Root cause:&lt;/strong&gt; Private key corrupted during shell escaping or base64 encoding.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Store the raw PEM file directly as a GitHub secret:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gh secret &lt;span class="nb"&gt;set &lt;/span&gt;APP_PRIVATE_KEY &amp;lt; my-app.private-key.pem
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use &lt;a href="https://github.com/actions/create-github-app-token" rel="noopener noreferrer"&gt;&lt;code&gt;actions/create-github-app-token@v1&lt;/code&gt;&lt;/a&gt; (official, node20-native) instead of &lt;code&gt;tibdex/github-app-token&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Docker Registry Auth (GHCR)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Error:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;denied: installation not allowed to Write organization package
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Add &lt;code&gt;permissions: packages: write&lt;/code&gt; to the job&lt;/li&gt;
&lt;li&gt;For org packages: visit package settings → Manage Actions Access → add the repository with Write access&lt;/li&gt;
&lt;li&gt;Don't set &lt;code&gt;DOCKER_CONFIG: $HOME/.docker&lt;/code&gt; at job level — it breaks credential persistence&lt;/li&gt;
&lt;/ol&gt;
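
&lt;p&gt;Put together, a job that pushes to GHCR might look like the sketch below. The job name is a placeholder, and it assumes the package already grants this repository Write access as described in step 2:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;jobs:
  publish:              # hypothetical job name
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write   # required for GITHUB_TOKEN to push to ghcr.io
    steps:
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;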

&lt;h3&gt;
  
  
  Dependabot Secrets Namespace
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Root cause:&lt;/strong&gt; Dependabot runs in a separate secrets namespace. Repository secrets are not available to Dependabot-triggered workflows.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Add secrets to both namespaces:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gh secret &lt;span class="nb"&gt;set &lt;/span&gt;NPM_TOKEN &lt;span class="nt"&gt;--body&lt;/span&gt; &lt;span class="s2"&gt;"npm_xxx"&lt;/span&gt; &lt;span class="nt"&gt;--app&lt;/span&gt; actions
gh secret &lt;span class="nb"&gt;set &lt;/span&gt;NPM_TOKEN &lt;span class="nt"&gt;--body&lt;/span&gt; &lt;span class="s2"&gt;"npm_xxx"&lt;/span&gt; &lt;span class="nt"&gt;--app&lt;/span&gt; dependabot
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  PAT vs. GITHUB_TOKEN Decision Matrix
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Use&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Push to same repo&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;GITHUB_TOKEN&lt;/code&gt; + &lt;code&gt;contents: write&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Create PR on same repo&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;GITHUB_TOKEN&lt;/code&gt; + &lt;code&gt;pull-requests: write&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Push to different repo&lt;/td&gt;
&lt;td&gt;GitHub App token or PAT&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Trigger another workflow&lt;/td&gt;
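&lt;td&gt;GitHub App token or PAT (events created with &lt;code&gt;GITHUB_TOKEN&lt;/code&gt; don't start new workflow runs, except &lt;code&gt;workflow_dispatch&lt;/code&gt; and &lt;code&gt;repository_dispatch&lt;/code&gt;)&lt;/td&gt;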
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cross-org operations&lt;/td&gt;
&lt;td&gt;Classic PAT with &lt;code&gt;repo&lt;/code&gt; scope&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Prefer GitHub App tokens over PATs:&lt;/strong&gt; PATs are tied to an individual account (if that person leaves the org, the token stops working), they expire, and they are harder to audit.&lt;/p&gt;




&lt;h2&gt;
  
  
  Caching, Artifacts &amp;amp; Dependencies
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Cache Miss Despite Recent Save
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Error:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Cache not found for input keys: Linux-node-abc123def456
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Root causes:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Branch scoping:&lt;/strong&gt; Caches from &lt;code&gt;main&lt;/code&gt; are accessible to branches, but not vice-versa&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Version mismatch:&lt;/strong&gt; Changing OS or compression tool changes the cache version hash&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rate limiting:&lt;/strong&gt; 429s fall through silently as "cache not found"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Infrastructure outage:&lt;/strong&gt; Check &lt;a href="https://githubstatus.com" rel="noopener noreferrer"&gt;githubstatus.com&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Always prime cache on the default branch first. Use the &lt;a href="https://docs.github.com/en/rest/actions/cache#list-github-actions-caches-for-a-repository" rel="noopener noreferrer"&gt;List Caches API&lt;/a&gt; to debug version mismatches.&lt;/p&gt;
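
&lt;p&gt;One way to prime the cache is a small workflow that runs on every push to the default branch. This is a sketch that assumes an npm project and a default branch named &lt;code&gt;main&lt;/code&gt;; the job name is a placeholder:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;on:
  push:
    branches: [main]
jobs:
  warm-cache:           # hypothetical job name
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/cache@v5
        with:
          path: ~/.npm
          key: ${{ runner.os }}-node-${{ hashFiles('**/package-lock.json') }}
      # Populates ~/.npm; the cache is saved in the post-job step on a miss
      - run: npm ci
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Feature branches that compute the same key can then restore the entry saved on &lt;code&gt;main&lt;/code&gt;.&lt;/p&gt;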

&lt;h3&gt;
  
  
  &lt;code&gt;cache-hit&lt;/code&gt; Output Semantics
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ❌ Wrong — cache-hit is empty string (not 'false') on full miss&lt;/span&gt;
&lt;span class="na"&gt;if&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;steps.cache.outputs.cache-hit == 'false'&lt;/span&gt;

&lt;span class="c1"&gt;# ✅ Correct — always use != 'true'&lt;/span&gt;
&lt;span class="na"&gt;if&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;steps.cache.outputs.cache-hit != 'true'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;cache-hit&lt;/code&gt; is &lt;code&gt;'true'&lt;/code&gt; on exact key match, empty string on miss, and &lt;code&gt;'false'&lt;/code&gt; on &lt;code&gt;restore-keys&lt;/code&gt; match. Yes, really.&lt;/p&gt;
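
&lt;p&gt;A fuller sketch of the pattern, where the directory, lockfile, and script names are hypothetical and only the &lt;code&gt;cache-hit&lt;/code&gt; check matters:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;- uses: actions/cache@v5
  id: cache
  with:
    path: .tools                                                # hypothetical directory
    key: ${{ runner.os }}-tools-${{ hashFiles('tools.lock') }}  # hypothetical lockfile
    restore-keys: |
      ${{ runner.os }}-tools-

# Runs on a full miss ('') and on a restore-keys match ('false')
- if: steps.cache.outputs.cache-hit != 'true'
  run: ./scripts/fetch-tools.sh                                 # hypothetical script
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;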

&lt;h3&gt;
  
  
  Cache Size Limit (10 GB Per Repo)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Symptom:&lt;/strong&gt; Random cache misses on older branches.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Root cause:&lt;/strong&gt; Repos have a &lt;a href="https://github.com/actions/cache#cache-limits" rel="noopener noreferrer"&gt;10 GB total cache limit&lt;/a&gt;. Once the limit is exceeded, GitHub silently evicts the least recently used caches, so branches that haven't run in a while lose theirs first.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Clean up branch caches on PR close:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;pull_request&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;types&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;closed&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;cleanup&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;permissions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;actions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;write&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
          &lt;span class="s"&gt;for id in $(gh cache list --ref refs/pull/${{ github.event.pull_request.number }}/merge \&lt;/span&gt;
            &lt;span class="s"&gt;--limit 100 --json id --jq '.[].id'); do&lt;/span&gt;
            &lt;span class="s"&gt;gh cache delete $id&lt;/span&gt;
          &lt;span class="s"&gt;done&lt;/span&gt;
        &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;GH_TOKEN&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.GITHUB_TOKEN }}&lt;/span&gt;
          &lt;span class="na"&gt;GH_REPO&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ github.repository }}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  &lt;code&gt;upload-artifact&lt;/code&gt; v3 → v4 Breaking Changes
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Error:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;An artifact with the same name already exists for the associated workflow run.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Root cause:&lt;/strong&gt; v4 artifacts are &lt;a href="https://github.com/actions/upload-artifact" rel="noopener noreferrer"&gt;immutable&lt;/a&gt;. Multiple jobs can no longer upload to the same artifact name.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# v4 — unique names per matrix job&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/upload-artifact@v4&lt;/span&gt;
  &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;build-${{ matrix.os }}-${{ matrix.node }}&lt;/span&gt;

&lt;span class="c1"&gt;# Download all and merge&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/download-artifact@v4&lt;/span&gt;
  &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;pattern&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;build-*&lt;/span&gt;
    &lt;span class="na"&gt;merge-multiple&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
    &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;dist/&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Cross-Workflow Artifact Download
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Error:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Unable to download artifact(s): Artifact not found for name: my-artifact
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Both upload and download must use the &lt;strong&gt;same version family&lt;/strong&gt; (v3↔v3 or v4↔v4 — they use different storage backends):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/download-artifact@v4&lt;/span&gt;
  &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-artifact&lt;/span&gt;
    &lt;span class="na"&gt;github-token&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.GITHUB_TOKEN }}&lt;/span&gt;  &lt;span class="c1"&gt;# required for cross-workflow&lt;/span&gt;
    &lt;span class="na"&gt;run-id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ github.event.workflow_run.id }}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  &lt;code&gt;npm ci&lt;/code&gt; Cache Save Timeout
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Error:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;The operation was canceled.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Root cause:&lt;/strong&gt; Cache save (tar compression) on large &lt;code&gt;node_modules&lt;/code&gt; exceeds the job timeout. Missing &lt;code&gt;zstd&lt;/code&gt; in DinD containers forces slow gzip fallback.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Cache &lt;code&gt;~/.npm&lt;/code&gt; (the npm cache directory), not &lt;code&gt;node_modules&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
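&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Resolve the npm cache directory so the cache step can reference it&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;npm-cache-dir&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;echo "dir=$(npm config get cache)" &amp;gt;&amp;gt; "$GITHUB_OUTPUT"&lt;/span&gt;

&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/cache@v5&lt;/span&gt;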
  &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ steps.npm-cache-dir.outputs.dir }}&lt;/span&gt;
    &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ runner.os }}-node-${{ hashFiles('**/package-lock.json') }}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For DinD environments, install &lt;code&gt;zstd&lt;/code&gt;: &lt;code&gt;apt-get install -y zstd&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Docker Layer Caching
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Error:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cache export feature is currently not supported for docker driver
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; You must use &lt;code&gt;docker/setup-buildx-action&lt;/code&gt; first — the default Docker driver doesn't support cache export:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;docker/setup-buildx-action@v3&lt;/span&gt;

&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;docker/build-push-action@v6&lt;/span&gt;
  &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;cache-from&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;type=gha,scope=${{ github.workflow }}&lt;/span&gt;
    &lt;span class="na"&gt;cache-to&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;type=gha,mode=max,scope=${{ github.workflow }}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Cache Corruption
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Error:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;tar: Error is not recoverable: exiting now
gzip: stdin: unexpected end of file
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Delete the corrupt cache via CLI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gh cache list &lt;span class="nt"&gt;--repo&lt;/span&gt; owner/repo
gh cache delete &amp;lt;cache-id&amp;gt; &lt;span class="nt"&gt;--repo&lt;/span&gt; owner/repo
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Prevent future corruption with a download timeout:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;SEGMENT_DOWNLOAD_TIMEOUT_MINS&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Git LFS Files Not Downloaded
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Symptom:&lt;/strong&gt; Binary files are 140-byte text pointers instead of actual content.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4&lt;/span&gt;
  &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;lfs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
    &lt;span class="na"&gt;fetch-depth&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Cache LFS objects to reduce bandwidth:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/cache@v5&lt;/span&gt;
  &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;.git/lfs&lt;/span&gt;
    &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ runner.os }}-lfs-${{ hashFiles('.lfsconfig') }}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
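
&lt;p&gt;Because &lt;code&gt;.lfsconfig&lt;/code&gt; rarely changes, a key based on it can keep restoring a stale cache. A commonly used alternative, sketched here under the assumption that &lt;code&gt;git-lfs&lt;/code&gt; is installed on the runner, keys the cache on the list of LFS object IDs. The file name &lt;code&gt;.lfs-assets-id&lt;/code&gt; is an arbitrary choice:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# Check out pointers only; LFS content is pulled after the cache restore
- uses: actions/checkout@v4

# Write the sorted LFS object IDs to a file that hashFiles can read
- run: git lfs ls-files -l | cut -d' ' -f1 | sort &amp;gt; .lfs-assets-id

- uses: actions/cache@v5
  with:
    path: .git/lfs
    key: ${{ runner.os }}-lfs-${{ hashFiles('.lfs-assets-id') }}

# Downloads only the objects missing from the restored cache
- run: git lfs pull
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;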



&lt;h3&gt;
  
  
  Lockfile Hash Returns Empty String
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Error:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Cache not found for input keys: Linux-node-
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Root cause:&lt;/strong&gt; &lt;code&gt;hashFiles('**/package-lock.json')&lt;/code&gt; matched no files, returning empty string.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Debug with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;echo "Hash: ${{ hashFiles('**/package-lock.json') }}"&lt;/span&gt;
    &lt;span class="s"&gt;find . -name "package-lock.json" -not -path "*/node_modules/*"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Correct patterns per ecosystem:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# npm&lt;/span&gt;
&lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ runner.os }}-npm-${{ hashFiles('**/package-lock.json') }}&lt;/span&gt;
&lt;span class="c1"&gt;# pip&lt;/span&gt;
&lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ runner.os }}-pip-${{ hashFiles('**/requirements*.txt', '**/pyproject.toml') }}&lt;/span&gt;
&lt;span class="c1"&gt;# Gradle&lt;/span&gt;
&lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ runner.os }}-gradle-${{ hashFiles('**/*.gradle*', '**/gradle-wrapper.properties') }}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Trigger Problems
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Workflow Not Triggering At All
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;No error. No run appears.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Root causes (in priority order):&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
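&lt;li&gt;Workflow file is not on the default branch (required for &lt;code&gt;schedule&lt;/code&gt;, &lt;code&gt;workflow_dispatch&lt;/code&gt;, and &lt;code&gt;workflow_run&lt;/code&gt;; &lt;code&gt;push&lt;/code&gt; and &lt;code&gt;pull_request&lt;/code&gt; read the file from the triggering commit)&lt;/li&gt;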
&lt;li&gt;YAML syntax error (silently rejected)&lt;/li&gt;
&lt;li&gt;Branch filter mismatch (&lt;code&gt;branches: [master]&lt;/code&gt; but default is &lt;code&gt;main&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Workflow disabled via UI or inactivity&lt;/li&gt;
&lt;li&gt;Commit made by &lt;code&gt;GITHUB_TOKEN&lt;/code&gt; (won't trigger downstream)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Check workflow state&lt;/span&gt;
gh workflow list
gh workflow view &lt;span class="s2"&gt;"My Workflow"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  &lt;code&gt;workflow_dispatch&lt;/code&gt; Button Not Showing
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Root causes:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Workflow file not on default branch (most common)&lt;/li&gt;
&lt;li&gt;No write access to repository&lt;/li&gt;
&lt;li&gt;Wrong YAML indentation:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ❌ Wrong — nested under push&lt;/span&gt;
&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;push&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;branches&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;main&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;workflow_dispatch&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;      &lt;span class="c1"&gt;# indented under push&lt;/span&gt;

&lt;span class="c1"&gt;# ✅ Correct — sibling of push&lt;/span&gt;
&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;push&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;branches&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;main&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;workflow_dispatch&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;        &lt;span class="c1"&gt;# same level as push&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Cron Schedule Running Late or Not Running
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Root cause:&lt;/strong&gt; GitHub does &lt;a href="https://docs.github.com/en/actions/writing-workflows/choosing-when-your-workflow-runs/events-that-trigger-workflows#schedule" rel="noopener noreferrer"&gt;not guarantee cron timing&lt;/a&gt;. During high load, scheduled runs can be delayed by hours or skipped entirely. Minimum interval is 5 minutes. Public/free-tier repos are deprioritized. All times are UTC.&lt;/p&gt;

&lt;p&gt;A &lt;a href="https://github.com/leonardaraz/fyndplats-cache-warmer/issues/4" rel="noopener noreferrer"&gt;real-world case&lt;/a&gt;: workflow configured for &lt;code&gt;*/10 * * * *&lt;/code&gt; (expected ~144 runs/day), but only 4 runs fired in 32 hours.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; For time-sensitive operations, use an external cron service to trigger &lt;code&gt;workflow_dispatch&lt;/code&gt; via API. For everything else, treat GitHub-hosted schedules as best-effort and plan for delays of an hour or more.&lt;/p&gt;
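
&lt;p&gt;For the external trigger to work, the workflow has to declare &lt;code&gt;workflow_dispatch&lt;/code&gt; alongside (or instead of) &lt;code&gt;schedule&lt;/code&gt;. A minimal sketch, with a placeholder job; the external service would then call the REST endpoint &lt;code&gt;POST /repos/{owner}/{repo}/actions/workflows/{workflow_id}/dispatches&lt;/code&gt; with a &lt;code&gt;ref&lt;/code&gt; in the body:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;on:
  schedule:
    - cron: '*/10 * * * *'   # best-effort fallback, evaluated in UTC
  workflow_dispatch:         # lets an external scheduler start the run via API

jobs:
  warm:                      # hypothetical job
    runs-on: ubuntu-latest
    steps:
      - run: echo "triggered by ${{ github.event_name }}"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;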

&lt;h3&gt;
  
  
  &lt;code&gt;workflow_run&lt;/code&gt; Not Firing
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Root causes:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The listener workflow must be on the &lt;strong&gt;default branch&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;workflows: ["CI Build"]&lt;/code&gt; must &lt;strong&gt;exactly match&lt;/strong&gt; the source workflow's &lt;code&gt;name:&lt;/code&gt; field&lt;/li&gt;
&lt;li&gt;Missing &lt;code&gt;types: [completed]&lt;/code&gt; — without it, fires on both start and finish&lt;/li&gt;
&lt;li&gt;Source workflow triggered by &lt;code&gt;GITHUB_TOKEN&lt;/code&gt; (recursion prevention)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;workflow_run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;workflows&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CI&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Build"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;     &lt;span class="c1"&gt;# exact match to name: in source workflow&lt;/span&gt;
    &lt;span class="na"&gt;types&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;completed&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;post-build&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;if&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;github.event.workflow_run.conclusion == 'success'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  &lt;code&gt;repository_dispatch&lt;/code&gt; Returns 204 But Workflow Doesn't Run
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Root cause:&lt;/strong&gt; API returns 204 even when &lt;code&gt;event_type&lt;/code&gt; doesn't match — the mismatch is silent.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Verify &lt;code&gt;event_type&lt;/code&gt; exactly matches the workflow's &lt;code&gt;types:&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;repository_dispatch&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;types&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;docker-image-updated&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# must EXACTLY match API call&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
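
&lt;p&gt;On the sending side, a step like the following sketch would fire that event. The target repository and the secret name are hypothetical, and the token is assumed to be a PAT or GitHub App token with access to the target repo:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# Sender workflow step; event_type must equal the receiver's types: entry
- run: gh api repos/my-org/target-repo/dispatches -f event_type=docker-image-updated
  env:
    GH_TOKEN: ${{ secrets.DISPATCH_TOKEN }}   # hypothetical secret name
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Keeping the event name in one shared place (for example a repository variable) reduces the chance of the two sides drifting apart.&lt;/p&gt;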



&lt;h3&gt;
  
  
  Path Filters Not Working as Expected
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Root cause:&lt;/strong&gt; &lt;code&gt;paths:&lt;/code&gt; and &lt;code&gt;paths-ignore:&lt;/code&gt; are &lt;a href="https://docs.github.com/en/actions/writing-workflows/workflow-syntax-for-github-actions#onpushpull_requestpull_request_targetpathspaths-ignore" rel="noopener noreferrer"&gt;mutually exclusive&lt;/a&gt; — using both on the same event is not supported. &lt;code&gt;docs&lt;/code&gt; (without &lt;code&gt;/**&lt;/code&gt;) matches a file literally named &lt;code&gt;docs&lt;/code&gt;, not the directory.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Correct: ignore docs directory&lt;/span&gt;
&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;push&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;paths-ignore&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;docs/**'&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;*.md'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
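
&lt;p&gt;When both an include and an exclude are needed on the same event, negation patterns inside &lt;code&gt;paths:&lt;/code&gt; can stand in for &lt;code&gt;paths-ignore:&lt;/code&gt;. A sketch with a hypothetical &lt;code&gt;src&lt;/code&gt; directory; at least one non-negated pattern is required, and later patterns override earlier ones:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;on:
  push:
    paths:
      - 'src/**'          # run for anything under src/ (hypothetical directory)
      - '!src/**/*.md'    # except Markdown files inside it
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;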



&lt;h3&gt;
  
  
  Tag Push vs. Release Published
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Trigger&lt;/th&gt;
&lt;th&gt;When It Fires&lt;/th&gt;
&lt;th&gt;Use Case&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;push: tags: [v*]&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;On tag push&lt;/td&gt;
&lt;td&gt;Binary build&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;release: types: [created]&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Release created&lt;/td&gt;
&lt;td&gt;Build + draft release&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;release: types: [published]&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Explicit publish&lt;/td&gt;
&lt;td&gt;Deploy to prod&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Concurrency &amp;amp; Timing
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Jobs Cancelled Unexpectedly
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Root cause:&lt;/strong&gt; Overly broad concurrency group key. Using &lt;code&gt;group: ${{ github.workflow }}&lt;/code&gt; alone means all runs compete, even on different branches.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# PR workflows — cancel stale runs on same PR&lt;/span&gt;
&lt;span class="na"&gt;concurrency&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;group&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ci-${{ github.workflow }}-${{ github.ref }}&lt;/span&gt;
  &lt;span class="na"&gt;cancel-in-progress&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;

&lt;span class="c1"&gt;# Production deploys — queue, never cancel&lt;/span&gt;
&lt;span class="na"&gt;concurrency&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;group&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;deploy-production&lt;/span&gt;
  &lt;span class="na"&gt;cancel-in-progress&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;

&lt;span class="c1"&gt;# Branch-sensitive — cancel only on non-default branches&lt;/span&gt;
&lt;span class="na"&gt;concurrency&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;group&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ github.workflow }}-${{ github.ref }}&lt;/span&gt;
  &lt;span class="na"&gt;cancel-in-progress&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ github.ref != 'refs/heads/main' }}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Empty &lt;code&gt;head_ref&lt;/code&gt; Causing Cross-Branch Cancellation
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Root cause:&lt;/strong&gt; &lt;code&gt;github.head_ref&lt;/code&gt; is only set for pull request events and is empty for push events. A group key built from it alone is identical for every push-triggered run, so those runs cancel each other.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;concurrency&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;group&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ github.workflow }}-${{ github.head_ref || github.run_id }}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Job &lt;code&gt;needs&lt;/code&gt; Failure Cascading
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Symptom:&lt;/strong&gt; A downstream job is &lt;code&gt;Skipped&lt;/code&gt; even though you want it to run after upstream failure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Root cause:&lt;/strong&gt; Default &lt;code&gt;if:&lt;/code&gt; on every job is &lt;code&gt;success()&lt;/code&gt;, meaning "only run if ALL needs jobs succeeded."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Always run (notifications, cleanup)&lt;/span&gt;
&lt;span class="na"&gt;final-job&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;needs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;job-a&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;job-b&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;if&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;always()&lt;/span&gt;
  &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;if&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;contains(needs.*.result, 'failure')&lt;/span&gt;
      &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;exit &lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Default Timeout is 6 Hours
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Root cause:&lt;/strong&gt; Jobs default to &lt;code&gt;timeout-minutes: 360&lt;/code&gt;, so a hung test suite silently holds a runner for 6 hours.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Always set &lt;code&gt;timeout-minutes&lt;/code&gt; at the job level:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;test&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;timeout-minutes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;20&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;npm test&lt;/span&gt;
        &lt;span class="na"&gt;timeout-minutes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Matrix &lt;code&gt;include&lt;/code&gt; vs. &lt;code&gt;exclude&lt;/code&gt; Confusion
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Key insight:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;include&lt;/code&gt; entries whose values don't overwrite any original matrix value &lt;strong&gt;add properties&lt;/strong&gt; to every combination they fit; they don't create a new job&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;include&lt;/code&gt; entries that can't be merged into any existing combination without overwriting a value create a new job&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;exclude&lt;/code&gt; requires ALL keys to exist in the base matrix — unknown keys are silently ignored&lt;/li&gt;
&lt;li&gt;Max 256 matrix jobs per workflow run
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;strategy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;fail-fast&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;  &lt;span class="c1"&gt;# strongly recommended for diagnostics&lt;/span&gt;
  &lt;span class="na"&gt;matrix&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ fromJSON(needs.prepare.outputs.matrix) }}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
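
&lt;p&gt;To make the &lt;code&gt;include&lt;/code&gt; and &lt;code&gt;exclude&lt;/code&gt; rules concrete, here is a small static matrix. The runner labels, Node versions, and the &lt;code&gt;coverage&lt;/code&gt; key are made up for illustration; treat it as a sketch of the behavior described above rather than a recommended configuration:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;strategy:
  matrix:
    os: [ubuntu-latest, windows-latest]
    node: [18, 20]
    exclude:
      # Removes the windows-latest / 18 combination from the base matrix
      - os: windows-latest
        node: 18
    include:
      # Fits the existing ubuntu-latest / 20 combination: adds `coverage`, no new job
      - os: ubuntu-latest
        node: 20
        coverage: true
      # macos-latest is not in the base matrix: creates an additional job
      - os: macos-latest
        node: 20
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Inside the job, &lt;code&gt;${{ matrix.coverage }}&lt;/code&gt; is only set for the one combination that received it and is empty everywhere else.&lt;/p&gt;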



&lt;h3&gt;
  
  
  Dynamic Matrix and Required Status Checks
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The problem:&lt;/strong&gt; Matrix job names like &lt;code&gt;test (ubuntu-latest, 16)&lt;/code&gt; change when matrix values change. Branch protection requires exact string matches — no wildcards.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Add a stable summary job and require that instead:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;test-summary&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;needs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;test&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;if&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;always()&lt;/span&gt;
  &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
  &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;if&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;needs.test.result != 'success'&lt;/span&gt;
      &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;exit &lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Known Unsolved Problems
&lt;/h2&gt;

&lt;p&gt;These are confirmed platform limitations with no clean workaround. Understanding them saves hours of debugging dead ends.&lt;/p&gt;

&lt;h3&gt;
  
  
  No SSH / Interactive Debugging (&lt;a href="https://github.com/actions/runner/issues/241" rel="noopener noreferrer"&gt;#241&lt;/a&gt; — 107 👍, open since 2019)
&lt;/h3&gt;

&lt;p&gt;The runner has no TTY allocated. Interactive debugging is not possible natively. Workarounds like &lt;a href="https://github.com/mxschmitt/action-tmate" rel="noopener noreferrer"&gt;&lt;code&gt;mxschmitt/action-tmate&lt;/code&gt;&lt;/a&gt; open SSH reverse tunnels but are a security risk (session URL is in public logs).&lt;/p&gt;
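
&lt;p&gt;If you accept that risk, one way to narrow the exposure is to gate the step so it only runs on a failed, manually dispatched run, and to restrict the session to the user who triggered it. This is a sketch under assumptions: &lt;code&gt;limit-access-to-actor&lt;/code&gt; is an input documented by the action, while the &lt;code&gt;workflow_dispatch&lt;/code&gt; gate and the 15-minute cap are my own choices, so check the action's README for your pinned version:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# Add as the last step of the job you want to inspect
- name: Debug over SSH (failed manual runs only)
  if: ${{ failure() &amp;amp;&amp;amp; github.event_name == 'workflow_dispatch' }}
  uses: mxschmitt/action-tmate@v3
  with:
    # Only the user who triggered the run can connect, using their GitHub SSH keys
    limit-access-to-actor: true
  # Don't let a forgotten session hold the runner until the job timeout
  timeout-minutes: 15
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;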

&lt;h3&gt;
  
  
  No Step-Level Retry
&lt;/h3&gt;

&lt;p&gt;There's no native &lt;code&gt;retry: 3&lt;/code&gt; syntax on steps. Use &lt;a href="https://github.com/nick-fields/retry" rel="noopener noreferrer"&gt;&lt;code&gt;nick-fields/retry&lt;/code&gt;&lt;/a&gt; for &lt;code&gt;run:&lt;/code&gt; steps, or a bash loop:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="k"&gt;for &lt;/span&gt;i &lt;span class="k"&gt;in &lt;/span&gt;1 2 3&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
  &lt;/span&gt;flaky-command &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;break&lt;/span&gt;
  &lt;span class="c"&gt;# Fail the step if the last attempt also failed; otherwise the loop exits 0&lt;/span&gt;
  [ "$i" -eq 3 ] &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;exit &lt;/span&gt;1
  &lt;span class="nb"&gt;sleep &lt;/span&gt;15
&lt;span class="k"&gt;done&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  No Early-Exit / Step Flow Control (&lt;a href="https://github.com/actions/runner/issues/662" rel="noopener noreferrer"&gt;#662&lt;/a&gt; — 1,031 👍)
&lt;/h3&gt;

&lt;p&gt;The highest-voted open runner issue. You cannot exit a job early with a specific conclusion (success/neutral). Every step must use &lt;code&gt;if:&lt;/code&gt; guards to skip, creating verbose YAML.&lt;/p&gt;
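
&lt;p&gt;The usual workaround looks roughly like the sketch below: one "gate" step writes an output, and every later step repeats the same &lt;code&gt;if:&lt;/code&gt; guard. The step id &lt;code&gt;gate&lt;/code&gt;, the &lt;code&gt;skip&lt;/code&gt; output, and the &lt;code&gt;package.json&lt;/code&gt; check are placeholders for whatever condition you actually need:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;steps:
  - uses: actions/checkout@v4
  - id: gate
    run: |
      # Hypothetical condition; replace with your own check
      if [ ! -f package.json ]; then
        echo "skip=true" &amp;gt;&amp;gt; "$GITHUB_OUTPUT"
      fi
  # Each remaining step has to carry the same guard
  - if: steps.gate.outputs.skip != 'true'
    run: npm ci
  - if: steps.gate.outputs.skip != 'true'
    run: npm test
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;When the gate trips, the job still concludes as a success, but the guarded steps show up as skipped rather than the job ending early.&lt;/p&gt;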

&lt;h3&gt;
  
  
  Reusable Workflows Cannot Be Called from Composite Actions
&lt;/h3&gt;

&lt;p&gt;Composite actions are inlined steps on the parent runner. Calling a reusable workflow (which spawns a separate runner) from inside a composite action is architecturally impossible without a lifecycle model redesign.&lt;/p&gt;
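
&lt;p&gt;In practice that means the two reuse mechanisms are called at different levels of the workflow. A minimal sketch, assuming a local reusable workflow at &lt;code&gt;.github/workflows/build.yml&lt;/code&gt; and a local composite action at &lt;code&gt;.github/actions/setup&lt;/code&gt; (both paths are hypothetical):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;jobs:
  build:
    # Reusable workflow: called at the job level, runs its own jobs on its own runners
    uses: ./.github/workflows/build.yml
  test:
    needs: build
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Composite action: called at the step level, inlined into this job's runner
      - uses: ./.github/actions/setup
      - run: npm test
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;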

&lt;h3&gt;
  
  
  No &lt;code&gt;services:&lt;/code&gt; or &lt;code&gt;container:&lt;/code&gt; in Composite Actions (&lt;a href="https://github.com/actions/runner/blob/main/docs/adrs/0549-composite-run-steps.md" rel="noopener noreferrer"&gt;ADR 0549&lt;/a&gt;)
&lt;/h3&gt;

&lt;p&gt;By architectural decision. Service containers require Docker lifecycle management at the job level — composite actions don't have job-level lifecycle.&lt;/p&gt;

&lt;h3&gt;
  
  
  Secret Masking Edge Cases (&lt;a href="https://github.com/actions/runner/issues/475" rel="noopener noreferrer"&gt;#475&lt;/a&gt; — 68 👍, open since 2020)
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;::add-mask::&lt;/code&gt; echoes the secret value before the mask takes effect. Short secrets (1-3 chars) cause entire log lines to become &lt;code&gt;***&lt;/code&gt;. Base64 and URL-encoded versions of secrets may not be masked.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cost/Billing Opacity
&lt;/h3&gt;

&lt;p&gt;No per-workflow, per-job, or per-repository breakdown of Actions minutes. The billing page shows total org-level usage. Use &lt;code&gt;gh api /repos/{owner}/{repo}/actions/runs/{id}&lt;/code&gt; for approximate per-run duration.&lt;/p&gt;




&lt;h2&gt;
  
  
  Essential Tooling
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;actionlint&lt;/code&gt; — The Single Most Impactful Tool
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/rhysd/actionlint" rel="noopener noreferrer"&gt;&lt;code&gt;rhysd/actionlint&lt;/code&gt;&lt;/a&gt; catches the majority of syntax, context, and type errors in this guide before you push:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install&lt;/span&gt;
go &lt;span class="nb"&gt;install &lt;/span&gt;github.com/rhysd/actionlint/cmd/actionlint@latest
&lt;span class="c"&gt;# Or brew install actionlint&lt;/span&gt;

&lt;span class="c"&gt;# Run&lt;/span&gt;
actionlint

&lt;span class="c"&gt;# In CI (a workflow step, not a shell command)&lt;/span&gt;
- uses: raven-actions/actionlint@v2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It validates: YAML syntax, expression types, context availability, matrix configurations, reusable workflow inputs/outputs, shell script syntax, and action version compatibility.&lt;/p&gt;

&lt;h3&gt;
  
  
  Online Playground
&lt;/h3&gt;

&lt;p&gt;Don't want to install anything? Use the &lt;a href="https://rhysd.github.io/actionlint/" rel="noopener noreferrer"&gt;actionlint playground&lt;/a&gt; — paste your workflow YAML and get instant feedback.&lt;/p&gt;

&lt;h3&gt;
  
  
  Debug Logging
&lt;/h3&gt;

&lt;p&gt;Enable debug logging for any workflow run:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Go to the failed run → "Re-run all jobs" → check "Enable debug logging"&lt;/li&gt;
&lt;li&gt;Or set repository variable &lt;code&gt;ACTIONS_STEP_DEBUG&lt;/code&gt; to &lt;code&gt;true&lt;/code&gt; (adds &lt;code&gt;##[debug]&lt;/code&gt; output to all steps)&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;gh&lt;/code&gt; CLI for Debugging
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# List workflow runs&lt;/span&gt;
gh run list &lt;span class="nt"&gt;--workflow&lt;/span&gt; ci.yml

&lt;span class="c"&gt;# View specific run logs&lt;/span&gt;
gh run view &amp;lt;run-id&amp;gt; &lt;span class="nt"&gt;--log&lt;/span&gt;

&lt;span class="c"&gt;# Search a run's logs for errors&lt;/span&gt;
gh run view &amp;lt;run-id&amp;gt; &lt;span class="nt"&gt;--log&lt;/span&gt; | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="s1"&gt;'error'&lt;/span&gt;

&lt;span class="c"&gt;# List and delete caches&lt;/span&gt;
gh cache list
gh cache delete &amp;lt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;

&lt;span class="c"&gt;# List workflows and re-enable a disabled one&lt;/span&gt;
gh workflow list
gh workflow &lt;span class="nb"&gt;enable&lt;/span&gt; &lt;span class="s2"&gt;"Workflow Name"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Cross-Reference: Related Guides
&lt;/h2&gt;

&lt;p&gt;If you're working with GitHub Actions in the context of platform engineering and DevOps automation, these related articles go deeper on specific patterns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://htek.dev/articles/lessons-from-500-github-migrations/" rel="noopener noreferrer"&gt;Lessons from 500 GitHub Migrations&lt;/a&gt; — enterprise-scale GitHub rollouts&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://htek.dev/articles/platform-engineering-with-github/" rel="noopener noreferrer"&gt;Platform Engineering with GitHub&lt;/a&gt; — building internal developer platforms on GitHub&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://htek.dev/articles/gitops-for-everything-beyond-deployments/" rel="noopener noreferrer"&gt;GitOps for Everything: Beyond Deployments&lt;/a&gt; — declarative infrastructure with Actions&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://htek.dev/articles/github-agentic-workflows-hands-on-guide/" rel="noopener noreferrer"&gt;GitHub Agentic Workflows: Hands-On Guide&lt;/a&gt; — automated workflows with GitHub Copilot&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://htek.dev/articles/ci-monitor-extension-agent-ci-feedback-loop/" rel="noopener noreferrer"&gt;CI Monitor Extension: Agent CI Feedback Loop&lt;/a&gt; — automated CI debugging with AI agents&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;p&gt;Every error message, workaround, and fix in this guide is sourced from real GitHub Issues, official documentation, and architecture decision records:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/rhysd/actionlint" rel="noopener noreferrer"&gt;&lt;code&gt;rhysd/actionlint&lt;/code&gt;&lt;/a&gt;&lt;/strong&gt; — Static linter for GitHub Actions workflows (the canonical error message reference)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/actions/runner/issues" rel="noopener noreferrer"&gt;&lt;code&gt;actions/runner&lt;/code&gt; Issues&lt;/a&gt;&lt;/strong&gt; — Official runner bug tracker&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/actions/cache/blob/main/tips-and-workarounds.md" rel="noopener noreferrer"&gt;&lt;code&gt;actions/cache&lt;/code&gt; Tips &amp;amp; Workarounds&lt;/a&gt;&lt;/strong&gt; — Official caching troubleshooting&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/actions/upload-artifact" rel="noopener noreferrer"&gt;&lt;code&gt;actions/upload-artifact&lt;/code&gt; Migration Guide&lt;/a&gt;&lt;/strong&gt; — v3 → v4 breaking changes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://docs.github.com/en/actions/learn-github-actions/contexts#context-availability" rel="noopener noreferrer"&gt;GitHub Actions Context Availability&lt;/a&gt;&lt;/strong&gt; — Which contexts are available where&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://docs.github.com/en/actions/security-for-github-actions" rel="noopener noreferrer"&gt;GitHub Actions Security Guides&lt;/a&gt;&lt;/strong&gt; — &lt;code&gt;GITHUB_TOKEN&lt;/code&gt;, OIDC, fork PR security&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/actions/runner/tree/main/docs/adrs" rel="noopener noreferrer"&gt;&lt;code&gt;actions/runner&lt;/code&gt; ADRs&lt;/a&gt;&lt;/strong&gt; — Architecture decisions explaining why limitations exist&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://githubstatus.com" rel="noopener noreferrer"&gt;GitHub Status&lt;/a&gt;&lt;/strong&gt; — Check for infrastructure incidents before debugging&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This guide covers the scenarios that have cost me and thousands of other developers the most debugging hours. If your specific error isn't here, &lt;a href="https://github.com/htekdev/htek-dev-site/issues" rel="noopener noreferrer"&gt;open an issue&lt;/a&gt; or reach out on &lt;a href="https://linkedin.com/in/htekdev" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; — I'll add it to the next update.&lt;/p&gt;

</description>
      <category>githubactions</category>
      <category>devops</category>
      <category>cicd</category>
      <category>debugging</category>
    </item>
    <item>
      <title>The Functional Options Pattern for AI Agent Composition</title>
      <dc:creator>Hector Flores</dc:creator>
      <pubDate>Mon, 25 May 2026 18:52:25 +0000</pubDate>
      <link>https://dev.to/htekdev/the-functional-options-pattern-for-ai-agent-composition-15of</link>
      <guid>https://dev.to/htekdev/the-functional-options-pattern-for-ai-agent-composition-15of</guid>
      <description>&lt;p&gt;&lt;strong&gt;Most AI agent APIs are turning into constructor soup.&lt;/strong&gt; Add tools, then memory, then hooks, then approvals, then retries, then handoffs, then model settings, then telemetry, and suddenly your “simple” &lt;code&gt;NewAgent(...)&lt;/code&gt; call looks like an archaeological dig through six months of product decisions.&lt;/p&gt;

&lt;p&gt;Go solved this problem years ago. The &lt;strong&gt;functional options pattern&lt;/strong&gt; is still one of the cleanest ways to build APIs that start simple, grow safely, and stay readable. After building &lt;a href="https://github.com/htekdev/ai-harness" rel="noopener noreferrer"&gt;AI Harness&lt;/a&gt; as a reference implementation for Harness as Code and writing about &lt;a href="https://htek.dev/articles/what-is-harness-as-code/" rel="noopener noreferrer"&gt;Harness as Code&lt;/a&gt;, I’m convinced the same pattern maps incredibly well to &lt;strong&gt;AI agent composition&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Not because agents are written in Go.&lt;/p&gt;

&lt;p&gt;Because agents have the exact same shape of problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Go Reached for Functional Options in the First Place
&lt;/h2&gt;

&lt;p&gt;Dave Cheney’s classic post on &lt;a href="https://dave.cheney.net/2014/10/17/functional-options-for-friendly-apis" rel="noopener noreferrer"&gt;functional options for friendly APIs&lt;/a&gt; is still the best starting point. His argument was simple: constructor signatures get brittle fast when you keep adding optional behavior. Teams usually bounce through the same bad progression:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;start with a nice small constructor&lt;/li&gt;
&lt;li&gt;add more positional arguments&lt;/li&gt;
&lt;li&gt;give up and introduce a config struct&lt;/li&gt;
&lt;li&gt;end up passing zero values or &lt;code&gt;nil&lt;/code&gt; just to say “use the default”&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That works until it doesn’t.&lt;/p&gt;

&lt;p&gt;Cheney called out the exact problems: poor discoverability, awkward defaults, &lt;code&gt;nil&lt;/code&gt; or empty config values that exist only to satisfy the compiler, and APIs that become harder to evolve over time. His alternative was elegant: keep the default path tiny, and expose behavior through &lt;code&gt;With*&lt;/code&gt; functions that mutate configuration in a controlled way.&lt;/p&gt;

&lt;p&gt;That pattern didn’t stay theoretical. You can see the same shape across mature Go libraries:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;gRPC exposes a whole surface of &lt;a href="https://github.com/grpc/grpc-go/blob/master/dialoptions.go" rel="noopener noreferrer"&gt;&lt;code&gt;DialOption&lt;/code&gt;&lt;/a&gt; values such as &lt;code&gt;WithSharedWriteBuffer&lt;/code&gt;, &lt;code&gt;WithAuthority&lt;/code&gt;, and interceptor-related options.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;go-containerregistry&lt;/code&gt; exposes &lt;a href="https://github.com/google/go-containerregistry/blob/main/pkg/v1/remote/options.go" rel="noopener noreferrer"&gt;&lt;code&gt;Option&lt;/code&gt;&lt;/a&gt; functions like &lt;code&gt;WithContext&lt;/code&gt;, &lt;code&gt;WithPlatform&lt;/code&gt;, &lt;code&gt;WithJobs&lt;/code&gt;, and &lt;code&gt;WithRetryBackoff&lt;/code&gt;, including validation when an option is invalid.&lt;/li&gt;
&lt;li&gt;Cheney’s own follow-up on &lt;a href="https://dave.cheney.net/2014/10/22/simple-profiling-package-moved-updated" rel="noopener noreferrer"&gt;refactoring his profiling package&lt;/a&gt; shows why the pattern is powerful: defaults got simpler, invalid combinations got easier to reason about, and the public API stopped growing every time a new capability appeared.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the real win.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Functional options are not a cute Go idiom. They’re a growth strategy for APIs.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  AI Agents Have the Same API Growth Problem
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmves048ue20pqvyc1mjh.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmves048ue20pqvyc1mjh.webp" alt="The inevitable API growth progression from a clean constructor to constructor hell, with functional options as the exit ramp" width="800" height="800"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;The inevitable progression every agent SDK follows — from clean constructor to chaos. Functional options provide the exit ramp.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Look at what modern agent systems need to package.&lt;/p&gt;

&lt;p&gt;According to OpenAI’s &lt;a href="https://developers.openai.com/api/docs/guides/agents/define-agents" rel="noopener noreferrer"&gt;agent definitions guide&lt;/a&gt;, an agent can include a model, instructions, tools, handoffs, guardrails, approvals, structured output, and MCP-backed capabilities. OpenAI’s docs on &lt;a href="https://developers.openai.com/api/docs/guides/agents/orchestration" rel="noopener noreferrer"&gt;orchestration and handoffs&lt;/a&gt; and &lt;a href="https://developers.openai.com/api/docs/guides/agents/guardrails-approvals" rel="noopener noreferrer"&gt;guardrails and human review&lt;/a&gt; make the point even more clearly: the surface area of a real agent grows fast.&lt;/p&gt;

&lt;p&gt;Anthropic has been saying something similar from the runtime side. In &lt;a href="https://www.anthropic.com/engineering/building-effective-agents" rel="noopener noreferrer"&gt;Building Effective Agents&lt;/a&gt;, their team argues that the most successful systems use &lt;strong&gt;simple, composable patterns&lt;/strong&gt; instead of unnecessary framework complexity. In &lt;a href="https://www.anthropic.com/engineering/effective-harnesses-for-long-running-agents?s=09" rel="noopener noreferrer"&gt;Effective harnesses for long-running agents&lt;/a&gt;, they describe the harness as the layer that helps agents keep making progress across many context windows.&lt;/p&gt;

&lt;p&gt;That is exactly where functional options shine.&lt;/p&gt;

&lt;p&gt;If your agent constructor looks like this, you already lost:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;NewAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;hooks&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;approvals&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;maxTurns&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;retryPolicy&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;telemetry&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;handoffs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Nobody remembers parameter seven. Nobody knows which ones are truly optional. And the next feature request guarantees the signature gets worse.&lt;/p&gt;

&lt;p&gt;The functional-options version is much closer to how agent systems actually evolve:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;AgentOption&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;WithTool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="n"&gt;Tool&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;AgentOption&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;WithHook&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;h&lt;/span&gt; &lt;span class="n"&gt;Hook&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;AgentOption&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;hooks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;hooks&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;h&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;WithMemoryStore&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="n"&gt;MemoryStore&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;AgentOption&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;memory&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;WithMaxTurns&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;AgentOption&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;maxTurns&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;WithApprovalPolicy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="n"&gt;ApprovalPolicy&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;AgentOption&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;approvals&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;NewAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="n"&gt;Model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;instructions&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;opts&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="n"&gt;AgentOption&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;Agent&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;maxTurns&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;     &lt;span class="m"&gt;12&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;opt&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt; &lt;span class="n"&gt;opts&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;opt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now the default path is obvious, and the advanced path reads like a sentence:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;NewAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;claudeSonnet&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;researcherPrompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;WithTool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;webSearch&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;WithTool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;readFile&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;WithHook&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;preToolBudgetGuard&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;WithMemoryStore&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sqliteMemory&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;WithApprovalPolicy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;humanReviewOnShell&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;WithMaxTurns&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;20&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is better API design, but more importantly, it is better &lt;strong&gt;architecture communication&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Best Use of Options in Agents: Composition, Not Just Configuration
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqj5t5b55hcqwsd4wfugc.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqj5t5b55hcqwsd4wfugc.webp" alt="Hub-spoke diagram showing an Agent Core surrounded by orthogonal composable concerns: tools, safety, hooks, memory, routing, and observability" width="800" height="800"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Each option represents a small architectural move — orthogonal concerns composing independently around a stable core.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This is the part I think most teams miss.&lt;/p&gt;

&lt;p&gt;Functional options are often explained as a cleaner way to set fields. That’s true, but it undersells the pattern. For agent systems, the bigger payoff is that options become a &lt;strong&gt;composition language&lt;/strong&gt; for behavior.&lt;/p&gt;

&lt;p&gt;Each option can add or change a capability surface:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;tools&lt;/li&gt;
&lt;li&gt;middleware&lt;/li&gt;
&lt;li&gt;pre/post tool hooks&lt;/li&gt;
&lt;li&gt;approval gates&lt;/li&gt;
&lt;li&gt;memory backends&lt;/li&gt;
&lt;li&gt;model routing rules&lt;/li&gt;
&lt;li&gt;telemetry sinks&lt;/li&gt;
&lt;li&gt;retry policies&lt;/li&gt;
&lt;li&gt;handoffs to specialist agents&lt;/li&gt;
&lt;li&gt;context filters&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In other words, options stop being “parameters” and become &lt;strong&gt;small architectural moves&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That maps cleanly to how I think about &lt;a href="https://htek.dev/articles/context-engineering-key-to-ai-development/" rel="noopener noreferrer"&gt;context engineering&lt;/a&gt; and harness design. You do not want one god constructor that knows every future behavior up front. You want a tiny core plus explicit composition points.&lt;/p&gt;
&lt;h2&gt;
  
  
  Why This Pattern Fits AI Harness Especially Well
&lt;/h2&gt;

&lt;p&gt;AI Harness already leans into this direction in a very literal way.&lt;/p&gt;

&lt;p&gt;Its artifact composer exposes a &lt;code&gt;ComposeWith&lt;/code&gt; API backed by functional options in &lt;a href="https://github.com/htekdev/ai-harness/blob/main/artifact/options.go" rel="noopener noreferrer"&gt;&lt;code&gt;artifact/options.go&lt;/code&gt;&lt;/a&gt;. The options are not random toggles. They shape how composition behaves:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// Only active artifacts (default)&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;composer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ComposeWith&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c"&gt;// Include inactive artifacts (debugging/observability)&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;composer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ComposeWith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;artifact&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WithIncludeInactive&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

&lt;span class="c"&gt;// Filter by type&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;composer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ComposeWith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;artifact&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WithTypeFilter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;artifact&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TypePlugin&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="c"&gt;// Filter by tag&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;composer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ComposeWith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;artifact&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WithTagFilter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"governance"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="c"&gt;// Dynamic evaluation&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;composer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ComposeWith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;artifact&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WithEvalFn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;myEvalFn&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is not just a nice API. It reflects the product thesis from the repo itself: keep the core small and make composition explicit.&lt;/p&gt;

&lt;p&gt;The important part is what those options &lt;em&gt;mean&lt;/em&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;WithTypeFilter(...)&lt;/code&gt; says which artifact classes should participate&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;WithTagFilter(...)&lt;/code&gt; says which concerns matter for this composition pass&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;WithEvalFn(...)&lt;/code&gt; says composition is dynamic and state-aware, not just startup config&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;WithIncludeInactive()&lt;/code&gt; turns observability into a first-class debugging mode&lt;/li&gt;
&lt;/ul&gt;
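
&lt;p&gt;To make that concrete, here is a minimal sketch of how options like these could be wired up. It is not the AI Harness implementation: the &lt;code&gt;Artifact&lt;/code&gt; struct, the &lt;code&gt;composeConfig&lt;/code&gt; type, the string-typed filters, and the free-standing &lt;code&gt;ComposeWith&lt;/code&gt; function are all assumptions made for illustration.&lt;/p&gt;

```go
package main

import "fmt"

// Artifact is a hypothetical stand-in, not the AI Harness type.
type Artifact struct {
	Name   string
	Type   string
	Tags   []string
	Active bool
}

// composeConfig records the decisions made for one composition pass.
type composeConfig struct {
	includeInactive bool
	filters         []func(Artifact) bool
}

// ComposeOption is one small, named composition decision.
type ComposeOption func(*composeConfig)

// WithIncludeInactive keeps inactive artifacts in the result.
func WithIncludeInactive() ComposeOption {
	return func(c *composeConfig) { c.includeInactive = true }
}

// WithTypeFilter keeps only artifacts of the given type.
func WithTypeFilter(t string) ComposeOption {
	return func(c *composeConfig) {
		c.filters = append(c.filters, func(a Artifact) bool { return a.Type == t })
	}
}

// WithTagFilter keeps only artifacts carrying the given tag.
func WithTagFilter(tag string) ComposeOption {
	return func(c *composeConfig) {
		c.filters = append(c.filters, func(a Artifact) bool {
			for _, t := range a.Tags {
				if t == tag {
					return true
				}
			}
			return false
		})
	}
}

// matchesAll reports whether every filter accepts the artifact.
func matchesAll(a Artifact, filters []func(Artifact) bool) bool {
	for _, f := range filters {
		if !f(a) {
			return false
		}
	}
	return true
}

// ComposeWith returns the names of the artifacts the options select.
func ComposeWith(all []Artifact, opts ...ComposeOption) []string {
	cfg := new(composeConfig)
	for _, opt := range opts {
		opt(cfg)
	}
	var names []string
	for _, a := range all {
		if !(a.Active || cfg.includeInactive) {
			continue // default: only active artifacts participate
		}
		if matchesAll(a, cfg.filters) {
			names = append(names, a.Name)
		}
	}
	return names
}

func main() {
	all := []Artifact{
		{Name: "go-style", Type: "plugin", Tags: []string{"conventions"}, Active: true},
		{Name: "prod-guard", Type: "override", Tags: []string{"governance"}, Active: false},
		{Name: "review", Type: "harness", Tags: []string{"governance"}, Active: true},
	}
	fmt.Println(ComposeWith(all))                                                     // [go-style review]
	fmt.Println(ComposeWith(all, WithTagFilter("governance")))                        // [review]
	fmt.Println(ComposeWith(all, WithTagFilter("governance"), WithIncludeInactive())) // [prod-guard review]
}
```

&lt;p&gt;In this sketch each filter is stored as a predicate rather than a raw field, so adding a new kind of filter later means adding one function, not touching the selection loop.&lt;/p&gt;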

&lt;p&gt;That is exactly how I expect serious agent infrastructure to evolve.&lt;/p&gt;

&lt;p&gt;Not through one bigger &lt;code&gt;Config&lt;/code&gt; blob.&lt;/p&gt;

&lt;p&gt;Through &lt;strong&gt;small, named composition decisions&lt;/strong&gt; that can be combined on demand.&lt;/p&gt;

&lt;p&gt;And once you pair that with the &lt;a href="https://htek.dev/articles/per-turn-evaluation-dynamic-governance-ai-agents/" rel="noopener noreferrer"&gt;per-turn evaluation model&lt;/a&gt;, the pattern gets even more powerful: options can control not just static setup, but how the harness resolves behavior against live session state.&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Looks Like in Production Agent Systems
&lt;/h2&gt;

&lt;p&gt;Here’s the mental model I recommend.&lt;/p&gt;

&lt;p&gt;Use functional options when you need to compose &lt;strong&gt;orthogonal behaviors&lt;/strong&gt; around an agent runtime:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Concern&lt;/th&gt;
&lt;th&gt;Good option shape&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Tool access&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;WithTool(...)&lt;/code&gt;, &lt;code&gt;WithTools(...)&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Safety&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;WithGuardrail(...)&lt;/code&gt;, &lt;code&gt;WithApprovalPolicy(...)&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Runtime hooks&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;WithPreToolHook(...)&lt;/code&gt;, &lt;code&gt;WithPostToolHook(...)&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Model control&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;WithModel(...)&lt;/code&gt;, &lt;code&gt;WithTemperature(...)&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Memory&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;WithMemoryStore(...)&lt;/code&gt;, &lt;code&gt;WithSessionStore(...)&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi-agent routing&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;WithHandoff(...)&lt;/code&gt;, &lt;code&gt;WithDelegate(...)&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Observability&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;WithTraceSink(...)&lt;/code&gt;, &lt;code&gt;WithEventLogger(...)&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Failure handling&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;WithRetryPolicy(...)&lt;/code&gt;, &lt;code&gt;WithTimeout(...)&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That gives you three big advantages.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. The default agent stays readable
&lt;/h3&gt;

&lt;p&gt;This matters more than people admit. If the simple case is ugly, teams build wrappers immediately, and now you have two APIs to maintain.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Advanced behavior stays discoverable
&lt;/h3&gt;

&lt;p&gt;A well-named &lt;code&gt;WithApprovalPolicy(...)&lt;/code&gt; is much easier to understand than “argument #8 is optional unless argument #6 is nil.”&lt;/p&gt;

&lt;h3&gt;
  
  
  3. New capabilities stop breaking existing code
&lt;/h3&gt;

&lt;p&gt;That was the original Go motivation, and it matters even more for agent platforms where the capability surface keeps expanding every quarter.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where Functional Options Can Go Wrong
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvdj15r1c48t762b77pv6.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvdj15r1c48t762b77pv6.webp" alt="Four failure modes of functional options: hidden side effects, order-sensitive behavior, no validation layer, and config blob in disguise" width="800" height="800"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Four anti-patterns to watch for — even clean APIs can hide complexity if options aren't designed carefully.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I like this pattern a lot, but it is not magic.&lt;/p&gt;

&lt;p&gt;There are a few failure modes worth calling out.&lt;/p&gt;

&lt;h3&gt;
  
  
  Hidden side effects
&lt;/h3&gt;

&lt;p&gt;If &lt;code&gt;WithMemoryStore(...)&lt;/code&gt; quietly enables background persistence, telemetry, and retries, the API stops being honest. Options should be compositional, not surprising.&lt;/p&gt;

&lt;h3&gt;
  
  
  Order-sensitive behavior
&lt;/h3&gt;

&lt;p&gt;If &lt;code&gt;WithTool(A)&lt;/code&gt; followed by &lt;code&gt;WithTool(B)&lt;/code&gt; means something different from the reverse order, document it aggressively. Better yet, design around deterministic merge rules, so the same set of options produces the same agent no matter how they are listed.&lt;/p&gt;
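
&lt;p&gt;One way to get there, sketched under my own assumptions: key tools by name and expose them in sorted order. The &lt;code&gt;Tool&lt;/code&gt; and &lt;code&gt;Agent&lt;/code&gt; types and the &lt;code&gt;ToolNames&lt;/code&gt; helper below are hypothetical, not taken from any particular framework.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"sort"
)

// Tool is a hypothetical tool descriptor identified by its name.
type Tool struct {
	Name string
}

// Agent stores tools in a map so registration order is not part of its state.
type Agent struct {
	tools map[string]Tool
}

// AgentOption mutates an Agent during construction.
type AgentOption func(*Agent)

// WithTool registers a tool under its name.
func WithTool(t Tool) AgentOption {
	return func(a *Agent) { a.tools[t.Name] = t }
}

// NewAgent applies the options to an empty agent.
func NewAgent(opts ...AgentOption) *Agent {
	a := new(Agent)
	a.tools = make(map[string]Tool)
	for _, opt := range opts {
		opt(a)
	}
	return a
}

// ToolNames returns tool names sorted alphabetically, independent of option order.
func (a *Agent) ToolNames() []string {
	names := make([]string, 0, len(a.tools))
	for name := range a.tools {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

func main() {
	ab := NewAgent(WithTool(Tool{Name: "web_search"}), WithTool(Tool{Name: "read_file"}))
	ba := NewAgent(WithTool(Tool{Name: "read_file"}), WithTool(Tool{Name: "web_search"}))
	fmt.Println(ab.ToolNames()) // [read_file web_search]
	fmt.Println(ba.ToolNames()) // [read_file web_search]
}
```

&lt;p&gt;Two tools registered under the same name are still last-write-wins in this sketch, which is order-sensitive. That case is better rejected outright than resolved silently.&lt;/p&gt;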

&lt;h3&gt;
  
  
  No validation layer
&lt;/h3&gt;

&lt;p&gt;One reason I like the &lt;code&gt;go-containerregistry&lt;/code&gt; version is that its &lt;code&gt;Option&lt;/code&gt; type returns an error. That gives the library a clean way to reject invalid combinations such as contradictory auth configuration. Agent systems need the same discipline for incompatible memory backends, mutually exclusive approval modes, or impossible retry settings.&lt;/p&gt;

&lt;h3&gt;
  
  
  Config blob in disguise
&lt;/h3&gt;

&lt;p&gt;If every option just writes into one giant unstructured struct, you may have improved readability without improving architecture. The best options expose meaningful seams in the system.&lt;/p&gt;

&lt;p&gt;That is why I prefer options that represent real agent concerns instead of raw field mutation.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Bigger Point: This Is a Harness Pattern
&lt;/h2&gt;

&lt;p&gt;I don’t think the functional options pattern is just a nicer constructor trick for AI.&lt;/p&gt;

&lt;p&gt;I think it is one of the cleanest ways to express a deeper idea: &lt;strong&gt;agent behavior should be composed at the harness layer, not buried in application glue or bloated prompts.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That lines up with everything I’ve been arguing about harness engineering:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;keep the core small&lt;/li&gt;
&lt;li&gt;make behavior explicit&lt;/li&gt;
&lt;li&gt;let governance compose cleanly&lt;/li&gt;
&lt;li&gt;make runtime decisions inspectable&lt;/li&gt;
&lt;li&gt;avoid monolithic prompt/config blobs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Go developers learned this lesson because APIs kept growing. Agent builders are about to learn the same lesson because runtimes keep growing.&lt;/p&gt;

&lt;p&gt;The teams that win here won’t be the ones with the fanciest prompt. They’ll be the ones with the cleanest composition model.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;The functional options pattern gives you a cleaner way to build agent APIs, but that undersells it.&lt;/p&gt;

&lt;p&gt;What it really gives you is a &lt;strong&gt;discipline for composing agent behavior&lt;/strong&gt;: tools, hooks, memory, guardrails, routing, and observability as named, reusable moves instead of constructor chaos.&lt;/p&gt;

&lt;p&gt;That is why I think this pattern belongs in every serious conversation about agent architecture.&lt;/p&gt;

&lt;p&gt;Go figured out how to keep fast-moving APIs friendly. AI agent platforms should steal that idea immediately.&lt;/p&gt;

</description>
      <category>aiagents</category>
      <category>agenticdevelopment</category>
      <category>platformengineering</category>
      <category>deepdive</category>
    </item>
    <item>
      <title>Per-Turn Evaluation: Dynamic Governance for AI Agents</title>
      <dc:creator>Hector Flores</dc:creator>
      <pubDate>Mon, 25 May 2026 11:18:54 +0000</pubDate>
      <link>https://dev.to/htekdev/per-turn-evaluation-dynamic-governance-for-ai-agents-5653</link>
      <guid>https://dev.to/htekdev/per-turn-evaluation-dynamic-governance-for-ai-agents-5653</guid>
      <description>&lt;p&gt;&lt;strong&gt;Static governance is fine right up until your agent changes modes mid-session.&lt;/strong&gt; The same agent can spend turn 1 researching docs, turn 8 editing code, turn 14 fixing failed tests, and turn 20 preparing a production deploy. Pretending one startup-time config should govern all of that is the harness equivalent of hardcoding production policy into a shell alias.&lt;/p&gt;

&lt;p&gt;That is why I'm increasingly convinced that &lt;strong&gt;per-turn evaluation&lt;/strong&gt; needs to be a first-class primitive in agentic systems. If you're serious about governed autonomy, you need the harness to ask a fresh question at the start of every turn: &lt;em&gt;given the current state, which rules should be active right now?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This is a core idea behind what I call &lt;a href="https://htek.dev/articles/what-is-harness-as-code/" rel="noopener noreferrer"&gt;Harness as Code&lt;/a&gt;. In &lt;a href="https://github.com/htekdev/ai-harness" rel="noopener noreferrer"&gt;AI Harness&lt;/a&gt;, per-turn artifact evaluation ships as a runtime feature in &lt;a href="https://github.com/htekdev/ai-harness/releases/tag/v0.4.0" rel="noopener noreferrer"&gt;v0.4.0&lt;/a&gt;, following the design described in &lt;a href="https://github.com/htekdev/ai-harness/issues/7" rel="noopener noreferrer"&gt;issue #7&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Static Rules Aren't Enough
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fhtek.dev%2Fimages%2Farticles%2Fper-turn-evaluation-dynamic-governance-ai-agents%2Fstatic-vs-dynamic.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fhtek.dev%2Fimages%2Farticles%2Fper-turn-evaluation-dynamic-governance-ai-agents%2Fstatic-vs-dynamic.webp" alt="Static governance loads all rules at startup regardless of context, while dynamic per-turn evaluation selects only relevant rules based on live state each turn" width="800" height="400"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Static governance overloads context and under-governs risk. Per-turn evaluation adapts rules to what the agent is actually doing right now.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;A lot of agent stacks still treat governance as startup configuration: load the prompt, register the tools, inject the rules, and go. That works for short-lived demos. It gets shaky fast in long-running or multi-phase sessions.&lt;/p&gt;

&lt;p&gt;There are three failure modes I keep seeing:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;You overload the context window.&lt;/strong&gt; Every possible rule ships on every turn, even when most of them are irrelevant.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You under-govern risky phases.&lt;/strong&gt; The same loose rules that were fine during research stay active during writes, approvals, or deployment.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You mix policy with prompt hacks.&lt;/strong&gt; Instead of the harness making deterministic decisions, the model gets a giant wall of “if you're doing X, remember Y.”&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That last part is the killer. Once governance lives primarily inside prompts, you lose clean separation between policy and reasoning. You also lose the discipline that policy systems like &lt;a href="https://openpolicyagent.org/docs" rel="noopener noreferrer"&gt;Open Policy Agent&lt;/a&gt; were built around: keep decisions declarative, versioned, and evaluated against current input.&lt;/p&gt;

&lt;p&gt;The better analogy is &lt;a href="https://martinfowler.com/articles/feature-toggles.html" rel="noopener noreferrer"&gt;feature toggles&lt;/a&gt;. Martin Fowler's framing still holds: you define behavior once, then let runtime context determine which path is active. AI agents need the same pattern for governance.&lt;/p&gt;
&lt;h2&gt;
  
  
  What Per-Turn Evaluation Actually Means
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fhtek.dev%2Fimages%2Farticles%2Fper-turn-evaluation-dynamic-governance-ai-agents%2Fevaluation-cycle.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fhtek.dev%2Fimages%2Farticles%2Fper-turn-evaluation-dynamic-governance-ai-agents%2Fevaluation-cycle.webp" alt="The per-turn evaluation loop: turn starts, gather live state, evaluate Starlark conditions per artifact, compose only active artifacts into context" width="800" height="400"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;The evaluation loop runs at every turn boundary — gathering state, evaluating conditions, and composing only active governance artifacts.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Per-turn evaluation is simple in concept:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The agent loop starts a new turn.&lt;/li&gt;
&lt;li&gt;The harness gathers live state for that turn.&lt;/li&gt;
&lt;li&gt;Every conditional artifact is re-evaluated against that state.&lt;/li&gt;
&lt;li&gt;Only the active artifacts participate in context composition.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The crucial shift is this: &lt;strong&gt;governance becomes a function of state, not a snapshot captured at startup.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That state can include things like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;turn&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;14&lt;/span&gt;
&lt;span class="na"&gt;mode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;implementation"&lt;/span&gt;
&lt;span class="na"&gt;active_files&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;artifact/composer.go"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;span class="na"&gt;error_count&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
&lt;span class="na"&gt;tools_called&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;read_file"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;edit_file"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;run_tests"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now your harness can make deterministic decisions such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;enable stricter review guidance after repeated failures&lt;/li&gt;
&lt;li&gt;load Go-specific conventions only when Go files are active&lt;/li&gt;
&lt;li&gt;apply extra deployment guardrails only when the agent enters a production path&lt;/li&gt;
&lt;li&gt;switch to more concise context after a long session to protect the token budget&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is dramatically more precise than one giant prompt trying to anticipate every future branch of execution.&lt;/p&gt;
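
&lt;p&gt;As a rough illustration of "governance as a function of state", the decisions above can be written as pure predicates over a turn snapshot. This sketch is mine, not AI Harness code: the &lt;code&gt;TurnState&lt;/code&gt; struct mirrors the YAML example, while the rule names and thresholds are arbitrary assumptions.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"strings"
)

// TurnState mirrors the YAML example; the field set is illustrative.
type TurnState struct {
	Turn        int
	Mode        string
	ActiveFiles []string
	ErrorCount  int
}

// Rule pairs a governance artifact name with a pure predicate over state.
type Rule struct {
	Name string
	When func(TurnState) bool
}

// hasGoFile reports whether any active file is a Go source file.
func hasGoFile(s TurnState) bool {
	for _, f := range s.ActiveFiles {
		if strings.HasSuffix(f, ".go") {
			return true
		}
	}
	return false
}

// rules is a hypothetical rule set; the thresholds are arbitrary.
var rules = []Rule{
	{Name: "strict-review", When: func(s TurnState) bool { return s.ErrorCount >= 2 }},
	{Name: "go-conventions", When: hasGoFile},
	{Name: "production-guard", When: func(s TurnState) bool { return s.Mode == "production" }},
	{Name: "concise-context", When: func(s TurnState) bool { return s.Turn > 30 }},
}

// ActiveRules evaluates every rule against the current state, in a fixed order.
func ActiveRules(s TurnState) []string {
	var active []string
	for _, r := range rules {
		if r.When(s) {
			active = append(active, r.Name)
		}
	}
	return active
}

func main() {
	s := TurnState{Turn: 14, Mode: "implementation", ActiveFiles: []string{"artifact/composer.go"}, ErrorCount: 1}
	fmt.Println(ActiveRules(s)) // [go-conventions]

	s.ErrorCount = 2            // a second failure turns strict review on
	fmt.Println(ActiveRules(s)) // [strict-review go-conventions]
}
```

&lt;p&gt;Because each predicate reads only the snapshot, the same state always yields the same active set, which is what makes the decision inspectable after the fact.&lt;/p&gt;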

&lt;h2&gt;
  
  
  Why Starlark Fits This Problem
&lt;/h2&gt;

&lt;p&gt;For conditional governance, I want something declarative and constrained. As &lt;a href="https://starlark-lang.org/spec.html" rel="noopener noreferrer"&gt;Starlark's specification&lt;/a&gt; and &lt;a href="https://bazel.build/rules/language" rel="noopener noreferrer"&gt;Bazel's language overview&lt;/a&gt; describe it, the language is a strong fit for this kind of work: Python-like enough to read quickly, intentionally restricted, deterministic, and designed around predictable evaluation with a strong bias toward immutability.&lt;/p&gt;

&lt;p&gt;That matters. A governance condition language should not be a hidden side-effect engine. It should evaluate expressions against input and return a decision.&lt;/p&gt;

&lt;p&gt;Here's the kind of artifact-level condition AI Harness supports:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;production-guard&lt;/span&gt;
&lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;override&lt;/span&gt;
&lt;span class="na"&gt;priority&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;100&lt;/span&gt;
&lt;span class="na"&gt;condition&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ctx.get("mode")&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;==&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;"production"&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;and&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;ctx.get("turn",&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;0)&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;gt;&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;5'&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="c1"&gt;# Production Guard&lt;/span&gt;

&lt;span class="s"&gt;Require post-write verification.&lt;/span&gt;
&lt;span class="s"&gt;Block destructive shortcuts.&lt;/span&gt;
&lt;span class="s"&gt;Confirm the target before deployment actions.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I like this model because the rule is local to the artifact. You don't have to open the runtime and add another hardcoded &lt;code&gt;if&lt;/code&gt; statement. You define the condition where the behavior lives.&lt;/p&gt;

&lt;h2&gt;
  
  
  How AI Harness Implements Dynamic Governance
&lt;/h2&gt;

&lt;p&gt;In AI Harness's current &lt;code&gt;v0.4.0&lt;/code&gt; implementation, per-turn evaluation is not a blog concept. It's wired into the runtime.&lt;/p&gt;

&lt;p&gt;At the start of &lt;a href="https://github.com/htekdev/ai-harness/blob/main/agent/agent.go" rel="noopener noreferrer"&gt;&lt;code&gt;Agent.Run&lt;/code&gt;&lt;/a&gt;, the loop creates a turn-scoped scratchpad, increments the turn counter, and records the current turn in that scratchpad before the model does any work.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;turnNumber&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;
&lt;span class="n"&gt;scripting&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;SetTurnState&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;turnCtx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"turn"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;turnNumber&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;composer&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;composer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;EvaluateConditions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;turnCtx&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"WARN condition re-evaluation failed: %v"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That call flows into &lt;a href="https://github.com/htekdev/ai-harness/blob/main/artifact/composer.go" rel="noopener noreferrer"&gt;&lt;code&gt;Composer.EvaluateConditions&lt;/code&gt;&lt;/a&gt;, which reads the live values from the per-turn scratchpad via &lt;a href="https://github.com/htekdev/ai-harness/blob/main/scripting/turn_state.go" rel="noopener noreferrer"&gt;&lt;code&gt;TurnStateValues&lt;/code&gt;&lt;/a&gt; and evaluates each artifact condition against the current turn context.&lt;/p&gt;

&lt;p&gt;The registry then updates each artifact's &lt;code&gt;Active&lt;/code&gt; field through &lt;a href="https://github.com/htekdev/ai-harness/blob/main/artifact/registry.go" rel="noopener noreferrer"&gt;&lt;code&gt;Registry.UpdateConditions&lt;/code&gt;&lt;/a&gt;. That's an important implementation detail, because it makes activation status part of the artifact model itself rather than an external side table.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;Registry&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;UpdateConditions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;evalFn&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;condition&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c"&gt;// re-evaluates every artifact and updates Active in place&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Even better, the failure mode is sane: per-artifact condition errors are &lt;strong&gt;non-fatal&lt;/strong&gt;. If one expression is malformed, the whole session does not implode. The registry keeps evaluating the rest and preserves the prior active state for the broken artifact. That's the kind of degradation behavior you want in a production harness.&lt;/p&gt;
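
&lt;p&gt;As a rough illustration of that behavior, here is a minimal sketch of an update loop that treats condition errors as non-fatal. The &lt;code&gt;Artifact&lt;/code&gt; and &lt;code&gt;Registry&lt;/code&gt; fields, the toy evaluator, and the use of &lt;code&gt;errors.Join&lt;/code&gt; to aggregate failures are assumptions made for the example, not the actual code in &lt;code&gt;artifact/registry.go&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;package main

import (
    "errors"
    "fmt"
)

// Artifact is a hypothetical, simplified stand-in for the real artifact type.
type Artifact struct {
    Name      string
    Condition string
    Active    bool
}

// Registry is a hypothetical, simplified registry used only for this sketch.
type Registry struct {
    artifacts []*Artifact
}

// UpdateConditions re-evaluates every artifact. When a condition fails to
// evaluate, the artifact keeps its previous Active value and the loop moves on.
// All failures are collected and returned together (nil if there were none).
func (r *Registry) UpdateConditions(evalFn func(condition string) (bool, error)) error {
    var errs []error
    for _, a := range r.artifacts {
        active, err := evalFn(a.Condition)
        if err != nil {
            // Non-fatal: preserve the prior state for this artifact only.
            errs = append(errs, fmt.Errorf("artifact %q: %w", a.Name, err))
            continue
        }
        a.Active = active
    }
    return errors.Join(errs...)
}

func main() {
    r := Registry{artifacts: []*Artifact{
        {Name: "recovery", Condition: "errors &amp;gt;= 3", Active: false},
        {Name: "broken", Condition: "errors &amp;gt;=", Active: true},
    }}
    // Toy evaluator: accepts one known expression and rejects everything else.
    eval := func(c string) (bool, error) {
        if c == "errors &amp;gt;= 3" {
            return true, nil
        }
        return false, errors.New("malformed condition")
    }
    err := r.UpdateConditions(eval)
    // prints: true true true
    fmt.Println(r.artifacts[0].Active, r.artifacts[1].Active, err != nil)
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this sketch the caller still receives an error, so it can log a warning, but a malformed expression on one artifact never blocks the updates to the others.&lt;/p&gt;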

&lt;p&gt;This design also plays nicely with AI Harness's typed artifact model and composition order:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;override (100) &amp;gt; harness (80) &amp;gt; builtin (60) &amp;gt; plugin (40) &amp;gt; model (20)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So the runtime isn't just deciding &lt;em&gt;what is active&lt;/em&gt; each turn. It's also deciding &lt;em&gt;how active artifacts compose&lt;/em&gt; when they conflict.&lt;/p&gt;

&lt;h2&gt;
  
  
  Patterns This Unlocks
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fhtek.dev%2Fimages%2Farticles%2Fper-turn-evaluation-dynamic-governance-ai-agents%2Fpatterns-unlocked.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fhtek.dev%2Fimages%2Farticles%2Fper-turn-evaluation-dynamic-governance-ai-agents%2Fpatterns-unlocked.webp" alt="Four governance patterns unlocked by per-turn evaluation: progressive escalation, phase-aware context, risk-proportional controls, and token-aware governance" width="800" height="400"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Per-turn evaluation makes four powerful governance patterns practical: escalation after failures, lazy-loading conventions, risk-proportional controls, and automatic concise mode.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Once governance is evaluated per-turn, a bunch of useful patterns stop being awkward.&lt;/p&gt;

&lt;h3&gt;
  
  
  Progressive escalation
&lt;/h3&gt;

&lt;p&gt;After repeated failures, activate a recovery artifact that tells the agent to stop retrying blindly and explain what changed.&lt;/p&gt;

&lt;h3&gt;
  
  
  Phase-aware context
&lt;/h3&gt;

&lt;p&gt;Load language or workflow conventions only when the agent is actually operating in that phase. That is the governance equivalent of lazy loading.&lt;/p&gt;

&lt;h3&gt;
  
  
  Risk-proportional controls
&lt;/h3&gt;

&lt;p&gt;Keep early research turns lightweight, then tighten verification and approval rules when the session crosses into write-heavy or deployment-heavy work.&lt;/p&gt;

&lt;h3&gt;
  
  
  Token-aware governance
&lt;/h3&gt;

&lt;p&gt;Long sessions can activate concise-mode artifacts that reduce exploration and prioritize completion before the context window fills up.&lt;/p&gt;

&lt;p&gt;This is also why per-turn evaluation pairs so well with &lt;a href="https://htek.dev/articles/context-engineering-key-to-ai-development/" rel="noopener noreferrer"&gt;context observability&lt;/a&gt;. If you are going to make governance dynamic, you need to be able to inspect which artifacts were active, which were inactive, and why.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Is Better Than Prompt Conditionals
&lt;/h2&gt;

&lt;p&gt;Could you write a giant system prompt that says, “if you are deploying, be more careful”? Sure.&lt;/p&gt;

&lt;p&gt;I don't think that's governance.&lt;/p&gt;

&lt;p&gt;That's advice.&lt;/p&gt;

&lt;p&gt;In my view, real governance means the harness decides what the model is allowed to see and what policy surfaces are active before the next reasoning step begins. The model should not be responsible for remembering which governance branch applies. The harness should.&lt;/p&gt;

&lt;p&gt;That's the same reason I'm more interested in governance architectures than in ever-bigger prompts. Prompts are necessary. But when they become your only control plane, you're still building too much of the system on vibes.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Bigger Shift
&lt;/h2&gt;

&lt;p&gt;Per-turn evaluation is one of those ideas that sounds small until you realize it changes the entire posture of the system.&lt;/p&gt;

&lt;p&gt;Instead of asking, “What rules should this agent always have?” you start asking, “What rules should be true &lt;em&gt;now&lt;/em&gt;?”&lt;/p&gt;

&lt;p&gt;That is a much better question for long-running, stateful, tool-using agents.&lt;/p&gt;

&lt;p&gt;It's also a cleaner path toward the broader discipline I care about: harness engineering. The same way DevOps normalized pipelines, policy, observability, and infrastructure definitions as real engineering surfaces, agent systems need their own equivalent control plane. Dynamic governance is part of that.&lt;/p&gt;

&lt;p&gt;If you're building agents today, my recommendation is straightforward:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;keep the static core small&lt;/li&gt;
&lt;li&gt;move situational rules into conditional artifacts&lt;/li&gt;
&lt;li&gt;re-evaluate those artifacts every turn&lt;/li&gt;
&lt;li&gt;make activation observable&lt;/li&gt;
&lt;li&gt;make failures degrade gracefully rather than crash the session&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That's the practical lesson behind AI Harness so far. And it's why I think per-turn evaluation is going to look obvious in hindsight.&lt;/p&gt;

&lt;p&gt;Not because it's flashy — but because for serious, stateful agent systems, static governance was rarely going to be enough.&lt;/p&gt;

</description>
      <category>aiagents</category>
      <category>agenticdevelopment</category>
      <category>platformengineering</category>
      <category>security</category>
    </item>
    <item>
      <title>What Is Harness as Code? The DevOps of AI Agents</title>
      <dc:creator>Hector Flores</dc:creator>
      <pubDate>Mon, 25 May 2026 00:56:16 +0000</pubDate>
      <link>https://dev.to/htekdev/what-is-harness-as-code-the-devops-of-ai-agents-2eo3</link>
      <guid>https://dev.to/htekdev/what-is-harness-as-code-the-devops-of-ai-agents-2eo3</guid>
      <description>&lt;p&gt;&lt;strong&gt;Most teams are still treating agent behavior like handcrafted prompt art.&lt;/strong&gt; That works right up until the agent gets real tool access, starts touching production systems, or needs to behave consistently across repos, environments, and sessions.&lt;/p&gt;

&lt;p&gt;That's where &lt;strong&gt;Harness as Code&lt;/strong&gt; comes in.&lt;/p&gt;

&lt;p&gt;The short version: Harness as Code applies the same ideas that made &lt;a href="https://developer.hashicorp.com/well-architected-framework/define-and-automate-processes/define/as-code/infrastructure" rel="noopener noreferrer"&gt;Infrastructure as Code practical and scalable&lt;/a&gt; to AI agents. Instead of hiding governance inside application code or hoping a giant system prompt keeps your agent safe, you define the harness itself as version-controlled, reviewable, testable artifacts.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqhukjq6m6iet1i3hq8e2.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqhukjq6m6iet1i3hq8e2.webp" alt="From DevOps to Agent Governance — the same engineering principles that tamed infrastructure now tame AI agents" width="800" height="800"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;From DevOps to Agent Governance: same principles, new domain&lt;/em&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  The Problem Prompt Engineering Can't Solve
&lt;/h2&gt;

&lt;p&gt;Anthropic has been explicit that &lt;strong&gt;harness design&lt;/strong&gt; matters for long-running agents. In its engineering write-up on &lt;a href="https://www.anthropic.com/engineering/effective-harnesses-for-long-running-agents?s=09" rel="noopener noreferrer"&gt;effective harnesses for long-running agents&lt;/a&gt;, the company describes the harness as the layer that helps agents keep making progress across multiple context windows. In its earlier post on &lt;a href="https://www.anthropic.com/engineering/building-effective-agents?%3Fquery=MTA%3F%3Futm%3D" rel="noopener noreferrer"&gt;building effective agents&lt;/a&gt;, Anthropic also argues that simple, composable patterns beat unnecessary framework complexity.&lt;/p&gt;

&lt;p&gt;I think that's the right direction, but the industry still underspecifies one crucial idea: &lt;strong&gt;the harness should be code, not folklore.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Once agents move beyond toy demos, every team hits the same questions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How do I control what tools an agent can call?&lt;/li&gt;
&lt;li&gt;How do I review behavior changes in pull requests?&lt;/li&gt;
&lt;li&gt;How do I reproduce the same governance in another repo or environment?&lt;/li&gt;
&lt;li&gt;How do I know what context the agent actually saw on turn 27?&lt;/li&gt;
&lt;li&gt;How do I test that my guardrails work before trusting the agent with real autonomy?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If the answer is "we have a really good prompt," you don't have governance. You have hope.&lt;/p&gt;
&lt;h2&gt;
  
  
  What Harness as Code Actually Means
&lt;/h2&gt;

&lt;p&gt;HashiCorp defines Infrastructure as Code as a &lt;strong&gt;declarative&lt;/strong&gt;, version-controlled way to define systems you can review, test, and automate. Harness as Code takes that same mental model and applies it to agent runtime behavior.&lt;/p&gt;

&lt;p&gt;For me, a system only qualifies as Harness as Code if it gives you these properties:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Declarative&lt;/strong&gt; — behavior is defined in files, not buried in runtime branches&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Versioned&lt;/strong&gt; — harness changes go through Git like any other engineering change&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reviewable&lt;/strong&gt; — permissions, hooks, and context rules show up in diffs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Composable&lt;/strong&gt; — you can layer capabilities without rewriting the core&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Observable&lt;/strong&gt; — you can inspect what was active and why&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Testable&lt;/strong&gt; — you can validate behavior in CI instead of relying on vibes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Portable&lt;/strong&gt; — the harness survives model churn and vendor churn&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That is the big leap. The prompt stops being the product. The harness becomes the product.&lt;/p&gt;
&lt;h2&gt;
  
  
  Why This Matters
&lt;/h2&gt;

&lt;p&gt;The DevOps parallel is real.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;DevOps gave us&lt;/th&gt;
&lt;th&gt;Harness as Code gives agents&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Infrastructure as Code&lt;/td&gt;
&lt;td&gt;Agent governance as code&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CI/CD gates&lt;/td&gt;
&lt;td&gt;Approval and autonomy gates&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RBAC / least privilege&lt;/td&gt;
&lt;td&gt;Tool access boundaries&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Build pipelines&lt;/td&gt;
&lt;td&gt;Agent loops with retries and termination&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Observability&lt;/td&gt;
&lt;td&gt;Context provenance and event trails&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hooks and policy checks&lt;/td&gt;
&lt;td&gt;Pre-tool and post-tool governance&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This matters because the failure mode for agents is rarely raw model quality. It is almost always &lt;strong&gt;control-plane quality&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;An agent fails because it had the wrong tools, the wrong context, no retry policy, no safety hook, no way to explain its current state, or no clean boundary between static identity and dynamic runtime behavior. That's a harness problem.&lt;/p&gt;

&lt;p&gt;If you care about &lt;a href="https://htek.dev/articles/context-engineering-key-to-ai-development/" rel="noopener noreferrer"&gt;context engineering&lt;/a&gt;, this is the missing operational layer. Context engineering decides what the model should see. Harness as Code decides &lt;strong&gt;how that decision is defined, evaluated, audited, and evolved over time&lt;/strong&gt;.&lt;/p&gt;
&lt;h2&gt;
  
  
  How It's Different From Existing Approaches
&lt;/h2&gt;

&lt;p&gt;A lot of current agent tooling is useful. I use and study these systems constantly. But they're optimized around different centers of gravity.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://docs.github.com/copilot/concepts/agents/coding-agent/about-coding-agent" rel="noopener noreferrer"&gt;GitHub Copilot cloud agent&lt;/a&gt; is optimized around GitHub-native repo work in a GitHub-hosted environment.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://developers.openai.com/api/docs/guides/agents" rel="noopener noreferrer"&gt;OpenAI's Agents SDK&lt;/a&gt; is optimized around code-first orchestration, tools, guardrails, and state inside your application.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://developers.openai.com/api/docs/guides/agents/sandboxes" rel="noopener noreferrer"&gt;OpenAI Sandbox Agents&lt;/a&gt; cleanly split harness and compute, which is an important architectural move.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/earendil-works/pi/blob/main/packages/coding-agent/README.md" rel="noopener noreferrer"&gt;Pi&lt;/a&gt; is one of the strongest examples of a &lt;strong&gt;minimal terminal coding harness&lt;/strong&gt;, extended through TypeScript extensions, skills, prompt templates, themes, packages, and multiple runtime surfaces.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those are real strengths. But Harness as Code has a different bias: &lt;strong&gt;extensibility through portable governance artifacts&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That means the center of the system should stay tiny while the edges get powerful. It also means changing behavior should not require rewriting the runtime every time you want a new rule. Add an artifact. Add a hook. Add a condition. Review the diff. Re-run validation. Ship.&lt;/p&gt;

&lt;p&gt;That's a very different philosophy from both prompt-heavy setups and batteries-included mega-frameworks.&lt;/p&gt;
&lt;h2&gt;
  
  
  How AI Harness Implements Harness as Code
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/htekdev/ai-harness" rel="noopener noreferrer"&gt;AI Harness&lt;/a&gt; is my reference implementation of this idea. The repo tagline says it plainly: &lt;strong&gt;declarative AI agent governance in Go&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Here's how the product makes Harness as Code concrete.&lt;/p&gt;
&lt;h3&gt;
  
  
  1. Markdown-first control plane
&lt;/h3&gt;

&lt;p&gt;The harness starts with &lt;code&gt;harness.md&lt;/code&gt; and a &lt;code&gt;.harness/&lt;/code&gt; directory tree. Identity, tools, hooks, and sub-agents are defined as files you can diff, review, and move between projects.&lt;/p&gt;

&lt;p&gt;That sounds simple, but it's the whole point. The harness isn't hidden behind a SaaS UI or locked into a provider-specific workflow. It lives in the repo, next to the code it governs.&lt;/p&gt;
&lt;h3&gt;
  
  
  2. Typed artifacts instead of loose files
&lt;/h3&gt;

&lt;p&gt;AI Harness doesn't treat all context as an undifferentiated blob. It introduces a typed artifact model with explicit precedence:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;override (100) &amp;gt; harness (80) &amp;gt; builtin (60) &amp;gt; plugin (40) &amp;gt; model (20)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4oaegztjf0ldl27u10la.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4oaegztjf0ldl27u10la.webp" alt="Typed artifact precedence — each layer has a declared role, priority number, and composition semantics" width="800" height="800"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Typed Artifact Precedence: deterministic composition, not accidental overrides&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;That gives each capability a declared role, a priority, and composition semantics. Instead of asking, "Why did this rule win?" you can answer it deterministically.&lt;/p&gt;

&lt;p&gt;This is one of the key differences between generic file-based customization and real Harness as Code. Composition is not accidental. It's designed.&lt;/p&gt;
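
&lt;p&gt;To make "deterministic" concrete, here is a minimal sketch of precedence-based composition. It assumes that when two artifacts set the same key, the one from the higher-priority type wins. The &lt;code&gt;Rule&lt;/code&gt; type, the &lt;code&gt;Compose&lt;/code&gt; function, and the tie-breaking choice are hypothetical and exist only for this example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;package main

import "fmt"

// priority mirrors the precedence order quoted above.
var priority = map[string]int{
    "override": 100, "harness": 80, "builtin": 60, "plugin": 40, "model": 20,
}

// Rule is a hypothetical contribution from one artifact: a setting name,
// its value, and the artifact type that supplied it.
type Rule struct {
    Key, Value, Source string
}

// Compose keeps, for each key, the rule from the highest-priority source.
// On a tie, the rule that appears first is kept.
func Compose(rules []Rule) map[string]Rule {
    winners := make(map[string]Rule)
    for _, r := range rules {
        cur, seen := winners[r.Key]
        if !seen || priority[r.Source] &amp;gt; priority[cur.Source] {
            winners[r.Key] = r
        }
    }
    return winners
}

func main() {
    out := Compose([]Rule{
        {Key: "approval", Value: "optional", Source: "plugin"},
        {Key: "approval", Value: "required", Source: "override"},
        {Key: "style", Value: "concise", Source: "model"},
    })
    fmt.Println(out["approval"].Value, "from", out["approval"].Source) // required from override
    fmt.Println(out["style"].Value, "from", out["style"].Source)       // concise from model
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Because each winner records its source, the question of why a rule won reduces to comparing two numbers.&lt;/p&gt;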
&lt;h3&gt;
  
  
  3. Per-turn evaluation, not startup-only config
&lt;/h3&gt;

&lt;p&gt;AI Harness evaluates artifact conditions &lt;strong&gt;every turn&lt;/strong&gt;. If an artifact says it should only activate in review mode, after multiple errors, or once the session reaches a certain phase, the runtime re-evaluates that condition on each turn rather than once at startup.&lt;/p&gt;

&lt;p&gt;The implementation uses &lt;a href="https://bazel.build/rules/language" rel="noopener noreferrer"&gt;Starlark&lt;/a&gt; for those conditional expressions. Starlark is a deliberately restricted dialect of Python, which keeps the language familiar and constrained while making the runtime dynamic.&lt;/p&gt;

&lt;p&gt;That means governance can evolve with the session:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;an error-recovery artifact can activate after repeated failures&lt;/li&gt;
&lt;li&gt;a language-specific ruleset can appear only when relevant files are active&lt;/li&gt;
&lt;li&gt;a stricter override can kick in when risk increases&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's the difference between static config and living governance. If you want the deeper implementation details, I wrote a separate breakdown on &lt;a href="https://htek.dev/articles/per-turn-evaluation-dynamic-governance-ai-agents/" rel="noopener noreferrer"&gt;per-turn evaluation and dynamic governance&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkltod8fr1lm1peg4t2zu.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkltod8fr1lm1peg4t2zu.webp" alt="Per-turn evaluation — governance that evolves with the session, activating and deactivating artifacts as conditions change" width="800" height="800"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Per-Turn Evaluation: static config is hope — per-turn evaluation is engineering&lt;/em&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  4. Context observability as a first-class feature
&lt;/h3&gt;

&lt;p&gt;This is the feature I think most harnesses still underinvest in.&lt;/p&gt;

&lt;p&gt;AI Harness ships &lt;code&gt;harness context&lt;/code&gt; so you can inspect what the agent sees, where each section came from, which artifacts are active, which are inactive, and how much of your token budget is already gone.&lt;/p&gt;

&lt;p&gt;That matters because agent behavior is downstream of context state. If you can't inspect context composition, you're debugging a black box.&lt;/p&gt;
&lt;h3&gt;
  
  
  5. A tiny core with powerful edges
&lt;/h3&gt;

&lt;p&gt;The repo's philosophy is simple: &lt;strong&gt;keep the core tiny and make the edges powerful&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;AI Harness already ships commands like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;go &lt;span class="nb"&gt;install &lt;/span&gt;github.com/htekdev/ai-harness/cmd/harness@latest
harness init my-agent
harness validate
harness artifacts &lt;span class="nt"&gt;--verbose&lt;/span&gt;
harness context &lt;span class="nt"&gt;--verbose&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That command set reflects the product thesis. Scaffold fast. Validate fast. Inspect the harness. Inspect the active context. Don't bury governance inside a maze of framework internals.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where I Think This Goes Next
&lt;/h2&gt;

&lt;p&gt;I don't think "harness engineering" is a side topic. I think it becomes its own discipline.&lt;/p&gt;

&lt;p&gt;The same way teams eventually stopped debating whether infrastructure should be hand-managed, teams will stop debating whether agent behavior should live in undocumented prompt glue. They'll expect agent governance to be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;versioned&lt;/li&gt;
&lt;li&gt;inspectable&lt;/li&gt;
&lt;li&gt;testable&lt;/li&gt;
&lt;li&gt;composable&lt;/li&gt;
&lt;li&gt;vendor-portable&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's why I keep framing Harness as Code as &lt;strong&gt;the DevOps of AI agents&lt;/strong&gt;. It's not about replacing good prompts. It's about putting prompts in their proper place: as one input inside a larger, engineered runtime.&lt;/p&gt;

&lt;p&gt;If you want the broader market landscape, read my &lt;a href="https://htek.dev/articles/all-agent-harnesses-live-comparison/" rel="noopener noreferrer"&gt;live comparison of agent harnesses&lt;/a&gt;. If you want the product implementation, start with the &lt;a href="https://github.com/htekdev/ai-harness" rel="noopener noreferrer"&gt;AI Harness repository&lt;/a&gt; and treat it as a working reference, not just a pitch.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;Harness as Code is the shift from &lt;strong&gt;"trust the model"&lt;/strong&gt; to &lt;strong&gt;"trust the system around the model."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That's the move from prompts as governance to architecture as governance. And once agents start doing real work in real environments, I don't think that move is optional.&lt;/p&gt;

&lt;p&gt;Your model is not your control plane. Your harness is.&lt;/p&gt;

</description>
      <category>aiagents</category>
      <category>agenticdevelopment</category>
      <category>devops</category>
      <category>platformengineering</category>
    </item>
    <item>
      <title>Custom Copilot Agents: Building Domain-Expert AI Teammates with Skills, MCP Tools, and Custom Knowledge</title>
      <dc:creator>Hector Flores</dc:creator>
      <pubDate>Fri, 22 May 2026 21:35:21 +0000</pubDate>
      <link>https://dev.to/htekdev/custom-copilot-agents-building-domain-expert-ai-teammates-with-skills-mcp-tools-and-custom-1p46</link>
      <guid>https://dev.to/htekdev/custom-copilot-agents-building-domain-expert-ai-teammates-with-skills-mcp-tools-and-custom-1p46</guid>
      <description>&lt;h2&gt;
  
  
  Most Teams Are Still Using 5% of Copilot
&lt;/h2&gt;

&lt;p&gt;Most developers still treat &lt;a href="https://github.com/features/copilot" rel="noopener noreferrer"&gt;GitHub Copilot&lt;/a&gt; like a very good autocomplete engine. That's useful, but it's not the real unlock.&lt;/p&gt;

&lt;p&gt;The interesting shift happens when Copilot stops acting like a generic assistant and starts acting like a &lt;strong&gt;domain-expert teammate&lt;/strong&gt;. Instead of re-explaining your deployment rules, your content pipeline, or your release checklist every session, you package that expertise once. Then Copilot shows up already knowing the job.&lt;/p&gt;

&lt;p&gt;That's the difference between &lt;strong&gt;using Copilot&lt;/strong&gt; and &lt;strong&gt;building with Copilot&lt;/strong&gt;. One gives you better suggestions. The other gives you reusable specialists that understand your repo, your patterns, and your operating model.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Want the complete agent manifest, &lt;code&gt;copilot-instructions.md&lt;/code&gt;, YAML skill files, and MCP integration code? It's all in Newsletter Issue #11 → &lt;a href="https://htek.dev/newsletter" rel="noopener noreferrer"&gt;subscribe here&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Mean by a Custom Copilot Agent
&lt;/h2&gt;

&lt;p&gt;At a high level, a custom Copilot agent is a &lt;strong&gt;packaged specialization layer&lt;/strong&gt; for GitHub Copilot. It gives Copilot a clear identity, focused instructions, and the right tools for a specific domain.&lt;/p&gt;

&lt;p&gt;GitHub's customization story already points in this direction. You can &lt;a href="https://docs.github.com/en/enterprise-cloud@latest/copilot/how-tos/copilot-on-github/customize-copilot/customize-copilot-overview" rel="noopener noreferrer"&gt;customize Copilot for your project&lt;/a&gt;, add &lt;a href="https://docs.github.com/en/enterprise-cloud@latest/copilot/managing-copilot/managing-github-copilot-in-your-organization/managing-copilot-knowledge-bases" rel="noopener noreferrer"&gt;knowledge bases&lt;/a&gt;, and connect &lt;a href="https://docs.github.com/en/copilot/how-tos/copilot-cli/customize-copilot/add-mcp-servers" rel="noopener noreferrer"&gt;MCP servers to Copilot CLI&lt;/a&gt;. I think of a custom agent as the point where those ideas converge into one opinionated package.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://modelcontextprotocol.io/docs/getting-started/intro" rel="noopener noreferrer"&gt;Model Context Protocol&lt;/a&gt; gives Copilot a standard way to reach external systems and tools. Your agent design decides &lt;strong&gt;which&lt;/strong&gt; tools matter, &lt;strong&gt;which&lt;/strong&gt; context belongs in memory, and &lt;strong&gt;which&lt;/strong&gt; workflows are worth encoding.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 3-Layer Architecture
&lt;/h2&gt;

&lt;p&gt;The easiest way to think about custom agent architecture is as three layers working together:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Agent profile layer&lt;/strong&gt; — the identity declaration.&lt;br&gt;
This is the small config surface that says: this Copilot specialist owns this domain, responds to these triggers, and should load this knowledge. In practice, that's an agent manifest in a &lt;code&gt;.agent.md&lt;/code&gt; file, often paired with &lt;code&gt;copilot-instructions.md&lt;/code&gt; when the runtime needs repository-wide guidance.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Skills layer&lt;/strong&gt; — the structured prompts.&lt;br&gt;
This is where repeatable expertise lives. A skill isn't vague guidance. It's a reusable procedure: what to check, what to avoid, what sequence to follow, and what "done" looks like. I've written before about this in &lt;a href="https://htek.dev/articles/agent-skills-microsoft-just-shipped-what-youve-been-building/" rel="noopener noreferrer"&gt;Agent Skills: Microsoft Just Shipped What You've Been Building&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Tools layer&lt;/strong&gt; — the execution boundary.&lt;br&gt;
This is where the agent gets hands. &lt;a href="https://docs.github.com/en/copilot/how-tos/use-copilot-extensions" rel="noopener noreferrer"&gt;MCP&lt;/a&gt; lets Copilot reach beyond text and interact with real systems. That might mean GitHub workflows, a video pipeline, internal APIs, or a governed task system.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If you've read &lt;a href="https://htek.dev/articles/three-layers-your-ai-agent-is-missing/" rel="noopener noreferrer"&gt;The Three Layers Your AI Agent Is Missing&lt;/a&gt;, this should feel familiar. The point is separation of concerns. The agent profile says &lt;strong&gt;who&lt;/strong&gt; the specialist is. Skills say &lt;strong&gt;how&lt;/strong&gt; it should behave. Tools define &lt;strong&gt;what&lt;/strong&gt; it can actually do.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Newsletter subscribers get the full 3-layer custom agent architecture with real TypeScript, configs, and production patterns.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In Issue #11 I share the actual implementation details: the agent manifest pattern, the &lt;code&gt;copilot-instructions.md&lt;/code&gt; setup, the YAML skill layout, and the MCP integration shape I use to turn Copilot into domain-specific teammates instead of generic chat sessions.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Two Production Examples
&lt;/h2&gt;

&lt;p&gt;Here are two real patterns from production that made this click for me.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. A DevOps Copilot Specialist
&lt;/h3&gt;

&lt;p&gt;One custom agent pattern I keep coming back to is a DevOps-focused Copilot specialist.&lt;/p&gt;

&lt;p&gt;The domain is narrow but deep: release prep, workflow governance, branch rules, dependency checks, and CI visibility. The agent profile establishes the role, the skills encode the repeatable procedures, and the tools expose the right capabilities to inspect workflows and surface the next action.&lt;/p&gt;

&lt;p&gt;That means I don't start every session with "here are our branch rules, here is how we label releases, here is what counts as a blocker." Copilot starts there already.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. A Vidpipe Media Workflow Specialist
&lt;/h3&gt;

&lt;p&gt;The second example is from my media pipeline.&lt;/p&gt;

&lt;p&gt;A video workflow has a lot of hidden knowledge: ingestion steps, transcript expectations, caption rules, retry paths, and publishing handoffs.&lt;/p&gt;

&lt;p&gt;A custom agent turns that operational knowledge into a reusable asset. The skills explain the workflow stages, the tools expose pipeline state, and the agent profile keeps the agent locked into the right role.&lt;/p&gt;

&lt;p&gt;And those two custom agents are only the teaser. The third production pattern in Issue #11 is the meta one: a custom agent that helps me scaffold more custom agents faster.&lt;/p&gt;

&lt;h2&gt;
  
  
  When to Build a Custom Agent vs. Use Skills Directly
&lt;/h2&gt;

&lt;p&gt;Not every recurring workflow needs a full custom agent.&lt;/p&gt;

&lt;p&gt;If the problem is mostly procedural, start with skills. Skills are the cheapest leverage point. They let you capture repeatable know-how without adding a new identity surface or tool boundary.&lt;/p&gt;

&lt;p&gt;Build a full custom agent when all three signals show up:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;you keep repeating the same domain context in session after session&lt;/li&gt;
&lt;li&gt;the domain needs its own toolset, not just better instructions&lt;/li&gt;
&lt;li&gt;the work benefits from a clear specialist identity instead of a generic assistant persona&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;My rule of thumb is simple: if Copilot only needs a better playbook, write a skill. If Copilot needs a &lt;strong&gt;job title, a toolkit, and a memory of how your team works&lt;/strong&gt;, build a custom agent.&lt;/p&gt;

&lt;p&gt;That line matters because agent sprawl is real. A skill library can stay lightweight. A custom agent should earn its existence.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where the Deep Dive Lives
&lt;/h2&gt;

&lt;p&gt;This article is deliberately the overview.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The full agent manifest, &lt;code&gt;copilot-instructions.md&lt;/code&gt;, YAML examples, and TypeScript MCP integration are in Issue #11.&lt;/strong&gt; That's where I walk through the actual layering, show the production examples in more detail, and explain why this architecture compounds once you have more than one specialist running.&lt;/p&gt;

&lt;p&gt;If this topic connects with the rest of your platform work, the next stop after the newsletter is &lt;a href="https://htek.dev/blueprints/the-agentic-development-blueprint" rel="noopener noreferrer"&gt;The Agentic Development Blueprint&lt;/a&gt;. It connects custom agent architecture to the bigger system: context engineering, guardrails, workflows, and governance.&lt;/p&gt;

&lt;p&gt;You should also read the surrounding pieces if you want the bigger picture:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://htek.dev/articles/what-is-context-engineering-practical-guide-50-agents/" rel="noopener noreferrer"&gt;What Is Context Engineering? A Practical Guide from Building 50 Production AI Agents&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://htek.dev/articles/agent-skills-microsoft-just-shipped-what-youve-been-building/" rel="noopener noreferrer"&gt;Agent Skills: Microsoft Just Shipped What You've Been Building&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://htek.dev/articles/github-copilot-cli-extensions-complete-guide/" rel="noopener noreferrer"&gt;GitHub Copilot CLI Extensions: The Complete Guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://htek.dev/articles/three-layers-your-ai-agent-is-missing/" rel="noopener noreferrer"&gt;The Three Layers Your AI Agent Is Missing&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;The teams getting the biggest lift from Copilot are not the ones asking better one-off questions. They're the ones turning Copilot into reusable specialists that understand their actual environment.&lt;/p&gt;

&lt;p&gt;This was the overview. Newsletter Issue #11 has the step-by-step implementation with real files from 3 production custom agents → Subscribe at &lt;a href="https://htek.dev/newsletter" rel="noopener noreferrer"&gt;htek.dev/newsletter&lt;/a&gt;&lt;/p&gt;

</description>
      <category>github</category>
      <category>copilotcli</category>
      <category>aiagents</category>
      <category>agenticdevelopment</category>
    </item>
    <item>
      <title>Copilot Plugins: Building Domain-Expert AI Teammates</title>
      <dc:creator>Hector Flores</dc:creator>
      <pubDate>Fri, 22 May 2026 20:58:13 +0000</pubDate>
      <link>https://dev.to/htekdev/copilot-plugins-building-domain-expert-ai-teammates-287c</link>
      <guid>https://dev.to/htekdev/copilot-plugins-building-domain-expert-ai-teammates-287c</guid>
      <description>&lt;h2&gt;
  
  
  Most Developers Are Still Using Copilot at the Shallow End
&lt;/h2&gt;

&lt;p&gt;Most developers use &lt;a href="https://github.com/features/copilot" rel="noopener noreferrer"&gt;GitHub Copilot&lt;/a&gt; like a faster autocomplete engine. Useful, yes — but still the shallow end of the pool.&lt;/p&gt;

&lt;p&gt;The bigger opportunity is building &lt;strong&gt;domain-expert plugins&lt;/strong&gt;: packages that give Copilot a clear identity, specialized knowledge, and real tools it can use on your behalf. That's when the experience changes from "I ask Copilot for help" to &lt;strong&gt;"I built a Copilot teammate that understands my world."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you think in terms of VS Code extensions or chat participants, the mental model is similar: package context, behavior, and capability into a specialist that shows up with judgment built in.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Want the complete implementation? &lt;a href="https://htek.dev/newsletter/011-copilot-plugins-domain-expert-ai-teammates" rel="noopener noreferrer"&gt;Subscribe to the newsletter →&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  A Copilot Plugin Is Identity + Knowledge + Actions
&lt;/h2&gt;

&lt;p&gt;The pattern is simple once you see it. A strong plugin has three layers:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Identity&lt;/strong&gt; — what the plugin is, when it should activate, and what components it includes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Knowledge&lt;/strong&gt; — reusable domain expertise encoded as skills.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Actions&lt;/strong&gt; — MCP-connected tools that let the plugin do work instead of just talk about it.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That combination is what turns a generic assistant into a specialist.&lt;/p&gt;

&lt;p&gt;I've been using this pattern across production systems because it solves a real problem: you stop re-explaining the same domain context every session. Instead of reminding Copilot about your release process, video pipeline, or internal conventions over and over, you package that knowledge once and let the runtime load it when relevant.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;code&gt;plugin.json&lt;/code&gt; Is the Identity Card
&lt;/h2&gt;

&lt;p&gt;Every plugin starts with &lt;a href="https://docs.github.com/en/copilot/reference/cli-plugin-reference" rel="noopener noreferrer"&gt;&lt;code&gt;plugin.json&lt;/code&gt;&lt;/a&gt;. GitHub's plugin docs are explicit: a Copilot CLI plugin must include a manifest at the root, and that manifest tells the runtime what this package is and where its pieces live.&lt;/p&gt;

&lt;p&gt;That sounds small, but it's the architectural pivot.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;plugin.json&lt;/code&gt; is the plugin's &lt;strong&gt;identity card&lt;/strong&gt;. It declares the name, description, metadata, and the paths to things like skills and MCP configuration. In practice, that means your plugin stops being "some loose prompt files in a repo" and becomes a portable capability you can install, share, and evolve.&lt;/p&gt;

&lt;p&gt;GitHub's own docs on &lt;a href="https://github.com/github/docs/blob/main/content/copilot/how-tos/copilot-cli/customize-copilot/plugins-creating.md" rel="noopener noreferrer"&gt;creating Copilot CLI plugins&lt;/a&gt; show the expected structure clearly: a root manifest, optional &lt;code&gt;agents/&lt;/code&gt;, optional &lt;code&gt;skills/&lt;/code&gt;, optional hooks, and optional &lt;code&gt;.mcp.json&lt;/code&gt;.&lt;/p&gt;
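&lt;p&gt;To make the identity card concrete, here is a minimal sketch that builds a manifest and checks it for the basics. The plugin name and the &lt;code&gt;missing_fields&lt;/code&gt; helper are hypothetical, and the key names (&lt;code&gt;skills&lt;/code&gt;, &lt;code&gt;mcpServers&lt;/code&gt;) are assumptions that should be confirmed against the plugin reference before use.&lt;/p&gt;

```python
import json

# Hypothetical manifest for an illustrative plugin. The key names below
# ("skills", "mcpServers") are assumptions; confirm them in the plugin reference.
manifest = {
    "name": "release-helper",
    "description": "Prepares and checks releases for the example team.",
    "version": "0.1.0",
    "skills": "skills/",        # directory holding SKILL.md folders
    "mcpServers": ".mcp.json",  # MCP server configuration file
}


def missing_fields(data, required=("name", "description")):
    """Return the required keys that are absent or empty in the manifest."""
    return [key for key in required if not data.get(key)]


if __name__ == "__main__":
    problems = missing_fields(manifest)
    if problems:
        raise SystemExit("manifest is missing: " + ", ".join(problems))
    # This JSON is what would be saved as plugin.json at the plugin root.
    print(json.dumps(manifest, indent=2))
```

&lt;p&gt;Treating &lt;code&gt;name&lt;/code&gt; and &lt;code&gt;description&lt;/code&gt; as required is a choice made for this sketch; the real schema decides what is mandatory.&lt;/p&gt;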

&lt;h2&gt;
  
  
  Skills Encode Human Expertise into Reusable Capability
&lt;/h2&gt;

&lt;p&gt;The second layer is where the plugin gets smart.&lt;/p&gt;

&lt;p&gt;A skill is a &lt;code&gt;SKILL.md&lt;/code&gt; file with &lt;a href="https://docs.github.com/en/copilot/how-tos/copilot-on-github/customize-copilot/customize-cloud-agent/add-skills" rel="noopener noreferrer"&gt;YAML frontmatter&lt;/a&gt;. GitHub's skills docs make two things clear: the frontmatter describes what the skill does and when to use it, and the body contains the actual instructions, examples, and guidance.&lt;/p&gt;

&lt;p&gt;That's more important than it sounds.&lt;/p&gt;

&lt;p&gt;A lot of teams still treat AI behavior as a giant blob of prompt text. Skills are better because they package expertise at the workflow level: release prep, video captioning rules, GitHub Actions review, whatever your domain needs.&lt;/p&gt;

&lt;p&gt;That means domain knowledge becomes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Structured&lt;/strong&gt; instead of scattered&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reusable&lt;/strong&gt; instead of copy-pasted&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Composable&lt;/strong&gt; instead of monolithic&lt;/li&gt;
&lt;/ul&gt;
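&lt;p&gt;The frontmatter-plus-body shape is easy to see in a small sketch. The skill content and the &lt;code&gt;split_skill&lt;/code&gt; helper below are made up for illustration, and the parser only handles flat &lt;code&gt;key: value&lt;/code&gt; pairs; a real loader would use a proper YAML parser.&lt;/p&gt;

```python
# Hypothetical SKILL.md content: frontmatter says what and when, body says how.
SKILL_MD = """---
name: release-prep
description: Checklist for preparing a release. Use when asked to cut or verify a release.
---

# Release prep

1. Confirm the changelog is updated.
2. Verify CI is green on the release branch.
"""


def split_skill(text):
    """Split a SKILL.md string into (frontmatter dict, body text).

    Only flat `key: value` pairs are supported, which is enough for a sketch.
    """
    _, header, body = text.split("---", 2)
    meta = {}
    for line in header.strip().splitlines():
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta, body.strip()


if __name__ == "__main__":
    meta, body = split_skill(SKILL_MD)
    print(meta["name"], "-", meta["description"])
    print(body.splitlines()[0])
```

&lt;p&gt;The split matters because the description can be used to decide whether a skill is relevant, while the body carries the instructions themselves.&lt;/p&gt;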

&lt;p&gt;I wrote about this broader shift in &lt;a href="https://htek.dev/articles/agent-skills-microsoft-just-shipped-what-youve-been-building/" rel="noopener noreferrer"&gt;Agent Skills: Microsoft Just Shipped What You've Been Building&lt;/a&gt;. The short version is simple: skills are how you scale judgment without turning your agent into a bloated mess.&lt;/p&gt;

&lt;p&gt;And this isn't limited to one surface. GitHub's docs explicitly note that skills work with the Copilot cloud agent, Copilot CLI, and agent mode in VS Code. That's a big deal. You aren't building a one-off hack for one interface. You're building reusable capability for the Copilot runtime.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Newsletter subscribers get the real configs, code, and architecture files.&lt;/strong&gt; The full issue includes the exact manifest patterns, production &lt;code&gt;SKILL.md&lt;/code&gt; structure, and the wiring that makes these plugins behave like specialists instead of glorified prompts. &lt;strong&gt;&lt;a href="https://htek.dev/newsletter/011-copilot-plugins-domain-expert-ai-teammates" rel="noopener noreferrer"&gt;Get the deep dive →&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  MCP Gives Plugins Hands
&lt;/h2&gt;

&lt;p&gt;Knowledge alone is not enough. A useful AI teammate needs the ability to act.&lt;/p&gt;

&lt;p&gt;That's where &lt;a href="https://modelcontextprotocol.io/" rel="noopener noreferrer"&gt;Model Context Protocol&lt;/a&gt; changes the game. MCP gives plugins a structured way to expose tools the model can call: query APIs, inspect state, kick off workflows, fetch artifacts, or validate outputs.&lt;/p&gt;

&lt;p&gt;This is the line between an assistant that says, "You should probably check the pipeline," and one that &lt;strong&gt;actually checks the pipeline, reads the result, and tells you what matters.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Once you combine:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a manifest that establishes identity,&lt;/li&gt;
&lt;li&gt;skills that encode judgment, and&lt;/li&gt;
&lt;li&gt;MCP tools that expose actions,&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;...you stop building assistants that merely explain work. You start building teammates that participate in it.&lt;/p&gt;
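&lt;p&gt;Under the hood, an MCP tool call is a JSON-RPC 2.0 request with the method &lt;code&gt;tools/call&lt;/code&gt;. The sketch below dispatches such a request to a local function. The &lt;code&gt;check_pipeline&lt;/code&gt; tool, its canned result, and the &lt;code&gt;handle&lt;/code&gt; dispatcher are hypothetical stand-ins, and a real plugin would use an MCP server library rather than hand-rolled dispatch.&lt;/p&gt;

```python
import json


def check_pipeline(branch):
    """Hypothetical tool: a real one would query the CI system's API."""
    return {"branch": branch, "status": "failed", "failed_job": "unit-tests"}


# Hypothetical registry mapping tool names to callables.
TOOLS = {"check_pipeline": check_pipeline}


def handle(request):
    """Dispatch a JSON-RPC 2.0 `tools/call` request to a registered tool."""
    if request.get("method") != "tools/call":
        return {"jsonrpc": "2.0", "id": request.get("id"),
                "error": {"code": -32601, "message": "Method not found"}}
    params = request["params"]
    result = TOOLS[params["name"]](**params["arguments"])
    # Tool results carry a list of content items; text is the simplest kind.
    return {"jsonrpc": "2.0", "id": request["id"],
            "result": {"content": [{"type": "text", "text": json.dumps(result)}]}}


if __name__ == "__main__":
    call = {"jsonrpc": "2.0", "id": 1, "method": "tools/call",
            "params": {"name": "check_pipeline", "arguments": {"branch": "main"}}}
    print(json.dumps(handle(call), indent=2))
```

&lt;p&gt;The model never runs the function directly. It emits the request, the runtime executes the tool, and the text result flows back into the conversation.&lt;/p&gt;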

&lt;p&gt;If you want the broader architecture behind that jump, it's tightly connected to the patterns I broke down in &lt;a href="https://htek.dev/articles/github-copilot-cli-extensions-complete-guide/" rel="noopener noreferrer"&gt;GitHub Copilot CLI Extensions: The Complete Guide&lt;/a&gt; and &lt;a href="https://htek.dev/articles/three-layers-your-ai-agent-is-missing/" rel="noopener noreferrer"&gt;The Three Layers Your AI Agent Is Missing&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Three Production Plugins That Prove the Pattern
&lt;/h2&gt;

&lt;p&gt;This isn't hypothetical. The pattern already holds across three public repos:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. &lt;a href="https://github.com/htekdev/devops-copilot-skill" rel="noopener noreferrer"&gt;&lt;code&gt;htekdev/devops-copilot-skill&lt;/code&gt;&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;This one is the DevOps Workflow Orchestrator: repo health, release prep, workflow linting, dependency audit, and migration readiness. It shows what happens when Copilot starts understanding your operational workflow as a system.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. &lt;a href="https://github.com/htekdev/vidpipe-copilot-plugin" rel="noopener noreferrer"&gt;&lt;code&gt;htekdev/vidpipe-copilot-plugin&lt;/code&gt;&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;This plugin specializes in media production: video analysis, FFmpeg editing workflows, silence removal, captions, and multi-platform output generation. Same pattern, completely different domain. That's the point. Once the architecture is right, the domain can change and the shape still holds.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. &lt;a href="https://github.com/htekdev/copilot-plugin-skill" rel="noopener noreferrer"&gt;&lt;code&gt;htekdev/copilot-plugin-skill&lt;/code&gt;&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;This is the meta move: a plugin for building plugins. It packages the conventions, templates, and hard-won lessons required to scaffold new Copilot capabilities faster. That repo is proof that once you figure out the pattern, you can teach Copilot to reproduce it.&lt;/p&gt;

&lt;p&gt;Taken together, these three examples make the case clearly: the progression is not just "I use Copilot." It's &lt;strong&gt;"I build on Copilot."&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Matters More Than Another Prompt Trick
&lt;/h2&gt;

&lt;p&gt;Prompt tricks are fragile. Domain architecture scales.&lt;/p&gt;

&lt;p&gt;Plugins let you move important behavior out of ephemeral chat context and into versioned, inspectable files. Your team can review them, evolve them, reuse them, and ship them.&lt;/p&gt;

&lt;p&gt;If you're serious about building agentic systems, this is exactly why I created &lt;a href="https://htek.dev/blueprints/the-agentic-development-blueprint" rel="noopener noreferrer"&gt;The Agentic Development Blueprint&lt;/a&gt;. It pairs naturally with the newsletter issue if you want both the strategic model and the practical build path.&lt;/p&gt;

&lt;p&gt;If your team wants help applying these patterns to real delivery workflows, infrastructure automation, or internal platform tooling, my &lt;a href="https://htek.dev/services" rel="noopener noreferrer"&gt;consulting services&lt;/a&gt; are built for that kind of work.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;This was the overview. The newsletter issue has the step-by-step implementation → &lt;a href="https://htek.dev/newsletter/011-copilot-plugins-domain-expert-ai-teammates" rel="noopener noreferrer"&gt;Subscribe&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>github</category>
      <category>copilotcli</category>
      <category>modelcontextprotocol</category>
      <category>agenticdevelopment</category>
    </item>
    <item>
      <title>Platform Engineering with GitHub: How to Build an Internal Developer Platform Using Copilot, IssueOps, and Golden-Path Starter Repos</title>
      <dc:creator>Hector Flores</dc:creator>
      <pubDate>Thu, 21 May 2026 13:58:30 +0000</pubDate>
      <link>https://dev.to/htekdev/platform-engineering-with-github-how-to-build-an-internal-developer-platform-using-copilot-adh</link>
      <guid>https://dev.to/htekdev/platform-engineering-with-github-how-to-build-an-internal-developer-platform-using-copilot-adh</guid>
      <description>&lt;p&gt;The platform engineering movement is accelerating — and most teams are building it wrong.&lt;/p&gt;

&lt;p&gt;They're adopting Backstage, standing up Kubernetes operators, hiring dedicated portal teams, and spending 6-12 months before delivering any real value to developers. Meanwhile, the actual developer platform — the thing engineers use every single day — is sitting right in front of them.&lt;/p&gt;

&lt;p&gt;It's GitHub.&lt;/p&gt;

&lt;p&gt;I built an enterprise-scale internal developer platform at a Fortune 500 energy company. Thousands of developers, hundreds of repos, strict compliance requirements. We didn't need a separate portal because &lt;strong&gt;GitHub already IS the platform&lt;/strong&gt; — the service catalog, the self-service automation, the golden paths, the governance layer. All native primitives, composed together.&lt;/p&gt;

&lt;p&gt;Here's the overview of how that architecture works — and the 7 open-source repos that make it real.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Want the complete implementation?&lt;/strong&gt; This article covers the architecture and approach. The full step-by-step guide — with IssueOps workflows, Copilot extension code, and hookflow governance configs — lives in &lt;a href="https://htek.dev/newsletter/issues/008-platform-engineering-with-github" rel="noopener noreferrer"&gt;Issue 008 of the htek.dev newsletter&lt;/a&gt;. &lt;strong&gt;&lt;a href="https://htek.dev/newsletter" rel="noopener noreferrer"&gt;Subscribe to get it →&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The Platform Engineering Movement Is Real — But the Tooling Is Wrong
&lt;/h2&gt;

&lt;p&gt;Platform engineering isn't a trend. It's a structural response to a measurable problem: developer teams spend &lt;a href="https://humanitec.com/blog/state-of-platform-engineering-report-volume-2" rel="noopener noreferrer"&gt;30% of their time on operational tasks&lt;/a&gt; instead of shipping features. The &lt;a href="https://tag-app-delivery.cncf.io/whitepapers/platform-eng-maturity-model/" rel="noopener noreferrer"&gt;CNCF Platform Engineering Maturity Model&lt;/a&gt; formalized this, and &lt;a href="https://teamtopologies.com/key-concepts" rel="noopener noreferrer"&gt;Team Topologies&lt;/a&gt; gave us the vocabulary — platform teams exist to reduce cognitive load for stream-aligned teams.&lt;/p&gt;

&lt;p&gt;But the industry made a wrong turn. Everyone assumed platform engineering meant &lt;strong&gt;Backstage&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Backstage is powerful — and it's also a React application requiring a PostgreSQL database, a dedicated team to maintain, a plugin ecosystem with varying quality, and months of customization. The &lt;a href="https://backstage.io/blog/2025/03/25/adopter-survey-results-2025/" rel="noopener noreferrer"&gt;2025 Backstage Adopter Survey&lt;/a&gt; showed most teams take 6-12 months to reach "useful." Many never get there.&lt;/p&gt;

&lt;p&gt;If your developers already live in GitHub — PRs, issues, Actions, Codespaces — why send them to a separate portal?&lt;/p&gt;




&lt;h2&gt;
  
  
  The 4 Capabilities Every IDP Needs
&lt;/h2&gt;

&lt;p&gt;Every internal developer platform, regardless of implementation, needs four core capabilities:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Self-Service Provisioning
&lt;/h3&gt;

&lt;p&gt;Developers need to spin up environments, repos, and infrastructure without filing tickets. On GitHub, this means &lt;strong&gt;IssueOps&lt;/strong&gt; — open an issue with a structured template, and GitHub Actions provisions everything automatically.&lt;/p&gt;
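&lt;p&gt;The core of an IssueOps workflow is turning the submitted issue form back into structured input. Issue forms render each field as a &lt;code&gt;### Label&lt;/code&gt; heading followed by the answer, so a minimal parser could look like this. The &lt;code&gt;parse_issue_form&lt;/code&gt; name and the field labels in the demo are hypothetical.&lt;/p&gt;

```python
def parse_issue_form(body):
    """Turn an issue-form body into a dict keyed by field label.

    Each field is rendered as a `### Label` heading followed by the submitted
    value, so splitting on headings recovers the structured input.
    """
    fields = {}
    label = None
    for line in body.splitlines():
        if line.startswith("### "):
            label = line[4:].strip()
            fields[label] = []
        elif label is not None:
            fields[label].append(line)
    return {key: "\n".join(value).strip() for key, value in fields.items()}


if __name__ == "__main__":
    # Hypothetical request body, as a provisioning workflow might receive it.
    body = "### Repository name\n\npayments-api\n\n### Environment\n\nstaging\n"
    print(parse_issue_form(body))
```

&lt;p&gt;A workflow step would then validate these values and pass them to whatever provisioning job runs next.&lt;/p&gt;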

&lt;h3&gt;
  
  
  2. Golden-Path Templates
&lt;/h3&gt;

&lt;p&gt;Opinionated, well-lit routes through your stack. Developers &lt;em&gt;can&lt;/em&gt; deviate, but the default path is fast, correct, and maintained. On GitHub, these are &lt;strong&gt;starter repos with embedded AI context&lt;/strong&gt; via &lt;code&gt;copilot-instructions.md&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Governance &amp;amp; Guardrails
&lt;/h3&gt;

&lt;p&gt;Policy enforcement that doesn't require developers to remember rules. On GitHub, this is &lt;strong&gt;Copilot hooks and extensions&lt;/strong&gt; — guardrails that intercept dangerous operations before they happen, not after.&lt;/p&gt;
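&lt;p&gt;A guardrail of this kind boils down to a function that inspects a proposed tool call and returns a deny decision before anything runs. In the sketch below, the event and decision field names (&lt;code&gt;toolName&lt;/code&gt;, &lt;code&gt;toolArgs&lt;/code&gt;, &lt;code&gt;permissionDecision&lt;/code&gt;) are assumptions about the hook payload, and the deny-list is illustrative only; check the hooks documentation for the exact contract.&lt;/p&gt;

```python
import json
import re

# Illustrative deny-list; a real policy would be reviewed and versioned in the repo.
BLOCKED_PATTERNS = [
    r"\brm\s+-rf\s+/",             # recursive delete starting at the filesystem root
    r"\bgit\s+push\s+.*--force",   # force-push
    r"\bterraform\s+destroy\b",    # tearing down infrastructure
]


def decide(event):
    """Return a deny decision for a blocked shell command, else None.

    The event shape (toolName plus JSON-encoded toolArgs) is an assumption.
    """
    if event.get("toolName") != "bash":
        return None
    args = json.loads(event.get("toolArgs", "{}"))
    command = args.get("command", "")
    for pattern in BLOCKED_PATTERNS:
        if re.search(pattern, command):
            return {"permissionDecision": "deny",
                    "permissionDecisionReason": "Blocked by platform guardrail: " + pattern}
    return None


if __name__ == "__main__":
    event = {"toolName": "bash",
             "toolArgs": json.dumps({"command": "git push origin main --force"})}
    print(decide(event))
```

&lt;p&gt;Pattern matching on command strings is easy to bypass, so treat a deny-list like this as one layer alongside required workflows and branch protection.&lt;/p&gt;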

&lt;h3&gt;
  
  
  4. Unified Developer Experience
&lt;/h3&gt;

&lt;p&gt;One place where developers see their services, their environments, their compliance status. On GitHub, you already have the repo as the unit of ownership — what's missing is the composition layer, which &lt;a href="https://docs.github.com/en/copilot/building-copilot-extensions" rel="noopener noreferrer"&gt;GitHub Copilot extensions&lt;/a&gt; now provide.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why GitHub-Native Wins Over Backstage
&lt;/h2&gt;

&lt;p&gt;The fundamental advantage: &lt;strong&gt;zero adoption friction&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Your developers are already authenticated to GitHub. They already know how to open issues, review PRs, and trigger Actions. A GitHub-native IDP doesn't require new logins, new UIs, new mental models, or new SSO integrations.&lt;/p&gt;

&lt;p&gt;Here's the architectural comparison:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Capability&lt;/th&gt;
&lt;th&gt;Backstage Approach&lt;/th&gt;
&lt;th&gt;GitHub-Native Approach&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Service catalog&lt;/td&gt;
&lt;td&gt;Custom plugins + PostgreSQL&lt;/td&gt;
&lt;td&gt;Repository topics + &lt;code&gt;CODEOWNERS&lt;/code&gt; + org-level metadata&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Self-service&lt;/td&gt;
&lt;td&gt;Software templates + scaffolder&lt;/td&gt;
&lt;td&gt;IssueOps + reusable Actions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Golden paths&lt;/td&gt;
&lt;td&gt;Template catalog&lt;/td&gt;
&lt;td&gt;Starter repos + copilot-instructions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Governance&lt;/td&gt;
&lt;td&gt;TechDocs + manual reviews&lt;/td&gt;
&lt;td&gt;Copilot hooks + required workflows&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Developer UI&lt;/td&gt;
&lt;td&gt;Custom React portal&lt;/td&gt;
&lt;td&gt;GitHub UI + Copilot chat extensions&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The GitHub-native approach isn't "less capable." It's &lt;strong&gt;differently capable&lt;/strong&gt; — and it ships in weeks, not months.&lt;/p&gt;
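&lt;p&gt;To make the service catalog row of the table concrete, here is a minimal sketch that derives a catalog entry from repository topics and a &lt;code&gt;CODEOWNERS&lt;/code&gt; file. The function names, the entry shape, and the sample data are hypothetical, and the parser ignores edge cases such as escaped &lt;code&gt;#&lt;/code&gt; characters in patterns.&lt;/p&gt;

```python
def parse_codeowners(text):
    """Return a list of (pattern, owners) pairs from CODEOWNERS content."""
    rules = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        pattern, *owners = line.split()
        rules.append((pattern, owners))
    return rules


def catalog_entry(repo, topics, codeowners_text):
    """Build a hypothetical catalog record from metadata the repo already has."""
    rules = parse_codeowners(codeowners_text)
    owners = sorted({owner for _, owner_list in rules for owner in owner_list})
    return {"service": repo, "topics": sorted(topics), "owners": owners}


if __name__ == "__main__":
    codeowners = "# platform catalog\n* @acme/platform\n/docs/ @acme/docs @alice\n"
    print(catalog_entry("payments-api", ["tier-1", "python"], codeowners))
```

&lt;p&gt;Nothing here needs a database: the catalog is a view over metadata that already lives next to the code.&lt;/p&gt;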




&lt;h2&gt;
  
  
  The 7 Starter Repos
&lt;/h2&gt;

&lt;p&gt;I've open-sourced the building blocks as 7 starter repositories. Each handles one piece of the IDP puzzle:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;copilot-instructions-starter&lt;/strong&gt; — Golden-path context engineering templates that shape how AI agents interact with your codebase&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;copilot-hooks-starter&lt;/strong&gt; — Hook configurations and safety guardrails for controlling what AI agents can and cannot do&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;copilot-agent-starter&lt;/strong&gt; — Multi-agent delegation patterns and orchestration templates for PR review, deployment, and triage&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;issueops-starter&lt;/strong&gt; — Self-service provisioning workflows triggered by structured issue templates&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;github-governance-starter&lt;/strong&gt; — Organization-wide policy enforcement via required workflows, rulesets, and compliance checks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;platform-catalog-starter&lt;/strong&gt; — Service catalog metadata conventions using repository topics, custom properties, and CODEOWNERS&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;golden-path-app-starter&lt;/strong&gt; — A complete application template wired with all of the above — the "new project" button for your platform&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Each repo is standalone but designed to compose. The platform team maintains the starters; stream-aligned teams consume them through &lt;code&gt;gh repo create --template&lt;/code&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Newsletter subscribers get the full implementation details&lt;/strong&gt; — complete IssueOps workflow YAML, Copilot extension source code, hookflow governance configs, and the composition patterns that wire all 7 repos together. &lt;strong&gt;&lt;a href="https://htek.dev/newsletter" rel="noopener noreferrer"&gt;Subscribe to Issue 008 →&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  How This Connects to Context Engineering
&lt;/h2&gt;

&lt;p&gt;If you've read my piece on &lt;a href="https://htek.dev/articles/what-is-context-engineering-practical-guide-50-agents/" rel="noopener noreferrer"&gt;context engineering&lt;/a&gt;, you already understand the core insight: &lt;strong&gt;the quality of AI output is determined by the context you provide, not the prompts you write&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Golden-path starter repos are context engineering at the organizational level. Every new repo created from a starter inherits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Architecture decisions (via &lt;code&gt;copilot-instructions.md&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Safety guardrails (via hook configurations)&lt;/li&gt;
&lt;li&gt;Governance rules (via required workflows)&lt;/li&gt;
&lt;li&gt;Agent behaviors (via agent definitions)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is what I call &lt;a href="https://htek.dev/articles/three-layers-your-ai-agent-is-missing/" rel="noopener noreferrer"&gt;the three layers your AI agent is missing&lt;/a&gt; — scaled to the platform level. And it ties directly into the &lt;a href="https://htek.dev/articles/7-layer-ai-governance-stack/" rel="noopener noreferrer"&gt;governance stack&lt;/a&gt; I wrote about recently.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Blueprint: Part 4 — Platform Engineering
&lt;/h2&gt;

&lt;p&gt;I'm releasing a new &lt;strong&gt;Part 4: Platform Engineering&lt;/strong&gt; chapter in &lt;a href="https://htek.dev/blueprints/the-agentic-development-blueprint" rel="noopener noreferrer"&gt;The Agentic Development Blueprint&lt;/a&gt;. It covers the full architecture — from IssueOps provisioning flows to Copilot extension development to hookflow governance patterns — with production-ready code and configuration you can deploy this week.&lt;/p&gt;

&lt;p&gt;If you're already running the blueprint patterns from Parts 1-3 (agent harnesses, multi-agent orchestration, context engineering), Part 4 shows how to scale those patterns across your entire organization as a platform team.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;You don't need Backstage. You don't need a 6-month implementation timeline. You don't need a dedicated React portal team.&lt;/p&gt;

&lt;p&gt;You need GitHub — which you already have — composed with IssueOps, golden-path starters, Copilot extensions, and hook-based governance. The platform is already there. You just need to wire it together.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;This was the overview.&lt;/strong&gt; The newsletter issue has the full step-by-step implementation — complete IssueOps workflows, Copilot extension code, all 7 repos explained in depth, and the composition patterns that make them work together.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://htek.dev/newsletter" rel="noopener noreferrer"&gt;→ Subscribe to the htek.dev newsletter to get Issue 008&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Related reading:&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://htek.dev/articles/what-is-context-engineering-practical-guide-50-agents/" rel="noopener noreferrer"&gt;Context Engineering: Practical Guide with 50+ Agents&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://htek.dev/articles/agent-hooks-controlling-ai-codebase/" rel="noopener noreferrer"&gt;Agent Hooks: Controlling AI Agents in Your Codebase&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://htek.dev/articles/53-agents-zero-chaos-multi-agent-orchestration-patterns/" rel="noopener noreferrer"&gt;53 Agents, Zero Chaos: Multi-Agent Orchestration&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://htek.dev/articles/7-layer-ai-governance-stack/" rel="noopener noreferrer"&gt;The 7-Layer AI Governance Stack&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>platformengineering</category>
      <category>github</category>
      <category>githubactions</category>
      <category>devex</category>
    </item>
    <item>
      <title>GitOps for Everything: The *-as-Code Revolution That Changes How You Ship, Govern, and Scale</title>
      <dc:creator>Hector Flores</dc:creator>
      <pubDate>Wed, 20 May 2026 23:41:37 +0000</pubDate>
      <link>https://dev.to/htekdev/gitops-for-everything-the-as-code-revolution-that-changes-how-you-ship-govern-and-scale-4g8e</link>
      <guid>https://dev.to/htekdev/gitops-for-everything-the-as-code-revolution-that-changes-how-you-ship-govern-and-scale-4g8e</guid>
      <description>&lt;h2&gt;
  
  
  The *-as-Code Pattern Is Eating Operations
&lt;/h2&gt;

&lt;p&gt;Every major operational discipline has gone through the same evolution: manual clicks in a dashboard → scripts in a wiki → &lt;strong&gt;declarative code in a Git repository, enforced through CI/CD&lt;/strong&gt;. It happened to infrastructure. Then policy. Then identity. Then documentation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbrqvox4omzcnrxw22a6x.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbrqvox4omzcnrxw22a6x.webp" alt="The *-as-Code Evolution — from manual dashboard clicks to scripts to declarative code in Git, delivering automation, repeatability, reliability, and audit trail" width="800" height="533"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;The pattern that wins every time: manual → scripts → declarative code in Git.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The pattern keeps winning because it always delivers the same four things: &lt;strong&gt;automation, repeatability, reliability, and audit trail&lt;/strong&gt;. Once you define something as code and apply GitOps to it, you gain PR-based review, rollback on merge revert, blame for forensics, and branch protection as a governance gate. It's not clever — it's structural leverage.&lt;/p&gt;

&lt;p&gt;And the family keeps growing. Here's the landscape in 2026 — and why the newest member, &lt;strong&gt;Harness as Code&lt;/strong&gt;, might be the most important addition yet.&lt;/p&gt;

&lt;h2&gt;
  
  
  The *-as-Code Family
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Infrastructure as Code
&lt;/h3&gt;

&lt;p&gt;The one that started it all. &lt;a href="https://www.terraform.io/" rel="noopener noreferrer"&gt;Terraform&lt;/a&gt;, &lt;a href="https://www.pulumi.com/" rel="noopener noreferrer"&gt;Pulumi&lt;/a&gt;, &lt;a href="https://opentofu.org/" rel="noopener noreferrer"&gt;OpenTofu&lt;/a&gt;, AWS CDK, Azure Bicep — declare your compute, networking, and platform services in version-controlled files. IaC eliminated server snowflakes. Every environment is reproducible from a single source of truth. Drift detection catches unauthorized changes. PRs become infrastructure review gates.&lt;/p&gt;

&lt;p&gt;In 2026, this is table stakes.&lt;/p&gt;

&lt;h3&gt;
  
  
  Policy as Code
&lt;/h3&gt;

&lt;p&gt;Tools like &lt;a href="https://www.openpolicyagent.org/" rel="noopener noreferrer"&gt;Open Policy Agent (OPA)&lt;/a&gt; and &lt;a href="https://www.hashicorp.com/sentinel" rel="noopener noreferrer"&gt;HashiCorp Sentinel&lt;/a&gt; let you express governance rules — security boundaries, compliance requirements, cost controls — as testable, enforceable code.&lt;/p&gt;

&lt;p&gt;Instead of "don't deploy public S3 buckets" being a wiki page someone ignores, it becomes a Rego policy that blocks the Terraform plan. Instead of "all containers must run as non-root" being a Slack reminder, it becomes a &lt;a href="https://kyverno.io/" rel="noopener noreferrer"&gt;Kyverno&lt;/a&gt; policy that rejects the admission. The policy is version-controlled, reviewed via PR, and enforced automatically. Shift-left governance that actually works.&lt;/p&gt;
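&lt;p&gt;The public-bucket rule can be sketched as a check over the JSON form of a Terraform plan. This is Python standing in for a Rego policy, not Rego itself. The &lt;code&gt;public_bucket_violations&lt;/code&gt; helper and the sample plan are hypothetical, and the check only looks at &lt;code&gt;aws_s3_bucket_acl&lt;/code&gt; resources.&lt;/p&gt;

```python
# Canned ACL values that make a bucket publicly accessible.
PUBLIC_ACLS = {"public-read", "public-read-write"}


def public_bucket_violations(plan):
    """List addresses of S3 bucket ACL resources in a plan that would be public.

    Expects the structure produced by exporting a plan as JSON, where each
    entry in `resource_changes` has an address, a type, and a change.after map.
    """
    violations = []
    for change in plan.get("resource_changes", []):
        after = (change.get("change") or {}).get("after") or {}
        if change.get("type") == "aws_s3_bucket_acl" and after.get("acl") in PUBLIC_ACLS:
            violations.append(change["address"])
    return violations


if __name__ == "__main__":
    # Hypothetical plan fragment for illustration.
    plan = {"resource_changes": [
        {"address": "aws_s3_bucket_acl.logs", "type": "aws_s3_bucket_acl",
         "change": {"after": {"acl": "public-read"}}},
        {"address": "aws_s3_bucket_acl.private", "type": "aws_s3_bucket_acl",
         "change": {"after": {"acl": "private"}}},
    ]}
    found = public_bucket_violations(plan)
    if found:
        raise SystemExit("policy violation: " + ", ".join(found))
```

&lt;p&gt;Run as a CI step, a non-zero exit fails the pull request, which is what turns the rule from advice into a gate.&lt;/p&gt;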

&lt;h3&gt;
  
  
  Identity as Code
&lt;/h3&gt;

&lt;p&gt;SSO configurations, user group memberships, service account definitions, and federation trust relationships traditionally live in admin consoles where changes are invisible and unauditable. Identity as Code means managing your &lt;a href="https://www.okta.com/" rel="noopener noreferrer"&gt;Okta&lt;/a&gt;, Microsoft Entra ID, or Keycloak configurations through Terraform providers or declarative YAML. Onboarding a new team? That's a PR that adds them to the right groups with the right app assignments. Offboarding? A PR that revokes.&lt;/p&gt;

&lt;h3&gt;
  
  
  Access as Code
&lt;/h3&gt;

&lt;p&gt;IAM policies, RBAC role definitions, permission sets, and authorization rules — declared as code rather than clicked through consoles. AWS IAM policies in Terraform. Kubernetes RBAC manifests in Git. Authorization logic expressed in &lt;a href="https://www.cedarpolicy.com/" rel="noopener noreferrer"&gt;Cedar&lt;/a&gt; or OPA rather than scattered &lt;code&gt;if&lt;/code&gt; statements across microservices.&lt;/p&gt;

&lt;p&gt;When your access rules are code, you can answer "who has access to what and why?" by reading a repo instead of auditing twelve different admin panels.&lt;/p&gt;

&lt;h3&gt;
  
  
  Management as Code
&lt;/h3&gt;

&lt;p&gt;Runbooks, escalation matrices, incident response playbooks — versioned in Git rather than tribal knowledge in someone's head. On-call rotations as YAML. Incident response as Markdown with automated triggers. Consistency across shifts, updates through review, history preserved.&lt;/p&gt;

&lt;h3&gt;
  
  
  Docs as Code
&lt;/h3&gt;

&lt;p&gt;Documentation built from Markdown/MDX in Git repos, deployed through CI/CD, reviewed via PRs. Tools like &lt;a href="https://docusaurus.io/" rel="noopener noreferrer"&gt;Docusaurus&lt;/a&gt;, &lt;a href="https://www.mkdocs.org/" rel="noopener noreferrer"&gt;MkDocs&lt;/a&gt;, and &lt;a href="https://buildwithfern.com/" rel="noopener noreferrer"&gt;Fern&lt;/a&gt; make this seamless. When docs are code, they stay in sync because updating them is part of the same PR that changes the system.&lt;/p&gt;

&lt;h3&gt;
  
  
  Harness as Code — The Newest Member
&lt;/h3&gt;

&lt;p&gt;Here's where it gets interesting. Every *-as-code pattern above governs a &lt;strong&gt;technical&lt;/strong&gt; surface: servers, policies, identities, documentation. But what governs the &lt;strong&gt;behavioral&lt;/strong&gt; surface of autonomous AI agents?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Harness as Code&lt;/strong&gt; is the practice of defining AI agent governance — decision boundaries, autonomy levels, communication rules, escalation paths, tool permissions, and safety constraints — as declarative files in a Git repository, enforced through the same PR-review-merge workflow as everything else.&lt;/p&gt;

&lt;p&gt;It's the natural next step. If your infrastructure changes require a reviewed PR, why would your AI agent's behavioral boundaries be defined in an unversioned prompt that someone edits directly? That's the equivalent of manually configuring a production server in 2010.&lt;/p&gt;

&lt;p&gt;Harness as Code means agent instructions, constitutions, skill definitions, and governance rules all live in Git. Change an agent's autonomy level? PR. Expand what tools an agent can access? PR. Every behavioral change goes through code review, gets attributed, and is reversible.&lt;/p&gt;
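
&lt;p&gt;To make that concrete, here is a rough Python sketch of a harness definition and the check it enables. Every field name (&lt;code&gt;autonomy&lt;/code&gt;, &lt;code&gt;allowed_tools&lt;/code&gt;, &lt;code&gt;escalate_to&lt;/code&gt;) and the &lt;code&gt;docs-agent&lt;/code&gt; example are assumptions for illustration, not the schema of any particular tool.&lt;/p&gt;

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AgentHarness:
    """Declarative governance for one agent (all field names are hypothetical)."""
    name: str
    autonomy: str  # e.g. "propose-only" or "auto-merge"
    allowed_tools: frozenset = field(default_factory=frozenset)
    escalate_to: str = "platform-oncall"


# In a real setup this would be loaded from a reviewed file in the repo.
DOCS_AGENT = AgentHarness(
    name="docs-agent",
    autonomy="propose-only",
    allowed_tools=frozenset({"read_file", "open_pull_request"}),
)


def authorize(harness: AgentHarness, tool: str) -> tuple[bool, str]:
    """Allow a tool call only if the harness lists it; otherwise name the escalation path."""
    if tool in harness.allowed_tools:
        return True, "allowed"
    return False, f"denied: escalate to {harness.escalate_to}"


print(authorize(DOCS_AGENT, "open_pull_request"))
print(authorize(DOCS_AGENT, "merge_pull_request"))
```

&lt;p&gt;Granting &lt;code&gt;merge_pull_request&lt;/code&gt; would then be a one-line diff to the harness definition, which is exactly the kind of change a reviewer should see.&lt;/p&gt;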

&lt;p&gt;I cover the full &lt;a href="https://htek.dev/articles/7-layer-ai-governance-stack/" rel="noopener noreferrer"&gt;7-layer governance architecture&lt;/a&gt; in a separate piece — but the key insight is that Harness as Code doesn't require inventing new tooling. It requires applying the pattern that already works everywhere else to the newest operational surface: agent behavior.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why GitOps Is the Multiplier
&lt;/h2&gt;

&lt;p&gt;The *-as-code patterns above are powerful on their own. Combine them with &lt;strong&gt;GitOps&lt;/strong&gt; (Git as the single source of truth, changes reconciled automatically) and you get compounding leverage:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Automation&lt;/strong&gt;: Merge triggers reconciliation. No manual "apply" steps.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Repeatability&lt;/strong&gt;: Clone the repo, get an identical system. Every time.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reliability&lt;/strong&gt;: Branch protection + required reviews = no unreviewed changes reach production.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Audit trail&lt;/strong&gt;: Every change attributed, timestamped, diffable, revertable.&lt;/li&gt;
&lt;/ul&gt;
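
&lt;p&gt;The reconciliation idea behind those bullets can be sketched in a few lines of Python. This is a toy model under stated assumptions: state is a plain dictionary, and &lt;code&gt;plan&lt;/code&gt; and &lt;code&gt;reconcile&lt;/code&gt; are hypothetical helper names, not the API of any GitOps controller.&lt;/p&gt;

```python
def plan(desired: dict, actual: dict) -> list[tuple[str, str]]:
    """Compute the actions that move `actual` to `desired`."""
    actions = []
    for name in sorted(desired.keys() | actual.keys()):
        if name not in actual:
            actions.append(("create", name))
        elif name not in desired:
            actions.append(("delete", name))
        elif desired[name] != actual[name]:
            actions.append(("update", name))
    return actions


def reconcile(desired: dict, actual: dict) -> dict:
    """Apply the plan; afterwards the live state equals the declared state."""
    for action, name in plan(desired, actual):
        if action == "delete":
            del actual[name]
        else:
            actual[name] = desired[name]
    return actual


# `desired` plays the role of the repo; `actual` is the running system.
desired = {"web": {"replicas": 3}, "worker": {"replicas": 2}}
actual = {"web": {"replicas": 1}, "legacy-cron": {"replicas": 1}}

print(plan(desired, actual))
reconcile(desired, actual)
print(plan(desired, actual))  # an empty plan means no drift remains
```

&lt;p&gt;The same diff that drives reconciliation doubles as drift detection: a non-empty plan on an unchanged repo means someone changed the live system by hand.&lt;/p&gt;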

&lt;p&gt;But there's a second-order unlock that changes the entire equation.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Real Unlock: Once It's Code, Agents Can Maintain It
&lt;/h2&gt;

&lt;p&gt;Here's the thesis most *-as-code articles miss — the one that changes the math entirely.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxk9wo5qju3etbk905rfn.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxk9wo5qju3etbk905rfn.webp" alt="Once it's code, agents can maintain it — AI agents proposing PRs, detecting drift, and auto-remediating across all as-code domains, with Harness as Code closing the governance loop" width="800" height="533"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;As-code is the prerequisite for agent-maintained. Harness as Code closes the loop.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Automation, repeatability, and audit trails justify the migration to code on their own. But there's a second-order effect that &lt;strong&gt;makes your velocity skyrocket&lt;/strong&gt;: once something is expressed as code in a Git repository, an AI agent can maintain it.&lt;/p&gt;

&lt;p&gt;Think about what that means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Want agents managing your identity?&lt;/strong&gt; Make identity as code first. Once your Okta configs are Terraform files, an agent can propose onboarding PRs, detect stale accounts, and enforce least-privilege — autonomously.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Want agents managing your policy?&lt;/strong&gt; Make policy as code first. Once your governance rules are OPA policies in a repo, an agent can detect compliance drift and propose remediation PRs automatically.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Want agents managing your infrastructure?&lt;/strong&gt; Make it as code first. Once your cloud resources are declarative configs, an agent can right-size instances, rotate certificates, and propose cost optimizations — all as reviewable PRs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Want agents managing your documentation?&lt;/strong&gt; Make docs as code first. Once your docs are Markdown in Git, an agent can detect staleness, update API references, and flag broken links.&lt;/li&gt;
&lt;/ul&gt;
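
&lt;p&gt;As one illustration of the identity case, here is a hedged Python sketch of the "detect stale accounts, propose a PR" step. The account data, the 90-day threshold, and the function names are all assumptions; a real agent would read last-login data from the identity provider and open the PR through the Git host.&lt;/p&gt;

```python
from datetime import date, timedelta


def propose_removals(accounts: dict, today: date, max_idle_days: int = 90) -> list[str]:
    """Return account names whose last login is older than the idle threshold."""
    cutoff = today - timedelta(days=max_idle_days)
    return sorted(name for name, last_login in accounts.items() if cutoff > last_login)


def render_pr_body(stale: list[str]) -> str:
    """Format the proposal as a PR description for a human reviewer."""
    if not stale:
        return "No stale accounts found."
    lines = [f"- remove '{name}' (idle past threshold)" for name in stale]
    return "Proposed removals:\n" + "\n".join(lines)


# Hypothetical data: account name mapped to last login date.
ACCOUNTS = {
    "alice": date(2026, 5, 1),
    "build-bot": date(2025, 11, 3),
    "carol": date(2026, 1, 15),
}

stale = propose_removals(ACCOUNTS, today=date(2026, 5, 29))
print(render_pr_body(stale))
```

&lt;p&gt;The agent only proposes; the removal itself still lands through a reviewed merge.&lt;/p&gt;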

&lt;p&gt;The pattern is universal: &lt;strong&gt;as-code is the prerequisite for agent-maintained.&lt;/strong&gt; Without code, agents have nothing to operate on. With code, agents get a structured, diffable, reviewable surface they can read, reason about, and propose changes to.&lt;/p&gt;

&lt;p&gt;This is where Harness as Code completes the loop. The agents themselves are governed as code — their instructions, boundaries, and permissions live in Git. So you get agents maintaining infrastructure, policy, identity, and docs... while the agents &lt;em&gt;themselves&lt;/em&gt; are maintained through the same pattern. Code governing agents governing code.&lt;/p&gt;

&lt;p&gt;That's not a nice-to-have. That's the difference between a team of five managing ten services and a team of five managing a hundred.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;The *-as-code revolution isn't just about automation and audit trails; it's about &lt;strong&gt;creating the surface that agents can operate on&lt;/strong&gt;. Every domain you migrate to code becomes a domain that agents can maintain. Every domain you leave in dashboards and admin consoles stays a domain that requires human clicks.&lt;/p&gt;

&lt;p&gt;Infrastructure was first. Policy, identity, access, docs, and management followed. &lt;strong&gt;Harness as Code&lt;/strong&gt; closes the loop by governing the agents themselves through the same pattern.&lt;/p&gt;

&lt;p&gt;The progression: make it code → apply GitOps → let agents maintain it → govern the agents as code too.&lt;/p&gt;

&lt;p&gt;That's not a framework. That's compound leverage.&lt;/p&gt;


</description>
      <category>infrastructureascode</category>
      <category>devops</category>
      <category>cicd</category>
      <category>platformengineering</category>
    </item>
    <item>
      <title>Aspect-Oriented Programming for AI Agents: Hookflows as an Event Bus</title>
      <dc:creator>Hector Flores</dc:creator>
      <pubDate>Wed, 20 May 2026 13:55:26 +0000</pubDate>
      <link>https://dev.to/htekdev/aspect-oriented-programming-for-ai-agents-hookflows-as-an-event-bus-1if7</link>
      <guid>https://dev.to/htekdev/aspect-oriented-programming-for-ai-agents-hookflows-as-an-event-bus-1if7</guid>
      <description>&lt;h2&gt;
  
  
  The Pattern That Made Me Say "That's Exactly What AOP Is"
&lt;/h2&gt;

&lt;p&gt;I was debugging a notification problem in my 53-agent home assistant when I stumbled onto something unexpectedly powerful. I needed every agent dispatch to notify me via Telegram — but I didn't want to burn tokens on a separate &lt;code&gt;telegram_send_message&lt;/code&gt; call. The agents were already being validated by a governance hookflow. Why not piggyback the notification onto the validation step?&lt;/p&gt;

&lt;p&gt;One tool call. Validation &lt;strong&gt;and&lt;/strong&gt; notification. Zero additional tokens consumed by the agent.&lt;/p&gt;

&lt;p&gt;Then it hit me: I'd accidentally reinvented &lt;a href="https://en.wikipedia.org/wiki/Aspect-oriented_programming" rel="noopener noreferrer"&gt;aspect-oriented programming&lt;/a&gt; — but for AI agents instead of Java classes.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is AOP, and Why Does It Matter Here?
&lt;/h2&gt;

&lt;p&gt;Aspect-oriented programming emerged in the late 1990s to solve a specific problem: &lt;strong&gt;cross-cutting concerns&lt;/strong&gt;. Logging, security checks, transaction management — these behaviors cut across every module in your application, but they don't belong in any single module's core logic.&lt;/p&gt;

&lt;p&gt;The AOP solution: define these concerns once, then weave them into your code at specific &lt;strong&gt;join points&lt;/strong&gt; (method calls, property access) using &lt;strong&gt;advice&lt;/strong&gt; (before, after, around). The original code never knows it's being augmented.&lt;/p&gt;

&lt;p&gt;Now apply that mental model to AI agents:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;AOP Concept&lt;/th&gt;
&lt;th&gt;Agent Equivalent&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Join point&lt;/td&gt;
&lt;td&gt;Tool call (e.g., &lt;code&gt;task&lt;/code&gt;, &lt;code&gt;edit&lt;/code&gt;, &lt;code&gt;create&lt;/code&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pointcut&lt;/td&gt;
&lt;td&gt;Hook trigger rule (which tools, which args)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Advice&lt;/td&gt;
&lt;td&gt;Hook step logic (validate, notify, log)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Aspect&lt;/td&gt;
&lt;td&gt;A hookflow YAML definition&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Weaving&lt;/td&gt;
&lt;td&gt;The hook engine intercepting tool calls at runtime&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The agent doesn't know. It just calls a tool. The governance layer intercepts, validates, and fires side effects. This is textbook AOP — applied to a fundamentally new domain.&lt;/p&gt;
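
&lt;p&gt;For readers who haven't used AOP, the mapping in the table can be approximated with a plain Python decorator. This is only an analogy, not how the hook engine is implemented: &lt;code&gt;before_advice&lt;/code&gt;, &lt;code&gt;require_prompt&lt;/code&gt;, and the in-memory &lt;code&gt;AUDIT_LOG&lt;/code&gt; are hypothetical names, and the &lt;code&gt;task&lt;/code&gt; function is a stand-in for a real tool.&lt;/p&gt;

```python
import functools

AUDIT_LOG = []  # stands in for any side-effect sink (logging, notifications)


def before_advice(check):
    """Wrap a tool so `check` runs first; the tool body stays unchanged."""
    def decorator(tool):
        @functools.wraps(tool)
        def wrapper(**args):
            error = check(tool.__name__, args)
            if error:
                raise PermissionError(error)  # enforcement: block the call
            AUDIT_LOG.append((tool.__name__, args))  # side effect on success
            return tool(**args)
        return wrapper
    return decorator


def require_prompt(tool_name, args):
    """Validation rule: reject calls with a missing or empty prompt."""
    return None if args.get("prompt") else f"{tool_name}: prompt is required"


@before_advice(require_prompt)
def task(prompt):
    """The core logic; it contains no logging or validation code."""
    return f"dispatched: {prompt}"


print(task(prompt="summarize open PRs"))
print(AUDIT_LOG)
```

&lt;p&gt;Here the decorator application is the weaving, &lt;code&gt;require_prompt&lt;/code&gt; plus the log append is the advice, and the call to &lt;code&gt;task&lt;/code&gt; is the join point.&lt;/p&gt;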

&lt;h2&gt;
  
  
  The Real Implementation: Enforcement-Triggered Side Effects
&lt;/h2&gt;

&lt;p&gt;Here's the actual hookflow running in my platform. It requires every agent dispatch to include a notification tag, validates it, and then sends the notification as a side effect:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Require task or write_agent originator notify&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="s"&gt;Blocks task/write_agent calls unless they contain a valid&lt;/span&gt;
  &lt;span class="s"&gt;originator_notify tag. On success, sends the parsed message&lt;/span&gt;
  &lt;span class="s"&gt;to the originator via Telegram Bot API.&lt;/span&gt;
&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;hooks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;types&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;preToolUse&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;task&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;write_agent&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;span class="na"&gt;blocking&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;TOOL_NAME&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ event.tool.name }}&lt;/span&gt;
  &lt;span class="na"&gt;TASK_PROMPT_JSON&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ toJSON(event.tool.args.prompt) }}&lt;/span&gt;
  &lt;span class="na"&gt;WRITE_AGENT_MESSAGE_JSON&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ toJSON(event.tool.args.message) }}&lt;/span&gt;
&lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Validate and send originator notification&lt;/span&gt;
    &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
      &lt;span class="s"&gt;# Determine which tool arg contains the text&lt;/span&gt;
      &lt;span class="s"&gt;$text = if ($env:TOOL_NAME -eq 'write_agent') {&lt;/span&gt;
        &lt;span class="s"&gt;$env:WRITE_AGENT_MESSAGE_JSON | ConvertFrom-Json&lt;/span&gt;
      &lt;span class="s"&gt;} else {&lt;/span&gt;
        &lt;span class="s"&gt;$env:TASK_PROMPT_JSON | ConvertFrom-Json&lt;/span&gt;
      &lt;span class="s"&gt;}&lt;/span&gt;

      &lt;span class="s"&gt;# Parse the XML tag from tool arguments&lt;/span&gt;
      &lt;span class="s"&gt;$pattern = '&amp;lt;originator_notify\b(?&amp;lt;attrs&amp;gt;[^&amp;gt;]*)&amp;gt;(?&amp;lt;message&amp;gt;[\s\S]*?)&amp;lt;/originator_notify&amp;gt;'&lt;/span&gt;
      &lt;span class="s"&gt;$matches = [regex]::Matches($text, $pattern)&lt;/span&gt;

      &lt;span class="s"&gt;if ($matches.Count -eq 0) {&lt;/span&gt;
        &lt;span class="s"&gt;Write-Error "Missing &amp;lt;originator_notify&amp;gt; block"&lt;/span&gt;
        &lt;span class="s"&gt;exit 1  # BLOCKS the tool call&lt;/span&gt;
      &lt;span class="s"&gt;}&lt;/span&gt;

      &lt;span class="s"&gt;# Extract telegram_id via nested regex on attributes&lt;/span&gt;
      &lt;span class="s"&gt;$attrs = $matches[0].Groups['attrs'].Value&lt;/span&gt;
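      &lt;span class="s"&gt;$idMatch = [regex]::Match($attrs, 'telegram_id=["\x27](?&amp;lt;id&amp;gt;\d+)["\x27]')&lt;/span&gt;
      &lt;span class="s"&gt;if (-not $idMatch.Success) {&lt;/span&gt;
        &lt;span class="s"&gt;Write-Error "Missing or invalid telegram_id attribute"&lt;/span&gt;
        &lt;span class="s"&gt;exit 1  # BLOCKS the tool call&lt;/span&gt;
      &lt;span class="s"&gt;}&lt;/span&gt;
      &lt;span class="s"&gt;$telegramId = $idMatch.Groups['id'].Value&lt;/span&gt;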

      &lt;span class="s"&gt;$notifyMessage = $matches[0].Groups['message'].Value.Trim()&lt;/span&gt;

      &lt;span class="s"&gt;# Side effect: send Telegram notification&lt;/span&gt;
      &lt;span class="s"&gt;$botToken = $env:TELEGRAM_BOT_TOKEN&lt;/span&gt;
      &lt;span class="s"&gt;$body = @{ chat_id = $telegramId; text = $notifyMessage } | ConvertTo-Json&lt;/span&gt;
      &lt;span class="s"&gt;Invoke-RestMethod -Uri "https://api.telegram.org/bot$botToken/sendMessage" `&lt;/span&gt;
        &lt;span class="s"&gt;-Method Post -ContentType 'application/json' -Body $body&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The agent's perspective? It's just including metadata in its prompt — a governance requirement. It has no idea that including that tag triggers a Telegram message. The hookflow handles both &lt;strong&gt;enforcement&lt;/strong&gt; (blocking if the tag is missing) and a &lt;strong&gt;side effect&lt;/strong&gt; (sending the notification) in a single interception.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Is Better Than Explicit Tool Calls
&lt;/h2&gt;

&lt;p&gt;The naive approach is straightforward: after dispatching a sub-agent, call &lt;code&gt;telegram_send_message&lt;/code&gt; explicitly. But that approach has serious problems in production multi-agent systems:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Token cost compounds.&lt;/strong&gt; Every tool call consumes tokens — the call itself, the response, the reasoning about the response. In a system dispatching dozens of agents per hour, those extra calls add up fast.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agent discretion is unreliable.&lt;/strong&gt; Agents skip steps. They forget. They decide a notification "isn't necessary this time." When notifications are a side effect of governance, they happen deterministically. Every single time. No exceptions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Composability degrades.&lt;/strong&gt; When you want to add a second cross-cutting concern — say, audit logging — you'd need to update every agent's instructions. With hookflows, you stack another aspect. The agents remain untouched.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Composability Advantage
&lt;/h2&gt;

&lt;p&gt;This pattern isn't limited to notifications. Here are enforcement-triggered side effects I'm now implementing across the platform:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Task creation with auto-notification:&lt;/strong&gt;&lt;br&gt;
A hookflow on &lt;code&gt;add_task&lt;/code&gt; validates the task structure, then parses a &lt;code&gt;notify&lt;/code&gt; block to Telegram the assignee. The agent creating the task doesn't need to know who gets notified or how.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;File edits with watch triggers:&lt;/strong&gt;&lt;br&gt;
A hookflow on &lt;code&gt;edit&lt;/code&gt; for certain file paths validates the change, then queues a test run. The editing agent doesn't know it just triggered CI.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Content creation with publish intent:&lt;/strong&gt;&lt;br&gt;
A hookflow on &lt;code&gt;create&lt;/code&gt; for content files validates frontmatter, then notifies the content scheduler to slot the piece. The writing agent doesn't manage scheduling.&lt;/p&gt;

&lt;p&gt;Each hookflow is independent. Stack them. Compose them. The agent sees one tool call; the platform executes an entire workflow.&lt;/p&gt;
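
&lt;p&gt;Stacking can be sketched as a small registry in Python. This is an illustrative model, not the hookflow engine: the &lt;code&gt;hook&lt;/code&gt; decorator, &lt;code&gt;call_tool&lt;/code&gt;, and the &lt;code&gt;EVENTS&lt;/code&gt; list are hypothetical, and the real side effects (CI, scheduler) are replaced by appended strings.&lt;/p&gt;

```python
from collections import defaultdict

HOOKS = defaultdict(list)  # tool name mapped to its hooks, in registration order
EVENTS = []                # collected side effects, for illustration


def hook(tool_name):
    """Register a function as a hook for one tool."""
    def register(fn):
        HOOKS[tool_name].append(fn)
        return fn
    return register


@hook("edit")
def queue_tests(args):
    if args["path"].endswith(".py"):
        EVENTS.append(f"queue tests for {args['path']}")


@hook("edit")
def audit(args):
    EVENTS.append(f"audit edit of {args['path']}")


@hook("create")
def notify_scheduler(args):
    EVENTS.append(f"schedule {args['path']}")


def call_tool(tool_name, args):
    """Run every hook registered for the tool, then the tool itself."""
    for fn in HOOKS[tool_name]:
        fn(args)
    return f"{tool_name} ok"


call_tool("edit", {"path": "app/main.py"})
print(EVENTS)
```

&lt;p&gt;Adding the audit hook required no change to &lt;code&gt;queue_tests&lt;/code&gt; or to the caller, which is the property the hookflow approach relies on.&lt;/p&gt;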

&lt;h2&gt;
  
  
  Prior Art: I'm Not the First, But the Domain Is New
&lt;/h2&gt;

&lt;p&gt;The broader software world has been exploring this territory:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://spring.io/blog/2024/10/02/supercharging-your-ai-applications-with-spring-ai-advisors" rel="noopener noreferrer"&gt;Spring AI's Advisor system&lt;/a&gt; draws an explicit parallel to Spring AOP — intercepting and enhancing AI calls with logging, memory injection, and retrieval augmentation. &lt;a href="https://docs.crewai.com/en/learn/llm-hooks" rel="noopener noreferrer"&gt;CrewAI's LLM Call Hooks&lt;/a&gt; expose before/after interception points for inspection, approval gates, and response transformation. &lt;a href="https://newsletter.victordibia.com/p/agent-middleware-adding-control-and" rel="noopener noreferrer"&gt;Victor Dibia's analysis of agent middleware&lt;/a&gt; frames middleware as a control and observability mechanism for agent execution loops.&lt;/p&gt;

&lt;p&gt;Microsoft's own &lt;a href="https://github.com/microsoft/agent-governance-toolkit" rel="noopener noreferrer"&gt;Agent Governance Toolkit&lt;/a&gt; provides application-level policy enforcement for autonomous agents via Python middleware.&lt;/p&gt;

&lt;p&gt;What's different about the hookflow approach is the &lt;strong&gt;enforcement-plus-side-effect fusion&lt;/strong&gt;. Most prior art treats governance (blocking/allowing) and side effects (notifications/logging) as separate middleware layers. The hookflow pattern combines them: the same rule that enforces compliance also triggers downstream actions. One interception, multiple outcomes.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Technical Architecture
&lt;/h2&gt;

&lt;p&gt;The engine powering this is &lt;a href="https://github.com/htekdev/gh-hookflow" rel="noopener noreferrer"&gt;gh-hookflow&lt;/a&gt; — a Go-based workflow engine that intercepts GitHub Copilot CLI tool calls using &lt;code&gt;preToolUse&lt;/code&gt; and &lt;code&gt;postToolUse&lt;/code&gt; triggers. Hookflows are defined in YAML (GitHub Actions syntax) and live in &lt;code&gt;.github/hookflows/&lt;/code&gt;. I've written about the governance layer before in &lt;a href="https://htek.dev/articles/hookflows-governed-git-for-ai-agents/" rel="noopener noreferrer"&gt;Stop Trusting AI Agents with Git&lt;/a&gt; and the broader &lt;a href="https://htek.dev/articles/three-layers-your-ai-agent-is-missing/" rel="noopener noreferrer"&gt;three-layer architecture&lt;/a&gt; that makes autonomous agents production-ready.&lt;/p&gt;

&lt;p&gt;The execution model:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Agent calls a tool (e.g., &lt;code&gt;task&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Hook engine intercepts via &lt;code&gt;preToolUse&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Hookflow steps execute: parse args, validate, fire side effects&lt;/li&gt;
&lt;li&gt;If validation fails → tool call is &lt;strong&gt;blocked&lt;/strong&gt; (agent sees denial message)&lt;/li&gt;
&lt;li&gt;If validation passes → tool call proceeds, side effects already fired&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is deterministic. It runs on every tool call matching the trigger. The agent cannot bypass it — unlike instructions, which are suggestions the model can ignore.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Token Efficiency Matters at Scale
&lt;/h2&gt;

&lt;p&gt;In a system running 53 agents with scheduled cron jobs, every unnecessary tool call matters. Here's the math:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Each &lt;code&gt;telegram_send_message&lt;/code&gt; call: ~200 tokens (call + response + reasoning)&lt;/li&gt;
&lt;li&gt;Agent dispatches per day: ~80-120&lt;/li&gt;
&lt;li&gt;Tokens saved by hookflow side effects: &lt;strong&gt;16,000-24,000 tokens/day&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;At current model pricing, that's $0.15-0.75/day depending on model tier — which compounds across every cross-cutting concern you'd otherwise implement as explicit tool calls&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But the bigger win isn't cost. It's &lt;strong&gt;reliability&lt;/strong&gt;. Those 80-120 notifications now happen with 100% certainty. Not "usually" or "when the agent remembers." Every time.&lt;/p&gt;
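
&lt;p&gt;The token arithmetic above is easy to re-run for your own numbers. In this Python sketch the per-call and dispatch figures come from the list above, while the price per million tokens is a placeholder assumption, not a quote from any provider.&lt;/p&gt;

```python
TOKENS_PER_CALL = 200           # call + response + reasoning, per the estimate above
DISPATCHES_PER_DAY = (80, 120)  # low and high daily dispatch counts


def tokens_saved(per_call: int, dispatches: int) -> int:
    """Tokens avoided when the notification is a hook side effect."""
    return per_call * dispatches


low, high = (tokens_saved(TOKENS_PER_CALL, n) for n in DISPATCHES_PER_DAY)
print(low, high)


def daily_cost(tokens: int, usd_per_million: float) -> float:
    """Cost of `tokens` at a given price; the price is a placeholder."""
    return tokens * usd_per_million / 1_000_000


print(daily_cost(high, usd_per_million=10.0))
```

&lt;p&gt;Multiply the result by the number of cross-cutting concerns you would otherwise implement as explicit calls to see how the saving compounds.&lt;/p&gt;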

&lt;h2&gt;
  
  
  Building Your Own Enforcement-Triggered Side Effects
&lt;/h2&gt;

&lt;p&gt;If you're building with GitHub Copilot CLI extensions or any hook-based agent framework, here's the pattern:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Identify a cross-cutting concern&lt;/strong&gt; — something that should happen on every tool call of a certain type&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Require metadata&lt;/strong&gt; — make the agent include structured data (XML, JSON, YAML) in its tool arguments&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Validate the metadata&lt;/strong&gt; — block the call if it's missing or malformed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fire the side effect&lt;/strong&gt; — parse the metadata and trigger your downstream action&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The agent never needs to know&lt;/strong&gt; — it thinks it's satisfying a governance requirement&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The key insight: by framing side effects as governance requirements, you get both compliance enforcement AND automated workflows from a single hook. The agent is incentivized to include the metadata (because the call fails without it), and the platform gets free automation.&lt;/p&gt;
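
&lt;p&gt;Here is a minimal Python sketch of those five steps, assuming JSON metadata in a hypothetical &lt;code&gt;notify&lt;/code&gt; argument. The function names and the in-memory &lt;code&gt;SENT&lt;/code&gt; list are stand-ins; a real hook would call your delivery channel and use your framework's own blocking mechanism.&lt;/p&gt;

```python
import json

SENT = []  # stands in for a real delivery channel such as a chat API


def send_notification(recipient: str, text: str) -> None:
    SENT.append((recipient, text))


def pre_tool_use(tool_name: str, args: dict) -> tuple[bool, str]:
    """Validate required metadata, then fire the side effect. Returns (allowed, reason)."""
    raw = args.get("notify")
    if not isinstance(raw, str):
        return False, "blocked: missing notify metadata"
    try:
        meta = json.loads(raw)
    except json.JSONDecodeError:
        return False, "blocked: notify metadata is not valid JSON"
    if not isinstance(meta, dict) or not isinstance(meta.get("to"), str) or not meta.get("message"):
        return False, "blocked: notify needs 'to' and 'message'"
    send_notification(meta["to"], meta["message"])  # side effect only after validation
    return True, "allowed"


ok = pre_tool_use("task", {
    "prompt": "triage inbox",
    "notify": json.dumps({"to": "hector", "message": "triage started"}),
})
print(ok, SENT)
print(pre_tool_use("task", {"prompt": "triage inbox"}))
```

&lt;p&gt;Validation runs before the side effect, so a malformed call is blocked without sending anything.&lt;/p&gt;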

&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;Aspect-oriented programming solved cross-cutting concerns in enterprise Java twenty years ago. The same pattern — interception at defined join points, transparent augmentation, composition of independent aspects — solves cross-cutting concerns in autonomous AI agent systems today.&lt;/p&gt;

&lt;p&gt;The difference is that agents can't be refactored to call the right methods. They're non-deterministic. They forget. They improvise. That's exactly why enforcement-triggered side effects are more powerful than explicit tool calls: you remove agent discretion from the equation entirely.&lt;/p&gt;

&lt;p&gt;One tool call. Governance satisfied. Side effects fired. Zero extra tokens. That's AOP for AI agents.&lt;/p&gt;

</description>
      <category>github</category>
      <category>copilotcli</category>
      <category>agenticdevelopment</category>
      <category>platformengineering</category>
    </item>
    <item>
      <title>GitHub Just Shipped What I Built 2 Months Ago — And That's a Good Thing</title>
      <dc:creator>Hector Flores</dc:creator>
      <pubDate>Wed, 20 May 2026 13:53:56 +0000</pubDate>
      <link>https://dev.to/htekdev/github-just-shipped-what-i-built-2-months-ago-and-thats-a-good-thing-1j9g</link>
      <guid>https://dev.to/htekdev/github-just-shipped-what-i-built-2-months-ago-and-thats-a-good-thing-1j9g</guid>
      <description>&lt;h2&gt;
  
  
  The Pattern Is Undeniable Now
&lt;/h2&gt;

&lt;p&gt;On May 18, GitHub made &lt;a href="https://github.blog/changelog/2026-05-18-remote-control-for-copilot-cli-sessions-now-generally-available-on-mobile-web-and-vs-code/" rel="noopener noreferrer"&gt;remote control for Copilot CLI sessions generally available&lt;/a&gt; — on mobile, web, and VS Code. You start a session on your workstation, scan a QR code, and steer your agent from your phone while walking the dog.&lt;/p&gt;

&lt;p&gt;I published &lt;a href="https://htek.dev/articles/copilot-cli-telegram-bridge-mobile-ai-terminal/" rel="noopener noreferrer"&gt;a 3,000-word guide to doing exactly this via Telegram&lt;/a&gt; on April 11. Same core concept: your AI agent runs on your machine, you interact with it from your pocket. Different implementation, identical insight.&lt;/p&gt;

&lt;p&gt;This isn't an "I told you so" moment. This is a &lt;strong&gt;validation moment&lt;/strong&gt;. When the team building the tool arrives at the same architectural conclusion you reached independently — mobile-first agent interaction isn't optional, it's inevitable — that tells you something important about where this industry is headed.&lt;/p&gt;

&lt;h2&gt;
  
  
  What GitHub Shipped
&lt;/h2&gt;

&lt;p&gt;The feature dropped in &lt;a href="https://github.blog/changelog/2026-04-13-remote-control-cli-sessions-on-web-and-mobile-in-public-preview/" rel="noopener noreferrer"&gt;public preview on April 13&lt;/a&gt; and hit GA on May 18. Here's the core workflow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Start a session: &lt;code&gt;copilot --remote&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;The CLI displays a link and QR code&lt;/li&gt;
&lt;li&gt;Open it in the GitHub Mobile app or any browser&lt;/li&gt;
&lt;li&gt;Your session streams in real time — you can steer, approve permissions, send follow-up prompts, switch modes, or stop execution entirely&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The GA release expanded the scope significantly: it now works with non-GitHub repositories, supports VS Code and JetBrains as surfaces, and lets you queue messages while the agent is mid-turn. The &lt;code&gt;--remote&lt;/code&gt; flag transforms your local agent into a service you can access from anywhere.&lt;/p&gt;

&lt;p&gt;This is excellent engineering. Clean, secure (sessions are private to the authenticated user), and integrated directly into the existing GitHub ecosystem. Business and Enterprise users get admin controls. The session link lives alongside your repo in the Agents tab. It's clearly a first-class feature, not an afterthought.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built in April
&lt;/h2&gt;

&lt;p&gt;My &lt;a href="https://htek.dev/articles/copilot-cli-telegram-bridge-mobile-ai-terminal/" rel="noopener noreferrer"&gt;Telegram bridge extension&lt;/a&gt; solves the same fundamental problem with a different architecture:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Bidirectional messaging&lt;/strong&gt; — every Telegram message becomes a prompt, every response forwards back&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Photo support&lt;/strong&gt; — send images from your phone for vision analysis&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Voice notes&lt;/strong&gt; — transcribed via Whisper and forwarded as text&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cron scheduling&lt;/strong&gt; — agents run on schedules, report back to Telegram automatically&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Custom tools&lt;/strong&gt; — &lt;code&gt;telegram_send_message&lt;/code&gt;, &lt;code&gt;telegram_send_photo&lt;/code&gt;, &lt;code&gt;telegram_get_status&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The entire thing is one &lt;code&gt;.mjs&lt;/code&gt; file in &lt;code&gt;.github/extensions/&lt;/code&gt;. No external servers, no Docker, no cloud functions. It uses the Telegram Bot API over HTTP — the same protocol Telegram has provided since 2015.&lt;/p&gt;

&lt;p&gt;Here's the key architectural difference: GitHub's remote sessions stream your existing CLI session to a viewer. My Telegram bridge creates a &lt;strong&gt;new interaction surface&lt;/strong&gt; — the agent is always listening, even when no terminal is open. Combined with &lt;a href="https://docs.github.com/en/copilot/github-copilot-in-the-cli/using-copilot-cli/scheduling-agents-with-cron" rel="noopener noreferrer"&gt;cron-scheduled agents&lt;/a&gt;, it becomes a persistent service. My daily briefing agent fires at 6:30 AM and sends me a compiled report in Telegram before I'm out of bed.&lt;/p&gt;

&lt;p&gt;I wrote more about this always-on pattern in &lt;a href="https://htek.dev/articles/copilot-home-assistant-ai-runs-my-household/" rel="noopener noreferrer"&gt;the article about open-sourcing my home assistant&lt;/a&gt; — 17 agents, 16 extensions, all orchestrated through Telegram.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Core Insight Both Approaches Share
&lt;/h2&gt;

&lt;p&gt;Strip away the implementation details and both GitHub's &lt;code&gt;--remote&lt;/code&gt; and my Telegram bridge express the same thesis:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;The interface to AI agents shouldn't be limited to the device running them.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This sounds obvious in hindsight. But look at the AI coding tool landscape even six months ago: every tool assumed you were sitting at your computer, staring at the terminal, actively supervising. The "agentic" revolution was still tethered to a physical desk.&lt;/p&gt;

&lt;p&gt;The insight that unlocks everything is recognizing that &lt;strong&gt;agents don't need real-time supervision&lt;/strong&gt; — they need periodic steering. And steering can happen from anywhere. A quick message from your phone while you're in line at the grocery store. A plan approval while waiting for your kid's soccer practice to end. A "stop, wrong approach" while scrolling on the couch.&lt;/p&gt;

&lt;p&gt;The reason mobile-first matters isn't convenience. It's &lt;strong&gt;parallelism&lt;/strong&gt;. When your agent interaction model requires a terminal in front of you, you're serializing your attention. One task at a time. But when your agent can work autonomously and you steer from your phone, suddenly you're genuinely running parallel workflows. The agent handles the mechanical work; you handle judgment calls asynchronously.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where the DIY Approach Goes Further
&lt;/h2&gt;

&lt;p&gt;GitHub's implementation is polished and production-ready out of the box. But an extension-based approach has capabilities that a platform-native solution can't easily replicate:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Multi-agent orchestration.&lt;/strong&gt; I'm not steering one session. I'm running &lt;a href="https://htek.dev/articles/53-agents-zero-chaos-multi-agent-orchestration-patterns/" rel="noopener noreferrer"&gt;53 agents&lt;/a&gt; that communicate via &lt;a href="https://htek.dev/articles/agent-mesh-cross-session-communication-copilot-cli/" rel="noopener noreferrer"&gt;cross-session mesh&lt;/a&gt;. An orchestrator agent dispatches work to specialized sub-agents — finance, content, scheduling, health — and they report back through Telegram. Try doing that with a single &lt;code&gt;--remote&lt;/code&gt; session.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Proactive notifications.&lt;/strong&gt; GitHub's remote sessions are pull-based: you open the link to check status. My Telegram bridge is push-based: the agent messages &lt;em&gt;me&lt;/em&gt; when something needs attention. "Your CI failed on PR #47." "Your briefing is ready." "The grocery order is confirmed." No polling required.&lt;/p&gt;
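
&lt;p&gt;To make the push model concrete, here is a minimal sketch of how an agent could message you through the Telegram Bot API's &lt;code&gt;sendMessage&lt;/code&gt; method. It is not the bridge's actual code: the function names, the placeholder token, and the chat id are assumptions for illustration.&lt;/p&gt;

```python
import json
import urllib.request

TELEGRAM_API = "https://api.telegram.org"  # Telegram Bot API base URL


def build_notification(bot_token: str, chat_id: int, text: str) -> urllib.request.Request:
    """Build (but do not send) a Bot API sendMessage request."""
    payload = json.dumps({"chat_id": chat_id, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{TELEGRAM_API}/bot{bot_token}/sendMessage",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def notify(bot_token: str, chat_id: int, text: str) -> None:
    """Push the message; an agent would call this when something needs attention."""
    request = build_notification(bot_token, chat_id, text)
    with urllib.request.urlopen(request, timeout=10):
        pass  # the response body is not needed for a fire-and-forget alert


if __name__ == "__main__":
    # Placeholder token and chat id (assumptions): nothing is sent here,
    # the request is only built and inspected.
    req = build_notification("123:ABC", 42, "Your CI failed on PR #47.")
    print(req.get_method(), req.full_url)
```

&lt;p&gt;Keeping request construction separate from sending makes the notification path easy to test without touching the network.&lt;/p&gt;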

&lt;p&gt;&lt;strong&gt;Governance hooks.&lt;/strong&gt; Because it's an extension, I wire it into my &lt;a href="https://htek.dev/articles/hookflows-governed-git-for-ai-agents/" rel="noopener noreferrer"&gt;hookflow system&lt;/a&gt; — approval gates, spending limits, safety protocols. The agent can't merge a PR without my explicit Telegram reply of "approved." That's not just remote access — it's remote governance.&lt;/p&gt;
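
&lt;p&gt;The approval gate can be sketched as a small, fail-closed check. This is an illustration rather than the hookflow implementation: the &lt;code&gt;Reply&lt;/code&gt; type, the owner id, and the case-insensitive match are assumptions.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reply:
    sender_id: int  # Telegram user id of whoever replied (hypothetical field)
    text: str       # raw reply text


def merge_allowed(reply: Reply, owner_id: int) -> bool:
    """Allow the merge only on an explicit "approved" from the owner.

    Anything else (another sender, "ok", "lgtm", silence) is a denial.
    """
    return reply.sender_id == owner_id and reply.text.strip().lower() == "approved"


if __name__ == "__main__":
    owner = 1001  # hypothetical owner id
    print(merge_allowed(Reply(owner, "approved"), owner))
    print(merge_allowed(Reply(owner, "looks fine"), owner))
    print(merge_allowed(Reply(2002, "approved"), owner))
```

&lt;p&gt;Matching one exact word keeps the gate unambiguous: a reply that is merely positive in tone never counts as approval.&lt;/p&gt;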

&lt;p&gt;&lt;strong&gt;Platform independence.&lt;/strong&gt; Telegram works on iOS, Android, desktop, web, tablets, and smartwatches. Messages you write offline are queued and sync when you reconnect. It doesn't require a GitHub account on the device. My wife can send my agent a message ("add diapers to the grocery list") without knowing what GitHub is.&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Convergence Means for the Industry
&lt;/h2&gt;

&lt;p&gt;When GitHub, a platform serving 150M+ developers, ships a feature that independent builders already prototyped — that's a signal. It means:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Mobile-first agent interaction is table stakes.&lt;/strong&gt; Every AI coding tool will ship this within 12 months. The desk-bound model is dead.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The extension ecosystem is where innovation happens.&lt;/strong&gt; My Telegram bridge existed months before the native feature because &lt;a href="https://docs.github.com/en/copilot/github-copilot-in-the-cli/developing-copilot-cli-extensions" rel="noopener noreferrer"&gt;Copilot CLI extensions&lt;/a&gt; let you build outside the product roadmap. The extensibility model is the product.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The real competition isn't between tools — it's between interaction paradigms.&lt;/strong&gt; Chat interfaces, terminal sessions, IDE panels, mobile apps, Telegram bots, voice commands — the winners will be platforms that support all of them simultaneously.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Agents are becoming services, not tools.&lt;/strong&gt; A tool requires your presence. A service works for you whether you're watching or not. GitHub's &lt;code&gt;--remote&lt;/code&gt; moves Copilot from tool toward service. Extensions like the Telegram bridge complete that transformation.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;GitHub's remote sessions will get better. I expect deeper mobile app integration, richer notification controls, and eventually multi-session management from the phone. The path from public preview to GA was fast, barely five weeks, which tells me the team has conviction about this direction.&lt;/p&gt;

&lt;p&gt;On my end, I'm pushing the Telegram bridge toward &lt;strong&gt;voice-first interaction&lt;/strong&gt;. Voice notes already work via Whisper transcription, but I want real-time voice conversations with my agent — think phone calls, not text messages. I'm also exploring &lt;a href="https://htek.dev/articles/phone-mcp-server-android-ai-assistant/" rel="noopener noreferrer"&gt;MCP-connected phones&lt;/a&gt; as a deeper integration layer where the agent doesn't just &lt;em&gt;receive&lt;/em&gt; messages from your phone — it &lt;em&gt;controls&lt;/em&gt; phone capabilities directly.&lt;/p&gt;

&lt;p&gt;The future isn't "AI in the terminal." The future is AI everywhere, steered from whatever device you're holding. GitHub just proved that isn't a fringe opinion — it's the roadmap.&lt;/p&gt;

&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.blog/changelog/2026-05-18-remote-control-for-copilot-cli-sessions-now-generally-available-on-mobile-web-and-vs-code/" rel="noopener noreferrer"&gt;GitHub Changelog: Remote control for Copilot CLI sessions (GA)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.blog/changelog/2026-04-13-remote-control-cli-sessions-on-web-and-mobile-in-public-preview/" rel="noopener noreferrer"&gt;GitHub Changelog: Remote control CLI sessions (Public Preview)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://htek.dev/articles/copilot-cli-telegram-bridge-mobile-ai-terminal/" rel="noopener noreferrer"&gt;My Telegram Bridge guide on htek.dev&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://htek.dev/articles/phone-mcp-server-android-ai-assistant/" rel="noopener noreferrer"&gt;Phone as MCP Server&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://htek.dev/articles/53-agents-zero-chaos-multi-agent-orchestration-patterns/" rel="noopener noreferrer"&gt;53 Agents, Zero Chaos&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://htek.dev/articles/agent-mesh-cross-session-communication-copilot-cli/" rel="noopener noreferrer"&gt;Agent Mesh: Cross-Session Communication&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://htek.dev/articles/copilot-cli-remote-access-your-agent-from-anywhere/" rel="noopener noreferrer"&gt;Copilot CLI Remote Access Deep Dive&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.github.com/en/copilot/github-copilot-in-the-cli/developing-copilot-cli-extensions" rel="noopener noreferrer"&gt;Copilot CLI Extensions documentation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>github</category>
      <category>copilotcli</category>
      <category>agenticdevelopment</category>
      <category>automation</category>
    </item>
    <item>
      <title>Platform Engineering with GitHub: Build Your IDP with Copilot, IssueOps, and Golden-Path Repos</title>
      <dc:creator>Hector Flores</dc:creator>
      <pubDate>Wed, 20 May 2026 02:42:04 +0000</pubDate>
      <link>https://dev.to/htekdev/platform-engineering-with-github-build-your-idp-with-copilot-issueops-and-golden-path-repos-4gah</link>
      <guid>https://dev.to/htekdev/platform-engineering-with-github-build-your-idp-with-copilot-issueops-and-golden-path-repos-4gah</guid>
      <description>&lt;p&gt;Every enterprise team I talk to is drowning in the same problem: &lt;strong&gt;toolchain sprawl&lt;/strong&gt;. Backstage instances nobody maintains. ServiceNow tickets that take 3 days to provision a repo. Confluence pages with onboarding steps from 2022. Developers spending 40% of their time fighting infrastructure instead of shipping product.&lt;/p&gt;

&lt;p&gt;Platform engineering promises to fix this, and the industry agrees. &lt;a href="https://www.gartner.com/en/articles/what-is-platform-engineering" rel="noopener noreferrer"&gt;Gartner predicted&lt;/a&gt; that by 2026, 80% of large software engineering organizations would establish platform teams. But here's what most teams get wrong: &lt;strong&gt;they think they need to build another tool.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;They don't. GitHub already is the platform.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Want the complete implementation?&lt;/strong&gt; This article covers the architecture overview. Newsletter subscribers get the real configs, full code, and step-by-step implementation details. &lt;a href="https://htek.dev/newsletter" rel="noopener noreferrer"&gt;Subscribe to the htek.dev newsletter →&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The Problem with "Build Your Own IDP"
&lt;/h2&gt;

&lt;p&gt;I've seen it play out the same way at every Fortune 500 company I've worked with. A platform team spins up a Backstage instance, spends 6 months building plugins, and ends up with a portal that developers still don't want to use — because it's &lt;em&gt;another tab&lt;/em&gt;. Another login. Another thing to maintain.&lt;/p&gt;

&lt;p&gt;Meanwhile, every developer on the team already lives in GitHub 8 hours a day.&lt;/p&gt;

&lt;p&gt;The insight that changed everything for me: &lt;strong&gt;the best platform is invisible&lt;/strong&gt;. It meets developers where they already are — in their IDE, in their pull requests, in their issues. You don't need a separate portal. You need GitHub, used correctly.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Golden Path Pattern
&lt;/h2&gt;

&lt;p&gt;A &lt;strong&gt;golden path&lt;/strong&gt; isn't a locked-down template. It's an opinionated default that accelerates developers without restricting them. Think of it like Rails conventions — you &lt;em&gt;can&lt;/em&gt; deviate, but the default path is so good that most people don't need to.&lt;/p&gt;

&lt;p&gt;In the GitHub ecosystem, golden paths are &lt;strong&gt;starter repos + Copilot context&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/htekdev/copilot-instructions-starter" rel="noopener noreferrer"&gt;copilot-instructions-starter&lt;/a&gt;&lt;/strong&gt; — Drop-in &lt;code&gt;.github/copilot-instructions.md&lt;/code&gt; templates that give Copilot the context to understand your org's conventions from day one.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/htekdev/copilot-agent-starter" rel="noopener noreferrer"&gt;copilot-agent-starter&lt;/a&gt;&lt;/strong&gt; — Scaffold custom Copilot CLI agents with proper extension architecture, hooks, and skill files.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/htekdev/copilot-life-os-starters" rel="noopener noreferrer"&gt;copilot-life-os-starters&lt;/a&gt;&lt;/strong&gt; — Full starter kits for building agentic systems on top of Copilot CLI.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When a new developer joins the team and creates a repo from your golden-path template, they inherit the right CI/CD pipelines, the right Copilot context, the right linting rules, and the right security policies. &lt;strong&gt;Onboarding drops from days to minutes.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  IssueOps: Eliminate the Ticketing Layer
&lt;/h2&gt;

&lt;p&gt;Why send developers to ServiceNow when they can just open a GitHub Issue?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;IssueOps&lt;/strong&gt; turns GitHub Issues into the interface for platform operations. Need a new environment? Open an issue with a specific label. Need a database provisioned? Issue template with the right inputs. GitHub Actions picks it up, runs the automation, and comments back with the result.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://github.com/htekdev/gh-aw-overview" rel="noopener noreferrer"&gt;gh-aw-overview&lt;/a&gt; repo demonstrates this pattern — using GitHub's native primitives (Issues, Actions, labels, comments) as the control plane for platform operations. Developers never leave GitHub. No context switching. No ticket queue.&lt;/p&gt;
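
&lt;p&gt;As a rough sketch of the IssueOps flow, the script below parses an issue body and routes it by label, the kind of step a GitHub Actions job might run. It assumes the issue was filed through an issue form, which renders each field as a &lt;code&gt;###&lt;/code&gt; heading followed by its value. The label name, field names, and handler are hypothetical and are not taken from the gh-aw-overview repo.&lt;/p&gt;

```python
def parse_issue_form(body: str) -> dict[str, str]:
    """Turn '### Field' headings and the text under them into a dict."""
    fields: dict[str, str] = {}
    current = None
    for line in body.splitlines():
        if line.startswith("### "):
            current = line[4:].strip()
            fields[current] = ""
        elif current is not None:
            fields[current] = (fields[current] + "\n" + line).strip()
    return fields


def provision_environment(inputs: dict[str, str]) -> str:
    # Hypothetical handler: a real one would call your provisioning automation.
    return f"Provisioning {inputs['Environment name']} in {inputs['Region']}"


# Label-to-handler routing table (label name is an assumption).
HANDLERS = {"provision-environment": provision_environment}


def dispatch(labels: list[str], body: str) -> str:
    """Return the comment the workflow would post back on the issue."""
    for label in labels:
        handler = HANDLERS.get(label)
        if handler is not None:
            return handler(parse_issue_form(body))
    return "No platform operation matched this issue's labels."


if __name__ == "__main__":
    body = "### Environment name\n\nstaging\n\n### Region\n\nus-east-1\n"
    print(dispatch(["provision-environment"], body))
```

&lt;p&gt;The routing table is the whole control plane here: adding a new self-service operation means adding one label and one handler.&lt;/p&gt;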

&lt;h2&gt;
  
  
  Hookflows: Governance Without Friction
&lt;/h2&gt;

&lt;p&gt;The hardest part of platform engineering isn't building the golden path — it's keeping people on it without becoming a bottleneck.&lt;/p&gt;

&lt;p&gt;This is where &lt;a href="https://github.com/htekdev/gh-hookflow" rel="noopener noreferrer"&gt;hookflows&lt;/a&gt; change the game. Hookflows intercept actions at the agent layer — validating commits, enforcing branch naming, checking policy compliance — &lt;em&gt;before&lt;/em&gt; they hit the repo. They're governance guardrails that run automatically.&lt;/p&gt;
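
&lt;p&gt;To illustrate the kind of check such a guardrail runs, here is a minimal branch-naming validator. It is a sketch of the idea only and does not use gh-hookflow's actual definition format: the naming convention and function name are assumptions.&lt;/p&gt;

```python
import re

# Assumed convention: type prefix, slash, lowercase kebab-case description.
BRANCH_PATTERN = re.compile(r"(feature|fix|chore)/[a-z0-9][a-z0-9-]*")


def check_branch_name(branch: str) -> tuple[bool, str]:
    """Return (allowed, message) so the caller can block and explain."""
    if BRANCH_PATTERN.fullmatch(branch):
        return True, "ok"
    return False, f"branch '{branch}' must match {BRANCH_PATTERN.pattern}"


if __name__ == "__main__":
    for name in ("feature/add-login", "Hotfix", "fix/"):
        allowed, message = check_branch_name(name)
        print(name, allowed, message)
```

&lt;p&gt;Returning a message along with the verdict matters for agents: a blocked agent that is told the expected pattern can correct itself instead of retrying blindly.&lt;/p&gt;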

&lt;p&gt;Combined with &lt;a href="https://github.com/htekdev/copilot-hooks-starter" rel="noopener noreferrer"&gt;copilot-hooks-starter&lt;/a&gt;, you get a pre-built framework for intercepting and validating agent operations. The &lt;a href="https://github.com/htekdev/copilot-ci-pipeline" rel="noopener noreferrer"&gt;copilot-ci-pipeline&lt;/a&gt; repo extends this into CI — giving you a full feedback loop from commit to deployment.&lt;/p&gt;

&lt;p&gt;I wrote more about this pattern in &lt;a href="https://htek.dev/articles/hookflows-governed-git-for-ai-agents/" rel="noopener noreferrer"&gt;my article on governing AI agents in git&lt;/a&gt; — the principles apply equally to human and AI-driven workflows.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 7-Repo Stack
&lt;/h2&gt;

&lt;p&gt;Here's the full stack, all open source and production-tested:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Repo&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://github.com/htekdev/copilot-instructions-starter" rel="noopener noreferrer"&gt;copilot-instructions-starter&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Org-wide Copilot context templates&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://github.com/htekdev/copilot-agent-starter" rel="noopener noreferrer"&gt;copilot-agent-starter&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Custom Copilot CLI agent scaffolding&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://github.com/htekdev/copilot-hooks-starter" rel="noopener noreferrer"&gt;copilot-hooks-starter&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Agent-layer governance hooks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://github.com/htekdev/copilot-ci-pipeline" rel="noopener noreferrer"&gt;copilot-ci-pipeline&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;CI feedback loop for AI-assisted dev&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://github.com/htekdev/gh-hookflow" rel="noopener noreferrer"&gt;gh-hookflow&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Governed git operations framework&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://github.com/htekdev/gh-aw-overview" rel="noopener noreferrer"&gt;gh-aw-overview&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;IssueOps platform operations pattern&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://github.com/htekdev/copilot-life-os-starters" rel="noopener noreferrer"&gt;copilot-life-os-starters&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Full agentic system starter kits&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;These aren't demos. I built and validated this stack while running a DevOps enablement platform at a Fortune 500 energy company — supporting hundreds of repos and dozens of development teams. The patterns scale.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why GitHub IS the Platform
&lt;/h2&gt;

&lt;p&gt;The realization that unlocked all of this: &lt;strong&gt;you don't need a separate platform layer.&lt;/strong&gt; GitHub already has:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Identity &amp;amp; access&lt;/strong&gt; (Teams, CODEOWNERS, branch protection)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Service catalog&lt;/strong&gt; (repo topics, README conventions, &lt;code&gt;copilot-instructions.md&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Self-service provisioning&lt;/strong&gt; (IssueOps + Actions)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compliance &amp;amp; governance&lt;/strong&gt; (hookflows, required checks, audit logs)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Developer AI&lt;/strong&gt; (GitHub Copilot with full repo context)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deployment&lt;/strong&gt; (Actions + environments + OIDC)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most of what teams build a Backstage portal for has a native GitHub equivalent, and developers already know how to use it. Your platform team's job isn't to build a portal. It's to &lt;strong&gt;configure GitHub as a platform&lt;/strong&gt; and encode golden paths that make the right thing the easy thing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Go Deeper
&lt;/h2&gt;

&lt;p&gt;If you're building (or rebuilding) an internal developer platform, I wrote a full implementation guide as part of &lt;a href="https://htek.dev/blueprints/the-agentic-development-blueprint" rel="noopener noreferrer"&gt;The Agentic Development Blueprint&lt;/a&gt; — including architecture diagrams, configuration files, and the decision framework for what goes in golden paths versus what stays flexible.&lt;/p&gt;

&lt;p&gt;For related patterns, check out &lt;a href="https://htek.dev/articles/github-copilot-cli-extensions-complete-guide/" rel="noopener noreferrer"&gt;my guide to Copilot CLI extensions&lt;/a&gt; and &lt;a href="https://htek.dev/articles/hookflows-governed-git-for-ai-agents/" rel="noopener noreferrer"&gt;how hookflows enforce governed git for AI agents&lt;/a&gt;.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;This was the architecture overview.&lt;/strong&gt; The newsletter issue has the step-by-step implementation — exact configs, IssueOps templates, hookflow definitions, and the full wiring diagram connecting all 7 repos into one cohesive IDP.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://htek.dev/newsletter" rel="noopener noreferrer"&gt;Subscribe to the htek.dev newsletter →&lt;/a&gt;&lt;/p&gt;

</description>
      <category>platformengineering</category>
      <category>github</category>
      <category>devops</category>
      <category>devex</category>
    </item>
  </channel>
</rss>
