~K¹yle Million

Posted on Apr 21

Claude Code in CI/CD: Running Autonomous Agents in GitHub Actions

#claudecode #devtools #aiagents #githubactions

Part of a series on Claude Code in production. Previous: Error Recovery Patterns

You've built an autonomous Claude Code agent that works locally. Now you want it to run in CI/CD: triggered on PR open, on a schedule, or as part of a deploy pipeline. The mechanics seem simple. The failure modes aren't.

This post covers the actual patterns for running Claude Code agents in GitHub Actions: authentication, tool approval, output handling, and the three mistakes that cause pipeline agents to silently fail.

Why CI/CD agents fail silently

Local Claude Code agents fail loudly. They block on prompts. They show you stderr. They stop when they can't proceed.

GitHub Actions agents fail silently. A subprocess exits with code 0. The pipeline passes. The work was never done.

Three root causes:

1. Missing --allowedTools causes a silent prompt block. When Claude Code encounters a tool call that needs approval, it prompts. In a non-TTY environment nobody can answer, so the prompt blocks until the runner timeout kills the process. Your pipeline "hangs," then fails, and your logs just show "process killed after 6h."

2. Output goes to stdout but gets swallowed. Claude Code's --output-format text mode writes the agent's output to stdout. If you don't explicitly capture it, it disappears into the Actions log. You'll have no artifact, no evidence the agent ran, nothing to route to downstream steps.

3. Auth fails at runtime, not at startup. The ANTHROPIC_API_KEY gets injected correctly, the process starts, and then the first API call fails with a 401. Exit code 1. The pipeline shows "Error" but nothing about what went wrong.

The fix for all three is in how you invoke Claude Code.

The invocation pattern

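- name: Run Claude Code agent
  env:
    ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
  run: |
    # TASK_PROMPT must already be set (for example under env:) before this runs.
    mkdir -p outputs  # the redirects below fail if the directory is missing
    status=0
    claude -p "$TASK_PROMPT" \
      --allowedTools "Bash(*),Read(*),Write(*),Edit(*)" \
      --output-format text \
      > outputs/agent_result.txt 2>outputs/agent_errors.txt || status=$?
    # Report the real exit code, then pass it on so the step fails when claude does.
    echo "Exit code: $status"
    exit "$status"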

Three things happening here:

-p (print mode): Non-interactive headless execution. No TTY needed. The agent runs, writes output, and exits. This is the mode to always use in CI/CD.

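--allowedTools: Pre-approves tool calls. Without this, Claude Code silently blocks on the first tool call that needs approval. List every tool your agent uses. Bash(*) approves all bash commands; a narrower rule restricts the agent to specific commands. This post writes that as Bash(cd,ls,cat); the Claude Code permissions documentation writes one rule per command prefix, for example Bash(ls:*), so confirm the form your installed version accepts. The asterisk is appropriate for trusted pipelines; be more restrictive for public PRs.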

Output redirect: > outputs/agent_result.txt captures the agent's output as a file artifact. 2>outputs/agent_errors.txt captures stderr separately. Upload both as Actions artifacts so you can debug failures without re-running.
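The "exit 0 but no work done" case can be caught with a small guard after the agent step. This is a minimal sketch, not part of Claude Code: require_output is a hypothetical helper name, and it only checks that the captured file exists and is non-empty.

```shell
# require_output is a hypothetical helper: fail when the agent "succeeded"
# but the captured output file is missing or empty.
require_output() {
  local file="$1"
  if [ ! -s "$file" ]; then
    echo "agent produced no output in $file" >&2
    return 1
  fi
  # Log the size so the Actions log shows evidence that the agent ran.
  echo "agent output: $(wc -c < "$file" | tr -d ' ') bytes in $file"
}
```

Calling require_output outputs/agent_result.txt as the last line of the run block makes an empty result fail the step instead of passing quietly.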

Authentication in Actions

env:
  ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}

This is the minimum. Claude Code reads ANTHROPIC_API_KEY from the environment.

If your agent calls other APIs (Supabase, Stripe, Railway, etc.), those credentials need to be injected the same way:

env:
  ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
  SUPABASE_URL: ${{ secrets.SUPABASE_URL }}
  SUPABASE_SERVICE_KEY: ${{ secrets.SUPABASE_SERVICE_KEY }}
  STRIPE_SECRET_KEY: ${{ secrets.STRIPE_SECRET_KEY }}

Don't try to source a .env file in CI/CD. The file won't exist in the runner environment, and if it does (because you committed it), you have a bigger problem. Inject each credential explicitly.
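A secret that is not defined in the repository reaches the step as an empty string, so a preflight check can turn a late 401 into an early, readable failure. The sketch below assumes nothing beyond a POSIX-style shell; require_env is a hypothetical helper name, and it checks only that each variable is non-empty, not that the key is valid.

```shell
# require_env is a hypothetical helper: report every named variable that is
# unset or empty, and return non-zero if any are missing.
require_env() {
  local name value missing
  missing=0
  for name in "$@"; do
    # Look up the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing required secret: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Running require_env ANTHROPIC_API_KEY SUPABASE_URL before the claude invocation names the missing credential in the log instead of leaving a bare "Error".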

The CLAUDE.md in CI/CD pipelines

Claude Code auto-discovers CLAUDE.md in the working directory. This matters in CI/CD: if your repo has a CLAUDE.md, the agent running in your pipeline will read it.

This is a feature, not a bug — as long as your CLAUDE.md is written for it.

Patterns that work well in CI/CD CLAUDE.md:

## CI/CD Context
When running in GitHub Actions (detect: CI=true env var):
- Write all outputs to ./ci_outputs/ directory
- Do not prompt for confirmation on any action
- Exit 0 on success, exit 1 on any unrecoverable error
- Include a summary section at the end of every output file

Patterns that break CI/CD:

## Ask Kyle Before...

Any instruction that assumes human review will cause the agent to either silently skip the check or block.
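One way to catch such instructions before they reach a pipeline is a lint step over CLAUDE.md. This is a rough sketch: find_human_gates is a hypothetical helper, and the phrase list is an assumption to adapt to how your own CLAUDE.md is worded.

```shell
# find_human_gates is a hypothetical helper: print CLAUDE.md lines that assume
# a human is available. The phrase list is an assumption, not an exhaustive set.
find_human_gates() {
  # -n prints line numbers, -i ignores case, -E enables alternation.
  # grep exits 0 when something matched and 1 when nothing did.
  grep -n -i -E 'ask .+ before|wait for (approval|confirmation)|confirm with' "$1"
}
```

In a workflow step, `if find_human_gates CLAUDE.md; then exit 1; fi` fails the job and prints the offending lines.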

Three useful pipeline patterns

1. PR review agent

Trigger: pull_request event. Agent reads the diff, checks against architectural guidelines, posts a comment.

name: Claude Code PR Review

on:
  pull_request:
    types: [opened, synchronize]

jobs:
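  review:
    runs-on: ubuntu-latest
    permissions:
      contents: read         # needed by actions/checkout
      pull-requests: write   # lets gh pr comment post with GITHUB_TOKEN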
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Install Claude Code
        run: npm install -g @anthropic-ai/claude-code

      - name: Run review agent
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          PR_NUMBER: ${{ github.event.number }}
        run: |
          mkdir -p ci_outputs  # the redirect below fails if the directory is missing
          DIFF=$(git diff origin/${{ github.base_ref }}...HEAD)
          claude -p "Review this PR diff for architectural issues, security vulnerabilities, and violations of our CLAUDE.md guidelines. PR #$PR_NUMBER. Diff: $DIFF. Post findings as a GitHub PR comment using: gh pr comment $PR_NUMBER --body 'findings'" \
            --allowedTools "Bash(gh,git),Read(*)" \
            --output-format text \
            > ci_outputs/review_result.txt

      - uses: actions/upload-artifact@v4
        if: always()
        with:
          name: review-output
          path: ci_outputs/

The key: --allowedTools "Bash(gh,git),Read(*)" — only the specific commands the review agent needs. gh for posting the comment, git for reading the diff. Nothing else.
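The workflow above passes the whole diff as part of one command-line argument. Linux caps a single argument at 128 KiB, and a large diff also raises the cost of the run, so it may be worth capping the diff before it goes into the prompt. A minimal sketch follows; cap_diff is a hypothetical helper and the byte limit is an assumption you would tune.

```shell
# cap_diff is a hypothetical helper: copy stdin to stdout, truncated to at
# most $1 bytes, with a marker line so the agent knows the diff is incomplete.
cap_diff() {
  local max="$1" tmp
  tmp=$(mktemp)
  cat > "$tmp"
  if [ "$(wc -c < "$tmp" | tr -d ' ')" -gt "$max" ]; then
    head -c "$max" "$tmp"
    printf '\n[diff truncated at %s bytes]\n' "$max"
  else
    cat "$tmp"
  fi
  rm -f "$tmp"
}
```

For example, `DIFF=$(git diff origin/main...HEAD | cap_diff 100000)` keeps the prompt under the argument limit with room left for the instructions around it.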

2. Scheduled maintenance agent

Trigger: cron schedule. Agent checks for stale dependencies, expired credentials, drift from baseline.

name: Weekly Maintenance

on:
  schedule:
    - cron: '0 9 * * 1'  # Monday 9am UTC

jobs:
  maintain:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install Claude Code
        run: npm install -g @anthropic-ai/claude-code

      - name: Run maintenance agent
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: |
          mkdir -p ci_outputs  # make sure the report directory exists before the agent writes to it
          claude -p "Run weekly maintenance: check package.json for outdated major versions, verify .env.example matches current .env schema, check for any TODO comments added in the last 7 days. Write a maintenance report to ci_outputs/maintenance_$(date +%Y%m%d).md" \
            --allowedTools "Bash(npm,git,grep,find,date),Read(*),Write(*)" \
            --output-format text

      - uses: actions/upload-artifact@v4
        with:
          name: maintenance-report
          path: ci_outputs/

3. Post-deploy verification agent

Trigger: after deploy step completes. Agent hits the deployed endpoints, checks health, writes a verification report.

- name: Verify deployment
  env:
    ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
    DEPLOY_URL: ${{ steps.deploy.outputs.url }}
  run: |
    mkdir -p ci_outputs
    claude -p "Verify the deployment at $DEPLOY_URL. Check: /health endpoint returns 200, /api/status returns expected schema, response times under 500ms. Write verification report to ci_outputs/deploy_verify.md. End the report with a final line that reads exactly VERDICT: PASS or VERDICT: FAIL." \
      --allowedTools "Bash(curl,jq),Write(*)" \
      --output-format text \
      > ci_outputs/verify.txt
    cat ci_outputs/verify.txt
    # The agent cannot set this step's exit status, so the shell checks the verdict.
    grep -qx 'VERDICT: PASS' ci_outputs/deploy_verify.md

Note the VERDICT line in the prompt and the grep after the run. This is how you make a Claude Code agent fail the pipeline. Claude Code itself exits 0 unless the process errors, and an exit 1 that the agent runs inside its Bash tool ends only that subprocess, not the step. If you want business-logic failures to fail the pipeline, have the agent record a verdict and let the shell exit non-zero on it.

Cost control in CI/CD

Every pipeline run makes billable API calls, usually many per agent session. On a busy repo, uncontrolled agents will drain your API budget.

Mechanisms to control it:

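Model routing: A sentence in the prompt such as "Use claude-haiku-4-5 for this task" does not change which model serves the request. Select the model at invocation time by passing --model claude-haiku-4-5 to claude.

Or set ANTHROPIC_MODEL=claude-haiku-4-5-20251001 in the environment before the claude invocation. A structured review, as opposed to open-ended reasoning, is a good fit for the smaller model.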

Scope restriction: Narrow --allowedTools to exactly what the agent needs. An agent that can only Read(*) can't accidentally trigger expensive downstream operations.

Conditional execution:

- name: Run expensive agent
  if: github.event_name == 'push' && github.ref == 'refs/heads/main'
  run: claude -p "..." ...

Only run the expensive agent on main branch pushes, not on every PR commit.

Output caching: If the agent's inputs haven't changed (same files, same config), skip the run:

- uses: actions/cache@v4
  id: agent-cache
  with:
    path: ci_outputs/
    key: agent-${{ hashFiles('src/**/*.ts') }}

- name: Run agent
  if: steps.agent-cache.outputs.cache-hit != 'true'
  run: claude -p "..." ...
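The same skip-if-unchanged idea can be expressed inside a step, which helps when you want to test it locally or run outside actions/cache. The sketch below assumes GNU coreutils (sha256sum, sort -z, xargs -0 -r), as found on Ubuntu runners. inputs_fingerprint, inputs_changed and the stamp file are hypothetical names.

```shell
# inputs_fingerprint is a hypothetical helper: hash every file under a
# directory in a stable order, then hash the list of per-file hashes.
inputs_fingerprint() {
  (cd "$1" && find . -type f -print0 | LC_ALL=C sort -z \
    | xargs -0 -r sha256sum | sha256sum | cut -d' ' -f1)
}

# inputs_changed returns 0 (run the agent) when no stamp file exists yet or
# when the stored fingerprint in $2 differs from the current one for $1.
inputs_changed() {
  local current
  current=$(inputs_fingerprint "$1")
  [ ! -f "$2" ] || [ "$(cat "$2")" != "$current" ]
}
```

A step would call `inputs_changed src ci_outputs/.inputs.sha256`, run the agent only when it returns 0, and then write the new fingerprint to the stamp file.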

The approval boundary in CI/CD

The hardest part of CI/CD agents isn't the mechanics — it's knowing what the agent is allowed to do without human review.

For public repositories with external contributors, an agent triggered by a PR should never have credentials that allow writes to production. It should only read and comment.

For internal automation, the boundary is:

Self-authorized: Read any file, write to designated output dirs, call read-only APIs, post comments
Requires review: Write to production databases, deploy to production, send external communications, modify CI/CD config

Write this boundary into your CLAUDE.md. Make the agent refuse anything outside it and write an explanation to ci_outputs/blocked.md instead.
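A refusal written to ci_outputs/blocked.md only helps if the pipeline notices it. This is a minimal sketch of a follow-up check, assuming the agent writes that file only when it refuses something. fail_if_blocked is a hypothetical helper name.

```shell
# fail_if_blocked is a hypothetical helper: if the agent left a non-empty
# refusal report, print it to stderr and return non-zero so the step fails.
fail_if_blocked() {
  local report="${1:-ci_outputs/blocked.md}"
  if [ -s "$report" ]; then
    echo "agent refused an out-of-bounds action:" >&2
    cat "$report" >&2
    return 1
  fi
}
```

Calling fail_if_blocked right after the agent step surfaces the explanation in the Actions log and routes the run to a human.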

What I'm running in production

My current Claude Code CI/CD setup:

10-minute inbox check (every 10min, 7am–11pm): reads task files, executes, archives
Weekly ClawMart report (Monday 9am): listing health, revenue data, next actions
Post-deploy verification: health endpoint checks after every Railway deploy
dev.to ban check (removed, purpose complete): was a scheduled check every 6 hours for API access

All of these are headless -p mode with explicit --allowedTools. All write outputs as files to a designated directory. All send Telegram notifications on completion/failure.

The pattern is the same for all: narrow tool approval, file-based outputs, explicit exit codes, notification on completion.

The single failure mode to avoid

The most expensive mistake in CI/CD agents: an agent that hangs instead of failing.

A hanging agent occupies a runner slot for hours and produces no output for the pipeline. GitHub Actions will kill it after 6 hours by default, and by then you've paid for the runner time plus whatever API calls the agent made before it stalled.

Prevent it: --allowedTools must include every tool the agent will call. Test locally with claude -p "..." before wiring into CI/CD. Catch any approval prompts. For fast agents, enforce a hard deadline at the runner level with timeout-minutes: 5 on the step or job (a --timeout flag does not appear in the Claude Code CLI reference).
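A deadline can also be enforced inside the step itself. The sketch below assumes GNU coreutils timeout is available on the runner, which exits with status 124 when it has to kill the command. run_with_deadline is a hypothetical wrapper name.

```shell
# run_with_deadline is a hypothetical wrapper around coreutils `timeout`:
# run a command with a hard limit in seconds and keep its exit status.
run_with_deadline() {
  local seconds="$1" status=0
  shift
  timeout "$seconds" "$@" || status=$?
  # GNU timeout reports 124 when the limit was hit and the command was killed.
  if [ "$status" -eq 124 ]; then
    echo "agent exceeded ${seconds}s deadline and was terminated" >&2
  fi
  return "$status"
}
```

For a five-minute limit the invocation becomes `run_with_deadline 300 claude -p "$TASK_PROMPT" ...`, so a hang turns into a fast failure with a message that names the cause.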

The full CI/CD integration pattern — with the CLAUDE.md template, the GitHub Actions workflow files, and the cost-control configuration — is packaged as a skill at shopclawmart.com/@thebrierfox. If you're wiring this into a production pipeline, the skeleton is there.

~K¹ (W. Kyle Million) / IntuiTek¹
