How I Cut My GitHub Actions CI From 11 to 4 Minutes and Added Claude-Powered Test Triage, Deploy, and Slack Alerts

#githubactions #cicd #claude #python

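⚠️ This article contains affiliate advertising (promotion). Part of the revenue generated through the links goes to the site operator; this does not affect the price readers pay.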

If you copy the workflow file and the triage script in this article, you'll get a GitHub Actions pipeline that does four things from a single push. It runs your Python test suite, asks the Claude API to triage only the failures into a human-readable root-cause summary, deploys to your server on green, and posts a Slack message that tells a human whether they need to wake up. No paid CI add-ons, no third-party bots. I run this on a real repo today, and it cut my feedback loop from 11m20s to 4m05s on a 900-test suite.

This is not a "here's the YAML, good luck" post. I'll show you the exact mistakes that cost me two evenings: the pip cache that silently never hit, the Claude triage step that ran on green builds and burned tokens for nothing, and the deploy that fired twice because I forgot concurrency.

The actual problem: GitHub Actions logs are unreadable when 7 of 900 tests fail

The default failure UX is brutal. A red X, then you click in, scroll through 4,000 lines of pytest output, and try to find the 7 assertions that actually broke. My team was averaging ~6 minutes just locating failures before anyone started fixing them.

So here's the angle: I don't use AI to write tests or generate code in CI. I use it as a log compressor that runs only on failure. The job pulls the failures out of pytest's machine-readable report and sends them to the Claude API. It gets back a ranked summary such as: "3 failures share one root cause (a timezone change in utils/dates.py); 4 are flaky network tests." That single change is what made people actually read the CI output.

Step 1: The pytest + pip cache job that's actually 2.7x faster

First, the test job. The non-obvious part is --json-report (from pytest-json-report), which gives us a structured artifact instead of scraping stdout later. Save this as .github/workflows/ci.yml:

name: CI
on:
  push:
    branches: [main]
  pull_request:

concurrency:
  group: ci-${{ github.ref }}
  cancel-in-progress: true

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
          cache: "pip"            # <- this one line saved ~90s/run

      - name: Install deps
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt
          pip install pytest pytest-json-report

      - name: Run tests
        id: pytest
        run: |
          pytest -q --json-report --json-report-file=report.json
        continue-on-error: true   # <- we WANT to reach the triage step on failure

      - name: Upload test report
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: pytest-report
          path: report.json

      - name: Fail job if tests failed
        if: steps.pytest.outcome == 'failure'
        run: exit 1

The failure story: I originally cached with a hand-rolled actions/cache keyed on hashFiles('requirements.txt'). It never hit, because my requirements.txt was generated by pip-compile with a timestamp comment that changed every run. The hash changed, the cache missed, and every build reinstalled 140 packages (~90s wasted). Switching to setup-python's built-in cache: "pip" fixed it for me.

One caveat: setup-python also derives its cache key from a hash of your dependency files. That is requirements.txt by default, configurable with cache-dependency-path. So if a generator writes volatile lines into that file, strip them as well. Check your own builds: if "Cache restored" never appears, you're paying this tax.
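Here is a small illustration of why a volatile header line defeats a cache keyed on the file's hash. It models the key as a SHA-256 over the raw file contents. hashFiles does use SHA-256, but its exact multi-file scheme is not reproduced here. The two requirements strings are made up, and the `normalized_key` helper is hypothetical; it shows one way to hash only the pinned lines.

```python
import hashlib


def raw_key(text: str) -> str:
    # Hash of the exact bytes, the way a hashFiles()-style key sees the file.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]


def normalized_key(text: str) -> str:
    # Hypothetical helper: drop blank and comment-only lines before hashing,
    # so only the pinned requirements influence the key.
    kept = [
        line.strip()
        for line in text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    ]
    return hashlib.sha256("\n".join(kept).encode("utf-8")).hexdigest()[:12]


# Two generated files with identical pins but a different timestamp comment.
monday = "# generated 2024-06-03 09:14\nrequests==2.32.3\npytest==8.2.2\n"
tuesday = "# generated 2024-06-04 10:02\nrequests==2.32.3\npytest==8.2.2\n"

print(raw_key(monday) == raw_key(tuesday))                # False: cache miss every run
print(normalized_key(monday) == normalized_key(tuesday))  # True: stable key
```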

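The other subtle bit is continue-on-error: true on the pytest step, paired with an explicit exit 1 at the end. By default, a failing step skips every later step in the job unless that step has if: always(). Here, that would mean no uploaded report, and the triage in the notify job would have nothing to read. The upload step's own if: always() already guards against this, so the pair is belt and braces. What it buys you is explicit intent: we separate "collect the result" from "set the job status."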

Step 2: Claude triages only the failures (and only when there are failures)

Here's the script that the notify job (Step 3) runs. It reads report.json, extracts only failed tests with their tracebacks, and sends them to the Claude API. The critical optimization: if there are zero failures, it exits before making an API call. On a green build this costs $0.00.

# scripts/triage.py
import json, os, sys
from anthropic import Anthropic

if not os.path.exists("report.json"):
    # pytest crashed before writing a report; let the Slack step send the generic alert.
    print("report.json not found. Skipping AI triage.")
    sys.exit(0)

with open("report.json") as f:
    report = json.load(f)

# "error" covers setup/teardown failures (e.g. a broken fixture), which
# pytest-json-report reports separately from failures in the test body.
failures = [t for t in report.get("tests", []) if t.get("outcome") in ("failed", "error")]

if not failures:
    print("No failed tests in report.json. Skipping AI triage (no API call).")
    sys.exit(0)

# Keep payload small: nodeid + the longrepr of whichever phase broke.
lines = []
for t in failures[:40]:                       # cap to control token cost
    crash = next(
        (t[phase]["longrepr"] for phase in ("setup", "call", "teardown")
         if t.get(phase, {}).get("longrepr")),
        "",
    )
    lines.append(f"### {t['nodeid']}\n{crash[:1500]}")

prompt = (
    "You are triaging a failing CI run. Group these pytest failures by likely "
    "root cause. For each group give: (1) one-line root cause, (2) the files to "
    "look at, (3) whether it looks flaky (network/timing) vs a real regression. "
    "Be terse. Output GitHub-flavored markdown.\n\n" + "\n\n".join(lines)
)

client = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
resp = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": prompt}],
)
summary = resp.content[0].text

# Hand the summary to later steps via the job summary + an output file.
with open(os.environ["GITHUB_STEP_SUMMARY"], "a") as f:
    f.write("## 🤖 Claude failure triage\n\n" + summary)

with open("triage.md", "w") as f:
    f.write(summary)

print(summary)
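Before wiring the script into CI, you can exercise its extraction half locally without an API key. In this sketch, the `extract_failures` function and the `fake_report` dictionary are hypothetical. The dictionary is a heavily trimmed stand-in for pytest-json-report output; real reports carry many more keys. The function also treats "error" outcomes and setup-phase tracebacks as failures, since that is where broken fixtures show up.

```python
def extract_failures(report: dict, max_tests: int = 40, max_chars: int = 1500) -> list[str]:
    """Return one markdown chunk per failed/errored test, capped like triage.py."""
    broken = [
        t for t in report.get("tests", [])
        if t.get("outcome") in ("failed", "error")
    ]
    chunks = []
    for t in broken[:max_tests]:
        # Take the first phase that recorded a traceback (setup errors come first).
        longrepr = next(
            (t[p]["longrepr"] for p in ("setup", "call", "teardown")
             if t.get(p, {}).get("longrepr")),
            "",
        )
        chunks.append(f"### {t['nodeid']}\n{longrepr[:max_chars]}")
    return chunks


# Hypothetical, trimmed report in the shape pytest-json-report writes.
fake_report = {
    "tests": [
        {"nodeid": "tests/test_dates.py::test_tz", "outcome": "failed",
         "call": {"longrepr": "AssertionError: 09:00 != 10:00"}},
        {"nodeid": "tests/test_api.py::test_ok", "outcome": "passed",
         "call": {"outcome": "passed"}},
        {"nodeid": "tests/test_db.py::test_conn", "outcome": "error",
         "setup": {"longrepr": "fixture 'db' raised ConnectionError"}},
    ]
}

for chunk in extract_failures(fake_report):
    print(chunk, end="\n\n")
```

Running it prints the two broken tests and skips the passing one. Because the caps are parameters, the same function makes it easy to unit-test how much text a pathological run would send.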

Notice three concrete decisions baked in from getting burned:

failures[:40] and crash[:1500]. My first version sent the full 4,000-line log. One run cost me a 38k-token prompt because a fixture error spammed the same traceback 200 times. Capping the number of failures and truncating each traceback brought a typical triage call down to ~2-4k input tokens. That is roughly half a cent to a cent of input on Sonnet.
Writing to GITHUB_STEP_SUMMARY. This renders the AI summary directly on the run's summary page — you see the root cause without clicking into logs. This is the single highest-leverage line in the whole pipeline.
model="claude-sonnet-4-6" not Opus. Triage is summarization, not reasoning-heavy. Sonnet is fast and cheap enough that I never think about the bill.
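The caps also put a hard ceiling on the prompt, which is easy to sanity-check. This back-of-the-envelope sketch makes three assumptions:

- roughly 4 characters per token, a common rule of thumb for English and code rather than an exact tokenizer count;
- Sonnet-tier list prices of $3 per million input tokens;
- $15 per million output tokens.

The `prompt_overhead_chars` allowance is also an assumption. Check current pricing before relying on the dollar figures.

```python
CHARS_PER_TOKEN = 4           # rough heuristic for English and code, not a tokenizer
INPUT_USD_PER_MTOK = 3.00     # assumed Sonnet-tier input price
OUTPUT_USD_PER_MTOK = 15.00   # assumed Sonnet-tier output price


def worst_case_cost(max_tests=40, max_chars=1500, max_output_tokens=1024,
                    prompt_overhead_chars=3_000):
    """Upper bound on one triage call given the caps in scripts/triage.py.

    prompt_overhead_chars is an assumed allowance for the instruction text
    and the "### nodeid" headers.
    """
    input_chars = max_tests * max_chars + prompt_overhead_chars
    input_tokens = input_chars / CHARS_PER_TOKEN
    cost = (input_tokens * INPUT_USD_PER_MTOK
            + max_output_tokens * OUTPUT_USD_PER_MTOK) / 1_000_000
    return input_tokens, cost


tokens, usd = worst_case_cost()
print(f"worst case: ~{tokens:,.0f} input tokens, ~${usd:.3f} per call")

# A typical call per the text: ~3k input tokens, plus an assumed ~500 output tokens.
typical = (3_000 * INPUT_USD_PER_MTOK + 500 * OUTPUT_USD_PER_MTOK) / 1_000_000
print(f"typical: ~${typical:.4f} per call")
```

Whatever the exact tokenizer says, the two caps bound even a pathological red build at a few cents per call.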

Step 3: Wire triage, deploy, and Slack into the same workflow

The test job lives in ci.yml. Deploy and notify go in separate jobs that depend on it, so the dependency graph is explicit. Add these deploy and notify jobs to the same file, after test:

  deploy:
    needs: test
    if: github.ref == 'refs/heads/main' && needs.test.result == 'success'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Deploy over SSH
        uses: appleboy/ssh-action@v1.0.3
        with:
          host: ${{ secrets.DEPLOY_HOST }}
          username: ${{ secrets.DEPLOY_USER }}
          key: ${{ secrets.DEPLOY_SSH_KEY }}
          script: |
            cd /srv/app
            git pull --ff-only
            docker compose up -d --build

  notify:
    needs: [test, deploy]
    if: always()
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: "3.12" }

      - name: Download report
        if: needs.test.result == 'failure'
        uses: actions/download-artifact@v4
        with: { name: pytest-report }

      - name: Run Claude triage
        if: needs.test.result == 'failure'
        continue-on-error: true   # an API hiccup shouldn't swallow the Slack alert
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: |
          pip install anthropic
          python scripts/triage.py

      - name: Post to Slack
        env:
          SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
          STATUS: ${{ needs.test.result }}
          DEPLOYED: ${{ needs.deploy.result }}
        run: |
          if [ "$STATUS" = "failure" ] && [ -f triage.md ]; then
            # $'\n' is a real newline; "\n" inside double quotes would reach Slack as a literal backslash-n
            TEXT="❌ CI failed on \`${GITHUB_REF_NAME}\`. Triage:"$'\n'"$(cat triage.md)"
          elif [ "$DEPLOYED" = "success" ]; then
            TEXT="✅ Deployed \`${GITHUB_SHA:0:7}\` to prod."
          else
            TEXT="⚠️ Pipeline finished with status: tests=$STATUS deploy=$DEPLOYED"
          fi
          # TEXT must be in python's environment; words after python -c "..." only land in sys.argv
          PAYLOAD=$(TEXT="$TEXT" python -c "import json,os;print(json.dumps({'text':os.environ['TEXT']}))")
          curl -s -X POST -H 'Content-type: application/json' --data "$PAYLOAD" "$SLACK_WEBHOOK"
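If the inline shell gets hard to maintain, the same branching can live in a small Python script. This is only a sketch. The `scripts/notify.py` name and the `build_message` and `post_to_slack` helpers are hypothetical. It reads the same STATUS, DEPLOYED, and SLACK_WEBHOOK variables the step above sets, plus GitHub's default GITHUB_REF_NAME and GITHUB_SHA. It posts with the standard library's urllib, so no extra install is needed, and it falls back to printing when no webhook is configured.

```python
# scripts/notify.py (hypothetical): Python version of the "Post to Slack" step
import json
import os
import urllib.request


def build_message(status, deployed, ref_name, sha, triage_text=None):
    """Mirror the shell step's three branches."""
    if status == "failure" and triage_text:
        return f"❌ CI failed on `{ref_name}`. Triage:\n{triage_text}"
    if deployed == "success":
        return f"✅ Deployed `{sha[:7]}` to prod."
    return f"⚠️ Pipeline finished with status: tests={status} deploy={deployed}"


def post_to_slack(webhook_url, text):
    # Slack incoming webhooks accept a JSON body with a "text" field.
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status


if __name__ == "__main__":
    triage = None
    if os.path.exists("triage.md"):
        with open("triage.md", encoding="utf-8") as f:
            triage = f.read()
    text = build_message(
        os.environ.get("STATUS", ""),
        os.environ.get("DEPLOYED", ""),
        os.environ.get("GITHUB_REF_NAME", ""),
        os.environ.get("GITHUB_SHA", ""),
        triage,
    )
    webhook = os.environ.get("SLACK_WEBHOOK")
    if webhook:
        post_to_slack(webhook, text)
    else:
        print(text)  # dry run when no webhook is configured
```

Keeping `build_message` a pure function means the message logic can be unit-tested without touching Slack.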

The deploy-twice bug that `concurrency` fixed

My worst failure: I merged two PRs within 90 seconds. Both main pushes triggered the pipeline, both passed, and both deployed — the second docker compose up started mid-build of the first, and prod served a half-built image for ~40 seconds. The fix is the concurrency block at the top of ci.yml with cancel-in-progress: true. The older run gets cancelled the moment a newer one starts. If you deploy from CI and don't have this, you have this bug and just haven't hit it yet.

A second gotcha: the notify job uses if: always() so it runs even when tests fail, but each step inside it is guarded (if: needs.test.result == 'failure'). Without the per-step guards, the triage step runs on green builds, fails to find report.json, and reds the whole pipeline for no reason.

What the numbers actually look like

After two weeks on this setup across ~120 runs:

CI wall time: 11m20s → 4m05s (pip cache hit + cancel-in-progress killing redundant runs).
Time-to-locate-failure: ~6 min → under 30 sec. People read the Slack triage and go straight to the named file.
Claude cost: ~$0.90 total, because triage only runs on red builds (about 18% of runs) and the payload is capped.
One real save: the triage correctly grouped 5 failures as a single pytz deprecation and flagged 2 as flaky DNS — a human would have opened 7 tabs.
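The headline figures are easy to re-derive from the numbers above. The only inputs are the article's own measurements. The per-triage figure is an average over the whole period, not a billed per-call amount.

```python
def to_seconds(minutes: int, seconds: int) -> int:
    return minutes * 60 + seconds


before = to_seconds(11, 20)   # 680 s
after = to_seconds(4, 5)      # 245 s
speedup = before / after

runs = 120
triaged = runs * 0.18         # triage only runs on red builds
cost_per_triage = 0.90 / triaged

print(f"speedup: {speedup:.2f}x, saved {before - after} s per run")
print(f"triaged runs: ~{triaged:.0f}, average cost per triage: ~${cost_per_triage:.3f}")
```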

The lesson that surprised me: the value wasn't AI writing anything. It was AI deciding what a tired human should read first, gated tightly so it never runs when there's nothing to triage.

Where to take it next

Swap the SSH deploy for your platform (Fly.io, ECS, k8s; the job boundary is the same). Pipe the same triage.md into a PR comment with actions/github-script so failures annotate the PR directly. And if your suite is bigger than mine, parallelize it with pytest-xdist, or shard it across matrix jobs and merge the per-shard JSON reports before triage.
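For the sharding route, here is a minimal merge sketch. It assumes each shard, for example a matrix job, uploads its own pytest-json-report file, and a later job downloads them all into one directory. The `reports/` directory, the `report-*.json` naming, and the `merge_reports` and `write_merged` functions are hypothetical. It only concatenates the "tests" lists, which is all scripts/triage.py reads. It does not rebuild pytest-json-report's other fields, such as the summary or durations.

```python
import json
from pathlib import Path


def merge_reports(report_dir: str, pattern: str = "report-*.json") -> dict:
    """Concatenate the "tests" arrays of several pytest-json-report files."""
    merged = {"tests": []}
    for path in sorted(Path(report_dir).glob(pattern)):
        with path.open(encoding="utf-8") as f:
            merged["tests"].extend(json.load(f).get("tests", []))
    return merged


def write_merged(report_dir: str, out_path: str = "report.json") -> int:
    """Write the merged report where triage.py expects it; return the test count."""
    merged = merge_reports(report_dir)
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(merged, f)
    return len(merged["tests"])
```

In the notify job, calling `write_merged("reports")` after downloading the shard artifacts and before `python scripts/triage.py` would leave the triage script unchanged.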

If you're leveling up your CI/CD and want the deeper GitHub Actions and Docker fundamentals behind this, the technical book and online-course offers below are the ones I actually used. Pick one up here: [A8.net tracking link: technical books / programming schools].

Copy the workflow file, drop in scripts/triage.py, set the secrets, and push. You need ANTHROPIC_API_KEY and SLACK_WEBHOOK, plus DEPLOY_HOST, DEPLOY_USER, and DEPLOY_SSH_KEY for the SSH deploy. You'll have AI-triaged, auto-deploying CI before your coffee's cold.

If you found this useful: I packaged 50 copy-paste AI debugging prompts + drop-in Claude Code config templates (CLAUDE.md, settings.json, MCP) into a small kit.
Launch deal: code START50 = 50% off → 50 AI Debugging Prompts + Claude Code Config Pack (about $6, 50% off applied)
New: my 10-chapter ebook Practical Claude Code — automation & unattended operation (about $9, 50% off applied)
