Vijaya Bollu

How I Built an AI That Diagnoses GitHub Actions Failures Automatically

The Problem

GitHub Actions failure logs are noisy. Finding the actual error in 500 lines of output takes time you don't have during an incident. You get a red X, click through multiple pages, scroll past runner setup noise and dependency install output, land on the real error — and then you have to figure out what it means and what to do about it. I was doing this loop manually too often, so I automated it.


What It Does

The tool fetches your repository's failed workflow runs via the GitHub API, extracts the relevant error sections from job logs, pulls in the workflow YAML for context, and sends everything to Ollama running locally. You get back a root cause, an explanation of why it happened, exact YAML changes to make, and steps to prevent it from happening again — in about 15 seconds.

Zero cloud costs. Your logs and code never leave your machine.
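The fetch step maps onto two standard GitHub REST endpoints: listing workflow runs filtered by `status=failure`, then downloading each job's plain-text log. A minimal sketch (function names are mine, not the repo's; assumes the `requests` library and a token in the `Authorization` header):

```python
import requests

API = "https://api.github.com"

def fetch_failed_runs(repo: str, token: str, limit: int = 5) -> list[dict]:
    """Return the most recent failed workflow runs for an owner/repo string."""
    resp = requests.get(
        f"{API}/repos/{repo}/actions/runs",
        headers={"Authorization": f"Bearer {token}",
                 "Accept": "application/vnd.github+json"},
        params={"status": "failure", "per_page": limit},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["workflow_runs"]

def fetch_job_logs(repo: str, job_id: int, token: str) -> str:
    """Download one job's raw log (GitHub redirects to blob storage)."""
    resp = requests.get(
        f"{API}/repos/{repo}/actions/jobs/{job_id}/logs",
        headers={"Authorization": f"Bearer {token}"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.text
```

The jobs for a run come from `GET /repos/{repo}/actions/runs/{run_id}/jobs`, which gives you the `job_id` values to feed into the second call.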


The 5 Failure Types It Handles

  1. Dependency conflicts — version mismatches, packages missing from requirements.txt or package.json
  2. Missing secrets — env vars referenced in the workflow but not configured in repository settings
  3. Permission errors — GITHUB_TOKEN scope issues, OIDC misconfiguration, action not allowed
  4. Docker build failures — base image not found, build context issues, registry auth failures
  5. Flaky tests — identifies non-deterministic failures vs real bugs based on error patterns and exit codes
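Each category has recognizable log signatures. The tool leaves final classification to the LLM, but a cheap keyword pre-filter illustrates the idea — the patterns below are my own examples, not the repo's actual list:

```python
import re

# Hypothetical signature patterns per category; real logs vary widely.
FAILURE_SIGNATURES = {
    "dependency": [r"No module named", r"Could not find a version", r"ERESOLVE"],
    "missing_secret": [r"Bad credentials", r"secret .* not (set|found)"],
    "permissions": [r"permission denied", r"Resource not accessible by integration"],
    "docker": [r"manifest unknown", r"pull access denied", r"failed to solve"],
    "flaky": [r"connection refused", r"timed out", r"address already in use"],
}

def classify_failure(log_excerpt: str) -> str:
    """Return the first category whose patterns match, else 'unknown'."""
    for category, patterns in FAILURE_SIGNATURES.items():
        if any(re.search(p, log_excerpt, re.IGNORECASE) for p in patterns):
            return category
    return "unknown"
```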

Architecture

The flow is straightforward: GitHub REST API → Python log parser → Ollama.

GitHub API (workflow runs + job logs + YAML)
         │
         ▼
  extract_error_from_logs()   ← keyword scan, context windows, dedup
         │
         ▼
  analyze_failure()           ← structured prompt to Ollama
         │
         ▼
  Terminal report             ← root cause + YAML changes + prevention

The log extraction step is where most of the work happens. Raw GitHub Actions logs are thousands of lines — runner diagnostics, apt output, pip install progress bars, none of which is useful. The parser scans for a list of error keywords (error:, failed, Traceback, exit code, permission denied, etc.), then captures 8 lines of context before and 12 after each hit, deduplicates overlapping windows, and caps the result at 1500 characters before sending to the AI.

def extract_error_from_logs(self, logs: str) -> str:
    """Scan raw job logs for error keywords, keep a context window around
    each hit, deduplicate overlapping windows, and cap the result."""
    lines = logs.split('\n')
    # Keywords are lowercase because each line is lowercased before matching.
    error_keywords = [
        'error:', 'failed', 'traceback', 'fatal:',
        'command not found', 'permission denied',
        'exit code', 'returned non-zero'
    ]
    error_sections = []
    seen_lines = set()

    for i, line in enumerate(lines):
        if any(kw in line.lower() for kw in error_keywords):
            if i in seen_lines:
                continue  # already inside a previous window
            start, end = max(0, i - 8), min(len(lines), i + 12)
            seen_lines.update(range(start, end))
            error_sections.append('\n'.join(lines[start:end]))

    result = '\n\n'.join(error_sections)
    if len(result) > 1500:
        return result[:1500] + "\n... (truncated)"
    return result

The AI prompt includes the workflow name, failed job name, the extracted error section, and up to 1000 characters of the workflow YAML. That YAML context is what lets the AI suggest specific YAML fixes rather than generic advice.

prompt = f"""You are a GitHub Actions expert analyzing a failed CI/CD workflow.

Workflow: {workflow_name}
Failed Job: {job_name}

{workflow_context}

ERROR LOGS:
{limited_logs}

**ROOT CAUSE:** [each specific error, all of them]
**WHY THIS HAPPENED:** [plain language explanation]
**HOW TO FIX:** [numbered, actionable steps]
**YAML CHANGES:** [exact changes needed]
**PREVENTION:** [how to avoid this next time]
"""
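The prompt then goes to Ollama's local HTTP API. A minimal sketch, assuming Ollama's default endpoint on port 11434 and the `requests` library; with `stream` set to false the full completion comes back as one JSON object:

```python
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(prompt: str, model: str = "llama3.2") -> dict:
    # stream=False -> one JSON response instead of a token stream.
    return {"model": model, "prompt": prompt, "stream": False}

def analyze_failure(prompt: str) -> str:
    """Send the structured prompt to the local model and return its text."""
    resp = requests.post(OLLAMA_URL, json=build_payload(prompt), timeout=120)
    resp.raise_for_status()
    return resp.json()["response"]
```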

The Most Useful Feature: Failure Classification

The thing that saves the most time isn't root cause identification — it's the flaky vs real distinction.

When a test fails with a non-deterministic error (race condition, network timeout, port already in use), re-running the workflow is the right call. When a test fails because you introduced a bug, re-running wastes 5 minutes and doesn't help. Before this tool, I'd often re-run first and only look at logs after the second failure confirmed it wasn't flaky. That's a 5-10 minute delay every time.

The AI picks this up from patterns in the error output. A ConnectionRefused or socket timeout alongside a test failure is a different signal than a clean AssertionError: expected 200, got 404. The prompt explicitly asks for this classification, and Llama 3.2 handles it reliably — it's the kind of pattern matching that LLMs are genuinely good at.
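As a rough approximation of what the model is keying on, a heuristic version of that signal check might look like this (my own pattern list, not the tool's — the LLM does the real classification):

```python
import re

# Infrastructure-noise patterns that suggest a flaky failure, not a bug.
FLAKY_SIGNALS = [
    r"connection (refused|reset)",
    r"timed? ?out",
    r"address already in use",
    r"temporary failure in name resolution",
]

def looks_flaky(error_text: str) -> bool:
    """True if the failure carries infra noise; a clean AssertionError
    with none of these signals points at a real bug instead."""
    text = error_text.lower()
    return any(re.search(p, text) for p in FLAKY_SIGNALS)
```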

The accuracy isn't perfect (the README documents ~85%), but it's good enough that I check the classification before deciding whether to re-run or investigate.


Demo

Input — a failed workflow log containing:

Run pytest tests/
ERROR: No module named 'requests'
FAILED tests/test_api.py::test_health - ModuleNotFoundError
error: Process completed with exit code 1.

Output:

🔍 GITHUB ACTIONS FAILURE ANALYSIS
Repository: myname/myrepo | Workflow: CI Build | Failed Job: test

🤖 AI DIAGNOSIS:

**ROOT CAUSE:**
pytest is failing because the 'requests' library is not installed.
The test imports it, but it's not listed in requirements.txt, so
pip install -r requirements.txt doesn't include it.

**WHY THIS HAPPENED:**
The package works locally because you have it installed globally on your
machine. In the GitHub Actions runner, only packages in requirements.txt
get installed — nothing else is available.

**HOW TO FIX:**
1. Add 'requests' to requirements.txt: requests>=2.32.0
2. Commit and push — the workflow will pick it up automatically.
3. If requests is only needed for tests, add a requirements-dev.txt
   and install it separately in a dedicated workflow step.

**YAML CHANGES:**
No YAML changes needed. The fix is in requirements.txt.

**PREVENTION:**
Run your CI workflow locally with a clean virtualenv before pushing:
  python -m venv .venv && source .venv/bin/activate
  pip install -r requirements.txt
  pytest tests/

That took 14 seconds. The manual version (click through to the run page, scroll the logs, identify the error, Google it, figure out the fix) takes 10-15 minutes minimum.


Try It

GitHub: https://github.com/ThinkWithOps/ai-devops-projects
Video: https://youtu.be/EwgdZ8KmBJg

# Prerequisites: Ollama running with llama3.2 pulled
git clone https://github.com/ThinkWithOps/ai-devops-projects
cd ai-devops-projects/04-ai-github-actions-healer
pip install -r requirements.txt

# Set your GitHub token (needs repo + workflow scopes)
export GITHUB_TOKEN="your_token_here"

# Run against your repo
python src/github_actions_healer.py --repo your-username/your-repo

# Save report to file
python src/github_actions_healer.py --repo owner/repo --output report.json

Project 4 in my AI+DevOps series — all tools run locally with Ollama, zero cloud AI costs. Project 3 was an AI AWS Cost Detective, Project 5 is an AI Terraform Code Generator. Links in my profile.


What GitHub Actions failure do you dread the most? For me it's the OIDC / GITHUB_TOKEN permission errors — the error message never tells you which specific permission is missing. The AI actually handles those surprisingly well. Drop yours in the comments.
