Every pull request in my IDP Platform project now gets an automatic AI
code review before anyone looks at it. When the pipeline fails, an AI
posts a root cause analysis explaining what went wrong and how to fix it.
Both run automatically inside GitHub Actions using GPT-4o-mini. No
external services, no extra infrastructure, no monthly subscription.
## The problem with manual code review
Code review is valuable but it has a bottleneck. The reviewer needs
context, time, and attention. For a solo developer or a small team,
that bottleneck slows everything down. Even experienced developers miss
things when they are tired or rushing.
AI does not replace code review. It adds a first pass that catches
obvious issues before a human spends time on them: missing error
handling, security anti-patterns, performance problems, and violations
of framework conventions.
## How the AI code review works
When a pull request is opened or updated, a GitHub Actions workflow runs
automatically. It gets the diff of all changed C# files, sends it to
GPT-4o-mini with a prompt describing the review criteria, and posts the
response as a comment on the PR.
The whole thing runs in under 15 seconds.
````python
import os
import subprocess

# Get the diff using subprocess for reliability
base_ref = os.environ["BASE_REF"]
result = subprocess.run(
    ['git', 'diff', f'origin/{base_ref}...HEAD', '--', '*.cs'],
    capture_output=True, text=True
)
diff = result.stdout[:6000]  # cap the diff so it stays inside the context window

# Send to OpenAI
payload = {
    "model": "gpt-4o-mini",
    "max_tokens": 1000,
    "messages": [
        {
            "role": "system",
            "content": "You are a senior .NET engineer reviewing a pull "
                       "request. Give concise actionable feedback on "
                       "correctness, security, performance, and .NET "
                       "best practices."
        },
        {
            "role": "user",
            "content": f"Review this diff:\n\n```diff\n{diff}\n```"
        }
    ]
}
````
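The post elides the final step: sending the payload and posting the model's response back on the PR. A minimal sketch of the comment-posting half using only the standard library — the helper name and the `PR_NUMBER` variable are illustrative, not from the original script:

```python
import json
import urllib.request

def build_comment_request(repo: str, pr_number: int, token: str, body: str):
    """Build a POST request that adds `body` as a comment on a PR.
    (PRs are commented on via the Issues endpoint of the GitHub REST API.)"""
    return urllib.request.Request(
        f"https://api.github.com/repos/{repo}/issues/{pr_number}/comments",
        data=json.dumps({"body": body}).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Inside the workflow script, roughly:
# req = build_comment_request(
#     os.environ["GITHUB_REPOSITORY"],
#     int(os.environ["PR_NUMBER"]),   # from github.event.pull_request.number
#     os.environ["GITHUB_TOKEN"],
#     review_text,
# )
# urllib.request.urlopen(req)
```

The `pull-requests: write` permission in the workflow is what lets the default `GITHUB_TOKEN` post this comment.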
The system prompt is what controls the quality of the review. I spent
more time on the prompt than on the code around it. Telling the AI to
act as a senior .NET engineer and focus on specific categories produces
much more useful output than a generic review request.
## A real review from the pipeline
Here is what the AI posted on an actual PR in my project:
- It flagged an invalid port number in a comment I had left in `Program.cs`.
- It questioned whether database migration error handling was sufficient.
- It noted that Swagger should be restricted in production environments.
- It pointed out a missing newline at the end of the file.
None of those are critical issues but all of them are worth knowing about
before merging. The AI caught them in 14 seconds before I even looked at
the PR.
## How the failure analysis works
The second workflow runs when the CI/CD pipeline fails. It fetches the
build logs from the GitHub API, sends them to GPT-4o-mini, and posts the
analysis as a check on the commit.
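The failure-analysis script itself is not shown in the post; here is a hedged sketch of the log-fetching and payload-building steps. The endpoint paths are the public GitHub Actions REST endpoints; the run ID would come from the triggering event (in a `workflow_run`-triggered job, `github.event.workflow_run.id` — an assumption about the setup), and the function names are mine:

```python
import urllib.request

API = "https://api.github.com"

def gh_get(path: str, token: str) -> bytes:
    """GET an authenticated GitHub API path and return the raw body."""
    req = urllib.request.Request(
        f"{API}{path}",
        headers={"Authorization": f"Bearer {token}",
                 "Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()

# List jobs:  gh_get(f"/repos/{repo}/actions/runs/{run_id}/jobs", token)
# Job log:    gh_get(f"/repos/{repo}/actions/jobs/{job_id}/logs", token)

def build_analysis_payload(log_text: str, max_chars: int = 8000) -> dict:
    """Keep only the tail of the log (where the error usually is) and wrap
    it in a chat-completions payload for GPT-4o-mini."""
    tail = log_text[-max_chars:]
    return {
        "model": "gpt-4o-mini",
        "max_tokens": 800,
        "messages": [
            {"role": "system",
             "content": "You are a CI engineer. Explain in plain English "
                        "why this build failed and how to fix it."},
            {"role": "user", "content": f"Build log tail:\n\n{tail}"},
        ],
    }
```

Truncating from the end rather than the start matters here: MSBuild logs bury the actual error hundreds of lines down.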
This is particularly useful for cryptic build errors. Instead of reading
through hundreds of lines of MSBuild output, you get a plain English
explanation of what failed and what to do about it.
## The workflow structure
```yaml
name: AI Code Review

on:
  pull_request:
    types: [opened, synchronize]
    paths:
      - 'src/**/*.cs'
      - 'src/**/*.csproj'

jobs:
  ai-review:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - name: AI Code Review
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
          BASE_REF: ${{ github.base_ref }}
        run: |
          python3 << 'PYTHON'
          # get diff, call OpenAI, post comment
          PYTHON
```
The `paths` filter is important: the workflow only runs when C# files
change. Updating a README does not trigger a code review. This keeps the
pipeline fast and the API costs minimal.
## What it costs
GPT-4o-mini charges roughly $0.15 per million input tokens and $0.60 per
million output tokens. A typical code review diff is around 2,000 tokens,
so 500 code reviews cost about $0.15 in input tokens, with output tokens
adding a comparable amount on top.

For a portfolio project or small team this is effectively free. Even at
50 PRs a month the total is a few cents.
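The arithmetic, spelled out. The ~500 output tokens per review is my assumption (the post's $0.15 figure prices input only):

```python
IN_RATE = 0.15 / 1_000_000    # dollars per input token
OUT_RATE = 0.60 / 1_000_000   # dollars per output token

def review_cost(n_reviews: int, in_tokens: int = 2000, out_tokens: int = 500) -> float:
    """Estimated GPT-4o-mini cost for a batch of code reviews."""
    return n_reviews * (in_tokens * IN_RATE + out_tokens * OUT_RATE)

print(f"500 reviews, input only:  ${review_cost(500, out_tokens=0):.2f}")  # $0.15
print(f"500 reviews, with output: ${review_cost(500):.2f}")                # $0.30
print(f"50 PRs per month:         ${review_cost(50):.2f}")                 # $0.03
```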
## What I learned
The system prompt matters more than anything else. A vague prompt like
"review this code" produces generic output. A specific prompt that names
the language, the role, the focus areas, and the output format produces
review comments that are actually useful.
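As an illustration, a prompt that names all four of those things might look like this. This expanded version is my sketch, not the exact prompt in the repo:

```python
# Hypothetical expanded system prompt: names the role, the stack,
# the focus areas, and the expected output format.
SYSTEM_PROMPT = (
    "You are a senior .NET engineer reviewing a C# pull request. "
    "Focus on: correctness, security (secrets, injection, input validation), "
    "performance (allocations, async misuse), and ASP.NET Core conventions. "
    "Respond with a short bullet list; each bullet names the file, the issue, "
    "and a concrete fix. Skip style nits a linter would catch."
)
```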
Passing environment variables into Python heredocs requires care. GitHub
Actions expressions like `${{ github.base_ref }}` are not expanded inside
heredocs. The fix is to set them as environment variables on the step
first and read them with `os.environ` inside the script.
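The pattern, as a condensed fragment of the workflow step shown earlier:

```yaml
- name: AI Code Review
  env:
    BASE_REF: ${{ github.base_ref }}     # expanded here, by Actions
  run: |
    python3 << 'PYTHON'
    import os
    base_ref = os.environ["BASE_REF"]    # read here, at runtime
    PYTHON
```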
`subprocess` is more reliable than `os.popen` for running shell commands
in Python. It captures stdout and stderr separately and handles errors
more predictably.
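A self-contained illustration of that difference, using `sys.executable` as a stand-in for `git`:

```python
import subprocess
import sys

# subprocess.run captures stdout and stderr separately and exposes the
# return code; os.popen() only returns a stdout stream and hides failures.
result = subprocess.run(
    [sys.executable, "-c",
     "import sys; print('ok'); print('warning', file=sys.stderr)"],
    capture_output=True, text=True
)
if result.returncode != 0:
    raise RuntimeError(f"command failed: {result.stderr}")
print(result.stdout.strip())   # stdout alone: "ok"
print(result.stderr.strip())   # stderr alone: "warning"
```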
## What is next
The natural next step is AI-generated release notes. When a PR is merged
to main, the AI reads all the commit messages and diff since the last
release and writes a structured changelog entry automatically. No more
manually writing release notes.
I am also looking at adding a security scan step that uses AI to check
for common vulnerability patterns before the standard SAST tools run.
Source code: https://github.com/aftabkh4n/idp-platform
If you are building something similar or have ideas for improving the
prompts, drop a comment below.