DEV Community

Aditya Bhusal


I built an AI code reviewer as a GitHub Action — here's what I learned

If you've spent time on software engineering teams, you know pull request reviews are the ultimate bottleneck. They're slow, inconsistent, and often skipped entirely under deadline pressure. Reviewers get fatigued, rubber-stamp approvals become the norm, and suddenly, subtle bugs creep into the codebase. Human review is essential for architectural alignment, but for catching obvious code smells or logical flaws, it relies heavily on mental energy we simply don't always have.

Meanwhile, LLMs have exploded in capability. They are genuinely good at reading diffs, understanding context, and pointing out issues. I kept expecting someone to release a dead-simple, plug-and-play GitHub Action that harnesses this power without requiring an enterprise subscription or self-hosted runners. When I looked around, though, I couldn't find a lightweight, open-source tool that just works out of the box. So, I built it myself.

Enter Argus. Argus is a GitHub Action that acts as an automated first pass for pull requests. Whenever a developer opens a PR, pushes new commits to it (the synchronize event), or reopens it, Argus triggers, fetches the diff, and sends each modified file's context to Groq's Llama 3.3 70B model. The LLM then analyzes the code for potential bugs, security vulnerabilities, and performance bottlenecks.
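To make the per-file step concrete, here is a minimal Python sketch of splitting one unified diff into a chunk per modified file. It is an illustration and not Argus's actual code: the function name `split_diff_by_file` is hypothetical, and it assumes the diff is in standard `git diff` format with paths that contain no spaces.

```python
import re

# Matches the "diff --git a/<path> b/<path>" line that starts each file's section.
FILE_HEADER = re.compile(r"^diff --git a/(.+?) b/(.+)$")


def split_diff_by_file(diff_text: str) -> dict[str, str]:
    """Split a unified diff into one chunk per file, keyed by the new-side path."""
    chunks: dict[str, list[str]] = {}
    current = None
    for line in diff_text.splitlines():
        match = FILE_HEADER.match(line)
        if match:
            # A new file section begins; later lines belong to this path.
            current = match.group(2)
            chunks[current] = []
        if current is not None:
            chunks[current].append(line)
    return {path: "\n".join(lines) for path, lines in chunks.items()}
```

Each value can then be sent to the model as its own request, which keeps prompts small and makes it clear which file a returned comment belongs to.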

Instead of dumping a wall of text into a single comment, Argus parses the structured output from the model and posts specific, inline review comments directly on the problematic lines. It tags each comment with a severity label (high, medium, or low) so developers know what needs immediate attention. Setup is short: store your Groq API key as a repository secret named GROQ_API_KEY, then drop this snippet into a workflow file under .github/workflows/:

```yaml
name: Argus Code Review
on:
  pull_request:
    types: [opened, synchronize, reopened]
jobs:
  review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: Rozer402/argus@main
        with:
          groq_api_key: ${{ secrets.GROQ_API_KEY }}
```

What genuinely surprised me was how exceptionally good Llama 3.3 70B is at reading and understanding diffs. I was skeptical an open-weights model could handle the nuance of isolated code changes, but it proved me wrong. During testing, Argus caught a hardcoded API secret I accidentally committed, flagged a missing await on an async function that would have caused a nasty race condition, and pointed out an unused variable in my own code. It wasn't hallucinating generic advice; it provided razor-sharp, context-aware feedback that saved me from pushing broken code.

Ironically, the AI wasn't the bottleneck; the plumbing was. The hardest part of building Argus was prompting the model to reliably return structured JSON so that every comment maps to an exact line number in the GitHub PR diff. GitHub's API is strict here: if you try to post a review comment on a line that isn't part of the diff, the API rejects the request, and an unhandled rejection fails the action. Getting the LLM to consistently return valid JSON with line numbers that match the diff took many prompt iterations plus rigorous fallback logic.
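Here is a rough Python sketch of that kind of guardrail, written for illustration and not taken from Argus. It assumes the model is asked for a JSON array of objects with `line`, `severity`, and `body` keys (a hypothetical schema), and that the input is the patch for a single file. It collects the new-side line numbers that appear inside diff hunks, then drops any comment that is malformed or points outside them.

```python
import json
import re

# Hunk header, e.g. "@@ -10,6 +12,8 @@"; the captured group is the new-side start line.
HUNK_HEADER = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@")
SEVERITIES = {"low", "medium", "high"}


def commentable_lines(file_patch: str) -> set[int]:
    """Return the new-side line numbers that appear in a single file's hunks."""
    lines: set[int] = set()
    new_line = None
    for raw in file_patch.splitlines():
        header = HUNK_HEADER.match(raw)
        if header:
            new_line = int(header.group(1))
            continue
        if new_line is None:
            continue  # file header lines before the first hunk
        if raw.startswith("-") or raw.startswith("\\"):
            continue  # removed lines and "\ No newline" markers have no new-side number
        lines.add(new_line)  # added ("+") or context (" ") line
        new_line += 1
    return lines


def parse_review(raw_reply: str, valid_lines: set[int]) -> list[dict]:
    """Keep only well-formed comments whose line is inside the diff."""
    try:
        data = json.loads(raw_reply)
    except json.JSONDecodeError:
        return []  # fallback: treat an unparseable reply as "no comments"
    if not isinstance(data, list):
        return []
    kept = []
    for item in data:
        if not isinstance(item, dict):
            continue
        line, body, severity = item.get("line"), item.get("body"), item.get("severity")
        if isinstance(line, bool) or not isinstance(line, int) or line not in valid_lines:
            continue
        if not isinstance(body, str) or not body.strip() or severity not in SEVERITIES:
            continue
        kept.append({"line": line, "severity": severity, "body": body})
    return kept
```

Returning an empty list on a bad reply is one possible fallback; retrying the request with a stricter prompt is another. To be more conservative, you could also restrict `commentable_lines` to added lines only.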

Another major hurdle was building the .argus/config.yml system. I quickly realized that different teams have wildly different tolerances for automated feedback. If the bot comments on every minor stylistic choice, developers will get annoyed and ignore it. So, I implemented a configuration system that lets teams fine-tune the action's behavior directly in their repo. By setting a severity threshold (for example, only showing high-severity issues) and ignoring specific file paths, teams can control the signal-to-noise ratio, which is critical for real-world usage.
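As a sketch of how such filtering could work, the Python below applies a severity threshold and a list of ignored path patterns to a comment before it is posted. The key names `min_severity` and `ignore_paths` and the function `should_post` are assumptions made for this example; the real .argus/config.yml schema may differ. The dictionary stands in for the parsed YAML file.

```python
from fnmatch import fnmatchcase

# Rank severities so a threshold can be compared numerically.
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2}

# Hypothetical parsed contents of .argus/config.yml; the real keys may differ.
example_config = {
    "min_severity": "medium",
    "ignore_paths": ["docs/*", "*.lock"],
}


def should_post(comment: dict, config: dict) -> bool:
    """Return True if the comment passes the path and severity filters."""
    for pattern in config.get("ignore_paths", []):
        # fnmatchcase's "*" also matches "/" so "docs/*" covers nested files.
        if fnmatchcase(comment["path"], pattern):
            return False
    threshold = SEVERITY_RANK[config.get("min_severity", "low")]
    return SEVERITY_RANK[comment["severity"]] >= threshold
```

With `example_config`, a low-severity comment on `src/app.py` is dropped by the threshold, and any comment on `docs/guide/setup.md` is dropped by the path filter regardless of severity.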

If I were to build this from scratch again, I'd definitely start with the config system from day one instead of hardcoding everything. In early versions, I baked all assumptions, thresholds, and ignored paths directly into the core logic. When I started testing across different codebases, those hardcoded rules immediately broke down. Refactoring the action to read and parse a .argus/config.yml file late in the game was messy. Building with user configuration in mind right from the start would have saved me a massive amount of technical debt.

If you're tired of PRs lingering in review purgatory or just want a fast, automated second pair of eyes on your code, give Argus a shot. You can find the repo at https://github.com/Rozer402/argus. It's free and open source, and it runs on Groq's free tier, so within that tier's rate limits you can get code reviews without racking up an API bill. Drop it into your workflow, tweak the config, and let the AI do the heavy lifting.

Let your team focus on the architecture, and let Argus catch the bugs.
