DEV Community

nav-StackAI
I built a tool that pits 9 free LLMs against your code as adversarial reviewers

The problem with AI code review

You paste your code into ChatGPT. It tells you everything looks fine, maybe flags a missing null check. You feel good.

But here's the thing — every model has blind spots. GPT might miss a SQL injection that Claude catches. Llama might flag a race condition that Gemini overlooks. No single model sees everything.

So I thought: what if instead of trusting one model, I made them compete?

What AIF does

AIF (Architecture Impact Framework) sends your code to 9 different LLMs across 6 providers simultaneously. The models don't see each other's outputs — they review your code independently, like adversarial auditors.

Then AIF cross-references every finding and tells you where multiple models independently flagged the same issue.

The agreement levels work like this:

  • ALL — 60% or more of the models flagged the same issue. This is almost certainly a real problem.
  • MAJORITY — at least two models agree, but below the 60% threshold. Worth investigating.
  • SINGLE — only one model flagged it. Could be a false positive, or could be the one model that caught what the others missed.
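In code, these buckets reduce to a tiny classifier. A minimal sketch of the thresholds described above (the function name is illustrative, not AIF's actual API):

```javascript
// Classify a finding's agreement level from the list of models that flagged it.
// Thresholds follow the post: 60%+ of models => ALL, 2+ => MAJORITY, else SINGLE.
function agreementLevel(flaggedBy, totalModels) {
  if (flaggedBy.length / totalModels >= 0.6) return "ALL";
  if (flaggedBy.length >= 2) return "MAJORITY";
  return "SINGLE";
}

console.log(agreementLevel(["gpt", "claude", "llama", "qwen", "gemini", "mistral"], 9)); // ALL
console.log(agreementLevel(["claude", "llama"], 9)); // MAJORITY
console.log(agreementLevel(["nemotron"], 9)); // SINGLE
```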

After synthesis, you get an interactive triage session where you walk through each finding and mark it: valid, partially valid, already fixed, defer, reject, or skip.

The model stack (all free)

This is the default configuration — every one of these runs on a free API tier:

| Provider | Models | Free? |
| --- | --- | --- |
| Groq | Llama 3.3 70B, Llama 4 Scout | Yes |
| Cerebras | Qwen3 235B, Llama 3.1 8B | Yes |
| OpenRouter | Llama 3.3 70B, Nemotron 120B | Yes |
| Mistral | Mistral Small | Yes |
| Google | Gemini 2.5 Flash | Yes |
| Anthropic | Claude Sonnet 4 | Paid |

You can swap in any model or provider you want. The tool is fully configurable — just edit your .env file and optionally an aif.config.json.
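For illustration, a minimal .env might look like the fragment below. The variable names here are assumptions for the sketch, not AIF's documented schema; the setup wizard tells you the real ones:

```shell
# .env — only add keys for the providers you want to use.
# (Variable names are illustrative; check the setup wizard's output.)
GROQ_API_KEY=gsk_...
GEMINI_API_KEY=AIza...
# Providers without a key are skipped automatically.
```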

How it works under the hood

```bash
npx aif-review --dir ./src --ext js,ts
```
  1. Discovery — Recursively finds all code files matching your extensions
  2. Chunking — Splits large files into ~80K token chunks so models with smaller context windows can still participate
  3. Parallel dispatch — Fires off all 9 model calls simultaneously
  4. Structured parsing — Extracts severity, category, file, line number, and description from each model's response
  5. Cross-model synthesis — Groups findings by location and type, calculates agreement levels
  6. Interactive triage — Walks you through each finding with keyboard shortcuts

The whole thing runs from a single JS file with zero dependencies beyond Node.js built-ins.
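Zero-dependency parallel dispatch (step 3) maps naturally onto Promise.allSettled, which lets one provider fail without sinking the rest of the run. A sketch with a stubbed callModel (hypothetical; the real tool calls each provider's API):

```javascript
// Hypothetical stand-in for a real provider call (not AIF's actual API).
async function callModel(model, code) {
  if (!model.apiKey) throw new Error(`no key for ${model.name}`);
  return { model: model.name, findings: [] }; // a real call would hit the provider
}

// Fire all model calls at once; a rejected call becomes a skipped model
// instead of aborting the whole review (Promise.allSettled never rejects).
async function dispatchAll(models, code) {
  const settled = await Promise.allSettled(models.map((m) => callModel(m, code)));
  return settled.filter((r) => r.status === "fulfilled").map((r) => r.value);
}

dispatchAll(
  [{ name: "llama-3.3-70b", apiKey: "..." }, { name: "claude-sonnet-4" }],
  "const x = 1;"
).then((ok) => console.log(ok.length)); // 1 — the keyless model was skipped
```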

What I actually found running it on real code

I ran AIF against my own production codebase and the results were eye-opening.

The ALL-agree findings were consistently real issues — things like hardcoded secrets, missing input validation on API endpoints, and overly permissive CORS configs. When 6+ models independently say "this is a problem," it's a problem.

The MAJORITY findings were a mix — maybe 70% real issues, 30% stylistic concerns that multiple models happened to flag.

The SINGLE findings were the most interesting. Most were false positives or overly cautious suggestions. But occasionally one model would catch something genuinely subtle that every other model missed. That's the value of the adversarial approach — you get coverage across different model architectures and training data.

Setup takes 2 minutes

```bash
# Install
npm install -g aif-review

# Run the setup wizard — it walks you through getting free API keys
aif-setup

# Review your code
aif --dir ./my-project --ext js,ts,py
```

The setup wizard tells you exactly where to sign up for each provider and which ones are free. You only need API keys for the providers you want to use — the tool gracefully skips any provider without a key configured.
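That graceful skipping can be as simple as filtering the provider registry by which keys are present. A sketch with assumed provider and environment-variable names (not AIF's actual configuration):

```javascript
// Hypothetical provider registry; env variable names are assumptions.
const PROVIDERS = [
  { name: "groq", envKey: "GROQ_API_KEY" },
  { name: "cerebras", envKey: "CEREBRAS_API_KEY" },
  { name: "anthropic", envKey: "ANTHROPIC_API_KEY" },
];

// Keep only providers whose key is actually set; the rest are skipped.
function activeProviders(env) {
  return PROVIDERS.filter((p) => Boolean(env[p.envKey]));
}

console.log(activeProviders({ GROQ_API_KEY: "gsk_demo" }).map((p) => p.name)); // ["groq"]
```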

Try it

The repo is MIT licensed and ready to use:

GitHub: github.com/nav-StackAI/aif-review

If you try different model combinations, I'd love to hear what works well. The whole point is that you can swap in whatever models you have access to — local models via Ollama, paid APIs, whatever.

Built by GoXero — we use this internally for our own code reviews and decided to open-source it.
