DEV Community

NEXADiag Nexa
I stopped trusting a single AI for code review — here's why

We've all been there:

  1. You ask GPT-4o or Claude 3.5 to review your PR.
  2. It says "Looks good!"
  3. You ship.
  4. Production crashes on an edge case the AI hallucinated its way past.

The problem isn't that AI makes mistakes. It's that we trust a single model. In my experience every model family has blind spots: GPT overexplains but misses logic errors, Gemini is great at structure but weak on security patterns, and Claude is thorough but sometimes invents APIs that don't exist.

One model's "looks good" is just one vote.

Multi-LLM consensus

I built NexaVerify v1.6.0 — a local-first Windows tool that runs your code through 8 AI engines in parallel (Claude, GPT-4o, Gemini, Groq, Cerebras, Mistral, OpenRouter, Ollama).
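
To make "in parallel" concrete, here is a minimal sketch of a fan-out using asyncio. It is not NexaVerify's actual code: `fake_provider`, the provider names, and the delays are hypothetical stand-ins for real API clients, and the "review" is a trivial string check.

```python
import asyncio


async def fake_provider(name: str, code: str, delay: float) -> dict:
    """Hypothetical stand-in for one provider's API call."""
    await asyncio.sleep(delay)  # simulate network latency
    if name == "flaky":
        # Simulate a provider that fails mid-run.
        raise TimeoutError(f"{name} did not answer")
    issues = ["bare except"] if "except:" in code else []
    return {"provider": name, "issues": issues}


async def review_in_parallel(code: str, providers: dict[str, float]) -> list[dict]:
    """Send the same code to every provider at once and collect the answers."""
    tasks = [fake_provider(name, code, delay) for name, delay in providers.items()]
    # return_exceptions=True keeps one failing provider from sinking the whole run.
    results = await asyncio.gather(*tasks, return_exceptions=True)
    return [r for r in results if not isinstance(r, BaseException)]


if __name__ == "__main__":
    snippet = "try:\n    run()\nexcept:\n    pass\n"
    reports = asyncio.run(
        review_in_parallel(snippet, {"gemini": 0.01, "flaky": 0.01, "groq": 0.02})
    )
    for report in reports:
        print(report["provider"], report["issues"])
```

The total wait is roughly the slowest provider's latency rather than the sum of all of them, and a provider that times out simply drops out of the vote.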

The core idea:

  • Agreement is the low-information case. If all models say it's fine, it probably is.
  • Disagreement is the signal. When Claude flags a security risk but GPT ignores it — that's where you need to look.

Every issue gets a confidence score based on how many providers confirmed it. Disagreements are surfaced, not buried.
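
One simple way to compute such a score is the fraction of providers that reported an issue. The sketch below assumes that definition and assumes issues have already been normalized to comparable keys, which is the hard part in practice; the function name and the key format are hypothetical, not NexaVerify's.

```python
from collections import defaultdict


def score_issues(reports: dict[str, set[str]]) -> list[dict]:
    """reports maps a provider name to the set of issue keys it flagged."""
    total = len(reports)
    confirmed_by = defaultdict(set)
    for provider, issues in reports.items():
        for issue in issues:
            confirmed_by[issue].add(provider)

    scored = [
        {
            "issue": issue,
            "confidence": len(providers) / total,  # share of providers that agree
            "confirmed_by": sorted(providers),
            "disputed": len(providers) < total,  # at least one provider stayed silent
        }
        for issue, providers in confirmed_by.items()
    ]
    # Highest confidence first; ties broken by issue key for stable output.
    return sorted(scored, key=lambda s: (-s["confidence"], s["issue"]))


if __name__ == "__main__":
    reports = {
        "gemini": {"main.py:missing-try-except", "constants.py:possible-NameError"},
        "groq": {"main.py:missing-try-except"},
        "cerebras": {"main.py:missing-try-except", "constants.py:possible-NameError"},
    }
    for row in score_issues(reports):
        print(row)
```

Here the `main.py` issue is confirmed by all three providers, while the `constants.py` issue is confirmed by two of three and marked `disputed`, so it can be surfaced for a closer look.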

Proof by fire: scanning itself

I ran v1.6.0 against its own 26,000-line codebase with 3 free-tier providers (Gemini, Groq, Cerebras).

It found 99 real issues, including a potential NameError in constants.py and a missing try-except in main.py that would crash the app before logging initializes. The bug-finder had bugs, and it found them.

The full report is live here.

Under the hood

  • 3-stage JSON repair — syntactic repair → fallback extraction → schema validation. Catches truncated LLM responses instead of silently returning [].
  • RSA Proof Bundles — every verdict carries a SHA-256 audit trail. Verifiable after the fact.
  • Local-first, BYOK — your code and keys never touch my server.
  • Ollama support — full offline consensus for sensitive projects.
  • Free tier available today — 3 analyses/day, 3 providers (Gemini, Groq, Cerebras), 10 files/scan.
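
The 3-stage JSON repair from the list above can be sketched roughly as follows. This is an illustration under assumptions, not the shipped implementation: the issue schema (`file` and `message` as strings), the helper names, and the `UnparseableResponse` exception are all hypothetical.

```python
import json
import re

REQUIRED_FIELDS = {"file": str, "message": str}  # assumed schema, for illustration


class UnparseableResponse(ValueError):
    """Raised when no stage recovers a single schema-valid issue."""


def _syntactic_repair(text: str) -> str:
    # Stage 1: drop a surrounding markdown code fence and trailing commas.
    # Naive on purpose: it would also touch ",]" inside a string value.
    text = text.strip().strip("`").strip()
    if text.startswith("json"):
        text = text[len("json"):]
    return re.sub(r",\s*([\]}])", r"\1", text)


def _extract_objects(text: str) -> list:
    # Stage 2: salvage every complete {...} object, e.g. from a truncated array.
    decoder = json.JSONDecoder()
    found, pos = [], 0
    while (start := text.find("{", pos)) != -1:
        try:
            obj, end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            pos = start + 1  # incomplete object, keep scanning
            continue
        found.append(obj)
        pos = end
    return found


def _validate(items: list) -> list[dict]:
    # Stage 3: keep only dicts that carry every required field with the right type.
    return [
        item for item in items
        if isinstance(item, dict)
        and all(isinstance(item.get(k), t) for k, t in REQUIRED_FIELDS.items())
    ]


def parse_issues(raw: str) -> list[dict]:
    repaired = _syntactic_repair(raw)
    try:
        data = json.loads(repaired)
    except json.JSONDecodeError:
        items = _extract_objects(repaired)
    else:
        items = data if isinstance(data, list) else [data]
        if not items:
            return []  # the model really did report zero issues
    valid = _validate(items)
    if not valid:
        # Fail loudly instead of pretending the response was an empty list.
        raise UnparseableResponse("no schema-valid issues recovered")
    return valid
```

The design point is the last branch: an empty list is returned only when the model produced a well-formed empty array, so a truncated or garbled response is never mistaken for "no issues found".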

No signup wall. Download, add your API keys, run.

Try it free or grab Pro (€19 lifetime): https://nexaverify.netlify.app/


I ship solo. Every feature is driven by what early users actually need. What's your current workflow for verifying AI-generated code? Running a single pass, or already testing multi-model pipelines?