DEV Community

Gde


How We Built a 6-Layer AI Code Audit Pipeline (And Why Each Auditor Has Its Own Scope)

The Problem

You ask an LLM to review your code. It comes back with 30 findings. Half of them overlap. Some contradict each other. You spend more time triaging the audit output than you saved by automating it.

This is the fundamental problem with single-pass LLM code review: the model tries to check everything at once, with no clear boundaries on what it should and shouldn't flag.

The Solution: Non-Overlapping Scopes

I solved this by splitting the audit into 6 specialized agents, each with an exclusive scope. The key is the "Does NOT Check" column:

Auditor | Checks | Does NOT Check
Code Quality | Type safety, DRY, complexity, naming, dead code | Security, runtime bugs, performance
Bug Scanner | Null refs, error handling, race conditions, resource leaks | Security vulnerabilities, code style
Security | OWASP Top 10, injection, auth, secrets, CVEs | Runtime bugs, code quality
Performance | Slow queries, hot paths, memory, connection pools | Security, code style
Documentation | Missing docs, stale comments, type annotations | TODOs, debug statements
Environment | Config consistency, format validation, naming | Secrets (owned by Security)

Security is the single authority for all security findings. The bug scanner handles runtime issues but explicitly avoids anything that's a security vulnerability. This eliminates the most common source of duplicates.
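
One way to make the exclusive scopes concrete is to keep them as data and render them into each auditor's prompt. The following is a minimal sketch, not the cca-audit implementation: `AuditorScope`, `SCOPES`, `build_scope_prompt`, and the prompt wording are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditorScope:
    name: str
    checks: tuple[str, ...]
    does_not_check: tuple[str, ...]


# Two rows from the table above; the other four would follow the same shape.
SCOPES = (
    AuditorScope(
        name="Bug Scanner",
        checks=("null refs", "error handling", "race conditions", "resource leaks"),
        does_not_check=("security vulnerabilities", "code style"),
    ),
    AuditorScope(
        name="Security",
        checks=("OWASP Top 10", "injection", "auth", "secrets", "CVEs"),
        does_not_check=("runtime bugs", "code quality"),
    ),
)


def build_scope_prompt(scope: AuditorScope) -> str:
    """Render one auditor's scope as prompt text (hypothetical wording)."""
    return (
        f"You are the {scope.name} auditor.\n"
        f"Report ONLY: {', '.join(scope.checks)}.\n"
        f"Do NOT report: {', '.join(scope.does_not_check)}."
    )
```

Spelling out the exclusions in the prompt itself is what lets one auditor skip a finding that another one owns.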

The Pipeline

Step 0: Detect changed files. Works with uncommitted changes, specific commits, or explicit file lists.
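
A rough sketch of how the three input modes could map onto git, assuming the pipeline shells out to the `git` CLI. The function names are hypothetical, and the exact commands cca-audit uses may differ.

```python
import subprocess
from typing import Optional, Sequence


def changed_files_command(commit: Optional[str] = None) -> list[str]:
    """Return the git command that lists changed paths."""
    if commit is not None:
        # Files touched by one specific commit.
        return ["git", "diff-tree", "--no-commit-id", "--name-only", "-r", commit]
    # Staged and unstaged changes to tracked files, relative to HEAD.
    return ["git", "diff", "--name-only", "HEAD"]


def detect_changed_files(
    files: Optional[Sequence[str]] = None, commit: Optional[str] = None
) -> list[str]:
    if files:
        # An explicit file list takes precedence over anything git reports.
        return list(files)
    result = subprocess.run(
        changed_files_command(commit), capture_output=True, text=True, check=True
    )
    return [line for line in result.stdout.splitlines() if line.strip()]
```

Note that `git diff --name-only HEAD` does not list untracked files, so a real implementation would need a separate step for brand-new files.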

Step 0.5: Auto-detect language. Detects Python, TypeScript, Go, Rust, Java, and Ruby from file extensions. Also detects the test runner and linter so the pipeline can re-verify after fixing.
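
Extension-based detection can be as simple as a lookup table plus a majority vote. This is a sketch under assumptions: the extension map and the "most common language wins" rule are illustrative, and test runner and linter detection are left out.

```python
from collections import Counter
from pathlib import PurePath
from typing import Optional

# Assumed extension map; the real tool may recognize more extensions.
EXTENSION_TO_LANGUAGE = {
    ".py": "Python",
    ".ts": "TypeScript",
    ".tsx": "TypeScript",
    ".go": "Go",
    ".rs": "Rust",
    ".java": "Java",
    ".rb": "Ruby",
}


def detect_language(files: list[str]) -> Optional[str]:
    """Return the language with the most changed files, or None if none match."""
    counts = Counter(
        EXTENSION_TO_LANGUAGE[ext]
        for ext in (PurePath(f).suffix.lower() for f in files)
        if ext in EXTENSION_TO_LANGUAGE
    )
    if not counts:
        return None
    return counts.most_common(1)[0][0]
```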

Step 1: 6 parallel auditors. All 6 launch simultaneously. Each gets the same file list and diff, but a different scope and checklist.
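
The fan-out can be sketched with `asyncio.gather`. Here `run_auditor` is a hypothetical stand-in that returns a canned finding; a real auditor would send the file list, the diff, and its own scope and checklist to a model.

```python
import asyncio


async def run_auditor(name: str, files: list[str], diff: str) -> list[dict]:
    """Stand-in for one LLM call (hypothetical); returns one canned finding."""
    await asyncio.sleep(0)  # yield control, as a network call would
    return [{"auditor": name, "file": f, "line": 1, "severity": "P3"} for f in files[:1]]


async def run_all(auditors: list[str], files: list[str], diff: str) -> list[dict]:
    # gather() runs every auditor concurrently and preserves input order.
    per_auditor = await asyncio.gather(
        *(run_auditor(name, files, diff) for name in auditors)
    )
    return [finding for findings in per_auditor for finding in findings]


AUDITORS = [
    "Code Quality", "Bug Scanner", "Security",
    "Performance", "Documentation", "Environment",
]
findings = asyncio.run(run_all(AUDITORS, ["app.py"], "diff text"))
```

Every auditor receives identical inputs; only the scope differs, which is why the calls are independent and safe to run side by side.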

Step 2: Deduplicate. Findings from different auditors that point at the same file:line are merged into one finding, keeping the highest severity.
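
The merge rule can be sketched as a dictionary keyed on (file, line). The `Finding` fields and the use of P1/P2/P3 as severity labels at this stage are assumptions made for the example.

```python
from dataclasses import dataclass

SEVERITY_RANK = {"P1": 0, "P2": 1, "P3": 2}  # lower rank = more severe


@dataclass(frozen=True)
class Finding:
    file: str
    line: int
    severity: str
    auditor: str
    message: str


def deduplicate(findings: list[Finding]) -> list[Finding]:
    """Keep one finding per (file, line): the most severe one seen."""
    best: dict[tuple[str, int], Finding] = {}
    for f in findings:
        key = (f.file, f.line)
        current = best.get(key)
        if current is None or SEVERITY_RANK[f.severity] < SEVERITY_RANK[current.severity]:
            best[key] = f
    return list(best.values())
```

This sketch discards the less severe duplicate entirely; a fuller version might append its message to the surviving finding so no context is lost.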

Step 3: Prioritize. P1 Critical (security, data corruption) = fix before deploy. P2 High (DRY violations, stale comments) = fix now. P3 Nice-to-have (cosmetic) = defer.
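
As a sketch, the three tiers can be modeled as an enum plus a category lookup. The category keys and the P2 default for unknown categories are assumptions, not documented behavior.

```python
from enum import Enum


class Priority(Enum):
    P1 = "fix before deploy"
    P2 = "fix now"
    P3 = "defer"


# Hypothetical category labels, taken from the examples given for each tier.
CATEGORY_PRIORITY = {
    "security": Priority.P1,
    "data_corruption": Priority.P1,
    "dry_violation": Priority.P2,
    "stale_comment": Priority.P2,
    "cosmetic": Priority.P3,
}


def prioritize(category: str) -> Priority:
    # Assumption: unknown categories default to P2 so they are reviewed
    # rather than silently deferred.
    return CATEGORY_PRIORITY.get(category, Priority.P2)
```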

Step 4: Auto-fix. Implements P1 and P2 fixes with minimal diffs. No refactoring beyond what the audit found.

Step 5: Re-verify. Runs the detected test suite and linter. If tests fail, diagnoses and fixes before continuing.
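
A minimal sketch of the re-verify step, assuming the detected test runner and linter are available as command lists. The helper names are hypothetical, and the stand-in commands exist only so the example runs anywhere.

```python
import subprocess
import sys


def run_check(command: list[str]) -> tuple[bool, str]:
    """Run one command; return (passed, combined output)."""
    result = subprocess.run(command, capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr


def reverify(test_cmd: list[str], lint_cmd: list[str]) -> dict[str, bool]:
    return {
        "tests": run_check(test_cmd)[0],
        "lint": run_check(lint_cmd)[0],
    }


# Stand-in commands; a Python project might pass ["pytest", "-q"] and
# ["ruff", "check", "."] instead.
status = reverify(
    test_cmd=[sys.executable, "-c", "raise SystemExit(0)"],
    lint_cmd=[sys.executable, "-c", "raise SystemExit(1)"],
)
```

Keeping the captured output next to the pass/fail flag matters here, because the failing output is what a follow-up diagnosis step would need to read.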

Step 6: Architect review gate. A final reviewer agent assesses the full diff and gives a verdict: APPROVED, REVISE, or BLOCKED.

Step 7: Commit. Structured commit message with P1/P2/P3 breakdown and dedup stats.
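
The exact commit format is not shown here, so the layout below is an assumption: a summary line, one section per fixed priority, a "Deferred:" section for P3 items, and a dedup stats line. `build_commit_message` is a hypothetical helper.

```python
def build_commit_message(
    fixed: dict[str, list[str]],
    deferred: list[str],
    raw_count: int,
    unique_count: int,
) -> str:
    """Assemble a structured commit message (assumed layout)."""
    p1 = fixed.get("P1", [])
    p2 = fixed.get("P2", [])
    lines = [f"audit: fix {len(p1)} P1, {len(p2)} P2 findings", ""]
    for label, items in (("P1", p1), ("P2", p2)):
        lines.append(f"{label}:")
        lines.extend(f"- {item}" for item in items)
        lines.append("")
    lines.append("Deferred:")
    lines.extend(f"- {item}" for item in deferred)
    lines.append("")
    lines.append(f"Dedup: {raw_count} raw -> {unique_count} unique")
    return "\n".join(lines)
```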

The Two-Pass Workflow

One design choice that saved a lot of noise: defer cosmetic items to a separate pass.

Round 1 fixes P1 Critical and P2 High. Lists P3 items in the commit message under "Deferred."

Round 2 (--deferred) reads the deferred list from the previous commit, checks that each item is still relevant, fixes what remains, and marks stale items. It commits separately.
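
Reading the deferred list back could look like the sketch below. It assumes the previous commit message contains a "Deferred:" header followed by "- " bullets; both the layout and the `parse_deferred` name are assumptions.

```python
def parse_deferred(commit_message: str) -> list[str]:
    """Extract the bullet items under a 'Deferred:' header (assumed layout)."""
    items: list[str] = []
    in_section = False
    for line in commit_message.splitlines():
        stripped = line.strip()
        if stripped == "Deferred:":
            in_section = True
        elif in_section and stripped.startswith("- "):
            items.append(stripped[2:])
        elif in_section:
            break  # a blank line or the next header ends the section
    return items
```

The message itself could come from `git log -1 --format=%B`, which prints the full message of the most recent commit.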

This keeps your main PR focused on what matters, with a clean follow-up for cosmetic cleanup.

Three Ways to Use It

Claude Code (recommended)

curl -fsSL https://raw.githubusercontent.com/GiulioDER/cca-audit/main/claude-code/install.sh | bash
/audit-fix

Codex CLI

bash cca-audit.sh

Any model via OpenRouter

pip install cca-audit
cca-audit --model anthropic/claude-sonnet-4

Results

On a production codebase (Python, ~200 files), a typical run:

  • 6 auditors return ~40-50 raw findings
  • Dedup brings it down to ~15-20 unique
  • P1: 2-3 (usually security or error handling)
  • P2: 5-8 (DRY, stale comments, config)
  • P3: 5-10 (deferred)
  • Tests pass after fixes
  • Architect review: APPROVED on first try ~80% of the time

The non-overlapping scope design is what makes the output actionable. After dedup, each remaining finding is unique and each fix is targeted.

Try It

MIT licensed: github.com/GiulioDER/cca-audit

Feedback welcome, especially on non-Python codebases. The language auto-detection is the newest part and I'd love to hear how it works for TypeScript, Go, and Rust projects.
