Gde

Posted on May 19

How We Built a 6-Layer AI Code Audit Pipeline (And Why Each Auditor Has Its Own Scope)

#ai #productivity #programming #opensource

The Problem

You ask an LLM to review your code. It comes back with 30 findings. Half of them overlap. Some contradict each other. You spend more time triaging the audit output than you saved by automating it.

This is the fundamental problem with single-pass LLM code review: the model tries to check everything at once, with no clear boundaries on what it should and shouldn't flag.

The Solution: Non-Overlapping Scopes

I solved this by splitting the audit into 6 specialized agents, each with an exclusive scope. The key is the "Does NOT Check" column:

Auditor	Checks	Does NOT Check
Code Quality	Type safety, DRY, complexity, naming, dead code	Security, runtime bugs, performance
Bug Scanner	Null refs, error handling, race conditions, resource leaks	Security vulnerabilities, code style
Security	OWASP Top 10, injection, auth, secrets, CVEs	Runtime bugs, code quality
Performance	Slow queries, hot paths, memory, connection pools	Security, code style
Documentation	Missing docs, stale comments, type annotations	TODOs, debug statements
Environment	Config consistency, format validation, naming	Secrets (owned by Security)

Security is the single authority for all security findings. The bug scanner handles runtime issues but explicitly avoids anything that's a security vulnerability. This eliminates the most common source of duplicates.
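One way to keep the "Does NOT Check" column enforceable is to store each scope as data and render it into the auditor's prompt. The sketch below is an illustration under that assumption, not the project's actual code; `AuditorScope`, `SCOPES`, `build_prompt`, and the prompt wording are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditorScope:
    name: str
    checks: tuple[str, ...]
    does_not_check: tuple[str, ...]


# Two rows from the table above; the other four would follow the same shape.
SCOPES = [
    AuditorScope(
        name="Bug Scanner",
        checks=("null refs", "error handling", "race conditions", "resource leaks"),
        does_not_check=("security vulnerabilities", "code style"),
    ),
    AuditorScope(
        name="Security",
        checks=("OWASP Top 10", "injection", "auth", "secrets", "CVEs"),
        does_not_check=("runtime bugs", "code quality"),
    ),
]


def build_prompt(scope: AuditorScope) -> str:
    """Render a scope into prompt text (hypothetical wording)."""
    return (
        f"You are the {scope.name} auditor.\n"
        f"Report ONLY: {', '.join(scope.checks)}.\n"
        f"Do NOT report: {', '.join(scope.does_not_check)}."
    )
```

Keeping the exclusions next to the inclusions in one record makes it harder for the two to drift apart when a scope changes.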

The Pipeline

Step 0: Detect changed files. Works with uncommitted changes, specific commits, or explicit file lists.
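A minimal sketch of the three input modes, assuming the pipeline shells out to git. The function names are hypothetical and the tool itself may detect changes differently. Note that `git diff --name-only HEAD` lists modified tracked files only, so untracked files would need separate handling.

```python
import subprocess


def diff_command(commit=None):
    """Build the git command that lists changed files.

    No commit: compare the working tree against HEAD (uncommitted changes).
    With a commit: list the files that commit touched.
    """
    if commit is None:
        return ["git", "diff", "--name-only", "HEAD"]
    return ["git", "diff-tree", "--no-commit-id", "--name-only", "-r", commit]


def changed_files(commit=None, explicit=None):
    # An explicit file list bypasses git entirely.
    if explicit:
        return list(explicit)
    out = subprocess.run(
        diff_command(commit), capture_output=True, text=True, check=True
    )
    return [line for line in out.stdout.splitlines() if line]
```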

Step 0.5: Auto-detect language. Detects Python, TypeScript, Go, Rust, Java, and Ruby from file extensions. Also detects the test runner and linter so the pipeline can re-verify after fixing.
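Extension-based detection can be as simple as counting suffixes and taking the majority. This is a sketch under that assumption; the mapping and the `detect_language` name are illustrative, and it does not cover test runner or linter detection.

```python
from collections import Counter
from pathlib import PurePosixPath

# Assumed mapping; the real tool may recognize more extensions.
EXTENSION_TO_LANGUAGE = {
    ".py": "Python",
    ".ts": "TypeScript",
    ".tsx": "TypeScript",
    ".go": "Go",
    ".rs": "Rust",
    ".java": "Java",
    ".rb": "Ruby",
}


def detect_language(paths):
    """Return the language with the most matching files, or None."""
    suffixes = (PurePosixPath(p).suffix.lower() for p in paths)
    counts = Counter(
        EXTENSION_TO_LANGUAGE[ext] for ext in suffixes if ext in EXTENSION_TO_LANGUAGE
    )
    if not counts:
        return None
    return counts.most_common(1)[0][0]
```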

Step 1: 6 parallel auditors. All 6 launch simultaneously. Each gets the same file list and diff, but a different scope and checklist.
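The fan-out could be sketched with `asyncio.gather`, assuming each auditor is an async call to a model. `run_auditor` here is a hypothetical stub that returns no findings; a real one would send the scoped prompt plus the diff.

```python
import asyncio

AUDITORS = [
    "Code Quality", "Bug Scanner", "Security",
    "Performance", "Documentation", "Environment",
]


async def run_auditor(name, files, diff):
    """Stand-in for one model call (hypothetical)."""
    await asyncio.sleep(0)  # placeholder for network latency
    return {"auditor": name, "findings": []}


async def run_all(files, diff):
    # gather() runs all six coroutines concurrently and returns
    # results in the same order as AUDITORS.
    return await asyncio.gather(
        *(run_auditor(name, files, diff) for name in AUDITORS)
    )


if __name__ == "__main__":
    results = asyncio.run(run_all(["app.py"], "diff text here"))
    print(len(results))
```

Every auditor receives identical inputs, so the only thing that varies per call is the scope.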

Step 2: Deduplicate. Findings from different auditors that point at the same file:line are merged into one finding, keeping the highest severity.
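The merge rule can be sketched as a dictionary keyed on (file, line). The finding fields and the high/medium/low severity scale below are assumptions made for illustration; the post does not specify the actual schema.

```python
# Assumed severity scale; lower rank means more severe.
SEVERITY_RANK = {"high": 0, "medium": 1, "low": 2}


def deduplicate(findings):
    """Merge findings that share file and line, keeping the highest severity."""
    merged = {}
    for f in findings:
        key = (f["file"], f["line"])
        if key not in merged:
            merged[key] = {
                "file": f["file"],
                "line": f["line"],
                "severity": f["severity"],
                "message": f["message"],
                "auditors": [f["auditor"]],
            }
            continue
        kept = merged[key]
        kept["auditors"].append(f["auditor"])
        # Replace severity and message only when the new finding is more severe.
        if SEVERITY_RANK[f["severity"]] < SEVERITY_RANK[kept["severity"]]:
            kept["severity"] = f["severity"]
            kept["message"] = f["message"]
    return list(merged.values())
```

Recording which auditors reported each merged finding keeps the dedup stats available for the commit message later.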

Step 3: Prioritize. P1 Critical (security, data corruption) = fix before deploy. P2 High (DRY violations, stale comments) = fix now. P3 Nice-to-have (cosmetic) = defer.
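As a rough sketch, prioritization can be a lookup from finding category to priority level. The category keys below are hypothetical labels for the examples named in this step, and defaulting unknown categories to P3 is an assumption.

```python
# Hypothetical category labels mapped to the three priority levels.
PRIORITY_BY_CATEGORY = {
    "security": "P1",
    "data_corruption": "P1",
    "dry_violation": "P2",
    "stale_comment": "P2",
    "cosmetic": "P3",
}

ACTION = {"P1": "fix before deploy", "P2": "fix now", "P3": "defer"}


def prioritize(findings):
    """Group findings into P1/P2/P3 buckets by category."""
    buckets = {"P1": [], "P2": [], "P3": []}
    for f in findings:
        # Unknown categories fall back to P3; a stricter pipeline might raise.
        level = PRIORITY_BY_CATEGORY.get(f["category"], "P3")
        buckets[level].append(f)
    return buckets
```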

Step 4: Auto-fix. Implements P1 and P2 fixes with minimal diffs. No refactoring beyond what the audit found.

Step 5: Re-verify. Runs the detected test suite and linter. If tests fail, diagnoses and fixes before continuing.
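The re-verify step amounts to running the detected commands and collecting the ones that fail. A minimal sketch, assuming the commands are available as argument lists; `reverify` is a hypothetical name, and the diagnose-and-fix loop is left out.

```python
import subprocess
import sys


def reverify(commands):
    """Run each command and return those that exited non-zero."""
    failures = []
    for cmd in commands:
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            failures.append(cmd)
    return failures


if __name__ == "__main__":
    # For a Python project the detected commands might be, for example,
    # ["pytest", "-q"] and ["ruff", "check", "."]. A trivial command is
    # used here so the demo runs anywhere.
    print(reverify([[sys.executable, "-c", "pass"]]))
```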

Step 6: Architect review gate. A final reviewer agent assesses the full diff and gives a verdict: APPROVED, REVISE, or BLOCKED.

Step 7: Commit. Structured commit message with P1/P2/P3 breakdown and dedup stats.
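The exact commit format is not shown in this post, so the layout below is an assumption: a summary line, a dedup line, then one bulleted section per priority, with P3 items under "Deferred". `build_commit_message` is a hypothetical helper.

```python
def build_commit_message(fixed_p1, fixed_p2, deferred_p3, raw_count, unique_count):
    """Assemble a structured commit message (assumed layout)."""
    lines = [
        f"audit: fix {len(fixed_p1)} P1, {len(fixed_p2)} P2; "
        f"defer {len(deferred_p3)} P3",
        "",
        f"Dedup: {raw_count} raw findings -> {unique_count} unique",
        "",
    ]
    sections = (
        ("P1 Critical", fixed_p1),
        ("P2 High", fixed_p2),
        ("Deferred", deferred_p3),
    )
    for title, items in sections:
        lines.append(f"{title}:")
        lines.extend(f"- {item}" for item in items)
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"
```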

The Two-Pass Workflow

One design choice that saved a lot of noise: defer cosmetic items to a separate pass.

Round 1 fixes P1 Critical and P2 High. Lists P3 items in the commit message under "Deferred."

Round 2 (--deferred) reads the deferred list from the previous commit, checks that each item is still relevant, fixes what remains, and marks stale items. It commits separately.
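Reading the deferred list back could look like the sketch below. It assumes the commit message contains a "Deferred:" header followed by "- " bullets, which is an illustrative format rather than the tool's documented one; `parse_deferred` is a hypothetical name.

```python
def parse_deferred(commit_message):
    """Extract bullet items listed under a 'Deferred:' header."""
    items = []
    in_section = False
    for line in commit_message.splitlines():
        stripped = line.strip()
        if stripped == "Deferred:":
            in_section = True
        elif in_section and stripped.startswith("- "):
            items.append(stripped[2:])
        elif in_section and stripped:
            break  # any other non-blank line ends the section
    return items
```

The message of the previous commit can be obtained with `git log -1 --format=%B` and passed in as a string.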

This keeps your main PR focused on what matters, with a clean follow-up for cosmetic cleanup.

Three Ways to Use It

Claude Code (recommended)

curl -fsSL https://raw.githubusercontent.com/GiulioDER/cca-audit/main/claude-code/install.sh | bash
/audit-fix

Codex CLI

bash cca-audit.sh

Any model via OpenRouter

pip install cca-audit
cca-audit --model anthropic/claude-sonnet-4

Results

On a production codebase (Python, ~200 files), a typical run:

6 auditors return ~40-50 raw findings
Dedup brings it down to ~15-20 unique
P1: 2-3 (usually security or error handling)
P2: 5-8 (DRY, stale comments, config)
P3: 5-10 (deferred)
Tests pass after fixes
Architect review: APPROVED on first try ~80% of the time

The non-overlapping scope design is what makes the output actionable. After deduplication each finding appears once, and each fix targets a specific finding.

Try It

MIT licensed: github.com/GiulioDER/cca-audit

Feedback welcome, especially on non-Python codebases. The language auto-detection is the newest part and I'd love to hear how it works for TypeScript, Go, and Rust projects.
