<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Gde</title>
    <description>The latest articles on DEV Community by Gde (@gde03).</description>
    <link>https://dev.to/gde03</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3940212%2Fce25c655-2594-4344-88d3-392d388be01a.png</url>
      <title>DEV Community: Gde</title>
      <link>https://dev.to/gde03</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/gde03"/>
    <language>en</language>
    <item>
      <title>How We Built a 6-Layer AI Code Audit Pipeline (And Why Each Auditor Has Its Own Scope)</title>
      <dc:creator>Gde</dc:creator>
      <pubDate>Tue, 19 May 2026 12:40:34 +0000</pubDate>
      <link>https://dev.to/gde03/how-we-built-a-6-layer-ai-code-audit-pipeline-and-why-each-auditor-has-its-own-scope-144j</link>
      <guid>https://dev.to/gde03/how-we-built-a-6-layer-ai-code-audit-pipeline-and-why-each-auditor-has-its-own-scope-144j</guid>
      <description>&lt;h2&gt;The Problem&lt;/h2&gt;

&lt;p&gt;You ask an LLM to review your code. It comes back with 30 findings. Half of them overlap. Some contradict each other. You spend more time triaging the audit output than you saved by automating it.&lt;/p&gt;

&lt;p&gt;This is the fundamental problem with single-pass LLM code review: the model tries to check everything at once, with no clear boundaries on what it should and shouldn't flag.&lt;/p&gt;

&lt;h2&gt;The Solution: Non-Overlapping Scopes&lt;/h2&gt;

&lt;p&gt;I solved this by splitting the audit into 6 specialized agents, each with an &lt;strong&gt;exclusive scope&lt;/strong&gt;. The key is the "Does NOT Check" column:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Auditor&lt;/th&gt;
&lt;th&gt;Checks&lt;/th&gt;
&lt;th&gt;Does NOT Check&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Code Quality&lt;/td&gt;
&lt;td&gt;Type safety, DRY, complexity, naming, dead code&lt;/td&gt;
&lt;td&gt;Security, runtime bugs, performance&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bug Scanner&lt;/td&gt;
&lt;td&gt;Null refs, error handling, race conditions, resource leaks&lt;/td&gt;
&lt;td&gt;Security vulnerabilities, code style&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Security&lt;/td&gt;
&lt;td&gt;OWASP Top 10, injection, auth, secrets, CVEs&lt;/td&gt;
&lt;td&gt;Runtime bugs, code quality&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Performance&lt;/td&gt;
&lt;td&gt;Slow queries, hot paths, memory, connection pools&lt;/td&gt;
&lt;td&gt;Security, code style&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Documentation&lt;/td&gt;
&lt;td&gt;Missing docs, stale comments, type annotations&lt;/td&gt;
&lt;td&gt;TODOs, debug statements&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Environment&lt;/td&gt;
&lt;td&gt;Config consistency, format validation, naming&lt;/td&gt;
&lt;td&gt;Secrets (owned by Security)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Security is the &lt;strong&gt;single authority&lt;/strong&gt; for all security findings. The bug scanner handles runtime issues but explicitly avoids anything that's a security vulnerability. This eliminates the most common source of duplicates.&lt;/p&gt;
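
&lt;p&gt;As a rough illustration, the scope table could be encoded as data and rendered into per-auditor instructions. This is a minimal sketch, not the actual cca-audit source: the names &lt;code&gt;SCOPES&lt;/code&gt; and &lt;code&gt;build_scope_prompt&lt;/code&gt; and the prompt wording are assumptions, and only the table contents come from the article.&lt;/p&gt;

```python
# Hypothetical encoding of the scope table; names and prompt wording are assumptions.
SCOPES = {
    "code-quality": {
        "checks": ["type safety", "DRY", "complexity", "naming", "dead code"],
        "excludes": ["security", "runtime bugs", "performance"],
    },
    "bug-scanner": {
        "checks": ["null refs", "error handling", "race conditions", "resource leaks"],
        "excludes": ["security vulnerabilities", "code style"],
    },
    "security": {
        "checks": ["OWASP Top 10", "injection", "auth", "secrets", "CVEs"],
        "excludes": ["runtime bugs", "code quality"],
    },
    "performance": {
        "checks": ["slow queries", "hot paths", "memory", "connection pools"],
        "excludes": ["security", "code style"],
    },
    "documentation": {
        "checks": ["missing docs", "stale comments", "type annotations"],
        "excludes": ["TODOs", "debug statements"],
    },
    "environment": {
        "checks": ["config consistency", "format validation", "naming"],
        "excludes": ["secrets (owned by Security)"],
    },
}


def build_scope_prompt(auditor: str) -> str:
    """Render one auditor's scope as explicit in-scope and out-of-scope instructions."""
    scope = SCOPES[auditor]
    return (
        f"You are the {auditor} auditor.\n"
        f"Report ONLY: {', '.join(scope['checks'])}.\n"
        f"Do NOT report: {', '.join(scope['excludes'])}."
    )
```

&lt;p&gt;Keeping the exclusions next to the checks means every auditor prompt states its boundaries explicitly instead of relying on the model to infer them.&lt;/p&gt;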

&lt;h2&gt;The Pipeline&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Step 0: Detect changed files.&lt;/strong&gt; Works with uncommitted changes, specific commits, or explicit file lists.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 0.5: Auto-detect language.&lt;/strong&gt; Detects Python, TypeScript, Go, Rust, Java, and Ruby from file extensions. Also detects the test runner and linter so the pipeline can re-verify after fixing.&lt;/p&gt;
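
&lt;p&gt;Extension-based detection can be sketched in a few lines. The extension map and the majority-vote rule below are assumptions for illustration; the real tool may use a different mapping or tie-breaking rule.&lt;/p&gt;

```python
from __future__ import annotations

from collections import Counter
from pathlib import PurePath

# Assumed extension map; the actual tool may recognize more extensions.
EXTENSION_TO_LANGUAGE = {
    ".py": "python",
    ".ts": "typescript",
    ".tsx": "typescript",
    ".go": "go",
    ".rs": "rust",
    ".java": "java",
    ".rb": "ruby",
}


def detect_language(changed_files: list[str]) -> str | None:
    """Return the most common known language among the files, or None if none match."""
    suffixes = (PurePath(path).suffix.lower() for path in changed_files)
    counts = Counter(
        EXTENSION_TO_LANGUAGE[suffix]
        for suffix in suffixes
        if suffix in EXTENSION_TO_LANGUAGE
    )
    if not counts:
        return None
    return counts.most_common(1)[0][0]
```

&lt;p&gt;For example, &lt;code&gt;detect_language(["api/app.py", "api/db.py", "web/index.ts"])&lt;/code&gt; picks Python because it has the most matching files.&lt;/p&gt;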

&lt;p&gt;&lt;strong&gt;Step 1: 6 parallel auditors.&lt;/strong&gt; All 6 launch simultaneously. Each gets the same file list and diff, but a different scope and checklist.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 2: Deduplicate.&lt;/strong&gt; Same file:line across auditors = merge into one finding, keep the highest severity.&lt;/p&gt;
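
&lt;p&gt;The merge rule can be sketched as follows. The &lt;code&gt;Finding&lt;/code&gt; shape and the &lt;code&gt;SEVERITY_RANK&lt;/code&gt; table are hypothetical; the only behavior taken from the pipeline description is "same file:line, keep the highest severity."&lt;/p&gt;

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumed ranking: a lower number means a more severe finding.
SEVERITY_RANK = {"P1": 1, "P2": 2, "P3": 3}


@dataclass(frozen=True)
class Finding:
    """Hypothetical finding record; field names are illustrative."""
    file: str
    line: int
    severity: str  # "P1", "P2", or "P3"
    auditor: str
    message: str


def deduplicate(findings: list[Finding]) -> list[Finding]:
    """Keep one finding per (file, line), preferring the most severe one."""
    best: dict[tuple[str, int], Finding] = {}
    for finding in findings:
        key = (finding.file, finding.line)
        current = best.get(key)
        if current is None:
            best[key] = finding
        else:
            # min() returns the first argument on ties, so the earlier finding wins.
            best[key] = min(current, finding, key=lambda f: SEVERITY_RANK[f.severity])
    return list(best.values())
```

&lt;p&gt;Keying on the exact file and line is deliberately strict: two auditors flagging adjacent lines stay as separate findings in this sketch.&lt;/p&gt;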

&lt;p&gt;&lt;strong&gt;Step 3: Prioritize.&lt;/strong&gt; P1 Critical (security, data corruption) = fix before deploy. P2 High (DRY violations, stale comments) = fix now. P3 Nice-to-have (cosmetic) = defer.&lt;/p&gt;
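
&lt;p&gt;One way to picture the tiers is a lookup from finding category to priority. The category names and the default-to-P3 rule below are assumptions; only the three tiers, their examples, and their actions come from the step above.&lt;/p&gt;

```python
from __future__ import annotations

# Assumed category names; only the tiers and their examples come from the article.
PRIORITY_BY_CATEGORY = {
    "security": "P1",
    "data-corruption": "P1",
    "dry-violation": "P2",
    "stale-comment": "P2",
    "cosmetic": "P3",
}

ACTION_BY_PRIORITY = {
    "P1": "fix before deploy",
    "P2": "fix now",
    "P3": "defer",
}


def triage(categories: list[str]) -> dict[str, list[str]]:
    """Group categories into P1/P2/P3 buckets; unknown categories default to P3 (an assumption)."""
    buckets: dict[str, list[str]] = {"P1": [], "P2": [], "P3": []}
    for category in categories:
        buckets[PRIORITY_BY_CATEGORY.get(category, "P3")].append(category)
    return buckets
```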

&lt;p&gt;&lt;strong&gt;Step 4: Auto-fix.&lt;/strong&gt; Implements P1 and P2 fixes with minimal diffs. No refactoring beyond what the audit found.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 5: Re-verify.&lt;/strong&gt; Runs the detected test suite and linter. If tests fail, diagnoses and fixes before continuing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 6: Architect review gate.&lt;/strong&gt; A final reviewer agent assesses the full diff and gives a verdict: APPROVED, REVISE, or BLOCKED.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 7: Commit.&lt;/strong&gt; Structured commit message with P1/P2/P3 breakdown and dedup stats.&lt;/p&gt;

&lt;h2&gt;The Two-Pass Workflow&lt;/h2&gt;

&lt;p&gt;One design choice that saved a lot of noise: defer cosmetic items to a separate pass.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Round 1&lt;/strong&gt; fixes P1 Critical and P2 High findings and lists P3 items in the commit message under "Deferred."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Round 2&lt;/strong&gt; (&lt;code&gt;--deferred&lt;/code&gt;) reads the deferred list from the previous commit, checks that each item is still relevant, fixes what remains, and marks stale items. It commits separately.&lt;/p&gt;

&lt;p&gt;This keeps your main PR focused on what matters, with a clean follow-up for cosmetic cleanup.&lt;/p&gt;
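
&lt;p&gt;Reading the deferred list back could look roughly like this. The commit message layout (a "Deferred" heading followed by "- " bullets) is an assumption made for the sketch, as is the function name &lt;code&gt;parse_deferred&lt;/code&gt;; the actual format used by the tool may differ.&lt;/p&gt;

```python
from __future__ import annotations


def parse_deferred(commit_message: str) -> list[str]:
    """Extract bullet items listed under a 'Deferred' heading (assumed message format)."""
    items: list[str] = []
    in_section = False
    for raw_line in commit_message.splitlines():
        line = raw_line.strip()
        if line.lower().startswith("deferred"):
            in_section = True
        elif in_section and line.startswith("- "):
            items.append(line[2:])
        elif in_section and line:
            # Any other non-empty line ends the deferred section.
            in_section = False
    return items
```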

&lt;h2&gt;Three Ways to Use It&lt;/h2&gt;

&lt;h3&gt;Claude Code (recommended)&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://raw.githubusercontent.com/GiulioDER/cca-audit/main/claude-code/install.sh | bash
/audit-fix
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;Codex CLI&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;bash cca-audit.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;Any model via OpenRouter&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;cca-audit
cca-audit &lt;span class="nt"&gt;--model&lt;/span&gt; anthropic/claude-sonnet-4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;Results&lt;/h2&gt;

&lt;p&gt;On a production codebase (Python, ~200 files), a typical run:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;6 auditors return ~40-50 raw findings&lt;/li&gt;
&lt;li&gt;Dedup brings it down to ~15-20 unique&lt;/li&gt;
&lt;li&gt;P1: 2-3 (usually security or error handling)&lt;/li&gt;
&lt;li&gt;P2: 5-8 (DRY, stale comments, config)&lt;/li&gt;
&lt;li&gt;P3: 5-10 (deferred)&lt;/li&gt;
&lt;li&gt;Tests pass after fixes&lt;/li&gt;
&lt;li&gt;Architect review: APPROVED on first try ~80% of the time&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The non-overlapping scope design is what makes the output actionable: after dedup, every finding is unique and every fix is targeted.&lt;/p&gt;

&lt;h2&gt;Try It&lt;/h2&gt;

&lt;p&gt;MIT licensed: &lt;a href="https://github.com/GiulioDER/cca-audit" rel="noopener noreferrer"&gt;github.com/GiulioDER/cca-audit&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Feedback welcome, especially on non-Python codebases. The language auto-detection is the newest part and I'd love to hear how it works for TypeScript, Go, and Rust projects.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>productivity</category>
      <category>programming</category>
    </item>
  </channel>
</rss>
