<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Aaron Sood</title>
    <description>The latest articles on DEV Community by Aaron Sood (@aaronsood10).</description>
    <link>https://dev.to/aaronsood10</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3909970%2Facd7e7c5-7919-4591-a6f9-a8e86876dc11.gif</url>
      <title>DEV Community: Aaron Sood</title>
      <link>https://dev.to/aaronsood10</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/aaronsood10"/>
    <language>en</language>
    <item>
      <title>I Built a Multi-Agent AI Pen Tester Because AI Coding Tools Are Shipping Vulnerable Code</title>
      <dc:creator>Aaron Sood</dc:creator>
      <pubDate>Sun, 03 May 2026 07:35:14 +0000</pubDate>
      <link>https://dev.to/aaronsood10/i-built-a-multi-agent-ai-pen-tester-because-ai-coding-tools-are-shipping-vulnerable-code-6bd</link>
      <guid>https://dev.to/aaronsood10/i-built-a-multi-agent-ai-pen-tester-because-ai-coding-tools-are-shipping-vulnerable-code-6bd</guid>
      <description>&lt;p&gt;AI coding assistants are everywhere. Developers are shipping code faster than ever using Claude, Copilot, and Cursor.&lt;br&gt;
They're also shipping SQL injection, hardcoded secrets, broken authentication, and XSS faster than ever.&lt;br&gt;
The problem is obvious once you think about it: AI tools optimize for working code, not secure code. They'll write a login form that functions perfectly and is trivially bypassable with ' OR 1=1--. They'll hardcode an API key because it's the fastest way to make the demo work. They'll skip input validation because you didn't ask for it.&lt;br&gt;
Most solo developers and small teams will never hire a penetration tester. A basic pen test costs $500–$2,000 and takes weeks to schedule. So the vulnerabilities just ship.&lt;br&gt;
I built VulnSwarm to fix that.&lt;/p&gt;
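&lt;p&gt;To make the login bypass mentioned above concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table layout, the sample user, and both function names are assumptions made up for this illustration; they are not taken from VulnSwarm or from any real app.&lt;/p&gt;

```python
import sqlite3

def make_db():
    # In-memory database with one hypothetical user for the demo.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (email TEXT, password TEXT)")
    conn.execute("INSERT INTO users VALUES ('admin@example.com', 's3cret')")
    return conn

def login_vulnerable(conn, email, password):
    # String formatting puts attacker input straight into the SQL text.
    query = (
        "SELECT email FROM users "
        f"WHERE email = '{email}' AND password = '{password}'"
    )
    return conn.execute(query).fetchone() is not None

def login_safe(conn, email, password):
    # Placeholders keep the input as data, never as SQL syntax.
    query = "SELECT email FROM users WHERE email = ? AND password = ?"
    return conn.execute(query, (email, password)).fetchone() is not None

conn = make_db()
payload = "' OR 1=1--"
print(login_vulnerable(conn, payload, "anything"))  # True: login bypassed
print(login_safe(conn, payload, "anything"))        # False: payload is just a string
```

&lt;p&gt;In the vulnerable version the payload closes the quoted email, adds a condition that is always true, and comments out the password check. The parameterized version treats the same payload as an ordinary, non-matching email address.&lt;/p&gt;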

&lt;p&gt;What VulnSwarm Does&lt;br&gt;
VulnSwarm deploys a swarm of specialized AI agents that mirror a real penetration testing team. Instead of one model trying to do everything, each agent has a distinct role:&lt;br&gt;
🔭 Recon Agent — maps the attack surface. Identifies entry points, fingerprints the tech stack, flags the highest-risk areas.&lt;br&gt;
💥 Exploit Agent — takes the recon and determines what's actually exploitable. Rates each finding by severity, exploitability, and impact. Assigns CVSS-like scores.&lt;br&gt;
🗡️ Red Team Agent — thinks like an attacker. Chains vulnerabilities together into realistic attack paths. Finds the worst-case scenario.&lt;br&gt;
🛡️ Blue Team Agent — the defender. Takes everything the red team found and writes specific, code-level fixes. Prioritizes by effort vs. impact.&lt;br&gt;
📄 Report Agent — synthesizes everything into a professional penetration testing report with an overall risk score, severity breakdown, and remediation roadmap.&lt;br&gt;
The agents debate each other. The red team challenges the exploit analysis. The blue team pushes back on severity ratings. The result is more nuanced than any single model pass.&lt;/p&gt;
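&lt;p&gt;One way to read the CVSS-like scores the Exploit Agent assigns is to bucket them with the qualitative severity bands from the CVSS v3.x specification. The sketch below assumes the scores sit on the same 0.0 to 10.0 scale; the post only says "CVSS-like", so the function name and the mapping are illustrative rather than VulnSwarm's actual logic.&lt;/p&gt;

```python
from bisect import bisect_right

# Lower bounds of the CVSS v3.x qualitative bands:
# None 0.0, Low 0.1-3.9, Medium 4.0-6.9, High 7.0-8.9, Critical 9.0-10.0
BOUNDS = [0.1, 4.0, 7.0, 9.0]
LABELS = ["None", "Low", "Medium", "High", "Critical"]

def severity(score):
    # Clamping changes the value only when it lies outside 0.0-10.0.
    if score != min(max(score, 0.0), 10.0):
        raise ValueError("score must be between 0.0 and 10.0")
    # Count how many band boundaries the score has reached.
    return LABELS[bisect_right(BOUNDS, score)]

print(severity(9.0))  # Critical
print(severity(5.3))  # Medium
```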

&lt;p&gt;Testing It on OWASP Juice Shop&lt;br&gt;
To test VulnSwarm, I pointed it at OWASP Juice Shop — a deliberately vulnerable web app designed for security testing practice.&lt;br&gt;
I also tested it manually first. In about 30 seconds I:&lt;/p&gt;

&lt;p&gt;Logged in as admin using ' OR 1=1-- in the email field&lt;br&gt;
Accessed the admin panel at /administration&lt;br&gt;
Retrieved 21 user email addresses&lt;br&gt;
Found an exposed crypto wallet seed phrase in customer feedback&lt;/p&gt;

&lt;p&gt;Then I ran VulnSwarm. Here's what it found automatically:&lt;br&gt;
Risk Score: CRITICAL (90/100)&lt;/p&gt;

&lt;p&gt;🔴 File Upload Endpoints — CVSS 9.0&lt;br&gt;
   Exploitable to inject malicious code or exfiltrate sensitive data.&lt;/p&gt;

&lt;p&gt;🔴 Unvalidated API Endpoints — CVSS 9.0&lt;br&gt;
   API endpoints lack input validation and sanitization.&lt;/p&gt;

&lt;p&gt;🟠 Missing Content-Security-Policy — CVSS 5.3&lt;br&gt;
🟠 Missing Strict-Transport-Security — CVSS 5.3&lt;br&gt;
🟠 Missing X-XSS-Protection — CVSS 5.3&lt;br&gt;
🟠 Missing Referrer-Policy — CVSS 5.3&lt;br&gt;
🟠 Missing Permissions-Policy — CVSS 5.3&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;This ran in about 15 minutes on a CPU-only VPS using llama3.2:3b. Larger models produce deeper findings: the SQL injection I found manually would have been caught by qwen2.5:14b or Claude.&lt;/p&gt;
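&lt;p&gt;The five missing-header findings are the kind of check that is easy to reproduce by hand. Here is a rough sketch, assuming the response headers are already available as a mapping from whatever HTTP client you use. The function name and the sample response are hypothetical, and this is not how VulnSwarm itself performs the check.&lt;/p&gt;

```python
# Headers reported as missing in the scan output above.
EXPECTED_HEADERS = [
    "Content-Security-Policy",
    "Strict-Transport-Security",
    "X-XSS-Protection",
    "Referrer-Policy",
    "Permissions-Policy",
]

def missing_security_headers(response_headers):
    # HTTP header names are case-insensitive, so compare in lowercase.
    present = {name.lower() for name in response_headers}
    return [h for h in EXPECTED_HEADERS if h.lower() not in present]

# Hypothetical response that only sets a referrer policy.
sample = {"Content-Type": "text/html", "referrer-policy": "no-referrer"}
print(missing_security_headers(sample))
```

&lt;p&gt;For the sample response, every expected header except Referrer-Policy is reported as missing.&lt;/p&gt;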

&lt;p&gt;How the Multi-Agent Architecture Works&lt;br&gt;
The key insight is that security analysis benefits from multiple perspectives arguing with each other — the same way a real security team works.&lt;br&gt;
A single model asked "find vulnerabilities in this app" will produce a list. It won't challenge its own assumptions. It won't think about how vulnerabilities chain together. It won't prioritize fixes by what a developer can actually implement today.&lt;br&gt;
The agent pipeline forces specialization:&lt;/p&gt;

&lt;pre&gt;
Your Code/App
     │
     ▼
┌──────────┐    ┌───────────┐    ┌──────────┐    ┌─────────┐
│  Recon   │───▶│  Exploit  │───▶│ Red Team │───▶│  Blue   │
│  Agent   │    │   Agent   │    │  Agent   │    │  Team   │
└──────────┘    └───────────┘    └──────────┘    └────┬────┘
                                                       │
                                                       ▼
                                                 ┌──────────┐
                                                 │  Report  │
                                                 │  Agent   │
                                                 └──────────┘
&lt;/pre&gt;

&lt;p&gt;
Each agent only sees what it needs to. The exploit agent doesn't know about fixes — it just finds problems. The blue team agent doesn't know about attack chains — it just writes solutions. The report agent synthesizes everything into something a developer or CTO can actually act on.&lt;/p&gt;
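&lt;p&gt;The idea that each agent only sees what it needs can be sketched as a pipeline that hands every stage a filtered view of earlier results. Everything here is an assumption for illustration: the stage names, the visibility lists, and the stub that stands in for a model call. In particular, limiting the blue team to the exploit output is one reading of the paragraph above, not a description of VulnSwarm's code.&lt;/p&gt;

```python
# Which earlier outputs each stage may read (an assumption for this sketch).
PIPELINE = [
    ("recon", ["target"]),
    ("exploit", ["recon"]),
    ("red_team", ["recon", "exploit"]),
    ("blue_team", ["exploit"]),
    ("report", ["recon", "exploit", "red_team", "blue_team"]),
]

def stub_agent(name, context):
    # Stand-in for a model call: records which inputs the agent saw.
    return f"{name} output from {sorted(context)}"

def run_pipeline(target, call_agent=stub_agent):
    results = {"target": target}
    for name, visible in PIPELINE:
        # Pass only the keys this stage is allowed to see.
        context = {key: results[key] for key in visible}
        results[name] = call_agent(name, context)
    return results

results = run_pipeline("http://localhost:3000")
print(results["blue_team"])  # blue_team output from ['exploit']
```

&lt;p&gt;Swapping stub_agent for a function that calls a real model keeps the scoping intact, because the filtering happens before the call.&lt;/p&gt;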

&lt;p&gt;Running It Yourself&lt;br&gt;
VulnSwarm supports Claude, GPT-4o, Gemini, OpenRouter, and Ollama. If you want to run it completely free and locally:&lt;br&gt;
&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;git clone https://github.com/aaronsood/VulnSwarm.git
cd VulnSwarm
pip install -r requirements.txt

# Pull a local model
ollama pull llama3.2:3b

# Run it
python -m cli.main
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;For web app scanning, spin up a test target first:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;docker run --rm -p 3000:3000 bkimminich/juice-shop
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Then point VulnSwarm at &lt;a href="http://localhost:3000" rel="noopener noreferrer"&gt;http://localhost:3000&lt;/a&gt;.&lt;br&gt;
Web scanning is localhost-only by default — VulnSwarm won't touch anything you don't own.&lt;/p&gt;
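&lt;p&gt;A localhost-only guard of this kind could look roughly like the sketch below. The allow-list and the function name are hypothetical, chosen for illustration; VulnSwarm's real check may work differently.&lt;/p&gt;

```python
from urllib.parse import urlsplit

# Hosts treated as local in this sketch.
ALLOWED_HOSTS = {"localhost", "127.0.0.1", "::1"}

def is_allowed_target(url):
    parts = urlsplit(url)
    # urlsplit lowercases the hostname and drops the port and any brackets.
    return parts.scheme in ("http", "https") and parts.hostname in ALLOWED_HOSTS

print(is_allowed_target("http://localhost:3000"))       # True
print(is_allowed_target("http://localhost.evil.com/"))  # False
```

&lt;p&gt;Comparing the parsed hostname, rather than searching the URL string for "localhost", is what rejects lookalike hosts. A check like this does not cover public hostnames that resolve to a local address.&lt;/p&gt;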

&lt;p&gt;What It Doesn't Do (Yet)&lt;br&gt;
VulnSwarm is early. It's a first pass, not a replacement for a professional security team.&lt;br&gt;
It misses zero-days. It won't find novel attack chains that require deep business logic understanding. Smaller models miss things that larger models catch. It doesn't yet integrate with CI/CD pipelines or GitHub Actions.&lt;br&gt;
The roadmap includes all of that. For now it targets the problem that matters most: the many developers who ship with no security review and no budget to get one.&lt;/p&gt;

&lt;p&gt;The Bigger Picture&lt;br&gt;
There's something poetic about using AI to find the vulnerabilities that AI introduced. As AI coding tools become the default way software gets written, AI security tooling needs to keep pace.&lt;br&gt;
VulnSwarm is open source, MIT licensed, and early. If you're in security or AI tooling, contributions are very welcome.&lt;br&gt;
GitHub: github.com/aaronsood/VulnSwarm&lt;/p&gt;

&lt;p&gt;Built and tested on a Saturday with a CPU-only VPS, a deliberately hackable web app, and too much coffee.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>security</category>
      <category>python</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
