A traditional security scanner reports 100 vulnerabilities, and 95 are false positives. Devin Security Swarm reports 5, all real, because it verified each one in a sandbox.
What Is It
Cognition released an agent-based security scanning system for large codebases. Its core innovation is Agentic MapReduce: the codebase is split into regions, each agent scans its region independently, findings are verified in a sandbox, and fix PRs are submitted automatically.
Test Results on 50 Real GHSA Vulnerabilities
| Metric | Devin Swarm | Traditional SAST |
|---|---|---|
| Recall | 72% | ~45% |
| Precision | ~90% | ~20% |
| Cost/run | ~ | ~ (but high verification cost) |
| Auto-fix PRs | Yes | No |
72% recall means it found 26 of 36 real vulnerabilities. A traditional SAST tool might report 200 "vulnerabilities" of which only 10 are real.
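To make the two metrics concrete, here is a minimal sketch that recomputes them from the counts quoted above. The function names are illustrative and not part of any tool. Note that the 200-report illustration works out to a lower precision than the ~20% shown in the table.

```python
def precision(true_positives: int, reported: int) -> float:
    """Fraction of reported findings that are real."""
    return true_positives / reported if reported else 0.0


def recall(true_positives: int, actual: int) -> float:
    """Fraction of real vulnerabilities that were found."""
    return true_positives / actual if actual else 0.0


# Counts quoted in the text above.
swarm_recall = recall(true_positives=26, actual=36)
sast_precision = precision(true_positives=10, reported=200)

print(f"Swarm recall: {swarm_recall:.0%}")                    # Swarm recall: 72%
print(f"SAST precision (200-report example): {sast_precision:.0%}")  # 5%
```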
Why Precision Matters More Than Recall
Security teams' real pain: too many false positives wasting verification time.
100 reports → 80 false positives → 3 days verifying → exhaustion → ignoring all reports
vs 5-10 reports → 1 day → high trust in each one
Security scanner value = precision × developer trust.
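As a rough illustration of the triage burden, the sketch below estimates how many reports a reviewer checks and then discards. It uses the approximate precision figures from the table; the helper name is hypothetical.

```python
def expected_false_positives(reports: int, precision: float) -> int:
    """Number of reports a reviewer would check and then discard."""
    return round(reports * (1.0 - precision))


# 100 reports at ~20% precision (the Traditional SAST column in the table).
sast_noise = expected_false_positives(100, 0.20)
# 10 reports at ~90% precision (the Devin Swarm column).
swarm_noise = expected_false_positives(10, 0.90)

print(sast_noise, swarm_noise)  # 80 1
```

At ~20% precision, the 100-report case yields the 80 false positives mentioned above.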
Agentic MapReduce
Map phase: the codebase is split into regions, and each agent scans its region independently.
Reduce phase: findings are aggregated, deduplicated, and prioritized.
Verify phase: each vulnerability is actually tested in a sandbox. The question is "can it be exploited?", not "is it possibly exploitable?"
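The three phases can be sketched as a small pipeline. This is a toy illustration under stated assumptions, not Cognition's implementation: the `Finding` type, the string-matching "scanner", the severity scale, and the `exploit_check` callback standing in for the sandbox are all hypothetical, and the regions are scanned sequentially here rather than by parallel agents.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Finding:
    path: str
    rule: str
    severity: int  # higher means more severe (hypothetical scale)


def split_into_regions(files: Dict[str, str], region_size: int) -> List[Dict[str, str]]:
    """Map setup: partition the codebase into fixed-size groups of files."""
    paths = sorted(files)
    return [
        {p: files[p] for p in paths[i:i + region_size]}
        for i in range(0, len(paths), region_size)
    ]


def scan_region(region: Dict[str, str]) -> List[Finding]:
    """Stand-in for one agent: flags toy patterns instead of doing real analysis."""
    findings = []
    for path, source in region.items():
        if "eval(" in source:
            findings.append(Finding(path, "unsafe-eval", severity=3))
        if "md5(" in source:
            findings.append(Finding(path, "weak-hash", severity=1))
    return findings


def reduce_findings(per_region: List[List[Finding]]) -> List[Finding]:
    """Reduce: merge, deduplicate, and sort by severity (highest first)."""
    unique = {f for findings in per_region for f in findings}
    return sorted(unique, key=lambda f: (-f.severity, f.path, f.rule))


def verify(findings: List[Finding], exploit_check: Callable[[Finding], bool]) -> List[Finding]:
    """Verify: keep only findings the (hypothetical) sandbox check confirms."""
    return [f for f in findings if exploit_check(f)]


files = {
    "app/views.py": "result = eval(user_input)",
    "app/utils.py": "digest = md5(data)",
    "app/models.py": "class User: pass",
}
regions = split_into_regions(files, region_size=2)
candidates = reduce_findings([scan_region(r) for r in regions])
# Assumed sandbox outcome: only the eval finding turns out to be exploitable.
confirmed = verify(candidates, exploit_check=lambda f: f.rule == "unsafe-eval")
print([f.path for f in confirmed])  # ['app/views.py']
```

The verify step is what trades report volume for precision: two candidates go in, and only the one confirmed as exploitable is reported.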
Limitations
- 72% recall ≠ 100%, so it can't fully replace a manual audit
- Per-run cost is high for small projects
- Sandbox can't reproduce vulnerabilities requiring specific infrastructure
- Best for large codebases - traditional SAST suffices for small projects
Developer Impact
- CI/CD integration as GitHub Action
- Sandbox verification removes "is this actually exploitable" uncertainty
- Auto-fix PRs - not just reports, but fix code
- For large codebases: 100x cheaper than manual audit
Bilingual version at wdsega.github.io