DEV Community

Rohit Sriram

I scanned 5 open source repos for bugs. 50 real findings, 4 false positives.

Last week I built Faultmark, an AI code scanner that uses a multi-model debate to verify bugs before surfacing them. One model finds candidate bugs, other models challenge each finding, and anything that can't survive the debate gets dropped. The goal is zero false positives.

To test it before launch, I scanned 5 real open source repos: dub, documenso, cal.com, formbricks, and trigger.dev.

Results across all 5: 54 total findings, 50 real bugs, 4 false positives. Maintainers have already fixed 7 of those bugs or are in the process of merging fixes for them.
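Put as a single number, those totals are a precision (real bugs divided by everything reported) of roughly 92.6%. The snippet below is just a quick check of that arithmetic; `precision` is a helper name made up for this post.

```typescript
// Precision: the share of reported findings that turned out to be real bugs.
function precision(realBugs: number, falsePositives: number): number {
  const total = realBugs + falsePositives;
  if (total === 0) {
    throw new Error("no findings to score");
  }
  return realBugs / total;
}

// Totals from the scan of all 5 repos: 50 real bugs, 4 false positives.
const overall = precision(50, 4);
console.log(`${(overall * 100).toFixed(1)}%`); // prints "92.6%"

// documenso alone: 24 real bugs, 0 false positives.
console.log(`${(precision(24, 0) * 100).toFixed(1)}%`); // prints "100.0%"
```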

Here's what I found in documenso alone, since it had the most findings.

documenso: 24 bugs, 0 false positives

The scan flagged issues across authentication flows, form handling, and error states. All 24 were real bugs, and most had been sitting in the codebase unnoticed. A maintainer pushed fixes for 7 of them within a few days of the issues being filed.

The bugs it surfaced are not obscure edge cases: missing null checks, stale state after form submissions, incorrect error handling, and race conditions that only show up under particular timing. They are the kind of thing that passes code review because everything looks fine at a glance.
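To make "missing null check" concrete, here is a hypothetical TypeScript illustration. The `Recipient` type and both functions are invented for this post, not code from documenso or any other scanned repo. The non-null assertion compiles cleanly and reads fine in review, but throws at runtime when `name` is null.

```typescript
// Hypothetical example; not code from documenso or any other scanned repo.
interface Recipient {
  email: string;
  name: string | null;
}

// Buggy: the non-null assertion (!) silences the compiler, so this passes
// review, but it throws a TypeError at runtime whenever name is null.
function greetingBuggy(r: Recipient): string {
  return `Hi ${r.name!.split(" ")[0]}`;
}

// Fixed: handle a missing or blank name explicitly.
function greeting(r: Recipient): string {
  const first = r.name?.trim().split(/\s+/)[0];
  return first ? `Hi ${first}` : "Hi there";
}

console.log(greeting({ email: "ada@example.com", name: "Ada Lovelace" })); // prints "Hi Ada"
console.log(greeting({ email: "anon@example.com", name: null })); // prints "Hi there"
```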

Why the debate layer matters

Running a single AI model against a codebase produces a lot of noise. You get flagged variables that look suspicious but are fine in context, missing checks that are actually handled upstream, and patterns that match known bugs but do not apply here.
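Here is a hypothetical example of the "handled upstream" case, with every name invented for illustration. Read on its own, `ownerFilter` never checks for a missing session, so a single model scanning that function might flag it. But its only caller runs `requireSession` first, which throws before a null session can reach it.

```typescript
// Hypothetical example; names are invented for illustration.
interface Session {
  userId: string;
}

// Upstream guard: rejects the request before any handler logic runs.
function requireSession(session: Session | null): Session {
  if (session === null) {
    throw new Error("unauthorized");
  }
  return session;
}

// No null check here, which can look like a bug in isolation.
function ownerFilter(session: Session): { ownerId: string } {
  return { ownerId: session.userId };
}

// The only call site: the guard runs first, so ownerFilter never sees null.
function listDocumentsQuery(rawSession: Session | null): { ownerId: string } {
  return ownerFilter(requireSession(rawSession));
}
```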

The debate layer cuts that noise. When two models disagree on a finding, it gets dropped or downgraded. Only the bugs that survive that cross-examination get reported. That is how we got to 4 false positives across 54 findings.
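As a rough illustration of the idea, here is a simplified TypeScript sketch of a debate-style filter, not the production code. Everything in it is an assumption made for this example: the `Verdict` values, the `judge` and `debate` helpers, and the policy (any rejection drops a finding, unanimous confirmation reports it, anything else is downgraded). The challengers are stubs standing in for real model calls.

```typescript
// Simplified sketch of a debate-style filter. The verdict values, the
// helper names and the drop/downgrade policy are assumptions for illustration.
type Verdict = "confirm" | "reject" | "unsure";
type Outcome = "report" | "downgrade" | "drop";

interface Finding {
  id: string;
  file: string;
  claim: string;
}

// A challenger stands in for a second model reviewing one candidate.
// Real model calls would be async; a plain function keeps the sketch small.
type Challenger = (finding: Finding) => Verdict;

// Combine every challenger's verdict into one outcome.
function judge(verdicts: Verdict[]): Outcome {
  if (verdicts.includes("reject")) return "drop"; // any disagreement drops it
  if (verdicts.every((v) => v === "confirm")) return "report"; // survived every challenge
  return "downgrade"; // nobody rejected it, but not everyone confirmed it
}

function debate(candidates: Finding[], challengers: Challenger[]): Record<Outcome, Finding[]> {
  if (challengers.length === 0) {
    throw new Error("need at least one challenger"); // otherwise everything passes unchallenged
  }
  const result: Record<Outcome, Finding[]> = { report: [], downgrade: [], drop: [] };
  for (const finding of candidates) {
    const verdicts = challengers.map((challenge) => challenge(finding));
    result[judge(verdicts)].push(finding);
  }
  return result;
}

// Usage with stub challengers in place of real model calls.
const candidates: Finding[] = [
  { id: "a", file: "auth.ts", claim: "token may be undefined" },
  { id: "b", file: "form.tsx", claim: "state not reset after submit" },
  { id: "c", file: "api.ts", claim: "error swallowed in catch" },
];
const skeptic: Challenger = (f) => (f.id === "a" ? "reject" : "confirm");
const cautious: Challenger = (f) => (f.id === "c" ? "unsure" : "confirm");

const outcome = debate(candidates, [skeptic, cautious]);
console.log(outcome.report.map((f) => f.id)); // b survives both challengers
console.log(outcome.downgrade.map((f) => f.id)); // c: no rejection, one "unsure"
console.log(outcome.drop.map((f) => f.id)); // a: rejected by skeptic
```

Whether a split verdict should drop or only downgrade a finding is a policy choice. Dropping on any rejection, as this sketch does, favors fewer false positives at the cost of occasionally discarding a real bug.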

What is next

Faultmark is live at faultmark.com. The free tier gets 3 scans/month. Pro raises that to unlimited scans and adds deeper coverage and the full debate layer.

If you want to see what it finds on your codebase, try it. If you maintain an open source repo and want a free scan, reply here and I will run one.
