I cut my security tool from 9 detectors to 6, on purpose

#security #ai #devtools #showdev

Last month I almost shipped a security tool that lied about itself.

Not on purpose. I had detectors for SQL injection, XSS, command injection, and path traversal. The kind of list that looks complete on a landing page. The problem was underneath: the model behind those detectors kept finding clever-sounding reasons to clear real bugs, and I had no way to measure how often it was wrong on actual code. So a "finding" was really the model's confidence in itself, dressed up as a result.

The failure that made me cut them

I pointed the detector at a well-built Remix app and watched it flag a dozen routes that were not vulnerable at all. Read in isolation, each route looked unguarded, but the auth check lived one file up, in a parent layout every route inherited. The tool read one file at a time and cried wolf on correct code.

For a security tool, that is the worst failure. A missed bug is quiet. A false positive on safe code is loud, and it burns trust on the first scan.
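
To make the parent-layout problem concrete, here is a minimal sketch of resolving which layouts a route inherits from before judging it. It assumes a simplified reading of Remix's flat-file route naming, where `dashboard.settings` nests inside `dashboard` and a trailing underscore like `dashboard_` opts out. It ignores folder routes and escaped segments. The names `layoutChain`, `inheritsGuard`, and `guardedIds` are hypothetical, not Fixor's code.

```typescript
// Hypothetical helper: list the layouts a route inherits from, nearest first.
// Route IDs are file names under app/routes without the extension.
function layoutChain(routeId: string, allRouteIds: ReadonlySet<string>): string[] {
  const segments = routeId.split(".");
  const chain: string[] = [];
  // Walk dot-delimited prefixes from longest to shortest. A prefix counts
  // as a parent layout only if a route file with that exact ID exists.
  for (let n = segments.length - 1; n >= 1; n--) {
    const prefix = segments.slice(0, n).join(".");
    if (allRouteIds.has(prefix)) chain.push(prefix);
  }
  chain.push("root"); // app/root.tsx wraps every route
  return chain;
}

// Hypothetical check: is there a guard in the route itself or any ancestor?
// guardedIds is assumed to come from a separate pass over each file.
function inheritsGuard(
  routeId: string,
  allRouteIds: ReadonlySet<string>,
  guardedIds: ReadonlySet<string>,
): boolean {
  return [routeId, ...layoutChain(routeId, allRouteIds)].some((id) => guardedIds.has(id));
}

const routes = new Set(["_index", "dashboard", "dashboard.settings", "dashboard_.export"]);
const guarded = new Set(["dashboard"]);

console.log(inheritsGuard("dashboard.settings", routes, guarded)); // true
console.log(inheritsGuard("dashboard_.export", routes, guarded)); // false
```

The second route is the interesting one: it shares a URL prefix with `dashboard` but not its layout, so it does not inherit the guard.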

So I told the model about the parent layouts and added a careful rule to the prompt: never clear a finding unless you can actually verify the route is covered. It did not hold. Given a plausible-looking case, the model cleared the finding anyway and wrote a confident justification for doing so. I tightened the wording. It rationalized past the tighter version too.

That was the lesson. A safety rule you ask an LLM to follow is a suggestion, not a guarantee. The model is built to find a reasonable-sounding reading, so it will find the one that lets it break your rule. The fix was not a better prompt. It was moving the invariant into deterministic code the model cannot argue with, and letting the model judge only what it is genuinely good at.
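
As a rough illustration of what "moving the invariant into code" can look like, here is a minimal sketch. Every name in it (`Finding`, `ModelVerdict`, `finalDecision`, `filesWithVerifiedGuard`) is hypothetical, and it assumes some earlier deterministic pass has already worked out which files contain a verified guard. It is not Fixor's actual implementation.

```typescript
// Hypothetical types; the shapes are assumptions made for this sketch.
type ModelVerdict = {
  decision: "clear" | "report";
  rationale: string; // free text from the model, never treated as proof
};

type Finding = {
  routeFile: string;
  layoutChain: string[]; // parent layout files, resolved by code rather than by the model
};

// The invariant: a finding may be cleared only when code has confirmed
// that the route file or one of its parent layouts contains a guard.
function finalDecision(
  finding: Finding,
  verdict: ModelVerdict,
  filesWithVerifiedGuard: ReadonlySet<string>,
): "clear" | "report" {
  const covered = [finding.routeFile, ...finding.layoutChain].some((file) =>
    filesWithVerifiedGuard.has(file),
  );
  if (verdict.decision === "clear" && !covered) {
    // However convincing the rationale reads, it cannot replace the missing proof.
    return "report";
  }
  // With coverage verified, the model's judgment is allowed to stand either way.
  return verdict.decision;
}

const finding: Finding = {
  routeFile: "routes/admin.users.tsx",
  layoutChain: ["routes/admin.tsx", "root.tsx"],
};
const verdict: ModelVerdict = {
  decision: "clear",
  rationale: "The parent layout almost certainly checks the session.",
};

console.log(finalDecision(finding, verdict, new Set<string>())); // "report"
console.log(finalDecision(finding, verdict, new Set(["routes/admin.tsx"]))); // "clear"
```

The model still gets a vote, but only inside the space the code has already proven safe. A persuasive rationale with no verified guard behind it changes nothing.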

What Fixor actually runs now

Six classes I can stand behind: auth bypass, missing admin gates, IDOR, env and secret exposure, and unverified webhooks. Fixor reads the changed code in a pull request with Claude and posts a comment with the bug and how to fix it. It supports Node, TypeScript, and Python. It sits next to Snyk and Semgrep and reasons about your own logic instead of trying to replace them.

The four injection detectors are gone from the tree, not hidden behind a flag. Depth over a longer list.

The honest part

I have tuned Fixor on test cases, but I have not measured its accuracy on real production code, and I am not going to pretend I have. That is exactly why I am looking for a few design partners.

You install it free on a repo that actually ships PRs. I run the first scan with you and walk through every finding, in writing, on your schedule. You get a direct line to me. In return I want your real read on what was useful and what was noise.

If you ship a Node, TypeScript, or Python app and security is on you alone, this is for you. Drop a comment or reach out and I will send the details.