Most code-scanning tools stop at "we found a vulnerability." That's the easy part.
The hard part — the part nobody talks about until they try to ship it — is everything that happens between "vulnerability detected" and "PR a maintainer will actually merge." Tests passing. Style matching. The fix actually fixing the thing. The fix not breaking anything else. A PR description a maintainer can verify in 30 seconds.
We work on this problem at Kolega, and we want to walk through what's in that gap honestly — including the parts we got wrong.
The five-stage pipeline
Every auto-generated fix in our system goes through five stages:
Detect with context — find the vuln, but also understand the code around it
Generate a candidate fix — LLM-assisted, but heavily constrained
Validate correctness — does it compile, do tests pass, is the vuln actually gone
Match the project's style — formatting, naming, patterns
Decide whether to ship it at all — some fixes shouldn't be automated
If any stage fails, the PR doesn't get opened. We'd rather open zero PRs than one bad one. A week of bad PRs will get your bot blocked; a week of no PRs won't.
Stage 1: Detection with context
Static analysis tools will tell you "user input flows into eval() on line 47." That's true. It's also basically useless on its own, because it doesn't tell you:
What eval() is being used for
Whether the input is already sanitised upstream
What the function's contract is (does it need to handle strings, or always JSON?)
Whether replacing eval() with JSON.parse() would break legitimate callers
Without this context, an LLM asked to "fix this" will generate something that compiles, looks reasonable, and is wrong.
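To make that concrete, here's an invented example of the failure mode. Everything about it is illustrative (the function names are made up), but the shape is real: the swap satisfies the scanner and the compiler, and still breaks an existing caller.

```typescript
// Hypothetical illustration: a context-free eval() -> JSON.parse() swap.

// Original, vulnerable: callers pass loosely-formatted filter expressions.
function parseFilter(expr: string): unknown {
  return eval(`(${expr})`); // finding: user input flows into eval()
}

// Naive "fix": compiles, silences the scanner, looks reasonable...
function parseFilterFixed(expr: string): unknown {
  return JSON.parse(expr);
}

// ...but an existing caller relies on syntax eval() accepted and JSON does not.
console.log(parseFilter("{ status: 'active' }")); // works today
try {
  console.log(parseFilterFixed("{ status: 'active' }")); // throws SyntaxError
} catch (err) {
  console.log("regression:", (err as Error).message);
}
```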
Our detection layer pulls:
The vulnerable function and its callers (one or two hops out)
Type information where available
Existing tests that exercise the function
Recent git history for the file (recently-touched code is more fragile)
The project's dependencies and their versions
This context is what gets passed to generation. Detection isn't "where's the bug" — it's "here's everything someone reviewing a fix would need."
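Concretely, the bundle handed to generation looks roughly like this. The field names are simplified for illustration, not our exact schema:

```typescript
// Illustrative shape of the context bundle passed from detection to generation.
interface FixContext {
  finding: {
    ruleId: string;                       // scanner rule or CWE identifier
    file: string;
    span: { startLine: number; endLine: number };
  };
  vulnerableFunction: string;             // source of the flagged function
  callers: string[];                      // call sites one or two hops out
  typeInfo?: string;                      // signatures/types where available
  relatedTests: string[];                 // tests that exercise the function
  recentCommits: { sha: string; date: string; summary: string }[];
  dependencies: Record<string, string>;   // package name -> version
}
```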
Stage 2: Generating the fix
We use LLMs for fix generation. We do not let them generate freely.
The constraints we apply:
Scope locking. The model is only allowed to modify a small, specified region of the file. If a fix would require changes outside that region, we surface it for human review instead of auto-generating.
Pattern catalogues. For common vulnerability classes — SQL injection, prototype pollution, hardcoded secrets, missing auth checks — we have known-good fix patterns. The model picks and adapts a pattern rather than inventing one. This dramatically reduces hallucinated "fixes" that don't actually fix anything.
Explanation alongside code. The model has to produce a structured explanation of why the change works, in a format we can validate against the original CVE/CWE. Forcing the model to articulate its reasoning catches a lot of confidently-wrong outputs.
The thing we learned the hard way: if the model can't explain its fix in terms of the vulnerability class, the fix is usually wrong. "I added a check" isn't an explanation. "I added a check that ensures __proto__ cannot be assigned via this code path, closing the prototype pollution vector identified in CWE-1321" is.
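Both the scope lock and the explanation requirement are enforceable mechanically because the model returns a structured candidate rather than a bare diff. A simplified sketch of that shape and the cheapest checks it enables; the names are illustrative, and the explanation check shown here is far cruder than a real one:

```typescript
// Illustrative candidate shape plus the cheap, mechanical rejection checks.
interface FixCandidate {
  patch: { file: string; startLine: number; endLine: number; newCode: string };
  patternId: string;   // which catalogue pattern was adapted
  cweId: string;       // e.g. "CWE-1321"
  explanation: string; // must describe the code path being closed
}

function passesCheapChecks(
  c: FixCandidate,
  allowed: { file: string; startLine: number; endLine: number }
): boolean {
  // Scope lock: the patch may only touch the region we granted.
  const inScope =
    c.patch.file === allowed.file &&
    c.patch.startLine >= allowed.startLine &&
    c.patch.endLine <= allowed.endLine;

  // Crudest possible explanation check: it must at least name the CWE it claims to close.
  const explainsVuln = c.explanation.includes(c.cweId);

  return inScope && explainsVuln;
}
```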
Stage 3: Validation
Generation produces a candidate. Validation decides if it's actually mergeable. We run three layers:
Layer 1 — does it compile / parse? Sounds trivial; isn't, especially in dynamic languages where syntactic correctness doesn't catch broken imports or undefined references.
Layer 2 — do existing tests pass? The fix has to leave the existing test suite green. This catches a huge class of "fix introduces regression" failures. If the project has no tests, we treat that as a signal to be more conservative, not less.
Layer 3 — is the original vulnerability actually gone? We re-run the detection step against the fixed code. If the same finding still fires, the "fix" didn't fix it. This sounds obvious, but it's a step a lot of pipelines skip — and it's the difference between security theatre and an actual fix.
If a candidate fails any layer, we either regenerate (passing the failure back as additional context) or escalate to human review. We cap regenerations at three. After that, the problem isn't a bad model output — it's a fix that requires judgement we don't have.
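Put together, the loop is small. A control-flow sketch, with the build, test, scanner, and regeneration hooks stubbed out (they stand in for real tooling):

```typescript
// Control-flow sketch only; Candidate and the hooks are placeholders.
type Candidate = { patch: string };
type Hooks = {
  buildOk(c: Candidate): Promise<boolean>;
  testsPass(c: Candidate): Promise<boolean>;
  scanStillFires(c: Candidate): Promise<boolean>; // re-run detection on the fixed code
  regenerate(c: Candidate, failure: string): Promise<Candidate>;
};

async function validate(candidate: Candidate, hooks: Hooks): Promise<"open_pr" | "escalate"> {
  const MAX_REGENERATIONS = 3;
  let current = candidate;

  for (let attempt = 0; attempt <= MAX_REGENERATIONS; attempt++) {
    const failure =
      !(await hooks.buildOk(current))         ? "does not compile or parse"
      : !(await hooks.testsPass(current))     ? "existing test suite is red"
      : (await hooks.scanStillFires(current)) ? "original finding still fires"
      : null;

    if (failure === null) return "open_pr";              // all three layers green
    if (attempt === MAX_REGENERATIONS) break;             // cap hit: stop guessing
    current = await hooks.regenerate(current, failure);   // failure becomes context
  }
  return "escalate"; // needs judgement the pipeline doesn't have
}
```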
Stage 4: Matching project style
A correctly-functioning fix that's formatted wrong, named wrong, or imported wrong will get closed without comment. Maintainers can smell bot PRs in two seconds.
Things we match:
Indentation, quote style, semicolons (run the project's formatter before opening the PR)
Naming conventions (camelCase vs snake_case, prefix patterns)
Import style (relative vs absolute, grouping)
Comment style (do they write JSDoc? Do they write any comments at all?)
Commit message format (Conventional Commits? Specific prefixes?)
This is unglamorous work, and it's where a lot of automated tools fail. You don't get a second chance at a first-PR impression with a maintainer.
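The formatter part, at least, is mechanical: before the PR is opened, run whatever the project itself runs. A minimal sketch for one ecosystem, a JS/TS repo that ships a Prettier config (config detection is simplified here, and every ecosystem needs its own version of this):

```typescript
// If the repo carries a Prettier config, format the patched files with the
// project's own settings before the PR goes out.
import { execFileSync } from "node:child_process";
import { existsSync } from "node:fs";
import { join } from "node:path";

function formatLikeTheProject(repoDir: string, patchedFiles: string[]): void {
  const hasPrettierConfig = [".prettierrc", ".prettierrc.json", "prettier.config.js"]
    .some((name) => existsSync(join(repoDir, name)));

  if (!hasPrettierConfig) return; // no config: don't impose one

  // Uses the project's own config resolution, so quotes/semicolons/indentation match.
  execFileSync("npx", ["prettier", "--write", ...patchedFiles], { cwd: repoDir });
}
```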
Stage 5: Knowing when not to ship
Some fixes shouldn't be automated. Examples from our own ruleset:
Auth and authz logic. A fix that changes who can access what needs human eyes. Always.
Cryptographic primitives. Swapping algorithms or key sizes can have downstream consequences a pipeline can't see.
Code touched in the last 7 days. Active development means context we don't have.
Repos with no tests. We can't validate the fix doesn't break anything, so we surface findings without auto-PRs.
Findings below a confidence threshold. Better to flag and let a human triage.
The single biggest credibility lever for a tool like this is how often it shuts up when it doesn't know. Every false positive PR costs trust. Every "we found this but didn't auto-fix it because [specific reason]" message builds trust.
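The ruleset itself is mostly a list of predicates, each paired with the reason we report instead of shipping. A stripped-down sketch, with thresholds that are illustrative rather than our production values:

```typescript
// "Should we open a PR at all" gate; predicates and thresholds are illustrative.
interface Finding {
  confidence: number;               // 0..1 from the detector
  touchesAuthLogic: boolean;
  touchesCrypto: boolean;
  daysSinceLastCommitToFile: number;
  repoHasTests: boolean;
}

const HOLD_RULES: { reason: string; applies: (f: Finding) => boolean }[] = [
  { reason: "changes auth/authz behaviour",           applies: (f) => f.touchesAuthLogic },
  { reason: "touches cryptographic primitives",       applies: (f) => f.touchesCrypto },
  { reason: "file changed in the last 7 days",        applies: (f) => f.daysSinceLastCommitToFile < 7 },
  { reason: "repo has no tests to validate against",  applies: (f) => !f.repoHasTests },
  { reason: "confidence below threshold",             applies: (f) => f.confidence < 0.8 },
];

// Returns the reasons to hold the fix; an empty array means auto-PR is allowed.
function reasonsToHold(f: Finding): string[] {
  return HOLD_RULES.filter((rule) => rule.applies(f)).map((rule) => rule.reason);
}
```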
What we got wrong
A few things, in case it's useful:
We over-trusted model self-validation. We asked the model "is this fix correct?" and weighted its answer. It said yes too often. Switching to external validation (run the tests, re-run the scanner) was the single biggest quality jump we made.
We let the model write PR descriptions freely. They were verbose and sometimes inaccurate. Now descriptions are templated, with the model filling specific slots: vulnerability class, file/function affected, fix pattern applied, validation results. Boring, but verifiable.
We didn't track merge rate by pattern. Once we did, we found two of our pattern catalogues were producing fixes that maintainers rejected for style reasons we hadn't noticed. Boring data work, big quality gain.
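For the templated PR descriptions mentioned above, the skeleton is roughly this; the slot names are illustrative, but the point is that the model only fills slots and the template controls everything else:

```typescript
// Rough shape of the templated PR description.
interface DescriptionSlots {
  vulnClass: string;   // e.g. "SQL injection (CWE-89)"
  location: string;    // file and function affected
  fixPattern: string;  // which catalogue pattern was applied
  validation: string;  // e.g. "build green, existing suite passing, finding no longer detected"
}

function renderDescription(s: DescriptionSlots): string {
  return [
    `**Vulnerability:** ${s.vulnClass}`,
    `**Location:** ${s.location}`,
    `**Fix applied:** ${s.fixPattern}`,
    `**Validation:** ${s.validation}`,
  ].join("\n");
}
```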
If you're building something similar
Three things, if you're working on any kind of automated code-modification pipeline:
Detection without context is a trap. Spend more time on context than on detection.
Constrain your model. Free-form generation is fine for prototypes. Production needs scope locks, pattern catalogues, and structured outputs.
External validation beats self-validation. The model can't reliably grade its own work. Run the tests. Re-run the scanner. Don't ask it.
Happy to go deeper on any of these in a follow-up — drop a comment if there's a stage you'd want more detail on.