Four iteration rounds on a security scanner I run, all of them visible. Here is what the loop actually looks like.
This is a worked example of running a continuous security scanner on a public surface and being wrong, in both directions, in close succession. The scanner is AgentScore, which scans MCP packages on npm and publishes a public security record. Over four days in mid-May 2026 it went through three corrections: an over-flagged class, a too-broad mitigator pass that produced a false negative on a known-credential-leak package, and a fresh sample-check that uncovered new sanitiser patterns we had not yet recognised. Each correction is in the public changelog. None of them was silent.
The point of the post is the loop, not the resolution.
What the data looked like on 2026-05-15
A class-tracker counts how many MCP packages have HIGH command_injection findings in a rolling 7-day window. Mid-April that number was a handful. Mid-May it was 31 distinct packages. Most were in the browser, CLI, or terminal-automation segment, where shell execution is genuinely common because the packages drive other CLIs.
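A minimal sketch of the kind of count such a tracker keeps, assuming a hypothetical finding shape (`pkg`, `category`, `severity`, `scannedAt`) that is not AgentScore's real schema. The point it illustrates is that the number is per distinct package, not per finding.

```javascript
// Hypothetical finding shape: { pkg, category, severity, scannedAt }.
const WINDOW_MS = 7 * 24 * 60 * 60 * 1000; // rolling 7-day window

function countFlaggedPackages(findings, now, category = "command_injection") {
  const cutoff = now.getTime() - WINDOW_MS;
  const flagged = new Set();
  for (const f of findings) {
    const t = f.scannedAt.getTime();
    const inWindow = t >= cutoff && t <= now.getTime();
    if (inWindow && f.category === category && f.severity === "HIGH") {
      flagged.add(f.pkg); // a Set counts each package once, however many findings it has
    }
  }
  return flagged.size;
}

// Illustrative data only (package names are placeholders).
const sampleFindings = [
  { pkg: "pkg-a", category: "command_injection", severity: "HIGH", scannedAt: new Date("2026-05-14T00:00:00Z") },
  { pkg: "pkg-a", category: "command_injection", severity: "HIGH", scannedAt: new Date("2026-05-15T00:00:00Z") },
  { pkg: "pkg-b", category: "command_injection", severity: "MEDIUM", scannedAt: new Date("2026-05-15T00:00:00Z") },
  { pkg: "pkg-c", category: "command_injection", severity: "HIGH", scannedAt: new Date("2026-05-01T00:00:00Z") },
];
const trackerCount = countFlaggedPackages(sampleFindings, new Date("2026-05-15T12:00:00Z"));
console.log(trackerCount); // 1: pkg-a is counted once, pkg-b is not HIGH, pkg-c is outside the window
```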
The first hypothesis was real maintainer drift: maybe enough package authors in this segment were writing unsafe ${...} shell wrappers that the public-record arc had a story.
The second hypothesis, which became more probable when one more advisory took the count to 31 distinct packages, was that the scanner's regex was over-flagging legitimate template-literal patterns. Thirty distinct maintainers all genuinely shipping unsafe shell exec in 30 days, in a community of ~1,300 packages, would be an ecosystem failure. More likely the scanner had a false-positive class.
The actual answer, after four rounds of work over 96 hours, turned out to be mixed. Some of the 31 were false positives the scanner could downgrade with new context-aware mitigators. Some were real static-analysis hits in single-user CLI threat models that the scanner correctly continued to flag. The post is about how I got from "31 packages, hypothesis unclear" to "scanner correctly distinguishes which is which" and what each iteration round had to fix.
Round 1: the initial sample audit
The first move on 2026-05-16 was to manually inspect a sample of the flagged packages. Five were picked across the class: safari-mcp, brave-real-browser-mcp-server, memoir-cli, s3db.js, and claude-flow. Each was rescanned by hand against the regex that originally flagged them.
Of the five samples, four had patterns the scanner was catching incorrectly. Examples:
- A postinstall script invoking `codesign` against an internal helper path constructed via `path.join(__dirname, ...)`. Not user-controllable.
- A ``this.exec(`SELECT ${fields} FROM ...`)`` SQL query in a sqlite client. Not a child process call at all.
- Hardcoded ALL_CAPS module constants like `${REPO_URL}` interpolated for readability. Not user input.
- A numeric ID from a GitHub webhook payload (`event.pull_request.number`). Cannot carry shell metacharacters.
- A `.md` file titled `v3-security-architect.md` with the line literally annotated `// ❌ Dangerous: shell injection possible` as a teaching example. The scanner caught a security tutorial.
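To see why a purely syntactic rule catches all of these, here is a sketch with an assumed naive pattern (not the scanner's actual regex) that flags any exec-like call whose argument is a template literal containing `${...}`. The sample strings are illustrative reconstructions, not the packages' code.

```javascript
// Assumed naive rule: exec( or execSync( followed by a template literal with an interpolation.
const NAIVE_EXEC = /\bexec(?:Sync)?\s*\(\s*`[^`]*\$\{/;

const samples = {
  sqlQuery: "this.exec(`SELECT ${fields} FROM notes`)",              // SQL, not a child process
  constant: "execSync(`git clone ${REPO_URL} vendor`)",              // hardcoded module constant
  numericId: "execSync(`gh pr view ${event.pull_request.number}`)",  // numeric webhook field
  userInput: "execSync(`git clone ${config.gitRepo} .`)",            // the case worth flagging
};

const naiveHits = Object.fromEntries(
  Object.entries(samples).map(([name, src]) => [name, NAIVE_EXEC.test(src)])
);
console.log(naiveHits); // every entry is true: the pattern cannot tell the four cases apart
```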
Initial estimated false-positive rate on the sample: high, but the number itself ended up being revised twice over the next 48 hours as the scope of "false positive" tightened.
Rounds 2 and 3: shipping the fix, then catching the fix's bugs
Three corrective passes shipped within hours of each other on 2026-05-16. Each was followed by external review that caught a structural issue in the previous one.
Pattern-level mitigators (round 1 of these three). Seven extensions to the existing sanitizer category (recognise this.exec, __dirname, ${ALL_CAPS}, numeric coercion, code-signing toolchain, npm auto-update patterns) plus a new documentation_context category for markdown code fences and anti-pattern annotations. A local verifier reported 100% suppression on the five-package sample.
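A sketch of what pattern-level mitigators of this kind could look like. The mitigator names, the `DOWNGRADED` label, and the table shape are assumptions for illustration; only the recognised patterns (`this.exec`, `__dirname`, `${ALL_CAPS}`, numeric coercion) come from the description above.

```javascript
// Captures an optional "this." prefix (group 1) and the first interpolated expression (group 2).
const EXEC_CALL = /(this\.)?\bexec(?:Sync)?\s*\(\s*`[^`]*\$\{([^}]*)\}/;

// Hypothetical mitigator table: each entry inspects one regex match.
const MITIGATORS = [
  { name: "method_call", test: (m) => m[1] !== undefined },                          // this.exec(...) is not child_process
  { name: "all_caps_constant", test: (m) => /^[A-Z][A-Z0-9_]*$/.test(m[2].trim()) }, // ${REPO_URL}
  { name: "dirname_path", test: (m) => /\b__dirname\b/.test(m[2]) },                 // path built from __dirname
  { name: "numeric_coercion", test: (m) => /^(?:Number|parseInt)\(/.test(m[2].trim()) },
];

function classify(source) {
  const m = EXEC_CALL.exec(source);
  if (!m) return { severity: "NONE", mitigatedBy: null };
  const hit = MITIGATORS.find((mit) => mit.test(m));
  return hit
    ? { severity: "DOWNGRADED", mitigatedBy: hit.name }
    : { severity: "HIGH", mitigatedBy: null };
}

console.log(classify("this.exec(`SELECT ${fields} FROM notes`)").mitigatedBy);  // method_call
console.log(classify("execSync(`git clone ${REPO_URL} vendor`)").mitigatedBy);  // all_caps_constant
console.log(classify("execSync(`git clone ${config.gitRepo} .`)").severity);    // HIGH
```

Only the first interpolation in the call is inspected here, which keeps the sketch short but would be too lenient for a call that mixes a constant with user input.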
Per-file iteration (round 2, caught by review): the 100% was an artifact. The scanner had been reading the gunzipped tarball as one buffer and running mitigators against a ±2000 character window in that buffer. A README heading three files away could downgrade a real finding in another file. The fix: walk the tar archive entry-by-entry, run mitigators against each file's own content only. Re-verification: still 100%.
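The cross-file leak can be reproduced in a few lines. This sketch assumes a hypothetical two-file package and uses the markdown-heading pattern as the only mitigator; the helper names are invented for illustration.

```javascript
const RADIUS = 2000;                    // the ±2000 character window
const DOC_MARKER = /^#{1,4}\s+\S/m;     // markdown-heading pattern used as the mitigator here

function contextWindow(text, index) {
  return text.slice(Math.max(0, index - RADIUS), index + RADIUS);
}

function isMitigated(text, needle) {
  const index = text.indexOf(needle);
  return index !== -1 && DOC_MARKER.test(contextWindow(text, index));
}

// Hypothetical package contents.
const files = [
  { path: "README.md", content: "# Usage\n\nRun the CLI.\n" },
  { path: "lib/run.js", content: "execSync(`git clone ${config.gitRepo} .`);\n" },
];
const finding = "execSync(`git clone ${config.gitRepo} .`)";

// Whole-tarball mode: every entry concatenated into one buffer.
const wholeBuffer = files.map((f) => f.content).join("");
const mitigatedInBuffer = isMitigated(wholeBuffer, finding);

// Per-file mode: each entry is checked against its own content only.
const mitigatedPerFile = files.some((f) => isMitigated(f.content, finding));

console.log(mitigatedInBuffer); // true: the README heading falls inside the window
console.log(mitigatedPerFile);  // false: lib/run.js has no heading of its own
```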
All-matches per file (round 3, caught by review again): even per-file, single-match-per-file was masking. The scanner ran each pattern as a single .exec() per file, so an early benign shell call in a file would silently hide a later real unsafe one in the same file. Replaced with an all-matches walk that scores each match independently and keeps the worst-severity result. Re-verification: 75 percent, not 100.
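The masking effect is easy to show with a sketch. The scoring rule here (an ALL_CAPS identifier counts as a constant) and the severity labels are assumptions; the contrast between a single `.exec()` and an all-matches walk is the part taken from the description above.

```javascript
const EXEC_CALL = /\bexec(?:Sync)?\s*\(\s*`[^`]*\$\{([^}]*)\}/g;
const SEVERITY_RANK = { NONE: 0, DOWNGRADED: 1, HIGH: 2 };

// Hypothetical scoring: an ALL_CAPS identifier is treated as a module constant.
function scoreMatch(match) {
  return /^[A-Z][A-Z0-9_]*$/.test(match[1].trim()) ? "DOWNGRADED" : "HIGH";
}

// Old behaviour: one .exec() per file, so only the first match is ever scored.
function firstMatchSeverity(source) {
  const first = new RegExp(EXEC_CALL.source).exec(source); // no g flag: first match only
  return first ? scoreMatch(first) : "NONE";
}

// New behaviour: score every match and keep the worst severity.
function worstMatchSeverity(source) {
  let worst = "NONE";
  for (const match of source.matchAll(EXEC_CALL)) {
    const severity = scoreMatch(match);
    if (SEVERITY_RANK[severity] > SEVERITY_RANK[worst]) worst = severity;
  }
  return worst;
}

// Illustrative file: a benign call first, a real unsafe one second.
const fileSource = [
  "execSync(`git fetch ${REMOTE_NAME}`);",
  "execSync(`git clone ${config.gitRepo} .`);",
].join("\n");

console.log(firstMatchSeverity(fileSource)); // DOWNGRADED: the early benign call hides the later one
console.log(worstMatchSeverity(fileSource)); // HIGH
```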
The honest number was 75. memoir-cli@3.6.1 actually does contain ``exec(`open "${url}"`)`` in upgrade.js and ``execSync(`git clone ${config.gitRepo} .`)`` in diff.js. In a single-user CLI threat model these are benign because the user is attacking only themselves. But the scanner cannot infer the threat model from static analysis, and the flag is correct at that level.
The previous "100%" claim was a measurement bug, not progress.
What I did with the historic advisories
Two paths were possible.
Option A: rewrite each of the affected advisories to the corrected severity. Clean for the casual reader. But quietly editing past records contradicts the public-correction-loop principle that is literally on the methodology page.
Option B: keep the original advisories visible at their original severity, add a correction record at the top of the advisories page pointing readers to the mitigator changelog, and let the live /report/<package> pages reflect the corrected severity once the monitor cron re-scans each affected package over the following 3-4 days.
I took Option B. The yellow correction banner on /security/advisories reads:
The scanner shipped a precision pass on 2026-05-16 targeting a self-detected false-positive class in browser/CLI/terminal MCP packages. Advisories below published before that pass on the affected class remain visible at their original severity. The live /report/<package> page will reflect the corrected severity once the monitor cron rescans that package. Until then, the cached scan-history value on the report page may still show the pre-mitigator severity. We do not silently rewrite the public record.
The mitigator changelog at /scanner/precision carries the May 16 mitigator-pass entry AND a follow-up entry documenting the per-file iteration and the corrected 75 percent suppression number. Both are on the public surface. Neither was edited after the fact.
What rounds 1 to 3 proved
They did not prove the scanner was correct. They proved three other things.
One: the in-class running count plus a sample audit is enough to detect a false-positive class before it does serious damage to credibility. I caught this in a 30-day window with no maintainer pushing back.
Two: the iteration loop works on me, not just on the packages I scan. The same /scanner/precision page that documents mitigators shipped in response to maintainers like Agions and HomenShum now carries an entry where the trigger was my own internal review.
Three: refusing to silently rewrite history is uncomfortable but it is the only credibility move. A reader who finds an old advisory on a package and a corrected scan on the live report page can see the gap and the correction note explaining it. They do not have to trust that the system always told the truth. They can read both versions and decide.
Round 4 (the false-negative correction, 48 hours later)
The work above was substantially complete after the 2026-05-16 fix. Two days later it needed an update, because the fix itself had introduced a false negative.
The new documentation_context mitigator category shipped on 2026-05-16 included a markdown-heading pattern /^#{1,4}\s+\S/m. That regex matches markdown headings. It also matches YAML comments, shell-script comments, TOML headers, and anything else that starts with #. Without a filename gate, the category fired on any file that happened to contain a #-prefixed line within 2000 characters of a real finding.
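The misfire is reproducible directly against the quoted pattern. The snippets below are made-up file contents; note that the regex needs whitespace after the `#`, so a bare shebang line does not match but an ordinary `# comment` line does.

```javascript
const MARKDOWN_HEADING = /^#{1,4}\s+\S/m; // the pattern quoted above

// Illustrative file contents, not taken from any real package.
const snippets = {
  markdown: "## Install\n\nnpm i example\n",
  yaml: "# local overrides\napi_key: not-a-real-key\n",
  shell: "#!/bin/sh\n# build step\nmake all\n",
  shebangOnly: "#!/bin/sh\nmake all\n",
};

const headingHits = Object.fromEntries(
  Object.entries(snippets).map(([kind, text]) => [kind, MARKDOWN_HEADING.test(text)])
);
console.log(headingHits);
// markdown, yaml and shell are true; shebangOnly is false
```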
Concrete miss: fa-mcp-sdk, the package whose config/local.yaml we publicly disclosed in late April for shipping credentials in the published tarball, scored 30 / HIGH on every scan from April 25 through May 13. On May 17 a fresh scan with the new v2.2 ruleset returned 65 / ELEVATED. The CRITICAL hardcoded_secret finding was now MEDIUM. Looking at it on May 18 morning, the digest showed a score recovery that looked like maintainer action after four weeks of silence. It was not. The YAML file's own header comments matched the markdown-heading regex, the documentation_context mitigator fired, and the credentials we'd publicly disclosed were silently downgraded by our own scanner.
Two other packages, mcpbrowser and opencode-gitlab-dap, were hit the same way, with a materially changed public severity. Four more had the same misfire, but their findings were already correctly downgraded by parallel sanitizer mitigators, so the public score did not move.
The fix was a six-line patch: a CATEGORY_FILE_GATES table that requires documentation_context patterns to fire only on files whose extension is .md, .mdx, .markdown, .rst, .txt, .adoc, or .asciidoc. Other mitigator categories were not file-gated because their patterns are tied to language syntax that does not overlap with comment characters in other languages.
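A sketch of what such a gate could look like. The `CATEGORY_FILE_GATES` name and the extension list come from the description above; the `Map` shape and the `categoryMayFire` helper are assumptions, not the actual patch.

```javascript
const path = require("path");

// Categories listed here may only fire on files with one of these extensions.
const CATEGORY_FILE_GATES = new Map([
  ["documentation_context", new Set([".md", ".mdx", ".markdown", ".rst", ".txt", ".adoc", ".asciidoc"])],
]);

// Hypothetical helper: decides whether a mitigator category is allowed to run on a file.
function categoryMayFire(category, filePath) {
  const allowed = CATEGORY_FILE_GATES.get(category);
  if (!allowed) return true; // ungated categories fire on any file
  return allowed.has(path.extname(filePath).toLowerCase());
}

console.log(categoryMayFire("documentation_context", "docs/v3-security-architect.md")); // true
console.log(categoryMayFire("documentation_context", "config/local.yaml"));             // false
console.log(categoryMayFire("sanitizer", "config/local.yaml"));                         // true
```

Gating on extension rather than on content keeps the check cheap and makes the failure mode explicit: a YAML file can no longer be treated as documentation just because it contains a `#` line.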
Within the same morning, I rescanned the seven affected packages with the fixed scanner and pushed the corrected scan_history rows. fa-mcp-sdk is back at 45 / HIGH with the CRITICAL credential finding restored. The /scanner/precision changelog carries a new entry documenting the fix exactly the same way the original false-positive entry was documented two days earlier.
So now the public correction record contains two entries: one for a false-positive class that over-flagged 31 advisories, and one for an over-broad mitigator that produced false negatives on 3 public scores. Both visible. Neither rewritten silently.
The pattern this surfaces: precision passes on a scanner have a natural overshoot. You catch a class of false positives, you ship mitigators, the mitigators are slightly too broad, you catch the resulting false negatives, you tighten. The thing that makes this a credibility move rather than a credibility cost is doing all of it on the public surface, where readers can audit the shape of the correction loop rather than trust that we always told them the truth.
What's reproducible
The mitigator commits are public. The 5-package sample is version-pinned in scripts/verify-mitigators.cjs so the precision claim can be reproduced. The pattern tracker is at scripts/track-command-injection-pattern.cjs. The corrected scanner is at SCANNER_VERSION = 2.2 in src/lib/kya/scanner.js, with the May 18 file-gate fix in the same file. The list of 7 affected packages and their corrected scores is in the /scanner/precision changelog entry dated 2026-05-18.
The 31 historic advisories are still at /security/advisories with the correction banner pointing at the changelog.
What the tracker count actually stabilised at
In the 48 hours after the May 16 mitigator pass, the running count in the browser/CLI command_injection class dropped from 31 to 14. We expected it to keep dropping toward zero as the v2.2 scanner propagated through the corpus.
It did not. The count moved back up to around 20 and stayed there.
The naive read of that is "the fix did not work." The honest read is different. The tracker counts packages with HIGH command_injection findings the scanner did NOT downgrade. If the v2.2 + file-gate mitigators are working, FPs disappear from the count and only real-pattern hits remain. The count stabilising at roughly 20 means the underlying rate at which real template-literal shell-exec patterns appear in new browser/CLI MCP publishes is about 20 packages per rolling 7-day window. That is the ecosystem's actual signal, not our scanner's failure.
To verify, we sampled 5 packages from the post-fix corpus: beecork, memex-mvp, @piyushdua/engram-dev, agentic-flow, @kevinrabun/judges. Manual inspection of each:
beecork wraps a user-config-derived `bin` name into ``execSync(`${whichCmd} ${bin}`)`` in `dist/cli/doctor.js`. Real static-analysis hit. The threat model is single-user CLI (the user is configuring their own tool), so the practical risk is low, but the scanner correctly cannot infer that.
memex-mvp does ``execSync(`launchctl unload ${JSON.stringify(PLIST_PATH)}`)``. `JSON.stringify` wraps the value in escaped double quotes, which guards against spaces and embedded quotes (though `$` and backticks still expand inside shell double quotes). False positive that the scanner did not yet recognise as a sanitiser.
@piyushdua/engram-dev does ``execSync(`git worktree remove ${shellQuote(record.path)}`)``. The maintainer is explicitly wrapping input in `shellQuote()`. False positive that the scanner did not yet recognise as a sanitiser.
agentic-flow does ``execSync(`gh ${args.join(' ')}`)`` in `.claude/helpers/github-safe.js`. `args` is `process.argv`. Real static-analysis hit in a CLI threat model.
@kevinrabun/judges is a code-judging benchmark tool. The dangerous-looking code is embedded as string literals in a fixture array (`expectedRuleIds: ["AUTH-001", ...]`), specifically as test corpus for the tool to detect. False positive that the scanner did not yet recognise as a fixture marker pattern.
3 of 5 are false positives the scanner could downgrade with additional mitigator patterns. 2 of 5 are real interpolation-into-shell patterns the scanner correctly keeps flagged at HIGH.
The third precision pass shipped today, 2026-05-19, adds the missing mitigators:
- `shellQuote()`, `shell_quote`, `shq.quote(`, `require('shell-quote')` as sanitiser patterns
- `${JSON.stringify(...)}` directly inside the interpolation slot as a sanitiser pattern
- `expectedRuleIds:`, `dangerousPatterns:`, `benchmarkCases:` as test-fixture markers
- File-path heuristics for `benchmark*.js`, `rules*.js`, `judges*.js` that contain detection corpora
- A meta-template marker: source containing both backslash-escaped backticks and backslash-escaped `${` interpolation markers in close proximity. That combination means the surrounding string is a template literal embedded as string data, e.g. a code-judging tool's test fixture where the dangerous-looking code is corpus to be detected rather than executable code.
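Three of the additions above can be sketched as plain regex checks. The regexes, the reason labels, and the 200-character proximity limit are all assumptions made for illustration; the file-path heuristics and the `require('shell-quote')` indicator are left out to keep the sketch short.

```javascript
// A sanitiser call sitting directly inside the interpolation slot.
const SANITISED_SLOT = /\$\{\s*(?:shellQuote|shell_quote|shq\.quote|JSON\.stringify)\s*\(/;
// Keys that mark a test-fixture or detection-corpus array.
const FIXTURE_MARKER = /\b(?:expectedRuleIds|dangerousPatterns|benchmarkCases)\s*:/;
// An escaped backtick and an escaped ${ within 200 characters of each other (distance is an assumption).
const META_TEMPLATE = /\\`[\s\S]{0,200}?\\\$\{|\\\$\{[\s\S]{0,200}?\\`/;

// Hypothetical helper: returns why a finding's surrounding text would be downgraded, or null.
function downgradeReason(context) {
  if (SANITISED_SLOT.test(context)) return "sanitised_slot";
  if (FIXTURE_MARKER.test(context)) return "test_fixture";
  if (META_TEMPLATE.test(context)) return "meta_template";
  return null;
}

console.log(downgradeReason("execSync(`git worktree remove ${shellQuote(record.path)}`)")); // sanitised_slot
console.log(downgradeReason("execSync(`launchctl unload ${JSON.stringify(PLIST_PATH)}`)")); // sanitised_slot
console.log(downgradeReason('expectedRuleIds: ["AUTH-001"], code: "exec(cmd)"'));           // test_fixture
// The doubled backslashes below produce the source text: const sample = `exec(\`rm \${dir}\`)`;
console.log(downgradeReason("const sample = `exec(\\`rm \\${dir}\\`)`;"));                  // meta_template
console.log(downgradeReason("execSync(`gh ${args.join(' ')}`)"));                           // null
```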
After this third pass, the 5-package post-fix sample suppression rate is 60 percent. Two stay at HIGH because they really are real-pattern hits in single-user CLI threat models. The remaining count in the tracker now reflects something closer to the genuine rate of real template-literal shell exec in new browser/CLI MCP publishes, not measurement noise.
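The 60 percent figure follows from the manual verdicts on the five sampled packages. A small sketch of the arithmetic, assuming "suppression rate" means the share of the sample that ends up downgraded:

```javascript
// Verdicts from the manual inspection: "fp" = false positive, "real" = real static-analysis hit.
const postFixSample = [
  { pkg: "beecork", verdict: "real" },
  { pkg: "memex-mvp", verdict: "fp" },
  { pkg: "@piyushdua/engram-dev", verdict: "fp" },
  { pkg: "agentic-flow", verdict: "real" },
  { pkg: "@kevinrabun/judges", verdict: "fp" },
];

function suppressionRate(sample) {
  const suppressed = sample.filter((s) => s.verdict === "fp").length;
  return (suppressed * 100) / sample.length; // percentage of the sample downgraded
}

console.log(suppressionRate(postFixSample)); // 60
```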
What the iteration loop actually looks like
Four rounds of precision work in 96 hours:
| Round | Date | What it corrected |
|---|---|---|
| 1 | 2026-05-16 | Initial mitigator set: this.exec, path.join, ALL_CAPS, numeric coercion, codesign, npm auto-update, plus a documentation_context category for markdown anti-pattern examples. |
| 2 | 2026-05-16 | Per-file iteration (mitigators only see same-file context), all-matches-per-file (an early benign call cannot mask a later real one), GNU/pax tar parsing, version-pinned verification. |
| 3 | 2026-05-18 | documentation_context only fires on .md, .mdx, .rst, .txt files. The previous loose form was matching YAML # comments as if they were markdown headings, which silently downgraded fa-mcp-sdk's CRITICAL credential finding to MEDIUM. False-negative correction, 7 packages re-scanned, public correction record kept. |
| 4 | 2026-05-19 | Sanitiser additions for shellQuote(), ${JSON.stringify(...)}, and benchmark fixture markers. False-positive correction on the post-fix sample. |
Each round was prompted by either a fresh sample audit or a peer review noticing a structural issue with the previous round. None of the rounds were silent. Each one shipped a /scanner/precision changelog entry naming what was wrong and what changed.
The point is not that AgentScore got everything right. The point is that the iteration is visible. A reader who finds an old advisory on a package and a corrected scan on the live report page can see the gap and the correction note explaining it. They do not have to trust the system. They can read both versions and decide.
For anyone running continuous scanning at scale on a public surface, the lesson is: the loud direction (false positives) is easier to catch than the quiet direction (false negatives), the false-negative risk grows once you start adding mitigators, and the only thing that compounds credibility through all of it is doing the corrections in public.
AgentScore continuously scans MCP packages on npm and publishes a public security record. Live data, advisories, and the full mitigator changelog are at agentscores.xyz.