Michael Kayode Onyekwere
Four MCP packages, four ways the supply chain shifted in two weeks of npm monitoring

By Michael K Onyekwere

I monitor nearly a thousand published MCP packages on npm in real time. The pipeline polls the npm changes feed every two minutes, scans every newly published version, and diffs the result against the previous baseline. When a real package update drops the score below the confidence threshold, a public advisory is generated and the RSS feed updates.
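The core of the "diff against the previous baseline" step can be sketched as a pure function over two registry packument snapshots. The shapes below mirror the public npm packument ("versions" keyed by version string); the polling loop, scanner, and threshold logic are not shown, and this is an illustrative sketch rather than the real pipeline's code.

```javascript
// Sketch: detect newly published versions by diffing two packument snapshots
// (as returned by https://registry.npmjs.org/<name>). Hypothetical helper,
// not the actual AgentScore watch job.
function detectNewVersions(prevPackument, currPackument) {
  const prev = new Set(Object.keys(prevPackument.versions || {}));
  return Object.keys(currPackument.versions || {}).filter((v) => !prev.has(v));
}
```

Any version present in the current snapshot but absent from the baseline is a publish event worth rescanning.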

This post collects four worked examples from the last two weeks. The named packages are not malicious. The point is to make visible the kinds of routine changes a consumer would never see at install time, because the install path is npx -y <package> or an equivalent unpinned npm install that always pulls whatever version is current on the registry.

1. prism-mcp-server: four capabilities added in one major version bump

This is the strongest single drift event of the period.

On 2026-04-27 the watch feed had prism-mcp-server at version 11.6.0 with score 85 / 100, risk LOW, no findings beyond a missing-provenance note.

On 2026-04-28 at 11:44 UTC the same package republished as version 12.5.0. The watch job logged the diff less than two seconds later. The rescan produced:

  • Score 85 to 65, risk LOW to ELEVATED
  • New HIGH command_injection finding: shell execution with template-literal input
  • New MEDIUM excessive_dependencies finding: 23 runtime dependencies (was lower in v11)
  • Capability surface gained four categories that did not exist in the prior major version:
    • browser_automation
    • email_messaging
    • filesystem_read
    • shell_exec

Consider the consumer position. Anyone with prism-mcp-server installed via the typical agent-config form (npx -y prism-mcp-server in claude_desktop_config.json or equivalent) had their agent silently inherit four new capability categories the moment 12.5.0 hit npm. No review trigger. No opt-in. No diff against what was previously authorised.

This is not a story about prism-mcp-server's publisher doing something wrong. Major version bumps are exactly when a maintainer is allowed to reshape the surface area. The point is that the consumer has no mechanism in place to notice it.
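The "diff against what was previously authorised" check that consumers lack is not complicated. A minimal sketch, assuming capability lists shaped like the scanner's per-version output (the input arrays here are hypothetical):

```javascript
// Sketch: diff the capability surface between two scanned versions.
// Illustrative only; the real scanner's output format is not reproduced here.
function capabilityDiff(before, after) {
  const prev = new Set(before);
  const next = new Set(after);
  return {
    added: [...next].filter((c) => !prev.has(c)),
    removed: [...prev].filter((c) => !next.has(c)),
  };
}
```

Run against the prism-mcp-server v11 to v12 transition, the added list would be exactly the four categories above, which is the review trigger the install path never fires.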

Advisory AGENTSCORE-2026-0017 covers this case. The RSS feed picked it up within the same minute.

2. agent-planner-mcp: four versions in nine hours

A different shape of the same problem.

On 2026-04-25 agent-planner-mcp published four times in a single day:

  • 13:44 UTC: v0.8.1
  • 18:10 UTC: v0.9.0
  • 19:14 UTC: v0.9.1
  • 22:44 UTC: v1.0.0

The score held at 75 / MODERATE across all four publishes. No major findings change. But each release shifted what tools the package exposed.

If you reviewed v0.8.1 in the morning and were happy with it, by the end of the day there had been three further surface changes you never saw, and the package had crossed a major-version boundary (v1.0.0). An npx -y install of a tool that depends on agent-planner-mcp would land on whichever version was current at install time, with no way to relate that version to the one that was reviewed.

This is the same pattern observed earlier in the period with @planu/cli, which on 2026-04-22 shipped v1.84.0 with four new capabilities (database_access, filesystem_read, search_index, unknown) and a HIGH command_injection finding, then walked the change back in v1.85.0 forty-six minutes later by removing all four capabilities. A consumer running npx -y @planu/cli during that forty-six-minute window would have got the expanded surface; one running it forty-seven minutes later would have got the walked-back version. Neither knew.

Two instances in two weeks makes this look like a pattern rather than a one-off. Maintainers iterating in real time is normal and healthy. Consumers having no view of the iteration is the gap.
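The window problem is mechanical: which version an unpinned install pulls is purely a function of install time. A sketch, using publish timestamps in the shape of a packument's "time" field (the timestamps below match the agent-planner-mcp timeline above; the function itself is a hypothetical helper):

```javascript
// Sketch: given publish timestamps, work out which version an unpinned
// install at time t would have resolved to (latest publish at or before t).
// Assumes "latest" tracks the newest publish, which holds for simple cases.
function versionAtTime(publishTimes, t) {
  const at = new Date(t).getTime();
  let latest = null;
  for (const [version, iso] of Object.entries(publishTimes)) {
    const ts = new Date(iso).getTime();
    if (ts <= at && (latest === null || ts > latest.ts)) {
      latest = { version, ts };
    }
  }
  return latest && latest.version;
}
```

An install at 20:00 UTC on 2026-04-25 lands on v0.9.1; one three hours later lands on v1.0.0. Same command, different surface.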

3. sverklo: a quiet package, then a backlog-dump release that opens a new HIGH

sverklo had been at v0.12.5 with score 80 / MODERATE since 2026-04-19, carrying one HIGH command_injection finding for nearly a week.

On 2026-04-25 at 20:56 UTC the package published v0.16.0. Four minor versions in one go, the kind of release that signals a maintainer clearing a backlog. The diff:

  • Score 80 to 60 (MODERATE to ELEVATED)
  • The existing HIGH command_injection is still present
  • A new HIGH unsafe_eval finding has appeared

Two HIGH findings now, where there was one before, on the same package, after a single release.

This inverts a common assumption. Maintainer activity is generally treated as a positive signal in npm review heuristics: actively maintained beats abandoned. In MCP specifically, a long backlog released as one big version-skip publish expands the surface a consumer was reviewing against. The longer the gap, the wider the change in the catch-up release.

If your review of sverklo was based on v0.12.5, your review is now stale by two HIGH findings.
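The "backlog-dump" shape is cheap to detect from version strings alone. A crude sketch (the threshold semantics and the decision to treat major bumps separately are my assumptions, not the scanner's published ruleset):

```javascript
// Sketch: size of a minor-version skip between consecutive publishes,
// as a rough "catch-up release" signal. Illustrative heuristic only.
function minorSkip(prevVersion, nextVersion) {
  const [pMaj, pMin] = prevVersion.split(".").map(Number);
  const [nMaj, nMin] = nextVersion.split(".").map(Number);
  if (nMaj !== pMaj) return null; // major bumps are a different signal
  return nMin - pMin;
}
```

For sverklo, v0.12.5 to v0.16.0 is a skip of four minors in a single publish, exactly the pattern where a review against the old version goes stale all at once.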

4. nodebench-mcp: the case where the maintainer pushed back, and the scanner got better

A different shape from the three above. This one shows what the feedback loop looks like when it works.

The scanner flagged nodebench-mcp v3.2.0 with two HIGH findings: command_injection and unsafe_eval. I filed a public issue on the maintainer's repo.

The maintainer (HomenShum) reviewed both findings against source on 2026-04-26 and posted a detailed response. They:

  • Confirmed three real command_injection sites at specific file:line references and refactored each one to argv-based spawn with shell: false. Released as v3.2.1.
  • Identified the unsafe_eval finding as a false positive with specific evidence: the regex matched db.exec(`SQL ${var}`), which is better-sqlite3's template-literal SQL, not JavaScript eval.

The maintainer was right on both counts. The three command_injection sites were genuinely exploitable under the agent-controllable input model and got fixed properly. The unsafe_eval finding was a scanner precision gap that should never have been flagged.

Same day, two scanner mitigators shipped (commit 4ee2659):

  • Database-shaped variable names calling .exec() (db, database, conn, client, pool, prepared, stmt, sql, query, knex, prisma) now downgrade command_injection to LOW
  • Files whose names contain the substring Eval followed by an uppercase letter (selfEvalTools.js, llmJudgeEval.js) now downgrade unsafe_eval to LOW because in those files the literal word eval appears inside English-language strings describing the evaluation flow, not in actual JavaScript eval() calls
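The first mitigator can be sketched as a small predicate. The variable-name list matches the post above; the regex and the exact matching rules are my assumptions, not the shipped scanner code:

```javascript
// Sketch of the database-shaped-.exec() precision rule: a .exec() call on a
// receiver with a database-ish name is SQL execution, not shell execution.
const DB_NAMES = ["db", "database", "conn", "client", "pool", "prepared",
                  "stmt", "sql", "query", "knex", "prisma"];

function isDbExecCall(line) {
  const m = line.match(/\b([A-Za-z_$][\w$]*)\s*\.\s*exec\s*\(/);
  return m !== null && DB_NAMES.includes(m[1].toLowerCase());
}
```

A line like db.exec(`CREATE TABLE ...`) would be downgraded, while childProcess.exec(cmd) still trips the full-severity rule.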

After the maintainer's PR landed and the mitigators shipped, nodebench-mcp@3.2.1 rescanned at 85 / LOW, up from 55 / ELEVATED. The full mitigator changelog is at agentscores.xyz/scanner/precision.

That is the feedback loop you want from a security tool. Public review goes both ways. Real findings get fixed. Scanner mistakes get challenged with evidence. The scanner improves.

What this looks like to a consumer

If you are running npx -y against any MCP package in a Claude Desktop, Cursor, Continue, OpenAI agents, or custom MCP client config:

  1. Your agent's capability surface is whatever the latest version of every transitive package decided to expose at install time. You did not approve it. It came along for the ride.
  2. A major-version bump is allowed to reshape that surface entirely. v11 to v12 in a published package can mean +4 capabilities, a new HIGH finding, or nothing at all. There is no way to tell from the install command which it will be.
  3. Multiple publishes in one day are normal during active development of an MCP package, and each one can add or remove tools. A review done against the morning's version says nothing about the reshape that landed at 22:44.
  4. A long quiet period followed by a multi-version-skip release is one of the riskier patterns, because the catch-up release folds in surface changes that would have been flagged separately across the missing reviews.

What helps

The cheapest defence is to pin every MCP package to a specific version in your agent config. Never npx -y package, always npx -y package@x.y.z. Treat the pin like a lockfile entry. This is what the team at redis/RedisInsight did across every MCP dependency in their config after our scan flagged the unpinned installs (agentscores.xyz/case-study/redis).
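Checking a config for unpinned entries is easy to automate. A sketch, assuming the mcpServers shape used by claude_desktop_config.json; the helper name and the pin-detection regex are mine, not a published tool:

```javascript
// Sketch: flag MCP servers launched via npx without an @x.y.z version pin.
// Illustrative check only; real configs have more command shapes than this.
function unpinnedServers(config) {
  const out = [];
  for (const [name, server] of Object.entries(config.mcpServers || {})) {
    if (server.command !== "npx") continue;
    const pkg = (server.args || []).find((a) => !a.startsWith("-"));
    if (pkg && !/@\d+\.\d+\.\d+$/.test(pkg)) out.push(name);
  }
  return out;
}
```

Running something like this in CI, or relying on the policy gate below, turns "treat the pin like a lockfile entry" from advice into an enforced invariant.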

Beyond that, the public RSS feed at agentscores.xyz/security/advisories carries every advisory as it lands, with score before, score after, version diff, and findings. If a package you depend on appears, you have minutes to evaluate the change before it hits your next install.

For teams that want enforcement rather than awareness, the policy gate at agentscores.xyz/policy-gate is a free GitHub Action that fails the build when an MCP dependency drops below the configured threshold. When you decide to accept a finding, the repo-scoped exception lands in the audit trail rather than nowhere.

What the dataset shows

Across nearly a thousand monitored MCP packages, the npm changes feed is processed every two minutes around the clock. The watch cron sees roughly 1,000 to 2,000 monitored-package publish events per day. Most are version bumps with no meaningful score change. About 1 to 3 per day clear the confidence threshold and become public advisories. A few of those each week involve a new HIGH or CRITICAL finding.

Several per week worth knowing about. None of them malicious. All of them shape what an agent built on these packages can do, without notifying anyone downstream.

If you maintain or consume MCP packages and want to talk about any of this, the contact form at agentscores.xyz/contact reaches me directly.


By Michael K Onyekwere. AgentScore is a continuously-updated trust layer for the MCP ecosystem. The dataset, scanner, advisory feed, and policy gate are at agentscores.xyz. The full ruleset history and the public report that motivated each precision change is at agentscores.xyz/scanner/precision.
