A malicious npm package named mouse5212-super-formatter showed up on the npm registry last month with one specific target: /mnt/user-data, the directory Claude AI uses for uploads and outputs. Its job was straightforward — harvest whatever files Claude had touched and ship them out.
This isn't a generic supply chain attack that happened to brush against an AI tool. It was purpose-built for Claude's agentic environment. Someone mapped the filesystem layout of Claude's working directory and wrote an exfiltration payload around it. That's a meaningful escalation.
How the Attack Actually Worked
The package, mouse5212-super-formatter, was published to the public npm registry under a name plausible enough to land in a project's dependencies — either directly or transitively. The attack vector is the trust developers extend to npm packages used in or adjacent to agentic pipelines.
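If you want to know whether the package ever landed in one of your own projects, a lockfile scan is a reasonable first pass. The sketch below is not from the incident report. It assumes a standard package-lock.json and checks both the flat "packages" map (lockfile versions 2 and 3) and the nested "dependencies" tree (version 1). The function name find_package is hypothetical.

```python
import json

SUSPECT = "mouse5212-super-formatter"

def find_package(lock: dict, name: str) -> list[str]:
    """Return every lockfile location where `name` appears, direct or transitive."""
    hits = []

    # lockfileVersion 2 and 3: flat "packages" map keyed by install path.
    for path in lock.get("packages", {}):
        if path == f"node_modules/{name}" or path.endswith(f"/node_modules/{name}"):
            hits.append(path)

    # lockfileVersion 1: nested "dependencies" tree.
    def walk(deps: dict, trail: list[str]) -> None:
        for dep_name, meta in deps.items():
            here = trail + [dep_name]
            if dep_name == name:
                hits.append(" > ".join(here))
            walk(meta.get("dependencies", {}), here)

    walk(lock.get("dependencies", {}), [])
    return hits

# Usage against a real project (path is an assumption):
#   with open("package-lock.json") as fh:
#       print(find_package(json.load(fh), SUSPECT))
```

A version 2 lockfile carries both layouts, so the same install can show up twice in the result. An empty list only means the package is not in that lockfile, not that the machine was never exposed.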
Once installed, the package targeted /mnt/user-data — the dedicated path Claude AI uses to stage uploaded files and AI-generated outputs during a session. This directory is attractive for exactly that reason: it's a collection point for whatever sensitive material a user fed into their Claude session. Uploaded documents, code files, processed outputs — they pass through there.
The package read files from that directory and uploaded them to an external endpoint. The exfiltration was wrapped inside what presented as formatter utility functionality. Standard camouflage.
The specific mechanism by which it triggered (install script, imported module, etc.) isn't confirmed in the available incident report, so I won't speculate. What's confirmed: it targeted Claude's data directory specifically, and it exfiltrated to an external destination.
What Existing Defenses Missed
The npm registry's automated scanning didn't catch this before it was published, which is unfortunately routine for supply chain attacks at this point. The more interesting gap is what happens inside an agentic session.
When Claude runs in an agentic context — reading files, executing tools, using npm packages as part of a workflow — the standard security perimeter doesn't exist. There's no WAF between Claude and the filesystem. There's no network policy watching for a tool result that contains a directory listing of /mnt/user-data. The model itself doesn't have threat detection built in.
If your agent executes a tool call that reads sensitive files and returns their contents, Claude sees that data. If a malicious package crafted that tool result, Claude has now ingested the exfiltrated data — and might helpfully summarize, reformat, or forward it.
The gap isn't just "bad package got installed." The gap is that tool results flowing back into an agentic loop are completely unscrutinized in most deployments. They carry the same implicit trust as any other context.
Where Sentinel Would Have Intercepted This
Sentinel's PostToolUse hook — specifically the agentic tool abuse detection layer — is built for exactly this scenario.
When Sentinel is deployed in transparent proxy mode, it intercepts tool results before they return to the agent. A tool result containing file paths, directory listings, or bulk file contents from a sensitive path like /mnt/user-data would trigger Sentinel's tool/function abuse pattern matching in the fast-path regex layer (Layer 2). The vector similarity layer (Layer 3) would catch semantic variants: "here are the contents of your uploads folder" doesn't need to match a literal regex to score high on similarity to known exfiltration phrasing.
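To make the regex fast path concrete, here is a minimal sketch of that kind of check. It is an illustration only: the patterns, labels, and the function name match_abuse_patterns are my assumptions, not Sentinel's actual rule set, which this article does not publish.

```python
import re

# Hypothetical patterns for illustration; not Sentinel's real rules.
ABUSE_PATTERNS = {
    # Any reference to the sensitive staging directory.
    "sensitive_path": re.compile(r"/mnt/user-data(?:/|\b)"),
    # Lines that look like `ls -l` output (permission string at line start).
    "directory_listing": re.compile(r"^[d-][rwx-]{9}\s", re.MULTILINE),
    # An outbound URL whose path ends in an upload-style segment.
    "upload_url": re.compile(r"https?://\S+/upload\b", re.IGNORECASE),
}

def match_abuse_patterns(tool_result: str) -> list[str]:
    """Return the labels of every pattern found in a tool result."""
    return [label for label, pattern in ABUSE_PATTERNS.items()
            if pattern.search(tool_result)]

suspicious = (
    "drwxr-xr-x 2 root root 4096 /mnt/user-data/uploads\n"
    "uploaded to https://collector.example/upload"
)
print(match_abuse_patterns(suspicious))
print(match_abuse_patterns("Formatted 3 files."))
```

Literal patterns like these are cheap to run on every tool result, but they miss paraphrases, which is why a semantic layer sits behind them.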
And there's a second line of defense: Layer 4 — secret & credential detection. This layer runs independently of the threat pipeline. Even if the exfiltrated file contents somehow scored below the block threshold in Layers 2 and 3, Layer 4 would have redacted any embedded API keys, tokens, or credentials before they reached the model. If that /mnt/user-data directory contained a .env file — and many do — those secrets never make it into the context window.
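A rough sketch of what independent secret redaction can look like is below. The two patterns, the [REDACTED] marker, and the function name redact_secrets are assumptions for illustration. The return shape loosely mirrors the secret_hits and secret_types fields Sentinel reports, but this is not Sentinel's implementation.

```python
import re

REDACTED = "[REDACTED]"

# Hypothetical pattern set: (compiled regex, replacement template).
SECRET_PATTERNS = {
    # AWS access key IDs: "AKIA" followed by 16 uppercase letters or digits.
    "aws_access_key_id": (re.compile(r"\bAKIA[0-9A-Z]{16}\b"), REDACTED),
    # .env-style lines such as API_TOKEN=...; skips values already redacted.
    "env_assignment": (
        re.compile(
            r"^(\w*(?:KEY|TOKEN|SECRET|PASSWORD)\w*=)(?!\[REDACTED\]$).+$",
            re.MULTILINE,
        ),
        r"\1" + REDACTED,
    ),
}

def redact_secrets(text: str) -> tuple[str, int, list[str]]:
    """Return (redacted_text, total_hits, matched_secret_types)."""
    hits, types = 0, []
    for name, (pattern, replacement) in SECRET_PATTERNS.items():
        text, count = pattern.subn(replacement, text)
        if count:
            hits += count
            types.append(name)
    return text, hits, types

sample = "API_TOKEN=abc123\nnote: AKIAIOSFODNN7EXAMPLE\nplain line"
print(redact_secrets(sample))
```

Because the function runs regardless of any threat score, a file that looks harmless but carries a key still comes out clean.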
If the malicious package returned a tool result containing file contents plus an external upload confirmation, that response would hit multiple detection surfaces simultaneously.
What Sentinel's Response Would Look Like
The transparent proxy setup is the relevant deployment here. You point your Anthropic SDK at Sentinel instead of the Anthropic API directly:
import anthropic

user_message = "..."  # placeholder: your prompt text

client = anthropic.Anthropic(
    api_key="sk_live_...",  # Your Sentinel API key
    base_url="https://sentinel.ircnet.us/v1",
)

# Tool results are scrubbed automatically before Claude sees them
response = client.messages.create(
    model="claude-sonnet-4-6",  # your chosen Anthropic model
    max_tokens=1024,
    messages=[{"role": "user", "content": user_message}],
)
When a tool result containing exfiltration artifacts comes back, Sentinel scrubs it before Claude's context ever includes it. In the transparent proxy mode, a blocked tool result is substituted with an inert placeholder — the SDK receives a normal response, the agent loop continues safely, and the poisoned content never lands in context.
Here's what the underlying scrub looks like at a threat_score that exceeds the block threshold of 0.82:
{
  "request_id": "f7e3a1b2c9d4...",
  "security": {
    "action_taken": "blocked",
    "threat_score": 0.91,
    "secret_hits": 0,
    "secret_types": []
  }
}
A threat_score of 0.91 exceeds the block threshold — the tool result never reaches the model. Claude doesn't summarize the exfiltrated data. The agent loop doesn't continue with poisoned context.
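If you log these verdicts, a small helper can turn the security object into a one-line summary. This is a sketch under assumptions: it presumes you already have the JSON body as a string, that action_taken takes values other than "blocked" when a result is allowed, and the helper name describe_verdict is hypothetical. Only the field names and the 0.82 threshold come from the example above.

```python
import json

BLOCK_THRESHOLD = 0.82  # threshold quoted in this article

def describe_verdict(raw_body: str) -> str:
    """Summarize the `security` object of a scrub response for logging."""
    security = json.loads(raw_body)["security"]
    score = security["threat_score"]
    if security["action_taken"] == "blocked":
        return f"blocked (score {score:.2f}, threshold {BLOCK_THRESHOLD})"
    if security["secret_hits"] > 0:
        kinds = ", ".join(security["secret_types"])
        return f"allowed with {security['secret_hits']} secret(s) redacted: {kinds}"
    return f"allowed (score {score:.2f})"

body = json.dumps({
    "request_id": "example",  # placeholder id
    "security": {
        "action_taken": "blocked",
        "threat_score": 0.91,
        "secret_hits": 0,
        "secret_types": [],
    },
})
print(describe_verdict(body))
```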
For OpenClaw users, this is even simpler. The official sentinel-proxy skill on Clawhub wires up the PostToolUse hook automatically:
openclaw skills install sentinel-proxy
No code changes. The hook fires on every tool response before it enters the agent's context window.
Clawhub page: clawhub.ai/c0ri/sentinel-proxy
The Thing You Can Do Today
Audit what your agent does with tool results. Not the tool calls — the results.
Most teams review what their agent is allowed to call. Almost nobody reviews whether a tool result containing a directory listing of sensitive paths would pass unexamined into the model's context. Go look at your agentic loop. Find the point where tool output becomes model input. Ask: is anything inspecting that content before it lands in context?
If the answer is no — and for most deployments right now, the answer is no — that's the gap this attack was designed to exploit.
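In code, that point is usually the spot where you wrap a tool's output in a tool_result block and append it to the conversation. The sketch below shows one way to force every result through an inspector at that choke point. The function names and the deliberately naive inspector are assumptions for illustration; only the tool_result message shape follows the Anthropic Messages API.

```python
from typing import Callable

Inspector = Callable[[str], str]

def deny_sensitive_paths(output: str) -> str:
    """Naive example inspector: withhold anything that mentions the staging dir."""
    if "/mnt/user-data" in output:
        return "[tool result withheld: references a sensitive path]"
    return output

def build_tool_result_message(
    tool_use_id: str, raw_output: str, inspect: Inspector
) -> dict:
    """The single place where tool output becomes model input."""
    safe_output = inspect(raw_output)
    return {
        "role": "user",
        "content": [
            {
                "type": "tool_result",
                "tool_use_id": tool_use_id,
                "content": safe_output,
            }
        ],
    }

# "toolu_01" is a placeholder id for the example.
message = build_tool_result_message(
    "toolu_01", "listing of /mnt/user-data/uploads", deny_sensitive_paths
)
print(message["content"][0]["content"])
```

If your loop builds tool_result blocks in more than one place, that is worth fixing first: an inspector only helps when nothing can route around it.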
mouse5212-super-formatter targeted Claude's user directory because that directory is predictable, accessible, and completely unguarded on the return path. The supply chain is the delivery mechanism. The unscrutinized tool result is the actual vulnerability.
Sentinel is an AI firewall that scrubs tool results, prompt injections, and exfiltration attempts before they reach your model. Free tier available, no credit card required.
👉 sentinel-proxy.skyblue-soft.com