In September 2025, security researchers at Koi Security found what's widely described as the first in-the-wild malicious MCP server. It wasn't a sophisticated zero-day. It was one added line in an email tool.
What happened
postmark-mcp is an npm package that gives an AI agent a tool for sending email through Postmark. For fifteen releases — versions 1.0.0 through 1.0.15 — it did exactly that, and nothing else. It got adopted, it got trusted, it landed in people's daily agent workflows. By the time it mattered, it was pulling roughly 1,500 downloads a week.
Then version 1.0.16 shipped on September 17, 2025. The diff was small enough to miss at a glance: the send-email function gained a Bcc field pointing at phan@giftshop[.]club, an address on a domain the maintainer controlled. Every email the agent sent (content, recipients, attachments, and whatever secrets or PII happened to be inside) got silently copied to the attacker.
Nothing else changed. The tool still sent your email correctly. From the outside, and from the agent's perspective, it worked. That's the whole trick: the malicious version was indistinguishable in behavior from the benign one, except for the carbon copy you couldn't see.
Anyone on auto-update inherited the backdoor the moment they pulled the new version. The package was downloaded 1,643 times in total before it was removed from npm. Postmark, the company, confirmed it had nothing to do with the package — the name just borrowed their credibility.
Why it matters
The uncomfortable lesson here isn't "audit your dependencies." Plenty of people had effectively audited this one — it was fine for fifteen versions. The lesson is that approval isn't permanent.
When you vet a tool, you vet a specific version's behavior at a specific moment. An MCP server can change its tool definitions and its actual behavior in any later release, and the agent — which trusts the tool to describe itself honestly — has no built-in way to notice. This is the "rug pull": vetted and benign, then quietly hostile, with the trust you extended earlier carried forward to code you never looked at.
MCP makes this sharper than a normal dependency bump, because these tools run with real authority inside your agent's loop. An email tool can read and send mail. A filesystem tool can read and write files. The blast radius of a hostile update is whatever you granted the tool on the day you trusted it.
The practitioner takeaway
You can't manually re-read every dependency on every update. But you can make "the tool changed" a thing your system notices instead of a thing it silently accepts.
- Pin versions. Auto-update is what turned a malicious release into mass exposure. Pin MCP servers and their dependencies to exact versions, and treat a version bump as a change that needs a human, not a default.
- Fingerprint tools at approval time. When you vet a tool, record a fingerprint — the package version and integrity hash, plus the tool's declared schema and description. That's the thing you actually approved.
- Re-check the fingerprint on every load. Before an agent uses a tool, compare its current fingerprint to the approved one. A postmark-mcp running 1.0.15 and one running 1.0.16 should not look the same to your system.
- Treat a moved fingerprint as hostile until proven otherwise. If the hash, version, or tool definition changed and nobody re-approved it, fail closed. Don't run the tool, don't pass it secrets, and surface the diff to a human. A changed tool definition is exactly the signal a rug pull produces.
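
The fingerprint, re-check, and fail-closed steps above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a reference implementation: `ToolFingerprint`, `hash_definition`, `make_fingerprint`, `changed_fields`, `check_before_load`, and `FingerprintMismatch` are hypothetical names, the tool definition and integrity strings are placeholders, and it assumes you can obtain the installed package's integrity hash (for example from your lockfile) and the tool's declared definition as a plain dictionary.

```python
import hashlib
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ToolFingerprint:
    """What was actually approved (hypothetical record layout)."""
    package: str
    version: str
    integrity: str        # integrity hash of the installed package, e.g. from a lockfile
    definition_hash: str  # SHA-256 of the tool's declared name, description, and schema


class FingerprintMismatch(Exception):
    """Raised when a tool no longer matches what was approved."""


def hash_definition(tool_definition: dict) -> str:
    # Serialize with sorted keys so key order does not change the hash.
    canonical = json.dumps(tool_definition, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def make_fingerprint(package: str, version: str, integrity: str,
                     tool_definition: dict) -> ToolFingerprint:
    return ToolFingerprint(package, version, integrity, hash_definition(tool_definition))


def changed_fields(approved: ToolFingerprint, current: ToolFingerprint) -> list[str]:
    # Names of every field that differs, so a human can see what moved.
    return [f.name for f in fields(ToolFingerprint)
            if getattr(approved, f.name) != getattr(current, f.name)]


def check_before_load(approved: ToolFingerprint, current: ToolFingerprint) -> None:
    # Fail closed: any difference blocks the tool until someone re-approves it.
    moved = changed_fields(approved, current)
    if moved:
        raise FingerprintMismatch(
            f"{current.package}: {', '.join(moved)} changed since approval"
        )


# Illustrative data only: the definition and integrity strings are placeholders.
send_email_tool = {
    "name": "send_email",
    "description": "Send an email",
    "inputSchema": {"type": "object", "properties": {"to": {"type": "string"}}},
}
approved = make_fingerprint("postmark-mcp", "1.0.15", "sha512-PLACEHOLDER-A", send_email_tool)
current = make_fingerprint("postmark-mcp", "1.0.16", "sha512-PLACEHOLDER-B", send_email_tool)

try:
    check_before_load(approved, current)
    tool_allowed = True
except FingerprintMismatch as err:
    tool_allowed = False
    print(f"blocked: {err}")
```

Note that the tool definition is deliberately identical in both fingerprints. By the account above, nothing the agent could see changed in 1.0.16, so a check that hashed only the declared schema and description may well have passed it. In this sketch the version and the integrity hash are the fields that move, which is why the fingerprint records all of them.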
None of this requires catching the malicious line by reading it. It requires noticing that something changed in a tool you'd already decided to trust — which is the one signal this attack couldn't hide.
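
Pinning is probably the cheapest of these controls to automate. As a minimal sketch, assuming a manifest shaped like a parsed package.json, the check below flags any dependency whose version spec is a range or tag rather than an exact version. The function name `unpinned_dependencies`, the regular expression, and the sample manifest (including `some-pinned-package`) are all illustrative assumptions.

```python
import re

# Exact semver such as 1.0.15 or 2.0.0-beta.1; anything else is treated as unpinned.
EXACT_VERSION = re.compile(r"\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?(\+[0-9A-Za-z.-]+)?")


def unpinned_dependencies(manifest: dict) -> dict[str, str]:
    """Return {name: spec} for every dependency whose spec is not an exact version."""
    loose = {}
    for section in ("dependencies", "devDependencies", "optionalDependencies"):
        for name, spec in manifest.get(section, {}).items():
            if not EXACT_VERSION.fullmatch(spec):
                loose[name] = spec
    return loose


# Hypothetical manifest: a caret range accepts 1.0.16 automatically, an exact version does not.
manifest = {
    "dependencies": {
        "postmark-mcp": "^1.0.15",
        "some-pinned-package": "1.0.15",
    }
}
print(unpinned_dependencies(manifest))  # → {'postmark-mcp': '^1.0.15'}
```

A check like this only covers direct dependencies declared in the manifest. Transitive dependencies are pinned by the lockfile, so it is worth pairing with an install that respects the lockfile exactly, such as `npm ci`.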
This incident is one of the sources behind *BRACE*, an open, vendor-neutral framework for securing autonomous AI agents — its ecosystem guide covers vetting tools and re-checking them on every load. BRACE is built by reading the incidents and the research and asking, each time: what concrete control would have prevented or contained this?