Same Lever, Opposite Intent: When Shared Agent Memory Backfires

#ai #security #llm #agents

The same thing that makes a helpful habit stick in an AI agent is exactly what lets an attacker reprogram it. I know because I almost shipped the attack myself - with the best intentions.

I'd given my agents a harmless efficiency rule: prefer the cheap, narrow tools, and reach for the one big expensive query tool only when you truly need it. In my case that meant the Wiz MCP Server's graph_search tool versus its cheaper list/get tools. Faster, cheaper agents. Pure positive intent.
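To make the rule concrete, here is a minimal sketch of its logic in Python. My actual rule was a plain-language instruction, not code. The tool names list_resources and get_resource, the cost numbers, and the question kinds are all hypothetical stand-ins; only graph_search is a name from my setup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    cost: int               # relative cost in made-up units (assumption)
    can_answer: frozenset   # kinds of question this tool can handle


# Hypothetical catalog: two cheap narrow tools and one expensive broad one.
TOOLS = [
    Tool("list_resources", 1, frozenset({"list"})),
    Tool("get_resource", 1, frozenset({"get"})),
    Tool("graph_search", 10, frozenset({"list", "get", "relationship"})),
]


def pick_tool(question_kind: str, tools=TOOLS) -> Tool:
    """Return the cheapest tool that can answer this kind of question."""
    candidates = [t for t in tools if question_kind in t.can_answer]
    if not candidates:
        raise ValueError(f"no tool can answer {question_kind!r}")
    return min(candidates, key=lambda t: t.cost)


if __name__ == "__main__":
    for kind in ("list", "get", "relationship"):
        print(kind, "->", pick_tool(kind).name)
```

The point of the sketch is that the rule is a preference, not a ban: graph_search still wins whenever nothing cheaper can do the job.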

My next step was to push that rule into a shared memory store, so every team's agents would inherit the habit. That's when I read the MemMorph paper (Zhang et al., arXiv:2605.26154) and realized the mechanism I was about to scale is a published attack class.

MemMorph hijacks an agent's tool selection by poisoning its long-term memory. It never says "always use tool X" - that's easy to audit and block. Instead it plants a few records dressed up as ordinary facts, incident reports, and policies. They reshape how the agent reads the situation, and the agent decides on its own to reach for the attacker's tool.

That's my rule with the sign flipped. Mine steers toward cheaper and safer. Theirs steers toward a tool that exfiltrates data or skips a safety check. Same lever. Opposite intent.

The trap I almost fell for: "store it as policy, trust only the policy tier." MemMorph mixes factual, episodic, and policy-style records on purpose, and the combination is more convincing than any one alone. The label on a record protects nothing.

What protects you is who can write, and where a record came from. My rule was safe only because it lived in a code-reviewed file in version control - a governed write-path with provenance baked in. Move it into a free-write shared memory bucket and it becomes MemMorph's front door.

So if you share agent memory: govern the write-channel, track provenance on every record, and don't auto-promote memorized conversation into the shared tier. The write-path is the attack surface. Easier said than done, but worth being deliberate about.
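Here is a rough sketch of what "govern the write-channel" could look like, assuming a simple in-process store. Every name in it (SharedMemory, Provenance, the "reviewed_repo" and "conversation" channels, the author and reference values) is hypothetical and chosen for illustration; a real deployment would enforce this at the storage or API layer, not in a Python class.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Provenance:
    author: str     # who wrote the record
    channel: str    # how it arrived, e.g. "reviewed_repo" or "conversation"
    reference: str  # pointer to evidence, e.g. a commit or review id


@dataclass(frozen=True)
class MemoryRecord:
    text: str
    provenance: Provenance
    written_at: datetime


class WriteRejected(Exception):
    """Raised when a write does not come through a governed path."""


class SharedMemory:
    def __init__(self, trusted_channels, allowed_authors):
        self._trusted_channels = frozenset(trusted_channels)
        self._allowed_authors = frozenset(allowed_authors)
        self._records = []

    def write(self, text: str, provenance: Provenance) -> MemoryRecord:
        # The decision depends on where the record came from and who wrote it,
        # never on how the text labels itself ("policy", "fact", ...).
        if provenance.channel not in self._trusted_channels:
            raise WriteRejected(f"untrusted channel: {provenance.channel}")
        if provenance.author not in self._allowed_authors:
            raise WriteRejected(f"author not allowed: {provenance.author}")
        if not provenance.reference:
            raise WriteRejected("missing provenance reference")
        record = MemoryRecord(text, provenance, datetime.now(timezone.utc))
        self._records.append(record)
        return record

    def records(self):
        return tuple(self._records)
```

Because "conversation" is simply never listed as a trusted channel, memorized chat content has no path into the shared tier unless someone promotes it through review.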

Lastly: agent memory is executable context. If anyone can write to it, anyone can program your agents.

Source: Zhang et al., MemMorph: Tool Hijacking in LLM Agents via Memory Poisoning.

Top comments (4)

Alex Shev • Jun 11

Shared memory is powerful precisely because it changes future behavior, which is also why it needs stricter design than normal context. A bad memory item is not just a bad answer; it becomes a hidden steering input for later tasks.

I would want every shared memory to carry source, scope, owner, freshness, and a way to revoke or challenge it. Without that, teams get the convenience of personalization but not the safety controls that make personalization sane.

ankush chadha • Jun 11

Great tips. I had thought about provenance, but not the other two mechanisms you mentioned: freshness and a way to revoke or challenge a record.

Alex Shev • Jun 11

Thanks. Freshness and revocation are easy to miss because provenance feels like the “serious” part, but provenance only tells you where the memory came from.

For shared agent memory, I think the lifecycle matters just as much: when was it last true, who can challenge it, what evidence would invalidate it, and what happens when two memories conflict.

A stale but well-provenanced memory can still quietly steer the agent in the wrong direction. That is why I like treating memory as something that needs expiry, appeal, and replacement paths, not just storage.
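As a rough illustration of that lifecycle idea, a shared memory item could carry its own freshness window and revocation state. This is only a sketch under assumptions: the class name, field names, and the max_age policy are hypothetical, and conflict resolution between two memories is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class SharedMemoryItem:
    text: str
    source: str               # where the memory came from
    scope: str                # which agents or teams it applies to
    owner: str                # who answers for it
    last_verified: datetime   # when it was last confirmed true
    max_age: timedelta        # how long it stays trusted without re-checking
    revoked_reason: Optional[str] = None

    def revoke(self, reason: str) -> None:
        """Mark the item as withdrawn, keeping the reason for audit."""
        self.revoked_reason = reason

    def is_usable(self, now: datetime) -> bool:
        """Usable only if not revoked and verified recently enough."""
        if self.revoked_reason is not None:
            return False
        return now - self.last_verified <= self.max_age


if __name__ == "__main__":
    verified = datetime(2025, 1, 1, tzinfo=timezone.utc)
    item = SharedMemoryItem(
        text="Prefer narrow tools over graph_search",
        source="reviewed_repo",
        scope="all-teams",
        owner="platform-team",
        last_verified=verified,
        max_age=timedelta(days=30),
    )
    print(item.is_usable(verified + timedelta(days=10)))  # within the window
    print(item.is_usable(verified + timedelta(days=45)))  # stale
```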

Leo Yang • Jun 11

Bookmarked. The MCP eval section alone saved me a week of trial and error.