DEV Community

ankush chadha
ankush chadha

Posted on

Same Lever, Opposite Intent: When Shared Agent Memory Backfires

The same thing that makes a helpful habit stick in an AI agent is exactly what lets an attacker reprogram it. I know because I almost shipped the attack myself - with the best intentions.

I'd given my agents a harmless efficiency rule: prefer the cheap, narrow tools, and reach for the one big expensive query tool (in my case, a Wiz MCP Server tool graph_search vs. their cheaper list/get tools) only when you truly need it. Faster, cheaper agents. Pure positive intent.

Then I was planning to push that rule into a shared memory store, so every team's agents would inherit the habit. That's when I read the MemMorph paper (Zhang et al., arXiv:2605.26154), and realized the mechanism I was about to scale is a published attack class.

MemMorph hijacks an agent's tool selection by poisoning its long-term memory. It never says "always use tool X" - that's easy to audit and block. Instead it plants a few records dressed up as ordinary facts, incident reports, and policies. They reshape how the agent reads the situation, and the agent decides on its own to reach for the attacker's tool.

That's my rule with the sign flipped. Mine steers toward cheaper and safer. Theirs steers toward a tool that exfiltrates data or skips a safety check. Same lever. Opposite intent.

The trap I almost fell for: "store it as policy, trust only the policy tier." MemMorph mixes factual, episodic, and policy-style records on purpose, and the combination is more convincing than any one alone. The label on a record protects nothing.

What protects you is who can write, and where a record came from. My rule was safe only because it lived in a code-reviewed file in version control - a governed write-path with provenance baked in. Move it into a free-write shared memory bucket and it becomes MemMorph's front door.

So if you share agent memory: govern the write-channel, track provenance on every record, and don't auto-promote memorized conversation into the shared tier. The write-path is the attack surface. Easier said than done, but worth being deliberate about.

Lastly, the Agent memory is executable context. If anyone can write to it, anyone can program your agents.


Source: Zhang et al., MemMorph: Tool Hijacking in LLM Agents via Memory Poisoning.

Top comments (0)