The Silent Threat Killing Your AI Agents in Production
You've deployed your AI agent. It's working great. Then, three weeks later, it starts behaving strangely — recommending wrong things, leaking data, ignoring safety rules. You check the model weights. Fine. You check the code. Fine. The problem is in the memory.
This is AI Agent Memory Poisoning — OWASP Agentic Top 10 ASI06 — and it's one of the most underestimated attack vectors in production AI systems today.
What Is Memory Poisoning?
An attacker (or a malicious tool output) injects crafted content into your agent's persistent memory:
- Conversation history
- RAG/vector stores
- External memory systems (Mem0, Zep, etc.)
The injected content silently corrupts future reasoning across all sessions. The model weights are fine. The memory isn't.
Example attack:
If your agent stores malicious tool output in memory without scanning it, every future user gets poisoned responses.
Introducing OWASP Agent Memory Guard (AMG)
I built agent-memory-guard to fix this. It's an open-source Python library under the OWASP umbrella that wraps any memory store as a transparent security layer.
Install: pip install agent-memory-guard
GitHub: https://github.com/OWASP/www-project-agent-memory-guard
How It Works
AMG intercepts every memory read and write and scans for:
- Prompt injection patterns — 150+ regex patterns + semantic analysis
- PII/secret leakage — SSNs, credit cards, API keys, passwords
- Protected key tampering — prevents overwriting critical system instructions
- Anomalous content — statistical outliers that indicate injection attempts
Works with LangChain, LangGraph, AutoGen, Mem0, custom RAG pipelines, and any dict-like memory store.
Performance
- 92.5% detection rate on the AgentThreatBench evaluation suite
- 0% false positives on benign workloads
- 59µs median latency — imperceptible overhead
- Zero external dependencies — fully local, no cloud calls
The AgentThreatBench (ATB) Evaluation Suite
AMG ships with AgentThreatBench — a curated dataset of 400+ adversarial memory attack scenarios for benchmarking agent memory defenses.
Install: pip install agent-threat-bench
Why This Matters Now
The OWASP Agentic Top 10 (released 2025) identifies memory poisoning as a critical risk for production AI agents. As agentic systems become more autonomous and long-running, the attack surface grows exponentially.
AMG is the first open-source, production-ready defense specifically targeting this threat class.
Get Involved
- GitHub: https://github.com/OWASP/www-project-agent-memory-guard
- PyPI: pip install agent-memory-guard
- Benchmark: pip install agent-threat-bench
Star the repo, open issues, contribute attack scenarios to ATB, or just try it in your next agent project. Feedback welcome!
Top comments (1)
Memory poisoning is the sleeper threat that the "give agents persistent memory" hype completely glosses over - the moment an agent's memory is writable from untrusted input, an attacker can plant a false "fact" once and have it influence every future decision, silently. It's prompt injection with persistence: poison the memory today, exploit it next week, and there's no obvious moment of attack to catch. Far nastier than a one-shot injection.
The defenses that matter (which I'd guess your guard implements): treat memory writes as untrusted by default, provenance-tag every memory so you can trace and revoke a poisoned entry, and never let recalled memory flow into a privileged action without the same gate you'd apply to fresh input. Memory is just another untrusted boundary. It's the discipline I lean on in Moonshift (a multi-agent pipeline shipping a prompt to a real SaaS) - durable context, but validated/scoped, never blindly trusted. Genuinely important and under-covered - is the guard doing write-time validation, or detecting poisoned entries at recall time? Write-time feels safer but recall-time catches what slipped through.