OWASP Agent Memory Guard: Stop AI Agent Memory Poisoning Before It Corrupts Your Production Systems

#agents #ai #security #showdev

The Silent Threat Killing Your AI Agents in Production

You've deployed your AI agent. It's working great. Then, three weeks later, it starts behaving strangely — recommending wrong things, leaking data, ignoring safety rules. You check the model weights. Fine. You check the code. Fine. The problem is in the memory.

This is AI Agent Memory Poisoning — OWASP Agentic Top 10 ASI06 — and it's one of the most underestimated attack vectors in production AI systems today.

What Is Memory Poisoning?

An attacker (or a malicious tool output) injects crafted content into your agent's persistent memory:

Conversation history
RAG/vector stores
External memory systems (Mem0, Zep, etc.)

The injected content silently corrupts future reasoning across all sessions. The model weights are fine. The memory isn't.

Example attack:

If your agent stores malicious tool output in memory without scanning it, every future user gets poisoned responses.

Introducing OWASP Agent Memory Guard (AMG)

I built agent-memory-guard to fix this. It's an open-source Python library under the OWASP umbrella that wraps any memory store as a transparent security layer.

Install: pip install agent-memory-guard

GitHub: https://github.com/OWASP/www-project-agent-memory-guard

How It Works

AMG intercepts every memory read and write and scans for:

Prompt injection patterns — 150+ regex patterns + semantic analysis
PII/secret leakage — SSNs, credit cards, API keys, passwords
Protected key tampering — prevents overwriting critical system instructions
Anomalous content — statistical outliers that indicate injection attempts

Works with LangChain, LangGraph, AutoGen, Mem0, custom RAG pipelines, and any dict-like memory store.

Performance

92.5% detection rate on the AgentThreatBench evaluation suite
0% false positives on benign workloads
59µs median latency — imperceptible overhead
Zero external dependencies — fully local, no cloud calls

The AgentThreatBench (ATB) Evaluation Suite

AMG ships with AgentThreatBench — a curated dataset of 400+ adversarial memory attack scenarios for benchmarking agent memory defenses.

Install: pip install agent-threat-bench

Why This Matters Now

The OWASP Agentic Top 10 (released 2025) identifies memory poisoning as a critical risk for production AI agents. As agentic systems become more autonomous and long-running, the attack surface grows exponentially.

AMG is the first open-source, production-ready defense specifically targeting this threat class.

Get Involved

GitHub: https://github.com/OWASP/www-project-agent-memory-guard
PyPI: pip install agent-memory-guard
Benchmark: pip install agent-threat-bench

Star the repo, open issues, contribute attack scenarios to ATB, or just try it in your next agent project. Feedback welcome!

Top comments (1)

Harjot Singh • May 31

Memory poisoning is the sleeper threat that the "give agents persistent memory" hype completely glosses over - the moment an agent's memory is writable from untrusted input, an attacker can plant a false "fact" once and have it influence every future decision, silently. It's prompt injection with persistence: poison the memory today, exploit it next week, and there's no obvious moment of attack to catch. Far nastier than a one-shot injection.

The defenses that matter (which I'd guess your guard implements): treat memory writes as untrusted by default, provenance-tag every memory so you can trace and revoke a poisoned entry, and never let recalled memory flow into a privileged action without the same gate you'd apply to fresh input. Memory is just another untrusted boundary. It's the discipline I lean on in Moonshift (a multi-agent pipeline shipping a prompt to a real SaaS) - durable context, but validated/scoped, never blindly trusted. Genuinely important and under-covered - is the guard doing write-time validation, or detecting poisoned entries at recall time? Write-time feels safer but recall-time catches what slipped through.