Last month, an AI agent I built told a user "As a Senior Engineer at Google, you should consider..."
The user had been promoted to Staff Engineer three months earlier. The agent had no idea. No error. No warning. Just a confident, wrong answer served from stale memory.
That's when I realized: the biggest risk in AI agents isn't hallucination — it's stale memory served with high confidence.
The Problem Nobody Talks About
AI agents using memory systems (Mem0, Zep, Letta, LangMem) store facts about users, companies, and decisions. Things like:
- "John works as Senior Engineer at Google"
- "Pro plan costs $99/month"
- "Sarah reports to Mike in Engineering"
These facts get stored once and served forever. No expiration. No re-verification. No staleness check.
Here's what makes it dangerous: memory systems decay facts by access frequency or TTL timers. But a frequently-retrieved memory about a user's job title is highly relevant until the moment it's wrong — at which point it becomes confidently wrong rather than just outdated.
An agent without memory would ask "What do you do?" again. Slightly annoying, but honest. An agent with stale memory states the wrong answer as established fact. That's worse.
How Big Is This Problem?
I ran a simple experiment. I stored 24 real-world facts in Mem0 — job titles, pricing, company info, policies, technical details. Then I checked each one against its original source after simulating 90 days:
- Pricing facts — 55% had changed
- Policy facts — 45% had changed
- Job titles — 15% had changed
- Addresses — 5% had changed
Roughly a third of stored facts were wrong within 3 months. And agents were retrieving them hundreds of times without knowing.
What I Built: MemGuard
I built an open-source platform that sits beside your memory system (doesn't replace it) and continuously validates whether stored facts are still true.
Think of it as Datadog for agent memory — it monitors, validates, and alerts, but doesn't own the data.
How It Works
1. Connect — MemGuard plugs into your existing memory system. Native connectors for Mem0, Zep, Letta, LangMem, or any REST API.
2. Validate — Five strategies, from simple to AI-powered:
| Strategy | How | Needs LLM? |
|---|---|---|
| Source-Linked | Re-fetch original source URL, compare values | No |
| Cross-Reference | Check against 2-3 independent sources | No |
| Temporal Pattern | Statistical staleness prediction per fact-type | No |
| Semantic Drift | LLM detects contradictions in recent context | Yes |
| Causal Chain | Find dependent facts that break together | Yes |
3. Score — Every memory gets a composite trust score (0-100%) based on source reliability, freshness, cross-reference agreement, and retrieval frequency.
4. Quarantine — Facts below 30% trust are automatically quarantined so agents stop using them. Facts below 50% are flagged for review.
5. Alert — Dashboard, webhooks, or MCP tools so agents can call validate_memory() before acting on stored facts.
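The cheapest of these strategies, source-linked validation, is conceptually just "re-fetch and diff." Here's a rough sketch of the idea, not MemGuard's actual connector code — the `StoredFact` shape and the injected `fetch` callable are my own illustrative names:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class StoredFact:
    memory_id: str
    field: str        # e.g. "price_per_month"
    value: str        # value as stored in agent memory
    source_url: str   # where the fact was originally extracted

def source_linked_check(fact: StoredFact,
                        fetch: Callable[[str], dict]) -> dict:
    """Re-fetch the original source and compare against the stored value.

    `fetch` is injected so the same check can run against HTTP, a
    scraper, or a test stub; it returns the source's current
    field -> value mapping.
    """
    current = fetch(fact.source_url)
    live_value = current.get(fact.field)
    if live_value is None:
        # Source no longer exposes the field: can't confirm, flag for review
        return {"status": "unverifiable", "live_value": None}
    if str(live_value) == fact.value:
        return {"status": "valid", "live_value": live_value}
    return {"status": "stale", "live_value": live_value}

# Stub source: the pricing page now says $119/month, memory says $99
fact = StoredFact("mem-42", "price_per_month", "99",
                  "https://example.com/pricing")
result = source_linked_check(fact, lambda url: {"price_per_month": "119"})
print(result["status"])  # stale
```

Injecting the fetcher keeps the comparison logic independent of how a given connector actually reaches the source.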
The Trust Score
This is the core of MemGuard. Each memory's trust score is a weighted combination of:
```
Trust = 0.20 x source_reliability
      + 0.25 x freshness (exponential decay by fact-type)
      + 0.20 x cross_reference_agreement
      + 0.10 x dependency_health
      + 0.15 x historical_accuracy
      + 0.10 x retrieval_importance
```
The key insight: retrieval frequency increases urgency, not trust. A stale memory retrieved 100 times/day is more dangerous than one retrieved once/month. High retrieval + low trust = highest risk.
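In code, the weighting plus the quarantine and flag thresholds from step 4 might look like this. The signal values and the `triage` helper are illustrative, not MemGuard's internals:

```python
# Weights from the trust formula above
WEIGHTS = {
    "source_reliability": 0.20,
    "freshness": 0.25,
    "cross_reference_agreement": 0.20,
    "dependency_health": 0.10,
    "historical_accuracy": 0.15,
    "retrieval_importance": 0.10,
}

def trust_score(signals: dict) -> float:
    """Weighted combination of per-memory signals, each in [0, 1]."""
    return sum(WEIGHTS[k] * signals[k] for k in WEIGHTS)

def triage(score: float, retrievals_per_day: float) -> str:
    """Thresholds from the article: quarantine below 30%, flag below 50%.
    Retrieval frequency raises urgency, not trust."""
    if score < 0.30:
        return "quarantined"
    if score < 0.50:
        return "flagged"
    # High retrieval + middling trust = revalidate before the blast radius grows
    if retrievals_per_day > 50 and score < 0.70:
        return "revalidate-soon"
    return "ok"

# An old pricing fact: reliable source, but freshness has decayed
signals = {
    "source_reliability": 0.9,
    "freshness": 0.2,
    "cross_reference_agreement": 0.5,
    "dependency_health": 0.8,
    "historical_accuracy": 0.7,
    "retrieval_importance": 0.9,
}
score = trust_score(signals)       # -> 0.605
print(triage(score, 120))          # revalidate-soon
```

Note how the hot-but-stale memory lands in "revalidate-soon" rather than "ok": heavy retrieval moves it up the queue even though its score clears both hard thresholds.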
MCP Integration — Agents Validate Before Acting
MemGuard exposes an MCP server so agents can self-check before using memories:
```python
# Agent's internal flow
memory = get_memory("user_job_title")

# Before acting on it, validate
result = mcp.call("validate_memory", {"memory_id": memory.id})

if result.trust_score > 0.7:
    # Safe to use
    respond(f"As a {memory.content}...")
else:
    # Don't trust it, ask the user instead
    respond("Can you confirm your current role?")
```
Four MCP tools available:

- `validate_memory` — check a specific fact before using it
- `get_memory_health` — overall health metrics
- `report_stale_memory` — agent reports suspected staleness
- `get_trusted_memories` — retrieve only high-trust facts
Quick Start
Three commands:

```shell
git clone https://github.com/ac12644/MemGuard.git
cd MemGuard
docker-compose up
```

Dashboard at `localhost:3000`. API docs at `localhost:8001/docs`.
Then: Add Connector -> Pick Mem0/Zep/Letta -> Enter API key -> Sync -> Run Validation.
Tech Stack
- Backend: Python 3.12, FastAPI, SQLAlchemy 2.0, Celery
- Database: PostgreSQL 16, Redis 7
- Dashboard: React 18, Tailwind CSS, Vite, Recharts
- LLM: Anthropic Claude (optional — core works without it)
- MCP: Python MCP SDK for agent integration
- Deploy: Docker Compose, Caddy for auto-TLS in production
What I Learned Building This
1. Fact-type matters more than age. Pricing changes every quarter. Addresses change every decade. A blanket TTL is useless — you need per-category staleness curves.
2. The most dangerous memories are the most useful ones. High-retrieval memories are the ones agents rely on most. When they go stale, the blast radius is massive.
3. Agents should validate, not just retrieve. The MCP integration changes the agent's behavior from "retrieve and trust" to "retrieve, validate, then decide." That single change prevents most stale-memory errors.
4. You don't need LLM for most validation. Source re-fetch and temporal patterns catch 80% of staleness without any LLM cost. Save the AI-powered strategies for edge cases.
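Lesson 1 is exactly what the freshness term's per-category decay encodes. A minimal sketch: one half-life per fact-type instead of a blanket TTL. The half-life numbers here are illustrative guesses, not MemGuard's fitted curves:

```python
import math

# Illustrative half-lives (days): how long until a fact-type's
# freshness drops to 0.5. Real curves would be fit from observed
# change rates per category.
HALF_LIFE_DAYS = {
    "pricing": 90,     # changes roughly quarterly
    "policy": 180,
    "job_title": 540,
    "address": 3650,   # changes roughly once a decade
}

def freshness(fact_type: str, age_days: float) -> float:
    """Exponential decay: 1.0 when just verified, 0.5 at one half-life."""
    half_life = HALF_LIFE_DAYS[fact_type]
    return math.exp(-math.log(2) * age_days / half_life)

# After the same 90 days, a pricing fact is at half freshness
# while an address has barely moved:
print(round(freshness("pricing", 90), 2))   # 0.5
print(round(freshness("address", 90), 2))   # 0.98
```

A single TTL would either revalidate addresses constantly or let pricing rot; per-category curves do neither.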
Open Source — Apache 2.0
The full project is on GitHub:
ac12644/MemGuard: AI Agent Memory Validation Platform — continuously verify whether facts stored in AI agent memory systems (Mem0, Zep, Letta, LangMem) are still true. Like Datadog for agent memory.
- 5 connectors (Mem0, Zep, Letta, LangMem, Generic REST)
- 5 validation strategies
- 40 API endpoints
- Dashboard with onboarding
- MCP server for agent integration
- Production-ready with Caddy TLS + automated backups
Contributions welcome. If you're building AI agents with memory systems, I'd love to hear what validation strategies matter most for your use cases.
If your agent has ever confidently told a user something that was true six months ago but not today — that's the problem MemGuard solves.