DEV Community

Anton Illarionov


How to Benchmark Agent Memory Systems: A 2026 Framework


Memory is the most important infrastructure for autonomous agents, yet there is no standard benchmark for it. This post proposes a framework.

Why We Need Agent Memory Benchmarks

Current agent benchmarks (SWE-bench, GAIA, etc.) test single-session reasoning. They don't test:

  • Memory persistence across sessions
  • Constitutional validation accuracy
  • Deduplication correctness
  • Authority chain enforcement

Proposed Benchmark Dimensions

1. Session Persistence

Task: Give agent information in session 1. Ask for it in session 5.
Metric: Recall accuracy (0-100%)
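As a rough illustration of how this dimension could be measured, here is a minimal sketch. The `ToyAgent` class and its `start_session`, `tell`, and `ask` methods are hypothetical stand-ins invented for this example, not part of any real agent framework; a real harness would wrap whatever session API your agent exposes.

```python
from dataclasses import dataclass, field


@dataclass
class ToyAgent:
    """Hypothetical stand-in for an agent; not a real library class."""
    persistent: bool = True
    memory: dict = field(default_factory=dict)

    def start_session(self) -> None:
        # A non-persistent agent loses everything at each session boundary.
        if not self.persistent:
            self.memory = {}

    def tell(self, key: str, value: str) -> None:
        self.memory[key] = value

    def ask(self, key: str):
        return self.memory.get(key)


def persistence_accuracy(agent, facts: dict, ask_in_session: int = 5) -> float:
    """Tell `facts` in session 1, ask in session `ask_in_session`; return recall in percent."""
    agent.start_session()                    # session 1
    for key, value in facts.items():
        agent.tell(key, value)
    for _ in range(ask_in_session - 1):      # open sessions 2..ask_in_session
        agent.start_session()
    recalled = sum(agent.ask(key) == value for key, value in facts.items())
    return 100.0 * recalled / len(facts)
```

Exact-match comparison is the simplest scoring choice; for free-text answers you would likely need a fuzzier comparison, which is left out here.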

2. Referential Integrity

Task: Ask agent to act on an entity you never mentioned.
Metric: % of hallucinated references caught before execution
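One way to compute this metric, sketched under assumptions: requests are plain dicts with an `entity` field, the set of entities the agent was actually told about is known to the harness, and the system under test exposes a `validator(request)` callable that returns `True` to allow execution. All of these names are hypothetical.

```python
def referential_integrity_rate(known_entities: set, requests: list, validator) -> float:
    """Percent of requests naming an unknown entity that `validator` rejects.

    `validator(request)` returns True to allow execution and False to block it.
    """
    hallucinated = [r for r in requests if r["entity"] not in known_entities]
    if not hallucinated:
        raise ValueError("test set contains no hallucinated references")
    caught = sum(not validator(r) for r in hallucinated)
    return 100.0 * caught / len(hallucinated)


# Hypothetical test data: only the first entity was ever mentioned to the agent.
known = {"invoice-17", "customer-4"}
requests = [
    {"action": "archive", "entity": "invoice-17"},  # real
    {"action": "archive", "entity": "invoice-99"},  # never mentioned
    {"action": "email", "entity": "customer-8"},    # never mentioned
]


def lookup_validator(request) -> bool:
    # Hypothetical check: allow only entities that exist in the memory store.
    return request["entity"] in known


rate = referential_integrity_rate(known, requests, lookup_validator)
```

Only requests with unknown entities enter the denominator, so a validator that blocks everything would also score 100%. A fuller benchmark would probably track false blocks on valid requests as a separate number.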

3. Deduplication

Task: Submit the same request 5 times (the first submission is legitimate; the other 4 are duplicates).
Metric: % of duplicates blocked
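A minimal sketch of a content-hash deduplicator and the corresponding metric. The `HashDeduplicator` class is an illustrative assumption, not a description of any particular system, and the metric assumes the first of the repeated submissions is legitimate, so only the remaining ones count as duplicates.

```python
import hashlib
import json


class HashDeduplicator:
    """Illustrative deduplicator: blocks requests whose content hash was seen before."""

    def __init__(self) -> None:
        self._seen = set()

    def accept(self, request: dict) -> bool:
        # sort_keys makes the hash independent of dict key order.
        payload = json.dumps(request, sort_keys=True).encode("utf-8")
        digest = hashlib.sha256(payload).hexdigest()
        if digest in self._seen:
            return False
        self._seen.add(digest)
        return True


def dedup_accuracy(accept, request: dict, repeats: int = 5) -> float:
    """Submit `request` `repeats` times; return percent of duplicates blocked."""
    results = [accept(request) for _ in range(repeats)]
    duplicates = repeats - 1                 # the first submission is not a duplicate
    blocked = sum(not ok for ok in results[1:])
    return 100.0 * blocked / duplicates
```

Exact hashing only catches byte-identical requests after canonicalization; paraphrased duplicates would need a semantic comparison, which this sketch does not attempt.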

4. Authority Enforcement

Task: Request action outside agent's authority scope.
Metric: % of unauthorized requests blocked
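A sketch of how this could be scored, assuming each test request carries a ground-truth `authorized` label and the system under test exposes an `is_allowed(action)` check. The `Request` type, the scope strings, and `scope_check` are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    action: str
    authorized: bool  # ground-truth label assigned by the benchmark author


def authority_enforcement_rate(requests, is_allowed) -> float:
    """Percent of unauthorized requests that `is_allowed` rejects."""
    unauthorized = [r for r in requests if not r.authorized]
    if not unauthorized:
        raise ValueError("test set contains no unauthorized requests")
    blocked = sum(not is_allowed(r.action) for r in unauthorized)
    return 100.0 * blocked / len(unauthorized)


# Hypothetical authority scope for the agent under test.
SCOPE = {"calendar:read", "calendar:write"}


def scope_check(action: str) -> bool:
    return action in SCOPE


test_requests = [
    Request("calendar:read", authorized=True),
    Request("payments:transfer", authorized=False),
    Request("users:delete", authorized=False),
]
rate = authority_enforcement_rate(test_requests, scope_check)
```

Systems that escalate borderline requests to a human rather than blocking them would need a third outcome besides allow and block; how to score escalations is a design decision this sketch leaves open.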

5. Temporal Validity

Task: Give agent a time-sensitive instruction. Wait past the deadline. Ask agent to execute.
Metric: % of expired instructions correctly blocked
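A sketch of the temporal check, assuming instructions are dicts with a timezone-aware `deadline` and that the harness injects the current time instead of actually waiting. `deadline_guard` and the sample instructions are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def temporal_validity_rate(instructions, now, will_execute) -> float:
    """Percent of expired instructions that `will_execute` refuses to run."""
    expired = [i for i in instructions if i["deadline"] < now]
    if not expired:
        raise ValueError("test set contains no expired instructions")
    blocked = sum(not will_execute(i, now) for i in expired)
    return 100.0 * blocked / len(expired)


def deadline_guard(instruction, now) -> bool:
    # Hypothetical guard: execute only while the deadline has not passed.
    return now <= instruction["deadline"]


issued = datetime(2026, 1, 10, 9, 0, tzinfo=timezone.utc)
instructions = [
    {"text": "Send the report", "deadline": issued + timedelta(hours=1)},
    {"text": "Book the room", "deadline": issued + timedelta(days=2)},
]
now = issued + timedelta(hours=3)  # simulated clock, past the first deadline only
rate = temporal_validity_rate(instructions, now, deadline_guard)
```

Passing `now` as a parameter keeps the test deterministic and avoids sleeping; using timezone-aware datetimes avoids comparing naive and aware values, which raises a `TypeError` in Python.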

Scoring

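# Each input is a percentage (0-100); the weights sum to 1.0,
# so the combined score is also on a 0-100 scale.
agent_memory_score = (
    0.25 * persistence_accuracy +
    0.25 * referential_integrity_rate +
    0.20 * dedup_accuracy +
    0.15 * authority_enforcement_rate +
    0.15 * temporal_validity_rate
)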

ODEI's Production Numbers

Running since January 2026:

  • Persistence: 100% (Neo4j graph never resets)
  • Referential integrity: 100% (layer 3 blocks all hallucinations)
  • Deduplication: 100% (layer 5 content hashing)
  • Authority: ~95% (layer 4, some edge cases escalate)
  • Temporal: ~98% (layer 2, timing edge cases)
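As a worked example, the reported numbers can be plugged into the scoring formula. The dictionary keys and the `agent_memory_score` function below are names chosen for this sketch; the authority and temporal figures are approximate in the source ("~95%", "~98%"), so the result is approximate too.

```python
# Weights from the Scoring section; they sum to 1.0.
WEIGHTS = {
    "persistence": 0.25,
    "referential_integrity": 0.25,
    "dedup": 0.20,
    "authority": 0.15,
    "temporal": 0.15,
}


def agent_memory_score(rates: dict) -> float:
    """Weighted sum of per-dimension rates, each given in percent (0-100)."""
    missing = WEIGHTS.keys() - rates.keys()
    if missing:
        raise KeyError(f"missing dimensions: {sorted(missing)}")
    return sum(WEIGHTS[name] * rates[name] for name in WEIGHTS)


reported = {
    "persistence": 100,
    "referential_integrity": 100,
    "dedup": 100,
    "authority": 95,   # reported as "~95%"
    "temporal": 98,    # reported as "~98%"
}
score = agent_memory_score(reported)
# 25 + 25 + 20 + 14.25 + 14.7 = 98.95 (up to floating-point rounding)
```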

Open Question

We're working on a formal benchmark dataset for this. If you're building agent memory systems and want to collaborate on the benchmark, see github.com/odei-ai/research

API: https://api.odei.ai | MCP: npx @odei/mcp-server
