# We Hit 99.1% on the LOCOMO Benchmark. Here's How.
Last week, we hit 99.1% accuracy on the LOCOMO benchmark.
For context:
- Mem0: 26%
- Engram: 79.6%
- Muninn: 99.1%
That's a 73-point gap over Mem0. A 20-point gap over Engram.
The breakthrough wasn't a new model or complex architecture. It was removing a single assumption.
## What is the LOCOMO Benchmark?
LOCOMO (Long-Context Memory) tests whether AI agents can answer multi-hop reasoning questions using stored memories.
Example:
You tell the agent:
"James works at TechCorp. Sarah and Mike also work at TechCorp. James plays tennis on weekends."
Then you ask:
"Who does James work with?"
The agent must:
- Find James → works_at → TechCorp
- Find TechCorp → employees → [Sarah, Mike]
- Return: "Sarah and Mike"
This requires multi-hop reasoning — traversing relationships between entities.
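The two-hop lookup above can be sketched over a toy fact store. Everything here (the `FACTS` triples, the `coworkers` helper) is hypothetical for illustration, not Muninn's actual API:

```python
# Toy fact store: (subject, predicate, object) triples.
FACTS = [
    ("James", "works_at", "TechCorp"),
    ("Sarah", "works_at", "TechCorp"),
    ("Mike", "works_at", "TechCorp"),
    ("James", "plays", "tennis"),
]

def coworkers(person):
    # Hop 1: find every place the person works.
    employers = {o for s, p, o in FACTS if s == person and p == "works_at"}
    # Hop 2: find everyone else who works at any of those places.
    return sorted(
        s for s, p, o in FACTS
        if p == "works_at" and o in employers and s != person
    )

print(coworkers("James"))  # ['Mike', 'Sarah']
```

Each hop is a separate query, so a miss on either hop breaks the whole answer — which is exactly where predicate filtering causes trouble.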
## Why Existing Systems Fail
Most memory systems use predicate filtering:
```python
# Find all 'works_at' facts
works_at_facts = memory.search(predicate="works_at")
```
The problem: Predicates rarely match exactly. Some systems store `works_at`, others `employed_by`, others `job_title`.
When you filter by predicate, you miss facts stored with different predicates.
Result: Multi-hop reasoning fails because the path breaks.
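Here is a minimal illustration of the failure mode, with the same relation stored under three different predicates (the triples are hypothetical examples, not real stored data):

```python
# The same "works at TechCorp" fact, written three different ways:
FACTS = [
    ("James", "works_at", "TechCorp"),
    ("Sarah", "employed_by", "TechCorp"),           # synonym predicate
    ("Mike", "job_title", "Engineer at TechCorp"),  # relation buried in the object
]

# An exact-predicate filter sees only one of the three facts:
hits = [f for f in FACTS if f[1] == "works_at"]
print(len(hits))  # 1 — Sarah and Mike are invisible to the traversal
```

One invisible fact is enough to sever the hop from TechCorp back to its employees.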
## The Breakthrough: Remove Predicate Filtering
We tried a counterintuitive approach: Stop filtering by predicate entirely.
```python
# OLD: Filter by predicate first
facts = memory.search(predicate="works_at", entity="James")

# NEW: Search ALL facts for entity, filter after
facts = memory.search(entity="James")
works_at_facts = [f for f in facts if f.predicate in ["works_at", "employed_by"]]
```
Latency: ~50ms on Cloudflare Workers.
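Entity-first retrieval with post-hoc predicate filtering can be sketched end to end like this. The `MemoryStore` class and `Fact` tuple are hypothetical stand-ins for illustration, not Muninn's actual implementation:

```python
from collections import namedtuple

Fact = namedtuple("Fact", "subject predicate object")

class MemoryStore:
    """Hypothetical in-memory fact store (not Muninn's real API)."""

    def __init__(self, facts):
        self.facts = facts

    def search(self, entity):
        # Return EVERY fact touching the entity, regardless of predicate.
        return [f for f in self.facts if entity in (f.subject, f.object)]

store = MemoryStore([
    Fact("James", "works_at", "TechCorp"),
    Fact("Sarah", "employed_by", "TechCorp"),
    Fact("James", "plays", "tennis"),
])

facts = store.search("James")
# Filter AFTER retrieval, tolerating predicate synonyms:
employment = [f for f in facts if f.predicate in ("works_at", "employed_by")]
print([f.object for f in employment])  # ['TechCorp']
```

Because the entity index does the narrowing, the synonym list only has to be right at filter time, not at write time — a fact stored under `employed_by` is still retrievable.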
## The Numbers
| System | LOCOMO Score | Gap to Muninn |
|---|---|---|
| Muninn | 99.1% | — |
| MemMachine | 88% | -11.1% |
| Engram | 79.6% | -19.5% |
| Mem0 | 26% | -73.1% |
The jump from our previous 87% to 99.1% came from removing predicate filtering.
## Try It
- Dashboard: https://muninn.au
- API: https://api.muninn.au
- GitHub: https://github.com/Phillipneho/muninn
- Free tier: No credit card required
## The Lesson
Sometimes the best optimization is removing complexity, not adding it.
We spent months trying to improve predicate filtering: better NLP, more synonyms, fuzzy matching.
None of it worked.
Removing predicate filtering entirely? That was a 12-point accuracy jump.