We Hit 99.1% on the LOCOMO Benchmark. Here's How.

Phillip Neho

Last week, we hit 99.1% accuracy on the LOCOMO benchmark.

For context:

  • Mem0: 26%
  • Engram: 79.6%
  • Muninn: 99.1%

That's a 73.1-point gap over Mem0, and a 19.5-point gap over Engram.

The breakthrough wasn't a new model or complex architecture. It was removing a single assumption.


What is the LOCOMO Benchmark?

LOCOMO (Long Conversational Memory) tests whether AI agents can answer multi-hop reasoning questions using stored memories.

Example:

You tell the agent:

"James works at TechCorp. Sarah and Mike also work at TechCorp. James plays tennis on weekends."

Then you ask:

"Who does James work with?"

The agent must:

  1. Find James → works_at → TechCorp
  2. Find TechCorp → employees → [Sarah, Mike]
  3. Return: "Sarah and Mike"

This requires multi-hop reasoning — traversing relationships between entities.
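A toy version of that traversal, over a hypothetical in-memory fact store (not any real system's API), might look like:

```python
# Tiny in-memory triple store; a stand-in for an agent's memory.
facts = [
    ("James", "works_at", "TechCorp"),
    ("Sarah", "works_at", "TechCorp"),
    ("Mike", "works_at", "TechCorp"),
    ("James", "plays", "tennis"),
]

def coworkers(person):
    # Hop 1: person -> works_at -> employer(s)
    employers = {o for s, p, o in facts if s == person and p == "works_at"}
    # Hop 2: employer -> employees, excluding the person themselves
    return sorted(s for s, p, o in facts
                  if p == "works_at" and o in employers and s != person)

print(coworkers("James"))  # ['Mike', 'Sarah']
```

The two hops are just two passes over the store; the hard part in practice is making the second pass find the right facts, which is where the predicate problem below comes in.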


Why Existing Systems Fail

Most memory systems use predicate filtering:

# Find all 'works_at' facts
works_at_facts = memory.search(predicate="works_at")

The problem: Predicates rarely match exactly. Some systems store works_at, others employed_by, others job_title.

When you filter by predicate, you miss facts stored with different predicates.

Result: Multi-hop reasoning fails because the path breaks.
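To make the failure concrete, here's a toy reproduction (assumed storage format, not any particular system's): the same relation stored under two predicate names defeats an exact-match filter, so the employee lookup never finds Sarah.

```python
facts = [
    ("James", "works_at", "TechCorp"),
    ("Sarah", "employed_by", "TechCorp"),  # same relation, different predicate
]

def search(predicate):
    # Exact-match predicate filter, like the snippet above.
    return [f for f in facts if f[1] == predicate]

print(search("works_at"))  # only James's fact; Sarah's is invisible
```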


The Breakthrough: Remove Predicate Filtering

We tried a counterintuitive approach: Stop filtering by predicate entirely.

# OLD: filter by predicate at query time -- silently misses facts
# stored under a different predicate name
facts = memory.search(predicate="works_at", entity="James")

# NEW: retrieve ALL facts for the entity, then post-filter with a
# synonym list
facts = memory.search(entity="James")
works_at_facts = [f for f in facts if f.predicate in ["works_at", "employed_by"]]

Latency: ~50ms on Cloudflare Workers.
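Putting it together, here's a minimal end-to-end sketch of the entity-first approach (hypothetical in-memory store and synonym set, not Muninn's actual code). The two-hop path stays intact even when the same relation was stored under different predicate names:

```python
facts = [
    ("James", "works_at", "TechCorp"),
    ("Sarah", "employed_by", "TechCorp"),  # different predicate, same relation
    ("Mike", "works_at", "TechCorp"),
]

# Assumed synonym set, applied AFTER retrieval rather than at query time.
WORK = {"works_at", "employed_by", "job_title"}

def search(entity):
    # Retrieve ALL facts touching the entity; no predicate filter here.
    return [f for f in facts if entity in (f[0], f[2])]

def coworkers(person):
    employers = {o for s, p, o in search(person) if s == person and p in WORK}
    return sorted({s for e in employers for s, p, o in search(e)
                   if p in WORK and s != person})

print(coworkers("James"))  # ['Mike', 'Sarah'] -- Sarah is no longer lost
```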


The Numbers

System        LOCOMO Score    Gap to Muninn
Muninn        99.1%           —
MemMachine    88%             -11.1%
Engram        79.6%           -19.5%
Mem0          26%             -73.1%

The jump from our previous 87% to 99.1% came from removing predicate filtering.




The Lesson

Sometimes the best optimization is removing complexity, not adding it.

We spent months trying to improve predicate filtering. Better NLP, more synonyms, fuzzy matching.

None of it worked.

Removing predicate filtering entirely? That was a 12-point accuracy jump.


Phillip is building memory infrastructure for AI agents.
