WAzion

Why Claude Code Needs Cognitive Memory — And How We Built It

Claude Code starts every session from scratch. What if it didn't?

Every time you open a new Claude Code session, it has no memory of what you built yesterday, what errors you hit last week, or what decisions you made last month. You re-explain. It re-discovers. You paste context. It re-learns. Then the session ends, and it all disappears.

For a one-off task, that's fine. For a long-running project? It's death by a thousand paper cuts.

So I built a cognitive memory system for Claude Code. And then I benchmarked it.

The Benchmark (This Is the Part That Surprised Me)

LoCoMo (ACL 2024, Snap Research) is a peer-reviewed benchmark for long-term conversation memory. 1,986 questions across 10 multi-session conversations. The kind of recall a real AI assistant needs.

Here's what happened:

| System | F1 Score | Hardware |
|---|---|---|
| NEXO Brain v0.5.0 | 0.588 | CPU only (MacBook) |
| GPT-4 (128K full context) | 0.379 | GPU cloud |
| Gemini Pro 1.0 | 0.313 | GPU cloud |
| LLaMA-3 70B | 0.295 | A100 GPU |
| GPT-3.5 + Contriever RAG | 0.283 | GPU |

NEXO Brain's F1 score is 55% higher than GPT-4's (0.588 vs. 0.379) on long-term recall, while running locally on CPU.

The insight: selective retrieval beats brute-force full-context. You don't need to see everything. You need to see the right things.
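
The 55% figure is a relative improvement, not a difference in percentage points. A quick check, using only the F1 scores reported above (the helper name `relative_gain` is just for illustration):

```python
# F1 scores as reported in the benchmark results above.
f1_scores = {
    "NEXO Brain v0.5.0": 0.588,
    "GPT-4 (128K full context)": 0.379,
    "Gemini Pro 1.0": 0.313,
    "LLaMA-3 70B": 0.295,
    "GPT-3.5 + Contriever RAG": 0.283,
}


def relative_gain(ours: float, baseline: float) -> float:
    """Relative improvement of `ours` over `baseline`, as a fraction."""
    return (ours - baseline) / baseline


nexo = f1_scores["NEXO Brain v0.5.0"]
gains = {
    name: relative_gain(nexo, score)
    for name, score in f1_scores.items()
    if name != "NEXO Brain v0.5.0"
}

for name, gain in gains.items():
    print(f"{name}: +{gain:.0%}")
```

The GPT-4 row works out to (0.588 - 0.379) / 0.379 ≈ 0.55, which is where the 55% comes from.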

What's Actually in the Box

NEXO Brain implements the Atkinson-Shiffrin memory model from cognitive psychology:

Sensory Register → Short-Term Memory → Long-Term Memory
                   (with rehearsal)     (with consolidation)
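
To make the three stages concrete, here is a toy sketch of how such a pipeline could be wired up. It is not NEXO Brain's code: the class name, the method names, and both parameters (`sensory_size`, `promote_after`) are assumptions chosen for illustration.

```python
from collections import deque


class ThreeStageMemory:
    """Toy model of sensory register -> short-term -> long-term memory."""

    def __init__(self, sensory_size: int = 5, promote_after: int = 2):
        # Both parameters are illustrative assumptions, not NEXO Brain settings.
        self.sensory = deque(maxlen=sensory_size)  # oldest input falls off
        self.short_term = {}  # text -> rehearsal count
        self.long_term = set()
        self.promote_after = promote_after

    def perceive(self, text: str) -> None:
        """Raw input lands in the bounded sensory register."""
        self.sensory.append(text)

    def attend(self, text: str) -> None:
        """Attended input enters short-term memory; repeats count as rehearsal."""
        if text not in self.sensory or text in self.long_term:
            return
        self.short_term[text] = self.short_term.get(text, -1) + 1

    def consolidate(self) -> None:
        """Promote sufficiently rehearsed items to long-term memory."""
        for text, rehearsals in list(self.short_term.items()):
            if rehearsals >= self.promote_after:
                self.long_term.add(text)
                del self.short_term[text]


mem = ThreeStageMemory()
for _ in range(3):
    mem.perceive("deploys go through the staging branch")
    mem.attend("deploys go through the staging branch")
mem.perceive("fixed a typo")
mem.attend("fixed a typo")
mem.consolidate()
```

After `consolidate()`, the fact that was repeated across three inputs sits in long-term memory, while the one-off remark stays in short-term memory.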

The key mechanisms:

  • Automatic ingestion — conversations become memories without any user action
  • Hybrid search — RAG with HyDE query expansion + BM25, cross-encoder reranked
  • Ebbinghaus decay — memories fade over time. Redundant ones decay faster. Novel ones are protected.
  • Dream cycles — overnight consolidation creates cross-session insights
  • Prediction error gating — only novel information gets stored (no duplicates)
  • Adversarial detection — 93.3% F1 at knowing when to say "I don't know"
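
As a rough illustration of the Ebbinghaus decay idea from the list above, the sketch below uses the classic forgetting curve R = exp(-t / S) and scales the stability S by a novelty score so that redundant memories fade faster. The scaling formula, `BASE_DAYS`, and `FORGET_BELOW` are assumptions made for this example, not values taken from NEXO Brain.

```python
import math

BASE_DAYS = 7.0      # assumed base stability, in days
FORGET_BELOW = 0.2   # assumed retention threshold for pruning


def retention(age_days: float, stability_days: float) -> float:
    """Ebbinghaus-style forgetting curve: R = exp(-t / S)."""
    return math.exp(-age_days / stability_days)


def stability(base_days: float, novelty: float) -> float:
    """Scale stability by novelty in [0, 1] (hypothetical rule).

    Low novelty (redundant) halves the base stability;
    high novelty doubles it, which protects novel memories.
    """
    return base_days * (0.5 + 1.5 * novelty)


# (description, novelty score) pairs; both memories are 10 days old.
memories = [("redundant note", 0.0), ("novel decision", 1.0)]
age_days = 10.0

kept = [
    name
    for name, novelty in memories
    if retention(age_days, stability(BASE_DAYS, novelty)) >= FORGET_BELOW
]
print(kept)
```

With these assumed constants, the redundant note drops below the threshold after 10 days while the novel decision is still retained.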

All of this runs on CPU. 768-dim embeddings via fastembed/ONNX. No GPU. No cloud. No network round-trips.
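
Since every memory is an embedding vector, prediction error gating can be pictured as a similarity check against what is already stored. The sketch below is an assumption about how such a gate might work, not NEXO Brain's implementation: the `NoveltyGate` class and the 0.9 threshold are hypothetical, and 3-dim toy vectors stand in for the 768-dim embeddings.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


class NoveltyGate:
    """Store a vector only if nothing already stored is too similar to it."""

    def __init__(self, threshold: float = 0.9):
        self.threshold = threshold  # assumed cutoff for "already known"
        self.stored: list[list[float]] = []

    def offer(self, vector: list[float]) -> bool:
        """Return True if the vector was novel enough to be stored."""
        if any(cosine(vector, v) >= self.threshold for v in self.stored):
            return False
        self.stored.append(vector)
        return True


gate = NoveltyGate()
first = gate.offer([1.0, 0.0, 0.0])          # nothing stored yet
near_dup = gate.offer([0.99, 0.05, 0.0])     # almost the same direction
different = gate.offer([0.0, 1.0, 0.0])      # orthogonal to the first
```

Here the near-duplicate is rejected and the orthogonal vector is stored. A linear scan like this is fine for a sketch; a real store would likely use an index.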

Why This Matters for Claude Code Specifically

Claude Code loads your entire CLAUDE.md file on every message. Mine is 15KB. For a complex project that's 10,000+ tokens — just for static context that's mostly irrelevant to the current query.

With cognitive memory, you load ~500 tokens of actually relevant context per query instead. That's a 90-95% reduction in context overhead for heavy sessions.
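
The 90-95% range follows from simple arithmetic on the token counts. In the sketch below, the 500-token and 10,000-token figures come from the text above; the 5,000-token baseline is an assumed lower bound added to show where the 90% end of the range would come from.

```python
def context_reduction(baseline_tokens: int, retrieved_tokens: int) -> float:
    """Fraction of per-query context saved by retrieving instead of loading everything."""
    return 1 - retrieved_tokens / baseline_tokens


RETRIEVED_TOKENS = 500  # per-query retrieved context, from the text above

for baseline in (5_000, 10_000):  # 5,000 is an assumed lower bound
    saved = context_reduction(baseline, RETRIEVED_TOKENS)
    print(f"{baseline} -> {RETRIEVED_TOKENS} tokens: {saved:.0%} reduction")
```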

Faster responses. Lower cost. And Claude actually remembers things.

The Proposal

I wrote up a formal integration proposal for Anthropic, published on GitHub:

Why Claude Code Needs Cognitive Memory — Full Proposal

It covers:

  • The architecture in detail
  • Benchmark results with per-category breakdown
  • What native integration would look like vs. the current MCP approach
  • The efficiency gains at Anthropic's scale

Try It Yourself

NEXO Brain runs as an MCP server. Install it, point Claude Code at it, and it starts building memory automatically.

npm install -g nexo-brain
nexo-brain init

Then add it to your Claude Code MCP config. Full setup in the GitHub repo.

949 downloads on day one. The need is real.


NEXO Brain is MIT licensed. The benchmark is reproducible. The code is public.

If you're at Anthropic and reading this — the proposal is waiting. If you're a developer who's tired of re-explaining yourself to Claude every session — give it a try.
