Wayne Ma
Brain-Inspired Memory for LLMs

Your brain does three things with memory that LLMs don't: it forgets what's irrelevant, it connects related ideas when you recall one, and it consolidates fragmented experiences into coherent knowledge while you sleep.

I borrowed all three for nan-forget, a long-term memory system I built with Claude Code for AI coding tools.

The Problem

I use Claude Code daily. After a few weeks I noticed I was re-explaining the same things: the tech stack, why we picked JWT over sessions, which deployment target we chose. Claude would suggest approaches I'd already rejected. Every session reset the relationship.

Memory tools exist. Most store everything permanently and retrieve by raw vector similarity. They treat memory as a database problem. Brains don't work that way.

Three Ideas from Neuroscience

1. The Forgetting Curve

Hermann Ebbinghaus showed in 1885 that memory retention drops exponentially over time without reinforcement. Your brain doesn't store everything forever. It lets unused memories fade.

nan-forget applies a 30-day half-life to every memory:

```
decay_weight = 0.5 ^ (days_since_accessed / 30)
```

A memory you accessed yesterday scores near 1.0. One you haven't touched in 60 days scores 0.25. After 100 days with no access, garbage collection archives it.

Memories you access often get a frequency boost:

```
frequency_boost = log2(access_count + 1) / 10 + 1
```

The final score combines vector similarity with both signals:

```
score = cosine_similarity × decay_weight × frequency_boost
```

This means search results shift over time. Fresh, frequently used context ranks higher. Stale decisions that were never referenced again sink.
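The three formulas above can be sketched directly in TypeScript. This is my illustration of the scoring math, not nan-forget's actual code; the function names are mine:

```typescript
// Decay weight: 30-day half-life on time since last access.
function decayWeight(daysSinceAccessed: number, halfLifeDays = 30): number {
  return Math.pow(0.5, daysSinceAccessed / halfLifeDays);
}

// Frequency boost: logarithmic, so the 100th access matters
// far less than the 2nd.
function frequencyBoost(accessCount: number): number {
  return Math.log2(accessCount + 1) / 10 + 1;
}

// Final retrieval score: raw similarity shaped by recency and usage.
function score(
  cosineSimilarity: number,
  daysSinceAccessed: number,
  accessCount: number
): number {
  return (
    cosineSimilarity * decayWeight(daysSinceAccessed) * frequencyBoost(accessCount)
  );
}
```

With these numbers, a memory untouched for 60 days needs roughly four times the raw similarity of one accessed yesterday to rank the same.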

2. Spreading Activation

When you think about "coffee," related concepts activate: morning, caffeine, your favorite mug. Psychologist John Anderson formalized this in 1983 as spreading activation — retrieving one node in a semantic network activates connected nodes.

nan-forget's retrieval has three stages:

```mermaid
flowchart LR
    Q["Query: 'auth system'"] --> S1["Stage 1: Recognition\n50 candidates, top 5 summaries"]
    S1 --> S2["Stage 2: Recall\nFull content + cross-project expansion"]
    S2 --> S3["Stage 3: Spreading Activation\nCentroid of results, find neighbors"]
```

Stage 1: Recognition. Fast vector search. Returns summaries only, scored with decay and frequency. Like the tip-of-your-tongue feeling — you know something is there before you recall the details.

Stage 2: Recall. Fetches full content for Stage 1 hits. Expands search cross-project so an auth decision from Project A surfaces when you work on Project B.

Stage 3: Spreading activation. Computes the vector centroid of all found memories, then runs a second search around that centroid. Surfaces related context you didn't search for. A search for "JWT tokens" might pull in a memory about your API rate limiting setup because the vectors are neighbors in embedding space.

Most memory tools stop at Stage 1. Flat vector search, return top-K, done.
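Stage 3 reduces to two steps: average the retrieved vectors, then rank everything else by cosine similarity to that average. A minimal sketch (my own, assuming in-memory vectors rather than the project's sqlite-vec queries):

```typescript
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Element-wise mean of the Stage 1/2 result vectors.
function centroid(vectors: number[][]): number[] {
  const c = new Array(vectors[0].length).fill(0);
  for (const v of vectors) for (let i = 0; i < v.length; i++) c[i] += v[i];
  return c.map((x) => x / vectors.length);
}

// Second pass: neighbors of the centroid that weren't already retrieved.
function spreadActivation(
  results: number[][],
  all: { id: string; vec: number[] }[],
  retrieved: Set<string>,
  topK = 3
): string[] {
  const c = centroid(results);
  return all
    .filter((m) => !retrieved.has(m.id))
    .map((m) => ({ id: m.id, sim: cosine(m.vec, c) }))
    .sort((x, y) => y.sim - x.sim)
    .slice(0, topK)
    .map((m) => m.id);
}
```

Because the second search centers on the middle of the result set rather than the original query, it pulls in memories adjacent to the theme of the results, not just the literal query terms.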

3. Sleep Consolidation

While you sleep, your brain replays and compresses the day's experiences. Fragmented short-term memories consolidate into structured long-term knowledge. Redundant details get pruned.

nan-forget runs a consolidation engine after every 10 saves or 24 hours:

```mermaid
flowchart TB
    A["10+ aging memories\nabout the same topic"] --> B["Cluster by project + type\n+ cosine similarity > 0.8"]
    B --> C["Summarize cluster\ninto 1-2 sentences"]
    C --> D["Save consolidated entry\nwith fresh vector embedding"]
    D --> E["Archive originals"]
```

Five separate memories about your auth setup ("chose JWT," "added refresh tokens," "Clerk for OAuth," "session length is 24h," "added rate limiting") consolidate into one entry that captures the full picture. Originals get archived, not deleted.

Garbage collection runs alongside: dedup catches near-identical memories (cosine > 0.95), expiration archives memories past their expiry date.
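The clustering step in the diagram can be sketched as a greedy pass: a memory joins the first cluster whose seed shares its project and type and clears the similarity threshold. This is my reading of the flowchart, not the project's implementation:

```typescript
interface Memory {
  id: string;
  project: string;
  type: string;
  vec: number[];
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Greedy clustering by project + type + cosine > threshold.
// Clusters of size > 1 are candidates for consolidation.
function clusterMemories(memories: Memory[], threshold = 0.8): Memory[][] {
  const clusters: Memory[][] = [];
  for (const m of memories) {
    const home = clusters.find(
      (c) =>
        c[0].project === m.project &&
        c[0].type === m.type &&
        cosine(c[0].vec, m.vec) > threshold
    );
    if (home) home.push(m);
    else clusters.push([m]);
  }
  return clusters;
}
```

The same routine with a 0.95 threshold doubles as the dedup pass mentioned above: any cluster of size two is a near-duplicate pair.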

All of this is deterministic. Zero LLM calls. No API cost.

The Implementation

Storage

One SQLite file at ~/.nan-forget/memories.db. Vector search via sqlite-vec, a SQLite extension for cosine KNN written in pure C.

nan-forget originally used Qdrant in Docker. That broke when updates recreated the container with a different storage mount — docker-compose.yml pointed to a named volume, the setup script pointed to a bind mount. Same container name, different data directory. I replaced it with sqlite-vec in one session with Claude Code.

```mermaid
flowchart TB
    subgraph Short["Short-Term"]
        MD[".md files — current session only"]
    end
    subgraph Long["Long-Term"]
        DB["SQLite + sqlite-vec\n~/.nan-forget/memories.db"]
    end
    subgraph Auto["Automatic"]
        H["4 Hooks\nPostToolUse · UserPromptSubmit\nSessionEnd · CLAUDE.md directives"]
        C["Consolidation\nClusters aging memories"]
        G["Garbage Collection\nDecay · dedup · expiry"]
    end
    MD -->|"hooks intercept"| H
    H --> DB
    DB --> C
    C --> DB
    DB --> G
```

Structured Memories

Flat text misses context. nan-forget memories carry structured fields:

```json
{
  "content": "Fixed JWT refresh bug — tokens expired silently",
  "type": "decision",
  "problem": "Tokens expired after 1 hour, refresh wasn't triggered",
  "solution": "Added interceptor that checks expiry 5 min before deadline",
  "concepts": ["auth", "jwt", "token-refresh"],
  "files": ["src/auth.ts", "src/middleware.ts"]
}
```

All fields get embedded together into one vector. A search for "authentication bug" finds this memory even though those exact words appear nowhere in the content.
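One way to picture "all fields embedded together" is flattening the structured fields into a single text before it goes to the embedding model. The exact field labels and ordering here are my guess, not nan-forget's format:

```typescript
interface StructuredMemory {
  content: string;
  type: string;
  problem?: string;
  solution?: string;
  concepts?: string[];
  files?: string[];
}

// Join every field into one string for embedding, so a query like
// "authentication bug" can match the problem or concepts even when
// those words never appear in `content`. Empty fields are skipped.
function embeddingText(m: StructuredMemory): string {
  return [
    m.content,
    `type: ${m.type}`,
    m.problem && `problem: ${m.problem}`,
    m.solution && `solution: ${m.solution}`,
    m.concepts?.length ? `concepts: ${m.concepts.join(", ")}` : undefined,
    m.files?.length ? `files: ${m.files.join(", ")}` : undefined,
  ]
    .filter(Boolean)
    .join("\n");
}
```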

Auto-Capture

You never call save. Four hooks capture context at every stage:

| Hook | When | What |
| --- | --- | --- |
| UserPromptSubmit | Every message you send | Searches memory, injects relevant context |
| CLAUDE.md directives | During conversation | Instructs Claude to save decisions as they happen |
| PostToolUse | Claude writes a .md file | Intercepts the write, persists to SQLite |
| SessionEnd | Session closes | Scans transcript for unsaved decisions, saves top 5 |

Cross-LLM Support

Claude Code uses the MCP server. Other tools hit the REST API. CLI for terminal use. Same database, same memories.

```bash
# MCP for Claude Code
npx nan-forget serve

# REST API for Codex, Cursor
npx nan-forget api
curl http://localhost:3456/memories/search?q=auth

# CLI
nan-forget search "what auth system"
```

Try It

```bash
npx nan-forget setup
```

Ollama + SQLite. No Docker, no cloud, no API keys. Free, open source, MIT licensed.

GitHub: NaNMesh/nan-forget

Built with Claude Code — Claude designed the retrieval pipeline, wrote the SQLite migration layer, and generated the test suite.
