
Jason Sosa

Why Flat Files Break as AI Agent Memory (And What We Built Instead)

Your AI coding agent has amnesia.

Every Claude Code session, every Cursor chat, every Windsurf interaction starts from zero. The architectural decision you explained on Monday? Gone. The debugging lesson from Friday? Never happened. The style preferences you've stated twelve times? Say them again.

The common fix is a flat file. Claude Code has CLAUDE.md. Cursor has .cursorrules. They work — for a while.

Then they don't.

Where flat files break

I kept hitting the same five failures as my CLAUDE.md grew past 200 lines:

1. Search is impossible. You're grepping for context that may or may not be there. The phrase you used three weeks ago doesn't match how you'd describe it today.

2. Nothing is auto-captured. Every lesson has to be manually written. You debug a Docker volume mount issue for 30 minutes, and unless you type "remember this," it's gone.

3. It grows forever. No deduplication. No decay. No contradiction detection. "We use REST" from January sits next to "We migrated to WebSockets" from February. The agent picks whichever it attends to first.

4. It's one file per project. That debugging pattern you learned in project A? Invisible in project B. No cross-project learning.

5. No checkpoint/resume. Stop mid-refactor, and there's no structured way to pick up where you left off.

CLAUDE.md is fine for "always use tabs." It breaks when your agent needs to actually learn.

What a real memory pipeline looks like

I built OMEGA to solve this. It runs as an MCP server with 12 tools, and it handles the full memory lifecycle:

Store pipeline (12 sub-phases)

When a memory comes in, it doesn't just get appended:

input → validate → dedup check → evolution check → conflict detection
  → store → entity extraction → auto-relate → contradiction supersession
  → fact splitting → reminder check → feedback

Dedup runs three layers: SHA256 hash (exact match), content hash (near-exact), and embedding cosine similarity with per-type Jaccard thresholds (decisions at 0.80, lessons at 0.85). This catches the agent restating the same decision six times in six paraphrases.
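The three-layer check can be sketched as follows. This is illustrative, not OMEGA's actual code: the real third layer uses embedding cosine similarity, which this self-contained sketch approximates with plain token-set Jaccard against the per-type thresholds the article mentions.

```python
import hashlib

# Per-type thresholds from the article; 0.85 assumed as the fallback
THRESHOLDS = {"decision": 0.80, "lesson": 0.85}

def sha(s: str) -> str:
    return hashlib.sha256(s.encode()).hexdigest()

def normalize(s: str) -> str:
    return " ".join(s.lower().split())

def jaccard(a: str, b: str) -> float:
    """Token-set overlap between two strings, 0.0-1.0."""
    ta, tb = set(normalize(a).split()), set(normalize(b).split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def is_duplicate(text: str, mem_type: str, existing: list[dict]) -> bool:
    for mem in existing:
        if sha(mem["text"]) == sha(text):                       # layer 1: exact hash
            return True
        if sha(normalize(mem["text"])) == sha(normalize(text)):  # layer 2: near-exact
            return True
        # layer 3: similarity against a per-type threshold
        if jaccard(text, mem["text"]) >= THRESHOLDS.get(mem_type, 0.85):
            return True
    return False
```

The layering matters for cost: the two hash checks are O(1) per memory and catch verbatim restatements cheaply, so the expensive similarity comparison only has to arbitrate genuine paraphrases.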

Evolution detects when new content overlaps 55-95% with an existing memory. Instead of creating a duplicate, it appends the new insights to the existing entry.

Conflict detection catches contradictions automatically. "We use JWT" followed by "We switched to session cookies" — the old decision gets superseded, not silently ignored.

Retrieval pipeline

Search isn't keyword matching. It's a five-stage pipeline:

# Simplified from omega/sqlite_store.py
def query(text, limit=10):
    # 1. Vector similarity (bge-small-en-v1.5, 384-dim via sqlite-vec)
    vec_results = cosine_search(embed(text), limit * 3)

    # 2. Full-text search via FTS5
    fts_results = fts5_search(text, limit * 3)

    # 3. Reciprocal rank fusion
    merged = rrf_merge(vec_results, fts_results)

    # 4. Type-weighted scoring (decisions/lessons weighted 2x)
    scored = apply_type_weights(merged)

    # 5. Time-decay (old unaccessed memories rank lower)
    return apply_decay(scored)[:limit]

10 relevant memories out of 500, under 50ms. All local — SQLite + ONNX embeddings, no API keys, no cloud.
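Stage 3 is the interesting one. Reciprocal rank fusion is a standard way to merge rankings from incompatible scorers (cosine distances and FTS5 BM25 scores aren't on the same scale); a minimal version, assuming each input is a list of IDs in rank order:

```python
def rrf_merge(vec_results: list, fts_results: list, k: int = 60) -> list:
    """Reciprocal rank fusion: each ranking contributes 1/(k + rank) per
    document, so items found by BOTH rankers accumulate the highest scores.
    k=60 is the conventional default from the RRF literature."""
    scores: dict = {}
    for results in (vec_results, fts_results):
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

Because only ranks are used, neither scorer can dominate by producing numerically larger scores, which is exactly why it's a good fit for fusing vector and keyword results.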

Forgetting

This is the feature nobody talks about. Memories that aren't accessed lose ranking weight over time. The floor is 0.35 — nothing disappears completely — but stale context stops dominating retrieval.

Preferences and error patterns are exempt from decay. Your "always use early returns" preference never fades. But that one-off debugging note about a dependency version from six months ago? It quietly drops out of relevance.
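A decay function with those properties is easy to state. The 0.35 floor and the exempt types come from the article; the exponential shape and the 30-day half-life are assumptions, since the article doesn't specify the curve.

```python
DECAY_FLOOR = 0.35                      # from the article: nothing drops below this
HALF_LIFE_DAYS = 30.0                   # assumed; the actual half-life isn't stated
EXEMPT_TYPES = {"preference", "error_pattern"}  # never decay

def decay_weight(mem_type: str, days_since_access: float) -> float:
    """Ranking multiplier for a memory: exponential decay by time since
    last access, floored so stale memories fade but never vanish."""
    if mem_type in EXEMPT_TYPES:
        return 1.0
    return max(0.5 ** (days_since_access / HALF_LIFE_DAYS), DECAY_FLOOR)
```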

Auto-capture: the part that actually matters

The explicit omega_store tool is useful, but the real value is what happens without being asked.

OMEGA hooks into your editor's session lifecycle:

  • SessionStart: Surfaces relevant memories from past sessions
  • UserPromptSubmit: Detects decisions and lessons in the conversation and stores them automatically
  • PostToolUse: Surfaces memories relevant to files you're editing
  • Stop: Generates a session summary

When you spend 30 minutes debugging and Claude says "The node_modules volume mount was shadowing the container's node_modules. Fixed by adding an anonymous volume" — OMEGA auto-captures that as a lesson. Next time anyone hits the same Docker issue, it's already there.

Checkpoint/resume

This is what convinced me flat files would never work:

You: "Checkpoint this — I'm halfway through migrating the auth middleware."

# OMEGA saves: task description, files modified, current state,
# remaining steps, and links to all related memories

# ...next day, new session...

You: "Resume the auth middleware task."

# OMEGA restores full context. Claude picks up exactly where you left off.

Task state survives session boundaries. No copy-pasting "here's where I was."
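The checkpoint itself is just structured state. A sketch of the shape the article describes, with field names that are illustrative rather than OMEGA's actual schema:

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class Checkpoint:
    """Fields the article says a checkpoint captures; names are assumed."""
    task: str
    files_modified: list[str] = field(default_factory=list)
    current_state: str = ""
    remaining_steps: list[str] = field(default_factory=list)
    related_memory_ids: list[int] = field(default_factory=list)

def save(cp: Checkpoint, path: str) -> None:
    with open(path, "w") as f:
        json.dump(asdict(cp), f, indent=2)

def resume(path: str) -> Checkpoint:
    with open(path) as f:
        return Checkpoint(**json.load(f))
```

Because the checkpoint links to related memory IDs rather than copying content, resuming can re-run retrieval and surface anything learned since the checkpoint was taken.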

The benchmark

OMEGA scores 95.4% on LongMemEval (ICLR 2025) — a 500-question benchmark testing extraction, multi-session reasoning, temporal understanding, knowledge updates, and preference tracking.

| System       | Score  |
|--------------|--------|
| OMEGA        | 95.4%  |
| Mastra       | 94.87% |
| Emergence    | 86.0%  |
| Zep/Graphiti | 71.2%  |

The full methodology is at omegamax.co/benchmarks.

Try it

pip3 install "omega-memory[server]"
omega setup
omega doctor

That's it. omega setup downloads the embedding model, registers the MCP server, and installs hooks. Start a Claude Code session and say "Remember that we always use early returns." Close the session. Open a new one. Ask "What are my code style preferences?"

It's there.

Works with Claude Code, Cursor, Windsurf, Zed, and any MCP client. Apache-2.0 licensed. No API keys. Everything runs on your machine.

GitHub: omega-memory/omega-memory
Website: omegamax.co
PyPI: omega-memory


I built OMEGA because I was tired of re-explaining the same architectural decisions to an agent that forgot everything between sessions. If you're hitting the same problem, give it a try. And if you find bugs, the issue tracker is open.
