DEV Community

Scott Crawford

Posted on • Originally published at hifriendbot.com

Why Claude Code Forgets Everything (And How to Fix It)

Every Claude Code session starts from zero. No memory of yesterday's work. No awareness of the architectural decisions you explained last week. No recall of the debugging session that took three hours.

You re-explain your tech stack. You re-describe your file structure. You re-state your preferences. Every. Single. Session.

If this sounds familiar, you're not alone. It's the most common complaint in the Claude Code community — and it has real consequences for productivity.

The Problem Has a Name: Context Compaction

Claude Code operates within a 200,000-token context window. That sounds like a lot, but complex coding sessions fill it fast. When you hit roughly 83% utilization (~167K tokens), Claude Code triggers auto-compaction — a lossy, one-way compression of your conversation history.
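For a quick sanity check, here's the arithmetic on that threshold (the 83% figure is community-reported, not an official constant, so treat the numbers as approximate):

```python
# Back-of-the-envelope math for the auto-compaction trigger.
# The 83% threshold is the community-reported value, not a documented constant.
CONTEXT_WINDOW = 200_000      # Claude Code's context window, in tokens
COMPACTION_TRIGGER = 0.83     # reported auto-compaction threshold

trigger_tokens = int(CONTEXT_WINDOW * COMPACTION_TRIGGER)
headroom = CONTEXT_WINDOW - trigger_tokens

print(f"compaction kicks in around {trigger_tokens:,} tokens")
print(f"only {headroom:,} tokens of headroom remain past that point")
```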

Here's what that means in practice: your detailed explanations, resolved debugging sessions, and exploratory discussions get "summarized away." The DoltHub engineering blog put it bluntly:

"Claude Code is definitely dumber after the compaction. It doesn't know what files it was looking at and needs to re-read them."

One GitHub issue (#3841) captured the developer experience perfectly:

"The model completely lost memory of very basic things, such as how to run a python command in a uv environment. I have to tell it literally every time after the auto compact summary."

And this isn't a rare edge case. Search the claude-code issues for "memory," "compaction," or "context loss" and you'll find dozens of reports — many auto-closed by bots despite active community discussion.

CLAUDE.md: The Official Answer (With Hidden Limits)

Anthropic's recommended solution is CLAUDE.md — a markdown file loaded into Claude's system prompt at the start of every session. You can put project instructions, coding conventions, and architectural notes in it.
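If you haven't set one up yet, here's the general shape of a CLAUDE.md (the stack, paths, and commands below are invented for illustration, not a recommendation):

```markdown
# CLAUDE.md (illustrative example)

## Stack
- Next.js 14 (App Router), TypeScript strict mode, Postgres via Prisma

## Conventions
- Prefer server components; use client components only for interactivity
- All database access goes through src/lib/db.ts, never call Prisma directly

## Commands
- Run tests: npm test
- Type check: npx tsc --noEmit
```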

It works... up to a point. Here are the limitations most developers discover the hard way:

The 200-line ceiling

Claude Code's auto-generated MEMORY.md — the file Claude writes its own notes to — has a hard 200-line cap. Content beyond line 200 is silently dropped. No warning. No error. Your carefully curated context just vanishes. (Issue #25006)
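Until that changes, a tiny pre-flight check can at least make the truncation visible. A minimal sketch (the cap value is taken from the issue report; the script and its names are my own):

```python
from pathlib import Path

MEMORY_LINE_CAP = 200  # the silently enforced cap reported in issue #25006

def lines_over_cap(text: str) -> int:
    """How many lines past the cap would be silently dropped (0 if none)."""
    return max(0, len(text.splitlines()) - MEMORY_LINE_CAP)

# Pre-flight check: warn before content silently vanishes.
memory = Path("MEMORY.md")
if memory.exists():
    dropped = lines_over_cap(memory.read_text(encoding="utf-8"))
    if dropped:
        print(f"WARNING: {dropped} line(s) beyond the {MEMORY_LINE_CAP}-line cap will be dropped")
```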

Post-compaction amnesia

CLAUDE.md is supposed to reload after compaction. In theory, the new session re-ingests it. In practice, multiple bug reports document Claude ignoring it entirely:

"After compaction, Claude stops respecting the instructions defined in CLAUDE.md and begins to behave unpredictably."
Issue #4017 (20 upvotes)

One developer caught Claude red-handed (Issue #19471):

When confronted, Claude admitted: "I didn't read CLAUDE.md" and "I skipped it and ran the Glob command directly."

A Medium analysis explained the mechanism: after compression, "CLAUDE.md no longer counts as a rule, but as information, and information can be ignored."

No search, no structure, no intelligence

CLAUDE.md is a flat text file. There's no semantic search. No way to find the right piece of context when you have hundreds of lines of notes. No automatic extraction of important facts from your conversations. It's a sticky note on a PhD thesis.
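For contrast, here's roughly what even the crudest retrieval layer does that a flat file can't. This is a deliberately naive keyword-overlap sketch (real memory tools use embedding-based semantic search; the example notes are invented):

```python
def search_notes(notes: list[str], query: str, top_k: int = 3) -> list[str]:
    """Rank note paragraphs by crude keyword overlap with the query."""
    q_words = set(query.lower().split())
    scored = [(len(q_words & set(n.lower().split())), n) for n in notes]
    scored = [item for item in scored if item[0] > 0]   # drop non-matches
    scored.sort(key=lambda item: -item[0])              # best overlap first
    return [n for _, n in scored[:top_k]]

notes = [
    "Auth tokens are issued by src/lib/auth.ts and expire after 15 minutes",
    "We deploy via GitHub Actions on every push to main",
    "Database migrations live in prisma/migrations and run on deploy",
]
print(search_notes(notes, "auth tokens"))
```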

The hidden token tax

Every message re-sends the full CLAUDE.md as cached context. One developer discovered that cache reads consumed 99.93% of their total token usage — 5.09 billion cache read tokens versus 3.9 million actual I/O tokens. A large CLAUDE.md bleeds your budget silently.
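You can roughly reproduce that ratio yourself. The original report doesn't break down the exact denominator, so the third decimal place may differ, but the order of magnitude is the point:

```python
# Reproducing the reported cache-read share from the two figures above.
cache_read_tokens = 5_090_000_000   # 5.09 billion, as reported
io_tokens = 3_900_000               # 3.9 million, as reported

share = cache_read_tokens / (cache_read_tokens + io_tokens)
print(f"cache reads: {share:.2%} of total tokens")
```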

No Memory Between Sessions: The Real Pain

The compaction problem is bad enough within a single session. But the deeper issue is that Claude Code has zero native memory between sessions.

Every new terminal, every claude invocation — it's a stranger who happens to have access to your codebase. As one developer put it in Issue #14228:

"I'm paying for ONE Claude. I should get ONE Claude. When I talk to Claude on the web, it knows me. When I open Claude Code, it's like meeting a stranger who happens to have the same name."

The frustration is compounded by the price tag. From Issue #14227:

"Paying $200/mo for a product we can't reliably use, with no workaround permitted, is not acceptable."

And from Issue #3508:

"I'm downgrading my account. I'm not going to continue to pay $100/mo for something I have to constantly stop from doing incredibly dumb things."

The community coined a phrase that stuck: "You're paying for a goldfish with a PhD." Brilliant capabilities, zero recall.

What the Community Has Built

The gap between Claude Code's capabilities and its memory has spawned an entire ecosystem of workarounds. Here are the main approaches developers are using:

1. Manual CLAUDE.md Curation

The simplest approach: maintain your own markdown files. Some developers report maintaining 500+ line CLAUDE.md files that they manually update after every session. It works, but it's tedious, doesn't scale, and — as we covered — Claude may ignore it after compaction anyway.

Pros: Zero dependencies, built-in, works offline
Cons: Manual effort, no search, 200-line auto-memory cap, ignored after compaction

2. Local Vector Database Solutions (~29,700 GitHub stars)

The most popular third-party approach. Uses hooks to capture session context, compresses it with AI, and stores it in a local database with vector search.

Pros: Large community, battle-tested, open source
Cons: Requires multiple local dependencies, significant resource usage reported, local-only (no cross-device sync)

3. Other MCP Memory Servers (~1,200 GitHub stars)

MCP servers providing persistent memory with knowledge graph features and autonomous consolidation.

Pros: Knowledge graph structure, semantic search
Cons: Requires multiple local dependencies (Python, ONNX, ChromaDB), stability varies, complex setup

4. Mem0 (~46,000 GitHub stars)

A VC-backed universal AI memory layer with an MCP adapter. Targets the broader AI agent ecosystem (LangGraph, CrewAI, etc.) rather than Claude Code specifically.

Pros: Well-funded, broad ecosystem support, enterprise features
Cons: Not Claude Code-specific, requires additional infrastructure, overkill for individual developers

5. Cloud-Based MCP Memory

A newer approach: move the memory system to the cloud entirely. The MCP server becomes a thin HTTP client with zero local dependencies. Extraction, embedding, and search happen server-side.

CogmemAi takes this approach — semantic search, AI-powered memory extraction, automatic compaction recovery, and project-scoped memories that follow you across machines. One command setup:

npx cogmemai-mcp setup

Pros: Zero local databases, zero RAM issues, cross-device sync, compaction recovery
Cons: Requires network connection, data stored in the cloud (not local-first)

6. Roll Your Own

Some developers build custom solutions with markdown files, SQLite databases, or even Neo4j knowledge graphs. The claude-code repo has multiple issues where developers describe elaborate multi-agent workaround systems they've built just to maintain basic project continuity.
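To give a taste of the roll-your-own route, here's a minimal SQLite sketch (the schema and function names are my own, not taken from any of the linked issues; real setups add embeddings, hooks, and consolidation on top of something like this):

```python
import sqlite3

def open_memory(db_path: str = "memory.db") -> sqlite3.Connection:
    """Open (or create) a tiny per-project notes store."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS notes ("
        " project TEXT, note TEXT, created TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    return conn

def remember(conn: sqlite3.Connection, project: str, note: str) -> None:
    conn.execute("INSERT INTO notes (project, note) VALUES (?, ?)", (project, note))
    conn.commit()

def recall(conn: sqlite3.Connection, project: str, term: str) -> list[str]:
    """Newest-first substring search within one project's notes."""
    rows = conn.execute(
        "SELECT note FROM notes WHERE project = ? AND note LIKE ? ORDER BY created DESC",
        (project, f"%{term}%"),
    )
    return [r[0] for r in rows]

conn = open_memory(":memory:")
remember(conn, "my-app", "uv run python scripts/seed.py seeds the dev database")
print(recall(conn, "my-app", "uv run"))
```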

How to Choose

There's no single right answer. The best solution depends on your priorities:

Priority → Best fit

Zero dependencies → CLAUDE.md (built-in)
Largest community → Local vector database solutions
No local setup → Cloud-based (CogmemAi)
Enterprise / multi-agent → Mem0
Full control → Roll your own
Survives compaction → Solutions with compaction recovery (CogmemAi)

What I'd Love to See From Anthropic

The community has made it clear: persistent memory is the #1 missing feature in Claude Code. The GitHub issues, the Reddit threads, the tens of thousands of stars on community memory tools — it all points in the same direction.

Here's what would make the biggest difference:

  1. Native cross-session memory — like claude.ai's memory system, but for Claude Code
  2. Compaction that asks before destroying — Issue #24201 (17 upvotes) requests exactly this
  3. Reliable CLAUDE.md reload after compaction — fix the documented bugs where instructions get ignored
  4. Remove the silent 200-line cap on MEMORY.md — or at minimum, warn when content is being truncated

Until then, the community solutions are the best we've got. Pick one that fits your workflow, set it up, and stop re-explaining your architecture every morning.


I built CogmemAi after getting tired of re-explaining my tech stack every session. It's one approach among several — try whichever fits your workflow. The important thing is to stop losing context.

Have a different solution that works for you? I'd love to hear about it in the comments.
