Claude Code memory: how to manage context windows across long sessions
If you've used Claude Code for more than 30 minutes on a complex project, you've hit the wall: the context window fills up, Claude starts forgetting earlier decisions, and you get contradictory behavior.
Here's exactly how I manage this — and the CLAUDE.md tricks that make long sessions actually work.
The problem: Claude Code's memory is a sliding window
Claude Code doesn't have persistent memory. It has a context window — a fixed amount of text it can "see" at once. On long sessions:
- Early decisions get pushed out of context
- Claude forgets constraints you established 2 hours ago
- You get contradictory code that breaks earlier architecture
- Worst case: Claude starts refactoring things you explicitly told it not to touch
This isn't a bug. It's how transformers work. But it's completely manageable once you understand it.
The 3-layer memory system I use
Layer 1: CLAUDE.md as permanent memory
Anything that must survive the entire session goes in CLAUDE.md:
# CLAUDE.md
## Architecture decisions (DO NOT change without asking)
- Using PostgreSQL, NOT SQLite — we have concurrent writes
- Auth is JWT, NOT sessions — we're stateless for horizontal scaling
- All API routes under /api/v1/ — versioning is non-negotiable
- NO ORM — raw SQL only, performance is critical
## Active constraints
- Node 18 compatibility required (production is Node 18)
- No new npm dependencies without checking with me first
- TypeScript strict mode is ON
## Current sprint goal
- Building the payment webhook handler
- Do NOT refactor existing code during this sprint
Claude Code loads CLAUDE.md at the start of every session, so these architecture decisions survive session resets.
Layer 2: Session summary file
Every hour (or at major milestones), I ask Claude to write a summary:
> Write a summary of what we've built so far and what decisions we made.
> Save it to SESSION_NOTES.md
Claude writes something like:
# Session notes — 2024-01-15
## What we built
- Stripe webhook handler at /api/v1/webhooks/stripe
- Idempotency key storage in Redis (TTL 24h)
- Retry logic with exponential backoff (3 attempts)
## Decisions made
- Using Redis for idempotency (NOT database) — faster, cheaper
- Webhook secret stored in env var STRIPE_WEBHOOK_SECRET
- Failed webhooks logged to webhook_failures table
## Next steps
- Test with Stripe CLI
- Add monitoring alerts for failure rate > 5%
Then I add to CLAUDE.md:
@SESSION_NOTES.md
The @filename syntax imports the file into context, so the session notes now load alongside CLAUDE.md and act as permanent memory too.
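If you'd rather not edit CLAUDE.md by hand each time, the import step can be scripted. This is only a minimal sketch: the helper name `ensure_import` is hypothetical (it isn't part of Claude Code), and it assumes CLAUDE.md and SESSION_NOTES.md sit in the same directory.

```python
from pathlib import Path


def ensure_import(claude_md: Path, target: str = "SESSION_NOTES.md") -> bool:
    """Append '@<target>' to CLAUDE.md unless that line is already present.

    Returns True if the file was changed, False if the import already existed.
    Hypothetical helper for illustration only.
    """
    import_line = f"@{target}"
    text = claude_md.read_text(encoding="utf-8") if claude_md.exists() else ""

    # Skip the write if the import is already on its own line.
    if import_line in (line.strip() for line in text.splitlines()):
        return False

    # Make sure the import starts on a fresh line.
    separator = "" if text == "" or text.endswith("\n") else "\n"
    claude_md.write_text(f"{text}{separator}{import_line}\n", encoding="utf-8")
    return True
```

Calling `ensure_import(Path("CLAUDE.md"))` twice is safe: the second call finds the line and leaves the file alone.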
Layer 3: Explicit context refresh
When Claude starts acting confused (contradicting earlier decisions), I run:
> Re-read CLAUDE.md and SESSION_NOTES.md. Summarize the current state
> of the project before we continue.
This forces Claude to process the permanent memory files and realign.
The /compact command (built-in reset)
Claude Code has a /compact slash command that compresses the conversation history. Run it when:
- The context bar in the UI shows > 80% full
- Claude starts getting slow to respond (large context = more tokens to process)
- You're switching to a different part of the codebase
/compact
Claude summarizes everything important and resets the window. You lose raw conversation history but keep the compressed summary.
Pro tip: Before running /compact, make sure your CLAUDE.md is up to date. The compact summary + CLAUDE.md = your full context after reset.
The checkpoint pattern
For very long projects (multiple days), I use explicit checkpoints:
# CHECKPOINT.md — Last updated 2024-01-15 14:30
## Project state
- [x] Database schema (complete, DO NOT change)
- [x] Auth system (complete, DO NOT change)
- [x] User API (complete, DO NOT change)
- [ ] Payment system (IN PROGRESS)
- [ ] Email notifications (NOT STARTED)
## Files that are DONE (do not refactor)
- src/models/user.ts
- src/routes/auth.ts
- src/middleware/auth.ts
## Current focus
- src/routes/payments.ts
- src/services/stripe.ts
This prevents the #1 Claude Code failure mode: refactoring finished code while you're working on new features.
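The checkpoint file can also double as a guard. Here's a rough sketch that reads the "Files that are DONE" section and flags any changed file that appears in it. The function names `done_files` and `violations` are made up for this example, and it assumes the section heading and bullet format shown in the CHECKPOINT.md above.

```python
def done_files(checkpoint_text: str) -> set[str]:
    """Collect the paths listed under the '## Files that are DONE' heading."""
    files: set[str] = set()
    in_done_section = False
    for raw in checkpoint_text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            # Any new second-level heading opens or closes the DONE section.
            in_done_section = line.lower().startswith("## files that are done")
        elif in_done_section and line.startswith("- "):
            files.add(line[2:].strip())
    return files


def violations(checkpoint_text: str, changed: list[str]) -> list[str]:
    """Return the changed paths that the checkpoint marks as finished."""
    frozen = done_files(checkpoint_text)
    return sorted(path for path in changed if path in frozen)
```

One way to use it is to pass in the output of `git diff --name-only` after a session: an empty result means none of the finished files were touched.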
Context budget math
Claude's context window is roughly 200K tokens. In a typical Claude Code session:
| Item | Tokens |
|---|---|
| System prompt | ~2,000 |
| CLAUDE.md | ~500-2,000 |
| Conversation so far | grows to ~50,000 |
| Files being edited | ~5,000-20,000 |
| Budget left | ~130,000 |
You have more room than you think. But if you're asking Claude to read large files + have a long conversation + maintain complex state, you'll feel it.
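To make the arithmetic behind that table explicit, here's a small worked example. The numbers are the table's own rough estimates, not measurements, and `remaining_budget` is just an illustrative name.

```python
CONTEXT_WINDOW = 200_000  # rough figure from the article

# (low, high) token estimates, copied from the table above
USAGE = {
    "system prompt": (2_000, 2_000),
    "CLAUDE.md": (500, 2_000),
    "conversation so far": (50_000, 50_000),
    "files being edited": (5_000, 20_000),
}


def remaining_budget(window: int = CONTEXT_WINDOW, usage: dict = USAGE) -> tuple[int, int]:
    """Return (worst_case, best_case) tokens left after the listed items."""
    low_use = sum(low for low, _ in usage.values())     # 57,500
    high_use = sum(high for _, high in usage.values())  # 74,000
    return window - high_use, window - low_use
```

With these inputs, `remaining_budget()` returns `(126000, 142500)`, which brackets the ~130,000 in the table. Even the high-end usage of 74,000 tokens is only 37% of the window.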
Tip: Keep CLAUDE.md under 1,000 tokens. Use @imports for longer reference docs.
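A quick way to sanity-check that limit is a character-count estimate. This sketch assumes roughly 4 characters per token, which is a common rule of thumb for English text and not Claude's actual tokenizer, so treat the result as a ballpark. Both function names are hypothetical.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate; the 4-chars-per-token ratio is an assumption."""
    return math.ceil(len(text) / chars_per_token)


def over_budget(text: str, budget: int = 1_000) -> bool:
    """True if the estimated token count exceeds the budget."""
    return estimate_tokens(text) > budget
```

Under that assumption, a CLAUDE.md of about 4,000 characters sits right at the 1,000-token mark.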
The API angle: flat-rate context
One thing worth knowing: Claude Code's rate limits are based on tokens, not messages. Long context sessions hit limits faster.
If you're using Claude via API (via ANTHROPIC_BASE_URL), you pay per token, and every request re-sends the conversation so far. That's why one session that grows to 200K tokens costs more than ten sessions that each stay under 20K and accomplish the same thing.
This is why context management isn't just about Claude's memory — it's also about cost. Compact aggressively. Use CLAUDE.md for permanent state instead of re-explaining context every time.
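Here's a back-of-the-envelope model of why one long session costs more than several short ones. It's a simplification under stated assumptions: every turn adds the same number of tokens, each request re-sends the full history, and prompt caching, compaction, and output tokens are ignored. The function name is illustrative.

```python
def cumulative_input_tokens(turns: int, tokens_per_turn: int) -> int:
    """Total input tokens sent over a session where turn k re-sends k turns of history.

    Sum over k = 1..turns of (k * tokens_per_turn), i.e. t * n * (n + 1) / 2.
    """
    return tokens_per_turn * turns * (turns + 1) // 2


# One session that grows to 200K tokens: 100 turns at 2,000 tokens each.
one_long = cumulative_input_tokens(turns=100, tokens_per_turn=2_000)

# Ten sessions that each stop at 20K tokens: 10 turns at 2,000 tokens each.
ten_short = 10 * cumulative_input_tokens(turns=10, tokens_per_turn=2_000)
```

In this model `one_long` is 10,100,000 input tokens and `ten_short` is 1,100,000, roughly a 9x gap for the same 100 turns of work. Real numbers will differ, but the quadratic growth is the reason compacting early pays off.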
(I use SimplyLouie for flat-rate Claude API access — $2/month removes the per-token anxiety entirely.)
Quick reference
| Problem | Solution |
|---|---|
| Claude forgets architecture | Put it in CLAUDE.md |
| Claude refactors finished code | Add "DO NOT change" list to CLAUDE.md |
| Context getting full | Run /compact |
| Multi-day project | Use CHECKPOINT.md with @import |
| Claude acting confused | Ask it to re-read CLAUDE.md before continuing |
The CLAUDE.md → SESSION_NOTES.md → CHECKPOINT.md stack has made my Claude Code sessions dramatically more reliable. No more context drift, no more contradictory refactors.
What's your memory management approach? Curious what others have found works.