Claude Code performance: why your AI coding sessions slow down and how to fix it
You've been in a Claude Code session for 2 hours. At first it was fast — instant responses, tight suggestions. Now it's sluggish, verbose, losing track of what you asked 10 minutes ago.
This isn't Claude getting tired. It's context saturation, and it's fixable.
What's actually happening
Claude Code maintains a running context window. Everything you've typed, every file it's read, every command it's run — that's all in memory. When the context fills up:
- Responses get longer and more cautious (it's trying to re-establish what it knows)
- It starts "forgetting" things you discussed earlier in the session
- You get more hedging: "I think we were working on..." instead of direct action
- Latency increases as it processes more tokens per response
The context window for Claude 3.5 Sonnet is 200,000 tokens. Sounds like a lot. A 2-hour coding session will burn through most of it.
The 5 performance fixes
1. Use /compact before you hit the wall
Don't wait until Claude forgets your architecture. Run /compact proactively every 30-40 minutes.
/compact
This tells Claude to summarize the session into a dense memory snapshot and continue from there. You lose some detail but keep the strategic context.
When to compact: When you're about to switch to a new feature area. The previous context isn't needed, and the fresh start will be faster.
2. Pin your architecture in CLAUDE.md
Context that gets loaded at session start never needs to be re-explained during the session.
Create .claude/CLAUDE.md in your repo root:
## Architecture
- Node.js API at /api, Express
- PostgreSQL via pg library, no ORM
- Auth: JWT in httpOnly cookies
- Frontend: vanilla JS, no framework
## Current sprint
- Feature: user notifications
- Pattern: event emitter → notification service → websocket push
## Do not change
- Never add dependencies without asking
- Database schema changes require migration file
Now Claude starts every session knowing your codebase. No re-explanation means no wasted context.
3. Scope your file reads
When you ask Claude to "look at the codebase", it will read everything it can find. A 50-file codebase loaded upfront burns half your context before you've typed a question.
Be specific:
# Instead of: "look at the auth system"
# Say: "look at src/auth/middleware.js and src/auth/tokens.js"
Or use .claudeignore to permanently exclude files:
# .claudeignore
node_modules/
dist/
*.test.js
coverage/
*.log
Test files and build artifacts are context poison — they're large, they change constantly, and Claude doesn't need them to understand your code.
4. Use rate-limit-free sessions for long runs
This is the one most developers don't know about.
Claude Code on the standard API has rate limits. Hit the limit mid-session and your flow gets interrupted — and you lose the remaining context when you restart.
The fix is ANTHROPIC_BASE_URL. Point it at a proxy that removes rate limits:
export ANTHROPIC_BASE_URL=https://simplylouie.com/api
SimplyLouie runs at $2/month — it's a rate-limit-free proxy for Claude. Start a session, run it for 4 hours, finish the feature. No interruptions.
This is especially useful for:
- Long refactoring sessions across many files
- Test generation runs (these burn tokens fast)
- Architecture analysis of large codebases
5. Break big tasks into scoped subtasks
Instead of: "Rewrite the entire auth system to use refresh tokens"
Use sequential scoped sessions:
- "Analyze current auth system and list what needs to change. Stop after the analysis."
- "Implement step 1 from your analysis: update the token generation function."
- "Implement step 2: update the middleware to validate refresh tokens."
Each session is small, fast, and recoverable. If session 2 goes wrong, you haven't corrupted sessions 3 and 4.
Measuring your context usage
Claude Code shows token usage in the status bar. Watch for:
-
< 50%: You're fine, work freely -
50-75%: Consider/compactbefore the next major task switch -
> 75%: Compact now or start fresh for the next feature -
> 90%: You'll see degraded performance — compact immediately
The pattern that works
Session start:
1. CLAUDE.md loads automatically (architecture pinned)
2. .claudeignore active (no junk files)
During session:
3. Scope file reads explicitly
4. /compact at 50% context or every 30min
5. ANTHROPIC_BASE_URL set (no rate limit interruptions)
Session end:
6. Update CLAUDE.md with decisions made this session
The developers who get the most out of Claude Code aren't the ones with the best prompts — they're the ones who manage context deliberately.
Running long Claude Code sessions without rate limit interruptions? SimplyLouie is a $2/month proxy at export ANTHROPIC_BASE_URL=https://simplylouie.com/api — 7-day free trial, no card required to start.
Top comments (0)