Claude Code token limits: exactly when you hit them and how to work around them
If you've been using Claude Code for more than a few sessions, you've hit this: you're deep in a complex task, and suddenly the session slows, starts refusing to add more context, or just hard-stops with a rate limit message.
Here's exactly what's happening and how to work around it.
The two limits you're actually hitting
1. Context window limit (~200k tokens)
Claude Code maintains a running context of your entire conversation. Every file you read, every output you generate, every back-and-forth message — it all accumulates. At around 200k tokens, the model starts degrading: it forgets earlier context, makes contradictory suggestions, and eventually the session becomes unusable.
Signs you're hitting this:
- Claude Code starts contradicting earlier decisions
- It forgets file structures it already analyzed
- /compact starts triggering automatically
- Responses get shorter and less useful
2. Rate limits (requests/minute and tokens/minute)
This is separate from context. Anthropic throttles how many requests and tokens you can process per minute. Heavy Claude Code sessions, especially those with large files, long outputs, or rapid-fire requests, hit this limit and force you to wait.
Signs you're hitting this:
- Error: rate_limit_error
- "Please try again in X seconds" messages
- Session pauses mid-task and resumes slowly
When exactly each limit kicks in
Context window: ~200k tokens total in session
Approximate usage:
- Each file read: ~2k-10k tokens (depends on size)
- Each response: ~500-4k tokens
- Chat history: accumulates every turn
Rate limits (Claude Pro):
- ~45 messages per 5-hour window
- Token-per-minute caps during heavy use
- Resets on a rolling-window basis
Strategy 1: Use /compact aggressively before you need to
Don't wait for Claude Code to auto-compact. Run /compact at natural breakpoints:
# Good times to /compact:
- After finishing a feature branch
- After a successful test suite run
- Before starting a new subtask
- When context hits ~70% full
/compact summarizes the conversation history into a compressed form. You lose granular history but keep the key facts. The tradeoff is almost always worth it.
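/compact also accepts optional focus instructions that tell it what to preserve in the summary. A minimal sketch; the focus text is just an example:
# At a natural breakpoint, inside the session:
/compact

# With focus instructions, so the summary keeps what matters:
/compact Keep the modified file list, schema decisions, and open TODOs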
Strategy 2: Break sessions at feature boundaries
Instead of one massive session that builds a whole feature, plan your sessions around natural boundaries:
# Session 1: Database schema + migrations
# Session 2: API endpoints + validation
# Session 3: Frontend components + tests
# Session 4: Integration + deployment
Start each new session with a fresh context. Use a CLAUDE.md file in your repo to give Claude Code the context it needs without wasting tokens on history:
# Project context
This is a Node.js API. PostgreSQL backend. Express framework.
## Current task
We just finished the database schema (see migrations/).
Now building REST endpoints in src/routes/.
## Decisions already made
- UUIDs for all primary keys
- snake_case column names
- Timestamps on all tables
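Claude Code reads CLAUDE.md from the repo root automatically, so the opening prompt of the new session can stay short. A sketch (the directory name is hypothetical):
# Session 2 kickoff: CLAUDE.md supplies the standing context
cd my-api/
claude "Per the current task in CLAUDE.md, build the REST endpoints in src/routes/, starting with /api/users."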
Strategy 3: Run parallel instances so one rate limit doesn't stall everything
If you're hitting rate limits on complex tasks, the solution isn't to slow down — it's to use multiple Claude Code instances in parallel:
# Terminal 1 — working on auth module
export ANTHROPIC_BASE_URL=https://simplylouie.com
claude --model claude-opus-4-5 "refactor src/auth/ to use JWT tokens"
# Terminal 2 — working on tests simultaneously
export ANTHROPIC_BASE_URL=https://simplylouie.com
claude --model claude-opus-4-5 "write integration tests for the payment flow"
# Terminal 3 — working on docs
export ANTHROPIC_BASE_URL=https://simplylouie.com
claude --model claude-opus-4-5 "update API documentation in docs/"
Each instance runs independently. If one hits a rate limit, the others keep going.
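To keep parallel instances from editing the same working tree, give each one its own git worktree and branch. A sketch; the directory and branch names are illustrative:
# One worktree per instance
git worktree add -b feature/auth ../project-auth
git worktree add -b feature/tests ../project-tests

# Then launch each instance from its own worktree
cd ../project-auth && claude --model claude-opus-4-5 "refactor src/auth/ to use JWT tokens"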
Note on the proxy:
simplylouie.com is an Anthropic-approved API relay that runs at ~$2/month. It routes to Claude with the same models and no per-request billing. Good for keeping costs predictable on heavy parallel workloads.
Strategy 4: Prime the context efficiently at session start
How you start a session determines how long it lasts. Bad start:
"Read all the files in src/ and understand the codebase"
# → Dumps 50k tokens of file content before you've typed a task
Good start:
"I need to add rate limiting to the API endpoints in src/routes/.
The relevant files are routes/api.js and middleware/auth.js.
Read those two files, then tell me what approach makes sense."
# → Targeted context, maybe 5k tokens, leaves room for the actual work
Be surgical. Only load the context you actually need.
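A rough way to budget before loading files: English text and code run around four characters per token, so a character count gives a usable ballpark. A sketch, with hypothetical file sizes:
# ~4 characters per token is a heuristic, not an exact count
wc -c routes/api.js middleware/auth.js
#  12840 routes/api.js       ≈ 3.2k tokens   (hypothetical sizes)
#   7430 middleware/auth.js  ≈ 1.9k tokens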
Strategy 5: Checkpoint pattern for long-running tasks
For tasks that span multiple sessions, build explicit checkpoints:
[Session 1]
You: "We're at checkpoint 1: schema done, migrations written, tests passing."
Claude Code: *acknowledges state*
[Session 2 — new context]
You: "Resuming from checkpoint 1. Schema and migrations are done.
Now we're building endpoints. Start with /api/users."
This gives you reliable session hand-offs without burning tokens on full history replay.
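The checkpoint only survives the session boundary if it lives somewhere the next session will read. One option is appending it to CLAUDE.md; this sketch assumes that convention:
# Record the checkpoint where the next session picks it up automatically
cat >> CLAUDE.md <<'EOF'

## Checkpoint 1 (done)
- Schema finalized, migrations written, tests passing
- Next: REST endpoints, starting with /api/users
EOF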
The math on rate limits
With default Claude Pro limits:
- 45 messages per 5-hour window
- Each complex coding message ≈ 3-8k tokens in + out
- Heavy session: 20-30 real coding exchanges
- You hit the window in 2-3 hours of focused work
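Worked through with those assumptions:
# 45 messages / 300 minutes = one message every ~6.7 minutes, sustained
# Focused coding tends to fire a prompt every 3-4 minutes
# 45 messages x 3.5 min/message ≈ 160 minutes, so the window is spent in ~2.5 hours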
With a proxy relay (like SimplyLouie), you're routing through the API tier instead of the UI tier. Same models, different rate limit structure — often more predictable for automated or parallel workflows.
Summary: the workflow that avoids hitting limits
1. Start session with surgical context (only relevant files)
2. Run /compact at natural breakpoints — don't wait
3. Break large tasks into separate sessions at feature boundaries
4. Use CLAUDE.md to hand off context between sessions
5. For parallelism: multiple instances with isolated git branches
6. For rate limit recovery: wait 5-10 min for per-minute caps to reset (the 5-hour message window refills on a rolling basis); see the retry sketch below
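For unattended recovery, a shell wrapper can retry after a pause. This sketch assumes claude's print mode (-p) exits nonzero on a rate-limit error; it retries on any failure, so treat it as a starting point:
# Retry until the prompt succeeds (note: retries on ANY nonzero exit)
until claude -p "run the test suite and summarize failures"; do
  echo "Hit a limit or error; retrying in 5 minutes..."
  sleep 300
done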
The developers who get the most out of Claude Code aren't the ones who fight the limits — they're the ones who architect their workflow around them.
Running heavy Claude Code sessions and hitting rate limits? SimplyLouie routes to the same Claude models via API at ~$2/month: predictable pricing for parallelized workloads.