
brian austin

Claude Code token limits: exactly when you hit them and how to work around them


If you've been using Claude Code for more than a few sessions, you've hit this: you're deep in a complex task, and suddenly the session slows, starts refusing to add more context, or just hard-stops with a rate limit message.

Here's exactly what's happening and how to work around it.

The two limits you're actually hitting

1. Context window limit (~200k tokens)

Claude Code maintains a running context of your entire conversation. Every file it reads, every response it generates, and every back-and-forth message accumulates there. As you approach the ~200k-token window, the model starts degrading: it forgets earlier context, makes contradictory suggestions, and eventually the session becomes unusable.

Signs you're hitting this:

  • Claude Code starts contradicting earlier decisions
  • It forgets file structures it already analyzed
  • Auto-compaction (/compact) starts triggering on its own
  • Responses get shorter and less useful

2. Rate limits (requests/minute and tokens/minute)

This is separate from context. Anthropic throttles how many tokens you can process per minute. Heavy Claude Code sessions — especially with large files, long outputs, or rapid-fire requests — hit this limit and force you to wait.

Signs you're hitting this:

  • Error: rate_limit_error
  • "Please try again in X seconds"
  • Session pauses mid-task and resumes slowly
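
If you're scripting direct API calls (outside the interactive CLI), a simple exponential-backoff wrapper absorbs transient rate_limit_error responses instead of killing the run. This is a generic sketch, not part of Claude Code itself; the retry() name and the delay values are illustrative:

```shell
# Generic exponential backoff: retry a failing command, doubling the
# wait between attempts. Useful when a scripted API call returns
# rate_limit_error (HTTP 429).
retry() {
  attempt=1 max=5 delay=2
  while ! "$@"; do
    [ "$attempt" -ge "$max" ] && return 1
    echo "attempt $attempt failed; retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$(( attempt + 1 ))
    delay=$(( delay * 2 ))
  done
}

# Hypothetical usage with a direct API call:
# retry curl -sf -X POST https://api.anthropic.com/v1/messages ...
```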

When exactly each limit kicks in

Context window: ~200k tokens total in session
Approx usage:
- Each file read: ~2k-10k tokens (depends on size)
- Each response: ~500-4k tokens
- Chat history: accumulates every turn

Rate limits (Claude Pro):
- ~45 messages per 5-hour window
- Token-per-minute caps during heavy use
- Resets on rolling window basis
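
To gauge how much of the window a file will consume before you ask Claude Code to read it, a common rule of thumb is ~4 characters per token for English text and code. A sketch (the 4-chars figure is a heuristic, not an exact tokenizer):

```shell
# Rough token estimate: ~4 characters per token (heuristic only)
estimate_tokens() {
  chars=$(wc -c < "$1")
  echo $(( chars / 4 ))
}

# Example: a 20,000-character source file is roughly 5k tokens
printf '%20000s' ' ' > /tmp/sample.txt
estimate_tokens /tmp/sample.txt   # → 5000
```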

Strategy 1: Use /compact aggressively before you need to

Don't wait for Claude Code to auto-compact. Run /compact at natural breakpoints:

# Good times to /compact:
- After finishing a feature branch
- After a successful test suite run
- Before starting a new subtask
- When context hits ~70% full

/compact summarizes the conversation history into a compressed form. You lose granular history but keep the key facts. The tradeoff is almost always worth it.

Strategy 2: Break sessions at feature boundaries

Instead of one massive session that builds a whole feature, plan your sessions around natural boundaries:

# Session 1: Database schema + migrations
# Session 2: API endpoints + validation
# Session 3: Frontend components + tests
# Session 4: Integration + deployment

Start each new session with a fresh context. Use a CLAUDE.md file in your repo to give Claude Code the context it needs without wasting tokens on history:

# Project context
This is a Node.js API. PostgreSQL backend. Express framework.

## Current task
We just finished the database schema (see migrations/). 
Now building REST endpoints in src/routes/.

## Decisions already made
- UUIDs for all primary keys
- snake_case column names
- Timestamps on all tables

Strategy 3: Use subagents to parallelize WITHOUT sharing rate limits

If you're hitting rate limits on complex tasks, the solution isn't to slow down — it's to use multiple Claude Code instances in parallel:

# Terminal 1 — working on auth module
export ANTHROPIC_BASE_URL=https://simplylouie.com
claude --model claude-opus-4-5 "refactor src/auth/ to use JWT tokens"

# Terminal 2 — working on tests simultaneously  
export ANTHROPIC_BASE_URL=https://simplylouie.com
claude --model claude-opus-4-5 "write integration tests for the payment flow"

# Terminal 3 — working on docs
export ANTHROPIC_BASE_URL=https://simplylouie.com
claude --model claude-opus-4-5 "update API documentation in docs/"

Each instance runs independently. If one hits a rate limit, the others keep going.
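
One caveat: three instances editing the same working directory will clobber each other. A common pattern is one git worktree (and branch) per instance. A sketch, demonstrated in a throwaway repo — in practice you'd run the worktree commands from your real project root:

```shell
# Throwaway repo for demonstration purposes
repo=$(mktemp -d)
cd "$repo"
git init -q .
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "init"

# One worktree (and branch) per parallel Claude Code instance,
# so parallel edits never collide on the same working directory
git worktree add -q "$repo-auth"  -b feature/auth
git worktree add -q "$repo-tests" -b feature/tests

# Then, in separate terminals:
#   (cd "$repo-auth"  && claude "refactor src/auth/ to use JWT tokens")
#   (cd "$repo-tests" && claude "write integration tests for the payment flow")
```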

Note on the proxy: simplylouie.com is an API relay priced at a flat ~$2/month. It routes to Claude with the same models and no per-request billing, which keeps costs predictable on heavy parallel workloads.

Strategy 4: Prime the context efficiently at session start

How you start a session determines how long it lasts. Bad start:

"Read all the files in src/ and understand the codebase"
# → Dumps 50k tokens of file content before you've typed a task

Good start:

"I need to add rate limiting to the API endpoints in src/routes/.
The relevant files are routes/api.js and middleware/auth.js.
Read those two files, then tell me what approach makes sense."
# → Targeted context, maybe 5k tokens, leaves room for the actual work

Be surgical. Only load the context you actually need.

Strategy 5: Checkpoint pattern for long-running tasks

For tasks that span multiple sessions, build explicit checkpoints:

[Session 1]
You: "We're at checkpoint 1: schema done, migrations written, tests passing."
Claude Code: *acknowledges state*

[Session 2 — new context]
You: "Resuming from checkpoint 1. Schema and migrations are done. 
Now we're building endpoints. Start with /api/users."

This gives you reliable session hand-offs without burning tokens on full history replay.
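
You can also persist the checkpoint in CLAUDE.md so the next session picks it up automatically instead of relying on you to retype it. A sketch — the "## Checkpoint" section name and wording are just a convention, not anything Claude Code requires:

```shell
# Append the checkpoint to CLAUDE.md at the end of a session
cat >> CLAUDE.md <<'EOF'

## Checkpoint 1 (end of session 1)
- Schema done, migrations written, tests passing
- Next: build REST endpoints, starting with /api/users
EOF
```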

The math on rate limits

With default Claude Pro limits:

  • 45 messages per 5-hour window
  • Each complex coding message ≈ 3-8k tokens in + out
  • Heavy session: 20-30 real coding exchanges
  • You hit the window in 2-3 hours of focused work
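
The back-of-envelope arithmetic, with assumed numbers (45 messages per window, ~18 exchanges per hour for focused work):

```shell
# Estimate when the 5-hour window runs out at a given working pace
msgs_per_window=45     # Claude Pro message cap per window
exchanges_per_hour=18  # assumed pace for heavy, focused work
awk -v m="$msgs_per_window" -v e="$exchanges_per_hour" \
  'BEGIN { printf "window exhausted after ~%.1f hours\n", m / e }'
# → window exhausted after ~2.5 hours
```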

With a proxy relay (like SimplyLouie), you're routing through the API tier instead of the UI tier. Same models, different rate limit structure — often more predictable for automated or parallel workflows.

Summary: the workflow that avoids hitting limits

1. Start session with surgical context (only relevant files)
2. Run /compact at natural breakpoints — don't wait
3. Break large tasks into separate sessions at feature boundaries
4. Use CLAUDE.md to hand off context between sessions
5. For parallelism: multiple instances with isolated git branches
6. For rate limit recovery: wait 5-10 min (rolling window resets)

The developers who get the most out of Claude Code aren't the ones who fight the limits — they're the ones who architect their workflow around them.


Running heavy Claude Code sessions and hitting rate limits? SimplyLouie routes to the same Claude models via API at ~$2/month — predictable pricing for parallelized workloads.
