brian austin

Posted on Apr 5

Claude Code performance: why your AI coding sessions slow down and how to fix it

#claudecode #ai #productivity #programming

Claude Code performance: why your AI coding sessions slow down and how to fix it

You've been in a Claude Code session for 2 hours. At first it was fast — instant responses, tight suggestions. Now it's sluggish, verbose, losing track of what you asked 10 minutes ago.

This isn't Claude getting tired. It's context saturation, and it's fixable.

What's actually happening

Claude Code maintains a running context window. Everything you've typed, every file it's read, every command it's run — that's all in memory. When the context fills up:

Responses get longer and more cautious (it's trying to re-establish what it knows)
It starts "forgetting" things you discussed earlier in the session
You get more hedging: "I think we were working on..." instead of direct action
Latency increases as it processes more tokens per response

The context window for Claude 3.5 Sonnet is 200,000 tokens. Sounds like a lot. A 2-hour coding session will burn through most of it.

The 5 performance fixes

1. Use `/compact` before you hit the wall

Don't wait until Claude forgets your architecture. Run /compact proactively every 30-40 minutes.

/compact

This tells Claude to summarize the session into a dense memory snapshot and continue from there. You lose some detail but keep the strategic context.

When to compact: When you're about to switch to a new feature area. The previous context isn't needed, and the fresh start will be faster.

2. Pin your architecture in CLAUDE.md

Context that gets loaded at session start never needs to be re-explained during the session.

Create .claude/CLAUDE.md in your repo root:

## Architecture
- Node.js API at /api, Express
- PostgreSQL via pg library, no ORM
- Auth: JWT in httpOnly cookies
- Frontend: vanilla JS, no framework

## Current sprint
- Feature: user notifications
- Pattern: event emitter → notification service → websocket push

## Do not change
- Never add dependencies without asking
- Database schema changes require migration file

Now Claude starts every session knowing your codebase. No re-explanation means no wasted context.

3. Scope your file reads

When you ask Claude to "look at the codebase", it will read everything it can find. A 50-file codebase loaded upfront burns half your context before you've typed a question.

Be specific:

# Instead of: "look at the auth system"
# Say: "look at src/auth/middleware.js and src/auth/tokens.js"

Or use .claudeignore to permanently exclude files:

# .claudeignore
node_modules/
dist/
*.test.js
coverage/
*.log

Test files and build artifacts are context poison — they're large, they change constantly, and Claude doesn't need them to understand your code.

4. Use rate-limit-free sessions for long runs

This is the one most developers don't know about.

Claude Code on the standard API has rate limits. Hit the limit mid-session and your flow gets interrupted — and you lose the remaining context when you restart.

The fix is ANTHROPIC_BASE_URL. Point it at a proxy that removes rate limits:

export ANTHROPIC_BASE_URL=https://simplylouie.com/api

SimplyLouie runs at $2/month — it's a rate-limit-free proxy for Claude. Start a session, run it for 4 hours, finish the feature. No interruptions.

This is especially useful for:

Long refactoring sessions across many files
Test generation runs (these burn tokens fast)
Architecture analysis of large codebases

5. Break big tasks into scoped subtasks

Instead of: "Rewrite the entire auth system to use refresh tokens"

Use sequential scoped sessions:

"Analyze current auth system and list what needs to change. Stop after the analysis."
"Implement step 1 from your analysis: update the token generation function."
"Implement step 2: update the middleware to validate refresh tokens."

Each session is small, fast, and recoverable. If session 2 goes wrong, you haven't corrupted sessions 3 and 4.

Measuring your context usage

Claude Code shows token usage in the status bar. Watch for:

< 50%: You're fine, work freely
50-75%: Consider /compact before the next major task switch
> 75%: Compact now or start fresh for the next feature
> 90%: You'll see degraded performance — compact immediately

The pattern that works

Session start:
1. CLAUDE.md loads automatically (architecture pinned)
2. .claudeignore active (no junk files)

During session:
3. Scope file reads explicitly
4. /compact at 50% context or every 30min
5. ANTHROPIC_BASE_URL set (no rate limit interruptions)

Session end:
6. Update CLAUDE.md with decisions made this session

The developers who get the most out of Claude Code aren't the ones with the best prompts — they're the ones who manage context deliberately.

Running long Claude Code sessions without rate limit interruptions? SimplyLouie is a $2/month proxy at export ANTHROPIC_BASE_URL=https://simplylouie.com/api — 7-day free trial, no card required to start.

DEV Community

Claude Code performance: why your AI coding sessions slow down and how to fix it

Claude Code performance: why your AI coding sessions slow down and how to fix it

What's actually happening

The 5 performance fixes

1. Use `/compact` before you hit the wall

2. Pin your architecture in CLAUDE.md

3. Scope your file reads

4. Use rate-limit-free sessions for long runs

5. Break big tasks into scoped subtasks

Measuring your context usage

The pattern that works

Top comments (0)

Claude Code performance: why your AI coding sessions slow down and how to fix it

What's actually happening

The 5 performance fixes

1. Use /compact before you hit the wall

2. Pin your architecture in CLAUDE.md

3. Scope your file reads

4. Use rate-limit-free sessions for long runs

5. Break big tasks into scoped subtasks

Measuring your context usage

The pattern that works

1. Use `/compact` before you hit the wall