Claude Code is incredible for development — but the API costs can add up fast. If you're spending $200-800/month on Claude Code, here are 7 practical ways to cut that bill without sacrificing quality.
1. Route Through a Cheaper Endpoint
The simplest optimization: swap your API base URL to a multi-model gateway that offers volume discounts.
```bash
# Add to ~/.zshrc or ~/.bashrc
export ANTHROPIC_BASE_URL="https://futurmix.ai/v1"
export ANTHROPIC_API_KEY="your-gateway-key"
```
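This routes all Claude Code requests through a gateway that charges 10% less than direct Anthropic pricing. Same models, same quality, lower bill. Check the gateway's docs for the exact base URL: Anthropic's SDKs append `/v1/messages` to `ANTHROPIC_BASE_URL`, so Anthropic-compatible endpoints are usually configured without a trailing `/v1`.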
Savings: 10% on every request, zero code changes.
2. Use Haiku for Simple Tasks
Not every task needs Sonnet. Claude Haiku 4.5 costs $1/$5 per 1M input/output tokens vs Sonnet's $3/$15, which makes it 3x cheaper.
Tasks where Haiku performs equally well:
- File exploration and understanding
- Simple refactoring (rename, restructure)
- Test generation from existing patterns
- Documentation updates
- Quick one-line fixes
In Claude Code, you can switch models mid-session with the `/model` command. Use Sonnet for complex architecture decisions and Haiku for everything else.
Savings: 60-70% on simple tasks.
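To see what the 3x gap means for a single request, here's a minimal cost calculator using the per-1M-token prices quoted above. It's a sketch: the token counts are hypothetical, and the dictionary keys are labels for this example, not official model IDs.

```python
# Per-1M-token (input, output) prices in USD, as quoted in this article.
# Check current pricing before relying on these numbers.
PRICES = {
    "haiku-4.5": (1.00, 5.00),
    "sonnet": (3.00, 15.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the list prices above."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# A hypothetical "rename this function" task: 40K tokens in, 2K tokens out.
sonnet = request_cost("sonnet", 40_000, 2_000)
haiku = request_cost("haiku-4.5", 40_000, 2_000)
print(f"Sonnet: ${sonnet:.3f}  Haiku: ${haiku:.3f}  saved: {1 - haiku / sonnet:.0%}")
```

For this made-up request it prints `Sonnet: $0.150  Haiku: $0.050  saved: 67%`, in line with the 60-70% range above.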
3. Write Better CLAUDE.md Files
A well-structured CLAUDE.md file reduces token usage by giving Claude Code the context it needs upfront — instead of letting it explore your codebase to figure things out.
```markdown
# CLAUDE.md
## Project Overview
Express.js API with PostgreSQL, deployed on AWS ECS.
Monorepo: /api (backend), /web (React frontend), /shared (types).
## Architecture
- API routes: /api/src/routes/*.ts
- DB models: /api/src/models/*.ts (Prisma)
- Auth: JWT + refresh tokens, middleware in /api/src/middleware/auth.ts
## Conventions
- Use zod for request validation
- All API responses use ApiResponse<T> wrapper
- Tests: co-located, *.test.ts, use vitest
- Error handling: throw AppError, caught by global handler
## Common Tasks
- Add new endpoint: create route file, add zod schema, register in router
- Add DB migration: npx prisma migrate dev --name <name>
```
This saves Claude from reading 50+ files to understand your project. Fewer tool calls = fewer tokens = lower cost.
Savings: 15-30% reduction in token usage per session.
4. Use /compact Aggressively
Claude Code's /compact command summarizes the conversation and reduces context size. Use it:
- After every major task completion
- When context exceeds 100K tokens
- Before starting a new task in the same session
Without compaction, the context window bloats: every new request resends the entire conversation as input, so you keep paying for tokens Claude has already used. Compact early, compact often.
Savings: 20-40% reduction in ongoing context costs.
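Why does compacting help so much? Because each request resends the whole context, billed input tokens grow roughly quadratically with session length. The sketch below is a simplified model under stated assumptions: a fixed number of tokens added per turn, a fixed-size summary after each compaction, and no prompt caching. All parameter values are hypothetical.

```python
def billed_input_tokens(turns: int, tokens_per_turn: int,
                        compact_every: int = 0, summary_tokens: int = 0) -> int:
    """Total input tokens billed across a session (simplified model).

    Each turn appends `tokens_per_turn` to the context, and the full context
    is sent with every request. If `compact_every` > 0, the context is
    replaced by a `summary_tokens`-sized summary after every `compact_every`
    turns.
    """
    context = 0
    total = 0
    for turn in range(1, turns + 1):
        context += tokens_per_turn
        total += context  # this request pays for the whole context
        if compact_every and turn % compact_every == 0:
            context = summary_tokens  # compaction shrinks history to a summary
    return total


# Hypothetical 40-turn session adding ~5K tokens per turn.
no_compact = billed_input_tokens(40, 5_000)
compacted = billed_input_tokens(40, 5_000, compact_every=10, summary_tokens=3_000)
print(f"without /compact: {no_compact:,}  with /compact: {compacted:,}")
```

With these made-up numbers it prints `without /compact: 4,100,000  with /compact: 1,190,000`. Claude Code also uses prompt caching, which makes resent tokens cheaper, so real dollar savings will be smaller than this raw token reduction.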
5. Cap Agentic Turns with --max-turns
For batch or scripted tasks, run Claude Code in non-interactive print mode (`-p`) and set explicit limits:

```bash
# Limit to 10 agentic turns for simple tasks
claude -p --max-turns 10 "Fix the TypeScript errors in src/utils.ts"
```
This prevents Claude from going down rabbit holes on tasks that should be quick. Without limits, a "fix this one file" task can balloon into a 50-turn exploration.
Savings: Prevents runaway costs on simple tasks.
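If you script these capped runs (say, from a CI job), a small Python wrapper keeps the limit in one place. This is a sketch that assumes the `claude` CLI is installed and on your PATH; `build_claude_command` and `run_capped` are hypothetical helper names, and only the `-p` and `--max-turns` flags shown above are used.

```python
import shlex
import subprocess


def build_claude_command(prompt: str, max_turns: int = 10) -> list[str]:
    """Build a non-interactive (-p) Claude Code command with a turn cap."""
    if max_turns < 1:
        raise ValueError("max_turns must be at least 1")
    return ["claude", "-p", "--max-turns", str(max_turns), prompt]


def run_capped(prompt: str, max_turns: int = 10) -> str:
    """Run the capped command and return what Claude Code prints.

    Needs the `claude` CLI on PATH. With check=True, a non-zero exit
    status raises subprocess.CalledProcessError.
    """
    result = subprocess.run(
        build_claude_command(prompt, max_turns),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Preview the command without running it.
    print(shlex.join(build_claude_command("Fix the TypeScript errors in src/utils.ts")))
```

Passing the arguments as a list rather than a shell string avoids quoting problems with prompts that contain spaces or quotes.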
6. Use DeepSeek for Bulk Operations
For tasks that need volume but not peak quality — like processing hundreds of files, generating boilerplate, or mass-renaming — use a cheaper model.
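```python
from openai import OpenAI

# The gateway exposes an OpenAI-compatible endpoint, so the OpenAI SDK works as-is.
client = OpenAI(
    base_url="https://futurmix.ai/v1",
    api_key="your-key",
)

# DeepSeek V3: $0.27/$1.10 per 1M input/output tokens (over 10x cheaper than Sonnet)
response = client.chat.completions.create(
    model="deepseek-v3",
    messages=[{"role": "user", "content": "Generate a unit test for: ..."}],
)
print(response.choices[0].message.content)
```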
Use Claude for the hard stuff (architecture, complex refactoring), DeepSeek for the repetitive stuff.
Savings: 90% on bulk/repetitive tasks.
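One way to make "Claude for the hard stuff, DeepSeek for the repetitive stuff" a habit is a small routing table. The task categories and model labels below are hypothetical placeholders (use the model IDs your provider or gateway actually exposes), and the prices are the ones quoted in this article.

```python
# Hypothetical task-to-model routing table; model labels are placeholders.
ROUTES = {
    "architecture": "claude-sonnet",
    "complex-refactor": "claude-sonnet",
    "simple-refactor": "claude-haiku",
    "docs": "claude-haiku",
    "boilerplate": "deepseek-v3",
    "bulk-rename": "deepseek-v3",
}

# Per-1M-token (input, output) prices in USD, as quoted in this article.
PRICES = {
    "claude-sonnet": (3.00, 15.00),
    "claude-haiku": (1.00, 5.00),
    "deepseek-v3": (0.27, 1.10),
}


def pick_model(task_kind: str) -> str:
    """Return the model assigned to a task kind; unknown kinds default to Sonnet."""
    return ROUTES.get(task_kind, "claude-sonnet")


def batch_cost(task_kind: str, jobs: int, in_tok: int, out_tok: int) -> float:
    """Estimated USD cost of `jobs` identical requests for one task kind."""
    input_price, output_price = PRICES[pick_model(task_kind)]
    return jobs * (in_tok * input_price + out_tok * output_price) / 1_000_000


# Hypothetical bulk job: 300 requests, 2K tokens in and 1K out each.
print(f"bulk boilerplate: ${batch_cost('boilerplate', 300, 2_000, 1_000):.2f}")
print(f"same volume on Sonnet: ${batch_cost('architecture', 300, 2_000, 1_000):.2f}")
```

This prints $0.49 versus $6.30, roughly the 90% saving claimed above. You could pass `pick_model(kind)` as the `model` argument in the OpenAI-compatible call shown earlier.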
7. Enable Prompt Caching
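If you're making repeated API calls with the same system prompt (common in CI/CD pipelines and automated workflows), Anthropic's prompt caching can reduce input costs by up to 90%: cache reads are billed at 10% of the normal input price, while writing a prompt into the default 5-minute cache costs 25% more than normal input.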
Claude Code handles this automatically for conversation history, but if you're building custom tools on top of the Claude API, you need to mark the cacheable prefix yourself with a `cache_control` breakpoint and keep that prefix identical between calls:
- Put static content first (system prompt, CLAUDE.md content)
- Put dynamic content last (user message, file contents)
Savings: Up to 90% on repeated system prompts.
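Here's a minimal sketch of a cache-friendly Messages API request body: static context first in `system` with a `cache_control` breakpoint, per-call content last in `messages`. The cost helper encodes 5-minute cache pricing (writes at 1.25x, reads at 0.1x the base input price) and assumes a single shared prefix where every call after the first is a cache hit. The model ID, prompt text, and token counts are placeholders.

```python
CACHE_WRITE_MULTIPLIER = 1.25  # 5-minute cache write, relative to base input price
CACHE_READ_MULTIPLIER = 0.10   # cache hit, relative to base input price


def build_cached_request(static_context: str, user_message: str, model: str,
                         max_tokens: int = 1024) -> dict:
    """Messages API body with the static prefix marked as cacheable."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        # Static content first: identical on every call, so it can be cached.
        "system": [
            {
                "type": "text",
                "text": static_context,
                "cache_control": {"type": "ephemeral"},
            }
        ],
        # Dynamic content last: changes per call and is billed normally.
        "messages": [{"role": "user", "content": user_message}],
    }


def input_cost(calls: int, prefix_tokens: int, dynamic_tokens: int,
               price_per_m: float, cached: bool) -> float:
    """Estimated USD input cost for `calls` requests sharing one static prefix."""
    if not cached:
        return calls * (prefix_tokens + dynamic_tokens) * price_per_m / 1_000_000
    first = prefix_tokens * CACHE_WRITE_MULTIPLIER + dynamic_tokens  # cache write
    rest = (calls - 1) * (prefix_tokens * CACHE_READ_MULTIPLIER + dynamic_tokens)
    return (first + rest) * price_per_m / 1_000_000


body = build_cached_request("<system prompt + CLAUDE.md>", "Review this diff: ...",
                            model="your-model-id")
# Hypothetical CI job: 100 calls, 10K-token prefix, 500 dynamic tokens, $3/1M input.
print(f"uncached: ${input_cost(100, 10_000, 500, 3.0, False):.2f}")
print(f"cached:   ${input_cost(100, 10_000, 500, 3.0, True):.4f}")
```

For this made-up job the input bill drops from about $3.15 to about $0.48. With the Anthropic Python SDK you could send the body as `client.messages.create(**body)`. Keep in mind that prompts shorter than a model-specific minimum length are not cached at all.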
The Math
Here's what a typical developer spending $500/month on Claude Code could save:
| Optimization | Monthly Savings |
|---|---|
| Gateway routing (10% off) | $50 |
| Haiku for simple tasks | $75-100 |
| Better CLAUDE.md | $30-50 |
| Regular /compact | $40-60 |
| DeepSeek for bulk tasks | $50-80 |
| Total | $245-340 |
That's a 49-68% reduction in monthly spend.
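The table's totals check out; here's the arithmetic as a quick sketch. The rows are simply added, though, while in practice the savings overlap. As one illustration, the snippet also compounds the 10% gateway discount on the bill left after the other optimizations instead of counting a flat $50.

```python
MONTHLY_SPEND = 500

# (low, high) monthly savings in USD, taken from the table above.
ROWS = {
    "Gateway routing": (50, 50),
    "Haiku for simple tasks": (75, 100),
    "Better CLAUDE.md": (30, 50),
    "Regular /compact": (40, 60),
    "DeepSeek for bulk tasks": (50, 80),
}

low = sum(lo for lo, _ in ROWS.values())
high = sum(hi for _, hi in ROWS.values())
print(f"additive total: ${low}-{high} "
      f"({low / MONTHLY_SPEND:.0%}-{high / MONTHLY_SPEND:.0%})")

# Compounding: apply the 10% gateway discount to whatever bill remains
# after the other optimizations, rather than adding a flat $50.
GATEWAY_DISCOUNT = 0.10
other_low = low - ROWS["Gateway routing"][0]
other_high = high - ROWS["Gateway routing"][1]
compounded_low = other_low + GATEWAY_DISCOUNT * (MONTHLY_SPEND - other_low)
compounded_high = other_high + GATEWAY_DISCOUNT * (MONTHLY_SPEND - other_high)
print(f"compounded total: ${compounded_low:.2f}-{compounded_high:.2f}")
```

It prints `additive total: $245-340 (49%-68%)` and `compounded total: $225.50-311.00`, which works out to roughly 45-62% of the $500 bill.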
TL;DR
- Set `ANTHROPIC_BASE_URL` to a cheaper gateway → instant 10% off
- Use Haiku for simple tasks → 3x cheaper
- Write a good CLAUDE.md → fewer exploration tokens
- Use `/compact` after each task → smaller context
- Set `--max-turns` for simple tasks → prevent runaway costs
- Use DeepSeek for bulk operations → 10x cheaper
- Structure prompts for cache hits → up to 90% off repeated prompts
The developers spending the least on Claude are the ones who use it most strategically — right model for the right task, with the right optimizations.
What's your Claude Code monthly bill? Share your cost-saving tips in the comments.