Cut Claude Code Token Costs by 70%: Practical Optimization Guide
Why Costs Spiral Out of Control
Common reasons Claude Code gets expensive:
- Bloated CLAUDE.md - loaded in every conversation
- Reading entire files - when only specific sections are needed
- Using Opus for everything - Haiku or Sonnet is often sufficient
- Repeating the same research - no caching strategy
Tip 1: Minimize CLAUDE.md
CLAUDE.md is included in every conversation context. Keep it under 50 lines.
Before (300-line CLAUDE.md):
## Environment
OS: Windows 11
Python: 3.13
Node.js: 20.x
...50 lines of setup instructions...
...100 lines of coding conventions...
After (50 lines max):
## Core Rules
- Minimal changes only
- Run tests after changes
- See: `docs/patterns.md`
## Stack
Python 3.13 / Node 20 / Windows 11
Reference detailed docs when needed. Claude reads them on demand.
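To keep the 50-line budget from drifting, a small check can be run before committing. This is a minimal sketch, not part of Claude Code itself: the function name `check_claude_md` is hypothetical, and counting only non-blank lines is an assumption about how the budget should be measured.

```python
MAX_LINES = 50  # budget suggested in this article


def check_claude_md(text: str, max_lines: int = MAX_LINES) -> tuple[int, bool]:
    """Return (non-blank line count, whether the count fits the budget).

    Blank lines are ignored; that is an assumption of this sketch.
    """
    count = sum(1 for line in text.splitlines() if line.strip())
    return count, count <= max_lines


# Example with an inline string standing in for the file contents.
sample = "## Core Rules\n- Minimal changes only\n\n## Stack\nPython 3.13 / Node 20 / Windows 11\n"
line_count, within_budget = check_claude_md(sample)
print(f"{line_count} non-blank lines, within budget: {within_budget}")
```

In practice the text would come from something like `Path("CLAUDE.md").read_text(encoding="utf-8")`, and the check could run in a pre-commit hook.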
Tip 2: Use Models Strategically
| Task | Opus | Sonnet | Haiku |
|---|---|---|---|
| File search | Too expensive | OK | Best |
| Implementation | OK | Best | Often too weak |
| Architecture | Best | OK | No |
| grep/list | No | No | Best |
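Haiku costs a small fraction of Opus per token (roughly 1/15 here; the exact ratio depends on the model versions and current pricing). Route all simple searches to Haiku.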
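The table above can be written down as a simple lookup. This is only a sketch of the routing idea: the task labels, the `pick_model` helper, and the Sonnet fallback are all assumptions for illustration, and the strings are tier names rather than real model IDs.

```python
# "Best" column from the table, keyed by hypothetical task labels.
ROUTING = {
    "file_search": "haiku",
    "grep_list": "haiku",
    "implementation": "sonnet",
    "architecture": "opus",
}


def pick_model(task_type: str) -> str:
    """Return the tier the table marks as best for a task.

    Unknown task types fall back to Sonnet; that default is an
    assumption of this sketch, not a rule from the article.
    """
    return ROUTING.get(task_type, "sonnet")


for task in ("file_search", "implementation", "architecture"):
    print(task, "->", pick_model(task))
```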
Tip 3: Strategic /clear Usage
# Before clearing, save state to memory
"Summarize current progress in memory/progress.md for next session handoff"
# Then clear after each task milestone
/clear
Tip 4: Use Grep/Glob Over Read
Wrong: "Read all files under src/ and find..."
Right: "Search for `UserService` class using Grep"
# Wrong: Read entire 2000-line file
# Right: Read specific lines
read(file_path="src/service.py", offset=150, limit=50)
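The same offset/limit idea can be illustrated outside Claude Code. The sketch below returns only a window of lines without loading the rest into memory; `read_window` is a hypothetical helper, and treating `offset` as a 1-based line number is an assumption here, not a statement about how the Read tool counts.

```python
from itertools import islice
from typing import Iterable


def read_window(lines: Iterable[str], offset: int, limit: int) -> list[str]:
    """Return up to `limit` lines starting at line number `offset`.

    `offset` is treated as 1-based (an assumption of this sketch).
    Works on any iterable of lines, including an open file object.
    """
    start = max(offset - 1, 0)
    return list(islice(lines, start, start + limit))


# Stand-in for a 2000-line file.
fake_file = [f"line {i}\n" for i in range(1, 2001)]
window = read_window(fake_file, offset=150, limit=50)
print(len(window), "lines returned instead of", len(fake_file))
```

With a real file, passing the open file object (`with open(path) as f: read_window(f, 150, 50)`) stops reading once the window is filled.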
Tip 5: Cache Research Results
# memory/api-endpoints.md
## UserAPI endpoints (researched 2026-03-11)
- GET /users/:id → UserController.getById (src/controllers/user.ts:42)
- POST /users → UserController.create (src/controllers/user.ts:67)
Never research the same thing twice.
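The "look in memory first" habit can be sketched as a tiny file-backed cache. Everything here is illustrative: `cached_research`, the `research` callback, and the one-file-per-topic layout are assumptions, not a Claude Code feature.

```python
import tempfile
from pathlib import Path
from typing import Callable


def cached_research(topic: str, research: Callable[[], str], memory_dir: Path) -> str:
    """Return the saved note for `topic`, running `research` only on a miss."""
    note = memory_dir / f"{topic}.md"
    if note.exists():
        # Cache hit: reuse the earlier findings.
        return note.read_text(encoding="utf-8")
    result = research()
    memory_dir.mkdir(parents=True, exist_ok=True)
    note.write_text(result, encoding="utf-8")
    return result


# Demo in a temporary directory so nothing is left on disk.
calls = []


def expensive_lookup() -> str:
    calls.append(1)  # count how often the real research runs
    return "- GET /users/:id → UserController.getById\n"


with tempfile.TemporaryDirectory() as tmp:
    first = cached_research("api-endpoints", expensive_lookup, Path(tmp))
    second = cached_research("api-endpoints", expensive_lookup, Path(tmp))

print("research ran", len(calls), "time(s)")
```

Cached notes go stale as code changes, which is one reason to keep the research date in each note, as in the example above.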
Cost Reduction Summary
| Strategy | Reduction |
|---|---|
| Minimize CLAUDE.md | 20-30% |
| Use Haiku | 15-25% |
| Strategic /clear | 10-15% |
| Grep/Glob first | 10-20% |
| Memory caching | 5-15% |
| Combined total | 60-70% |
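The per-strategy ranges do not simply add up, since each saving applies to whatever cost is left after the others. One way to sanity-check the total is to combine them multiplicatively. This is a rough sketch that assumes the strategies act independently; the article does not say how its total was computed.

```python
from math import prod


def combined_reduction(reductions: list[float]) -> float:
    """Combine fractional savings, assuming each applies to the remaining cost."""
    return 1 - prod(1 - r for r in reductions)


# Low and high ends of each row in the table above.
low = [0.20, 0.15, 0.10, 0.10, 0.05]
high = [0.30, 0.25, 0.15, 0.20, 0.15]

print(f"low end:  {combined_reduction(low):.1%}")
print(f"high end: {combined_reduction(high):.1%}")
```

Under that assumption the combined saving lands at roughly 48% to 70%, so the upper figure in the table matches the high end of every row.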
Summary
Three principles: read less, use cheaper models, cache repeated work. Keeping CLAUDE.md under 50 lines gives immediate results.
This article is an excerpt from the Claude Code Complete Guide (7 chapters), available on note.com.
myouga (@myougatheaxo) - VTuber axolotl. Sharing practical AI development tips.