Cut Claude Code Token Costs by 70%: Practical Optimization Guide
Why Costs Spiral Out of Control
Common reasons Claude Code gets expensive:
- Bloated CLAUDE.md: it is loaded into every conversation
- Reading entire files when only specific sections are needed
- Using Opus for everything when Haiku or Sonnet is often sufficient
- Repeating the same research with no caching strategy
Tip 1: Minimize CLAUDE.md
CLAUDE.md is included in every conversation context. Keep it under 50 lines.
Before (300-line CLAUDE.md):
## Environment
OS: Windows 11
Python: 3.13
Node.js: 20.x
...50 lines of setup instructions...
...100 lines of coding conventions...
After (50 lines max):
## Core Rules
- Minimal changes only
- Run tests after changes
- See: `docs/patterns.md`
## Stack
Python 3.13 / Node 20 / Windows 11
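Move the detailed setup notes and conventions into separate docs and reference them from CLAUDE.md. Claude reads them on demand, so they only cost tokens when a task actually needs them.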
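To make the 50-line budget easy to check, here is a minimal sketch of an audit helper. The function name `audit_context_file` and the 4-characters-per-token ratio are assumptions for illustration; the ratio is a rough heuristic, not Claude's actual tokenizer.

```python
from pathlib import Path

MAX_LINES = 50        # the ceiling suggested in this article
CHARS_PER_TOKEN = 4   # rough heuristic (assumption), not a real tokenizer


def audit_context_file(text: str, max_lines: int = MAX_LINES) -> dict:
    """Report line count, a rough token estimate, and whether the file is over budget."""
    lines = text.splitlines()
    return {
        "lines": len(lines),
        "approx_tokens": len(text) // CHARS_PER_TOKEN,
        "over_budget": len(lines) > max_lines,
    }


if __name__ == "__main__":
    path = Path("CLAUDE.md")
    if path.exists():
        print(audit_context_file(path.read_text(encoding="utf-8")))
```

Running it from the project root prints the counts for the local CLAUDE.md, which is enough to notice when the file starts creeping past the budget again.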
Tip 2: Use Models Strategically
| Task | Opus | Sonnet | Haiku |
|---|---|---|---|
| File search | Too expensive | OK | Best |
| Implementation | OK | Best | Too weak |
| Architecture | Best | OK | No |
| grep/list | No | No | Best |
Haiku costs roughly 1/15 as much as Opus per token; the exact ratio depends on the model versions, so check current pricing. Route simple searches to Haiku.
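The "Best" column of the table can be written down as a small lookup, sketched below. The task-type keys, the `pick_model` name, and the Sonnet fallback for unknown tasks are all hypothetical; the tier strings are plain labels, not real model IDs.

```python
# Tier the table marks as "Best" for each task type.
# Keys and tier labels are illustrative, not real model identifiers.
ROUTING = {
    "file_search": "haiku",
    "grep_list": "haiku",
    "implementation": "sonnet",
    "architecture": "opus",
}


def pick_model(task_type: str) -> str:
    """Return the tier for a task; unknown task types fall back to Sonnet (an assumption)."""
    return ROUTING.get(task_type, "sonnet")


if __name__ == "__main__":
    for task in ("grep_list", "implementation", "architecture"):
        print(task, "->", pick_model(task))
```

Falling back to the middle tier is a judgment call: it avoids paying Opus rates by default while still handling tasks that Haiku might struggle with.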
Tip 3: Strategic /clear Usage
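Each turn re-sends the conversation so far, so a long session costs more with every prompt. Clear the context at task milestones, and save state first:
# Before clearing, save state to memory
"Summarize current progress in memory/progress.md for next session handoff"
# Then, after each task milestone
/clear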
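A back-of-the-envelope sketch shows why clearing at milestones helps. It assumes every turn adds a fixed number of tokens and re-sends the whole history, and it ignores prompt caching and the small cost of the handoff summary, so treat the numbers as illustrative rather than as measurements. The function name is hypothetical.

```python
from typing import Optional


def cumulative_input_tokens(
    turns: int, tokens_per_turn: int, clear_every: Optional[int] = None
) -> int:
    """Total input tokens over a session where each turn re-sends the history so far."""
    total = 0
    history = 0
    for turn in range(1, turns + 1):
        history += tokens_per_turn   # this turn's prompt joins the transcript
        total += history             # the whole transcript is sent as input
        if clear_every and turn % clear_every == 0:
            history = 0              # /clear drops the accumulated transcript
    return total


no_clear = cumulative_input_tokens(20, 2000)
with_clear = cumulative_input_tokens(20, 2000, clear_every=5)
print(no_clear, with_clear)  # 420000 120000
```

Under these simplified assumptions, clearing every five turns sends less than a third of the input tokens of one uninterrupted 20-turn session.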
Tip 4: Use Grep/Glob Over Read
Wrong: "Read all files under src/ and find..."
Right: "Search for `UserService` class using Grep"
# Wrong: Read entire 2000-line file
# Right: Read specific lines
read(file_path="src/service.py", offset=150, limit=50)
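The same offset/limit idea can be sketched in plain Python to show what a targeted read saves. `read_window` is a hypothetical helper, and treating `offset` as a 1-based line number is an assumption of this sketch, not a statement about how the Read tool counts lines.

```python
from itertools import islice


def read_window(path: str, offset: int, limit: int) -> list[str]:
    """Return up to `limit` lines starting at 1-based line `offset`.

    islice streams the file, so lines after the window are never loaded.
    """
    start = max(offset - 1, 0)
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in islice(f, start, start + limit)]


if __name__ == "__main__":
    # Demo on this script itself so the example runs anywhere.
    for line in read_window(__file__, offset=1, limit=3):
        print(line)
```

For a 2000-line file, a 50-line window puts 2.5% of the file into context instead of all of it.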
Tip 5: Cache Research Results
# memory/api-endpoints.md
## UserAPI endpoints (researched 2026-03-11)
- GET /users/:id → UserController.getById (src/controllers/user.ts:42)
- POST /users → UserController.create (src/controllers/user.ts:67)
Never research the same thing twice.
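The caching habit amounts to "check the memory file before researching". A minimal sketch of that pattern follows; `load_or_research`, the `memory` directory layout, and the `research` callback are hypothetical names used only for illustration.

```python
from datetime import date
from pathlib import Path
from typing import Callable


def load_or_research(
    topic: str, research: Callable[[], str], memory_dir: str = "memory"
) -> str:
    """Return cached notes for `topic`, running `research` only on a cache miss."""
    note = Path(memory_dir) / f"{topic}.md"
    if note.exists():
        return note.read_text(encoding="utf-8")  # cache hit: no new research
    findings = research()
    note.parent.mkdir(parents=True, exist_ok=True)
    # Date-stamp the heading so stale notes are easy to spot later.
    body = f"## {topic} (researched {date.today().isoformat()})\n{findings}\n"
    note.write_text(body, encoding="utf-8")
    return body
```

The date stamp mirrors the "researched 2026-03-11" heading above and gives a simple cue for when a note should be refreshed.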
Cost Reduction Summary
| Strategy | Reduction |
|---|---|
| Minimize CLAUDE.md | 20-30% |
| Use Haiku | 15-25% |
| Strategic /clear | 10-15% |
| Grep/Glob first | 10-20% |
| Memory caching | 5-15% |
| Total | 60-70% |
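The per-strategy figures do not have to be added together. As a worked example, if each reduction is read as applying to whatever cost remains after the previous strategy (an assumption; the article does not say how the ranges combine), the total can be computed like this:

```python
from math import prod

# (low, high) reduction ranges from the table, as fractions.
RANGES = {
    "Minimize CLAUDE.md": (0.20, 0.30),
    "Use Haiku": (0.15, 0.25),
    "Strategic /clear": (0.10, 0.15),
    "Grep/Glob first": (0.10, 0.20),
    "Memory caching": (0.05, 0.15),
}


def combined_reduction(fractions) -> float:
    """Compound independent reductions: remaining cost is the product of (1 - f)."""
    return 1 - prod(1 - f for f in fractions)


low = combined_reduction(lo for lo, _ in RANGES.values())
high = combined_reduction(hi for _, hi in RANGES.values())
print(f"{low:.1%} to {high:.1%}")  # 47.7% to 69.7%
```

Under that compounding assumption the upper end lands close to the 70% in the table, while the lower end comes out nearer 48% than 60%.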
Summary
Three principles: read less, use cheaper models, cache repeated work. Keeping CLAUDE.md under 50 lines gives immediate results.
This article is an excerpt from the Claude Code Complete Guide (7 chapters), available on note.com.
myouga (@myougatheaxo) - VTuber axolotl. Sharing practical AI development tips.
Top comments (3)
Practical and specific to Claude Code - good, because the generic "use a cheaper model" advice doesn't apply cleanly to an agentic coding tool where the spend is dominated by context re-sends across a long session. For Claude Code specifically the highest-leverage moves are usually: keep CLAUDE.md / project context lean (it rides along every turn), don't let it re-read files already in context, compact aggressively, and scope each task tightly instead of one mega-session that drags an ever-growing transcript.
The Claude-specific lever worth emphasizing: prompt caching. Claude's cache makes the stable parts of your context (system prompt, project files) dramatically cheaper on repeat, so structuring your session to maximize cache hits is often a bigger win than any single trim. That cache-aware structuring is part of how I keep Moonshift (a multi-agent pipeline that ships a prompt to a deployed SaaS) at ~$3 flat - the stable context is cached, only the deltas are full-price. Really useful guide. Did your 70% lean more on context trimming or on cache utilization? For Claude specifically I find caching is the sleeper win people underuse.
The model routing strategy is spot on. One thing that makes all of these tips way more actionable is having a live token counter running while you work. Knowing your CLAUDE.md is bloated is one thing, but actually watching the token count jump on every prompt because of it makes you trim it immediately. Same with the grep vs read tip -- when you can see 50k tokens burned on a full file read vs 2k on a targeted grep in real time, the habit change is instant. I built a macOS menu bar tool that does exactly this. DM me if you want the link.