DEV Community

myougaTheAxo
myougaTheAxo

Posted on • Originally published at zenn.dev

Cut Claude Code Token Costs by 70%: Practical Optimization Guide

Cut Claude Code Token Costs by 70%: Practical Optimization Guide

Why Costs Spiral Out of Control

Common reasons Claude Code gets expensive:

  1. Bloated CLAUDE.md - loaded in every conversation
  2. Reading entire files - only need specific sections
  3. Using Opus for everything - Haiku/Sonnet often sufficient
  4. Repeating the same research - no caching strategy

Tip 1: Minimize CLAUDE.md

CLAUDE.md is included in every conversation context. Keep it under 50 lines.

Before (300-line CLAUDE.md):

## Environment
OS: Windows 11
Python: 3.13
Node.js: 20.x
...50 lines of setup instructions...
...100 lines of coding conventions...
Enter fullscreen mode Exit fullscreen mode

After (50 lines max):

## Core Rules
- Minimal changes only
- Run tests after changes
- See: `docs/patterns.md`

## Stack
Python 3.13 / Node 20 / Windows 11
Enter fullscreen mode Exit fullscreen mode

Reference detailed docs when needed. Claude reads them on demand.

Tip 2: Use Models Strategically

Task Opus Sonnet Haiku
File search Too expensive OK Best
Implementation OK Best Too slow
Architecture Best OK No
grep/list No No Best

Haiku costs ~1/15 of Opus. Route all simple searches to Haiku.

Tip 3: Strategic /clear Usage

# After each task milestone
/clear

# Before clearing, save state to memory
Enter fullscreen mode Exit fullscreen mode
"Summarize current progress in memory/progress.md for next session handoff"
Enter fullscreen mode Exit fullscreen mode

Tip 4: Use Grep/Glob Over Read

Wrong: "Read all files under src/ and find..."
Right: "Search for `UserService` class using Grep"
Enter fullscreen mode Exit fullscreen mode
# Wrong: Read entire 2000-line file
# Right: Read specific lines
read(file_path="src/service.py", offset=150, limit=50)
Enter fullscreen mode Exit fullscreen mode

Tip 5: Cache Research Results

# memory/api-endpoints.md
## UserAPI endpoints (researched 2026-03-11)
- GET /users/:id → UserController.getById (src/controllers/user.ts:42)
- POST /users → UserController.create (src/controllers/user.ts:67)
Enter fullscreen mode Exit fullscreen mode

Never research the same thing twice.

Cost Reduction Summary

Strategy Reduction
Minimize CLAUDE.md 20-30%
Use Haiku 15-25%
Strategic /clear 10-15%
Grep/Glob first 10-20%
Memory caching 5-15%
Total 60-70%

Summary

Three principles: read less, use cheaper models, cache repeated work. Keeping CLAUDE.md under 50 lines gives immediate results.


This article is an excerpt from the Claude Code Complete Guide (7 chapters), available on note.com.
myouga (@myougatheaxo) - VTuber axolotl. Sharing practical AI development tips.

Top comments (2)

Collapse
 
godnick profile image
Henry Godnick

The model routing strategy is spot on. One thing that makes all of these tips way more actionable is having a live token counter running while you work. Knowing your CLAUDE.md is bloated is one thing, but actually watching the token count jump on every prompt because of it makes you trim it immediately. Same with the grep vs read tip -- when you can see 50k tokens burned on a full file read vs 2k on a targeted grep in real time, the habit change is instant. I built a macOS menu bar tool that does exactly this. DM me if you want the link.

Some comments may only be visible to logged-in visitors. Sign in to view all comments.