DEV Community

jidonglab

10 Habits That Cut My Claude Code Bill in Half

I read 80,000 lines of Claude Code's TypeScript source to understand why the tool behaves the way it does. One chapter I wrote after that analysis was titled "10 Cost-Cutting Habits." These aren't generic LLM tips. They come directly from what the source reveals about how tokens are counted, cached, and burned.

Here's what actually moves the bill.


First: Understand the Token Economy

Claude Code has four layers of token-saving mechanisms built in.

Stage 1 — snipCompact: Removes stale snippets. Lightweight.

Stage 2 — microcompact: Cached transforms and tombstone cleanup. Still lightweight.

Stage 3 — contextCollapse: Parallel summarization of read-only context segments. Medium cost.

Stage 4 — autocompact: Full LLM summarization call. Heavy. Triggers a circuit breaker after 3 failures.

After any compaction, postCompactCleanup re-injects the 5 most recently modified files. The system is smart — but it only helps if you're not actively breaking it.
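To make the escalation idea concrete, here's a rough Python sketch. It is not Claude Code's actual code (which is TypeScript and far more involved). The class and function names, measuring context in messages instead of tokens, and resetting the failure count after a success are all assumptions made for illustration. The only number taken from the source is the breaker tripping after 3 failures.

```python
from typing import Callable

Step = Callable[[list[str]], list[str]]
MAX_FAILURES = 3  # the breaker trips after 3 failures, per the description above


class CircuitBreaker:
    """Stops calling an expensive step once it has failed too often."""

    def __init__(self, max_failures: int = MAX_FAILURES) -> None:
        self.max_failures = max_failures
        self.failures = 0

    @property
    def open(self) -> bool:
        return self.failures >= self.max_failures

    def call(self, step: Step, context: list[str]) -> list[str]:
        if self.open:
            return context  # breaker tripped: skip the heavy step entirely
        try:
            result = step(context)
        except Exception:
            self.failures += 1
            return context  # keep the uncompacted context on failure
        self.failures = 0  # assumption: a success resets the counter
        return result


def compact(context: list[str], budget: int, cheap_steps: list[Step],
            heavy_step: Step, breaker: CircuitBreaker) -> list[str]:
    """Try cheap steps first; only fall back to the heavy step if still over budget."""
    for step in cheap_steps:
        if len(context) <= budget:
            return context
        context = step(context)
    if len(context) > budget:
        context = breaker.call(heavy_step, context)
    return context


def drop_stale(ctx: list[str]) -> list[str]:
    # Hypothetical lightweight stage: remove entries marked as stale.
    return [message for message in ctx if not message.startswith("stale:")]


def failing_summarizer(ctx: list[str]) -> list[str]:
    # Hypothetical heavy stage that always fails, to exercise the breaker.
    raise RuntimeError("summarization failed")


breaker = CircuitBreaker()
context = ["stale: old diff", "user: fix bug", "tool: output", "assistant: done"]
for _ in range(4):
    context = compact(context, budget=2, cheap_steps=[drop_stale],
                      heavy_step=failing_summarizer, breaker=breaker)
print(breaker.open, len(context))
```

After three failed heavy calls the breaker opens, and the fourth pass skips the summarizer without attempting it. The cheap stage still runs every time, which is the point of layering.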

There's also prompt caching. When the same prefix repeats, cache reads cost 10% of the base input price, while cache writes (the first load) cost 125%. So the first call pays a 25% premium on the prefix, and every call after it gets 90% off that prefix. The math rewards stability.
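Here's that arithmetic as a small Python sketch. The 125% and 10% multipliers come from the figures above. The prefix size, call count, and $3 per million tokens price are hypothetical numbers picked to make the example readable, not measurements.

```python
CACHE_WRITE_MULTIPLIER = 1.25  # first load of the prefix
CACHE_READ_MULTIPLIER = 0.10   # each later call that reuses the same prefix


def prefix_cost(prefix_tokens: int, calls: int, price_per_mtok: float, cached: bool) -> float:
    """Dollar cost of sending the same prefix `calls` times."""
    if calls <= 0:
        return 0.0
    base = prefix_tokens / 1_000_000 * price_per_mtok  # cost of one uncached send
    if not cached:
        return base * calls
    # One cache write, then (calls - 1) cache reads.
    return base * (CACHE_WRITE_MULTIPLIER + CACHE_READ_MULTIPLIER * (calls - 1))


# Hypothetical session: 20,000-token prefix, 50 calls, $3 per million input tokens.
uncached = prefix_cost(20_000, 50, 3.0, cached=False)
cached = prefix_cost(20_000, 50, 3.0, cached=True)
print(f"uncached: ${uncached:.2f}, cached: ${cached:.2f}")
# prints "uncached: $3.00, cached: $0.37"
```

Note that a single call is more expensive with caching (1.25x versus 1x). The discount pays off from the second call onward, which is why a prefix that keeps changing is the worst case.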


The 10 Habits

1. Stabilize CLAUDE.md — touch it less than once a week

CLAUDE.md enters the system prompt as part of the stable prefix, and that prefix is what gets cached. Change one sentence and the cache breaks: the next call pays the full write price again, and the discount only returns once the prefix stops changing.

Before: Global CLAUDE.md full of project-specific build commands, updated whenever something changes.

After: Global file holds coding style and commit rules only. Project-specific rules live in .claude/CLAUDE.md. The two caches are independent — editing one doesn't bust the other.

The global file should be under 20 lines. Hold it to a stricter standard than the weekly limit above: modify it less than once a month.
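If you want a nudge rather than relying on memory, a check like the following can run from a shell alias or a pre-commit hook. It's a minimal sketch: the function name is made up, the `~/.claude/CLAUDE.md` location is an assumption about where your global file lives, and "modified within the last 30 days" is only a rough proxy for "less than once a month."

```python
import time
from pathlib import Path
from typing import Optional

MAX_LINES = 20            # size limit suggested above for the global file
MIN_DAYS_UNCHANGED = 30   # rough proxy for "less than once a month"


def check_claude_md(path: Path, now: Optional[float] = None) -> list[str]:
    """Return a list of warnings; an empty list means the file looks stable."""
    if not path.exists():
        return [f"{path} not found"]
    warnings = []
    line_count = len(path.read_text(encoding="utf-8").splitlines())
    if line_count > MAX_LINES:
        warnings.append(f"{line_count} lines (limit {MAX_LINES})")
    current = time.time() if now is None else now
    age_days = (current - path.stat().st_mtime) / 86_400
    if age_days < MIN_DAYS_UNCHANGED:
        warnings.append(f"modified {age_days:.0f} days ago")
    return warnings


if __name__ == "__main__":
    # Assumed location of the global file; adjust to your setup.
    for warning in check_claude_md(Path.home() / ".claude" / "CLAUDE.md"):
        print("warning:", warning)
```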

2. Front-load context into the first message

Prompt caching works on prefixes. If you put goal, constraints, and verification criteria all in the first message, the whole block lands in cache. Adding things piecemeal — "oh and also don't touch auth.ts" two turns later — puts those additions outside the cached prefix. The discount doesn't apply.
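One way to build the habit is to fill in a fixed template before you start a session. The sketch below is only a convenience helper: the function name and section labels are made up for illustration, and nothing about it is required by Claude Code.

```python
def build_first_message(goal: str, constraints: list[str], verification: list[str]) -> str:
    """Assemble goal, constraints, and verification criteria into one opening prompt."""
    sections = [
        ("Goal", [goal]),
        ("Constraints", constraints),
        ("Verification", verification),
    ]
    lines: list[str] = []
    for title, items in sections:
        lines.append(f"{title}:")
        lines.extend(f"- {item}" for item in items)
        lines.append("")  # blank line between sections
    return "\n".join(lines).rstrip()


# Hypothetical task, reusing the auth.ts constraint from the example above.
message = build_first_message(
    goal="Add email validation to the signup form",
    constraints=["Do not touch auth.ts", "No new dependencies"],
    verification=["Existing tests pass", "Invalid emails show an inline error"],
)
print(message)
```

The constraint that would otherwise arrive two turns late is now part of the very first block.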

One more thing: images bust the cache for everything after them. If you need to attach a screenshot, put it at the end of your prompt, never in the middle.

3. Don't put examples in CLAUDE.md

Long "do it like this" example blocks in CLAUDE.md inflate input tokens on every single turn. Rules belong in CLAUDE.md. Examples belong in your first prompt, once, for the session they're relevant to.

Before: 50-line example in global CLAUDE.md showing preferred component structure.

After: 5-line rule describing the structure. Example added to the first prompt when building a new component.

4. Narrow Read scope with offset and limit

The built-in Read tool accepts offset and limit parameters. Using them matters. Reading an entire 2,000-line file dumps all of it into context. Most of the time you need 30 lines around a specific function.

```
# Instead of reading the whole file:
Read src/components/Form.tsx

# Read only what you need:
Read src/components/Form.tsx offset=120 limit=50
```
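To see how much a narrow read saves, here's a small Python sketch of what an offset/limit read amounts to. It is not the Read tool's implementation. The helper name is made up, and treating `offset` as a 1-based line number is an assumption.

```python
import tempfile
from itertools import islice
from pathlib import Path


def read_slice(path: Path, offset: int, limit: int) -> list[str]:
    """Return up to `limit` lines starting at 1-based line number `offset`."""
    start = max(offset - 1, 0)
    with path.open(encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in islice(handle, start, start + limit)]


# Build a throwaway 2,000-line file to compare against.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "Form.tsx"
    sample.write_text("\n".join(f"line {n}" for n in range(1, 2001)), encoding="utf-8")
    window = read_slice(sample, offset=120, limit=50)

print(window[0], "...", window[-1], f"({len(window)} of 2000 lines)")
# prints "line 120 ... line 169 (50 of 2000 lines)"
```

Fifty lines instead of two thousand is 2.5% of the file in context, for the same answer most of the time.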

5. Use the Grep tool — not bash grep

The built-in Grep tool returns structured results. Running grep or rg through Bash returns raw text output that eats into context. Same search, different token cost.

The same principle applies to Glob vs find. Use the dedicated tools. They were built to be token-efficient in ways the raw bash equivalents aren't.

6. Run /compact manually — right after finishing a feature

Auto-compaction is unpredictable. You don't control when Stage 4 (autocompact) fires an LLM summarization call. Manual /compact, run immediately after a feature is complete, is more predictable and keeps context cleaner going into the next task.

Build it into your workflow as a natural checkpoint: finish feature → run /compact → start next task.

7. Never paste the entire error log

A 500-line stack trace makes the model miss the actual signal. The model processes the whole thing — all 500 lines land in context — but the relevant information was in the first 10.

Before: Paste the full Rails error log (200 lines).

After: Paste the exception line and the 5–10 lines immediately following. Add: "Full log at /tmp/error.log if needed."

The model can request the full log if it genuinely needs it. Most of the time, it doesn't.
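Trimming can be automated. The sketch below keeps the first line that looks like an exception plus a few lines after it. The regex is a crude heuristic (it only looks for "Error" or "Exception" and will miss other naming schemes), the function name is made up, and the sample log is invented for the demo.

```python
import re

# Heuristic only: matches names like NoMethodError or RuntimeException.
EXCEPTION_PATTERN = re.compile(r"(Error|Exception)\b")


def trim_log(log_text: str, following: int = 10) -> str:
    """Keep the first line that names an error plus the `following` lines after it."""
    lines = log_text.splitlines()
    for index, line in enumerate(lines):
        if EXCEPTION_PATTERN.search(line):
            return "\n".join(lines[index : index + 1 + following])
    # No match: fall back to the head of the log.
    return "\n".join(lines[: following + 1])


# Invented 200-plus-line log: two request lines, one exception, 199 stack frames.
log = "\n".join(
    [
        "Started GET /users/42",
        "Processing by UsersController#show",
        "NoMethodError (undefined method `name' for nil:NilClass):",
    ]
    + [f"  app/frame_{n}.rb:{n}" for n in range(1, 200)]
)

snippet = trim_log(log, following=5)
print(snippet)
print("Full log at /tmp/error.log if needed.")
```

The snippet is 6 lines instead of 202, and the pointer to the full log is still there if the model asks for it.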

8. Use TaskCreate instead of managing todos in prompts

Prompts that contain to-do lists ("done: A, B; still todo: C, D") put task state into context and keep it there. Every subsequent turn carries that state as tokens.

TaskCreate and TaskUpdate manage task state in a separate store — completely outside the context window. The context stays clean. Tasks don't grow with every turn.

9. Kill failing sessions early

If the model goes off the rails twice in one session, type /clear. Continuing with a contaminated context burns money and rarely recovers cleanly. The temptation is to try one more prompt to steer it back. Resist it.

Start fresh with: "just finished X, the issue was Y, starting a clean session."

10. Check /cost a few times a day

You can't find leaks if you don't know where money is going. Running /cost a few times a day takes five seconds each time and builds the habit of noticing when a session got expensive.

The first time you catch a 40-token session that cost $0.80 because the cache was cold, you'll start caring about prefix stability.


The Pattern

Look at all ten habits and one theme emerges: the system has four layers of token-saving built in, but users break them constantly by changing CLAUDE.md frequently, pasting massive inputs, and running long explorations in the main thread.

The habits are mostly about not fighting the system.


This post is part of Claude Code: 80K Lines Dissected — a 14-chapter ebook reverse-engineered from Claude Code's TypeScript source.

📖 Read the first 4 chapters free
📥 Get the full ebook ($9+)
