What We Are Building
By the end of this workshop, you will have a fully configured cost-controlled Claude Code workflow that includes:
- A model-switching strategy you can use in every session
- Cache-optimized session habits that cut input costs by up to 83%
- A real-time monitoring routine so you never get surprised by a bill
- A historical tracking setup using ccusage for weekly spend audits
I use this exact system across every project I build. It keeps my monthly AI spend in the $100–$150 range while shipping production code daily. Let me show you how to set it up from scratch.
Prerequisites
- Claude Code CLI installed and authenticated (installation docs)
- An active Anthropic API plan or a Max subscription
- A terminal you are comfortable working in
- About 30 minutes of focused time
No prior cost optimization experience needed. We will start from zero.
Step 1: Understand What You Are Paying For
Before we optimize anything, you need a mental model of the token economy. Every Claude Code interaction costs tokens across three dimensions:
| Dimension | What It Includes | Relative Cost |
|---|---|---|
| Input tokens | Your prompt, system instructions, loaded files, conversation history | Base rate |
| Output tokens | Claude's responses, generated code, explanations | 3–5x more expensive than input |
| Cache reads | Previously cached context reused in the same session | ~90% cheaper than fresh input |
Here is the pattern I use in every project: I think of output tokens as the expensive resource and cache reads as my discount lever. Everything we build in this workshop targets those two numbers.
For concrete pricing on Sonnet 4.5, the model you will use most:
Input tokens: $3.00 per million
Output tokens: $15.00 per million
Cached input reads: $0.30 per million
That cached read price is one-tenth the fresh input rate. This is the single most important number in this entire tutorial.
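To make the rates concrete, here is a small Python sketch that estimates a session's dollar cost from its raw token counts, with the Sonnet 4.5 prices above hardcoded. Feeding it the example readout from Step 4 reproduces that session's $0.34 cost:

```python
# Sonnet 4.5 rates from above, in dollars per million tokens
INPUT_RATE = 3.00
OUTPUT_RATE = 15.00
CACHE_READ_RATE = 0.30

def session_cost(input_tokens: int, output_tokens: int, cache_read_tokens: int) -> float:
    """Estimate the dollar cost of a session from its raw token counts."""
    cost = (
        input_tokens * INPUT_RATE
        + output_tokens * OUTPUT_RATE
        + cache_read_tokens * CACHE_READ_RATE
    ) / 1_000_000
    return round(cost, 2)

# The example readout from Step 4: 45,231 input, 12,847 output, 38,102 cache reads
print(session_cost(45_231, 12_847, 38_102))  # → 0.34
```

Note how the output tokens dominate: 12,847 output tokens cost more than 45,231 input tokens at these rates.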
Step 2: Set Up Your Model-Switching Workflow
Here is the minimal setup to get this working. Claude Code supports switching models mid-session with the /model command. The trick is building a habit around when to switch.
Create this mental mapping and keep it somewhere visible — I tape it next to my monitor:
┌─────────────────────────────────────┬──────────┬───────────────┐
│ Task Type │ Model │ Cost per Task │
├─────────────────────────────────────┼──────────┼───────────────┤
│ Architecture decisions, debugging │ Opus │ $0.30–$1.20 │
│ complex race conditions │ │ │
├─────────────────────────────────────┼──────────┼───────────────┤
│ Feature work, code review, docs, │ Sonnet │ $0.04–$0.25 │
│ daily development │ │ │
├─────────────────────────────────────┼──────────┼───────────────┤
│ Test generation, formatting, commit │ Haiku │ $0.002–$0.04 │
│ messages, bulk renames │ │ │
└─────────────────────────────────────┴──────────┴───────────────┘
Now here is the actual workflow in your terminal:
# Start your session — Sonnet is your daily driver
/model sonnet
# You hit a gnarly bug involving async race conditions
# Upgrade BEFORE you start, not after you have already spent tokens
/model opus
# Work through the complex problem...
# Once resolved, downshift immediately
/model sonnet
# Time to generate tests for what you just built? Drop down
/model haiku
# Tests are written. Back to feature work
/model sonnet
Think of it like driving: you do not stay in first gear on the highway, and you do not need sixth gear in a parking lot.
The key discipline: switch before you start the task, not after you realize you overspent. This single habit is where most of the savings come from.
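If you prefer the mapping in code rather than taped to a monitor, here is a sketch of the table as a lookup. To be clear, Claude Code ships no such helper and the task category names are my own; this just prints the /model command to run for a given task type:

```python
# Hypothetical task-to-model lookup mirroring the table above.
# Category names are my own invention, not a Claude Code feature.
MODEL_FOR_TASK = {
    "architecture": "opus",
    "complex-debugging": "opus",
    "feature": "sonnet",
    "code-review": "sonnet",
    "docs": "sonnet",
    "tests": "haiku",
    "formatting": "haiku",
    "commit-message": "haiku",
}

def suggest_model(task: str) -> str:
    # Default to sonnet, the daily driver, when a task type is unrecognized
    return MODEL_FOR_TASK.get(task, "sonnet")

print(f"/model {suggest_model('tests')}")         # → /model haiku
print(f"/model {suggest_model('architecture')}")  # → /model opus
```

Defaulting the fallback to Sonnet encodes the same bias as the workflow above: when in doubt, use the middle gear.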
Step 3: Optimize Your Sessions for Cache Hits
This is the step most developers skip entirely, and it is where I see the biggest waste. Let me show you the difference caching makes with real numbers.
Say you are working on a file with 500 lines of code (~2,000 tokens of context):
WITHOUT cache optimization (20 messages in a session):
Every message re-sends full context: 20 × 2,000 = 40,000 input tokens
Cost for context alone: ~$0.12
WITH cache optimization (same 20 messages):
First message: 2,000 tokens at full price
Next 19 messages: 38,000 tokens at cache price (90% discount)
Cost for context alone: ~$0.02
That is an 83% reduction on one file. Now multiply that across every file in your working session.
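The arithmetic above is simple enough to sketch in a few lines of Python, using the Sonnet rates from Step 1 and assuming (as the example does) that after the first message every re-send of the context hits the cache:

```python
INPUT_RATE = 3.00        # $/M tokens, Sonnet 4.5 fresh input
CACHE_READ_RATE = 0.30   # $/M tokens, cached reads

def context_cost(messages: int, context_tokens: int, cached: bool) -> float:
    """Dollar cost of re-sending a fixed context across a session's messages."""
    if not cached:
        # Every message re-sends the full context at the fresh-input rate
        total = messages * context_tokens * INPUT_RATE
    else:
        # First message pays full price; the remaining ones hit the cache
        total = (context_tokens * INPUT_RATE
                 + (messages - 1) * context_tokens * CACHE_READ_RATE)
    return total / 1_000_000

cold = context_cost(20, 2_000, cached=False)
warm = context_cost(20, 2_000, cached=True)
print(f"${cold:.2f} vs ${warm:.2f}")  # → $0.12 vs $0.02
```

Swap in your own file sizes and message counts to see what a real session of yours would save.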
Here are the four rules I follow in every session:
Rule 1: Work in Longer Sessions
Every new Claude Code session starts with a cold cache. If you are bouncing in and out with quick questions, you pay full input price every single time.
# BAD: Five 5-minute sessions (5 cold cache starts)
# GOOD: One 30-minute focused session (1 cold cache start)
Aim for 30–60 minute focused blocks.
Rule 2: Front-Load Your Context
Load all relevant files at the beginning of a session rather than dripping them in one at a time. This establishes a stable cached prefix.
# Start your session by referencing the key files upfront:
"I'm working on src/auth/login.ts, src/auth/middleware.ts,
and src/types/user.ts. Here's what I need to implement..."
Rule 3: Keep Your Context Stable
Cache invalidation happens when the prefix of your conversation changes. Avoid rearranging which files are loaded or rewriting instructions mid-session.
Rule 4: Avoid Unnecessary Resets
The /clear command wipes your cache. Use it only when you are genuinely switching to unrelated work — not as a habit between related tasks.
Step 4: Build a Real-Time Monitoring Habit
Claude Code displays token usage after every interaction. Here is what the readout looks like:
Session tokens: Input: 45,231 | Output: 12,847 | Cache Read: 38,102
Session cost: $0.34
The docs do not mention this, but the number you actually want to track is the Cache Read to Input ratio. Here is how to read it:
Cache Read: 38,102 ÷ Total Input: 45,231 = 84% cache hit rate ✓ GOOD
Cache Read: 4,200 ÷ Total Input: 45,231 = 9% cache hit rate ✗ BAD
In a well-optimized session, cache reads should represent 60–80% of your total input volume. If you are below 40%, revisit Step 3 — you are likely starting too many short sessions or resetting context unnecessarily.
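The ratio check is trivial to express in code. Here is a sketch using the thresholds from this section (below 40% is bad, 60% and up is good; the verdict strings are my own):

```python
def cache_hit_rate(cache_read: int, total_input: int) -> float:
    """Cache reads as a share of total input volume, per the readout above."""
    return cache_read / total_input

def verdict(rate: float) -> str:
    # Thresholds from this section: 60-80% is well optimized, under 40% needs work
    if rate >= 0.60:
        return "GOOD"
    if rate >= 0.40:
        return "OK, room to improve"
    return "BAD, revisit Step 3"

rate = cache_hit_rate(38_102, 45_231)
print(f"{rate:.0%} cache hit rate: {verdict(rate)}")  # → 84% cache hit rate: GOOD
```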
Get in the habit of glancing at these numbers every few interactions. It takes two seconds and it is the same discipline as checking build times or test coverage.
Step 5: Set Up Historical Tracking With ccusage
Real-time monitoring handles tactical decisions. For strategic cost management, you need historical data. This is where ccusage comes in.
Install it:
npm install -g ccusage
Run it against your session history:
ccusage
You will get output like this:
Weekly Summary (2026-02-17 to 2026-02-23):
──────────────────────────────────────────
Total cost: $42.17
Sessions: 34
Avg cost/session: $1.24
By model:
Opus: $28.40 (67.3%) — 8 sessions
Sonnet: $12.10 (28.7%) — 21 sessions
Haiku: $1.67 (4.0%) — 5 sessions
Highest cost session: $8.72 (Opus, Feb 19, "auth refactor")
Here is the gotcha that will save you hours: almost every time I audit a developer's spend, 2–3 Opus sessions account for the majority of the weekly bill. The fix is not avoiding Opus — it is being intentional about when and why you use it.
Set a calendar reminder to run ccusage every Monday morning. Five minutes of review saves real money.
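ccusage does all of this aggregation for you, but the audit logic itself is just a group-by and a max. If you ever want to run the same analysis over your own records, here is a sketch; the session dictionaries and their field names are illustrative, not ccusage's actual schema:

```python
from collections import defaultdict

# Hypothetical session records; in practice ccusage reads these from your
# Claude Code history. Field names here are my own, not ccusage's schema.
sessions = [
    {"model": "opus",   "cost": 8.72, "label": "auth refactor"},
    {"model": "opus",   "cost": 6.10, "label": "race condition hunt"},
    {"model": "sonnet", "cost": 0.85, "label": "feature work"},
    {"model": "haiku",  "cost": 0.03, "label": "commit messages"},
]

# Group total spend by model, like the "By model" section of the report
by_model = defaultdict(float)
for s in sessions:
    by_model[s["model"]] += s["cost"]

total = sum(by_model.values())
for model, cost in sorted(by_model.items(), key=lambda kv: -kv[1]):
    print(f"{model:>6}: ${cost:.2f} ({cost / total:.1%})")

# The single most useful line in the audit: your most expensive session
worst = max(sessions, key=lambda s: s["cost"])
print(f"Highest cost session: ${worst['cost']:.2f} ({worst['label']})")
```

Even with this tiny fake dataset, the pattern from the gotcha above shows up: two Opus sessions dominate the total.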
Step 6: Set Your Monthly Budget Ceiling
Pick a tier based on your current workload:
┌────────────┬─────────────┬─────────────────────────────────────┐
│ Tier │ Budget │ Model Mix │
├────────────┼─────────────┼─────────────────────────────────────┤
│ Lean │ $50–$75/mo │ 80% Haiku, 15% Sonnet, 5% Opus │
│ Balanced │ $100–$150 │ 15% Haiku, 65% Sonnet, 20% Opus │
│ Intensive │ $200–$300 │ 10% Haiku, 50% Sonnet, 40% Opus │
└────────────┴─────────────┴─────────────────────────────────────┘
Most solo founders doing active product development land in the Balanced tier. If you are consistently over $150/month, audit your Opus usage first — that is almost always where the overrun lives.
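To sanity-check which tier you fall in, you can project a month's spend from your session count and model mix. The per-session averages below are assumptions I derived from the ccusage example in Step 5 ($28.40 over 8 Opus sessions, and so on), not official figures:

```python
# Assumed average cost per session by model, derived from the ccusage
# example in Step 5. Replace with your own ccusage numbers.
AVG_SESSION_COST = {"opus": 3.55, "sonnet": 0.58, "haiku": 0.33}

def monthly_projection(sessions_per_month: int, mix: dict) -> float:
    """Project monthly spend; mix maps model -> share of sessions (sums to 1.0)."""
    return sum(sessions_per_month * share * AVG_SESSION_COST[model]
               for model, share in mix.items())

# Balanced tier mix from the table above, at ~150 sessions/month
balanced = {"haiku": 0.15, "sonnet": 0.65, "opus": 0.20}
print(f"${monthly_projection(150, balanced):.2f}")
```

At these assumed averages the Balanced mix lands just over $150/month, and the Opus share alone accounts for more than half of it, which is exactly why the audit advice above starts there.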
Gotchas and Common Mistakes
Here is the section that will save you from the most common pitfalls I see:
1. Using Opus for everything out of habit.
This is the $600 mistake. A developer I mentored ran Opus exclusively for three weeks. Over 70% of those tasks — test generation, formatting, commit messages — could have been handled by Haiku at 1/20th the cost.
2. Vague prompts that generate bloated output.
Output tokens are 5x more expensive than input. Vague prompts produce long responses because Claude covers multiple interpretations. Compare:
# Vague — generates ~500 tokens of output
"Help me improve this function"
# Precise — generates ~150 tokens of output
"Add input validation for null and empty string to parseConfig,
throw ConfigError with descriptive messages"
That is a 3x cost difference for the same result.
3. Fixing issues one at a time in separate prompts.
If you have five issues in the same file, batch them into one request. Five individual prompts pay for the file context five times. One batched prompt pays once.
4. Using /clear reflexively between related tasks.
This wipes your cache. Only reset when you are genuinely switching to unrelated work.
5. Never checking the usage readout.
The numbers are right there in your terminal after every interaction. Two seconds of attention prevents end-of-month surprises.
6. Forgetting to downshift after a complex task.
You switch to Opus for a hard debugging session — great. But then you stay on Opus for the next hour of routine work. Set yourself a mental trigger: problem solved → switch back to Sonnet.
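Mistake 3 is worth putting in numbers. Here is a sketch of the worst case, assuming a 2,000-token file re-sent as fresh Sonnet input with each prompt (cache discounts, covered in Step 3, would shrink the gap but not close it):

```python
INPUT_RATE = 3.00  # $/M tokens, Sonnet 4.5 fresh input

def fix_cost(prompts: int, file_tokens: int = 2_000) -> float:
    # Worst case: each separate prompt re-sends the full file as fresh context
    return prompts * file_tokens * INPUT_RATE / 1_000_000

separate = fix_cost(5)  # five issues, five prompts
batched = fix_cost(1)   # same five issues, one batched prompt
print(f"${separate:.4f} vs ${batched:.4f}")  # → $0.0300 vs $0.0060
```

Pennies per file, but it is the same 5x multiplier on every file you touch all month.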
Conclusion
Let me recap the workflow you just built:
Model switching as a habit — Sonnet for daily work, Opus for genuinely complex reasoning, Haiku for mechanical tasks. This alone cuts costs 40–60%.
Cache-optimized sessions — longer focused blocks, front-loaded context, minimal resets. The difference between 20% and 80% cache hit rates is enormous at scale.
Monitoring at two levels — real-time in the CLI (glance at cache ratios every few interactions) and weekly with ccusage (audit your highest-cost sessions every Monday).
This is not about limiting what you can do. It is about making deliberate choices so you can do more with the same budget. The developers who get this right treat their AI token budget the same way they treat their cloud infrastructure budget — with visibility, intention, and strategic allocation.
Now go check your current session's cache hit ratio. I will wait.
Have questions about optimizing your Claude Code workflow? Drop them in the comments — I read every one.