Stanislav Kremeň

Posted on Jun 2 • Edited on Jun 3 • Originally published at Medium

How I Stopped Burning Tokens in Claude Code

#ai #claude #devtools #vibecoding

I use Claude Code every day. After about a month on a larger project, I hit a wall that anyone running something longer than a weekend script knows: keeping a project on track across dozens of sessions costs an absurd amount of tokens.

The problem isn't the coding itself. The problem is context. Every new session starts from zero - Claude doesn't know what happened in the last phase, what decisions were already made, where the project is heading. So you have to re-orient it every single time. And the more context you pile on, the sooner you hit the limit.

Where the tokens go

I tried GSD (Get Shit Done) - a clever tool that breaks a project into phases and keeps documentation along the way. But that documentation is also its weakness: it generates a pile of markdown files - RESEARCH.md, PLAN.md, VERIFICATION.md and more - and re-reads them into context at every step.

I turned on a token counter and watched a single phase spend more context reading its own bookkeeping than doing the actual work. That felt wrong. Project state should be a thin index, not a book that Claude laboriously reads from scratch before every edit.

So I wrote my own tool. It's called mini.

Design principle: state is a header, not a book

The whole idea of mini fits into one sentence: keep minimal state and send Claude only the essentials.

Concretely:

project.md - one page. What you're building, for whom, what the constraints are. Nothing more.
state.json - a lightweight header: phase index, statuses, chosen models. The details of each phase (steps, report) live separately in .mini/phases/phase-{id}.json and are loaded only when needed.

When I start work on a phase, Claude typically gets ~600–1000 tokens: a page of project.md + the current phase + five steps. No history of old phases, no old plans, no verification reports.
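A minimal TypeScript sketch of that split, assuming hypothetical field names (currentPhase, phases, models, goal, steps). The article only says that state.json holds a phase index, statuses and chosen models, and that per-phase details live in .mini/phases/phase-{id}.json. The file reader is injected so the sketch runs without a real .mini/ directory.

```typescript
// Hypothetical shapes; mini's real schema may differ.
type PhaseStatus = "proposed" | "planned" | "done";

interface StateHeader {
  currentPhase: number;
  phases: { id: number; title: string; status: PhaseStatus }[];
  models: Record<string, string>; // which model handles which command
}

interface PhaseDetail {
  id: number;
  goal: string;
  steps: string[];
}

// Reads only the current phase's detail file; older phases stay on disk.
function loadCurrentPhase(
  state: StateHeader,
  readFile: (path: string) => string,
): PhaseDetail {
  const path = `.mini/phases/phase-${state.currentPhase}.json`;
  return JSON.parse(readFile(path)) as PhaseDetail;
}

// Builds the small prompt: project page + current phase + its steps, no history.
function buildPhasePrompt(projectMd: string, phase: PhaseDetail): string {
  const steps = phase.steps.map((s, i) => `${i + 1}. ${s}`).join("\n");
  return `${projectMd.trim()}\n\n## Phase ${phase.id}: ${phase.goal}\n${steps}\n`;
}

// Usage with an in-memory "filesystem" (illustrative data).
const files: Record<string, string> = {
  ".mini/phases/phase-1.json": JSON.stringify({
    id: 1,
    goal: "Health endpoint",
    steps: ["Add GET /health route", "Add a test for the route"],
  }),
};
const state: StateHeader = {
  currentPhase: 1,
  phases: [{ id: 1, title: "Health endpoint", status: "planned" }],
  models: { plan: "opus" },
};
const phase = loadCurrentPhase(state, (p) => {
  const content = files[p];
  if (content === undefined) throw new Error(`missing ${p}`);
  return content;
});
console.log(buildPhasePrompt("# Todo API\nREST API for todos.", phase));
```

The point of the injected reader is that nothing about old phases is ever opened: the prompt size depends on one page plus one phase, not on how long the project has been running.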

And what if Claude needs to understand the existing code? It reads it itself via Read/Glob/Grep. That's far cheaper than stuffing the whole codebase into context up front - Claude pulls exactly the three files it's working on, not the entire src/.

And this isn't just theory from a token counter. In practice, one entire phase - from proposal through implementation to closing it out - fits within 10–15% of the context window on Opus 4.8 (1M). That leaves a huge margin: Claude has room to read files, iterate and fix things without the window filling up and the session "choking" on its own history. That headroom is the whole point - a phase never runs out of room to think.
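The arithmetic behind those percentages, as a quick sketch. The window size is the 1M figure quoted above, and the token counts are the ones from this article, not new measurements.

```typescript
const WINDOW_TOKENS = 1_000_000; // the "1M" context window quoted above

// Percentage of the window that a given token count occupies.
function percentOfWindow(tokens: number): number {
  return (tokens * 100) / WINDOW_TOKENS;
}

// Token count corresponding to a percentage of the window.
function tokensForPercent(percent: number): number {
  return (percent * WINDOW_TOKENS) / 100;
}

console.log(percentOfWindow(600), percentOfWindow(1000)); // 0.06 0.1
console.log(tokensForPercent(10), tokensForPercent(15)); // 100000 150000
console.log(WINDOW_TOKENS - tokensForPercent(15)); // 850000
```

So the starting prompt is around a tenth of a percent of the window, and even a full phase at 15% leaves roughly 850k tokens free.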

Initializing a project:

init, or import-gsd from a .planning/ directory

The loop:

next → plan → do → done

Working with mini is one repeating cycle. I'll show it on a small example - a REST API for todos:

mini init - create the project: 4 questions → produces .mini/project.md and state.json
mini next - Claude proposes the next phase → Phase 1: Health endpoint
mini plan - break it into concrete, verifiable steps → Phase 1 broken down into 2 steps
mini do - work on the phase: opens an interactive Claude Code session
mini done - close it: "does it work?" → moves state forward, commits, writes memory
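The cycle above can be modelled as a small state machine. This is a sketch with hypothetical status names, not mini's actual code; it shows why keeping transitions in plain TypeScript makes them testable: an out-of-order command is rejected instead of silently corrupting state.

```typescript
// Hypothetical phase lifecycle driven by the four loop commands.
type Status = "none" | "proposed" | "planned" | "implemented" | "done";
type Command = "next" | "plan" | "do" | "done";

// For each command: which statuses it may start from, and what it produces.
const TRANSITIONS: Record<Command, { from: Status[]; to: Status }> = {
  next: { from: ["none", "done"], to: "proposed" },
  plan: { from: ["proposed"], to: "planned" },
  do: { from: ["planned", "implemented"], to: "implemented" }, // "do" may be re-run
  done: { from: ["implemented"], to: "done" },
};

function advance(current: Status, command: Command): Status {
  const rule = TRANSITIONS[command];
  if (!rule.from.includes(current)) {
    throw new Error(`cannot run "${command}" while phase is "${current}"`);
  }
  return rule.to;
}

// One full pass through the loop.
let s: Status = "none";
for (const cmd of ["next", "plan", "do", "done"] as Command[]) {
  s = advance(s, cmd);
}
console.log(s); // done

// Skipping "plan" is refused.
try {
  advance("proposed", "do");
} catch (e) {
  console.log((e as Error).message);
}
```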

The key point is that state operations are done by thoroughly tested TypeScript, not by Claude. mini … --apply advances state.json, writes the report and closes the phase. Claude does only the agentic work in the session. That way the state can never break from a hallucination - the thing I hated about purely prompt-based solutions.
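To make "the state can never break from a hallucination" concrete, here is a sketch of the general pattern, under the assumption that Claude hands back a proposal as JSON text (the article doesn't specify the format). The TypeScript side parses and validates it, and only a well-formed proposal touches state. PhaseProposal, PhaseEntry and both functions are hypothetical.

```typescript
// Hypothetical proposal shape for a new phase.
interface PhaseProposal {
  title: string;
  goal: string;
}

// Parses untrusted model output; anything malformed is rejected, not repaired.
function parseProposal(raw: string): PhaseProposal {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    throw new Error("proposal is not valid JSON");
  }
  if (typeof data !== "object" || data === null) {
    throw new Error("proposal must be a JSON object");
  }
  const { title, goal } = data as Record<string, unknown>;
  if (typeof title !== "string" || title.trim() === "") {
    throw new Error("proposal.title must be a non-empty string");
  }
  if (typeof goal !== "string") {
    throw new Error("proposal.goal must be a string");
  }
  return { title: title.trim(), goal };
}

interface PhaseEntry {
  id: number;
  title: string;
  status: "proposed" | "done";
}

// Pure update: returns a new array instead of mutating the existing state.
function applyProposal(phases: PhaseEntry[], p: PhaseProposal): PhaseEntry[] {
  const nextId = phases.reduce((max, ph) => Math.max(max, ph.id), 0) + 1;
  return [...phases, { id: nextId, title: p.title, status: "proposed" }];
}

const before: PhaseEntry[] = [{ id: 1, title: "Health endpoint", status: "done" }];
const after = applyProposal(
  before,
  parseProposal('{"title":"CRUD for todos","goal":"Create, list and delete todos"}'),
);
console.log(after.map((ph) => `${ph.id}:${ph.title}`));
```

If the model returns prose, a missing field or broken JSON, parseProposal throws and state.json is never written.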

What helped me most: memory between phases

This is the detail that turned out to matter most. After closing a phase, mini writes a short record into .mini/memory/phase-{id}.md:
What was done
Key decisions - why solution X over Y
Loose ends - what was left undone, what to watch out for

And note - this does not call the Claude API. mini assembles that record directly in TypeScript from the phase data (metadata + the verbatim content of the discuss/report). It's free and instant. Claude gets involved only if you explicitly set a memory scope to a model.
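A sketch of assembling such a record without any model call: plain string formatting over phase data. The PhaseRecord fields are hypothetical stand-ins for the "metadata + verbatim discuss/report content" described above, and the sample values are illustrative only.

```typescript
// Hypothetical input; mini's real phase data may be shaped differently.
interface PhaseRecord {
  id: number;
  title: string;
  done: string[];
  decisions: string[];
  looseEnds: string[];
}

function bulletList(items: string[]): string {
  return items.length > 0 ? items.map((i) => `- ${i}`).join("\n") : "- (none)";
}

// Deterministic: the same record always yields the same markdown, no API call.
function renderMemory(r: PhaseRecord): string {
  return [
    `# Phase ${r.id}: ${r.title}`,
    "",
    "## What was done",
    bulletList(r.done),
    "",
    "## Key decisions",
    bulletList(r.decisions),
    "",
    "## Loose ends",
    bulletList(r.looseEnds),
    "",
  ].join("\n");
}

const sample: PhaseRecord = {
  id: 1,
  title: "Health endpoint",
  done: ["GET /health returns 200 with a JSON body"],
  decisions: ["Hand-written route instead of a health-check library, to avoid a dependency"],
  looseEnds: [],
};
const memoryMd = renderMemory(sample);
console.log(memoryMd); // would be written to .mini/memory/phase-1.md
```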

Memory complements git log with a layer you won't find there: not what changed, but why. And a short summary of the latest record is automatically injected into the prompt for mini next, so the proposal for the next phase knows where we left off. That was the moment Claude stopped proposing phases that ignored earlier decisions.
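How the latest record might be picked and injected, as a sketch. Here the "short summary" is simply the first few non-empty lines of the newest phase-{id}.md, which is an assumption; the article doesn't say how mini shortens it. Phase ids are compared numerically, so phase-10.md correctly beats phase-9.md.

```typescript
// Memory files keyed by filename, e.g. the contents of .mini/memory/.
function latestMemory(files: Record<string, string>): string | undefined {
  let bestId = -1;
  let best: string | undefined;
  for (const name of Object.keys(files)) {
    const m = /^phase-(\d+)\.md$/.exec(name);
    if (m !== null && Number(m[1]) > bestId) {
      bestId = Number(m[1]);
      best = files[name];
    }
  }
  return best;
}

// Hypothetical summary rule: keep the first `maxLines` non-empty lines.
function summarize(md: string, maxLines = 6): string {
  return md
    .split("\n")
    .filter((l) => l.trim() !== "")
    .slice(0, maxLines)
    .join("\n");
}

function buildNextPrompt(projectMd: string, memory: Record<string, string>): string {
  const latest = latestMemory(memory);
  const recap = latest !== undefined ? `\n\nWhere we left off:\n${summarize(latest)}` : "";
  return `${projectMd}${recap}\n\nPropose the next phase.`;
}

const memoryFiles: Record<string, string> = {
  "phase-1.md": "# Phase 1: Health endpoint\n\n## Loose ends\n- (none)",
  "phase-2.md": "# Phase 2: CRUD for todos\n\n## Loose ends\n- no pagination yet",
};
console.log(buildNextPrompt("# Todo API\nREST API for todos.", memoryFiles));
```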

Semi-autonomous mode

When I want to move fast, I use mini auto:

next → plan → do (acceptEdits) → done

It runs the whole phase in a single Claude session (not a restart per step, since every restart means re-exploring the project for no added value) and stops only at the human checkpoint: "does it work?". In acceptEdits mode Claude doesn't ask before every file edit, but Bash commands still require approval (unless you pass --allow-dangerously-skip-permissions), so no random rm -rf.
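Roughly how one auto-mode session could be launched. The --permission-mode acceptEdits option comes from Claude Code's own CLI (check claude --help on your version), and the dangerous flag is the one named above; buildAutoArgs and AutoOptions are hypothetical names, not mini's code. The phase prompt is passed as the positional initial prompt of a single interactive session.

```typescript
interface AutoOptions {
  // Opt-in only; mirrors the flag mentioned in the article.
  allowDangerouslySkipPermissions?: boolean;
}

// Builds argv for ONE Claude Code session that covers the whole phase.
function buildAutoArgs(phasePrompt: string, opts: AutoOptions = {}): string[] {
  // File edits are auto-accepted; Bash commands still ask by default.
  const args = ["--permission-mode", "acceptEdits"];
  if (opts.allowDangerouslySkipPermissions) {
    args.push("--allow-dangerously-skip-permissions");
  }
  args.push(phasePrompt); // initial prompt as a positional argument
  return args;
}

console.log(buildAutoArgs("Implement phase 1: Health endpoint"));
```

Keeping the dangerous flag behind an explicit option means the default auto run can never drop the Bash approval prompts by accident.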

Auto also has a cooperative stop signal (mini stop from a second terminal), so I can halt an autonomous run cleanly at a step boundary.
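A cooperative stop can be as simple as a flag checked between steps. In this sketch the check is injected as a function; in practice it could be the existence of a marker file created by the second terminal, which is a hypothetical mechanism since the article doesn't say how mini stop signals the run. The current step always finishes, and the loop exits only at a step boundary.

```typescript
type Step = () => void;

// Runs steps in order, checking the stop signal only BETWEEN steps,
// so no step is ever interrupted halfway through.
function runUntilStopped(steps: Step[], stopRequested: () => boolean): number {
  let completed = 0;
  for (const step of steps) {
    if (stopRequested()) break;
    step();
    completed++;
  }
  return completed;
}

// Simulate `mini stop` arriving while step 2 is running.
let stopFlag = false;
const log: string[] = [];
const steps: Step[] = [
  () => log.push("step 1"),
  () => {
    log.push("step 2");
    stopFlag = true; // the signal arrives mid-step...
  },
  () => log.push("step 3"), // ...so this step never starts
];
const completedCount = runUntilStopped(steps, () => stopFlag);
console.log(completedCount, log); // 2 [ 'step 1', 'step 2' ]
```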

Slash commands right inside Claude Code

The whole cycle doesn't have to run from the terminal. After mini install-commands I have /mini:* commands right inside Claude Code:

/mini:next - proposes and saves the next phase
/mini:plan - breaks the phase into steps
/mini:do - implements the phase and writes a report
/mini:done - human verification in the chat → moves state forward

The body of those commands is thin: it just runs mini context, which prints the current prompt. The agentic work is done by Claude in the running session, and the state operations by deterministic TypeScript that Claude never touches. No nested Claude inside the session.
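A sketch of what a context subcommand could look like: a lookup table of prompt builders, one per loop command, whose output the slash command simply relays. The argument form, the builder texts and all names here are hypothetical; the article only says that mini context prints the current prompt. Real builders would read the small state header instead of returning fixed strings.

```typescript
// Hypothetical: one prompt builder per loop command (stubs keep it runnable).
type LoopCommand = "next" | "plan" | "do" | "done";

const PROMPT_BUILDERS: Record<LoopCommand, () => string> = {
  next: () => "Propose the next phase for this project.",
  plan: () => "Break the current phase into concrete, verifiable steps.",
  do: () => "Implement the current phase, then write a report.",
  done: () => "Ask the user whether the phase works.",
};

function isLoopCommand(x: string): x is LoopCommand {
  return Object.prototype.hasOwnProperty.call(PROMPT_BUILDERS, x);
}

// Returns the prompt text; a CLI wrapper would print it to stdout.
function contextCommand(arg: string): string {
  if (!isLoopCommand(arg)) {
    throw new Error(`unknown command: ${arg}`);
  }
  return PROMPT_BUILDERS[arg]();
}

console.log(contextCommand("plan"));
```

Because printing a prompt involves no model call, the slash command stays cheap and the same text comes out every time for the same state.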

It runs on your subscription

No extra API keys, no separate billing. mini just runs claude as a subprocess - authentication is handled by Claude Code itself, based on how you have it configured. It runs on your Pro/Max account.
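A sketch of the subprocess approach using Node's child_process.spawn. With stdio set to "inherit" the child shares the current terminal, and by default it inherits the environment, so it reads the same ~/.claude configuration and login as when you run claude yourself. The runner is generic (runInteractive is a hypothetical name), so it can be exercised with any executable.

```typescript
import { spawn } from "node:child_process";

// Runs a command attached to the current terminal and resolves with its exit code.
// No API key is passed: the child uses whatever auth the CLI is already set up with.
function runInteractive(cmd: string, args: string[]): Promise<number> {
  return new Promise((resolve, reject) => {
    const child = spawn(cmd, args, { stdio: "inherit" });
    child.on("error", reject); // e.g. the command is not on PATH
    child.on("exit", (code) => resolve(code ?? 1)); // null when killed by a signal
  });
}

// Usage (assumes the `claude` CLI is installed and logged in):
// runInteractive("claude", ["Implement phase 1"]).then((code) => process.exit(code));
```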

Try it

mini is free and open source (MIT). If burning tokens annoys you too, the fastest way to try it without touching ~/.claude:

cd your-project
npx mini-orchestrator install-commands

Repo with a demo GIF of the full cycle: github.com/czsoftcode/mini-orchestrator
- - -
I'm now looking for a few people to run it on a real project and tell me where it drags or where the onboarding is confusing. Honest "this is useless because X" feedback is just as welcome as bug reports - reach out in issues.

Top comments (5)

Alex Shev • Jun 9

Token reduction is a symptom of better context discipline. The real win is teaching the agent to fetch the smallest useful slice of the repo instead of dragging the whole project into every step.

Stanislav Kremeň • Jun 9

The whole mini-orchestrator is built on that idea. I try to load only the essentials I need for the work at hand. Instead of the entire project, I have its graph available in graph.json, which links to the /graph directory; that directory lists the functions together with the line in the code where each one is located.

Alex Shev • Jun 9

Exactly. That line-level graph is the useful middle ground: not dumping the repo into context, but also not making the agent guess where to look.

The important bit is that the graph stays cheap and deterministic. If it can point the agent to the right function/file first, then the model spends tokens reasoning about the change instead of rediscovering the codebase every time.
