kavyarani7

Token Efficiency in Claude Code

SectorFlow Engineering Series · Part 1 of 3 · Parent article

Notes on where our context budget was actually going, and what we did about it.

June 2026 · SectorFlow Engineering

In this series

  • Part 2, The Skills File Pattern: fixing CLAUDE.md bloat with imports.
  • Part 3, Picking Models and Tools: the MCPs we tried, refused, and why.

The problem nobody warns you about

Claude Code can do a lot. The catch is that all of it runs on context, and you don't get much of that.

When we started SectorFlow we did the obvious thing. Kept a CLAUDE.md at the repo root, and every time something went wrong — wrong model string, a cache TTL that didn't match, a chart that came out looking off — we'd write a rule and stick it on the end. The file kept growing. We didn't really clock it as a problem until it was one.

By about week six it was 400 lines. Every session loaded the whole thing. Frontend rules sitting next to deployment runbooks sitting next to database decisions, none of it sorted. And because we'd added the rules one at a time over weeks, some of them flatly disagreed with each other. Claude would follow the new one, or the old one, or try to split the difference. We got something wrong either way.

I want to be clear this isn't a Claude Code problem. It's on us, and it's fixable. But fixing it meant we had to stop treating CLAUDE.md like a junk drawer.

The thing that actually hurts isn't the per-token price. It's that every token spent loading context is a token you don't get back for the work. Burn 30,000 on setup and you've got far less room to write code than if you'd burned 5,000. You hit the ceiling partway through a file and whatever you were in the middle of is just gone.
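To put numbers on that trade-off, here is a minimal sketch. The 200,000-token window is an assumption for illustration, not something we measured, and `working_room` is a hypothetical helper; substitute the limit of whatever model you run.

```python
# Assumed window size, for illustration only. Check your model's actual limit.
ASSUMED_WINDOW_TOKENS = 200_000


def working_room(setup_tokens: int, window_tokens: int = ASSUMED_WINDOW_TOKENS) -> int:
    """Tokens left for actual work once the startup context has been loaded."""
    if setup_tokens < 0 or window_tokens <= 0:
        raise ValueError("setup_tokens must be >= 0 and window_tokens must be > 0")
    return max(window_tokens - setup_tokens, 0)


heavy = working_room(30_000)  # the junk-drawer setup
light = working_room(5_000)   # the trimmed setup
print(f"heavy setup leaves {heavy:,} tokens; light setup leaves {light:,}")
print(f"difference: {light - heavy:,} tokens of working room")
```

The 25,000-token difference is the same whatever window size you plug in; what changes is how large a share of the session it represents.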

Context is a budget

Here's the shift, and it's simple once you see it: every token Claude spends reading at the start of a session is a token it can't spend later on code. Most projects pile everything into CLAUDE.md on the theory that the model might need it someday. We flipped the question. What does the model need for this task? Load that. Skip the rest.

Two rules came out of it:

  • Precision over completeness. A small context that's right does more for you than a big one trying to cover all the bases.
  • Load on demand. Structure things so only the relevant part shows up for a given task.

Those turned into the practices listed below, and the other two articles in this series cover them in detail. This one is just the overview: what we measured and why it matters.

What we actually changed

1. Session startup cost

Every session loads CLAUDE.md plus whatever it imports. Before, that was the one 400-line file, every time, regardless of the task. After we split it into separate skill files, a UI task pulls in core.md (the constraints) and design.md (the visual stuff) and nothing else. An infra task gets core.md and infrastructure.md. Startup cost dropped about 60%.
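The per-task loading described above can be sketched roughly like this. The file names core.md, design.md and infrastructure.md come from our setup; the task labels, the `files_for_task` and `startup_lines` helpers, and the idea of a single skills directory are hypothetical. This is not how Claude Code resolves imports itself (Part 2 covers the real structure); it is only a way to reason about which files a task should pull in and how many lines that costs.

```python
from pathlib import Path

# core.md carries the constraints every task needs; the rest load per task type.
ALWAYS_LOADED = ["core.md"]
SKILLS_BY_TASK = {
    "ui": ["design.md"],
    "infra": ["infrastructure.md"],
}


def files_for_task(task_type: str) -> list[str]:
    """Return the skill files a task should load, with core.md first."""
    if task_type not in SKILLS_BY_TASK:
        raise KeyError(f"unknown task type: {task_type!r}")
    return ALWAYS_LOADED + SKILLS_BY_TASK[task_type]


def startup_lines(task_type: str, skills_dir: Path) -> int:
    """Count the lines a session would load for this task (a rough proxy for tokens)."""
    total = 0
    for name in files_for_task(task_type):
        text = (skills_dir / name).read_text(encoding="utf-8")
        total += len(text.splitlines())
    return total


print(files_for_task("ui"))     # ['core.md', 'design.md']
print(files_for_task("infra"))  # ['core.md', 'infrastructure.md']
```

Running `startup_lines` against the old single file and against each task's subset is one cheap way to check whether a split is actually paying off.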

2. Reading tickets

We had the Linear MCP hooked up so Claude could read tickets itself. Nice in theory. But one list_issues call runs about 3,500 tokens, and the whole read-it / mark-done / comment loop is around 9,000. So now the engineer just pastes the acceptance criteria. That's maybe 400 tokens. The 8,600 difference doesn't sound like much until you multiply it across 60-plus tickets: that's upward of 500,000 tokens, several context windows' worth, handed back to the actual work.
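The ticket arithmetic, written out as a small sketch. The token figures are the ones quoted above; `ticket_savings` is a hypothetical helper, and 60 is used as a lower bound for "60-plus".

```python
MCP_LOOP_TOKENS = 9_000  # read / mark done / comment via the Linear MCP
PASTE_TOKENS = 400       # engineer pastes the acceptance criteria instead
TICKETS = 60             # "60-plus" in the text, so treat this as a floor


def ticket_savings(tickets: int, before: int = MCP_LOOP_TOKENS, after: int = PASTE_TOKENS) -> int:
    """Total tokens saved across a number of tickets."""
    if tickets < 0:
        raise ValueError("tickets must be >= 0")
    return tickets * (before - after)


per_ticket = MCP_LOOP_TOKENS - PASTE_TOKENS
print(per_ticket)               # 8600
print(ticket_savings(TICKETS))  # 516000
```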

3. Reading files it was never asked to read

Left alone, Claude reads files to get its bearings, sometimes three or four of them before it writes a line. So we made a rule: only read files the task names. Need to find a function? grep for it, then view just those lines. Don't open a file to soak up "context." If something's actually missing, ask. Saves 2,000–4,000 tokens on a complex task.
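The "grep for it, then view just those lines" habit looks roughly like this when written as code. It is a sketch of the idea, not a tool we ship: `find_lines` and `view_window` are hypothetical helpers, and the default window of 2 lines before and 10 after is an arbitrary choice.

```python
import re
from pathlib import Path


def find_lines(path: Path, pattern: str) -> list[int]:
    """Return 1-based line numbers whose text matches the regex, like `grep -n`."""
    regex = re.compile(pattern)
    lines = path.read_text(encoding="utf-8").splitlines()
    return [i for i, line in enumerate(lines, start=1) if regex.search(line)]


def view_window(path: Path, line_no: int, before: int = 2, after: int = 10) -> str:
    """Return only the lines around a match, each prefixed with its line number."""
    lines = path.read_text(encoding="utf-8").splitlines()
    start = max(line_no - before, 1)
    end = min(line_no + after, len(lines))
    return "\n".join(f"{n}: {lines[n - 1]}" for n in range(start, end + 1))
```

A call such as `find_lines(path, r"function\s+getTtl")` followed by `view_window(path, n)` for each hit puts a dozen or so lines into context instead of the whole file. The function name in that pattern is made up for the example.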

4. Spinning up the dev server to check things

Verifying a change by eye means starting the server, waiting, navigating, screenshotting, evaluating — a whole chain of calls. For anything you can't see in a browser, like server logic or data contracts or route handlers, that chain tells you nothing. So we only do the visual check when the change is something a person could actually see in a browser. For syntax we run node --check. One Bash call.
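A sketch of that decision, under some loud assumptions: the suffix list used to guess whether a change is visible in a browser is hypothetical and will need adjusting for your repo, and both helper names are invented for this example. The only real CLI behaviour relied on is `node --check`, which parses a script without executing it.

```python
import shutil
import subprocess
from pathlib import PurePosixPath

# Assumption: these suffixes mark changes a person could see in a browser.
VISIBLE_SUFFIXES = {".css", ".scss", ".html", ".jsx", ".tsx", ".svg"}


def needs_visual_check(changed_paths: list[str]) -> bool:
    """True if any changed file is of a kind that renders in the browser."""
    return any(PurePosixPath(p).suffix in VISIBLE_SUFFIXES for p in changed_paths)


def syntax_check(path: str) -> bool:
    """Run `node --check` on one file; True means it parsed cleanly."""
    node = shutil.which("node")
    if node is None:
        raise RuntimeError("node is not on PATH")
    result = subprocess.run([node, "--check", path], capture_output=True, text=True)
    return result.returncode == 0
```

If `needs_visual_check` returns False for a change set, the dev server never starts and `syntax_check` is the whole verification step.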

The numbers

Source of overhead            | Before                     | After                        | Saving
Session context load          | ~400 lines, every session  | 60–120 lines, task-specific  | ~60%
Ticket ingestion (per ticket) | ~9,000 tokens via MCP      | ~400 tokens via paste        | ~8,600 tokens
File reads per task           | 3–5 files speculatively    | Named files only             | 2,000–4,000 tokens
Verification overhead         | Dev server + screenshot    | node --check only            | 4–6 tool calls

Each of these on its own is fine, nothing dramatic. Put together they change what fits in a session. Stuff that used to take two or three sessions now usually fits in one. That's the whole point.

How the parts fit

The other two articles each take one piece of this:

  • Part 2, the skills file pattern, is about the startup cost and the contradicting-rules mess. It's where the import-on-demand structure comes from.
  • Part 3, models and tools, covers the ticket overhead and the file-reading habit, plus which MCPs we said no to and where we drew the line between Haiku and Sonnet.

Read this one first for the why. Then either of the others for the how.

One thing to keep in mind

Claude Code does its best work when the context is small, accurate, and honest about what's actually known versus what you're hoping for. Vague in, vague out. And a context file that tries to cover everything ends up covering nothing properly.

