The first sign something was wrong was the invoice.
Not catastrophically wrong. More like “this escalated quietly while I was having fun” wrong. I’d been using Claude Code the way most people start: throw everything at it, stay in the session, keep talking. It felt productive. It was productive. And it was also quietly burning through tokens at a rate I hadn’t tracked because tracking it felt like it would interrupt the flow.
The flow, it turned out, was expensive.
The Context Window Is Not Your Notebook
The first thing I fixed was the worst habit I’d developed without noticing: treating a session like a shared workspace that persisted naturally.
You start a session in the morning. You push through a bug. You take a break, come back, keep going. By afternoon the context window is carrying around dead weight from every failed attempt, every speculative approach, every Docker error you resolved hours ago. The model still has all of it. It’s orienting against it constantly. You’re paying for that.
I started resetting sessions aggressively. Not because Claude “forgets” things in some damaging way. Because I wanted selective continuity. Before any substantive session, I write five lines:
objective: implement JWT refresh logic
relevant files: src/auth/session.ts, src/middleware/verify.ts
constraints: no new libraries, preserve existing error handling
what already failed: tried storing refresh tokens in memory — didn't survive restarts
expected output: working refresh endpoint with test coverage
That’s the whole handoff. Nothing else goes in until I need it.
The output got sharper immediately. The model stopped hedging around context it half-remembered and started working against something clean. Entropy in the context produces entropy in the response. That relationship is consistent.
Vague Prompts Are Token Multipliers
The second expensive habit was question prompts. Casual ones, specifically.
“What do you think about this approach?” “Any suggestions here?” “How would you handle this?”
These feel conversational. They are also open invitations for the model to generate possibility space. And it will. It will generate options, tradeoffs, philosophy, alternatives, adjacent considerations, and things you will never use. All of it billed per token.
The fix was treating prompts more like instructions on industrial equipment. Specific verb, specific scope, specific constraint.
Before
"what do you think about the auth flow?"
After
"Identify the security issue in the token validation logic. Return the vulnerable line and explain why it's exploitable. Under 150 words."
The model responds to scope constraints. Tell it the output size. Tell it the output format. Tell it exactly which part of the problem you’re in. Ambiguity isn’t a kindness to the model — it’s an invitation to talk until it finds something you respond to.
Mode Separation Is Not Optional
There’s a particular kind of session that reliably costs me the most tokens per hour of useful output: mixed-mode sessions.
Planning while implementing. Debugging while also redesigning. Asking “while we’re here, should we rethink the whole structure?” during what should have been a fifteen-minute fix.
The model carries all of that. Every architectural tangent you take becomes part of the working context it reasons against. You can almost feel the session getting heavier as it accumulates undecided decisions.
I now treat modes like physical zones in a workshop. Planning sessions are their own thing. They produce a structured document. That document then seeds implementation sessions …selectively, just the relevant sections. Debugging sessions start with a structured incident format:
expected: webhook processes in under 500ms
actual: times out after 30s on payloads over 10KB
reproduction: any request with body > 10KB to /api/webhooks
recent changes: added payload validation in middleware, PR #47
logs: [paste relevant lines only]
suspected scope: likely the base64 encoding step in validatePayload()
Not narrative. Not conversational. Clean surface.
Negative Prompts Actually Work
People who generate images know this instinctively. Almost nobody applies it to coding agents, which is strange because it’s just as effective.
I started adding explicit exclusion instructions to almost every substantive prompt.
do not redesign the architecture
do not explain basics
do not add dependencies
do not touch code outside the function I specified
do not rewrite working tests
Without constraints, the model defaults toward maximal helpfulness. It interprets your request charitably and expansively. A small fix becomes a refactor. A targeted change becomes a suggestion for systemic improvement. The response is not wrong, exactly. It’s just bigger than you needed, touching things you didn’t ask it to touch.
Fencing the scope with negative constraints brings generation size down and collateral damage to zero.
Stop Pasting Entire Files
This one hurt to admit because I did it constantly for the first few months.
Large files go in as context. The model orients against the entire file even when the problem lives in twenty lines. It spends output describing sections you can already read. It generates explanations for code that doesn’t need explanation. You’re paying for that orientation work on every prompt.
Now I isolate ruthlessly. The relevant function. The relevant interface. The error message and the five lines around where it occurs. Nothing else unless the problem genuinely requires more context, and most problems don’t.
There’s a mechanical feeling to doing this correctly. Like setting up a machine to do one specific job instead of asking it to improvise across a full workshop floor.
Answer Budgets Are Real
This sounds pedantic until you watch what happens when you enforce them.
“Answer in five bullets maximum.” “Use under 200 tokens.” “One paragraph only.” “Return the patch only, no explanation.”
Quality frequently improves. Not because the model gets smarter under constraint, but because constraint forces selection. The model identifies the most important information instead of generating everything plausible and letting you filter.
Engineers who write good issue tracker comments already know this. The person who writes three sentences and identifies the problem exactly is more useful than the person who writes four paragraphs of observed behavior. Constraints sharpen communication in humans and in models.
The model defaults toward abundance because continuation probability rewards continuation. You have to interrupt that instinct intentionally.
Reusable Prompt Fragments Are Not Optional Complexity
I resisted systematizing my prompts for a long time because it felt like overhead. It wasn’t. It was already overhead…I was just paying it every session by retyping the same constraints.
The instructions I repeated constantly:
preserve all existing comments
minimal diff only — do not touch working code
TypeScript strict mode, no any
explain your reasoning before writing code
no new dependencies
mobile-first for any UI changes
I keep these in a snippets file. Each session loads the relevant subset. Prompt consistency reduces output variance. Tiny wording differences across sessions produce surprisingly different model behaviors over time — consistent inputs produce consistent, auditable outputs.
Know When Not to Use Claude
This was the hardest thing to accept and the most valuable.
Sometimes the fastest path is opening the file and fixing it yourself.
AI tools create a distortion where every task starts feeling like an AI task. But localized fixes with no ambiguity, problems you already understand completely, changes where verification overhead is higher than doing the work directly — these often cost more in prompting and review than they would cost in direct action.
There’s a pattern emerging in coworking spaces and Discord servers where people spend more cognitive energy orchestrating AI than solving the problem. Staring at generated diffs with the same expression someone brings to a slot machine resolving its animations. Waiting for the answer to materialize rather than already knowing the answer and writing it.
Not every problem is better solved through an assistant.
The Real Variable Is Attention
All the specific techniques above are downstream of one change: I stopped treating token cost as abstract.
Once cost is visible, waste is visible. You see the giant prompts. The repetitive context-loading. The speculative rewrites on code that was already fine. The conversational drift that burned two hours and produced nothing actionable.
The optimizations also improved the thinking. Forced compression forces clarity. Good engineering is compression — extracting signal, removing noise, making decisions and recording them instead of relitigating them. Systems that accumulate uncontrolled context decay, whether the system is a codebase, a team, or a conversation.
Most people are still in the abundance phase with AI coding tools. Infinite context, infinite patience, no friction awareness. That changes when the invoice arrives, or when the outputs start degrading because no one modeled the cost of entropy.
The developers who adapt earliest tend to build calmer systems. Calmer systems are easier to debug, cheaper to run, and faster to iterate on.
That’s not a coincidence.
If you want a structured version of these workflows with templates, prompt patterns, and session setups, I put together a field guide: 21 Claude Code productivity patterns, usage you can actually measure.
Claude Code for Developers: 21 Productivity Tricks That Save Hours
Top comments (0)