There's been a noticeable trend recently with AI models: they are getting smarter, easier to use and increasingly capable, but that capability comes at a cost.
We are now paying for performance in a way that we weren't a few weeks or months ago and because of that, the way we utilise AI tools is shifting. Understanding what consumes tokens and how to use them effectively has become an important skill for anyone working with AI tools.
At the same time, it's important to remember that token optimisation isn't about using fewer tokens to save money; it's about spending tokens where they add the most value and avoiding waste where they don't.
What is eating up your tokens
At the start of your session:
There are a few things that I know a lot of people use without realising that they can take up a large number of tokens before any prompt has even been given.
- MCP definitions, tools and schemas may be included in context provided to the model
- Large instruction files (CLAUDE.md/copilot-instructions.md)
- Context accumulation if restarting a previous session
These are all present at the start of your session. While you may not actually use the MCP tools or need everything in your instruction files, tokens may still be spent loading them and setting up your session.
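To get a feel for how much is loaded before the first prompt, here is a minimal sketch that estimates the startup overhead from a set of text sources. It assumes a rough rule of thumb of about 4 characters per token for English text, which is only an approximation and varies by model and tokenizer. The function names and the sample contents are hypothetical, not part of any tool.

```python
import math

# Assumption: roughly 4 characters per token for English text.
# Real tokenizers differ, so treat the results as ballpark figures.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Return a rough token estimate for a piece of text."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def startup_overhead(sources: dict[str, str]) -> dict[str, int]:
    """Estimate tokens loaded before the first prompt, per source, plus a total."""
    report = {name: estimate_tokens(text) for name, text in sources.items()}
    report["total"] = sum(report.values())
    return report


if __name__ == "__main__":
    # Placeholder contents; in practice you would read your real
    # instruction file and MCP tool schemas from disk.
    sources = {
        "CLAUDE.md": "Use 4-space indentation. " * 200,
        "mcp_tool_schemas": '{"name": "search", "description": "..."}' * 50,
    }
    for name, tokens in startup_overhead(sources).items():
        print(f"{name}: ~{tokens} tokens")
```

Running something like this against your own files gives a quick sense of which source is worth trimming first.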
During your session:
The majority of tokens used will be while the session is running. This includes:
- Large, verbose model responses
- Images and screenshots included in a prompt
- Unnecessary files referenced in prompts
- Long prompts with information that is irrelevant to the current context
How to be more token conscious
I feel like a lot of what's mentioned above has become common practice, but there are some ways to make your sessions more efficient while also improving the output.
Keep instruction files short and precise! Instruction files should contain the bare minimum needed for your coding tool to understand file structure, coding standards and business-specific detail. They are often read multiple times during a session, so every additional line has a token cost.
Initiate MCPs only when required: MCPs can be disabled by default. This ensures that they are only set up when they are actually needed.
Be specific in your prompts: Prompts should be specific and to the point. Short and vague prompts can result in poor or incorrect responses, which then take a lot more back and forth to correct.
Choose models wisely! Don't reach for a high-performing model when a low-cost model can perform just as well.
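As a rough illustration of what model choice can mean in practice, the sketch below prices the same task on two models. The model names and per-million-token prices are hypothetical and chosen only for the arithmetic; check your provider's current pricing before drawing conclusions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    """Price in dollars per million tokens (hypothetical figures)."""
    input_per_million: float
    output_per_million: float


# Hypothetical models and prices, for illustration only.
PRICES = {
    "large-model": ModelPrice(input_per_million=15.0, output_per_million=75.0),
    "small-model": ModelPrice(input_per_million=1.0, output_per_million=5.0),
}


def task_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the dollar cost of one task on the given model."""
    price = PRICES[model]
    return (
        input_tokens * price.input_per_million
        + output_tokens * price.output_per_million
    ) / 1_000_000


if __name__ == "__main__":
    # The same task (20k tokens in, 2k tokens out) priced on each model.
    for model in PRICES:
        print(f"{model}: ${task_cost(model, 20_000, 2_000):.4f}")
```

The point is not the exact numbers but the ratio: if the cheaper model gets the task right, the difference is repeated on every task you run.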
Utilise tools: Using tools like /compact or /handoff can help with context and memory when rejoining sessions.
Know when to start a new session! Sessions can get long and complex quickly. Knowing when to stop using an existing session and start over can help minimise context rot and hallucinations, and can provide better overall results.
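To see why long sessions get expensive, here is a small sketch of how input tokens add up over a session. It assumes every turn resends the full conversation history, which is how chat-style tools generally work, and it ignores prompt caching, which some providers offer and which lowers the cost of resent history. The function names and all the numbers are hypothetical.

```python
def cumulative_input_tokens(turns: int, tokens_per_turn: int, overhead: int = 0) -> int:
    """Total history tokens sent over a session where every turn resends the history.

    Turn k (0-based) sends the fixed overhead plus the k earlier exchanges.
    The new prompt on each turn is left out because it costs the same either way.
    """
    return sum(overhead + k * tokens_per_turn for k in range(turns))


def with_restart(turns: int, tokens_per_turn: int, restart_every: int,
                 summary_tokens: int, overhead: int = 0) -> int:
    """Same workload, but restarted every `restart_every` turns with a short summary."""
    if restart_every < 1:
        raise ValueError("restart_every must be at least 1")
    total = 0
    remaining = turns
    first = True
    while remaining > 0:
        chunk = min(restart_every, remaining)
        # Sessions after the first carry a summary of the earlier work.
        carry = 0 if first else summary_tokens
        total += cumulative_input_tokens(chunk, tokens_per_turn, overhead + carry)
        remaining -= chunk
        first = False
    return total


if __name__ == "__main__":
    one_long = cumulative_input_tokens(turns=40, tokens_per_turn=1500, overhead=2000)
    restarted = with_restart(turns=40, tokens_per_turn=1500, restart_every=10,
                             summary_tokens=500, overhead=2000)
    print(f"one long session: {one_long} input tokens")
    print(f"restart every 10: {restarted} input tokens")
```

Under these assumptions, the single long session sends 1,250,000 history tokens while the restarted version sends 365,000, because the resent history grows with every turn. A restart only pays off if the summary really does carry what the next task needs.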
Recommended tools
There are a number of tools I recommend to help keep an eye on your usage and minimise token spending:
- ccusage for token tracking to see where your tokens are being spent
- caveman reduces long unnecessary text in responses
- handoff helps move important context between sessions without bringing in unrelated work
As we start to utilise AI tools more for our everyday coding tasks, it can be easy to fall back into bad practices. I'm sure we've all been guilty of defaulting to the same model for every task when a cheaper model could actually have completed the task just as well. Or forgetting to disable MCPs and then wondering why our token usage this session is so high when we haven't actually utilised any external tools.
It's also very easy to stay in the one session because all our context is there, even though all that context is probably not necessary for the next question we are going to be asking.
While these may seem to make life easier in the short term, in the long term the cost really adds up. Many of us (myself definitely included) can sometimes over-rely on AI tools because they're there, they do the work faster and the outcome is often better than if we had coded it ourselves.
What we really should be asking, before we even start on token optimisation, is: should AI be doing this, or is it something I could do as effectively on my own? Token optimisation starts with deciding when AI is the right tool for the job; choosing a model or trimming an instruction file is just the next step in the process.