When working with LLMs, most developers focus on prompt quality.
But there's another factor that often gets ignored:
token consumption.
Tokens directly impact:
- cost
- latency
- context limits
And small design decisions can have a big impact at scale.
Where tokens are wasted
Most token waste doesn't come from “bad prompts”.
It comes from structure.
Common sources:
- verbose instructions
- repeated context
- unnecessary formatting
- inefficient data representation
Even when the logic is correct, the representation can be expensive.
Example: structured data overhead
A lot of context is sent in JSON format:
{
"user": {
"name": "John",
"role": "developer",
"active": true
}
}
This is great for machines — but not optimized for token usage.
Why?
Because a large portion of tokens are structure:
- braces
- quotes
- repeated keys
- punctuation
Alternative representations (context-dependent)
In LLM-focused systems, some developers explore more compact formats.
For example, simplified structured text (or formats like TOON):
user:
name: John
role: developer
active: true
Same information. Fewer tokens.
This kind of representation can reduce context size significantly when scaled.
Other optimization strategies
1. Remove redundancy
Avoid repeating instructions in multiple forms.
2. Use structured prompts
Instead of natural language blocks:
Task: ...
Context: ...
Output: ...
3. Limit unnecessary verbosity
LLMs do not need polite filler text.
4. Compress context intentionally
Sometimes restructuring data matters more than shortening text.
5. Manage your context window intentionally
One of the biggest hidden costs in LLM applications is not the prompt itself —
but everything you keep inside the context window.
Developers often:
- keep full chat history
- resend large documents repeatedly
- include irrelevant past interactions
All of this consumes tokens.
A better approach is to be intentional about what stays in context.
Instead of sending full history:
- keep only relevant state
- summarize previous messages
- remove outdated or redundant information
Example:
Instead of:
"Here is the full conversation history..."
Use:
"Summary: user is building a TypeScript API with authentication."
This drastically reduces token usage while preserving meaning.
Trade-offs
Token optimization is not always free.
Less verbose prompts can lead to:
- ambiguity
- reduced clarity
- lower robustness in edge cases
So there is always a balance between:
clarity vs efficiency
Final thoughts
Token optimization is not about writing less.
It's about writing intentional context.
As LLM systems scale, efficiency becomes just as important as prompt quality.
Top comments (0)