Noriuki

Posted on Jun 7

Token Consumption Optimization in LLM Applications


When working with LLMs, most developers focus on prompt quality.

But there's another factor that often gets ignored:

token consumption.

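Tokens directly impact:

cost (APIs bill per input and output token)
latency (longer inputs take longer to process, longer outputs take longer to generate)
context limits (everything has to fit in the model's context window)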

And small design decisions can have a big impact at scale.
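To see how small per-request savings compound, here is a back-of-the-envelope calculation in Python. The token counts, request volume, and price per million tokens are hypothetical placeholders; substitute your provider's actual pricing.

```python
def monthly_cost_usd(tokens_per_request: int,
                     requests_per_day: int,
                     usd_per_million_tokens: float,
                     days: int = 30) -> float:
    """Estimate monthly spend on input tokens only.

    Simplification: output tokens are usually billed separately,
    often at a higher rate, and are ignored here.
    """
    total_tokens = tokens_per_request * requests_per_day * days
    return total_tokens / 1_000_000 * usd_per_million_tokens


# Hypothetical scenario: trimming 300 tokens from a 1,500-token prompt,
# at 10,000 requests/day and an assumed $3 per million input tokens.
before = monthly_cost_usd(1_500, 10_000, 3.0)
after = monthly_cost_usd(1_200, 10_000, 3.0)
print(f"before: ${before:,.2f}/month, after: ${after:,.2f}/month")
# prints "before: $1,350.00/month, after: $1,080.00/month"
```

A 20% shorter prompt translates directly into a 20% smaller input bill, which is why structural choices matter once traffic grows.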

Where tokens are wasted

Most token waste doesn't come from “bad prompts”.

It comes from structure.

Common sources:

verbose instructions
repeated context
unnecessary formatting
inefficient data representation

Even when the logic is correct, the representation can be expensive.

Example: structured data overhead

A lot of context is sent in JSON format:

{
  "user": {
    "name": "John",
    "role": "developer",
    "active": true
  }
}

This is convenient for machines, but it is not optimized for token usage.

Why?

Because a large share of the tokens encode structure rather than content:

braces
quotes
repeated keys
punctuation
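Characters are not tokens, but a quick character count gives a rough feel for how much of a pretty-printed JSON payload is syntax. This Python sketch uses the example object above; for real token counts you would run the text through your model's own tokenizer instead.

```python
import json

# The example object from above, pretty-printed as it is often sent.
data = {"user": {"name": "John", "role": "developer", "active": True}}
pretty = json.dumps(data, indent=2)

# Characters that exist only to express structure: braces, brackets,
# quotes, colons, commas, plus indentation and newlines.
STRUCTURAL = set('{}[]":,')
structural = sum(1 for ch in pretty if ch in STRUCTURAL or ch.isspace())
share = structural / len(pretty)

print(f"{len(pretty)} characters, {structural} structural ({share:.0%})")
# prints "83 characters, 48 structural (58%)"
```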

Alternative representations (context-dependent)

In LLM-focused systems, some developers explore more compact formats.

For example, simplified structured text (or formats like TOON, Token-Oriented Object Notation):

user:
  name: John
  role: developer
  active: true

Same information. Fewer tokens.

Per object the savings are modest, but across large payloads sent on every request they add up to a significant reduction in context size.
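A minimal Python sketch of producing the indented style shown above from a nested dict. The function name `to_indented_text` is hypothetical, and this is not a TOON encoder: it only handles nested dicts of scalar values.

```python
import json


def to_indented_text(obj: dict, level: int = 0) -> str:
    """Render nested dicts of scalars as 'key: value' lines, nesting by indentation.

    Does not handle lists, multi-line strings, or values that would need quoting.
    """
    pad = "  " * level
    lines = []
    for key, val in obj.items():
        if isinstance(val, dict):
            # Nested object: emit the key, then its children one level deeper.
            lines.append(f"{pad}{key}:")
            lines.append(to_indented_text(val, level + 1))
        else:
            # Write booleans and None the way JSON does (true/false/null).
            text = json.dumps(val) if isinstance(val, bool) or val is None else str(val)
            lines.append(f"{pad}{key}: {text}")
    return "\n".join(lines)


data = {"user": {"name": "John", "role": "developer", "active": True}}
compact = to_indented_text(data)
print(compact)
print(len(json.dumps(data, indent=2)), "vs", len(compact), "characters")
# prints "83 vs 51 characters" on the last line
```

Dropping quotes and braces relies on the model inferring types from context, which works for simple values like these but is one of the trade-offs discussed later.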

Other optimization strategies

1. Remove redundancy

Avoid repeating instructions in multiple forms.
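Paraphrased repetition needs a human eye, but literal repeats often creep in when prompts are assembled from several templates. A minimal Python sketch (hypothetical helper, assuming one instruction per line) that drops them:

```python
def dedupe_lines(prompt: str) -> str:
    """Drop lines that repeat an earlier line, ignoring case and extra spaces.

    Only catches literal repeats; paraphrased instructions still need review.
    Blank lines are always kept.
    """
    seen = set()
    kept = []
    for line in prompt.splitlines():
        key = " ".join(line.lower().split())  # normalize case and whitespace
        if key and key in seen:
            continue  # repeat of an earlier instruction
        seen.add(key)
        kept.append(line)
    return "\n".join(kept)


prompt = """Answer in JSON.
Be concise.
Answer in JSON.
  be   concise."""
print(dedupe_lines(prompt))
```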

2. Use structured prompts

Instead of long natural-language paragraphs, use labeled sections:

Task: ...
Context: ...
Output: ...
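A small helper keeps these sections consistent across calls. This Python sketch is hypothetical (`build_prompt` is not from any library) and simply fills the three labels shown above, skipping any that are empty so they cost no tokens:

```python
def build_prompt(task: str, context: str, output: str) -> str:
    """Assemble a prompt from labeled sections instead of free-form prose."""
    sections = {"Task": task, "Context": context, "Output": output}
    # Empty sections are omitted entirely rather than sent as blank labels.
    return "\n".join(
        f"{label}: {text.strip()}"
        for label, text in sections.items()
        if text.strip()
    )


prompt = build_prompt(
    task="Summarize the bug report.",
    context="User reports a 500 error on login after the v2 deploy.",
    output="One sentence, plain text.",
)
print(prompt)
```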

3. Limit unnecessary verbosity

LLMs do not need polite filler text.

4. Compress context intentionally

Sometimes restructuring data matters more than shortening text.
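One restructuring that pays off for lists of similar objects is writing the keys once as a header instead of repeating them in every record. A Python sketch using the standard `csv` module; the function name and sample records are made up for illustration:

```python
import csv
import io
import json


def to_table(records: list[dict]) -> str:
    """Write uniform records as a header row plus one row per record.

    Keys appear once instead of once per record. Assumes all records share
    the same keys; csv quotes any values containing commas or newlines.
    """
    if not records:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(records[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue().rstrip("\n")


# Hypothetical sample records.
users = [
    {"name": "John", "role": "developer", "active": True},
    {"name": "Ana", "role": "designer", "active": False},
]
table = to_table(users)
print(table)
print(len(json.dumps(users)), "vs", len(table), "characters")
```

The gap widens with every additional record, since the JSON version repeats every key each time while the table pays for them once.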

5. Manage your context window intentionally

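One of the biggest hidden costs in LLM applications is not the prompt itself but everything you keep inside the context window. With typical stateless chat APIs, the whole conversation is sent, and billed, again on every request, so cost grows with every turn.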

Developers often:

keep full chat history
resend large documents repeatedly
include irrelevant past interactions

All of this consumes tokens.

A better approach is to be intentional about what stays in context.

Instead of sending full history:

keep only relevant state
summarize previous messages
remove outdated or redundant information

Example:

Instead of:

"Here is the full conversation history..."

Use:

"Summary: user is building a TypeScript API with authentication."

This cuts token usage sharply while preserving the information the model needs, though any detail the summary omits is lost.
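The approach above can be sketched as a token budget: keep the newest messages that fit and replace everything older with a summary. In this Python sketch the `Message` type, the "system" role for the summary, and the roughly-4-characters-per-token estimate are all assumptions; the summary itself is supplied by the caller (for example from a separate summarization call), and a real tokenizer should replace the estimate when accuracy matters.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str     # e.g. "user" or "assistant"
    content: str


def estimate_tokens(text: str) -> int:
    # Rough rule of thumb (~4 characters per English token), not a tokenizer.
    return max(1, len(text) // 4)


def build_context(history: list[Message], summary: str, budget: int) -> list[Message]:
    """Keep the most recent messages that fit in `budget` estimated tokens
    and replace everything older with a caller-supplied summary."""
    summary_msg = Message("system", f"Summary: {summary}")
    used = estimate_tokens(summary_msg.content)
    kept: list[Message] = []
    for msg in reversed(history):  # walk from newest to oldest
        cost = estimate_tokens(msg.content)
        if used + cost > budget:
            break  # this message and everything older is dropped
        kept.append(msg)
        used += cost
    return [summary_msg] + list(reversed(kept))  # restore chronological order


history = [
    Message("user", "x" * 400),       # stand-ins for long earlier turns
    Message("assistant", "y" * 400),
    Message("user", "How do I add JWT auth?"),
]
ctx = build_context(
    history, "user is building a TypeScript API with authentication.", budget=150
)
for m in ctx:
    print(m.role, len(m.content))
```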

Trade-offs

Token optimization is not always free.

Less verbose prompts can lead to:

ambiguity
reduced clarity
lower robustness in edge cases

So there is always a balance to strike between clarity and efficiency.

Final thoughts

Token optimization is not about writing less.

It's about writing intentional context.

As LLM systems scale, efficiency becomes just as important as prompt quality.

Top comments (1)

Aoteman • Jun 30

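"Many people assume 'a more detailed prompt = better results',
but in reality, every extra word you write is burning your money.
I tested the same task:

Detailed prompt: 2,847 tokens, $0.085
Concise prompt: 412 tokens, $0.012
The quality of the results was almost the same."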