In the era of large language models, prompt size is power — but also a big cost.
The more context you provide, the more tokens you consume. And when working with long, structured prompts or repetitive query templates, that cost can escalate quickly.
TokenSpan isn’t a compression library. It’s a thought experiment: a different way of thinking about prompt optimization.
Can we reduce token usage by substituting repeated phrases with lightweight aliases?
Can we borrow ideas from dictionary encoding to constrain and compress the language we use to communicate with models?
This project explores those questions — not by building a full encoding system, but by probing whether such a technique might be useful, measurable, and worth pursuing.
💡 The Core Insight: Let the Model Do the Work
A crucial insight behind TokenSpan is recognizing where the real cost lies:
We pay for tokens, not computation.
So why not reduce the tokens we send, and let the model handle the substitution?
LLMs easily understand that §a means "Microsoft Designer", and we’re already paying for those tokens, so there’s no extra cost for that mental mapping.

Dictionary: §a → Microsoft Designer
Rewritten Prompt: How does §a compare to Canva?
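As a rough illustration, here is a minimal Python sketch of that substitution step. The dictionary, alias symbol, and function name are hypothetical examples, not code from TokenSpan itself:

```python
def apply_dictionary(prompt: str, dictionary: dict[str, str]) -> str:
    """Replace each known phrase with its alias, longest phrases first."""
    for phrase, alias in sorted(dictionary.items(), key=lambda kv: -len(kv[0])):
        prompt = prompt.replace(phrase, alias)
    return prompt


dictionary = {"Microsoft Designer": "§a"}
header = "\n".join(f"{alias} → {phrase}" for phrase, alias in dictionary.items())

prompt = "How does Microsoft Designer compare to Canva?"
encoded = apply_dictionary(prompt, dictionary)

# The model receives the dictionary header plus the shortened prompt:
print(f"Dictionary:\n{header}\n\nPrompt: {encoded}")
# Dictionary:
# §a → Microsoft Designer
#
# Prompt: How does §a compare to Canva?
```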
🔁 Scaling with Reusable Dictionaries
If you were to build a system around this idea, the best strategy wouldn't be to re-send the dictionary with every prompt. Instead:
- Build the dictionary once
- Embed it in the system prompt or long-term memory
- Reuse it across multiple interactions
This only makes sense when dealing with large or repetitive prompts, where the cost of setting up the dictionary is outweighed by the long-term savings.
By encouraging simpler, more structured language, your application can:
- Reduce costs
- Improve consistency
- Handle diverse user inputs more efficiently
After all, we’re often asking the same things — just in different ways.
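For instance, a chat-style application could pin the dictionary into the system message once and keep reusing it across turns. The snippet below is a hedged sketch of that pattern, using the common role/content message format; the constant and helper names are illustrative, not part of TokenSpan:

```python
DICTIONARY = {
    "Microsoft Designer": "§a",
    "prompt optimization": "§b",
}

# Built once, reused for the whole conversation.
SYSTEM_PROMPT = (
    "Expand these aliases before answering:\n"
    + "\n".join(f"{alias} → {phrase}" for phrase, alias in DICTIONARY.items())
)

def encode(prompt: str) -> str:
    """Shorten a user prompt using the shared dictionary."""
    for phrase, alias in DICTIONARY.items():
        prompt = prompt.replace(phrase, alias)
    return prompt

# The dictionary lives in the system prompt; each user turn only pays for aliases.
messages = [{"role": "system", "content": SYSTEM_PROMPT}]
for user_turn in [
    "How does Microsoft Designer handle templates?",
    "Is prompt optimization worth it for short prompts?",
]:
    messages.append({"role": "user", "content": encode(user_turn)})
```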
📐 The Formula
What if we replaced a 2-token phrase like "Microsoft Designer" with an alias like §a?

Assume the phrase appears X times:

- Original Cost: 2 × X tokens
- Compressed Cost: X (alias usage) + 4 (dictionary overhead)

Savings Formula:

Saved = (2 × X) - (X + 4)

Example: "Microsoft Designer" appears 15 times.

Saved = (2 × 15) - (15 + 4) = 30 - 19 = 11 tokens saved
That’s just one phrase — real prompts often contain dozens of reusable patterns.
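Expressed as a tiny helper (the function name is made up, and it assumes a 1-token alias, a 2-token phrase, and the 4-token dictionary entry described below):

```python
def tokens_saved(occurrences: int, phrase_tokens: int = 2,
                 alias_tokens: int = 1, entry_overhead: int = 4) -> int:
    """Saved = original cost - (alias usage + dictionary entry overhead)."""
    original = phrase_tokens * occurrences
    compressed = alias_tokens * occurrences + entry_overhead
    return original - compressed


print(tokens_saved(15))  # 11 -> matches the worked example above
print(tokens_saved(4))   # 0  -> break-even; savings only start at 5 occurrences
```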
🎯 Why Focus on Two-Token Phrases?
This experiment targets two-token phrases for a reason:
- ✅ Single tokens can’t be compressed (the alias itself already costs a token)
- ✅ Longer phrases save more per substitution but occur less often
- ✅ Two-token phrases hit the sweet spot: frequent and compressible
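One way to check whether a prompt actually contains such phrases is to scan word bigrams and keep the ones that encode to exactly two tokens. The sketch below uses tiktoken with the cl100k_base encoding as an assumption about your tokenizer; it ignores leading-space effects in BPE tokenizers, so treat it as a rough heuristic rather than an exact count:

```python
from collections import Counter

import tiktoken  # assumption: you tokenize with OpenAI's tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def two_token_bigrams(text: str) -> Counter:
    """Count word pairs that encode to exactly two tokens."""
    words = text.split()
    counts = Counter()
    for first, second in zip(words, words[1:]):
        bigram = f"{first} {second}"
        if len(enc.encode(bigram)) == 2:
            counts[bigram] += 1
    return counts

prompt = "Microsoft Designer makes templates. Microsoft Designer exports assets."
print(two_token_bigrams(prompt).most_common(3))
```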
🧾 Understanding the Overhead
Each dictionary entry adds 4 tokens:
- 1 token for the replacement code (e.g. §a)
- 1 token for the separator (e.g. →)
- 2 tokens for the original phrase
You only start saving tokens once a phrase appears 5 or more times.
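The 4-token figure is an assumption rather than a guarantee: whether §a and → really cost one token each depends on the tokenizer. A quick check like the one below (again assuming tiktoken and the cl100k_base encoding) makes the overhead measurable instead of estimated:

```python
import tiktoken  # assumption: cl100k_base; swap in your model's actual tokenizer

enc = tiktoken.get_encoding("cl100k_base")

# Print the real token cost of each piece of a dictionary entry.
for text in ["§a", "→", "Microsoft Designer", "§a → Microsoft Designer"]:
    print(f"{text!r}: {len(enc.encode(text))} token(s)")
```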
📊 Real-World Results
Using a raw prompt of 8,019 tokens:
After substitution → 7,138 tokens
Savings: 881 tokens (~11.0%)
The model continued performing correctly with the encoded prompt.
🧠 Conclusion
Natural language gives users the freedom to communicate in flexible, intuitive ways.
But that freedom comes at a cost:
- 🔄 Repetition
- ❌ Inaccuracy from phrasing variations
- 💰 Higher usage costs
If applications constrained the vocabulary used for most interactions, they could:
- Lower token usage
- Encourage more structured prompts
- Improve response consistency
🧪 Lessons from Tokenization Quirks
Here are some interesting quirks noticed during development:
- Common Phrases = Fewer Tokens: "the" often becomes a single token.
- Capitalization Can Split Words: "Designer" vs. "designer" are treated differently by tokenizers.
- Rare Words Get Chopped Up: "visioneering" might tokenize into "vision" + "eering".
- Numbers Don’t Tokenize Nicely: "123456" can break into "123" + "456".
- Digits as Aliases? Risky: using "0" or "1" as shortcuts often backfires; better to use symbols like § or @.
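These quirks are easy to reproduce yourself. The snippet below prints how each example from the list actually breaks apart; it assumes tiktoken with the cl100k_base encoding, and the exact splits will differ for other tokenizers:

```python
import tiktoken  # assumption: cl100k_base; splits vary by encoding/model

enc = tiktoken.get_encoding("cl100k_base")

# Show the token count and the decoded pieces for each sample string.
for sample in ["the", "Designer", "designer", "visioneering", "123456", "§", "@"]:
    token_ids = enc.encode(sample)
    pieces = [enc.decode([tid]) for tid in token_ids]
    print(f"{sample!r}: {len(token_ids)} token(s) -> {pieces}")
```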
🔬 Try It Yourself
📍 GitHub: alexsanderhamir/TokenSpan
💬 Contributions & feedback welcome!
TokenSpan is a thought experiment in prompt optimization.
The savings are real — but the real value is in rethinking how we balance cost, compression, and communication with LLMs.