In the era of large language models, prompt size is power — but also a big cost.
The more context you provide, the more tokens you consume. And when working with long, structured prompts or repetitive query templates, that cost can escalate quickly.
TokenSpan isn’t a compression library. It’s a thought experiment: a different way of thinking about prompt optimization.
Can we reduce token usage by substituting repeated phrases with lightweight aliases?
Can we borrow ideas from dictionary encoding to constrain and compress the language we use to communicate with models?
This project explores those questions — not by building a full encoding system, but by probing whether such a technique might be useful, measurable, and worth pursuing.
💡 The Core Insight: Let the Model Do the Work
A crucial insight behind TokenSpan is recognizing where the real cost lies:
We pay for tokens, not computation.
So why not reduce the tokens we send, and let the model handle the substitution?
LLMs easily understand that §a means "Microsoft Designer", and we’re already paying for those tokens, so there’s no extra cost for that mental mapping.

Dictionary: §a → Microsoft Designer
Rewritten Prompt: How does §a compare to Canva?
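As a rough illustration, here is a minimal Python sketch of that substitution step. The dictionary, alias symbol, and function name are hypothetical examples, not code from TokenSpan itself:

```python
def apply_dictionary(prompt: str, dictionary: dict[str, str]) -> str:
    """Replace each known phrase with its alias, longest phrases first."""
    for phrase, alias in sorted(dictionary.items(), key=lambda kv: -len(kv[0])):
        prompt = prompt.replace(phrase, alias)
    return prompt


dictionary = {"Microsoft Designer": "§a"}
header = "\n".join(f"{alias} → {phrase}" for phrase, alias in dictionary.items())

prompt = "How does Microsoft Designer compare to Canva?"
encoded = apply_dictionary(prompt, dictionary)

# The model receives the dictionary header plus the shortened prompt:
print(f"Dictionary:\n{header}\n\nPrompt: {encoded}")
# Dictionary:
# §a → Microsoft Designer
#
# Prompt: How does §a compare to Canva?
```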
🔁 Scaling with Reusable Dictionaries
If you were to build a system around this idea, the best strategy wouldn't be to re-send the dictionary with every prompt. Instead:
- Build the dictionary once
- Embed it in the system prompt or long-term memory
- Reuse it across multiple interactions
This only makes sense when dealing with large or repetitive prompts, where the cost of setting up the dictionary is outweighed by the long-term savings.
By encouraging simpler, more structured language, your application can:
- Reduce costs
- Improve consistency
- Handle diverse user inputs more efficiently
After all, we’re often asking the same things — just in different ways.
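For instance, a chat-style application could pin the dictionary into the system message once and keep reusing it across turns. The snippet below is a hedged sketch of that pattern, using the common role/content message format; the constant and helper names are illustrative, not part of TokenSpan:

```python
DICTIONARY = {
    "Microsoft Designer": "§a",
    "prompt optimization": "§b",
}

# Built once, reused for the whole conversation.
SYSTEM_PROMPT = (
    "Expand these aliases before answering:\n"
    + "\n".join(f"{alias} → {phrase}" for phrase, alias in DICTIONARY.items())
)

def encode(prompt: str) -> str:
    """Shorten a user prompt using the shared dictionary."""
    for phrase, alias in DICTIONARY.items():
        prompt = prompt.replace(phrase, alias)
    return prompt

# The dictionary lives in the system prompt; each user turn only pays for aliases.
messages = [{"role": "system", "content": SYSTEM_PROMPT}]
for user_turn in [
    "How does Microsoft Designer handle templates?",
    "Is prompt optimization worth it for short prompts?",
]:
    messages.append({"role": "user", "content": encode(user_turn)})
```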
📐 The Formula
What if we replaced a 2-token phrase like "Microsoft Designer" with an alias like §a?

Assume the phrase appears X times:

- Original Cost: 2 × X tokens
- Compressed Cost: X (alias usage) + 4 (dictionary overhead)

Savings Formula:

Saved = (2 × X) - (X + 4)

Example: "Microsoft Designer" appears 15 times.

Saved = (2 × 15) - (15 + 4) = 30 - 19 = 11 tokens saved
That’s just one phrase — real prompts often contain dozens of reusable patterns.
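Expressed as a tiny helper (the function name is made up, and it assumes a 1-token alias, a 2-token phrase, and the 4-token dictionary entry described below):

```python
def tokens_saved(occurrences: int, phrase_tokens: int = 2,
                 alias_tokens: int = 1, entry_overhead: int = 4) -> int:
    """Saved = original cost - (alias usage + dictionary entry overhead)."""
    original = phrase_tokens * occurrences
    compressed = alias_tokens * occurrences + entry_overhead
    return original - compressed


print(tokens_saved(15))  # 11 -> matches the worked example above
print(tokens_saved(4))   # 0  -> break-even; savings only start at 5 occurrences
```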
🎯 Why Focus on Two-Token Phrases?
This experiment targets two-token phrases for a reason:
- ✅ Single tokens can’t be compressed (the alias itself already costs a token)
- ✅ Longer phrases save more per substitution but occur less often
- ✅ Two-token phrases hit the sweet spot: frequent and compressible
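One way to check whether a prompt actually contains such phrases is to scan word bigrams and keep the ones that encode to exactly two tokens. The sketch below uses tiktoken with the cl100k_base encoding as an assumption about your tokenizer; it ignores leading-space effects in BPE tokenizers, so treat it as a rough heuristic rather than an exact count:

```python
from collections import Counter

import tiktoken  # assumption: you tokenize with OpenAI's tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def two_token_bigrams(text: str) -> Counter:
    """Count word pairs that encode to exactly two tokens."""
    words = text.split()
    counts = Counter()
    for first, second in zip(words, words[1:]):
        bigram = f"{first} {second}"
        if len(enc.encode(bigram)) == 2:
            counts[bigram] += 1
    return counts

prompt = "Microsoft Designer makes templates. Microsoft Designer exports assets."
print(two_token_bigrams(prompt).most_common(3))
```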
🧾 Understanding the Overhead
Each dictionary entry adds 4 tokens:
- 1 token for the replacement code (e.g. §a)
- 1 token for the separator (e.g. →)
- 2 tokens for the original phrase
You only start saving tokens once a phrase appears 5 or more times.
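The 4-token figure is an assumption rather than a guarantee: whether §a and → really cost one token each depends on the tokenizer. A quick check like the one below (again assuming tiktoken and the cl100k_base encoding) makes the overhead measurable instead of estimated:

```python
import tiktoken  # assumption: cl100k_base; swap in your model's actual tokenizer

enc = tiktoken.get_encoding("cl100k_base")

# Print the real token cost of each piece of a dictionary entry.
for text in ["§a", "→", "Microsoft Designer", "§a → Microsoft Designer"]:
    print(f"{text!r}: {len(enc.encode(text))} token(s)")
```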
📊 Real-World Results
Using a raw prompt of 8,019 tokens:
After substitution → 7,138 tokens
Savings: 881 tokens (~11.0%)
The model continued performing correctly with the encoded prompt.
🧠 Conclusion
Natural language gives users the freedom to communicate in flexible, intuitive ways.
But that freedom comes at a cost:
- 🔄 Repetition
- ❌ Inaccuracy from phrasing variations
- 💰 Higher usage costs
If applications constrained the vocabulary used for most interactions, they could:
- Lower token usage
- Encourage more structured prompts
- Improve response consistency
🧪 Lessons from Tokenization Quirks
Here are some interesting quirks noticed during development:
- Common Phrases = Fewer Tokens: "the" often becomes a single token.
- Capitalization Can Split Words: "Designer" vs. "designer" are treated differently by tokenizers.
- Rare Words Get Chopped Up: "visioneering" might tokenize into "vision" + "eering".
- Numbers Don’t Tokenize Nicely: "123456" can break into "123" + "456".
- Digits as Aliases? Risky: using "0" or "1" as shortcuts often backfires; better to use symbols like § or @.
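These quirks are easy to reproduce yourself. The snippet below prints how each example from the list actually breaks apart; it assumes tiktoken with the cl100k_base encoding, and the exact splits will differ for other tokenizers:

```python
import tiktoken  # assumption: cl100k_base; splits vary by encoding/model

enc = tiktoken.get_encoding("cl100k_base")

# Show the token count and the decoded pieces for each sample string.
for sample in ["the", "Designer", "designer", "visioneering", "123456", "§", "@"]:
    token_ids = enc.encode(sample)
    pieces = [enc.decode([tid]) for tid in token_ids]
    print(f"{sample!r}: {len(token_ids)} token(s) -> {pieces}")
```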
🔬 Try It Yourself
📍 GitHub: alexsanderhamir/TokenSpan
💬 Contributions & feedback welcome!
TokenSpan is a thought experiment in prompt optimization.
The savings are real — but the real value is in rethinking how we balance cost, compression, and communication with LLMs.