What If You Compressed Your Prompts Into Chinese Emoji? (A Token-Saving Thought Experiment)

Mei Hammer

Or: what happens when a frustrated developer thinks too hard about token costs


I keep hitting token limits.

Not occasionally, but consistently. Every time I think I've optimized enough, the bill creeps up or the context window fills mid-task. So I started thinking about creative ways to cut token usage. What started as a reasonable question turned into something genuinely unhinged.

The Observation

Somewhere in a Reddit thread about LLM cost optimization, someone claimed that Chinese text uses 30–50% fewer tokens than equivalent English for the same semantic content.

My first instinct: that can't be right. Chinese characters are complex; surely they cost more?

Turns out the intuition is wrong. Modern tokenizers map common Chinese characters to roughly 1 token per character. English looks cheaper per word, but English needs articles (a, the), prepositions (of, in, to), and filler words that carry almost no meaning. Chinese skips all of that.

Same idea. Fewer tokens. The density wins.
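
If you want to sanity-check the density claim yourself, a few lines with OpenAI's tiktoken library will do it. The sentence pair below is made up for illustration; the exact ratio depends on the tokenizer and the text you feed it:

# Rough sanity check of the "Chinese is denser" claim.
# Results vary by tokenizer and by sentence; this is an illustration, not a benchmark.
import tiktoken

enc = tiktoken.get_encoding("o200k_base")  # encoding used by recent OpenAI models

english = "Please summarize this proposal so we can discuss it tomorrow."
chinese = "请总结此方案，明日讨论。"  # same request, written in Chinese

print("English tokens:", len(enc.encode(english)))
print("Chinese tokens:", len(enc.encode(chinese)))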

The Idea That Got Out of Hand

Once I accepted this was real, my brain immediately went somewhere dangerous:

What if I translated prompts to Chinese before sending them to the expensive model?

English prompt
    ↓  [cheap local model — translate to Chinese]
Chinese prompt  ← ~40% fewer tokens?
    ↓  [expensive frontier LLM]
Chinese response
    ↓  [cheap local model — translate back]
English response

Local models (Ollama running Qwen or DeepSeek) are decent at translation and run on your own hardware, so there's no API cost. The translation overhead is real, but for batch or async workloads the intuition is that the savings on the frontier model should cover it.
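
For what it's worth, the local hop is only a few lines. Here's a rough sketch against Ollama's REST API; the model name and prompt wording are placeholders, not recommendations:

# Sketch of the cheap local translation hop via Ollama's REST API.
# Assumes an Ollama server on its default port with a Qwen model already pulled.
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"

def translate(text: str, target_language: str, model: str = "qwen2.5:7b") -> str:
    prompt = (
        f"Translate the following text to {target_language}. "
        f"Output only the translation.\n\n{text}"
    )
    resp = requests.post(
        OLLAMA_URL,
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"].strip()

# English in, Chinese out: the compressed prompt that goes to the frontier model.
compressed_prompt = translate("Summarize this proposal; we'll discuss it tomorrow.", "Chinese")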

I haven't benchmarked this properly. But I like where it's going.

Then It Got Weirder

Still in mad-scientist mode: even within Chinese text, emotional expressions could be swapped for emoji. 直冒冷汗 (breaking into cold sweat) is four characters, roughly four tokens; 😅 is typically a single token. For high-frequency filler phrases, a lookup table of emoji substitutions could shave off a bit more.

The model would understand it perfectly; it's been trained on the entire internet, emoji included.
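
The "emoji layer" itself would be nothing fancier than a lookup table. The mapping below is invented for illustration; a real one would come from frequency analysis of your own prompts:

# Toy emoji substitution layer: swap high-frequency Chinese phrases for emoji.
# The table is illustrative only, not a vetted mapping.
EMOJI_TABLE = {
    "直冒冷汗": "😅",  # "breaking into cold sweat"
    "好主意": "💡",    # "good idea"
    "明天": "📅",      # "tomorrow"
}

def emojify(text: str) -> str:
    for phrase, emoji in EMOJI_TABLE.items():
        text = text.replace(phrase, emoji)
    return text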

So the full pipeline becomes:

English prompt
    ↓ translate to Chinese
    ↓ replace common phrases with emoji
    ↓ send to LLM
Response (also compressed)
    ↓ translate back
English response
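
Gluing the pieces together, the whole round trip is a handful of lines. This sketch reuses translate() and emojify() from above, and call_frontier_model is a placeholder for whatever expensive API you're actually billing against:

# End-to-end sketch composing translate() and emojify() from the earlier snippets.
# call_frontier_model is a stand-in for the expensive API call you're trying to shrink.
def compressed_call(english_prompt: str, call_frontier_model) -> str:
    chinese = translate(english_prompt, "Chinese")  # cheap local hop in
    compact = emojify(chinese)                      # optional emoji layer
    reply = call_frontier_model(compact)            # frontier model sees fewer tokens
    return translate(reply, "English")              # cheap local hop back out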

At this point your logs look like:

"吾 😅 此方案 💡 明日 📅 議之"

Roughly: "I 😅 this proposal 💡 tomorrow 📅 discuss it." Good luck explaining that in a postmortem.

Someone Already Had Half This Idea

I stumbled across caveman, a Claude Code plugin that makes the AI respond in caveman-speak to cut output tokens by ~75%. It even has a 文言文 (Classical Chinese) mode, because Classical Chinese might be the most information-dense written language ever invented.

Their angle is output compression. This pipeline idea is input compression. Stack them and, theoretically, you're hitting both ends.

Nobody seems to have done the emoji layer yet. That part might be mine to ruin.

Would This Actually Work?

Honestly — no idea. The translation quality for technical prompts with domain-specific terms could drift. The latency of two extra hops would hurt interactive use cases. And the debugging experience would be truly cursed.

But for the right workload? Batch jobs, background agents, high-volume async tasks where you're paying per token at scale: the logic isn't crazy.

Sometimes the most absurd idea is just one benchmark away from being a real project.


Building agent-chat-gateway — open source infrastructure for connecting AI agents to team chat. Powered and highly motivated by tokens. 🔨
