I live in China and kept running into the same problem: I'd see Chinese text
I couldn't fully read and needed to quickly see the pronunciation (pinyin)
above each character.
Every tool I found did at least one of these:
- Hit a paywall after 5 uses
- Required an account
- Sent your text to a server
- Had a UI straight out of 2009
So I built one myself. Single HTML file. Fully offline after first load.
Nothing sent anywhere.
How it works
The dictionary
The core is a lookup table of ~2,500 characters, embedded directly in the JS:
const raw = `的:de:0:1|一:yī:1:1|是:shì:4:1|了:le:0:1|我:wǒ:3:1...`
// format: character : pinyin : tone(1-4, 0=neutral) : hsk_level(1-6)
I store it as a pipe-delimited string and parse it once on load.
It covers about 97% of character occurrences in typical written Chinese. Characters outside the dictionary
show a "?"; they are rare in normal text.
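A minimal sketch of the parse-once step, assuming the `char:pinyin:tone:hsk` format shown above. The names `parseDictionary` and `lookup` are hypothetical (not taken from the tool's source), and the sample string is a short excerpt in the same format, not the real table.

```javascript
// Hypothetical sample in the same "char:pinyin:tone:hsk|..." format as `raw` above.
const sampleRaw = "的:de:0:1|一:yī:1:1|是:shì:4:1|了:le:0:1|我:wǒ:3:1";

// Parse the pipe-delimited string once into a Map keyed by character,
// so every later lookup is a single Map.get() call.
function parseDictionary(raw) {
  const dict = new Map();
  for (const record of raw.split("|")) {
    if (!record) continue; // tolerate a trailing "|"
    const [char, pinyin, tone, hsk] = record.split(":");
    dict.set(char, { pinyin, tone: Number(tone), hsk: Number(hsk) });
  }
  return dict;
}

// Return the entry for a character, or a "?" placeholder when it is missing.
// hsk: 0 is an assumed marker for "not in the table".
function lookup(dict, char) {
  return dict.get(char) ?? { pinyin: "?", tone: 0, hsk: 0 };
}

const dict = parseDictionary(sampleRaw);
console.log(lookup(dict, "我").pinyin); // wǒ
console.log(lookup(dict, "龘").pinyin); // ?
```

Parsing into a `Map` up front means the per-character cost while annotating is a hash lookup rather than a string search through the raw table.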
Ruby annotations
HTML has a built-in <ruby> tag for exactly this:
<ruby>
<span class="char">中</span>
<rt>zhōng</rt>
</ruby>
The rt element renders above the base character. No canvas tricks,
no absolute positioning — just semantic HTML doing what it was designed for.
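To annotate a whole string, each character can be wrapped in the `<ruby>` structure shown above. This is a hedged sketch: `toRubyHtml` and the inline `pinyinOf` table are illustrative assumptions, and passing non-Han characters (punctuation, Latin text) through unannotated is one plausible choice, not the tool's documented behavior.

```javascript
// Illustrative mini-dictionary; the real tool uses its ~2,500-entry table.
const pinyinOf = new Map([["中", "zhōng"], ["文", "wén"]]);

// Escape text before inserting it into HTML.
function escapeHtml(s) {
  return s.replace(/[&<>"']/g, (c) => ({
    "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;",
  })[c]);
}

// Matches a single CJK ideograph (Unicode Script=Han).
const HAN = /\p{Script=Han}/u;

function toRubyHtml(text) {
  let html = "";
  for (const ch of text) { // for...of iterates by code point, not UTF-16 unit
    if (HAN.test(ch)) {
      const py = pinyinOf.get(ch) ?? "?"; // "?" for characters not in the table
      html += `<ruby><span class="char">${escapeHtml(ch)}</span><rt>${escapeHtml(py)}</rt></ruby>`;
    } else {
      html += escapeHtml(ch); // punctuation and Latin text pass through
    }
  }
  return html;
}

console.log(toRubyHtml("中文!"));
```

Iterating with `for...of` matters for rare characters outside the Basic Multilingual Plane, which a plain index loop would split into two halves.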
Tone colors
Each dictionary entry already stores its tone number, so CSS classes handle
the rest:
.tone-on rt.t1 { color: #ff4d4d; } /* 1st tone — red */
.tone-on rt.t2 { color: #ff9900; } /* 2nd tone — orange */
.tone-on rt.t3 { color: #22c55e; } /* 3rd tone — green */
.tone-on rt.t4 { color: #a78bfa; } /* 4th tone — purple */
.tone-on rt.t0 { color: #8899aa; } /* neutral — grey */
Toggle the tone-on class on the container and every tone color updates instantly,
without re-rendering anything.
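Because the tone number sits next to the pinyin in each entry, the markup only has to emit it as a class on each `<rt>`; the colors live entirely in CSS. A hedged sketch: `rtWithTone` and `setToneColors` are hypothetical names, while the `t0` to `t4` and `tone-on` class names follow the CSS above.

```javascript
// Emit an <rt> whose class encodes the tone: t1..t4, or t0 for neutral.
// The pinyin comes from the trusted built-in dictionary, so it is not escaped here.
function rtWithTone(pinyin, tone) {
  if (!Number.isInteger(tone) || tone < 0 || tone > 4) {
    throw new RangeError(`tone must be 0-4, got ${tone}`);
  }
  return `<rt class="t${tone}">${pinyin}</rt>`;
}

// Toggling one class on the container recolors every <rt> at once;
// no annotation markup is regenerated. (Browser-only: uses the DOM.)
function setToneColors(container, on) {
  container.classList.toggle("tone-on", on);
}

console.log(rtWithTone("zhōng", 1)); // <rt class="t1">zhōng</rt>
```

The design choice is that tone classes are always present in the markup; only the container's `tone-on` class decides whether the `.tone-on rt.tN` rules apply.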
HSK level highlight
Same pattern: a CSS class on the container, plus a level class on
each character span:
.hsk-on .char-span.hsk1 { color: #7ee8bb; }
.hsk-on .char-span.hsk2 { color: #60d4b0; }
/* ... */
.hsk-on .char-span.unk { color: #6a7a9a; } /* unknown */
This lets learners instantly see which characters are beginner vs.
advanced vs. completely outside the HSK vocabulary list.
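The per-character class can be derived from the HSK level stored in each dictionary entry. A sketch assuming levels 1 to 6 map to `hsk1` to `hsk6` and anything else (including characters missing from the table) maps to `unk`, matching the selectors above; `hskClass` and `charSpan` are hypothetical names.

```javascript
// Map an HSK level (1-6) to its CSS class; anything else is "unk".
function hskClass(level) {
  return Number.isInteger(level) && level >= 1 && level <= 6 ? `hsk${level}` : "unk";
}

// Wrap one character in the span that the .hsk-on rules target.
function charSpan(char, level) {
  return `<span class="char-span ${hskClass(level)}">${char}</span>`;
}

console.log(charSpan("我", 1)); // <span class="char-span hsk1">我</span>
console.log(charSpan("龘", undefined)); // <span class="char-span unk">龘</span>
```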
The offline constraint
I wanted this to work with zero network after the first load — useful
if you're on a plane with a downloaded article, or in China where
connectivity to foreign tools can be unreliable.
Everything is embedded: the dictionary, the CSS, the JS. The HTML file
is ~180KB total. Download once, use forever.
What I learned
<ruby> line-height is annoying. Keeping the ruby annotations from
blowing up the line spacing took some CSS gymnastics:
ruby {
display: inline-flex;
flex-direction: column-reverse;
align-items: center;
vertical-align: bottom;
line-height: 1;
}
Polyphonic characters are a real problem. Many Chinese characters
have multiple pronunciations depending on context (e.g., 行 = xíng or
háng). I used the most common reading for each. A proper solution would
need NLP context analysis — out of scope for a single HTML file.
2,500 characters covers more than you'd think. The most frequent
2,500 Chinese characters account for ~97% of text in newspapers and
books. The long tail exists but it's genuinely rare.
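The coverage figure is a ratio of character occurrences (not distinct characters), so it is easy to check against your own reading material. A hedged sketch: `coverage` is a hypothetical helper that counts only Han characters (punctuation, digits, and Latin letters are skipped), and the tiny `known` set stands in for the real 2,500-character table.

```javascript
// Fraction of Han-character occurrences in `text` that appear in `known`.
function coverage(text, known) {
  let total = 0;
  let hit = 0;
  for (const ch of text) {
    if (!/\p{Script=Han}/u.test(ch)) continue; // skip punctuation, digits, Latin
    total += 1;
    if (known.has(ch)) hit += 1;
  }
  // Treat text with no Han characters at all as fully covered.
  return total === 0 ? 1 : hit / total;
}

const known = new Set(["我", "是", "中", "国", "人"]);
console.log(coverage("我是中国人。", known)); // 1
console.log(coverage("我是龘人", known)); // 0.75
```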
Also built
This is part of a small suite of offline Chinese learning tools I've
been building:
- 📖 Chinese Reading Lab — 10 historical stories in Chinese (HSK4–6) with comprehension quiz
- 🐉 Chengyu Stories — 20 classic idioms with origin stories + scenario quiz
- 🃏 Mandarin Flashcards — HSK1–3 spaced repetition
- ✍️ Chinese Writing Toolkit — model essays for 11 HSK writing types
All single HTML files, all free: daligao.github.io/learn-chinese-free
Source for the pinyin annotator: github.com/daligao/pinyin-annotator
Questions welcome — especially if you know a clean way to handle
polyphonic characters without a server.