If you've ever tried to build an OCR system that handles Chinese, Japanese, or Korean text, you know the pain. Latin-script OCR has been "good enough" for years, but CJK languages? Still a minefield in 2026.
I've been working on Screen Translator, an Android app that uses a floating bubble to OCR and translate on-screen text in real time. Building it forced me to confront every ugly corner of CJK text recognition. Here's what I learned.
The Character Set Problem
English has 26 letters. Chinese has tens of thousands of characters in total (the GB18030 standard encodes over 70,000 CJK ideographs), though only a few thousand see everyday use. Japanese mixes three scripts — Hiragana, Katakana, and Kanji — sometimes in the same sentence. Korean Hangul has 11,172 possible syllable blocks.
For an OCR engine, this means:
- Massive classification space: Instead of distinguishing ~70 characters (upper/lower + digits + punctuation), you're classifying among tens of thousands
- Visually similar characters: 土/士, 末/未, 己/已/巳 — these pairs differ only in the length or position of a single stroke
- Mixed scripts: A Japanese game UI might show "HP回復アイテム" — that's Latin, Kanji, and Katakana in one string
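To get a feel for how mixed such strings are, here's a minimal standard-library sketch that tags each character's script from its Unicode name. This is a heuristic for illustration, not a production script detector:

```python
import unicodedata

def script_of(ch):
    """Rough per-character script tag derived from the Unicode character name.
    A heuristic: edge cases like the prolonged sound mark 'ー' are not handled."""
    name = unicodedata.name(ch, "")
    if "HIRAGANA" in name:
        return "hiragana"
    if "KATAKANA" in name:
        return "katakana"
    if "CJK UNIFIED" in name:
        return "kanji"
    if "HANGUL" in name:
        return "hangul"
    if ch.isascii():
        return "latin"
    return "other"

print([script_of(c) for c in "HP回復アイテム"])
# → ['latin', 'latin', 'kanji', 'kanji', 'katakana', 'katakana', 'katakana', 'katakana']
```

A single eight-character UI string spans three scripts, which is exactly what a CJK recognizer has to cope with.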
Why Standard OCR Pipelines Struggle
Most OCR pipelines follow: Detection → Recognition → Post-processing.
For CJK, each step has unique failure modes:
Detection
CJK text can be vertical or horizontal. Game UIs love vertical text. Manga reads right-to-left. Most detection models are trained on horizontal Latin text and simply miss vertical CJK layouts.
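As a rough illustration of the orientation problem, reading direction can be guessed from how detected character boxes spread. This is a toy heuristic for exposition only; real detectors predict orientation directly:

```python
def reading_direction(boxes):
    """Guess line orientation from character boxes [(x, y, w, h), ...]:
    if the centers spread more vertically than horizontally, treat the
    run as vertical. A simple heuristic, not a detector."""
    xs = [x + w / 2 for x, y, w, h in boxes]
    ys = [y + h / 2 for x, y, w, h in boxes]
    dx = max(xs) - min(xs)
    dy = max(ys) - min(ys)
    return "vertical" if dy > dx else "horizontal"

# Three characters stacked in a column, as in a manga speech bubble:
print(reading_direction([(100, 10, 20, 20), (100, 40, 20, 20), (101, 70, 20, 20)]))
# → vertical
```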
Recognition
The standard CRNN (CNN + RNN + CTC) architecture works well for Latin scripts but struggles with CJK because:
# Simplified comparison
Latin: fixed-width character assumption mostly works
CJK: character width varies dramatically
  Full-width: ＡＢＣ (each character takes 2x the space)
  Half-width: ABC
  Mixed: 「Hello世界」
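Python's unicodedata module exposes exactly this width distinction via the Unicode East Asian Width property. A sketch of a display-width estimate (it treats Unicode "ambiguous" characters as narrow, which is a simplification):

```python
import unicodedata

def display_cells(text):
    """Approximate rendered width in cells: fullwidth ('F') and wide ('W')
    characters take two cells, everything else one."""
    return sum(2 if unicodedata.east_asian_width(ch) in ("F", "W") else 1
               for ch in text)

print(display_cells("Hello"))         # → 5
print(display_cells("世界"))          # → 4
print(display_cells("「Hello世界」"))  # → 13 (CJK brackets are wide too)
```

A fixed-stride sliding window over the image breaks as soon as a string like the last one mixes 1-cell and 2-cell glyphs.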
The CTC (Connectionist Temporal Classification) loss function assumes characters appear in sequence without overlap. CJK characters in stylized fonts (especially in games and manga) often break this assumption.
Post-processing
For English, you can use dictionary lookup and language models to fix OCR errors. "teh" → "the" is trivial. But for Chinese, a single wrong character can completely change meaning:
- 大人 (adult) vs 犬人 (not a word — but OCR might produce it)
- Context-based correction requires much larger language models
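A minimal sketch of confusion-set correction for Chinese OCR output. The confusion pairs and vocabulary below are tiny hand-made samples for illustration, not real data; a production system would use a large lexicon plus a language model:

```python
# Visually confusable character pairs and a toy vocabulary (illustrative only).
CONFUSABLE = {"犬": ["大", "太"], "未": ["末"], "末": ["未"], "士": ["土"], "土": ["士"]}
VOCAB = {"大人", "未来", "週末", "土地"}

def correct(word):
    """Return the word if known; otherwise try swapping each character for a
    visually confusable alternative and return the first known word found."""
    if word in VOCAB:
        return word
    for i, ch in enumerate(word):
        for alt in CONFUSABLE.get(ch, []):
            candidate = word[:i] + alt + word[i + 1:]
            if candidate in VOCAB:
                return candidate
    return word  # give up: surfacing raw OCR beats guessing further

print(correct("犬人"))  # → 大人
```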
What Actually Works in 2026
After months of iteration, here's what I found effective:
1. Multi-scale text detection
I use a CRAFT-like detector with explicit vertical-text support. The training data must include vertical Japanese manga panels and calligraphy-style Chinese game text.
2. Attention-based recognition over CTC
Transformer-based recognition models handle variable-width CJK characters much better than CTC-based approaches. The attention mechanism naturally handles the alignment problem.
3. Script-aware preprocessing
Before feeding text to the recognizer, detect the dominant script and adjust:
def preprocess_for_script(image, detected_script):
    if detected_script in ['ja', 'zh']:
        # CJK benefits from higher-resolution input
        image = upscale(image, factor=2)
        # Binarization helps with stylized game fonts
        image = adaptive_threshold(image)
        if is_vertical(image):
            image = rotate_90(image)
    return image
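The helpers in that snippet (upscale, adaptive_threshold, is_vertical, rotate_90) are left undefined. Here's one possible NumPy-only interpretation, a sketch rather than what the app actually ships; in practice you'd reach for OpenCV (e.g. cv2.adaptiveThreshold) or a super-resolution model:

```python
import numpy as np

def upscale(img, factor=2):
    """Nearest-neighbour upscaling via a Kronecker product."""
    return np.kron(img, np.ones((factor, factor), dtype=img.dtype))

def adaptive_threshold(img, block=16, offset=5):
    """Crude block-wise binarization: compare each pixel to its block's mean."""
    out = np.zeros_like(img, dtype=np.uint8)
    for y in range(0, img.shape[0], block):
        for x in range(0, img.shape[1], block):
            tile = img[y:y + block, x:x + block]
            out[y:y + block, x:x + block] = (tile > tile.mean() - offset) * 255
    return out

def is_vertical(img):
    """Heuristic: treat tall, narrow crops as vertical text runs."""
    h, w = img.shape[:2]
    return h > 2 * w

def rotate_90(img):
    return np.rot90(img)

# Usage on a fake grayscale crop:
crop = np.random.randint(0, 256, (40, 10), dtype=np.uint8)
crop = upscale(crop)              # (80, 20)
crop = adaptive_threshold(crop)   # values are now 0 or 255
if is_vertical(crop):
    crop = rotate_90(crop)        # (20, 80)
print(crop.shape)  # → (20, 80)
```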
4. Game/Manga-specific fine-tuning
Generic OCR models fail on stylized text. Fine-tuning on screenshots from actual games and manga pages made a huge difference in my app's accuracy.
The Real-World Test
The ultimate test for Screen Translator was Japanese gacha games. These combine:
- Stylized fonts with outlines and shadows
- Text over complex backgrounds (character art, particle effects)
- Mixed Japanese/English/numbers
- Small text in UI elements
Getting reliable OCR in this environment required all the techniques above, plus aggressive image preprocessing to isolate text from backgrounds.
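One crude form that preprocessing can take: keep only pixels whose intensity is far from the frame's median, which pulls light-on-dark (or dark-on-light) UI text out of a busy background. This is a minimal sketch with an arbitrary margin, not the app's actual pipeline; real setups combine it with outline and stroke-width filters:

```python
import numpy as np

def isolate_text(img_gray, margin=60):
    """Zero out background: keep only pixels whose intensity differs from
    the frame's median by more than `margin` (an illustrative cutoff)."""
    bg = np.median(img_gray)
    mask = np.abs(img_gray.astype(np.int16) - int(bg)) > margin
    out = np.zeros_like(img_gray)
    out[mask] = 255
    return out

# Synthetic frame: mid-gray background with a small bright "text" region.
img = np.full((10, 10), 100, dtype=np.uint8)
img[4:6, 2:8] = 250
out = isolate_text(img)
print(int(out.sum() // 255))  # → 12 bright pixels survive
```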
Lessons for Fellow Developers
If you're building anything that touches CJK OCR:
- Don't assume horizontal text — support vertical from day one
- Test on real content — synthetic training data alone won't cut it for games/manga
- Character-level confidence matters — when OCR confidence is low on a CJK character, it's better to show the user than to guess wrong
- Translation quality depends on OCR quality — garbage in, garbage out. A mistranslation from bad OCR is worse than showing "recognition failed"
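The confidence point can be sketched in a few lines: characters the recognizer is unsure about get a visible placeholder instead of a silent guess. The 0.6 cutoff and the □ placeholder are illustrative choices, not values from the app:

```python
def render_ocr(chars, threshold=0.6, placeholder="□"):
    """Render (character, confidence) pairs, masking low-confidence characters
    so the user sees that recognition failed rather than a wrong guess."""
    return "".join(ch if conf >= threshold else placeholder
                   for ch, conf in chars)

print(render_ocr([("回", 0.95), ("復", 0.41), ("薬", 0.88)]))  # → 回□薬
```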
I'm still iterating on Screen Translator's OCR pipeline. If you're working on similar problems or have found good approaches for CJK text recognition, I'd love to hear about it in the comments.
You can try the app here: Screen Translator on Google Play
What's your experience with CJK OCR? Have you found any tricks that work well for specific use cases? Let me know below.