Bridging the language gap: Under the hood of an AI-powered game UI translator

#ocr #javascript #webdev #ai

When NetEase relaunched World of Warcraft in China (specifically the Titan Reforged / Project 80 Chrono servers), thousands of English-speaking players jumped in. The problem? Translating Chinese game UI, item tooltips, and GDKP chat logs in real time is a complete nightmare. Standard OCR-plus-translation tools fail hard because they don't understand game-specific terminology (like "GDKP", "Chrono scaling", or class abbreviations).

So, I built a tailored web portal at titanreforged.com, featuring a custom AI-powered screenshot translator. Here’s how I put the technical pipeline together.

The Challenge: Low-Res MMO Fonts & Slang

Game screenshots are notoriously hard to parse for standard OCR models. The fonts are styled, pixelated, and overlaid on complex, high-contrast backgrounds. Traditional OCR often outputs garbage text or fails entirely.

Worse, standard translators fail at gaming slang. If a Chinese player writes "来个强力糖门", a generic translator renders it literally as "Come strong sugar gate". But in WoW-speak, it actually means "Looking for a Warlock (who can create soulwells/healthstones)".

The Architecture: OCR + LLM Pipeline

To solve this, I designed a pipeline with a client-side preprocessing step followed by two server-side stages (OCR, then LLM translation), using Next.js, HTML5 Canvas, and a serverless backend connecting to a custom-tuned LLM.

1. Client-Side Canvas Preprocessing: When the user pastes a screenshot, I use HTML5 Canvas to convert the image to grayscale, increase contrast, and invert colors before anything is sent to the server. This simple step improved OCR recognition accuracy by over 40%.
2. Vision-to-Text Processing: The processed image is sent to an edge API route, where I run it through a vision model optimized for Simplified Chinese characters.
3. Context-Aware LLM Translation: Instead of piping raw text directly to a translation API, I pass the OCR output to an LLM with a specialized system prompt. The prompt includes a mapping dictionary of Chinese WoW terminology, dungeon names, and raid slang.
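
Here's a minimal sketch of what the canvas preprocessing step could look like. It is not the site's actual code: the function name `preprocessPixels`, the Rec. 601 grayscale weights, and the default contrast factor of 1.5 are all assumptions for illustration. It works on a raw RGBA buffer (the same layout as `ImageData.data`), so the pixel math can be checked without a browser.

```typescript
// Hypothetical helper: grayscale -> contrast stretch -> invert on an RGBA buffer.
// The contrast factor 1.5 is an assumed default, not a value from the article.
function preprocessPixels(
  rgba: Uint8ClampedArray,
  contrast = 1.5,
): Uint8ClampedArray {
  const out = new Uint8ClampedArray(rgba.length);
  for (let i = 0; i < rgba.length; i += 4) {
    // 1. Grayscale using Rec. 601 luma weights (an assumed choice).
    const gray = 0.299 * rgba[i] + 0.587 * rgba[i + 1] + 0.114 * rgba[i + 2];
    // 2. Stretch contrast around mid-gray (128), then clamp to 0..255.
    const stretched = Math.min(255, Math.max(0, (gray - 128) * contrast + 128));
    // 3. Invert, so light text on a dark UI becomes dark text on a light background.
    const inverted = 255 - stretched;
    out[i] = inverted;
    out[i + 1] = inverted;
    out[i + 2] = inverted;
    out[i + 3] = rgba[i + 3]; // keep alpha unchanged
  }
  return out;
}

// Browser usage (assumes `canvas` already holds the pasted screenshot):
//   const ctx = canvas.getContext("2d")!;
//   const img = ctx.getImageData(0, 0, canvas.width, canvas.height);
//   img.data.set(preprocessPixels(img.data));
//   ctx.putImageData(img, 0, 0);
```

Keeping the pixel math in a pure function makes it easy to tune the contrast factor against real screenshots before wiring it to the paste handler.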
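
The context-aware translation step can be sketched as a small prompt builder. Everything here is hypothetical: the `GlossaryEntry` shape, the `buildMessages` name, the prompt wording, and the system/user message format (a common chat-completion convention, not necessarily what the site uses). The only glossary entry is the slang example from earlier in this post. Filtering the glossary down to terms that appear in the OCR text is one possible design choice to keep prompts short; the real prompt may simply include the whole dictionary.

```typescript
type GlossaryEntry = { zh: string; en: string };
type ChatMessage = { role: "system" | "user"; content: string };

// Sample data: the one mapping mentioned in this post. A real dictionary
// would also cover dungeon names and raid slang.
const GLOSSARY: GlossaryEntry[] = [
  {
    zh: "来个强力糖门",
    en: "Looking for a Warlock (who can create soulwells/healthstones)",
  },
];

// Build chat messages whose system prompt carries only the glossary
// entries that actually occur in the OCR output (an assumed optimization).
function buildMessages(ocrText: string, glossary: GlossaryEntry[]): ChatMessage[] {
  const hits = glossary.filter((entry) => ocrText.includes(entry.zh));
  const glossaryLines = hits.map((e) => `${e.zh} => ${e.en}`).join("\n");
  const system = [
    "You translate Chinese World of Warcraft UI text and chat into English.",
    "Prefer the glossary meanings below over literal translations.",
    glossaryLines ? `Glossary:\n${glossaryLines}` : "Glossary: (no matching terms)",
  ].join("\n");
  return [
    { role: "system", content: system },
    { role: "user", content: ocrText },
  ];
}
```

The returned array would then be handed to whichever LLM client the backend uses.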

Optimizing for Speed and Cost

Querying LLMs for every single screenshot gets expensive fast. To keep the site fast and free:

Redis Caching: OCR strings that exactly match a previous request bypass the LLM entirely and return the cached translation.
Debounced Requests: On the frontend, paste events are strictly debounced to prevent API abuse during rapid pasting.
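
The caching idea follows the usual cache-aside pattern, sketched below under a few assumptions. The `TranslationCache` interface, the `tr:` key prefix, and the in-memory `MemoryCache` stand-in are hypothetical; in production the same two methods would wrap a Redis client's GET and SET. The sketch also collapses whitespace before building the key, which is slightly looser than a strict exact match and is my own addition.

```typescript
// Minimal async cache interface; a Redis-backed class could implement the same shape.
interface TranslationCache {
  get(key: string): Promise<string | null>;
  set(key: string, value: string): Promise<void>;
}

// In-memory stand-in so the sketch runs without a Redis server.
class MemoryCache implements TranslationCache {
  private store = new Map<string, string>();
  async get(key: string): Promise<string | null> {
    return this.store.get(key) ?? null;
  }
  async set(key: string, value: string): Promise<void> {
    this.store.set(key, value);
  }
}

// Derive the cache key from the OCR text (trim and collapse whitespace: an assumption).
function cacheKey(ocrText: string): string {
  return "tr:" + ocrText.trim().replace(/\s+/g, " ");
}

// Cache-aside: return a cached translation if present, otherwise call the
// (expensive) translate function once and store its result.
async function translateCached(
  ocrText: string,
  cache: TranslationCache,
  translate: (text: string) => Promise<string>,
): Promise<string> {
  const key = cacheKey(ocrText);
  const hit = await cache.get(key);
  if (hit !== null) return hit;
  const fresh = await translate(ocrText);
  await cache.set(key, fresh);
  return fresh;
}
```

Because raid listings and UI labels repeat constantly, even this simple keying scheme should let a large share of requests skip the LLM call.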

The result is a clean, near-instant translation helper that turns Chinese raid listings into readable English in under 800 ms. I'm currently looking to open-source the translation mapping dictionary. What do you think about this hybrid OCR setup? Drop your architectural suggestions below!