What is OLTranslator?
It is a Windows application that uses OCR to read foreign-language text on your screen and overlays Japanese translations in real time.
Simply drag to select the area you want to translate, and the app automatically repeats the OCR and translation process. It works with anything that displays foreign languages on your screen, such as games, live streams, or documents.
Development took about two weeks using VS Code and GitHub Copilot. My workflow was to decide the design and direction myself and leave the implementation to the AI.
Key Features
- Local OCR — Recognizes text on the screen without requiring an internet connection.
- Included Local Translation — Ready to translate right after installation. No API key configuration is required.
- Online Translation Support — Compatible with Google Cloud Translation, DeepL, Azure Translator, and Amazon Translate.
- Overlay Display — Translation results are displayed overlaid on the original text positions. Clicks pass through, so it won't interfere with your operations.
- English and Korean Support — You can switch the source language.
- Resident Application — Stays in the system tray and remembers your selected regions. Automatically resumes the next time you launch it.
How to Use
- Launch the application.
- Drag to select the area of the screen you want to translate.
- OCR and translation will start automatically, and the results will be displayed as an overlay.
System Requirements
- Windows 10 / 11 (64-bit)
- x64 processor (AVX2 support recommended)
Challenges in Development
The most difficult part was handling the text picked up by OCR.
OCR recognizes on-screen text piece by piece, one line at a time, so a single sentence is often split into multiple blocks. If these blocks are not joined correctly, the translation comes out garbled. Conversely, if they are joined too aggressively, unrelated sentences get mixed together.
For example, separate texts lined up side-by-side in a game's UI might be combined into a single sentence, or a single sentence might be split into three lines, translated separately, and become meaningless. Judging purely by coordinate proximity leads to incorrect merging, while making it stricter leads to fragmentation. I spent a significant amount of time adjusting these thresholds.
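To make the trade-off concrete, here is a minimal Python sketch of coordinate-based merging. It is not OLTranslator's actual code: the `Block` type, the `same_paragraph` rule, and the threshold values (`max_gap`, `max_indent`, the 0.7 height ratio) are all assumptions chosen for illustration. The idea is to join a block only to the line directly above it, and only when the two lines are stacked closely, have a similar height, and share roughly the same left edge.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    """One OCR text block (hypothetical shape, for illustration only)."""
    text: str
    x: float  # left edge in pixels
    y: float  # top edge in pixels
    h: float  # line height in pixels


def same_paragraph(a: Block, b: Block,
                   max_gap: float = 0.6, max_indent: float = 1.0) -> bool:
    """Return True if b looks like the line directly below a."""
    line_h = max(a.h, b.h)
    vertical_gap = b.y - (a.y + a.h)
    # The lines must be stacked closely (a small overlap is tolerated).
    if not (-0.2 * line_h <= vertical_gap <= max_gap * line_h):
        return False
    # A very different height usually means a different UI element.
    if min(a.h, b.h) / line_h < 0.7:
        return False
    # The lines must start at roughly the same left edge.
    return abs(a.x - b.x) <= max_indent * line_h


def merge_blocks(blocks: List[Block]) -> List[str]:
    """Greedily group blocks into paragraphs, top to bottom."""
    groups: List[List[Block]] = []
    for block in sorted(blocks, key=lambda b: (b.y, b.x)):
        for group in groups:
            if same_paragraph(group[-1], block):
                group.append(block)
                break
        else:
            groups.append([block])
    return [" ".join(b.text for b in group) for group in groups]


blocks = [
    Block("The door is locked.", x=100, y=100, h=20),
    Block("HP 120", x=400, y=100, h=20),  # separate UI text on the same row
    Block("You need a key", x=100, y=124, h=20),
    Block("to open it.", x=100, y=148, h=20),
]
merged = merge_blocks(blocks)
```

Loosening `max_gap` or `max_indent` joins more lines but starts pulling in neighbouring UI text, while tightening them splits sentences apart. That is the threshold tuning described above.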
Another challenge was avoiding the translation of content that shouldn't be translated: lines containing only numbers, symbols, or UI labels. Sending such noise to a translation engine wastes API usage and degrades the results. With online translation in particular, where each request incurs a cost, reducing noise directly improves both accuracy and cost. Developing the filtering rules for what to translate and what to skip was surprisingly tedious work.
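A filter of this kind can be sketched in a few lines of Python. This is an illustrative assumption rather than the app's real rule set: the `UI_LABELS` set is hypothetical, and the character ranges cover only the two source languages mentioned above (Latin letters for English, Hangul syllables for Korean).

```python
import re

# Runs of Latin letters or Hangul syllables.
_WORD = re.compile(r"[A-Za-z\uAC00-\uD7A3]+")

# Hypothetical list of short UI labels that are not worth translating.
UI_LABELS = {"ok", "hp", "mp", "lv", "esc"}


def should_translate(line: str) -> bool:
    """Return True if the OCR line is worth sending to a translator."""
    words = _WORD.findall(line)
    if not words:
        # Only digits, symbols, or whitespace.
        return False
    if all(word.lower() in UI_LABELS for word in words):
        # Nothing but known UI labels, possibly mixed with numbers.
        return False
    return True


lines = ["12:34", "--- ***", "HP 120/300", "OK", "Press any key", "안녕하세요"]
to_translate = [line for line in lines if should_translate(line)]
```

Real rules need many more cases than this, which is where the tedium comes from, but even a simple pre-filter keeps obvious noise away from paid APIs.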
What I Focused On
I made it easy to try out various translation services. Local translation is included so you can use it right away, and you can switch between Google, DeepL, Azure, and Amazon online translation services simply by entering your API keys. I wanted users to be able to actually compare which service works best for them.
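One way to structure this kind of switching is a small registry keyed by service name. The Python sketch below is an assumption about the general shape, not the app's implementation: `local_translate`, `make_online`, and `build_translators` are hypothetical names, and the backends are stand-ins that only tag the text instead of calling a real API.

```python
from typing import Callable, Dict

Translator = Callable[[str], str]


def local_translate(text: str) -> str:
    """Stand-in for the bundled local translation engine."""
    return f"[local] {text}"


def make_online(name: str, api_key: str) -> Translator:
    """Build a stand-in online backend for the given service name."""
    def translate(text: str) -> str:
        # A real backend would call the service's API with api_key here.
        return f"[{name}] {text}"
    return translate


def build_translators(api_keys: Dict[str, str]) -> Dict[str, Translator]:
    """Local is always available; online services appear once a key is set."""
    translators: Dict[str, Translator] = {"local": local_translate}
    for name, key in api_keys.items():
        if key:
            translators[name] = make_online(name, key)
    return translators


translators = build_translators({"deepl": "my-key", "google": ""})
available = list(translators)
sample = translators["deepl"]("Hello")
```

Because every backend has the same call signature, comparing services amounts to running the same text through each entry in the dictionary.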
Download
OLTranslator — Real-time Screen Translation App (BOOTH)
Related Articles
After this, I created an audio version called LiveTR, which took four days using Claude Code. I wrote about how the experience differed across development environments in "Copilot → Cursor → Claude Code for VSC. My Journey to the Destination".