DEV Community

joe wang


How I Built a Floating Bubble OCR Translator for Android — Lessons Learned

As a solo Android developer, I spent the last few months building a floating bubble OCR translator. The idea was simple: tap a bubble on your screen, select any text area, and get an instant translation — without leaving whatever app you're in.

Here's what I learned along the way, and some technical challenges that might help if you're building something similar.

Why a Floating Bubble?

Most translation apps require you to switch contexts. Copy text, open the translator, paste, read the result, switch back. For use cases like reading manga, playing foreign-language games, or chatting on international messaging apps, this flow is painfully slow.

A floating bubble overlay stays on top of everything. One tap, drag to select, instant result. The UX difference is massive.

The Technical Stack

  • Language: Kotlin
  • OCR Engine: ML Kit for on-device text recognition
  • Translation: Google ML Kit Translation API (on-device models)
  • Overlay: Android's SYSTEM_ALERT_WINDOW permission
  • Screen Capture: MediaProjection API

Challenge 1: Getting the Overlay Right

Android's overlay permission (SYSTEM_ALERT_WINDOW) is one of those things that sounds simple but has a ton of edge cases:

  • Since Android 6.0 (API 23), you must explicitly request the permission by sending the user to Settings.ACTION_MANAGE_OVERLAY_PERMISSION — a manifest entry alone isn't enough
  • Some OEMs (looking at you, Xiaomi and OPPO) have additional overlay restrictions
  • The bubble needs to be draggable but also respond to taps — handling the touch event delegation correctly took more iterations than I'd like to admit
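That last bullet, tap-versus-drag delegation, ultimately reduces to a touch-slop check: treat the gesture as a drag once the pointer moves past a small threshold, otherwise fire the tap on release. A minimal sketch in plain Kotlin (the TouchPoint type and the 10 px default are my simplifications; on a real device you'd take the slop from ViewConfiguration.get(context).scaledTouchSlop):

```kotlin
import kotlin.math.abs

// Simplified stand-in for MotionEvent: just the data the decision needs.
data class TouchPoint(val x: Float, val y: Float)

// True when the gesture should be treated as a drag rather than a tap:
// the pointer has moved beyond the touch slop on either axis.
fun isDrag(down: TouchPoint, current: TouchPoint, slopPx: Float = 10f): Boolean =
    abs(current.x - down.x) > slopPx || abs(current.y - down.y) > slopPx

fun main() {
    val down = TouchPoint(100f, 200f)
    println(isDrag(down, TouchPoint(103f, 202f)))  // small jitter: still a tap
    println(isDrag(down, TouchPoint(160f, 200f)))  // clear horizontal move: a drag
}
```

In the bubble's OnTouchListener, ACTION_MOVE events feed this check; once it returns true, subsequent moves reposition the bubble, and ACTION_UP only dispatches a tap if it never flipped.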

The key insight: use WindowManager.LayoutParams.FLAG_NOT_FOCUSABLE for the bubble itself, but switch to a focusable window when the selection overlay is active.
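A sketch of what that window configuration looks like (assuming an API 26+ TYPE_APPLICATION_OVERLAY window; the helper names are mine, and this is not the full service):

```kotlin
import android.graphics.PixelFormat
import android.view.View
import android.view.WindowManager

// While idle the window is NOT_FOCUSABLE, so input falls through to the
// app underneath everywhere except the bubble itself. When the selection
// overlay opens, clear the flag so the window can take focus.
fun overlayParams(selectionActive: Boolean): WindowManager.LayoutParams {
    val flags = if (selectionActive) 0
                else WindowManager.LayoutParams.FLAG_NOT_FOCUSABLE
    return WindowManager.LayoutParams(
        WindowManager.LayoutParams.WRAP_CONTENT,
        WindowManager.LayoutParams.WRAP_CONTENT,
        WindowManager.LayoutParams.TYPE_APPLICATION_OVERLAY, // API 26+ overlay type
        flags,
        PixelFormat.TRANSLUCENT
    )
}

// Re-apply params on the already-attached bubble view to switch modes.
fun enterSelectionMode(wm: WindowManager, bubble: View) {
    wm.updateViewLayout(bubble, overlayParams(selectionActive = true))
}
```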

Challenge 2: OCR on CJK Languages

ML Kit's text recognition works great for Latin scripts out of the box. But for Japanese, Chinese, and Korean — which are the primary use cases for screen translation — you need the CJK-specific models.

Some gotchas:

  • Vertical text: Japanese manga is written vertically. ML Kit handles this, but you need to configure the recognizer for Japanese specifically
  • Mixed scripts: Manga often mixes kanji, hiragana, katakana, and sometimes romaji in the same panel
  • Small text: OCR accuracy drops significantly with small text. I added a zoom hint in the UI to encourage users to zoom in before capturing
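Concretely, "configure the recognizer for Japanese" means requesting a different client than the default Latin one. A minimal sketch using ML Kit's Japanese options (the callback wiring here is illustrative, not the app's actual error handling):

```kotlin
import android.graphics.Bitmap
import com.google.mlkit.vision.common.InputImage
import com.google.mlkit.vision.text.TextRecognition
import com.google.mlkit.vision.text.japanese.JapaneseTextRecognizerOptions

// The default TextRecognition client only covers Latin scripts; CJK
// needs the script-specific options class (Japanese shown here).
val recognizer = TextRecognition.getClient(
    JapaneseTextRecognizerOptions.Builder().build()
)

fun recognize(bitmap: Bitmap) {
    val image = InputImage.fromBitmap(bitmap, /* rotationDegrees = */ 0)
    recognizer.process(image)
        .addOnSuccessListener { result ->
            // textBlocks preserves line grouping, which matters for
            // vertically set manga text.
            for (block in result.textBlocks) println(block.text)
        }
        .addOnFailureListener { e -> e.printStackTrace() }
}
```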

Challenge 3: Screen Capture Performance

Using MediaProjection to capture the screen is straightforward, but performance matters:

// Capture only the selected region, not the full screen.
// Clamp the rect to the bitmap bounds first: createBitmap throws
// IllegalArgumentException if the selection runs off-screen.
val left = selectionRect.left.coerceIn(0, fullScreenBitmap.width)
val top = selectionRect.top.coerceIn(0, fullScreenBitmap.height)
val width = selectionRect.width().coerceAtMost(fullScreenBitmap.width - left)
val height = selectionRect.height().coerceAtMost(fullScreenBitmap.height - top)
val bitmap = Bitmap.createBitmap(fullScreenBitmap, left, top, width, height)

Cropping to just the selected area before running OCR makes a huge difference in processing time. On a mid-range phone, full-screen OCR takes 800ms+, but a cropped manga panel takes ~200ms.

Challenge 4: Translation Quality

On-device translation models are convenient (no API costs, works offline), but the quality varies. For Japanese → English, the results are "good enough" for understanding context, but not publication-quality.
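For context, the on-device flow is: download the language pair's model once, then translate locally. A sketch with ML Kit's translation client (Japanese to English; the download-on-Wi-Fi condition is my choice, not a requirement):

```kotlin
import com.google.mlkit.common.model.DownloadConditions
import com.google.mlkit.nl.translate.TranslateLanguage
import com.google.mlkit.nl.translate.Translation
import com.google.mlkit.nl.translate.TranslatorOptions

// On-device Japanese → English translator: after the one-time model
// download, everything runs offline with no per-request API cost.
val translator = Translation.getClient(
    TranslatorOptions.Builder()
        .setSourceLanguage(TranslateLanguage.JAPANESE)
        .setTargetLanguage(TranslateLanguage.ENGLISH)
        .build()
)

fun translate(text: String, onResult: (String) -> Unit) {
    val conditions = DownloadConditions.Builder().requireWifi().build()
    translator.downloadModelIfNeeded(conditions)
        .onSuccessTask { translator.translate(text) }
        .addOnSuccessListener(onResult)
        .addOnFailureListener { e -> e.printStackTrace() }
}
```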

I found that keeping the source text visible alongside the translation helps users who know some of the source language fill in the gaps.

The Result

The app is called Screen Translator and it's live on Google Play:

👉 Screen Translator on Google Play

Main use cases people are finding:

  • Reading raw manga without waiting for fan translations
  • Playing Japanese/Korean/Chinese mobile games
  • Translating chat messages in foreign-language messaging apps
  • Reading foreign social media posts

What I'd Do Differently

  1. Start with CJK support from day one — I initially built for Latin scripts and retrofitted CJK support. Should have been the other way around given the target audience.
  2. Battery optimization earlier — Screen capture + OCR + translation is battery-hungry. I should have implemented smart capture intervals from the start.
  3. User onboarding — The overlay permission flow confuses a lot of users. A step-by-step tutorial on first launch would have saved me a lot of support emails.
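On the battery point: the fix I'd build first now is a simple interval gate, so that rapid repeated captures (say, while the user fine-tunes the selection) collapse into one OCR pass. A minimal sketch in plain Kotlin with an injectable clock for testability (the 500 ms interval is an assumption, not a measured optimum):

```kotlin
// Allows at most one capture per minIntervalMs. The clock is injected
// so the logic can be exercised without real time passing.
class CaptureGate(
    private val minIntervalMs: Long = 500,
    private val now: () -> Long = System::currentTimeMillis
) {
    private var lastCaptureMs: Long? = null

    /** Returns true if a capture should run now, and records it. */
    fun tryCapture(): Boolean {
        val t = now()
        val last = lastCaptureMs
        if (last != null && t - last < minIntervalMs) return false
        lastCaptureMs = t
        return true
    }
}

fun main() {
    var fakeTime = 0L
    val gate = CaptureGate(minIntervalMs = 500) { fakeTime }
    println(gate.tryCapture()) // first capture runs
    fakeTime = 200
    println(gate.tryCapture()) // too soon, skipped
    fakeTime = 600
    println(gate.tryCapture()) // interval elapsed, runs
}
```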

Wrapping Up

Building an overlay-based Android app is a unique challenge. You're essentially building a mini-app that lives on top of the entire OS. The permission model, touch handling, and performance constraints are all different from a standard app.

If you're thinking about building something similar, feel free to ask questions in the comments. Happy to share more technical details about any specific part of the implementation.


I'm a solo dev building tools that help people break language barriers on mobile. If you find this interesting, check out the app and let me know what you think!
