DEV Community

joe wang

Optimizing OCR Performance on Mobile: From 5 Seconds to Under 1 Second

OCR on mobile needs to be fast. Users expect results in under 2 seconds. When I started building Screen Translator, our initial OCR pipeline took 4-5 seconds per screen capture. That's an eternity when you're trying to read a game menu or translate a chat message in real time.

Here's how we got it down to under 1 second on modern devices.

The Bottlenecks

Before optimizing, we profiled the pipeline:

  1. Screen capture: ~200ms (MediaProjection API)
  2. Image preprocessing: ~800ms 😱
  3. OCR inference: ~2500ms 😱😱
  4. Translation API call: ~500ms
  5. UI rendering: ~100ms

Total: ~4100ms. Steps 2 and 3 were the obvious targets.
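Per-stage numbers like these don't require a full profiler. A minimal sketch of the approach, where stage names and the `Thread.sleep` calls are stand-ins for the real pipeline:

```kotlin
// Hypothetical helper: time one pipeline stage and record its wall-clock cost.
fun <T> timed(name: String, timings: MutableMap<String, Long>, block: () -> T): T {
    val start = System.nanoTime()
    val result = block()
    timings[name] = (System.nanoTime() - start) / 1_000_000 // ns -> ms
    return result
}

// Stand-in stages; swap in the real capture/preprocess/OCR calls.
fun profile(): Map<String, Long> {
    val timings = mutableMapOf<String, Long>()
    timed("capture", timings) { Thread.sleep(10) }
    timed("preprocess", timings) { Thread.sleep(10) }
    return timings
}
```

Summing the recorded map gives you the total and makes the biggest entries obvious at a glance.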

Optimization 1: Smart Image Downscaling

The biggest win came from not feeding full-resolution screenshots to the OCR engine.

fun optimizeForOCR(bitmap: Bitmap): Bitmap {
    val maxDimension = 1280 // Sweet spot for accuracy vs speed
    val scale = minOf(
        maxDimension.toFloat() / bitmap.width,
        maxDimension.toFloat() / bitmap.height,
        1f // Don't upscale
    )

    if (scale >= 1f) return bitmap

    return Bitmap.createScaledBitmap(
        bitmap,
        (bitmap.width * scale).toInt(),
        (bitmap.height * scale).toInt(),
        true // Bilinear filtering
    )
}

A 2400x1080 screenshot scaled to 1280x576 processes 3x faster with negligible accuracy loss for screen text.
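The 2400x1080 → 1280x576 math can be checked without Android's `Bitmap` class; this pure helper mirrors the same scale logic as `optimizeForOCR` above:

```kotlin
// Mirrors the scale computation in optimizeForOCR, minus the Bitmap dependency.
fun targetSize(width: Int, height: Int, maxDimension: Int = 1280): Pair<Int, Int> {
    val scale = minOf(
        maxDimension.toFloat() / width,
        maxDimension.toFloat() / height,
        1f // never upscale
    )
    return Pair((width * scale).toInt(), (height * scale).toInt())
}
```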

Result: Image preprocessing dropped from 800ms to 250ms.

Optimization 2: Region of Interest (ROI) Detection

Why OCR the entire screen when the user only cares about a specific area?

fun detectTextRegions(bitmap: Bitmap): List<Rect> {
    // Convert to grayscale
    val gray = toGrayscale(bitmap)

    // Apply adaptive threshold
    val binary = adaptiveThreshold(gray)

    // Find contours and merge nearby text blocks
    val contours = findContours(binary)
    return mergeNearbyContours(contours, mergeDistance = 20)
}

By detecting text regions first (which is fast — ~50ms), we only run the expensive OCR on areas that actually contain text. For a typical app screen, this means processing 30-40% of the image instead of 100%.
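The `mergeNearbyContours` step is the least obvious part of that pipeline. Here's a minimal sketch of one way it could work, using a plain data class in place of Android's `Rect`; the greedy pairwise-merge strategy is an assumption, not the app's actual implementation:

```kotlin
data class Box(val left: Int, val top: Int, val right: Int, val bottom: Int)

// Two boxes are "nearby" if their gaps on both axes are within d pixels.
fun near(a: Box, b: Box, d: Int): Boolean =
    a.left <= b.right + d && b.left <= a.right + d &&
    a.top <= b.bottom + d && b.top <= a.bottom + d

// Bounding box covering both inputs.
fun union(a: Box, b: Box) = Box(
    minOf(a.left, b.left), minOf(a.top, b.top),
    maxOf(a.right, b.right), maxOf(a.bottom, b.bottom)
)

// Greedy merge: repeatedly fold any pair of boxes within mergeDistance.
fun mergeNearbyBoxes(boxes: List<Box>, mergeDistance: Int = 20): List<Box> {
    val merged = boxes.toMutableList()
    var changed = true
    while (changed) {
        changed = false
        outer@ for (i in merged.indices) {
            for (j in i + 1 until merged.size) {
                if (near(merged[i], merged[j], mergeDistance)) {
                    merged[i] = union(merged[i], merged[j])
                    merged.removeAt(j)
                    changed = true
                    break@outer
                }
            }
        }
    }
    return merged
}
```

Merging matters because one OCR call on a merged paragraph block is cheaper than many calls on its individual lines.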

Result: OCR inference dropped from 2500ms to ~800ms.

Optimization 3: ML Kit On-Device vs Cloud

We use Google ML Kit's on-device text recognition as the default. It's free, fast, and works offline. For CJK languages (Chinese, Japanese, Korean), we use the V2 API, which has significantly better accuracy.

val recognizer = TextRecognition.getClient(
    when (scriptType) {
        ScriptType.LATIN -> TextRecognizerOptions.DEFAULT_OPTIONS
        ScriptType.CJK -> ChineseTextRecognizerOptions.Builder().build()
        ScriptType.KOREAN -> KoreanTextRecognizerOptions.Builder().build()
        ScriptType.JAPANESE -> JapaneseTextRecognizerOptions.Builder().build()
        ScriptType.DEVANAGARI -> DevanagariTextRecognizerOptions.Builder().build()
    }
)

The key insight: choose the right recognizer upfront. Running the Latin recognizer on Japanese text wastes time and gives garbage results. We detect the likely script from user settings and previous results.
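"Detect the likely script from previous results" can be as simple as counting Unicode scripts in the last recognized string. The `ScriptType` enum below is the one from the snippet above; the counting heuristic itself is an assumption, not the app's actual detector:

```kotlin
enum class ScriptType { LATIN, CJK, KOREAN, JAPANESE, DEVANAGARI }

// Heuristic: pick the script with the most codepoints in the last OCR result.
fun guessScript(text: String): ScriptType {
    val counts = mutableMapOf<ScriptType, Int>()
    for (cp in text.codePoints().toArray()) {
        val type = when (Character.UnicodeScript.of(cp)) {
            Character.UnicodeScript.HAN -> ScriptType.CJK
            Character.UnicodeScript.HANGUL -> ScriptType.KOREAN
            Character.UnicodeScript.HIRAGANA,
            Character.UnicodeScript.KATAKANA -> ScriptType.JAPANESE
            Character.UnicodeScript.DEVANAGARI -> ScriptType.DEVANAGARI
            Character.UnicodeScript.LATIN -> ScriptType.LATIN
            else -> null // punctuation, digits, whitespace: ignore
        } ?: continue
        counts[type] = (counts[type] ?: 0) + 1
    }
    return counts.maxByOrNull { it.value }?.key ?: ScriptType.LATIN
}
```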

Optimization 4: Background Threading with Coroutines

Never block the main thread. We use Kotlin coroutines with a dedicated dispatcher:

private val ocrDispatcher = Dispatchers.Default.limitedParallelism(2)

suspend fun processScreen(): TranslationResult = withContext(ocrDispatcher) {
    val capture = captureScreen()           // ~200ms
    val optimized = optimizeForOCR(capture)  // ~50ms
    val regions = detectTextRegions(optimized) // ~50ms

    // Process regions in parallel
    val results = regions.map { region ->
        async {
            val cropped = cropRegion(optimized, region)
            recognizeText(cropped)
        }
    }.awaitAll()

    // Translate in batch
    translateBatch(results)                  // ~400ms
}

Processing multiple text regions in parallel on multi-core devices gives us another 20-30% speedup.

Optimization 5: Caching

If the screen hasn't changed much, don't re-OCR everything.

class OCRCache(private val maxSize: Int = 50) {
    private val cache = LruCache<Long, OCRResult>(maxSize)

    fun getOrProcess(bitmap: Bitmap, process: () -> OCRResult): OCRResult {
        val hash = computePerceptualHash(bitmap)
        cache.get(hash)?.let { return it }

        return process().also { cache.put(hash, it) }
    }

    private fun computePerceptualHash(bitmap: Bitmap): Long {
        // Downscale to 8x8 and convert each pixel to grayscale
        val small = Bitmap.createScaledBitmap(bitmap, 8, 8, true)
        val pixels = IntArray(64)
        small.getPixels(pixels, 0, 8, 0, 0, 8, 8)
        val gray = IntArray(64) { i ->
            val p = pixels[i]
            (Color.red(p) + Color.green(p) + Color.blue(p)) / 3
        }
        val avg = gray.average()

        // Compare each pixel to the average -> one bit per pixel -> 64-bit hash
        var hash = 0L
        for (i in 0 until 64) {
            if (gray[i] > avg) hash = hash or (1L shl i)
        }
        return hash
    }
}

Perceptual hashing means slightly different screenshots (e.g., a blinking cursor) still hit the cache.

Result: Repeated translations are instant (~10ms).
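An exact hash lookup only tolerates changes that the 8x8 downscale absorbs; when you want a looser match, perceptual hashes are typically compared by Hamming distance instead. A sketch of that comparison, where the threshold of 5 bits is an assumption:

```kotlin
// Number of differing bits between two 64-bit perceptual hashes.
fun hammingDistance(a: Long, b: Long): Int = (a xor b).countOneBits()

// Treat two screens as "the same" if at most `threshold` of 64 bits differ.
fun isNearMatch(a: Long, b: Long, threshold: Int = 5): Boolean =
    hammingDistance(a, b) <= threshold
```

Note that a near-match check can't use `LruCache.get` directly; it requires scanning the cached keys, which is still cheap at 50 entries.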

Final Numbers

After all optimizations on a mid-range device (Snapdragon 695):

| Step | Before | After |
|------|--------|-------|
| Screen capture | 200ms | 200ms |
| Image preprocessing | 800ms | 50ms |
| ROI detection | N/A | 50ms |
| OCR inference | 2500ms | 400ms |
| Translation | 500ms | 400ms |
| UI rendering | 100ms | 50ms |
| **Total** | **4100ms** | **~800ms** |

On flagship devices (Snapdragon 8 Gen 3), we're seeing 400-500ms total.

Key Takeaways

  1. Profile first — don't guess where the bottleneck is
  2. Downscale aggressively — screen text is high contrast, OCR handles lower resolution well
  3. ROI detection is cheap and saves massive OCR time
  4. Choose the right ML model for the script type
  5. Cache everything — screens don't change that often
  6. Parallelize where possible with coroutines

These techniques aren't specific to our app. If you're building anything with on-device OCR, these patterns will help.

If you want to see these optimizations in action, check out Screen Translator on Google Play.


What OCR performance challenges have you faced on mobile? Drop your experiences in the comments.
