DEV Community

Okeke Chukwudubem

Project Log #10: I'm Ditching Screenshots. Here's Why.

Day 10. OCR and template matching hit their limits. UI hierarchy inspection might be the real answer.

Nine days ago, I was proud of my screenshot-based vision system. ML Kit for text. Template matching for icons. A clever fallback chain that worked most of the time.

Today, I'm ripping most of it out.

The Breaking Point

Last week, I tested the agent on a friend's phone. Template matching failed. The same icons I cropped on my device didn't match on his—different screen density, different rendering, different pixel arrangement.

I explored building a multi-resolution icon library. Crop every icon at 5 different DPIs? That's tedious. I explored AI-based icon detection. Train a model to recognize buttons by shape? That's heavy for a phone CPU.

Then I remembered something. Android already knows what's on the screen. It has to—it's rendering the UI. And there's a way to read that information directly.

Enter UI Hierarchy Inspection

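Android ships with a tool called uiautomator, and you can run its dump command through ADB: adb shell uiautomator dump. It writes an XML file describing the UI elements currently on screen: buttons, text fields, icons, images, everything the app exposes to the system. Each element has: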

  • A class name (e.g., android.widget.Button, android.widget.ImageView)
  • Bounds (exact pixel coordinates of where it sits)
  • Text (if it has any)
  • A content description (accessibility label, often used for icons)
  • Flags for whether it's clickable, scrollable, or focused

This is not a screenshot. This is the app's internal blueprint.
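
Here's a rough sketch of how those attributes could be read in Python, assuming the dump is already available as a string. UiNode and parse_nodes are names I made up for illustration; they aren't part of any Android tooling. The bounds format is the "[left,top][right,bottom]" string the dump uses.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Bounds in the dump look like "[left,top][right,bottom]".
BOUNDS_RE = re.compile(r"\[(-?\d+),(-?\d+)\]\[(-?\d+),(-?\d+)\]")


@dataclass
class UiNode:  # hypothetical container, not an Android class
    class_name: str
    text: str
    content_desc: str
    clickable: bool
    bounds: tuple  # (left, top, right, bottom) in pixels


def parse_nodes(xml_text):
    """Return a UiNode for every <node> element with parseable bounds."""
    nodes = []
    for el in ET.fromstring(xml_text).iter("node"):
        match = BOUNDS_RE.fullmatch(el.get("bounds", ""))
        if match is None:
            continue  # skip nodes without usable coordinates
        nodes.append(
            UiNode(
                class_name=el.get("class", ""),
                text=el.get("text", ""),
                content_desc=el.get("content-desc", ""),
                clickable=el.get("clickable") == "true",
                bounds=tuple(int(v) for v in match.groups()),
            )
        )
    return nodes
```

Keeping the bounds as four integers means the rest of the agent never has to touch the raw string again.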

Why This Changes Everything

| Screenshot-Based (Old Way) | UI Tree (New Way) |
| --- | --- |
| Run OCR on a screenshot (1.5–2s) | Run one ADB command (0.5–1s) |
| If text not found, try template matching (2–4s) | Not needed. Icons usually have content descriptions. |
| Accuracy depends on screen resolution and DPI | Bounds come straight from the OS, so coordinates are exact for every element in the tree |
| Breaks on different devices | Same XML structure on every device, whatever the screen density |
| Can't detect icons without reference images | Icons are in the tree with coordinates |

The First Experiment

I ran adb shell uiautomator dump on my phone, then pulled the XML file. I searched for "send." Here's a snippet of what I found:


```xml
<node
  class="android.widget.ImageButton"
  content-desc="Send message"
  bounds="[924,1656][1020,1752]"
  clickable="true"
  package="com.whatsapp" />
```
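
To turn a node like that into a tap, the agent only needs the center of its bounds. The sketch below is one way this could look in Python; it is not the project's actual code. find_tap_point, dump_ui, and tap are hypothetical helper names, the device path /sdcard/window_dump.xml is an assumption, and the adb helpers assume adb is on the PATH with a single device connected. Reading the file with cat instead of pulling it is also just a shortcut I chose here.

```python
import re
import subprocess
import xml.etree.ElementTree as ET

BOUNDS_RE = re.compile(r"\[(-?\d+),(-?\d+)\]\[(-?\d+),(-?\d+)\]")


def find_tap_point(xml_text, label):
    """Return the (x, y) center of the first clickable node whose text or
    content-desc contains `label` (case-insensitive), or None if no match."""
    wanted = label.lower()
    for el in ET.fromstring(xml_text).iter("node"):
        haystack = (el.get("text", "") + " " + el.get("content-desc", "")).lower()
        # Assumption: the labelled node itself is clickable, as in the snippet.
        if wanted not in haystack or el.get("clickable") != "true":
            continue
        match = BOUNDS_RE.fullmatch(el.get("bounds", ""))
        if match:
            left, top, right, bottom = map(int, match.groups())
            return ((left + right) // 2, (top + bottom) // 2)
    return None


def dump_ui(remote_path="/sdcard/window_dump.xml"):
    """Dump the current screen on the device and return the XML as text."""
    subprocess.run(
        ["adb", "shell", "uiautomator", "dump", remote_path], check=True
    )
    result = subprocess.run(
        ["adb", "shell", "cat", remote_path],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def tap(x, y):
    """Send a tap at pixel (x, y) through adb."""
    subprocess.run(["adb", "shell", "input", "tap", str(x), str(y)], check=True)
```

With a device attached, the flow would be: call dump_ui(), pass the result to find_tap_point(xml, "send"), and hand the coordinates to tap(). For the node above, the center works out to (972, 1704).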
