Day 10. OCR and template matching hit their limits. UI hierarchy inspection might be the real answer.
Nine days ago, I was proud of my screenshot-based vision system. ML Kit for text. Template matching for icons. A clever fallback chain that worked most of the time.
Today, I'm ripping most of it out.
The Breaking Point
Last week, I tested the agent on a friend's phone. Template matching failed. The same icons I cropped on my device didn't match on his—different screen density, different rendering, different pixel arrangement.
I explored building a multi-resolution icon library. Crop every icon at 5 different DPIs? That's tedious. I explored AI-based icon detection. Train a model to recognize buttons by shape? That's heavy for a phone CPU.
Then I remembered something. Android already knows what's on the screen. It has to—it's rendering the UI. And there's a way to read that information directly.
Enter UI Hierarchy Inspection
Android ships with a tool called uiautomator, and you can run its dump command over ADB. It writes an XML file describing every visible UI element on the screen: buttons, text fields, icons, images, everything. Each element has:
- A class name (e.g., android.widget.Button, android.widget.ImageView)
- Bounds (exact pixel coordinates of where it sits)
- Text (if it has any)
- A content description (accessibility label, often used for icons)
- Whether it's clickable, scrollable, focused
This is not a screenshot. This is the app's internal blueprint.
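To make the attribute list concrete, here is a minimal Python sketch that dumps the hierarchy and turns each node into a small record. It assumes adb is on your PATH and a device is connected. The names UiElement, dump_ui_xml, and parse_ui_xml are made up for this post, and /sdcard/window_dump.xml is just the path I chose to pass to the dump command.

```python
import re
import subprocess
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Tuple

# Bounds look like "[left,top][right,bottom]" in the dump.
BOUNDS_RE = re.compile(r"\[(-?\d+),(-?\d+)\]\[(-?\d+),(-?\d+)\]")


@dataclass
class UiElement:
    """Hypothetical record holding the attributes listed above."""
    class_name: str
    text: str
    content_desc: str
    clickable: bool
    bounds: Tuple[int, int, int, int]  # (left, top, right, bottom)


def dump_ui_xml(remote_path: str = "/sdcard/window_dump.xml") -> str:
    """Dump the current screen on the device and return the XML as text.

    Assumes adb is installed and exactly one device is connected.
    """
    subprocess.run(
        ["adb", "shell", "uiautomator", "dump", remote_path], check=True
    )
    result = subprocess.run(
        ["adb", "shell", "cat", remote_path],
        check=True, capture_output=True, text=True, encoding="utf-8",
    )
    return result.stdout


def parse_ui_xml(xml_text: str) -> List[UiElement]:
    """Flatten the nested <node> tree into a list of UiElement records."""
    elements = []
    for node in ET.fromstring(xml_text).iter("node"):
        match = BOUNDS_RE.fullmatch(node.get("bounds", ""))
        if match is None:
            continue  # skip nodes with missing or malformed bounds
        left, top, right, bottom = (int(v) for v in match.groups())
        elements.append(UiElement(
            class_name=node.get("class", ""),
            text=node.get("text", ""),
            content_desc=node.get("content-desc", ""),
            clickable=node.get("clickable") == "true",
            bounds=(left, top, right, bottom),
        ))
    return elements
```

With a device attached, parse_ui_xml(dump_ui_xml()) should give you one record per element on screen. I flatten the tree because the agent mostly needs to search for elements, not walk parent-child relationships.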
Why This Changes Everything
| Screenshot-Based (Old Way) | UI Tree (New Way) |
|---|---|
| Run OCR on a screenshot (1.5–2s) | Run one ADB command (0.5–1s) |
| If text not found, try template matching (2–4s) | Usually not needed. Most icons carry a content description. |
| Accuracy depends on screen resolution and DPI | Bounds come straight from the OS, so coordinates are exact rather than estimated |
| Breaks on different devices | Same XML structure on every device, regardless of screen density |
| Can't detect icons without reference images | Icons are in the tree with coordinates |
The First Experiment
I ran adb shell uiautomator dump on my phone, then pulled the XML file. I searched for "send." Here's a snippet of what I found:
```xml
<node
    class="android.widget.ImageButton"
    content-desc="Send message"
    bounds="[924,1656][1020,1752]"
    clickable="true"
    package="com.whatsapp" />
```
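Here is a rough Python sketch of how a node like this could become a tap. It searches the dump for a clickable element whose content description contains a keyword, then taps the center of its bounds with adb shell input tap. The helper names (find_by_description, center_of, tap) are mine, not part of any library, and the sample XML is just the snippet above wrapped in a root element.

```python
import re
import subprocess
import xml.etree.ElementTree as ET

BOUNDS_RE = re.compile(r"\[(-?\d+),(-?\d+)\]\[(-?\d+),(-?\d+)\]")


def center_of(bounds: str) -> tuple:
    """Return the (x, y) center of a "[l,t][r,b]" bounds string."""
    match = BOUNDS_RE.fullmatch(bounds)
    if match is None:
        raise ValueError(f"unexpected bounds format: {bounds!r}")
    left, top, right, bottom = (int(v) for v in match.groups())
    return (left + right) // 2, (top + bottom) // 2


def find_by_description(xml_text: str, keyword: str):
    """Return the first clickable node whose content-desc contains keyword."""
    keyword = keyword.lower()
    for node in ET.fromstring(xml_text).iter("node"):
        description = node.get("content-desc", "").lower()
        if keyword in description and node.get("clickable") == "true":
            return node
    return None


def tap(x: int, y: int) -> None:
    """Send a tap to the device. Assumes adb is on PATH with a device attached."""
    subprocess.run(["adb", "shell", "input", "tap", str(x), str(y)], check=True)


# The snippet from the dump, wrapped in a root element for parsing.
SAMPLE = """<hierarchy rotation="0">
  <node class="android.widget.ImageButton" content-desc="Send message"
        bounds="[924,1656][1020,1752]" clickable="true"
        package="com.whatsapp" />
</hierarchy>"""

node = find_by_description(SAMPLE, "send")
x, y = center_of(node.get("bounds"))
print(x, y)  # prints: 972 1704
# tap(x, y)  # uncomment with a device connected
```

Tapping the center rather than a corner is a deliberate choice: it leaves the most margin if the reported bounds are off by a pixel or two.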