Day 12. WhatsApp is fully automatable. Banking apps are invisible. The reason is accessibility.
For 11 days, I've been building an AI agent that can see. UI trees. OCR. Template matching. I've benchmarked it. Optimised it. Proved it works.
But today I discovered something that changed how I think about this entire project.
My agent can control WhatsApp perfectly. Every button has a label. Every icon has a content description. The send button says "Send message." The back button says "Back." The search button says "Search." These are accessibility labels, written for the screen readers that blind and visually impaired people rely on.
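To make that concrete, here is a minimal sketch of a label lookup. It is not the project's actual code. It assumes the UI tree is the XML that `adb shell uiautomator dump` produces, where each `node` carries `content-desc` and `bounds` attributes. The sample dump, its coordinates, and the `find_tap_point` helper are made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical dump fragment; a real one comes from `adb shell uiautomator dump`.
SAMPLE_DUMP = """<hierarchy rotation="0">
  <node class="android.widget.ImageButton" content-desc="Back" clickable="true" bounds="[0,66][126,192]" />
  <node class="android.widget.ImageButton" content-desc="Send message" clickable="true" bounds="[940,2100][1060,2220]" />
</hierarchy>"""

# Bounds are serialized as "[left,top][right,bottom]".
BOUNDS_RE = re.compile(r"\[(\d+),(\d+)\]\[(\d+),(\d+)\]")


def find_tap_point(dump_xml, label):
    """Return the (x, y) centre of the first node whose content-desc equals label, else None."""
    root = ET.fromstring(dump_xml)
    for node in root.iter("node"):
        if node.get("content-desc") != label:
            continue
        match = BOUNDS_RE.fullmatch(node.get("bounds", ""))
        if match:
            x1, y1, x2, y2 = map(int, match.groups())
            return ((x1 + x2) // 2, (y1 + y2) // 2)
    return None


print(find_tap_point(SAMPLE_DUMP, "Send message"))  # prints (1000, 2160)
```

The returned point is what you would hand to something like `adb shell input tap`. No pixels were inspected to get it, which is why labelled apps are so cheap to drive.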
WhatsApp invested in accessibility. And that investment accidentally makes their app fully automatable by my agent.
Then I tested a local banking app. And another. And another.
None of them had accessibility labels. The login button? content-desc="". The password field? content-desc="". The transfer button? A generic class name and nothing else to identify it. With no labels to match on, my UI tree approach was useless. The agent was blind.
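A rough sketch of how those blind spots can be listed automatically, again assuming the `uiautomator dump` XML format. The sample dump is invented, not taken from a real banking app, and `unlabeled_clickables` is a hypothetical helper. Treating `text` and `resource-id` as usable identifiers alongside `content-desc` is my assumption.

```python
import xml.etree.ElementTree as ET

# Invented dump fragment that mimics an unlabeled login screen.
SAMPLE_DUMP = """<hierarchy rotation="0">
  <node class="android.widget.TextView" text="Welcome" resource-id="" content-desc="" clickable="false" bounds="[60,400][1020,480]" />
  <node class="android.widget.EditText" text="" resource-id="" content-desc="" clickable="true" bounds="[60,800][1020,920]" />
  <node class="android.widget.Button" text="" resource-id="" content-desc="" clickable="true" bounds="[60,1000][1020,1120]" />
</hierarchy>"""

# Attributes an agent could match on (assumption: any one of them is enough).
IDENTIFYING_ATTRS = ("content-desc", "text", "resource-id")


def unlabeled_clickables(dump_xml):
    """Return (class, bounds) for every clickable node with no identifying attribute."""
    root = ET.fromstring(dump_xml)
    blind = []
    for node in root.iter("node"):
        if node.get("clickable") != "true":
            continue  # non-interactive nodes do not need to be targetable
        if not any(node.get(attr, "").strip() for attr in IDENTIFYING_ATTRS):
            blind.append((node.get("class", ""), node.get("bounds", "")))
    return blind


for cls, bounds in unlabeled_clickables(SAMPLE_DUMP):
    print(cls, bounds)
```

All that survives for these nodes is a class name and a rectangle. Two buttons of the same class cannot be told apart without looking at the pixels.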
This is bigger than my project. The same neglect that locks out visually impaired users also locks out AI agents. When developers skip accessibility, both humans and machines suffer.
For the apps that invested in accessibility, my agent works flawlessly. For the ones that didn't, I'm back to OCR and template matching, which are slower, less reliable, and device-dependent.
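One way to express that ordering is a simple chain that tries the cheapest, most reliable locator first. This is a sketch under assumptions, not how the agent is wired today. The `locate` function and the three stand-in locators are hypothetical, and the stand-ins return hard-coded values where real UI tree, OCR, and template-matching calls would go.

```python
from typing import Callable, Optional, Sequence, Tuple

Point = Tuple[int, int]
Locator = Callable[[str], Optional[Point]]


def locate(target: str, strategies: Sequence[Tuple[str, Locator]]) -> Optional[Tuple[str, Point]]:
    """Try each locator in order; return (strategy name, point) for the first hit."""
    for name, strategy in strategies:
        point = strategy(target)
        if point is not None:
            return name, point
    return None


# Stand-in locators with hard-coded results, for illustration only.
def by_label(target: str) -> Optional[Point]:
    return None  # pretend the app exposes no accessibility labels


def by_ocr(target: str) -> Optional[Point]:
    return (540, 1060) if target == "Transfer" else None


def by_template(target: str) -> Optional[Point]:
    return None


STRATEGIES = [("label", by_label), ("ocr", by_ocr), ("template", by_template)]

print(locate("Transfer", STRATEGIES))  # prints ('ocr', (540, 1060))
```

Returning the strategy name alongside the point makes it easy to log how often the agent had to fall back for a given app.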
I'm now building a list of "agent-friendly" apps and "agent-hostile" apps. In what I've tested so far, the pattern is clear: well-funded, global apps (WhatsApp, Google apps, Slack) are accessible. Local apps, especially banking and government apps, are not.
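A list like that could be driven by a simple score: the share of clickable nodes on a screen that carry a label. The sketch below assumes the `uiautomator dump` XML format. The `label_coverage` and `classify` helpers, the sample dumps, and the 0.8 cut-off are all placeholders I picked for illustration, not measured values.

```python
import xml.etree.ElementTree as ET

# Invented single-screen dumps; real scores would be averaged over many screens.
LABELED_SCREEN = """<hierarchy rotation="0">
  <node class="android.widget.ImageButton" text="" content-desc="Back" clickable="true" bounds="[0,66][126,192]" />
  <node class="android.widget.ImageButton" text="" content-desc="Search" clickable="true" bounds="[900,66][1026,192]" />
</hierarchy>"""

UNLABELED_SCREEN = """<hierarchy rotation="0">
  <node class="android.widget.EditText" text="" content-desc="" clickable="true" bounds="[60,800][1020,920]" />
  <node class="android.widget.Button" text="" content-desc="" clickable="true" bounds="[60,1000][1020,1120]" />
</hierarchy>"""


def label_coverage(dump_xml):
    """Fraction of clickable nodes with a content-desc or text; None if nothing is clickable."""
    root = ET.fromstring(dump_xml)
    clickable = [n for n in root.iter("node") if n.get("clickable") == "true"]
    if not clickable:
        return None
    labeled = sum(
        1 for n in clickable
        if n.get("content-desc", "").strip() or n.get("text", "").strip()
    )
    return labeled / len(clickable)


def classify(coverage, threshold=0.8):
    """Bucket a coverage score; the threshold is an arbitrary placeholder."""
    if coverage is None:
        return "unknown"
    return "agent-friendly" if coverage >= threshold else "agent-hostile"


print(classify(label_coverage(LABELED_SCREEN)))    # prints agent-friendly
print(classify(label_coverage(UNLABELED_SCREEN)))  # prints agent-hostile
```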
This project started as an automation tool. It's becoming an accessibility audit.
Next: Building fallback strategies for unlabeled elements.
👉 github.com/Dexter2344/phone-agent