DEV Community

Okeke Chukwudubem

Project Log #4: The AI Phone Agent Just Completed Its First Full Task

Day 4. Fuzzy text matching works. Verification layer is live. The agent sent a real WhatsApp message.

Three days ago, this project was just an idea. Today, it did something real.

The Milestone

I gave the agent a command: "Open WhatsApp and send a message to Mom saying I'll call later."

It opened WhatsApp. It scanned the screen. It found "Mom" in the contact list. It tapped. It typed the message. It hit send.

All of the agent's processing ran on the phone itself. No cloud. No API keys.

The Repo

github.com/Dexter2344/phone-agent

agent.py now includes the verification layer. vision.py has the fuzzy matching logic.

Today's Progress

| Task | Status |
| --- | --- |
| Added fuzzy text matching for OCR errors | ✅ Done |
| Wrote the verification layer | ✅ Done |
| Tested full 3-step task: open → find → send | ✅ Success |
| Updated agent.py with verification logic | ✅ Done |
| Added vision.py fuzzy matching module | ✅ Done |

The Two Big Fixes

1. Fuzzy Text Matching

OCR was misreading names. "Mom" became "Morn" or "M0m." I added a fuzzy matching function using Levenshtein distance. Now if the agent is looking for "Mom" and OCR returns "Morn," it calculates how close the strings are and accepts matches above an 80% similarity threshold.
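To make the idea concrete, here is a minimal sketch of Levenshtein-based fuzzy matching. It is not the code from vision.py: the function names (`levenshtein`, `similarity`, `find_best_match`), the case-insensitive comparison, and the choice to normalize by the longer string's length are all assumptions made for illustration.

```python
from typing import Iterable, Optional, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]. ASSUMPTION: normalized by the longer length."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def find_best_match(
    target: str, ocr_words: Iterable[str], threshold: float = 0.8
) -> Optional[Tuple[str, float]]:
    """Return the closest OCR word and its score, or None if below threshold."""
    best_word, best_score = None, 0.0
    for word in ocr_words:
        score = similarity(target, word)
        if score > best_score:
            best_word, best_score = word, score
    if best_word is not None and best_score >= threshold:
        return best_word, best_score
    return None
```

Under this particular normalization, short strings are penalized heavily: `similarity("Mom", "M0m")` is about 0.67 and `similarity("Mom", "Morn")` is 0.5, so both would fall below a 0.8 threshold. A longer name such as "Chr1stopher" against "Christopher" scores about 0.91 and passes. An implementation that accepts "Morn" for "Mom" would need a different normalization, a lower threshold for short names, or a pre-pass that maps common OCR confusions (for example "rn" to "m" and "0" to "o") before comparing.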

2. Verification Layer

The verification layer takes a screenshot after each action and checks: Did the expected app open? Did the expected text appear on screen? Is the next UI element visible? If verification fails, the agent retries once. If it fails again, it stops and reports what went wrong.
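The retry-once behavior described above can be sketched roughly as follows. This is an illustration, not the logic in agent.py: `run_step`, `StepResult`, and the `action` and `read_screen` callables are hypothetical names, and the check is reduced to a plain substring test on the OCR text of the screenshot.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class StepResult:
    ok: bool
    attempts: int
    reason: str = ""


def run_step(
    name: str,
    action: Callable[[], None],      # hypothetical: performs the tap/type/launch
    read_screen: Callable[[], str],  # hypothetical: screenshot + OCR, returns text
    expected: List[str],             # text that must be visible after the action
    max_retries: int = 1,            # the post describes a single retry
) -> StepResult:
    """Run an action, verify the screen, retry on failure, then give up."""
    missing: List[str] = []
    total_attempts = max_retries + 1
    for attempt in range(1, total_attempts + 1):
        action()
        screen = read_screen().lower()
        # Verification: every expected string must appear in the OCR text.
        missing = [text for text in expected if text.lower() not in screen]
        if not missing:
            return StepResult(ok=True, attempts=attempt)
    # Still failing after the retry: stop and report what was not found.
    return StepResult(
        ok=False,
        attempts=total_attempts,
        reason=f"{name}: expected text not found on screen: {missing}",
    )
```

A caller would chain one `run_step` per stage (open, find, send) and abort the task at the first result whose `ok` is false, passing `reason` back to the user. In practice the substring test would likely be swapped for the fuzzy matcher so that OCR noise does not trigger false failures.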

What's Next (Day 5)

  • Add basic image recognition for icon-based UI elements
  • Write a recovery handler for unexpected interruptions
  • Test more complex commands

This is Day 4. The agent is no longer a prototype. It's a working system.
