DEV Community

Okeke Chukwudubem

Project Log #2: The AI Phone Agent Has a Repo

Yesterday I announced I'm building an autonomous AI agent that controls a phone. Today, it has a home on GitHub.

The Repo

github.com/Dexter2344/phone-wgent

Right now it's a single Python script. But that script already does three things:

  1. Talks to Gemma 4 via Ollama's local API. No cloud.
  2. Executes ADB commands — opens apps, taps, types text.
  3. Parses natural language commands into structured steps.
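
Here is a minimal sketch of the first two pieces, not the actual agent.py. It assumes Ollama is listening on its default local port (11434) and that `adb` is on the PATH. The action names (`open_app`, `tap`, `type`) and the helper names are made up for illustration, and the model tag is left as a parameter because it depends on what you pulled into Ollama.

```python
import json
import subprocess
import urllib.request
from typing import List

# Ollama's default local endpoint; change it if your server runs elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_ollama_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_model(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt and return the model's text reply."""
    request = build_ollama_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]


def adb_command(action: str, **args) -> List[str]:
    """Translate a (hypothetical) action name into an adb argument list."""
    if action == "open_app":
        # Launch an app by package name using the monkey tool.
        return ["adb", "shell", "monkey", "-p", args["package"],
                "-c", "android.intent.category.LAUNCHER", "1"]
    if action == "tap":
        return ["adb", "shell", "input", "tap", str(args["x"]), str(args["y"])]
    if action == "type":
        # `input text` expects spaces to be encoded as %s.
        return ["adb", "shell", "input", "text", args["text"].replace(" ", "%s")]
    raise ValueError(f"unknown action: {action}")


def run_adb(action: str, **args) -> str:
    """Run the adb command for an action and return its stdout."""
    result = subprocess.run(adb_command(action, **args),
                            check=True, capture_output=True, text=True)
    return result.stdout
```

Building the command list separately from running it keeps the ADB layer testable without a device attached: `adb_command("tap", x=100, y=200)` can be checked on its own before `run_adb` ever touches the phone.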

Today's Progress

Task                                       Status
Create the repo                            ✅ Done
Write the core agent.py script             ✅ Done
Add README with full project overview      ✅ Done
Test Ollama connection                     ✅ Working
Test ADB connection                        ✅ Working
Parse a test command into JSON steps       🔧 In progress

What the Script Does Right Now

You give it a command like "Open WhatsApp." It sends that to Gemma 4, which breaks it into a step-by-step plan. Then the script executes those steps via ADB.
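
The plan-then-execute flow can be sketched roughly like this. The JSON schema, the prompt wording, and the names `PROMPT_TEMPLATE`, `parse_plan`, and `run_plan` are assumptions for illustration, not what agent.py necessarily uses. The executor is passed in as a function so the parsing logic can be tried without a phone connected.

```python
import json
import re
from typing import Any, Callable, Dict, List

# Hypothetical action names the agent is willing to execute.
ALLOWED_ACTIONS = {"open_app", "tap", "type"}

# Hypothetical prompt asking the model for a machine-readable plan.
PROMPT_TEMPLATE = (
    "Break the user's command into steps. Reply with only a JSON array of "
    'objects like {{"action": "open_app", "package": "com.whatsapp"}}.\n'
    "Command: {command}"
)


def parse_plan(reply: str) -> List[Dict[str, Any]]:
    """Extract and validate the JSON step list from a model reply."""
    # Models often wrap JSON in prose or code fences,
    # so grab the span from the first "[" to the last "]".
    match = re.search(r"\[.*\]", reply, flags=re.DOTALL)
    if match is None:
        raise ValueError("no JSON array found in model reply")
    steps = json.loads(match.group(0))
    for step in steps:
        if not isinstance(step, dict) or step.get("action") not in ALLOWED_ACTIONS:
            raise ValueError(f"unsupported step: {step!r}")
    return steps


def run_plan(steps: List[Dict[str, Any]],
             execute: Callable[[Dict[str, Any]], Any]) -> int:
    """Run steps in order with the given executor; return how many ran."""
    done = 0
    for step in steps:
        execute(step)
        done += 1
    return done


if __name__ == "__main__":
    prompt = PROMPT_TEMPLATE.format(command="Open WhatsApp")
    # Stand-in for a real model reply, fenced the way models often answer.
    reply = '```json\n[{"action": "open_app", "package": "com.whatsapp"}]\n```'
    run_plan(parse_plan(reply), execute=print)
```

Rejecting any action outside an allow-list is a cheap guard against the model inventing steps the script cannot run, which is one of the ways a setup like this turns fragile.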

It's basic. It's fragile. But the foundation is there.

What's Next (Day 3)

  • Add screen text detection using Tesseract OCR
  • Write the verification layer that checks if each step succeeded
  • Test a full 3-step task: open app → find element → tap
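
One possible shape for the verification layer, sketched ahead of time and not yet part of the repo. It assumes the third-party `pytesseract` and `Pillow` packages plus a Tesseract install for the OCR step. All function names here are hypothetical. The screen reader is injectable so the matching and retry logic can be exercised without a device.

```python
import subprocess
import time
from typing import Callable


def capture_screen(path: str = "screen.png") -> str:
    """Save a screenshot of the connected device via ADB and return its path."""
    png = subprocess.run(["adb", "exec-out", "screencap", "-p"],
                         check=True, capture_output=True).stdout
    with open(path, "wb") as fh:
        fh.write(png)
    return path


def read_screen_text() -> str:
    """OCR the current screen (imports are local so the rest runs without them)."""
    import pytesseract  # third-party: pip install pytesseract pillow
    from PIL import Image
    return pytesseract.image_to_string(Image.open(capture_screen()))


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace, since OCR output is noisy."""
    return " ".join(text.lower().split())


def screen_contains(ocr_text: str, expected: str) -> bool:
    """Check whether the expected text appears in the OCR output."""
    return _normalize(expected) in _normalize(ocr_text)


def verify_step(expected: str,
                read_screen: Callable[[], str] = read_screen_text,
                retries: int = 3,
                delay: float = 1.0) -> bool:
    """Re-read the screen a few times until the expected text shows up."""
    for attempt in range(retries):
        if screen_contains(read_screen(), expected):
            return True
        if attempt < retries - 1:
            time.sleep(delay)  # give the UI time to finish loading
    return False
```

After an "open app" step, a call such as `verify_step("Chats")` would confirm that some text expected on the target screen is actually visible before the agent moves on to the next step.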

Why This Matters

Most AI agents live in the cloud. This one lives on a phone. No internet. No API keys. No data leaving the device.

I'm documenting every step publicly. If you're curious about building AI agents, building from a phone, or just want to watch someone figure it out in real time, follow along.
