Project Log #2: The AI Phone Agent Has a Repo

#ai #android #automation #gemma

Yesterday I announced I'm building an autonomous AI agent that controls a phone. Today, it has a home on GitHub.

The Repo

github.com/Dexter2344/phone-wgent

Right now it's a single Python script. But that script already does three things:

Talks to Gemma 4 via Ollama's local API. No cloud.
Executes ADB commands — opens apps, taps, types text.
Parses natural language commands into structured steps.
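
As a rough sketch of the first two pieces (not the actual contents of `agent.py`): Ollama serves a local HTTP API on port 11434 by default, and the phone actions map onto `adb shell` subcommands. The model tag `gemma` and every function name below are assumptions made for illustration.

```python
import json
import subprocess
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL = "gemma"  # assumption: swap in whatever tag `ollama list` reports


def build_ollama_request(prompt, model=MODEL):
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


def ask_model(prompt):
    """Send the prompt to the local model and return its text reply."""
    with urllib.request.urlopen(build_ollama_request(prompt), timeout=120) as resp:
        return json.loads(resp.read())["response"]


def adb_open_app(package):
    # Launch the app's main activity through the monkey tool.
    return ["adb", "shell", "monkey", "-p", package,
            "-c", "android.intent.category.LAUNCHER", "1"]


def adb_tap(x, y):
    # Tap at absolute screen coordinates.
    return ["adb", "shell", "input", "tap", str(x), str(y)]


def adb_type(text):
    # `input text` does not take raw spaces; %s stands in for a space.
    return ["adb", "shell", "input", "text", text.replace(" ", "%s")]


def run(cmd):
    """Execute an ADB command list and raise if it exits non-zero."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True)
```

Keeping the command builders separate from `run` means the argument lists can be inspected or logged without a device attached, for example `run(adb_open_app("com.whatsapp"))` once a phone is connected.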

Today's Progress

Task	Status
Created the repo	✅ Done
Wrote the core `agent.py` script	✅ Done
Added README with full project overview	✅ Done
Tested Ollama connection	✅ Working
Tested ADB connection	✅ Working
Parsing a test command into JSON steps	🔧 In progress

What the Script Does Right Now

You give it a command like "Open WhatsApp." It sends that to Gemma 4, which breaks it into a step-by-step plan. Then the script executes those steps via ADB.
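
One way the plan-then-execute flow could look, assuming the model is prompted to answer with a JSON array of steps. The step schema (`action`, `package`, `x`, `y`, `text`) and the function names are hypothetical, not the format the repo actually uses.

```python
import json
import re

ALLOWED_ACTIONS = {"open_app", "tap", "type"}


def parse_steps(reply):
    """Pull a JSON array of steps out of a model reply and validate it.

    Models often wrap JSON in extra prose, so take the outermost [...] span.
    """
    match = re.search(r"\[.*\]", reply, re.DOTALL)
    if match is None:
        raise ValueError("no JSON array found in model reply")
    steps = json.loads(match.group(0))
    for step in steps:
        if not isinstance(step, dict) or step.get("action") not in ALLOWED_ACTIONS:
            raise ValueError(f"unsupported step: {step!r}")
    return steps


def step_to_adb(step):
    """Translate one validated step into an adb argument list."""
    action = step["action"]
    if action == "open_app":
        return ["adb", "shell", "monkey", "-p", step["package"],
                "-c", "android.intent.category.LAUNCHER", "1"]
    if action == "tap":
        return ["adb", "shell", "input", "tap", str(step["x"]), str(step["y"])]
    # Remaining allowed action is "type"; %s stands in for spaces.
    return ["adb", "shell", "input", "text", step["text"].replace(" ", "%s")]
```

Rejecting unknown actions before anything runs keeps a malformed or hallucinated plan from reaching the device.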

It's basic. It's fragile. But the foundation is there.

What's Next (Day 3)

Add screen text detection using Tesseract OCR
Write the verification layer that checks if each step succeeded
Test a full 3-step task: open app → find element → tap
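
A minimal sketch of what the verification layer might look like, assuming success is defined as "the expected text is visible on screen". The OCR call is passed in as a function so the sketch runs without a device. In practice `read_screen_text` could wrap a screenshot plus Tesseract, but that wiring, like the names `verify_step` and `read_screen_text`, is an assumption.

```python
import time


def verify_step(expected_text, read_screen_text, retries=3, delay=1.0):
    """Return True once expected_text appears in the OCR'd screen text.

    read_screen_text is any zero-argument callable that returns the text
    currently visible on screen (hypothetical hook for the OCR step).
    """
    needle = expected_text.casefold()
    for attempt in range(retries):
        if needle in read_screen_text().casefold():
            return True
        if attempt < retries - 1:
            time.sleep(delay)  # give the UI time to settle before re-reading
    return False
```

Retrying a few times matters because the screen is often mid-transition right after a tap or app launch.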

Why This Matters

Most AI agents live in the cloud. This one lives on a phone. No internet. No API keys. No data leaving the device.

I'm documenting every step publicly. If you're curious about building AI agents, building from a phone, or just watching someone figure it out in real time—follow along.

DEV Community
