AI controls your phone. And it never leaves your phone.
Everyone else: Phone → Internet → Cloud API → Internet → Phone
💳 API key required. Monthly bill attached.
PokeClaw: Phone → LLM → Phone
That's it. No internet. No API key. No bill.
Every phone automation tool I found does the same thing: your phone calls a cloud API, gets instructions back, executes. There's always an internet connection in the pipeline and a credit card attached to it.
PokeClaw is the only one where the entire pipeline is a closed loop inside your phone. The LLM reads the screen, decides what to tap, and executes. Nothing leaves the device.
That didn't exist before. So I built it.
PokeClaw is, as far as I can tell, the first working app that runs Gemma 4 on-device to do fully autonomous phone control through the Android Accessibility API. Not a research demo. Not a cloud wrapper. A real APK you install and use. The model reads your screen, picks a tool, executes it, reads the new state, and loops until the task is done. The entire agentic pipeline runs on your phone's CPU.
This is new. Gemma 4's native tool calling on LiteRT-LM v0.10.0 only became available recently, and nobody had shipped a working phone automation app on top of it. PokeClaw is that app.
⭐ If this sounds interesting, star the repo first so you don't lose it. Then keep reading.
The Architecture That Makes It Work
The hard part of on-device phone control isn't the model. It's the session management.
LiteRT-LM only allows one active Conversation per Engine at a time. PokeClaw has two modes that both need the LLM: Chat (for talking) and Task (for controlling the phone). When you send a task, the app has to close the chat conversation, hand the engine to the task agent, let it run through its tool-calling loop, then reclaim the engine for chat when the task finishes. Get the handoff wrong and you either crash or OOM.
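The handoff described above can be sketched as a lock-guarded arbiter. Everything here is illustrative: `Engine`, `Conversation`, and `EngineArbiter` are stand-ins for the idea, not LiteRT-LM's or PokeClaw's actual API.

```java
import java.util.concurrent.locks.ReentrantLock;

class Conversation {
    final String mode;
    Conversation(String mode) { this.mode = mode; }
}

class Engine {
    private Conversation active; // LiteRT-LM allows one live Conversation per Engine

    Conversation open(String mode) {
        if (active != null) throw new IllegalStateException("one Conversation per Engine");
        active = new Conversation(mode);
        return active;
    }

    void close(Conversation c) { if (active == c) active = null; }
}

/** Serializes chat/task access so the handoff can never double-open a session. */
class EngineArbiter {
    private final Engine engine = new Engine();
    private final ReentrantLock lock = new ReentrantLock();
    private Conversation chat;

    void startChat() {
        lock.lock();
        try { if (chat == null) chat = engine.open("chat"); }
        finally { lock.unlock(); }
    }

    String runTask(String task) {
        lock.lock();
        try {
            if (chat != null) { engine.close(chat); chat = null; } // hand off: close chat first
            Conversation taskConv = engine.open("task");
            String result = "done: " + task;                      // tool-calling loop runs here
            engine.close(taskConv);
            chat = engine.open("chat");                           // reclaim the engine for chat
            return result;
        } finally {
            lock.unlock();
        }
    }
}
```

The point of the lock is ordering: `open("task")` throws unless chat's conversation was closed first, which is exactly the crash-or-OOM failure mode a botched handoff produces.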
The same constraint hits auto-reply. PokeClaw can monitor a contact's messages and reply automatically using the LLM. But if chat mode holds the session, auto-reply can't generate. So the auto-reply manager force-closes and recreates the engine each time it needs to respond. Ugly, but it works without leaking memory.
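The force-close-and-recreate pattern looks roughly like this. `ReplyEngine` and its methods are assumptions for the sketch, not the real LiteRT-LM surface; the point is the lifecycle, not the names.

```java
class ReplyEngine {
    private boolean alive = true;

    String generate(String prompt) {
        if (!alive) throw new IllegalStateException("engine closed");
        return "auto-reply to: " + prompt; // stand-in for actual LLM generation
    }

    void shutdown() { alive = false; }     // frees the single session slot
}

class AutoReplyManager {
    private ReplyEngine engine;

    // Tear down whatever engine exists, build a fresh one, generate, release.
    String replyTo(String incomingMessage) {
        if (engine != null) engine.shutdown(); // force-close any held session
        engine = new ReplyEngine();            // recreate: new engine, new session
        String reply = engine.generate(incomingMessage);
        engine.shutdown();                     // release so chat mode can rebuild
        engine = null;
        return reply;
    }
}
```

Wasteful, since every reply pays the engine construction cost, but each cycle ends with nothing held, which is what keeps it from leaking.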
Gemma 4 on LiteRT-LM v0.10.0 supports native tool calling. The model outputs structured tool calls directly, not text you have to parse. That's what makes the whole agentic loop clean: model receives accessibility state, returns a tool call like tap(x=540, y=1200), the app executes it, captures new state, feeds it back. No regex. No string matching. Just structured I/O between a model and a phone.
The model is a function. Input: what's on screen. Output: what to do next. The phone is the runtime.
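That loop can be written in a few lines. Here the "model" is a stub that maps screen state to a structured tool call; the `ToolCall` shape and the planner logic are illustrative assumptions, not PokeClaw's code.

```java
import java.util.ArrayList;
import java.util.List;

// Mirrors structured tool-calling output like tap(x=540, y=1200): a name plus arguments.
class ToolCall {
    final String tool; final int x; final int y;
    ToolCall(String tool, int x, int y) { this.tool = tool; this.x = x; this.y = y; }
}

class AgentLoop {
    // Stand-in for the LLM: maps the current screen state to the next tool call.
    static ToolCall decide(String screenState) {
        return screenState.contains("send_button")
                ? new ToolCall("tap", 540, 1200)
                : new ToolCall("done", 0, 0);
    }

    // Observe -> decide -> act, until the model says it is done.
    static List<String> run(String screenState) {
        List<String> executed = new ArrayList<>();
        while (true) {
            ToolCall call = decide(screenState);
            if (call.tool.equals("done")) break;
            executed.add(call.tool + "(x=" + call.x + ", y=" + call.y + ")");
            screenState = "message sent"; // executing the tap updates the screen
        }
        return executed;
    }
}
```

Because the model emits a typed `ToolCall` instead of free text, the executor never parses anything; it just dispatches on the tool name.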
How It Actually Sees Your Phone
Most phone automation tools work on screenshots. They feed pixel arrays to a vision model and hope it figures out where the button is. That's like reading a book by photographing each page and running OCR.
PokeClaw reads the Android Accessibility tree. That's the actual data structure behind your screen: real element IDs, text content, scroll positions, clickable states. The model doesn't guess where "Send" is by looking at pixels. It knows exactly which element is the send button, what its coordinates are, and whether it's enabled.
The model gets this tree as context, plus a set of tools: tap, swipe, type, open_app, send_message, auto_reply. It picks one, the phone executes it, the tree updates, and the model picks the next action. Same agentic loop as AI coding tools, except the environment is your phone.
It's not looking at your screen. It's reading the source code of your screen.
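Flattening that tree into model context might look like the sketch below. `UiNode` is a stand-in for Android's `AccessibilityNodeInfo` (real text, clickable state, coordinates); the field names and output format are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for an accessibility tree node.
class UiNode {
    String text; boolean clickable; int x, y;
    List<UiNode> children = new ArrayList<>();
    UiNode(String text, boolean clickable, int x, int y) {
        this.text = text; this.clickable = clickable; this.x = x; this.y = y;
    }
}

class TreeSerializer {
    // Depth-first flatten into the compact textual context the model reads:
    // one line per element, with clickability and exact coordinates.
    static void serialize(UiNode node, List<String> out) {
        out.add((node.clickable ? "[clickable] " : "") + node.text
                + " @(" + node.x + "," + node.y + ")");
        for (UiNode child : node.children) serialize(child, out);
    }
}
```

A "Send" button serializes to something like `[clickable] Send @(540,1200)`, so the model can emit `tap(x=540, y=1200)` without ever guessing from pixels.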
Performance
First launch downloads the model: 2.6GB, one time only. After that it's cached on your phone.
I tested on a budget phone with CPU-only inference. No GPU, no NPU. On that hardware, warmup takes about 45 seconds before the first action. That's the worst case.
If your phone has a dedicated ML accelerator, it's a different story. Phones with these chips run PokeClaw significantly faster:
- Google Tensor G3/G4 (Pixel 8, Pixel 9 series)
- Snapdragon 8 Gen 2/3 (Galaxy S24, OnePlus 12, etc.)
- Dimensity 9200/9300 (recent MediaTek flagships)
- Snapdragon 7+ Gen 2 and above (mid-range with GPU acceleration)
On these devices, warmup drops to seconds, and the whole agentic loop feels smooth. The model is the same; the hardware does the heavy lifting.
Accuracy: straightforward tasks work reliably. Multi-step reasoning across apps is where a 2.3B model starts hitting its limits. It's not trying to be GPT-4. It's trying to be the first AI that controls your phone without phoning home.
The slower your phone, the longer the warmup. But it works on every arm64 Android 9+ device, and your data never leaves.
What Else Exists
A desktop app with a few hundred GitHub stars does something similar but requires a cloud LLM backend. Defeats the purpose if you care about privacy.
A couple of research projects use screenshot-based approaches with on-device models. They work in a demo. They break on real phones where notifications, overlays, and system dialogs pop up mid-task.
Google and Samsung have on-device AI baked into their hardware. Polished but closed. You can't see the tool calling logic, can't swap the model, can't add tools. If they decide tomorrow that their assistant won't open a competitor's app, you have no recourse.
I searched for months before building this. Every open source phone automation project I found either needs a cloud LLM, uses screenshot-based vision (fragile), or doesn't actually ship a working APK. PokeClaw is, to my knowledge, the first open source app that does fully local LLM phone control with native tool calling against the live accessibility tree.
If I'm wrong about that, genuinely tell me. I'd love to not be the only person maintaining this.
What's Next
Things I'm building:
- Better feedback while the model is working
- Per-app permissions so you can restrict what PokeClaw touches
- Custom tool definitions
- Smaller model variants for older phones
If you want something specific, open an issue.
PokeClaw
monitor-demo.mp4
hi-demo.mp4
Why is the "hi" demo slow? It was recorded on a budget Android phone (I'm literally too broke to buy a proper one; I got this just to demo the app, lol) with CPU-only inference: no GPU, no NPU. Running Gemma 4 E2B on pure CPU takes about 45 seconds to warm up. It started at several minutes, and we optimized the engine initialization and session handoff to squeeze it down this far. If your phone actually has a decent chip, it's way faster:
- Google Tensor G3/G4 (Pixel 8, Pixel 9)
- Snapdragon 8 Gen 2/3 (Galaxy S24, OnePlus 12)
- Dimensity 9200/9300 (recent MediaTek flagships)
- Snapdragon 7+ Gen 2+ (mid-range with GPU)
On these devices, warmup drops to seconds. Same model, better hardware.
That said, the fact that a 2.3B model can autonomously control a phone running purely on CPU is already pretty impressive. GPU just makes it faster.
The Story
I'm a solo developer. CS dropout. When Gemma 4 dropped on April 2nd with native tool calling on LiteRT-LM, I pulled two all-nighters and built this from scratch with zero Android development experience.
It's completely free. No API keys that bill you every month. No subscription. No usage limits. The model runs on your hardware and costs you nothing.
We're living through a historic shift. Local LLMs are now smart enough to actually do useful work on a phone. That wasn't true 6 months ago. As on-device models keep getting smarter, PokeClaw is ready.
This project has a lot of issues. That's expected for something built in two nights on a model that dropped days ago. But the fact that it works at all is already pretty exciting.
⭐ Star the repo if you think local AI phone control matters. Every star helps more people find it.
About Me
CS dropout. Founder of @mcpware. github.com/ithiria894
Top comments (4)
It's great to see that you're working on such an awesome idea. I'll definitely try it. But I'm not sure whether it will run on a lower-end Snapdragon Samsung Galaxy Note 10 Plus. If it does, that would be really great for me. I also have an iPhone 13.
The Note 10 Plus has 12GB of RAM, so the model will fit fine. The Snapdragon 855 doesn't have a dedicated ML accelerator, so warmup will be slower, but it should work. Try it and let me know how it goes!
iPhone 13 unfortunately not possible. Apple doesn't allow any app to read the screen or control other apps. On iOS you're stuck with Siri and there's no way around that. Hopefully Apple opens this up someday but I wouldn't count on it.
github.com/agents-io/PokeClaw
This is absolutely cool. Will you be able to do iOS?
No, iOS is a no-go. Apple doesn't let any third-party app control other apps through Accessibility. On iPhone your only option is Siri and you're stuck waiting for Apple to make it smarter. Android is the only platform where this approach is even possible.
github.com/agents-io/PokeClaw