I spent the last few weeks building an AI agent that controls my Android phone from my laptop.
What it does:
- Send WhatsApp messages: "Tell Mum I'm on my way"
- Search YouTube: "Find videos about AI agents"
- Take photos, write notes, open any app
- Multi-step: "Open YouTube and search for AI, then save to notes"
- Voice control with wake word
How it works:
- Multi-agent system (Researcher, Router, Planner, Judge, Teacher)
- Reads the phone screen in real time
- Knowledge graph for relationships ("message my brother" → finds correct contact)
- Self-learning from every action
- Safety approval levels
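
To make the knowledge-graph idea concrete, here is a minimal sketch of how a phrase like "message my brother" could be resolved to a contact with NetworkX. It is not the project's actual code: the node names, the `relation` edge attribute, the placeholder phone numbers, and the `resolve_contact` helper are all assumptions made for illustration.

```python
import networkx as nx


def build_graph() -> nx.DiGraph:
    """Build a tiny relationship graph (all names and numbers are made up)."""
    g = nx.DiGraph()
    g.add_node("me", kind="person")
    g.add_node("Alex", kind="person", phone="+10000000001")
    g.add_node("Sam", kind="person", phone="+10000000002")
    # Edges point from the phone owner to a contact and carry the relation.
    g.add_edge("me", "Alex", relation="brother")
    g.add_edge("me", "Sam", relation="mum")
    return g


def resolve_contact(g: nx.DiGraph, relation: str, owner: str = "me") -> list:
    """Return every contact linked to `owner` by the given relation."""
    wanted = relation.strip().lower()
    return [
        target
        for _, target, data in g.out_edges(owner, data=True)
        if data.get("relation") == wanted
    ]


graph = build_graph()
brothers = resolve_contact(graph, "brother")
```

Returning a list rather than a single name leaves room for the ambiguous case (two brothers, or none), where an agent would presumably need to ask before sending anything.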
Tech stack: Python, ADB, Ollama, Whisper, Piper, NetworkX, SQLite
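
For readers who have not driven a phone over ADB before, the sketch below shows one way Python can build and run the underlying commands. The `adb shell input tap`, `input text`, `monkey`, and `uiautomator dump` commands are standard Android tooling, but the helper names and the overall structure are hypothetical and not taken from the project.

```python
import subprocess
from typing import List, Optional


def adb_command(*args: str, serial: Optional[str] = None) -> List[str]:
    """Build an adb argument list, optionally targeting one device by serial."""
    cmd = ["adb"]
    if serial:
        cmd += ["-s", serial]
    return cmd + list(args)


def tap_cmd(x: int, y: int) -> List[str]:
    """Tap the screen at pixel coordinates (x, y)."""
    return adb_command("shell", "input", "tap", str(x), str(y))


def type_text_cmd(text: str) -> List[str]:
    """Type text into the focused field; `input text` expects %s for spaces.

    Assumption: the text has no shell metacharacters, which would need escaping.
    """
    return adb_command("shell", "input", "text", text.replace(" ", "%s"))


def launch_app_cmd(package: str) -> List[str]:
    """Launch an app's default launcher activity via the monkey tool."""
    return adb_command(
        "shell", "monkey", "-p", package,
        "-c", "android.intent.category.LAUNCHER", "1",
    )


def dump_ui_cmd(path: str = "/sdcard/window_dump.xml") -> List[str]:
    """Dump the current UI hierarchy to an XML file on the device."""
    return adb_command("shell", "uiautomator", "dump", path)


def run(cmd: List[str]) -> str:
    """Run a command and return stdout (needs adb installed and a device attached)."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

With a device connected, `run(launch_app_cmd("com.google.android.youtube"))` would open YouTube, and `run(dump_ui_cmd())` would write the current screen's view hierarchy to the phone, which is one plausible way to "read" the screen.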
Everything runs locally. No cloud. No tokens. No data leaves the device.
Happy to answer questions about the architecture or implementation.