We just open-sourced the firmware for Aiden — a physical AI agent device that operates the phone you already have. Here's how it drives any app without an automation API, and why we bet on hardware instead of an app.
The problem with "AI agents" today
Most agents can reason brilliantly and then stall at the last step: actually doing the thing. The moment you want one to operate a real app, you hit the wall — it can only control what that app chooses to expose through an API, SDK, or accessibility tree. The apps people actually live in often expose nothing, and never will.
So you're left with agents that are, functionally, very expensive chatbots.
The approach: operate the device like a human does
Aiden skips the integration layer entirely. It watches the target device's screen over HDMI capture and sends keyboard, pointer, and touch input over USB HID — the same channels a human uses. No app on the target. No jailbreak. No ADB or developer mode. (iOS needs AssistiveTouch switched on.)
Because it works at the display + input layer, it doesn't care whether an app has an API. If you can see it and tap it, Aiden can operate it.
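To make the input side concrete, here is a minimal sketch of sending one keystroke through a Linux USB HID gadget. It assumes the gadget exposes the standard 8-byte boot-protocol keyboard report (modifier byte, reserved byte, six key slots) and that /dev/hidg0 is the keyboard function. Neither detail is confirmed by this post, and the names KeyboardReport, KeyDown, and Tap are illustrative, not taken from the Aiden firmware.

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// KeyboardReport is the 8-byte USB HID boot-protocol keyboard report:
// byte 0 = modifier bits, byte 1 = reserved, bytes 2-7 = up to six key usage IDs.
type KeyboardReport [8]byte

const (
	ModLeftShift byte = 0x02 // left Shift bit in the modifier byte
	UsageA       byte = 0x04 // HID keyboard usage ID for the "a" key
)

// KeyDown builds a report with a single key held down.
func KeyDown(mods, usage byte) KeyboardReport {
	return KeyboardReport{mods, 0, usage}
}

// Tap writes a key-down report followed by an all-zero report (all keys released).
func Tap(w io.Writer, mods, usage byte) error {
	down := KeyDown(mods, usage)
	if _, err := w.Write(down[:]); err != nil {
		return fmt.Errorf("key down: %w", err)
	}
	var up KeyboardReport
	if _, err := w.Write(up[:]); err != nil {
		return fmt.Errorf("key up: %w", err)
	}
	return nil
}

func main() {
	// Assumption: /dev/hidg0 is the keyboard function of the gadget.
	f, err := os.OpenFile("/dev/hidg0", os.O_WRONLY, 0)
	if err != nil {
		fmt.Fprintln(os.Stderr, "open gadget:", err)
		return
	}
	defer f.Close()

	// Shift + "a" shows up on the target as a capital "A".
	if err := Tap(f, ModLeftShift, UsageA); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```

To the target, those sixteen bytes are indistinguishable from a USB keyboard being pressed, which is why nothing has to be installed on it.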
How the loop works
Target screen → HDMI → TC358743 (HDMI-to-CSI) → /dev/video0
→ frame service → screenshot → Go agent
→ multimodal model (you choose) → next action
→ HID reports → /dev/hidg0 + /dev/hidg1 → target input
The device-side Go agent grabs a screenshot, sends it to a multimodal model you configure, takes the next action from the model's response, and writes the matching input back over the USB HID gadget. Voice runs on-board: hardware VAD with sub-100ms latency, no wake word, and streaming STT/TTS through providers you set.
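The loop above can be sketched in a few dozen lines of Go. This is a simplified illustration, not the firmware's actual code: the Capturer, Model, and Injector interfaces, the Action type, and the step limit are all assumptions made for the example, and the fakes exist only so the sketch runs without hardware.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Action is one input step chosen by the model. Kind "done" ends the task.
// (Hypothetical shape; the real action schema is not described in this post.)
type Action struct {
	Kind string // e.g. "tap", "type", "done"
	X, Y int
	Text string
}

type Capturer interface {
	Screenshot(ctx context.Context) ([]byte, error)
}
type Model interface {
	NextAction(ctx context.Context, goal string, frame []byte) (Action, error)
}
type Injector interface {
	Inject(ctx context.Context, a Action) error
}

var ErrStepLimit = errors.New("step limit reached before the model reported done")

// Run repeats capture -> model -> inject until the model returns "done",
// an error occurs, or maxSteps actions have been injected.
// It returns the number of actions injected.
func Run(ctx context.Context, goal string, c Capturer, m Model, in Injector, maxSteps int) (int, error) {
	for step := 0; step < maxSteps; step++ {
		if err := ctx.Err(); err != nil {
			return step, err
		}
		frame, err := c.Screenshot(ctx)
		if err != nil {
			return step, fmt.Errorf("capture: %w", err)
		}
		act, err := m.NextAction(ctx, goal, frame)
		if err != nil {
			return step, fmt.Errorf("model: %w", err)
		}
		if act.Kind == "done" {
			return step, nil
		}
		if err := in.Inject(ctx, act); err != nil {
			return step, fmt.Errorf("inject: %w", err)
		}
	}
	return maxSteps, ErrStepLimit
}

// In-memory fakes standing in for HDMI capture, the model API, and the HID gadget.
type fakeScreen struct{}

func (fakeScreen) Screenshot(context.Context) ([]byte, error) { return []byte("frame"), nil }

type scriptedModel struct {
	script []Action
	i      int
}

func (s *scriptedModel) NextAction(context.Context, string, []byte) (Action, error) {
	if s.i >= len(s.script) {
		return Action{Kind: "done"}, nil
	}
	a := s.script[s.i]
	s.i++
	return a, nil
}

type recorder struct{ got []Action }

func (r *recorder) Inject(_ context.Context, a Action) error {
	r.got = append(r.got, a)
	return nil
}

// dryRun wires the loop to the fakes and returns what would have been injected.
func dryRun(script []Action, maxSteps int) (int, []Action, error) {
	rec := &recorder{}
	n, err := Run(context.Background(), "demo goal", fakeScreen{}, &scriptedModel{script: script}, rec, maxSteps)
	return n, rec.got, err
}

func main() {
	n, got, err := dryRun([]Action{{Kind: "tap", X: 540, Y: 1200}, {Kind: "type", Text: "hello"}}, 10)
	fmt.Println("actions injected:", n, "recorded:", len(got), "err:", err)
}
```

Keeping capture, model, and injection behind small interfaces is one way to let the model provider be swapped without touching the loop, and the step limit keeps a confused model from tapping forever.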
Why this matters: open and private by design
Bring your own model. OpenAI, Anthropic, or a fully local LLM — your call.
No Aiden backend. Screenshots, audio, and text only go to the endpoints you configure. We never see your screen or your conversations.
Self-hostable and auditable. Point everything at your own infrastructure; the firmware (C++ services + Go agent) is AGPL and open to scrutiny.
Your data stays yours. Memory and learned skills are exportable and portable.
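One way to picture "only the endpoints you configure" is a config with three URLs and a helper that lists every host your data could reach. This is a hypothetical sketch: the Config fields, the Hosts method, and the example URLs are invented for illustration and do not reflect the firmware's real configuration format.

```go
package main

import (
	"fmt"
	"net/url"
	"sort"
)

// Config holds the three outbound endpoints in this sketch (hypothetical field names).
type Config struct {
	ModelURL string // multimodal model: hosted API or a local LLM server
	STTURL   string // speech-to-text provider
	TTSURL   string // text-to-speech provider
}

// Hosts returns the sorted, de-duplicated hostnames the device would talk to,
// or an error if any endpoint is missing or not an absolute URL.
func (c Config) Hosts() ([]string, error) {
	endpoints := map[string]string{"model": c.ModelURL, "stt": c.STTURL, "tts": c.TTSURL}
	seen := map[string]bool{}
	for name, raw := range endpoints {
		u, err := url.Parse(raw)
		if err != nil || u.Hostname() == "" {
			return nil, fmt.Errorf("%s endpoint %q is not a valid absolute URL", name, raw)
		}
		seen[u.Hostname()] = true
	}
	hosts := make([]string, 0, len(seen))
	for h := range seen {
		hosts = append(hosts, h)
	}
	sort.Strings(hosts)
	return hosts, nil
}

func main() {
	// Example values only: a local model server plus self-hosted speech services.
	cfg := Config{
		ModelURL: "http://localhost:11434/v1",
		STTURL:   "https://speech.example.internal/stt",
		TTSURL:   "https://speech.example.internal/tts",
	}
	hosts, err := cfg.Hosts()
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Println("data leaves the device only for:", hosts)
}
```

If every entry points at your own infrastructure, the list of hosts is the complete answer to "where does my screen go?".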
Why hardware, not an app
An app can only ever control what other apps permit. A piece of hardware sitting at the screen-and-input layer can operate everything, including the apps that will never build you an integration. That's the whole bet. Today the board is powered straight off the phone's USB-C port; future revisions aim for a credit-card-sized board that attaches magnetically to the back of a phone.
Where it's at — honestly
This is the development-board firmware, not a finished consumer product. It's the working core: capture, agent, HID control, voice, OTA, tests, benchmarks. We're building it in the open and would rather share the real thing early than a polished promise.
If the capture + HID approach interests you, the repo has wiring, flashing, and a newcomer quickstart. Contributions and hard questions both welcome.