Last week I wondered: can you run a real AI agent on a smartwatch? Not a remote control for your phone. Not a web view. An actual agent runtime, processing locally, talking to you through the speaker.
Turns out you can. I built ClawWatch and it works.
The problem
Every "AI on a watch" demo I have seen is just a thin client. Your voice goes to the cloud for transcription, the cloud calls the model, the cloud sends back audio. The watch is a microphone with a screen.
I wanted the opposite: run as much as possible on the watch itself.
What runs on the watch
The install is 2.8 MB:
NullClaw handles agent logic. It is a Zig binary, statically compiled for ARM. Uses about 1 MB of RAM. Starts in under 8 ms.
Vosk does speech-to-text entirely on-device. No Google, no cloud STT.
Android TextToSpeech speaks the response. Pre-installed, zero added cost.
SQLite stores conversation memories locally.
The only thing that leaves the watch is a single HTTPS call to the LLM API. I use Claude, but NullClaw supports 22+ providers so you can point it at whatever you want.
Why NullClaw instead of a normal runtime
A Galaxy Watch has 1.5 to 2 GB of RAM. Most agent frameworks would eat all of it. NullClaw is written in Zig and compiles to a static binary with no dependencies. It does not need Node.js, Python, or a JVM. It just runs.
I cross-compiled it with one command:
zig build -Dtarget=arm-linux-musleabihf -Doptimize=ReleaseSmall
Drop the binary into the Android app, call it via ProcessBuilder. Done.
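The invocation pattern is simple enough to sketch in plain Java. This is not ClawWatch's actual code, just the shape of it: spawn the process, capture stdout, wait for exit. On Android you would first copy the bundled binary into the app's private files directory and mark it executable, since you cannot exec straight out of the APK.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class AgentRunner {
    // Run a native binary and return its stdout as a string.
    // In the real app the first argument would be the extracted
    // NullClaw binary's absolute path, after setExecutable(true).
    static String run(String... command) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command)
                .redirectErrorStream(true) // fold stderr into stdout for simplicity
                .start();
        String out = new String(p.getInputStream().readAllBytes(),
                StandardCharsets.UTF_8);
        p.waitFor();
        return out;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the NullClaw binary: echo just proves the plumbing.
        System.out.print(run("echo", "hello from the watch"));
    }
}
```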
How it works
[tap mic] -> Vosk STT (on-device) -> NullClaw agent -> LLM API -> Android TTS -> [watch speaks]
Tap the mic button. Speak your question. The watch transcribes locally, sends the text through NullClaw to the LLM, and speaks the answer back. Tap again to interrupt at any point.
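The interrupt handling boils down to a cancel flag checked between pipeline stages. Here is a minimal sketch of that idea; the class and method names are mine, not ClawWatch's, and the agent/TTS stages are passed in as plain functions so the flow is visible.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.UnaryOperator;

public class TurnPipeline {
    private final AtomicBoolean cancelled = new AtomicBoolean(false);

    // Called from the mic button's tap handler while a turn is in flight.
    // On Android you would also call textToSpeech.stop() here.
    public void cancel() { cancelled.set(true); }

    // Run one turn: agent call, then speech. Bail out if the user tapped.
    public String run(String transcript,
                      UnaryOperator<String> agent,
                      UnaryOperator<String> tts) {
        cancelled.set(false);
        String reply = agent.apply(transcript); // NullClaw -> LLM round trip
        if (cancelled.get()) return null;       // tap landed during the call
        return tts.apply(reply);                // hand off to TTS
    }
}
```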
What I learned
ARM ABI matters. I built for aarch64 first, but my watch needed 32-bit ARM. Check your target with adb shell getprop ro.product.cpu.abi before building.
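To make the ABI point concrete, here is an illustrative helper (not part of ClawWatch) that maps the string `getprop` returns to the matching Zig `-Dtarget` triple:

```java
import java.util.Map;

public class AbiTargets {
    // Map the ABI reported by `adb shell getprop ro.product.cpu.abi`
    // to the Zig -Dtarget triple you should build with.
    static final Map<String, String> ZIG_TARGETS = Map.of(
            "armeabi-v7a", "arm-linux-musleabihf", // 32-bit ARM (my watch)
            "arm64-v8a",   "aarch64-linux-musl",   // 64-bit ARM
            "x86_64",      "x86_64-linux-musl");   // emulator

    static String zigTarget(String abi) {
        String t = ZIG_TARGETS.get(abi);
        if (t == null) throw new IllegalArgumentException("unsupported ABI: " + abi);
        return t;
    }
}
```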
Voice agents need different prompts. The system prompt says: no markdown, no lists, 1-3 sentences max. The user hears the response spoken aloud. Nobody wants to listen to a bullet list.
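A voice-tuned system prompt along those lines might look like this. The exact wording below is mine, not the one ClawWatch ships, but it encodes the same three constraints:

```java
public class VoicePrompt {
    // System prompt tuned for spoken output: the reply goes straight
    // to TTS, so formatting constructs would be read aloud verbatim.
    static final String SYSTEM_PROMPT = String.join(" ",
            "You are a voice assistant on a smartwatch.",
            "Answer in plain spoken prose: no markdown, no bullet lists,",
            "no code blocks. Keep every reply to 1-3 short sentences.");
}
```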
TTS duration is hard to predict. I started with a heuristic (character count times 55ms). Switched to Android's UtteranceProgressListener for the actual finish event.
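For reference, the first-cut heuristic is just this (helper name mine; the 55 ms figure is from my testing). The comment notes what replaced it:

```java
public class TtsTiming {
    // Rough guess: ~55 ms of synthesized speech per character of text.
    static long estimateMillis(String text) {
        return text.length() * 55L;
    }
    // In the shipping app this estimate is replaced by Android's
    // UtteranceProgressListener#onDone callback, which fires when
    // playback actually finishes instead of when we guess it does.
}
```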
On-device STT is good enough. Vosk's small English model is 68 MB and handles conversational speech well. Not perfect, but the LLM is forgiving of transcription errors.
Try it
The code is open source (AGPL-3.0): https://github.com/ThinkOffApp/ClawWatch
You need a Galaxy Watch 4 or newer, a Mac or Linux machine for building, and an API key for whatever LLM provider you choose. Would love to hear your feedback and discuss further development.
I think we are going to see more agents running on edge devices. The runtimes are getting smaller, the hardware is getting better. A 2.8 MB agent on your wrist is just the start.
Is the next step SmartRings, an agent on every finger? :D