Ondrej Machala

Posted on Mar 28 • Edited on Apr 20

I Stopped Typing to My AI Agent. I Talk to It Through Telegram Now.

#productivity #selfhosted #ios #ai

I have NanoClaw connected to my Telegram. Throughout the day I send it things. Translate this. Summarise that article. What time is it in Tokyo. Draft a reply to this message. It responds in the same thread, without me leaving the app.

It runs on my own machine, inside a container. The agent only has access to what you explicitly give it. Setup took about fifteen minutes: clone the repo, run Claude Code, type /setup. NanoClaw also bridges WhatsApp, Slack, Discord and Gmail, but Telegram is where I live.

What I didn't anticipate: I was still typing everything. Long questions. Multi-sentence requests. Context I had to spell out carefully. The assistant was right there in Telegram, but typing was still the bottleneck.

Where Diction comes in

I built Diction to fix exactly this. It's an iOS keyboard extension. In Telegram, you switch to it, tap the mic, speak, and the text appears in the compose field. Send it like any message.

I dictate to NanoClaw now. Things that would take a minute to type take ten seconds to say. NanoClaw gets the same message either way.

What I actually dictate

A normal day, real things I send:

"Draft a friendly but firm reply to this email. Keep it under three paragraphs and don't agree to the deadline."
"Translate the attached PDF from German to English. Just the first page for now."
"Summarise this article in five bullets. Strip the opinion parts, I only want the facts."
"What's 2,450 euros a month over 30 years at 4.5% interest?"
"I'm meeting with a lead in an hour. Draft three opening questions based on what we know about their company."

Each of those would take 30–60 seconds to type cleanly. Dictated, they're done in 10–15. Over a day, the savings add up.
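
To put rough numbers on that, here is a back-of-the-envelope sketch. The per-prompt timings are the ranges quoted above; the figure of 20 prompts a day is an assumption for illustration, not something I measured.

```python
# Rough daily time saved by dictating instead of typing.
# Timings are the ranges quoted in the post; PROMPTS_PER_DAY is an assumed figure.

TYPING_SECONDS = (30, 60)     # min, max seconds to type one prompt
DICTATING_SECONDS = (10, 15)  # min, max seconds to dictate one prompt
PROMPTS_PER_DAY = 20          # assumption, not from the post


def daily_savings_minutes(typing, dictating, prompts_per_day):
    """Return (low, high) minutes saved per day."""
    # Smallest saving: fastest typing versus slowest dictation.
    low = (typing[0] - dictating[1]) * prompts_per_day / 60
    # Largest saving: slowest typing versus fastest dictation.
    high = (typing[1] - dictating[0]) * prompts_per_day / 60
    return low, high


low, high = daily_savings_minutes(TYPING_SECONDS, DICTATING_SECONDS, PROMPTS_PER_DAY)
print(f"Saved per day: {low:.0f} to {high:.0f} minutes")
```

With those inputs the estimate lands between roughly 5 and 17 minutes a day. Change PROMPTS_PER_DAY to match your own usage.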

A real loop, start to finish

Yesterday afternoon, waiting for a train, phone in one hand:

Me (into Diction): "Pull up my last three Home Assistant automations and summarise what each one does. One line each."

NanoClaw, a few seconds later: three numbered bullets with the automations and a one-line description of what each triggers on.

Me (into Diction again): "The first one fires too often. Suggest two ways to throttle it without losing the alerts I actually care about."

NanoClaw: a short reply with two concrete throttling strategies, the YAML shape for each, and the trade-off between them.
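
The exact reply isn't reproduced here. As a generic illustration of what throttling a noisy trigger without dropping the important alerts can look like, here is a minimal Python sketch of two common approaches: a cooldown that critical events can bypass, and a cap on events per time window. The class names and the `critical` flag are hypothetical and are not Home Assistant APIs.

```python
from collections import deque


class Cooldown:
    """Allow at most one event per `seconds`, except events marked critical."""

    def __init__(self, seconds):
        self.seconds = seconds
        self.last_fired = None  # timestamp of the last allowed non-critical event

    def allow(self, now, critical=False):
        if critical:
            return True  # important alerts always pass
        if self.last_fired is None or now - self.last_fired >= self.seconds:
            self.last_fired = now
            return True
        return False


class WindowLimit:
    """Allow at most `max_events` within any sliding window of `seconds`."""

    def __init__(self, max_events, seconds):
        self.max_events = max_events
        self.seconds = seconds
        self.fired = deque()  # timestamps of allowed events

    def allow(self, now):
        # Drop timestamps that have fallen out of the window.
        while self.fired and now - self.fired[0] >= self.seconds:
            self.fired.popleft()
        if len(self.fired) < self.max_events:
            self.fired.append(now)
            return True
        return False


# Usage: timestamps are in seconds; the second element marks a critical event.
cooldown = Cooldown(seconds=300)
events = [(0, False), (60, False), (90, True), (400, False)]
decisions = [cooldown.allow(t, critical=c) for t, c in events]
```

The cooldown is the simpler of the two, but it silences every non-critical event for a fixed period. The window cap tolerates short bursts at the cost of keeping a little more state.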

Total hands-on time: about thirty seconds. Zero typing. The whole exchange happened in a normal Telegram thread I could scroll back through later — no separate app, no cloud transcription, no context lost between turns.

Why on-device mode specifically

NanoClaw is self-hosted for a reason: you don't want a cloud provider logging every private request you make to your agent. But if your dictation tool sends every spoken prompt to its cloud first, you've reintroduced the exact problem NanoClaw exists to solve.

Diction's on-device mode runs Whisper locally on your iPhone, so no audio leaves the device. The spoken prompt is transcribed on the phone, the text lands in Telegram, and only then does it travel to your NanoClaw container, over whatever transport you chose. For a setup where the whole point is controlling where your data goes, the voice layer has to match.

Honest limits

On-device Whisper is good but not flawless. Noisy environments hurt accuracy. Some technical jargon comes back wrong: model names, library names, specific API endpoints. I keep a mental list of terms to re-dictate or correct by hand when they matter.

You also need to grant Full Access to the Diction keyboard for the mic to work, which is a standard iOS keyboard-extension constraint. Nothing unusual, just something to expect during setup.

The setup, end to end

NanoClaw:

Clone github.com/qwibitai/nanoclaw
Run Claude Code in the repo directory
Type /setup — Claude Code handles everything
Connect your Telegram bot token when prompted
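
Before pasting the bot token into the setup prompt, it can help to confirm that Telegram accepts it. The sketch below calls the Telegram Bot API's getMe method using only the Python standard library. It is independent of NanoClaw, and the function names and placeholder token are illustrative assumptions.

```python
import json
import urllib.request

API_BASE = "https://api.telegram.org"


def get_me_url(token):
    """Build the Bot API URL for the getMe method."""
    return f"{API_BASE}/bot{token}/getMe"


def check_token(token, timeout=10):
    """Return the bot's username if Telegram accepts the token.

    Raises urllib.error.HTTPError if Telegram rejects the request,
    which is what happens with an invalid token.
    """
    with urllib.request.urlopen(get_me_url(token), timeout=timeout) as resp:
        data = json.load(resp)
    if not data.get("ok"):
        raise RuntimeError(f"Telegram returned an error: {data}")
    return data["result"]["username"]


# Usage (needs network access and a real token from @BotFather):
# print(check_token("123456:REPLACE_WITH_YOUR_TOKEN"))
```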

Diction:

Install from the App Store
Settings → General → Keyboard → Keyboards → Add New Keyboard → Diction
Grant Full Access
In Diction's settings, switch to On-Device mode
Switch to Diction in Telegram, tap the mic

That's the whole stack. A personal assistant on your own hardware, voice input on your own device, no audio leaking to someone else's cloud on the way through.

NanoClaw on GitHub | Diction on the App Store

Top comments (1)

Snailflyer • May 21

Interesting stack. The voice layer solves input cost, but the boundary I keep separating is async chat channel vs live session control. If Claude/Codex is already running in a tmux session on the host, I want the phone/browser to attach to that same process for compact output, short input, approve/interrupt, and handoff.

That is the narrow lane I am building in Faryo: github.com/Snailflyer/faryo