DEV Community

Dmytro Klymentiev

Posted on • Originally published at klymentiev.com

How an LLM Can Fix Your Posture

I stopped typing three months ago. Not completely, but for most of my work, I just talk.

The setup: I speak into my phone, the text appears on my computer wherever the cursor is. No copy-paste, no switching windows. I say a sentence, it gets typed. I press Enter.

This is how I write this article right now.


The problem

I'm a system engineer running a home server with dozens of services, AI agents, dashboards. I spend 5-7 hours a day at my workstation after my full-time job. Most of that time goes to typing: commands, prompts, messages, notes.

My hands get tired. My back hurts from hunching over the keyboard. And the worst part: typing is the bottleneck between thinking and doing.

I wanted to give instructions the way I'd talk to a colleague. By speaking.


How it actually works

The solution turned out to be embarrassingly simple:

  1. Android app sends recognized text over WiFi to my workstation
  2. Workstation service receives the text and types it into the active cursor position
  3. That's it. No cloud. No server processing. No Whisper.
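
The two hops above can be sketched end to end. This is a minimal loopback demo under assumptions of my own (plain TCP instead of the app's WebSocket, newline-delimited messages, an OS-assigned port), not the actual app code:

```python
# Sketch: a "phone" client sends recognized text over the local network,
# a "workstation" server receives it. Plain TCP is used here for brevity;
# the real setup speaks WebSocket. Messages are newline-delimited.
import socket
import threading

def start_workstation(received: list):
    """Accept one connection and collect newline-delimited utterances."""
    srv = socket.create_server(("127.0.0.1", 0))  # OS-assigned free port
    port = srv.getsockname()[1]

    def run():
        conn, _ = srv.accept()
        with conn, srv:
            for line in conn.makefile("r", encoding="utf-8"):
                received.append(line.rstrip("\n"))

    t = threading.Thread(target=run, daemon=True)
    t.start()
    return port, t

def phone_send(port: int, utterances):
    """Send each recognized utterance as one line, as the app would."""
    with socket.create_connection(("127.0.0.1", port)) as conn:
        for text in utterances:
            conn.sendall((text + "\n").encode("utf-8"))

received: list = []
port, t = start_workstation(received)
phone_send(port, ["ls -la", "git status"])
t.join(timeout=2)
```

The only state on the workstation side is an open socket; everything else is the OS-level typing step, covered below.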

The key insight: Android's built-in speech recognition is better than anything else I tried.

I experimented with Whisper (multiple model sizes), Faster Whisper, Vosk, and several other libraries. They all had problems. Whisper small was too slow on CPU, taking 3-4 seconds per utterance. Whisper medium ate 4GB of RAM and was still slower than real time. Faster Whisper improved speed, but accuracy with mixed Russian/English was poor. Vosk worked offline, but the models were huge and recognition quality was inconsistent.

Android's native speech-to-text just works. It's fast, it's accurate, it runs on the phone's hardware, and it handles language switching naturally. Google has spent billions optimizing on-device recognition. I can't compete with that on a single server.


The workflow

My phone sits on the desk next to me. When I want to "type" something:

  1. Open the app (or it's already open)
  2. Speak naturally, text appears in real-time on my phone screen
  3. The text gets transmitted over WiFi to my workstation
  4. It's inserted wherever my cursor is: terminal, browser, IDE, chat
  5. I hit Enter (on the phone or keyboard)

Language switching: Android auto-detects language from phonemes. I use three languages daily -- English, Russian, Ukrainian -- and it switches between them naturally.


What changed

My productivity increased dramatically. Tasks that involved writing prompts, commit messages, or documentation now take about a third of the time they used to. The bottleneck shifted from typing to thinking, which is where it should be.

The physical change was just as significant. I have a motorized standing desk. Before voice input, I rarely used the standing position because typing while standing is uncomfortable: your wrists sit at a weird angle, and the keyboard feels too low or too high.

Now I work standing half the day. Just talking.

The irony is that, as a system engineer, I fixed my posture not with ergonomics advice but by building a voice tool.


Technical details

Android app: Kotlin, uses Android's SpeechRecognizer API. Connects to the workstation via WebSocket over the local network. Sends recognized text as plain string messages. The app stays in foreground with a persistent notification so Android doesn't kill the WebSocket connection.

Workstation service: Lightweight Python process, about 80 lines of code. Receives WebSocket messages, uses xdotool (Linux) to type the text at the current cursor position. Simulates keyboard input at the OS level, so it works with any application.
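
The typing step can be sketched like this. It assumes xdotool is on PATH; the flag set shown (`type --clearmodifiers --delay`) is a common invocation, not necessarily the exact one my service uses:

```python
# Sketch of the OS-level typing step: hand the received text to xdotool,
# which simulates keystrokes into whatever window has focus (X11 only).
import shutil
import subprocess

def build_type_command(text: str, delay_ms: int = 0) -> list:
    """Build the xdotool invocation that types `text` at the cursor.
    --clearmodifiers prevents a held Shift/Ctrl from corrupting the text;
    --delay is the per-keystroke delay in milliseconds;
    -- stops option parsing so text starting with '-' is typed literally."""
    return ["xdotool", "type", "--clearmodifiers",
            "--delay", str(delay_ms), "--", text]

def type_at_cursor(text: str) -> None:
    """Simulate keyboard input at the OS level, so any app accepts it."""
    if shutil.which("xdotool") is None:
        raise RuntimeError("xdotool not found; is this an X11 session?")
    subprocess.run(build_type_command(text), check=True)
```

Because this injects events at the display-server level rather than into a specific application, the terminal, browser, IDE, and chat clients all receive it the same way.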

Network: Pure local WiFi. Phone and workstation on the same network. Latency under 50ms. No internet required. Total round-trip from speech end to text appearing on screen is about 200ms.
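
To get a feel for how small the transport cost is, a round trip can be timed; this loopback measurement is illustrative only (a real WiFi hop is slower, and speech recognition itself dominates the 200ms figure):

```python
# Time one message round trip over a loopback TCP connection.
# This measures only the transport, not speech recognition.
import socket
import time

srv = socket.create_server(("127.0.0.1", 0))  # OS-assigned free port
port = srv.getsockname()[1]

cli = socket.create_connection(("127.0.0.1", port))  # "phone" side
conn, _ = srv.accept()                               # "workstation" side

start = time.perf_counter()
cli.sendall(b"hello from the phone\n")
echoed = conn.recv(1024)   # workstation receives the text
conn.sendall(echoed)       # and acks it back
ack = cli.recv(1024)
rtt_ms = (time.perf_counter() - start) * 1000

cli.close(); conn.close(); srv.close()
print(f"loopback round trip: {rtt_ms:.2f} ms")
```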


What I use it for daily

  • Talking to Claude. About 60% of all voice input. I dictate prompts, describe bugs, give instructions.
  • Writing notes and worklogs. I used to skip writing them because it felt tedious. Now I just say what I did.
  • Git commit messages. My commits got longer and more descriptive since I stopped typing them.
  • Slack and Telegram messages. Faster than thumb-typing on phone.
  • Documentation. Like this article.

What doesn't work great

Code. I don't dictate code. Variable names, brackets, indentation. Voice is terrible for this. But honestly, I haven't written code manually in three months either -- Claude Code writes it for me. I dictate the intent, the model writes the code. The keyboard limitation stopped mattering.

Noisy environments. Works great in my home office. Drops accuracy significantly with background noise.

Technical terms. When I say "xdotool" or "kubectl", Android has no idea what I mean. I keep a dictionary of corrections for terms I use often, but for these I just type.
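
The corrections dictionary can be as simple as a mapping from common misrecognitions to the intended term. The entries below are hypothetical examples, not my actual list:

```python
# Hypothetical misrecognition -> intended-term pairs.
import re

CORRECTIONS = {
    "x do tool": "xdotool",
    "cube control": "kubectl",
    "cube cuttle": "kubectl",
    "get commit": "git commit",
}

def apply_corrections(text: str, table: dict = CORRECTIONS) -> str:
    """Replace known misrecognitions, case-insensitively.
    Longer keys are applied first so multi-word entries win
    over any shorter entries they might contain."""
    for wrong in sorted(table, key=len, reverse=True):
        text = re.sub(re.escape(wrong), table[wrong], text,
                      flags=re.IGNORECASE)
    return text
```

Running the received text through a pass like this before it is typed catches the frequent offenders; anything rarer is cheaper to fix by hand.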


Why local-only matters

No API keys, no prompts leaving my network, no subscription, no account dependency. The entire system lives on my server -- I own the data, the latency, the uptime.


Was it worth building?

It took a weekend to build the first working version. Three months later, I use it every single day.

Total cost: one weekend of coding, zero ongoing costs. The phone I already had. The WiFi network I already had. Android's speech recognition is free.

Sometimes the most impactful tool isn't the most complex one. It's the one that removes friction from what you already do hundreds of times a day.

I type less. I think more. I stand up.


