Offline voice dictation on Linux, without the cloud
Voice dictation is useful, but most people encounter it only through cloud services from Google, Apple, Microsoft, or other commercial vendors that send audio off your machine.
That is convenient, but it is not always ideal.
Sometimes you want dictation that works locally. Sometimes you are writing code, notes, private messages, medical text, research ideas, or internal company material. Sometimes you simply do not want every voice input to depend on an internet connection, an API key, or a subscription.
That is why I built YazSes.
YazSes is an open-source, offline voice-dictation tool for Linux. You hold a key, speak, release, and your words are transcribed locally with faster-whisper and typed into the focused app.
No cloud.
No API key.
No subscription.
Audio stays on your machine.
Repo: https://github.com/MSKazemi/yazses
Site: https://mskazemi.github.io/yazses/
The problem: most dictation is cloud-first
Voice typing has become common, but the default experience is usually cloud-based.
For many users, that is fine. For others, it is a problem:
- You may not want private speech sent to a third-party service.
- You may want dictation that keeps working offline.
- You may want a tool that is scriptable, inspectable, and open source.
- You may want voice input for Linux, terminals, editors, and custom workflows.
Linux users in particular often have fewer polished dictation options than users on macOS or Windows. For developers, accessibility users, and privacy-conscious users, “just use the cloud dictation built into something else” is not always enough.
What YazSes does
YazSes is designed around a simple workflow:
- Hold a key.
- Speak.
- Release.
- The text appears in the app you are using.
The transcription runs locally using faster-whisper. The output is then inserted into the currently focused app, so it can work with editors, terminals, browsers, chat apps, notes, and other everyday tools.
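For readers who have not used faster-whisper directly, the local transcription step can be sketched roughly as follows. This is an illustration, not YazSes's actual code: the helper names `join_segments` and `transcribe_file`, the "small" model size, and the CPU/int8 settings are all assumptions. Only `WhisperModel` and its `transcribe` method come from the faster-whisper library.

```python
from types import SimpleNamespace  # only used for the offline demo at the bottom


def join_segments(segments):
    """Join the text of transcription segments into one clean string."""
    parts = (seg.text.strip() for seg in segments)
    return " ".join(p for p in parts if p)


def transcribe_file(path, model_size="small"):
    """Transcribe an audio file locally (hypothetical helper, not YazSes's API)."""
    # Imported lazily so the rest of this module works without the dependency.
    from faster_whisper import WhisperModel

    # Assumed settings: CPU inference with int8 quantization.
    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    segments, _info = model.transcribe(path)
    return join_segments(segments)


if __name__ == "__main__":
    # Stand-in segments, so the demo runs without a model or an audio file.
    fake = [SimpleNamespace(text=" hello"), SimpleNamespace(text=" world ")]
    print(join_segments(fake))
```

Note that faster-whisper normally fetches model weights the first time a given model size is used, so a fully offline setup would need the model cached in advance.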
The goal is not to build a big AI assistant. YazSes is intentionally smaller and more predictable:
It is offline dictation plus voice commands. It is not an LLM agent.
That distinction matters. YazSes does not browse your files, reason over your project, or take autonomous actions. It listens when you trigger it, transcribes locally, and can map specific spoken commands to specific actions.
Install and quick setup
For Python users, the basic install is:
pipx install yazses
On Linux, you can also use Snap:
sudo snap install yazses
A typical first setup flow is:
yazses doctor
yazses enroll
yazses start
The idea is:
- doctor checks your system setup.
- enroll helps configure your voice/input workflow.
- start runs the dictation daemon.
After that, you can hold the trigger key, speak, and release to type into the current app.
Voice commands and macros
Plain dictation is useful, but developers and power users often need more than text.
YazSes also supports voice commands for common editor and terminal workflows. For example, spoken phrases like these can map to actions:
- “undo that”
- “save file”
- “go to line 42”
- “run the tests”
- “rename this to user_id”
This is useful because voice input is not only about writing paragraphs. Sometimes you want to control repetitive editing actions without leaving your flow.
YazSes also supports macros and personal vocabulary, so the tool can become more useful for your own workflow over time.
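To make the idea of mapping phrases to actions concrete, here is a minimal sketch of a command grammar built on regular expressions. It is not how YazSes implements commands; the `COMMANDS` table, the action names, and `parse_utterance` are hypothetical and exist only for illustration. Anything that does not match a known command falls through as plain text to type.

```python
import re

# Hypothetical grammar: each pattern maps to an action name; named groups
# become the action's arguments.
COMMANDS = [
    (re.compile(r"undo that"), "undo"),
    (re.compile(r"save file"), "save"),
    (re.compile(r"go to line (?P<line>\d+)"), "goto_line"),
    (re.compile(r"run the tests"), "run_tests"),
    (re.compile(r"rename this to (?P<name>\w+)"), "rename"),
]


def parse_utterance(text):
    """Return (action, args) for a known command, else ("type", {"text": text})."""
    # Speech models tend to add capitalization and a trailing period.
    normalized = text.strip().lower().rstrip(".!?")
    for pattern, action in COMMANDS:
        match = pattern.fullmatch(normalized)
        if match:
            return action, match.groupdict()
    return "type", {"text": text}


if __name__ == "__main__":
    print(parse_utterance("Go to line 42."))
    print(parse_utterance("Hello there"))
```

Requiring a full match keeps ordinary sentences that merely contain a command phrase from triggering an action, which fits the goal of predictable behavior.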
How it works
At a high level, the pipeline looks like this:
Trigger key
↓
Record audio while held
↓
Voice activity / endpoint handling
↓
Local faster-whisper transcription
↓
Command grammar or plain text
↓
Type into the focused app
The important design choice is that the speech-to-text step happens locally. Your audio does not need to be uploaded to a cloud service to become text.
This makes YazSes useful for:
- private notes
- coding sessions
- terminal workflows
- accessibility experiments
- offline environments
- privacy-focused Linux setups
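The hold, speak, release part of the pipeline above can be sketched as a small state machine. This is a simplified illustration under stated assumptions, not the YazSes implementation: the `PushToTalk` class and its method names are hypothetical, the transcriber and output step are injected as plain callables, and voice-activity detection and command routing are left out.

```python
class PushToTalk:
    """Buffer audio while the trigger is held, then hand it off on release."""

    def __init__(self, transcribe, emit):
        self._transcribe = transcribe  # callable: bytes -> str
        self._emit = emit              # callable: str -> None (e.g. type the text)
        self._chunks = []
        self._recording = False

    def on_press(self):
        # Start a fresh recording each time the key goes down.
        self._chunks = []
        self._recording = True

    def on_audio(self, chunk):
        # Audio arriving while the key is up is dropped, never stored.
        if self._recording:
            self._chunks.append(chunk)

    def on_release(self):
        if not self._recording:
            return
        self._recording = False
        audio = b"".join(self._chunks)
        self._chunks = []
        if audio:
            text = self._transcribe(audio).strip()
            if text:
                self._emit(text)


if __name__ == "__main__":
    typed = []
    ptt = PushToTalk(lambda audio: f"{len(audio)} bytes heard", typed.append)
    ptt.on_press()
    ptt.on_audio(b"ab")
    ptt.on_audio(b"cd")
    ptt.on_release()
    print(typed)
```

Injecting the transcriber and the output step keeps the key-handling logic testable without a microphone, a model, or a focused window.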
Limitations
YazSes is still evolving, and I want to be honest about the current scope.
First, YazSes is currently Linux-first. The repository also contains macOS and Windows backends and install guides, but those builds are more experimental. I would especially welcome testers on both platforms.
Second, Linux input behavior can depend on your desktop environment. X11 and Wayland can behave differently, especially around global hotkeys and text injection.
Third, YazSes is not an LLM agent. It will not plan tasks, browse your repository, or make decisions for you. It is a local dictation and command tool.
Fourth, accuracy depends on your microphone, environment, model choice, accent, and vocabulary. The goal is to make the setup practical and tunable, but speech recognition is never perfect for everyone out of the box.
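On the X11 versus Wayland point, it helps to know which session type you are running before reporting a hotkey or text-injection problem. The sketch below is a best-effort check based on common environment variables (`XDG_SESSION_TYPE`, `WAYLAND_DISPLAY`, `DISPLAY`). The function name is hypothetical, and this is not necessarily what `yazses doctor` does internally.

```python
import os


def detect_session_type(env=None):
    """Best-effort guess of the display session: 'wayland', 'x11', or 'unknown'."""
    env = os.environ if env is None else env
    # Usually set by the login manager on systemd-based desktops.
    session = env.get("XDG_SESSION_TYPE", "").strip().lower()
    if session in ("wayland", "x11"):
        return session
    # Fallbacks: a Wayland socket name takes priority, because XWayland
    # also sets DISPLAY inside Wayland sessions.
    if env.get("WAYLAND_DISPLAY"):
        return "wayland"
    if env.get("DISPLAY"):
        return "x11"
    return "unknown"


if __name__ == "__main__":
    print(detect_session_type())
```

Including this value in a bug report makes desktop-specific behavior much easier to reproduce.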
Why I made it open source
I wanted a tool that was useful, inspectable, and privacy-friendly.
Voice input is too important to be locked behind cloud-only systems. It can help with productivity, accessibility, fatigue, coding, writing, and everyday computer use.
For me, the ideal version of this tool is:
- simple enough to trust
- local by default
- useful on Linux
- friendly to developers
- helpful for accessibility users
- extensible with commands and macros
That is the direction I am trying to take YazSes.
Try it
Repository:
https://github.com/MSKazemi/yazses
Project site:
https://mskazemi.github.io/yazses/
Install:
pipx install yazses
Or on Linux with Snap:
sudo snap install yazses
I would really appreciate feedback, especially on:
- transcription accuracy
- setup problems
- Linux desktop compatibility
- X11 vs Wayland behavior
- useful voice commands
- accessibility use cases
- macOS and Windows testing
If the project is useful to you, a GitHub star, issue, or test report would help a lot.