mohsen

Posted on Jun 22

Offline voice dictation on Linux, without the cloud

#automation #ai #programming #productivity


Voice dictation is incredibly useful, but most people only meet it through cloud services: Google, Apple, Microsoft, or commercial tools that send audio away from your machine.

That is convenient, but it is not always ideal.

Sometimes you want dictation that works locally. Sometimes you are writing code, notes, private messages, medical text, research ideas, or internal company material. Sometimes you simply do not want every voice input to depend on an internet connection, an API key, or a subscription.

That is why I built YazSes.

YazSes is an open-source, offline voice-dictation tool for Linux. You hold a key, speak, release, and your words are transcribed locally with faster-whisper and typed into the focused app.

No cloud.
No API key.
No subscription.
Audio stays on your machine.

Repo: https://github.com/MSKazemi/yazses
Site: https://mskazemi.github.io/yazses/

The problem: most dictation is cloud-first

Voice typing has become common, but the default experience is usually cloud-based.

For many users, that is fine. For others, it is a problem:

You may not want private speech sent to a third-party service.
You may want dictation that keeps working offline.
You may want a tool that is scriptable, inspectable, and open source.
You may want voice input for Linux, terminals, editors, and custom workflows.

Linux users in particular tend to have fewer polished dictation options than users on macOS or Windows. For developers, accessibility users, and privacy-conscious users, “just use the cloud dictation built into something else” is not always enough.

What YazSes does

YazSes is designed around a simple workflow:

Hold a key.
Speak.
Release.
The text appears in the app you are using.

The transcription runs locally using faster-whisper. The output is then inserted into the currently focused app, so it can work with editors, terminals, browsers, chat apps, notes, and other everyday tools.

The goal is not to build a big AI assistant. YazSes is intentionally smaller and more predictable:

It is offline dictation plus voice commands. It is not an LLM agent.

That distinction matters. YazSes does not browse your files, reason over your project, or take autonomous actions. It listens when you trigger it, transcribes locally, and can map specific spoken commands to specific actions.

Install and quick setup

For Python users, the basic install is:

pipx install yazses

On Linux, you can also use Snap:

sudo snap install yazses

A typical first setup flow is:

yazses doctor
yazses enroll
yazses start

The idea is:

doctor checks your system setup.
enroll helps configure your voice/input workflow.
start runs the dictation daemon.

After that, you can hold the trigger key, speak, and release to type into the current app.

Voice commands and macros

Plain dictation is useful, but developers and power users often need more than text.

YazSes also supports voice commands for common editor and terminal workflows. For example, phrases can map to actions such as:

“undo that”
“save file”
“go to line 42”
“run the tests”
“rename this to user_id”

This is useful because voice input is not only about writing paragraphs. Sometimes you want to control repetitive editing actions without leaving your flow.

YazSes also supports macros and personal vocabulary, so the tool can become more useful for your own workflow over time.
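To make the idea of mapping phrases to actions concrete, here is a minimal sketch of a phrase matcher. It is not YazSes's actual grammar: the pattern table, the action names (`undo`, `goto_line`, and so on), and the `parse_command` helper are all hypothetical, chosen only to mirror the example phrases above.

```python
from __future__ import annotations

import re

# Hypothetical phrase patterns and action names; the real YazSes grammar may differ.
COMMAND_PATTERNS = [
    (re.compile(r"undo that"), "undo"),
    (re.compile(r"save file"), "save_file"),
    (re.compile(r"go to line (?P<line>\d+)"), "goto_line"),
    (re.compile(r"run the tests"), "run_tests"),
    (re.compile(r"rename this to (?P<name>\w+)"), "rename"),
]


def normalize(transcript: str) -> str:
    """Trim whitespace, drop trailing sentence punctuation, and lowercase."""
    return transcript.strip().rstrip(".!?").strip().lower()


def parse_command(transcript: str) -> tuple[str, dict[str, str]] | None:
    """Return (action, arguments) for a known phrase, or None for plain dictation."""
    text = normalize(transcript)
    for pattern, action in COMMAND_PATTERNS:
        # fullmatch means a phrase only counts as a command when it is the whole utterance.
        match = pattern.fullmatch(text)
        if match:
            return action, match.groupdict()
    return None


if __name__ == "__main__":
    for phrase in ["Go to line 42.", "Save file", "I will save the file later"]:
        print(repr(phrase), "->", parse_command(phrase))
```

Requiring the whole utterance to match keeps ordinary sentences that merely contain a command phrase from triggering an action. Anything that returns `None` would simply be typed as text.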

How it works

At a high level, the pipeline looks like this:

Trigger key
   ↓
Record audio while held
   ↓
Voice activity / endpoint handling
   ↓
Local faster-whisper transcription
   ↓
Command grammar or plain text
   ↓
Type into the focused app

The important design choice is that the speech-to-text step happens locally. Your audio does not need to be uploaded to a cloud service to become text.
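The local transcription step can be sketched with faster-whisper's `WhisperModel` class. This is an illustration under stated assumptions, not YazSes's code: the `small` model, CPU with `int8` compute, the WAV file path, and the `dictate`, `join_segments`, and `transcribe_locally` helpers are all hypothetical, and recording and text injection are left as pluggable callables.

```python
from typing import Callable, Iterable


def join_segments(texts: Iterable[str]) -> str:
    """Merge per-segment text into one string separated by single spaces."""
    return " ".join(part.strip() for part in texts if part.strip())


def transcribe_locally(wav_path: str, model_size: str = "small") -> str:
    """Transcribe a recorded clip on this machine with faster-whisper."""
    # Imported lazily so the rest of the sketch runs without the package installed.
    from faster_whisper import WhisperModel

    # Model size, device, and compute type are illustrative, not YazSes defaults.
    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    # vad_filter=True asks faster-whisper to skip stretches without speech.
    segments, _info = model.transcribe(wav_path, vad_filter=True)
    return join_segments(segment.text for segment in segments)


def dictate(
    wav_path: str,
    transcribe: Callable[[str], str],
    type_text: Callable[[str], None],
) -> str:
    """Run one hold-speak-release cycle: transcribe the clip, then emit the text."""
    text = transcribe(wav_path)
    if text:  # Nothing is typed when the clip contained no recognizable speech.
        type_text(text)
    return text


if __name__ == "__main__":
    # Stand-in transcriber so the flow can be tried without a microphone or model.
    dictate("clip.wav", lambda _path: join_segments([" hello", " world "]), print)
```

Passing `transcribe_locally` as the `transcribe` argument would run the real model. Note that faster-whisper typically downloads model weights the first time a given model size is used, so a fully offline setup needs the model cached beforehand.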

This makes YazSes useful for:

private notes
coding sessions
terminal workflows
accessibility experiments
offline environments
privacy-focused Linux setups

Limitations

YazSes is still evolving, and I want to be honest about the current scope.

First, YazSes is Linux-first right now. The repository also contains macOS and Windows backends/install guides, but those builds are still more experimental. I would especially welcome testers on macOS and Windows.

Second, Linux input behavior can depend on your desktop environment. X11 and Wayland can behave differently, especially around global hotkeys and text injection.
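When reporting a hotkey or text-injection problem, it helps to know which session type you are running. The sketch below guesses it from the standard `XDG_SESSION_TYPE`, `WAYLAND_DISPLAY`, and `DISPLAY` environment variables. The `detect_session_type` helper is hypothetical and is not how `yazses doctor` is known to work.

```python
import os
from typing import Mapping


def detect_session_type(env: Mapping[str, str] = os.environ) -> str:
    """Return "wayland", "x11", or "unknown" based on common session variables."""
    declared = env.get("XDG_SESSION_TYPE", "").strip().lower()
    if declared in ("wayland", "x11"):
        return declared
    # Fall back to display variables when XDG_SESSION_TYPE is missing or something else.
    # WAYLAND_DISPLAY is checked first because XWayland also sets DISPLAY on Wayland.
    if env.get("WAYLAND_DISPLAY"):
        return "wayland"
    if env.get("DISPLAY"):
        return "x11"
    return "unknown"


if __name__ == "__main__":
    print("Session type:", detect_session_type())
```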

Third, YazSes is not an LLM agent. It will not plan tasks, browse your repository, or make decisions for you. It is a local dictation and command tool.

Fourth, accuracy depends on your microphone, environment, model choice, accent, and vocabulary. The goal is to make the setup practical and tunable, but speech recognition is never perfect for everyone out of the box.

Why I made it open source

I wanted a tool that was useful, inspectable, and privacy-friendly.

Voice input is too important to be locked behind cloud-only systems. It can help with productivity, accessibility, fatigue, coding, writing, and everyday computer use.

For me, the ideal version of this tool is:

simple enough to trust
local by default
useful on Linux
friendly to developers
helpful for accessibility users
extensible with commands and macros

That is the direction I am trying to take YazSes.

Try it

Repository:

https://github.com/MSKazemi/yazses

Project site:

https://mskazemi.github.io/yazses/

Install:

pipx install yazses

Or on Linux with Snap:

sudo snap install yazses

I would really appreciate feedback, especially on:

transcription accuracy
setup problems
Linux desktop compatibility
X11 vs Wayland behavior
useful voice commands
accessibility use cases
macOS and Windows testing

If the project is useful to you, a GitHub star, issue, or test report would help a lot.
