SpeakShift: A Fully Local Desktop App Powered by Whisper.cpp + NLLB + FFmpeg

#localllm #whisper #ai #productivity

SpeakShift: Fully Local Whisper.cpp + NLLB Translation + FFmpeg Media Converter

Hi DEV Community 👋

Like many of you, I spend a lot of time working with audio and video — podcasts, meetings, lectures, interviews, and content creation. I wanted a fast, private, and fully offline workflow, but kept jumping between different tools for media conversion, transcription, and translation.

So I built SpeakShift — a clean, focused desktop application that brings everything together using battle-tested local technologies.

Core Technologies

Whisper.cpp: fast local speech-to-text (supports the tiny, base, small, medium, large-v3, and large-v3-turbo models)
NLLB (No Language Left Behind): neural machine translation for multilingual transcripts
FFmpeg: the backend for media conversion and processing
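
To give a rough idea of how FFmpeg and Whisper.cpp fit together, here is a minimal sketch of the conversion step that typically comes before transcription. This is not SpeakShift's actual code. It assumes the whisper.cpp command-line convention of 16 kHz, mono, 16-bit PCM WAV input; the helper names `build_ffmpeg_args` and `convert_for_whisper` and the `_16k.wav` naming are hypothetical, and running the conversion requires `ffmpeg` on your PATH.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_args(src: str, dst: str) -> list[str]:
    """Build an ffmpeg command that converts a media file to 16 kHz mono 16-bit PCM WAV."""
    return [
        "ffmpeg",
        "-y",                 # overwrite the output file if it already exists
        "-i", src,            # input: any audio or video file ffmpeg can read
        "-vn",                # ignore video streams, keep audio only
        "-ar", "16000",       # resample to 16 kHz
        "-ac", "1",           # downmix to mono
        "-c:a", "pcm_s16le",  # encode as 16-bit little-endian PCM
        dst,
    ]


def convert_for_whisper(src: str) -> Path:
    """Run the conversion and return the new WAV path (hypothetical helper)."""
    source = Path(src)
    # Write next to the source under a new name so a .wav input is never overwritten.
    dst = source.with_name(source.stem + "_16k.wav")
    subprocess.run(build_ffmpeg_args(src, str(dst)), check=True)
    return dst
```

Calling `convert_for_whisper("interview.mp4")` would write `interview_16k.wav` beside the original, ready to hand to a local Whisper model.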

What SpeakShift Does

Convert media files (video → audio, format changes, trimming, etc.)
Transcribe audio/video locally with high accuracy
Translate transcripts
Organize your files and transcripts in a clean library
Export in multiple formats (TXT, SRT, JSON, etc.)
Identify speakers with diarization (up to 4 speakers, Pro version only)
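
For readers curious what the SRT export involves, here is a small sketch of turning timed transcript segments into SRT cues. It is an illustration only, not SpeakShift's implementation: the `(start, end, text)` tuple layout and the function names `srt_timestamp` and `to_srt` are assumptions made for the example.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[tuple[float, float, str]]) -> str:
    """Render (start, end, text) segments as SRT cues numbered from 1."""
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(start)} --> {srt_timestamp(end)}\n"
            f"{text.strip()}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(cues)


if __name__ == "__main__":
    print(to_srt([(0.0, 2.5, "Hello everyone."), (2.5, 4.0, "Welcome back.")]))
```

Working in whole milliseconds avoids rounding a value like 1.9996 seconds up to an invalid `,1000` millisecond field.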

Everything runs 100% locally — no cloud, no API keys, no data leaving your machine.

Platforms & Performance

Windows
macOS (Apple Silicon optimized)
Linux

It runs well even on modest hardware, and is noticeably faster on machines with a capable CPU or GPU.

Pricing (Transparent & Fair)

Free version: fully functional for media conversion and basic transcription. This will stay free forever.
Pro (one-time purchase): unlocks batch processing, speaker diarization, advanced translation, priority support, and future features.

Try the Free Version:

👉 https://usefulthings.gumroad.com/l/bzris

Full Product Page:

👉 https://usefulthings.gumroad.com/l/speakshift

Why I Built This

The LocalLLM community has shown that many developers and power users want control, privacy, and speed without monthly subscriptions. SpeakShift was built with that mindset.

I’d love to hear your thoughts:

What Whisper model do you use most?
What features are missing for your workflow?
Any pain points with current local transcription tools?

Feedback, bug reports, and feature requests are more than welcome.

If you work with audio, content, research, or local AI tools, give it a try and let me know how it goes.