Thế Hùng

Posted on May 21 • Edited on Jun 30

I built Voice2Sub: a local AI subtitle generator and subtitle editor for video and audio

#ai #desktop #transcription #subtitles

I built Voice2Sub because many subtitle and transcription workflows still start with uploading a media file to a browser-based tool.

That can be convenient for short public videos. But it becomes awkward when the file is long, private, stored locally, or part of a recurring editing workflow.

Voice2Sub focuses on a local-first desktop workflow:

Import a local video or audio file
Generate subtitles or transcript text with Whisper-based AI speech recognition
Review the generated text
Adjust subtitle timing when needed
Export SRT, VTT, TXT, LRC or CSV

Website: https://voice2sub.pro.vn/
Download: https://voice2sub.pro.vn/download
Subtitle editor: https://voice2sub.pro.vn/subtitle-editor
GitHub release notes: https://github.com/thehungngo/Voice2Sub

Why I built it as a desktop app

A lot of creators, educators, podcasters, editors and content teams work with media that they do not always want to upload to an online transcription service.

Examples:

private interviews
long lectures
course recordings
podcasts
internal meetings
YouTube or TikTok editing workflows
archived audio/video files
client or team content that should stay local

A desktop app gives users more control over the media file, the AI model, the output format and the review workflow.

For this kind of product, transcription is only the first step. The real workflow usually continues with checking names, punctuation, timing, line breaks and final export formats.

What Voice2Sub does

Voice2Sub is a local AI subtitle generator, subtitle editor and speech-to-text desktop app for video and audio files.

It currently focuses on:

generating subtitles from local video/audio files
creating transcript text from speech
reviewing generated subtitles before publishing
editing subtitle text and timing
previewing audio while checking subtitle timing
opening supported subtitle files for review
exporting SRT, VTT, TXT, LRC and CSV
creating optional English subtitle output from supported source-language speech
running on Windows, macOS and Linux
supporting CUDA acceleration on compatible NVIDIA systems
supporting Metal-oriented workflows on Apple Silicon Macs
giving users control over model selection, prompt/context and transcription settings
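The main application source may remain private, so the following is not Voice2Sub's code. It is a minimal Python sketch of how a Whisper-based local pipeline can expose model selection, a prompt and optional English output. It assumes the open-source `openai-whisper` package and ffmpeg are installed. `Cue`, `cues_from_whisper_result` and `transcribe_file` are hypothetical names. `task="translate"` and `initial_prompt` are options of Whisper's own `transcribe()`, and `whisper.load_model()` picks CUDA automatically when PyTorch can see a compatible NVIDIA GPU.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Cue:
    start: float  # seconds from the start of the media
    end: float    # seconds from the start of the media
    text: str


def cues_from_whisper_result(result: dict) -> list[Cue]:
    """Convert the dict returned by openai-whisper's transcribe() into cues."""
    cues = []
    for seg in result.get("segments", []):
        text = seg["text"].strip()
        if text:  # drop segments that contain only whitespace
            cues.append(Cue(float(seg["start"]), float(seg["end"]), text))
    return cues


def transcribe_file(path: str, model_name: str = "base", english: bool = False,
                    prompt: str | None = None) -> list[Cue]:
    """Run Whisper locally on one video/audio file (hypothetical helper)."""
    import whisper  # imported lazily so the helper above works without it

    model = whisper.load_model(model_name)  # e.g. "tiny", "base", "small", ...
    result = model.transcribe(
        path,
        task="translate" if english else "transcribe",  # "translate" emits English text
        initial_prompt=prompt,  # names or terms that can bias recognition
    )
    return cues_from_whisper_result(result)
```

For example, `transcribe_file("interview.mp4", model_name="small", prompt="Voice2Sub, Whisper")` would return a list of `Cue` objects that can then be reviewed and exported. Keeping the conversion step separate from the model call makes it easy to test without loading a model.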

The goal is not to become a full online video editor. Voice2Sub is focused on the subtitle generation and review part of the workflow.

Why not just use an online subtitle generator?

Online tools are convenient, but a desktop workflow is useful when:

the media file is large
the content is private
the user wants to process files locally, again and again
the user wants model control
the user wants common subtitle export formats
the user works across Windows, macOS or Linux
the user wants to review and export subtitle files without moving the whole workflow into a browser

Voice2Sub is built for people who want to generate locally, review carefully, edit when needed and export files that are ready for publishing, learning, documentation or content creation.

What I learned while building it

The AI model is only one part of a desktop AI product.

A practical desktop AI tool also needs:

reliable model downloads
offline and interrupted download handling
safe retry/resume behavior
cross-platform packaging
clear error messages
GPU acceleration setup
update reliability
localization
clean export formats
a review workflow after generation
a first-run experience that does not confuse users

One thing I underestimated was how important the model download and setup experience is. If the user cannot download or select an AI model, the whole product feels broken even if the transcription engine itself works.
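Voice2Sub's actual downloader is not public. The following is a minimal sketch, assuming a model file served over plain HTTP(S), of the "offline and interrupted download handling" and "safe retry/resume behavior" items listed above, using only the Python standard library. Data is written to a `.part` file. An HTTP `Range` header asks only for the missing bytes, and the file is moved into place with `os.replace` only after it has been fully read and, optionally, checksum-verified. `download_model`, `resume_headers` and the `expected_sha256` parameter are hypothetical.

```python
from __future__ import annotations

import hashlib
import http.client
import os
import time
import urllib.error
import urllib.request


def resume_headers(part_path: str) -> dict[str, str]:
    """Ask the server only for the bytes that are not on disk yet."""
    size = os.path.getsize(part_path) if os.path.exists(part_path) else 0
    return {"Range": f"bytes={size}-"} if size > 0 else {}


def sha256_of(path: str) -> str:
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            digest.update(block)
    return digest.hexdigest()


def download_model(url: str, dest: str, expected_sha256: str | None = None,
                   retries: int = 3, backoff: float = 2.0) -> str:
    """Download url to dest via dest + '.part', resuming and retrying on errors."""
    part_path = dest + ".part"
    for attempt in range(1, retries + 1):
        request = urllib.request.Request(url, headers=resume_headers(part_path))
        try:
            with urllib.request.urlopen(request, timeout=30) as resp:
                # 206 means the server honoured the Range header, so append.
                # Any other response carries the whole file, so start over.
                mode = "ab" if getattr(resp, "status", None) == 206 else "wb"
                with open(part_path, mode) as out:
                    for chunk in iter(lambda: resp.read(1 << 20), b""):
                        out.write(chunk)
            break  # the whole body was read without an error
        except (urllib.error.URLError, http.client.HTTPException, OSError) as exc:
            is_416 = isinstance(exc, urllib.error.HTTPError) and exc.code == 416
            if is_416 and os.path.exists(part_path):
                os.remove(part_path)  # stale or oversized .part: restart from zero
            if attempt == retries:
                raise
            time.sleep(backoff * attempt)  # simple linear backoff before retrying

    if expected_sha256 and sha256_of(part_path) != expected_sha256.lower():
        os.remove(part_path)
        raise ValueError(f"checksum mismatch for {url}")
    os.replace(part_path, dest)  # atomic when both paths are on the same filesystem
    return dest
```

Because the final file only appears after a complete, verified download, the app can treat "model file exists" as "model is usable". That check is what keeps a half-finished download from looking like a broken product on the next launch.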

Another thing I learned is that subtitles are not just “text output”. Good subtitle workflows need timing review, readable line breaks, export format choices and a safe way to edit without losing the original generated result.
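As an illustration of "readable line breaks" and "a safe way to edit without losing the original generated result", here is a minimal Python sketch. It is not Voice2Sub's implementation. The 42-character line width is an assumed default taken from a common subtitle guideline, and `wrap_for_display` and `edit_text` are hypothetical names. Cues are immutable, and each edit returns a new list, so the generated version stays untouched.

```python
from __future__ import annotations

import textwrap
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str


def wrap_for_display(text: str, max_chars: int = 42) -> str:
    """Collapse extra whitespace and re-break text into lines of <= max_chars."""
    return "\n".join(textwrap.wrap(" ".join(text.split()), width=max_chars))


def edit_text(original: list[Cue], index: int, new_text: str) -> list[Cue]:
    """Return an edited copy; the generated list itself is never modified."""
    edited = list(original)
    edited[index] = replace(edited[index], text=wrap_for_display(new_text))
    return edited
```

This sketch does not enforce a maximum number of lines per cue. A real editor would probably also split or flag cues that wrap onto more than two lines.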

Current platforms

Voice2Sub currently supports:

Windows x64
macOS Apple Silicon
macOS Intel
Linux x64

The app also supports hardware acceleration when available:

CUDA on compatible NVIDIA systems
Metal-oriented processing on Apple Silicon Macs

Current export formats

Voice2Sub can export:

SRT
VTT
TXT
LRC
CSV

These formats cover common subtitle, transcript, lyric, editing and documentation workflows.
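To show how the five formats differ, here is a minimal Python sketch that renders the same list of timed cues as SRT, WebVTT, plain text, LRC and CSV. It is an illustration, not Voice2Sub's exporter. The `Cue` type and the function names are hypothetical, and so is the CSV column layout (`index,start,end,text`). SRT uses a comma before the milliseconds, WebVTT uses a dot and a `WEBVTT` header, and LRC keeps only start times in `[mm:ss.xx]` form.

```python
from __future__ import annotations

import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str


def _clock(seconds: float, sep: str) -> str:
    """Format as HH:MM:SS<sep>mmm (sep is ',' for SRT, '.' for WebVTT)."""
    total_ms = round(seconds * 1000)
    h, rem = divmod(total_ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}{sep}{ms:03d}"


def to_srt(cues: list[Cue]) -> str:
    blocks = [f"{i}\n{_clock(c.start, ',')} --> {_clock(c.end, ',')}\n{c.text}"
              for i, c in enumerate(cues, start=1)]
    return "\n\n".join(blocks) + "\n"


def to_vtt(cues: list[Cue]) -> str:
    blocks = [f"{_clock(c.start, '.')} --> {_clock(c.end, '.')}\n{c.text}" for c in cues]
    return "WEBVTT\n\n" + "\n\n".join(blocks) + "\n"


def to_txt(cues: list[Cue]) -> str:
    return "\n".join(c.text for c in cues) + "\n"


def to_lrc(cues: list[Cue]) -> str:
    # LRC timestamps are [mm:ss.xx] in hundredths; end times are not stored.
    lines = []
    for c in cues:
        m, rem = divmod(round(c.start * 100), 6000)
        s, hundredths = divmod(rem, 100)
        lines.append(f"[{m:02d}:{s:02d}.{hundredths:02d}]{c.text.replace(chr(10), ' ')}")
    return "\n".join(lines) + "\n"


def to_csv(cues: list[Cue]) -> str:
    buf = io.StringIO()
    writer = csv.writer(buf)  # default line terminator is \r\n
    writer.writerow(["index", "start", "end", "text"])
    for i, c in enumerate(cues, start=1):
        writer.writerow([i, f"{c.start:.3f}", f"{c.end:.3f}", c.text])
    return buf.getvalue()
```

Keeping one internal cue list and rendering every format from it means a timing or text fix made during review shows up identically in all exports.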

Recent improvements

Recent Voice2Sub releases added and improved:

batch subtitle generation
English subtitle output from supported source-language speech
smoother multilingual UI rendering
clearer CUDA setup and repair flow
subtitle review and editing workflow
timing adjustment with audio preview
safer edited subtitle export
a better flow for returning to recent work and reviewing generated subtitles
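Timing adjustment often comes down to shifting a run of cues when the subtitles drift out of sync from some point onward. Here is a minimal Python sketch of that operation. It is not Voice2Sub's implementation, and `shift_cues` and its `from_index` parameter are hypothetical. It shifts cues from a chosen index by a fixed number of seconds, clamps times at zero, and returns a new list so the unedited timing is still available.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str


def shift_cues(cues: list[Cue], offset: float, from_index: int = 0) -> list[Cue]:
    """Shift cues[from_index:] by offset seconds; earlier cues are left as-is."""
    shifted = list(cues[:from_index])
    for cue in cues[from_index:]:
        start = max(0.0, cue.start + offset)  # never move a cue before 0:00
        end = max(start, cue.end + offset)    # keep end >= start after clamping
        shifted.append(replace(cue, start=start, end=end))
    return shifted
```

For example, `shift_cues(cues, 0.25, from_index=10)` would delay every cue from the eleventh one onward by a quarter of a second, which is the kind of change an audio preview helps confirm.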

What I want to improve next

I am currently thinking about:

better subtitle review ergonomics
more polishing around timing adjustment
more workflow presets for YouTube, courses, podcasts and interviews
better handling for longer editing sessions
more guidance for first-time users
continued improvements to multilingual UI quality
clearer documentation for local AI model setup

Links

Website: https://voice2sub.pro.vn/
Download: https://voice2sub.pro.vn/download
Subtitle editor: https://voice2sub.pro.vn/subtitle-editor
Supported formats: https://voice2sub.pro.vn/supported-formats
GitHub release notes: https://github.com/thehungngo/Voice2Sub

The GitHub repository is used as the public product home for release notes, support links and issue tracking. The main application source code may remain private.

If you work with subtitles, transcripts, video editing, podcasts or course content, I would love feedback on the workflow.
