I built Voice2Sub because many subtitle and transcription workflows still start with uploading a media file to a browser-based tool.
That can be convenient for short public videos, but it becomes awkward when the file is long, private, stored locally, or part of a repeated editing workflow.
Voice2Sub focuses on a local-first desktop workflow:
- Import a local video or audio file
- Generate subtitles or transcript text with Whisper-based AI recognition
- Review the generated text
- Adjust subtitle timing when needed
- Export SRT, VTT, TXT, LRC or CSV
Website: https://voice2sub.pro.vn/
Download: https://voice2sub.pro.vn/download
Subtitle editor: https://voice2sub.pro.vn/subtitle-editor
GitHub release notes: https://github.com/thehungngo/Voice2Sub
Why I built it as a desktop app
A lot of creators, educators, podcasters, editors and content teams work with media that they do not always want to upload to an online transcription service.
Examples:
- private interviews
- long lectures
- course recordings
- podcasts
- internal meetings
- YouTube or TikTok editing workflows
- archived audio/video files
- client or team content that should stay local
A desktop app gives users more control over the media file, the AI model, the output format and the review workflow.
For this kind of product, transcription is only the first step. The real workflow usually continues with checking names, punctuation, timing, line breaks and final export formats.
What Voice2Sub does
Voice2Sub is a local AI subtitle generator, subtitle editor and speech-to-text desktop app for video and audio files.
It currently focuses on:
- generating subtitles from local video/audio files
- creating transcript text from speech
- reviewing generated subtitles before publishing
- editing subtitle text and timing
- previewing audio while checking subtitle timing
- opening supported subtitle files for review
- exporting SRT, VTT, TXT, LRC and CSV
- creating optional English subtitle output from supported source-language speech
- running on Windows, macOS and Linux
- supporting CUDA acceleration on compatible NVIDIA systems
- supporting Metal-oriented workflows on Apple Silicon Macs
- giving users control over model selection, prompt/context and transcription settings
The goal is not to become a full online video editor. Voice2Sub is focused on the subtitle generation and review part of the workflow.
Why not just use an online subtitle generator?
Online tools are convenient, but a desktop workflow is useful when:
- the media file is large
- the content is private
- the user wants to process files locally and repeatedly
- the user wants model control
- the user wants common subtitle export formats
- the user works across Windows, macOS or Linux
- the user wants to review and export subtitle files without moving the whole workflow into a browser
Voice2Sub is built for people who want to generate locally, review carefully, edit when needed and export files that are ready for publishing, learning, documentation or content creation.
What I learned while building it
The AI model is only one part of a desktop AI product.
A practical desktop AI tool also needs:
- reliable model downloads
- handling for offline and interrupted downloads
- safe retry/resume behavior
- cross-platform packaging
- clear error messages
- GPU acceleration setup
- update reliability
- localization
- clean export formats
- a review workflow after generation
- a first-run experience that does not confuse users
One thing I underestimated was how important the model download and setup experience is. If the user cannot download or select an AI model, the whole product feels broken even if the transcription engine itself works.
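To make the retry/resume point concrete, here is a minimal Python sketch of one common approach: keep a `.part` file, ask the server for the remaining bytes with an HTTP `Range` header, and only promote the file once its checksum matches. This is an illustration under stated assumptions, not Voice2Sub's actual code. The names `range_header`, `append_chunks` and `finalize` are hypothetical, and the network call itself is left out so the example stays self-contained.

```python
import hashlib
from pathlib import Path


def range_header(part_path: Path) -> dict[str, str]:
    """Return the HTTP Range header needed to resume a partial download.

    An empty dict means "start from the beginning".
    """
    done = part_path.stat().st_size if part_path.exists() else 0
    return {"Range": f"bytes={done}-"} if done else {}


def append_chunks(part_path: Path, chunks) -> None:
    """Append received byte chunks to the partial file."""
    with part_path.open("ab") as f:
        for chunk in chunks:
            f.write(chunk)


def finalize(part_path: Path, final_path: Path, expected_sha256: str) -> bool:
    """Verify the partial file and move it into place.

    On a checksum mismatch the partial file is deleted so the next
    attempt starts clean, and False is returned.
    """
    digest = hashlib.sha256()
    with part_path.open("rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            digest.update(block)
    if digest.hexdigest() != expected_sha256:
        part_path.unlink()
        return False
    part_path.replace(final_path)
    return True
```

A real downloader would also need to check that the server answered with `206 Partial Content`. If it answers `200 OK`, it is sending the whole file again, and the partial file should be truncated before writing.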
Another thing I learned is that subtitles are not just “text output”. Good subtitle workflows need timing review, readable line breaks, export format choices and a safe way to edit without losing the original generated result.
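One way to picture "editing without losing the original generated result" is to keep the generated cues immutable and apply every change to a separate working copy. The sketch below is a simplified illustration, not the app's real data model. `Cue` and `SubtitleDoc` are hypothetical names, and times are assumed to be in seconds.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str


class SubtitleDoc:
    """Holds the untouched generated cues next to an editable copy."""

    def __init__(self, generated: list[Cue]):
        self.original = tuple(generated)  # never modified after creation
        self.edited = list(generated)

    def set_text(self, index: int, text: str) -> None:
        """Replace the text of one cue in the working copy."""
        self.edited[index] = replace(self.edited[index], text=text)

    def shift(self, offset: float) -> None:
        """Move every edited cue by offset seconds, clamping at zero."""
        self.edited = [
            replace(c, start=max(0.0, c.start + offset), end=max(0.0, c.end + offset))
            for c in self.edited
        ]

    def revert(self, index: int) -> None:
        """Restore one cue to the generated version."""
        self.edited[index] = self.original[index]

    def changed(self) -> list[int]:
        """Indexes of cues that differ from the generated result."""
        return [i for i, (a, b) in enumerate(zip(self.original, self.edited)) if a != b]
```

Because `Cue` is frozen, an edit always creates a new object, so the generated result cannot be changed by accident. Note that clamping at zero can collapse cues when a large negative offset is applied, so a real editor would likely warn before doing that.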
Current platforms
Voice2Sub currently supports:
- Windows x64
- macOS Apple Silicon
- macOS Intel
- Linux x64
The app also supports hardware acceleration when available:
- CUDA on compatible NVIDIA systems
- Metal-oriented processing on Apple Silicon Macs
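As a rough sketch of what "when available" can mean in practice, the Python function below picks a backend from the platform and falls back to CPU. It is an assumption about how such a choice could be structured, not a description of how Voice2Sub decides. `pick_backend` is a hypothetical name, and the `cuda_available` flag stands in for whatever driver or runtime check a real app would perform.

```python
import platform


def pick_backend(system: str, machine: str, cuda_available: bool) -> str:
    """Choose an acceleration backend, falling back to CPU.

    system and machine are the values returned by platform.system()
    and platform.machine(). cuda_available is assumed to come from a
    separate check for a usable NVIDIA driver and runtime.
    """
    if cuda_available and system in ("Windows", "Linux"):
        return "cuda"
    if system == "Darwin" and machine == "arm64":
        return "metal"
    return "cpu"


if __name__ == "__main__":
    # On the current machine, assuming no CUDA check has been done yet.
    print(pick_backend(platform.system(), platform.machine(), cuda_available=False))
```

Keeping the decision in one pure function makes the fallback path easy to test without the hardware present.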
Current export formats
Voice2Sub can export:
- SRT
- VTT
- TXT
- LRC
- CSV
These formats cover common subtitle, transcript, lyric, editing and documentation workflows.
Recent improvements
Recent Voice2Sub releases added and improved:
- batch subtitle generation
- English subtitle output from supported source-language speech
- smoother multilingual UI rendering
- clearer CUDA setup and repair flow
- subtitle review and editing workflow
- timing adjustment with audio preview
- safer edited subtitle export
- a better flow for revisiting recent work and reviewing generated subtitles
What I want to improve next
I am currently thinking about:
- better subtitle review ergonomics
- more polish around timing adjustment
- more workflow presets for YouTube, courses, podcasts and interviews
- better handling for longer editing sessions
- more guidance for first-time users
- continued improvements to multilingual UI quality
- clearer documentation for local AI model setup
Links
Website: https://voice2sub.pro.vn/
Download: https://voice2sub.pro.vn/download
Subtitle editor: https://voice2sub.pro.vn/subtitle-editor
Supported formats: https://voice2sub.pro.vn/supported-formats
GitHub release notes: https://github.com/thehungngo/Voice2Sub
The GitHub repository is used as the public product home for release notes, support links and issue tracking. The main application source code may remain private.
If you work with subtitles, transcripts, video editing, podcasts or course content, I would love feedback on the workflow.