DEV Community

Thế Hùng

I built Voice2Sub: a local AI subtitle generator for video and audio

I built Voice2Sub because many subtitle and transcription workflows still start with uploading a media file to a browser tool.

That works for short public videos, but it becomes awkward when the file is long, private, stored only on a local drive, or part of a recurring editing workflow.

Voice2Sub focuses on a desktop workflow:

  1. Import a local video or audio file
  2. Generate subtitles or transcript text with Whisper AI recognition
  3. Review the result
  4. Export SRT, VTT, TXT, LRC or CSV

Website: https://voice2sub.pro.vn/

GitHub release notes: https://github.com/vannchii/Voice2Sub

Download: https://voice2sub.pro.vn/download

Why I built it as a desktop app

A lot of creators, educators, podcasters and journalists work with media that they do not always want to upload to a browser tool.

Examples:

  • private interviews
  • long lectures
  • course recordings
  • podcasts
  • internal meetings
  • YouTube or TikTok editing workflows
  • archived audio/video files

A local-first desktop app gives users more control over the file, the model, the output format and the processing workflow.

What Voice2Sub does

Voice2Sub is an AI subtitle generator and speech-to-text desktop app for video/audio files.

It currently focuses on:

  • generating subtitles from local video/audio
  • creating transcript text from speech
  • exporting SRT, VTT, TXT, LRC and CSV
  • running on Windows, macOS Apple Silicon and Linux
  • supporting CUDA acceleration on compatible Windows/Linux systems
  • supporting Metal acceleration on Apple Silicon Macs
  • giving users more control over model selection and transcription settings

Why not just use an online subtitle generator?

Online tools are convenient, but a desktop workflow is useful when:

  • the media file is large
  • the content is private
  • the user processes files repeatedly
  • the user wants local model control
  • the user wants common subtitle export formats
  • the user works across Windows, macOS or Linux

Voice2Sub is not trying to replace every online video editor. It is focused on a local subtitle and transcript workflow.

What I learned while building it

The AI part is only one piece of the product.

A desktop AI tool also needs:

  • reliable model downloads
  • handling for offline and interrupted downloads
  • safe retry/resume behavior
  • cross-platform packaging
  • clear error messages
  • GPU acceleration setup
  • update reliability
  • localization
  • clean export formats
  • a first-run experience that does not confuse users

One thing I underestimated was how important the model download experience is. If the user cannot download or select an AI model, the whole product feels broken even if the transcription engine itself works.
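To make the retry/resume point concrete, here is a minimal sketch of a resumable download using an HTTP Range request. It is an illustration only, not Voice2Sub's actual downloader: the function names, the `.part` suffix and the `opener` parameter are assumptions made for this example, and it assumes the model host honours Range requests.

```python
import os
import urllib.request


def range_header(partial_path: str) -> dict:
    """Return a Range header asking only for the bytes not yet on disk."""
    have = os.path.getsize(partial_path) if os.path.exists(partial_path) else 0
    return {"Range": f"bytes={have}-"} if have else {}


def download_with_resume(url: str, dest: str,
                         opener=urllib.request.urlopen,
                         chunk_size: int = 1 << 16) -> None:
    """Download url to dest, continuing from dest + '.part' if it exists."""
    partial = dest + ".part"  # hypothetical naming convention
    headers = range_header(partial)
    request = urllib.request.Request(url, headers=headers)
    with opener(request) as response:
        # 206 means the server honoured the Range request. Any other status
        # (typically 200) means it sent the whole file, so start over.
        resumed = bool(headers) and getattr(response, "status", 200) == 206
        with open(partial, "ab" if resumed else "wb") as out:
            while chunk := response.read(chunk_size):
                out.write(chunk)
    # Rename only after the transfer finished, so a half-written file
    # never shows up under the final name.
    os.replace(partial, dest)
```

A real downloader would also verify a checksum before the rename and handle the case where the partial file is already complete (servers usually answer that with 416). Both are left out here to keep the sketch short.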

Current platforms

Voice2Sub currently supports:

  • Windows x64
  • macOS Apple Silicon
  • Linux x64

The app also supports hardware acceleration when available:

  • CUDA on Windows/Linux systems with a compatible NVIDIA GPU
  • Metal on Apple Silicon Macs
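
The "when available" part can be sketched as a small fallback rule. The code below is a hypothetical illustration, not how Voice2Sub decides: the function names and the backend labels are invented for this example, and checking for `nvidia-smi` on the PATH is only a rough hint that an NVIDIA driver is installed, not proof that CUDA will work.

```python
import platform
import shutil


def pick_backend(system: str, machine: str, has_nvidia: bool) -> str:
    """Map platform facts to a backend label, falling back to CPU."""
    if system == "Darwin" and machine == "arm64":
        return "metal"  # Apple Silicon Mac
    if system in ("Windows", "Linux") and has_nvidia:
        return "cuda"   # NVIDIA driver appears to be present
    return "cpu"        # safe default everywhere else


def detect_backend() -> str:
    """Gather the platform facts from the running machine."""
    # Assumption: nvidia-smi on the PATH is treated as "NVIDIA GPU present".
    has_nvidia = shutil.which("nvidia-smi") is not None
    return pick_backend(platform.system(), platform.machine(), has_nvidia)
```

Keeping the decision in a pure function makes the rule easy to test without the hardware. Note that a Python interpreter running under Rosetta reports `x86_64` on an Apple Silicon Mac, so this sketch would fall back to CPU there.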

Current export formats

Voice2Sub can export:

  • SRT
  • VTT
  • TXT
  • LRC
  • CSV

These formats cover common subtitle, transcript, lyric and editing workflows.
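
For readers who have not looked inside these files, the two subtitle formats are very close: SRT numbers each cue and uses a comma before the milliseconds, while VTT starts with a `WEBVTT` header and uses a dot. The sketch below shows that difference. The `Segment` type and function names are assumptions for illustration and are not taken from Voice2Sub.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds from the beginning of the media
    end: float
    text: str


def _timestamp(seconds: float, sep: str) -> str:
    """Format seconds as HH:MM:SS<sep>mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}{sep}{ms:03d}"


def to_srt(segments: list[Segment]) -> str:
    """SRT: numbered cues, comma before the milliseconds."""
    blocks = [
        f"{i}\n{_timestamp(seg.start, ',')} --> {_timestamp(seg.end, ',')}\n"
        f"{seg.text.strip()}\n"
        for i, seg in enumerate(segments, start=1)
    ]
    return "\n".join(blocks)


def to_vtt(segments: list[Segment]) -> str:
    """WebVTT: 'WEBVTT' header line, dot before the milliseconds."""
    cues = [
        f"{_timestamp(seg.start, '.')} --> {_timestamp(seg.end, '.')}\n"
        f"{seg.text.strip()}\n"
        for seg in segments
    ]
    return "WEBVTT\n\n" + "\n".join(cues)
```

Given a list of segments from any transcription step, `to_srt(segments)` and `to_vtt(segments)` return strings that can be written straight to `.srt` and `.vtt` files with UTF-8 encoding.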

What I want to improve next

I am considering:

  • batch subtitle generation
  • better subtitle preview/editing
  • translation workflow
  • speaker detection
  • better presets for YouTube, courses, podcasts and interviews
  • more polish around the first-run onboarding experience

Links

Website: https://voice2sub.pro.vn/

Download: https://voice2sub.pro.vn/download

GitHub release notes: https://github.com/vannchii/Voice2Sub

If you work with subtitles, transcripts, video editing, podcasts or course content, I would love feedback on the workflow.
