<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Thế Hùng</title>
    <description>The latest articles on DEV Community by Thế Hùng (@thehung).</description>
    <link>https://dev.to/thehung</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3944410%2F0cc9796f-1b9d-45e9-9401-3ff66e22fb7c.jpg</url>
      <title>DEV Community: Thế Hùng</title>
      <link>https://dev.to/thehung</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/thehung"/>
    <language>en</language>
    <item>
      <title>I built Voice2Sub: a local AI subtitle generator and subtitle editor for video and audio</title>
      <dc:creator>Thế Hùng</dc:creator>
      <pubDate>Thu, 21 May 2026 15:47:52 +0000</pubDate>
      <link>https://dev.to/thehung/i-built-voice2sub-a-local-ai-subtitle-generator-for-video-and-audio-imk</link>
      <guid>https://dev.to/thehung/i-built-voice2sub-a-local-ai-subtitle-generator-for-video-and-audio-imk</guid>
      <description>&lt;p&gt;I built &lt;strong&gt;Voice2Sub&lt;/strong&gt; because many subtitle and transcription workflows still start with uploading a media file to a browser-based tool.&lt;/p&gt;

&lt;p&gt;That can be convenient for short public videos. But it becomes awkward when the file is long, private, local, or part of a recurring editing workflow.&lt;/p&gt;

&lt;p&gt;Voice2Sub focuses on a local-first desktop workflow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Import a local video or audio file&lt;/li&gt;
&lt;li&gt;Generate subtitles or transcript text with Whisper-based AI recognition&lt;/li&gt;
&lt;li&gt;Review the generated text&lt;/li&gt;
&lt;li&gt;Adjust subtitle timing when needed&lt;/li&gt;
&lt;li&gt;Export to SRT, VTT, TXT, LRC or CSV&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Website: &lt;a href="https://voice2sub.pro.vn/" rel="noopener noreferrer"&gt;https://voice2sub.pro.vn/&lt;/a&gt;&lt;br&gt;
Download: &lt;a href="https://voice2sub.pro.vn/download" rel="noopener noreferrer"&gt;https://voice2sub.pro.vn/download&lt;/a&gt;&lt;br&gt;
Subtitle editor: &lt;a href="https://voice2sub.pro.vn/subtitle-editor" rel="noopener noreferrer"&gt;https://voice2sub.pro.vn/subtitle-editor&lt;/a&gt;&lt;br&gt;
GitHub release notes: &lt;a href="https://github.com/thehungngo/Voice2Sub" rel="noopener noreferrer"&gt;https://github.com/thehungngo/Voice2Sub&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;Why I built it as a desktop app&lt;/h2&gt;

&lt;p&gt;A lot of creators, educators, podcasters, editors and content teams work with media that they do not always want to upload to an online transcription service.&lt;/p&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;private interviews&lt;/li&gt;
&lt;li&gt;long lectures&lt;/li&gt;
&lt;li&gt;course recordings&lt;/li&gt;
&lt;li&gt;podcasts&lt;/li&gt;
&lt;li&gt;internal meetings&lt;/li&gt;
&lt;li&gt;YouTube or TikTok editing workflows&lt;/li&gt;
&lt;li&gt;archived audio/video files&lt;/li&gt;
&lt;li&gt;client or team content that should stay local&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A desktop app gives users more control over the media file, the AI model, the output format and the review workflow.&lt;/p&gt;

&lt;p&gt;For this kind of product, transcription is only the first step. The real workflow usually continues with checking names, punctuation, timing, line breaks and final export formats.&lt;/p&gt;

&lt;h2&gt;What Voice2Sub does&lt;/h2&gt;

&lt;p&gt;Voice2Sub is a local AI subtitle generator, subtitle editor and speech-to-text desktop app for video and audio files.&lt;/p&gt;

&lt;p&gt;It currently focuses on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;generating subtitles from local video/audio files&lt;/li&gt;
&lt;li&gt;creating transcript text from speech&lt;/li&gt;
&lt;li&gt;reviewing generated subtitles before publishing&lt;/li&gt;
&lt;li&gt;editing subtitle text and timing&lt;/li&gt;
&lt;li&gt;previewing audio while checking subtitle timing&lt;/li&gt;
&lt;li&gt;opening supported subtitle files for review&lt;/li&gt;
&lt;li&gt;exporting SRT, VTT, TXT, LRC and CSV&lt;/li&gt;
&lt;li&gt;optionally producing English subtitles from speech in a supported source language&lt;/li&gt;
&lt;li&gt;running on Windows, macOS and Linux&lt;/li&gt;
&lt;li&gt;supporting CUDA acceleration on compatible NVIDIA systems&lt;/li&gt;
&lt;li&gt;supporting Metal acceleration on Apple Silicon Macs&lt;/li&gt;
&lt;li&gt;giving users control over model selection, prompt/context and transcription settings&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The goal is not to become a full online video editor. Voice2Sub is focused on the subtitle generation and review part of the workflow.&lt;/p&gt;

&lt;h2&gt;Why not just use an online subtitle generator?&lt;/h2&gt;

&lt;p&gt;Online tools are convenient, but a desktop workflow is useful when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the media file is large&lt;/li&gt;
&lt;li&gt;the content is private&lt;/li&gt;
&lt;li&gt;the user wants to process files locally again and again&lt;/li&gt;
&lt;li&gt;the user wants model control&lt;/li&gt;
&lt;li&gt;the user wants common subtitle export formats&lt;/li&gt;
&lt;li&gt;the user works across Windows, macOS and Linux&lt;/li&gt;
&lt;li&gt;the user wants to review and export subtitle files without moving the whole workflow into a browser&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Voice2Sub is built for people who want to generate locally, review carefully, edit when needed and export files that are ready for publishing, learning, documentation or content creation.&lt;/p&gt;

&lt;h2&gt;What I learned while building it&lt;/h2&gt;

&lt;p&gt;The AI model is only one part of a desktop AI product.&lt;/p&gt;

&lt;p&gt;A practical desktop AI tool also needs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;reliable model downloads&lt;/li&gt;
&lt;li&gt;handling for offline and interrupted downloads&lt;/li&gt;
&lt;li&gt;safe retry/resume behavior&lt;/li&gt;
&lt;li&gt;cross-platform packaging&lt;/li&gt;
&lt;li&gt;clear error messages&lt;/li&gt;
&lt;li&gt;GPU acceleration setup&lt;/li&gt;
&lt;li&gt;update reliability&lt;/li&gt;
&lt;li&gt;localization&lt;/li&gt;
&lt;li&gt;clean export formats&lt;/li&gt;
&lt;li&gt;a review workflow after generation&lt;/li&gt;
&lt;li&gt;a first-run experience that does not confuse users&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One thing I underestimated was how important the model download and setup experience is. If the user cannot download or select an AI model, the whole product feels broken even if the transcription engine itself works.&lt;/p&gt;
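
&lt;p&gt;To make the resume idea concrete, here is a minimal Python sketch of how a downloader can ask a server for only the bytes it is still missing. It is an illustration, not Voice2Sub's actual code: the function name &lt;code&gt;resume_headers&lt;/code&gt; and the file name &lt;code&gt;model.bin.part&lt;/code&gt; are hypothetical. It relies only on the standard HTTP &lt;code&gt;Range&lt;/code&gt; request header.&lt;/p&gt;

```python
import os
import tempfile


def resume_headers(partial_path):
    """Return HTTP headers that request only the bytes not yet on disk."""
    try:
        already_have = os.path.getsize(partial_path)
    except OSError:
        already_have = 0  # no partial file yet: download from the start
    if already_have == 0:
        return {}
    # Standard HTTP Range syntax: "bytes=N-" means "from byte N to the end".
    return {"Range": f"bytes={already_have}-"}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        part = os.path.join(tmp, "model.bin.part")  # hypothetical file name
        print(resume_headers(part))  # nothing on disk yet
        with open(part, "wb") as fh:
            fh.write(b"\x00" * 1024)  # pretend 1 KiB arrived before the drop
        print(resume_headers(part))  # now asks for the remainder only
```

&lt;p&gt;A real downloader would also check the response: a server that honors the range replies with status 206, while a plain 200 means it is sending the whole file and the partial data should be discarded. Verifying a checksum after the final byte arrives is a sensible extra safeguard.&lt;/p&gt;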

&lt;p&gt;Another thing I learned is that subtitles are not just “text output”. Good subtitle workflows need timing review, readable line breaks, export format choices and a safe way to edit without losing the original generated result.&lt;/p&gt;
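
&lt;p&gt;As a small illustration of the line-break part, the Python sketch below wraps a long transcript segment into subtitle-sized chunks. The limits of 42 characters per line and 2 lines per cue are assumptions based on a common subtitling convention, not Voice2Sub's settings, and &lt;code&gt;wrap_cue&lt;/code&gt; is a hypothetical helper. It returns new strings and leaves the input untouched, so the generated text can be kept alongside the edited version.&lt;/p&gt;

```python
import textwrap


def wrap_cue(text, max_chars=42, max_lines=2):
    """Split text into cue-sized blocks of at most max_lines wrapped lines."""
    normalized = " ".join(text.split())  # collapse stray whitespace
    lines = textwrap.wrap(normalized, width=max_chars)
    # Group consecutive lines so each cue shows at most max_lines lines.
    return [
        "\n".join(lines[i:i + max_lines])
        for i in range(0, len(lines), max_lines)
    ]


if __name__ == "__main__":
    segment = (
        "Good subtitle workflows need timing review, readable line breaks "
        "and export format choices."
    )
    for cue in wrap_cue(segment):
        print(cue)
        print("---")
```

&lt;p&gt;Splitting one segment into several cues also means splitting its time range, which this sketch does not attempt; that is where manual timing review comes in.&lt;/p&gt;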

&lt;h2&gt;Current platforms&lt;/h2&gt;

&lt;p&gt;Voice2Sub currently supports:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Windows x64&lt;/li&gt;
&lt;li&gt;macOS Apple Silicon&lt;/li&gt;
&lt;li&gt;macOS Intel&lt;/li&gt;
&lt;li&gt;Linux x64&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The app also supports hardware acceleration when available:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CUDA on compatible NVIDIA systems&lt;/li&gt;
&lt;li&gt;Metal on Apple Silicon Macs&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Current export formats&lt;/h2&gt;

&lt;p&gt;Voice2Sub can export:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;SRT&lt;/li&gt;
&lt;li&gt;VTT&lt;/li&gt;
&lt;li&gt;TXT&lt;/li&gt;
&lt;li&gt;LRC&lt;/li&gt;
&lt;li&gt;CSV&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These formats cover common subtitle, transcript, lyric, editing and documentation workflows.&lt;/p&gt;
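
&lt;p&gt;The timed formats differ mostly in how they write timestamps. The Python sketch below formats one cue as SRT, WebVTT and LRC to show the difference. It is a simplified illustration with hypothetical function names, not Voice2Sub's exporter, and it assumes cue times are stored as integer milliseconds.&lt;/p&gt;

```python
def _split(ms):
    """Split milliseconds into (hours, minutes, seconds, milliseconds)."""
    seconds, millis = divmod(ms, 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return hours, minutes, seconds, millis


def srt_time(ms):
    h, m, s, milli = _split(ms)
    return f"{h:02d}:{m:02d}:{s:02d},{milli:03d}"  # SRT uses a comma


def vtt_time(ms):
    h, m, s, milli = _split(ms)
    return f"{h:02d}:{m:02d}:{s:02d}.{milli:03d}"  # WebVTT uses a dot


def lrc_time(ms):
    # LRC tags look like [mm:ss.xx]; minutes are not wrapped into hours.
    total_seconds, millis = divmod(ms, 1000)
    minutes, seconds = divmod(total_seconds, 60)
    return f"[{minutes:02d}:{seconds:02d}.{millis // 10:02d}]"


def to_srt(index, start_ms, end_ms, text):
    return f"{index}\n{srt_time(start_ms)} --> {srt_time(end_ms)}\n{text}\n"


def to_vtt(start_ms, end_ms, text):
    return f"{vtt_time(start_ms)} --> {vtt_time(end_ms)}\n{text}\n"


def to_lrc(start_ms, text):
    return f"{lrc_time(start_ms)}{text}"  # LRC stores only a start time


if __name__ == "__main__":
    print(to_srt(1, 0, 1500, "Hello"))
    print(to_vtt(0, 1500, "Hello"))
    print(to_lrc(1500, "Hello"))
```

&lt;p&gt;A complete WebVTT file also starts with a &lt;code&gt;WEBVTT&lt;/code&gt; header line, and SRT cues are separated by blank lines. TXT and CSV exports are simpler because they carry the text with little or no timing syntax.&lt;/p&gt;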

&lt;h2&gt;Recent improvements&lt;/h2&gt;

&lt;p&gt;Recent Voice2Sub releases added and improved:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;batch subtitle generation&lt;/li&gt;
&lt;li&gt;English subtitle output from supported source-language speech&lt;/li&gt;
&lt;li&gt;smoother multilingual UI rendering&lt;/li&gt;
&lt;li&gt;clearer CUDA setup and repair flow&lt;/li&gt;
&lt;li&gt;subtitle review and editing workflow&lt;/li&gt;
&lt;li&gt;timing adjustment with audio preview&lt;/li&gt;
&lt;li&gt;safer edited subtitle export&lt;/li&gt;
&lt;li&gt;a better flow for reopening recent work and reviewing generated subtitles&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;What I want to improve next&lt;/h2&gt;

&lt;p&gt;I am currently thinking about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;better subtitle review ergonomics&lt;/li&gt;
&lt;li&gt;more polish around timing adjustment&lt;/li&gt;
&lt;li&gt;more workflow presets for YouTube, courses, podcasts and interviews&lt;/li&gt;
&lt;li&gt;better handling for longer editing sessions&lt;/li&gt;
&lt;li&gt;more guidance for first-time users&lt;/li&gt;
&lt;li&gt;continued improvements to multilingual UI quality&lt;/li&gt;
&lt;li&gt;clearer documentation for local AI model setup&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Links&lt;/h2&gt;

&lt;p&gt;Website: &lt;a href="https://voice2sub.pro.vn/" rel="noopener noreferrer"&gt;https://voice2sub.pro.vn/&lt;/a&gt;&lt;br&gt;
Download: &lt;a href="https://voice2sub.pro.vn/download" rel="noopener noreferrer"&gt;https://voice2sub.pro.vn/download&lt;/a&gt;&lt;br&gt;
Subtitle editor: &lt;a href="https://voice2sub.pro.vn/subtitle-editor" rel="noopener noreferrer"&gt;https://voice2sub.pro.vn/subtitle-editor&lt;/a&gt;&lt;br&gt;
Supported formats: &lt;a href="https://voice2sub.pro.vn/supported-formats" rel="noopener noreferrer"&gt;https://voice2sub.pro.vn/supported-formats&lt;/a&gt;&lt;br&gt;
GitHub release notes: &lt;a href="https://github.com/thehungngo/Voice2Sub" rel="noopener noreferrer"&gt;https://github.com/thehungngo/Voice2Sub&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The GitHub repository is used as the public product home for release notes, support links and issue tracking. The main application source code may remain private.&lt;/p&gt;

&lt;p&gt;If you work with subtitles, transcripts, video editing, podcasts or course content, I would love feedback on the workflow.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>desktop</category>
      <category>transcription</category>
      <category>subtitles</category>
    </item>
  </channel>
</rss>
