Aloysius Chan

Originally published at insightginie.com

Voice.ai Creator Voiceover Pipeline: Turn Scripts into Publish-Ready Voiceovers

What This Skill Does

The Voice.ai Creator Voiceover Pipeline is a skill that turns scripts into
publish-ready voiceovers using Voice.ai text-to-speech (TTS). It covers every
step from segment creation to final video muxing, so content creators can get
professional narration without booking a studio.

The pipeline generates numbered audio segments, stitches them into a master
track, creates YouTube chapters, produces SRT captions, and builds an
interactive review page. You can also replace the audio track on existing
videos with just one command.

Why Use This Skill

The skill fits several content-creation scenarios:

  • YouTube long-form : Full narration with chapter markers and captions
  • YouTube Shorts : Quick hooks with the shortform template
  • Podcasts : Consistent host voice with intro/outro templates
  • Course content : Professional narration for educational videos
  • Quick iteration : Smart caching means editing one section only re-renders that segment
  • Video audio replacement : Drop AI voiceover onto screen recordings or B-roll

The One-Command Workflow

If you have a script and a video, you can turn them into a finished video with
AI voiceover in one shot:

node voiceai-vo.cjs build \
  --input my-script.md \
  --voice oliver \
  --title "My Video" \
  --video ./my-recording.mp4 \
  --mux

This renders the voiceover, stitches the master audio, and drops it onto your
video—all in one command. The output includes:

  • out/my-video/muxed.mp4 — your video with the new voiceover
  • out/my-video/master.wav — the standalone audio
  • out/my-video/review.html — listen and review each segment
  • out/my-video/chapters.txt — YouTube-ready chapter timestamps
  • out/my-video/captions.srt — SRT captions

Use --sync pad if the audio is shorter than the video, or --sync trim to
cut the audio to match the video's length.
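
As a rough illustration of when each --sync mode applies, the sketch below compares two durations and prints the matching mode. It assumes ffprobe (which ships with ffmpeg) is available for reading durations. The helper names media_duration and choose_sync are hypothetical and are not part of the CLI.

```shell
# Hypothetical helpers (not part of voiceai-vo.cjs).

# Print a media file's duration in seconds using ffprobe.
media_duration() {
  ffprobe -v error -show_entries format=duration \
    -of default=noprint_wrappers=1:nokey=1 "$1"
}

# Print the --sync mode suggested by two durations given in seconds.
# $1 = audio duration, $2 = video duration
choose_sync() {
  awk -v a="$1" -v v="$2" 'BEGIN {
    if (a < v) { print "pad" }        # audio shorter than video
    else if (a > v) { print "trim" }  # audio longer than video
    else { print "none" }             # durations already match
  }'
}
```

For example, choose_sync "$(media_duration out/my-video/master.wav)" "$(media_duration my-recording.mp4)" prints pad when the voiceover is the shorter of the two.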

Requirements

The skill requires:

  • Node.js 20+ — runtime (no npm install needed—the CLI is a single bundled file)
  • VOICE_AI_API_KEY — set as environment variable or in a .env file in the skill root. Get a key at voice.ai/dashboard.
  • ffmpeg (optional) — needed for master stitching, MP3 encoding, loudness normalization, and video muxing. The pipeline still produces individual segments, the review page, chapters, and captions without it.
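
The three requirements above can be checked before a first build. The following is a minimal sketch, not something the skill ships: the function names are hypothetical, and it assumes the .env file uses the usual KEY=value line format.

```shell
# Hypothetical preflight check (not part of the CLI). Run it from the skill root.

# Succeed if a version string such as "v20.11.0" has a major version of 20 or higher.
node_major_ok() {
  major=${1#v}          # strip the leading "v"
  major=${major%%.*}    # keep only the major component
  [ "$major" -ge 20 ]
}

# Succeed if the key is exported, or present in ./.env as VOICE_AI_API_KEY=...
api_key_present() {
  [ -n "${VOICE_AI_API_KEY:-}" ] && return 0
  [ -f .env ] && grep -q '^VOICE_AI_API_KEY=.' .env
}

# Print one status line per requirement.
preflight() {
  if command -v node >/dev/null 2>&1 && node_major_ok "$(node --version)"; then
    echo "ok: Node.js $(node --version)"
  else
    echo "missing: Node.js 20+"
  fi
  if api_key_present; then
    echo "ok: VOICE_AI_API_KEY"
  else
    echo "missing: VOICE_AI_API_KEY"
  fi
  if command -v ffmpeg >/dev/null 2>&1; then
    echo "ok: ffmpeg"
  else
    echo "optional: ffmpeg not found (no master track, MP3, normalization, or muxing)"
  fi
}
```

Calling preflight prints the three status lines. A missing ffmpeg is reported as optional because the pipeline still produces segments, the review page, chapters, and captions without it.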

Available Voices

The pipeline accepts voice UUIDs as well as the following aliases:

  • ellie : Ellie (F) — Youthful, vibrant vlogger
  • oliver : Oliver (M) — Friendly British
  • lilith : Lilith (F) — Soft, feminine
  • smooth : Smooth Calm Voice (M) — Deep, smooth narrator
  • corpse : Corpse Husband (M) — Deep, distinctive
  • skadi : Skadi (F) — Anime character
  • zhongli : Zhongli (M) — Deep, authoritative
  • flora : Flora (F) — Cheerful, high pitch
  • chief : Master Chief (M) — Heroic, commanding

Build Outputs

After a build, the output directory contains:

out/<title-slug>/
  segments/        # Numbered WAV files (001-intro.wav, 002-section.wav, …)
  master.wav       # Stitched audio (requires ffmpeg)
  master.mp3       # MP3 encode (requires ffmpeg)
  manifest.json    # Build metadata: voice, template, segment list, hashes
  timeline.json    # Segment durations and start times
  review.html      # Interactive review page with audio players
  chapters.txt     # YouTube-friendly chapter timestamps
  captions.srt     # SRT captions using segment boundaries
  description.txt  # YouTube description with chapters + Voice.ai credit
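
To make the two timestamp formats concrete, here is a small sketch of how segment start times, like those recorded in timeline.json, could map onto a chapter timestamp and an SRT cue boundary. The function names and the input units (whole seconds for chapters, milliseconds for captions) are assumptions for illustration. The pipeline's own formatting code is not shown in this post.

```shell
# Hypothetical formatters; not taken from the pipeline's source.

# Seconds -> YouTube chapter timestamp (M:SS, or H:MM:SS from one hour on).
chapter_time() {
  s=$1
  if [ "$s" -ge 3600 ]; then
    printf '%d:%02d:%02d\n' $((s / 3600)) $((s % 3600 / 60)) $((s % 60))
  else
    printf '%d:%02d\n' $((s / 60)) $((s % 60))
  fi
}

# Milliseconds -> SRT timestamp (HH:MM:SS,mmm).
srt_time() {
  ms=$1
  printf '%02d:%02d:%02d,%03d\n' \
    $((ms / 3600000)) $((ms / 60000 % 60)) $((ms / 1000 % 60)) $((ms % 1000))
}
```

chapter_time 0 prints 0:00, which is the timestamp YouTube expects on the first chapter line, and srt_time 3723004 prints 01:02:03,004.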

Templates

The pipeline supports three templates, each of which auto-injects its own segments:

  • youtube : Includes both intro and outro segments
  • podcast : Includes intro segment only
  • shortform : Includes hook segment only

You can edit the template files in the templates/ directory to customize the
content.

Caching

Segments are cached by a hash of text content + voice ID + language. Unchanged
segments are skipped on rebuild for fast iteration, while modified segments
are re-rendered automatically. Use --force to re-render everything.
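
The idea behind this kind of cache can be sketched in a few lines. This is an illustration under stated assumptions, not the pipeline's implementation: the choice of SHA-256, the field order, the one-WAV-per-key layout, and the names cache_key and needs_render are all hypothetical.

```shell
# Hypothetical sketch of content-addressed caching; the pipeline's real
# hash function, field order, and cache layout are not documented here.

# Print a SHA-256 hex digest of text + voice ID + language.
# $1 = segment text, $2 = voice ID (an alias stands in here), $3 = language code
cache_key() {
  if command -v sha256sum >/dev/null 2>&1; then
    printf '%s\n%s\n%s' "$1" "$2" "$3" | sha256sum | cut -d' ' -f1
  else
    # macOS ships shasum instead of sha256sum
    printf '%s\n%s\n%s' "$1" "$2" "$3" | shasum -a 256 | cut -d' ' -f1
  fi
}

# Succeed if no cached WAV exists for this text/voice/language combination.
# $1 = cache directory, remaining arguments = same as cache_key
needs_render() {
  cache_dir=$1
  shift
  [ ! -f "$cache_dir/$(cache_key "$@").wav" ]
}
```

Because the key depends on all three inputs, editing one segment's text, switching the voice, or changing the language each produces a new key, while untouched segments keep theirs and can be skipped.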

Multilingual Support

Voice.ai supports 11 languages. Use --language to switch between
English (en), Spanish (es), French (fr), German (de), Italian (it), Portuguese
(pt), Polish (pl), Russian (ru), Dutch (nl), Swedish (sv), and Catalan (ca).
The pipeline auto-selects the multilingual TTS model for non-English
languages.
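
The language rule above can be expressed as a short sketch. The helper names are hypothetical, and the labels printed by model_kind are descriptive placeholders rather than real Voice.ai model names.

```shell
# Hypothetical helpers built from the language list above.

# Succeed if the code is one of the 11 listed languages.
is_supported_language() {
  case "$1" in
    en|es|fr|de|it|pt|pl|ru|nl|sv|ca) return 0 ;;
    *) return 1 ;;
  esac
}

# Print which kind of TTS model the rule above would pick.
# The labels are placeholders, not real model identifiers.
model_kind() {
  if [ "$1" = "en" ]; then
    echo "english"
  else
    echo "multilingual"
  fi
}
```

A Spanish build, for instance, would pass --language es to the build command, and under this rule it would be rendered with the multilingual model.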

Troubleshooting

Common issues and solutions:

  • ffmpeg missing : The pipeline still produces segments, the review page, chapters, and captions; master stitching, MP3 encoding, loudness normalization, and video muxing need ffmpeg
  • API key issues : Check that VOICE_AI_API_KEY is set in the environment or in the .env file in the skill root
  • Voice not found : List the available voices with the voices command, or use --mock to test without API calls

Privacy

Video processing is entirely local. Only the script text is sent to Voice.ai
for TTS; your video files never leave your machine.

Commands

The skill provides several commands:

  • build : Generate a voiceover from a script
  • replace-audio : Swap the audio track on a video
  • voices : List available voices

The commands take options for voice selection (--voice), templates, language
(--language), and video muxing (--mux).

With the Voice.ai Creator Voiceover Pipeline, you can create
professional-quality voiceovers for any content type with minimal effort.

The skill can be found at:
voiceover-creator/SKILL.md
