What This Skill Does
The Voice.ai Creator Voiceover Pipeline is a skill that turns scripts into
publish-ready voiceovers using Voice.ai TTS. It covers every step from segment
creation to final video muxing, which suits content creators who want
professional narration without a studio.
The pipeline generates numbered audio segments, stitches them into a master
track, creates YouTube chapters, produces SRT captions, and builds an
interactive review page. You can also replace the audio track on existing
videos with just one command.
Why Use This Skill
The skill covers several content creation scenarios:
- YouTube long-form : Full narration with chapter markers and captions
- YouTube Shorts : Quick hooks with the shortform template
- Podcasts : Consistent host voice with intro/outro templates
- Course content : Professional narration for educational videos
- Quick iteration : Smart caching means editing one section only re-renders that segment
- Video audio replacement : Drop AI voiceover onto screen recordings or B-roll
The One-Command Workflow
If you have a script and a video, you can turn them into a finished video with
AI voiceover in one shot:
node voiceai-vo.cjs build \
  --input my-script.md \
  --voice oliver \
  --title "My Video" \
  --video ./my-recording.mp4 \
  --mux
This renders the voiceover, stitches the master audio, and drops it onto your
video—all in one command. The output includes:
- out/my-video/muxed.mp4 : your video with the new voiceover
- out/my-video/master.wav : the standalone audio
- out/my-video/review.html : listen and review each segment
- out/my-video/chapters.txt : YouTube-ready chapter timestamps
- out/my-video/captions.srt : SRT captions
Use --sync pad if the audio is shorter than the video, or --sync trim to
cut it to match.
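As a rough illustration of the choice between the two --sync modes, the sketch below picks a mode from two durations. The helper name chooseSyncMode and the 50 ms tolerance are assumptions made for this example, and it assumes trim is meant for audio that runs longer than the video; the CLI itself may decide differently.

```javascript
// Hypothetical helper (not part of the CLI): pick a --sync mode from two
// durations given in seconds. The 0.05 s tolerance is an assumed value.
function chooseSyncMode(audioSeconds, videoSeconds, toleranceSeconds = 0.05) {
  const gap = videoSeconds - audioSeconds;
  if (Math.abs(gap) <= toleranceSeconds) return null; // close enough, no flag needed
  return gap > 0 ? "pad" : "trim"; // audio shorter -> pad, audio longer -> trim
}

console.log(chooseSyncMode(42.0, 60.0)); // pad
console.log(chooseSyncMode(75.5, 60.0)); // trim
```

The durations themselves could come from ffprobe or from the build's timeline.json.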
Requirements
The skill requires:
- Node.js 20+ — runtime (no npm install needed—the CLI is a single bundled file)
- VOICE_AI_API_KEY — set as environment variable or in a .env file in the skill root. Get a key at voice.ai/dashboard.
- ffmpeg (optional) — needed for master stitching, MP3 encoding, loudness normalization, and video muxing. The pipeline still produces individual segments, the review page, chapters, and captions without it.
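The requirements above can be checked before a build with a small preflight script. This is a minimal sketch, not the CLI's own validation: the function names are hypothetical, and it only inspects the process environment, so a key stored in a .env file would not be seen by it.

```javascript
const { spawnSync } = require("node:child_process");

// Returns true when an `ffmpeg` binary on PATH answers `ffmpeg -version`.
function hasFfmpeg() {
  const result = spawnSync("ffmpeg", ["-version"], { stdio: "ignore" });
  return !result.error && result.status === 0;
}

// Hypothetical preflight check. Arguments are injectable so the logic can be
// exercised without touching the real environment.
function preflight(
  env = process.env,
  nodeVersion = process.versions.node,
  ffmpegAvailable = hasFfmpeg()
) {
  const problems = [];
  const warnings = [];

  const major = Number.parseInt(nodeVersion.split(".")[0], 10);
  if (major < 20) problems.push(`Node.js 20+ required, found ${nodeVersion}`);
  if (!env.VOICE_AI_API_KEY) problems.push("VOICE_AI_API_KEY is not set");

  // ffmpeg is optional, so its absence is a warning rather than a failure.
  if (!ffmpegAvailable) {
    warnings.push("ffmpeg not found: no master track, MP3, normalization, or muxing");
  }
  return { ok: problems.length === 0, problems, warnings };
}

console.log(preflight());
```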
Available Voices
Voices can be selected by alias or by UUID. The available aliases are:
- ellie : Ellie (F) — Youthful, vibrant vlogger
- oliver : Oliver (M) — Friendly British
- lilith : Lilith (F) — Soft, feminine
- smooth : Smooth Calm Voice (M) — Deep, smooth narrator
- corpse : Corpse Husband (M) — Deep, distinctive
- skadi : Skadi (F) — Anime character
- zhongli : Zhongli (M) — Deep, authoritative
- flora : Flora (F) — Cheerful, high pitch
- chief : Master Chief (M) — Heroic, commanding
Build Outputs
After a build, the output directory contains:
out/<title-slug>/
  segments/         # Numbered WAV files (001-intro.wav, 002-section.wav, …)
  master.wav        # Stitched audio (requires ffmpeg)
  master.mp3        # MP3 encode (requires ffmpeg)
  manifest.json     # Build metadata: voice, template, segment list, hashes
  timeline.json     # Segment durations and start times
  review.html       # Interactive review page with audio players
  chapters.txt      # YouTube-friendly chapter timestamps
  captions.srt      # SRT captions using segment boundaries
  description.txt   # YouTube description with chapters + Voice.ai credit
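To make the relationship between timeline.json, chapters.txt, and captions.srt concrete, here is a sketch of how chapter lines and SRT cues could be derived from segment start times and durations. The timeline shape used here (title, start, duration, and text fields, with times in seconds) is an assumption; the actual schema of timeline.json is not documented in this post.

```javascript
// Assumed timeline shape (not confirmed by the source):
// [{ title: "Intro", start: 0, duration: 12.5, text: "..." }, ...]
const pad = (n, width = 2) => String(n).padStart(width, "0");

// YouTube chapter timestamp: M:SS, or H:MM:SS once past one hour.
function chapterStamp(seconds) {
  const total = Math.floor(seconds);
  const h = Math.floor(total / 3600);
  const m = Math.floor((total % 3600) / 60);
  const s = total % 60;
  return h > 0 ? `${h}:${pad(m)}:${pad(s)}` : `${m}:${pad(s)}`;
}

// SRT timestamp: HH:MM:SS,mmm
function srtStamp(seconds) {
  const ms = Math.round(seconds * 1000);
  const h = Math.floor(ms / 3600000);
  const m = Math.floor((ms % 3600000) / 60000);
  const s = Math.floor((ms % 60000) / 1000);
  return `${pad(h)}:${pad(m)}:${pad(s)},${pad(ms % 1000, 3)}`;
}

// One "timestamp title" line per segment.
function toChapters(timeline) {
  return timeline.map((seg) => `${chapterStamp(seg.start)} ${seg.title}`).join("\n");
}

// One cue per segment, using the segment boundaries as cue times.
function toSrt(timeline) {
  return timeline
    .map((seg, i) => {
      const end = seg.start + seg.duration;
      return `${i + 1}\n${srtStamp(seg.start)} --> ${srtStamp(end)}\n${seg.text}\n`;
    })
    .join("\n");
}

const timeline = [
  { title: "Intro", start: 0, duration: 12.5, text: "Welcome to the channel." },
  { title: "Setup", start: 12.5, duration: 62.75, text: "First, install Node.js." },
];
console.log(toChapters(timeline));
console.log(toSrt(timeline));
```

YouTube expects the first chapter to start at 0:00, which holds here as long as the first segment starts at zero.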
Templates
The pipeline supports three templates, each of which auto-injects its own segments:
- youtube : Includes both intro and outro segments
- podcast : Includes intro segment only
- shortform : Includes hook segment only
You can edit the template files in the templates/ directory to customize the
content.
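The effect of a template on the segment list can be pictured with the sketch below. It is illustrative only: the TEMPLATE_SEGMENTS table and applyTemplate function are hypothetical, and placing the shortform hook before the script is an assumption. The numbering follows the 001-intro, 002-section pattern used in the segments directory.

```javascript
// Which segments each template adds, per the list above.
// Placing "hook" before the script segments is an assumption.
const TEMPLATE_SEGMENTS = {
  youtube: { before: ["intro"], after: ["outro"] },
  podcast: { before: ["intro"], after: [] },
  shortform: { before: ["hook"], after: [] },
};

// Hypothetical helper: wrap script segment names with the template's
// segments and number them 001-, 002-, ... in playback order.
function applyTemplate(templateName, scriptSegments) {
  const template = TEMPLATE_SEGMENTS[templateName];
  if (!template) throw new Error(`Unknown template: ${templateName}`);
  const ordered = [...template.before, ...scriptSegments, ...template.after];
  return ordered.map((name, i) => `${String(i + 1).padStart(3, "0")}-${name}`);
}

console.log(applyTemplate("youtube", ["section"]));
// [ '001-intro', '002-section', '003-outro' ]
```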
Caching
Segments are cached by a hash of text content + voice ID + language. Unchanged
segments are skipped on rebuild for fast iteration, while modified segments
are re-rendered automatically. Use --force to re-render everything.
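The caching rule can be sketched as a content hash compared against the hash stored from the previous build. SHA-256, the JSON encoding of the inputs, and the function names below are assumptions chosen for illustration; the pipeline's actual hash function and manifest layout are not described here.

```javascript
const crypto = require("node:crypto");

// Hypothetical cache key: SHA-256 over text, voice ID, and language.
// Serializing as a JSON array keeps field boundaries unambiguous.
function segmentCacheKey({ text, voiceId, language }) {
  return crypto
    .createHash("sha256")
    .update(JSON.stringify([text, voiceId, language]))
    .digest("hex");
}

// Return the segments that need rendering, given a map of
// segment id -> hash from the previous build. `force` mimics --force.
function segmentsToRender(segments, previousHashes, force = false) {
  return segments.filter(
    (seg) => force || previousHashes[seg.id] !== segmentCacheKey(seg)
  );
}

const segments = [
  { id: "001-intro", text: "Welcome back.", voiceId: "oliver", language: "en" },
  { id: "002-section", text: "Today we cover caching.", voiceId: "oliver", language: "en" },
];
const previous = { "001-intro": segmentCacheKey(segments[0]) };

// Only 002-section is new, so only it would be re-rendered.
console.log(segmentsToRender(segments, previous).map((s) => s.id));
```

Because the voice ID and language are part of the key, switching either one invalidates every segment, while editing one section's text invalidates only that segment.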
Multilingual Support
Voice.ai supports 11 languages. Use --language to switch between
English (en), Spanish (es), French (fr), German (de), Italian (it), Portuguese
(pt), Polish (pl), Russian (ru), Dutch (nl), Swedish (sv), and Catalan (ca).
The pipeline auto-selects the multilingual TTS model for non-English
languages.
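The language rule above amounts to a lookup plus one branch, sketched here. The model labels "english-model" and "multilingual-model" are placeholders invented for this example, not real Voice.ai model identifiers, and selectModel is a hypothetical name.

```javascript
// The 11 language codes listed above.
const SUPPORTED_LANGUAGES = new Set([
  "en", "es", "fr", "de", "it", "pt", "pl", "ru", "nl", "sv", "ca",
]);

// Placeholder model labels: the real model IDs are not given in this post.
function selectModel(language = "en") {
  const code = language.toLowerCase();
  if (!SUPPORTED_LANGUAGES.has(code)) {
    throw new RangeError(`Unsupported language: ${language}`);
  }
  return code === "en" ? "english-model" : "multilingual-model";
}

console.log(selectModel("en")); // english-model
console.log(selectModel("pl")); // multilingual-model
```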
Troubleshooting
Common issues and solutions:
- ffmpeg missing : The pipeline still produces segments, the review page, chapters, and captions, but skips master stitching, MP3 encoding, loudness normalization, and video muxing
- API key issues : Check VOICE_AI_API_KEY in environment variables or .env file
- Voice not found : List the available voices with the voices command, or use --mock to test without API calls
Privacy
Video processing is entirely local. Only script text is sent to Voice.ai for
TTS, so your video files never leave your machine.
Commands
The skill provides several commands:
- build : Generate a voiceover from a script
- replace-audio : Swap the audio track on a video
- voices : List available voices
The commands accept options for voice selection, templates, language settings,
and video muxing.
With the Voice.ai Creator Voiceover Pipeline, you can produce
professional-quality voiceovers for any content type with minimal effort.
Skill can be found at:
voiceover-creator/SKILL.md>