Aloysius Chan

Posted on Mar 18 • Originally published at insightginie.com

Voice.ai Creator Voiceover Pipeline: Turn Scripts into Publish-Ready Voiceovers


What This Skill Does

The Voice.ai Creator Voiceover Pipeline is a comprehensive skill that
transforms scripts into publish-ready voiceovers with Voice.ai TTS technology.
It handles everything from segment creation to final video muxing, making it
ideal for content creators who want professional narration without the studio.

The pipeline generates numbered audio segments, stitches them into a master
track, creates YouTube chapters, produces SRT captions, and builds an
interactive review page. You can also replace the audio track on existing
videos with just one command.
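How numbered segments and their start times fit together can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: `buildTimeline`, the `slug`/`title`/`duration` fields, and the sample durations are all assumptions made for the example. Only the `001-intro.wav` naming pattern comes from the build output described in this post.

```javascript
// Hypothetical helper: turns an ordered list of segments with known
// durations (in seconds) into numbered file names and cumulative start times.
function buildTimeline(segments) {
  let cursor = 0; // running start time of the next segment
  return segments.map((seg, i) => {
    const entry = {
      index: i + 1,
      // Zero-padded numbering, e.g. 001-intro.wav
      file: `${String(i + 1).padStart(3, "0")}-${seg.slug}.wav`,
      title: seg.title,
      start: cursor,
      duration: seg.duration,
    };
    cursor += seg.duration;
    return entry;
  });
}

// Sample data (made up for illustration).
const demoTimeline = buildTimeline([
  { slug: "intro", title: "Intro", duration: 12.5 },
  { slug: "setup", title: "Setup", duration: 48 },
  { slug: "outro", title: "Outro", duration: 9.25 },
]);

for (const t of demoTimeline) {
  console.log(`${t.file} starts at ${t.start}s`);
}
```

Chapter markers and caption boundaries can both be derived from start times like these, which is why a single timeline is enough to drive several outputs.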

Why Use This Skill

The skill fits several content creation scenarios:

YouTube long-form : Full narration with chapter markers and captions
YouTube Shorts : Quick hooks with the shortform template
Podcasts : Consistent host voice with intro/outro templates
Course content : Professional narration for educational videos
Quick iteration : Smart caching means editing one section only re-renders that segment
Video audio replacement : Drop AI voiceover onto screen recordings or B-roll

The One-Command Workflow

If you have a script and a video, you can turn them into a finished video with
AI voiceover in one shot:

node voiceai-vo.cjs build \
  --input my-script.md \
  --voice oliver \
  --title "My Video" \
  --video ./my-recording.mp4 \
  --mux

This renders the voiceover, stitches the master audio, and muxes it onto your
video. The output includes:

out/my-video/muxed.mp4 — your video with the new voiceover
out/my-video/master.wav — the standalone audio
out/my-video/review.html — listen and review each segment
out/my-video/chapters.txt — YouTube-ready chapter timestamps
out/my-video/captions.srt — SRT captions

Use --sync pad if the audio is shorter than the video, or --sync trim to
cut it to match.

Requirements

The skill requires:

Node.js 20+ — runtime (no npm install needed—the CLI is a single bundled file)
VOICE_AI_API_KEY — set as environment variable or in a .env file in the skill root. Get a key at voice.ai/dashboard.
ffmpeg (optional) — needed for master stitching, MP3 encoding, loudness normalization, and video muxing. The pipeline still produces individual segments, the review page, chapters, and captions without it.
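A quick preflight check for these requirements might look like the sketch below. It is not part of the skill: `hasFfmpeg` and `checkEnvironment` are hypothetical names, and the check only looks at the process environment (it does not read a .env file). Missing ffmpeg is treated as a warning rather than an error, since the pipeline still produces segments, the review page, chapters, and captions without it.

```javascript
const { spawnSync } = require("node:child_process");

// Returns true when an `ffmpeg` binary on PATH answers `ffmpeg -version`.
function hasFfmpeg() {
  const result = spawnSync("ffmpeg", ["-version"], { stdio: "ignore" });
  return !result.error && result.status === 0;
}

// Hypothetical preflight: hard requirements go in `problems`,
// optional ones in `warnings`.
function checkEnvironment({ nodeVersion, env, ffmpeg }) {
  const problems = [];
  const warnings = [];
  const major = Number.parseInt(nodeVersion.split(".")[0], 10);
  if (!(major >= 20)) {
    problems.push(`Node.js 20+ required, found ${nodeVersion}`);
  }
  if (!env.VOICE_AI_API_KEY) {
    problems.push("VOICE_AI_API_KEY is not set");
  }
  if (!ffmpeg) {
    warnings.push(
      "ffmpeg not found: no master stitching, MP3 encoding, loudness normalization, or muxing"
    );
  }
  return { ok: problems.length === 0, problems, warnings };
}

const report = checkEnvironment({
  nodeVersion: process.versions.node,
  env: process.env,
  ffmpeg: hasFfmpeg(),
});
console.log(report);
```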

Available Voices

Voices can be selected by alias or by UUID. The available aliases are:

ellie : Ellie (F) — Youthful, vibrant vlogger
oliver : Oliver (M) — Friendly British
lilith : Lilith (F) — Soft, feminine
smooth : Smooth Calm Voice (M) — Deep, smooth narrator
corpse : Corpse Husband (M) — Deep, distinctive
skadi : Skadi (F) — Anime character
zhongli : Zhongli (M) — Deep, authoritative
flora : Flora (F) — Cheerful, high pitch
chief : Master Chief (M) — Heroic, commanding
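Since a voice can be given as an alias or a UUID, a caller may want to tell the two apart before building. The sketch below is one way to do that and makes several assumptions: `classifyVoice` is a hypothetical helper, the case-insensitive matching is a choice made for the example, and the alias-to-UUID mapping is deliberately left out because the post does not list the UUIDs.

```javascript
// Aliases taken from the list above.
const VOICE_ALIASES = new Set([
  "ellie", "oliver", "lilith", "smooth", "corpse",
  "skadi", "zhongli", "flora", "chief",
]);

// Generic 8-4-4-4-12 hexadecimal UUID shape.
const UUID_PATTERN =
  /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Hypothetical helper: reports whether the input looks like a known alias,
// a UUID, or neither.
function classifyVoice(input) {
  const value = input.trim();
  const lower = value.toLowerCase();
  if (VOICE_ALIASES.has(lower)) return { kind: "alias", value: lower };
  if (UUID_PATTERN.test(value)) return { kind: "uuid", value: lower };
  return { kind: "unknown", value };
}

console.log(classifyVoice("oliver"));
console.log(classifyVoice("not-a-voice"));
```

An `unknown` result is a reasonable cue to run the voices command and pick from the list it prints.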

Build Outputs

After a build, the output directory contains:

out/<title-slug>/
  segments/           # Numbered WAV files (001-intro.wav, 002-section.wav, …)
  master.wav          # Stitched audio (requires ffmpeg)
  master.mp3          # MP3 encode (requires ffmpeg)
  manifest.json       # Build metadata: voice, template, segment list, hashes
  timeline.json       # Segment durations and start times
  review.html         # Interactive review page with audio players
  chapters.txt        # YouTube-friendly chapter timestamps
  captions.srt        # SRT captions using segment boundaries
  description.txt     # YouTube description with chapters + Voice.ai credit
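To make the relationship between the timeline, chapters, and captions concrete, here is a minimal sketch that formats both text outputs from segment start times and durations. The schema of timeline.json is not documented in this post, so the `title`, `text`, `start`, and `duration` fields are assumptions, as are the helper names. One caption cue per segment mirrors the "segment boundaries" note above.

```javascript
// YouTube chapter timestamp: M:SS, or H:MM:SS once past one hour.
function chapterTime(seconds) {
  const total = Math.floor(seconds);
  const h = Math.floor(total / 3600);
  const m = Math.floor((total % 3600) / 60);
  const s = String(total % 60).padStart(2, "0");
  return h > 0 ? `${h}:${String(m).padStart(2, "0")}:${s}` : `${m}:${s}`;
}

// SRT timestamp: HH:MM:SS,mmm
function srtTime(seconds) {
  const ms = Math.round(seconds * 1000);
  const pad = (n, width = 2) => String(n).padStart(width, "0");
  const h = Math.floor(ms / 3600000);
  const m = Math.floor((ms % 3600000) / 60000);
  const s = Math.floor((ms % 60000) / 1000);
  return `${pad(h)}:${pad(m)}:${pad(s)},${pad(ms % 1000, 3)}`;
}

// One chapter line per segment.
function toChapters(timeline) {
  return timeline.map((t) => `${chapterTime(t.start)} ${t.title}`).join("\n");
}

// One SRT cue per segment, separated by blank lines.
function toSrt(timeline) {
  return timeline
    .map((t, i) => {
      const end = t.start + t.duration;
      return `${i + 1}\n${srtTime(t.start)} --> ${srtTime(end)}\n${t.text}\n`;
    })
    .join("\n");
}

// Sample data (made up for illustration).
const sampleTimeline = [
  { title: "Intro", text: "Welcome to the video.", start: 0, duration: 12.5 },
  { title: "Setup", text: "First, install the tools.", start: 12.5, duration: 48 },
];

console.log(toChapters(sampleTimeline));
console.log(toSrt(sampleTimeline));
```

YouTube expects the first chapter to start at 0:00, which holds naturally when the first segment starts the timeline.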

Templates

The pipeline supports three templates, each of which auto-injects its own opening or closing segments:

youtube : Includes both intro and outro segments
podcast : Includes intro segment only
shortform : Includes hook segment only

You can edit the template files in the templates/ directory to customize the
content.
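Conceptually, a template wraps the script's own segments with extra ones. The sketch below illustrates that idea only: the `TEMPLATES` table, the placeholder wording, and `applyTemplate` are hypothetical, and the real template files in templates/ may be structured quite differently. Placing the hook first is also an assumption.

```javascript
// Hypothetical template table mirroring the three templates described above.
// The text of each injected segment is placeholder wording.
const TEMPLATES = {
  youtube: { intro: "Welcome back to the channel.", outro: "Thanks for watching." },
  podcast: { intro: "Welcome to the show." },
  shortform: { hook: "Here is the one thing you need to know." },
};

// Wraps the script's segments with whatever the chosen template injects.
function applyTemplate(name, segments) {
  if (!Object.hasOwn(TEMPLATES, name)) {
    throw new Error(`Unknown template: ${name}`);
  }
  const tpl = TEMPLATES[name];
  const before = [];
  if (tpl.hook) before.push({ title: "Hook", text: tpl.hook });
  if (tpl.intro) before.push({ title: "Intro", text: tpl.intro });
  const after = tpl.outro ? [{ title: "Outro", text: tpl.outro }] : [];
  return [...before, ...segments, ...after];
}

const body = [{ title: "Main point", text: "Here is the core of the script." }];
console.log(applyTemplate("youtube", body).map((s) => s.title));
```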

Caching

Segments are cached by a hash of text content + voice ID + language. Unchanged
segments are skipped on rebuild for fast iteration, while modified segments
are re-rendered automatically. Use --force to re-render everything.
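The caching rule above can be sketched as a content hash over the three inputs. This is an illustration under assumptions, not the skill's implementation: the post does not say which hash function or encoding is used, so SHA-256 over a JSON array is a stand-in, and `segmentCacheKey` and `planRender` are hypothetical names.

```javascript
const { createHash } = require("node:crypto");

// Cache key over text + voice ID + language. Serializing as a JSON array
// keeps the three fields unambiguously separated before hashing.
function segmentCacheKey({ text, voiceId, language }) {
  return createHash("sha256")
    .update(JSON.stringify([text, voiceId, language]))
    .digest("hex");
}

// Marks each segment for rendering unless its key is already cached.
// `force` mirrors the effect of the --force flag.
function planRender(segments, cachedKeys, { force = false } = {}) {
  return segments.map((seg) => {
    const key = segmentCacheKey(seg);
    return { ...seg, key, render: force || !cachedKeys.has(key) };
  });
}

const script = [
  { text: "Welcome to the video.", voiceId: "oliver", language: "en" },
  { text: "First, install the tools.", voiceId: "oliver", language: "en" },
];

// First build: nothing is cached, so everything renders.
const firstBuild = planRender(script, new Set());
const cache = new Set(firstBuild.map((s) => s.key));

// Second build: only the edited segment renders again.
const edited = [script[0], { ...script[1], text: "First, install ffmpeg." }];
const secondBuild = planRender(edited, cache);
console.log(secondBuild.map((s) => s.render)); // [ false, true ]
```

Because the voice ID and language are part of the key, switching either one invalidates every segment, which matches the intuition that a different voice needs a full re-render.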

Multilingual Support

Voice.ai supports 11 languages. Use --language to switch between English (en), Spanish (es), French (fr), German (de), Italian (it), Portuguese (pt), Polish (pl), Russian (ru), Dutch (nl), Swedish (sv), and Catalan (ca). The pipeline auto-selects the multilingual TTS model for non-English languages.
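The language rule can be expressed as a small lookup. In the sketch below, the language codes come from the list above, while `selectModel`, the default of English, and the `"english"`/`"multilingual"` labels are placeholders for illustration; the actual model identifiers are not given in this post.

```javascript
// The 11 language codes listed above.
const SUPPORTED_LANGUAGES = new Set([
  "en", "es", "fr", "de", "it", "pt", "pl", "ru", "nl", "sv", "ca",
]);

// Hypothetical selector: English uses the default model, every other
// supported language uses the multilingual one. The returned strings are
// placeholder labels, not real model IDs.
function selectModel(language = "en") {
  const code = language.toLowerCase();
  if (!SUPPORTED_LANGUAGES.has(code)) {
    throw new Error(`Unsupported language: ${language}`);
  }
  return code === "en" ? "english" : "multilingual";
}

console.log(selectModel("en"));
console.log(selectModel("es"));
```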

Troubleshooting

Common issues and solutions:

ffmpeg missing : The pipeline still produces segments, the review page, chapters, and captions, but skips master stitching, MP3 encoding, loudness normalization, and video muxing
API key issues : Check that VOICE_AI_API_KEY is set in your environment or in a .env file in the skill root
Voice not found : List the available voices with the voices command, or use --mock to test without API calls

Privacy

Video processing is entirely local. Only the script text is sent to Voice.ai
for TTS; your video files stay on your machine.

Commands

The skill provides several commands:

build : Generate a voiceover from a script
replace-audio : Swap the audio track on a video
voices : List available voices

Each command supports various options for customization, including voice
selection, templates, language settings, and video muxing capabilities.

With the Voice.ai Creator Voiceover Pipeline, you can create professional-
quality voiceovers for any content type with minimal effort and maximum
flexibility.

The skill can be found at:
voiceover-creator/SKILL.md>
