This is a simplified guide to an AI model called Whisperx-Subtitles-Replicate maintained by Dashed. If you like these kinds of analyses, you should join AImodels.fyi or follow us on Twitter.
Model Overview
whisperx-subtitles-replicate converts audio into synchronized subtitles using WhisperX technology. This model stands out by offering precise word-level timestamps and efficient processing at up to 70x realtime speed. Unlike whisper subtitles, which uses the standard Whisper model, this implementation by dashed incorporates faster-whisper-large-v3 for enhanced performance.
Model Inputs and Outputs
The model processes audio files and generates professional-grade subtitle files with options for language detection and speaker identification. The system uses PySBD for accurate sentence boundary detection and creates synchronized subtitle cues based on word-level timing.
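PySBD handles many edge cases (abbreviations, decimals, ellipses) that naive splitting misses. To illustrate the basic idea of sentence boundary detection, here is a deliberately simplified sketch; this is not the model's actual code, and a regex stands in for PySBD's rule-based segmenter:

```python
import re

def naive_sentence_split(text: str) -> list[str]:
    # Naive approximation of sentence boundary detection: split after
    # ., !, or ? followed by whitespace. PySBD (which the model uses)
    # handles far more edge cases than this.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

print(naive_sentence_split("Hello world. This is a test! Ready?"))
# ['Hello world.', 'This is a test!', 'Ready?']
```

Each detected sentence can then be mapped onto the word-level timestamps to form a subtitle cue.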
Inputs
- Audio File: Accepts WAV or M4A format audio
- Language: Optional ISO language code
- Batch Size: Controls parallel processing speed
- Initial Prompt: Optional text context for first window
- Diarization Settings: Speaker identification options
- VAD Parameters: Voice activity detection configuration
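A sketch of how these inputs might be assembled for a call through the Replicate Python client. The input field names and the model slug here are assumptions based on the list above, not verified against the model's actual schema:

```python
import io

def build_inputs(audio, language=None, batch_size=16, initial_prompt=None):
    # Assemble the input payload. Field names are assumptions drawn from
    # the documented input list, not from the model's published schema.
    inputs = {"audio_file": audio, "batch_size": batch_size}
    if language:
        inputs["language"] = language      # optional ISO code; omit for auto-detect
    if initial_prompt:
        inputs["initial_prompt"] = initial_prompt
    return inputs

payload = build_inputs(io.BytesIO(b"fake audio bytes"), language="en")
# With the Replicate client this would be passed as, e.g.:
#   replicate.run("dashed/whisperx-subtitles-replicate", input=payload)
print(sorted(payload))  # ['audio_file', 'batch_size', 'language']
```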
Outputs
- SRT File: Standard subtitle format with timestamps
- Detected Language: Identified audio language
- Segments: Transcribed text with timing data
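To show how timed segments relate to the SRT output, here is a minimal sketch that formats a list of segments as SRT cues. The segment dictionary shape (`start`, `end`, `text`) is an assumption for illustration, not the model's guaranteed output schema:

```python
def srt_timestamp(seconds: float) -> str:
    # Format seconds as an SRT timestamp: HH:MM:SS,mmm
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(segments) -> str:
    # segments: list of {"start": float, "end": float, "text": str}
    # (assumed shape for this sketch)
    cues = []
    for i, seg in enumerate(segments, 1):
        cues.append(
            f"{i}\n{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n{seg['text']}\n"
        )
    return "\n".join(cues)

print(to_srt([{"start": 0.0, "end": 2.5, "text": "Hello world."}]))
# 1
# 00:00:00,000 --> 00:00:02,500
# Hello world.
```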
Capabilities
The system handles sentence segmentation...