This is a simplified guide to an AI model called Whisperx-Subtitles-Replicate maintained by Dashed. If you like these kinds of analyses, you should join AImodels.fyi or follow us on Twitter.
Model Overview
whisperx-subtitles-replicate converts audio into synchronized subtitles using WhisperX technology. This model stands out by offering precise word-level timestamps and efficient processing at up to 70x realtime speed. Unlike whisper subtitles, which uses the standard Whisper model, this implementation by dashed incorporates faster-whisper-large-v3 for enhanced performance.
Model Inputs and Outputs
The model processes audio files and generates professional-grade subtitle files with options for language detection and speaker identification. The system uses PySBD for accurate sentence boundary detection and creates synchronized subtitle cues based on word-level timing.
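PySBD handles many edge cases (abbreviations, decimals, ellipses) that naive splitting misses. To illustrate the basic idea of sentence boundary detection, here is a deliberately simplified sketch; this is not the model's actual code, and a regex stands in for PySBD's rule-based segmenter:

```python
import re

def naive_sentence_split(text: str) -> list[str]:
    # Naive approximation of sentence boundary detection: split after
    # ., !, or ? followed by whitespace. PySBD (which the model uses)
    # handles far more edge cases than this.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

print(naive_sentence_split("Hello world. This is a test! Ready?"))
# ['Hello world.', 'This is a test!', 'Ready?']
```

Each detected sentence can then be mapped onto the word-level timestamps to form a subtitle cue.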
Inputs
- Audio File: Accepts WAV or M4A format audio
- Language: Optional ISO language code
- Batch Size: Controls parallel processing speed
- Initial Prompt: Optional text context for first window
- Diarization Settings: Speaker identification options
- VAD Parameters: Voice activity detection configuration
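A sketch of how these inputs might be assembled for a call through the Replicate Python client. The input field names and the model slug here are assumptions based on the list above, not verified against the model's actual schema:

```python
import io

def build_inputs(audio, language=None, batch_size=16, initial_prompt=None):
    # Assemble the input payload. Field names are assumptions drawn from
    # the documented input list, not from the model's published schema.
    inputs = {"audio_file": audio, "batch_size": batch_size}
    if language:
        inputs["language"] = language      # optional ISO code; omit for auto-detect
    if initial_prompt:
        inputs["initial_prompt"] = initial_prompt
    return inputs

payload = build_inputs(io.BytesIO(b"fake audio bytes"), language="en")
# With the Replicate client this would be passed as, e.g.:
#   replicate.run("dashed/whisperx-subtitles-replicate", input=payload)
print(sorted(payload))  # ['audio_file', 'batch_size', 'language']
```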
Outputs
- SRT File: Standard subtitle format with timestamps
- Detected Language: Identified audio language
- Segments: Transcribed text with timing data
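To show how timed segments relate to the SRT output, here is a minimal sketch that formats a list of segments as SRT cues. The segment dictionary shape (`start`, `end`, `text`) is an assumption for illustration, not the model's guaranteed output schema:

```python
def srt_timestamp(seconds: float) -> str:
    # Format seconds as an SRT timestamp: HH:MM:SS,mmm
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(segments) -> str:
    # segments: list of {"start": float, "end": float, "text": str}
    # (assumed shape for this sketch)
    cues = []
    for i, seg in enumerate(segments, 1):
        cues.append(
            f"{i}\n{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n{seg['text']}\n"
        )
    return "\n".join(cues)

print(to_srt([{"start": 0.0, "end": 2.5, "text": "Hello world."}]))
# 1
# 00:00:00,000 --> 00:00:02,500
# Hello world.
```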
Capabilities
The system handles sentence segmentation...