aimodels-fyi

Posted on • Originally published at aimodels.fyi

A beginner's guide to the Whisperx-Subtitles-Replicate model by Dashed on Replicate

This is a simplified guide to an AI model called Whisperx-Subtitles-Replicate, maintained by Dashed. If you like this kind of analysis, you should join AImodels.fyi or follow us on Twitter.

Model Overview

whisperx-subtitles-replicate converts audio into synchronized subtitles using WhisperX technology. The model stands out by offering precise word-level timestamps and efficient processing at up to 70x realtime speed. Unlike whisper subtitles, which uses standard Whisper, this implementation by Dashed incorporates faster-whisper-large-v3 for enhanced performance.

Model Inputs and Outputs

The model processes audio files and generates professional-grade subtitle files with options for language detection and speaker identification. The system uses PySBD for accurate sentence boundary detection and creates synchronized subtitle cues based on word-level timing.

Inputs

  • Audio File: Accepts WAV or M4A format audio
  • Language: Optional ISO language code
  • Batch Size: Controls parallel processing speed
  • Initial Prompt: Optional text context for first window
  • Diarization Settings: Speaker identification options
  • VAD Parameters: Voice activity detection configuration
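To make the input list concrete, here is a minimal sketch of an input payload for a Replicate run. The field names and the model identifier are assumptions based on the list above, not the model's published schema, so check the model's Replicate page for the exact names:

```python
# Hypothetical input payload mirroring the inputs listed above.
# All field names below are assumptions, not the model's confirmed schema.
model_input = {
    "audio_file": "https://example.com/interview.wav",  # WAV or M4A audio
    "language": "en",               # optional ISO language code
    "batch_size": 16,               # controls parallel processing speed
    "initial_prompt": "Tech interview about AI models.",  # context for first window
    "diarization": False,           # speaker identification toggle
    "vad_onset": 0.5,               # voice activity detection thresholds
    "vad_offset": 0.363,
}

# With the official Replicate Python client, the call would look roughly like:
# import replicate
# output = replicate.run("dashed/whisperx-subtitles-replicate", input=model_input)
```

Leaving `language` unset typically lets the model auto-detect it, which is what the Detected Language output below reports.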

Outputs

  • SRT File: Standard subtitle format with timestamps
  • Detected Language: Identified audio language
  • Segments: Transcribed text with timing data
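The Segments output pairs transcribed text with timing data, which is what makes the SRT file possible. As a rough illustration (my own stdlib-only sketch, not the model's actual code), here is how segment text plus start/end times can be rendered into SRT cues:

```python
def fmt_ts(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments: list[dict]) -> str:
    """Render segments (text with start/end in seconds) as an SRT string."""
    cues = []
    for i, seg in enumerate(segments, start=1):
        cues.append(
            f"{i}\n{fmt_ts(seg['start'])} --> {fmt_ts(seg['end'])}\n{seg['text']}\n"
        )
    return "\n".join(cues)

# Example segments shaped like the model's timing output (illustrative values).
segments = [
    {"start": 0.0, "end": 2.4, "text": "Hello and welcome."},
    {"start": 2.4, "end": 5.1, "text": "Today we look at WhisperX."},
]
print(segments_to_srt(segments))
```

Each cue is a sequence number, a `start --> end` timestamp line, and the caption text; WhisperX's word-level timestamps let cue boundaries land on actual word edges rather than fixed intervals.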

Capabilities

The system handles sentence segmentation...

Click here to read the full guide to Whisperx-Subtitles-Replicate
