This is a simplified guide to an AI model called Insanely-Fast-Whisper-With-Video, maintained by Turian. If you like this kind of analysis, you should join AImodels.fyi or follow us on Twitter.
Model overview
The insanely-fast-whisper-with-video model, created by turian, is an audio transcription tool built on OpenAI's Whisper Large v3 model. Its main selling point is speed: it can transcribe up to 150 minutes of audio in less than 98 seconds on an NVIDIA A100 80GB GPU. The model also supports video transcription, which makes it useful for a wider range of applications.
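To put that benchmark in perspective, the quoted figures imply a throughput of roughly 90 times real time. A quick back-of-the-envelope check, where the variable names are ours and 98 seconds is treated as the upper bound stated above:

```python
# Figures quoted in the overview: 150 minutes of audio in under 98 seconds.
audio_seconds = 150 * 60
wall_clock_seconds = 98

# Real-time factor: seconds of audio processed per second of compute.
# Because 98 s is an upper bound, this is a lower bound on the speedup.
real_time_factor = audio_seconds / wall_clock_seconds
print(f"at least {real_time_factor:.1f}x faster than real time")
```

On smaller GPUs the factor will be lower, so treat this as a best-case number for the stated hardware.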
The insanely-fast-whisper-with-video model builds upon the work of chenxwh/insanely-fast-whisper and adidoes/cog-whisperx-video-transcribe, leveraging techniques like fp16 precision, batching, Flash Attention 2, and bettertransformer to achieve these impressive transcription speeds.
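For readers curious how these techniques fit together, here is a minimal sketch using the Hugging Face `transformers` pipeline. It is not this model's actual source code: the helper names `build_pipeline_kwargs` and `transcribe`, the 30-second chunk length, the default batch size of 24, and the `"sdpa"` fallback (PyTorch's built-in fused attention) are all our assumptions for illustration.

```python
def build_pipeline_kwargs(use_flash_attention: bool = True) -> dict:
    """Collect the speed-related settings in one place (pure Python, no GPU needed)."""
    return {
        "task": "automatic-speech-recognition",
        "model": "openai/whisper-large-v3",
        "device": "cuda:0",
        # Flash Attention 2 needs the separate flash-attn package installed;
        # "sdpa" is an assumed fallback, not something the guide specifies.
        "model_kwargs": {
            "attn_implementation": "flash_attention_2" if use_flash_attention else "sdpa"
        },
    }


def transcribe(path: str, batch_size: int = 24) -> dict:
    """Run chunked, batched fp16 inference. Needs torch, transformers, and a CUDA GPU."""
    # Imported lazily so the config helper above works without these packages.
    import torch
    from transformers import pipeline

    # fp16 weights halve memory use compared with fp32.
    pipe = pipeline(torch_dtype=torch.float16, **build_pipeline_kwargs())
    # Long audio is split into 30 s chunks that are decoded batch_size at a time.
    return pipe(path, chunk_length_s=30, batch_size=batch_size, return_timestamps=True)
```

Calling `transcribe("talk.mp3")` (a hypothetical file) should return a dictionary with the full text and timestamped chunks, provided the GPU has enough memory for the chosen batch size.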
Model inputs and outputs
Inputs
- File Name: The path or URL to the audio or video file to be transcribed.
- Task: The task to be performed, either transcription or translation.
- Language: The language of the input audio (optional, Whisper can auto-detect the language).
- Batch Size: The number of parallel batches to compute; reduce it if you run into Out-Of-Memory (OOM) errors.
- Timestamp: The type of timestamp to generate, either chunked or word-level.
- Diarise Audio: Whether to use Pyannote.audio to diarise the audio clips, which requires a Hugging Face token.
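The inputs listed above can be gathered into a single request object before calling the model. The sketch below is illustrative only: the field names, the default values, and the accepted strings (`"transcribe"`/`"translate"`, `"chunk"`/`"word"`) are assumptions on our part, so check the model's published input schema for the real keys.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class TranscriptionRequest:
    # All field names and defaults here are hypothetical, not the model's real schema.
    file_name: str                   # path or URL to the audio or video file
    task: str = "transcribe"         # or "translate"
    language: Optional[str] = None   # None lets Whisper auto-detect the language
    batch_size: int = 24             # lower this if the GPU runs out of memory
    timestamp: str = "chunk"         # or "word"
    diarise_audio: bool = False
    hf_token: Optional[str] = None   # needed only when diarise_audio is True

    def to_payload(self) -> dict:
        """Validate the fields and return a dict with unset options dropped."""
        if self.task not in ("transcribe", "translate"):
            raise ValueError(f"unknown task: {self.task!r}")
        if self.timestamp not in ("chunk", "word"):
            raise ValueError(f"unknown timestamp type: {self.timestamp!r}")
        if self.batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        if self.diarise_audio and not self.hf_token:
            raise ValueError("diarisation requires a Hugging Face token")
        return {k: v for k, v in asdict(self).items() if v is not None}


payload = TranscriptionRequest(file_name="talk.mp4", timestamp="word").to_payload()
print(payload)
```

Validating up front catches the most common mistake, which is enabling diarisation without supplying a token, before any GPU time is spent.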
Outputs
- The transcription output, which can be saved to a specified file path.
Capabilities
The insanely-fast-whisper-with-video
...