
Johannes Dienst

How to Create Subtitle-Files (srt) with WhisperAI

For my tutorial videos I want to provide high-quality subtitles. But I did not want to write them myself, as this is a tedious task.

Luckily there is WhisperAI to help me with that, or so it promises 😂 Time to give it a shot for my current project.

Install WhisperAI

I followed the installation guide on the project's GitHub repository. Whisper is a Python tool, so my first step was setting up a virtual environment (my currently installed Python version is 3.9.6 on macOS). Here are all the commands I ran:

## I installed python with brew as far as I remember ;-)
## Initialize and activate the virtualenv
python3 -m venv venv
source venv/bin/activate

## Install latest WhisperAI from Github
pip install git+

## Install ffmpeg
brew install ffmpeg

Create Util-Script for Easier Handling

With that set up, I created a little script based on this Gist. It was written for an older version of WhisperAI, so I had to make some changes. Save it under whatever name you like 😋:

import sys

import whisper
from whisper.utils import get_writer

def run(input_path: str, output_name: str = "", output_directory: str = "./") -> None:
    # Load the model and transcribe the input file
    model = whisper.load_model("medium")
    result = model.transcribe(input_path)

    # Write the result as an .srt file into the output directory
    writer = get_writer("srt", str(output_directory))
    writer(result, output_name)

def main() -> None:
    if len(sys.argv) != 4:
        # Complain and exit if the arguments do not match the expected usage
        sys.exit(
            "Error: Invalid number of arguments.\n"
            "Usage: python <input-path> <output-name> <output-directory>\n"
            "Example: python ./transcribed"
        )

    run(input_path=sys.argv[1], output_name=sys.argv[2], output_directory=sys.argv[3])

if __name__ == "__main__":
    main()

Usage of Util-Script

You can call it like this, and it handles .wav as well as .mp4 files. So you do not even have to export your videos to another format to use it:

python <path to your mp4/wav file> <name of the srt file> <path to where to save the srt file>
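If everything works, the resulting .srt file is plain text in the SubRip format: numbered cues, a start --> end timestamp line, then the subtitle text. It looks roughly like this (timings and text invented for illustration):

```
1
00:00:00,000 --> 00:00:02,500
Welcome to this tutorial video.

2
00:00:02,500 --> 00:00:06,000
Today we will look at subtitles.
```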


If you want to use a model other than medium, change the following line and replace medium with one of the models documented here:

model = whisper.load_model("medium")
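For reference, these are the multilingual model names the Whisper README listed at the time of writing, with approximate parameter counts (double-check the repository for the current list; larger models are more accurate but slower):

```python
# Approximate parameter counts from the Whisper README. The
# tiny/base/small/medium models also have ".en" English-only
# variants; any of these names can be passed to whisper.load_model().
WHISPER_MODELS = {
    "tiny": "~39M parameters",
    "base": "~74M parameters",
    "small": "~244M parameters",
    "medium": "~769M parameters",
    "large": "~1550M parameters",
}

print(sorted(WHISPER_MODELS))
```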

If you want to change the output format, you can use one of the following instead of srt: vtt, tsv, json, or txt. Change it in this line:

writer = get_writer("srt", str(output_directory))

Happy transcribing 🦄

Top comments (2)

Jaizon Carlos Oliveira Santos

It is interesting, but I got a question: how do I fine-tune the timings?

The models are pretty accurate, but I've noticed they also include the times when no one is speaking.

For example, in one of the audio files I've tried, the person only started speaking after a few seconds, but Whisper logged the cue in the srt file from 00:00:00. That means the text appeared in the video well before any voice came out.

Johannes Dienst

I have not found a way around going over the transcripts myself and editing some mistakes.

For example, company names are usually not recognized as they should be.

Whisper, and in my experience every AI tool I have tried so far, gets you maybe 95% of the way to where you want to be. If you are OK with that, use it as it is. If not, you have to invest in the last 5% yourself ;-)
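For the timing question: I have not tried this in production, but since transcribe() returns a plain dict with a "segments" list (each segment has "start", "end", and "text", times in seconds), you can post-process it before handing it to the writer. A sketch that pushes the first cue to where speech actually starts (the speech_start value is something you would measure yourself):

```python
def clamp_leading_silence(result: dict, speech_start: float) -> dict:
    """Delay cues that begin before the first spoken word.

    `result` has the shape returned by model.transcribe();
    `speech_start` is the measured time (in seconds) where
    speech really begins.
    """
    for segment in result["segments"]:
        if segment["start"] < speech_start:
            # Never push a cue's start past its own end
            segment["start"] = min(speech_start, segment["end"])
    return result

# Demo with a hand-made dict mimicking Whisper's output shape
fake = {"segments": [{"start": 0.0, "end": 6.5, "text": "Hello!"}]}
clamp_leading_silence(fake, speech_start=3.2)
print(fake["segments"][0]["start"])  # → 3.2
```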