Generate subtitles with OpenAI Whisper and Eyevinn OSC

Generate subtitles with OpenAI Whisper and Eyevinn OSC

With AI and LLMs being on (almost) everyone's mind these days, we at Eyevinn thought that it would be interesting to explore the possibilities of using OpenAI's speech-to-text model to generate subtitles for video/audio content.

The initial idea was that it would be cool to generate subtitles on the fly when a user requests them for a movie. They won't be perfect, of course, but they might be just good enough to provide basic translations for content that otherwise wouldn't have subtitles available in that language.

With that idea in mind, we have created a small POC that is available on GitHub. It's a super simple API that makes it possible to provide a link to a video/audio segment where the response is the transcribed content. We've also added the possibility of uploading the transcribed file to an S3 bucket.

To transcribe content, send a POST request to the /transcribe endpoint:

{
  "url": "https://example.net/vod-audio_en=128000.aac",
  "language": "en", // ISO 639-1 language code (https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes) (optional)
  "format": "vtt" // Supported formats: json, text, srt, verbose_json, or vtt (optional)
}
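A small sketch of how a client might assemble that request body. The field names follow the example above; the helper name and the validation logic are illustrative assumptions, not part of the actual API:

```python
# Hypothetical helper for building the /transcribe request body.
# Field names ("url", "language", "format") come from the example above;
# the function itself is an illustrative assumption.
ALLOWED_FORMATS = {"json", "text", "srt", "verbose_json", "vtt"}

def build_transcribe_request(url, language=None, fmt=None):
    """Assemble the JSON body for a POST to /transcribe."""
    if fmt is not None and fmt not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    body = {"url": url}
    if language is not None:
        body["language"] = language  # ISO 639-1 code, e.g. "en"
    if fmt is not None:
        body["format"] = fmt
    return body
```

The optional fields are simply omitted when not given, matching the "(optional)" annotations in the example.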

The response will then be the transcribed content:

{
  "workerId": "BFabbcCi3IYuWOj6LfsgK",
  "result": "WEBVTT\n\n00:00:00.000 --> 00:00:04.180\nor into transcoding I mean, I could probably add just the keyframe in the start and just\n\n00:00:04.180 --> 00:00:06.920\nskip I-frames and the rest of that.\n\n"
}
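The "result" field is a single WebVTT string with escaped newlines. If you want to work with the individual cues, a minimal parser (an illustrative sketch, not part of the service) could look like this:

```python
def parse_vtt_cues(vtt):
    """Split a WEBVTT string into (start, end, text) tuples.

    Assumes the simple cue layout shown above: a timing line
    "start --> end" followed by one or more text lines.
    """
    cues = []
    for block in vtt.strip().split("\n\n"):
        lines = block.strip().splitlines()
        if not lines or "-->" not in lines[0]:
            continue  # skip the WEBVTT header and empty blocks
        start, _, end = lines[0].partition(" --> ")
        cues.append((start.strip(), end.strip(), " ".join(lines[1:])))
    return cues
```

Feeding it the "result" string from the response above yields one tuple per cue, ready for further processing or re-serialization.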

Formatted output:

WEBVTT

00:00:00.000 --> 00:00:01.940
So into transcoding, I mean, I could

00:00:01.940 --> 00:00:03.700
probably add just a keyframe at the start

00:00:03.700 --> 00:00:06.700
and then just skip iFrames in the rest of the scenes.
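Whisper reports cue boundaries in seconds, while WebVTT uses the HH:MM:SS.mmm timestamps shown above. Converting between the two is straightforward; this is a minimal sketch, not necessarily how the service formats its output:

```python
def vtt_timestamp(seconds):
    """Format a duration in seconds as an HH:MM:SS.mmm WebVTT timestamp."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)   # whole hours
    m, rem = divmod(rem, 60_000)     # whole minutes
    s, ms = divmod(rem, 1000)        # whole seconds and milliseconds
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"
```

For example, a cue boundary of 4.18 seconds becomes "00:00:04.180", matching the timings in the formatted output above.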

The API uses the OpenAI API, so an OpenAI account and API key are required.

The easiest and best way to try this out is to use our newly released (currently in beta) platform, Eyevinn Open Source as a Service. It's a cloud platform that makes it super easy to prototype, monetize, and take open-source services to production. The idea is that once you sign up, which is completely free, you can try out and deploy a selection of open-source services with a click of a button. What makes the platform special is that you only pay the cost of running the services plus a small community fee that is shared with the maintainers of the open-source services you use.

Take the auto-subtitles API as an example: once you have created an account, you will be presented with a dashboard containing multiple services.

Dashboard view

When creating an instance of auto-subtitles, you'll need to enter your OpenAI API key (it is only used when making calls to the OpenAI API).

Creating a service view

That's it! You're now up and running and can try it out.

Eyevinn OSaaS is currently in beta, so let us know what you think about it.

As always, if you need assistance with the development and implementation of this, our team of video developers is happy to help you out. If you have any questions or comments, just drop a line in the comments section of this post.
