Using GPT-3 and Whisper to Generate a Summary of a YouTube Video in Any Language 😀

Karan Kulshestha — Wed, 19 Oct 2022 12:07:41 +0000

In this post, you'll learn how to use Whisper and GPT-3 to generate a short summary of a YouTube video in any language. You can see a demo video here.

Technologies Required

Streamlit (building the web app)
Whisper (OpenAI speech recognition model)
pytube (downloading the audio of a YouTube video)
OpenAI GPT-3 API (you need an API key to use this service)
Python

Setup Required

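Install Whisper, Streamlit, pytube, and the OpenAI Python library using these commands

pip install git+https://github.com/openai/whisper.git
pip install streamlit
# the code in this post uses the legacy openai.Completion API (openai < 1.0)
pip install pytube "openai<1.0"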

Install FFmpeg (Whisper uses it to decode audio)

# on Ubuntu or Debian
sudo apt update && sudo apt install ffmpeg

# on Arch Linux
sudo pacman -S ffmpeg

# on MacOS using Homebrew (https://brew.sh/)
brew install ffmpeg

# on Windows using Chocolatey (https://chocolatey.org/)
choco install ffmpeg

# on Windows using Scoop (https://scoop.sh/)
scoop install ffmpeg
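
Whisper calls the ffmpeg executable to decode audio, so it can help to confirm that the install worked before running the app. The following is a minimal sketch using only the Python standard library; the helper name `ffmpeg_available` is made up for this example.

```python
import shutil


def ffmpeg_available(executable: str = "ffmpeg") -> bool:
    """Return True if the given executable can be found on the PATH."""
    return shutil.which(executable) is not None


if __name__ == "__main__":
    if ffmpeg_available():
        print("ffmpeg found")
    else:
        print("ffmpeg not found; install it with one of the commands above")
```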

Start Writing Code

Start by importing the packages in a Python file

import streamlit as st
import openai
from pytube import YouTube
import whisper

Set up the OpenAI API service

openai.organization = ""
openai.api_key = 'sk-*****kpeKyzuPRIT3Bl***************' # replace with your own key
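
Hard-coding the key makes it easy to leak when the file is shared. As one possible alternative, the key can be read from an environment variable. This is a sketch and not part of the original app: the helper `load_api_key` is hypothetical, and the variable name `OPENAI_API_KEY` is an assumption (any name works as long as you set it in your shell).

```python
import os


def load_api_key(env=None, name="OPENAI_API_KEY"):
    """Read the API key from an environment mapping; fail loudly if it is missing."""
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {name} environment variable first")
    return key


# In the app this would replace the hard-coded string:
# openai.api_key = load_api_key()
```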

Building the UI of the web app using Streamlit

with st.container():
    st.header("Youtube Summary")
    st.title("Get the summary of any YouTube video in any language")

Taking the URL of a video and transcribing it using Whisper

yt = YouTube(text_input)                              ## pass the input URL (text_input is the Streamlit text box value)
stream = yt.streams.filter(only_audio=True).first()   ## pick an audio-only stream (YouTube serves no MP3 streams)
stream.download(filename="audio.mp3")                 ## download the audio into the current folder
model = whisper.load_model("base")                    ## load the Whisper model
result = model.transcribe("audio.mp3")                ## start transcribing
content = result["text"]                              ## store the text
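
pytube raises an error when the input is not a usable video URL, so a quick check before downloading can give a friendlier message. The sketch below uses only the standard library; the function name `extract_video_id` and the list of accepted hosts are assumptions for illustration and do not cover every URL form YouTube accepts.

```python
from typing import Optional
from urllib.parse import urlparse, parse_qs

# Hosts accepted by this sketch (an assumption, not an exhaustive list)
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}
SHORT_HOSTS = {"youtu.be"}


def extract_video_id(url: str) -> Optional[str]:
    """Return the video id from a YouTube watch or short URL, else None."""
    parsed = urlparse(url.strip())
    if parsed.scheme not in ("http", "https"):
        return None
    host = (parsed.hostname or "").lower()
    if host in SHORT_HOSTS:
        # short links carry the id as the first path segment
        video_id = parsed.path.lstrip("/").split("/")[0]
        return video_id or None
    if host in YOUTUBE_HOSTS and parsed.path == "/watch":
        # watch links carry the id in the "v" query parameter
        values = parse_qs(parsed.query).get("v", [])
        return values[0] if values else None
    return None
```

In the app, a `None` result could be reported with `st.warning` instead of starting the download.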

Generate the summary of the transcription using the OpenAI API

tldr_tag = "\n\nTl;dr"                     ## tag that tells GPT-3 where the text ends
response = openai.Completion.create(       ## calling the API to get the summary from the GPT-3 engine
    engine="text-davinci-002",
    prompt=content + tldr_tag,
    temperature=0.3,
    max_tokens=200,
    top_p=1.0,
    frequency_penalty=0,
    presence_penalty=0,
)
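
A long video produces a long transcript, and the completion request fails when the prompt plus `max_tokens` exceeds the model's context window. One simple way around this, sketched below, is to cut the transcript before appending the Tl;dr tag. The helper `build_prompt` and the `max_chars` budget are assumptions for illustration; a character count is only a crude stand-in for tokens, so the limit may need tuning, especially for non-English text.

```python
def build_prompt(transcript: str, tldr_tag: str = "\n\nTl;dr", max_chars: int = 8000) -> str:
    """Trim the transcript to max_chars (at a word boundary) and append the tag."""
    text = transcript.strip()
    if len(text) > max_chars:
        cut = text[:max_chars]
        # drop a possibly half-cut last word, if there is a space to cut at
        head, sep, _ = cut.rpartition(" ")
        text = head if sep else cut
    return text + tldr_tag
```

Truncation discards the end of the video; summarizing the transcript in chunks would be another option.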

Finally Display the Results

st.subheader("Here is your summary!")
st.write(response["choices"][0]["text"])   ## finally show the result in the web app using Streamlit
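
The completion text can start with leftover punctuation or blank lines that continue the Tl;dr tag. A small cleanup step before display might look like this; `clean_summary` is a hypothetical helper, and the response is treated as a plain dict with the same `choices[0]["text"]` shape used above.

```python
def clean_summary(response: dict) -> str:
    """Pull the first completion out of the response and tidy its edges."""
    choices = response.get("choices") or []
    if not choices:
        return ""
    text = choices[0].get("text", "")
    # remove surrounding whitespace and a leading ':' or '-' left over from the tag
    return text.strip().lstrip(":-").strip()
```

The display line would then become `st.write(clean_summary(response))`.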

Complete Source Code

import streamlit as st
import openai
from pytube import YouTube
import whisper

openai.organization = ""
openai.api_key = 'sk-yjfA0s****************1zOhM****lXM'

with st.container():
    st.header("Youtube Summary")
    st.title("Get the summary of any YouTube video in any language")


## input url of video ##

with st.container():
    st.write("---")
    text_input = st.text_input(
        "Please paste the url of the video 👇",
        placeholder="paste the url",                 # taking url of a YT video
    )

    if text_input:
        try: 
            with st.spinner('Wait for it...'):   ## streamlit loader
                tldr_tag = "\n\nTl;dr"         ## tag that tells the GPT engine where the text ends
                yt = YouTube(text_input)              ## pass the URL to pytube for downloading the audio
                stream = yt.streams.filter(only_audio=True).first()   ## pick an audio-only stream (YouTube serves no MP3 streams)
                stream.download(filename="audio.mp3")      ## download the audio and save it as audio.mp3 in the current folder
                model = whisper.load_model("base")         ## load whisper model
                result = model.transcribe("audio.mp3")     ## start transcribing video into text
                content = result["text"]                   ## store the text in the content variable
                st.write(content)
                response = openai.Completion.create(       ## call the API to summarize the transcribed text stored in content
                    engine="text-davinci-002",
                    prompt=content + tldr_tag,
                    temperature=0.3,
                    max_tokens=200,
                    top_p=1.0,
                    frequency_penalty=0,
                    presence_penalty=0,
                )
                st.subheader("Here is your summary!")
                st.write(response["choices"][0]["text"])   ## finally show the response text in the web app using Streamlit
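            st.success('Done!')
        except Exception as error:
            st.error(f"Something went wrong: {error}")   ## show the actual failure in the web app instead of the console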

I hope you like it. Please give me your feedback!