Karan Kulshestha

Using GPT-3 and Whisper to generate a Summary of a YouTube video of any language πŸ˜€

In this post, you'll learn how to use Whisper and GPT-3 to generate a short summary of a YouTube video in any language. You can see a demo video here.
Preview of the web app

Technologies Required

  • Python
  • OpenAI Whisper (speech-to-text)
  • OpenAI GPT-3 API (text-davinci-002)
  • Streamlit (for the web UI)
  • Pytube (to download the audio of a YouTube video)
  • FFmpeg (required by Whisper)

Setup Required

  • Install Whisper and Streamlit using these commands
pip install git+https://github.com/openai/whisper.git 
pip install streamlit
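The code below also imports pytube (to download the audio) and the openai Python client; if they aren't installed already, you can add them the same way:
pip install pytube openai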
  • Install FFmpeg
# on Ubuntu or Debian
sudo apt update && sudo apt install ffmpeg

# on Arch Linux
sudo pacman -S ffmpeg

# on MacOS using Homebrew (https://brew.sh/)
brew install ffmpeg

# on Windows using Chocolatey (https://chocolatey.org/)
choco install ffmpeg

# on Windows using Scoop (https://scoop.sh/)
scoop install ffmpeg 
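After installing, you can confirm FFmpeg is available on your PATH before moving on:
ffmpeg -version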

Start Writing Code

  • Start by importing the packages in your Python file
import streamlit as st
import openai
from pytube import YouTube 
import whisper
  • Set up the OpenAI API client
openai.organization = ""
openai.api_key = "YOUR_OPENAI_API_KEY"    # replace with your own key
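A safer alternative (my suggestion, not part of the original snippet) is to read the key from an environment variable so it never ends up in your source code:
import os
openai.api_key = os.environ.get("OPENAI_API_KEY")   ## set OPENAI_API_KEY in your shell before running the app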
  • Build the UI of the web app using Streamlit
with st.container():
    st.header("Youtube Summary")
    st.title("Get the summary of any YouTube video in any language")
  • Take the URL of the video and transcribe its audio using Whisper
text_input = st.text_input("Please paste the url of the video 👇")   ## take the video URL from the user
yt = YouTube(text_input)                   ## pass the input URL to pytube
stream = yt.streams.get_by_itag(139)       ## itag 139 is an audio-only stream
stream.download('', "audio.mp3")           ## download the audio as audio.mp3
model = whisper.load_model("base")         ## load the Whisper model
result = model.transcribe("audio.mp3")     ## transcribe the audio
content = result["text"]                   ## store the transcript text
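A note on the download step: hard-coding itag 139 works for many videos, but not every video exposes that exact stream. A rough sketch of a more robust variation (my own, using pytube's stream filters rather than the original code) picks the best available audio-only stream instead:
stream = yt.streams.filter(only_audio=True).order_by("abr").desc().first()   ## highest-bitrate audio-only stream
stream.download(filename="audio.mp3")                                        ## save it as audio.mp3
Also, Whisper's larger models ("small", "medium", "large") are slower but noticeably more accurate than "base", especially for non-English audio.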
  • Generate the summary of the transcription using the OpenAI API
tldr_tag = "\n\nTl;dr"                    ## appended so the GPT engine knows where the text ends
response = openai.Completion.create(      ## call the API to generate the summary
    engine="text-davinci-002",
    prompt=content + tldr_tag,
    temperature=0.3,
    max_tokens=200,
    top_p=1.0,
    frequency_penalty=0,
    presence_penalty=0,
)
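One caveat (my addition, not covered in the original post): text-davinci-002 has a limited context window, so the transcript of a long video may not fit into a single prompt. A minimal sketch of one way around that, splitting the transcript and summarizing it in pieces (chunk_text and summaries are hypothetical helpers, not part of the app above):
def chunk_text(text, max_chars=6000):
    ## naive whitespace splitting so each chunk stays under max_chars characters
    chunks, current = [], ""
    for word in text.split():
        if len(current) + len(word) + 1 > max_chars:
            chunks.append(current)
            current = word
        else:
            current = (current + " " + word).strip()
    if current:
        chunks.append(current)
    return chunks

summaries = []
for chunk in chunk_text(content):
    resp = openai.Completion.create(
        engine="text-davinci-002",
        prompt=chunk + tldr_tag,
        temperature=0.3,
        max_tokens=200,
    )
    summaries.append(resp["choices"][0]["text"].strip())
summary = " ".join(summaries)   ## combine the partial summaries into one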
  • Finally, display the result
st.subheader("Here is your summary!")
st.write(response["choices"][0]["text"])   ## display the generated summary in the web app

Complete Source Code

import streamlit as st
import openai
from pytube import YouTube 
import whisper

openai.organization = ""
openai.api_key = "YOUR_OPENAI_API_KEY"    ## replace with your own key

with st.container():
    st.header("Youtube Summary")
    st.title("Get the summary of any YouTube video in any language")


## input url of video ##

with st.container():
    st.write("---")
    text_input = st.text_input(
        "Please paste the url of the video πŸ‘‡",
        placeholder="paste the url",                 # taking url of a YT video
    )

    if text_input:
        try: 
            with st.spinner('Wait for it...'):   ## streamlit loader
                tldr_tag = "\n\nTl;dr"                     ## tag used to tell the GPT engine where the text ends
                yt = YouTube(text_input)                   ## pass the URL (text_input) to pytube to download the audio
                stream = yt.streams.get_by_itag(139)       ## itag 139 is an audio-only stream
                stream.download('', "audio.mp3")           ## download the audio and save it as audio.mp3 in the same folder
                model = whisper.load_model("base")         ## load the Whisper model
                result = model.transcribe("audio.mp3")     ## transcribe the audio into text
                content = result["text"]                   ## store the transcript in the content variable
                st.write(content)
                response = openai.Completion.create(       ## call the API to generate the summary of the transcribed text
                    engine="text-davinci-002",
                    prompt=content + tldr_tag,
                    temperature=0.3,
                    max_tokens=200,
                    top_p=1.0,
                    frequency_penalty=0,
                    presence_penalty=0,
                )
                st.subheader("Here is your summary!")
                st.write(response["choices"][0]["text"])   ## display the generated summary in the web app
            st.success('Done!')
        except Exception:
            st.error("Connection Error")                   ## surface the error in the app instead of only printing to the console

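To run the app locally, save the code in a file (I'll call it app.py here, but any name works) and start Streamlit from the same folder:
streamlit run app.py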

I hope you liked it. Please give me your feedback!

My GitHub: link
You can reach me at karankulx@gmail.com
