Karan Kulshestha

Using GPT-3 and Whisper to generate a Summary of a YouTube video of any language 😀

In this post, you'll learn how to use Whisper and GPT-3 to generate a short summary of a YouTube video in any language. You can see a demo video here.
preview of web-app

Technologies Required

Setup Required

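  • Install Whisper, Streamlit, pytube, and the OpenAI client using these commands
pip install git+https://github.com/openai/whisper.git
pip install streamlit pytube
pip install "openai<1.0"   # the openai.Completion call used below was removed in openai 1.0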
  • Install FFmpeg
# on Ubuntu or Debian
sudo apt update && sudo apt install ffmpeg

# on Arch Linux
sudo pacman -S ffmpeg

# on macOS using Homebrew (https://brew.sh/)
brew install ffmpeg

# on Windows using Chocolatey (https://chocolatey.org/)
choco install ffmpeg

# on Windows using Scoop (https://scoop.sh/)
scoop install ffmpeg 

Start Writing Code

  • Import the packages in your Python file
import streamlit as st
import openai
from pytube import YouTube 
import whisper
  • Set up the OpenAI API service
openai.organization = ""
openai.api_key = 'sk-*****kpeKyzuPRIT3Bl***************' # replace with your own key
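Hard-coding the key works for a quick demo, but it is easy to leak it when sharing the file. A minimal sketch of an alternative, assuming the key has been exported in an environment variable: the helper name `load_api_key` and the variable name `OPENAI_API_KEY` are illustrative choices here, not part of the original app.

```python
import os


def load_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in an environment variable.

    The variable name is an assumption; use whatever name you exported.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key


# Usage in the app (needs the openai package, so it is left as a comment):
# openai.api_key = load_api_key()
```

With this in place, the script itself no longer contains the secret and can be shared as-is.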
  • Build the UI of the web app using Streamlit
with st.container():
    st.header("Youtube Summary")
    st.title("Get the summary of any YouTube video in any language")
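  • Take the URL of a video and transcribe it using Whisper
text_input = st.text_input("Please paste the url of the video", placeholder="paste the url")
yt = YouTube(text_input)                   ## pass the input URL to pytube
stream = yt.streams.get_by_itag(139)       ## itag 139 is an audio-only stream
if stream is None:                         ## not every video offers itag 139
    stream = yt.streams.get_audio_only()   ## fall back to the default audio-only stream
stream.download(filename="audio.mp3")      ## download the audio as audio.mp3 in the current folder
model = whisper.load_model("base")         ## load the Whisper model
result = model.transcribe("audio.mp3")     ## start transcribing
content = result["text"]                   ## store the transcribed text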
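Besides `"text"`, the dictionary returned by `model.transcribe` also carries the detected `"language"` and a list of `"segments"` with start and end times in seconds. A small sketch of how those segments could be turned into timestamped lines, for example to show next to the summary. The helper name `format_segments` is hypothetical, and the sample dictionary below is made-up data shaped like a Whisper result, not real output.

```python
from typing import List


def format_segments(result: dict) -> List[str]:
    """Build '[mm:ss] text' lines from a Whisper-style result dictionary."""
    lines = []
    for segment in result.get("segments", []):
        minutes, seconds = divmod(int(segment["start"]), 60)
        lines.append(f"[{minutes:02d}:{seconds:02d}] {segment['text'].strip()}")
    return lines


# Made-up sample shaped like a Whisper result (assumption, for illustration only)
sample = {
    "language": "en",
    "text": " Hello world",
    "segments": [
        {"start": 0.0, "end": 2.5, "text": " Hello"},
        {"start": 65.2, "end": 70.0, "text": " world"},
    ],
}
for line in format_segments(sample):
    print(line)
```

In the app, `format_segments(result)` could be passed to `st.write` in the same way as `content`.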
  • Generate the summary of the transcription using the OpenAI API
tldr_tag = "\n\nTl;dr"                     ## tag that tells the GPT engine where the text ends
response = openai.Completion.create(       ## call the API to get a summary from the GPT engine
    engine="text-davinci-002",
    prompt=content + tldr_tag,
    temperature=0.3,
    max_tokens=200,
    top_p=1.0,
    frequency_penalty=0,
    presence_penalty=0,
)
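The whole transcript is sent as a single prompt, so a long video can exceed what the model accepts in one request. One possible workaround, sketched here under assumptions, is to split the transcript on word boundaries and summarize each piece separately. The helper name `split_transcript` and the 6000-character budget are illustrative guesses, not limits taken from the API documentation.

```python
from typing import List


def split_transcript(text: str, max_chars: int = 6000) -> List[str]:
    """Split text into chunks of at most max_chars, breaking only between words.

    A single word longer than max_chars is kept whole in its own chunk.
    """
    chunks: List[str] = []
    current: List[str] = []
    length = 0
    for word in text.split():
        extra = len(word) + (1 if current else 0)  # +1 for the joining space
        if current and length + extra > max_chars:
            chunks.append(" ".join(current))
            current, length = [], 0
            extra = len(word)
        current.append(word)
        length += extra
    if current:
        chunks.append(" ".join(current))
    return chunks


# Each chunk gets the same "Tl;dr" tag the post appends to the full transcript
prompts = [chunk + "\n\nTl;dr" for chunk in split_transcript("a b c d", max_chars=3)]
```

Each entry of `prompts` could then be passed to `openai.Completion.create` in turn, and the partial summaries joined or summarized once more.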
  • Finally, display the results
st.subheader("Here is your summary!")
st.write(response["choices"][0]["text"])   ## show the returned summary in the web app

Complete Source Code

import streamlit as st
import openai
from pytube import YouTube 
import whisper

openai.organization = ""
openai.api_key = 'sk-yjfA0s****************1zOhM****lXM'

with st.container():
    st.header("Youtube Summary")
    st.title("Get the summary of any YouTube video in any language")


## input url of video ##

with st.container():
    st.write("---")
    text_input = st.text_input(
        "Please paste the url of the video 👇",
        placeholder="paste the url",                 # take the URL of a YouTube video
    )

    if text_input:
        try:
            with st.spinner('Wait for it...'):             ## streamlit loader
                tldr_tag = "\n\nTl;dr"                     ## tag that tells the GPT engine where the text ends
                yt = YouTube(text_input)                   ## pass the URL to pytube to download the audio
                stream = yt.streams.get_by_itag(139)       ## itag 139 is an audio-only stream
                if stream is None:                         ## not every video offers itag 139
                    stream = yt.streams.get_audio_only()   ## fall back to the default audio-only stream
                stream.download(filename="audio.mp3")      ## download the audio as audio.mp3 in the current folder
                model = whisper.load_model("base")         ## load the Whisper model
                result = model.transcribe("audio.mp3")     ## transcribe the audio into text
                content = result["text"]                   ## store the text in the content variable
                st.write(content)
                response = openai.Completion.create(       ## call the API to summarize the transcribed text
                    engine="text-davinci-002",
                    prompt=content + tldr_tag,
                    temperature=0.3,
                    max_tokens=200,
                    top_p=1.0,
                    frequency_penalty=0,
                    presence_penalty=0,
                )
                st.subheader("Here is your summary!")
                st.write(response["choices"][0]["text"])   ## show the returned summary in the web app
            st.success('Done!')
        except Exception as error:                         ## the download, the transcription, or the API call can fail
            st.error(f"Something went wrong: {error}")     ## show the error in the web app instead of the console
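The app hands whatever is pasted into the text box straight to pytube. A rough sketch of a pre-check that could run first, using only the standard library. The helper name `extract_video_id` and the host list are assumptions for illustration, and the check only covers `watch?v=` and `youtu.be` links, not every URL shape YouTube uses.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Hosts accepted by this sketch (an assumption, not an exhaustive list)
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def extract_video_id(url: str) -> Optional[str]:
    """Return the video id from a YouTube URL, or None if it does not look like one."""
    parsed = urlparse(url.strip())
    host = (parsed.hostname or "").lower()
    if host not in YOUTUBE_HOSTS:
        return None
    if host == "youtu.be":
        video_id = parsed.path.lstrip("/")          # short links keep the id in the path
    else:
        video_id = parse_qs(parsed.query).get("v", [""])[0]  # watch?v=<id>
    return video_id or None
```

In the app, a `None` result could trigger a Streamlit warning before any download is attempted.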


I hope you like it. Please give me your feedback.

My GitHub : link
You can connect with me here: karankulx@gmail.com
