In the vast ocean of content that is YouTube, the ability to quickly and accurately summarize video content is not just a luxury, but a necessity for many users and businesses.
Whether you're a content creator wanting to provide concise summaries for your audience, a researcher looking to extract key information, or a business aiming to analyze video content at scale, having the right tools and techniques is crucial.
This guide delves deep into the world of YouTube video summarization, harnessing the power of cutting-edge technologies: Deepgram for superior audio transcription, Langchain for orchestrating large language models (LLMs), and Mistral 7B, a state-of-the-art open-source LLM.
Together, these technologies form a formidable trio, enabling us to extract, process, and summarize video content with unparalleled accuracy and efficiency.
In this tutorial, you'll construct a fully functional Streamlit application from the ground up. Streamlit lets you turn simple data scripts into web applications without traditional front-end tools. This application will be capable of downloading audio from any YouTube video, transcribing it using Deepgram, and then summarizing the content with the assistance of Mistral 7B, all streamlined through the capabilities of Langchain.
You can deploy and preview the YouTube Summarization application from this guide using the Deploy to Koyeb button below:
Note: Remember to use a larger instance for faster processing times and to replace the value of the `DEEPGRAM_API_KEY` environment variable with your own information (as described in the section on integrating Deepgram).
Requirements
To successfully develop this application, you will need the following:
- Python installed on your machine (this guide uses version 3.11)
- A Koyeb account to deploy the application
- A Deepgram account for using their API. Deepgram offers new accounts a $200 credit, no credit card required.
- A GitHub account to store your application code and trigger deployments
Steps
- Install and set up Streamlit
- Download audio with yt-dlp
- Transcribe audio with Deepgram
- Summarize transcript with Langchain and Mistral 7B
- Combine everything in the Streamlit app
- Deploy to Koyeb
Install and Set Up Streamlit
First, start by creating a new project. You should use `venv` to keep your Python dependencies organized in a virtual environment.
Create a new project/folder locally on your computer with:
# Create and move to the new folder
mkdir YouTubeSummarizer
cd YouTubeSummarizer
# Create a virtual environment
python -m venv venv
# Activate the virtual environment (Windows)
.\venv\Scripts\activate.bat
# Activate the virtual environment (Linux and MacOS)
source ./venv/bin/activate
To install Streamlit, you just need to run the pip command:
pip install streamlit
A Streamlit application can consist of more than one page, but in this case, you will create a single-page application.
Streamlit allows you to design the page by adding different components to that page. The components can be text, like headings and sub-headings, but also objects like input widgets and download buttons.
To start your Streamlit application, create a new file called `main.py` with the following initial contents:
import streamlit as st
# Set page title
st.set_page_config(page_title="YouTube Video Summarization", page_icon="📜", layout="wide")
# Set title
st.title("YouTube Video Summarization", anchor=False)
st.header("Summarize YouTube videos with AI", anchor=False)
This code snippet uses the Streamlit library to set up the web application interface:
- The code starts by importing the Streamlit library, which is aliased as `st` for easier reference throughout the code.
- Next, the `st.set_page_config` function is called to configure the page settings. The page title is set to "YouTube Video Summarization", a scroll emoji is set as the page icon, and the layout is set to "wide", which provides a wider layout compared to the default.
- Following the page configuration, the `st.title` and `st.header` functions are called to set the main title and header of the page, respectively. Additionally, the `anchor` parameter is set to `False` in both functions, indicating that there won't be HTML anchor links associated with the title and header.
This setup lays down the basic structure and design of the web page for the YouTube Video Summarization application, creating a user-friendly interface.
You can run the application with:
streamlit run main.py
You will add more elements to this code when you build the final application in the last section, but for now you have the initial basic Streamlit application:
Download audio with yt-dlp
To download audio from YouTube videos, you'll utilize the widely used yt-dlp library, which can be installed using the pip command as follows:
pip install yt-dlp
Now, let's proceed to craft the download logic to retrieve the audio file from a YouTube video.
For better organization and to keep the logic separate, you'll place this logic in a new file. Go ahead and create a file named `download.py`:
from yt_dlp import YoutubeDL


def download_audio_from_url(url):
    videoinfo = YoutubeDL().extract_info(url=url, download=False)
    length = videoinfo['duration']
    filename = f"./audio/youtube/{videoinfo['id']}.mp3"

    options = {
        'format': 'bestaudio/best',
        'keepvideo': False,
        'outtmpl': filename,
    }

    with YoutubeDL(options) as ydl:
        ydl.download([videoinfo['webpage_url']])

    return filename, length


# Testing by running this file
if __name__ == "__main__":
    url = "https://www.youtube.com/watch?v=q_eMJiOPZMU"
    filename, length = download_audio_from_url(url)
    print(f"Audio file: {filename} with length {length} seconds")
    print("Done!")
Here’s a breakdown of what each section of the code does:
- The `YoutubeDL` class is imported from the `yt_dlp` library. It is used to interact with and download video/audio from YouTube.
- An instance of `YoutubeDL` is created and its `extract_info` method is called with the `url` argument to gather information about the video without downloading it (`download=False`). The information is stored in the `videoinfo` variable.
- The `filename` variable constructs a file path where the audio will be saved, using the video's unique id and an `.mp3` extension.
- An `options` dictionary is created to specify the download options:
  - `'format': 'bestaudio/best'` specifies to download the best audio quality available.
  - `'keepvideo': False` indicates that the video portion should not be kept after downloading.
  - `'outtmpl': filename` specifies the output template for the filename.
- A new `YoutubeDL` instance is created with the `options` dictionary as the argument, and within a `with` block, its `download` method is called with the video URL to download the audio.
This function encapsulates the process of downloading audio from a YouTube video, preparing the necessary file path and download options, and utilizing the `yt_dlp` library to perform the audio download.
You can test the audio download with the sample URL provided in the code or you can use your own. Run the file with:
python download.py
A similar output to this will be shown:
[youtube] Extracting URL: https://www.youtube.com/watch?v=q_eMJiOPZMU
[youtube] q_eMJiOPZMU: Downloading webpage
[...]
[download] Destination: audio\youtube\q_eMJiOPZMU.mp3
[download] 100% of 3.09MiB in 00:00:00 at 16.97MiB/s
Audio file: ./audio/youtube/q_eMJiOPZMU.mp3 with length 242 seconds
Done!
You can check the audio file at the path mentioned in the output.
Transcribe audio with Deepgram
To summarize the video, a transcript is required. A transcript is a text document containing everything spoken in an audio or video recording, making the recording's content readable in written form.
For transcribing, you'll use the remarkable Deepgram library. You can register for free and receive a $200 credit without providing credit card details.
Post registration, you'll gain access to the Dashboard where you can generate the necessary API key:
You can define your API key with these options (you can give it a different name if you would like):
Then just click on the 'Create Key' button to create your API key and you will see the newly generated key:
Make sure to copy this key to a safe place, like a text file. You will need it later on.
To keep your API key safe and not exposed in the code, create a `.env` file where you will keep any necessary secrets:
DEEPGRAM_API_KEY=<your key here>
To use the keys from the `.env` file, you will use the `python-decouple` library, which can be installed with:
pip install python-decouple
You should also install the Deepgram SDK:
pip install deepgram-sdk
Now you have all the necessary configuration to write the transcribe functionality.
Create a new file called `transcribe.py` where you will place this logic:
from decouple import config
from deepgram import Deepgram

DEEPGRAM_API_KEY = config('DEEPGRAM_API_KEY')


def transcribe_audio(filename):
    dg_client = Deepgram(DEEPGRAM_API_KEY)

    with open(filename, 'rb') as audio:
        source = {'buffer': audio, 'mimetype': 'audio/mp3'}
        response = dg_client.transcription.sync_prerecorded(source,
                                                            model='nova-2-ea',
                                                            smart_format=True)
        transcript = response['results']['channels'][0]['alternatives'][0]['transcript']

    return transcript
Here's a breakdown of each part of the code:
- The function `transcribe_audio` is defined with a parameter `filename`, which is expected to be the path to the audio file to be transcribed.
- An instance of the Deepgram client named `dg_client` is created, using `DEEPGRAM_API_KEY` as the authentication key, which comes from the `.env` file.
- The `with open(filename, 'rb') as audio:` line uses a context manager to open the specified audio file in binary read mode (`'rb'`). This ensures that the file is automatically closed after use.
- A dictionary named `source` is created to hold the audio buffer and its mime type. The `buffer` key holds the audio file object, and the `mimetype` key specifies that the audio file is in MP3 format.
- A transcription request is made to Deepgram using the `sync_prerecorded` method of the `dg_client` object. The method is passed several arguments:
  - `source`: The `source` dictionary containing the audio buffer and mime type.
  - `model`: Specifies the model to be used for transcription, in this case `nova-2-ea`.
  - `smart_format`: When set to `True`, this option enables smart formatting of the transcript.
- The transcript text is extracted from the response dictionary by navigating through its nested structure: `response['results']['channels'][0]['alternatives'][0]['transcript']`.
This function encapsulates the process of transcribing an audio file using the Deepgram API, extracting the transcript text from the response, and returning it.
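If you want to verify this step on its own, you can optionally add a small test block at the bottom of `transcribe.py`, mirroring the one in `download.py`. This is just a sketch; the file path below comes from the earlier download example, so adjust it to point at an audio file you have actually downloaded:

# Testing by running this file
if __name__ == "__main__":
    # Path produced by the download step in this guide; replace with your own file
    transcript = transcribe_audio("./audio/youtube/q_eMJiOPZMU.mp3")
    print(transcript)

Run it with `python transcribe.py` and the transcript text should be printed to the console.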
Summarize transcript with Langchain and Mistral 7B
So far you have downloaded an audio file from YouTube and created a transcript. Now it is time to summarize that transcript so that the summary captures the most important points of the original video.
For this, you will use the Langchain library, a powerful toolkit for orchestrating language models. At the core of this orchestration is the Map-Reduce pattern, powered by Langchain's `MapReduceDocumentsChain`, to process the transcript in a systematic and scalable manner.
It loads the transcript and then segments it into manageable chunks, which are individually summarized (mapped) before being consolidated into a final summary (reduced).
You will also leverage the prowess of Mistral 7B, a state-of-the-art language model, orchestrated through a series of chains and templates provided by Langchain.
- In the use of the Langchain library, "chains" refer to a sequence of processing steps where the output from one step is used as the input for the next, facilitating complex text processing tasks like summarization through an orchestrated flow.
- On the other hand, "templates" are predefined textual frameworks called "PromptTemplates" which structure the prompts given to language models, incorporating placeholders and specific instructions that mold the model's outputs towards a targeted outcome, ensuring consistency and direction in the tasks being executed.
The final result is a well-organized, consolidated summary of the main points from the transcript, achieved with a high degree of efficiency and accuracy.
Start by installing the Langchain library, transformers, and ctransformers, as usual with a pip command:
pip install langchain ctransformers transformers
The `transformers` library is an open-source, state-of-the-art machine-learning library developed by Hugging Face. It provides a vast collection of pre-trained models specialized in various NLP tasks, including text classification, summarization, translation, and question-answering.
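To make the "chains" and "templates" described above concrete before assembling the full pipeline, here is a minimal, standalone sketch. It is not part of the application code: the prompt text and example input are made up for illustration, and it reuses the same CTransformers-backed Mistral 7B model that the summarization code below loads.

from langchain.chains import LLMChain
from langchain.llms import CTransformers
from langchain.prompts import PromptTemplate

# Load Mistral 7B through the CTransformers wrapper (downloaded automatically on first use)
llm = CTransformers(model="TheBloke/Mistral-7B-Instruct-v0.1-GGUF",
                    model_file="mistral-7b-instruct-v0.1.Q4_K_M.gguf")

# A template with a single {text} placeholder...
prompt = PromptTemplate.from_template(
    "<s>[INST] Summarize the following text in one sentence:\n{text}\nAnswer: [/INST] </s>"
)

# ...and a chain that fills the placeholder and sends the resulting prompt to the LLM
chain = LLMChain(llm=llm, prompt=prompt)
print(chain.run(text="Streamlit lets you turn simple Python scripts into web applications."))

The map and reduce chains below follow exactly this pattern, just with prompts tailored to summarizing transcript chunks and consolidating the partial summaries.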
Now you can create a new file called `summarize.py` that will contain the summarization logic:
import os
import time
from langchain.chains import MapReduceDocumentsChain, LLMChain, ReduceDocumentsChain, StuffDocumentsChain
from langchain.llms import CTransformers
from langchain.prompts import PromptTemplate
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import (TextLoader)
# (... code continues in following snippets)
First, you start with the imports: the necessary chains, the prompt template, the text splitter, and the text loader used to read the transcript.
To run Mistral 7B, you will use Langchain's CTransformers wrapper around the ctransformers library you installed earlier.
Next, you expand the file to load the transcript and LLM (Mistral 7B):
# summarize.py
# (... previous code ...)

def summarize_transcript(filename):
    # Load transcript
    loader = TextLoader(filename)
    docs = loader.load()

    # Load LLM
    config = {'max_new_tokens': 4096, 'temperature': 0.7, 'context_length': 4096}
    llm = CTransformers(model="TheBloke/Mistral-7B-Instruct-v0.1-GGUF",
                        model_file="mistral-7b-instruct-v0.1.Q4_K_M.gguf",
                        config=config,
                        threads=os.cpu_count())
Here is the breakdown of the code:
- A function named `summarize_transcript` is defined with `filename` as its argument, which is intended to hold the path to the transcript file.
- An instance of `TextLoader` is created and the `load` method is called to load the contents of the transcript file into a variable named `docs`.
- A language model instance (`CTransformers`) is initialized using a specific model and configuration, which is stored in a variable named `llm`.
- The `config` variable holds the LLM configuration:
  - `max_new_tokens`: This specifies the maximum number of tokens that the language model is allowed to generate in a single invocation. In this case, the model should not generate more than 4096 tokens.
  - `temperature`: This controls the randomness of the output generated by the language model. A temperature closer to 0 makes the model more deterministic, favoring more likely outcomes. A higher temperature leads to more diversity but can also result in less coherent outputs. A value of `0.7` suggests a balance between randomness and determinism.
  - `context_length`: This indicates the number of tokens from the input that the model should consider when generating new content or making predictions. In this case, the model takes into account up to 4096 tokens of the provided input data to understand the context before generating any output.
For the LLM you will use a version of Mistral 7B from TheBloke, which is optimized to run on the CPU. When the code is run, the model will be automatically downloaded.
The next step is to build the prompt templates that will generate the prompts that run at the different stages of the chain:
# summarize.py
# (... previous function code ...)

    # Map template and chain
    map_template = """<s>[INST] The following is a part of a transcript:
    {docs}
    Based on this, please identify the main points.
    Answer: [/INST] </s>"""
    map_prompt = PromptTemplate.from_template(map_template)
    map_chain = LLMChain(llm=llm, prompt=map_prompt)

    # Reduce template and chain
    reduce_template = """<s>[INST] The following is a set of summaries from the transcript:
    {doc_summaries}
    Take these and distill them into a final, consolidated summary of the main points.
    Construct it as a well-organized summary of the main points, between 3 and 5 paragraphs.
    Answer: [/INST] </s>"""
    reduce_prompt = PromptTemplate.from_template(reduce_template)
    reduce_chain = LLMChain(llm=llm, prompt=reduce_prompt)
This portion of the code is focused on setting up the templates and chains for the Map and Reduce phases of the summarization process using the Langchain library. Here's a breakdown.
For the `map_template` and its chain:
- A string `map_template` is defined to hold the template for mapping. The template is structured to instruct a language model to identify the main points from a portion of the transcript represented by `{docs}`.
- `PromptTemplate.from_template(map_template)` is then called to convert the string template into a `PromptTemplate` object, which is stored in `map_prompt`.
- An instance of `LLMChain` is created with the `llm` object (representing the language model) and `map_prompt` as arguments, and is stored in `map_chain`. This chain will be used to process individual chunks of the transcript and identify the main points from each chunk.
For the `reduce_template` and its chain:
- A string `reduce_template` is defined to hold the template for reducing. The template is structured to instruct a language model to consolidate a set of summaries (represented by `{doc_summaries}`) into a final, organized summary of main points that should span between 3 and 5 paragraphs.
- `PromptTemplate.from_template(reduce_template)` is called to convert the string template into a `PromptTemplate` object, which is stored in `reduce_prompt`.
- An instance of `LLMChain` is created with the `llm` object and `reduce_prompt` as arguments, and is stored in `reduce_chain`. This chain will be used to process the set of summaries generated from the map phase and consolidate them into a final summary.
With the different templates and individual chain components prepared, the next step is to start building the chain itself:
# summarize.py
# (... previous function code ...)

    # Takes a list of documents, combines them into a single string, and passes this to an LLMChain
    combine_documents_chain = StuffDocumentsChain(
        llm_chain=reduce_chain, document_variable_name="doc_summaries"
    )

    # Combines and iteratively reduces the mapped documents
    reduce_documents_chain = ReduceDocumentsChain(
        # This is the final chain that is called
        combine_documents_chain=combine_documents_chain,
        # If documents exceed context for `StuffDocumentsChain`
        collapse_documents_chain=combine_documents_chain,
        # The maximum number of tokens to group documents into
        token_max=4000,
    )

    # Combining documents by mapping a chain over them, then combining results
    map_reduce_chain = MapReduceDocumentsChain(
        # Map chain
        llm_chain=map_chain,
        # Reduce chain
        reduce_documents_chain=reduce_documents_chain,
        # The variable name in the llm_chain to put the documents in
        document_variable_name="docs",
        # Return the results of the map steps in the output
        return_intermediate_steps=True,
    )
This code block is dedicated to setting up the various chains that orchestrate the process of summarizing documents. These chains are designed to work together, with each serving a specific role in the summarization pipeline. Here's a detailed breakdown.
Combining documents chain:

- `StuffDocumentsChain`: This chain is initiated with `llm_chain` set to `reduce_chain` and `document_variable_name` set to "doc_summaries". Its purpose is to take a list of documents, combine them into a single string, and pass this combined string to the `reduce_chain`. This is stored in the variable `combine_documents_chain`.
Reducing documents chain:

- `ReduceDocumentsChain`: This chain is initiated with three arguments:
  - `combine_documents_chain`: Refers to the previously defined `combine_documents_chain`.
  - `collapse_documents_chain`: Also refers to `combine_documents_chain`, indicating that if the documents exceed the context length, they should be passed to the `combine_documents_chain` to be collapsed or combined further.
  - `token_max`: Set to 4000, this argument specifies the maximum number of tokens to group documents into before passing them to the `combine_documents_chain`.
- This chain, stored in the variable `reduce_documents_chain`, is designed to manage the iterative process of reducing or summarizing the documents further.
The map-reduce chain:

- `MapReduceDocumentsChain`: This is a more complex chain that orchestrates the map-reduce process for summarizing documents. It's initiated with several arguments:
  - `llm_chain`: Refers to the previously defined `map_chain`, which is used for mapping or summarizing individual chunks of documents.
  - `reduce_documents_chain`: Refers to the `reduce_documents_chain`, which is used for reducing or summarizing the mapped documents further.
  - `document_variable_name`: Set to "docs", this argument specifies the variable name in the `llm_chain` to put the documents in.
  - `return_intermediate_steps`: Set to `True`, this argument specifies that the results of the map steps should be returned in the output.
- This chain, stored in the variable `map_reduce_chain`, orchestrates the overall map-reduce process of summarizing documents by first mapping a chain over them to summarize individual chunks, and then reducing or consolidating these summaries further.
The final step in the `summarize.py` file and the summarization function is to actually run the complete chain:
# summarize.py
# (... previous function code ...)

    # Split documents into chunks
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=4000, chunk_overlap=0
    )
    split_docs = text_splitter.split_documents(docs)

    # Run the chain
    start_time = time.time()
    result = map_reduce_chain.__call__(split_docs, return_only_outputs=True)
    print(f"Time taken: {time.time() - start_time} seconds")

    return result['output_text']
This section of the code is focused on preparing the documents for processing and executing the map-reduce summarization chain. Here's a breakdown:
Splitting documents:

- An instance of `RecursiveCharacterTextSplitter` is created, named `text_splitter`, with a `chunk_size` of 4000 and a `chunk_overlap` of 0. This object is designed to split documents into smaller chunks based on character count.
- The `split_documents` method of `text_splitter` is then called with the `docs` variable (which contains the loaded transcript) as the argument. This splits the transcript into smaller chunks, and the result is stored in the `split_docs` variable.
Summarization chain:

- The `__call__` method of `map_reduce_chain` is executed with `split_docs` and `return_only_outputs=True` as arguments. This triggers the map-reduce summarization process on the split documents. The `return_only_outputs=True` argument indicates that only the final outputs of the chain should be returned.
Measuring LLM execution time:

- `time.time()` is called again at the end, and the difference between this time and `start_time` is calculated to determine the time taken to run the summarization chain.
This completes the `summarize.py` file. It may seem quite complex, but the basic principle is to prepare the prompts, prepare the map-reduce chains, and execute the final chain, splitting documents when necessary.
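As with `download.py`, you can optionally add a small test block at the bottom of `summarize.py` to try the function on its own. This sketch assumes you already have a transcript saved to a text file; the filename `transcript.txt` is just an example, matching the file the Streamlit app writes later:

# Testing by running this file
if __name__ == "__main__":
    summary = summarize_transcript("transcript.txt")
    print(summary)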
Combine everything in the Streamlit app
With all the necessary components built for the different stages of the flow, you can now focus on creating the user interface to receive a YouTube URL, process it while displaying the status of each stage, and finally show the resulting summary.
The final step in building this summarization application is to add the necessary widgets to finish the Streamlit application that you started in the first step.
You can complete the `main.py` file:
# main.py
# (... previous imports ...)
from download import download_audio_from_url
from summarize import summarize_transcript
from transcribe import transcribe_audio

# (... code continuation ...)

# Input URL
st.divider()
url = st.text_input("Enter YouTube URL", value="")

# Download audio
st.divider()
if url:
    with st.status("Processing...", state="running", expanded=True) as status:
        st.write("Downloading audio file from YouTube...")
        audio_file, length = download_audio_from_url(url)

        st.write("Transcribing audio file...")
        transcript = transcribe_audio(audio_file)

        st.write("Summarizing transcript...")
        with open("transcript.txt", "w") as f:
            f.write(transcript)
        summary = summarize_transcript("transcript.txt")

        status.update(label="Finished", state="complete")

    # Play Audio
    st.divider()
    st.audio(audio_file, format='audio/mp3')

    # Show Summary
    st.subheader("Summary:", anchor=False)
    st.write(summary)
This script is constructed to handle user input, download audio from a specified YouTube URL, transcribe the audio, summarize the transcript, and display the result using the Streamlit framework. Here’s a detailed breakdown:
Input URL:

- A visual divider is created using `st.divider()`.
- The `st.text_input` function creates a text input box for the user to enter a YouTube URL. The text "Enter YouTube URL" is displayed as the label.
Processing URL:

- Within the `if` statement, a status indicator is initiated using `st.status` with the message "Processing...", and the following steps are encapsulated within this status indicator:
  - The `download_audio_from_url` function is called with the URL as an argument, and the returned audio file path and length are stored in `audio_file` and `length`, respectively.
  - Then the `transcribe_audio` function is called with `audio_file` as the argument, storing the returned transcript in `transcript`.
  - The transcript is written to a file named "transcript.txt", and then the `summarize_transcript` function is called with "transcript.txt" as the argument, storing the returned summary in `summary`.
  - The status indicator is updated to "Finished" using `status.update`.
Summary:

- The `st.audio` function is used to create an audio player widget that allows the user to play the downloaded audio file, specifying the format as 'audio/mp3'.
- The summary is displayed below the subheader using `st.write(summary)`.
This concludes your YouTube summarization application. You can now run it with:
streamlit run main.py
You should see something similar to this:
(Demo video of the application: https://www.youtube.com/embed/esnASX1nxnM)
Deploy to Koyeb
You now have the application running locally, but depending on your CPU, the summarization step can take some time. Deploy it on Koyeb to take advantage of the processing power offered by high-performance microVMs.
Create a repository on your GitHub account called `YouTubeSummarizer`.
Then create a `.gitignore` file in your local directory to exclude some folders and files from being pushed to the repository:
# PyCharm files
.idea
# Audio folder
audio
# Python virtual environment
venv
# Environment variables
.env
# Transcripts text file
transcript.txt
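Before pushing, make sure the repository also contains a `requirements.txt` file listing the project's dependencies; the buildpack builder used in the deployment step typically relies on it to detect a Python application and install its packages. Assuming your virtual environment contains only the packages installed in this guide, you can generate it with:

pip freeze > requirements.txt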
Run the following commands in your terminal to commit and push your code to the repository:
echo "# YouTubeSummarizer" >> README.md
git init
git add .
git commit -m "first commit"
git branch -M main
git remote add origin [Your GitHub repository URL]
git push -u origin main
You should now have all your local code in your remote repository. Now it is time to deploy the application.
Within the Koyeb control panel, while on the Overview tab, initiate the app creation and deployment process by clicking Create App. On the App deployment page:
- Select GitHub as your deployment method.
- Choose the repository that you created earlier, for example `YouTubeSummarizer`.
- Select Buildpack as your builder option.
- Click Build and deploy settings to configure your Run command: select `Override` and add the same command you used to run the application locally: `streamlit run main.py`.
- In the Instance selection, click "XLarge". This instance type provides a good balance between performance and cost.
- Select the regions where you want this service to run.
- Click Advanced to view additional settings.
- Set the port to `8501`. This is the port Streamlit uses to serve your application.
- Click the Add Variable button to add your Deepgram API key as an environment variable named `DEEPGRAM_API_KEY`.
- Set the App name to your choice. Keep in mind it will be used to create the URL for your application.
- Finally, click the Deploy button.
Your application will start to deploy. Note that the first time you run the application, it will take longer because it needs to download the Mistral 7B model. Subsequent runs will be much faster.
After the deployment process is complete, you can access your app by clicking the application URL.
Conclusion
By following this guide, you have created a sophisticated tool for summarizing YouTube video content. You've seen firsthand how potent tools like Deepgram, Langchain, and Mistral 7B can be used together to create a Streamlit application that not only simplifies but also amplifies the value of video content.
Deploying this application to Koyeb enables you to harness the power of high-performance microVMs, ensuring that your application runs smoothly in the regions where your users are located.
Now that you have a working application, you can think of ways to expand it, whether for enhancing personal productivity, advancing research, or providing cutting-edge business solutions.