
YouTube Chatbot using LangChain and OpenAI

Unlock the power to interact with YouTube videos from your command line

These days, creating a chatbot based on a YouTube video’s content is easy work. How cool is that! Just a few months ago, before the release of powerful LLMs like OpenAI’s ChatGPT, it would have sounded impossible, or at least very hard. These days … just a few lines of code.

So, how do we do it?

In this article, we explore LangChain’s ability to

access a YouTube video’s transcript
store and categorise the transcript’s content
interact with the content chatbot style using OpenAI’s ChatGPT API

We start by seeing how easy it is to interact with YouTube videos using just a few lines of code and some of LangChain’s higher-level wrapper classes.

Then we do a deep dive into what is happening behind the scenes with these wrappers and build the YouTube chatbot again using lower-level LangChain components. This will help give us a better idea of how it all works and provide us with more flexible code that will be easier to customise to our needs.

Contents

YouTube Chatbots in the Wild

Coding Time - a quick script using high-level LangChain wrappers

Digging Deeper - using lower-level LangChain components

Summary

YouTube Chatbots in the Wild

People are using these concepts out in the wild these days. Here is just a small sample of some of those ideas and projects:

summate, which provides weekly summaries of your favourite YouTube content
ChatGPT for YouTube, a Chrome extension that lets you interact with YouTube videos
Youtube-to-Chatbot, a Python project that lets you create a chatbot to interact with YouTube videos. It’s made by the same person who brought us TopGPT

OK, looks cool. Let’s get started building our own projects!

Coding Time

A quick script using high-level LangChain wrappers

Interacting with YouTube videos chatbot style is super easy to do. We can get the whole thing done in LangChain using just a few lines of code.

All the code for the scripts below is available on GitHub, in case you want to check it out and start running the scripts immediately. Remember to read the Readme file and set up your OpenAI API key in the .env file correctly.

git clone https://github.com/parmarjh/youtube-chatbot.git
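
If you go down the repository route, loading the key from the .env file is typically just a couple of lines at the top of the script. A minimal sketch, assuming the project uses the python-dotenv package and a .env file containing your key (check the Readme for the exact setup):

# .env file in the project root:
# OPENAI_API_KEY=sk-XXX

from dotenv import load_dotenv

load_dotenv()  # reads .env and exposes OPENAI_API_KEY as an environment variable for LangChain/OpenAI to pick up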

Otherwise, let’s get started building up our first script. First, we’ll create the directory, script file, and kick off our virtual environment.

mkdir youtube-chatbot
cd youtube-chatbot
touch simple-chatbot.py
python3 -m venv .venv
. .venv/bin/activate

And now, let’s install our dependencies. With these in place, the script itself is just a few lines of code. Make sure to update the script with your OpenAI API key, too.

pip install langchain==0.0.202
pip install youtube-transcript-api==0.6.1
pip install openai==0.27.8
pip install chromadb==0.3.26
pip install tiktoken==0.4.0
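
Alternatively, you can pin the same versions in a requirements.txt file (a hypothetical file matching the versions above) and install everything in one go:

# requirements.txt
langchain==0.0.202
youtube-transcript-api==0.6.1
openai==0.27.8
chromadb==0.3.26
tiktoken==0.4.0

pip install -r requirements.txt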

import os
import sys
from langchain.document_loaders import YoutubeLoader
from langchain.indexes import VectorstoreIndexCreator

os.environ["OPENAI_API_KEY"] = "sk-XXX"

video_id = sys.argv[1]

# load the video transcript as a LangChain Document
loader = YoutubeLoader(video_id)
docs = loader.load()

# split, embed, and store the transcript, ready for querying
index = VectorstoreIndexCreator()
index = index.from_documents(docs)

response = index.query("Summarise the video in 3 bullet points")
print(f"Answer: {response}")

Notice how many dependencies there are, even though it’s just a few lines of code. As we mentioned at the start, we are using some high-level wrapper classes here, which let us get the job done quickly while doing a lot of work under the hood. In the next section, we will explore what is happening under the hood a little more.

Now, you can run the script to interact with a video by passing in its video ID. The video ID is the part after the v= in the YouTube URL.
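
If you only have the full URL, a quick way to pull the ID out (a small hypothetical helper, not part of the script) looks like this:

from urllib.parse import urlparse, parse_qs

def extract_video_id(url: str) -> str:
    # for standard URLs like https://www.youtube.com/watch?v=9fCi_YN-z6E,
    # the video ID is the value of the v= query parameter
    return parse_qs(urlparse(url).query)["v"][0]

print(extract_video_id("https://www.youtube.com/watch?v=9fCi_YN-z6E"))  # 9fCi_YN-z6E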

For example, let’s get a summary of the Alex Hormozi video on dropshipping with the following command:

python3 simple-chatbot.py 9fCi_YN-z6E

Answer:

  1. Provide digital products such as ebooks and video courses to increase perceived value.
  2. Offer incentives to customers who leave reviews and use techniques to boost the visibility of positive reviews.
  3. Offer bonuses such as training videos to help customers use the product in the best way.

All right, sweet! That was easy. This is so easy to do because LangChain provides the high-level YoutubeLoader and VectorstoreIndexCreator classes, which do quite a lot of work under the hood.

Digging Deeper

Using lower-level LangChain components

But what are the YoutubeLoader and VectorstoreIndexCreator classes actually doing under the hood? Let’s dig in a little and find out. This is useful if we want to build more complicated applications or products where we want to customise things a little more.

So, here is a list of what is actually going on:

Load the transcript of the video from YouTube
Split the transcript into lots of little text chunks, semantically categorise them (using embeddings), and store the categorised chunks into a vector store
Create a Q&A chain that lets us interact with OpenAI and can use the vector store as a source for context
Interact with the Q&A chain. First, we pass our question to it. On receiving our question, it retrieves relevant bits of information from our vector store, uses that as the context in our prompt, and sends the context and question in a prompt to OpenAI, which returns our response.

That is quite a lot going on behind the scenes! The process looks something like this:
video transcript → text chunks → vector store; then: user question + related chunks from the vector store → prompt → OpenAI → response

If this looks a little confusing, or some of the concepts like text chunking, embeddings, vector stores, and passing related text chunks to a prompt context sound a bit foreign, I wrote an article about building a multi-document chatbot that goes into these concepts in much more detail. It introduces these ideas much more gradually, so I would definitely recommend checking it out if you are still trying to get familiar with these concepts.

And that article is so closely related because a document-reader chatbot and a YouTube video chatbot are essentially the same thing. The only thing we are changing is the source from which we get our content. Once we have converted the content source into LangChain Documents (i.e., our text chunks in the diagram above) using a Document Loader, the entire process from then on is the same.

LangChain actually allows us to grab content from lots of different sources, including Twitter feeds, Notion docs, databases, and much more. It calls these Document Loaders, and in our case, we are leveraging its YouTube document loader.
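
To see how interchangeable the sources are, here is a minimal sketch of swapping in a different loader (LangChain’s TextLoader reading a hypothetical local file); everything after this step of the pipeline stays exactly the same:

from langchain.document_loaders import TextLoader

# any Document Loader that returns LangChain Documents can feed the same pipeline
loader = TextLoader("my_notes.txt")  # hypothetical local text file instead of a YouTube video
docs = loader.load()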

OK. Now, let’s have some fun and try and build the YouTube chatbot using some of the lower-level classes from LangChain again. We will discover that this will also give us more flexibility as we can customise the code to suit our needs more.

To find out what to do, we can just inspect the actual code inside the YoutubeLoader and VectorstoreIndexCreator classes.

As always, I recommend downloading the LangChain source code and digging around to see how it works.

git clone https://github.com/hwchase17/langchain.git

Let’s start with the YoutubeLoader class.

So, first, we are creating an instance of the YoutubeLoader, and then calling its load method. Here’s how to do that:

loader = YoutubeLoader(video_id)
docs = loader.load()

If we look at the definition of the class and method inside the LangChain repository, we see the following:


class YoutubeLoader(BaseLoader):
    def load(self) -> List[Document]:

What it is doing is pulling the entire transcript from YouTube, using the video_id we passed in, converting the entire transcript into a single Document, and returning an array (List) with a single element: our Document.
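
We can see this for ourselves by poking at what load returned earlier; for example:

print(len(docs))                   # 1 - the whole transcript comes back as a single Document
print(docs[0].page_content[:200])  # the first 200 characters of the transcript text
print(docs[0].metadata)            # metadata, such as the source video ID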

To get the information from YouTube, under the hood it uses the youtube-transcript-api Python library. We won’t dig any more into this part.

The VectorstoreIndexCreator is actually doing some more interesting things.

Let’s investigate the VectorstoreIndexCreator class and its from_documents method. First, let’s have a look at the class definition and the default properties it is setting.

class VectorstoreIndexCreator(BaseModel):
    vectorstore_cls: Type[VectorStore] = Chroma
    embedding: Embeddings = Field(default_factory=OpenAIEmbeddings)
    text_splitter: TextSplitter = Field(default_factory=_get_default_text_splitter)
    vectorstore_kwargs: dict = Field(default_factory=dict)

So, what is going on here?

It is setting up Chroma as our default vector database
It is setting up OpenAIEmbeddings as the transformer used to create the embeddings from our text content
The _get_default_text_splitter method is being used to set up the TextSplitter, which we use to split the entire transcript into small chunks so that later we can pass the relevant parts into the prompt context in the query to the LLM.

If we look up the _get_default_text_splitter() method, we see it returns a RecursiveCharacterTextSplitter:

def _get_default_text_splitter() -> TextSplitter:
    return RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)

As mentioned in the LangChain docs about the Recursive Character Splitter:

This text splitter is the recommended one for generic text. It is parameterized by a list of characters. It tries to split them in order until the chunks are small enough. The default list is ["\n\n", "\n", " ", ""]. This has the effect of trying to keep all paragraphs (and then sentences, and then words) together as long as possible, as those would generically seem to be the strongest semantically-related pieces of text.

So, for random bits of text (i.e., our YouTube transcripts), where we have very little control over the formatting, a RecursiveCharacterTextSplitter makes sense. We cannot be sure if there will be new lines or spaces, so we start breaking the text down based on the largest natural whitespace breaks (i.e., line breaks, "\n\n"), all the way down to no spaces at all ("").

The idea seems to be that large white spaces, like line breaks, would be natural breaking points in the text, so it would be a good place to start trying to break up the text while keeping semantically similar bits of text inside the same chunks.

This is compared to the CharacterTextSplitter, which only splits along new lines ("\n\n"), which would not be very practical in our use case, as we cannot guarantee how many of these new lines there would be in our transcripts. There might not be any new lines, and the text splitter would return a single chunk containing all of our text inside it.
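
To make the difference concrete, here is a quick sketch of the recursive splitter working on a made-up run of transcript text with no newlines at all:

from langchain.text_splitter import RecursiveCharacterTextSplitter

transcript = "so today we're talking about pricing " * 50  # one long run of text with no newlines

splitter = RecursiveCharacterTextSplitter(chunk_size=100, chunk_overlap=0)
chunks = splitter.split_text(transcript)

print(len(chunks))  # several chunks, even though there were no newlines to split on
print(chunks[0])    # each chunk is at most ~100 characters, split on spaces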

OK, now let’s look at the from_documents method in some more detail:

class VectorstoreIndexCreator(BaseModel):

    def from_documents(self, documents: List[Document]) -> VectorStoreIndexWrapper:
        """Create a vectorstore index from documents."""
        sub_docs = self.text_splitter.split_documents(documents)
        vectorstore = self.vectorstore_cls.from_documents(
            sub_docs, self.embedding, **self.vectorstore_kwargs
        )
        return VectorStoreIndexWrapper(vectorstore=vectorstore)

It uses the text splitter to split the transcript into smaller chunks, then converts them to embeddings (using the OpenAIEmbeddings we set up in the constructor) and stores them in the vector store.

And it is returning us a VectorStoreIndexWrapper. This is the class we called the query method on in our initial script to make our request to the LLM. Here is the call again for reference:

index = VectorstoreIndexCreator()
index = index.from_documents(docs)
response = index.query("Summarise the video in 3 bullet points")

So, let’s explore the VectorStoreIndexWrapper and its query method some more.

class VectorStoreIndexWrapper(BaseModel):
    def query(
        self, question: str, llm: Optional[BaseLanguageModel] = None, **kwargs: Any
    ) -> str:
        """Query the vectorstore."""
        llm = llm or OpenAI(temperature=0)
        chain = RetrievalQA.from_chain_type(
            llm, retriever=self.vectorstore.as_retriever(), **kwargs
        )
        return chain.run(question)

Again, a lot of magic is happening under the hood here. It is setting up a RetrievalQA chain with OpenAI as the LLM and the vector store as the retriever, which supplies the related bits of text for the prompt context. We can then run our query against the chain.

Now, while this is super fast to set up, the following things make it less flexible than using the lower-level components directly:

We are forced to use the RetrievalQA chain. However, we may well want to use another chain, such as the ConversationalRetrievalChain, which would allow us to maintain a conversation history and respond in the context of recent conversation, something the RetrievalQA chain does not support (see the sketch further below)
We cannot pass in any PromptTemplates
We need to re-instantiate the RetrievalQA chain on each call to query, which is potentially inefficient.
If we wanted to pass in a custom LLM, we would need to pass it in on each query call (see below). This may make our code look bulkier than needed if we were to have many calls to the query method throughout our script. Ideally, we could set up our chain once at the start of our script, with the LLM we want, and then use it whenever we want to query something without having to pass in the LLM again.

Here is what our query call would look like if we had to pass in the LLM each time:

index.query("Summarise the video in 3 bullet points", llm=OpenAI())
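
For comparison, here is a rough sketch of what using the ConversationalRetrievalChain directly could look like, assuming we already have a vector store called vectordb like the one we build in the next section (a sketch only; the exact options may vary between LangChain versions):

from langchain.chains import ConversationalRetrievalChain
from langchain.llms import OpenAI

# build the chain once, passing in the LLM and retriever up front
chat_chain = ConversationalRetrievalChain.from_llm(
    OpenAI(temperature=0),
    retriever=vectordb.as_retriever()
)

chat_history = []
question = "Summarise the video in 3 bullet points"
result = chat_chain({"question": question, "chat_history": chat_history})
print(result["answer"])

# keep the exchange so the next question can build on it
chat_history.append((question, result["answer"]))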

So, what is our solution? Build out the entire thing ourselves using the lower-level components we just described. Let’s do this so we can:

get a better feel for how it works under the hood
get a better grasp of the concepts needed for building a chatbot
get more familiar with the LangChain library to build cooler products in the future!

Let’s create the file, chatbot.py, and create the directory where we will store the database file. Here’s how to get started:

touch chatbot.py
mkdir data

First, let’s add all the imports we need to get that part out of the way.

import os
import argparse
import shutil
from langchain.document_loaders import YoutubeLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings
from langchain.chains import RetrievalQA
from langchain.llms import OpenAI

Next, we will accept an argument from the CLI command to tell us what video to retrieve. Let’s do that using the argparse module. And let’s set up our OpenAI API key again.

os.environ["OPENAI_API_KEY"] = "sk-XXX"

parser = argparse.ArgumentParser(description="Query a Youtube video:")
parser.add_argument("-v", "--video-id", type=str, help="The video ID from the Youtube video")
args = parser.parse_args()

That means we can now call our script using the following command:

python3 chatbot.py --video-id={video_id}

And if we pass in the --help flag, we will get a list of the commands the script will accept. This feature is provided by argparse out of the box.

python3 chatbot.py --help

usage: chatbot.py [-h] [-v VIDEO_ID]

Query a Youtube video:
options:
-h, --help show this help message and exit
-v VIDEO_ID, --video-id VIDEO_ID The video ID from the Youtube video

OK, now let’s load our transcript and convert it to a Document. This works the same as in our previous example. We use the YouTube document loader directly.

loader = YoutubeLoader(args.video_id)
documents = loader.load()

Now, we want to split the documents into lots of little chunks of text. This will be so that we can send small chunks of related text into the LLM prompt to provide context for our questions. We will use the RecursiveCharacterTextSplitter, just like the VectorstoreIndexCreator does.

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
documents = text_splitter.split_documents(documents)

OK, great. Now, let’s set up our vector database and embeddings transformer. We will store our database files inside the data directory of our project by setting the persist_directory flag. Also, we will use the shutil.rmtree('./data') command to clear the data from there on each run before the database is created.

While changing the code between runs (i.e., during development while you are testing things out), the database data seems to get corrupted, and the script throws an exception. Not exactly sure why that is happening, but cleaning out the data directory on each run seems to fix it.

Again, below, we will use the OpenAIEmbeddings to convert our text chunks into embeddings.

shutil.rmtree('./data', ignore_errors=True)  # clear out data from any previous run; ignore_errors avoids a crash if the directory is missing
vectordb = Chroma.from_documents(
    documents,
    embedding=OpenAIEmbeddings(),
    persist_directory='./data'
)
vectordb.persist()

Great, we are nearly done. All we need to do now is set up the Q&A chain and start a chat window. We will create a RetrievalQA chain, similar to our initial example.

qa_chain = RetrievalQA.from_chain_type(
    llm=OpenAI(),
    retriever=vectordb.as_retriever(),
    return_source_documents=True,
    verbose=False
)

The retriever=vectordb.as_retriever() means we are using our vector database as a source for the context of the prompt. When we execute a query on the chain, the RetrievalQA will also pass the query to the vector store and retrieve the chunks of content that seem related to our query. It will then pass these chunks of text in as context to the LLM prompt.
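
If you are curious about exactly which chunks the retriever hands to the prompt, you can query the vector store directly. A quick sketch (the question string is just an example):

# see which transcript chunks the vector store considers most relevant to a question
related_chunks = vectordb.similarity_search("How should I price my product?", k=4)
for chunk in related_chunks:
    print(chunk.page_content[:100])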

And now, we can set up a while loop to create an interactive chat with the chain. We’ll also add some text colour to make it nicer to interact with. Type q (or quit) at any time to exit the script.

green = "\033[0;32m"
white = "\033[0;39m"

while True:
    query = input(f"{green}Prompt: ")
    if query == "quit" or query == "q":
        break
    if query == '':
        continue
    response = qa_chain({'query': query})
    print(f"{white}Answer: " + response['result'])

Now, let’s call our script by passing in the video ID.

python3 chatbot.py --video-id=9fCi_YN-z6E

Prompt: summarise the video in 3 bullet points
Answer:

  1. Anchoring is a strategy used in negotiations, where whoever puts a number down first wins.
  2. When faced with a low offer, ignore the anchor and suggest a higher number.
  3. Start high when pricing for services and double the number when asked for advice.

Here is another example interaction from the same video. Asked to summarise the video in three bullet points, the chatbot suggests providing incentives to get people to complete a task, offering digital products (ebooks, video courses) to add value, and offering bonuses such as tutorials and tips to help customers use the product effectively. Asked to summarise the video in 200 words, it explains how Alex Hormozi describes building a successful dropshipping business: the fundamentals of selecting a product, creating good ads, building a high-quality store, analysing the numbers, getting reviews, and increasing perceived value.

Here’s the entire script:

import os
import argparse
import shutil
from langchain.document_loaders import YoutubeLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings
from langchain.chains import RetrievalQA
from langchain.llms import OpenAI

os.environ["OPENAI_API_KEY"] = "sk-"

parser = argparse.ArgumentParser(description="Query a Youtube video:")
parser.add_argument("-v", "--video-id", type=str, help="The video ID from the Youtube video")
args = parser.parse_args()

loader = YoutubeLoader(args.video_id)
documents = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
documents = text_splitter.split_documents(documents)

shutil.rmtree('./data', ignore_errors=True)
vectordb = Chroma.from_documents(
    documents,
    embedding=OpenAIEmbeddings(),
    persist_directory='./data'
)
vectordb.persist()

qa_chain = RetrievalQA.from_chain_type(
    llm=OpenAI(),
    retriever=vectordb.as_retriever(),
    return_source_documents=True,
    verbose=False
)

green = "\033[0;32m"
white = "\033[0;39m"

while True:
    query = input(f"{green}Prompt: ")
    if query == "quit" or query == "q":
        break
    if query == '':
        continue
    response = qa_chain({'query': query})
    print(f"{white}Answer: " + response['result'])

And here is the command to rerun it:

python3 chatbot.py --video-id=9fCi_YN-z6E

And we are all done!

Summary

The idea here was to show how easy it is to get a YouTube video chatbot up and running using LangChain, and then do a deep dive into some of the concepts we are using to build the chatbot and rebuild it using some of the lower-level LangChain components.

Hopefully, this helped you better understand how some of the higher-level LangChain wrappers work under the hood and got you more familiar with the different moving parts needed to create a chatbot.

The final script we ended up with, using the lower-level components, also gives our code more flexibility and opportunities for customisation. This is helpful if you want to build it out into something more useful and complex.

Some of the things we could do to take this idea a bit further include:

Set up a UI for the application using Streamlit or Vercel (see the rough sketch below)
Use YouTube transcripts to fine-tune an LLM, so it can respond in different voices
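
For the Streamlit idea, a very rough sketch could look something like the following, where build_chain is a hypothetical helper that wraps the loader, splitter, vector store, and RetrievalQA steps from chatbot.py (caching and layout details would depend on your Streamlit version):

import streamlit as st

st.title("YouTube Chatbot")

video_id = st.text_input("YouTube video ID")
question = st.text_input("Ask a question about the video")

if video_id and question:
    chain = build_chain(video_id)  # hypothetical helper: loader -> splitter -> vector store -> RetrievalQA
    response = chain({"query": question})
    st.write(response["result"])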

I am thinking about exploring how to fine-tune an LLM using YouTube transcripts as part of the next article, so if you think that sounds interesting, let me know.
