Written by Carlos Mucuho
In this tutorial, you will learn how to use a technique known as retrieval-augmented generation (RAG) alongside the OpenAI API and the LangChain framework to build a time-saving and cost-effective interactive command line application that allows you to retrieve information from a YouTube video without having to watch it.
RAG enhances LLM knowledge by incorporating external data, enabling AI applications to reason about information that was not included in their initial training data. It works by retrieving relevant information from an external source and inserting it into the model prompt so the model can reason over it.
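To make the idea concrete before building anything, here is a minimal sketch of how retrieved context ends up inside a prompt. The data is hard-coded placeholder text rather than real retrieval results, and this snippet is not part of the app you will build below:
// Minimal sketch of the RAG idea, with hard-coded stand-ins for the real pieces.
// In the actual app, the relevant chunks will come from a vector store lookup.
const userQuestion = "What does the video say about issue management?";
const relevantChunks = [
  "Transcript chunk retrieved from the vector store...",
  "Another transcript chunk retrieved from the vector store...",
];
// The retrieved context is pasted into the prompt alongside the question.
const prompt = `Answer the question using only the context below.\n\nContext:\n${relevantChunks.join("\n")}\n\nQuestion: ${userQuestion}`;
console.log(prompt);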
To build the application, you’ll use the youtube-transcript package to retrieve YouTube video transcripts. You will then use LangChain and the Transformers.js package to generate free Hugging Face embeddings for the given transcript and store them in a vector store instead of relying on potentially expensive OpenAI embeddings. Lastly, you will use LangChain and an OpenAI model to retrieve information stored in the vector store.
By the end of this tutorial, you will have an application that looks similar to the following:
Prerequisites
To follow this tutorial, you will need:
- Node.js and npm installed locally
- A basic understanding of Node.js
- An OpenAI API key
Creating the project root directory
In this section, you will create the project directory, initialize a Node.js application, and install the required packages.
First, open a terminal window and navigate to a suitable location for your project. Run the following commands to create the project directory and navigate into it:
mkdir youtube-video-rag
cd youtube-video-rag
In your project root directory, create a file named .env and store your OpenAI API key in it as the value for OPENAI_API_KEY:
OPENAI_API_KEY="Your OpenAI API key"
Run the following command to create a new Node project:
npm init -y
Set "type": "module" in the package.json file to load ES modules:
{
  "name": "youtube-video-rag",
  "version": "1.0.0",
  "description": "",
  "type": "module",
  "main": "index.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "keywords": [],
  "author": "",
  "license": "ISC"
}
Now, use the following command to install the packages needed to build this application:
npm install chalk dotenv youtube-transcript langchain @xenova/transformers
With the command above, you installed the following packages:
- Chalk: Provides an easy way to stylize terminal strings with various colors and text formatting in Node.js, aiding in creating visually appealing command-line outputs
- dotenv: Designed to load environment variables from a .env file into the process.env environment
- youtube-transcript: Facilitates the retrieval of transcripts from YouTube videos
- LangChain: Facilitates the creation of context-aware applications by connecting language models to diverse contextual sources. It uses language models to deduce responses or actions based on the provided context
- Transformers.js: Provides access to pre-trained transformer-based models for a range of tasks, such as text analysis, summarization, and computer vision. LangChain will use this library under the hood to generate free embeddings
Retrieving the video transcript
In this section, you will write the code that will allow the application to retrieve the transcripts of a provided YouTube video.
In your project root directory, create a file named rag.js and add the following code to it:
import { YoutubeTranscript } from 'youtube-transcript';
This line imports the YoutubeTranscript module from the youtube-transcript package. This module will allow the application to retrieve the transcript of a YouTube video.
Add the following code below the import statement:
export async function getTranscript(videoURL) {
  try {
    let transcript = ""
    const transcriptChunks = await YoutubeTranscript.fetchTranscript(videoURL)
    if (transcriptChunks.length > 0) {
      for (let chunk of transcriptChunks) {
        transcript += " " + chunk.text
      }
    }
    console.log('transcript', transcript.length, transcript)
    return transcript
  } catch (error) {
    return ""
  }
}
This code block defines a function named getTranscript() that takes a videoURL parameter. This function is responsible for fetching and consolidating the transcript of a video from YouTube using the youtube-transcript library.
The function begins by declaring an empty string variable named transcript to store the fetched transcript.
Using the YoutubeTranscript.fetchTranscript() method, it retrieves the transcript chunks for the provided video URL asynchronously. The fetched transcript is stored in the transcriptChunks variable.
The code checks if transcriptChunks contains any elements, ensuring that the transcript data is available. If the data is available, it iterates through each chunk and concatenates the text content into the transcript string with spaces separating each chunk.
Finally, the function returns the concatenated transcript, containing the consolidated text of the video transcript. If errors occur during the fetching or consolidation process, the function returns an empty string as a fallback.
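For reference, fetchTranscript() resolves to an array of chunk objects. The exact fields can vary with the package version, but each chunk carries a text property, which is all getTranscript() uses. If you want to inspect the raw chunks yourself, a quick standalone sketch (with a placeholder video URL) would look like this:
import { YoutubeTranscript } from 'youtube-transcript';
// Standalone check of what the package returns; replace the placeholder URL
// with a real video URL before running.
const chunks = await YoutubeTranscript.fetchTranscript('https://www.youtube.com/watch?v=VIDEO_ID');
// Each chunk has at least a `text` field; timing fields may also be present
// depending on the package version.
console.log(chunks.length, chunks[0]);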
In your project root directory, create a file named app.js and add the following code to it:
import { stdin as input, stdout as output } from 'node:process';
import { createInterface } from 'node:readline/promises';
import chalk from 'chalk';
import { getTranscript } from './rag.js'
const readline = createInterface({ input, output });
The code starts by importing the stdin and stdout streams, aliased as input and output, from the node:process module, and the createInterface function from node:readline/promises.
Additionally, it imports the Chalk library for colorizing terminal output and a function named getTranscript() from the rag.js file. The input and output streams will be responsible for handling user input and output within the terminal, and the readline interface will be responsible for asynchronously reading user input.
Add the following code below the readline constant:
async function addVideo(videoURL) {
  if (videoURL === '') {
    videoURL = await readline.question(
      chalk.green('AI: Please send the youtube video URL')
      + chalk.blue('\nUser: ')
    );
  }
  const transcript = await getTranscript(videoURL)
  if (transcript === '') {
    console.info(chalk.red('\nAPP: the application was unable to retrieve the video transcript, please try again\n'))
    return false
  }
  return true
}
This code defines an asynchronous function named addVideo that is responsible for managing the addition of a YouTube video to the application. It accepts a videoURL parameter.
If videoURL is empty, the function prompts the user to input a YouTube video URL via the terminal using readline.question.
Once a URL is obtained or provided, it retrieves the transcript of the video using the getTranscript function. If the transcript retrieval fails or returns an empty result, it displays an error message indicating the failure to retrieve the video transcript and returns false.
If the transcript retrieval is successful and not empty, it signifies the successful addition of the video and returns true.
The following code defines an asynchronous function named main, which serves as the main entry point for the application's logic:
async function main() {
  const wasVideoAdded = await addVideo('')
  if (!wasVideoAdded) {
    return
  }
}
await main()
readline.close();
Inside the function, it awaits the result of the addVideo function, passing an empty string as its argument.
The addVideo function is responsible for handling the addition of a YouTube video to the application, as explained earlier. If the video addition process is unsuccessful (i.e., it returns false), the main function exits early using the return statement.
Finally, the main function is invoked using await main(), initiating the main application logic. After the execution of main, the code closes the readline interface using readline.close(). Upon the closure of the readline interface, the application will also terminate.
Run the following command to start the application:
node app.js
When prompted, add a YouTube video URL that you would like to extract information from. Here we added the URL of LogRocket’s Issue Management demo video. After a few seconds, depending on the video length, you should see the transcript length and text displayed in the terminal.
Stop the application and comment out the following line (located inside the getTranscript() function) before moving to the next section:
// console.log('transcript', transcript.length, transcript)
Generating and storing the transcript embeddings
In this section, you will use LangChain and Transformers.js to generate free text embeddings of the transcript and store them in a vector store.
Text embeddings transform textual content into numerical vectors, capturing semantic relationships and contextual information to facilitate natural language processing tasks. The vectors' distances reflect their similarity, with shorter distances indicating higher similarity and longer distances suggesting lower similarity.
A vector store is a data structure storing embeddings or vectors associated with specific documents or items, enabling similarity comparisons and retrievals based on vector distances.
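To make "distance" concrete, here is a small toy example, separate from the app itself, that computes cosine similarity between made-up three-dimensional vectors (real all-MiniLM-L6-v2 embeddings have 384 dimensions):
// Toy example: cosine similarity between two small vectors.
// Higher values mean the texts the vectors represent are more alike.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
const catVector = [0.9, 0.1, 0.0];    // pretend embedding for "cat"
const kittenVector = [0.8, 0.2, 0.1]; // pretend embedding for "kitten"
const carVector = [0.1, 0.0, 0.9];    // pretend embedding for "car"
console.log(cosineSimilarity(catVector, kittenVector)); // close to 1: similar
console.log(cosineSimilarity(catVector, carVector));    // much lower: dissimilar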
Add the following code to the rag.js file below the import statement section:
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { Document } from "langchain/document";
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { HuggingFaceTransformersEmbeddings } from "langchain/embeddings/hf_transformers";
import "dotenv/config"
let vectorStore
The code block imports key modules from the LangChain library and configures the application to use environment variables stored in a .env file.
First, it imports RecursiveCharacterTextSplitter from langchain/text_splitter, which splits text into smaller, more manageable chunks for further processing.
Next, the Document module from langchain/document represents the text document structure used within the LangChain library. It also imports MemoryVectorStore from langchain/vectorstores/memory, which manages the storage and retrieval of vectorized text representations.
Additionally, it imports HuggingFaceTransformersEmbeddings from langchain/embeddings/hf_transformers to facilitate text embedding using the Transformers.js library.
Lastly, it pulls in environment variables from .env using dotenv and declares a variable named vectorStore.
Add the following code below the getTranscript function:
export async function generateEmbeddings(transcript) {
  const embeddings = new HuggingFaceTransformersEmbeddings({
    modelName: "Xenova/all-MiniLM-L6-v2",
  });
  const textSplitter = new RecursiveCharacterTextSplitter({
    chunkSize: 1000,
    chunkOverlap: 200,
  });
  const splitDocs = await textSplitter.splitDocuments([
    new Document({ pageContent: transcript }),
  ]);
  vectorStore = await MemoryVectorStore.fromDocuments(
    splitDocs,
    embeddings
  );
  console.log("vectorStore", vectorStore)
  return true
}
This code block defines a function named generateEmbeddings(). This function takes a parameter named transcript and is responsible for creating embeddings for the provided transcript using the Hugging Face Transformers library.
Within this function, it initiates an instance of HuggingFaceTransformersEmbeddings with a specified model name. It also initializes a RecursiveCharacterTextSplitter instance, configuring it with specific chunk sizes and overlaps for splitting the text.
The function then utilizes textSplitter to split the transcript into chunks of text, encapsulated within a Document object. These split documents are then processed to generate vector embeddings through the MemoryVectorStore by invoking the fromDocuments method, using the embeddings and split documents as inputs.
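If you want to see what retrieval looks like before wiring the vector store into a question-answer chain, you can optionally add a small helper like the one below to rag.js. The function name is made up for illustration, and it assumes generateEmbeddings() has already run so that vectorStore is populated:
// Optional debugging helper (not required by the app): ask the vector store
// for the chunks most similar to a query. Assumes generateEmbeddings() has
// already populated vectorStore.
export async function inspectSimilarChunks(query) {
  // similaritySearch returns the closest matching Document objects.
  const results = await vectorStore.similaritySearch(query, 2);
  for (const doc of results) {
    console.log(doc.pageContent.slice(0, 200) + '...');
  }
}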
Go to the app.js file and add the generateEmbeddings() function to the line where you imported the getTranscript() function:
import { getTranscript, generateEmbeddings } from './rag.js'
In the app.js file, add the following code to the addVideo function before the last return statement:
async function addVideo(videoURL) {
  ...
  const wasEmbeddingGenerated = await generateEmbeddings(transcript)
  if (!wasEmbeddingGenerated) {
    console.info(chalk.red('\nAPP: the application was unable to generate embeddings, please try again\n'))
    return false
  }
  return true
}
The code added is responsible for checking if embeddings were successfully generated for the provided video transcript. It generates embeddings by calling the generateEmbeddings() function, passing the transcript as a parameter.
If the embeddings generation process encounters an issue or fails, it displays an error message indicating the failure to generate embeddings and returns a Boolean value of false.
Run the following command to start the application:
node app.js
When prompted, add a YouTube video URL that you would like to extract information from. After a few seconds, depending on the video length, you should see the contents of the vector store displayed in the terminal.
Stop the application and comment out the following line (located inside the generateEmbeddings() function) before moving to the next section:
// console.log("vectorStore", vectorStore)
Retrieving information from the video
In this section, you will use LangChain and an OpenAI model to query information stored in the vector store containing the transcript embeddings.
Go to the rag.js file and add the following code to the import statements:
import { OpenAI } from "langchain/llms/openai";
import { RetrievalQAChain } from "langchain/chains";
The code added introduces specific modules from LangChain essential for utilizing OpenAI language models and handling retrieval-based question-answer chains. The OpenAI module allows the use of the OpenAI language models within the LangChain framework.
The RetrievalQAChain module allows the use of a retrieval-based question-answer chain within LangChain. Such chains are structured to retrieve relevant answers based on specific queries or questions presented to the AI system.
Add the following code below the line where you declared the vectorStore variable:
const OPENAI_API_KEY = process.env.OPENAI_API_KEY
const model = new OpenAI({ modelName: "gpt-3.5-turbo", openAIApiKey: OPENAI_API_KEY, temperature: 0 });
The code snippet initializes a constant OPENAI_API_KEY by fetching the environment variable OPENAI_API_KEY from the process environment using process.env.
Following that, a new instance of the OpenAI class is created and stored in the model variable. This instance configuration includes setting the model name as gpt-3.5-turbo, providing the OpenAI API key through the openAIApiKey property, and configuring the temperature parameter as 0, which controls the randomness of the model's responses.
Add the following function below the generateEmbeddings() function:
export async function askLLM(question) {
  const chain = RetrievalQAChain.fromLLM(model, vectorStore.asRetriever());
  const result = await chain.call({
    query: question,
  });
  return result
}
Here, the code defines and exports an asynchronous function named askLLM responsible for interfacing with LangChain's Retrieval QA chain. This function expects a parameter question representing the query or question to be processed by the retrieval-based QA chain.
The RetrievalQAChain instance is created using the fromLLM method from the LangChain library, taking in model and vectorStore as parameters. This step constructs the retrieval-based QA chain, leveraging the specified language model (model) and vector store (vectorStore) as the retriever.
Upon calling chain.call(), the method processes the provided question within the retrieval-based QA chain. It then returns the result retrieved by the chain based on the query.
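For example, once a video has been added and the vector store is populated, calling askLLM() with a hypothetical question could look like this; the text property on the result is the same field app.js reads later (llmResponse.text):
// Hypothetical usage of askLLM() after a video has been added.
const result = await askLLM('What is the video about?');
// The chain's result object exposes the answer on its `text` property.
console.log(result.text);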
Go to the app.js file and add the askLLM function to the line where you imported the getTranscript() and generateEmbeddings() functions:
import { getTranscript, generateEmbeddings, askLLM } from './rag.js'
Add the following code to the bottom of the main function:
async function main() {
  ...
  let userInput = await readline.question(
    chalk.green('AI: Ask anything about the Youtube video.')
    + chalk.blue('\nUser: ')
  );
  while (userInput !== '.exit') {
    try {
      if (userInput.includes('https://www.youtube')) {
        const videoURL = userInput
        const wasVideoAdded = await addVideo(videoURL)
        if (!wasVideoAdded) {
          return
        }
        userInput = await readline.question(
          chalk.green('AI: Ask anything about the Youtube video.')
          + chalk.blue('\nUser: ')
        );
      }
      const llmResponse = await askLLM(userInput)
      if (llmResponse) {
        userInput = await readline.question(
          chalk.green('\nAI: ' + llmResponse.text)
          + chalk.blue('\nUser: ')
        );
      } else {
        userInput = await readline.question(
          chalk.blue('\nAPP: No response, try asking again')
          + chalk.blue('\nUser: ')
        );
      }
    } catch (error) {
      console.error(chalk.red(error.message));
      return
    }
  }
}
After the video addition step, the code prompts the user to input a question or command related to the YouTube video. The program then enters a while loop, processing the user input until the command .exit is entered.
During each iteration of the loop, the code checks if the user input contains a YouTube URL. If it does, the system attempts to add the video via addVideo(videoURL), using the extracted URL. If successful, the user is prompted again to ask a question related to the YouTube video.
If the user's input is not a video URL, the system processes the input using the askLLM() function, which interacts with a language model to generate a response based on the user query.
If a response (llmResponse) is obtained, it's displayed to the user in green as part of the conversation. Otherwise, the user is prompted to retry their question.
Throughout this process, error handling is implemented, with any encountered errors being logged in red before terminating the function's execution.
Run the following command to start the application:
node app.js
When prompted, add a YouTube video URL from which you would like to extract information, and then write the questions that you would like to ask. The responses show that the application is working as expected and that you are now able to query information about the video.
Conclusion
In this tutorial, you learned how to use the RAG technique with the OpenAI API and LangChain to create a time-saving and cost-effective command line app that lets you fetch information from a YouTube video using its URL, without the need to watch the video.
Along the way, you saw how to retrieve video transcripts, generate and store free embeddings, and then use them to extract useful information with an OpenAI model.