Written by Carlos Mucuho
In this tutorial, you will learn how to use a technique known as retrieval-augmented generation (RAG) alongside the OpenAI API and the LangChain framework to build a time-saving and cost-effective interactive command line application that allows you to retrieve information from a YouTube video without having to watch it.
RAG enhances LLM knowledge by incorporating external data, enabling AI applications to reason about information that was not included in their initial training data. It works by retrieving relevant information from an external source and inserting it into the model prompt so the model can reason over it.
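To make the idea concrete before building anything, here is a minimal sketch of how retrieved context ends up inside a prompt. The data is hard-coded placeholder text rather than real retrieval results, and this snippet is not part of the app you will build below:
// Minimal sketch of the RAG idea, with hard-coded stand-ins for the real pieces.
// In the actual app, the relevant chunks will come from a vector store lookup.
const userQuestion = "What does the video say about issue management?";
const relevantChunks = [
  "Transcript chunk retrieved from the vector store...",
  "Another transcript chunk retrieved from the vector store...",
];
// The retrieved context is pasted into the prompt alongside the question.
const prompt = `Answer the question using only the context below.\n\nContext:\n${relevantChunks.join("\n")}\n\nQuestion: ${userQuestion}`;
console.log(prompt);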
To build the application, you’ll use the youtube-transcript package to retrieve YouTube video transcripts. You will then use LangChain and the Transformers.js package to generate free Hugging Face embeddings for the given transcript and store them in a vector store instead of relying on potentially expensive OpenAI embeddings. Lastly, you will use LangChain and an OpenAI model to retrieve information stored in the vector store.
By the end of this tutorial, you will have an application that looks similar to the following:
Prerequisites
To follow this tutorial, you will need:
- Node.js and npm installed locally
- A basic understanding of Node.js
- An OpenAI API key
Creating the project root directory
In this section, you will create the project directory, initialize a Node.js application, and install the required packages.
First, open a terminal window and navigate to a suitable location for your project. Run the following commands to create the project directory and navigate into it:
mkdir youtube-video-rag
cd youtube-video-rag
In your project root directory, create a file named .env and store your OpenAI API key in it as the value for OPENAI_API_KEY:
OPENAI_API_KEY="Your OpenAI API key"
Run the following command to create a new Node project:
npm init -y
Set "type": "module" in the package.json file to load ES modules:
{
  "name": "youtube-video-rag",
  "version": "1.0.0",
  "description": "",
  "type": "module",
  "main": "index.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "keywords": [],
  "author": "",
  "license": "ISC"
}
Now, use the following command to install the packages needed to build this application:
npm install chalk dotenv youtube-transcript langchain @xenova/transformers
With the command above, you installed the following packages:
- Chalk: Provides an easy way to stylize terminal strings with various colors and text formatting in Node.js, aiding in creating visually appealing command-line outputs
- dotenv: Designed to load environment variables from a .env file into the process.env environment
- youtube-transcript: Facilitates the retrieval of transcripts from YouTube videos
- LangChain: Facilitates the creation of context-aware applications by connecting language models to diverse contextual sources. It uses language models to deduce responses or actions based on the provided context
- Transformers.js: Provides access to pre-trained transformer-based models for a range of tasks, such as text analysis, summarization, and computer vision. LangChain will use this library under the hood to generate free embeddings
Retrieving the video transcript
In this section, you will write the code that will allow the application to retrieve the transcripts of a provided YouTube video.
In your project root directory, create a file named rag.js and add the following code to it:
import { YoutubeTranscript } from 'youtube-transcript';
This line imports the YoutubeTranscript module from the youtube-transcript package. This module will allow the application to retrieve the transcript of a YouTube video.
Add the following code below the import statement:
export async function getTranscript(videoURL) {
  try {
    let transcript = ""
    const transcriptChunks = await YoutubeTranscript.fetchTranscript(videoURL)
    if (transcriptChunks.length > 0) {
      for (let chunk of transcriptChunks) {
        transcript += " " + chunk.text
      }
    }
    console.log('transcript', transcript.length, transcript)
    return transcript
  } catch (error) {
    return ""
  }
}
This code block defines a function named getTranscript() that takes a videoURL parameter. This function is responsible for fetching and consolidating the transcript of a video from YouTube using the youtube-transcript library.
The function begins by declaring an empty string variable named transcript to store the fetched transcript.
Using the YoutubeTranscript.fetchTranscript() method, it retrieves the transcript chunks for the provided video URL asynchronously. The fetched transcript is stored in the transcriptChunks variable.
The code checks if transcriptChunks contains any elements, ensuring that the transcript data is available. If the data is available, it iterates through each chunk and concatenates the text content into the transcript string with spaces separating each chunk.
Finally, the function returns the concatenated transcript, containing the consolidated text of the video transcript. If errors occur during the fetching or consolidation process, the function returns an empty string as a fallback.
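For reference, fetchTranscript() resolves to an array of chunk objects. The exact fields can vary with the package version, but each chunk carries a text property, which is all getTranscript() uses. If you want to inspect the raw chunks yourself, a quick standalone sketch (with a placeholder video URL) would look like this:
import { YoutubeTranscript } from 'youtube-transcript';
// Standalone check of what the package returns; replace the placeholder URL
// with a real video URL before running.
const chunks = await YoutubeTranscript.fetchTranscript('https://www.youtube.com/watch?v=VIDEO_ID');
// Each chunk has at least a `text` field; timing fields may also be present
// depending on the package version.
console.log(chunks.length, chunks[0]);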
In your project root directory, create a file named app.js and add the following code to it:
import { stdin as input, stdout as output } from 'node:process';
import { createInterface } from 'node:readline/promises';
import chalk from 'chalk';
import { getTranscript } from './rag.js'
const readline = createInterface({ input, output });
The code starts by importing the stdin and stdout streams, aliased as input and output, from the node:process module, and the createInterface function from node:readline/promises.
Additionally, it imports the Chalk library for colorizing terminal output and a function named getTranscript() from the rag.js file. The input and output streams will be responsible for handling user input and output within the terminal, and the readline interface will be responsible for asynchronously reading user input.
Add the following code below the readline constant:
async function addVideo(videoURL) {
  if (videoURL === '') {
    videoURL = await readline.question(
      chalk.green('AI: Please send the youtube video URL')
      + chalk.blue('\nUser: ')
    );
  }
  const transcript = await getTranscript(videoURL)
  if (transcript === '') {
    console.info(chalk.red('\nAPP: the application was unable to retrieve the video transcript, please try again\n'))
    return false
  }
  return true
}
This code defines an asynchronous function named addVideo that is responsible for managing the addition of a YouTube video to the application. It accepts a videoURL parameter.
If videoURL is empty, the function prompts the user to input a YouTube video URL via the terminal using readline.question.
Once a URL is obtained or provided, it retrieves the transcript of the video using the getTranscript function. If the transcript retrieval fails or returns an empty result, it displays an error message indicating the failure to retrieve the video transcript and returns false.
If the transcript retrieval is successful and not empty, it signifies the successful addition of the video and returns true.
The following code defines an asynchronous function named main, which serves as the main entry point for the application's logic:
async function main() {
  const wasVideoAdded = await addVideo('')
  if (!wasVideoAdded) {
    return
  }
}
await main()
readline.close();
Inside the function, it awaits the result of the addVideo function, passing an empty string as its argument.
The addVideo function is responsible for handling the addition of a YouTube video to the application, as explained earlier. If the video addition process is unsuccessful (i.e., it returns false), the main function exits early using the return statement.
Finally, the main function is invoked using await main(), initiating the main application logic. After the execution of main, the code closes the readline interface using readline.close(). Upon the closure of the readline interface, the application will also terminate.
Run the following command to start the application:
node app.js
When prompted, add a YouTube video URL that you would like to extract information from. Here we added the URL of LogRocket’s Issue Management demo video. After a few seconds, depending on the video length, you should see the transcript length and text displayed in the terminal.
Stop the application and comment out the following line (located inside the getTranscript() function) before moving to the next section:
// console.log('transcript', transcript.length, transcript)
Generating and storing the transcript embeddings
In this section, you will use LangChain and Transformers.js to generate free text embeddings of the transcript and store them in a vector store.
Text embeddings transform textual content into numerical vectors, capturing semantic relationships and contextual information to facilitate natural language processing tasks. The vectors' distances reflect their similarity, with shorter distances indicating higher similarity and longer distances suggesting lower similarity.
A vector store is a data structure storing embeddings or vectors associated with specific documents or items, enabling similarity comparisons and retrievals based on vector distances.
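To make "distance" concrete, here is a small toy example, separate from the app itself, that computes cosine similarity between made-up three-dimensional vectors (real all-MiniLM-L6-v2 embeddings have 384 dimensions):
// Toy example: cosine similarity between two small vectors.
// Higher values mean the texts the vectors represent are more alike.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
const catVector = [0.9, 0.1, 0.0];    // pretend embedding for "cat"
const kittenVector = [0.8, 0.2, 0.1]; // pretend embedding for "kitten"
const carVector = [0.1, 0.0, 0.9];    // pretend embedding for "car"
console.log(cosineSimilarity(catVector, kittenVector)); // close to 1: similar
console.log(cosineSimilarity(catVector, carVector));    // much lower: dissimilar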
Add the following code to the rag.js file below the import statement section:
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { Document } from "langchain/document";
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { HuggingFaceTransformersEmbeddings } from "langchain/embeddings/hf_transformers";
import "dotenv/config"
let vectorStore
The code block imports key modules from the LangChain library and configures the application to use environment variables stored in a .env file.
First, it imports RecursiveCharacterTextSplitter from langchain/text_splitter, which splits text into smaller, more manageable chunks for further processing.
Next, the Document module from langchain/document represents the text document structure used within the LangChain library. It also imports MemoryVectorStore from langchain/vectorstores/memory, which manages the storage and retrieval of vectorized text representations.
Additionally, it imports HuggingFaceTransformersEmbeddings from langchain/embeddings/hf_transformers to facilitate text embedding using the Transformers.js library.
Lastly, it pulls in environment variables from .env using dotenv and declares a variable named vectorStore.
Add the following code below the getTranscript function:
export async function generateEmbeddings(transcript) {
  const embeddings = new HuggingFaceTransformersEmbeddings({
    modelName: "Xenova/all-MiniLM-L6-v2",
  });
  const textSplitter = new RecursiveCharacterTextSplitter({
    chunkSize: 1000,
    chunkOverlap: 200,
  });
  const splitDocs = await textSplitter.splitDocuments([
    new Document({ pageContent: transcript }),
  ]);
  vectorStore = await MemoryVectorStore.fromDocuments(
    splitDocs,
    embeddings
  );
  console.log("vectorStore", vectorStore)
  return true
}
This code block defines a function named generateEmbeddings(). This function takes a parameter named transcript and is responsible for creating embeddings for the provided transcript using the Hugging Face Transformers library.
Within this function, it initiates an instance of HuggingFaceTransformersEmbeddings with a specified model name. It also initializes a RecursiveCharacterTextSplitter instance, configuring it with specific chunk sizes and overlaps for splitting the text.
The function then utilizes textSplitter to split the transcript into chunks of text, encapsulated within a Document object. These split documents are then processed to generate vector embeddings through the MemoryVectorStore by invoking the fromDocuments method, using the embeddings and split documents as inputs.
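If you want to see what retrieval looks like before wiring the vector store into a question-answer chain, you can optionally add a small helper like the one below to rag.js. The function name is made up for illustration, and it assumes generateEmbeddings() has already run so that vectorStore is populated:
// Optional debugging helper (not required by the app): ask the vector store
// for the chunks most similar to a query. Assumes generateEmbeddings() has
// already populated vectorStore.
export async function inspectSimilarChunks(query) {
  // similaritySearch returns the closest matching Document objects.
  const results = await vectorStore.similaritySearch(query, 2);
  for (const doc of results) {
    console.log(doc.pageContent.slice(0, 200) + '...');
  }
}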
Go to the app.js file and add the generateEmbeddings() function to the line where you imported the getTranscript() function:
import { getTranscript, generateEmbeddings } from './rag.js'
In the app.js file, add the following code to the addVideo function before the last return statement:
async function addVideo(videoURL) {
  ...
  const wasEmbeddingGenerated = await generateEmbeddings(transcript)
  if (!wasEmbeddingGenerated) {
    console.info(chalk.red('\nAPP: the application was unable to generate embeddings, please try again\n'))
    return false
  }
  return true
}
The code added is responsible for checking if embeddings were successfully generated for the provided video transcript. It generates embeddings by calling the generateEmbeddings() function, passing the transcript as a parameter.
If the embeddings generation process encounters an issue or fails, it displays an error message indicating the failure to generate embeddings and returns a Boolean value of false.
Run the following command to start the application:
node app.js
When prompted, add a YouTube video URL that you would like to extract information from. After a few seconds, depending on the video length, you should see the contents of the vector store displayed in the terminal.
Stop the application and comment out the following line (located inside the generateEmbeddings() function) before moving to the next section:
// console.log("vectorStore", vectorStore)
Retrieving information from the video
In this section, you will use LangChain and an OpenAI model to query information stored in the vector store containing the transcript embeddings.
Go to the rag.js file and add the following code to the import statements:
import { OpenAI } from "langchain/llms/openai";
import { RetrievalQAChain } from "langchain/chains";
The code added introduces specific modules from LangChain essential for utilizing OpenAI language models and handling retrieval-based question-answer chains. The OpenAI module allows the use of the OpenAI language models within the LangChain framework.
The RetrievalQAChain module allows the use of a retrieval-based question-answer chain within LangChain. Such chains are structured to retrieve relevant answers based on specific queries or questions presented to the AI system.
Add the following code below the line where you declared the vectorStore variable:
const OPENAI_API_KEY = process.env.OPENAI_API_KEY
const model = new OpenAI({ modelName: "gpt-3.5-turbo", openAIApiKey: OPENAI_API_KEY, temperature: 0 });
The code snippet initializes a constant OPENAI_API_KEY by fetching the environment variable OPENAI_API_KEY from the process environment using process.env.
Following that, a new instance of the OpenAI class is created and stored in the model variable. This instance configuration includes setting the model name as gpt-3.5-turbo, providing the OpenAI API key through the openAIApiKey property, and configuring the temperature parameter as 0, which controls the randomness of the model's responses.
Add the following function below the generateEmbeddings() function:
export async function askLLM(question) {
  const chain = RetrievalQAChain.fromLLM(model, vectorStore.asRetriever());
  const result = await chain.call({
    query: question,
  });
  return result
}
Here, the code defines and exports an asynchronous function named askLLM responsible for interfacing with LangChain's Retrieval QA chain. This function expects a parameter question representing the query or question to be processed by the retrieval-based QA chain.
The RetrievalQAChain instance is created using the fromLLM method from the LangChain library, taking in model and vectorStore as parameters. This step constructs the retrieval-based QA chain, leveraging the specified language model (model) and vector store (vectorStore) as the retriever.
Upon calling chain.call(), the method processes the provided question within the retrieval-based QA chain. It then returns the result retrieved by the chain based on the query.
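For example, once a video has been added and the vector store is populated, calling askLLM() with a hypothetical question could look like this; the text property on the result is the same field app.js reads later (llmResponse.text):
// Hypothetical usage of askLLM() after a video has been added.
const result = await askLLM('What is the video about?');
// The chain's result object exposes the answer on its `text` property.
console.log(result.text);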
Go to the app.js file and add the askLLM function to the line where you imported the getTranscript() and generateEmbeddings() functions:
import { getTranscript, generateEmbeddings, askLLM } from './rag.js'
Add the following code to the bottom of the main function:
async function main() {
  ...
  let userInput = await readline.question(
    chalk.green('AI: Ask anything about the Youtube video.')
    + chalk.blue('\nUser: ')
  );
  while (userInput !== '.exit') {
    try {
      if (userInput.includes('https://www.youtube')) {
        const videoURL = userInput
        const wasVideoAdded = await addVideo(videoURL)
        if (!wasVideoAdded) {
          return
        }
        userInput = await readline.question(
          chalk.green('AI: Ask anything about the Youtube video.')
          + chalk.blue('\nUser: ')
        );
      }
      const llmResponse = await askLLM(userInput)
      if (llmResponse) {
        userInput = await readline.question(
          chalk.green('\nAI: ' + llmResponse.text)
          + chalk.blue('\nUser: ')
        );
      } else {
        userInput = await readline.question(
          chalk.blue('\nAPP: No response, try asking again')
          + chalk.blue('\nUser: ')
        );
      }
    } catch (error) {
      console.error(chalk.red(error.message));
      return
    }
  }
}
After the video addition step, the code prompts the user to input a question or command related to the YouTube video. The program then enters a while loop, processing the user input until the command .exit is entered.
During each iteration of the loop, the code checks if the user input contains a YouTube URL. If it does, the system attempts to add the video via addVideo(videoURL), using the extracted URL. If successful, the user is prompted again to ask a question related to the YouTube video.
If the user's input is not a video URL, the system processes the input using the askLLM() function, which interacts with a language model to generate a response based on the user query.
If a response (llmResponse) is obtained, it's displayed to the user in green as part of the conversation. Otherwise, the user is prompted to retry their question.
Throughout this process, error handling is implemented, with any encountered errors being logged in red before terminating the function's execution.
Run the following command to start the application:
node app.js
When prompted, add a YouTube video URL from which you would like to extract information, and then write the questions that you would like to ask. The responses show that the application is working as expected and that you are now able to query information about the video.
Conclusion
In this tutorial, you learned how to use the RAG technique with the OpenAI API and LangChain to create a time-saving and cost-effective command line app that lets you fetch information from a YouTube video using its URL, without the need to watch the video.
Along the way, you saw how to retrieve video transcripts, generate and store free embeddings, and then use them to extract useful information with an OpenAI model.