Samagra Shrivastava

Posted on Jun 3

YouTube Video Transcripts Using LangChain

#programming #python #machinelearning #ai

This post demonstrates how to use the LangChain library to load and save the transcript of a YouTube video. The Python script retrieves the video's transcript, prints it, and writes it to a text file for later use.

Let's go through the code line by line:

from langchain.document_loaders import youtube

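This line imports the youtube module from the langchain.document_loaders package. The module provides the YoutubeLoader class, which fetches a video's transcript. YoutubeLoader depends on the separate youtube-transcript-api package, so that package must also be installed. In more recent LangChain releases the document loaders live in the langchain_community package, where the equivalent import is from langchain_community.document_loaders import YoutubeLoader.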
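Because the loader's import path has moved between LangChain versions, a version-tolerant import can avoid confusing ImportErrors. The sketch below is an assumption-based example, not part of the original script: it tries the langchain_community location first, falls back to the older langchain.document_loaders path, and sets YoutubeLoader to None if neither package is installed.

```python
# Sketch (assumption): newer LangChain releases ship loaders in
# langchain_community; older ones expose them in langchain.document_loaders.
try:
    from langchain_community.document_loaders import YoutubeLoader
except ImportError:
    try:
        from langchain.document_loaders import YoutubeLoader
    except ImportError:
        YoutubeLoader = None  # neither package is installed

if YoutubeLoader is None:
    print("Install langchain-community and youtube-transcript-api to use YoutubeLoader.")
```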

import io

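This line imports the io module from Python's standard library, which provides tools for working with streams and file I/O. In Python 3, io.open is an alias for the built-in open function, so the script would behave the same with plain open() and no import at all.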
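The short example below checks the alias claim and shows another part of the io module, io.StringIO, an in-memory text stream that behaves like a file without touching the disk. It is an illustration only; the original script does not use StringIO.

```python
import io

# In Python 3, io.open and the built-in open are the same object.
print(io.open is open)  # True

# StringIO acts like a text file held in memory, handy for testing write logic.
buffer = io.StringIO()
buffer.write("first line\n")
buffer.write("second line\n")
contents = buffer.getvalue()
print(repr(contents))
```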

loader = youtube.YoutubeLoader.from_youtube_url("https://www.youtube.com/watch?v=3OvmwM61vJw")

This line creates an instance of YoutubeLoader by calling the from_youtube_url class method. The method takes a YouTube URL, extracts the video ID from it, and returns a loader configured to fetch that video's transcript.
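Turning a URL into a video ID is the first thing the loader has to do. The helper below is a hypothetical standard-library sketch of that step: extract_video_id is not LangChain's own code, and the set of URL shapes it accepts (watch, youtu.be, embed, and shorts links) is an assumption.

```python
from urllib.parse import parse_qs, urlparse


def extract_video_id(url: str) -> str:
    """Return the video ID from a YouTube URL (hypothetical helper)."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host in ("youtu.be", "www.youtu.be"):
        # Short links carry the ID as the path: https://youtu.be/<id>
        return parsed.path.lstrip("/")
    if host == "youtube.com" or host.endswith(".youtube.com"):
        if parsed.path == "/watch":
            # Standard links carry the ID in the "v" query parameter.
            ids = parse_qs(parsed.query).get("v")
            if ids:
                return ids[0]
        for prefix in ("/embed/", "/shorts/"):
            if parsed.path.startswith(prefix):
                return parsed.path[len(prefix):].split("/")[0]
    raise ValueError(f"Unsupported YouTube URL: {url}")


print(extract_video_id("https://www.youtube.com/watch?v=3OvmwM61vJw"))  # 3OvmwM61vJw
print(extract_video_id("https://youtu.be/3OvmwM61vJw"))  # 3OvmwM61vJw
```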

docs = loader.load()

This line calls the load method on the loader, which downloads the video's transcript and returns it as a list of Document objects, stored in docs. With the default settings the list typically holds a single Document whose page_content is the full transcript and whose metadata identifies the source video.
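To make the shape of docs concrete without calling YouTube, the sketch below uses a hypothetical stand-in Document dataclass reduced to the two attributes the script relies on. The transcript text and the "source" metadata key are placeholders chosen for illustration, not real loader output.

```python
from dataclasses import dataclass, field


# Hypothetical stand-in for LangChain's Document: only page_content and metadata.
@dataclass
class Document:
    page_content: str
    metadata: dict = field(default_factory=dict)


# Placeholder value shaped like a typical load() result for one video:
# a list holding a single Document whose text is the whole transcript.
docs = [
    Document(
        page_content="placeholder transcript text",
        metadata={"source": "3OvmwM61vJw"},
    )
]

for doc in docs:
    print(doc.metadata["source"], len(doc.page_content))
```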

print(docs)

This line prints the docs variable to the console. This helps in debugging or understanding what data has been loaded from the YouTube video.
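For a long transcript, print(docs) dumps the entire text in one block. A hypothetical summarize helper like the one below prints one line per document instead, with its length, metadata, and a short preview. The Document class here is again an illustrative stand-in, not LangChain's.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical stand-in for LangChain's Document (page_content + metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def summarize(docs, preview_chars=60):
    """Return one short line per document instead of the full repr."""
    lines = []
    for i, doc in enumerate(docs):
        preview = doc.page_content[:preview_chars].replace("\n", " ")
        lines.append(f"[{i}] {len(doc.page_content)} chars, metadata={doc.metadata}: {preview!r}")
    return "\n".join(lines)


docs = [Document("placeholder transcript text " * 10, {"source": "3OvmwM61vJw"})]
print(summarize(docs))
```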

with io.open("transcript.txt", "w", encoding="utf-8") as f:

This line opens a file named transcript.txt in write mode with UTF-8 encoding. The with statement ensures that the file is properly opened and will be automatically closed after the indented block of code is executed. The file object is assigned to the variable f.
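The with statement also closes the file when the block exits because of an error. The sketch below simulates a failure partway through writing; the temporary directory and the RuntimeError are illustrative assumptions, not part of the original script.

```python
import os
import tempfile

# Write into a fresh temporary directory so the example leaves transcript.txt alone.
path = os.path.join(tempfile.mkdtemp(), "transcript.txt")

try:
    with open(path, "w", encoding="utf-8") as f:
        f.write("partial transcript")
        raise RuntimeError("simulated failure while writing")
except RuntimeError:
    pass

# The with statement closed (and flushed) the file even though the block raised.
print(f.closed)  # True
```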

    for doc in docs:

This line starts a for loop that iterates over each document object in the docs list.

        f.write(doc.page_content)

Within the loop, this line writes each document's page_content attribute, which holds the document's text (here, the transcript), to the file f. No separator is written between documents, so if load returned several Document objects, their text would run together in the file.
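If a loader returns more than one document, joining them with a separator keeps the text readable. The write_documents function below is a hypothetical sketch of that variation; SimpleNamespace objects stand in for LangChain Document objects, and the blank-line separator is an arbitrary choice.

```python
import os
import tempfile
from types import SimpleNamespace


def write_documents(docs, path, separator="\n\n"):
    """Write every document's page_content to path, separated by `separator`.

    Returns the number of characters written.
    """
    text = separator.join(doc.page_content for doc in docs)
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    return len(text)


# Hypothetical data: two stand-in documents with a page_content attribute.
docs = [SimpleNamespace(page_content="first chunk"), SimpleNamespace(page_content="second chunk")]
path = os.path.join(tempfile.mkdtemp(), "transcript.txt")
written = write_documents(docs, path)
```

Building the full string first and writing it once also makes the separator logic easy to test on its own.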

    f.close()

This line explicitly closes f. Because the file was opened with a with statement, it is closed automatically when the block exits, so this line is redundant and can be removed. Leaving it in is harmless: when the with block exits, the second close() call on the already-closed file simply does nothing.
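The quick check below shows both points: the file is already closed after the with block, and calling close() again raises no error. It writes to a temporary path chosen for illustration rather than to transcript.txt.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "transcript.txt")

with open(path, "w", encoding="utf-8") as f:
    f.write("transcript text")
    # No explicit f.close() is needed inside the block.

print(f.closed)  # True: the with statement closed the file on exit

# Closing an already-closed file is a no-op, so the extra call is harmless.
f.close()
print(f.closed)  # True
```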

Summary

This code loads the transcript of a YouTube video, prints the loaded documents to the console, and writes the content of these documents to a file named transcript.txt.

Top comments (3)

Ben Link • Jun 3

Thanks for a quick & easy tutorial!

Samagra Shrivastava • Jun 4

You're welcome! I'm glad I could help.

Yagyesh Bobde • Jun 9

A great share!!

I personally created a tool which uses existing transcripts from youtube videos to enhance video learning experience. I have made a post about it here: supaclip.pro

DEV Community
