Samagra Shrivastava

YouTube Video Transcripts Using LangChain

This post demonstrates how to use the LangChain library to load and save the transcript of a YouTube video. The Python script retrieves the video's transcript, prints it, and writes the content to a text file for further use.

Let's go through the code line by line:

from langchain.document_loaders import youtube
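  • This line imports the youtube module from the langchain.document_loaders package. This module defines YoutubeLoader, the class used below to load YouTube transcripts. In recent LangChain releases the document loaders are provided by the langchain_community package, so the import path may need to be langchain_community.document_loaders instead.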
import io
  • This line imports the io module from Python's standard library, which provides tools for working with streams and I/O operations. In Python 3, io.open is the same function as the built-in open, so this import is optional.
loader = youtube.YoutubeLoader.from_youtube_url("https://www.youtube.com/watch?v=3OvmwM61vJw")
  • This line creates an instance of YoutubeLoader by calling the from_youtube_url class method. The method takes a YouTube URL as an argument and initializes the loader object to handle the video at the specified URL.
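To make the URL handling concrete, here is a minimal sketch of how a video ID can be pulled out of a YouTube URL with the standard library. The helper name extract_video_id and the list of accepted hosts are assumptions made for illustration; this is not LangChain's own implementation, only a picture of the kind of parsing from_youtube_url has to do before it can fetch anything.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Hosts accepted by this sketch (an assumption, not an exhaustive list)
WATCH_HOSTS = ("youtube.com", "www.youtube.com", "m.youtube.com")


def extract_video_id(url: str) -> Optional[str]:
    """Hypothetical helper: return the video ID from a YouTube URL, or None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    # Short links look like https://youtu.be/<id>
    if host == "youtu.be":
        return parsed.path.lstrip("/") or None
    # Standard watch links carry the ID in the "v" query parameter
    if host in WATCH_HOSTS:
        values = parse_qs(parsed.query).get("v")
        return values[0] if values else None
    return None


print(extract_video_id("https://www.youtube.com/watch?v=3OvmwM61vJw"))
```

For the URL used in this post, the helper returns the part after v=, which is the identifier a transcript lookup needs.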
docs = loader.load()
  • This line calls the load method on the loader object. The method fetches the video's transcript and returns it as a list of Document objects, which is stored in the docs variable. Each Document holds the transcript text in its page_content attribute and information about the source video in its metadata attribute. YoutubeLoader relies on the youtube-transcript-api package, which has to be installed separately.
print(docs)
  • This line prints the docs variable to the console. This helps in debugging or understanding what data has been loaded from the YouTube video.
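Printing docs directly dumps the whole transcript to the console, which can be hard to read for a long video. As an optional alternative, the sketch below prints a short preview per document. The preview function and its width parameter are hypothetical names introduced here, and the SimpleNamespace objects are stand-ins for LangChain documents so that the example runs without LangChain or network access; the only thing it assumes about a document is the page_content attribute used later in this post.

```python
from types import SimpleNamespace


def preview(docs, width=80):
    """Return one short line per document: index, length, and first characters."""
    lines = []
    for i, doc in enumerate(docs):
        text = doc.page_content
        # Truncate long transcripts and mark the cut with an ellipsis
        snippet = text[:width] + ("..." if len(text) > width else "")
        lines.append(f"[{i}] {len(text)} chars: {snippet}")
    return lines


# Stand-in documents; with real data, pass the list returned by loader.load()
sample_docs = [SimpleNamespace(page_content="hello world " * 20)]
for line in preview(sample_docs, width=24):
    print(line)
```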
with io.open("transcript.txt", "w", encoding="utf-8") as f:
  • This line opens a file named transcript.txt in write mode with UTF-8 encoding. The with statement ensures that the file is properly opened and will be automatically closed after the indented block of code is executed. The file object is assigned to the variable f.
    for doc in docs:
  • This line starts a for loop that iterates over each document object in the docs list.
        f.write(doc.page_content)
  • Within the loop, this line writes the page_content attribute of each document object to the file f. This attribute holds the text content of the document, which here is the transcript of the YouTube video.
    f.close()
  • This line closes the file f. However, since the file was opened using the with statement, it will be closed automatically even if this line is omitted. Including it is redundant but does not cause any issues.
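The steps above can also be gathered into two small functions, shown here as a sketch and not as the author's script. The function names load_transcript_docs and save_transcript are hypothetical, the langchain_community import path is an assumption that depends on the installed LangChain version, and the redundant f.close() is left out because the with block already closes the file.

```python
def load_transcript_docs(url):
    """Load the transcript of the video at `url` as a list of documents."""
    # Imported lazily so save_transcript can be used without LangChain installed
    try:
        # Assumed location in recent LangChain releases
        from langchain_community.document_loaders import YoutubeLoader
    except ImportError:
        # Package used by the import in this post
        from langchain.document_loaders import YoutubeLoader
    return YoutubeLoader.from_youtube_url(url).load()


def save_transcript(docs, path="transcript.txt"):
    """Write each document's page_content to `path`; return characters written."""
    written = 0
    # The with block closes the file on exit, so no explicit close() is needed
    with open(path, "w", encoding="utf-8") as f:
        for doc in docs:
            written += f.write(doc.page_content)
    return written
```

Calling save_transcript(load_transcript_docs("https://www.youtube.com/watch?v=3OvmwM61vJw")) should then do the same work as the walkthrough, minus the print call. It needs network access and the packages mentioned above.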

Summary

This code loads the transcript of a YouTube video, prints the loaded documents to the console, and writes the content of these documents to a file named transcript.txt.

Top comments (3)

Ben Link

Thanks for a quick & easy tutorial!

Samagra Shrivastava

You're welcome! I'm glad I could help.

Yagyesh Bobde

A great share!!

I personally created a tool that uses existing transcripts from YouTube videos to enhance the video learning experience. I have made a post about it here: supaclip.pro