This post demonstrates how to use the LangChain library to load and save the transcript of a YouTube video. The Python script retrieves the video's transcript, prints it, and writes the content to a text file for further use.
Let's go through the code line by line:
from langchain.document_loaders import youtube
- This line imports the `youtube` module from the `langchain.document_loaders` package. This module is responsible for handling YouTube-related document loading functionalities.
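Depending on the installed LangChain version, this import path may emit a deprecation warning or be unavailable, because the document loaders were later moved into the separate `langchain-community` package. A minimal sketch of a version-tolerant import follows; the `None` fallback is an addition for illustration so the snippet runs even when LangChain is not installed.

```python
# Try the newer package location first, then the path used in this post.
try:
    from langchain_community.document_loaders import YoutubeLoader
except ImportError:
    try:
        from langchain.document_loaders import YoutubeLoader
    except ImportError:
        # Neither package is installed, so the loader is unavailable.
        YoutubeLoader = None

print("YoutubeLoader available:", YoutubeLoader is not None)
```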
import io
- This line imports the `io` module from Python's standard library, which provides tools for working with streams and I/O operations.
loader = youtube.YoutubeLoader.from_youtube_url("https://www.youtube.com/watch?v=3OvmwM61vJw")
- This line creates an instance of `YoutubeLoader` by calling the `from_youtube_url` class method. The method takes a YouTube URL as an argument and initializes the `loader` object to handle the video at the specified URL.
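To make the idea concrete, here is a rough sketch of how a video ID can be read from a YouTube URL using only the standard library. The helper `extract_video_id` is hypothetical and written for illustration; it is not LangChain's implementation, and it only handles the `watch?v=` and `youtu.be/` URL shapes.

```python
from urllib.parse import urlparse, parse_qs


def extract_video_id(url: str) -> str:
    """Return the video ID from a YouTube URL (hypothetical helper)."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host.endswith("youtu.be"):
        # Short links carry the ID in the path: https://youtu.be/<id>
        video_id = parsed.path.lstrip("/")
    else:
        # Watch links carry the ID in the "v" query parameter.
        video_id = parse_qs(parsed.query).get("v", [""])[0]
    if not video_id:
        raise ValueError(f"No video ID found in {url!r}")
    return video_id


print(extract_video_id("https://www.youtube.com/watch?v=3OvmwM61vJw"))
```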
docs = loader.load()
- This line calls the `load` method on the `loader` object. This method retrieves the video's transcript and stores the result in the `docs` variable. `docs` is a list of document objects.
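The shape of the returned value can be pictured with a small stand-in. The sketch below does not call LangChain: `FakeDocument` is a hypothetical class that mimics the two attributes a loaded document exposes (`page_content` and `metadata`), and the sample text and metadata are made up.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDocument:
    """Stand-in for a LangChain document; not the library's own class."""
    page_content: str
    metadata: dict = field(default_factory=dict)


# Made-up data that mimics what loader.load() hands back: a list of documents.
docs = [FakeDocument("hello and welcome to the video", {"source": "3OvmwM61vJw"})]

print(type(docs).__name__, len(docs))
print(docs[0].metadata["source"])
print(docs[0].page_content)
```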
print(docs)
- This line prints the `docs` variable to the console. This helps in debugging or understanding what data has been loaded from the YouTube video.
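Printing the whole list dumps the full transcript in one go, which can be hard to read for long videos. As an optional alternative, here is a small sketch that prints a short preview per document. `preview` is a hypothetical helper, and `SimpleNamespace` stands in for the loaded documents so the snippet runs without LangChain.

```python
from types import SimpleNamespace


def preview(docs, max_chars=80):
    """Return one short summary line per document (hypothetical helper)."""
    lines = []
    for i, doc in enumerate(docs):
        text = doc.page_content
        snippet = text[:max_chars] + ("..." if len(text) > max_chars else "")
        lines.append(f"[{i}] {len(text)} chars: {snippet}")
    return lines


# Stand-in for the list returned by loader.load().
docs = [SimpleNamespace(page_content="word " * 100)]
for line in preview(docs):
    print(line)
```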
with io.open("transcript.txt", "w", encoding="utf-8") as f:
- This line opens a file named `transcript.txt` in write mode with UTF-8 encoding. The `with` statement ensures that the file is closed automatically once the indented block finishes, even if an error occurs inside it. The file object is assigned to the variable `f`.
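A small sketch of this behavior, using a temporary directory so nothing is left on disk. It also shows that in Python 3 `io.open` is the same function as the built-in `open`, so the `io` import is optional for this script. The file name `demo.txt` is made up for the example.

```python
import io
import os
import tempfile

# In Python 3, io.open is an alias for the built-in open.
print(io.open is open)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("transcript text")
    # The with block has ended, so the file is already closed.
    print(f.closed)
    with open(path, encoding="utf-8") as f:
        content = f.read()

print(content)
```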
for doc in docs:
- This line starts a `for` loop that iterates over each document object in the `docs` list.
f.write(doc.page_content)
- Within the loop, this line writes the `page_content` attribute of each document object to the file `f`. This attribute holds the text content of the document, which here is the transcript of the YouTube video.
f.close()
- This line closes the file `f`. However, since the file was opened using the `with` statement, it will be closed automatically even if this line is omitted. Including it is redundant but does not cause any issues.
Summary
This code loads the transcript of a YouTube video, prints the loaded documents to the console, and writes the content of these documents to a file named transcript.txt.
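Putting the steps together, the script could be arranged as below. This is a sketch rather than the author's exact file: the function names `save_docs` and `main` are introduced here for illustration, the redundant `f.close()` is dropped, the built-in `open` replaces `io.open`, and the LangChain import sits inside `main` so that `save_docs` can be used on its own. It assumes LangChain is installed along with whatever transcript dependency the loader requires.

```python
def save_docs(docs, path="transcript.txt"):
    """Write the page_content of each document to a UTF-8 text file."""
    with open(path, "w", encoding="utf-8") as f:
        for doc in docs:
            f.write(doc.page_content)
    # No f.close() needed: the with block has already closed the file.


def main():
    # Imported here so the helper above works without LangChain installed.
    from langchain.document_loaders import youtube

    loader = youtube.YoutubeLoader.from_youtube_url(
        "https://www.youtube.com/watch?v=3OvmwM61vJw"
    )
    docs = loader.load()
    print(docs)
    save_docs(docs)
```

Calling `main()` runs the whole pipeline and needs network access; `save_docs` alone works with any objects that expose a `page_content` string.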