YouTube Playlist Transcript Tool
This simple command-line tool downloads the full transcript of every video in a YouTube playlist and saves them into a single merged text file. It uses open-source libraries to fetch video metadata, extract available transcripts, and combine them with clear section headers for readability.
Features
- Automatically processes all videos in a public YouTube playlist
- Fetches auto-generated or manually created transcripts where available
- Each video's transcript is prefixed with its title and video URL
- Outputs a clean, organized .txt file suitable for reading or further processing
- Lightweight: only two dependencies, both installable with pip
Installation
You'll need Python 3.7 or later. Install the required packages with:
pip install youtube-transcript-api pytube
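Before running the tool, you can confirm that the interpreter and both packages are in place. The following is a minimal, stdlib-only sketch. The `check_requirements` helper is hypothetical and not part of the tool. It maps each pip package name to the module it installs (`youtube_transcript_api` and `pytube`, matching the script's imports).

```python
import importlib.util
import sys

# pip package name -> module name it installs (matches the script's imports).
REQUIRED_MODULES = {
    "youtube-transcript-api": "youtube_transcript_api",
    "pytube": "pytube",
}


def check_requirements(minimum=(3, 7), required=REQUIRED_MODULES):
    """Return a list of problems; an empty list means everything is in place."""
    problems = []
    if sys.version_info[:2] < minimum:
        problems.append("Python %d.%d+ is required" % minimum)
    for pip_name, module in required.items():
        # find_spec() returns None for a top-level module that is not installed.
        if importlib.util.find_spec(module) is None:
            problems.append("missing package: " + pip_name)
    return problems
```

For example, `print("\n".join(check_requirements()) or "All requirements satisfied.")` prints either the list of problems or a confirmation.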
Usage
Run the script from the command line:
python main.py --playlist-url "https://www.youtube.com/playlist?list=..." --output output.txt
The tool fetches all video IDs from the playlist, attempts to retrieve each video's transcript, and saves everything into the specified output file. If a video has no transcript available, its section in the output contains a "[Transcript not available]" placeholder, and the tool continues with the next video.
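Playlist and video identifiers live in the URL's query string (`list=` and `v=`). Splitting the string on `v=` keeps any trailing parameters such as `&index=2`, so parsing the query is more robust. This is a minimal stdlib sketch. `playlist_id` and `video_id` are hypothetical helpers, and only the `watch?v=` and `youtu.be/` URL shapes are handled.

```python
from urllib.parse import parse_qs, urlparse


def playlist_id(url):
    """Return the value of the 'list' query parameter, or None."""
    values = parse_qs(urlparse(url).query).get("list")
    return values[0] if values else None


def video_id(url):
    """Return the video ID from a watch URL or a youtu.be short link, or None."""
    parsed = urlparse(url)
    if parsed.netloc.endswith("youtu.be"):
        # Short links carry the ID as the path: https://youtu.be/<id>
        return parsed.path.lstrip("/") or None
    # Watch URLs carry it as ?v=<id>, possibly followed by &list=, &index=, ...
    values = parse_qs(parsed.query).get("v")
    return values[0] if values else None
```

Checking `playlist_id(args.playlist_url) is None` is a cheap way to reject a plain video URL passed to `--playlist-url` before any network call is made.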
Output Format
Each transcript section looks like:
=== Video: [Title] ===
URL: https://www.youtube.com/watch?v=abc123
Transcript:
[Full text here...]
---
This structure makes it easy to parse later or simply read through.
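Because every section follows the same fixed layout, the merged file can be split back into per-video records with a regular expression. This is a minimal sketch that assumes the exact layout written by `process_playlist()` in the script (header lines plus `---` separators). `format_section` and `parse_sections` are hypothetical helpers, not part of the tool.

```python
import re

# Matches one section in the layout shown under "Output Format".
SECTION_RE = re.compile(
    r"=== Video: (?P<title>[^\n]*) ===\n"
    r"URL: (?P<url>\S+)\n"
    r"(?:Transcript:\n(?P<text>.*?)|\[Transcript not available\])\n\n---\n",
    re.DOTALL,
)


def format_section(title, url, transcript):
    """Build one section in the same layout process_playlist() writes."""
    if transcript:
        body = f"Transcript:\n{transcript}"
    else:
        body = "[Transcript not available]"
    return f"=== Video: {title} ===\nURL: {url}\n{body}\n\n---\n\n"


def parse_sections(text):
    """Yield (title, url, transcript_or_None) tuples from a merged file."""
    for m in SECTION_RE.finditer(text):
        yield m.group("title"), m.group("url"), m.group("text")
```

To read a saved file, pass its contents in: `parse_sections(open("output.txt", encoding="utf-8").read())`. One caveat: a transcript that itself contained a blank line followed by `---` would be cut short at that point.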
Notes
- Only works with public playlists, and only for videos that have transcripts enabled
- Transcripts depend on YouTube's availability (either auto-generated or provided by creators)
- Respect content copyrights: this tool is intended for personal or research use
Dependencies
- pytube: extracts playlist and video metadata
- youtube-transcript-api: fetches the transcript text
No API key is needed, making setup fast and simple.
License
MIT: feel free to modify or extend.
```python
import argparse
import sys
from urllib.parse import parse_qs, urlparse

from pytube import Playlist, YouTube
from youtube_transcript_api import YouTubeTranscriptApi


def get_transcript(video_id):
    """Return the whole transcript as one string, or None if unavailable."""
    try:
        # Static API from the 0.x releases of youtube-transcript-api; it returns
        # a list of {'text', 'start', 'duration'} dicts. Newer releases move to
        # an instance API: YouTubeTranscriptApi().fetch(video_id).
        transcript = YouTubeTranscriptApi.get_transcript(video_id)
        return ' '.join(t['text'] for t in transcript)
    except Exception:
        # Transcripts disabled, private/removed video, network error, etc.
        return None


def get_video_id(url):
    """Read the 'v' query parameter, ignoring extras such as &list= or &index=."""
    return parse_qs(urlparse(url).query).get('v', [''])[0]


def get_title(url):
    """Look up the individual video's title; pytube can fail, so fall back."""
    try:
        return YouTube(url).title
    except Exception:
        return "Unknown Title"


def process_playlist(playlist_url, output_file):
    playlist = Playlist(playlist_url)
    with open(output_file, 'w', encoding='utf-8') as f:
        f.write(f"# Transcript dump from YouTube playlist\n# URL: {playlist_url}\n\n")
        for url in playlist.video_urls:
            print(f"Processing: {url}")
            title = get_title(url)
            transcript = get_transcript(get_video_id(url))

            # Every section shares the same header; only the body differs.
            f.write(f"=== Video: {title} ===\n")
            f.write(f"URL: {url}\n")
            if transcript:
                f.write(f"Transcript:\n{transcript}\n\n---\n\n")
            else:
                print(f"Warning: no transcript available for {url}", file=sys.stderr)
                f.write("[Transcript not available]\n\n---\n\n")
    print(f"\nTranscripts saved to {output_file}")


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Fetch transcripts from all videos in a YouTube playlist.')
    parser.add_argument('--playlist-url', type=str, required=True, help='URL of the YouTube playlist')
    parser.add_argument('--output', type=str, default='transcript_output.txt', help='Output file path')
    args = parser.parse_args()
    process_playlist(args.playlist_url, args.output)
```
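If you want to keep timing information, you can swap the join step inside `get_transcript()` for a formatter. This is a minimal sketch. `segments_to_text` and its `timestamps` flag are hypothetical additions, not part of the tool. They assume the segment dicts (`text`, `start`, `duration`) that `get_transcript()` returns.

```python
def _clean(text):
    """Collapse line breaks and repeated spaces inside one caption."""
    return " ".join(text.split())


def segments_to_text(segments, timestamps=False):
    """Flatten transcript segments into text.

    Assumes each segment is a dict with 'text' and 'start' (in seconds).
    With timestamps=False the result is one paragraph, like the script's
    output but with whitespace normalized; with timestamps=True each
    segment becomes its own '[mm:ss] text' line.
    """
    if not timestamps:
        return " ".join(t for t in (_clean(s["text"]) for s in segments) if t)
    lines = []
    for seg in segments:
        # Minutes are not wrapped at 60, so a 75-minute mark prints as [75:00].
        minutes, seconds = divmod(int(seg["start"]), 60)
        lines.append(f"[{minutes:02d}:{seconds:02d}] {_clean(seg['text'])}")
    return "\n".join(lines)
```

Inside `get_transcript()`, you would return `segments_to_text(transcript, timestamps=True)` instead of the `' '.join(...)` call. The `parse_sections`-style layout stays the same, because the timestamped text still sits between the `Transcript:` line and the `---` separator.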