YouTube Playlist Transcript Tool
This tool allows you to extract transcripts from all videos in a public YouTube playlist and combine them into a single text file. Each transcript is prefixed with the video title and URL for easy reference. This is useful for researchers, content creators, or students who want to analyze video content in bulk using text processing tools.
Features
- Automatically retrieves all video IDs from a given YouTube playlist.
- Fetches auto-generated or available transcripts using youtube-transcript-api.
- Saves all transcripts into one organized .txt file.
- Handles errors gracefully (e.g., videos without transcripts).
- Simple command-line interface.
Requirements
- Python 3.7+
- Packages: youtube-transcript-api, yt-dlp (imported as yt_dlp in the script)
Install dependencies:
pip install youtube-transcript-api yt-dlp
Usage
Run the script from the command line:
python main.py --playlist-url "https://www.youtube.com/playlist?list=..." --output transcripts.txt
Optional arguments:
- --output: Output filename (default: transcripts.txt)
- --lang: Preferred language code (e.g., en, es; default: en)
Output Format
Each video's entry starts with a title and URL header, followed by the transcript text and a --- separator line:
# Video: <title>
# URL: https://youtube.com/watch?v=<id>
[Transcript text...]
---
This format makes it easy to parse or search later.
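As an illustration of parsing this format, here is a minimal sketch that splits the combined file back into one record per video. The function name `parse_transcripts` and the regular expression are assumptions made for this example and are not part of the tool. It also assumes the transcript text never contains a line consisting only of `---`.

```python
import re

# One record as written by the tool: a "# Video:" line, a "# URL:" line,
# a blank line, the transcript text, a blank line, then a "---" line.
RECORD_RE = re.compile(
    r"^# Video: (?P<title>[^\n]*)\n"
    r"# URL: (?P<url>\S+)\n\n"
    r"(?P<text>.*?)\n\n---\n",
    re.MULTILINE | re.DOTALL,
)


def parse_transcripts(content):
    """Return a list of {'title', 'url', 'text'} dicts from the combined file."""
    return [match.groupdict() for match in RECORD_RE.finditer(content)]
```

To use it, read the output file with `open("transcripts.txt", encoding="utf-8").read()` and pass the resulting string to `parse_transcripts`.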
Notes
- Only works with public playlists whose videos have transcripts (captions) available.
- Some videos may not have transcripts available; these are skipped with a warning.
- Respect YouTube's Terms of Service and copyright when using this tool.
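The skip-with-a-warning behavior described in the notes could be sketched as follows. This is a hypothetical helper, not the script's own code: `collect_transcripts` and its `fetch` parameter are names invented for this example, and `fetch` stands in for whatever call actually retrieves a transcript. Unlike a bare "no transcript" message, it records the reason each video was skipped.

```python
import logging

logger = logging.getLogger("playlist_transcripts")


def collect_transcripts(video_ids, fetch):
    """Fetch each transcript with `fetch`; return (found, skipped).

    `fetch` is any callable that takes a video ID and returns transcript
    text, or raises if no transcript can be retrieved (a hypothetical
    stand-in for the real API call).
    """
    found, skipped = {}, []
    for vid in video_ids:
        try:
            found[vid] = fetch(vid)
        except Exception as exc:  # broad on purpose: any failure means "skip"
            logger.warning("Skipping %s: %s", vid, exc)
            skipped.append(vid)
    return found, skipped
```

Passing the fetch function in as an argument keeps the loop testable without network access, and the `skipped` list can be reported at the end of a run.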
License
MIT
import argparse

import yt_dlp
from youtube_transcript_api import YouTubeTranscriptApi


def get_playlist_video_ids(playlist_url):
    """Return the IDs of all videos in the playlist without downloading them."""
    ydl_opts = {
        'quiet': True,
        'extract_flat': True,  # list the playlist entries only
    }
    with yt_dlp.YoutubeDL(ydl_opts) as ydl:
        info = ydl.extract_info(playlist_url, download=False)
    # Guard against empty entries in the returned list.
    return [entry['id'] for entry in info['entries'] if entry]


def get_transcript(video_id, lang='en'):
    """Return the transcript as a single string, or None if it cannot be fetched."""
    try:
        # get_transcript is the classic static interface; newer releases of
        # youtube-transcript-api may expose a different one.
        transcript_list = YouTubeTranscriptApi.get_transcript(video_id, languages=[lang])
        return ' '.join(t['text'] for t in transcript_list)
    except Exception:
        return None


def main(playlist_url, output_file, lang):
    print("Fetching video list...")
    video_ids = get_playlist_video_ids(playlist_url)

    with open(output_file, 'w', encoding='utf-8') as f:
        for i, vid in enumerate(video_ids, 1):
            print(f"Processing {i}/{len(video_ids)}: {vid}")
            url = f'https://youtube.com/watch?v={vid}'

            # Look up the video title; fall back to a placeholder on any failure.
            try:
                with yt_dlp.YoutubeDL({'quiet': True}) as ydl:
                    info = ydl.extract_info(url, download=False)
                title = info.get('title', 'Untitled')
            except Exception:
                title = 'Untitled'

            transcript = get_transcript(vid, lang)
            if transcript:
                f.write(f"# Video: {title}\n")
                f.write(f"# URL: {url}\n\n")
                f.write(f"{transcript}\n\n---\n\n")
            else:
                print(f"No transcript available for: {vid}")

    print(f"Transcripts saved to {output_file}")


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Download transcripts from a YouTube playlist.')
    parser.add_argument('--playlist-url', required=True, help='YouTube playlist URL')
    parser.add_argument('--output', default='transcripts.txt', help='Output file name')
    parser.add_argument('--lang', default='en', help='Transcript language (e.g., en, es)')
    args = parser.parse_args()
    main(args.playlist_url, args.output, args.lang)
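
The script above makes a separate yt-dlp request per video just to read its title. One possible refinement, sketched here under the assumption that the flat playlist entries already carry a `title` key (which may not hold for every playlist or yt-dlp version), is to take both the ID and the title from the playlist info in one pass. The helper name `entries_to_videos` is hypothetical.

```python
def entries_to_videos(info):
    """Return (video_id, title) pairs from a playlist info dict.

    `info` is assumed to look like the dict returned by
    extract_info(..., download=False) with 'extract_flat' enabled.
    """
    videos = []
    for entry in info.get("entries") or []:
        if not entry or not entry.get("id"):
            continue  # skip empty or malformed entries
        # Fall back to a placeholder when the entry has no title.
        videos.append((entry["id"], entry.get("title") or "Untitled"))
    return videos
```

If the assumption holds, the per-video title lookup in `main` could be dropped, which would roughly halve the number of requests for a long playlist.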