🛠️ yt_playlist_transcript: Download and combine transcripts from a YouTube playlist into a single text file

#youtube #transcript #productivity

YouTube Playlist Transcript Tool

This tool extracts the transcripts of every video in a public YouTube playlist and combines them into a single text file. Each transcript is prefixed with the video title and URL for easy reference. Researchers, content creators, and students can use it to analyze video content in bulk with ordinary text-processing tools.

Features

Retrieves every video ID in a YouTube playlist using yt-dlp.
Fetches manually created or auto-generated transcripts using youtube-transcript-api.
Saves all transcripts into one .txt file, each headed by the video's title and URL.
Skips videos without a transcript instead of aborting the whole run.
Simple command-line interface.
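Each video's transcript arrives as a list of timed caption segments, and the script joins their text with spaces. Caption text often contains embedded line breaks, so a minimal sketch of the joining step might also collapse whitespace. The helper name join_segments and the segment shape (mappings with a 'text' key, as returned by older youtube-transcript-api releases) are assumptions for illustration.

```python
from typing import Iterable, Mapping


def join_segments(segments: Iterable[Mapping[str, object]]) -> str:
    """Join caption segments into one line of plain text.

    Hypothetical helper: assumes each segment is a mapping with a 'text' key.
    """
    words = []
    for segment in segments:
        # split() with no argument breaks on any whitespace, including the
        # newlines captions often contain, and drops empty pieces.
        words.extend(str(segment.get("text", "")).split())
    # Rejoin with single spaces; whitespace-only segments contribute nothing.
    return " ".join(words)


if __name__ == "__main__":
    sample = [
        {"text": "hello\nworld", "start": 0.0, "duration": 1.5},
        {"text": "  ", "start": 1.5, "duration": 0.5},
        {"text": "again", "start": 2.0, "duration": 1.0},
    ]
    print(join_segments(sample))  # hello world again
```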

Requirements

Python 3.9+ (recent yt-dlp releases no longer support older versions)
Packages: youtube-transcript-api, yt-dlp

Install dependencies:

pip install youtube-transcript-api yt-dlp

Usage

Run the script from the command line:

python main.py --playlist-url "https://www.youtube.com/playlist?list=..." --output transcripts.txt

Optional arguments:

--output: Output filename (default: transcripts.txt)
--lang: Preferred language code (e.g., en, es; default: en)
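The script hands --playlist-url directly to yt-dlp, so a malformed URL typically only fails once a network request is made. To fail fast, a hypothetical helper like the one below could pull the list= parameter out with the standard library first. The helper name, the accepted host list, and the decision to reject URLs without list= are assumptions, not behavior of the original script.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Assumed set of hosts to accept; adjust as needed.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com",
                 "music.youtube.com", "youtu.be"}


def extract_playlist_id(url: str) -> Optional[str]:
    """Return the 'list' query parameter of a YouTube URL, or None.

    Hypothetical validation helper for --playlist-url.
    """
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host not in YOUTUBE_HOSTS:
        return None  # not a YouTube URL
    # parse_qs drops blank values by default, so 'list=' counts as missing.
    values = parse_qs(parsed.query).get("list")
    return values[0] if values else None


if __name__ == "__main__":
    print(extract_playlist_id("https://www.youtube.com/playlist?list=PL123"))  # PL123
    print(extract_playlist_id("https://example.com/playlist?list=PL123"))  # None
```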

Output Format

Each video's transcript starts with:

# Video: <title>
# URL: https://youtube.com/watch?v=<id>

[Transcript text...]

---

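Because every entry starts with the same two header lines and ends with a --- separator line, the file is easy to split back into per-video entries or to search with standard text tools.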
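A minimal sketch of splitting the combined file back into records, assuming the exact layout the script writes (two header lines, a blank line, the transcript, then a --- line). The function name parse_transcripts is illustrative, and entries that do not match the layout are silently skipped.

```python
from typing import Dict, List

VIDEO_PREFIX = "# Video: "
URL_PREFIX = "# URL: "


def parse_transcripts(text: str) -> List[Dict[str, str]]:
    """Split the combined transcript file into title/url/text records."""
    records = []
    # Entries are separated by a line containing only '---'.
    for block in text.split("\n---\n"):
        block = block.strip()
        if not block:
            continue
        lines = block.split("\n")
        # Skip anything that does not start with the two expected header lines.
        if (len(lines) < 2 or not lines[0].startswith(VIDEO_PREFIX)
                or not lines[1].startswith(URL_PREFIX)):
            continue
        records.append({
            "title": lines[0][len(VIDEO_PREFIX):],
            "url": lines[1][len(URL_PREFIX):],
            # Everything after the headers (minus the blank line) is the transcript.
            "text": "\n".join(lines[2:]).strip(),
        })
    return records


if __name__ == "__main__":
    with open("transcripts.txt", encoding="utf-8") as f:
        for record in parse_transcripts(f.read()):
            print(record["title"], record["url"], len(record["text"]))
```

Reading the file in text mode keeps this working on Windows, where the newlines written by the script are translated back to \n on read.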

Notes

Works only with public playlists, and only for videos that have captions (manually created or auto-generated) in the requested language.
Videos without such a transcript are skipped, and a message naming the video ID is printed for each one.
Respect YouTube's Terms of Service and copyright when using this tool.
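The skip-on-missing behavior lives in the loop inside main(). One way to test that loop without network access is to pass the transcript fetcher in as a function. The sketch below is a hypothetical refactoring (write_combined and its parameters are not part of the original script) that writes the same entry layout and returns the IDs it skipped.

```python
import io
from typing import Callable, Iterable, List, Optional, TextIO, Tuple


def write_combined(
    videos: Iterable[Tuple[str, str]],
    fetch: Callable[[str], Optional[str]],
    out: TextIO,
) -> List[str]:
    """Write one entry per (video_id, title) pair; return skipped video IDs.

    `fetch` returns the transcript text, or None when none is available.
    """
    skipped = []
    for video_id, title in videos:
        url = f"https://youtube.com/watch?v={video_id}"
        transcript = fetch(video_id)
        if not transcript:
            skipped.append(video_id)  # no transcript: record it and move on
            continue
        out.write(f"# Video: {title}\n")
        out.write(f"# URL: {url}\n\n")
        out.write(f"{transcript}\n\n---\n\n")
    return skipped


if __name__ == "__main__":
    fake = {"a1": "first transcript"}  # stand-in for a real transcript fetcher
    buf = io.StringIO()
    missing = write_combined([("a1", "One"), ("b2", "Two")], fake.get, buf)
    print(missing)  # ['b2']
    print(buf.getvalue())
```

Returning the skipped IDs, rather than only printing them, would also let a caller write a summary at the end of the run.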

License

MIT

import argparse

import yt_dlp
from youtube_transcript_api import YouTubeTranscriptApi


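def get_playlist_video_ids(playlist_url):
    """Return the IDs of all videos in a playlist without downloading them."""
    ydl_opts = {
        'quiet': True,
        # List playlist entries without fully extracting each video (much faster).
        'extract_flat': True,
    }
    with yt_dlp.YoutubeDL(ydl_opts) as ydl:
        info = ydl.extract_info(playlist_url, download=False)
    # Skip missing entries (e.g. private or deleted videos) instead of crashing.
    return [entry['id'] for entry in info.get('entries') or [] if entry and entry.get('id')]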


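def get_transcript(video_id, lang='en'):
    """Return the transcript as one string, or None if none is available."""
    try:
        # Instance-based API of youtube-transcript-api 1.x; the older static
        # YouTubeTranscriptApi.get_transcript() is deprecated as of 1.0.
        fetched = YouTubeTranscriptApi().fetch(video_id, languages=[lang])
        return ' '.join(snippet.text for snippet in fetched)
    except Exception:
        # Transcripts disabled, none in `lang`, video unavailable, etc.
        return None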


def main(playlist_url, output_file, lang):
    print("Fetching video list...")
    video_ids = get_playlist_video_ids(playlist_url)

    with open(output_file, 'w', encoding='utf-8') as f:
        for i, vid in enumerate(video_ids, 1):
            print(f"Processing {i}/{len(video_ids)}: {vid}")
            url = f'https://youtube.com/watch?v={vid}'

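            try:
                # Separate lookup just to read the video title; the context
                # manager closes the YoutubeDL instance after each video.
                with yt_dlp.YoutubeDL({'quiet': True}) as yt:
                    info = yt.extract_info(url, download=False)
                title = info.get('title', 'Untitled')
            except Exception:
                title = 'Untitled'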

            transcript = get_transcript(vid, lang)
            if transcript:
                f.write(f"# Video: {title}\n")
                f.write(f"# URL: {url}\n\n")
                f.write(f"{transcript}\n\n---\n\n")
            else:
                print(f"No transcript available for: {vid}")

    print(f"Transcripts saved to {output_file}")


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Download transcripts from a YouTube playlist.')
    parser.add_argument('--playlist-url', required=True, help='YouTube playlist URL')
    parser.add_argument('--output', default='transcripts.txt', help='Output file name')
    parser.add_argument('--lang', default='en', help='Transcript language (e.g., en, es)')
    args = parser.parse_args()

    main(args.playlist_url, args.output, args.lang)
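main() runs a second yt-dlp extraction for every video only to read its title. With 'extract_flat' enabled, the playlist entries usually already carry an 'id' and a 'title', so a hypothetical variant could read both from the single playlist request. The helper below (videos_from_playlist_info is an illustrative name) only post-processes the info dict, so it runs without network access; the assumption that flat entries include 'title' should be checked against your yt-dlp version, and the 'Untitled' fallback mirrors the script.

```python
from typing import Any, Dict, List, Tuple


def videos_from_playlist_info(info: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (video_id, title) pairs from a flat yt-dlp playlist info dict.

    Entries that are None or lack an 'id' (e.g. unavailable videos) are skipped.
    """
    videos = []
    for entry in info.get("entries") or []:
        if not entry or not entry.get("id"):
            continue
        # Fall back to the same placeholder title the script uses.
        videos.append((entry["id"], entry.get("title") or "Untitled"))
    return videos


# Assumed usage with yt-dlp (not executed here):
# with yt_dlp.YoutubeDL({'quiet': True, 'extract_flat': True}) as ydl:
#     info = ydl.extract_info(playlist_url, download=False)
# for video_id, title in videos_from_playlist_info(info):
#     ...
```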