Golden Alien
🛠️ yt_playlist_transcript: Download and combine transcripts from a YouTube playlist into a single

YouTube Playlist Transcript Tool

This tool allows you to extract transcripts from all videos in a public YouTube playlist and combine them into a single text file. Each transcript is prefixed with the video title and URL for easy reference. This is useful for researchers, content creators, or students who want to analyze video content in bulk using text processing tools.

Features

  • Automatically retrieves all video IDs from a given YouTube playlist.
  • Fetches manually created or auto-generated transcripts using youtube-transcript-api.
  • Saves all transcripts into one organized .txt file.
  • Handles errors gracefully (e.g., videos without transcripts).
  • Simple command-line interface.

Requirements

  • Python 3.7+
  • Packages: youtube-transcript-api, yt-dlp

Install dependencies:

pip install youtube-transcript-api yt-dlp

Usage

Run the script from the command line:

python main.py --playlist-url "https://www.youtube.com/playlist?list=..." --output transcripts.txt

Optional arguments:

  • --output: Output filename (default: transcripts.txt)
  • --lang: Preferred language code (e.g., en, es; default: en)
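
A quick sanity check on the --playlist-url value can save a wasted run. The sketch below assumes the playlist is identified by the list= query parameter, as in the usage example above. The helper name extract_playlist_id is hypothetical and is not part of the script.

```python
from urllib.parse import parse_qs, urlparse


def extract_playlist_id(playlist_url):
    """Return the value of the `list` query parameter, or None if absent.

    Hypothetical helper for illustration; the script itself passes the
    full URL straight to yt-dlp.
    """
    query = parse_qs(urlparse(playlist_url).query)
    values = query.get("list")
    return values[0] if values else None


if __name__ == "__main__":
    # A watch URL without a list= parameter is rejected before any network call.
    for candidate in (
        "https://www.youtube.com/playlist?list=PL123",
        "https://www.youtube.com/watch?v=abc",
    ):
        print(candidate, "->", extract_playlist_id(candidate))
```

This only checks the shape of the URL. Whether the playlist actually exists and is public is still decided when yt-dlp fetches it.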

Output Format

Each video's transcript starts with:

# Video: <title>
# URL: https://youtube.com/watch?v=<id>

[Transcript text...]

---

This format makes it easy to parse or search later.
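
As a rough illustration of that, here is a minimal sketch that splits a combined file back into one record per video. It assumes the exact layout shown above (two header lines, the transcript, then a line containing only ---). The function name parse_transcripts is hypothetical and not part of the tool.

```python
def parse_transcripts(text):
    """Split the combined output into dicts with title, url, and transcript.

    Hypothetical helper; assumes the '# Video:' / '# URL:' header layout
    and the '---' separator line described in the Output Format section.
    """
    records = []
    for block in text.split("\n---\n"):
        block = block.strip()
        if not block:
            continue
        lines = block.split("\n")
        has_header = (
            len(lines) >= 2
            and lines[0].startswith("# Video: ")
            and lines[1].startswith("# URL: ")
        )
        if not has_header:
            continue  # skip anything that does not match the documented header
        records.append({
            "title": lines[0][len("# Video: "):],
            "url": lines[1][len("# URL: "):],
            "transcript": "\n".join(lines[2:]).strip(),
        })
    return records


if __name__ == "__main__":
    with open("transcripts.txt", encoding="utf-8") as f:
        for record in parse_transcripts(f.read()):
            print(record["title"], len(record["transcript"].split()), "words")
```

One caveat: a transcript that itself contains a line consisting only of --- would be split incorrectly, so treat this as a starting point.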

Notes

  • Only works with public playlists whose videos have transcripts (captions) available.
  • Some videos may not have transcripts available; these are skipped with a warning.
  • Respect YouTube's Terms of Service and copyright when using this tool.
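
The skip-with-a-warning behavior in the notes above can be sketched without touching the network. In this sketch, fetch stands in for whatever function returns a transcript string or None for a video ID. The name collect_transcripts and the warning wording are assumptions for illustration, not the script's own.

```python
import sys


def collect_transcripts(video_ids, fetch):
    """Return (found, skipped): transcripts keyed by ID, and IDs with none.

    `fetch` is any callable mapping a video ID to transcript text or None.
    Hypothetical helper; the real script writes to a file as it goes.
    """
    found, skipped = {}, []
    for vid in video_ids:
        text = fetch(vid)
        if text:
            found[vid] = text
        else:
            # Warn on stderr so the message does not mix with normal output.
            print(f"Warning: no transcript for {vid}, skipping", file=sys.stderr)
            skipped.append(vid)
    return found, skipped


if __name__ == "__main__":
    # A dict lookup plays the role of the transcript fetcher here.
    fake_transcripts = {"vid1": "hello world", "vid2": None}
    found, skipped = collect_transcripts(["vid1", "vid2", "vid3"], fake_transcripts.get)
    print(f"{len(found)} saved, {len(skipped)} skipped: {skipped}")
```

Keeping a list of skipped IDs makes it easy to print a summary at the end or retry those videos with a different language code.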

License

MIT

import argparse

import yt_dlp
from youtube_transcript_api import YouTubeTranscriptApi


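def get_playlist_video_ids(playlist_url):
    """Return the IDs of all videos in the playlist without downloading them."""
    ydl_opts = {
        'quiet': True,
        'extract_flat': True,  # list the entries only; skip per-video extraction
    }
    with yt_dlp.YoutubeDL(ydl_opts) as ydl:
        info = ydl.extract_info(playlist_url, download=False)
        # Guard against a missing 'entries' key or empty entries.
        return [entry['id'] for entry in info.get('entries') or [] if entry]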


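def get_transcript(video_id, lang='en'):
    """Return the transcript as a single string, or None if it cannot be fetched."""
    try:
        # get_transcript() is the older static interface of youtube-transcript-api.
        # Newer releases may require a different call, so check the installed
        # version if every video reports "No transcript available".
        transcript_list = YouTubeTranscriptApi.get_transcript(video_id, languages=[lang])
        return ' '.join(t['text'] for t in transcript_list)
    except Exception:
        # Covers disabled transcripts, missing languages, and network errors.
        return None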


def main(playlist_url, output_file, lang):
    print("Fetching video list...")
    video_ids = get_playlist_video_ids(playlist_url)

    with open(output_file, 'w', encoding='utf-8') as f:
        for i, vid in enumerate(video_ids, 1):
            print(f"Processing {i}/{len(video_ids)}: {vid}")
            url = f'https://youtube.com/watch?v={vid}'

            try:
                # Look up the title; fall back to a placeholder on any failure.
                with yt_dlp.YoutubeDL({'quiet': True}) as yt:
                    info = yt.extract_info(url, download=False)
                title = info.get('title') or 'Untitled'
            except Exception:
                title = 'Untitled'

            transcript = get_transcript(vid, lang)
            if transcript:
                f.write(f"# Video: {title}\n")
                f.write(f"# URL: {url}\n\n")
                f.write(f"{transcript}\n\n---\n\n")
            else:
                print(f"No transcript available for: {vid}")

    print(f"Transcripts saved to {output_file}")

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Download transcripts from a YouTube playlist.')
    parser.add_argument('--playlist-url', required=True, help='YouTube playlist URL')
    parser.add_argument('--output', default='transcripts.txt', help='Output file name')
    parser.add_argument('--lang', default='en', help='Transcript language (e.g., en, es)')
    args = parser.parse_args()

    main(args.playlist_url, args.output, args.lang)
