DEV Community

Golden Alien

🛠️ yt_playlist_transcript: Extract full transcripts from YouTube playlists into a single text file

YouTube Playlist Transcript Tool

This simple command-line tool downloads the full transcript from every video in a YouTube playlist and saves them into a single merged text file. It uses open source libraries to fetch video metadata, extract available transcripts, and combine them with clear section headers for readability.

Features

  • Automatically processes all videos in a public YouTube playlist
  • Fetches auto-generated or manually created transcripts where available
  • Each video's transcript is prefixed with its title and video URL
  • Outputs a clean, organized .txt file suitable for reading or further processing
  • Lightweight: relies on just two third-party packages, both installed with pip
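The title-and-URL prefix described above can be sketched as a small pure function. This is a minimal sketch, not the script's own code: format_section is a hypothetical name, and it assumes the youtu.be short-link form shown under Output Format.

```python
def format_section(title, video_id, transcript=None):
    """Render one video's block in the layout used by the merged .txt file.

    Hypothetical helper: takes the video title, its YouTube video ID, and the
    transcript text (None when no transcript could be fetched).
    """
    url = f"https://youtu.be/{video_id}"
    if transcript is None:
        body = "[Transcript not available]"
    else:
        body = f"Transcript:\n{transcript}"
    # Each section ends with a blank line, a "---" separator, and another blank line
    return f"=== Video: {title} ===\nURL: {url}\n{body}\n\n---\n\n"


if __name__ == "__main__":
    # Prints one section in the documented layout
    print(format_section("Intro to Sorting", "abc123", "Today we look at merge sort."), end="")
```

Returning a string instead of writing straight to a file keeps the layout easy to test in isolation.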

Installation

You'll need Python 3.7+ installed. Install the required packages with:

pip install youtube-transcript-api pytube

Usage

Save the script below as main.py, then run it from the command line:

python main.py --playlist-url "https://www.youtube.com/playlist?list=..." --output output.txt

The tool fetches every video ID in the playlist, tries to retrieve each video's transcript, and writes everything to the file given by --output (default: transcript_output.txt). If a video has no transcript available, the tool writes a [Transcript not available] placeholder for that video and continues with the next one.
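The skip-and-continue behaviour described above can be sketched without touching the network. This is a minimal sketch, assuming a hypothetical fetch callable that takes a video ID and returns transcript text, or returns None or raises when none exists (the script's get_transcript returns None in that case). The video_id_from_url and fetch_all names are also hypothetical; the former shows a sturdier way to read the v= parameter than splitting the URL string on "v=".

```python
import sys
from urllib.parse import parse_qs, urlparse


def video_id_from_url(url):
    """Return the video ID from a watch URL or a youtu.be short link, else None.

    Hypothetical helper: unlike url.split('v='), it ignores extra query
    parameters such as &list=... or &t=....
    """
    parsed = urlparse(url)
    if parsed.netloc == "youtu.be":
        # Short links carry the ID as the path: https://youtu.be/<id>
        return parsed.path.lstrip("/") or None
    # Watch URLs carry it as a query parameter: https://www.youtube.com/watch?v=<id>
    return parse_qs(parsed.query).get("v", [None])[0]


def fetch_all(urls, fetch):
    """Fetch a transcript for every URL, continuing past failures.

    Returns (results, missing): results is [(video_id, text_or_None), ...]
    in input order; missing lists the IDs that produced no transcript.
    """
    results, missing = [], []
    for url in urls:
        video_id = video_id_from_url(url)
        try:
            text = fetch(video_id)
        except Exception:  # one bad video must not abort the whole run
            text = None
        if text is None:
            print(f"Warning: no transcript for {video_id}, continuing", file=sys.stderr)
            missing.append(video_id)
        results.append((video_id, text))
    return results, missing


if __name__ == "__main__":
    fake = {"abc": "hello"}  # stand-in for real transcripts; "bad" raises KeyError
    urls = ["https://www.youtube.com/watch?v=abc&list=PL1", "https://youtu.be/bad"]
    print(fetch_all(urls, lambda vid: fake[vid]))
    # ([('abc', 'hello'), ('bad', None)], ['bad'])
```

Passing the fetcher in as a parameter lets you exercise the failure path with a fake before pointing it at YouTube.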

Output Format

Each transcript section looks like:

=== Video: [Title] ===
URL: https://youtu.be/abc123
Transcript:
[Full text here...]

---

This structure makes it easy to parse later or simply read through.
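As one example of parsing the merged file later, here is a minimal sketch that splits it back into per-video records. It assumes the exact layout shown above (sections closed by a blank line and a line containing only ---, and the [Transcript not available] placeholder for missing transcripts); parse_dump and SECTION_RE are hypothetical names, not part of the tool.

```python
import re

# One section: title line, URL line, then either "Transcript:\n<text>" or the
# placeholder, closed by a blank line and a line containing only "---".
SECTION_RE = re.compile(
    r"^=== Video: (?P<title>[^\n]*) ===\n"
    r"URL: (?P<url>\S+)\n"
    r"(?:Transcript:\n(?P<text>.*?)|\[Transcript not available\])\n\n---$",
    re.MULTILINE | re.DOTALL,
)


def parse_dump(text):
    """Split a merged transcript file into per-video records, in file order.

    Each record is {'title', 'url', 'transcript'}; 'transcript' is None for
    videos written with the [Transcript not available] placeholder.
    """
    return [
        {"title": m["title"], "url": m["url"], "transcript": m["text"]}
        for m in SECTION_RE.finditer(text)
    ]
```

To use it, read the output file with encoding="utf-8" and pass its contents to parse_dump. The "# ..." header lines at the top of the file are simply skipped because they never match the pattern.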

Notes

  • Only works with public playlists; videos whose transcripts are disabled get a placeholder instead of text
  • Transcript availability depends on YouTube: each video needs either an auto-generated or a creator-provided transcript
  • Respect content copyrights: this tool is intended for personal or research use

Dependencies

  • pytube: extracts playlist and video metadata
  • youtube-transcript-api: fetches the transcript text itself
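youtube-transcript-api returns a transcript as a sequence of timed caption snippets rather than one string, so the text has to be joined. A minimal sketch of that step, assuming snippets are either dicts with a 'text' key (the format older releases' get_transcript returns) or objects with a .text attribute (the snippet objects in 1.x fetch results); join_snippets is a hypothetical name. Caption lines often contain embedded newlines, so the sketch collapses every run of whitespace to a single space.

```python
from types import SimpleNamespace


def join_snippets(snippets):
    """Join caption snippets into one paragraph of plain text.

    Accepts dicts with a 'text' key or objects with a .text attribute.
    """
    parts = []
    for snippet in snippets:
        text = snippet["text"] if isinstance(snippet, dict) else snippet.text
        # split() with no argument splits on any whitespace, including '\n'
        words = text.split()
        if words:  # drop snippets that are empty or whitespace-only
            parts.append(" ".join(words))
    return " ".join(parts)


if __name__ == "__main__":
    old_style = [{"text": "welcome to\nthe course", "start": 0.0, "duration": 1.5},
                 {"text": "  today we cover  sorting", "start": 1.5, "duration": 2.0}]
    print(join_snippets(old_style))  # welcome to the course today we cover sorting
    new_style = [SimpleNamespace(text="a\nb")]  # mimics an object with .text
    print(join_snippets(new_style))  # a b
```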

No API key is needed, making setup fast and simple.

License

MIT: feel free to modify or extend.

import argparse
import sys

from pytube import Playlist
from youtube_transcript_api import YouTubeTranscriptApi


def get_transcript(video_id):
    """Return the video's transcript as one string, or None if unavailable."""
    try:
        if hasattr(YouTubeTranscriptApi, 'get_transcript'):
            # Older youtube-transcript-api releases: static method returning dicts
            snippets = YouTubeTranscriptApi.get_transcript(video_id)
            texts = [s['text'] for s in snippets]
        else:
            # 1.x releases: instance method returning snippet objects with .text
            texts = [s.text for s in YouTubeTranscriptApi().fetch(video_id)]
    except Exception:
        # Transcripts disabled, none found, video unavailable, network error, etc.
        return None
    # Caption lines may contain newlines; collapse all whitespace to single spaces
    return ' '.join(' '.join(t.split()) for t in texts)


def process_playlist(playlist_url, output_file):
    playlist = Playlist(playlist_url)

    with open(output_file, 'w', encoding='utf-8') as f:
        f.write(f"# Transcript dump from YouTube playlist\n# URL: {playlist_url}\n\n")

        # playlist.videos yields one pytube YouTube object per video, so each
        # section gets that video's own title (not the playlist title)
        for video in playlist.videos:
            video_id = video.video_id
            url = f"https://youtu.be/{video_id}"
            try:
                # Reading .title triggers a network request, which pytube can fail on
                title = video.title
            except Exception:
                title = "Unknown Title"

            print(f"Processing: {url}")
            transcript = get_transcript(video_id)

            f.write(f"=== Video: {title} ===\n")
            f.write(f"URL: {url}\n")
            if transcript:
                f.write(f"Transcript:\n{transcript}\n\n---\n\n")
            else:
                print(f"Warning: no transcript available for {url}", file=sys.stderr)
                f.write("[Transcript not available]\n\n---\n\n")

    print(f"\nTranscripts saved to {output_file}")


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Fetch transcripts from all videos in a YouTube playlist.')
    parser.add_argument('--playlist-url', type=str, required=True, help='URL of the YouTube playlist')
    parser.add_argument('--output', type=str, default='transcript_output.txt', help='Output file path')

    args = parser.parse_args()

    process_playlist(args.playlist_url, args.output)
