🛠️ yt_playlist_transcript: Extract full transcripts from YouTube playlists into a single text file

#youtube #transcript #productivity

YouTube Playlist Transcript Tool

This command-line tool downloads the transcript of every video in a YouTube playlist and merges them into a single text file. It uses open-source libraries to list the playlist's videos, fetch whatever transcripts are available, and join them under clear section headers for readability.

Features

Automatically processes all videos in a public YouTube playlist
Fetches auto-generated or manually created transcripts where available
Each video's transcript is prefixed with its title and video URL
Outputs a clean, organized .txt file suitable for reading or further processing
Lightweight: depends on only two pip-installable Python packages

Installation

You'll need Python 3.7+ installed. Install required packages using:

pip install youtube-transcript-api pytube

Usage

Run the script from the command line:

python main.py --playlist-url "https://www.youtube.com/playlist?list=..." --output output.txt

The tool will fetch all video IDs from the playlist, attempt to retrieve their transcripts, and save everything into the specified output file. If a video has no transcript available, the tool writes a "[Transcript not available]" placeholder for it and continues with the next video.
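
The first step above, turning each playlist entry into a video ID, can be sketched with the standard library alone. The helper name `extract_video_id` is hypothetical and not part of the tool; it assumes watch-style URLs (`...watch?v=ID`) and short `youtu.be/ID` links, and may need adjusting for other URL shapes.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID from a watch-style or youtu.be URL, else None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()

    # Short links carry the ID as the first path segment: youtu.be/<id>
    if host == "youtu.be":
        video_id = parsed.path.lstrip("/").split("/")[0]
        return video_id or None

    # Watch links carry the ID in the "v" query parameter; parse_qs
    # separates it from trailing parameters such as &list= or &t=.
    if host == "youtube.com" or host.endswith(".youtube.com"):
        values = parse_qs(parsed.query).get("v")
        if values:
            return values[0]

    return None


if __name__ == "__main__":
    print(extract_video_id("https://www.youtube.com/watch?v=abc123&list=PLxyz"))
    print(extract_video_id("https://youtu.be/abc123?t=10"))
```

Parsing the query string, rather than splitting on `v=`, keeps extra parameters out of the ID and returns `None` for URLs that carry no video at all.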

Output Format

Each transcript section looks like:

=== Video: [Title] ===
URL: https://www.youtube.com/watch?v=abc123
Transcript:
[Full text here...]

---

This structure makes it easy to parse later or simply read through.
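
To illustrate how the layout above might be parsed later, here is a minimal sketch that renders one section and reads sections back out of a merged file. The names `format_section` and `parse_sections` are hypothetical and not part of the tool, and the regular expression assumes the exact header, URL, and separator lines shown above.

```python
import re
from typing import Dict, List

# One section: header line, URL line, "Transcript:" line, text, then "---".
SECTION_RE = re.compile(
    r"^=== Video: (?P<title>[^\n]*) ===\n"
    r"URL: (?P<url>[^\n]*)\n"
    r"Transcript:\n(?P<text>.*?)\n\n---\n",
    re.MULTILINE | re.DOTALL,
)


def format_section(title: str, url: str, text: str) -> str:
    """Render one video in the layout shown above."""
    return (
        f"=== Video: {title} ===\n"
        f"URL: {url}\n"
        f"Transcript:\n{text}\n\n---\n\n"
    )


def parse_sections(merged: str) -> List[Dict[str, str]]:
    """Return title/url/text records for every section with a transcript.

    Sections without a "Transcript:" line (for example a placeholder
    for a missing transcript) do not match the pattern and are skipped.
    """
    return [match.groupdict() for match in SECTION_RE.finditer(merged)]


if __name__ == "__main__":
    sample = format_section("Intro", "https://www.youtube.com/watch?v=abc123", "hello world")
    for record in parse_sections(sample):
        print(record["title"], len(record["text"]))
```

Because the pattern is anchored on the header line, any file-level preamble before the first section is ignored.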

Notes

Only works with public playlists; each video must also have a transcript available
Transcript availability depends on YouTube (auto-generated captions or captions provided by the creator)
Respect content copyrights; this tool is intended for personal or research use
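
Since availability varies from video to video, it can help to keep track of which videos had no transcript. The sketch below is one possible way to do that, not the tool's own code: `collect_transcripts` and `fake_fetch` are hypothetical names, and `fake_fetch` is a stand-in with made-up data so the example runs without network access.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple


def collect_transcripts(
    video_ids: Iterable[str],
    fetch: Callable[[str], Optional[str]],
) -> Tuple[Dict[str, str], List[str]]:
    """Return (transcripts keyed by video ID, IDs that had no transcript)."""
    found: Dict[str, str] = {}
    missing: List[str] = []
    for video_id in video_ids:
        try:
            text = fetch(video_id)
        except Exception as exc:
            # A real fetcher may raise when captions are disabled or absent;
            # warn and move on instead of aborting the whole playlist.
            print(f"Warning: {video_id}: {exc}")
            text = None
        if text:
            found[video_id] = text
        else:
            missing.append(video_id)
    return found, missing


def fake_fetch(video_id: str) -> Optional[str]:
    """Stand-in for a real transcript fetcher (hypothetical data)."""
    if video_id == "broken":
        raise RuntimeError("transcripts disabled")
    return {"abc123": "hello world"}.get(video_id)


if __name__ == "__main__":
    found, missing = collect_transcripts(["abc123", "nope", "broken"], fake_fetch)
    print(f"{len(found)} transcript(s) found, {len(missing)} missing")
```

Passing the fetcher in as a parameter keeps the loop testable offline; in real use it would be replaced by a function that calls the transcript library.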

Dependencies

pytube: For extracting playlist and video metadata
youtube-transcript-api: To fetch actual transcript text

No API key is needed, making setup fast and simple.

License

MIT — feel free to modify or extend.

main.py

import argparse
from youtube_transcript_api import YouTubeTranscriptApi
from pytube import Playlist


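def get_transcript(video_id):
    """Return one video's transcript as a single string, or None if unavailable."""
    try:
        # Classic static interface; each entry is a dict with a 'text' field.
        # Newer library releases may differ, so pin the version if this fails.
        transcript = YouTubeTranscriptApi.get_transcript(video_id)
        return ' '.join(t['text'] for t in transcript)
    except Exception as e:
        # Transcripts may be disabled or missing; warn and let the caller continue.
        print(f"Warning: no transcript for {video_id}: {e}")
        return None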


def process_playlist(playlist_url, output_file):
    """Write the transcript of every video in the playlist to output_file."""
    playlist = Playlist(playlist_url)

    with open(output_file, 'w', encoding='utf-8') as f:
        f.write(f"# Transcript dump from YouTube playlist\n# URL: {playlist_url}\n\n")

        # playlist.videos yields one pytube video object per playlist entry.
        for video in playlist.videos:
            video_id = video.video_id
            url = f"https://www.youtube.com/watch?v={video_id}"

            try:
                # Use the video's own title rather than the playlist's;
                # the lookup needs a network request and can fail.
                title = video.title
            except Exception:
                title = "Unknown Title"

            print(f"Processing: {url}")
            transcript = get_transcript(video_id)

            # The header is the same whether or not a transcript was found.
            f.write(f"=== Video: {title} ===\n")
            f.write(f"URL: {url}\n")
            if transcript:
                f.write(f"Transcript:\n{transcript}\n\n---\n\n")
            else:
                f.write("[Transcript not available]\n\n---\n\n")

    print(f"\nTranscripts saved to {output_file}")


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Fetch transcripts from all videos in a YouTube playlist.')
    parser.add_argument('--playlist-url', type=str, required=True, help='URL of the YouTube playlist')
    parser.add_argument('--output', type=str, default='transcript_output.txt', help='Output file path')

    args = parser.parse_args()

    process_playlist(args.playlist_url, args.output)