🛠️ yt_playlist_transcriber: Download and merge transcripts from a YouTube playlist into a single t

#youtube #transcript #productivity

YouTube Playlist Transcriber

This tool downloads transcripts from all videos in a public YouTube playlist and merges them into a single text file. Each transcript is prefixed with the video title and URL for easy reference. This is ideal for researchers, content creators, or students who want to analyze or archive spoken content across multiple videos.

Features

Extracts transcripts using the youtube-transcript-api library.
Automatically fetches all video URLs from a given playlist.
Saves a clean, organized text file with video titles, URLs, and full transcripts.
Handles missing or disabled transcripts gracefully.
Supports playlists with up to hundreds of videos.
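
As a rough illustration of the "clean" text mentioned above, here is a minimal sketch of joining transcript segments into one whitespace-normalized string. It assumes each segment is a dict with a 'text' key, which is how the script reads them. The join_segments helper and the sample data are hypothetical and not part of main.py.

```python
import re


def join_segments(segments):
    """Join transcript segments into one whitespace-normalized string."""
    # Assumption: each segment is a dict carrying its caption under 'text'.
    text = " ".join(seg["text"] for seg in segments)
    # Collapse newlines and repeated spaces left over from caption line breaks.
    return re.sub(r"\s+", " ", text).strip()


# Hypothetical sample segments, for illustration only.
sample = [
    {"text": "In this video,\nI discuss", "start": 0.0, "duration": 2.1},
    {"text": "how artificial  intelligence...", "start": 2.1, "duration": 3.0},
]
print(join_segments(sample))
```

The script itself joins segments with a single space and does not normalize whitespace, so this is an optional refinement rather than a description of current behavior.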

Requirements

Python 3.7+
youtube-transcript-api
pytube

Install dependencies:

pip install youtube-transcript-api pytube

Usage

Run the script from the command line:

python main.py --playlist-url "https://www.youtube.com/playlist?list=..." --output output.txt

To list all options, including the default output path, run:

python main.py -h
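
The --playlist-url value is expected to carry a playlist ID in its "list" query parameter. A minimal sketch of a pre-flight check, using only the standard library, might look like the following. The extract_playlist_id helper and the example IDs are hypothetical and not part of main.py.

```python
from urllib.parse import urlparse, parse_qs


def extract_playlist_id(url):
    """Return the value of the 'list' query parameter, or None if absent."""
    query = parse_qs(urlparse(url).query)
    values = query.get("list")
    return values[0] if values else None


# Hypothetical URLs, for illustration only.
print(extract_playlist_id("https://www.youtube.com/playlist?list=PL123"))
print(extract_playlist_id("https://www.youtube.com/watch?v=abc123"))
```

A None result suggests the URL points at a single video rather than a playlist. The check cannot tell whether the playlist is public.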

Output Format

The output file contains:

Video Title
Video URL
Transcript Text
Separator between videos

Example:

Title: How AI Will Transform Education
URL: https://www.youtube.com/watch?v=abc123
Transcript:
In this video, I discuss how artificial intelligence...

----------------------------------------
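
Because every entry follows the same layout, the merged file can be split back into per-video records. The following is a minimal sketch that assumes the separator is exactly 40 hyphens on its own line, as written by the script, and that no transcript contains such a line. The parse_merged_transcripts helper and the second sample entry are hypothetical.

```python
SEPARATOR = "-" * 40


def parse_merged_transcripts(text):
    """Split merged output into a list of {'title', 'url', 'transcript'} dicts."""
    entries = []
    for block in text.split("\n" + SEPARATOR + "\n"):
        lines = block.strip().splitlines()
        # Skip the empty trailing chunk and anything that lacks the three header lines.
        if len(lines) < 3 or not lines[0].startswith("Title: "):
            continue
        if not lines[1].startswith("URL: ") or lines[2].strip() != "Transcript:":
            continue
        entries.append({
            "title": lines[0][len("Title: "):],
            "url": lines[1][len("URL: "):],
            "transcript": "\n".join(lines[3:]).strip(),
        })
    return entries


# Sample text in the layout shown above; the second entry is made up.
sample = "".join([
    "Title: How AI Will Transform Education\n",
    "URL: https://www.youtube.com/watch?v=abc123\n",
    "Transcript:\n",
    "In this video, I discuss how artificial intelligence...\n",
    "\n" + SEPARATOR + "\n\n",
    "Title: Hypothetical Second Video\n",
    "URL: https://www.youtube.com/watch?v=def456\n",
    "Transcript:\n",
    "[Transcript not available]\n",
    "\n" + SEPARATOR + "\n\n",
])

entries = parse_merged_transcripts(sample)
for entry in entries:
    print(entry["title"], "|", entry["url"])
```

Entries whose transcript equals "[Transcript not available]" can then be filtered out before analysis.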

Notes

Only works with videos that have transcripts available (either auto-generated or manual).
Private or unlisted playlists are not supported.
Some regions may have limited transcript availability.

License

MIT

import argparse
import sys
from pytube import Playlist
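from youtube_transcript_api import (
    YouTubeTranscriptApi,
    TranscriptsDisabled,
    NoTranscriptFound,
)


def get_transcript_text(video_id):
    """Return the transcript as one string, or None if the video has none."""
    try:
        # Static call as used by this script; newer releases of the
        # library may require a different entry point.
        transcript = YouTubeTranscriptApi.get_transcript(video_id)
        return ' '.join(t['text'] for t in transcript)
    except (TranscriptsDisabled, NoTranscriptFound):
        # Disabled or missing transcripts are reported as "not available".
        return None
    except Exception as e:
        # Any other failure is recorded inline so the run can continue.
        return f'[Error retrieving transcript: {e}]'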


def process_playlist(playlist_url, output_file):
    try:
        playlist = Playlist(playlist_url)
    except Exception as e:
        print(f"Error loading playlist: {e}")
        sys.exit(1)

    with open(output_file, 'w', encoding='utf-8') as f_out:
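        # Playlist.videos yields pytube YouTube objects, which expose the
        # video ID and title directly (Playlist has no get_info method).
        for video in playlist.videos:
            video_id = video.video_id
            url = f"https://www.youtube.com/watch?v={video_id}"
            title = "Unknown Title"
            try:
                title = video.title
            except Exception:
                # Title lookup needs a network request and can fail;
                # keep the placeholder in that case.
                pass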

            f_out.write(f"Title: {title}\n")
            f_out.write(f"URL: {url}\n")
            f_out.write("Transcript:\n")

            transcript_text = get_transcript_text(video_id)
            if transcript_text is None:
                f_out.write("[Transcript not available]\n")
            else:
                f_out.write(f"{transcript_text}\n")

            f_out.write("\n" + "-" * 40 + "\n\n")

    print(f"Transcripts saved to {output_file}")


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Fetch transcripts from a YouTube playlist.')
    parser.add_argument('--playlist-url', required=True, help='URL of the YouTube playlist')
    parser.add_argument('--output', default='transcripts.txt', help='Output file path (default: transcripts.txt)')
    args = parser.parse_args()

    process_playlist(args.playlist_url, args.output)