Golden Alien

🛠️ yt_playlist_transcript: Download and combine transcripts from a YouTube playlist into a single

YouTube Playlist Transcript Tool

This tool extracts transcripts from all videos in a public YouTube playlist and combines them into a single text file. It's ideal for researchers, educators, content creators, or students who want to analyze or archive spoken content across multiple videos.

Features

  • Automatically fetches video URLs from a given playlist
  • Retrieves auto-generated or manually created transcripts (where available)
  • Writes each video's transcript under a header with its title and URL
  • Combines all transcripts into one clean, readable .txt file
  • Supports playlists with up to hundreds of videos
  • Continues past failures: a video that cannot be processed is reported and skipped, and a missing transcript is recorded as a placeholder line

Requirements

  • Python 3.7+
  • youtube_transcript_api
  • pytube

Install dependencies:

pip install youtube-transcript-api pytube

Usage

Run the script from the command line:

python main.py --playlist_url "https://www.youtube.com/playlist?list=..." --output transcripts.txt

The --output option sets the output path and defaults to transcripts.txt. The script as posted always writes video titles and timestamps; it has no option to turn them off.

Output Format

Each transcript entry includes:

  • Video title
  • Video URL
  • Transcript text, each line prefixed with its start time in seconds
  • Separator line between videos

The resulting file can be used for summarization, search, or offline reading.
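
Because every entry follows the same layout, the combined file can be split back into per-video records for search or summarization. The following is a minimal sketch, assuming the exact "# Title: " and "# URL: " header lines and the 80-dash separator written by the script in this post. `parse_transcripts` and `_to_record` are hypothetical helpers, not part of the tool.

```python
SEPARATOR = "-" * 80  # same separator line the script writes between videos


def _to_record(lines):
    """Turn the lines of one entry into a dict with title, url and transcript."""
    record = {"title": None, "url": None, "transcript": ""}
    body = []
    for line in lines:
        if record["title"] is None and line.startswith("# Title: "):
            record["title"] = line[len("# Title: "):]
        elif record["url"] is None and line.startswith("# URL: "):
            record["url"] = line[len("# URL: "):]
        else:
            body.append(line)
    record["transcript"] = "\n".join(body).strip()
    return record


def parse_transcripts(text):
    """Split the combined file contents into a list of per-video records."""
    records, current = [], []
    for line in text.splitlines():
        if line == SEPARATOR:
            # A separator closes the current entry; ignore entries that are only blank lines.
            if any(item.strip() for item in current):
                records.append(_to_record(current))
            current = []
        else:
            current.append(line)
    # Keep a trailing entry that was not followed by a separator.
    if any(item.strip() for item in current):
        records.append(_to_record(current))
    return records
```

With the file read into a string, `parse_transcripts(open("transcripts.txt", encoding="utf-8").read())` would return one dict per video, which can then be filtered or indexed as needed.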

Limitations

  • Only works with videos that have transcripts enabled (either auto-generated or manual)
  • Private or unavailable videos are skipped
  • Extremely long playlists may trigger rate limits (though no API key is required)
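
One way to reduce the impact of rate limits on long playlists is to retry failed requests with an increasing delay. The sketch below is an illustration under assumptions: `fetch_with_backoff`, its parameters, and the delay values are hypothetical and not part of the script, and `fetch` stands for any callable that raises on failure, such as `YouTubeTranscriptApi.get_transcript`.

```python
import time


def fetch_with_backoff(fetch, video_id, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(video_id), retrying with exponential backoff on any exception.

    `sleep` is injectable so the delays can be skipped in tests.
    """
    for attempt in range(retries):
        try:
            return fetch(video_id)
        except Exception:
            if attempt == retries - 1:
                raise  # out of attempts: let the caller decide how to report it
            # Wait 1s, then 2s with the defaults, doubling after each failure.
            sleep(base_delay * (2 ** attempt))
```

Retrying on every exception is a simplification; a transcript that is disabled will never succeed, so a real version would probably retry only on errors that look temporary.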

Example Use Cases

  • Academic research on video lecture series
  • Creating searchable documentation from tutorial playlists
  • Building datasets for NLP projects

License

MIT

import argparse
from youtube_transcript_api import YouTubeTranscriptApi
from pytube import Playlist


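def get_transcript(video_id):
    """Return the transcript as '[<start>s] <text>' lines, or a placeholder on failure."""
    try:
        # Static call from pre-1.0 releases of youtube-transcript-api; newer releases
        # changed this interface, so an older version may need to be pinned.
        transcript = YouTubeTranscriptApi.get_transcript(video_id)
        return '\n'.join(f"[{entry['start']:.0f}s] {entry['text']}" for entry in transcript)
    except Exception as e:
        # No transcript (or any other lookup error): record the reason in the output file.
        return f"[Transcript not available: {e}]"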

def main(playlist_url, output_file):
    """Write the transcript of every video in the playlist to output_file."""
    playlist = Playlist(playlist_url)
    with open(output_file, 'w', encoding='utf-8') as f:
        for video in playlist.videos:
            try:
                # Reading the title can raise for private or unavailable videos.
                title = video.title
                video_id = video.video_id
                f.write(f"# Title: {title}\n")
                f.write(f"# URL: https://youtube.com/watch?v={video_id}\n\n")
                transcript = get_transcript(video_id)
                f.write(f"{transcript}\n\n")
                # Separator line between videos.
                f.write("-" * 80 + "\n\n")
                print(f"Downloaded: {title}")
            except Exception as e:
                # Report the failure and continue with the rest of the playlist.
                print(f"Failed to process video: {e}")
    print(f"\nTranscripts saved to {output_file}")

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Fetch transcripts from a YouTube playlist.')
    parser.add_argument('--playlist_url', type=str, required=True, help='URL of the YouTube playlist')
    parser.add_argument('--output', type=str, default='transcripts.txt', help='Output file path')
    args = parser.parse_args()

    main(args.playlist_url, args.output)
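
The script above always prefixes each transcript line with its start time. A minimal sketch of how timestamps could be made optional follows. `format_transcript` and its `include_timestamps` parameter are hypothetical names, and the entries are assumed to be dicts with `start` and `text` keys, as used in `get_transcript` above.

```python
def format_transcript(entries, include_timestamps=True):
    """Render transcript entries as text, one caption per line."""
    lines = []
    for entry in entries:
        if include_timestamps:
            # Same "[<seconds>s] text" layout the script writes.
            lines.append(f"[{entry['start']:.0f}s] {entry['text']}")
        else:
            lines.append(entry["text"])
    return "\n".join(lines)
```

A matching command-line switch could be added with `parser.add_argument('--no_timestamps', action='store_true')` and passed through to this function; that flag name is likewise an assumption, not an option the script currently has.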
