YouTube Playlist Transcript Tool
This tool extracts transcripts from all videos in a public YouTube playlist and combines them into a single text file. It's ideal for researchers, educators, content creators, or students who want to analyze or archive spoken content across multiple videos.
Features
- Automatically fetches video URLs from a given playlist
- Retrieves auto-generated or manually created transcripts (where available)
- Saves each video's transcript under a header with its title and URL, with a timestamp on every line
- Combines all transcripts into one clean, readable .txt file
- Supports playlists with up to hundreds of videos
- Handles errors gracefully (e.g., unavailable videos, no transcript)
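Before any network request is made, a playlist URL can be sanity-checked by looking for its `list` query parameter. The following is a minimal sketch using only the standard library; `extract_playlist_id` is a hypothetical helper for illustration and is not part of the script or of pytube.

```python
from urllib.parse import urlparse, parse_qs


def extract_playlist_id(url):
    """Return the value of the `list` query parameter, or None if absent.

    Hypothetical helper: it only inspects the URL string and does not
    check that the playlist exists or is public.
    """
    query = parse_qs(urlparse(url).query)
    values = query.get("list")
    return values[0] if values else None
```

A caller could reject the input early when the function returns None, instead of waiting for the playlist fetch to fail.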
Requirements
- Python 3.7+
- youtube_transcript_api
- pytube
Install dependencies:
pip install youtube-transcript-api pytube
Usage
Run the script from the command line:
python main.py --playlist_url "https://www.youtube.com/playlist?list=..." --output transcripts.txt
The --output argument is optional and defaults to transcripts.txt. The script as listed always writes video titles and timestamps; it has no flag to turn them off.
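The script listed in this post defines only `--playlist_url` and `--output`. If switches for titles and timestamps are wanted, they could be added along these lines. This is a sketch: the `--no-titles` and `--no-timestamps` flags are hypothetical names, not options the script currently accepts.

```python
import argparse


def build_parser():
    """Build the command-line parser, including two hypothetical switches."""
    parser = argparse.ArgumentParser(description="Fetch transcripts from a YouTube playlist.")
    parser.add_argument("--playlist_url", required=True, help="URL of the YouTube playlist")
    parser.add_argument("--output", default="transcripts.txt", help="Output file path")
    # Hypothetical flags, not present in the script as listed.
    parser.add_argument("--no-titles", action="store_true", help="Omit the title/URL header")
    parser.add_argument("--no-timestamps", action="store_true", help="Omit per-line timestamps")
    return parser
```

argparse turns the dashes into underscores, so the parsed values are read as `args.no_titles` and `args.no_timestamps`.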
Output Format
Each transcript entry includes:
- Video title
- Video URL
- Transcript text, each line prefixed with its start time in seconds (for example, [12s])
- Separator line between videos
The resulting file can be used for summarization, search, or offline reading.
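The entry layout described above can be expressed as a single formatting function. This is a sketch that mirrors what the script writes. It assumes each transcript segment is a dict with `start` (seconds) and `text` keys; `format_entry` and its `with_timestamps` switch are hypothetical names introduced for illustration.

```python
SEPARATOR = "-" * 80


def format_entry(title, video_id, segments, with_timestamps=True):
    """Render one playlist entry: header, transcript lines, separator.

    `segments` is assumed to be a list of dicts with 'start' and 'text'
    keys. `with_timestamps` is a hypothetical switch for illustration.
    """
    lines = [
        f"# Title: {title}",
        f"# URL: https://youtube.com/watch?v={video_id}",
        "",
    ]
    for seg in segments:
        # Start times are rounded to whole seconds, as in the script.
        prefix = f"[{seg['start']:.0f}s] " if with_timestamps else ""
        lines.append(prefix + seg["text"])
    lines += ["", SEPARATOR, ""]
    return "\n".join(lines) + "\n"
```

Keeping the formatting in one pure function makes the layout easy to test without touching the network or the file system.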
Limitations
- Only works with videos that have transcripts enabled (either auto-generated or manual)
- Private or unavailable videos are skipped
- Very long playlists may trigger rate limiting by YouTube, even though no API key is required
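One common way to soften rate limiting is to retry a failed request after an increasing delay. The script does not do this; the following is a minimal sketch of exponential backoff, with `with_retries` as a hypothetical helper. The `sleep` parameter exists only so the delay can be replaced in tests.

```python
import time


def with_retries(fetch, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch() up to `attempts` times, doubling the delay after each failure.

    Hypothetical helper: the last exception is re-raised if every attempt fails.
    """
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception:
            if attempt == attempts - 1:
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
```

In the script, the transcript call could be wrapped as `with_retries(lambda: YouTubeTranscriptApi.get_transcript(video_id))`. In practice it would be better to retry only on the error that signals throttling, since a video with no transcript will fail the same way on every attempt.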
Example Use Cases
- Academic research on video lecture series
- Creating searchable documentation from tutorial playlists
- Building datasets for NLP projects
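For the search and dataset use cases, the combined file can be split back into per-video records by relying on the header lines and the 80-dash separator that the script writes. This is a sketch under that assumption; `parse_entries` and `search` are hypothetical helpers, not part of the tool.

```python
SEPARATOR = "-" * 80


def parse_entries(text):
    """Split a combined transcript file into dicts with title, url and body."""
    entries = []
    for chunk in text.split(SEPARATOR):
        lines = chunk.strip().splitlines()
        # Skip empty trailing chunks and anything without the expected header.
        if len(lines) < 2:
            continue
        if not lines[0].startswith("# Title: ") or not lines[1].startswith("# URL: "):
            continue
        entries.append({
            "title": lines[0][len("# Title: "):],
            "url": lines[1][len("# URL: "):],
            "body": "\n".join(lines[2:]).strip(),
        })
    return entries


def search(entries, keyword):
    """Return titles of entries whose transcript contains keyword (case-insensitive)."""
    needle = keyword.lower()
    return [e["title"] for e in entries if needle in e["body"].lower()]
```

Typical use would be `search(parse_entries(open("transcripts.txt", encoding="utf-8").read()), "gradient")`. A transcript that itself contains a line of 80 dashes would confuse the split, which is unlikely but worth knowing.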
License
MIT
import argparse

from youtube_transcript_api import YouTubeTranscriptApi
from pytube import Playlist


def get_transcript(video_id):
    """Return the transcript as '[<start>s] <text>' lines, or a placeholder on failure."""
    try:
        # get_transcript is the static helper from older releases of
        # youtube-transcript-api; newer releases may require the
        # instance-based API instead.
        transcript = YouTubeTranscriptApi.get_transcript(video_id)
        return '\n'.join(f"[{entry['start']:.0f}s] {entry['text']}" for entry in transcript)
    except Exception as e:
        return f"[Transcript not available: {e}]"


def main(playlist_url, output_file):
    """Write every video's title, URL and transcript to output_file."""
    playlist = Playlist(playlist_url)
    with open(output_file, 'w', encoding='utf-8') as f:
        for video in playlist.videos:
            try:
                title = video.title
                video_id = video.video_id
                # Header: title and URL, followed by a blank line.
                f.write(f"# Title: {title}\n")
                f.write(f"# URL: https://youtube.com/watch?v={video_id}\n\n")
                transcript = get_transcript(video_id)
                f.write(f"{transcript}\n\n")
                # Separator line between videos.
                f.write("-" * 80 + "\n\n")
                print(f"Downloaded: {title}")
            except Exception as e:
                # Skip videos whose metadata cannot be read (private, removed).
                print(f"Failed to process video: {e}")
    print(f"\nTranscripts saved to {output_file}")


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Fetch transcripts from a YouTube playlist.')
    parser.add_argument('--playlist_url', type=str, required=True, help='URL of the YouTube playlist')
    parser.add_argument('--output', type=str, default='transcripts.txt', help='Output file path')
    args = parser.parse_args()
    main(args.playlist_url, args.output)