YouTube Playlist Transcriber
This tool downloads transcripts from all videos in a public YouTube playlist and merges them into a single text file. Each transcript is prefixed with the video title and URL for easy reference. This is ideal for researchers, content creators, or students who want to analyze or archive spoken content across multiple videos.
Features
- Extracts transcripts using the youtube-transcript-api library.
- Automatically fetches all video URLs from a given playlist.
- Saves a clean, organized text file with video titles, URLs, and full transcripts.
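- Handles missing or disabled transcripts gracefully: the affected entry gets a placeholder ([Transcript not available]) or an inline error note, and processing continues with the next video.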
- Supports playlists with up to hundreds of videos.
Requirements
- Python 3.7+
- youtube-transcript-api
- pytube
Install dependencies:
pip install youtube-transcript-api pytube
Usage
Run the script from the command line:
python main.py --playlist-url "https://www.youtube.com/playlist?list=..." --output output.txt
The --output flag is optional and defaults to transcripts.txt. To list all options:
python main.py -h
Output Format
The output file contains:
- Video Title
- Video URL
- Transcript Text
- Separator between videos
Example:
Title: How AI Will Transform Education
URL: https://www.youtube.com/watch?v=abc123
Transcript:
In this video, I discuss how artificial intelligence...
----------------------------------------
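For later analysis it can help to read the merged file back into structured records. The following is a minimal sketch, assuming the layout shown above (a Title line, a URL line, a Transcript: marker, the text, then a line of exactly 40 dashes). The helper name parse_transcripts is hypothetical and not part of the tool.

```python
import re
from typing import Dict, List

# A separator is a line made of exactly 40 dashes, as in the example above.
SEPARATOR_RE = re.compile(r"^-{40}$", flags=re.MULTILINE)


def parse_transcripts(text: str) -> List[Dict[str, str]]:
    """Split merged output into one dict per video (hypothetical helper)."""
    records = []
    for block in SEPARATOR_RE.split(text):
        lines = block.strip().splitlines()
        # Skip empty trailing chunks and anything that lacks the expected header.
        if len(lines) < 3:
            continue
        if not lines[0].startswith("Title: ") or not lines[1].startswith("URL: "):
            continue
        records.append({
            "title": lines[0][len("Title: "):],
            "url": lines[1][len("URL: "):],
            # Everything after the "Transcript:" marker line.
            "transcript": "\n".join(lines[3:]).strip(),
        })
    return records
```

Each record can then be filtered or searched by title, URL, or transcript text. A transcript that itself contains a line of exactly 40 dashes would be split incorrectly, so treat this as a starting point.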
Notes
- Only works with videos that have transcripts available (either auto-generated or manual).
- Private or unlisted playlists are not supported.
- Some regions may have limited transcript availability.
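Because some videos will have no transcript, it can be useful to check how much of a playlist was actually captured. The sketch below counts entries in an output file whose transcript is a placeholder. It assumes the two marker strings the script writes ("[Transcript not available]" and a line starting with "[Error retrieving transcript:") and that the transcript body begins on the line right after "Transcript:". The function name count_missing is hypothetical.

```python
from typing import Dict

PLACEHOLDER = "[Transcript not available]"      # written when transcripts are disabled
ERROR_PREFIX = "[Error retrieving transcript:"  # written on any other failure


def count_missing(text: str) -> Dict[str, int]:
    """Tally entries and how many lack a usable transcript (hypothetical helper)."""
    counts = {"total": 0, "unavailable": 0, "errors": 0}
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line != "Transcript:":
            continue
        counts["total"] += 1
        # The first line after the marker holds either text or a placeholder.
        body = lines[i + 1] if i + 1 < len(lines) else ""
        if body == PLACEHOLDER:
            counts["unavailable"] += 1
        elif body.startswith(ERROR_PREFIX):
            counts["errors"] += 1
    return counts
```

Reading the output file with encoding='utf-8' and passing its contents to count_missing gives a quick coverage summary for the playlist.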
License
MIT
# main.py: merge the transcripts of every video in a YouTube playlist into one text file.
import argparse
import sys

from pytube import Playlist, YouTube
from youtube_transcript_api import YouTubeTranscriptApi, TranscriptsDisabled


def get_transcript_text(video_id):
    """Return the transcript as one string, or None if transcripts are disabled."""
    try:
        # get_transcript is the library's older static interface;
        # newer releases may expose this differently.
        transcript = YouTubeTranscriptApi.get_transcript(video_id)
        return ' '.join(t['text'] for t in transcript)
    except TranscriptsDisabled:
        return None
    except Exception as e:
        # Any other failure is recorded inline so the run can continue.
        return f'[Error retrieving transcript: {e}]'


def process_playlist(playlist_url, output_file):
    """Write title, URL, and transcript for each playlist video to output_file."""
    try:
        playlist = Playlist(playlist_url)
    except Exception as e:
        print(f"Error loading playlist: {e}")
        sys.exit(1)

    with open(output_file, 'w', encoding='utf-8') as f_out:
        for url in playlist.video_urls:
            video_id = url.split('v=')[-1].split('&')[0]

            # Look up the title through a YouTube object for this URL.
            title = "Unknown Title"
            try:
                title = YouTube(url).title
            except Exception:
                pass  # keep the placeholder title if the lookup fails

            f_out.write(f"Title: {title}\n")
            f_out.write(f"URL: {url}\n")
            f_out.write("Transcript:\n")

            transcript_text = get_transcript_text(video_id)
            if transcript_text is None:
                f_out.write("[Transcript not available]\n")
            else:
                f_out.write(f"{transcript_text}\n")

            f_out.write("\n" + "-" * 40 + "\n\n")

    print(f"Transcripts saved to {output_file}")


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Fetch transcripts from a YouTube playlist.')
    parser.add_argument('--playlist-url', required=True, help='URL of the YouTube playlist')
    parser.add_argument('--output', default='transcripts.txt', help='Output file path (default: transcripts.txt)')
    args = parser.parse_args()
    process_playlist(args.playlist_url, args.output)
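
The script derives each video ID by splitting the URL on 'v='. That works for the watch URLs a playlist normally yields, but it returns the whole URL when no v parameter is present. One possible alternative, sketched here with only the standard library, parses the query string instead. The function name extract_video_id is hypothetical, and the youtu.be branch is an added assumption about short links rather than something the script handles.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID from a YouTube URL, or None if none is found."""
    parsed = urlparse(url)
    # Short links carry the ID in the path, e.g. https://youtu.be/<id>.
    if parsed.hostname == "youtu.be":
        return parsed.path.lstrip("/") or None
    # Standard watch URLs carry it in the "v" query parameter, in any position.
    values = parse_qs(parsed.query).get("v")
    return values[0] if values else None
```

Returning None for an unrecognized URL lets the caller skip the entry or log it, instead of passing a malformed ID to the transcript lookup.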