How to Get YouTube Transcripts: A Complete Developer's Guide
YouTube transcripts unlock a world of possibilities—from building AI-powered video summarizers to creating searchable content databases, training machine learning models, or automating content repurposing. But getting transcripts programmatically isn't always straightforward.
In this guide, we'll walk through every method for extracting YouTube transcripts, complete with working code examples. By the end, you'll know exactly how to implement transcript extraction in your own projects.
Understanding YouTube Transcripts
Before diving into code, let's understand what we're working with.
YouTube stores transcripts (also called captions or subtitles) in a few different ways:
- Auto-generated captions: YouTube's speech recognition creates these automatically for most videos
- Manual captions: Uploaded by creators for better accuracy
- Community contributions: Viewer-submitted captions (discontinued by YouTube in 2020)
Each caption track includes the text and timing information, typically in formats like SRT, VTT, or YouTube's proprietary timedtext format.
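To make the formats concrete, here is a single caption cue as SRT, which uses a numeric index and comma decimals (illustrative, not from a real video):

```text
1
00:00:00,000 --> 00:00:02,500
Hello everyone
```

and the same cue as WebVTT, which adds a file header and uses dot decimals:

```text
WEBVTT

00:00:00.000 --> 00:00:02.500
Hello everyone
```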
Method 1: Using the youtube-transcript-api (Python)
The most popular open-source solution for Python developers is the youtube-transcript-api library. Note that recent 1.x releases of the library moved to an instance-based API; the examples below use the long-standing static methods (get_transcript, list_transcripts), so check your installed version if the calls look different.
Installation
```bash
pip install youtube-transcript-api
```
Basic Usage
```python
from youtube_transcript_api import YouTubeTranscriptApi

# Extract transcript from a video
video_id = "dQw4w9WgXcQ"  # The video ID from the URL

try:
    transcript = YouTubeTranscriptApi.get_transcript(video_id)
    for entry in transcript:
        print(f"[{entry['start']:.2f}s] {entry['text']}")
except Exception as e:
    print(f"Error: {e}")
```
Output Format
The library returns a list of dictionaries:
```python
[
    {'text': 'Hello everyone', 'start': 0.0, 'duration': 2.5},
    {'text': 'Welcome to my video', 'start': 2.5, 'duration': 3.0},
    # ...
]
```
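Because the structure is just a list of dicts, common derived values are one-liners. A small sketch using the sample entries above:

```python
transcript = [
    {'text': 'Hello everyone', 'start': 0.0, 'duration': 2.5},
    {'text': 'Welcome to my video', 'start': 2.5, 'duration': 3.0},
]

# Flatten to plain text
full_text = ' '.join(entry['text'] for entry in transcript)

# End time of the last cue = its start plus its duration
end_time = max(e['start'] + e['duration'] for e in transcript)

print(full_text)  # Hello everyone Welcome to my video
print(end_time)   # 5.5
```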
Getting Transcripts in Different Languages
```python
from youtube_transcript_api import YouTubeTranscriptApi

video_id = "dQw4w9WgXcQ"

# List available transcripts
transcript_list = YouTubeTranscriptApi.list_transcripts(video_id)
for transcript in transcript_list:
    print(f"Language: {transcript.language} ({transcript.language_code})")
    print(f"Auto-generated: {transcript.is_generated}")

# Get specific language (first match in the list wins)
transcript = YouTubeTranscriptApi.get_transcript(video_id, languages=['es', 'en'])

# Translate to another language
transcript_list = YouTubeTranscriptApi.list_transcripts(video_id)
transcript = transcript_list.find_transcript(['en'])
translated = transcript.translate('es').fetch()
```
Handling Videos Without Transcripts
```python
# The exception classes are re-exported at the package top level,
# so there's no need to import from the private _errors module
from youtube_transcript_api import (
    YouTubeTranscriptApi,
    TranscriptsDisabled,
    NoTranscriptFound,
)

video_id = "your_video_id"

try:
    transcript = YouTubeTranscriptApi.get_transcript(video_id)
except TranscriptsDisabled:
    print("Transcripts are disabled for this video")
except NoTranscriptFound:
    print("No transcript available for this video")
except Exception as e:
    print(f"An error occurred: {e}")
```
Combining Into Plain Text
```python
from youtube_transcript_api import YouTubeTranscriptApi
from youtube_transcript_api.formatters import TextFormatter, SRTFormatter

video_id = "dQw4w9WgXcQ"
transcript = YouTubeTranscriptApi.get_transcript(video_id)

# Plain text
formatter = TextFormatter()
plain_text = formatter.format_transcript(transcript)
print(plain_text)

# SRT format
srt_formatter = SRTFormatter()
srt_output = srt_formatter.format_transcript(transcript)
print(srt_output)
```
Method 2: Using yt-dlp (Python/CLI)
yt-dlp is a powerful media downloader that can also extract subtitles.
Installation
```bash
pip install yt-dlp
```
Command Line Usage
```bash
# Download auto-generated subtitles
yt-dlp --write-auto-sub --skip-download "https://youtube.com/watch?v=VIDEO_ID"

# Download manual subtitles
yt-dlp --write-sub --skip-download "https://youtube.com/watch?v=VIDEO_ID"

# Specify format (srt, vtt, etc.)
yt-dlp --write-auto-sub --sub-format srt --skip-download "https://youtube.com/watch?v=VIDEO_ID"

# List available subtitles
yt-dlp --list-subs "https://youtube.com/watch?v=VIDEO_ID"
```
Python Usage
```python
import yt_dlp

def get_transcript_with_ytdlp(video_url):
    ydl_opts = {
        'writeautomaticsub': True,
        'writesubtitles': True,
        'subtitlesformat': 'json3',
        'skip_download': True,
        'outtmpl': '%(id)s',
    }
    with yt_dlp.YoutubeDL(ydl_opts) as ydl:
        info = ydl.extract_info(video_url, download=False)

    # Access subtitle information
    if 'subtitles' in info:
        print("Manual subtitles available:", list(info['subtitles'].keys()))
    if 'automatic_captions' in info:
        print("Auto captions available:", list(info['automatic_captions'].keys()))
    return info

video_url = "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
info = get_transcript_with_ytdlp(video_url)
```
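If you let yt-dlp actually download the json3 subtitle files (pass download=True with the options above), each file contains a list of caption events. Here is a sketch of flattening that payload into the same {text, start, duration} shape used earlier; the field names (events, tStartMs, dDurationMs, segs, utf8) reflect YouTube's json3 format as I understand it, so verify against a real download:

```python
def parse_json3(data):
    """Flatten a YouTube json3 caption payload into {text, start, duration} dicts."""
    entries = []
    for event in data.get('events', []):
        segs = event.get('segs') or []
        text = ''.join(seg.get('utf8', '') for seg in segs).strip()
        if not text:
            continue  # skip events that carry no text (e.g. window styling)
        entries.append({
            'text': text,
            'start': event.get('tStartMs', 0) / 1000,
            'duration': event.get('dDurationMs', 0) / 1000,
        })
    return entries

# Fabricated sample payload
sample = {'events': [
    {'tStartMs': 0, 'dDurationMs': 2500,
     'segs': [{'utf8': 'Hello '}, {'utf8': 'everyone'}]},
    {'tStartMs': 2500, 'segs': []},  # no text, skipped
]}
print(parse_json3(sample))
# [{'text': 'Hello everyone', 'start': 0.0, 'duration': 2.5}]
```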
Method 3: Direct YouTube API Approach
You can also fetch captions directly using YouTube's timedtext endpoint, though this method is less reliable and may break with YouTube updates.
```python
import json
import re
from xml.etree import ElementTree

import requests

def get_transcript_direct(video_id):
    # First, get the video page to find caption tracks
    watch_url = f"https://www.youtube.com/watch?v={video_id}"
    response = requests.get(watch_url)

    # Extract the caption track URL from the page.
    # This is fragile and may break with YouTube updates.
    pattern = r'"captions":.*?"captionTracks":\[(.*?)\]'
    match = re.search(pattern, response.text)
    if not match:
        return None

    caption_data = json.loads('[' + match.group(1) + ']')
    if not caption_data:
        return None

    # Get the first available caption track
    caption_url = caption_data[0]['baseUrl']

    # Fetch the captions
    caption_response = requests.get(caption_url)

    # Parse the XML
    root = ElementTree.fromstring(caption_response.text)
    transcript = []
    for text_element in root.findall('.//text'):
        transcript.append({
            'text': text_element.text or '',
            'start': float(text_element.get('start', 0)),
            'duration': float(text_element.get('dur', 0))
        })
    return transcript

# Usage
transcript = get_transcript_direct("dQw4w9WgXcQ")
if transcript:
    for entry in transcript:
        print(f"[{entry['start']:.2f}s] {entry['text']}")
```
Warning: This method parses YouTube's page structure directly and can break whenever YouTube updates their frontend. Use with caution in production.
Method 4: Building Your Own Audio-Based Transcription
When captions aren't available, you'll need to transcribe the audio yourself. Here's how to build a basic pipeline using OpenAI's Whisper.
Installation
```bash
pip install yt-dlp openai-whisper torch
```
Complete Pipeline
```python
import os

import whisper
import yt_dlp

def download_audio(video_url, output_path="audio.mp3"):
    """Download audio from a YouTube video."""
    ydl_opts = {
        'format': 'bestaudio/best',
        'postprocessors': [{
            'key': 'FFmpegExtractAudio',
            'preferredcodec': 'mp3',
            'preferredquality': '192',
        }],
        # The postprocessor appends .mp3, so strip the extension here
        'outtmpl': output_path.replace('.mp3', ''),
    }
    with yt_dlp.YoutubeDL(ydl_opts) as ydl:
        ydl.download([video_url])
    return output_path

def transcribe_audio(audio_path, model_size="base"):
    """Transcribe audio using Whisper."""
    model = whisper.load_model(model_size)
    return model.transcribe(audio_path)

def get_transcript_with_whisper(video_url):
    """Complete pipeline: download and transcribe."""
    print("Downloading audio...")
    audio_path = download_audio(video_url)

    print("Transcribing...")
    result = transcribe_audio(audio_path)

    # Clean up the temporary audio file
    os.remove(audio_path)

    # Format output
    transcript = []
    for segment in result['segments']:
        transcript.append({
            'text': segment['text'].strip(),
            'start': segment['start'],
            'end': segment['end']
        })
    return transcript, result['text']

# Usage
video_url = "https://www.youtube.com/watch?v=VIDEO_ID"
segments, full_text = get_transcript_with_whisper(video_url)

print("Full transcript:")
print(full_text)

print("\nWith timestamps:")
for segment in segments:
    print(f"[{segment['start']:.2f}s - {segment['end']:.2f}s] {segment['text']}")
```
Getting Word-Level Timestamps
```python
import whisper

def transcribe_with_word_timestamps(audio_path):
    model = whisper.load_model("base")
    result = model.transcribe(audio_path, word_timestamps=True)
    words = []
    for segment in result['segments']:
        for word in segment.get('words', []):
            words.append({
                'word': word['word'].strip(),
                'start': word['start'],
                'end': word['end']
            })
    return words

# Usage
words = transcribe_with_word_timestamps("audio.mp3")
for word in words[:20]:  # First 20 words
    print(f"[{word['start']:.2f}s] {word['word']}")
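If you want to emit SRT from Whisper segments yourself, the only fiddly part is the timestamp format. Here is a small helper for that conversion (the function name and rounding behaviour are my own choices, not part of Whisper):

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

print(srt_timestamp(0.0))      # 00:00:00,000
print(srt_timestamp(75.5))     # 00:01:15,500
print(srt_timestamp(3661.25))  # 01:01:01,250
```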
Method 5: Node.js Implementation
For JavaScript/Node.js developers, here's how to extract transcripts.
Using youtube-transcript
```bash
npm install youtube-transcript
```
```javascript
import { YoutubeTranscript } from 'youtube-transcript';

async function getTranscript(videoId) {
  try {
    const transcript = await YoutubeTranscript.fetchTranscript(videoId);
    transcript.forEach(entry => {
      console.log(`[${entry.offset.toFixed(2)}s] ${entry.text}`);
    });
    return transcript;
  } catch (error) {
    console.error('Error fetching transcript:', error.message);
    return null;
  }
}

// Usage
getTranscript('dQw4w9WgXcQ');
```
Building a Simple Express API
```javascript
import express from 'express';
import { YoutubeTranscript } from 'youtube-transcript';

const app = express();

app.get('/transcript/:videoId', async (req, res) => {
  try {
    const { videoId } = req.params;
    const { format } = req.query;
    const transcript = await YoutubeTranscript.fetchTranscript(videoId);

    if (format === 'text') {
      const text = transcript.map(entry => entry.text).join(' ');
      res.type('text/plain').send(text);
    } else {
      const last = transcript[transcript.length - 1];
      res.json({
        videoId,
        transcript,
        // End time of the last cue, not just its start offset
        totalDuration: last ? last.offset + last.duration : 0
      });
    }
  } catch (error) {
    res.status(500).json({ error: error.message });
  }
});

app.listen(3000, () => {
  console.log('Transcript API running on port 3000');
});
```
Common Challenges and Solutions
Challenge 1: Rate Limiting
YouTube may rate-limit your requests if you make too many in a short period.
```python
import time
from youtube_transcript_api import YouTubeTranscriptApi

def get_transcripts_with_rate_limit(video_ids, delay=1.0):
    """Fetch multiple transcripts with rate limiting."""
    transcripts = {}
    for video_id in video_ids:
        try:
            transcripts[video_id] = YouTubeTranscriptApi.get_transcript(video_id)
            print(f"✓ Got transcript for {video_id}")
        except Exception as e:
            print(f"✗ Failed for {video_id}: {e}")
            transcripts[video_id] = None
        time.sleep(delay)  # Wait between requests
    return transcripts
```
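A fixed delay helps, but for transient failures an exponential-backoff wrapper is more robust. This is a generic sketch; the with_backoff helper and its parameters are my own, not part of youtube-transcript-api:

```python
import random
import time

def with_backoff(fn, retries=4, base_delay=1.0):
    """Call fn(); on failure, retry with exponentially growing delays plus jitter."""
    for attempt in range(retries):
        try:
            return fn()
        except Exception:
            if attempt == retries - 1:
                raise  # out of retries, re-raise the last error
            # Delay doubles each attempt, with jitter proportional to base_delay
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))

# Usage with any fetch function, e.g.:
# transcript = with_backoff(lambda: YouTubeTranscriptApi.get_transcript(video_id))
```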
Challenge 2: Missing Transcripts
Not all videos have captions available.
```python
from youtube_transcript_api import YouTubeTranscriptApi

def get_transcript_or_fallback(video_id):
    """Try to get a transcript, with fallback options."""
    # Try English first
    try:
        return YouTubeTranscriptApi.get_transcript(video_id, languages=['en'])
    except Exception:
        pass

    # Try any available transcript
    try:
        transcript_list = YouTubeTranscriptApi.list_transcripts(video_id)
        for transcript in transcript_list:
            return transcript.fetch()
    except Exception:
        pass

    # No transcript available - would need audio-based transcription
    return None
```
Challenge 3: Cleaning Up Transcript Text
Auto-generated transcripts often have issues like missing punctuation, incorrect capitalization, and timing artifacts.
```python
import re

def clean_transcript(transcript):
    """Clean and improve transcript text."""
    # Combine all text
    full_text = ' '.join(entry['text'] for entry in transcript)

    # Remove timing artifacts like [Music], [Applause]
    full_text = re.sub(r'\[.*?\]', '', full_text)

    # Collapse repeated whitespace
    full_text = re.sub(r'\s+', ' ', full_text)

    # Basic cleanup only - production code would use NLP
    # to restore punctuation and capitalization
    return full_text.strip()

def add_paragraphs(text, sentences_per_paragraph=4):
    """Split text into paragraphs."""
    # Split on sentence boundaries
    sentences = re.split(r'(?<=[.!?])\s+', text)
    paragraphs = []
    for i in range(0, len(sentences), sentences_per_paragraph):
        paragraphs.append(' '.join(sentences[i:i + sentences_per_paragraph]))
    return '\n\n'.join(paragraphs)
```
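To see the cleaning behaviour concretely, here is the same regex logic applied inline to a few fabricated entries (the sample data is made up for illustration):

```python
import re

# Hypothetical auto-generated entries with typical artifacts
transcript = [
    {'text': '[Music]', 'start': 0.0, 'duration': 2.0},
    {'text': 'hello everyone', 'start': 2.0, 'duration': 2.5},
    {'text': 'welcome  back', 'start': 4.5, 'duration': 2.0},
]

full_text = ' '.join(entry['text'] for entry in transcript)
full_text = re.sub(r'\[.*?\]', '', full_text)       # drop [Music], [Applause]
full_text = re.sub(r'\s+', ' ', full_text).strip()  # collapse whitespace

print(full_text)  # hello everyone welcome back
```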
Real-World Example: Building a Video Summarizer
Let's put it all together with a practical example—a video summarizer that extracts transcripts and generates summaries using an LLM.
```python
import re

from openai import OpenAI
from youtube_transcript_api import YouTubeTranscriptApi

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def extract_video_id(url):
    """Extract the video ID from a YouTube URL."""
    patterns = [
        r'(?:v=|\/)([0-9A-Za-z_-]{11}).*',
        r'(?:embed\/)([0-9A-Za-z_-]{11})',
        r'(?:youtu\.be\/)([0-9A-Za-z_-]{11})',
    ]
    for pattern in patterns:
        match = re.search(pattern, url)
        if match:
            return match.group(1)
    return None

def get_transcript_text(video_id):
    """Get the transcript as plain text."""
    try:
        transcript = YouTubeTranscriptApi.get_transcript(video_id)
        return ' '.join(entry['text'] for entry in transcript)
    except Exception:
        return None

def summarize_text(text, max_length=500):
    """Summarize text using GPT."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {
                "role": "system",
                "content": "You are a helpful assistant that summarizes video transcripts concisely."
            },
            {
                "role": "user",
                "content": f"Please summarize this video transcript in about {max_length} characters:\n\n{text[:8000]}"
            }
        ],
        max_tokens=500
    )
    return response.choices[0].message.content

def summarize_video(youtube_url):
    """Complete pipeline: URL to summary."""
    video_id = extract_video_id(youtube_url)
    if not video_id:
        return {"error": "Invalid YouTube URL"}

    transcript = get_transcript_text(video_id)
    if not transcript:
        return {"error": "Could not fetch transcript"}

    summary = summarize_text(transcript)
    return {
        "video_id": video_id,
        "transcript_length": len(transcript),
        "summary": summary
    }

# Usage
result = summarize_video("https://www.youtube.com/watch?v=VIDEO_ID")
print(result['summary'])
```
The Challenges of Building Your Own Solution
While the methods above work, building a production-ready transcript extraction system comes with significant challenges:
Reliability Issues
- YouTube frequently updates their frontend, breaking scraping-based solutions
- Auto-generated captions aren't available for all videos
- Rate limiting can disrupt high-volume applications
Missing Features
- Basic libraries only give you sentence-level timestamps, not word-level precision
- No semantic paragraph detection—you get a wall of text
- No speaker identification for podcasts and interviews
- No fallback when captions don't exist
Maintenance Burden
- You'll spend time fixing breakages instead of building features
- Audio-based transcription requires GPU infrastructure
- Scaling brings additional complexity
Quality Concerns
- Auto-generated captions often have errors
- No built-in cleaning or formatting
- Inconsistent results across different video types
The Easier Alternative: YouTubeTranscript.dev API
If you'd rather focus on building your application than maintaining transcript infrastructure, YouTubeTranscript.dev provides everything you need through a simple API.
Why Use a Managed API?
It Just Works
- Never get "transcript not available" errors—audio-based fallback handles videos without captions
- Consistent, reliable results every time
- No maintenance when YouTube changes their systems
Advanced Features Out of the Box
- Word-level timestamps for precision applications
- Semantic paragraph segmentation—no more walls of text
- Speaker diarization for multi-speaker content
- Multiple output formats (JSON, SRT, VTT, plain text)
Simple Integration
```bash
curl "https://youtubetranscript.dev/api/transcript?video_id=VIDEO_ID" \
  -H "Authorization: Bearer YOUR_API_KEY"
```
```python
import requests

response = requests.get(
    "https://youtubetranscript.dev/api/transcript",
    params={"video_id": "dQw4w9WgXcQ"},
    headers={"Authorization": "Bearer YOUR_API_KEY"}
)
data = response.json()
print(data['transcript'])
```
Generous Free Tier
Start with 30 free credits per month—enough to test and prototype before committing. Paid plans start at just $9/month for 1,000 credits.
When to Build vs. Buy
Build your own if:
- You're learning and want to understand how it works
- You have very specific requirements no API meets
- Volume is extremely low (a few videos per month)
Use YouTubeTranscript.dev if:
- You need reliability for production applications
- You want word-level timestamps or speaker detection
- Videos sometimes don't have captions
- You'd rather ship features than fix infrastructure
Conclusion
Getting YouTube transcripts programmatically is achievable with open-source tools, but building a robust, production-ready solution requires handling many edge cases—missing captions, rate limits, formatting issues, and YouTube's ever-changing systems.
For hobby projects and learning, the methods in this guide will serve you well. For production applications where reliability matters, consider offloading the complexity to a purpose-built API like YouTubeTranscript.dev.
Whatever path you choose, transcripts unlock powerful possibilities for your applications. Happy building!
Found this guide helpful? Check out YouTubeTranscript.dev for a hassle-free transcript API with features like word-level timestamps, speaker detection, and audio-based fallback transcription.