DEV Community

Relative Insanity

How to Get YouTube Transcripts: A Complete Developer's Guide

YouTube transcripts unlock a world of possibilities—from building AI-powered video summarizers to creating searchable content databases, training machine learning models, or automating content repurposing. But getting transcripts programmatically isn't always straightforward.

In this guide, we'll walk through five methods for extracting YouTube transcripts, each with code examples. By the end, you'll know how to implement transcript extraction in your own projects.


Understanding YouTube Transcripts

Before diving into code, let's understand what we're working with.

YouTube stores transcripts (also called captions or subtitles) in a few different ways:

  • Auto-generated captions: YouTube's speech recognition creates these automatically for most videos
  • Manual captions: Uploaded by creators for better accuracy
  • Community contributions: Viewer-submitted captions (YouTube discontinued this feature in September 2020)

Each caption track includes the text and timing information, typically in formats like SRT, VTT, or YouTube's proprietary timedtext format.


Method 1: Using the youtube-transcript-api (Python)

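A widely used open-source solution for Python developers is the youtube-transcript-api library. The examples below use the static get_transcript and list_transcripts methods from releases before 1.0; later releases moved to an instance-based interface, so check the library's README against the version you install.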

Installation

pip install youtube-transcript-api

Basic Usage

from youtube_transcript_api import YouTubeTranscriptApi

# Extract transcript from a video
video_id = "dQw4w9WgXcQ"  # The video ID from the URL

try:
    transcript = YouTubeTranscriptApi.get_transcript(video_id)

    for entry in transcript:
        print(f"[{entry['start']:.2f}s] {entry['text']}")

except Exception as e:
    print(f"Error: {e}")
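
The video_id above is the 11-character value taken from the video's URL. Here is a minimal sketch of extracting it with the standard library, assuming the common watch, youtu.be, embed, and shorts URL shapes. The helper name video_id_from_url is hypothetical and not part of youtube-transcript-api, and the 11-character ID pattern is an assumption based on the IDs used in this guide.

```python
import re
from urllib.parse import urlparse, parse_qs

# Assumption: video IDs are 11 characters drawn from this alphabet.
_ID_RE = re.compile(r"[0-9A-Za-z_-]{11}")

def video_id_from_url(url):
    """Return the video ID from a YouTube URL, or None if none is found."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    candidate = None

    if host == "youtu.be":
        # Short links: https://youtu.be/<id>
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host == "youtube.com" or host.endswith(".youtube.com"):
        if parsed.path == "/watch":
            # Watch pages: https://www.youtube.com/watch?v=<id>
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        else:
            # Embed and Shorts pages: /embed/<id>, /shorts/<id>
            parts = parsed.path.strip("/").split("/")
            if len(parts) >= 2 and parts[0] in ("embed", "shorts"):
                candidate = parts[1]

    if candidate and _ID_RE.fullmatch(candidate):
        return candidate
    return None

print(video_id_from_url("https://www.youtube.com/watch?v=dQw4w9WgXcQ"))
```

Checking the hostname first keeps look-alike paths on other sites, or channel URLs on YouTube itself, from being mistaken for video IDs.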

Output Format

The library returns a list of dictionaries:

[
    {'text': 'Hello everyone', 'start': 0.0, 'duration': 2.5},
    {'text': 'Welcome to my video', 'start': 2.5, 'duration': 3.0},
    # ...
]
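
Each entry carries a start time and a duration, both in seconds. As a small sketch of working with that shape, the following computes an end time for each entry and renders readable timestamps. The helper names format_timestamp and with_end_times are hypothetical, not part of the library.

```python
def format_timestamp(seconds):
    """Render a number of seconds as HH:MM:SS (fractions are dropped)."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"

def with_end_times(transcript):
    """Return copies of the entries with an added 'end' key (start + duration)."""
    return [
        {**entry, "end": entry["start"] + entry["duration"]}
        for entry in transcript
    ]

sample = [
    {"text": "Hello everyone", "start": 0.0, "duration": 2.5},
    {"text": "Welcome to my video", "start": 2.5, "duration": 3.0},
]

for entry in with_end_times(sample):
    start = format_timestamp(entry["start"])
    end = format_timestamp(entry["end"])
    print(f"{start} -> {end}  {entry['text']}")
```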

Getting Transcripts in Different Languages

from youtube_transcript_api import YouTubeTranscriptApi

video_id = "dQw4w9WgXcQ"

# List available transcripts
transcript_list = YouTubeTranscriptApi.list_transcripts(video_id)

for transcript in transcript_list:
    print(f"Language: {transcript.language} ({transcript.language_code})")
    print(f"Auto-generated: {transcript.is_generated}")

# Get specific language
transcript = YouTubeTranscriptApi.get_transcript(video_id, languages=['es', 'en'])

# Translate to another language
transcript_list = YouTubeTranscriptApi.list_transcripts(video_id)
transcript = transcript_list.find_transcript(['en'])
translated = transcript.translate('es').fetch()

Handling Videos Without Transcripts

from youtube_transcript_api import YouTubeTranscriptApi
from youtube_transcript_api import TranscriptsDisabled, NoTranscriptFound

video_id = "your_video_id"

try:
    transcript = YouTubeTranscriptApi.get_transcript(video_id)
except TranscriptsDisabled:
    print("Transcripts are disabled for this video")
except NoTranscriptFound:
    print("No transcript available for this video")
except Exception as e:
    print(f"An error occurred: {e}")

Combining Into Plain Text

from youtube_transcript_api import YouTubeTranscriptApi
from youtube_transcript_api.formatters import TextFormatter, SRTFormatter

video_id = "dQw4w9WgXcQ"
transcript = YouTubeTranscriptApi.get_transcript(video_id)

# Plain text
formatter = TextFormatter()
plain_text = formatter.format_transcript(transcript)
print(plain_text)

# SRT format
srt_formatter = SRTFormatter()
srt_output = srt_formatter.format_transcript(transcript)
print(srt_output)

Method 2: Using yt-dlp (Python/CLI)

yt-dlp is a powerful media downloader that can also extract subtitles.

Installation

pip install yt-dlp

Command Line Usage

# Download auto-generated subtitles
yt-dlp --write-auto-sub --skip-download "https://youtube.com/watch?v=VIDEO_ID"

# Download manual subtitles
yt-dlp --write-sub --skip-download "https://youtube.com/watch?v=VIDEO_ID"

# Convert to SRT (YouTube does not serve SRT directly; conversion requires ffmpeg)
yt-dlp --write-auto-sub --convert-subs srt --skip-download "https://youtube.com/watch?v=VIDEO_ID"

# List available subtitles
yt-dlp --list-subs "https://youtube.com/watch?v=VIDEO_ID"

Python Usage

import yt_dlp

def get_transcript_with_ytdlp(video_url):
    ydl_opts = {
        'writeautomaticsub': True,
        'writesubtitles': True,
        'subtitlesformat': 'json3',
        'skip_download': True,
        'outtmpl': '%(id)s',
    }

    with yt_dlp.YoutubeDL(ydl_opts) as ydl:
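        # download=False only fetches metadata; no subtitle files are written.
        # Call ydl.download([video_url]) instead to write them to disk.
        info = ydl.extract_info(video_url, download=False)

        # Each key maps language codes to a list of tracks ('ext' and 'url')
        if info.get('subtitles'):
            print("Manual subtitles available:", list(info['subtitles'].keys()))
        if info.get('automatic_captions'):
            print("Auto captions available:", list(info['automatic_captions'].keys()))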

        return info

video_url = "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
info = get_transcript_with_ytdlp(video_url)
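
The function above returns yt-dlp's metadata, not the transcript text. As a hedged sketch of the remaining step, the helpers below pick a json3 track URL out of the info dict and convert a decoded json3 payload into the same text/start/duration entries used in Method 1. Both helper names are hypothetical, and the json3 layout (events, tStartMs, dDurationMs, segs, utf8) is an assumption based on observed payloads, since the format is not formally documented.

```python
def pick_subtitle_url(info, lang="en", ext="json3"):
    """Find a track URL in yt-dlp's info dict, preferring manual subtitles."""
    for key in ("subtitles", "automatic_captions"):
        for track in (info.get(key) or {}).get(lang, []):
            if track.get("ext") == ext:
                return track.get("url")
    return None

def parse_json3(payload):
    """Convert a decoded json3 payload into text/start/duration entries."""
    entries = []
    for event in payload.get("events", []):
        segs = event.get("segs", [])
        text = "".join(seg.get("utf8", "") for seg in segs).strip()
        if not text:
            continue  # skip events that carry only layout data or newlines
        entries.append({
            "text": text,
            # Assumed to be milliseconds; converted to seconds here
            "start": event.get("tStartMs", 0) / 1000,
            "duration": event.get("dDurationMs", 0) / 1000,
        })
    return entries

# Illustrative payload in the assumed json3 shape
sample_payload = {
    "events": [
        {"tStartMs": 0, "dDurationMs": 2500,
         "segs": [{"utf8": "Hello "}, {"utf8": "everyone"}]},
        {"tStartMs": 2500, "segs": [{"utf8": "\n"}]},
    ]
}
print(parse_json3(sample_payload))
```

In practice you would fetch the URL returned by pick_subtitle_url, decode the response as JSON, and pass the result to parse_json3.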

Method 3: Direct YouTube API Approach

You can also scrape the watch page for caption track URLs and fetch captions from YouTube's timedtext endpoint. This is not the official YouTube Data API, and it is less reliable: it may break whenever YouTube changes its page structure.

import html
import json
import re
from xml.etree import ElementTree

import requests

def get_transcript_direct(video_id):
    # First, get the video page to find caption tracks
    watch_url = f"https://www.youtube.com/watch?v={video_id}"
    response = requests.get(watch_url, timeout=10)
    response.raise_for_status()

    # Extract caption track URL from the page
    # This is fragile and may break with YouTube updates
    pattern = r'"captions":.*?"captionTracks":\[(.*?)\]'
    match = re.search(pattern, response.text)

    if not match:
        return None

    try:
        caption_data = json.loads('[' + match.group(1) + ']')
    except json.JSONDecodeError:
        # The non-greedy regex can cut the JSON short if a value contains ']'
        return None

    if not caption_data:
        return None

    # Get the first available caption track
    caption_url = caption_data[0]['baseUrl']

    # Fetch the captions
    caption_response = requests.get(caption_url, timeout=10)
    caption_response.raise_for_status()

    # Parse the XML
    root = ElementTree.fromstring(caption_response.text)

    transcript = []
    for text_element in root.findall('.//text'):
        transcript.append({
            # Caption text can contain HTML entities (e.g. &#39;), so decode them
            'text': html.unescape(text_element.text or ''),
            'start': float(text_element.get('start', 0)),
            'duration': float(text_element.get('dur', 0))
        })

    return transcript

# Usage
transcript = get_transcript_direct("dQw4w9WgXcQ")
if transcript:
    for entry in transcript:
        print(f"[{entry['start']:.2f}s] {entry['text']}")

Warning: This method parses YouTube's page structure directly and can break whenever YouTube updates their frontend. Use with caution in production.
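
The function above always takes the first caption track, which may not be the language you want. Below is a sketch of choosing a track by language and preferring manual captions over auto-generated ones. The helper name pick_caption_track is hypothetical, and the field names languageCode and kind (with the value "asr" marking auto-generated tracks) are assumptions about the scraped JSON that may change without notice.

```python
def pick_caption_track(tracks, preferred_languages=("en",)):
    """Choose a track by language, preferring manual over auto-generated."""
    def is_auto(track):
        # Assumption: auto-generated tracks are marked with kind == "asr"
        return track.get("kind") == "asr"

    for lang in preferred_languages:
        matches = [t for t in tracks if t.get("languageCode") == lang]
        # sorted() is stable and False sorts before True,
        # so manual tracks end up ahead of auto-generated ones
        matches = sorted(matches, key=is_auto)
        if matches:
            return matches[0]

    # Fall back to the first track, matching the original behavior
    return tracks[0] if tracks else None

# Illustrative data in the assumed shape
sample_tracks = [
    {"baseUrl": "https://example.invalid/es", "languageCode": "es"},
    {"baseUrl": "https://example.invalid/en-asr", "languageCode": "en", "kind": "asr"},
    {"baseUrl": "https://example.invalid/en", "languageCode": "en"},
]
print(pick_caption_track(sample_tracks)["baseUrl"])
```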


Method 4: Building Your Own Audio-Based Transcription

When captions aren't available, you'll need to transcribe the audio yourself. Here's how to build a basic pipeline using OpenAI's Whisper. Both the audio extraction step and Whisper require ffmpeg to be installed on your system.

Installation

pip install yt-dlp openai-whisper torch

Complete Pipeline

import yt_dlp
import whisper
import os

def download_audio(video_url, output_path="audio.mp3"):
    """Download audio from YouTube video"""
    ydl_opts = {
        'format': 'bestaudio/best',
        'postprocessors': [{
            'key': 'FFmpegExtractAudio',
            'preferredcodec': 'mp3',
            'preferredquality': '192',
        }],
        'outtmpl': output_path.replace('.mp3', ''),
    }

    with yt_dlp.YoutubeDL(ydl_opts) as ydl:
        ydl.download([video_url])

    return output_path

def transcribe_audio(audio_path, model_size="base"):
    """Transcribe audio using Whisper"""
    model = whisper.load_model(model_size)
    result = model.transcribe(audio_path)
    return result

def get_transcript_with_whisper(video_url):
    """Complete pipeline: download and transcribe"""

    # Download audio
    print("Downloading audio...")
    audio_path = download_audio(video_url)

    # Transcribe
    print("Transcribing...")
    result = transcribe_audio(audio_path)

    # Clean up
    os.remove(audio_path)

    # Format output
    transcript = []
    for segment in result['segments']:
        transcript.append({
            'text': segment['text'].strip(),
            'start': segment['start'],
            'end': segment['end']
        })

    return transcript, result['text']

# Usage
video_url = "https://www.youtube.com/watch?v=VIDEO_ID"
segments, full_text = get_transcript_with_whisper(video_url)

print("Full transcript:")
print(full_text)

print("\nWith timestamps:")
for segment in segments:
    print(f"[{segment['start']:.2f}s - {segment['end']:.2f}s] {segment['text']}")

Getting Word-Level Timestamps

import whisper

def transcribe_with_word_timestamps(audio_path):
    model = whisper.load_model("base")
    result = model.transcribe(audio_path, word_timestamps=True)

    words = []
    for segment in result['segments']:
        if 'words' in segment:
            for word in segment['words']:
                words.append({
                    'word': word['word'].strip(),
                    'start': word['start'],
                    'end': word['end']
                })

    return words

# Usage
words = transcribe_with_word_timestamps("audio.mp3")
for word in words[:20]:  # First 20 words
    print(f"[{word['start']:.2f}s] {word['word']}")
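
Word-level timestamps are most useful once the words are regrouped into short caption cues. The sketch below shows one way to do that, starting a new cue after a pause or when the line would get too long. The helper name words_to_cues is hypothetical, and the default limits of 42 characters and 0.8 seconds are arbitrary illustrative values, not Whisper settings.

```python
def words_to_cues(words, max_chars=42, max_gap=0.8):
    """Group word dicts ('word', 'start', 'end') into short caption cues."""
    cues = []
    current = None
    for w in words:
        text = w["word"].strip()
        if not text:
            continue
        starts_new = (
            current is None
            # A long pause since the previous word ends the cue
            or w["start"] - current["end"] > max_gap
            # So does a line that would exceed the length limit
            or len(current["text"]) + 1 + len(text) > max_chars
        )
        if starts_new:
            current = {"text": text, "start": w["start"], "end": w["end"]}
            cues.append(current)
        else:
            current["text"] += " " + text
            current["end"] = w["end"]
    return cues

sample_words = [
    {"word": "Never", "start": 0.0, "end": 0.4},
    {"word": "gonna", "start": 0.4, "end": 0.7},
    {"word": "give", "start": 2.0, "end": 2.3},
]
for cue in words_to_cues(sample_words):
    print(f"[{cue['start']:.2f}s - {cue['end']:.2f}s] {cue['text']}")
```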

Method 5: Node.js Implementation

For JavaScript/Node.js developers, here's how to extract transcripts.

Using youtube-transcript

npm install youtube-transcript

import { YoutubeTranscript } from 'youtube-transcript';

async function getTranscript(videoId) {
    try {
        const transcript = await YoutubeTranscript.fetchTranscript(videoId);

        transcript.forEach(entry => {
            console.log(`[${entry.offset.toFixed(2)}s] ${entry.text}`);
        });

        return transcript;
    } catch (error) {
        console.error('Error fetching transcript:', error.message);
        return null;
    }
}

// Usage
getTranscript('dQw4w9WgXcQ');

Building a Simple Express API

import express from 'express';
import { YoutubeTranscript } from 'youtube-transcript';

const app = express();

app.get('/transcript/:videoId', async (req, res) => {
    try {
        const { videoId } = req.params;
        const { format } = req.query;

        const transcript = await YoutubeTranscript.fetchTranscript(videoId);

        if (format === 'text') {
            const text = transcript.map(entry => entry.text).join(' ');
            res.type('text/plain').send(text);
        } else {
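            // The last entry's offset is only its start time, so add its duration
            const last = transcript[transcript.length - 1];
            res.json({
                videoId,
                transcript,
                totalDuration: last ? last.offset + last.duration : 0
            });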
        }
    } catch (error) {
        res.status(500).json({ error: error.message });
    }
});

app.listen(3000, () => {
    console.log('Transcript API running on port 3000');
});

Common Challenges and Solutions

Challenge 1: Rate Limiting

YouTube may rate-limit your requests if you make too many in a short period.

import time
from youtube_transcript_api import YouTubeTranscriptApi

def get_transcripts_with_rate_limit(video_ids, delay=1.0):
    """Fetch multiple transcripts with rate limiting"""
    transcripts = {}

    for video_id in video_ids:
        try:
            transcripts[video_id] = YouTubeTranscriptApi.get_transcript(video_id)
            print(f"✓ Got transcript for {video_id}")
        except Exception as e:
            print(f"✗ Failed for {video_id}: {e}")
            transcripts[video_id] = None

        time.sleep(delay)  # Wait between requests

    return transcripts
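
A fixed delay spaces requests out but does not help once a request has already been rejected. A common complement is retrying with exponential backoff. The sketch below is a generic helper, not part of youtube-transcript-api: the name fetch_with_backoff and its parameters are hypothetical, and it retries on any exception for brevity. Real code should retry only transient errors, since retrying a video with disabled transcripts will never succeed.

```python
import random
import time

def fetch_with_backoff(fetch, video_id, max_attempts=4, base_delay=1.0,
                       sleep=time.sleep):
    """Call fetch(video_id), retrying with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return fetch(video_id)
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Wait 1x, 2x, 4x ... the base delay, plus up to 25% random jitter
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.25))
```

Here fetch is any callable that takes a video ID, for example YouTubeTranscriptApi.get_transcript. The sleep parameter exists so the waiting can be replaced in tests.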

Challenge 2: Missing Transcripts

Not all videos have captions available.

from youtube_transcript_api import YouTubeTranscriptApi

def get_transcript_or_fallback(video_id):
    """Try to get transcript, with fallback options"""

    # Try English first (manual captions are preferred over auto-generated)
    try:
        return YouTubeTranscriptApi.get_transcript(video_id, languages=['en'])
    except Exception:
        pass

    # Try any available transcript
    try:
        transcript_list = YouTubeTranscriptApi.list_transcripts(video_id)
        for transcript in transcript_list:
            return transcript.fetch()
    except Exception:
        pass

    # No transcript available - would need audio-based transcription
    return None

Challenge 3: Cleaning Up Transcript Text

Auto-generated transcripts often have issues like missing punctuation, incorrect capitalization, and non-speech annotations such as [Music].

import re

def clean_transcript(transcript):
    """Clean and improve transcript text"""

    # Combine all text
    full_text = ' '.join([entry['text'] for entry in transcript])

    # Remove bracketed non-speech annotations like [Music], [Applause]
    full_text = re.sub(r'\[.*?\]', '', full_text)

    # Collapse runs of whitespace into single spaces
    full_text = re.sub(r'\s+', ' ', full_text)

    # Trim leading and trailing whitespace. Restoring punctuation is out of
    # scope here - production code would use an NLP model for that
    full_text = full_text.strip()

    return full_text

def add_paragraphs(text, sentences_per_paragraph=4):
    """Split text into paragraphs"""

    # Split on sentence boundaries
    sentences = re.split(r'(?<=[.!?])\s+', text)

    paragraphs = []
    for i in range(0, len(sentences), sentences_per_paragraph):
        paragraph = ' '.join(sentences[i:i + sentences_per_paragraph])
        paragraphs.append(paragraph)

    return '\n\n'.join(paragraphs)
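
Because add_paragraphs splits on sentence punctuation, it has little to work with when auto-generated captions contain none. A different heuristic, sketched below, uses the timing data instead and starts a new paragraph wherever the speaker pauses. The helper name paragraphs_by_pause and the 1.5 second threshold are illustrative assumptions, and overlapping caption timings (common in auto-generated tracks) will simply produce fewer breaks.

```python
def paragraphs_by_pause(transcript, min_gap=1.5):
    """Start a new paragraph when the silence between entries exceeds min_gap seconds."""
    paragraphs = []
    current = []
    previous_end = None
    for entry in transcript:
        long_pause = (
            previous_end is not None
            and entry["start"] - previous_end > min_gap
        )
        if long_pause and current:
            paragraphs.append(" ".join(current))
            current = []
        text = entry["text"].strip()
        if text:
            current.append(text)
        previous_end = entry["start"] + entry["duration"]
    if current:
        paragraphs.append(" ".join(current))
    return "\n\n".join(paragraphs)

sample = [
    {"text": "first part", "start": 0.0, "duration": 2.0},
    {"text": "continues here", "start": 2.1, "duration": 2.0},
    {"text": "new topic", "start": 7.0, "duration": 1.5},
]
print(paragraphs_by_pause(sample))
```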

Real-World Example: Building a Video Summarizer

Let's put it all together with a practical example—a video summarizer that extracts transcripts and generates summaries using an LLM.

from youtube_transcript_api import YouTubeTranscriptApi
import openai
import os

openai.api_key = os.getenv("OPENAI_API_KEY")

def extract_video_id(url):
    """Extract video ID from YouTube URL"""
    import re
    patterns = [
        r'(?:v=|\/)([0-9A-Za-z_-]{11}).*',
        r'(?:embed\/)([0-9A-Za-z_-]{11})',
        r'(?:youtu\.be\/)([0-9A-Za-z_-]{11})',
    ]
    for pattern in patterns:
        match = re.search(pattern, url)
        if match:
            return match.group(1)
    return None

def get_transcript_text(video_id):
    """Get transcript as plain text"""
    try:
        transcript = YouTubeTranscriptApi.get_transcript(video_id)
        return ' '.join([entry['text'] for entry in transcript])
    except Exception as e:
        return None

def summarize_text(text, max_length=500):
    """Summarize text using GPT"""
    response = openai.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {
                "role": "system",
                "content": "You are a helpful assistant that summarizes video transcripts concisely."
            },
            {
                "role": "user",
                "content": f"Please summarize this video transcript in about {max_length} characters:\n\n{text[:8000]}"
            }
        ],
        max_tokens=500
    )
    return response.choices[0].message.content

def summarize_video(youtube_url):
    """Complete pipeline: URL to summary"""

    # Extract video ID
    video_id = extract_video_id(youtube_url)
    if not video_id:
        return {"error": "Invalid YouTube URL"}

    # Get transcript
    transcript = get_transcript_text(video_id)
    if not transcript:
        return {"error": "Could not fetch transcript"}

    # Generate summary
    summary = summarize_text(transcript)

    return {
        "video_id": video_id,
        "transcript_length": len(transcript),
        "summary": summary
    }

# Usage
result = summarize_video("https://www.youtube.com/watch?v=VIDEO_ID")
print(result['summary'])
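
Note that summarize_text sends only the first 8,000 characters, so the rest of a long video is silently dropped. One way around this, sketched below, is to split the transcript into chunks on word boundaries, summarize each chunk, and then summarize the combined partial summaries. The helper names chunk_text and summarize_long_text are hypothetical, and the summarizer is passed in as a callable so the sketch makes no API calls on its own.

```python
def chunk_text(text, max_chars=8000):
    """Split text into chunks of at most max_chars, breaking on whitespace.

    A single word longer than max_chars still becomes its own oversized chunk.
    """
    chunks = []
    current = []
    length = 0
    for word in text.split():
        # +1 accounts for the space that joins this word to the chunk
        added = len(word) + (1 if current else 0)
        if current and length + added > max_chars:
            chunks.append(" ".join(current))
            current, length = [], 0
            added = len(word)
        current.append(word)
        length += added
    if current:
        chunks.append(" ".join(current))
    return chunks

def summarize_long_text(text, summarize, max_chars=8000):
    """Summarize each chunk, then summarize the combined partial summaries."""
    chunks = chunk_text(text, max_chars)
    if len(chunks) <= 1:
        return summarize(text)
    partials = [summarize(chunk) for chunk in chunks]
    return summarize("\n".join(partials))
```

With the code above, summarize_long_text(transcript, summarize_text) would cover the whole transcript at the cost of one extra request per chunk.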

The Challenges of Building Your Own Solution

While the methods above work, building a production-ready transcript extraction system comes with significant challenges:

Reliability Issues

  • YouTube frequently updates their frontend, breaking scraping-based solutions
  • Auto-generated captions aren't available for all videos
  • Rate limiting can disrupt high-volume applications

Missing Features

  • Basic libraries only give you caption-segment timestamps, not word-level precision
  • No semantic paragraph detection—you get a wall of text
  • No speaker identification for podcasts and interviews
  • No fallback when captions don't exist

Maintenance Burden

  • You'll spend time fixing breakages instead of building features
  • Audio-based transcription is slow without GPU infrastructure
  • Scaling brings additional complexity

Quality Concerns

  • Auto-generated captions often have errors
  • No built-in cleaning or formatting
  • Inconsistent results across different video types

The Easier Alternative: YouTubeTranscript.dev API

If you'd rather focus on building your application than maintaining transcript infrastructure, YouTubeTranscript.dev provides everything you need through a simple API.

Why Use a Managed API?

It Just Works

  • Never get "transcript not available" errors—audio-based fallback handles videos without captions
  • Consistent, reliable results every time
  • No maintenance when YouTube changes their systems

Advanced Features Out of the Box

  • Word-level timestamps for precision applications
  • Semantic paragraph segmentation—no more walls of text
  • Speaker diarization for multi-speaker content
  • Multiple output formats (JSON, SRT, VTT, plain text)

Simple Integration

curl "https://youtubetranscript.dev/api/transcript?video_id=VIDEO_ID" \
  -H "Authorization: Bearer YOUR_API_KEY"

import requests

response = requests.get(
    "https://youtubetranscript.dev/api/transcript",
    params={"video_id": "dQw4w9WgXcQ"},
    headers={"Authorization": "Bearer YOUR_API_KEY"}
)

data = response.json()
print(data['transcript'])

Generous Free Tier

Start with 30 free credits per month—enough to test and prototype before committing. Paid plans start at just $9/month for 1,000 credits.

When to Build vs. Buy

Build your own if:

  • You're learning and want to understand how it works
  • You have very specific requirements no API meets
  • Volume is extremely low (a few videos per month)

Use YouTubeTranscript.dev if:

  • You need reliability for production applications
  • You want word-level timestamps or speaker detection
  • Videos sometimes don't have captions
  • You'd rather ship features than fix infrastructure

Conclusion

Getting YouTube transcripts programmatically is achievable with open-source tools, but building a robust, production-ready solution requires handling many edge cases—missing captions, rate limits, formatting issues, and YouTube's ever-changing systems.

For hobby projects and learning, the methods in this guide will serve you well. For production applications where reliability matters, consider offloading the complexity to a purpose-built API like YouTubeTranscript.dev.

Whatever path you choose, transcripts unlock powerful possibilities for your applications. Happy building!


Found this guide helpful? Check out YouTubeTranscript.dev for a hassle-free transcript API with features like word-level timestamps, speaker detection, and audio-based fallback transcription.
