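YouTube's official Data API won't give you transcripts for arbitrary videos: its captions download endpoint requires OAuth and generally only works for videos you own. But there's another way.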
How YouTube Loads Captions
When you watch a video with captions enabled, YouTube fetches an XML file with timestamped text. The URL is embedded in the page's player response.
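A minimal Python sketch of pulling the caption track list out of the watch page. It assumes the page embeds the player response as `ytInitialPlayerResponse = {...};` and that the tracks live under `captions.playerCaptionsTracklistRenderer.captionTracks`. Both reflect the page's observed, undocumented structure and may change. `fetch_watch_page` and `extract_caption_tracks` are hypothetical helper names.

```python
import json
import urllib.request

# Assumption: the watch page embeds the player response as
# "ytInitialPlayerResponse = {...};". This is undocumented and may change.
MARKER = "ytInitialPlayerResponse = "


def fetch_watch_page(video_id: str) -> str:
    """Download the HTML of a YouTube watch page (needs network access)."""
    url = f"https://www.youtube.com/watch?v={video_id}"
    # A browser-like User-Agent; what YouTube actually requires is an assumption.
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


def extract_caption_tracks(page: str) -> list[dict]:
    """Return the captionTracks list from the embedded player response."""
    start = page.find(MARKER)
    if start == -1:
        raise ValueError("player response not found in page")
    # raw_decode parses one JSON value and stops, ignoring the trailing ";</script>".
    player_response, _ = json.JSONDecoder().raw_decode(page[start + len(MARKER):])
    # Assumed path to the track list; an empty list means the video has no captions.
    return (
        player_response.get("captions", {})
        .get("playerCaptionsTracklistRenderer", {})
        .get("captionTracks", [])
    )
```

For example, `extract_caption_tracks(fetch_watch_page("dQw4w9WgXcQ"))` should return a list of track dicts. Using `raw_decode` lets the JSON parser find where the object ends, which is sturdier than matching the closing brace with a regex.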
Steps
- Fetch the video page and extract captionTracks from the player response
- Find the right language: each track includes a language code and whether it is auto-generated
- Fetch the XML from the track's baseUrl
- Parse timestamps: each <text> element has start and dur attributes
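The remaining steps can be sketched in Python as well. This assumes each track dict carries languageCode and baseUrl keys, that auto-generated tracks are marked with kind set to "asr", and that caption text inside the XML is HTML-escaped a second time (for example &amp;#39; for an apostrophe). `pick_track`, `fetch_track_xml` and `parse_transcript_xml` are hypothetical helper names, not part of any YouTube API.

```python
import html
import urllib.request
import xml.etree.ElementTree as ET


def pick_track(tracks: list[dict], language: str = "en") -> dict:
    """Prefer a manual track in `language`, falling back to an auto-generated one."""
    matches = [t for t in tracks if t.get("languageCode") == language]
    if not matches:
        raise LookupError(f"no caption track for language {language!r}")
    # Assumption: auto-generated tracks carry "kind": "asr". False sorts before
    # True, so manual tracks come first; the sort is stable otherwise.
    matches.sort(key=lambda t: t.get("kind") == "asr")
    return matches[0]


def fetch_track_xml(track: dict) -> str:
    """Download the timed-text XML from the track's baseUrl (needs network access)."""
    with urllib.request.urlopen(track["baseUrl"]) as response:
        return response.read().decode("utf-8")


def parse_transcript_xml(xml_text: str) -> list[dict]:
    """Turn <text start=".." dur="..">..</text> elements into entry dicts."""
    root = ET.fromstring(xml_text)
    entries = []
    for node in root.iter("text"):
        entries.append({
            "start": float(node.get("start", "0")),
            # Default to 0 if a cue has no dur attribute.
            "duration": float(node.get("dur", "0")),
            # The XML parser decodes one layer of escaping; html.unescape the rest
            # and flatten line breaks inside a cue.
            "text": html.unescape(node.text or "").replace("\n", " ").strip(),
        })
    return entries
```

Preferring manual tracks over auto-generated ones is a design choice; a scraper could just as well expose it as an option.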
Output
{
  "videoId": "dQw4w9WgXcQ",
  "language": "en",
  "isAutoGenerated": true,
  "wordCount": 285,
  "fullText": "We're no strangers to love...",
  "entries": [
    {"start": 18.0, "duration": 3.5, "text": "We're no strangers to love"}
  ]
}
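A sketch of assembling an object in the output shape above from a chosen track and its parsed entries. The field names match the example; `build_output` is a hypothetical name, and both the "kind" == "asr" check for auto-generated tracks and counting words by splitting fullText on whitespace are assumptions about how those fields are derived.

```python
def build_output(video_id: str, track: dict, entries: list[dict]) -> dict:
    """Assemble the result object from a caption track and its parsed entries."""
    # Join non-empty cue texts into one transcript string.
    full_text = " ".join(e["text"] for e in entries if e["text"])
    return {
        "videoId": video_id,
        "language": track.get("languageCode"),
        # Assumption: YouTube marks auto-generated tracks with kind == "asr".
        "isAutoGenerated": track.get("kind") == "asr",
        # Assumption: words are counted by splitting the full text on whitespace.
        "wordCount": len(full_text.split()),
        "fullText": full_text,
        "entries": entries,
    }
```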
Use Cases
- Content repurposing (video → blog post)
- AI training data from video content
- Accessibility analysis
- Translation workflows
I built a YouTube Transcript Scraper. It's free on Apify (search knotless_cadence youtube-transcript).