YouTube Transcripts Without the API — Extract Captions Programmatically

#youtube #api #javascript #a11y

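YouTube's official Data API has a captions resource, but downloading a track requires OAuth authorization and edit access to the video, so it is no use for arbitrary public videos. But there's another way.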

How YouTube Loads Captions

When you watch a video with captions enabled, YouTube fetches an XML file with timestamped text. The URL for each caption track is embedded in the page's player response, a JSON object inlined in the watch page's HTML.
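
To make this concrete, here is a minimal sketch of pulling the caption track list out of a watch page. It assumes the HTML contains a JSON key named "captionTracks" whose value is an array, which matches commonly observed pages but is not a documented contract and may change. The function names extractCaptionTracks and fetchCaptionTracks are made up for this example, and the global fetch assumes Node 18+ or a browser.

```javascript
// Sketch: pull the captionTracks array out of a watch page's HTML.
// Assumption: the inlined player response contains a "captionTracks" key
// whose value is a JSON array. This is observed behavior, not a stable API.
function extractCaptionTracks(html) {
  const key = '"captionTracks":';
  const keyIndex = html.indexOf(key);
  if (keyIndex === -1) return []; // no captions, or the page layout changed

  const start = html.indexOf("[", keyIndex + key.length);
  if (start === -1) return [];

  // Walk forward to the matching "]", ignoring brackets inside JSON strings.
  let depth = 0;
  let inString = false;
  for (let i = start; i < html.length; i++) {
    const ch = html[i];
    if (inString) {
      if (ch === "\\") i++; // skip the character after a backslash
      else if (ch === '"') inString = false;
    } else if (ch === '"') {
      inString = true;
    } else if (ch === "[") {
      depth++;
    } else if (ch === "]") {
      depth--;
      if (depth === 0) return JSON.parse(html.slice(start, i + 1));
    }
  }
  return []; // unbalanced brackets: treat as "not found"
}

// Hypothetical wrapper: download the watch page, then extract the tracks.
async function fetchCaptionTracks(videoId) {
  const url = `https://www.youtube.com/watch?v=${encodeURIComponent(videoId)}`;
  const response = await fetch(url);
  if (!response.ok) throw new Error(`Watch page request failed: ${response.status}`);
  return extractCaptionTracks(await response.text());
}
```

Bracket matching is used instead of a single regular expression because the array contains nested objects and strings that may themselves contain brackets. JSON.parse also takes care of escapes such as \u0026 inside each track's baseUrl.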

Steps

1. Fetch the video page and extract captionTracks from the player response.
2. Find the right language: each track includes a language code and an auto-generated flag.
3. Fetch the XML from the track's baseUrl.
4. Parse timestamps: each <text> element has start and dur attributes, both in seconds.
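
The language selection and parsing steps can be sketched roughly as follows. This assumes each track object has a languageCode field and that auto-generated tracks are marked with kind set to "asr"; both are observed conventions rather than documented guarantees. The helper names pickTrack, decodeEntities and parseTranscriptXml are invented for this example.

```javascript
// Prefer a manually authored track in the requested language,
// then fall back to an auto-generated one.
// Assumption: auto-generated tracks have kind === "asr".
function pickTrack(tracks, languageCode) {
  const sameLanguage = tracks.filter((t) => t.languageCode === languageCode);
  return sameLanguage.find((t) => t.kind !== "asr") ?? sameLanguage[0] ?? null;
}

// Decode the handful of XML entities that tend to appear in caption text.
function decodeEntities(text) {
  return text
    .replace(/&#(\d+);/g, (_, code) => String.fromCodePoint(Number(code)))
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&amp;/g, "&"); // last, so "&amp;lt;" decodes to "&lt;", not "<"
}

// Turn the XML body into { start, duration, text } entries.
// A regex is enough for this flat format; a real XML parser is the safer
// choice if the markup turns out to be richer than <text> elements.
function parseTranscriptXml(xml) {
  const entries = [];
  const textElement = /<text\b([^>]*)>([\s\S]*?)<\/text>/g;
  for (const [, attrs, body] of xml.matchAll(textElement)) {
    const start = Number(/\bstart="([^"]*)"/.exec(attrs)?.[1]);
    const duration = Number(/\bdur="([^"]*)"/.exec(attrs)?.[1] ?? 0);
    if (Number.isNaN(start)) continue; // skip elements without a usable timestamp
    entries.push({ start, duration, text: decodeEntities(body).trim() });
  }
  return entries;
}
```

Fetching the XML itself is a plain GET on the chosen track's baseUrl, for example `await (await fetch(track.baseUrl)).text()`, with the result passed to parseTranscriptXml.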

Output

{
  "videoId": "dQw4w9WgXcQ",
  "language": "en",
  "isAutoGenerated": true,
  "wordCount": 285,
  "fullText": "We're no strangers to love...",
  "entries": [{"start": 18.0, "duration": 3.5, "text": "We're no strangers to love"}]
}
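
For illustration, an object of this shape could be assembled from the parsed entries along these lines. This is only a sketch: the function name buildTranscriptResult is hypothetical, and it assumes that fullText is the entry texts joined with spaces, that wordCount is a whitespace-separated word count, and that an auto-generated track is marked with kind set to "asr".

```javascript
// Sketch: assemble the result object from a chosen track and its entries.
// Assumptions: fullText is the entries joined by single spaces, wordCount
// counts whitespace-separated words, and kind === "asr" means auto-generated.
function buildTranscriptResult(videoId, track, entries) {
  const fullText = entries
    .map((entry) => entry.text)
    .join(" ")
    .replace(/\s+/g, " ") // collapse line breaks inside caption text
    .trim();

  return {
    videoId,
    language: track.languageCode,
    isAutoGenerated: track.kind === "asr",
    wordCount: fullText === "" ? 0 : fullText.split(" ").length,
    fullText,
    entries,
  };
}

// Usage with hand-written sample data:
const exampleResult = buildTranscriptResult(
  "dQw4w9WgXcQ",
  { languageCode: "en", kind: "asr" },
  [{ start: 18.0, duration: 3.5, text: "We're no strangers to love" }]
);
console.log(JSON.stringify(exampleResult, null, 2));
```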