<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Jaime Martinez</title>
    <description>The latest articles on DEV Community by Jaime Martinez (@jamhimself).</description>
    <link>https://dev.to/jamhimself</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3989508%2F2a8cb05c-8976-47b6-b6b2-7d2424791e46.png</url>
      <title>DEV Community: Jaime Martinez</title>
      <link>https://dev.to/jamhimself</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/jamhimself"/>
    <language>en</language>
    <item>
      <title>Why your YouTube transcript scraper started returning empty strings (and how to fix it in 2026)</title>
      <dc:creator>Jaime Martinez</dc:creator>
      <pubDate>Wed, 17 Jun 2026 17:21:11 +0000</pubDate>
      <link>https://dev.to/jamhimself/why-your-youtube-transcript-scraper-started-returning-empty-strings-and-how-to-fix-it-in-2026-20ed</link>
      <guid>https://dev.to/jamhimself/why-your-youtube-transcript-scraper-started-returning-empty-strings-and-how-to-fix-it-in-2026-20ed</guid>
      <description>&lt;p&gt;If you have a script that pulled YouTube transcripts a year ago, there's a good chance it quietly broke. It still runs, no errors — it just returns &lt;strong&gt;empty&lt;/strong&gt;. Here's what changed, and how to actually get captions again.&lt;/p&gt;

&lt;h2&gt;
  
  
  The symptom
&lt;/h2&gt;

&lt;p&gt;You hit YouTube's caption endpoint (&lt;code&gt;timedtext&lt;/code&gt;), get back &lt;strong&gt;HTTP 200&lt;/strong&gt;… and an &lt;strong&gt;empty body&lt;/strong&gt;. No exception, no 403, nothing to catch. Just nothing. So your pipeline happily writes empty transcripts and you don't notice until your RAG index is full of blanks.&lt;/p&gt;

&lt;p&gt;This is why a lot of the popular libraries went dark in 2025–2026, even ones that are still "maintained." The request shape that used to work now returns nothing.&lt;/p&gt;

&lt;h2&gt;
  
  
  What actually changed: PoToken
&lt;/h2&gt;

&lt;p&gt;YouTube now requires a &lt;strong&gt;Proof-of-Origin Token (PoToken)&lt;/strong&gt; — generated by its BotGuard system — on caption requests. Without a valid token bound to the specific video, &lt;code&gt;timedtext&lt;/code&gt; returns that empty &lt;code&gt;200&lt;/code&gt;. Datacenter IPs (AWS/GCP/Azure) also get blocked or throttled hard, which is the &lt;em&gt;second&lt;/em&gt; reason server-side scrapers silently fail.&lt;/p&gt;

&lt;p&gt;So the modern recipe is three steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Fetch the watch page&lt;/strong&gt; and parse &lt;code&gt;ytInitialPlayerResponse&lt;/code&gt; for the caption tracks + &lt;code&gt;visitorData&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mint a PoToken&lt;/strong&gt; bound to the video ID by solving the BotGuard challenge.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Request the caption track&lt;/strong&gt; with &lt;code&gt;&amp;amp;pot=&amp;lt;token&amp;gt;&amp;amp;c=WEB&amp;amp;fmt=json3&lt;/code&gt; — &lt;em&gt;then&lt;/em&gt; you get real JSON back.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The PoToken part is the bit everyone gets stuck on. You don't have to reverse-engineer BotGuard yourself — &lt;a href="https://github.com/LuanRT/BgUtils" rel="noopener noreferrer"&gt;&lt;code&gt;bgutils-js&lt;/code&gt;&lt;/a&gt; (paired with &lt;code&gt;jsdom&lt;/code&gt; to give it a DOM to run in) handles the challenge. Here's the shape of it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;BG&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;buildURL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;GOOG_API_KEY&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;bgutils-js&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;JSDOM&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;jsdom&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// 1. give BotGuard a DOM to run in&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;dom&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;JSDOM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;&amp;lt;!DOCTYPE html&amp;gt;&amp;lt;html&amp;gt;&amp;lt;body&amp;gt;&amp;lt;/body&amp;gt;&amp;lt;/html&amp;gt;&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://www.youtube.com/&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="nb"&gt;Object&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;assign&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;globalThis&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;window&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;dom&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;window&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;document&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;dom&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;window&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;document&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// 2. solve the challenge -&amp;gt; integrity token -&amp;gt; a minter bound to your session&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;challenge&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;BG&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Challenge&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;globalObj&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;globalThis&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;requestKey&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;identifier&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;visitorData&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;challenge&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;interpreterJavascript&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;privateDoNotAccessOrElseSafeScriptWrappedValue&lt;/span&gt;&lt;span class="p"&gt;)();&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;bg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;BG&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;BotGuardClient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;program&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;challenge&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;program&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;globalName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;challenge&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;globalName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;globalObj&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;globalThis&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;out&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;it&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;buildURL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;GenerateIT&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Content-Type&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;application/json+protobuf&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;x-goog-api-key&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;GOOG_API_KEY&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="nx"&gt;requestKey&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;bg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;snapshot&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;webPoSignalOutput&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;out&lt;/span&gt; &lt;span class="p"&gt;})]),&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;minter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;BG&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;WebPoMinter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;integrityToken&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;it&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;())[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="nx"&gt;out&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// 3. mint a token bound to THIS video, attach to the caption URL&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;pot&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;minter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mintAsWebsafeString&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;videoId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;URL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;captionTrack&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;baseUrl&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;searchParams&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;fmt&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;json3&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;searchParams&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;pot&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;pot&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;searchParams&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;c&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;WEB&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;segments&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;await &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)).&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()).&lt;/span&gt;&lt;span class="nx"&gt;events&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// &amp;lt;- finally, real data&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A couple of things that bit me:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Bind the token to the video ID&lt;/strong&gt; (&lt;code&gt;mintAsWebsafeString(videoId)&lt;/code&gt;), not a generic identifier — a session-only token still returns empty on &lt;code&gt;timedtext&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;&amp;amp;c=WEB&lt;/code&gt; is required&lt;/strong&gt; alongside &lt;code&gt;&amp;amp;pot=&lt;/code&gt;. Miss it and you're back to the empty &lt;code&gt;200&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;The integrity token has a TTL (~12h), so for batches you bootstrap &lt;strong&gt;once&lt;/strong&gt; and reuse the minter.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  I packaged it as a tiny CLI
&lt;/h2&gt;

&lt;p&gt;I got tired of re-deriving this, so I put it in a zero-config MIT package:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx get-youtube-transcript https://www.youtube.com/watch?v&lt;span class="o"&gt;=&lt;/span&gt;jNQXAC9IVRw
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It does the whole watch-page → PoToken → captions dance and prints the transcript (text/JSON/SRT). Source: &lt;a href="https://github.com/jamhimself/youtube-transcript-cli" rel="noopener noreferrer"&gt;github.com/jamhimself/youtube-transcript-cli&lt;/a&gt;. It's single-video and runs from your IP — perfect for scripts and notebooks.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where it gets hard: scale
&lt;/h2&gt;

&lt;p&gt;The CLI works great until you need &lt;strong&gt;hundreds or thousands&lt;/strong&gt; of videos. Then you hit the &lt;em&gt;other&lt;/em&gt; wall: YouTube rate-limits and blocks datacenter IPs, so an unattended server job gets throttled fast. At that point you need rotating residential proxies + retries + uptime monitoring, which is a different project from "parse the captions."&lt;/p&gt;

&lt;p&gt;That's the part I run as a hosted service on Apify — &lt;a href="https://apify.com/jamhimself/youtube-transcript-extractor" rel="noopener noreferrer"&gt;a transcript scraper&lt;/a&gt; and a &lt;a href="https://apify.com/jamhimself/youtube-channel-transcripts" rel="noopener noreferrer"&gt;whole-channel-to-RAG&lt;/a&gt; version that lists a channel and returns every video's transcript as chunked, embed-ready text. Same engine as the CLI, just with the proxy/uptime layer so you don't babysit it. (Mentioning it because "how do I do this at scale" is the inevitable next question — but the CLI above is genuinely all you need for low-volume.)&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Empty transcripts in 2026 = missing &lt;strong&gt;PoToken&lt;/strong&gt; + datacenter-IP blocks.&lt;/li&gt;
&lt;li&gt;Recipe: watch page → caption tracks → mint a video-bound PoToken (&lt;code&gt;bgutils-js&lt;/code&gt;) → fetch with &lt;code&gt;&amp;amp;pot=&amp;amp;c=WEB&amp;amp;fmt=json3&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;For one-off use, &lt;code&gt;npx get-youtube-transcript &amp;lt;url&amp;gt;&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;For scale, you need residential proxies + retries on top — that's the real cost, not the parsing.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your captions pipeline has been quietly returning blanks, now you know why. Go check your RAG index. 🙃&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>webdev</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
