DEV Community

Hammer Nexon

Building a YouTube Transcript API with Next.js

If you've ever tried to access YouTube video transcripts programmatically, you know the pain. The YouTube Data API v3 has no official endpoint that returns caption text for videos you don't own. You either scrape, reverse-engineer undocumented endpoints, or give up.

I didn't want to give up. I was building ScripTube, a tool that lets anyone paste a YouTube URL and get the full transcript instantly. Here's how I built its backend with Next.js Route Handlers (the App Router's version of API routes).

The Problem

YouTube's Data API lets you list caption tracks for a video, but actually downloading the caption text requires OAuth on behalf of the video owner. That's useless if you want transcripts of videos you don't own.

The workaround: YouTube serves both auto-generated and manually uploaded captions to every viewer through an undocumented internal endpoint. Open-source libraries such as youtube-transcript (npm) and youtube-transcript-api (Python) wrap that endpoint.

The Stack

  • Next.js 14 (App Router)
  • youtube-transcript npm package (or youtube-transcript-api for Python)
  • Vercel for deployment
  • Rate limiting via Upstash Redis

Step 1: The API Route

Create app/api/transcript/route.ts:

```typescript
import { NextRequest, NextResponse } from 'next/server';
import { YoutubeTranscript } from 'youtube-transcript';

// extractVideoId is defined in Step 2; keep it in this file or import it from a helper module.

export async function POST(req: NextRequest) {
  try {
    const { url } = await req.json();
    if (typeof url !== 'string' || !url) {
      return NextResponse.json({ error: 'URL is required' }, { status: 400 });
    }
    const videoId = extractVideoId(url);
    if (!videoId) {
      return NextResponse.json({ error: 'Invalid YouTube URL' }, { status: 400 });
    }
    // Each segment carries text, offset, and duration.
    const transcript = await YoutubeTranscript.fetchTranscript(videoId);
    const formatted = transcript.map((entry) => entry.text).join(' ');
    return NextResponse.json({ videoId, transcript: formatted, segments: transcript });
  } catch (error) {
    // Malformed JSON bodies, videos without captions, and upstream breakage all land here.
    console.error('Transcript fetch failed:', error);
    return NextResponse.json({ error: 'Failed to fetch transcript.' }, { status: 500 });
  }
}
```
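Calling the route from the client takes a single POST. Below is a minimal sketch that assumes the response shape of the handler above. The names `getTranscript`, `FetchLike`, `TranscriptSegment`, and `TranscriptResponse` are hypothetical. `fetch` is passed in explicitly so the helper can be stubbed in tests:

```typescript
// Response shape of the /api/transcript route (assumed from the handler above).
interface TranscriptSegment {
  text: string;
  offset: number;
  duration: number;
}

interface TranscriptResponse {
  videoId: string;
  transcript: string;
  segments: TranscriptSegment[];
}

// Minimal structural type for fetch, so the helper can be called with a stub in tests.
type FetchLike = (
  input: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Hypothetical client helper: POSTs the URL and returns the parsed JSON,
// throwing with the route's `error` message on non-2xx responses.
async function getTranscript(
  videoUrl: string,
  fetchImpl: FetchLike,
  endpoint = '/api/transcript',
): Promise<TranscriptResponse> {
  const res = await fetchImpl(endpoint, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ url: videoUrl }),
  });
  const data = (await res.json()) as Partial<TranscriptResponse> & { error?: string };
  if (!res.ok) {
    throw new Error(data.error ?? `Request failed with status ${res.status}`);
  }
  return data as TranscriptResponse;
}
```

In a client component this might look like `const { transcript } = await getTranscript(input, fetch);`. The relative `/api/transcript` URL only resolves in the browser on the same origin, so a server-side caller would need to pass an absolute endpoint.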

Step 2: Extracting the Video ID

YouTube URLs come in many flavors. You need a robust parser:

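```typescript
function extractVideoId(input: string): string | null {
  const url = input.trim();
  const patterns = [
    // youtube.com/watch?v=ID, including when v is not the first query parameter
    /youtube\.com\/watch\?(?:[^#]*&)?v=([a-zA-Z0-9_-]{11})/,
    /youtu\.be\/([a-zA-Z0-9_-]{11})/, // short links
    /youtube\.com\/embed\/([a-zA-Z0-9_-]{11})/, // embed URLs
    /youtube\.com\/shorts\/([a-zA-Z0-9_-]{11})/, // Shorts
  ];
  for (const pattern of patterns) {
    const match = url.match(pattern);
    if (match) return match[1];
  }
  // Also accept a bare 11-character video ID.
  if (/^[a-zA-Z0-9_-]{11}$/.test(url)) return url;
  return null;
}
```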
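Regexes get brittle as URL variants pile up. An alternative is to parse the input with the standard WHATWG `URL` class and read the `v` query parameter directly. `extractVideoIdWithUrlApi` is a hypothetical name. The hosts and path prefixes it accepts (`youtu.be`; `youtube.com` with or without `www.`/`m.`; `/watch`, `/embed/`, `/shorts/`) are an assumption you may need to extend:

```typescript
// An 11-character YouTube video ID: letters, digits, '-' and '_'.
const VIDEO_ID_RE = /^[a-zA-Z0-9_-]{11}$/;

// Hypothetical alternative to the regex parser: let the URL class handle query strings.
function extractVideoIdWithUrlApi(input: string): string | null {
  const trimmed = input.trim();
  if (VIDEO_ID_RE.test(trimmed)) return trimmed; // bare video ID

  let parsed: URL;
  try {
    // Accept links pasted without a scheme, e.g. "youtu.be/<id>".
    parsed = new URL(/^https?:\/\//i.test(trimmed) ? trimmed : `https://${trimmed}`);
  } catch {
    return null; // not parseable as a URL
  }

  // Treat www. and m. subdomains as plain youtube.com.
  const host = parsed.hostname.replace(/^(www\.|m\.)/, '');
  const [first = '', second = ''] = parsed.pathname.split('/').filter(Boolean);

  let candidate: string | null = null;
  if (host === 'youtu.be') {
    candidate = first;
  } else if (host === 'youtube.com') {
    if (first === 'watch') candidate = parsed.searchParams.get('v'); // param order no longer matters
    else if (first === 'embed' || first === 'shorts') candidate = second;
  }

  return candidate !== null && VIDEO_ID_RE.test(candidate) ? candidate : null;
}
```

Reading `v` through `searchParams` makes the order of query parameters irrelevant. The final 11-character check still rejects links that only look like YouTube URLs.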

Step 3: Rate Limiting

Without rate limiting, your API will get hammered. I use Upstash Redis:

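```typescript
import { NextRequest, NextResponse } from 'next/server';
import { Ratelimit } from '@upstash/ratelimit';
import { Redis } from '@upstash/redis';

// Module scope: created once per serverless instance, not on every request.
// Redis.fromEnv() reads UPSTASH_REDIS_REST_URL and UPSTASH_REDIS_REST_TOKEN.
const ratelimit = new Ratelimit({
  redis: Redis.fromEnv(),
  limiter: Ratelimit.slidingWindow(10, '60 s'), // 10 requests per IP per 60 seconds
});

export async function POST(req: NextRequest) {
  // x-forwarded-for can hold a comma-separated proxy chain; the first entry is the client.
  const forwarded = req.headers.get('x-forwarded-for');
  const ip = forwarded?.split(',')[0]?.trim() || '127.0.0.1';
  const { success } = await ratelimit.limit(ip);
  if (!success) {
    return NextResponse.json({ error: 'Rate limit exceeded.' }, { status: 429 });
  }
  // ...then parse the body and fetch the transcript as in Step 1.
}
```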
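Upstash needs network credentials, which can be overkill for local development. A hypothetical stand-in is a sliding-window log limiter kept in process memory. This is a swapped-in technique, not how `@upstash/ratelimit` implements `slidingWindow`. Each serverless instance has its own memory, so it is unsuitable for production on Vercel:

```typescript
// Hypothetical in-process sliding-window log limiter for local development.
// Not Upstash's algorithm; state is per process and idle keys are never evicted.
class InMemorySlidingWindow {
  private readonly hits = new Map<string, number[]>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if `key` may make another request at time `now` (ms since epoch).
  allow(key: string, now: number = Date.now()): boolean {
    const windowStart = now - this.windowMs;
    // Drop timestamps that have slid out of the window.
    const recent = (this.hits.get(key) ?? []).filter((t) => t > windowStart);
    const allowed = recent.length < this.limit;
    if (allowed) recent.push(now); // only accepted requests count against the limit
    this.hits.set(key, recent);
    return allowed;
  }
}
```

`new InMemorySlidingWindow(10, 60_000)` with `allow(ip)` mirrors the 10-requests-per-minute policy above, but only within a single process. The injectable `now` keeps it easy to test.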

Step 4: Formatting the Output

Raw transcript data comes as segments with text, offset, and duration. Serve both plain text and timestamped versions:

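```typescript
// `transcript` is the segment array returned by YoutubeTranscript.fetchTranscript (Step 1).
const plainText = transcript.map((s) => s.text).join(' ').replace(/\s+/g, ' ').trim();

const withTimestamps = transcript.map((s) => ({
  time: formatTimestamp(s.offset),
  text: s.text,
}));

// Assumes `offset` is in milliseconds; check the unit your library version returns.
// Produces m:ss, or h:mm:ss once a video passes the one-hour mark.
function formatTimestamp(ms: number): string {
  const totalSeconds = Math.floor(ms / 1000);
  const hours = Math.floor(totalSeconds / 3600);
  const minutes = Math.floor((totalSeconds % 3600) / 60);
  const seconds = totalSeconds % 60;
  const ss = seconds.toString().padStart(2, '0');
  if (hours > 0) {
    return `${hours}:${minutes.toString().padStart(2, '0')}:${ss}`;
  }
  return `${minutes}:${ss}`;
}
```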
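The same `offset` and `duration` fields are enough to export subtitles. Below is a minimal sketch of an SRT formatter. It assumes both values are in milliseconds, which you should verify against your library version. `toSrt`, `toSrtTime`, and `CaptionSegment` are hypothetical names:

```typescript
// Segment shape from Step 4; offset and duration are ASSUMED to be milliseconds.
interface CaptionSegment {
  text: string;
  offset: number;
  duration: number;
}

// Formats milliseconds as an SRT timestamp: HH:MM:SS,mmm.
function toSrtTime(ms: number): string {
  const total = Math.max(0, Math.round(ms));
  const pad = (n: number, width = 2) => n.toString().padStart(width, '0');
  const hours = Math.floor(total / 3_600_000);
  const minutes = Math.floor((total % 3_600_000) / 60_000);
  const seconds = Math.floor((total % 60_000) / 1000);
  return `${pad(hours)}:${pad(minutes)}:${pad(seconds)},${pad(total % 1000, 3)}`;
}

// Hypothetical SRT export: numbered cues, "start --> end" line, text, blank line between cues.
function toSrt(segments: CaptionSegment[]): string {
  return segments
    .map((seg, i) => {
      const start = toSrtTime(seg.offset);
      const end = toSrtTime(seg.offset + seg.duration);
      return `${i + 1}\n${start} --> ${end}\n${seg.text.trim()}\n`;
    })
    .join('\n');
}
```

Each cue's end time is `offset + duration`, so no extra bookkeeping is needed beyond what the API already returns.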

Gotchas I Hit

  1. Not all videos have transcripts. Some creators disable captions entirely.
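  2. Auto-generated captions contain errors, especially with technical terms and heavy accents.
  3. YouTube occasionally changes its internal endpoints, which breaks scraping libraries until they ship a fix. Pin your dependency versions so upgrades are deliberate, and watch for upstream fixes.
  4. Long videos mean large payloads. A 3-hour video can run to tens of thousands of words.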
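For the large-payload problem in point 4, one option is to split long transcripts into size-bounded chunks and serve them a page at a time. The sketch below works under that assumption. `chunkSegments`, `TimedText`, and the character budget are hypothetical, and a segment is never split across chunks:

```typescript
// Segment shape assumed from Step 4.
interface TimedText {
  text: string;
  offset: number;
  duration: number;
}

// Hypothetical helper: groups consecutive segments into chunks whose joined text
// stays within `maxChars`, without cutting any segment in half.
function chunkSegments(segments: TimedText[], maxChars: number): TimedText[][] {
  const chunks: TimedText[][] = [];
  let current: TimedText[] = [];
  let size = 0;
  for (const seg of segments) {
    const len = seg.text.length + 1; // +1 for the joining space
    // Close the current chunk if this segment would overflow it; a single
    // oversize segment still gets a chunk of its own.
    if (current.length > 0 && size + len > maxChars) {
      chunks.push(current);
      current = [];
      size = 0;
    }
    current.push(seg);
    size += len;
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}
```

A route could then accept a hypothetical `page` query parameter and return `chunkSegments(segments, 20_000)[page]` together with the total chunk count, so the client knows when to stop.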

Deployment

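Deploy to Vercel with vercel --prod. The route handlers become serverless functions automatically. Before deploying, add UPSTASH_REDIS_REST_URL and UPSTASH_REDIS_REST_TOKEN to the project's environment variables, since Redis.fromEnv() reads them at runtime.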

This is essentially what powers ScripTube. The architecture is simple; the complexity is in the edge cases.

If you're building something similar, start simple. One input. One button. Ship it.

Check out ScripTube →
