DEV Community

Nic Bars


Building a YouTube Transcript Extraction Service: From Idea to 10K+ Monthly Users


Hey developers! 👋

A few months ago, I got frustrated paying $50/month for simple YouTube transcript extraction tools. As a developer, I thought "how hard can this be?" - famous last words, right?

Turns out, building a robust YouTube transcript service taught me more about web scraping, API rate limits, and user experience than I expected. Here's how I built it from scratch.

The Problem I Was Solving

Most transcript tools either:

  • Cost way too much ($30-50/month)
  • Have terrible UX with ads everywhere
  • Don't preserve timestamps
  • Can't handle different languages
  • Break when YouTube changes their structure

I wanted something clean, fast, and free, so I built YouTubeNavigator.com.

Tech Stack Overview

Frontend:

  • Next.js 14 (App Router)
  • TypeScript
  • Tailwind CSS
  • React Hook Form

Backend:

  • Next.js API Routes
  • Node.js
  • youtube-transcript (npm package)
  • Vercel for deployment

Key Libraries:

npm install youtube-transcript
npm install get-video-id
npm install react-youtube
npm install react-hot-toast


Step 1: Understanding YouTube's Transcript System

YouTube stores transcripts in a format that's not immediately obvious, so I let the youtube-transcript package handle fetching and parsing. The basic call looks like this:

// Basic transcript fetching
import { YoutubeTranscript } from 'youtube-transcript';

async function getTranscript(videoId: string) {
  try {
    // Resolves to an array of caption segments ({ text, duration, offset, ... })
    const transcript = await YoutubeTranscript.fetchTranscript(videoId);
    return transcript;
  } catch (error) {
    console.error('Transcript fetch failed:', error);
    throw new Error('No transcript available');
  }
}

The tricky part? A single video can have several caption tracks, and they differ a lot:

  • Auto-generated captions
  • Manually uploaded captions
  • Tracks in different languages
  • Varying accuracy, from carefully edited to rough machine output
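Because a video may have no English track, or only an auto-generated one, it helps to try a few languages in order before giving up. Here is a minimal sketch of that idea, not the code running on the site: `fetchWithLanguageFallback`, `FetchFn`, and `TranscriptSegment` are hypothetical names, and the fetcher is passed in so the logic can be tested without network access.

```typescript
// Hypothetical local types; youtube-transcript's own result type has a similar shape
interface TranscriptSegment {
  text: string;
  offset: number;
  duration: number;
}

type FetchFn = (videoId: string, lang?: string) => Promise<TranscriptSegment[]>;

async function fetchWithLanguageFallback(
  videoId: string,
  preferredLangs: string[],
  fetchFn: FetchFn,
): Promise<{ lang: string | null; segments: TranscriptSegment[] }> {
  for (const lang of preferredLangs) {
    try {
      const segments = await fetchFn(videoId, lang);
      if (segments.length > 0) return { lang, segments };
    } catch {
      // No track in this language (or the request failed): try the next one
    }
  }
  // Last resort: let the fetcher choose the video's default track.
  // If this throws too, the error propagates to the caller.
  const segments = await fetchFn(videoId);
  return { lang: null, segments };
}
```

To plug in youtube-transcript, an adapter such as `(id, lang) => YoutubeTranscript.fetchTranscript(id, lang ? { lang } : undefined)` should fit, since `fetchTranscript` accepts an optional config with a `lang` field (as the route below uses). Returning the chosen language also lets the UI tell users which track they got.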

Step 2: Building the API Endpoint

Here's my main API route that handles the heavy lifting:

// app/api/fetch-transcript/route.ts
import { NextRequest, NextResponse } from 'next/server';
import { YoutubeTranscript } from 'youtube-transcript';
import getVideoId from 'get-video-id';

export async function GET(request: NextRequest) {
  const { searchParams } = new URL(request.url);
  const url = searchParams.get('url');
  const includeTimestamps = searchParams.get('timestamps') === 'true';

  if (!url) {
    return NextResponse.json({ error: 'URL is required' }, { status: 400 });
  }

  // Extract the video ID from the various YouTube URL formats.
  // get-video-id also recognizes other hosts (Vimeo etc.), so check the service too.
  const videoData = getVideoId(url);
  if (!videoData.id || videoData.service !== 'youtube') {
    // A bad URL is the client's mistake: answer 400 instead of falling through to the 500 below
    return NextResponse.json({ error: 'Invalid YouTube URL' }, { status: 400 });
  }

  try {
    // Request the English track; videos without one throw and land in the catch block
    const transcriptData = await YoutubeTranscript.fetchTranscript(videoData.id, {
      lang: 'en',
    });

    if (includeTimestamps) {
      // Keep timestamps. Like the rest of this post, this treats `offset` as milliseconds;
      // verify the unit against the youtube-transcript version you install.
      const formattedTranscript = transcriptData.map(item => ({
        timestamp: formatTime(item.offset),
        text: item.text,
        startTimeMs: item.offset
      }));

      return NextResponse.json({
        transcript: formattedTranscript.map(item =>
          `${item.timestamp} ${item.text}`
        ).join('\n'),
        segments: formattedTranscript
      });
    }

    // Plain text version
    const plainText = transcriptData.map(item => item.text).join(' ');

    return NextResponse.json({ transcript: plainText });

  } catch (error) {
    console.error('Transcript extraction failed:', error);
    return NextResponse.json(
      { error: 'Failed to extract transcript. Video may not have captions.' },
      { status: 500 }
    );
  }
}

// Formats milliseconds as M:SS
function formatTime(milliseconds: number) {
  const totalSeconds = Math.floor(milliseconds / 1000);
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = totalSeconds % 60;
  return `${minutes}:${seconds.toString().padStart(2, '0')}`;
}
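`formatTime` in the route counts everything in minutes, so a segment 75 minutes into a video renders as `75:03`. If you'd rather match the `H:MM:SS` style YouTube shows for long videos, a variant along these lines should work (`formatTimestamp` is a hypothetical name; like the route, it assumes the input is in milliseconds):

```typescript
// Hypothetical H:MM:SS variant of formatTime; input assumed to be milliseconds
function formatTimestamp(milliseconds: number): string {
  const totalSeconds = Math.floor(milliseconds / 1000);
  const hours = Math.floor(totalSeconds / 3600);
  const minutes = Math.floor((totalSeconds % 3600) / 60);
  const seconds = totalSeconds % 60;
  const ss = seconds.toString().padStart(2, '0');
  // Under an hour, output matches formatTime (M:SS)
  if (hours === 0) return `${minutes}:${ss}`;
  return `${hours}:${minutes.toString().padStart(2, '0')}:${ss}`;
}
```

For example, `formatTimestamp(4503000)` returns `1:15:03`, while anything under an hour looks exactly as before.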

Step 3: Frontend Implementation

The frontend needed to be dead simple. Here's the core component:

// components/TranscriptExtractor.tsx
'use client';

import { useState, type FormEvent } from 'react';
// react-hot-toast only renders toasts when a <Toaster /> is mounted (e.g. in the root layout)
import { toast } from 'react-hot-toast';

export default function TranscriptExtractor() {
  const [url, setUrl] = useState('');
  const [transcript, setTranscript] = useState('');
  const [loading, setLoading] = useState(false);

  const handleSubmit = async (e: FormEvent<HTMLFormElement>) => {
    e.preventDefault();
    if (!url) return;

    setLoading(true);
    try {
      const response = await fetch(
        `/api/fetch-transcript?url=${encodeURIComponent(url)}&timestamps=true`
      );

      if (!response.ok) {
        const error = await response.json();
        throw new Error(error.error);
      }

      const data = await response.json();
      setTranscript(data.transcript);
      toast.success('Transcript extracted successfully!');

    } catch (error) {
      // Under strict TypeScript the caught value is `unknown`, so narrow it before reading .message
      toast.error(error instanceof Error ? error.message : 'Something went wrong');
    } finally {
      setLoading(false);
    }
  };

  return (
    <div className="max-w-4xl mx-auto p-6">
      <form onSubmit={handleSubmit} className="mb-8">
        <div className="flex gap-4">
          <input
            type="url"
            value={url}
            onChange={(e) => setUrl(e.target.value)}
            placeholder="Paste YouTube URL here..."
            className="flex-1 px-4 py-2 border rounded-lg"
            required
          />
          <button
            type="submit"
            disabled={loading}
            className="px-6 py-2 bg-blue-600 text-white rounded-lg disabled:opacity-50"
          >
            {loading ? 'Extracting...' : 'Get Transcript'}
          </button>
        </div>
      </form>

      {transcript && (
        <div className="bg-gray-50 p-6 rounded-lg">
          <pre className="whitespace-pre-wrap text-sm">
            {transcript}
          </pre>
        </div>
      )}
    </div>
  );
}

Step 4: Adding Advanced Features

Interactive Timestamps

One feature that sets my transcript tool apart is clickable timestamps that jump to video positions:

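// Enhanced transcript display with video integration
'use client';

import { useState } from 'react';
import YouTube from 'react-youtube';
// Assumption: PlayIcon comes from an icon library such as Heroicons;
// swap in whatever icon component your project uses
import { PlayIcon } from '@heroicons/react/24/solid';

function InteractiveTranscript({ segments, videoId }) {
  const [player, setPlayer] = useState(null);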

  const seekToTime = (timeInSeconds) => {
    if (player) {
      player.seekTo(timeInSeconds, true);
      player.playVideo();
    }
  };

  return (
    <div className="grid grid-cols-1 lg:grid-cols-2 gap-6">
      {/* YouTube Player */}
      <div>
        <YouTube
          videoId={videoId}
          onReady={(event) => setPlayer(event.target)}
          opts={{
            height: '400',
            width: '100%',
            playerVars: { autoplay: 0, controls: 1 }
          }}
        />
      </div>

      {/* Interactive Transcript */}
      <div className="max-h-96 overflow-y-auto">
        {segments.map((segment, index) => (
          <div key={index} className="mb-4 group">
            <button
              onClick={() => seekToTime(Math.floor(segment.startTimeMs / 1000))}
              className="text-blue-600 hover:text-blue-800 font-mono text-sm mb-1 flex items-center gap-2"
            >
              <PlayIcon className="w-3 h-3 opacity-0 group-hover:opacity-100" />
              {segment.timestamp}
            </button>
            <p className="text-gray-800 text-sm ml-5">
              {segment.text}
            </p>
          </div>
        ))}
      </div>
    </div>
  );
}

Multiple Export Formats

Users wanted different formats, so I added SRT subtitle generation:

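function generateSRT(segments) {
  return segments.map((segment, index) => {
    const startTime = formatTimeForSRT(segment.startTimeMs);
    // Segments only carry a start time, so give each cue 3 seconds,
    // ending early if the next segment starts sooner (keeps cues from overlapping)
    const next = segments[index + 1];
    const endMs = next
      ? Math.min(segment.startTimeMs + 3000, next.startTimeMs)
      : segment.startTimeMs + 3000;
    const endTime = formatTimeForSRT(endMs);

    return `${index + 1}
${startTime} --> ${endTime}
${segment.text}
`;
  }).join('\n');
}

function formatTimeForSRT(milliseconds) {
  // SRT timestamps are HH:MM:SS,mmm; floor first so fractional input still pads to 3 digits
  const totalMs = Math.floor(milliseconds);
  const totalSeconds = Math.floor(totalMs / 1000);
  const hours = Math.floor(totalSeconds / 3600);
  const minutes = Math.floor((totalSeconds % 3600) / 60);
  const seconds = totalSeconds % 60;
  const ms = totalMs % 1000;

  return `${hours.toString().padStart(2, '0')}:${minutes.toString().padStart(2, '0')}:${seconds.toString().padStart(2, '0')},${ms.toString().padStart(3, '0')}`;
}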

Step 5: Handling Edge Cases

Real-world usage taught me about edge cases:

URL Validation

function isValidYouTubeUrl(url) {
  const patterns = [
    /^https?:\/\/(www\.|m\.)?youtube\.com\/watch\?v=[\w-]+/,
    /^https?:\/\/youtu\.be\/[\w-]+/,
    /^https?:\/\/(www\.)?youtube\.com\/embed\/[\w-]+/,
    /^https?:\/\/(www\.|m\.)?youtube\.com\/shorts\/[\w-]+/
  ];

  return patterns.some(pattern => pattern.test(url));
}
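Regexes get brittle fast: the `watch` pattern above only matches when `v` is the first query parameter, so a link like `watch?feature=share&v=...` is rejected. A sketch of an alternative that leans on the standard `URL` parser instead (`extractYouTubeId` is a hypothetical helper, and the host list, the `embed`/`shorts` path rules, and the 11-character ID check are assumptions based on common YouTube links, not an official spec):

```typescript
// Hypothetical helper: parse with the standard URL class, then inspect host and path
const YOUTUBE_HOSTS = new Set(['youtube.com', 'www.youtube.com', 'm.youtube.com']);
// Assumption: current YouTube IDs are 11 characters from [A-Za-z0-9_-]
const VIDEO_ID = /^[\w-]{11}$/;

function extractYouTubeId(input: string): string | null {
  let url: URL;
  try {
    url = new URL(input);
  } catch {
    return null; // not an absolute URL
  }

  let id: string | null = null;
  if (url.hostname === 'youtu.be') {
    // Short links carry the ID as the first path segment
    id = url.pathname.slice(1).split('/')[0] ?? null;
  } else if (YOUTUBE_HOSTS.has(url.hostname)) {
    if (url.pathname === '/watch') {
      // searchParams finds `v` wherever it sits in the query string
      id = url.searchParams.get('v');
    } else {
      const match = url.pathname.match(/^\/(embed|shorts)\/([^/]+)/);
      id = match?.[2] ?? null;
    }
  }
  return id !== null && VIDEO_ID.test(id) ? id : null;
}
```

Because every branch funnels into one final ID check, supporting a new link shape means adding a branch rather than another full-URL regex.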

Rate Limiting

// Simple in-memory rate limiting
const rateLimiter = new Map();

function checkRateLimit(ip) {
  const now = Date.now();
  const windowMs = 60 * 1000; // 1 minute
  const maxRequests = 10;

  if (!rateLimiter.has(ip)) {
    rateLimiter.set(ip, { count: 1, resetTime: now + windowMs });
    return true;
  }

  const limit = rateLimiter.get(ip);
  if (now > limit.resetTime) {
    limit.count = 1;
    limit.resetTime = now + windowMs;
    return true;
  }

  if (limit.count >= maxRequests) {
    return false;
  }

  limit.count++;
  return true;
}
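Two caveats with the limiter above: the `Map` keeps an entry for every IP it has ever seen, and on serverless hosting such as Vercel each function instance has its own memory, so the limit is enforced per instance rather than globally (a shared store like Redis would be needed for a global limit). Here's a sketch addressing the first issue by sweeping expired windows; `FixedWindowLimiter` is a hypothetical class, and the clock is injectable so the logic can be tested:

```typescript
// Hypothetical fixed-window limiter: same rules as checkRateLimit, plus cleanup
class FixedWindowLimiter {
  private readonly windows = new Map<string, { count: number; resetTime: number }>();
  private lastSweep = 0;
  private readonly maxRequests: number;
  private readonly windowMs: number;
  private readonly now: () => number;

  constructor(maxRequests = 10, windowMs = 60_000, now: () => number = Date.now) {
    this.maxRequests = maxRequests;
    this.windowMs = windowMs;
    this.now = now; // injectable clock, handy in tests
  }

  allow(key: string): boolean {
    const t = this.now();
    this.sweep(t);
    const entry = this.windows.get(key);
    if (!entry || t > entry.resetTime) {
      // First request or expired window: start a fresh window
      this.windows.set(key, { count: 1, resetTime: t + this.windowMs });
      return true;
    }
    if (entry.count >= this.maxRequests) return false;
    entry.count++;
    return true;
  }

  get size(): number {
    return this.windows.size;
  }

  // At most once per window, drop expired entries so the Map only holds recently active keys
  private sweep(t: number): void {
    if (t - this.lastSweep < this.windowMs) return;
    this.lastSweep = t;
    for (const [key, entry] of this.windows) {
      if (t > entry.resetTime) this.windows.delete(key);
    }
  }
}
```

In the route, you would call `limiter.allow(ip)` before fetching anything and return a 429 response when it comes back `false`; the IP would typically come from a proxy header such as `x-forwarded-for`.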

Step 6: Performance Optimizations

Caching Strategy

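// In-memory TTL cache for transcripts (lives in one server instance; unlike Redis, it isn't shared)
const cache = new Map();
const CACHE_TTL = 24 * 60 * 60 * 1000; // 24 hours

function getCachedTranscript(videoId) {
  const cached = cache.get(videoId);
  if (!cached) return null;
  if (Date.now() - cached.timestamp < CACHE_TTL) {
    return cached.data;
  }
  // Expired: evict so stale transcripts don't pile up in memory
  cache.delete(videoId);
  return null;
}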

function setCachedTranscript(videoId, data) {
  cache.set(videoId, {
    data,
    timestamp: Date.now()
  });
}
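The helpers above store and read entries; what actually cuts repeat requests to YouTube is consulting the cache before every fetch. A minimal cache-aside sketch, assuming any fetcher that takes a video ID (`cachedFetcher` and `Fetcher` are hypothetical names, with the fetcher and clock injected for testing):

```typescript
// Hypothetical cache-aside wrapper: serve fresh hits, otherwise fetch and store
type Fetcher<T> = (videoId: string) => Promise<T>;

function cachedFetcher<T>(
  fetcher: Fetcher<T>,
  ttlMs = 24 * 60 * 60 * 1000, // 24 hours, matching CACHE_TTL above
  now: () => number = Date.now,
): Fetcher<T> {
  const store = new Map<string, { data: T; storedAt: number }>();
  return async (videoId) => {
    const hit = store.get(videoId);
    if (hit && now() - hit.storedAt < ttlMs) return hit.data; // fresh hit
    const data = await fetcher(videoId); // miss or stale entry: fetch again
    store.set(videoId, { data, storedAt: now() }); // only reached if the fetch succeeded
    return data;
  };
}
```

Only successful results are stored, because `store.set` runs after `await fetcher(videoId)` resolves; a failed fetch (say, a video without captions) is retried on the next request instead of being cached.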

Lazy Loading Components

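// Dynamic imports for better performance.
// Keep this in a Client Component ('use client'): `ssr: false` is meant for client code,
// and recent Next.js versions reject it inside Server Components.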
import dynamic from 'next/dynamic';

const TranscriptResult = dynamic(() => import('./TranscriptResult'), {
  ssr: false,
  loading: () => <div>Loading transcript...</div>
});

Step 7: SEO and Discoverability

Since I wanted the YouTube transcript tool to rank well, I focused on SEO:

// SEO-optimized metadata
import type { Metadata } from 'next';

export const metadata: Metadata = {
  title: 'Free YouTube Transcript Extractor - Download Video Transcripts',
  description: 'Extract YouTube video transcripts for free. Download as TXT, SRT with timestamps. No signup required.',
  keywords: 'youtube transcript, video transcript, subtitle extractor, free transcript tool',
  openGraph: {
    title: 'YouTube Transcript Extractor',
    description: 'Free tool to extract and download YouTube video transcripts',
    url: 'https://youtubenavigator.com/youtube-transcript',
  }
};

Lessons Learned

  1. Start Simple: My first version was just a form and a text area. I added features based on user feedback.
  2. Error Handling is Critical: Transcript fetching is unpredictable (missing captions, unavailable languages, changes on YouTube's side). Robust error handling saved me countless support requests.
  3. Performance Matters: Caching reduced API calls by 80% and improved response times significantly.
  4. User Experience Wins: The interactive timestamps feature got more positive feedback than anything else.
  5. SEO Takes Time: It took 3 months to start ranking for "YouTube transcript" keywords.

Current Stats

After 6 months, the transcript extraction service now handles:

  • 10,000+ monthly active users
  • 500,000+ transcripts extracted
  • 99.2% uptime
  • Average response time: 1.2 seconds

What's Next?

I'm working on:

  • Multi-language transcript support
  • Batch processing for multiple videos
  • API access for developers
  • Integration with popular note-taking apps

Try It Yourself

Want to see it in action? Check out the YouTube Transcript Extractor. It's completely free and requires no signup.

BTW, I’ve also built a few other tools that complement it.

The code and concepts I've shared here should give you a solid foundation for building your own transcript service. The key is starting simple and iterating based on real user needs.


Building this YouTube transcript service taught me that sometimes the best products come from solving your own problems. What developer tool will you build next?


Have questions about implementing any of these features? Drop them in the comments! I'm always happy to help fellow developers build cool stuff.

Tags: #webdev #javascript #nextjs #youtube #api #typescript #react
