We've all been there. You see a 2-hour podcast titled "The Future of AI is Here". You want the insights, but you don't have 2 hours.
Sure, there are tools that do this. But why pay $20/month when you can build your own in 15 minutes?
In this tutorial, we'll build a YouTube Video Summarizer that:
- Takes a YouTube URL.
- Extracts the full transcript (even if captions are auto-generated).
- Uses GPT-4o-mini to summarize it into bullet points.
The Stack
- Frontend: Next.js 14 (App Router)
- AI: OpenAI API (GPT-4o-mini)
- Data: SociaVault API (YouTube Transcript Extraction)
Why not just use youtube-dl?
You could use youtube-dl or ytdl-core, but YouTube frequently changes its page and player internals and rate-limits server-side requests. If you deploy a ytdl-core app to Vercel, it will likely be blocked quickly, because requests from cloud-hosting IP ranges are often flagged.
We'll use SociaVault because it handles the proxies and rotation for us.
Step 1: Setup
Create a new Next.js app:
npx create-next-app@latest yt-summarizer
cd yt-summarizer
npm install openai
Get your API keys:
- OpenAI Key: platform.openai.com
- SociaVault Key: sociavault.com (Free tier works fine)
Add them to .env.local:
OPENAI_API_KEY=sk-...
SOCIAVAULT_API_KEY=sv_...
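A missing or empty key usually only surfaces later as a confusing 401 or 500. The sketch below shows one way to fail early instead. `requireEnv` is a hypothetical helper, not part of Next.js or either SDK; it takes the environment as a plain object so it is easy to try out in isolation.

```typescript
// Hypothetical helper (not part of Next.js, the OpenAI SDK, or SociaVault):
// read a required variable from an env-like object and fail loudly if it is
// missing or blank.
type Env = Record<string, string | undefined>;

function requireEnv(env: Env, name: string): string {
  const raw = env[name];
  const value = raw === undefined ? "" : raw.trim();
  if (value === "") {
    throw new Error(`Missing environment variable: ${name}`);
  }
  return value;
}

// Example with an inline object standing in for the real environment.
const exampleEnv: Env = { OPENAI_API_KEY: "sk-example", SOCIAVAULT_API_KEY: "" };
const openaiKey = requireEnv(exampleEnv, "OPENAI_API_KEY"); // "sk-example"
```

In the route you would pass `process.env` as the first argument, for example when constructing the OpenAI client.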
Step 2: The Backend Route
We need a server-side route so the API keys never reach the browser. Create app/api/summarize/route.ts.
This route does two things:
- Fetches the transcript.
- Sends it to OpenAI.
import { NextResponse } from "next/server";
import OpenAI from "openai";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

export async function POST(req: Request) {
  try {
    const { videoUrl } = await req.json();

    // 1. Extract Video ID (this simple split only handles watch?v=... URLs)
    const videoId =
      typeof videoUrl === "string"
        ? videoUrl.split("v=")[1]?.split("&")[0]
        : undefined;
    if (!videoId) {
      return NextResponse.json({ error: "Invalid URL" }, { status: 400 });
    }

    // 2. Get Transcript from SociaVault
    const transcriptRes = await fetch(
      `https://api.sociavault.com/api/v1/youtube/video/${encodeURIComponent(videoId)}/transcript`,
      {
        headers: { "x-api-key": process.env.SOCIAVAULT_API_KEY! },
      }
    );
    if (!transcriptRes.ok) throw new Error("Failed to fetch transcript");
    const data = await transcriptRes.json();

    // Combine segments into one text block
    const fullText = data.transcript
      .map((item: any) => item.text)
      .join(" ")
      .slice(0, 15000); // Cap at 15,000 characters; longer transcripts are truncated

    // 3. Summarize with OpenAI
    const completion = await openai.chat.completions.create({
      model: "gpt-4o-mini",
      messages: [
        {
          role: "system",
          content:
            "You are a helpful assistant. Summarize the following YouTube video transcript into 5 key bullet points. Be concise.",
        },
        { role: "user", content: fullText },
      ],
    });

    return NextResponse.json({
      summary: completion.choices[0].message.content,
    });
  } catch (error) {
    // Log the real cause on the server; send the client a generic message
    console.error(error);
    return NextResponse.json({ error: "Something went wrong" }, { status: 500 });
  }
}
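The `split("v=")` approach in the route only recognizes `watch?v=...` links, so a `youtu.be` share link or a Shorts URL would be rejected as invalid. Here is a minimal sketch of a more tolerant parser. `extractVideoId` is a hypothetical helper, and the 11-character ID pattern is an observed convention rather than a documented guarantee.

```typescript
// Sketch: pull the video ID out of common YouTube URL shapes.
// Assumption: IDs are 11 characters drawn from A-Z, a-z, 0-9, "-" and "_".
const ID_PATTERN = /^[A-Za-z0-9_-]{11}$/;

function extractVideoId(input: string): string | null {
  let url: URL;
  try {
    url = new URL(input.trim());
  } catch (e) {
    return null; // not a parseable absolute URL
  }

  // Treat www., m. and music. subdomains like the bare domain.
  const host = url.hostname.replace(/^(www\.|m\.|music\.)/, "");
  let candidate: string | null = null;

  if (host === "youtu.be") {
    // https://youtu.be/<id>
    candidate = url.pathname.split("/")[1] || null;
  } else if (host === "youtube.com") {
    if (url.pathname === "/watch") {
      // https://www.youtube.com/watch?v=<id>&t=42s
      candidate = url.searchParams.get("v");
    } else {
      // /shorts/<id>, /embed/<id>, /live/<id>
      const match = url.pathname.match(/^\/(shorts|embed|live)\/([^/]+)/);
      candidate = match ? match[2] : null;
    }
  }

  return candidate !== null && ID_PATTERN.test(candidate) ? candidate : null;
}
```

Swapping this in for the one-line split would also reject look-alike hosts, since the ID is only accepted when the hostname is a YouTube domain.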
Step 3: The Frontend
Now, a simple UI to take the input. Replace the contents of app/page.tsx with:
"use client";

import { useState } from "react";

export default function Home() {
  const [url, setUrl] = useState("");
  const [summary, setSummary] = useState("");
  const [error, setError] = useState("");
  const [loading, setLoading] = useState(false);

  const handleSummarize = async () => {
    setLoading(true);
    setError("");
    setSummary("");
    try {
      const res = await fetch("/api/summarize", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ videoUrl: url }),
      });
      const data = await res.json();
      if (!res.ok) throw new Error(data.error ?? "Request failed");
      setSummary(data.summary ?? "");
    } catch (err) {
      setError(err instanceof Error ? err.message : "Request failed");
    } finally {
      // Always re-enable the button, even if the request failed
      setLoading(false);
    }
  };

  return (
    <main className="min-h-screen flex flex-col items-center justify-center p-24 bg-gray-50">
      <div className="max-w-2xl w-full space-y-8">
        <h1 className="text-4xl font-bold text-center text-gray-900">
          📺 YouTube Summarizer
        </h1>
        <div className="flex gap-4">
          <input
            type="text"
            placeholder="Paste YouTube URL..."
            className="flex-1 p-4 rounded-lg border border-gray-300"
            value={url}
            onChange={(e) => setUrl(e.target.value)}
          />
          <button
            onClick={handleSummarize}
            disabled={loading}
            className="bg-blue-600 text-white px-8 py-4 rounded-lg hover:bg-blue-700 disabled:opacity-50"
          >
            {loading ? "Thinking..." : "Summarize"}
          </button>
        </div>
        {error && <p className="text-red-600">{error}</p>}
        {summary && (
          <div className="bg-white p-8 rounded-xl shadow-sm border prose">
            <h3 className="text-xl font-semibold mb-4">Summary</h3>
            <div className="whitespace-pre-wrap">{summary}</div>
          </div>
        )}
      </div>
    </main>
  );
}
Testing It Out
Run npm run dev and paste a URL.
I tested it on a 45-minute Lex Fridman podcast.
Result: It extracted the transcript in ~2 seconds and generated a summary in ~5 seconds.
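One caveat: the route keeps only the first 15,000 characters of the transcript, and a 45-minute conversation will likely run well past that, so the summary covers just the opening portion. A common workaround is to split the transcript into chunks, summarize each chunk, then summarize the summaries. The sketch below covers only the splitting step; `splitIntoChunks` is a hypothetical helper and the chunk size is yours to tune.

```typescript
// Sketch: split text into chunks of at most maxChars characters,
// breaking on whitespace where possible.
function splitIntoChunks(text: string, maxChars: number): string[] {
  if (maxChars <= 0) throw new Error("maxChars must be positive");

  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const chunks: string[] = [];
  let current = "";

  for (let word of words) {
    // A single word longer than the limit has to be hard-split.
    while (word.length > maxChars) {
      if (current) {
        chunks.push(current);
        current = "";
      }
      chunks.push(word.slice(0, maxChars));
      word = word.slice(maxChars);
    }
    if (!word) continue;

    const candidate = current ? `${current} ${word}` : word;
    if (candidate.length > maxChars) {
      // Adding this word would overflow: close the chunk and start a new one.
      chunks.push(current);
      current = word;
    } else {
      current = candidate;
    }
  }

  if (current) chunks.push(current);
  return chunks;
}

const demoChunks = splitIntoChunks("one two three four", 9);
```

Each chunk would then go through the same `openai.chat.completions.create` call used in the route, at the cost of one extra request per chunk plus a final combining request.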
Total cost per run?
- SociaVault: Free tier covers 50 requests.
- OpenAI: ~$0.01 per summary.
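To sanity-check the OpenAI figure yourself, here is a rough back-of-the-envelope estimate. Everything numeric in it is an assumption: about 4 characters per token is a rule of thumb for English text, and the per-token prices are the gpt-4o-mini list prices as I understand them, so check the current pricing page before relying on them.

```typescript
// Assumptions (verify before relying on them):
// - roughly 4 characters per token for English text
// - gpt-4o-mini at $0.15 per 1M input tokens and $0.60 per 1M output tokens
const CHARS_PER_TOKEN = 4;
const INPUT_USD_PER_MILLION = 0.15;
const OUTPUT_USD_PER_MILLION = 0.6;

function estimateCostUsd(promptChars: number, outputTokens: number): number {
  const inputTokens = Math.ceil(promptChars / CHARS_PER_TOKEN);
  const micros =
    inputTokens * INPUT_USD_PER_MILLION + outputTokens * OUTPUT_USD_PER_MILLION;
  return micros / 1000000;
}

// The route caps the transcript at 15,000 characters; assume a 300-token summary.
const estimate = estimateCostUsd(15000, 300);
console.log(`~$${estimate.toFixed(4)} per summary`);
```

Under these assumptions a single run comes out well below one cent, so ~$0.01 is a comfortable upper bound.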
Next Steps
You can take this further:
- Chat with Video: Store the transcript in a vector DB (such as Pinecone) and use RAG to ask questions like "What did he say about aliens?".
- Timestamp Linking: The API returns timestamps for every sentence. You could make the summary bullets clickable to jump to that part of the video.
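For the timestamp idea, the link itself is simple: YouTube watch URLs accept a `t` parameter in seconds. The sketch below assumes each transcript segment carries a `text` field and a `start` offset in seconds. Those field names are a guess, so check the actual transcript response before using them.

```typescript
// Hypothetical segment shape: `text` and `start` (seconds) are assumed
// field names, not confirmed against the transcript API.
interface TranscriptSegment {
  text: string;
  start: number;
}

// Build a watch URL that starts playback at the given offset.
function timestampUrl(videoId: string, startSeconds: number): string {
  const t = Math.max(0, Math.floor(startSeconds));
  return `https://www.youtube.com/watch?v=${encodeURIComponent(videoId)}&t=${t}s`;
}

// Format an offset as m:ss, or h:mm:ss once it passes an hour.
function formatTimestamp(startSeconds: number): string {
  const total = Math.max(0, Math.floor(startSeconds));
  const pad = (n: number): string => (n < 10 ? `0${n}` : `${n}`);
  const h = Math.floor(total / 3600);
  const m = Math.floor((total % 3600) / 60);
  const s = total % 60;
  return h > 0 ? `${h}:${pad(m)}:${pad(s)}` : `${m}:${pad(s)}`;
}

const segment: TranscriptSegment = { text: "Intro", start: 95.4 };
const link = timestampUrl("dQw4w9WgXcQ", segment.start);
const label = formatTimestamp(segment.start);
```

To make summary bullets clickable, you would also need the model to report which segment each bullet came from, for example by including the start offsets in the prompt.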
Happy building!