DEV Community

The MOMD

I got tired of 40-minute tutorials, so I built an AI YouTube Summarizer with Next.js ⚡

Have you ever clicked on a 40-minute YouTube tutorial only to realize the core information could have been conveyed in a 2-minute read? As developers, time is our most valuable asset.

That's exactly why I built YT Summarizer, a fast, open-source web application that extracts the transcript from a YouTube video (as long as the video has captions) and uses Google's Gemini 2.5 Flash model to generate a readable, well-formatted Markdown summary.

In this article, I’ll walk you through why I built it, the tech stack I chose, and how you can use (or contribute to) the project!

🚀 The Problem
Video content is great for visual learners but poor for quick reference. If I need to recall a specific command or concept from a video I watched yesterday, I have to scrub through the timeline to find it. I wanted a tool that:

Gives me the text instantly.
Formats it cleanly so I can paste it straight into Notion or Obsidian.
Automatically detects the language (especially my native language, Farsi) and adjusts the layout accordingly, for example right-to-left text for Farsi.

🛠️ The Tech Stack
I wanted to build this to be as lightweight, modern, and cost-effective as possible. Here is what I used:

Next.js (App Router): For file-based routing and server-side API endpoints (Route Handlers).
Vanilla CSS: I skipped styling frameworks entirely. Standard CSS custom properties (variables) keep the app lightweight and make native dark and light modes easy to support.
Google Gemini 2.5 Flash: Gemini's free tier is generous and the model is fast. It handles large blocks of transcript text well and reliably returns structured Markdown.
youtube-transcript: A small npm package that fetches captions without requiring a YouTube Data API key.
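youtube-transcript returns captions as an array of timed segments (each with `text`, `duration`, and `offset` fields) rather than one string, and caption text can contain HTML entities. Below is a minimal sketch of a cleanup step, assuming that segment shape. `segmentsToText`, `decodeEntities`, and the entity table are hypothetical names, not code from the repository, and the decoder only covers a few common entities.

```typescript
// Assumed shape of one caption segment, mirroring the objects
// youtube-transcript's fetchTranscript() resolves to (check its typings).
interface CaptionSegment {
  text: string;
  duration: number;
  offset: number;
}

// Hypothetical, deliberately small entity table; not a full HTML decoder.
const ENTITIES: Record<string, string> = {
  "&amp;": "&",
  "&lt;": "<",
  "&gt;": ">",
  "&quot;": '"',
  "&#39;": "'",
};

function decodeEntities(s: string): string {
  let out = s;
  // Two passes, so double-encoded text such as "&amp;#39;" becomes "'".
  for (let pass = 0; pass < 2; pass++) {
    out = out.replace(/&(?:amp|lt|gt|quot|#39);/g, (m) => ENTITIES[m] ?? m);
  }
  return out;
}

// Join the timed segments into one plain-text block for the prompt.
function segmentsToText(segments: CaptionSegment[]): string {
  return segments
    .map((seg) => decodeEntities(seg.text))
    .join(" ")
    .replace(/\s+/g, " ") // collapse line breaks and repeated spaces
    .trim();
}

// Usage with hand-written sample segments (no network call).
const sample: CaptionSegment[] = [
  { text: "Hello &amp;#39;world&amp;#39;", duration: 1.2, offset: 0 },
  { text: "line\nbreak  here", duration: 2.0, offset: 1.2 },
];
console.log(segmentsToText(sample)); // Hello 'world' line break here
```

Dropping the timestamps keeps the prompt compact; if you wanted the summary to link back to moments in the video, you would keep `offset` instead of discarding it.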
🧠 How it Works Under the Hood
The flow of the application is surprisingly straightforward:

The User: Pastes a YouTube URL into the minimalist UI.
The Backend (a Next.js Route Handler): Extracts the video ID, fetches the raw captions using youtube-transcript, and cleans them up into a single block of text.
The Prompt: The cleaned text is sent to the Gemini API with a strict system prompt that tells the model to act as an expert technical writer, detect the transcript's language (producing RTL output for Persian/Arabic and LTR output for English), and respond purely in Markdown.
The UI: The frontend renders the Markdown response with react-markdown and adds a handy "Copy to Clipboard" button.
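The backend and prompt steps above can be sketched with a few pure TypeScript helpers. This is a minimal illustration under stated assumptions, not the repository's code: `extractVideoId`, `buildPrompt`, and `detectDirection` are hypothetical names, the regex only covers the common `watch?v=`, `youtu.be`, `shorts`, and `embed` URL shapes, and the prompt wording is illustrative. The project asks Gemini itself to handle RTL versus LTR; `detectDirection` shows a deterministic alternative a frontend could use to set the `dir` attribute on the rendered summary.

```typescript
// Hypothetical helpers; names, regex coverage, and prompt text are assumptions.

// Captures the 11-character video ID from common YouTube URL shapes:
// watch?v=ID, youtu.be/ID, /shorts/ID, and /embed/ID.
const VIDEO_ID_PATTERN =
  /(?:[?&]v=|youtu\.be\/|\/shorts\/|\/embed\/)([A-Za-z0-9_-]{11})(?![A-Za-z0-9_-])/;

function extractVideoId(input: string): string | null {
  const trimmed = input.trim();
  // Also accept a bare 11-character ID.
  if (/^[A-Za-z0-9_-]{11}$/.test(trimmed)) return trimmed;
  const match = trimmed.match(VIDEO_ID_PATTERN);
  return match ? match[1] : null;
}

// Illustrative prompt; the project's real system prompt may differ.
function buildPrompt(transcript: string): string {
  return [
    "You are an expert technical writer.",
    "Summarize the following YouTube transcript in its own language.",
    "Respond only in Markdown, using headings and bullet points.",
    "",
    "Transcript:",
    transcript,
  ].join("\n");
}

type Direction = "rtl" | "ltr";

// Heuristic: if Arabic-script characters (used by Arabic and Persian)
// outnumber Latin letters, treat the text as right-to-left.
function detectDirection(text: string): Direction {
  const arabicScript =
    text.match(/[\u0600-\u06FF\u0750-\u077F\uFB50-\uFDFF\uFE70-\uFEFF]/g) ?? [];
  const latin = text.match(/[A-Za-z]/g) ?? [];
  return arabicScript.length > latin.length ? "rtl" : "ltr";
}

// Usage
console.log(extractVideoId("https://www.youtube.com/watch?v=dQw4w9WgXcQ&t=42s")); // dQw4w9WgXcQ
console.log(detectDirection("سلام دنیا")); // rtl
console.log(detectDirection("Hello world")); // ltr
```

Deciding direction from the rendered text keeps the layout correct even if the model ignores a formatting instruction; the trade-off is that a mixed-language summary is classified by whichever script dominates.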
🌍 Open Source & Free to Self-Host
The best part? You can run this entirely for free, within the free tier's usage limits. Because the project relies on Google AI Studio's free tier for Gemini, you can clone the repository, add your own API key, and deploy it to Vercel in under 5 minutes.

Check out the project on GitHub: 👉 https://github.com/the-momd/yt-summarizer

If you find it useful, I’d deeply appreciate it if you gave the repository a ⭐!

Contributions are totally welcome, whether you want to add new features, improve the UI, or fix bugs. Feel free to fork it and open a Pull Request.

I’m The MOMD, a Full-Stack Software Engineer who loves building functional open-source tools. Let’s connect on Twitter or check out my portfolio!
