
Fais Azis Wibowo


Here's How I Built My First SaaS and the Stack Behind It

It Started With My Brother

My brother has a habit. Whenever he's studying from a YouTube video, he doesn't just watch it — he wants to be tested on it. Every time, he'd come to me: "Can you make me a quiz from this video?"

Every. Single. Time.

The first few times, I did it manually. Watched the video, wrote some questions, and formatted them. It took longer to make the quiz than it took him to watch the video. There had to be a better way.

That frustration became Skoowl AI.

I'm Fais, a 20-year-old CS enthusiast from Indonesia. I had never shipped a full SaaS product before. I had the technical background (Next.js, TypeScript, some AI/ML work), but building something real that gets deployed and actually used was new territory.

This is the full story of how I built Skoowl AI: what it does, every major tech decision, the AI pipeline that powers it, and what's coming next.

What Is Skoowl AI?

Turn Your Materials into Study Decks

Skoowl AI is a study platform that takes raw educational content — PDFs, audio recordings, YouTube videos — and transforms it into structured, study-ready materials using large language models.

Here's what it can do:

| Feature | What it does |
| --- | --- |
| 📝 Smart Notes | Auto-generate formatted study notes from any uploaded file |
| ⚡ Flashcards | Create spaced-repetition flashcards for key terms and concepts |
| ❓ Adaptive Quizzes | Generate MCQ, true/false, or fill-in-the-blank quizzes with AI hints |
| 🧠 Mind Maps | Visualize topics with interactive Radial, Tree, Fishbone, and other layouts |
| 🎙️ Live Transcription | Record lectures in real time or upload audio for instant transcription |
| 📺 YouTube Learning | Paste a video URL, extract and process the knowledge directly |
| 🗣️ Chat Assistant | Ask questions and get answers scoped to your own study notes |

The idea is simple: you bring the content, Skoowl handles the processing. You spend your time learning, not formatting.


The Stack — And Why I Chose It

Every decision in this stack was deliberate. Here's the breakdown.

⚡ Frontend

Next.js 15 (App Router) + React 19 + TypeScript

Next.js with the App Router gives me server components, streaming, and a clean file-based routing model in one package. React 19's concurrency improvements matter specifically for this app — the UI needs to stay responsive while AI is generating content in the background. TypeScript across the entire codebase keeps everything honest; when you're wiring together AI responses, database models, and API handlers, loose types cause real bugs.

For styling, Tailwind CSS v4 handles the utility-first layout and design system. Framer Motion covers complex, fluid UI animations — page transitions, entrance/exit effects, and loading states. Radix UI provides the headless, accessible primitives (dialogs, dropdowns, tabs, accordions) so I didn't have to reinvent accessibility from scratch, and Lucide React keeps the icon system consistent throughout.

🎨 Styling & UI Enhancements

Tiptap + React Markdown + KaTeX + React Flow + Three.js + GSAP

AI-generated notes need to be editable, so I built a full rich text editing experience using Tiptap — a headless editor that supports highlights, text alignment, color, and more. Users can edit, format, and reorganize their generated notes directly in the app.

For rendering AI output that includes markdown, tables, and mathematical formulas, react-markdown with remark/rehype plugins and KaTeX handles LaTeX math rendering cleanly. This matters for STEM content — physics equations and calculus notation render correctly, not as broken text.

React Flow powers the interactive mind maps — a node-based graph UI where users can drag, expand, and explore topic hierarchies.

For the heavier visual layer: Three.js and React Three Fiber handle interactive 3D elements in the UI, ShaderGradient creates the smooth WebGL-powered gradient backgrounds, and GSAP handles complex timeline-based animation sequences that go beyond what Framer Motion covers.

🗄️ Backend & Database

PostgreSQL (Neon) + Prisma + Upstash Redis + Clerk + Dodo Payments

PostgreSQL via Neon Serverless is the production database — relational data fits this product well since users, documents, and generated content all have clear structure. Locally, I use SQLite for fast, zero-config development. Prisma v5 sits on top as the ORM, giving type-safe queries and clean migrations that talk directly to TypeScript.
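To make that structure concrete, here's a minimal sketch of what a Prisma data model for this kind of app might look like. The model and field names are illustrative assumptions, not Skoowl's actual schema:

```prisma
// Illustrative sketch only — not Skoowl's actual schema.
model User {
  id        String     @id @default(cuid())
  clerkId   String     @unique // Clerk is the source of truth for auth
  documents Document[]
}

model Document {
  id         String   @id @default(cuid())
  title      String
  sourceType String   // "pdf" | "audio" | "youtube"
  rawText    String   // extracted text that feeds the AI pipeline
  userId     String
  user       User     @relation(fields: [userId], references: [id])
  createdAt  DateTime @default(now())
}
```

Prisma generates TypeScript types from this schema, which is what makes the queries type-safe end to end.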

Upstash Redis handles two things: caching expensive AI results so the same document doesn't get reprocessed unnecessarily, and rate limiting API routes to protect endpoints from abuse. Both are the kind of infrastructure you don't think about until you need them — I added Upstash early and it paid off.
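To show the rate-limiting idea in isolation, here's a small in-memory sliding-window limiter — a stand-in for what a library like `@upstash/ratelimit` does against Redis. This is a sketch of the concept, not Skoowl's actual code:

```typescript
// In-memory sliding-window rate limiter. In production the timestamps
// would live in Redis so the limit holds across serverless instances.
type Window = number[]; // timestamps of recent requests for one key

class RateLimiter {
  private hits = new Map<string, Window>();
  constructor(private limit: number, private windowMs: number) {}

  allow(key: string, now: number = Date.now()): boolean {
    const cutoff = now - this.windowMs;
    // Drop timestamps that have slid out of the window.
    const recent = (this.hits.get(key) ?? []).filter((t) => t > cutoff);
    if (recent.length >= this.limit) {
      this.hits.set(key, recent);
      return false; // over the limit for this window
    }
    recent.push(now);
    this.hits.set(key, recent);
    return true;
  }
}
```

Caching works on the same principle in reverse: hash the uploaded content, use the hash as the Redis key, and skip the AI call entirely on a hit.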

Clerk handles user authentication — sessions, OAuth, user management. It took roughly a day to integrate and has needed zero maintenance since. Rolling your own auth is a classic beginner mistake; I skipped it.

Dodo Payments handles the billing and subscription infrastructure. Svix sits alongside it for webhook signature verification — ensuring incoming webhooks from Clerk and Dodo are legitimate before acting on them.
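For a sense of what that verification involves, here's a sketch of Svix-style signature checking with Node's built-in crypto. In practice you'd use the `svix` package itself; this just shows the underlying scheme (HMAC-SHA256 over `id.timestamp.payload`, with the secret being the base64 part after the `whsec_` prefix):

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Sketch of Svix-style webhook verification — use the svix package in
// production, which also handles timestamp-skew checks.
export function verifyWebhook(
  secret: string, // e.g. "whsec_..." from the provider dashboard
  msgId: string, // svix-id header
  timestamp: string, // svix-timestamp header
  payload: string, // raw request body
  signature: string, // "v1,<base64>" from the svix-signature header
): boolean {
  const key = Buffer.from(secret.replace("whsec_", ""), "base64");
  const signedContent = `${msgId}.${timestamp}.${payload}`;
  const expected = createHmac("sha256", key).update(signedContent).digest();
  const given = Buffer.from(signature.replace("v1,", ""), "base64");
  // Constant-time comparison to avoid leaking signature bytes via timing.
  return given.length === expected.length && timingSafeEqual(given, expected);
}
```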

🤖 AI Layer

Full pipeline breakdown in the next section, but the core libraries:

  1. Vercel AI SDK — the backbone of all AI integration, handles streaming responses seamlessly to the React frontend
  2. Google Gemini + OpenAI (via Vercel AI SDK) — fast, efficient reasoning for notes, quizzes, and all text generation
  3. Deepgram — high-quality real-time audio and speech transcription, covering both uploaded files and live recordings

📄 File Processing

The content ingestion layer handles more formats than most people expect:

  1. Documents: pdf-parse for PDFs, mammoth for Word files, officeparser for PowerPoint and other Office formats
  2. YouTube: youtube-transcript for caption extraction, @distube/ytdl-core and yt-dlp-exec for videos requiring deeper audio/video processing
  3. Audio: Deepgram SDK for both batch and real-time speech-to-text
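The routing between these parsers can be sketched as a simple dispatch on file extension. The parser names mirror the libraries listed above, but the routing itself is illustrative, not Skoowl's actual code:

```typescript
// Sketch of the ingestion router: pick a parser by file extension.
type Parser = "pdf-parse" | "mammoth" | "officeparser" | "deepgram";

export function pickParser(filename: string): Parser {
  const ext = filename.toLowerCase().split(".").pop() ?? "";
  switch (ext) {
    case "pdf":
      return "pdf-parse";
    case "doc":
    case "docx":
      return "mammoth";
    case "ppt":
    case "pptx":
      return "officeparser";
    case "mp3":
    case "wav":
    case "m4a":
      return "deepgram"; // audio goes to speech-to-text first
    default:
      throw new Error(`Unsupported file type: .${ext}`);
  }
}
```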


The AI Pipeline — How It Actually Works

1. Document Pipeline (PDF / Word / Text)

The file is parsed server-side to extract raw text, then sent to Gemini via the Vercel AI SDK with a structured prompt. The SDK handles streaming, so users see notes generating word-by-word rather than waiting for a full response.

For structured outputs like flashcard arrays or quiz sets, the model returns JSON validated by Zod — if the response doesn't match the expected schema, it retries before surfacing an error to the user.
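That validate-and-retry loop can be sketched in a few lines. Here `generate` stands in for the model call and `validate` for a Zod `safeParse`; the shape is illustrative, not Skoowl's actual implementation:

```typescript
// Retry a generation until its output passes schema validation,
// then surface an error only after the attempts are exhausted.
type Validated<T> = { success: true; data: T } | { success: false };

export async function generateValidated<T>(
  generate: () => Promise<unknown>,
  validate: (raw: unknown) => Validated<T>,
  maxAttempts = 3,
): Promise<T> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const raw = await generate();
    const result = validate(raw);
    if (result.success) return result.data;
    // Malformed output from the model: loop and try again.
  }
  throw new Error(`Output failed validation after ${maxAttempts} attempts`);
}
```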

2. Audio Pipeline

Deepgram handles both modes:

  1. Uploaded audio: Batch transcription — the file is sent and a full transcript comes back, then feeds into the same Gemini pipeline.
  2. Live recording: Deepgram's streaming API returns incremental transcripts as the user speaks, with low enough latency that the text appears almost in real-time.
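One detail of live mode worth showing: interim results keep changing until Deepgram marks them final, so the UI has to merge a stream of chunks into stable text. Here's a sketch of that merge logic — the field names are assumptions for illustration, not the Deepgram SDK's exact response shape:

```typescript
// Merge a stream of transcription chunks: final chunks are committed
// permanently, each interim chunk replaces the previous provisional tail.
interface TranscriptChunk {
  text: string;
  isFinal: boolean;
}

export function mergeTranscript(chunks: TranscriptChunk[]): string {
  let committed = ""; // finalized text
  let interim = ""; // latest provisional guess
  for (const chunk of chunks) {
    if (chunk.isFinal) {
      committed += (committed ? " " : "") + chunk.text;
      interim = ""; // the final text supersedes the interim guess
    } else {
      interim = chunk.text;
    }
  }
  return interim ? `${committed} ${interim}`.trim() : committed;
}
```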

3. YouTube Pipeline

This one is the reason Skoowl exists — my brother's quiz requests, remember?

The pipeline uses the youtube-transcript library to extract auto-generated or creator-provided captions from a YouTube URL, cleans them up, and sends them through Gemini like any other text input. For videos without captions, yt-dlp-exec pulls the audio, which then goes through Deepgram for transcription first.
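The cleanup step matters because raw captions are noisy. Here's a sketch of the kind of normalization involved — the specific heuristics are illustrative, not Skoowl's actual cleaning rules:

```typescript
// Normalize raw caption lines before handing the text to the model:
// strip inline tags and bracketed cues, decode common entities,
// drop empty lines, and collapse whitespace.
export function cleanCaptions(lines: string[]): string {
  return lines
    .map((line) =>
      line
        .replace(/<[^>]+>/g, "") // inline tags like <i> or <c>
        .replace(/\[[^\]]*\]/g, "") // cues like [Music] or [Applause]
        .replace(/&amp;/g, "&")
        .replace(/&#39;/g, "'")
        .trim(),
    )
    .filter((line) => line.length > 0)
    .join(" ")
    .replace(/\s+/g, " ");
}
```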

Paste a link, get a quiz in seconds. My brother now generates his own quizzes. Problem solved.

4. Mind Maps

The model is prompted to return a structured JSON graph representing the topic hierarchy. React Flow renders it as an interactive visual — users can drag, expand, and explore nodes. Zod validates the JSON schema before it ever reaches the renderer.
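The transformation from that hierarchy into React Flow's `{ nodes, edges }` shape is a straightforward tree walk. The JSON shape below is an assumption about the prompt's output format, and positions are left to a separate layout pass:

```typescript
// Flatten a topic hierarchy into React Flow-style node and edge arrays.
interface TopicNode {
  id: string;
  label: string;
  children?: TopicNode[];
}

export function toFlowGraph(root: TopicNode) {
  const nodes: { id: string; data: { label: string } }[] = [];
  const edges: { id: string; source: string; target: string }[] = [];
  const walk = (node: TopicNode, parent?: TopicNode) => {
    nodes.push({ id: node.id, data: { label: node.label } });
    if (parent) {
      edges.push({
        id: `${parent.id}-${node.id}`,
        source: parent.id,
        target: node.id,
      });
    }
    for (const child of node.children ?? []) walk(child, node);
  };
  walk(root);
  return { nodes, edges };
}
```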


Deployment & Getting to 100+ Users

Skoowl AI is deployed on Vercel, which was the obvious choice given the Next.js stack. Edge functions, automatic preview deployments per branch, and zero-config CI/CD — it removed an entire category of DevOps problems.

Getting the first users was uncomfortable in a productive way. I posted in a few Reddit communities, framed it as "I built this, honest feedback welcome." The early adopters came from that. Real usage exposed things that no amount of local testing would have found — different file encodings, unexpected internet speeds affecting streaming, and mobile layouts that needed work.

The 100+ international users milestone came within the first few weeks, which validated the core idea more than any internal testing could.


What's Next for Skoowl AI

The current feature set covers the core study workflow, but there's a lot more planned:

  • 🔍 Discover — An AI-powered research assistant that automatically finds relevant materials for your topic — web searches, books, papers — so you don't have to start from scratch
  • 📊 Slides — Auto-generate presentation slides directly from your uploaded content
  • 🖼️ Infographics — Turn dense information into shareable visual summaries
  • 🎙️ AI Podcast — Convert your study materials into a conversational audio format you can listen to on the go
  • And more on the way

The Discover feature is the one I'm most excited about. Right now, users bring their own content to Skoowl AI. Discover flips that — Skoowl AI helps you find the content in the first place.


Closing

Building Skoowl AI taught me more in a few months than I could have learned any other way. Shipping something real, getting it in front of real users, and watching it actually solve a problem — even a small one like my brother's quiz requests — is a different kind of education.

If you've built something similar, have thoughts on the stack, or want to explore a collaboration, drop a comment. I read all of them.

🔗 Website: skoowlai.com
🔗 GitHub: github.com/faissssss/skoowlai
🔗 Instagram: instagram.com/skoowlai/
🔗 X: x.com/skoowlai
🔗 LinkedIn: linkedin.com/company/skoowl-ai/
🔗 TikTok: tiktok.com/@skoowlai
🔗 Discord: discord/skoowlai


This was my first SaaS. It won't be my last.
