<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: abhinav sriharsha</title>
    <description>The latest articles on DEV Community by abhinav sriharsha (@abhinav_sriharsha_73dbc55).</description>
    <link>https://dev.to/abhinav_sriharsha_73dbc55</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3272031%2F615b7a94-9ffa-4974-bbd5-390a2c66682c.png</url>
      <title>DEV Community: abhinav sriharsha</title>
      <link>https://dev.to/abhinav_sriharsha_73dbc55</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/abhinav_sriharsha_73dbc55"/>
    <language>en</language>
    <item>
      <title>Beyond Simple RAG: Building an Agentic Workflow with Next.js, Python, and Supabase</title>
      <dc:creator>abhinav sriharsha</dc:creator>
      <pubDate>Wed, 24 Dec 2025 19:51:01 +0000</pubDate>
      <link>https://dev.to/abhinav_sriharsha_73dbc55/beyond-simple-rag-building-an-agentic-workflow-with-nextjs-python-and-supabase-1dm6</link>
      <guid>https://dev.to/abhinav_sriharsha_73dbc55/beyond-simple-rag-building-an-agentic-workflow-with-nextjs-python-and-supabase-1dm6</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhloaw0inrcg1phm6st5l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhloaw0inrcg1phm6st5l.png" alt="The flow of RAG Application" width="800" height="397"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Problem: "Chat with PDF" is the new Hello World.&lt;/strong&gt;&lt;br&gt;
Building a basic RAG app is easy today. You upload a 5-page PDF, split it into 1000-character chunks, and it works.&lt;br&gt;
But when I tried this with a &lt;strong&gt;500-page university textbook&lt;/strong&gt;, the standard pipeline fell apart.&lt;/p&gt;

&lt;p&gt;I didn't want a chatbot. I wanted a tutor. So I built &lt;strong&gt;Learneazy.io&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Here is the engineering deep dive into the &lt;strong&gt;3-Layer RAG Pipeline&lt;/strong&gt; and &lt;strong&gt;Generative Flashcard Engine&lt;/strong&gt; I architected to solve this.&lt;/p&gt;




&lt;h2&gt;The Secret Sauce: 3-Layer Semantic Indexing&lt;/h2&gt;

&lt;p&gt;Most RAG apps treat a document as one giant blob of text. I realized that textbooks have structure (Index -&amp;gt; Chapters -&amp;gt; Content), so I mirrored that structure in my database.&lt;/p&gt;

&lt;p&gt;I built a &lt;strong&gt;Python (Flask)&lt;/strong&gt; microservice using &lt;code&gt;PyMuPDF&lt;/code&gt; to handle the heavy lifting. Instead of simple recursive splitting, it processes every textbook into three distinct layers (a sketch of the ingestion step follows the list):&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Layer 1: The "Skeleton" (Table of Contents)&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Purpose:&lt;/strong&gt; Quick, high-level structural queries.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Layer 2: The "Container" (Chapter-wise Chunks)&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Purpose:&lt;/strong&gt; Context-aware searches. This ensures that when you ask about "Thermodynamics in Chapter 4," we &lt;em&gt;only&lt;/em&gt; search Chapter 4.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Layer 3: The "Deep Dive" (Granular Chunks)&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Purpose:&lt;/strong&gt; Answering specific, deep-dive questions where every nuance matters.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
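
&lt;p&gt;Here is a minimal sketch of that ingestion step, assuming a local PDF and a hypothetical &lt;code&gt;store_chunk&lt;/code&gt; helper that embeds the text and writes one row per chunk; the real service runs this inside a Flask endpoint:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Minimal sketch of the 3-layer ingestion (assumes PyMuPDF is installed).
# store_chunk() is a hypothetical helper that embeds the text and writes one
# row (doc_id, layer, chapter, content) to Postgres/pgvector.
import fitz  # PyMuPDF

def ingest_textbook(pdf_path, doc_id, store_chunk, chunk_size=1000):
    doc = fitz.open(pdf_path)

    # Layer 1: the "Skeleton" -- the table of contents as one structural record.
    toc = doc.get_toc()  # list of [level, title, page_number]
    skeleton = "\n".join(f"{title} (p. {page})" for _, title, page in toc)
    store_chunk(doc_id, layer=1, chapter=None, content=skeleton)

    # Chapter boundaries come from the top-level TOC entries.
    chapters = [(title, page) for level, title, page in toc if level == 1]
    for i, (title, start_page) in enumerate(chapters):
        if i + 1 == len(chapters):
            end_page = doc.page_count
        else:
            end_page = chapters[i + 1][1] - 1
        text = "".join(doc[p].get_text() for p in range(start_page - 1, end_page))

        # Layer 2: the "Container" -- one chapter-wide chunk for scoped search.
        store_chunk(doc_id, layer=2, chapter=title, content=text)

        # Layer 3: the "Deep Dive" -- granular chunks inside the chapter.
        for start in range(0, len(text), chunk_size):
            store_chunk(doc_id, layer=3, chapter=title,
                        content=text[start:start + chunk_size])
&lt;/code&gt;&lt;/pre&gt;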




&lt;h2&gt;The Brain: Agentic Routing&lt;/h2&gt;

&lt;p&gt;A hierarchical index is useless if you don't know &lt;em&gt;which&lt;/em&gt; layer to search.&lt;/p&gt;

&lt;p&gt;I implemented a &lt;strong&gt;LangChain Agent&lt;/strong&gt; equipped with custom tools for each layer. The Agent acts as a router (a sketch follows these examples):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;User:&lt;/strong&gt; "How many chapters are there?" → &lt;strong&gt;Agent:&lt;/strong&gt; Calls &lt;em&gt;Index Tool&lt;/em&gt;. (Fast, cheap).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;User:&lt;/strong&gt; "Summarize Chapter 3." → &lt;strong&gt;Agent:&lt;/strong&gt; Calls &lt;em&gt;Chapter Tool&lt;/em&gt;. (High context).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;User:&lt;/strong&gt; "Explain the formula for X." → &lt;strong&gt;Agent:&lt;/strong&gt; Calls &lt;em&gt;Deep Dive Tool&lt;/em&gt;. (High precision).&lt;/li&gt;
&lt;/ul&gt;
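
&lt;p&gt;In code, the router is little more than tool descriptions. A minimal sketch, assuming one retrieval function per layer; the function names, tool descriptions, and Gemini model name here are illustrative, not the production code:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Rough sketch of the routing agent. The three search functions are stand-ins
# for pgvector similarity searches restricted to a single layer.
from langchain.agents import AgentType, Tool, initialize_agent
from langchain_google_genai import ChatGoogleGenerativeAI

def search_index(query):
    """Stand-in: search Layer 1 (the table-of-contents skeleton)."""
    return "...layer 1 rows..."

def search_chapter(query):
    """Stand-in: search Layer 2 (whole-chapter chunks)."""
    return "...layer 2 rows..."

def search_deep(query):
    """Stand-in: search Layer 3 (granular chunks)."""
    return "...layer 3 rows..."

tools = [
    Tool(name="index_tool", func=search_index,
         description="Structural questions: chapter counts, titles, page ranges."),
    Tool(name="chapter_tool", func=search_chapter,
         description="Summaries or questions scoped to one whole chapter."),
    Tool(name="deep_dive_tool", func=search_deep,
         description="Precise questions about a specific formula, term or passage."),
]

# Model name is an assumption; the post only says "Google Gemini".
llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash", temperature=0)
agent = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION)

agent.run("How many chapters are there?")   # routes to index_tool: fast and cheap
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The tool descriptions are what actually steer the routing, so most of the tuning effort goes into wording them precisely.&lt;/p&gt;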

&lt;p&gt;This routing logic reduced my token usage by ~40% and drastically improved accuracy.&lt;/p&gt;




&lt;h2&gt;Beyond Chat: The Flashcard Engine&lt;/h2&gt;

&lt;p&gt;This was the hardest technical challenge. I wanted users to be able to say: &lt;em&gt;"Generate 10 flashcards for Chapter 5."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The AI couldn't just "guess." I had to build a specific &lt;strong&gt;Generative Workflow&lt;/strong&gt; (sketched after the list):&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Topic Extraction:&lt;/strong&gt; First, the system scans the &lt;em&gt;Chapter Layer&lt;/em&gt; to identify key themes (e.g., "Mitochondria," "Krebs Cycle").&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context Retrieval:&lt;/strong&gt; It performs a targeted vector search in the &lt;em&gt;Deep Dive Layer&lt;/em&gt; specifically for those topics to get the definitions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Synthesis:&lt;/strong&gt; The LLM (Google Gemini) formats these grounded facts into strict Q&amp;amp;A pairs.&lt;/li&gt;
&lt;/ol&gt;
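
&lt;p&gt;Roughly, the workflow looks like this, reusing the &lt;code&gt;search_chapter&lt;/code&gt; and &lt;code&gt;search_deep&lt;/code&gt; stand-ins from the routing sketch above; the prompts are illustrative, and the real code validates the model's JSON before trusting it:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Sketch of the flashcard workflow: topics from the chapter layer, grounding
# from the deep-dive layer, then strict Q/A synthesis. Prompts are illustrative.
import json
from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash", temperature=0)

def generate_flashcards(chapter_title, count=10):
    # 1. Topic extraction from the chapter-level chunk (Layer 2).
    chapter_text = search_chapter(f"full text of {chapter_title}")
    topics = json.loads(llm.invoke(
        f"List the {count} most important topics in this chapter "
        f"as a JSON array of strings.\n\n{chapter_text}"
    ).content)

    # 2. Context retrieval: a targeted Layer 3 search per topic.
    contexts = {topic: search_deep(topic) for topic in topics}

    # 3. Synthesis: Gemini turns only the retrieved text into Q/A pairs.
    cards = []
    for topic, context in contexts.items():
        card = llm.invoke(
            "Write one flashcard as JSON with keys 'question' and 'answer', "
            f"using only this source text about {topic}:\n\n{context}"
        ).content
        cards.append(json.loads(card))   # production code validates this output
    return cards
&lt;/code&gt;&lt;/pre&gt;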

&lt;p&gt;The result? Flashcards that are generated from &lt;em&gt;your&lt;/em&gt; specific material, not general internet knowledge.&lt;/p&gt;




&lt;h2&gt;The Stack: Why Microservices?&lt;/h2&gt;

&lt;p&gt;I split the architecture to optimize for performance:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Frontend:&lt;/strong&gt; &lt;strong&gt;Next.js 16&lt;/strong&gt; (React 19) for a snappy, responsive UI.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Processing Service:&lt;/strong&gt; &lt;strong&gt;Python (Flask)&lt;/strong&gt;. Python's ecosystem for PDF parsing and chunking (PyMuPDF, LangChain text splitters) is far more mature than what Node.js offers.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Embeddings:&lt;/strong&gt; &lt;strong&gt;Cohere (&lt;code&gt;embed-english-v3.0&lt;/code&gt;)&lt;/strong&gt;. I chose this over OpenAI's embeddings because Cohere's v3 model is fine-tuned specifically for retrieval and RAG quality.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Database:&lt;/strong&gt; &lt;strong&gt;Supabase (PostgreSQL + pgvector)&lt;/strong&gt;. Storing vectors right next to my user data (Auth, Metadata) simplified my backend significantly (a minimal sketch of the retrieval path follows this list).&lt;/li&gt;
&lt;/ul&gt;
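
&lt;p&gt;To make the storage side concrete, here is a small sketch of how embedding and retrieval fit together with &lt;code&gt;cohere&lt;/code&gt; and &lt;code&gt;supabase-py&lt;/code&gt;; &lt;code&gt;match_layer_chunks&lt;/code&gt; and its parameters are a hypothetical Postgres function wrapping a pgvector similarity query:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Sketch of embed-and-search with Cohere v3 embeddings and Supabase pgvector.
# match_layer_chunks is a hypothetical SQL function that runs a pgvector
# similarity query filtered by document and layer.
import os
import cohere
from supabase import create_client

co = cohere.Client(os.environ["COHERE_API_KEY"])
supabase = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_KEY"])

def embed(texts, for_query=False):
    # v3 models require an input_type, so queries and documents embed differently.
    input_type = "search_query" if for_query else "search_document"
    resp = co.embed(texts=texts, model="embed-english-v3.0", input_type=input_type)
    return resp.embeddings

def search_layer(doc_id, layer, question, k=5):
    query_vec = embed([question], for_query=True)[0]
    result = supabase.rpc("match_layer_chunks", {
        "query_embedding": query_vec,
        "doc_id": doc_id,
        "layer": layer,
        "match_count": k,
    }).execute()
    return [row["content"] for row in result.data]
&lt;/code&gt;&lt;/pre&gt;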




&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/Abhinav-Sriharsha/Learneazy" rel="noopener noreferrer"&gt;github.com/Abhinav-Sriharsha/Learneazy&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>rag</category>
      <category>supabase</category>
      <category>nextjs</category>
    </item>
  </channel>
</rss>
