Beyond Simple RAG: Building an Agentic Workflow with Next.js, Python, and Supabase

The flow of the RAG application

The Problem: "Chat with PDF" is the new Hello World.
Building a basic RAG app is easy today. You upload a 5-page PDF, split it into 1000-character chunks, and it works.
But when I tried this with a 500-page university textbook, the standard pipeline fell apart.
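For context, the "standard pipeline" here is the usual flat recursive split. A minimal sketch (chunk sizes are illustrative):

```python
# The naive baseline: one flat list of fixed-size chunks, no structure.
from langchain_text_splitters import RecursiveCharacterTextSplitter

def naive_chunks(full_text: str) -> list[str]:
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,   # the classic "1000-character chunks"
        chunk_overlap=100,
    )
    return splitter.split_text(full_text)
```

Fine for 5 pages. At 500 pages, every query has to fish an answer out of one undifferentiated pile of chunks.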

I didn't want a chatbot. I wanted a tutor. So I built Learneazy.io.

Here is the engineering deep dive into the 3-Layer RAG Pipeline and Generative Flashcard Engine I architected to solve this.


The Secret Sauce: 3-Layer Semantic Indexing

Most RAG apps treat a document as one giant blob of text. I realized that textbooks have structure (Index -> Chapters -> Content), so I mirrored that structure in my database.

I built a Python (Flask) microservice using PyMuPDF to handle the heavy lifting. Instead of simple recursive splitting, it processes every textbook into three distinct layers (a minimal ingestion sketch follows the list):

  1. Layer 1: The "Skeleton" (Table of Contents)
    • Purpose: Quick, high-level structural queries.
  2. Layer 2: The "Container" (Chapter-wise Chunks)
    • Purpose: Context-aware searches. This ensures that when you ask about "Thermodynamics in Chapter 4," we only search Chapter 4.
  3. Layer 3: The "Deep Dive" (Granular Chunks)
    • Purpose: Answering specific, deep-dive questions where every nuance matters.
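Here is a minimal sketch of that ingestion step, assuming a PDF with a usable table of contents (the function and field names are illustrative, not the production code):

```python
import fitz  # PyMuPDF
from langchain_text_splitters import RecursiveCharacterTextSplitter

def build_layers(pdf_path: str) -> dict:
    doc = fitz.open(pdf_path)

    # Layer 1: the "skeleton" -- PyMuPDF exposes the TOC as (level, title, page) tuples.
    toc = [{"level": lvl, "title": title, "page": page}
           for lvl, title, page in doc.get_toc()]

    # Layer 2: the "container" -- one large chunk per chapter, using top-level
    # TOC entries as chapter boundaries.
    chapter_starts = [e for e in toc if e["level"] == 1]
    chapters = []
    for i, entry in enumerate(chapter_starts):
        start = entry["page"] - 1
        end = (chapter_starts[i + 1]["page"] - 1) if i + 1 < len(chapter_starts) else doc.page_count
        text = "".join(doc[p].get_text() for p in range(start, end))
        chapters.append({"title": entry["title"], "text": text})

    # Layer 3: the "deep dive" -- small granular chunks, each tagged with its chapter.
    splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=100)
    granular = [{"chapter": ch["title"], "text": chunk}
                for ch in chapters
                for chunk in splitter.split_text(ch["text"])]

    return {"toc": toc, "chapters": chapters, "granular": granular}
```

Each layer is then embedded and written to its own table in Supabase, so a query can target exactly one granularity.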

The Brain: Agentic Routing

A hierarchical index is useless if you don't know which layer to search.

I implemented a LangChain Agent equipped with custom tools for each layer. The Agent acts as a router (see the sketch after these examples):

  • User: "How many chapters are there?" → Agent: Calls Index Tool. (Fast, cheap).
  • User: "Summarize Chapter 3." → Agent: Calls Chapter Tool. (High context).
  • User: "Explain the formula for X." → Agent: Calls Deep Dive Tool. (High precision).

This routing logic reduced my token usage by ~40% and drastically improved accuracy.


Beyond Chat: The Flashcard Engine

This was the hardest technical challenge. I wanted users to be able to say: "Generate 10 flashcards for Chapter 5."

The AI couldn't just "guess." I had to build a specific Generative Workflow (sketched after the list):

  1. Topic Extraction: First, the system scans the Chapter Layer to identify key themes (e.g., "Mitochondria," "Krebs Cycle").
  2. Context Retrieval: It performs a targeted vector search in the Deep Dive Layer specifically for those topics to get the definitions.
  3. Synthesis: The LLM (Google Gemini) formats these grounded facts into strict Q&A pairs.
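A condensed sketch of that workflow, reusing the same hypothetical retrieval helpers as above:

```python
import json
from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash")  # model name is illustrative

def generate_flashcards(chapter_title: str, n_cards: int = 10) -> list[dict]:
    # 1. Topic extraction: pull key themes out of the chapter-level chunk.
    chapter_text = search_chapter(chapter_title)        # hypothetical Layer 2 lookup
    topics_raw = llm.invoke(
        f"List the {n_cards} most important topics in this chapter, one per line:\n{chapter_text}"
    ).content
    topics = [t.strip("-• ").strip() for t in topics_raw.splitlines() if t.strip()]

    # 2. Context retrieval: targeted vector search in the granular layer, per topic.
    context = {topic: search_granular(topic) for topic in topics}  # hypothetical Layer 3 lookup

    # 3. Synthesis: force the grounded facts into strict Q&A pairs.
    prompt = (
        "Using ONLY the provided context, produce a JSON list of "
        '{"question": ..., "answer": ...} flashcards, one per topic.\n'
        f"Context: {json.dumps(context)}"
    )
    # Production code should validate the output or use structured-output mode.
    return json.loads(llm.invoke(prompt).content)
```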

The result? Flashcards that are generated from your specific material, not general internet knowledge.


The Stack: Why Microservices?

I split the architecture to optimize for performance:

  • Frontend: Next.js 16 (React 19) for a snappy, responsive UI.
  • Processing Service: Python (Flask). Python's PDF tooling (PyMuPDF) and text-splitting libraries are simply stronger than what Node.js offers for this kind of work.
  • Embeddings: Cohere (embed-english-v3.0). I chose this over OpenAI's embeddings because embed-english-v3.0 is tuned specifically for retrieval quality in RAG pipelines.
  • Database: Supabase (PostgreSQL + pgvector). Storing vectors right next to my user data (Auth, Metadata) simplified my backend significantly. A sample retrieval call is sketched after this list.
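The retrieval side follows the standard pgvector pattern: embed the query with Cohere, then call a Postgres similarity function through Supabase's RPC interface. A sketch, where match_granular_chunks is a hypothetical SQL function along the lines of Supabase's usual match_documents recipe:

```python
import cohere
from supabase import create_client

co = cohere.Client("COHERE_API_KEY")
supabase = create_client("SUPABASE_URL", "SUPABASE_SERVICE_KEY")

def deep_dive_search(query: str, match_count: int = 5) -> list[dict]:
    # Cohere v3 embeddings distinguish query inputs from document inputs.
    embedding = co.embed(
        texts=[query],
        model="embed-english-v3.0",
        input_type="search_query",
    ).embeddings[0]

    # match_granular_chunks is a user-defined SQL function that does an
    # ORDER BY embedding <=> query_embedding LIMIT match_count over pgvector.
    return supabase.rpc(
        "match_granular_chunks",
        {"query_embedding": embedding, "match_count": match_count},
    ).execute().data
```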

GitHub: github.com/Abhinav-Sriharsha
