"I Built an AI Platform for Students — Here's What the Architecture Actually Looked Like"

Udit Bhatt — Sun, 12 Apr 2026 17:24:31 +0000

By Udit Bhatt | AI Systems Architect & Orchestration Lead, Team FireFury
EduRag Project | Stack: React, FastAPI, Supabase, Gemini AI, Hindsight Memory

Honestly, the hardest part of building EduRag was not the AI. It was figuring out how to make five very different technologies — React, FastAPI, Supabase, Google Gemini, and a memory system called Hindsight — actually talk to each other in a way that did not turn into a spaghetti mess two weeks in.
I was the systems architect on this project, which sounds fancy but mostly means I was the person who had to make the final call every time two engineers disagreed on where a piece of logic should live — and then stay up fixing it when I was wrong.
What We Were Actually Trying to Build
EduRag is a RAG platform built specifically for education. Teachers upload PDFs — textbooks, notes, past papers. Students ask questions in natural language. The system finds the most relevant chunks from those PDFs, feeds them to Gemini, and returns an answer with source citations.
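To make the retrieve-then-generate flow concrete, here is a toy sketch. It is not EduRag's retrieval code: it ranks chunks by plain word-overlap cosine similarity instead of real embeddings, and the names tokenize, retrieve, and build_prompt are made up for illustration. The prompt it builds is what would be handed to the model, with numbered sources so the answer can cite them.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and split on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two bag-of-words vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: list[dict], k: int = 2) -> list[dict]:
    # Each chunk is {"source": ..., "text": ...}; return the k best matches.
    q = Counter(tokenize(question))
    scored = [(cosine(q, Counter(tokenize(c["text"]))), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for score, c in scored[:k] if score > 0]


def build_prompt(question: str, hits: list[dict]) -> str:
    # Number the sources so the model can cite them as [1], [2], ...
    context = "\n".join(
        f"[{i}] ({c['source']}) {c['text']}" for i, c in enumerate(hits, 1)
    )
    return (
        "Answer using only the numbered sources and cite them.\n\n"
        f"{context}\n\nQuestion: {question}"
    )


chunks = [
    {"source": "physics.pdf p.12",
     "text": "The first law of thermodynamics states that energy is conserved."},
    {"source": "bio.pdf p.3",
     "text": "Mitochondria are the powerhouse of the cell."},
]
question = "What does the first law of thermodynamics state?"
hits = retrieve(question, chunks, k=1)
prompt = build_prompt(question, hits)
```

A production pipeline would swap the word-overlap scoring for embedding similarity, but the shape stays the same: score chunks, keep the top few, and carry the source labels through to the prompt.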
We also built role-specific dashboards for students, teachers, and admins. Students get study plans, personalized recommendations, and a chatroom. Teachers get analytics on what topics students are searching. Admins manage everything. It is deployed on Vercel, both the React frontend and the FastAPI backend, with all data living in Supabase.
Simple enough on paper. Less simple when you are coordinating four engineers across all those layers simultaneously.
The First Architecture Was a Disaster
The original backend was a single large Python file. All the route handlers, all the database calls, all the AI logic — one file. I wrote it myself in the first two days just to get something working. It worked. For about a week.
Then Shrikant needed to add the RAG pipeline and could not find where to put it without breaking my auth code. Then Raghav needed to add memory endpoints and had to read 600 lines of context to understand the database client setup. We had a bug where a student endpoint accidentally had hardcoded teacher-level permissions because a variable was reused in a function three scrolls above.
That was the moment I realized: a working demo and a maintainable system are not the same thing, and I had built the former when we needed the latter.
The Restructure — Four Layers, One Shared Contract
I broke the backend into four clear layers. Routers handle HTTP transport: they parse requests, validate input, and return responses, nothing else. The core layer holds cross-cutting concerns, primarily our RBAC dependency injection. The services layer is where the actual business logic lives: the RAG orchestration, the chat broadcasting, the memory integration. Finally, models.py is the single file that defines all the Pydantic schemas every other layer references.
The RBAC piece was the design decision I am most satisfied with. Every protected route in FastAPI declares a dependency like Depends(require_role(['teacher', 'admin'])). The dependency decodes the JWT, extracts the user role, and either passes through or raises a 403. No conditional logic inside route handlers — ever. This meant I could audit every permission in the system by grepping for require_role, rather than reading every function body individually.
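The core of that pattern fits in a few lines. What follows is a framework-free sketch rather than our actual code: Forbidden and Unauthorized stand in for HTTP 403 and 401 responses, SECRET is a placeholder key, and sign_token and decode_token are a hand-rolled HS256 JWT codec that exists only so the example runs without third-party packages (it skips expiry checks). In FastAPI, the function returned by require_role would read the token from the Authorization header and be passed to Depends.

```python
import base64
import hashlib
import hmac
import json

SECRET = b"demo-secret"  # hypothetical signing key, for illustration only


class Unauthorized(Exception):
    """Stand-in for an HTTP 401 response."""
    status_code = 401


class Forbidden(Exception):
    """Stand-in for an HTTP 403 response."""
    status_code = 403


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _unb64url(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_token(claims: dict) -> str:
    # Build header.body.signature, signed with HMAC-SHA256.
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = _b64url(json.dumps(claims).encode())
    sig = hmac.new(SECRET, f"{header}.{body}".encode(), hashlib.sha256).digest()
    return f"{header}.{body}.{_b64url(sig)}"


def decode_token(token: str) -> dict:
    # Verify the signature before trusting any claim in the body.
    try:
        header, body, sig = token.split(".")
    except ValueError:
        raise Unauthorized("malformed token") from None
    expected = hmac.new(SECRET, f"{header}.{body}".encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _unb64url(sig)):
        raise Unauthorized("bad signature")
    return json.loads(_unb64url(body))


def require_role(allowed: list[str]):
    # Factory: returns a checker bound to one set of allowed roles.
    allowed_roles = frozenset(allowed)

    def checker(token: str) -> dict:
        claims = decode_token(token)
        if claims.get("role") not in allowed_roles:
            raise Forbidden(f"role {claims.get('role')!r} is not allowed")
        return claims

    return checker


teacher_or_admin = require_role(["teacher", "admin"])
teacher_token = sign_token({"sub": "u1", "role": "teacher"})
student_token = sign_token({"sub": "u2", "role": "student"})
```

Because the allowed roles live in the call to require_role and nowhere else, the route handler body never needs to know who is calling it.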
The result: when a new engineer joined and needed to add a teacher-only endpoint, they could look at any existing teacher route, copy the dependency pattern, and be correct without asking anyone. That is the test of a good architecture decision — does it guide people toward doing the right thing automatically?
Deploying Everything on Vercel — One Project, No Separate Server
Most teams I have seen split the frontend and backend deployments. We did not. The entire EduRag application — React SPA and FastAPI backend — lives in one Vercel project. React builds to static files and gets served over Vercel's CDN. FastAPI runs as a Python serverless function, invoked on each API request.
The practical benefit is enormous. One deployment. One set of environment variables. One place to check logs. No synchronization between separate services when you push a change. The tradeoff is cold start latency — the first request after the function has been idle takes an extra second or two while Python initializes. We handled this by keeping the Supabase client as a module-level singleton so it does not get recreated on every invocation, and by accepting that educational platforms do not usually have the kind of spiky traffic where cold starts matter much.
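The client reuse trick looks roughly like this. It is a sketch under assumptions: FakeSupabaseClient is a stand-in so the example runs anywhere, the SUPABASE_URL and SUPABASE_KEY variable names and their fallback values are placeholders, and I use a cached factory function here, which is a lazy variant of a plain module-level variable. With the real supabase package, the factory body would call create_client(url, key) instead.

```python
import os
from functools import lru_cache


class FakeSupabaseClient:
    """Hypothetical stand-in for the real Supabase client."""
    instances = 0  # counts how many times a client was constructed

    def __init__(self, url: str, key: str):
        FakeSupabaseClient.instances += 1
        self.url = url
        self.key = key


@lru_cache(maxsize=1)
def get_client() -> FakeSupabaseClient:
    # Runs once per warm process; later calls return the cached object.
    url = os.environ.get("SUPABASE_URL", "https://example.supabase.co")
    key = os.environ.get("SUPABASE_KEY", "placeholder-key")
    return FakeSupabaseClient(url, key)


def handle_request(path: str) -> str:
    # Every request handler asks for the shared client instead of building one.
    client = get_client()
    return f"{path} served via {client.url}"


first = handle_request("/api/search")
second = handle_request("/api/chat")
```

A cold start still pays the construction cost once, but every request served by the same warm instance after that reuses the client.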
Why We Added Hindsight — and Why I Did Not Build It Myself
Halfway through the project, we realized the system felt stateless in a way that hurt the product. A student could ask ten questions about thermodynamics over three sessions and the AI would still respond as if it was their first time. No continuity, no sense that the system knew them at all.
I evaluated three options for fixing this. First: store interaction history in a new Supabase table and inject it into prompts manually. Second: build a Redis-based summary cache. Third: use Hindsight, an external memory API that handles retain, recall, and reflect operations for AI agents.
I chose Hindsight for a specific reason: it solved a data-engineering problem I did not want to own. Building a good memory system — one that summarizes correctly, retrieves by semantic similarity, handles PII safely — would have taken weeks. Hindsight gave us that for the cost of a few API calls and a HINDSIGHT_ENABLED environment flag I could toggle off if it went down.
The flag was not an afterthought. I made it a first-class configuration option from day one because integrating an external dependency without a clean killswitch is how you get paged at 2am when their service has an outage. When Hindsight is disabled, the system degrades gracefully to stateless behavior. No crashes, no 500s — just responses without personalization.
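A minimal sketch of that killswitch, assuming the flag is read from the environment on each call and that recall_fn is some hypothetical wrapper around the memory API's recall call (the real client signature is not shown here). Both the disabled path and the failure path collapse to the same result: an empty list, which the rest of the pipeline treats as "no personalization".

```python
import logging
import os

logger = logging.getLogger("memory")


def hindsight_enabled() -> bool:
    # Read at call time so the flag can be flipped without a code change.
    value = os.environ.get("HINDSIGHT_ENABLED", "false")
    return value.strip().lower() in {"1", "true", "yes"}


def recall_memories(student_id: str, query: str, recall_fn) -> list[str]:
    # recall_fn is a hypothetical callable that talks to the memory service.
    if not hindsight_enabled():
        return []
    try:
        return list(recall_fn(student_id, query))
    except Exception:
        # An outage in the memory service must never fail the student's request.
        logger.warning("memory recall failed; continuing without it", exc_info=True)
        return []
```

The caller never branches on whether memory is available. It always gets a list and simply has less context when that list is empty.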
Before and After Memory — What Actually Changed
Before Hindsight, every request was treated as a first contact. The AI had the document context from retrieval, but nothing about who was asking. A student three weeks into studying could get an explanation designed for a complete beginner because we had no signal otherwise.
After Hindsight, each RAG search automatically calls the retain endpoint to store a structured fact about what the student searched. On the next request, the recall endpoint pulls back relevant memories. Those get injected into the generation prompt — not as raw history, but as contextual facts. The AI can now acknowledge prior coverage, adjust its explanation depth, and point toward what the student has not yet explored.
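The flow is easier to see in code. This sketch replaces the external memory service with a tiny in-memory stand-in, so InMemoryStore, its keyword-overlap recall, and the prompt wording are all illustrative assumptions rather than the real API or our production prompt. What it does preserve is the order of operations: recall first, build the prompt with memories as facts, then retain the new search.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryStore:
    """Hypothetical stand-in for an external memory service."""
    facts: dict = field(default_factory=dict)

    def retain(self, student_id: str, fact: str) -> None:
        self.facts.setdefault(student_id, []).append(fact)

    def recall(self, student_id: str, query: str, limit: int = 3) -> list[str]:
        # Crude relevance: keep facts sharing at least one word with the query.
        words = set(query.lower().split())
        matches = [
            f for f in self.facts.get(student_id, [])
            if words & set(f.lower().split())
        ]
        return matches[-limit:]


def build_generation_prompt(question: str, chunks: list[str], memories: list[str]) -> str:
    parts = []
    if memories:
        # Memories go in as short facts, not as a raw chat transcript.
        parts.append("What we know about this student:")
        parts.extend(f"- {m}" for m in memories)
    parts.append("Source material:")
    parts.extend(f"- {c}" for c in chunks)
    parts.append(f"Question: {question}")
    return "\n".join(parts)


def rag_search(store: InMemoryStore, student_id: str, question: str, chunks: list[str]) -> str:
    memories = store.recall(student_id, question)
    prompt = build_generation_prompt(question, chunks, memories)
    # Store a structured fact about this search for future requests.
    store.retain(student_id, f"Student searched for: {question}")
    return prompt


store = InMemoryStore()
first_prompt = rag_search(store, "s1", "entropy in thermodynamics",
                          ["Entropy measures disorder."])
second_prompt = rag_search(store, "s1", "second law of thermodynamics",
                           ["Heat flows from hot to cold."])
```

The first prompt contains only source material. The second also carries the fact that this student already searched for entropy, which is the signal the model uses to adjust depth.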
The reflect endpoint turned out to be the most useful feature for teachers. An admin can trigger a Hindsight reflection for any student and get a natural language summary of their entire learning history — topics covered, frequency, gaps. That took about 20 lines of backend code and became one of the features people mentioned most in feedback.
The One Thing I Would Do Differently
I would introduce the layered architecture on day one instead of day five. The cost of restructuring mid-project was about three days of everyone being blocked while I moved things around. If I had drawn the four-layer diagram before writing the first route handler, we would have saved that time and avoided the permission bug entirely.
The other thing: I waited too long to add rate limiting. SlowAPI with 60 requests per minute per IP took about an hour to implement. I should have done it in the first week. Security headers middleware — Content-Security-Policy, HSTS, X-Frame-Options — took maybe 30 minutes. These are not complex features. The only reason to defer them is inertia, and inertia is not a good reason.
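For anyone wondering what "60 requests per minute per IP" means mechanically, here is a dependency-free sliding-window sketch. It is not SlowAPI's API or implementation; SlidingWindowLimiter and its parameters are made up for illustration, and the injectable clock is only there to make the behavior easy to test.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window_seconds` for each key."""

    def __init__(self, limit: int = 60, window_seconds: float = 60.0,
                 clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self._hits = defaultdict(deque)  # ip -> timestamps of recent requests

    def allow(self, ip: str) -> bool:
        now = self.clock()
        hits = self._hits[ip]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # caller would respond with HTTP 429
        hits.append(now)
        return True


# Usage with a fake clock and a small limit so the behavior is visible.
fake_now = [0.0]
limiter = SlidingWindowLimiter(limit=3, window_seconds=60.0,
                               clock=lambda: fake_now[0])
results = [limiter.allow("10.0.0.1") for _ in range(4)]
```

One caveat with this sketch: the counters live in process memory, so separate serverless instances would each keep their own count unless the state is moved to shared storage.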
What I Took Away From This
• Architecture decisions compound. A good decision made early saves hours. A bad one made early costs days.
• Dependency injection for RBAC is worth the initial setup cost — it makes authorization auditable and guides engineers toward correctness without documentation.
• Never integrate an external dependency without a feature flag killswitch. Especially for AI services with unpredictable availability.
• Serverless deployment for paired frontend and backend is underrated. The operational simplicity is real.
• Memory improves AI products not by making responses smarter, but by making them relevant — which is actually more important.
Final Thought
EduRag is a system I am genuinely proud of, not because of what it can do but because of how it is put together. Four engineers, three weeks, five major technologies, one deployment — and it works cleanly. That does not happen by accident. It happens because someone made a call on where logic should live and held the line on it even when it was inconvenient.
If you are building something similar and want to talk through the architecture decisions, I am always happy to. The choices we made are not universal — they fit our constraints. But the reasoning behind them might save you a few late nights.

— Udit Bhatt, AI Systems Architect, Team FireFury

DEV Community: Udit Bhatt
