yukta31

Posted on May 17

How I Built an AI Knowledge Engine for My University Using RAG

#webdev #ai #machinelearning #python

When I started my MS CS program at George Mason University, I noticed
a frustrating problem — finding accurate information about GMU
policies, deadlines, and resources meant digging through dozens of
scattered web pages. So I built GMU SmartPatriot, an AI-powered
knowledge engine that answers student questions by pulling from 200+
real GMU web pages.

Here's exactly how I built it.

The Problem with Basic Chatbots

A regular chatbot just generates text based on training data. Ask it
about GMU's Spring 2026 registration deadline and it will either
hallucinate an answer or say it doesn't know. Neither is useful.

The solution is RAG (Retrieval-Augmented Generation). Instead of
relying on the LLM's memory, you give it real, verified documents to
read before answering. The answer is grounded in actual source
material, not generated from thin air.

How RAG Works (Simply)

Scrape — collect your source documents
Index — store them in a searchable format
Retrieve — when a user asks a question, find the most relevant documents
Generate — pass those documents + the question to an LLM and let it answer

That's it. The LLM becomes a reader, not a guesser.
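Here is a minimal TypeScript sketch of how the four stages might be wired together. The `Chunk`, `Retriever`, and `Generator` types and the `buildGroundedPrompt` and `answerQuestion` helpers are hypothetical names for illustration, not the SmartPatriot code. The retriever and the LLM call are passed in as functions, so any search method or model could be plugged in.

```typescript
// Hypothetical types for illustration; not the actual SmartPatriot code.
interface Chunk {
  url: string;  // source page the text came from
  text: string; // cleaned text produced by the Scrape/Index stages
}

// Retrieve: given a question, return up to k relevant chunks.
type Retriever = (question: string, k: number) => Chunk[];

// Generate: send a prompt to an LLM and resolve to its answer.
type Generator = (prompt: string) => Promise<string>;

// Build a prompt that asks the model to read the sources instead of guessing.
function buildGroundedPrompt(question: string, chunks: Chunk[]): string {
  const context = chunks
    .map((c, i) => `[${i + 1}] (${c.url})\n${c.text}`)
    .join("\n\n");
  return [
    "Answer the question using only the sources below.",
    "If the sources do not contain the answer, say you don't know.",
    "",
    `Sources:\n${context}`,
    "",
    `Question: ${question}`,
  ].join("\n");
}

// Retrieve, then Generate: the LLM only sees the retrieved text plus the question.
async function answerQuestion(
  question: string,
  retrieve: Retriever,
  generate: Generator,
  k = 4,
): Promise<string> {
  const chunks = retrieve(question, k);
  return generate(buildGroundedPrompt(question, chunks));
}
```

Keeping retrieval and generation behind plain function types means the retrieval method can change later without touching the prompt-building code.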

What I Built With

Cheerio — for scraping 200+ GMU web pages
Node.js + Next.js — backend and frontend
Groq API (llama-3.1) — for fast LLM inference, free tier
Vercel — serverless deployment
TypeScript — type safety throughout

The Scraping Challenge

The first problem was data collection. GMU's website has hundreds of
pages across different departments — academic calendars, financial aid,
housing, IT support, and more.

I fetched each page and used Cheerio to parse the HTML and extract
clean text. The tricky part was handling inconsistent page structures:
some pages used tables, others used lists, and others were plain
paragraphs. I wrote a preprocessing step to normalize everything into
clean chunks of text.

The result: a structured knowledge base of 200+ pages, ready to query.
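A preprocessing step like that might look roughly like the sketch below. It assumes the scraper has already turned each page into plain text with one block-level element (paragraph, list item, or table row) per line. The `cleanLines` and `packChunks` names and the 1,000-character limit are illustrative choices, not the project's actual code or settings.

```typescript
// Assumption: the scraper (e.g. Cheerio) has already produced plain text
// with one block-level element (paragraph, list item, table row) per line.

// Collapse runs of whitespace and drop empty or near-empty lines.
function cleanLines(raw: string, minLength = 3): string[] {
  return raw
    .split(/\r?\n/)
    .map((line) => line.replace(/\s+/g, " ").trim())
    .filter((line) => line.length >= minLength);
}

// Greedily pack consecutive lines into chunks of at most maxChars characters,
// so neighbouring lines (e.g. rows of the same table) tend to stay together.
function packChunks(lines: string[], maxChars = 1000): string[] {
  const chunks: string[] = [];
  let current = "";
  for (const line of lines) {
    const candidate = current ? `${current}\n${line}` : line;
    if (candidate.length <= maxChars) {
      current = candidate;
    } else {
      if (current) chunks.push(current);
      // A single over-long line becomes its own chunk rather than being cut.
      current = line;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Packing whole lines instead of cutting at a fixed character count keeps a table row or list item from being split across two chunks.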

Building the Retrieval Pipeline

For retrieval, I used keyword-based search combined with semantic
matching. When a user asks a question:

Extract key terms from the question
Search the knowledge base for relevant chunks
Rank results by relevance
Pass top 3-5 chunks to the LLM as context

This is the core of RAG — the quality of your retrieval directly
determines the quality of your answers.
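Here is a minimal sketch of the keyword half of those four steps, using a simple TF-IDF-style score. The stopword list, the scoring formula, and the `extractTerms` and `retrieveTopK` names are assumptions for illustration rather than the project's code. The semantic-matching part is not shown.

```typescript
// Illustrative keyword retrieval with TF-IDF-style scoring (an assumption,
// not necessarily the scoring used in SmartPatriot).
const STOPWORDS = new Set([
  "a", "an", "the", "is", "are", "what", "when", "for", "of", "to", "in",
  "on", "and", "or", "do", "does", "i", "my", "how",
]);

// Step 1: extract key terms (lowercase alphanumeric tokens, minus stopwords).
function extractTerms(text: string): string[] {
  return (text.toLowerCase().match(/[a-z0-9]+/g) ?? []).filter(
    (t) => !STOPWORDS.has(t),
  );
}

// Steps 2-4: score every chunk, rank by score, and keep the top k.
function retrieveTopK(question: string, chunks: string[], k = 4): string[] {
  const queryTerms = Array.from(new Set(extractTerms(question)));
  const chunkTerms = chunks.map(extractTerms);

  // Document frequency: how many chunks contain each query term.
  const df = new Map<string, number>();
  for (const term of queryTerms) {
    df.set(term, chunkTerms.filter((terms) => terms.includes(term)).length);
  }

  const scored = chunks.map((chunk, i) => {
    let score = 0;
    for (const term of queryTerms) {
      const tf = chunkTerms[i].filter((t) => t === term).length;
      const n = df.get(term) ?? 0;
      // Rarer terms (small n) contribute more; tf > 0 guarantees n >= 1.
      if (tf > 0) score += tf * Math.log(1 + chunks.length / n);
    }
    return { chunk, score };
  });

  return scored
    .filter((s) => s.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((s) => s.chunk);
}
```

Weighting by inverse document frequency means a distinctive word like "graduate" counts for more than a term that appears on most pages.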

Conversation Memory

One thing basic RAG implementations often miss is memory. If a user
asks "What are the registration deadlines?" and then follows up with
"What about for graduate students?", a memoryless system loses the
context it needs to answer the second question.

I implemented a sliding window memory of 5-7 turns. Each new question
gets the last N exchanges as context, so the conversation feels natural
and continuous.
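One way to keep a sliding window is a small class that stores completed turns and returns only the most recent ones when building the next request. In this sketch the `ConversationMemory` class, the `maxTurns` default of 6 (inside the 5-7 range above), and the message shape are assumptions. The role/content message format follows the OpenAI-style chat convention, which Groq's API is designed to be compatible with; check Groq's API reference before relying on the details.

```typescript
// Hypothetical sliding-window memory; names and defaults are illustrative.
type Role = "system" | "user" | "assistant";

interface Message {
  role: Role;
  content: string;
}

interface Turn {
  question: string;
  answer: string;
}

class ConversationMemory {
  private turns: Turn[] = [];
  private readonly maxTurns: number;

  constructor(maxTurns = 6) {
    if (maxTurns < 1) throw new Error("maxTurns must be at least 1");
    this.maxTurns = maxTurns;
  }

  // Record a completed exchange and drop the oldest ones beyond the window.
  add(question: string, answer: string): void {
    this.turns.push({ question, answer });
    if (this.turns.length > this.maxTurns) {
      this.turns = this.turns.slice(-this.maxTurns);
    }
  }

  // Build chat messages: system prompt, the last N exchanges, then the new
  // question (which could already include the retrieved context).
  buildMessages(systemPrompt: string, newQuestion: string): Message[] {
    const history: Message[] = [];
    for (const t of this.turns) {
      history.push({ role: "user", content: t.question });
      history.push({ role: "assistant", content: t.answer });
    }
    return [
      { role: "system", content: systemPrompt },
      ...history,
      { role: "user", content: newQuestion },
    ];
  }
}
```

The returned array could be sent as the `messages` field of a chat completions request. Capping the window at a fixed number of turns keeps prompt size predictable, but anything older than the window is forgotten.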

The Result

Response latency under 2 seconds
Answers grounded in real GMU content
No hallucinations about university-specific information
Multi-turn conversation that maintains context

What I Learned

Ground your LLM. Ungrounded LLMs are confident and wrong. RAG
makes them confident and right — as long as your source data is
accurate.

Retrieval quality matters more than model quality. In my experience, a
great retrieval step with a small model beats poor retrieval with a
large model.

Chunking is an art. How you split your documents into chunks
significantly affects retrieval quality. Too small and you lose
context. Too large and you overwhelm the LLM's context window.
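A common way to manage that trade-off is fixed-size chunks with a small overlap, so text near a boundary appears in both neighbouring chunks and keeps some of its surrounding context. The sketch below counts words rather than model tokens. The `chunkWords` name and the `chunkSize` and `overlap` defaults are illustrative assumptions, not the settings used in this project.

```typescript
// Split text into windows of `chunkSize` words, each sharing `overlap`
// words with the previous window. Word counts stand in for tokens here.
function chunkWords(text: string, chunkSize = 200, overlap = 40): string[] {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new Error("need chunkSize > 0 and 0 <= overlap < chunkSize");
  }
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const chunks: string[] = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + chunkSize).join(" "));
    // Stop once this window reaches the end of the text.
    if (start + chunkSize >= words.length) break;
  }
  return chunks;
}
```

Raising `overlap` preserves more context across boundaries, but it stores more duplicate text and can return near-identical chunks for the same query.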

What's Next

I'm currently exploring vector embeddings for semantic search to
replace keyword matching, which should improve retrieval accuracy for
complex questions.
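Embedding-based retrieval usually comes down to ranking chunks by cosine similarity between the question's vector and each chunk's vector. This sketch assumes the vectors were precomputed by some embedding model (the post doesn't name one). `EmbeddedChunk`, `cosineSimilarity`, and `searchByEmbedding` are hypothetical helpers. A production system would typically use a vector database or index rather than this linear scan.

```typescript
// Hypothetical shape: each chunk stored with a precomputed embedding vector.
interface EmbeddedChunk {
  text: string;
  embedding: number[];
}

// Cosine similarity: dot(a, b) / (|a| * |b|), ranging from -1 to 1.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Linear scan: score every chunk against the query vector, return the top k.
function searchByEmbedding(
  queryEmbedding: number[],
  chunks: EmbeddedChunk[],
  k = 4,
): EmbeddedChunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(queryEmbedding, chunk.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((s) => s.chunk);
}
```

Swapping this in for keyword scoring changes only the Retrieve step. Prompt building and generation stay the same.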

The code is on GitHub: github.com/yukta31

If you're building something similar or want to discuss RAG
architectures, connect with me on LinkedIn.

DEV Community