Vaishali

Posted on May 20 • Edited on May 29

I Thought Building A RAG App Would Teach Me AI

#ai #rag #frontend #webdev

I almost didn’t start this project.

I kept thinking:

“I should learn more backend first.”
“I should understand AI properly first.”
“I should probably build smaller projects before touching RAG.”

So I delayed starting longer than I should have.

But eventually I started anyway.

And honestly, this project taught me more than the previous 2–3 months of passive learning combined.

Not just about RAG or AI.

About debugging, architecture, async systems, API design, and how quickly complexity grows once AI enters the picture.

This article is basically a retrospective on the project — what worked, what broke, and the lessons I’d carry into future AI apps.

If you’re unfamiliar with RAG (Retrieval-Augmented Generation), I wrote a beginner-friendly introduction earlier: “How AI Apps Actually Use LLMs: Introducing RAG”.

🤯 Fear vs Reality

As a frontend developer, I assumed the difficult part would be:

embeddings
vector search
prompt engineering
LangChain

But most problems came from:

state management
API consistency
request synchronization
session isolation
frontend/backend contracts

Most debugging had nothing to do with AI itself.

The hard part wasn’t using the model.
It was building reliable systems around it.

🛠️ What I Built

I built a document-chat application where users can:

upload PDFs
ask questions about documents
receive retrieval-based responses from uploaded files

There’s also an admin side for managing documents and customizing the chat experience.

At first, I underestimated the complexity because the project “only had two pages.”

That assumption turned out to be completely wrong.

UI complexity is rarely about page count.
It’s about state, interactions, and async behavior.

AI applications introduced far more state complexity than I expected.

⚙️ The Stack I Chose

Most choices came down to one thing: free tiers.

The stack was:

LangChain — retrieval/generation orchestration
Nomic AI embeddings
Groq — LLM inference
Chroma — local vector database
FastAPI (Python) - backend

For a first RAG app, it worked surprisingly well.

🚨 Every Mistake I Made

Most of these weren’t “AI problems” in the way I expected — they were engineering, architecture, and system design problems showing up inside an AI application.

⚠️ Underestimating The Project

At first, I thought the app was “small” because it only had two pages.

That assumption affected almost everything later:

state management
API design
session handling
request synchronization
retrieval architecture

I underestimated how quickly complexity grows once AI workflows enter the picture.

AI apps often look simpler than they actually are because most complexity lives in the system behavior, not the UI.

Lesson: AI projects need architectural planning much earlier than I expected.

⚠️ Mixing UI Messages With LLM Messages

My frontend messages looked like this:

type UiMessage = {
  id: string;
  sender: "user" | "ai";
  content: string;
  status: "loading" | "error";
};

But the LLM expects:

{ "role": "user", "content": "..." }
{ "role": "assistant", "content": "..." }

I was passing frontend messages directly into the model.

That caused:

invalid roles
UI fields leaking into prompts
broken requests

Lesson: Frontend message models and LLM message models should never be the same object.
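
Here's a minimal sketch of keeping the two models apart. It is not the project's actual code: the names `UiMessage`, `LlmMessage`, and `toLlmMessages` are hypothetical, and `status` is marked optional as an assumption so finished messages can omit it. The mapper reads only the sender and the content, so UI-only fields never reach the prompt.

```typescript
// Hypothetical names for illustration; the shapes mirror the two snippets above.
type UiMessage = {
  id: string;
  sender: "user" | "ai";
  content: string;
  status?: "loading" | "error"; // assumption: optional once a message is complete
};

type LlmMessage = {
  role: "user" | "assistant";
  content: string;
};

// Translate UI messages into the role/content pairs the model expects.
// Only `sender` and `content` are read, so `id` and `status` cannot leak.
function toLlmMessages(uiMessages: UiMessage[]): LlmMessage[] {
  return uiMessages.map((m): LlmMessage => ({
    role: m.sender === "ai" ? "assistant" : "user",
    content: m.content,
  }));
}

const uiExample: UiMessage[] = [
  { id: "1", sender: "user", content: "What is in the PDF?" },
  { id: "2", sender: "ai", content: "A summary of chapter one." },
];
const llmExample = toLlmMessages(uiExample);
```

Keeping the conversion in one function also gives a single place to validate roles before a request goes out.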

⚠️ Empty Assistant Messages

I sometimes stored temporary empty assistant messages during loading states.

Example:

{
  role: "assistant",
  content: ""
}

Those empty messages polluted conversation history and occasionally broke Groq requests.

Tiny bug. Massive debugging time.

Conversation history quality matters more than I expected initially.
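
One way to guard against this is to clean the history right before it is sent. The sketch below assumes the role/content shape shown earlier; `dropEmptyMessages` is a hypothetical helper name, not something from the project.

```typescript
type LlmMessage = { role: "user" | "assistant"; content: string };

// Remove messages with no visible text (for example, loading placeholders)
// before the history is sent to the model.
function dropEmptyMessages(messages: LlmMessage[]): LlmMessage[] {
  return messages.filter((m) => m.content.trim().length > 0);
}

const chatHistory: LlmMessage[] = [
  { role: "user", content: "Hi" },
  { role: "assistant", content: "" }, // placeholder stored during loading
  { role: "user", content: "Summarize the PDF" },
];
const cleanedHistory = dropEmptyMessages(chatHistory);
```

A cleaner long-term fix is probably to keep the loading placeholder in UI state only, so it never enters the stored history in the first place.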

⚠️ Output Formatting As An Afterthought

LLM output is unpredictable unless you clearly define the structure you want.

I got:

walls of text
inconsistent markdown
unpredictable formatting

Good AI UX depends as much on output structure as model quality.

Your UI expectations need to exist inside the system prompt from the beginning.
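
As a rough illustration, formatting rules can live next to the rest of the system prompt and be assembled in one place. The rule wording and the `buildSystemPrompt` helper below are made up for this example and are not the prompt the project uses.

```typescript
// Example rules only; the wording is an assumption, tune it for your own UI.
const FORMATTING_RULES: string[] = [
  "Keep paragraphs to at most three sentences.",
  "Use markdown bullet lists for steps or enumerations.",
  "Do not use headings or tables.",
];

// Combine the base instructions with the formatting rules the UI relies on.
function buildSystemPrompt(baseInstructions: string, rules: string[]): string {
  const ruleLines = rules.map((rule) => `- ${rule}`).join("\n");
  return `${baseInstructions}\n\nFormatting rules:\n${ruleLines}`;
}

const systemPrompt = buildSystemPrompt(
  "Answer using only the provided document context.",
  FORMATTING_RULES,
);
```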

⚠️ Ignoring Multi-User Isolation

Originally, all users shared the same Chroma collection.

That meant one user could retrieve another user’s document chunks.

User isolation is not an advanced feature in AI apps. It’s a requirement.

Lesson: Session-based isolation is mandatory for multi-user RAG apps.
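
The core idea is that every session maps to its own collection name. The real logic would live in the Python backend; the sketch below uses TypeScript only to show the shape of it. The `docs_` prefix, the 63-character cap, and the allowed character set are conservative assumptions, so check your vector store's actual naming rules. It also assumes session IDs are generated server-side (for example UUIDs), because stripping characters from arbitrary user input could make two different IDs collide.

```typescript
// Build a per-session collection name so a session only queries its own chunks.
// Assumptions: "docs_" prefix, lowercase [a-z0-9_-] characters, max 63 chars,
// and server-generated session IDs (so sanitizing cannot cause collisions).
function collectionNameForSession(sessionId: string): string {
  const safe = sessionId.toLowerCase().replace(/[^a-z0-9_-]/g, "");
  const name = `docs_${safe}`
    .slice(0, 63)
    .replace(/[^a-z0-9]+$/, ""); // make sure the name ends with a letter or digit
  if (name === "docs") {
    throw new Error("sessionId has no usable characters");
  }
  return name;
}

const sessionCollection = collectionNameForSession(
  "3F2504E0-4F89-11D3-9A0C-0305E82C3301",
);
```

Every upload and every retrieval call would then use this name, so a query can only see chunks that were stored for the same session.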

⚠️ Local Vector Storage Assumptions

Local Chroma worked great during development.

But without persistence configured properly, server restarts could wipe stored embeddings.

That’s something I didn’t think about early enough.

Development shortcuts quietly become production architecture later.

⚠️ Underestimating State Complexity

Initially, I had one isLoading state for the entire app.

But file uploads, chat responses, document fetching, and deletes all needed separate states.

AI apps look simple visually, but they hide a huge amount of async complexity underneath.

Lesson: Every async operation deserves its own loading/error state.
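
A small sketch of what that can look like, assuming the four operations listed above. The type and function names are hypothetical, and a real app would likely put this in a reducer or a state library rather than a bare object.

```typescript
type AsyncStatus = "idle" | "loading" | "success" | "error";
type AsyncState = { status: AsyncStatus; error?: string };

// One entry per async operation instead of a single app-wide isLoading flag.
type Operation = "upload" | "chat" | "fetchDocuments" | "deleteDocument";
type AppAsyncState = Record<Operation, AsyncState>;

const initialAsyncState: AppAsyncState = {
  upload: { status: "idle" },
  chat: { status: "idle" },
  fetchDocuments: { status: "idle" },
  deleteDocument: { status: "idle" },
};

// Return a new state object with only the given operation updated.
function setOperationState(
  state: AppAsyncState,
  operation: Operation,
  next: AsyncState,
): AppAsyncState {
  return { ...state, [operation]: next };
}

// Starting an upload no longer makes the chat look busy.
const duringUpload = setOperationState(initialAsyncState, "upload", {
  status: "loading",
});
```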

⚠️ Multiple Simultaneous Requests

Users could trigger multiple chat requests before the previous one completed.

That caused:

duplicate responses
out-of-order messages
inconsistent UI state
unnecessary LLM calls

Concurrency problems appear surprisingly fast in chat-based AI interfaces.

Lesson: Request locking/debouncing should be added early.
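
Here is a minimal sketch of the locking half of that lesson (it is a simple in-flight guard, not a debounce). `createRequestGate` and `sendOnce` are hypothetical names, and `send` stands in for whatever function calls the chat endpoint.

```typescript
// A tiny in-flight guard: only one chat request may run at a time.
function createRequestGate() {
  let inFlight = false;
  return {
    // Returns true if the caller may start a request, false if one is running.
    tryAcquire(): boolean {
      if (inFlight) return false;
      inFlight = true;
      return true;
    },
    // Must be called when the request settles, whether it succeeded or failed.
    release(): void {
      inFlight = false;
    },
  };
}

const chatGate = createRequestGate();

// `send` is a placeholder for the real API call.
// Resolves to null when a request is already in flight.
async function sendOnce(send: () => Promise<string>): Promise<string | null> {
  if (!chatGate.tryAcquire()) return null; // ignore the duplicate trigger
  try {
    return await send();
  } finally {
    chatGate.release(); // always unlock, even if the request throws
  }
}
```

Disabling the send button while a request is running covers the common case in the UI; a guard like this also covers keyboard shortcuts and double-fired events.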

📈 Recent Improvements

Over time, the app became significantly more stable.

Some of the biggest improvements were:

session-based vector storage
duplicate request prevention
filtering empty assistant messages
better loading synchronization

One thing I’m genuinely glad about is that I didn’t ignore issues once they appeared.

I kept debugging, refining, and improving the system instead of just shipping the first working version and moving on.

That process taught me as much as building the app itself.

✅ Things I’m Glad I Did Right

But not everything went wrong.

A few early decisions actually made the project much easier to improve later.

1. Starting With Free-Tier Tools

Using free-tier tools let me focus on learning instead of worrying about API costs.

For a first AI app, that was the right decision.

2. Building The Backend Myself

Building the backend myself taught me far more than I expected.

I learned:

retrieval orchestration
API flow
vector storage
backend debugging
tool execution flow

3. Focusing On Understanding Instead Of Copy-Pasting

I spent time understanding:

embeddings
retrieval flow
chunking
vector search

That made debugging much easier later.

4. Doing A Proper Retrospective Afterwards

One thing I’m glad I did was sit down after building and properly evaluate:

what worked
what broke
what scaled poorly
what I’d redesign

That reflection turned this from “just another project” into a real learning experience.

5. Starting Before Feeling “Ready”

This was probably the biggest lesson.

I delayed starting because I thought I needed to learn more first.

But the project itself became the learning process.

🧠 What I’d Do Before Starting Next Time

1. API

Before writing code, I’d define these upfront:

endpoint contracts
request/response shapes
upload constraints
API consistency rules
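
For example, a contract for a chat endpoint could be written down as types plus a runtime check before any UI code exists. Everything here is hypothetical: the field names (`sessionId`, `question`, `answer`, `sources`) are illustrative and not the project's real API.

```typescript
// Hypothetical request/response shapes for a chat endpoint.
type ChatRequest = { sessionId: string; question: string };
type ChatResponse = { answer: string; sources: string[] };

// Runtime check so the frontend fails loudly when the backend shape drifts.
function isChatResponse(value: unknown): value is ChatResponse {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  const answer = record["answer"];
  const sources = record["sources"];
  return (
    typeof answer === "string" &&
    Array.isArray(sources) &&
    sources.every((source) => typeof source === "string")
  );
}

const sampleRequest: ChatRequest = {
  sessionId: "session-123",
  question: "What does section 2 say?",
};
```

On the Python side, the same shapes would be mirrored as request and response models so both ends agree on one contract.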

2. Frontend

I’d map all UI states beforehand:

loading
error
retry
optimistic updates

AI apps have far more UI states than they first appear to have.

3. Database Decisions Earlier

I’d decide much sooner:

vector DB strategy
persistence setup
session isolation
relational DB responsibilities

Those become painful to retrofit later.

🚀 Next Major Improvements

There’s still a lot I’d improve in future versions.

1. Streaming Responses

Right now, responses render only after generation fully completes.

The app works fine functionally, but streaming would improve perceived responsiveness dramatically.

Even when total generation time stays the same, streaming makes AI apps feel much faster and more interactive because the first tokens appear right away.

2. Better Chunking Strategy

RecursiveCharacterTextSplitter works well as a default, but it struggles with:

tables
headers
multi-column PDFs
structured documents

Next time I’d explore:

semantic chunking
structure-aware parsing
metadata-aware retrieval

because retrieval quality depends heavily on chunk quality.
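
To make "structure-aware" a bit more concrete, here is a rough sketch that splits on headings and keeps the heading as metadata. It assumes the PDF text has already been converted into markdown-like text with `#` headings, which is a big assumption for real PDFs, and `chunkByHeading` is a hypothetical helper rather than a LangChain API.

```typescript
type Chunk = { heading: string; text: string };

// Split text at markdown-style heading lines and attach each section's
// heading to its chunk so retrieval can use it as metadata.
function chunkByHeading(text: string): Chunk[] {
  const chunks: Chunk[] = [];
  let heading = ""; // text before the first heading gets an empty heading
  let lines: string[] = [];

  const flush = (): void => {
    const body = lines.join("\n").trim();
    if (body.length > 0) chunks.push({ heading, text: body });
    lines = [];
  };

  for (const line of text.split("\n")) {
    const match = /^#{1,6}\s+(.*)$/.exec(line);
    if (match) {
      flush(); // close the previous section
      heading = (match[1] ?? "").trim();
    } else {
      lines.push(line);
    }
  }
  flush(); // close the last section
  return chunks;
}

const sampleChunks = chunkByHeading(
  "Intro text\n# Setup\nInstall deps.\n\n# Usage\nRun it.",
);
```

Long sections would still need a size-based splitter applied inside each chunk; this only keeps sections from bleeding into each other.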

3. Proper RAG Evaluation

One thing I completely skipped initially was evaluation.

Most early testing was basically:

“Does this answer look correct?”

That doesn’t scale well.

Next time I’d add proper evaluation for:

retrieval accuracy
hallucination detection
answer relevance
context quality

Because without evaluation, improving a RAG pipeline becomes mostly guesswork.
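
Even a tiny metric beats eyeballing answers. The sketch below computes a hit rate at k for retrieval: the fraction of test questions whose expected chunk shows up in the top k results. It assumes a small hand-labelled set of questions with known chunk IDs; the data and the names `EvalCase` and `hitRateAtK` are made up for illustration.

```typescript
type EvalCase = {
  question: string;
  expectedChunkId: string; // hand-labelled: the chunk that contains the answer
  retrievedChunkIds: string[]; // what the retriever returned, best match first
};

// Fraction of cases whose expected chunk appears in the top-k retrieved chunks.
function hitRateAtK(cases: EvalCase[], k: number): number {
  if (cases.length === 0) return 0;
  const hits = cases.filter(
    (c) => c.retrievedChunkIds.slice(0, k).indexOf(c.expectedChunkId) !== -1,
  ).length;
  return hits / cases.length;
}

// Made-up example data.
const evalCases: EvalCase[] = [
  {
    question: "What is the refund window?",
    expectedChunkId: "c7",
    retrievedChunkIds: ["c2", "c7", "c9"],
  },
  {
    question: "Who approves expenses?",
    expectedChunkId: "c4",
    retrievedChunkIds: ["c1", "c3", "c4"],
  },
];
const hitRateAt2 = hitRateAtK(evalCases, 2);
```

Tracking this number before and after a chunking or embedding change shows whether retrieval actually improved, which is the part that guesswork can't tell you.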

🔗 Project Link

If you're curious, here’s the project: Live Project Demo

🪞 What This Actually Taught Me

The hard part wasn’t the RAG pipeline itself.

It was everything around it — deployment, frontend integration, state management, edge cases, and all the small system behaviors you only notice while debugging.

Those aren’t AI engineering problems.

They’re software engineering problems showing up inside AI applications.

And honestly, that was probably the biggest surprise of the entire project.