I almost didn’t start this project.
I kept thinking:
- “I should learn more backend first.”
- “I should understand AI properly first.”
- “I should probably build smaller projects before touching RAG.”
So I delayed starting longer than I should have.
But eventually I started anyway.
And honestly, this project taught me more than the previous 2–3 months of passive learning combined.
Not just about RAG or AI.
About debugging, architecture, async systems, API design, and how quickly complexity grows once AI enters the picture.
This article is basically a retrospective on the project — what worked, what broke, and the lessons I’d carry into future AI apps.
If you’re unfamiliar with RAG (Retrieval-Augmented Generation), I wrote a beginner-friendly introduction earlier: How AI Apps Actually Use LLMs: Introducing RAG
🤯 Fear vs Reality
As a frontend developer, I assumed the difficult part would be:
- embeddings
- vector search
- prompt engineering
- LangChain
But most problems came from:
- state management
- API consistency
- request synchronization
- session isolation
- frontend/backend contracts
Most debugging had nothing to do with AI itself.
The hard part wasn’t using the model.
It was building reliable systems around it.
🛠️ What I Built
I built a document-chat application where users can:
- upload PDFs
- ask questions about documents
- receive retrieval-based responses from uploaded files
There’s also an admin side for managing documents and customizing the chat experience.
At first, I underestimated the complexity because the project “only had two pages.”
That assumption turned out to be completely wrong.
UI complexity is rarely about page count.
It’s about state, interactions, and async behavior.
AI applications introduced far more state complexity than I expected.
⚙️ The Stack I Chose
Most choices came down to one thing: free tiers.
The stack was:
- LangChain: retrieval/generation orchestration
- Nomic AI: embeddings
- Groq: LLM inference
- Chroma: local vector database
- FastAPI (Python): backend
For a first RAG app, it worked surprisingly well.
🚨 Every Mistake I Made
Most of these weren’t “AI problems” in the way I expected — they were engineering, architecture, and system design problems showing up inside an AI application.
⚠️ Underestimating The Project
At first, I thought the app was “small” because it only had two pages.
That assumption affected almost everything later:
- state management
- API design
- session handling
- request synchronization
- retrieval architecture
I underestimated how quickly complexity grows once AI workflows enter the picture.
AI apps often look simpler than they actually are because most complexity lives in the system behavior, not the UI.
Lesson: AI projects need architectural planning much earlier than I expected.
⚠️ Mixing UI Messages With LLM Messages
My frontend messages looked like this:
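```typescript
// Frontend message shape (the type name is just for illustration)
type UIMessage = {
  id: string
  sender: "user" | "ai"
  content: string
  status: "loading" | "error"
}
```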
But the model's chat API (Groq, in my case) expects messages with a role instead:

```json
[
  { "role": "user", "content": "..." },
  { "role": "assistant", "content": "..." }
]
```
I was passing frontend messages directly into the model.
That caused:
- invalid roles
- UI fields leaking into prompts
- broken requests
Lesson: Frontend message models and LLM message models should never be the same object.
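One way to enforce that boundary is a single mapping function that runs right before every request. This is a minimal TypeScript sketch, not the project's actual code. It assumes the frontend shape shown earlier, with status made optional so that completed messages carry no status. The names UIMessage, LLMMessage, and toLLMMessages are hypothetical.

```typescript
// Hypothetical types: the UI shape mirrors the frontend model (with status
// optional), and the LLM shape is the role/content format chat APIs expect.
type UIMessage = {
  id: string;
  sender: "user" | "ai";
  content: string;
  status?: "loading" | "error"; // only set while loading or after a failure
};

type LLMMessage = {
  role: "system" | "user" | "assistant";
  content: string;
};

// Convert UI state into the payload the model sees. Only role and content
// cross the boundary; ids and statuses stay in the frontend.
function toLLMMessages(messages: UIMessage[]): LLMMessage[] {
  return messages
    // Loading placeholders and failed messages are UI state, not conversation.
    .filter((m) => m.status === undefined)
    .map((m): LLMMessage => ({
      // The UI says "ai"; the API only understands "assistant".
      role: m.sender === "ai" ? "assistant" : "user",
      content: m.content,
    }));
}
```

Keeping the conversion in one function means there is exactly one place where UI fields could leak into prompts, and one place to fix it.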
⚠️ Empty Assistant Messages
I sometimes stored temporary empty assistant messages during loading states.
Example:
```json
{ "role": "assistant", "content": "" }
```
Those empty messages polluted conversation history and occasionally broke Groq requests.
Tiny bug. Massive debugging time.
Conversation history quality mattered more than I initially expected.
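The fix was to clean the history before every request. Here is a minimal TypeScript sketch. It assumes the history is already in role/content form, and cleanHistory is a hypothetical name.

```typescript
type LLMMessage = {
  role: "system" | "user" | "assistant";
  content: string;
};

// Drop messages with no real content (for example, a placeholder assistant
// message created while a response was loading) before sending the history.
function cleanHistory(history: LLMMessage[]): LLMMessage[] {
  return history.filter((m) => m.content.trim().length > 0);
}
```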
⚠️ Output Formatting As An Afterthought
LLMs return chaos unless you define structure clearly.
I got:
- walls of text
- inconsistent markdown
- unpredictable formatting
Good AI UX depends as much on output structure as on model quality.
Your UI expectations need to exist inside the system prompt from the beginning.
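As a rough illustration, the formatting rules can live in the system prompt and be written with the chat UI in mind. This TypeScript sketch assumes the chat bubble renders markdown. The rule wording, SYSTEM_PROMPT, and buildMessages are all hypothetical.

```typescript
type LLMMessage = { role: "system" | "user" | "assistant"; content: string };

// Hypothetical formatting rules matched to what the UI can render well.
const SYSTEM_PROMPT = [
  "You answer questions using only the provided document context.",
  "Format every answer as markdown:",
  "- Start with a one-sentence direct answer.",
  "- Follow with at most 5 bullet points for supporting details.",
  "- Use **bold** only for key terms; do not use headings.",
  "If the context does not contain the answer, say so in one sentence.",
].join("\n");

// Build the message list for one request: formatting rules first, then the
// retrieved context together with the user's question.
function buildMessages(context: string, question: string): LLMMessage[] {
  return [
    { role: "system", content: SYSTEM_PROMPT },
    { role: "user", content: `Context:\n${context}\n\nQuestion: ${question}` },
  ];
}
```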
⚠️ Ignoring Multi-User Isolation
Originally, all users shared the same Chroma collection.
Meaning one user could retrieve another user’s document chunks.
User isolation is not an advanced feature in AI apps. It’s a requirement.
Lesson: Session-based isolation is mandatory for multi-user RAG apps.
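Conceptually, the fix is that every stored chunk belongs to a session and every query is scoped to one session. With Chroma, that usually means either a collection per session, or a session id in each chunk's metadata that gets filtered at query time. The TypeScript sketch below skips Chroma entirely. It uses a hypothetical in-memory store (SessionScopedStore) only to show the rule.

```typescript
type Chunk = { text: string; embedding: number[] };

// Hypothetical stand-in for a vector store: chunks are stored per session,
// and search never looks outside the caller's session.
class SessionScopedStore {
  private bySession = new Map<string, Chunk[]>();

  add(sessionId: string, chunk: Chunk): void {
    const chunks = this.bySession.get(sessionId) ?? [];
    chunks.push(chunk);
    this.bySession.set(sessionId, chunks);
  }

  // Top-k cosine-similarity search restricted to one session's chunks.
  search(sessionId: string, query: number[], k: number): Chunk[] {
    const chunks = this.bySession.get(sessionId) ?? [];
    return chunks
      .slice()
      .sort((a, b) => cosine(b.embedding, query) - cosine(a.embedding, query))
      .slice(0, k);
  }
}

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
}
```

The important part is that sessionId is a required argument to search: there is no code path that queries across all users.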
⚠️ Local Vector Storage Assumptions
Local Chroma worked great during development.
But without persistence configured properly, server restarts could wipe stored embeddings.
That’s something I didn’t think about early enough.
Development shortcuts quietly become production architecture later.
⚠️ Underestimating State Complexity
Initially, I had one isLoading state for the entire app.
But file uploads, chat responses, document fetching, and deletes all needed separate states.
AI apps look simple visually, but they hide a huge amount of async complexity underneath.
Lesson: Every async operation deserves its own loading/error state.
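Here is a minimal TypeScript sketch of that idea. It assumes four operations named after the ones above, and both the names and the setOperation helper are hypothetical. Each operation gets its own idle/loading/error state, so one slow upload can't lock the whole UI.

```typescript
type Operation = "upload" | "chat" | "fetchDocuments" | "deleteDocument";

// Each operation is always in exactly one of these states.
type AsyncState =
  | { status: "idle" }
  | { status: "loading" }
  | { status: "error"; message: string };

type AppAsyncState = Record<Operation, AsyncState>;

const initialState: AppAsyncState = {
  upload: { status: "idle" },
  chat: { status: "idle" },
  fetchDocuments: { status: "idle" },
  deleteDocument: { status: "idle" },
};

// Return a new state object with only one operation changed; the others
// (and the input object) are left untouched.
function setOperation(state: AppAsyncState, op: Operation, next: AsyncState): AppAsyncState {
  return { ...state, [op]: next };
}
```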
⚠️ Multiple Simultaneous Requests
Users could trigger multiple chat requests before the previous one completed.
That caused:
- duplicate responses
- out-of-order messages
- inconsistent UI state
- unnecessary LLM calls
Concurrency problems appear surprisingly fast in chat-based AI interfaces.
Lesson: Request locking/debouncing should be added early.
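A simple lock (a "single-flight" guard) covers most of this. The TypeScript sketch below is hypothetical; singleFlight is not a library function here. While one request is pending, further calls are dropped instead of queued.

```typescript
// Wrap an async function so that only one call can be in flight at a time.
// Extra calls return null instead of starting another request.
function singleFlight<T>(fn: () => Promise<T>): () => Promise<T> | null {
  let inFlight = false;
  return () => {
    if (inFlight) return null; // a request is already pending: ignore this one
    inFlight = true;
    // Release the lock whether the request succeeds or fails.
    return fn().finally(() => {
      inFlight = false;
    });
  };
}
```

Dropping rather than queueing is a design choice. For chat, a queued duplicate question is rarely what the user wanted. In the UI, this pairs naturally with disabling the send button while a request is pending.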
📈 Recent Improvements
Over time, the app became significantly more stable.
Some of the biggest improvements were:
- session-based vector storage
- duplicate request prevention
- filtering empty assistant messages
- better loading synchronization
One thing I’m genuinely glad about is that I didn’t ignore issues once they appeared.
I kept debugging, refining, and improving the system instead of just shipping the first working version and moving on.
That process taught me as much as building the app itself.
✅ Things I’m Glad I Did Right
But not everything went wrong.
A few early decisions actually made the project much easier to improve later.
1. Starting With Free-Tier Tools
Using free-tier tools let me focus on learning instead of worrying about API costs.
For a first AI app, that was the right decision.
2. Building The Backend Myself
Building the backend myself taught me far more than I expected.
I learned:
- retrieval orchestration
- API flow
- vector storage
- backend debugging
- tool execution flow
3. Focusing On Understanding Instead Of Copy-Pasting
I spent time understanding:
- embeddings
- retrieval flow
- chunking
- vector search
That made debugging much easier later.
4. Doing A Proper Retrospective Afterwards
One thing I’m glad I did was sit down after building and properly evaluate:
- what worked
- what broke
- what scaled poorly
- what I’d redesign
That reflection turned this from “just another project” into a real learning experience.
5. Starting Before Feeling “Ready”
This was probably the biggest lesson.
I delayed starting because I thought I needed to learn more first.
But the project itself became the learning process.
🧠 What I’d Do Before Starting Next Time
1. API
Before writing any code, I’d define these upfront:
- endpoint contracts
- request/response shapes
- upload constraints
- API consistency rules
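For example, a contract for the chat endpoint and the upload rules could be written down as types before any implementation exists. Everything here is hypothetical: the field names, the shared error envelope, and the 10 MB / PDF-only limits are placeholders. On a FastAPI backend, the same shapes would typically be mirrored as Pydantic models.

```typescript
// Hypothetical request/response shapes for a chat endpoint.
type ChatRequest = {
  sessionId: string;
  question: string;
};

type ChatResponse = {
  answer: string;
  sources: { documentId: string; page?: number }[];
};

// One error envelope shared by every endpoint, so the frontend handles
// failures the same way everywhere.
type ApiError = {
  error: { code: string; message: string };
};

// Hypothetical upload constraints, checked on the client and again on the server.
const MAX_UPLOAD_BYTES = 10 * 1024 * 1024; // 10 MB
const ALLOWED_TYPES = ["application/pdf"];

function validateUpload(file: { name: string; size: number; type: string }): ApiError | null {
  if (!ALLOWED_TYPES.includes(file.type)) {
    return { error: { code: "unsupported_type", message: `${file.name} is not a PDF` } };
  }
  if (file.size > MAX_UPLOAD_BYTES) {
    return { error: { code: "file_too_large", message: `${file.name} exceeds 10 MB` } };
  }
  return null; // valid upload
}
```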
2. Frontend
I’d map all UI states beforehand:
- loading
- error
- retry
- optimistic updates
AI apps have far more UI states than they initially appear to.
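One way to map those states in advance is to model a single message's lifecycle as a small state machine. Here is a TypeScript sketch in which the state names, events, and the three-attempt retry limit are all assumptions.

```typescript
// Hypothetical lifecycle of a single chat message.
type MessageState =
  | { kind: "optimistic" } // shown immediately, before the server confirms
  | { kind: "loading"; attempt: number } // waiting for the model's answer
  | { kind: "done"; answer: string }
  | { kind: "error"; reason: string; attempt: number };

type ChatEvent =
  | { type: "sent" }
  | { type: "answered"; answer: string }
  | { type: "failed"; reason: string }
  | { type: "retry" };

const MAX_ATTEMPTS = 3; // assumption: give up after three tries

// Only the listed transitions are allowed. Anything else keeps the current
// state, so the UI can't end up in an impossible combination of flags.
function transition(state: MessageState, event: ChatEvent): MessageState {
  if (state.kind === "optimistic" && event.type === "sent") {
    return { kind: "loading", attempt: 1 };
  }
  if (state.kind === "loading" && event.type === "answered") {
    return { kind: "done", answer: event.answer };
  }
  if (state.kind === "loading" && event.type === "failed") {
    return { kind: "error", reason: event.reason, attempt: state.attempt };
  }
  if (state.kind === "error" && event.type === "retry" && state.attempt < MAX_ATTEMPTS) {
    return { kind: "loading", attempt: state.attempt + 1 };
  }
  return state;
}
```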
3. Database Decisions Earlier
I’d decide much sooner:
- vector DB strategy
- persistence setup
- session isolation
- relational DB responsibilities
Those become painful to retrofit later.
🚀 Next Major Improvements
There’s still a lot I’d improve in future versions.
1. Streaming Responses
Right now responses render only after full generation completes.
The app works fine functionally, but streaming would improve perceived responsiveness dramatically.
Even when total latency stays the same, streaming shows the first tokens almost immediately, which makes AI apps feel much faster and more interactive.
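On the frontend, streaming mostly means appending text as it arrives. This is a minimal TypeScript sketch. It assumes the backend sends plain text chunks, for example through FastAPI's StreamingResponse fed by a LangChain chain's stream output; that wiring is an assumption, not how the app currently works. In the browser, the reader could come from something like response.body.pipeThrough(new TextDecoderStream()).getReader(). TextReader and renderStream are hypothetical names.

```typescript
// Minimal reader shape, compatible with the reader of a ReadableStream<string>.
type TextReader = {
  read(): Promise<{ done: boolean; value?: string }>;
};

// Append each chunk as it arrives and hand the partial answer to the UI,
// instead of waiting for the whole response.
async function renderStream(
  reader: TextReader,
  onUpdate: (partial: string) => void,
): Promise<string> {
  let text = "";
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    text += value ?? "";
    onUpdate(text); // e.g. update the assistant message's content in state
  }
  return text;
}
```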
2. Better Chunking Strategy
RecursiveCharacterTextSplitter works well as a default, but it struggles with:
- tables
- headers
- multi-column PDFs
- structured documents
Next time I’d explore:
- semantic chunking
- structure-aware parsing
- metadata-aware retrieval
because retrieval quality depends heavily on chunk quality.
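As a rough illustration of structure-aware, metadata-aware chunking: split on headings first, cap each section's size, and keep the heading as metadata on every chunk so retrieval can filter on it or display it. This TypeScript sketch assumes the PDF has already been converted to text with headings marked by a leading "#", which a structure-aware parser would have to provide. It also cuts oversized sections at fixed character offsets, whereas a real splitter (RecursiveCharacterTextSplitter included) prefers natural separators. chunkByHeadings is a hypothetical name.

```typescript
type Chunk = { heading: string; text: string };

function chunkByHeadings(doc: string, maxChars: number): Chunk[] {
  const chunks: Chunk[] = [];
  let heading = "(no heading)";
  let buffer: string[] = [];

  // Emit the buffered section as one or more chunks tagged with its heading.
  const flush = () => {
    const body = buffer.join("\n").trim();
    for (let i = 0; i < body.length; i += maxChars) {
      chunks.push({ heading, text: body.slice(i, i + maxChars) });
    }
    buffer = [];
  };

  for (const line of doc.split("\n")) {
    if (line.startsWith("#")) {
      flush(); // close the previous section before starting a new one
      heading = line.replace(/^#+\s*/, "");
    } else {
      buffer.push(line);
    }
  }
  flush();
  return chunks;
}
```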
3. Proper RAG Evaluation
One thing I completely skipped initially was evaluation.
Most early testing was basically:
“Does this answer look correct?”
That doesn’t scale well.
Next time I’d add proper evaluation for:
- retrieval accuracy
- hallucination detection
- answer relevance
- context quality
Because without evaluation, improving a RAG pipeline becomes mostly guesswork.
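Retrieval accuracy is the easiest of these to start with. Below is a minimal TypeScript sketch of hit rate at k. It assumes a small hand-labeled set of questions, each paired with the chunk ids that contain its answer. The EvalCase and Retriever types and hitRateAtK are hypothetical. Hallucination detection and answer relevance usually need an LLM-as-judge setup or human review, which this does not cover.

```typescript
// Hypothetical labeled test case: a question plus the chunk ids that answer it.
type EvalCase = { question: string; relevantIds: string[] };

// Any retriever that returns ranked chunk ids for a question.
type Retriever = (question: string, k: number) => Promise<string[]>;

// Hit rate at k: the fraction of questions where at least one relevant chunk
// appears in the top-k retrieved results.
async function hitRateAtK(cases: EvalCase[], retrieve: Retriever, k: number): Promise<number> {
  if (cases.length === 0) return 0;
  let hits = 0;
  for (const c of cases) {
    const retrieved = await retrieve(c.question, k);
    if (retrieved.slice(0, k).some((id) => c.relevantIds.includes(id))) {
      hits++;
    }
  }
  return hits / cases.length;
}
```

Tracking a number like this before and after a chunking change replaces "does this answer look correct?" with a comparison.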
🔗 Project Link
If you're curious, here’s the project: Live Project Demo
🪞 What This Actually Taught Me
The hard part wasn’t the RAG pipeline itself.
It was everything around it — deployment, frontend integration, state management, edge cases, and all the small system behaviors you only notice while debugging.
Those aren’t AI engineering problems.
They’re software engineering problems showing up inside AI applications.
And honestly, that was probably the biggest surprise of the entire project.