Let’s be real: most companies are drowning in a sea of PDFs. Contracts, handbooks, ancient policy docs—it’s a mess. Usually, employees waste half their day hunting for one specific clause. And if you try to just throw a generic ChatGPT at the problem? You get "hallucinations" (AI-speak for "making stuff up") that could get someone fired.
That’s the exact pain we’re solving today. We’re building a RAG-powered (Retrieval-Augmented Generation) SaaS backend.
The goal: Users upload their own docs, and the AI only answers based on that data. No external guessing, just fast, accurate, cited answers. If you’re a dev looking to move past "Hello World" AI tutorials and build something that actually survives a production environment, you're in the right place.
The "Battle-Tested" Tech Stack
I’ve built enough of these to know what breaks. Here’s what we’re using to keep things scalable:
- The Engine: FastAPI. It’s fast, handles async like a champ, and the auto-docs save you hours of debugging.
- The Brain (Vectors): PostgreSQL + pgvector. Don't get distracted by "trendy" vector-only DBs for your first SaaS. `pgvector` lets you keep your user data and your embeddings in one place. It's persistent, SQL-friendly, and scales beautifully.
- The Muscle: Redis + Celery. Generating embeddings is "heavy lifting." You don't want your API hanging while you process a 50-page PDF. Celery handles the dirty work in the background.
- The Intelligence: OpenAI (`text-embedding-3-small` + `gpt-4o-mini`). It's the gold standard for a reason, though you can swap in Gemini if you're feeling adventurous.
- The Glue: Unstructured.io for parsing those messy PDFs and JWT for keeping user data private.
Let’s Build It: Step-by-Step
1. The Foundation
First, set up your virtual env (I'm a fan of uv or poetry these days; standard pip is a bit "last decade" for prod). Then install the stack:
```bash
pip install fastapi uvicorn sqlalchemy asyncpg "psycopg[binary]" pgvector openai redis celery python-dotenv python-multipart PyPDF2 "unstructured[all-docs]" slowapi "python-jose[cryptography]" "passlib[bcrypt]"
```
Pro Tip: Create your `.env` file immediately: `OPENAI_API_KEY`, `DATABASE_URL`, `REDIS_URL`. If I see these hardcoded in your repo, we're gonna have a talk.
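For the record, here's a minimal sketch of loading those three values with python-dotenv (the `config.py` module name is just my convention):

```python
# config.py: pull secrets from .env so they never land in the repo
import os

from dotenv import load_dotenv

load_dotenv()  # reads key=value pairs from .env into os.environ

OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]  # a KeyError at startup beats a 500 later
DATABASE_URL = os.environ["DATABASE_URL"]      # e.g. postgresql+asyncpg://user:pass@localhost/db
REDIS_URL = os.environ.get("REDIS_URL", "redis://localhost:6379/0")
```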
2. Setting Up the Vector Vault
Spin up Postgres with pgvector using Docker. It’s the fastest way to get moving.
```bash
docker run -d --name pgvector -e POSTGRES_PASSWORD=your-secret -p 5432:5432 ankane/pgvector
```
In your models, define a `Chunk` table with a `Vector(1536)` column. Trust me: keeping your vectors inside Postgres makes joining metadata (like "which user owns this doc?") a breeze.
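Here's a minimal sketch of that model using SQLAlchemy plus the `pgvector` Python package (run `CREATE EXTENSION vector;` once per database first; the table and foreign-key names are my assumptions):

```python
# models.py: embeddings live right next to your relational data, one row per text chunk
from pgvector.sqlalchemy import Vector
from sqlalchemy import ForeignKey, Text
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column


class Base(DeclarativeBase):
    pass


class Chunk(Base):
    __tablename__ = "chunks"

    id: Mapped[int] = mapped_column(primary_key=True)
    document_id: Mapped[int] = mapped_column(ForeignKey("documents.id"))
    user_id: Mapped[int] = mapped_column(ForeignKey("users.id"), index=True)  # ownership on every row
    content: Mapped[str] = mapped_column(Text)
    embedding: Mapped[list[float]] = mapped_column(Vector(1536))  # text-embedding-3-small's dimension
```

Joining back to documents or users is now a plain SQL join, which is exactly the point.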
3. Privacy is Non-Negotiable
This is a SaaS, not a personal script. Every document and every text chunk must be tied to a user_id.
- The Rule: Always filter queries with `WHERE user_id = current_user.id` (sketched below).
- The Level Up: Use Postgres Row-Level Security (RLS) to ensure one user can never peek at another's data.
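A minimal sketch of that rule, assuming the `Chunk` model from step 2 lives in `models.py`:

```python
# Every read of chunks goes through something like this: no user_id filter, no results.
from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession

from models import Chunk  # the model from step 2


async def get_user_chunks(session: AsyncSession, user_id: int) -> list[Chunk]:
    stmt = select(Chunk).where(Chunk.user_id == user_id)  # the non-negotiable filter
    result = await session.execute(stmt)
    return list(result.scalars())
```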
4. The Processing Pipeline
When a user hits `POST /upload`, don't make them wait. Kick off this pipeline instead (there's a sketch after the list):
- Parse: Use `unstructured`; it's way better than `PyPDF2` at handling tables.
- Chunk: Don't just cut text every 500 characters. Use a recursive splitter with overlap (e.g., 800 tokens with 150-token overlap) so you don't lose the context mid-sentence.
- Embed: Send those chunks to OpenAI in batches.
- Offload: Use Celery. Your API should just say "Got it, I'm working on it!" while the background worker does the heavy lifting.
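A condensed sketch of that worker. `split_into_chunks` and `save_chunks` are hypothetical helpers standing in for your splitter and DB write; everything else is the real libraries:

```python
# tasks.py: the background worker that does the heavy lifting
from celery import Celery
from openai import OpenAI
from unstructured.partition.pdf import partition_pdf

from config import REDIS_URL  # from the step 1 sketch; also loads .env

celery_app = Celery("worker", broker=REDIS_URL)
client = OpenAI()  # picks up OPENAI_API_KEY from the environment


@celery_app.task
def process_document(document_id: int, user_id: int, path: str) -> None:
    # 1. Parse: unstructured handles tables far better than raw PyPDF2
    elements = partition_pdf(filename=path)
    text = "\n\n".join(el.text for el in elements if el.text)

    # 2. Chunk: recursive splitting, ~800 tokens with ~150-token overlap
    chunks = split_into_chunks(text, size=800, overlap=150)  # hypothetical helper

    # 3. Embed: one batched call instead of one request per chunk
    response = client.embeddings.create(model="text-embedding-3-small", input=chunks)

    # 4. Store: every row carries user_id so retrieval stays scoped
    save_chunks(document_id, user_id, chunks, [d.embedding for d in response.data])  # hypothetical helper
```

The `/upload` route itself just persists the file, calls `process_document.delay(doc.id, user.id, path)`, and immediately returns a 202 with a job ID.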
5. The Magic "/ask" Endpoint
This is where the RAG happens (full sketch after the list):
- Embed the Question: Turn the user's query into a vector.
- Semantic Search: Use `pgvector` to find the 5-10 most relevant chunks.
- The Prompt: "Answer ONLY using this context. If it's not there, say you don't know. Cite your sources."
- Cache: If someone asks the same thing twice, serve it from Redis. It’s cheaper and faster.
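Wired together, a minimal sketch of the endpoint. The `get_current_user` (JWT) and `get_session` dependencies, and the `auth`/`db`/`prompts` module paths, are my assumptions; `SYSTEM_PROMPT` is shown in Lessons Learned below:

```python
# routes.py: embed the question, search this user's chunks, answer with citations
import hashlib
import json

import redis.asyncio as aioredis
from fastapi import APIRouter, Depends
from openai import AsyncOpenAI
from pydantic import BaseModel
from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession

from auth import get_current_user  # assumed JWT dependency
from config import REDIS_URL
from db import get_session  # assumed async session dependency
from models import Chunk
from prompts import SYSTEM_PROMPT  # see "Lessons Learned" below

router = APIRouter()
client = AsyncOpenAI()
redis = aioredis.from_url(REDIS_URL)


class AskRequest(BaseModel):
    question: str


@router.post("/ask")
async def ask(
    body: AskRequest,
    user=Depends(get_current_user),
    session: AsyncSession = Depends(get_session),
):
    # Cache first: the same question from the same user is a free answer
    key = f"ask:{user.id}:{hashlib.sha256(body.question.encode()).hexdigest()}"
    if (cached := await redis.get(key)) is not None:
        return json.loads(cached)

    # 1. Embed the question
    emb = await client.embeddings.create(
        model="text-embedding-3-small", input=body.question
    )
    q_vec = emb.data[0].embedding

    # 2. Semantic search: cosine distance, scoped to this user's data only
    stmt = (
        select(Chunk)
        .where(Chunk.user_id == user.id)
        .order_by(Chunk.embedding.cosine_distance(q_vec))
        .limit(8)
    )
    chunks = (await session.execute(stmt)).scalars().all()
    context = "\n\n".join(f"[{c.id}] {c.content}" for c in chunks)

    # 3. Grounded generation: the model only sees the retrieved context
    completion = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {body.question}"},
        ],
    )
    answer = {"answer": completion.choices[0].message.content}

    # 4. Cache for next time
    await redis.set(key, json.dumps(answer), ex=3600)
    return answer
```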
Lessons Learned (The Hard Way)
I’ve made the mistakes so you don't have to:
- The "Prototype Trap": Don't use FAISS for a multi-user app. It lives in RAM; if your server restarts, your "brain" disappears. Use `pgvector`.
- The "Spinning Wheel of Death": Never embed synchronously. If a user uploads a book, your API will time out. Always use background tasks.
- The "Hallucination Headache": Be aggressive with your system prompt. Tell the AI: "If you aren't 100% sure based on the provided text, don't guess."
The Payoff
When you're done, you have a system where retrieval usually takes under 200ms, and full, cited answers pop up in less than 3 seconds. It looks incredible in a portfolio because it shows you understand async flows, data security, and cost management.
(This is usually where you'd drop a screenshot of your Swagger UI showing those clean /upload and /ask endpoints in action!)
Want This Built for Your Business?
I’m a freelance developer who lives and breathes this stuff. If you need a custom RAG platform, a high-performance FastAPI backend, or just want to turn your company's messy documentation into a searchable superpower, let's talk.
- Portfolio: girma.studio
- Upwork: View My Profile
- X (Twitter): @Girma880731631