The Vector Database Trap: Scaling AI Search with Python, FastAPI, and Supabase pgvector
If you've built an AI application in the last year, you've probably implemented Retrieval-Augmented Generation (RAG). The standard tutorial stack is predictable: take some documents, chunk them, embed them with OpenAI, and shove them into a dedicated vector database like Pinecone, Weaviate, or Milvus.
It works beautifully for a weekend hackathon. But when you push to production, the cracks start to show.
You suddenly have two sources of truth: your primary relational database (PostgreSQL) and your vector database. Keeping them in sync becomes a distributed systems nightmare. When a user deletes their account, you have to ensure their vectors are also purged. When a document is updated, you have to re-embed and upsert. And then there's the cost—dedicated vector databases can get expensive quickly as your data grows.
The solution? Stop treating vectors as a special snowflake. They are just data. And PostgreSQL, with the pgvector extension, is more than capable of handling them at scale.
In this deep dive, we'll architect a production-ready RAG pipeline using Python, FastAPI, and Supabase (PostgreSQL + pgvector). We'll cover the architecture, the implementation, and the edge cases that tutorials conveniently ignore.
Architecture and Context
Before we write any code, let's define our architecture. We are building a document search API for a multi-tenant SaaS application.
The Stack:
- Backend: Python 3.11+ with FastAPI for high-performance async endpoints.
- Database: Supabase (PostgreSQL 15+) with the pgvector extension enabled.
- Embeddings: OpenAI's text-embedding-3-small model.
- ORM: SQLAlchemy 2.0 with pgvector support.
The Prerequisites:
You'll need a Supabase project, an OpenAI API key, and a basic understanding of FastAPI and SQLAlchemy.
Deep-Dive Implementation
1. Database Setup: Enabling pgvector in Supabase
First, we need to enable the pgvector extension in our Supabase database. You can do this via the Supabase dashboard (Database -> Extensions) or by running a simple SQL command:
-- Enable the pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;
Next, let's define our database schema. We need a table to store our documents and their corresponding vector embeddings. Notice how we store the vector alongside the relational data (tenant_id, content, metadata). This is the superpower of pgvector.
CREATE TABLE documents (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL,
    content TEXT NOT NULL,
    metadata JSONB DEFAULT '{}'::jsonb,
    embedding VECTOR(1536) -- OpenAI text-embedding-3-small dimension
);
-- Create an index for faster similarity search.
-- We use HNSW (Hierarchical Navigable Small World) for better speed/recall
-- at scale than IVFFlat; it requires pgvector 0.5.0 or later.
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);
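Under the hood, `vector_cosine_ops` ranks rows by cosine distance, which is just 1 minus cosine similarity. A minimal pure-Python sketch of the math pgvector computes for each candidate row:

```python
import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    """Cosine distance = 1 - cosine similarity, as computed by vector_cosine_ops."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

# Vectors pointing the same direction -> distance 0; orthogonal -> distance 1
print(cosine_distance([1.0, 0.0], [2.0, 0.0]))  # 0.0
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # 1.0
```

Lower distance means more similar, which is why the search query later orders by this value ascending.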
2. The Python Backend: FastAPI and SQLAlchemy
Let's set up our FastAPI application and SQLAlchemy models. We'll use the pgvector Python package to integrate seamlessly with SQLAlchemy.
# requirements.txt
# fastapi==0.109.0
# uvicorn==0.27.0
# sqlalchemy==2.0.25
# psycopg2-binary==2.9.9
# pgvector==0.2.4
# openai==1.10.0
# pydantic==2.5.3
from fastapi import FastAPI, HTTPException, Depends
from sqlalchemy import create_engine, Column, String, Text, JSON
from sqlalchemy.dialects.postgresql import UUID
from sqlalchemy.orm import declarative_base, sessionmaker, Session
from pgvector.sqlalchemy import Vector
import uuid
import os
from openai import AsyncOpenAI
# Database Configuration
DATABASE_URL = os.getenv("SUPABASE_DB_URL") # e.g., postgresql://postgres:[YOUR-PASSWORD]@db.[YOUR-PROJECT-REF].supabase.co:5432/postgres
engine = create_engine(DATABASE_URL)
SessionLocal = sessionmaker(autocommit=False, autoflush=False, bind=engine)
Base = declarative_base()
# OpenAI Configuration
openai_client = AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))
class Document(Base):
    __tablename__ = "documents"

    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    tenant_id = Column(UUID(as_uuid=True), nullable=False)
    content = Column(Text, nullable=False)
    # "metadata" is reserved by SQLAlchemy's Declarative API, so the Python
    # attribute is metadata_col while the column itself is still named "metadata"
    metadata_col = Column("metadata", JSON, default=dict)
    embedding = Column(Vector(1536))  # 1536 dimensions for text-embedding-3-small
app = FastAPI(title="Vector Search API")
def get_db():
    db = SessionLocal()
    try:
        yield db
    finally:
        db.close()
3. Ingestion: Embedding and Storing Documents
When a user uploads a document, we need to generate an embedding and store it in the database. We'll create an async endpoint to handle this. (One caveat: the SQLAlchemy session here is synchronous, so its database calls briefly block the event loop. That's acceptable at moderate load, but for high-throughput services consider SQLAlchemy's async engine with asyncpg.)
from pydantic import BaseModel
from typing import Dict, Any
class DocumentCreate(BaseModel):
    tenant_id: uuid.UUID
    content: str
    metadata: Dict[str, Any] = {}
@app.post("/documents/")
async def create_document(doc: DocumentCreate, db: Session = Depends(get_db)):
    try:
        # 1. Generate the embedding
        response = await openai_client.embeddings.create(
            input=doc.content,
            model="text-embedding-3-small"
        )
        embedding_vector = response.data[0].embedding

        # 2. Store it in the database
        db_document = Document(
            tenant_id=doc.tenant_id,
            content=doc.content,
            metadata_col=doc.metadata,
            embedding=embedding_vector
        )
        db.add(db_document)
        db.commit()
        db.refresh(db_document)

        return {"id": db_document.id, "message": "Document ingested successfully"}
    except Exception as e:
        db.rollback()
        raise HTTPException(status_code=500, detail=str(e))
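The endpoint above embeds each document whole. In practice you'll usually chunk long documents before embedding, as mentioned in the intro, and store one row per chunk. A minimal character-based chunker with overlap, offered as a sketch rather than a tokenizer-aware implementation:

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into overlapping chunks. The overlap preserves context
    across boundaries so a sentence isn't severed from its surroundings."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

# Each chunk would then be embedded and stored as its own Document row,
# carrying the parent document's tenant_id and metadata.
```

For production you'd likely swap this for a token-aware splitter that respects sentence and paragraph boundaries, but the storage pattern stays the same.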
4. Retrieval: Similarity Search with Row-Level Security
Now for the magic. We want to find the most relevant documents for a given query, but only for a specific tenant. Because our vectors live in PostgreSQL, we can combine vector similarity search with standard relational filtering in a single query.
class SearchQuery(BaseModel):
    tenant_id: uuid.UUID
    query: str
    limit: int = 5
@app.post("/search/")
async def search_documents(search: SearchQuery, db: Session = Depends(get_db)):
    try:
        # 1. Embed the search query
        response = await openai_client.embeddings.create(
            input=search.query,
            model="text-embedding-3-small"
        )
        query_embedding = response.data[0].embedding

        # 2. Vector search combined with relational filtering.
        # cosine_distance() compiles to pgvector's `<=>` operator.
        results = db.query(Document).filter(
            Document.tenant_id == search.tenant_id  # Relational filter (multi-tenancy)
        ).order_by(
            Document.embedding.cosine_distance(query_embedding)  # Vector search
        ).limit(search.limit).all()

        return [
            {
                "id": doc.id,
                "content": doc.content,
                "metadata": doc.metadata_col,
                # To also return a similarity score, select the distance as a
                # labeled extra column in the query and compute 1 - distance;
                # calling cosine_distance() on an already-loaded instance won't work.
            }
            for doc in results
        ]
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))
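The endpoint above always returns the top-k rows, however weakly they match. For RAG you'll often want to drop results past a distance cutoff so your prompts aren't padded with noise. A small sketch of that post-filter, assuming you've selected the distance alongside each document; the 0.5 threshold is an illustrative value to tune against your own data, not a recommendation:

```python
def filter_by_distance(
    results: list[tuple[dict, float]], max_distance: float = 0.5
) -> list[dict]:
    """Keep only results whose cosine distance is at or below the threshold.
    Each result pairs a document payload with its cosine distance
    (0 = same direction, 1 = orthogonal, 2 = opposite)."""
    return [doc for doc, distance in results if distance <= max_distance]

# Weak matches (distance > 0.5) are discarded before prompt assembly
hits = [({"id": 1}, 0.12), ({"id": 2}, 0.71), ({"id": 3}, 0.43)]
print(filter_by_distance(hits))  # [{'id': 1}, {'id': 3}]
```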
Common Pitfalls & Edge Cases
Building this is easy; scaling it is hard. Here are the traps you'll fall into and how to avoid them.
Problem 1: The "Out of Memory" Index Build
The Error: ERROR: out of memory Detail: Failed on request of size 8388608 in memory context "hnsw index build".
The Fix: Building an HNSW index is memory-intensive. If you try to build it on a massive table with a small database instance, PostgreSQL will crash.
- Increase maintenance_work_mem in your PostgreSQL configuration before building the index.
- Build the index after your initial bulk data load, not before.
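For example, you can raise the memory budget for just the session that builds the index. The 512MB value here is illustrative, not a recommendation: size it to your instance, and note that on Supabase your role may need elevated privileges to change it.

```sql
-- Give this session (only) more memory for the index build
SET maintenance_work_mem = '512MB';

CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);
```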
Problem 2: Slow Queries Despite the Index
The Error: Your vector searches are taking 500ms+ even with an HNSW index.
The Fix: PostgreSQL might be ignoring your index.
- Ensure your m (max connections per layer) and ef_construction parameters are tuned correctly when creating the index.
- Crucial: If you are filtering heavily (e.g., tenant_id = X), PostgreSQL might choose a sequential scan over the vector index if it thinks the relational filter is highly selective. You may need to use partitioned tables (one partition per tenant) if you have thousands of tenants with millions of vectors.
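To see which plan the planner actually chose, inspect it with EXPLAIN ANALYZE; you can also raise pgvector's query-time hnsw.ef_search knob (default 40) to trade latency for recall. A sketch, with a placeholder tenant UUID and the query vector elided:

```sql
-- Query-time recall/latency knob for HNSW searches
SET hnsw.ef_search = 100;

-- Look for an index scan on the hnsw index vs. a "Seq Scan" in the output
EXPLAIN ANALYZE
SELECT id
FROM documents
WHERE tenant_id = '00000000-0000-0000-0000-000000000000'
ORDER BY embedding <=> '[...]'::vector
LIMIT 5;
```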
Problem 3: The Dimension Mismatch
The Error: ERROR: expected 1536 dimensions, not 1024.
The Fix: You changed your embedding model (e.g., from text-embedding-ada-002 to a smaller open-source model) but didn't update your database schema. The VECTOR(1536) type is strict. If you plan to experiment with models, you can use the unconstrained VECTOR type, but you sacrifice some performance and safety.
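A cheap guard is to validate the embedding length in the API layer before it ever reaches the database, turning the cryptic SQL error into an immediate, self-explanatory failure. A minimal sketch, assuming the expected dimension is fixed at startup:

```python
EXPECTED_DIM = 1536  # must match VECTOR(1536) in the schema

def validate_embedding(embedding: list[float], expected_dim: int = EXPECTED_DIM) -> list[float]:
    """Fail fast when the embedding model and the column dimension disagree."""
    if len(embedding) != expected_dim:
        raise ValueError(
            f"expected {expected_dim} dimensions, got {len(embedding)} - "
            "did the embedding model change without a schema migration?"
        )
    return embedding
```

You'd call this on `response.data[0].embedding` in both the ingestion and search endpoints before touching the database.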
Conclusion
Moving your vector search into PostgreSQL with Supabase and pgvector simplifies your architecture, reduces costs, and eliminates the data synchronization headaches of dedicated vector databases.
Key Takeaways:
- Consolidate your stack: Keep your relational data and vector embeddings in the same database to maintain a single source of truth.
- Leverage relational filtering: Combine vector similarity search with standard SQL WHERE clauses for robust multi-tenant architectures.
- Index wisely: Use HNSW indexes for performance, but be mindful of memory constraints during index creation.
- Plan for scale: Monitor query execution plans to ensure PostgreSQL is actually using your vector indexes, especially when combining them with relational filters.
Discussion Prompt
Have you made the switch from a dedicated vector database to pgvector? What performance bottlenecks did you hit, and how did you solve them? Let's discuss in the comments!
About the Author: Ameer Hamza is a Top-Rated Full-Stack Developer with 7+ years of experience building SaaS platforms, eCommerce solutions, and AI-powered applications. He specializes in Laravel, Vue.js, React, Next.js, and AI integrations — with 50+ projects shipped and a 100% job success rate. Check out his portfolio at ameer.pk to see his latest work, or reach out for your next development project.