You've decided to use PostgreSQL for your vector embeddings. Smart move. Now you need to wire it up from Python — and if you've landed here, you've probably already noticed that there are a few different libraries involved, the syntax isn't immediately obvious, and the official pgvector docs give you the C extension but leave the Python story somewhat scattered.
This guide covers the whole picture: installing the Python client, connecting with both psycopg3 and SQLAlchemy, storing and querying embeddings, building indexes, and wiring it up into a real RAG pipeline. By the end you'll have a working setup you can actually ship.
## What You Need Before You Start
You'll need:
- A PostgreSQL database with the `vector` extension enabled
- Python 3.8+
- The `pgvector` Python package
If you're running PostgreSQL locally, install the pgvector extension from the pgvector GitHub repo and run `CREATE EXTENSION vector;`. If you're using a managed PostgreSQL service, the extension is typically pre-installed — on Rivestack, it's enabled by default on every database.
Install the Python package:
```bash
pip install pgvector
```
You'll also want a database driver. The two most common choices for Python are psycopg3 (direct, fast, recommended) and SQLAlchemy (for ORM-based projects). We'll cover both.
```bash
# For psycopg3
pip install "psycopg[binary]"

# For SQLAlchemy
pip install sqlalchemy psycopg2-binary
```
## Setting Up the Vector Extension
Connect to your database and enable the extension if it isn't already:
```sql
CREATE EXTENSION IF NOT EXISTS vector;
```
Run this once per database. You can verify it's installed with:
```sql
SELECT * FROM pg_extension WHERE extname = 'vector';
```
## Using pgvector with psycopg3
psycopg3 is the modern PostgreSQL driver for Python. It's faster than psycopg2, has proper async support, and the pgvector Python package integrates with it natively.
### Connecting and Registering the Vector Type

```python
import psycopg
from pgvector.psycopg import register_vector

conn = psycopg.connect("postgresql://user:password@localhost:5432/mydb")
register_vector(conn)
```
The register_vector call is important — it tells psycopg how to serialize and deserialize Python lists/numpy arrays into the vector type PostgreSQL expects. Without it, you'll get type errors when inserting.
### Creating a Table with a Vector Column

```python
with conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS documents (
            id BIGSERIAL PRIMARY KEY,
            content TEXT NOT NULL,
            embedding VECTOR(1536)
        )
    """)
conn.commit()
```
The 1536 dimension matches OpenAI's text-embedding-3-small model. If you're using a different model, adjust accordingly:
| Model | Dimensions |
|---|---|
| OpenAI text-embedding-3-small | 1536 |
| OpenAI text-embedding-3-large | 3072 |
| Cohere embed-v4 | 1024 |
| Google text-embedding-005 | 768 |
| all-MiniLM-L6-v2 (local) | 384 |
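If you want to fail fast on a model/schema mismatch, a small guard before inserting helps. Note that `check_dimension` is a hypothetical helper (not part of pgvector), and 1536 assumes the `VECTOR(1536)` column above:

```python
EXPECTED_DIM = 1536  # must match the VECTOR(1536) column in the table definition

def check_dimension(embedding: list[float], expected: int = EXPECTED_DIM) -> list[float]:
    """Raise early in Python if the embedding won't fit the column."""
    if len(embedding) != expected:
        raise ValueError(
            f"embedding has {len(embedding)} dimensions, column expects {expected}"
        )
    return embedding
```

Switching embedding models without altering the column is a common source of confusing insert errors; a check like this turns them into a clear Python exception.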
### Inserting Embeddings
In practice, you generate embeddings with an API call or a local model, then insert them alongside your content:
```python
from openai import OpenAI

client = OpenAI()

def get_embedding(text: str) -> list[float]:
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=text
    )
    return response.data[0].embedding

documents = [
    "PostgreSQL is a powerful open-source relational database.",
    "pgvector adds vector similarity search to PostgreSQL.",
    "Python is a high-level programming language.",
]

with conn.cursor() as cur:
    for doc in documents:
        embedding = get_embedding(doc)
        cur.execute(
            "INSERT INTO documents (content, embedding) VALUES (%s, %s)",
            (doc, embedding)
        )
conn.commit()
```
pgvector's psycopg integration accepts plain Python lists (and NumPy arrays) directly — no special wrapping needed.
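For larger batches, `cur.executemany` cuts round trips to the database. A sketch, where `rows_for_insert` is a hypothetical helper rather than anything pgvector provides:

```python
def rows_for_insert(docs: list[str], embed_fn) -> list[tuple[str, list[float]]]:
    """Pair each document with its embedding, ready for a batched INSERT."""
    return [(doc, embed_fn(doc)) for doc in docs]

# Sketch usage with the psycopg connection from above:
# with conn.cursor() as cur:
#     cur.executemany(
#         "INSERT INTO documents (content, embedding) VALUES (%s, %s)",
#         rows_for_insert(documents, get_embedding),
#     )
# conn.commit()
```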
### Querying by Similarity
To find the documents most similar to a query, compute the query embedding and use one of pgvector's distance operators:
- `<=>` — cosine distance (most common for text embeddings)
- `<->` — L2 (Euclidean) distance
- `<#>` — negative inner product
```python
query = "How does vector search work in PostgreSQL?"
query_embedding = get_embedding(query)

with conn.cursor() as cur:
    cur.execute("""
        SELECT content, 1 - (embedding <=> %s) AS similarity
        FROM documents
        ORDER BY embedding <=> %s
        LIMIT 5
    """, (query_embedding, query_embedding))
    results = cur.fetchall()

for content, similarity in results:
    print(f"[{similarity:.3f}] {content}")
```
The `<=>` operator returns cosine distance (0 = identical, 2 = opposite), so `1 - distance` gives you cosine similarity. Its full range is -1 to 1, though text embeddings rarely produce negative values in practice.
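To make the distance/similarity relationship concrete, here is the same arithmetic in plain Python (a sketch of what `<=>` computes, not pgvector's actual implementation):

```python
import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    """Cosine distance = 1 - cosine similarity, mirroring pgvector's <=> operator."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

print(cosine_distance([1.0, 0.0], [1.0, 0.0]))   # identical direction -> 0.0
print(cosine_distance([1.0, 0.0], [-1.0, 0.0]))  # opposite direction -> 2.0
```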
## Using pgvector with SQLAlchemy
If your project uses SQLAlchemy for ORM models, pgvector has first-class support there too.
```python
from sqlalchemy import create_engine, Column, BigInteger, Text
from sqlalchemy.orm import declarative_base, Session
from pgvector.sqlalchemy import Vector

engine = create_engine("postgresql+psycopg2://user:password@localhost:5432/mydb")
Base = declarative_base()

class Document(Base):
    __tablename__ = "documents"

    id = Column(BigInteger, primary_key=True, autoincrement=True)
    content = Column(Text, nullable=False)
    embedding = Column(Vector(1536))

Base.metadata.create_all(engine)
```
Inserting with SQLAlchemy:
```python
doc = Document(
    content="pgvector makes PostgreSQL a capable vector store.",
    embedding=get_embedding("pgvector makes PostgreSQL a capable vector store.")
)

with Session(engine) as session:
    session.add(doc)
    session.commit()
```
Querying with SQLAlchemy — the `Vector` column type adds `l2_distance`, `cosine_distance`, and `max_inner_product` comparator methods directly on the column:

```python
query_embedding = get_embedding("vector search with PostgreSQL")

with Session(engine) as session:
    results = (
        session.query(Document)
        .order_by(Document.embedding.cosine_distance(query_embedding))
        .limit(5)
        .all()
    )

for doc in results:
    print(doc.content)
```
## Adding an Index for Production
The queries above work fine for development and small datasets — PostgreSQL does an exact nearest-neighbor scan. But with more than ~50,000 vectors, you'll want an approximate nearest-neighbor (ANN) index to keep queries fast.
pgvector supports two index types:
- HNSW — faster queries, higher memory usage during build, better recall
- IVFFlat — faster build, less memory, slightly lower recall
For most production use cases, HNSW is the right choice:
```python
with conn.cursor() as cur:
    cur.execute("""
        CREATE INDEX IF NOT EXISTS documents_embedding_idx
        ON documents
        USING hnsw (embedding vector_cosine_ops)
        WITH (m = 16, ef_construction = 64)
    """)
conn.commit()
```
The `vector_cosine_ops` operator class tells pgvector to optimize the index for cosine distance — make sure it matches the operator you use in queries (`<=>`). If you're using L2 distance, use `vector_l2_ops` instead.
Important: build the index after loading your initial data. For HNSW this mainly makes the build faster; for IVFFlat it matters more, because the clustering step computes centroids from whatever data is present at build time.
### HNSW parameters
- `m` — number of connections per layer (default 16, range 2–100). Higher = better recall, more memory.
- `ef_construction` — search depth during build (default 64, range 4–1000). Higher = better recall, slower build.
For the query-time speed/recall tradeoff, set `hnsw.ef_search`:
```python
with conn.cursor() as cur:
    cur.execute("SET hnsw.ef_search = 100")
    # Now run your similarity query on the same connection
```
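HNSW is the default recommendation above, but if build time or memory is the tighter constraint, the IVFFlat alternative looks like this. `lists = 100` is just a starting point (a common rule of thumb is rows/1000 for datasets up to about a million rows), and `ivfflat.probes` is the query-time recall knob:

```sql
CREATE INDEX IF NOT EXISTS documents_embedding_ivf_idx
ON documents
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);

-- At query time, raise probes for better recall (default is 1)
SET ivfflat.probes = 10;
```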
## Building a RAG Pipeline with pgvector and Python
Here's a minimal but complete Retrieval-Augmented Generation pipeline using everything above:
```python
import psycopg
from pgvector.psycopg import register_vector
from openai import OpenAI

openai_client = OpenAI()
db = psycopg.connect("postgresql://user:password@localhost:5432/mydb")
register_vector(db)

def embed(text: str) -> list[float]:
    return openai_client.embeddings.create(
        model="text-embedding-3-small",
        input=text
    ).data[0].embedding

def index_document(content: str):
    """Store a document and its embedding."""
    embedding = embed(content)
    with db.cursor() as cur:
        cur.execute(
            "INSERT INTO documents (content, embedding) VALUES (%s, %s)",
            (content, embedding)
        )
    db.commit()

def retrieve(query: str, k: int = 5) -> list[str]:
    """Find the k most relevant documents for a query."""
    query_embedding = embed(query)
    with db.cursor() as cur:
        cur.execute("""
            SELECT content
            FROM documents
            ORDER BY embedding <=> %s
            LIMIT %s
        """, (query_embedding, k))
        return [row[0] for row in cur.fetchall()]

def answer(question: str) -> str:
    """Retrieve context and generate an answer."""
    context_docs = retrieve(question)
    context = "\n\n".join(context_docs)
    response = openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": "Answer questions using only the provided context. Be concise."
            },
            {
                "role": "user",
                "content": f"Context:\n{context}\n\nQuestion: {question}"
            }
        ]
    )
    return response.choices[0].message.content

# Index some documents
index_document("pgvector supports HNSW and IVFFlat indexes.")
index_document("Cosine similarity is best for text embedding comparisons.")
index_document("Rivestack provides managed PostgreSQL with pgvector pre-installed.")

# Ask a question
print(answer("What index types does pgvector support?"))
```
## Metadata Filtering
One of the big advantages of pgvector over standalone vector databases is that you can filter by regular columns in the same query. If you have multi-tenant data or want to search within a category, just add a WHERE clause:
```python
# Add a source column to your table
cur.execute("""
    ALTER TABLE documents ADD COLUMN IF NOT EXISTS source TEXT
""")

# Query with a metadata filter
cur.execute("""
    SELECT content, embedding <=> %s AS distance
    FROM documents
    WHERE source = 'internal-wiki'
    ORDER BY embedding <=> %s
    LIMIT 5
""", (query_embedding, query_embedding))
```
No syncing, no secondary filtering step, no extra infrastructure. It's just SQL.
## Connection Pooling
pgvector queries are fast, but embedding vectors are large (1536 floats ≈ 6KB per vector). If you're running a web application with concurrent requests, use a connection pool to avoid exhausting PostgreSQL's connection limit:
```python
from psycopg_pool import ConnectionPool
from pgvector.psycopg import register_vector

pool = ConnectionPool(
    "postgresql://user:password@localhost:5432/mydb",
    min_size=2, max_size=10,
    configure=register_vector  # runs on every new connection in the pool
)

with pool.connection() as conn:
    with conn.cursor() as cur:
        cur.execute(
            "SELECT content FROM documents ORDER BY embedding <=> %s LIMIT 5",
            (query_embedding,)
        )
        results = cur.fetchall()
```
On Rivestack, connection pooling is built into the platform so you don't need to manage PgBouncer yourself — you connect to the pooler endpoint and the rest is handled for you.
## Async Support
If you're building with FastAPI, asyncio, or any other async Python framework, psycopg3 has full async support:
```python
import asyncio
import psycopg
from pgvector.psycopg import register_vector_async

async def main():
    conn = await psycopg.AsyncConnection.connect(
        "postgresql://user:password@localhost:5432/mydb"
    )
    await register_vector_async(conn)

    query_embedding = get_embedding("vector search")
    async with conn.cursor() as cur:
        await cur.execute("""
            SELECT content FROM documents
            ORDER BY embedding <=> %s LIMIT 5
        """, (query_embedding,))
        results = await cur.fetchall()

    for row in results:
        print(row[0])

asyncio.run(main())
```
Note the `register_vector_async` — you need the async variant when working with `AsyncConnection`.
## Common Mistakes
**Using the wrong distance operator for your index.** If you create an HNSW index with `vector_cosine_ops` but query with `<->` (L2 distance), the index won't be used and you'll get a full sequential scan. Match the operator to the index's operator class.
**Forgetting `register_vector`.** You'll get `can't adapt type 'list'` errors. Call it once per connection (or once on the pool via `configure`).
**Building indexes on empty tables.** The index will exist but won't help much. This bites hardest with IVFFlat, which computes its cluster centroids from whatever data is present at build time. Load your data first, or rebuild the index after loading.
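If the index was created before the data arrived, rebuilding it is one statement; `REINDEX` reconstructs the index from the rows now in the table:

```sql
REINDEX INDEX documents_embedding_idx;
```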
**Not setting `ef_search` for precision-sensitive use cases.** The default is 40, which is fast but sacrifices some recall. For RAG, where missing a relevant document matters, `SET hnsw.ef_search = 100` is worth the small latency increase.
## Where to Go From Here
You now have everything you need to build vector search into a Python application using pgvector:
- psycopg3 for direct, fast database access
- SQLAlchemy if you prefer an ORM
- HNSW indexes for production performance
- Metadata filtering using standard SQL `WHERE` clauses
- A full RAG pipeline skeleton you can extend
The operational side — backups, connection pooling, high availability, keeping pgvector updated — is where the real work begins if you're self-hosting. If you'd rather not deal with that, try Rivestack: managed PostgreSQL with pgvector pre-installed, NVMe storage for fast index traversal, and automatic backups. You get the same SQL interface you've just built, without the 3 AM pages.
Originally published at rivestack.io