
Yasser B.

Posted on • Originally published at rivestack.io

How to Use pgvector with Python: A Complete Guide

You've decided to use PostgreSQL for your vector embeddings. Smart move. Now you need to wire it up from Python — and if you've landed here, you've probably already noticed that there are a few different libraries involved, the syntax isn't immediately obvious, and the official pgvector docs give you the C extension but leave the Python story somewhat scattered.

This guide covers the whole picture: installing the Python client, connecting with both psycopg3 and SQLAlchemy, storing and querying embeddings, building indexes, and wiring it up into a real RAG pipeline. By the end you'll have a working setup you can actually ship.

What You Need Before You Start

You'll need:

  • A PostgreSQL database with the vector extension enabled
  • Python 3.9+ (the examples use built-in generic type hints like list[float])
  • The pgvector Python package

If you're running PostgreSQL locally, install the pgvector extension from the pgvector GitHub repo and run CREATE EXTENSION vector;. If you're using a managed PostgreSQL service, the extension is typically pre-installed — on Rivestack, it's enabled by default on every database.

Install the Python package:

pip install pgvector

You'll also want a database driver. We'll cover the two most common setups: psycopg3 (direct, fast, recommended) and SQLAlchemy (an ORM layer that runs on top of a driver such as psycopg2).

# For psycopg3
pip install "psycopg[binary]"

# For SQLAlchemy
pip install sqlalchemy psycopg2-binary

Setting Up the Vector Extension

Connect to your database and enable the extension if it isn't already:

CREATE EXTENSION IF NOT EXISTS vector;

Run this once per database. You can verify it's installed with:

SELECT * FROM pg_extension WHERE extname = 'vector';

Using pgvector with psycopg3

psycopg3 is the modern PostgreSQL driver for Python. It's faster than psycopg2, has proper async support, and the pgvector Python package integrates with it natively.

Connecting and Registering the Vector Type

import psycopg
from pgvector.psycopg import register_vector

conn = psycopg.connect("postgresql://user:password@localhost:5432/mydb")
register_vector(conn)

The register_vector call is important — it tells psycopg how to serialize and deserialize Python lists/numpy arrays into the vector type PostgreSQL expects. Without it, you'll get type errors when inserting.
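If you ever do need to insert without the adapter (for example from a driver the pgvector package doesn't integrate with), you can build pgvector's text representation yourself and let PostgreSQL cast it. This is a hypothetical fallback helper, not part of the pgvector package:

```python
def to_pgvector_literal(values: list[float]) -> str:
    """Format a Python list in pgvector's text form, e.g. '[1.0,2.5,3.0]'.

    PostgreSQL casts this string to the vector type, so you can bind it
    as a plain string parameter: INSERT ... VALUES (%s::vector).
    """
    return "[" + ",".join(repr(float(v)) for v in values) + "]"
```

With register_vector in place you don't need this; it's only useful for plain string parameter binding.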

Creating a Table with a Vector Column

with conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS documents (
            id BIGSERIAL PRIMARY KEY,
            content TEXT NOT NULL,
            embedding VECTOR(1536)
        )
    """)
    conn.commit()

The 1536 dimension matches OpenAI's text-embedding-3-small model. If you're using a different model, adjust accordingly:

Model                                   Dimensions
OpenAI text-embedding-3-small           1536
OpenAI text-embedding-3-large           3072
Cohere embed-v4                         1024
Google text-embedding-005               768
all-MiniLM-L6-v2 (local)                384
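A dimension mismatch only fails at insert time with a PostgreSQL error, so a cheap guard in Python catches it earlier. A hypothetical helper, with 1536 matching the VECTOR(1536) column above:

```python
EXPECTED_DIM = 1536  # must match the VECTOR(n) declared on the column

def validate_embedding(embedding: list[float], dim: int = EXPECTED_DIM) -> list[float]:
    # Fail fast in Python instead of at INSERT time in PostgreSQL.
    if len(embedding) != dim:
        raise ValueError(f"expected a {dim}-dim embedding, got {len(embedding)}")
    return embedding
```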

Inserting Embeddings

In practice, you generate embeddings with an API call or a local model, then insert them alongside your content:

from openai import OpenAI

client = OpenAI()

def get_embedding(text: str) -> list[float]:
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=text
    )
    return response.data[0].embedding

documents = [
    "PostgreSQL is a powerful open-source relational database.",
    "pgvector adds vector similarity search to PostgreSQL.",
    "Python is a high-level programming language.",
]

with conn.cursor() as cur:
    for doc in documents:
        embedding = get_embedding(doc)
        cur.execute(
            "INSERT INTO documents (content, embedding) VALUES (%s, %s)",
            (doc, embedding)
        )
    conn.commit()

pgvector's psycopg integration accepts plain Python lists directly — no special wrapping needed.

Querying by Similarity

To find the documents most similar to a query, compute the query embedding and use one of pgvector's distance operators:

  • <=> — cosine distance (most common for text embeddings)
  • <-> — L2 (Euclidean) distance
  • <#> — negative inner product

query = "How does vector search work in PostgreSQL?"
query_embedding = get_embedding(query)

with conn.cursor() as cur:
    cur.execute("""
        SELECT content, 1 - (embedding <=> %s) AS similarity
        FROM documents
        ORDER BY embedding <=> %s
        LIMIT 5
    """, (query_embedding, query_embedding))

    results = cur.fetchall()
    for content, similarity in results:
        print(f"[{similarity:.3f}] {content}")

The <=> operator returns cosine distance (0 = identical, 2 = opposite), so 1 - distance gives you cosine similarity. Strictly speaking that score ranges from -1 to 1, though for typical text embeddings it usually lands between 0 and 1.
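To make the distance/similarity relationship concrete, here is the arithmetic behind <=>, sketched in plain Python (pgvector's own implementation is in C; this is just for intuition):

```python
import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    # Cosine distance = 1 - cosine similarity, which is what <=> returns.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

# Same direction -> distance 0; orthogonal -> 1; opposite -> 2
```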

Using pgvector with SQLAlchemy

If your project uses SQLAlchemy for ORM models, pgvector has first-class support there too.

from sqlalchemy import create_engine, Column, BigInteger, Text
from sqlalchemy.orm import declarative_base, Session
from pgvector.sqlalchemy import Vector

engine = create_engine("postgresql+psycopg2://user:password@localhost:5432/mydb")
Base = declarative_base()

class Document(Base):
    __tablename__ = "documents"

    id = Column(BigInteger, primary_key=True, autoincrement=True)
    content = Column(Text, nullable=False)
    embedding = Column(Vector(1536))

Base.metadata.create_all(engine)

Inserting with SQLAlchemy:

doc = Document(
    content="pgvector makes PostgreSQL a capable vector store.",
    embedding=get_embedding("pgvector makes PostgreSQL a capable vector store.")
)

with Session(engine) as session:
    session.add(doc)
    session.commit()

Querying with SQLAlchemy — the Vector column type exposes l2_distance, cosine_distance, and max_inner_product methods directly on the column:

query_embedding = get_embedding("vector search with PostgreSQL")

with Session(engine) as session:
    results = (
        session.query(Document)
        .order_by(Document.embedding.cosine_distance(query_embedding))
        .limit(5)
        .all()
    )
    for doc in results:
        print(doc.content)

Adding an Index for Production

The queries above work fine for development and small datasets — PostgreSQL does an exact nearest-neighbor scan. But with more than ~50,000 vectors, you'll want an approximate nearest-neighbor (ANN) index to keep queries fast.

pgvector supports two index types:

  • HNSW — faster queries, higher memory usage during build, better recall
  • IVFFlat — faster build, less memory, slightly lower recall

For most production use cases, HNSW is the right choice:

with conn.cursor() as cur:
    cur.execute("""
        CREATE INDEX IF NOT EXISTS documents_embedding_idx
        ON documents
        USING hnsw (embedding vector_cosine_ops)
        WITH (m = 16, ef_construction = 64)
    """)
    conn.commit()

The vector_cosine_ops tells pgvector to optimize the index for cosine distance — make sure this matches the operator you use in queries (<=>). If you're using L2 distance, use vector_l2_ops instead.
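If you opt for IVFFlat instead, the DDL has the same shape (lists = 100 is an assumption — a common starting heuristic is rows/1000):

```sql
CREATE INDEX IF NOT EXISTS documents_embedding_ivf_idx
ON documents
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);

-- At query time, ivfflat.probes controls the recall/speed tradeoff:
SET ivfflat.probes = 10;
```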

Important: Build the index after loading your initial data. For HNSW the build is simply much faster that way; for IVFFlat it also affects quality, because the cluster centroids are computed from whatever data exists at build time.

HNSW parameters

  • m — number of connections per layer (default 16, range 2–100). Higher = better recall, more memory.
  • ef_construction — search depth during build (default 64, range 4–1000). Higher = better recall, slower build.

For query-time speed/recall tradeoff, set hnsw.ef_search:

with conn.cursor() as cur:
    cur.execute("SET hnsw.ef_search = 100")
    # Now run your similarity query

Building a RAG Pipeline with pgvector and Python

Here's a minimal but complete Retrieval-Augmented Generation pipeline using everything above:

import psycopg
from pgvector.psycopg import register_vector
from openai import OpenAI

openai_client = OpenAI()
db = psycopg.connect("postgresql://user:password@localhost:5432/mydb")
register_vector(db)

def embed(text: str) -> list[float]:
    return openai_client.embeddings.create(
        model="text-embedding-3-small",
        input=text
    ).data[0].embedding

def index_document(content: str):
    """Store a document and its embedding."""
    embedding = embed(content)
    with db.cursor() as cur:
        cur.execute(
            "INSERT INTO documents (content, embedding) VALUES (%s, %s)",
            (content, embedding)
        )
    db.commit()

def retrieve(query: str, k: int = 5) -> list[str]:
    """Find the k most relevant documents for a query."""
    query_embedding = embed(query)
    with db.cursor() as cur:
        cur.execute("""
            SELECT content
            FROM documents
            ORDER BY embedding <=> %s
            LIMIT %s
        """, (query_embedding, k))
        return [row[0] for row in cur.fetchall()]

def answer(question: str) -> str:
    """Retrieve context and generate an answer."""
    context_docs = retrieve(question)
    context = "\n\n".join(context_docs)

    response = openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": "Answer questions using only the provided context. Be concise."
            },
            {
                "role": "user",
                "content": f"Context:\n{context}\n\nQuestion: {question}"
            }
        ]
    )
    return response.choices[0].message.content

# Index some documents
index_document("pgvector supports HNSW and IVFFlat indexes.")
index_document("Cosine similarity is best for text embedding comparisons.")
index_document("Rivestack provides managed PostgreSQL with pgvector pre-installed.")

# Ask a question
print(answer("What index types does pgvector support?"))
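The pipeline above indexes each string as-is; real documents usually need to be split first. A minimal fixed-size chunker with overlap (the size and overlap values are assumptions — tune them for your embedding model and content):

```python
def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping windows so content at a boundary isn't lost."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]
```

You'd then call index_document(chunk) for each chunk instead of indexing the whole document at once.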

Metadata Filtering

One of the big advantages of pgvector over standalone vector databases is that you can filter by regular columns in the same query. If you have multi-tenant data or want to search within a category, just add a WHERE clause:

# Add a source column to your table
cur.execute("""
    ALTER TABLE documents ADD COLUMN IF NOT EXISTS source TEXT
""")

# Query with metadata filter
cur.execute("""
    SELECT content, embedding <=> %s AS distance
    FROM documents
    WHERE source = 'internal-wiki'
    ORDER BY embedding <=> %s
    LIMIT 5
""", (query_embedding, query_embedding))

No syncing, no secondary filtering step, no extra infrastructure. It's just SQL.

Connection Pooling

pgvector queries are fast, but embedding vectors are large (1536 floats ≈ 6KB per vector). If you're running a web application with concurrent requests, use a connection pool to avoid exhausting PostgreSQL's connection limit:

from psycopg_pool import ConnectionPool

pool = ConnectionPool("postgresql://user:password@localhost:5432/mydb",
                      min_size=2, max_size=10,
                      configure=register_vector)

with pool.connection() as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT content FROM documents ORDER BY embedding <=> %s LIMIT 5",
                    (query_embedding,))
        results = cur.fetchall()

On Rivestack, connection pooling is built into the platform so you don't need to manage PgBouncer yourself — you connect to the pooler endpoint and the rest is handled for you.

Async Support

If you're building with FastAPI, asyncio, or any other async Python framework, psycopg3 has full async support:

import asyncio
import psycopg
from pgvector.psycopg import register_vector_async

async def main():
    conn = await psycopg.AsyncConnection.connect(
        "postgresql://user:password@localhost:5432/mydb"
    )
    await register_vector_async(conn)

    query_embedding = get_embedding("vector search")
    async with conn.cursor() as cur:
        await cur.execute("""
            SELECT content FROM documents
            ORDER BY embedding <=> %s LIMIT 5
        """, (query_embedding,))
        results = await cur.fetchall()
        for row in results:
            print(row[0])

asyncio.run(main())

Note the register_vector_async — you need the async version when working with AsyncConnection.

Common Mistakes

Using the wrong distance operator for your index. If you create an HNSW index with vector_cosine_ops but query with <-> (L2 distance), the index won't be used and you'll get a full sequential scan. Match the operator to the index ops class.

Forgetting register_vector. You'll get adapter or type errors on insert (e.g. psycopg2's can't adapt type 'list', or a PostgreSQL error that the column is of type vector but the expression is an array). Call it once per connection (or once on the pool via configure).

Building indexes on empty tables. An IVFFlat index built on an empty table has meaningless cluster centroids and gives poor recall; HNSW tolerates it, but bulk-loading into an indexed table is slow. Load your data first (or rebuild the index afterwards).

Not setting ef_search for precision-sensitive use cases. The default is 40, which is fast but sacrifices some recall. For RAG where missing a relevant document matters, SET hnsw.ef_search = 100 is worth the small latency increase.

Where to Go From Here

You now have everything you need to build vector search into a Python application using pgvector:

  • psycopg3 for direct, fast database access
  • SQLAlchemy if you prefer an ORM
  • HNSW indexes for production performance
  • Metadata filtering using standard SQL WHERE clauses
  • A full RAG pipeline skeleton you can extend

The operational side — backups, connection pooling, high availability, keeping pgvector updated — is where the real work begins if you're self-hosting. If you'd rather not deal with that, try Rivestack: managed PostgreSQL with pgvector pre-installed, NVMe storage for fast index traversal, and automatic backups. You get the same SQL interface you've just built, without the 3 AM pages.

