LangChain has a vectorstore abstraction that lets you swap out the underlying vector database without rewriting your application logic. Swap Chroma for Pinecone, Pinecone for Weaviate, whatever. In theory, it's clean. In practice, most teams end up staying with whatever they picked first, because migration is never as simple as swapping a class name.
So the decision matters. And if you're building on PostgreSQL, the answer is almost always: use pgvector. Your embeddings live in the same database as your users, documents, and application state. No sync pipeline. No eventual consistency. Full SQL.
This guide walks through the LangChain PGVector integration from scratch, including document loading, embedding, similarity search, metadata filtering, and wiring it into a working retrieval chain.
Why pgvector Over a Dedicated Vectorstore
Before the code, let's be direct about the tradeoff.
Dedicated vectorstores like Pinecone are fast and scale to billions of vectors without you thinking about infrastructure. If you're building something where vectors are the entire product, they're a reasonable choice.
But most applications aren't like that. You have users. You have documents. You have metadata. You have access control logic. When your vectors live in Pinecone and everything else lives in PostgreSQL, you've created a sync problem. A document gets deleted from Postgres, but its embedding stays in Pinecone. You want to filter by user_id, so now you have to implement that filter in a separate system. You want a transaction that inserts a document and its embedding atomically? You can't have one.
pgvector collapses this. Everything lives in one place. JOINs work. Transactions work. Your existing backup strategy, monitoring setup, and connection pooler all work with zero changes.
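As a concrete sketch of what "transactions work" buys you: deleting a document and its embedding becomes one atomic operation. This assumes the default langchain_pg_embedding table that langchain-postgres creates, plus a hypothetical application-side documents table:

```sql
-- Hypothetical schema: "documents" is your application table;
-- langchain_pg_embedding is created by langchain-postgres.
-- Both deletes commit together or not at all.
BEGIN;
DELETE FROM langchain_pg_embedding WHERE id = 'doc_id_1';
DELETE FROM documents WHERE external_id = 'doc_id_1';
COMMIT;
```

With a separate vectorstore, the second delete lives in another system and the two can silently drift apart.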
Setting Up
You'll need:
- PostgreSQL with the vector extension enabled
- Python 3.9+
- An OpenAI API key (or any LangChain-compatible embeddings model)
Install dependencies:
pip install langchain langchain-openai langchain-postgres psycopg
The langchain-postgres package is the current home for the PGVector integration. On a managed PostgreSQL service with pgvector pre-installed, just run this once:
CREATE EXTENSION IF NOT EXISTS vector;
Connecting LangChain to pgvector
The PGVector class takes a connection string and an embeddings object:
from langchain_postgres import PGVector
from langchain_openai import OpenAIEmbeddings
connection_string = "postgresql+psycopg://user:password@localhost:5432/mydb"
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = PGVector(
    embeddings=embeddings,
    collection_name="documents",
    connection=connection_string,
    use_jsonb=True,
)
Always set use_jsonb=True — it stores metadata as JSONB, unlocking proper metadata filtering. LangChain creates the required tables automatically on first use.
Loading and Embedding Documents
from langchain_core.documents import Document
docs = [
    Document(
        page_content="PostgreSQL supports ACID transactions and is MVCC-based.",
        metadata={"source": "postgres-intro", "category": "database"},
    ),
    Document(
        page_content="pgvector adds vector similarity search using HNSW and IVFFlat indexes.",
        metadata={"source": "pgvector-intro", "category": "extensions"},
    ),
]
vectorstore.add_documents(docs)
For real use cases, split long documents first:
from langchain_text_splitters import RecursiveCharacterTextSplitter
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
split_docs = splitter.split_documents(raw_docs)
vectorstore.add_documents(split_docs)
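The chunk_overlap parameter means consecutive chunks share a window of text, so a sentence cut at a chunk boundary still appears whole in at least one chunk. A toy illustration of the idea, using a naive character window (the real splitter additionally prefers paragraph and sentence boundaries):

```python
def chunk_text(text, chunk_size=1000, chunk_overlap=200):
    # Naive sliding window: each chunk starts (chunk_size - chunk_overlap)
    # characters after the previous one, so adjacent chunks share
    # chunk_overlap characters of text.
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]
```

This is only meant to show what the overlap does; use the real splitter in practice.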
Similarity Search
results = vectorstore.similarity_search("How does pgvector index work?", k=4)
results_with_scores = vectorstore.similarity_search_with_score("pgvector index", k=4)
for doc, score in results_with_scores:
    print(f"Score: {score:.4f} — {doc.page_content[:80]}")
Lower scores mean higher similarity when using cosine distance.
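To see why, it helps to compute the metric by hand: cosine distance is 1 minus cosine similarity, so identical directions score 0 and orthogonal directions score 1. A self-contained sketch:

```python
import math

def cosine_distance(a, b):
    # Cosine distance = 1 - cosine similarity; this is what
    # vector_cosine_ops measures in pgvector.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

cosine_distance([1.0, 0.0], [1.0, 0.0])  # identical -> 0.0
cosine_distance([1.0, 0.0], [0.0, 1.0])  # orthogonal -> 1.0
```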
Metadata Filtering
# Filter by a single field
results = vectorstore.similarity_search(
    "database indexing",
    k=4,
    filter={"category": "database"},
)
# Multi-tenant: scope to a specific user
results = vectorstore.similarity_search(
    "vector search",
    k=4,
    filter={"user_id": "user_123"},
)
With use_jsonb=True, this translates to a JSONB containment query that can be indexed.
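If filter latency matters, the JSONB column itself can be indexed. A sketch, assuming the default langchain_pg_embedding table (cmetadata is the JSONB metadata column langchain-postgres uses):

```sql
-- GIN index so containment filters like
--   cmetadata @> '{"user_id": "user_123"}'
-- can use the index instead of scanning every row.
CREATE INDEX ON langchain_pg_embedding USING gin (cmetadata jsonb_path_ops);
```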
MMR Search for Diverse Results
Maximum Marginal Relevance avoids returning five chunks that all say the same thing:
results = vectorstore.max_marginal_relevance_search(
    "PostgreSQL extensions",
    k=4,
    fetch_k=20,
    lambda_mult=0.5,  # 0 = max diversity, 1 = max relevance
)
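Under the hood, MMR is a greedy loop: at each step it picks the candidate with the best blend of relevance to the query and dissimilarity to what's already selected. A toy sketch of that loop, with plain Python lists standing in for embeddings (mmr_select and dot are hypothetical helper names, not LangChain APIs):

```python
def dot(a, b):
    # Dot product; equals cosine similarity for unit-length vectors.
    return sum(x * y for x, y in zip(a, b))

def mmr_select(query_vec, candidates, k, lambda_mult=0.5):
    """Greedily pick k candidate indices, trading off relevance vs. redundancy."""
    selected, remaining = [], list(range(len(candidates)))
    while remaining and len(selected) < k:
        def score(i):
            relevance = dot(query_vec, candidates[i])
            # Redundancy: similarity to the closest already-selected item.
            redundancy = max(
                (dot(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            # lambda_mult = 1 -> pure relevance, 0 -> pure diversity
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With a low lambda_mult, a near-duplicate of an already-selected vector scores poorly and a dissimilar one wins, which is exactly the "five chunks that all say the same thing" problem being avoided.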
Building a Retrieval Chain
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
retriever = vectorstore.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 4, "fetch_k": 20},
)
prompt = ChatPromptTemplate.from_template("""
Answer the question based only on the following context:
{context}
Question: {question}
""")
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
chain = (
    {
        "context": retriever | (lambda docs: "\n\n".join(d.page_content for d in docs)),
        "question": RunnablePassthrough(),
    }
    | prompt
    | llm
    | StrOutputParser()
)
answer = chain.invoke("What indexes does pgvector support?")
For multi-tenant RAG, scope the retriever per authenticated user:
retriever = vectorstore.as_retriever(
    search_kwargs={"k": 4, "filter": {"user_id": current_user_id}}
)
HNSW Indexing for Production
Without an index, pgvector falls back to exact k-NN search, scanning every row per query. For datasets over 100k vectors, add an HNSW index:
CREATE INDEX ON langchain_pg_embedding
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
-- At query time:
SET hnsw.ef_search = 100;
LangChain won't create this index automatically. Run it once after initial load.
Connection Pooling
Configure an engine with a pool instead of relying on per-instance connections:
from sqlalchemy import create_engine
engine = create_engine(
    connection_string,
    pool_size=10,
    max_overflow=20,
    pool_pre_ping=True,
)
vectorstore = PGVector(
    embeddings=embeddings,
    collection_name="documents",
    connection=engine,
    use_jsonb=True,
)
Async Usage
from sqlalchemy.ext.asyncio import create_async_engine

async_engine = create_async_engine(connection_string, pool_size=10)

vectorstore = PGVector(
    embeddings=embeddings,
    collection_name="documents",
    connection=async_engine,
    use_jsonb=True,
    async_mode=True,
)
results = await vectorstore.asimilarity_search("your query", k=4)
answer = await chain.ainvoke("What indexes does pgvector support?")
Document Upsert and Deletion
# Upsert by ID
vectorstore.add_documents(updated_docs, ids=["doc_id_1", "doc_id_2"])
# Delete by ID
vectorstore.delete(ids=["doc_id_1"])
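One pattern worth considering: if you derive IDs deterministically from the source and chunk position, re-running ingestion overwrites stale rows instead of accumulating duplicates. A sketch (doc_id is a hypothetical helper, not part of langchain-postgres):

```python
import hashlib

def doc_id(source, chunk_index):
    # Deterministic ID: the same source + chunk position always hashes
    # to the same ID, so add_documents(ids=...) behaves as an upsert
    # across repeated ingestion runs.
    return hashlib.sha256(f"{source}:{chunk_index}".encode()).hexdigest()
```

You would then pass ids=[doc_id(d.metadata["source"], i) for i, d in enumerate(split_docs)] to add_documents.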
Summary
pgvector with LangChain is production-ready. The langchain-postgres package handles embeddings storage, similarity search, MMR, and metadata filtering. The underlying database is PostgreSQL, so transactions, JOINs, and your existing operational tooling all work.
Key things to get right: use use_jsonb=True, create an HNSW index before your dataset grows large, configure a connection pool at the engine level, and scope retrievers to the authenticated user for multi-tenant workloads.
If you want a managed PostgreSQL setup with pgvector pre-installed, PgBouncer included, and backups handled: check out Rivestack.
Originally published at rivestack.io