LangChain + Supabase Vector Store (pgvector) - A Beginner‑Friendly Guide
This guide walks you through building a tiny semantic search demo using:
- LangChain.js - to orchestrate embeddings and vector search
- Supabase (Postgres + pgvector) - to store your vectors and query them efficiently
- OpenAI embeddings - we’ll use `text-embedding-3-small`
By the end, you’ll be able to insert documents into a Supabase table and query similar text with just a few lines of code.
Never hardcode real API keys in code. Use environment variables. Also, do not expose your Supabase Service Role Key to the browser; keep it server-side only.
What you’ll build
We’ll index 4 short facts into a `documents` table and then ask a question:
“Mitochondria are made of what?”
The code will return the most similar document (spoiler: the one explaining mitochondria).
How vector search works (in plain English)
- Embed text: Convert text into high‑dimensional vectors using an embedding model.
- Store vectors: Save those vectors in Postgres via pgvector (the `vector` data type).
- Search: Embed the user’s query and find nearest neighbors (cosine similarity / distance).
That’s it. No keywords, no exact string matching; semantic similarity does the heavy lifting.
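To make the idea concrete, here’s a toy sketch of cosine similarity in plain JavaScript. The 3-dimensional vectors are made up for illustration; real embeddings from `text-embedding-3-small` have 1536 dimensions.

```js
// Cosine similarity: 1 = same direction (very similar), ~0 = unrelated.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Pretend these are embeddings of two related sentences and one unrelated one.
const mitochondria = [0.9, 0.1, 0.2];
const lipids = [0.8, 0.2, 0.3];
const bricks = [0.1, 0.9, 0.1];

console.log(cosineSimilarity(mitochondria, lipids)); // ~0.98 (close)
console.log(cosineSimilarity(mitochondria, bricks)); // ~0.24 (far)
```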
Prerequisites
- Node.js 18+
- A Supabase project (free tier is fine)
- An OpenAI API key
- Basic terminal familiarity
1) Create a new project
mkdir lc-supabase-demo
cd lc-supabase-demo
npm init -y
Use ESM imports (since our code uses `import`). Add `"type": "module"` to your `package.json`:
{
"name": "lc-supabase-demo",
"version": "1.0.0",
"type": "module",
"scripts": {
"dev": "node index.mjs",
"seed": "node seed.mjs"
}
}
2) Install dependencies
npm i @langchain/community @langchain/openai @supabase/supabase-js dotenv
- `@langchain/community` – community integrations (the Supabase vector store lives here)
- `@langchain/openai` – OpenAI embeddings and chat models for LangChain
- `@supabase/supabase-js` – Supabase client SDK
- `dotenv` – load env variables from `.env`
3) Create your `.env` (do not commit this)
Create a file named `.env` in the project root:
touch .env
Put the following into `.env` (replace with your values):
# OpenAI
OPENAI_API_KEY=sk-your-openai-key
# Supabase
SUPABASE_URL=https://your-project-id.supabase.co
# Service role key must be kept server-side only (NEVER expose in frontend code)
SUPABASE_SERVICE_ROLE_KEY=your-service-role-key
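To make sure the file never gets committed, you can also add it to your `.gitignore`:
echo ".env" >> .gitignore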
4) Prepare your database (Supabase)
Open your Supabase dashboard -> SQL Editor. Run the SQL below to enable pgvector, create a table, index it, and add an RPC function for searching.
If you prefer UUID ids, use the first block and don’t pass custom `ids` when inserting. If you want to pass string ids like `"1"`, use the Text ID variant.
4.1 Enable pgvector
-- The extension name is "vector"
create extension if not exists vector;
4.2 Create the `documents` table (UUID id)
create table if not exists public.documents (
id uuid primary key default gen_random_uuid(),
content text,
metadata jsonb,
embedding vector(1536) -- matches OpenAI text-embedding-3-small
);
OR (Text ID, if you plan to insert ids like "1","2"):
create table if not exists public.documents (
id text primary key,
content text,
metadata jsonb,
embedding vector(1536)
);
`1536` matches the default dimensionality for `text-embedding-3-small`.
4.3 Optional index for faster search (cosine)
create index if not exists documents_embedding_idx
on public.documents
using ivfflat (embedding vector_cosine_ops)
with (lists = 100);
IVFFlat speeds up ANN (approximate nearest neighbor) search. You can tune `lists` later.
4.4 Create the `match_documents` RPC
This function returns the top‑N most similar rows using cosine distance:
create or replace function public.match_documents(
query_embedding vector(1536),
match_count int default 5,
filter jsonb default '{}'
)
returns table (
id text,
content text,
metadata jsonb,
similarity float
)
language plpgsql
as $$
begin
return query
select
d.id::text,
d.content,
d.metadata,
1 - (d.embedding <=> query_embedding) as similarity
from public.documents d
where d.metadata @> filter
order by d.embedding <=> query_embedding -- cosine distance (smaller is closer)
limit match_count;
end;
$$;
If you used `uuid` for `id`, you can still cast to `text` in the return shape as shown.
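If you want to sanity-check the RPC before wiring up LangChain, you can call it directly with supabase-js. A minimal sketch (the file name is hypothetical; it assumes the `.env` from step 3 and some seeded rows from step 7):

```js
// check-rpc.mjs - call match_documents directly, bypassing LangChain
import "dotenv/config";
import { createClient } from "@supabase/supabase-js";
import { OpenAIEmbeddings } from "@langchain/openai";

const supabase = createClient(
  process.env.SUPABASE_URL,
  process.env.SUPABASE_SERVICE_ROLE_KEY
);
const embeddings = new OpenAIEmbeddings({ model: "text-embedding-3-small" });

// Embed the query, then hand the raw vector to the RPC.
const queryEmbedding = await embeddings.embedQuery("What are mitochondria made of?");
const { data, error } = await supabase.rpc("match_documents", {
  query_embedding: queryEmbedding,
  match_count: 2,
  filter: {},
});

if (error) throw error;
console.log(data); // rows with id, content, metadata, similarity
```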
5) Project structure
lc-supabase-demo/
├─ .env
├─ package.json
├─ startup.mjs
├─ seed.mjs
└─ index.mjs
6) `startup.mjs` - initialize clients and vector store
// startup.mjs
import "dotenv/config";
import { createClient } from "@supabase/supabase-js";
import { OpenAIEmbeddings } from "@langchain/openai";
import { SupabaseVectorStore } from "@langchain/community/vectorstores/supabase";
const {
OPENAI_API_KEY,
SUPABASE_URL,
SUPABASE_SERVICE_ROLE_KEY,
} = process.env;
if (!OPENAI_API_KEY || !SUPABASE_URL || !SUPABASE_SERVICE_ROLE_KEY) {
throw new Error("Missing required env vars. Check your .env.");
}
export const embeddings = new OpenAIEmbeddings({
model: "text-embedding-3-small",
apiKey: OPENAI_API_KEY,
// Optional: dimensions: 1536 (this model's default; keep it matching vector(1536) in SQL)
});
export const supabase = createClient(SUPABASE_URL, SUPABASE_SERVICE_ROLE_KEY);
// Table and RPC function names must match what you created in SQL
const vectorStore = new SupabaseVectorStore(embeddings, {
client: supabase,
tableName: "documents",
queryName: "match_documents",
});
export default vectorStore;
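Before moving on, a quick optional sanity check that the embedding dimensionality matches the table (the file name is just a suggestion):

```js
// check-embeddings.mjs - the vector length must match vector(1536) in SQL
import { embeddings } from "./startup.mjs";

const vector = await embeddings.embedQuery("hello world");
console.log(vector.length); // should print 1536
```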
7) `seed.mjs` - add documents
// seed.mjs
import vectorStore from "./startup.mjs";
const doc1 = {
pageContent: "The powerhouse of the cell is the mitochondria.", // fixed typo
metadata: { page: 1, source: "https://example.com" },
};
const doc2 = {
pageContent: "Buildings are commonly made out of brick.",
metadata: { page: 2, source: "https://example.com" },
};
const doc3 = {
pageContent: "Mitochondria membranes are rich in lipids.",
metadata: { page: 3, source: "https://example.com" },
};
const doc4 = {
pageContent: "The 2024 Olympics were hosted in Paris.",
metadata: { page: 4, source: "https://example.com" },
};
const documents = [doc1, doc2, doc3, doc4];
// Option A) Let DB auto-generate ids (use UUID schema)
await vectorStore.addDocuments(documents);
// Option B) If you created a text id column and want custom ids:
// await vectorStore.addDocuments(documents, { ids: ["1", "2", "3", "4"] });
console.log("Seeded documents!");
8) `index.mjs` - run a similarity search
// index.mjs
import vectorStore from "./startup.mjs";
const query = "Mitochondria are made of what?";
const k = 1;
const results = await vectorStore.similaritySearch(query, k);
console.log(results);
/*
Example shape:
[
{
pageContent: 'Mitochondria membranes are rich in lipids.',
metadata: { page: 3, source: 'https://example.com' },
id: '...'
}
]
*/
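If you also want the scores, LangChain vector stores expose `similaritySearchWithScore`, which returns `[document, score]` pairs; a small variant of the script above:

```js
// index-with-scores.mjs - same query, but with similarity scores
import vectorStore from "./startup.mjs";

const scored = await vectorStore.similaritySearchWithScore(
  "Mitochondria are made of what?",
  2
);
for (const [doc, score] of scored) {
  console.log(score.toFixed(3), doc.pageContent);
}
```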
9) Run it
# 1) Seed the data (one-time or whenever you change documents)
npm run seed
# 2) Query
npm run dev
Troubleshooting & common gotchas
PGRST202 could not find function public.match_documents
You didn’t create the RPC, or it’s not in thepublic
schema, or the name mismatches yourqueryName
option.Dimension mismatch
Ensure your table column and RPC vector dimensions match your embedding model (1536 fortext-embedding-3-small
).Service role key exposure
UseSUPABASE_SERVICE_ROLE_KEY
only on the server. For browser apps, create a secure backend route that proxies requests.Inconsistent ids
If your table usesuuid
, don’t pass customids
inaddDocuments
. If you want to pass your own ids, defineid text primary key
.Performance
Create theivfflat
index (cosine ops), and consider increasinglists
andprobes
depending on data size and latency targets.
Why this works
- LangChain.js ships a SupabaseVectorStore integration in `@langchain/community`, which expects a `documents` table and an RPC (default `match_documents`) to perform similarity search.
- pgvector provides the `vector` type, distance operators, and ANN indexes for fast semantic search.
- OpenAI `text-embedding-3-small` gives high-quality, low-cost embeddings with 1536 dimensions, perfect for quick demos and many production workloads.
Next steps
- Add metadata filters (e.g., `filter: { source: "https://example.com" }`) to scope search (see the example after this list).
- Batch insert larger datasets and tweak ANN index parameters.
- Swap in other embedding providers (Cohere, Voyage, local models) if needed.
- Build a RAG pipeline: chunk PDFs/HTML -> store -> query -> rerank -> answer with citations.
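As an example of the metadata-filter idea from the list above: the filter flows into the RPC’s `where d.metadata @> filter` clause, so only documents whose metadata contains the given key/value pairs are searched. A sketch:

```js
// filtered-search.mjs - scope the search to one source
import vectorStore from "./startup.mjs";

const results = await vectorStore.similaritySearch(
  "Mitochondria are made of what?",
  1,
  { source: "https://example.com" } // matched via metadata @> filter in SQL
);
console.log(results);
```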
Full source (copy‑paste friendly)
startup.mjs
import "dotenv/config";
import { createClient } from "@supabase/supabase-js";
import { OpenAIEmbeddings } from "@langchain/openai";
import { SupabaseVectorStore } from "@langchain/community/vectorstores/supabase";
const {
OPENAI_API_KEY,
SUPABASE_URL,
SUPABASE_SERVICE_ROLE_KEY,
} = process.env;
if (!OPENAI_API_KEY || !SUPABASE_URL || !SUPABASE_SERVICE_ROLE_KEY) {
throw new Error("Missing required env vars. Check your .env.");
}
export const embeddings = new OpenAIEmbeddings({
model: "text-embedding-3-small",
apiKey: OPENAI_API_KEY,
});
export const supabase = createClient(SUPABASE_URL, SUPABASE_SERVICE_ROLE_KEY);
const vectorStore = new SupabaseVectorStore(embeddings, {
client: supabase,
tableName: "documents",
queryName: "match_documents",
});
export default vectorStore;
seed.mjs
import vectorStore from "./startup.mjs";
const document1 = {
pageContent: "The powerhouse of the cell is the mitochondria.",
metadata: { page: 1, source: "https://example.com" },
};
const document2 = {
pageContent: "Buildings are commonly made out of brick.",
metadata: { page: 2, source: "https://example.com" },
};
const document3 = {
pageContent: "Mitochondria membranes are rich in lipids.",
metadata: { page: 3, source: "https://example.com" },
};
const document4 = {
pageContent: "The 2024 Olympics were hosted in Paris.",
metadata: { page: 4, source: "https://example.com" },
};
const documents = [document1, document2, document3, document4];
// Option A: auto UUID
await vectorStore.addDocuments(documents);
// Option B: custom ids (if table uses text id)
// await vectorStore.addDocuments(documents, { ids: ["1", "2", "3", "4"] });
console.log("Seed complete.");
index.mjs
import vectorStore from "./startup.mjs";
const query = "Mitochondria are made of what?";
const results = await vectorStore.similaritySearch(query, 1);
console.log(results);
Security checklist
- API keys live in .env, not source files.
- Service role key is used only on trusted servers (Node, serverless function, etc.).
- Rate limit and auth‑gate any routes that call your vector store (a minimal route sketch follows after this list).
- Consider Row Level Security (RLS) and custom filters in `match_documents` to restrict result visibility.
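For browser-facing apps, the “secure backend route” pattern from the checklist could look like the minimal Express sketch below. Express is an extra dependency (`npm i express`) not installed earlier, and the route path and missing auth are placeholders, not a production setup:

```js
// server.mjs - the browser talks to this route, never to Supabase directly
import express from "express";
import vectorStore from "./startup.mjs";

const app = express();
app.use(express.json());

app.post("/api/search", async (req, res) => {
  // TODO: add real auth and rate limiting before exposing this publicly.
  const { query } = req.body;
  if (!query) return res.status(400).json({ error: "query is required" });

  const results = await vectorStore.similaritySearch(query, 3);
  res.json(results);
});

app.listen(3000, () => console.log("Search API on http://localhost:3000"));
```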
Happy building!