LangChain + Supabase Vector Store (pgvector) - A Beginner‑Friendly Guide
This guide walks you through building a tiny semantic search demo using:
- LangChain.js - to orchestrate embeddings and vector search
- Supabase (Postgres + pgvector) - to store your vectors and query them efficiently
- OpenAI embeddings - we’ll use `text-embedding-3-small`
By the end, you’ll be able to insert documents into a Supabase table and query similar text with just a few lines of code.
Never hardcode real API keys in code. Use environment variables. Also, do not expose your Supabase Service Role Key to the browser; keep it server-side only.
What you’ll build
We’ll index 4 short facts into a `documents` table and then ask a question:
“Mitochondria are made of what?”
The code will return the most similar document (spoiler: the one explaining mitochondria).
How vector search works (in plain English)
- Embed text: Convert text into high‑dimensional vectors using an embedding model.
- Store vectors: Save those vectors in Postgres via pgvector (the `vector` data type).
- Search: Embed the user’s query and find nearest neighbors (cosine similarity / distance).
That’s it. No keywords, no exact string matching; semantic similarity does the heavy lifting.
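To make the idea concrete, here’s a toy sketch of cosine similarity in plain JavaScript. The 3-dimensional vectors are made up for illustration; real embeddings from `text-embedding-3-small` have 1536 dimensions.

```js
// Cosine similarity: 1 = same direction (very similar), ~0 = unrelated.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Pretend these are embeddings of two related sentences and one unrelated one.
const mitochondria = [0.9, 0.1, 0.2];
const lipids = [0.8, 0.2, 0.3];
const bricks = [0.1, 0.9, 0.1];

console.log(cosineSimilarity(mitochondria, lipids)); // ~0.98 (close)
console.log(cosineSimilarity(mitochondria, bricks)); // ~0.24 (far)
```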
Prerequisites
- Node.js 18+
- A Supabase project (free tier is fine)
- An OpenAI API key
- Basic terminal familiarity
1) Create a new project
mkdir lc-supabase-demo
cd lc-supabase-demo
npm init -y
Use ESM imports (since our code uses `import`). Add `"type": "module"` to your `package.json`:
{
"name": "lc-supabase-demo",
"version": "1.0.0",
"type": "module",
"scripts": {
"dev": "node index.mjs",
"seed": "node seed.mjs"
}
}
2) Install dependencies
npm i @langchain/community @langchain/openai @supabase/supabase-js dotenv
- `@langchain/community` – community integrations (the Supabase vector store lives here)
- `@langchain/openai` – OpenAI embeddings and chat models for LangChain
- `@supabase/supabase-js` – Supabase client SDK
- `dotenv` – load env variables from `.env`
3) Create your `.env` (do not commit this)
Create a file named `.env` in the project root:
touch .env
Put the following into `.env` (replace with your values):
# OpenAI
OPENAI_API_KEY=sk-your-openai-key
# Supabase
SUPABASE_URL=https://your-project-id.supabase.co
# Service role key must be kept server-side only (NEVER expose in frontend code)
SUPABASE_SERVICE_ROLE_KEY=your-service-role-key
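To make sure the file never gets committed, you can also add it to your `.gitignore`:
echo ".env" >> .gitignore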
4) Prepare your database (Supabase)
Open your Supabase dashboard -> SQL Editor. Run the SQL below to enable pgvector, create a table, index it, and add an RPC function for searching.
If you prefer UUID ids, use the first block and don’t pass custom `ids` when inserting. If you want to pass string ids like `"1"`, use the Text ID variant.
4.1 Enable pgvector
-- The extension name is "vector"
create extension if not exists vector;
4.2 Create the `documents` table (UUID id)
create table if not exists public.documents (
id uuid primary key default gen_random_uuid(),
content text,
metadata jsonb,
embedding vector(1536) -- matches OpenAI text-embedding-3-small
);
OR (Text ID, if you plan to insert ids like "1","2"):
create table if not exists public.documents (
id text primary key,
content text,
metadata jsonb,
embedding vector(1536)
);
`1536` matches the default dimensionality for `text-embedding-3-small`.
4.3 Optional index for faster search (cosine)
create index if not exists documents_embedding_idx
on public.documents
using ivfflat (embedding vector_cosine_ops)
with (lists = 100);
IVFFlat speeds up ANN (approximate nearest neighbor) search. You can tune `lists` later.
4.4 Create the `match_documents` RPC
This function returns the top‑N most similar rows using cosine distance:
create or replace function public.match_documents(
query_embedding vector(1536),
match_count int default 5,
filter jsonb default '{}'
)
returns table (
id text,
content text,
metadata jsonb,
similarity float
)
language plpgsql
as $$
begin
return query
select
d.id::text,
d.content,
d.metadata,
1 - (d.embedding <=> query_embedding) as similarity
from public.documents d
where d.metadata @> filter
order by d.embedding <=> query_embedding -- cosine distance (smaller is closer)
limit match_count;
end;
$$;
If you used `uuid` for `id`, you can still cast to `text` in the return shape as shown.
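If you want to sanity-check the RPC before wiring up LangChain, you can call it directly with supabase-js. A minimal sketch (the file name is hypothetical; it assumes the `.env` from step 3 and some seeded rows from step 7):

```js
// check-rpc.mjs - call match_documents directly, bypassing LangChain
import "dotenv/config";
import { createClient } from "@supabase/supabase-js";
import { OpenAIEmbeddings } from "@langchain/openai";

const supabase = createClient(
  process.env.SUPABASE_URL,
  process.env.SUPABASE_SERVICE_ROLE_KEY
);
const embeddings = new OpenAIEmbeddings({ model: "text-embedding-3-small" });

// Embed the query, then hand the raw vector to the RPC.
const queryEmbedding = await embeddings.embedQuery("What are mitochondria made of?");
const { data, error } = await supabase.rpc("match_documents", {
  query_embedding: queryEmbedding,
  match_count: 2,
  filter: {},
});

if (error) throw error;
console.log(data); // rows with id, content, metadata, similarity
```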
5) Project structure
lc-supabase-demo/
├─ .env
├─ package.json
├─ startup.mjs
├─ seed.mjs
└─ index.mjs
6) `startup.mjs` - initialize clients and vector store
// startup.mjs
import "dotenv/config";
import { createClient } from "@supabase/supabase-js";
import { OpenAIEmbeddings } from "@langchain/openai";
import { SupabaseVectorStore } from "@langchain/community/vectorstores/supabase";
const {
OPENAI_API_KEY,
SUPABASE_URL,
SUPABASE_SERVICE_ROLE_KEY,
} = process.env;
if (!OPENAI_API_KEY || !SUPABASE_URL || !SUPABASE_SERVICE_ROLE_KEY) {
throw new Error("Missing required env vars. Check your .env.");
}
export const embeddings = new OpenAIEmbeddings({
model: "text-embedding-3-small",
apiKey: OPENAI_API_KEY,
// Optional: dimensions: 1536 (this model's default; keep it matching vector(1536) in SQL)
});
export const supabase = createClient(SUPABASE_URL, SUPABASE_SERVICE_ROLE_KEY);
// Table and RPC function names must match what you created in SQL
const vectorStore = new SupabaseVectorStore(embeddings, {
client: supabase,
tableName: "documents",
queryName: "match_documents",
});
export default vectorStore;
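Before moving on, a quick optional sanity check that the embedding dimensionality matches the table (the file name is just a suggestion):

```js
// check-embeddings.mjs - the vector length must match vector(1536) in SQL
import { embeddings } from "./startup.mjs";

const vector = await embeddings.embedQuery("hello world");
console.log(vector.length); // should print 1536
```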
7) `seed.mjs` - add documents
// seed.mjs
import vectorStore from "./startup.mjs";
const doc1 = {
pageContent: "The powerhouse of the cell is the mitochondria.", // fixed typo
metadata: { page: 1, source: "https://example.com" },
};
const doc2 = {
pageContent: "Buildings are commonly made out of brick.",
metadata: { page: 2, source: "https://example.com" },
};
const doc3 = {
pageContent: "Mitochondria membranes are rich in lipids.",
metadata: { page: 3, source: "https://example.com" },
};
const doc4 = {
pageContent: "The 2024 Olympics were hosted in Paris.",
metadata: { page: 4, source: "https://example.com" },
};
const documents = [doc1, doc2, doc3, doc4];
// Option A) Let DB auto-generate ids (use UUID schema)
await vectorStore.addDocuments(documents);
// Option B) If you created a text id column and want custom ids:
// await vectorStore.addDocuments(documents, { ids: ["1", "2", "3", "4"] });
console.log("Seeded documents!");
8) `index.mjs` - run a similarity search
// index.mjs
import vectorStore from "./startup.mjs";
const query = "Mitochondria are made of what?";
const k = 1;
const results = await vectorStore.similaritySearch(query, k);
console.log(results);
/*
Example shape:
[
{
pageContent: 'Mitochondria membranes are rich in lipids.',
metadata: { page: 3, source: 'https://example.com' },
id: '...'
}
]
*/
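If you also want the scores, LangChain vector stores expose `similaritySearchWithScore`, which returns `[document, score]` pairs; a small variant of the script above:

```js
// index-with-scores.mjs - same query, but with similarity scores
import vectorStore from "./startup.mjs";

const scored = await vectorStore.similaritySearchWithScore(
  "Mitochondria are made of what?",
  2
);
for (const [doc, score] of scored) {
  console.log(score.toFixed(3), doc.pageContent);
}
```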
9) Run it
# 1) Seed the data (one-time or whenever you change documents)
npm run seed
# 2) Query
npm run dev
Troubleshooting & common gotchas
PGRST202 could not find function public.match_documents
You didn’t create the RPC, or it’s not in thepublic
schema, or the name mismatches yourqueryName
option.Dimension mismatch
Ensure your table column and RPC vector dimensions match your embedding model (1536 fortext-embedding-3-small
).Service role key exposure
UseSUPABASE_SERVICE_ROLE_KEY
only on the server. For browser apps, create a secure backend route that proxies requests.Inconsistent ids
If your table usesuuid
, don’t pass customids
inaddDocuments
. If you want to pass your own ids, defineid text primary key
.Performance
Create theivfflat
index (cosine ops), and consider increasinglists
andprobes
depending on data size and latency targets.
Why this works
- LangChain.js ships a SupabaseVectorStore integration in `@langchain/community`, which expects a `documents` table and an RPC (default `match_documents`) to perform similarity search.
- pgvector provides the `vector` type, distance operators, and ANN indexes for fast semantic search.
- OpenAI `text-embedding-3-small` gives high-quality, low-cost embeddings with 1536 dimensions, perfect for quick demos and many production workloads.
Next steps
- Add metadata filters (e.g., `filter: { source: "https://example.com" }`) to scope search (see the example after this list).
- Batch insert larger datasets and tweak ANN index parameters.
- Swap in other embedding providers (Cohere, Voyage, local models) if needed.
- Build a RAG pipeline: chunk PDFs/HTML -> store -> query -> rerank -> answer with citations.
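As an example of the metadata-filter idea from the list above: the filter flows into the RPC’s `where d.metadata @> filter` clause, so only documents whose metadata contains the given key/value pairs are searched. A sketch:

```js
// filtered-search.mjs - scope the search to one source
import vectorStore from "./startup.mjs";

const results = await vectorStore.similaritySearch(
  "Mitochondria are made of what?",
  1,
  { source: "https://example.com" } // matched via metadata @> filter in SQL
);
console.log(results);
```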
Full source (copy‑paste friendly)
startup.mjs
import "dotenv/config";
import { createClient } from "@supabase/supabase-js";
import { OpenAIEmbeddings } from "@langchain/openai";
import { SupabaseVectorStore } from "@langchain/community/vectorstores/supabase";
const {
OPENAI_API_KEY,
SUPABASE_URL,
SUPABASE_SERVICE_ROLE_KEY,
} = process.env;
if (!OPENAI_API_KEY || !SUPABASE_URL || !SUPABASE_SERVICE_ROLE_KEY) {
throw new Error("Missing required env vars. Check your .env.");
}
export const embeddings = new OpenAIEmbeddings({
model: "text-embedding-3-small",
apiKey: OPENAI_API_KEY,
});
export const supabase = createClient(SUPABASE_URL, SUPABASE_SERVICE_ROLE_KEY);
const vectorStore = new SupabaseVectorStore(embeddings, {
client: supabase,
tableName: "documents",
queryName: "match_documents",
});
export default vectorStore;
seed.mjs
import vectorStore from "./startup.mjs";
const document1 = {
pageContent: "The powerhouse of the cell is the mitochondria.",
metadata: { page: 1, source: "https://example.com" },
};
const document2 = {
pageContent: "Buildings are commonly made out of brick.",
metadata: { page: 2, source: "https://example.com" },
};
const document3 = {
pageContent: "Mitochondria membranes are rich in lipids.",
metadata: { page: 3, source: "https://example.com" },
};
const document4 = {
pageContent: "The 2024 Olympics were hosted in Paris.",
metadata: { page: 4, source: "https://example.com" },
};
const documents = [document1, document2, document3, document4];
// Option A: auto UUID
await vectorStore.addDocuments(documents);
// Option B: custom ids (if table uses text id)
// await vectorStore.addDocuments(documents, { ids: ["1", "2", "3", "4"] });
console.log("Seed complete.");
index.mjs
import vectorStore from "./startup.mjs";
const query = "Mitochondria are made of what?";
const results = await vectorStore.similaritySearch(query, 1);
console.log(results);
Security checklist
- API keys live in .env, not source files.
- Service role key is used only on trusted servers (Node, serverless function, etc.).
- Rate limit and auth‑gate any routes that call your vector store (a minimal route sketch follows after this list).
- Consider Row Level Security (RLS) and custom filters in `match_documents` to restrict result visibility.
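For browser-facing apps, the “secure backend route” pattern from the checklist could look like the minimal Express sketch below. Express is an extra dependency (`npm i express`) not installed earlier, and the route path and missing auth are placeholders, not a production setup:

```js
// server.mjs - the browser talks to this route, never to Supabase directly
import express from "express";
import vectorStore from "./startup.mjs";

const app = express();
app.use(express.json());

app.post("/api/search", async (req, res) => {
  // TODO: add real auth and rate limiting before exposing this publicly.
  const { query } = req.body;
  if (!query) return res.status(400).json({ error: "query is required" });

  const results = await vectorStore.similaritySearch(query, 3);
  res.json(results);
});

app.listen(3000, () => console.log("Search API on http://localhost:3000"));
```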
Happy building!