LangChain + Supabase Vector Store (pgvector) - A Beginner‑Friendly Guide

gautam kumar

This guide walks you through building a tiny semantic search demo using:

  • LangChain.js - to orchestrate embeddings and vector search
  • Supabase (Postgres + pgvector) - to store your vectors and query them efficiently
  • OpenAI embeddings - we’ll use text-embedding-3-small

By the end, you’ll be able to insert documents into a Supabase table and query similar text with just a few lines of code.

Never hardcode real API keys in source code; load them from environment variables. And never expose your Supabase Service Role Key to the browser; keep it server-side only.


What you’ll build

We’ll index 4 short facts into a documents table and then ask a question:

“Mitochondria are made of what?”

The code will return the most similar document (spoiler: the one explaining mitochondria).

How vector search works (in plain English)

  1. Embed text: Convert text into high‑dimensional vectors using an embedding model.
  2. Store vectors: Save those vectors in Postgres via pgvector (the vector data type).
  3. Search: Embed the user’s query and find nearest neighbors (cosine similarity / distance).

That’s it. No keywords, no exact string matching; semantic similarity does the heavy lifting.
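To make the three steps concrete, here is a toy sketch of cosine similarity with made-up 3-dimensional vectors (real embeddings from text-embedding-3-small have 1536 dimensions):

```javascript
// Toy illustration only: embeddings turn text into arrays of numbers, and
// "similar meaning" becomes "high cosine similarity between arrays".
// These 3-dimensional vectors are invented for the example.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

const mitochondria = [0.9, 0.1, 0.0]; // pretend embedding of a biology fact
const bricks       = [0.0, 0.2, 0.9]; // pretend embedding of an unrelated fact
const query        = [0.8, 0.2, 0.1]; // pretend embedding of a biology question

console.log(cosineSimilarity(query, mitochondria) > cosineSimilarity(query, bricks)); // true
```

pgvector's <=> operator returns cosine distance rather than similarity, which is why the RPC we create later converts it with 1 - (embedding <=> query_embedding).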


Prerequisites

  • Node.js 18+
  • A Supabase project (free tier is fine)
  • An OpenAI API key
  • Basic terminal familiarity

1) Create a new project

mkdir lc-supabase-demo
cd lc-supabase-demo
npm init -y

Our code uses ESM import syntax, so add "type": "module" to your package.json:

{
  "name": "lc-supabase-demo",
  "version": "1.0.0",
  "type": "module",
  "scripts": {
    "dev": "node index.mjs",
    "seed": "node seed.mjs"
  }
}

2) Install dependencies

npm i @langchain/community @langchain/openai @supabase/supabase-js dotenv
  • @langchain/community – community integrations (Supabase vector store lives here)
  • @langchain/openai – OpenAI embeddings and chat models for LangChain
  • @supabase/supabase-js – Supabase client SDK
  • dotenv – load env variables from .env

3) Create your .env (do not commit this)

Create a file named .env in the project root:

touch .env

Put the following into .env (replace with your values):

# OpenAI
OPENAI_API_KEY=sk-your-openai-key

# Supabase
SUPABASE_URL=https://your-project-id.supabase.co
# Service role key must be kept server-side only (NEVER expose in frontend code)
SUPABASE_SERVICE_ROLE_KEY=your-service-role-key

4) Prepare your database (Supabase)

Open your Supabase dashboard -> SQL Editor. Run the SQL below to enable pgvector, create a table, index it, and add an RPC function for searching.

If you prefer UUID ids, use the first block and don’t pass custom ids when inserting. If you want to pass string ids like "1", use the Text ID variant.

4.1 Enable pgvector

-- The extension name is "vector"
create extension if not exists vector;

4.2 Create the documents table (UUID id)

create table if not exists public.documents (
  id uuid primary key default gen_random_uuid(),
  content text,
  metadata jsonb,
  embedding vector(1536)  -- matches OpenAI text-embedding-3-small
);

OR (Text ID, if you plan to insert ids like "1","2"):

create table if not exists public.documents (
  id text primary key,
  content text,
  metadata jsonb,
  embedding vector(1536)
);

1536 matches the default dimensionality for text-embedding-3-small.

4.3 Optional index for faster search (cosine)

create index if not exists documents_embedding_idx
on public.documents
using ivfflat (embedding vector_cosine_ops)
with (lists = 100);

IVFFlat speeds up ANN (approximate nearest neighbor) search. You can tune lists later.

4.4 Create the match_documents RPC

This function returns the top‑N most similar rows using cosine distance:

create or replace function public.match_documents(
  query_embedding vector(1536),
  match_count int default 5,
  filter jsonb default '{}'
)
returns table (
  id text,
  content text,
  metadata jsonb,
  similarity float
)
language plpgsql
as $$
begin
  return query
  select
    d.id::text,
    d.content,
    d.metadata,
    1 - (d.embedding <=> query_embedding) as similarity
  from public.documents d
  where d.metadata @> filter
  order by d.embedding <=> query_embedding  -- cosine distance (smaller is closer)
  limit match_count;
end;
$$;

If you used uuid for id, you can still cast to text in the return shape as shown.
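As a mental model for the filter parameter, here is a toy JavaScript simulation of the jsonb @> containment check in the where clause above. This is a shallow version: real jsonb containment is recursive over nested objects and arrays.

```javascript
// Toy simulation of Postgres jsonb "@>" containment as used in the RPC's
// `where d.metadata @> filter` clause: a row matches when every key/value
// pair in the filter also appears in the row's metadata.
// The empty filter '{}' therefore matches every row.
function jsonbContains(metadata, filter) {
  return Object.entries(filter).every(([key, value]) => metadata[key] === value);
}

const rows = [
  { content: "doc A", metadata: { page: 1, source: "https://example.com" } },
  { content: "doc B", metadata: { page: 2, source: "https://other.com" } },
];

console.log(rows.filter((r) => jsonbContains(r.metadata, {})).length); // 2
console.log(rows.filter((r) => jsonbContains(r.metadata, { page: 2 }))[0].content); // "doc B"
```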


5) Project structure

lc-supabase-demo/
├─ .env
├─ package.json
├─ startup.mjs
├─ seed.mjs
└─ index.mjs

6) startup.mjs - initialize clients and vector store

// startup.mjs
import "dotenv/config";
import { createClient } from "@supabase/supabase-js";
import { OpenAIEmbeddings } from "@langchain/openai";
import { SupabaseVectorStore } from "@langchain/community/vectorstores/supabase";

const {
  OPENAI_API_KEY,
  SUPABASE_URL,
  SUPABASE_SERVICE_ROLE_KEY,
} = process.env;

if (!OPENAI_API_KEY || !SUPABASE_URL || !SUPABASE_SERVICE_ROLE_KEY) {
  throw new Error("Missing required env vars. Check your .env.");
}

export const embeddings = new OpenAIEmbeddings({
  model: "text-embedding-3-small",
  apiKey: OPENAI_API_KEY,
  // Optional: dimensions: 1536, // stick to default 1536
});

export const supabase = createClient(SUPABASE_URL, SUPABASE_SERVICE_ROLE_KEY);

// Table and RPC function names must match what you created in SQL
const vectorStore = new SupabaseVectorStore(embeddings, {
  client: supabase,
  tableName: "documents",
  queryName: "match_documents",
});

export default vectorStore;

7) seed.mjs - add documents

// seed.mjs
import vectorStore from "./startup.mjs";

const doc1 = {
  pageContent: "The powerhouse of the cell is the mitochondria.",
  metadata: { page: 1, source: "https://example.com" },
};
const doc2 = {
  pageContent: "Buildings are commonly made out of brick.",
  metadata: { page: 2, source: "https://example.com" },
};
const doc3 = {
  pageContent: "Mitochondria membranes are rich in lipids.",
  metadata: { page: 3, source: "https://example.com" },
};
const doc4 = {
  pageContent: "The 2024 Olympics were hosted in Paris.",
  metadata: { page: 4, source: "https://example.com" },
};

const documents = [doc1, doc2, doc3, doc4];

// Option A) Let DB auto-generate ids (use UUID schema)
await vectorStore.addDocuments(documents);

// Option B) If you created a text id column and want custom ids:
// await vectorStore.addDocuments(documents, { ids: ["1", "2", "3", "4"] });

console.log("Seeded documents!");

8) index.mjs - run a similarity search

// index.mjs
import vectorStore from "./startup.mjs";

const query = "Mitochondria are made of what?";
const k = 1;

const results = await vectorStore.similaritySearch(query, k);
console.log(results);

/*
Example shape:
[
  {
    pageContent: 'Mitochondria membranes are rich in lipids.',
    metadata: { page: 3, source: 'https://example.com' },
    id: '...'
  }
]
*/

9) Run it

# 1) Seed the data (one-time or whenever you change documents)
npm run seed

# 2) Query
npm run dev

Troubleshooting & common gotchas

  • PGRST202 could not find function public.match_documents

    You didn’t create the RPC, it isn’t in the public schema, or its name doesn’t match your queryName option.

  • Dimension mismatch

    Ensure your table column and RPC vector dimensions match your embedding model (1536 for text-embedding-3-small).

  • Service role key exposure

    Use SUPABASE_SERVICE_ROLE_KEY only on the server. For browser apps, create a secure backend route that proxies requests.

  • Inconsistent ids

    If your table uses uuid, don’t pass custom ids in addDocuments. If you want to pass your own ids, define id text primary key.

  • Performance

    Create the ivfflat index (cosine ops), and consider increasing lists and probes depending on data size and latency targets.
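One cheap guard against the dimension-mismatch gotcha is a hypothetical pre-insert check (the names here are illustrative, not part of LangChain) that validates embedding lengths in application code before rows ever reach Postgres:

```javascript
// Hypothetical pre-insert guard: fail fast with a clear message if any
// embedding's length does not match the vector(1536) column, instead of
// surfacing a cryptic error from Postgres at insert time.
const EXPECTED_DIMS = 1536; // text-embedding-3-small default

function assertDimensions(vectors, expected = EXPECTED_DIMS) {
  vectors.forEach((vec, i) => {
    if (vec.length !== expected) {
      throw new Error(`Vector ${i} has ${vec.length} dims, expected ${expected}`);
    }
  });
  return vectors;
}

// Example: a correctly sized vector passes through unchanged.
assertDimensions([new Array(1536).fill(0)]);
```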


Why this works

  • LangChain.js ships a SupabaseVectorStore integration in @langchain/community, which expects a documents table and an RPC (default match_documents) to perform similarity search.
  • pgvector provides the vector type, distance operators, and ANN indexes for fast semantic search.
  • OpenAI text-embedding-3-small gives high‑quality, low‑cost embeddings with 1536 dimensions, perfect for quick demos and many production workloads.

Next steps

  • Add metadata filters (e.g., filter: { source: "https://example.com" }) to scope search.
  • Batch insert larger datasets and tweak ANN index parameters.
  • Swap in other embedding providers (Cohere, Voyage, local models) if needed.
  • Build a RAG pipeline: chunk PDFs/HTML -> store -> query -> rerank -> answer with citations.
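As a starting point for the chunking step, here is a naive illustrative splitter: fixed-size character windows with overlap, so sentences cut at a boundary still appear whole in a neighboring chunk. In practice you would likely reach for LangChain's RecursiveCharacterTextSplitter instead.

```javascript
// Naive chunker for illustration: slide a fixed-size window over the text,
// stepping by (chunkSize - overlap) so consecutive chunks share some context.
// The sizes are arbitrary; tune them for your documents.
function chunkText(text, chunkSize = 200, overlap = 50) {
  const chunks = [];
  for (let start = 0; start < text.length; start += chunkSize - overlap) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // last window reached the end
  }
  return chunks;
}

console.log(chunkText("x".repeat(500)).length); // 3 chunks of <= 200 chars
```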

Full source (copy‑paste friendly)

startup.mjs

import "dotenv/config";
import { createClient } from "@supabase/supabase-js";
import { OpenAIEmbeddings } from "@langchain/openai";
import { SupabaseVectorStore } from "@langchain/community/vectorstores/supabase";

const {
  OPENAI_API_KEY,
  SUPABASE_URL,
  SUPABASE_SERVICE_ROLE_KEY,
} = process.env;

if (!OPENAI_API_KEY || !SUPABASE_URL || !SUPABASE_SERVICE_ROLE_KEY) {
  throw new Error("Missing required env vars. Check your .env.");
}

export const embeddings = new OpenAIEmbeddings({
  model: "text-embedding-3-small",
  apiKey: OPENAI_API_KEY,
});

export const supabase = createClient(SUPABASE_URL, SUPABASE_SERVICE_ROLE_KEY);

const vectorStore = new SupabaseVectorStore(embeddings, {
  client: supabase,
  tableName: "documents",
  queryName: "match_documents",
});

export default vectorStore;

seed.mjs

import vectorStore from "./startup.mjs";

const document1 = {
  pageContent: "The powerhouse of the cell is the mitochondria.",
  metadata: { page: 1, source: "https://example.com" },
};

const document2 = {
  pageContent: "Buildings are commonly made out of brick.",
  metadata: { page: 2, source: "https://example.com" },
};

const document3 = {
  pageContent: "Mitochondria membranes are rich in lipids.",
  metadata: { page: 3, source: "https://example.com" },
};

const document4 = {
  pageContent: "The 2024 Olympics were hosted in Paris.",
  metadata: { page: 4, source: "https://example.com" },
};

const documents = [document1, document2, document3, document4];

// Option A: auto UUID
await vectorStore.addDocuments(documents);

// Option B: custom ids (if table uses text id)
// await vectorStore.addDocuments(documents, { ids: ["1", "2", "3", "4"] });

console.log("Seed complete.");

index.mjs

import vectorStore from "./startup.mjs";

const query = "Mitochondria are made of what?";
const results = await vectorStore.similaritySearch(query, 1);
console.log(results);

Security checklist

  • API keys live in .env, not source files.
  • Service role key is used only on trusted servers (Node, serverless function, etc.).
  • Rate limit and auth‑gate any routes that call your vector store.
  • Consider Row Level Security (RLS) and custom filters in match_documents to restrict result visibility.

Happy building!
