Four pgvector patterns that kept our RAG SaaS on one Postgres

#postgres #rag #ai #typescript

Most RAG tutorials stop at embedding <=> query. They show you the operator, return five rows, and call it retrieval. Then you ship it, a second customer signs up, and you discover the four things the tutorial skipped: indexing on a column that's half-NULL, the distance-vs-similarity sign flip, the dimension lock-in, and the function that quietly bypasses your tenant isolation.

I run a Discord-native Company Brain. Teams /save docs, links, and PDFs; /ask returns a grounded, cited answer. The whole vector store is one Supabase Postgres with pgvector — no Pinecone, no second system to bill and reconcile. Here are four patterns that made that survive contact with real workspaces.

## The problem: a vector column is not a vector store

A vector(1536) column gives you storage and a distance operator. It does not give you fast search, correct ranking, dimension discipline, or multi-tenant safety. Those are four separate decisions, and getting any one wrong shows up as a production bug, not a compile error.

Our artifacts table holds every chunk a workspace has ingested. The relevant columns:


sql
  CREATE TABLE artifacts (
    id           uuid    PRIMARY KEY DEFAULT gen_random_uuid(),
    workspace_id uuid    NOT NULL REFERENCES workspaces(id) ON DELETE CASCADE,
    content      text    NOT NULL,
    content_hash text    NOT NULL,        -- sha256, short-circuits re-embedding
    metadata     jsonb   NOT NULL DEFAULT '{}'::jsonb,
    -- 1536 dims = OpenAI text-embedding-3-small.
    embedding    vector(1536),            -- NULLABLE on purpose. See pattern 1.
    created_at   timestamptz NOT NULL DEFAULT now(),
    UNIQUE (workspace_id, source_type, external_id)
  );

  Note that embedding is nullable. Artifacts arrive un-embedded — the web service writes the row instantly, a worker embeds it async on a */15 cron. That single nullable column drives the first pattern.

  Pattern 1: Index only the rows that have a vector

  The naive HNSW index covers the whole column. But half our rows are NULL at any given moment during backfill, and building HNSW graph edges for NULL rows is wasted work and wasted index size.

  The fix is a partial index with a WHERE predicate:

  -- Partial HNSW index: only index rows that actually have an embedding.
  -- Keeps the index small during async backfill and skips HNSW build cost
  -- on NULL rows entirely.
  CREATE INDEX artifacts_embedding_hnsw_idx
    ON artifacts
    USING hnsw (embedding vector_cosine_ops)
    WHERE embedding IS NOT NULL;

  Two choices worth defending:

  - HNSW over IVFFlat. IVFFlat needs training data to build its lists — you have to populate the table first, then build the index, and rebuild as the distribution shifts. HNSW builds incrementally as rows arrive. For a product where every workspace starts at zero artifacts and grows continuously, "no training step, no rebuild" wins. We left m and ef_construction at pgvector defaults and wrote a note to tune them once we have real latency data — premature index tuning is just a guess with extra steps.
  - vector_cosine_ops, not the default. The operator class in the index must match the distance operator your query uses. Index on vector_cosine_ops, query with <=> (cosine distance). Mismatch them and Postgres silently does a sequential scan — correct answers, terrible latency, no error to tell you why.

  Pattern 2: The sign flip — distance is not similarity
  pgvector's <=> returns cosine distance: 0 is identical, 2 is opposite. Humans, dashboards, and threshold configs think in similarity: 1 is identical, 0 is unrelated. The conversion is similarity = 1 - distance, and you have to apply it consistently in three places or your ranking inverts.

  Here's the actual retrieval RPC. Watch where <=> appears raw (ordering) versus converted (filtering and output):

  CREATE OR REPLACE FUNCTION match_artifacts(
    p_workspace_id  uuid,
    query_embedding vector(1536),
    match_count     int   DEFAULT 5,
    min_similarity  float DEFAULT 0.15
  )
  RETURNS TABLE (id uuid, content text, similarity float)
  LANGUAGE sql
  SECURITY INVOKER                      -- critical. See pattern 4.
  AS $$
    SELECT
      a.id,
      a.content,
      1 - (a.embedding <=> query_embedding) AS similarity   -- distance -> similarity
    FROM artifacts a
    WHERE a.workspace_id = p_workspace_id
      AND a.embedding IS NOT NULL
      AND 1 - (a.embedding <=> query_embedding) >= min_similarity  -- filter in similarity space
    ORDER BY a.embedding <=> query_embedding                       -- order in DISTANCE space (ASC)
    LIMIT match_count;
  $$;

  The ORDER BY stays in distance space and sorts ascending — smallest distance first — because that's the direction the HNSW index understands. Flip it to ORDER BY similarity DESC and you get the same logical result but you've handed the planner an expression it can't satisfy from the index, so it sorts in memory after a scan. Order by the raw operator; convert only for the human-facing columns.

  Our retrieval defaults — match_count = 5, min_similarity = 0.15 — came out of tuning against our own corpus, not a paper. Higher k bloats the model's context window without lifting answer quality; a lower threshold lets junk through and the model starts hedging. They're defaults, not laws: the RPC takes both as parameters so we can override per workspace.

  Pattern 3: Dimensions are a one-way door — plan the migration before you need it

  vector(1536) is a hard constraint. The number 1536 is OpenAI's text-embedding-3-small. If you decide to swap models, a different dimension count means the column type no longer fits and every existing embedding is now garbage against the new query vectors.

  We evaluated text-embedding-3-large (3072-dim) in week two. The numbers:

  ┌──────────────────────────┬─────────────────┬───────────────┐
  │           Knob           │ -small (chosen) │    -large     │
  ├──────────────────────────┼─────────────────┼───────────────┤
  │ Dimensions               │ 1536            │ 3072          │
  ├──────────────────────────┼─────────────────┼───────────────┤
  │ Top-5 recall (our eval)  │ baseline        │ ~3 pts higher │
  ├──────────────────────────┼─────────────────┼───────────────┤
  │ Cost per token           │ 1×              │ 6×            │
  ├──────────────────────────┼─────────────────┼───────────────┤
  │ pgvector storage per row │ 1×              │ 2×            │
  └──────────────────────────┴─────────────────┴───────────────┘

  Three points of recall for six times the cost and double the storage did not clear the bar at our scale. Tuning min_similarity lifted precision more cheaply than the extra dimensions did. But the real lesson is the migration rule we wrote down so a future me doesn't fight the column type at 2am:

  ▎ When we change embedding models, the new vector goes in a new column (embedding_v2 vector(3072)), backfilled and dual-read behind a flag — never an in-place ALTER of the 
  ▎ existing column.

  Adding a column lets old and new embeddings coexist while you backfill millions of rows and verify recall didn't regress. Altering the column in place takes a write lock on the
  whole table and gives you no rollback. Pick the boring migration.

  Pattern 4: The function that bypasses your tenancy — SECURITY INVOKER, always

  This one nearly made me quit for the day. Our entire multi-tenant model is Row Level Security keyed on workspace_id: a policy on artifacts means a query physically cannot return
  another tenant's rows. Airtight — except for a function declared SECURITY DEFINER, which runs with the definer's privileges and skips RLS entirely.

  A vector-search RPC is exactly the kind of function people reflexively mark SECURITY DEFINER (it's calling into internals, feels like it should be privileged). Do that, and match_artifacts happily returns chunks across workspace boundaries even though RLS is enabled on the table. The leak doesn't throw — it just quietly serves the wrong tenant's data.

  Two defenses, both in the RPC above:

  1. SECURITY INVOKER — the function runs as the caller, so RLS policies apply inside it exactly as they would on a direct query.
  2. An explicit WHERE a.workspace_id = p_workspace_id predicate — belt and suspenders. RLS is the wall; the predicate is the lock. If a future migration ever fumbles a policy, the predicate still scopes the result.

  And because the only caller is the worker (holding a service-role key on a trusted server), we revoke the function from public roles entirely:

  -- Only the service-role worker needs this. Anon/authenticated never call it.
  REVOKE EXECUTE ON FUNCTION match_artifacts FROM anon, authenticated;

  The TypeScript side stays boring, which is the point — all the safety lives in the database:

  const { data: matches } = await supabase.rpc("match_artifacts", {
    p_workspace_id: workspaceId,     // scoped by the caller, enforced by RLS + predicate
    query_embedding: queryVector,    // 1536-dim, same model as ingest
    match_count: 5,
    min_similarity: 0.15,
  });

  Write the cross-tenant leak test before the retrieval feature, not after. I wrote it after, which is how I learned the difference between DEFINER and INVOKER the expensive way.

  Takeaways

  - Partial-index your vector column when embeddings arrive async — don't pay HNSW cost on NULL rows.
  - HNSW when rows stream in continuously (no training step); match the operator class to your distance operator or you'll silently seq-scan.
  - Convert distance to similarity only for filtering and output — keep ORDER BY in raw distance space so the index does the sorting.
  - Dimensions are immutable: new model means new column, dual-read, backfill — never in-place ALTER.
  - SECURITY INVOKER plus an explicit tenant predicate. A DEFINER vector RPC is a cross-tenant leak with a clean stack trace.

  Keeping embeddings inside the same Postgres that enforces RLS is what makes one operator (me) able to run multi-tenant RAG without a second system to secure. That's the bet behind Acortia (https://acortia.com) — the brain lives where the tenancy is already enforced.

DEV Community

Four pgvector patterns that kept our RAG SaaS on one Postgres

Top comments (0)