I Built 'Chat With Your Docs' From Scratch — Supabase + pgvector + a Free Local Embedder

#ai #rag #supabase #beginners

"Chat with your PDF / your notes / your docs" is everywhere. Today we build it from scratch and you'll see it's just three moves: retrieve, then generate — with one prompt trick that stops the hallucinations.

This is Day 46 of TechFromZero. Yesterday (Day 45) we built the retrieval half with pgvector. Today we add the answer half and host it on Supabase.

RAG in one line

Find the relevant chunks of your documents, paste them into the prompt, and tell the model to answer using only those.

That's Retrieval-Augmented Generation. The "augmented" part is just stuffing real context into the prompt so the model isn't guessing from memory.

1. Storage: Supabase is Postgres, so pgvector is one click

Supabase is hosted Postgres with an auto-generated API. Because it's just Postgres, vector search needs no separate database:

create extension if not exists vector;

create table documents (
  id        bigserial primary key,
  content   text,
  embedding vector(384)
);

-- one RPC the app calls to get the closest chunks
create function match_documents(query_embedding vector(384), match_count int)
returns table (id bigint, content text, similarity float)
language sql stable as $$
  select documents.id, documents.content,
         1 - (documents.embedding <=> query_embedding) as similarity
  from documents
  order by documents.embedding <=> query_embedding
  limit match_count;
$$;
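
To make the `<=>` operator less magical: in pgvector it returns cosine distance, and the function above turns that into a similarity by subtracting it from 1. Here is a minimal JavaScript sketch of the same arithmetic. `cosineSimilarity` and `cosineDistance` are hypothetical helpers for illustration only; the app doesn't need them, because Postgres does this work.

```javascript
// Hypothetical helper: mirrors what `1 - (embedding <=> query_embedding)` computes.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new Error("vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// pgvector's <=> is cosine distance, i.e. 1 - cosine similarity.
function cosineDistance(a, b) {
  return 1 - cosineSimilarity(a, b);
}
```

Vectors pointing the same way score a similarity of 1 (distance 0) and unrelated ones score near 0, which is why ordering by distance ascending returns the most similar chunks first.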

2. Ingest: chunk → embed → store

Split your docs into paragraph-sized chunks, embed each with a free local model (all-MiniLM-L6-v2 via Transformers.js, which needs no API key and runs the embedding step on your own machine), and insert each chunk together with its vector:

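const embedding = await embed(chunk);     // array of 384 numbers, matching vector(384)
const { error } = await supabase.from("documents").insert({ content: chunk, embedding });
if (error) throw error;                   // supabase-js returns errors instead of throwing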
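
The `embed` helper used above isn't shown in this post. One way it might look with Transformers.js is sketched below. It assumes the `@huggingface/transformers` package and the `Xenova/all-MiniLM-L6-v2` model ID, both of which are guesses about the setup. The optional `extractor` parameter is also an invention of this sketch, there only so the function can be exercised without downloading the model.

```javascript
// Assumption: the @huggingface/transformers package is installed.
// The pipeline is created lazily, once, and reused across calls.
let extractorPromise = null;

function getExtractor() {
  if (!extractorPromise) {
    extractorPromise = import("@huggingface/transformers").then(({ pipeline }) =>
      pipeline("feature-extraction", "Xenova/all-MiniLM-L6-v2")
    );
  }
  return extractorPromise;
}

// Returns a plain array of numbers (384 for this model), ready to insert into a vector column.
// `extractor` is a hypothetical override used for testing; omit it in real use.
async function embed(text, extractor) {
  const model = extractor ?? (await getExtractor());
  // Mean-pool the token vectors into one sentence vector and L2-normalize it.
  const output = await model(text, { pooling: "mean", normalize: true });
  return Array.from(output.data);
}
```

The first real call downloads and caches the model, so expect it to be slower than later ones.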

Chunk size matters: too big buries the answer in noise, too small loses meaning. A few hundred characters is a good start.
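
For the chunking itself, a rough sketch under simple assumptions: split on blank lines, then pack paragraphs together until a character limit is reached. `chunkText` and its `maxChars` default of 500 are hypothetical, and the hard split for oversized paragraphs is deliberately crude (it can cut mid-word), so treat this as a starting point rather than a tuned splitter.

```javascript
// Hypothetical chunker: split on blank lines, then pack paragraphs
// into chunks of at most `maxChars` characters.
function chunkText(text, maxChars = 500) {
  const paragraphs = text
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter(Boolean);

  const chunks = [];
  let current = "";

  for (const para of paragraphs) {
    // Oversized paragraphs are hard-split so no chunk exceeds the limit.
    const pieces = [];
    for (let i = 0; i < para.length; i += maxChars) {
      pieces.push(para.slice(i, i + maxChars));
    }

    for (const piece of pieces) {
      if (current && current.length + 1 + piece.length > maxChars) {
        chunks.push(current); // adding this piece would overflow: start a new chunk
        current = piece;
      } else {
        current = current ? current + "\n" + piece : piece;
      }
    }
  }

  if (current) chunks.push(current);
  return chunks;
}
```

Each returned string can then go through the embed-and-insert step shown above.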

3. Retrieve + Generate (the payoff)

Embed the question with the same model, ask Supabase for the closest chunks, then hand them to the LLM:

const query_embedding = await embed(question);
const { data: chunks, error } = await supabase.rpc("match_documents", { query_embedding, match_count: 4 });
if (error) throw error;   // on failure `chunks` is null and the prompt below would crash

const prompt = `Answer using ONLY the context below.
If the answer isn't there, say "I don't know based on the documents."

Context:
${chunks.map(c => "- " + c.content).join("\n")}

Question: ${question}`;

const answer = (await gemini.generateContent(prompt)).response.text();

The line that kills hallucinations

"Answer using ONLY the context. If it isn't there, say I don't know."

Without it, the model tends to blend its own (possibly wrong) memory back in. With it, the model behaves more like a librarian quoting your documents than a stranger guessing. The instruction is not a guarantee, so return the retrieved chunks alongside the answer so users can verify.
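
Returning the chunks with the answer can be sketched as one small wrapper. Everything named here is hypothetical: `answerWithSources`, `buildPrompt`, and the `deps` object, whose `embed`, `matchDocuments`, and `generate` functions stand in for the embedder, the Supabase RPC call, and the LLM call from the earlier steps. Skipping the LLM when nothing is retrieved is a choice made for this sketch, not something the post prescribes.

```javascript
const FALLBACK = "I don't know based on the documents.";

// Builds the same grounded prompt as in step 3.
function buildPrompt(question, chunks) {
  return [
    "Answer using ONLY the context below.",
    `If the answer isn't there, say "${FALLBACK}"`,
    "",
    "Context:",
    ...chunks.map((c) => "- " + c.content),
    "",
    `Question: ${question}`,
  ].join("\n");
}

// deps.embed(text) -> number[]
// deps.matchDocuments(embedding, count) -> [{ id, content, similarity }]
// deps.generate(prompt) -> string
async function answerWithSources(question, deps, matchCount = 4) {
  const queryEmbedding = await deps.embed(question);
  const chunks = await deps.matchDocuments(queryEmbedding, matchCount);

  // Nothing retrieved: skip the LLM call rather than let it answer from memory.
  if (!chunks || chunks.length === 0) {
    return { answer: FALLBACK, sources: [] };
  }

  const answer = await deps.generate(buildPrompt(question, chunks));
  const sources = chunks.map(({ id, content, similarity }) => ({ id, content, similarity }));
  return { answer, sources };
}
```

In the app, `matchDocuments` could wrap the `supabase.rpc("match_documents", ...)` call and return its `data`, and `generate` could wrap the Gemini call. The UI can then show `sources` under the answer.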