Supabase AI and Vector Advanced — pgvector, Embeddings, RAG, and Semantic Caching
Supabase ships with pgvector support out of the box, enabling vector search, RAG pipelines, and semantic caching directly on PostgreSQL. This article covers production-ready patterns for building AI-powered search features with Supabase and Flutter.
pgvector Index Types: ivfflat vs hnsw
pgvector supports two index algorithms, each with different trade-offs:
| Criteria | ivfflat | hnsw |
|---|---|---|
| Build Speed | Fast | Slow (noticeable at scale) |
| Search Accuracy | Medium (tunable via lists / probes) | High |
| Memory Usage | Low | High |
| Best For | Write-heavy, mid-scale | Read-heavy, high-accuracy |
-- hnsw index for cosine similarity (recommended for most cases)
-- m = graph connectivity, ef_construction = build-time candidate list size;
-- query-time accuracy is tuned separately via SET hnsw.ef_search
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

-- ivfflat index for inner product
-- lists = number of clusters; query-time accuracy is tuned via SET ivfflat.probes
CREATE INDEX ON documents
USING ivfflat (embedding vector_ip_ops)
WITH (lists = 100);
For indie projects under one million rows, hnsw is recommended. The slower build time is a one-time cost, and query accuracy is consistently better.
Generating Embeddings and Storing in Supabase
Use OpenAI's text-embedding-3-small or Google's embedding-001 to vectorize text, then store the results in Supabase.
// Supabase Edge Function: embed-and-store.ts
import { createClient } from 'https://esm.sh/@supabase/supabase-js@2'
import OpenAI from 'https://esm.sh/openai@4'
const openai = new OpenAI({ apiKey: Deno.env.get('OPENAI_API_KEY')! })
const supabase = createClient(
Deno.env.get('SUPABASE_URL')!,
Deno.env.get('SUPABASE_SERVICE_ROLE_KEY')!
)
export async function embedAndStore(text: string, metadata: Record<string, unknown>) {
const embeddingResponse = await openai.embeddings.create({
model: 'text-embedding-3-small',
input: text,
})
const embedding = embeddingResponse.data[0].embedding
  const { data, error } = await supabase
    .from('documents')
    .insert({ content: text, embedding, metadata })
    .select() // without .select(), insert resolves with data: null
  if (error) throw error
  return data
}
-- Table schema (the pgvector extension must be enabled first)
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE documents (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
content TEXT NOT NULL,
embedding VECTOR(1536), -- 1536 dimensions for text-embedding-3-small
metadata JSONB DEFAULT '{}'::JSONB,
created_at TIMESTAMPTZ DEFAULT NOW()
);
Building a RAG Pipeline
RAG (Retrieval-Augmented Generation) follows four steps: document chunking → embedding → similarity search → LLM answer generation.
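The chunking step isn't shown in the Edge Functions below; it runs at ingestion time, before embedAndStore is called for each chunk. A minimal sketch, assuming simple fixed-size character chunks with overlap (the helper name, sizes, and metadata value are illustrative):
// chunk.ts: naive fixed-size chunking with overlap
export function chunkText(text: string, chunkSize = 1000, overlap = 200): string[] {
  const chunks: string[] = []
  let start = 0
  while (start < text.length) {
    chunks.push(text.slice(start, start + chunkSize))
    start += chunkSize - overlap // step forward, keeping some overlap for context
  }
  return chunks
}
// Ingestion loop: embed and store each chunk separately
// for (const chunk of chunkText(rawDocument)) {
//   await embedAndStore(chunk, { source: 'docs' })
// }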
// Edge Function: rag-query.ts
export async function ragQuery(userQuery: string): Promise<string> {
// Step 1: Embed the query
const queryEmbedding = await openai.embeddings.create({
model: 'text-embedding-3-small',
input: userQuery,
})
const queryVector = queryEmbedding.data[0].embedding
// Step 2: Find similar documents (cosine similarity > 0.8, top 5)
  const { data: docs, error } = await supabase.rpc('match_documents', {
    query_embedding: queryVector,
    match_threshold: 0.8,
    match_count: 5,
  })
  if (error) throw error
  // Step 3: Build context and call LLM
  const context = (docs ?? []).map((d: { content: string }) => d.content).join('\n\n')
const completion = await openai.chat.completions.create({
model: 'gpt-4o-mini',
messages: [
{ role: 'system', content: `Answer using the following documents:\n${context}` },
{ role: 'user', content: userQuery },
],
})
return completion.choices[0].message.content ?? ''
}
-- Similarity search function
CREATE OR REPLACE FUNCTION match_documents(
query_embedding VECTOR(1536),
match_threshold FLOAT DEFAULT 0.8,
match_count INT DEFAULT 5
)
RETURNS TABLE(id UUID, content TEXT, similarity FLOAT)
LANGUAGE SQL STABLE AS $$
  SELECT documents.id, documents.content,
         1 - (documents.embedding <=> query_embedding) AS similarity
  FROM documents
  WHERE 1 - (documents.embedding <=> query_embedding) > match_threshold
  -- order by the raw distance operator so the vector index can be used
  ORDER BY documents.embedding <=> query_embedding
  LIMIT match_count;
$$;
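The Flutter client shown later calls this function over HTTP via functions.invoke and reads an answer field from the response, so the Edge Function needs an entrypoint that wraps ragQuery and returns JSON. A minimal sketch (the error shape and headers are illustrative):
// index.ts entrypoint for the rag-query Edge Function
Deno.serve(async (req) => {
  try {
    const { query } = await req.json()
    const answer = await ragQuery(query)
    return new Response(JSON.stringify({ answer }), {
      headers: { 'Content-Type': 'application/json' },
    })
  } catch (err) {
    // return a 500 so the client can surface the failure
    return new Response(JSON.stringify({ error: String(err) }), {
      status: 500,
      headers: { 'Content-Type': 'application/json' },
    })
  }
})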
Implementing Semantic Caching
Semantic caching checks if an incoming query's embedding is similar enough to a cached query, avoiding redundant LLM API calls and cutting costs significantly.
// semantic-cache.ts
export async function semanticCachedQuery(
userQuery: string,
similarityThreshold = 0.95
): Promise<string> {
const queryVector = await getEmbedding(userQuery)
  // Look for a cached answer above the similarity threshold.
  // Assumes a query_cache table (query, embedding, answer columns) and a
  // match_query_cache function defined the same way as match_documents.
const { data: cached } = await supabase.rpc('match_query_cache', {
query_embedding: queryVector,
match_threshold: similarityThreshold,
match_count: 1,
})
if (cached && cached.length > 0) {
console.log('Cache hit! similarity:', cached[0].similarity)
return cached[0].answer
}
// Cache miss: run RAG and store the result
const answer = await ragQuery(userQuery)
await supabase.from('query_cache').insert({
query: userQuery,
embedding: queryVector,
answer,
})
return answer
}
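The getEmbedding helper above isn't defined in the snippet; a minimal version, assuming the same OpenAI client and embedding model as the earlier functions:
// getEmbedding: thin wrapper around the embeddings endpoint
async function getEmbedding(text: string): Promise<number[]> {
  const res = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: text,
  })
  return res.data[0].embedding
}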
A threshold of 0.95 works well in production. Too high and the cache hit rate suffers; too low and semantically different questions receive the same answer.
Calling Vector Search from Flutter
Invoke the Edge Function from Flutter using supabase.functions.invoke():
// vector_search_service.dart
import 'package:riverpod_annotation/riverpod_annotation.dart';
import 'package:supabase_flutter/supabase_flutter.dart';

part 'vector_search_service.g.dart'; // generated by riverpod_generator

class VectorSearchService {
final SupabaseClient _supabase;
VectorSearchService(this._supabase);
Future<String> ask(String query) async {
final response = await _supabase.functions.invoke(
'rag-query',
body: {'query': query},
);
if (response.status != 200) {
throw Exception('RAG query failed: ${response.data}');
}
return response.data['answer'] as String;
}
}
// Riverpod provider
@riverpod
VectorSearchService vectorSearch(VectorSearchRef ref) {
return VectorSearchService(ref.watch(supabaseClientProvider));
}
@riverpod
Future<String> ragAnswer(RagAnswerRef ref, String query) async {
return ref.watch(vectorSearchProvider).ask(query);
}
The combination of pgvector, Supabase Edge Functions, and Flutter gives you a fully managed AI search backend for well under $25/month — a practical choice for indie products that need smart search without infrastructure overhead.
This series covers Flutter, Supabase, and indie SaaS development. New articles every week.