The Problem with LIKE Queries
When users start typing a search query, they expect instant suggestions. A naive WHERE title LIKE '%query%' approach has two problems: it can't use indexes (the leading % prevents it), and it doesn't handle typos or misspellings. At DailyWatch, we needed autocomplete that's both fast and forgiving.
PostgreSQL's pg_trgm extension solves both problems elegantly.
What Are Trigrams?
A trigram is a group of three consecutive characters from a string. The word "video" produces these trigrams: " v", " vi", "vid", "ide", "deo", "eo ". PostgreSQL uses trigram overlap to measure how similar two strings are, which naturally handles typos.
Setting Up pg_trgm
Enable the extension and create the index:
-- Enable trigram extension
CREATE EXTENSION IF NOT EXISTS pg_trgm;
-- Create the videos table
CREATE TABLE videos (
id SERIAL PRIMARY KEY,
video_id VARCHAR(16) UNIQUE NOT NULL,
title TEXT NOT NULL,
channel_title TEXT,
view_count BIGINT DEFAULT 0,
region VARCHAR(4) DEFAULT 'US',
fetched_at TIMESTAMP DEFAULT NOW()
);
-- GIN index for trigram similarity (faster for searches)
CREATE INDEX idx_videos_title_trgm ON videos USING GIN (title gin_trgm_ops);
-- Optional: also index channel name for combined search
CREATE INDEX idx_videos_channel_trgm ON videos USING GIN (channel_title gin_trgm_ops);
Building the Autocomplete Query
The key functions are similarity() for ranking and % (the similarity operator) for filtering:
-- Basic autocomplete: find videos matching a partial query
SELECT video_id, title, channel_title, view_count,
similarity(title, 'react tutoral') AS sim_score
FROM videos
WHERE title % 'react tutoral' -- trigram similarity threshold
ORDER BY sim_score DESC, view_count DESC
LIMIT 10;
Notice "tutoral" (misspelled) still matches "tutorial" because the trigram overlap is high enough. The default similarity threshold is 0.3, which you can tune:
-- Lower threshold = more results (fuzzier matching)
SET pg_trgm.similarity_threshold = 0.2;
The Complete Autocomplete Function
Here's a production-ready PostgreSQL function that combines trigram similarity with prefix matching for the best user experience:
CREATE OR REPLACE FUNCTION search_autocomplete(
query TEXT,
max_results INT DEFAULT 10
)
RETURNS TABLE (
video_id VARCHAR,
title TEXT,
channel_title TEXT,
view_count BIGINT,
score FLOAT
) AS $$
BEGIN
RETURN QUERY
SELECT v.video_id, v.title, v.channel_title, v.view_count,
GREATEST(
similarity(v.title, query),
-- Boost exact prefix matches
CASE WHEN lower(v.title) LIKE lower(query) || '%' THEN 0.9 ELSE 0 END,
-- Boost word-start matches
CASE WHEN lower(v.title) LIKE '% ' || lower(query) || '%' THEN 0.7 ELSE 0 END
) AS score
FROM videos v
WHERE v.title % query
OR lower(v.title) LIKE lower(query) || '%'
ORDER BY score DESC, v.view_count DESC
LIMIT max_results;
END;
$$ LANGUAGE plpgsql STABLE;
Usage:
-- User types "pythn web" (typo intended)
SELECT * FROM search_autocomplete('pythn web');
-- Results:
-- "Python Web Development Tutorial" (sim: 0.45)
-- "Python Web Scraping Guide" (sim: 0.42)
-- "Python for Web Beginners" (sim: 0.38)
Python API Endpoint
Wire it up with a fast API endpoint:
from fastapi import FastAPI, Query
import asyncpg
app = FastAPI()
pool: asyncpg.Pool = None
@app.on_event("startup")
async def startup():
global pool
pool = await asyncpg.create_pool("postgresql://user:pass@localhost/dailywatch")
@app.get("/api/autocomplete")
async def autocomplete(q: str = Query(min_length=2, max_length=100)):
rows = await pool.fetch(
"SELECT * FROM search_autocomplete($1, $2)",
q, 8
)
return {
"query": q,
"suggestions": [
{
"video_id": row["video_id"],
"title": row["title"],
"channel": row["channel_title"],
"views": row["view_count"],
}
for row in rows
],
}
Performance Numbers
On DailyWatch's dataset of ~50,000 video titles:
| Approach | Query Time | Handles Typos |
|---|---|---|
LIKE '%query%' |
45ms | No |
pg_trgm GIN index |
3ms | Yes |
pg_trgm + prefix boost |
4ms | Yes |
The GIN index makes trigram searches over 10x faster than sequential scans, and the fuzzy matching catches the typos that users inevitably make on mobile.
Frontend Integration
Debounce the input to avoid hammering the API:
let debounceTimer;
document.getElementById('search-input').addEventListener('input', (e) => {
clearTimeout(debounceTimer);
debounceTimer = setTimeout(async () => {
if (e.target.value.length < 2) return;
const res = await fetch(`/api/autocomplete?q=${encodeURIComponent(e.target.value)}`);
const data = await res.json();
renderSuggestions(data.suggestions);
}, 200);
});
Trigram-based autocomplete is one of the highest-impact search features you can add. It turns a frustrating "no results" experience into an intuitive "did you mean..." experience, which is exactly what users expect from a modern video discovery platform.
This article is part of the Building DailyWatch series. Check out DailyWatch to see these techniques in action.
Top comments (0)