Why Full-Text Search Matters for Video Platforms
When you have tens of thousands of video records with titles in multiple languages, LIKE queries stop being viable. At DailyWatch, we aggregate trending videos from 8 regions — US, UK, Germany, France, India, Brazil, Australia, Canada — and users expect instant, relevant search across all of them. PostgreSQL's built-in full-text search (FTS) gave us exactly what we needed without bolting on Elasticsearch.
Setting Up tsvector Columns
The core idea is converting text into a searchable tsvector — a sorted list of normalized lexemes. Rather than computing this on every query, we store it as a column:
ALTER TABLE videos ADD COLUMN search_vector tsvector;
UPDATE videos SET search_vector =
setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
setweight(to_tsvector('english', coalesce(channel_title, '')), 'B') ||
setweight(to_tsvector('english', coalesce(description, '')), 'C');
Weight A is the highest priority. Matches in the title rank above matches in the description. This weighting turned out to be critical — without it, a video titled "Python Tutorial" would rank the same as one that merely mentioned Python in a long description.
Creating a GIN Index
Without an index, FTS scans every row. A GIN (Generalized Inverted Index) makes lookups nearly instant:
CREATE INDEX idx_videos_search ON videos USING GIN(search_vector);
On our dataset of ~15,000 videos, this brought search from 80ms down to 3ms. GIN indexes are larger than B-tree indexes (ours is about 4MB), but the speed tradeoff is worth it.
Querying with ts_rank
Here is the query we use in production:
SELECT video_id, title, channel_title,
ts_rank(search_vector, query) AS rank
FROM videos,
to_tsquery('english', 'python & tutorial') AS query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT 20;
The @@ operator checks if the vector matches the query. ts_rank scores relevance using the weights we assigned. The & operator means AND — both terms must appear. You can also use | for OR and ! for NOT.
Keeping the Vector Updated
We use a trigger so the search vector stays in sync:
CREATE OR REPLACE FUNCTION update_search_vector() RETURNS trigger AS $$
BEGIN
NEW.search_vector :=
setweight(to_tsvector('english', coalesce(NEW.title, '')), 'A') ||
setweight(to_tsvector('english', coalesce(NEW.channel_title, '')), 'B') ||
setweight(to_tsvector('english', coalesce(NEW.description, '')), 'C');
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER trg_videos_search_update
BEFORE INSERT OR UPDATE ON videos
FOR EACH ROW EXECUTE FUNCTION update_search_vector();
Every INSERT or UPDATE automatically recalculates the vector. Zero application code needed.
Comparison with SQLite FTS5
DailyWatch originally ran on SQLite with FTS5. Here is an honest comparison:
| Feature | PostgreSQL FTS | SQLite FTS5 |
|---|---|---|
| Weighted ranking | Native (A/B/C/D) | Manual with rank function |
| Language support | 30+ dictionaries | English by default |
| Index type | GIN, GiST | Shadow tables |
| Phrase search | tsquery operators | Quoted strings |
| Concurrent writes | Row-level locking | WAL mode only |
| Setup complexity | Moderate | Simple |
For a single-server site with English content, FTS5 is perfectly adequate. We switched to PostgreSQL when we needed better multi-language support for our 8-region content and more flexible ranking.
Handling Multi-Language Content
Since DailyWatch indexes videos from Germany, France, Brazil, and India, we needed language-aware stemming:
UPDATE videos SET search_vector =
setweight(to_tsvector(
CASE region
WHEN 'DE' THEN 'german'
WHEN 'FR' THEN 'french'
ELSE 'english'
END,
coalesce(title, '')
), 'A') ||
setweight(to_tsvector('simple', coalesce(description, '')), 'C');
The simple dictionary for descriptions works as a fallback — it lowercases and splits on whitespace without language-specific stemming, which handles mixed-language text reasonably well.
Performance in Production
After indexing ~15,000 videos across 8 regions with GIN indexes:
- Simple term search: 2-4ms
- Multi-term AND query: 3-6ms
- Phrase search: 4-8ms
- Index rebuild (full): ~2 seconds
These numbers made PostgreSQL FTS a clear win for our use case. No external search service, no sync pipeline, just the database doing what it does best.
This article is part of the Building DailyWatch series. Check out DailyWatch to see these techniques in action.
Top comments (0)