ahmet gedik

Posted on Mar 26

PostgreSQL Full-Text Search for Video Metadata

#postgres #database #searchengine #webdev

Why Full-Text Search Matters for Video Platforms

When you have tens of thousands of video records with titles in multiple languages, LIKE queries stop being viable. At DailyWatch, we aggregate trending videos from 8 regions — US, UK, Germany, France, India, Brazil, Australia, Canada — and users expect instant, relevant search across all of them. PostgreSQL's built-in full-text search (FTS) gave us exactly what we needed without bolting on Elasticsearch.

Setting Up tsvector Columns

The core idea is converting text into a searchable tsvector — a sorted list of normalized lexemes. Rather than computing this on every query, we store it as a column:

ALTER TABLE videos ADD COLUMN search_vector tsvector;

UPDATE videos SET search_vector =
    setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
    setweight(to_tsvector('english', coalesce(channel_title, '')), 'B') ||
    setweight(to_tsvector('english', coalesce(description, '')), 'C');

Weight A is the highest priority. Matches in the title rank above matches in the description. This weighting turned out to be critical — without it, a video titled "Python Tutorial" would rank the same as one that merely mentioned Python in a long description.

Creating a GIN Index

Without an index, FTS scans every row. A GIN (Generalized Inverted Index) makes lookups nearly instant:

CREATE INDEX idx_videos_search ON videos USING GIN(search_vector);

On our dataset of ~15,000 videos, this brought search from 80ms down to 3ms. GIN indexes are larger than B-tree indexes (ours is about 4MB), but the speed tradeoff is worth it.

Querying with ts_rank

Here is the query we use in production:

SELECT video_id, title, channel_title,
       ts_rank(search_vector, query) AS rank
FROM videos,
     to_tsquery('english', 'python & tutorial') AS query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT 20;

The @@ operator checks if the vector matches the query. ts_rank scores relevance using the weights we assigned. The & operator means AND — both terms must appear. You can also use | for OR and ! for NOT.

Keeping the Vector Updated

We use a trigger so the search vector stays in sync:

CREATE OR REPLACE FUNCTION update_search_vector() RETURNS trigger AS $$
BEGIN
    NEW.search_vector :=
        setweight(to_tsvector('english', coalesce(NEW.title, '')), 'A') ||
        setweight(to_tsvector('english', coalesce(NEW.channel_title, '')), 'B') ||
        setweight(to_tsvector('english', coalesce(NEW.description, '')), 'C');
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_videos_search_update
    BEFORE INSERT OR UPDATE ON videos
    FOR EACH ROW EXECUTE FUNCTION update_search_vector();

Every INSERT or UPDATE automatically recalculates the vector. Zero application code needed.

Comparison with SQLite FTS5

DailyWatch originally ran on SQLite with FTS5. Here is an honest comparison:

Feature	PostgreSQL FTS	SQLite FTS5
Weighted ranking	Native (A/B/C/D)	Manual with rank function
Language support	30+ dictionaries	English by default
Index type	GIN, GiST	Shadow tables
Phrase search	tsquery operators	Quoted strings
Concurrent writes	Row-level locking	WAL mode only
Setup complexity	Moderate	Simple

For a single-server site with English content, FTS5 is perfectly adequate. We switched to PostgreSQL when we needed better multi-language support for our 8-region content and more flexible ranking.

Handling Multi-Language Content

Since DailyWatch indexes videos from Germany, France, Brazil, and India, we needed language-aware stemming:

UPDATE videos SET search_vector =
    setweight(to_tsvector(
        CASE region
            WHEN 'DE' THEN 'german'
            WHEN 'FR' THEN 'french'
            ELSE 'english'
        END,
        coalesce(title, '')
    ), 'A') ||
    setweight(to_tsvector('simple', coalesce(description, '')), 'C');

The simple dictionary for descriptions works as a fallback — it lowercases and splits on whitespace without language-specific stemming, which handles mixed-language text reasonably well.

Performance in Production

After indexing ~15,000 videos across 8 regions with GIN indexes:

Simple term search: 2-4ms
Multi-term AND query: 3-6ms
Phrase search: 4-8ms
Index rebuild (full): ~2 seconds

These numbers made PostgreSQL FTS a clear win for our use case. No external search service, no sync pipeline, just the database doing what it does best.

This article is part of the Building DailyWatch series. Check out DailyWatch to see these techniques in action.

DEV Community