DEV Community

ahmet gedik
ahmet gedik

Posted on

PostgreSQL Full-Text Search for Video Metadata

Why Full-Text Search Matters for Video Platforms

When you have tens of thousands of video records with titles in multiple languages, LIKE queries stop being viable. At DailyWatch, we aggregate trending videos from 8 regions — US, UK, Germany, France, India, Brazil, Australia, Canada — and users expect instant, relevant search across all of them. PostgreSQL's built-in full-text search (FTS) gave us exactly what we needed without bolting on Elasticsearch.

Setting Up tsvector Columns

The core idea is converting text into a searchable tsvector — a sorted list of normalized lexemes. Rather than computing this on every query, we store it as a column:

ALTER TABLE videos ADD COLUMN search_vector tsvector;

UPDATE videos SET search_vector =
    setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
    setweight(to_tsvector('english', coalesce(channel_title, '')), 'B') ||
    setweight(to_tsvector('english', coalesce(description, '')), 'C');
Enter fullscreen mode Exit fullscreen mode

Weight A is the highest priority. Matches in the title rank above matches in the description. This weighting turned out to be critical — without it, a video titled "Python Tutorial" would rank the same as one that merely mentioned Python in a long description.

Creating a GIN Index

Without an index, FTS scans every row. A GIN (Generalized Inverted Index) makes lookups nearly instant:

CREATE INDEX idx_videos_search ON videos USING GIN(search_vector);
Enter fullscreen mode Exit fullscreen mode

On our dataset of ~15,000 videos, this brought search from 80ms down to 3ms. GIN indexes are larger than B-tree indexes (ours is about 4MB), but the speed tradeoff is worth it.

Querying with ts_rank

Here is the query we use in production:

SELECT video_id, title, channel_title,
       ts_rank(search_vector, query) AS rank
FROM videos,
     to_tsquery('english', 'python & tutorial') AS query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT 20;
Enter fullscreen mode Exit fullscreen mode

The @@ operator checks if the vector matches the query. ts_rank scores relevance using the weights we assigned. The & operator means AND — both terms must appear. You can also use | for OR and ! for NOT.

Keeping the Vector Updated

We use a trigger so the search vector stays in sync:

CREATE OR REPLACE FUNCTION update_search_vector() RETURNS trigger AS $$
BEGIN
    NEW.search_vector :=
        setweight(to_tsvector('english', coalesce(NEW.title, '')), 'A') ||
        setweight(to_tsvector('english', coalesce(NEW.channel_title, '')), 'B') ||
        setweight(to_tsvector('english', coalesce(NEW.description, '')), 'C');
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_videos_search_update
    BEFORE INSERT OR UPDATE ON videos
    FOR EACH ROW EXECUTE FUNCTION update_search_vector();
Enter fullscreen mode Exit fullscreen mode

Every INSERT or UPDATE automatically recalculates the vector. Zero application code needed.

Comparison with SQLite FTS5

DailyWatch originally ran on SQLite with FTS5. Here is an honest comparison:

Feature PostgreSQL FTS SQLite FTS5
Weighted ranking Native (A/B/C/D) Manual with rank function
Language support 30+ dictionaries English by default
Index type GIN, GiST Shadow tables
Phrase search tsquery operators Quoted strings
Concurrent writes Row-level locking WAL mode only
Setup complexity Moderate Simple

For a single-server site with English content, FTS5 is perfectly adequate. We switched to PostgreSQL when we needed better multi-language support for our 8-region content and more flexible ranking.

Handling Multi-Language Content

Since DailyWatch indexes videos from Germany, France, Brazil, and India, we needed language-aware stemming:

UPDATE videos SET search_vector =
    setweight(to_tsvector(
        CASE region
            WHEN 'DE' THEN 'german'
            WHEN 'FR' THEN 'french'
            ELSE 'english'
        END,
        coalesce(title, '')
    ), 'A') ||
    setweight(to_tsvector('simple', coalesce(description, '')), 'C');
Enter fullscreen mode Exit fullscreen mode

The simple dictionary for descriptions works as a fallback — it lowercases and splits on whitespace without language-specific stemming, which handles mixed-language text reasonably well.

Performance in Production

After indexing ~15,000 videos across 8 regions with GIN indexes:

  • Simple term search: 2-4ms
  • Multi-term AND query: 3-6ms
  • Phrase search: 4-8ms
  • Index rebuild (full): ~2 seconds

These numbers made PostgreSQL FTS a clear win for our use case. No external search service, no sync pipeline, just the database doing what it does best.


This article is part of the Building DailyWatch series. Check out DailyWatch to see these techniques in action.

Top comments (0)