Yasser B.

Posted on Mar 30 • Originally published at rivestack.io

PostgreSQL Full Text Search: A Complete Guide

#sql #database #postgres #tutorial

There's a database already running in your stack. It has your users, your content, your transactions. And buried in that same PostgreSQL instance is a full text search engine you've probably never turned on.

PostgreSQL full text search has been production ready for over a decade. It handles stemming, stop words, multiple languages, weighted ranking, and trigram fuzzy matching. You don't need Elasticsearch for a search feature. You don't need Algolia if your data is already in Postgres. For most applications, especially those with under a few million documents, built-in full text search is the right call.

This guide covers everything you need to ship full text search in PostgreSQL: how the underlying model works, how to index correctly, how to rank results, and how it compares to vector search with pgvector.

How PostgreSQL Full Text Search Works

PostgreSQL doesn't search raw text. It converts text into a normalized representation called a tsvector, then matches queries expressed as tsquery objects. This two-step process is what makes it fast.

A tsvector is a sorted list of lexemes: normalized word forms that strip suffixes and reduce words to their base form. The word "running" becomes "run". "Postgres" becomes "postgr". Stop words like "the", "a", "an" are dropped entirely.

SELECT to_tsvector('english', 'The quick brown fox jumps over the lazy dog');
-- 'brown':3 'dog':9 'fox':4 'jump':5 'lazi':8 'quick':2

A tsquery is what you match against:

SELECT to_tsquery('english', 'jumping');
-- Returns: 'jump'

-- Boolean operators
SELECT to_tsquery('english', 'postgres & search');  -- AND
SELECT to_tsquery('english', 'postgres | mysql');   -- OR
SELECT to_tsquery('english', 'database & !oracle'); -- NOT

The match operator is @@:

SELECT to_tsvector('english', 'PostgreSQL is a powerful database')
  @@ to_tsquery('english', 'powerful');
-- Returns: true

Setting Up Full Text Search

Option 1: Generated column (recommended for PostgreSQL 12+)

ALTER TABLE articles
  ADD COLUMN search_vector TSVECTOR
  GENERATED ALWAYS AS (
    setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
    setweight(to_tsvector('english', coalesce(body, '')), 'B')
  ) STORED;

setweight assigns priority to fields: 'A' (highest) to title, 'B' to body. Documents where the search term appears in the title rank higher.

Query with ranking:

SELECT id, title,
  ts_rank(search_vector, query) AS rank
FROM articles,
  to_tsquery('english', 'postgresql') query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT 20;

Option 2: Trigger-maintained column

ALTER TABLE articles ADD COLUMN search_vector TSVECTOR;

CREATE FUNCTION articles_search_vector_trigger() RETURNS TRIGGER AS $$
BEGIN
  NEW.search_vector :=
    setweight(to_tsvector('english', coalesce(NEW.title, '')), 'A') ||
    setweight(to_tsvector('english', coalesce(NEW.body, '')), 'B');
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER articles_search_vector_update
  BEFORE INSERT OR UPDATE ON articles
  FOR EACH ROW EXECUTE FUNCTION articles_search_vector_trigger();

Indexing for Performance

Without an index, full text search does a full table scan. GIN (Generalized Inverted Index) is purpose built for tsvectors.

CREATE INDEX articles_search_idx ON articles USING GIN (search_vector);

With this index, full text search queries return in milliseconds even on tables with millions of rows.

Ranking Results

SELECT id, title,
  ts_rank(search_vector, query) AS rank
FROM articles, to_tsquery('english', 'postgresql & database') query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT 10;

ts_rank_cd uses cover density (how close matching terms are to each other) and often gives better results for multi-word queries.

Snippet Generation

SELECT id, title,
  ts_headline(
    'english', body,
    to_tsquery('english', 'postgresql'),
    'MaxWords=50, MinWords=15, StartSel=<mark>, StopSel=</mark>'
  ) AS snippet
FROM articles
WHERE search_vector @@ to_tsquery('english', 'postgresql')
LIMIT 5;

Note: ts_headline does not use the GIN index. Call it only on the final page of results, not before pagination.

Handling User Input Safely

-- websearch_to_tsquery: safe, handles Google-style syntax
SELECT id, title
FROM articles
WHERE search_vector @@ websearch_to_tsquery('english', '"full text search" postgres -oracle');

websearch_to_tsquery (PostgreSQL 11+) is the best default for user input. It's injection-safe, handles partial syntax, and supports quoted phrases and exclusions.

Full Text Search vs pgvector

They solve different problems:

Full text search finds documents containing specific words or phrases. Fast, precise, no ML model required.

pgvector finds semantically similar documents, even without shared keywords. Needs an embedding model.

Use case	Best approach
Blog/docs search	Full text search
Semantic Q&A, RAG	pgvector
E-commerce	Often both

Many systems use both, merging scores with Reciprocal Rank Fusion. For RAG pipelines, see our guides on getting started with pgvector and pgvector with Python.

Complete Production Setup

CREATE TABLE articles (
  id BIGSERIAL PRIMARY KEY,
  title TEXT NOT NULL,
  body TEXT NOT NULL,
  created_at TIMESTAMPTZ DEFAULT NOW(),
  search_vector TSVECTOR GENERATED ALWAYS AS (
    setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
    setweight(to_tsvector('english', coalesce(body, '')), 'B')
  ) STORED
);

CREATE INDEX articles_search_idx ON articles USING GIN (search_vector);

CREATE FUNCTION search_articles(query_text TEXT, page_size INT DEFAULT 20, page_offset INT DEFAULT 0)
RETURNS TABLE (id BIGINT, title TEXT, snippet TEXT, rank FLOAT4) AS $$
DECLARE q TSQUERY := websearch_to_tsquery('english', query_text);
BEGIN
  RETURN QUERY
  SELECT a.id, a.title,
    ts_headline('english', a.body, q, 'MaxWords=40, MinWords=10, StartSel=<mark>, StopSel=</mark>'),
    ts_rank(a.search_vector, q)
  FROM articles a
  WHERE a.search_vector @@ q
  ORDER BY ts_rank(a.search_vector, q) DESC
  LIMIT page_size OFFSET page_offset;
END;
$$ LANGUAGE plpgsql;

-- Usage
SELECT * FROM search_articles('postgresql replication', 20, 0);

Keyword based search, relevance ranking, HTML ready snippets, and pagination, all within PostgreSQL.

Try It on Rivestack

PostgreSQL full text search and pgvector work side by side on the same database. No separate infrastructure for keyword vs semantic search.

Rivestack gives you a fully managed PostgreSQL instance with pgvector, pg_trgm, and all standard extensions enabled by default. Try it free, no credit card required.

Originally published at rivestack.io

DEV Community