Finny Collins

PostgreSQL full-text search — How to build fast search without Elasticsearch

Most teams reach for Elasticsearch the moment someone mentions "search." It makes sense on the surface — Elasticsearch was built for search. But adding it to your stack means another service to deploy, monitor, keep in sync with your primary database, and debug when things go sideways. For a lot of applications, that complexity is not justified.

PostgreSQL has had full-text search capabilities since version 8.3. They have gotten better with every release. And for many workloads — internal tools, SaaS products, content platforms with moderate data sizes — PostgreSQL's built-in search is more than enough.

This article walks through how full-text search works in PostgreSQL, how to set it up properly, and where it starts to hit its limits.

What full-text search actually means in PostgreSQL

Full-text search is not the same as LIKE '%term%'. Pattern matching with LIKE or ILIKE scans every row, ignores word boundaries, and has no concept of language. Searching for "running" will not match "runs" or "ran", because substring matching knows nothing about stemming. It has no ranking. It is brute force.
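You can see the difference in any psql session. Substring matching fails on a morphological variant that full-text search handles, because both the document and the query are stemmed to the same lexeme:

```sql
-- Substring matching misses morphological variants:
SELECT 'the fox runs' LIKE '%running%';   -- false

-- Full-text search stems both sides to the lexeme 'run':
SELECT to_tsvector('english', 'the fox runs')
    @@ to_tsquery('english', 'running');  -- true
```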

PostgreSQL full-text search works differently. It breaks text into tokens, normalizes them (lowercasing, stemming, removing stop words), and stores the result as a tsvector. Your search query becomes a tsquery. The database then matches these two structures using an inverted index, which is fast.

The two core types you will work with:

| Type | Purpose | Example |
| --- | --- | --- |
| tsvector | Stores preprocessed, searchable document text | 'quick':1 'brown':2 'fox':3 |
| tsquery | Stores the search query in normalized form | 'quick' & 'fox' |

Here is a basic example:

SELECT to_tsvector('english', 'The quick brown fox jumps over the lazy dog')
    @@ to_tsquery('english', 'quick & fox');

This returns true. The @@ operator is the match operator. The english argument tells PostgreSQL which text search configuration to use for stemming and stop word removal.
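It is worth inspecting what the normalization actually produces. Running to_tsvector on its own shows the stop words removed, the remaining words stemmed, and each lexeme stored with its position:

```sql
SELECT to_tsvector('english', 'The quick brown fox jumps over the lazy dog');
-- 'brown':3 'dog':9 'fox':4 'jump':5 'lazi':8 'quick':2
```

Note that "the" and "over" are gone, "jumps" became "jump", and "lazy" was stemmed to "lazi" — the stored form is for matching, not for display.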

Setting up a searchable column

You can call to_tsvector on the fly in a WHERE clause, but that means PostgreSQL has to process the text for every row on every query. For anything beyond toy datasets, you want a dedicated column.

ALTER TABLE articles ADD COLUMN search_vector tsvector;

UPDATE articles
SET search_vector = to_tsvector('english', coalesce(title, '') || ' ' || coalesce(body, ''));

Then create a GIN index on it:

CREATE INDEX idx_articles_search ON articles USING GIN (search_vector);

GIN (Generalized Inverted Index) is the standard index type for full-text search. It builds an inverted index — a mapping from each lexeme to the rows that contain it. This is what makes search fast.
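You can verify the index is actually being used with EXPLAIN. Assuming the articles table and index from above, a matching query should show a bitmap scan on the GIN index rather than a sequential scan:

```sql
EXPLAIN ANALYZE
SELECT title
FROM articles
WHERE search_vector @@ to_tsquery('english', 'replication');
-- Look for "Bitmap Index Scan on idx_articles_search"
-- rather than "Seq Scan on articles" in the plan output.
```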

To keep the column updated automatically, add a trigger:

CREATE FUNCTION articles_search_vector_update() RETURNS trigger AS $$
BEGIN
    NEW.search_vector :=
        to_tsvector('english', coalesce(NEW.title, '') || ' ' || coalesce(NEW.body, ''));
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_articles_search_vector
    BEFORE INSERT OR UPDATE ON articles
    FOR EACH ROW
    EXECUTE FUNCTION articles_search_vector_update();

Now every insert and update will automatically maintain the search vector. No application-level sync logic needed.
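On PostgreSQL 12 and later, a stored generated column can replace the trigger entirely. This is a sketch assuming you have not already added search_vector as a plain column (drop it first if you have); it works because to_tsvector with an explicit configuration is immutable:

```sql
ALTER TABLE articles
    ADD COLUMN search_vector tsvector
    GENERATED ALWAYS AS (
        to_tsvector('english', coalesce(title, '') || ' ' || coalesce(body, ''))
    ) STORED;
```

The database then maintains the column itself, with no trigger function to write or keep in sync with schema changes.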

Weighting and ranking results

Not all text is equal. A match in the title should rank higher than a match in the body. PostgreSQL supports this through weight labels — A, B, C and D (A being the highest).

UPDATE articles
SET search_vector =
    setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
    setweight(to_tsvector('english', coalesce(body, '')), 'B');

Then use ts_rank or ts_rank_cd to sort results by relevance:

SELECT title,
       ts_rank(search_vector, to_tsquery('english', 'postgresql & replication')) AS rank
FROM articles
WHERE search_vector @@ to_tsquery('english', 'postgresql & replication')
ORDER BY rank DESC
LIMIT 20;

ts_rank_cd uses cover density ranking, which considers how close the matching terms are to each other. It tends to produce more intuitive results for multi-word queries.

| Function | Ranking method | Best for |
| --- | --- | --- |
| ts_rank | Frequency-based: counts how often query terms appear | General-purpose ranking |
| ts_rank_cd | Cover density: rewards terms appearing close together | Phrase-like queries where proximity matters |
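Swapping ts_rank_cd into the earlier query is straightforward. This version also joins against the to_tsquery result so the query text is written only once (the table and column names follow the examples above):

```sql
SELECT title,
       ts_rank_cd(search_vector, query) AS rank
FROM articles, to_tsquery('english', 'streaming & replication') AS query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT 20;
```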

Query syntax and operators

The tsquery type supports several operators that give you control over how terms are combined.

  • & — AND. Both terms must be present.
  • | — OR. Either term matches.
  • ! — NOT. Excludes documents with the term.
  • <-> — FOLLOWED BY. Terms must appear adjacent and in order (phrase search).

Some examples:

-- Documents about PostgreSQL but not MySQL
to_tsquery('english', 'postgresql & !mysql')

-- Phrase search: "full text" as adjacent words
to_tsquery('english', 'full <-> text')

-- Either term matches
to_tsquery('english', 'backup | restore')

There is also plainto_tsquery, which takes a plain string and ANDs all the words together, and websearch_to_tsquery (PostgreSQL 11+), which supports a Google-like syntax with quotes for phrases and minus for exclusion:

-- User types: postgresql "full text" -elasticsearch
SELECT * FROM articles
WHERE search_vector @@ websearch_to_tsquery('english', 'postgresql "full text" -elasticsearch');

websearch_to_tsquery is usually the right choice for user-facing search boxes.
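For comparison, plainto_tsquery on a similar input simply normalizes every word and joins them with AND, with no phrase or exclusion support:

```sql
SELECT plainto_tsquery('english', 'quick brown foxes');
-- 'quick' & 'brown' & 'fox'
```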

Highlighting search results

When showing search results, you want to highlight where the match occurred. ts_headline does this:

SELECT title,
       ts_headline('english', body,
           to_tsquery('english', 'replication'),
           'StartSel=<mark>, StopSel=</mark>, MaxWords=50, MinWords=20'
       ) AS snippet
FROM articles
WHERE search_vector @@ to_tsquery('english', 'replication')
ORDER BY ts_rank(search_vector, to_tsquery('english', 'replication')) DESC;

One thing to be aware of: ts_headline re-processes the original text, not the tsvector. It is slower than the match itself. For large result sets, apply it only to the top N results after filtering and ranking.
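A common pattern is to filter and rank in a subquery, then run ts_headline only over the surviving rows. A sketch, reusing the schema from the earlier examples:

```sql
SELECT title,
       ts_headline('english', body, query,
           'StartSel=<mark>, StopSel=</mark>, MaxWords=50, MinWords=20') AS snippet
FROM (
    -- Filter and rank first; ts_headline never sees the losing rows.
    SELECT title, body, query
    FROM articles, to_tsquery('english', 'replication') AS query
    WHERE search_vector @@ query
    ORDER BY ts_rank(search_vector, query) DESC
    LIMIT 20
) AS top_hits;
```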

Performance considerations

GIN indexes make full-text search fast, but there are a few things that affect performance in practice.

Index size matters. GIN indexes can be large — sometimes larger than the table itself for text-heavy data. Monitor the index size with:

SELECT pg_size_pretty(pg_relation_size('idx_articles_search'));

Write overhead is real. GIN indexes use a "fastupdate" mechanism by default, which collects new entries in a pending list and merges them into the main index later. This helps write performance, but queries must scan the pending list in addition to the main index, so reads slow down as the list grows. You can tune this with gin_pending_list_limit or disable fastupdate entirely if your workload is read-heavy:

ALTER INDEX idx_articles_search SET (fastupdate = off);
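If you keep fastupdate on, two knobs are worth knowing. The pending-list size can be set per index (the value is in kilobytes), and the list can be flushed on demand without waiting for autovacuum:

```sql
-- Raise this index's pending-list cap (default comes from the
-- gin_pending_list_limit GUC, 4MB unless changed):
ALTER INDEX idx_articles_search SET (gin_pending_list_limit = 16384);

-- Merge the pending entries into the main index right now:
SELECT gin_clean_pending_list('idx_articles_search'::regclass);
```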

If your workload is write-heavy, consider GiST indexes instead. GiST indexes are smaller and faster to update, but lookups are slower and lossy: PostgreSQL must recheck candidate rows against the actual table data. The tradeoff depends on your read/write ratio.

  • GIN indexes: faster reads, slower writes, larger on disk
  • GiST indexes: faster writes, slower reads, smaller on disk

Multilingual search

PostgreSQL ships with text search configurations for many languages. Each configuration defines how text is tokenized and which dictionary is used for stemming.

-- List available configurations
SELECT cfgname FROM pg_ts_config;

-- Use German configuration
SELECT to_tsvector('german', 'Der schnelle braune Fuchs springt');

If your application handles multiple languages, you can store the language per row and build the tsvector accordingly. Or maintain multiple tsvector columns — one per language.
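A minimal sketch of the per-row approach, assuming you add a hypothetical language column of type regconfig (which holds a configuration name and can be passed straight to the text search functions):

```sql
ALTER TABLE articles ADD COLUMN language regconfig NOT NULL DEFAULT 'english';

-- Build each row's vector with that row's own configuration:
UPDATE articles
SET search_vector = to_tsvector(language,
        coalesce(title, '') || ' ' || coalesce(body, ''));

-- Queries must stem with the same per-row configuration:
SELECT title FROM articles
WHERE search_vector @@ to_tsquery(language, 'fuchs');
```

Any trigger or generated column maintaining the vector would need to use the same per-row configuration, otherwise queries and stored lexemes will disagree.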

For languages not supported out of the box (like Chinese, Japanese or Korean), you will need extensions. pg_bigm and pgroonga handle CJK text well. The unaccent extension is useful for languages with diacritics.
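unaccent is a quick win on its own: it strips diacritics so that "Hôtel" and "Hotel" normalize to the same form before matching.

```sql
CREATE EXTENSION IF NOT EXISTS unaccent;

SELECT unaccent('Hôtel Crémieux');
-- Hotel Cremieux
```

To apply it inside full-text search you would typically wire unaccent into a custom text search configuration, which is beyond the scope of this sketch.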

When PostgreSQL search is not enough

PostgreSQL full-text search works well for a lot of use cases, but it does have limitations. It does not support fuzzy matching out of the box (you would need the pg_trgm extension for that). It does not do faceted search or aggregations the way Elasticsearch does. And for datasets in the hundreds of millions of rows with complex, multi-field queries, a dedicated search engine will perform better.
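For illustration, here is roughly what the pg_trgm route looks like. Trigram matching compares three-character substrings, so it tolerates the typos that stemming-based full-text search cannot:

```sql
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- % is pg_trgm's similarity operator (default threshold 0.3):
SELECT title FROM articles
WHERE title % 'postgrsql'
ORDER BY similarity(title, 'postgrsql') DESC
LIMIT 10;
```

A GIN index created with the gin_trgm_ops operator class makes these lookups fast, and the two approaches are often combined: full-text search for relevance, trigrams as a typo-tolerant fallback.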

But for most applications — and honestly, that is the majority — PostgreSQL handles search just fine. You avoid the operational overhead of running a separate search cluster, you do not need to worry about data synchronization and you get transactional consistency for free.

The rule of thumb: start with PostgreSQL. Move to Elasticsearch when you have measurable evidence that you need it, not because someone on the team assumed you would.

Backing up PostgreSQL with full-text search indexes

Full-text search indexes can get large, and rebuilding them from scratch takes time, which makes reliable backups even more important. If you lose data and have to restore, you do not want to spend hours reindexing.

PostgreSQL backup tools should handle this transparently. Databasus is an open-source, self-hosted backup tool for PostgreSQL. It supports logical, physical and incremental backup types, including Point-in-Time Recovery with WAL streaming, so you can restore your database to any specific second.

Databasus handles compression automatically with configurable algorithms and levels, typically achieving 4-8x space savings. It supports multiple storage destinations including S3, Google Drive, SFTP and local storage. You set up your backup schedule (hourly, daily, weekly, or cron-based), configure retention policies and Databasus takes care of the rest.

What stands out is the operational side. Databasus gives you notifications through Slack, Discord, Telegram or email when backups succeed or fail. It encrypts backups with AES-256-GCM. And because it is open source under the Apache 2.0 license, you can inspect every line of code and avoid vendor lock-in — you can even restore backups without Databasus itself if needed.

Quick reference

Here is a summary of the key functions and operators covered in this article.

| Function / Operator | What it does |
| --- | --- |
| to_tsvector(config, text) | Converts text into a searchable tsvector |
| to_tsquery(config, query) | Converts a query string into a tsquery |
| websearch_to_tsquery(config, query) | Parses Google-like search syntax into a tsquery |
| @@ | Match operator: checks if a tsvector matches a tsquery |
| ts_rank(vector, query) | Scores results by term frequency |
| ts_rank_cd(vector, query) | Scores results by cover density (proximity) |
| ts_headline(config, text, query) | Returns a text snippet with highlighted matches |
| setweight(vector, label) | Assigns a weight (A/B/C/D) to a tsvector |
| <-> | Phrase operator: terms must be adjacent |

Wrapping up

PostgreSQL full-text search is a practical tool that most teams underestimate. It handles tokenization, stemming, ranking, phrase search and multilingual text out of the box. With GIN indexes, it scales well into the millions of rows. And because it lives inside your database, there is no synchronization problem to solve.

The setup is straightforward: add a tsvector column, create a GIN index, write a trigger to keep it updated and use websearch_to_tsquery for your search endpoint. That covers 80% of search needs with no additional infrastructure.

Not every project needs a dedicated search engine. Sometimes the database you already have is good enough.
