DEV Community

Manish


Skip Elasticsearch: Build Blazing-Fast Full-Text Search Right in Supabase

Full-text search is a powerful database feature that lets you look for specific words or phrases within a text field instead of requiring an exact match of the entire field. For example, you can search for "Python REST" in job descriptions and still match text like "design and develop RESTful APIs using Python and FastAPI". A simple SQL WHERE job_description LIKE '%Python REST%' query would miss this because it only finds that exact phrase in that exact order. Traditional queries require precise matches; full-text search, on the other hand, finds any documents containing your terms and even ranks them by relevance. It powers common search features, such as searching blog posts or emails in your inbox, with just a few keywords.
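To make the difference concrete, here's a minimal sketch assuming a hypothetical jobs table with a description column:

```sql
-- Substring search misses the description because the words
-- appear separated and in a different order:
SELECT id FROM jobs
WHERE description LIKE '%Python REST%';
-- no rows

-- Full-text search matches the individual (stemmed) terms:
SELECT id FROM jobs
WHERE to_tsvector('english', description)
      @@ to_tsquery('english', 'python & rest');
-- matches the RESTful/Python example above ("RESTful" stems to "rest")
```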

Modern applications handle massive amounts of text, and users expect instant Google-like results when they type queries like "bluetooth headphones" while searching for products. To achieve this without extra infrastructure, Postgres's built-in full-text capabilities offer a major advantage over external tools like Elasticsearch. By keeping your search indexes inside the database, they stay perfectly synchronized with your other structured data through database transactions. This approach effectively eliminates the complexity of maintaining a separate search tool and reduces both operational overhead and costs.

In this article, I explain what full-text search is and how to implement it directly in Supabase using practical examples.

Understanding Full-Text Search in Postgres

Supabase's managed Postgres service gives you direct access to Postgres's built-in full-text search features. Internally, Postgres converts each natural-language document into a search vector (tsvector, an internal format that stores normalized words and their positions) and turns your keyword search input into a search query (tsquery, the parsed form of your keywords). Postgres applies weighted scoring to rank matching documents, and extensions such as fuzzystrmatch add fuzzy matching through techniques like Soundex for phonetic similarity and Levenshtein distance for character-level similarity, so you can find results even when terms don't match exactly. Postgres also handles multiple languages and provides functions for ranking and highlighting search results.

The diagram below gives a high-level overview of how your documents and queries flow through indexing, matching, and scoring to produce the final ranked results:

Full-text search

Full-text search is designed to efficiently find and rank documents that contain human-language text, such as blog posts, product descriptions, or user-generated content. Unlike simple string matching, it breaks down and analyzes text to understand the structure of words so users can search more naturally, even with partial matches or varied word forms.

Let's take a look at a few key concepts that make this possible.

Lexemes: The Building Blocks of Search

When Postgres processes a block of text, it first normalizes it. This involves removing punctuation, ignoring common stop words (such as "and" and "the"), converting everything to lowercase, and reducing words to their root forms (known as stemming). These root words are called lexemes.

For example, "running" and "runs" both become the lexeme run. (The default Snowball stemmer is rule-based and strips suffixes, so irregular forms like "ran" are not reduced to "run".)

This process ensures that different forms of the same word still work correctly in your search.
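You can see the normalization in action by calling to_tsvector() directly:

```sql
SELECT to_tsvector('english', 'The quick brown foxes are running');
-- 'brown':3 'fox':4 'quick':2 'run':6
```

Note that the stop words "the" and "are" are dropped but still counted when assigning positions.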

tsvector and tsquery: Index and Query Formats

Postgres uses two special data types to handle full-text search:

  • tsvector: This is how a document is stored for search. It contains a list of lexemes, sometimes with position information. You can think of it as a preprocessed searchable version of your text field.
  • tsquery: This represents the search input. When a user types a query, Postgres converts it into a structured form (like run & fast) that can be matched against the tsvector.

Matching is done between tsquery and tsvector, not the original text.
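A quick sketch of how the two types interact; the @@ operator (covered in more detail below) tests whether the query matches the vector:

```sql
SELECT to_tsvector('english', 'He is running fast')
       @@ to_tsquery('english', 'run & fast') AS matches;
-- matches = true
```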

Search Configurations: Supporting Multiple Languages

Postgres supports multiple search configurations, which define how text is broken down and interpreted. For example, using the 'english' configuration will apply stemming rules and stop-word lists specific to English. There are also built-in configurations for other languages—like French, Spanish, and German—each one tuned for linguistic accuracy.

Based on your application requirements, you can explicitly specify which configuration to use when building a tsvector or parsing a tsquery, like the following:

to_tsvector('english', 'The quick brown fox')
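To see which configurations your Supabase instance ships with, you can query the pg_ts_config system catalog:

```sql
SELECT cfgname FROM pg_ts_config;
```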

How Supabase Handles Full-Text Search

With Supabase, you can create tsvector columns using generated columns or triggers that automatically update when the source text changes, and create Generalized Inverted Index (GIN) indexes for fast querying. Supabase supports features like ranking search results and offers extensions like PGroonga for multilingual search.

Additionally, Supabase also supports multicolumn searches by combining data into single searchable indexes, as you'll see in the next section. So you get production-grade search features like result ranking, partial word matching, and support for multiple languages built right into Supabase.

Did you know? GIN is a specialized Postgres index type designed specifically for searching within composite data types like text arrays and tsvectors. RUM indexes are an enhanced alternative to GIN indexes that store additional details like word positions and timestamps, allowing for faster phrase searches and more accurate relevance ranking based on text distance. Supabase supports RUM indexes through an extension, making them a good choice for applications that need advanced search ranking, although they have slower index build and insert performance compared to GIN.

Setting Up Full-Text Search in Supabase

Let's build a practical example using a blog platform where users need to search through articles by title, content, and tags. This is a perfect use case for full-text search because users expect to find articles by typing keywords like "react hooks tutorial" rather than remembering exact titles. When designing your schema, identify columns that contain human-readable text that users will search through. These are typically content fields, descriptions, titles, and tags.

The key to fast full-text search is creating a GIN index on your searchable content. GIN indexes, mentioned earlier, work like a book's index, mapping each lexeme to all the documents containing it. This keeps searches fast even across millions of records: instead of scanning every row, Postgres can jump straight to the documents containing your search terms. The diagram below illustrates how GIN indexes process documents to extract lexemes and build an inverted index that maps each term to its containing documents:

GIN: Generalized Inverted Index

Now, here's how you set up full-text search for the blog example. First, create an immutable wrapper function, articles_search_vector(…), that combines multiple text fields; marking it IMMUTABLE tells Postgres the function always returns the same output for the same inputs, which is required for generated columns. Then add a generated column that uses this function to build a searchable tsvector, and finally create a GIN index for optimal performance:

-- Create the articles table
CREATE TABLE articles (
  id SERIAL PRIMARY KEY,
  title TEXT NOT NULL,
  content TEXT NOT NULL,
  tags TEXT[],
  author TEXT,
  created_at TIMESTAMP DEFAULT NOW()
);

-- Create an immutable wrapper function for the search vector
CREATE OR REPLACE FUNCTION articles_search_vector(title TEXT, content TEXT, tags TEXT[])
RETURNS tsvector AS $$
BEGIN
  RETURN to_tsvector('english', 
    coalesce(title, '') || ' ' || 
    coalesce(content, '') || ' ' || 
    coalesce(array_to_string(tags, ' '), '')
  );
END;
$$ LANGUAGE plpgsql IMMUTABLE;

-- Add a generated column that combines searchable fields
ALTER TABLE articles 
ADD COLUMN search_vector tsvector 
GENERATED ALWAYS AS (articles_search_vector(title, content, tags)) STORED;

-- Create a GIN index for fast searches
CREATE INDEX idx_articles_search ON articles USING GIN(search_vector);

You can run these SQL commands directly in Supabase's SQL Editor or through any Postgres client connected to your Supabase database.

Note: Instead of using a generated column, you can also use a database trigger that runs a small function to populate the search_vector whenever a row is inserted or updated. However, generated columns are often a cleaner and simpler option since they're declarative and inherently maintain transactional consistency.
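For reference, here's a sketch of the trigger-based alternative. In this variant, search_vector must be a plain tsvector column rather than a generated one, since a trigger can't write to a generated column; the function and trigger names below are illustrative:

```sql
-- Trigger function that rebuilds the vector on every write
CREATE OR REPLACE FUNCTION articles_search_vector_refresh()
RETURNS trigger AS $$
BEGIN
  NEW.search_vector := to_tsvector('english',
    coalesce(NEW.title, '') || ' ' ||
    coalesce(NEW.content, '') || ' ' ||
    coalesce(array_to_string(NEW.tags, ' '), ''));
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER articles_search_vector_trigger
BEFORE INSERT OR UPDATE ON articles
FOR EACH ROW EXECUTE FUNCTION articles_search_vector_refresh();
```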

Here's how the articles table would look in Postgres with the generated column:

| id | title | content | tags | author | created_at | search_vector |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | Getting Started with React | React is a popular JavaScript library for building user interfaces… | {javascript,react,frontend} | John Doe | 2025-07-16 10:30:00 | 'build':12 'frontend':17 'get':1 'interfac':14 'javascript':9,15 'librari':10 'popular':8 'react':4,5,16 'start':2 'user':13 |
| 2 | Advanced Python Tips | Here are some advanced techniques for Python developers… | {python,programming,tips} | Jane Smith | 2025-08-07 14:20:00 | 'advanc':1,7 'develop':11 'program':13 'python':2,10,12 'techniqu':8 'tip':3,14 |
| 3 | Database Design Best Practices | Learn how to design efficient database schemas… | {database,sql,design} | Bob Wilson | 2025-09-25 09:15:00 | 'best':3 'databas':1,10,12 'design':2,8,14 'effici':9 'learn':5 'practic':4 'schema':11 'sql':13 |

The search_vector column shows the tsvector format, where each lexeme is followed by the positions where it appears in the combined text. For example, 'react':4,5,16 means the word "react" appears at positions 4, 5, and 16 in the processed text (stop words like "with" and "is" are removed but still counted when assigning positions). Notice how Postgres reduces words like "getting" and "practices" to their root forms, "get" and "practic", in the search_vector to improve matching across word variations.

The to_tsvector('english', text) function does the heavy lifting: It normalizes your text by removing punctuation, converting to lowercase, removing stop words, and stemming words to their root forms (so "running" becomes "run"). The coalesce(…) functions handle null values by returning an empty string if any field is null, ensuring the search vector can always be built even when some fields are missing data. The generated column automatically updates whenever you insert or update records, keeping your search index perfectly synchronized. With this setup, you're ready to perform fast searches across your entire content.

Basic Full-Text Search Queries with Supabase

Now that the search index is ready, you can start querying. Postgres uses the to_tsquery() function to convert your search terms into the structured format needed for matching. Just like documents are converted to the tsvector format, your search input gets converted to the tsquery format. The @@ operator checks if a search query matches against a text search vector.

The simplest search looks for articles containing specific keywords. You can also combine multiple search terms using AND (&) and OR (|) operators to create more precise queries. For example, searching for 'react & javascript' finds articles containing both terms, while 'react | python' finds articles containing either term. You can also exclude specific terms from your search using the NOT operator (!), which is helpful when you want to find articles about programming but exclude certain languages or topics.

Here are some practical search examples using the blog articles table:

-- Simple keyword search
SELECT title, author 
FROM articles 
WHERE search_vector @@ to_tsquery('english', 'react');

-- Search with AND logic (both terms must be present)
SELECT title, author 
FROM articles 
WHERE search_vector @@ to_tsquery('english', 'react & javascript');

-- Search with OR logic (either term can be present)
SELECT title, author 
FROM articles 
WHERE search_vector @@ to_tsquery('english', 'python | javascript');

-- Exclude terms using the NOT operator (!)
SELECT title, author 
FROM articles 
WHERE search_vector @@ to_tsquery('english', 'programming & !python');

These queries return results instantly thanks to the GIN index created earlier. The @@ operator matches your tsquery against the preprocessed search_vector, making even complex searches across large data sets remarkably fast. Remember that stemming works both ways: Searching for "develop" will match articles containing "developer", "development", or "developing". For more full-text search operators and functions, refer to the official Postgres full-text search documentation.
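One practical tip when wiring this up to a search box: to_tsquery() raises an error on malformed input, so for raw user queries you may prefer websearch_to_tsquery() (Postgres 11+), which accepts Google-style syntax (quoted phrases, - for exclusion) without ever throwing a syntax error:

```sql
-- Safe parsing of free-form user input
SELECT title, author
FROM articles
WHERE search_vector @@ websearch_to_tsquery('english', 'react -python "getting started"');
```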

Advanced Full-Text Search Techniques in Supabase

Real search applications need to rank results by relevance, not just list matches. Postgres provides two main ranking functions: ts_rank() calculates relevance based on how frequently search terms appear, while ts_rank_cd() (cover density) also considers how close together the search terms appear in the document. For example, ts_rank_cd() prefers results where "Python" and "microservices" occur next to each other as part of a phrase over results where they are scattered across different paragraphs. Results with higher scores appear first, giving users the most relevant content at the top.
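Both functions drop into the same query shape; here's a sketch against the articles table from earlier that returns the two scores side by side so you can compare them:

```sql
SELECT title,
       ts_rank(search_vector, query)    AS rank,
       ts_rank_cd(search_vector, query) AS rank_cd
FROM articles, to_tsquery('english', 'python & microservices') query
WHERE search_vector @@ query
ORDER BY rank_cd DESC;
```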

Beyond just ranking by term frequency, you can make certain parts of your documents more important than others using setweight(). For example, matches in titles should typically rank higher than matches in body content since titles are more descriptive of the document's main topic. Postgres supports four weight classes—A (highest), B, C, and D (lowest)—allowing you to prioritize certain fields over others.

Here's how you can implement weighted ranking that prioritizes title matches over content matches:

-- Create an immutable wrapper function for the weighted search vector
CREATE OR REPLACE FUNCTION articles_weighted_search_vector(title TEXT, content TEXT, tags TEXT[])
RETURNS tsvector AS $$
BEGIN
  RETURN setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
         setweight(to_tsvector('english', coalesce(content, '')), 'B') ||
         setweight(to_tsvector('english', coalesce(array_to_string(tags, ' '), '')), 'D');
END;
$$ LANGUAGE plpgsql IMMUTABLE;

-- Create a weighted search vector (A = title, B = content, D = tags)
ALTER TABLE articles 
ADD COLUMN weighted_search_vector tsvector 
GENERATED ALWAYS AS (articles_weighted_search_vector(title, content, tags)) STORED;

-- Create index on the weighted vector
CREATE INDEX idx_articles_weighted_search ON articles USING GIN(weighted_search_vector);

-- Search with relevance ranking
SELECT title, author, ts_rank(weighted_search_vector, query) AS rank
FROM articles, to_tsquery('english', 'react & javascript') query
WHERE weighted_search_vector @@ query
ORDER BY rank DESC
LIMIT 10;

The articles_weighted_search_vector() function creates a search vector that prioritizes different parts of the article content: It gives titles the highest weight ('A'), the content medium weight ('B'), and the tags the lowest weight ('D'). The resulting weighted_search_vector ranks search matches in titles higher than matches in content or tags. Here's how the weighted_search_vector column would look:

weighted_search_vector
'applic':19B 'build':12B 'compon':22B 'creat':16B 'dynam':17B 'frontend':25 'get':1A 'interfac':14B 'javascript':9B,23 'librari':10B 'popular':8B 'react':4A,5B,24 'reusabl':21B 'start':2A 'user':13B 'web':18B
'advanc':1A,7B 'best':20B 'code':18B 'develop':11B 'effici':15B 'maintain':17B 'practic':21B 'program':23 'python':2A,10B,22 'techniqu':8B 'tip':3A,24 'use':19B 'write':13B
'across':19B 'applic':21B 'best':3A 'complex':20B 'data':17B 'databas':1A,10B,22 'design':2A,8B,24 'effici':9B 'integr':18B 'learn':5B 'maintain':16B 'practic':4A 'scale':13B 'schema':11B 'sql':23 'well':14B

This approach ensures that articles with your search terms in the title rank higher than those with the same terms buried in the content (web search engines apply similar title weighting to their results). The ts_rank() function automatically takes the weights into account when calculating relevance scores, delivering more intuitive search results to your users.

Optimizing Full-Text Search Performance

Now that you've covered the basic and advanced search techniques, let's explore some practical tips to avoid common pitfalls and keep your search efficient in production.

  • Use a GIN index on your tsvector column: Skipping the GIN index (or using a generic index) is a common mistake. A GIN index allows Postgres to quickly look up matching documents without scanning the entire table.

  • Set the correct language configuration: When generating the search vector, choose the right language (e.g., 'french' or 'german') with to_tsvector(). This ensures words are stemmed and filtered correctly, which is essential if your application serves non-English users.

  • Keep your search vector updated: As explained earlier in this article, use generated columns with immutable wrapper functions to update the vector automatically when content changes. Alternatively, automate updates with Supabase database triggers so your search index always stays in sync without manual effort.

  • Precompute results with materialized views: For frequently searched content that doesn't change often, use materialized views to store precomputed search results and refresh them periodically.

  • Partition large tables: For large data sets, partition tables by date or category to keep indexes smaller and run queries faster. This is especially useful for blogs or news articles, where recent content is searched more often than older entries.

  • Monitor and refine queries with the Supabase Performance Advisor: Use Performance Advisor to track slow search queries and get optimization suggestions.

Supabase: Query Performance Advisor
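As a sketch of the materialized-view tip above (the view name and query are illustrative):

```sql
-- Precompute a frequently requested search result set
CREATE MATERIALIZED VIEW popular_react_articles AS
SELECT title, author, ts_rank(search_vector, q) AS rank
FROM articles, to_tsquery('english', 'react') q
WHERE search_vector @@ q
ORDER BY rank DESC;

-- Refresh periodically, e.g. from a scheduled job (pg_cron is available on Supabase)
REFRESH MATERIALIZED VIEW popular_react_articles;
```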

Conclusion

You've now covered the essentials of full-text search and how to implement it efficiently using Supabase and Postgres. From working with tsvector and tsquery to using GIN indexes, generated columns, and weighted ranking, you have all the knowledge you need to build fast, accurate, and fully integrated full-text search without any external dependency: no need to set up a separate Elasticsearch cluster.
