Franck Pachot for MongoDB

Posted on Sep 19 • Edited on Oct 10

Text Search With MongoDB (BM25 TF-IDF) and PostgreSQL

#database #mongodb #postgres #elasticsearch

MongoDB search indexes provide full‑text search capabilities directly within MongoDB, allowing complex queries to be run without copying data to a separate search system. Initially deployed in Atlas, MongoDB’s managed service, search indexes are now also part of the community edition. This post compares the default full‑text search behaviour between MongoDB and PostgreSQL, using a simple example to illustrate the ranking algorithm.

Setup: a small dataset

I’ve inserted nine small documents, each consisting of different fruits, using emojis to make it more visual. The 🍎 and 🍏 emojis represent our primary search terms. They appear at varying frequencies in documents of different lengths.

db.articles.deleteMany({});

db.articles.insertMany([
 { description : "🍏 🍌 🍊" },                // short, 1 🍏
 { description : "🍎 🍌 🍊" },                // short, 1 🍎
 { description : "🍎 🍌 🍊 🍎" },             // larger, 2 🍎
 { description : "🍎 🍌 🍊 🍊 🍊" },          // larger, 1 🍎
 { description : "🍎 🍌 🍊 🌴 🫐 🍈 🍇 🌰" },  // large, 1 🍎
 { description : "🍎 🍎 🍎 🍎 🍎 🍎" },       // large, 6 🍎
 { description : "🍎 🍌" },                 // very short, 1 🍎
 { description : "🍌 🍊 🌴 🫐 🍈 🍇 🌰 🍎" },  // large, 1 🍎
 { description : "🍎 🍎 🍌 🍌 🍌" },          // shorter, 2 🍎
]);

To enable dynamic indexing, I created a MongoDB search index without specifying any particular field names:

db.articles.createSearchIndex("default",
  { mappings: { dynamic: true } }
);

I created the equivalent on PostgreSQL:

DROP TABLE IF EXISTS articles;
CREATE TABLE articles (
    id BIGSERIAL PRIMARY KEY,
    description TEXT
);

INSERT INTO articles(description) VALUES
('🍏 🍌 🍊'),
('🍎 🍌 🍊'),
('🍎 🍌 🍊 🍎'),
('🍎 🍌 🍊 🍊 🍊'),
('🍎 🍌 🍊 🌴 🫐 🍈 🍇 🌰'),
('🍎 🍎 🍎 🍎 🍎 🍎'),
('🍎 🍌'),
('🍌 🍊 🌴 🫐 🍈 🍇 🌰 🍎'),
('🍎 🍎 🍌 🍌 🍌');

Since text search needs multiple index entries for each row, I created a Generalized Inverted Index (GIN) on a to_tsvector() expression, which extracts the tokens to index.

CREATE INDEX articles_fts_idx
  ON articles USING GIN (to_tsvector('simple', description))
;
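To picture why one row yields several index entries, here is a minimal JavaScript sketch of an inverted index, where each distinct token maps to the rows that contain it. The buildInvertedIndex helper and the whitespace tokenization are illustrative assumptions, not how GIN or to_tsvector() are implemented.

```javascript
// Hypothetical helper: builds a map of token -> list of row ids.
// Whitespace splitting stands in for to_tsvector('simple', ...).
function buildInvertedIndex(rows) {
  const index = new Map();
  rows.forEach((description, i) => {
    const rowId = i + 1; // mimic BIGSERIAL ids starting at 1
    const distinctTokens = new Set(description.split(/\s+/).filter(Boolean));
    for (const token of distinctTokens) {
      if (!index.has(token)) index.set(token, []);
      index.get(token).push(rowId);
    }
  });
  return index;
}

const sampleRows = ["🍏 🍌 🍊", "🍎 🍌 🍊", "🍎 🍌 🍊 🍎"];
const invertedIndex = buildInvertedIndex(sampleRows);

// A query for 🍎 OR 🍏 is the union of two posting lists.
const matches = [...new Set([
  ...(invertedIndex.get("🍎") ?? []),
  ...(invertedIndex.get("🍏") ?? []),
])].sort((a, b) => a - b);
console.log(matches);
```

Each row appears once per distinct token, which is why a single description produces several entries in the index.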

MongoDB text search (Lucene BM25)

I use this search index to find articles containing either 🍎 or 🍏 in their descriptions. The query projects the relevance score and sorts the results by it:

db.articles.aggregate([
  { $search: { text: { query: ["🍎", "🍏"], path: "description" }, index: "default" } },
  { $project: { _id: 0, score: { $meta: "searchScore" }, description: 1 } },
  { $sort: { score: -1 } }
]).forEach( i=> print(i.score.toFixed(3).padStart(5, " "),i.description) )

Here are the results, presented in order of best to worst match:

1.024 🍏 🍌 🍊
0.132 🍎 🍎 🍎 🍎 🍎 🍎
0.107 🍎 🍌 🍊 🍎
0.101 🍎 🍎 🍌 🍌 🍌
0.097 🍎 🍌
0.088 🍎 🍌 🍊
0.073 🍎 🍌 🍊 🍊 🍊
0.059 🍎 🍌 🍊 🌴 🫐 🍈 🍇 🌰
0.059 🍌 🍊 🌴 🫐 🍈 🍇 🌰 🍎

All documents were retrieved by this search since each contains a red or green apple. However, they are assigned different scores:

Multiple appearances boost the score: When a document contains the search term more than once, its ranking increases compared to those with only a single appearance. That's why documents featuring several 🍎 are ranked higher than those containing only one.
Rarity outweighs quantity: When a term like 🍎 appears in almost every document (eight out of nine here), it has less impact than a rare term, such as 🍏. Therefore, even if 🍏 only appears once, the document containing it ranks higher than others with multiple 🍎. In this model, rarity carries more weight than mere frequency.
Diminishing returns on term frequency: Each extra occurrence of a term adds less to the relevance score. For instance, increasing 🍎 from one to six times (from 🍎 🍌 to 🍎 🍎 🍎 🍎 🍎 🍎) boosts the score, but not by a factor of six. The effect of term repetition diminishes as the count rises.
Document length matters: A term that appears only once is scored higher in a short document than in a long one. That's why 🍎 🍌 ranks higher than 🍎 🍌 🍊, which itself ranks higher than 🍎 🍌 🍊 🍊 🍊.

MongoDB Atlas Search indexes are powered by Lucene’s BM25 algorithm, a refinement of the classic TF‑IDF model:

Term frequency (TF): More occurrences of a term in a document increase its relevance score, but with diminishing returns.
Inverse document frequency (IDF): Terms that appear in fewer documents receive higher weighting.
Length normalization: Matches in shorter documents contribute more to relevance than the same matches in longer documents.
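These three components can be sketched in a few lines of JavaScript. This is a minimal sketch, not MongoDB's or Lucene's code: it assumes the BM25 formula with the usual defaults (k1 = 1.2, b = 0.75), the IDF form ln(1 + (N - n + 0.5) / (n + 0.5)), and that each emoji is indexed as one token. Under those assumptions it gives the same scores as the nine-document listing above, to three decimals.

```javascript
// Sketch of BM25 scoring (assumed defaults: k1 = 1.2, b = 0.75).
// Whitespace splitting stands in for the search analyzer.
const docs = [
  "🍏 🍌 🍊", "🍎 🍌 🍊", "🍎 🍌 🍊 🍎", "🍎 🍌 🍊 🍊 🍊",
  "🍎 🍌 🍊 🌴 🫐 🍈 🍇 🌰", "🍎 🍎 🍎 🍎 🍎 🍎", "🍎 🍌",
  "🍌 🍊 🌴 🫐 🍈 🍇 🌰 🍎", "🍎 🍎 🍌 🍌 🍌",
].map((d) => d.split(" "));

const K1 = 1.2;
const B = 0.75;
const avgLen = docs.reduce((sum, d) => sum + d.length, 0) / docs.length;

// IDF: terms found in fewer documents get a higher weight.
function idf(term) {
  const n = docs.filter((d) => d.includes(term)).length;
  return Math.log(1 + (docs.length - n + 0.5) / (n + 0.5));
}

// TF saturates as it grows and is penalized in longer documents.
function bm25(doc, terms) {
  return terms.reduce((score, term) => {
    const tf = doc.filter((t) => t === term).length;
    if (tf === 0) return score;
    const lengthNorm = K1 * (1 - B + B * (doc.length / avgLen));
    return score + idf(term) * (tf / (tf + lengthNorm));
  }, 0);
}

const ranked = docs
  .map((d) => ({ text: d.join(" "), score: bm25(d, ["🍎", "🍏"]) }))
  .sort((a, b) => b.score - a.score);
ranked.forEach((r) => console.log(r.score.toFixed(3), r.text));
```

The first two lines printed are 1.024 for 🍏 🍌 🍊 and 0.132 for the six-apple document, which shows how a rare term with one occurrence can outweigh a common term with six.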

To demonstrate the impact of IDF, I added 500 randomly generated documents that do not contain any of the apples I'm searching for.

const fruits = [ "🍐","🍊","🍋","🍌","🍉","🍇","🍓","🫐",         
                 "🥝","🥭","🍍","🥥","🍈","🍅","🥑","🍆",  
                 "🍋","🍐","🍓","🍇","🍈","🥭","🍍","🍑",  
                 "🥝","🫐","🍌","🍉","🥥","🥑","🥥","🍍" ];
function randomFruitSentence(min=3, max=8) {
  const len = Math.floor(Math.random() * (max - min + 1)) + min;
  return Array.from({length: len}, () => fruits[Math.floor(Math.random()*fruits.length)]).join(" ");
}
db.articles.insertMany(
  Array.from({length: 500}, () => ({ description: randomFruitSentence() }))
);

db.articles.aggregate([
  { $search: { text: { query: ["🍎", "🍏"], path: "description" }, index: "default" } },
  { $project: { _id: 0, score: { $meta: "searchScore" }, description: 1 } },
  { $sort: { score: -1 } }
]).forEach( i=> print(i.score.toFixed(3).padStart(5, " "),i.description) )

3.365 🍎 🍎 🍎 🍎 🍎 🍎
3.238 🍏 🍌 🍊
2.760 🍎 🍌 🍊 🍎
2.613 🍎 🍎 🍌 🍌 🍌
2.506 🍎 🍌
2.274 🍎 🍌 🍊
1.919 🍎 🍌 🍊 🍊 🍊
1.554 🍎 🍌 🍊 🌴 🫐 🍈 🍇 🌰
1.554 🍌 🍊 🌴 🫐 🍈 🍇 🌰 🍎

Although the result set is unchanged, the scores have increased and the gap in rarity between 🍎 and 🍏 has narrowed: both terms are now rare in a corpus of more than 500 documents. As a result, 🍎 🍎 🍎 🍎 🍎 🍎 now ranks higher than 🍏 🍌 🍊, since the higher inverse document frequency (IDF) of 🍏 no longer offsets the higher term frequency (TF) of 🍎 in the six-apple document. Crucially, changes made in other documents can influence the score of any given document, unlike in traditional indexes, where changes in one document do not impact others' index entries.
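The narrowing can be checked with a little arithmetic. The sketch below assumes the BM25 IDF form ln(1 + (N - n + 0.5) / (n + 0.5)) and that none of the 500 added documents contains an apple, which holds here because the fruits array has no 🍎 or 🍏. The idfFor name is only for illustration.

```javascript
// Assumed BM25 IDF form: n = documents containing the term,
// N = total number of documents in the index.
const idfFor = (n, N) => Math.log(1 + (N - n + 0.5) / (n + 0.5));

// 🍏 appears in 1 document and 🍎 in 8; N grows from 9 to 509.
for (const N of [9, 509]) {
  const green = idfFor(1, N);
  const red = idfFor(8, N);
  console.log(N, green.toFixed(3), red.toFixed(3), (green / red).toFixed(2));
}

const ratioBefore = idfFor(1, 9) / idfFor(8, 9);
const ratioAfter = idfFor(1, 509) / idfFor(8, 509);
```

With nine documents, 🍏 weighs roughly 11.7 times more than 🍎. With 509 documents the ratio drops to about 1.4, small enough for six occurrences of 🍎 to win.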

PostgreSQL text search (TF only)

Here is the result in PostgreSQL:

SELECT ts_rank_cd(  

        to_tsvector('simple', description)
     ,  
        to_tsquery('simple', '🍎 | 🍏')  

       ) AS score, description  
FROM articles  
WHERE
       to_tsvector('simple', description) 
    @@ 
       to_tsquery('simple', '🍎 | 🍏')  

ORDER BY score DESC;

It retrieves the same documents, but with many having the same score, even with different patterns:

 score |       description
-------+-------------------------
   0.6 | 🍎 🍎 🍎 🍎 🍎 🍎
   0.2 | 🍎 🍌 🍊 🍎
   0.2 | 🍎 🍎 🍌 🍌 🍌
   0.1 | 🍏 🍌 🍊
   0.1 | 🍎 🍌
   0.1 | 🍌 🍊 🌴 🫐 🍈 🍇 🌰 🍎
   0.1 | 🍎 🍌 🍊 🌴 🫐 🍈 🍇 🌰
   0.1 | 🍎 🍌 🍊
   0.1 | 🍎 🍌 🍊 🍊 🍊
(9 rows)

With PostgreSQL text search, only the term frequency (TF) matters here, and it is a direct multiplier of the score: six apples score three times higher than two, and six times higher than one.

Some normalization is available through additional flags:

SELECT ts_rank_cd(
         to_tsvector('simple', description),
         to_tsquery('simple', '🍎 | 🍏')  ,
            0 -- (the default) ignores the document length
         |  1 -- divides the rank by 1 + the logarithm of the document length
    --   |  2 -- divides the rank by the document length
    --   |  4 -- divides the rank by the mean harmonic distance between extents (this is implemented only by ts_rank_cd)
         |  8 -- divides the rank by the number of unique words in document
    --   | 16 -- divides the rank by 1 + the logarithm of the number of unique words in document
    --   | 32 -- divides the rank by itself + 1
       ) AS score,
       description
FROM articles
WHERE to_tsvector('simple', description) @@ to_tsquery('simple', '🍎 | 🍏')
ORDER BY score DESC
;
    score    |       description
-------------+-------------------------
    0.308339 | 🍎 🍎 🍎 🍎 🍎 🍎
 0.055811062 | 🍎 🍎 🍌 🍌 🍌
  0.04551196 | 🍎 🍌
  0.04142233 | 🍎 🍌 🍊 🍎
 0.024044918 | 🍏 🍌 🍊
 0.024044918 | 🍎 🍌 🍊
 0.018603688 | 🍎 🍌 🍊 🍊 🍊
 0.005688995 | 🍎 🍌 🍊 🌴 🫐 🍈 🍇 🌰
 0.005688995 | 🍌 🍊 🌴 🫐 🍈 🍇 🌰 🍎
(9 rows)

This penalizes longer documents and those with more unique terms. Still, unlike IDF, it doesn't take other documents into account.
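Both result sets above can be approximated with a very small model. The following JavaScript sketch is fitted to the outputs shown and is not PostgreSQL's implementation. It assumes each matching token contributes 0.1, that flag 1 divides by ln(length + 1), and that flag 8 divides by the number of unique tokens. The rankModel name is hypothetical.

```javascript
// Simplified model fitted to the listings above, NOT PostgreSQL's code.
// Assumption: every matching token adds 0.1 to the score.
function rankModel(description, terms, flags = 0) {
  const tokens = description.split(" ");
  let score = 0.1 * tokens.filter((t) => terms.includes(t)).length;
  if (flags & 1) score /= Math.log(tokens.length + 1); // length penalty
  if (flags & 8) score /= new Set(tokens).size;        // unique-token penalty
  return score;
}

const apples = ["🍎", "🍏"];
console.log(rankModel("🍎 🍎 🍎 🍎 🍎 🍎", apples));        // no normalization
console.log(rankModel("🍎 🍎 🍎 🍎 🍎 🍎", apples, 1 | 8)); // flags 1 and 8
console.log(rankModel("🍎 🍌", apples, 1 | 8));
```

Nothing in this model depends on the other rows, which is the key difference from BM25: adding or removing documents never changes a row's score.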

PostgreSQL full-text search scoring with ts_rank_cd is based on term frequency and proximity. It does not compute inverse document frequency, so scores do not change as the corpus changes. Normalization flags can penalize long documents or those with many unique terms, but they are length-based adjustments, not a true IDF as found in TF‑IDF or BM25‑style search engines.

ParadeDB with pg_search (Tantivy BM25)

PostgreSQL's popularity is due not only to its features but also to its extensibility and ecosystem. The pg_search extension adds functions and operators that use BM25 indexes built on Tantivy, a Rust-based search library inspired by Lucene. It is easy to test with ParadeDB:

docker run --rm -it paradedb/paradedb bash

POSTGRES_PASSWORD=x docker-entrypoint.sh postgres &

psql -U postgres

The extension is installed in version 0.18.4:

postgres=# \dx
                                        List of installed extensions
          Name          | Version |   Schema   |                        Description
------------------------+---------+------------+------------------------------------------------------------
 fuzzystrmatch          | 1.2     | public     | determine similarities and distance between strings
 pg_cron                | 1.6     | pg_catalog | Job scheduler for PostgreSQL
 pg_ivm                 | 1.9     | pg_catalog | incremental view maintenance on PostgreSQL
 pg_search              | 0.18.4  | paradedb   | pg_search: Full text search for PostgreSQL using BM25
 plpgsql                | 1.0     | pg_catalog | PL/pgSQL procedural language
 postgis                | 3.6.0   | public     | PostGIS geometry and geography spatial types and functions
 postgis_tiger_geocoder | 3.6.0   | tiger      | PostGIS tiger geocoder and reverse geocoder
 postgis_topology       | 3.6.0   | topology   | PostGIS topology spatial types and functions
 vector                 | 0.8.0   | public     | vector data type and ivfflat and hnsw access methods
(9 rows)

I created the same table and inserted the same rows as in the PostgreSQL example above, then created the BM25 index:

CREATE INDEX search_idx ON articles
       USING bm25 (id, description)
       WITH (key_field='id')
;

We can query using the @@@ operator and rank with paradedb.score(id). PostgreSQL’s built‑in ts_rank_cd scores each row using only that row's own statistics. In contrast, paradedb.score() uses a corpus-wide IDF and BM25 length normalization, as implemented by Tantivy, so adding unrelated documents can change the scores.

SELECT description, paradedb.score(id) AS score
FROM articles
WHERE description @@@ '🍎' OR description @@@ '🍏'
ORDER BY score DESC, description;

 description | score
-------------+-------
(0 rows)

The result is empty. Using emoji as terms can lead to inconsistent tokenization results, so I replaced them with text labels instead:

UPDATE articles SET description 
 = replace(description, '🍎', 'Gala');
UPDATE articles SET description 
 = replace(description, '🍏', 'Granny Smith');
UPDATE articles SET description 
 = replace(description, '🍊', 'Orange');
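One way to picture the empty result is a tokenizer that keeps only runs of letters and digits. The JavaScript sketch below rests on that assumption: it is a guess consistent with the behavior observed above, not ParadeDB's documented tokenizer, and alnumTokens is a hypothetical name.

```javascript
// Assumption: the tokenizer keeps only runs of letters and digits,
// lowercased. Emojis are symbols, so they produce no tokens.
function alnumTokens(text) {
  return (text.match(/[\p{L}\p{N}]+/gu) ?? []).map((t) => t.toLowerCase());
}

console.log(alnumTokens("🍎 🍌 🍊"));       // emoji-only description
console.log(alnumTokens("Gala 🍌 Orange")); // after the UPDATE statements
```

Under this assumption the emoji-only description yields no tokens at all, while the updated one yields two terms, so only the text labels would be searchable.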

This time, the query returns rows, and the score takes into account the term frequency within the document (TF), the term’s rarity across the entire indexed corpus (IDF), and a length normalization factor that prevents longer documents from having an unfair advantage:

SELECT description, paradedb.score(id) AS score
FROM articles
WHERE description @@@ 'Gala' OR description @@@ 'Granny Smith'
ORDER BY score DESC, description;

          description          |   score
-------------------------------+------------
 Granny Smith 🍌 Orange        |  3.1043208
 Gala Gala Gala Gala Gala Gala | 0.79529095
 Gala Gala 🍌 🍌 🍌            |  0.7512194
 Gala 🍌                       | 0.69356775
 Gala 🍌 Orange Gala           | 0.63589364
 Gala 🍌 Orange                |  0.5195716
 Gala 🍌 Orange 🌴 🫐 🍈 🍇   |  0.5195716
 🍌 Orange 🌴 🫐 🍈 🍇   Gala |  0.5195716
 Gala 🍌 Orange Orange Orange  | 0.34597924
(9 rows)

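The ranking looks similar to the MongoDB result, with a few differences. In MongoDB, 🍎 🍌 🍊 🍎 (two matches) ranks above the shorter 🍎 🍌 (one match), while in ParadeDB Gala 🍌 ranks above Gala 🍌 Orange Gala. Lucene and Tantivy may weigh term frequency and length normalization slightly differently. The remaining emojis probably also play a role: since the emoji query returned no rows, they are likely not indexed as terms, which would make documents look shorter to Tantivy and would explain why Gala 🍌 Orange gets the same score as the two eight-fruit documents.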

Here is the execution plan in ParadeDB:

EXPLAIN(ANALYZE, BUFFERS, VERBOSE)
SELECT description, paradedb.score(id) AS score
FROM articles
WHERE description @@@ 'Gala' OR description @@@ 'Granny Smith'
ORDER BY score DESC, description
;

 Gather Merge  (cost=1010.06..1010.68 rows=5 width=31) (actual time=5.893..8.237 rows=8 loops=1)
   Output: description, (score(id))
   Workers Planned: 2
   Workers Launched: 2
   Buffers: shared hit=333
   ->  Sort  (cost=10.04..10.05 rows=3 width=31) (actual time=0.529..0.540 rows=3 loops=3)
         Output: description, (score(id))
         Sort Key: (score(articles.id)) DESC, articles.description
         Sort Method: quicksort  Memory: 25kB
         Buffers: shared hit=306
         Worker 0:  actual time=0.548..0.558 rows=0 loops=1
           Sort Method: quicksort  Memory: 25kB
           Buffers: shared hit=64
         Worker 1:  actual time=0.596..0.607 rows=0 loops=1
           Sort Method: quicksort  Memory: 25kB
           Buffers: shared hit=64
         ->  Parallel Custom Scan (ParadeDB Scan) on public.articles  (cost=10.00..10.02 rows=3 width=31) (actual time=0.367..0.444 rows=3 loops=3)
               Output: description, score(id)
               Table: articles
               Index: search_idx
               Segment Count: 5
               Heap Fetches: 8
               Virtual Tuples: 0
               Invisible Tuples: 0
               Parallel Workers: {"-1":{"query_count":0,"claimed_segments":[{"id":"a17b19a2","deleted_docs":0,"max_doc":9},{"id":"3fa71653","deleted_docs":6,"max_doc":6},{"id":"3c243f8e","deleted_docs":1,"max_doc":1},{"id":"badbcd7e","deleted_docs":8,"max_doc":8},{"id":"add79d5d","deleted_docs":9,"max_doc":9}]}}
               Exec Method: NormalScanExecState
               Scores: true
               Tantivy Query: {"boolean":{"should":[{"with_index":{"query":{"parse_with_field":{"field":"description","query_string":"Gala","lenient":null,"conjunction_mode":null}}}},{"with_index":{"query":{"parse_with_field":{"field":"description","query_string":"Granny Smith","lenient":null,"conjunction_mode":null}}}}]}}
               Buffers: shared hit=216
               Worker 0:  actual time=0.431..0.441 rows=0 loops=1
                 Buffers: shared hit=19
               Worker 1:  actual time=0.447..0.457 rows=0 loops=1
                 Buffers: shared hit=19

This PostgreSQL plan shows ParadeDB executing a parallel full-text search with Tantivy. The Parallel Custom Scan node issues a BM25 query (Gala OR "Granny Smith") to the segmented Tantivy index. Each participating process searches the segments it claims, computes scores, fetches the matching descriptions, and sorts them locally. The Gather Merge then combines these sorted streams into a single ranked list. Since search and scoring are done within Tantivy, can be spread across CPU cores, and all pages were found in PostgreSQL's shared buffers (shared hit, no reads), the query is quick and efficient.

In the execution plan, the Tantivy query closely resembles a MongoDB search query. Specifically, "boolean" in Tantivy is equivalent to "compound" in MongoDB, "should" matches "should", and "parse_with_field.field" is similar to "path".
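To make the correspondence concrete, here is a sketch of a MongoDB pipeline shaped like the Tantivy query from the plan. The buildShouldSearch helper is hypothetical, and the pipeline assumes the same "default" index and description field used earlier. It only builds the pipeline object, so it can be inspected without a running cluster.

```javascript
// Hypothetical helper: builds a $search pipeline with one "should"
// clause per term, mirroring the Tantivy boolean/should query.
function buildShouldSearch(terms, path, index = "default") {
  return [
    {
      $search: {
        index,
        compound: {
          should: terms.map((query) => ({ text: { query, path } })),
        },
      },
    },
    { $project: { _id: 0, description: 1, score: { $meta: "searchScore" } } },
    { $sort: { score: -1 } },
  ];
}

const pipeline = buildShouldSearch(["Gala", "Granny Smith"], "description");
console.log(JSON.stringify(pipeline, null, 2));
// In mongosh, this would be run with: db.articles.aggregate(pipeline)
```

Each entry under should plays the role of one with_index clause in the Tantivy query, and path plays the role of field.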

PostgreSQL’s built-in search only provides basic, local term frequency scoring. To get a full-featured text search that can be used in an application's search boxes, it can be extended with third-party tools like ParadeDB's pg_search.

Conclusion

Relevance scoring in text search can differ widely between systems because each uses its own ranking algorithms and analyzers. To better visualize my results in these tests, I used emojis and opted for the simplest definitions. I selected PostgreSQL's to_tsvector('simple') configuration to prevent language-specific processing, while for MongoDB Atlas Search, I used the default dynamic mapping.

MongoDB Atlas Search (and now MongoDB Community Edition) uses Lucene’s BM25 algorithm, combining:

Term frequency (TF): Frequent terms in a document boost scores, but with diminishing returns.
Inverse document frequency (IDF): Rare terms across the corpus get higher weight.
Length normalization: Matches in shorter documents are weighted more than the same matches in longer ones.

PostgreSQL’s full-text search (ts_rank_cd()) evaluates only term frequency and position, and does not use corpus-wide metrics like IDF. PostgreSQL offers a modular approach instead: extensions such as ParadeDB’s pg_search add advanced ranking algorithms like BM25, but they require extra configuration and are not always available on managed platforms. MongoDB provides built‑in, BM25‑based, full‑text search in both Atlas and the Community Edition.

The next post provides more insight into the internals and how the score is calculated.