Atlas Search scoring calculation and why Lucene scores differ from Elasticsearch and pg_(text)search

#mongodb #elasticsearch #postgres #text

With @james_blackwoodsewell_58, we compared the BM25 text search scores of MongoDB Atlas (Lucene), Elasticsearch (Lucene), and ParadeDB (Tantivy). All three return the same ordering, but MongoDB Atlas consistently shows scores that are lower by a factor of 2.2:

Revisiting "Text Search With MongoDB (BM25 TF-IDF) and PostgreSQL"

Back in October Franck Pachot from MongoDB (love your work) wrote a post comparing text search in MongoDB and PostgreSQL (with both the built-in tsvector and ParadeDB's pg_search extension). I'm not going to recap his whole post, but basically Mongo seemed to behave exactly how it should returning B

linkedin.com

This was an opportunity for me to look at the score details, which break down how the score is calculated.

Test case

I've built the same test case as in my previous blog post:

db.articles.drop();
db.articles.deleteMany({});
db.articles.insertMany([
 { description : "🍏 🍌 🍊" },                // short, 1 🍏
 { description : "🍎 🍌 🍊" },                // short, 1 🍎
 { description : "🍎 🍌 🍊 🍎" },             // larger, 2 🍎
 { description : "🍎 🍌 🍊 🍊 🍊" },          // larger, 1 🍎
 { description : "🍎 🍌 🍊 🌴 🫐 🍈 🍇 🌰" },  // large, 1 🍎
 { description : "🍎 🍎 🍎 🍎 🍎 🍎" },       // large, 6 🍎
 { description : "🍎 🍌" },                 // very short, 1 🍎
 { description : "🍌 🍊 🌴 🫐 🍈 🍇 🌰 🍎" },  // large, 1 🍎
 { description : "🍎 🍎 🍌 🍌 🍌" },          // shorter, 2 🍎
]);
db.articles.createSearchIndex("default",
  { mappings: { dynamic: true } }
);

Score with details

I ran the same query, adding scoreDetails: true to the search stage and scoreDetails: { $meta: "searchScoreDetails" } to the projection stage:


db.articles.aggregate([
  {
    $search: {
      text: {  query: ["🍎", "🍏"],  path: "description"  },
      index: "default",
      scoreDetails: true
    }
  },
  {  $project: {  
        _id: 0,  description: 1,  
        score: { $meta: "searchScore" },  
        scoreDetails: { $meta: "searchScoreDetails" }  }  },
  { $sort: { score: -1 } }  ,
  { $limit: 1 }
])

Here is the result:

mdb> db.articles.aggregate([
...   {
...     $search: {
...       text: {  query: ["🍎", "🍏"],  path: "description"  },
...       index: "default",
...       scoreDetails: true
...     }
...   },
...   {  $project: {  _id: 0,  description: 1,  score: { $meta: "searchScore" },  scoreDetails: { $meta: "searchScoreDetails" }  }  },
...   { $sort: { score: -1 } }  ,
...   { $limit: 1 }
... ])
[
  {
    description: '🍏 🍌 🍊',
    score: 1.0242118835449219,
    scoreDetails: {
      value: 1.0242118835449219,
      description: 'sum of:',
      details: [
        {
          value: 1.0242118835449219,
          description: '$type:string/description:🍏 [BM25Similarity], result of:',
          details: [
            {
              value: 1.0242118835449219,
              description: 'score(freq=1.0), computed as boost * idf * tf from:',
              details: [
                {
                  value: 1.8971199989318848,
                  description: 'idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:',
                  details: [
                    {
                      value: 1,
                      description: 'n, number of documents containing term',
                      details: []
                    },
                    {
                      value: 9,
                      description: 'N, total number of documents with field',
                      details: []
                    }
                  ]
                },
                {
                  value: 0.5398772954940796,
                  description: 'tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:',
                  details: [
                    {
                      value: 1,
                      description: 'freq, occurrences of term within document',
                      details: []
                    },
                    {
                      value: 1.2000000476837158,
                      description: 'k1, term saturation parameter',
                      details: []
                    },
                    {
                      value: 0.75,
                      description: 'b, length normalization parameter',
                      details: []
                    },
                    {
                      value: 3,
                      description: 'dl, length of field',
                      details: []
                    },
                    {
                      value: 4.888888835906982,
                      description: 'avgdl, average length of field',
                      details: []
                    }
                  ]
                }
              ]
            }
          ]
        }
      ]
    }
  }
]

Everything is there. Here is the scoring breakdown for "🍏 🍌 🍊", which produced a score of 1.0242118835449219.

IDF calculation (inverse document frequency)

Statistics from the search result:

Number of documents containing the term: n = 1
Total number of documents with this field: N = 9

idf = log(1 + (N - n + 0.5) / (n + 0.5))
    = log(1 + (9 - 1 + 0.5) / (1 + 0.5))
    = log(6.666666666666667)
    ≈ 1.8971199989318848
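
To double-check the arithmetic outside the database, here is a minimal sketch in plain JavaScript (runnable in Node.js or mongosh). The function name idf is mine, and I assume that log in the score details means the natural logarithm, which is what the numbers above are consistent with. Atlas Search reports single-precision values, so expect agreement to about six or seven digits rather than an exact match.

```javascript
// BM25 inverse document frequency, as described in the scoreDetails output.
// n: number of documents containing the term
// N: total number of documents with the field
// Assumption: "log" is the natural logarithm.
function idf(n, N) {
  return Math.log(1 + (N - n + 0.5) / (n + 0.5));
}

// 🍏 appears in 1 of the 9 documents.
const idfGreenApple = idf(1, 9);
console.log(idfGreenApple); // close to the 1.8971199989318848 reported above
```

A rarer term gets a higher idf: with the same nine documents, idf(8, 9) for 🍎, which appears in eight of them, is much smaller than idf(1, 9).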

TF calculation (term frequency)

Parameters are the Lucene defaults:

Term saturation parameter: k1 = 1.2000000476837158
Length normalization parameter: b = 0.75

Document field statistics:

Length of the field in this document: dl = 3
Average length of the field: avgdl = 44 / 9 ≈ 4.888888835906982
Occurrences of the term in this document: freq = 1

tf = freq / (freq + k1 * (1 - b + b * dl / avgdl)) 
   = 1 / (1 + 1.2000000476837158 × (0.25 + 0.75 × (3 / 4.888888835906982))) 
   ≈ 0.5398772954940796
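
The same check for the term frequency component, as a small sketch in plain JavaScript. The function name tf and its default arguments are my own choices for illustration. The defaults k1 = 1.2 and b = 0.75 are the values shown in the score details (k1 appears there as its single-precision representation).

```javascript
// BM25 term frequency component with length normalization,
// following the formula printed in the scoreDetails output.
// freq:  occurrences of the term in the document
// dl:    length of the field in this document
// avgdl: average length of the field across documents
function tf(freq, dl, avgdl, k1 = 1.2, b = 0.75) {
  return freq / (freq + k1 * (1 - b + b * dl / avgdl));
}

// The test case has 44 tokens across 9 documents.
const avgdl = 44 / 9;
const tfGreenApple = tf(1, 3, avgdl);
console.log(tfGreenApple); // close to the 0.5398772954940796 reported above
```

With the same freq, a longer field gives a lower tf: tf(1, 8, avgdl) is smaller than tf(1, 3, avgdl), which is why a short document scores higher than a large one that has the same single occurrence.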

Final score

Parameter:

Boost: 1.0

score = boost × idf × tf 
      = 1.0 × 1.8971199989318848 × 0.5398772954940796 
      ≈ 1.0242118835449219
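
Instead of copying the numbers by hand, the score can also be recomputed directly from the score details document. The following is only a sketch: findValue and recomputeScore are hypothetical helpers, not part of any MongoDB API, and they assume the node shape (value, description, details) and the description prefixes shown in the output above.

```javascript
// Depth-first search for the first node whose description starts with prefix.
function findValue(node, prefix) {
  if (node.description.startsWith(prefix)) return node.value;
  for (const child of node.details || []) {
    const found = findValue(child, prefix);
    if (found !== undefined) return found;
  }
  return undefined;
}

// Recompute boost * idf * tf from the parameters found under one
// "score(freq=...)" node of the scoreDetails tree.
function recomputeScore(scoreNode, boost = 1.0) {
  const n = findValue(scoreNode, "n,");
  const N = findValue(scoreNode, "N,");
  const freq = findValue(scoreNode, "freq,");
  const k1 = findValue(scoreNode, "k1,");
  const b = findValue(scoreNode, "b,");
  const dl = findValue(scoreNode, "dl,");
  const avgdl = findValue(scoreNode, "avgdl,");
  const idf = Math.log(1 + (N - n + 0.5) / (n + 0.5));
  const tf = freq / (freq + k1 * (1 - b + b * dl / avgdl));
  return boost * idf * tf;
}

// The "score(freq=1.0)" node copied from the result above.
const scoreNode = {
  value: 1.0242118835449219,
  description: "score(freq=1.0), computed as boost * idf * tf from:",
  details: [
    { value: 1.8971199989318848,
      description: "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
      details: [
        { value: 1, description: "n, number of documents containing term", details: [] },
        { value: 9, description: "N, total number of documents with field", details: [] }
      ] },
    { value: 0.5398772954940796,
      description: "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
      details: [
        { value: 1, description: "freq, occurrences of term within document", details: [] },
        { value: 1.2000000476837158, description: "k1, term saturation parameter", details: [] },
        { value: 0.75, description: "b, length normalization parameter", details: [] },
        { value: 3, description: "dl, length of field", details: [] },
        { value: 4.888888835906982, description: "avgdl, average length of field", details: [] }
      ] }
  ]
};

const recomputed = recomputeScore(scoreNode);
console.log(recomputed, scoreNode.value); // the two values agree to about six digits
```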

This confirms that Atlas Search uses the same scoring as Lucene's BM25Similarity: https://github.com/apache/lucene/blob/releases/lucene/10.3.2/lucene/core/src/java/org/apache/lucene/search/similarities/BM25Similarity.java#L183

What about Elasticsearch and Tantivy?

Eight years ago, Lucene removed the (k1 + 1) factor from the numerator of the tf formula in LUCENE-8563. Because this factor is a constant, removing it does not change the ordering of results, but for k1 = 1.2 it divides every score by 2.2 from that version onward. Tantivy and Elasticsearch apparently still use the old formula, while Atlas Search uses the updated one, which explains the observed differences in scoring.
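
To illustrate where the factor of 2.2 comes from, here is a sketch in plain JavaScript that puts the two variants side by side. The function names are mine, and the legacy variant is written from the description of LUCENE-8563 (the same formula with an extra (k1 + 1) in the numerator), not copied from the Elasticsearch or Tantivy source code.

```javascript
const k1 = 1.2;
const b = 0.75;

// Length-normalized part of the denominator, shared by both variants.
function lengthNorm(dl, avgdl) {
  return k1 * (1 - b + b * dl / avgdl);
}

// Current Lucene formula, as shown in the Atlas Search score details.
function tfCurrent(freq, dl, avgdl) {
  return freq / (freq + lengthNorm(dl, avgdl));
}

// Older formula, assumed here to differ only by the (k1 + 1) numerator factor.
function tfLegacy(freq, dl, avgdl) {
  return (freq * (k1 + 1)) / (freq + lengthNorm(dl, avgdl));
}

// Same document as above: freq = 1, dl = 3, avgdl = 44 / 9.
const ratio = tfLegacy(1, 3, 44 / 9) / tfCurrent(1, 3, 44 / 9);
console.log(ratio); // approximately 2.2, that is k1 + 1
```

The ratio does not depend on freq, dl, or avgdl, so every score is scaled by the same constant and the ranking is unaffected.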

Conclusion

MongoDB Atlas Search indexes are built on Lucene and use its parameters and scoring formulas. When you compare Atlas Search with other text search engines that use the older Lucene scoring formula, you may see scores that differ by a factor of k1 + 1, which is 2.2 with the default k1 = 1.2. However, this has no practical impact: scores are only used to order results, and multiplying every score by the same constant leaves the relative ranking of documents unchanged.

Text search scores can seem magical, but they are deterministic and based on open-source formulas. In MongoDB, you can include the score details option in a text search query to inspect all the parameters and formulas behind the score.