Playing With Search: Which Clause(s) Matched?

#mongodb #search #index #programming

This article is part of a series. Please see the complete series above.

Here's the playground for this post.

With this custom scoring trick, clauses that match a query can be determined with a little binary math.

Given these documents:

[
  {
    _id: 1,
    name: "Red"
  },
  {
    _id: 2,
    name: "Red Green"
  },
  {
    _id: 3,
    name: "Red Green Blue"
  }
]

Let's search for documents that match any of the terms "red", "green", or "blue", in a way that lets us determine which terms matched:

[
  {
    $search: {
      index: "default",
      compound: {
        should: [
          {
            text: {
              path: "name",
              query: "red",
              score: {
                constant: { value: 1 }
              }
            }
          },
          {
            text: {
              path: "name",
              query: "green",
              score: {
                constant: { value: 2 }
              }
            }
          },
          {
            text: {
              path: "name",
              query: "blue",
              score: {
                constant: { value: 4 }
              }
            }
          }
        ]
      }
    }
  },
  {
    $addFields: {
      score: { $meta: "searchScore" }
    }
  }
]

This query overrides the score for each clause: matches for "red" are scored 1.0, matches for "green" are scored 2.0, and matches for "blue" are scored 4.0. These values are the first three powers of two, that is, the decimal values of successive binary digit positions. In binary, 1.0 is represented as 001, 2.0 is 010, and 4.0 is 100. The compound operator sums the scores of the clauses that matched, and because each clause has a unique digit position, we can reverse engineer the resulting relevancy searchScore to determine which clauses matched.
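If the list of terms grows, writing the constants by hand gets error-prone. Here's a minimal sketch of generating the clauses instead, so that each term gets the next power of two. The helper name `buildShouldClauses` is made up for this post and isn't part of any MongoDB API, and the sketch assumes the summed score stays an exact whole number, which seems safe for a handful of clauses.

```javascript
// Hypothetical helper (not a MongoDB API): build one `text` clause per term,
// giving the clause at position i a constant score of 2^i.
function buildShouldClauses(path, terms) {
  return terms.map((term, i) => ({
    text: {
      path,
      query: term,
      // 2 ** i yields 1, 2, 4, ... so each clause owns one binary digit.
      score: { constant: { value: 2 ** i } }
    }
  }));
}

// Same shape as the pipeline above, with the clauses generated.
const pipeline = [
  {
    $search: {
      index: "default",
      compound: {
        should: buildShouldClauses("name", ["red", "green", "blue"])
      }
    }
  },
  {
    $addFields: {
      score: { $meta: "searchScore" }
    }
  }
];
```

The order of the terms array matters here: position 0 maps to the lowest binary digit, so any decoding step needs the same ordering.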

Let's analyze the search results:

[
  {
    "_id": 3,
    "name": "Red Green Blue",
    "score": 7
  },
  {
    "_id": 2,
    "name": "Red Green",
    "score": 3
  },
  {
    "_id": 1,
    "name": "Red",
    "score": 1
  }
]

A score of 7.0 is binary 111, meaning every clause matched. The score of 3.0 is binary 011, meaning that the first two clauses ("red" and "green") matched, but not the third ("blue"). The score of 1.0 is binary 001, meaning only the first clause ("red") matched.
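The same decoding can be done in application code once the results come back. The following is a rough sketch, not part of the original query: `matchedClauses` is a hypothetical function name, and it assumes the terms are listed in the same order as the clauses, with scores of 1, 2, 4, and so on.

```javascript
// Hypothetical decoder: given a searchScore and the terms in clause order,
// return the terms whose binary digit is set in the score.
function matchedClauses(score, terms) {
  // The score arrives as a floating-point number; round it to an integer
  // before applying bitwise operators.
  const bits = Math.round(score);
  return terms.filter((_, i) => (bits & (1 << i)) !== 0);
}

const clauseTerms = ["red", "green", "blue"];

console.log(matchedClauses(7, clauseTerms)); // [ 'red', 'green', 'blue' ]
console.log(matchedClauses(3, clauseTerms)); // [ 'red', 'green' ]
console.log(matchedClauses(1, clauseTerms)); // [ 'red' ]
```

JavaScript bitwise operators work on 32-bit integers, so this sketch only suits a small number of clauses.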

See more about adjusting the scores.
