Building a search engine that is typo-tolerant, effective, and efficient is extremely difficult. Even when the desired item is in the database, a single typographical error can cause the search to fail. Typesense can save a lot of time and effort by eliminating the need to build a search engine from the ground up, so users can search your app successfully and come away with a positive experience. Typesense is a free, open-source, typo-tolerant search engine for developers that aims to cut down the time it takes to deliver effective and efficient search.
Typesense lets you rank search results according to your own preferences. This article goes over how Typesense displays and ranks search results.
Typesense uses a simple tie-breaking sorting algorithm that relies on two inputs: the text match score, which is exposed as a special _text_match field, and user-defined indexed numerical fields such as popularity, rating, or score. In addition, the latest v0.23.0.rc17 version adds a useful feature: results can now also be ranked by user-defined indexed string fields, for example name.
Typesense calculates a _text_match score based on the criteria below to rank documents by text relevance:
- Frequency: The number of tokens shared by the search query and a text field. Documents with more overlapping tokens will be ranked higher than those with fewer.
- Edit distance: If a query token isn't found, Typesense searches for tokens that are within num_typos characters of the query token. Documents that match the query tokens exactly are ranked higher than those that match with a larger edit distance.
- Proximity: How close together the query tokens appear in a field, that is, whether they appear as a contiguous phrase or are mixed in with other tokens. Documents in which the query tokens are right next to each other in a text field are ranked higher than documents in which they are far apart.
- Ordering of fields in query_by: A document that matches on a field earlier in the list of query_by fields is considered more relevant than one that matches on a field later in the list.
- Field weights specified in query_by_weights: A document that matches on a field with a higher weight is considered more relevant than one that matches on a field with a lower weight.
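As a rough sketch, the last two criteria map onto two search parameters. The Python helper below (a hypothetical build_search_params, not part of any Typesense client) lists fields most important first to form query_by and lines up one weight per field in query_by_weights; the "title" and "description" fields and the "movies" collection are made-up examples.

```python
from typing import Dict, List, Tuple


def build_search_params(q: str, fields_with_weights: List[Tuple[str, int]]) -> Dict[str, str]:
    """Build Typesense-style search parameters from (field, weight) pairs.

    Hypothetical helper: fields are listed most important first, because a
    match on an earlier query_by field counts as more relevant, and each
    weight lines up positionally with its field in query_by_weights.
    """
    fields = [field for field, _ in fields_with_weights]
    weights = [str(weight) for _, weight in fields_with_weights]
    return {
        "q": q,
        "query_by": ",".join(fields),
        "query_by_weights": ",".join(weights),
    }


params = build_search_params("ghost", [("title", 2), ("description", 1)])
print(params)
# With the official Python client, a dict like this would be passed to
# client.collections["movies"].documents.search(params)  ("movies" is hypothetical).
```

Keeping fields and weights together as pairs makes it hard for the two comma-separated lists to drift out of sync.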
Tie-Breaking Based Search Ranking
In some cases, many documents contain the exact tokens in a search query, so their _text_match scores are identical. The tie can then be broken with user-defined indexed numerical and string fields; you can specify up to two such fields for ranking. Say we're searching for a movie with the word "Ghost" in the title. If several movies contain the same exact words, their text match scores will all be the same, and up to two additional sort_by fields can be specified to break the tie.
Sorting by text match score, then IMDb rating, then release date orders the results as follows:
- All matching records are first sorted by their _text_match score.
- Documents with the same text match score are then sorted by IMDb rating.
- If there is still a tie, they are sorted by the date the movie was published.
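This three-level ordering is written in Typesense as a sort_by value of comma-separated field:direction pairs. To make the effect concrete, the sketch below also reproduces the same ordering locally on made-up hits. The field names imdb_rating and release_date are assumptions (the text only says "IMDb rating" and "date published"), release_date is assumed to be a numeric Unix timestamp, and the local sort illustrates the tie-breaking idea rather than Typesense's internal code.

```python
from typing import Dict, List

# Typesense expresses the ordering as a sort_by string of field:direction pairs.
# "imdb_rating" and "release_date" are hypothetical field names; release_date is
# assumed to be an indexed numeric field holding a Unix timestamp.
SORT_BY = "_text_match:desc,imdb_rating:desc,release_date:desc"


def tie_break_sort(hits: List[Dict]) -> List[Dict]:
    """Order hits locally the way SORT_BY describes.

    Python compares tuples element by element, so imdb_rating only matters when
    _text_match ties, and release_date only when both earlier values tie.
    reverse=True makes every key descending.
    """
    return sorted(
        hits,
        key=lambda h: (h["_text_match"], h["imdb_rating"], h["release_date"]),
        reverse=True,
    )


# Made-up scores for illustration only.
hits = [
    {"title": "Ghost Town", "_text_match": 90, "imdb_rating": 6.7, "release_date": 1222000000},
    {"title": "Ghost", "_text_match": 100, "imdb_rating": 7.1, "release_date": 647000000},
    {"title": "Ghost Rider", "_text_match": 90, "imdb_rating": 5.3, "release_date": 1171000000},
    {"title": "Ghost Ship", "_text_match": 90, "imdb_rating": 5.3, "release_date": 1035000000},
]

for hit in tie_break_sort(hits):
    print(hit["title"])
```

"Ghost" wins outright on text match; among the three titles tied at 90, rating decides, and the two tied on rating fall back to the newer release date.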
Exact Match Ranking
By default, Typesense considers a document to be the most relevant and prioritizes it if the search query exactly matches a field value. However, there may be times when this isn't the best course of action. When searching, set prioritize_exact_match=false to turn this exact results match ranking feature off.
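A minimal sketch of switching the exact-match boost off. Only the prioritize_exact_match parameter name comes from the text; the search_params helper and the "title" field are hypothetical, and the lowercase "true"/"false" strings are chosen because that is how boolean values appear in HTTP query strings.

```python
from typing import Dict


def search_params(q: str, query_by: str, prioritize_exact_match: bool = True) -> Dict[str, str]:
    """Hypothetical helper: build search parameters with the exact-match boost on or off.

    The flag is written as the lowercase strings "true"/"false" used in
    HTTP query strings.
    """
    return {
        "q": q,
        "query_by": query_by,
        "prioritize_exact_match": "true" if prioritize_exact_match else "false",
    }


params = search_params("ghost", "title", prioritize_exact_match=False)
print(params)
```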
Default Search Ranking Order
If you don't specify a sort_by parameter in your search request, documents are ranked first by their _text_match score, then by the values of the default sorting field specified in the collection's schema, and finally by document insertion order.
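The default order can be approximated locally as a three-part sort key. This is a sketch on made-up data: "popularity" stands in for whatever default sorting field the schema declares, insertion_index is a hypothetical field recording insertion order, and treating earlier-inserted documents as ranking first is an assumption, since the text does not say which direction insertion order is applied in.

```python
from typing import Dict, List


def default_rank(hits: List[Dict]) -> List[Dict]:
    """Approximate the default ranking when no sort_by is given.

    1. Higher _text_match first.
    2. Then higher value of the schema's default sorting field
       ("popularity" here, a hypothetical field name).
    3. Then insertion order; earlier-inserted first is an assumption,
       since the article does not say which direction is used.
    """
    return sorted(hits, key=lambda h: (-h["_text_match"], -h["popularity"], h["insertion_index"]))


hits = [
    {"id": "a", "_text_match": 50, "popularity": 10, "insertion_index": 0},
    {"id": "b", "_text_match": 80, "popularity": 1, "insertion_index": 1},
    {"id": "c", "_text_match": 50, "popularity": 10, "insertion_index": 2},
    {"id": "d", "_text_match": 50, "popularity": 20, "insertion_index": 3},
]

print([h["id"] for h in default_rank(hits)])
```

Document "b" leads on text match alone; "a" and "c" tie on both score and popularity, so only insertion order separates them.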
Strict Ordering / Hard Sorting Search Results
If you want to sort documents primarily by an indexed numerical or string field, such as rating, name, or genre, simply move the text match score criterion to the end of the sort_by list, so that it only breaks ties.
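Here is a sketch of the hard-sort case, assuming a hypothetical imdb_rating field and made-up scores. The sort_by value puts the rating first and _text_match last; the local sort reproduces that order so you can see a weak text match outrank a strong one when its rating is higher.

```python
from typing import Dict, List

# Hard sort: the rating field leads and _text_match is only a tie-breaker.
# "imdb_rating" is a hypothetical field name.
SORT_BY = "imdb_rating:desc,_text_match:desc"


def hard_sort(hits: List[Dict]) -> List[Dict]:
    """Local illustration: rating decides first; text match only breaks rating ties."""
    return sorted(hits, key=lambda h: (-h["imdb_rating"], -h["_text_match"]))


# Made-up scores for illustration only.
hits = [
    {"title": "Ghost Story", "imdb_rating": 8.0, "_text_match": 10},
    {"title": "Ghost", "imdb_rating": 6.0, "_text_match": 100},
    {"title": "Ghost World", "imdb_rating": 8.0, "_text_match": 50},
]

print([h["title"] for h in hard_sort(hits)])
```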
Promoting or Hiding Search Results
You can pin or hide specific records at specific ranking positions based on their IDs. One way is to set up Overrides (also known as Curation or customization) for a given search query; another is to pass the pinned_hits and hidden_hits search parameters dynamically at search time. For example, if someone searches for a certain product, you can set up an override that pins a specific product with a good deal to the top of the search results.
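A sketch of passing both curation parameters at search time. The encoding follows the "id:position" pairs that pinned_hits expects (positions starting at 1) and the plain comma-separated ID list for hidden_hits; the curation_params helper, the product IDs, and the "name" field are hypothetical.

```python
from typing import Dict, List, Optional


def curation_params(
    q: str,
    query_by: str,
    pinned: Optional[Dict[str, int]] = None,
    hidden: Optional[List[str]] = None,
) -> Dict[str, str]:
    """Hypothetical helper that adds pinned_hits / hidden_hits to a search.

    pinned maps a document id to its 1-based position and is encoded as
    "id:position" pairs; hidden is a plain comma-separated list of ids.
    """
    params = {"q": q, "query_by": query_by}
    if pinned:
        ordered = sorted(pinned.items(), key=lambda item: item[1])
        params["pinned_hits"] = ",".join(f"{doc_id}:{pos}" for doc_id, pos in ordered)
    if hidden:
        params["hidden_hits"] = ",".join(hidden)
    return params


params = curation_params("headphones", "name", pinned={"sku-42": 1}, hidden=["sku-7", "sku-9"])
print(params)
```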
Another common application of pinning is e-commerce marketing and promotion, where a vendor or marketer wants to curate exactly which products appear next to each other for a given product category. You can use the pinned_hits parameter to specify which records should appear in which positions on each category page the user is viewing. You can also let internal users edit the category page -> pinned_hits mapping in a CMS, and then have your application pull down this mapping when a specific category page is rendered.
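The category mapping could look something like the sketch below. CATEGORY_PINS stands in for data an internal user would maintain in a CMS, and pinned_hits_for_category is a hypothetical function; both are illustrations, not part of Typesense. Each product list is in display order, so its position in the list becomes its pinned position.

```python
from typing import Dict, List

# Hypothetical Category Page -> pinned products mapping, as internal users
# might maintain it in a CMS. Each list is in the desired display order.
CATEGORY_PINS: Dict[str, List[str]] = {
    "laptops": ["sku-101", "sku-205", "sku-333"],
    "monitors": ["sku-870"],
}


def pinned_hits_for_category(category: str, mapping: Dict[str, List[str]] = CATEGORY_PINS) -> str:
    """Encode a category's pinned products as a pinned_hits value.

    The first product goes to position 1, the second to position 2, and so on.
    Categories without curated products yield an empty string.
    """
    products = mapping.get(category, [])
    return ",".join(f"{doc_id}:{pos}" for pos, doc_id in enumerate(products, start=1))


print(pinned_hits_for_category("laptops"))
```

When the result is an empty string, the application would simply leave pinned_hits out of the request for that category page.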
Tuning Typo Tolerance in Search Results
Typesense handles typographical errors automatically, right out of the box. However, there may be times when you need to disable typo tolerance or adjust its sensitivity, for example for fields such as zip codes, phone numbers, and ages. When searching, set num_typos=0 to disable typo tolerance completely. The typo_tokens_threshold parameter tunes its sensitivity: if a query finds fewer results than this threshold, Typesense keeps looking for tokens with more typos until enough results are found. The min_len_1typo and min_len_2typo search parameters control typo tolerance based on word length. You can also adjust typo tolerance for individual fields by passing multiple comma-separated values to num_typos. For example, if you have query_by=name,age,phone_number,zip_code and don't want typo tolerance on age, phone number, or zip code, set num_typos=2,0,0,0.
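A sketch of building per-field typo settings so that query_by and num_typos always stay aligned. The helper per_field_typo_params is hypothetical; the field names mirror the example above and are assumed to be indexed as strings, because query_by searches string fields. The check reflects that num_typos accepts at most 2 typos per field.

```python
from typing import Dict, List, Tuple


def per_field_typo_params(q: str, field_typos: List[Tuple[str, int]]) -> Dict[str, str]:
    """Hypothetical helper: build query_by and a matching num_typos list.

    Each (field, max_typos) pair contributes one entry to both parameters,
    in the same order; Typesense allows 0, 1 or 2 typos per field.
    """
    for field, max_typos in field_typos:
        if max_typos not in (0, 1, 2):
            raise ValueError(f"num_typos for {field!r} must be 0, 1 or 2, got {max_typos}")
    return {
        "q": q,
        "query_by": ",".join(field for field, _ in field_typos),
        "num_typos": ",".join(str(max_typos) for _, max_typos in field_typos),
    }


params = per_field_typo_params(
    "jon",
    [("name", 2), ("age", 0), ("phone_number", 0), ("zip_code", 0)],
)
print(params)
```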
Handling and Ranking Searches With No Results
In some cases, you might not want to show the user a "No results found" message when none of the documents match the user's search terms. Instead, Typesense can drop words (tokens) from the search query one at a time and repeat the search, so that it shows results similar to the user's original query. This behaviour is controlled by the drop_tokens_threshold search parameter, which has a default value of 1: if a search returns fewer results than this threshold (with the default, that means no results at all), Typesense starts dropping keywords and repeats the search until enough results are found. Set drop_tokens_threshold=0 to disable this behaviour.
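The fallback loop can be modeled in a few lines. This is a toy sketch, not Typesense's implementation: search_with_token_dropping and match_all are hypothetical, the two-document corpus is made up, and dropping the rightmost token first is an assumption made only for illustration. It does show why drop_tokens_threshold=0 disables the behaviour: a result count can never be below 0, so the loop never runs.

```python
from typing import Callable, List


def search_with_token_dropping(
    tokens: List[str],
    search_fn: Callable[[List[str]], list],
    drop_tokens_threshold: int = 1,
) -> list:
    """Toy model of the token-dropping fallback.

    While the search returns fewer results than drop_tokens_threshold, drop one
    token and search again. Dropping the rightmost token first is an assumption
    made for this sketch. With drop_tokens_threshold=0 the loop never runs, so
    no tokens are dropped.
    """
    tokens = list(tokens)
    results = search_fn(tokens)
    while len(results) < drop_tokens_threshold and len(tokens) > 1:
        tokens.pop()  # remove the last token and retry
        results = search_fn(tokens)
    return results


DOCS = ["red running shoes", "blue rain jacket"]


def match_all(tokens: List[str]) -> List[str]:
    """Stand-in search: a document matches only if it contains every token."""
    return [doc for doc in DOCS if all(token in doc.split() for token in tokens)]


print(search_with_token_dropping(["red", "running", "boots"], match_all))
print(search_with_token_dropping(["red", "running", "boots"], match_all, drop_tokens_threshold=0))
```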
Typesense was built with several distinctive features aimed at making the developer's job easier while helping customers and users get the best possible search experience. We hope this article was both entertaining and instructive on how to rank search results in Typesense. Join Aviyel’s community to learn more about the open source project, get tips on how to contribute, and join active dev groups.
Aviyel is a collaborative platform that assists open source project communities in monetizing and long-term sustainability. To know more visit Aviyel.com and find great blogs and events, just like this one! Sign up now for early access, and don’t forget to follow us on our socials!