DEV Community

Jesper Mayntzhusen


Searching with Umbraco Examine: Avoid these common filtering mistakes

With Umbraco's Codegarden conference mere days away, I am starting to feel the Codegarden spirit! That has helped motivate me to blog a bit again, and at the top of my list of blogpost ideas is something interesting I found a few months back: how you can filter in Examine, Umbraco's API layer on top of Lucene.

Setup

The starting point of this blogpost is an Umbraco 13 site with the Starter Kit installed, and a simple search setup inspired by this docs article.

The starter kit has a People section, where people have tags assigned in a property called "department".
I've added my own "Person" with a nice AI-generated image:

[Screenshot: the added test person, King Arthur, in the People section]

Here is a quick overview of the starting code:

PeopleController.cs
using ExamineTesting.Models;
using ExamineTesting.Services;
using Microsoft.AspNetCore.Mvc;
using Microsoft.AspNetCore.Mvc.ViewEngines;
using Umbraco.Cms.Core.Models.PublishedContent;
using Umbraco.Cms.Core.PublishedCache;
using Umbraco.Cms.Core.Web;
using Umbraco.Cms.Web.Common.Controllers;

namespace ExamineTesting.Controllers;

public class PeopleController : RenderController
{
    private readonly IPublishedValueFallback _publishedValueFallback;
    private readonly ISearchService _searchService;
    private readonly ITagQuery _tagQuery;

    public PeopleController(ILogger<RenderController> logger,
        ICompositeViewEngine compositeViewEngine,
        IUmbracoContextAccessor umbracoContextAccessor,
        IPublishedValueFallback publishedValueFallback,
        ISearchService searchService,
        ITagQuery tagQuery) : base(logger,
        compositeViewEngine,
        umbracoContextAccessor)
    {
        _publishedValueFallback = publishedValueFallback;
        _searchService = searchService;
        _tagQuery = tagQuery;
    }

    public override IActionResult Index()
    {
        var tags = HttpContext.Request.Query["tags"];

        var allTags = _tagQuery.GetAllContentTags();

        var searchResults = _searchService.SearchContentByTag(tags);

        // Create the view model and pass it to the view
        SearchViewModel viewModel = new(CurrentPage!, _publishedValueFallback)
        {
            SearchResults = searchResults.results,
            Tags = allTags,
            Query = searchResults.query
        };

        return CurrentTemplate(viewModel);
    }
}

SearchService.cs
using Examine;
using Microsoft.Extensions.Primitives;
using Umbraco.Cms.Core.Models.PublishedContent;
using Umbraco.Cms.Web.Common;

namespace ExamineTesting.Services;

public class SearchService : ISearchService
{
    private readonly IExamineManager _examineManager;
    private readonly UmbracoHelper _umbracoHelper;

    public SearchService(IExamineManager examineManager,
        UmbracoHelper umbracoHelper)
    {
        _examineManager = examineManager;
        _umbracoHelper = umbracoHelper;
    }

    public (IEnumerable<IPublishedContent> results, string query) SearchContentByTag(StringValues tags)
    {
        IEnumerable<string> ids = Array.Empty<string>();
        var queryString = string.Empty;
        if (_examineManager.TryGetIndex("ExternalIndex",
                out IIndex? index))
        {
            var q = index
                .Searcher
                .CreateQuery("content")
                .NodeTypeAlias("person");

            if (tags.Any())
            {
                q.And().Field("department", tags.FirstOrDefault());
            }

            ids = q
                .Execute()
                .Select(x => x.Id);

            queryString = q.ToString();
        }

        var results = new List<IPublishedContent>();

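        foreach (var id in ids)
        {
            // Content() returns null for nodes that are not in the published cache
            var content = _umbracoHelper.Content(id);
            if (content is not null)
            {
                results.Add(content);
            }
        }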

        return (results, queryString);
    }
}

people.cshtml
@using Microsoft.AspNetCore.Mvc.TagHelpers
@inherits Umbraco.Cms.Web.Common.Views.UmbracoViewPage<ExamineTesting.Models.SearchViewModel>
@{
    Layout = "master.cshtml";
}
@{
    void SocialLink(string content, string service)
    {
        if (!string.IsNullOrEmpty(content))
        {
            ; //semicolon needed otherwise <a> cannot be resolved
            <a class="employee-grid__item__contact-item" href="http://@(service).com/@content">@service</a>
        }
    }
}

@Html.Partial("~/Views/Partials/SectionHeader.cshtml")

<section class="section">

    <div class="container">
        <div>
            <span>@Model.Query</span>
            <form action="@Model.Url()" method="get">
                <label for="tags">Choose a tag:</label>

                <select name="tags" id="tags">
                    @foreach (var tag in Model.Tags)
                    {
                        <option value="@tag?.Text">@tag?.Text</option>
                    }
                </select>
                <button>Search</button>
            </form>
        </div>
        <div class="employee-grid">
            @foreach (Person person in Model.SearchResults)
            {

                <div class="employee-grid__item">
                    <div class="employee-grid__item__image" style="background-image: url('@person.Photo?.Url()')"></div>
                    <div class="employee-grid__item__details">
                        <h3 class="employee-grid__item__name">@person.Name</h3>
                        @if (!string.IsNullOrEmpty(person.Email))
                        {
                            <a href="mailto:@person.Email" class="employee-grid__item__email">@person.Email</a>
                        }
                        <div class="employee-grid__item__contact">
                            @{ SocialLink(person.FacebookUsername, "Facebook"); }
                            @{ SocialLink(person.TwitterUsername, "Twitter"); }
                            @{ SocialLink(person.LinkedInUsername, "LinkedIn"); }
                            @{ SocialLink(person.InstagramUsername, "Instagram"); }
                        </div>
                    </div>
                </div>
            }
        </div>
    </div>

</section>

So at this point we have a dropdown with all available tags: we can choose one, click Search, and the results are filtered to that specific tag. (To help with debugging, I've also output the Lucene query string.)

[Screenshot: results filtered by the "denmark" tag, with the query +department:denmark shown]

The problem

This is the point where I would often have stopped in the past. As the image above shows, it works: the query adds +department:denmark (the + prefix in a Lucene query marks a clause as required, so it effectively means AND), and it returns the only person in our set who has the tag "denmark".

However, let's try to find the test person I added earlier, King Arthur, by his tag "Fairytale kingdom":

[Screenshot: the search for "Fairytale kingdom" returning several unexpected people]

Suddenly a bunch of unexpected results show up, and while these are all great people, none of them have the "Fairytale kingdom" tag.

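We can also see how Examine apparently treats a search term containing a space: it splits it into two terms and ORs them, giving +(department:fairytale department:kingdom). Inside the group the terms have no + prefix, so each one is optional and the space effectively means OR; the group as a whole is required, so any single term is enough for a hit.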

After a bit of digging in the data, it turns out that all of these extra people have the tag "United Kingdom", so they count as hits because of the "kingdom" part.
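To see why those extra people show up, here is a minimal, self-contained model of the OR search above. It is a simplification, assuming terms are produced by splitting on spaces and lowercasing (roughly what Lucene's standard analyzer does for short tags like these). OrTermModel and its methods are hypothetical names; this is not how Examine or Lucene are implemented.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified model of an unescaped multi-word Field() filter.
// Assumption: text is split on spaces and lowercased, as a stand-in for
// the real Lucene analyzer.
public static class OrTermModel
{
    // Split text into lowercase terms.
    public static string[] Tokenize(string text) =>
        text.ToLowerInvariant()
            .Split(' ', StringSplitOptions.RemoveEmptyEntries);

    // "Fairytale kingdom" behaves like +(department:fairytale department:kingdom):
    // the value is a hit if it contains ANY of the search terms.
    public static bool MatchesAnyTerm(string indexedValue, string searchTerm)
    {
        var indexedTerms = new HashSet<string>(Tokenize(indexedValue));
        return Tokenize(searchTerm).Any(indexedTerms.Contains);
    }
}
```

With this model, OrTermModel.MatchesAnyTerm("United Kingdom", "Fairytale kingdom") returns true because both share the term "kingdom", which is the same kind of false positive as in the screenshot.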

Handling multiword phrases in filters

The fix for this problem is actually quite easy. In our code, this is the line that adds the tag filter:

q.And().Field("department", tags.FirstOrDefault());

Without an IDE it may be a bit hard to tell what tags is, but in this case the tags variable is of type StringValues, which is what we get back when reading a value from the HTTP query string.

This allows you to, for example, have a query like this:

domain.com?q=searchterm&color=red&color=blue

If you then read the color query value, you would get a StringValues with 2 values: red and blue.

In our case we only ever pass along one tag, so calling .FirstOrDefault() on it returns that tag as a string (or null if no tag was passed).
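A small sketch of how a repeated query key yields several values. StringValues itself lives in Microsoft.Extensions.Primitives, which ships with ASP.NET Core; to keep this runnable as a plain .NET console project, the sketch swaps in System.Web.HttpUtility.ParseQueryString, which likewise keeps every value of a repeated key. QueryStringDemo and its methods are hypothetical.

```csharp
using System;
using System.Collections.Specialized;
using System.Linq;
using System.Web;

// Hypothetical demo class. HttpUtility.ParseQueryString stands in for the
// ASP.NET Core request query here; like StringValues, it keeps all values
// of a repeated key.
public static class QueryStringDemo
{
    // All values for a key, e.g. "color" -> { "red", "blue" }.
    public static string[] GetAllValues(string query, string key)
    {
        NameValueCollection parsed = HttpUtility.ParseQueryString(query);
        return parsed.GetValues(key) ?? Array.Empty<string>();
    }

    // Same idea as tags.FirstOrDefault(): the first value, or null if the key is missing.
    public static string? GetFirstValue(string query, string key) =>
        GetAllValues(query, key).FirstOrDefault();
}
```

For the query "q=searchterm&color=red&color=blue", GetAllValues(..., "color") returns both "red" and "blue", while GetFirstValue(..., "color") returns just "red".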

The Examine .Field() method has two overloads: one that takes the search term as a string, and one that takes it as an IExamineValue:

[Screenshot: the two .Field() overloads]

An IExamineValue is basically a search term with additional logic applied. So if we, for example, wanted to boost the importance of this specific part of our query, we could use the Boost extension method, which turns our string into an IExamineValue: tags.FirstOrDefault().Boost(10).

But there is another string extension that turns a search string into a "phrase match", where the string basically needs to match as one exact phrase rather than as separate words; otherwise it won't be a hit.

We can achieve this by changing the filter to use tags.FirstOrDefault().Escape().

A new search now shows the term wrapped in quotes in the query: +department:"Fairytale Kingdom"

However, it also no longer returns King Arthur, even though it should:

[Screenshot: the escaped search not returning King Arthur]

The same search does find him in the Examine dashboard in the backoffice, though:

[Screenshot: the same search finding King Arthur in the backoffice Examine dashboard]

The problem is reported and discussed here: https://github.com/Shazwazza/Examine/issues/329

So rather than going into the details, I will just make it work by ensuring the value is both indexed and searched for in lowercase.

We add a quick TransformingIndexValues event handler that takes the tag values from the "department" property, lowercases them, and saves them into a new "departmentLower" field:

using Examine;
using Umbraco.Cms.Core.Events;
using Umbraco.Cms.Core.Notifications;
using Umbraco.Cms.Web.Common.PublishedModels;

namespace ExamineTesting.Notifications;

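// Remember to register this handler, e.g. in a composer:
// builder.AddNotificationHandler<UmbracoApplicationStartedNotification, ExternalIndexTransformations>();
public class ExternalIndexTransformations : INotificationHandler<UmbracoApplicationStartedNotification>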
{
    private readonly IExamineManager _examineManager;

    public ExternalIndexTransformations(IExamineManager examineManager)
    {
        _examineManager = examineManager;
    }

    public void Handle(UmbracoApplicationStartedNotification notification)
    {
        if (!_examineManager.TryGetIndex(Umbraco.Cms.Core.Constants.UmbracoIndexes.ExternalIndexName, out var index))
        {
            throw new InvalidOperationException(
                $"No index found by name {Umbraco.Cms.Core.Constants.UmbracoIndexes.ExternalIndexName}");
        }

        index.TransformingIndexValues += IndexOnTransformingIndexValues;
    }

    private void IndexOnTransformingIndexValues(object? sender, IndexingItemEventArgs e)
    {
        if(e.ValueSet.ItemType is not Person.ModelTypeAlias) return;

        var hasDepartment = e.ValueSet.Values.TryGetValue("department", out var values);
        if (!hasDepartment) return;

        var tagsLowerCase = values!.Select(x => x.ToString()?.ToLowerInvariant());

        var valuesDictionary = e.ValueSet.Values.ToDictionary(x => x.Key, x => x.Value.ToList());
        var newValues = new List<object>();

        foreach (var tag in tagsLowerCase)
        {
            if (!string.IsNullOrWhiteSpace(tag))
            {
                newValues.Add(tag);
            }
        }
        // Indexer instead of Add() so an existing key is overwritten rather than throwing
        valuesDictionary["departmentLower"] = newValues;

        e.SetValues(valuesDictionary.ToDictionary(x => x.Key, x => (IEnumerable<object>)x.Value));
    }
}

After a quick reindex we can see the new field:

[Screenshot: the new departmentLower field in the index]

Next, we change the tag filter in our SearchService to look in that field, and we lowercase the search term as well:

q.And().Field("departmentLower", tags.FirstOrDefault()?.ToLowerInvariant().Escape());
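Why does lowercasing both sides help? The sketch below is a simplified model that is consistent with this workaround, not a description of Examine's internals: it assumes the analyzer lowercases terms at index time, while the terms of an escaped phrase are compared exactly as typed. CasingModel and its methods are hypothetical.

```csharp
using System;
using System.Linq;

// Hypothetical, simplified model of the casing problem. Assumptions:
// - index time: every stored term is lowercased;
// - query time: escaped phrase terms are compared exactly as typed.
public static class CasingModel
{
    private static string[] ToTerms(string text) =>
        text.Split(' ', StringSplitOptions.RemoveEmptyEntries);

    // What ends up in the index for a tag such as "Fairytale kingdom".
    public static string[] IndexTerms(string value) => ToTerms(value.ToLowerInvariant());

    // Case-sensitive check that the phrase terms occur consecutively, in order.
    public static bool EscapedPhraseMatches(string indexedValue, string phrase)
    {
        string[] field = IndexTerms(indexedValue);
        string[] terms = ToTerms(phrase);
        if (terms.Length == 0) return false;

        for (int start = 0; start + terms.Length <= field.Length; start++)
        {
            if (field.Skip(start).Take(terms.Length).SequenceEqual(terms, StringComparer.Ordinal))
                return true;
        }
        return false;
    }
}
```

In this model EscapedPhraseMatches("Fairytale kingdom", "Fairytale Kingdom") is false, but it becomes true once the search term is lowercased first, while "United Kingdom" still does not match the phrase "fairytale kingdom".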

Now we get the search result that we expected:

[Screenshot: the search now returning King Arthur]

Woohoo, now the filtering works just as expected!! 🎉

Or does it... 🤔

Partial word matches

What if you were to add a dragon, and that dragon is tagged with "Fairytale":

[Screenshot: a dragon tagged "Fairytale"]

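The exact phrase "fairytale" also matches the value "fairytale kingdom" (a phrase query only requires its words to appear next to each other, in order, somewhere in the field; they do not have to make up the whole value), so now we get extra results again: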

[Screenshot: a search for "fairytale" also returning King Arthur, tagged "fairytale kingdom"]

In some cases that may be exactly what we want - however in others it will be considered a wrong match.

So how can we handle this case?

In my search for a solution I stumbled upon this old blogpost from 2012 about Lucene phrase matching, so shout-out to Mark Leighton Fisher, who describes an easy workaround in it!

Basically, what he suggests is to wrap both your indexed value and your search term in a delimiter word; in his example he uses the word lucenematch.

The reason this works is that we go from having the indexed values:

fairytale
fairytale kingdom

to

lucenematch fairytale lucenematch
lucenematch fairytale kingdom lucenematch

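And while "fairytale" matches "fairytale kingdom", "lucenematch fairytale lucenematch" will not match "lucenematch fairytale kingdom lucenematch", because in the indexed value "fairytale" is followed by "kingdom" rather than by the closing delimiter word.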
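Here is a minimal model of why the delimiter trick works, assuming a phrase matches when its lowercased terms appear consecutively, in order, anywhere in the field. DelimiterModel and its methods are hypothetical; WrapInDelimiter does the same as the string extension that follows.

```csharp
using System;

// Hypothetical, simplified model of Lucene phrase matching over
// space-separated, lowercased terms.
public static class DelimiterModel
{
    private const string DelimiterWord = "lucenematch";

    // Same idea as the AddLuceneDelimiterWord extension in this post.
    public static string WrapInDelimiter(string value) => $"{DelimiterWord} {value} {DelimiterWord}";

    // True if the phrase's terms occur consecutively, in order, in the value.
    public static bool PhraseMatches(string indexedValue, string phrase)
    {
        // Normalize to single-space-separated lowercase terms and pad with
        // spaces so only whole terms can match.
        string field = " " + Normalize(indexedValue) + " ";
        string terms = " " + Normalize(phrase) + " ";
        return terms.Trim().Length > 0 && field.Contains(terms, StringComparison.Ordinal);
    }

    private static string Normalize(string text) =>
        string.Join(' ', text.ToLowerInvariant().Split(' ', StringSplitOptions.RemoveEmptyEntries));
}
```

Without the delimiter, PhraseMatches("fairytale kingdom", "fairytale") is true. With both sides wrapped, PhraseMatches(WrapInDelimiter("fairytale kingdom"), WrapInDelimiter("fairytale")) is false, while WrapInDelimiter("fairytale") still matches itself.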

So I'll add a quick string extension to add the delimiter word:

namespace ExamineTesting.Extensions;

public static class SearchExtensions
{
    private const string DelimiterWord = "lucenematch";

    public static string AddLuceneDelimiterWord(this string value)
    {
        return $"{DelimiterWord} {value} {DelimiterWord}";
    }
}

And in the TransformingIndexValues event handler, I make sure it is added to the values before indexing:

foreach (var tag in tagsLowerCase)
{
    if (!string.IsNullOrWhiteSpace(tag))
    {
        newValues.Add(tag.AddLuceneDelimiterWord());
    }
}

If I then rebuild the ExternalIndex, I can see that the delimiter word has been added to our departmentLower field:

[Screenshot: the delimiter-wrapped values in the departmentLower field]

Then we add it in our SearchService as well:

if (tags.Any())
{
    q.And().Field("departmentLower", tags.FirstOrDefault()?.AddLuceneDelimiterWord().ToLowerInvariant().Escape());
}
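Note that the index side lowercases the tag and then adds the delimiter word, while the search side adds the delimiter word and then lowercases. Both produce the same string only because lucenematch is already lowercase. One way to keep the two sides from drifting apart is a single normalization method used in both places; the sketch below is a hypothetical helper (TagNormalizer is not part of the original code).

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the original code: one normalization used
// for both indexing and searching, so the two sides always agree.
public static class TagNormalizer
{
    private const string DelimiterWord = "lucenematch";

    // Lowercase, then wrap, e.g. "Fairytale Kingdom" ->
    // "lucenematch fairytale kingdom lucenematch".
    public static string Normalize(string tag) =>
        $"{DelimiterWord} {tag.ToLowerInvariant()} {DelimiterWord}";

    // Index side: normalizes each tag, skipping null or whitespace-only values.
    public static List<object> NormalizeAll(IEnumerable<object?> rawTags) =>
        rawTags
            .Select(tag => tag?.ToString())
            .Where(tag => !string.IsNullOrWhiteSpace(tag))
            .Select(tag => (object)Normalize(tag!))
            .ToList();
}
```

As an untested sketch, the handler could then set valuesDictionary["departmentLower"] = TagNormalizer.NormalizeAll(values!), and the filter could pass TagNormalizer.Normalize(tag).Escape() to .Field().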

And when trying the search again we can see the query now adds the delimiter word, and the results are back to only the specific one we wanted:

[Screenshot: the query with the delimiter word, returning only the expected result]

Outro

Thanks for following along! Please let me know if this was helpful to you 🙂

Also feel free to reach out to me on Mastodon: https://umbracocommunity.social/@Jmayn
