With Umbraco's Codegarden conference mere days away, I am starting to feel the Codegarden spirit! That has helped motivate me to blog a bit again. At the top of my blogpost ideas list is something interesting I found a few months back about how you can filter in Examine - Umbraco's API layer on top of Lucene.
Setup
The starting point of this blogpost is an Umbraco 13 site with The Starter Kit and a simple search setup inspired by this docs article.
The starter kit has a People section, where people have tags assigned in a property called "department".
I've added my own "Person" with a nice AI generated image:
Here is a quick overview of the starting code:
PeopleController.cs
using ExamineTesting.Models;
using ExamineTesting.Services;
using Microsoft.AspNetCore.Mvc;
using Microsoft.AspNetCore.Mvc.ViewEngines;
using Umbraco.Cms.Core.Models.PublishedContent;
using Umbraco.Cms.Core.PublishedCache;
using Umbraco.Cms.Core.Web;
using Umbraco.Cms.Web.Common.Controllers;

namespace ExamineTesting.Controllers;

public class PeopleController : RenderController
{
    private readonly IPublishedValueFallback _publishedValueFallback;
    private readonly ISearchService _searchService;
    private readonly ITagQuery _tagQuery;

    public PeopleController(ILogger<RenderController> logger,
        ICompositeViewEngine compositeViewEngine,
        IUmbracoContextAccessor umbracoContextAccessor,
        IPublishedValueFallback publishedValueFallback,
        ISearchService searchService,
        ITagQuery tagQuery) : base(logger,
        compositeViewEngine,
        umbracoContextAccessor)
    {
        _publishedValueFallback = publishedValueFallback;
        _searchService = searchService;
        _tagQuery = tagQuery;
    }

    public override IActionResult Index()
    {
        var tags = HttpContext.Request.Query["tags"];
        var allTags = _tagQuery.GetAllContentTags();
        var searchResults = _searchService.SearchContentByTag(tags);

        // Create the view model and pass it to the view
        SearchViewModel viewModel = new(CurrentPage!, _publishedValueFallback)
        {
            SearchResults = searchResults.results,
            Tags = allTags,
            Query = searchResults.query
        };

        return CurrentTemplate(viewModel);
    }
}
SearchService.cs
using Examine;
using Microsoft.Extensions.Primitives;
using Umbraco.Cms.Core.Models.PublishedContent;
using Umbraco.Cms.Web.Common;

namespace ExamineTesting.Services;

public class SearchService : ISearchService
{
    private readonly IExamineManager _examineManager;
    private readonly UmbracoHelper _umbracoHelper;

    public SearchService(IExamineManager examineManager,
        UmbracoHelper umbracoHelper)
    {
        _examineManager = examineManager;
        _umbracoHelper = umbracoHelper;
    }

    public (IEnumerable<IPublishedContent> results, string query) SearchContentByTag(StringValues tags)
    {
        IEnumerable<string> ids = Array.Empty<string>();
        var queryString = string.Empty;

        if (_examineManager.TryGetIndex("ExternalIndex",
                out IIndex? index))
        {
            // Start with all content nodes of the "person" document type
            var q = index
                .Searcher
                .CreateQuery("content")
                .NodeTypeAlias("person");

            // Only add the tag filter when a tag was passed in
            if (tags.Any())
            {
                q.And().Field("department", tags.FirstOrDefault());
            }

            ids = q
                .Execute()
                .Select(x => x.Id);

            // Kept for debugging: the generated Lucene query
            queryString = q.ToString();
        }

        var results = new List<IPublishedContent>();
        foreach (var id in ids)
        {
            // Content() returns null if the node is not in the published cache, so skip those
            if (_umbracoHelper.Content(id) is { } content)
            {
                results.Add(content);
            }
        }

        return (results, queryString);
    }
}
people.cshtml
@using Microsoft.AspNetCore.Mvc.TagHelpers
@inherits Umbraco.Cms.Web.Common.Views.UmbracoViewPage<ExamineTesting.Models.SearchViewModel>
@{
    Layout = "master.cshtml";
}
@{
    void SocialLink(string content, string service)
    {
        if (!string.IsNullOrEmpty(content))
        {
            ; //semicolon needed otherwise <a> cannot be resolved
            <a class="employee-grid__item__contact-item" href="http://@(service).com/@content">@service</a>
        }
    }
}
@Html.Partial("~/Views/Partials/SectionHeader.cshtml")
<section class="section">
    <div class="container">
        <div>
            <span>@Model.Query</span>
            <form action="@Model.Url()" method="get">
                <label for="tags">Choose a tag:</label>
                <select name="tags" id="tags">
                    @foreach (var tag in Model.Tags)
                    {
                        <option value="@tag?.Text">@tag?.Text</option>
                    }
                </select>
                <button>Search</button>
            </form>
        </div>
        <div class="employee-grid">
            @foreach (Person person in Model.SearchResults)
            {
                <div class="employee-grid__item">
                    <div class="employee-grid__item__image" style="background-image: url('@person.Photo?.Url()')"></div>
                    <div class="employee-grid__item__details">
                        <h3 class="employee-grid__item__name">@person.Name</h3>
                        @if (!string.IsNullOrEmpty(person.Email))
                        {
                            <a href="mailto:@person.Email" class="employee-grid__item__email">@person.Email</a>
                        }
                        <div class="employee-grid__item__contact">
                            @{ SocialLink(person.FacebookUsername, "Facebook"); }
                            @{ SocialLink(person.TwitterUsername, "Twitter"); }
                            @{ SocialLink(person.LinkedInUsername, "LinkedIn"); }
                            @{ SocialLink(person.InstagramUsername, "Instagram"); }
                        </div>
                    </div>
                </div>
            }
        </div>
    </div>
</section>
So at this point we have a dropdown with all available tags. We can choose one, click search, and the list is filtered to people with that specific tag. (To help with debugging I've also added the Lucene query string to the output):
The problem
This is the point where I would often have stopped in the past. As the image just above shows, it works: the query adds +department:denmark (the + in Lucene queries marks the clause as required, so it acts like AND), and it returns the only person in our set of people that has the tag "denmark".
However, what if I try to find the test person - King Arthur - who I added earlier based on his tag "Fairytale kingdom":
Suddenly a bunch of unexpected results are added, and while these are all great people - none of them have the "Fairytale kingdom" tag.
We can also see that Examine apparently treats a search value containing a space by splitting it in two and doing an OR search: +(department:fairytale department:kingdom) (inside the parentheses, terms separated by a space with no operator are combined with OR).
After a bit of digging in the data, it turns out that all of these extra people have the tag "United Kingdom", so they are treated as hits based on the "kingdom" part.
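To make the effect concrete, here is a tiny simulation in plain C# with no Examine or Lucene involved. It is only a simplified model, assuming the analyzer lowercases values and splits them on spaces, and the class and method names are made up for this illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of Examine: a simplified model of OR term matching.
public static class TermMatchingSketch
{
    // Assumption: stand-in for an analyzer that lowercases and splits on spaces.
    public static string[] Tokenize(string value) =>
        value.ToLowerInvariant().Split(' ', StringSplitOptions.RemoveEmptyEntries);

    // Mimics +(department:a department:b): a hit if ANY query term is in the indexed value.
    public static bool MatchesAnyTerm(string indexedValue, string query)
    {
        var indexedTerms = new HashSet<string>(Tokenize(indexedValue));
        return Tokenize(query).Any(term => indexedTerms.Contains(term));
    }

    public static void Main()
    {
        foreach (var tag in new[] { "Fairytale kingdom", "United Kingdom", "Denmark" })
        {
            Console.WriteLine($"{tag}: {MatchesAnyTerm(tag, "Fairytale kingdom")}");
        }
        // Fairytale kingdom: True
        // United Kingdom: True
        // Denmark: False
    }
}
```

"United Kingdom" is a hit purely because it shares the term "kingdom" with the search value.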
Handling multiword phrases in filters
The fix for this problem is actually quite easy. In our code, this is the line that adds the tag filter:
q.And().Field("department", tags.FirstOrDefault());
Without an IDE it may be a bit tough to see what tags is - in this case the variable tags is of the type StringValues, which is what we get back from the HTTP query string.
It allows you to have, for example, a query like this:
domain.com?q=searchterm&color=red&color=blue
Where if you then tried to get the color query value you would get StringValues with 2 values - red and blue.
In our case we only ever pass along one tag, so adding a .FirstOrDefault() to it will return it as a string.
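If you want to see that behaviour outside of a controller, here is a small console sketch. It assumes the Microsoft.Extensions.Primitives package is referenced (it ships with ASP.NET Core), and the FirstTag helper is just a made-up name for this example:

```csharp
using System;
using System.Linq;
using Microsoft.Extensions.Primitives;

public static class StringValuesSketch
{
    // Hypothetical helper: the same FirstOrDefault() call used in the search service.
    public static string? FirstTag(StringValues tags) => tags.FirstOrDefault();

    public static void Main()
    {
        // Roughly what Request.Query["color"] holds for ?color=red&color=blue
        var colors = new StringValues(new[] { "red", "blue" });
        Console.WriteLine(colors.Count);      // 2
        Console.WriteLine(FirstTag(colors));  // red
        Console.WriteLine(colors.ToString()); // red,blue

        // A missing query parameter gives an empty StringValues, so FirstOrDefault() is null
        Console.WriteLine(FirstTag(StringValues.Empty) is null); // True
    }
}
```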
The Examine .Field() method has two versions: one that takes the query as a string, and one that takes it as an IExamineValue:
An IExamineValue is basically a search term with additional logic applied. So if we, for example, wanted to boost the importance of this specific part of our query, we can add the Boost extension, which then turns our string into an IExamineValue: tags.FirstOrDefault().Boost(10).
But there is another string extension that takes a search string and turns it into a "phrase match" value, where the string basically needs to match as an exact phrase, otherwise it won't work.
We can achieve this by changing it to tags.FirstOrDefault().Escape()
A new search at this point shows the query now has it in quotes: +department:"Fairytale Kingdom"
However, it also doesn't return King Arthur as it should:
It does in the Examine dashboard in the backoffice though:
The problem is reported and discussed here: https://github.com/Shazwazza/Examine/issues/329
So rather than going into it, I will just ensure it works by making sure the value is indexed and searched for in lowercase.
We add a quick TransformingIndexValues event handler where we take the tag values from the "department" property, lowercase them, and save them into a new "departmentLower" field:
using Examine;
using Umbraco.Cms.Core.Events;
using Umbraco.Cms.Core.Notifications;
using Umbraco.Cms.Web.Common.PublishedModels;

namespace ExamineTesting.Notifications;

public class ExternalIndexTransformations : INotificationHandler<UmbracoApplicationStartedNotification>
{
    private readonly IExamineManager _examineManager;

    public ExternalIndexTransformations(IExamineManager examineManager)
    {
        _examineManager = examineManager;
    }

    public void Handle(UmbracoApplicationStartedNotification notification)
    {
        if (!_examineManager.TryGetIndex(Umbraco.Cms.Core.Constants.UmbracoIndexes.ExternalIndexName, out var index))
        {
            throw new InvalidOperationException(
                $"No index found by name {Umbraco.Cms.Core.Constants.UmbracoIndexes.ExternalIndexName}");
        }

        index.TransformingIndexValues += IndexOnTransformingIndexValues;
    }

    private void IndexOnTransformingIndexValues(object? sender, IndexingItemEventArgs e)
    {
        if (e.ValueSet.ItemType is not Person.ModelTypeAlias) return;

        var hasDepartment = e.ValueSet.Values.TryGetValue("department", out var values);
        if (!hasDepartment) return;

        var tagsLowerCase = values!.Select(x => x.ToString()?.ToLowerInvariant());
        var valuesDictionary = e.ValueSet.Values.ToDictionary(x => x.Key, x => x.Value.ToList());

        var newValues = new List<object>();
        foreach (var tag in tagsLowerCase)
        {
            if (!string.IsNullOrWhiteSpace(tag))
            {
                newValues.Add(tag);
            }
        }

        valuesDictionary.Add("departmentLower", newValues);
        e.SetValues(valuesDictionary.ToDictionary(x => x.Key, x => (IEnumerable<object>)x.Value));
    }
}
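The handler also needs to be registered with Umbraco before it will run. One common way to do that is a composer. This is only a sketch: the composer class name and namespace are my own choice for illustration, and registering it from Program.cs would work too.

```csharp
using ExamineTesting.Notifications;
using Umbraco.Cms.Core.Composing;
using Umbraco.Cms.Core.DependencyInjection;
using Umbraco.Cms.Core.Notifications;

namespace ExamineTesting.Composers;

// Hypothetical composer name; Umbraco discovers IComposer implementations at startup.
public class ExternalIndexTransformationsComposer : IComposer
{
    public void Compose(IUmbracoBuilder builder)
    {
        // Run ExternalIndexTransformations.Handle once the application has started
        builder.AddNotificationHandler<UmbracoApplicationStartedNotification, ExternalIndexTransformations>();
    }
}
```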
After a quick reindex we can see the new field:
Next we change the tags filter in the search service to look in that field and to lowercase the search term:
q.And().Field("departmentLower", tags.FirstOrDefault()?.ToLowerInvariant().Escape());
Now we get the search result that we expected:
Wohoo now the filtering works just as expected!! 🎉
Or does it... 🤔
Partial word matches
What if you were to add a dragon, and that dragon is tagged with "Fairytale":
The exact phrase "fairytale" matches the value "fairytale kingdom", so now we get extra results again:
In some cases that may be exactly what we want - however in others it will be considered a wrong match.
So how can we handle this case?
In my search for a solution I stumbled upon an old blogpost from 2012 about Lucene phrase matching, so shout out to Mark Leighton Fisher, who describes an easy workaround there!
Basically, what he suggests is to wrap your indexed value and search term in a delimiter word - in his example he uses the word lucenematch.
The reason this works is that we go from having the indexed values:
fairytale
fairytale kingdom
to
lucenematch fairytale lucenematch
lucenematch fairytale kingdom lucenematch
And while fairytale will be a match for fairytale kingdom, lucenematch fairytale lucenematch will not match lucenematch fairytale kingdom lucenematch.
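The same idea can be checked with a small simulation in plain C#. This is a simplified model rather than how Lucene is implemented: it assumes a phrase matches when its words appear next to each other, in order, in the indexed value, and the class and method names are made up for this example.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not part of Examine: a simplified model of phrase matching.
public static class DelimiterPhraseSketch
{
    private const string DelimiterWord = "lucenematch";

    public static string[] Tokenize(string value) =>
        value.ToLowerInvariant().Split(' ', StringSplitOptions.RemoveEmptyEntries);

    // True when the phrase tokens occur consecutively, in order, within the indexed tokens.
    public static bool PhraseMatches(string indexedValue, string phrase)
    {
        var indexed = Tokenize(indexedValue);
        var query = Tokenize(phrase);
        if (query.Length == 0 || query.Length > indexed.Length) return false;

        for (var start = 0; start <= indexed.Length - query.Length; start++)
        {
            if (indexed.Skip(start).Take(query.Length).SequenceEqual(query)) return true;
        }

        return false;
    }

    public static string Wrap(string value) => $"{DelimiterWord} {value} {DelimiterWord}";

    public static void Main()
    {
        Console.WriteLine(PhraseMatches("fairytale kingdom", "fairytale"));             // True
        Console.WriteLine(PhraseMatches(Wrap("fairytale kingdom"), Wrap("fairytale"))); // False
        Console.WriteLine(PhraseMatches(Wrap("fairytale"), Wrap("fairytale")));         // True
    }
}
```

With the delimiter word on both sides, the phrase only matches when the whole value is the same.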
So I'll add a quick string extension to add the delimiter word:
namespace ExamineTesting.Extensions;

public static class SearchExtensions
{
    private const string DelimiterWord = "lucenematch";

    public static string AddLuceneDelimiterWord(this string value)
    {
        return $"{DelimiterWord} {value} {DelimiterWord}";
    }
}
And in the TransformingIndexValues event I will ensure that we add it to the values before indexing:
foreach (var tag in tagsLowerCase)
{
    if (!string.IsNullOrWhiteSpace(tag))
    {
        newValues.Add(tag.AddLuceneDelimiterWord());
    }
}
If I then go and reindex the externalIndex I can see it is added to our departmentLower field:
Then we can add it to our searchService as well:
if (tags.Any())
{
    q.And().Field("departmentLower", tags.FirstOrDefault()?.AddLuceneDelimiterWord().ToLowerInvariant().Escape());
}
And when trying the search again we can see the query now adds the delimiter word, and the results are back to only the specific one we wanted:
Outro
Note: Joe has written a similar blogpost that solves the same problem with a different approach and goes a bit more in-depth with explaining the underlying Examine/Lucene parts.
Please check it out here: https://joe.gl/ombek/blog/tag-style-exact-matching-with-examine/
Thanks for following along! Please let me know if this was helpful to you 🙂
Also feel free to reach out to me on Mastodon: https://umbracocommunity.social/@Jmayn