Jesper Mayntzhusen

The problem of referenced content in Examine indexing (Umbraco 11)

Intro

When you use a picker property editor in Umbraco, it references some other content. The built-in Umbraco index, which automatically indexes all content properties, also indexes the picker field, but only as a reference to the other content node.

In this blog post I will run through the details of how this works, along with some potential solutions and their shortcomings.

The examples below are using a fresh Umbraco 11.2.0 site with the Clean starter kit installed for some base content.

NOTE: While this post uses code and examples for v11, the core concepts are valid for all Umbraco versions!

The problem

Imagine you have a blogging website where you have authors saved as nodes on their own and then referenced on blog posts like this:

Image description

Image description

This works great, and you can fetch the data from the referenced node to output on the frontend quite easily.

However, if you want the blog post to be searchable by author, the search index needs some extra work. Currently the indexed blog post only holds a reference to the author node, like this:

Image description

Indexing External node data

The first step towards solving the problem is to hook into the TransformingIndexValues event. This event fires whenever content is indexed, so either on a full index rebuild or when a specific node is indexed on publish.
In the event handler we can add one or more new fields to the index, where we save the data from the referenced node.

We add this on the UmbracoApplicationStartedNotification, which means it gets added on startup:

using Examine;
using Umbraco.Cms.Core;
using Umbraco.Cms.Core.Events;
using Umbraco.Cms.Core.Notifications;

namespace IndexingBlog.Indexing;

public class CustomIndexer : INotificationHandler<UmbracoApplicationStartedNotification>
{
    private readonly IExamineManager _examineManager;

    public CustomIndexer(IExamineManager examineManager)
    {
        _examineManager = examineManager;
    }

    public void Handle(UmbracoApplicationStartedNotification notification)
    {
        if (!_examineManager.TryGetIndex(Constants.UmbracoIndexes.ExternalIndexName, out IIndex index))
            throw new InvalidOperationException($"No index found by name {Constants.UmbracoIndexes.ExternalIndexName}");

        if (index is not BaseIndexProvider indexProvider)
            throw new InvalidOperationException("Could not cast");

        indexProvider.TransformingIndexValues += IndexProvider_TransformingIndexValues;
    }

    private void IndexProvider_TransformingIndexValues(object? sender, IndexingItemEventArgs e)
    {
        // Do stuff
    }
}

And we can then register it in the startup like this:

public void ConfigureServices(IServiceCollection services)
{
    services.AddUmbraco(_env, _config)
        .AddBackOffice()
        .AddWebsite()
        .AddComposers()
        .AddNotificationHandler<UmbracoApplicationStartedNotification, CustomIndexer>()
        .Build();
}
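
If you would rather not touch the startup class (for example in a package), the same registration can be sketched as a composer. This is only an alternative sketch: IndexingComposer is a name made up for this example, and it relies on the AddComposers() call shown above to get discovered. Use one registration or the other, not both, otherwise the handler is attached twice.

```csharp
using IndexingBlog.Indexing;
using Umbraco.Cms.Core.Composing;
using Umbraco.Cms.Core.DependencyInjection;
using Umbraco.Cms.Core.Notifications;

namespace IndexingBlog.Composers;

// Hypothetical composer name, not part of the original example.
// Composers are picked up automatically because of .AddComposers().
public class IndexingComposer : IComposer
{
    public void Compose(IUmbracoBuilder builder)
    {
        // Same registration as in ConfigureServices, just moved into a composer
        builder.AddNotificationHandler<UmbracoApplicationStartedNotification, CustomIndexer>();
    }
}
```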

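In the TransformingIndexValues event we can get the node UDI, fetch the node from the cache, pull the values we want, and add them to the index for the node. Note that the handler uses an IUmbracoContextFactory, which has to be injected through the CustomIndexer constructor and stored in a _umbracoContextFactory field. Depending on your project setup it may also need using directives for Umbraco.Cms.Core.Web, Umbraco.Cms.Infrastructure.Examine and Umbraco.Extensions: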

private void IndexProvider_TransformingIndexValues(object? sender, IndexingItemEventArgs e)
{
    // We don't do anything if the index type isn't content (e.g. media)
    if (e.ValueSet.Category != IndexTypes.Content) return;

    // We get the author field, if it's not there we return
    var hasAuthorProp = e.ValueSet.Values.TryGetValue("author", out var values);
    if (!hasAuthorProp) return;

    // The author field contains the content UDI as a string; if it is empty or can't be parsed as a UDI we return
    var contentUdiString = values?.FirstOrDefault()?.ToString();
    if (!UdiParser.TryParse(contentUdiString, out var contentUdi)) return;

    using var context = _umbracoContextFactory.EnsureUmbracoContext();
    // Get the content node from the UDI
    var node = context?.UmbracoContext?.Content?.GetById(contentUdi);

    // Get the author name and description from the node
    var authorName = node?.Value<string>("authorName");
    var authorDescription = node?.Value<string>("description");

    // We can't add to the values collection directly, so we copy it, set our fields, then overwrite it with the new collection
    var valuesDictionary = e.ValueSet.Values.ToDictionary(x => x.Key, x => x.Value.ToList());
    // Using the indexer instead of Add avoids an exception if a field with the same name already exists
    if (authorName is not null)
        valuesDictionary["authorName"] = new List<object> { authorName };
    if (authorDescription is not null)
        valuesDictionary["authorDescription"] = new List<object> { authorDescription };

    e.SetValues(valuesDictionary.ToDictionary(x => x.Key, x => (IEnumerable<object>)x.Value));
}

At this point, whenever the blog node is published, the author name and description from the referenced node are indexed along with it.
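
To give an idea of how the new field could be used, here is a minimal sketch of a search against the external index. AuthorSearchService and FindBlogPostIdsByAuthor are hypothetical names for this example and not part of Umbraco; the sketch assumes the authorName field added above and a single search term.

```csharp
using System.Collections.Generic;
using System.Linq;
using Examine;
using Umbraco.Cms.Core;
using Umbraco.Cms.Infrastructure.Examine;

namespace IndexingBlog.Search;

// Hypothetical helper class, only here to illustrate querying the custom field
public class AuthorSearchService
{
    private readonly IExamineManager _examineManager;

    public AuthorSearchService(IExamineManager examineManager)
    {
        _examineManager = examineManager;
    }

    // Returns the ids (as stored in the index) of content items whose authorName field matches
    public IEnumerable<string> FindBlogPostIdsByAuthor(string authorName)
    {
        // If the external index is missing we return no results instead of throwing
        if (!_examineManager.TryGetIndex(Constants.UmbracoIndexes.ExternalIndexName, out IIndex index))
            return Enumerable.Empty<string>();

        // Search content items only, matching on the field added in TransformingIndexValues
        var results = index.Searcher
            .CreateQuery(IndexTypes.Content)
            .Field("authorName", authorName)
            .Execute();

        return results.Select(result => result.Id).ToList();
    }
}
```

The service would still need to be registered in the DI container before it can be injected anywhere.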

This works nicely and is searchable. However, there is one major problem:
Imagine you have one author that is referenced on several blog posts, and you then update the description of that author.
The description would not be updated in the search index for any of the blog posts referencing that author, as their index values are only set when the blog post itself is published or the entire index is rebuilt.

So how do we keep the index values up to date?

Since Umbraco 8.6 there is a default relation that gets added when you pick a content item from a different content item.

If you are unfamiliar with Umbraco's relations, they are a built-in feature that tracks entities that are connected in some way.

For example there is a media relation that shows where media items are used:

Image description

The relations can be found in the settings section. There is a relation type called Related document (alias umbDocument), and you can see that it already tracks the items that are connected by being picked:

Image description

So the flow we want to implement is something like this:

  1. When a node is published we check if it is the author type
  2. If it is the author type we use the relation to look up the nodes it is used on
  3. We create a list of all the related nodes and then reindex them

The first thing we want to do is create a notification handler:

using Examine;
using Umbraco.Cms.Core;
using Umbraco.Cms.Core.Events;
using Umbraco.Cms.Core.Notifications;
using Umbraco.Cms.Core.Services;
using Umbraco.Cms.Infrastructure.Examine;

namespace IndexingBlog.Notifications;

public class ContentNotifications : INotificationHandler<ContentPublishedNotification>
{
    private readonly IRelationService _relationService;
    private readonly IExamineManager _examineManager;
    private readonly IPublishedContentValueSetBuilder _publishedContentValueSetBuilder;
    private readonly IContentService _contentService;

    public ContentNotifications(IRelationService relationService, IExamineManager examineManager, IPublishedContentValueSetBuilder publishedContentValueSetBuilder, IContentService contentService)
    {
        _relationService = relationService;
        _examineManager = examineManager;
        _publishedContentValueSetBuilder = publishedContentValueSetBuilder;
        _contentService = contentService;
    }

    public void Handle(ContentPublishedNotification notification)
    {
        // Do stuff
    }
}

And register it in the startup:

services.AddUmbraco(_env, _config)
        .AddBackOffice()
        .AddWebsite()
        .AddComposers()
        .AddNotificationHandler<UmbracoApplicationStartedNotification, CustomIndexer>()
        .AddNotificationHandler<ContentPublishedNotification, ContentNotifications>()
        .Build();

Finally we add the code needed to find the relations and to reindex the related nodes:

public void Handle(ContentPublishedNotification notification)
{
    // We start a list of content ids we need to reindex
    var nodesToReindex = new List<int>();

    foreach (var node in notification.PublishedEntities)
    {
        // We only want to do this for the author type, so if it's not that type we continue the loop
        if(node.ContentType.Alias != "author") continue;

        // When an author node is published we use the relation service to look up the related documents
        var relations = _relationService.GetByChildId(node.Id, "umbDocument");
        // We add all the related node ids to our list for reindexing
        nodesToReindex.AddRange(relations.Select(relation => relation.ParentId));
    }

    if (!nodesToReindex.Any()) return;

    // Get the content from the id list
    var contentToReindex = _contentService.GetByIds(nodesToReindex);

    // Get the external index
    if (!_examineManager.TryGetIndex(Constants.UmbracoIndexes.ExternalIndexName, out IIndex index))
        throw new InvalidOperationException($"No index found by name {Constants.UmbracoIndexes.ExternalIndexName}");

    // Get the valuesets and index them
    var valueSets = _publishedContentValueSetBuilder.GetValueSets(contentToReindex.ToArray());
    index.IndexItems(valueSets);
}

Things to keep in mind

This is one way to keep the indexed content up to date, and it works great for a small local site. However, you should be wary of how many nodes this may affect: every published author triggers a reindex of all the blog posts referencing it, and the handler hits the database several times through the relation and content services.

So if you have a large site with many editors, this may not be the right way to solve the problem!
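
One small mitigation, sketched here under my own assumptions rather than as a tested solution, is to remove duplicate ids and reindex in batches so only a limited amount of content is loaded at a time. ReindexBatching, ToBatches and the default batch size of 100 are made up for this example. It does not reduce the total amount of work, it only bounds how much is done per call.

```csharp
using System.Collections.Generic;
using System.Linq;

namespace IndexingBlog.Notifications;

// Hypothetical helper, not part of Umbraco or the handler above
public static class ReindexBatching
{
    // Removes duplicate node ids and splits the rest into arrays of at most batchSize ids.
    // Enumerable.Chunk requires .NET 6 or later, which Umbraco 11 runs on.
    public static IEnumerable<int[]> ToBatches(IEnumerable<int> nodeIds, int batchSize = 100)
    {
        return nodeIds.Distinct().Chunk(batchSize);
    }
}
```

In the Handle method you could then loop over ReindexBatching.ToBatches(nodesToReindex) and call GetByIds, GetValueSets and IndexItems once per batch instead of once for the whole list.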

Top comments (1)

Owain Williams

h5yr - this has come in super handy for me today! I was needing to do something very similar and your blog has been super easy to follow. Thanks!