C#: functional approaches explained by solving one specific problem

#functional #dotnet #csharp #linq

Today I would like to tell you about functional approaches: why they are good, and why they are simple.

We need to write some code

Let us imagine that we have an array of strings, for example an array of names:

var items = new[]
{
    "Peter",
    "Albert",
    "John",
    "Jim",
    "Pavel",
    "Robert"
};

We would like to keep only the first name for each distinct first character: "Peter", "Albert", "John", "Robert".
In C# we have LINQ to work with collections, and it has a Distinct function. But this function compares full names, while we need to compare only their first characters.
.NET has had a DistinctBy function since .NET 6. But what if we use an older version of .NET, or even .NET Framework?
Then we are going to write our own DistinctBy implementation.
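To make the limitation concrete, here is a minimal sketch of what plain Distinct does with the same names. The class and method names (DistinctDemo, DistinctNames) are made up for this illustration.

```csharp
using System;
using System.Linq;

static class DistinctDemo
{
    // Plain Distinct compares whole strings with the default equality comparer.
    public static string[] DistinctNames(string[] names)
    {
        return names.Distinct().ToArray();
    }

    static void Main()
    {
        var items = new[] { "Peter", "Albert", "John", "Jim", "Pavel", "Robert" };

        // All six names differ as whole strings, so nothing is filtered out.
        Console.WriteLine(DistinctNames(items).Length); // prints 6
    }
}
```

Since every name is different as a whole string, Distinct has nothing to remove. We need a way to tell it which part of the name matters.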

Procedural approach

Many times, in someone else's code, I have come across this procedural implementation:

using System;
using System.Collections.Generic;

static class EnumerableExtensions
{
    public static IEnumerable<TSource> DistinctBy<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
    {
        // Keys of the items that have already been returned.
        var keys = new HashSet<TKey>();
        foreach (var item in source)
        {
            // HashSet<T>.Add returns false when the key is already present,
            // so a separate Contains check is not needed.
            if (keys.Add(keySelector(item)))
            {
                yield return item;
            }
        }
    }
}

class Program
{
    static void Main()
    {
        var items = new[] { "Peter", "Albert", "John", "Jim", "Pavel", "Robert"};
        Console.WriteLine(string.Join(", ", items.DistinctBy(n => n[0])));
    }
}

It does work, and it is not so bad.
But now we have two separate implementations of almost the same thing: the Distinct and DistinctBy functions. How can we solve that?

Object-oriented approach

It may seem obvious to use the overload of the Distinct function that accepts an IEqualityComparer parameter. This way we control how the Distinct function compares names. Let us implement the IEqualityComparer interface with the necessary logic and apply it to our solution:

using System;
using System.Collections.Generic;
using System.Linq;

class FirstCharacterComparer : IEqualityComparer<string>
{
    public bool Equals(string x, string y)
    {
        return x[0] == y[0];
    }

    public int GetHashCode(string obj)
    {
        return obj[0].GetHashCode();
    }
}

class Program
{
    static void Main()
    {
        var items = new[] { "Peter", "Albert", "John", "Jim", "Pavel", "Robert"};
        Console.WriteLine(string.Join(", ", items.Distinct(new FirstCharacterComparer())));
    }
}

Much better? I agree, but how many classes would we have to create to implement such small pieces of logic? If we needed to compare last characters, would we create a LastCharacterComparer class? And to compare the first decimal digits of two numbers, would we create a FirstDecimalSymbolComparer? The object-oriented approach does give us flexibility, but class declarations are too verbose.

Semi-functional approach

Let us describe our problem in a mathematical way.
We have some set X as an input (an array of strings), a selector function X -> Y (where Y is the set of first characters) and a distinct function (X, (X -> Y)) -> Z (where Z is the subset of names with unique first characters).
In our case the selector function is s => s[0], and the distinct function is System.Linq.Enumerable.Distinct.
Now, keeping this model in mind, let us implement and apply IEqualityComparer again:

using System;
using System.Collections.Generic;
using System.Linq;

class KeyComparer<TSource, TKey> : IEqualityComparer<TSource>
{
    private readonly Func<TSource, TKey> keySelector;

    public KeyComparer(Func<TSource, TKey> keySelector)
    {
        this.keySelector = keySelector;
    }

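    public bool Equals(TSource x, TSource y)
    {
        // The default comparer also handles null keys, unlike calling key.Equals directly.
        return EqualityComparer<TKey>.Default.Equals(keySelector(x), keySelector(y));
    }

    public int GetHashCode(TSource obj)
    {
        // Equal keys must produce equal hash codes; a null key is mapped to 0.
        var key = keySelector(obj);
        return key == null ? 0 : key.GetHashCode();
    }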
}

class Program
{
    static void Main()
    {
        var items = new[] { "Peter", "Albert", "John", "Jim", "Pavel", "Robert"};
        Console.WriteLine(string.Join(", ", items.Distinct(new KeyComparer<string, char>(s => s[0]))));
    }
}

Yes, now we have a general solution that builds a set of unique items from an input set by comparing the items by one of their properties.
But the expression new KeyComparer<string, char>(s => s[0]) looks redundant. The C# compiler can't infer generic type arguments for a constructor call, which increases the amount of code.
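The difference between constructors and methods can be shown with a small sketch. The non-generic KeyComparer class with a Create method and the InferenceDemo class are hypothetical helpers invented for this illustration; the comparer itself is repeated only so that the sample compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Same idea as the comparer above, repeated so the sketch is self-contained.
class KeyComparer<TSource, TKey> : IEqualityComparer<TSource>
{
    private readonly Func<TSource, TKey> keySelector;

    public KeyComparer(Func<TSource, TKey> keySelector)
    {
        this.keySelector = keySelector;
    }

    public bool Equals(TSource x, TSource y)
    {
        return EqualityComparer<TKey>.Default.Equals(keySelector(x), keySelector(y));
    }

    public int GetHashCode(TSource obj)
    {
        var key = keySelector(obj);
        return key == null ? 0 : key.GetHashCode();
    }
}

// Hypothetical helper: a generic *method* takes part in type inference.
static class KeyComparer
{
    public static KeyComparer<TSource, TKey> Create<TSource, TKey>(Func<TSource, TKey> keySelector)
    {
        return new KeyComparer<TSource, TKey>(keySelector);
    }
}

static class InferenceDemo
{
    public static string[] ViaConstructor(string[] names)
    {
        // Constructor call: both type arguments have to be written out.
        return names.Distinct(new KeyComparer<string, char>(s => s[0])).ToArray();
    }

    public static string[] ViaFactory(string[] names)
    {
        // Method call: TSource is inferred from the typed lambda parameter,
        // TKey from the type of the lambda body.
        return names.Distinct(KeyComparer.Create((string s) => s[0])).ToArray();
    }

    static void Main()
    {
        var items = new[] { "Peter", "Albert", "John", "Jim", "Pavel", "Robert" };
        Console.WriteLine(ViaConstructor(items).Length); // prints 4
        Console.WriteLine(ViaFactory(items).Length);     // prints 4
    }
}
```

Note that the lambda parameter still has to be typed as (string s), because the compiler has no other argument from which to infer TSource. A method that also receives the source collection does not have this limitation.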

Functional approach

The final step to perfection is to replace the constructor call with something else, or to wrap it in something.
An obvious solution is to wrap it in a DistinctBy function. The compiler will easily infer the generic type arguments, so we don't even need to think about them anymore.

static class EnumerableExtensions
{
    public static IEnumerable<TSource> DistinctBy<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
    {
        return source.Distinct(new KeyComparer<TSource, TKey>(keySelector));
    }
}

And the updated Main method:

static void Main()
{
    var items = new[] { "Peter", "Albert", "John", "Jim", "Pavel", "Robert"};
    Console.WriteLine(string.Join(", ", items.DistinctBy(s => s[0])));
}
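For convenience, the pieces can be put together into one program, sketched below under a few assumptions. The extension method is renamed to DistinctByKey (a hypothetical name) so that it cannot clash with the built-in Enumerable.DistinctBy on .NET 6 and later. The sketch also shows that selectors for the last character or the length need no new classes. The expected results in the comments assume that Distinct yields the first occurrence of each key in source order, which is how current implementations behave, although the documentation describes the result as unordered.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class KeyComparer<TSource, TKey> : IEqualityComparer<TSource>
{
    private readonly Func<TSource, TKey> keySelector;

    public KeyComparer(Func<TSource, TKey> keySelector)
    {
        this.keySelector = keySelector;
    }

    public bool Equals(TSource x, TSource y)
    {
        return EqualityComparer<TKey>.Default.Equals(keySelector(x), keySelector(y));
    }

    public int GetHashCode(TSource obj)
    {
        var key = keySelector(obj);
        return key == null ? 0 : key.GetHashCode();
    }
}

static class EnumerableExtensions
{
    // Hypothetical name, chosen to avoid ambiguity with Enumerable.DistinctBy on .NET 6+.
    public static IEnumerable<TSource> DistinctByKey<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
    {
        // TSource is inferred from the source argument, TKey from the lambda.
        return source.Distinct(new KeyComparer<TSource, TKey>(keySelector));
    }
}

class Program
{
    static void Main()
    {
        var items = new[] { "Peter", "Albert", "John", "Jim", "Pavel", "Robert" };

        // By first character: Peter, Albert, John, Robert
        Console.WriteLine(string.Join(", ", items.DistinctByKey(s => s[0])));

        // By last character: Peter, Albert, John, Jim, Pavel
        Console.WriteLine(string.Join(", ", items.DistinctByKey(s => s[s.Length - 1])));

        // By length: Peter, Albert, John, Jim
        Console.WriteLine(string.Join(", ", items.DistinctByKey(s => s.Length)));
    }
}
```

Each new comparison rule is now a one-line lambda instead of a new comparer class.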

Conclusion

The same approach may be used in many other cases. It keeps the code simple and does not duplicate what already exists.
Also, today we found one more piece of evidence that static functions are better than constructors. If you have not yet read my article about that, you may want to read it.

That's all for today. Thanks for reading this article; it would be nice to read your thoughts in the comments.