Today I would like to tell you about functional approaches: why they are good, and why they are simple.
We need to write some code
Let us imagine that we have an array of strings, for example an array of names:
var items = new[]
{
    "Peter",
    "Albert",
    "John",
    "Jim",
    "Pavel",
    "Robert"
};
We would like to keep only one name per first character, namely the first one encountered: "Peter", "Albert", "John", "Robert".
In C# we have LINQ to operate on collections, and it has a Distinct function. But this function compares full names, while we need to compare only their first characters.
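To make the limitation concrete, here is a minimal sketch (the class name DistinctDemo is only illustrative). With the default comparer, Distinct treats each name as a whole string, so nothing is filtered out of our array:

```csharp
using System;
using System.Linq;

class DistinctDemo
{
    static void Main()
    {
        var items = new[] { "Peter", "Albert", "John", "Jim", "Pavel", "Robert" };

        // Distinct() with no arguments uses the default string equality,
        // so it compares whole names, not first characters.
        var distinct = items.Distinct().ToArray();

        // All six names differ as strings, so all six survive.
        Console.WriteLine(distinct.Length); // prints 6
    }
}
```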
.NET has had a DistinctBy function since .NET 6. But what if we target an older version of .NET, or even .NET Framework?
Then we are going to write our own DistinctBy implementation.
Procedural approach
Many times, in other people's code, I have come across exactly this procedural implementation:
using System;
using System.Collections.Generic;

static class EnumerableExtensions
{
    public static IEnumerable<TSource> DistinctBy<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
    {
        // Keys seen so far. HashSet<T>.Add returns false when the key is already present.
        var keys = new HashSet<TKey>();
        foreach (var item in source)
        {
            // Yield only the first item seen for each key.
            if (keys.Add(keySelector(item)))
            {
                yield return item;
            }
        }
    }
}

class Program
{
    static void Main()
    {
        var items = new[] { "Peter", "Albert", "John", "Jim", "Pavel", "Robert" };
        // Prints: Peter, Albert, John, Robert
        Console.WriteLine(string.Join(", ", items.DistinctBy(n => n[0])));
    }
}
It does work, and it is not so bad.
But now we have two separate implementations of almost the same thing: the Distinct and DistinctBy functions. How can we solve that?
Object-oriented approach
It may seem obvious to use the overload of the Distinct function that accepts an IEqualityComparer parameter. This way we control how the Distinct function compares names. Let us implement the IEqualityComparer interface with the necessary logic and apply it to our solution:
using System;
using System.Collections.Generic;
using System.Linq;

// Treats two strings as equal when their first characters match.
// Assumes non-null, non-empty strings.
class FirstCharacterComparer : IEqualityComparer<string>
{
    public bool Equals(string x, string y)
    {
        return x[0] == y[0];
    }

    public int GetHashCode(string obj)
    {
        // Must agree with Equals: equal first characters give equal hash codes.
        return obj[0].GetHashCode();
    }
}

class Program
{
    static void Main()
    {
        var items = new[] { "Peter", "Albert", "John", "Jim", "Pavel", "Robert" };
        Console.WriteLine(string.Join(", ", items.Distinct(new FirstCharacterComparer())));
    }
}
Much better? I agree, but how many classes would we have to create for such a small piece of logic? If we needed to compare last characters, would we create a LastCharacterComparer class? And to compare the first decimal digits of two numbers, would we create a FirstDecimalSymbolComparer? The object-oriented approach does give us flexibility, but class declarations are too verbose.
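To illustrate the verbosity, here is a sketch of what the hypothetical LastCharacterComparer mentioned above might look like. It is not part of .NET, and it assumes non-null, non-empty strings. Note that it repeats the whole shape of FirstCharacterComparer and differs only in the index:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical comparer: same structure as FirstCharacterComparer,
// only the character index changes.
class LastCharacterComparer : IEqualityComparer<string>
{
    public bool Equals(string x, string y)
    {
        return x[x.Length - 1] == y[y.Length - 1];
    }

    public int GetHashCode(string obj)
    {
        // Must agree with Equals: equal last characters give equal hash codes.
        return obj[obj.Length - 1].GetHashCode();
    }
}

class LastCharacterDemo
{
    static void Main()
    {
        var items = new[] { "Peter", "Albert", "John", "Jim", "Pavel", "Robert" };

        // "Albert" and "Robert" both end in 't', so only one of them is kept.
        Console.WriteLine(string.Join(", ", items.Distinct(new LastCharacterComparer())));
    }
}
```

Every new comparison rule would cost another class of about this size.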
Semi-functional approach
Let us describe our problem in a mathematical way.
We have a set X as input (the array of strings), a selector function X -> Y (where Y is the set of first characters) and a distinct function (X, (X -> Y)) -> Z (where Z is the set of names with unique first characters).
In our case, the selector function is s => s[0], and the distinct function is System.Linq.Enumerable.Distinct.
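The model can be written down almost literally as delegate types. The following is only a rough sketch: the names selector and distinct are illustrative, and the GroupBy-based body is a placeholder chosen to make the types concrete, not the implementation this article builds next.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class ModelDemo
{
    static void Main()
    {
        // Selector X -> Y: maps a name to its first character.
        Func<string, char> selector = s => s[0];

        // Distinct (X, (X -> Y)) -> Z: placeholder implementation that
        // groups by the selected key and keeps the first item of each group.
        Func<IEnumerable<string>, Func<string, char>, IEnumerable<string>> distinct =
            (xs, f) => xs.GroupBy(f).Select(g => g.First());

        var items = new[] { "Peter", "Albert", "John", "Jim", "Pavel", "Robert" };

        // Prints: Peter, Albert, John, Robert
        Console.WriteLine(string.Join(", ", distinct(items, selector)));
    }
}
```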
Now, keeping this model in mind, let us implement and apply IEqualityComparer again:
using System;
using System.Collections.Generic;
using System.Linq;

// Compares items by a key extracted with keySelector.
class KeyComparer<TSource, TKey> : IEqualityComparer<TSource>
{
    private readonly Func<TSource, TKey> keySelector;

    public KeyComparer(Func<TSource, TKey> keySelector)
    {
        this.keySelector = keySelector;
    }

    public bool Equals(TSource x, TSource y)
    {
        // The default comparer handles null keys, unlike calling Equals on the key itself.
        return EqualityComparer<TKey>.Default.Equals(keySelector(x), keySelector(y));
    }

    public int GetHashCode(TSource obj)
    {
        return EqualityComparer<TKey>.Default.GetHashCode(keySelector(obj));
    }
}

class Program
{
    static void Main()
    {
        var items = new[] { "Peter", "Albert", "John", "Jim", "Pavel", "Robert" };
        Console.WriteLine(string.Join(", ", items.Distinct(new KeyComparer<string, char>(s => s[0]))));
    }
}
Yes, now we have a general solution that builds a set of unique items from an input set by comparing the items on a chosen property.
But the expression new KeyComparer<string, char>(s => s[0]) looks redundant. The C# compiler can't infer generic type arguments for a constructor call, which increases the amount of code.
Functional approach
The final step to perfection is to replace the constructor call with something else, or to wrap it in something.
An obvious solution is to wrap it in a DistinctBy function. The compiler will easily infer the generic type arguments, so we don't even need to think about them anymore.
static class EnumerableExtensions
{
    public static IEnumerable<TSource> DistinctBy<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
    {
        return source.Distinct(new KeyComparer<TSource, TKey>(keySelector));
    }
}
And the updated Main method:
static void Main()
{
    var items = new[] { "Peter", "Albert", "John", "Jim", "Pavel", "Robert" };
    Console.WriteLine(string.Join(", ", items.DistinctBy(s => s[0])));
}
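The other option mentioned at the start of this section, replacing the constructor call with something else, could be sketched as a static factory method. This is an assumption about what that alternative might look like: the non-generic KeyComparer class and its Create method are hypothetical names, and KeyComparer<TSource, TKey> is repeated only so that the sketch compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class KeyComparer<TSource, TKey> : IEqualityComparer<TSource>
{
    private readonly Func<TSource, TKey> keySelector;

    public KeyComparer(Func<TSource, TKey> keySelector)
    {
        this.keySelector = keySelector;
    }

    public bool Equals(TSource x, TSource y)
    {
        return EqualityComparer<TKey>.Default.Equals(keySelector(x), keySelector(y));
    }

    public int GetHashCode(TSource obj)
    {
        return EqualityComparer<TKey>.Default.GetHashCode(keySelector(obj));
    }
}

// Hypothetical non-generic companion class. It can share the name
// because it differs from KeyComparer<TSource, TKey> in generic arity.
static class KeyComparer
{
    // Generic method: the compiler can infer TSource and TKey from the argument.
    public static KeyComparer<TSource, TKey> Create<TSource, TKey>(Func<TSource, TKey> keySelector)
    {
        return new KeyComparer<TSource, TKey>(keySelector);
    }
}

class FactoryDemo
{
    static void Main()
    {
        var items = new[] { "Peter", "Albert", "John", "Jim", "Pavel", "Robert" };

        // The lambda parameter type is written out so that TSource can be inferred.
        var comparer = KeyComparer.Create((string s) => s[0]);

        Console.WriteLine(string.Join(", ", items.Distinct(comparer)));
    }
}
```

The factory removes the explicit type arguments, but the call site still has to state the lambda parameter type, so the DistinctBy wrapper remains the shorter option here.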
Conclusion
The same approach may be used in many other cases. It keeps the code simple and reuses existing functionality instead of duplicating it.
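As one example of such a case, the same key-based comparer can be passed to other LINQ operators that accept an IEqualityComparer, such as Except. The sketch below is illustrative only: ExceptByKey is a hypothetical name, chosen so it cannot be confused with the ExceptBy method that ships with .NET 6 and later, and KeyComparer is repeated so the block compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class KeyComparer<TSource, TKey> : IEqualityComparer<TSource>
{
    private readonly Func<TSource, TKey> keySelector;

    public KeyComparer(Func<TSource, TKey> keySelector)
    {
        this.keySelector = keySelector;
    }

    public bool Equals(TSource x, TSource y)
    {
        return EqualityComparer<TKey>.Default.Equals(keySelector(x), keySelector(y));
    }

    public int GetHashCode(TSource obj)
    {
        return EqualityComparer<TKey>.Default.GetHashCode(keySelector(obj));
    }
}

static class SetExtensions
{
    // Hypothetical helper: items of first whose key does not occur among the keys of second.
    public static IEnumerable<TSource> ExceptByKey<TSource, TKey>(
        this IEnumerable<TSource> first,
        IEnumerable<TSource> second,
        Func<TSource, TKey> keySelector)
    {
        return first.Except(second, new KeyComparer<TSource, TKey>(keySelector));
    }
}

class ExceptDemo
{
    static void Main()
    {
        var items = new[] { "Peter", "Albert", "John", "Jim", "Pavel", "Robert" };
        var excluded = new[] { "Paul", "Jack" };

        // Drops every name starting with 'P' or 'J'; "Albert" and "Robert" remain.
        Console.WriteLine(string.Join(", ", items.ExceptByKey(excluded, s => s[0])));
    }
}
```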
We also found one more piece of evidence today that static functions are better than constructors. If you have not yet read my article about that, you may want to read it.
That's all for today. Thanks for reading this article; it would be nice to read your thoughts in the comments.
Top comments (7)
What about items.GroupBy(a => a.First()).Select(a => a.First()).ToList()? Mathematically this solution is 100% correct :)
Good, thanks.
I think dotnet should add the ability to use a predicate instead of IEqualityComparer (and IComparer) in the default api.
Yes!
Or they should add a possibility to implement an interface by delegate.
Nice!
FYI next time, if you're writing a code block for C# code you can use ```cs in the Markdown and it will generate nice colorful C# code
Thank you! :)
so cool!