Let’s talk about how LINQ’s GroupBy method makes grouping data in C# easy. We’ll start with the basics and then go over every available overload of the GroupBy method to explore advanced use cases.
GroupBy lets you quickly group a collection of related items by a specific property of each item. The result is a collection of groups, each holding the items that share the same key.
Note: LINQ provides variants of each method in this article that work with either IEnumerable or IQueryable. These methods are otherwise identical, so for the purposes of this article we will ignore the difference in data source type.
Let’s take a look at a sample involving a small data set of books:
Assuming I have this data loaded into a collection of books, I can then use GroupBy to group items by various properties by specifying the key selector.
For example, I’ll group by author by using books.GroupBy(b => b.Author).
This results in an IEnumerable<IGrouping<string, Book>>. Don’t let that return type scare you. It just means a collection of groups where each group is keyed on a string value (whatever value your key selector returned) and contains the Book items that share that key.
If that is a mouthful, here is the short version: we split our collection into multiple sub-collections. Each one has a Key property of whatever type we grouped on, and itself serves as an IEnumerable<T> that lets us enumerate the items in the group.
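As a minimal sketch of that idea, the following groups a few books by author and walks each group. The Book record and the sample titles are illustrative assumptions, not the article’s original data set.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative sample data (assumed, not the article's original data set).
var books = new List<Book>
{
    new Book("Jurassic Park", "Michael Crichton"),
    new Book("The Hobbit", "J.R.R. Tolkien"),
    new Book("Sphere", "Michael Crichton"),
};

// The key selector decides which group each book lands in.
IEnumerable<IGrouping<string, Book>> byAuthor = books.GroupBy(b => b.Author);

foreach (IGrouping<string, Book> group in byAuthor)
{
    // Key holds the value the key selector returned for this group.
    Console.WriteLine(group.Key);

    // The group itself is an IEnumerable<Book>.
    foreach (Book book in group)
    {
        Console.WriteLine($"  {book.Title}");
    }
}

// Hypothetical model type used only for this sketch.
record Book(string Title, string Author);
```

Groups come back in the order their keys first appear in the source, so here the Michael Crichton group is listed before the J.R.R. Tolkien group.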
If you don’t want the entire grouped item in the sub-collection, you can use the overload that also takes an element selector. The element selector is just a function that selects the part of the object that each grouped element will contain.
For example, if in our earlier example I called books.GroupBy(b => b.Author, b => b.Title), I would get a collection of groups by author, each containing only the string values for the titles by that author.
Admittedly, the results here look a little odd when serialized to JSON. This is because the group key is not included in the serialized results, so we don’t see the author listed. Rest assured that you can still get at the author by looking at the Key property of each group.
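Here is a small sketch of the element selector overload, again using an assumed Book record and illustrative sample data. It shows that each group now holds plain title strings while the author remains available through Key.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative sample data (assumed for this sketch).
var books = new List<Book>
{
    new Book("Jurassic Park", "Michael Crichton"),
    new Book("The Hobbit", "J.R.R. Tolkien"),
    new Book("Sphere", "Michael Crichton"),
};

// Key selector picks the author; element selector keeps only the title.
IEnumerable<IGrouping<string, string>> titlesByAuthor =
    books.GroupBy(b => b.Author, b => b.Title);

foreach (IGrouping<string, string> group in titlesByAuthor)
{
    // Each element is just a title string, but the author is still on Key.
    Console.WriteLine($"{group.Key}: {string.Join(", ", group)}");
}

// Hypothetical model type used only for this sketch.
record Book(string Title, string Author);
```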
Now that we’ve seen how to work with key and element selectors, let’s introduce a third type of selector: result selectors.
Result selectors let you customize the generated collection. Instead of working with an IGrouping<TKey, TElement>, you can project each group into whatever shape you’d like.
Let’s use this to address the serialization quirk we saw with the last example:
Here we select the author and title for key and element, like we did before, but now we project each group into a new anonymous type, setting an Author property to the key of each group and storing the group’s title values in a collection property.
The end JSON is much more useful for representing our group:
One important point here: with this overload, we are no longer returning an IEnumerable<IGrouping<TKey, TElement>> but rather an IEnumerable<TProjected>, where TProjected is whatever our result selector returns.
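The result selector overload might look roughly like the sketch below. The Book record, the sample data, and the Titles property name are assumptions for illustration, and System.Text.Json is used only because the article does not say which serializer it relied on.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Illustrative sample data (assumed for this sketch).
var books = new List<Book>
{
    new Book("Jurassic Park", "Michael Crichton"),
    new Book("The Hobbit", "J.R.R. Tolkien"),
    new Book("Sphere", "Michael Crichton"),
};

// Key selector, element selector, then a result selector that shapes each group.
var authors = books
    .GroupBy(
        b => b.Author,
        b => b.Title,
        (author, titles) => new { Author = author, Titles = titles.ToList() })
    .ToList();

// Each item is the anonymous type from the result selector, not an IGrouping.
foreach (var entry in authors)
{
    Console.WriteLine($"{entry.Author}: {entry.Titles.Count} title(s)");
}

// The author is now an ordinary property, so it appears in the serialized output.
Console.WriteLine(JsonSerializer.Serialize(authors));

// Hypothetical model type used only for this sketch.
record Book(string Title, string Author);
```

Because the projection is an anonymous type, var is the practical way to hold the result. A small named class or record works just as well if the shape needs to cross a method boundary.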
The last possible parameter to GroupBy is an equality comparer. Comparers are used to determine which group an item belongs in, and they can help when keys that should match are ending up in separate groups.
For example, let’s say that your data has a few rows with different casing for the same author:
- Michael Crichton
- michael crichton
- Michael CRichton
We can pass in an IEqualityComparer<TKey> that will be used to compare key values. Since our key, author, is a string value, we need an IEqualityComparer<string>.
Thankfully, .NET ships with several of these built into the StringComparer class. In our case, we’ll use StringComparer.CurrentCultureIgnoreCase to compare our authors:
Using the comparer in this way will ignore any casing differences between author entries.
You may be wondering which key value is used if multiple values can map to the same group. The answer is that LINQ uses the first value encountered in that group as the official group key. This means that even if you use a StringComparer to ignore casing differences, you could still wind up with a key value that does not match the ideal formatting.
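To make this concrete, here is a sketch that groups three books whose author names use the three spellings listed earlier. The Book record and the choice of titles are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative data: the same author with three different casings.
var books = new List<Book>
{
    new Book("Jurassic Park", "Michael Crichton"),
    new Book("Sphere", "michael crichton"),
    new Book("Congo", "Michael CRichton"),
};

// Without a comparer, each spelling becomes its own key.
int defaultGroupCount = books.GroupBy(b => b.Author).Count();

// With a case-insensitive comparer, all three books share one group.
var merged = books
    .GroupBy(b => b.Author, StringComparer.CurrentCultureIgnoreCase)
    .ToList();

Console.WriteLine(defaultGroupCount); // 3
Console.WriteLine(merged.Count);      // 1

// The key is the first spelling encountered in the source.
Console.WriteLine(merged[0].Key);     // Michael Crichton

// Hypothetical model type used only for this sketch.
record Book(string Title, string Author);
```

If the first book in the list had used the all-lowercase spelling, that lowercase string would have become the group key instead.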
This is why I would advocate that, before reaching for an IEqualityComparer, you consider cleaning up and normalizing your data source instead.
That said, if you need to group elements on some criteria other than the key type’s default equality, implementing a custom IEqualityComparer can be the way to go. I would expect, however, that these cases would be few and far between.
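For those rare cases, a custom comparer could look something like the sketch below. TrimmedNameComparer is a hypothetical class written for this example. It treats two names as equal when they match after trimming whitespace and ignoring case.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative keys: two spellings of one author plus a second author.
var authors = new[] { "Michael Crichton", "  michael crichton ", "J.R.R. Tolkien" };

var groups = authors
    .GroupBy(a => a, new TrimmedNameComparer())
    .ToList();

foreach (var group in groups)
{
    Console.WriteLine($"'{group.Key}' has {group.Count()} entries");
}

// Hypothetical comparer: equal after trimming whitespace and ignoring case.
sealed class TrimmedNameComparer : IEqualityComparer<string>
{
    public bool Equals(string? x, string? y) =>
        string.Equals(x?.Trim(), y?.Trim(), StringComparison.OrdinalIgnoreCase);

    // Equal values must produce equal hash codes, so hash the same normalized form.
    public int GetHashCode(string obj) =>
        StringComparer.OrdinalIgnoreCase.GetHashCode(obj.Trim());
}
```

The main design constraint is that Equals and GetHashCode agree: any two keys the comparer considers equal must hash to the same value, otherwise they can still end up in separate groups.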
Before writing this article, I found the IGrouping result too inconvenient to work with and iterate over, and I largely avoided LINQ’s GroupBy method.
Upon fully exploring this method and its overloads, I think there are a number of compelling reasons to use GroupBy, particularly the variant that allows you to project groups into custom objects and formats.
If you still have questions or would like to learn more about the material in question, take a look at MSDN’s documentation on the GroupBy method.
Whether or not GroupBy joins the set of tools you use frequently, it is a powerful and capable component of LINQ and a tool to keep in mind.