Let’s talk about how LINQ’s GroupBy method makes grouping data in C# easy. We’ll start basic and then go over every available overload of the GroupBy method to explore advanced use cases.
GroupBy allows you to quickly group collections of related data by specific properties on your data. The grouped data is then arranged into sub-collections, one for each group.
Note: LINQ provides variants of each method in this article that work with either IEnumerable or IQueryable. These methods are otherwise identical, so for the purposes of this article we will ignore the difference in data source type.
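To illustrate the note above, here is a minimal sketch. It assumes a hypothetical Book record with Title and Author properties, a small made-up in-memory list, and C# 9 or later (for the record syntax). The same key selector lambda is accepted by both Enumerable.GroupBy and Queryable.GroupBy; only the source and return types differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical book type, used only for illustration.
public record Book(string Title, string Author);

public static class QueryableVsEnumerable
{
    // Hypothetical in-memory sample data.
    public static readonly List<Book> Books = new()
    {
        new Book("Jurassic Park", "Michael Crichton"),
        new Book("Sphere", "Michael Crichton"),
        new Book("Dune", "Frank Herbert"),
    };

    // Enumerable.GroupBy: the key selector is a Func<Book, string>
    // and the grouping runs in memory.
    public static IEnumerable<IGrouping<string, Book>> GroupInMemory() =>
        Books.GroupBy(b => b.Author);

    // Queryable.GroupBy: the identical lambda is captured as an
    // Expression<Func<Book, string>> that a query provider could translate.
    public static IQueryable<IGrouping<string, Book>> GroupAsQuery() =>
        Books.AsQueryable().GroupBy(b => b.Author);

    public static void Main()
    {
        Console.WriteLine(GroupInMemory().Count()); // prints 2
        Console.WriteLine(GroupAsQuery().Count());  // prints 2
    }
}
```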
Simple Grouping
Let’s take a look at a sample involving a small data set of books.
Assuming I have this data loaded into a collection of IEnumerable<Book> called books, I can then use GroupBy to group the items by various properties, picking the property with the key selector.
For example, I’ll group by author using books.GroupBy(b => b.Author). This results in an IEnumerable<IGrouping<string, Book>>. Don’t let that return type scare you: it just means a collection of groups, where each group is keyed by a string value (whatever value your key selector returned) and contains Book objects.
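Here is a minimal sketch of that call with the return type spelled out. The Book record and the sample data are hypothetical stand-ins for the article’s data set, and the record syntax assumes C# 9 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical book type; the article's own sample data is not reproduced here.
public record Book(string Title, string Author);

public static class SimpleGrouping
{
    // Hypothetical sample data.
    public static readonly List<Book> Books = new()
    {
        new Book("Jurassic Park", "Michael Crichton"),
        new Book("Dune", "Frank Herbert"),
        new Book("Sphere", "Michael Crichton"),
        new Book("Children of Dune", "Frank Herbert"),
        new Book("Neuromancer", "William Gibson"),
    };

    // The key selector returns a string, so TKey is inferred as string
    // and each group holds the Book objects that share that author.
    public static IEnumerable<IGrouping<string, Book>> GroupByAuthor(IEnumerable<Book> books) =>
        books.GroupBy(b => b.Author);

    public static void Main()
    {
        IEnumerable<IGrouping<string, Book>> byAuthor = GroupByAuthor(Books);
        foreach (IGrouping<string, Book> group in byAuthor)
        {
            Console.WriteLine($"{group.Key}: {group.Count()}");
        }
    }
}
```

Groups come back in the order in which each key first appears in the source, so Michael Crichton’s group is first here.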
If that is a mouthful, here is another way to think about it.
Essentially, we split our collection into multiple sub-collections. Each one of those has a Key property of whatever type we grouped on, and itself serves as an IEnumerable<T> that allows us to enumerate over the items in the group.
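A short sketch of that structure in practice: read each group’s Key, then loop over the group itself. Book and the sample data are hypothetical, and Describe is an illustrative helper name, not part of LINQ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical book type, used only for illustration.
public record Book(string Title, string Author);

public static class IteratingGroups
{
    // Hypothetical sample data.
    public static readonly List<Book> Books = new()
    {
        new Book("Jurassic Park", "Michael Crichton"),
        new Book("Dune", "Frank Herbert"),
        new Book("Sphere", "Michael Crichton"),
    };

    // Builds one line of text per group: the key, then the titles in that group.
    public static List<string> Describe(IEnumerable<Book> books)
    {
        var lines = new List<string>();
        foreach (IGrouping<string, Book> group in books.GroupBy(b => b.Author))
        {
            var titles = new List<string>();
            // The group is itself an IEnumerable<Book>, so we can loop over it.
            foreach (Book book in group)
            {
                titles.Add(book.Title);
            }
            // group.Key holds the value the key selector returned (the author).
            lines.Add($"{group.Key}: {string.Join(", ", titles)}");
        }
        return lines;
    }

    public static void Main()
    {
        foreach (string line in Describe(Books))
        {
            Console.WriteLine(line);
        }
    }
}
```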
GroupBy Element Selectors
If you don’t want the entire grouped item in each sub-collection, you can use the overload that also takes an element selector. The element selector is just a function that selects the part of each object that the group will contain.
For example, if in our earlier example I used books.GroupBy(b => b.Author, b => b.Title), I would get a collection of groups by author, each containing only the string titles by that author.
Admittedly, the results here look a little odd when serialized to JSON. This is because the group key is not included in the serialized results, so we don’t see the author listed. Rest assured that you can still get at the author by looking at the Key property of each group.
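A minimal sketch of the element selector overload, assuming a hypothetical Book record and made-up sample data. The elements are now plain title strings, while the author remains available through Key.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical book type, used only for illustration.
public record Book(string Title, string Author);

public static class ElementSelector
{
    // Hypothetical sample data.
    public static readonly List<Book> Books = new()
    {
        new Book("Jurassic Park", "Michael Crichton"),
        new Book("Dune", "Frank Herbert"),
        new Book("Sphere", "Michael Crichton"),
    };

    // Key selector picks the author; element selector keeps only the title,
    // so each group is an IGrouping<string, string>.
    public static List<IGrouping<string, string>> TitlesByAuthor(IEnumerable<Book> books) =>
        books.GroupBy(b => b.Author, b => b.Title).ToList();

    public static void Main()
    {
        foreach (IGrouping<string, string> group in TitlesByAuthor(Books))
        {
            // The author is not among the elements, but Key still exposes it.
            Console.WriteLine($"{group.Key}: {string.Join(", ", group)}");
        }
    }
}
```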
Result Selectors
Now that we’ve seen how to work with key and element selectors, let’s introduce a third type of selector: result selectors.
Result selectors let you customize the generated collection. Instead of working with an IGrouping<TKey, TElement>, you can project each group into whatever shape you’d like it to be in.
Let’s use this to address the serialization quirk we saw in the last example. We select the author and title for key and element, like we did before, but now we project each group into a new anonymous type, setting an Author property to the key of the group and a second property to the group’s collection of title values.
The resulting JSON is much more useful for representing our groups, since each object now includes its author.
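A minimal sketch of that projection, assuming a hypothetical Book record and made-up sample data. It also assumes System.Text.Json as the serializer, since the article does not say which serializer produced its JSON, and the Titles property name is an illustrative choice.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Hypothetical book type, used only for illustration.
public record Book(string Title, string Author);

public static class ResultSelector
{
    // Hypothetical sample data.
    public static readonly List<Book> Books = new()
    {
        new Book("Jurassic Park", "Michael Crichton"),
        new Book("Dune", "Frank Herbert"),
        new Book("Sphere", "Michael Crichton"),
    };

    public static string SerializeGroups(IEnumerable<Book> books)
    {
        // Key selector: author. Element selector: title.
        // Result selector: receives each key plus its elements and projects
        // them into an anonymous object that carries the author explicitly.
        var groups = books.GroupBy(
            b => b.Author,
            b => b.Title,
            (author, titles) => new { Author = author, Titles = titles });

        return JsonSerializer.Serialize(groups);
    }

    public static void Main() => Console.WriteLine(SerializeGroups(Books));
}
```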
One important point here: with this overload, we are no longer returning an IEnumerable<IGrouping<TKey, TElement>> but rather an IEnumerable<TResult>, where TResult is whatever type our result selector returns.
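To make that return type concrete, here is a sketch that projects into a named type instead of an anonymous one. Book, AuthorSummary, and the sample data are hypothetical. The sketch also uses the overload that takes a key selector and a result selector without an element selector, so the result selector receives the Book objects themselves.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical types, used only for illustration.
public record Book(string Title, string Author);
public record AuthorSummary(string Author, int BookCount);

public static class NamedResult
{
    // Hypothetical sample data.
    public static readonly List<Book> Books = new()
    {
        new Book("Jurassic Park", "Michael Crichton"),
        new Book("Dune", "Frank Herbert"),
        new Book("Sphere", "Michael Crichton"),
    };

    // The result selector returns AuthorSummary, so TResult is AuthorSummary
    // and the overall result is IEnumerable<AuthorSummary>, not IGroupings.
    public static IEnumerable<AuthorSummary> Summarize(IEnumerable<Book> books) =>
        books.GroupBy(
            b => b.Author,
            (author, authorBooks) => new AuthorSummary(author, authorBooks.Count()));

    public static void Main()
    {
        foreach (AuthorSummary summary in Summarize(Books))
        {
            Console.WriteLine(summary);
        }
    }
}
```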
Equality Comparer
The last possible parameter to GroupBy is an equality comparer. The comparer decides whether two keys are equal, and therefore which group an item belongs in, so it can be helpful if you have data that is not being grouped properly.
For example, let’s say that your data has a few rows with different casing for the same author:
- Michael Crichton
- michael crichton
- Michael CRichton
We can pass in an IEqualityComparer<TKey> that will be used to compare key values. Since our key, author, is a string value, we need an IEqualityComparer<string>.
Thankfully, .NET ships with several of these built into the StringComparer class. In our case, we’ll use StringComparer.CurrentCultureIgnoreCase to compare our authors.
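A minimal sketch of passing that comparer, assuming a hypothetical Book record and made-up data that mirrors the casing variants listed above. It contrasts the default, case-sensitive string equality with StringComparer.CurrentCultureIgnoreCase.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical book type, used only for illustration.
public record Book(string Title, string Author);

public static class CaseInsensitiveGrouping
{
    // Hypothetical data with the inconsistent casing described above.
    public static readonly List<Book> Books = new()
    {
        new Book("Jurassic Park", "Michael Crichton"),
        new Book("Sphere", "michael crichton"),
        new Book("Congo", "Michael CRichton"),
        new Book("Dune", "Frank Herbert"),
    };

    // Default string equality is case-sensitive: each spelling gets its own group.
    public static int GroupCountWithoutComparer(IEnumerable<Book> books) =>
        books.GroupBy(b => b.Author, b => b.Title).Count();

    // With a case-insensitive comparer, all three Crichton spellings share a group.
    public static List<IGrouping<string, string>> TitlesByAuthor(IEnumerable<Book> books) =>
        books.GroupBy(b => b.Author, b => b.Title, StringComparer.CurrentCultureIgnoreCase)
             .ToList();

    public static void Main()
    {
        Console.WriteLine(GroupCountWithoutComparer(Books)); // prints 4
        Console.WriteLine(TitlesByAuthor(Books).Count);      // prints 2
    }
}
```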
Using the comparer in this way will ignore any casing differences between author entries.
You may be wondering which key value is used when multiple distinct values compare equal and land in the same group. The answer is that LINQ uses the first value encountered for that group as the official group key. This means that even if you use a StringComparer to ignore casing differences, you could still wind up with a key value that doesn’t match the ideal formatting.
This is why I would advocate that, if you consider using an IEqualityComparer, you should first consider whether cleaning up and normalizing your data source would be the better fix.
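The sketch below shows both halves of that advice: the first-encountered key winning, and a normalize-then-group alternative. Book, the sample data, and the CanonicalAuthors lookup table are hypothetical. A real cleanup step would depend on your data source.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical book type, used only for illustration.
public record Book(string Title, string Author);

public static class KeyNormalization
{
    // Hypothetical data: the lower-cased spelling happens to appear first.
    public static readonly List<Book> Books = new()
    {
        new Book("Sphere", "michael crichton"),
        new Book("Jurassic Park", "Michael Crichton"),
        new Book("Dune", "Frank Herbert"),
    };

    // With a case-insensitive comparer both Crichton entries share one group,
    // and that group's Key is the first author value encountered.
    public static string FirstKeyWithComparer(IEnumerable<Book> books) =>
        books.GroupBy(b => b.Author, StringComparer.CurrentCultureIgnoreCase).First().Key;

    // Hypothetical table of canonical spellings used to clean up the data.
    private static readonly Dictionary<string, string> CanonicalAuthors =
        new(StringComparer.OrdinalIgnoreCase)
        {
            ["michael crichton"] = "Michael Crichton",
            ["frank herbert"] = "Frank Herbert",
        };

    // Replace each author with its canonical spelling, when one is known.
    public static IEnumerable<Book> Normalize(IEnumerable<Book> books) =>
        books.Select(b => CanonicalAuthors.TryGetValue(b.Author, out var canonical)
            ? b with { Author = canonical }
            : b);

    // After normalizing, the default comparer is enough and the key is clean.
    public static string FirstKeyAfterNormalizing(IEnumerable<Book> books) =>
        Normalize(books).GroupBy(b => b.Author).First().Key;

    public static void Main()
    {
        Console.WriteLine(FirstKeyWithComparer(Books));     // prints: michael crichton
        Console.WriteLine(FirstKeyAfterNormalizing(Books)); // prints: Michael Crichton
    }
}
```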
That said, if you need to group elements by some criteria other than the key type’s default equality, implementing a custom IEqualityComparer can be the way to go. I would expect, however, that these cases would be few and far between.
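As one hypothetical example of such a case, here is a sketch of a custom comparer that treats two author names as equal when their last names match, ignoring case. LastNameComparer, the made-up names and titles, and the last-word heuristic are all illustrative assumptions. The key requirement shown is that Equals and GetHashCode agree, so both work from the same extracted value.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical book type, used only for illustration.
public record Book(string Title, string Author);

// Hypothetical comparer: names are "equal" when their last words match,
// ignoring case. Equals and GetHashCode both use the same extracted last name.
public sealed class LastNameComparer : IEqualityComparer<string>
{
    private static string LastName(string name) =>
        name.Split(' ', StringSplitOptions.RemoveEmptyEntries).LastOrDefault() ?? string.Empty;

    public bool Equals(string? x, string? y)
    {
        if (x is null || y is null) return x is null && y is null;
        return string.Equals(LastName(x), LastName(y), StringComparison.OrdinalIgnoreCase);
    }

    public int GetHashCode(string obj) =>
        StringComparer.OrdinalIgnoreCase.GetHashCode(LastName(obj));
}

public static class CustomComparerGrouping
{
    // Hypothetical sample data with made-up names and titles.
    public static readonly List<Book> Books = new()
    {
        new Book("Title A", "Ann Smith"),
        new Book("Title B", "Bob Smith"),
        new Book("Title C", "Cy Jones"),
    };

    public static List<IGrouping<string, string>> TitlesByLastName(IEnumerable<Book> books) =>
        books.GroupBy(b => b.Author, b => b.Title, new LastNameComparer()).ToList();

    public static void Main()
    {
        foreach (IGrouping<string, string> group in TitlesByLastName(Books))
        {
            // Key is the first author seen for each last name.
            Console.WriteLine($"{group.Key}: {string.Join(", ", group)}");
        }
    }
}
```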
Closing Thoughts
Before writing this article, I found the IGrouping result too inconvenient to work with and iterate over, and I largely avoided LINQ GroupBy syntax.
Upon fully exploring this method and its overloads, I think there are a number of compelling reasons to use GroupBy, particularly the variant that allows you to project groups into custom objects and formats.
If you still have questions or would like to learn more about the material in question, take a look at MSDN’s documentation on the GroupBy method.
Whether or not GroupBy joins the set of tools you use frequently, it is a powerful and capable component of LINQ and a tool to keep in mind.
The post LINQ GroupBy in Depth appeared first on Kill All Defects.