Sam Walpole

Posted on Apr 25, 2021 • Originally published at samwalpole.com

LINQ: Beware of deferred execution

#csharp #dotnet #dotnetcore #tutorial

If you've spent much time around C# and .NET, you have probably come across LINQ (Language-Integrated Query), which lets you use a range of powerful querying capabilities directly in the C# language.

The example below demonstrates a couple of common features of LINQ (note that I am using the extension method syntax rather than LINQ query expressions). In the example, we have a list of people and want to obtain a list of the names of the adults in that list. We will then iterate over those names twice (this will be useful for demonstrating the differences between immediate and deferred execution).

Using LINQ, we can:

Filter by age using Where
Map from a Person object to the name string using Select
Evaluate the query to a list using ToList

using System;
using System.Collections.Generic;
using System.Linq;

public class Person
{
    public string Name { get; set; }
    public int Age { get; set; }
}

var people = new List<Person>
{
    new Person { Name = "Sam", Age = 27 },
    new Person { Name = "Suzie", Age = 17 },
    new Person { Name = "Harry", Age = 23 },
};

var adultNames = people
    .Where(person => 
    {
        Console.WriteLine("Filtering by age...");
        return person.Age >= 18;
    })
    .Select(person => person.Name)
    .ToList();

foreach(var name in adultNames)
    Console.WriteLine(name);

foreach(var name in adultNames)
    Console.WriteLine(name);

/* output
Filtering by age...
Filtering by age...
Filtering by age...
Sam
Harry
Sam
Harry
*/

Notice that in the above example, we explicitly convert the query to a list. This immediately executes the query to give a new list containing only the adult names, which we can then iterate over.

So what happens if we leave off the ToList?

var adultNames = people
    .Where(person => 
    {
        Console.WriteLine("Filtering by age...");
        return person.Age >= 18;
    })
    .Select(person => person.Name);

foreach(var name in adultNames)
    Console.WriteLine(name);

foreach(var name in adultNames)
    Console.WriteLine(name);

/* output
Filtering by age...
Sam
Filtering by age...
Filtering by age...
Harry
Filtering by age...
Sam
Filtering by age...
Filtering by age...
Harry
*/

Now the output looks quite different. Instead of doing all the filtering first and then iterating over the adult names, each item is now filtered immediately before it is handed to the loop. Importantly, the filtering also happens again every time we iterate over the items. This is known as deferred execution, because the query is not evaluated until we actually need its values.
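To see why nothing runs until the loop asks for an item, it can help to write a Where-like operator by hand. The sketch below is a simplified, hypothetical MyWhere built with yield return. It is not the actual .NET implementation (which, among other things, validates its arguments eagerly), but it shows the same deferred, one-item-at-a-time behaviour.

```csharp
using System;
using System.Collections.Generic;

public static class DeferredSketch
{
    // Hypothetical, simplified Where-like operator (not the real BCL code).
    // Because the body uses `yield return`, the compiler turns it into an
    // iterator: none of this code runs until someone enumerates the result.
    public static IEnumerable<T> MyWhere<T>(this IEnumerable<T> source, Func<T, bool> predicate)
    {
        foreach (var item in source)
        {
            // The predicate runs for one item at a time, just before that
            // item is handed back to the caller's loop.
            if (predicate(item))
                yield return item;
        }
    }
}
```

Calling MyWhere only creates the iterator object. The predicate runs when a foreach (or ToList) pulls items from it, and it runs again on every fresh enumeration.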

Benefits of Deferred Execution

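Deferred execution is the default behaviour for LINQ operators that return a sequence, such as Where and Select. Operators that return a single value (Count, First, Sum) or a materialized collection (ToList, ToArray, ToDictionary) execute the query immediately. So there must be some benefit to deferring, right?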
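A quick way to check which operators are deferred is to count how often the predicate runs. In this sketch (the class and method names are hypothetical), the counter is still zero after Where, but Count forces a full pass over the source straight away.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ImmediateVsDeferred
{
    // Hypothetical helper: reports how many times the predicate had run right
    // after Where, and again right after Count, plus the number of adults.
    public static (int AfterWhere, int AfterCount, int Adults) CountPredicateCalls(IEnumerable<int> ages)
    {
        var calls = 0;

        // Where returns a sequence, so execution is deferred: nothing runs yet.
        var adults = ages.Where(age => { calls++; return age >= 18; });
        var afterWhere = calls;

        // Count returns a single value, so it enumerates the query immediately.
        var adultCount = adults.Count();
        var afterCount = calls;

        return (afterWhere, afterCount, adultCount);
    }
}
```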

1. Better Performance

In many cases deferred execution gives better performance, since the query does not have to be run over the whole data set up front or stored in an intermediate collection. Instead, each item is processed as you iterate over it, and if you stop early (for example with First or Take, or by breaking out of a foreach), the remaining items are never processed at all.
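The early-exit case can be sketched by counting predicate calls. Here, as an assumption for brevity, tuples stand in for the Person class, and FindFirstAdult is a hypothetical helper. Because FirstOrDefault stops pulling items as soon as it finds a match, the predicate never sees the people after the first adult.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class EarlyExit
{
    // Hypothetical helper: finds the first adult's name and reports how many
    // people the Where predicate actually inspected along the way.
    public static (string? FirstAdult, int Inspected) FindFirstAdult(IEnumerable<(string Name, int Age)> people)
    {
        var inspected = 0;

        var firstAdult = people
            .Where(p => { inspected++; return p.Age >= 18; })
            .Select(p => p.Name)
            .FirstOrDefault(); // stops enumerating at the first match

        return (firstAdult, inspected);
    }
}
```

With Suzie (17), Sam (27) and Harry (23), in that order, only Suzie and Sam are inspected; Harry is never filtered at all.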

2. Query Construction

Since the query does not need to be immediately executed, you can build it up in several steps, perhaps passing through additional conditional logic. This gives you additional power to create more complex queries.

public IEnumerable<string> GetNames(IEnumerable<Person> people, bool onlyAdults)
{
    var query = people.AsEnumerable();

    if (onlyAdults)
    {
        // only add this filter when onlyAdults is true
        query = query.Where(person => person.Age >= 18);
    }

    // nothing has executed yet; ToList runs the composed query once
    return query.Select(person => person.Name).ToList();
}

3. Always re-evaluated

Since the query is re-evaluated on every enumeration, you can add, remove, or change elements of your collection after the query has been constructed, and the query will see those changes. In this way, you know that you are always iterating over the most up-to-date data.

var people = new List<Person>
{
    new Person { Name = "Sam", Age = 27 },
    new Person { Name = "Suzie", Age = 17 },
    new Person { Name = "Harry", Age = 23 },
};

var adultNames = people
    .Where(person => person.Age >= 18)
    .Select(person => person.Name);

foreach(var name in adultNames)
    Console.WriteLine(name);

people.Add(new Person { Name = "Sally", Age = 26 });

foreach(var name in adultNames)
    Console.WriteLine(name);

/* output
Sam
Harry
Sam
Harry
Sally
*/
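The flip side of re-evaluation is that a materialized list is a snapshot. The sketch below (hypothetical names, tuples in place of Person) keeps both a deferred query and a ToList copy of it, then adds Sally to the source:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class LiveVsSnapshot
{
    // Hypothetical helper: returns what a deferred query and a ToList snapshot
    // each contain after a new adult is added to the source list.
    public static (List<string> Live, List<string> Snapshot) Compare()
    {
        var people = new List<(string Name, int Age)>
        {
            ("Sam", 27), ("Suzie", 17), ("Harry", 23),
        };

        // Deferred: re-evaluated against `people` every time it is enumerated.
        IEnumerable<string> liveNames = people
            .Where(p => p.Age >= 18)
            .Select(p => p.Name);

        // Immediate: evaluated once, here, and stored in a new list.
        List<string> snapshotNames = liveNames.ToList();

        people.Add(("Sally", 26));

        return (liveNames.ToList(), snapshotNames);
    }
}
```

The deferred query returns Sam, Harry and Sally, while the snapshot still holds only Sam and Harry. Which one you want depends on whether you need live data or a stable copy.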

The pitfalls of deferred execution

Despite my singing the praises of LINQ's deferred execution so far, this post was inspired by some of the issues I've been experiencing with it. One of its benefits is also a pitfall if you do not take enough care when writing your code: the query is always re-evaluated.

Although deferred execution is generally seen as a performance benefit, there are cases where it can dramatically slow down your application if you're not careful. Any time you know you will need to iterate over the same query repeatedly (for example, in a nested for/foreach loop), make sure you call ToList first. Otherwise, you will re-run the whole query every single time, which can dramatically reduce performance. This is especially true if the source collection is large because, even if your query filters out most of the items, the filter is applied to the whole source collection on every iteration.
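To get a feel for the cost, the hypothetical helper below counts predicate calls when a filtered query is used as the inner sequence of a nested loop, once left deferred and once materialized with ToList:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class NestedLoopCost
{
    // Hypothetical helper: counts predicate calls for a filtered inner
    // sequence, with and without materializing it before the nested loop.
    public static (int Deferred, int Materialized) CountPredicateCalls(int outerCount, int sourceSize)
    {
        var source = Enumerable.Range(0, sourceSize).ToList();

        var deferredCalls = 0;
        IEnumerable<int> evens = source.Where(n => { deferredCalls++; return n % 2 == 0; });
        for (var i = 0; i < outerCount; i++)
        {
            // Each pass re-runs the filter over the whole source.
            foreach (var n in evens) { }
        }

        var materializedCalls = 0;
        List<int> evensList = source.Where(n => { materializedCalls++; return n % 2 == 0; }).ToList();
        for (var i = 0; i < outerCount; i++)
        {
            // Each pass only walks the already-filtered list.
            foreach (var n in evensList) { }
        }

        return (deferredCalls, materializedCalls);
    }
}
```

With an outer loop of 100 iterations over a source of 1,000 numbers, the deferred version runs the predicate 100,000 times, while the materialized version runs it 1,000 times.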

The final pitfall to mention is using Select to run a collection of tasks. Some argue that this is something you shouldn't do at all, but I've seen it enough in codebases to know that people do it, and it's something you should be aware of. Imagine the scenario below:

var listOfIds = new List<int> { 1, 5, 8, 15 };

var tasks = listOfIds.Select(id => _repository.GetAsync(id));
await Task.WhenAll(tasks);
var results = tasks.Select(task => task.Result).ToList();

In the above example, the GetAsync method is actually executed twice for every ID: once when Task.WhenAll enumerates the query, and a second time when tasks.Select(...).ToList() enumerates it again (defining the query on its own runs nothing). Not only does this have a massive impact on performance by performing expensive operations multiple times, but also, since the second enumeration creates brand-new tasks, they are not guaranteed to be completed when you read their Result, so task.Result blocks until they finish. As you might imagine, it is also particularly dangerous if the task you are running is actually a create or update operation (yes, I have seen this too). To do the get safely, you need to evaluate the query immediately:

var tasks = listOfIds.Select(id => _repository.GetAsync(id)).ToList();
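The double execution is easy to observe with a fake repository. In the sketch below, CountingRepository is a hypothetical stand-in for _repository whose GetAsync only counts its calls and waits briefly to simulate I/O. The second version also relies on the fact that awaiting Task.WhenAll on a collection of Task<TResult> gives you the results as an array, in the same order as the tasks, so there is no need to read .Result at all.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-in for _repository: it only counts calls to GetAsync.
public class CountingRepository
{
    public int Calls { get; private set; }

    public async Task<string> GetAsync(int id)
    {
        Calls++;              // runs synchronously, before the first await
        await Task.Delay(10); // simulate I/O
        return $"item {id}";
    }
}

public static class TaskSelectDemo
{
    public static async Task<(int Calls, string[] Results)> DeferredVersion(List<int> ids)
    {
        var repo = new CountingRepository();
        var tasks = ids.Select(id => repo.GetAsync(id));     // deferred: no calls yet
        await Task.WhenAll(tasks);                           // 1st enumeration: one call per ID
        var results = tasks.Select(t => t.Result).ToArray(); // 2nd enumeration: new calls, then blocks on them
        return (repo.Calls, results);
    }

    public static async Task<(int Calls, string[] Results)> MaterializedVersion(List<int> ids)
    {
        var repo = new CountingRepository();
        var tasks = ids.Select(id => repo.GetAsync(id)).ToList(); // each GetAsync starts exactly once
        var results = await Task.WhenAll(tasks);                  // results come back in task order
        return (repo.Calls, results);
    }
}
```

For the four IDs in the example, the deferred version calls GetAsync 8 times and the materialized version 4 times.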

Conclusion

In this article I have introduced deferred execution in the context of .NET's LINQ. I have shown some of its features and why it can be beneficial compared to immediate execution. Finally, I have discussed some common pitfalls to look out for when using LINQ and deferred execution.

I post mostly about full stack .NET and Vue web development. To make sure that you don't miss out on any posts, please follow this blog and subscribe to my newsletter. If you found this post helpful, please like it and share it. You can also find me on Twitter.

Top comments (3)

jeikabu • Apr 25 '21

I never really thought of the re-evaluation as a problem; it's conceptually similar to a function that calls a for loop.

Honestly, I feel that calling ToList() is more suspect, not when you actually need a list, but simply to "evaluate" an expression. Besides the extra allocations, what if the number of items is huge? We tended to treat it like a query and use First/FirstOrDefault/Any/etc. I guess it makes sense if you want to save the results for later; maybe we just didn't use LINQ in cases where the results were static.

Matt Pogue • Jun 22 '21

I just recently had this exact realization, which led to a WHOLE BUNCH of code rewriting. I started using SQL Profiler and looking at my database transactions while running my app, and I was appalled. No wonder some of my pages were taking so long to load! I was assuming something like the following:

var items = (from i in Data.Items where i.Cost > 20 select i);
foreach(var item in items)....

Would make 1 trip to the DB and loop through the returned collection. Wrong! The initial bug was a query similar to the above where the page was taking 3:30 to load - as in 3 minutes, 30 seconds. There were 275 items being returned in my query and - you guessed it - 275 round trips to the database; one trip for each item. Something like this fixes it:

List<Data.Item> items = (from i in Data.Items where i.Cost > 20 select i).ToList();
foreach(Data.Item item in items)...

It may seem like a trivial distinction, but I didn't understand the consequences when developing my code initially which is causing headaches now that our database has grown significantly. Important lesson for all C#/.Net devs!

Ivy Puckett • Jun 13 '21

There are also lots of problems if you use async methods with an IQueryable. I can't remember why, but our team has had lots of headaches because of that deferred execution.