Iterator Benchmarks That Shocked With Unexpected Results!

#dotnet #csharp #performance #programming

(This article was originally posted on my blog)

This article is the follow-up to my previous write-up sharing the potential pitfalls of iterators and materialized collections, but I’ll admit it… When I initially set out to go write this post on iterator benchmarks and contrast them with materialized collections, I was quite confident about what I’d find. I was so confident in what I’d expect that I coded up all of the sample code and thought I’d record a video on the results as soon as they hit the console output. I pressed the record button, brought up the console window, and realized I couldn’t explain half of what I was seeing.

Realistically, this was a way more interesting outcome for me. Putting together an analysis can be boring if you know what to expect every step of the way. The results were only half what I expected, and the remainder was truly a big shock for me when investigating these iterator benchmarks.

Companion Video!

Benchmark Setup

We’re of course going to be using BenchmarkDotNet for our benchmarks, and you can find all of the code for these over at GitHub. To start, we need an entry point hook for our single Benchmark class that will be defining the permutations of scenarios that we’d like to run. This will be relatively basic as follows:

var config = ManualConfig  
    .Create(DefaultConfig.Instance)  
    .WithOptions(ConfigOptions.DisableOptimizationsValidator);  

var summary = BenchmarkRunner.Run(  
    typeof(Benchmarks),  
    config);  

if (!summary.BenchmarksCases.Any())  
{  
    throw new InvalidOperationException("Benchmarks did not have results.");  
}

Next will be the portion of the Benchmarks class that actually defines the permutations we’d like to have our scenarios run against:

[Orderer(BenchmarkDotNet.Order.SummaryOrderPolicy.Method)]
[ShortRunJob(RuntimeMoniker.Net70)]
[MemoryDiagnoser]
public class Benchmarks
{
    public enum ImplementationScenario
    {
        Iterator,
        PopulateAsList,
        PopulateAsROCollection
    }

    private int[] _dataset;

    [Params(
        1_000,
        100_000,
        10_000_000)]
    public int NumberOfEntriesInDataset;

    [Params(
        ImplementationScenario.Iterator,
        ImplementationScenario.PopulateAsList, 
        ImplementationScenario.PopulateAsROCollection)]
    public ImplementationScenario Implementation;

    [GlobalSetup]
    public void Setup()
    {
        // seed this for consistency
        var random = new Random(1337);
        _dataset = new int[NumberOfEntriesInDataset];
        for (int i = 0; i < _dataset.Length; i++)
        {
            _dataset[i] = random.Next();
        }
    }

  // TODO: all the benchmark methods go here
}

The code above will set up a source array for us to work with that will have 1,000, 100,000, or 10,000,000 random integers depending on the benchmark run. We use a seeded Random for our benchmarks, but the scenarios we’re going to be looking at shouldn’t be concerned with the values of the integers. Next, we have one of three scenarios that will be selected to exercise and we can see that code as follows:

    private IEnumerable<int\> GetResultsUsingIterator()  
    {  
        foreach (var item in \_dataset)  
        {  
            yield return item;  
        }  
    }  

    private List<int\> GetResultsUsingListAsList()  
    {  
        var results = new List<int\>();  
        foreach (var item in \_dataset)  
        {  
            results.Add(item);  
        }  

        return results;  
    }  

    private IReadOnlyCollection<int\> GetResultsUsingListAsReadOnlyCollection()  
    {  
        var results = new List<int\>();  
        foreach (var item in \_dataset)  
        {  
            results.Add(item);  
        }  

        return results;  
    }

The first two methods show an iterator and a materialized collection variant. The third method is also a materialized collection, but we’ve used an IReadOnlyCollection<T> and we’ll come back to this as we dive into the examples. A hint is that I wanted to help make sense of what I was observing in the results of iterator benchmarks!

LINQ Any() Results are as Expected!

I’ll start us off with one of the iterator benchmarks that I was expecting, and that will be calling Any() on the result of our functions that are being tested. This Benchmark code is as follows (and can be found here on GitHub):

[Benchmark]
    public void LinqAny()
    {
        _ = Implementation switch
        {
            ImplementationScenario.Iterator => GetResultsUsingIterator().Any(),
            ImplementationScenario.PopulateAsList => GetResultsUsingListAsList().Any(),
            ImplementationScenario.PopulateAsROCollection => GetResultsUsingListAsReadOnlyCollection().Any(),
            _ => throw new NotImplementedException(Implementation.ToString()),
        };
    }

My expectation is that the iterator would be both faster AND use less memory than the materialized collection variant. This is because the call to Any() only needs the iterator to indicate that there is in fact at least one item and because iterators are streaming, there’s no allocation of results in a new collection. Conversely, we’d expect that the materialized collections are forced to fully enumerate before they return. This means that we get an entire copy of the dataset, blowing up our memory usage, and we pay the performance hit of full enumeration.

The results below demonstrate this when we look at the Mean and the Allocated columns.

(Try opening the image in a new tab to improve readability!)

LINQ Count() Results Start to Look Interesting…

The LINQ Count() method is where I started to get caught off guard. The Benchmark method for this is very much like the one we previously looked at, but we use Count() instead of Any():

    [Benchmark]
    public void LinqCount()
    {
        _ = Implementation switch
        {
            ImplementationScenario.Iterator => GetResultsUsingIterator().Count(),
            ImplementationScenario.PopulateAsList => GetResultsUsingListAsList().Count(),
            ImplementationScenario.PopulateAsROCollection => GetResultsUsingListAsReadOnlyCollection().Count(),
            _ => throw new NotImplementedException(Implementation.ToString()),
        };
    }

What I had hypothesized was that we’d observe a similar low memory allocation from the iterator because there was no extra materialization of a secondary collection. This checks out in the results below:

(Try opening the image in a new tab to improve readability!)

But what I found very peculiar was that the runtime performance for the iterator was not significantly faster than the materialized collections. In fact, it was sometimes slower if not roughly the same. Foolishly, I did forget that the Count() method from LINQ has an optimization that if you have a collection, it can check the count/length of it as a shortcut and not force enumeration. So while that would cut down runtime for materialized collections, shouldn’t the overhead of allocating that whole additional collection still incur some type of additional cost?

The results weren’t totally obvious here, but they start to get more interesting as we continue looking at the remaining iterator benchmarks!

Comparing Collection vs Iterator Benchmarks for Foreach

The code for this Benchmark is as follows:

[Benchmark]
    public void Foreach()
    {
        switch (Implementation)
        {
            case ImplementationScenario.Iterator:
                foreach (var x in GetResultsUsingIterator())
                {
                }
                break;
            case ImplementationScenario.PopulateAsList:
                foreach (var x in GetResultsUsingListAsList())
                {
                }
                break;
            case ImplementationScenario.PopulateAsROCollection:
                foreach (var x in GetResultsUsingListAsReadOnlyCollection())
                {
                }
                break;
            default:
                throw new NotImplementedException(Implementation.ToString());
        }
    }

And if you’re wondering why I only went with a foreach instead of a traditional for loop with a counter, it’s because IEnumerable and therefore iterators do not have this ability. I wanted to keep the comparisons equivalent.

Let’s dive into the results of the benchmarks:

(Try opening the image in a new tab to improve readability!)

When we look at the results of our Benchmark output for foreach, the memory allocation in the Allocated column is on point with what I would expect to see. When we consider an iterator compared to materializing a whole collection, the iterator does not need to allocate an entire extra collection as it streams data.

However, the runtime performance under the Mean column is where I started to get really confused. And I should add, initially I did not have the third variation that has an IReadOnlyCollection<T> return type, so this was added in afterward to help with my analysis.

Given that iterators are able to stream results, the foreach loop would result in one full enumeration over the dataset. However, materialized collections in this case (unlike LINQ Count() that we saw earlier) absolutely need to do a second full enumeration. The first is for creating the materialized collection and the second is for enumerating over it. How is it that the List<T> return type could be faster than the iterator in every single one of these scenarios? And the other brain teaser (that might give you the answer if you’re brushed up on your dotnet versions) is why is the IReadOnlyCollection<T> return type so much slower when the return type is the only difference?

The ‘Aha!’ Moment

Things weren’t adding up for me. I was sitting at my computer with a dumb look on my face, turning off my OBS recording because I just didn’t expect to see the performance metrics on some of these scenarios.

So if you want to read more about how I figured out where the discrepancies were, check out the original article to see how I made sense of these iterator benchmarks!