Benchmarking C# (2 Part Series)
Lately I've been interested in benchmarking the features of C# that I use every day. I've found a lot of my thoughts are based on assumptions instead of facts. Today I'm going to be diving into the yield statement in C# and seeing how it compares against explicitly creating an enumeration.
The full benchmark code is available in the project's GitHub repo if you want to see it and run it yourself.
Attaining accurate and valid benchmark results is not trivial. Earlier in my career I resorted to manually timing code if I wanted to get a "benchmark." But that's a pretty naive approach to benchmarking. Luckily there's a great library called BenchmarkDotNet which makes benchmarking easy and does all the hard things for you.
Check their website or an article I wrote for an introduction to the library.
So what is the yield statement? The Microsoft docs say:
Using yield to define an iterator removes the need for an explicit extra class (the class that holds the state for an enumeration, see IEnumerator for an example) when you implement the IEnumerable and IEnumerator pattern for a custom collection type.
Take a look at the following example:
Both methods return an IEnumerable, but using the yield statement removes the need to create a list to hold our values as we iterate.
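The two methods described above could look something like the following minimal sketch. The name GetIEnumerableYield() comes from later in this post; GetIEnumerableList() and the Numbers class are names assumed here for illustration, and the 0-9 range is taken from the loops discussed below.

```csharp
using System;
using System.Collections.Generic;

public static class Numbers
{
    // Builds the whole list up front, then returns it.
    // (Method name is an assumption for this sketch.)
    public static IEnumerable<int> GetIEnumerableList()
    {
        var list = new List<int>();
        for (int i = 0; i < 10; i++)
        {
            list.Add(i);
        }
        return list;
    }

    // Hands back one value at a time; no backing list is created.
    public static IEnumerable<int> GetIEnumerableYield()
    {
        for (int i = 0; i < 10; i++)
        {
            yield return i;
        }
    }
}
```

A caller can foreach over either method the same way and sees the same ten values.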
When I first saw this I didn't really know what to make of it. How did a simple yield statement transform into an IEnumerable return type?
But after walking through some examples, I began to understand the power of using yield.
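To make the "explicit extra class" from the docs quote more concrete, here is a rough hand-written stand-in for the kind of class the compiler generates when you write a yield method that counts from 0 to 9. The class name ZeroToNine is hypothetical, and the real generated code differs in its details; this is only a sketch of the idea.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hand-written equivalent of a yield method that produces 0 through 9.
// It holds the iteration state that yield would otherwise manage for us.
public sealed class ZeroToNine : IEnumerable<int>, IEnumerator<int>
{
    private int _next; // the next value to hand out

    public int Current { get; private set; }
    object IEnumerator.Current => Current;

    // Called once per requested value; produces nothing until asked.
    public bool MoveNext()
    {
        if (_next >= 10) return false;
        Current = _next++;
        return true;
    }

    public void Reset() => throw new NotSupportedException();
    public void Dispose() { }

    // Each enumeration gets its own fresh state.
    public IEnumerator<int> GetEnumerator() => new ZeroToNine();
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
```

Writing `foreach (int n in new ZeroToNine())` visits 0 through 9, just like the yield version, but with a lot more code to maintain.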
Take a look at the following example. What would you expect the output to be?
20 printed lines? 10 for the "Inserting X" and 10 more for the "Yielding X"?
When run, the following is printed:
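The original listing and its output are not reproduced here, so the following is a sketch reconstructed from the description in this post. GetIEnumerableYield(), FindOne(), and the "Inserting X" / "Yielding X" messages come from the text; GetIEnumerableList(), Run(), the LazyDemo class, and the exact shape of FindOne() are assumptions.

```csharp
using System;
using System.Collections.Generic;

public static class LazyDemo
{
    // Eager version: fills a list with 0-9 before returning anything.
    public static IEnumerable<int> GetIEnumerableList()
    {
        var list = new List<int>();
        for (int i = 0; i < 10; i++)
        {
            Console.WriteLine($"Inserting {i}");
            list.Add(i);
        }
        return list;
    }

    // Lazy version: produces each value only when the caller asks for it.
    public static IEnumerable<int> GetIEnumerableYield()
    {
        for (int i = 0; i < 10; i++)
        {
            Console.WriteLine($"Yielding {i}");
            yield return i;
        }
    }

    // Stops enumerating as soon as it sees the value 1.
    public static bool FindOne(IEnumerable<int> values)
    {
        bool found = false;
        foreach (int value in values)
        {
            if (value == 1)
            {
                found = true;
                break;
            }
        }
        return found;
    }

    public static void Run()
    {
        FindOne(GetIEnumerableList());
        FindOne(GetIEnumerableYield());
    }
}
```

Under these assumptions, calling LazyDemo.Run() prints ten "Inserting" lines (0 through 9) followed by only "Yielding 0" and "Yielding 1".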
If you've never used yield before, take a moment to think through what might be happening. We have declared two loops from 0-9, but only one generated all the values.
What allowed yield to stop after 1? Whatever it was, it allowed us to avoid generating 80% of the numbers.
This behavior occurs because yield only enumerates values when needed. Instead of generating all 10 values and then returning them for FindOne() to loop over, our yield method only returns the next value when it is requested.
That's worth restating. Yield only returns the next value in the iteration when it is requested. If it's never requested, it's never returned.
In our example, FindOne() enters its foreach loop and requests the first value from GetIEnumerableYield(). The first time through, i is 0; execution hits the yield statement, returns 0, and pauses there. FindOne() gets the 0, which isn't the value it's looking for, so it moves on to its next iteration and requests another value. GetIEnumerableYield() does not start over; it resumes right after the yield statement, generates the next value in the sequence, 1, hits the yield statement, returns 1, and pauses again. FindOne() has the value it needs and breaks.
The FindOne() and GetIEnumerableYield() methods are in constant communication. GetIEnumerableYield() is generating values as needed instead of generating them all beforehand.
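That back-and-forth can be made visible by driving the enumerator by hand instead of using foreach. This is a sketch of roughly what a foreach loop does under the hood; the ManualWalk class and the CountRequestsUntilOne() helper are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class ManualWalk
{
    public static IEnumerable<int> GetIEnumerableYield()
    {
        for (int i = 0; i < 10; i++)
        {
            yield return i;
        }
    }

    // Counts how many values are requested before 1 is found.
    public static int CountRequestsUntilOne()
    {
        int requests = 0;
        using (IEnumerator<int> e = GetIEnumerableYield().GetEnumerator())
        {
            // Each MoveNext() resumes the yield method until its next yield return.
            while (e.MoveNext())
            {
                requests++;
                if (e.Current == 1) break; // stop asking; the rest is never generated
            }
        }
        return requests;
    }
}
```

CountRequestsUntilOne() returns 2: one request that yields 0 and one that yields 1.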
If whatever is being created is an expensive operation, using yield could potentially save your program a lot of work.
Here's another example to get your brain working.
Does this method ever stop running?
Because we're using a yield statement, yes.
Even though we're in an infinite while loop, the FindOne() method will stop requesting values after finding the 1. Because yield methods only generate values as needed, only two iterations of the while loop will ever be performed.
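The original listing is not reproduced here, so this is a sketch of what such an infinite method might look like. GetInfiniteYield(), the Iterations counter, and the InfiniteDemo class are assumed names; the counter exists only so the number of loop iterations can be checked.

```csharp
using System;
using System.Collections.Generic;

public static class InfiniteDemo
{
    // Tracks how many times the while loop body has run.
    public static int Iterations;

    // Never finishes on its own, but only advances when a value is requested.
    public static IEnumerable<int> GetInfiniteYield()
    {
        int i = 0;
        while (true)
        {
            Iterations++;
            yield return i++;
        }
    }

    // Stops enumerating as soon as it sees the value 1.
    public static bool FindOne(IEnumerable<int> values)
    {
        bool found = false;
        foreach (int value in values)
        {
            if (value == 1)
            {
                found = true;
                break;
            }
        }
        return found;
    }
}
```

After `InfiniteDemo.FindOne(InfiniteDemo.GetInfiniteYield())` returns, Iterations is 2. Returning a List from the same while loop would never finish, because the list would have to be complete before it could be returned.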
I hope now you're beginning to see some of the possibilities of using a yield statement.
Now that we know a bit about what yield is and how it works, let's turn our attention to answering the question of what its performance is.
Even though this can be a useful feature, if it's slower we're probably not going to use it. So let's benchmark it.
Here's a simple version of my benchmark. We're basically just going to be generating numbers. One version creates a list, adds to it and returns it. The other just yields those same values. I've created a couple more benchmarks like this, each one having another layer of nested method calls. If you're curious, all code is on my GitHub.
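The benchmark listing itself is not reproduced here. As a rough idea of the shape, a BenchmarkDotNet comparison of the two approaches might look like the sketch below. It needs the BenchmarkDotNet NuGet package; the class name, method names, and the Count sizes are hypothetical and are not the author's actual benchmark.

```csharp
using System.Collections.Generic;
using BenchmarkDotNet.Attributes;

[MemoryDiagnoser] // adds the "Allocated" column to the results
public class YieldBenchmarks
{
    [Params(10, 1000)] // hypothetical sizes, chosen only for illustration
    public int Count;

    [Benchmark(Baseline = true)]
    public int ListVersion()
    {
        int sum = 0;
        foreach (int value in BuildList(Count)) sum += value;
        return sum; // returning the result keeps the work from being optimized away
    }

    [Benchmark]
    public int YieldVersion()
    {
        int sum = 0;
        foreach (int value in BuildYield(Count)) sum += value;
        return sum;
    }

    // Creates a list, adds to it, and returns it.
    private static IEnumerable<int> BuildList(int count)
    {
        var list = new List<int>();
        for (int i = 0; i < count; i++) list.Add(i);
        return list;
    }

    // Yields the same values without a backing list.
    private static IEnumerable<int> BuildYield(int count)
    {
        for (int i = 0; i < count; i++) yield return i;
    }
}
```

A class like this is typically run with `BenchmarkDotNet.Running.BenchmarkRunner.Run<YieldBenchmarks>()` from a console app built in Release mode.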
If I run all my benchmarks, I get the following result:
The important columns to pay attention to are "Mean" and "Allocated". "Mean" is the average time it took to run each benchmark and "Allocated" is the amount of memory allocated.
For all but the first benchmark, yield was significantly faster and used significantly less memory. Especially as the amount of work got bigger, yield was by far the better choice.
I was not expecting this big a difference between the two different methods. This benchmark surprised me a bit and I'm going to use yield more in the future because of these results.
It's worth noting that the one downside I find when using yield is it makes debugging harder. This has honestly been the reason that I've shied away from using it in the past.
At no point is there a list that contains all my values for me to investigate. That list doesn't ever get created, only individual values are yielded when needed.
To be honest, if I'm going to be debugging something for any length of time, sometimes I'll comment out the yield statements and replace them with regular lists. Doing that makes my debugging easier and I can just change it back before checking code in.
Probably not the best habit, but dealing with yields while debugging isn't the easiest thing to do.
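One option that may avoid editing the yield method itself is to materialize the sequence at the call site while debugging, using LINQ's ToList(). This is only a sketch; the DebugSnapshot class and Snapshot() helper are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DebugSnapshot
{
    public static IEnumerable<int> GetIEnumerableYield()
    {
        for (int i = 0; i < 10; i++)
        {
            yield return i;
        }
    }

    // Forces the whole sequence into a list so every value
    // can be inspected at a breakpoint.
    public static List<int> Snapshot()
    {
        return GetIEnumerableYield().ToList();
    }
}
```

Keep in mind that ToList() enumerates everything, so the lazy behavior is lost while the snapshot is in place, and it would never return for an infinite sequence.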
If anyone has better strategies, I'd love to hear them.
Using a yield statement instead of explicitly creating enumerations has benefits in both speed and memory usage. But I've found there's a tradeoff between readability and debug-ability.
Before benchmarking yield, I had no clue that there was this much of a difference between techniques. Which goes to show why you should benchmark as a developer. Because what you know or assume may just be false.
Thanks for reading!
I’m Morgan Kenyon. I’m a .NET developer working in the DFW area. I love solving a good problem and like talking about the tech I use. If you found this article helpful or thought-provoking, leave a comment and let's connect over LinkedIn!