C#, Task.WhenAll vs Parallel.ForEach

#csharp #dotnet

Today I had to complete an interesting test to demonstrate my knowledge of asynchronous programming: write a method that downloads three resources and sums the content lengths of all three responses.

The first idea that came to mind was Task.WhenAll:



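        // Requires: using System; using System.Diagnostics; using System.Linq;
        // using System.Net; using System.Threading; using System.Threading.Tasks;
        static async Task<long> FetchContentLengthWaitAllAsync(Uri[] urls, CancellationToken cancellationToken)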
        {
            // Start a Stopwatch to evaluate performance
            var sw = new Stopwatch();
            sw.Start();

            // Current thread id
            Console.WriteLine("Current Thread Id: " + Thread.CurrentThread.ManagedThreadId);

            // Request
            var contentLength = (await Task.WhenAll(urls.Select(async url =>
            {
                // Web client for the request
                using (var wc = new WebClient())
                // Cancellation on the token will cancel the request;
                // disposing the registration unhooks the callback afterwards
                using (cancellationToken.Register(() =>
                {
                    wc.CancelAsync();
                    Console.WriteLine("Request cancelled!");
                }))
                {
                    wc.DownloadDataCompleted += (s, e) =>
                    {
                        Console.WriteLine("FetchContentLengthWaitAllAsync Thread Id: " + Thread.CurrentThread.ManagedThreadId);
                    };

                    // Await inside the using block so the client is not
                    // disposed while the download is still in flight
                    return await wc.DownloadDataTaskAsync(url);
                }
            }))).Sum(bytes => bytes.Length);

            // Output
            sw.Stop();
            Console.WriteLine("FetchContentLengthWaitAllAsync Milliseconds: " + sw.ElapsedMilliseconds);

            // Return
            return contentLength;
        }

I dug further and read several posts, such as 'Concurrency vs. Parallel vs. Async in .NET' (https://dev.to/scotthannen/concurrency-vs-parallel-vs-async-in-net-3812) and 'Parallel Foreach async in C#' (https://medium.com/@alex.puiu/parallel-foreach-async-in-c-36756f8ebe62). They left me more confused than before, so I went hands-on:



        // Additionally requires: using System.Collections.Concurrent;
        static async Task<long> FetchContentLengthAsync(Uri[] urls, CancellationToken cancellationToken)
        {
            // Start a Stopwatch to evaluate performance
            var sw = new Stopwatch();
            sw.Start();

            // Request
            var resultCollection = new ConcurrentBag<long>();
            await Task.Run(() =>
            {
                Parallel.ForEach(urls, (url) =>
                {
                    // Web client for the request
                    using (var wc = new WebClient())
                    {
                        // Cancellation on the token will cancel the request
                        cancellationToken.Register(() =>
                        {
                            wc.CancelAsync();
                            Console.WriteLine($"Request cancelled!");
                        });

                        // Download synchronously and record the content length
                        resultCollection.Add(wc.DownloadData(url).Length);

                        Console.WriteLine("FetchContentLengthAsync Thread Id: " + Thread.CurrentThread.ManagedThreadId);
                    }
                });
            });

            // Output
            sw.Stop();
            Console.WriteLine("FetchContentLengthAsync Milliseconds: " + sw.ElapsedMilliseconds);

            // Return
            return resultCollection.Sum();
        }
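
The ConcurrentBag above is one way to collect results safely from several threads. As a rough alternative sketch, Parallel.ForEach also has an overload that gives each worker a private subtotal and merges the subtotals at the end. The class name `LocalSumSketch`, the method `SumLengths`, and the byte arrays standing in for downloaded payloads are all assumptions made for illustration, not part of the original test:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class LocalSumSketch
{
    // Sums the lengths of the given payloads in parallel.
    // Each worker keeps a private subtotal, so shared state is touched
    // only once per worker, in the final Interlocked.Add.
    public static long SumLengths(byte[][] payloads)
    {
        long total = 0;
        Parallel.ForEach(
            payloads,
            () => 0L,                                                     // localInit: per-worker subtotal
            (payload, loopState, subtotal) => subtotal + payload.Length,  // body: no shared state
            subtotal => Interlocked.Add(ref total, subtotal));            // localFinally: merge once
        return total;
    }

    public static void Main()
    {
        // Hypothetical stand-in data instead of real downloads
        var payloads = new[] { new byte[10], new byte[20], new byte[30] };
        Console.WriteLine(SumLengths(payloads)); // prints 60
    }
}
```

For three downloads the difference from a ConcurrentBag is negligible; the pattern mostly pays off when there are many small items.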

What are the differences? I wrote a small test:

        static async Task Main(string[] args)
        {
            // Requested URLs
            var urls = new Uri[]
            {
                new Uri("https://dev.to/garryxiao/react-typescript-electron-all-steps-to-start-lcc"),
                new Uri("https://dev.to/bipinrajbhar/the-beginner-s-guide-to-react-introduction-50i0"),
                new Uri("https://stackoverflow.com/questions/22024233/read-response-header-from-webclient-in-c-sharp")
            };

            // Cancellation token source
            using (var cts = new CancellationTokenSource())
            {
                // Task.WhenAll approach
                var contentLengthWaitAll = await FetchContentLengthWaitAllAsync(urls, cts.Token);

                // Parallel.ForEach approach
                var contentLengthCommon = await FetchContentLengthAsync(urls, cts.Token);
            }
        }

And here was the result:

Conclusions:

In this test, Parallel.ForEach was quicker than Task.WhenAll. Parallel.ForEach itself is synchronous: it blocks the calling thread until every iteration has finished, which is why it is wrapped in Task.Run above.
Parallel.ForEach is a multi-threaded solution, while the tasks passed to Task.WhenAll will probably share threads. If tasks share the same thread, their work is interleaved on it, so any CPU-bound portions may take longer to complete.
Both are forms of concurrency, so keep an eye on thread-safety issues.
async/await is syntactic sugar that enables concurrency but does not necessarily involve multithreading.
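
To make the conclusions above easier to try without a network, here is a minimal sketch that replaces the downloads with simulated latency. `FakeDownloadAsync`, the class name, and the lengths and delays are assumptions for illustration; Task.Delay stands in for non-blocking I/O and Thread.Sleep for a blocking call such as DownloadData:

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ConcurrencySketch
{
    // Simulated I/O: waits without blocking a thread, then returns a fake length.
    public static async Task<long> FakeDownloadAsync(int length, int delayMs)
    {
        await Task.Delay(delayMs);
        return length;
    }

    // All waits overlap, so the total time should be close to one delay, not three.
    public static async Task<long> SumWithWhenAllAsync(int[] lengths, int delayMs)
    {
        long[] results = await Task.WhenAll(lengths.Select(l => FakeDownloadAsync(l, delayMs)));
        return results.Sum();
    }

    // Parallel.ForEach blocks the caller, and each iteration blocks a worker thread.
    public static long SumWithParallelForEach(int[] lengths, int delayMs)
    {
        long total = 0;
        Parallel.ForEach(lengths, l =>
        {
            Thread.Sleep(delayMs);          // simulated blocking I/O
            Interlocked.Add(ref total, l);  // thread-safe accumulation
        });
        return total;
    }

    public static async Task Main()
    {
        var lengths = new[] { 100, 200, 300 };

        var sw = Stopwatch.StartNew();
        Console.WriteLine("WhenAll sum: " + await SumWithWhenAllAsync(lengths, 200));
        Console.WriteLine("WhenAll ms: " + sw.ElapsedMilliseconds);

        sw.Restart();
        Console.WriteLine("Parallel.ForEach sum: " + SumWithParallelForEach(lengths, 200));
        Console.WriteLine("Parallel.ForEach ms: " + sw.ElapsedMilliseconds);
    }
}
```

Both variants return the same sum. The timings depend on the machine and the thread pool, so the printed milliseconds are best read as rough indications only.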

For more details: https://www.codemag.com/Article/1211071/Tasks-and-Parallelism-The-New-Wave-of-Multithreading

Oldest comments (5)

alexandis • Sep 15 '20 • Edited

Hi Garry, what about using async within Parallel.ForEach, like Parallel.ForEach(items, async (item) => {})? Does it make sense? My case is as follows: I get a list of data from the DB. Then, using Parallel.ForEach, I check the child entities (images). If an image has a FilePath, in our case it means the image is stored in the file system and I need to read the image data from a stream into the Image entity (byte[]). I'm using the above approach, but don't know if it has some dangerous pitfalls.

Garry Xiao • Oct 8 '20

Hi Alexandis, in the production environment, there are three points I would mention:

Resource limits. After you read the list, if you want to use a different thread for each item, you need to know how many items there are and, inside each thread, how many images there may be and how much processing they need.
If you are already on a new thread other than the main thread, async makes no sense. Maybe I am wrong.
With multiple threads, the fewer dependencies between them the better; pay attention to locking issues.
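
One pitfall with the pattern asked about in this thread can be sketched as follows. Parallel.ForEach accepts an Action, so an async lambda passed to it becomes async void, and the loop returns once each iteration reaches its first await rather than when the work is done. `LoadImageAsync` is a hypothetical stand-in for reading an image from disk, and the file names are made up:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class AsyncLambdaSketch
{
    // Hypothetical stand-in for reading one image from the file system
    public static async Task<int> LoadImageAsync(string filePath)
    {
        await Task.Delay(100);
        return filePath.Length;
    }

    // Pitfall: the async lambda is async void here, so Parallel.ForEach
    // does not wait for the loads to finish before returning.
    public static int CountLoadedTooEarly(IEnumerable<string> paths)
    {
        int loaded = 0;
        Parallel.ForEach(paths, async path =>
        {
            await LoadImageAsync(path);
            Interlocked.Increment(ref loaded);
        });
        // Typically still 0 at this point, because the loads are in flight
        return Volatile.Read(ref loaded);
    }

    // One alternative: start a task per item and await them all.
    public static async Task<int> CountLoadedAsync(IEnumerable<string> paths)
    {
        int loaded = 0;
        await Task.WhenAll(paths.Select(async path =>
        {
            await LoadImageAsync(path);
            Interlocked.Increment(ref loaded);
        }));
        return loaded;
    }

    public static async Task Main()
    {
        var paths = new[] { "a.png", "b.png", "c.png" };
        Console.WriteLine("Parallel.ForEach + async lambda: " + CountLoadedTooEarly(paths));
        Console.WriteLine("Task.WhenAll: " + await CountLoadedAsync(paths));
    }
}
```

Exceptions thrown inside an async void lambda are also not observed by the caller of Parallel.ForEach. On .NET 6 and later, Parallel.ForEachAsync is another option that awaits asynchronous bodies.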

E.R. Nurwijayadi • Jan 16 '21 • Edited

This is a good article and I think this explanation deserves more exposure.

I'm a beginner in C#. To help other beginners, I have made a simpler working example with source code on GitHub.

🕷 epsi.bitbucket.io/lambda/2020/12/2...

First the process sender and receiver

And then run both

I hope this helps others who are looking for another example.

🙏🏽

Thank you for posting about concurrency.

Prakhar Londhe • Jan 2 '22

There are certain drawbacks I can see with this:

The second method, which uses Parallel.ForEach, is highly dubious. You are essentially making synchronous calls from different threads. It is not the same as async/await, where the threads are free to do other work and a thread is picked up again only when the response is ready (be it from HTTP or a DB).
If you swap the order of the Parallel.ForEach and Task.WhenAll calls, you will most probably notice that Task.WhenAll now takes less time. This can mostly be attributed to request caching, as well as reuse of a pre-existing connection the second time.
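
The ordering effect described in this comment can be reduced with a warm-up pass and repeated measurements. The following is only a sketch under stated assumptions: `MedianMillisecondsAsync` is a hypothetical helper, and the two delegates in Main are simulated stand-ins rather than the article's real fetch methods:

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

public static class TimingSketch
{
    // Runs `action` once untimed (warm-up), then `runs` timed passes,
    // and returns the median elapsed time in milliseconds.
    public static async Task<long> MedianMillisecondsAsync(Func<Task> action, int runs)
    {
        await action(); // warm-up: JIT, connection setup, caches

        var samples = new long[runs];
        for (int i = 0; i < runs; i++)
        {
            var sw = Stopwatch.StartNew();
            await action();
            samples[i] = sw.ElapsedMilliseconds;
        }

        Array.Sort(samples);
        return samples[runs / 2];
    }

    public static async Task Main()
    {
        // Simulated stand-ins for the two approaches
        Func<Task> whenAll = () => Task.WhenAll(Task.Delay(50), Task.Delay(50));
        Func<Task> parallel = () => Task.Run(() =>
        {
            Parallel.For(0, 2, i => Thread.Sleep(50));
        });

        Console.WriteLine("WhenAll median ms: " + await MedianMillisecondsAsync(whenAll, 5));
        Console.WriteLine("Parallel median ms: " + await MedianMillisecondsAsync(parallel, 5));
    }
}
```

With the article's methods, the delegates would be something like `() => FetchContentLengthWaitAllAsync(urls, cts.Token)`. Server-side caching can still skew the numbers, so running the comparison in both orders remains worthwhile.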