Henrique Ferreira

Memory management in C# - Part 2: hands-on

Let's profile a real C# app!

Note: The tooling used in this article is linked at the end. The analysis approach is independent of tooling, but there is a practical bias here towards the Windows platform. I expect you have some familiarity with Visual Studio and the command line, but even without that, you should be able to follow along.

This part is more practical, so keep in mind that a memory analysis is only as good as its expected baseline and the delta measured against it. In other words, we should pick a “control group” (the expected memory consumption) and compare it with a use case (the scenario under analysis, in which memory consumption seems “too high” against some threshold). For example, C# ASP.NET Core applications can use from tens up to hundreds of MB at startup in debug mode. The gist for now is: this is not a theoretical micro-optimization exercise - we'll measure an application's performance as we would in the real world.
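To make the baseline/delta idea concrete, here is a minimal sketch that is not part of the app we'll profile. The `HeapDelta` helper is a hypothetical name; it relies only on the standard `GC.GetTotalMemory` call and reports how much the managed heap grew between a baseline reading and a reading taken after a scenario ran.

```csharp
using System;

public static class HeapDelta
{
    // Runs `scenario`, which returns the object graph it wants to keep alive,
    // and reports how much the managed heap grew compared with the baseline.
    // Hypothetical helper for illustration only.
    public static long Measure(Func<object> scenario)
    {
        // Baseline: managed heap size after a full collection.
        long baseline = GC.GetTotalMemory(forceFullCollection: true);

        object retained = scenario();

        // Use case: same measurement while the scenario's data is still alive.
        long after = GC.GetTotalMemory(forceFullCollection: true);

        // Keep the data reachable until after the second measurement.
        GC.KeepAlive(retained);

        return after - baseline;
    }
}
```

For example, `HeapDelta.Measure(() => new byte[10_000_000])` should report a delta of roughly the array's size. A profiler gives a far richer picture, but the reasoning is the same: one number means little without the baseline it is compared to.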

Which app?

We need a long-running process. Let's start by running a simple ASP.NET Core web application and taking a look at its memory consumption.

For that, I've created a folder called complex-app and ran

dotnet new webapi

to download and set up a pre-configured app.

To run it, we can simply use

dotnet run

Or build our project using Visual Studio and run the executable. When assessing performance, prefer the Release build (not Debug), for example with dotnet run -c Release, because it will be closer to a realistic production binary. Debug builds turn off compiler and JIT optimizations to make debugging easier, which distorts the measurements.

This will start a new .NET Core runtime process with our solution and bind it to an HTTP port, exposing any available controller routes (the usual stuff, from the Program.cs entry point).

Profiling

To introduce some visualization and look into what's happening memory-wise behind the scenes, let's use PerfView. It's a simple standalone .exe, so I've downloaded it and included it in our Release output folder (in my case complex-app\bin\Release\net10.0).

For that, we'll run PerfView and select the Collect -> Run option.

For the Command, I'll select our program "complex-app.exe".

And click on "Run command". This is going to start our process from within the Profiler, collecting memory (and trace, and other) data from its execution. We can see the logs of our application by clicking on the "Log" button on the bottom-right corner.

Analyzing

To analyze our profiling session, all we have to do is click on "Cancel" on the bottom right corner. Then we can get the .etl (analysis) file that was produced as output and open it in PerfView.

Feel free to familiarize yourself with the structure of this file. I invite you to start by double clicking on "CPU Stacks" and selecting the "complex-app" process, like in the following screenshot:

Then double click on the "complex-app" process and we'll see what is really consuming time in our program. And the winner is coreclr! That's expected: the Common Language Runtime will definitely be doing most of the heavy lifting of our app, in my case representing 36% of total execution time.

Tip: you can drill down by double clicking on the coreclr reference itself and checking individual method calls. Cool, isn't it? Now we can see the likes of extensions, aspnetcore, logging and much more!

Let's look at the memory.

For that, I'll expand the "Memory" folder and navigate to "GC Heap Net Mem", as this is the snapshot of the Garbage Collector.

So this is what my "complex-app" looks like: there's a bunch of bytes of Int32, a few of String... We'll make things more interesting in the next section.

Load

A few things to keep in mind before we increase the scale of our app and move to a more relevant analysis:

  • State of the art CPUs can process billions of instructions per second, so ultimately memory can be allocated, detected as out of scope and cleared out very fast
  • Also due to the complexity of modern applications, when it comes to profiling and optimization we're mostly interested in hot paths and benchmarks (remember the baseline and deltas)
  • There is a trade-off between clean code and performance; arguably, that goes all the way down to design discussions between OOP and procedural programming
  • Load and scale are very important factors that usually come together: simplistic algorithms perform well with smaller inputs

Let's work on some information retrieval functionality for our app. Generally, it's good practice to think about the data itself: different algorithms behave differently depending on the shape, quantity and quality of our data. Classes vs. structs, primitive vs. complex types: the more we understand about our data, the better. We'll simulate a random list of users with the following user generator:

namespace complex_app.Data;

using complex_app.Models;
using System.Text;

public static class UserGenerator
{
    private static readonly Random _rand = new Random();

    private static readonly string[] FirstNames =
    {
        "John", "Maria", "Alex", "Sophie", "Daniel", "Lucas", "Emma", "Noah",
        "Isabella", "Liam", "Oliver", "Mia", "Ethan", "Amelia"
    };

    private static readonly string[] LastNames =
    {
        "Smith", "Johnson", "Brown", "Taylor", "Anderson", "Silva",
        "Kowalski", "Dubois", "Novak", "Rossi", "Fernandez"
    };

    public static List<User> GenerateUsers(int count)
    {
        var users = new List<User>(count);

        for (int i = 0; i < count; i++)
        {
            double roll = _rand.NextDouble();

            string name = roll switch
            {
                < 0.70 => NormalName(),
                < 0.85 => CaseNoiseName(),
                < 0.95 => LongName(),
                _ => EmailStyleName()
            };

            users.Add(new User
            {
                Id = i,
                Name = name
            });
        }

        return users;
    }

    private static string NormalName()
    {
        return $"{Rand(FirstNames)} {Rand(LastNames)}";
    }

    private static string CaseNoiseName()
    {
        return ToRandomCase(NormalName());
    }

    private static string LongName()
    {
        return $"{Rand(FirstNames)} {Rand(LastNames)} {Rand(LastNames)} III (Contractor - EU West Region)";
    }

    private static string EmailStyleName()
    {
        return $"{Rand(FirstNames).ToLower()}.{Rand(LastNames).ToLower()}@company-internal-domain.eu";
    }

    private static string Rand(string[] arr)
        => arr[_rand.Next(arr.Length)];

    private static string ToRandomCase(string input)
    {
        var sb = new StringBuilder(input.Length);

        foreach (var c in input)
        {
            sb.Append(_rand.Next(2) == 0
                ? char.ToLowerInvariant(c)
                : char.ToUpperInvariant(c));
        }

        return sb.ToString();
    }
}

Simulation is one of the most powerful tools in performance analysis. A great simulation that comes late will never be the same as a production memory issue, but a good simulation done early can definitely help avoid the issue entirely.
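The generator above fills a User model from complex_app.Models that isn't listed in this article. As an assumption, a minimal version consistent with how the generator uses it (an integer Id and a string Name) could look like this:

```csharp
namespace complex_app.Models;

// Assumed shape of the model: the article only shows that User
// has an Id and a Name, so anything beyond that is a guess.
public class User
{
    public int Id { get; set; }

    // Initialized to avoid nullable-reference warnings in the default template.
    public string Name { get; set; } = string.Empty;
}
```

Being a class, each User is a separate heap object referenced by the list, which is why the model shows up in the GC views later on.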

Now we add the list to our app, in Program.cs - I'll inject it as a global singleton dependency, with 20K users:

// ...
var users = UserGenerator.GenerateUsers(20_000);
builder.Services.AddSingleton(users);
// ...

Finally, we can come up with a small search behavior: I have the following interface, with the Search and SearchFast methods. You can probably see where we're heading from here, but I promise looking at the memory footprint will be interesting and perhaps can give some unexpected insights even for experienced engineers.

using complex_app.Models;

namespace complex_app.Services;

public interface IUserSearch
{
    List<User> Search(string query);
    List<User> SearchFast(string query);
}

public class UserSearch : IUserSearch
{
    private readonly List<User> _users;

    public UserSearch(List<User> users)
    {
        _users = users;
    }

    public List<User> Search(string query)
    {
        query = query.ToLowerInvariant();

        var results = _users
            .Where(u => u.Name.ToLowerInvariant().Contains(query))
            .ToList();

        return results;
    }

    public List<User> SearchFast(string query)
    {
        return _users
            .Where(u => u.Name.Contains(query, StringComparison.OrdinalIgnoreCase))
            .ToList();
    }
}

Exercising the app

Now what we need to do is simply run our app like before, with the run command. For the sake of empirical analysis, we'll compare:

  • A run using only the Search method, and
  • Another run using only the SearchFast method

In this experiment, each method will be called 10 times directly through the endpoints GET /search?q=abc and GET /search-fast?q=abc, respectively.
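The endpoint wiring itself isn't listed here, so the following is only a sketch of how Program.cs might expose the two methods. It assumes minimal API routes named /search and /search-fast, a singleton registration for IUserSearch, and the UserGenerator and UserSearch types shown earlier; the actual project may use controllers instead.

```csharp
using complex_app.Data;
using complex_app.Services;

var builder = WebApplication.CreateBuilder(args);

// Same registration as shown earlier: one shared list of 20K users.
var users = UserGenerator.GenerateUsers(20_000);
builder.Services.AddSingleton(users);

// Assumed registration: UserSearch receives the List<User> singleton above.
builder.Services.AddSingleton<IUserSearch, UserSearch>();

var app = builder.Build();

// `q` binds from the query string, `search` is resolved from the container.
app.MapGet("/search", (string q, IUserSearch search) => search.Search(q));
app.MapGet("/search-fast", (string q, IUserSearch search) => search.SearchFast(q));

app.Run();
```

Keeping one route per implementation makes it easy to collect one trace per method without changing code between runs.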

While service warm-up is definitely a factor influencing our performance, for this exercise startup times are included in the totals and otherwise ignored, for simplicity: both runs exercise the same piece of behavior with different implementations.

Final results

At last, let's compare using Search against SearchFast. So we have two files open in PerfView, a PerfViewData.etl.zip and below it a PerfViewData_Fast.etl.zip (I'm not very good at naming things, as you can see).

Starting with CPU stacks, we have our UserSearch appearing with 3.4% of total CPU time when it comes to Search (highlighted):

In SearchFast, interestingly, the search method stopped appearing in CPU traces entirely. Not because it disappeared, but because it was no longer a meaningful contributor to execution time - its time was below the sampling threshold of PerfView. Why is that the case? Let's look at the GC next.

We can do an in-depth GC analysis by clicking on Memory -> GC Heap Net Mem. That will open a panel with our GC profile. Looking at Search first, we see that 90% of garbage collection pressure is attributed to string operations. At first glance, it looks like the search logic is the bottleneck.

Comparatively, our User model took 0.8% of GC pressure, coming in second place.

In SearchFast, after removing seemingly innocent allocations created by ToLowerInvariant, string-related GC pressure disappeared entirely. The dominant cost shifted to the user model itself.

The surprising part wasn’t that performance just improved, it was that the profiler stopped showing strings as a meaningful contributor at all.

Not only that, we see what looks like, for our purposes, a much healthier application, in which the GC is dividing its time between our data models and JSON parsing for the HTTP pipeline.
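The same difference can be cross-checked without a profiler. The sketch below is a simplified stand-in, not the code that was profiled: `AllocationProbe`, its `User` record and the fixed "John Smith" names are hypothetical, and it uses the standard `GC.GetAllocatedBytesForCurrentThread` call to count the bytes allocated by each search variant.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in model for this sketch only.
public record User(int Id, string Name);

public static class AllocationProbe
{
    // Bytes allocated on the current thread while `action` runs.
    public static long Measure(Action action)
    {
        long before = GC.GetAllocatedBytesForCurrentThread();
        action();
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }

    // Deterministic data with mixed-case names, so ToLowerInvariant
    // has to build a new string for every user.
    public static List<User> MakeUsers(int count) =>
        Enumerable.Range(0, count)
            .Select(i => new User(i, $"John Smith {i}"))
            .ToList();

    public static (long Slow, long Fast) Compare(List<User> users, string query)
    {
        void Slow()
        {
            string q = query.ToLowerInvariant();
            _ = users.Where(u => u.Name.ToLowerInvariant().Contains(q)).ToList();
        }

        void Fast()
        {
            _ = users
                .Where(u => u.Name.Contains(query, StringComparison.OrdinalIgnoreCase))
                .ToList();
        }

        // Warm up both paths once so one-time costs are not measured.
        Slow();
        Fast();

        return (Measure(Slow), Measure(Fast));
    }
}
```

Calling `AllocationProbe.Compare(AllocationProbe.MakeUsers(20_000), "smith")` should report a noticeably larger Slow figure than Fast one, since both variants build the same result list and only the first allocates a lowered copy of every name. The exact numbers depend on the runtime version and the data.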

Conclusion

What started as a seemingly simple search implementation ended up revealing something more fundamental about performance in .NET applications: the real cost of a feature is often not where we initially expect it to be.

At first glance, the Search and SearchFast methods appear almost identical in intent. Both iterate over the same dataset, both perform substring matching, and both return the same result type. From a purely algorithmic perspective, you might even assume their performance characteristics would be similar at this scale.

However, profiling tells a different story, and that's why it is important to understand how we use memory in practice.

Commonly in managed runtimes like .NET, the observed slowness or apparent memory leak is actually introduced by a large volume of short-lived allocations: in our example, repeated string transformations, particularly using ToLowerInvariant. This shifts the workload away from CPU-bound computation and into garbage collection pressure, with the profiler showing strings as the dominant contributor to GC activity.

In the optimized version, removing these allocations did not simply "improve performance"; it fundamentally changed the shape of the application. As we saw, string-related GC pressure disappeared entirely from the trace, and the remaining cost shifted naturally toward the data model and pipeline processing.

This is the key insight: performance work is not always about making a single operation faster. More often, it is about changing what the runtime is actually spending its time and memory on.

I hope that was useful and see you on the next one :)

References and relevant tools
