DEV Community: Henrique Ferreira

Memory management in C# - Your app probably uses more memory than you think

Henrique Ferreira — Mon, 18 May 2026 16:51:56 +0000

Let's profile a real C# app!

Note: The tooling used in this article is linked at the end. The analysis approach is independent of tooling, but there is a practical bias here towards the Windows platform. I expect you have some familiarity with Visual Studio and the command line, but even without that, you should be able to follow along.

This will be more practical, so please keep in mind that the memory analysis one does is as good as an expected baseline and the experienced delta. In other words, we should try to pick a “control group” (the expected memory consumption) to compare it with a use case (the scenario of analysis, in which the memory consumption seems “too high”, as per some threshold). For example, C# ASP.NET Core applications can use from tens up to hundreds of MB at the start in debug mode. The gist for now is: this is not a micro optimization theoretical exercise - we'll measure an application performance like in the real world.

Which app?

We need a long-running process. Let's start by running a simple ASP.NET Core web application and having a look around the memory consumption.

For that, I've created a folder called complex-app and ran

dotnet new webapi

to download and set up a pre-configured app.

To run it, we can simply use

dotnet run

Or build our project using Visual Studio and run the executable. When assessing performance, prefer the Release build (not Debug), because it will be closer to the realistic production binary. Debug builds carry different symbols we simply don't need for our purposes.

This will start a new .NET Core runtime process with our solution and bind it a to a HTTP port, exposing any available controller routes (the usual stuff, from the Program.cs entry point).

Profiling

To introduce some visualization and look into what's happening memory-wise behind the scenes, let's use PerfView. It's a simple standalone .exe, so I've downloaded it and included it in our Release output folder (in my case complex-app\bin\Release\net10.0).

For that, we'll run PerfView and select the Collect -> Run option.

For the Command, I'll select our program "complex-app.exe".

And click on "Run command". This is going to start our process from within the Profiler, collecting memory (and trace, and other) data from its execution. We can see the logs of our application by clicking on the "Log" button on the bottom-right corner.

Analyzing

To analyze our profiling session, all we have to do is click on "Cancel" on the bottom right corner. Then we can get the .etl (analysis) file that was produced as output and open it in PerfView.

Feel free to familiarize yourself with the structure of this file. I invite you to start by double clicking on "CPU Stacks" and selecting the "complex-app" process, like in the following screenshot:

Then double click on the "complex-app" process and we'll see what is really consuming time in our program. And the winner is the coreclr! That's expected, the Core Language Runtime will definitely be doing most of the heavy lifting of our app, in my case representing 36% of total execution time.

Tip: you can drill down by double clicking on the coreclr reference itself and checking individual method calls. Cool, isn't it? Now we can see the likes of extensions, aspnetcore, logging and much more!

Let's look at the memory.

For that, I'll expand the "Memory" folder and navigate to "GC Heap Net Mem", as this is the snapshot of the Garbage Collector.

So this is how my "complex-app" looks like, there's a bunch of bytes of Int32, a few of String... We'll make things more interesting in the next section.

Load

A few things to keep in mind before we increase the scale of our app and move to a more relevant analysis:

State of the art CPUs can process billions of instructions per cycle, so ultimately the memory can be allocated, detected in scope and cleared out very fast

Also due to the complexity of modern applications, when it comes to profiling and optimization we're mostly interested in hot paths and benchmarks (remember the baseline and deltas)

There is a trade-off between clean code and performance, arguably that goes all the way down to design discussions between OOP and procedural programming

Load and scale are very important factors that usually come together: simplistic algorithms perform well with smaller inputs

Let's work on some information retrieval functionality for our app. Generally, it's a good practice to think about the data itself: different algorithms behave differently depending on the shape, quantity and quality of our data. Classes vs. structs, primitive vs. complex types, the more we understand about our data, the better. We'll simulate a random list of users with the following user generator:

namespace complex_app.Data;

using complex_app.Models;
using System.Text;

public static class UserGenerator
{
    private static readonly Random _rand = new Random();

    private static readonly string[] FirstNames =
    {
        "John", "Maria", "Alex", "Sophie", "Daniel", "Lucas", "Emma", "Noah",
        "Isabella", "Liam", "Oliver", "Mia", "Ethan", "Amelia"
    };

    private static readonly string[] LastNames =
    {
        "Smith", "Johnson", "Brown", "Taylor", "Anderson", "Silva",
        "Kowalski", "Dubois", "Novak", "Rossi", "Fernandez"
    };

    public static List<User> GenerateUsers(int count)
    {
        var users = new List<User>(count);

        for (int i = 0; i < count; i++)
        {
            double roll = _rand.NextDouble();

            string name = roll switch
            {
                < 0.70 => NormalName(),
                < 0.85 => CaseNoiseName(),
                < 0.95 => LongName(),
                _ => EmailStyleName()
            };

            users.Add(new User
            {
                Id = i,
                Name = name
            });
        }

        return users;
    }

    private static string NormalName()
    {
        return $"{Rand(FirstNames)} {Rand(LastNames)}";
    }

    private static string CaseNoiseName()
    {
        return ToRandomCase(NormalName());
    }

    private static string LongName()
    {
        return $"{Rand(FirstNames)} {Rand(LastNames)} {Rand(LastNames)} III (Contractor - EU West Region)";
    }

    private static string EmailStyleName()
    {
        return $"{Rand(FirstNames).ToLower()}.{Rand(LastNames).ToLower()}@company-internal-domain.eu";
    }

    private static string Rand(string[] arr)
        => arr[_rand.Next(arr.Length)];

    private static string ToRandomCase(string input)
    {
        var sb = new StringBuilder(input.Length);

        foreach (var c in input)
        {
            sb.Append(_rand.Next(2) == 0
                ? char.ToLowerInvariant(c)
                : char.ToUpperInvariant(c));
        }

        return sb.ToString();
    }
}

Simulation is one of the most powerful tools in performance analysis. A late great simulation will never be the same as a memory production issue, but an early good simulation can definitely avoid the issue entirely.

Now we add the list to our app, in Program.cs - I'll inject it as a global singleton dependency, with 20K users:

// ...
var users = UserGenerator.GenerateUsers(20_000);
builder.Services.AddSingleton(users);
// ...

Finally, we can come up with a small search behavior: I have the following interface, with the Search and SearchFast methods. You can probably see where we're heading from here, but I promise looking at the memory footprint will be interesting and perhaps can give some unexpected insights even for experienced engineers.

using complex_app.Models;

namespace complex_app.Services;

public interface IUserSearch
{
    List<User> Search(string query);
    List<User> SearchFast(string query);
}

public class UserSearch : IUserSearch
{
    private readonly List<User> _users;

    public UserSearch(List<User> users)
    {
        _users = users;
    }

    public List<User> Search(string query)
    {
        query = query.ToLowerInvariant();

        var results = _users
            .Where(u => u.Name.ToLowerInvariant().Contains(query))
            .ToList();

        return results;
    }

    public List<User> SearchFast(string query)
    {
        return _users
            .Where(u => u.Name.Contains(query, StringComparison.OrdinalIgnoreCase))
            .ToList();
    }
}

Exercising the app

Now what we need to do is simply run our app like before, with the run command. For the sake of empirical analysis, we'll compare:

A run using only the Search method, and
Another run using only the SearchFast method

In this experiment, each method will be called 10 times directly through the endpoints GET /seach?q=abc and GET /search-fast?q=abc, respectively.

While service warm up is definitely a factor influencing our performance, for this exercise start up times will be aggregated and neglected - for simplicity: both simulate the same piece of behavior with different implementations.

Final results

At last, let's compare using Search against SearchFast. So we have two files open in PerfView, a PerfViewData.etl.zip and below it a PerfViewData_Fast.etl.zip (I'm not very good at naming things, as you can see).

Starting with CPU stacks, we have our UserSearch appearing with 3.4% of total CPU time when it comes to Search (highlighted):

In SearchFast, interestingly, the search method stopped appearing in CPU traces entirely. Not because it disappeared, but because it was no longer a meaningful contributor to execution time - its time was below the sampling threshold of PerfView. Why is that the case? Let's look at the GC next.

We can do an in depth GC analysis by clicking on Memory -> GC Heap Net Mem. That will open a panel with our GC profile. Looking at Search first, we see 90% of garbage collection pressure was attributed to string operations. At first glance, it looked like the search logic was the bottleneck.

Comparatively, our User model took 0.8% of GC pressure, coming is as second place.

In SearchFast, after removing seemingly innocent allocations created by ToLowerInvariant, string-related GC pressure disappeared entirely. The dominant cost shifted to the user model itself.

The surprising part wasn’t that performance just improved, it was that the profiler stopped showing strings as a meaningful contributor at all.

Not only that, we see what looks like, for our purposes, a much healthier application, in which the GC is diving its time between our data models and JSON parsing for the HTTP pipeline.

Conclusion

What started as a seemingly simple search implementation ended up revealing something more fundamental about performance in .NET applications: the real cost of a feature is often not where we initially expect it to be.

At first glance, the Search and SearchFast methods appear almost identical in intent. Both iterate over the same dataset, both perform substring matching, and both return the same result type. From a purely algorithmic perspective, you might even assume their performance characteristics would be similar at this scale.

However, profiling tells a different story, and that's the importance of understanding about how we leverage memory in practice.

Commonly in managed runtimes like .NET, the observed slowness or memory leak is actually introduced by a large volume of short-lived allocations. In our example, repeated string transformations, particularly using ToLowerInvariant. This shifts the workload away from CPU-bound computation and into garbage collection pressure, with the profiler showing strings as the dominant contributor to GC activity.

In the optimized version, removing these allocations did not simply "improve performance" fundamentally changed the shape of the application. As we saw, string-related GC pressure disappeared entirely from the trace, and the remaining cost shifted naturally toward the data model and pipeline processing.

This is the key insight: performance work is not always about making a single operation faster. More often, it is about changing what the runtime is actually spending its time and memory on.

I hope that was useful and see you on the next one :)

References and relevant tools

Memory management in C# - Why we should care

Henrique Ferreira — Wed, 20 Sep 2023 15:59:57 +0000

One could argue that as software programmers our main goal is to efficiently manage hardware resources.

There are many of those, like network cards, hard drive disks, SSDs, I/O devices, telemetry devices, and arguably by far most commonly CPUs and RAM modules (notice the different abstraction layers in those examples).

So this time let’s talk about RAM modules. And C#.

Please notice this article greatly simplifies the contents. If you want to expand on the knowledge, please use the references at the end. There are too many great books on the subject by people much more knowledgeable than me :-)

“Wait, but isn’t it all abstracted away from us, as application developers?” Yes, and here’s why we should still care.

Where’s my memory? (a recap)

There are lots of interesting online explanations on Heap vs. Stack memory, especially theoretical and introductory. But how and when should we actually choose to use them?

Naturally, we are constrained to the Stack. When executing a program, the OS uses a stack implementation. Nothing too fancy, it’s literally (with some simplification) a stack of function calls (i.e. the Call Stack), internally implemented as a contiguous array of so-called stack frames.

Every language has a calling convention which specifies what goes into this stack (i.e. what is a stack frame). Generally, data is included regarding

Return addresses, for actually JUMPing between function calls
Parameter data
Some space for return values
Scoped variables

For our purposes, let’s leave other implementation details to the C# language specification, and the lower level stuff for the OS and the likes of Intel and AMD and their Instruction Set Architectures

From another perspective, the Call Stack tracks where the program is in terms of execution. It has to be a stack basically due to the efficiency of pushing and popping functions in it, which is exactly how a program executes in the first place!

And there it is, your parameter and variable memory is in “the Stack”. When the function returns, it finishes execution and gets popped from the stack, meaning the memory is automatically free’d up.

Since C# is a general purpose programming language, we could arguably write any solution with Stack memory only 👀 It’s fast, easy to write, and there’s no need for explicit memory management (the GC will end up managing it as well, but we’ll discuss this in the next sections). Here are some good use cases:

Recursive functions
Procedural programming
Functional programming

Of course, it is limited. The C# (.NET) stack size for 32bit architectures defaults to 1MB per Thread, which is a lot for most applications - we can go tens of thousands of levels deep for simple recursive functions. If you exceed it, be prepared for an infamous StackOverflow exception (that said please don't be afraid of recursive code).

The other problem is scope, meaning we’d have to pass down local variables as parameters all the way through the code execution, which would make complex code less readable and a nightmare to maintain.

Should it be all “new”?

But don’t panic! (and make sure to bring your towel). We still have the Heap. As with the Stack, it also resides within RAM. But it's a different data structure, for a different kind of access.

A heap can be implemented just as well from a simple array. The rules are different though, since it is a tree-like structure that allows for some optimized read access at the cost of more complex writes. So yes, it is slower than the Stack (it can even end up with some fragmentation). So why use it? It's virtually limitless! We can easily throw data at it, and we get the reference to where that data is.

As opposed to a value which holds the value itself (int x = 2), a reference is a pointer which holds an address to where the value is dynamically allocated (var x = new string[10]). The address size can vary, but on an 32-bit machine, they are usually 4 bytes long (pointing to anything in the Heap, including functions!). The difference is commonly abstracted away from the programmer in C#, so it is a good idea to have the C# Value type docs and the C# reference types docs close in case of doubts.

So how do we use the heap in C#? The short, over-simplified and slightly off answer is with the new keyword (or instantiations). The complete answer is:

A reference type is allocated on the heap
A value type is created where the scope is: if it's a field in a class, along in the heap. If it's a local variable in a method, most likely on the stack (notice C# has tons of features and there are more complex scenarios, for lambda expressions there have shared scopes for example)

The above list is short and ambiguous on purpose. The language specification makes no guarantees on what goes where, and the compiler will ultimately and optimally decide.

To another extent, the C# heap is a global Managed Heap. That also means there is a runtime data structure running along with our code: the Garbage Collector.

So my code is produces Garbage?

Jokes apart, the Garbage Collector (GC) doesn't mean to offend us. It is responsible for effectively managing the C# memory (Heap and Stack), and can be our best friend when used correctly.

To reiterate, the GC runs with our code and goes reclaiming dead objects (pieces of memory we're not using). Hence we don’t need to explicitly free memory, which is a great and dangerous power. Most content out there will tell you to leave it all to the GC, since it knows better.

If you’ve gone this far though, you’re like me and is not particularly happy with having some data structure we can’t control taking care of so much of our program. So like Morpheus, I’m hereby admittedly inviting you to take the red pill with me.

Collecting the garbage

Remember the reference type from the previous sections? Let’s assume the OOP nature of C# and start calling it “object” (it has to be an object anyways given how an OOP compiler makes Semantic Analysis).

In a “managed” language like C#, objects have lifetimes: a short-lived object has been allocated recently and is likely closed related to others also just allocated, while a long-lived is older and maybe less related.

For completeness sake, the GC also differentiates between Large Objects (85,000 bytes and larger) and regular ones, with a separate heap for each type

The criteria to define an object lifespan is based on so-called GC generations. There are three of them:

Gen 0: consists of youngest objects, like a recently allocated temporary variable. It’s the cheapest to manage, being the first option to free space for new allocations and also where GC collections are more frequent. All new objects implicitly go to Gen 0, except Large Objects, which go straight to Gen 3. We like Gen 0
Gen 1: literally the mid way of object lifetimes, in between short and long-lived objects. After the Gen 0 collection, the memory is compacted and objects - like collections, for example - are promoted to Gen 1. Gen 1 is not checked every time for collection, only when the Gen 0 collection couldn’t feee up enough space for newer allocations. We have mixed feelings for Gen 1.
Gen 2: long-lives objects, usually alive for the duration of a process (like Singletons, for example). Surviving a Gen 2 collection (i.e. a full collection) means sticking around until the memory is determined unreachable. We’re not fans of Gen 2, but it’s serves its purpose
Gen 3: is just a nickname space for Large Objects only

Objects can survive collections, being promoted from Gen 0 to Gen 1 and Gen 1 to Gen 2. In simples terms, the GC will try to find the sweet spot between not getting too intrusive in collections and not letting the memory become too big.

GC - Ghost Component

What we’ll discuss now may seem like a lot of developer work for a GC that automatically sorts out the memory usage. It isn’t. The thing is the GC makes thread pauses (for ephemeral collections - Gen 0 and Gen 1 - lasting a couple of milliseconds) to actually cleanup the memory. In general, what we want is to make these pauses less frequent and faster. Most of the real challenge goes to understanding how have we implemented the solution - how we have used the memory.

So you may have heard about “problems with the GC” or “memory leaks in C#”. These are usually due to Gen 2 objects that are hanged around or maybe a concentration of Large Objects.

Some objects interact with “out-of-process” resources (i.e. unmanaged resources), like disk I/O, and network connections (and any other general OS resources). When that happens, explicit cleanup becomes necessary (after all we wouldn’t want the GC to step in and shut down a network port, and it wouldn’t know if the port is being used anyways).

With regards to such unmanaged resources, a case-by-case analysis is necessary. For the GC to be able to collect them, we should implement the IDisposable.Dispose method. We should also consider using destructors, or in C# terms, the Object.Finalize method in case there’s a chance of Dispose not being called by a developer.

Finally, if you make too many allocations, you may get a nasty OutOfMemory exception. That ultimately means the OS was not able to provide addressable space for allocations. In such a case, remember: the .NET CLR, .exe and other .dll modules are sharing the memory with your application.

Analyzing and profiling

In the next parts, we’ll actually analyze and profile a C# program in practice. Then, we’ll discuss tools, trade-offs and myths in terms of memory management. See you there!

PS.: Please leave your comments and feedback. This very condensed article should be used as a general discussion and not a static reference. Let me know if I should expand on any of the content :)