El cat bot

Posted on Jul 5

Understanding the .NET CLR: What Every C# Developer Should Know

#clr #csharp #dotnet

Introduction

Every day of a developer's career involves writing code in a programming language, then compiling and running the application. If no errors are found, the application simply executes and does its job.

Beginner and intermediate developers have probably asked themselves at least one of the following questions:

What happens when code is compiled?
Is the CPU able to understand C# code?
How are variables and objects handled in memory?
What does "Run" mean?

These questions are completely valid. To answer them, we need to look at a powerful part of .NET that does a lot of work behind the scenes: the CLR.

What is the CLR?

.NET CLR (Common Language Runtime) is the engine of the .NET environment that runs compiled C# code. Every application built with ASP.NET Core MVC, WinForms, MAUI, etc., needs a runtime to execute. The CLR supports different programming languages such as C#, Visual Basic, F# and C++/CLI, among others, which is where the "Common Language" part of the name comes from.

To put it simply, the CLR is a virtual machine that handles very important tasks:

Loading the IL (Common Intermediate Language) that the C# compiler produces
Compiling IL into native code with the JIT (Just-in-Time compiler)
Memory management by GC (Garbage Collector)
Exception handling
Thread management
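To make this list a bit more concrete, the following minimal sketch touches some of these runtime services from ordinary C# code. It assumes a recent .NET SDK (top-level statements with await); the runtime description, collection counts and thread IDs it prints depend on the machine it runs on.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Threading.Tasks;

// Which runtime is executing this program (for example ".NET 8.0.x").
string runtime = RuntimeInformation.FrameworkDescription;
Console.WriteLine($"Runtime: {runtime}");

// Memory management: the GC keeps track of how many collections it has run.
Console.WriteLine($"Gen 0 collections so far: {GC.CollectionCount(0)}");

// Exception handling: the CLR unwinds the stack until it finds a matching catch block.
string caught = "";
try
{
    throw new InvalidOperationException("boom");
}
catch (InvalidOperationException ex)
{
    caught = ex.Message;
}
Console.WriteLine($"Caught: {caught}");

// Thread management: Task.Run queues the work to a CLR thread-pool thread.
int mainThread = Environment.CurrentManagedThreadId;
int workerThread = await Task.Run(() => Environment.CurrentManagedThreadId);
Console.WriteLine($"Main thread {mainThread}, worker thread {workerThread}");
```

None of these lines allocate memory manually, walk the stack or create OS threads directly; the runtime does that work on the program's behalf.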

It is worth mentioning that C# code is just text. A platform needs to bring C# code to life, and that is the CLR's job.

CLR code flow

All of these processes take place when a developer builds and runs an application.

With this flow in mind, the questions from the beginning can now be answered:

What happens when code is compiled?

After developers hit compile or execute the "dotnet build" command, the C# compiler (Roslyn) transforms C# code into IL code and stores it in a .dll or .exe file, called an assembly. This file contains the IL code and the metadata that the CLR and its JIT compiler read later. It is worth pointing out that IL is CPU independent, meaning it has to be translated into native machine code later.
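One way to see that the compiler really stores IL in the assembly is to read it back through reflection. This minimal sketch assumes a recent .NET SDK and uses a lambda with the same body as the Sum method shown later in this post, because a lambda's compiled method is easy to reach through Delegate.Method. MethodBody.GetILAsByteArray returns the raw IL bytes, and OpCodes.Add identifies the IL add instruction. The exact byte sequence can differ between Debug and Release builds.

```csharp
using System;
using System.Linq;
using System.Reflection.Emit;

// A lambda with the same body as the Sum method: (a, b) => a + b.
Func<int, int, int> sum = (a, b) => a + b;

// GetMethodBody() exposes the IL that the C# compiler wrote into the assembly.
byte[] il = sum.Method.GetMethodBody()!.GetILAsByteArray()!;
Console.WriteLine($"IL bytes: {BitConverter.ToString(il)}");

// OpCodes.Add.Value is 0x58, the single-byte opcode of the IL "add" instruction.
bool hasAdd = il.Contains((byte)OpCodes.Add.Value);
Console.WriteLine($"Contains 'add' (0x{OpCodes.Add.Value:X2}): {hasAdd}");
```

These bytes are CPU independent; they only become x64 or ARM64 instructions once the JIT compiles the method.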

Is the CPU able to understand C# code?

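No, the CPU can read neither C# nor IL code; it only executes native machine instructions. That raises the question of how the machine can run the program at all. The CLR is perfectly aware of this, which is why it has a specialized compiler called the JIT (Just-in-Time) compiler. The JIT does not translate everything at startup: the first time a method is called, the JIT reads that method's IL from the .dll file, compiles it into native machine code and keeps that code in memory (since .NET Core 3.0, frequently called methods can later be recompiled with more optimizations). Later calls jump straight to the native code, so CPU threads execute instructions they can understand.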
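A rough way to observe that compilation happens on the first call is System.Runtime.JitInfo (available since .NET 6), which reports how many methods the JIT has compiled. This sketch assumes the default CoreCLR setup with the JIT enabled; with Native AOT or ReadyToRun precompilation the numbers will differ. Square is a hypothetical helper used only for illustration.

```csharp
using System;
using System.Runtime;
using System.Runtime.CompilerServices;

// Count only methods compiled on this thread, ignoring background recompilation.
long before = JitInfo.GetCompiledMethodCount(currentThread: true);
int result = Square(7);   // first call: Square has no native code yet, so the JIT compiles it now
long after = JitInfo.GetCompiledMethodCount(currentThread: true);

Console.WriteLine($"Square(7) = {result}");
Console.WriteLine($"Methods JIT-compiled during the first call: {after - before}");

// NoInlining keeps Square a separate method, so it is compiled when first called
// instead of being folded into the caller's native code.
[MethodImpl(MethodImplOptions.NoInlining)]
static int Square(int x) => x * x;
```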

How are variables and objects handled in memory?

Once the program is running, objects and their values need a storage location. That's the job for:

The Stack
The Managed Heap, handled by the Garbage Collector (GC)

Local variables and parameters of value types (int, bool, decimal, etc.) are typically stored on the Stack while a method executes; value types that are fields of a class live inside that object on the heap instead. The Stack is a LIFO (Last-in, First-out) structure that allows two operations: adding data to the top (push) and removing data from the top (pop). This model is perfect for local variables, which can be discarded as soon as the method finishes. Consider a method that adds two integers:

public int Sum(int a, int b) => a + b;

As mentioned before, integers are value types, so "a" and "b" are placed on the Stack for fast access (in practice, the JIT often keeps such values in CPU registers). The sum of "a" and "b" is computed and the result is returned. Afterward, "a" and "b" are discarded along with the method's stack frame.
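Exactly where a value lives (stack, register or heap) is a JIT implementation detail, but the difference it reflects is observable: value types are copied on assignment, while for reference types only the reference is copied. A minimal sketch; the variable names are illustrative only:

```csharp
using System;

int x = 2;
int y = x;              // int is a value type: y receives its own copy of the value
y = 10;                 // changing the copy leaves x untouched

int[] first = { 2 };    // arrays are reference types: the array object lives on the managed heap
int[] second = first;   // only the reference is copied, both variables point to the same array
second[0] = 10;         // the change is visible through both variables

Console.WriteLine($"x = {x}, y = {y}");                                // x = 2, y = 10
Console.WriteLine($"first[0] = {first[0]}, second[0] = {second[0]}");  // first[0] = 10, second[0] = 10
Console.WriteLine($"Sum(x, y) = {Sum(x, y)}");                         // Sum(x, y) = 12

// The same Sum method as above, written as a static local function.
static int Sum(int a, int b) => a + b;
```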

For reference types (objects), the GC reserves memory from the OS and creates a location where .NET objects reside. This location is the GC Managed Heap. Instances of reference types, including objects referenced by static fields, are allocated there. The managed heap is organized into generations:

Generation 0 (G0)
Generation 1 (G1)
Generation 2 (G2)
Large Object Heap (LOH), sometimes referred to as Generation 3

Let's say there is an existing class called MyType. When creating an instance of MyType:

var myType = new MyType();

The new object is allocated in G0. The "new" keyword compiles to the IL instruction newobj, which at run time allocates the object on the Managed Heap and then calls its constructor. The myType variable itself only holds a reference to that object.

It is worth noting that G0, G1 and G2 are also known as the Small Object Heap (SOH). Objects smaller than 85,000 bytes are allocated on the SOH; objects of 85,000 bytes or more go to the LOH, which is only collected during Generation 2 collections.
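The generation an object lands in can be checked with GC.GetGeneration. A minimal sketch, assuming CoreCLR with the default LOH threshold; note that objects on the LOH are reported as generation 2:

```csharp
using System;

// A small, freshly allocated object starts in generation 0 (SOH).
var small = new object();
int smallGen = GC.GetGeneration(small);

// 80,000 bytes of data plus the array header is still below 85,000 bytes: SOH.
var medium = new byte[80_000];
int mediumGen = GC.GetGeneration(medium);

// 85,000 bytes of data plus the header reaches the threshold: LOH.
var large = new byte[85_000];
int largeGen = GC.GetGeneration(large);

Console.WriteLine($"small:  generation {smallGen}");   // 0
Console.WriteLine($"medium: generation {mediumGen}");  // 0
Console.WriteLine($"large:  generation {largeGen}");   // 2 (LOH objects are reported as generation 2)
```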

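When running code needs to allocate a new object, the runtime "asks" the GC for a memory address. If allocations exceed the budget the GC has set (for example, when G0 fills up), the GC performs a collection: it releases memory occupied by objects that are no longer reachable and promotes the surviving objects to the next generation (G0 to G1, G1 to G2).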
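GC.Collect() forces a collection, which makes promotion visible. This is only a demonstration under default settings: forcing collections in production code is rarely a good idea, and the exact generation a survivor ends up in is the GC's decision.

```csharp
using System;

var survivor = new object();
int genBefore = GC.GetGeneration(survivor);

// Force a full, blocking collection (for demonstration only).
int gen0CollectionsBefore = GC.CollectionCount(0);
GC.Collect();
int gen0CollectionsAfter = GC.CollectionCount(0);

// survivor is still referenced, so it survives the collection
// and is normally promoted from G0 to G1.
int genAfter = GC.GetGeneration(survivor);

Console.WriteLine($"Generation before GC.Collect(): {genBefore}");
Console.WriteLine($"Generation after GC.Collect():  {genAfter}");
Console.WriteLine($"Gen 0 collections: {gen0CollectionsBefore} -> {gen0CollectionsAfter}");

// Keep survivor reachable until this point.
GC.KeepAlive(survivor);
```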

What does "Run" mean?

When a .NET application is executing the CLR steps described above, it is running. That simple. The IL was already produced at build time; while the application runs, the CLR JIT-compiles each method the first time it is called and manages memory by allocating objects and collecting the ones no longer in use.

Conclusion

The CLR is a massive engine (almost invisible to developers) that powers all .NET applications. Learning its fundamentals helps developers write better code. For example, when investigating performance issues, understanding how the Garbage Collector works makes it easier to detect memory leaks: the problem could be a Managed Heap full of objects that are still referenced, most likely because of something at the C# code level.
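The kind of leak described above can be sketched in a few lines: a long-lived collection (here a hypothetical cache that is never cleared) keeps objects reachable, so the GC is not allowed to reclaim them. WeakReference lets the program observe whether an object is still alive without keeping it alive itself. Allocate is a hypothetical helper; whether the unreferenced buffer is reclaimed immediately is up to the GC, so its result is only printed.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

// A long-lived collection, like a cache that is never cleared.
var cache = new List<byte[]>();

WeakReference kept = Allocate(cache, keep: true);
WeakReference dropped = Allocate(cache, keep: false);

GC.Collect();
GC.WaitForPendingFinalizers();
GC.Collect();

// The cached buffer is still reachable through the list, so the GC must keep it.
Console.WriteLine($"Cached buffer alive:  {kept.IsAlive}");    // True
// The other buffer has no remaining strong references, so it is normally reclaimed.
Console.WriteLine($"Dropped buffer alive: {dropped.IsAlive}");

GC.KeepAlive(cache);

// NoInlining ensures no hidden local in the caller keeps the buffer alive.
[MethodImpl(MethodImplOptions.NoInlining)]
static WeakReference Allocate(List<byte[]> target, bool keep)
{
    var buffer = new byte[1024];
    if (keep)
    {
        target.Add(buffer);   // a strong reference that outlives this method
    }
    return new WeakReference(buffer);
}
```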

What's next?

Start reading about the CLR implementation to understand this interesting and important topic. The following Microsoft documentation and books are packed with concepts that help you dig deeper into CLR code and architecture:

The Book of the Runtime: https://github.com/dotnet/runtime/blob/main/docs/design/coreclr/botr/README.md
Pro .NET Memory Management by Konrad Kokosa
The Garbage Collection Handbook by Richard Jones, Antony Hosking and Eliot Moss
CLR via C# by Jeffrey Richter
.NET IL Assembler by Serge Lidin

DEV Community