In the previous blog post, we dived into the MSBuild engine by creating a project file from scratch. The project file is what the MSBuild engine uses to build our application.
In this part of the series, we will review the compiled CIL that was built previously to understand how the compiler translated our code.
Common Intermediate Language (CIL)
When we compile code written in any .NET language, the associated compiler (for example, the C# or VB compiler) generates a binary called an assembly, which contains IL code. IL is a low-level instruction set with a human-readable text form, and the runtime's compiler converts it into machine code the first time a method executes. The conversion is deferred until run time so that the compiler knows which environment the code is running in and can emit machine code optimized for that platform. This is known as Just-In-Time (JIT) compilation.
CIL is an _object-oriented assembly language_: a set of CPU- and platform-independent instructions that can be executed in any environment supporting the Common Language Infrastructure, such as the .NET runtime.
Evaluation Stack
Before we look at the IL code, it's important that we understand the role of the evaluation stack in executing CIL instructions.
A stack is a data structure that follows the _Last In, First Out_ (LIFO) principle, as demonstrated in the image below: the most recently pushed item is the first one to be removed.
The evaluation stack holds the values (such as local variables, method arguments, and constants) that an instruction is about to operate on. Instructions that copy values from memory onto the evaluation stack are called loads, and instructions that copy values from the stack back to memory are called stores. All the opcodes starting with ld load an item onto the stack, and the opcodes starting with st store an item in memory.
At the beginning of a method body, the .maxstack directive declares the maximum number of items that can be on the evaluation stack at any one time. If it is not provided, the value defaults to 8.
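To make the load and store vocabulary concrete, here is a minimal sketch in C# that emits a tiny method with System.Reflection.Emit and then runs it. The class name EvalStackDemo and the Add method are made up for illustration and are not part of the HelloWorld project. The comments track what is on the evaluation stack after each instruction; it never holds more than two items, so a .maxstack of 2 would be enough for this method.

```csharp
using System;
using System.Reflection.Emit;

// Hypothetical demo class, not part of the HelloWorld project.
static class EvalStackDemo
{
    // Emits IL roughly equivalent to:
    //   int Add(int a, int b) { int sum = a + b; return sum; }
    public static Func<int, int, int> BuildAdder()
    {
        var method = new DynamicMethod("Add", typeof(int), new[] { typeof(int), typeof(int) });
        ILGenerator il = method.GetILGenerator();
        il.DeclareLocal(typeof(int));   // local 0: sum

        il.Emit(OpCodes.Ldarg_0);       // load:  stack is [a]
        il.Emit(OpCodes.Ldarg_1);       // load:  stack is [a, b]
        il.Emit(OpCodes.Add);           // pops two values, pushes one: [a + b]
        il.Emit(OpCodes.Stloc_0);       // store: stack is empty, sum = a + b
        il.Emit(OpCodes.Ldloc_0);       // load:  stack is [sum]
        il.Emit(OpCodes.Ret);           // returns the value on top of the stack

        return (Func<int, int, int>)method.CreateDelegate(typeof(Func<int, int, int>));
    }

    static void Main()
    {
        Console.WriteLine(BuildAdder()(2, 3)); // prints 5
    }
}
```

The ld and st prefixes show up here as Ldarg, Ldloc and Stloc, the same opcode families that ILDASM prints in lowercase.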
Now we're ready to look at some code. 👩💻
CIL Example
If you followed along with the tutorial in the last part, by now you should have the HelloWorld.exe file in the Bin folder. Because the compiler embeds IL in the file in binary form, we need to use a _disassembler_ to view the CIL. The .NET Framework tooling installed with Visual Studio includes Microsoft's own disassembler called _ILDASM_ - Intermediate Language Disassembler. To use ILDASM, open the Developer Command Prompt for Visual Studio and invoke the following command:
ildasm Bin\HelloWorld.exe /output:Bin\HelloWorld.il
Let's look at the output file, HelloWorld.il. This file is filled with IL code. If you have ever worked in or seen assembly-level programming, you might notice some similarities. Common Intermediate Language is definitely harder to read and more "close to the metal" than regular C# code, but it's not as mysterious as it might look. By stepping through the IL code line by line, you'll see that this is just a different syntax for programming concepts you already know.
// Microsoft (R) .NET Framework IL Disassembler. Version 4.8.3928.0
// Copyright (c) Microsoft Corporation. All rights reserved.
// Metadata version: v4.0.30319
.assembly extern mscorlib
{
.publickeytoken = (B7 7A 5C 56 19 34 E0 89 ) // .z\V.4..
.ver 4:0:0:0
}
.assembly HelloWorld
{
.custom instance void [mscorlib]System.Runtime.CompilerServices.CompilationRelaxationsAttribute::.ctor(int32) = ( 01 00 08 00 00 00 00 00 )
.custom instance void [mscorlib]System.Runtime.CompilerServices.RuntimeCompatibilityAttribute::.ctor() = ( 01 00 01 00 54 02 16 57 72 61 70 4E 6F 6E 45 78 // ....T..WrapNonEx
63 65 70 74 69 6F 6E 54 68 72 6F 77 73 01 ) // ceptionThrows.
// --- The following custom attribute is added automatically, do not uncomment -------
// .custom instance void [mscorlib]System.Diagnostics.DebuggableAttribute::.ctor(valuetype [mscorlib]System.Diagnostics.DebuggableAttribute/DebuggingModes) = ( 01 00 07 01 00 00 00 00 )
.hash algorithm 0x00008004
.ver 0:0:0:0
}
.module HelloWorld.exe
// MVID: {381571BD-67C6-4919-A3A1-5BAC05B0DDD1}
.imagebase 0x00400000
.file alignment 0x00000200
.stackreserve 0x00100000
.subsystem 0x0003 // WINDOWS_CUI
.corflags 0x00000001 // ILONLY
// Image base: 0x06F00000
// =============== CLASS MEMBERS DECLARATION ===================
.class private auto ansi beforefieldinit HelloWorld
extends [mscorlib]System.Object
{
.method private hidebysig static void Main(string[] args) cil managed
{
.entrypoint
// Code size 13 (0xd)
.maxstack 8
IL_0000: nop
IL_0001: ldstr "Hello World!"
IL_0006: call void [mscorlib]System.Console::WriteLine(string)
IL_000b: nop
IL_000c: ret
} // end of method HelloWorld::Main
.method public hidebysig specialname rtspecialname
instance void .ctor() cil managed
{
// Code size 8 (0x8)
.maxstack 8
IL_0000: ldarg.0
IL_0001: call instance void [mscorlib]System.Object::.ctor()
IL_0006: nop
IL_0007: ret
} // end of method HelloWorld::.ctor
} // end of class HelloWorld
// =============================================================
// ***********DISASSEMBLY COMPLETE***********************
// WARNING: Created Win32 resource file Bin\HelloWorld.res
Let's go over some of the syntax we notice in the code above.
CIL Directives, Tokens, and Attributes
In the above code, we notice some names (CIL tokens) with the . prefix, e.g. .assembly, .class, .method, .maxstack and .entrypoint. These are called CIL directives; others, such as .namespace and .override, do not appear in this file. Note that .ctor also starts with a dot, but it is the reserved name of a constructor method rather than a directive. The _tokens_ that are used alongside a CIL directive and describe how it should be processed are called CIL attributes, for example private, static and hidebysig on the .method directive.
CIL Opcodes
Operation codes (opcodes) are the tokens that make up a type's implementation logic, such as ldstr, call and ret. This is the area we are going to focus on for the rest of this article.
CIL Code Labels
The tokens like IL_0000, IL_0001, etc. are called CIL code labels. These labels are optional and can be replaced with any text of your choice.
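The numbers ILDASM puts in these labels are the byte offsets of each instruction inside the method body. For example, ldstr takes one opcode byte plus a four-byte metadata token, which is why the instruction after IL_0001 is labeled IL_0006. The sketch below reads the raw IL bytes of a method through reflection so the offsets can be counted by hand. IlBytesDemo and Greet are hypothetical names used only for this illustration, and the exact bytes printed depend on whether the code is compiled in debug or release mode.

```csharp
using System;
using System.Reflection;

// Hypothetical demo class, not part of the HelloWorld project.
static class IlBytesDemo
{
    static void Greet()
    {
        Console.WriteLine("Hello World!");
    }

    // Returns the raw IL bytes that make up the body of Greet.
    public static byte[] ReadGreetIl()
    {
        MethodInfo method = typeof(IlBytesDemo).GetMethod(
            "Greet", BindingFlags.NonPublic | BindingFlags.Static);
        return method.GetMethodBody().GetILAsByteArray();
    }

    static void Main()
    {
        byte[] il = ReadGreetIl();

        // Corresponds to the "// Code size" comment in the ILDASM output.
        Console.WriteLine("Code size: " + il.Length);

        // Opcode bytes to look for: 00 = nop, 72 = ldstr, 28 = call, 2A = ret.
        // ldstr and call are each followed by a 4-byte metadata token.
        Console.WriteLine(BitConverter.ToString(il));
    }
}
```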
Now that you understand some of the syntax, let's look at the code.
.assembly extern mscorlib
{
.publickeytoken = (B7 7A 5C 56 19 34 E0 89 ) // .z\V.4..
.ver 4:0:0:0
}
This first block of code has the .assembly extern declaration, which is used to reference an external assembly. In this case, it's mscorlib, which contains the definition of System.Console - the only type that our code calls from outside of our assembly. The next block of code also has the .assembly directive but without the extern keyword; it declares the name of this program's own assembly.
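The same information can be read back at run time through reflection. The following is a small sketch, with a made-up class name (ReferencedAssembliesDemo), that prints the name of the running assembly and then the name, version, and public key token of every assembly it references. On .NET Framework the list is expected to include mscorlib; on other runtimes the referenced assemblies and tokens will differ.

```csharp
using System;
using System.Reflection;

// Hypothetical demo class, not part of the HelloWorld project.
static class ReferencedAssembliesDemo
{
    // The run-time counterpart of the ".assembly extern" blocks.
    public static AssemblyName[] GetReferences()
    {
        return Assembly.GetExecutingAssembly().GetReferencedAssemblies();
    }

    static void Main()
    {
        // The run-time counterpart of the plain ".assembly" block.
        Console.WriteLine("This assembly: " + Assembly.GetExecutingAssembly().GetName().Name);

        foreach (AssemblyName reference in GetReferences())
        {
            // The token may be missing for assemblies that are not strong-named.
            byte[] token = reference.GetPublicKeyToken() ?? new byte[0];
            Console.WriteLine("{0} {1} [{2}]",
                reference.Name, reference.Version, BitConverter.ToString(token));
        }
    }
}
```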
For the remainder of this article, we will focus on the last block of code, which contains the Main method, the "heart" of our simple console application.
// =============== CLASS MEMBERS DECLARATION ===================
.class private auto ansi beforefieldinit HelloWorld
extends [mscorlib]System.Object
{
.method private hidebysig static void Main(string[] args) cil managed
{
.entrypoint
// Code size 13 (0xd)
.maxstack 8
IL_0000: nop
IL_0001: ldstr "Hello World!"
IL_0006: call void [mscorlib]System.Console::WriteLine(string)
IL_000b: nop
IL_000c: ret
} // end of method HelloWorld::Main
.method public hidebysig specialname rtspecialname
instance void .ctor() cil managed
{
// Code size 8 (0x8)
.maxstack 8
IL_0000: ldarg.0
IL_0001: call instance void [mscorlib]System.Object::.ctor()
IL_0006: nop
IL_0007: ret
} // end of method HelloWorld::.ctor
} // end of class HelloWorld
// =============================================================
- The .ctor token is the name of the instance constructor. A .ctor method is always qualified with the specialname and rtspecialname attributes. specialname indicates that tools may treat the name in a special way, and rtspecialname indicates that the runtime does so as well.
Next, let's look at the Main method, which was declared as _private_ and _static_.
- The hidebysig attribute means that this method hides any base class member with the same name and signature, rather than every member with the same name.
- The .entrypoint directive marks the _entry point_ of the executable program. When the C# compiler compiles this code, it marks the Main method with the .entrypoint directive. The Common Language Runtime (CLR) looks for this _entry point_ in the compiled executable and starts the application there.
- The nop instruction does nothing. It is a debug build artifact that gives the debugger a place to put a breakpoint on lines such as the curly braces.
- The ldstr instruction loads a string onto the stack. In this case, it's the "Hello World!" string value.
- Next, the call opcode invokes System.Console::WriteLine(string), which takes the string off the stack as its argument. (In the .ctor method, the same opcode calls the base class constructor, System.Object::.ctor().)
- Finally, the ret opcode exits the method and returns a value to the caller (if any).
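For comparison, here is a sketch of C# source that would compile to IL like the listing above. It is reconstructed from the disassembly, so it may not match the file from the previous post character for character. The comments map each C# construct to the IL it corresponds to.

```csharp
using System;

// No access modifier on a top-level class means internal,
// which ILDASM shows as ".class private".
class HelloWorld
{
    // No access modifier on a member means private: ".method private ... static void Main".
    // The compiler adds ".entrypoint" because this is the program's Main method.
    static void Main(string[] args)
    {                                       // nop (debug builds only)
        Console.WriteLine("Hello World!");  // ldstr "Hello World!" then call WriteLine(string)
    }                                       // ret

    // No constructor is written here. The compiler generates the public
    // parameterless .ctor that calls System.Object::.ctor().
}
```

Compiling this in release mode and disassembling it again is a quick way to see the nop instructions disappear.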
Does this feel too deep in the weeds for you? Don't worry! We won't be learning how to code in assembly language, I promise. 😛 The intent is to get a high-level understanding of how everything is wired together under the hood. In the next few posts in this series, we will focus on learning about the ASP.NET Core framework. Stay tuned!