muneeb-devp


Introduction to CLR | Part I

CLR 101

CLR (Common Language Runtime) is Microsoft's counterpart to the JVM (Java Virtual Machine). It is the runtime engine at the core of the .NET Framework, and it does most of the heavy lifting for every kind of software you write in the .NET world. The CLR is what actually executes your program: it translates your code into processor-level instructions and provides a variety of features (listed below) that help you write more robust and secure software.

Here's a list (not comprehensive by any means) of features that CLR provides to programmers:

  • Dynamic Memory Management
  • JIT (Just In Time) Compilation of IL code
  • Garbage Collection
  • Type Safety
  • Thread Management
  • Cross-language interoperability
  • Exception Handling
  • Reflection
  • Enforces CTS (Common Type System)
  • Code Execution
  • Security
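
A few of these features are easy to see from ordinary C# code. The following is only a minimal sketch: the `ClrFeaturesSketch` namespace and the `FeatureDemo` class are made-up names for illustration, and the explicit `GC.Collect()` call is there purely to make the garbage collector visible, not because real code needs it.

```csharp
using System;

namespace ClrFeaturesSketch
{
    public static class FeatureDemo
    {
        // Type safety + exception handling: the CLR checks the unboxing cast at
        // run time and raises InvalidCastException instead of reinterpreting memory.
        public static bool CastIsRejected()
        {
            object boxed = "not a number";
            try
            {
                int n = (int)boxed;
                Console.WriteLine(n); // never reached
                return false;
            }
            catch (InvalidCastException)
            {
                return true;
            }
        }

        // Reflection: every object can be asked for its exact runtime type.
        public static string RuntimeTypeName(object value)
        {
            return value.GetType().FullName;
        }

        public static void Main()
        {
            Console.WriteLine(CastIsRejected());    // True
            Console.WriteLine(RuntimeTypeName(42)); // System.Int32

            // Memory management + garbage collection: allocate freely, never free.
            for (int i = 0; i < 1000; i++)
            {
                byte[] scratch = new byte[1024];
                scratch[0] = 1;
            }
            GC.Collect(); // normally unnecessary; shown only to make the GC visible
        }
    }
}
```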

First released in 2002 alongside the C# programming language (public betas had been available since 2000), the .NET Framework has evolved into a robust framework for developing performant, enterprise-level console, web and mobile applications. The Windows operating system itself is written largely in native code, but it ships with the .NET Framework, and many Windows tools and applications lean on the CLR features listed above.


 

Compilation

Traditional compilers were built to target a specific processor architecture, translating source code directly into machine code. This changed with the advent of Java and the JVM, which instead compiled source code into bytecode, an intermediate language that the JVM could interpret and execute on any platform, making the code truly portable. Microsoft took inspiration from this approach and built its own version, shipping .NET Framework 1.0 in 2002. Initially the .NET Framework was little more than an abstraction layer over the Win32 API and COM, but it slowly evolved into a much bigger ecosystem for software development. The Common Language Runtime is language-agnostic, which means that when it executes IL instructions, it does not know what programming language they were originally written in. Puzzled? How is that even possible?

The answer is that every compiler targeting the CLR emits the same thing: IL code plus metadata, in the format laid down by the CLI (Common Language Infrastructure), a specification developed by Microsoft and standardized by ECMA. One part of that standard, the CLS (Common Language Specification), defines a common set of features and rules that a compiler must adhere to if the assemblies it generates are to be usable from every other .NET language. Any compiler that follows these rules produces assemblies that can be executed by the CLR and consumed by code written in other languages that have a CLS-compliant compiler. At the time of writing, compilers that produce IL code exist for a dozen or so languages, including Python, PHP, Prolog, Smalltalk, LISP, COBOL, Haskell, Lua, Ada, VB.NET, C# and F#, to name a few.
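
In C#, you can ask the compiler to check your public API against the CLS with the `CLSCompliant` attribute from the `System` namespace. The sketch below uses invented names (`ClsSketch`, `Counter`) and only illustrates the idea: `int` is part of the CLS, while `uint` is not, so the unsigned overload is explicitly opted out.

```csharp
using System;

// Ask the compiler to check this assembly's public surface against the CLS.
[assembly: CLSCompliant(true)]

namespace ClsSketch
{
    public static class Counter
    {
        // CLS-compliant: int (System.Int32) is a type every CLS language understands.
        public static int Add(int a, int b)
        {
            return a + b;
        }

        // uint is not part of the CLS, so this member is explicitly opted out.
        // Without the attribute, the C# compiler reports a CLS-compliance warning.
        [CLSCompliant(false)]
        public static uint AddUnsigned(uint a, uint b)
        {
            return a + b;
        }
    }

    public static class Program
    {
        public static void Main()
        {
            Console.WriteLine(Counter.Add(2, 3));           // 5
            Console.WriteLine(Counter.AddUnsigned(2u, 3u)); // 5
        }
    }
}
```

A library that wants to be consumed from other .NET languages would normally expose only the compliant members and keep the rest internal or opted out, as above.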

 

Compilation process

 

 

Now the next question that comes to mind is: what even is a managed module? Simply put, a managed module is a standard Windows PE32 (Portable Executable, 32-bit) or PE32+ (Portable Executable, 64-bit) file that needs the CLR in order to run. Here's a brief (not at all comprehensive, by design) description of the parts of such a file:

  • PE32/PE32+ Header
    • Describes the type of file, i.e. whether this is a console app, a GUI app or a DLL.
  • CLR header
    • Contains the information the CLR needs to interpret the module:
      • the CLR version number
      • metadata for the entry point of the managed module (the Main method)
      • the location and size of the module's metadata
      • resources
  • Metadata
    • Contains two main kinds of metadata tables:
      • tables that describe the types and members defined in the module (classes, enums, structs, delegates, etc.)
      • tables that describe the types and members (methods) referenced by the code
  • IL Code
    • The code produced by the compiler.
    • IL code is CPU-agnostic; you can think of it as an object-oriented version of machine code.
    • This is what the CLR translates into machine code and then executes.
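
The parts listed above can be inspected from code. The sketch below relies on the `PEReader` class from the `System.Reflection.PortableExecutable` namespace (part of the System.Reflection.Metadata library; on .NET Framework you may need to add it as a package). The `PeSketch` namespace and `PeInspector` class are hypothetical names, and the exact values printed depend on the file you point it at.

```csharp
using System;
using System.IO;
using System.Reflection.PortableExecutable;

namespace PeSketch
{
    public static class PeInspector
    {
        // Returns a one-line summary of the PE header and CLR header of a file.
        public static string Describe(string path)
        {
            using (FileStream stream = File.OpenRead(path))
            using (PEReader reader = new PEReader(stream))
            {
                PEHeaders headers = reader.PEHeaders;
                CorHeader clr = headers.CorHeader; // null when the file has no CLR header

                if (clr == null)
                {
                    return "not a managed module";
                }

                return string.Format(
                    "format={0}, dll={1}, console={2}, runtime={3}.{4}, " +
                    "entryPointToken=0x{5:X8}, metadataSize={6}",
                    headers.PEHeader.Magic,                      // PE32 or PE32Plus
                    headers.IsDll,
                    headers.IsConsoleApplication,
                    clr.MajorRuntimeVersion,
                    clr.MinorRuntimeVersion,
                    clr.EntryPointTokenOrRelativeVirtualAddress, // 0 when there is no entry point
                    clr.MetadataDirectory.Size);
            }
        }

        public static void Main()
        {
            // Inspect the assembly this code was compiled into.
            // Assumption: the assembly was loaded from a file, so Location is not empty.
            Console.WriteLine(Describe(typeof(PeInspector).Assembly.Location));
        }
    }
}
```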

Every compiler that targets the CLR is required to emit the metadata into the same file as the IL code, so the two can never get out of sync. The metadata for a managed module includes the defined types and methods, imports and dependencies on other modules, and even the parameters each method expects. This is what allows Visual Studio's IntelliSense feature to give you exact information for each type, method, property, event, etc. Metadata also plays a crucial role in the code-verification process (a subject for a future article), serialization of objects, and garbage collection.
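
Reflection reads this same metadata at run time. As a rough sketch (the `Greeter` class and the `MetadataDump` helper are invented for this example), the following lists the methods a type declares together with their parameter types and names, which is essentially the information IntelliSense shows you:

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

namespace MetadataSketch
{
    // A sample type whose metadata we will read back.
    public class Greeter
    {
        public string Greet(string name, int times)
        {
            return "Hello " + name + " x" + times;
        }
    }

    public static class MetadataDump
    {
        // Builds one "ReturnType Name(ParamType paramName, ...)" line for each
        // public instance method declared directly on the given type.
        public static List<string> DescribeMethods(Type type)
        {
            var lines = new List<string>();
            BindingFlags flags =
                BindingFlags.Public | BindingFlags.Instance | BindingFlags.DeclaredOnly;

            foreach (MethodInfo method in type.GetMethods(flags))
            {
                var parameters = new List<string>();
                foreach (ParameterInfo p in method.GetParameters())
                {
                    parameters.Add(p.ParameterType.Name + " " + p.Name);
                }
                lines.Add(method.ReturnType.Name + " " + method.Name +
                          "(" + string.Join(", ", parameters) + ")");
            }
            return lines;
        }

        public static void Main()
        {
            foreach (string line in DescribeMethods(typeof(Greeter)))
            {
                Console.WriteLine(line); // String Greet(String name, Int32 times)
            }
        }
    }
}
```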

 

But wait: compilers produce managed modules, yet in the .NET world you'll almost always hear about assemblies (.exe/.dll files) as the output of a build. So how do managed modules and assemblies differ?

Simply put, Assemblies are a logical grouping of one or more managed modules and any associated resource files. 

Managed modules in assemblies

The CLR works with assemblies, not with managed modules directly, so compilers by default automate the job of turning the managed modules they produce into an assembly. The important thing to note about an assembly is the manifest. The assembly manifest is just another set of data tables that describes the files that make up the assembly (managed modules and any resource or data files), along with its publicly exported types. Assemblies are stand-alone components: each has a name, a version number, and dependency information (i.e. the assemblies it references).
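
Most of what the manifest records can be queried through the `System.Reflection.Assembly` class. The snippet below is an illustrative sketch only; `ManifestSketch` and `ManifestDump` are made-up names, and the output depends on the assembly you inspect.

```csharp
using System;
using System.Reflection;

namespace ManifestSketch
{
    public static class ManifestDump
    {
        // Returns the assembly's identity as "Name, Version=x.y.z.w".
        public static string Identity(Assembly assembly)
        {
            AssemblyName name = assembly.GetName();
            return name.Name + ", Version=" + name.Version;
        }

        public static void Print(Assembly assembly)
        {
            Console.WriteLine("Identity: " + Identity(assembly));

            // Managed modules that make up the assembly (usually just one).
            foreach (Module module in assembly.GetModules())
            {
                Console.WriteLine("Module: " + module.Name);
            }

            // Dependency information: the assemblies this one references.
            foreach (AssemblyName reference in assembly.GetReferencedAssemblies())
            {
                Console.WriteLine("References: " + reference.Name + " " + reference.Version);
            }

            // Publicly exported types.
            foreach (Type type in assembly.GetExportedTypes())
            {
                Console.WriteLine("Exports: " + type.FullName);
            }

            // Embedded resources, if any.
            foreach (string resource in assembly.GetManifestResourceNames())
            {
                Console.WriteLine("Resource: " + resource);
            }
        }

        public static void Main()
        {
            Print(typeof(ManifestDump).Assembly);
        }
    }
}
```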

 

Now let's put all this together with a simple program that illustrates how the CLR executes the code you write:

using System;

namespace Greeting
{
    class Hello {         
        static void Main()
        {
            Console.WriteLine("Hello");
            Console.WriteLine("Goodbye");
        }
    }
}

 

Just before Main runs, the CLR checks all of the types referenced by the Main method's code; in our example it refers to only a single type, System.Console. An internal data structure in the CLR holds, for each method of that type, the address where the method's code can be found. When initializing this data structure, the CLR points each entry at an internal function, which for the sake of argument we shall call JITCompiler here. When a method is called for the first time, JITCompiler compiles the method's IL code to native machine code, updates the entry so that it points at the native code, and then jumps to it. That is why the example calls Console.WriteLine twice: the second call finds the entry already patched and runs the native code directly, with no compilation step. Here's a visual illustration of the process:
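
Alongside the illustration, the stub-patching idea can also be sketched in code. This is a toy model and not how the CLR is actually implemented: `MethodTable`, `Register` and `Call` are hypothetical names, a dictionary stands in for the CLR's internal data structure, and "compiling" is simulated by a delegate that returns another delegate.

```csharp
using System;
using System.Collections.Generic;

namespace JitSketch
{
    public class MethodTable
    {
        // Maps a method name to the code that runs when the method is called.
        private readonly Dictionary<string, Action> entries =
            new Dictionary<string, Action>();

        // How many times the simulated JIT compiler has run.
        public int CompileCount { get; private set; }

        // Registers a method. Its entry initially points at a stub that plays
        // the role of JITCompiler.
        public void Register(string name, Func<Action> compile)
        {
            entries[name] = () =>
            {
                Action native = compile(); // "compile" IL to native code (simulated)
                CompileCount++;
                entries[name] = native;    // patch the entry to skip the stub next time
                native();                  // run the freshly compiled code
            };
        }

        public void Call(string name)
        {
            entries[name]();
        }
    }

    public static class Program
    {
        public static void Main()
        {
            var table = new MethodTable();
            table.Register("Console.WriteLine", () =>
            {
                Console.WriteLine("[JIT] compiling Console.WriteLine");
                return () => Console.WriteLine("running native code");
            });

            table.Call("Console.WriteLine"); // first call: compiles, then runs
            table.Call("Console.WriteLine"); // second call: runs directly
            Console.WriteLine("compilations: " + table.CompileCount); // compilations: 1
        }
    }
}
```

The point of the design is that the cost of compilation is paid once per method, on its first call; every later call goes straight to the native code.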

 

 

That's about it for this article. We'll go deeper into the CLR's internal workings in future articles.
