Cristian Sifuentes

Basic Concepts of C# Data Types — From Bits to LLM‑Ready Mental Models

Most developers can list C# data types from memory:

int, double, bool, char, string, decimal, enum, struct, class...

But if you ask deeper questions, the room gets quiet:

  • How do these types actually map to IL stack types (I4, I8, R4, R8, OBJ)?
  • When does a value live in a register, when does it live on the stack, and when on the heap?
  • Why does 0.1 + 0.2 misbehave with double but not with decimal?
  • What does the JIT do differently for List<int> vs List<double> vs List<object>?

And more important for this era:

How do I talk about data types with LLMs so they can help me at a systems level, not just tutorial level?

In this post we’ll walk through a DataTypesDeepDive.cs file that treats C# data types like a compiler engineer would — and we’ll connect that to how you can ask better questions to LLMs and expand your understanding.

If you can compile a C# console app, you can follow along.


Table of Contents

  1. Mental Model: How Any Data Type Travels Through the Stack
  2. The Demo File: DataTypesDeepDive.cs
  3. Basic Types in C#: What They Really Mean to the CPU
  4. Integer Types: Bits, Two’s Complement, and Registers
  5. Floating Point: IEEE‑754, Precision, and Why 0.1 + 0.2 ≠ 0.3
  6. Boolean: Just a Byte on Top of CPU Flags
  7. char and string: UTF‑16, Interning, and Allocations
  8. Struct Layout & Padding: Why Field Order Can Matter
  9. Enums: Type‑Safe Names over Raw Integers
  10. Generics & Reification: Different JIT Code per Type
  11. Using This Mental Model to Get More from LLMs
  12. Data‑Type Mastery Checklist (Top‑1% Developer Mindset)

1. Mental Model: How Any Data Type Travels Through the Stack

At a high level, every data type in C# goes through the same pipeline:

// File: DataTypesDeepDive.cs
// Author: Cristian Sifuentes + ChatGPT
// Goal: Explain C# data types like a systems / compiler / performance engineer.
//
// High-level mental model (how ANY data type travels through the stack):
//  1. The C# compiler (Roslyn) translates your code into IL (Intermediate Language).
//  2. The JIT compiler (at runtime) translates that IL into machine code for your CPU.
//  3. The CLR runtime + JIT decide how each data type is represented:
//       - Which IL "stack type" it uses (I4, I8, R8, OBJ, etc.).
//       - Whether it lives in a register, stack slot, or on the managed heap.
//  4. The CPU only sees bits: fixed-width integer registers, floating-point registers,
//     and bytes in memory. “int”, “double”, “string” are abstractions on top of this.

Key idea: int, double, string, enum, struct are names for patterns of bits and access rules.

The CPU doesn’t see types — it sees instructions and data widths.

If you want LLMs to act like real systems experts, your questions should reference this pipeline: Roslyn → IL → JIT → CLR → CPU.


2. The Demo File: DataTypesDeepDive.cs

Here’s the “front door” of our demo method:

partial class Program
{
    static void DataTypesDeepDive()
    {
        var integer = 42;
        double decimalNumber = 3.1416;
        bool isTrue = true;
        char character = 'C';
        string text = "Hi C#";
        Console.WriteLine($"Int: {integer}, Decimal: {decimalNumber}, Boolean: {isTrue}, Char: {character}, Text: {text}");

        BasicDataTypesIntro();
        IntegerBitLevel();
        FloatingPointInternals();
        BooleanSemantics();
        CharAndStringInternals();
        StructLayoutAndPadding();
        EnumUnderlyingTypes();
        GenericSpecializationDemo();
    }
}

This single method calls a set of focused “labs” — each one exploring how a specific group of types behaves internally.

We’ll use this file as the reference artifact you can commit to your GitHub repo and send to LLMs when asking questions.


3. Basic Types in C#: What They Really Mean to the CPU

The first lab, BasicDataTypesIntro, revisits the classic example — but from the IL and CPU point of view:

static void BasicDataTypesIntro()
{
    var integer = 42;               // System.Int32
    double decimalNumber = 3.1416;  // System.Double
    bool isTrue = true;             // System.Boolean
    char character = 'C';           // System.Char
    string text = "Hi C#";          // System.String

    Console.WriteLine(
        $"[BasicDataTypesIntro] Integer: {integer}, Decimal: {decimalNumber}, " +
        $"Boolean: {isTrue}, Char: {character}, Text: {text}");
}

Conceptually, IL will have locals:

.locals init (
    [0] int32   integer,
    [1] float64 decimalNumber,
    [2] bool    isTrue,
    [3] char    character,
    [4] string  text
)

And to the CPU:

  • int32 → general‑purpose registers (e.g., EAX/RAX).
  • float64 (double) → floating‑point/XMM registers.
  • bool → just 0 or 1 (often extended to 32 bits in registers).
  • char → a 16‑bit integer (UTF‑16 code unit).
  • string → a pointer to a heap object with layout roughly: [object header][method table pointer][int32 Length][UTF‑16 chars...].

Takeaway: The same high-level syntax maps to very different low-level representations and instruction sets.
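A quick way to see those widths from C# itself is the sizeof operator, which is allowed in safe code for the predefined types; the string variable itself is just a reference, so its size is the pointer size, which IntPtr.Size reports (this snippet is a sketch — the register-widening comments describe typical JIT behavior, not a guarantee):

```csharp
using System;

Console.WriteLine(sizeof(int));    // 4 bytes
Console.WriteLine(sizeof(double)); // 8 bytes
Console.WriteLine(sizeof(bool));   // 1 byte as a field (often widened in registers)
Console.WriteLine(sizeof(char));   // 2 bytes: one UTF-16 code unit
Console.WriteLine(IntPtr.Size);    // size of a reference: 8 on a 64-bit process
```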


4. Integer Types: Bits, Two’s Complement, and Registers

IntegerBitLevel() goes deeper into signed/unsigned integers and two’s complement:

static void IntegerBitLevel()
{
    sbyte  s8  = -1;
    byte   u8  = 255;
    short  s16 = -12345;
    ushort u16 = 65535;
    int    s32 = -123456789;
    uint   u32 = 4000000000;
    long   s64 = -1234567890123456789L;
    ulong  u64 = 18446744073709551615UL;

    Console.WriteLine($"[IntegerBitLevel] int: {s32}, uint: {u32}");
    Console.WriteLine($"sbyte -1 raw bits: {Convert.ToString(s8, 2).PadLeft(8, '0')}");
}

Two’s complement refresher

For an N‑bit signed value, the raw bits represent:

value = raw_bits if the sign bit is 0

value = -(2^N - raw_bits) if the sign bit is 1

So sbyte s8 = -1 has bits:

1111 1111  (0xFF)
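You can verify the formula directly for N = 8: reinterpreting the bits of -1 as an unsigned byte yields 255 (0xFF), and plugging that back into -(2^N - raw_bits) recovers -1.

```csharp
using System;

sbyte s8 = -1;
byte raw = unchecked((byte)s8);   // reinterpret the same 8 bits as unsigned
Console.WriteLine(raw);           // 255 (0xFF)
Console.WriteLine(-(256 - raw));  // -1, i.e. -(2^8 - raw_bits)
```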

Performance note

  • On 32/64‑bit CPUs, int (System.Int32) is the “natural” size for arithmetic.
  • The JIT often extends byte/short to 32 bits in registers anyway.
  • Smaller types help mainly with memory footprint & bandwidth (arrays, serialization, network packets).

💬 LLM prompt idea

“Given this IntegerBitLevel() method, explain how the JIT extends smaller integer types to 32 bits in registers and why int is usually the most efficient integer type for computation.”


5. Floating Point: IEEE‑754, Precision, and Why 0.1 + 0.2 ≠ 0.3

FloatingPointInternals() tackles the classic trap:

static void FloatingPointInternals()
{
    double a = 0.1;
    double b = 0.2;
    double c = a + b;

    Console.WriteLine($"[FloatingPointInternals] 0.1 + 0.2 = {c:R}");
}

Why is the result slightly off? Because double is IEEE‑754 binary64:

  • 1 bit sign
  • 11 bits exponent (biased)
  • 52 bits fraction (mantissa)

Values like 0.1 and 0.2 are not exactly representable in base‑2, so the nearest representable numbers are stored, and their sum reflects that rounding error.

The code also inspects the raw bits:

long bits = BitConverter.DoubleToInt64Bits(c);
Console.WriteLine($"Bits of (0.1+0.2): 0x{bits:X16}");

Then it compares with decimal:

decimal d1 = 0.1m;
decimal d2 = 0.2m;
decimal d3 = d1 + d2;
Console.WriteLine($"decimal 0.1m + 0.2m = {d3}");

Tradeoff

  • double: hardware‑accelerated, very fast, but binary fractions.
  • decimal: software‑implemented, slower, but base‑10 friendly and ideal for money.
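The representational difference is visible in two lines:

```csharp
using System;

Console.WriteLine(0.1 + 0.2 == 0.3);    // False: binary64 rounding error survives the sum
Console.WriteLine(0.1m + 0.2m == 0.3m); // True: decimal stores base-10 digits exactly
```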

💬 LLM prompt idea

“Using the FloatingPointInternals() example, explain how IEEE‑754 binary64 encodes 0.1 and 0.2, and how decimal differs in representation and performance. Include IL and CPU perspectives.”


6. Boolean: Just a Byte on Top of CPU Flags

BooleanSemantics() reminds us that bool is tiny on the surface but powerful in control flow:

static void BooleanSemantics()
{
    bool flag = true;
    Console.WriteLine($"[BooleanSemantics] flag = {flag}");

    int x = 5, y = 10;
    bool less = x < y;
    Console.WriteLine($"x < y = {less}");
}
  • In IL, bool is defined as 1 byte.
  • In registers, it’s just a 0 or non‑zero integer.
  • Comparisons like x < y use a CPU instruction (e.g., cmp) that sets flags, which then drive conditional jumps.

Lesson: Boolean is a logical view on top of integer bits + CPU status flags.
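A tiny sketch of that view: the same bool reads back as the integer 1, and occupies a single raw byte.

```csharp
using System;

bool flag = true;
Console.WriteLine(Convert.ToInt32(flag));          // 1
Console.WriteLine(sizeof(bool));                   // 1 byte
Console.WriteLine(BitConverter.GetBytes(flag)[0]); // 1: the raw byte behind the bool
```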


7. char and string: UTF‑16, Interning, and Allocations

CharAndStringInternals() explores Unicode and string layout:

static void CharAndStringInternals()
{
    char ch = 'C';
    Console.WriteLine($"char: {ch}, code unit: {(int)ch}");

    string s1 = "Hi C#";
    string s2 = "Hi C#";

    Console.WriteLine($"ReferenceEquals(s1, s2): {object.ReferenceEquals(s1, s2)}");
}

Key facts:

  • System.Char is a UTF‑16 code unit (16‑bit).
  • Strings are immutable, heap‑allocated, and can be interned.
    • Identical string literals often share the same instance in the intern pool.
  • Layout is roughly: [object header][method table][int Length][chars...].

The demo also encodes to UTF‑8:

byte[] utf8 = Encoding.UTF8.GetBytes(s1);

Performance note: repeated string concatenation (+ in loops) produces many allocations. Prefer StringBuilder or span‑based APIs for hot paths.
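Interning itself is easy to probe: concatenating two literals is folded at compile time into one interned constant, while runtime concatenation produces a fresh heap object — unless you intern it yourself.

```csharp
using System;

string a = "Hi C#";
string b = "Hi" + " C#";               // folded at compile time → same interned literal
string c = string.Concat("Hi", " C#"); // built at runtime → fresh heap object

Console.WriteLine(ReferenceEquals(a, b));                // True
Console.WriteLine(ReferenceEquals(a, c));                // False
Console.WriteLine(ReferenceEquals(a, string.Intern(c))); // True: maps to the pooled instance
```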

💬 LLM prompt idea

“Given CharAndStringInternals(), explain the in‑memory layout of string in .NET, how interning works, and when that matters for performance and memory.”


8. Struct Layout & Padding: Why Field Order Can Matter

With [StructLayout(LayoutKind.Sequential)] you can see how padding works:

[StructLayout(LayoutKind.Sequential)]
struct PackedExample1
{
    public bool Flag;
    public double Value;
}

[StructLayout(LayoutKind.Sequential)]
struct PackedExample2
{
    public double Value;
    public bool Flag;
}

StructLayoutAndPadding() checks their sizes:

int size1 = Marshal.SizeOf<PackedExample1>();
int size2 = Marshal.SizeOf<PackedExample2>();

Console.WriteLine($"Size1 (bool,double) = {size1} bytes");
Console.WriteLine($"Size2 (double,bool) = {size2} bytes");

Due to alignment rules:

  • A double prefers an 8‑byte boundary.
  • The runtime may add padding bytes after bool or at the end of the struct.

For arrays of these structs in hot loops, layout can affect:

  • How many elements fit in a cache line.
  • How many cache misses and TLB misses you incur.

This is a micro‑optimization, but in data‑heavy systems it can matter.
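To make the order effect unambiguous you need a third field — with only two fields, both orders typically round up to the same 16 bytes. The sketch below uses Marshal.SizeOf, which reports the marshaled size (where bool marshals as 4 bytes); the exact numbers assume the default packing on a typical 64-bit runtime.

```csharp
using System;
using System.Runtime.InteropServices;

Console.WriteLine(Marshal.SizeOf<Wasteful>()); // 24: 4 + 4 pad + 8 + 4 + 4 pad
Console.WriteLine(Marshal.SizeOf<Packed>());   // 16: 8 + 4 + 4, no padding needed

[StructLayout(LayoutKind.Sequential)]
struct Wasteful { public bool A; public double V; public bool B; }

[StructLayout(LayoutKind.Sequential)]
struct Packed { public double V; public bool A; public bool B; }
```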


9. Enums: Type‑Safe Names over Raw Integers

Enums give you names but compile down to integers:

enum Status : byte
{
    None = 0,
    Started = 1,
    Completed = 2,
    Failed = 3
}

EnumUnderlyingTypes():

Status st = Status.Completed;
Console.WriteLine($"Status = {st}, raw = {(byte)st}");

IL‑wise, an enum is essentially a struct with a single instance field, conventionally named value__, of the underlying integral type. At runtime:

  • Comparisons on enums are as cheap as integer comparisons.
  • Enums can use different underlying sizes to pack data tightly (e.g., byte for small state machines).
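Two properties worth knowing, sketched below: the underlying type is queryable at runtime, and casting an integer into an enum is not validated against the named values.

```csharp
using System;

Console.WriteLine(Enum.GetUnderlyingType(typeof(Status))); // System.Byte
Console.WriteLine((Status)2);   // Completed
Console.WriteLine((Status)99);  // 99 — no named value, and no exception either

enum Status : byte { None = 0, Started = 1, Completed = 2, Failed = 3 }
```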

💬 LLM prompt idea

“Using the Status : byte enum, explain the IL shape of an enum type and how the underlying integral type affects memory layout and interop.”


10. Generics & Reification: Different JIT Code per Type

Finally, GenericSpecializationDemo() shows how .NET generics are reified:

class SimpleList<T> where T : struct
{
    private T[] _items = new T[4];
    private int _count;

    public void Add(T item)
    {
        if (_count == _items.Length)
            Array.Resize(ref _items, _items.Length * 2); // grow when full
        _items[_count++] = item;
    }

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public T Sum()
    {
        dynamic sum = default(T);
        for (int i = 0; i < _count; i++)
        {
            sum += (dynamic)_items[i];
        }
        return (T)sum;
    }
}

And in the demo:

var listInt = new SimpleList<int>();
listInt.Add(10);
listInt.Add(20);

var listDouble = new SimpleList<double>();
listDouble.Add(3.14);
listDouble.Add(2.71);

The JIT will generate:

  • Specialized machine code for SimpleList<int> using integer registers.
  • Different specialized code for SimpleList<double> using floating‑point registers.

This is a big reason why List<int> and List<double> are so efficient compared to “boxed” approaches.

⚠ The dynamic keyword in Sum() is only for demo purposes; in real numeric code you’d want a different pattern to avoid dynamic overhead.
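One such pattern — a sketch assuming .NET 7+, where generic math is available — is to constrain T to System.Numerics.INumber<T>. The JIT still emits specialized add instructions per value type, and there is no dynamic dispatch at all:

```csharp
using System;
using System.Numerics;

Console.WriteLine(Sum(new[] { 10, 20 }));    // 30   (integer add path)
Console.WriteLine(Sum(new[] { 0.5, 0.25 })); // 0.75 (floating-point add path)

static T Sum<T>(T[] items) where T : INumber<T>
{
    T sum = T.Zero;      // additive identity from the interface
    foreach (var item in items)
        sum += item;     // static abstract operator+, specialized per T by the JIT
    return sum;
}
```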


11. Using This Mental Model to Get More from LLMs

Now the fun part: how do you use all this with LLMs like ChatGPT or Claude?

11.1 Ask Questions Across Layers

Instead of:

“Explain double in C#.”

Ask:

“Using this FloatingPointInternals() method from my DataTypesDeepDive.cs, explain how Roslyn, IL, the JIT, and the CPU each see the value 0.1, and why 0.1 + 0.2 is not exactly 0.3.”

You’re forcing the model to traverse all abstraction layers.

11.2 Bring Your Code as Context

Paste parts of your file and say:

“Given this struct + StructLayout.Sequential, draw the field offsets and padding in memory on x64, and reason about cache behavior for an array of a million of these structs.”

Now you’re not getting a generic answer — you’re getting feedback on your codebase.

11.3 Drill into IL and Assembly

Examples:

  • “Show the IL for GenericSpecializationDemo() and explain how SimpleList<int> and SimpleList<double> produce different machine code.”
  • “Show how the JIT might translate x < y into CPU flags and conditional branches.”
  • “Given this enum, show how it’s represented in memory when used inside a struct.”

11.4 Turn Questions into Experiments

Ask the model for microbenchmark designs:

“Design BenchmarkDotNet tests to measure the difference between double and decimal for simple additions at scale, and predict the outcome before running.”

You’re using the model as both a teacher and a research collaborator.


12. Data‑Type Mastery Checklist (Top‑1% Developer Mindset)

Use this list to track your progress — and as a prompt basis when you want deep dives from LLMs.

  • [ ] I can explain how C# data types map to IL stack types (I4, I8, R4, R8, O).
  • [ ] I understand how value types vs reference types impact stack, heap, and registers.
  • [ ] I can reason about integer types, two’s complement, and little‑endian byte order.
  • [ ] I know how IEEE‑754 floating point works and when to use decimal instead of double.
  • [ ] I can explain how bool is implemented on top of CPU flags and integer registers.
  • [ ] I know the internal representation of char and string, and how interning affects memory.
  • [ ] I can reason about struct layout, padding, and how field order might affect cache usage.
  • [ ] I understand enums as named integer values with well‑defined underlying types.
  • [ ] I know that .NET generics are reified and that value types get specialized JIT code.
  • [ ] I can ask LLMs questions that explicitly mention Roslyn, IL, JIT, CLR, stack, heap, and CPU behavior.

Once you think like this, you stop seeing int and string as “magic C# types” and start seeing them as bit patterns, layouts, and contracts that travel through a pipeline of compiler and runtime stages.

That’s when LLMs stop being copy‑paste generators and start becoming partners in systems‑level reasoning.

Happy hacking — and may your bits, bytes, and types always line up exactly how you expect. ⚡
