DEV Community

Dhruv

Beyond the Bootloader: The 32-bit to 64-bit Leap in Rust OS-Dev

When GRUB hands control to your kernel entry point via the Multiboot2 protocol, your CPU is in 32-bit Protected Mode. Interrupts are likely off. The A20 line is enabled. Segment registers have been set up with flat 4 GiB descriptors. You're running, but you're still in the 1990s.

To write a real Rust kernel — one that can use 64-bit pointers, access more than 4 GiB of RAM, and benefit from the x86_64 calling convention — we need to leave that world behind and enter Long Mode.

This transition is not handled by a library. There's no std. There's no runtime. There's nothing between you and the silicon except a handful of assembly instructions and a very specific sequence of CPU configuration steps. Get the sequence wrong, and you'll triple-fault into a reboot. Get it right, and you'll be calling Rust code from the most fundamental level of your system.

Let's walk through every step.


The Architecture of Long Mode Entry

Before writing a single line, let's understand what the CPU actually requires before it will enter 64-bit Long Mode. According to the Intel SDM (Vol. 3A, Section 9.8.5), the processor checks all of these in order:

  1. Long Mode is supported — confirmed via CPUID
  2. PAE is enabled — Physical Address Extension, bit 5 of CR4
  3. CR3 points to a valid PML4 (P4) table — your page tables
  4. LME is set in EFER — the Long Mode Enable bit
  5. Paging is enabled — bit 31 of CR0
  6. A 64-bit GDT is loaded — and a far jump reloads CS

Each of these is a hard requirement. Miss one and the CPU either ignores your attempt silently or triple-faults. We'll implement all six, in order.

1. The Pre-Flight Check: CPUID

CPUID is the CPU's self-identification instruction. Before enabling anything, we need to confirm two things:

  1. The CPUID instruction itself is supported (not guaranteed on very old CPUs)
  2. Long Mode (the LM bit) is available

Checking for CPUID Support

The CPUID instruction is available if and only if bit 21 of EFLAGS can be toggled. If the bit is read-only, the CPU predates CPUID. Here's the check:

; ── Check if CPUID is supported ──────────────────────────────────────────────
; We attempt to flip bit 21 (the ID flag) in EFLAGS.
; If it stays flipped, CPUID exists. If not, we're too old.
check_cpuid:
    pushfd                      ; Push EFLAGS onto the stack
    pop  eax                    ; Pop them into EAX
    mov  ecx, eax               ; Save original in ECX for comparison
    xor  eax, 1 << 21           ; Flip bit 21 (the "ID" flag)
    push eax                    ; Push the modified value
    popfd                       ; Load it back into EFLAGS
    pushfd                      ; Push EFLAGS again to read the result
    pop  eax                    ; Load result into EAX
    push ecx                    ; Restore original EFLAGS
    popfd
    cmp  eax, ecx               ; Did anything change?
    je   .no_cpuid              ; If identical: bit was read-only → no CPUID
    ret
.no_cpuid:
    mov  al, "1"                ; Error code "1"
    jmp  error                  ; `error` prints the code and halts (routine not shown here)

Checking for Long Mode Support

With CPUID confirmed, we query the "Extended Processor Info" leaf (0x80000001). Bit 29 of EDX in the response is the LM bit — "Long Mode Supported."

; ── Check for Long Mode via Extended CPUID ───────────────────────────────────
check_long_mode:
    mov  eax, 0x80000000        ; Query highest supported extended function
    cpuid
    cmp  eax, 0x80000001        ; Does the CPU support extended info?
    jb   .no_long_mode          ; If not, Long Mode definitely isn't available

    mov  eax, 0x80000001        ; Extended processor info and feature bits
    cpuid
    test edx, 1 << 29           ; Check the LM (Long Mode) bit
    jz   .no_long_mode          ; Not set? We can't continue.
    ret
.no_long_mode:
    mov  al, "2"
    jmp  error

Why not just skip this check? If you try to enter Long Mode on a CPU that doesn't support it, the behavior is undefined — usually a triple fault. Worse, you might be running in a VM with unusual CPU configuration. Always check.
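Once you're booted, the same LM-bit test is easy to mirror from Rust. Here's a minimal sketch (the helper name `has_long_mode` is ours, not from the assembly above) that decodes the EDX value returned by CPUID leaf 0x80000001:

```rust
/// Bit 29 of EDX from CPUID leaf 0x80000001 is the "Long Mode" flag.
/// `edx` is whatever the `cpuid` instruction returned for that leaf.
fn has_long_mode(edx: u32) -> bool {
    edx & (1 << 29) != 0
}

fn main() {
    // Example EDX values: bit 29 set vs. clear.
    assert!(has_long_mode(1 << 29));
    assert!(!has_long_mode(0));
}
```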


2. Setting Up Page Tables

This is the most involved part of the bootstrap. Long Mode requires paging to be active. You cannot enter it without valid page tables loaded into CR3.

We're going to set up a minimal 4-level paging hierarchy (P4 → P3 → P2 → P1), also called PML4 → PDPT → PD → PT in Intel documentation. For the initial bootstrap, we'll identity-map the first 1 GiB of physical memory using 2 MiB huge pages (so we only need P4, P3, and P2 — no P1 needed).

Understanding 4-Level Paging

A 64-bit virtual address breaks down into a sign-extension region and five fields:

 Bit: 63      48 47    39 38    30 29    21 20    12 11       0
      ┌──────────┬────────┬────────┬────────┬────────┬──────────┐
      │ Sign ext │ P4 idx │ P3 idx │ P2 idx │ P1 idx │  Offset  │
      │ (ignored)│  9 bit │  9 bit │  9 bit │  9 bit │  12 bit  │
      └──────────┴────────┴────────┴────────┴────────┴──────────┘

Each table has 512 entries (2^9), and each entry is 8 bytes, making each table exactly 4,096 bytes: one page. The CPU walks this structure on every memory access (unless the translation is already cached in the TLB). Bits 63:48 aren't used for translation, but they must be copies of bit 47 (a "canonical" address) or the access faults.
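The field split in the diagram can be sketched in Rust. This helper is our own illustration of the layout, not code from the bootstrap:

```rust
/// Split a 64-bit virtual address into (P4, P3, P2, P1, offset)
/// per the 4-level layout: four 9-bit indices plus a 12-bit offset.
fn split_vaddr(vaddr: u64) -> (u64, u64, u64, u64, u64) {
    (
        (vaddr >> 39) & 0x1FF, // P4 index, bits 47..39
        (vaddr >> 30) & 0x1FF, // P3 index, bits 38..30
        (vaddr >> 21) & 0x1FF, // P2 index, bits 29..21
        (vaddr >> 12) & 0x1FF, // P1 index, bits 20..12
        vaddr & 0xFFF,         // page offset, bits 11..0
    )
}

fn main() {
    // The VGA buffer at 0xb8000 sits entirely inside the first 2 MiB,
    // so only the P1 index is nonzero.
    assert_eq!(split_vaddr(0xb8000), (0, 0, 0, 0xb8, 0));
}
```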

Allocating the Tables in Assembly

In your assembly source, reserve the tables in the .bss section. They must be 4 KiB aligned:

section .bss
align 4096
p4_table:
    resb 4096
p3_table:
    resb 4096
p2_table:
    resb 4096

Wiring Up the Hierarchy

Each table entry holds the physical address of the next level table (or the physical frame for leaf entries), combined with control bits in the lower 12 bits.

For our bootstrap:

  • Bit 0 (Present): This entry is valid
  • Bit 1 (Writable): The mapped memory can be written
  • Bit 7 (Huge Page): Used in P2 entries to map 2 MiB at a time (skips P1)

; ── Set up page tables ───────────────────────────────────────────────────────
setup_page_tables:

    ; Point P4[0] → P3 table base address (with Present + Writable bits)
    mov  eax, p3_table
    or   eax, 0b11              ; bit 0 = Present, bit 1 = Writable
    mov  [p4_table], eax

    ; Point P3[0] → P2 table base address
    mov  eax, p2_table
    or   eax, 0b11
    mov  [p3_table], eax

    ; Map each P2 entry to a 2 MiB huge page.
    ; We loop 512 times, covering the full first 1 GiB.
    mov  ecx, 0

.map_p2_table:
    ; Each entry maps address: ecx * 2 MiB = ecx * 0x200000
    mov  eax, 0x200000          ; 2 MiB
    mul  ecx                    ; eax = ecx * 2MiB
    or   eax, 0b10000011        ; Present + Writable + Huge Page (bit 7)
    mov  [p2_table + ecx * 8], eax  ; Write the entry

    inc  ecx
    cmp  ecx, 512               ; Have we filled all 512 entries?
    jne  .map_p2_table
    ret

Why huge pages here? Using 2 MiB pages means we don't need a P1 table for the bootstrap. This keeps our setup minimal. You'll add proper 4 KiB pages in Rust once you have a frame allocator.

Why identity mapping? After enabling paging, the CPU's next instruction fetch uses the new virtual address space. If we didn't identity-map the code that enables paging, we'd immediately page-fault. Identity mapping (virtual address = physical address) is the bootstrap solution.
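As a cross-check of the loop's arithmetic, here is the same entry computation in plain Rust (constant and function names are ours, for illustration only):

```rust
const PRESENT: u64 = 1 << 0;  // entry is valid
const WRITABLE: u64 = 1 << 1; // mapped memory may be written
const HUGE: u64 = 1 << 7;     // this P2 entry maps a 2 MiB page directly

/// The value the assembly loop stores in P2 entry `i`:
/// a 2 MiB huge page mapping physical address i * 0x200000.
fn p2_entry(i: u64) -> u64 {
    (i * 0x20_0000) | PRESENT | WRITABLE | HUGE
}

fn main() {
    assert_eq!(p2_entry(0), 0b1000_0011);            // first entry: flags only
    assert_eq!(p2_entry(1), 0x20_0000 | 0x83);       // maps 2 MiB..4 MiB
    assert_eq!(p2_entry(511), 0x3FE0_0000 | 0x83);   // last entry, just under 1 GiB
}
```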


3. Enabling PAE and Loading CR3

Before the EFER register will accept the Long Mode Enable (LME) bit, the CPU requires Physical Address Extension to be active.

; ── Enable PAE (Physical Address Extension) ──────────────────────────────────
enable_paging:
    ; Step 1: Set PAE bit (bit 5) in CR4
    mov  eax, cr4
    or   eax, 1 << 5            ; PAE enable
    mov  cr4, eax

    ; Step 2: Load the physical address of P4 into CR3
    ; CR3 is the "page directory base register" — it always holds the P4 base
    mov  eax, p4_table
    mov  cr3, eax

CR3 is the root of your entire virtual address space. When you later context-switch between processes, you'll swap CR3 to switch address spaces. For now, all kernel code shares this single PML4.
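One detail worth internalizing: because the tables are 4 KiB aligned, the low 12 bits of the value in CR3 are not address bits at all (the CPU reserves a couple of them, PWT and PCD, for cache control). A tiny sketch of the sanity check, with a helper name of our own:

```rust
/// CR3's table base must be 4 KiB aligned: the low 12 bits are
/// reserved for flags, not address bits.
fn is_valid_p4_base(phys: u64) -> bool {
    phys & 0xFFF == 0
}

fn main() {
    assert!(is_valid_p4_base(0x10_0000));   // page-aligned: fine
    assert!(!is_valid_p4_base(0x10_0008));  // misaligned: invalid base
}
```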


4. Setting the LME Bit in EFER

The Extended Feature Enable Register (EFER) is an MSR (Model Specific Register). Unlike general-purpose registers, MSRs are accessed through two special instructions:

  • rdmsr — reads the MSR whose index is in ECX into EDX:EAX
  • wrmsr — writes EDX:EAX into the MSR at index ECX

EFER's MSR index is 0xC0000080. The LME bit is bit 8.

    ; Step 3: Set the LME bit (Long Mode Enable) in EFER MSR
    mov  ecx, 0xC0000080        ; EFER MSR address
    rdmsr                       ; Read current EFER value into EDX:EAX
    or   eax, 1 << 8            ; Set bit 8: LME (Long Mode Enable)
    wrmsr                       ; Write back

EFER also contains the NXE bit (bit 11), which enables the No-Execute bit in page table entries. Setting this now lets you later mark data pages as non-executable, a crucial security feature. You can OR in 1 << 11 alongside the LME bit.
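A quick sanity check of that bit arithmetic (constant names are ours):

```rust
const EFER_LME: u32 = 1 << 8;  // Long Mode Enable
const EFER_NXE: u32 = 1 << 11; // No-Execute Enable

fn main() {
    // The value you'd OR into EAX before `wrmsr` to set both at once:
    assert_eq!(EFER_LME | EFER_NXE, 0x900);
}
```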

At this point, the CPU is in "Long Mode Inactive" state. Long Mode is enabled in EFER, but not yet active — paging isn't on yet. The CPU is holding its breath.


5. The Point of No Return: Enabling Paging

Now we flip the final switch. Setting bit 31 (PG) in CR0 activates paging. At this exact moment, the CPU transitions to "Long Mode Active" — but only because all the prerequisites are satisfied:

  • PAE is set in CR4 ✓
  • A valid P4 table is in CR3 ✓
  • LME is set in EFER ✓

    ; Step 4: Enable paging (and confirm protection) via CR0
    mov  eax, cr0
    or   eax, (1 << 31) | (1 << 0)   ; PG (bit 31) + PE (bit 0, protection)
    mov  cr0, eax
    ret                              ; enable_paging is done; back to _start

We're now in Long Mode. But we're not in 64-bit mode yet — we're in IA-32e Compatibility Mode. The CPU is executing 32-bit code inside a 64-bit paging structure. We need one more step.


6. The 64-bit GDT and the Far Jump

The Global Descriptor Table (GDT) is a legacy structure from the 286 era, but it persists in 64-bit mode in a simplified form. In Long Mode, segmentation is largely disabled — base and limit fields are ignored for code and data. But the type bits still matter: you must load a GDT with a descriptor where the L bit (bit 53) is set to tell the CPU "this is a 64-bit code segment."

The Minimal 64-bit GDT

section .rodata
gdt64:
    dq 0                        ; Entry 0: null descriptor (required)
.code: equ $ - gdt64            ; Offset of the code segment descriptor
    dq (1<<43) | (1<<44) | (1<<47) | (1<<53)
                                ; Executable | Descriptor type | Present | L-bit (64-bit)
.pointer:
    dw $ - gdt64 - 1            ; GDT limit: size minus 1
    dq gdt64                    ; GDT base: linear address of gdt64

Let's unpack bit 53 specifically. Without the L-bit set, the CPU treats the segment as a 32-bit segment even in Long Mode. Your code will execute in compatibility mode forever, and 64-bit instructions will fault. The L-bit is the actual "flip the world to 64-bit" switch at the descriptor level.
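Since the dq expression is easy to get wrong, it's worth computing the descriptor value once and comparing it against what your assembler actually emits (for instance with `objdump` or a hexdump of the binary):

```rust
fn main() {
    // Executable (bit 43) | Descriptor type (44) | Present (47) | L-bit (53)
    let descriptor: u64 = (1 << 43) | (1 << 44) | (1 << 47) | (1 << 53);
    assert_eq!(descriptor, 0x0020_9800_0000_0000);
}
```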

Loading the GDT and Jumping

    ; Load the 64-bit GDT
    lgdt [gdt64.pointer]

    ; Far jump: loads the new CS from gdt64.code, flushes the pipeline
    ; This jump is what actually puts the CPU into 64-bit mode.
    jmp gdt64.code:long_mode_start


; We are now in 64-bit Long Mode.
bits 64
long_mode_start:
    ; Zero out the data segment registers (they're ignored in 64-bit mode,
    ; but old values can cause GPFs in some contexts)
    mov ax, 0
    mov ss, ax
    mov ds, ax
    mov es, ax

    ; Call the Rust kernel entry point
    extern rust_main
    call rust_main

    ; rust_main should never return, but halt just in case
    hlt

The jmp gdt64.code:long_mode_start is a far jump: it changes the instruction pointer and reloads CS from the GDT in one step. CS is what selects the active code descriptor, so until it's reloaded with a selector whose descriptor has the L-bit set, the CPU keeps decoding instructions as 32-bit compatibility-mode code. A near jump would leave the old CS in place, which is why this specific kind of jump is required.


7. The Rust Entry Point

Your Rust kernel now needs a function that matches the symbol rust_main. In your main.rs (or lib.rs if building a library kernel):

#![no_std]
#![no_main]

use core::panic::PanicInfo;

/// The entry point called from assembly after Long Mode is established.
/// 
/// At this point:
/// - We're in 64-bit Long Mode
/// - The first 1 GiB is identity-mapped
/// - The stack is set up (via the `stack_top` label in assembly)
/// - No interrupts are configured yet
#[no_mangle]
pub extern "C" fn rust_main() -> ! {
    // Write directly to VGA text buffer at physical address 0xb8000
    // This is identity-mapped, so virtual == physical here.
    let vga_buffer = 0xb8000 as *mut u8;

    let msg = b"64-bit Long Mode reached!";
    for (i, &byte) in msg.iter().enumerate() {
        unsafe {
            // Each VGA cell is 2 bytes: character byte + attribute byte
            *vga_buffer.add(i * 2)     = byte;  // ASCII character
            *vga_buffer.add(i * 2 + 1) = 0x0f;  // White on black
        }
    }

    loop {}
}

#[panic_handler]
fn panic(_info: &PanicInfo) -> ! {
    loop {}
}

Your Cargo.toml and target spec should build for x86_64-unknown-none (or a custom target JSON), which has no standard library and uses the panic = "abort" strategy.
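For reference, a common way to force the abort strategy from Cargo.toml when you're using a custom target JSON (treat this as one possible setup, not the only valid one; the built-in x86_64-unknown-none target already defaults to abort):

```toml
[profile.dev]
panic = "abort"

[profile.release]
panic = "abort"
```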

8. Don't Forget: The Stack

One thing that's easy to forget: you need a stack before you can call anything, including rust_main. The x86_64 ABI requires the stack pointer to be 16-byte aligned before a call instruction.

Set this up in your assembly before calling the page table setup:

section .bss
align 16
stack_bottom:
    resb 65536          ; 64 KiB of stack space
stack_top:

; ── Entry point (called by GRUB) ─────────────────────────────────────────────
global _start
_start:
    ; Set up a proper stack immediately
    mov  esp, stack_top         ; Stack grows downward; start at the top

    ; Now safe to call functions
    call check_cpuid
    call check_long_mode
    call setup_page_tables
    call enable_paging

    ; Load GDT and far-jump to 64-bit code
    lgdt [gdt64.pointer]
    jmp  gdt64.code:long_mode_start

The Full Transition Sequence (Summary)

Here's the complete order of operations, which must not be reordered:

GRUB loads kernel → Protected Mode (32-bit)
        │
        ▼
 1. Set ESP to stack_top
 2. check_cpuid         — confirm CPUID exists
 3. check_long_mode     — confirm LM bit via CPUID 0x80000001
 4. setup_page_tables   — fill P4/P3/P2; map first 1 GiB (huge pages)
 5. Set PAE in CR4      — required before LME
 6. Load P4 into CR3    — set page table root
 7. Set LME in EFER     — tell CPU Long Mode is desired
 8. Set PG + PE in CR0  — actually activate paging → IA-32e active
 9. lgdt gdt64          — load 64-bit GDT (L-bit set in code descriptor)
10. far jmp CS:rip      — reload CS, flush pipeline → 64-bit mode
        │
        ▼
   long_mode_start      — 64-bit assembly
        │
        ▼
   rust_main()          — Rust kernel

Common Mistakes and How to Debug Them

Triple fault on paging enable: Usually means CR3 points to garbage, the P4 table isn't page-aligned, or the identity mapping doesn't cover the code currently executing. Print the P4 address to a serial port before loading CR3.

Triple fault on the far jump: Your GDT descriptor is malformed, or the L-bit isn't set. Double-check your dq expression for the code descriptor. Use a debugger (QEMU + GDB) and info registers right before the jump.

Rust code crashes immediately: Check your stack alignment. The x86_64 ABI requires RSP to be aligned to 16 bytes before the call instruction pushes the return address, meaning RSP must be 16-byte aligned at call time, which leaves it 8-byte aligned at function entry.
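The alignment rule is easy to mis-remember, so here it is as arithmetic (function name ours, purely illustrative):

```rust
/// ABI rule: RSP must be 16-byte aligned at the `call` instruction;
/// the call pushes an 8-byte return address, so at function entry
/// RSP is congruent to 8 (mod 16).
fn rsp_mod16_at_entry(rsp_at_call: u64) -> u64 {
    (rsp_at_call - 8) % 16
}

fn main() {
    // A properly aligned call site leaves RSP % 16 == 8 inside the callee.
    assert_eq!(rsp_mod16_at_entry(0x7fff_ffff_e000), 8);
}
```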

Code runs but produces garbage: You're probably still in compatibility mode, not 64-bit mode. Verify the L-bit in your GDT descriptor. You can confirm by checking if 64-bit registers like rax behave correctly.
