<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Dhruv</title>
    <description>The latest articles on DEV Community by Dhruv (@dhr1249).</description>
    <link>https://dev.to/dhr1249</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3906597%2Fc92ac492-3e6b-47e5-8fba-4e1877019ac9.png</url>
      <title>DEV Community: Dhruv</title>
      <link>https://dev.to/dhr1249</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/dhr1249"/>
    <language>en</language>
    <item>
      <title>Beyond the Bootloader: The 32-bit to 64-bit Leap in Rust OS-Dev</title>
      <dc:creator>Dhruv</dc:creator>
      <pubDate>Thu, 30 Apr 2026 19:06:52 +0000</pubDate>
      <link>https://dev.to/dhr1249/beyond-the-bootloader-the-32-bit-to-64-bit-leap-in-rust-os-dev-3hjd</link>
      <guid>https://dev.to/dhr1249/beyond-the-bootloader-the-32-bit-to-64-bit-leap-in-rust-os-dev-3hjd</guid>
<description>&lt;p&gt;When GRUB hands control to your kernel entry point via the Multiboot2 protocol, your CPU is in &lt;strong&gt;32-bit Protected Mode&lt;/strong&gt;. Interrupts are disabled (the Multiboot2 specification requires the IF flag to be clear). The A20 line is enabled. The segment registers hold flat 4 GiB descriptors, though the GDT they were loaded from belongs to GRUB and may no longer be valid. You're running, but you're still in the 1990s.&lt;/p&gt;

&lt;p&gt;To write a real Rust kernel — one that can use 64-bit pointers, access more than 4 GiB of RAM, and benefit from the x86_64 calling convention — we need to leave that world behind and enter &lt;strong&gt;Long Mode&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This transition is not handled by a library. There's no &lt;code&gt;std&lt;/code&gt;. There's no runtime. There's nothing between you and the silicon except a handful of assembly instructions and a very specific sequence of CPU configuration steps. Get the sequence wrong, and you'll triple-fault into a reboot. Get it right, and you'll be calling Rust code from the most fundamental level of your system.&lt;/p&gt;

&lt;p&gt;Let's walk through every step.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Architecture of Long Mode Entry
&lt;/h2&gt;

&lt;p&gt;Before writing a single line, let's pin down what must be in place before the CPU will run 64-bit code. Combining the Long Mode initialization sequence in the Intel SDM (Vol. 3A, Section 9.8.5) with the checks we add on top, these steps must happen in this order:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Long Mode is supported&lt;/strong&gt; — confirmed via &lt;code&gt;CPUID&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PAE is enabled&lt;/strong&gt; — Physical Address Extension, bit 5 of &lt;code&gt;CR4&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CR3 points to a valid PML4 (P4) table&lt;/strong&gt; — your page tables&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LME is set in EFER&lt;/strong&gt; — the Long Mode Enable bit&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Paging is enabled&lt;/strong&gt; — bit 31 of &lt;code&gt;CR0&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A 64-bit GDT is loaded&lt;/strong&gt; — and a far jump reloads &lt;code&gt;CS&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Each of these is a hard requirement. Miss one and the CPU either silently stays in 32-bit mode or raises a fault that, with no IDT of your own installed, escalates to a triple fault. We'll implement all six, in order.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. The Pre-Flight Check: CPUID
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;CPUID&lt;/code&gt; is the CPU's self-identification instruction. Before enabling anything, we need to confirm two things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The &lt;code&gt;CPUID&lt;/code&gt; instruction itself is supported (not guaranteed on very old processors)&lt;/li&gt;
&lt;li&gt;Long Mode (the &lt;code&gt;LM&lt;/code&gt; bit) is available&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Checking for CPUID Support
&lt;/h3&gt;

&lt;p&gt;The CPUID instruction is available if and only if bit 21 of &lt;code&gt;EFLAGS&lt;/code&gt; can be toggled. If the bit is read-only, the CPU predates CPUID. Here's the check:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nasm"&gt;&lt;code&gt;&lt;span class="c1"&gt;; ── Check if CPUID is supported ──────────────────────────────────────────────&lt;/span&gt;
&lt;span class="c1"&gt;; We attempt to flip bit 21 (the ID flag) in EFLAGS.&lt;/span&gt;
&lt;span class="c1"&gt;; If it stays flipped, CPUID exists. If not, we're too old.&lt;/span&gt;
&lt;span class="nl"&gt;check_cpuid:&lt;/span&gt;
    &lt;span class="nf"&gt;pushfd&lt;/span&gt;                      &lt;span class="c1"&gt;; Push EFLAGS onto the stack&lt;/span&gt;
    &lt;span class="nf"&gt;pop&lt;/span&gt;  &lt;span class="nb"&gt;eax&lt;/span&gt;                    &lt;span class="c1"&gt;; Pop them into EAX&lt;/span&gt;
    &lt;span class="nf"&gt;mov&lt;/span&gt;  &lt;span class="nb"&gt;ecx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;eax&lt;/span&gt;               &lt;span class="c1"&gt;; Save original in ECX for comparison&lt;/span&gt;
    &lt;span class="nf"&gt;xor&lt;/span&gt;  &lt;span class="nb"&gt;eax&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;21&lt;/span&gt;           &lt;span class="c1"&gt;; Flip bit 21 (the "ID" flag)&lt;/span&gt;
    &lt;span class="nf"&gt;push&lt;/span&gt; &lt;span class="nb"&gt;eax&lt;/span&gt;                    &lt;span class="c1"&gt;; Push the modified value&lt;/span&gt;
    &lt;span class="nf"&gt;popfd&lt;/span&gt;                       &lt;span class="c1"&gt;; Load it back into EFLAGS&lt;/span&gt;
    &lt;span class="nf"&gt;pushfd&lt;/span&gt;                      &lt;span class="c1"&gt;; Push EFLAGS again to read the result&lt;/span&gt;
    &lt;span class="nf"&gt;pop&lt;/span&gt;  &lt;span class="nb"&gt;eax&lt;/span&gt;                    &lt;span class="c1"&gt;; Load result into EAX&lt;/span&gt;
    &lt;span class="nf"&gt;push&lt;/span&gt; &lt;span class="nb"&gt;ecx&lt;/span&gt;                    &lt;span class="c1"&gt;; Restore original EFLAGS&lt;/span&gt;
    &lt;span class="nf"&gt;popfd&lt;/span&gt;
    &lt;span class="nf"&gt;cmp&lt;/span&gt;  &lt;span class="nb"&gt;eax&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;ecx&lt;/span&gt;               &lt;span class="c1"&gt;; Did anything change?&lt;/span&gt;
    &lt;span class="nf"&gt;je&lt;/span&gt;   &lt;span class="nv"&gt;.no_cpuid&lt;/span&gt;              &lt;span class="c1"&gt;; If identical: bit was read-only → no CPUID&lt;/span&gt;
    &lt;span class="nf"&gt;ret&lt;/span&gt;
&lt;span class="nl"&gt;.no_cpuid:&lt;/span&gt;
    &lt;span class="nf"&gt;mov&lt;/span&gt;  &lt;span class="nb"&gt;al&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"1"&lt;/span&gt;                &lt;span class="c1"&gt;; Error code "1"&lt;/span&gt;
    &lt;span class="nf"&gt;jmp&lt;/span&gt;  &lt;span class="nv"&gt;error&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
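
&lt;p&gt;The same test can be modelled on the host in Rust. This is a minimal sketch with hypothetical function names that are not part of any crate. It takes the EFLAGS value read before the flip and the value read back after &lt;code&gt;popfd&lt;/code&gt;. Unlike the assembly, which compares the whole register, it looks only at bit 21.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;/// EFLAGS bit 21: the ID flag. It is writable only on CPUs that implement CPUID.
const EFLAGS_ID: u32 = 1 &amp;lt;&amp;lt; 21;

/// The value `check_cpuid` pushes back into EFLAGS: the original with ID flipped.
fn flip_id_flag(eflags: u32) -&amp;gt; u32 {
    eflags ^ EFLAGS_ID
}

/// True if the ID flag kept the flipped value, i.e. CPUID is supported.
fn cpuid_supported(original: u32, read_back: u32) -&amp;gt; bool {
    ((original ^ read_back) &amp;amp; EFLAGS_ID) != 0
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;On a CPU where the bit is read-only, &lt;code&gt;read_back&lt;/code&gt; equals &lt;code&gt;original&lt;/code&gt; and the function returns &lt;code&gt;false&lt;/code&gt;, matching the &lt;code&gt;je .no_cpuid&lt;/code&gt; branch.&lt;/p&gt;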



&lt;h3&gt;
  
  
  Checking for Long Mode Support
&lt;/h3&gt;

&lt;p&gt;With CPUID confirmed, we query the "Extended Processor Info" leaf (&lt;code&gt;0x80000001&lt;/code&gt;). Bit 29 of &lt;code&gt;EDX&lt;/code&gt; in the response is the &lt;strong&gt;LM bit&lt;/strong&gt; — "Long Mode Supported."&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nasm"&gt;&lt;code&gt;&lt;span class="c1"&gt;; ── Check for Long Mode via Extended CPUID ───────────────────────────────────&lt;/span&gt;
&lt;span class="nl"&gt;check_long_mode:&lt;/span&gt;
    &lt;span class="nf"&gt;mov&lt;/span&gt;  &lt;span class="nb"&gt;eax&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mh"&gt;0x80000000&lt;/span&gt;        &lt;span class="c1"&gt;; Query highest supported extended function&lt;/span&gt;
    &lt;span class="nf"&gt;cpuid&lt;/span&gt;
    &lt;span class="nf"&gt;cmp&lt;/span&gt;  &lt;span class="nb"&gt;eax&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mh"&gt;0x80000001&lt;/span&gt;        &lt;span class="c1"&gt;; Does the CPU support extended info?&lt;/span&gt;
    &lt;span class="nf"&gt;jb&lt;/span&gt;   &lt;span class="nv"&gt;.no_long_mode&lt;/span&gt;          &lt;span class="c1"&gt;; If not, Long Mode definitely isn't available&lt;/span&gt;

    &lt;span class="nf"&gt;mov&lt;/span&gt;  &lt;span class="nb"&gt;eax&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mh"&gt;0x80000001&lt;/span&gt;        &lt;span class="c1"&gt;; Extended processor info and feature bits&lt;/span&gt;
    &lt;span class="nf"&gt;cpuid&lt;/span&gt;
    &lt;span class="nf"&gt;test&lt;/span&gt; &lt;span class="nb"&gt;edx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;29&lt;/span&gt;           &lt;span class="c1"&gt;; Check the LM (Long Mode) bit&lt;/span&gt;
    &lt;span class="nf"&gt;jz&lt;/span&gt;   &lt;span class="nv"&gt;.no_long_mode&lt;/span&gt;          &lt;span class="c1"&gt;; Not set? We can't continue.&lt;/span&gt;
    &lt;span class="nf"&gt;ret&lt;/span&gt;
&lt;span class="nl"&gt;.no_long_mode:&lt;/span&gt;
    &lt;span class="nf"&gt;mov&lt;/span&gt;  &lt;span class="nb"&gt;al&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"2"&lt;/span&gt;
    &lt;span class="nf"&gt;jmp&lt;/span&gt;  &lt;span class="nv"&gt;error&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
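
&lt;p&gt;The decision logic of &lt;code&gt;check_long_mode&lt;/code&gt; fits in a few lines of Rust. This is a host-side sketch with hypothetical names. It takes the raw register values that the two &lt;code&gt;cpuid&lt;/code&gt; calls return rather than executing &lt;code&gt;cpuid&lt;/code&gt; itself.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;/// CPUID leaf holding the extended processor info and feature bits.
const CPUID_EXT_INFO: u32 = 0x8000_0001;
/// EDX bit 29 of that leaf: "LM", Long Mode supported.
const EDX_LM: u32 = 1 &amp;lt;&amp;lt; 29;

/// `max_ext_leaf`: EAX returned by CPUID leaf 0x8000_0000.
/// `edx_ext_info`: EDX returned by CPUID leaf 0x8000_0001.
fn long_mode_supported(max_ext_leaf: u32, edx_ext_info: u32) -&amp;gt; bool {
    if max_ext_leaf &amp;lt; CPUID_EXT_INFO {
        // Leaf 0x8000_0001 does not exist: the `jb .no_long_mode` branch.
        return false;
    }
    (edx_ext_info &amp;amp; EDX_LM) != 0
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Checking the maximum leaf first matters because CPUID does not fail on an unsupported leaf. On Intel CPUs it returns the data of the highest basic leaf instead, and bit 29 of that EDX means something else entirely.&lt;/p&gt;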



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Why not just skip this check?&lt;/strong&gt; On a CPU without Long Mode, the EFER access in step 4 raises a fault. With no IDT of your own installed, that usually escalates to a triple fault and an unexplained reboot. A VM can also be configured to expose a 32-bit-only CPU model. Always check, so that a missing feature shows up as a clear error code instead.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  2. Setting Up Page Tables
&lt;/h2&gt;

&lt;p&gt;This is the most involved part of the bootstrap. Long Mode &lt;strong&gt;requires&lt;/strong&gt; paging to be active. You cannot enter it without valid page tables loaded into CR3.&lt;/p&gt;

&lt;p&gt;We're going to set up a minimal &lt;strong&gt;4-level paging hierarchy&lt;/strong&gt; (P4 → P3 → P2 → P1), also called PML4 → PDPT → PD → PT in Intel documentation. For the initial bootstrap, we'll identity-map the first 1 GiB of physical memory using &lt;strong&gt;2 MiB huge pages&lt;/strong&gt; (so we only need P4, P3, and P2 — no P1 needed).&lt;/p&gt;

&lt;h3&gt;
  
  
  Understanding 4-Level Paging
&lt;/h3&gt;

&lt;p&gt;A 64-bit virtual address is split into five fields:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; Bit: 63      48 47    39 38    30 29    21 20    12 11       0
      ┌──────────┬────────┬────────┬────────┬────────┬──────────┐
      │ Sign ext │ P4 idx │ P3 idx │ P2 idx │ P1 idx │  Offset  │
      │ (=bit 47)│  9 bit │  9 bit │  9 bit │  9 bit │  12 bit  │
      └──────────┴────────┴────────┴────────┴────────┴──────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



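&lt;p&gt;Each table has 512 entries (2^9), and each entry is 8 bytes, making each table exactly 4,096 bytes: one page. Bits 48 through 63 take no part in translation, but they must be copies of bit 47 (a "canonical" address); accessing a non-canonical address faults. The CPU walks this structure on every memory access that misses the TLB.&lt;/p&gt;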
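
&lt;p&gt;To make the split concrete, here is a small Rust sketch with hypothetical helper names. It extracts the four 9-bit indices and the 12-bit offset. It also checks the canonical-address rule: bits 48 to 63 must repeat bit 47.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;/// Split a virtual address into its P4, P3, P2 and P1 indices and the page offset.
fn split_virtual_address(vaddr: u64) -&amp;gt; ([usize; 4], u64) {
    // Each index is 9 bits wide; the lowest one starts at bit 12.
    let index = |shift: u32| ((vaddr &amp;gt;&amp;gt; shift) &amp;amp; 0x1ff) as usize;
    let indices = [index(39), index(30), index(21), index(12)];
    let offset = vaddr &amp;amp; 0xfff;
    (indices, offset)
}

/// Bits 48..=63 must repeat bit 47, otherwise the address is non-canonical.
fn is_canonical(vaddr: u64) -&amp;gt; bool {
    let top = vaddr &amp;gt;&amp;gt; 47; // bit 47 plus the 16 bits above it
    top == 0 || top == 0x1_ffff
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;For example, the VGA buffer address &lt;code&gt;0xb8000&lt;/code&gt; splits into indices &lt;code&gt;[0, 0, 0, 0xb8]&lt;/code&gt; with offset 0. Under 4 KiB paging it would sit in entry 184 of the first P1 table. The 2 MiB mapping set up below resolves it one level earlier, through P2 entry 0.&lt;/p&gt;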

&lt;h3&gt;
  
  
  Allocating the Tables in Assembly
&lt;/h3&gt;

&lt;p&gt;In your assembly source, reserve space for the tables in the &lt;code&gt;.bss&lt;/code&gt; section. They must be &lt;strong&gt;4 KiB aligned&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nasm"&gt;&lt;code&gt;&lt;span class="nf"&gt;section&lt;/span&gt; &lt;span class="nv"&gt;.bss&lt;/span&gt;
&lt;span class="nf"&gt;align&lt;/span&gt; &lt;span class="mi"&gt;4096&lt;/span&gt;
&lt;span class="nl"&gt;p4_table:&lt;/span&gt;
    &lt;span class="kd"&gt;resb&lt;/span&gt; &lt;span class="mi"&gt;4096&lt;/span&gt;
&lt;span class="nl"&gt;p3_table:&lt;/span&gt;
    &lt;span class="kd"&gt;resb&lt;/span&gt; &lt;span class="mi"&gt;4096&lt;/span&gt;
&lt;span class="nl"&gt;p2_table:&lt;/span&gt;
    &lt;span class="kd"&gt;resb&lt;/span&gt; &lt;span class="mi"&gt;4096&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
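
&lt;p&gt;When the kernel later manipulates these tables from Rust, it helps to have a type with the same size and alignment, so that both views agree. Below is a minimal sketch of a hypothetical &lt;code&gt;PageTable&lt;/code&gt; type. The &lt;code&gt;x86_64&lt;/code&gt; crate ships a richer type of the same name.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;/// One page table: 512 entries of 8 bytes, aligned to a 4 KiB boundary,
/// matching the `align 4096` / `resb 4096` reservations in the assembly.
#[repr(C, align(4096))]
struct PageTable {
    entries: [u64; 512],
}

impl PageTable {
    /// A table with every entry zero, i.e. every entry non-present.
    const fn empty() -&amp;gt; Self {
        PageTable { entries: [0; 512] }
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;align(4096)&lt;/code&gt; attribute does what the assembly's &lt;code&gt;align 4096&lt;/code&gt; does. It keeps the low 12 bits of every table address zero, which frees them for the flag bits in the parent entry.&lt;/p&gt;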



&lt;h3&gt;
  
  
  Wiring Up the Hierarchy
&lt;/h3&gt;

&lt;p&gt;Each table entry holds the &lt;strong&gt;physical address&lt;/strong&gt; of the next level table (or the physical frame for leaf entries), combined with control bits in the lower 12 bits.&lt;/p&gt;

&lt;p&gt;For our bootstrap:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Bit 0 (Present)&lt;/strong&gt;: This entry is valid&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bit 1 (Writable)&lt;/strong&gt;: The mapped memory can be written&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bit 7 (Huge Page)&lt;/strong&gt;: Used in P2 entries to map 2 MiB at a time (skips P1)
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nasm"&gt;&lt;code&gt;&lt;span class="c1"&gt;; ── Set up page tables ───────────────────────────────────────────────────────&lt;/span&gt;
&lt;span class="nl"&gt;setup_page_tables:&lt;/span&gt;

    &lt;span class="c1"&gt;; Point P4[0] → P3 table base address (with Present + Writable bits)&lt;/span&gt;
    &lt;span class="nf"&gt;mov&lt;/span&gt;  &lt;span class="nb"&gt;eax&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;p3_table&lt;/span&gt;
    &lt;span class="nf"&gt;or&lt;/span&gt;   &lt;span class="nb"&gt;eax&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mb"&gt;0b&lt;/span&gt;&lt;span class="mi"&gt;11&lt;/span&gt;              &lt;span class="c1"&gt;; bit 0 = Present, bit 1 = Writable&lt;/span&gt;
    &lt;span class="nf"&gt;mov&lt;/span&gt;  &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;p4_table&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="nb"&gt;eax&lt;/span&gt;

    &lt;span class="c1"&gt;; Point P3[0] → P2 table base address&lt;/span&gt;
    &lt;span class="nf"&gt;mov&lt;/span&gt;  &lt;span class="nb"&gt;eax&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;p2_table&lt;/span&gt;
    &lt;span class="nf"&gt;or&lt;/span&gt;   &lt;span class="nb"&gt;eax&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mb"&gt;0b&lt;/span&gt;&lt;span class="mi"&gt;11&lt;/span&gt;
    &lt;span class="nf"&gt;mov&lt;/span&gt;  &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;p3_table&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="nb"&gt;eax&lt;/span&gt;

    &lt;span class="c1"&gt;; Map each P2 entry to a 2 MiB huge page.&lt;/span&gt;
    &lt;span class="c1"&gt;; We loop 512 times, covering the full first 1 GiB.&lt;/span&gt;
    &lt;span class="nf"&gt;mov&lt;/span&gt;  &lt;span class="nb"&gt;ecx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;

&lt;span class="nl"&gt;.map_p2_table:&lt;/span&gt;
    &lt;span class="c1"&gt;; Each entry maps address: ecx * 2 MiB = ecx * 0x200000&lt;/span&gt;
    &lt;span class="nf"&gt;mov&lt;/span&gt;  &lt;span class="nb"&gt;eax&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mh"&gt;0x200000&lt;/span&gt;          &lt;span class="c1"&gt;; 2 MiB&lt;/span&gt;
    &lt;span class="nf"&gt;mul&lt;/span&gt;  &lt;span class="nb"&gt;ecx&lt;/span&gt;                    &lt;span class="c1"&gt;; eax = ecx * 2MiB&lt;/span&gt;
    &lt;span class="nf"&gt;or&lt;/span&gt;   &lt;span class="nb"&gt;eax&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mb"&gt;0b&lt;/span&gt;&lt;span class="mi"&gt;10000011&lt;/span&gt;        &lt;span class="c1"&gt;; Present + Writable + Huge Page (bit 7)&lt;/span&gt;
    &lt;span class="nf"&gt;mov&lt;/span&gt;  &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;p2_table&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nb"&gt;ecx&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="nb"&gt;eax&lt;/span&gt;  &lt;span class="c1"&gt;; Write the entry&lt;/span&gt;

    &lt;span class="nf"&gt;inc&lt;/span&gt;  &lt;span class="nb"&gt;ecx&lt;/span&gt;
    &lt;span class="nf"&gt;cmp&lt;/span&gt;  &lt;span class="nb"&gt;ecx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;512&lt;/span&gt;               &lt;span class="c1"&gt;; Have we filled all 512 entries?&lt;/span&gt;
    &lt;span class="nf"&gt;jne&lt;/span&gt;  &lt;span class="nv"&gt;.map_p2_table&lt;/span&gt;
    &lt;span class="nf"&gt;ret&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
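
&lt;p&gt;The loop computes each entry as the entry index times 2 MiB, ORed with the flag bits. Here is the same computation as a Rust sketch with a hypothetical function name. It is handy for checking the values you expect to see in a debugger.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;const PRESENT: u64 = 1 &amp;lt;&amp;lt; 0;
const WRITABLE: u64 = 1 &amp;lt;&amp;lt; 1;
const HUGE_PAGE: u64 = 1 &amp;lt;&amp;lt; 7;
const TWO_MIB: u64 = 0x20_0000;

/// The 512 P2 entries that identity-map the first 1 GiB with 2 MiB pages,
/// mirroring the `.map_p2_table` loop.
fn identity_p2_entries() -&amp;gt; [u64; 512] {
    let mut entries = [0u64; 512];
    for (i, entry) in entries.iter_mut().enumerate() {
        // Physical frame i * 2 MiB, plus Present | Writable | Huge Page (0x83).
        *entry = (i as u64 * TWO_MIB) | PRESENT | WRITABLE | HUGE_PAGE;
    }
    entries
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Entry 1 comes out as &lt;code&gt;0x200083&lt;/code&gt; and the last entry as &lt;code&gt;0x3fe00083&lt;/code&gt;. Under QEMU, the monitor's &lt;code&gt;info tlb&lt;/code&gt; command lists the mappings that actually resulted.&lt;/p&gt;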



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Why huge pages here?&lt;/strong&gt; Using 2 MiB pages means we don't need a P1 table for the bootstrap. This keeps our setup minimal. You'll add proper 4 KiB pages in Rust once you have a frame allocator.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why identity mapping?&lt;/strong&gt; After enabling paging, the CPU's next instruction fetch uses the new virtual address space. If we didn't identity-map the code that enables paging, we'd immediately page-fault. Identity mapping (virtual address = physical address) is the bootstrap solution.&lt;/p&gt;
&lt;/blockquote&gt;
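
&lt;p&gt;To see why identity mapping keeps execution alive, it helps to walk the tables the way the CPU does. The Rust sketch below is simplified. It assumes the three tables are handed over directly as arrays, whereas a real walk follows the physical address stored in each parent entry. The names are hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;const PRESENT: u64 = 1 &amp;lt;&amp;lt; 0;
const HUGE_PAGE: u64 = 1 &amp;lt;&amp;lt; 7;
/// Physical-address bits of a 2 MiB page entry (bits 21..=51).
const HUGE_FRAME_MASK: u64 = 0x000f_ffff_ffe0_0000;

/// Translate `vaddr` through P4 -&amp;gt; P3 -&amp;gt; P2, where P2 holds 2 MiB pages.
/// Returns None where the CPU would raise a page fault.
fn translate(vaddr: u64, p4: &amp;amp;[u64; 512], p3: &amp;amp;[u64; 512], p2: &amp;amp;[u64; 512]) -&amp;gt; Option&amp;lt;u64&amp;gt; {
    let idx = |shift: u32| ((vaddr &amp;gt;&amp;gt; shift) &amp;amp; 0x1ff) as usize;
    if (p4[idx(39)] &amp;amp; PRESENT) == 0 || (p3[idx(30)] &amp;amp; PRESENT) == 0 {
        return None;
    }
    let p2e = p2[idx(21)];
    if (p2e &amp;amp; PRESENT) == 0 || (p2e &amp;amp; HUGE_PAGE) == 0 {
        return None;
    }
    // Frame base from the entry plus the low 21 bits of the address.
    Some((p2e &amp;amp; HUGE_FRAME_MASK) | (vaddr &amp;amp; 0x1f_ffff))
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With the identity map in place, &lt;code&gt;0xb8000&lt;/code&gt; translates to &lt;code&gt;0xb8000&lt;/code&gt;. The address the CPU fetches from right after paging is turned on is the same physical address it was using before. Anything at or above 1 GiB returns &lt;code&gt;None&lt;/code&gt;, because only &lt;code&gt;P3[0]&lt;/code&gt; is populated.&lt;/p&gt;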




&lt;h2&gt;
  
  
  3. Enabling PAE and Loading CR3
&lt;/h2&gt;

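&lt;p&gt;Long Mode uses the PAE page-table format, so the CPU refuses to activate it unless &lt;strong&gt;Physical Address Extension&lt;/strong&gt; is on. Turning on paging with EFER.LME set but CR4.PAE clear raises a general-protection fault. So we enable PAE first, then point CR3 at our P4 table.&lt;br&gt;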
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nasm"&gt;&lt;code&gt;&lt;span class="c1"&gt;; ── Enable PAE (Physical Address Extension) ──────────────────────────────────&lt;/span&gt;
&lt;span class="nl"&gt;enable_paging:&lt;/span&gt;
    &lt;span class="c1"&gt;; Step 1: Set PAE bit (bit 5) in CR4&lt;/span&gt;
    &lt;span class="nf"&gt;mov&lt;/span&gt;  &lt;span class="nb"&gt;eax&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;cr4&lt;/span&gt;
    &lt;span class="nf"&gt;or&lt;/span&gt;   &lt;span class="nb"&gt;eax&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;            &lt;span class="c1"&gt;; PAE enable&lt;/span&gt;
    &lt;span class="nf"&gt;mov&lt;/span&gt;  &lt;span class="nb"&gt;cr4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;eax&lt;/span&gt;

    &lt;span class="c1"&gt;; Step 2: Load the physical address of P4 into CR3&lt;/span&gt;
    &lt;span class="c1"&gt;; CR3 is the "page directory base register" — it always holds the P4 base&lt;/span&gt;
    &lt;span class="nf"&gt;mov&lt;/span&gt;  &lt;span class="nb"&gt;eax&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;p4_table&lt;/span&gt;
    &lt;span class="nf"&gt;mov&lt;/span&gt;  &lt;span class="nb"&gt;cr3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;eax&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;CR3 is the root of your entire virtual address space.&lt;/strong&gt; When you later context-switch between processes, you'll swap CR3 to switch address spaces. For now, all kernel code shares this single PML4.&lt;/p&gt;
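
&lt;p&gt;The two register values involved can be written as a Rust sketch with hypothetical helper names. It also encodes a rule the assembly relies on silently. CR3 stores only address bits 12 and up, so the P4 table must be 4 KiB aligned.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;/// CR4 bit 5: Physical Address Extension.
const CR4_PAE: u64 = 1 &amp;lt;&amp;lt; 5;

/// The CR4 value after `or eax, 1 &amp;lt;&amp;lt; 5`.
fn cr4_with_pae(cr4: u64) -&amp;gt; u64 {
    cr4 | CR4_PAE
}

/// The CR3 value for a P4 table at `p4_phys`, or None if the table is not
/// 4 KiB aligned. CR3's low 12 bits are cache-control flags or ignored, so a
/// misaligned address would make the CPU walk the wrong table.
fn cr3_for_p4(p4_phys: u64) -&amp;gt; Option&amp;lt;u64&amp;gt; {
    if (p4_phys &amp;amp; 0xfff) == 0 {
        Some(p4_phys)
    } else {
        None
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;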




&lt;h2&gt;
  
  
  4. Setting the LME Bit in EFER
&lt;/h2&gt;

&lt;p&gt;The &lt;strong&gt;Extended Feature Enable Register (EFER)&lt;/strong&gt; is an MSR (Model Specific Register). Unlike general-purpose registers, MSRs are accessed through two special instructions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;rdmsr&lt;/code&gt; — reads the MSR whose index is in &lt;code&gt;ECX&lt;/code&gt; into &lt;code&gt;EDX:EAX&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;wrmsr&lt;/code&gt; — writes &lt;code&gt;EDX:EAX&lt;/code&gt; into the MSR at index &lt;code&gt;ECX&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
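
&lt;p&gt;MSRs are 64 bits wide, but these instructions work with 32-bit registers, so every MSR value travels as an EDX:EAX pair. Here is a small Rust sketch of that packing, with hypothetical names.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;/// How `rdmsr` hands back a 64-bit MSR value: high half in EDX, low half in EAX.
fn split_edx_eax(value: u64) -&amp;gt; (u32, u32) {
    ((value &amp;gt;&amp;gt; 32) as u32, value as u32) // `as u32` keeps the low 32 bits
}

/// How `wrmsr` combines EDX:EAX into the 64-bit value it writes.
fn join_edx_eax(edx: u32, eax: u32) -&amp;gt; u64 {
    (u64::from(edx) &amp;lt;&amp;lt; 32) | u64::from(eax)
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This is also why the EFER code only touches EAX. The bits it sets are in the low half, and &lt;code&gt;rdmsr&lt;/code&gt; has already loaded the unchanged high half into EDX for &lt;code&gt;wrmsr&lt;/code&gt; to write back.&lt;/p&gt;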

&lt;p&gt;EFER's MSR index is &lt;code&gt;0xC0000080&lt;/code&gt;. The &lt;strong&gt;LME bit&lt;/strong&gt; is bit 8.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nasm"&gt;&lt;code&gt;    &lt;span class="c1"&gt;; Step 3: Set the LME bit (Long Mode Enable) in EFER MSR&lt;/span&gt;
    &lt;span class="nf"&gt;mov&lt;/span&gt;  &lt;span class="nb"&gt;ecx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mh"&gt;0xC0000080&lt;/span&gt;        &lt;span class="c1"&gt;; EFER MSR address&lt;/span&gt;
    &lt;span class="nf"&gt;rdmsr&lt;/span&gt;                       &lt;span class="c1"&gt;; Read current EFER value into EDX:EAX&lt;/span&gt;
    &lt;span class="nf"&gt;or&lt;/span&gt;   &lt;span class="nb"&gt;eax&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;            &lt;span class="c1"&gt;; Set bit 8: LME (Long Mode Enable)&lt;/span&gt;
    &lt;span class="nf"&gt;wrmsr&lt;/span&gt;                       &lt;span class="c1"&gt;; Write back&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;EFER also contains the NXE bit (bit 11)&lt;/strong&gt;, which enables the No-Execute bit in page table entries. Setting this now lets you later mark data pages as non-executable, a crucial security feature. You can OR in &lt;code&gt;1 &amp;lt;&amp;lt; 11&lt;/code&gt; alongside the LME bit.&lt;/p&gt;
&lt;/blockquote&gt;
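
&lt;p&gt;The EFER bits used in this section can be collected in a Rust sketch with hypothetical names. It also shows the bit the CPU manages on its own: LMA (bit 10), which reads back as set once Long Mode is actually active.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;const EFER_MSR: u32 = 0xC000_0080;
const EFER_LME: u64 = 1 &amp;lt;&amp;lt; 8;  // Long Mode Enable: set by us
const EFER_LMA: u64 = 1 &amp;lt;&amp;lt; 10; // Long Mode Active: set by the CPU
const EFER_NXE: u64 = 1 &amp;lt;&amp;lt; 11; // No-Execute Enable

/// The EFER value to write back with `wrmsr`, optionally enabling NXE too.
fn efer_for_long_mode(current: u64, enable_nx: bool) -&amp;gt; u64 {
    let mut efer = current | EFER_LME;
    if enable_nx {
        efer |= EFER_NXE;
    }
    efer
}

/// True once the CPU reports Long Mode as active.
fn long_mode_active(efer: u64) -&amp;gt; bool {
    (efer &amp;amp; EFER_LMA) != 0
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;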

&lt;p&gt;At this point, the CPU is in a &lt;strong&gt;"Long Mode Inactive"&lt;/strong&gt; state. LME is set in EFER, but the Long Mode Active flag (LMA, EFER bit 10) stays clear until paging is turned on. The CPU is holding its breath.&lt;/p&gt;




&lt;h2&gt;
  
  
  5. The Point of No Return: Enabling Paging
&lt;/h2&gt;

&lt;p&gt;Now we flip the final switch. Setting bit 31 (PG) in CR0 activates paging. At this exact moment, the CPU transitions to &lt;strong&gt;"Long Mode Active"&lt;/strong&gt; — but only because all the prerequisites are satisfied:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;PAE is set in CR4 ✓&lt;/li&gt;
&lt;li&gt;A valid P4 table is in CR3 ✓&lt;/li&gt;
&lt;li&gt;LME is set in EFER ✓
&lt;/li&gt;
&lt;/ul&gt;
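
&lt;p&gt;The checklist above can be expressed as a predicate over the three registers. This Rust sketch is simplified and uses a hypothetical name. It checks only that CR3 is non-zero and 4 KiB aligned, not that the table CR3 points to is valid.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;const CR4_PAE: u64 = 1 &amp;lt;&amp;lt; 5;
const EFER_LME: u64 = 1 &amp;lt;&amp;lt; 8;

/// Would setting CR0.PG now activate Long Mode?
fn ready_for_long_mode(cr4: u64, cr3: u64, efer: u64) -&amp;gt; bool {
    let pae_on = (cr4 &amp;amp; CR4_PAE) != 0;
    let p4_loaded = cr3 != 0 &amp;amp;&amp;amp; (cr3 &amp;amp; 0xfff) == 0;
    let lme_on = (efer &amp;amp; EFER_LME) != 0;
    pae_on &amp;amp;&amp;amp; p4_loaded &amp;amp;&amp;amp; lme_on
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The failure modes differ. If LME is missing, setting PG simply turns on 32-bit PAE paging and the CPU stays in Protected Mode. If LME is set but PAE is not, the write to CR0 faults.&lt;/p&gt;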

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nasm"&gt;&lt;code&gt;    &lt;span class="c1"&gt;; Step 4: Enable paging (and confirm protection) via CR0&lt;/span&gt;
    &lt;span class="nf"&gt;mov&lt;/span&gt;  &lt;span class="nb"&gt;eax&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;cr0&lt;/span&gt;
    &lt;span class="nf"&gt;or&lt;/span&gt;   &lt;span class="nb"&gt;eax&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;31&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="c1"&gt;; PG (bit 31) + PE (bit 0, protection)&lt;/span&gt;
    &lt;span class="nf"&gt;mov&lt;/span&gt;  &lt;span class="nb"&gt;cr0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;eax&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;We're now in Long Mode.&lt;/strong&gt; But we're not in 64-bit mode yet — we're in &lt;strong&gt;IA-32e Compatibility Mode&lt;/strong&gt;. The CPU is executing 32-bit code inside a 64-bit paging structure. We need one more step.&lt;/p&gt;
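
&lt;p&gt;Here is the CR0 update and the mode it produces, as a Rust sketch with hypothetical names. It assumes PAE is already on; without PAE, the same CR0 write faults instead.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;const CR0_PE: u64 = 1 &amp;lt;&amp;lt; 0;  // Protection Enable
const CR0_PG: u64 = 1 &amp;lt;&amp;lt; 31; // Paging
const EFER_LME: u64 = 1 &amp;lt;&amp;lt; 8;

/// The CR0 value after `or eax, (1 &amp;lt;&amp;lt; 31) | (1 &amp;lt;&amp;lt; 0)`.
fn cr0_with_paging(cr0: u64) -&amp;gt; u64 {
    cr0 | CR0_PG | CR0_PE
}

/// Long Mode becomes active (EFER.LMA = 1) once paging is on while LME is set.
/// Until CS is reloaded from a descriptor with the L-bit, the CPU runs in
/// compatibility mode and still decodes 32-bit instructions.
fn long_mode_activated(cr0: u64, efer: u64) -&amp;gt; bool {
    (cr0 &amp;amp; CR0_PG) != 0 &amp;amp;&amp;amp; (efer &amp;amp; EFER_LME) != 0
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;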




&lt;h2&gt;
  
  
  6. The 64-bit GDT and the Far Jump
&lt;/h2&gt;

&lt;p&gt;The &lt;strong&gt;Global Descriptor Table (GDT)&lt;/strong&gt; is a legacy structure from the 286 era, but it persists in 64-bit mode in a simplified form. In Long Mode, segmentation is largely disabled — base and limit fields are ignored for code and data. But the &lt;strong&gt;type bits&lt;/strong&gt; still matter: you must load a GDT with a descriptor where the &lt;strong&gt;L bit (bit 53)&lt;/strong&gt; is set to tell the CPU "this is a 64-bit code segment."&lt;/p&gt;

&lt;h3&gt;
  
  
  The Minimal 64-bit GDT
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nasm"&gt;&lt;code&gt;&lt;span class="nf"&gt;section&lt;/span&gt; &lt;span class="nv"&gt;.rodata&lt;/span&gt;
&lt;span class="nl"&gt;gdt64:&lt;/span&gt;
    &lt;span class="kd"&gt;dq&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;                        &lt;span class="c1"&gt;; Entry 0: null descriptor (required)&lt;/span&gt;
&lt;span class="nl"&gt;.code:&lt;/span&gt; &lt;span class="nf"&gt;equ&lt;/span&gt; &lt;span class="kc"&gt;$&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nv"&gt;gdt64&lt;/span&gt;            &lt;span class="c1"&gt;; Offset of the code segment descriptor&lt;/span&gt;
    &lt;span class="kd"&gt;dq&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="mi"&gt;43&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;; Executable&lt;/span&gt;
    &lt;span class="nf"&gt;or&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="mi"&gt;44&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;; Descriptor type&lt;/span&gt;
    &lt;span class="nf"&gt;or&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="mi"&gt;47&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;; Present&lt;/span&gt;
    &lt;span class="nf"&gt;or&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="mi"&gt;53&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;; L-bit: 64-bit code segment&lt;/span&gt;
&lt;span class="nl"&gt;.pointer:&lt;/span&gt;
    &lt;span class="kd"&gt;dw&lt;/span&gt; &lt;span class="kc"&gt;$&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nv"&gt;gdt64&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;            &lt;span class="c1"&gt;; GDT limit: size minus 1&lt;/span&gt;
    &lt;span class="kd"&gt;dq&lt;/span&gt; &lt;span class="nv"&gt;gdt64&lt;/span&gt;                    &lt;span class="c1"&gt;; GDT base: linear address of gdt64&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



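&lt;p&gt;Let's unpack bit 53 specifically. Without the L-bit set, the CPU treats this descriptor as a legacy compatibility-mode segment, even in Long Mode. Your code would stay in compatibility mode forever, and 64-bit instruction encodings would be misread as 32-bit ones (REX prefixes, for example, decode as &lt;code&gt;inc&lt;/code&gt;/&lt;code&gt;dec&lt;/code&gt;). That usually ends in a crash. The L-bit is the actual "flip the world to 64-bit" switch at the descriptor level. When L is set, the D bit (bit 54) must be clear, which is why the descriptor leaves it out.&lt;/p&gt;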
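
&lt;p&gt;The descriptor's value is easy to check outside the assembler. The Rust sketch below uses hypothetical names. It rebuilds the value and tests a descriptor for the 64-bit code-segment bits, including the rule that D (bit 54) must be clear when L is set.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;const EXECUTABLE: u64 = 1 &amp;lt;&amp;lt; 43;
const CODE_OR_DATA: u64 = 1 &amp;lt;&amp;lt; 44; // descriptor type: 1 = code/data, 0 = system
const PRESENT: u64 = 1 &amp;lt;&amp;lt; 47;
const LONG_MODE: u64 = 1 &amp;lt;&amp;lt; 53;    // L-bit
const DEFAULT_SIZE: u64 = 1 &amp;lt;&amp;lt; 54; // D-bit: must be 0 when L is 1

/// The same value the `dq` in `gdt64` assembles.
const KERNEL_CODE_64: u64 = EXECUTABLE | CODE_OR_DATA | PRESENT | LONG_MODE;

/// True if `desc` describes a present 64-bit code segment.
fn is_64bit_code(desc: u64) -&amp;gt; bool {
    let required = EXECUTABLE | CODE_OR_DATA | PRESENT | LONG_MODE;
    (desc &amp;amp; required) == required &amp;amp;&amp;amp; (desc &amp;amp; DEFAULT_SIZE) == 0
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;code&gt;KERNEL_CODE_64&lt;/code&gt; works out to &lt;code&gt;0x0020_9800_0000_0000&lt;/code&gt;. If a hex dump of your GDT shows &lt;code&gt;00 00 00 00 00 98 20 00&lt;/code&gt; for entry 1, the descriptor is right.&lt;/p&gt;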

&lt;h3&gt;
  
  
  Loading the GDT and Jumping
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nasm"&gt;&lt;code&gt;    &lt;span class="c1"&gt;; Load the 64-bit GDT&lt;/span&gt;
    &lt;span class="nf"&gt;lgdt&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;gdt64.pointer&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="c1"&gt;; Far jump: loads the new CS from gdt64.code, flushes the pipeline&lt;/span&gt;
    &lt;span class="c1"&gt;; This jump is what actually puts the CPU into 64-bit mode.&lt;/span&gt;
    &lt;span class="nf"&gt;jmp&lt;/span&gt; &lt;span class="nv"&gt;gdt64.code&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nv"&gt;long_mode_start&lt;/span&gt;


&lt;span class="c1"&gt;; We are now in 64-bit Long Mode.&lt;/span&gt;
&lt;span class="nf"&gt;bits&lt;/span&gt; &lt;span class="mi"&gt;64&lt;/span&gt;
&lt;span class="nl"&gt;long_mode_start:&lt;/span&gt;
    &lt;span class="c1"&gt;; Zero out the data segment registers (they're ignored in 64-bit mode,&lt;/span&gt;
    &lt;span class="c1"&gt;; but old values can cause GPFs in some contexts)&lt;/span&gt;
    &lt;span class="nf"&gt;mov&lt;/span&gt; &lt;span class="nb"&gt;ax&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="nf"&gt;mov&lt;/span&gt; &lt;span class="nb"&gt;ss&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;ax&lt;/span&gt;
    &lt;span class="nf"&gt;mov&lt;/span&gt; &lt;span class="nb"&gt;ds&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;ax&lt;/span&gt;
    &lt;span class="nf"&gt;mov&lt;/span&gt; &lt;span class="nb"&gt;es&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;ax&lt;/span&gt;

    &lt;span class="c1"&gt;; Call the Rust kernel entry point&lt;/span&gt;
    &lt;span class="nf"&gt;extern&lt;/span&gt; &lt;span class="nv"&gt;rust_main&lt;/span&gt;
    &lt;span class="nf"&gt;call&lt;/span&gt; &lt;span class="nv"&gt;rust_main&lt;/span&gt;

    &lt;span class="c1"&gt;; rust_main should never return, but halt just in case&lt;/span&gt;
    &lt;span class="nf"&gt;hlt&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;jmp gdt64.code:long_mode_start&lt;/code&gt; is a &lt;strong&gt;far jump&lt;/strong&gt;. In one instruction it changes &lt;code&gt;EIP&lt;/code&gt;/&lt;code&gt;RIP&lt;/code&gt; and reloads &lt;code&gt;CS&lt;/code&gt; from the GDT. That reload is the whole point. The CPU caches each segment's descriptor at the moment the selector is loaded. Writing a descriptor with the L-bit into memory, or even loading the table with &lt;code&gt;lgdt&lt;/code&gt;, changes nothing until &lt;code&gt;CS&lt;/code&gt; is reloaded. Once the far jump loads the new descriptor, every following instruction is decoded as 64-bit code.&lt;/p&gt;
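
&lt;p&gt;Two operands are involved here: the selector before the colon, and the memory operand of &lt;code&gt;lgdt&lt;/code&gt;. Both follow fixed layouts. The Rust sketch below builds both, using hypothetical names.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;/// Segment selector: GDT index in bits 3..=15, table indicator in bit 2
/// (0 = GDT), requested privilege level in bits 0..=1.
fn selector(gdt_index: u16, rpl: u16) -&amp;gt; u16 {
    (gdt_index &amp;lt;&amp;lt; 3) | (rpl &amp;amp; 0b11)
}

/// The 10-byte `lgdt` operand in 64-bit mode: a 16-bit limit, then a 64-bit
/// base. In 32-bit mode `lgdt` reads only the first 6 bytes.
fn gdt_pointer(base: u64, entries: usize) -&amp;gt; [u8; 10] {
    let limit = (entries * 8 - 1) as u16; // table size in bytes, minus one
    let mut bytes = [0u8; 10];
    bytes[..2].copy_from_slice(&amp;amp;limit.to_le_bytes());
    bytes[2..].copy_from_slice(&amp;amp;base.to_le_bytes());
    bytes
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;For the two-entry &lt;code&gt;gdt64&lt;/code&gt; table, &lt;code&gt;selector(1, 0)&lt;/code&gt; is &lt;code&gt;0x08&lt;/code&gt;, which is exactly the offset that &lt;code&gt;gdt64.code&lt;/code&gt; computes. The limit is 15.&lt;/p&gt;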




&lt;h2&gt;
  
  
  7. The Rust Entry Point
&lt;/h2&gt;

&lt;p&gt;Your Rust kernel now needs a function that matches the symbol &lt;code&gt;rust_main&lt;/code&gt;. In your &lt;code&gt;main.rs&lt;/code&gt; (or &lt;code&gt;lib.rs&lt;/code&gt; if building a library kernel):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="nd"&gt;#![no_std]&lt;/span&gt;
&lt;span class="nd"&gt;#![no_main]&lt;/span&gt;

&lt;span class="k"&gt;use&lt;/span&gt; &lt;span class="nn"&gt;core&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;panic&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;PanicInfo&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="cd"&gt;/// The entry point called from assembly after Long Mode is established.&lt;/span&gt;
&lt;span class="cd"&gt;/// &lt;/span&gt;
&lt;span class="cd"&gt;/// At this point:&lt;/span&gt;
&lt;span class="cd"&gt;/// - We're in 64-bit Long Mode&lt;/span&gt;
&lt;span class="cd"&gt;/// - The first 1 GiB is identity-mapped&lt;/span&gt;
&lt;span class="cd"&gt;/// - The stack is set up (via the `stack_top` label in assembly)&lt;/span&gt;
&lt;span class="cd"&gt;/// - No interrupts are configured yet&lt;/span&gt;
&lt;span class="nd"&gt;#[no_mangle]&lt;/span&gt;
&lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="k"&gt;extern&lt;/span&gt; &lt;span class="s"&gt;"C"&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;rust_main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Write directly to VGA text buffer at physical address 0xb8000&lt;/span&gt;
    &lt;span class="c1"&gt;// This is identity-mapped, so virtual == physical here.&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;vga_buffer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0xb8000&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="nb"&gt;u8&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;b"64-bit Long Mode reached!"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;byte&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="nf"&gt;.iter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="nf"&gt;.enumerate&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;unsafe&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="c1"&gt;// Each VGA cell is 2 bytes: character byte + attribute byte&lt;/span&gt;
            &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;vga_buffer&lt;/span&gt;&lt;span class="nf"&gt;.add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;     &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;byte&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;  &lt;span class="c1"&gt;// ASCII character&lt;/span&gt;
            &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;vga_buffer&lt;/span&gt;&lt;span class="nf"&gt;.add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0x0f&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;  &lt;span class="c1"&gt;// White on black&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;loop&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nd"&gt;#[panic_handler]&lt;/span&gt;
&lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;panic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_info&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;PanicInfo&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;loop&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
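&lt;p&gt;The two writes per character above follow the VGA text-mode cell layout. The same layout can be computed in plain Rust and unit-tested on your development machine before anything touches &lt;code&gt;0xb8000&lt;/code&gt;. This is a host-side sketch, assuming the standard 80-column text mode. The helper names are hypothetical and are not part of the kernel above:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;// Hypothetical host-side helpers mirroring the VGA text-mode layout that
// rust_main writes to. Assumes the standard 80x25 text mode.
const VGA_WIDTH: usize = 80;

/// Attribute byte: background colour in the high nibble, foreground in the
/// low nibble. 0x0f is white (15) on black (0).
fn attribute(fg: u8, bg: u8) -&amp;gt; u8 {
    ((bg &amp;amp; 0x0f) &amp;lt;&amp;lt; 4) | (fg &amp;amp; 0x0f)
}

/// Byte offset of the cell at (row, col), relative to 0xb8000.
fn cell_offset(row: usize, col: usize) -&amp;gt; usize {
    (row * VGA_WIDTH + col) * 2
}

/// The same cell viewed as one little-endian u16: character in the low
/// byte, attribute in the high byte.
fn cell_u16(ch: u8, attr: u8) -&amp;gt; u16 {
    (u16::from(attr) &amp;lt;&amp;lt; 8) | u16::from(ch)
}

fn main() {
    let attr = attribute(15, 0);
    println!(
        "attr = {:#04x}, offset(1, 0) = {}, cell = {:#06x}",
        attr,
        cell_offset(1, 0),
        cell_u16(b'6', attr)
    );
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Keeping the arithmetic in small pure functions means only the final volatile write has to be &lt;code&gt;unsafe&lt;/code&gt;.&lt;/p&gt;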



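&lt;p&gt;Finally, build the Rust side for the &lt;code&gt;x86_64-unknown-none&lt;/code&gt; target (passed with &lt;code&gt;--target&lt;/code&gt; or set in &lt;code&gt;.cargo/config.toml&lt;/code&gt;) or for a custom target JSON. That target provides no standard library, which is why the crate is &lt;code&gt;#![no_std]&lt;/code&gt;. It also uses the &lt;code&gt;panic = "abort"&lt;/code&gt; strategy, so the &lt;code&gt;#[panic_handler]&lt;/code&gt; above is all the panic support you need.&lt;/p&gt;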

&lt;h2&gt;
  
  
  8. Don't Forget: The Stack
&lt;/h2&gt;

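&lt;p&gt;One thing that is easy to forget: &lt;strong&gt;you need a stack before you can call anything&lt;/strong&gt;, including &lt;code&gt;rust_main&lt;/code&gt;. Multiboot2 makes no promises about ESP, so the very first thing &lt;code&gt;_start&lt;/code&gt; does is point it at memory you reserved yourself. The x86_64 ABI also requires the stack pointer to be 16-byte aligned at the moment a &lt;code&gt;call&lt;/code&gt; instruction executes. This matters for the final &lt;code&gt;call rust_main&lt;/code&gt; in 64-bit code.&lt;/p&gt;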

&lt;p&gt;Set this up in your assembly before calling the page table setup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nasm"&gt;&lt;code&gt;&lt;span class="nf"&gt;section&lt;/span&gt; &lt;span class="nv"&gt;.bss&lt;/span&gt;
&lt;span class="nf"&gt;align&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt;
&lt;span class="nl"&gt;stack_bottom:&lt;/span&gt;
    &lt;span class="kd"&gt;resb&lt;/span&gt; &lt;span class="mi"&gt;65536&lt;/span&gt;          &lt;span class="c1"&gt;; 64 KiB of stack space&lt;/span&gt;
&lt;span class="nl"&gt;stack_top:&lt;/span&gt;

&lt;span class="c1"&gt;; ── Entry point (called by GRUB) &lt;/span&gt;
&lt;span class="nf"&gt;global&lt;/span&gt; &lt;span class="nv"&gt;_start&lt;/span&gt;
&lt;span class="nl"&gt;_start:&lt;/span&gt;
    &lt;span class="c1"&gt;; Set up a proper stack immediately&lt;/span&gt;
    &lt;span class="nf"&gt;mov&lt;/span&gt;  &lt;span class="nb"&gt;esp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;stack_top&lt;/span&gt;         &lt;span class="c1"&gt;; Stack grows downward; start at the top&lt;/span&gt;

    &lt;span class="c1"&gt;; Now safe to call functions&lt;/span&gt;
    &lt;span class="nf"&gt;call&lt;/span&gt; &lt;span class="nb"&gt;ch&lt;/span&gt;&lt;span class="nv"&gt;eck_cpuid&lt;/span&gt;
    &lt;span class="nf"&gt;call&lt;/span&gt; &lt;span class="nb"&gt;ch&lt;/span&gt;&lt;span class="nv"&gt;eck_long_mode&lt;/span&gt;
    &lt;span class="nf"&gt;call&lt;/span&gt; &lt;span class="nv"&gt;setup_page_tables&lt;/span&gt;
    &lt;span class="nf"&gt;call&lt;/span&gt; &lt;span class="nv"&gt;enable_paging&lt;/span&gt;

    &lt;span class="c1"&gt;; Load GDT and far-jump to 64-bit code&lt;/span&gt;
    &lt;span class="nf"&gt;lgdt&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;gdt64.pointer&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="nf"&gt;jmp&lt;/span&gt;  &lt;span class="nv"&gt;gdt64.code&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nv"&gt;long_mode_start&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
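&lt;p&gt;Because &lt;code&gt;stack_bottom&lt;/code&gt; is 16-byte aligned and 65536 is a multiple of 16, &lt;code&gt;stack_top&lt;/code&gt; is also 16-byte aligned. Suppose &lt;code&gt;long_mode_start&lt;/code&gt; is reached by &lt;code&gt;jmp&lt;/code&gt; with nothing left pushed, and ends with &lt;code&gt;call rust_main&lt;/code&gt; (both are assumptions here). Then RSP still equals &lt;code&gt;stack_top&lt;/code&gt; at that call, which satisfies the ABI rule. The following host-side model checks the arithmetic. The base address in it is a made-up example; only its alignment matters:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;// Host-side model of the stack reserved in .bss. The base address used in
// main() is hypothetical; only its 16-byte alignment matters.
const STACK_SIZE: u64 = 65536; // matches `resb 65536`

/// SysV x86_64 rule: RSP must be a multiple of 16 just before `call`.
fn aligned_for_call(rsp: u64) -&amp;gt; bool {
    rsp % 16 == 0
}

/// `call` pushes an 8-byte return address, so the callee sees RSP - 8.
fn rsp_at_entry(rsp_before_call: u64) -&amp;gt; u64 {
    rsp_before_call - 8
}

fn main() {
    let stack_bottom: u64 = 0x0010_8000; // hypothetical, 16-byte aligned
    let stack_top = stack_bottom + STACK_SIZE;

    // Assumes long_mode_start was entered by jmp with nothing left on the
    // stack, so RSP == stack_top at `call rust_main`.
    assert!(aligned_for_call(stack_top));
    println!(
        "stack_top = {:#x}, RSP mod 16 at rust_main entry = {}",
        stack_top,
        rsp_at_entry(stack_top) % 16
    );
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;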






&lt;h2&gt;
  
  
  The Full Transition Sequence (Summary)
&lt;/h2&gt;

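&lt;p&gt;Here's the complete order of operations. Not every step depends on its neighbours, but these constraints are hard. ESP must be valid before the first &lt;code&gt;call&lt;/code&gt;. CR4.PAE, CR3 and EFER.LME must all be set before CR0.PG. The 64-bit GDT must be loaded before the far jump.&lt;br&gt;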
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;GRUB loads kernel → Protected Mode (32-bit)
        │
        ▼
 1. Set ESP to stack_top
 2. check_cpuid         — confirm CPUID exists
 3. check_long_mode     — confirm LM bit via CPUID 0x80000001
 4. setup_page_tables   — fill P4/P3/P2; map first 1 GiB (huge pages)
 5. Set PAE in CR4      — required before LME
 6. Load P4 into CR3    — set page table root
 7. Set LME in EFER     — tell CPU Long Mode is desired
 8. Set PG + PE in CR0  — actually activate paging → IA-32e active
 9. lgdt gdt64          — load 64-bit GDT (L-bit set in code descriptor)
10. far jmp 64-bit CS   — reload CS from the L=1 descriptor → 64-bit mode
        │
        ▼
   long_mode_start      — 64-bit assembly
        │
        ▼
   rust_main()          — Rust kernel
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
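&lt;p&gt;The constraints around step 8 can be modelled in a few lines of ordinary Rust. This is a simplified host-side sketch, not kernel code. &lt;code&gt;Cpu&lt;/code&gt;, &lt;code&gt;Outcome&lt;/code&gt; and &lt;code&gt;set_pg&lt;/code&gt; are hypothetical names, and CR3 is not modelled. Only the bit positions come from the Intel and AMD manuals: CR0.PE bit 0, CR0.PG bit 31, CR4.PAE bit 5, EFER.LME bit 8 and EFER.LMA bit 10.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;// Architectural bit positions (Intel SDM / AMD APM).
const CR0_PE: u64 = 1 &amp;lt;&amp;lt; 0;
const CR0_PG: u64 = 1 &amp;lt;&amp;lt; 31;
const CR4_PAE: u64 = 1 &amp;lt;&amp;lt; 5;
const EFER_LME: u64 = 1 &amp;lt;&amp;lt; 8; // EFER is MSR 0xC0000080
const EFER_LMA: u64 = 1 &amp;lt;&amp;lt; 10; // set by the CPU, never written directly

#[derive(Debug, PartialEq)]
enum Outcome {
    GeneralProtection, // PG set with EFER.LME = 1 but CR4.PAE = 0
    LegacyPaging,      // PG set without LME: 32-bit paging, no long mode
    Ia32eActive,       // LMA = 1: compatibility mode until CS is reloaded
}

// Hypothetical model of the registers touched in steps 5, 7 and 8.
struct Cpu {
    cr0: u64,
    cr4: u64,
    efer: u64,
}

impl Cpu {
    /// Simplified model of writing CR0 with PG set (step 8).
    fn set_pg(&amp;amp;mut self) -&amp;gt; Outcome {
        let lme = (self.efer &amp;amp; EFER_LME) != 0;
        if lme &amp;amp;&amp;amp; (self.cr4 &amp;amp; CR4_PAE) == 0 {
            return Outcome::GeneralProtection; // CR0 is left unchanged
        }
        self.cr0 |= CR0_PG;
        if lme {
            self.efer |= EFER_LMA;
            Outcome::Ia32eActive
        } else {
            Outcome::LegacyPaging
        }
    }
}

fn main() {
    // Steps 5 and 7 done first, then step 8.
    let mut good = Cpu { cr0: CR0_PE, cr4: CR4_PAE, efer: EFER_LME };
    println!("{:?}", good.set_pg());

    // LME set but PAE forgotten: the CR0 write faults.
    let mut bad = Cpu { cr0: CR0_PE, cr4: 0, efer: EFER_LME };
    println!("{:?}", bad.set_pg());
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Note that even the successful case ends in compatibility mode. Only the far jump in step 10 moves the CPU into true 64-bit mode.&lt;/p&gt;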






&lt;h2&gt;
  
  
  Common Mistakes and How to Debug Them
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Triple fault on paging enable&lt;/strong&gt;: Usually means CR3 points to garbage, the P4 table isn't page-aligned, or the identity mapping doesn't cover the code currently executing. Print the P4 address to a serial port before loading CR3.&lt;/p&gt;
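&lt;p&gt;When the identity mapping doesn't cover the executing code, it helps to compute which table entries an address actually uses. This host-side sketch assumes the layout from step 4: P4[0] points to P3[0], which points to one P2 table holding 512 huge pages of 2 MiB each. The helper names are hypothetical:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;// P2 entry flags for a 2 MiB identity map.
const PRESENT: u64 = 1 &amp;lt;&amp;lt; 0;
const WRITABLE: u64 = 1 &amp;lt;&amp;lt; 1;
const HUGE_PAGE: u64 = 1 &amp;lt;&amp;lt; 7; // PS bit: this P2 entry maps 2 MiB directly
const TWO_MIB: u64 = 2 * 1024 * 1024;

/// P4, P3 and P2 indices (9 bits each) of a virtual address.
fn table_indices(addr: u64) -&amp;gt; (u64, u64, u64) {
    (
        (addr &amp;gt;&amp;gt; 39) &amp;amp; 0x1ff,
        (addr &amp;gt;&amp;gt; 30) &amp;amp; 0x1ff,
        (addr &amp;gt;&amp;gt; 21) &amp;amp; 0x1ff,
    )
}

/// The i-th P2 entry of an identity map built from 2 MiB huge pages.
fn p2_entry(i: u64) -&amp;gt; u64 {
    (i * TWO_MIB) | PRESENT | WRITABLE | HUGE_PAGE
}

/// With only P4[0] and P3[0] populated, addresses below
/// 512 * 2 MiB = 1 GiB are mapped and nothing else is.
fn covered_by_identity_map(addr: u64) -&amp;gt; bool {
    let (p4, p3, _) = table_indices(addr);
    p4 == 0 &amp;amp;&amp;amp; p3 == 0
}

fn main() {
    let kernel_entry: u64 = 0x0010_0000; // hypothetical: kernel loaded at 1 MiB
    println!(
        "indices = {:?}, covered = {}, P2[1] = {:#x}",
        table_indices(kernel_entry),
        covered_by_identity_map(kernel_entry),
        p2_entry(1)
    );
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;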

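&lt;p&gt;&lt;strong&gt;Triple fault on the far jump&lt;/strong&gt;: The cause is usually one of three things: your GDT descriptor is malformed, the L-bit isn't set, or the selector in the jump doesn't match the descriptor's offset in the GDT. Double-check your &lt;code&gt;dq&lt;/code&gt; expression for the code descriptor. Run QEMU with &lt;code&gt;-no-reboot -d int,cpu_reset&lt;/code&gt; to log the exception chain instead of silently rebooting. Then inspect the state right before the jump, either with GDB (attached via QEMU's &lt;code&gt;-s -S&lt;/code&gt;) or with the QEMU monitor's &lt;code&gt;info registers&lt;/code&gt;, which also shows CR0, CR3, CR4, EFER and the GDTR.&lt;/p&gt;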
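&lt;p&gt;One way to double-check the code descriptor is to build it from named bits and assert on it. In this host-side sketch, the constant and function names are hypothetical. The bit positions are the architectural ones: executable bit 43, descriptor-type bit 44, present bit 47, L bit 53 and D bit 54.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;// Bits of a code-segment descriptor that matter in long mode.
const EXECUTABLE: u64 = 1 &amp;lt;&amp;lt; 43;
const CODE_OR_DATA: u64 = 1 &amp;lt;&amp;lt; 44; // S bit: not a system descriptor
const PRESENT: u64 = 1 &amp;lt;&amp;lt; 47;
const LONG_MODE: u64 = 1 &amp;lt;&amp;lt; 53; // L bit
const DEFAULT_SIZE: u64 = 1 &amp;lt;&amp;lt; 54; // D bit: must be 0 when L = 1

const CODE64: u64 = EXECUTABLE | CODE_OR_DATA | PRESENT | LONG_MODE;

/// True if `desc` is a present code descriptor with L = 1 and D = 0.
fn is_64bit_code(desc: u64) -&amp;gt; bool {
    let required = EXECUTABLE | CODE_OR_DATA | PRESENT | LONG_MODE;
    (desc &amp;amp; required) == required &amp;amp;&amp;amp; (desc &amp;amp; DEFAULT_SIZE) == 0
}

fn main() {
    println!("CODE64 = {:#018x}, valid = {}", CODE64, is_64bit_code(CODE64));
    // Without the L bit, the far jump lands in compatibility mode instead.
    println!("without L: valid = {}", is_64bit_code(CODE64 &amp;amp; !LONG_MODE));
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;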

&lt;p&gt;&lt;strong&gt;Rust code crashes immediately&lt;/strong&gt;: Check your stack alignment. The x86_64 ABI requires RSP to be a multiple of 16 immediately before the &lt;code&gt;call&lt;/code&gt; instruction. Because &lt;code&gt;call&lt;/code&gt; then pushes an 8-byte return address, RSP at function entry is 8 more than a multiple of 16 (RSP mod 16 = 8). The compiler lays out the callee's stack frame on that assumption.&lt;/p&gt;

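&lt;p&gt;&lt;strong&gt;Code runs but produces garbage&lt;/strong&gt;: You're probably still in compatibility mode, not 64-bit mode, so the CPU decodes your 64-bit instructions as 32-bit ones. For example, REX prefixes are read as single-byte &lt;code&gt;inc&lt;/code&gt;/&lt;code&gt;dec&lt;/code&gt; instructions. Verify the L-bit in your GDT descriptor, and check that the code after the far jump is assembled under &lt;code&gt;bits 64&lt;/code&gt;. To confirm which mode you are in, look at EFER.LMA (bit 10) and at the flags of the loaded CS descriptor in QEMU's &lt;code&gt;info registers&lt;/code&gt; output.&lt;/p&gt;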
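&lt;p&gt;Which mode you are in follows mechanically from two facts: whether long mode is active (EFER.LMA), and whether the loaded CS descriptor has its L bit set. This small host-side sketch classifies those two values as read off the debugger. The names are hypothetical; only the EFER.LMA bit position comes from the manuals.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;const EFER_LMA: u64 = 1 &amp;lt;&amp;lt; 10; // long mode active, set by the CPU

#[derive(Debug, PartialEq)]
enum Mode {
    Legacy,        // LMA = 0: legacy (non-IA-32e) mode, e.g. 32-bit protected mode
    Compatibility, // LMA = 1, CS.L = 0: IA-32e, but running 32-bit code
    Long64,        // LMA = 1, CS.L = 1: true 64-bit mode
}

/// Classify the CPU mode from EFER and the L bit of the current CS descriptor.
fn mode(efer: u64, cs_l_bit: bool) -&amp;gt; Mode {
    match ((efer &amp;amp; EFER_LMA) != 0, cs_l_bit) {
        (false, _) =&amp;gt; Mode::Legacy,
        (true, false) =&amp;gt; Mode::Compatibility,
        (true, true) =&amp;gt; Mode::Long64,
    }
}

fn main() {
    // 0x500 = LME | LMA, assuming no other EFER bits (such as NXE) are set.
    println!("{:?}", mode(0x500, true));
    println!("{:?}", mode(0x500, false));
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;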

</description>
      <category>rust</category>
      <category>assembly</category>
      <category>osdev</category>
    </item>
  </channel>
</rss>
