UnitBuilds for UnitBuilds CC

V.E.L.O.C.I.T.Y.-OS: Reclaiming Ring 0 – UEFI Bootloader & GDT/IDT (Part 8)

Up until this point, I had built an incredible JIT compiler, but it was still running on top of Windows.

If I wanted true zero-allocation, microsecond execution, I had to control the hardware page tables, the instruction pipeline, and the CPU registers directly. I needed to write my own operating system.


The V.E.L.O.C.I.T.Y.-OS 12-Part Roadmap

We are building a bare-metal, self-healing operating system running entirely inside the CPU's L3 cache. Here is the roadmap for this 12-part series:

  1. Part 1: The Spark — Exposing the "Safe-Room" security leak and building the compiler gate.
  2. Part 2: The NDA Language — Designing a content-addressed triplet representation to cure context bloat.
  3. Part 3: Ditching the Web Stack — Building a native 30MB IDE with 1,500,000x IPC latency drops.
  4. Part 4: The Closure JIT — Compiling AST blocks to nested closures and bypassing borrow checker limits.
  5. Part 5: JIT Math Optimizations — Replacing division operations with precomputed 16-bit lookup tables.
  6. Part 6: x86-64 Assembler & SCEV-Lite — Compiling scalar loops directly to native code in constant time.
  7. Part 7: Classic Compiler Passes — Implementing inter-procedural Dead Code Elimination and loop unrolling.
  8. Part 8: Reclaiming Ring 0 — Exiting UEFI boot services and transitioning the kernel to Ring 0. (You are here)
  9. Part 9: Bare-Metal Drivers — Writing a PCI scanner, NVMe block storage controller, and FAT32 parser.
  10. Part 10: Synaptic Canvas — Rendering a spatial, force-directed GUI based on model token activation vectors.
  11. Part 11: Swarms & Hot-Patching — Building multi-agent scheduling and zero-downtime RCU driver updates.
  12. Part 12: Self-Evolution — Handing system control over to a local LLM Terminal that self-optimizes via telemetry.


On Saturday morning, June 27th, the sprint to bare metal began.

Step 1: The UEFI Bootloader

I created a new sub-crate, velocity-bootloader, configured as a #![no_std] and #![no_main] application.

The bootloader boots under UEFI, using the uefi crate to query firmware interfaces, set up console logging, and allocate initial memory pages.

But the core of V.E.L.O.C.I.T.Y.-OS is a Single-Address-Space Operating System (SASOS). I don't want to run inside the restricted UEFI boot environment. I want to exit boot services and reclaim the processor.

Step 2: Transitioning to Ring 0

To safely exit UEFI, I implemented three core modules:

  1. The Heap Allocator (allocator.rs): Before calling exit_boot_services(), I pre-allocated a contiguous 16 MB block of conventional RAM pages from UEFI. I initialized my own global heap allocator (linked_list_allocator::LockedHeap) on this block, so dynamic heap operations (vectors, maps) keep working after boot services terminate.
  2. The GDT and Task State Segment (gdt.rs): I configured flat 64-bit kernel code/data segments. I set up the Task State Segment (TSS) with an Interrupt Stack Table (IST) that routes double-fault exceptions to a dedicated stack, so a corrupted or overflowed kernel stack cannot escalate into a triple fault and a CPU reset.

Here is the GDT and TSS stack allocation setup in src/gdt.rs that loads segment selectors and maps the double fault handler stack:

// velocity-bootloader/src/gdt.rs: GDT & TSS setup
use x86_64::structures::gdt::{Descriptor, GlobalDescriptorTable};
use x86_64::structures::tss::TaskStateSegment;
use x86_64::VirtAddr;

/// IST slot reserved for the double-fault handler's stack.
pub const DOUBLE_FAULT_IST_INDEX: u16 = 0;
/// Size of the dedicated double-fault stack (five 4 KiB pages).
const DOUBLE_FAULT_STACK_SIZE: usize = 4096 * 5;

static mut TSS: TaskStateSegment = TaskStateSegment::new();
static mut GDT: GlobalDescriptorTable = GlobalDescriptorTable::new();
static mut DOUBLE_FAULT_STACK: [u8; DOUBLE_FAULT_STACK_SIZE] = [0; DOUBLE_FAULT_STACK_SIZE];

/// Must be called exactly once, early in boot, before interrupts are enabled.
pub fn init() {
    use x86_64::instructions::segmentation::{Segment, CS, DS, SS};
    use x86_64::instructions::tables::load_tss;

    unsafe {
        // Separate stack for the double-fault handler to prevent triple faults.
        // The stack grows downward, so the IST entry holds the top of the buffer.
        let stack_start = VirtAddr::from_ptr(core::ptr::addr_of!(DOUBLE_FAULT_STACK));
        let stack_end = stack_start + DOUBLE_FAULT_STACK_SIZE as u64;
        TSS.interrupt_stack_table[DOUBLE_FAULT_IST_INDEX as usize] = stack_end;

        // Populate segments: kernel code, kernel data, then the TSS descriptor
        let mut gdt = GlobalDescriptorTable::new();
        let code_selector = gdt.add_entry(Descriptor::kernel_code_segment());
        let data_selector = gdt.add_entry(Descriptor::kernel_data_segment());
        let tss_selector = gdt.add_entry(Descriptor::tss_segment(&TSS));

        // The table must live in a static: the CPU keeps reading it after init() returns
        GDT = gdt;
        GDT.load();

        // Reload segment selectors so the CPU stops using the firmware's GDT entries
        CS::set_reg(code_selector);
        DS::set_reg(data_selector);
        SS::set_reg(data_selector);
        load_tss(tss_selector);
    }
}
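To make the "flat 64-bit segments" above less abstract, here is a minimal sketch of how such descriptors and selectors are packed at the bit level. It runs on a normal host and does not touch the CPU. The `descriptor` and `selector` helpers are hypothetical names for illustration, not part of the x86_64 crate, and the values are the textbook encodings; the crate may set extra bits (such as the accessed bit) in the descriptors it generates.

```rust
/// Pack a segment descriptor into its 64-bit GDT representation.
/// `access` is the access byte (bits 40..48); `flags` is the nibble in bits 52..56.
const fn descriptor(base: u32, limit: u32, access: u8, flags: u8) -> u64 {
    let base = base as u64;
    let limit = limit as u64;
    (limit & 0xFFFF)                      // limit bits 0..16
        | ((base & 0x00FF_FFFF) << 16)    // base bits 0..24
        | ((access as u64) << 40)         // present, DPL, type
        | (((limit >> 16) & 0xF) << 48)   // limit bits 16..20
        | (((flags as u64) & 0xF) << 52)  // AVL, L, D/B, G
        | (((base >> 24) & 0xFF) << 56)   // base bits 24..32
}

/// Build a segment selector: GDT index in bits 3.., TI = 0 (GDT), RPL in bits 0..2.
const fn selector(index: u16, rpl: u16) -> u16 {
    (index << 3) | (rpl & 0b11)
}

fn main() {
    // Access 0x9A: present, ring 0, executable, readable. Flags 0xA: G = 1, L = 1 (64-bit code).
    let kernel_code = descriptor(0, 0xF_FFFF, 0x9A, 0xA);
    // Access 0x92: present, ring 0, writable data. Flags 0xC: G = 1, D/B = 1.
    let kernel_data = descriptor(0, 0xF_FFFF, 0x92, 0xC);
    println!("code = {kernel_code:#018x}, data = {kernel_data:#018x}");

    // Entry 0 is the mandatory null descriptor, so the first three real entries
    // land at indices 1, 2, and 3.
    println!(
        "selectors: code = {:#x}, data = {:#x}, tss = {:#x}",
        selector(1, 0),
        selector(2, 0),
        selector(3, 0)
    );
}
```

Base and limit are ignored for code and data segments in long mode, which is why "flat" segments simply use base 0 and the maximum limit.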
  3. Interrupt Descriptors (interrupts.rs): I initialized the IDT, remapping the 8259 PIC interrupts to offsets 0x20 and 0x28. I wrote custom interrupt service routines (ISRs) for IRQ 0 (Timer), IRQ 1 (PS/2 Keyboard), and IRQ 4 (COM1 Serial).
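The PIC remapping in the interrupts module comes down to simple vector arithmetic. Vectors 0 to 31 are reserved for CPU exceptions, so hardware IRQs have to be moved above them. The sketch below only computes which IDT vector each IRQ line raises once the offsets from the text are applied. `irq_to_vector` is a hypothetical helper for illustration, not code from interrupts.rs.

```rust
/// Vector offsets for the master and slave 8259 PICs (values taken from the text).
const PIC1_OFFSET: u8 = 0x20;
const PIC2_OFFSET: u8 = 0x28;

/// Map a legacy IRQ line (0..=15) to the IDT vector it raises after remapping.
fn irq_to_vector(irq: u8) -> Option<u8> {
    match irq {
        0..=7 => Some(PIC1_OFFSET + irq),        // master PIC handles IRQ 0-7
        8..=15 => Some(PIC2_OFFSET + (irq - 8)), // slave PIC handles IRQ 8-15
        _ => None,                               // the 8259 pair has only 16 lines
    }
}

fn main() {
    for (name, irq) in [("timer", 0u8), ("keyboard", 1), ("COM1", 4)] {
        println!("{name}: IRQ {irq} -> vector {:#x}", irq_to_vector(irq).unwrap());
    }
}
```

With these offsets the three handlers mentioned above sit at IDT vectors 0x20 (timer), 0x21 (keyboard), and 0x24 (COM1).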

The diagram below shows how CPU control moves from UEFI boot services to our own bare-metal kernel:

Diagram showing CPU transition from UEFI Boot Services to custom bare metal kernel with GDT, IDT and TSS stack

Fig 1: Transitioning the execution context from UEFI Boot Services to Ring 0 Kernel Mode.
// Exiting boot services and taking raw CPU control
let (system_table, memory_map) = boot_services.exit_boot_services(image_handle, &mut map_buf);
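Ordering matters here: the heap region described in module 1 has to be reserved before this call, because the firmware's page allocator is gone afterwards. The sketch below illustrates the idea with a deliberately simpler technique, a bump allocator over a fixed address range, rather than the linked_list_allocator the kernel actually uses. `BumpRegion` and the base address are made up for illustration, and the code only does address arithmetic, so it runs on a normal host.

```rust
/// Size of the pre-reserved heap region from the text, and the 4 KiB UEFI page size.
const HEAP_SIZE: usize = 16 * 1024 * 1024;
const PAGE_SIZE: usize = 4096;

/// Hypothetical bump allocator: hands out addresses from [start, end) and never frees.
struct BumpRegion {
    next: usize,
    end: usize,
}

impl BumpRegion {
    fn new(start: usize, size: usize) -> Self {
        BumpRegion { next: start, end: start + size }
    }

    /// Reserve `size` bytes aligned to `align` (a power of two); None when exhausted.
    fn alloc(&mut self, size: usize, align: usize) -> Option<usize> {
        debug_assert!(align.is_power_of_two());
        // Round the cursor up to the requested alignment.
        let aligned = self.next.checked_add(align - 1)? & !(align - 1);
        let new_next = aligned.checked_add(size)?;
        if new_next > self.end {
            return None;
        }
        self.next = new_next;
        Some(aligned)
    }
}

fn main() {
    // Number of pages to request from the firmware before exiting boot services.
    let pages = HEAP_SIZE / PAGE_SIZE;
    println!("pages to reserve: {pages}"); // 4096

    // 0x10_0000 is a made-up base address standing in for what the firmware returns.
    let mut heap = BumpRegion::new(0x10_0000, HEAP_SIZE);
    let a = heap.alloc(24, 8).unwrap();
    let b = heap.alloc(100, 64).unwrap();
    println!("a = {a:#x}, b = {b:#x}");

    // A request larger than the remaining space fails instead of calling the firmware.
    assert!(heap.alloc(HEAP_SIZE, 8).is_none());
}
```

A real kernel allocator also needs deallocation and locking, which is what linked_list_allocator::LockedHeap provides on top of the same pre-reserved block.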

The Bare-Metal Performance Gain

Running in Ring 0 with no OS scheduler traps or firmware polling overhead produced a large speedup:

  • Fibonacci execution: dropped from 53M cycles under UEFI to 25M cycles bare-metal (a 2.1x speedup).
  • Neural Net Layer GEMV: dropped from 55M cycles to 11M cycles (a 5.0x speedup).

The kernel compiled down to less than 6 MB, small enough for the whole operating system to fit and run inside the CPU's L3 cache!

Pascal's Analysis: The Bootstrapping Legend

When I shared the QEMU boot logs, Pascal linked the design choices to classic computer science:

"Bare-metal NDA without dependencies means... the first NDA interpreter has to be written in something else — assembly or a minimal C stub — to pull itself up by its own bootstraps. That's the same path Forth took in the 70s, and it's still the cleanest approach for a self-hosting language at bare metal."

Pascal noted that by combining Merkle validation with a bare-metal kernel, the system was cryptographically secure by construction: if the boot code's Merkle root didn't validate, the processor would refuse to execute.

But a bare-metal kernel is useless without disk storage. I needed to write drivers to read files from NVMe drives.

In the next post, I'll document how I wrote a PCI configuration scanner, an NVMe block storage driver, and a custom FAT32 filesystem from scratch.

Discussion

Have you written UEFI bootloaders or OS kernels in Rust? What are the biggest hurdles you faced when exiting UEFI boot services and transitioning control to your custom GDT and IDT? Let's discuss in the comments below!


Special thanks to Pascal for grounding my bare-metal sprint in the historical wisdom of Forth and Lisp machines.


Disclaimer: AI was used throughout this project, so it is only fitting that it co-authors with me. Special thanks to the Foundry for its tireless hours toiling away and to Gemini for producing the cover image.

Top comments (1)

UnitBuilds UnitBuilds CC

@pascal_cescato_692b7a8a20 Enter the age of bare-metal 🥳