Rust promises memory safety without garbage collection. That's why many of us dream of writing a kernel in it. After several years of building a from‑scratch operating system in Rust, I've collected the real — not theoretical — challenges that will make you question your life choices.
Here are the five hardest problems, and the pragmatic solutions that actually work.
1. The unsafe Infection: Your Core is Not Safe
The kernel's job is to manage memory, poke hardware registers, and handle interrupts. That means unsafe is not an exception — it's the norm.
The problem: A single unsafe block can corrupt state that safe code depends on. In userspace, you isolate unsafe behind a small API. In the kernel, the entire bottom layer is unsafe. A bug in the page fault handler trashes everything.
What doesn't work:
Pretending that "only 5% of the code is unsafe". In practice, the scheduler, the memory allocator, the interrupt handlers — they all need unsafe. You can't push it to the edges.
What works:
Treat unsafe as a capability.
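- Every unsafe block gets a // SAFETY: comment explaining why it's sound, and every unsafe fn documents the contract its callers must uphold (by convention, in a # Safety doc section).
- Use compile-time assertions (const_assert! from the static_assertions crate, or a plain const _: () = assert!(...)) to validate invariants at compile time.
- Isolate hardware access behind a hal crate where unsafe is contained, but don't cheat: the rest of the kernel still needs unsafe for core operations.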
Example — writing to a memory-mapped register:
/// # Safety
/// `addr` must be a valid MMIO address for this device,
/// aligned to 4 bytes, and the caller must hold the device lock.
pub unsafe fn mmio_write(addr: *mut u32, value: u32) {
    // SAFETY: the caller upholds the contract documented above.
    unsafe { addr.write_volatile(value) };
}
The comment doesn't make it safe — it documents the contract so the caller knows what they must guarantee.
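To make "unsafe as a capability" concrete, here is a minimal sketch (not the author's code) built around a hypothetical Reg32 type: its constructor is unsafe, because that is where the address gets vouched for, while reads and writes afterwards are safe. The hypothetical PageTableEntry type shows a compile-time invariant written as a plain const assertion from core instead of the static_assertions crate.

```rust
/// Hypothetical handle to one 32-bit MMIO register. Holding a Reg32 is the
/// capability: the unsafe promise is made once, in `new`.
pub struct Reg32 {
    addr: *mut u32, // raw pointer also makes Reg32 !Send and !Sync
}

impl Reg32 {
    /// # Safety
    /// `addr` must point to a valid, 4-byte-aligned register that stays
    /// mapped while this value lives, and no other Reg32 may alias it.
    pub unsafe fn new(addr: *mut u32) -> Self {
        Reg32 { addr }
    }

    pub fn read(&self) -> u32 {
        // SAFETY: guaranteed valid and aligned by the contract of `new`.
        unsafe { self.addr.read_volatile() }
    }

    pub fn write(&mut self, value: u32) {
        // SAFETY: as above; `&mut self` rules out a second writer via this handle.
        unsafe { self.addr.write_volatile(value) }
    }
}

/// Hypothetical page-table entry type, used only to show a build-time check.
#[repr(transparent)]
pub struct PageTableEntry(pub u64);

// The build fails if this invariant is ever broken.
const _: () = assert!(core::mem::size_of::<PageTableEntry>() == 8);
```

Driver code that holds a Reg32 never writes unsafe itself, so the audit surface shrinks to the call sites of Reg32::new.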
2. Memory Allocation Before alloc
You want Vec, Box, Arc. But alloc requires a global allocator. The allocator requires a lock. The lock requires a working scheduler. The scheduler requires memory allocation. Classic chicken‑and‑egg.
The problem: You can't allocate memory to create the structures that manage memory allocation.
What doesn't work:
Waiting until "later" to set up the allocator. You need dynamic data structures early in boot (e.g., to build the initial free list).
What works:
Two‑phase allocation:
Phase 1 — Bootstrap allocator: A simple bump allocator (just increment a pointer) that runs before any locks or scheduler. It can allocate but never free.
Phase 2 — Real allocator: After you have a working scheduler and a spinlock, you replace the bootstrap allocator with a proper buddy allocator or slab allocator.
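For Phase 2, a slab allocator can be as small as a free list threaded through equal-sized blocks. The sketch below uses a hypothetical Slab type with a single block size, no locking, and no GlobalAlloc integration; a real kernel would keep one slab per size class behind the spinlock mentioned above.

```rust
use core::mem::{align_of, size_of};
use core::ptr::null_mut;

/// Hypothetical single-size slab. Free blocks form an intrusive singly linked
/// list: each free block stores the address of the next free block in its first word.
pub struct Slab {
    free_head: *mut u8,
}

impl Slab {
    /// # Safety
    /// `region` must be valid for reads and writes of `len` bytes, aligned to
    /// `align_of::<usize>()`, and unused by anything else while the slab lives.
    /// `block_size` must be at least `size_of::<usize>()` and a multiple of its alignment.
    pub unsafe fn new(region: *mut u8, len: usize, block_size: usize) -> Self {
        debug_assert!(block_size >= size_of::<usize>() && block_size % align_of::<usize>() == 0);
        let mut free_head: *mut u8 = null_mut();
        // Link blocks in reverse so the lowest address is handed out first.
        for i in (0..len / block_size).rev() {
            let block = unsafe { region.add(i * block_size) };
            unsafe { block.cast::<*mut u8>().write(free_head) };
            free_head = block;
        }
        Slab { free_head }
    }

    /// Pop a free block in O(1), or return null when the slab is exhausted.
    pub fn alloc(&mut self) -> *mut u8 {
        let block = self.free_head;
        if !block.is_null() {
            // SAFETY: `block` is free, so its first word holds the next link.
            self.free_head = unsafe { block.cast::<*mut u8>().read() };
        }
        block
    }

    /// # Safety
    /// `ptr` must come from this slab's `alloc` and must not be freed twice.
    pub unsafe fn free(&mut self, ptr: *mut u8) {
        unsafe { ptr.cast::<*mut u8>().write(self.free_head) };
        self.free_head = ptr;
    }
}
```

Both operations are a pointer swap, which is why slabs suit hot, fixed-size kernel objects; a buddy allocator is the usual companion for variable-sized page ranges.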
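// Bootstrap: just move a pointer. Nothing is ever freed.
static mut BOOT_HEAP_START: usize = 0; // set during early init from the memory map (not shown)
static mut BOOT_HEAP_END: usize = 0;
static mut BOOT_HEAP_OFFSET: usize = 0;
/// # Safety
/// Single-threaded early boot only; START..END must be usable RAM and `align` a power of two.
pub unsafe fn boot_alloc(size: usize, align: usize) -> *mut u8 {
    unsafe {
        let start = (BOOT_HEAP_START + BOOT_HEAP_OFFSET + align - 1) & !(align - 1);
        if start + size > BOOT_HEAP_END {
            return core::ptr::null_mut(); // boot heap exhausted
        }
        BOOT_HEAP_OFFSET = start + size - BOOT_HEAP_START;
        start as *mut u8
    }
}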
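Later, you register an allocator with #[global_allocator] so that Vec, Box, and Arc from the alloc crate can use it. A program may declare only one #[global_allocator], so in practice "replacing" the bootstrap allocator means the registered allocator switches from the bump arena to the real implementation once that is ready.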
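A minimal sketch of that switch-over, with hypothetical names throughout (BumpArena, KernelAllocator, enable_real): one allocator type is registered; it bumps through a fixed arena until enable_real is called and then forwards to whatever real allocator R it wraps. Boot-phase memory is simply never freed, matching the bump allocator's contract.

```rust
use core::alloc::{GlobalAlloc, Layout};
use core::cell::UnsafeCell;
use core::ptr::null_mut;
use core::sync::atomic::{AtomicBool, AtomicUsize, Ordering};

/// Phase 1: a bump arena over a fixed buffer. Allocates, never frees.
pub struct BumpArena<const N: usize> {
    buf: UnsafeCell<[u8; N]>,
    next: AtomicUsize, // offset of the first unused byte
}

// SAFETY: `next` only moves forward via compare_exchange, so no two callers
// are ever handed overlapping ranges of `buf`.
unsafe impl<const N: usize> Sync for BumpArena<N> {}

impl<const N: usize> BumpArena<N> {
    pub const fn new() -> Self {
        Self { buf: UnsafeCell::new([0; N]), next: AtomicUsize::new(0) }
    }

    fn bump_alloc(&self, layout: Layout) -> *mut u8 {
        let base = self.buf.get() as usize;
        let mut cur = self.next.load(Ordering::Relaxed);
        loop {
            // Layout guarantees a power-of-two alignment, so this rounds up.
            let start = (base + cur + layout.align() - 1) & !(layout.align() - 1);
            let end = start + layout.size();
            if end > base + N {
                return null_mut(); // arena exhausted
            }
            match self.next.compare_exchange_weak(cur, end - base, Ordering::Relaxed, Ordering::Relaxed) {
                Ok(_) => return start as *mut u8,
                Err(seen) => cur = seen, // another CPU bumped first; retry
            }
        }
    }

    fn contains(&self, ptr: *mut u8) -> bool {
        let base = self.buf.get() as usize;
        (base..base + N).contains(&(ptr as usize))
    }
}

/// The single registered allocator: bump arena first, real allocator `R` later.
pub struct KernelAllocator<R, const N: usize> {
    boot: BumpArena<N>,
    real: R,
    real_ready: AtomicBool,
}

impl<R, const N: usize> KernelAllocator<R, N> {
    pub const fn new(real: R) -> Self {
        Self { boot: BumpArena::new(), real, real_ready: AtomicBool::new(false) }
    }

    /// Call once the real allocator (and its lock) is initialized.
    pub fn enable_real(&self) {
        self.real_ready.store(true, Ordering::Release);
    }

    pub fn is_boot_memory(&self, ptr: *mut u8) -> bool {
        self.boot.contains(ptr)
    }
}

unsafe impl<R: GlobalAlloc, const N: usize> GlobalAlloc for KernelAllocator<R, N> {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        if self.real_ready.load(Ordering::Acquire) {
            unsafe { self.real.alloc(layout) }
        } else {
            self.boot.bump_alloc(layout)
        }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        // Boot-phase blocks are never freed; everything else belongs to `real`.
        if !self.boot.contains(ptr) {
            unsafe { self.real.dealloc(ptr, layout) }
        }
    }
}
```

In a kernel this would be registered once, for example as a #[global_allocator] static of type KernelAllocator<SlabAlloc, 65536>, where SlabAlloc stands in for your own GlobalAlloc implementation with a const constructor, and enable_real() would be called after the real allocator's metadata is built. For experimenting on a host, std::alloc::System can play the role of R.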
3. Interrupts: The Stack is a Hostile Environment
Interrupt handlers can run between any two instructions. They can't block, they can't allocate, and they must be extremely fast. In Rust, they also must not unwind: a panic that halts the kernel is fine, but unwinding is not.
The problem: Normal Rust code assumes it can panic and unwind. In an interrupt handler, unwinding would corrupt the interrupted context.
What doesn't work:
Using unwrap() or expect() anywhere near an interrupt. Even debug assertions that may panic are dangerous.
What works:
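Write interrupt entry points as naked functions (or separate assembly stubs) that save and restore registers and then call a Rust extern "C" fn that never panics.
On older nightly toolchains, naked functions required #![feature(naked_functions)]; since Rust 1.88 they are stable, written as #[unsafe(naked)] with a core::arch::naked_asm! body.
For the non‑naked portion, use #![deny(unsafe_op_in_unsafe_fn)] so that every unsafe operation needs its own explicit, reviewable unsafe block, even inside an unsafe fn.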
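To keep panicking paths out of handler code mechanically, Clippy's restriction lints can be denied on the modules that run in interrupt context (they are enforced when running cargo clippy). The module below is a hypothetical example: the lint list is one reasonable choice, and the scancode table is a tiny illustrative excerpt. Lookups use get() instead of indexing, and counters use saturating_add instead of +, which panics on overflow in debug builds.

```rust
#[deny(clippy::unwrap_used, clippy::expect_used, clippy::indexing_slicing, clippy::panic)]
pub mod irq_safe {
    /// Illustrative excerpt of a scancode-to-ASCII table (0 = no character).
    static SCANCODE_MAP: [u8; 4] = [0, 27, b'1', b'2'];

    /// Translate a scancode with no code path that can panic:
    /// out-of-range codes and unmapped entries both yield None.
    pub fn translate(scancode: u8) -> Option<u8> {
        SCANCODE_MAP
            .get(usize::from(scancode))
            .copied()
            .filter(|&c| c != 0)
    }

    /// Count an event; saturates at u32::MAX instead of overflowing.
    pub fn bump(counter: u32) -> u32 {
        counter.saturating_add(1)
    }
}
```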
Example — IDT entry wrapper (x86_64):
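use core::arch::{asm, naked_asm};
// The Rust half: it must never panic, and it never returns.
extern "C" fn double_fault_inner(_saved_regs: *const u64) -> ! {
    // Log through a path that cannot panic (omitted), then halt forever.
    loop { unsafe { asm!("cli", "hlt") } }
}
#[unsafe(naked)]
extern "C" fn double_fault_handler() -> ! {
    naked_asm!(
        "push rax", "push rcx", "push rdx", // ... push the remaining scratch registers
        "mov rdi, rsp",  // first SysV argument: pointer to the saved registers
        "call {inner}",  // RSP must be 16-byte aligned at this call
        "ud2",           // unreachable: the Rust handler never returns
        inner = sym double_fault_inner,
    )
}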
Inside the safe handler, you log and halt. No unwinding.
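The same rules apply to the data structures a handler touches. As an illustration (not from the original kernel), here is a hypothetical fixed-capacity queue that an interrupt handler could push bytes into, for instance keyboard scancodes, while ordinary kernel code drains it: no allocation, no locks, and a full queue drops the byte instead of blocking or panicking. It assumes exactly one producer and one consumer, and a power-of-two capacity that is checked at compile time.

```rust
use core::cell::UnsafeCell;
use core::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical single-producer/single-consumer byte queue.
/// Producer: one interrupt handler. Consumer: ordinary kernel code.
pub struct IrqQueue<const N: usize> {
    buf: UnsafeCell<[u8; N]>,
    head: AtomicUsize, // total bytes popped (written only by the consumer)
    tail: AtomicUsize, // total bytes pushed (written only by the producer)
}

// SAFETY: sound only with exactly one producer and one consumer; each slot is
// written before `tail` publishes it and read before `head` releases it.
unsafe impl<const N: usize> Sync for IrqQueue<N> {}

impl<const N: usize> IrqQueue<N> {
    pub const fn new() -> Self {
        // Compile-time check: wrapping counters map cleanly onto slots only
        // when N is a power of two (this also rules out N == 0).
        const { assert!(N.is_power_of_two()) };
        Self { buf: UnsafeCell::new([0; N]), head: AtomicUsize::new(0), tail: AtomicUsize::new(0) }
    }

    /// Producer side, callable from the handler. Returns false (dropping
    /// `byte`) when the queue is full.
    pub fn push(&self, byte: u8) -> bool {
        let tail = self.tail.load(Ordering::Relaxed);
        let head = self.head.load(Ordering::Acquire);
        if tail.wrapping_sub(head) == N {
            return false;
        }
        // SAFETY: slot `tail % N` is free and invisible to the consumer until
        // the Release store below publishes it.
        unsafe { self.buf.get().cast::<u8>().add(tail % N).write(byte) };
        self.tail.store(tail.wrapping_add(1), Ordering::Release);
        true
    }

    /// Consumer side.
    pub fn pop(&self) -> Option<u8> {
        let head = self.head.load(Ordering::Relaxed);
        let tail = self.tail.load(Ordering::Acquire);
        if head == tail {
            return None;
        }
        // SAFETY: the Acquire load of `tail` makes the producer's write visible.
        let byte = unsafe { self.buf.get().cast::<u8>().add(head % N).read() };
        self.head.store(head.wrapping_add(1), Ordering::Release);
        Some(byte)
    }
}
```

Dropping on overflow is a deliberate choice here: a handler that waited for space would block the very code that drains the queue.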
4. Concurrency Without a Scheduler (Yet)
Spinlocks seem simple: while lock.is_locked() { hint::spin_loop(); }. But this fails as soon as you have multiple cores and a scheduler that can preempt the lock holder.
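The problem: On a single core, if the lock holder is interrupted and the interrupt path spins on the same lock, the holder never runs again to release it, and the entire system hangs. You need to disable interrupts while holding the lock. On multiple cores, a spinlock is fine if the lock holder cannot be preempted (i.e., you disable preemption on that core).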
What doesn't work:
Using a plain spinlock from the spin crate without disabling interrupts. If an interrupt handler tries to acquire a lock that the interrupted code already holds, the core deadlocks.
What works:
Phase 1 (single‑core, no scheduler): Use a spinlock that disables interrupts. lock() = disable_irq() + spin.
Phase 2 (multi‑core, scheduler running): Use a proper Mutex that parks the thread if the lock is held.
The turning point is when you have a working scheduler. Before that, all locks are effectively just disabling interrupts.
Example — interrupt‑safe spinlock for early boot:
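use core::{cell::UnsafeCell, sync::atomic::{AtomicBool, Ordering}};
pub struct IrqSpinlock<T> {
    lock: AtomicBool,
    data: UnsafeCell<T>,
}
// SAFETY: `lock` serializes all access to `data`.
unsafe impl<T: Send> Sync for IrqSpinlock<T> {}
impl<T> IrqSpinlock<T> {
    pub fn lock(&self) -> IrqGuard<'_, T> {
        let mut flags = disable_interrupts(); // arch helper (not shown): save flags, then cli
        while self.lock.swap(true, Ordering::Acquire) {
            restore_interrupts(flags); // arch helper (not shown): let pending IRQs run
            core::hint::spin_loop();
            flags = disable_interrupts();
        }
        IrqGuard { lock: self, flags }
    }
}
pub struct IrqGuard<'a, T> {
    lock: &'a IrqSpinlock<T>,
    flags: usize,
}
impl<T> Drop for IrqGuard<'_, T> {
    fn drop(&mut self) {
        self.lock.lock.store(false, Ordering::Release); // unlock first...
        restore_interrupts(self.flags); // ...then restore the saved interrupt state
    }
}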
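To observe the save/restore behaviour on an ordinary host, the sketch below is a variant of the IrqSpinlock above in which the hardware interrupt flag is replaced by a hypothetical IRQ_ENABLED atomic, and Deref/DerefMut let the guard reach the data. What it demonstrates: the guard restores the interrupt state it saved rather than unconditionally re-enabling interrupts, so nested locks leave interrupts off until the outermost guard is dropped.

```rust
use core::cell::UnsafeCell;
use core::ops::{Deref, DerefMut};
use core::sync::atomic::{AtomicBool, Ordering};

// Host stand-in for the CPU's interrupt-enable flag (hypothetical; on x86_64
// the real helpers would read RFLAGS and execute cli / sti).
static IRQ_ENABLED: AtomicBool = AtomicBool::new(true);

/// Disable interrupts and report whether they were enabled before.
fn disable_interrupts() -> bool {
    IRQ_ENABLED.swap(false, Ordering::SeqCst)
}

/// Put the interrupt flag back the way `disable_interrupts` found it.
fn restore_interrupts(was_enabled: bool) {
    if was_enabled {
        IRQ_ENABLED.store(true, Ordering::SeqCst);
    }
}

pub struct IrqSpinlock<T> {
    locked: AtomicBool,
    data: UnsafeCell<T>,
}

// SAFETY: `locked` serializes all access to `data`.
unsafe impl<T: Send> Sync for IrqSpinlock<T> {}

impl<T> IrqSpinlock<T> {
    pub const fn new(data: T) -> Self {
        Self { locked: AtomicBool::new(false), data: UnsafeCell::new(data) }
    }

    pub fn lock(&self) -> IrqGuard<'_, T> {
        let mut saved = disable_interrupts();
        while self.locked.swap(true, Ordering::Acquire) {
            restore_interrupts(saved); // let pending interrupts run while waiting
            core::hint::spin_loop();
            saved = disable_interrupts();
        }
        IrqGuard { lock: self, saved }
    }
}

pub struct IrqGuard<'a, T> {
    lock: &'a IrqSpinlock<T>,
    saved: bool,
}

impl<T> Deref for IrqGuard<'_, T> {
    type Target = T;
    fn deref(&self) -> &T {
        // SAFETY: holding the guard means this CPU owns the lock.
        unsafe { &*self.lock.data.get() }
    }
}

impl<T> DerefMut for IrqGuard<'_, T> {
    fn deref_mut(&mut self) -> &mut T {
        // SAFETY: as above, and `&mut self` makes this the only live reference.
        unsafe { &mut *self.lock.data.get() }
    }
}

impl<T> Drop for IrqGuard<'_, T> {
    fn drop(&mut self) {
        self.lock.locked.store(false, Ordering::Release);
        restore_interrupts(self.saved); // restore, don't blindly re-enable
    }
}
```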
5. The Allocator‑Scheduler‑Lock Tango
You want a Vec. But Vec needs the allocator. The allocator needs a lock. The lock needs the scheduler to yield if contested. The scheduler needs a Vec of runnable processes.
The problem: Cyclic dependency between core kernel components.
What doesn't work:
Trying to implement them independently. They are coupled by design.
What works:
Layer your dependencies explicitly and accept temporary bootstrap stubs:
Bootstrap phase: No scheduler, no proper locks. Use a bump allocator (no locking needed) and a &'static mut array for the process list (fixed capacity).
Initialization phase: Create a simple round‑robin scheduler that works with the bump allocator. Locks are still just disable_irq spinlocks.
Transition phase: Build the real allocator using the bootstrap allocator to allocate its own metadata. Then replace the global allocator.
Scheduler replacement: Build the proper Vec‑based scheduler using the real allocator. Swap it in atomically.
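One way to make the layering in the list above explicit, sketched here with hypothetical names, is a global phase counter that can only move forward one step at a time. Code can then check current_phase() before relying on the real allocator or the real scheduler, and an out-of-order initialization call fails loudly instead of silently using a half-built component.

```rust
use core::sync::atomic::{AtomicU8, Ordering};

/// Boot phases from the list above, in the only order they may occur.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
#[repr(u8)]
pub enum BootPhase {
    Bootstrap = 0,
    Initialization = 1,
    Transition = 2,
    SchedulerReplacement = 3,
}

static PHASE: AtomicU8 = AtomicU8::new(BootPhase::Bootstrap as u8);

pub fn current_phase() -> BootPhase {
    match PHASE.load(Ordering::Acquire) {
        0 => BootPhase::Bootstrap,
        1 => BootPhase::Initialization,
        2 => BootPhase::Transition,
        _ => BootPhase::SchedulerReplacement,
    }
}

/// Advance exactly one phase. Skipping ahead or going back is refused, and
/// the error reports the phase the kernel is actually in.
pub fn advance(to: BootPhase) -> Result<(), BootPhase> {
    let Some(from) = (to as u8).checked_sub(1) else {
        return Err(current_phase()); // nothing precedes Bootstrap
    };
    PHASE
        .compare_exchange(from, to as u8, Ordering::AcqRel, Ordering::Acquire)
        .map(|_| ())
        .map_err(|_| current_phase())
}
```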
The key insight: it's okay to have a "good enough" stub for a short period. You don't need a perfect scheduler before you have an allocator. You just need something that doesn't crash.
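A "good enough" stub for the bootstrap process list might look like the hypothetical FixedRunQueue below: a fixed-size array, no heap, round-robin selection, and an explicit error when the table is full. In a kernel it would live in a static (the &'static mut array described above) rather than on the stack; Pid is likewise a placeholder type.

```rust
/// Hypothetical process id type for the sketch.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Pid(pub u32);

/// Bootstrap run queue: fixed capacity, no heap, round-robin selection.
pub struct FixedRunQueue<const N: usize> {
    slots: [Option<Pid>; N],
    cursor: usize, // slot to start searching from on the next pick
}

impl<const N: usize> FixedRunQueue<N> {
    pub const fn new() -> Self {
        Self { slots: [None; N], cursor: 0 }
    }

    /// Add a task. With no allocator to grow into, a full table hands the pid back.
    pub fn add(&mut self, pid: Pid) -> Result<(), Pid> {
        match self.slots.iter_mut().find(|slot| slot.is_none()) {
            Some(slot) => {
                *slot = Some(pid);
                Ok(())
            }
            None => Err(pid),
        }
    }

    pub fn remove(&mut self, pid: Pid) {
        for slot in self.slots.iter_mut() {
            if *slot == Some(pid) {
                *slot = None;
            }
        }
    }

    /// Return the next runnable task after the previous pick, wrapping around.
    pub fn pick_next(&mut self) -> Option<Pid> {
        for i in 0..N {
            let idx = (self.cursor + i) % N;
            if let Some(pid) = self.slots[idx] {
                self.cursor = (idx + 1) % N;
                return Some(pid);
            }
        }
        None
    }
}
```

When the real allocator comes online, the Vec-based scheduler can be seeded by draining this table, after which the stub is never touched again.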
The Bottom Line
Writing an OS in Rust is harder than writing one in C — not because Rust is bad, but because Rust forces you to be explicit about the unsafety that C hides. The problems above are not bugs in Rust; they are fundamental constraints of kernel development. Rust just makes you face them up front.
If you survive the unsafe infection, the allocator deadlock, and the interrupt unwinding nightmares, what you get is a kernel where most panics are real bugs, not null dereferences, and where memory safety violations are rare enough to be shocking.
Would I do it again? Yes. But I'd keep this list on my wall.