Florent Herisson

Page Tables: A Love Story (It's Not)

Dear diary, today I discovered that leaving the comfortable embrace of UEFI is like moving out of your parents' house at 40. Everything that used to work magically now requires you to actually understand how the world functions.

It was 9am when I sat down with my coffee, confident that transitioning from UEFI to bare metal would be straightforward. After all, I had successfully implemented AHCI storage and a key-value store. How hard could it be to set up a Global Descriptor Table and start running my own kernel? The hubris was palpable.

The plan seemed reasonable: call ExitBootServices, set up a proper GDT for 64-bit long mode, get polling keyboard input working, and run a kernel shell. I'd even built a logging system that writes directly to the SSD so I could debug across reboots. What could possibly go wrong?
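
For the uninitiated, the ExitBootServices handoff goes roughly like this. A minimal sketch with EDK2-style types - leave_uefi is a hypothetical wrapper of mine, and real code has to retry the whole dance if the map key goes stale between the calls:

```c
#include <Uefi.h>  // EDK2 headers: EFI_STATUS, EFI_BOOT_SERVICES, UINTN, ...

// Hypothetical helper: grab the memory map, then leave firmware-land for good.
EFI_STATUS leave_uefi(EFI_HANDLE ImageHandle, EFI_BOOT_SERVICES *BS) {
    UINTN MapSize = 0, MapKey = 0, DescSize = 0;
    UINT32 DescVersion = 0;
    EFI_MEMORY_DESCRIPTOR *Map = NULL;

    // First call intentionally fails with EFI_BUFFER_TOO_SMALL and fills MapSize.
    BS->GetMemoryMap(&MapSize, Map, &MapKey, &DescSize, &DescVersion);
    MapSize += 2 * DescSize;  // headroom: allocating the buffer grows the map
    BS->AllocatePool(EfiLoaderData, MapSize, (VOID **)&Map);
    BS->GetMemoryMap(&MapSize, Map, &MapKey, &DescSize, &DescVersion);

    // MapKey must describe the *current* map, or the firmware refuses to let go.
    return BS->ExitBootServices(ImageHandle, MapKey);
}
```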

Everything. Everything could go wrong.

The first attempt was promising. ExitBootServices succeeded, the GDT loaded without complaint, and I was running in kernel mode. I could even see my kernel shell prompt. Victory seemed assured until I tried to enable interrupts with a confident sti instruction.

The machine triple-faulted immediately.

Now, a triple fault is the x86 processor's way of saying "I give up, you're on your own" before performing the digital equivalent of flipping the table and storming out. It's simultaneously the most and least helpful error condition - you know something is catastrophically wrong, but the CPU has decided that telling you what would be too much effort.

I spent the next two hours in what I like to call the "interrupt denial phase." Surely it wasn't the interrupts themselves. Maybe the GDT was wrong. I rewrote it three times, each iteration more baroque than the last. Maybe the stack was corrupted. I added stack canaries and verification code. Maybe UEFI had left some state that was interfering. I tried clearing every register I could think of.
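
None of it helped, because the minimal long-mode GDT was already fine. For reference, this is roughly all you need - a sketch, with descriptor constants straight out of the Intel SDM (base and limit are ignored for 64-bit code and data segments):

```c
#include <stdint.h>

// Three descriptors: the mandatory null entry, 64-bit kernel code, kernel data.
static uint64_t gdt[] = {
    0x0000000000000000,  // 0x00: null
    0x00209A0000000000,  // 0x08: code - present, ring 0, executable, L bit set
    0x0000920000000000,  // 0x10: data - present, ring 0, writable
};

struct __attribute__((packed)) gdtr {
    uint16_t limit;
    uint64_t base;
};

static void load_gdt(void) {
    struct gdtr desc = { sizeof(gdt) - 1, (uint64_t)gdt };
    asm volatile ("lgdt %0" :: "m"(desc));
    // CS still needs a far jump/return to reload; data segments take a plain mov.
}
```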

The machine continued to triple fault with the same mechanical precision with which I continued to make coffee.

By noon, I had accepted that interrupts were indeed the problem and decided to punt. Polling keyboard input wasn't elegant, but it would work. I implemented a simple PS/2 controller polling loop and got basic keyboard input working. The kernel shell was functional, and I could even save logs to the SSD. Milestone 5 was technically complete, but it felt like winning a race by getting out and pushing the car across the finish line.
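
The polling loop itself is almost embarrassingly simple. A sketch, assuming the usual inb port-read helper: status port 0x64 says whether a byte is waiting, and data port 0x60 holds the scancode.

```c
#include <stdint.h>

static inline uint8_t inb(uint16_t port) {
    uint8_t v;
    asm volatile ("inb %1, %0" : "=a"(v) : "Nd"(port));
    return v;
}

// Spin until the PS/2 controller's output buffer is full (status bit 0),
// then read the raw scancode. Scancode-to-ASCII translation is on us.
static uint8_t ps2_poll_scancode(void) {
    while ((inb(0x64) & 0x01) == 0)
        ;  // nothing waiting yet
    return inb(0x60);
}
```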

But you know what they say about kernel development - if you're not moving forward, you're moving backward into a triple fault. So naturally, I decided to tackle interrupts properly for Milestone 6.

The afternoon was spent in the IDT mines, crafting interrupt service routines with the careful precision of a medieval scribe copying manuscripts. I wrote elegant macro systems that generated perfect stack frames. I created sophisticated handlers that could gracefully manage any interrupt condition. The code was beautiful, abstracted, and completely broken.

The first test with interrupts enabled produced a Debug Exception (Vector 1) immediately after sti. This was actually progress - instead of a triple fault, I was getting a specific exception. The CPU was at least trying to tell me what was wrong, even if what it was telling me made no sense.

Debug exceptions fire when you hit a debug register breakpoint or when the trap flag is set for single-stepping. I wasn't using any debugger, and I certainly hadn't set the trap flag intentionally. But x86 processors are like that relative who remembers every slight from thirty years ago - they hold onto state in the most inconvenient places.

It took me another hour to realize that UEFI might have left debugging state enabled. I added code to clear all the debug registers (DR0 through DR7) and the trap flag in RFLAGS. The debug exception disappeared, but now I had a new problem: the timer interrupt wasn't firing.
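
The cleanup, for the curious, amounts to something like this. A sketch of what I just described - note that DR4 and DR5 are reserved aliases, so only the real registers get touched, and the trap flag is bit 8 of RFLAGS:

```c
static void clear_uefi_debug_state(void) {
    unsigned long zero = 0;
    // Zero the breakpoint address registers, status, and control.
    asm volatile ("mov %0, %%dr0" :: "r"(zero));
    asm volatile ("mov %0, %%dr1" :: "r"(zero));
    asm volatile ("mov %0, %%dr2" :: "r"(zero));
    asm volatile ("mov %0, %%dr3" :: "r"(zero));
    asm volatile ("mov %0, %%dr6" :: "r"(zero));
    asm volatile ("mov %0, %%dr7" :: "r"(zero));
    // Clear TF (bit 8) in RFLAGS so nobody is single-stepping us.
    asm volatile (
        "pushfq\n\t"
        "andq $~0x100, (%%rsp)\n\t"
        "popfq"
        ::: "cc", "memory");
}
```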

This began what I now refer to as "the silent treatment phase" of debugging. The PIC was configured, the IDT was set up, interrupts were enabled, but my timer tick counter remained stubbornly at zero. The system wasn't crashing, which was somehow more frustrating than when it was exploding spectacularly.
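
For the record, "the PIC was configured" means roughly this dance: remapping IRQs 0-15 onto vectors 0x20-0x2F so they stop colliding with CPU exceptions. A sketch - the mask values, unmasking only the timer and keyboard, are my choice:

```c
#include <stdint.h>

static inline void outb(uint16_t port, uint8_t v) {
    asm volatile ("outb %0, %1" :: "a"(v), "Nd"(port));
}

static void pic_remap(void) {
    outb(0x20, 0x11);  // ICW1 to master: begin init, ICW4 will follow
    outb(0xA0, 0x11);  // ICW1 to slave
    outb(0x21, 0x20);  // ICW2: master IRQs 0-7 -> vectors 0x20-0x27
    outb(0xA1, 0x28);  // ICW2: slave IRQs 8-15 -> vectors 0x28-0x2F
    outb(0x21, 0x04);  // ICW3: slave hangs off master IRQ2
    outb(0xA1, 0x02);  // ICW3: slave's cascade identity
    outb(0x21, 0x01);  // ICW4: 8086 mode
    outb(0xA1, 0x01);
    outb(0x21, 0xFC);  // mask everything except IRQ0 (timer) and IRQ1 (keyboard)
    outb(0xA1, 0xFF);  // mask the whole slave
}
```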

I verified the PIC configuration seventeen times. I read Intel manuals until my eyes bled. I checked and rechecked the IDT entries. Everything looked correct on paper, but the hardware seemed to be politely ignoring my carefully crafted interrupt handlers.
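
The entries themselves are easy to get subtly wrong, for what it's worth. A 64-bit interrupt gate looks like this - a sketch, where selector 0x08 matches the kernel code descriptor and 0x8E means present, ring 0, interrupt gate:

```c
#include <stdint.h>

struct __attribute__((packed)) idt_entry {
    uint16_t offset_low;   // handler address bits 0-15
    uint16_t selector;     // code segment selector in the GDT
    uint8_t  ist;          // interrupt stack table index, 0 = none
    uint8_t  type_attr;    // 0x8E: present, DPL 0, 64-bit interrupt gate
    uint16_t offset_mid;   // bits 16-31
    uint32_t offset_high;  // bits 32-63
    uint32_t reserved;
};

static struct idt_entry idt[256];

static void idt_set(int vec, void (*handler)(void)) {
    uint64_t a = (uint64_t)handler;
    idt[vec] = (struct idt_entry){
        .offset_low = (uint16_t)a, .selector = 0x08, .ist = 0,
        .type_attr = 0x8E, .offset_mid = (uint16_t)(a >> 16),
        .offset_high = (uint32_t)(a >> 32), .reserved = 0,
    };
}
```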

The breakthrough came at 6pm when I was explaining the problem to my rubber duck (a literal rubber duck I keep on my desk for debugging purposes - don't judge). As I described my elegant ISR macro system, I realized the problem: I was being too clever.

My macros were generating complex stack frame management code that was somehow corrupting the interrupt return address. When I looked at the actual assembly output, it was a nightmare of stack manipulation that would make a spaghetti factory jealous.

So I threw it all away and wrote the simplest possible interrupt handlers using naked functions with inline assembly. No fancy macros, no elegant abstractions, just the bare minimum code to handle an interrupt and return cleanly:

```c
#include <stdint.h>

volatile uint64_t g_timer_ticks = 0;  // bumped on every timer interrupt

__attribute__((naked)) void isr_timer(void) {
    asm volatile (
        "push %rax\n"
        "incq g_timer_ticks(%rip)\n"  // RIP-relative, so it links anywhere
        "movb $0x20, %al\n"
        "outb %al, $0x20\n"           // send EOI to the master PIC
        "pop %rax\n"
        "iretq\n"
    );
}
```

It was inelegant. It was primitive. It worked perfectly.

The moment I enabled interrupts with the new handlers, the timer immediately started ticking at exactly 100 Hz. The keyboard interrupt began capturing input flawlessly. After eight hours of fighting with sophisticated abstractions, the solution was to write interrupt handlers like it was 1985.
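
That 100 Hz is no accident: it comes from programming the 8253/8254 PIT with a divisor of its ~1.193182 MHz base clock. A sketch, reusing the outb helper from earlier - 11932 is roughly 1193182 / 100:

```c
#include <stdint.h>

static void pit_set_100hz(void) {
    uint16_t divisor = 11932;           // 1193182 Hz / 11932 ~= 100.007 Hz
    outb(0x43, 0x36);                   // channel 0, lobyte/hibyte, mode 3
    outb(0x40, divisor & 0xFF);         // reload value, low byte first
    outb(0x40, (divisor >> 8) & 0xFF);  // then the high byte
}
```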

There's something profoundly humbling about spending an entire day implementing "modern" kernel architecture only to discover that the most primitive approach is the most reliable. It's like spending hours crafting a gourmet meal and then realizing that a peanut butter sandwich would have been both more satisfying and less likely to poison you.

By evening, I had a fully functional interrupt-driven kernel. The timer was ticking, the keyboard was responsive, and the kernel shell worked flawlessly. I could watch the timer ticks increment in real-time, each one a small victory over the chaos of bare metal programming.

The final test was letting the system run while I went to make dinner. When I returned, the timer showed 3,432 ticks - just over 34 seconds of stable operation. No crashes, no mysterious hangs, no triple faults. Just a kernel quietly doing its job, handling a hundred-odd interrupts per second with the reliability of a Swiss timepiece.

I saved the kernel log to review later:

```
[KERNEL] Enabling interrupts (STI)...
[KERNEL] Interrupts ENABLED.
[KERNEL] Timer ticks after delay: 199
[KERNEL] Kernel mode active (interrupt mode)
```

Those simple log messages represent eight hours of debugging, three complete rewrites of the interrupt system, and more coffee than any human should consume in a single day. But they also represent something more: a functioning kernel that has successfully transitioned from UEFI's protective embrace to the harsh reality of bare metal operation.

Looking back, the lessons are clear. First, x86 processors remember everything and forgive nothing - always clear the debug registers when transitioning from UEFI. Second, the PIC hasn't changed significantly since the 1980s, and trying to abstract away its quirks usually makes things worse. Third, when sophisticated solutions fail, sometimes the answer is to write code like it's three decades ago.

Most importantly, I learned that there's a particular satisfaction in building something from first principles, even when those principles seem designed to maximize human suffering. Every successful interrupt is a small victory over the entropy of the universe. Every timer tick is proof that somewhere in the chaos of transistors and electrons, my code is executing exactly as intended.

Tomorrow I'll tackle content-addressed storage and time travel debugging. Because apparently, I haven't suffered enough yet, and the beauty of hobby OS development is that there's always another layer of complexity waiting to humble you.

But tonight, I'm going to sit here and watch my timer tick counter increment, one interrupt at a time, and pretend that building an operating system is a reasonable way to spend one's free time.


#osdev #programming #virtualmemory #x86 #interrupts #lowlevel #bootloader #uefi #assembly #x86_64
