Interrupt Handlers Design Document
The Problem
Multicore kernel development is plagued by synchronization challenges:
- Spinlocks waste CPU cycles and cause cache line bouncing
- Mutexes require complex ownership tracking and can lead to priority inversion
- Lock ordering must be carefully designed to avoid deadlocks
- Atomic operations need memory barriers and are error-prone
- Each new subsystem reinvents its own locking strategy
What if we could build synchronization directly into the hardware's existing interrupt mechanism? This is a follow-up to my Physical Memory Manager design.
The Key Insight
When a CPU receives an interrupt, it has no choice. It must transfer control to the registered handler. There's no skipping, no deferring, no excuses. This mandatory behavior is enforced by the CPU itself.
This makes interrupts the perfect foundation for synchronization.
A Simple Interrupt Architecture
Most kernels hardcode handlers directly in the IDT. I add one level of indirection:
1. The IDT Entries (Static)
# Each interrupt has a tiny prestub
irq1:
    cli
    pushl $0            # Dummy error code (the CPU pushes none for IRQ 1)
    pushl $33           # Interrupt number (IRQ 1 remapped to 32 + 1)
    jmp common_stub
These prestubs do three things:
- Push an error code (dummy if the CPU didn't provide one)
- Push the interrupt number
- Jump to a common stub
2. The Common Stub
common_stub:
    pushal              # Save all general-purpose registers
    pushl %ds           # Save data segment
    movl $0x10, %eax    # Load kernel data segment selector
    movw %ax, %ds
    movw %ax, %es
    movw %ax, %fs
    movw %ax, %gs
    pushl %esp          # Pass a pointer to the saved register frame
    call int_handle     # Call our C handler finder
    addl $4, %esp       # Discard the pointer argument
    popl %ds            # Restore data segment
    popal               # Restore general-purpose registers
    addl $8, %esp       # Drop interrupt number and error code
    iret                # Resume the interrupted code
The common stub handles the architecture-specific context saving, then calls into C code.
3. The Handler Finder (C)
// Handler type: receives a pointer to the saved register frame,
// so writes (e.g. to eax) survive into the restored context
typedef void (*inth_t)(register_t *);

// Global array of function pointers - THIS IS THE KEY
inth_t handles[256];

void int_handle(register_t *r) {
    // Look up the handler for this interrupt
    inth_t handle = handles[r->inum];
    // Call it (if one is registered)
    if (handle) handle(r);
    // Acknowledge the PICs, but only for hardware IRQs (vectors 32-47)
    if (r->inum >= 32 && r->inum < 48) {
        if (r->inum >= 40) outb(PICS_CMD, PIC_EOI); // Slave PIC
        outb(PICM_CMD, PIC_EOI);                    // Master PIC
    }
}

// Simple function to swap handlers
void reg_handle(uint8_t intidx, inth_t handle) {
    handles[intidx] = handle; // Just a pointer write
}
That's it. The entire architecture is:
- Static IDT entries that only know their interrupt number
- A common stub that saves context
- A global array of function pointers
- A finder function that dereferences the array
The Magic: Handler Swapping
Because the actual behavior is just a function pointer in global memory, changing behavior is instantaneous and global:
// Normal keyboard handler
void kbd_normal(register_t *r) {
    uint8_t scancode = inb(0x60);
    // ... process key ...
}

// "Busy" keyboard handler - drops the event
void kbd_busy(register_t *r) {
    // Return immediately; int_handle already sends the EOI
}
// To temporarily disable keyboard handling:
reg_handle(33, kbd_busy); // All cores now use the busy handler (interrupt 33/0x21 is just IRQ 1; IRQs are remapped with offset 32/0x20 here)
// Do critical work...
reg_handle(33, kbd_normal); // All cores back to normal
When core 0 changes the handler for interrupt 33:
- All cores see the change immediately (the array is global)
- Any core that gets interrupt 33 now runs the busy handler
- No special IPC or cross-core messaging needed
The Lock Pattern
Now extend this to any operation that needs synchronization:
// Define interrupts for your operations
#define PMM_ALLOC_INT 0x80
#define PMM_FREE_INT 0x81
// Normal handlers
void pmm_alloc_normal(register_t *r) {
    // Perform the allocation; the result travels back in the saved eax
    void *result = internal_alloc(r->ebx);
    r->eax = (uint32_t)result;
}

void pmm_alloc_busy(register_t *r) {
    r->eax = (uint32_t)-1; // Signal "try again"
}
// To perform an allocation safely on SMP:
uint32_t smp_safe_alloc(uint32_t pages) {
// "Acquire lock" - swap to busy handler
reg_handle(PMM_ALLOC_INT, pmm_alloc_busy);
// At this moment, ALL cores now see the busy handler
// Only this core can successfully allocate
// Perform the actual allocation
uint32_t result = (uint32_t)internal_alloc(pages);
// "Release lock" - restore normal handler
reg_handle(PMM_ALLOC_INT, pmm_alloc_normal);
return result;
}
The Critical Insight
When another core calls PMM_ALLOC_INT while the busy handler is active:
- The CPU forces it to enter the interrupt handler
- The handler finder sees pmm_alloc_busy in the global array
- The busy handler executes and returns -1
- The calling core gets immediate feedback: "try again"
No spinning. No waiting. No deadlocks.
Why This Is Deadlock-Free
Consider a traditional spinlock:
Core A: spinlock_acquire(&lock) → while(locked) { pause(); }
Core B: spinlock_acquire(&lock) → while(locked) { pause(); }
If Core A dies while holding the lock, Core B spins forever. Deadlock.
With handler swapping:
Core A: reg_handle(INT, busy) → do_work() → reg_handle(INT, normal)
Core B: calls INT → gets -1 immediately
If Core A dies while in the critical section:
- The busy handler remains installed
- Other cores calling the interrupt get -1
- They can detect this (repeated -1 responses) and take recovery action
- No core is stuck waiting - they all get immediate feedback
The Retry Pattern
Callers can implement simple retry logic:
#define MAX_RETRIES 8

uint32_t retry_alloc(uint32_t pages) {
    uint32_t result;
    int retries = 0;
    do {
        // Call the interrupt: eax carries the result, ebx the argument
        __asm__ volatile (
            "int $0x80"
            : "=a"(result)
            : "b"(pages)
            : "memory"
        );
        if (result != (uint32_t)-1) return result;
        // Busy - back off exponentially, then retry
        for (int i = 0; i < (1 << retries); i++) {
            __asm__ volatile ("pause");
        }
        retries++;
    } while (retries < MAX_RETRIES);
    return (uint32_t)-1; // Operation genuinely failed
}
The Benefits
1. No Spinlocks
- No wasted CPU cycles spinning
- No cache line contention
- No need for atomic operations
2. No Deadlocks
- Cores never wait - they get immediate feedback
- No lock ordering to manage
- No priority inversion (no waiting = no inversion)
3. Automatic Cross-Core Synchronization
- One pointer write affects all cores
- No IPIs needed for lock state changes
- The interrupt controller handles distribution
4. Simple Retry Semantics
- -1 means "busy, try again"
- Other values mean "operation complete" or "genuine failure"
- Callers decide their retry strategy
5. Reusable Pattern
- Device drivers (like the keyboard example)
- Memory management (allocation/free)
- System calls
- Inter-process communication
- Any operation that needs atomicity
Practical Considerations
Multiple Locks
Different interrupts protect different resources:
#define LOCK_KEYBOARD 33
#define LOCK_PMM_ALLOC 0x80
#define LOCK_PMM_FREE 0x81
#define LOCK_FILESYSTEM 0x82
Each is independent. Core A can hold the PMM alloc lock while Core B holds the filesystem lock.
Nested Critical Sections
What if a handler needs to call another synchronized operation?
void pmm_alloc_normal(register_t *r) {
    // We need more metadata pages
    if (need_expansion()) {
        // This calls the filesystem allocator through its own vector
        uint32_t page = smp_safe_fs_alloc();
        // Use the page...
    }
}
This works because:
- The PMM alloc lock is held (busy handler installed)
- The filesystem alloc has its own interrupt vector
- Different locks = no deadlock
- The filesystem call will retry if its lock is busy
Detecting Stuck Locks
If a core dies with a busy handler installed, other cores see repeated -1 responses. The system can:
- Detect the pattern (e.g., 1000 retries all returning -1)
- Assume the lock is abandoned
- Reset the handler to normal
- Log the recovery for debugging
Comparison to Traditional Approaches
| Aspect | Spinlocks | Mutexes | Handler Swapping |
|---|---|---|---|
| Waiting | Spins CPU | Sleeps | No waiting - immediate return |
| Deadlock risk | Yes | Yes | No (no waiting) |
| Cross-core | IPI needed | IPI needed | Automatic via handler array |
| Cache impact | High (bouncing) | Low | Minimal (one pointer write) |
| Complexity | Medium | High | Low |
| Hardware required | Atomic instructions | Atomics + scheduler | Only interrupts |
The Philosophy
This design embodies a simple principle: let the hardware do what it's good at.
CPUs are exceptional at handling interrupts. They've been optimized for decades to:
- Save context quickly
- Dispatch to handlers efficiently
- Return to interrupted work seamlessly
- Maintain core isolation
Traditional locking fights this by trying to bypass or augment the interrupt system. This design embraces it.
Try It Yourself (Basic C and Assembly required)
This pattern works on any architecture with vectored interrupts:
- x86: use the INT instruction for software interrupts, IRQs for hardware
- ARM: use the GIC and SVC instructions
- RISC-V: use the CLINT/PLIC and ECALL
The core requirements are minimal:
- A way to trigger software interrupts
- A global array of function pointers
- A common stub that saves context
- A finder function that dereferences the array
Conclusion
We've built a complete multicore synchronization system using:
- Existing hardware (interrupts, IDT)
- Existing code paths (stubs, handlers)
- One level of indirection (the global handler array)
- One simple pattern (swap handler → do work → swap back)
No spinlocks. No atomic operations. No lock ordering. Just function pointers and the CPU's own interrupt mechanism.
The lock is just a function pointer. The unlock is just a function pointer. Everything else is the CPU doing its job.
Contact
Author: debug
E-Mail: debugcodefurry@gmail.com
I would really appreciate your feedback. Feel free to contact me to clarify anything.
License
Released into the public domain under CC0 1.0 Universal
Feel free to use, modify, and distribute without restriction, just make sure to give me credit.