Interrupt Handlers Design Document
The Problem
Multicore kernel development is plagued by synchronization challenges:
- Spinlocks waste CPU cycles and cause cache line bouncing
- Mutexes require complex ownership tracking and can lead to priority inversion
- Lock ordering must be carefully designed to avoid deadlocks
- Atomic operations need memory barriers and are error-prone
- Each new subsystem reinvents its own locking strategy
What if we could build synchronization directly into the hardware's existing interrupt mechanism? This is a follow-up to my Physical Memory Manager design.
The Key Insight
When a CPU receives an interrupt, it has no choice. It must transfer control to the registered handler. There's no skipping, no deferring, no excuses. This mandatory behavior is enforced by the CPU itself.
This makes interrupts the perfect foundation for synchronization.
A Simple Interrupt Architecture
Most kernels hardcode handlers directly in the IDT. I add one level of indirection:
1. The IDT Entries (Static)
# Each interrupt has a tiny prestub
irq1:
    cli
    pushl $0            # Dummy error code (the CPU pushes none for IRQ 1)
    pushl $33           # Interrupt number (IRQ 1 remapped to 32 + 1)
    jmp common_stub
These prestubs do three things:
- Push an error code (dummy if the CPU didn't provide one)
- Push the interrupt number
- Jump to a common stub
2. The Common Stub
common_stub:
    pushal              # Save all general-purpose registers
    pushl %ds           # Save data segment
    movl $0x10, %eax    # Load kernel data segment selector
    movw %ax, %ds
    movw %ax, %es
    movw %ax, %fs
    movw %ax, %gs
    pushl %esp          # Pass a pointer to the saved register frame
    call int_handle     # Call our C handler finder
    addl $4, %esp       # Discard the pointer argument
    popl %ds            # Restore data segment
    popal               # Restore general-purpose registers
    addl $8, %esp       # Drop interrupt number and error code
    iret                # Resume the interrupted code
The common stub handles the architecture-specific context saving, then calls into C code.
3. The Handler Finder (C)
// Handler type: receives a pointer to the saved register frame,
// so writes (e.g. to eax) survive into the restored context
typedef void (*inth_t)(register_t *);

// Global array of function pointers - THIS IS THE KEY
inth_t handles[256];

void int_handle(register_t *r) {
    // Look up the handler for this interrupt
    inth_t handle = handles[r->inum];
    // Call it (if one is registered)
    if (handle) handle(r);
    // Acknowledge the PICs, but only for hardware IRQs (vectors 32-47)
    if (r->inum >= 32 && r->inum < 48) {
        if (r->inum >= 40) outb(PICS_CMD, PIC_EOI); // Slave PIC
        outb(PICM_CMD, PIC_EOI);                    // Master PIC
    }
}

// Simple function to swap handlers
void reg_handle(uint8_t intidx, inth_t handle) {
    handles[intidx] = handle; // Just a pointer write
}
That's it. The entire architecture is:
- Static IDT entries that only know their interrupt number
- A common stub that saves context
- A global array of function pointers
- A finder function that dereferences the array
The Magic: Handler Swapping
Because the actual behavior is just a function pointer in global memory, changing behavior is instantaneous and global:
// Normal keyboard handler
void kbd_normal(register_t *r) {
    uint8_t scancode = inb(0x60);
    // ... process key ...
}

// "Busy" keyboard handler - drops the event
void kbd_busy(register_t *r) {
    // Return immediately; int_handle already sends the EOI
}
// To temporarily disable keyboard handling:
reg_handle(33, kbd_busy); // All cores now use the busy handler (interrupt 33/0x21 is just IRQ 1; IRQs are remapped with offset 32/0x20 here)
// Do critical work...
reg_handle(33, kbd_normal); // All cores back to normal
When core 0 changes the handler for interrupt 33:
- All cores see the change immediately (the array is global)
- Any core that gets interrupt 33 now runs the busy handler
- No special IPC or cross-core messaging needed
The Lock Pattern
Now extend this to any operation that needs synchronization:
// Define interrupts for your operations
#define PMM_ALLOC_INT 0x80
#define PMM_FREE_INT 0x81
// Normal handlers
void pmm_alloc_normal(register_t *r) {
    // Perform the allocation; the result travels back in the saved eax
    void *result = internal_alloc(r->ebx);
    r->eax = (uint32_t)result;
}

void pmm_alloc_busy(register_t *r) {
    r->eax = (uint32_t)-1; // Signal "try again"
}
// To perform an allocation safely on SMP:
uint32_t smp_safe_alloc(uint32_t pages) {
// "Acquire lock" - swap to busy handler
reg_handle(PMM_ALLOC_INT, pmm_alloc_busy);
// At this moment, ALL cores now see the busy handler
// Only this core can successfully allocate
// Perform the actual allocation
uint32_t result = (uint32_t)internal_alloc(pages);
// "Release lock" - restore normal handler
reg_handle(PMM_ALLOC_INT, pmm_alloc_normal);
return result;
}
The Critical Insight
When another core calls PMM_ALLOC_INT while the busy handler is active:
- The CPU forces it to enter the interrupt handler
- The handler finder sees pmm_alloc_busy in the global array
- The busy handler executes and returns -1
- The calling core gets immediate feedback: "try again"
No spinning. No waiting. No deadlocks.
Why This Is Deadlock-Free
Consider a traditional spinlock:
Core A: spinlock_acquire(&lock) → while(locked) { pause(); }
Core B: spinlock_acquire(&lock) → while(locked) { pause(); }
If Core A dies while holding the lock, Core B spins forever. Deadlock.
With handler swapping:
Core A: reg_handle(INT, busy) → do_work() → reg_handle(INT, normal)
Core B: calls INT → gets -1 immediately
If Core A dies while in the critical section:
- The busy handler remains installed
- Other cores calling the interrupt get -1
- They can detect this (repeated -1 responses) and take recovery action
- No core is stuck waiting - they all get immediate feedback
The Retry Pattern
Callers can implement simple retry logic:
#define MAX_RETRIES 8

uint32_t retry_alloc(uint32_t pages) {
    uint32_t result;
    int retries = 0;
    do {
        // Call the interrupt: eax carries the result, ebx the argument
        __asm__ volatile (
            "int $0x80"
            : "=a"(result)
            : "b"(pages)
            : "memory"
        );
        if (result != (uint32_t)-1) return result;
        // Busy - back off exponentially, then retry
        for (int i = 0; i < (1 << retries); i++) {
            __asm__ volatile ("pause");
        }
        retries++;
    } while (retries < MAX_RETRIES);
    return (uint32_t)-1; // Operation genuinely failed
}
The Benefits
1. No Spinlocks
- No wasted CPU cycles spinning
- No cache line contention
- No need for atomic operations
2. No Deadlocks
- Cores never wait - they get immediate feedback
- No lock ordering to manage
- No priority inversion (no waiting = no inversion)
3. Automatic Cross-Core Synchronization
- One pointer write affects all cores
- No IPIs needed for lock state changes
- The interrupt controller handles distribution
4. Simple Retry Semantics
- -1 means "busy, try again"
- Other values mean "operation complete" or "genuine failure"
- Callers decide their retry strategy
5. Reusable Pattern
- Device drivers (like the keyboard example)
- Memory management (allocation/free)
- System calls
- Inter-process communication
- Any operation that needs atomicity
Practical Considerations
Multiple Locks
Different interrupts protect different resources:
#define LOCK_KEYBOARD 33
#define LOCK_PMM_ALLOC 0x80
#define LOCK_PMM_FREE 0x81
#define LOCK_FILESYSTEM 0x82
Each is independent. Core A can hold the PMM alloc lock while Core B holds the filesystem lock.
Nested Critical Sections
What if a handler needs to call another synchronized operation?
void pmm_alloc_normal(register_t *r) {
    // We need more metadata pages
    if (need_expansion()) {
        // This calls the filesystem allocator through its own vector
        uint32_t page = smp_safe_fs_alloc();
        // Use the page...
    }
}
This works because:
- The PMM alloc lock is held (busy handler installed)
- The filesystem alloc has its own interrupt vector
- Different locks = no deadlock
- The filesystem call will retry if its lock is busy
Detecting Stuck Locks
If a core dies with a busy handler installed, other cores see repeated -1 responses. The system can:
- Detect the pattern (e.g., 1000 retries all returning -1)
- Assume the lock is abandoned
- Reset the handler to normal
- Log the recovery for debugging
Comparison to Traditional Approaches
| Aspect | Spinlocks | Mutexes | Handler Swapping |
|---|---|---|---|
| Waiting | Spins CPU | Sleeps | No waiting - immediate return |
| Deadlock risk | Yes | Yes | No (no waiting) |
| Cross-core | IPI needed | IPI needed | Automatic via handler array |
| Cache impact | High (bouncing) | Low | Minimal (one pointer write) |
| Complexity | Medium | High | Low |
| Hardware required | Atomic instructions | Atomics + scheduler | Only interrupts |
The Philosophy
This design embodies a simple principle: let the hardware do what it's good at.
CPUs are exceptional at handling interrupts. They've been optimized for decades to:
- Save context quickly
- Dispatch to handlers efficiently
- Return to interrupted work seamlessly
- Maintain core isolation
Traditional locking fights this by trying to bypass or augment the interrupt system. This design embraces it.
Try It Yourself (Basic C and Assembly required)
This pattern works on any architecture with vectored interrupts:
- x86: use the INT instruction for software interrupts, IRQs for hardware
- ARM: use the GIC and SVC instructions
- RISC-V: use the CLINT/PLIC and ECALL
The core requirements are minimal:
- A way to trigger software interrupts
- A global array of function pointers
- A common stub that saves context
- A finder function that dereferences the array
Conclusion
We've built a complete multicore synchronization system using:
- Existing hardware (interrupts, IDT)
- Existing code paths (stubs, handlers)
- One level of indirection (the global handler array)
- One simple pattern (swap handler → do work → swap back)
No spinlocks. No atomic operations. No lock ordering. Just function pointers and the CPU's own interrupt mechanism.
The lock is just a function pointer. The unlock is just a function pointer. Everything else is the CPU doing its job.
Contact
Author: debug
E-Mail: debugcodefurry@gmail.com
I would really appreciate your feedback. Feel free to contact me to clarify anything.
License
Released into the public domain under CC0 1.0 Universal
Feel free to use, modify, and distribute without restriction, just make sure to give me credit.