Table Of Contents
- Introduction
- Project Overview
- Working Features
- Concepts
- Debugging
- What I'd Like to Add Next
- What I Learned
- I'm Open to Feedback
- References & Resources
Introduction
I've always been fascinated by operating systems. One of the main reasons I learned the C programming language was to understand OS development. So, after finishing my last project (a mini shell), I couldn’t resist developing a very minimal kernel for an x86 32-bit CPU.
Most of my time wasn’t spent coding. In fact it was spent understanding the concepts. Finding good resources was challenging, but I learned so much. It even helped me revisit the OS concepts I studied at university.
This post covers the current features of the kernel, the concepts I learned, an explanation of how each one works, the most challenging parts, and what I’d like to add next. Many code snippets were sourced from various tutorials, but understanding even 15 lines of code often took hours. This project wasn’t just about writing code; it was about grasping the architecture of how a kernel works. You can view the full code on my GitHub page.
Project Overview
My mini kernel currently prints a classic "Hello World" on the screen, handles interrupts (both exceptions and IRQs), and includes keyboard and timer drivers. It also supports minimal memory management, setting up a heap and implementing physical memory allocation and freeing.
Terminal output is handled via a serial driver.
Tools used:
- Emulator: QEMU
- Assembler: NASM
- Cross Compiler: i686-elf-gcc
- Linker: GNU Linker from Binutils
- Bootloader: GRUB
- Debugger: GDB with QEMU
- Build: Makefile
Working Features
These are the features currently implemented:
- Display
  - Prints: "Hello, Welcome To Yega Kernel!"
  - Implemented via VGA text mode
- Descriptor Tables
  - Global Descriptor Table (GDT)
  - Interrupt Descriptor Table (IDT)
- Interrupts
  - Remaps the PIC
  - Handles CPU exceptions and hardware IRQs via ISRs
- Drivers
  - Keyboard driver (prints keys using VGA)
  - Timer driver (ticks at 100 Hz)
- Memory Management
  - Inspects free memory using the Multiboot memory map
  - Initializes the heap
  - Implements kalloc and kfree
- Error Handling
  - Halts the CPU and disables interrupts if boot errors occur
Concepts
Boot Flow
Understanding the boot sequence is essential. Here’s a simplified modern boot flow:
- BIOS/UEFI starts and loads GRUB
- GRUB parses the ELF kernel file
- GRUB loads the kernel at a known location (1MB or 2MB)
- GRUB jumps to kernel entry
- Kernel sets up paging
- Kernel enables paging
- Kernel jumps to higher-half address space
Also, it’s critical to understand CPU modes like Real Mode, Protected Mode, and Long Mode.
At this stage, these were some of the big questions I had. Each one led me to dig deeper and truly understand how early system boot and memory addressing work:
Why do modern OSes still boot in Real Mode?
It felt outdated and unnecessary. Why start in a 16-bit mode? It turns out it’s for legacy compatibility: every x86 CPU still comes out of reset in Real Mode, and the firmware or bootloader has to switch it to a wider mode. On UEFI systems the firmware makes that switch itself, long before the OS loader runs.
How was all memory addressed with only a 20-bit address bus and 16-bit registers?
This confused me a lot. With a 20-bit address bus (1 MB total addressable memory) and only 16-bit registers (64 KiB max value), how could we access the full memory? I learned that segment registers are combined with an offset (segment:offset): the segment value is shifted left by 4 bits and added to the offset, which produces a 20-bit physical address and effectively extends the range.
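To make the segment:offset arithmetic concrete, here is a minimal sketch in C. The function name real_mode_phys is my own for illustration; the masking to 20 bits mirrors the wrap-around of the original 8086 and is an assumption about the machine being modeled (later CPUs with the A20 line enabled do not wrap).

```c
#include <stdint.h>

/* Real-mode address translation: physical = segment * 16 + offset.
 * The sum can overflow into a 21st bit; the mask models the 20-bit
 * wrap-around of the original 8086 (an assumption of this sketch). */
uint32_t real_mode_phys(uint16_t segment, uint16_t offset) {
    return (((uint32_t)segment << 4) + offset) & 0xFFFFFu;
}
```

For example, 0xB800:0x0000 and 0xB000:0x8000 both resolve to 0xB8000, which shows that many different segment:offset pairs can name the same physical byte.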
Why does the bootloader load the kernel at low memory addresses like 1MB or 2MB?
I didn’t understand the choice at first. But it's rooted in convention (1 MB is the limit of what Real Mode can address), and it ensures compatibility with older memory maps and leaves low memory available for BIOS data.
Cross Compiler
A cross compiler is needed because the default host compiler includes platform-specific libraries (like glibc). With a cross compiler, you:
- Generate binaries for a target system
- Avoid linking host-specific libraries
- Define your own runtime
These are some terms and questions I looked up:
- Going Self-hosted
- Bootstrapping
- How to build a cross compiler
ELF (Executable and Linkable Format)
The ELF format tells the OS how to load and run your binary. It's used for:
- Executables
- Object files
- Shared libraries
- Core dumps
I also wondered why this file format is so special and why Linux uses it:
It's not just any binary format. It's a well-defined standard that the OS kernel, compiler, linker, loader, and debugger all agree on, and it describes how a program is laid out in memory and how to run it.
GNU Linker
At first it surprised me that I had to write my own linker script, because until then I thought the linker was only used to link object files (which is true, but it does a lot more than that).
The linker controls the memory layout of our program.
The linker:
- Combines object files
- Assigns memory addresses
- Resolves symbols
- Produces the final binary
A custom linker script lets you define the memory layout: where code starts, where .text, .data, and .bss go, and what the entry point is.
Multiboot Header
Based on the OSDev Tutorial:
There exists a Multiboot Standard that defines a simple interface between the bootloader and the operating system kernel.
It works by placing a few magic values in specific global variables (known as the multiboot header), which the bootloader searches for.
When the bootloader finds these values, it recognizes the kernel as multiboot-compatible, knows how to load it, and can even pass important information such as memory maps.
I used the NASM assembler. The code is based on this resource.
Important Note:
Since there is no stack set up yet and you must ensure the global variables are initialized correctly, this initialization has to be done in assembly.
Because I’ve always programmed at the user level, I didn’t realize that when the bootloader first hands control to our kernel, there is no stack, so using C right away is impossible. We must set up the stack and the stack pointer in this assembly file first.
Also, make sure to preserve the EAX and EBX registers and pass them on to kernel_main!
I wasn’t aware of this until near the end of my project. GRUB passes a lot of crucial information to your kernel_main, with the most important being the address of the memory map. You can use this memory map to inspect the memory layout, which is essential for setting up memory management.
GRUB leaves a magic value in EAX and stores the address of this information structure in the EBX register. You need to define structs that exactly match the layout GRUB uses. You can read more about this in the GRUB Multiboot Specification.
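As a rough sketch of the checks involved, the snippet below shows the Multiboot 1 magic values, the header checksum rule, and a sanity check on what the bootloader hands over. The constants come from the Multiboot 1 specification; the helper names multiboot_checksum and multiboot_handoff_ok are hypothetical and not taken from my kernel.

```c
#include <stdbool.h>
#include <stdint.h>

#define MULTIBOOT_HEADER_MAGIC     0x1BADB002u /* placed in the kernel's header */
#define MULTIBOOT_BOOTLOADER_MAGIC 0x2BADB002u /* left in EAX by the bootloader */
#define MULTIBOOT_INFO_MEM_MAP     (1u << 6)   /* flags bit 6: mmap_* fields valid */

/* The header checksum must make magic + flags + checksum == 0 (mod 2^32). */
uint32_t multiboot_checksum(uint32_t header_flags) {
    return 0u - (MULTIBOOT_HEADER_MAGIC + header_flags);
}

/* Hypothetical helper: check the magic passed in EAX and the flags word of
 * the info structure (whose address arrives in EBX) before trusting the
 * memory map fields. */
bool multiboot_handoff_ok(uint32_t eax_magic, uint32_t info_flags) {
    return eax_magic == MULTIBOOT_BOOTLOADER_MAGIC &&
           (info_flags & MULTIBOOT_INFO_MEM_MAP) != 0;
}
```

Calling something like multiboot_handoff_ok at the top of kernel_main, and halting if it fails, avoids walking a memory map that was never filled in.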
VGA Text Mode Buffer
The VGA text mode buffer starts at 0xB8000. Writing here displays characters on the screen. It’s the easiest way to output text early in boot.
Later, I added a serial driver for debugging via COM1.
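Each cell of the text buffer is a 16-bit value, which is easiest to see in code. This is a minimal sketch assuming the standard 80-column text mode; vga_entry and vga_index are illustrative names rather than the functions in my repository.

```c
#include <stdint.h>

/* Each cell is 16 bits: low byte = character code, high byte = attribute
 * (low nibble = foreground color, high nibble = background color). */
uint16_t vga_entry(unsigned char ch, uint8_t fg, uint8_t bg) {
    uint8_t attr = (uint8_t)(((bg & 0x0F) << 4) | (fg & 0x0F));
    return (uint16_t)(ch | ((uint16_t)attr << 8));
}

/* Index of the cell at (row, col), assuming an 80-column text mode. */
unsigned vga_index(unsigned row, unsigned col) {
    return row * 80u + col;
}
```

Inside the kernel, writing vga_entry('A', 7, 0) to a volatile uint16_t pointer at 0xB8000, indexed with vga_index(row, col), puts a light grey "A" on a black background at that position.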
Segmentation & Flat Memory Model
One of the many steps in building a mini hobby kernel is setting up the Global Descriptor Table (GDT). It’s a table of segment descriptors.
Segmentation was a big question for me. On early x86 CPUs, segmentation was the way to reach all of memory (remember the 20-bit address bus and 16-bit registers?). In protected mode, segments also carry protection information, which is one reason segment registers still exist. But while reading about this, I wondered: what’s the point of segment registers in modern OSes running in protected mode? I spent a lot of time trying to find the answer, and honestly, I’m still not 100% convinced. One reason is that the CPU simply requires them: in protected mode every memory access goes through a segment register, and the privilege level of the running code comes from the selector loaded in CS.
If you look at this table from OSDev, you’ll see that the base and limit for all code and data segments are the same. That’s because we’re not really using a segmented memory model. Instead, each segment covers the entire address space, unlike the traditional segmented memory model:
Offset | Use | Content |
---|---|---|
0x0000 | Null Descriptor | Base = 0 Limit = 0x00000000 Access Byte = 0x00 Flags = 0x0 |
0x0008 | Kernel Mode Code Segment | Base = 0 Limit = 0xFFFFF Access Byte = 0x9A Flags = 0xC |
0x0010 | Kernel Mode Data Segment | Base = 0 Limit = 0xFFFFF Access Byte = 0x92 Flags = 0xC |
0x0018 | User Mode Code Segment | Base = 0 Limit = 0xFFFFF Access Byte = 0xFA Flags = 0xC |
0x0020 | User Mode Data Segment | Base = 0 Limit = 0xFFFFF Access Byte = 0xF2 Flags = 0xC |
0x0028 | Task State Segment | Base = &TSS Limit = sizeof(TSS)-1 Access Byte = 0x89 Flags = 0x0 |
It’s subtle in a flat memory model, but imagine someone tells the CPU: “Go to this address and run code” or “Go to this address and get data.” It sounds correct, but it’s not that simple. How does the CPU know if it’s allowed to execute code or read data from that address? What if it doesn’t have permission? What if it shouldn’t access that memory at all?
Segments tell the CPU where data starts, whether user-mode is allowed to access it, and more.
If we weren’t in a flat memory model, each segment would have different base and limit values.
PIC Remapping
The Programmable Interrupt Controller (PIC), which in legacy x86 systems is the Intel 8259A chip, is responsible for:
- Accepting interrupt signals (IRQs) from hardware devices
- Deciding which IRQ to send to the CPU based on priority and masking
- Sending the corresponding interrupt vector to the CPU when requested
Without the PIC, the CPU would have no sane way to handle multiple interrupt sources.
Why remap the PIC?
By default, the BIOS leaves the master PIC mapping IRQ0-7 to interrupt vectors 8-15 (and the slave PIC mapping IRQ8-15 to vectors 0x70-0x77). In protected mode, vectors 0-31 are reserved for CPU exceptions, so IRQ0-7 clash with exceptions like double fault (vector 8), general protection fault (vector 13), and page fault (vector 14), making it impossible to distinguish hardware interrupts from CPU exceptions. So, remapping is essential to avoid this conflict.
This is a very good example, which I got from ChatGPT, of what happens when an interrupt fires:
You press a key:
- The keyboard controller tells the PIC to cause an interrupt
- The controller sends IRQ1 to the PIC
- The PIC decides whether the CPU should be notified immediately and translates the IRQ number into an interrupt vector for the CPU's table
- The PIC forwards this interrupt to the CPU
- The CPU jumps to the ISR for vector 33
- The OS is supposed to handle the interrupt by talking to the keyboard via in and out instructions (or inportb/outportb, inportw/outportw, and inportd/outportd in C): asking what key was pressed, doing something about it (such as displaying the key on the screen and notifying the current application that a key has been pressed), and returning to whatever code was executing when the interrupt came in
- The ISR reads the scancode from I/O port 0x60
- The ISR decodes the key and puts it in a buffer
- The ISR sends an EOI to the PIC
- The CPU resumes the interrupted code
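The "vector 33" in the walkthrough follows from the remap offset. Here is a minimal sketch of the mapping, assuming the master PIC is remapped to 0x20 (which matches IRQ1 arriving as vector 33) and the slave to 0x28 (a common convention that I am assuming here). The function names are illustrative.

```c
#include <stdint.h>

#define PIC1_OFFSET 0x20 /* assumed: master PIC, IRQ0-7  -> vectors 32-39 */
#define PIC2_OFFSET 0x28 /* assumed: slave PIC,  IRQ8-15 -> vectors 40-47 */

/* Vector the CPU receives for a given IRQ line (0-15) after remapping. */
uint8_t irq_to_vector(uint8_t irq) {
    return (uint8_t)(irq < 8 ? PIC1_OFFSET + irq : PIC2_OFFSET + (irq - 8));
}

/* IRQs 8-15 arrive through the slave PIC, so the EOI has to go to both
 * chips; IRQs 0-7 only need an EOI on the master. */
int irq_needs_slave_eoi(uint8_t irq) {
    return irq >= 8;
}
```

The timer (IRQ0) lands on vector 32 and the keyboard (IRQ1) on vector 33, safely above the 32 vectors reserved for CPU exceptions.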
What's the difference between a controller and a driver? (For example, the keyboard controller and the keyboard driver.)
Controller: A physical chip managing hardware communication (e.g., the keyboard controller with ports 0x60 for data and 0x64 for commands/status). It only sends raw scancodes. It doesn’t interpret key presses.
Driver: Kernel code that interacts with the controller, interprets scancodes, and acts on them (e.g., printing characters on the screen).
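The "interprets scancodes" part of a driver can be as small as a lookup table. The sketch below assumes the controller delivers scancode set 1 (the usual default on PC hardware and in QEMU) and maps only a handful of keys; scancode_map and scancode_to_ascii are illustrative names, not the ones in my driver.

```c
#include <stdint.h>

/* Partial scancode set 1 table (key-press codes only); 0 = not mapped here. */
static const char scancode_map[128] = {
    [0x02] = '1', [0x03] = '2', [0x04] = '3',
    [0x10] = 'q', [0x11] = 'w', [0x12] = 'e',
    [0x1E] = 'a', [0x1F] = 's', [0x20] = 'd',
    [0x1C] = '\n', [0x39] = ' ',
};

/* In set 1, bit 7 set means the key was released; those are ignored here. */
char scancode_to_ascii(uint8_t scancode) {
    if (scancode & 0x80)
        return 0;
    return scancode_map[scancode];
}
```

A real driver also has to track modifier keys such as Shift and handle the 0xE0 prefix used by extended keys, which this sketch leaves out.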
Memory Management
This is the part that I loved the most, because I could finally write some code myself and decide what kind of design I wanted.
Heap Initialization
I used the Multiboot memory map to find free memory blocks.
I wrote a function that collects the available blocks (up to 32, stored in a fixed-size array, because dynamic memory allocation isn't available at this point):
    int find_available_memory(multiboot_info_t *mbi) {
        serial_writestring("\nmmap addr= ");
        serial_writehex(mbi->mmap_addr);
        serial_writestring("\nmmap length= ");
        serial_writehex(mbi->mmap_length);

        int num_block = 0;
        uint8_t *mmap = (uint8_t *)mbi->mmap_addr;
        uint8_t *mmap_end = mmap + mbi->mmap_length;

        serial_writestring("\nflags= ");
        serial_writehex(mbi->flags);

        /* Bit 6 of flags says whether mmap_addr and mmap_length are valid. */
        if (!CHECK_FLAG(mbi->flags, 6))
            return 0;

        while (mmap < mmap_end) {
            multiboot_mmap_entry_t *entry = (multiboot_mmap_entry_t *)mmap;

            serial_writestring("\nentry addr= ");
            serial_writehex(entry->addr);
            serial_writestring("\nentry len= ");
            serial_writehex(entry->len);
            serial_writestring("\nentry type= ");
            serial_writehex(entry->type);
            serial_writestring("\nentry size= ");
            serial_writehex(entry->size);

            /* Type 1 means usable RAM; every other type is reserved. */
            if (num_block < MAX_MEMORY_BLOCKS && entry->type == 1) {
                uint64_t entry_start = entry->addr;
                uint64_t entry_end = entry_start + entry->len;

                if (entry_start <= KERNEL_END && entry_end > KERNEL_END) {
                    /* The kernel lives in this region: keep only the part after it. */
                    free_memory_blocks[num_block].start = KERNEL_END;
                    free_memory_blocks[num_block].end = entry_end;
                    num_block++;
                } else if (entry_start > KERNEL_END) {
                    /* The region lies entirely above the kernel: keep all of it. */
                    free_memory_blocks[num_block].start = entry_start;
                    free_memory_blocks[num_block].end = entry_end;
                    num_block++;
                }
            }

            /* The size field does not count itself, so add it when stepping. */
            mmap += entry->size + sizeof(entry->size);
        }
        return num_block;
    }
This gives us important information about where to start our heap:
    heap_start = free_memory_blocks[0].start;
    heap_start = ALIGNUP(heap_start, PAGE_SIZE);
    heap_end = heap_start + heap_size;
    kalloc_ptr = heap_start;
kalloc
I use a pointer, kalloc_ptr, which points to where the next new block will be placed on the heap.
During memory management initialization, kalloc_ptr is set to the heap start address (heap_start).
Each memory block has a header describing its size, whether it’s free, and a pointer to the next block:
    typedef struct heap_block {
        size_t size;
        bool is_freed;
        struct heap_block *next;
    } heap_block_t;
Allocation works like a simplified malloc:
I implemented a First-Fit strategy: scan the block list from head for the first free block whose size is ≥ the requested size.
If there’s no existing free block big enough to satisfy the request, a new block is allocated at kalloc_ptr.
When allocating a new block, kalloc_ptr is advanced by the total required size, which includes the block header:
    size_t total_req = req + sizeof(heap_block_t);
    ...
    kalloc_ptr += total_req;
We add sizeof(heap_block_t) because the header itself consumes memory. When returning a pointer to the user, we skip the header:
    return (void *)(curr + 1);
This is the implementation of kalloc:
    heap_block_t *head = NULL;

    void *kalloc(size_t req) {
        req = ALIGNUP(req, ALIGN);
        size_t total_req = req + sizeof(heap_block_t);

        /* First fit: reuse the first freed block that is large enough. */
        heap_block_t *curr = head;
        heap_block_t *prev = NULL;
        while (curr) {
            if (curr->is_freed && curr->size >= req) {
                curr->is_freed = false;
                return (void *)(curr + 1);
            }
            prev = curr;
            curr = curr->next;
        }

        /* No reusable block: only now does running out of heap space matter. */
        if ((kalloc_ptr + total_req) > heap_end) {
            serial_writestring("Not enough memory!\n");
            return NULL;
        }

        /* Carve a new block at kalloc_ptr and append it to the list. */
        curr = (heap_block_t *)kalloc_ptr;
        curr->size = req;
        curr->is_freed = false;
        curr->next = NULL;
        if (!head)
            head = curr;
        else
            prev->next = curr;
        kalloc_ptr += total_req;
        return (void *)(curr + 1);
    }
kfree
Freeing memory is much simpler.
The user passes a pointer to the allocated memory.
To access the block header, I cast the pointer to heap_block_t * and subtract 1 from it (the header is placed just before the returned memory).
I mark the block as free.
If the freed block is the last one (i.e. node->next == NULL), I move kalloc_ptr back to reclaim space.
Here’s the implementation:
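    void kfree(void *ptr) {
        if (!ptr) return;
        heap_block_t *node = ((heap_block_t *)ptr) - 1;
        node->is_freed = true;
        if (node->next) return;
        /* Tail block: unlink it before rewinding kalloc_ptr, otherwise the
           list keeps a stale entry that the next kalloc would overwrite. */
        heap_block_t **link = &head;
        while (*link != node)
            link = &(*link)->next;
        *link = NULL;
        kalloc_ptr -= node->size + sizeof(heap_block_t);
    }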
Alignment
This part was honestly confusing at first. I always knew alignment happened “under the hood,” and I assumed the compiler took care of it. But working on my kernel project made it unavoidable.
For this project, I had to care about alignment because:
The heap had to be aligned to the page size, which is essential for paging and virtual memory (both of which I’ll add later).
Each allocation had to be aligned to 8 bytes, a common choice on 32-bit systems that avoids unaligned memory access penalties.
The alignment formula I used is from this Wikipedia article:
    padding = (align - (offset & (align - 1))) & (align - 1)
            = -offset & (align - 1)
    aligned = (offset + (align - 1)) & ~(align - 1)
            = (offset + (align - 1)) & -align
In my code, I implemented it like this:
    #define ALIGNUP(offset, align) (((offset) + ((align) - 1)) & ~((align) - 1))
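A few concrete numbers make the bit trick easier to trust. This sketch restates both formulas as functions (align_up and align_padding are names chosen for illustration) and assumes, as the formulas do, that align is a power of two.

```c
#include <stdint.h>

/* Round offset up to the next multiple of align (align must be a power of two). */
uint32_t align_up(uint32_t offset, uint32_t align) {
    return (offset + (align - 1u)) & ~(align - 1u);
}

/* Number of padding bytes needed to reach the next aligned offset. */
uint32_t align_padding(uint32_t offset, uint32_t align) {
    return (0u - offset) & (align - 1u);
}
```

For instance, a 13-byte request with 8-byte alignment becomes 16 bytes (3 bytes of padding), while an already aligned value such as 16 is left unchanged. An address like 0x100001 rounded up to a 4 KiB page boundary becomes 0x101000.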
Debugging
I believe the most difficult part of writing C code is debugging. In my kernel project, I had to check many registers and figure out whether their values made sense.
- Checking that, after initializing the GDT, the segment register values are correct and match how I set them up.
- Checking if interrupts are enabled. For this we have to check the EFLAGS register:
`EFL` means `EFLAGS` which is: `0000 0000 0000 0000 0000 0010 0000 0110` here.
The `IF` flag, which shows whether interrupts are enabled, is bit 9 counting from 0 (mask `0x200`). In this value bit 9 is `1`, so interrupts are enabled; if it were `0`, they would be disabled.
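Counting bits by eye in a 32-bit dump is error-prone, so a mask is safer. A minimal sketch, with interrupts_enabled as an illustrative helper name:

```c
#include <stdbool.h>
#include <stdint.h>

#define EFLAGS_IF (1u << 9) /* interrupt enable flag: bit 9, mask 0x200 */

/* True when the IF bit is set in a raw EFLAGS value. */
bool interrupts_enabled(uint32_t eflags) {
    return (eflags & EFLAGS_IF) != 0;
}
```

A value such as 0x202 has IF set, while 0x002 does not; bit 1 is a reserved bit that always reads as 1, which is why EFLAGS is never exactly zero.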
- Check if the PIC is running correctly. The PIC has three registers:
- `IRR`: Interrupt Request Register (pending interrupts)
- `ISR`: In-Service Register (interrupts currently being handled)
- `IMR`: Interrupt Mask Register (masked/disabled IRQ lines)
1. If `IRR` is non-zero but `ISR` is zero, it means interrupts are being raised but the CPU isn’t acknowledging them.
2. If you see `ISR` non-zero forever, it means you forgot to send an `EOI` to the `PIC` after handling the interrupt.
3. If you see `IMR` masking all lines, then no new interrupts will come in.
For example, one time I had a `General Protection Fault` exception, and I got these values for `IRR`, `ISR`, and `IMR`:
*Before pressing a key:*
The first bit of `IRR` is 1, which means `IRQ0` has fired.
*After pressing a key:*
The first and second bits of `IRR` are 1, so `IRQ0` and `IRQ1` have fired.
But because of an error, neither of them is being acknowledged by the CPU.
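The three rules of thumb for `IRR`, `ISR`, and `IMR` can be written down as a tiny classifier. This is only a sketch for reading a single snapshot of one PIC's 8-bit registers; the enum and function names are hypothetical, and a non-zero `ISR` only points to a missing `EOI` if it stays set across several snapshots.

```c
#include <stdint.h>

typedef enum {
    PIC_IDLE,        /* nothing pending, nothing in service               */
    PIC_NOT_ACKED,   /* IRR != 0, ISR == 0: raised but not acknowledged   */
    PIC_IN_SERVICE,  /* ISR != 0: if it never clears, suspect missing EOI */
    PIC_ALL_MASKED   /* IMR == 0xFF: every IRQ line is masked             */
} pic_state_t;

/* Returns 1 if the given IRQ line (0-7) is set in a PIC register value. */
int pic_bit_set(uint8_t reg, uint8_t irq) {
    return (reg >> (irq & 7)) & 1;
}

/* Classify one snapshot of IRR, ISR, and IMR (hypothetical helper). */
pic_state_t pic_diagnose(uint8_t irr, uint8_t isr, uint8_t imr) {
    if (imr == 0xFF) return PIC_ALL_MASKED;
    if (isr != 0)    return PIC_IN_SERVICE;
    if (irr != 0)    return PIC_NOT_ACKED;
    return PIC_IDLE;
}
```

With `IRR` = 0x03 and `ISR` = 0 (the "after pressing a key" situation), both IRQ0 and IRQ1 are pending and the snapshot falls into the "not acknowledged" case.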
What I'd Like to Add Next
- Virtual memory and paging
- printf and basic stdio-like functions
- Multitasking between dummy tasks
- Porting my mini shell into the kernel
What I Learned
This project taught me more than university courses. Key takeaways:
- Assembly for OS dev
- Boot flow and GRUB
- GDT and IDT setup
- ISRs and IRQs
- PIC remapping
- CPU modes
- Inline assembly
- Cross compilation
- Multiboot headers
- PIT and timers
- ELF structure
- Memory alignment and heap allocators
- VGA and serial debugging
- GDB with QEMU
I'm Open to Feedback
If you've worked on kernels or OS dev and have feedback, suggestions, or corrections, I'd really appreciate it. Especially if you can point out where my understanding may be off or recommend what to explore next.
References
Books
- Operating Systems: Three Easy Pieces — Remzi & Andrea Arpaci-Dusseau
- Operating System Concepts — Silberschatz, Galvin
OSDev Wiki
- Multiboot
- Bare Bones
- Interrupts Tutorial
- Interrupts
- GDT Tutorial
- Interrupt Descriptor Table
- 8259 PIC
- Serial Ports
- PS/2 Keyboard
- PIT
Other Resources
- PIC - ScienceDirect
- Heap on StackOverflow
- Multiboot Spec
- Data Alignment - Wikipedia
- r/kernel on sbrk
- Bran's Kernel Dev