<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Sandu Bogdan</title>
    <description>The latest articles on DEV Community by Sandu Bogdan (@bogdansandu).</description>
    <link>https://dev.to/bogdansandu</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3900913%2F909a858a-f638-4b3d-b1ff-ca031e44b72d.png</url>
      <title>DEV Community: Sandu Bogdan</title>
      <link>https://dev.to/bogdansandu</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/bogdansandu"/>
    <language>en</language>
    <item>
      <title>Making a hello-world program in kernel space</title>
      <dc:creator>Sandu Bogdan</dc:creator>
      <pubDate>Mon, 27 Apr 2026 19:26:52 +0000</pubDate>
      <link>https://dev.to/bogdansandu/making-a-hello-world-program-in-kernel-space-48ph</link>
      <guid>https://dev.to/bogdansandu/making-a-hello-world-program-in-kernel-space-48ph</guid>
      <description>&lt;h2&gt;
  
  
  Notes
&lt;/h2&gt;

&lt;p&gt;This article assumes a single-core RV64I RISC-V CPU with OpenSBI firmware. Device addresses, such as the UART's, are those of QEMU's &lt;code&gt;virt&lt;/code&gt; machine.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting booted by OpenSBI
&lt;/h2&gt;

&lt;p&gt;OpenSBI (the most common firmware used by RISC-V machines) typically loads itself at &lt;code&gt;0x80000000&lt;/code&gt;, the start of RAM on QEMU's &lt;code&gt;virt&lt;/code&gt; machine. The address space below that is used by the boot ROM and MMIO&lt;sup&gt;1&lt;/sup&gt; devices. To keep your kernel from overlapping OpenSBI's code and data, you would typically load it at an address such as &lt;code&gt;0x80200000&lt;/code&gt;, 2MB above the start of RAM. (Bonus: because that address is 2MB-aligned, once you set up paging you can map the kernel with large pages&lt;sup&gt;2&lt;/sup&gt;. These reduce TLB misses, which can make code faster.)&lt;/p&gt;

&lt;h6&gt;
  
  
  &lt;strong&gt;1&lt;/strong&gt;: MMIO, or Memory-mapped I/O, is the technique of mapping device registers into the physical address space. You communicate with external devices by reading and writing ordinary memory addresses. x86-64 also provides port I/O through dedicated instructions (&lt;code&gt;inb&lt;/code&gt;/&lt;code&gt;outb&lt;/code&gt;), but RISC-V has no separate I/O address space, so all device access goes through MMIO.
&lt;/h6&gt;

&lt;h6&gt;
  
  
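  &lt;strong&gt;2&lt;/strong&gt;: With RISC-V's Sv39 paging scheme, pages come in sizes of 4KB, 2MB ("megapages"), and 1GB ("gigapages"). These are the only sizes the MMU (Memory Management Unit) supports, and each page must be aligned to its own size. "Large pages" here means 2MB pages, which are often used for large memory regions in order to minimise TLB misses.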
&lt;/h6&gt;
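
&lt;p&gt;To make the alignment bonus concrete, here is a small host-side sketch in C. It assumes the Sv39 paging scheme (4KB base pages and three levels of 9-bit page-table indices) and an identity mapping, where virtual addresses equal physical ones. &lt;code&gt;is_aligned&lt;/code&gt; and &lt;code&gt;sv39_vpn&lt;/code&gt; are hypothetical helpers, not part of any library.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/* Sketch, assuming Sv39: 12 page-offset bits, then three 9-bit indices,
 * VPN[0] (4KB pages), VPN[1] (2MB megapages), VPN[2] (1GB gigapages). */
#define PAGE_2M 0x200000ULL
#define PAGE_1G 0x40000000ULL

/* Nonzero if addr is a multiple of size. */
int is_aligned(unsigned long long addr, unsigned long long size)
{
    return addr % size == 0;
}

/* Page-table index used at the given level (2 = root table, 0 = last level).
 * Shift past the page offset and the lower levels, then keep 9 bits. */
unsigned sv39_vpn(unsigned long long va, int level)
{
    return (unsigned)((va >> (12 + 9 * level)) % 512);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;For &lt;code&gt;0x80200000&lt;/code&gt;, &lt;code&gt;is_aligned&lt;/code&gt; reports 2MB alignment but not 1GB alignment. A 2MB megapage can therefore begin exactly at the kernel's load address. A 1GB gigapage would have to begin at &lt;code&gt;0x80000000&lt;/code&gt; and would also cover OpenSBI. &lt;code&gt;sv39_vpn&lt;/code&gt; returns 2, 1 and 0 for levels 2, 1 and 0. The kernel's megapage would therefore be a leaf entry at index 1 of the level-1 table reached through root entry 2.&lt;/p&gt;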

&lt;h2&gt;
  
  
  Preparing to jump into C code
&lt;/h2&gt;

&lt;p&gt;Unfortunately, compiled C code expects a valid stack pointer and a zeroed &lt;code&gt;.bss&lt;/code&gt; section, so we have to set those up ourselves in assembly before we can run any C code.&lt;/p&gt;

&lt;h3&gt;
  
  
  Setting up the stack
&lt;/h3&gt;

&lt;p&gt;Setting up the stack is fairly simple. First, we define a section that reserves memory for the stack. Because the stack grows downward, the label goes at its top:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;.section .stack
.global stack_top
.balign 16  # Use 128-bit alignment (yes this is necessary)
.space 16384  # 16KB stack
stack_top:
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We'll tell the linker where to put the &lt;code&gt;.stack&lt;/code&gt; section later, in our linker script.&lt;/p&gt;
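
&lt;p&gt;The &lt;code&gt;stack_top&lt;/code&gt; label deliberately comes after the &lt;code&gt;.space&lt;/code&gt; directive. &lt;code&gt;sp&lt;/code&gt; starts one past the end of the reserved block, and each function call subtracts its frame size from it. The host-side sketch below models that layout with a plain array (&lt;code&gt;stack_area&lt;/code&gt; and &lt;code&gt;stack_top_model&lt;/code&gt; are hypothetical names). The block starts 16-byte aligned and its size is a multiple of 16, so its top is 16-byte aligned too.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/* Host-side model of the .stack section: 16KB, 16-byte aligned. */
#define STACK_SIZE 16384

/* The size must stay a multiple of 16 so the top keeps the base's alignment. */
_Static_assert(STACK_SIZE % 16 == 0, "stack size must be a multiple of 16");

/* Plays the role of ".balign 16" followed by ".space 16384". */
static _Alignas(16) unsigned char stack_area[STACK_SIZE];

/* Plays the role of the stack_top label: one past the last reserved byte.
 * Because the stack grows downward, the first push stores below this address. */
unsigned char *stack_top_model(void)
{
    return stack_area + STACK_SIZE;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;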

&lt;h3&gt;
  
  
  Preparing an entry stub in Assembly
&lt;/h3&gt;

&lt;p&gt;This is the code OpenSBI will jump to.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;.section .text
.global _start
.extern stack_top

.extern kernel_main

.section .text.entry  # A dedicated section, so the linker script can place _start at the very start of the kernel image
.balign 8  # 64-bit alignment
_start:
    # Zero out the .bss section, 32 bits at a time
    la t0, __bss_start
    la t1, __bss_end
    bss_loop:
        bgeu t0, t1, bss_done  # Unsigned compare, since these are addresses
        sw zero, 0(t0)
        addi t0, t0, 4
        j bss_loop
    bss_done:

    # Load the stack pointer
    la sp, stack_top

    call kernel_main  # This is what jumps to our C code
# If kernel_main returns, spin indefinitely
spin:
    j spin
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
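
&lt;p&gt;For readers more comfortable with C, the loop at the top of &lt;code&gt;_start&lt;/code&gt; does the same thing as the sketch below. &lt;code&gt;zero_words&lt;/code&gt; is a hypothetical name, and it takes a word count instead of an end pointer. In &lt;code&gt;_start&lt;/code&gt; the loop is written in assembly because it runs before &lt;code&gt;sp&lt;/code&gt; is loaded, and compiled C code generally needs a valid stack.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/* C equivalent of the .bss loop in _start: store a 32-bit zero into each
 * of count consecutive words, beginning at start.
 * Assumes unsigned int is 32 bits, as it is on RISC-V and common hosts. */
void zero_words(unsigned int *start, unsigned long count)
{
    while (count--)
        *start++ = 0;  /* same effect as "sw zero, 0(t0)" then "addi t0, t0, 4" */
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;In the kernel, the count would be &lt;code&gt;(__bss_end - __bss_start) / 4&lt;/code&gt;, with the symbols declared as &lt;code&gt;extern char __bss_start[], __bss_end[];&lt;/code&gt;. The linker script's &lt;code&gt;ALIGN(4)&lt;/code&gt; keeps that division exact. If you do move this loop into C, GCC may replace it with a call to &lt;code&gt;memset&lt;/code&gt;. The GCC manual says even freestanding environments must provide &lt;code&gt;memset&lt;/code&gt;, so with &lt;code&gt;-nostdlib&lt;/code&gt; you would need to supply your own.&lt;/p&gt;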



&lt;h3&gt;
  
  
  Making the linker file
&lt;/h3&gt;

&lt;p&gt;This is going to look a lot like dark magic, but I am going to explain as much as possible using comments.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;OUTPUT_ARCH(riscv)  /* Tell ld that we want to use RISC-V */
ENTRY( _start )  /* Tell ld that our program should start at _start */

MEMORY
{
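    RAM (rwxa) : ORIGIN = 0x80200000, LENGTH = 1G  /* This is where usable memory starts. The attributes (readable, writable, executable, allocatable) only tell ld which sections without an explicit region may be placed here; they don't set any hardware permissions */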
    STACKRAM (rw) : ORIGIN = 0xc0200000, LENGTH = 16K  /* A separate region for the stack, 1GB above the RAM region's origin. Once you set up paging, you can leave the page below it unmapped as a guard page, so a stack overflow faults instead of silently corrupting memory */
}

SECTIONS
{
    . = 0x80200000;  /* Remember that magic load address? */
    .text : {
        /* This is our code */
        *(.text.entry)  /* make sure _start is first */
        *(.text .text.*)
    } &amp;gt; RAM

    .rodata : {
        /* This is our read-only data */
        . = ALIGN(16);
        *(.rodata .rodata.*)
        *(.srodata .srodata.*)  /* GCC may put small constants here on RISC-V */
    } &amp;gt; RAM

    .data : {
        . = ALIGN(16); /* More alignment (misaligned accesses on RISC-V may trap or be slowly emulated) */
        *(.data .data.*)
        *(.sdata .sdata.*)  /* Small initialised data, which GCC emits on RISC-V */
    } &amp;gt; RAM

    /* Uninitialised variables */
    .bss : {
        . = ALIGN(16);
        __bss_start = .;  /* Tell it where to start */
        *(.bss .bss.*)
        *(.sbss .sbss.*)  /* Small uninitialised data, so it gets zeroed too */
        *(COMMON)
        . = ALIGN(4);
        __bss_end = .;  /* Tell it where to end */
    } &amp;gt; RAM

    /* Our stack section from earlier */
    .stack (NOLOAD) : {  /* (NOLOAD) reserves the addresses without loading any contents from the binary */
        *(.stack)
    } &amp;gt; STACKRAM
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
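
&lt;p&gt;&lt;code&gt;ALIGN&lt;/code&gt; is less magical than it looks: it rounds the location counter (&lt;code&gt;.&lt;/code&gt;) up to the next multiple of its argument. Here is a minimal sketch of that rounding (&lt;code&gt;align_up&lt;/code&gt; is a hypothetical name, not an ld function).&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/* Round addr up to the next multiple of n (n must be nonzero).
 * An address that is already a multiple of n is returned unchanged. */
unsigned long long align_up(unsigned long long addr, unsigned long long n)
{
    return (addr + n - 1) / n * n;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Suppose &lt;code&gt;.data&lt;/code&gt; happened to end at &lt;code&gt;0x80201234&lt;/code&gt;. The &lt;code&gt;. = ALIGN(16);&lt;/code&gt; at the top of &lt;code&gt;.bss&lt;/code&gt; would then move the location counter to &lt;code&gt;0x80201240&lt;/code&gt;, and &lt;code&gt;__bss_start&lt;/code&gt; would receive that value.&lt;/p&gt;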



&lt;h2&gt;
  
  
  Making a C entrypoint
&lt;/h2&gt;

&lt;p&gt;Now that we got all the setup issues out of the way, let's make a basic hello world program in C.&lt;/p&gt;

&lt;h3&gt;
  
  
  Making a UART driver
&lt;/h3&gt;
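
&lt;p&gt;QEMU's &lt;code&gt;virt&lt;/code&gt; machine exposes an NS16550A-compatible UART at &lt;code&gt;0x10000000&lt;/code&gt;. To send a byte, we poll the Line Status Register (LSR) until bit 5 reports that the Transmit Holding Register (THR) is empty. Then we write the byte to the THR.&lt;/p&gt;
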
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// uart.h
typedef unsigned char uchar;

void uart_putc(uchar c);
void uart_puts(const uchar* s);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="c1"&gt;// uart.c&lt;/span&gt;
&lt;span class="cp"&gt;#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;stdint.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
#include&lt;/span&gt; &lt;span class="cpf"&gt;"uart.h"&lt;/span&gt;&lt;span class="cp"&gt;
&lt;/span&gt;
&lt;span class="cp"&gt;#define UART_BASE 0x10000000  // Note: This can change between machines
&lt;/span&gt;
&lt;span class="cp"&gt;#define UART_THR  0x00 // Write Offset
#define UART_LSR  0x05 // Is it occupied?
&lt;/span&gt;
&lt;span class="cp"&gt;#define LSR_TX_IDLE 0x20
&lt;/span&gt;
&lt;span class="cp"&gt;#define UART_REG(reg) ((volatile uint8_t *)(UART_BASE + reg))  // A helper to get UART addresses as volatile (writes immediately)
&lt;/span&gt;
&lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;uart_putc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;uchar&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Wait until we can write to UART again&lt;/span&gt;
    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;UART_REG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;UART_LSR&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;LSR_TX_IDLE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="n"&gt;__asm__&lt;/span&gt; &lt;span class="k"&gt;volatile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"fence o, o"&lt;/span&gt; &lt;span class="o"&gt;:::&lt;/span&gt;&lt;span class="s"&gt;"memory"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;UART_REG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;UART_THR&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;uart_puts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="n"&gt;uchar&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;uart_putc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
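
&lt;p&gt;Because &lt;code&gt;uart_putc&lt;/code&gt; hard-codes &lt;code&gt;UART_BASE&lt;/code&gt;, it can only run on the target. One way to exercise the same polling logic on a host is to pass the register block in as a pointer, as in the sketch below. &lt;code&gt;uart_putc_at&lt;/code&gt; and &lt;code&gt;uart_puts_at&lt;/code&gt; are hypothetical names, and the register offsets and status bit are the ones from &lt;code&gt;uart.c&lt;/code&gt;. The &lt;code&gt;fence&lt;/code&gt; is left out because it is a RISC-V instruction.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/* Host-testable variant of uart_putc/uart_puts: base points at the
 * NS16550A register block instead of being fixed at UART_BASE. */
#define UART_THR     0x00  /* Transmit Holding Register */
#define UART_LSR     0x05  /* Line Status Register */
#define LSR_THRE_BIT 5     /* LSR bit 5 (0x20): THR empty */

void uart_putc_at(volatile unsigned char *base, unsigned char c)
{
    /* (lsr >> 5) % 2 extracts bit 5; spin while it is clear (transmitter busy). */
    while (((base[UART_LSR] >> LSR_THRE_BIT) % 2) == 0)
        ;
    base[UART_THR] = c;
}

void uart_puts_at(volatile unsigned char *base, const char *s)
{
    while (*s)
        uart_putc_at(base, (unsigned char)*s++);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;On the target you would call &lt;code&gt;uart_puts_at((volatile unsigned char *)0x10000000, "Hello")&lt;/code&gt;. On a host, an 8-byte array with bit 5 of element 5 set stands in for an idle UART, and the last byte written ends up in element 0.&lt;/p&gt;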



&lt;h3&gt;
  
  
  (Finally) a hello world program
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="cp"&gt;#include&lt;/span&gt; &lt;span class="cpf"&gt;"uart.h"&lt;/span&gt;&lt;span class="cp"&gt;
&lt;/span&gt;
&lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;kernel_main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;uart_puts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Hello from Kernel World!"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Congrats! You've now written your first hello world program in kernel space, with only ~100 lines of code.&lt;/p&gt;

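&lt;p&gt;You can now run it in a QEMU VM to see if it works! The linker script places the stack at &lt;code&gt;0xc0200000&lt;/code&gt;, more than 1GB above the start of RAM at &lt;code&gt;0x80000000&lt;/code&gt;. Give the VM something like 2GB of RAM, or move &lt;code&gt;STACKRAM&lt;/code&gt; lower in the linker script if you want a smaller VM.&lt;/p&gt;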
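
&lt;p&gt;The RAM requirement follows from simple arithmetic on the linker script. RAM on QEMU's &lt;code&gt;virt&lt;/code&gt; machine starts at &lt;code&gt;0x80000000&lt;/code&gt;, and &lt;code&gt;STACKRAM&lt;/code&gt; ends 16KB above &lt;code&gt;0xc0200000&lt;/code&gt;. The sketch below (&lt;code&gt;min_guest_ram&lt;/code&gt; is a hypothetical helper) computes the smallest amount of RAM for which that whole region exists.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/* Values from the linker script, plus the start of RAM on QEMU's virt machine. */
#define DRAM_BASE       0x80000000ULL
#define STACKRAM_ORIGIN 0xc0200000ULL
#define STACKRAM_LENGTH (16ULL * 1024)

/* Bytes of guest RAM needed for the whole STACKRAM region to exist. */
unsigned long long min_guest_ram(void)
{
    return STACKRAM_ORIGIN + STACKRAM_LENGTH - DRAM_BASE;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This comes to &lt;code&gt;0x40204000&lt;/code&gt; bytes, slightly more than 1GB, so &lt;code&gt;-m 1G&lt;/code&gt; is too small while &lt;code&gt;-m 2G&lt;/code&gt; leaves plenty of headroom. Treat the result as a lower bound only: QEMU and OpenSBI also place data of their own in RAM (QEMU puts the device tree there, for instance).&lt;/p&gt;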

&lt;p&gt;If you're stuck, here is how I built and ran it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;riscv64-unknown-elf-gcc main.c uart.c entry.S stack.S -T linker.ld -o kernel.elf -mcmodel=medany -ffreestanding -nostdlib -nostartfiles
qemu-system-riscv64 -machine virt -cpu rv64 -m 2G -nographic -bios default -kernel kernel.elf
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Next Steps
&lt;/h2&gt;

&lt;p&gt;Now that you have a Hello World program up and running, here are a few challenges left as exercises for the reader:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Making a printf() implementation&lt;/li&gt;
&lt;li&gt;Adding support for paging&lt;/li&gt;
&lt;li&gt;Adding a basic fault handler&lt;/li&gt;
&lt;/ol&gt;
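
&lt;p&gt;As a starting point for the first challenge, here is a minimal sketch of the piece a kernel &lt;code&gt;printf()&lt;/code&gt; needs most: turning an integer into digits without libc. &lt;code&gt;format_unsigned&lt;/code&gt; is a hypothetical helper. A &lt;code&gt;%u&lt;/code&gt; or &lt;code&gt;%x&lt;/code&gt; handler could format into a small buffer and pass it to &lt;code&gt;uart_puts&lt;/code&gt;, cast to &lt;code&gt;const uchar *&lt;/code&gt; since that is the header's parameter type.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/* Hypothetical printf() building block: write v in the given base (2 to 16)
 * into buf as a NUL-terminated string and return its length.
 * buf needs room for 65 bytes (64 binary digits plus the NUL). */
int format_unsigned(unsigned long long v, unsigned base, char *buf)
{
    static const char digits[] = "0123456789abcdef";
    char tmp[64];
    int n = 0;
    int len;
    int i = 0;

    /* Produce digits least-significant first; do/while so 0 becomes "0". */
    do {
        tmp[n++] = digits[v % base];
        v /= base;
    } while (v != 0);

    /* Copy them out in reverse, most-significant digit first. */
    len = n;
    while (n != 0)
        buf[i++] = tmp[--n];
    buf[i] = '\0';
    return len;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;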

&lt;h6&gt;
  
  
  Note: All code above is licensed under the &lt;strong&gt;MIT License&lt;/strong&gt;, so feel free to use it!
&lt;/h6&gt;

</description>
      <category>riscv</category>
      <category>kernel</category>
      <category>osdev</category>
      <category>programming</category>
    </item>
  </channel>
</rss>
