DEV Community

Andelf

Bare-Metal Embedded Programming on K230 Using Rust

This article was translated into English with the help of ChatGPT. The original Chinese version is at 基于 Rust 的 K230 裸机嵌入式编程

Difficulty: Intermediate. Readers should have foundational knowledge of embedded systems and Rust embedded development.

This article documents bare-metal development in Rust on the K230 chip. It covers analyzing the boot process, working out the firmware format, writing a bare-metal Rust program, completing the initialization code, controlling real peripherals, and some ideas for making later development more convenient.

Relevant code repository for this article: k230-bare-metal. It is recommended to refer to earlier commits such as e15968040 for better understanding.

Project Background

Previously, I received a review opportunity for the Luchan Pi K230-CanMV development board from LCSC (part of the same group as JLC). I also own a CanMV-K230 development board.

The K230 is an AIoT SoC from Canaan Technology built on a heterogeneous architecture with dedicated acceleration units. It integrates two RISC-V cores and an AI subsystem, the KPU (Knowledge Process Unit). Judging by the timeline, it is probably one of the earliest chips on the market to support the RVV 1.0 vector extension. Main features include:

  • Dual-core RISC-V processor:
    • Core 0: 64-bit RISC-V (RV64GCB), 800MHz
    • Core 1: 64-bit RISC-V, 1.6GHz, supports RVV 1.0 vector extension
  • Dedicated acceleration units:
    • KPU: AI inference accelerator, supports INT8/INT16
    • DPU: 3D structured light depth calculation unit
    • VPU: Video codec, supports 4K resolution
  • Rich peripheral interfaces:
    • Communication interfaces: UART×5, I2C×5, SPI×3
    • Storage interfaces: USB 2.0×2, SD/eMMC
    • Others: GPIO×72, PWM×6, WDT, RTC, Timer

Under normal usage, the development board runs the CanMV firmware, which is compatible with OpenMV, providing a very convenient development environment for developers.

The firmware is based on RT-Thread Smart (RT-Smart), which is a version of RT-Thread that supports user-space applications, suitable for SoCs with MMU, such as the K230. CanMV is implemented as an application (a fork of MicroPython) on RT-Thread.

Additionally, early versions of the CanMV firmware used Linux + RT-Thread + MicroPython. The official sources also provide a pure Linux version of the firmware.

This project aims to explore:

  1. The differences in startup methods and usage modes between MPUs and MCUs
  2. How to use Rust for bare-metal development on MPU chips
  3. The underlying startup mechanism and hardware features of the K230

For MPUs and most MCUs, there is an on-chip Boot ROM used to start the system. Typically, the Boot ROM initializes some hardware (e.g., SPI Flash, TF Card), loads the firmware into memory, and then executes the first instruction of the system firmware (such as U-Boot). Subsequently, the system firmware provided by the user further initializes more hardware and loads the actual operating system.

Bare-metal development refers to running programs directly on the hardware without using an operating system, similar to how an MCU runs directly after the system's Boot ROM.

Boot Code Analysis

First, we need to read the code from the official CanMV repository to determine if there are any non-open-source parts, especially core components like U-Boot and RT-Thread/Linux drivers.

For U-Boot, we also need to confirm whether the Secondary Program Loader (SPL) is open-source. SPL is often used to initialize peripherals like DDR and to load U-Boot. Many manufacturers do not open-source it and only provide binary files.

Note: SPL literally means Secondary Program Loader. Boot ROM is generally considered the first-stage loader.

The good news is that the relevant code is all in the CanMV repository and open-source. However, the code structure is relatively complex, requiring some time to read and analyze the specific startup processes and logic.

With the advent of ChatGPT, we can complete code analysis more quickly. I once joked that if ChatGPT had appeared a few years earlier, many toolchains would not need to exist.

Here, we only consider the TF card startup scenario, where the system firmware is on the TF card, and the on-chip Boot ROM loads the firmware into memory. That is, our program needs to perform the same tasks as U-Boot, including the functions of SPL.

Note: TF card, SD card, and eMMC are essentially the same at the protocol level. This article does not make strict distinctions among them.

From Power-On Reset to Loading and Executing User Firmware

First, the Boot ROM loads the firmware into memory. This logic is baked into the chip's Boot ROM and cannot be modified or influenced by the user. The Boot ROM determines the boot method by reading the BOOT0 and BOOT1 pins; the voltage levels of these two pins decide which medium the chip loads the boot program from at startup.

According to the chip manual, the Boot ROM is memory-mapped at 0x9120_0000 ~ 0x9121_0000 and uses the first half of the SRAM, 0x8020_0000 ~ 0x8030_0000. This can be confirmed from a bare-metal program by reading registers such as sp and ra right at entry. For example, the Boot ROM sets the stack pointer sp to the highest address of the memory available to it. The Boot ROM typically uses a call instruction to transfer control to the user firmware, so ra holds the return address, which points into the Boot ROM function that performed the jump.

The Boot ROM loads the firmware (usually U-Boot) from the TF card according to a predetermined fixed format. Specifically, the Boot ROM accesses the TF card, reads the firmware area, decodes it, and copies it to the specified memory location 0x8030_0000.

After the firmware is loaded, the Boot ROM transfers control to the firmware just loaded into memory, i.e., it jumps to execute U-Boot. This marks the transition of the startup process from the Boot ROM stage to the firmware (U-Boot) stage. U-Boot, as a more powerful bootloader, can further initialize system hardware, load the operating system kernel such as RT-Thread or Linux Kernel, and execute other user-defined startup tasks.

U-Boot itself boots in two stages: SPL and U-Boot proper. SPL initializes peripherals such as DDR and then loads U-Boot proper. This article does not cover anything that runs after U-Boot (e.g., OpenSBI, RT-Thread Smart). In the firmware image these later stages exist as separate firmware partitions: U-Boot SPL loads U-Boot, and U-Boot then loads RT-Thread or the Linux kernel.

The K230 is equipped with two CPUs, referred to as CPU0 (small core) and CPU1 (big core). The two cores operate at different frequencies, and CPU1 supports the RVV 1.0 vector extension, constituting a heterogeneous multi-core architecture.

During startup, when the chip's reset signal is released, the Boot ROM starts executing on the small core. CPU0 is therefore the first core to run; it executes the initial boot program and performs basic system initialization. The big core is held in reset until the small core releases it: once the small core has finished its own initialization, it sets the big core's start address and releases its reset so that it starts running from that location. The small core is thus responsible both for booting the system and for starting the big core.

Firmware Format

To ensure our firmware is recognized by the Boot ROM, it needs to conform to a specific firmware format. Different SoC manufacturers have different solutions; some use fixed filenames on FAT32, some use fixed formats at specific offsets, and some use configuration files. The K230 uses a fixed offset firmware format.

The Boot ROM of the K230 identifies data characteristics at a fixed offset on the TF card, and firmware that meets the format will be loaded into memory. The Boot ROM has initialized UART0 and will output simple error messages, such as "boot failed with exit code 19" indicating that the TF card was not found, or "boot failed with exit code 13" indicating a firmware format error.

After analyzing the relevant compilation process, we deduced the firmware format of the K230 as follows:

00000000  +-------------+-------------+-------------+-------------+
          | ........... | ........... | ........... | ........... |  <- Partition table / any other data
          | ........... | ........... | ........... | ........... |
          +-------------+-------------+-------------+-------------+
00100000  | 4B 32 33 30 | 8C FC 02 00 | 00 00 00 00 | BF 8D 0F 38 |   <- Firmware header: "K230...........8"
          | MAGIC: K230 | Length      | Encryption  | SHA256 hash |   <- Encryption 0: none, 1: SM4, 2: AES+RSA
          +-------------+-------------+-------------+-------------+
00100010  | 03 F3 87 07 | FA 1B D8 1D | 4F A0 CD A0 | 7B 54 35 BD |   <- SHA256 hash continuation
          +-------------+-------------+-------------+-------------+
00100020  | 35 82 85 89 | 66 4D AC 27 | CA F8 56 49 | 00 00 00 00 |   <- SHA256 hash continuation + Padding
          +-------------+-------------+-------------+-------------+
00100030  | 00 00 00 00 | 00 00 00 00 | 00 00 00 00 | 00 00 00 00 |   <- Padding zeros
          +-------------+-------------+-------------+-------------+
          | ........... | ........... | ........... | ........... |   <- Padding zeros
          +-------------+-------------+-------------+-------------+
00100210  | 00 00 00 00 | 73 25 40 F1 | 2A 82 AE 84 | 93 01 00 00 |   <- Firmware data; the Length field counts from here
          | Version     | OpCodes     | Data        | Padding     |   <- Version: 0
          | ........... | ........... | ........... | ........... |   <- Firmware data, raw opcodes
          +-------------+-------------+-------------+-------------+

Relevant C structure definitions are located in CanMV at src/uboot/uboot/board/kendryte/common/board_common.h.
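
To make the layout concrete, here is a minimal sketch that parses the header fields back out of an image. The struct, constant, and function names are made up for illustration and are not taken from board_common.h; the offsets are the ones shown in the diagram above.

```rust
/// Offset of the firmware header on the TF card, per the diagram above.
pub const IMAGE_OFFSET: usize = 0x10_0000;
/// Size of the header: the payload (version word first) starts at 0x100210.
pub const HEADER_SIZE: usize = 0x210;

/// Illustrative view of the header fields (names are assumptions).
#[derive(Debug, PartialEq)]
pub struct FirmwareHeader {
    /// Length of version word + payload, in bytes.
    pub length: u32,
    /// 0: none, 1: SM4, 2: AES+RSA.
    pub encryption: u32,
    /// SHA-256 digest stored right after the encryption field.
    pub sha256: [u8; 32],
}

/// Read a little-endian u32 from the first four bytes of `b`.
fn le_u32(b: &[u8]) -> u32 {
    u32::from_le_bytes([b[0], b[1], b[2], b[3]])
}

/// Parse the header from a slice that starts at IMAGE_OFFSET.
/// Returns None if the slice is too short or the magic is wrong.
pub fn parse_header(raw: &[u8]) -> Option<FirmwareHeader> {
    if raw.len() < HEADER_SIZE || !raw.starts_with(b"K230") {
        return None;
    }
    let mut sha256 = [0u8; 32];
    sha256.copy_from_slice(&raw[12..44]);
    Some(FirmwareHeader {
        length: le_u32(&raw[4..8]),
        encryption: le_u32(&raw[8..12]),
        sha256,
    })
}
```

Running this on the first 0x210 bytes at IMAGE_OFFSET of a generated image is a quick way to check an image on the host before flashing it. It does not verify the hash itself, which would need a SHA-256 implementation.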

Here, we simplify the processing by not encrypting the firmware and using version number 0. We write a Python script to create the .img firmware file for the TF card image:

#!/usr/bin/env python3
# genimage.py

import hashlib

MAGIC = b"K230"

def sha256(message):
    digest = hashlib.sha256(message).digest()
    return digest

VERSION = b"\x00\x00\x00\x00"

with open("./firmware.bin", "rb") as f:
    data = f.read()

input_data = VERSION + data

data_len = len(input_data)
raw_data_len = data_len.to_bytes(4, byteorder="little")

encryption_type = 0
encryption_type = encryption_type.to_bytes(4, byteorder="little")

hash_data = sha256(input_data)

firmware = MAGIC + raw_data_len + encryption_type + hash_data

firmware += bytes(516 - 32)  # zero padding after the hash; header totals 12 + 32 + 484 = 0x210 bytes
firmware += input_data

img = bytes(0x100000) + firmware  # image offset 0x100000

# Ensure the image size is a multiple of 512 bytes
if len(img) % 512 != 0:
    img += bytes(512 - len(img) % 512)

with open("./firmware.img", "wb") as f:
    f.write(img)

print("len", len(img))

Where firmware.bin is generated via objcopy -O binary:

cargo objcopy --release -- -O binary firmware.bin && python3 genimage.py

Note that disk images are generally aligned to 512 bytes, so we need to pad to align to 512 bytes.
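
As a worked example of the size arithmetic, the following sketch mirrors what genimage.py does with lengths. The function names are hypothetical; the constants are the 1 MiB image offset, the 0x210-byte header, and the 4-byte version word from the script above.

```rust
/// Sector size that disk images are aligned to.
const SECTOR: usize = 512;

/// Round `len` up to the next multiple of the sector size.
pub fn pad_to_sector(len: usize) -> usize {
    (len + SECTOR - 1) / SECTOR * SECTOR
}

/// Final image size for a firmware.bin of `firmware_len` bytes:
/// image offset + header + version word + payload, padded to a sector.
pub fn image_len(firmware_len: usize) -> usize {
    pad_to_sector(0x10_0000 + 0x210 + 4 + firmware_len)
}
```

For instance, an empty payload still yields an image of 0x100400 bytes, because the 0x214 bytes of header and version word are rounded up to two sectors.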

Flashing the firmware can be done using any programming tool, including the dd command.

Start Writing Some Bare-Metal Code

With the firmware loading settled, the SoC control flow can be handed over to our program. Here, we use the Rust language to write a bare-metal program.

Essential elements for Rust bare-metal embedded development include:

  • Toolchain target: Install using rustup: rustup target add riscv64gc-unknown-none-elf
  • Linker script link.x: Used to define memory layout (can also directly define firmware layout)
  • Startup code: Used to initialize the stack, jump to Rust code, similar to start.S in C embedded development

From the relevant code reading, we know that the code in the TF card is loaded to 0x8030_0000 ~ 0x8040_0000. To avoid additional uncertainties, we can directly use the linker script from U-Boot to ensure the symbols defined in Rust code are properly loaded.

MEMORY { .spl_mem : ORIGIN = 0x80300000, LENGTH = 0x80000 }
MEMORY { .bss_mem : ORIGIN = 0x80380000, LENGTH = 0x20000 }

OUTPUT_ARCH("riscv")

ENTRY(_start)
PROVIDE(__stack_start__ = ORIGIN(.bss_mem) + LENGTH(.bss_mem));

/* Omitted specific section definitions */

Due to the lack of first-hand chip development materials, we do not know exactly what the initialized state is after the Boot ROM; at this time, we can only rely on speculation and experimentation.

Verifying Bare-Metal Execution - UART

For bare-metal programming, we first have to set up the processor's initial state ourselves: the stack pointer sp, the execution mode, the interrupt table, interrupt enables, and so on. These tasks are usually handled by start.S or crt0.c. Minimal initialization code often only needs to set sp so that functions can be called. If sp is invalid, anything that touches the stack (e.g., a function call) leads to a memory access violation or an illegal instruction exception, i.e., the program "runs wild."

Without a JTAG debugging environment (the chip supports it, but I didn't use CK-LINK), how do we determine whether our code is being executed and whether it is executing correctly? Here, we can use UART0 to output debugging information. Since the Boot ROM has already initialized UART0, we can use it directly.

From the Device Tree .dtsi files in the U-Boot source code, we can see that the K230 uses a lot of DesignWare IP peripherals, such as UART0, SPI, I2C, etc. The specific register manuals for these peripherals can be obtained online. The UART peripheral is compatible with the 16550, which is the serial port chip we're familiar with on PCs. The register address for UART0 is 0x9140_0000.

We can write a minimal entry point with global_asm! and output a character from Rust to verify that the firmware code is being executed. For example:

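#![no_std]
#![no_main]

use core::arch::{asm, global_asm};

// Entry point: set up the stack, then jump into Rust.
global_asm!(r#"
.section .text.start
.global _start
_start:
     la sp, __stack_start__
     call _start_rust
"#);

#[no_mangle]
pub extern "C" fn _start_rust() -> ! {
    loop {
        // UART0.THR = 'A' (raw MMIO writes are unsafe)
        unsafe { core::ptr::write_volatile(0x9140_0000 as *mut u32, 0x41) };

        // Crude busy-wait delay
        for _ in 0..100000000 {
            unsafe { asm!("nop") }
        }
    }
}

// A no_std binary must provide its own panic handler.
#[panic_handler]
fn panic(_info: &core::panic::PanicInfo) -> ! {
    loop {}
}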

After compiling and flashing the above code, you should see a series of A characters in the serial terminal. This indicates that our code has been successfully executed.

Accessing Peripheral Registers - PAC

In Rust embedded development, accessing peripheral registers is often done through PAC (Peripheral Access Crate), such as the stm32xxxx-pac crate. However, since the K230 is a relatively new chip, there is no relevant PAC crate available. The official sources are also unlikely to provide an SVD file for reference. Therefore, I chose to use the chiptool method and employed the yaml2pac tool to generate the PAC crate by manually maintaining YAML definitions of the peripheral registers. Regarding PAC access, please refer to my article Peripheral Register Access in Rust Embedded Development: From svd2rust to chiptool and metapac - Using hpm-data as an Example.

The relevant YAML files can be created conveniently with the help of an LLM (Large Language Model), working from OCR text extracted from the PDF manuals.

Using the yaml2pac tool, we can easily generate our own PAC library:

yaml2pac -i registers/uart_dw.yaml -o pac/src/uart_dw.rs

Then, add specific peripheral address definitions in lib.rs:

#[path = "uart_dw.rs"]
pub mod uart;

pub const UART0: uart::Uart = unsafe { uart::Uart::from_ptr(0x9140_0000 as *mut ()) };
pub const UART1: uart::Uart = unsafe { uart::Uart::from_ptr(0x9140_1000 as *mut ()) };
pub const UART2: uart::Uart = unsafe { uart::Uart::from_ptr(0x9140_2000 as *mut ()) };
pub const UART3: uart::Uart = unsafe { uart::Uart::from_ptr(0x9140_3000 as *mut ()) };

With simple encapsulation, we can conveniently access peripherals. Creating and maintaining a PAC in the absence of documentation is relatively difficult, but once completed, it can greatly improve development efficiency.
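
For readers who have not used a chiptool-style PAC, the following is a much simplified sketch of what such generated code boils down to: a register block is just a base pointer, and each accessor returns a small wrapper that performs volatile reads and writes. This is not the actual yaml2pac output. The Reg type here is reduced to raw u32 values, and the register offsets assume a 16550 layout with 4-byte register spacing (THR at 0x00, LSR at 0x14, THRE in bit 5).

```rust
/// One memory-mapped register holding a value of type `T` (simplified).
pub struct Reg<T: Copy> {
    ptr: *mut T,
}

impl<T: Copy> Reg<T> {
    /// Volatile read, so the compiler cannot cache or drop the access.
    pub fn read(&self) -> T {
        unsafe { self.ptr.read_volatile() }
    }
    /// Volatile write of a whole register value.
    pub fn write_value(&self, value: T) {
        unsafe { self.ptr.write_volatile(value) }
    }
    /// Read-modify-write helper.
    pub fn modify(&self, f: impl FnOnce(&mut T)) {
        let mut value = self.read();
        f(&mut value);
        self.write_value(value);
    }
}

/// Register block of a 16550-style UART (offsets are assumptions).
#[derive(Copy, Clone)]
pub struct Uart {
    ptr: *mut u8,
}

impl Uart {
    /// Safety: `ptr` must point at a valid UART register block.
    pub const unsafe fn from_ptr(ptr: *mut ()) -> Self {
        Self { ptr: ptr as *mut u8 }
    }
    /// Transmit holding register.
    pub fn thr(self) -> Reg<u32> {
        Reg { ptr: unsafe { self.ptr.add(0x00) as *mut u32 } }
    }
    /// Line status register.
    pub fn lsr(self) -> Reg<u32> {
        Reg { ptr: unsafe { self.ptr.add(0x14) as *mut u32 } }
    }
}

/// Blocking write of one byte: wait for THRE (LSR bit 5), then write THR.
pub fn putc(uart: Uart, c: u8) {
    while (uart.lsr().read() & (1 << 5)) == 0 {}
    uart.thr().write_value(c as u32);
}
```

Because the block only depends on a base pointer, the same code can be exercised on a host machine by pointing it at an ordinary array instead of 0x9140_0000, which is handy for testing driver logic without hardware.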

Facilitating Debugging - println! Macro

With the peripheral register definitions, we can now write a complete UART HAL driver or achieve a println! macro through simple register access.

#[derive(Debug)]
pub struct Console;

impl core::fmt::Write for Console {
    fn write_str(&mut self, s: &str) -> core::fmt::Result {
        use pac::UART0;

        for c in s.as_bytes() {
            unsafe {
                while !UART0.lsr().read().thre() {
                    asm!("nop");
                }

                UART0.thr().write(|w| w.set_thr(*c));
            }
        }

        Ok(())
    }
}

#[macro_export]
macro_rules! println {
    // The empty case comes first; the catch-all arm below would also match it.
    () => {
        {
            use core::fmt::Write;
            writeln!(&mut $crate::Console, "").unwrap();
        }
    };
    ($($arg:tt)*) => {
        {
            use core::fmt::Write;
            writeln!(&mut $crate::Console, $($arg)*).unwrap();
        }
    };
}

With the println! macro, we can conveniently output debugging information, significantly improving development efficiency.

Complete Initialization Code

So far, we've only initialized the stack; other essential elements such as the system interrupts and even the .bss section have not been initialized. In a complete embedded program, these are necessary.

Unlike on an MCU, on an MPU the Boot ROM loads the whole program image into a region of memory before running it, so the .data section copy commonly seen in an MCU's start.S is not needed. Whether the .bss section must be cleared depends on the situation; since it is relatively simple, we skip the memory initialization part in this section.
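
For completeness, clearing .bss amounts to writing zeros over a word-aligned address range. A minimal sketch follows; the function name is made up, and the linker symbols mentioned in the comment (__bss_start, __bss_end) are assumptions that would have to exist in the linker script.

```rust
/// Zero every 32-bit word in the half-open range [start, end).
///
/// Safety: the range must be valid, writable, 4-byte-aligned memory.
pub unsafe fn zero_words(mut start: *mut u32, end: *mut u32) {
    while start < end {
        unsafe {
            // Volatile so the loop is not replaced by a memset call,
            // which may not be usable this early in startup.
            start.write_volatile(0);
            start = start.add(1);
        }
    }
}

// On the target this would be called once, before any static is used,
// with the addresses of two linker-provided symbols, e.g.
// zero_words(addr_of_mut!(__bss_start), addr_of_mut!(__bss_end)).
```

The function takes plain pointers so that it can be checked on a host machine with an ordinary array.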

Interrupt Handler

For RISC-V, the interrupt handler is a special function. Rust provides the "riscv-interrupt-m" ABI specifically for the special logic of interrupt handlers. Specifically, it adds stack frame preservation and restoration for interrupt handlers and uses the mret instruction instead of ret to return.

#[link_section = ".trap"]
#[no_mangle]
unsafe extern "riscv-interrupt-m" fn _start_trap_rust() {
    println!("trap!");

    let mcause = riscv::register::mcause::read();
    println!("mstatus: {:016x}", riscv::register::mstatus::read().bits());
    println!("mcause:  {:016x}", riscv::register::mcause::read().bits());
    println!("mtval:   {:016x}", riscv::register::mtval::read());
    println!("mepc:    {:016x}", riscv::register::mepc::read());

    loop {}
}

Here, we print some important interrupt information to help determine whether the interrupt function is being called correctly.

Using #[no_mangle] is to expose the symbol so that we can set the interrupt handler entry address in assembly code.

Using #[link_section = ".trap"] places this function in the .trap section so that the linker script can control its placement, in particular its alignment (ALIGN(8)). Forgetting this is a common mistake in bare-metal code: the handler address written to mtvec must be aligned, because the lower 2 bits of the register are occupied by the vector mode bits; otherwise trap entry goes wrong and ends in an exception.
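
The alignment rule can be expressed as a small helper. This is a sketch with made-up names; the encoding follows the RISC-V privileged specification, where mtvec holds the handler base in its upper bits and MODE in bits 1:0 (0 = direct, 1 = vectored).

```rust
/// Trap vector mode stored in mtvec bits 1:0.
pub enum TrapMode {
    Direct = 0,
    Vectored = 1,
}

/// Compute the value to write to mtvec, or None if the handler
/// address is not 4-byte aligned and would collide with the mode bits.
pub fn mtvec_value(handler: usize, mode: TrapMode) -> Option<usize> {
    if (handler & 0b11) != 0 {
        return None;
    }
    Some(handler | mode as usize)
}
```

With a check like this in place, a misplaced handler shows up as a None at startup instead of as a confusing trap later on.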

For now, we don't need to handle interrupts; we just need to observe if interrupts are being triggered and whether the interrupt handler is being executed. So we use loop {}.

Interrupt Initialization

For RISC-V, initializing interrupts generally involves the following steps:

  • Set mtvec: Interrupt handler entry address
  • Set the MIE bit in mstatus: Enable interrupts
  • Set the MEIE, MTIE, and MSIE bits in mie: Enable external, timer, and software interrupts

The K230 uses a Xuantie C908 core, supporting CLINT and PLIC interrupt controllers. Relevant information can be obtained from the C908 manual.

global_asm!("
    .section .text.start
    .global _start
_start:
    .option push
    .option norelax
    la gp, __global_pointer$
    .option pop

    la t1, __stack_start__
    addi sp, t1, -16

    // Initialize interrupts
    la t0, _start_trap_rust
    csrw mtvec, t0

    call _early_init

    // Continue to call _start_rust
    call _start_rust
");

#[no_mangle]
unsafe extern "C" fn _early_init() {
    use riscv::register::*;

    mstatus::set_mie(); // Enable global interrupts
    mstatus::set_sie(); // Enable supervisor interrupts
    mie::set_mext();    // Enable external interrupts
    mie::set_msoft();   // Enable software interrupts
    mie::set_mtimer();  // Enable timer interrupts
}

The MIE bit in the mstatus register is the global machine-mode interrupt enable. The MEIE bit in the mie register (set by mie::set_mext()) enables external interrupts, i.e., those delivered by the PLIC for peripherals.
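
The relationship between these bits can be written down as plain arithmetic. The sketch below uses the bit positions from the RISC-V privileged specification; the constant and function names are made up, and the check only describes a hart running in M-mode with no interrupt delegation.

```rust
/// mstatus.MIE: global machine-mode interrupt enable.
pub const MSTATUS_MIE: u64 = 1 << 3;
/// mie/mip bit for machine software interrupts.
pub const MIE_MSIE: u64 = 1 << 3;
/// mie/mip bit for machine timer interrupts.
pub const MIE_MTIE: u64 = 1 << 7;
/// mie/mip bit for machine external interrupts (PLIC).
pub const MIE_MEIE: u64 = 1 << 11;

/// True if an interrupt would be taken in M-mode: the global enable
/// is set and at least one source is both enabled and pending.
pub fn machine_interrupt_taken(mstatus: u64, mie: u64, mip: u64) -> bool {
    (mstatus & MSTATUS_MIE) != 0 && (mie & mip) != 0
}
```

Enabling all three sources as _early_init does corresponds to mie = 0x888. If the trap handler never fires, comparing the printed mstatus, mie, and mip values against this rule is a quick first check.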

Here, we also initialize gp, which is a global pointer register used for accessing global variables in Rust (defined at a special location in the linker script). Of course, when using small and concentrated memory regions, you may not see instructions using the gp register.

Other CSR Initialization

Depending on the platform, other hardware may need initialization, such as disabling PMP, initializing the FPU, enabling mcycle and mtime counters, etc.

Initializing the FPU is necessary; otherwise, any floating-point instruction will cause an exception. Rust's "riscv-interrupt-m" implementation isn't intelligent enough to determine FPU usage, so when the target includes +f/+d, the ABI will default to using FPU push/pop instructions.

// Omitted platform-specific register initialization,
// including disabling PMP

// mstatus = 0x1800: set MPP (bits 12:11) to 0b11 (M-mode), clear all other bits
asm!(
    "
    li    t0, 0x00001800
    csrw  mstatus, t0
    "
);

mcounteren::set_cy(); // Enable cycle counter
mcounteren::set_tm(); // Enable time counter

// FPU initialization
mstatus::set_fs(mstatus::FS::Initial);
asm!("csrwi fcsr, 0");

Besides the interrupt enables, mstatus also holds privilege-mode state; for example, the MPP field records the previous privilege mode (M/S/U), which is the mode the CPU returns to on mret.
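
When debugging, it helps to decode a raw mstatus value into its fields. Here is a small sketch with made-up names; the bit positions (MIE at bit 3, MPIE at bit 7, MPP at bits 12:11, FS at bits 14:13) are those of the RISC-V privileged specification.

```rust
/// A few mstatus fields that matter during early bring-up.
#[derive(Debug, PartialEq)]
pub struct MstatusFields {
    /// Global machine interrupt enable.
    pub mie: bool,
    /// Interrupt enable saved on trap entry.
    pub mpie: bool,
    /// Previous privilege mode: 0 = U, 1 = S, 3 = M.
    pub mpp: u8,
    /// FPU state: 0 = Off, 1 = Initial, 2 = Clean, 3 = Dirty.
    pub fs: u8,
}

/// Extract the fields from a raw mstatus value.
pub fn decode_mstatus(raw: u64) -> MstatusFields {
    MstatusFields {
        mie: ((raw >> 3) & 1) == 1,
        mpie: ((raw >> 7) & 1) == 1,
        mpp: ((raw >> 11) & 0b11) as u8,
        fs: ((raw >> 13) & 0b11) as u8,
    }
}
```

Decoding the 0x1800 written above gives MPP = 3 (M-mode) with interrupts disabled and the FPU off, which is why the FPU has to be enabled explicitly afterwards.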

With the system's mcycle CSR, we can conveniently use the Delay trait in the embedded-hal ecosystem to achieve more precise delays, moving away from using nop.

const CPU0_CORE_CLK: u32 = 800_000_000;

let mut delay = riscv::delay::McycleDelay::new(CPU0_CORE_CLK);
delay.delay_ms(1000);

Verifying Interrupt Handling

We can verify whether the interrupt handler is being executed by directly triggering a software interrupt. The CLINT interrupt controller of the K230 can trigger a software interrupt through the msip register.

pac::CLINT.msip(0).write(|w| w.set_msip(true)); // Trigger software interrupt

Modify the interrupt handler _start_trap_rust to add a return:

if mcause.is_interrupt() && mcause.code() == riscv::interrupt::Interrupt::MachineSoft as _ {
    println!("Machine Software Interrupt");
    pac::CLINT.msip(0).write(|w| w.set_msip(false)); // Clear software interrupt
    return;
}

The mtime and mtimecmp registers of the CLINT can be used in the same way to verify timer interrupts. However, I found a pitfall: reading the K230 CLINT's mtime with a single 64-bit load instruction returns random content without raising any exception. The 64-bit mtime must therefore be read as two 32-bit halves and then combined. Only the rdtime instruction can read the 64-bit mtime at once.
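
Reading a running 64-bit counter in two halves needs a little care, because the low half can wrap between the two loads. A common pattern is to read high, low, high again, and retry if the high half changed. The sketch below takes the two register addresses as parameters rather than assuming where the K230 CLINT places mtime; the function name is made up.

```rust
/// Read a 64-bit counter exposed as two 32-bit registers.
///
/// Safety: both pointers must be valid for volatile 32-bit reads.
pub unsafe fn read_u64_split(lo: *const u32, hi: *const u32) -> u64 {
    loop {
        let (h1, l, h2) = unsafe {
            (hi.read_volatile(), lo.read_volatile(), hi.read_volatile())
        };
        // If the high half did not change, `l` belongs to the same epoch.
        if h1 == h2 {
            return ((h1 as u64) << 32) | l as u64;
        }
    }
}
```

On the target, lo and hi would be the addresses of the lower and upper words of mtime. The loop runs a second time only in the rare case where the low word overflowed between the reads.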

DDR Initialization

DDR initialization (SDRAM initialization) is a relatively complex process that generally involves clock initialization, reset controller configuration, PHY training, chip initialization, timing configuration, self-check, and so on. This code is usually provided directly by the manufacturer, and the sequence of register writes in it reads like an undocumented script.

Therefore, I translated the DDR initialization code directly from C using an LLM and do not explain it further here. The DDR initialization code differs between DDR chips.

After DDR initialization, we can use the DDR memory region. A pitfall here is that DDR memory starts at address 0x0000_0000. Rust treats address zero as the null pointer and places many restrictions on accessing it; most functions will panic outright. Programs should avoid using the address 0x0000_0000.
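
A tiny illustration of how Rust's pointer types treat address zero, using the standard NonNull type. The helper name is hypothetical.

```rust
use std::ptr::NonNull;

/// Turn a physical address into a pointer, refusing address 0,
/// which Rust reserves as the null pointer.
pub fn ddr_ptr(addr: usize) -> Option<NonNull<u32>> {
    NonNull::new(addr as *mut u32)
}
```

In a no_std program the same type is available as core::ptr::NonNull. Starting data structures and load addresses a little above zero, for example at 0x0100_0000 as the later examples in this article do, sidesteps the issue entirely.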

Start Real Bare-Metal Programming

With the above initialization foundation, we can finally start actual bare-metal programming—for example, initializing other peripherals, reading and writing peripheral registers, and even implementing some simple functions.

Here, we demonstrate with two peripherals as examples. The relevant peripheral register definitions are already written in the k230-bare-metal repository.

Blinking an LED Using GPIO

For both MCUs and MPUs, the steps for blinking an LED using GPIO are similar:

  • Enable (or reset) the GPIO peripheral clock and power
  • Set the pin function multiplexing and pin mode
  • Perform GPIO write operations

In the K230, the peripheral clock and power signals are enabled by default (this can be confirmed by checking the relevant registers). Therefore, we only need to set the multiplexing function through IOMUX and set the pin mode through the GPIO peripheral.

The functionality can be referenced from official documentation, and the pin multiplexing documentation is located in K230_PINOUT_V*.xlsx.

The IOMUX peripheral is a PAD-like structure where each pin is set through a 32-bit register to configure multiplexing functions, pull-up/pull-down settings, input/output enable, etc. I obtained these definitions from .dtsi files and C header files, again using an LLM to translate them into YAML definitions. Calling IOMUX.pad(n).modify(|w| w.set_sel(0)) sets the pin's function to the corresponding GPIO.

The GPIO peripheral is a DW_apb_gpio. For those familiar with Verilog or other HDLs, this is a configurable GPIO IP core with up to 4 ports. Several configuration registers report the parameters the peripheral was configured with:

GPIO0 config_reg1: num_ports=1
GPIO0 config_reg2: len(PA)=32
GPIO1 config_reg1: num_ports=2
GPIO1 config_reg2: len(PA)=32 len(PB)=8

A total of 32 + 32 + 8 = 72 pins are divided into two GPIO controllers, where the GPIO1 controller has two ports. This can perfectly fit the cluster/array definition method in chiptool.
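
The mapping from a global GPIO number to a controller, port, and bit follows directly from these sizes. Here is a sketch with made-up names; it assumes pins are numbered consecutively across GPIO0 port A, GPIO1 port A, and GPIO1 port B, which matches the LED pin assignments used in this article.

```rust
/// Where a global GPIO number lives in hardware.
#[derive(Debug, PartialEq)]
pub struct GpioLocation {
    /// 0 = GPIO0, 1 = GPIO1.
    pub controller: u8,
    /// 0 = port A, 1 = port B.
    pub port: u8,
    /// Bit index within the port's data and direction registers.
    pub bit: u8,
}

/// Map GPIO0..GPIO71 to a controller, port, and bit.
pub fn locate_gpio(pin: u8) -> Option<GpioLocation> {
    match pin {
        0..=31 => Some(GpioLocation { controller: 0, port: 0, bit: pin }),
        32..=63 => Some(GpioLocation { controller: 1, port: 0, bit: pin - 32 }),
        64..=71 => Some(GpioLocation { controller: 1, port: 1, bit: pin - 64 }),
        _ => None,
    }
}
```

For example, GPIO62 resolves to GPIO1, port A, bit 30, and GPIO20 to GPIO0, port A, bit 20.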

fn blinky() {
    // RGB LED of Luchan Pi K230
    // - R: GPIO62
    // - G: GPIO20
    // - B: GPIO63
    use pac::{GPIO0, GPIO1, IOMUX};

    IOMUX.pad(20).modify(|w| w.set_sel(0)); // function = GPIOx
    IOMUX.pad(62).modify(|w| w.set_sel(0));
    IOMUX.pad(63).modify(|w| w.set_sel(0));

    GPIO0.swport(0).ddr().modify(|w| *w |= 1 << 20); // output mode
    GPIO1.swport(0).ddr().modify(|w| *w |= 1 << 30); // GPIO62 = GPIO1 port A, bit 62 - 32
    GPIO1.swport(0).ddr().modify(|w| *w |= 1 << 31); // GPIO63 = GPIO1 port A, bit 63 - 32

    loop {
        GPIO0.swport(0).dr().modify(|w| *w ^= 1 << 20); // toggle data
        // GPIO1.swport(0).dr().modify(|w| *w ^= 1 << 30);
        GPIO1.swport(0).dr().modify(|w| *w ^= 1 << 31);

        riscv::delay::McycleDelay::new(CPU0_CORE_CLK).delay_ms(1000);
    }
}

PWM Buzzer

The K230 has 6 PWM outputs, split across two PWM controllers with 3 output channels each. In each controller, an additional channel 0 is responsible for configuring the reload value.

The buzzer on the Luchan Pi K230 development board is driven by PWM1 on GPIO43. The input clock of the PWM peripheral is 100 MHz, and PWMCFG.SCALE sets the division factor to 2^n.
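
The output frequency then follows from the input clock, the scale, and the reload value in channel 0. The sketch below uses made-up function names and treats the reload value itself as the period in scaled clock ticks, which is the same simplification the register comments in this article use.

```rust
/// PWM output frequency in Hz for a given input clock, SCALE, and reload.
pub fn pwm_freq_hz(pclk_hz: u32, scale: u32, reload: u32) -> f64 {
    pclk_hz as f64 / (1u64 << scale) as f64 / reload as f64
}

/// Reload value that gets closest to `target_hz` from below the clock,
/// or None if `scale` is out of range or `target_hz` is zero.
pub fn reload_for(pclk_hz: u32, scale: u32, target_hz: u32) -> Option<u32> {
    pclk_hz.checked_shr(scale)?.checked_div(target_hz)
}
```

With a 100 MHz clock and SCALE = 2, a reload of 0x5000 gives roughly 1220.7 Hz, and an exact 1 kHz tone would need a reload of 25000. A 50% duty cycle is half the reload value.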

For the buzzer to produce a tone audible to the human ear, the PWM frequency is generally set to around 1 kHz. The PWM frequency and duty cycle are set through PWMCFG.SCALE and PWMx.CMP. The relevant code is as follows; refer to the comments for register value calculations.

fn buzzer() {
    // GPIO43 - PWM1
    use pac::{IOMUX, PWM0};

    // PCLK, PWM uses the APB clock to program registers and generate waveforms. The default frequency is 100MHz.
    IOMUX.pad(43).modify(|w| {
        w.set_sel(2); // PWM function = 2
        w.set_oe(true);
        w.set_ds(7);
    });

    // Calculations:
    // scale = 2
    // period = 0x5000
    // freq = 100,000,000 / (1 << 2) / 0x5000 = 1,220.7 Hz
    // duty = period / 2 = 0x2800
    PWM0.pwmcfg().modify(|w| {
        w.set_zerocomp(true);
        w.set_scale(2);
    });

    PWM0.pwmcmp(0).write(|w| w.0 = 0x5000); // PWMCMP0: RELOAD
    let duty = 0x2800;

    PWM0.pwmcmp(2).modify(|w| w.0 = duty); // PWMCMP2: PWM1

    // Enable PWM
    PWM0.pwmcfg().modify(|w| w.set_enalways(true));
    riscv::delay::McycleDelay::new(CPU0_CORE_CLK).delay_ms(100);

    // Disable PWM
    PWM0.pwmcfg().modify(|w| w.set_enalways(false));
    riscv::delay::McycleDelay::new(CPU0_CORE_CLK).delay_ms(100);
}

Some Extended Thoughts

Why Bare-Metal?

Bare-metal programming is the foundation of embedded development and is also the lowest level of development. Through bare-metal programming, we can better understand the working principles of hardware and the underlying aspects of operating systems.

Using every library and SDK out there is no substitute for writing one yourself; once you understand one, you understand many.

Shell?

In a bare-metal environment, since there is no operating system, no standard input/output, and no file system, a full-fledged Shell is impossible. However, we can implement simple command-line interaction via the serial port. All we need are two serial port functions: putchar and getchar, and a simple parser.

noline is a small no-std line-editing crate that can be used to implement simple command-line interactions. Moreover, it's based on the embedded-hal ecosystem, making it easy to port. It supports line history and common shortcuts. Of course, writing a readline from scratch is also a good exercise.

By implementing several shell commands, we can achieve simple interactions such as reading and writing peripheral registers, reading and writing memory, printing system information, etc.
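
The "simple parser" part can be very small. Below is a sketch that turns one input line into a command value for a few of the commands. The enum and function names are made up and are not taken from the k230-bare-metal repository; only the command names come from the shell shown in this article. It uses nothing beyond string slicing, so it should also work in a no_std program.

```rust
/// A parsed shell command (illustrative subset).
#[derive(Debug, PartialEq)]
pub enum Command<'a> {
    Help,
    Echo(&'a str),
    MemRead { addr: u64, len: u64 },
    MemWrite { addr: u64, value: u32 },
    Unknown(&'a str),
}

/// Parse a decimal number or a hexadecimal number with a 0x prefix.
fn parse_num(s: &str) -> Option<u64> {
    match s.strip_prefix("0x") {
        Some(hex) => u64::from_str_radix(hex, 16).ok(),
        None => s.parse().ok(),
    }
}

/// Parse one line. Returns None for an empty line or malformed arguments.
pub fn parse_line<'a>(line: &'a str) -> Option<Command<'a>> {
    let line = line.trim();
    // Split the command name from the rest of the line.
    let (name, rest) = match line.split_once(char::is_whitespace) {
        Some((name, rest)) => (name, rest.trim()),
        None => (line, ""),
    };
    let mut args = rest.split_whitespace();
    match name {
        "" => None,
        "help" => Some(Command::Help),
        "echo" => Some(Command::Echo(rest)),
        "mem_read" => {
            let addr = parse_num(args.next()?)?;
            let len = parse_num(args.next()?)?;
            Some(Command::MemRead { addr, len })
        }
        "mem_write" => {
            let addr = parse_num(args.next()?)?;
            let value = parse_num(args.next()?)?;
            if value > u32::MAX as u64 {
                return None;
            }
            Some(Command::MemWrite { addr, value: value as u32 })
        }
        other => Some(Command::Unknown(other)),
    }
}
```

Keeping parsing separate from execution means the parser can be unit-tested on the host, while the code that actually touches memory and peripherals stays on the target.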

The relevant implementation can be found in the k230-bare-metal repository. The final effect is as follows:

K230> help
Available commands:
  help - print this help
  echo <text> - print <text>
  reboot - reboot the system
  mem_read <address> <length> - read memory
  mem_write <address> <u32> - write memory
  tsensor - read temperature sensor
  cpuid - print CPUID
  serialboot - enter serial boot mode
  jump <address> - jump to address
  jumpbig <address> - jump to big core and run

Download?

The K230, in essence, is more like an SBC (Single Board Computer). Flashing firmware often involves using a TF card, which is extremely inconvenient in bare-metal development. Continuous plugging and unplugging of the TF card can cause poor contact or even damage.

LiteX provides a very convenient kernel/firmware loading method for FPGA soft-core environments: firmware is downloaded over the serial port, or even over the network, to a specific memory location such as DDR. Following that idea, I ported the UART download logic that litex_term works with. litex_term combines a serial download protocol with a serial command line: after detecting a special string, it automatically switches to download mode, downloads the firmware to the specified memory location over the serial port, and the device jumps to execute it.

The final effect is:

> litex_term /dev/tty.usbmodem56C40035621 --kernel-adr 0x01000000 --kernel ../firmware.img
......
Press Q or ESC to abort boot completely.
sL5DdSMmkekro
[LITEX-TERM] Received firmware download request from the device.
[LITEX-TERM] Uploading ../firmware.img to 0x01000000 (17400 bytes)...
[LITEX-TERM] Upload calibration... failed, switching to --safe mode.
[LITEX-TERM] Upload complete (8.7KB/s).
[LITEX-TERM] Booting the device.
[LITEX-TERM] Done.
Jumping to 0x01000000...

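It's very convenient; I might introduce it separately in the future. Note that when writing firmware to the memory area, you need to handle the states of the I-Cache and D-Cache: the downloaded code is written as data, so stale I-Cache lines or D-Cache lines that have not been written back can make the core execute old contents. When writing this article, I chose to completely disable the I-Cache and D-Cache.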
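The "special string" that triggers download mode is visible in the log above (`sL5DdSMmkekro`). As a minimal sketch of how such a trigger can be recognized in a byte stream, the following keeps a sliding window over the most recent bytes. `MagicDetector` is a hypothetical helper written for this article's explanation; it is not litex_term's actual implementation, and it covers only the detection step, not the download protocol itself.

```rust
/// Request string printed by the device, copied from the log above.
pub const MAGIC_REQ: &[u8] = b"sL5DdSMmkekro";
const N: usize = MAGIC_REQ.len();

/// Sliding-window matcher over a serial byte stream (hypothetical helper).
pub struct MagicDetector {
    /// The last `N` bytes received, oldest first.
    window: [u8; N],
}

impl MagicDetector {
    /// The window starts zero-filled; zeros never match the printable magic string.
    pub const fn new() -> Self {
        MagicDetector { window: [0; N] }
    }

    /// Feed one received byte. Returns true when the most recent `N` bytes
    /// are exactly the magic request string.
    pub fn feed(&mut self, byte: u8) -> bool {
        // Drop the oldest byte and append the new one at the end.
        self.window.copy_within(1.., 0);
        self.window[N - 1] = byte;
        &self.window[..] == MAGIC_REQ
    }
}
```

A window is used instead of a "reset the counter on mismatch" matcher so that a partial match immediately followed by the real string (for example `sL5sL5DdSMmkekro`) is still detected.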

Jumping to the Big Core

As mentioned earlier, the startup of CPU1 (big core) is controlled by CPU0 (small core). The specific startup logic is straightforward: set the reset vector and reset CPU1:

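use core::ptr;

// SAFETY: assumes nothing else touches these CPU1 control registers concurrently.
unsafe {
    // Set CPU1's reset vector to the address it should start executing from.
    ptr::write_volatile(0x91102104 as *mut u32, jump_addr as u32);
    // Write sequence to the CPU1 reset control register that resets the core.
    ptr::write_volatile(0x9110100c as *mut u32, 0x10001000);
    ptr::write_volatile(0x9110100c as *mut u32, 0x10001);
    ptr::write_volatile(0x9110100c as *mut u32, 0x10000);
}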

To facilitate development and testing, I also made jumping to the big core a shell command. Entering jumpbig 0x01000000 via UART0 makes the big core execute the code at that memory address. Dumping the big core's CSRs right after startup gives the following output:

Rust 2nd stage on CPU1
mstatus: 0000000a00001900
mie: 0000000000000000
mip: 0000000000000000
misa: 8000000000b4112f
  RV64ABCDFIMSUVX
mvendorid: 5b7
marchid: 8000000009140d00
mhartid: 0
cpuid: 09140b0d 10050000 260c0001

Here, the V in RV64ABCDFIMSUVX indicates support for the RVV vector instruction set. The K230 is a heterogeneous dual-core chip, and the small core does not support RVV, so this confirms that our code is now running on the big core.
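The extension string is derived directly from `misa`: bits 25:0 map to the letters A through Z, and the top two bits (MXL) encode the register width. The sketch below decodes the value from the dump above; `misa_xlen` and `misa_extensions` are names made up for this example, and the code assumes a 64-bit wide `misa`.

```rust
/// Decode the MXL field (bits 63:62 of a 64-bit `misa`) into a width in bits.
pub fn misa_xlen(misa: u64) -> Option<u32> {
    match misa >> 62 {
        1 => Some(32),
        2 => Some(64),
        3 => Some(128),
        _ => None,
    }
}

/// Iterate over the extension letters whose bits are set in `misa`.
/// Bit 0 is 'A', bit 1 is 'B', and so on up to bit 25 for 'Z'.
pub fn misa_extensions(misa: u64) -> impl Iterator<Item = char> {
    (0u8..26)
        .filter(move |bit| misa & (1u64 << *bit) != 0)
        .map(|bit| (b'A' + bit) as char)
}
```

For the value `0x8000000000b4112f`, `misa_xlen` yields 64 and `misa_extensions` yields the letters `ABCDFIMSUVX`, which matches the `RV64ABCDFIMSUVX` line printed by the firmware. Checking for `'V'` in that iterator is one way to tell the two cores apart at runtime.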

An interesting point is that mhartid is 0 on the big core, which means the K230 does not follow the RISC-V requirement that different harts have different IDs. This needs attention in actual development: you can only distinguish the harts through other CSRs. This is a small pitfall of the K230.

Next, we can perform more complex operations on the big core, such as applying RVV vector instructions.

Conclusion

Through this experiment of bare-metal embedded development using Rust on the K230 chip, we deeply explored the differences in startup methods and usage modes between MPUs and MCUs, mastering the key steps of using Rust for bare-metal development on MPU chips, including startup process, firmware format parsing, interrupts, and peripheral initialization.

In practice, we successfully achieved UART debug output, GPIO LED blinking, PWM buzzer control, and other functions, deepening our understanding of the K230's underlying startup mechanism and hardware features. These achievements lay a solid foundation for future, more complex embedded development on the K230 and other RISC-V chips.

Looking ahead, we can further improve peripheral drivers, explore multi-core collaboration, apply RVV vector instructions, and leverage the Rust ecosystem to build efficient and secure embedded systems, contributing more to the RISC-V open-source community.

Tips

  • The Boot ROM prints an exception message when execution faults; you can use this behavior to check whether your code is actually being executed. For example, insert an illegal instruction and check the pc reported for the fault.
  • Avoid enabling the full set of target features in bare-metal code; otherwise the compiler may generate instructions from extensions that have not been enabled yet, such as the V extension.
  • In Rust bare-metal development there is no operating system, so you cannot use the standard library. The panic! macro itself still exists in core, but no default panic handler is provided; therefore, you need to implement a #[panic_handler] yourself.
  • The states of D-Cache and I-Cache need to be managed; generally, disable them before jumping to new code to avoid cache inconsistencies.
  • The println! macro can conveniently output debug information, but note that printing is blocking and may affect time-sensitive operations.
  • Learn to use LLMs to assist your exploration process—for example, export YAML definitions from PDF manuals via OCR, translate DDR initialization code, and get explanations for specific registers.
  • The Boot ROM initializes some peripherals such as UART0, but you should still verify their actual state, such as FIFO mode and baud rate.
  • When you suspect a hardware implementation bug or quirk, try an equivalent alternative way of achieving the same result.
