DEV Community

Ripan Deuri


Memory Mapped IO (MMIO)

Introduction

Memory-Mapped I/O (MMIO) is an address-mapping technique where device registers are assigned fixed ranges in the system’s physical address map. When the CPU issues a load or store to these regions, the transaction is routed to a hardware block instead of DRAM. There are no special instructions involved; ordinary load/store operations form the entire I/O protocol.
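As a minimal sketch, an MMIO access in C is just an ordinary load or store through a `volatile` pointer. The register layout and the `uart` name below are assumptions for illustration; on real hardware the pointer would hold the peripheral's base address from the SoC memory map, and `volatile` stops the compiler from caching or eliding the accesses.

```c
#include <stdint.h>

/* Assumed register block; on the SoC `uart` would point at the
 * peripheral's base address, e.g. (struct uart_regs *)0x10000000. */
struct uart_regs {
    volatile uint32_t tx;    /* write a byte here to transmit      */
    volatile uint32_t stat;  /* bit 0 = TX busy (assumed layout)   */
};

static struct uart_regs *uart;

static void uart_putc(char c)
{
    while (uart->stat & 1u)  /* ordinary load polls the device     */
        ;
    uart->tx = (uint32_t)c;  /* ordinary store reaches the device  */
}
```

No special opcode appears anywhere: the interconnect, not the instruction set, decides that these loads and stores hit a device instead of DRAM.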

VA to PA Conversion

The CPU always issues virtual addresses (VAs). Before transactions reach the interconnect, the Memory Management Unit (MMU) translates VAs into physical addresses (PAs). The MMU uses a Translation Lookaside Buffer (TLB) for cached translations and performs a page-table walk on TLB misses.

(Figure: VA to PA mapping)

MMIO Regions

MMIO regions occupy fixed physical address ranges defined by the SoC. Linux receives the physical MMIO layout from the Device Tree (DT), reserves the corresponding regions, and builds virtual mappings with device-type memory attributes.

+-----------------------------------------+
|   SoC Physical Address Map              |
+-----------------------------------------+
| 0x0000_0000 - 0x0FFF_FFFF : DDR         |
| 0x1000_0000 - 0x1000_0FFF : UART MMIO   |
| 0x1234_0000 - 0x1234_0FFF : Device MMIO |
| ...                                     |
+-----------------------------------------+
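Inside the kernel these mappings come from `ioremap()`; from user space a similar window can be obtained by `mmap()`-ing `/dev/mem` (root only) at the peripheral's physical base. The sketch below maps an arbitrary file path as a stand-in for `/dev/mem`, an assumption made purely so the example runs without hardware:

```c
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map `len` bytes at offset `phys_base` of `path` and return the window
 * as a pointer to 32-bit registers. With path = "/dev/mem" and
 * phys_base = 0x10000000 this would expose the UART region above. */
static volatile uint32_t *map_window(const char *path, off_t phys_base, size_t len)
{
    int fd = open(path, O_RDWR | O_SYNC);
    if (fd < 0)
        return NULL;
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, phys_base);
    close(fd);                       /* the mapping stays valid after close */
    return (p == MAP_FAILED) ? NULL : (volatile uint32_t *)p;
}
```

The kernel variant additionally marks the pages with device-type memory attributes so the accesses are not cached or merged.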

Load and Store Operations

A CPU MMIO access follows the same initial path as a normal load or store: instruction issue → VA → MMU translation → PA. After translation, the PA is emitted onto the SoC interconnect. The interconnect decodes the PA and routes the access to the MMIO peripheral instead of DDR.

(Figure: MMIO read)

Barriers and Ordering

Modern CPUs reorder memory operations aggressively. Loads may complete early, stores may be buffered, and multiple memory streams may execute out of order. This behaviour is essential for performance, but it breaks the assumptions needed when software interacts with hardware through MMIO registers.

Programming a Device Without a Write Barrier

Example: a device expects CFG (a configuration value in Normal, cacheable memory) to be written before DOORBELL (an MMIO register in Device memory).

LDR: Loads a word (32-bit).
STR: Stores a word (32-bit).

    ; R0 = CFG value
    LDR     R0, =0xAAAA5555

    ; R1 = CFG in Normal memory (cacheable bufferable RAM)
    LDR     R1, =0x80000000

    STR     R0, [R1]          ; store may stay in store buffer / cache

    ; --- no barrier here ---

    MOV     R2, #1

    ; R3 = DOORBELL (Device memory, nGnRnE)
    LDR     R3, =0x12340004

    STR     R2, [R3]          ; reaches device immediately

Without a write barrier, the ARM memory model allows these two stores to become visible out of order:

Program order   :   CFG → DOORBELL
Device sees     :   DOORBELL → CFG

Insert a store barrier between the two writes.

DMB ST: Data Memory Barrier, store variant; it orders all earlier stores before all later stores.

    ; R0 = CFG value
    LDR     R0, =0xAAAA5555

    ; R1 = CFG in Normal memory (cacheable bufferable RAM)
    LDR     R1, =0x80000000

    STR     R0, [R1]          ; store may stay in store buffer / cache

    DMB     ST                ; ensure CFG is visible to the system

    MOV     R2, #1

    ; R3 = DOORBELL (Device memory, nGnRnE)
    LDR     R3, =0x12340004

    STR     R2, [R3]          ; reaches device immediately

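In C the same doorbell pattern is usually written with a memory fence rather than hand-written assembly; on ARM, `atomic_thread_fence(memory_order_release)` typically compiles down to a `DMB`. The pointers below stand in for the same assumed addresses as the assembly (CFG at 0x80000000, DOORBELL at 0x12340004):

```c
#include <stdatomic.h>
#include <stdint.h>

static volatile uint32_t *cfg;       /* Normal memory, e.g. 0x80000000      */
static volatile uint32_t *doorbell;  /* Device memory (nGnRnE), 0x12340004  */

static void ring_doorbell(uint32_t value)
{
    *cfg = value;                               /* may sit in store buffer  */
    atomic_thread_fence(memory_order_release);  /* DMB: CFG visible first   */
    *doorbell = 1;                              /* device reads CFG safely  */
}
```

Driver code would initialise `cfg` and `doorbell` from the MMIO mapping before calling this.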

Two CPU Cores, Shared DDR

Assume two cores share a structure in DDR:

struct shared {
    int data;
    int flag;
} S;

Core0: write data, set flag

S.data = 123;   // store #1
S.flag = 1;     // store #2

Core1: read flag, then read data

if (S.flag == 1)        // load #1
    val = S.data;       // load #2

Behaviour without barrier:

  • Core0’s data write may sit in its store buffer
  • Core0’s flag write may drain early and be visible first
  • Core1 may observe flag == 1 before the new data is visible

With barriers:

spin_lock() → acquire barrier
spin_unlock() → release barrier

Core0:

spin_lock();
S.data = 123;     // may still be in Core0's store buffer
S.flag = 1;       // may also be buffered
spin_unlock();    // release barrier flushes BOTH writes

The release barrier drains the store buffer, so both writes are visible to other cores before the lock is released.

Core1:

spin_lock();      // acquire barrier
tmp_flag = S.flag; // cannot be reordered before lock
tmp_data = S.data; // cannot be reordered before reading flag
spin_unlock();

Acquire barrier forbids both loads from slipping above the lock.
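The same data/flag handshake can be expressed without a lock, using C11 release/acquire atomics; the barrier semantics are the same ones the spinlock provides. A minimal sketch:

```c
#include <stdatomic.h>

static int data;                /* plain shared field                 */
static atomic_int flag;         /* synchronisation variable           */

void core0_publish(void)
{
    data = 123;                 /* store #1                           */
    /* release store: data is visible before flag reads as 1 */
    atomic_store_explicit(&flag, 1, memory_order_release);
}

int core1_consume(void)
{
    /* acquire load: if we see flag == 1, we also see data == 123 */
    if (atomic_load_explicit(&flag, memory_order_acquire) == 1)
        return data;
    return -1;                  /* flag not set yet                   */
}
```

On ARM the release store and acquire load map to `STLR`/`LDAR` (or plain accesses plus `DMB`), which is exactly the ordering the lock-based version relies on.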

Synchronising Device for DMA

For example:

  • CPU writes a buffer
  • CPU must issue a barrier
  • CPU writes a “start DMA” register

The barrier ensures data fields reach DDR before the device reads them.
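The three steps above can be sketched in C as follows. The `dma_start` register and the payload values are hypothetical; in a real driver the buffer would also need to be in DMA-coherent memory or explicitly cleaned from the cache.

```c
#include <stdatomic.h>
#include <stdint.h>

static volatile uint32_t *dma_start;  /* assumed "start DMA" MMIO register */

void kick_dma(uint32_t *buf, int n)
{
    for (int i = 0; i < n; i++)
        buf[i] = (uint32_t)i;                   /* 1. CPU writes the buffer */

    atomic_thread_fence(memory_order_release);  /* 2. barrier: buffer first */

    *dma_start = 1;                             /* 3. doorbell: device may
                                                      now read the buffer  */
}
```

Without step 2, the doorbell store could become visible while parts of the buffer are still in the CPU's store buffer, and the device would DMA stale data.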
