Let me explain multi-dimensional arrays and row-major ordering from the ground up, covering memory layout, addressing, and low-level implementation details.
1. Understanding Multi-dimensional Arrays
Conceptual vs Physical Reality
Conceptually, when you declare:
int matrix[3][4];
You think of a 2D grid:
[0][0] [0][1] [0][2] [0][3]
[1][0] [1][1] [1][2] [1][3]
[2][0] [2][1] [2][2] [2][3]
Physically, memory is linear (one-dimensional): the CPU addresses it as a single sequence of bytes. Multi-dimensional arrays are an abstraction that the compiler maps onto this linear address space.
2. Row-major Order Explained
Row-major order is the strategy for mapping multi-dimensional indices to a linear memory address. Elements are stored row by row.
Memory Layout Example
For int matrix[3][4] (3 rows, 4 columns):
Memory Address: Element:
0x1000 matrix[0][0]
0x1004 matrix[0][1] ← Row 0 stored contiguously
0x1008 matrix[0][2]
0x100C matrix[0][3]
0x1010 matrix[1][0] ← Then Row 1
0x1014 matrix[1][1]
0x1018 matrix[1][2]
0x101C matrix[1][3]
0x1020 matrix[2][0] ← Finally Row 2
0x1024 matrix[2][1]
0x1028 matrix[2][2]
0x102C matrix[2][3]
Notice: Rightmost index changes fastest when traversing memory sequentially.
3. Address Calculation Formula
For 2D Arrays
Given: type array[ROWS][COLS]
To access array[i][j]:
address = base_address + (i * COLS + j) * sizeof(type)
Breakdown:
- i * COLS: skip i complete rows (each row has COLS elements)
- + j: move j positions within the current row
- * sizeof(type): scale by element size (bytes)
Example Calculation
For int matrix[3][4] at base address 0x1000, accessing matrix[2][3]:
address = 0x1000 + (2 * 4 + 3) * 4
= 0x1000 + (8 + 3) * 4
= 0x1000 + 11 * 4
= 0x1000 + 44
= 0x102C
For 3D Arrays
Given: type array[D1][D2][D3]
To access array[i][j][k]:
address = base + (i * D2 * D3 + j * D3 + k) * sizeof(type)
Logic:
- i * D2 * D3: skip i complete "planes" (each plane is D2 × D3 elements)
- j * D3: skip j complete rows within the current plane
- k: position within the current row
For N-dimensional Arrays
General formula for array[i₁][i₂][i₃]...[iₙ] with dimensions [D₁][D₂][D₃]...[Dₙ]:
offset = i₁ * (D₂ * D₃ * ... * Dₙ) +
i₂ * (D₃ * D₄ * ... * Dₙ) +
i₃ * (D₄ * D₅ * ... * Dₙ) +
...
iₙ₋₁ * Dₙ +
iₙ
address = base + offset * sizeof(type)
4. Compiler Implementation Details
How C/C++ Handles Multi-dimensional Arrays
When you write:
int arr[3][4][5];
int x = arr[1][2][3];
The compiler translates this to:
int x = *((int *)arr + (1 * 4 * 5 + 2 * 5 + 3));
(The cast reflects that the offset is counted in elements of the underlying int storage, since arr itself has pointer-to-array type.)
Pointer Arithmetic
Multi-dimensional arrays decay to pointers in interesting ways:
int arr[3][4];
- arr decays to type int (*)[4] (pointer to an array of 4 ints)
- arr[i] has type int[4], which decays to int * (pointer to int)
- arr[i][j] has type int (an actual value)
What happens with arr + 1?
- It advances by the size of one complete row (4 ints = 16 bytes if int is 4 bytes)
- Points to arr[1][0]
What happens with *(arr + 1) + 2?
- *(arr + 1) gives you arr[1] (an array that decays to a pointer to the first element of row 1)
- Adding 2 advances by 2 ints
- Points to arr[1][2]
Assembly Level (x86-64 Example)
For arr[i][j] where arr is int[ROWS][COLS]:
; Assume: base address in RAX, i in RBX, j in RCX
mov rdx, rbx ; rdx = i
imul rdx, COLS ; rdx = i * COLS
add rdx, rcx ; rdx = i * COLS + j
shl rdx, 2 ; rdx = (i * COLS + j) * 4 (multiply by sizeof(int))
add rdx, rax ; rdx = base + offset
mov eax, [rdx] ; load the value
Modern compilers optimize this heavily using LEA (Load Effective Address):
lea rdx, [rbx + rbx*4] ; rdx = i * 5 (if COLS=5)
lea rdx, [rdx*4 + rax] ; combine scaling and base
mov eax, [rdx + rcx*4] ; load arr[i][j]
5. Row-major vs Column-major Order
Row-major (C, C++, Java, Python NumPy default)
For arr[3][4]:
Storage order: [0,0] [0,1] [0,2] [0,3] [1,0] [1,1] ...
Column-major (Fortran, MATLAB, R, Julia)
For arr[3][4]:
Storage order: [0,0] [1,0] [2,0] [0,1] [1,1] [2,1] ...
Column-major formula for array[i][j] in dimensions [ROWS][COLS]:
address = base + (j * ROWS + i) * sizeof(type)
Performance Implications
Cache locality matters! Modern CPUs fetch memory in cache lines (typically 64 bytes).
Row-major traversal (optimal in C):
for (int i = 0; i < ROWS; i++) {
for (int j = 0; j < COLS; j++) {
process(arr[i][j]); // Sequential memory access
}
}
Column-major traversal (suboptimal in C):
for (int j = 0; j < COLS; j++) {
for (int i = 0; i < ROWS; i++) {
process(arr[i][j]); // Strided access, cache misses
}
}
6. Memory Layout Deep Dive
Alignment and Padding
Arrays are typically aligned to their element's alignment requirement:
struct Data {
char c; // 1 byte
int arr[2][3]; // Starts at offset 4 (aligned to 4-byte boundary)
};
Array of Arrays vs Array of Pointers
True 2D array:
int arr[3][4]; // Contiguous 48 bytes (3*4*4)
Array of pointers (simulated 2D):
int *arr[3];
arr[0] = malloc(4 * sizeof(int));
arr[1] = malloc(4 * sizeof(int));
arr[2] = malloc(4 * sizeof(int));
These look similar but have completely different memory layouts:
- True 2D: One contiguous block
- Pointer array: Non-contiguous, each row can be anywhere in memory
- True 2D: Faster, better cache locality
- Pointer array: Flexible (rows can have different sizes), fragmented
7. Practical Example with Memory Dump
#include <stdio.h>
int main() {
int arr[2][3] = {
{1, 2, 3},
{4, 5, 6}
};
printf("Base address: %p\n", (void*)arr);
for (int i = 0; i < 2; i++) {
for (int j = 0; j < 3; j++) {
printf("arr[%d][%d] = %d at address %p\n",
i, j, arr[i][j], (void*)&arr[i][j]);
}
}
// Treating as 1D array
int *flat = (int*)arr;
printf("\nAs flat array:\n");
for (int k = 0; k < 6; k++) {
printf("flat[%d] = %d at address %p\n",
k, flat[k], (void*)&flat[k]);
}
return 0;
}
Output (example):
Base address: 0x7ffc8b3e1a40
arr[0][0] = 1 at address 0x7ffc8b3e1a40
arr[0][1] = 2 at address 0x7ffc8b3e1a44
arr[0][2] = 3 at address 0x7ffc8b3e1a48
arr[1][0] = 4 at address 0x7ffc8b3e1a4c
arr[1][1] = 5 at address 0x7ffc8b3e1a50
arr[1][2] = 6 at address 0x7ffc8b3e1a54
Notice addresses increment by 4 (sizeof(int)) sequentially in row-major order.
8. Dynamic Multi-dimensional Arrays
Method 1: Single malloc with manual indexing
int rows = 3, cols = 4;
int *arr = malloc(rows * cols * sizeof(int));
// Access arr[i][j]:
int value = arr[i * cols + j];
Method 2: Array of pointers
int **arr = malloc(rows * sizeof(int*));
for (int i = 0; i < rows; i++) {
arr[i] = malloc(cols * sizeof(int));
}
// Access naturally: arr[i][j]
Method 3: Contiguous with pointer array (best of both)
int **arr = malloc(rows * sizeof(int*));
arr[0] = malloc(rows * cols * sizeof(int));
for (int i = 1; i < rows; i++) {
arr[i] = arr[0] + i * cols;
}
// Cleanup:
free(arr[0]);
free(arr);
9. Hardware Considerations
CPU Cache Behavior
Modern CPUs have hierarchical caches (L1, L2, L3). Sequential access patterns maximize cache hit rates.
Cache line example (64 bytes):
If int is 4 bytes, one cache line holds 16 ints.
Accessing arr[0][0] also pulls the next 15 ints into cache (assuming the array starts on a 64-byte boundary), so the following sequential accesses hit the cache.
TLB (Translation Lookaside Buffer)
The TLB caches virtual-to-physical address translations. Strided access patterns can cause TLB thrashing.
Prefetching
Modern CPUs detect sequential access patterns and prefetch upcoming cache lines. Row-major traversal in row-major storage enables this optimization.
10. Summary
Key Takeaways:
- Multi-dimensional arrays are linearized in memory
- Row-major order: rightmost index varies fastest
- Address formula: base + (i * COLS + j) * sizeof(type) for 2D
- Traversal matters: iterate in storage order for performance
- True arrays vs pointer arrays: different memory layouts, different trade-offs
- Compiler magic: translates arr[i][j] to pointer arithmetic automatically
- Cache locality: sequential access patterns can be 10-100x faster than random access
Understanding these low-level details helps you write faster code, debug memory issues, and interface with hardware or other languages effectively.