DEV Community

Cover image for Multi-dimensional Arrays & Row-major Order: A Deep Dive
ali ehab algmass
ali ehab algmass

Posted on

Multi-dimensional Arrays & Row-major Order: A Deep Dive

Let me explain multi-dimensional arrays and row-major ordering from the ground up, covering memory layout, addressing, and low-level implementation details.

1. Understanding Multi-dimensional Arrays

Conceptual vs Physical Reality

Conceptually, when you declare:

int matrix[3][4];
Enter fullscreen mode Exit fullscreen mode

You think of a 2D grid:

[0][0]  [0][1]  [0][2]  [0][3]
[1][0]  [1][1]  [1][2]  [1][3]
[2][0]  [2][1]  [2][2]  [2][3]
Enter fullscreen mode Exit fullscreen mode

Physically, memory is linear (one-dimensional). The CPU can only access memory sequentially through addresses. Multi-dimensional arrays are an abstraction that the compiler maps to linear memory.

2. Row-major Order Explained

Row-major order is the strategy for mapping multi-dimensional indices to a linear memory address. Elements are stored row by row.

Memory Layout Example

For int matrix[3][4] (3 rows, 4 columns):

Memory Address:  Value:        Logical Position:
0x1000          matrix[0][0]   
0x1004          matrix[0][1]   ← Row 0 stored contiguously
0x1008          matrix[0][2]
0x100C          matrix[0][3]
0x1010          matrix[1][0]   ← Then Row 1
0x1014          matrix[1][1]
0x1018          matrix[1][2]
0x101C          matrix[1][3]
0x1020          matrix[2][0]   ← Finally Row 2
0x1024          matrix[2][1]
0x1028          matrix[2][2]
0x102C          matrix[2][3]
Enter fullscreen mode Exit fullscreen mode

Notice: Rightmost index changes fastest when traversing memory sequentially.

3. Address Calculation Formula

For 2D Arrays

Given: type array[ROWS][COLS]

To access array[i][j]:

address = base_address + (i * COLS + j) * sizeof(type)
Enter fullscreen mode Exit fullscreen mode

Breakdown:

  • i * COLS: Skip i complete rows (each row has COLS elements)
  • + j: Move j positions within the current row
  • * sizeof(type): Scale by element size (bytes)

Example Calculation

For int matrix[3][4] at base address 0x1000, accessing matrix[2][3]:

address = 0x1000 + (2 * 4 + 3) * 4
        = 0x1000 + (8 + 3) * 4
        = 0x1000 + 11 * 4
        = 0x1000 + 44
        = 0x102C
Enter fullscreen mode Exit fullscreen mode

For 3D Arrays

Given: type array[D1][D2][D3]

To access array[i][j][k]:

address = base + (i * D2 * D3 + j * D3 + k) * sizeof(type)
Enter fullscreen mode Exit fullscreen mode

Logic:

  • i * D2 * D3: Skip i complete "planes" (each plane is D2×D3 elements)
  • j * D3: Skip j complete rows within the current plane
  • k: Position within the current row

For N-dimensional Arrays

General formula for array[i₁][i₂][i₃]...[iₙ] with dimensions [D₁][D₂][D₃]...[Dₙ]:

offset = i₁ * (D₂ * D₃ * ... * Dₙ) +
         i₂ * (D₃ * D₄ * ... * Dₙ) +
         i₃ * (D₄ * D₅ * ... * Dₙ) +
         ...
         iₙ₋₁ * Dₙ +
         iₙ

address = base + offset * sizeof(type)
Enter fullscreen mode Exit fullscreen mode

4. Compiler Implementation Details

How C/C++ Handles Multi-dimensional Arrays

When you write:

int arr[3][4][5];
int x = arr[1][2][3];
Enter fullscreen mode Exit fullscreen mode

The compiler translates this to:

int x = *(arr + (1 * 4 * 5 + 2 * 5 + 3));
Enter fullscreen mode Exit fullscreen mode

Pointer Arithmetic

Multi-dimensional arrays decay to pointers in interesting ways:

int arr[3][4];
Enter fullscreen mode Exit fullscreen mode
  • arr has type int (*)[4] (pointer to array of 4 ints)
  • arr[i] has type int * (pointer to int)
  • arr[i][j] has type int (actual value)

What happens with arr + 1?

  • It advances by the size of one complete row (4 ints = 16 bytes if int is 4 bytes)
  • Points to arr[1][0]

What happens with *(arr + 1) + 2?

  • *(arr + 1) gives you arr[1] (a pointer to the first element of row 1)
  • Adding 2 advances by 2 ints
  • Points to arr[1][2]

Assembly Level (x86-64 Example)

For arr[i][j] where arr is int[ROWS][COLS]:

; Assume: base address in RAX, i in RBX, j in RCX

mov    rdx, rbx           ; rdx = i
imul   rdx, COLS          ; rdx = i * COLS
add    rdx, rcx           ; rdx = i * COLS + j
shl    rdx, 2             ; rdx = (i * COLS + j) * 4 (multiply by sizeof(int))
add    rdx, rax           ; rdx = base + offset
mov    eax, [rdx]         ; load the value
Enter fullscreen mode Exit fullscreen mode

Modern compilers optimize this heavily using LEA (Load Effective Address):

lea    rdx, [rbx + rbx*4] ; rdx = i * 5 (if COLS=5)
lea    rdx, [rdx*4 + rax] ; combine scaling and base
mov    eax, [rdx + rcx*4] ; load arr[i][j]
Enter fullscreen mode Exit fullscreen mode

5. Row-major vs Column-major Order

Row-major (C, C++, Java, Python NumPy default)

For arr[3][4]:
Storage order: [0,0] [0,1] [0,2] [0,3] [1,0] [1,1] ...
Enter fullscreen mode Exit fullscreen mode

Column-major (Fortran, MATLAB, R, Julia)

For arr[3][4]:
Storage order: [0,0] [1,0] [2,0] [0,1] [1,1] [2,1] ...
Enter fullscreen mode Exit fullscreen mode

Column-major formula for array[i][j] in dimensions [ROWS][COLS]:

address = base + (j * ROWS + i) * sizeof(type)
Enter fullscreen mode Exit fullscreen mode

Performance Implications

Cache locality matters! Modern CPUs fetch memory in cache lines (typically 64 bytes).

Row-major traversal (optimal in C):

for (int i = 0; i < ROWS; i++) {
    for (int j = 0; j < COLS; j++) {
        process(arr[i][j]);  // Sequential memory access
    }
}
Enter fullscreen mode Exit fullscreen mode

Column-major traversal (suboptimal in C):

for (int j = 0; j < COLS; j++) {
    for (int i = 0; i < ROWS; i++) {
        process(arr[i][j]);  // Strided access, cache misses
    }
}
Enter fullscreen mode Exit fullscreen mode

6. Memory Layout Deep Dive

Alignment and Padding

Arrays are typically aligned to their element's alignment requirement:

struct Data {
    char c;      // 1 byte
    int arr[2][3]; // Starts at offset 4 (aligned to 4-byte boundary)
};
Enter fullscreen mode Exit fullscreen mode

Array of Arrays vs Array of Pointers

True 2D array:

int arr[3][4];  // Contiguous 48 bytes (3*4*4)
Enter fullscreen mode Exit fullscreen mode

Array of pointers (simulated 2D):

int *arr[3];
arr[0] = malloc(4 * sizeof(int));
arr[1] = malloc(4 * sizeof(int));
arr[2] = malloc(4 * sizeof(int));
Enter fullscreen mode Exit fullscreen mode

These look similar but have completely different memory layouts:

  • True 2D: One contiguous block
  • Pointer array: Non-contiguous, each row can be anywhere in memory
  • True 2D: Faster, better cache locality
  • Pointer array: Flexible (rows can have different sizes), fragmented

7. Practical Example with Memory Dump

#include <stdio.h>

int main() {
    int arr[2][3] = {
        {1, 2, 3},
        {4, 5, 6}
    };

    printf("Base address: %p\n", (void*)arr);

    for (int i = 0; i < 2; i++) {
        for (int j = 0; j < 3; j++) {
            printf("arr[%d][%d] = %d at address %p\n", 
                   i, j, arr[i][j], (void*)&arr[i][j]);
        }
    }

    // Treating as 1D array
    int *flat = (int*)arr;
    printf("\nAs flat array:\n");
    for (int k = 0; k < 6; k++) {
        printf("flat[%d] = %d at address %p\n", 
               k, flat[k], (void*)&flat[k]);
    }

    return 0;
}
Enter fullscreen mode Exit fullscreen mode

Output (example):

Base address: 0x7ffc8b3e1a40
arr[0][0] = 1 at address 0x7ffc8b3e1a40
arr[0][1] = 2 at address 0x7ffc8b3e1a44
arr[0][2] = 3 at address 0x7ffc8b3e1a48
arr[1][0] = 4 at address 0x7ffc8b3e1a4c
arr[1][1] = 5 at address 0x7ffc8b3e1a50
arr[1][2] = 6 at address 0x7ffc8b3e1a54
Enter fullscreen mode Exit fullscreen mode

Notice addresses increment by 4 (sizeof(int)) sequentially in row-major order.

8. Dynamic Multi-dimensional Arrays

Method 1: Single malloc with manual indexing

int rows = 3, cols = 4;
int *arr = malloc(rows * cols * sizeof(int));

// Access arr[i][j]:
int value = arr[i * cols + j];
Enter fullscreen mode Exit fullscreen mode

Method 2: Array of pointers

int **arr = malloc(rows * sizeof(int*));
for (int i = 0; i < rows; i++) {
    arr[i] = malloc(cols * sizeof(int));
}

// Access naturally: arr[i][j]
Enter fullscreen mode Exit fullscreen mode

Method 3: Contiguous with pointer array (best of both)

int **arr = malloc(rows * sizeof(int*));
arr[0] = malloc(rows * cols * sizeof(int));
for (int i = 1; i < rows; i++) {
    arr[i] = arr[0] + i * cols;
}

// Cleanup:
free(arr[0]);
free(arr);
Enter fullscreen mode Exit fullscreen mode

9. Hardware Considerations

CPU Cache Behavior

Modern CPUs have hierarchical caches (L1, L2, L3). Sequential access patterns maximize cache hit rates.

Cache line example (64 bytes):

If int is 4 bytes, one cache line holds 16 ints.
Accessing arr[0][0] loads arr[0][0] through arr[0][15] into cache.
Enter fullscreen mode Exit fullscreen mode

TLB (Translation Lookaside Buffer)

The TLB caches virtual-to-physical address translations. Strided access patterns can cause TLB thrashing.

Prefetching

Modern CPUs detect sequential access patterns and prefetch upcoming cache lines. Row-major traversal in row-major storage enables this optimization.

10. Summary

Key Takeaways:

  1. Multi-dimensional arrays are linearized in memory
  2. Row-major order: rightmost index varies fastest
  3. Address formula: base + (i * COLS + j) * sizeof(type) for 2D
  4. Traversal matters: iterate in storage order for performance
  5. True arrays vs pointer arrays: different memory layouts, different trade-offs
  6. Compiler magic: translates arr[i][j] to pointer arithmetic automatically
  7. Cache locality: sequential access patterns are 10-100x faster than random access

Understanding these low-level details helps you write faster code, debug memory issues, and interface with hardware or other languages effectively.

Top comments (0)