Let me explain multi-dimensional arrays and row-major ordering from the ground up, covering memory layout, addressing, and low-level implementation details.
1. Understanding Multi-dimensional Arrays
Conceptual vs Physical Reality
Conceptually, when you declare:
int matrix[3][4];
You think of a 2D grid:
[0][0] [0][1] [0][2] [0][3]
[1][0] [1][1] [1][2] [1][3]
[2][0] [2][1] [2][2] [2][3]
Physically, memory is linear (one-dimensional): the CPU addresses it as a single sequence of bytes. Multi-dimensional arrays are an abstraction that the compiler maps onto this linear address space.
2. Row-major Order Explained
Row-major order is the strategy for mapping multi-dimensional indices to a linear memory address. Elements are stored row by row.
Memory Layout Example
For int matrix[3][4] (3 rows, 4 columns):
Memory Address: Element:
0x1000 matrix[0][0]
0x1004 matrix[0][1] ← Row 0 stored contiguously
0x1008 matrix[0][2]
0x100C matrix[0][3]
0x1010 matrix[1][0] ← Then Row 1
0x1014 matrix[1][1]
0x1018 matrix[1][2]
0x101C matrix[1][3]
0x1020 matrix[2][0] ← Finally Row 2
0x1024 matrix[2][1]
0x1028 matrix[2][2]
0x102C matrix[2][3]
Notice: Rightmost index changes fastest when traversing memory sequentially.
3. Address Calculation Formula
For 2D Arrays
Given: type array[ROWS][COLS]
To access array[i][j]:
address = base_address + (i * COLS + j) * sizeof(type)
Breakdown:
- i * COLS: skip i complete rows (each row has COLS elements)
- + j: move j positions within the current row
- * sizeof(type): scale by element size (bytes)
Example Calculation
For int matrix[3][4] at base address 0x1000, accessing matrix[2][3]:
address = 0x1000 + (2 * 4 + 3) * 4
= 0x1000 + (8 + 3) * 4
= 0x1000 + 11 * 4
= 0x1000 + 44
= 0x102C
For 3D Arrays
Given: type array[D1][D2][D3]
To access array[i][j][k]:
address = base + (i * D2 * D3 + j * D3 + k) * sizeof(type)
Logic:
- i * D2 * D3: skip i complete "planes" (each plane is D2 × D3 elements)
- j * D3: skip j complete rows within the current plane
- k: position within the current row
For N-dimensional Arrays
General formula for array[i₁][i₂][i₃]...[iₙ] with dimensions [D₁][D₂][D₃]...[Dₙ]:
offset = i₁ * (D₂ * D₃ * ... * Dₙ) +
i₂ * (D₃ * D₄ * ... * Dₙ) +
i₃ * (D₄ * D₅ * ... * Dₙ) +
...
iₙ₋₁ * Dₙ +
iₙ
address = base + offset * sizeof(type)
4. Compiler Implementation Details
How C/C++ Handles Multi-dimensional Arrays
When you write:
int arr[3][4][5];
int x = arr[1][2][3];
The compiler translates this to:
int x = *((int *)arr + (1 * 4 * 5 + 2 * 5 + 3));
(The cast reflects that the offset is counted in elements of the underlying int storage, since arr itself has pointer-to-array type.)
Pointer Arithmetic
Multi-dimensional arrays decay to pointers in interesting ways:
int arr[3][4];
- arr decays to type int (*)[4] (pointer to an array of 4 ints)
- arr[i] has type int[4], which decays to int * (pointer to int)
- arr[i][j] has type int (an actual value)
What happens with arr + 1?
- It advances by the size of one complete row (4 ints = 16 bytes if int is 4 bytes)
- Points to arr[1][0]
What happens with *(arr + 1) + 2?
- *(arr + 1) gives you arr[1] (an array that decays to a pointer to the first element of row 1)
- Adding 2 advances by 2 ints
- Points to arr[1][2]
Assembly Level (x86-64 Example)
For arr[i][j] where arr is int[ROWS][COLS]:
; Assume: base address in RAX, i in RBX, j in RCX
mov rdx, rbx ; rdx = i
imul rdx, COLS ; rdx = i * COLS
add rdx, rcx ; rdx = i * COLS + j
shl rdx, 2 ; rdx = (i * COLS + j) * 4 (multiply by sizeof(int))
add rdx, rax ; rdx = base + offset
mov eax, [rdx] ; load the value
Modern compilers optimize this heavily using LEA (Load Effective Address):
lea rdx, [rbx + rbx*4] ; rdx = i * 5 (if COLS=5)
lea rdx, [rdx*4 + rax] ; combine scaling and base
mov eax, [rdx + rcx*4] ; load arr[i][j]
5. Row-major vs Column-major Order
Row-major (C, C++, Java, Python NumPy default)
For arr[3][4]:
Storage order: [0,0] [0,1] [0,2] [0,3] [1,0] [1,1] ...
Column-major (Fortran, MATLAB, R, Julia)
For arr[3][4]:
Storage order: [0,0] [1,0] [2,0] [0,1] [1,1] [2,1] ...
Column-major formula for array[i][j] in dimensions [ROWS][COLS]:
address = base + (j * ROWS + i) * sizeof(type)
Performance Implications
Cache locality matters! Modern CPUs fetch memory in cache lines (typically 64 bytes).
Row-major traversal (optimal in C):
for (int i = 0; i < ROWS; i++) {
for (int j = 0; j < COLS; j++) {
process(arr[i][j]); // Sequential memory access
}
}
Column-major traversal (suboptimal in C):
for (int j = 0; j < COLS; j++) {
for (int i = 0; i < ROWS; i++) {
process(arr[i][j]); // Strided access, cache misses
}
}
6. Memory Layout Deep Dive
Alignment and Padding
Arrays are typically aligned to their element's alignment requirement:
struct Data {
char c; // 1 byte
int arr[2][3]; // Starts at offset 4 (aligned to 4-byte boundary)
};
Array of Arrays vs Array of Pointers
True 2D array:
int arr[3][4]; // Contiguous 48 bytes (3*4*4)
Array of pointers (simulated 2D):
int *arr[3];
arr[0] = malloc(4 * sizeof(int));
arr[1] = malloc(4 * sizeof(int));
arr[2] = malloc(4 * sizeof(int));
These look similar but have completely different memory layouts:
- True 2D: One contiguous block
- Pointer array: Non-contiguous, each row can be anywhere in memory
- True 2D: Faster, better cache locality
- Pointer array: Flexible (rows can have different sizes), fragmented
7. Practical Example with Memory Dump
#include <stdio.h>
int main() {
int arr[2][3] = {
{1, 2, 3},
{4, 5, 6}
};
printf("Base address: %p\n", (void*)arr);
for (int i = 0; i < 2; i++) {
for (int j = 0; j < 3; j++) {
printf("arr[%d][%d] = %d at address %p\n",
i, j, arr[i][j], (void*)&arr[i][j]);
}
}
// Treating as 1D array
int *flat = (int*)arr;
printf("\nAs flat array:\n");
for (int k = 0; k < 6; k++) {
printf("flat[%d] = %d at address %p\n",
k, flat[k], (void*)&flat[k]);
}
return 0;
}
Output (example):
Base address: 0x7ffc8b3e1a40
arr[0][0] = 1 at address 0x7ffc8b3e1a40
arr[0][1] = 2 at address 0x7ffc8b3e1a44
arr[0][2] = 3 at address 0x7ffc8b3e1a48
arr[1][0] = 4 at address 0x7ffc8b3e1a4c
arr[1][1] = 5 at address 0x7ffc8b3e1a50
arr[1][2] = 6 at address 0x7ffc8b3e1a54
Notice addresses increment by 4 (sizeof(int)) sequentially in row-major order.
8. Dynamic Multi-dimensional Arrays
Method 1: Single malloc with manual indexing
int rows = 3, cols = 4;
int *arr = malloc(rows * cols * sizeof(int));
// Access arr[i][j]:
int value = arr[i * cols + j];
Method 2: Array of pointers
int **arr = malloc(rows * sizeof(int*));
for (int i = 0; i < rows; i++) {
arr[i] = malloc(cols * sizeof(int));
}
// Access naturally: arr[i][j]
Method 3: Contiguous with pointer array (best of both)
int **arr = malloc(rows * sizeof(int*));
arr[0] = malloc(rows * cols * sizeof(int));
for (int i = 1; i < rows; i++) {
arr[i] = arr[0] + i * cols;
}
// Cleanup:
free(arr[0]);
free(arr);
9. Hardware Considerations
CPU Cache Behavior
Modern CPUs have hierarchical caches (L1, L2, L3). Sequential access patterns maximize cache hit rates.
Cache line example (64 bytes):
If int is 4 bytes, one cache line holds 16 ints.
Accessing arr[0][0] also pulls the next 15 ints into cache (assuming the array starts on a 64-byte boundary), so the following sequential accesses hit the cache.
TLB (Translation Lookaside Buffer)
The TLB caches virtual-to-physical address translations. Strided access patterns can cause TLB thrashing.
Prefetching
Modern CPUs detect sequential access patterns and prefetch upcoming cache lines. Row-major traversal in row-major storage enables this optimization.
10. Summary
Key Takeaways:
- Multi-dimensional arrays are linearized in memory
- Row-major order: rightmost index varies fastest
- Address formula: base + (i * COLS + j) * sizeof(type) for 2D
- Traversal matters: iterate in storage order for performance
- True arrays vs pointer arrays: different memory layouts, different trade-offs
- Compiler magic: translates arr[i][j] to pointer arithmetic automatically
- Cache locality: sequential access patterns can be 10-100x faster than random access
Understanding these low-level details helps you write faster code, debug memory issues, and interface with hardware or other languages effectively.