We often learn that computer memory, or RAM, is structured as a large grid of storage cells. Each of these "cells" has a unique address and holds 8 bits, or 1 byte, of data. While this is accurate, can we build a deeper intuition about how memory addresses function and the values stored within these "cells"?
Data Structures
If you have ever taken a Computer Science class or prepared for a Software Engineering job interview, you have likely come across the term "data structures" countless times. Data structures are fundamental to programming because they enable efficient data storage, retrieval, and manipulation, making it easier to solve complex problems and optimize performance in software systems. But what exactly is a data structure?
A "data structure" is an organized way to store, manage, and retrieve data efficiently within RAM to perform various operations. To understand this concept, let’s look at the simplest example possible: an array.
Arrays: The Simplest Data Structure
Here is a code snippet that creates an array containing three integers (you can run it locally if you wish):
#include <iostream>

int main() {
    int p[3] = {1, 2, 3};

    for (int i = 0; i < 3; i++) {
        std::cout << "Address of p[" << i << "] = " << &p[i] << std::endl;
    }

    std::cout << "Address of (p + 1) = " << (&p[0] + 1) << std::endl;
    std::cout << "Value of *(p + 1) = " << *(&p[0] + 1) << std::endl;

    return 0;
}
Running this code produces output like the following (the exact addresses will differ on your machine):
Address of p[0] = 0x16dcb3188
Address of p[1] = 0x16dcb318c
Address of p[2] = 0x16dcb3190
Since the program prints the memory addresses of our integers, we can copy and paste these values into a Python interpreter and check that the difference between consecutive addresses matches the size of an integer in memory. Each integer occupies exactly four bytes, which is the size of int for Clang on my Raspberry Pi (Debian clang version 14.0.6):
In [1]: p0_addr = 0x16dcb3188
In [2]: p1_addr = 0x16dcb318c
In [3]: p2_addr = 0x16dcb3190
In [4]: p1_addr - p0_addr
Out[4]: 4
In [5]: p2_addr - p1_addr
Out[5]: 4
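If you prefer not to do the subtraction by hand, you can ask the compiler directly with sizeof. Here is a minimal sketch (the exact numbers assume a platform where int is four bytes, like the one above):

#include <iostream>

int main() {
    // sizeof reports the size of a type or object in bytes.
    std::cout << "sizeof(int)    = " << sizeof(int) << std::endl;    // 4 on this platform
    std::cout << "sizeof(int[3]) = " << sizeof(int[3]) << std::endl; // 12: three ints back to back
    return 0;
}

The second line also shows why the addresses above are exactly four bytes apart: the array stores its elements contiguously, with no gaps between them.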
Pointer Arithmetic and OS Memory Allocation
When you increment or decrement a pointer, it moves by the size of the type it points to, which is exactly what the remaining two output lines demonstrate (these lines come from a separate run, which is why the base address differs from the output above). This behavior is important because it highlights how closely pointers are tied to the structure and size of the data they reference.
Address of (p + 1) = 0x16f00717c
Value of *(p + 1) = 2
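To make the type dependence concrete, here is a small sketch that steps through the same kind of array with both an int* and a char* (the variable names are just illustrative):

#include <iostream>

int main() {
    int p[3] = {1, 2, 3};

    int*  ip = p;                          // points at 4-byte ints
    char* cp = reinterpret_cast<char*>(p); // same address, viewed as single bytes

    std::cout << "ip     = " << static_cast<void*>(ip)     << std::endl;
    std::cout << "ip + 1 = " << static_cast<void*>(ip + 1) << std::endl; // moves by sizeof(int) = 4
    std::cout << "cp     = " << static_cast<void*>(cp)     << std::endl;
    std::cout << "cp + 1 = " << static_cast<void*>(cp + 1) << std::endl; // moves by exactly 1 byte
    return 0;
}

Both pointers start at the same address, but adding 1 advances ip by four bytes and cp by only one, because the step size is always the size of the pointed-to type.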
But how can we check at what stage the OS allocates memory? We can use strace. This tool monitors the system calls made by a program and the signals it receives. For instance, running strace ./memory_demo would provide insight into the memory allocation process.
To keep this example simple, I'm not including the full strace output, as it wouldn't be very useful for such a small program. In this case, a small allocation for an array of three integers doesn't trigger an mmap call. Instead, the array of three integers is allocated on the stack.
The default program stack on my Raspbian (ulimit -s → 8 MB) isn't created in user space through a direct mmap call and thus does not show up in the strace output. Instead, the kernel sets up the stack region before transferring control to the program. To see a new mmap region being created, you would need to make a large, explicit allocation on the heap (e.g. 512 MB of integers: int* arr = new int[512 * 1024 * 1024 / sizeof(int)];).
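Here is a minimal sketch of such a program (the 512 MB size is arbitrary; anything well above the allocator's mmap threshold should behave similarly):

#include <cstddef>
#include <iostream>

int main() {
    // Roughly 512 MB of ints; an allocation this large is typically
    // serviced by the C++ allocator via mmap rather than from the stack.
    const std::size_t count = 512 * 1024 * 1024 / sizeof(int);
    int* arr = new int[count];

    // Touch the memory so the pages are actually faulted in.
    for (std::size_t i = 0; i < count; i++) {
        arr[i] = static_cast<int>(i);
    }

    std::cout << "arr[0] = " << arr[0]
              << ", arr[count - 1] = " << arr[count - 1] << std::endl;

    delete[] arr;
    return 0;
}

Running this under strace should reveal a large anonymous mmap call for the allocation, although the exact calls and sizes depend on your libc and allocator.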
Conclusion
In conclusion, understanding how memory allocation works and exploring strace output provides valuable insight into how a program's data structures are laid out and allocated at runtime. These key ideas help us better understand how programs work with hardware and give us the tools to write faster, more efficient code.