DEV Community

Cover image for The Stack: Why It’s Fast and How It Works
Suvijak Vinyunavan
Suvijak Vinyunavan

Posted on

The Stack: Why It’s Fast and How It Works

While many programmers are familiar with the concept of the stack, a common question arises: Why is it faster than the heap, even though both reside in the same physical memory? This article aims to answer those questions to the very low level to satisfy the thirst for understanding of some curious individuals out there.

Let’s not waste any more time and dive into the depth of the stack memory architecture.

📚What is the stack?

The stack memory, allocated for every local variable and function calls, follows a strict, well-known, Last in first out (LIFO)data structure: the latest item has to be removed first in order to access the previous stack. Each segment of the stack, which corresponds to each allocations and scope, is called a “stack frame”.

Image portraying how stack data structure works
(Figure 1: stack data structure illustration)

🧪An Example in C and Assembly.

To better understand the stack memory, we need an example to follow by and explain.

Supposed you call a function - add():

int add(int j, int k) {
    int a = 10;
    int b = j + k;
    return b;
}
Enter fullscreen mode Exit fullscreen mode

Which compiles to the following assembly(I promise it’s not as scary, I will explain them through) looks like:
add:

        push    {r11, lr}
        mov     r11, sp
        sub     sp, sp, #16
        str     r0, [r11, #-4]
        str     r1, [sp, #8]
        mov     r0, #10
        str     r0, [sp, #4]
        ldr     r0, [r11, #-4]
        ldr     r1, [sp, #8]
        add     r0, r0, r1
        str     r0, [sp]
        ldr     r0, [sp]
        mov     sp, r11
        pop     {r11, lr}
        bx      lr
Enter fullscreen mode Exit fullscreen mode

To understand the stack and how it’s allocated, we need to introduce some terminology and conventions first:

  1. 📚 The stack frame is a region of memory allocated on the call stack for each function call in a program, which in the lower level, is the area between the frame pointer fp and the top stack pointer sp.

  2. ☝️Frame pointer, together with the stack pointer, are the anchor points which the entire stack will operate relative to. They will never be changed throughout the entire process, and only change when the stack frame change ie. new stack allocation.

  3. ⏬The stack grows conventionally downwards, which means that when we allocate a stack on top of the existing one, the memory address of the new stack is lower than the previous one.

  4. 📝r11 refers to a register in this case used as the frame pointer, and sp refers to the stack pointer


🧱Breaking down the assembly

Every time a stack has to be allocated the computer does the following

        push    {r11, lr}
        mov     r11, sp
        sub     sp, sp, #16
Enter fullscreen mode Exit fullscreen mode
  1. push {r11, lr} instructs the processor to put anything inside r11 which is currently the previous stack’s frame pointer, and lr (link register used for returning from a function) onto the stack and decrements sp by 4 per each item, 8 in total.

  2. mov r11, sp sets the frame pointer to the stack pointer

  3. Finally sub sp, sp, #16 decrements sp by 16 to make space for the 16 bytes we need for the function’s local variables and parameters.

Image visualizing how the stack allocation works
(Figure 2: stack allocation step by step illustration)

This concludes the stack allocation. The memory region between the stack pointer (sp) and the frame pointer (r11) constitutes the current function’s stack frame.

🔠Variables

On line 7, we can see the assembly associated with initializing the variable a to 10:

        str     r0, [sp, #4]
Enter fullscreen mode Exit fullscreen mode

This instruction stores the value from register r0, which currently holds the value 10 from the previous mov instruction, into the memory location at an offset of 4 bytes from the stack pointer.

This convoluted way of refering to a place in memory is required for the flexibility of the function to be called whenever and wherever in the code, since all the local variables would be just offsets in the frame itself. Different compilers have different approaches to how to use the stack, but essentially they all use location relative to the stack position for local variables.

Image portraying how local variable works

You might wonder where do these memories come from? The truth is, all processes on the system get preallocated a stack space it can use without requesting for more memory, by default about 1 - 8MB depends on the operating system. This means that programs already have the stack memory they need already allocated to them, but this also means that the stack is limited to only the space allocated for them at the beginning of the program. This limited preallocation is the reason why using too much of the stack memory (ie. from deep recursive function or an array too big) often leads to a stack overflow error.

The mechanics used to allocate the memory being essentially moving pointers up and down requiring only a few instructions is a major factor contributing to the exceptionally quick allocation of the stack memory, compared to heap memory which requires complicated book keepings of the entire heap area and operating system calls for more memory, making the stack memory far better suited for local small allocation such as variables, which does not need to persist through to other scope of the program.

✅To Summarize

  • Stack allocation is fast due to simple pointer arithmetic.
  • The CPU provides specific instructions optimized for stack operations.
  • Stack memory is preallocated, eliminating the need for dynamic memory requests during execution.
  • Stack size is determined at program start, and memory is automatically reclaimed when a function exits.

Top comments (1)

Collapse
 
chanidapha_boonlong_c2a7e profile image
Chanidapha Boonlong

Wow, what a useful article! Thank you for the information.