ali ehab algmass

Dynamic Arrays: Low-Level Implementation & Amortized Analysis

Dynamic arrays (like C++ std::vector, Java ArrayList, Python list) provide array-like access with automatic resizing. Let me break down the implementation details and the cost analysis.

Core Data Structure

At the lowest level, a dynamic array maintains:

struct DynamicArray {
    void* data;      // Pointer to actual array in heap
    size_t size;     // Current number of elements
    size_t capacity; // Total allocated space
};

Key insight: size ≤ capacity always. When size == capacity, we're full and need to resize.

The Resizing Operation

When you append an element and the array is full, here's what happens at the memory level:

Step-by-Step Resize Process

  1. Allocate new memory block
   // Typically: new_capacity = old_capacity * growth_factor
   // Common growth factors: 1.5x or 2x
   new_capacity = capacity * 2;
   void* new_data = malloc(new_capacity * sizeof(element_type));
  2. Copy existing elements
   // memcpy or an element-by-element copy
   // (the void* pointers must be cast to the element type first)
   for (i = 0; i < size; i++) {
       new_data[i] = data[i];
   }
  3. Free old memory
   free(data);
  4. Update pointers
   data = new_data;
   capacity = new_capacity;
  5. Insert new element
   data[size++] = new_element;

Why This Is Expensive

Each resize operation costs O(n) where n is the current size because we copy all existing elements. This seems terrible for performance!

Amortized Cost Analysis

Here's where amortized analysis shows the true efficiency. Let's trace through insertions with 2x growth:

Detailed Cost Breakdown

Starting with capacity = 1:

| Insert # | Size | Capacity | Resize? | Copy Cost | Total Copies So Far |
|---------:|-----:|---------:|:-------:|----------:|--------------------:|
| 1 | 1 | 1 | No | 0 | 0 |
| 2 | 2 | 2 | Yes | 1 | 1 |
| 3 | 3 | 4 | Yes | 2 | 3 |
| 4 | 4 | 4 | No | 0 | 3 |
| 5 | 5 | 8 | Yes | 4 | 7 |
| 6 | 6 | 8 | No | 0 | 7 |
| 7 | 7 | 8 | No | 0 | 7 |
| 8 | 8 | 8 | No | 0 | 7 |
| 9 | 9 | 16 | Yes | 8 | 15 |
| ... | ... | ... | ... | ... | ... |
| n | n | ... | ... | ... | ≤ 2n |

The Mathematical Proof

After inserting n elements with 2x growth, a resize occurs each time the size crosses a power of 2 (at inserts #2, #3, #5, #9, ..., #2^k + 1):

  • Copy costs: 1, 2, 4, 8, 16, ..., 2^k, where 2^k < n

Total copies = 1 + 2 + 4 + 8 + ... + 2^k
= 2^(k+1) - 1 (geometric series)
≤ 2n - 1
= O(n)

Average cost per insertion = O(n) / n = O(1)

This is the amortized constant time!

Growth Factor Comparison

2x Growth (Doubling)

  • Pros: Minimal number of resizes (log₂ n resizes for n elements)
  • Cons:
    • Can waste up to 50% memory (when array just resized)
    • Memory fragmentation: old blocks may not be reusable

1.5x Growth

  • Pros:
    • Better memory reuse (freed blocks can fit future allocations)
    • Less wasted space (~33% vs 50%)
  • Cons: More frequent resizes (log₁.₅ n resizes)

Still O(1) amortized for both!

Proof for growth factor g > 1:
Total copies ≤ n × g/(g-1) = O(n)

Low-Level Memory Considerations

Cache Performance

Old array: [1][2][3][4]________
                        |
                        v (resize, allocate far away)
New array: ________________[1][2][3][4][5][6][7][8]
  • Resizing causes cache misses during copy
  • But subsequent accesses benefit from contiguous layout
  • Overall: excellent cache locality for sequential access

Memory Allocation Patterns

// Allocation timeline for 2x growth:
malloc(1 * elem_size)   // 1 element
malloc(2 * elem_size)   // grow to 2 elements
free(old_block)         // release the 1-element block
malloc(4 * elem_size)   // grow to 4 elements
free(old_block)         // release the 2-element block
malloc(8 * elem_size)   // grow to 8 elements
free(old_block)         // release the 4-element block
// ...

Peak memory usage: During resize, both old and new arrays exist temporarily:

  • Without realloc: up to 3× the data size (old + new during copy)
  • With realloc: allocator may resize in place if possible

realloc() Optimization

Smart implementations use realloc():

void* new_data = realloc(data, new_capacity * sizeof(T));

If there's free space immediately after the current block, realloc expands in-place:

  • No copying needed! O(1) operation
  • Only happens sometimes, doesn't change amortized analysis
  • Makes common cases much faster

Accounting Method Visualization

Think of each insertion as paying "$3":

  • $1 for the actual insertion
  • $2 stored as "credit" on that element

When we resize and copy n elements:

  • Cost to copy: $n
  • Credits available: $2n (each element has $2)
  • Credits remaining: $n (enough for future copies)

This proves each insertion pays constant "credits" = O(1) amortized.

Practical Implementation Details

Typical C++ std::vector Growth

// Pseudocode for push_back
void push_back(T elem) {
    if (size == capacity) {
        size_t new_cap = (capacity == 0) ? 1 : capacity * 2;
        T* new_arr = allocator.allocate(new_cap);

        // Move or copy existing elements
        // (simplified: real implementations construct in place rather
        // than assigning into raw allocated memory)
        for (size_t i = 0; i < size; i++) {
            new_arr[i] = std::move(data[i]); // C++11 move semantics
        }

        allocator.deallocate(data, capacity);
        data = new_arr;
        capacity = new_cap;
    }
    data[size++] = std::move(elem);
}

Edge Cases

  • Initial capacity: Often starts at 0 or 1, first resize creates small array
  • Shrinking: Some implementations shrink when size < capacity/4 (avoid thrashing)
  • Reserve: reserve(n) pre-allocates to avoid resizes when final size known

Summary

Amortized O(1) insertion works because:

  1. Most insertions don't trigger resize (cheap)
  2. Expensive resizes happen exponentially less frequently
  3. Growth factor ensures total work is linear in insertions
  4. Each operation "pays forward" for future expensive operations

The amortized analysis shows that while individual operations vary wildly in cost, the average cost per operation remains constant over a sequence of operations.