ali ehab algmass

Dynamic Arrays: Low-Level Implementation & Amortized Analysis

Dynamic arrays (like C++ std::vector, Java ArrayList, Python list) provide array-like access with automatic resizing. Let me break down the implementation details and the cost analysis.

Core Data Structure

At the lowest level, a dynamic array maintains:

struct DynamicArray {
    void* data;      // Pointer to actual array in heap
    size_t size;     // Current number of elements
    size_t capacity; // Total allocated space
};

Key insight: size ≤ capacity always. When size == capacity, we're full and need to resize.

The Resizing Operation

When you append an element and the array is full, here's what happens at the memory level:

Step-by-Step Resize Process

  1. Allocate new memory block
   // Typically: new_capacity = old_capacity * growth_factor
   // Common growth factors: 1.5x or 2x
   new_capacity = capacity * 2;
   void* new_data = malloc(new_capacity * sizeof(element_type));
  2. Copy existing elements
   // memcpy or an element-by-element copy
   // (the void* pointers must be cast to the element type first)
   for (i = 0; i < size; i++) {
       new_data[i] = data[i];
   }
  3. Free old memory
   free(data);
  4. Update pointers
   data = new_data;
   capacity = new_capacity;
  5. Insert new element
   data[size++] = new_element;

Why This Is Expensive

Each resize operation costs O(n) where n is the current size because we copy all existing elements. This seems terrible for performance!

Amortized Cost Analysis

Here's where amortized analysis shows the true efficiency. Let's trace through insertions with 2x growth:

Detailed Cost Breakdown

Starting with capacity = 1:

| Insert # | Size | Capacity | Resize? | Copy Cost | Total Copies So Far |
|---------:|-----:|---------:|:-------:|----------:|--------------------:|
| 1 | 1 | 1 | No | 0 | 0 |
| 2 | 2 | 2 | Yes | 1 | 1 |
| 3 | 3 | 4 | Yes | 2 | 3 |
| 4 | 4 | 4 | No | 0 | 3 |
| 5 | 5 | 8 | Yes | 4 | 7 |
| 6 | 6 | 8 | No | 0 | 7 |
| 7 | 7 | 8 | No | 0 | 7 |
| 8 | 8 | 8 | No | 0 | 7 |
| 9 | 9 | 16 | Yes | 8 | 15 |
| ... | ... | ... | ... | ... | ... |
| n | n | ... | ... | ... | ≤ 2n |

The Mathematical Proof

After inserting n elements with 2x growth, a resize occurs each time the size crosses a power of 2 (at inserts #2, #3, #5, #9, ..., #2^k + 1):

  • Copy costs: 1, 2, 4, 8, 16, ..., 2^k, where 2^k < n

Total copies = 1 + 2 + 4 + 8 + ... + 2^k
= 2^(k+1) - 1 (geometric series)
≤ 2n - 1
= O(n)

Average cost per insertion = O(n) / n = O(1)

This is the amortized constant time!

Growth Factor Comparison

2x Growth (Doubling)

  • Pros: Minimal number of resizes (log₂ n resizes for n elements)
  • Cons:
    • Can waste up to 50% memory (when array just resized)
    • Memory fragmentation: old blocks may not be reusable

1.5x Growth

  • Pros:
    • Better memory reuse (freed blocks can fit future allocations)
    • Less wasted space (~33% vs 50%)
  • Cons: More frequent resizes (log₁.₅ n resizes)

Still O(1) amortized for both!

Proof for growth factor g > 1:
Total copies ≤ n × g/(g-1) = O(n)

Low-Level Memory Considerations

Cache Performance

Old array: [1][2][3][4]________
                        |
                        v (resize, allocate far away)
New array: ________________[1][2][3][4][5][6][7][8]
  • Resizing causes cache misses during copy
  • But subsequent accesses benefit from contiguous layout
  • Overall: excellent cache locality for sequential access

Memory Allocation Patterns

// Allocation timeline for 2x growth:
malloc(1 * elem_size)   // 1 element
malloc(2 * elem_size)   // grow to 2 elements
free(old_block)         // release the 1-element block
malloc(4 * elem_size)   // grow to 4 elements
free(old_block)         // release the 2-element block
malloc(8 * elem_size)   // grow to 8 elements
free(old_block)         // release the 4-element block
// ...

Peak memory usage: During resize, both old and new arrays exist temporarily:

  • Without realloc: up to 3× the data size (old + new during copy)
  • With realloc: allocator may resize in place if possible

realloc() Optimization

Smart implementations use realloc():

void* new_data = realloc(data, new_capacity * sizeof(T));

If there's free space immediately after the current block, realloc expands in-place:

  • No copying needed! O(1) operation
  • Only happens sometimes, doesn't change amortized analysis
  • Makes common cases much faster

Accounting Method Visualization

Think of each insertion as paying "$3":

  • $1 for the actual insertion
  • $2 stored as "credit" on that element

When we resize and copy n elements:

  • Cost to copy: $n
  • Credits available: $2n (each element has $2)
  • Credits remaining: $n (enough for future copies)

This proves each insertion pays constant "credits" = O(1) amortized.

Practical Implementation Details

Typical C++ std::vector Growth

// Pseudocode for push_back
void push_back(T elem) {
    if (size == capacity) {
        size_t new_cap = (capacity == 0) ? 1 : capacity * 2;
        T* new_arr = allocator.allocate(new_cap);

        // Move or copy existing elements
        // (simplified: real implementations construct in place rather
        // than assigning into raw allocated memory)
        for (size_t i = 0; i < size; i++) {
            new_arr[i] = std::move(data[i]); // C++11 move semantics
        }

        allocator.deallocate(data, capacity);
        data = new_arr;
        capacity = new_cap;
    }
    data[size++] = std::move(elem);
}

Edge Cases

  • Initial capacity: Often starts at 0 or 1, first resize creates small array
  • Shrinking: Some implementations shrink when size < capacity/4 (avoid thrashing)
  • Reserve: reserve(n) pre-allocates to avoid resizes when final size known

Summary

Amortized O(1) insertion works because:

  1. Most insertions don't trigger resize (cheap)
  2. Expensive resizes happen exponentially less frequently
  3. Growth factor ensures total work is linear in insertions
  4. Each operation "pays forward" for future expensive operations

The amortized analysis shows that while individual operations vary wildly in cost, the average cost per operation remains constant over a sequence of operations.