Pablo GS

Posted on Jun 1 • Originally published at pablogs.dev

Growing Pains: realloc and Automatic Capacity Management


Post 2 of the Dynamic Arrays in C series

Where We Left Off

In Post 1 we built an array that does three things: allocate a fixed buffer, push integers into it, and free everything when we're done. It works, until it doesn't. The moment the user pushes one element more than the initial capacity allows, array_push returns -1 and refuses to cooperate. The array is full and there's nothing we can do about it.

That's not a dynamic array. It's a fixed-size buffer with a nice API around it. A real dynamic array solves the fundamental problem: the user doesn't know how many elements they'll need. Maybe it's 5, maybe 5 million. The array should handle either case without the caller worrying about capacity.

The mechanism that makes this possible is realloc. It's one function call, one line of code, and one of the most misunderstood functions in the C standard library. Most C programmers know what it does in the abstract: "it resizes an allocation." Fewer understand the two distinct things it can do under the hood, why that distinction matters for correctness, and why writing arr->data = realloc(arr->data, new_size) is a bug waiting to happen.

This post replaces Post 1's fixed-capacity array_push with one that grows automatically. When size reaches capacity, we ask realloc for a buffer twice as large (realloc copies the data over if it has to move it) and keep going. The user never has to think about capacity again; they just push.

But automatic growth has consequences. The most important one is pointer invalidation: any pointer you held into the old buffer becomes a dangling pointer after realloc moves it. This isn't a theoretical concern; it's one of the most common sources of use-after-free bugs in C codebases. We'll see it happen, understand why, and learn the pattern that prevents it.

We'll also see the "temporary pointer" pattern, the correct way to call realloc so that a failed allocation doesn't corrupt your array. It's three lines of code, and it's the difference between an array that degrades gracefully when memory runs out and one that loses your data and crashes.

By the end of this post you'll have an array that grows on demand, and you'll understand the two things about realloc that most tutorials get wrong: that it might move your data, and that you must never assign its result directly to the pointer you're reallocating.

What realloc Actually Does

The realloc function has a deceptively simple signature:

void *realloc(void *ptr, size_t new_size);

You pass it a pointer to an existing allocation (from malloc, calloc, or a previous realloc) and a new size. It returns a pointer to a block of at least new_size bytes whose contents match the old block up to the smaller of the old and new sizes. But behind that simple interface, two fundamentally different things can happen.

Case 1: extend in-place. If there's enough free space right after your current allocation in the heap, the allocator just expands the block. The pointer doesn't change. This is fast: no copying, no new allocation. It's also the case you can never count on.

Case 2: allocate, copy, free. If there isn't room to grow in-place (because another allocation sits right after yours), the allocator mallocs a new block of the requested size, copies your old data into it with the equivalent of memcpy, and frees the old block. The returned pointer is different from the one you passed in. The old pointer is now invalid: the memory it pointed to has been returned to the allocator.

You cannot predict which case will happen. It depends on the heap's internal state, which allocator your system uses (glibc, jemalloc, musl), how fragmented memory is, and the phase of the moon. Your code must handle both cases correctly. This means one thing: never assume the pointer stays the same after realloc.

There's a third case too: failure. If the system can't satisfy the request, realloc returns NULL. And here's the critical detail: on failure, the original block is not freed. Your old pointer is still valid, and your old data is still there. This is actually good news: it means you can recover gracefully, but only if you don't overwrite your pointer before checking for NULL.

The Code

The full file compiles with zero warnings under gcc -Wall -Wextra -Wpedantic -std=c11, prints an ASCII visualization to stdout, and writes a Graphviz DOT file for diagram generation. Here are the essential changes from Post 1.

The complete source, including the main() with growth demonstrations, the pointer invalidation demo, and the DOT generator, is available on GitHub.

The Struct: One New Field

typedef struct {
    int    *data;           /* Heap buffer holding the elements              */
    size_t  size;           /* Elements currently stored                     */
    size_t  capacity;       /* Slots allocated                               */
    size_t  realloc_count;  /* How many times we've reallocated (diagnostic) */
} IntArray;

We add realloc_count, a diagnostic counter that tracks how many times the buffer has been reallocated. This has no functional purpose; it exists so we can observe and discuss growth behavior. In a production library you'd likely omit it. For learning, it's invaluable.

The Star: array_push with Automatic Growth

int array_push(IntArray *arr, int value)
{
    if (!arr) {
        fprintf(stderr, "array_push: NULL array\n");
        return -1;
    }

    /* ── Do we need to grow? ──────────────────────────────────── */
    if (arr->size >= arr->capacity) {
        size_t old_cap = arr->capacity;
        size_t new_cap = old_cap * 2;

        /*
         * realloc() does one of two things:
         *   1. Extends the block in-place (returns same pointer).
         *   2. Allocates new block, copies data, frees old block
         *      (returns NEW pointer — old pointer is INVALID).
         *
         * We MUST use a temporary variable.
         */
        int *tmp = realloc(arr->data, new_cap * sizeof(int));
        if (!tmp) {
            return -1;  /* arr->data still points to the original buffer */
        }

        arr->data     = tmp;
        arr->capacity = new_cap;
        arr->realloc_count++;
    }

    /* ── Normal push (guaranteed to have room now) ────────────── */
    arr->data[arr->size] = value;
    arr->size++;
    return 0;
}

This is the entire growth mechanism. When size >= capacity, we double the capacity, call realloc, and update the pointer. The rest of the function is identical to Post 1.

Compile and run the complete file to see it in action:

gcc -Wall -Wextra -Wpedantic -std=c11 -o post_02 post_02.c
./post_02

Starting from capacity=2, we push 12 elements and watch the buffer grow through three reallocations:

  cap=2 → push #3 triggers realloc → cap=4
  cap=4 → push #5 triggers realloc → cap=8
  cap=8 → push #9 triggers realloc → cap=16

Three reallocations for 12 elements. The doubling strategy means we reallocate less and less frequently as the array grows; that's the amortized O(1) property we'll analyze formally in Post 3.

Here's the ASCII visualization after the third realloc, with 9 elements in a 16-slot buffer:

╔══════════════════════════════════════════════════════════╗
║  After push(90) — REALLOC: 8 → 16                        ║
╠══════════════════════════════════════════════════════════╣
║  size = 9      capacity = 16     elem = 4 bytes          ║
║  data = 0x55a3c0        (heap)                           ║
║  reallocations so far: 3                                 ║
╠══════════════════════════════════════════════════════════╣
║  ┌──────┌──────┌──────┌──────┌──────┌──────┌──────...    ║
║  │   10 │   20 │   30 │   40 │   50 │   60 │   70 ...    ║
║  └──────└──────└──────└──────└──────└──────└──────...    ║
║     0      1      2      3      4      5      6   ...    ║
╠══════════════════════════════════════════════════════════╣
║  36B used /  64B alloc =  56.2% utilization              ║
║  28B wasted ( 43.8%)     next realloc at size=16         ║
╚══════════════════════════════════════════════════════════╝

Nine slots are occupied and seven are empty (shown as · in the full output). The stats show 43.8% waste, which is the price of pre-allocating for future growth. Whether that's acceptable depends on your use case. Post 3 explores this tradeoff in depth.

Diagram showing the old buffer (red, dashed, "Before, FREED") and the new buffer (green, solid, "After, CURRENT") with a dashed blue arrow labeled "memcpy + free old" connecting them.

The diagram above shows what realloc does when it moves the buffer. The old allocation (red, dashed border) is freed after the data is copied to the new, larger allocation (green). Every pointer that referred to the old address is now dangling.

Walking Through the Code

The Temporary Pointer: Why It Matters

The single most important detail in this post is three lines long:

int *tmp = realloc(arr->data, new_cap * sizeof(int));
if (!tmp) return -1;
arr->data = tmp;

Here's the pattern that seems equivalent but is actually a bug:

/* ⚠ WRONG — DO NOT DO THIS */
arr->data = realloc(arr->data, new_cap * sizeof(int));

If realloc succeeds, both patterns produce the same result. But if realloc fails, the consequences are completely different.

With the wrong pattern: realloc returns NULL, which is assigned directly to arr->data. Now arr->data is NULL. The old buffer, the one with all your data in it, is still allocated somewhere on the heap, but no pointer references it anymore, so it can never be freed. This is a double failure: your data is unreachable and your memory is leaked. And if the function carries on to arr->data[arr->size] = value without checking, that write dereferences a null pointer, which is undefined behavior and usually a crash.

With the correct pattern: realloc returns NULL, which is stored in tmp. The check if (!tmp) triggers and the function returns -1. Critically, arr->data was never touched: it still points to the original buffer with all the original data intact. The caller can handle the error (log it, free the array, try again with less memory) without losing anything.

This is not a hypothetical concern. In embedded systems or long-running servers, allocation failures happen. The temporary pointer pattern is the difference between an array that degrades gracefully and one that silently corrupts your program state.

Pointer Invalidation: The Realloc Trap

When realloc moves the buffer, every pointer into the old buffer becomes a dangling pointer. This is the most dangerous consequence of automatic growth, and it catches even experienced C programmers. Here's the scenario:

IntArray *arr = array_create(2);
array_push(arr, 100);
array_push(arr, 200);

/* Take a pointer into the buffer */
int *ptr = &arr->data[0];  /* ptr points to the 100 */

/* This push triggers realloc (capacity 2 → 4) */
array_push(arr, 300);

/* ptr now points to FREED MEMORY */
printf("%d\n", *ptr);  /* Undefined behavior */

After the third push, the buffer at arr->data may have moved to a new address. The variable ptr still holds the old address. If the buffer moved, that memory has been freed by realloc: it might be reused by the next malloc, overwritten with allocator metadata, or still contain the old value by coincidence. Reading through ptr is undefined behavior either way; the C standard treats realloc's result as a new object even when the address is unchanged. It might print 100. It might print garbage. It might crash. The outcome depends on what the allocator did with the freed block, and that's not something your program controls.

The rule is simple: after any operation that might trigger realloc (push, insert, resize), all pointers and references into the array's data buffer are potentially invalid. If you need a stable reference to an element, store its index, not a pointer. Indices survive reallocation; pointers don't.

The full source file includes a self-contained demo that creates this exact scenario and prints the pointer addresses before and after realloc, so you can see the invalidation happen. Run it yourself; there's no substitute for watching the addresses change.

Growth by Doubling

We chose new_cap = old_cap * 2, doubling the capacity on every realloc. This is the simplest growth strategy and the one most implementations start with. Starting from capacity 2, the progression is 2 → 4 → 8 → 16 → 32 → 64 → ... Each time we hit the wall, we double.

The key property: to reach size N starting from capacity 1, you need about log₂(N) reallocations. For a million elements, that's roughly 20 reallocations total. In the worst case each reallocation copies all existing elements, but the copy counts double each time (1, 2, 4, ...), so their sum stays below 2N. The total copying work is therefore O(N), and the amortized cost per push is O(1). Post 3 will prove this formally and explore why some implementations prefer 1.5x growth over 2x.
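Here is the arithmetic behind that bound as a quick worked sum, under the simplifying assumptions that we start at capacity 1, every realloc moves the block (the worst case), and N is a power of two, N = 2^k:

```latex
\begin{aligned}
\text{copies}(N) &= 1 + 2 + 4 + \cdots + 2^{k-1} = \sum_{i=0}^{k-1} 2^i = 2^k - 1 = N - 1,\\
\text{work}(N)   &= \underbrace{N}_{\text{writes}} + \underbrace{(N - 1)}_{\text{copies}} = 2N - 1 < 2N,\\
\frac{\text{work}(N)}{N} &< 2 = O(1) \text{ per push.}
\end{aligned}
```

For an N that is not a power of two the last doubling overshoots, but the copies still total 2^{m+1} - 1 with 2^m < N, which is less than 2N, so the work stays below 3N and the per-push cost is still O(1). For N = 1,000,000 that is 1 + 2 + ... + 2^19 = 1,048,575 copies spread over 20 reallocations.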

For now, notice what happens to waste. Right after a realloc, the buffer is roughly half empty (we just doubled, and only one new element was added). As we fill it up, utilization climbs toward 100%, and then we double again. This sawtooth pattern (waste spikes after each realloc, then shrinks with each push) is characteristic of geometric growth strategies. You can see it clearly in the ASCII output by watching the utilization percentage after each push.

Key Concepts and Tradeoffs

realloc Moves vs Extends: You Can't Choose

Whether realloc extends in-place or moves to a new location is entirely up to the allocator. You might observe in-place extension during testing (especially with small arrays early in a program's life, when the heap is mostly empty) and then hit moves in production when the heap is fragmented.

This is why correctness requires handling both cases identically. The temporary pointer pattern does this naturally: whether tmp equals arr->data (in-place) or differs (moved), the assignment arr->data = tmp is correct either way. You don't need to check which case occurred, the pattern is correct for both.

One thing you should not do is rely on in-place extension for performance. Some codebases try to "help" the allocator by freeing and re-mallocing at a specific alignment. Unless you're writing the allocator itself, let realloc do its job. It knows more about the heap layout than you do.

The Cost of Growth

Reallocation is expensive when realloc has to move the block: it's O(n) in the number of existing elements, because every byte must be copied. With 2x growth, the cost per realloc increases as the array gets larger: copying 10 elements is cheap, copying 10 million elements is not.

The saving grace is that reallocations happen exponentially less often. You pay a big cost once, then enjoy cheap pushes until the next realloc. This amortization is what makes geometric growth viable. But it does mean that individual push operations have unpredictable latency: most are O(1), but occasionally one is O(n). For real-time systems that need bounded worst-case latency, this can be a problem; we'll revisit it in Post 12 on benchmarking.

When Capacity Grows, Memory Waste Spikes

Right after doubling from capacity 8 to 16, you have 9 elements in 16 slots: 43.8% waste, or 28 bytes of allocated-but-unused memory. For a small array, this is nothing. But take an array of 64-byte structs that has just grown past 5 million elements: doubling gives it room for 10 million (640 MB) when you only need about 320 MB. The extra 320 MB might be the difference between fitting in RAM and hitting swap.

The growth factor directly controls this tradeoff: 2x wastes more but reallocates less, 1.5x wastes less but reallocates more. Additive growth (say, adding 1024 slots each time) keeps waste bounded but destroys the amortized O(1) property. Post 3 is dedicated entirely to this debate.

Try This and Watch It Fail

Experiment 1: The Dangerous Pattern. Modify array_push to assign realloc's result directly (arr->data = realloc(arr->data, new_cap * sizeof(int))) and remove the tmp variable and the NULL check. Now simulate an allocation failure: on Linux you can use LD_PRELOAD with a library that makes malloc/realloc fail after N calls, or simply replace that line with arr->data = NULL;. Watch the array lose its data pointer, and then crash when push writes through it.

Experiment 2: Pointer Invalidation in the Wild. Create an array with capacity 1. Push one element. Take a pointer: int *p = &arr->data[0];. Now push a second element (this triggers realloc from 1 → 2). Print *p. It may well still print the old value, either because the block was extended in place or because the freed memory hasn't been overwritten yet. Now push 1000 more elements and print *p again. By now the buffer has almost certainly moved and the old memory may have been reused, so *p may print garbage, or the read may crash. Compile with -fsanitize=address to see AddressSanitizer catch the use-after-free.

Experiment 3: Counting Reallocations. Modify array_create to start with capacity = 1. Push 1000 elements. How many reallocations happen? (Answer: about 10, because log₂(1000) ≈ 10.) Now change the growth strategy to new_cap = old_cap + 10 (additive growth). Push 1000 elements again. How many reallocations now? (Answer: about 100.) Feel the difference in efficiency.

Knowledge Test

If you hold a pointer to arr->data[3] and then push triggers realloc, is your pointer still valid? Why?

No. When realloc moves the buffer to a new location, it frees the old buffer. Your pointer still holds the old address, which now points to freed memory. Dereferencing it is undefined behavior: you might read stale data or garbage, or the program might crash. Even if realloc extends in-place (same address), you cannot rely on that behavior, because you cannot predict which case will occur. The only safe approach is to treat all pointers into the buffer as potentially invalid after any operation that might trigger realloc. If you need a stable reference to an element, store the index and recompute the pointer: arr->data[3] is always correct (as long as 3 < arr->size) because the push function keeps arr->data up to date.

What's Next

Our array grows automatically, but we made an arbitrary choice: double the capacity on every realloc. Why 2x and not 1.5x? Why not add a fixed amount each time? The answer turns out to be subtle: it involves amortized analysis, memory allocator behavior, and a surprising fact about when freed memory can be reused.

In Post 3: "The Growth Factor Debate: 1.5x, 2x, or Something Else?", we'll put numbers on the tradeoffs. We'll calculate the amortized cost of push under different growth strategies, show why 2x growth can never reuse previously freed memory but 1.5x can, and build a benchmark harness to measure the real-world difference. You'll come out of it able to justify your growth factor choice to anyone who asks.

The array we built today is functionally complete for integers. It grows, it doesn't leak, and it handles allocation failures gracefully. But it only holds int. In Post 4, we'll break that limitation with void* and memcpy, the C way of saying "I don't care what type you store, I'll hold it for you."
