DEV Community

James Lee
James Lee

Posted on

Python list Internals: How Dynamic Arrays Work Under the Hood

A Python list is a capacity-adaptive linear container backed by a dynamic array. This design gives list excellent performance for tail operations, but poor performance for head operations. Let's dig into the source code to understand exactly why.


Capacity Adjustment

When we call append, pop, insert, etc., the list length changes. When the length exceeds the underlying array's capacity, the array needs to expand. When the length drops far below the capacity, the array needs to shrink.

The secret lives in list_resize inside Objects/listobject.c. All methods that change list length go through this function. Let's read it carefully:

static int
list_resize(PyListObject *self, Py_ssize_t newsize)
{
    PyObject **items;
    size_t new_allocated, num_allocated_bytes;
    Py_ssize_t allocated = self->allocated;

    /* Bypass realloc() when a previous overallocation is large enough
       to accommodate the newsize.  If the newsize falls lower than half
       the allocated size, then proceed with the realloc() to shrink the list.
    */
    if (allocated >= newsize && newsize >= (allocated >> 1)) {
        assert(self->ob_item != NULL || newsize == 0);
        Py_SIZE(self) = newsize;
        return 0;
    }

    /* This over-allocates proportional to the list size, making room
     * for additional growth.  The over-allocation is mild, but is
     * enough to give linear-time amortized behavior over a long
     * sequence of appends() in the presence of a poorly-performing
     * system realloc().
     * The growth pattern is:  0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ...
     */
    new_allocated = (size_t)newsize + (newsize >> 3) + (newsize < 9 ? 3 : 6);
    if (new_allocated > (size_t)PY_SSIZE_T_MAX / sizeof(PyObject *)) {
        PyErr_NoMemory();
        return -1;
    }

    if (newsize == 0)
        new_allocated = 0;
    num_allocated_bytes = new_allocated * sizeof(PyObject *);
    items = (PyObject **)PyMem_Realloc(self->ob_item, num_allocated_bytes);
    if (items == NULL) {
        PyErr_NoMemory();
        return -1;
    }
    self->ob_item = items;
    Py_SIZE(self) = newsize;
    self->allocated = new_allocated;
    return 0;
}
Enter fullscreen mode Exit fullscreen mode

Key local variables:

  • items — pointer to the new array
  • new_allocated — new array capacity
  • num_allocated_bytes — new array size in bytes
  • allocated — current (old) array capacity

Expand / Shrink Conditions

Line 12 checks the relationship between the new length and current capacity:

No realloc needed if:
    allocated >= newsize >= allocated / 2

Expand triggered if:
    newsize > allocated

Shrink triggered if:
    newsize < allocated / 2
Enter fullscreen mode Exit fullscreen mode

The Growth Formula

When expand or shrink is triggered, the new capacity is computed at line 27:

new_allocated = newsize + (newsize >> 3) + (newsize < 9 ? 3 : 6);
//                         ↑ ~1/8 slack    ↑ fixed slack for small lists
Enter fullscreen mode Exit fullscreen mode

Why add 3 or 6 on top of the 1/8 slack? If newsize < 8, then newsize >> 3 == 0 — the 1/8 slack contributes nothing! Without the fixed slack, a list growing from 0 would need to reallocate on nearly every append.

The combined formula produces this growth sequence:

Length:   0  1  2  3  4  5  6  7  8  9 ...
Capacity: 0  4  4  4  4  8  8  8  8 16 ...

Full growth pattern: 0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ...
Enter fullscreen mode Exit fullscreen mode

For small lists, capacity doubles — keeping reallocation frequency low.

How PyMem_Realloc Works

PyMem_Realloc is Python's internal memory management function, analogous to C's realloc:

PyAPI_FUNC(void *) PyMem_Realloc(void *ptr, size_t new_size);
Enter fullscreen mode Exit fullscreen mode

Steps:

1. Allocate a new memory region of size new_size
2. Copy data from old region (ptr) to new region
3. Free the old region (ptr)
4. Return the new region
Enter fullscreen mode Exit fullscreen mode
Before realloc:
ptr ──▶ [ a | b | c | d |   |   ]   capacity=6

After realloc (newsize=5):
ptr ──▶ [ a | b | c | d | e |   |   |   ]   capacity=8
                            ↑ new element fits
Enter fullscreen mode Exit fullscreen mode

Tail Append

append is implemented by list_append in C, which calls app1:

static int
app1(PyListObject *self, PyObject *v)
{
    Py_ssize_t n = PyList_GET_SIZE(self);   // line 4: get current length

    assert (v != NULL);
    if (n == PY_SSIZE_T_MAX) {              // line 7-11: overflow check
        PyErr_SetString(PyExc_OverflowError,
            "cannot add more objects to list");
        return -1;
    }

    if (list_resize(self, n+1) < 0)         // line 13-15: expand if needed
        return -1;

    Py_INCREF(v);                           // line 16: increment ref count
    PyList_SET_ITEM(self, n, v);            // line 17: place element at end
    return 0;
}
Enter fullscreen mode Exit fullscreen mode

With list_resize handling the heavy lifting, app1 is straightforward:

  1. Get current length n
  2. Resize to n+1 (expands underlying array if needed)
  3. Increment the element's reference count
  4. Store the element pointer at index n

Head Insert

insert is implemented by list_insert_impl, which calls ins1:

static int
ins1(PyListObject *self, Py_ssize_t where, PyObject *v)
{
    Py_ssize_t i, n = Py_SIZE(self);
    PyObject **items;
    if (v == NULL) {
        PyErr_BadInternalCall();
        return -1;
    }
    if (n == PY_SSIZE_T_MAX) {
        PyErr_SetString(PyExc_OverflowError,
            "cannot add more objects to list");
        return -1;
    }

    if (list_resize(self, n+1) < 0)         // expand if needed
        return -1;

    if (where < 0) {
        where += n;                          // convert negative index
        if (where < 0)
            where = 0;
    }
    if (where > n)
        where = n;
    items = self->ob_item;
    for (i = n; --i >= where; )             // shift elements right (back to front!)
        items[i+1] = items[i];
    Py_INCREF(v);
    items[where] = v;
    return 0;
}
Enter fullscreen mode Exit fullscreen mode

The key step is the shift loop — it must iterate from back to front to avoid overwriting elements before they're moved:

Insert 'X' at index 1:

Before: [ a | b | c | d |   ]
              ↑ insert here

Shift (back to front):
Step 1: [ a | b | c | d | d ]   i=3: items[4]=items[3]
Step 2: [ a | b | c | c | d ]   i=2: items[3]=items[2]
Step 3: [ a | b | b | c | d ]   i=1: items[2]=items[1]

Place:  [ a | X | b | c | d ]   items[1]='X' ✅
Enter fullscreen mode Exit fullscreen mode

Python's Negative Index Support

Python sequences support negative indices — counting from the end:

List:  [ a  |  b  |  c  |  d ]
Index:   0     1     2     3
         -4    -3    -2    -1
Enter fullscreen mode Exit fullscreen mode

Internally, Python converts a negative index by adding the list length n:

index = -1  →  index + n = n - 1  (last element)
index = -n  →  index + n = 0      (first element)
Enter fullscreen mode Exit fullscreen mode

Pop Element

pop removes and returns the element at a given index (default: -1, the last element):

>>> help(list.pop)
pop(self, index=-1, /)
    Remove and return item at index (default last).
    Raises IndexError if list is empty or index is out of range.
Enter fullscreen mode Exit fullscreen mode

C implementation list_pop_impl:

static PyObject *
list_pop_impl(PyListObject *self, Py_ssize_t index)
{
    PyObject *v;
    int status;

    if (Py_SIZE(self) == 0) {                        // empty list check
        PyErr_SetString(PyExc_IndexError, "pop from empty list");
        return NULL;
    }
    if (index < 0)
        index += Py_SIZE(self);                      // convert negative index
    if (index < 0 || index >= Py_SIZE(self)) {
        PyErr_SetString(PyExc_IndexError, "pop index out of range");
        return NULL;
    }
    v = self->ob_item[index];
    if (index == Py_SIZE(self) - 1) {
        status = list_resize(self, Py_SIZE(self) - 1); // tail pop: O(1) fast path
        if (status >= 0)
            return v;
        else
            return NULL;
    }
    Py_INCREF(v);
    status = list_ass_slice(self, index, index+1,    // non-tail: shift elements
                            (PyObject *)NULL);
    if (status < 0) {
        Py_DECREF(v);
        return NULL;
    }
    return v;
}
Enter fullscreen mode Exit fullscreen mode

There's a fast path for tail pop: if the element is the last one, just call list_resize(n-1) — no shifting needed. For any other position, list_ass_slice shifts all subsequent elements one position forward.

list_ass_slice has two semantics depending on the last argument:

  • v == NULLdelete: del a[ilow:ihigh]
  • v != NULLreplace: a[ilow:ihigh] = v

Here it's called with NULL, so it deletes a[index:index+1] — exactly one element.

Time Complexity of pop

pop(-1)  tail pop   → O(1)  ✅ fast path, no shifting
pop(0)   head pop   → O(n)  ❌ shifts every element
pop(i)   mid pop    → O(n)  average case
Enter fullscreen mode Exit fullscreen mode
pop(0) on [ a | b | c | d | e ]:

Remove 'a', shift left:
[ b | c | d | e |   ]
  ↑ every element moved one position ← O(n)
Enter fullscreen mode Exit fullscreen mode

Remove Element

remove deletes the first occurrence of a given value (not by index). Implemented by list_remove:

static PyObject *
list_remove(PyListObject *self, PyObject *value)
{
    Py_ssize_t i;

    for (i = 0; i < Py_SIZE(self); i++) {
        int cmp = PyObject_RichCompareBool(self->ob_item[i], value, Py_EQ);
        if (cmp > 0) {
            if (list_ass_slice(self, i, i+1,
                               (PyObject *)NULL) == 0)
                Py_RETURN_NONE;
            return NULL;
        }
        else if (cmp < 0)
            return NULL;
    }
    PyErr_SetString(PyExc_ValueError, "list.remove(x): x not in list");
    return NULL;
}
Enter fullscreen mode Exit fullscreen mode

list_remove first linearly scans the list to find the target element — O(n) — then calls list_ass_slice to delete it. If the element doesn't exist, it raises ValueError.

Top comments (0)