James Lee

Posted on May 17

Python list Internals: How Dynamic Arrays Work Under the Hood

#algorithms #computerscience #python #performance

A Python list is a capacity-adaptive linear container backed by a dynamic array. This design gives list excellent performance for tail operations, but poor performance for head operations. Let's dig into the source code to understand exactly why.

Capacity Adjustment

When we call append, pop, insert, etc., the list length changes. When the length exceeds the underlying array's capacity, the array needs to expand. When the length drops far below the capacity, the array needs to shrink.

The secret lives in list_resize inside Objects/listobject.c. All methods that change list length go through this function. Let's read it carefully:

static int
list_resize(PyListObject *self, Py_ssize_t newsize)
{
    PyObject **items;
    size_t new_allocated, num_allocated_bytes;
    Py_ssize_t allocated = self->allocated;

    /* Bypass realloc() when a previous overallocation is large enough
       to accommodate the newsize.  If the newsize falls lower than half
       the allocated size, then proceed with the realloc() to shrink the list.
    */
    if (allocated >= newsize && newsize >= (allocated >> 1)) {
        assert(self->ob_item != NULL || newsize == 0);
        Py_SIZE(self) = newsize;
        return 0;
    }

    /* This over-allocates proportional to the list size, making room
     * for additional growth.  The over-allocation is mild, but is
     * enough to give linear-time amortized behavior over a long
     * sequence of appends() in the presence of a poorly-performing
     * system realloc().
     * The growth pattern is:  0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ...
     */
    new_allocated = (size_t)newsize + (newsize >> 3) + (newsize < 9 ? 3 : 6);
    if (new_allocated > (size_t)PY_SSIZE_T_MAX / sizeof(PyObject *)) {
        PyErr_NoMemory();
        return -1;
    }

    if (newsize == 0)
        new_allocated = 0;
    num_allocated_bytes = new_allocated * sizeof(PyObject *);
    items = (PyObject **)PyMem_Realloc(self->ob_item, num_allocated_bytes);
    if (items == NULL) {
        PyErr_NoMemory();
        return -1;
    }
    self->ob_item = items;
    Py_SIZE(self) = newsize;
    self->allocated = new_allocated;
    return 0;
}

Key local variables:

items — pointer to the new array
new_allocated — new array capacity
num_allocated_bytes — new array size in bytes
allocated — current (old) array capacity

Expand / Shrink Conditions

Line 12 checks the relationship between the new length and current capacity:

No realloc needed if:
    allocated >= newsize >= allocated / 2

Expand triggered if:
    newsize > allocated

Shrink triggered if:
    newsize < allocated / 2

The Growth Formula

When expand or shrink is triggered, the new capacity is computed at line 27:

new_allocated = newsize + (newsize >> 3) + (newsize < 9 ? 3 : 6);
//                         ↑ ~1/8 slack    ↑ fixed slack for small lists

Why add 3 or 6 on top of the 1/8 slack? If newsize < 8, then newsize >> 3 == 0 — the 1/8 slack contributes nothing! Without the fixed slack, a list growing from 0 would need to reallocate on nearly every append.

The combined formula produces this growth sequence:

Length:   0  1  2  3  4  5  6  7  8  9 ...
Capacity: 0  4  4  4  4  8  8  8  8 16 ...

Full growth pattern: 0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ...

For small lists, capacity doubles — keeping reallocation frequency low.

How `PyMem_Realloc` Works

PyMem_Realloc is Python's internal memory management function, analogous to C's realloc:

PyAPI_FUNC(void *) PyMem_Realloc(void *ptr, size_t new_size);

Steps:

1. Allocate a new memory region of size new_size
2. Copy data from old region (ptr) to new region
3. Free the old region (ptr)
4. Return the new region

Before realloc:
ptr ──▶ [ a | b | c | d |   |   ]   capacity=6

After realloc (newsize=5):
ptr ──▶ [ a | b | c | d | e |   |   |   ]   capacity=8
                            ↑ new element fits

Tail Append

append is implemented by list_append in C, which calls app1:

static int
app1(PyListObject *self, PyObject *v)
{
    Py_ssize_t n = PyList_GET_SIZE(self);   // line 4: get current length

    assert (v != NULL);
    if (n == PY_SSIZE_T_MAX) {              // line 7-11: overflow check
        PyErr_SetString(PyExc_OverflowError,
            "cannot add more objects to list");
        return -1;
    }

    if (list_resize(self, n+1) < 0)         // line 13-15: expand if needed
        return -1;

    Py_INCREF(v);                           // line 16: increment ref count
    PyList_SET_ITEM(self, n, v);            // line 17: place element at end
    return 0;
}

With list_resize handling the heavy lifting, app1 is straightforward:

Get current length n
Resize to n+1 (expands underlying array if needed)
Increment the element's reference count
Store the element pointer at index n

Head Insert

insert is implemented by list_insert_impl, which calls ins1:

static int
ins1(PyListObject *self, Py_ssize_t where, PyObject *v)
{
    Py_ssize_t i, n = Py_SIZE(self);
    PyObject **items;
    if (v == NULL) {
        PyErr_BadInternalCall();
        return -1;
    }
    if (n == PY_SSIZE_T_MAX) {
        PyErr_SetString(PyExc_OverflowError,
            "cannot add more objects to list");
        return -1;
    }

    if (list_resize(self, n+1) < 0)         // expand if needed
        return -1;

    if (where < 0) {
        where += n;                          // convert negative index
        if (where < 0)
            where = 0;
    }
    if (where > n)
        where = n;
    items = self->ob_item;
    for (i = n; --i >= where; )             // shift elements right (back to front!)
        items[i+1] = items[i];
    Py_INCREF(v);
    items[where] = v;
    return 0;
}

The key step is the shift loop — it must iterate from back to front to avoid overwriting elements before they're moved:

Insert 'X' at index 1:

Before: [ a | b | c | d |   ]
              ↑ insert here

Shift (back to front):
Step 1: [ a | b | c | d | d ]   i=3: items[4]=items[3]
Step 2: [ a | b | c | c | d ]   i=2: items[3]=items[2]
Step 3: [ a | b | b | c | d ]   i=1: items[2]=items[1]

Place:  [ a | X | b | c | d ]   items[1]='X' ✅

Python's Negative Index Support

Python sequences support negative indices — counting from the end:

List:  [ a  |  b  |  c  |  d ]
Index:   0     1     2     3
         -4    -3    -2    -1

Internally, Python converts a negative index by adding the list length n:

index = -1  →  index + n = n - 1  (last element)
index = -n  →  index + n = 0      (first element)

Pop Element

pop removes and returns the element at a given index (default: -1, the last element):

>>> help(list.pop)
pop(self, index=-1, /)
    Remove and return item at index (default last).
    Raises IndexError if list is empty or index is out of range.

C implementation list_pop_impl:

static PyObject *
list_pop_impl(PyListObject *self, Py_ssize_t index)
{
    PyObject *v;
    int status;

    if (Py_SIZE(self) == 0) {                        // empty list check
        PyErr_SetString(PyExc_IndexError, "pop from empty list");
        return NULL;
    }
    if (index < 0)
        index += Py_SIZE(self);                      // convert negative index
    if (index < 0 || index >= Py_SIZE(self)) {
        PyErr_SetString(PyExc_IndexError, "pop index out of range");
        return NULL;
    }
    v = self->ob_item[index];
    if (index == Py_SIZE(self) - 1) {
        status = list_resize(self, Py_SIZE(self) - 1); // tail pop: O(1) fast path
        if (status >= 0)
            return v;
        else
            return NULL;
    }
    Py_INCREF(v);
    status = list_ass_slice(self, index, index+1,    // non-tail: shift elements
                            (PyObject *)NULL);
    if (status < 0) {
        Py_DECREF(v);
        return NULL;
    }
    return v;
}

There's a fast path for tail pop: if the element is the last one, just call list_resize(n-1) — no shifting needed. For any other position, list_ass_slice shifts all subsequent elements one position forward.

list_ass_slice has two semantics depending on the last argument:

v == NULL → delete: del a[ilow:ihigh]
v != NULL → replace: a[ilow:ihigh] = v

Here it's called with NULL, so it deletes a[index:index+1] — exactly one element.

Time Complexity of `pop`

pop(-1)  tail pop   → O(1)  ✅ fast path, no shifting
pop(0)   head pop   → O(n)  ❌ shifts every element
pop(i)   mid pop    → O(n)  average case

pop(0) on [ a | b | c | d | e ]:

Remove 'a', shift left:
[ b | c | d | e |   ]
  ↑ every element moved one position ← O(n)

Remove Element

remove deletes the first occurrence of a given value (not by index). Implemented by list_remove:

static PyObject *
list_remove(PyListObject *self, PyObject *value)
{
    Py_ssize_t i;

    for (i = 0; i < Py_SIZE(self); i++) {
        int cmp = PyObject_RichCompareBool(self->ob_item[i], value, Py_EQ);
        if (cmp > 0) {
            if (list_ass_slice(self, i, i+1,
                               (PyObject *)NULL) == 0)
                Py_RETURN_NONE;
            return NULL;
        }
        else if (cmp < 0)
            return NULL;
    }
    PyErr_SetString(PyExc_ValueError, "list.remove(x): x not in list");
    return NULL;
}

list_remove first linearly scans the list to find the target element — O(n) — then calls list_ass_slice to delete it. If the element doesn't exist, it raises ValueError.

DEV Community

Python list Internals: How Dynamic Arrays Work Under the Hood

Capacity Adjustment

Expand / Shrink Conditions

The Growth Formula

How `PyMem_Realloc` Works

Tail Append

Head Insert

Python's Negative Index Support

Pop Element

Time Complexity of `pop`

Remove Element

Top comments (0)

Capacity Adjustment

Expand / Shrink Conditions

The Growth Formula

How PyMem_Realloc Works

Tail Append

Head Insert

Python's Negative Index Support

Pop Element

Time Complexity of pop

Remove Element

How `PyMem_Realloc` Works

Time Complexity of `pop`