A Python list is a capacity-adaptive linear container backed by a dynamic array. This design gives list excellent performance for tail operations, but poor performance for head operations. Let's dig into the source code to understand exactly why.
Capacity Adjustment
When we call append, pop, insert, etc., the list length changes. When the length exceeds the underlying array's capacity, the array needs to expand. When the length drops far below the capacity, the array needs to shrink.
The secret lives in list_resize inside Objects/listobject.c. All methods that change list length go through this function. Let's read it carefully:
static int
list_resize(PyListObject *self, Py_ssize_t newsize)
{
PyObject **items;
size_t new_allocated, num_allocated_bytes;
Py_ssize_t allocated = self->allocated;
/* Bypass realloc() when a previous overallocation is large enough
to accommodate the newsize. If the newsize falls lower than half
the allocated size, then proceed with the realloc() to shrink the list.
*/
if (allocated >= newsize && newsize >= (allocated >> 1)) {
assert(self->ob_item != NULL || newsize == 0);
Py_SIZE(self) = newsize;
return 0;
}
/* This over-allocates proportional to the list size, making room
* for additional growth. The over-allocation is mild, but is
* enough to give linear-time amortized behavior over a long
* sequence of appends() in the presence of a poorly-performing
* system realloc().
* The growth pattern is: 0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ...
*/
new_allocated = (size_t)newsize + (newsize >> 3) + (newsize < 9 ? 3 : 6);
if (new_allocated > (size_t)PY_SSIZE_T_MAX / sizeof(PyObject *)) {
PyErr_NoMemory();
return -1;
}
if (newsize == 0)
new_allocated = 0;
num_allocated_bytes = new_allocated * sizeof(PyObject *);
items = (PyObject **)PyMem_Realloc(self->ob_item, num_allocated_bytes);
if (items == NULL) {
PyErr_NoMemory();
return -1;
}
self->ob_item = items;
Py_SIZE(self) = newsize;
self->allocated = new_allocated;
return 0;
}
Key local variables:
-
items— pointer to the new array -
new_allocated— new array capacity -
num_allocated_bytes— new array size in bytes -
allocated— current (old) array capacity
Expand / Shrink Conditions
Line 12 checks the relationship between the new length and current capacity:
No realloc needed if:
allocated >= newsize >= allocated / 2
Expand triggered if:
newsize > allocated
Shrink triggered if:
newsize < allocated / 2
The Growth Formula
When expand or shrink is triggered, the new capacity is computed at line 27:
new_allocated = newsize + (newsize >> 3) + (newsize < 9 ? 3 : 6);
// ↑ ~1/8 slack ↑ fixed slack for small lists
Why add 3 or 6 on top of the 1/8 slack? If newsize < 8, then newsize >> 3 == 0 — the 1/8 slack contributes nothing! Without the fixed slack, a list growing from 0 would need to reallocate on nearly every append.
The combined formula produces this growth sequence:
Length: 0 1 2 3 4 5 6 7 8 9 ...
Capacity: 0 4 4 4 4 8 8 8 8 16 ...
Full growth pattern: 0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ...
For small lists, capacity doubles — keeping reallocation frequency low.
How PyMem_Realloc Works
PyMem_Realloc is Python's internal memory management function, analogous to C's realloc:
PyAPI_FUNC(void *) PyMem_Realloc(void *ptr, size_t new_size);
Steps:
1. Allocate a new memory region of size new_size
2. Copy data from old region (ptr) to new region
3. Free the old region (ptr)
4. Return the new region
Before realloc:
ptr ──▶ [ a | b | c | d | | ] capacity=6
After realloc (newsize=5):
ptr ──▶ [ a | b | c | d | e | | | ] capacity=8
↑ new element fits
Tail Append
append is implemented by list_append in C, which calls app1:
static int
app1(PyListObject *self, PyObject *v)
{
Py_ssize_t n = PyList_GET_SIZE(self); // line 4: get current length
assert (v != NULL);
if (n == PY_SSIZE_T_MAX) { // line 7-11: overflow check
PyErr_SetString(PyExc_OverflowError,
"cannot add more objects to list");
return -1;
}
if (list_resize(self, n+1) < 0) // line 13-15: expand if needed
return -1;
Py_INCREF(v); // line 16: increment ref count
PyList_SET_ITEM(self, n, v); // line 17: place element at end
return 0;
}
With list_resize handling the heavy lifting, app1 is straightforward:
- Get current length
n - Resize to
n+1(expands underlying array if needed) - Increment the element's reference count
- Store the element pointer at index
n
Head Insert
insert is implemented by list_insert_impl, which calls ins1:
static int
ins1(PyListObject *self, Py_ssize_t where, PyObject *v)
{
Py_ssize_t i, n = Py_SIZE(self);
PyObject **items;
if (v == NULL) {
PyErr_BadInternalCall();
return -1;
}
if (n == PY_SSIZE_T_MAX) {
PyErr_SetString(PyExc_OverflowError,
"cannot add more objects to list");
return -1;
}
if (list_resize(self, n+1) < 0) // expand if needed
return -1;
if (where < 0) {
where += n; // convert negative index
if (where < 0)
where = 0;
}
if (where > n)
where = n;
items = self->ob_item;
for (i = n; --i >= where; ) // shift elements right (back to front!)
items[i+1] = items[i];
Py_INCREF(v);
items[where] = v;
return 0;
}
The key step is the shift loop — it must iterate from back to front to avoid overwriting elements before they're moved:
Insert 'X' at index 1:
Before: [ a | b | c | d | ]
↑ insert here
Shift (back to front):
Step 1: [ a | b | c | d | d ] i=3: items[4]=items[3]
Step 2: [ a | b | c | c | d ] i=2: items[3]=items[2]
Step 3: [ a | b | b | c | d ] i=1: items[2]=items[1]
Place: [ a | X | b | c | d ] items[1]='X' ✅
Python's Negative Index Support
Python sequences support negative indices — counting from the end:
List: [ a | b | c | d ]
Index: 0 1 2 3
-4 -3 -2 -1
Internally, Python converts a negative index by adding the list length n:
index = -1 → index + n = n - 1 (last element)
index = -n → index + n = 0 (first element)
Pop Element
pop removes and returns the element at a given index (default: -1, the last element):
>>> help(list.pop)
pop(self, index=-1, /)
Remove and return item at index (default last).
Raises IndexError if list is empty or index is out of range.
C implementation list_pop_impl:
static PyObject *
list_pop_impl(PyListObject *self, Py_ssize_t index)
{
PyObject *v;
int status;
if (Py_SIZE(self) == 0) { // empty list check
PyErr_SetString(PyExc_IndexError, "pop from empty list");
return NULL;
}
if (index < 0)
index += Py_SIZE(self); // convert negative index
if (index < 0 || index >= Py_SIZE(self)) {
PyErr_SetString(PyExc_IndexError, "pop index out of range");
return NULL;
}
v = self->ob_item[index];
if (index == Py_SIZE(self) - 1) {
status = list_resize(self, Py_SIZE(self) - 1); // tail pop: O(1) fast path
if (status >= 0)
return v;
else
return NULL;
}
Py_INCREF(v);
status = list_ass_slice(self, index, index+1, // non-tail: shift elements
(PyObject *)NULL);
if (status < 0) {
Py_DECREF(v);
return NULL;
}
return v;
}
There's a fast path for tail pop: if the element is the last one, just call list_resize(n-1) — no shifting needed. For any other position, list_ass_slice shifts all subsequent elements one position forward.
list_ass_slice has two semantics depending on the last argument:
-
v == NULL→ delete:del a[ilow:ihigh] -
v != NULL→ replace:a[ilow:ihigh] = v
Here it's called with NULL, so it deletes a[index:index+1] — exactly one element.
Time Complexity of pop
pop(-1) tail pop → O(1) ✅ fast path, no shifting
pop(0) head pop → O(n) ❌ shifts every element
pop(i) mid pop → O(n) average case
pop(0) on [ a | b | c | d | e ]:
Remove 'a', shift left:
[ b | c | d | e | ]
↑ every element moved one position ← O(n)
Remove Element
remove deletes the first occurrence of a given value (not by index). Implemented by list_remove:
static PyObject *
list_remove(PyListObject *self, PyObject *value)
{
Py_ssize_t i;
for (i = 0; i < Py_SIZE(self); i++) {
int cmp = PyObject_RichCompareBool(self->ob_item[i], value, Py_EQ);
if (cmp > 0) {
if (list_ass_slice(self, i, i+1,
(PyObject *)NULL) == 0)
Py_RETURN_NONE;
return NULL;
}
else if (cmp < 0)
return NULL;
}
PyErr_SetString(PyExc_ValueError, "list.remove(x): x not in list");
return NULL;
}
list_remove first linearly scans the list to find the target element — O(n) — then calls list_ass_slice to delete it. If the element doesn't exist, it raises ValueError.
Top comments (0)