Ja'far Khakpour

Posted on Feb 7

Python and Memory Management, Part 1: Objects

#computerscience #performance #python #tutorial

You’ve heard it before: in Python, everything is an object. Integers, strings, even the functions you write, all are objects. It’s what gives the language its clean, consistent feel.

But here’s what nobody tells you upfront: this elegant design hides a memory management puzzle that can make your code behave in surprising ways.
To truly grasp how Python handles memory (and to write faster, more reliable code), you need to look past the language itself and into CPython's internals to see what your code actually does under the hood.

First-Class Objects in Python: Everything is an Object

To understand how Python manages memory, we need to start with the distinction between three fundamental tools: ==, is, and id(). Each one reveals a different layer of Python's object model and memory management.

The equality operator

The == operator checks value equality: whether two objects contain the same data. When you write a == b, Python compares the values stored in the objects rather than their identity. The comparison is delegated to the special method __eq__(), so it can be as deep as the objects require:

# == compares values
a = [1, 2, 3]
b = [1, 2, 3]
print(a == b) # True - same content

Under the hood, == can trigger more complicated operations. For lists, it compares the elements pairwise, and for custom objects, it calls the class's __eq__ method:

class SensorValue:
    def __init__(self, value, tolerance):
        self.value = int(value)
        self.tolerance = int(tolerance)

    def __eq__(self, other):
        # Special behavior: equality within tolerance
        return abs(self.value - other.value) <= min(
            self.tolerance,
            other.tolerance
        )

v1 = SensorValue(100, 3)
v2 = SensorValue(97, 5)
v3 = SensorValue(96, 10)

print(f"v1 == v2: {v1 == v2}") # True: |100 - 97| = 3 is within both tolerances
print(f"v1 == v3: {v1 == v3}") # False: |100 - 96| = 4 exceeds v1's tolerance of 3
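One caveat about the __eq__ above: it assumes that other is always a SensorValue, so comparing against a plain number would raise AttributeError. A common way to handle this, sketched here as a hypothetical variant (the name TolerantSensorValue is not part of the original example), is to return NotImplemented for foreign types so that Python can fall back to its default behavior:

```python
class TolerantSensorValue:
    """Hypothetical variant of SensorValue with a defensive __eq__."""

    def __init__(self, value, tolerance):
        self.value = int(value)
        self.tolerance = int(tolerance)

    def __eq__(self, other):
        # Returning NotImplemented lets Python try the reflected comparison
        # and finally fall back to an identity check.
        if not isinstance(other, TolerantSensorValue):
            return NotImplemented
        return abs(self.value - other.value) <= min(self.tolerance, other.tolerance)


close_enough = TolerantSensorValue(100, 3) == TolerantSensorValue(97, 5)
mixed_types = TolerantSensorValue(100, 3) == 100

print(f"{close_enough=}")  # True
print(f"{mixed_types=}")   # False, and no AttributeError
```

Keep in mind that a class defining __eq__ without __hash__ becomes unhashable, so its instances cannot be used as dictionary keys or set members unless __hash__ is defined as well.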

The is Operator: Object Identity

The is operator checks object identity - whether two variables reference the exact same thing in memory:

# `is` compares object identity
a = [1, 2, 3]
b = a  # b references the same list object
c = [1, 2, 3]  # New list with same contents

print(a is b)  # True - same object
print(a is c)  # False - same values, but different objects

When you write a is b, Python compares the memory addresses of the two objects.
The is operator is lightning fast because it simply compares two pointers and never calls any user-defined method. This is why singletons such as None are better checked with is than with ==: there is only one None object, so an identity check is sufficient, and unlike == it cannot be overridden by a custom __eq__.
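To see why this matters, here is a minimal sketch with a hypothetical class, AlwaysEqual, whose __eq__ claims equality with everything. The == check is fooled, while is still gives the right answer:

```python
class AlwaysEqual:
    """Hypothetical class whose __eq__ says yes to every comparison."""

    def __eq__(self, other):
        return True


obj = AlwaysEqual()

equals_none = obj == None   # goes through AlwaysEqual.__eq__
is_none = obj is None       # pure identity check, cannot be overridden

print(f"{equals_none=}")  # True, although obj is clearly not None
print(f"{is_none=}")      # False
```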

The id() Function and Memory Address of Data

In CPython, the id() function returns the virtual memory address of an object as an integer. This is the raw pointer value that the is operator compares internally:

a = [1, 2, 3]
b = a
print(f"{a is b=}") # True
print(f"{id(a) == id(b)=}")  # True - equivalent to 'a is b'
print(f"{id(a)=}") # integer value: the memory address of a
print(f"id(a)=0x{id(a):x}") # the same address in hexadecimal

We will use these addresses throughout this post to investigate CPython's behavior.

Exploring Memory Pointers

Let's start with a simple test to understand id() better:

x = 4
y = 2**2  # Both calculate to 4

print(f"{x is y=}")  # True - they're the same object!
print(f"{id(x)=}")
print(f"{id(y)=}") # same as x

Wait, why are x and y the same object? Because Python caches small integers (-5 to 256) as singleton objects to optimize performance. When you create the integer 4, Python doesn't allocate new memory. It returns a reference to the pre-existing "4" object.
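The boundaries of that cache can be probed directly. The sketch below assumes CPython (the cache is an implementation detail, not a language guarantee) and uses a hypothetical helper, same_object, that parses the same text twice at runtime so the compiler cannot share a constant between the two results:

```python
def same_object(text):
    """Parse the same text twice and report whether both results are one object."""
    first = int(text)
    second = int(text)
    return first is second


for text in ("-6", "-5", "256", "257"):
    print(f"int({text!r}) twice -> same object: {same_object(text)}")
```

On CPython, only the values inside the -5 to 256 range come back as the same object.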

Now let's try with larger numbers:

n = 123  # a variable keeps the compiler from folding both results into one shared constant
x, y = 2**n, 2**n
print(f"{x is y=}")  # False - different objects!
print(f"{hex(id(x))=}")
print(f"{hex(id(y))=}")

But what happens with assignment?

x = y = 2**123
print(f"{x is y=}")  # True!

print(f"{hex(id(x))=}") 
print(f"{hex(id(y))=}")  # Same address!

Why were x and y different objects in the first case? Numbers outside the -5 to 256 range aren't cached, so each calculation creates a new PyLongObject in memory.
When you write x = y = 2**123, Python evaluates 2**123 once, creates one object, and makes both x and y point to it. This is different from calculating the same value twice.

Understanding Reference Counting

At the core of Python’s memory management is reference counting, a simple but powerful mechanism that tracks how many references point to each object in memory. Every time you assign a variable, pass an argument, or store an object in a data structure, Python increments that object’s internal reference counter. When a reference is removed or goes out of scope, the count decreases. Once it reaches zero, the memory is reclaimed immediately.
Let's dive deeper into the reference counting mechanism:

import sys
n = 123  # computing from a variable avoids one shared compile-time constant
x = y = 2**n
z = 2**n
print(f"{x is y=}")  # True - same object
print(f"{sys.getrefcount(x)=}")  # 3 references
print(f"{sys.getrefcount(y)=}")  # Also 3
print(f"{x is z=}")  # False - different object!
print(f"{sys.getrefcount(z)=}")  # 2 references

Why does x have refcount 3 while z has 2? Let's break it down:

x and y reference the same object (+2)
The getrefcount() call creates a temporary reference (+1) = Total 3
z references a different object (+1)
getrefcount() creates a temporary reference (+1) = Total 2
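Since the absolute numbers always include that temporary reference, it is often easier to look at differences. A small sketch (the names value and holder are illustrative) showing how storing an object in a container changes its count:

```python
import sys

n = 123
value = 2**n                       # computed at runtime, so it is a fresh, uncached object
baseline = sys.getrefcount(value)

holder = [value, value]            # two more references, stored inside a list
with_list = sys.getrefcount(value)

del holder                         # the list goes away and releases both references
after_del = sys.getrefcount(value)

print(f"{with_list - baseline=}")  # 2
print(f"{after_del - baseline=}")  # 0
```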

The Mystery of Small Integer Reference Counts

Try this for small integers, and you'll discover another interesting point:

x = 4
print(f"{sys.getrefcount(x)=}")  # Not 2 or 3, but 3221225472?!

That's not a real reference count! Let's investigate further:

for x in range(5):
    print(f"for {x=}:")
    print(f"    {sys.getrefcount(x)=}")
    print(f"    {id(x)=}")

You'll notice that the integers 0-4 all report the same "refcount", 3221225472. Converting it to hexadecimal and binary makes the pattern visible:

x = 4
print(f"{sys.getrefcount(x)=:#x}")  # 0xc0000000
print(f"{sys.getrefcount(x)=:b}")  # 11000000000000000000000000000000

Oh! This looks like a binary flag, not a reference count!! Indeed, 0xC0000000 is a special marker value that Python uses for immortal objects.

Immortal Objects: The Python Optimization You Never Knew About

Python has an optimization tweak: certain objects are designated as "immortal" - they live for the entire program's lifetime and never get garbage collected. These include:

Small integers (-5 to 256)
None
True, False
Empty tuples

By the way, you can check whether a value is immortal with sys._is_immortal() (Python 3.14+). For these objects, Python doesn't bother with reference counting. Instead, it sets their refcount field to a special flag value (0xC0000000 in the output above; the exact value differs between CPython versions) that means "this object never dies."
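A quick way to observe this is to create many references to None and watch the reported count. This is a sketch that assumes CPython 3.12 or newer, where immortal objects were introduced; on older versions the count simply grows. The sys._is_immortal() lookup is guarded because the function only exists on 3.14+:

```python
import sys

before = sys.getrefcount(None)
many_refs = [None for _ in range(1000)]   # 1000 additional references to None
after = sys.getrefcount(None)

print(f"{before=}, {after=}")
print(f"refcount unchanged: {before == after}")

# sys._is_immortal() only exists on Python 3.14+, so look it up defensively.
is_immortal = getattr(sys, "_is_immortal", None)
if is_immortal is not None:
    print(f"{is_immortal(None)=}")
    print(f"{is_immortal(object())=}")
```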

CPython also treats strings specially. It caches some strings as interned objects, which behave much like immortal objects. You can read about them in the sys.intern documentation. Just note that Python implicitly interns some strings, which can produce unexpected results when strings are compared with is.
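The explicit form of interning is easy to try. In this sketch the two strings are built at runtime, so CPython normally keeps them as separate objects until sys.intern() maps them to one shared copy:

```python
import sys

part = "memory"
s1 = " ".join(["python", part])    # built at runtime
s2 = " ".join(["python", part])    # equal value, built separately

same_value = s1 == s2
same_object = s1 is s2             # usually False on CPython: two separate objects

i1 = sys.intern(s1)
i2 = sys.intern(s2)
interned_same = i1 is i2           # interning returns one shared object per value

print(f"{same_value=}")
print(f"{same_object=}")
print(f"{interned_same=}")  # True
```

This is why string comparisons should use ==; whether two equal strings happen to share an object is an implementation detail.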

How Are Python Objects Stored in Memory?

We already know that id() returns a virtual memory address. In our previous example with a sequence of small numbers, there was another interesting pattern in the memory addresses of the small integers:

for i in range(5):
    print(f"id({i+1}) - id({i}) = {id(i+1) - id(i)}")

Are these 32 bytes the memory allocated for an integer in Python?
Let's check the size of int objects in memory:

import sys
print(f"{sys.getsizeof(1)=}")  # sys.getsizeof(1)=28!!

What explains the 28-byte size and the 32-byte gaps? id() returns the memory address of the object, and small integers are allocated in a contiguous array, each taking exactly 32 bytes. Looking into the CPython documentation, you will find that these 32 bytes are laid out as follows:

8 bytes for the reference count (the magical 0xC0000000 value in the case of small, immortal integers)
8 bytes for the type pointer (pointer to the type object)
8 bytes for the size field (remember that a Python integer can have any length; it is not just an int32 or int64 number)
4 bytes for the actual digit (the digit array grows beyond 32 bits when the number is large)
4 bytes of padding for alignment (the block length must be a multiple of 8)
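The growing digit array can be observed with sys.getsizeof(). The sketch below relies on sys.int_info, which reports how many bits each internal digit holds and how many bytes it occupies on the running build, and measures integers that need one, two, three, and four digits:

```python
import sys

bits = sys.int_info.bits_per_digit      # value bits stored per internal digit
digit_size = sys.int_info.sizeof_digit  # bytes occupied by one digit

# 2**(bits * k) needs exactly k + 1 digits.
sizes = [sys.getsizeof(2 ** (bits * k)) for k in range(4)]
steps = [larger - smaller for smaller, larger in zip(sizes, sizes[1:])]

print(f"{bits=}, {digit_size=}")
print(f"{sizes=}")
print(f"{steps=}")  # every extra digit adds digit_size bytes
```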

The PyObject: Python's Fundamental Building Block

Let's break down the memory block above to see what CPython does under the hood. At the C level, every Python object starts with a PyObject header:

typedef struct _object {
    Py_ssize_t ob_refcnt;    // Reference count (or immortal flag)
    PyTypeObject *ob_type;   // Pointer to type object
} PyObject;

Now, if you look at the definition of the int object, you will see this struct (cpython/Include/cpython/longintrepr.h in the CPython source code):

typedef struct _PyLongValue {
    uintptr_t lv_tag; /* Number of digits, sign and flags */
    digit ob_digit[1]; /* Numerical data of the integer: array of uint32 (4-byte) digits */
} _PyLongValue;

struct _longobject { /* yes, the integer type is called "long" in the C-level API */
    PyObject_HEAD /* the PyObject header */
    _PyLongValue long_value;
};

The PyObject header is the part the memory manager relies on to track this object and to garbage-collect it. So now we know how an object is stored in memory. Let's talk about how it is allocated and released.
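The header can even be inspected from Python itself with ctypes. This is only a sketch under stated assumptions: a regular (GIL-enabled, non-debug) CPython build, where the header is one pointer-sized reference-count word followed by the type pointer. Free-threaded builds use a different layout, so the peek is skipped there:

```python
import ctypes
import sysconfig

sample = [1, 2, 3]

# Assumption: regular CPython build; free-threaded builds lay out the header differently.
free_threaded = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))

if not free_threaded:
    # The type pointer follows the pointer-sized reference-count word.
    type_field_offset = ctypes.sizeof(ctypes.c_ssize_t)
    type_pointer = ctypes.c_void_p.from_address(id(sample) + type_field_offset).value
    print(f"ob_type field:    0x{type_pointer:x}")
    print(f"id(type(sample)): 0x{id(type(sample)):x}")
    header_matches = type_pointer == id(type(sample))
else:
    header_matches = None  # layout differs, so skip the peek

print(f"{header_matches=}")
```

Reading raw memory like this is only useful for exploration; ordinary code should rely on type() and sys.getrefcount() instead.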

Python's Garbage Collection: From Refcounting to Generational Garbage Collection

The Foundation: Reference Counting

As mentioned earlier, reference counting is at the heart of Python's memory management. Every Python object has a counter tracking how many references point to its memory block, as we saw in the PyObject header:

typedef struct _object {
    Py_ssize_t ob_refcnt;    // Reference count field
    PyTypeObject *ob_type;   // Type pointer
    // ... type-specific data
} PyObject;

You can try it in Python via sys.getrefcount(), which returns the number of references to the value plus one (for the argument passed to getrefcount(...)):

import sys

x = [1, 2, 3]
print(f"Initial refcount: {sys.getrefcount(x)=}") # 2 (one for x and one for the argument passed to getrefcount)
y = x
print(f"Assigning y <- x: {sys.getrefcount(x)=}") # 3
z = x
print(f"Assigning z <- x: {sys.getrefcount(x)=}") # 4

del y
print(f"After deleting y: {sys.getrefcount(x)=}") # 3
z = None
print(f"After rebinding z: {sys.getrefcount(x)=}") # 2
del x
print("Deleted x")

Reference counting lets Python manage memory easily: when the reference count of an object reaches 0, Python releases its memory.
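The immediate release can be observed with a weak reference, which watches an object without keeping it alive. A minimal sketch, assuming CPython and using a hypothetical placeholder class Payload (plain object() instances cannot be weakly referenced):

```python
import weakref


class Payload:
    """Hypothetical placeholder class used only for this experiment."""


data = Payload()
probe = weakref.ref(data)          # observe the object without keeping it alive

alive_before = probe() is not None
del data                           # drop the only strong reference
alive_after = probe() is not None

print(f"{alive_before=}")  # True
print(f"{alive_after=}")   # False on CPython: freed as soon as the count hit zero
```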
But there is an issue:

import sys
import weakref

class Node:
    def __init__(self, child):
        self.child = child

a = Node(None) # reference count = 1
b = Node(None)
a.child = b
b.child = a # reference count of a = 2 (the name a and b.child)
a_ref = weakref.ref(a) # a weak reference tracks the object without increasing its refcount

print("All refs a, b, and a_ref assigned", f"{sys.getrefcount(a_ref())=}") # 2 + 1 (argument to getrefcount) = 3
del a
print("After variable a deleted", f"{sys.getrefcount(a_ref())=}") # 2, which means the real reference count = 1
del b
print("After variable b deleted", f"{sys.getrefcount(a_ref())=}") # still 2: b.child keeps the count at 1, so the object is unreachable but not freed

A plain reference count cannot handle circular references: the two nodes keep each other's count above zero even though no variable can reach them anymore. Reclaiming such objects requires scanning the objects in memory and detecting groups that are only referenced from within themselves.
This kind of garbage collection is expensive, so it cannot run after every assignment or deletion.
Instead, Python runs the cycle collector periodically, and objects of different ages are scanned at different intervals. This means the objects in a cycle are not removed immediately, which is why we can still reach the node through a_ref() even though both variables are deleted.
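The cycle collector can also be triggered by hand with gc.collect(). The sketch below rebuilds a two-node cycle (with a simplified, hypothetical Node class) and temporarily disables automatic collection so the result does not depend on when the collector happens to run:

```python
import gc
import weakref


class Node:
    def __init__(self):
        self.child = None


was_enabled = gc.isenabled()
gc.disable()                      # keep the automatic collector out of the experiment
try:
    a, b = Node(), Node()
    a.child, b.child = b, a       # a <-> b reference cycle
    probe = weakref.ref(a)

    del a, b                      # no names left, but each node still holds the other
    alive_after_del = probe() is not None

    found = gc.collect()          # run the cycle detector by hand
    alive_after_collect = probe() is not None
finally:
    if was_enabled:
        gc.enable()

print(f"{alive_after_del=}")      # True: refcounting alone cannot free the cycle
print(f"{alive_after_collect=}")  # False: the cycle collector reclaimed it
print(f"unreachable objects found: {found}")
```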

The Generational Garbage Collector

Python's solution to circular references is the generational garbage collector. It is built on the generational hypothesis:

Most objects die young (80-90% of objects)
Old objects rarely reference young objects

Python uses three generations:

import gc

print(f"GC thresholds: {gc.get_threshold()}") # (700, 10, 10) by default on many CPython versions

These thresholds control when each generation is collected:

Generation 0: Newly created objects. A collection runs once the number of allocations minus deallocations exceeds 700.
Generation 1: Objects that survived one Gen 0 collection. Collected after every 10 Gen 0 collections.
Generation 2: Objects that survived multiple collections. Collected after every 10 Gen 1 collections.
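The gc module exposes the live counters behind these thresholds. A short sketch that prints them, along with the per-generation statistics gathered since the interpreter started (the exact numbers depend on the Python version and on what has run so far):

```python
import gc

thresholds = gc.get_threshold()   # collection thresholds, one per generation
counts = gc.get_count()           # current counters compared against those thresholds
print(f"{thresholds=}")
print(f"{counts=}")

# Per-generation statistics gathered since the interpreter started.
for generation, stats in enumerate(gc.get_stats()):
    print(f"generation {generation}: {stats['collections']} collections, "
          f"{stats['collected']} objects collected")
```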

This is why, after deleting both variables a and b, we could still reach the node through the weak reference a_ref: the cycle collector had not yet run on Generation 0.

DEV Community