aykhlf yassir
Python Internals: Reference Cycles and Garbage Collection

Understanding reference counting, garbage collection, and why your objects won't die


The Box vs. The Label

Quick: what does this code do?

a = [1, 2, 3]
b = a
b.append(4)
print(a)  # What prints?

If you're coming from C++ or Java, you might think "I assigned a to b, so they're separate copies. a should still be [1, 2, 3]."

But run this code and you'll see [1, 2, 3, 4].

Why? Because your mental model is wrong.

Variables Are Not Boxes

In C++, a variable is like a box with a name written on it. When you write int x = 5;, you create a box labeled "x" and put the value 5 inside it. When you assign int y = x;, you create a new box labeled "y" and copy the value into it.

Python doesn't work this way.

In Python, objects live in the heap. Variables are not boxes that contain objects; they're labels attached to objects.

a = [1, 2, 3]

What actually happens:

  1. Python creates a list object [1, 2, 3] somewhere in memory
  2. Python creates a label (reference) named a
  3. Python sticks that label onto the object

When you write:

b = a

You're not copying the object. You're creating a second label b and sticking it onto the same object that a is pointing to.

    [1, 2, 3, 4] ← Object in memory
      ↑     ↑
      a     b   ← Two labels on one object

This is why modifying through b affects what you see through a: both labels are stuck to the same object.

Proving It With id()

Every object in Python has a unique identifier, which in CPython is its memory address:

a = [1, 2, 3]
b = a

print(id(a))  # 140234567890
print(id(b))  # 140234567890 - Same address!
print(a is b) # True - Same object

c = [1, 2, 3]
print(id(c))  # 140234567999 - Different address!
print(a is c) # False - Different objects
print(a == c) # True - Same contents

The is operator checks if two labels point to the same object, not whether the objects have the same value.
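If you want two independent lists, make the copy explicit, for example with list() or a slice. Note that both produce shallow copies, so nested objects are still shared:

```python
a = [1, 2, 3]
b = list(a)      # shallow copy: a new list object holding the same elements
b.append(4)

print(a)         # [1, 2, 3] - unchanged this time
print(b)         # [1, 2, 3, 4]
print(a is b)    # False - two distinct objects
print(a == b)    # False - and now different contents too
```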

The Immutability Exception

This model explains why immutable types seem to behave differently:

x = 5
y = x
y = 10
print(x)  # Still 5!

Did the label model break? No! When you write y = 10, you're not modifying the object; you're unsticking the label y from the object 5 and re-sticking it onto a different object, 10. The original object 5 is unchanged, and x still points to it.

Before y = 10:
    5 ← Object
    ↑
   x,y

After y = 10:
    5 ← Object    10 ← New object
    ↑              ↑
    x              y

Immutable objects can't be modified, so every operation that looks like modification is actually creating a new object and moving the label.
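You can watch the label move with id():

```python
x = 5
before = id(x)

x += 1           # looks like in-place modification, but binds x to a different int object
after = id(x)

print(x)               # 6
print(before != after) # True - the label x now points at a different object
```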


Mutable Default Arguments

Now that you understand the label model, let's look at Python's most infamous gotcha:

def add_passenger(passenger, manifest=[]):
    manifest.append(passenger)
    return manifest

flight1 = add_passenger("Alice")
print(flight1)  # ['Alice']

flight2 = add_passenger("Bob")
print(flight2)  # ['Alice', 'Bob'] - WHAT?!

Why is Alice on Bob's flight?

The Hidden Function Object

When Python executes a def statement, it doesn't just "define a function and forget it." It creates a function object that lives in memory. Default argument values are created once when the function is defined and stored inside that function object.

Let's prove it:

def add_passenger(passenger, manifest=[]):
    manifest.append(passenger)
    return manifest

# The function object has a __defaults__ attribute
print(add_passenger.__defaults__)  # ([],)

# Call it once
add_passenger("Alice")
print(add_passenger.__defaults__)  # (['Alice'],)

# Call it again
add_passenger("Bob")
print(add_passenger.__defaults__)  # (['Alice', 'Bob'],)

The default list [] is created once, when def executes. Every call to the function that doesn't provide a manifest argument reuses that same list object.

The Execution Timeline

# DEFINITION TIME (happens once)
def add_passenger(passenger, manifest=[]):  # Create empty list, store in __defaults__
    manifest.append(passenger)
    return manifest

# CALL TIME 1
add_passenger("Alice")  # No manifest provided, use default (the list in __defaults__)
# The list is now ['Alice']

# CALL TIME 2  
add_passenger("Bob")    # No manifest provided, use default (SAME list!)
# The list is now ['Alice', 'Bob']

The Idiomatic Fix

Never use mutable objects as default arguments. Use None as a sentinel:

def add_passenger(passenger, manifest=None):
    if manifest is None:
        manifest = []  # Create a NEW list for each call
    manifest.append(passenger)
    return manifest

flight1 = add_passenger("Alice")
print(flight1)  # ['Alice']

flight2 = add_passenger("Bob")
print(flight2)  # ['Bob'] - Separate list!

Now each call that doesn't provide manifest creates a fresh, independent list.

When to Use Mutable Defaults (Intentionally)

There are rare cases where you want to preserve state across calls:

def expensive_computation(key):
    return key * key  # stand-in for real work

def cache_result(key, cache={}):
    if key not in cache:
        print(f"Computing {key}...")
        cache[key] = expensive_computation(key)
    return cache[key]

cache_result(5)  # Computing 5...
cache_result(5)  # (no output - cached!)

But this is almost always better expressed with a class or explicit cache variable.
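For instance, the same memoization reads more clearly as a small class holding its own state. This is a sketch: the Cache class and the squaring lambda are illustrative stand-ins, and in practice the standard library's functools.lru_cache covers most of these cases:

```python
class Cache:
    """Explicit cache state - no hidden list living in __defaults__."""

    def __init__(self, compute):
        self._compute = compute   # the function whose results we memoize
        self._store = {}

    def get(self, key):
        if key not in self._store:
            self._store[key] = self._compute(key)
        return self._store[key]

cache = Cache(lambda k: k * k)  # stand-in for an expensive computation
print(cache.get(5))  # 25 - computed
print(cache.get(5))  # 25 - served from the stored result
```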


The Life and Death of an Object: Reference Counting

So we know objects are created in memory and variables are labels pointing to them. But when does an object die?

The Reference Counter

Every Python object has a hidden field called ob_refcnt, a counter tracking how many labels are currently stuck to it.

typedef struct {
    Py_ssize_t ob_refcnt;   // Reference count
    PyTypeObject *ob_type;  // Type
    // ... actual data
} PyObject;

The rules are simple:

  • Create a label → Count +1
  • Remove a label → Count -1
  • Count reaches 0 → Immediate death

Let's trace the life cycle:

import sys

x = object()  # Create object, refcount = 1
print(sys.getrefcount(x))  # 2 (why? read on...)

y = x  # New label, refcount = 2 (+ 1 from getrefcount call = 3)
print(sys.getrefcount(x))  # 3

del y  # Remove label, refcount = 2
print(sys.getrefcount(x))  # 2

del x  # Last label removed, refcount hits 0
# Object is immediately destroyed

The sys.getrefcount() Trap

Notice the reference count is always higher than expected? This is because calling sys.getrefcount(x) creates a temporary reference when passing x as an argument!

x = []
# Actual refs: x
print(sys.getrefcount(x))  # 2
# Why 2? Because during the call, refs are: x, argument to getrefcount()

The real count is always getrefcount(x) - 1.

What Creates References?

Understanding what increases the reference count is critical:

import sys

obj = object()
print(sys.getrefcount(obj))  # 2 (variable + function arg)

# Assignment creates a reference
another = obj
print(sys.getrefcount(obj))  # 3

# Lists/tuples/dicts create references
lst = [obj]
print(sys.getrefcount(obj))  # 4

# Function arguments create temporary references
def foo(x):
    print(sys.getrefcount(x))  # +1 from function call

foo(obj)  # Will print 5

# Deleting removes references
del another
del lst
print(sys.getrefcount(obj))  # Back to 2

Immediate Destruction

The beauty of reference counting is deterministic cleanup. The moment the last reference disappears, the object dies:

class Mortal:
    def __init__(self, name):
        self.name = name
        print(f"{name} is born")

    def __del__(self):
        # Called when refcount hits 0
        print(f"{self.name} dies")

print("Creating object...")
x = Mortal("Socrates")  # Socrates is born

print("Deleting reference...")
del x  # Socrates dies - immediately!

print("After deletion")

This is why Python doesn't need explicit free() or delete calls like C/C++: objects clean up automatically when they're no longer needed. (Strictly speaking, this immediacy is a CPython implementation detail; other interpreters such as PyPy use a tracing collector and may run __del__ later.)


The Reference Cycles Problem

Reference counting sounds perfect and intuitive. But it has a fatal weakness:

class Node:
    def __init__(self, name):
        self.name = name
        self.partner = None

    def __del__(self):
        print(f"Deleting {self.name}")

# Create two nodes
alice = Node("Alice")
bob = Node("Bob")

# Create a circular reference
alice.partner = bob
bob.partner = alice

# Delete our references
del alice
del bob

# ... crickets ...
# __del__ is never called!

What happened? Let's trace the references:

Before del:
    Node("Alice") ← alice variable
          ↕ partner
    Node("Bob") ← bob variable

After del alice, del bob:
    Node("Alice") 
          ↕ partner
    Node("Bob")

Even after we delete the variables alice and bob, the Node objects still hold references to each other. Their reference counts are still 1 (from the partner attribute), so they never get destroyed.

These objects are now unreachable from our code (we have no variables pointing to them), but they're still in memory, causing a memory leak.
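We can make the leak measurable with a class-level counter; the cycle collector (discussed next) is disabled here so that only reference counting is in play. The Node.alive tally is an illustrative addition, not part of the original example:

```python
import gc

class Node:
    alive = 0                      # class-level tally of live instances

    def __init__(self):
        Node.alive += 1
        self.partner = None

    def __del__(self):
        Node.alive -= 1

gc.disable()                       # rule out the cycle collector for now
a, b = Node(), Node()
a.partner, b.partner = b, a        # the cycle
del a, b

print(Node.alive)  # 2 - both objects leaked: no variable reaches them
gc.enable()
```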

The Real-World Impact

This isn't just theoretical. Reference cycles are common:

# Parent-child relationships
class Parent:
    def __init__(self):
        self.children = []

class Child:
    def __init__(self, parent):
        self.parent = parent
        parent.children.append(self)

# Event listeners
class Button:
    def __init__(self):
        self.click_handler = None

    def on_click(self, handler):
        self.click_handler = handler

# Handler holds button, button holds handler
button = Button()
button.on_click(lambda: print(f"Clicked {button}"))

# Data structures
class TreeNode:
    def __init__(self, value):
        self.value = value
        self.parent = None
        self.left = None
        self.right = None

root = TreeNode(1)
root.left = TreeNode(2)
root.left.parent = root  # Cycle!
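One way to break the parent-link cycle, previewing the weak references covered later in this article: hold the parent weakly, so a child never keeps its parent alive. This TreeNode variant is a sketch, not the only possible design:

```python
import weakref

class TreeNode:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None
        self._parent = None              # weakref.ref, or None for the root

    @property
    def parent(self):
        # Dereference the weak ref; returns None if the parent is gone
        return self._parent() if self._parent is not None else None

    @parent.setter
    def parent(self, node):
        self._parent = weakref.ref(node)

root = TreeNode(1)
root.left = TreeNode(2)
root.left.parent = root      # stored weakly: no reference cycle
print(root.left.parent.value)  # 1
```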

The Garbage Collector

Python's solution is the Generational Garbage Collector, which exists specifically to find and destroy reference cycles.

How It Works

Conceptually, the GC periodically:

  1. Scans the container objects it tracks (lists, dicts, class instances - anything that can hold references)
  2. Temporarily subtracts every reference that comes from another tracked object
  3. Objects whose count drops to zero are kept alive only by internal references
  4. Anything still reachable from outside the group is rescued, along with everything it references
  5. Whatever remains is an unreachable cycle, and the entire cycle is destroyed

You can trigger it manually:

import gc

class Node:
    def __init__(self, name):
        self.name = name
        self.partner = None

    def __del__(self):
        print(f"Deleting {self.name}")

alice = Node("Alice")
bob = Node("Bob")
alice.partner = bob
bob.partner = alice

del alice, bob
print("After del...")

# Force garbage collection
gc.collect()  # Deleting Alice
              # Deleting Bob
print("After gc.collect()")

Generational Hypothesis

The GC is "generational" because it's based on an empirical observation: most objects die young.

Python divides objects into three generations:

  • Generation 0: New objects (checked frequently)
  • Generation 1: Objects that survived one GC pass (checked less often)
  • Generation 2: Long-lived objects (checked rarely)

import gc

# See the current thresholds
print(gc.get_threshold())  # (700, 10, 10)
# Meaning: Run gen0 after 700 allocations,
#          Run gen1 after 10 gen0 collections,
#          Run gen2 after 10 gen1 collections

# See collection stats
print(gc.get_count())  # (453, 5, 2)
# Current counts in each generation
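The thresholds are also writable via gc.set_threshold, for example to trade memory for fewer collection pauses. The numbers below are purely illustrative, not a tuning recommendation:

```python
import gc

old = gc.get_threshold()            # remember the defaults

# Raise the gen0 threshold so the collector runs less often
gc.set_threshold(50_000, 20, 20)    # illustrative values only
print(gc.get_threshold())           # (50000, 20, 20)

gc.set_threshold(*old)              # restore the original settings
```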

The Cost of Cycles

The garbage collector isn't free. It has to:

  • Pause your program to scan objects
  • Build reference graphs
  • Traverse cycles

This is why avoiding cycles improves performance:

# Slower: Creates cycles
class Child:
    def __init__(self, parent):
        self.parent = parent
        parent.children.append(self)

# Faster: No cycles
class Child:
    def __init__(self, parent_name):
        self.parent_name = parent_name  # Store name, not reference

Weak References

Imagine building a cache:

cache = {}

def get_user(user_id):
    if user_id not in cache:
        cache[user_id] = load_from_database(user_id)
    return cache[user_id]

Problem: objects in cache never die. Even if nobody else needs them, the cache keeps them alive. This is a memory leak.

Strong vs. Weak References

Think of references like this:

Strong Reference (normal):

  • Like holding a dog's leash
  • The dog cannot leave while you hold the leash
  • The dog exists because you're holding it

Weak Reference:

  • Like pointing at a dog
  • You don't prevent the dog from leaving
  • If the owner (strong reference) leaves, the dog disappears
  • You're now pointing at nothing (None)

The weakref Module

import weakref

class HeavyObject:
    def __init__(self, data):
        self.data = data

    def __del__(self):
        print(f"Deleting {self.data}")

# Strong reference
obj = HeavyObject("important")
print(obj.data)  # "important"

# Weak reference
weak = weakref.ref(obj)
print(weak())  # <__main__.HeavyObject object> - obj still alive

# Delete strong reference
del obj  # Deleting important

# Weak reference now points to nothing
print(weak())  # None

The weak reference doesn't count toward ob_refcnt. When the last strong reference dies, the object is destroyed, even if weak references still exist.
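weakref.ref also accepts an optional callback, invoked when the referent is finalized, which is handy for cleanup bookkeeping. A minimal sketch; Resource and on_death are illustrative names:

```python
import weakref

class Resource:
    pass

deaths = []

def on_death(ref):
    # Called when the referent is being finalized;
    # receives the (now dead) weak reference itself
    deaths.append("gone")

r = Resource()
ref = weakref.ref(r, on_death)

del r              # last strong reference removed -> callback fires
print(deaths)      # ['gone']
print(ref())       # None
```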

The Optimized Cache: WeakValueDictionary

import weakref

cache = weakref.WeakValueDictionary()

class User:
    def __init__(self, user_id, name):
        self.user_id = user_id
        self.name = name

    def __del__(self):
        print(f"User {self.name} deleted from memory")

# Add to cache
user = User(1, "Alice")
cache[1] = user

print(cache[1].name)  # "Alice" - cache works

# Delete the strong reference
del user  # User Alice deleted from memory

# Cache automatically removed the entry!
print(1 in cache)  # False

When user was deleted, its reference count hit 0 (the cache only held a weak reference). The object was destroyed, and WeakValueDictionary automatically removed the entry.

This is how you build caches that don't leak memory.

Use Cases for Weak References

  1. Caches: Store objects without preventing their cleanup
  2. Observer patterns: Listeners shouldn't keep subjects alive
  3. Circular references: Parent doesn't prevent child GC

# Observer pattern with weak refs
class Subject:
    def __init__(self):
        self._observers = []

    def attach(self, observer):
        # Store weak reference, not strong
        self._observers.append(weakref.ref(observer))

    def notify(self):
        # Filter out dead references
        self._observers = [obs for obs in self._observers if obs() is not None]
        for obs_ref in self._observers:
            obs = obs_ref()
            if obs is not None:
                obs.update()

When NOT to Use Weak References

Weak references have limitations:

# Can't create weak refs to basic types
try:
    weak = weakref.ref(42)
except TypeError:
    print("Can't weakref int")

try:
    weak = weakref.ref("hello")
except TypeError:
    print("Can't weakref str")

# Weak refs add overhead
# Don't use them everywhere - only where you need them

Recommended Resource

I highly recommend Nina Zakharenko's "Memory Management in Python" talk from PyCon 2016.



Summary

We've learned how Python manages object lifecycles:

Mental Models

  • Variables are labels, not boxes → b = a creates two labels on one object
  • Mutable defaults are shared → Use None sentinel pattern
  • Reference counting is immediate → Objects die when count hits 0

The Reference Counting System

  • Every object has ob_refcnt tracking references
  • Assignment increments count, deletion decrements
  • Count = 0 triggers immediate destruction via __del__

The Garbage Collector

  • Purpose: Find and destroy reference cycles
  • Strategy: Generational collection (young objects die fast)
  • Trigger: Automatically or via gc.collect()
  • Cost: Pause program to scan objects

Weak References

  • Strong reference: Keeps object alive (normal)
  • Weak reference: Points to object without preventing cleanup
  • WeakValueDictionary: Auto-cleaning cache
  • Use for: Caches, observers, avoiding cycles

The Professional Checklist

When designing classes:

# Avoid cycles in data structures
class TreeNode:
    def __init__(self, value):
        self.value = value
        self.children = []
        # DON'T store self.parent - creates cycle
        # Store parent_id or use weakref instead

# Use weak refs for caches
import weakref
class ResourceManager:
    def __init__(self):
        self._cache = weakref.WeakValueDictionary()

# Clean up resources explicitly
class FileHandler:
    def __enter__(self):
        return self

    def __exit__(self, *args):
        self.cleanup()  # Deterministic - don't rely on __del__

    def cleanup(self):
        ...  # release files, sockets, locks here

When writing functions:

# NEVER use mutable defaults
def process_items(items, buffer=[]):  # BAD
    pass

def process_items(items, buffer=None):  # GOOD
    if buffer is None:
        buffer = []
