Understanding reference counting, garbage collection, and why your objects won't die
The Box vs. The Label
Quick: what does this code do?
a = [1, 2, 3]
b = a
b.append(4)
print(a) # What prints?
If you're coming from C++ or Java, you might think "I assigned a to b, so they're separate copies. a should still be [1, 2, 3]."
But run this code and you'll see [1, 2, 3, 4].
Why? Because your mental model is wrong.
Variables Are Not Boxes
In C++, a variable is like a box with a name written on it. When you write int x = 5;, you create a box labeled "x" and put the value 5 inside it. When you assign int y = x;, you create a new box labeled "y" and copy the value into it.
Python doesn't work this way.
In Python, objects live in the heap. Variables are not boxes that contain objects; they're labels attached to objects.
a = [1, 2, 3]
What actually happens:
- Python creates a list object [1, 2, 3] somewhere in memory
- Python creates a label (reference) named a
- Python sticks that label onto the object
When you write:
b = a
You're not copying the object. You're creating a second label b and sticking it onto the same object that a is pointing to.
[1, 2, 3, 4] ← Object in memory
↑ ↑
a b ← Two labels on one object
This is why modifying through b affects what you see through a: both labels point at the same object.
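If you actually want a second, independent list, you have to copy explicitly. A quick sketch using list.copy() and the standard copy module:

```python
import copy

a = [1, 2, 3]
alias = a                  # second label on the same object
shallow = a.copy()         # new outer list, same elements
deep = copy.deepcopy(a)    # recursively independent copy

alias.append(4)
print(a)        # [1, 2, 3, 4] - mutated through the alias
print(shallow)  # [1, 2, 3]    - the copy is unaffected
```

For lists of immutable values a shallow copy is enough; deepcopy matters once the list contains other mutable objects.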
Proving It With id()
Every object in Python has a unique identifier; in CPython, id() returns the object's memory address:
a = [1, 2, 3]
b = a
print(id(a)) # 140234567890
print(id(b)) # 140234567890 - Same address!
print(a is b) # True - Same object
c = [1, 2, 3]
print(id(c)) # 140234567999 - Different address!
print(a is c) # False - Different objects
print(a == c) # True - Same contents
The is operator checks if two labels point to the same object, not whether the objects have the same value.
The Immutability Exception
This model explains why immutable types seem to behave differently:
x = 5
y = x
y = 10
print(x) # Still 5!
Did the label model break? No! When you write y = 10, you're not modifying the object; you're unsticking the label y from the object 5 and re-sticking it onto a different object, 10. The original object 5 is unchanged, and x still points to it.
Before y = 10:
5 ← Object
↑
x,y
After y = 10:
5 ← Object 10 ← New object
↑ ↑
x y
Immutable objects can't be modified, so every operation that looks like modification is actually creating a new object and moving the label.
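You can watch the label move by checking id() before and after an augmented assignment on a string:

```python
s = "hello"
original_id = id(s)

s += " world"   # looks like in-place modification...

# ...but the label s now points at a brand-new str object
print(id(s) == original_id)  # False
print(s)                     # hello world
```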
Mutable Default Arguments
Now that you understand the label model, let's look at Python's most infamous gotcha:
def add_passenger(passenger, manifest=[]):
    manifest.append(passenger)
    return manifest
flight1 = add_passenger("Alice")
print(flight1) # ['Alice']
flight2 = add_passenger("Bob")
print(flight2) # ['Alice', 'Bob'] - WHAT?!
Why is Alice on Bob's flight?
The Hidden Function Object
When Python executes a def statement, it doesn't just "define a function and forget it." It creates a function object that lives in memory. Default argument values are created once when the function is defined and stored inside that function object.
Let's prove it:
def add_passenger(passenger, manifest=[]):
    manifest.append(passenger)
    return manifest
# The function object has a __defaults__ attribute
print(add_passenger.__defaults__) # ([],)
# Call it once
add_passenger("Alice")
print(add_passenger.__defaults__) # (['Alice'],)
# Call it again
add_passenger("Bob")
print(add_passenger.__defaults__) # (['Alice', 'Bob'],)
The default list [] is created once, when def executes. Every call to the function that doesn't provide a manifest argument reuses that same list object.
The Execution Timeline
# DEFINITION TIME (happens once)
def add_passenger(passenger, manifest=[]):  # Create empty list, store in __defaults__
    manifest.append(passenger)
    return manifest
# CALL TIME 1
add_passenger("Alice") # No manifest provided, use default (the list in __defaults__)
# The list is now ['Alice']
# CALL TIME 2
add_passenger("Bob") # No manifest provided, use default (SAME list!)
# The list is now ['Alice', 'Bob']
The Idiomatic Fix
Never use mutable objects as default arguments. Use None as a sentinel:
def add_passenger(passenger, manifest=None):
    if manifest is None:
        manifest = []  # Create a NEW list for each call
    manifest.append(passenger)
    return manifest
flight1 = add_passenger("Alice")
print(flight1) # ['Alice']
flight2 = add_passenger("Bob")
print(flight2) # ['Bob'] - Separate list!
Now each call that doesn't provide manifest creates a fresh, independent list.
When to Use Mutable Defaults (Intentionally)
There are rare cases where you want to preserve state across calls:
def cache_result(key, cache={}):  # intentional shared default
    if key not in cache:
        print(f"Computing {key}...")
        cache[key] = expensive_computation(key)  # some costly function defined elsewhere
    return cache[key]
cache_result(5) # Computing 5...
cache_result(5) # (no output - cached!)
But this is almost always better expressed with a class or explicit cache variable.
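If caching is really the goal, the standard library's functools.lru_cache expresses it without hiding state in a default argument (the function body here is just a stand-in for real work):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def expensive_computation(key):
    print(f"Computing {key}...")
    return key * key

expensive_computation(5)   # Computing 5...
expensive_computation(5)   # no output - served from the cache
print(expensive_computation.cache_info().hits)  # 1
```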
The Life and Death of an Object: Reference Counting
So we know objects are created in memory and variables are labels pointing to them. But when does an object die?
The Reference Counter
Every Python object has a hidden field called ob_refcnt, a counter tracking how many labels are currently stuck to it.
typedef struct {
    Py_ssize_t ob_refcnt;    // Reference count
    PyTypeObject *ob_type;   // Type
    // ... actual data
} PyObject;
The rules are simple:
- Create a label → Count +1
- Remove a label → Count -1
- Count reaches 0 → Immediate death
Let's trace the life cycle:
import sys
x = object() # Create object, refcount = 1
print(sys.getrefcount(x)) # 2 (why? read on...)
y = x # New label, refcount = 2 (+ 1 from getrefcount call = 3)
print(sys.getrefcount(x)) # 3
del y # Remove label, refcount = 2
print(sys.getrefcount(x)) # 2
del x # Last label removed, refcount = 0
# Object is immediately destroyed
The sys.getrefcount() Trap
Notice the reference count is always higher than expected? This is because calling sys.getrefcount(x) creates a temporary reference when passing x as an argument!
x = []
# Actual refs: x
print(sys.getrefcount(x)) # 2
# Why 2? Because during the call, refs are: x, argument to getrefcount()
The real count is always getrefcount(x) - 1.
What Creates References?
Understanding what increases the reference count is critical:
import sys
obj = object()
print(sys.getrefcount(obj)) # 2 (variable + function arg)
# Assignment creates a reference
another = obj
print(sys.getrefcount(obj)) # 3
# Lists/tuples/dicts create references
lst = [obj]
print(sys.getrefcount(obj)) # 4
# Function arguments create temporary references
def foo(x):
    print(sys.getrefcount(x))  # +1 temporary reference from the function call
foo(obj) # Will print 5
# Deleting removes references
del another
del lst
print(sys.getrefcount(obj)) # Back to 2
Immediate Destruction
The beauty of reference counting is deterministic cleanup. The moment the last reference disappears, the object dies:
class Mortal:
    def __init__(self, name):
        self.name = name
        print(f"{name} is born")
    def __del__(self):
        # Called when refcount hits 0
        print(f"{self.name} dies")
print("Creating object...")
x = Mortal("Socrates") # Socrates is born
print("Deleting reference...")
del x # Socrates dies - immediately!
print("After deletion")
This is why Python doesn't need explicit free() or delete calls like C/C++. Objects clean up automatically when they're no longer needed.
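Deterministic cleanup also applies to local variables: in CPython, an object created inside a function dies the moment the function returns and its last local reference disappears. A small sketch, recording events in a list so the order is easy to check:

```python
events = []

class Tracker:
    def __del__(self):
        events.append("collected")

def use_tracker():
    t = Tracker()   # refcount 1: the local variable t
    return "done"   # t goes out of scope here -> refcount 0 -> __del__

use_tracker()
print(events)  # ['collected'] - destroyed the instant the function returned
```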
The Reference Cycles Problem
Reference counting sounds perfect and intuitive. But it has a fatal weakness:
class Node:
    def __init__(self, name):
        self.name = name
        self.partner = None
    def __del__(self):
        print(f"Deleting {self.name}")
# Create two nodes
alice = Node("Alice")
bob = Node("Bob")
# Create a circular reference
alice.partner = bob
bob.partner = alice
# Delete our references
del alice
del bob
# ... crickets ...
# __del__ is never called!
What happened? Let's trace the references:
Before del:
Node("Alice") ← alice variable
↕ partner
Node("Bob") ← bob variable
After del alice, del bob:
Node("Alice")
↕ partner
Node("Bob")
Even after we delete the variables alice and bob, the Node objects still hold references to each other. Their reference counts are still 1 (from the partner attribute), so they never get destroyed.
These objects are now unreachable from our code (we have no variables pointing to them), but they're still in memory, causing a memory leak.
The Real-World Impact
This isn't just theoretical. Reference cycles are common:
# Parent-child relationships
class Parent:
    def __init__(self):
        self.children = []

class Child:
    def __init__(self, parent):
        self.parent = parent
        parent.children.append(self)
# Event listeners
class Button:
    def __init__(self):
        self.click_handler = None
    def on_click(self, handler):
        self.click_handler = handler
# Handler holds button, button holds handler
button = Button()
button.on_click(lambda: print(f"Clicked {button}"))
# Data structures
class TreeNode:
    def __init__(self, value):
        self.value = value
        self.parent = None
        self.left = None
        self.right = None
root = TreeNode(1)
root.left = TreeNode(2)
root.left.parent = root # Cycle!
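One common way to break the parent cycle, previewing the weakref module discussed below, is to store the parent behind a weak reference via a property. This is a sketch of that pattern, not the only fix:

```python
import weakref

class TreeNode:
    def __init__(self, value):
        self.value = value
        self._parent = None   # holds a weakref, not the node itself
        self.left = None
        self.right = None

    @property
    def parent(self):
        # Dereference the weakref (returns None once the parent dies)
        return self._parent() if self._parent is not None else None

    @parent.setter
    def parent(self, node):
        self._parent = weakref.ref(node)

root = TreeNode(1)
root.left = TreeNode(2)
root.left.parent = root   # no strong cycle: child -> parent is weak
print(root.left.parent.value)  # 1
```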
The Garbage Collector
Python's solution is the Generational Garbage Collector, which exists specifically to find and destroy reference cycles.
How It Works
The GC periodically:
- Scans the container objects it tracks (the ones that can hold references)
- Builds a graph of which objects reference which
- Finds groups of objects that only reference each other (cycles)
- Checks whether each group is reachable from any outside reference
- If not, destroys the entire cycle
You can trigger it manually:
import gc
class Node:
    def __init__(self, name):
        self.name = name
        self.partner = None
    def __del__(self):
        print(f"Deleting {self.name}")
alice = Node("Alice")
bob = Node("Bob")
alice.partner = bob
bob.partner = alice
del alice, bob
print("After del...")
# Force garbage collection
gc.collect() # Deleting Alice
# Deleting Bob
print("After gc.collect()")
Generational Hypothesis
The GC is "generational" because it's based on an empirical observation: most objects die young.
Python divides objects into three generations:
- Generation 0: New objects (checked frequently)
- Generation 1: Objects that survived one GC pass (checked less often)
- Generation 2: Long-lived objects (checked rarely)
import gc
# See the current thresholds
print(gc.get_threshold()) # (700, 10, 10)
# Meaning: collect gen0 once allocations minus deallocations exceed 700,
#          collect gen1 after 10 gen0 collections,
#          collect gen2 after 10 gen1 collections
# See collection stats
print(gc.get_count()) # (453, 5, 2)
# Current counts in each generation
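Those thresholds are tunable with gc.set_threshold(), and cycle detection can be paused entirely around allocation-heavy sections; reference counting keeps running either way. A hedged sketch:

```python
import gc

gc.set_threshold(2000, 10, 10)  # collect gen0 less often
print(gc.get_threshold())       # (2000, 10, 10)

gc.disable()                          # pause cycle detection only
data = [[i] for i in range(10_000)]   # allocation-heavy section
gc.enable()

gc.set_threshold(700, 10, 10)   # restore the defaults
print(gc.isenabled())           # True
```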
The Cost of Cycles
The garbage collector isn't free. It has to:
- Pause your program to scan objects
- Build reference graphs
- Traverse cycles
This is why avoiding cycles improves performance:
# Slower: Creates cycles
class Child:
    def __init__(self, parent):
        self.parent = parent
        parent.children.append(self)
# Faster: No cycles
class Child:
    def __init__(self, parent_name):
        self.parent_name = parent_name  # Store name, not reference
Weak References
Imagine building a cache:
cache = {}
def get_user(user_id):
    if user_id not in cache:
        cache[user_id] = load_from_database(user_id)  # DB helper defined elsewhere
    return cache[user_id]
Problem: objects in cache never die. Even if nobody else needs them, the cache keeps them alive. This is a memory leak.
Strong vs. Weak References
Think of references like this:
Strong Reference (normal):
- Like holding a dog's leash
- The dog cannot leave while you hold the leash
- The dog exists because you're holding it
Weak Reference:
- Like pointing at a dog
- You don't prevent the dog from leaving
- If the owner (strong reference) leaves, the dog disappears
- You're now pointing at nothing (None)
The weakref Module
import weakref
class HeavyObject:
    def __init__(self, data):
        self.data = data
    def __del__(self):
        print(f"Deleting {self.data}")
# Strong reference
obj = HeavyObject("important")
print(obj.data) # "important"
# Weak reference
weak = weakref.ref(obj)
print(weak()) # <__main__.HeavyObject object> - obj still alive
# Delete strong reference
del obj # Deleting important
# Weak reference now points to nothing
print(weak()) # None
The weak reference doesn't count toward ob_refcnt. When the last strong reference dies, the object is destroyed, even if weak references still exist.
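A related tool is weakref.finalize, which registers a cleanup callback that fires exactly when the last strong reference dies; it is generally safer than writing __del__ by hand. A minimal sketch:

```python
import weakref

class Resource:
    pass

events = []
obj = Resource()
# Callback runs when obj is destroyed (or at interpreter exit)
finalizer = weakref.finalize(obj, events.append, "cleaned up")

print(finalizer.alive)  # True
del obj                 # last strong reference gone -> callback fires now
print(events)           # ['cleaned up']
print(finalizer.alive)  # False
```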
The Optimized Cache: WeakValueDictionary
import weakref
cache = weakref.WeakValueDictionary()
class User:
    def __init__(self, user_id, name):
        self.user_id = user_id
        self.name = name
    def __del__(self):
        print(f"User {self.name} deleted from memory")
# Add to cache
user = User(1, "Alice")
cache[1] = user
print(cache[1].name) # "Alice" - cache works
# Delete the strong reference
del user # User Alice deleted from memory
# Cache automatically removed the entry!
print(1 in cache) # False
When user was deleted, its reference count hit 0 (the cache only held a weak reference). The object was destroyed, and WeakValueDictionary automatically removed the entry.
This is how you build caches that don't leak memory.
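The companion class weakref.WeakKeyDictionary works the other way around: it attaches metadata to objects without keeping them alive, dropping each entry when its key object dies. A sketch with a made-up Connection class:

```python
import weakref

class Connection:
    pass

metadata = weakref.WeakKeyDictionary()

conn = Connection()
metadata[conn] = {"retries": 3}
print(len(metadata))  # 1

del conn              # the key object dies...
print(len(metadata))  # 0 - ...and its entry vanished with it
```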
Use Cases for Weak References
- Caches: Store objects without preventing their cleanup
- Observer patterns: Listeners shouldn't keep subjects alive
- Circular references: Parent doesn't prevent child GC
# Observer pattern with weak refs
import weakref

class Subject:
    def __init__(self):
        self._observers = []
    def attach(self, observer):
        # Store a weak reference, not a strong one
        self._observers.append(weakref.ref(observer))
    def notify(self):
        # Drop references whose objects have died
        self._observers = [obs for obs in self._observers if obs() is not None]
        for obs_ref in self._observers:
            obs = obs_ref()
            if obs is not None:
                obs.update()
When NOT to Use Weak References
Weak references have limitations:
# Can't create weak refs to basic types
try:
weak = weakref.ref(42)
except TypeError:
print("Can't weakref int")
try:
weak = weakref.ref("hello")
except TypeError:
print("Can't weakref str")
# Weak refs add overhead
# Don't use them everywhere - only where you need them
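Per the weakref documentation, some built-ins like list and dict don't support weak references directly either, but a trivial subclass does (whereas tuple and int can't be weak-referenced even when subclassed). A sketch:

```python
import weakref

class WeakList(list):
    """A plain subclass of list gains weak-reference support."""

wl = WeakList([1, 2, 3])
ref = weakref.ref(wl)   # fine - weakref.ref([1, 2, 3]) would raise TypeError
print(ref())            # [1, 2, 3]

del wl
print(ref())            # None
```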
Recommended Resource
I highly recommend Nina Zakharenko's "Memory Management in Python" talk from PyCon 2016.
Summary:
We've learned how Python manages object lifecycles:
Mental Models
- Variables are labels, not boxes → b = a creates two labels on one object
- Mutable defaults are shared → use the None sentinel pattern
- Reference counting is immediate → objects die when the count hits 0
The Reference Counting System
- Every object has ob_refcnt tracking its references
- Assignment increments the count; deletion decrements it
- Count = 0 triggers immediate destruction via __del__
The Garbage Collector
- Purpose: find and destroy reference cycles
- Strategy: generational collection (young objects die fast)
- Trigger: automatic, or manual via gc.collect()
- Cost: pauses the program to scan objects
Weak References
- Strong reference: keeps the object alive (the normal kind)
- Weak reference: points to the object without preventing cleanup
- WeakValueDictionary: an auto-cleaning cache
- Use for: caches, observers, avoiding cycles
The Professional Checklist
When designing classes:
# Avoid cycles in data structures
class TreeNode:
    def __init__(self, value):
        self.value = value
        self.children = []
        # DON'T store self.parent - creates a cycle
        # Store parent_id or use a weakref instead
# Use weak refs for caches
import weakref
class ResourceManager:
    def __init__(self):
        self._cache = weakref.WeakValueDictionary()
# Clean up resources explicitly
class FileHandler:
    def __enter__(self):
        return self
    def __exit__(self, *args):
        self.cleanup()  # Don't rely on __del__
When writing functions:
# NEVER use mutable defaults
def process_items(items, buffer=[]):  # BAD
    pass

def process_items(items, buffer=None):  # GOOD
    if buffer is None:
        buffer = []