Timothy was investigating a memory leak when Margaret found him staring at his monitoring dashboard. "Look at this," he said, pointing to a slow but steady memory increase. "My application creates thousands of objects, processes them, and I'm pretty sure I'm done with them. But the memory isn't getting freed."
Margaret pulled up a chair. "Welcome to Python's memory management system. Let's talk about reference counting, cycle detection, and generational garbage collection - the three pillars of how Python decides when objects can be freed."
The Problem: When Does Memory Get Freed?
Timothy showed Margaret a simple example that puzzled him:
import sys

def demonstrate_reference_counting():
    """Basic example of Python's reference counting"""
    # Create an object
    data = [1, 2, 3, 4, 5]
    print(f"Reference count after creation: {sys.getrefcount(data)}")

    # Create another reference
    also_data = data
    print(f"Reference count after second reference: {sys.getrefcount(data)}")

    # Delete one reference
    del also_data
    print(f"Reference count after deletion: {sys.getrefcount(data)}")

    # The object still exists because 'data' still references it
    print(f"Data still accessible: {data}")

demonstrate_reference_counting()
Output:
Reference count after creation: 2
Reference count after second reference: 3
Reference count after deletion: 2
Data still accessible: [1, 2, 3, 4, 5]
"Why is the initial reference count 2, not 1?" Timothy asked.
"One reference is your data variable," Margaret explained. "The other is the temporary reference created when you pass it to getrefcount() itself. That's why it starts at 2."
Reference Counting: The Primary Mechanism
Margaret explained Python's fundamental memory management strategy:
"""
Every Python object has a reference count - a number tracking
how many references point to it.
When reference count reaches 0 → object is immediately freed
Reference count increases when:
- Variable assigned to object (x = obj)
- Object passed to function (func(obj))
- Object added to container (list.append(obj))
- Object assigned as attribute (self.data = obj)
Reference count decreases when:
- Variable goes out of scope
- Variable reassigned (x = something_else)
- Object removed from container (list.remove(obj))
- del statement used (del x)
"""
def demonstrate_refcount_mechanics():
    """Show how reference counting works"""
    import sys

    # Create object
    my_list = [1, 2, 3]
    # Subtract 1 for getrefcount's own temporary reference
    print(f"Initial refcount: {sys.getrefcount(my_list) - 1}")

    # Add to another container
    container = [my_list]
    print(f"After adding to container: {sys.getrefcount(my_list) - 1}")

    # Create another reference
    another_ref = my_list
    print(f"After another reference: {sys.getrefcount(my_list) - 1}")

    # Remove from container
    container.clear()
    print(f"After removing from container: {sys.getrefcount(my_list) - 1}")

    # Delete reference
    del another_ref
    print(f"After deleting reference: {sys.getrefcount(my_list) - 1}")

    # my_list is the only reference left
    print(f"Final refcount: {sys.getrefcount(my_list) - 1}")

demonstrate_refcount_mechanics()
Output:
Initial refcount: 1
After adding to container: 2
After another reference: 3
After removing from container: 2
After deleting reference: 1
Final refcount: 1
The Speed of Reference Counting
"Reference counting is fast," Margaret explained. "When the count hits zero, the memory is freed immediately - no waiting for a garbage collection sweep."
import time

def measure_immediate_cleanup():
    """Demonstrate that reference counting cleanup is immediate"""

    class LargeObject:
        def __init__(self, size_mb=10):
            # Allocate approximately size_mb megabytes (8 bytes per slot)
            self.data = [0] * (size_mb * 1024 * 1024 // 8)

        def __del__(self):
            # Destructor called when object is freed
            print(f"  LargeObject freed at {time.time():.3f}")

    print("Creating large object...")
    obj = LargeObject(size_mb=50)
    print(f"Created at {time.time():.3f}")

    # Use the object
    print(f"Object has {len(obj.data):,} elements")

    # Delete it
    print(f"Deleting at {time.time():.3f}")
    del obj
    print(f"Deletion returned at {time.time():.3f}")
    # The destructor should have run immediately

measure_immediate_cleanup()
Output:
Creating large object...
Created at 1699564532.123
Object has 6,553,600 elements
Deleting at 1699564532.456
LargeObject freed at 1699564532.456
Deletion returned at 1699564532.457
"See?" Margaret pointed. "The moment the reference count hit zero, the destructor ran and memory was freed. No delay, no waiting for a collection cycle."
The Reference Cycle Problem
Timothy asked, "If reference counting is so great, why do we need anything else?"
Margaret showed him the fundamental problem:
import gc
import sys

def demonstrate_reference_cycle():
    """The problem that reference counting can't solve"""

    class Node:
        def __init__(self, value):
            self.value = value
            self.next = None

        def __del__(self):
            print(f"  Node {self.value} freed")

    print("Creating a cycle:")

    # Create two nodes
    node1 = Node("A")
    node2 = Node("B")

    # Create a cycle - each points to the other
    node1.next = node2
    node2.next = node1

    print(f"  node1 refcount: {sys.getrefcount(node1) - 1}")  # 2: our variable + node2.next
    print(f"  node2 refcount: {sys.getrefcount(node2) - 1}")  # 2: our variable + node1.next

    # Delete our variables
    print("\nDeleting variables...")
    del node1
    del node2
    print("Variables deleted, but...")

    # The nodes still reference each other!
    # Each has refcount 1 (held by the other),
    # so reference counting CANNOT free them

    print("Forcing garbage collection...")
    gc.collect()
    print("Garbage collection complete")

demonstrate_reference_cycle()
Output:
Creating a cycle:
node1 refcount: 2
node2 refcount: 2
Deleting variables...
Variables deleted, but...
Forcing garbage collection...
Node B freed
Node A freed
Garbage collection complete
"Notice," Margaret explained, "that the objects weren't freed when we deleted the variables. They only got freed when we explicitly ran garbage collection. That's because they were caught in a reference cycle."
The Cycle Detector
Margaret sketched out how Python solves this:
"""
Python's Cycle Detector:
Every container object (lists, dicts, tuples, classes with __dict__)
is tracked by the garbage collector.
Periodically (or when explicitly triggered), the collector:
1. Identifies all tracked objects
2. Finds objects that reference each other but are unreachable
from the root set (global variables, local variables, etc.)
3. Breaks the cycles and frees the objects
This is MORE EXPENSIVE than reference counting, so Python
uses it as a backup - only for detecting cycles.
"""
def demonstrate_cycle_detection():
    """Show how cycle detection works"""
    import gc

    # Disable automatic garbage collection for demonstration
    gc.disable()

    class Node:
        def __init__(self, name):
            self.name = name
            self.references = []

        def __repr__(self):
            return f"Node({self.name})"

        def __del__(self):
            print(f"  Freed: {self.name}")

    print("Creating a complex cycle:")

    # Create a circular linked structure
    a = Node("A")
    b = Node("B")
    c = Node("C")

    a.references.append(b)
    b.references.append(c)
    c.references.append(a)  # Cycle: A -> B -> C -> A

    print(f"Created nodes: {a}, {b}, {c}")
    print(f"Tracked objects before deletion: {len(gc.get_objects())}")

    # Delete our references
    del a, b, c
    print("\nVariables deleted")
    print(f"Tracked objects after deletion: {len(gc.get_objects())}")
    print("(Nodes still in memory - cycles prevent refcount cleanup)")

    # Run garbage collection
    print("\nRunning garbage collection...")
    collected = gc.collect()
    print(f"Collected {collected} objects")
    print(f"Tracked objects after collection: {len(gc.get_objects())}")

    # Re-enable automatic collection
    gc.enable()

demonstrate_cycle_detection()
Generational Garbage Collection
"Python doesn't check for cycles constantly," Margaret explained. "That would be too expensive. Instead, it uses generational collection."
import gc

def explain_generations():
    """
    Python's garbage collector has THREE generations:

    Generation 0: Young objects (newly created)
      - Checked most frequently
      - Most objects die young (short-lived temporaries)
      - Threshold: ~700 allocations before collection

    Generation 1: Middle-aged objects
      - Survived one Gen 0 collection
      - Checked less frequently
      - Threshold: ~10 Gen 0 collections before a Gen 1 collection

    Generation 2: Old objects (long-lived)
      - Survived a Gen 1 collection
      - Checked least frequently
      - Threshold: ~10 Gen 1 collections before a Gen 2 collection

    The hypothesis: most objects die young.
    So check young objects often, old objects rarely.
    """
    # Get current thresholds
    thresholds = gc.get_threshold()
    print(f"Collection thresholds: {thresholds}")
    print(f"  Gen 0: Collect after {thresholds[0]} allocations")
    print(f"  Gen 1: Collect after {thresholds[1]} Gen 0 collections")
    print(f"  Gen 2: Collect after {thresholds[2]} Gen 1 collections")

    # Get current generation counts
    counts = gc.get_count()
    print(f"\nCurrent counts: {counts}")
    print(f"  Gen 0: {counts[0]} objects since last collection")
    print(f"  Gen 1: {counts[1]} collections since last Gen 1 collection")
    print(f"  Gen 2: {counts[2]} collections since last Gen 2 collection")

explain_generations()
Output:
Collection thresholds: (700, 10, 10)
Gen 0: Collect after 700 allocations
Gen 1: Collect after 10 Gen 0 collections
Gen 2: Collect after 10 Gen 1 collections
Current counts: (423, 3, 2)
Gen 0: 423 objects since last collection
Gen 1: 3 collections since last Gen 1 collection
Gen 2: 2 collections since last Gen 2 collection
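On CPython 3.8 through 3.12, gc.get_objects() accepts a generation argument, which makes the aging process directly visible - a small sketch (illustrative only; the collector was restructured in 3.13):

import gc

def generation_of(obj):
    """Return the generation currently holding obj (CPython 3.8+)."""
    for gen in range(3):
        if any(o is obj for o in gc.get_objects(generation=gen)):
            return gen
    return None

gc.disable()  # keep automatic collections from interfering

survivor = [1, 2, 3]
print(f"Fresh object: generation {generation_of(survivor)}")              # 0

gc.collect(0)  # survivor is reachable, so it survives and is promoted
print(f"After a Gen 0 collection: generation {generation_of(survivor)}")  # 1

gc.collect(1)  # survivors of a Gen 1 collection move to Gen 2
print(f"After a Gen 1 collection: generation {generation_of(survivor)}")  # 2

gc.enable()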
Watching Garbage Collection In Action
Timothy wanted to see it happen. Margaret wrote a monitoring script:
import gc

def watch_garbage_collection():
    """Monitor garbage collection as it happens"""

    class TrackedObject:
        instances_created = 0
        instances_freed = 0

        def __init__(self):
            TrackedObject.instances_created += 1
            self.data = [0] * 1000  # Make it substantial

        def __del__(self):
            TrackedObject.instances_freed += 1

    # Enable gc debugging (prints stats for every collection)
    gc.set_debug(gc.DEBUG_STATS)

    print("Creating objects to trigger collections:\n")

    # Create many objects in a loop
    objects = []
    for i in range(1000):
        obj = TrackedObject()
        # Keep every 10th object; the rest become garbage immediately
        if i % 10 == 0:
            objects.append(obj)

        # Print status every 200 objects
        if i % 200 == 0:
            counts = gc.get_count()
            print(f"After {i} objects:")
            print(f"  Created: {TrackedObject.instances_created}")
            print(f"  Freed: {TrackedObject.instances_freed}")
            print(f"  Gen counts: {counts}")
            print()

    # Turn off debug output
    gc.set_debug(0)

    # Final collection
    print("\nForcing final collection:")
    collected = gc.collect()
    print(f"Collected {collected} objects")
    print("Final stats:")
    print(f"  Created: {TrackedObject.instances_created}")
    print(f"  Freed: {TrackedObject.instances_freed}")
    print(f"  Kept alive: {len(objects)}")

# Note: gc.DEBUG_STATS produces verbose output.
# Uncomment to see detailed collection information:
# watch_garbage_collection()
Manual Memory Management
"Can I control garbage collection manually?" Timothy asked.
import gc

def manual_gc_control():
    """Demonstrate manual garbage collection control"""
    # Check if GC is enabled
    print(f"GC enabled: {gc.isenabled()}")

    # Disable automatic garbage collection
    gc.disable()
    print(f"GC enabled after disable: {gc.isenabled()}")

    # Create some garbage
    class Node:
        def __init__(self, value):
            self.value = value
            self.ref = None

        def __del__(self):
            print(f"  Node {self.value} freed")

    # Create a cycle
    a = Node("A")
    b = Node("B")
    a.ref = b
    b.ref = a
    del a, b

    print("Cycle created and variables deleted")
    print("(With GC disabled, the cycle persists)")

    # Manually trigger collection
    print("\nManually collecting...")
    collected = gc.collect()
    print(f"Collected {collected} objects")

    # Re-enable automatic collection
    gc.enable()
    print(f"\nGC re-enabled: {gc.isenabled()}")

def collection_statistics():
    """Get detailed statistics about garbage collection"""
    print("Garbage Collection Statistics:")
    print(f"  Counts: {gc.get_count()}")
    print(f"  Thresholds: {gc.get_threshold()}")
    print(f"  Tracked objects: {len(gc.get_objects())}")

    # Get statistics by generation
    stats = gc.get_stats()
    for i, generation_stats in enumerate(stats):
        print(f"\nGeneration {i}:")
        print(f"  Collections: {generation_stats.get('collections', 'N/A')}")
        print(f"  Collected: {generation_stats.get('collected', 'N/A')}")
        print(f"  Uncollectable: {generation_stats.get('uncollectable', 'N/A')}")

manual_gc_control()
print("\n" + "=" * 50 + "\n")
collection_statistics()
Weak References: Breaking Cycles Intentionally
Margaret showed Timothy a powerful technique:
import weakref
import sys

def demonstrate_weak_references():
    """Weak references don't increase the reference count"""

    class Resource:
        def __init__(self, name):
            self.name = name

        def __repr__(self):
            return f"Resource({self.name})"

        def __del__(self):
            print(f"  Resource {self.name} freed")

    # Regular reference
    print("Regular reference:")
    obj = Resource("Data")
    print(f"  Reference count: {sys.getrefcount(obj) - 1}")

    # Strong reference in list
    cache = [obj]
    print(f"  After adding to list: {sys.getrefcount(obj) - 1}")

    # A weak reference doesn't increase the count
    print("\nWeak reference:")
    weak_ref = weakref.ref(obj)
    print(f"  After creating weak ref: {sys.getrefcount(obj) - 1}")

    # Can still access through the weak ref (as long as the object exists)
    print(f"  Accessing via weak ref: {weak_ref()}")

    # Delete strong references
    print("\nDeleting strong references...")
    del obj
    cache.clear()

    # Now the weak reference returns None
    print(f"  Weak ref now returns: {weak_ref()}")

def weak_reference_cache_pattern():
    """Common pattern: cache with weak references"""

    class ExpensiveObject:
        def __init__(self, key):
            self.key = key
            self.data = [0] * 1_000_000  # Large object
            print(f"  Created expensive object: {key}")

        def __del__(self):
            print(f"  Freed expensive object: {self.key}")

    # Cache holding weak references
    cache = {}

    def get_or_create(key):
        """Get from cache or create new"""
        # Check if we have a live weak reference
        if key in cache:
            obj = cache[key]()  # Call the weak ref to get the object
            if obj is not None:
                print(f"  Cache hit: {key}")
                return obj

        # Create a new object
        print(f"  Cache miss: {key}")
        obj = ExpensiveObject(key)
        cache[key] = weakref.ref(obj)
        return obj

    print("First access (creates object):")
    obj1 = get_or_create("data1")

    print("\nSecond access (cache hit):")
    obj2 = get_or_create("data1")

    print("\nDeleting references:")
    del obj1, obj2

    print("\nThird access (cache miss - object was freed):")
    obj3 = get_or_create("data1")
    del obj3

demonstrate_weak_references()
print("\n" + "=" * 50 + "\n")
weak_reference_cache_pattern()
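The standard library packages this pattern for you: weakref.WeakValueDictionary drops entries automatically when the last strong reference dies, avoiding the manual dead-reference checks above. A minimal sketch:

import weakref

class ExpensiveObject:
    def __init__(self, key):
        self.key = key

cache = weakref.WeakValueDictionary()

obj = ExpensiveObject("data1")
cache["data1"] = obj          # stores a weak reference internally

print("data1" in cache)       # True - a strong reference is still alive
del obj                       # last strong reference gone
print("data1" in cache)       # False - the entry vanished automatically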
Memory Leak Detection
Timothy showed Margaret his memory leak investigation tools:
import gc
import tracemalloc

def detect_memory_leaks():
    """Detect objects that aren't being freed"""
    # Start tracing memory allocations
    tracemalloc.start()

    class LeakyObject:
        instances = []

        def __init__(self, data):
            self.data = data
            LeakyObject.instances.append(self)  # BUG: never removed!

        def __repr__(self):
            return f"LeakyObject({len(self.data)} elements)"

    # Take snapshot before
    snapshot1 = tracemalloc.take_snapshot()

    # Create objects that should be freed
    for i in range(100):
        obj = LeakyObject([0] * 10000)
        # obj should become garbage here, but the instances list keeps a reference!

    # Take snapshot after
    snapshot2 = tracemalloc.take_snapshot()

    # Compare snapshots
    top_stats = snapshot2.compare_to(snapshot1, 'lineno')
    print("Top 5 memory increases:")
    for stat in top_stats[:5]:
        print(f"  {stat}")

    # Check for uncollected objects
    gc.collect()
    print(f"\nObjects in LeakyObject.instances: {len(LeakyObject.instances)}")
    print("(These objects are preventing memory from being freed!)")

    tracemalloc.stop()

def find_reference_cycles():
    """Find objects involved in reference cycles"""

    class NodeA:
        def __init__(self):
            self.ref = None

    class NodeB:
        def __init__(self):
            self.ref = None

    # Create a cycle
    a = NodeA()
    b = NodeB()
    a.ref = b
    b.ref = a

    # Make them garbage
    del a, b

    # Collect, then inspect gc.garbage for anything uncollectable
    gc.collect()
    garbage = gc.garbage

    if garbage:
        print("Found garbage (uncollectable cycles):")
        for item in garbage:
            print(f"  {type(item)}: {item}")
    else:
        print("No uncollectable garbage found")
        print("(Cycles were detected and collected successfully)")

detect_memory_leaks()
print("\n" + "=" * 50 + "\n")
find_reference_cycles()
The __del__ Trap
Margaret warned Timothy about a common pitfall:
import gc

def demonstrate_del_trap():
    """__del__ in cycles: once uncollectable, now just unpredictable"""

    class ProblematicNode:
        def __init__(self, name):
            self.name = name
            self.ref = None

        def __del__(self):
            print(f"  Destructor called for {self.name}")

    print("Creating cycle with __del__ methods:")

    # Disable automatic collection so we control when it runs
    gc.disable()

    a = ProblematicNode("A")
    b = ProblematicNode("B")
    a.ref = b
    b.ref = a
    del a, b

    print("Variables deleted, trying to collect...")

    # Before Python 3.4, cycles containing objects with __del__ were
    # uncollectable and ended up in gc.garbage. Since PEP 442 (Python 3.4),
    # such cycles ARE collected - but the order in which the destructors
    # run is undefined, so __del__ still shouldn't manage critical resources.
    found = gc.collect()
    print(f"Garbage collector found {found} unreachable objects")

    # Check garbage (empty on Python 3.4+ for this case)
    if gc.garbage:
        print(f"Garbage list contains {len(gc.garbage)} objects")
        print("(Pre-3.4 behavior: cycles with __del__ were uncollectable)")
        gc.garbage.clear()

    gc.enable()

def better_cleanup_pattern():
    """Better pattern: context managers instead of __del__"""

    class ResourceManager:
        def __init__(self, name):
            self.name = name
            self.resource = f"Resource: {name}"
            print(f"  Acquired {self.resource}")

        def __enter__(self):
            return self

        def __exit__(self, exc_type, exc_val, exc_tb):
            # Explicit, deterministic cleanup - better than __del__
            print(f"  Released {self.resource}")
            self.resource = None

        # NO __del__ method!

    print("Using context manager pattern:")
    with ResourceManager("Database") as db:
        print(f"  Using {db.resource}")
    print("  (Cleanup happened automatically)")

demonstrate_del_trap()
print("\n" + "=" * 50 + "\n")
better_cleanup_pattern()
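A middle ground between __del__ and context managers is weakref.finalize, which registers a callback that runs when the object is collected or at interpreter exit, without the finalization-order baggage of __del__ in cycles. A minimal sketch:

import weakref

class Resource:
    def __init__(self, name):
        self.name = name

def cleanup(name):
    # Called when the Resource is garbage-collected (or at exit)
    print(f"  Released {name}")

res = Resource("Database")
# Pass res.name, not res itself: the callback must not hold a
# reference to the object, or it can never be collected
finalizer = weakref.finalize(res, cleanup, res.name)

del res   # cleanup("Database") runs here under CPython refcounting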
Optimizing for the Garbage Collector
Margaret shared optimization strategies:
def gc_optimization_strategies():
    """
    Strategies for working with Python's garbage collector:

    1. Avoid Cycles When Possible
       - Use weak references for back-pointers
       - Break cycles explicitly before losing references
       - Consider restructuring to avoid cycles

    2. Batch Operations
       - Disable GC during intensive object creation
       - Re-enable and collect manually after the batch
       - Reduces GC overhead during critical sections

    3. Tune Generation Thresholds
       - For long-running servers, increase thresholds
       - Reduces GC frequency at the cost of more memory
       - Profile to find optimal values

    4. Use __slots__ for Memory-Heavy Classes
       - Reduces per-instance overhead
       - No per-instance __dict__ for the collector to scan
       - Significant savings with many instances
    """
    import gc
    import time

    # Example: batch creation, with and without the collector running
    def create_many_objects_with_gc():
        """Create objects with GC enabled"""
        start = time.perf_counter()
        objects = []
        for i in range(100000):
            objects.append({'id': i, 'data': [0] * 10})
        elapsed = time.perf_counter() - start
        return elapsed, len(objects)

    def create_many_objects_without_gc():
        """Create objects with GC disabled"""
        gc.disable()
        start = time.perf_counter()
        objects = []
        for i in range(100000):
            objects.append({'id': i, 'data': [0] * 10})
        elapsed = time.perf_counter() - start
        gc.enable()
        gc.collect()
        return elapsed, len(objects)

    print("Creating 100,000 objects:")
    time_with_gc, count = create_many_objects_with_gc()
    print(f"  With GC enabled: {time_with_gc:.3f} seconds")

    time_without_gc, count = create_many_objects_without_gc()
    print(f"  With GC disabled: {time_without_gc:.3f} seconds")
    print(f"  Speedup: {time_with_gc / time_without_gc:.2f}x")

gc_optimization_strategies()
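Strategy 4 (__slots__) is easy to verify. A rough per-instance comparison on CPython, using sys.getsizeof - exact byte counts vary by version:

import sys

class WithDict:
    def __init__(self):
        self.x = 1
        self.y = 2

class WithSlots:
    __slots__ = ("x", "y")    # no per-instance __dict__ is created

    def __init__(self):
        self.x = 1
        self.y = 2

d = WithDict()
s = WithSlots()

# A regular instance pays for the object plus its attribute dict
print(f"With __dict__:  {sys.getsizeof(d) + sys.getsizeof(d.__dict__)} bytes")
print(f"With __slots__: {sys.getsizeof(s)} bytes")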
Real-World Pattern: Object Pool
Margaret showed a production pattern:
import gc

class ObjectPool:
    """Reusable object pool to reduce GC pressure"""

    def __init__(self, factory, max_size=100):
        self.factory = factory
        self.max_size = max_size
        self.available = []
        self.in_use = set()

    def acquire(self):
        """Get an object from the pool"""
        if self.available:
            obj = self.available.pop()
        else:
            obj = self.factory()
        self.in_use.add(id(obj))
        return obj

    def release(self, obj):
        """Return an object to the pool"""
        obj_id = id(obj)
        if obj_id in self.in_use:
            self.in_use.remove(obj_id)

            # Reset object state
            if hasattr(obj, 'reset'):
                obj.reset()

            # Add back to the pool if not full;
            # otherwise let it be garbage collected
            if len(self.available) < self.max_size:
                self.available.append(obj)

def demonstrate_object_pool():
    """Show an object pool reducing allocation churn"""

    class ExpensiveObject:
        def __init__(self):
            self.data = [0] * 10000
            self.counter = 0

        def reset(self):
            self.counter = 0

        def do_work(self):
            self.counter += 1

    # Create pool
    pool = ObjectPool(factory=ExpensiveObject, max_size=10)

    print("Using object pool:")
    gen0_before = gc.get_count()[0]

    # Simulate many operations
    for i in range(1000):
        obj = pool.acquire()
        obj.do_work()
        pool.release(obj)

    gen0_after = gc.get_count()[0]
    print("  Objects created: 1 (reused for every operation)")
    print(f"  Gen 0 count delta: {gen0_after - gen0_before}")

    print("\nWithout object pool:")
    gen0_before = gc.get_count()[0]

    # Same operations without pooling
    for i in range(1000):
        obj = ExpensiveObject()
        obj.do_work()
        # Object becomes garbage immediately (freed by refcounting)

    gen0_after = gc.get_count()[0]
    print("  Objects created: 1000 (not reused)")
    print(f"  Gen 0 count delta: {gen0_after - gen0_before}")
    # Note: gc.get_count()[0] tracks net new objects in generation 0 -
    # a rough proxy for allocation churn, not for collections run

demonstrate_object_pool()
Monitoring Garbage Collection in Production
Timothy asked about production monitoring:
import gc
import time
import logging

class GCMonitor:
    """Monitor garbage collection around critical sections"""

    def __init__(self, log_threshold_ms=100):
        self.log_threshold_ms = log_threshold_ms
        self.logger = logging.getLogger('gc_monitor')

    def __enter__(self):
        # Record state before
        self.start_time = time.perf_counter()
        self.start_counts = gc.get_count()
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        # Record state after
        elapsed_ms = (time.perf_counter() - self.start_time) * 1000
        end_counts = gc.get_count()

        # Deltas of gc.get_count(): index 0 roughly tracks net new
        # objects; indices 1 and 2 reflect younger-generation collections
        count_deltas = tuple(
            end_counts[i] - self.start_counts[i]
            for i in range(3)
        )

        # Log if threshold exceeded
        if elapsed_ms > self.log_threshold_ms:
            self.logger.warning(
                f"Slow operation: {elapsed_ms:.2f}ms, "
                f"GC count deltas: {count_deltas}"
            )

def production_gc_monitoring():
    """Pattern for monitoring GC impact"""
    logging.basicConfig(level=logging.INFO)

    # Example: monitor a critical section
    with GCMonitor(log_threshold_ms=50):
        # Critical operation
        data = []
        for i in range(10000):
            data.append({'key': i, 'value': [0] * 100})

    # Get detailed stats for reporting
    stats = {
        'counts': gc.get_count(),
        'thresholds': gc.get_threshold(),
        'tracked_objects': len(gc.get_objects())
    }

    print("\nGC Statistics:")
    print(f"  Current counts: {stats['counts']}")
    print(f"  Thresholds: {stats['thresholds']}")
    print(f"  Tracked objects: {stats['tracked_objects']:,}")

production_gc_monitoring()
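Count deltas are only a proxy. For exact timings, CPython also exposes gc.callbacks: functions appended to that list are invoked at the start and stop of every collection with a phase string and an info dict. A minimal sketch of timing collections this way:

import gc
import time

_start = {}

def gc_timer(phase, info):
    # phase is "start" or "stop"; info always includes the generation,
    # and includes 'collected'/'uncollectable' when phase is "stop"
    gen = info["generation"]
    if phase == "start":
        _start[gen] = time.perf_counter()
    elif phase == "stop":
        elapsed_ms = (time.perf_counter() - _start.pop(gen, time.perf_counter())) * 1000
        print(f"  Gen {gen} collection: {elapsed_ms:.2f}ms, "
              f"collected {info['collected']}, uncollectable {info['uncollectable']}")

gc.callbacks.append(gc_timer)

# Trigger a full collection so the callback fires at least once
gc.collect()

gc.callbacks.remove(gc_timer)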
The Library Metaphor
Margaret brought it back to the library:
"Think of Python's memory management like the library's book circulation system," she said.
"Reference counting is like the checkout cards in each book. Every time someone checks out a book, we add their name to the card. When everyone returns it, the card is empty and we know the book can be reshelved or sent to storage.
"But sometimes, books get caught in circular holds - Book A is held for Book B, which is held for Book A. Neither can be reshelved because each is 'waiting' for the other. That's where the cycle detector comes in - like a librarian doing periodic audits to find these circular hold patterns and resolve them.
"The generational system is like organizing books by how long they've been in circulation. New books (Generation 0) get checked frequently - most are returned quickly. Books that have been out for a while (Generation 1) get checked less often. And books that have been out for a long time (Generation 2) get checked rarely - they're likely to stay out for a while longer.
"The system is automatic and efficient. As a developer, you rarely need to think about it - just like library patrons don't think about the reshelving system. But understanding it helps you avoid creating 'circular holds' that prevent proper cleanup."
Common Pitfalls
Timothy compiled the common mistakes:
"""
GARBAGE COLLECTION PITFALLS:
❌ MISTAKE 1: Creating cycles with __del__
class Node:
def __init__(self):
self.ref = None
def __del__(self): # ❌ Makes cycles harder to collect
print("Cleanup")
✓ SOLUTION: Use context managers or explicit cleanup
class Node:
def cleanup(self): # ✓ Explicit cleanup
self.ref = None
❌ MISTAKE 2: Relying on __del__ for resource cleanup
class FileHandler:
def __init__(self, filename):
self.file = open(filename)
def __del__(self): # ❌ May not run promptly!
self.file.close()
✓ SOLUTION: Use context managers
class FileHandler:
def __init__(self, filename):
self.file = open(filename)
def __enter__(self):
return self
def __exit__(self, *args): # ✓ Guaranteed cleanup
self.file.close()
❌ MISTAKE 3: Disabling GC permanently
gc.disable() # ❌ Memory leaks with cycles!
# ... run entire application ...
✓ SOLUTION: Disable only for critical sections
gc.disable()
# Fast object creation
gc.enable()
gc.collect() # ✓ Collect after batch
❌ MISTAKE 4: Not breaking cycles explicitly
class Parent:
def __init__(self):
self.child = Child(self) # Creates cycle
class Child:
def __init__(self, parent):
self.parent = parent # ❌ Cycle with no cleanup
✓ SOLUTION: Use weak references or explicit cleanup
class Parent:
def __init__(self):
self.child = Child(self)
def cleanup(self): # ✓ Explicit cycle breaking
self.child.parent = None
❌ MISTAKE 5: Creating massive temporary structures
def process():
temp = [obj for obj in huge_sequence] # ❌ All in memory at once
return [transform(obj) for obj in temp]
✓ SOLUTION: Use generators for memory efficiency
def process():
return (transform(obj) for obj in huge_sequence) # ✓ Lazy evaluation
"""
Testing Memory Management
Margaret showed testing patterns:
import gc
import weakref

def test_objects_are_freed():
    """Test that objects are properly freed"""

    class TrackedObject:
        pass

    # Create a weak reference to track lifetime
    obj = TrackedObject()
    weak_ref = weakref.ref(obj)

    # Object should exist
    assert weak_ref() is not None

    # Delete the only strong reference
    del obj

    # Object should be freed immediately under CPython refcounting
    assert weak_ref() is None

def test_cycle_is_collected():
    """Test that reference cycles are detected and collected"""

    class Node:
        def __init__(self, name):
            self.name = name
            self.ref = None

    # Create cycle
    a = Node("A")
    b = Node("B")
    a.ref = b
    b.ref = a

    # Create weak references to track them
    weak_a = weakref.ref(a)
    weak_b = weakref.ref(b)

    # Delete variables (creates a garbage cycle)
    del a, b

    # Force collection
    gc.collect()

    # Cycle should be collected
    assert weak_a() is None
    assert weak_b() is None

def test_gc_can_be_disabled():
    """Test manual GC control"""
    # Record initial state
    initial_enabled = gc.isenabled()

    # Disable
    gc.disable()
    assert not gc.isenabled()

    # Re-enable
    gc.enable()
    assert gc.isenabled()

    # Restore initial state
    if initial_enabled:
        gc.enable()
    else:
        gc.disable()

# Run with: pytest test_gc.py -v
Performance Tuning
Margaret shared advanced tuning strategies:
import gc

def tune_gc_for_workload():
    """Tune the garbage collector for specific workloads"""
    print("Default thresholds:")
    print(f"  {gc.get_threshold()}")

    # For short-running scripts: the defaults are fine
    # For long-running servers: tune for your workload

    # Example 1: Less frequent collections (more memory, less CPU)
    gc.set_threshold(1000, 15, 15)

    # Example 2: More aggressive collection (less memory, more CPU)
    gc.set_threshold(400, 5, 5)

    # Note: gc.set_threshold(0) disables automatic collection entirely.
    # There is no threshold value that disables only Gen 2; in CPython,
    # full collections are already comparatively rare because they are
    # additionally gated by the ratio of long-lived objects.

    # Restore defaults
    gc.set_threshold(700, 10, 10)

    print("\nTuning recommendations:")
    print("  Long-running server: Increase thresholds")
    print("  Memory-constrained: Decrease thresholds")
    print("  CPU-constrained: Increase thresholds")
    print("  Short scripts: Use defaults")

tune_gc_for_workload()
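For pre-fork servers there is one more knob, assuming Python 3.7+: gc.freeze() moves every currently tracked object into a permanent generation that the collector never scans, so GC bookkeeping in child processes doesn't dirty copy-on-write memory pages. A sketch of the pattern:

import gc

# In the parent process, before forking workers:
gc.disable()         # avoid collections while warming up shared state
# ... import modules, build shared read-only data structures ...
gc.freeze()          # move all tracked objects to the permanent generation
print(f"Frozen objects: {gc.get_freeze_count():,}")

# In each forked worker:
gc.enable()          # frozen objects are never scanned again

# gc.unfreeze() would move them back into the oldest generation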
Key Takeaways
Margaret summarized the lesson:
"""
GARBAGE COLLECTION KEY TAKEAWAYS:
1. Three-Part System:
- Reference counting: Fast, immediate cleanup (primary mechanism)
- Cycle detection: Finds unreachable cycles (backup mechanism)
- Generational collection: Optimizes frequency (efficiency boost)
2. Reference Counting:
- Every object has a reference count
- Count reaches 0 → immediate cleanup
- Fast and deterministic
- Can't handle cycles
3. Cycle Detection:
- Periodically scans for unreachable cycles
- More expensive than reference counting
- Necessary for container objects that reference each other
- Runs automatically based on generation thresholds
4. Generational Collection:
- Three generations: young, middle-aged, old
- "Most objects die young" hypothesis
- Gen 0 collected frequently, Gen 2 rarely
- Reduces overhead of cycle detection
5. Best Practices:
- Avoid cycles when possible (use weak references)
- Don't rely on __del__ for resource cleanup (use context managers)
- Break cycles explicitly in cleanup methods
- Disable GC during batch operations, collect after
- Use object pools for frequently created objects
6. When to Intervene:
- Memory leaks from cycles
- Performance issues during object creation
- Long-running servers with specific memory patterns
- Critical sections needing predictable performance
7. What to Avoid:
- __del__ methods in objects that might form cycles
- Disabling GC permanently
- Assuming __del__ runs immediately
- Creating massive temporary structures
8. Monitoring:
- Use gc.get_stats() for collection statistics
- Track gc.get_count() during performance issues
- Enable gc.DEBUG_STATS for detailed logging
- Profile with tracemalloc for memory leak detection
"""
Timothy nodded thoughtfully. "So Python's garbage collector is mostly automatic - reference counting handles 99% of cases instantly. The cycle detector catches the edge cases. And the generational system makes it all efficient by focusing effort on young objects."
"Exactly," Margaret confirmed. "As a developer, you usually don't need to think about it. But understanding reference counting helps you avoid cycles, and knowing about the cycle detector explains why certain patterns - like circular references with __del__ methods - cause problems."
With that understanding, Timothy could now write Python code that worked with the garbage collector rather than against it - letting Python's memory management system do its job efficiently and automatically.
Aaron Rose is a software engineer and technology writer at tech-reader.blog and the author of Think Like a Genius.