Timothy was investigating a memory leak when Margaret found him staring at his monitoring dashboard. "Look at this," he said, pointing to a slow but steady memory increase. "My application creates thousands of objects, processes them, and I'm pretty sure I'm done with them. But the memory isn't getting freed."
Margaret pulled up a chair. "Welcome to Python's memory management system. Let's talk about reference counting, cycle detection, and generational garbage collection - the three pillars of how Python decides when objects can be freed."
The Problem: When Does Memory Get Freed?
Timothy showed Margaret a simple example that puzzled him:
import sys

def demonstrate_reference_counting():
    """Basic example of Python's reference counting"""
    # Create an object
    data = [1, 2, 3, 4, 5]
    print(f"Reference count after creation: {sys.getrefcount(data)}")

    # Create another reference
    also_data = data
    print(f"Reference count after second reference: {sys.getrefcount(data)}")

    # Delete one reference
    del also_data
    print(f"Reference count after deletion: {sys.getrefcount(data)}")

    # The object still exists because 'data' still references it
    print(f"Data still accessible: {data}")

demonstrate_reference_counting()
Output:
Reference count after creation: 2
Reference count after second reference: 3
Reference count after deletion: 2
Data still accessible: [1, 2, 3, 4, 5]
"Why is the initial reference count 2, not 1?" Timothy asked.
"One reference is your data variable," Margaret explained. "The other is the temporary reference created when you pass it to getrefcount() itself. That's why it starts at 2."
Reference Counting: The Primary Mechanism
Margaret explained Python's fundamental memory management strategy:
"""
Every Python object has a reference count - a number tracking
how many references point to it.
When reference count reaches 0 → object is immediately freed
Reference count increases when:
- Variable assigned to object (x = obj)
- Object passed to function (func(obj))
- Object added to container (list.append(obj))
- Object assigned as attribute (self.data = obj)
Reference count decreases when:
- Variable goes out of scope
- Variable reassigned (x = something_else)
- Object removed from container (list.remove(obj))
- del statement used (del x)
"""
def demonstrate_refcount_mechanics():
    """Show how reference counting works"""
    import sys

    # Create object
    my_list = [1, 2, 3]
    # Subtract 1 for getrefcount's own temporary reference
    print(f"Initial refcount: {sys.getrefcount(my_list) - 1}")

    # Add to another container
    container = [my_list]
    print(f"After adding to container: {sys.getrefcount(my_list) - 1}")

    # Create another reference
    another_ref = my_list
    print(f"After another reference: {sys.getrefcount(my_list) - 1}")

    # Remove from container
    container.clear()
    print(f"After removing from container: {sys.getrefcount(my_list) - 1}")

    # Delete reference
    del another_ref
    print(f"After deleting reference: {sys.getrefcount(my_list) - 1}")

    # my_list is the only reference left
    print(f"Final refcount: {sys.getrefcount(my_list) - 1}")

demonstrate_refcount_mechanics()
Output:
Initial refcount: 1
After adding to container: 2
After another reference: 3
After removing from container: 2
After deleting reference: 1
Final refcount: 1
The Speed of Reference Counting
"Reference counting is fast," Margaret explained. "When the count hits zero, the memory is freed immediately - no waiting for a garbage collection sweep."
import time

def measure_immediate_cleanup():
    """Demonstrate that reference counting cleanup is immediate"""

    class LargeObject:
        def __init__(self, size_mb=10):
            # Allocate approximately size_mb megabytes (8 bytes per slot)
            self.data = [0] * (size_mb * 1024 * 1024 // 8)

        def __del__(self):
            # Destructor called when object is freed
            print(f"  LargeObject freed at {time.time():.3f}")

    print("Creating large object...")
    obj = LargeObject(size_mb=50)
    print(f"Created at {time.time():.3f}")

    # Use the object
    print(f"Object has {len(obj.data):,} elements")

    # Delete it
    print(f"Deleting at {time.time():.3f}")
    del obj
    print(f"Deletion returned at {time.time():.3f}")
    # The destructor should have run immediately

measure_immediate_cleanup()
Output:
Creating large object...
Created at 1699564532.123
Object has 6,553,600 elements
Deleting at 1699564532.456
LargeObject freed at 1699564532.456
Deletion returned at 1699564532.457
"See?" Margaret pointed. "The moment the reference count hit zero, the destructor ran and memory was freed. No delay, no waiting for a collection cycle."
The Reference Cycle Problem
Timothy asked, "If reference counting is so great, why do we need anything else?"
Margaret showed him the fundamental problem:
import gc
import sys

def demonstrate_reference_cycle():
    """The problem that reference counting can't solve"""

    class Node:
        def __init__(self, value):
            self.value = value
            self.next = None

        def __del__(self):
            print(f"  Node {self.value} freed")

    print("Creating a cycle:")

    # Create two nodes
    node1 = Node("A")
    node2 = Node("B")

    # Create a cycle - each points to the other
    node1.next = node2
    node2.next = node1

    print(f"  node1 refcount: {sys.getrefcount(node1) - 1}")  # 2: our variable + node2.next
    print(f"  node2 refcount: {sys.getrefcount(node2) - 1}")  # 2: our variable + node1.next

    # Delete our variables
    print("\nDeleting variables...")
    del node1
    del node2
    print("Variables deleted, but...")

    # The nodes still reference each other!
    # Each has refcount 1 (held by the other),
    # so reference counting CANNOT free them

    print("Forcing garbage collection...")
    gc.collect()
    print("Garbage collection complete")

demonstrate_reference_cycle()
Output:
Creating a cycle:
node1 refcount: 2
node2 refcount: 2
Deleting variables...
Variables deleted, but...
Forcing garbage collection...
Node B freed
Node A freed
Garbage collection complete
"Notice," Margaret explained, "that the objects weren't freed when we deleted the variables. They only got freed when we explicitly ran garbage collection. That's because they were caught in a reference cycle."
The Cycle Detector
Margaret sketched out how Python solves this:
"""
Python's Cycle Detector:
Every container object (lists, dicts, tuples, classes with __dict__)
is tracked by the garbage collector.
Periodically (or when explicitly triggered), the collector:
1. Identifies all tracked objects
2. Finds objects that reference each other but are unreachable
from the root set (global variables, local variables, etc.)
3. Breaks the cycles and frees the objects
This is MORE EXPENSIVE than reference counting, so Python
uses it as a backup - only for detecting cycles.
"""
def demonstrate_cycle_detection():
    """Show how cycle detection works"""
    import gc

    # Disable automatic garbage collection for demonstration
    gc.disable()

    class Node:
        def __init__(self, name):
            self.name = name
            self.references = []

        def __repr__(self):
            return f"Node({self.name})"

        def __del__(self):
            print(f"  Freed: {self.name}")

    print("Creating a complex cycle:")

    # Create a circular linked structure
    a = Node("A")
    b = Node("B")
    c = Node("C")

    a.references.append(b)
    b.references.append(c)
    c.references.append(a)  # Cycle: A -> B -> C -> A

    print(f"Created nodes: {a}, {b}, {c}")
    print(f"Tracked objects before deletion: {len(gc.get_objects())}")

    # Delete our references
    del a, b, c
    print("\nVariables deleted")
    print(f"Tracked objects after deletion: {len(gc.get_objects())}")
    print("(Nodes still in memory - cycles prevent refcount cleanup)")

    # Run garbage collection
    print("\nRunning garbage collection...")
    collected = gc.collect()
    print(f"Collected {collected} objects")
    print(f"Tracked objects after collection: {len(gc.get_objects())}")

    # Re-enable automatic collection
    gc.enable()

demonstrate_cycle_detection()
Generational Garbage Collection
"Python doesn't check for cycles constantly," Margaret explained. "That would be too expensive. Instead, it uses generational collection."
import gc

def explain_generations():
    """
    Python's garbage collector has THREE generations:

    Generation 0: Young objects (newly created)
      - Checked most frequently
      - Most objects die young (short-lived temporaries)
      - Threshold: ~700 allocations before collection

    Generation 1: Middle-aged objects
      - Survived one Gen 0 collection
      - Checked less frequently
      - Threshold: ~10 Gen 0 collections before a Gen 1 collection

    Generation 2: Old objects (long-lived)
      - Survived a Gen 1 collection
      - Checked least frequently
      - Threshold: ~10 Gen 1 collections before a Gen 2 collection

    The hypothesis: most objects die young.
    So check young objects often, old objects rarely.
    """
    # Get current thresholds
    thresholds = gc.get_threshold()
    print(f"Collection thresholds: {thresholds}")
    print(f"  Gen 0: Collect after {thresholds[0]} allocations")
    print(f"  Gen 1: Collect after {thresholds[1]} Gen 0 collections")
    print(f"  Gen 2: Collect after {thresholds[2]} Gen 1 collections")

    # Get current generation counts
    counts = gc.get_count()
    print(f"\nCurrent counts: {counts}")
    print(f"  Gen 0: {counts[0]} objects since last collection")
    print(f"  Gen 1: {counts[1]} collections since last Gen 1 collection")
    print(f"  Gen 2: {counts[2]} collections since last Gen 2 collection")

explain_generations()
Output:
Collection thresholds: (700, 10, 10)
Gen 0: Collect after 700 allocations
Gen 1: Collect after 10 Gen 0 collections
Gen 2: Collect after 10 Gen 1 collections
Current counts: (423, 3, 2)
Gen 0: 423 objects since last collection
Gen 1: 3 collections since last Gen 1 collection
Gen 2: 2 collections since last Gen 2 collection
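On CPython 3.8 through 3.12, gc.get_objects() accepts a generation argument, which makes the aging process directly visible - a small sketch (illustrative only; the collector was restructured in 3.13):

import gc

def generation_of(obj):
    """Return the generation currently holding obj (CPython 3.8+)."""
    for gen in range(3):
        if any(o is obj for o in gc.get_objects(generation=gen)):
            return gen
    return None

gc.disable()  # keep automatic collections from interfering

survivor = [1, 2, 3]
print(f"Fresh object: generation {generation_of(survivor)}")              # 0

gc.collect(0)  # survivor is reachable, so it survives and is promoted
print(f"After a Gen 0 collection: generation {generation_of(survivor)}")  # 1

gc.collect(1)  # survivors of a Gen 1 collection move to Gen 2
print(f"After a Gen 1 collection: generation {generation_of(survivor)}")  # 2

gc.enable()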
Watching Garbage Collection In Action
Timothy wanted to see it happen. Margaret wrote a monitoring script:
import gc

def watch_garbage_collection():
    """Monitor garbage collection as it happens"""

    class TrackedObject:
        instances_created = 0
        instances_freed = 0

        def __init__(self):
            TrackedObject.instances_created += 1
            self.data = [0] * 1000  # Make it substantial

        def __del__(self):
            TrackedObject.instances_freed += 1

    # Enable gc debugging (prints stats for every collection)
    gc.set_debug(gc.DEBUG_STATS)

    print("Creating objects to trigger collections:\n")

    # Create many objects in a loop
    objects = []
    for i in range(1000):
        obj = TrackedObject()
        # Keep every 10th object; the rest become garbage immediately
        if i % 10 == 0:
            objects.append(obj)

        # Print status every 200 objects
        if i % 200 == 0:
            counts = gc.get_count()
            print(f"After {i} objects:")
            print(f"  Created: {TrackedObject.instances_created}")
            print(f"  Freed: {TrackedObject.instances_freed}")
            print(f"  Gen counts: {counts}")
            print()

    # Turn off debug output
    gc.set_debug(0)

    # Final collection
    print("\nForcing final collection:")
    collected = gc.collect()
    print(f"Collected {collected} objects")
    print("Final stats:")
    print(f"  Created: {TrackedObject.instances_created}")
    print(f"  Freed: {TrackedObject.instances_freed}")
    print(f"  Kept alive: {len(objects)}")

# Note: gc.DEBUG_STATS produces verbose output.
# Uncomment to see detailed collection information:
# watch_garbage_collection()
Manual Memory Management
"Can I control garbage collection manually?" Timothy asked.
import gc

def manual_gc_control():
    """Demonstrate manual garbage collection control"""
    # Check if GC is enabled
    print(f"GC enabled: {gc.isenabled()}")

    # Disable automatic garbage collection
    gc.disable()
    print(f"GC enabled after disable: {gc.isenabled()}")

    # Create some garbage
    class Node:
        def __init__(self, value):
            self.value = value
            self.ref = None

        def __del__(self):
            print(f"  Node {self.value} freed")

    # Create a cycle
    a = Node("A")
    b = Node("B")
    a.ref = b
    b.ref = a
    del a, b

    print("Cycle created and variables deleted")
    print("(With GC disabled, the cycle persists)")

    # Manually trigger collection
    print("\nManually collecting...")
    collected = gc.collect()
    print(f"Collected {collected} objects")

    # Re-enable automatic collection
    gc.enable()
    print(f"\nGC re-enabled: {gc.isenabled()}")

def collection_statistics():
    """Get detailed statistics about garbage collection"""
    print("Garbage Collection Statistics:")
    print(f"  Counts: {gc.get_count()}")
    print(f"  Thresholds: {gc.get_threshold()}")
    print(f"  Tracked objects: {len(gc.get_objects())}")

    # Get statistics by generation
    stats = gc.get_stats()
    for i, generation_stats in enumerate(stats):
        print(f"\nGeneration {i}:")
        print(f"  Collections: {generation_stats.get('collections', 'N/A')}")
        print(f"  Collected: {generation_stats.get('collected', 'N/A')}")
        print(f"  Uncollectable: {generation_stats.get('uncollectable', 'N/A')}")

manual_gc_control()
print("\n" + "=" * 50 + "\n")
collection_statistics()
Weak References: Breaking Cycles Intentionally
Margaret showed Timothy a powerful technique:
import weakref
import sys

def demonstrate_weak_references():
    """Weak references don't increase the reference count"""

    class Resource:
        def __init__(self, name):
            self.name = name

        def __repr__(self):
            return f"Resource({self.name})"

        def __del__(self):
            print(f"  Resource {self.name} freed")

    # Regular reference
    print("Regular reference:")
    obj = Resource("Data")
    print(f"  Reference count: {sys.getrefcount(obj) - 1}")

    # Strong reference in list
    cache = [obj]
    print(f"  After adding to list: {sys.getrefcount(obj) - 1}")

    # A weak reference doesn't increase the count
    print("\nWeak reference:")
    weak_ref = weakref.ref(obj)
    print(f"  After creating weak ref: {sys.getrefcount(obj) - 1}")

    # Can still access through the weak ref (as long as the object exists)
    print(f"  Accessing via weak ref: {weak_ref()}")

    # Delete strong references
    print("\nDeleting strong references...")
    del obj
    cache.clear()

    # Now the weak reference returns None
    print(f"  Weak ref now returns: {weak_ref()}")

def weak_reference_cache_pattern():
    """Common pattern: cache with weak references"""

    class ExpensiveObject:
        def __init__(self, key):
            self.key = key
            self.data = [0] * 1_000_000  # Large object
            print(f"  Created expensive object: {key}")

        def __del__(self):
            print(f"  Freed expensive object: {self.key}")

    # Cache holding weak references
    cache = {}

    def get_or_create(key):
        """Get from cache or create new"""
        # Check if we have a live weak reference
        if key in cache:
            obj = cache[key]()  # Call the weak ref to get the object
            if obj is not None:
                print(f"  Cache hit: {key}")
                return obj

        # Create a new object
        print(f"  Cache miss: {key}")
        obj = ExpensiveObject(key)
        cache[key] = weakref.ref(obj)
        return obj

    print("First access (creates object):")
    obj1 = get_or_create("data1")

    print("\nSecond access (cache hit):")
    obj2 = get_or_create("data1")

    print("\nDeleting references:")
    del obj1, obj2

    print("\nThird access (cache miss - object was freed):")
    obj3 = get_or_create("data1")
    del obj3

demonstrate_weak_references()
print("\n" + "=" * 50 + "\n")
weak_reference_cache_pattern()
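The standard library packages this pattern for you: weakref.WeakValueDictionary drops entries automatically when the last strong reference dies, avoiding the manual dead-reference checks above. A minimal sketch:

import weakref

class ExpensiveObject:
    def __init__(self, key):
        self.key = key

cache = weakref.WeakValueDictionary()

obj = ExpensiveObject("data1")
cache["data1"] = obj          # stores a weak reference internally

print("data1" in cache)       # True - a strong reference is still alive
del obj                       # last strong reference gone
print("data1" in cache)       # False - the entry vanished automatically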
Memory Leak Detection
Timothy showed Margaret his memory leak investigation tools:
import gc
import tracemalloc

def detect_memory_leaks():
    """Detect objects that aren't being freed"""
    # Start tracing memory allocations
    tracemalloc.start()

    class LeakyObject:
        instances = []

        def __init__(self, data):
            self.data = data
            LeakyObject.instances.append(self)  # BUG: never removed!

        def __repr__(self):
            return f"LeakyObject({len(self.data)} elements)"

    # Take snapshot before
    snapshot1 = tracemalloc.take_snapshot()

    # Create objects that should be freed
    for i in range(100):
        obj = LeakyObject([0] * 10000)
        # obj should become garbage here, but the instances list keeps a reference!

    # Take snapshot after
    snapshot2 = tracemalloc.take_snapshot()

    # Compare snapshots
    top_stats = snapshot2.compare_to(snapshot1, 'lineno')
    print("Top 5 memory increases:")
    for stat in top_stats[:5]:
        print(f"  {stat}")

    # Check for uncollected objects
    gc.collect()
    print(f"\nObjects in LeakyObject.instances: {len(LeakyObject.instances)}")
    print("(These objects are preventing memory from being freed!)")

    tracemalloc.stop()

def find_reference_cycles():
    """Find objects involved in reference cycles"""

    class NodeA:
        def __init__(self):
            self.ref = None

    class NodeB:
        def __init__(self):
            self.ref = None

    # Create a cycle
    a = NodeA()
    b = NodeB()
    a.ref = b
    b.ref = a

    # Make them garbage
    del a, b

    # Collect, then inspect gc.garbage for anything uncollectable
    gc.collect()
    garbage = gc.garbage

    if garbage:
        print("Found garbage (uncollectable cycles):")
        for item in garbage:
            print(f"  {type(item)}: {item}")
    else:
        print("No uncollectable garbage found")
        print("(Cycles were detected and collected successfully)")

detect_memory_leaks()
print("\n" + "=" * 50 + "\n")
find_reference_cycles()
The __del__ Trap
Margaret warned Timothy about a common pitfall:
import gc

def demonstrate_del_trap():
    """__del__ in cycles: once uncollectable, now just unpredictable"""

    class ProblematicNode:
        def __init__(self, name):
            self.name = name
            self.ref = None

        def __del__(self):
            print(f"  Destructor called for {self.name}")

    print("Creating cycle with __del__ methods:")

    # Disable automatic collection so we control when it runs
    gc.disable()

    a = ProblematicNode("A")
    b = ProblematicNode("B")
    a.ref = b
    b.ref = a
    del a, b

    print("Variables deleted, trying to collect...")

    # Before Python 3.4, cycles containing objects with __del__ were
    # uncollectable and ended up in gc.garbage. Since PEP 442 (Python 3.4),
    # such cycles ARE collected - but the order in which the destructors
    # run is undefined, so __del__ still shouldn't manage critical resources.
    found = gc.collect()
    print(f"Garbage collector found {found} unreachable objects")

    # Check garbage (empty on Python 3.4+ for this case)
    if gc.garbage:
        print(f"Garbage list contains {len(gc.garbage)} objects")
        print("(Pre-3.4 behavior: cycles with __del__ were uncollectable)")
        gc.garbage.clear()

    gc.enable()

def better_cleanup_pattern():
    """Better pattern: context managers instead of __del__"""

    class ResourceManager:
        def __init__(self, name):
            self.name = name
            self.resource = f"Resource: {name}"
            print(f"  Acquired {self.resource}")

        def __enter__(self):
            return self

        def __exit__(self, exc_type, exc_val, exc_tb):
            # Explicit, deterministic cleanup - better than __del__
            print(f"  Released {self.resource}")
            self.resource = None

        # NO __del__ method!

    print("Using context manager pattern:")
    with ResourceManager("Database") as db:
        print(f"  Using {db.resource}")
    print("  (Cleanup happened automatically)")

demonstrate_del_trap()
print("\n" + "=" * 50 + "\n")
better_cleanup_pattern()
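A middle ground between __del__ and context managers is weakref.finalize, which registers a callback that runs when the object is collected or at interpreter exit, without the finalization-order baggage of __del__ in cycles. A minimal sketch:

import weakref

class Resource:
    def __init__(self, name):
        self.name = name

def cleanup(name):
    # Called when the Resource is garbage-collected (or at exit)
    print(f"  Released {name}")

res = Resource("Database")
# Pass res.name, not res itself: the callback must not hold a
# reference to the object, or it can never be collected
finalizer = weakref.finalize(res, cleanup, res.name)

del res   # cleanup("Database") runs here under CPython refcounting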
Optimizing for the Garbage Collector
Margaret shared optimization strategies:
def gc_optimization_strategies():
    """
    Strategies for working with Python's garbage collector:

    1. Avoid Cycles When Possible
       - Use weak references for back-pointers
       - Break cycles explicitly before losing references
       - Consider restructuring to avoid cycles

    2. Batch Operations
       - Disable GC during intensive object creation
       - Re-enable and collect manually after the batch
       - Reduces GC overhead during critical sections

    3. Tune Generation Thresholds
       - For long-running servers, increase thresholds
       - Reduces GC frequency at the cost of more memory
       - Profile to find optimal values

    4. Use __slots__ for Memory-Heavy Classes
       - Reduces per-instance overhead
       - No per-instance __dict__ for the collector to scan
       - Significant savings with many instances
    """
    import gc
    import time

    # Example: batch creation, with and without the collector running
    def create_many_objects_with_gc():
        """Create objects with GC enabled"""
        start = time.perf_counter()
        objects = []
        for i in range(100000):
            objects.append({'id': i, 'data': [0] * 10})
        elapsed = time.perf_counter() - start
        return elapsed, len(objects)

    def create_many_objects_without_gc():
        """Create objects with GC disabled"""
        gc.disable()
        start = time.perf_counter()
        objects = []
        for i in range(100000):
            objects.append({'id': i, 'data': [0] * 10})
        elapsed = time.perf_counter() - start
        gc.enable()
        gc.collect()
        return elapsed, len(objects)

    print("Creating 100,000 objects:")
    time_with_gc, count = create_many_objects_with_gc()
    print(f"  With GC enabled: {time_with_gc:.3f} seconds")

    time_without_gc, count = create_many_objects_without_gc()
    print(f"  With GC disabled: {time_without_gc:.3f} seconds")
    print(f"  Speedup: {time_with_gc / time_without_gc:.2f}x")

gc_optimization_strategies()
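Strategy 4 (__slots__) is easy to verify. A rough per-instance comparison on CPython, using sys.getsizeof - exact byte counts vary by version:

import sys

class WithDict:
    def __init__(self):
        self.x = 1
        self.y = 2

class WithSlots:
    __slots__ = ("x", "y")    # no per-instance __dict__ is created

    def __init__(self):
        self.x = 1
        self.y = 2

d = WithDict()
s = WithSlots()

# A regular instance pays for the object plus its attribute dict
print(f"With __dict__:  {sys.getsizeof(d) + sys.getsizeof(d.__dict__)} bytes")
print(f"With __slots__: {sys.getsizeof(s)} bytes")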
Real-World Pattern: Object Pool
Margaret showed a production pattern:
import gc

class ObjectPool:
    """Reusable object pool to reduce GC pressure"""

    def __init__(self, factory, max_size=100):
        self.factory = factory
        self.max_size = max_size
        self.available = []
        self.in_use = set()

    def acquire(self):
        """Get an object from the pool"""
        if self.available:
            obj = self.available.pop()
        else:
            obj = self.factory()
        self.in_use.add(id(obj))
        return obj

    def release(self, obj):
        """Return an object to the pool"""
        obj_id = id(obj)
        if obj_id in self.in_use:
            self.in_use.remove(obj_id)

            # Reset object state
            if hasattr(obj, 'reset'):
                obj.reset()

            # Add back to the pool if not full;
            # otherwise let it be garbage collected
            if len(self.available) < self.max_size:
                self.available.append(obj)

def demonstrate_object_pool():
    """Show an object pool reducing allocation churn"""

    class ExpensiveObject:
        def __init__(self):
            self.data = [0] * 10000
            self.counter = 0

        def reset(self):
            self.counter = 0

        def do_work(self):
            self.counter += 1

    # Create pool
    pool = ObjectPool(factory=ExpensiveObject, max_size=10)

    print("Using object pool:")
    gen0_before = gc.get_count()[0]

    # Simulate many operations
    for i in range(1000):
        obj = pool.acquire()
        obj.do_work()
        pool.release(obj)

    gen0_after = gc.get_count()[0]
    print("  Objects created: 1 (reused for every operation)")
    print(f"  Gen 0 count delta: {gen0_after - gen0_before}")

    print("\nWithout object pool:")
    gen0_before = gc.get_count()[0]

    # Same operations without pooling
    for i in range(1000):
        obj = ExpensiveObject()
        obj.do_work()
        # Object becomes garbage immediately (freed by refcounting)

    gen0_after = gc.get_count()[0]
    print("  Objects created: 1000 (not reused)")
    print(f"  Gen 0 count delta: {gen0_after - gen0_before}")
    # Note: gc.get_count()[0] tracks net new objects in generation 0 -
    # a rough proxy for allocation churn, not for collections run

demonstrate_object_pool()
Monitoring Garbage Collection in Production
Timothy asked about production monitoring:
import gc
import time
import logging

class GCMonitor:
    """Monitor garbage collection around critical sections"""

    def __init__(self, log_threshold_ms=100):
        self.log_threshold_ms = log_threshold_ms
        self.logger = logging.getLogger('gc_monitor')

    def __enter__(self):
        # Record state before
        self.start_time = time.perf_counter()
        self.start_counts = gc.get_count()
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        # Record state after
        elapsed_ms = (time.perf_counter() - self.start_time) * 1000
        end_counts = gc.get_count()

        # Deltas of gc.get_count(): index 0 roughly tracks net new
        # objects; indices 1 and 2 reflect younger-generation collections
        count_deltas = tuple(
            end_counts[i] - self.start_counts[i]
            for i in range(3)
        )

        # Log if threshold exceeded
        if elapsed_ms > self.log_threshold_ms:
            self.logger.warning(
                f"Slow operation: {elapsed_ms:.2f}ms, "
                f"GC count deltas: {count_deltas}"
            )

def production_gc_monitoring():
    """Pattern for monitoring GC impact"""
    logging.basicConfig(level=logging.INFO)

    # Example: monitor a critical section
    with GCMonitor(log_threshold_ms=50):
        # Critical operation
        data = []
        for i in range(10000):
            data.append({'key': i, 'value': [0] * 100})

    # Get detailed stats for reporting
    stats = {
        'counts': gc.get_count(),
        'thresholds': gc.get_threshold(),
        'tracked_objects': len(gc.get_objects())
    }

    print("\nGC Statistics:")
    print(f"  Current counts: {stats['counts']}")
    print(f"  Thresholds: {stats['thresholds']}")
    print(f"  Tracked objects: {stats['tracked_objects']:,}")

production_gc_monitoring()
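Count deltas are only a proxy. For exact timings, CPython also exposes gc.callbacks: functions appended to that list are invoked at the start and stop of every collection with a phase string and an info dict. A minimal sketch of timing collections this way:

import gc
import time

_start = {}

def gc_timer(phase, info):
    # phase is "start" or "stop"; info always includes the generation,
    # and includes 'collected'/'uncollectable' when phase is "stop"
    gen = info["generation"]
    if phase == "start":
        _start[gen] = time.perf_counter()
    elif phase == "stop":
        elapsed_ms = (time.perf_counter() - _start.pop(gen, time.perf_counter())) * 1000
        print(f"  Gen {gen} collection: {elapsed_ms:.2f}ms, "
              f"collected {info['collected']}, uncollectable {info['uncollectable']}")

gc.callbacks.append(gc_timer)

# Trigger a full collection so the callback fires at least once
gc.collect()

gc.callbacks.remove(gc_timer)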
The Library Metaphor
Margaret brought it back to the library:
"Think of Python's memory management like the library's book circulation system," she said.
"Reference counting is like the checkout cards in each book. Every time someone checks out a book, we add their name to the card. When everyone returns it, the card is empty and we know the book can be reshelved or sent to storage.
"But sometimes, books get caught in circular holds - Book A is held for Book B, which is held for Book A. Neither can be reshelved because each is 'waiting' for the other. That's where the cycle detector comes in - like a librarian doing periodic audits to find these circular hold patterns and resolve them.
"The generational system is like organizing books by how long they've been in circulation. New books (Generation 0) get checked frequently - most are returned quickly. Books that have been out for a while (Generation 1) get checked less often. And books that have been out for a long time (Generation 2) get checked rarely - they're likely to stay out for a while longer.
"The system is automatic and efficient. As a developer, you rarely need to think about it - just like library patrons don't think about the reshelving system. But understanding it helps you avoid creating 'circular holds' that prevent proper cleanup."
Common Pitfalls
Timothy compiled the common mistakes:
"""
GARBAGE COLLECTION PITFALLS:
❌ MISTAKE 1: Creating cycles with __del__
class Node:
def __init__(self):
self.ref = None
def __del__(self): # ❌ Makes cycles harder to collect
print("Cleanup")
✓ SOLUTION: Use context managers or explicit cleanup
class Node:
def cleanup(self): # ✓ Explicit cleanup
self.ref = None
❌ MISTAKE 2: Relying on __del__ for resource cleanup
class FileHandler:
def __init__(self, filename):
self.file = open(filename)
def __del__(self): # ❌ May not run promptly!
self.file.close()
✓ SOLUTION: Use context managers
class FileHandler:
def __init__(self, filename):
self.file = open(filename)
def __enter__(self):
return self
def __exit__(self, *args): # ✓ Guaranteed cleanup
self.file.close()
❌ MISTAKE 3: Disabling GC permanently
gc.disable() # ❌ Memory leaks with cycles!
# ... run entire application ...
✓ SOLUTION: Disable only for critical sections
gc.disable()
# Fast object creation
gc.enable()
gc.collect() # ✓ Collect after batch
❌ MISTAKE 4: Not breaking cycles explicitly
class Parent:
def __init__(self):
self.child = Child(self) # Creates cycle
class Child:
def __init__(self, parent):
self.parent = parent # ❌ Cycle with no cleanup
✓ SOLUTION: Use weak references or explicit cleanup
class Parent:
def __init__(self):
self.child = Child(self)
def cleanup(self): # ✓ Explicit cycle breaking
self.child.parent = None
❌ MISTAKE 5: Creating massive temporary structures
def process():
temp = [obj for obj in huge_sequence] # ❌ All in memory at once
return [transform(obj) for obj in temp]
✓ SOLUTION: Use generators for memory efficiency
def process():
return (transform(obj) for obj in huge_sequence) # ✓ Lazy evaluation
"""
Testing Memory Management
Margaret showed testing patterns:
import gc
import weakref

def test_objects_are_freed():
    """Test that objects are properly freed"""

    class TrackedObject:
        pass

    # Create a weak reference to track lifetime
    obj = TrackedObject()
    weak_ref = weakref.ref(obj)

    # Object should exist
    assert weak_ref() is not None

    # Delete the only strong reference
    del obj

    # Object should be freed immediately under CPython refcounting
    assert weak_ref() is None

def test_cycle_is_collected():
    """Test that reference cycles are detected and collected"""

    class Node:
        def __init__(self, name):
            self.name = name
            self.ref = None

    # Create cycle
    a = Node("A")
    b = Node("B")
    a.ref = b
    b.ref = a

    # Create weak references to track them
    weak_a = weakref.ref(a)
    weak_b = weakref.ref(b)

    # Delete variables (creates a garbage cycle)
    del a, b

    # Force collection
    gc.collect()

    # Cycle should be collected
    assert weak_a() is None
    assert weak_b() is None

def test_gc_can_be_disabled():
    """Test manual GC control"""
    # Record initial state
    initial_enabled = gc.isenabled()

    # Disable
    gc.disable()
    assert not gc.isenabled()

    # Re-enable
    gc.enable()
    assert gc.isenabled()

    # Restore initial state
    if initial_enabled:
        gc.enable()
    else:
        gc.disable()

# Run with: pytest test_gc.py -v
Performance Tuning
Margaret shared advanced tuning strategies:
import gc

def tune_gc_for_workload():
    """Tune the garbage collector for specific workloads"""
    print("Default thresholds:")
    print(f"  {gc.get_threshold()}")

    # For short-running scripts: the defaults are fine
    # For long-running servers: tune for your workload

    # Example 1: Less frequent collections (more memory, less CPU)
    gc.set_threshold(1000, 15, 15)

    # Example 2: More aggressive collection (less memory, more CPU)
    gc.set_threshold(400, 5, 5)

    # Note: gc.set_threshold(0) disables automatic collection entirely.
    # There is no threshold value that disables only Gen 2; in CPython,
    # full collections are already comparatively rare because they are
    # additionally gated by the ratio of long-lived objects.

    # Restore defaults
    gc.set_threshold(700, 10, 10)

    print("\nTuning recommendations:")
    print("  Long-running server: Increase thresholds")
    print("  Memory-constrained: Decrease thresholds")
    print("  CPU-constrained: Increase thresholds")
    print("  Short scripts: Use defaults")

tune_gc_for_workload()
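For pre-fork servers there is one more knob, assuming Python 3.7+: gc.freeze() moves every currently tracked object into a permanent generation that the collector never scans, so GC bookkeeping in child processes doesn't dirty copy-on-write memory pages. A sketch of the pattern:

import gc

# In the parent process, before forking workers:
gc.disable()         # avoid collections while warming up shared state
# ... import modules, build shared read-only data structures ...
gc.freeze()          # move all tracked objects to the permanent generation
print(f"Frozen objects: {gc.get_freeze_count():,}")

# In each forked worker:
gc.enable()          # frozen objects are never scanned again

# gc.unfreeze() would move them back into the oldest generation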
Key Takeaways
Margaret summarized the lesson:
"""
GARBAGE COLLECTION KEY TAKEAWAYS:
1. Three-Part System:
- Reference counting: Fast, immediate cleanup (primary mechanism)
- Cycle detection: Finds unreachable cycles (backup mechanism)
- Generational collection: Optimizes frequency (efficiency boost)
2. Reference Counting:
- Every object has a reference count
- Count reaches 0 → immediate cleanup
- Fast and deterministic
- Can't handle cycles
3. Cycle Detection:
- Periodically scans for unreachable cycles
- More expensive than reference counting
- Necessary for container objects that reference each other
- Runs automatically based on generation thresholds
4. Generational Collection:
- Three generations: young, middle-aged, old
- "Most objects die young" hypothesis
- Gen 0 collected frequently, Gen 2 rarely
- Reduces overhead of cycle detection
5. Best Practices:
- Avoid cycles when possible (use weak references)
- Don't rely on __del__ for resource cleanup (use context managers)
- Break cycles explicitly in cleanup methods
- Disable GC during batch operations, collect after
- Use object pools for frequently created objects
6. When to Intervene:
- Memory leaks from cycles
- Performance issues during object creation
- Long-running servers with specific memory patterns
- Critical sections needing predictable performance
7. What to Avoid:
- __del__ methods in objects that might form cycles
- Disabling GC permanently
- Assuming __del__ runs immediately
- Creating massive temporary structures
8. Monitoring:
- Use gc.get_stats() for collection statistics
- Track gc.get_count() during performance issues
- Enable gc.DEBUG_STATS for detailed logging
- Profile with tracemalloc for memory leak detection
"""
Timothy nodded thoughtfully. "So Python's garbage collector is mostly automatic - reference counting handles 99% of cases instantly. The cycle detector catches the edge cases. And the generational system makes it all efficient by focusing effort on young objects."
"Exactly," Margaret confirmed. "As a developer, you usually don't need to think about it. But understanding reference counting helps you avoid cycles, and knowing about the cycle detector explains why certain patterns - like circular references with __del__ methods - cause problems."
With that understanding, Timothy could now write Python code that worked with the garbage collector rather than against it - letting Python's memory management system do its job efficiently and automatically.
Aaron Rose is a software engineer and technology writer at tech-reader.blog and the author of Think Like a Genius.