Kaushikcoderpy

Posted on • Originally published at logicandlegacy.blogspot.com

Python Memory Management Masterclass: Garbage Collection, Slots, and WeakRefs

Day 12: The Karma of RAM — Memory Mastery & CPython Internals

40 min read • Series: Logic & Legacy • Day 12 / 30 • Level: Senior Architecture

Prerequisite: We have bound our behavior and state
together in
The Architecture of State (OOP). Now, we must ask the final architectural question:
Where exactly does that state physically live, and how does it die?

⚠️ The 3 Fatal Memory Illusions

Beginners treat Python like magic. They believe the language handles
memory perfectly, allowing them to spin up millions of variables without
consequence. This leads to catastrophic server crashes (OOM - Out of
Memory). Here is what they get wrong:

  • "The del keyword deletes objects." It absolutely does not. It only deletes a pointer. If you don't understand Reference Counting, your deleted objects are still silently hogging RAM.
  • "Python doesn't have memory leaks." It does. If Object A points to Object B, and Object B points back to Object A, they form an infinite loop of memory that traditional tracking cannot kill.
  • "A Class is just a blueprint." At runtime, a standard class instance creates a massive underlying Dictionary to store its variables. Creating 1 million objects means creating 1 million heavy dictionaries, wasting gigabytes of RAM.

LET'S UNDERSTAND MEMORY IN PYTHON

From C-level structures to the Garbage Collection Matrix.

Table of Contents 🕉️

  1. The Illusion of Deletion: The del Keyword
  2. CPython Under the Hood: Primitives & Arrays
  3. The Architecture of Objects: PyObject & Heaps
  4. The Reincarnation Matrix: Garbage Collection (gc)
  5. Compressing the Soul: __slots__
  6. The Ghost in the RAM: weakref
  7. The Forge: The Multi-Million Object Challenge
  8. The Vyuhas – Key Takeaways
  9. FAQ

> "For that which is born, death is certain, and for the dead, birth is certain. Therefore, you should not lament over the inevitable."
> — Bhagavad Gita 2.27

[Diagram: two variables point to the same list object (ref count 2); deleting one variable drops the count to 1 while the object remains in memory.]

In the CPython architecture, physical RAM is the Akasha. Objects are born,
they perform their duties, and when all references to them are lost, they
face the inevitability of the Garbage Collector. We must master this
lifecycle.

1. The Illusion of Deletion: The del Keyword

In languages like C or C++, you must explicitly allocate and free memory
(using malloc and free). Python abstracts this
away using Reference Counting. Every time an object is
bound to a variable, its internal "reference count" increases by 1. When it
loses a variable, the count decreases by 1. If the count hits zero, the
memory is instantly freed.

Therefore, the del keyword
does not delete objects. It only deletes the
name tag (the variable pointing to the object). If another variable
is still pointing to that object, it stays alive in RAM.

import sys

arjuna = ["Gandiva", "Chariot"]  # Ref count = 1
karna = arjuna                 # Ref count = 2 (karna points to the exact same list)

print(f"References to the list: {sys.getrefcount(arjuna) - 1}")
# Subtract 1: passing the list into getrefcount creates a temporary reference

del arjuna  # The name 'arjuna' is destroyed. Ref count drops to 1.

# The list is NOT deleted! 'karna' still holds it.
print(f"Surviving data: {karna}")
[RESULT]
References to the list: 2
Surviving data: ['Gandiva', 'Chariot']

2. CPython Under the Hood: Primitives & Arrays

[Diagram: a list over-allocates empty slots for dynamic resizing, while a tuple is allocated at exactly its fixed size with no unused space.]

To optimize memory, you must understand how Python stores data at the
C-level. Python is written in C, and every Python object is secretly a
C-struct.

Integer & String Interning

Python aggressively optimizes memory for small numbers and short strings.
When Python starts, it pre-allocates integers from -5 to
256. If you write a = 100 and
b = 100, Python does not create two objects. It simply points
both a and b to the exact same pre-existing memory
address. This is called Interning.
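This implementation detail is easy to observe with the `is` operator — a minimal sketch (interning is CPython-specific behavior; other interpreters may differ):

```python
a = 100
b = 100
print(a is b)    # True: 100 lives in the pre-allocated -5..256 cache

# Integers built at runtime bypass the small-int cache
x = int("1000")
y = int("1000")
print(x is y)    # False: equal values, but two distinct objects
print(x == y)    # True: equality still holds
```

Never rely on `is` for value comparison — interning is an optimization, not a language guarantee.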

The Collection Matrix (Lists, Tuples, Sets)

Collections do not store objects directly. They store
arrays of pointers (memory addresses) that point to the objects.
This is why a List can hold an Integer, a String, and another List
simultaneously.

  • Tuple () — C-level: a static array of PyObject* pointers. Overhead: minimal. Because it is immutable, Python allocates exactly the memory needed and no more.
  • List [] — C-level: a dynamic array of PyObject* pointers. Overhead: high. To make .append() fast, lists over-allocate memory; a list of 4 items might secretly reserve space for 8.
  • Dict {} & Set — C-level: hash tables (sparse arrays mapping hashes to values). Overhead: massive. Hash tables require empty space to avoid collisions, so a dictionary is heavily bloated compared to a tuple.
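That over-allocation can be watched directly: `sys.getsizeof()` on a growing list only jumps when CPython re-allocates the underlying pointer array. A small sketch (the exact growth pattern varies by CPython version):

```python
import sys

items = []
jumps = []
last_size = sys.getsizeof(items)
for i in range(16):
    items.append(i)
    size = sys.getsizeof(items)
    if size != last_size:          # size only changes on a re-allocation
        jumps.append((len(items), size))
        last_size = size

print("List re-allocated at lengths:", [length for length, _ in jumps])

# A tuple of the same data carries no spare capacity
print("Tuple:", sys.getsizeof(tuple(items)), "bytes vs List:", sys.getsizeof(items), "bytes")
```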

🏛️ Deep Mechanics: sys.getsizeof()

When a Senior Architect runs sys.getsizeof(my_list), it does
not return the total size of the list and all the data
inside it! It only returns the size of the C-array holding the
pointers. The actual strings or integers inside the list are
stored elsewhere in RAM and must be calculated separately.
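Because `sys.getsizeof()` is shallow, measuring the real footprint means walking the object graph yourself. A rough sketch (`deep_sizeof` is a hypothetical helper, not a stdlib function, and only handles common containers):

```python
import sys

def deep_sizeof(obj, seen=None):
    """Shallow size of obj plus, recursively, everything it references."""
    if seen is None:
        seen = set()
    if id(obj) in seen:            # avoid double-counting shared objects
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(deep_sizeof(k, seen) + deep_sizeof(v, seen)
                    for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(deep_sizeof(item, seen) for item in obj)
    return size

data = ["Gandiva", "Chariot", [1, 2, 3]]
print("Shallow:", sys.getsizeof(data), "bytes | Deep:", deep_sizeof(data), "bytes")
```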

3. The Architecture of Objects: PyObject & the 56-Byte Empty List

[Diagram: the C-level layout of a Python object — GC header, reference count, type pointer, size field, data pointer, and allocated capacity, segmented byte by byte.]

At the absolute core of Python's C source code, every single variable is
derived from a C-struct called PyObject. In Python,
nothing is free. Even a completely empty object carries a
massive metadata payload.

Why does an empty list [] consume exactly 56 bytes on a 64-bit
system? Because you are paying the C-struct overhead tax:

  • 16 Bytes (PyGC_Head): Hidden header required by the Garbage Collector to track cyclic references.
  • 8 Bytes (ob_refcnt): The reference counter.
  • 8 Bytes (ob_type): Memory pointer to the object's Type/Class.
  • 8 Bytes (ob_size): The current number of items.
  • 8 Bytes (ob_item): Memory pointer to the actual array holding the data pointers.
  • 8 Bytes (allocated): The total capacity currently allocated in RAM (to allow fast appending).
import sys

# Proving the physical payload of "empty" data
empty_int = 0
empty_str = ""
empty_list = []
empty_dict = {}

print(f" Empty Integer: {sys.getsizeof(empty_int)} bytes")
print(f" Empty String:  {sys.getsizeof(empty_str)} bytes")
print(f" Empty List:    {sys.getsizeof(empty_list)} bytes")
print(f" Empty Dict:    {sys.getsizeof(empty_dict)} bytes")
[RESULT]
Empty Integer: 24 bytes
Empty String:  49 bytes
Empty List:    56 bytes
Empty Dict:    232 bytes

Notice the Dictionary: 232 bytes for absolutely nothing (exact sizes vary by
CPython version and build). This is why using standard classes (which rely on
__dict__) for millions of objects will obliterate your server's RAM.

4. The Reincarnation Matrix: Garbage Collection (gc)

[Diagram: two objects referencing each other in a loop, so their reference counts never reach zero and the garbage collector must step in to free them.]

We established that Reference Counting frees memory instantly when the count
hits zero. But what happens in a Cyclic Reference?

Imagine Object A has an attribute pointing to Object B. Object B has an
attribute pointing back to Object A. If you delete the global variables
pointing to A and B, they are isolated from the main program... but they are
still pointing at each other. Their reference counts are stuck at
1. Reference counting fails here, causing a
Memory Leak.

To solve this, Python runs a secondary system: The
Generational Garbage Collector (gc module).
Periodically, Python pauses execution and scans the heap for cyclic islands
of memory that have no connection to the global scope. When found, it
forcefully destroys them.

import gc

class Node:
    def __init__(self, name):
        self.name = name
        self.connection = None

# Creating the objects
node_a = Node("A")
node_b = Node("B")

# Creating a Cyclic Reference (Infinite Loop of Memory)
node_a.connection = node_b
node_b.connection = node_a

# Deleting the main pointers. 
# Ref count is NOT zero because they point to each other.
del node_a
del node_b

# Force the Garbage Collector to run manually to destroy the cycle
collected = gc.collect()
print(f"Garbage Collector destroyed {collected} orphaned objects.")
[RESULT]
Garbage Collector destroyed 2 orphaned objects.

5. Compressing the Soul: __slots__ & The Tradeoffs

[Diagram: a standard object stores its attributes in a dictionary with high memory overhead; a slotted object uses fixed memory slots with reduced usage.]

If you are building an AI simulation, a game, or processing massive database
rows, you might need to instantiate 1,000,000 User objects.
Because every standard Python class instance creates a __dict__ hash table
to hold its variables, instantiating a million objects means allocating a
million hash tables.

To fix this, Senior Architects use __slots__. By defining
__slots__ = ['name', 'age'], you instruct Python:
"Do not create a __dict__ for this object. Use a tiny, fixed-size
C-array instead."

import sys

# ❌ The Heavy Class (Uses __dict__)
class HeavyUser:
    def __init__(self, name, age):
        self.name = name
        self.age = age

# ✅ The Slotted Class (No __dict__ created)
class LightUser:
    __slots__ = ['name', 'age']

    def __init__(self, name, age):
        self.name = name
        self.age = age

h_user = HeavyUser("Arjuna", 30)
l_user = LightUser("Arjuna", 30)

# Measuring the size of the object AND its dictionary
heavy_size = sys.getsizeof(h_user) + sys.getsizeof(h_user.__dict__)
light_size = sys.getsizeof(l_user) # No __dict__ exists!

print(f"HeavyUser RAM: {heavy_size} bytes")
print(f"LightUser RAM: {light_size} bytes")
[RESULT]
HeavyUser RAM: 152 bytes
LightUser RAM: 48 bytes

☢️ The Cost of Compression (The Tradeoffs)

A savings of 104 bytes per object. Scaled to 1 million objects, that is
~104 MB of pure RAM saved. But
nothing in architecture is free. By stripping the
__dict__, you sacrifice Python's dynamic nature:

  • No Dynamic Assignment: You can no longer add new variables to an object on the fly. Doing l_user.weapon = "Bow" will instantly crash with an AttributeError because there is no dictionary to hold the new key.
  • Inheritance Nightmares: If you try to inherit from multiple parent classes that both define __slots__, Python will crash with a TypeError: multiple bases have instance lay-out conflict.
  • Weakref Breakage: Removing __dict__ also removes the hidden __weakref__ pointer. To use slotted classes with caches, you MUST manually add '__weakref__' to your slots list.
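The first two failure modes are easy to reproduce — a minimal sketch (class names here are illustrative, not from the original):

```python
class LightUser:
    __slots__ = ('name', 'age')

    def __init__(self, name, age):
        self.name = name
        self.age = age

u = LightUser("Arjuna", 30)
try:
    u.weapon = "Bow"               # no __dict__, so nowhere to store a new key
except AttributeError as e:
    print(f"Dynamic assignment blocked: {e}")

class Archer:
    __slots__ = ('bow',)

class Charioteer:
    __slots__ = ('chariot',)

try:
    # Two bases with non-empty __slots__ define conflicting instance layouts
    class Hybrid(Archer, Charioteer):
        pass
except TypeError as e:
    print(f"Inheritance conflict: {e}")
```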

6. The Ghost in the RAM: weakref

[Diagram: a strong reference keeps an object alive, while a weak reference does not raise the reference count and resolves to None after the object is deleted.]

Sometimes you want to track an object (like putting it in a Cache
dictionary) to speed up database reads. However, if you put an object in a
global dictionary, its Reference Count goes up. Because the dictionary is
global, the object will never be garbage collected, even if the
rest of your app is done with it. You have created a Memory Leak via
caching.

The weakref module creates a "Ghost Pointer". It allows you to
look at an object without increasing its Reference Count. If the object is
deleted elsewhere, the Weak Reference quietly evaporates and returns
None.

import weakref

class HeavyDatabaseRecord:
    def __init__(self, data):
        self.data = data

record = HeavyDatabaseRecord("1GB of payload")

# Create a Weak Reference (Does not increase ref count)
cache_ref = weakref.ref(record)

print(f"Accessing via Ghost Pointer: {cache_ref().data}")

# Delete the main strong reference
del record

# The Garbage Collector destroys the object. The Ghost Pointer returns None.
print(f"Cache after deletion: {cache_ref()}")
[RESULT]
Accessing via Ghost Pointer: 1GB of payload
Cache after deletion: None

7. The Forge: The Multi-Million Object Challenge

[Flowchart: the object lifecycle — creation, reference count up, reference count down, deletion, and garbage collection of cyclic references.]

The Challenge: You are tasked with caching 100,000 player
connections in a high-speed multiplayer game. Build a
PlayerConnection class optimized for minimal memory (using
slots) and store them in a WeakValueDictionary so disconnected
players do not leak memory.

import weakref

# TODO: Create a PlayerConnection class. 
# It must have 'ip_address' and 'port' as instance variables.
# It MUST be optimized for memory using slots.

# TODO: Initialize a weakref.WeakValueDictionary() named 'server_cache'

# TODO: Create a player object, assign it to the cache with key 'player_1'
# TODO: Delete the player object.
# TODO: Print the list of values in the cache to prove it evaporated.

Architectural Solution & Output

import weakref

# 1. Slotted Class for massive memory savings
class PlayerConnection:
    # NOTE: To use weakref with slots, you MUST explicitly add '__weakref__' to the slots list!
    __slots__ = ['ip_address', 'port', '__weakref__']

    def __init__(self, ip_address, port):
        self.ip_address = ip_address
        self.port = port

# 2. A Cache that automatically drops entries when original objects die
server_cache = weakref.WeakValueDictionary()

p1 = PlayerConnection("192.168.1.1", 8080)
server_cache['player_1'] = p1

print(f"Cache before disconnect: {list(server_cache.items())}")

# 3. Simulate player disconnecting and main system deleting the object
del p1

# Cache magically empties itself, preventing memory leaks!
print(f"Cache after disconnect:  {list(server_cache.items())}")
[RESULT]
Cache before disconnect: [('player_1', <__main__.PlayerConnection object at 0x...>)]
Cache after disconnect:  []

8. The Vyuhas – Key Takeaways

  • The Maya of del: del does not clear RAM. It removes a pointer and reduces the Reference Count by 1. RAM is only cleared when the count reaches 0.
  • List vs Tuple RAM: Lists over-allocate memory for dynamic appending. Tuples are perfectly sized. If data is static, Tuples are far superior for memory architecture.
  • Cyclic Leaks: Two objects pointing at each other will never hit a reference count of 0. The gc module exists entirely to hunt and destroy these cyclic loops.
  • Compressing State: Use __slots__ to banish the heavy __dict__ from massive class populations, saving over 60% memory overhead.
  • Ghost Tracking: Use weakref when caching or cataloging objects. It allows you to monitor them without preventing the Garbage Collector from freeing their memory.

FAQ: Memory & CPython Internals

Architectural memory questions answered — optimised for quick lookup.

Why does sys.getsizeof() return a small number for a massive list?

Because collections in Python (like Lists and Dictionaries) only store
pointers (memory addresses) to the actual data.
getsizeof() on a list only returns the size of the C-array
holding those pointers. It does not recursively calculate the size of
the strings or objects stored inside.

Can I manually force Garbage Collection?

Yes. By running import gc; gc.collect(), you force Python
to immediately pause execution and scan the generational heaps for
cyclic references to destroy. However, doing this too often severely
impacts application performance. It should only be used after massive
data purging operations.
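A sketch of the knobs involved (the threshold values shown are common CPython defaults and may differ between versions):

```python
import gc

# The collector fires when (allocations - deallocations) in generation 0
# crosses the first threshold; commonly (700, 10, 10)
print("Thresholds:", gc.get_threshold())

gc.disable()           # pause automatic cyclic collection during a hot path
# ... allocation-heavy work would go here ...
gc.enable()

freed = gc.collect()   # one explicit full sweep afterwards
print(f"Explicit collect() reclaimed {freed} unreachable objects")
```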

Are there any downsides to using __slots__?

Yes. Slotted classes are rigid. You cannot add new, arbitrary variables
to an object dynamically at runtime (e.g.,
user.new_var = True will throw an error). Additionally,
they break multiple inheritance if multiple parent classes define
conflicting slots.

What happens if I try to use weakref on a standard slotted class?

It will fail with a TypeError. Because __slots__ removes
the __dict__, it also removes the hidden
__weakref__ attribute that Python uses to track ghost
pointers. If you want to use weak references on a memory-optimized
class, you must explicitly include '__weakref__' as a
string inside your slots list.
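A minimal sketch of both cases (class names are illustrative):

```python
import weakref

class NoWeak:
    __slots__ = ('x',)

class YesWeak:
    __slots__ = ('x', '__weakref__')   # reserve room for the weakref machinery

try:
    weakref.ref(NoWeak())
except TypeError as e:
    print(f"Slotted class without '__weakref__': {e}")

obj = YesWeak()
ghost = weakref.ref(obj)               # works once the slot is reserved
print("Ghost resolves to original:", ghost() is obj)
```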

The Infinite Game: Join the Vyuha

If you are building an architectural legacy, hit the
Follow button in the sidebar to receive the remaining
days of this 30-Day Series directly to your feed.

💬 Have you ever crashed a server with an Out-of-Memory (OOM) error due to
a memory leak? Drop your war story below.

The Architect's Protocol: To master the architecture of logic, read
The Architect's Intent.

[← Previous — Day 11: The Architecture of State — Classes & OOP](https://logicandlegacy.blogspot.com/2026/03/day-11-classes-oop.html)

[Next → — Day 13: Type Hinting & Mypy — The Static Shield](#)


Originally published at https://logicandlegacy.blogspot.com
