Most Python developers know that __len__ makes len() work and __add__ makes + work. That is the surface. The actual story is that Python's data model is a coherent, documented protocol through which user-defined objects can participate in the language itself: not just operators, but truthiness, hashing, iteration, context management, attribute access, memory layout, and more. Using it well means understanding what CPython actually calls, in what order, and why.
## The Interpreter Calls These, Not You
The first thing to internalize: dunder methods are called by the interpreter, not by user code. When you write len(obj), Python calls type(obj).__len__(obj). Not obj.__len__(). The lookup goes through the type, not the instance.
This matters for two reasons.
First, defining a dunder on an instance rather than the class does not work:
```python
class MyClass:
    pass

obj = MyClass()
obj.__len__ = lambda: 42
len(obj)  # TypeError: object of type 'MyClass' has no len()
```
The interpreter checked type(obj).__len__, found nothing, and raised. The instance attribute was ignored entirely.
Second, it means metaclasses can define dunders that apply to the class itself as an object. __len__ on a metaclass makes len(MyClass) work. __iter__ on a metaclass makes for x in MyClass work. The class is an instance of the metaclass, so the metaclass's dunders are the class's dunders.
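As a sketch of this, here is a made-up `IterableMeta` metaclass whose `__iter__` makes the class object itself iterable:

```python
class IterableMeta(type):
    """Metaclass: its __iter__ makes `for x in SomeClass` work on the class itself."""
    def __iter__(cls):
        # cls is the class object; this runs when you iterate over the class
        return iter(cls._members)

class Color(metaclass=IterableMeta):
    _members = ["red", "green", "blue"]

print(list(Color))  # ['red', 'green', 'blue'] — iteration goes through the metaclass
```

Defining `__iter__` on `Color` itself would only make *instances* of `Color` iterable; it is the metaclass that governs the class as an object.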
## Truthiness, Equality, and Hashing Are a Triangle
These three are deeply connected and breaking the contract between them is one of the more common ways to introduce subtle bugs.
Truthiness is determined by __bool__. If not defined, Python falls back to __len__: an object with length zero is falsy. If neither is defined, the object is always truthy.
```python
class Container:
    def __init__(self, items):
        self.items = items
    def __len__(self):
        return len(self.items)

c = Container([])
if not c:
    print("empty")  # prints, because __bool__ fell back to __len__
```
Equality is __eq__. The default is identity comparison (same as is). Override it to define value equality.
Hashing is __hash__. Here is the contract: objects that compare equal must have the same hash. If you define __eq__, Python automatically sets __hash__ to None, making your object unhashable. This is intentional. Mutable objects with value equality should not be hashable because their hash could change when they are mutated, silently breaking set and dict behavior.
```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def __eq__(self, other):
        return isinstance(other, Point) and self.x == other.x and self.y == other.y

# Python sets __hash__ = None automatically.
# Point is now unhashable:
p = Point(1, 2)
{p}  # TypeError: unhashable type: 'Point'
```
If you want a hashable value type, define both:
```python
    # inside Point:
    def __hash__(self):
        return hash((self.x, self.y))
```
The tuple hash is stable, consistent with equality, and composes well. Never define __hash__ without thinking about what makes two instances equal, and never define __eq__ without thinking about whether you also need __hash__.
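Putting the two together, the contract holds: equal points hash identically, so sets and dicts deduplicate them correctly. A self-contained sketch, repeating the `Point` class with both methods:

```python
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y
    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)
    def __hash__(self):
        # hash of a tuple of the fields that define equality
        return hash((self.x, self.y))

points = {Point(1, 2), Point(1, 2), Point(3, 4)}
print(len(points))  # 2: equal points hash the same, so the duplicate collapses
```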
Comparison Operators and the Reflected Protocol
__lt__, __le__, __gt__, __ge__, __eq__, __ne__ cover the six comparison operators. Python's mechanism for resolving them is more nuanced than most people realize.
When you write a < b, Python normally tries type(a).__lt__(a, b) first. If that returns NotImplemented, Python tries the reflected operation: type(b).__gt__(b, a). (One exception: if type(b) is a proper subclass of type(a) and overrides the reflected method, the reflected call is tried first.) This gives the right operand a chance to handle the comparison when the left operand does not know how.
```python
class Celsius:
    def __init__(self, temp):
        self.temp = temp
    def __lt__(self, other):
        if isinstance(other, Fahrenheit):
            return self.temp < (other.temp - 32) * 5 / 9
        if isinstance(other, Celsius):
            return self.temp < other.temp
        return NotImplemented  # return the singleton; do not call or raise it

class Fahrenheit:
    def __init__(self, temp):
        self.temp = temp
    def __gt__(self, other):
        if isinstance(other, Celsius):
            return self.temp > other.temp * 9 / 5 + 32
        return NotImplemented
```
NotImplemented is a singleton, not an exception. Returning it tells Python to try the reflected operation. Raising TypeError or returning False are both wrong here: they prevent the fallback.
functools.total_ordering is worth knowing. Define __eq__ and one of the four ordering methods, and it derives the rest. It is slower than defining all four manually but saves considerable boilerplate for types where ordering is well-defined.
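A sketch of how it looks in practice, with a made-up Version type:

```python
import functools

@functools.total_ordering
class Version:
    def __init__(self, major, minor):
        self.major, self.minor = major, minor
    def __eq__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return (self.major, self.minor) == (other.major, other.minor)
    def __lt__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return (self.major, self.minor) < (other.major, other.minor)

# __le__, __gt__, and __ge__ are derived from __eq__ and __lt__
print(Version(1, 4) >= Version(1, 2))  # True
```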
## Arithmetic Operators and In-Place Variants
The arithmetic protocol has three layers: forward, reflected, and in-place.
a + b tries type(a).__add__(a, b) first. If that returns NotImplemented, it tries type(b).__radd__(b, a) (with the same exception: a right operand whose type is a proper subclass of the left operand's type, and which overrides __radd__, is tried first). The reflected methods (__radd__, __rsub__, __rmul__, etc.) exist so that user-defined types can work with built-in types on both sides of an operator.
```python
class Vector:
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def __add__(self, other):
        if isinstance(other, Vector):
            return Vector(self.x + other.x, self.y + other.y)
        return NotImplemented
    def __radd__(self, other):
        # called for other + self when other.__add__ returned NotImplemented
        return self.__add__(other)
    def __mul__(self, scalar):
        if isinstance(scalar, (int, float)):
            return Vector(self.x * scalar, self.y * scalar)
        return NotImplemented
    def __rmul__(self, scalar):
        return self.__mul__(scalar)

v = Vector(1, 2)
print(3 * v)  # works because int.__mul__(3, v) returns NotImplemented,
              # then Vector.__rmul__(v, 3) is tried
```
In-place operators (+=, -=, *=) call __iadd__, __isub__, __imul__ and so on. If the in-place method is not defined, Python falls back to the regular method and rebinds the name. For mutable types, define the in-place variants to mutate in place and return self. For immutable types, do not define them and let Python handle the fallback.
```python
    # inside Vector:
    def __iadd__(self, other):
        if isinstance(other, Vector):
            self.x += other.x
            self.y += other.y
            return self  # must return self for in-place operations
        return NotImplemented
```
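The observable difference is whether += keeps the same object or rebinds the name. A self-contained sketch with a stripped-down Vector (not the full class above):

```python
class Vector:
    def __init__(self, x, y):
        self.x, self.y = x, y
    def __add__(self, other):
        return Vector(self.x + other.x, self.y + other.y)
    def __iadd__(self, other):
        self.x += other.x
        self.y += other.y
        return self  # mutate in place, keep the same object

v = Vector(1, 2)
alias = v
v += Vector(10, 20)
print(v is alias, v.x, v.y)  # True 11 22: __iadd__ mutated, no rebinding

t = (1, 2)
alias_t = t
t += (3,)
print(t is alias_t)  # False: tuples define no __iadd__, so += built a new tuple
```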
## Attribute Access Is More Controllable Than You Think
The attribute access protocol has four hooks: __getattribute__, __getattr__, __setattr__, and __delattr__.
__getattribute__ is called on every attribute access, including dunders. Overriding it without calling super().__getattribute__ will break your class entirely. It is rarely the right hook.
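When you do need it, the pattern is to do your extra work and then delegate every lookup to super(). A minimal sketch with a hypothetical Traced class:

```python
reads = []

class Traced:
    """Records every attribute read. Note the mandatory super() delegation."""
    def __getattribute__(self, name):
        reads.append(name)  # record the access...
        return super().__getattribute__(name)  # ...then do the real lookup

t = Traced()
t.x = 1       # assignment goes through __setattr__, not __getattribute__
print(t.x)    # 1
print(reads)  # ['x']
```

Accessing any attribute of self inside the method body (instead of going through super()) would re-enter __getattribute__ and recurse forever.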
__getattr__ is called only when normal attribute lookup has failed. It is the right hook for lazy loading, proxying, and dynamic attributes:
```python
import importlib

class LazyLoader:
    def __init__(self, module_name):
        self._module_name = module_name
        self._module = None
    def __getattr__(self, name):
        # only called when normal lookup fails; _module_name and _module
        # are set in __init__, so they never recurse into here
        if self._module is None:
            self._module = importlib.import_module(self._module_name)
        return getattr(self._module, name)
```
__setattr__ is called on every attribute assignment, including in __init__. If you override it, you must use object.__setattr__(self, name, value) or super().__setattr__(name, value) to actually store the value, otherwise you get infinite recursion:
```python
class Validated:
    def __setattr__(self, name, value):
        if name == "age" and not isinstance(value, int):
            raise TypeError("age must be int")
        super().__setattr__(name, value)  # not self.name = value
```
## The Container Protocol in Full
To make a proper sequence, you need __len__ and __getitem__. Python derives a surprising amount from these two:
```python
class Fibonacci:
    def __len__(self):
        return 100
    def __getitem__(self, index):
        if isinstance(index, slice):
            return [self[i] for i in range(*index.indices(len(self)))]
        if index < 0:
            index += len(self)
        if not 0 <= index < len(self):
            raise IndexError(index)
        a, b = 0, 1
        for _ in range(index):
            a, b = b, a + b
        return a

fib = Fibonacci()
print(fib[10])   # 55
print(fib[2:5])  # [1, 2, 3]
print(8 in fib)  # True: Python iterates using __getitem__
for n in fib:    # works: iteration falls back to __getitem__
    pass
```
in and for both fall back to __getitem__-based iteration if __contains__ and __iter__ are not defined. Python calls __getitem__ with increasing integer indices starting at 0 until it gets an IndexError. This fallback exists for backward compatibility but you should define __iter__ explicitly for anything you intend to be iterable.
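A generator method is the usual way to do that; a small sketch with a made-up Countdown class:

```python
class Countdown:
    """Explicit __iter__ via a generator: clearer and faster than
    the legacy __getitem__ iteration fallback."""
    def __init__(self, start):
        self.start = start
    def __iter__(self):
        n = self.start
        while n > 0:
            yield n   # each call to __iter__ yields a fresh, independent iterator
            n -= 1

print(list(Countdown(3)))  # [3, 2, 1]
print(2 in Countdown(3))   # True: `in` uses __iter__ when __contains__ is absent
```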
For mappings, define __getitem__, __setitem__, __delitem__, __len__, and __iter__. Inheriting from collections.abc.MutableMapping and implementing those five methods gives you keys(), values(), items(), get(), update(), pop(), and the rest for free through mixin implementations.
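As a sketch, a hypothetical case-insensitive mapping that implements only those five methods and inherits everything else:

```python
from collections.abc import MutableMapping

class LowerDict(MutableMapping):
    """Case-insensitive mapping: five methods implemented, the rest
    (get, items, update, pop, setdefault, ...) come from ABC mixins."""
    def __init__(self):
        self._data = {}
    def __getitem__(self, key):
        return self._data[key.lower()]
    def __setitem__(self, key, value):
        self._data[key.lower()] = value
    def __delitem__(self, key):
        del self._data[key.lower()]
    def __len__(self):
        return len(self._data)
    def __iter__(self):
        return iter(self._data)

d = LowerDict()
d["Content-Type"] = "text/html"
print(d.get("content-type"))  # text/html — get() came free from the ABC
print("CONTENT-TYPE" in d)    # True — __contains__ derived from __getitem__
```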
## __slots__: Memory Layout as a Data Model Feature
__slots__ is part of the data model. Defining it on a class prevents the creation of __dict__ on instances and instead allocates a fixed set of descriptors for the named attributes.
```python
class Point:
    __slots__ = ("x", "y")
    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1, 2)
p.z = 3  # AttributeError: 'Point' object has no attribute 'z'
```
The memory savings are significant for classes with many instances. A regular instance carries a __dict__ (a hash table) and a __weakref__ slot. A slotted instance carries only the declared attributes. For a class like Point that might be instantiated millions of times, this matters.
__slots__ interacts with inheritance in ways that require care. If a subclass does not define __slots__, instances get a __dict__ again, defeating the purpose. Every class in the hierarchy needs to define __slots__ for the optimization to hold all the way down.
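A small sketch illustrating both cases (class names are made up):

```python
class Base:
    __slots__ = ("x",)

class LeakySub(Base):      # no __slots__: instances get a __dict__ again
    pass

class TightSub(Base):
    __slots__ = ("y",)     # keeps the whole hierarchy dict-free

leaky = LeakySub()
leaky.anything = 1         # allowed again: __dict__ is back
print(hasattr(leaky, "__dict__"))  # True — the optimization is lost

tight = TightSub()
print(hasattr(tight, "__dict__"))  # False — only x and y are allowed
```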
## The One Rule That Ties Everything Together
Dunder methods are a protocol, not a feature list. Every method participates in a documented contract. __eq__ implies __hash__ constraints. __iter__ implies __next__. __enter__ implies __exit__. Implementing __get__ for the descriptor protocol implies understanding the class-access case, where the instance argument is None.
The mistakes come from implementing part of a protocol and ignoring the rest. Define __eq__ without thinking about __hash__ and your objects silently break in sets. Define __next__ that never raises StopIteration and your loops never end. Define __enter__ without a matching __exit__ and resources leak.
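For completeness, the __enter__/__exit__ pairing in miniature, with a hypothetical ManagedResource:

```python
class ManagedResource:
    """Minimal context manager: __enter__ and __exit__ always come as a pair."""
    def __init__(self):
        self.open = False
    def __enter__(self):
        self.open = True
        return self            # bound to the `as` target
    def __exit__(self, exc_type, exc_value, traceback):
        self.open = False      # runs even if the block raised
        return False           # False: do not suppress exceptions

with ManagedResource() as r:
    print(r.open)  # True
print(r.open)      # False: __exit__ ran on block exit
```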
Read the protocol documentation for any dunder you implement. The contracts are specified, they are not long, and violating them produces bugs that are difficult to reproduce and harder to trace. The data model is the most reliable part of Python. It rewards the developers who actually read it.
## Further Reading
- Python Data Model (docs.python.org) - read this end to end at least once, it is shorter than you expect
- Luciano Ramalho: Fluent Python, Part I - the best book-length treatment of the data model with worked examples throughout
- collections.abc documentation - the abstract base classes that define the container protocols formally
- Raymond Hettinger: Python's Class Development Toolkit (PyCon 2013) - still the best talk on building well-behaved Python classes