Mastering __new__, __repr__, and __hash__
The Constructor Myth
Pop quiz: What method creates a Python object?
If you answered __init__, you're in good company, and you're wrong!
class Point:
    def __init__(self, x, y):
        print(f"__init__ called with {self}")
        self.x = x
        self.y = y

p = Point(3, 4)
# Output: __init__ called with <__main__.Point object at 0x7f8b4c>
Notice that inside __init__, we already have self. The object already exists. So what actually created it?
The answer is __new__, a method so fundamental that Python calls it automatically, and most developers never even know it exists.
The Truth About Object Creation
Here's what actually happens when you call Point(3, 4):
class Point:
    def __new__(cls, x, y):
        print(f"__new__ called with class {cls}")
        instance = super().__new__(cls)
        print(f"__new__ created {instance}")
        return instance

    def __init__(self, x, y):
        print(f"__init__ called with {self}")
        self.x = x
        self.y = y

p = Point(3, 4)
# Output:
# __new__ called with class <class '__main__.Point'>
# __new__ created <__main__.Point object at 0x7f8b4c>
# __init__ called with <__main__.Point object at 0x7f8b4c>
The execution flow is:
- __new__(cls, ...) - The Architect
  - Allocates memory for a new instance
  - Returns the newly created object
  - Receives the class as first parameter, not an instance
- __init__(self, ...) - The Interior Decorator
  - Receives the instance created by __new__
  - Populates it with data
  - Returns None (always!)
The Mental Model:
Point(3, 4)
↓
__new__(Point, 3, 4) → creates empty object → instance
↓
__init__(instance, 3, 4) → populates instance.x, instance.y
↓
return instance
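The flow above can be approximated in plain Python. This is a minimal sketch, not CPython's actual implementation: `construct` is a hypothetical helper that stands in for the work Python does when you write Point(3, 4).

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y


def construct(cls, *args, **kwargs):
    """Hypothetical stand-in for what happens on Point(3, 4)."""
    # Step 1: ask the class for a new, empty instance.
    instance = cls.__new__(cls, *args, **kwargs)
    # Step 2: initialize it, but only if __new__ returned an instance of cls.
    if isinstance(instance, cls):
        instance.__init__(*args, **kwargs)
    # Step 3: hand the finished object back to the caller.
    return instance


p = construct(Point, 3, 4)
print(p.x, p.y)  # 3 4
```

The isinstance check matters later: if __new__ returns something that is not an instance of the class, __init__ is skipped entirely.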
Most of the time, you don't need to touch __new__. Python's default implementation (inherited from object) handles memory allocation for you. But when you need to control whether a new object gets created at all, __new__ is not just useful, it's essential.
Deep Dive: The Singleton Pattern
Imagine you're building a database connection pool, a configuration manager, or a logger. You want exactly one instance of the class to exist, no matter how many times someone calls the constructor.
# What we want:
db1 = Database()
db2 = Database()
print(db1 is db2) # Should be True!
Can we do this with __init__? Let's try:
class Database:
    _instance = None

    def __init__(self):
        if Database._instance is not None:
            # Too late! Memory is already allocated
            # We can't "un-create" this object
            pass
        Database._instance = self

db1 = Database()
db2 = Database()
print(db1 is db2) # False - we created two objects!
The problem: by the time __init__ runs, __new__ has already allocated memory for a new object. We can't prevent the creation—only configure what's already been created.
The Solution: Intercept at __new__
class Database:
    _instance = None

    def __new__(cls):
        if cls._instance is None:
            print("Creating the one true Database instance...")
            cls._instance = super().__new__(cls)
        else:
            print("Returning existing instance...")
        return cls._instance

    def __init__(self):
        print(f"__init__ called on {id(self)}")

db1 = Database()
# Output:
# Creating the one true Database instance...
# __init__ called on 140234567890
db2 = Database()
# Output:
# Returning existing instance...
# __init__ called on 140234567890
print(db1 is db2) # True!
print(id(db1), id(db2)) # Same memory address
What's happening:
- First call: _instance is None, so we call super().__new__(cls) to actually allocate memory
- We cache this instance in _instance
- Second call: _instance exists, so we return the cached object
- __init__ still runs every time (be careful with this!)
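To see why the repeated __init__ call deserves care, here is a minimal sketch. The `Settings` class is hypothetical and only illustrates the effect: any state that __init__ sets up is silently reset each time someone "constructs" the singleton again.

```python
class Settings:
    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def __init__(self):
        # Runs on every Settings() call, even when __new__ returned the cache.
        self.values = {}


first = Settings()
first.values["debug"] = True

second = Settings()      # same object, but __init__ ran again
print(first is second)   # True
print(first.values)      # {} - the "debug" entry was wiped
```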
The Critical Detail: super().__new__(cls)
This line is calling object.__new__(cls), the base implementation that actually talks to Python's memory allocator. You're delegating the "real" work of memory allocation to Python's core object class.
Do NOT do this:
def __new__(cls):
    if cls._instance is None:
        cls._instance = cls() # RECURSION ERROR!
    return cls._instance
Calling cls() inside __new__ calls __new__ again, which calls __new__ again... infinite recursion.
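A related detail shows up once the constructor takes arguments: __new__ receives the same arguments as __init__, so its signature has to accept them, but they should not be forwarded to object.__new__. The sketch below assumes an illustrative `connection_string` parameter.

```python
class Database:
    _instance = None

    def __new__(cls, *args, **kwargs):
        if cls._instance is None:
            # Pass only cls: object.__new__ rejects extra arguments
            # when __new__ has been overridden.
            cls._instance = super().__new__(cls)
        return cls._instance

    def __init__(self, connection_string="localhost"):
        self.connection_string = connection_string


db = Database("prod-server")
print(db.connection_string)  # prod-server

forwarding_error = None
try:
    # What would happen if __new__ forwarded its arguments.
    object.__new__(Database, "prod-server")
except TypeError as exc:
    forwarding_error = exc
    print(type(exc).__name__)  # TypeError
```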
Singleton Best Practice
If __init__ shouldn't run multiple times, use a flag:
class Database:
    _instance = None
    _initialized = False

    def __new__(cls, *args, **kwargs):
        # Accept the constructor arguments (they are passed to __new__ too),
        # but don't forward them to object.__new__
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def __init__(self, connection_string="localhost"):
        if not Database._initialized:
            self.connection_string = connection_string
            Database._initialized = True
            print(f"Connected to {connection_string}")

db1 = Database("prod-server") # Connected to prod-server
db2 = Database("dev-server") # (no output - already initialized)
print(db1.connection_string) # prod-server
The Representation Layer: __str__ vs __repr__
You've built a beautiful class. Now it looks like this when you print it:
class Money:
    def __init__(self, amount, currency):
        self.amount = amount
        self.currency = currency

m = Money(10, "USD")
print(m) # <__main__.Money object at 0x7f8b4c>
Useless. Let's fix it.
The Two Faces of Representation
Python has two methods for converting objects to strings:
__str__ - The User-Friendly Version
class Money:
    def __init__(self, amount, currency):
        self.amount = amount
        self.currency = currency

    def __str__(self):
        return f"${self.amount} {self.currency}"

m = Money(10, "USD")
print(m) # $10 USD
print(str(m)) # $10 USD
__repr__ - The Developer Version
class Money:
    def __init__(self, amount, currency):
        self.amount = amount
        self.currency = currency

    def __repr__(self):
        return f"Money({self.amount}, {self.currency})"

m = Money(10, "USD")
print(repr(m)) # Money(10, USD)
print([m]) # [Money(10, USD)] - repr is used in containers!
The Golden Rule of __repr__
Where practical, the output should be a valid Python expression that recreates an object with the same value.
This is often stated as: eval(repr(obj)) == obj (which also assumes the class defines __eq__).
m = Money(10, "USD")
code = repr(m) # "Money(10, USD)"
m2 = eval(code) # Recreate the object!
print(m2.amount) # 10
Wait... did that actually work? Let's test it:
class Money:
    def __init__(self, amount, currency):
        self.amount = amount
        self.currency = currency

    def __repr__(self):
        return f"Money({self.amount}, {self.currency})"

m = Money(10, "USD")
print(repr(m)) # Money(10, USD)
eval(repr(m)) # NameError: name 'USD' is not defined
The problem: USD without quotes isn't a string—it's treated as a variable name!
The !r Trick
Python's f-strings support a conversion flag, !r, that calls repr() on a value before it is formatted:
class Money:
    def __init__(self, amount, currency):
        self.amount = amount
        self.currency = currency

    def __repr__(self):
        return f"Money({self.amount!r}, {self.currency!r})"

m = Money(10, "USD")
print(repr(m)) # Money(10, 'USD') - notice the quotes!
m2 = eval(repr(m)) # Works: the string is now valid Python
The !r conversion calls repr() on each value, which for strings adds the quotes and any needed escapes. The output stays valid Python syntax as long as each field's own repr is.
Pro comparison:
amount = 10
currency = "USD"
# Without !r
print(f"Money({amount}, {currency})") # Money(10, USD)
# With !r
print(f"Money({amount!r}, {currency!r})") # Money(10, 'USD')
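A short sketch of the round trip and of why !r beats hand-written quotes. The __eq__ method here is an addition for the sake of the comparison, and eval is given an explicit namespace so the example doesn't depend on where it runs.

```python
class Money:
    def __init__(self, amount, currency):
        self.amount = amount
        self.currency = currency

    def __repr__(self):
        return f"Money({self.amount!r}, {self.currency!r})"

    def __eq__(self, other):
        if not isinstance(other, Money):
            return NotImplemented
        return (self.amount, self.currency) == (other.amount, other.currency)


m = Money(10, "USD")
# Round trip: the repr string evaluates back to an equal object.
print(eval(repr(m), {"Money": Money}) == m)  # True

# Hand-written quotes break as soon as the value contains a quote itself.
label = "it's"
print(f"'{label}'")  # 'it's'  (not a valid string literal)
print(f"{label!r}")  # "it's"  (valid: repr switched to double quotes)
```

Only call eval on strings you produced yourself; the round trip is a design check for __repr__, not a serialization format.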
When to Use Which
- __repr__: Always implement this. It's used by debuggers, logs, and the interactive interpreter. Make it unambiguous.
- __str__: Optional. Only implement if you need a user-friendly format. If not defined, Python falls back to __repr__.
class Money:
    def __init__(self, amount, currency):
        self.amount = amount
        self.currency = currency

    def __repr__(self):
        return f"Money({self.amount!r}, {self.currency!r})"

    def __str__(self):
        symbols = {"USD": "$", "EUR": "€", "GBP": "£"}
        symbol = symbols.get(self.currency, self.currency)
        return f"{symbol}{self.amount}"

m = Money(10, "USD")
print(str(m)) # $10 (user-friendly)
print(repr(m)) # Money(10, 'USD') (code-like)
print(m) # $10 (print uses str)
print([m]) # [Money(10, 'USD')] (containers use repr)
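The fallback from __str__ to __repr__ is easy to confirm. In this small sketch, `Temperature` is a hypothetical class that defines only __repr__:

```python
class Temperature:
    def __init__(self, degrees):
        self.degrees = degrees

    def __repr__(self):
        return f"Temperature({self.degrees!r})"


t = Temperature(21.5)
print(str(t))    # Temperature(21.5) - str() falls back to __repr__
print(f"{t}")    # Temperature(21.5) - so does plain f-string formatting
print(f"{t!r}")  # Temperature(21.5)
```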
The Hashability Contract: Making Objects Dictionary Keys
You've probably used strings and tuples as dictionary keys:
cache = {}
cache["user:123"] = {"name": "Alice"} # String key - works
cache[(1, 2)] = "point" # Tuple key - works
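But try this with a class that defines its own equality:
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return self.x == other.x and self.y == other.y

p = Point(1, 2)
cache = {}
cache[p] = "point" # TypeError: unhashable type: 'Point'
Why can't we use our custom object as a key? Because it's no longer hashable. A class that defines neither __eq__ nor __hash__ inherits an identity-based hash from object, but as soon as you define __eq__ without __hash__, Python sets __hash__ to None.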
What Does Hashable Mean?
To be used as a dictionary key or stored in a set, an object must:
- Have a __hash__ method that returns an integer
- Have an __eq__ method to check equality
- Follow the hashability contract
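A quick way to check whether Python considers a class hashable is sketched below. `Plain` and `EqOnly` are hypothetical classes: the first inherits identity-based __eq__ and __hash__ from object, while the second defines __eq__ alone, which makes Python set its __hash__ to None.

```python
from collections.abc import Hashable


class Plain:
    """No __eq__, no __hash__: inherits identity-based versions from object."""


class EqOnly:
    """Defines __eq__ only, so Python sets __hash__ to None."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        if not isinstance(other, EqOnly):
            return NotImplemented
        return self.value == other.value


print(isinstance(Plain(), Hashable))    # True
print(isinstance(EqOnly(1), Hashable))  # False
print(EqOnly.__hash__ is None)          # True
```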
The Hashability Contract
Rule 1: Equal objects must have equal hashes
If a == b, then hash(a) MUST equal hash(b)
Rule 2: The hash must never change
Once created, an object's hash must remain constant for its entire lifetime. This is why lists aren't hashable—you can modify them!
# This is why lists fail:
lst = [1, 2, 3]
hash(lst) # TypeError: unhashable type: 'list'
# But tuples work:
tpl = (1, 2, 3)
hash(tpl) # an int such as 529344067295497451 (exact value varies by Python version and platform)
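To make Rule 1 concrete, here is a sketch of what happens when it is broken. `BrokenKey` is a deliberately wrong, hypothetical class whose hash is supplied by hand so that two equal objects can disagree; the results shown are what CPython's dict and set produce.

```python
class BrokenKey:
    """Deliberately violates Rule 1: equal objects, different hashes."""

    def __init__(self, name, fake_hash):
        self.name = name
        self.fake_hash = fake_hash

    def __eq__(self, other):
        if not isinstance(other, BrokenKey):
            return NotImplemented
        return self.name == other.name

    def __hash__(self):
        return self.fake_hash  # ignores name, so equal objects can disagree


a = BrokenKey("alice", 1)
b = BrokenKey("alice", 2)

print(a == b)         # True
print(len({a, b}))    # 2 - the set keeps both "equal" objects
print(b in {a: "x"})  # False - lookup with an equal key misses
```

Hash tables compare hashes before they ever call __eq__, so an equal object with a different hash is simply never found.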
Implementing Hashability
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return self.x == other.x and self.y == other.y

    def __hash__(self):
        return hash((self.x, self.y))

p1 = Point(1, 2)
p2 = Point(1, 2)
p3 = Point(3, 4)
print(p1 == p2) # True
print(hash(p1) == hash(p2)) # True - contract satisfied!
cache = {p1: "origin"}
print(cache[p2]) # "origin" - found it using p2!
Why Delegate to a Tuple?
The line return hash((self.x, self.y)) is the idiomatic way to hash objects. Here's why:
- Tuples are immutable - Their hash is guaranteed stable
- Python's tuple hash is well-designed - It combines element hashes efficiently
- It's simple - You don't have to write your own hash combining logic
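Under the hood, older CPython versions (before 3.8) combined the element hashes roughly like this; newer versions use a different, xxHash-derived mixing scheme:
# Simplified illustration, not the exact CPython algorithm
def hash_tuple(items):
    result = 0x345678
    for item in items:
        result = (1000003 * result) ^ hash(item)
    return result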
But you don't need to know that—just pack your state into a tuple and let Python handle it.
The NotImplemented Pattern
Notice this line in __eq__:
if not isinstance(other, Point):
    return NotImplemented
Don't return False here! Returning NotImplemented tells Python "I don't know how to compare with this type—ask the other object."
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return self.x == other.x and self.y == other.y

p = Point(1, 2)
print(p == 5) # False (Python tries p.__eq__(5), then (5).__eq__(p), then falls back to identity)
If you returned False instead, you'd be cutting the other operand out of the decision: a type that knows how to compare itself with a Point would never be asked when the Point is on the left-hand side.
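The difference is visible as soon as a second type joins in. In this sketch, `Pair` and `RigidPoint` are hypothetical classes: `Pair` knows how to compare itself with anything that has x and y, and `RigidPoint` is a Point variant that returns False for unknown types.

```python
class Pair:
    """Hypothetical type that can compare itself with point-like objects."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        try:
            return (self.x, self.y) == (other.x, other.y)
        except AttributeError:
            return NotImplemented


class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented  # let the other operand try
        return (self.x, self.y) == (other.x, other.y)


class RigidPoint:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        if not isinstance(other, RigidPoint):
            return False  # final answer: the other operand is never asked
        return (self.x, self.y) == (other.x, other.y)


print(Point(1, 2) == Pair(1, 2))       # True  - Python falls back to Pair.__eq__
print(RigidPoint(1, 2) == Pair(1, 2))  # False - Pair never gets a say
```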
The Immutability Trap
Remember: the attributes that feed __eq__ and __hash__ must not change while the object sits in a dict or set. If they do, lookups break:
class MutablePoint:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        if not isinstance(other, MutablePoint):
            return NotImplemented
        return self.x == other.x and self.y == other.y

    def __hash__(self):
        return hash((self.x, self.y))

p = MutablePoint(1, 2)
cache = {p: "original"}
print(cache[p]) # "original" - works

# Mutate the object
p.x = 99

# Now the hash changed!
print(cache[p]) # Typically raises KeyError: the lookup uses the new hash
The object is now "lost" in the dictionary because its hash changed. The dictionary is looking in the wrong bucket!
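The "lost" entry has not disappeared; it is just unreachable by lookup. The sketch below uses a hypothetical `MutableId` class whose hash is a small integer, chosen so that the outcome in CPython is predictable rather than dependent on how a tuple hash happens to land.

```python
class MutableId:
    """Hypothetical key whose hash depends on a mutable integer field."""

    def __init__(self, n):
        self.n = n

    def __eq__(self, other):
        if not isinstance(other, MutableId):
            return NotImplemented
        return self.n == other.n

    def __hash__(self):
        return hash(self.n)


key = MutableId(1)
cache = {key: "original"}
key.n = 2  # the hash silently changes from 1 to 2

print(key in cache)                  # False - searched under the new hash
print(len(cache))                    # 1     - the entry is still stored
print(any(k is key for k in cache))  # True  - iteration still reaches it
print(MutableId(1) in cache)         # False - old slot found, but equality now fails
```

Neither the old value nor the new one finds the entry, which is why this kind of bug is hard to spot.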
Best practice: If you implement __hash__, make your object effectively immutable with read-only properties and a __setattr__ that refuses writes (__slots__ additionally removes the per-instance __dict__):
class ImmutablePoint:
    __slots__ = ['_x', '_y']

    def __init__(self, x, y):
        # Bypass our own __setattr__, which blocks normal assignment
        object.__setattr__(self, '_x', x)
        object.__setattr__(self, '_y', y)

    @property
    def x(self):
        return self._x

    @property
    def y(self):
        return self._y

    def __setattr__(self, name, value):
        raise AttributeError("ImmutablePoint is immutable")

    def __eq__(self, other):
        if not isinstance(other, ImmutablePoint):
            return NotImplemented
        return self.x == other.x and self.y == other.y

    def __hash__(self):
        return hash((self.x, self.y))

    def __repr__(self):
        return f"ImmutablePoint({self.x!r}, {self.y!r})"
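If writing this by hand feels heavy, the standard library's dataclasses module can generate much of it. This is an alternative technique, not the approach used above: a sketch with a hypothetical `FrozenPoint`, where frozen=True blocks attribute assignment and, together with the default eq=True, produces a field-based __hash__.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class FrozenPoint:
    x: int
    y: int


p1 = FrozenPoint(1, 2)
p2 = FrozenPoint(1, 2)

print(p1 == p2)              # True
print(hash(p1) == hash(p2))  # True
print(p1)                    # FrozenPoint(x=1, y=2)

mutation_blocked = False
try:
    p1.x = 99
except FrozenInstanceError:
    mutation_blocked = True
    print("FrozenPoint is immutable")
```

Note that the generated __repr__ uses keyword form (x=1, y=2), which still evaluates back to an equal object.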
Summary: The Professional Object Checklist
Today we've learned the lifecycle methods that make Python objects behave like first-class types:
Creation & Representation
- __new__(cls, ...) creates the object; __init__(self, ...) configures it
- Use __new__ for Singletons and other creation-control patterns
- __repr__ is for developers (make it code-like with !r)
- __str__ is for users (optional, human-friendly)
The Hashability Contract
- __eq__ defines equality (return NotImplemented for unknown types)
- __hash__ enables dictionary/set usage (delegate to tuple)
- Rule: If a == b, then hash(a) == hash(b)
- Immutability: The hash must never change
The Professional Class Template
class Money:
    __slots__ = ['_amount', '_currency']

    def __init__(self, amount, currency):
        object.__setattr__(self, '_amount', amount)
        object.__setattr__(self, '_currency', currency)

    @property
    def amount(self):
        return self._amount

    @property
    def currency(self):
        return self._currency

    def __setattr__(self, name, value):
        raise AttributeError("Money is immutable")

    def __repr__(self):
        return f"Money({self.amount!r}, {self.currency!r})"

    def __str__(self):
        return f"${self.amount} {self.currency}"

    def __eq__(self, other):
        if not isinstance(other, Money):
            return NotImplemented
        return self.amount == other.amount and self.currency == other.currency

    def __hash__(self):
        return hash((self.amount, self.currency))
This class is memory-efficient (__slots__), immutable (read-only properties), debuggable (__repr__), user-friendly (__str__), and can be used in sets and dicts (__eq__ + __hash__).