When we speak of hashing in Python, we are entering a realm where identity, equality, and performance converge. In Koan 1, we learnt the difference between Identity and Equality. Hashing in Python uses equality rather than identity. It promises that two objects that are equal will share the same mark.
But the promise is not perfect.
The koan shows us two NaNs ("Not a Number"). By the rules of IEEE 754 floating-point arithmetic, they cannot be equal to each other. Yet their hashes can be equal. How can two things be marked the same, yet not be the same?
Let us examine the seals carefully.
Part 1: What is a Hash?
A hash is an integer derived from an object's value. It is not a unique fingerprint: it only has to be the same for equal objects, and ideally it spreads unequal objects apart. In Python, the built-in hash() function returns this integer, and dictionaries and sets use it to locate keys quickly.
When you look up "key" in a dictionary, Python does not search through all the keys. Instead, it computes the hash of "key", uses it to jump to the right slot in the hash table, and then checks equality to confirm.
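As a rough illustration of that two-step lookup (hash first, then equality), here is a small sketch with an ordinary dictionary. The dictionary name scrolls and its contents are made up for this example.

```python
# Hypothetical dictionary, used only for illustration.
scrolls = {"key": "value", "other": "thing"}

# Step 1: Python hashes the key to pick a candidate slot.
# For strings the exact integer varies between interpreter runs
# (hash randomization), but it is stable within a single run.
h = hash("key")
stable_within_run = (h == hash("key"))

# Step 2: the candidate is confirmed with ==, so neither indexing
# nor membership testing has to scan every key.
found = scrolls["key"]
is_member = "key" in scrolls

print(stable_within_run, found, is_member)  # True value True
```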
So the rule is:
If a == b, then hash(a) == hash(b).
But the reverse is not guaranteed: if hash(a) == hash(b), a and b may not be equal.
The first rule is what allows dictionaries and sets to function correctly.
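The one-way nature of the rule can be seen with a deliberately bad hash. The sketch below assumes a hypothetical Seal class whose instances all return the same hash value; it is a legal implementation, only a slow one.

```python
class Seal:
    """Hypothetical class whose instances all share one hash value."""

    def __init__(self, text):
        self.text = text

    def __eq__(self, other):
        if not isinstance(other, Seal):
            return NotImplemented
        return self.text == other.text

    def __hash__(self):
        # Deliberately poor: every Seal collides with every other Seal.
        return 42


a, b = Seal("moon"), Seal("river")

same_hash = hash(a) == hash(b)  # equal hashes...
same_value = a == b             # ...but unequal objects

# The dictionary still keeps them apart, because equality has the final say.
d = {a: 1, b: 2}
print(same_hash, same_value, len(d))  # True False 2
```

Collisions like this cost speed, not correctness: the dictionary falls back on == to tell the keys apart.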
Part 2: The Strange Case of NaN
The IEEE 754 floating-point standard defines NaN so that:
float('nan') == float('nan') # False
float accepts the strings “nan” and “inf” with an optional prefix “+” or “-” for Not a Number (NaN) and positive or negative infinity. (from Python Docs)
This is deliberate: NaN is not equal to anything, not even itself.
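Yet the hashes can still match. Before Python 3.10, every float NaN hashed to 0, so this was always True:
hash(float('nan')) == hash(float('nan')) # True before Python 3.10
Since Python 3.10, the hash of a NaN is derived from the object's identity, so two separately created NaNs are no longer guaranteed to share a hash, although a single NaN object keeps a stable hash while remaining unequal to itself. Neither behaviour breaks the contract that hash-based collections rely on: objects that are equal must share the same hash, but Python does not forbid unequal objects from sharing one.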
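The parts of this behaviour that do not depend on the Python version can be checked with a single NaN object. This is a small sketch; the variable names are chosen for this example.

```python
nan = float("nan")

# A NaN is not equal to anything, including itself.
not_equal_to_itself = nan != nan

# Yet the same object always reports the same hash.
stable_hash = hash(nan) == hash(nan)

# Containers check identity before equality, so the very same
# NaN object is still found in a set that holds it...
found_same_object = nan in {nan}

# ...while a freshly created NaN is a different object and
# never compares equal to the stored one.
found_other_object = float("nan") in [nan]

print(not_equal_to_itself, stable_hash, found_same_object, found_other_object)
# True True True False
```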
Part 3: Hashing and Equality in User-Defined Classes
When you define your own class, you decide what equality and hashing mean by implementing __eq__ and __hash__.
Suppose a scroll class compares and hashes by its text. Now two scrolls with the same text are considered equal, and they share the same hash.
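A minimal sketch of such a class follows. The name Scroll and its text attribute are assumptions made for illustration, since the original listing is not reproduced here.

```python
class Scroll:
    """Hypothetical class: equality and hash both derive from `text`."""

    def __init__(self, text):
        self.text = text

    def __eq__(self, other):
        if not isinstance(other, Scroll):
            return NotImplemented
        return self.text == other.text

    def __hash__(self):
        # Hash exactly the data that __eq__ compares.
        return hash(self.text)


first = Scroll("The way is empty")
second = Scroll("The way is empty")

print(first == second)              # True
print(hash(first) == hash(second))  # True
print(len({first, second}))         # 1, the set treats them as one element
```

This sketch assumes text is never reassigned after the scroll is created; if it were, the hash would change while the object sat inside a set or dictionary.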
If you override __eq__ but forget to also override __hash__, Python sets __hash__ to None for that class, which makes its instances unhashable.
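A short sketch of that failure, again using a hypothetical Scroll class that defines only __eq__:

```python
class Scroll:
    """Hypothetical class that overrides __eq__ but not __hash__."""

    def __init__(self, text):
        self.text = text

    def __eq__(self, other):
        if not isinstance(other, Scroll):
            return NotImplemented
        return self.text == other.text
    # No __hash__ defined here, so Python sets Scroll.__hash__ to None.


hash_is_none = Scroll.__hash__ is None

try:
    {Scroll("a")}  # building a set needs the element's hash
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(hash_is_none)   # True
print(error_message)  # a message saying the type is unhashable
```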
If you define them inconsistently, for example by returning different hashes for equal objects, dictionaries and sets can silently fail to find keys that compare equal.
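To make the inconsistency concrete, the sketch below assumes a hypothetical BrokenScroll class that compares by text but hashes by a per-instance serial number. The behaviour shown is what CPython does; the class exists only to demonstrate the bug.

```python
import itertools


class BrokenScroll:
    """Hypothetical class that violates the hash contract on purpose."""

    _serial = itertools.count()

    def __init__(self, text):
        self.text = text
        self._id = next(BrokenScroll._serial)

    def __eq__(self, other):
        if not isinstance(other, BrokenScroll):
            return NotImplemented
        return self.text == other.text

    def __hash__(self):
        # Wrong on purpose: equal objects receive different hashes.
        return self._id


a = BrokenScroll("same text")
b = BrokenScroll("same text")

equal = a == b    # True: the texts match
found = b in {a}  # False: the hashes differ, so == is never consulted
print(equal, found)
```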
These are the implementation guidelines from the Python docs:
If a class does not define an __eq__() method it should not define a __hash__() operation either.
If it defines __eq__() but not __hash__(), its instances will not be usable as items in hashable collections.
If a class defines mutable objects and implements an __eq__() method, it should not implement __hash__(), since the implementation of hashable collections requires that a key’s hash value is immutable (if the object’s hash value changes, it will be in the wrong hash bucket).
Part 4: Other Gotchas
4.1 Equality Across Types
Python strives for consistency across related types:
print(1 == 1.0) # True
print(hash(1) == hash(1.0)) # True
Despite being different types, the values 1 and 1.0 are equal, so their hashes are also equal.
d = {1: "integer"}
d[1.0] # "integer"
At first glance this is elegant, but it means different numeric types may collapse into the same bucket, even if their meanings diverge in your domain. In scientific or financial code, this subtle equality can introduce surprising behaviour if you intended to distinguish int from float.
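The collapse can be seen directly, along with one possible way to keep the types apart. Pairing each value with its type is an assumption made for this sketch, not something the text above prescribes.

```python
# 1, 1.0 and True are all equal and share a hash, so they occupy one key.
d = {}
d[1] = "int"
d[1.0] = "float"
d[True] = "bool"

print(d)  # {1: 'bool'}: the first key is kept, the last value wins

# One possible workaround: include the type in the key.
typed = {}
for value in (1, 1.0, True):
    typed[(type(value), value)] = type(value).__name__

print(len(typed))  # 3
```

The tuple keys differ in their first element, so equality no longer merges them.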
4.2 Immutable vs Mutable
Hashes are meant to be stable. If the contents of an object could change after being placed in a dictionary or set, the mapping would fall apart.
This is why:
Immutable containers (like tuples) are hashable — but only if all their contents are hashable.
Mutable containers (like lists and dicts) are not hashable.
hash((1, 2, (3, 4))) # Works
hash((1, [2, 3])) # TypeError
The error is not arbitrary. It protects you from placing an unstable key into a dictionary. Imagine changing the list inside a tuple after it has been used as a key — the hash would no longer reflect the object’s state.
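What an unstable key looks like in practice can be sketched with a hypothetical Box class that, against the guidelines, hashes its mutable contents. The printed result is what CPython produces.

```python
class Box:
    """Hypothetical mutable object that (unwisely) hashes its contents."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        if not isinstance(other, Box):
            return NotImplemented
        return self.value == other.value

    def __hash__(self):
        return hash(self.value)


box = Box(1)
shelf = {box}

before = box in shelf  # found: stored under hash(1)

box.value = 2          # mutate the object while it sits in the set

after = box in shelf   # lookup now uses hash(2) and misses
still_stored = any(item is box for item in shelf)  # yet it is still in there

print(before, after, still_stored)  # True False True
```

The object is stored but unreachable by lookup, which is the situation the TypeError for lists is designed to prevent.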
The Final Seal
The novice saw two NaNs and said: "The seals are the same, so the things must be the same." The master corrected him: "The ink is the same, but the seal is broken."
Hashing is a symbol, not a proof. Equality and hashing may touch, but they are not bound. To see clearly, we must hold both the symbol, and the substance it points to.