Vivis Dev

Posted on • Originally published at pythonkoans.substack.com

Understanding Python’s rules for hashing

When we speak of hashing in Python, we are entering a realm where identity, equality, and performance converge. In Koan 1, we learnt the difference between Identity and Equality. Hashing in Python uses equality rather than identity. It promises that two objects that are equal will share the same mark.

But the promise is not perfect.

The koan shows us two NaNs ("Not a Number"). They cannot be equal to each other, by mathematical rule. Yet their hashes are equal. How can two things be marked the same, yet not be the same?

Let us examine the seals carefully.

Part 1: What is a Hash?

A hash is a fixed-size integer computed from an object’s value. It is a compact fingerprint, not a unique identifier: distinct objects can share the same hash. In Python, the built-in hash() function returns this integer, and dictionaries and sets use it to find entries quickly.

When you look up "key" in a dictionary, Python does not search through all the keys each time. Instead, it computes the hash of "key", uses it to jump to the right slot in the dictionary’s internal table, and then checks equality to confirm the match.
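The lookup steps can be sketched roughly as follows. The dictionary `ages` and its contents are hypothetical, and the modulo step is a simplification: CPython's real probing scheme is more involved.

```python
ages = {"alice": 30, "bob": 25}  # hypothetical example data

key = "bob"
h = hash(key)      # step 1: compute the key's hash (always an int)
slot = h % 8       # step 2 (simplified assumption): map the hash to a table slot
value = ages[key]  # the real lookup: hash, jump to a slot, confirm with ==

print(isinstance(h, int))  # True
print(0 <= slot < 8)       # True
print(value)               # 25
```

String hashes are randomized per process by default, so the value of `h` will differ between runs; only its role in the lookup matters here.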

So the rule is:

  1. If a == b, then hash(a) == hash(b).

  2. But the reverse is not guaranteed: if hash(a) == hash(b), a and b may not be equal.

The first rule is what allows dictionaries and sets to function correctly.
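Both rules can be seen in a small sketch. The `Token` class is hypothetical and deliberately gives every instance the same hash, which is legal but slow; it is meant only to show that a set still tells unequal objects apart by falling back on equality.

```python
class Token:
    """Hypothetical class whose instances all share one hash."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        if not isinstance(other, Token):
            return NotImplemented
        return self.name == other.name

    def __hash__(self):
        return 42  # every Token collides with every other Token


a, b, a2 = Token("a"), Token("b"), Token("a")
tokens = {a, b, a2}

print(a == a2, hash(a) == hash(a2))  # True True   (rule 1)
print(a == b, hash(a) == hash(b))    # False True  (rule 2: same hash, not equal)
print(len(tokens))                   # 2: equality separates a from b
```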

Part 2: The Strange Case of NaN

Under the IEEE 754 floating-point standard, NaN is defined so that:

float('nan') == float('nan')  # False

float accepts the strings “nan” and “inf” with an optional prefix “+” or “-” for Not a Number (NaN) and positive or negative infinity. (from Python Docs)

This is deliberate: NaN is not equal to anything, not even itself.

Yet this comparison can still evaluate to True:

hash(float('nan')) == hash(float('nan'))

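Why? Because the contract that hash-based collections rely on runs in one direction only: objects that are equal must share the same hash, but Python does not forbid unequal objects from sharing a hash. Before Python 3.10, every NaN hashed to 0, so the comparison above was always True. Since Python 3.10, the hash of a NaN depends on the object’s identity, so the result for two separate NaN objects is no longer guaranteed.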
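A minimal sketch of how a single NaN object behaves as a dictionary key. The variable names are illustrative; the behaviour relies on containers checking identity before equality during a lookup.

```python
nan = float("nan")  # one NaN object, reused below

equal_to_itself = nan == nan             # False: NaN never equals anything
hash_is_stable = hash(nan) == hash(nan)  # True: the same object hashed twice

d = {nan: "found"}
same_object_lookup = d[nan]  # works: the key is the very same object

try:
    d[float("nan")]          # a different NaN object is never equal to the key
    second_nan_found = True
except KeyError:
    second_nan_found = False

print(equal_to_itself)     # False
print(hash_is_stable)      # True
print(same_object_lookup)  # found
print(second_nan_found)    # False
```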

Part 3: Hashing and Equality in User-Defined Classes

When you define your own class, you may decide what equality and hashing mean.

Now two scrolls with the same text are considered equal, and they share the same hash:
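The original listing is not reproduced here, so the following is a minimal sketch, assuming a hypothetical `Scroll` class with a single `text` attribute that drives both equality and hashing.

```python
class Scroll:
    """Hypothetical class: equality and hashing both derive from `text`."""

    def __init__(self, text):
        self.text = text

    def __eq__(self, other):
        if not isinstance(other, Scroll):
            return NotImplemented
        return self.text == other.text

    def __hash__(self):
        # Hash the same data that __eq__ compares, so equal scrolls agree.
        return hash(self.text)


first = Scroll("The way is long")
second = Scroll("The way is long")

print(first is second)              # False: two distinct objects
print(first == second)              # True: same text
print(hash(first) == hash(second))  # True: equal objects, equal hashes
print(len({first, second}))         # 1: a set treats them as one item
```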

If you override __eq__ but do not also define __hash__, Python sets the class’s __hash__ to None, which makes its instances unhashable.
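A short sketch of that behaviour, using a hypothetical `PlainScroll` class that defines only __eq__:

```python
class PlainScroll:
    """Hypothetical class that overrides __eq__ but not __hash__."""

    def __init__(self, text):
        self.text = text

    def __eq__(self, other):
        if not isinstance(other, PlainScroll):
            return NotImplemented
        return self.text == other.text


hash_attr = PlainScroll.__hash__  # Python set this to None automatically

try:
    {PlainScroll("koan")}         # building a set needs a hash
    became_set_item = True
except TypeError:
    became_set_item = False

print(hash_attr)        # None
print(became_set_item)  # False
```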

If you define them inconsistently, for example by returning different hashes for equal objects, dictionaries and sets may fail to find keys that compare equal, or may store duplicates of them.
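To make the failure concrete, here is a sketch with a hypothetical `BrokenScroll` class whose __hash__ ignores the data that __eq__ compares:

```python
class BrokenScroll:
    """Hypothetical class with an inconsistent __eq__/__hash__ pair."""

    def __init__(self, text):
        self.text = text

    def __eq__(self, other):
        if not isinstance(other, BrokenScroll):
            return NotImplemented
        return self.text == other.text

    def __hash__(self):
        return id(self)  # bug: equal objects get different hashes


stored = BrokenScroll("koan")
probe = BrokenScroll("koan")
shelf = {stored}

print(probe == stored)  # True: they compare equal
print(probe in shelf)   # False: the set looks under the wrong hash
```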

These are the guidelines for implementation from the Python docs:

  • If a class does not define an __eq__() method, it should not define a __hash__() operation either.

  • If it defines __eq__() but not __hash__(), its instances will not be usable as items in hashable collections.

  • If a class defines mutable objects and implements an __eq__() method, it should not implement __hash__(), since the implementation of hashable collections requires that a key’s hash value is immutable (if the object’s hash value changes, it will be in the wrong hash bucket).
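The last guideline can be illustrated with a sketch. The `Tally` class is hypothetical and breaks the rule on purpose by hashing a field that is later mutated.

```python
class Tally:
    """Hypothetical mutable class whose hash depends on mutable state."""

    def __init__(self, count):
        self.count = count

    def __eq__(self, other):
        if not isinstance(other, Tally):
            return NotImplemented
        return self.count == other.count

    def __hash__(self):
        return hash(self.count)  # changes whenever count changes


t = Tally(1)
seen = {t}
found_before = t in seen  # True: stored under hash(1)

t.count = 2               # mutate after insertion
found_after = t in seen   # False: the set now searches under hash(2)

print(found_before)  # True
print(found_after)   # False
print(len(seen))     # 1: the object is still inside, but unreachable by lookup
```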

Part 4: Other Gotchas

4.1 Equality Across Types

Python strives for consistency across related types:

print(1 == 1.0)      # True
print(hash(1) == hash(1.0))  # True

Despite being different types, the values 1 and 1.0 are equal, so their hashes are also equal.

d = {1: "integer"}
d[1.0]   # "integer"

At first glance this is elegant, but it means numerically equal values of different types collapse into a single dictionary key, even if their meanings diverge in your domain. In scientific or financial code, this subtle equality can introduce surprising behaviour if you intended to distinguish int from float.
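A brief sketch of the collapse, followed by one possible workaround. The helper `typed_key` is a hypothetical name, and pairing each value with its type is only one way to keep int and float keys apart.

```python
# Equal numeric values share one key: the first key object is kept,
# while each later assignment overwrites the value.
merged = {1: "int", 1.0: "float", True: "bool"}
print(merged)  # {1: 'bool'}


def typed_key(value):
    """Hypothetical helper: pair a value with its type to keep types apart."""
    return (type(value), value)


typed = {typed_key(1): "integer one", typed_key(1.0): "float one"}
print(len(typed))             # 2
print(typed[typed_key(1.0)])  # float one
```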

4.2 Immutable vs Mutable

Hashes are meant to be stable. If the contents of an object could change after being placed in a dictionary or set, the mapping would fall apart.

This is why:

  • Immutable containers (like tuples) are hashable — but only if all their contents are hashable.

  • Mutable containers (like lists and dicts) are not hashable.

hash((1, 2, (3, 4)))  # Works
hash((1, [2, 3]))     # TypeError

The error is not arbitrary. It protects you from placing an unstable key into a dictionary. Imagine changing the list inside a tuple after it has been used as a key — the hash would no longer reflect the object’s state.
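A small sketch of the error and one way around it: freeze the mutable part before using the value as a key. The variable names are illustrative.

```python
nested = (1, [2, 3])  # a tuple that contains a mutable list

try:
    hash(nested)
    nested_hashable = True
except TypeError:
    nested_hashable = False

# Convert the inner list to a tuple so the whole key is immutable.
frozen = (nested[0], tuple(nested[1]))
lookup = {frozen: "ok"}

print(nested_hashable)      # False
print(lookup[(1, (2, 3))])  # ok
```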

The Final Seal

The novice saw two NaNs and said: "The seals are the same, so the things must be the same." The master corrected him: "The ink is the same, but the seal is broken."

Hashing is a symbol, not a proof. Equality and hashing may touch, but they are not bound. To see clearly, we must hold both the symbol and the substance it points to.

