Timothy was debugging a puzzling issue when he called Margaret over. "Look at this," he said, pointing at his terminal. "These two comparisons should behave the same way, but they don't."
# Small numbers
a = 256
b = 256
print(a is b) # True
# Larger numbers
x = 257
y = 257
print(x is y) # False in the CPython REPL - Wait, what?!
Margaret smiled. "You've discovered Python's integer cache. Welcome to one of Python's most surprising optimizations - and a perfect lesson in the difference between identity and equality."
The Problem: Identity vs Equality
"First," Margaret said, "let's be crystal clear about what is actually checks."
def demonstrate_identity_vs_equality():
    """
    == checks VALUE equality (are contents the same?)
    is checks IDENTITY (are they the same object in memory?)
    """
    # Build the values at runtime: two identical literals in one
    # function body would be stored as a single shared constant
    a = int("257")
    b = int("257")
    print(f"a == b: {a == b}")  # True - same value
    print(f"a is b: {a is b}")  # False in CPython - different objects!

    # Check their memory addresses
    print(f"id(a): {id(a)}")
    print(f"id(b): {id(b)}")  # Different addresses!

    # But small numbers behave differently
    x = int("256")
    y = int("256")
    print(f"\nx == y: {x == y}")  # True - same value
    print(f"x is y: {x is y}")  # True - SAME object!
    print(f"id(x): {id(x)}")
    print(f"id(y): {id(y)}")  # Same address!

demonstrate_identity_vs_equality()
Output:
a == b: True
a is b: False
id(a): 140234567890123
id(b): 140234567890456
x == y: True
x is y: True
id(x): 140234567889876
id(y): 140234567889876
"See the difference?" Margaret pointed. "With 257, Python creates two separate integer objects. With 256, both variables point to the exact same object in memory."
The Integer Cache: Python's Singleton Pool
Margaret sketched out the concept on paper:
"""
Python pre-creates and caches integers from -5 to 256.
Think of it as a shelf of pre-made number cards that Python
keeps permanently in memory. Whenever you use 42, you don't
get a NEW integer object - you get a reference to the single
shared 42 that already exists.
The Cache Range:
-5, -4, -3, -2, -1, 0, 1, 2, 3, ... 254, 255, 256
262 total pre-allocated integer objects
"""
def explore_cache_boundaries():
    """Find the exact boundaries of the integer cache"""
    # Values are built with int(...) at runtime so that the compiler
    # cannot merge identical literals into one shared constant

    # Test negative boundary
    print("Negative boundary:")
    a = int("-5")
    b = int("-5")
    print(f"-5 is -5: {a is b}")  # True - in cache
    a = int("-6")
    b = int("-6")
    print(f"-6 is -6: {a is b}")  # False - outside cache

    # Test positive boundary
    print("\nPositive boundary:")
    x = int("256")
    y = int("256")
    print(f"256 is 256: {x is y}")  # True - in cache
    x = int("257")
    y = int("257")
    print(f"257 is 257: {x is y}")  # False - outside cache

explore_cache_boundaries()
Output:
Negative boundary:
-5 is -5: True
-6 is -6: False
Positive boundary:
256 is 256: True
257 is 257: False
"The range -5 to 256 was chosen," Margaret explained, "because these are the most commonly used integers in real programs. Counters start at 0, list indices are usually small, HTTP status codes fit in this range, ASCII characters are 0-127. It's a pragmatic optimization."
Why Does This Happen? The Implementation
Timothy asked, "But why? Why does Python bother with this?"
Margaret sketched a conceptual model of what CPython does:
"""
Conceptual implementation of integer caching in CPython:

# During Python interpreter startup, pre-allocate integers
_SMALL_INT_CACHE = {}
for i in range(-5, 257):
    _SMALL_INT_CACHE[i] = create_integer_object(i)

# When you write: x = 42
# Python checks the cache first:
def int_from_literal(value):
    if -5 <= value <= 256:
        return _SMALL_INT_CACHE[value]  # Return cached object
    else:
        return create_new_integer_object(value)  # Create new object
"""
def demonstrate_memory_savings():
    """Show why caching matters"""
    import sys

    # Small integer - from cache
    cached_int = 100
    print(f"Size of cached int (100): {sys.getsizeof(cached_int)} bytes")

    # Large integer - not in the cache
    new_int = 1000
    print(f"Size of new int (1000): {sys.getsizeof(new_int)} bytes")

    # Both are the same size in memory
    # But cached integers are SHARED across your entire program

    # Demonstrate sharing. int(...) builds each value at runtime;
    # a repeated literal would be one shared constant either way.
    list_of_hundreds = [int("100") for _ in range(1000)]
    print("\nAll 1000 references to 100 point to same object:")
    print(f"All identical: {all(x is list_of_hundreds[0] for x in list_of_hundreds)}")

    list_of_thousands = [int("1000") for _ in range(1000)]
    print("\nAll 1000 references to 1000 point to same object:")
    print(f"All identical: {all(x is list_of_thousands[0] for x in list_of_thousands)}")

demonstrate_memory_savings()
Output:
Size of cached int (100): 28 bytes
Size of new int (1000): 28 bytes
All 1000 references to 100 point to same object:
All identical: True
All 1000 references to 1000 point to same object:
All identical: False
"The memory savings are real," Margaret explained. "If you have a list of a million zeros, with caching you have a million references to ONE object. Without caching, you'd have a million separate objects. Multiply by every integer from -5 to 256 that appears anywhere in your program, and the savings add up."
Context Matters: Assignment vs Literals
Timothy noticed something odd in the REPL:
# In the CPython REPL, where each statement is compiled separately
>>> a = 257
>>> b = 257
>>> a is b
False
# But in the same expression
>>> 257 is 257  # Python 3.8+ also emits a SyntaxWarning here
True # What?!
Margaret explained the subtlety:
def demonstrate_context_dependent_behavior():
    """
    CPython compiles each code object (a module, a function body, or a
    single REPL statement) on its own and stores each distinct constant
    once, so identical literals in one code object share an object.
    """
    # Values built at runtime - nothing for the compiler to share
    a = int("257")
    b = int("257")
    print(f"Built at runtime: {a is b}")  # False in CPython

    # Literals in the same function body - one shared constant
    c = 257
    d = 257
    print(f"Same function body: {c is d}")  # True in CPython

    # Works with tuple assignments too
    x, y = 257, 257
    print(f"Tuple assignment: {x is y}")  # True in CPython

    # A function hands back the same constant object on every call
    def get_257():
        return 257
    e = get_257()
    f = get_257()
    print(f"Function returns: {e is f}")  # True in CPython

demonstrate_context_dependent_behavior()
"The lesson here," Margaret said, "is that integer identity is an implementation detail, not a language guarantee. The only guarantee is that cached integers from -5 to 256 will always be singletons. Beyond that, you're seeing compiler optimizations that may or may not happen."
Lists and the Identity Trap
Timothy showed Margaret a bug he'd been chasing:
def buggy_code():
    """Common mistake: using 'is' to compare values"""
    def get_status_code():
        # Imagine this comes from network response
        return 200

    status = get_status_code()
    # ❌ WRONG - Don't use 'is' for value comparison
    # (Python 3.8+ emits a SyntaxWarning for 'is' with a literal)
    if status is 200:
        print("Success!")
    # This works because 200 is in the cache range
    # But it's semantically wrong and could break

def correct_code():
    """Correct: use == for value comparison"""
    def get_status_code():
        return 200

    status = get_status_code()
    # ✓ CORRECT - Use == for value comparison
    if status == 200:
        print("Success!")

# This is why it matters
def why_is_is_wrong_for_values():
    """Demonstrate the trap"""
    # Works with small numbers (by accident!)
    small_num = int("200")
    if small_num is 200:
        print("This works but is wrong!")

    # Breaks with large numbers that are created at runtime
    large_num = int("500")  # e.g. parsed from a response
    if large_num is 500:
        print("This will never print!")
    else:
        print("'is' check failed even though value is 500")

    # Correct way
    if large_num == 500:
        print("This works correctly!")

why_is_is_wrong_for_values()
Output:
This works but is wrong!
'is' check failed even though value is 500
This works correctly!
Margaret emphasized, "Use 'is' only when you specifically need identity checking - like 'if x is None'. For value comparison, always use ==. The fact that 'is' happens to work for small integers is a trap that leads to bugs."
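Recent Python versions flag this mistake at compile time. The sketch below, assuming Python 3.8 or later, captures the SyntaxWarning that the compiler raises for 'is' with a literal; the helper name `literal_is_warnings` and the sample expressions are illustrative.

```python
import warnings


def literal_is_warnings(expression):
    """Compile an expression and return any SyntaxWarning messages raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        compile(expression, "<demo>", "eval")
    return [str(w.message) for w in caught if issubclass(w.category, SyntaxWarning)]


print(len(literal_is_warnings("status is 200")))   # 1 on Python 3.8+
print(len(literal_is_warnings("status == 200")))   # 0
print(len(literal_is_warnings("status is None")))  # 0
```

The exact wording of the warning differs between versions, so the sketch only counts messages rather than matching their text.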
The None Singleton Pattern
"Python uses the singleton pattern for more than just integers," Margaret explained.
def singleton_patterns_in_python():
    """
    Objects that exist only once:
    - None (always a singleton, guaranteed by the language)
    - True and False (always singletons, guaranteed by the language)
    - Small integers (-5 to 256, a CPython implementation detail)
    - Small strings (in some contexts - string interning)
    """
    # None is ALWAYS a singleton
    a = None
    b = None
    print(f"None is None: {a is b}")  # True - always!
    # This is why we check: if x is None
    # rather than: if x == None

    # True and False are singletons
    x = True
    y = True
    print(f"True is True: {x is y}")  # True

    # Small integers are shared in CPython
    num1 = int("42")
    num2 = int("42")
    print(f"42 is 42: {num1 is num2}")  # True

    # But large integers built at runtime are not
    big1 = int("1000")
    big2 = int("1000")
    print(f"1000 is 1000: {big1 is big2}")  # False

singleton_patterns_in_python()
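The preference for 'is None' over '== None' has a concrete reason: equality can be customized by a class, while identity cannot. The class below is a hypothetical, deliberately misbehaving example written for this illustration.

```python
class AlwaysEqual:
    """Hypothetical class whose __eq__ claims equality with everything."""

    def __eq__(self, other):
        return True


obj = AlwaysEqual()
print(obj == None)  # True - the class's __eq__ decides the answer
print(obj is None)  # False - identity cannot be overridden
```

Since there is exactly one None object, the identity check asks the precise question and cannot be fooled by an unusual `__eq__`.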
Implications for Mutable Containers
Timothy asked, "Does this affect lists and dictionaries?"
def integers_in_containers():
    """Integer caching works even inside containers"""
    # Lists containing cached integers
    list1 = [1, 2, 3, 100, 200]
    list2 = [1, 2, 3, 100, 200]
    print("Lists are different objects:")
    print(f"list1 is list2: {list1 is list2}")  # False
    print("\nBut their elements share integer objects:")
    for i in range(len(list1)):
        print(f"list1[{i}] is list2[{i}]: {list1[i] is list2[i]}")  # All True!

    # This also works with dictionary keys
    dict1 = {1: 'a', 2: 'b', 100: 'c'}
    dict2 = {1: 'a', 2: 'b', 100: 'c'}
    print("\nDictionaries are different:")
    print(f"dict1 is dict2: {dict1 is dict2}")  # False
    print("\nBut their integer keys are shared:")
    for key in dict1:
        # Get the actual key objects (not just values)
        key1 = [k for k in dict1.keys() if k == key][0]
        key2 = [k for k in dict2.keys() if k == key][0]
        print(f"Key {key}: {key1 is key2}")  # True!

integers_in_containers()
Output:
Lists are different objects:
list1 is list2: False
But their elements share integer objects:
list1[0] is list2[0]: True
list1[1] is list2[1]: True
list1[2] is list2[2]: True
list1[3] is list2[3]: True
list1[4] is list2[4]: True
Dictionaries are different:
dict1 is dict2: False
But their integer keys are shared:
Key 1: True
Key 2: True
Key 100: True
"The container itself is unique," Margaret explained, "but the cached integers inside are shared across all containers in your program."
Performance Implications
Margaret demonstrated the performance benefit:
import time

def measure_creation_speed():
    """Compare arithmetic whose result is cached vs non-cached"""
    # A bare literal such as x = 500 is loaded from the function's
    # constants and allocates nothing, so the values are computed instead
    iterations = 1_000_000
    small, large = 41, 499

    # Result 42 is inside the cache
    start = time.perf_counter()
    for _ in range(iterations):
        x = small + 1  # Returns the cached 42
    cached_time = time.perf_counter() - start

    # Result 500 is outside the cache
    start = time.perf_counter()
    for _ in range(iterations):
        x = large + 1  # Allocates a new int object each time
    non_cached_time = time.perf_counter() - start

    print(f"Cached integers (42): {cached_time:.4f} seconds")
    print(f"Non-cached integers (500): {non_cached_time:.4f} seconds")
    print(f"Speedup: {non_cached_time / cached_time:.2f}x")

def measure_comparison_speed():
    """Compare identity vs equality checks"""
    iterations = 1_000_000
    x = 100
    y = 100

    # Identity check (very fast)
    start = time.perf_counter()
    for _ in range(iterations):
        result = x is y
    identity_time = time.perf_counter() - start

    # Equality check (slightly slower)
    start = time.perf_counter()
    for _ in range(iterations):
        result = x == y
    equality_time = time.perf_counter() - start

    print(f"\nIdentity check (is): {identity_time:.4f} seconds")
    print(f"Equality check (==): {equality_time:.4f} seconds")
    print(f"'is' speedup: {equality_time / identity_time:.2f}x")

measure_creation_speed()
measure_comparison_speed()
Output (illustrative; timings vary by machine and Python version):
Cached integers (42): 0.0234 seconds
Non-cached integers (500): 0.0456 seconds
Speedup: 1.95x
Identity check (is): 0.0123 seconds
Equality check (==): 0.0178 seconds
'is' speedup: 1.45x
"The performance difference is real but modest," Margaret said. "The bigger benefit is memory savings when you have millions of references to common integers."
Interning in Other Languages
Timothy was curious. "Do other languages do this?"
"""
Integer caching/interning across languages:

Python: -5 to 256 cached
    a = 100
    b = 100
    a is b  # True

Java: -128 to 127 cached (Integer cache)
    Integer a = 100;
    Integer b = 100;
    a == b  // True (autoboxing reuses cache)

Ruby: Small integers are immediate values (called Fixnum before Ruby 2.4)
    a = 100
    b = 100
    a.object_id == b.object_id  # true

JavaScript: Primitive numbers are always compared by value
    (No object identity for numbers - they're primitives)

C#: No automatic integer caching
    (Boxing always creates new objects)

The pattern is common in managed languages where
integer objects have overhead.
"""
Practical Guidelines
Margaret created a reference guide:
"""
INTEGER CACHE GUIDELINES:

✓ DO:
- Use == for comparing integer values
- Use 'is None' for None checks (None is always a singleton)
- Understand that -5 to 256 are cached (for debugging/understanding)
- Rely on the cache for memory efficiency (it's automatic)

✗ DON'T:
- Use 'is' to compare integer values (keep 'is' for None and true identity checks)
- Write code that depends on 'is' behavior for integers
- Assume all equal integers are the same object
- Optimize around the cache range in your code

REMEMBER:
- Cache range: -5 to 256 (in CPython, as an implementation detail)
- Outside this range: implementation-dependent
- Compiler optimizations may cache other values
- 'is' checks identity, '==' checks equality
- Integer immutability makes caching safe

WHEN IT MATTERS:
- Debugging identity vs equality bugs
- Understanding memory usage in large programs
- Explaining surprising 'is' behavior
- Teaching Python's object model

WHEN IT DOESN'T MATTER:
- Normal application code (use == always)
- Algorithm design
- Most performance optimization
"""
The Debug Session Pattern
Margaret showed Timothy a debugging technique:
def debug_integer_identity():
    """Useful debugging pattern for identity issues"""
    def show_identity_info(var, name):
        """Display identity information about a variable"""
        print(f"\n{name}:")
        print(f"  Value: {var}")
        print(f"  Type: {type(var).__name__}")
        print(f"  ID: {id(var)}")
        print(f"  In cache: {-5 <= var <= 256 if isinstance(var, int) else 'N/A'}")

    # Example usage (int(...) builds the values at runtime so that
    # identical literals are not merged into one constant)
    a = int("256")
    b = int("256")
    c = int("257")
    d = int("257")
    show_identity_info(a, "a (256)")
    show_identity_info(b, "b (256)")
    print(f"  a is b: {a is b}")
    show_identity_info(c, "c (257)")
    show_identity_info(d, "d (257)")
    print(f"  c is d: {c is d}")

def test_identity_assumptions(x, y, expected_same_object=None):
    """Test function for verifying identity behavior"""
    print(f"\nTesting {x} and {y}:")
    print(f"  Values equal (==): {x == y}")
    print(f"  Same object (is): {x is y}")
    print(f"  id(x): {id(x)}")
    print(f"  id(y): {id(y)}")
    if expected_same_object is not None:
        actual = x is y
        if actual == expected_same_object:
            print("  ✓ Behavior as expected")
        else:
            print(f"  ✗ Unexpected! Expected is={expected_same_object}, got {actual}")

# Usage
debug_integer_identity()

# Testing edge cases (two 257 literals in one call would share a constant)
test_identity_assumptions(int("256"), int("256"), expected_same_object=True)
test_identity_assumptions(int("257"), int("257"), expected_same_object=False)
Testing Integer Identity
Margaret wrote test patterns:
import pytest

# These tests describe CPython behavior. Values are built with int(...)
# at runtime so the compiler cannot merge identical literals.

def test_small_integer_cache():
    """Verify cached integers are shared objects"""
    # Test cache boundaries
    assert int("-5") is int("-5")
    assert int("256") is int("256")

    # Variables with cached values share identity
    a = int("100")
    b = int("100")
    assert a is b

    # Even in containers
    list1 = [5, 10, 15]
    list2 = [5, 10, 15]
    assert list1[0] is list2[0]

def test_large_integers_not_cached():
    """Verify large integers built at runtime are not shared"""
    # These should NOT be the same object
    a = int("1000")
    b = int("1000")
    assert a == b  # Values equal
    assert a is not b  # But different objects
    # Ids are different
    assert id(a) != id(b)

def test_never_use_is_for_value_comparison():
    """Demonstrate why 'is' is wrong for value comparison"""
    def get_number(text):
        """Function that parses a number at runtime"""
        return int(text)

    # This might work for small numbers but is semantically wrong
    small = get_number("100")
    # Don't do this: if small is 100
    # Do this instead:
    assert small == 100

    # This breaks for large numbers
    large = get_number("1000")
    # This would fail: assert large is 1000
    # This works: assert large == 1000
    assert large == 1000

def test_none_is_always_singleton():
    """None is the correct use case for 'is'"""
    a = None
    b = None
    # None is ALWAYS a singleton
    assert a is b
    assert a is None
    assert b is None

    # This is THE correct usage of 'is'
    def process(value):
        if value is None:  # ✓ Correct
            return "No value"
        return f"Value: {value}"

    assert process(None) == "No value"
    assert process(42) == "Value: 42"

# Run with: pytest test_integer_cache.py -v
The Library Metaphor
Margaret brought it back to the library:
"Think of the integer cache like the library's reference collection," she said. "We have permanent copies of the most frequently used reference books - dictionaries, encyclopedias, common classics - that never leave the building. Everyone who needs them shares the same physical copies.
"When Timothy needs 'Dictionary Volume 42', he doesn't get a personal copy created for him. He gets a reference card that points to the permanent copy in the reference section. Every other patron who needs Volume 42 gets a card pointing to that same copy.
"But when someone requests an obscure book - say, 'Maritime Trade Patterns in 1273' - we create a temporary copy just for them. If someone else wants the same book, they get their own separate copy because it's not worth keeping permanent duplicates of rarely-used materials.
"The integers from -5 to 256 are like those reference books - so commonly used that Python keeps permanent shared copies. Larger integers are like the obscure books - created on demand and not shared unless the compiler happens to optimize it."
Common Misconceptions
Timothy compiled a list of myths:
"""
MYTH vs REALITY:

MYTH: "All integers with the same value are the same object"
REALITY: CPython only shares integers from -5 to 256, and even that is an implementation detail

MYTH: "I should use 'is' for faster integer comparison"
REALITY: Use '==' for values. Use 'is' only for None and identity checks

MYTH: "The cache is a performance optimization I can rely on"
REALITY: It's a memory optimization. Don't write code depending on it

MYTH: "257 is 257 is always False"
REALITY: It depends on context - compiler may optimize literals

MYTH: "Integer caching is a Python quirk/bug"
REALITY: It's a deliberate, documented optimization in many languages

MYTH: "I need to worry about the cache in my application code"
REALITY: Just use '==' for values and forget about it
"""
When Identity Actually Matters
Margaret showed the rare cases where integer identity is significant:
def when_identity_actually_matters():
    """Rare cases where integer object identity is relevant"""
    # 1. Understanding bugs in code that misuses 'is'
    def buggy_function(status_code):
        if status_code is 200:  # ❌ WRONG
            return "OK"
        if status_code is 404:  # ❌ WRONG
            return "Not Found"
        return "Unknown"

    print("Bug example:")
    # int(...) stands in for a code parsed from a response at runtime
    print(f"  Small value: {buggy_function(int('200'))}")  # "OK" - works by accident
    print(f"  Large value: {buggy_function(int('404'))}")  # "Unknown" - breaks!

    # 2. Memory profiling and optimization
    numbers = [42] * 1_000_000
    # Every slot references one int object, so the list costs only its
    # references. Because 42 is cached, that object is also shared with
    # every other 42 in the program.
    print("\nMemory efficiency:")
    print(f"  List size: {len(numbers):,} elements")
    print(f"  All reference same object: {all(x is numbers[0] for x in numbers)}")

    # 3. Teaching Python's object model
    # Understanding the cache helps explain:
    # - Object identity vs equality
    # - Memory management
    # - Immutability benefits
    # - Singleton pattern
    print("\nTeaching moment:")
    a = 100
    b = 100
    print(f"  Two variables, one object: {a is b}")
    print("  This is safe because integers are immutable")
    print("  If integers were mutable, sharing would be dangerous!")

when_identity_actually_matters()
The Mutability Connection
"This ties into something important," Margaret said. "Integer caching is only safe because integers are immutable."
def why_immutability_matters_for_caching():
    """Caching only works safely with immutable objects"""
    # Integers are immutable and cached - safe!
    a = 100
    b = 100  # Points to same object as 'a'
    print("Integers (immutable, cached):")
    print(f"  a is b: {a is b}")
    print(f"  a = {a}, b = {b}")

    # If integers were mutable, this would be a problem:
    # a += 1    # If this modified the cached object...
    # print(b)  # ...b would change too! Disaster!

    # But integers are immutable, so += creates a NEW object
    a = a + 1
    print("  After a += 1:")
    print(f"  a is b: {a is b}")  # False - different objects now
    print(f"  a = {a}, b = {b}")  # b unchanged

    # Compare to lists (mutable, NEVER cached/shared automatically)
    list1 = [1, 2, 3]
    list2 = [1, 2, 3]
    print("\nLists (mutable, not shared):")
    print(f"  list1 is list2: {list1 is list2}")  # False
    # If lists were automatically shared, mutations would be dangerous:
    # list1.append(4)
    # # If they shared, list2 would also change!

why_immutability_matters_for_caching()
Timothy nodded. "So caching works because integers can't change. If they could, sharing them would cause spooky action at a distance - modifying one variable would affect another."
"Exactly," Margaret confirmed. "Python can safely share immutable objects because there's no way to modify them. Operations that appear to modify an integer actually create a new integer object."
Real-World Bug Example
Margaret pulled up a real bug report:
def real_world_bug_example():
    """Based on actual bug reports from production code"""
    class StatusCodeChecker:
        """Buggy implementation that misuses 'is'"""
        def __init__(self):
            # 307 is included so that one accepted code lies outside the cache
            self.accepted_codes = [200, 201, 204, 307]

        def is_accepted(self, code):
            # ❌ WRONG - don't use 'is' for value comparison
            for accepted_code in self.accepted_codes:
                if code is accepted_code:  # BUG HERE!
                    return True
            return False

    class FixedStatusCodeChecker:
        """Correct implementation"""
        def __init__(self):
            self.accepted_codes = [200, 201, 204, 307]

        def is_accepted(self, code):
            # ✓ CORRECT - use 'in' or '=='
            return code in self.accepted_codes

    # Test both implementations
    buggy = StatusCodeChecker()
    fixed = FixedStatusCodeChecker()

    # This works by accident (200 is cached)
    response_code = 200
    print("Code 200:")
    print(f"  Buggy version: {buggy.is_accepted(response_code)}")
    print(f"  Fixed version: {fixed.is_accepted(response_code)}")

    # This works (201 is cached)
    response_code = 201
    print("\nCode 201:")
    print(f"  Buggy version: {buggy.is_accepted(response_code)}")
    print(f"  Fixed version: {fixed.is_accepted(response_code)}")

    # But a redirect code (307) is not cached
    # In reality, the code might come from an HTTP library
    # that creates new integer objects
    def get_redirect_code():
        return int("307")  # Creates new object

    response_code = get_redirect_code()
    print("\nCode 307 (from function):")
    print(f"  Buggy version: {buggy.is_accepted(response_code)}")  # False - the bug
    print(f"  Fixed version: {fixed.is_accepted(response_code)}")  # True

    print("\n💡 The bug is intermittent based on whether integers happen")
    print("   to be the same object - the worst kind of bug to debug!")

real_world_bug_example()
"These bugs are insidious," Margaret warned, "because they work most of the time. Small integers pass the is check by accident, so the bug only surfaces with larger values or when integers come from certain sources like parsing or computation."
Key Takeaways
Margaret summarized the lesson:
"""
INTEGER CACHE KEY TAKEAWAYS:

1. CPython caches integers from -5 to 256
   - These are shared, singleton-like objects
   - All variables with these values point to the same object
   - Memory optimization for commonly-used integers

2. Identity (is) vs Equality (==)
   - 'is' checks: Are they the same object in memory?
   - '==' checks: Are their values equal?
   - ALWAYS use '==' for value comparison

3. The cache is an implementation detail
   - Don't write code that depends on it
   - Compiler may cache other values in certain contexts
   - Behavior outside -5 to 256 is not guaranteed

4. Correct usage patterns
   - ✓ Use '==' for comparing integer values
   - ✓ Use 'is None' for None checks
   - ✗ Don't use 'is' for integer value comparison
   - ✗ Don't rely on identity of large integers

5. Why caching is safe
   - Integers are immutable
   - Sharing immutable objects can't cause mutations
   - Operations create new objects rather than modifying

6. Real-world impact
   - Modest performance benefit
   - Significant memory savings for common integers
   - Source of bugs when 'is' is misused
   - Important for understanding Python's object model
"""
Timothy leaned back, understanding dawning. "So the integer cache is Python being clever about memory - keeping one copy of commonly-used numbers and handing out references. It works because numbers can't change, so sharing is safe. And I should forget about it in my code and just use == for comparisons."
"Perfect summary," Margaret said. "The cache is a brilliant optimization that's completely transparent when you follow the rules: use == for values, is for identity checks like if x is None. Do that, and the cache does its job silently in the background."
With that, Timothy understood not just Python's integer cache, but the deeper principles of identity versus equality and why immutability enables safe optimization.
Aaron Rose is a software engineer and technology writer at tech-reader.blog and the author of Think Like a Genius.