Python's collections Module: Beyond defaultdict and Counter — 4 Hidden Gems That Solve Real Problems
Every Python developer knows defaultdict and Counter. They're the first things you reach for when you need to group items or count occurrences. But the collections module has more to offer, and these lesser-known tools can dramatically simplify your code.
Here's the thing: I used to write a lot of boilerplate for things like tracking configuration scopes, building lightweight data containers, or implementing sliding window algorithms. Then I actually read the collections docs beyond the first two entries. Here's what I found.
1. ChainMap — When You Need Multiple Dicts as One
The problem: You have layered configuration — default settings, environment overrides, user preferences, command-line args. You want to check each layer in order without merging them manually.
from collections import ChainMap
import os
defaults = {"host": "localhost", "port": 5432, "debug": False}
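# Only add the override when DB_PORT is actually set; env values are strings, so convert to int
env_overrides = {"port": int(os.environ["DB_PORT"])} if "DB_PORT" in os.environ else {}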
user_config = {"host": "prod-db.internal", "debug": True}
# ChainMap checks each dict in order — first match wins
config = ChainMap(user_config, env_overrides, defaults)
print(config["host"]) # "prod-db.internal" (from user_config)
print(config["port"]) # 5432 (from env_overrides if set, else defaults)
print(config["debug"]) # True (from user_config)
Why this matters:
- No {**a, **b, **c} merging that creates new dicts every time
- Updates to any underlying dict are reflected immediately: ChainMap is a view, not a copy
- .maps gives you the list of dicts for inspection
- .new_child() pushes a new layer onto the chain, which is perfect for context managers
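To make the last two points concrete, here is a minimal sketch of a context manager built on new_child(). The helper name scoped and the sample keys are my own placeholders, not part of collections.

```python
from collections import ChainMap
from contextlib import contextmanager


@contextmanager
def scoped(chain, **overrides):
    """Yield a child ChainMap with `overrides` layered on top (hypothetical helper)."""
    # new_child() returns a new ChainMap; the parent's dicts are shared, not copied
    yield chain.new_child(overrides)


base = {"theme": "light", "lang": "en"}
settings = ChainMap(base)

with scoped(settings, theme="dark") as cfg:
    inside_theme = cfg["theme"]   # found in the temporary layer
    cfg["lang"] = "fr"            # writes land in the first (temporary) map only
    base["tz"] = "UTC"            # mutate the underlying dict...
    inside_tz = cfg["tz"]         # ...and the view sees it immediately

print(inside_theme, inside_tz)               # dark UTC
print(settings["theme"], settings["lang"])   # light en
```

Because the child is a separate ChainMap, nothing needs to be undone when the with block exits: the temporary layer simply goes out of scope.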
Real pattern: scoped config with layered overrides
class ScopedConfig:
    def __init__(self, *configs):
        self._chain = ChainMap(*configs)

    def __getitem__(self, key):
        return self._chain[key]

    def __setitem__(self, key, value):
        # Writes always go to the first (most specific) layer
        self._chain.maps[0][key] = value

    def scope(self, **overrides):
        # New config with the overrides in front; the existing layers are shared, not copied
        return ScopedConfig(overrides, *self._chain.maps)
# Usage
base = ScopedConfig({"theme": "light", "lang": "en"})
admin_view = base.scope(theme="dark")
print(admin_view["theme"]) # "dark"
print(admin_view["lang"]) # "en" — falls through to base
2. namedtuple — Readable Data Without a Class Definition
The problem: You need a simple data holder — coordinates, database rows, API responses. Tuples work but row[0] is meaningless. Classes work but __init__ + __repr__ boilerplate for 3 fields is tiresome.
from collections import namedtuple
Point = namedtuple("Point", ["x", "y", "z"])
p = Point(1, 2, 3)
print(p.x) # 1 — named access
print(p[0]) # 1 — still indexable
x, y, z = p # unpacking works
print(p) # Point(x=1, y=2, z=3) — free repr
The hidden superpower — _replace for immutable updates:
# Want to change one field? Create a new instance with _replace
p2 = p._replace(z=10)
print(p2) # Point(x=1, y=2, z=10)
print(p) # Point(x=1, y=2, z=3) — original unchanged
Real pattern — database row wrapper:
Row = namedtuple("Row", ["id", "title", "status", "created_at"])
def query(sql):
    # ... your database code ...
    return [
        Row(id=1, title="Fix login bug", status="open", created_at="2026-06-10"),
        Row(id=2, title="Add dark mode", status="in_progress", created_at="2026-06-11"),
    ]
rows = query("SELECT * FROM tasks")
open_tasks = [r for r in rows if r.status == "open"] # r.status reads better than r[2]
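If you want to try the row wrapper against a real driver, here is one way it could look with the standard library's sqlite3 and an in-memory database. The tasks table and the extra conn/params arguments on query are assumptions for this sketch; the only namedtuple features used are _make() and _asdict().

```python
import sqlite3
from collections import namedtuple

Row = namedtuple("Row", ["id", "title", "status", "created_at"])


def query(conn, sql, params=()):
    """Run `sql` and wrap each result tuple in a Row.

    The SELECT must return columns in the same order as Row's fields.
    """
    return [Row._make(record) for record in conn.execute(sql, params)]


# Throwaway in-memory database with a hypothetical `tasks` table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tasks (id INTEGER, title TEXT, status TEXT, created_at TEXT)")
conn.executemany(
    "INSERT INTO tasks VALUES (?, ?, ?, ?)",
    [
        (1, "Fix login bug", "open", "2026-06-10"),
        (2, "Add dark mode", "in_progress", "2026-06-11"),
    ],
)

rows = query(conn, "SELECT id, title, status, created_at FROM tasks ORDER BY id")
open_tasks = [r for r in rows if r.status == "open"]
print(open_tasks[0].title)            # Fix login bug
print(rows[1]._asdict()["status"])    # in_progress
conn.close()
```

Listing the columns explicitly instead of SELECT * keeps the field order under your control, which matters because _make() fills fields by position.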
Pro tip: For Python 3.7+, consider dataclasses when you need mutable data with type hints. Use namedtuple when you want immutability by default and tuple-like behavior.
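A quick side-by-side makes the trade-off easier to see. PointNT and PointDC are throwaway names for this sketch; note that the dataclass type hints are annotations only and are not enforced at runtime.

```python
from collections import namedtuple
from dataclasses import dataclass

PointNT = namedtuple("PointNT", ["x", "y"])


@dataclass
class PointDC:
    x: float
    y: float


nt = PointNT(1.0, 2.0)
dc = PointDC(1.0, 2.0)

dc.x = 5.0                       # dataclass fields are mutable by default
try:
    nt.x = 5.0                   # namedtuple fields are read-only
except AttributeError:
    nt = nt._replace(x=5.0)      # build a modified copy instead

x, y = nt                        # tuple behavior: unpacking works
print(nt, dc)                    # PointNT(x=5.0, y=2.0) PointDC(x=5.0, y=2.0)
print(nt == (5.0, 2.0))          # True: a namedtuple compares equal to a plain tuple
```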
3. deque — The Data Structure You're Probably Implementing Wrong
The problem: You need a queue, a sliding window, or undo history. You use a list with .pop(0) — O(n) every time. Or you write a circular buffer from scratch.
from collections import deque
# Fixed-size rolling buffer — perfect for recent history
history = deque(maxlen=5)
for i in range(10):
    history.append(f"action_{i}")
print(list(history))
# ['action_5', 'action_6', 'action_7', 'action_8', 'action_9']
# Old entries are automatically discarded
Real pattern — sliding window average (O(1) per element):
def moving_average(iterable, window_size=3):
    window = deque(maxlen=window_size)
    total = 0
    for value in iterable:
        if len(window) == window_size:
            total -= window[0]  # subtract the value leaving the window
        window.append(value)    # maxlen evicts the oldest value automatically
        total += value
        yield total / len(window)
prices = [100, 102, 101, 105, 110, 108, 107]
avgs = list(moving_average(prices))
print([round(a, 2) for a in avgs]) # [100.0, 101.0, 101.0, 102.67, 105.33, 107.67, 108.33]
Real pattern — bidirectional undo/redo:
class UndoBuffer:
    def __init__(self, max_history=50):
        self._undo = deque(maxlen=max_history)
        self._redo = deque()

    def record(self, state):
        self._undo.append(state)
        self._redo.clear()  # new action invalidates redo
        return True

    def undo(self):
        # Keep at least one state on the undo stack as the current state
        if len(self._undo) < 2:
            return self._undo[-1] if self._undo else None
        current = self._undo.pop()
        self._redo.appendleft(current)
        return self._undo[-1]

    def redo(self):
        if not self._redo:
            return None
        state = self._redo.popleft()
        self._undo.append(state)
        return state
Why not a list?
- list.pop(0) is O(n): it shifts every element. deque.popleft() is O(1).
- deque(maxlen=N) automatically evicts old items, so there's no manual slicing.
- Thread-safe .append() and .popleft(): safe for simple producer-consumer.
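The producer-consumer point is easier to judge with a small example. This is a minimal sketch with one producer and one consumer thread; run_pipeline and the sentinel object are names I made up for illustration. When the consumer needs to block until work arrives, queue.Queue is usually the better fit, since a bare deque has to be polled.

```python
import threading
import time
from collections import deque


def run_pipeline(n_items=500):
    """Pass n_items integers from a producer thread to a consumer thread (illustrative)."""
    queue = deque()
    results = []
    sentinel = object()  # unique marker telling the consumer to stop

    def producer():
        for i in range(n_items):
            queue.append(i)        # individual appends are thread-safe
        queue.append(sentinel)

    def consumer():
        while True:
            try:
                item = queue.popleft()   # individual pops are thread-safe
            except IndexError:
                time.sleep(0.001)        # queue is empty: back off briefly and retry
                continue
            if item is sentinel:
                break
            results.append(item)

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


results = run_pipeline()
print(len(results), results[:3])   # 500 [0, 1, 2]
```

With a single producer and a single consumer, items come out in the order they went in. Only the individual append and popleft calls are atomic; anything that combines several operations still needs its own lock.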
4. UserDict — When You Need to Add Behavior to a Dict
The problem: You subclass dict directly and hit obscure edge cases. __init__ doesn't call __setitem__ in subclasses. update() doesn't use your custom __setitem__. The CPython internals fight you at every turn.
from collections import UserDict
class CaseInsensitiveDict(UserDict):
    def __setitem__(self, key, value):
        super().__setitem__(key.lower(), value)

    def __getitem__(self, key):
        return super().__getitem__(key.lower())

    def __contains__(self, key):
        return super().__contains__(key.lower())
config = CaseInsensitiveDict({"Host": "localhost", "PORT": 8080})
print(config["host"]) # "localhost"
print(config["Port"]) # 8080
print("HOST" in config) # True — all variations work
Why UserDict over subclassing dict:
- UserDict wraps a real dict in self.data, so all methods go through your overrides
- Subclassing dict directly has quirks: __init__ bypasses __setitem__, copy() returns a plain dict, update() ignores custom __setitem__
- UserDict is a regular class: you can inspect it, debug it, mock it without worrying about C-level internals
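You can see those quirks directly by writing the same override twice, once on dict and once on UserDict. LowerDict and LowerUserDict are illustrative names; the behavior shown is what I'd expect on CPython.

```python
from collections import UserDict


class LowerDict(dict):
    def __setitem__(self, key, value):
        super().__setitem__(key.lower(), value)


class LowerUserDict(UserDict):
    def __setitem__(self, key, value):
        super().__setitem__(key.lower(), value)


plain = LowerDict({"Host": "a"})   # the constructor bypasses __setitem__
plain.update({"PORT": 1})          # so does update()
plain["Debug"] = True              # only direct assignment is intercepted

wrapped = LowerUserDict({"Host": "a"})
wrapped.update({"PORT": 1})
wrapped["Debug"] = True

print(sorted(plain))     # ['Host', 'PORT', 'debug']
print(sorted(wrapped))   # ['debug', 'host', 'port']
print(type(plain.copy()).__name__, type(wrapped.copy()).__name__)   # dict LowerUserDict
```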
Real pattern — validated config store:
class ValidatedConfig(UserDict):
    # key -> (expected type, validator returning True for acceptable values)
    SCHEMA = {
        "port": (int, lambda v: 1 <= v <= 65535),
        "host": (str, lambda v: len(v) > 0),
        "timeout": (float, lambda v: v > 0),
    }

    def __setitem__(self, key, value):
        key = key.lower()
        if key not in self.SCHEMA:
            raise KeyError(f"Unknown config key: {key}")
        expected_type, validator = self.SCHEMA[key]
        if not isinstance(value, expected_type):
            raise TypeError(f"{key}: expected {expected_type.__name__}, got {type(value).__name__}")
        if not validator(value):
            raise ValueError(f"{key}: validation failed for {value!r}")
        super().__setitem__(key, value)
cfg = ValidatedConfig({"port": 8080, "host": "localhost", "timeout": 30.0})
cfg["port"] = 70000 # ValueError: port: validation failed for 70000
Bonus: Combining Them in Real Code
Here's a pattern I use regularly, a layered config where each layer records its changes and every lookup is logged:
from collections import ChainMap, UserDict, deque
from datetime import datetime
class ConfigLayer(UserDict):
    def __init__(self, name, **defaults):
        # Set attributes first: UserDict.__init__ routes defaults through __setitem__
        self.name = name
        self._history = deque(maxlen=100)
        super().__init__(**defaults)

    def __setitem__(self, key, value):
        old = self.data.get(key)
        super().__setitem__(key, value)
        # Record only real changes to an existing, non-None value
        if old is not None and old != value:
            self._history.append((key, old, value, datetime.now()))


class LayeredConfig:
    def __init__(self):
        self._layers = []
        self._lookup_log = deque(maxlen=1000)

    def add_layer(self, name, **values):
        self._layers.append(ConfigLayer(name, **values))

    def __getitem__(self, key):
        # Later layers win, so search them first
        chain = ChainMap(*[layer.data for layer in reversed(self._layers)])
        value = chain[key]
        self._lookup_log.append((key, value, datetime.now()))
        return value

    def __setitem__(self, key, value):
        # Writes go to the most recently added layer
        if self._layers:
            self._layers[-1][key] = value
# Usage
config = LayeredConfig()
config.add_layer("defaults", host="localhost", port=3000, debug=False)
config.add_layer("environment", port=8080)
print(config["port"]) # 8080
print(config["host"]) # "localhost"
When NOT to Use These
- Skip ChainMap if you only have 2 small dicts that never change: {**a, **b} reads cleaner.
- Skip namedtuple if you need mutable fields with type hints: use dataclasses instead (Python 3.7+). Neither one validates types at runtime.
- Skip deque if your queue never exceeds ~100 items: Python list overhead doesn't matter at that scale.
- Skip UserDict if you're just adding one method to a dict: a standalone function is simpler.
The collections module is one of those standard library gems where every tool fills a specific, well-designed niche. Learning them didn't just make my code shorter — it made it more explicit about what pattern I was using. A deque with maxlen tells the reader "this is a rolling buffer" more clearly than any comment or list slice ever could.
What's your go-to collections tool that most people don't know about?
Follow me for more Python deep dives — next up: how functools.lru_cache works under the hood and when it actually hurts performance.