Kai Thorne

Posted on Jun 14

Python's collections Module: Beyond defaultdict and Counter — 4 Hidden Gems That Solve Real Problems

#python #programming #tutorial #learning

Every Python developer knows defaultdict and Counter. They're the first things you reach for when you need to group items or count occurrences. But the collections module has more to offer, and these lesser-known tools can dramatically simplify your code.

Here's the thing: I used to write a lot of boilerplate for things like tracking configuration scopes, building lightweight data containers, or implementing sliding window algorithms. Then I actually read the collections docs beyond the first two entries. Here's what I found.

1. `ChainMap` — When You Need Multiple Dicts as One

The problem: You have layered configuration — default settings, environment overrides, user preferences, command-line args. You want to check each layer in order without merging them manually.

from collections import ChainMap
import os

defaults = {"host": "localhost", "port": 5432, "debug": False}
# Only add the override when DB_PORT is set; env values are strings, so convert to int
env_overrides = {"port": int(os.environ["DB_PORT"])} if "DB_PORT" in os.environ else {}
user_config = {"host": "prod-db.internal", "debug": True}

# ChainMap checks each dict in order; the first match wins
config = ChainMap(user_config, env_overrides, defaults)

print(config["host"])    # "prod-db.internal" (from user_config)
print(config["port"])    # int(DB_PORT) if set, else 5432 (from defaults)
print(config["debug"])   # True (from user_config)

Why this matters:

No {**a, **b, **c} merging that creates new dicts every time
Updates to any underlying dict are reflected immediately — ChainMap is a view, not a copy
.maps gives you the list of dicts for inspection
.new_child() pushes a new layer onto the chain — perfect for context managers
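
To make those four points concrete, here is a small sketch. The dict names and values (`defaults`, `user`, `request_scope`, the theme strings) are made up for illustration; only `ChainMap`, `.maps`, `.new_child()` and `.parents` come from the standard library.

```python
from collections import ChainMap

defaults = {"theme": "light", "lang": "en"}
user = {"theme": "dark"}
config = ChainMap(user, defaults)

# A ChainMap is a view: later changes to the underlying dicts show through.
defaults["lang"] = "fr"
print(config["lang"])          # fr

# A merged dict is a snapshot taken at merge time.
merged = {**defaults, **user}
defaults["lang"] = "de"
print(merged["lang"])          # fr
print(config["lang"])          # de

# new_child() returns a new ChainMap with an extra dict in front.
request_scope = config.new_child({"theme": "high-contrast"})
print(request_scope["theme"])  # high-contrast
print(config["theme"])         # dark (the parent chain is untouched)

# Writes only ever touch the first mapping in the chain.
request_scope["lang"] = "es"
print(request_scope.maps[0])   # {'theme': 'high-contrast', 'lang': 'es'}
print(defaults["lang"])        # de

# .parents skips the first mapping, which effectively undoes new_child().
print(request_scope.parents["theme"])  # dark
```

The write behavior is the part that tends to surprise people: lookups search every layer, but assignments and deletions only affect `maps[0]`.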

Real pattern (scoped config with per-scope overrides):

class ScopedConfig:
    def __init__(self, *configs):
        self._chain = ChainMap(*configs)

    def __getitem__(self, key):
        return self._chain[key]

    def __setitem__(self, key, value):
        self._chain.maps[0][key] = value

    def scope(self, **overrides):
        return ScopedConfig(overrides, *self._chain.maps)

# Usage
base = ScopedConfig({"theme": "light", "lang": "en"})
admin_view = base.scope(theme="dark")
print(admin_view["theme"])  # "dark"
print(admin_view["lang"])   # "en" — falls through to base

2. `namedtuple` — Readable Data Without a Class Definition

The problem: You need a simple data holder — coordinates, database rows, API responses. Tuples work but row[0] is meaningless. Classes work but __init__ + __repr__ boilerplate for 3 fields is tiresome.

from collections import namedtuple

Point = namedtuple("Point", ["x", "y", "z"])
p = Point(1, 2, 3)

print(p.x)        # 1 — named access
print(p[0])       # 1 — still indexable
x, y, z = p       # unpacking works

print(p)          # Point(x=1, y=2, z=3) — free repr

The hidden superpower — _replace for immutable updates:

# Want to change one field? Create a new instance with _replace
p2 = p._replace(z=10)
print(p2)  # Point(x=1, y=2, z=10)
print(p)   # Point(x=1, y=2, z=3) — original unchanged

Real pattern — database row wrapper:

Row = namedtuple("Row", ["id", "title", "status", "created_at"])

def query(sql):
    # ... your database code ...
    return [
        Row(id=1, title="Fix login bug", status="open", created_at="2026-06-10"),
        Row(id=2, title="Add dark mode", status="in_progress", created_at="2026-06-11"),
    ]

rows = query("SELECT * FROM tasks")
open_tasks = [r for r in rows if r.status == "open"]  # r.status reads better than r[2]
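
If the rows come from a real driver, `Row._make()` builds a namedtuple from any iterable, so the wrapper can stay tiny. Below is a minimal sketch using an in-memory `sqlite3` database; the `tasks` table and its contents are assumptions for illustration, not part of the example above.

```python
import sqlite3
from collections import namedtuple

Row = namedtuple("Row", ["id", "title", "status", "created_at"])

# In-memory database with a hypothetical tasks table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tasks (id INTEGER, title TEXT, status TEXT, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO tasks VALUES (?, ?, ?, ?)",
    [
        (1, "Fix login bug", "open", "2026-06-10"),
        (2, "Add dark mode", "in_progress", "2026-06-11"),
    ],
)

def query(sql):
    # The column order in the SELECT must match the order of Row's fields.
    cursor = conn.execute(sql)
    return [Row._make(record) for record in cursor]

rows = query("SELECT id, title, status, created_at FROM tasks ORDER BY id")
open_tasks = [r for r in rows if r.status == "open"]

print(open_tasks[0].title)  # Fix login bug
print(rows[1]._asdict())    # plain dict, handy for JSON serialization
```

Listing the columns explicitly instead of `SELECT *` keeps the field order under your control, which matters because `_make()` assigns values by position.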

Pro tip: For Python 3.7+, consider dataclasses when you need mutable data with type hints. Use namedtuple when you want immutability by default and tuple-like behavior.
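
A quick side-by-side shows the trade-off. The class names `PointNT` and `PointDC` are hypothetical, chosen only for this comparison.

```python
from collections import namedtuple
from dataclasses import dataclass

PointNT = namedtuple("PointNT", ["x", "y"])

@dataclass
class PointDC:
    x: float
    y: float

nt = PointNT(1.0, 2.0)
dc = PointDC(1.0, 2.0)

# Dataclass fields are mutable by default.
dc.x = 5.0
print(dc)  # PointDC(x=5.0, y=2.0)

# namedtuple fields cannot be reassigned.
try:
    nt.x = 5.0
except AttributeError:
    print("namedtuple fields are read-only")

# A namedtuple is still a tuple: it unpacks and compares like one.
x, y = nt
print(nt == (1.0, 2.0))  # True

# A dataclass instance is not iterable, so unpacking fails.
try:
    x, y = dc
except TypeError:
    print("dataclass instances are not iterable")
```

If you want immutability without the tuple behavior, `@dataclass(frozen=True)` is the middle ground.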

3. `deque` — The Data Structure You're Probably Implementing Wrong

The problem: You need a queue, a sliding window, or undo history. You use a list with .pop(0) — O(n) every time. Or you write a circular buffer from scratch.

from collections import deque

# Fixed-size rolling buffer — perfect for recent history
history = deque(maxlen=5)
for i in range(10):
    history.append(f"action_{i}")

print(list(history))
# ['action_5', 'action_6', 'action_7', 'action_8', 'action_9']
# Old entries are automatically discarded

Real pattern — sliding window average (O(1) per element):

def moving_average(iterable, window_size=3):
    window = deque(maxlen=window_size)
    total = 0
    for value in iterable:
        if len(window) == window_size:
            total -= window[0]  # subtract the value leaving the window
        window.append(value)
        total += value
        yield total / len(window)

prices = [100, 102, 101, 105, 110, 108, 107]
avgs = list(moving_average(prices))
print([round(a, 2) for a in avgs])  # [100.0, 101.0, 101.0, 102.67, 105.33, 107.67, 108.33]

Real pattern — bidirectional undo/redo:

class UndoBuffer:
    def __init__(self, max_history=50):
        self._undo = deque(maxlen=max_history)
        self._redo = deque()

    def record(self, state):
        self._undo.append(state)
        self._redo.clear()  # new action invalidates redo
        return True

    def undo(self):
        if len(self._undo) < 2:
            return self._undo[-1] if self._undo else None
        current = self._undo.pop()
        self._redo.appendleft(current)
        return self._undo[-1]

    def redo(self):
        if not self._redo:
            return None
        state = self._redo.popleft()
        self._undo.append(state)
        return state

Why not a list?

list.pop(0) is O(n) — shifts every element. deque.popleft() is O(1).
deque(maxlen=N) automatically evicts old items — no manual slicing.
Thread-safe .append() and .popleft() — safe for simple producer-consumer.
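
Here is a rough sketch of the producer-consumer case from the last point. The function name `run_pipeline`, the `DONE` sentinel and the doubling step are all made up for illustration, and the consumer simply polls when the deque is empty.

```python
import threading
import time
from collections import deque

def run_pipeline(n_items=1000):
    queue = deque()
    results = []
    done = object()  # sentinel telling the consumer to stop

    def producer():
        for i in range(n_items):
            queue.append(i)
        queue.append(done)

    def consumer():
        while True:
            try:
                item = queue.popleft()
            except IndexError:
                # Deque is momentarily empty: back off briefly and retry.
                time.sleep(0.001)
                continue
            if item is done:
                return
            results.append(item * 2)

    threads = [
        threading.Thread(target=producer),
        threading.Thread(target=consumer),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

results = run_pipeline()
print(len(results))  # 1000
print(results[:3])   # [0, 2, 4]
```

A deque has no blocking `get()`, so the consumer has to poll. Once you need blocking, timeouts, or several consumers, `queue.Queue` is usually the better fit.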

4. `UserDict` — When You Need to Add Behavior to a Dict

The problem: You subclass dict directly and hit obscure edge cases. __init__ doesn't call __setitem__ in subclasses. update() doesn't use your custom __setitem__. The CPython internals fight you at every turn.

from collections import UserDict

class CaseInsensitiveDict(UserDict):
    def __setitem__(self, key, value):
        super().__setitem__(key.lower(), value)

    def __getitem__(self, key):
        return super().__getitem__(key.lower())

    def __contains__(self, key):
        return super().__contains__(key.lower())

config = CaseInsensitiveDict({"Host": "localhost", "PORT": 8080})
print(config["host"])    # "localhost"
print(config["Port"])    # 8080
print("HOST" in config)  # True — all variations work

Why UserDict over subclassing dict:

UserDict wraps a real dict in self.data — all methods go through your overrides
Subclassing dict directly has quirks: __init__ bypasses __setitem__, copy() returns a plain dict, update() ignores custom __setitem__
UserDict is a regular class — you can inspect it, debug it, mock it without worrying about C-level internals
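
The quirks are easy to reproduce. In this sketch, `LowerDict` and `LowerUserDict` are hypothetical classes that apply the same key-lowercasing override to `dict` and to `UserDict`.

```python
from collections import UserDict

class LowerDict(dict):
    def __setitem__(self, key, value):
        super().__setitem__(key.lower(), value)

class LowerUserDict(UserDict):
    def __setitem__(self, key, value):
        super().__setitem__(key.lower(), value)

plain = LowerDict({"Host": "localhost"})   # constructor bypasses __setitem__
plain.update({"PORT": 8080})               # update() bypasses it too
plain["Debug"] = True                      # only direct assignment is lowercased
print(sorted(plain))                       # ['Host', 'PORT', 'debug']

wrapped = LowerUserDict({"Host": "localhost"})
wrapped.update({"PORT": 8080})
wrapped["Debug"] = True
print(sorted(wrapped))                     # ['debug', 'host', 'port']

# copy() loses the subclass for dict, but keeps it for UserDict.
print(type(plain.copy()).__name__)         # dict
print(type(wrapped.copy()).__name__)       # LowerUserDict
```

With the `dict` subclass you would have to override `__init__`, `update()`, `setdefault()` and `copy()` yourself to close those gaps.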

Real pattern — validated config store:

class ValidatedConfig(UserDict):
    SCHEMA = {
        "port": (int, lambda v: 1 <= v <= 65535),
        "host": (str, lambda v: len(v) > 0),
        "timeout": (float, lambda v: v > 0),
    }

    def __setitem__(self, key, value):
        key = key.lower()
        if key not in self.SCHEMA:
            raise KeyError(f"Unknown config key: {key}")
        expected_type, validator = self.SCHEMA[key]
        if not isinstance(value, expected_type):
            raise TypeError(f"{key}: expected {expected_type.__name__}, got {type(value).__name__}")
        if not validator(value):
            raise ValueError(f"{key}: validation failed for {value!r}")
        super().__setitem__(key, value)

cfg = ValidatedConfig({"port": 8080, "host": "localhost", "timeout": 30.0})
cfg["port"] = 70000  # ValueError: port: validation failed for 70000

Bonus: Combining Them in Real Code

Here's a pattern I use regularly: a layered config with per-layer change history and a lookup log.

from collections import ChainMap, UserDict, deque
from datetime import datetime

class ConfigLayer(UserDict):
    def __init__(self, name, **defaults):
        # Set attributes first: UserDict.__init__ routes defaults through __setitem__
        self.name = name
        self._history = deque(maxlen=100)
        super().__init__(**defaults)

    def __setitem__(self, key, value):
        old = self.data.get(key)
        super().__setitem__(key, value)
        # Record only real changes; initial values (old is None) are not logged
        if old is not None and old != value:
            self._history.append((key, old, value, datetime.now()))

class LayeredConfig:
    def __init__(self):
        self._layers = []
        self._lookup_log = deque(maxlen=1000)

    def add_layer(self, name, **values):
        self._layers.append(ConfigLayer(name, **values))

    def __getitem__(self, key):
        # Reverse so the most recently added layer is searched first
        chain = ChainMap(*[layer.data for layer in reversed(self._layers)])
        value = chain[key]
        self._lookup_log.append((key, value, datetime.now()))
        return value

    def __setitem__(self, key, value):
        if self._layers:
            self._layers[-1][key] = value

# Usage
config = LayeredConfig()
config.add_layer("defaults", host="localhost", port=3000, debug=False)
config.add_layer("environment", port=8080)

print(config["port"])    # 8080
print(config["host"])    # "localhost"

When NOT to Use These

Skip ChainMap if you only have 2 small dicts that never change — {**a, **b} reads cleaner.
Skip namedtuple if you need mutable fields with type hints; use dataclasses instead (Python 3.7+). Keep in mind that dataclasses do not validate types at runtime.
Skip deque if your queue never exceeds ~100 items — Python list overhead doesn't matter at that scale.
Skip UserDict if you're just adding one method to a dict — a standalone function is simpler.

The collections module is one of those standard library gems where every tool fills a specific, well-designed niche. Learning them didn't just make my code shorter — it made it more explicit about what pattern I was using. A deque with maxlen tells the reader "this is a rolling buffer" more clearly than any comment or list slice ever could.

What's your go-to collections tool that most people don't know about?

Follow me for more Python deep dives — next up: how functools.lru_cache works under the hood and when it actually hurts performance.

