German Yamil

Posted on May 15

Python collections: defaultdict, Counter, deque, and namedtuple

#python #codenewbie #beginners #tutorial

Python's standard library includes a collections module that fixes the most common pain points with plain dicts, lists, and tuples. Most beginners reach for a regular dict or list out of habit, then write ten extra lines of boilerplate to work around its limitations. This article walks you through the six most useful types in collections, with clear before/after examples so you know exactly when to swap.

🎁 Free: AI Publishing Checklist — 7 steps in Python · Full pipeline: germy5.gumroad.com/l/xhxkzz (pay what you want, min $9.99)

Why the `collections` module exists

Plain dicts raise KeyError when you access a missing key. Plain lists are slow when you need to insert or remove from the front. Tuples have no field names, so point[0] tells you nothing. The collections module exists to solve these exact problems without requiring you to write a class or import a heavy library.

from collections import defaultdict, Counter, deque, namedtuple, OrderedDict, ChainMap

That single import unlocks six specialized containers, each a near drop-in replacement for a plain dict, list, or tuple in the right situation.

`defaultdict` — no more "if key not in dict"

Every time you try to group or count with a plain dict, you write the same defensive check:

# Before: plain dict, grouping articles by tag
articles_by_tag = {}
for article, tag in data:
    if tag not in articles_by_tag:
        articles_by_tag[tag] = []
    articles_by_tag[tag].append(article)

defaultdict replaces those three lines with one. You pass a factory function — list, int, set — and any missing key is automatically initialized with that factory's return value:

# After: defaultdict
from collections import defaultdict

articles_by_tag = defaultdict(list)
for article, tag in data:
    articles_by_tag[tag].append(article)  # no KeyError, ever

Use defaultdict(int) for counting, defaultdict(set) for deduplication, defaultdict(list) for grouping. The factory can be any callable, including a lambda:

stats = defaultdict(lambda: {"count": 0, "views": 0})
stats["python"]["count"] += 1
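The other two factories mentioned above can be sketched the same way. This is a minimal illustration with made-up sample data (the `pairs` list is hypothetical), and it also shows one behavior worth knowing: looking up a missing key with square brackets inserts it, while `.get()` and `in` do not.

```python
from collections import defaultdict

# Hypothetical sample data: (article, tag) pairs, with one duplicate
pairs = [("intro", "python"), ("tips", "python"), ("intro", "python"), ("rebase", "git")]

# defaultdict(int): missing keys start at 0, so += works immediately
tag_totals = defaultdict(int)
for _, tag in pairs:
    tag_totals[tag] += 1

# defaultdict(set): missing keys start as an empty set, duplicates collapse
unique_articles = defaultdict(set)
for article, tag in pairs:
    unique_articles[tag].add(article)

print(tag_totals["python"])               # 3
print(sorted(unique_articles["python"]))  # ['intro', 'tips']

# Caveat: reading a missing key with [] creates it
_ = tag_totals["rust"]
print("rust" in tag_totals)    # True

# .get() and `in` do not trigger the factory
print(tag_totals.get("go"))    # None
print("go" in tag_totals)      # False
```

If you only want to check for a key without creating it, prefer `in` or `.get()` over square brackets.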

`Counter` — count anything in one line

Counting the frequency of items in a list normally takes a loop and a dict. Counter collapses that entirely:

# Before
word_counts = {}
for word in words:
    word_counts[word] = word_counts.get(word, 0) + 1

# After
from collections import Counter

word_counts = Counter(words)

Counter accepts any iterable — strings, lists, generator expressions. It comes with most_common(n), which returns the top-n items sorted by count:

top5 = word_counts.most_common(5)
# [('python', 42), ('tutorial', 31), ...]

Counters also support arithmetic: add two Counters to combine tallies, subtract to find the difference, use & for intersection (minimum counts) and | for union (maximum counts):

c1 = Counter(python=10, javascript=5)
c2 = Counter(python=3, rust=8)
print(c1 + c2)  # Counter({'python': 13, 'rust': 8, 'javascript': 5})
print(c1 - c2)  # Counter({'python': 7, 'javascript': 5})
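The intersection and union operators follow the same pattern. Here is a short sketch reusing the two Counters above; the `c3` variable is only there to contrast the `-` operator with the in-place `subtract()` method, which keeps zero and negative counts.

```python
from collections import Counter

c1 = Counter(python=10, javascript=5)
c2 = Counter(python=3, rust=8)

# & keeps only keys present in both, with the smaller count
print(c1 & c2)  # Counter({'python': 3})

# | keeps every key, with the larger count
print(c1 | c2)  # Counter({'python': 10, 'rust': 8, 'javascript': 5})

# subtract() mutates in place and keeps negative counts,
# whereas the - operator drops anything that is not positive
c3 = Counter(c1)
c3.subtract(c2)
print(c3["rust"])  # -8
```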

`deque` — efficient append/pop from both ends

Python lists are fast at appending to the right, but list.insert(0, x) and list.pop(0) are O(n) — they shift every element. deque (double-ended queue) does both in O(1):

# Before: using list as a queue (slow left-side operations)
queue = []
queue.append("task_a")
queue.insert(0, "urgent_task")   # O(n) — shifts everything
next_task = queue.pop(0)         # O(n) — shifts everything

# After: deque
from collections import deque

queue = deque()
queue.append("task_a")           # append to right
queue.appendleft("urgent_task")  # O(1) — no shifting
next_task = queue.popleft()      # O(1) — no shifting

The maxlen parameter turns a deque into a sliding window that automatically discards the oldest item when full — perfect for keeping the last N log lines or prices:

recent_views = deque(maxlen=10)
for view in stream:
    recent_views.append(view)    # once 10 items are stored, each append drops the oldest
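One common use of a sliding window is a moving average. The sketch below is an illustration, not part of the original pipeline: the `moving_averages` helper and the `prices` list are hypothetical names chosen for the example.

```python
from collections import deque

def moving_averages(values, window=3):
    """Yield the mean of the last `window` values seen so far."""
    recent = deque(maxlen=window)
    for value in values:
        recent.append(value)  # evicts the oldest item once the window is full
        yield sum(recent) / len(recent)

prices = [10, 20, 30, 40, 50]  # hypothetical sample data
print(list(moving_averages(prices)))
# [10.0, 15.0, 20.0, 30.0, 40.0]
```

Because the deque never grows past `maxlen`, memory use stays constant no matter how long the input stream is.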

deque.rotate(n) shifts all elements by n positions, which is handy for circular buffers or round-robin scheduling:

tasks = deque(["a", "b", "c", "d"])
tasks.rotate(1)   # deque(['d', 'a', 'b', 'c'])
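For round-robin scheduling, a negative argument rotates left, moving the front item to the back. A minimal sketch, assuming a fixed pool of workers (the worker and job names are made up):

```python
from collections import deque

workers = deque(["alice", "bob", "carol"])  # hypothetical worker names
jobs = ["job1", "job2", "job3", "job4"]     # hypothetical jobs

assignments = []
for job in jobs:
    assignments.append((job, workers[0]))  # worker at the front takes the job
    workers.rotate(-1)                     # negative n rotates left: front moves to the back

print(assignments)
# [('job1', 'alice'), ('job2', 'bob'), ('job3', 'carol'), ('job4', 'alice')]
```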

`namedtuple` — lightweight records with field names

Plain tuples force you to remember that point[0] is x and point[1] is y. namedtuple gives you attribute access with zero memory overhead over a regular tuple:

# Before: plain tuple, no context
article = ("Python collections", "tutorial", 1200)
print(article[2])  # What is index 2 again?

# After: namedtuple
from collections import namedtuple

Article = namedtuple("Article", ["title", "tag", "word_count"])
article = Article("Python collections", "tutorial", 1200)
print(article.word_count)  # clear and readable

Two utility methods make namedtuple practical for data pipelines:

_asdict() converts the record to a regular dict in Python 3.8+ (an OrderedDict in earlier versions), useful for serialization.
_replace() returns a copy with specific fields swapped — since namedtuples are immutable, this is how you "update" a record.

published = article._replace(tag="python")
data = article._asdict()  # {'title': 'Python collections', 'tag': 'tutorial', ...}
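To see both methods in a small pipeline step, here is a sketch that builds records from raw rows with `_make()`, retags one, and serializes it to JSON. The `rows` data is hypothetical.

```python
import json
from collections import namedtuple

Article = namedtuple("Article", ["title", "tag", "word_count"])

# Hypothetical raw rows, e.g. as they might come out of a CSV reader
rows = [("Python collections", "tutorial", 1200), ("Git aliases", "git", 800)]
articles = [Article._make(row) for row in rows]  # _make builds a record from any iterable

retagged = articles[0]._replace(tag="python")
print(articles[0].tag)  # tutorial  (the original record is unchanged)
print(retagged.tag)     # python

print(json.dumps(retagged._asdict()))
# {"title": "Python collections", "tag": "python", "word_count": 1200}

print(Article._fields)  # ('title', 'tag', 'word_count')
```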

For more complex records with defaults, validators, or type annotations, see the dataclasses module (link in Further Reading).

`OrderedDict` — insertion order with extras

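Since Python 3.7, regular dicts are guaranteed to preserve insertion order, so OrderedDict is rarely needed. It still offers a few things plain dicts lack: move_to_end(), popitem(last=False) for removing the oldest entry, and equality comparisons that take order into account. move_to_end() is the one you will reach for most: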

from collections import OrderedDict

cache = OrderedDict()
cache["a"] = 1
cache["b"] = 2
cache["c"] = 3

cache.move_to_end("a")           # move "a" to the back
cache.move_to_end("c", last=False)  # move "c" to the front

This makes OrderedDict the natural building block for an LRU (Least Recently Used) cache.
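A minimal sketch of that idea follows. The `LRUCache` class and its `get`/`put` methods are hypothetical names for illustration, not a standard library API, and the sketch is not thread-safe. For memoizing function calls, the standard library already provides functools.lru_cache.

```python
from collections import OrderedDict

class LRUCache:
    """Tiny LRU cache sketch built on OrderedDict; not thread-safe."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key, default=None):
        if key not in self._items:
            return default
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)  # newest entries live at the back
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used (front)

cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")             # "a" is now the most recently used
cache.put("c", 3)          # over capacity, so "b" is evicted
print(list(cache._items))  # ['a', 'c']
```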

`ChainMap` — layered configuration

ChainMap combines multiple dicts into a single view, resolving lookups from left to right. This is ideal for layered config: CLI args override environment variables, which override defaults.

from collections import ChainMap
import os

defaults = {"debug": False, "output": "stdout", "retries": 3}
env_config = {"debug": True} if os.getenv("DEBUG") else {}
cli_args = {"output": "file.log"}

config = ChainMap(cli_args, env_config, defaults)
print(config["debug"])    # True if the DEBUG env var is set (from env_config), else False (from defaults)
print(config["retries"])  # 3     (from defaults)
print(config["output"])   # file.log (from cli_args)

Writes go to the first map, so overrides stay isolated — the defaults dict is never mutated.
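That write behavior is easy to verify. The sketch below uses a smaller, hypothetical config than the one above, and also shows `new_child()`, which pushes a fresh dict in front of the chain for temporary overrides.

```python
from collections import ChainMap

defaults = {"debug": False, "retries": 3}
cli_args = {}

config = ChainMap(cli_args, defaults)
config["retries"] = 5        # stored in cli_args, the first map

print(config["retries"])     # 5
print(defaults["retries"])   # 3  (untouched)
print(cli_args)              # {'retries': 5}

# new_child() returns a new ChainMap with an extra dict at the front
scoped = config.new_child({"debug": True})
print(scoped["debug"])       # True
print(config["debug"])       # False
```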

Real pipeline example: Counter + defaultdict together

Here is a compact publishing pipeline that tracks article stats and groups titles by tag using both types together:

from collections import Counter, defaultdict

articles = [
    {"title": "Python collections", "tag": "python", "views": 420},
    {"title": "Python decorators",  "tag": "python", "views": 310},
    {"title": "Git rebasing",       "tag": "git",    "views": 180},
    {"title": "Git aliases",        "tag": "git",    "views": 95},
    {"title": "Docker basics",      "tag": "devops", "views": 260},
]

# Count articles per tag
tag_counts = Counter(a["tag"] for a in articles)
print(tag_counts.most_common())
# [('python', 2), ('git', 2), ('devops', 1)]

# Group titles by tag
by_tag = defaultdict(list)
for a in articles:
    by_tag[a["tag"]].append(a["title"])

print(dict(by_tag))
# {'python': ['Python collections', 'Python decorators'],
#  'git':    ['Git rebasing', 'Git aliases'],
#  'devops': ['Docker basics']}

# Total views per tag, using Counter as an accumulator (missing keys start at 0)
views_by_tag = Counter()
for a in articles:
    views_by_tag[a["tag"]] += a["views"]

print(views_by_tag.most_common(1))
# [('python', 730)]

No extra boilerplate, no KeyErrors, no O(n) list operations — just the right tool for each job.

If you want to see these patterns applied to an actual AI publishing pipeline — including scripts that auto-post to Dev.to, track stats, and generate content — check out the full pipeline guide at germy5.gumroad.com/l/xhxkzz.
