German Yamil

Python collections: defaultdict, Counter, deque, and namedtuple

Python ships with a collections module in its standard library that fixes the most common pain points with plain dicts, lists, and tuples. Most beginners reach for a regular dict or list out of habit, and then write ten extra lines of boilerplate to work around their limitations. This article walks you through the six most useful types in collections, with clear before/after examples so you know exactly when to swap.


๐ŸŽ Free: AI Publishing Checklist โ€” 7 steps in Python ยท Full pipeline: germy5.gumroad.com/l/xhxkzz (pay what you want, min $9.99)


Why the collections module exists

Plain dicts raise KeyError when you access a missing key. Plain lists are slow when you need to insert or remove from the front. Tuples have no field names, so point[0] tells you nothing. The collections module exists to solve these exact problems without requiring you to write a class or import a heavy library.

from collections import defaultdict, Counter, deque, namedtuple, OrderedDict, ChainMap

That single import unlocks six specialized containers, each a better fit than a plain dict, list, or tuple in the right situation.

defaultdict: no more "if key not in dict"

Every time you try to group or count with a plain dict, you write the same defensive check:

# Before: plain dict, grouping articles by tag
articles_by_tag = {}
for article, tag in data:
    if tag not in articles_by_tag:
        articles_by_tag[tag] = []
    articles_by_tag[tag].append(article)

defaultdict replaces those three lines with one. You pass a factory function (list, int, or set, for example), and any missing key you look up with square brackets is automatically initialized with that factory's return value:

# After: defaultdict
from collections import defaultdict

articles_by_tag = defaultdict(list)
for article, tag in data:
    articles_by_tag[tag].append(article)  # no KeyError, ever

Use defaultdict(int) for counting, defaultdict(set) for deduplication, defaultdict(list) for grouping. The factory can be any callable, including a lambda:

stats = defaultdict(lambda: {"count": 0, "views": 0})
stats["python"]["count"] += 1
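The int and set factories mentioned above can be sketched the same way. This is a minimal illustration, assuming a made-up list of (visitor, page) pairs; the variable names are hypothetical. It also shows a detail worth knowing: only square-bracket access calls the factory, while `in` and `.get()` leave the dict untouched.

```python
from collections import defaultdict

# Hypothetical sample data: (visitor, page) pairs
visits = [("ana", "/home"), ("ben", "/home"), ("ana", "/home"), ("ana", "/docs")]

hits = defaultdict(int)        # missing keys start at int() == 0
visitors = defaultdict(set)    # missing keys start as an empty set

for visitor, page in visits:
    hits[page] += 1
    visitors[page].add(visitor)   # repeat visitors collapse inside the set

print(dict(hits))                 # {'/home': 3, '/docs': 1}
print(sorted(visitors["/home"]))  # ['ana', 'ben']

# Only square-bracket access triggers the factory
print("/pricing" in hits)      # False
print(hits.get("/pricing"))    # None
print(hits["/pricing"])        # 0, and the key is created as a side effect
print("/pricing" in hits)      # True
```

If you only want to read without creating keys, stick to `in` or `.get()`.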

Counter: count anything in one line

Counting the frequency of items in a list normally takes a loop and a dict. Counter collapses that entirely:

# Before
word_counts = {}
for word in words:
    word_counts[word] = word_counts.get(word, 0) + 1
# After
from collections import Counter

word_counts = Counter(words)

Counter accepts any iterable of hashable items: strings, lists, generator expressions. It comes with most_common(n), which returns the n most frequent items, sorted from highest count to lowest:

top5 = word_counts.most_common(5)
# [('python', 42), ('tutorial', 31), ...]

Counters also support arithmetic: add two Counters to combine tallies, subtract to find the difference (only positive counts are kept), use & for intersection (minimum counts) and | for union (maximum counts):

c1 = Counter(python=10, javascript=5)
c2 = Counter(python=3, rust=8)
print(c1 + c2)  # Counter({'python': 13, 'rust': 8, 'javascript': 5})
print(c1 - c2)  # Counter({'python': 7, 'javascript': 5})
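The & and | operators are easier to trust once you see them run. Here is a short sketch reusing the same two Counters; the `diff` variable and the use of the in-place subtract() method are additions for illustration, not part of the example above.

```python
from collections import Counter

c1 = Counter(python=10, javascript=5)
c2 = Counter(python=3, rust=8)

common = c1 & c2   # minimum of each count; keys absent from either side drop out
either = c1 | c2   # maximum of each count

print(common)  # Counter({'python': 3})
print(either)  # Counter({'python': 10, 'rust': 8, 'javascript': 5})

# The - operator discards zero and negative results.
# subtract() works in place and keeps them.
diff = Counter(c1)
diff.subtract(c2)
print(diff["rust"])  # -8
```

Reach for subtract() when a negative balance is meaningful (inventory, for instance), and for the - operator when you only care about what is left over.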

deque: efficient append/pop from both ends

Python lists are fast at appending to the right, but list.insert(0, x) and list.pop(0) are O(n) because they shift every element. deque (double-ended queue) does both in O(1):

# Before: using list as a queue (slow left-side operations)
queue = []
queue.append("task_a")
queue.insert(0, "urgent_task")   # O(n): shifts everything
next_task = queue.pop(0)         # O(n): shifts everything
# After: deque
from collections import deque

queue = deque()
queue.append("task_a")           # append to right
queue.appendleft("urgent_task")  # O(1): no shifting
next_task = queue.popleft()      # O(1): no shifting

The maxlen parameter turns a deque into a sliding window that automatically discards the oldest item when full, which is perfect for keeping the last N log lines or prices:

recent_views = deque(maxlen=10)
for view in stream:
    recent_views.append(view)    # once 10 items are held, each append drops the oldest
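To make the sliding window concrete, here is a small moving-average sketch. The `moving_averages` helper and the price list are hypothetical, made up for this example.

```python
from collections import deque

def moving_averages(values, window=3):
    """Yield the mean of the last `window` values seen so far."""
    recent = deque(maxlen=window)
    for value in values:
        recent.append(value)              # drops the oldest value once full
        yield sum(recent) / len(recent)

prices = [10, 20, 30, 40, 50]   # hypothetical sample data
print(list(moving_averages(prices)))
# [10.0, 15.0, 20.0, 30.0, 40.0]
```

The first two results average fewer than three values because the window is still filling up.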

deque.rotate(n) shifts all elements n positions to the right (a negative n rotates to the left), which is handy for circular buffers or round-robin scheduling:

tasks = deque(["a", "b", "c", "d"])
tasks.rotate(1)   # deque(['d', 'a', 'b', 'c'])
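Round-robin scheduling with rotate can be sketched in a few lines. The `next_worker` helper and the worker names are hypothetical; this is one simple way to do it, not the only one.

```python
from collections import deque

workers = deque(["a", "b", "c"])   # hypothetical worker names

def next_worker(pool):
    """Return the worker at the front, then move it to the back."""
    worker = pool[0]
    pool.rotate(-1)    # negative n rotates left: the front element goes to the end
    return worker

order = [next_worker(workers) for _ in range(5)]
print(order)  # ['a', 'b', 'c', 'a', 'b']
```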

namedtuple: lightweight records with field names

Plain tuples force you to remember that point[0] is x and point[1] is y. namedtuple gives you attribute access with zero memory overhead over a regular tuple:

# Before: plain tuple, no context
article = ("Python collections", "tutorial", 1200)
print(article[2])  # What is index 2 again?
# After: namedtuple
from collections import namedtuple

Article = namedtuple("Article", ["title", "tag", "word_count"])
article = Article("Python collections", "tutorial", 1200)
print(article.word_count)  # clear and readable

Two utility methods make namedtuple practical for data pipelines:

  • _asdict() converts the record to a regular dict (an OrderedDict before Python 3.8), useful for serialization.
  • _replace() returns a copy with specific fields swapped. Since namedtuples are immutable, this is how you "update" a record.
published = article._replace(tag="python")
data = article._asdict()  # {'title': 'Python collections', 'tag': 'tutorial', ...}
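Here is how those two methods might fit into a small pipeline step. This is a sketch with made-up rows; it also uses _make(), the namedtuple class method that builds a record from any iterable.

```python
import json
from collections import namedtuple

Article = namedtuple("Article", ["title", "tag", "word_count"])

# Hypothetical rows, e.g. as read from a CSV file
rows = [("Python collections", "tutorial", 1200), ("Git aliases", "git", 450)]

articles = [Article._make(row) for row in rows]

# _replace returns new records; the originals are left untouched
retagged = [a._replace(tag="python") if a.tag == "tutorial" else a for a in articles]

payload = json.dumps([a._asdict() for a in retagged])
print(payload)
# [{"title": "Python collections", "tag": "python", "word_count": 1200}, {"title": "Git aliases", "tag": "git", "word_count": 450}]
print(articles[0].tag)  # tutorial
```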

For more complex records with defaults, validators, or type annotations, see the dataclasses module (link in Further Reading).

OrderedDict: insertion order with extras

Since Python 3.7, regular dicts preserve insertion order, so OrderedDict is rarely needed. But it still has tricks plain dicts lack, most notably move_to_end():

from collections import OrderedDict

cache = OrderedDict()
cache["a"] = 1
cache["b"] = 2
cache["c"] = 3

cache.move_to_end("a")           # move "a" to the back
cache.move_to_end("c", last=False)  # move "c" to the front

This makes OrderedDict the natural building block for an LRU (Least Recently Used) cache.
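As a rough sketch of that idea, the class below pairs move_to_end() with popitem(last=False), which removes the oldest entry. The `LRUCache` name and its methods are hypothetical, and the code skips thread safety and other concerns a real cache would need.

```python
from collections import OrderedDict

class LRUCache:
    """Tiny LRU cache sketch; not thread-safe."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key, default=None):
        if key not in self._items:
            return default
        self._items.move_to_end(key)          # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)          # new or updated keys go to the back
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)   # evict the least recently used

    def keys(self):
        return list(self._items)

cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")        # "a" is now the most recently used
cache.put("c", 3)     # over capacity, so "b" is evicted
print(cache.keys())   # ['a', 'c']
```

If all you need is to memoize a function, the standard library's functools.lru_cache decorator already does this for you.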

ChainMap: layered configuration

ChainMap combines multiple dicts into a single view, resolving lookups from left to right. This is ideal for layered config: CLI args override environment variables, which override defaults.

from collections import ChainMap
import os

defaults = {"debug": False, "output": "stdout", "retries": 3}
env_config = {"debug": True} if os.getenv("DEBUG") else {}
cli_args = {"output": "file.log"}

config = ChainMap(cli_args, env_config, defaults)
print(config["debug"])    # True if DEBUG is set (from env_config), else False (from defaults)
print(config["retries"])  # 3     (from defaults)
print(config["output"])   # file.log (from cli_args)

Writes go to the first map, so overrides stay isolated and the defaults dict is never mutated.
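A quick way to convince yourself of the write behavior is to assign through the ChainMap and inspect the underlying dicts. The sketch below trims the config to two layers for brevity and adds new_child(), which pushes a fresh empty dict onto the front for temporary overrides; the `scoped` name is just for illustration.

```python
from collections import ChainMap

defaults = {"debug": False, "output": "stdout", "retries": 3}
cli_args = {"output": "file.log"}

config = ChainMap(cli_args, defaults)
config["retries"] = 5           # stored in cli_args, the first map

print(defaults["retries"])      # 3
print(cli_args)                 # {'output': 'file.log', 'retries': 5}

# new_child() returns a new ChainMap with an empty dict in front
scoped = config.new_child()
scoped["debug"] = True
print(scoped["debug"], config["debug"])   # True False
```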

Real pipeline example: Counter + defaultdict together

Here is a compact publishing pipeline that tracks article stats and groups titles by tag using both types together:

from collections import Counter, defaultdict

articles = [
    {"title": "Python collections", "tag": "python", "views": 420},
    {"title": "Python decorators",  "tag": "python", "views": 310},
    {"title": "Git rebasing",       "tag": "git",    "views": 180},
    {"title": "Git aliases",        "tag": "git",    "views": 95},
    {"title": "Docker basics",      "tag": "devops", "views": 260},
]

# Count articles per tag
tag_counts = Counter(a["tag"] for a in articles)
print(tag_counts.most_common())
# [('python', 2), ('git', 2), ('devops', 1)]

# Group titles by tag
by_tag = defaultdict(list)
for a in articles:
    by_tag[a["tag"]].append(a["title"])

print(dict(by_tag))
# {'python': ['Python collections', 'Python decorators'],
#  'git':    ['Git rebasing', 'Git aliases'],
#  'devops': ['Docker basics']}

# Total views per tag, using a Counter as an accumulator
views_by_tag = Counter()
for a in articles:
    views_by_tag[a["tag"]] += a["views"]

print(views_by_tag.most_common(1))
# [('python', 730)]

No extra boilerplate, no KeyErrors, no O(n) list operations. Just the right tool for each job.


If you want to see these patterns applied to an actual AI publishing pipeline, including scripts that auto-post to Dev.to, track stats, and generate content, check out the full pipeline guide at germy5.gumroad.com/l/xhxkzz.

Further Reading


If this was useful, the ❤️ button helps other developers find it.
