Python ships with a built-in collections module that fixes the most common pain points with plain dicts and lists. Most beginners reach for a regular dict or list out of habit, and then write ten extra lines of boilerplate to work around their limitations. This article walks you through the six most useful types in collections, with clear before/after examples so you know exactly when to swap.
Why the collections module exists
Plain dicts raise KeyError when you access a missing key. Plain lists are slow when you need to insert or remove from the front. Tuples have no field names, so point[0] tells you nothing. The collections module exists to solve these exact problems without requiring you to write a class or import a heavy library.
from collections import defaultdict, Counter, deque, namedtuple, OrderedDict, ChainMap
That single import unlocks six specialized containers, each a better fit than a plain dict, list, or tuple in the right situation.
defaultdict: no more "if key not in dict"
Every time you try to group or count with a plain dict, you write the same defensive check:
# Before: plain dict, grouping articles by tag
articles_by_tag = {}
for article, tag in data:
    if tag not in articles_by_tag:
        articles_by_tag[tag] = []
    articles_by_tag[tag].append(article)
defaultdict replaces those three lines with one. You pass a factory function (list, int, set) and any missing key is automatically initialized with that factory's return value:
# After: defaultdict
from collections import defaultdict
articles_by_tag = defaultdict(list)
for article, tag in data:
    articles_by_tag[tag].append(article)  # no KeyError, ever
Use defaultdict(int) for counting, defaultdict(set) for deduplication, defaultdict(list) for grouping. The factory can be any callable, including a lambda:
stats = defaultdict(lambda: {"count": 0, "views": 0})
stats["python"]["count"] += 1
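The int and set factories mentioned above can be sketched the same way. The visits list here is made-up sample data, not something from the pipeline:

```python
from collections import defaultdict

# Hypothetical sample data: (visitor, tag) pairs, including a repeat visit
visits = [("ana", "python"), ("ben", "python"), ("ana", "python"), ("ana", "git")]

# defaultdict(int): missing keys start at 0, so += works immediately
visits_per_tag = defaultdict(int)
# defaultdict(set): missing keys start as an empty set, so repeats collapse
visitors_per_tag = defaultdict(set)

for visitor, tag in visits:
    visits_per_tag[tag] += 1
    visitors_per_tag[tag].add(visitor)

print(dict(visits_per_tag))            # {'python': 3, 'git': 1}
print(len(visitors_per_tag["python"])) # 2 unique visitors
```

One thing to keep in mind: reading a missing key with square brackets inserts it. When you only want to check for a key, use `in` or `.get()`, neither of which triggers the factory.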
Counter: count anything in one line
Counting the frequency of items in a list normally takes a loop and a dict. Counter collapses that entirely:
# Before
word_counts = {}
for word in words:
    word_counts[word] = word_counts.get(word, 0) + 1
# After
from collections import Counter
word_counts = Counter(words)
Counter accepts any iterable: strings, lists, generator expressions. It comes with most_common(n), which returns the top-n items sorted by count:
top5 = word_counts.most_common(5)
# [('python', 42), ('tutorial', 31), ...]
Counters also support arithmetic: add two Counters to combine tallies, subtract to find the difference (only positive counts are kept), use & for intersection (minimum counts) and | for union (maximum counts):
c1 = Counter(python=10, javascript=5)
c2 = Counter(python=3, rust=8)
print(c1 + c2) # Counter({'python': 13, 'rust': 8, 'javascript': 5})
print(c1 - c2) # Counter({'python': 7, 'javascript': 5})
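The & and | operators behave the same way on the same two counters. As a small sketch, this also shows the in-place subtract() method, which (unlike the - operator) keeps zero and negative counts:

```python
from collections import Counter

c1 = Counter(python=10, javascript=5)
c2 = Counter(python=3, rust=8)

common = c1 & c2  # minimum of each count; keys missing from either side drop out
either = c1 | c2  # maximum of each count across both counters

print(common)  # Counter({'python': 3})
print(either)  # Counter({'python': 10, 'rust': 8, 'javascript': 5})

# subtract() mutates in place and keeps negative results
diff = c1.copy()
diff.subtract(c2)
print(diff["rust"])  # -8
```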
deque: efficient append/pop from both ends
Python lists are fast at appending to the right, but list.insert(0, x) and list.pop(0) are O(n) because they shift every element. deque (double-ended queue) does both in O(1):
# Before: using list as a queue (slow left-side operations)
queue = []
queue.append("task_a")
queue.insert(0, "urgent_task")  # O(n): shifts everything
next_task = queue.pop(0)        # O(n): shifts everything
# After: deque
from collections import deque
queue = deque()
queue.append("task_a") # append to right
queue.appendleft("urgent_task")  # O(1): no shifting
next_task = queue.popleft()      # O(1): no shifting
The maxlen parameter turns a deque into a sliding window that automatically discards the oldest item when full, which is perfect for keeping the last N log lines or prices:
recent_views = deque(maxlen=10)
for view in stream:
    recent_views.append(view)  # once 10 items are stored, the oldest is dropped
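To make the sliding window concrete, here is a runnable sketch of a moving average over the last three values. The prices list is invented for illustration:

```python
from collections import deque

# Hypothetical price stream, purely for illustration
prices = [10, 12, 11, 15, 14, 18]

window = deque(maxlen=3)  # keeps only the 3 most recent prices
averages = []
for price in prices:
    window.append(price)  # once full, the leftmost (oldest) item is discarded
    averages.append(sum(window) / len(window))

print(list(window))           # [15, 14, 18]
print(round(averages[-1], 2)) # 15.67
```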
deque.rotate(n) shifts all elements n positions to the right (a negative n rotates left), which is handy for circular buffers or round-robin scheduling:
tasks = deque(["a", "b", "c", "d"])
tasks.rotate(1) # deque(['d', 'a', 'b', 'c'])
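Round-robin scheduling could look something like this. The worker and job names are hypothetical; the idea is that the worker at the front takes the next job and then moves to the back:

```python
from collections import deque

workers = deque(["a", "b", "c"])  # hypothetical worker names
jobs = ["job1", "job2", "job3", "job4"]

assignments = []
for job in jobs:
    assignments.append((job, workers[0]))  # the worker at the front takes the job
    workers.rotate(-1)                     # negative n rotates left: front moves to the back

print(assignments)
# [('job1', 'a'), ('job2', 'b'), ('job3', 'c'), ('job4', 'a')]
```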
namedtuple: lightweight records with field names
Plain tuples force you to remember that point[0] is x and point[1] is y. namedtuple gives you attribute access with zero memory overhead over a regular tuple:
# Before: plain tuple, no context
article = ("Python collections", "tutorial", 1200)
print(article[2]) # What is index 2 again?
# After: namedtuple
from collections import namedtuple
Article = namedtuple("Article", ["title", "tag", "word_count"])
article = Article("Python collections", "tutorial", 1200)
print(article.word_count) # clear and readable
Two utility methods make namedtuple practical for data pipelines:
- _asdict() converts the record to an OrderedDict (or a regular dict in Python 3.8+), useful for serialization.
- _replace() returns a copy with specific fields swapped. Since namedtuples are immutable, this is how you "update" a record.
published = article._replace(tag="python")
data = article._asdict() # {'title': 'Python collections', 'tag': 'tutorial', ...}
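In a data pipeline, records usually arrive as raw rows rather than hand-written constructor calls. As a rough sketch (the rows below are invented), the _make() class method builds a record from any iterable, and _asdict() feeds straight into json.dumps:

```python
import json
from collections import namedtuple

Article = namedtuple("Article", ["title", "tag", "word_count"])

# Hypothetical rows, e.g. as they might come out of a CSV reader or database cursor
rows = [
    ("Python collections", "tutorial", 1200),
    ("Git aliases", "git", 450),
]

articles = [Article._make(row) for row in rows]         # one record per row
payload = json.dumps([a._asdict() for a in articles])   # records -> JSON text

print(articles[1].tag)  # git
print(payload)
```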
For more complex records with defaults, validators, or type annotations, see the dataclasses module (link in Further Reading).
OrderedDict: insertion order with extras
Since Python 3.7, regular dicts preserve insertion order, so OrderedDict is rarely needed. But it still has tricks plain dicts lack, most notably move_to_end():
from collections import OrderedDict
cache = OrderedDict()
cache["a"] = 1
cache["b"] = 2
cache["c"] = 3
cache.move_to_end("a") # move "a" to the back
cache.move_to_end("c", last=False) # move "c" to the front
This makes OrderedDict the natural building block for an LRU (Least Recently Used) cache.
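A minimal sketch of that idea, assuming a fixed capacity and no thread safety. The LRUCache class and its method names are made up for this example. It pairs move_to_end() with popitem(last=False), which removes the oldest entry:

```python
from collections import OrderedDict

class LRUCache:
    """Hypothetical minimal LRU cache; not thread-safe."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key, default=None):
        if key not in self._items:
            return default
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used entry

cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" is now the most recently used
cache.put("c", 3)  # over capacity: "b" is evicted
print(list(cache._items))  # ['a', 'c']
```

If all you need is to memoize function calls, the standard library's functools.lru_cache decorator already covers that case.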
ChainMap: layered configuration
ChainMap combines multiple dicts into a single view, resolving lookups from left to right. This is ideal for layered config: CLI args override environment variables, which override defaults.
from collections import ChainMap
import os
defaults = {"debug": False, "output": "stdout", "retries": 3}
env_config = {"debug": True} if os.getenv("DEBUG") else {}
cli_args = {"output": "file.log"}
config = ChainMap(cli_args, env_config, defaults)
print(config["debug"])    # True if DEBUG is set (from env_config), else False (from defaults)
print(config["retries"]) # 3 (from defaults)
print(config["output"]) # file.log (from cli_args)
Writes go to the first map, so overrides stay isolated and the defaults dict is never mutated.
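The write behavior is easy to verify with a small sketch. The overrides dict here is a hypothetical empty layer placed in front of the defaults:

```python
from collections import ChainMap

defaults = {"debug": False, "retries": 3}
overrides = {}  # hypothetical layer for runtime overrides
config = ChainMap(overrides, defaults)

config["retries"] = 5       # the write lands in overrides, the first map
print(config["retries"])    # 5
print(defaults["retries"])  # 3 (untouched)
print(overrides)            # {'retries': 5}

del config["retries"]       # deletion also only affects the first map
print(config["retries"])    # 3 (falls back to defaults)
```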
Real pipeline example: Counter + defaultdict together
Here is a compact publishing pipeline that tracks article stats and groups titles by tag using both types together:
from collections import Counter, defaultdict
articles = [
    {"title": "Python collections", "tag": "python", "views": 420},
    {"title": "Python decorators", "tag": "python", "views": 310},
    {"title": "Git rebasing", "tag": "git", "views": 180},
    {"title": "Git aliases", "tag": "git", "views": 95},
    {"title": "Docker basics", "tag": "devops", "views": 260},
]
# Count articles per tag
tag_counts = Counter(a["tag"] for a in articles)
print(tag_counts.most_common())
# [('python', 2), ('git', 2), ('devops', 1)]
# Group titles by tag
by_tag = defaultdict(list)
for a in articles:
    by_tag[a["tag"]].append(a["title"])
print(dict(by_tag))
# {'python': ['Python collections', 'Python decorators'],
# 'git': ['Git rebasing', 'Git aliases'],
# 'devops': ['Docker basics']}
# Total views per tag, using Counter as an accumulator
views_by_tag = Counter()
for a in articles:
    views_by_tag[a["tag"]] += a["views"]
print(views_by_tag.most_common(1))
# [('python', 730)]
No extra boilerplate, no KeyErrors, no O(n) list operations: just the right tool for each job.
Further Reading
- Python dataclasses: Cleaner Code Than Dicts or NamedTuples
- Python List Comprehensions: From Loops to One-Liners
- Python Type Hints: A Practical Beginner's Guide