DEV Community

Carlos Saldaña
Carlos Saldaña

Posted on

The Protocol Behind Python's Containers

Lists, tuples, dicts, and sets look like four different things. Underneath, they share the same idea: they all implement a small set of protocols, and once you see that, the rest stops feeling arbitrary.

What is a protocol?

A protocol in Python is an informal contract defined by behavior, not by declaration. Anything that behaves like a sequence is a sequence, as far as Python cares. This style of typing has a name, it’s called structural typing, or as most of us know it, duck typing.("if it walks like a duck...").

Think of this like the USB port. A USB port doesn't ask "are you a keyboard?" or "are you a hard drive?" It exposes a physical shape and a data protocol. Anything that fits the shape and speaks the protocol works.

Iterators and Iterables

If you only learn one thing about Python containers, learn the difference between an iterable and an iterator.

  • An iterable is an object that we can iterate over, it's an object that can produce an iterator. To make this happen, it implements __iter__(), which returns a fresh iterator each time it's called**.
  • An iterator is an object that produces values one at a time. It implements __next__() (and, by convention, __iter__() returning self). It raises StopIteration when there are no more values.

There are a few built-in iterables:

  • List
  • Tuples
  • Strings
  • Dictionaries

Let’s try and example to understand why a list is iterable but not an iterator:

nums = [1,2,3]
next(nums)
TypeError: 'list' object is not an iterator
it = iter(nums)
next(it)
1
next(it)
2
Enter fullscreen mode Exit fullscreen mode

The for loop is doing this exact dance. for i in nums: it’s roughly doing:

_it = iter(nums)
while True:
    try:
            x = next(_it)
    except StopIteration:
            break
Enter fullscreen mode Exit fullscreen mode

That’s the whole protocol. Once you understand this, things like for loops, generators, zip, map, comprehensions, unpacking, and *args all start to work the same way in your mind.

I'm more visual, so here's a sequence diagram showing the interaction over time between two objects.

A for loop

Built-in Containers

This is the good part. Most of us learn Python's containers at the surface: lists are mutable and ordered, tuples are immutable and ordered, dicts are key-value pairs, sets are unique and unordered. That's correct but also useless most of the time. It tells you nothing about performance. Nothing about why your set of dicts blew up with a TypeError. Nothing about why your queue is suddenly O(n²).

Here's what to focus on instead. Every container in Python is three things at once:

  • A data structure: how its stored in memory and how fast operations are.
  • A set of protocols: the special methods it suports.
  • A semantic choice: the meaning you communicate to other developers by using it.

List: the dynamic array

A python list is a dynamic array (called a vector or ArrayList in other languages).

Under the hood, it’s a block of pointers to python objects, with extra space preallocated at the end for appending new items.

┌────┬────┬────┬────┬────┬────┬────┬────┐
│ p0 │ p1 │ p2 │ p3 │ ▒▒ │ ▒▒ │ ▒▒ │ ▒▒ │   length=4, capacity=8
└────┴────┴────┴────┴────┴────┴────┴────┘
  ↑    ↑    ↑    ↑
  obj  obj  obj  obj

Enter fullscreen mode Exit fullscreen mode

This one detail is what determines the entire complexity table.

Operation Cost
lst[i] O(1)
lst.append(x) O(1) amortized
lst.pop() O(1)
lst.pop(0) O(n)
lst.insert(0, x) O(n)
x in lst O(n)
len(lst) O(1)

Protocol: MutableSequence (which gives you Sequence + Iterable + Container + Sized + Reversible).

Semantic intent: An ordered, mutable collection of items that I expect to iterate over, append to, and access by index.

A common mistake is to use a list as a FIFO queue:

pending = []
pending.append(task)
task = pending.pop(0)
Enter fullscreen mode Exit fullscreen mode

This is pretty expensive, the pop(0) shifts every remaining element one position to the left, and remember this operation is O(n). If this runs inside a loop it can accidentally become O(n²) .

Tuple: the record

We think of tuples as immutable lists, but that idea can mislead.

A tuple is better understood as a record: a fixed-size structure where each position can hold different types of data.

In practice:

  • Tuples represent structured data.
  • Lists represent collections of items.
# List
orders = [order1, order2, order3, ...]

# Tuple
db_row = ("alice", "active", 42)
Enter fullscreen mode Exit fullscreen mode

Immutability is a result of this distinction, not the point.

And because tuples are immutable, they gain a few important properties that lists don’t have:

  • Hashable
  • Memory efficiency
  • Read-only contract

From a protocol perspective, tuples behave like read-only sequences and can also be hashable.

Semantically, a tuple means: “This is a single thing identified by its contents.”

This matters a lot in a production application, tuples are commonly used for:

  • Compound dictionary keys
  • Deduplication
  • Cache and memoization keys

Dict: the hash table

The king. If lists are the most common container in Python, dictionaries are probably the most important. A big part of Python's internals is built on top of dictionaries.

A dict is a hash table with open addressing. The keys you insert get hashed, and the hash points to a slot.

Operation Average Worst
d[k], d[k] = v, del d[k] O(1) O(n) (hash collisions)
k in d O(1) O(n)
Iteration O(n) O(n)

From a protocol perspective, a dict implements MutableMapping, not Sequence.

There are two things you must remember about dictionaries:

  • Keys must be hashable
  • Insertion order is preserved

Set: the dict without values

A set is almost the same thing as a dictionary, except it only cares about keys.

It uses the same hash-table implementation underneath, which means it has:

  • O(1) average-time lookups, inserts, and removals
  • elements must be hashable
  • same "no duplicates" guarantee, duplicates are naturally eliminated
seen = set()
seen.add(user_id)
if user_id in seen:
        ...

# the antipattern
seen = []
if user_id in seen:
        ...
seen.append(user_id)
Enter fullscreen mode Exit fullscreen mode

So in practice it's just a collection of unique keys. From a protocol perspective, a set implements MutableSet, which gives you built-in set operations like: |, &, -, ^

Semantically, a set means: “I only care about uniqueness and fast membership checks.”

Conclusion

These are some of the most commonly used containers in Python in our applications, and it's important to understand the trade-offs when deciding which one to use. When you reach for a container, try to ask these:

  • What's the access pattern?
  • Will it grow?
  • Does identity-by-contents matter?
  • What does the choice say to the reader?

These are the common bad choices we covered and what you should reach for instead:

You wrote You probably wanted
list.pop(0) in a loop collections.deque
if x in big_list repeatedly set (convert once)
f"{a}:{b}" as a dict key (a, b) tuple key
List of (key, value) pairs dict
dict[str, None] for membership set
Tuple with named indexes everywhere namedtuple or dataclass

If you remember one thing: the container you pick is a contract, not a convenience.

Top comments (0)