aykhlf yassir
Python Internals: The Iterator Protocol

Stop Thinking in Lists. Start Thinking in Streams.

Here's a question that separates junior Python developers from senior ones:

What happens when you run for item in my_list?

If your answer was "Python loops over the list," you're not wrong—but you're describing the what, not the how. And the how is where the real engineering lives.

The for loop is syntactic sugar. Under the hood, it's a precise protocol—a contract between your code and Python's runtime. Once you understand that contract, you'll never load a 10GB file into memory again, your pipelines will run in constant memory, and you'll finally understand why generators feel like magic.

By the end of this post, you'll know:

  • The difference between an iterable and an iterator
  • What Python actually executes when you write a for loop
  • Why the same iterator can't be looped twice
  • How to build your own memory-efficient data pipelines from scratch

Let's pull back the curtain.


1. The Definitions:

The words "iterable" and "iterator" are used interchangeably in most tutorials. This can lead to confusion. Let's define them precisely.

The Analogy

Think of a book and a bookmark.

The book (Iterable) contains all the data. It can be read from the beginning any number of times. You can hand it to anyone, and they can start reading from page one. A list, a tuple, a str—these are all books.

The bookmark (Iterator) tracks where you are in a specific reading session. It has state. It knows you're on page 47. It can tell you the next word, then advance one position. Crucially: there is only one reading session. When you reach the last page, the bookmark is spent. Generators, file objects, map() results—these are all bookmarks.

The Interface

Python makes this concrete through two dunder methods:

| Concept | Must Implement | Behavior |
| --- | --- | --- |
| Iterable | `__iter__` | Returns a new Iterator |
| Iterator | `__iter__` **and** `__next__` | `__next__` returns the next value, raises `StopIteration` when exhausted |
```python
# A list is an ITERABLE: it implements __iter__
my_list = [1, 2, 3]
print(hasattr(my_list, '__iter__'))  # True
print(hasattr(my_list, '__next__'))  # False - not an iterator!

# Calling __iter__ on a list creates an ITERATOR
my_iterator = iter(my_list)
print(hasattr(my_iterator, '__iter__'))  # True
print(hasattr(my_iterator, '__next__'))  # True - now we have both
```

The iter() built-in calls __iter__. The next() built-in calls __next__. Everything else in Python is built on top of these two primitives.
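You can drive the protocol by hand with these two built-ins to see it in action (the `"done"` sentinel below is just an illustration of `next()`'s optional default argument):

```python
colors = ["red", "green", "blue"]

it = iter(colors)        # calls colors.__iter__()
print(next(it))          # calls it.__next__() -> 'red'
print(next(it))          # 'green'
print(next(it))          # 'blue'

# next() accepts a default instead of raising StopIteration
print(next(it, "done"))  # 'done' - the iterator is exhausted
```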


2. The Mechanics: Deconstructing the for Loop

Here is the lie Python tells you every day:

```python
# What you write (The Sugar)
for item in [1, 2, 3]:
    print(item)
```

Here is what Python actually executes:

```python
# What Python runs (The Reality)
_iter = iter([1, 2, 3])    # Step 1: Call __iter__, get an iterator

while True:                # Step 2: Loop forever
    try:
        item = next(_iter) # Step 3: Call __next__, get the next item
        print(item)        # Step 4: Execute the loop body
    except StopIteration:  # Step 5: Iterator is exhausted
        break              # Step 6: Exit the loop
```

That's it. That's the entire for loop. No magic. No special knowledge of lists or tuples or strings. Just a mechanical protocol: call __iter__ once, call __next__ repeatedly until StopIteration is raised.

The Critical Insight: The Loop is Blind

The for loop does not care about your data structure. It cannot see whether you gave it a list, a database cursor, a network socket, or a custom class you wrote this morning.

It only asks one question: "Does this object speak the iterator protocol?"

If __iter__ and __next__ exist, the loop works. This is Python's duck typing philosophy applied at the language level. No inheritance required. No registration required. Just implement two methods.

```python
# COMPLETELY custom iteration - no list, no built-in involved
class Countdown:
    def __init__(self, start: int) -> None:
        self.current = start

    def __iter__(self):
        return self  # More on this shortly

    def __next__(self) -> int:
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

# The for loop has NO IDEA this is a custom class
for n in Countdown(5):
    print(n)  # 5, 4, 3, 2, 1
```

3. Why Does __iter__ Return self?

If you look carefully at the Countdown class above, you'll notice something odd: __iter__ returns self. An iterator returning itself as an iterator.

If an iterable is a factory that creates iterators, and an iterator is a consumer that gets exhausted... why does an iterator pretend to be an iterable?

The Answer: Polymorphism

Consider this function:

```python
def print_first_three(iterable):
    it = iter(iterable)  # Always call iter() first
    print(next(it))
    print(next(it))
    print(next(it))
```

If we follow the "always call iter() first" convention, this function works with both iterables and iterators:

```python
# Works with an iterable (list)
print_first_three([10, 20, 30, 40, 50])

# Works with an iterator (generator, file, network stream)
my_stream = iter([10, 20, 30, 40, 50])
print_first_three(my_stream)
```

The second call works because when you call iter() on an iterator, it returns self. The protocol stays consistent. The caller doesn't need to know—or care—whether it received a fresh list or a half-consumed stream.

This is the rule, codified:

```
Iterable:  __iter__() → returns a NEW Iterator  (fresh reading session)
Iterator:  __iter__() → returns SELF            (I am already the session)
```
| Object | `__iter__` | `__next__` |
| --- | --- | --- |
| `list`, `tuple`, `str` | Returns a new `list_iterator` | ✗ Not present |
| `list_iterator` | Returns `self` | ✓ Advances position |
| Generator | Returns `self` | ✓ Runs until `yield` |
| File object | Returns `self` | ✓ Reads next line |

In short: Every iterator is also iterable. Not every iterable is an iterator.
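You can verify this asymmetry with the `is` operator: an iterable hands out a fresh iterator on every call, while an iterator hands back itself:

```python
book = [1, 2, 3]

# An iterable spawns a FRESH iterator on every call
print(iter(book) is iter(book))    # False - two separate bookmarks

# An iterator returns ITSELF
bookmark = iter(book)
print(iter(bookmark) is bookmark)  # True - same reading session
```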


4. The "One-Shot" Trap:

Here is a bug that has shipped to production more times than anyone will admit:

```python
squares = map(lambda x: x**2, [1, 2, 3])

print(list(squares))  # [1, 4, 9] ✓
print(list(squares))  # []        ✗ Silent failure
```

The second call returns an empty list. No error. No warning. Just silently wrong data.

Why? Because map() returns an iterator—a bookmark, not a book. The first list() call consumes every item, advancing the bookmark to the end. The second call finds the iterator exhausted. StopIteration is raised immediately. list() catches it and returns [].
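You can probe the exhausted state directly with `next()` and its optional default (the `"spent"` sentinel here is purely illustrative):

```python
squares = map(lambda x: x**2, [1, 2, 3])

print(list(squares))           # [1, 4, 9] - consumes everything
print(next(squares, "spent"))  # 'spent' - nothing left; without the
                               # default, next() would raise StopIteration
```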

This trap catches developers with:

```python
# All of these are ONE-SHOT iterators:
squares = map(lambda x: x**2, range(5))
evens = filter(lambda x: x % 2 == 0, range(10))
pairs = zip([1, 2, 3], ['a', 'b', 'c'])

lines = open("data.txt")  # File objects too!
```

The Fix: Know What You Have

The decision tree is simple:

```
Do I need to iterate this data multiple times?
├── YES → Store it: data = list(my_iterator)
└── NO  → Stream it: consume it once, discard it
```
```python
# If you need multiple passes: materialize to list
squares = list(map(lambda x: x**2, [1, 2, 3]))
print(squares)  # [1, 4, 9]
print(squares)  # [1, 4, 9] - Works! It's a list now.

# If one pass is enough: stay lazy, save memory
for square in map(lambda x: x**2, range(10_000_000)):
    process(square)  # Never loads 10M items into RAM
```

The Generator Version of the One-Shot Trap

Calling a generator function returns a generator object, and generator objects are iterators, so they exhibit the same behavior:

```python
def squares(n: int):
    for i in range(n):
        yield i ** 2

gen = squares(5)

print(sum(gen))   # 30 ✓
print(sum(gen))   # 0  ✗ - exhausted!
```

Every time you call squares(5), you get a new generator object—a fresh bookmark. But if you assign it to a variable and reuse that variable, you're sharing one bookmark between two consumers.
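If two consumers genuinely need the same stream and materializing it to a list is too costly, the standard library offers `itertools.tee`, which splits one iterator into independent branches. A sketch (note that `tee` buffers whatever gap opens between the branches, so it is cheap only when they advance roughly together):

```python
import itertools

def squares(n: int):
    for i in range(n):
        yield i ** 2

# Two independent bookmarks over ONE underlying generator
a, b = itertools.tee(squares(5), 2)

print(sum(a))  # 30
print(sum(b))  # 30 - b keeps its own position
```

One caveat: once you've called `tee`, consume only the branches; advancing the original iterator would desynchronize them.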


5. The Toolkit: itertools — Don't Reinvent the Wheel

Once you think in streams, you'll want to transform, combine, and slice them. Python's itertools module is the standard library for doing exactly this, all lazily and almost all in O(1) memory.

The itertools module is high-performance C code exposed to Python. Using it is almost always faster than writing the equivalent loop logic by hand.

Here are the three tools you'll reach for every week:

itertools.count — Infinite Streams

```python
import itertools

# Infinite counter: 10, 11, 12, 13, ...
counter = itertools.count(start=10, step=1)

# Always pair with islice or a break condition
for n in counter:
    if n > 15:
        break
    print(n)  # 10, 11, 12, 13, 14, 15
```

Use this anywhere you'd use while True with a manual counter.
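For example, zipping a `count` against a finite stream numbers its items lazily, an enumerate-style sketch:

```python
import itertools

names = ["alpha", "beta", "gamma"]

# zip stops when the SHORTER input ends, so the
# infinite count is consumed safely
for line_no, name in zip(itertools.count(start=1), names):
    print(line_no, name)  # 1 alpha / 2 beta / 3 gamma
```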

itertools.chain — Linking Streams

```python
import itertools

# Process multiple files as one continuous stream
# NO intermediate list created
log_lines = itertools.chain(
    open("january.log"),
    open("february.log"),
    open("march.log"),
)

for line in log_lines:
    process(line)  # Streams through all three files sequentially
```

The alternative, open("january.log") + open("february.log"), simply doesn't work: file objects don't support +. chain is the correct tool for combining any iterables without materializing them.
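A related variant worth knowing: `chain.from_iterable`, which takes a single iterable of iterables instead of separate arguments, useful when the streams themselves are produced lazily:

```python
import itertools

streams = ([1, 2], [3], [4, 5, 6])

# chain(*streams) would unpack the arguments eagerly;
# from_iterable accepts any iterable of iterables, even a lazy one
flat = itertools.chain.from_iterable(streams)
print(list(flat))  # [1, 2, 3, 4, 5, 6]
```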

itertools.islice — Safe Consumption

```python
import itertools

# An infinite generator of squares
def squares():
    n = 0
    while True:
        yield n ** 2
        n += 1

# Without islice, list() would run forever
first_five = list(itertools.islice(squares(), 5))
print(first_five)  # [0, 1, 4, 9, 16]
```

islice is the lazy equivalent of my_list[:5]. It doesn't generate all items first—it stops consuming after n items.
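Like a slice, `islice` also accepts start, stop, and step arguments, so it can mirror full slice syntax lazily:

```python
import itertools

# Lazy equivalent of list(range(100))[10:20:2]
window = itertools.islice(range(100), 10, 20, 2)
print(list(window))  # [10, 12, 14, 16, 18]
```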

The Combinators at a Glance

| Tool | What it does | Memory |
| --- | --- | --- |
| `count(n)` | Infinite incrementing stream | O(1) |
| `cycle(it)` | Repeat iterable forever | O(n) |
| `repeat(x, n)` | Yield `x` exactly `n` times | O(1) |
| `chain(*its)` | Concatenate iterables | O(1) |
| `islice(it, n)` | Take first `n` items lazily | O(1) |
| `takewhile(pred, it)` | Take while condition holds | O(1) |
| `dropwhile(pred, it)` | Skip while condition holds | O(1) |
| `batched(it, n)` | Group into `n`-sized chunks (Python 3.12+) | O(n) |
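These combinators compose into pipelines. A small sketch, feeding an infinite stream through `takewhile`, still in constant memory:

```python
import itertools

# All squares below 100, taken lazily from an infinite stream
squares = (n * n for n in itertools.count())
small = itertools.takewhile(lambda s: s < 100, squares)
print(list(small))  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```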

Conclusion:

Let's close with the mental model shift.

Junior mindset: A for loop iterates over a collection.

Senior mindset: A for loop drives a protocol. Any object that speaks the protocol—regardless of what it is or where the data lives—can be iterated. The data could be in a list in RAM, a file on disk, a socket on the network, a database cursor, or computed on the fly from a mathematical formula. The loop doesn't care.

This is the power:

```python
# These all work identically with the for loop:
for item in [1, 2, 3]: ...                         # In memory
for line in open("data.csv"): ...                  # On disk
for record in database.execute("SELECT ..."): ...  # In database
for packet in socket.recv_packets(): ...           # On network
for value in itertools.count(): ...                # Computed infinitely
```

Same protocol. Same syntax. Radically different resource profiles.

And the payoff in memory:

```python
# Junior code: loads entire file
def count_errors(path: str) -> int:
    lines = open(path).readlines()  # 10GB in RAM 💀
    return sum(1 for line in lines if "ERROR" in line)

# Senior code: streams the file
def count_errors(path: str) -> int:
    with open(path) as f:           # ~constant memory ✓
        return sum(1 for line in f if "ERROR" in line)
```

Both produce the same number. One will kill your server at 3AM.


Your Action Item

Go find that one function in your codebase that does this:

```python
data = list(some_query_or_file_or_api_call())
for item in data:
    process(item)
```

If you never use data again after the loop, that list() call is pure waste. Delete it. Stream directly. Your memory graph will thank you.

The for loop doesn't need the list. It never did.


This is part of an ongoing series on Python internals. If this post changed how you think about loops, follow along.
