Stop Thinking in Lists. Start Thinking in Streams.
Here's a question that separates junior Python developers from senior ones:
What happens when you run for item in my_list?
If your answer was "Python loops over the list," you're not wrong—but you're describing the what, not the how. And the how is where the real engineering lives.
The for loop is syntactic sugar. Under the hood, it's a precise protocol—a contract between your code and Python's runtime. Once you understand that contract, you'll never load a 10GB file into memory again, your pipelines will run in constant memory, and you'll finally understand why generators feel like magic.
By the end of this post, you'll know:
- The difference between an iterable and an iterator
- What Python actually executes when you write a for loop
- Why the same iterator can't be looped twice
- How to build your own memory-efficient data pipelines from scratch
Let's pull back the curtain.
1. The Definitions: Iterable vs. Iterator
The words "iterable" and "iterator" are used interchangeably in most tutorials. This can lead to confusion. Let's define them precisely.
The Analogy
Think of a book and a bookmark.
The book (Iterable) contains all the data. It can be read from the beginning any number of times. You can hand it to anyone, and they can start reading from page one. A list, a tuple, a str—these are all books.
The bookmark (Iterator) tracks where you are in a specific reading session. It has state. It knows you're on page 47. It can tell you the next word, then advance one position. Crucially: there is only one reading session. When you reach the last page, the bookmark is spent. Generators, file objects, map() results—these are all bookmarks.
The Interface
Python makes this concrete through two dunder methods:
| Concept | Must Implement | Behavior |
|---|---|---|
| Iterable | __iter__ | __iter__ returns a new Iterator |
| Iterator | __iter__ AND __next__ | __next__ returns the next value, raises StopIteration when exhausted |
# A list is an ITERABLE: it implements __iter__
my_list = [1, 2, 3]
print(hasattr(my_list, '__iter__')) # True
print(hasattr(my_list, '__next__')) # False - not an iterator!
# Calling __iter__ on a list creates an ITERATOR
my_iterator = iter(my_list)
print(hasattr(my_iterator, '__iter__')) # True
print(hasattr(my_iterator, '__next__')) # True - now we have both
The iter() built-in calls __iter__. The next() built-in calls __next__. Everything else in Python is built on top of these two primitives.
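You can drive the protocol entirely by hand with these two built-ins; a quick sketch:

```python
it = iter([1, 2])

print(next(it))  # 1
print(next(it))  # 2

# One more call raises StopIteration - the bookmark is spent
try:
    next(it)
except StopIteration:
    print("exhausted")
```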
2. The Mechanics: Deconstructing the for Loop
Here is the lie Python tells you every day:
# What you write (The Sugar)
for item in [1, 2, 3]:
    print(item)
Here is what Python actually executes:
# What Python runs (The Reality)
_iter = iter([1, 2, 3])        # Step 1: Call __iter__, get an iterator
while True:                    # Step 2: Loop forever
    try:
        item = next(_iter)     # Step 3: Call __next__, get the next item
        print(item)            # Step 4: Execute the loop body
    except StopIteration:      # Step 5: Iterator is exhausted
        break                  # Step 6: Exit the loop
That's it. That's the entire for loop. No magic. No special knowledge of lists or tuples or strings. Just a mechanical protocol: call __iter__ once, call __next__ repeatedly until StopIteration is raised.
The Critical Insight: The Loop is Blind
The for loop does not care about your data structure. It cannot see whether you gave it a list, a database cursor, a network socket, or a custom class you wrote this morning.
It only asks one question: "Does this object speak the iterator protocol?"
If __iter__ and __next__ exist, the loop works. This is Python's duck typing philosophy applied at the language level. No inheritance required. No registration required. Just implement two methods.
# COMPLETELY custom iteration - no list, no built-in involved
class Countdown:
    def __init__(self, start: int) -> None:
        self.current = start

    def __iter__(self):
        return self  # More on this shortly

    def __next__(self) -> int:
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value
# The for loop has NO IDEA this is a custom class
for n in Countdown(5):
    print(n)  # 5, 4, 3, 2, 1
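One subtlety worth flagging: because Countdown returns self from __iter__, a single instance is one-shot. Loop it twice and the second loop yields nothing. A sketch of a re-iterable variant (the class name ReusableCountdown is mine, not a standard one) makes __iter__ a generator function, so each loop gets a fresh iterator:

```python
class ReusableCountdown:
    """An ITERABLE, not an iterator: every __iter__ call starts over."""

    def __init__(self, start: int) -> None:
        self.start = start

    def __iter__(self):
        # Generator function: each call builds a brand-new iterator,
        # so one instance supports any number of independent loops
        n = self.start
        while n > 0:
            yield n
            n -= 1


cd = ReusableCountdown(3)
print(list(cd))  # [3, 2, 1]
print(list(cd))  # [3, 2, 1] - still works; a fresh bookmark each time
```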
3. Why Does __iter__ Return self?
If you look carefully at the Countdown class above, you'll notice something odd: __iter__ returns self. An iterator returning itself as an iterator.
If an iterable is a factory that creates iterators, and an iterator is a consumer that gets exhausted... why does an iterator pretend to be an iterable?
The Answer: Polymorphism
Consider this function:
def print_first_three(iterable):
    it = iter(iterable)  # Always call iter() first
    print(next(it))
    print(next(it))
    print(next(it))
If we follow the "always call iter() first" convention, this function works with both iterables and iterators:
# Works with an iterable (list)
print_first_three([10, 20, 30, 40, 50])
# Works with an iterator (generator, file, network stream)
my_stream = iter([10, 20, 30, 40, 50])
print_first_three(my_stream)
The second call works because when you call iter() on an iterator, it returns self. The protocol stays consistent. The caller doesn't need to know—or care—whether it received a fresh list or a half-consumed stream.
This is the rule, codified:
Iterable: __iter__() → returns a NEW Iterator (fresh reading session)
Iterator: __iter__() → returns SELF (I am already the session)
| Object | __iter__ | __next__ |
|---|---|---|
| list, tuple, str | Returns a new iterator object (e.g. list_iterator) | ✗ Not present |
| list_iterator | Returns self | ✓ Advances position |
| Generator | Returns self | ✓ Runs until yield |
| File object | Returns self | ✓ Reads next line |
In short: Every iterator is also iterable. Not every iterable is an iterator.
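The rule is directly observable with identity checks; a quick sketch:

```python
lst = [1, 2, 3]

# An iterable hands out a NEW iterator on every iter() call
assert iter(lst) is not iter(lst)

# An iterator hands back itself
it = iter(lst)
assert iter(it) is it

# Which is why a half-consumed iterator resumes where it left off
next(it)               # consume the 1
print(list(iter(it)))  # [2, 3] - same session continues
```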
4. The "One-Shot" Trap
Here is a bug that has shipped to production more times than anyone will admit:
squares = map(lambda x: x**2, [1, 2, 3])
print(list(squares)) # [1, 4, 9] ✓
print(list(squares)) # [] ✗ Silent failure
The second call returns an empty list. No error. No warning. Just silently wrong data.
Why? Because map() returns an iterator—a bookmark, not a book. The first list() call consumes every item, advancing the bookmark to the end. The second call finds the iterator exhausted. StopIteration is raised immediately. list() catches it and returns [].
And this trap is not unique to map():
# All of these are ONE-SHOT iterators:
squares = map(lambda x: x**2, range(5))
evens = filter(lambda x: x % 2 == 0, range(10))
pairs = zip([1, 2, 3], ['a', 'b', 'c'])
lines = open("data.txt") # File objects too!
The Fix: Know What You Have
The decision tree is simple:
Do I need to iterate this data multiple times?
├── YES → Store it: data = list(my_iterator)
└── NO → Stream it: consume it once, discard it
# If you need multiple passes: materialize to list
squares = list(map(lambda x: x**2, [1, 2, 3]))
print(squares) # [1, 4, 9]
print(squares) # [1, 4, 9] - Works! It's a list now.
# If one pass is enough: stay lazy, save memory
for square in map(lambda x: x**2, range(10_000_000)):
    process(square)  # Never loads 10M items into RAM
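There is also a middle ground worth knowing: itertools.tee splits one iterator into multiple independent ones. It buffers only the gap between the slowest and fastest consumer, which is cheaper than a full list when the copies are consumed roughly in step. A quick sketch:

```python
import itertools

squares = map(lambda x: x**2, [1, 2, 3])
a, b = itertools.tee(squares)  # two independent bookmarks
# (don't touch the original `squares` iterator after tee)

print(list(a))  # [1, 4, 9]
print(list(b))  # [1, 4, 9] - b kept its own position
```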
The Generator Version of the One-Shot Trap
Generator functions return generator objects, which are iterators, so they show the same one-shot behavior:
def squares(n: int):
    for i in range(n):
        yield i ** 2
gen = squares(5)
print(sum(gen)) # 30 ✓
print(sum(gen)) # 0 ✗ - exhausted!
Every time you call squares(5), you get a new generator object—a fresh bookmark. But if you assign it to a variable and reuse that variable, you're sharing one bookmark between two consumers.
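The fix follows directly: call the generator function again whenever you need a fresh pass. A sketch reusing the same squares definition:

```python
def squares(n: int):
    for i in range(n):
        yield i ** 2

print(sum(squares(5)))  # 30
print(sum(squares(5)))  # 30 - a new generator object each call
```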
5. The Toolkit: itertools — Don't Reinvent the Wheel
Once you think in streams, you'll want to transform, combine, and slice them. Python's itertools module is the standard library's toolkit for exactly this—lazy by default, and almost always in constant memory.
The itertools module is high-performance C code exposed to Python. Using it is almost always faster than writing your own loop logic.
Here are the three tools you'll reach for every week:
itertools.count — Infinite Streams
import itertools
# Infinite counter: 10, 11, 12, 13, ...
counter = itertools.count(start=10, step=1)
# Always pair with islice or a break condition
for n in counter:
    if n > 15:
        break
    print(n)  # 10, 11, 12, 13, 14, 15
Use this anywhere you'd use while True with a manual counter.
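The break-based loop above can also be collapsed into a single expression by pairing count with islice; a sketch:

```python
import itertools

# The same six values, no manual break needed
print(list(itertools.islice(itertools.count(start=10), 6)))
# [10, 11, 12, 13, 14, 15]
```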
itertools.chain — Linking Streams
import itertools
# Process multiple files as one continuous stream
# NO intermediate list created
log_lines = itertools.chain(
    open("january.log"),
    open("february.log"),
    open("march.log"),
)
for line in log_lines:
    process(line)  # Streams through all three files sequentially
The alternative—open("jan") + open("feb")—doesn't work: file objects don't support +. chain is the correct tool for combining any iterables without materializing them.
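A related sketch: when the streams arrive as a single iterable rather than as separate arguments, chain.from_iterable does the same job without unpacking. Plain lists stand in for file objects here:

```python
import itertools

# chain.from_iterable takes ONE iterable of iterables - handy when the
# inputs are produced lazily and you can't splat them with *
streams = ([1, 2], [3, 4], [5])
merged = itertools.chain.from_iterable(streams)

print(list(merged))  # [1, 2, 3, 4, 5]
```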
itertools.islice — Safe Consumption
import itertools
# Take the first 5 from any iterator, safely
def squares():
    n = 0
    while True:
        yield n ** 2
        n += 1
# Without islice, this would run forever
first_five = list(itertools.islice(squares(), 5))
print(first_five) # [0, 1, 4, 9, 16]
islice is the lazy equivalent of my_list[:5]. It doesn't generate all items first—it stops consuming after n items.
The Combinators at a Glance
| Tool | What it does | Memory |
|---|---|---|
| count(n) | Infinite incrementing stream | O(1) |
| cycle(it) | Repeat iterable forever | O(n) |
| repeat(x, n) | Yield x exactly n times | O(1) |
| chain(*its) | Concatenate iterables | O(1) |
| islice(it, n) | Take first n items lazily | O(1) |
| takewhile(pred, it) | Take while condition holds | O(1) |
| dropwhile(pred, it) | Skip while condition holds | O(1) |
| batched(it, n) | Group into n-sized chunks (Python 3.12+) | O(n) |
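These tools compose. A small sketch chaining three of them into one lazy pipeline:

```python
import itertools

# An infinite stream, trimmed at both ends - nothing is materialized
# until list() pulls the surviving window through
nums = itertools.count()                                  # 0, 1, 2, ...
from_ten = itertools.dropwhile(lambda n: n < 10, nums)    # skip 0-9
window = itertools.takewhile(lambda n: n < 15, from_ten)  # stop at 15

print(list(window))  # [10, 11, 12, 13, 14]
```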
Conclusion
Let's close with the mental model shift.
Junior mindset: A for loop iterates over a collection.
Senior mindset: A for loop drives a protocol. Any object that speaks the protocol—regardless of what it is or where the data lives—can be iterated. The data could be in a list in RAM, a file on disk, a socket on the network, a database cursor, or computed on the fly from a mathematical formula. The loop doesn't care.
This is the power:
# These all work identically with the for loop:
for item in [1, 2, 3]: # In memory
for line in open("data.csv"): # On disk
for record in database.execute("SELECT ..."): # In database
for packet in socket.recv_packets(): # On network
for value in itertools.count(): # Computed infinitely
Same protocol. Same syntax. Radically different resource profiles.
And the payoff in memory:
# Junior code: loads entire file
def count_errors(path: str) -> int:
    lines = open(path).readlines()  # 10GB in RAM 💀
    return sum(1 for line in lines if "ERROR" in line)
# Senior code: streams the file
def count_errors(path: str) -> int:
    with open(path) as f:  # ~constant memory ✓
        return sum(1 for line in f if "ERROR" in line)
Both produce the same number. One will kill your server at 3AM.
Your Action Item
Go find that one function in your codebase that does this:
data = list(some_query_or_file_or_api_call())
for item in data:
    process(item)
If you never use data again after the loop, that list() call is pure waste. Delete it. Stream directly. Your memory graph will thank you.
The for loop doesn't need the list. It never did.
This is part of an ongoing series on Python internals. If this post changed how you think about loops, follow along.