Python Iterators: Clearing the Lazy/Eager Confusion

#python #performance #oop #architecture

Python Iterators: Clearing the Lazy/Eager Confusion

As a backend developer, I sometimes help companies evaluate candidates by reviewing their recorded technical interviews. However, over time, I’ve noticed a deeply ingrained misconception. When discussing memory management or data streaming, many developers explicitly state:

"Iterators in Python are inherently eager. If you want true lazy loading or lazy evaluation, you have to use generators and the yield keyword."

This misconception is common. Many popular bootcamps and online courses introduce lazy evaluation exclusively through generators. Custom class-based iterators are usually skipped or dismissed as boilerplate-heavy OOP theory rarely used in production Python.

This confusion is further reinforced by two common educational simplifications:

The List vs. Generator Expression Analogy: Beginners are taught that square brackets [...] (list comprehensions) are eager and take up memory, while parentheses (...) (generator expressions) are lazy. This often creates a false binary mental model: "generators = lazy, everything else = eager."
Standard "Textbook" Examples: When courses demonstrate a custom iterator, they usually write a basic class that accepts an already fully loaded list in its __init__ and simply increments an index in __next__. While this is valid for in-memory data, it leads developers to assume that custom iterators inherently require loading all data upfront.

In reality, generators are a specialized language feature designed to implement the iterator protocol automatically. They comply with the exact same interface (__iter__ and __next__). A generator is lazy not because of some magical property of the yield keyword, but simply because it adheres to this underlying contract.

To show that custom iterators can be lazy without using any generators or yield keywords, I’ve put together a lightweight and reproducible benchmark.

🧪 The Experiment: Proving Lazy Loading with Custom Iterators

Suppose we need to read a database export file (test_users_db.txt) containing 1_000_000 rows of user data. We want to iterate over these users but stop immediately once we find a specific record (at HALF_ROWS=500_000). If custom iterators were truly "eager," our lazy implementation would significantly increase RAM consumption.

💡 Please note: I’m using a local text file here as a simplified data source. However, in a production system, this exact same pattern is used to stream rows from a heavy SQL database cursor, fetch paginated data from an external HTTP API, or consume messages from a queue.

Let's look at the code:

from __future__ import annotations
import tracemalloc
from dataclasses import dataclass
from pathlib import Path

FILE_PATH = Path("test_users_db.txt")
TOTAL_ROWS = 1_000_000
HALF_ROWS = TOTAL_ROWS // 2


def generate_test_file(path: Path, rows: int = TOTAL_ROWS) -> None:
    if not path.exists():
        with open(path, "w", encoding="utf-8") as f:
            for i in range(1, rows + 1):
                f.write(f"{i},User_Name_{i}\n")


@dataclass(frozen=True)
class UserDTO:
    id: int
    name: str


# --- LAZY IMPLEMENTATION ---
class LazyUserGateway:
    def __init__(self, file_object):
        self.file = file_object

    def __iter__(self) -> LazyUserIterator:
        return LazyUserIterator(self.file)


class LazyUserIterator:
    def __init__(self, file_object):
        self.file = file_object

    def __iter__(self) -> LazyUserIterator:
        return self

    def __next__(self) -> UserDTO:
        # Step-by-step stream: reading only one line into memory at a time
        line = self.file.readline()
        if not line:
            raise StopIteration

        user_id, name = line.strip().split(",")
        return UserDTO(id=int(user_id), name=name)


# --- EAGER IMPLEMENTATION ---
class EagerUserGateway:
    def __init__(self, file_object):
        self.file = file_object

    def __iter__(self) -> EagerUserIterator:
        return EagerUserIterator(self.file)


class EagerUserIterator:
    def __init__(self, file_object):
        # The readlines() call forces Python to build a massive list of strings in RAM upfront
        self.lines = file_object.readlines()
        self.index = 0

    def __iter__(self) -> EagerUserIterator:
        return self

    def __next__(self) -> UserDTO:
        if self.index >= len(self.lines):
            raise StopIteration

        line = self.lines[self.index]
        self.index += 1

        user_id, name = line.strip().split(",")
        return UserDTO(id=int(user_id), name=name)


# --- BENCHMARKS ---
def test_eager_gateway() -> float:
    tracemalloc.start()

    with open(FILE_PATH, "r", encoding="utf-8") as f:
        gateway = EagerUserGateway(f)
        for user in gateway:
            if user.id == HALF_ROWS:
                print(f"[Eager] Final reached: {user}")
                break

    current, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak / (1024 * 1024)


def test_lazy_gateway() -> float:
    tracemalloc.start()

    with open(FILE_PATH, "r", encoding="utf-8") as f:
        gateway = LazyUserGateway(f)
        for user in gateway:
            if user.id == HALF_ROWS:
                print(f"[Lazy]  Final reached: {user}")
                break

    current, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak / (1024 * 1024)


if __name__ == "__main__":
    generate_test_file(FILE_PATH)
    print("\n--- Memory Benchmark Running ---")

    peak_eager = test_eager_gateway()
    print(f"-> Peak Memory (EagerUserGateway): {peak_eager:.2f} MB\n")

    peak_lazy = test_lazy_gateway()
    print(f"-> Peak Memory (LazyUserGateway):  {peak_lazy:.2f} MB\n")

    if FILE_PATH.exists():
        FILE_PATH.unlink()

📊 Benchmark Results

Running the script yields the following terminal output:

--- Memory Benchmark Running ---
[Eager] Final reached: UserDTO(id=500000, name='User_Name_500000')
-> Peak Memory (EagerUserGateway): 69.97 MB

[Lazy]  Final reached: UserDTO(id=500000, name='User_Name_500000')
-> Peak Memory (LazyUserGateway):  0.15 MB

This represents a ~500x reduction in RAM consumption (dropping from ~69.97MB to 0.15MB). While the eager implementation scales linearly with the dataset size, the lazy iterator maintains a flat, near-zero memory footprint regardless of the row count.

🧠 Iterators vs. Generators: Contract vs. Convenience

In practice, generators are the industry standard for a reason. In most daily tasks, writing a quick generator function with yield is much faster, safer, and cleaner than writing custom iterator classes. The generator object itself handles the boilerplate, implicitly saves execution state, and natively facilitates resource cleanup.

But as engineers, we shouldn't confuse implementation-specific capabilities with a structural interface contract, especially when designing architectural boundaries:

An Iterator is just a contract (interface). It guarantees that data can be traversed step-by-step via __next__. How that data is managed — whether the iterator eagerly fetches it upfront, lazily streams it from an API, or simply acts as a pointer to data passed to it from the outside — is entirely up to the developer who implements it. An iterator doesn't even need to own or fetch data; it just drives the traversal logic.
A Generator is a specialized language feature that auto-implements this protocol while supporting additional lifecycle and communication methods (via .send(), .throw(), and .close()). For basic data streaming, you don't need this additional complexity. A custom iterator class gives you a clean, one-way contract strictly limited to data traversal, keeping your architectural boundaries well-defined.

Custom class-based iterators deliver the exact same polymorphic interface. They aren't meant to replace generators everywhere. Instead, they prove that lazy evaluation is an architectural design choice, not a magic trick powered strictly by yield.

🏁 Conclusion

The "iterators are eager" take is just a byproduct of simplified educational content. Diving deeper into core protocols helps us write much better, more predictable architectural layers.

💬 What about you? Have you ever heard that custom Python iterators are inherently eager?