There's a function that exists in almost every Python codebase. It looks harmless:
```python
def get_trades(symbol: str) -> list[dict]:
    results = []
    for record in enormous_database_cursor:
        if record["symbol"] == symbol:
            results.append(record)
    return results

trades = get_trades("AAPL")  # Waits. Waits. Waits. Then crashes.
```
The problems stack up fast:
- High latency: You get zero items until the entire database is scanned
- Massive RAM: Every matching record is held in memory simultaneously
- Fragility: One spike in result size kills the process
This is the "eager" pattern: do all the work, collect all the results, then hand them over. For small datasets, you'll never notice. For anything real-world, it's a time bomb.
The fix is a single keyword. But to use it correctly, you need to understand what it actually does to your function.
1. The Basics: What yield Does to a Function
Every Python function you've written follows the same lifecycle:
Call → Execute → return value → Stack frame is destroyed → Done
Local variables evaporate. State is gone. The function has no memory that it ever ran.
yield breaks this contract entirely.
The Normal Function: A Sprint
```python
def countdown_list(n: int) -> list[int]:
    result = []
    while n > 0:
        result.append(n)
        n -= 1
    return result  # Hands you everything at once, then dies
```
One call. One massive result. The function's stack frame is created, used, and destroyed.
The Generator: A Pause Button
```python
def countdown(n: int):
    while n > 0:
        yield n  # Pause here, hand back n, wait to be resumed
        n -= 1
```
The moment Python sees yield in a function body, the rules change. Calling countdown(5) no longer executes a single line of code. Instead, Python hands you back a generator object: a suspended, ready-to-run machine.
```python
gen = countdown(5)
print(gen)        # <generator object countdown at 0x7f...>

print(next(gen))  # 5: runs until yield, pauses, returns 5
print(next(gen))  # 4: resumes, runs until yield, pauses, returns 4
print(next(gen))  # 3: same again
```
What makes this possible? When a generator pauses at yield, its entire stack frame (local variables, the current instruction pointer, the value of n) is moved from the stack to the heap. It doesn't disappear. It waits, frozen in time, until next() is called again.
```
Normal function: [Stack Frame] → return → [Destroyed]

Generator:       [Stack Frame] → yield → [Moved to heap, frozen]
                                              │
                          next() called → [Thawed, execution resumes]
                                              │
                                   yield → [Frozen again]
```
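You don't have to take the freezing on faith. The standard library's inspect.getgeneratorstate reports exactly which phase a generator is in; a minimal sketch:

```python
import inspect

def countdown(n: int):
    while n > 0:
        yield n
        n -= 1

gen = countdown(3)
print(inspect.getgeneratorstate(gen))  # GEN_CREATED: never started
next(gen)
print(inspect.getgeneratorstate(gen))  # GEN_SUSPENDED: frame frozen on the heap
for _ in gen:                          # drain the remaining values
    pass
print(inspect.getgeneratorstate(gen))  # GEN_CLOSED: frame gone for good
```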
Old Way vs. New Way: Side by Side
Before generators, you had to implement the iterator protocol manually, as a verbose class with __iter__ and __next__:
```python
# THE OLD WAY: Class-based iterator (a dozen lines of boilerplate)
class Countdown:
    def __init__(self, start: int) -> None:
        self.current = start

    def __iter__(self):
        return self

    def __next__(self) -> int:
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value
```
```python
# THE NEW WAY: Generator function (4 lines, zero boilerplate)
def countdown(n: int):
    while n > 0:
        yield n
        n -= 1
```
Same behavior. Same memory efficiency. Same protocol compatibilityβcountdown(5) works anywhere Countdown(5) does.
The yield keyword gives you a fully implemented class-based iterator for the price of one line.
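Because both versions satisfy the same protocol, they are interchangeable anywhere iteration happens. A quick sketch to confirm:

```python
class Countdown:
    def __init__(self, start: int) -> None:
        self.current = start

    def __iter__(self):
        return self

    def __next__(self) -> int:
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

def countdown(n: int):
    while n > 0:
        yield n
        n -= 1

# Both work with for loops, list(), sum(), unpacking...
print(list(Countdown(5)))  # [5, 4, 3, 2, 1]
print(list(countdown(5)))  # [5, 4, 3, 2, 1]
print(sum(countdown(5)))   # 15
```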
2. Infinite Data Pipelines: The "Pull" Model
Here's where generators move from "interesting" to "indispensable."
Consider the difference:
```python
import sys

# Eager: allocate the entire sequence in RAM
big_list = [x ** 2 for x in range(1_000_000)]
print(f"List size: {sys.getsizeof(big_list):>12,} bytes")
# List size:    8,448,728 bytes (~8 MB)

# Lazy: a tiny object that knows *how* to produce values
big_gen = (x ** 2 for x in range(1_000_000))
print(f"Generator size: {sys.getsizeof(big_gen):>12,} bytes")
# Generator size: 104 bytes
```
Eight megabytes vs. 104 bytes. The generator doesn't store the squares; it stores the recipe for producing the next one. Scale this to 10GB of log files or a live market feed, and this difference is what separates a working system from a crashed one.
Generator Expressions: Lazy List Comprehensions
The (x for x in ...) syntax is a generator expression, the lazy sibling of the list comprehension.
```python
# List comprehension: eager, executes immediately
squares_list = [x ** 2 for x in range(10)]  # All 10 computed NOW

# Generator expression: lazy, executes on demand
squares_gen = (x ** 2 for x in range(10))   # NONE computed yet
```
Square brackets → list (eager). Parentheses → generator (lazy).
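Laziness pays off most when you only need an aggregate: functions like sum() consume a generator expression one value at a time, so the full sequence never exists in memory. A minimal sketch:

```python
# sum() pulls one square at a time; no intermediate list is ever built
total = sum(x ** 2 for x in range(1_000_000))
print(total)
```

Note the missing brackets: passing a generator expression directly to a function needs no extra parentheses.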
The Pipeline Architecture
Generators compose naturally into pipelines, chains of lazy transformations where data flows through only when pulled from the end:
```
     Source                    Filter                     Processor
        │                         │                           │
 market_ticker() ──►  (t for t in ticker          ──►  trading logic
 [infinite stream]     if t['symbol'] == 'AAPL')       (processes one
                       [lazy filter]                    tick at a time)
```
```python
import itertools
import random

def market_ticker():
    """Simulates an infinite stream of market data."""
    symbols = ['AAPL', 'GOOG', 'MSFT', 'AMZN']
    for _ in itertools.count():
        yield {
            'symbol': random.choice(symbols),
            'price': round(random.uniform(100, 300), 2),
            'volume': random.randint(100, 10000),
        }
```
```python
# Build the pipeline: nothing executes yet
ticker = market_ticker()                                  # Source: infinite generator
aapl_only = (t for t in ticker if t['symbol'] == 'AAPL')  # Filter: lazy expression

# Data flows ONLY when we pull from the end
tick = next(aapl_only)  # NOW it runs: pulls from ticker until it finds AAPL
print(tick)             # e.g. {'symbol': 'AAPL', 'price': 172.34, 'volume': 4821}
```
Nothing ran when we built the pipeline. No data was fetched, no filtering occurred. The entire chain is dormant until we call next(). This is lazy evaluation: the pipeline pulls data through only as fast as you consume it.
This is how you process a terabyte log file with 104 bytes of working memory.
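Bounding an infinite pipeline is itself a lazy operation: itertools.islice wraps any iterator and stops after N items. A sketch with a hypothetical squares() source:

```python
import itertools

def squares():
    """Infinite lazy source of perfect squares."""
    n = 0
    while True:
        yield n * n
        n += 1

# Chain lazy stages; nothing runs until we iterate
evens = (s for s in squares() if s % 2 == 0)
first_five = list(itertools.islice(evens, 5))  # pull exactly 5 items through
print(first_five)  # [0, 4, 16, 36, 64]
```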
3. The Deep Dive: Generators as Coroutines
Everything above treats generators as producers: you pull data out of them via next().
But generators can also be consumers: you push data into them via .send(). This transforms a generator from a simple stream into a stateful processing unit, what computer scientists call a coroutine.
yield as an Expression
Normally, yield value is a statement: it sends a value out. But it can also be an expression that receives a value:
```python
def accumulator():
    total = 0
    while True:
        value = (yield total)  # Pause: send out total, wait to receive a value
        if value is not None:
            total += value
```
```python
acc = accumulator()
next(acc)        # Prime the coroutine (advance to the first yield)

acc.send(10)     # Push 10 in; total becomes 10
acc.send(20)     # Push 20 in; total becomes 30
result = acc.send(5)
print(result)    # 35
```
The priming step (next(acc)) is required. A fresh generator is frozen at the start of the function, before any yield has been reached. You must advance it to the first yield before you can send anything to it.
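Skipping the priming step is a common first mistake, and Python raises a TypeError rather than guessing. A minimal sketch:

```python
def accumulator():
    total = 0
    while True:
        value = (yield total)
        if value is not None:
            total += value

acc = accumulator()
try:
    acc.send(10)      # Not primed: there is no paused yield to deliver 10 to
except TypeError as e:
    print(e)          # can't send non-None value to a just-started generator
next(acc)             # Prime: advance to the first yield
print(acc.send(10))   # 10
```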
The Four Generator Controls

| Operation | Syntax | What it does |
|---|---|---|
| Pull | `next(gen)` | Resume, run until the next `yield`, return the yielded value |
| Push | `gen.send(val)` | Resume with `val` as the result of `yield`, run until the next `yield` |
| Throw | `gen.throw(ExcType)` | Resume by raising an exception at the `yield` point |
| Close | `gen.close()` | Throw `GeneratorExit` into the generator, shut down cleanly |
The .throw() method is particularly powerful. Instead of crashing your pipeline when bad data appears, you can inject the error directly at the coroutine's pause point and let it handle recovery internally.
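Here is that recovery pattern in isolation, as a minimal sketch (resilient is a tiny hypothetical coroutine, not the trading bot):

```python
def resilient():
    """Echo coroutine that survives injected errors."""
    while True:
        try:
            value = yield
            print(f"processed {value}")
        except ValueError as e:
            print(f"recovered from: {e}")  # handled right at the yield point

r = resilient()
next(r)                           # prime
r.send(1)                         # processed 1
r.throw(ValueError("bad tick"))   # recovered from: bad tick
r.send(2)                         # processed 2 (still alive, state intact)
```

Note that .throw() returns normally once the coroutine handles the exception and reaches its next yield, so the pipeline simply carries on.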
Building a Finite State Machine with yield
A coroutine's "current line number" is its state. No state variables. No if state == "WATCHING" branching at the top. The control flow itself encodes the state.
```python
def trading_bot(entry_threshold: float = 150.0,
                exit_threshold: float = 200.0):
    """
    A coroutine FSM with two states:
      WATCHING: waiting for a low price to enter a position
      ACTIVE:   holding a position, waiting to exit at a profit
    """
    print("[BOT] Initialized. State: WATCHING")
    entry_price: float = 0.0
    while True:
        try:
            # ── STATE: WATCHING ──────────────────────────────
            while True:
                price: float = (yield)  # Wait for next tick
                print(f"[WATCHING] AAPL @ ${price:.2f}")
                if price <= entry_threshold:
                    entry_price = price
                    print(f"[SIGNAL] Entry at ${entry_price:.2f}; switching to ACTIVE")
                    break  # Transition to ACTIVE

            # ── STATE: ACTIVE ────────────────────────────────
            while True:
                price = (yield)  # Wait for next tick
                pnl = price - entry_price
                print(f"[ACTIVE] AAPL @ ${price:.2f} | PnL: ${pnl:+.2f}")
                if price >= exit_threshold:
                    print(f"[SIGNAL] Exit at ${price:.2f} | Profit: ${pnl:.2f}; switching to WATCHING")
                    break  # Transition back to WATCHING
        except ValueError as e:
            # Bad tick injected via .throw(): reset to WATCHING without crashing
            print(f"[ERROR] Bad data received: {e}. Resetting to WATCHING.")
            entry_price = 0.0
            # Loop continues: back to WATCHING state
```
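To watch the transitions without the full pipeline, here's a stripped-down sketch. mini_bot is a hypothetical condensed variant that yields its state name instead of printing:

```python
def mini_bot(entry: float = 150.0, exit_: float = 200.0):
    """Condensed two-state coroutine FSM: yields its current state name."""
    while True:
        while True:                    # STATE: WATCHING
            price = yield "WATCHING"
            if price <= entry:
                break                  # entry signal
        while True:                    # STATE: ACTIVE
            price = yield "ACTIVE"
            if price >= exit_:
                break                  # exit signal

bot = mini_bot()
next(bot)  # prime: lands on the WATCHING yield
for p in [180.0, 140.0, 160.0, 210.0]:
    print(p, bot.send(p))
# 180.0 WATCHING   (above entry threshold, still watching)
# 140.0 ACTIVE     (entered at 140.0)
# 160.0 ACTIVE     (holding)
# 210.0 WATCHING   (exited at 210.0)
```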
4. The Grand Finale: The High-Frequency Trading Bot
Let's wire everything together. Four components. One elegant pipeline.
The Architecture
```
┌──────────────────────────────────────────────────────────┐
│                    PIPELINE OVERVIEW                     │
│                                                          │
│  [Source]          [Filter]            [Sink]            │
│  market_ticker()──►aapl_stream ──────► trading_bot()     │
│  (Generator)       (Gen. Expression)   (Coroutine FSM)   │
│       │                 │                   │            │
│  Produces all      Passes only         Consumes AAPL     │
│  symbols lazily    AAPL ticks          ticks, manages    │
│                                        state internally  │
│                                                          │
│                      [Bridge]                            │
│                   for loop with                          │
│                 .send() / .throw()                       │
└──────────────────────────────────────────────────────────┘
```
The Complete System
```python
import itertools
import random

# ── COMPONENT 1: THE SOURCE ──────────────────────────────────────
def market_ticker():
    """Infinite stream of market ticks. Never terminates."""
    symbols = ['AAPL', 'GOOG', 'MSFT', 'AMZN']
    for _ in itertools.count():
        yield {
            'symbol': random.choice(symbols),
            'price': round(random.uniform(100, 300), 2),
        }

# ── COMPONENT 2: THE FILTER ──────────────────────────────────────
def build_pipeline():
    ticker = market_ticker()
    aapl_stream = (
        t for t in ticker
        if t['symbol'] == 'AAPL'
    )
    return aapl_stream

# ── COMPONENT 3: THE CONSUMER (FSM Coroutine) ────────────────────
# (trading_bot as defined in Section 3 above)

# ── COMPONENT 4: THE BRIDGE ──────────────────────────────────────
def run(tick_limit: int = 30) -> None:
    stream = build_pipeline()
    bot = trading_bot(entry_threshold=150.0, exit_threshold=200.0)

    # Prime the coroutine: advance it to the first yield
    next(bot)

    ticks_processed = 0
    for tick in stream:
        if ticks_processed >= tick_limit:
            break
        price = tick['price']

        # Simulate occasional bad data (1-in-10 chance)
        if random.random() < 0.1:
            bad_price = -abs(price)  # Corrupt tick: negative price
            try:
                # .throw() resumes the bot by raising at its yield point and
                # returns once the bot reaches its next yield, so no extra
                # next() call is needed afterwards.
                bot.throw(ValueError(f"Negative price: {bad_price}"))
            except StopIteration:
                print("[BRIDGE] Bot shut down during error recovery.")
                break
        else:
            try:
                bot.send(price)  # Normal operation: push price to bot
            except StopIteration:
                print("[BRIDGE] Bot has shut down.")
                break

        ticks_processed += 1

    bot.close()  # Send GeneratorExit: clean shutdown
    print(f"\n[BRIDGE] Pipeline complete. Processed {ticks_processed} AAPL ticks.")

if __name__ == "__main__":
    run(tick_limit=30)
```
Why .throw() is the Pro Pattern
Most tutorials show .send() and call it a day. But .throw() is what makes a coroutine-based FSM production-grade.
The alternative to .throw() is sentinel values:
```python
# Naive approach: use magic values to signal errors
bot.send(-1)  # Hope the bot understands -1 means "bad data"
```
This is fragile. It poisons your data channel with control signals. What if -1 is a legitimate (if unusual) price? What if you need to distinguish between different error types?
.throw() keeps the error channel and the data channel separate:
```python
# The Bridge: clear separation of concerns
for tick in stream:
    if tick['price'] < 0:
        bot.throw(ValueError(f"Corrupt tick: {tick}"))  # Error channel
    else:
        bot.send(tick['price'])                         # Data channel
```
The coroutine catches it in a try/except at its yield point, exactly like a normal function would. The state machine resets cleanly. The pipeline keeps running. Zero sentinel values. Zero ambiguity.
Conclusion: Why This Matters
Let's close with three concrete reasons this mental model changes how you write code.
1. Memory Efficiency: O(1) Space for Infinite Streams
```python
# This processes a 10GB log file in constant memory
def find_errors(path: str):
    with open(path) as f:
        yield from (line for line in f if "ERROR" in line)

for error_line in find_errors("application.log"):
    alert(error_line)  # alert() stands in for your notification hook
```
No list. No .readlines(). A single line lives in memory at a time.
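The same function is easy to exercise against a throwaway file. A sketch using a temporary file (application.log and alert() above are placeholder names):

```python
import os
import tempfile

def find_errors(path: str):
    """Yield only the ERROR lines; never loads the whole file."""
    with open(path) as f:
        yield from (line for line in f if "ERROR" in line)

# Write a small sample log to a temporary file
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as tmp:
    tmp.write("INFO boot ok\nERROR disk full\nINFO heartbeat\nERROR timeout\n")
    path = tmp.name

print([line.strip() for line in find_errors(path)])
# ['ERROR disk full', 'ERROR timeout']
os.remove(path)
```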
2. State Management: Your Line Number Is Your State
The trading_bot() coroutine has zero explicit state variables for its FSM transitions. The while True loop it's currently executing in is the state. Python's own call stack manages it.
Compare that to the class-based alternative:
```python
# The non-generator version: manual state management
class TradingBot:
    def __init__(self):
        self.state = "WATCHING"  # Explicit state
        self.entry_price = 0.0

    def process(self, price: float) -> None:
        if self.state == "WATCHING":   # State checks everywhere
            ...
        elif self.state == "ACTIVE":
            ...
```
More code. More surface area for bugs. More branching to read and maintain.
3. Composability: UNIX Pipes for Your Data
Each component in our pipeline does exactly one thing: the source generates ticks, the filter screens symbols, the bot manages trades. They're connected by convention (the iterator protocol), not by inheritance or tight coupling.
You can swap any component without touching the others:
```python
# Swap source: real broker API instead of random data
ticker = broker_api.stream()  # Same interface, different source

# Swap filter: multiple symbols
stream = (t for t in ticker if t['symbol'] in {'AAPL', 'MSFT'})

# Swap sink: logging bot instead of trading bot
bot = audit_logger(output="trades.log")  # Same .send() interface
```
Small tools. Single responsibilities. Glued by protocol.