Why Generators?
As previously announced, our journey starts from a discussion about Python generators. But why? To get a first clue, let's write a very simple async function and see its execution result.
async def journey() -> int:
    return 1
ret = journey()
print(type(ret))
<class 'coroutine'>
sys:1: RuntimeWarning: coroutine 'journey' was never awaited
Please ignore the warning for a bit. If journey were a simple synchronous function (i.e. just a normal Python function), the type would have been <class 'int'>, since the function returns 1. But in this case, we have another Python object called a "coroutine". But then what is a coroutine? To get some hints, let's open the typing module and look for typing.Coroutine.
# some parts are omitted
# ...
class Awaitable(Protocol[_T_co]):
    @abstractmethod
    def __await__(self) -> Generator[Any, None, _T_co]: ...

class Coroutine(Awaitable[_V_co], Generic[_T_co, _T_contra, _V_co]):
    # ...
So we can see that a coroutine object in Python is closely related to generators. As a matter of fact, throughout Python's history it has evolved from the concept of generator objects. Therefore I would say it is essential to understand generators in order to investigate coroutine objects in Python.
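You can even observe this family resemblance at runtime: a coroutine can be driven by hand exactly like a generator. A minimal sketch (calling send(None) manually is purely for demonstration; normally an event loop does this for you):

async def journey() -> int:
    return 1

coro = journey()
try:
    # Advance the coroutine by hand, just like a generator.
    coro.send(None)
except StopIteration as exc:
    # The return value travels inside StopIteration, as with generators.
    print(exc.value)  # 1

Since the coroutine is actually run here, the "never awaited" warning from earlier disappears.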
It was a fairly long introduction. Let’s begin with Python generators.
Then What is a Generator?
I assume you have come across the keyword "yield" or "yield from" quite often in a Python context, or at least once if you are beyond the beginner level of Python or any other programming language.
At least in the case of Python, the yield keyword always goes with the term "generator". They practically define each other: a generator must have the keyword yield, and a function with yield is recognized as a "generator function" (a function that creates a generator which executes the function's body).
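For instance, this is about the smallest generator function one can write (journey_gen is just an illustrative name):

def journey_gen():
    yield 1

gen = journey_gen()
print(type(gen))  # <class 'generator'>

Note that calling journey_gen() does not execute its body; it only creates the generator object.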
But before we look into what this yield does, let's think about the meaning of the word "generator". What does yield have to do with the word "generator"? If we go back to the explanation from the docs about the yield expression, it says:
When a generator function is called, it returns an iterator known as a generator.
Right. A generator is an iterator. But then what is an iterator in Python?
Iterator
To put it simply, an iterator is an object that provides a consistent interface for accessing the elements of a collection object, such as a list, dictionary, or set in Python (so this is not only about Python but a universal concept: consult the "Iterator" chapter in the G.o.F book). In Python specifically, any class with the __next__() dunder can be an iterator, and this __next__() method is executed when we traverse the object with a for ... in ... loop, giving the "next" element after the current one in the collection of our interest.
Remark: Some might be curious about the __iter__() dunder. In fact, it is what makes an object itera*ble*, not an itera*tor*. To borrow the words of the G.o.F, an iterable is a factory object that creates an iterator. But digging into this is somewhat outside our current context, so I will stop here. For those interested in comparing iterables and iterators in Python, there are many resources, such as the one from RealPython.
So an iterator is only concerned with tracking the next object. And this is the exact interface that generators follow.
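To make that interface concrete, here is a minimal hand-written iterator; Countdown is a hypothetical example class, not from any library:

class Countdown:
    def __init__(self, start: int) -> None:
        self.current = start

    def __iter__(self) -> "Countdown":
        # An iterator is trivially iterable: it acts as its own factory.
        return self

    def __next__(self) -> int:
        if self.current <= 0:
            # Signal the for loop that there is no "next" element.
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

for n in Countdown(3):
    print(n)  # 3, then 2, then 1

All that __next__() knows is how to hand out the next element; no full collection ever exists in memory.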
Generator as an Iterator
So a generator is an iterator. That means a generator provides a collection of data to the user by considering what comes "next". But how?
Many materials introduce generators in the context of lazy loading: a generator retrieves data only when it is necessary. To connect this feature to our brief discussion about iterators above: a generator retrieves the "next" element only when it is required; the "next" data doesn't exist in memory before the generator is asked for it.
As an example, you'll see this lazy loading behavior clearly in the following code:
from typing import TypeVar

T = TypeVar("T")

def get_element(*, element: T) -> T:
    print(f"element generated: {element}")
    return element

if __name__ == "__main__":
    collection = ['hitotsu', 2, "three", 4, "go"]

    print("--- non-generator(list comprehension) ---")
    non_generator = [get_element(element=element) for element in collection]
    for element in non_generator:
        print(f"print element: {element}")
    print("--- non-generator test ends ---")

    print("--- generator(generator expression) ---")
    generator = (get_element(element=element) for element in collection)
    for element in generator:
        print(f"print element: {element}")
    print("--- generator test ends ---")
where the result should be:
--- non-generator(list comprehension) ---
element generated: hitotsu
element generated: 2
element generated: three
element generated: 4
element generated: go
print element: hitotsu
print element: 2
print element: three
print element: 4
print element: go
--- non-generator test ends ---
--- generator(generator expression) ---
element generated: hitotsu
print element: hitotsu
element generated: 2
print element: 2
element generated: three
print element: three
element generated: 4
print element: 4
element generated: go
print element: go
--- generator test ends ---
So in a nutshell, a generator is an iterator that produces the next element on demand.
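Since a generator implements the iterator protocol, you can also drive it by hand with next(). A small sketch:

gen = (x * x for x in range(3))
print(next(gen))  # 0, computed only at this moment
print(next(gen))  # 1
print(next(gen))  # 4
next(gen)         # raises StopIteration: the generator is exhausted

This is exactly what the for loop above did for us behind the scenes.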
The Role of yield in Generators
Remark: the explanation in the Python documentation is somewhat succinct; see this priceless YouTube video by ByteByteGo for understanding the concepts of yield and coroutines.
However, that doesn’t seem to be a special benefit of using generators instead of ordinary iterators. If we are concerning about heavy computation of each element(which is usually the reason we apply the concept of lazy loading), we could simply workaround by using ordinary iterators rather than directly computing the next element in advance, in order to implement lazy loading.
So rather than preparing for
data = [heavy1, heavy2, ..., heavy10]
we could just run
for i in range(10): heavy = heavy_computation
But then why would we still consider generators valuable? Note that we haven't even discussed yield yet!
Say we want to produce a collection of data, where each element is the result of some kind of heavy computational process. If we simply use a normal iterator such as range, we can write code like this:
class HeavyComputationResult:
    # this is just for type annotation!
    ...

def heavy_computation(*, arg: int) -> HeavyComputationResult:
    local_heavy_var = ...
    # ...
    return something

for i in range(5):
    something = heavy_computation(arg=i)
    # do something else
But this means that for every iteration we need to call this heavy_computation function, and every call sets up a fresh stack frame. This is a computational burden: the CPU not only has to provide the stack memory, but also has to recompute local variables (like local_heavy_var) that might not change throughout the whole sequence of calls.
That's where yield comes in as a solution. If you read the docs or any of the other materials I introduced on the yield expression, you'll see that yield preserves the call stack and only hands over the control flow, so we don't have to compute local variables redundantly.
So our code could be improved as below:
from typing import Iterator

def heavy_computation_generator(*, iter_times: int) -> Iterator[HeavyComputationResult]:
    local_heavy_var = ...
    for i in range(iter_times):
        yield something
So local_heavy_var is computed only once within this generator function, and now we save memory and time altogether.
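To see this effect in runnable form, here is a minimal sketch; heavy_results and the time.sleep() stand-in are mine, purely for illustration:

import time
from typing import Iterator

def heavy_results(*, iter_times: int) -> Iterator[int]:
    print("expensive setup starts")
    time.sleep(1)         # stand-in for the heavy, unchanging computation
    local_heavy_var = 42  # computed exactly once
    print("expensive setup done")
    for i in range(iter_times):
        # The stack frame, including local_heavy_var, survives between yields.
        yield local_heavy_var + i

for result in heavy_results(iter_times=3):
    print(f"got: {result}")

The two setup lines print only once while "got: ..." prints three times: the frame is paused and resumed, never rebuilt.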
Simple(=“Classic”) Coroutine
By now, our generator only produces (= "generates") data, but how is this related to async APIs after all? You'll remember that our discussion about generators originally started from coroutines. Yet if you search for "Coroutine Python" on Google, most of the materials out there come with keywords such as async or await, which obviously means you're reading about async APIs. So where is the gap between generators and coroutines in Python?
From PEP 342, you can discover some clues about this gap. The Motivation part clearly says that:
- a generator does have "pausing" functionality (with yield), but it only produces output data and cannot consume input data (see the sketch after this list)
- this brings many limitations, one of the main ones being that the programmer cannot fully control the flow of logic, since a generator doesn't "listen" to (= receive input from) the programmer
- one implication is that generators cannot communicate with each other well enough; i.e. maintaining stack frames while exchanging control flow is fairly hard to implement, since we cannot directly manipulate a running generator
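As a quick preview of what PEP 342 added to close this gap (the next article covers it in depth), here is a minimal sketch of a generator that also consumes input through send(); running_total is a hypothetical name for illustration:

from typing import Generator

def running_total() -> Generator[int, int, None]:
    total = 0
    while True:
        # Since PEP 342, yield is an expression: it can also *receive* a value.
        received = yield total
        total += received

coro = running_total()
next(coro)            # "prime" the coroutine: run it to the first yield
print(coro.send(10))  # 10
print(coro.send(5))   # 15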
So as PEP 342 indicates, the coroutine concept in Python was invented on top of generators. This transitional concept is called a "simple coroutine" or a "classic coroutine"; here we will call it "simple", following PEP 342. To fully understand how a "native coroutine" works in Python, we need to look into this simple coroutine first, which will be discussed in the next article.
Conclusion
In this post we have talked about what a generator is: an iterator object that retrieves data on demand while maintaining its stack frame for better performance.
Please stay tuned for the next article, "Generators as Coroutines".