Generator and yield are used frequently in Python. In this article, let’s discuss some basics of generator
, the benefit for generator, and how we use yield
to create a generator.
At the same time, we study two concepts in computer science: lazy evaluation and stream.
Iterables
First, we need to know iterables
before understanding generator
. Because generator
is also an iterator in essential.
In Python, iterable is an object which can be iterated over , such as in a for-loop.
Most collection data structures are iterables, such as list, tuple, set. For example, we create a list and iterate it one by one:
lst = [1, 2, 3]
for i in lst:
print(i)
# 1
# 2
# 3
lst = [x+x for x in range(3)]
for x in lst:
print(x)
# 0
# 2
# 4
We can also iterate the characters in a string:
string = "cat"
for c in string:
print(c)
# c
# a
# t
The limitation of iterable
The limitation of iterable is that we need to store all the values in memory, before we begin to iterate over them. This will cost too much memory in some scenarios. A typical scenario is reading lines from a file:
def file_reader(file_path):
fp = open(file_path)
return fp.read().split("\n")
Think about what will happen if we read a large file, such as a file with a size of 6 GB?
We need to save all the lines in memory when loading content from the file.
Actually, in most cases, we only want to iterate line by line to finish some data processing tasks. We maybe break out the loop in advance, loading all lines into memory does not necessary.
Could we just have a strategy of reading data by need? Python introduced generator
to solve this problem.
Generator
A generator is also iterator, but its key feature is Lazy evaluation. Lazy evaluation is a classic concept in computer science and adopted by many programming languages such as Haskell. The core idea of lazy evaluation is call-by-need. Lazy evaluation can lead to a reduction in memory footprint.
A generator is an iterator in the style of iterating by need. We will not calculate and store the values at once, but generate them on the fly when we are iterating.
We have two ways to create a generator
, generator expression and generator function.
generator expression
is similar with list comprehension, except we use ()
. Similar as iterator, we use next
function to get the next item:
g1 = (x*x for x in range(10))
print(type(g1))
print(next(g1))
print(next(g1))
# <type 'generator'>
# 0
# 1
The difference here is we don’t compute all the values when creating the generator. x*x
is calculated when we are iterating the generator.
To understand the difference, let’s run this code snippet:
>>> timeit.timeit('lst = [time.sleep(1) for x in range(5)]', number=2)
10.032547950744629
>>> timeit.timeit('lst = (time.sleep(1) for x in range(5))', number=2)
1.0013580322265625e-05
As we can see from the result, when we create an iterable, it costs about 10 seconds. Because we execute the time.sleep(1) 10 times.
But time.sleep(1) actually does not run when we are creating generator.
Yield
Another way to create generator
is by using generator function, we use keyword yield
to return a generator in a function.
Let’s check out this fib
function which returns a generator with N Fibonacci numbers:
def fib(cnt):
n, a, b = 0, 0, 1
while n < cnt:
yield a
a, b = b, a + b
n = n + 1
g = fib(10)
for i in range(10):
print g.next(),
# 0 1 1 2 3 5 8 13 21 34
Let’s use yield
to rewrite above file reading program:
def file_reader(file_path):
for row in open(file_path, "r"):
yield row
for row in file_reader('./demo.txt'):
print(row),
In this way, we won’t load all the content into memory, but loading it by reading the lines.
Stream
With a generator, we could construct a data structure with infinite items. This kind of sequence of data elements is called Stream in computer science. It’s fancy because we could express the infinite concept in mathematical.
Suppose we want a sequence with all the Fibonacci numbers. How to achieve this?
We only need to remove the count parameter from the above function!
def all_fib():
n, a, b = 0, 0, 1
while True:
yield a
a, b = b, a + b
n = n + 1
all_fib_numbers = all_fib()
Yes! We get a variable which could stand for all the Fibonacci numbers. Let’s write a generic function to take n items from any stream.
def take(n, seq):
result = []
try:
for i in range(n):
result.append(next(seq))
except StopIteration:
pass
return result
So, take(all_fib_numbers, 10)
will return the first 10 Fibonacci numbers as a result.
Conclusion
Generator
in Python is a powerful tool for delaying computation and saving both time and space. The core idea of lazy evaluation is: compute it until you really need it. This also helps us to express the infinite concepts.
The post Python: Generator, Yield and Stream appeared first on Coder's Cat.
Top comments (1)
Very well written, short and precise.