Python Tips #1: Generator And Yield Keyword In 5 Minutes
A Python generator gives us a lazy iterator, a feature introduced in PEP 255. The word “lazy” hints at the idea: the code that produces each value runs only when we ask for that value.
That may still sound abstract, so here is an analogy. Imagine we have a 100-page book, but we are busy and can’t read all day. So we decide to read one page per day, mark the current page with a bookmark, and stop. We repeat this every day, and after 100 days we have read all 100 pages. A generator works the same way: it produces one value, remembers where it stopped, and continues from there the next time we ask.
Creating A Generator
There are two primary ways to create a generator in Python: a generator function and a generator expression. Let’s see how each one works and how they differ.
Generator Function
What makes a generator function different from a regular function is the yield keyword. Below is a simple generator function:
def fib():
    a, b = 0, 1
    while True:
        yield b
        a, b = b, a + b
In this example we use yield instead of return. The yield keyword temporarily suspends the function and hands the current value back to the caller. The local variables and execution state are saved, and the function resumes from that point each time we call next() on the generator.
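As a minimal sketch of how the suspended state resumes, the snippet below repeats the fib() generator and pulls values from it with the built-in next() and with itertools.islice. Using islice is just one convenient way to take a bounded slice of an infinite generator, and the names gen and first_ten are only illustrative.

```python
from itertools import islice


def fib():
    a, b = 0, 1
    while True:
        yield b
        a, b = b, a + b


gen = fib()
first = next(gen)    # runs the body until the first yield
second = next(gen)   # resumes right after that yield

# islice stops after 10 items, so the infinite loop never runs away.
first_ten = list(islice(fib(), 10))

print(first, second)
print(first_ten)
```

Each call to fib() creates a fresh, independent generator, which is why first_ten starts again from the beginning.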
Generator Expression
Instead of defining a generator function, we can use a generator expression to create a generator quickly, in one line of code. It is useful in the same situations as a list comprehension. Let’s see the example below:
import random
# List comprehension: creating a list of random integers
list_comp = [random.randint(1,10) for i in range(10)]
## >> Output: [7, 1, 7, 1, 10, 1, 6, 7, 2, 1]
# Generator expression
gen_comp = (random.randint(1,10) for i in range(10))
## >> Output: <generator object <genexpr> at 0x1028651d0>
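Syntactically, the only difference between a list comprehension and a generator expression is the square brackets versus the parentheses. The list comprehension builds the whole list immediately, while the generator expression produces its values one at a time, on demand.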
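To make the behavioural difference concrete, here is a small sketch with deterministic values instead of random ones. The variable names are illustrative. It shows that a generator expression computes values only when asked and can be consumed only once.

```python
squares_list = [n * n for n in range(5)]   # all five values exist right away
squares_gen = (n * n for n in range(5))    # nothing has been computed yet

first = next(squares_gen)      # computes only the first value
rest = list(squares_gen)       # consumes the remaining values
leftover = list(squares_gen)   # the generator is now exhausted

print(squares_list)
print(first, rest, leftover)
```

If we need to loop over the same values more than once, a list (or a new generator) is the safer choice.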
Generator Use Cases
Generators are useful in many situations. Most of the time they reduce memory usage, and they can also save time when we don’t need every value. In some situations they even let us handle data that would otherwise not fit in memory.
In this section, we will walk through a few examples to become more familiar with generators.
Example 1: Reading Extremely Large File
Almost every developer has to read a file at some point. It is easy to do with the built-in open() function in Python. However, what happens if we read an extremely large file, for example a 50 GB data file?
Assuming we have a large text file called “test_file.txt”. We try to read it:
data_file = open("test_file.txt")
data_lines = data_file.read().split("\n")
We will probably get a MemoryError exception when executing the above code, because data_file.read() loads the entire file into memory and we don’t have enough memory to hold it. This is a problem we have to handle whenever we read a large file, especially when the file is, say, twice as large as the machine’s memory.
Fortunately, the file object returned by open() is itself a lazy iterator over lines. We can avoid the MemoryError by wrapping it in a generator as below:
def read_large_file(file_path):
    # Yield one line at a time instead of loading the whole file.
    with open(file_path) as data_file:
        for line in data_file:
            yield line

data = read_large_file("test_file.txt")
print(data)
print("The first line: %s" % next(data))
print("The second line: %s" % next(data))
print("The third line: %s" % next(data))
""" Output
<generator object read_large_file at 0x10ff9d780>
The first line: "STATION","NAME","DATE","TAVG","TMAX","TMIN","WDFG","WSFG"
The second line: "CA006159123","UXBRIDGE WEST, ON CA","2019-01-01","26","38","13"," 29","107.4"
The third line: "CA006159123","UXBRIDGE WEST, ON CA","2019-01-02","17","27","7",,
"""
In the above example, we can see how a lazy iterator saves us. The read_large_file() function returns a generator object instead of the entire content of the file. We can then call next() to get the next line whenever we want, so only one line is held in memory at a time. If you are new to generators, try this in your next project.
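Because each line is produced on demand, generators like this can be chained into a small pipeline. The sketch below is only an illustration: read_lines() is a hypothetical variant of the function above that strips the newline, and a short temporary file stands in for the large file so the code runs anywhere.

```python
import os
import tempfile


def read_lines(file_path):
    """Yield lines one at a time, without the trailing newline."""
    with open(file_path) as data_file:
        for line in data_file:
            yield line.rstrip("\n")


# Stand-in for a large file (assumption: any text file would do here).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("alpha\n\nbeta\ngamma\n")
    sample_path = tmp.name

# Each stage pulls one line at a time from the stage before it.
non_empty = (line for line in read_lines(sample_path) if line)
longest = max(non_empty, key=len)
line_count = sum(1 for _ in read_lines(sample_path))

os.remove(sample_path)
print(longest, line_count)
```

At no point does the whole file sit in memory. max() and sum() each see a single line at a time.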
Earlier we saw the generator expression syntax for building a generator quickly. Here is how the same lazy file reading looks as a generator expression:
data = (line for line in open("test_file.txt"))
print(data)
print("The first line: %s" % next(data))
print("The second line: %s" % next(data))
print("The third line: %s" % next(data))
""" Output
<generator object <genexpr> at 0x10ff9d780>
The first line: "STATION","NAME","DATE","TAVG","TMAX","TMIN","WDFG","WSFG"
The second line: "CA006159123","UXBRIDGE WEST, ON CA","2019-01-01","26","38","13"," 29","107.4"
The third line: "CA006159123","UXBRIDGE WEST, ON CA","2019-01-02","17","27","7",,
"""
Example 2: Optimize Memory Usage With Generator
In this example, we will dig deeper into the benefit of using a generator. We will read a CSV file and compare the memory consumption of using a generator with that of reading the entire file.
But how do we check the memory usage of an object in Python? We can use the built-in function sys.getsizeof(), which reports the size of the object itself in bytes (not the objects it refers to). Let’s try it in the code below:
import sys
with open("data.csv") as data_file:
    data = data_file.read()
print("Size of normal object: %s bytes" % sys.getsizeof(data))
generator_data = (line for line in open("data.csv"))
print("Size of generator object: %s bytes" % sys.getsizeof(generator_data))
"""Output
Size of normal object: 297899 bytes
Size of generator object: 80 bytes
"""
When running this example, remember to replace “data.csv” with your own file. We can see the big difference in memory usage: the generator object uses only 80 bytes, while the string holding the whole file uses 297899 bytes. The generator stays small because it never holds more than the current line.
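Since sys.getsizeof() only reports the size of the object itself, a complementary check is to measure the peak memory allocated while the data is actually consumed. The sketch below uses the standard tracemalloc module for that. The helper peak_bytes() and the workload (summing squares) are assumptions made for illustration, not part of the original example.

```python
import tracemalloc


def peak_bytes(work):
    """Run work() and return the peak number of bytes allocated meanwhile."""
    tracemalloc.start()
    work()
    _current, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak


N = 100_000

# The list comprehension materialises all N values before sum() starts.
list_peak = peak_bytes(lambda: sum([n * n for n in range(N)]))

# The generator expression hands sum() one value at a time.
gen_peak = peak_bytes(lambda: sum(n * n for n in range(N)))

print("list peak: %d bytes" % list_peak)
print("generator peak: %d bytes" % gen_peak)
```

The exact numbers depend on the Python version and platform, but the generator version should peak far lower because it never stores all the values at once.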
Example 3: Generator vs List Comprehension
A generator expression and a list comprehension look similar, and we can loop over both of them. So in this example we compare the two.
First, we run the code below to compare their memory usage:
import sys
import random
data = [random.randint(1, 40) for i in range(10)]
print("Size of normal object: %s bytes" % sys.getsizeof(data))
generator_data = (random.randint(1, 40) for i in range(10))
print("Size of generator object: %s bytes" % sys.getsizeof(generator_data))
"""Output
Size of normal object: 200 bytes
Size of generator object: 80 bytes
"""
In the above code, we created a list of 10 random numbers and a generator of 10 random numbers. As in Example 2, the generator is smaller, and the gap widens as the number of items grows, because the list stores every item while the generator does not. Next, we will find out which one performs better. We use the built-in cProfile module to measure the execution time.
import cProfile
import random
cProfile.run("max([random.randint(1, 40) for i in range(100000)])")
"""Output
300004 function calls in 0.137 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.024 0.024 0.137 0.137 <string>:1(<module>)
100000 0.075 0.000 0.081 0.000 random.py:177(randrange)
100000 0.029 0.000 0.109 0.000 random.py:240(randint)
1 0.002 0.002 0.002 0.002 {max}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
100000 0.005 0.000 0.005 0.000 {method 'random' of '_random.Random' objects}
1 0.002 0.002 0.002 0.002 {range}
"""
cProfile.run("max((random.randint(1, 40) for i in range(100000)))")
"""Output
400005 function calls in 0.149 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
100001 0.027 0.000 0.138 0.000 <string>:1(<genexpr>)
1 0.000 0.000 0.149 0.149 <string>:1(<module>)
100000 0.076 0.000 0.082 0.000 random.py:177(randrange)
100000 0.029 0.000 0.111 0.000 random.py:240(randint)
1 0.010 0.010 0.148 0.148 {max}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
100000 0.006 0.000 0.006 0.000 {method 'random' of '_random.Random' objects}
1 0.001 0.001 0.001 0.001 {range}
"""
Notice that the generator expression needs 400005 function calls in 0.149 seconds to find the maximum, while the list comprehension needs only 300004 function calls in 0.137 seconds. The extra 100001 calls come from resuming the generator once per item. So if memory is not a concern, a list comprehension gives slightly better performance.
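For a quick timing comparison without the profiler’s overhead, the standard timeit module can be used. This is a rough sketch. The functions with_list() and with_generator() are illustrative names, and n % 40 replaces the random numbers so that both versions do exactly the same work.

```python
import timeit

N = 100_000


def with_list():
    # Builds the full list first, then scans it.
    return max([n % 40 for n in range(N)])


def with_generator():
    # Feeds max() one value at a time.
    return max(n % 40 for n in range(N))


list_seconds = timeit.timeit(with_list, number=20)
gen_seconds = timeit.timeit(with_generator, number=20)

print("list comprehension:   %.3f s" % list_seconds)
print("generator expression: %.3f s" % gen_seconds)
```

The timings vary between machines and Python versions, so it is worth measuring on your own workload before choosing one form for speed.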
Generator-iterator methods
By now we are more familiar with generators in Python. In this section, we will learn some advanced generator methods that help us control the execution of a generator function.
Using .send(value)
We can send a value into a generator by using .send(value). Let’s see how to do that:
import random
def exampleGenerator(value=None):
    for _ in range(10):
        if value:
            value = yield value
        else:
            value = yield random.randint(1, 40)
g = exampleGenerator()
print(next(g))
print(next(g))
print(g.send(50))
print(g.send(110))
""" Output
39
27
50
110
"""
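The value passed to .send() becomes the result of the yield expression where the generator is paused, so we assign that expression to a variable (value = yield ...) to receive it. The generator must first be advanced to a yield with next() before we can send it a value other than None.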
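A common use of .send() is a generator that keeps state between calls. As a small sketch (running_average() is a hypothetical example, not from the code above), the generator below receives numbers and yields the average of everything sent so far.

```python
def running_average():
    """Receive numbers via .send() and yield the average so far."""
    total = 0.0
    count = 0
    average = None
    while True:
        # Pauses here; the argument of .send() becomes `number`.
        number = yield average
        total += number
        count += 1
        average = total / count


avg = running_average()
next(avg)            # advance to the first yield before sending values
a1 = avg.send(10)
a2 = avg.send(20)
a3 = avg.send(60)
print(a1, a2, a3)
```

The first next(avg) call is required. Sending a non-None value to a generator that has not started raises a TypeError.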
Using .throw()
Another useful generator method is .throw(). It raises an exception inside the generator at the point where execution is paused. If the generator handles the exception, .throw() then returns the next yielded value. This is a little hard to picture, so let’s look at an example:
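def exampleGenerator(value=None):
    for i in range(10):
        try:
            if value:
                value = yield value
            else:
                value = yield i
        except Exception as e:
            # The exception raised by .throw() arrives at the paused yield.
            print(e)

g = exampleGenerator()
print(next(g))
print(next(g))
# Pass an exception instance; the (type, value) form is deprecated since Python 3.12.
print(g.throw(ValueError("[ValueError] Using .throw()!!!")))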
""" Output
0
1
[ValueError] Using .throw()!!!
2
"""
In this code, we create a generator that yields the values 0 to 9. When we call .throw(), the generator first receives the exception, handles it by printing the message, and then continues to the next yield, so .throw() returns the next value (2).
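It is also worth seeing what happens when the generator does not catch the exception. In this sketch, countdown() is a hypothetical generator with no try/except, so the exception passed to .throw() propagates back to the caller and the generator is finished afterwards.

```python
def countdown(n):
    while n > 0:
        yield n
        n -= 1


g = countdown(3)
first = next(g)

try:
    g.throw(ValueError("stop counting"))
except ValueError as exc:
    # Nothing inside countdown() handled it, so it surfaces here.
    message = str(exc)

# The generator ended when the exception escaped; use a default
# so that next() does not raise StopIteration.
after = next(g, "finished")
print(first, message, after)
```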
Using .close()
This method is easy to understand. We call it whenever we want to close a generator. After that, we can’t iterate over the generator anymore: if we call next() on it, we will receive a StopIteration exception.
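One practical consequence of .close() is that cleanup code inside the generator still runs. The sketch below assumes a hypothetical numbers() generator that records its cleanup in a list. Closing it raises GeneratorExit at the paused yield, which triggers the finally block.

```python
def numbers(log):
    try:
        for i in range(10):
            yield i
    finally:
        # Runs when the generator is closed or exhausted.
        log.append("cleaned up")


log = []
g = numbers(log)
first = next(g)
g.close()

# A default value avoids the StopIteration from a closed generator.
after = next(g, None)
print(first, log, after)
```

This is the same mechanism that lets a with block inside a generator release its resource when the generator is closed early.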
def exampleGenerator(value=None):
    for i in range(10):
        try:
            if value:
                value = yield value
            else:
                value = yield i
        except Exception as e:
            print(e)

g = exampleGenerator()
print(next(g))
print(next(g))
print(g.throw(ValueError("[ValueError] Using .throw()!!!")))
g.close()
# The generator is closed, so this call raises StopIteration.
print(next(g))
""" Output
0
1
[ValueError] Using .throw()!!!
2
Traceback (most recent call last):
File "read_csv_file.py", line 19, in <module>
print(next(g))
StopIteration
"""
Conclusion
After a long read, let’s see what we have learned:
- What a generator is in Python.
- How to use a generator function and a generator expression.
- How to avoid a MemoryError when reading a large file.
- How to reduce memory usage with a generator.
- When to use a generator and when to use a list comprehension.
Have you ever used a generator in your project? If this is the first time you have learned about it, do you plan to use it in the future? Do you know another use case for generators? Don’t hesitate to leave a comment to share it.
The post Python Tips #1: Generator And Yield Keyword In 5 Minutes appeared first on Python Geeks.