Odhiambo Paul

Python Generators

Generators are functions that return a sequence of items that can be iterated over one at a time. They were introduced in Python Enhancement Proposal 255 (PEP 255), a design document that describes the concept of generators to the Python developer community.

Generator objects can be looped over much like a list. Unlike a list, however, a generator is a lazy iterator: its contents are not all stored in memory but are produced one at a time as they are requested. This makes generators an efficient way to iterate through large datasets.

Prerequisites

You'll need to have a basic understanding of Python to get started. This will help guide you through each of the functions and the keywords. All the functions and code used can be found on repl.it.

To execute various functions, click on RUN.

You will now be able to access the variables and the function return values from the Python interpreter.

A generator function is defined like a normal function, but instead of handing back a value with the return keyword, it produces values with the yield keyword. Any function whose body contains yield is a generator function, and calling it returns a generator object.
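As a minimal sketch of this behaviour (the function name count_up_to is purely illustrative and not part of the original examples), calling a generator function does not run its body; it only builds a generator object:

```python
import inspect
import types


def count_up_to(limit):
    """Hypothetical generator function: yields 1, 2, ..., limit."""
    current = 1
    while current <= limit:
        yield current
        current += 1


# Calling the function returns a generator object; no body code has run yet.
counter = count_up_to(3)

print(inspect.isgeneratorfunction(count_up_to))  # True
print(isinstance(counter, types.GeneratorType))  # True
print(list(counter))                             # [1, 2, 3]
```

The standard-library helpers inspect.isgeneratorfunction and types.GeneratorType are used here only to make the distinction between the function and the object it returns visible.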

A generator object uses lazy evaluation to produce its sequence of values. To get values from the generator object, pass it to the built-in next() function or loop over it.

A yield statement pauses the execution of the function and returns a value to the caller, but preserves the state of the function so that it can continue from where it left off. When resumed, the function continues execution immediately after the yield that last ran. This produces a series of values over time, rather than computing them all at once and sending them back as a list.
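The pause-and-resume behaviour can be made visible with a small sketch. The generator two_steps and the events list are hypothetical names chosen for illustration; the list simply records which parts of the body have run so far:

```python
def two_steps(log):
    """Hypothetical generator that records when each part of its body runs."""
    log.append("started")
    yield 1
    log.append("resumed after first yield")
    yield 2
    log.append("finished")


events = []
steps = two_steps(events)

first = next(steps)                 # runs the body up to the first yield
events_after_first = list(events)

second = next(steps)                # resumes right after the first yield
events_after_second = list(events)

print(first, events_after_first)    # 1 ['started']
print(second, events_after_second)  # 2 ['started', 'resumed after first yield']
```

Note that "finished" is never recorded here, because the generator is left paused at its second yield.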

Generator expressions are a memory-efficient, high-performance generalization of list comprehensions and generators, as discussed later in the article.

Generator Expressions versus List Comprehensions

list_comprehension = ['List Comprehension' for n in range(4)]
generator_expression = ('Generator expression' for n in range(4))

Unlike list comprehensions, generator expressions don’t construct list objects. Instead, they generate values “just in time”, as a generator function or class-based iterator would. Class-based iterators and generator functions both implement the iterator protocol, which makes it possible to retrieve their values one at a time by calling next(), or to build a list from them with list().
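To make the comparison concrete, here is a sketch of the same sequence written three ways. The class EvenNumbers and the function even_numbers are hypothetical names introduced for this illustration only:

```python
class EvenNumbers:
    """Hypothetical class-based iterator over the even numbers below `limit`."""

    def __init__(self, limit):
        self.limit = limit
        self.current = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.current >= self.limit:
            raise StopIteration
        value = self.current
        self.current += 2
        return value


def even_numbers(limit):
    """The same sequence written as a generator function."""
    current = 0
    while current < limit:
        yield current
        current += 2


print(list(EvenNumbers(10)))                      # [0, 2, 4, 6, 8]
print(list(even_numbers(10)))                     # [0, 2, 4, 6, 8]
print(list(n for n in range(10) if n % 2 == 0))   # [0, 2, 4, 6, 8]
```

The class has to track its position and raise StopIteration by hand, while the generator function and the generator expression get that bookkeeping for free.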

You need to call next() or loop through the generator object to access the values produced by the generator expression. When the generator object has no more values, a StopIteration exception is raised.

>>> next(generator_expression)
'Generator expression'
>>> next(generator_expression)
'Generator expression'
>>> next(generator_expression)
'Generator expression'
>>> next(generator_expression)
'Generator expression'
>>> next(generator_expression)
Traceback (most recent call last):
  ...
StopIteration
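If an exception at the end of the sequence is inconvenient, there are two common ways to deal with it. The sketch below uses a hypothetical two-item generator named letters; the optional second argument to next() is part of the built-in function's documented behaviour:

```python
letters = (letter for letter in "ab")

print(next(letters))          # a
print(next(letters))          # b

# A second argument to next() is returned instead of raising StopIteration.
print(next(letters, "done"))  # done

# Without a default, the exhausted generator raises StopIteration.
exhausted = False
try:
    next(letters)
except StopIteration:
    exhausted = True
print(exhausted)              # True
```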

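A for loop can also be used to iterate over a generator object. Because a generator can be consumed only once, the generator expression is recreated here before looping:

generator_expression = ('Generator expression' for n in range(4))
for expression in generator_expression:
    print(expression)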

Suppose we want to filter the odd numbers out of a range of n numbers, keeping only the even ones. We can solve the problem in the three different ways listed below.

  1. Using a regular function.
def even_integers_function(n):
    result = []
    for i in range(n):
        if i % 2 == 0:
            result.append(i)
    return result

>>> even_integers_function(10)
[0, 2, 4, 6, 8]

  2. Using a generator function. The generator function below returns a generator object that we can iterate through to get the even numbers.
def even_integer_generator(n):
    for i in range(n):
        if i % 2 == 0:
            yield i

Calling the generator function does not run the loop; it only creates the generator object:

>>> even_integer_generator(10)
<generator object even_integer_generator at 0x1036c3200>

>>> list(even_integer_generator(10))
[0, 2, 4, 6, 8]

  3. Using a generator expression.
>>> even_integers = (i for i in range(10) if i % 2 == 0)
>>> list(even_integers)
[0, 2, 4, 6, 8]
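One practical difference between the eager and lazy versions is that the lazy one can stop early. The sketch below assumes a generator function named even_integers (a hypothetical name, with the same logic as the generator function above) and uses itertools.islice to take only the first few values:

```python
from itertools import islice


def even_integers(n):
    """Lazily yield the even integers in range(n)."""
    for i in range(n):
        if i % 2 == 0:
            yield i


# Only the first three values are ever computed, even though n is huge.
first_three = list(islice(even_integers(10**12), 3))
print(first_three)  # [0, 2, 4]

# For a small n, the lazy and eager versions agree.
eager = [i for i in range(10) if i % 2 == 0]
print(list(even_integers(10)) == eager)  # True
```

Building the equivalent list for such a large n would need to hold every even number in memory at once, which is exactly what the generator avoids.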

Given a list of names, we can convert each name in the list to uppercase using the following code:

name_list = ["Adam","Eve","John","Doe","Peter","Paul","Kevin"]
  1. Using a regular function.
def names_to_uppercase(names):
    uppercase_names = []
    for name in names:
        # str.upper() returns a new string, so append its result.
        uppercase_names.append(name.upper())
    return uppercase_names

  2. Using a generator function.
def names_to_uppercase_generator(names):
    for name in names:
        # Yield the uppercased copy; the original string is left unchanged.
        yield name.upper()

  3. Using a generator expression.
>>> uppercase_names = (name.upper() for name in name_list)
>>> list(uppercase_names)
['ADAM', 'EVE', 'JOHN', 'DOE', 'PETER', 'PAUL', 'KEVIN']

Profiling Generator Performance

>>> import sys
>>> # Squares of the numbers in a range using a list comprehension.
>>> nums_squared_list = [n ** 2 for n in range(10000)]
>>> sys.getsizeof(nums_squared_list)
87724
>>> # Squares of the numbers in a range using a generator expression.
>>> nums_squared_generator = (n ** 2 for n in range(10000))
>>> sys.getsizeof(nums_squared_generator)
125

The list comprehension uses 87724 bytes of memory, while the generator expression uses only 125 bytes. The exact figures vary between Python versions and platforms, but the list grows with the number of items, whereas the generator stays small because it produces values on demand.

The sys.getsizeof() function returns the amount of memory, in bytes, occupied by the nums_squared_list object itself, whereas the len() function returns the number of items in nums_squared_list.

>>>len(nums_squared_list)
10000
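To see how the two sizes scale, the measurement can be repeated for several input sizes. This is a rough sketch: the helper names squares_list and squares_generator are hypothetical, and the printed byte counts are deliberately not shown because they depend on the Python version and platform.

```python
import sys


def squares_list(n):
    """Eagerly build a list of the first n squares."""
    return [i ** 2 for i in range(n)]


def squares_generator(n):
    """Lazily produce the first n squares."""
    return (i ** 2 for i in range(n))


for n in (10, 1_000, 100_000):
    list_size = sys.getsizeof(squares_list(n))
    generator_size = sys.getsizeof(squares_generator(n))
    # The list size grows with n; the generator size does not.
    print(n, list_size, generator_size)
```

Keep in mind that sys.getsizeof() reports only the size of the container object itself, not of the integers it refers to, so the real memory gap between the two approaches is larger than these numbers suggest.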

Generating the Fibonacci sequence using generator functions.
The Fibonacci sequence is a series of numbers where the next number is identified by adding the two previous numbers.

0, 1, 1, 2, 3, 5, 8, 13, 21, 34, ...

def fibonacci_gen():
    trailing, leading = 0, 1
    while True:
        # Yield the trailing value so that the sequence starts at 0, as listed above.
        yield trailing
        trailing, leading = leading, trailing + leading

To get the first value of the Fibonacci sequence from the generator function above, we call fibonacci_gen() and assign the resulting generator object to the variable fib. We then pass fib to the built-in next() function; each further call to next(fib) returns the next number in the sequence. (The fib.next() method existed only in Python 2.)

>>> fib = fibonacci_gen()
>>> next(fib)

To print the next 10 values from the fib generator, we call next(fib) inside a for loop with a range of 10:

for _ in range(10):
    print(next(fib))
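Because this generator never ends, calling list() on it directly would run forever. One common alternative to the manual loop is itertools.islice, which takes a fixed number of items from any iterator. The sketch below defines its own illustrative variant, named fibonacci, that starts from 0:

```python
from itertools import islice


def fibonacci():
    """Infinite Fibonacci generator starting from 0 (illustrative variant)."""
    trailing, leading = 0, 1
    while True:
        yield trailing
        trailing, leading = leading, trailing + leading


first_ten = list(islice(fibonacci(), 10))
print(first_ten)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```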

Building Generator pipelines

Data pipelines allow you to string together code to process large datasets or streams of data without maxing out your machine’s memory.

When working with pipelines:

  • Several pipes can be linked together.

  • Items flow one by one through the entire pipeline.

  • We can package pipeline functionality into a callable function.
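The three points above can be sketched with a small pipeline of generator functions. All of the names here (read_numbers, only_even, squared, even_squares, sample_lines) are hypothetical and chosen only for illustration:

```python
def read_numbers(lines):
    """Stage 1: turn lines of text into integers, skipping blank lines."""
    for line in lines:
        line = line.strip()
        if line:
            yield int(line)


def only_even(numbers):
    """Stage 2: pass through even numbers only."""
    for number in numbers:
        if number % 2 == 0:
            yield number


def squared(numbers):
    """Stage 3: square each number."""
    for number in numbers:
        yield number * number


def even_squares(lines):
    """The whole pipeline packaged as one callable."""
    return squared(only_even(read_numbers(lines)))


sample_lines = ["1\n", "2\n", "\n", "3\n", "4\n"]
print(list(even_squares(sample_lines)))  # [4, 16]
```

Each item travels through all three stages before the next item is read, so no intermediate list is ever built.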

To get the longest name from the text file names.txt, we chain two generator expressions.

  • The first generator expression, full_names, yields names line by line, stripping the newline from each line.
  • The second generator expression, lengths, yields a tuple containing a name and an integer representing its length.
  • To get the longest name, we pass the lengths generator object to max, with a key function that selects the length.
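def get_longest_name():
    # The file is read lazily, one line at a time, and closed when the block exits.
    with open("text/names.txt") as names_file:
        full_names = (name.strip() for name in names_file)
        lengths = ((name, len(name)) for name in full_names)
        # max() consumes the pipeline and returns the (name, length) pair with the greatest length.
        return max(lengths, key=lambda x: x[1])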
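To try the same pipeline without a file on disk, the lines can come from any iterable. In the sketch below, the function longest_name and the sample data are assumptions made for illustration, and io.StringIO stands in for an open text file:

```python
import io


def longest_name(lines):
    """Return the (name, length) pair for the longest name in an iterable of lines."""
    full_names = (line.strip() for line in lines)
    lengths = ((name, len(name)) for name in full_names)
    return max(lengths, key=lambda pair: pair[1])


# io.StringIO behaves like an open text file: iterating over it yields lines.
sample_file = io.StringIO("Adam\nEve\nPeter\nPaul\n")
print(longest_name(sample_file))  # ('Peter', 5)
```

Accepting an iterable of lines rather than a hard-coded path keeps the pipeline easy to test and reuse.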

Conclusions

In this article, we went over the concepts of generators in Python. They are useful when working with large datasets and performing memory-intensive tasks. Generators help you create better applications with cleaner code, and they offer an alternative to nested for loops and to list comprehensions that are memory intensive. You can find more on Python generators on the Python wiki.
