Bence Kotis

3 easy ways to speed up your Python code within minutes

1. Benchmark, benchmark, benchmark

Benchmarking sounds like a tedious process, but if you already have working code separated into functions, it can be as easy as adding a decorator to the function you are trying to profile.

First off, let's install line_profiler so that we can measure the time spent on each line of code in our function:

pip3 install line_profiler

This provides a decorator (profile) that you can use to benchmark any function in your code line by line. As an example, let's say that we have the following code:

#filename: test.py

@profile
def sum_of_lists(ls):
    '''Calculates the sum of an input list of lists'''

    s = 0
    for l in ls:
        for val in l:
            s += val

    return s


#create a list of lists
smallrange = list(range(10000))
inlist = [smallrange, smallrange, smallrange, smallrange]

#now sum them
list_sum = sum_of_lists(inlist)

print(list_sum)

This will profile the sum_of_lists function when it is called - notice the @profile decorator above the function definition. No import is needed for it, because the profiler injects it as a builtin when it runs the script.

Now we can profile our code with kernprof, which is installed alongside line_profiler (-l enables line-by-line profiling and -v prints the report as soon as the script finishes):

kernprof -l -v test.py

Which gives us:

(line_profiler output: a table with Line #, Hits, Time, Per Hit, % Time and Line Contents columns for each line of sum_of_lists)

The fifth column (% Time) shows the percentage of the total runtime spent on each line - this points you straight to the part of your code where optimization will pay off the most.

Keep in mind that this benchmarking library has significant overhead, but it's perfect for finding weak points in your code and replacing them with something more efficient.
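If you would rather not leave Python at all, line_profiler can also be driven programmatically. A minimal sketch, assuming sum_of_lists (without the @profile decorator) and inlist from the example above are defined in the same session (the name profiled is mine):

from line_profiler import LineProfiler

lp = LineProfiler()
profiled = lp(sum_of_lists)   #wrap the function so every call is timed line by line
profiled(inlist)              #run it once with our test input
lp.print_stats()              #print the same per-line report to stdout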

For running line_profiler inside Jupyter notebooks, check out the %lprun magic command.
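Roughly, that looks like this in a notebook, assuming sum_of_lists and inlist are already defined in the session:

%load_ext line_profiler
%lprun -f sum_of_lists sum_of_lists(inlist)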

2. Compile your Python modules using Cython

If you don't want to rewrite your code but still want some performance gains essentially for free, Cython is your friend.

Although Cython is not a general-purpose Python-to-C compiler, it lets you compile your Python modules into shared object files (.so), which can then be imported by your main Python script.

For this, you will need to have Cython, as well as a C compiler installed on your machine:

pip3 install cython

If you are on a Debian-based system, you can install GCC with:

sudo apt install gcc

Let's split the example code from the first section into two files, named test_cython.py and test_module.pyx:

#filename: test_module.pyx

def sum_of_lists(ls):
    '''Calculates the sum of an input list of lists'''

    s = 0
    for l in ls:
        for val in l:
            s += val

    return s

Our main file imports this function from the compiled test_module module:

#filename: test_cython.py

from test_module import sum_of_lists

#create a list of lists
smallrange = list(range(10000))
inlist = [smallrange, smallrange, smallrange, smallrange]

#now sum them
list_sum = sum_of_lists(inlist)

print(list_sum)

Now let's define a setup.py file for compiling our module with Cython:

#filename: setup.py

from setuptools import setup
from Cython.Build import cythonize

setup(
    ext_modules = cythonize("test_module.pyx")
)

Finally, it's time to compile our module:

python3 setup.py build_ext --inplace

Now let's see how much better the compiled version does compared to the original by timing both 1000 times:
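One way to run that comparison, sketched with timeit (test_module_py here is a hypothetical pure-Python copy of the original sum_of_lists, kept around just for the benchmark):

#filename: compare_versions.py

import timeit

import test_module      #the Cython-compiled .so module
import test_module_py   #hypothetical pure-Python copy of the same function

smallrange = list(range(10000))
inlist = [smallrange, smallrange, smallrange, smallrange]

#timeit accepts a callable, so wrap each call in a lambda and run it 1000 times
cython_time = timeit.timeit(lambda: test_module.sum_of_lists(inlist), number=1000)
python_time = timeit.timeit(lambda: test_module_py.sum_of_lists(inlist), number=1000)

print(f"Cython: {cython_time:.2f}s, pure Python: {python_time:.2f}s")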

In this case Cython nets us an almost 2X speed-up compared to the original - but this will vary depending on the type of code you are trying to optimize.
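If you want to push further than the compile-only gain, Cython usually rewards adding static C types with cdef in the .pyx file. A sketch of a typed variant (sum_of_lists_typed is my name, and I haven't benchmarked it here):

#filename: test_module.pyx (typed variant)

def sum_of_lists_typed(list ls):
    '''Same sum as before, but with C-typed locals so Cython can generate tighter loops'''

    cdef long s = 0
    cdef long val
    for l in ls:
        for val in l:
            s += val

    return s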


If you are looking to take advantage of Cython inside Jupyter notebooks, there is a %%cython cell magic available which lets you compile your functions with minimal hassle.
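As a rough sketch, you load the extension once per session:

%load_ext Cython

and then, in a separate cell, %%cython has to be the very first line - everything below it gets compiled before it runs (sum_of_lists_cy is my name for the notebook variant):

%%cython
def sum_of_lists_cy(ls):
    '''Compiled on the fly by the %%cython cell magic'''
    cdef long s = 0
    for l in ls:
        for val in l:
            s += val
    return s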

3. Avoid loops when possible

In many cases, using operations like map, list comprehensions or vectorized NumPy operations (usually the fastest) instead of explicit Python loops can give you a significant performance boost without much work, as these operations are heavily optimized internally. Let's modify our previous example a bit by replacing the nested loops with map and sum:

#filename: test_map.py

def sum_of_lists_map(ls):
    '''Calculates the sum of an input list of lists'''

    return sum(map(sum, ls))


#create a list of lists
smallrange = list(range(10000))
inlist = [smallrange, smallrange, smallrange, smallrange]

#now sum them
list_sum = sum_of_lists_map(inlist)

print(list_sum)

Let's see how the new map version does compared to the original, again timing both 1000 times:

The map version is over 6X faster than the original!
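The other two options mentioned above look roughly like this - a generator-expression version and a vectorized NumPy version (sum_of_lists_gen and sum_of_lists_np are my names, and neither is benchmarked here):

import numpy as np

def sum_of_lists_gen(ls):
    '''Same result as the map version, written as a generator expression'''
    return sum(sum(l) for l in ls)

def sum_of_lists_np(ls):
    '''Converts the list of equal-length lists to a 2D array and sums it in one vectorized call'''
    return int(np.asarray(ls).sum())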


Conclusion

These were 3 easy-to-implement tips to net you some extra performance - for more information about line_profiler and Cython in Jupyter, check out the %lprun and %%cython magics.
