DEV Community



Unlocking Multi-Threading in Python with Numba

Python is a great language to know and use, and it is very popular for machine learning and data science. However, one of its drawbacks is its limited support for parallel execution of code (especially in the realm of scientific computing). While it does have a threading module, those threads do not execute Python bytecode in parallel because of the Global Interpreter Lock (GIL).
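To see the GIL in action, here is a minimal sketch (the `countdown` task and thread split are my own illustration, not from a benchmark): two threads splitting a CPU-bound loop typically finish in roughly the same time as one thread doing all the work, because only one thread runs Python bytecode at a time.

```python
import threading

results = []

def countdown(n):
    # A purely CPU-bound task: count down to zero in pure Python.
    while n > 0:
        n -= 1
    results.append(n)

N = 2_000_000

# Split the work across two threads. Because of the GIL, only one
# thread executes Python bytecode at a time, so this generally does
# not run twice as fast as a single thread doing all N iterations.
t1 = threading.Thread(target=countdown, args=(N // 2,))
t2 = threading.Thread(target=countdown, args=(N // 2,))
t1.start(); t2.start()
t1.join(); t2.join()

print(results)  # [0, 0]
```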

One workaround is to use the multiprocessing module; it circumvents the GIL by creating separate Python processes, which can then execute in parallel. However, depending on your use case, this may not offer much benefit, as inter-process communication can be more costly than sharing memory within a single multi-threaded process.

Another option is Cython, a language that meets Python and C/C++ halfway. It lets you implement performance-sensitive parts in C or C++ and then wrap them in a Python module for ease of use. This approach is great once you have settled on and validated an idea and are ready to create a production-ready version. For quick prototyping, however, the process can get a little clunky and somewhat defeats the purpose of using a language like Python in the first place. Cython does support OpenMP (a shared-memory multiprocessing API) for direct access to multi-threading, but I've found it a little awkward to use within a Jupyter notebook.

It is for these reasons that I've settled on Numba when I need to improve speed while working on an idea. Roughly speaking, it compiles a user-defined Python function to machine code (using the LLVM compiler) and calls that machine code directly whenever the function is called. Moreover, it supports NumPy and gives you access to true multi-threading (and even to your GPU if you really need it).

Here's a simple example of using Numba just to show how easy it is to use. To indicate to Numba that you wish to compile your function, you simply add the jit decorator before your function.

import numba
import numpy as np

@numba.jit(nopython=True)
def pop_gauss(x, y, mu=0., sigma=1.):
    # Fill y in place with a Gaussian PDF evaluated at each element of x.
    l = x.size

    for i in range(l):
        y[i] = 1 / np.sqrt(2 * np.pi * sigma ** 2) * np.exp(-(x[i] - mu) ** 2 / (2 * sigma ** 2))

Note: The first time the function is called, Numba compiles it in the background and saves the machine code in memory. For this reason, the first execution may be a little slow; however, subsequent uses of the function will be faster than the pure Python equivalent.

Creating a function which executes in parallel is just as easy:

@numba.jit(nopython=True, parallel=True)
def ext2d_gauss(x, y, mu=0., sigma=1.):
    # Promote inputs to 2-D; new names avoid re-typing x and y,
    # which Numba's nopython mode does not allow.
    x2d = np.atleast_2d(x)
    y2d = np.atleast_2d(y)

    l = x2d.shape[0]

    # prange distributes the rows across threads.
    for i in numba.prange(l):
        pop_gauss(x2d[i], y2d[i], mu, sigma)

Note: When writing parallel code, be careful not to create race conditions!


You can then call these functions as you normally would.

x = np.arange(-10, 10, 0.01)
y = np.empty_like(x)

pop_gauss(x, y)

x2 = np.tile(x, 100).reshape(100, -1)
y2 = np.empty_like(x2)

ext2d_gauss(x2, y2)
Be sure to check out the Numba page, which has lots of useful tutorials and information.
