<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: hrenski</title>
    <description>The latest articles on DEV Community by hrenski (@hrenski).</description>
    <link>https://dev.to/hrenski</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F221277%2Fe53add18-76dd-4bf1-99c3-9ba3a313c07d.png</url>
      <title>DEV Community: hrenski</title>
      <link>https://dev.to/hrenski</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/hrenski"/>
    <language>en</language>
    <item>
      <title>Compiling Your Python Code with Cython</title>
      <dc:creator>hrenski</dc:creator>
      <pubDate>Fri, 15 Nov 2019 15:54:50 +0000</pubDate>
      <link>https://dev.to/hrenski/compiling-your-python-code-with-cython-2j9o</link>
      <guid>https://dev.to/hrenski/compiling-your-python-code-with-cython-2j9o</guid>
      <description>&lt;p&gt;Python is a great language to test and prototype with... partly because it is interpreted, so the code is readily referenced and changed. But there may be times when you'd like to compartmentalize your Python code (for example by compiling it); this is easily done using Cython. I'll outline the steps below (make sure you have Cython installed!). If you'd like to find out more, check out the &lt;a href="https://cython.readthedocs.io/en/latest/src/tutorial/cython_tutorial.html"&gt;Cython&lt;/a&gt; documentation page.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Take all of the Python code that you'd like to compile and add it to a utilities file. Here, as an example, we'll make a &lt;code&gt;utilities.py&lt;/code&gt; file with a toy function.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import numpy as np

def everyother(a):
    return np.ascontiguousarray(a[::2])
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Next, create a &lt;code&gt;setup.py&lt;/code&gt; file:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from distutils.core import setup
from Cython.Build import cythonize

setup(ext_modules = cythonize("utilities.py"))
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Finally, compile it by typing &lt;code&gt;python setup.py build_ext --inplace&lt;/code&gt;. &lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
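&lt;p&gt;A quick aside: newer Python versions deprecate &lt;code&gt;distutils&lt;/code&gt;, and the equivalent &lt;code&gt;setup.py&lt;/code&gt; written against setuptools works the same way:&lt;/p&gt;

```python
# setup.py using setuptools in place of the deprecated distutils
from setuptools import setup
from Cython.Build import cythonize

setup(ext_modules=cythonize("utilities.py"))
```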

&lt;p&gt;On my computer, these steps produce a shared object file, &lt;code&gt;utilities.cpython-37m-x86_64-linux-gnu.so&lt;/code&gt;, that you can import in a Python script or interactive session.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;```
In [1]: import numpy as np                                                                                                                                                                                                                                                     

In [2]: import utilities as ut                                                                                                                                                                                                                                                 

In [3]: a = np.arange(12)                                                                                                                                                                                                                                                      

In [4]: a                                                                                                                                                                                                                                                                      
Out[4]: array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])

In [5]: ut.everyother(a)                                                                                                                                                                                                                                                       
Out[5]: array([ 0,  2,  4,  6,  8, 10])
```
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;That's all there is to it!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://i.giphy.com/media/1lvotGQwhzi6O0gQtV/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/1lvotGQwhzi6O0gQtV/giphy.gif" alt="code"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Unlocking Multi-Threading in Python with Numba</title>
      <dc:creator>hrenski</dc:creator>
      <pubDate>Tue, 05 Nov 2019 15:27:16 +0000</pubDate>
      <link>https://dev.to/hrenski/unlocking-multi-threading-in-python-with-numba-3g5b</link>
      <guid>https://dev.to/hrenski/unlocking-multi-threading-in-python-with-numba-3g5b</guid>
      <description>&lt;p&gt;While Python is a great language to know and use and is very popular for machine learning and data science. However, one of its drawbacks is the lack of support for parallel execution of code (especially in the realm of scientific computing). While it does have a &lt;a href="https://docs.python.org/3/library/threading.html"&gt;multithreading&lt;/a&gt; module, the threads do not execute in parallel because of the &lt;a href="https://realpython.com/python-gil/#why-hasnt-the-gil-been-removed-yet"&gt;Global Interpreter Lock (GIL)&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;One workaround is to use the &lt;a href="https://docs.python.org/3.7/library/multiprocessing.html"&gt;multiprocessing&lt;/a&gt; module; this module circumvents the GIL by creating separate Python processes, which can then execute in parallel. However, depending on your use case, this may not offer much benefit, as interprocess communication can be more costly than communication within a true &lt;a href="https://www.cs.rutgers.edu/~pxk/416/notes/05-threads.html"&gt;multi-threaded process&lt;/a&gt;.&lt;/p&gt;
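&lt;p&gt;For reference, a minimal sketch of the multiprocessing approach (using a toy &lt;code&gt;square&lt;/code&gt; function of my own, just to show the pattern):&lt;/p&gt;

```python
# Sketch: parallelism via separate processes. Each worker runs in its
# own interpreter, so the GIL does not serialize the work -- but the
# arguments and results must be pickled between processes, which is
# the interprocess-communication cost mentioned above.
from multiprocessing import Pool

def square(n):
    return n * n

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        results = pool.map(square, range(8))
    print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```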

&lt;p&gt;One option is Cython, which is a language that meets Python and C/C++ halfway. It allows you to implement your performance-sensitive code in C++ (or C) and then wrap it into a Python module for ease of use. This approach is great once you have settled on and validated an idea and are ready to create a production-ready version. However, for quick prototyping, this process can get a little clunky and sort of defeats the purpose of using a language like Python in the first place. Cython does support OpenMP (a shared-memory multiprocessing API) to access certain components of multi-threading directly, but I've found it to be a little awkward to use within a Jupyter notebook.&lt;/p&gt;

&lt;p&gt;It is for these reasons that I've settled on &lt;a href="http://numba.pydata.org/"&gt;Numba&lt;/a&gt; when I need to improve speed while working on an idea. Roughly, it converts a user-defined Python function to machine code (using the LLVM compiler) and calls this machine code directly whenever the function is called. Moreover, it supports NumPy and allows access to actual multi-threading (it even gives you access to your GPU if you really need it). &lt;/p&gt;

&lt;p&gt;Here's a simple example just to show how easy Numba is to use. To indicate that you wish to compile a function, you simply add the &lt;code&gt;jit&lt;/code&gt; decorator before it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;@numba.jit(nopython=True)
def pop_gauss(x, y, mu = 0., sigma = 1.):
    l = x.size

    for i in range(l):
        y[i] = 1 / np.sqrt(2 * np.pi * sigma ** 2) * np.exp(-1 * (x[i] - mu)**2 / (2 * sigma**2))
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; The first time the function is called, Numba compiles it in the background and saves the machine code in memory. For this reason, the first execution may be a little slow; however, subsequent uses of the function will be faster than the pure Python equivalent.&lt;/p&gt;

&lt;p&gt;Creating a function that executes in parallel is just as easy:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;@numba.jit(nopython=True, parallel=True)
def ext2d_gauss(x, y, mu = 0., sigma = 1.):
    x = np.atleast_2d(x)
    y = np.atleast_2d(y)

    l = x.shape[0]

    for i in numba.prange(l):
        pop_gauss(x[i], y[i], mu, sigma)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; When writing parallel code, be careful not to create race conditions!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://i.giphy.com/media/dZuvgnuNVohCarF0p4/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/dZuvgnuNVohCarF0p4/giphy.gif" alt="multithreading"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can then call these functions as you normally would.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;x = np.arange(-10,10,0.01)
y = np.empty_like(x)

pop_gauss(x,y)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;and&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;x2 = np.tile(x,100).reshape(100,-1)
y2 = np.empty_like(x2)

ext2d_gauss(x2,y2)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Be sure to check out the &lt;a href="http://numba.pydata.org/"&gt;Numba&lt;/a&gt; page which has lots of useful tutorials and information.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Check Your Assumptions - What's Going In To Your Model</title>
      <dc:creator>hrenski</dc:creator>
      <pubDate>Fri, 18 Oct 2019 14:18:39 +0000</pubDate>
      <link>https://dev.to/hrenski/check-your-assumptions-what-s-going-in-to-your-model-2df2</link>
      <guid>https://dev.to/hrenski/check-your-assumptions-what-s-going-in-to-your-model-2df2</guid>
      <description>&lt;p&gt;During my data science course, my instructor has stressed many times (and it has been reiterated in multiple blogs and videos):&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Good features make good models."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;and &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Always be skeptical."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;A recent experience has driven these mantras home for me, and I thought I would share them.&lt;/p&gt;

&lt;p&gt;Recently, I have been looking at different methods of feature extraction, and since we have been discussing neural nets in my course, I naturally began looking at autoencoders (AE). Since convolutional neural nets (CNNs) are also interesting to me, I decided to put these together and try to set up a convolutional autoencoder (CAE). While there are many good blogs and guides on CAEs, they usually give you the network architecture straight away. In order to give myself some experience with the details, I decided to start with a bare-bones CAE and then tune the network parameters and architecture by hand. (I won't get into the actual network that I started with here, as I'm planning to write later about my experience and observations while adjusting the various components.)&lt;/p&gt;

&lt;p&gt;I figured using images with faces was a good start, so I searched for and found the &lt;a href="http://vis-www.cs.umass.edu/lfw/"&gt;Labeled Faces in the Wild&lt;/a&gt; (LFW) dataset. There are several versions of the images: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Un-edited images&lt;/li&gt;
&lt;li&gt;Aligned images via funneling&lt;/li&gt;
&lt;li&gt;Aligned images via deep funneling&lt;/li&gt;
&lt;li&gt;Aligned images via commercial software&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I figured the aligned images would be easier to work with, so I downloaded the deep-funneled version and got to work.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--rhIQygOJ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/ja104rp1vp0uh9sbtgxr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--rhIQygOJ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/ja104rp1vp0uh9sbtgxr.png" alt="Aligned LFW"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After combining all the images into a single dataset, I ran through several epochs on my network and viewed the output to see how well the encoder was reconstructing the images. At this point, the network was still simple, so I wasn't expecting high quality. Here's an example of what I saw.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--HQiLoofy--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/oqg3c47tyricjlx48cgs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--HQiLoofy--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/oqg3c47tyricjlx48cgs.png" alt="Aligned reconstruction"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As you can see, each reconstructed image, regardless of the original face, has a face that is eerily similar.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://i.giphy.com/media/YARUMKaGd8cRG/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/YARUMKaGd8cRG/giphy.gif" alt="ghost"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;At first, I suspected that this was due to the simplicity of my network; it was outputting an aggregate face based on all of the inputs. However, keeping my instructor's slogan in mind, I went back and downloaded the un-edited version of the images (without the deep funneling).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--BHxS-WTJ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/s1930yc9bn9p5u2acopz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--BHxS-WTJ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/s1930yc9bn9p5u2acopz.png" alt="Noaligned LFW"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After running the images without alignment through the CAE, I didn't see the same eerie face in each reconstructed image, but something truer (albeit very low resolution) to the input.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--7dzvolHy--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/ycnh0xz8d8sbpxq2bffx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--7dzvolHy--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/ycnh0xz8d8sbpxq2bffx.png" alt="Noaligned reconstruction"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Even my simple CAE (without much structure) had picked up on the impact of the deep funneling; it had, in a sense, started learning the weights that the funneling neural net had used to align the faces. By removing the existing bias, I was able to start focusing on training a network to generate a faithful reconstruction and not one affected by previous edits.&lt;/p&gt;

&lt;p&gt;This was an informative experience for me, as it drove home the idea that it is good to question where your data came from and what processing has been applied to it.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Why I'm Learning Data Science</title>
      <dc:creator>hrenski</dc:creator>
      <pubDate>Fri, 06 Sep 2019 11:44:35 +0000</pubDate>
      <link>https://dev.to/hrenski/why-i-m-learning-data-science-5ffc</link>
      <guid>https://dev.to/hrenski/why-i-m-learning-data-science-5ffc</guid>
      <description>&lt;p&gt;So I've started learning my journey into learning data science and machine learning... and the way I've chosen to start down this path is by attending a 15 week bootcamp at the Flatiron school. While I'm not much of a writer (brevity above all!), one component of the course is to express the learning process through a sequence of blogs. And here's my attempt.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why data science?
&lt;/h2&gt;

&lt;p&gt;My background is in math, and I spent a great deal of time (up to a doctorate) studying it. When people ask me what it means to have a Ph.D., I like to joke that it means I went to school for too long. But really, once I started studying it, the math bug bit me... I couldn't stop. Studying math can be bittersweet: you struggle for long periods wrapping your head around things, followed by a brief "aha!" moment... until you sink back into confusion. But I loved the challenge it presented, and I dug into it as deeply as I could... until the day came that I needed to graduate and go out into the world. Then I quickly realized that finding a job in math was going to be very difficult. This might sound like the exact opposite of everything anyone has ever said about math and getting a technical job, but you'd have to understand: I studied pure math... as pure as I could find. The closest I got to any kind of application was how one theory could be used in another.&lt;/p&gt;

&lt;p&gt;So my job search began until I finally found a job... doing something I knew nothing about. When I started, I pursued it, worked at it, and tried to master as much of it as I could. While it was still technical in nature, the further I got into my new career, the more distant my math felt and the less passionate I felt for what I now found myself doing. I sank into a daily cycle of working... and found myself missing my passion. Honestly, this was tough for me to process, but I soldiered through it trying to be as pragmatic as possible. And I discovered a silver lining, as a part of my job, I got to deal with massive amounts of data which I would have to examine, process, clean, and manipulate in order to produce desired results.&lt;/p&gt;

&lt;p&gt;The more I worked with the data, the more I enjoyed it. It revived an old interest I had in coding and allowed me to use my math background. So I started pursuing this growing interest with the same excitement that I had for my studies in school. I started learning Python and C++, and even gained experience in parallel coding and computing (which seemed like black magic to me), all as a means to work more efficiently with the data that I had. And for several years, this was enough... but all the while, I wondered if I couldn't be doing more of this. I finally hit a cap of sorts within my job, and I decided to take the plunge into a field in which I could fully pursue my new passion... working with data. One of Flatiron's mantras sums it up best for me: "Always do what you love."&lt;/p&gt;

&lt;h2&gt;
  
  
  Ok... but why a bootcamp?
&lt;/h2&gt;

&lt;p&gt;During all of my self-study into data science, one thought continually nagged me: how can I find a way to do this full time? Demonstrating to potential employers that I have the necessary skills isn't always easy... especially without any credentials or documented background. Additionally, how can I most efficiently fill in knowledge gaps and build my experience into a fully-fledged set of skills? I had heard of various data science bootcamps and decided to look into them. This is how I found the Flatiron School: a 15-week program that mixes lectures with real-world problems to build and cement all the skills I need to make this leap into a new career. As necessary as it was, leaving a field that you've established yourself in and starting fresh isn't easy. Having the support and guidance of a program like this helps ease the anxiety and lets me focus on studying, learning, and preparing for the future.&lt;/p&gt;

&lt;p&gt;So here I am, I've begun the program and am wrapping up the first week. The pace is blistering...&lt;/p&gt;

&lt;p&gt;&lt;a href="https://i.giphy.com/media/1SVTwnP1jKNuo/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/1SVTwnP1jKNuo/giphy.gif" alt="Speed of average lecture"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;But I'm excited to continue learning and absorbing as much as I can, and I hope that it pays off in a job in which I can do what I love.&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>python</category>
    </item>
  </channel>
</rss>
