Discovering OpenCV using Python: Convolution

#beginners #python #opencv #imageprocessing

It has been three months since I started my Python development journey, and under the watchful eye of my snakey Jedi master (read: boyfriend working as a developer) I have started to explore the mighty OpenCV library. I first heard about computer vision in connection with self-driving cars and how they identify objects; however, computer vision is much more powerful than just allowing computers to "see" the real world.

The goal of this lesson was to grasp the principle of convolution, which serves as a building block of most image processing functions.

As master Kenobi said, "the Force is what gives a Jedi his/her power" and naturally, as I am a smart padawan, I immediately grasped the meaning of these wise words in these technological times: use Python.

Let's first learn the basics of OpenCV

My first interactions with OpenCV were quite harmonious. I explored some basic image manipulation functions (read an image, draw a line, change color, blend two images...) using useful resources on the internet and, of course, the documentation itself. After that, when I was deemed ready, it was time to get serious.

What is convolution?

Convolution is a mathematical way of combining two signals to form a third signal (Digital Signal Processing). To really understand this I-still-don't-get-it definition, I manually went through the whole process by implementing a simple 3x3 matrix.

To put it in simple words, imagine a picture, which consists of many pixels. For simplicity, let's say the image is in gray-scale. The process starts with taking a pixel (which really is just a value between 0 and 255, 0 being black and 255 being white) and considering that as the "center". Now, take the convolution matrix (also called the kernel), align it to the center pixel and identify the center's local neighbors up to the size of the convolution matrix. Multiply this newly identified matrix with the convolution matrix element by element (both are of the same size) and add up the products (for Excel geeks, SUMPRODUCT). Save the sum as the value of the transformed pixel in the same location as the center pixel, but in a new image (not the original, as that would disrupt the values of the neighboring pixels). Doing so for every pixel of the original image is convolution (see image below).
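The steps above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the code from my repository: the function name `convolve_gray` is hypothetical, the border is padded with zeros (one of several possible choices), and the kernel is assumed to be square with an odd size. Strictly speaking, textbook convolution flips the kernel before multiplying; the sketch follows the description above and skips the flip, which makes no difference for symmetric kernels.

```python
import numpy as np

def convolve_gray(image, kernel):
    """Slide a square, odd-sized kernel over a gray-scale image (hypothetical helper)."""
    size = kernel.shape[0]
    half = size // 2
    # Assumption: pad the border with zeros so edge pixels also have a full neighborhood.
    padded = np.pad(image.astype(np.float64), half, mode="constant")
    # Write results into a new array so the original pixel values stay untouched.
    output = np.zeros(image.shape, dtype=np.float64)
    for row in range(image.shape[0]):
        for col in range(image.shape[1]):
            # Neighborhood centered on (row, col), same size as the kernel.
            neighborhood = padded[row:row + size, col:col + size]
            # Element-wise multiplication followed by a sum (the SUMPRODUCT step).
            output[row, col] = np.sum(neighborhood * kernel)
    # Round and clamp back into the valid 0..255 pixel range.
    return np.clip(np.rint(output), 0, 255).astype(np.uint8)

image = np.array([[10, 20, 30],
                  [40, 50, 60],
                  [70, 80, 90]], dtype=np.uint8)
box_blur = np.ones((3, 3)) / 9.0  # every neighbor contributes equally

print(convolve_gray(image, box_blur))
```

The double loop is slow on real photos, but it mirrors the manual process one pixel at a time, which is the point here.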

image source

Let's get back to image processing

If you have ever edited an image to blur or sharpen it, then you have experienced the practical use of convolution in image processing. The transformation depends on the values and shape of the convolution matrix, and thanks to the smart people who are willing to share, there are many useful sources that already provide the required matrices for various transformations. Here are the results from my application of some matrices:

All work is of course available and documented on GitHub, so check it out!

It is important to mention that convolution is used not only in image processing; it is a powerful method that is applied in various fields (mathematics, digital signal processing, audio processing, machine learning, ...). Though the explanation provided above is closely related to image processing, the principle behind it is the same for every application.

In conclusion

For those who are also at the beginning of the journey, I wholeheartedly recommend getting your hands dirty by playing around with these matrices and exploring OpenCV. Fair word of warning: it is quite math-intensive, so brush up on your derivatives and matrix operations. May the Python be with you.