## DEV Community

Luciano Strika

Posted on • Originally published at datastuff.tech on

# K Means Clustering with Dask (Image Filters for Pictures of Kittens)

Applying filters to images is not a new concept to anyone. We take a picture, make a few changes to it, and now it looks cooler. But where does Artificial Intelligence come in? Let’s try out a fun use for Unsupervised Machine Learning with K Means Clustering.

I’ve written before about K Means Clustering , so I will assume you’re familiar with the algorithm this time. If you’re not, this is the in-depth introduction I wrote.

And I also tried my hand at image compression (well, reconstruction) with autoencoders, to varying degrees of success.

However this time, my goal is not to reconstruct the best possible image, but just to see the effects of recreating a picture with the least possible colors.

Instead of making the picture look as similar to the original as possible, I just want us to look at it and say “neat!”.

## How to do image filters with K Means Clustering

First of all, it’s always good to remember an image is just a vector of pixels. Each pixel is a tuple of three integer values between 0 and 255 (an unsigned byte), which represent that pixel’s color’s RGB values.

We want to use K Means clustering to find the k colors that best characterize an image. That just means we could treat each pixel as a single data point (in 3-dimensional space), and cluster them.

So first, we’ll want to turn an image into a vector of pixels in Python. Here’s how we do it.

As an aside, I don’t think the vector_of_pixels function needs to use a Python list. I’m sure there has to be some way to flatten a numpy array , I just couldn’t find any (at least not one that did it in the order I wanted).

If you can think of any way, let me know in the comments!

The next step is fitting the model to the image, so that it clusters the pixels into k colors. Then, it’s just a matter of assigning the corresponding cluster color to each position in the image.

For instance, maybe our pic has only three colors: two reddish ones and a greenish one. If we fit that to 2 clusters, all the reddish pixels would turn some different shade of red (getting clustered together), and the other ones would turn into some greenish one.

But enough with the explanations, let’s see the program in action!

As usual, you are free to run it yourself with any pic you want, here’s the GitHub repository with the code.

## The results

We will apply the filter to pictures of kittens, taken from the awesome “Cats vs Dogs” kaggle dataset.

We’ll start with a picture of a cat, and apply the filter with different values for k. Here’s the original picture:

First, let’s check how many colors this picture originally had.

With just one line of numpy, we count the unique values a pixel takes on this picture. This image in particular has 243 different colors , even though it has a total of 166167 pixels.

Now, let’s see the result of clustering it to 2, 5 and 10 different colors only.

Did you notice a trend? Each color we add has diminishing returns. The difference between having 2 colors and having 5, is a lot more than the difference between 5 and 10. However with 10 colors, the flat areas are smaller, and we have more granularity. Moving on to 15 and 24 colors!

Moving on to a different picture: Here’s the original (256 different colors) and here’s a compressed one (24 colors again).

As an interesting note, the “compressed” image weighs 18KB and the uncompressed one 16KB. I don’t really know why this is, since compressors are pretty complicated beasts, but would love to read your theories in the comments.

## Conclusions

We were able to make new images with only 10% of the original’s colors, which looked very similar to them. We also got some cool looking filters thanks to K means clustering. Can you think of any other fun application for clustering? Do you think other clustering techniques could have yielded more interesting results?

If you want to answer any of these questions, feel free to contact me on Twitter, Medium or Dev.to.

Are you interested in starting a career in Data Science? Do you want to be an awesome Machine Learning professional? Check out my recommended reading list: “3 Machine Learning Books that will Help You Level Up as a Data Scientist”. One of them actually taught me what I know about K Means Clustering.

The post K Means Clustering with Dask (Editing Pictures of Kittens) appeared first on Data Stuff.

mt3o

Are you doing the thing that 8-bit GIF and PNG images do? There are some algorithms to do that. There are some tricks involving pixel moving allowing you to have better pictures with less colors.

Luciano Strika

What do GIFs and PNGs do? I'm just clustering the colors and then replacing each pixel with its corresponding cluster's color.

mt3o

That's one of the ways to reduce image complexity (number of colors) when creating 8-bit gif and png images.
You seem young (that's not an insult, good forbid!) and might not know what I'm taking about. Grab an old version of adobe photoshop, the older the better. They have (or rather: had) this cool dialog "save for web" where you could apply different compression settings and compare images, for example which is better, 50% jpg or 24 color gif, and play with those settings. There were 3 ways of reducing the number of colors, one of them reminds me of what you did here.

Luciano Strika

I know that! I didn't know about the photoshop bit (never been the artist type), but I remember there was a format (I think it was GIF?) where you actually defined a small set of colors and you'd only use those, so the image was lighter.
That's why I was surprised when the reduced images were heavier than the original ones! I was expecting JPG to somehow profit from the reduction.