
Mohammed Mushahid Qureshi


Identifying Colours in Images using Python and OpenCV

Hi everyone,

In this post I'll explain how we can use an unsupervised ML model to find the most used colours in an image.

KMeans Clustering

KMeans is a clustering algorithm: it divides the given data into subgroups (clusters) such that data points in one cluster are similar to each other and different from the data in other clusters. Clustering is one of the methods used in unsupervised machine learning. This means that the algorithm is not evaluated by comparing its output to true labels of the data; instead, the goal is to investigate the structure of the data by grouping it into clusters.

One of the applications of KMeans clustering is that it can be used to group together colours in an image to find the most used colours in a given image.
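To make the idea concrete, here is a minimal sketch of the classic k-means loop (often called Lloyd's algorithm) written with NumPy only. The function name kmeans_sketch, its parameters and the sample points are made up for illustration; this is not the code scikit-learn runs internally, just the two alternating steps that define the method.

```python
import numpy as np

def kmeans_sketch(points, k, n_iter=10, seed=0):
    """Toy k-means for illustration; the name and defaults are hypothetical."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update step: move each centroid to the mean of its assigned points.
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = points[labels == j].mean(axis=0)
    return labels, centroids

# Two obvious groups of 2-D points (made-up data).
points = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                   [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
labels, centroids = kmeans_sketch(points, k=2)
print(labels)
print(centroids)
```

The first three points end up sharing one label and the last three share the other. With image pixels the points are simply 3-D (one axis per colour channel) instead of 2-D.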

Building a model in a Jupyter Notebook or Google Colaboratory

Imports

We start by importing the libraries we are going to use. We'll use matplotlib.pyplot to generate the pie chart, OpenCV to read the image, the KMeans algorithm from the sklearn.cluster package, and rgb2lab to convert image colours to Lab and deltaE_cie76 to compare them. We will also use the os module to join paths when reading files and Counter from the collections module to count the labels.
The imports should look like this:

import os
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
import numpy as np
import cv2
from collections import Counter
from skimage.color import rgb2lab, deltaE_cie76

Since we are working in a Jupyter notebook, we also need to add %matplotlib inline to tell matplotlib to display the plots inside the notebook.

Using OpenCV to read the image

Images can be read using the method cv2.imread() by passing the complete path as an argument. We can then use pyplot to plot the image.

image = cv2.imread(os.path.join('path_to_image', 'image.jpg'))
plt.imshow(image)

At this point you'll notice that something is wrong with the colours of the plotted image. This is because OpenCV reads images in Blue Green Red (BGR) order, and to view the actual image we need to convert it to Red Green Blue (RGB) using the method cv2.cvtColor().

image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
plt.imshow(image)
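To see what the BGR to RGB conversion does to the pixel data, here is a small sketch that uses NumPy only. The tiny two-pixel array is made up for illustration. For a 3-channel image, reversing the last axis is the same channel reordering that cv2.COLOR_BGR2RGB performs.

```python
import numpy as np

# A 1x2 image in OpenCV's BGR order: one pure-blue pixel, one pure-red pixel.
bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

# Reversing the last axis swaps the B and R channels.
rgb = bgr[..., ::-1]

print(rgb[0, 0])  # the blue pixel, now in RGB order
print(rgb[0, 1])  # the red pixel, now in RGB order
```

If the channels are not swapped, matplotlib treats the blue channel as red, which is why the first plot looks wrong.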

Colour Identification

We first need a function to convert RGB values to hex colour codes so we can use these as labels in our pie chart. We use string formatting for this, which displays our integers as two-digit hexadecimal numbers. We could also use the method binascii.hexlify().

def RGB2HEX(color):
    return "#{:02x}{:02x}{:02x}".format(int(color[0]), int(color[1]), int(color[2]))
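A quick check of the function with a couple of made-up colours. The second call passes floats, which is what the cluster centres will look like later on; note that int() truncates the fractional part rather than rounding it.

```python
def RGB2HEX(color):
    # Format each channel as a two-digit, zero-padded hexadecimal number.
    return "#{:02x}{:02x}{:02x}".format(int(color[0]), int(color[1]), int(color[2]))

orange = RGB2HEX((255, 165, 0))
from_floats = RGB2HEX([12.7, 0.2, 255.0])

print(orange)       # #ffa500
print(from_floats)  # #0c00ff
```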

Getting colours from an image

First we'll reduce the image size to cut the execution time of our program. Note that cv2.resize takes the target size as (width, height). KMeans expects a 2-dimensional array, so we also reshape the image from (height, width, 3) into a list of pixels with shape (height * width, 3).

mod_img = cv2.resize(image, (600, 400), interpolation = cv2.INTER_AREA)
mod_img = mod_img.reshape(mod_img.shape[0]*mod_img.shape[1], 3)
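The reshape step is easier to follow on a stand-in array. The sketch below uses an array of zeros with the shape the resized image would have (400 rows, 600 columns, 3 channels) and marks one pixel to show where it lands. Using -1 lets NumPy work out the number of rows, which is equivalent to multiplying the two dimensions by hand.

```python
import numpy as np

# Stand-in for the resized RGB image: 400 rows, 600 columns, 3 channels.
image = np.zeros((400, 600, 3), dtype=np.uint8)
image[2, 5] = [10, 20, 30]  # mark one pixel so we can find it again

# Flatten the rows and columns into a single list of pixels.
pixels = image.reshape(-1, 3)

print(image.shape)
print(pixels.shape)
print(pixels[2 * 600 + 5])  # the marked pixel at row 2, column 5
```

Each row of pixels is one RGB triple, which is the "one sample per row" layout that KMeans expects.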

Now we implement KMeans

number_of_colours = 8
clf = KMeans(n_clusters = number_of_colours)
labels = clf.fit_predict(mod_img)

KMeans creates as many clusters as we ask for, which in our case is the number of top colours we want. We fit the model and predict on the same pixels in one step with fit_predict, and store the cluster index of each pixel in the variable labels.
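As a rough illustration of what fit_predict returns, the sketch below runs KMeans on a handful of synthetic pixels instead of a real image. The pixel values, n_init=10 and random_state=0 are assumptions chosen to make the example small and repeatable; they are not part of the walkthrough above.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic "image": 30 reddish pixels followed by 10 bluish pixels (made-up values).
rng = np.random.default_rng(0)
red = np.clip(np.array([220, 30, 30]) + rng.integers(-10, 11, size=(30, 3)), 0, 255)
blue = np.clip(np.array([20, 40, 200]) + rng.integers(-10, 11, size=(10, 3)), 0, 255)
pixels = np.vstack([red, blue]).astype(float)

clf = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = clf.fit_predict(pixels)

print(labels.shape)          # one cluster index per pixel
print(clf.cluster_centers_)  # one RGB centroid per cluster
```

All reddish pixels receive one label and all bluish pixels the other. Which group gets index 0 is arbitrary, so the code that follows should not rely on the order of the cluster indices.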

Counting the colours and plotting the Pie chart

We use Counter to count how many times each value appears in labels, which is the number of pixels assigned to each cluster. The colours themselves come from clf.cluster_centers_, where the centroids of all clusters are stored. We iterate over the keys in counts to build ordered_colours, so that each colour lines up with its count. Finally we convert these colours to hex codes and store them in hex_colours.

We plot a pie chart with the values from counts, using hex_colours both as the labels and as the slice colours.

counts = Counter(labels)
center_colours = clf.cluster_centers_
# Keep the colours in the same order as the keys of counts
ordered_colours = [center_colours[i] for i in counts.keys()]
hex_colours = [RGB2HEX(colour) for colour in ordered_colours]
plt.figure(figsize = (8, 6))
plt.pie(counts.values(), labels = hex_colours, colors = hex_colours)
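The counting step can be tried without an image. In this sketch the list of labels is made up, and the shares dictionary is an extra added only to show how each count relates to a slice of the pie.

```python
from collections import Counter

labels = [0, 2, 1, 2, 2, 0, 2, 1]  # hypothetical cluster index per pixel
counts = Counter(labels)

# Fraction of all pixels that fall in each cluster, with keys sorted.
total = sum(counts.values())
shares = {label: count / total for label, count in sorted(counts.items())}

print(list(counts.keys()))  # keys appear in first-seen order, not sorted
print(shares)
```

Because Counter keeps its keys in the order they were first seen, the colours have to be looked up by key, or the counts sorted first, so that each count is paired with the right centroid.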

Bringing everything together into a function

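This function accepts three arguments: the path to the image, the number of colours to be identified and a Boolean show_chart that controls whether the pie chart is displayed. When show_chart is False, the function returns the RGB colours instead.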

def get_colours(img_path, no_of_colours, show_chart):
    # Read the image and convert it from BGR to RGB
    img = cv2.cvtColor(cv2.imread(img_path), cv2.COLOR_BGR2RGB)
    mod_img = cv2.resize(img, (600, 400), interpolation = cv2.INTER_AREA)
    mod_img = mod_img.reshape(mod_img.shape[0]*mod_img.shape[1], 3)

    #Define the clusters
    clf = KMeans(n_clusters = no_of_colours)
    labels = clf.fit_predict(mod_img)

    counts = Counter(labels)
    counts = dict(sorted(counts.items()))

    center_colours = clf.cluster_centers_
    ordered_colours = [center_colours[i] for i in counts.keys()]
    hex_colours = [RGB2HEX(colour) for colour in ordered_colours]
    rgb_colours = list(ordered_colours)

    if (show_chart):
        plt.figure(figsize = (8, 6))
        plt.pie(counts.values(), labels = hex_colours, colors = hex_colours)
        return
    else:
        return rgb_colours

You can find the Google Colab notebook here: https://github.com/mushahidq/py_colour_identifier/blob/main/colour_identifier.ipynb

I'll soon be making another post on how to turn this into a web app and deploy it to Heroku.

This is my first post so some feedback would be much appreciated.
