Starting out with images

A while ago, I started my journey into computer science with Harvard University's CS50 course. In week 0 (Computational thinking; Scratch programming language), I came across a fascinating idea: images can be represented by integers. I was hooked.

NOTE: The concepts covered in the Scratch programming language, such as variables, functions, loops & conditionals, are sufficient to get started with images, but I had to figure out many things on my own. Moving ahead, I chose Python & the OpenCV library for this activity, because the transition from Scratch to Python is very smooth.

Notes on representation of images

What are images, exactly?

Images are basically a collection of pixels (short for picture elements), where each pixel represents a single color. Hundreds of thousands of pixels combine to form an image (e.g. an image with a resolution of 480*640 has 307,200 pixels). In Python, a pixel can be represented by a data structure called a list. To represent a pixel, we need a list storing three values, Red, Green & Blue or [R,G,B], because combining red, green & blue light in different intensities can reproduce a very wide range of colors. Each of the three values varies between 0 & 255.
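
As a minimal sketch of this idea, a pixel can be written as a list of three integers & a small image as a list of rows of pixels. The names `pixel`, `tiny_image` & `count_pixels` are only illustrative & don't come from any library:

```python
# A single pixel: three intensities, each between 0 and 255.
pixel = [255, 165, 0]  # [R, G, B] for orange

# A tiny image with 2 rows and 3 columns: a list of rows, each row a list of pixels.
tiny_image = [
    [[255, 0, 0], [0, 255, 0], [0, 0, 255]],        # red, green, blue
    [[0, 0, 0], [128, 128, 128], [255, 255, 255]],  # black, grey, white
]


def count_pixels(image):
    """Return the total number of pixels in a list-of-rows image."""
    return sum(len(row) for row in image)


print(count_pixels(tiny_image))  # 6
print(480 * 640)                 # 307200, the pixel count of a 480*640 image
```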

So, what is a list?

A list is a data structure in Python which is much like an array in other programming languages such as C or C++. The major difference is that an array can only store values of the same data type, while a list can store values of different data types.


int arrayName[] = {1,2,3,4};        //Array in C/C++
int arrayName[] = {1,2,"asxyzp"};   //WRONG - An array in C/C++ can't store values of different data types together
listName = [1,2,"asxyzp"]           #Possible in Python

Here's a good resource on lists in Python from GeeksForGeeks.

What is OpenCV?

OpenCV (the CV stands for "Computer Vision") is a library of functions that lets computers process digital images & videos, and gain a high-level understanding of them, in real time.

Challenge #1

Now, the first big challenge for me was to capture an image using my laptop's webcam & store it. This can be achieved very easily using OpenCV's VideoCapture class:


import cv2    #Imports OpenCV library in the program
PhotoObj = cv2.VideoCapture(0)

In the above code, PhotoObj is a VideoCapture object, which represents the camera rather than a captured image. The parameter 0 in VideoCapture() is the device index: 0 usually selects the device's primary camera, while 1 would usually select a secondary camera. To actually capture an image, we call the object's read() function, which returns a tuple (which is like a list, but its elements are unchangeable) containing two values. The first value is a boolean, which is True if the image has been captured correctly & False if not. The second value is a numpy array (which is similar to a list) storing the color values of the pixels. Note that OpenCV stores each pixel in [B,G,R] order rather than [R,G,B].


PhotoTuple = PhotoObj.read()    #Returns (success flag, image as numpy array)
if PhotoTuple[0]:
   PhotoFrame = PhotoTuple[1]
else:
   print("The image has not been captured correctly")

Now, there's only one step left, which is to write the image to a file. This can be achieved with OpenCV's imwrite() function, which accepts two parameters: the first is the name of the file including the file type (e.g. asxyzp.jpeg) & the second is the numpy array which stores the image.


PhotoFrame = PhotoTuple[1]
cv2.imwrite("asxyzp.jpeg",PhotoFrame)
PhotoObj.release()    #Frees the camera once we are done with it

And we are done with our first challenge. Here's the output of our first program: a not-very-happy me.

Output of the program

Challenge #2

After completing the first challenge, I had gained some basic ideas about what images & pixels are & how to capture an image using the OpenCV library in Python. Next, I had an idea for a simple experiment: generating a pseudo-random image, that is, an image where each pixel's Red, Green & Blue values are pseudo-randomly selected from the integers between 0 & 255.

So, how can we select an integer between 0 & 255 pseudo-randomly in Python? This can be achieved with Python's "random" module, which has a function called randint(start,end) that returns an integer between start & end, both included.


import random
def GenerateRandomPixel():
    Red = random.randint(0,255)
    Green = random.randint(0,255)
    Blue = random.randint(0,255)
    return [Red,Green,Blue]    #One pixel as a list

Once this is done, we have to generate a row which stores N pixels & then generate M such rows (for an image with M*N resolution, i.e. M rows & N columns). The output of the program would be a list, which would look like this:


[[[R11,G11,B11],...,[R1N,G1N,B1N]],...,[[RM1,GM1,BM1],...,[RMN,GMN,BMN]]]
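
A minimal sketch of how such a nested list could be built is shown here. The function names `generate_random_pixel` & `generate_random_image` are my own illustrative choices, and the fixed seed is only there so that repeated runs give the same result:

```python
import random


def generate_random_pixel():
    """Return one pixel as [R, G, B]; randint includes both 0 and 255."""
    return [random.randint(0, 255) for _ in range(3)]


def generate_random_image(rows, cols):
    """Return `rows` rows, each holding `cols` pseudo-random pixels."""
    return [[generate_random_pixel() for _ in range(cols)] for _ in range(rows)]


random.seed(0)  # Fixed seed so that repeated runs give the same image
image = generate_random_image(4, 6)
print(len(image), len(image[0]), len(image[0][0]))  # 4 6 3
```

For a real image, the same call with e.g. 480 rows & 640 columns gives a list in exactly the layout shown above.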

Now that we have the list (as above), we have to convert it into a numpy array which, as I said earlier, is quite similar to Python's list data structure. This conversion is still required because the imwrite() function of the OpenCV library only accepts a numpy array for the image. The list to numpy array conversion can be done by:


import numpy
numpy.asarray(listName, dtype=numpy.uint8)        #Converts list to numpy array of 8-bit values (0 to 255)

Once we are done with this, we simply have to use the cv2.imwrite() function. The output of the program would be :

A pseudo-random image

Additionally, I did another experiment where, instead of all integers between 0 & 255, I chose only prime integers between 0 & 255, but its output was quite similar to the one above.
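
The article doesn't show how the primes were picked, so the following is only one possible sketch. It assumes a simple trial-division test, and the names `is_prime`, `PRIMES` & `generate_prime_pixel` are hypothetical:

```python
import random


def is_prime(n):
    """Trial division; fine for numbers as small as 255."""
    if n < 2:
        return False
    for d in range(2, int(n ** 0.5) + 1):
        if n % d == 0:
            return False
    return True


# All primes in the allowed intensity range 0..255.
PRIMES = [n for n in range(256) if is_prime(n)]


def generate_prime_pixel():
    """Return one pixel whose three values are pseudo-randomly chosen primes."""
    return [random.choice(PRIMES) for _ in range(3)]


print(PRIMES[:5], PRIMES[-1])  # [2, 3, 5, 7, 11] 251
print(generate_prime_pixel())
```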

Challenge #3

After I was done with pseudo-random images, I was curious whether the pseudo-random image was really random or not, so I decided to build a frequency plotter for pixels. If the pixels are random, every unique pixel should occur about equally often, so the plot should be a flat, straight line. The program was relatively simple in concept, but it took a bit longer to write. To make this happen, I had to use the matplotlib library.

To plot the points in a graph using matplotlib, we need two lists. The first, countArr, stores the integers from 1 to the number of unique pixels in the image (i.e. range(1, number of unique pixels + 1)). The second, PixFrq, stores the frequency of the unique pixel corresponding to each integer in countArr. Here, countArr contains the x-axis values & PixFrq contains the y-axis values. Once we have both lists, we need to do the following to plot the graph:

import matplotlib.pyplot as plt
plt.plot(countArr,PixFrq,'k.') #'k' sets the color to black & '.' draws each value as a point
plt.xlabel("Pixel values") #Shows label for x-axis
plt.ylabel("Frequency") #Shows label for y-axis
plt.show() #Displays the graph
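
The two lists passed to plt.plot() are not shown being built, so here is a minimal sketch of one way to do it, assuming the image is available as a list of rows of pixels. The helper name `pixel_frequencies` is hypothetical & the order in which unique pixels are numbered (first seen, first counted) is my assumption:

```python
from collections import Counter


def pixel_frequencies(image):
    """Count how often each distinct [R, G, B] pixel occurs in a list-of-rows image."""
    # Lists are not hashable, so each pixel is turned into a tuple first.
    counts = Counter(tuple(pixel) for row in image for pixel in row)
    pix_frq = list(counts.values())               # y-axis: frequency of each unique pixel
    count_arr = list(range(1, len(pix_frq) + 1))  # x-axis: 1 .. number of unique pixels
    return count_arr, pix_frq


sample = [
    [[0, 0, 0], [0, 0, 0], [255, 255, 255]],
    [[0, 0, 0], [10, 20, 30], [255, 255, 255]],
]
countArr, PixFrq = pixel_frequencies(sample)
print(countArr)  # [1, 2, 3]
print(PixFrq)    # [3, 2, 1]
```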


Output of the program for the pseudo-random image:
![Frequency of pseudo-random image](https://asxyzpcode.files.wordpress.com/2020/02/freqimgrandom.png)
So, the output turned out to be a straight line.

Next, I also plotted the output of the program for the first image, asxyzp.jpeg:
![Frequency of asxyzp.jpeg](https://asxyzpcode.files.wordpress.com/2020/02/freqasxyzp.png)
The above plot shows that many pixel values are repeated heavily in the image.

Challenge #4

The next idea which came to my mind was to do **matrix operations** on the pixel values of the image, simply for fun. The problem is that matrix operations work on entries that are integers or floats, not lists, so each pixel's list had to be composed into a single integer, such that the integer could later be decomposed back into the list. Initially I was confused, so I asked for help on [mathematics stack exchange](https://math.stackexchange.com/questions/3502539/composition-of-multiple-integers-into-one-integer-vice-versa) & later realized that what I was looking for were the left-shift & right-shift operators. This is how they work:

- Shifting N bits to the left is equivalent to multiplying by 2^(N)
- Shifting N bits to the right is equivalent to floor-dividing (//) by 2^(N)
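
A quick check of both rules in Python (the values 200 & 3 are arbitrary):

```python
n = 3
value = 200

# Left shift by n bits multiplies by 2**n.
print(value << n, value * 2 ** n)   # 1600 1600

# Right shift by n bits floor-divides by 2**n (the remainder is discarded).
print(value >> n, value // 2 ** n)  # 25 25
```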

For composition:

#Pixel to integer conversion
Int = (Pix[0]<<16) + (Pix[1]<<8) + Pix[2]    #Brackets are required because + is evaluated before <<


For decomposition:

#Integer to pixel conversion
R = Int>>16
Int = Int - (R<<16)
G = Int>>8
B = Int - (G<<8)
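
Here is a small round-trip sketch of composition & decomposition. The function names are hypothetical, and the decomposition uses bit masks (& 0xFF) instead of subtraction, which gives the same result. In Python, + binds tighter than <<, so the brackets in the composition matter. If the pixel values come from a numpy array of 8-bit values, they would probably need converting with int() first, since an 8-bit value can't hold the shifted result:

```python
def pixel_to_int(pix):
    """Pack [R, G, B] (each 0..255) into one integer, 8 bits per channel."""
    # Brackets are needed: in Python, + binds tighter than <<.
    return (pix[0] << 16) + (pix[1] << 8) + pix[2]


def int_to_pixel(value):
    """Unpack an integer produced by pixel_to_int back into [R, G, B]."""
    r = value >> 16          # top 8 bits
    g = (value >> 8) & 0xFF  # middle 8 bits
    b = value & 0xFF         # lowest 8 bits
    return [r, g, b]


packed = pixel_to_int([18, 52, 86])
print(packed)                # 1193046
print(hex(packed))           # 0x123456
print(int_to_pixel(packed))  # [18, 52, 86]
```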



This way, we can convert a list/numpy array of pixels into a list of integers representing a matrix. Although I didn't move ahead with my experiments with matrix operations on images, the above experimentation gave me fundamental ideas about how to proceed with them.

Challenge #5

Now, I moved on to something very common & less "experimental": **color image to greyscale image conversion**. Before reading about greyscale images, I used to (mistakenly) think that black-and-white images & greyscale images were the same thing. Greyscale images are images where each pixel stores a single intensity, which varies from 0 (black) to 255 (white). While reading about greyscale images, I also learnt that the human eye perceives the intensities of different colors differently & this led to the creation of luma coding, a weighted sum of the three color values:

Grey = 0.3*Red + 0.59*Green + 0.11*Blue
Red = Grey
Green = Grey
Blue = Grey


Instead of the above weights, which are rounded from ITU-R Recommendation BT.601, we can also use the weights from ITU-R Recommendation BT.709, which are based on the color primaries used by HDTV & sRGB displays:

Grey = 0.2126*Red + 0.7152*Green + 0.0722*Blue
Red = Grey
Green = Grey
Blue = Grey
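
Both weightings can be applied to a whole image with a small helper. This is only a sketch: the names `WEIGHTS_SIMPLE`, `WEIGHTS_BT709`, `to_grey_pixel` & `to_grey_image` are hypothetical, the rounding to the nearest integer is my choice, and pixels are assumed to be in [R, G, B] order (pixels read through OpenCV are in [B, G, R] order & would need reordering first):

```python
# Weight sets for (Red, Green, Blue); the names are illustrative only.
WEIGHTS_SIMPLE = (0.3, 0.59, 0.11)
WEIGHTS_BT709 = (0.2126, 0.7152, 0.0722)


def to_grey_pixel(pixel, weights=WEIGHTS_SIMPLE):
    """Return [Grey, Grey, Grey] for one [R, G, B] pixel."""
    red, green, blue = pixel
    grey = weights[0] * red + weights[1] * green + weights[2] * blue
    grey = min(255, round(grey))  # Round to an integer and stay within 0..255
    return [grey, grey, grey]


def to_grey_image(image, weights=WEIGHTS_SIMPLE):
    """Apply to_grey_pixel to every pixel of a list-of-rows image."""
    return [[to_grey_pixel(pixel, weights) for pixel in row] for row in image]


image = [[[255, 255, 255], [0, 255, 0], [0, 0, 0]]]  # white, pure green, black
print(to_grey_image(image))
# [[[255, 255, 255], [150, 150, 150], [0, 0, 0]]]
print(to_grey_image(image, WEIGHTS_BT709))
# [[[255, 255, 255], [182, 182, 182], [0, 0, 0]]]
```

Pure green comes out noticeably brighter with the BT.709 weights, because they give green a larger share of the total.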




Output when **asxyzp.jpeg** is the input:
![Greyscale output of asxyzp.jpeg](https://asxyzpcode.files.wordpress.com/2020/02/grey_asxyzp.jpeg)

Challenge #6

The next thing I wanted to do was **greyscale to binary (black-and-white) image conversion**. I didn't know how to do this, so I tried two different algorithms:

**Algorithm 1** :
![Algorithm 1](https://i.ibb.co/KwNJQvB/Algo1.png)

**Algorithm 2** :
Quite similar to algorithm 1, with the only difference being that the value 127 is replaced by the average intensity of all pixels.

**Output of asxyzp.jpeg for algorithm 1:**
![B/W image](https://asxyzpcode.files.wordpress.com/2020/02/bin_grey_asxyzp-1.jpeg)

**Test image for algorithm 2:**
![Test image](https://asxyzpcode.files.wordpress.com/2020/02/test.png)

**Output for test image for algorithm 2:**
![Output for test image](https://asxyzpcode.files.wordpress.com/2020/02/bin_grey_test.png)

Dissatisfied with both outputs, I started searching for a proper solution & [realized](https://cs.stackexchange.com/a/120322/99035) that what I was trying to do is called thresholding: selecting a threshold below which the pixel value becomes [0,0,0] & above which it becomes [255,255,255].
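
A minimal sketch of thresholding on a greyscale image stored as rows of intensities. The helper names are hypothetical, and the exact comparison (strictly greater than the threshold becomes white) is my assumption, since the original algorithm is only given as a diagram. A limit of 127 corresponds to the fixed value mentioned for algorithm 1 & the average intensity corresponds to algorithm 2:

```python
def mean_intensity(grey_image):
    """Average intensity of a greyscale image given as rows of integers 0..255."""
    values = [value for row in grey_image for value in row]
    return sum(values) / len(values)


def threshold(grey_image, limit=127):
    """Map intensities above `limit` to white (255) and all others to black (0)."""
    return [[255 if value > limit else 0 for value in row] for row in grey_image]


grey = [
    [10, 100, 130],
    [140, 200, 250],
]
print(threshold(grey))                        # fixed threshold of 127
# [[0, 0, 255], [255, 255, 255]]
print(threshold(grey, mean_intensity(grey)))  # threshold = average intensity (about 138.3)
# [[0, 0, 0], [255, 255, 255]]
```

To get pixels in the [0,0,0] / [255,255,255] form, each resulting intensity can simply be repeated three times.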

Thank you so much for reading this article. You can find the code & its output [here](https://github.com/asxyzp/ExperimentsInCS/tree/master/Image). Also, you can follow me on [twitter](https://twitter.com/aashishium).