<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: haratena9</title>
    <description>The latest articles on DEV Community by haratena9 (@haratena9).</description>
    <link>https://dev.to/haratena9</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F813880%2F621b117f-dec4-4eb2-8cc9-8686ac71b4d4.png</url>
      <title>DEV Community: haratena9</title>
      <link>https://dev.to/haratena9</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/haratena9"/>
    <language>en</language>
    <item>
      <title>Image Processing #3 Statistics</title>
      <dc:creator>haratena9</dc:creator>
      <pubDate>Mon, 14 Feb 2022 13:31:17 +0000</pubDate>
      <link>https://dev.to/haratena9/image-processing-3-statistics-340e</link>
      <guid>https://dev.to/haratena9/image-processing-3-statistics-340e</guid>
      <description>&lt;h2&gt;
  
  
  About this memorandum
&lt;/h2&gt;

&lt;p&gt;Recently, I had a chance to experience deep learning and image/video processing at work, but there were many things I didn't understand about how to touch parameters and amplify data, so I decided to study video processing from scratch.&lt;/p&gt;

&lt;h2&gt;
  
  
  Statistics
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Image (pixel value) statistics: used in many kinds of image analysis

&lt;ul&gt;
&lt;li&gt;Mean value&lt;/li&gt;
&lt;li&gt;Median&lt;/li&gt;
&lt;li&gt;Mode (most frequent value)&lt;/li&gt;
&lt;li&gt;Variance&lt;/li&gt;
&lt;li&gt;Contrast&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Some can be calculated from histograms&lt;/li&gt;

&lt;/ul&gt;
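For example, the mean, median, and mode can be read off a histogram alone. A minimal sketch with a made-up 3-bin histogram (the bin values and counts are illustrative, not taken from the images here):

```python
import numpy as np

# Hypothetical 3-bin histogram: bin values and pixel counts (illustrative only)
values = np.array([0, 128, 255])
counts = np.array([10, 30, 60])
total = counts.sum()

# Mean: count-weighted average of the bin values
mean = (values * counts).sum() / total

# Median: first bin where the cumulative count reaches half the pixels
median = values[np.searchsorted(np.cumsum(counts), total / 2)]

# Mode ("frequent value"): the bin with the largest count
mode = values[np.argmax(counts)]

print(mean, median, mode)  # 191.4 255 255
```

The variance can be computed the same way from the count-weighted sum of squared deviations.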

&lt;p&gt;The rightmost image shows a black Metamon drawn on a white whiteboard.&lt;br&gt;
max: 1.0 (white) and min: 0.0 (black), so the contrast works out to 1.0.&lt;br&gt;
There are several ways to calculate contrast (see the code below).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Finnv8vktwr4781f8u1ue.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Finnv8vktwr4781f8u1ue.png" alt="Summary"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Statistics calculation code&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;im_files =[file_path_A, file_path_B, file_path_C]

for file in im_files:
    im = imread(file)[:,:,:3]  # For RGBA, extract only RGB
    im = rgb2gray(im)
    imshow(im)
    plt.show()

    print('mean: ', im.mean())
    print('std: ', im.std())
    print('median: ', np.median(im))
    print('max: ', im.max())
    print('min: ', im.min())
    print('contrast1: ', (im.max() - im.min()) / (im.max() + im.min()) ) # Michelson contrast
    print('contrast2: ', im.max() / im.min() if im.min() &amp;gt; 0 else np.nan ) # contrast ratio
    print('contrast3: ', im.max() - im.min() ) # contrast difference
    print()

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Calculate the mean and variance
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Naive code (not suggested)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Keep the definition formula&lt;/li&gt;
&lt;li&gt;Computationally heavy: two separate double loops over the image
&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnt43gizghytem6qaixhs.png" alt="fomula"&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;im = rgb2gray(imread('file_path'))
h, w = im.shape

mean = 0
for y in range(h):
    for x in range(w):
        mean += im[y, x]
mean /= h * w
print('mean: ', mean)

var = 0
for y in range(h):
    for x in range(w):
        var += (im[y, x] - mean)**2
var /= h * w
print('variance: ', var)
print('std: ', np.sqrt(var))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Suggested code
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Transformed formula&lt;/li&gt;
&lt;li&gt;Half the work: a single double loop accumulates both sums.
&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frp4k2xq9r7l31ccb7jf6.png" alt="Transformed formula"&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;im = rgb2gray(imread('file_path'))
h, w = im.shape

mean = 0
var = 0
for y in range(h):
    for x in range(w):
        mean += im[y, x]
        var  += im[y, x]**2

mean /= h * w
print('mean: ', mean)

var /= h * w
var -= mean**2
print('variance: ', var)
print('std: ', np.sqrt(var))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
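As a sanity check, NumPy's built-ins give the same numbers without explicit loops. A minimal comparison on a made-up 2x2 array (not one of the article's images):

```python
import numpy as np

im = np.array([[0.0, 0.5], [0.5, 1.0]])  # tiny stand-in for a grayscale image
h, w = im.shape

# Single-pass loop version using the transformed formula
mean, var = 0.0, 0.0
for y in range(h):
    for x in range(w):
        mean += im[y, x]
        var += im[y, x] ** 2
mean /= h * w
var = var / (h * w) - mean ** 2

print(mean, var)            # 0.5 0.125
print(im.mean(), im.var())  # same values, fully vectorized
```

One caveat: the transformed formula (mean of squares minus square of the mean) can lose precision when pixel values are large, because it subtracts two nearly equal quantities; for real work the NumPy built-ins are preferable anyway.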



</description>
      <category>python</category>
      <category>imageprocessing</category>
      <category>formyownuse</category>
    </item>
    <item>
      <title>Image Processing #2 RGB, Histogram</title>
      <dc:creator>haratena9</dc:creator>
      <pubDate>Sat, 12 Feb 2022 06:18:15 +0000</pubDate>
      <link>https://dev.to/haratena9/image-processing-2-rgb-histogram-5dii</link>
      <guid>https://dev.to/haratena9/image-processing-2-rgb-histogram-5dii</guid>
      <description>&lt;h2&gt;
  
  
  About this memorandum
&lt;/h2&gt;

&lt;p&gt;Recently, I had a chance to experience deep learning and image/video processing at work, but there were many things I didn't understand about how to touch parameters and amplify data, so I decided to study video processing from scratch.&lt;/p&gt;

&lt;h2&gt;
  
  
  Color images and grayscale images
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Grayscale image: pixel value represents brightness&lt;/li&gt;
&lt;li&gt;RGB color image: the pixel value represents the brightness of each RGB (for clarity, see the image below)

&lt;ul&gt;
&lt;li&gt;1 pixel: 8 bits × 3 = 24 bits (3 bytes)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzxy267jb90op0owdbry8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzxy267jb90op0owdbry8.png" alt="Summary"&gt;&lt;/a&gt;&lt;br&gt;
However, the &lt;strong&gt;correct understanding&lt;/strong&gt; is that in reality it consists of three grayscale images, one each for R, G, and B.&lt;br&gt;
Since the car is red, the pixel values in the R channel on the left are close to 255, so the car appears white in that grayscale image. The G and B channels, on the other hand, have smaller pixel values and appear dark. &lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F96ndolyegxckt0mb6hiw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F96ndolyegxckt0mb6hiw.png" alt="Original car"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffbaagedeww5yttj6yhn8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffbaagedeww5yttj6yhn8.png" alt="reality"&gt;&lt;/a&gt;&lt;br&gt;
Code&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;im = imread([file_path])

imshow(im)
plt.title("original RGB image")
plt.show()

r_channel = im[:, :, 0]
g_channel = im[:, :, 1]
b_channel = im[:, :, 2]

fig = plt.figure(figsize=(15,3))

for i, c in zip(range(3), 'RGB'):
    ax = fig.add_subplot(1, 3, i + 1)
    imshow(im[:, :, i], vmin=0, vmax=255)
    plt.colorbar()
    plt.title(f'{c} channel')

plt.show()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  RGB &amp;amp; BGR
&lt;/h2&gt;

&lt;p&gt;The only difference is how the data is interpreted, but be careful.&lt;/p&gt;

&lt;h3&gt;
  
  
  RGB
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Used in many textbook descriptions.&lt;/li&gt;
&lt;li&gt;Many image processing libraries also use this format.

&lt;ul&gt;
&lt;li&gt;skimage and matplotlib in Python&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h3&gt;
  
  
  BGR
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Also often used

&lt;ul&gt;
&lt;li&gt;opencv (python, C/C++)&lt;/li&gt;
&lt;li&gt;COLORREF on Windows (0x00bbggrr in hexadecimal)&lt;/li&gt;
&lt;li&gt;Hardware&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;A common mistake: when an image loaded with OpenCV (BGR) is displayed with scikit-image or matplotlib (RGB), red and blue come out swapped.&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwdac265zb73oz0i9zusy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwdac265zb73oz0i9zusy.png" alt="mistake "&gt;&lt;/a&gt;&lt;br&gt;
Code (example of mistake)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;im_BGR = cv2.imread(INPUT_DIR + 'IMG-4034.JPG') # OpenCV
imshow(im_BGR) # matplotlib's imshow assumes RGB
plt.title('show BGR image as RGB image')
plt.axis('off')
plt.show()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There are several ways to fix this, including the following.&lt;br&gt;
Code: RGB and BGR conversion&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;### use the built-in functions
im_BGR_to_RGB = cv2.cvtColor(im_BGR, cv2.COLOR_BGR2RGB)
imshow(im_BGR_to_RGB)
plt.title('show RGB-converted BGR image as RGB image')
plt.axis('off')
plt.show()

### without built-in functions (1)
im_BGR_to_RGB = im_BGR[:, :, ::-1]
imshow(im_BGR_to_RGB)
plt.title('show RGB-converted BGR image as RGB image')
plt.axis('off')
plt.show()

### without built-in functions (2): step-by-step version of (1) above
im_BGR_to_RGB = np.zeros_like(im_BGR)

im_BGR_to_RGB[:, :, 0] = im_BGR[:, :, 2]
im_BGR_to_RGB[:, :, 1] = im_BGR[:, :, 1]
im_BGR_to_RGB[:, :, 2] = im_BGR[:, :, 0]

imshow(im_BGR_to_RGB)
plt.title('show RGB-converted BGR image as RGB image')
plt.axis('off')
plt.show()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  How to create a grayscale image.
&lt;/h2&gt;

&lt;p&gt;There is no right or wrong way to do this, just different standards.&lt;br&gt;
The results of all the methods look almost identical, but the pixel values differ, so it is necessary to &lt;strong&gt;agree on which method is used&lt;/strong&gt; when working together.&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fretxplcni8cf0v09s8mc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fretxplcni8cf0v09s8mc.png" alt="Original"&gt;&lt;/a&gt; &lt;br&gt;
use the built-in functions&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe218p8eylpwt2uqfmwji.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe218p8eylpwt2uqfmwji.png" alt="use the built-in functions"&gt;&lt;/a&gt;&lt;br&gt;
Divide the sum of the RGB values by three&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F19z7ww6qdgxnhmid7rv8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F19z7ww6qdgxnhmid7rv8.png" alt="Divide the sum of the RGB values by three"&gt;&lt;/a&gt;&lt;br&gt;
Standard: PAL/NTSC&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4o6iugjcanlldvl1bvvk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4o6iugjcanlldvl1bvvk.png" alt="PAL/NTSC"&gt;&lt;/a&gt;&lt;br&gt;
Standard: HDTV (same as built-in functions)&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs1flhrw1uplwzve9l9yp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs1flhrw1uplwzve9l9yp.png" alt="HDTV "&gt;&lt;/a&gt;&lt;br&gt;
Code&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;im = imread(INPUT_DIR + 'IMG-4034.JPG')

imshow(im)
plt.title("original RGB image")
plt.show()

# Using the built-in rgb2gray function: gray = 0.2125 R + 0.7154 G + 0.0721 B
im_gray1 = rgb2gray(im)
imshow(im_gray1, vmin=0, vmax=1) # dtype becomes float, range becomes [0,1]
plt.colorbar()
plt.title("rgb2gray min {0} max {1}".format(im_gray1.min(), im_gray1.max() ))
plt.show()

# Use the average of RGB as the grayscale image. Convert to float first so the sum does not overflow uint8 (the range stays [0,255]).
im_gray2 = (im[:,:,0].astype(float) +
            im[:,:,1].astype(float) + 
            im[:,:,2].astype(float)) / 3
imshow(im_gray2, vmin=0, vmax=255)
plt.colorbar()
plt.title("(R+B+G)/3 min {0:.2f} max {1:.2f}".format(im_gray2.min(), im_gray2.max() ))
plt.show()


# The weighted average of RGB is used as the grayscale image.
# https://en.wikipedia.org/wiki/Grayscale#Luma_coding_in_video_systems
im_gray3 = (0.299 * im[:,:,0].astype(float) +
            0.587 * im[:,:,1].astype(float) + 
            0.114 * im[:,:,2].astype(float))
imshow(im_gray3, vmin=0, vmax=255)
plt.colorbar()
plt.title("$Y'$ of PAL and NTSC min {0:.2f} max {1:.2f}".format(im_gray3.min(), im_gray3.max()))
plt.show()

# The weighted average of RGB is used as a grayscale image. The weight coefficients vary depending on the standard.
# https://en.wikipedia.org/wiki/Grayscale#Luma_coding_in_video_systems
# This is what rgb2gray() uses: http://scikit-image.org/docs/dev/api/skimage.color.html#skimage.color.rgb2gray
im_gray4 = (0.2126 * im[:,:,0].astype(float) +
            0.7152 * im[:,:,1].astype(float) + 
            0.0722 * im[:,:,2].astype(float))
imshow(im_gray4, vmin=0, vmax=255)
plt.colorbar()
plt.title("$Y'$ of HDTV min {0:.2f} max {1:.2f}".format(im_gray4.min(), im_gray4.max()))
plt.show()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
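To see how much the weighting matters, here is a hypothetical pure-red pixel pushed through the three formulas above; the weights are the ones used in the code, the pixel value is made up.

```python
# A hypothetical pure-red pixel (R, G, B)
r, g, b = 255.0, 0.0, 0.0

gray_avg = (r + g + b) / 3                        # simple average
gray_pal = 0.299 * r + 0.587 * g + 0.114 * b      # PAL/NTSC weights
gray_hdtv = 0.2126 * r + 0.7152 * g + 0.0722 * b  # HDTV (Rec. 709) weights

print(gray_avg, gray_pal, gray_hdtv)  # 85.0 76.245 54.213
```

The same red car thus comes out noticeably brighter under the simple average than under the HDTV weights, which put most of the weight on green.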



&lt;h2&gt;
  
  
  Histogram
&lt;/h2&gt;

&lt;p&gt;A graph showing the frequency distribution of pixel values.&lt;br&gt;
The third picture shows a drawing of Metamon on a whiteboard.&lt;br&gt;
The peak of the Metamon drawing can be seen around 0.5~0.8, although it is a little uneven depending on the amount of ink in the marker (the amount of force used when drawing, etc.).&lt;br&gt;
In addition, the area where the PC display on the other side of the whiteboard is reflected (to the left of Metamon's mouth) is quite white. This is thought to be the peak around 0.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd5s2n5tb4fzhormbd4sl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd5s2n5tb4fzhormbd4sl.png" alt="Hist1"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr6zhith66b880ywgugrb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr6zhith66b880ywgugrb.png" alt="Hist2"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F857ndug4gm9et95i6hbf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F857ndug4gm9et95i6hbf.png" alt="Hist3"&gt;&lt;/a&gt;&lt;br&gt;
Code&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;im_files = ['file_path1', 'file_path2', 'file_path3']

for file in im_files:
    im = imread(file)[:,:,:3]  # In the case of RGBA, extract only RGB

    fig = plt.figure(figsize=(20,3))

    ax = fig.add_subplot(1, 3, 1)
    im = rgb2gray(im) # Range;[0,1]
    imshow(im)
    plt.axis('off')

    nbins = 256

    ax = fig.add_subplot(1, 3, 2)
    freq, bin_centers = histogram(im, nbins=nbins)
    plt.plot(bin_centers, freq)
    plt.xlabel("intensity")
    plt.ylabel("frequency")
    plt.title('histogram (linear)')
    plt.xlim(0, 1)

    ax = fig.add_subplot(1, 3, 3)
    plt.plot(bin_centers, freq)  # same histogram, log-scaled y-axis
    plt.xlabel("intensity")
    plt.ylabel("log frequency")
    plt.yscale('log')
    plt.title('histogram (log)')
    plt.xlim(0, 1)

    plt.show()

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>python</category>
      <category>imageprocessing</category>
      <category>formyownuse</category>
    </item>
    <item>
      <title>Image Processing #1 Pixels, Quantization, and Sampling</title>
      <dc:creator>haratena9</dc:creator>
      <pubDate>Sat, 12 Feb 2022 05:31:47 +0000</pubDate>
      <link>https://dev.to/haratena9/memorandum-image-processing-1-pixels-quantization-and-sampling-1n02</link>
      <guid>https://dev.to/haratena9/memorandum-image-processing-1-pixels-quantization-and-sampling-1n02</guid>
      <description>&lt;h2&gt;
  
  
  About this memorandum
&lt;/h2&gt;

&lt;p&gt;Recently, I had a chance to experience deep learning and image/video processing at work, but there were many things I didn't understand about how to touch parameters and amplify data, so I decided to study video processing from scratch.&lt;/p&gt;

&lt;h2&gt;
  
  
  Digital image (an image is a collection of pixels)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;An image is made up of pixels.&lt;/li&gt;
&lt;li&gt;Pixel value

&lt;ul&gt;
&lt;li&gt;uint8: an integer value from 0 to 255 (0: black =&amp;gt; 255: white)&lt;/li&gt;
&lt;li&gt;float: a real number between 0.0 and 1.0&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Pixel size

&lt;ul&gt;
&lt;li&gt;uint8: 1 byte/pixel (8 bits)&lt;/li&gt;
&lt;li&gt;float32: 4 bytes/pixel (32 bits)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Grayscale image

&lt;ul&gt;
&lt;li&gt;The pixel value represents only the brightness of the image.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Represented as a two-dimensional array in the program.&lt;/li&gt;

&lt;/ul&gt;
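The per-pixel sizes in the bullets above can be checked directly with NumPy; the 4x4 shape here is arbitrary.

```python
import numpy as np

im_u8 = np.zeros((4, 4), dtype=np.uint8)     # integer pixels, 0..255
im_f32 = np.zeros((4, 4), dtype=np.float32)  # real-valued pixels, 0.0..1.0

print(im_u8.itemsize, im_f32.itemsize)  # bytes per pixel: 1 4
print(im_u8.nbytes, im_f32.nbytes)      # total bytes: 16 64
```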

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

im = imread('[file path]')
imshow(im)


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7l73yqupzo7n9j775o9t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7l73yqupzo7n9j775o9t.png" alt="original image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

im_eye = im[70:100, 120:150]
imshow(im_eye)


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flmjt6sucst49lkvvenyi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flmjt6sucst49lkvvenyi.png" alt="selected area1"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

im_eye2 = im[80:90, 125:140]
imshow(im_eye2)


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fswovk0tyd9x705vhklix.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fswovk0tyd9x705vhklix.png" alt="selected area2"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  summary
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3qnw7uk9lr4chkzy1ffa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3qnw7uk9lr4chkzy1ffa.png" alt="summary"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Grayscale image
&lt;/h2&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

im_gray = cv2.imread('[file_path]', cv2.IMREAD_GRAYSCALE)
imshow(im_gray)


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc3srgf44moep0zc94wx2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc3srgf44moep0zc94wx2.png" alt="Image Grayscale "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Notes: accessing the image array
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Order of access to arrays

&lt;ul&gt;
&lt;li&gt;Rows, columns&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Access order to pixels

&lt;ul&gt;
&lt;li&gt;Vertical, horizontal&lt;/li&gt;
&lt;li&gt;y, x&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;When looping, the outer loop runs over y and the inner loop over x

&lt;ul&gt;
&lt;li&gt;The second index walks the contiguous area of memory.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;In other words, it is the opposite of the general sense (x, y), so it is a hotbed of bugs during implementation.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Code&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

im = np.zeros((5, 5)) # 5x5 image (2D array)
im[2, 3] = 255 # access the pixel at (x, y) = (3, 2), i.e. row y=2, column x=3
print(im)

imshow(im)
plt.axis('off')
plt.show()


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Result&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

[[  0.   0.   0.   0.   0.]
 [  0.   0.   0.   0.   0.]
 [  0.   0.   0. 255.   0.]
 [  0.   0.   0.   0.   0.]
 [  0.   0.   0.   0.   0.]]


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7fgcq703u5v24ymohm98.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7fgcq703u5v24ymohm98.png" alt="Image Result"&gt;&lt;/a&gt;&lt;/p&gt;
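The "second index is contiguous" point can be confirmed by looking at the array's strides; the 2x3 shape here is arbitrary.

```python
import numpy as np

im = np.zeros((2, 3), dtype=np.uint8)  # 2 rows (y) x 3 columns (x)

# In C (row-major) order, moving one step along x advances 1 byte,
# while moving one step along y jumps a whole row (3 bytes).
# This is why the inner loop should run over x.
print(im.strides)  # (3, 1)
```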

&lt;h2&gt;
  
  
  Sampling and Quantization
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Signals are sampled and quantized&lt;/li&gt;
&lt;li&gt;Sampling: spatial discretization&lt;/li&gt;
&lt;li&gt;Quantization: Discretization of values

&lt;ul&gt;
&lt;li&gt;Normally 8-bit, i.e. 256 levels&lt;/li&gt;
&lt;li&gt;For special applications, such as medical use, 10-bit and 12-bit are also available.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h3&gt;
  
  
  Sampling
&lt;/h3&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

image_downscaled1 = downscale_local_mean(im_gray, (1, 1))
image_downscaled2 = downscale_local_mean(im_gray, (10, 10))
image_downscaled3 = downscale_local_mean(im_gray, (20, 20))
image_downscaled4 = downscale_local_mean(im_gray, (50, 50))


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnnqdfmniffib5vso0scf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnnqdfmniffib5vso0scf.png" alt="Image Sampling "&gt;&lt;/a&gt;&lt;/p&gt;
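`downscale_local_mean` averages non-overlapping blocks; when the factors divide the image size evenly, the same idea can be sketched in plain NumPy like this (on a hypothetical 4x4 image, not the one above).

```python
import numpy as np

def downscale_mean(im, fy, fx):
    # Average non-overlapping fy x fx blocks (assumes fy, fx divide the shape)
    h, w = im.shape
    return im.reshape(h // fy, fy, w // fx, fx).mean(axis=(1, 3))

im = np.arange(16, dtype=float).reshape(4, 4)
small = downscale_mean(im, 2, 2)
print(small.shape)  # (2, 2)
print(small)        # [[ 2.5  4.5] [10.5 12.5]]
```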

&lt;h3&gt;
  
  
  Quantization
&lt;/h3&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

# 32 levels (5-bit) quantization
bins = np.linspace(0, im.max(), 2**5)
digi_image1 = np.digitize(im, bins)
digi_image1 = (np.vectorize(bins.tolist().__getitem__)(digi_image1-1).astype(int))

# 16 levels (4-bit) quantization
bins = np.linspace(0, im.max(), 2**4)
digi_image2 = np.digitize(im, bins)
digi_image2 = (np.vectorize(bins.tolist().__getitem__)(digi_image2-1).astype(int))

# 8 levels (3-bit) quantization
bins = np.linspace(0, im.max(), 2**3)
digi_image3 = np.digitize(im, bins)
digi_image3 = (np.vectorize(bins.tolist().__getitem__)(digi_image3-1).astype(int))

# 4 levels (2-bit) quantization
bins = np.linspace(0, im.max(), 2**2)
digi_image4 = np.digitize(im, bins)
digi_image4 = (np.vectorize(bins.tolist().__getitem__)(digi_image4-1).astype(int))



&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;※ If the quantization is made too coarse, contours that do not exist in the original, known as &lt;strong&gt;"pseudo contours"&lt;/strong&gt;, appear.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5ybv0o4ghrjpd9svxa56.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5ybv0o4ghrjpd9svxa56.png" alt="Image quantization "&gt;&lt;/a&gt;&lt;/p&gt;
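For reference, the same quantization idea can be written more compactly. This sketch assumes a float image in [0, 1] and floors each pixel into one of `levels` equal-width bins, which is a different binning convention from the `np.digitize` code above, so exact outputs may differ slightly.

```python
import numpy as np

def quantize(im, levels):
    # Floor each pixel into one of `levels` equal-width bins,
    # then map the bin index back to [0, 1]
    idx = np.minimum((im * levels).astype(int), levels - 1)
    return idx / (levels - 1)

im = np.linspace(0, 1, 5)  # 0.0, 0.25, 0.5, 0.75, 1.0
print(quantize(im, 2))     # [0. 0. 1. 1. 1.]
print(quantize(im, 4))     # [0. 0.333... 0.666... 1. 1.]
```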

</description>
      <category>python</category>
      <category>imageprocessing</category>
      <category>formyownuse</category>
    </item>
  </channel>
</rss>
