Machine Learning - Max & Average Pooling

#machinelearning #beginners #computerscience

Hey there,

How is your Sunday going? Has Corona reached you yet?
This is going to be the 10th post in the series.
Let's get to the topics without further ado.

There are two common types of pooling: max pooling and average pooling. Max pooling returns the maximum value from the portion of the image covered by the pooling window (often also called the kernel). Average pooling, on the other hand, returns the average of all the values in that portion of the image.
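To make the difference concrete, here is a minimal NumPy sketch that applies both operations to a single 2x2 region. It reuses the four pixel values from the worked example later in this post; how they are arranged inside the grid is an assumption, since the original figure is not available.

```python
import numpy as np

# One 2x2 region of an image. The values come from the worked example;
# their arrangement inside the grid is assumed for illustration.
patch = np.array([[22, 27],
                  [91, 110]])

max_pooled = patch.max()   # max pooling keeps the largest value
avg_pooled = patch.mean()  # average pooling keeps the mean of all values

print(max_pooled)  # 110
print(avg_pooled)  # 62.5
```

Max pooling keeps only the strongest response in the region, while average pooling blends all four values together.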

We'll now take a look at the other key concept in convolutional neural networks called max pooling. In short, max pooling is just the process of reducing the size of an input image by summarizing regions. Let's see how this works concretely with an example.

To perform max pooling, we need to choose two things: a grid, which is the pool size, and a stride.

For this example, we're going to use a two-by-two pixel grid that we can see here in orange.

We then look at the pixels inside our orange grid, and select the pixel with the greatest value.

For example, here we have the pixel values 22, 27, 91, and 110. In this case, we select the value 110 since this is the greatest pixel value among these.

The other parameter we use is called the stride. The stride determines the number of pixels to slide the window across the image.

In this example, we use a stride of two, which means we will slide the orange grid two pixels at each step.
So we slide our orange grid 2 pixels to the right. Now, in this new position, we again select the pixel with the greatest value. We now have pixel values of 36, 313, 120, and 522.
So now, we select the value 522, since this is the greatest pixel value among these. We place this value of 522 in the corresponding pixel of the new image, and we continue this process until we have covered the entire convolved image.
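The sliding procedure described above can be sketched in a few lines of NumPy. The `max_pool2d` helper is hypothetical (not a library function), and the 4x4 input is made up: the two top windows use the values from the text, but their placement and the bottom two rows are invented for illustration.

```python
import numpy as np

def max_pool2d(image, pool_size=2, stride=2):
    """Hypothetical helper: slide a pool_size x pool_size window over
    `image` in steps of `stride` pixels and keep the largest value in
    each window."""
    h, w = image.shape
    out_h = (h - pool_size) // stride + 1
    out_w = (w - pool_size) // stride + 1
    out = np.empty((out_h, out_w), dtype=image.dtype)
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride  # top-left corner of the window
            window = image[r:r + pool_size, c:c + pool_size]
            out[i, j] = window.max()
    return out

# Hypothetical 4x4 convolved image. Top windows: 22, 27, 91, 110 and
# 36, 313, 120, 522 (placement assumed); bottom rows are made up.
image = np.array([[ 22,  27,  36, 313],
                  [ 91, 110, 120, 522],
                  [  5,  14,  60,   8],
                  [ 40,   3,  17,  29]])

pooled = max_pool2d(image)
print(pooled)
# [[110 522]
#  [ 40  60]]
```

The 2x2 grid with stride 2 visits four non-overlapping windows, so the 4x4 input becomes a 2x2 output.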

The result will be a new image that's smaller than the original image, and you can say we've down-sampled the original image. In this particular example, the new image has half the width and half the height of the original, so it contains only a quarter as many pixels. The size of this new image will vary depending on your choice of the grid size and the stride.
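A common way to compute that size, assuming no padding, is floor((n - f) / s) + 1 along each dimension, where n is the input size, f the grid (pool) size and s the stride. Here is a small sketch; `pooled_size` is a hypothetical helper name.

```python
def pooled_size(n, pool_size, stride):
    """Output length along one dimension when a window of `pool_size`
    pixels slides over `n` pixels in steps of `stride` (no padding)."""
    return (n - pool_size) // stride + 1

print(pooled_size(4, 2, 2))   # 2: a 4x4 image becomes 2x2
print(pooled_size(28, 2, 2))  # 14: a 28x28 image becomes 14x14
print(pooled_size(28, 3, 1))  # 26: a bigger grid with a smaller stride shrinks it less
```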

Recap

A convolution is the process of applying a filter (“kernel”) to an image. Max pooling is the process of reducing the size of the image through downsampling.
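For comparison with pooling, here is a minimal NumPy sketch of applying a kernel to an image with stride 1 and no padding. The `convolve2d` helper, the image and the vertical-edge kernel are all hypothetical. Like most CNN libraries, it does not flip the kernel, so strictly speaking it computes a cross-correlation.

```python
import numpy as np

def convolve2d(image, kernel):
    """Hypothetical helper: slide `kernel` over `image` (stride 1, no
    padding) and, at each position, sum the element-wise products of the
    kernel and the pixels underneath it."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Hypothetical 4x5 image: dark on the left, bright on the right.
image = np.array([[0, 0, 0, 1, 1],
                  [0, 0, 0, 1, 1],
                  [0, 0, 0, 1, 1],
                  [0, 0, 0, 1, 1]], dtype=float)
# A 3x3 kernel that responds to vertical dark-to-bright edges.
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=float)

result = convolve2d(image, kernel)
print(result)  # 0 over the flat dark region, 3 where the window overlaps the edge
```

Unlike pooling, the kernel here holds numbers that multiply the pixels; in a CNN those numbers are what training adjusts.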

Convolutional layers can be added to the neural network model using the Conv2D layer type in Keras. Like the Dense layer, this layer has weights and biases that need to be tuned to the right values during training. In a Conv2D layer, the weights are the values inside the kernels (filters), so the values inside each filter matrix are the variables that get tuned in order to produce the right output.
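Here is a minimal sketch of a small Keras model that combines a Conv2D layer with max pooling, assuming TensorFlow 2.x with `tf.keras` is installed. The 28x28 grayscale input, the 32 filters and the 10 output classes are hypothetical choices, not values from this lesson.

```python
import tensorflow as tf

# Hypothetical model for 28x28 grayscale images and 10 output classes.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),
    # 32 learnable 3x3 kernels; 'same' padding keeps the 28x28 size.
    tf.keras.layers.Conv2D(32, (3, 3), padding='same', activation='relu'),
    # 2x2 max pooling with stride 2 halves height and width: 28x28 -> 14x14.
    tf.keras.layers.MaxPooling2D((2, 2), strides=2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation='softmax'),
])

model.summary()

conv = model.layers[0]
# The kernel values are this layer's trainable weights: shape (3, 3, 1, 32).
print(conv.kernel.shape)
# 3*3*1*32 kernel weights + 32 biases (one per filter) = 320 parameters.
print(conv.count_params())
```

Note that the MaxPooling2D layer has no trainable parameters; it only summarizes regions of the Conv2D output.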

Here are some of the terms that were introduced in this lesson:

CNNs: Convolutional neural networks, that is, networks which have at least one convolutional layer. A typical CNN also includes other types of layers, such as pooling layers and dense layers.

Convolution: The process of applying a kernel (filter) to an image.

Kernel / filter: A matrix which is smaller than the input and is slid across it, turning each region it covers into a single output value.

Padding: Adding pixels of some value, usually 0, around the input image.
Pooling: The process of reducing the size of an image through downsampling. There are several types of pooling layers. For example, average pooling converts many values into a single value by taking their average. However, max pooling is the most common.

Max pooling: A pooling process in which many values are converted into a single value by taking the maximum value from among them.

Stride: The number of pixels to slide the kernel (filter) or pooling window across the image at each step.

Downsampling: The act of reducing the size of an image.
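To see how padding and stride from the glossary interact, here is a small NumPy sketch (assuming NumPy 1.20 or newer for `sliding_window_view`). The 4x4 image is hypothetical. Zero padding by one pixel lets a 3x3 kernel with stride 1 visit 4x4 positions, so the output keeps the input size; without padding it only fits in 2x2 positions.

```python
import numpy as np

# Hypothetical 4x4 image.
image = np.arange(16).reshape(4, 4)

# Zero padding: add one row/column of zeros on every side (4x4 -> 6x6).
padded = np.pad(image, pad_width=1, mode='constant', constant_values=0)
print(padded.shape)  # (6, 6)

# Every 3x3 window a stride-1 kernel can visit in the padded image.
windows = np.lib.stride_tricks.sliding_window_view(padded, (3, 3))
print(windows.shape)  # (4, 4, 3, 3): 4x4 positions, so the output stays 4x4

# Without padding, the same kernel only fits in 2x2 positions.
unpadded = np.lib.stride_tricks.sliding_window_view(image, (3, 3))
print(unpadded.shape)  # (2, 2, 3, 3)
```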

If you want to know more about how CNNs work, make sure to check out this Comprehensive Guide to Convolutional Neural Networks.