Convolutional Neural Networks (CNN) in a Brief

#deeplearning #machinelearning #datascience #architecture

What is Convolutional Neural Network (CNN) ?

A neural network in which at least one layer is a convolutional layer.
Depending on features, we categorize the images (classify) using CNN.
Yann Lecun is considered the grandfather of Convolutional neural networks.

What is a Convolutional Layer ?

These are the layers of convolutional neural network where filters are applied to the original image.

Steps involved in constructing a Convolutional Neural Network:

Convolution Operation.
1. Stride.
2. ReLU Layer.
Pooling.
Flattening.
Full Connection.

Fig 1. Different Steps in constructing CNN

1. Convolution Operation :

In this process, we reduce the size of the image by passing the input image through a Feature detector/Filter/Kernel so as to convert it into a Feature Map/ Convolved feature/ Activation Map
It helps remove the unnecessary details from the image.
We can create many feature maps (detects certain features from the image) to obtain our first convolution layer.
Involves element-wise multiplication of convolutional filter with the slice of an input matrix and finally the summation of all values in the resulting matrix.

Fig 2. Convolution Operation on a matrix / Image

1.1. Stride:

The number of pixels by which we are moving the filter over the input matrix is called a stride.

1.2. ReLU Activation Function :

ReLU is the most commonly used activation function in the world.
When applying convolution, there is a risk we might create something linear and there we need to break linearity.
Rectified Linear unit can be described by the function f(x) = max(x, 0).
We are applying the rectifier to increase the non-linearity in our image/CNN. Rectifier keeps only non-negative values of an image.

2. Pooling :

It helps to reduce the spatial size of the convolved feature which in-turn helps to to decrease the computational power required to process the data.
Here we are able to preserve the dominant features, thus helping in the process of effectively training the model.
Converts the Feature Map into a Pooled Feature Map.

Pooling is divided into 2 types:
1. Max Pooling - Returns the max value from the portion of the image covered by the kernel.
2. Average Pooling - Returns the average of all values from the portion of the image covered by the kernel.

3. Flattening :

Involves converting a Pooled feature Map into one-dimensional Column vector.

4. Full Connection :

The flattened output is fed to a feed-forward neural network with backpropagation applied to every iteration.
Over a series of epochs, the model is able to identify dominating features and low-level features in images and classify them using the Softmax Classification technique (It brings the output values between 0 and 1).