DEV Community

MustafaLSailor

CNN in short

CNN

CNN is not a subject that can be covered briefly, but I will try to explain it in short.

CNN is an abbreviation for Convolutional Neural Network. CNNs are deep learning models used frequently in image recognition and image processing tasks.

CNNs use an operation called convolution to capture local features of an image. This makes them more effective than other deep learning models when working with images.

The basic components of CNN are:

Convolution Layer: This layer applies a filter (or kernel) on the input image, and each filter detects different features of the image (e.g. edges, corners, etc.).

Activation Function: Usually ReLU (Rectified Linear Unit), applied to each value resulting from the convolution. The activation function gives the model the ability to learn nonlinear relationships.

Pooling Layer (or Subsampling Layer): This layer is used to reduce the input size. This reduces the complexity of the model and prevents overfitting.

Fully Connected Layer: This layer performs the final classification task using learned local features.

CNNs are often created by sequentially combining multiple layers of convolution, activation, and pooling, with one or more fully connected layers added at the end.

To create a simple CNN model using the Keras library in Python, you can write code like the following:

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

# Create the model
model = Sequential()

# Add convolution layer
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(64, 64, 3)))

# Add pooling layer
model.add(MaxPooling2D(pool_size=(2, 2)))

# Flatten the feature maps into a one-dimensional vector
model.add(Flatten())

# Add the fully connected layer
model.add(Dense(128, activation='relu'))

# Add the output layer
model.add(Dense(1, activation='sigmoid'))

# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

In this example, a convolution layer, a pooling layer, and a fully connected layer are added. The model is optimized for a binary classification problem (sigmoid activation function and binary_crossentropy loss function are used).
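To see the model in action, it can be trained on placeholder data. The sketch below repeats the model definition so it is self-contained; the random arrays are stand-ins for a real image dataset, and the shapes (16 samples of 64 × 64 × 3) are illustrative assumptions:

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential([
    Input(shape=(64, 64, 3)),
    Conv2D(32, (3, 3), activation='relu'),
    MaxPooling2D(pool_size=(2, 2)),
    Flatten(),
    Dense(128, activation='relu'),
    Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Random placeholder data: 16 "images" with binary labels
X = np.random.rand(16, 64, 64, 3).astype('float32')
y = np.random.randint(0, 2, size=(16, 1))
model.fit(X, y, epochs=1, batch_size=8, verbose=0)

# The sigmoid output gives one probability per image
print(model.predict(X[:2], verbose=0).shape)  # (2, 1)
```

With real data, `X` and `y` would come from an image dataset and training would run for many more epochs.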

The Convolutional Layer

Convolutional layers are the key building blocks of the network, where most of the computation is carried out. A convolutional layer works by applying a filter to the input data to identify features. This filter, known as a feature detector, checks the image input's receptive fields for a given feature. This operation is referred to as convolution.

The filter is a two-dimensional array of weights that represents part of a 2-dimensional image. A filter is typically a 3×3 matrix, although other sizes are possible. The filter is applied to a region within the input image and calculates a dot product between the filter weights and the pixel values, which is written to an output array. The filter then shifts and repeats the process until it has covered the whole image. The final output of all the filter positions is called the feature map.
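The sliding-filter process described above can be sketched in plain NumPy. This is an illustrative implementation (stride 1, no padding); the vertical-edge kernel is one common 3×3 filter choice, not a value the network would necessarily learn:

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide the kernel over the image (stride 1, no padding) and
    take a dot product at each position to build the feature map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            region = image[i:i + kh, j:j + kw]          # receptive field
            feature_map[i, j] = np.sum(region * kernel)  # dot product
    return feature_map

image = np.arange(25, dtype=float).reshape(5, 5)
# A simple vertical-edge detector
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]], dtype=float)
print(convolve2d(image, kernel).shape)  # (3, 3)
```

Note how a 5×5 input and a 3×3 filter produce a 3×3 feature map: the filter has 3 × 3 = 9 valid positions.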

The CNN typically applies the ReLU (Rectified Linear Unit) transformation to each feature map after every convolution to introduce nonlinearity to the ML model. A convolutional layer is typically followed by a pooling layer. Together, the convolutional and pooling layers make up a convolutional block.

Additional convolution blocks will follow the first block, creating a hierarchical structure with later layers learning from the earlier layers. For example, a CNN model might train to detect cars in images. Cars can be viewed as the sum of their parts, including the wheels, boot, and windscreen. Each feature of a car equates to a low-level pattern identified by the neural network, which then combines these parts to create a high-level pattern[1].

Activation Layer

Activation layers introduce nonlinearity into the network by applying an activation function element-by-element to the output of the previous layer, usually a convolution layer. Some common activation functions are ReLU (max(0, x)), tanh, and Leaky ReLU. Because the function is applied element-wise, the dimensions remain unchanged; for example, an input volume of 32 × 32 × 12 produces an output volume of 32 × 32 × 12 [2].
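These element-wise activations are easy to sketch with NumPy (the 0.01 slope for Leaky ReLU is a common illustrative choice):

```python
import numpy as np

x = np.array([-2.0, -0.5, 0.0, 1.5])

relu = np.maximum(0, x)                    # max(0, x): negatives become 0
leaky_relu = np.where(x > 0, x, 0.01 * x)  # small slope for negative inputs
tanh = np.tanh(x)                          # squashes values into (-1, 1)

print(relu)  # [0.  0.  0.  1.5]
```

In each case the output has the same shape as the input, which is why the volume's dimensions are unchanged.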

The Pooling Layers

A pooling or downsampling layer reduces the dimensionality of the input. Like a convolutional operation, pooling operations use a filter to sweep the whole input image, but it doesn’t use weights. The filter instead uses an aggregation function to populate the output array based on the receptive field’s values.

There are two key types of pooling:

- Average pooling: the filter calculates the receptive field's average value as it scans the input.
- Max pooling: the filter sends the pixel with the maximum value to the output array. This approach is more common than average pooling.

Pooling layers are important despite causing some information to be lost, because they reduce the complexity and increase the efficiency of the CNN. They also reduce the risk of overfitting [1].
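Both pooling types can be sketched in NumPy for non-overlapping 2×2 windows (an illustrative implementation, not the Keras one):

```python
import numpy as np

def pool2d(x, size=2, mode='max'):
    """Aggregate each non-overlapping size x size window of a 2-D array."""
    h, w = x.shape[0] // size, x.shape[1] // size
    windows = x[:h * size, :w * size].reshape(h, size, w, size)
    agg = np.max if mode == 'max' else np.mean
    return agg(windows, axis=(1, 3))

x = np.array([[1, 3, 2, 4],
              [5, 7, 6, 8],
              [9, 2, 1, 0],
              [3, 4, 5, 6]], dtype=float)

print(pool2d(x, mode='max'))  # [[7. 8.] [9. 6.]]
print(pool2d(x, mode='avg'))  # [[4.  5. ] [4.5 3. ]]
```

Either way, a 4×4 input shrinks to 2×2, which is exactly the dimensionality reduction described above.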

Flattening

After the convolution and pooling layers, the resulting feature maps are flattened into a one-dimensional vector so they can be passed into a fully connected layer for classification or regression.
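Flattening is just a reshape. The numbers below (32 feature maps of 15 × 15) are illustrative; Keras's `Flatten` layer does the same thing per sample:

```python
import numpy as np

# Suppose pooling leaves 32 feature maps of size 15 x 15 (illustrative numbers)
feature_maps = np.random.rand(32, 15, 15)

flat = feature_maps.reshape(-1)  # one-dimensional vector for the FC layer
print(flat.shape)  # (7200,)  -> 32 * 15 * 15 values
```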

The Fully Connected Layer

The final layer of a CNN is a fully connected layer.

The FC layer performs classification tasks using the features that the previous layers and filters extracted. Instead of a ReLU function, the FC layer typically uses a softmax activation, which converts the outputs into probability scores between 0 and 1 for each class [1].
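The softmax step can be sketched directly; the three raw scores below are hypothetical FC-layer outputs for a three-class problem:

```python
import numpy as np

def softmax(z):
    """Convert raw class scores into probabilities that sum to 1."""
    e = np.exp(z - np.max(z))  # subtract max for numerical stability
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])  # hypothetical FC-layer outputs
probs = softmax(scores)
print(probs.round(3))  # [0.659 0.242 0.099]
```

The highest score gets the highest probability, and the probabilities sum to 1, which is what makes softmax a natural fit for the final classification layer.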

References:
  1. DataGen
  2. Activation Layer

GeeksForGeeks explained CNN perfectly =>
