DEV Community

Francisco Canova


Coursera's Deep Learning Specialization: Foundations of CNNs (Week 1)

CNN Basics

Convolutions let us detect edges in an image and, building on those edges, more complex features. That is why Convolutional Neural Networks (CNNs) are instrumental in computer vision applications today, and why I wanted to learn and practice implementing them. This week I learned the basics of how CNNs and their typical components work (convolutional, pooling, activation, and fully connected [FC] layers, plus the different padding types). The purpose and steps of each component are listed below, in the order they are usually found in a basic CNN.

Padding Layer

We pad an image because a convolution with any filter larger than 1x1 produces an output matrix that is smaller than the input image. To preserve the size during a forward pass, we pad the border of the image with zeros.
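As a rough illustration of the size argument above, the sketch below uses NumPy's `np.pad` together with the standard output-size formula floor((n + 2p - f) / s) + 1. The helper name `conv_output_size` and the 6x6 example image are assumptions made for illustration, not part of the course code.

```python
import numpy as np

def conv_output_size(n, f, pad=0, stride=1):
    # Standard output-size formula: floor((n + 2 * pad - f) / stride) + 1
    return (n + 2 * pad - f) // stride + 1

# Hypothetical 6x6 grayscale image, used only to show the shapes.
image = np.arange(36, dtype=float).reshape(6, 6)

# Add one row/column of zeros on every side of the image.
padded = np.pad(image, pad_width=1, mode="constant", constant_values=0)

print(padded.shape)                     # (8, 8)
print(conv_output_size(6, f=3))         # 4: a 3x3 filter shrinks a 6x6 image
print(conv_output_size(6, f=3, pad=1))  # 6: one pixel of padding preserves the size
```

With a 3x3 filter and stride 1, a padding of 1 is exactly what keeps the output the same size as the input.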

Convolution Layer Steps

  1. n filters are placed at the top left corner (typically) of the input image.
  2. For each filter, you compute the element-wise product of the filter and the patch of the image it covers, sum the products, and add a bias to get one output value.
  3. Then you slide the filters across the image by a predetermined stride, repeating step 2 at every position until you reach the bottom right corner of the image.
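The steps above can be sketched for a single filter on a single-channel image. This is a minimal illustration, not the course implementation: the function name `conv2d_single_channel`, the 6x6 image, and the vertical-edge filter are assumptions chosen to show how a convolution picks out an edge, and the bias is left out for brevity.

```python
import numpy as np

def conv2d_single_channel(image, kernel, stride=1):
    # Slide one square kernel over a 2D image (no padding, no bias).
    f = kernel.shape[0]
    n_h = (image.shape[0] - f) // stride + 1
    n_w = (image.shape[1] - f) // stride + 1
    out = np.zeros((n_h, n_w))
    for h in range(n_h):
        for w in range(n_w):
            window = image[h * stride:h * stride + f, w * stride:w * stride + f]
            # Element-wise product of the window and the kernel, then sum.
            out[h, w] = np.sum(window * kernel)
    return out

# Hypothetical image: bright left half (10), dark right half (0).
image = np.hstack([np.full((6, 3), 10.0), np.zeros((6, 3))])

# A simple vertical-edge filter.
vertical_edge = np.array([[1, 0, -1],
                          [1, 0, -1],
                          [1, 0, -1]], dtype=float)

result = conv2d_single_channel(image, vertical_edge)
print(result[0])  # [ 0. 30. 30.  0.]
```

The output is zero where the image is flat and large in the middle columns, where the bright-to-dark transition sits.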

Pooling Layer

A pooling layer slides a window over the image and reduces each region to a single number, such as its maximum, average, or minimum. Selecting the maximum value is what's typically done in computer vision: it keeps the strongest activation in each region, so a detected feature such as an edge is preserved while the output becomes smaller and cheaper to process.
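To make the idea concrete, here is a small sketch of max and average pooling on a 4x4 matrix. The function name `pool2d` and the example values are assumptions for illustration only.

```python
import numpy as np

def pool2d(x, size=2, stride=2, mode="max"):
    # Reduce each size x size window of a 2D array to a single number.
    reduce_fn = np.max if mode == "max" else np.mean
    n_h = (x.shape[0] - size) // stride + 1
    n_w = (x.shape[1] - size) // stride + 1
    out = np.zeros((n_h, n_w))
    for h in range(n_h):
        for w in range(n_w):
            window = x[h * stride:h * stride + size, w * stride:w * stride + size]
            out[h, w] = reduce_fn(window)
    return out

# Hypothetical 4x4 feature map.
x = np.array([[1, 3, 2, 1],
              [2, 9, 1, 1],
              [1, 3, 2, 3],
              [5, 6, 1, 2]], dtype=float)

max_pooled = pool2d(x, mode="max")       # [[9. 2.] [6. 3.]]
avg_pooled = pool2d(x, mode="average")   # [[3.75 1.25] [3.75 2.  ]]
print(max_pooled)
print(avg_pooled)
```

Note that pooling has no weights to learn; only the window size and stride are chosen in advance.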


Activation Layer

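An activation layer passes the values of one layer through a predefined function. Paired with backpropagation, this non-linearity is what lets the network learn more than a single linear mapping, and the activation on the output layer turns the network's raw scores into something we can interpret, such as class probabilities.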

Common activation functions in computer vision are:
Rectified Linear Unit (ReLU): Used when passing information from layer to layer; very rarely used in the output layer.
Softmax: Used for multi-class classification problems, where the output vector has one entry per class and the index with the highest value is the selected class.
Sigmoid: Used for binary classification, where the predicted class depends on whether the activation's output is above or below 0.5.
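The three functions above are short enough to write out in NumPy. This is a minimal sketch; the example inputs are made up, and the softmax subtracts the maximum before exponentiating, a common trick for numerical stability that does not change the result.

```python
import numpy as np

def relu(z):
    # Keep positive values, replace negative values with 0.
    return np.maximum(0, z)

def sigmoid(z):
    # Squash any real number into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # Turn a vector of scores into probabilities that sum to 1.
    exp = np.exp(z - np.max(z))
    return exp / exp.sum()

print(relu(np.array([-2.0, 0.0, 3.0])))   # [0. 0. 3.]
print(sigmoid(0.0))                        # 0.5, the usual decision threshold

scores = np.array([2.0, 1.0, 0.1])         # hypothetical scores for 3 classes
probs = softmax(scores)
predicted_class = int(np.argmax(probs))    # index of the highest probability
print(probs, predicted_class)
```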

Some graph examples of the activation functions above, plus some additional ones.

Below is a visual representation of a basic CNN, where the arrows represent the layers and the blue boxes represent the tensors passed between layers. The sizes of the blue boxes represent the tensor dimensions.
[Image: diagram of a basic CNN]
The order of the layers above is:
Convolution --> Activation(ReLU) --> Pooling(Max) --> FC --> Activation(Softmax)

Practice

I implemented the forward and backward propagation of a CNN from scratch in order to learn in more depth how CNNs work conceptually. I also completed the same task using TensorFlow's Functional API to practice how these deep learning models are built and deployed more efficiently.

Below is an example of JUST the convolution layer from scratch, followed by the MUCH more efficient TensorFlow implementation that includes all the layers typically found in a basic CNN (convolution, pooling, activation, and FC).

Forward Propagation from scratch

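import numpy as np

# zero_pad and conv_single_step are not shown in this post; the two helpers
# below are assumed minimal versions so that conv_forward runs on its own.
def zero_pad(X, pad):
    # Zero-pad only the height and width axes of a (m, n_H, n_W, n_C) batch.
    return np.pad(X, ((0, 0), (pad, pad), (pad, pad), (0, 0)), mode="constant")

def conv_single_step(a_slice_prev, W, b):
    # Element-wise product of one window and one filter, summed, plus the bias.
    # Assumes b holds a single value for this filter.
    return np.sum(a_slice_prev * W) + b.item()

def conv_forward(A_prev, W, b, hparameters):
    # A_prev: (m, n_H_prev, n_W_prev, n_C_prev) batch of input activations
    # W: (f, f, n_C_prev, n_C) filters; b: (1, 1, 1, n_C) biases
    (m, n_H_prev, n_W_prev, n_C_prev) = A_prev.shape
    (f, f, n_C_prev, n_C) = W.shape

    stride = hparameters["stride"]
    pad = hparameters["pad"]

    # Output height and width: floor((n_prev - f + 2 * pad) / stride) + 1
    n_H = int((n_H_prev - f + (2 * pad)) / stride + 1)
    n_W = int((n_W_prev - f + (2 * pad)) / stride + 1)

    Z = np.zeros((m, n_H, n_W, n_C))

    A_prev_pad = zero_pad(A_prev, pad)

    for i in range(m):                  # each example in the batch
        a_prev_pad = A_prev_pad[i]
        for h in range(n_H):            # each output row
            vert_start = stride * h
            vert_end = vert_start + f

            for w in range(n_W):        # each output column
                horiz_start = stride * w
                horiz_end = horiz_start + f

                for c in range(n_C):    # each filter
                    a_slice_prev = a_prev_pad[vert_start:vert_end, horiz_start:horiz_end, :]
                    weights = W[:, :, :, c]
                    biases = b[:, :, :, c]
                    Z[i, h, w, c] = conv_single_step(a_slice_prev, weights, biases)

    # Keep the inputs around for the backward pass.
    cache = (A_prev, W, b, hparameters)

    return Z, cache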

Forward Propagation using TensorFlow's Functional API

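import tensorflow as tf
from tensorflow.keras import layers as tfl

def convolutional_model(input_shape):

    input_img = tf.keras.Input(shape=input_shape)
    # Block 1: 8 filters of size 4x4, ReLU, then 8x8 max pooling with stride 8
    Z1 = tfl.Conv2D(8, 4, strides=1, padding='same')(input_img)
    A1 = tfl.ReLU()(Z1)
    P1 = tfl.MaxPool2D(pool_size=(8, 8), strides=8, padding='same')(A1)
    # Block 2: 16 filters of size 2x2, ReLU, then 4x4 max pooling with stride 4
    Z2 = tfl.Conv2D(16, 2, strides=1, padding='same')(P1)
    A2 = tfl.ReLU()(Z2)
    P2 = tfl.MaxPool2D(pool_size=(4, 4), strides=4, padding='same')(A2)
    # Flatten, then a fully connected softmax layer over 6 classes
    F = tfl.Flatten()(P2)
    outputs = tfl.Dense(6, activation='softmax')(F)
    model = tf.keras.Model(inputs=input_img, outputs=outputs)
    return model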
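To see what the TensorFlow model above does to the data, it can help to trace the shapes by hand. The sketch below is plain Python and does not call TensorFlow. It assumes a 64x64 RGB input, which is not stated in this post, and relies on the fact that with padding='same' each spatial dimension becomes ceil(n / stride). The helper names `same_out` and `trace_shapes` are made up for this example, and the batch dimension is left out.

```python
import math

def same_out(n, stride):
    # With padding='same', each spatial axis has ceil(n / stride) outputs.
    return math.ceil(n / stride)

def trace_shapes(height, width, channels):
    shapes = [("Input", (height, width, channels))]
    # Conv2D(8, 4, strides=1, padding='same') followed by ReLU
    h, w = same_out(height, 1), same_out(width, 1)
    shapes.append(("Conv2D 1 + ReLU", (h, w, 8)))
    # MaxPool2D((8, 8), strides=8, padding='same')
    h, w = same_out(h, 8), same_out(w, 8)
    shapes.append(("MaxPool2D 1", (h, w, 8)))
    # Conv2D(16, 2, strides=1, padding='same') followed by ReLU
    h, w = same_out(h, 1), same_out(w, 1)
    shapes.append(("Conv2D 2 + ReLU", (h, w, 16)))
    # MaxPool2D((4, 4), strides=4, padding='same')
    h, w = same_out(h, 4), same_out(w, 4)
    shapes.append(("MaxPool2D 2", (h, w, 16)))
    # Flatten, then Dense(6, activation='softmax')
    shapes.append(("Flatten", (h * w * 16,)))
    shapes.append(("Dense + Softmax", (6,)))
    return shapes

# Assumed input size: 64x64 RGB images.
for name, shape in trace_shapes(64, 64, 3):
    print(f"{name}: {shape}")
```

Under that assumption the two pooling layers shrink the image from 64x64 to 8x8 and then to 2x2, so the FC layer only sees 64 values per image.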
