In a linear fit, data is fitted to a linear equation, typically in slope-intercept form, where the slope $m$ and the intercept $b$ are the weights to be fitted:
$$y = mx + b$$
Likewise, for a multi-linear equation with a total of $n$ independent variables, the expression is:
$$y = w_0 + w_1 x_1 + w_2 x_2 + \dots + w_n x_n$$
Either of the above equations can be fitted by changing the weights in a way that minimizes the error between the data points and the theoretical points generated by the equation. This error metric is typically known as a loss function or cost function.
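As a minimal sketch of such a fit, the snippet below generates noisy points around a line and recovers the weights by least squares, using the mean squared error as the loss. The synthetic data, the "true" slope and intercept, the fixed random seed, and the choice of mean squared error are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumed for illustration): y = 2x + 1 plus small noise
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.1, size=x.shape)

def mse(weights, x, y):
    """Mean squared error between the data and the line slope*x + intercept."""
    slope, intercept = weights
    return np.mean((slope * x + intercept - y) ** 2)

# Design matrix with a column of ones so the intercept is fitted as a weight
A = np.column_stack([x, np.ones_like(x)])
fitted, *_ = np.linalg.lstsq(A, y, rcond=None)

print("fitted slope, intercept:", fitted)
print("loss at fitted weights:", mse(fitted, x, y))
print("loss at a poor guess:  ", mse(np.array([0.0, 0.0]), x, y))
```

The fitted weights should land close to the values used to generate the data, and the loss at the fitted weights should be far lower than at an arbitrary guess.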
In the case of deep learning, the equation is more complex. The multilayer perceptron (MLP) model is used as an example here for simplicity, as it may be one of the simplest neural networks to explain. Its architecture is shown below:
Here, each column is a layer of nodes. The first layer from the left holds the input variables, which in this example carry the information of each pixel. The layers between the first and last layers are known as hidden layers, as the operations in them are typically not exposed by the machine and are thus hidden from the user. The nodes in these hidden layers are also known as neurons, and they handle most of the coefficient adjustments during the training phase. Finally, the last layer contains the output nodes, which give the dependent variables of the model.
It is helpful to get a notion of the connectivity of this topology to understand how the network operates. Here, the connections to 2 different nodes of the above network will be explored. First, let's look at the first node of the first hidden layer:
The connections of the 4 input variables to the first node of the first hidden layer form a linear combination of these four variables, which can be expressed as:
Now let's look at the connections for the following node in the second hidden layer:
The connections of the 6 nodes of the first hidden layer to the first node of the second hidden layer form a linear combination of those 6 nodes. This new node can then be defined by the equation below, in which the node defined in the previous equation is shown in bold for context.
"MLP neural networks are linear combinations of recursively nested linear combinations."
The number of nested operations depends on the depth (the number of layers) of the network and on the number of variables and nodes in the system.
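The nesting described above can be sketched numerically. The snippet below uses 4 inputs and 6 nodes in the first hidden layer, as in the text; the size of the second hidden layer (3 nodes), the random weights, and the names `W1`, `W2`, `h1` are assumptions for illustration. Biases and activation functions are left out so that the code mirrors the purely linear picture given here.

```python
import numpy as np

rng = np.random.default_rng(0)

x = rng.random(4)        # 4 input variables
W1 = rng.random((6, 4))  # weights: 4 inputs -> 6 nodes of hidden layer 1
W2 = rng.random((3, 6))  # weights: 6 nodes -> hidden layer 2 (size 3 is assumed)

# First node of the first hidden layer: a linear combination of the 4 inputs
h1_first = sum(W1[0, i] * x[i] for i in range(4))

# All 6 nodes of the first hidden layer at once
h1 = W1 @ x

# First node of the second hidden layer: a linear combination of the 6 nodes
# above, each of which is itself a linear combination of the inputs
h2_first = sum(W2[0, j] * h1[j] for j in range(6))

# Writing out the nesting explicitly gives the same number
h2_nested = sum(
    W2[0, j] * sum(W1[j, i] * x[i] for i in range(4)) for j in range(6)
)

print(h1_first, h1[0])
print(h2_first, h2_nested)
```

Each pair of printed values should agree, which is the "linear combination of linear combinations" idea in matrix form.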
Much as with the linear fit, deep learning can be seen as the process of minimizing the cost or loss function of a neural network, given the inputs and the shape of the network.
Image classification and image recognition problems are widely taught in deep learning. The popular image recognition problem of "cat and dog" identification from a library of these animals occurs through a process analogous to the following heuristic.
For this example, let's use the following figure of a puppy as a "shiba.png" file (right-click Save Image As...):
The .png image may be imported to Python in the following way:
import numpy as np
import cv2 as vision  # computer vision module (OpenCV)

# Read the color image as a (height x width x 3 color channels) array.
# Note: OpenCV stores the channels in BGR order, not RGB.
dog = vision.imread("shiba.png")

# We can also decrease the complexity of the image by looking only at the
# pixel intensities of one channel, the "red" channel (index 2 in BGR order).
# Slicing a single channel already yields a 2-D (height x width) array.
dog = dog[:, :, 2]
The resulting image is now a 180 x 180 pixel array. To relate the image to the linear regressions made with the slope-intercept form, each of these pixels must become an input to the neural network. This can be done by vectorizing the above image array with the "flatten()" operation:
X = dog.flatten()
The variable above is thus a 1-D vector with a length of 32,400 pixels.
After this transformation, each pixel can be more easily assigned to one of the nodes on the dense layer(s) of the neural network, each of which is fitted with a weight. The vector W containing these weights must be of the same size as the number of pixels of this vectorized image. As an example, let's make these weights very small random numbers:
W = np.random.random(X.shape) / 10000  # one weight per pixel of the flattened image
The value of a single data point y is then the dot product between this vector and the vectorized image:
y = np.sum(W * X)
print(y)
This value is analogous to calculating a single value with the multi-linear equation shown at the beginning of this post, using a random set of weights. As this value is only a single data point, it is evident that more than a single image is needed to make a proper fit for a cat-dog identification model.
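For readers without the image file at hand, the same computation can be sketched on a synthetic array. The random 180 x 180 array below is only a stand-in for one channel of the image, not the actual picture, and the fixed seed and the name `fake_dog` are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for one channel of the image: 180 x 180 intensities in [0, 255]
fake_dog = rng.integers(0, 256, size=(180, 180), dtype=np.uint8)

X = fake_dog.flatten()           # 1-D vector of 32,400 pixels
W = rng.random(X.shape) / 10000  # one small random weight per pixel

y_sum = np.sum(W * X)  # element-wise product, then sum
y_dot = np.dot(W, X)   # the same quantity written as a dot product

print(X.shape, W.shape)
print(y_sum, y_dot)
```

Both expressions compute the same weighted sum over the pixels, so the two printed values should match up to floating-point rounding.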
Having the network properly identify this image as the drawing of a dog or a cat requires passing many images of both dogs and cats through the neural network, allowing it to adjust the weight associated with each pixel until it generates an adequate output variable y. To turn this value into a qualitative class, a logistic function (a "sigmoid") is typically applied to the result of the regression.
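A minimal sketch of how a logistic function turns a raw regression value into a class decision is given below. The example scores, the 0.5 threshold, and the convention that values near 1 mean "dog" are assumptions for illustration, not part of any trained model.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical raw outputs of the regression for three images
scores = np.array([-3.0, 0.0, 4.0])
probabilities = sigmoid(scores)

# Assumed convention: probability >= 0.5 is labeled "dog", otherwise "cat"
labels = np.where(probabilities >= 0.5, "dog", "cat")

for s, p, label in zip(scores, probabilities, labels):
    print(f"score={s:+.1f}  probability={p:.3f}  label={label}")
```

Large negative scores map close to 0, large positive scores map close to 1, and a score of exactly 0 sits on the decision boundary at 0.5.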
This is a simplified explanation of a simple neural network model. For a deeper understanding of this concept, here are some good online references:
Ian Goodfellow et al., Deep Learning
Michael Nielsen, Neural Networks and Deep Learning
Adrian Rosebrock, pyimagesearch