<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Akshay Mahajan</title>
    <description>The latest articles on DEV Community by Akshay Mahajan (@_akshaym).</description>
    <link>https://dev.to/_akshaym</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F178508%2F7adb148d-8e1b-489b-8c3d-2ef6e1fbea74.jpg</url>
      <title>DEV Community: Akshay Mahajan</title>
      <link>https://dev.to/_akshaym</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/_akshaym"/>
    <language>en</language>
    <item>
      <title>Neural Network Basics: Gradient Descent</title>
      <dc:creator>Akshay Mahajan</dc:creator>
      <pubDate>Tue, 07 Apr 2020 12:44:30 +0000</pubDate>
      <link>https://dev.to/_akshaym/neural-network-basics-gradient-descent-4cej</link>
      <guid>https://dev.to/_akshaym/neural-network-basics-gradient-descent-4cej</guid>
      <description>&lt;p&gt;In the &lt;a href="https://dev.to/_akshaym/neural-network-basics-training-a-neural-network-47gl"&gt;previous post&lt;/a&gt;, we discussed what a loss function is for a neural network and how it helps us to train the network in order to produce better, more accurate results. In this post, we will see how we can use gradient descent to optimize the loss function of a neural network.&lt;/p&gt;

&lt;h2&gt;Gradient Descent&lt;/h2&gt;

&lt;p&gt;Gradient Descent is an iterative algorithm for finding a minimum of a differentiable function. In each iteration, it uses the slope of the function to find the direction of descent and then takes a small step in that direction. This process continues until it reaches a minimum of the function.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--1mkQNEFo--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/jmzj24jwurnsg5uuy873.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--1mkQNEFo--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/jmzj24jwurnsg5uuy873.png" alt="Gradient Descent"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let's say we want to optimize a function J(W) with respect to the parameter W. We can summarize the working of the Gradient Descent algorithm as follows:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Start at any random point on the function.&lt;/li&gt;
&lt;li&gt;Calculate the slope of the function at that point.&lt;/li&gt;
&lt;li&gt;Take a small step in the direction opposite to the slope of the function.&lt;/li&gt;
&lt;li&gt;Repeat until it reaches the minimum value.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The algorithm works in the same way for any N-dimensional differentiable function. The example above shows a 2D plot because it is easier for us to visualize.&lt;/p&gt;
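&lt;p&gt;The four steps above can be sketched directly in code. As an illustration (the toy function and all names here are my own, not from the algorithm image), we minimize J(w) = (w - 3)&lt;sup&gt;2&lt;/sup&gt;, whose slope is 2(w - 3):&lt;/p&gt;

```javascript
// Gradient descent on J(w) = (w - 3)^2, whose slope is 2 * (w - 3).
function gradientDescent(gradient, start, learningRate, iterations) {
  let w = start; // 1. start at some point on the function
  while (iterations--) {
    // 2. slope at the current point; 3. small step against the slope
    w = w - learningRate * gradient(w);
  }
  return w; // 4. after enough iterations, w is near the minimum
}

const slope = (w) => 2 * (w - 3);
console.log(gradientDescent(slope, 0, 0.1, 100)); // converges to ~3
```

&lt;p&gt;Starting from w = 0 with a learning rate of 0.1, the value of w converges to approximately 3, the minimum of the function.&lt;/p&gt;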

&lt;p&gt;For our neural network, we need to optimize the value of our loss function J(W) with respect to the weights (W) used in the network. We can write the algorithm as:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Algorithm&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s---6wxn6Ar--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/np62rev0ganz1s1t6jba.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s---6wxn6Ar--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/np62rev0ganz1s1t6jba.png" alt="Summary"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;Learning Rate&lt;/h2&gt;

&lt;p&gt;The parameter &lt;span&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ASXhYC5C--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://math.now.sh%3Ffrom%3D%255Calpha" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ASXhYC5C--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://math.now.sh%3Ffrom%3D%255Calpha"&gt;&lt;/a&gt;&lt;/span&gt; is known as the learning rate. It controls the size of the step we take down the slope of the function in each iteration.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A small learning rate means that the algorithm will take small steps in each iteration and may take a long time to converge.&lt;/li&gt;
&lt;li&gt;A very large learning rate can cause the algorithm to overshoot the point of minimum value and then overshoot again in the opposite direction, which can eventually cause it to diverge.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To find an optimum learning rate, it is a good idea to start with something small and gradually increase the learning rate if convergence is too slow.&lt;/p&gt;
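&lt;p&gt;Both failure modes are easy to see on a toy quadratic J(w) = (w - 3)&lt;sup&gt;2&lt;/sup&gt; (a hypothetical experiment; the numbers are my own):&lt;/p&gt;

```javascript
// One gradient descent update for J(w) = (w - 3)^2 at learning rate lr.
const step = (w, lr) => w - lr * 2 * (w - 3);

function run(lr, iterations) {
  let w = 0;
  while (iterations--) w = step(w, lr);
  return w;
}

console.log(run(0.01, 10)); // tiny steps: still far from the minimum at 3
console.log(run(0.4, 10));  // moderate rate: very close to 3
console.log(run(1.1, 10));  // too large: overshoots more each step and diverges
```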

&lt;h2&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;To summarize the basics of a neural network:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--bMqx1ibl--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/e3kf2cztgngul5g4zoi8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--bMqx1ibl--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/e3kf2cztgngul5g4zoi8.png" alt="Summary"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The perceptron is the basic building block of a neural network. It multiplies each input with a weight, sums the products, and applies a non-linearity to the sum.&lt;/li&gt;
&lt;li&gt;Perceptrons connect together to form a layer of a neural network. There is a separate set of weights between each pair of adjacent layers.&lt;/li&gt;
&lt;li&gt;To train the network, we choose a loss function and then optimize it with respect to the weights W using gradient descent.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This concludes this series on the basics of neural networks. I would love to hear your views and feedback. Feel free to hit me up on &lt;a href="https://twitter.com/_akshaym"&gt;Twitter&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>learninpublic</category>
    </item>
    <item>
      <title>Neural Network Basics: Training a Neural Network</title>
      <dc:creator>Akshay Mahajan</dc:creator>
      <pubDate>Wed, 25 Mar 2020 13:51:54 +0000</pubDate>
      <link>https://dev.to/_akshaym/neural-network-basics-training-a-neural-network-47gl</link>
      <guid>https://dev.to/_akshaym/neural-network-basics-training-a-neural-network-47gl</guid>
<description>&lt;p&gt;In the &lt;a href="https://dev.to/_akshaym/neural-network-basics-building-a-neural-network-56e4"&gt;previous post&lt;/a&gt;, we discussed how perceptrons connect to form a layer of a neural network and how these layers connect to form what is called a deep neural network. In this post, we will start with an example and learn how a neural network is trained. Further, we will discuss how the network computes its loss.&lt;/p&gt;

&lt;h2&gt;Example&lt;/h2&gt;

&lt;p&gt;Let us consider an example problem where we want to predict whether a student will pass or fail a class, given the number of lectures attended (x1) and the number of hours spent on the final project (x2) by the student.&lt;/p&gt;

&lt;p&gt;We can represent this problem with the following model:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--OvaD0PXk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/on8acbct3moa33ndyaer.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--OvaD0PXk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/on8acbct3moa33ndyaer.png" alt="Basic Model"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To make a prediction, we need to initialize the weights (W). We can start by initializing them randomly. Now let us say that, for a student who attended 4 lectures and spent 5 hours on the final project, the model predicted an output of 0.1, or 10%. But in reality, the student passed the class. We need a way to tell our model that the predicted output was wrong so that the model can adjust its weights to predict an output closer to the actual value. This can be done by defining a loss function:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--8oHbL3YZ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/wacrcsk57buvil5eolhh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--8oHbL3YZ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/wacrcsk57buvil5eolhh.png" alt="Loss Function"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can average the loss over all the samples in the training set to calculate the final loss. This is also called the &lt;strong&gt;cost function&lt;/strong&gt; or &lt;strong&gt;empirical loss&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--LGggsji1--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/5enbwzqoc2h90hbox99a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--LGggsji1--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/5enbwzqoc2h90hbox99a.png" alt="Loss Function"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;Loss&lt;/h2&gt;

&lt;p&gt;We can use different loss functions for different models depending on the output produced by the model. Two of the most common functions are &lt;strong&gt;mean squared error loss&lt;/strong&gt; and &lt;strong&gt;binary cross-entropy loss&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Mean Squared Error Loss&lt;/strong&gt; can be used with models that produce continuous numbers as output. An example of this would be predicting the score of a student or the price of a house.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--kxUNZc4z--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/szss7oo74v3lj8dwtff8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--kxUNZc4z--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/szss7oo74v3lj8dwtff8.png" alt="Mean Squared Error"&gt;&lt;/a&gt;&lt;/p&gt;
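&lt;p&gt;As a minimal sketch (the function name and sample values are my own), mean squared error can be computed as:&lt;/p&gt;

```javascript
// Mean squared error: the average of the squared differences between
// the actual values and the model's predictions.
function meanSquaredError(actual, predicted) {
  const total = actual.reduce(
    (sum, y, i) => sum + (y - predicted[i]) ** 2, 0);
  return total / actual.length;
}

console.log(meanSquaredError([3, 5], [2, 7])); // (1 + 4) / 2 = 2.5
```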

&lt;p&gt;&lt;strong&gt;Binary Cross-Entropy Loss&lt;/strong&gt; can be used with models that produce a probability between 0 &amp;amp; 1. An example of this would be predicting whether a student will pass or fail.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s---HLE-xqJ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/ws8acoy29d8yhp2t5hiu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s---HLE-xqJ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/ws8acoy29d8yhp2t5hiu.png" alt="Binary Cross Entropy Loss"&gt;&lt;/a&gt;&lt;/p&gt;
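&lt;p&gt;A minimal sketch of binary cross-entropy (the function name and sample values are my own). Note how the confident wrong prediction from our student example produces a large loss:&lt;/p&gt;

```javascript
// Binary cross-entropy: heavily penalizes confident wrong predictions.
// actual values are 0 or 1; predictions are probabilities between 0 and 1.
function binaryCrossEntropy(actual, predicted) {
  const total = actual.reduce((sum, y, i) =>
    sum + y * Math.log(predicted[i]) +
    (1 - y) * Math.log(1 - predicted[i]), 0);
  return -total / actual.length;
}

// The student passed (1) but the model predicted only 0.1: large loss.
console.log(binaryCrossEntropy([1], [0.1])); // ~2.303
// A confident, correct prediction gives a small loss.
console.log(binaryCrossEntropy([1], [0.9])); // ~0.105
```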

&lt;p&gt;The final piece left in training the model is to optimize the loss. By optimizing we mean that we want to minimize the loss of the model over all the inputs in the training data. This can be achieved by using an optimization algorithm called &lt;strong&gt;Gradient Descent&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;In this post, we learned how loss is calculated for a neural network. In the next and final post, we will learn about the Gradient Descent algorithm and how it can be used to minimize the loss so that our model can predict outputs with better accuracy.&lt;/p&gt;

&lt;p&gt;I would love to hear your views and feedback. Feel free to hit me up on &lt;a href="https://twitter.com/_akshaym"&gt;Twitter&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Neural Network Basics: Building a Neural Network</title>
      <dc:creator>Akshay Mahajan</dc:creator>
      <pubDate>Mon, 16 Mar 2020 12:33:54 +0000</pubDate>
      <link>https://dev.to/_akshaym/neural-network-basics-building-a-neural-network-56e4</link>
      <guid>https://dev.to/_akshaym/neural-network-basics-building-a-neural-network-56e4</guid>
      <description>&lt;p&gt;In the &lt;a href="https://dev.to/_akshaym/neural-network-basics-the-perceptron-3c4"&gt;previous post&lt;/a&gt;, we discussed the structural building block of a neural network, also called the perceptron. In this post, we will learn how these perceptrons connect together to form a layer of a neural network and how these layers connect to form what is called a deep neural network.&lt;/p&gt;

&lt;h2&gt;Multi-Output Perceptron&lt;/h2&gt;

&lt;p&gt;Let us recall the structure of a perceptron that we discussed in the last post.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--mhUcwnyT--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/dcvyxtds6zv4tfrwsij0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--mhUcwnyT--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/dcvyxtds6zv4tfrwsij0.png" alt="Perceptron"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now, let us consider that we have 2 perceptrons, instead of 1, each connected to all the inputs. This can be visually represented as the image below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--fSkjTwz---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/68eh10p86clwmm2zm16v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--fSkjTwz---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/68eh10p86clwmm2zm16v.png" alt="Multi Output Perceptron"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is called a multi-output perceptron.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;This allows us to create as many outputs as we want by stacking together multiple perceptrons.&lt;/li&gt;
&lt;li&gt;For the sake of simplicity, the bias term has been omitted here.&lt;/li&gt;
&lt;li&gt;Each perceptron is connected to all the inputs from the previous layer. This is known as a &lt;strong&gt;dense layer&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Single Layer Neural Network&lt;/h2&gt;

&lt;p&gt;Let's add more perceptrons to create a single layer neural network.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--uFGXduZL--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/jln1fzscqji3dbix5ttf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--uFGXduZL--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/jln1fzscqji3dbix5ttf.png" alt="Single Layer Neural Network"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here, we can see 3 types of layers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The first layer contains the inputs to the network and is called the Input Layer.&lt;/li&gt;
&lt;li&gt;The last layer provides the final output of the network and is called the Output Layer.&lt;/li&gt;
&lt;li&gt;The layers between the input and output layers are called the Hidden Layers. &lt;strong&gt;The number of hidden layers represents the depth of the network&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As the number of connections increases, so does the number of weights. For example, there are two sets of weights: w1,1&lt;sup&gt;(1)&lt;/sup&gt; connecting x1 to z1 &amp;amp; w1,1&lt;sup&gt;(2)&lt;/sup&gt; connecting z1 to y&lt;sup&gt;^&lt;/sup&gt;1. The equations for the perceptrons in the hidden and output layers can be represented as:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--xQdZX6rP--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/z7216i0cfkilzukxs3es.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--xQdZX6rP--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/z7216i0cfkilzukxs3es.png" alt="Layer Equations"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To make these easier to manage, and also to speed up computation, the inputs, weights, and outputs are usually stored as vectors. This also allows us to take advantage of multi-core CPUs to speed up training by parallelizing computations.&lt;/p&gt;

&lt;p&gt;To vectorize the equation, we can represent input, output, and weights as vectors and simply replace the summation of products with a matrix dot product:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--5lMy4H1x--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/g2pz56uak33hw5n2j9dz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--5lMy4H1x--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/g2pz56uak33hw5n2j9dz.png" alt="Vectorized Equations"&gt;&lt;/a&gt;&lt;/p&gt;
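&lt;p&gt;The vectorized forward pass can be sketched without any library (all names and numbers here are illustrative assumptions, not from the equations image):&lt;/p&gt;

```javascript
const sigmoid = (z) => 1 / (1 + Math.exp(-z));

// Forward pass of one dense layer: z = g(W x + b), where each row of W
// holds the weights of one perceptron, b is the bias vector, and the
// activation g is applied element-wise.
function denseLayer(W, x, b, g) {
  return W.map((row, i) =>
    g(row.reduce((sum, w, j) => sum + w * x[j], b[i])));
}

// Two perceptrons, three inputs (the numbers are hypothetical):
const W = [[0.2, -0.5, 0.1], [0.7, 0.3, -0.2]];
console.log(denseLayer(W, [1, 2, 3], [0.1, -0.1], sigmoid)); // one output per perceptron
```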

&lt;h2&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;In this post, we learned how a neural network is constructed by connecting individual perceptrons. In the next part, we will see how the network computes its output and eventually gets better and more accurate.&lt;/p&gt;

&lt;p&gt;I would love to hear your views and feedback in the comments. Feel free to hit me up on &lt;a href="https://twitter.com/_akshaym"&gt;Twitter&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>learninpublic</category>
    </item>
    <item>
      <title>Neural Network Basics: The Perceptron</title>
      <dc:creator>Akshay Mahajan</dc:creator>
      <pubDate>Sun, 08 Mar 2020 13:09:31 +0000</pubDate>
      <link>https://dev.to/_akshaym/neural-network-basics-the-perceptron-3c4</link>
      <guid>https://dev.to/_akshaym/neural-network-basics-the-perceptron-3c4</guid>
<description>&lt;p&gt;A Neural Network is a machine learning model inspired by the human brain. A neural network learns to perform a task by looking at examples, without being explicitly programmed to perform that task. These tasks range from predicting sales based on historical data to detecting objects in images and translating between languages.&lt;/p&gt;

&lt;h3&gt;The Perceptron&lt;/h3&gt;

&lt;p&gt;The Perceptron is the structural building block of a neural network. It is modeled after the neurons inside the brain.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--6btRDbYy--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/w69vigc9fk48uigte5hf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--6btRDbYy--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/w69vigc9fk48uigte5hf.png" alt="Perceptron"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Mathematically, this can be represented by the following equation:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--OP3RD9XL--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/rg05f4z4xuj96h60ub7s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--OP3RD9XL--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/rg05f4z4xuj96h60ub7s.png" alt="Perceptron Equation"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can explain the working of a single perceptron as follows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Each perceptron receives a set of inputs x1 to xm.&lt;/li&gt;
&lt;li&gt;Each input has a weight associated with it, i.e., w1 to wm for x1 to xm respectively.&lt;/li&gt;
&lt;li&gt;Each input is multiplied by its respective weight; the products are summed and passed as input to an activation function.&lt;/li&gt;
&lt;li&gt;A bias term w0 is also added, which allows us to shift the activation function left or right, irrespective of the inputs.&lt;/li&gt;
&lt;li&gt;An activation function, also called a non-linearity, is a non-linear function that produces the final output of the perceptron.&lt;/li&gt;
&lt;/ul&gt;
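&lt;p&gt;The bullet points above map directly to code. A minimal sketch of a single perceptron (sigmoid is chosen here as an example activation; the weights and inputs are hypothetical):&lt;/p&gt;

```javascript
const sigmoid = (z) => 1 / (1 + Math.exp(-z));

// A perceptron: the weighted sum of the inputs plus the bias term w0,
// passed through a non-linear activation function.
function perceptron(inputs, weights, bias, activation) {
  const z = inputs.reduce((sum, x, i) => sum + weights[i] * x, bias);
  return activation(z);
}

console.log(perceptron([1, 2], [0.5, -0.5], 0.5, sigmoid)); // sigmoid(0) = 0.5
```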

&lt;h3&gt;Activation Functions&lt;/h3&gt;

&lt;p&gt;An activation function is a non-linear function. A few common activation functions are the sigmoid function, the hyperbolic tangent (tanh) function, and the rectified linear unit (ReLU) function. The choice of activation function depends on the type of output expected from the perceptron. For example, the output of the sigmoid function ranges between 0 and 1, so it is useful when the output is supposed to be a probability.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--I3fCcPSt--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/5b9zyiptnqvy5l4yckjf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--I3fCcPSt--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/5b9zyiptnqvy5l4yckjf.png" alt="Activation Functions"&gt;&lt;/a&gt;&lt;/p&gt;
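&lt;p&gt;The three activation functions mentioned above can each be written in a single line:&lt;/p&gt;

```javascript
// Three common activation functions.
const sigmoid = (z) => 1 / (1 + Math.exp(-z)); // output between 0 and 1
const tanh = (z) => Math.tanh(z);              // output between -1 and 1
const relu = (z) => Math.max(0, z);            // zero for negative inputs

console.log(sigmoid(0)); // 0.5
console.log(tanh(0));    // 0
console.log(relu(-3));   // 0
console.log(relu(2));    // 2
```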

&lt;h3&gt;Why do we need an Activation Function&lt;/h3&gt;

&lt;p&gt;Activation functions introduce non-linearity into the network. In real life, almost all data is non-linear. Without an activation function, the output of the network would always be a linear function of its inputs, no matter how many layers we stack. Non-linearities, on the other hand, allow us to approximate arbitrarily complex functions.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--VmoHXKOH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/57ep9dv940b1ndrm5ngx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--VmoHXKOH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/57ep9dv940b1ndrm5ngx.png" alt="Decision Boundaries for Linear vs Non-Linear"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;Conclusion&lt;/h3&gt;

&lt;p&gt;In this post, we learned how a single perceptron works. In the next part, we will see how these perceptrons are connected together to form a neural network and how the network learns to do a task that it was not explicitly programmed to do.&lt;/p&gt;

&lt;p&gt;I would love to hear your views and feedback. Feel free to hit me up on &lt;a href="https://twitter.com/_akshaym"&gt;Twitter&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Getting Started with Javascript Testing</title>
      <dc:creator>Akshay Mahajan</dc:creator>
      <pubDate>Tue, 25 Feb 2020 15:41:07 +0000</pubDate>
      <link>https://dev.to/_akshaym/getting-started-with-testing-javascript-3h7d</link>
      <guid>https://dev.to/_akshaym/getting-started-with-testing-javascript-3h7d</guid>
<description>&lt;p&gt;As software engineers, it is our job to write code that solves problems and to make sure that it solves them correctly. Testing helps us ensure that the software we write behaves the way it was intended to. Unit testing is the most basic type of testing: it verifies the correctness of a single unit of code for a given set of inputs.&lt;/p&gt;

&lt;h2&gt;General Structure of a Unit Test&lt;/h2&gt;

&lt;p&gt;A Unit Test generally consists of 3 things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A unit (a block of code or a function) that needs to be tested&lt;/li&gt;
&lt;li&gt;The inputs to the unit for which it needs to be tested&lt;/li&gt;
&lt;li&gt;The expected output for the given inputs&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--d5ZJKHm_--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/eiilan2y0oe69d02hkyh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--d5ZJKHm_--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/eiilan2y0oe69d02hkyh.png" alt="Structure of a Unit Test"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;Let's build a Mini Testing Library&lt;/h2&gt;

&lt;p&gt;Let us build a small function that converts temperatures from Fahrenheit to Celsius. Before we start building the function, we can think of a few possible test cases for it.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The input of 0 should return an output of -17.77777777777778&lt;/li&gt;
&lt;li&gt;The input of 5 should return an output of -15&lt;/li&gt;
&lt;li&gt;The input of -4 should return an output of -20&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This process of writing tests before the actual implementation of the functionality is known as &lt;strong&gt;Test-Driven Development (TDD)&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nx"&gt;ftoc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;f&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;f&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;32&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Looking at the Structure of a Unit Test, let us write some utility functions that can help us abstract the working of the test.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nx"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;toBe&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;expected&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="nx"&gt;expected&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nb"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`Expected &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;expected&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;, but got &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nx"&gt;it&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;description&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`✅ &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;description&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`❌ &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;description&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;it&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Convert 0F to Celcius&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ftoc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)).&lt;/span&gt;&lt;span class="nx"&gt;toBe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;17.77777777777778&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="nx"&gt;it&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Convert Positive Temparaterue to Celcius&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ftoc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)).&lt;/span&gt;&lt;span class="nx"&gt;toBe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="nx"&gt;it&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Convert Negative Temparature to Celcius&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ftoc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)).&lt;/span&gt;&lt;span class="nx"&gt;toBe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;On executing the above code, the following output is produced.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--nJtKl8ni--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/7kjf5e12xc9d85eu79ez.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--nJtKl8ni--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/7kjf5e12xc9d85eu79ez.png" alt="Test Fail Output"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This shows that our conversion function works for just one of the cases and fails for the other two. To fix it, update the function with a set of parentheses so that the subtraction happens before the multiplication and division.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nx"&gt;ftoc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;f&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;f&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Re-running the tests gives the following output.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--eEGtp-SP--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/51zmz9ivpbkxnu65a9qx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--eEGtp-SP--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/51zmz9ivpbkxnu65a9qx.png" alt="Test Pass Output"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;Well, that was easy, right? We built a simple testing library in just a few lines of code that can be used to test any code we write. Although it is far from something that can be used in production, the main idea and structure remain the same. Popular, feature-rich testing frameworks like Jest and Mocha provide advanced features, such as detailed descriptions of failing tests along with the exact line number and stack trace, which make them more suitable for production environments.&lt;/p&gt;

&lt;p&gt;I would love to hear your views and feedback in the comments. You can also hit me up on &lt;a href="https://twitter.com/_akshaym"&gt;Twitter&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>javascript</category>
      <category>testing</category>
      <category>learninpublic</category>
    </item>
  </channel>
</rss>
