How Neural Networks Work

dattran1999 — Thu, 25 Feb 2021 11:40:29 +0000

Teaching Philosophy

I know that there are A LOT of tutorials/blog posts on neural networks already (some of my favourites include the 3B1B series on YouTube),
but I am a big advocate of learning by doing. So this series will not just present a bunch of information to you,
but will actually ask you to implement the things we cover in each post.

Introduction to Neural Networks

The inspiration for Artificial Neural Networks (or neural networks for short) comes from Biological Neural Networks. But I haven't had a biology class
since high school, so I have no idea how a biological neural network works :) but I bet it looks something like this:

For this tutorial, we will go through the most basic building block of an Artificial Neural Network, which is the perceptron.

Assumed maths knowledge

Functions
Coordinate geometry

Perceptron

The perceptron and its learning rule are not popular anymore, but they are a great starting point for building an understanding of how everything works.

The goal of a perceptron is to classify sets of points.

How a perceptron works

Definition: A perceptron is a function that takes several inputs, and produces one output.

Formula:

y = f(w_0 + w_1 x_1 + w_2 x_2 + ... + w_n x_n)

where the w's are the weights, f is the activation function (explained below), the x's are the inputs, and y is the output.
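
The definition above can be sketched in a few lines of Python. This is only an illustration, not a reference implementation: the function name `perceptron` and its parameter names are made up for this example, and the constant term `w_0` is an assumption taken from the signature used in the exercise at the end of this post.

```python
from typing import Callable, Sequence


def perceptron(
    inputs: Sequence[float],
    weights: Sequence[float],
    w_0: float,
    f: Callable[[float], float],
) -> float:
    """Return f(w_0 + w_1*x_1 + ... + w_n*x_n)."""
    if len(inputs) != len(weights):
        raise ValueError("need exactly one weight per input")
    # Weighted sum of the inputs, plus the constant term w_0 (assumed).
    total = w_0 + sum(w * x for w, x in zip(weights, inputs))
    # The activation function turns the sum into the final output.
    return f(total)


# Identity activation, used here only so the raw weighted sum is visible.
print(perceptron([1.0, 2.0], [2.0, -1.0], 0.5, lambda s: s))  # 0.5
```

Passing the activation function in as an argument keeps the weighted sum and the activation separate, which mirrors how the two are explained separately in the rest of this post.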

This is basically feeding a weighted sum of the inputs (a polynomial of degree 1) into some function called the activation function.

And the goal of a perceptron is to classify sets of points.

Weights of perceptron and Classification

To understand the importance of weights, it's useful to think about the case where we only have 2 inputs.
Considering only the part where we multiply inputs by weights and sum them up, we have:

w_0 + w_1 x_1 + w_2 x_2

Notice that setting this expression equal to 0 gives something very similar to the standard form of a linear equation, which is of the form:

a x + b y + c = 0

Example

Consider the following diagram, where we want to classify point A and point B (i.e. find a way to separate them).

In the diagram, the line has equation

-1 - 2x + y = 0 (that is, y = 2x + 1)

Visually, it's clear that the line separates the two points. Below is the mathematical explanation.

From coordinate geometry, we know that any point "above" or "to the left" of the line (e.g. point A) will satisfy

-1 - 2x + y > 0

and any point "below" or "to the right" of the line (e.g. point B) will satisfy

-1 - 2x + y < 0

With that straight line, we have successfully classified point A and point B into 2 classes.
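
As a quick numerical check of the "above" and "below" claim, here is a small sketch. It assumes the example line has weights w_0 = -1, w_1 = -2, w_2 = 1 and that A and B have the coordinates used in the exercise at the end of this post; `line_value` is just a name chosen for this illustration.

```python
def line_value(x: float, y: float) -> float:
    """Evaluate -1 - 2x + y, the left-hand side of the example line (assumed weights)."""
    return -1 - 2 * x + y


A = (-0.7, 2.7)  # coordinates taken from the exercise
B = (1.5, 1.1)

# A lies above the line, so the expression is positive.
print(line_value(*A) > 0)  # True
# B lies below the line, so the expression is negative.
print(line_value(*B) < 0)  # True
```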
But that only works visually, not mathematically yet. To make it work mathematically, we need the activation function.

Importance of Weights

It is important to note that if the weights are different, we might not be able to classify point A and point B. One such example is the line
y = 0, which is a horizontal line passing through (0,0). Both points lie above it, so it cannot separate them.
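
To see the effect of the weights in code, the sketch below checks whether two points fall on opposite sides of the line defined by a set of weights. The helper names are hypothetical, the coordinates of A and B are taken from the exercise at the end of this post, and the horizontal line through (0,0) is written as w_0 = 0, w_1 = 0, w_2 = 1.

```python
def weighted_sum(point, w_0, w_1, w_2):
    """Return w_0 + w_1*x + w_2*y for a point (x, y)."""
    x, y = point
    return w_0 + w_1 * x + w_2 * y


def on_opposite_sides(p, q, w_0, w_1, w_2):
    """True if p and q give weighted sums of opposite sign."""
    return weighted_sum(p, w_0, w_1, w_2) * weighted_sum(q, w_0, w_1, w_2) < 0


A = (-0.7, 2.7)
B = (1.5, 1.1)

# The example line separates A and B.
print(on_opposite_sides(A, B, -1, -2, 1))  # True
# A horizontal line through (0,0) leaves both points on the same side.
print(on_opposite_sides(A, B, 0, 0, 1))  # False
```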

So a question to ask is how do we find the weights that will correctly classify the points we have. The answer to that is through perceptron learning, and we will cover that in the next post.

Activation function

More often than not, we want the output in the range 0 to 1 only, to denote whether a given perceptron is activated or not.
So we need some function, called the activation function, to do that for us.
One simple way to achieve that is to use the Heaviside function, which converts all negative numbers to 0, and all non-negative numbers (zero and positive) to 1.

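Coming back to our example, point A = (-0.7, 2.7) satisfies -1 - 2x + y > 0, since

-1 - 2(-0.7) + 2.7 = 3.1 > 0

Hence putting that in the Heaviside function will output 1. Similarly, point B = (1.5, 1.1) gives -1 - 2(1.5) + 1.1 = -2.9 < 0, so putting B in the Heaviside function outputs 0.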

Therefore, we have correctly classified points A and B mathematically.
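
Here is a minimal sketch of that last step, assuming the convention described above (0 maps to 1) and the weights -1, -2, 1 for the example line. The names `heaviside` and `classify` are chosen for this illustration only.

```python
def heaviside(value: float) -> int:
    """Return 0 for negative numbers and 1 for everything else (including 0)."""
    return 1 if value >= 0 else 0


def classify(point, w_0, w_1, w_2) -> int:
    """Apply the Heaviside function to the weighted sum for a point (x, y)."""
    x, y = point
    return heaviside(w_0 + w_1 * x + w_2 * y)


A = (-0.7, 2.7)  # coordinates taken from the exercise
B = (1.5, 1.1)

print(classify(A, -1, -2, 1))  # 1
print(classify(B, -1, -2, 1))  # 0
```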

Sum Up

Weights determine whether a straight line (or a plane in higher dimensions) can separate the points into classes. Only some sets of weights will be able to separate the points.

The activation function is just a function that maps all points fitting a certain criterion (for example, lying on one side of the line) to the same output.

Exercise

Write a function that takes a list of coordinate pairs, a list of classes, and a set of weights, and determines whether the given weights classify every point into its class.

def is_correct_weights(coords, classes, w_0, w_1, w_2) -> bool:
    pass

# example from above
coords = [(-0.7, 2.7), (1.5, 1.1)]
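# classes[i] is the class of coords[i], i.e. the expected Heaviside output
# (point A is above the line, so 1; point B is below, so 0)
classes = [1, 0]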
is_correct_weights(coords, classes, -1, -2, 1) # True
is_correct_weights(coords, classes, -1, 0, 1) # False

NOTE: do it in any language you want, but it is recommended to use Python, since we will use Python much more later on.

DEV Community: dattran1999