🧠 An AI / neural network...in vanilla JS! 😱 With no libraries! 🤯

GrahamTheDev on October 13, 2023

Have you ever tried to actually build a neural network? No, neither have I...until today! In this article we will cover a few things I learned and...
 
Ben Sinclair

This is pretty much the content of my intro to NN class at university, summed up in one post :)

GrahamTheDev

Haha, well that is very kind, but if that is the case then Universities are slacking hard! 🤣💗

Ben Sinclair

Well it was 30 years ago :) They're probably more comprehensive nowadays.

Vincent A. Cicirello

Nice simple explanation. Sometimes the best intros are by those who are new to something. Good job with your post.

For some more insight into why sigmoid functions were used for activation units: first, as you pointed out, there are others that are more commonly used now, such as ReLU units. But for a very long time the sigmoid was the preferred unit. It has some connections to statistics (e.g. cumulative distribution functions), but it also has some useful properties of its own.

One of those useful properties is that when the weights are small and the weighted sum of inputs is near 0, the sigmoid is approximately linear. It becomes more non-linear as the weights move further from 0. Why is this useful? The simplest model that fits the data is generally preferred, and the more non-linear the model, the greater the risk of overfitting the training data. Early in the training process the weights are nearer to 0 (they are initialized near 0) than they are later in training, so the network tends to become more non-linear the longer you train. This is why you also need cross validation or some other validation approach to ensure you don't overtrain. Essentially, the model gets increasingly non-linear over time, moving from a simpler model to a more complex one.

Another more basic reason for the sigmoid is the common use of neural networks for classification. The output range from 0 to 1 maps naturally to binary decisions (yes/no, 0/1, etc.), or to probabilistic decisions if you interpret the output as a probability (e.g. an output of 0.8 means there is a probability of 0.8 that the input is an X, for whatever X you are trying to classify).
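
A quick sketch in plain JS (the names here are mine, for illustration, not from the post) showing both properties: near 0 the sigmoid closely tracks the line 0.5 + x/4, and its 0-to-1 output can be read as a probability and thresholded for a yes/no decision.

    const sigmoid = (x) => 1 / (1 + Math.exp(-x));

    // Near x = 0 the sigmoid is approximately linear: sigmoid(x) ≈ 0.5 + x / 4
    for (const x of [-0.5, -0.1, 0, 0.1, 0.5, 3]) {
      const linear = 0.5 + x / 4;
      console.log(x, sigmoid(x).toFixed(4), linear.toFixed(4));
    }
    // For small |x| the two columns agree closely; at x = 3 they diverge
    // (the non-linear region).

    // Reading the output as a probability for a binary decision:
    const p = sigmoid(1.386); // ≈ 0.8
    console.log(p > 0.5 ? "classify as X" : "classify as not X");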

During the long period when the benefits of deep learning were not yet fully apparent, there were also some classic theoretical results that rely on sigmoids:

  • any boolean function can be represented with a neural net with a hidden layer of sigmoids and a sigmoid for the output;
  • any continuous function can be represented with 2 hidden layers of sigmoids and a linear unit for the output;
  • any function can be represented with 3 hidden layers of sigmoids and a linear unit for the output.

Those results suggest that you need at most 3 hidden layers, and the more layers you have, the longer training takes and the more training data you need. So for a long time deep learning was seen as impractical (e.g. not enough processor speed and data).

Now that deep learning is more practical due to the availability of much larger data sets as well as many-core systems, GPUs, VPUs, etc., those results are less important, and other activation functions have grown in use for a variety of reasons.

GrahamTheDev

Thanks for the extensive write up and extra info, that was super interesting and really informative. 🙏🏼💗

Vincent A. Cicirello

You're welcome. Someplace in your post you mentioned the challenge of setting the learning rate (e.g. too low causing slow convergence, etc.). Using momentum can sometimes help. Essentially it's an extra term in the weight update rule that carries over the updates from previous iterations.
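
A minimal sketch of momentum in plain JS (variable names are mine, not from the article): each weight keeps a running "velocity" that blends the previous update into the current one.

    const learningRate = 0.1;
    const momentum = 0.9; // fraction of the previous update carried forward

    let weight = 0.5;
    let velocity = 0; // the previous update, scaled and reused

    function updateWeight(gradient) {
      // Classic momentum: new step = momentum * previous step - learning rate * gradient
      velocity = momentum * velocity - learningRate * gradient;
      weight += velocity;
    }

    // Repeated steps along the same gradient direction accelerate, which helps
    // push through plateaus and damps oscillation when the gradient flips sign.
    for (let i = 0; i < 5; i++) {
      updateWeight(0.2);
      console.log(weight.toFixed(4));
    }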

GrahamTheDev

I will check that out too, thanks! 🙏🏼💪🏼💗

tom arnall

When I try to run version 1 from a bash shell, I get:

neuralNetwork.train([data.x, data.y], encode(data.label));

ReferenceError: encode is not defined
    at train ([stdin]:71:39)
    at [stdin]:78:1


I can't find a definition for encode on your page.


GrahamTheDev

The encode function is in the CodePen; it's after the train function, about 30 lines down. 💗
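
For anyone running the code outside the CodePen: the post's actual encode isn't quoted in this thread, but a typical stand-in is a one-hot encoder that turns a class label into the 0/1 target array that train expects. A hypothetical sketch (the label values are made up for illustration):

    // Hypothetical stand-in for encode() - not the article's exact code.
    // One-hot encodes a class label into a target array for training.
    const LABELS = ["labelA", "labelB"]; // placeholder label names

    function encode(label) {
      return LABELS.map((l) => (l === label ? 1 : 0));
    }

    console.log(encode("labelB")); // [0, 1]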

Jon Randy 🎖️

I built a neural network from scratch many years ago on my ZX Spectrum. It could do simple character recognition, if I recall correctly. This was probably in the early '90s.

GrahamTheDev

Very cool! I am not at that level yet, I am at the "poke it and prod it" caveman stage! 🤣💗

Jon Randy 🎖️

It was kinda limited given the 4MHz clock speed, and the fact it was running in interpreted BASIC on a machine with only 128K of total memory!

GrahamTheDev

Limitations just mean innovation though! 💪🏼💗

Emeka Orji

Woah man, you are a legend!

GrahamTheDev

Have you ever built a neural network?

Did you use a library?

Let me know what you have done, and please do tell me if I made some mistakes in my model, so I can improve my understanding for my next attempt! 💗

AndrewBarrell1

Really nice article! It was great to follow along on your from-the-ground exploration with you!

maryimana butom

Thanks guy 🙏🏿

codingjlu

This is incredibly awesome. It's so cool to see your journey, and best of luck in the future!

I've always wanted to make a NN in JavaScript, and you've certainly motivated me ;)

eerk

This was a lot of fun to play with!

The hidden layers become more useful when your training data is more complex. A neural network can't really learn much from only 4 data points.