Have you ever tried to actually build a neural network? No, neither have I...until today!
In this article we will cover a few things I learned and...
This is pretty much the content of my intro to NN class at university, summed up in one post :)
Haha, well that is very kind, but if that is the case then Universities are slacking hard! 🤣💗
Well it was 30 years ago :) They're probably more comprehensive nowadays.
Nice simple explanation. Sometimes the best intros are by those who are new to something. Good job with your post.
For some more insight into why sigmoid functions were used for activation units: first, there are others that are more commonly used now, as you pointed out, such as ReLU units. But for a very long time the sigmoid was the preferred unit. It has some connections to statistics (e.g. cumulative distribution functions), and it also has some useful properties.
One of those useful properties is that when the weights are small and the weighted sum of inputs is near 0, the sigmoid is approximately linear. It becomes more non-linear as the weights move further from 0. Why is this useful? The simplest model that fits the data is generally preferred, and the more non-linear the model, the greater the risk of overfitting the training data. Early in training, the weights are nearer to 0 (they are initialized near 0) than they are later on, so the network tends to become more non-linear the longer you train. This is also why you need cross validation or some other validation approach to ensure you don't overtrain. Essentially, the model gets increasingly non-linear over time, moving from a simpler model to a more complex one.
Another more basic reason for the sigmoid is the common use of neural networks for classification. Its output range of 0 to 1 maps naturally to binary decisions (yes/no, 0/1, etc.), or to probabilistic decisions if you interpret the output as a probability (e.g. an output of 0.8 means there is a probability of 0.8 that the input is an X, for whatever X you are trying to classify).
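To make those two properties concrete, here is a quick sketch (my own example, not taken from the article's code):

```javascript
// The sigmoid squashes any input into the range (0, 1).
const sigmoid = (x) => 1 / (1 + Math.exp(-x));

// Near 0 it is approximately linear (slope of about 0.25):
console.log(sigmoid(0));    // 0.5
console.log(sigmoid(0.1));  // ~0.525, close to 0.5 + 0.25 * 0.1
console.log(sigmoid(-0.1)); // ~0.475

// Far from 0 it saturates, and the output reads naturally as a probability:
console.log(sigmoid(4));    // ~0.982 -> "very likely an X"
console.log(sigmoid(-4));   // ~0.018 -> "very likely not an X"
```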
There are also some classic theoretical results that rely on sigmoids, dating from the long period when the benefits of deep learning were not yet fully appreciated.
Those results suggest that you need at most 3 hidden layers, and the more layers you have, the longer training takes and the more training data you need. So for a long time deep learning was seen as impractical (e.g. not enough processor speed and data).
Now that deep learning is more practical than it used to be, thanks to the availability of much larger data sets as well as many-core systems, GPUs, VPUs, etc., those results are less important, and other activation functions have grown in use for a variety of reasons.
Thanks for the extensive write up and extra info, that was super interesting and really informative. 🙏🏼💗
You're welcome. Somewhere in your post you mentioned the challenge of setting the learning rate (e.g. too low causing slow convergence, etc.). Using momentum can sometimes help: essentially it's an extra term in the weight update rule that carries forward the update from the previous iteration.
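A rough sketch of what that can look like (the names and values here are illustrative, not from your post):

```javascript
// Momentum keeps a "velocity" that carries a fraction of the previous
// update forward, which can speed up convergence and smooth out a
// learning rate that is hard to tune.
const learningRate = 0.1; // step size (assumed value)
const momentum = 0.9;     // fraction of the previous update to keep (assumed)

let weight = Math.random() * 0.2 - 0.1; // small initial weight near 0
let velocity = 0;                       // previous update, starts at 0

function updateWeight(gradient) {
  // Plain gradient descent would be: weight -= learningRate * gradient;
  // With momentum, the previous step is blended into the new one:
  velocity = momentum * velocity - learningRate * gradient;
  weight += velocity;
}
```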
I will check that out too, thanks! 🙏🏼💪🏼💗
When I try to run version 1 from a bash shell, I get an error.
I can't find a definition for encode on your page.
The encode function is in the CodePen; it's about 30 lines down, after the train function. 💗
I built a neural network from scratch many years ago on my ZX Spectrum. It could do simple character recognition, if I recall correctly. This was probably in the early '90s.
Very cool! I am not at that level yet, I am at the "poke it and prod it" caveman stage! 🤣💗
It was kinda limited, given the 4 MHz clock speed and the fact that it was running in interpreted BASIC on a machine with only 128K of total memory!
Limitations just mean innovation though! 💪🏼💗
Woah, man, you are a legend!
Have you ever built a neural network?
Did you use a library?
Let me know what you have done, and please do tell me if I made any mistakes in my model, so I can improve my understanding for my next attempt! 💗
Really nice article! It was great to follow along with your from-the-ground-up exploration!
Thanks! 🙏🏿
This is incredibly awesome. It's so cool to see your journey, and best of luck in the future!
I've always wanted to make a NN in JavaScript, and you've certainly motivated me ;)
This was a lot of fun to play with!
The hidden layers become more useful when your training data is more complex. A neural network can't really learn much from only 4 data points.
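As a concrete illustration (my example, not from the article): the XOR truth table is the classic tiny dataset where a hidden layer actually matters, because no single-layer network can separate it:

```javascript
// XOR: only 4 points, but not linearly separable, so a network
// with no hidden layer can never fit it, no matter how long it trains.
const xorData = [
  { inputs: [0, 0], target: 0 },
  { inputs: [0, 1], target: 1 },
  { inputs: [1, 0], target: 1 },
  { inputs: [1, 1], target: 0 },
];
```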