Have you ever tried to actually build a neural network? No, neither have I...until today!
In this article we will cover a few things I learned and...
This is pretty much the content of my intro to NN class at university, summed up in one post :)
Haha, well that is very kind, but if that is the case then Universities are slacking hard! 🤣💗
Well it was 30 years ago :) They're probably more comprehensive nowadays.
Nice simple explanation. Sometimes the best intros are by those who are new to something. Good job with your post.
For some more insight into why sigmoid functions were used for activation units: first, there are others that are more commonly used now, as you pointed out, such as ReLU units. But for a very long time the sigmoid was the preferred unit. It has some connections to statistics (e.g. cumulative distribution functions), and it also has some useful properties.
One of those useful properties is that when the weights are small and the weighted sum of inputs is near 0, the sigmoid is approximately linear. It becomes more non-linear as the weights move further from 0. Why is this useful? The simplest model that fits the data is generally preferred, and the more non-linear the model, the greater the risk of overfitting the training data. Early in training, the weights are nearer to 0 (they are initialized near 0) than they are later on, so the network tends to become more non-linear the longer you train. This is also why you need cross validation or some other validation approach to ensure you don't overtrain. Essentially, the model gets increasingly non-linear over time, moving from a simpler model to a more complex one.
Another more basic reason for the sigmoid is the common use of neural networks for classification. Its output range of 0 to 1 maps naturally to binary decisions (yes/no, 0/1, etc.), or to probabilistic decisions if you interpret the output as a probability (e.g. an output of 0.8 means there is a probability of 0.8 that the input is an X, for whatever X you are trying to classify).
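To make those two properties concrete, here is a quick sketch (my own example, not taken from the article's code):

```javascript
// The sigmoid squashes any input into the range (0, 1).
const sigmoid = (x) => 1 / (1 + Math.exp(-x));

// Near 0 it is approximately linear (slope of about 0.25):
console.log(sigmoid(0));    // 0.5
console.log(sigmoid(0.1));  // ~0.525, close to 0.5 + 0.25 * 0.1
console.log(sigmoid(-0.1)); // ~0.475

// Far from 0 it saturates, and the output reads naturally as a probability:
console.log(sigmoid(4));    // ~0.982 -> "very likely an X"
console.log(sigmoid(-4));   // ~0.018 -> "very likely not an X"
```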
There are also some classic theoretical results that rely on sigmoids, dating from the long period when the benefits of deep learning were not yet fully appreciated.
Those results suggest that you need at most 3 hidden layers, and the more layers you have, the longer training takes and the more training data you need. So for a long time deep learning was seen as impractical (e.g. not enough processor speed and data).
Now that deep learning is more practical than it used to be, thanks to the availability of much larger data sets as well as many-core systems, GPUs, VPUs, etc., those results are less important, and other activation functions have grown in use for a variety of reasons.
Thanks for the extensive write up and extra info, that was super interesting and really informative. 🙏🏼💗
You're welcome. Somewhere in your post you mentioned the challenge of setting the learning rate (e.g. too low causing slow convergence, etc.). Using momentum can sometimes help: essentially it's an extra term in the weight update rule that carries forward the update from the previous iteration.
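A rough sketch of what that can look like (the names and values here are illustrative, not from your post):

```javascript
// Momentum keeps a "velocity" that carries a fraction of the previous
// update forward, which can speed up convergence and smooth out a
// learning rate that is hard to tune.
const learningRate = 0.1; // step size (assumed value)
const momentum = 0.9;     // fraction of the previous update to keep (assumed)

let weight = Math.random() * 0.2 - 0.1; // small initial weight near 0
let velocity = 0;                       // previous update, starts at 0

function updateWeight(gradient) {
  // Plain gradient descent would be: weight -= learningRate * gradient;
  // With momentum, the previous step is blended into the new one:
  velocity = momentum * velocity - learningRate * gradient;
  weight += velocity;
}
```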
I will check that out too, thanks! 🙏🏼💪🏼💗
When I try to run version 1 from a bash shell, I get an error.
I can't find a definition for encode on your page.
The encode function is in the CodePen; it's about 30 lines down, after the train function. 💗
I built a neural network from scratch many years ago on my ZX Spectrum. It could do simple character recognition, if I recall correctly. This was probably in the early '90s.
Very cool! I am not at that level yet, I am at the "poke it and prod it" caveman stage! 🤣💗
It was kinda limited, given the 4 MHz clock speed and the fact that it was running in interpreted BASIC on a machine with only 128K of total memory!
Limitations just mean innovation though! 💪🏼💗
Woah, man, you are a legend!
Have you ever built a neural network?
Did you use a library?
Let me know what you have done, and please do tell me if I made any mistakes in my model, so I can improve my understanding for my next attempt! 💗
Really nice article! It was great to follow along with your from-the-ground-up exploration!
Thanks! 🙏🏿
This is incredibly awesome. It's so cool to see your journey, and best of luck in the future!
I've always wanted to make a NN in JavaScript, and you've certainly motivated me ;)
This was a lot of fun to play with!
The hidden layers become more useful when your training data is more complex. A neural network can't really learn much from only 4 data points.
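As a concrete illustration (my example, not from the article): the XOR truth table is the classic tiny dataset where a hidden layer actually matters, because no single-layer network can separate it:

```javascript
// XOR: only 4 points, but not linearly separable, so a network
// with no hidden layer can never fit it, no matter how long it trains.
const xorData = [
  { inputs: [0, 0], target: 0 },
  { inputs: [0, 1], target: 1 },
  { inputs: [1, 0], target: 1 },
  { inputs: [1, 1], target: 0 },
];
```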