Build a flexible Neural Network with Backpropagation in Python

Samay Shamdasani on August 07, 2017

max frenkel • Edited

Nice, but it never seems to converge to array([[ 0.92, 0.86, 0.89]]). What's a good learning rate for the W update step? It should probably get smaller as the error diminishes.

Actually, there is a bug in sigmoidPrime(): your derivative is wrong. It should return self.sigmoid(s) * (1 - self.sigmoid(s)).

Samay Shamdasani

Hey Max,

I looked into this and with some help from my friend, I understood what was happening.

Your derivative is indeed correct. However, note how we return o from the forward propagation function (with the sigmoid function already applied to it). Then, in the backward propagation function, we pass o into sigmoidPrime(), which, if you look back, is equal to self.sigmoid(self.z3). So the code is correct.
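To make the equivalence concrete, here is a minimal sketch (my own example, not the article's exact code) of the two styles. Because the argument passed to sigmoidPrime() is already an activation, s * (1 - s) gives the same values as sigmoid(z) * (1 - sigmoid(z)) evaluated at the raw pre-activation z:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoidPrime(s):
    # s is assumed to already be a sigmoid activation, i.e. s = sigmoid(z)
    return s * (1 - s)

def sigmoidPrimeRaw(z):
    # the form Max describes: takes the raw pre-activation z
    return sigmoid(z) * (1 - sigmoid(z))

z3 = np.array([0.5, -1.2, 2.0])
o = sigmoid(z3)
print(np.allclose(sigmoidPrime(o), sigmoidPrimeRaw(z3)))  # True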

Samay Shamdasani

Hey! I'm not very well-versed in calculus, but are you sure that would be the derivative? As I understand it, self.sigmoid(s) * (1 - self.sigmoid(s)) takes the input s, runs it through the sigmoid function, gets the output, and then uses that output as the input to the derivative. I tested it out and it works, but if I run the code the way it is right now (using the derivative in the article), I get a very low loss and it's more or less accurate after training ~100k times.

I'd really love to know what's actually wrong. Could you explain why the derivative is wrong, perhaps from a calculus perspective?

Justin Chang

The derivation for the sigmoid prime function can be found here.
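If anyone wants to verify it without doing the algebra by hand, a quick symbolic check (assuming sympy is installed; this is my own snippet, not from the linked derivation) confirms that the derivative of the sigmoid equals sigmoid(x) * (1 - sigmoid(x)):

import sympy as sp

x = sp.symbols('x')
s = 1 / (1 + sp.exp(-x))                         # sigmoid
print(sp.simplify(sp.diff(s, x) - s * (1 - s)))  # prints 0, so the two are equal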

Haytam Zanid

There is nothing wrong with your derivative. Max is talking about the textbook definition of the derivative, but he's forgetting that you already calculated sigmoid(s) and stored it in the layers, so there's no need to calculate it again when applying the derivative.

RochaOwng

Awesome tutorial, many thanks.
But I have one doubt; can you help me?

self.z2_error = self.o_delta.dot(self.W2.T) # z2 error: how much our hidden layer weights contributed to output error
self.z2_delta = self.z2_error*self.sigmoidPrime(self.z2) # applying derivative of sigmoid to z2 error

self.W1 += X.T.dot(self.z2_delta) # adjusting first set (input --> hidden) weights
self.W2 += self.z2.T.dot(self.o_delta) # adjusting second set (hidden --> output) weights

What do those T's mean? self.W2.T, self.z2.T, etc.

Tamilarasu U • Edited

T transposes a matrix in NumPy.
docs.scipy.org/doc/numpy-1.14.0/re...
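For a quick illustration (my own toy example, not from the article): transposing flips rows and columns, which is what makes the shapes line up in the dot products of the backward pass.

import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])    # shape (2, 3)

print(A.T)                   # rows become columns
print(A.shape, A.T.shape)    # (2, 3) (3, 2)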

DanielAgustian

Hello, I'm a noob at machine learning, so I want to ask: is there any requirement for how many hidden layers you need in a neural network? The hidden layer size in this project is 3. Is that because of the input layer + output layer, or is it completely random?

Josh Cockcroft

Hi, this is a fantastic tutorial, thank you. I'm currently trying to build on this to take four inputs rather than two, but am struggling to get it to work. Do you have any guidance on scaling this up from two inputs?
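For what it's worth, a sketch of what typically changes (assuming the class defines self.inputSize = 2 and builds W1 with shape (inputSize, hiddenSize), as I believe the article does): set the input size to 4 and give every row of X four columns; the rest of the forward and backward code stays the same because the matrix shapes still line up.

# inside __init__ (sketch, not the article's exact code):
self.inputSize = 4                                          # was 2
self.W1 = np.random.randn(self.inputSize, self.hiddenSize)

# training data with four features per row (made-up numbers):
X = np.array(([2, 9, 1, 3],
              [1, 5, 2, 4],
              [3, 6, 0, 5]), dtype=float)
X = X / np.amax(X, axis=0)   # scale each column, as the article does for two features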

mecatronicope

OK, I believe I'm missing something. Where are the new inputs (4, 8) for hours studied and slept? And the predicted value for the output "Score"?

प्रदिप

Thanks for the great tutorial, but how exactly can we use it to predict the result for the next input? I tried adding (4, 8) to the input and it caused this error:
Input:
[[0.5  1.        ]
 [0.25 0.55555556]
 [0.75 0.66666667]
 [1.   0.88888889]]
Actual Output:
[[0.92]
 [0.86]
 [0.89]]
Predicted Output:
[[0.17124108]
 [0.17259949]
 [0.20243644]
 [0.20958544]]
Traceback (most recent call last):
  File "D:/try.py", line 58, in <module>
    print ("Loss: \n" + str(np.mean(np.square(y - NN.forward(X))))) # mean sum squared loss
ValueError: operands could not be broadcast together with shapes (3,1) (4,1)

Process finished with exit code 1

tanaydin sirin • Edited

After training is done, you can do something like:

Q = np.array(([4, 8]), dtype=float)
print("Input: \n" + str(Q))
print("Predicted Output: \n" + str(NN.forward(Q)))
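One caveat worth adding (an assumption based on the article scaling X by np.amax before training): the new input should be scaled the same way before it is passed to forward, otherwise the prediction is made on a different scale than the network was trained on. The same snippet with scaling added, where X_raw is a hypothetical name for the unscaled training inputs:

Q = np.array(([4, 8]), dtype=float)
Q = Q / np.amax(X_raw, axis=0)   # scale with the training maxima
print("Predicted Output: \n" + str(NN.forward(Q)))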

eternal_learner

Samay, this has been great to read.

Assume I wanted to add another layer to the NN.

Would I update the backprop to something like:

def backward(self, X, y, o):
    # backward propagate through the network
    self.o_error = y - o
    self.o_delta = self.o_error * self.sigmoidPrime(o)

    self.z3_error = self.o_delta.dot(self.W3.T)
    self.z3_delta = self.z3_error * self.sigmoidPrime(self.z3)

    self.z2_error = self.z3_delta.dot(self.W2.T)   # note: z2's error comes from z3_delta, not o_delta
    self.z2_delta = self.z2_error * self.sigmoidPrime(self.z2)

    self.W1 += X.T.dot(self.z2_delta)
    self.W2 += self.z2.T.dot(self.z3_delta)
    self.W3 += self.z3.T.dot(self.o_delta)
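For reference, the forward pass this backward step would pair with might look like the following (a sketch under the article's naming conventions, assuming z2 and z3 hold the activations of the two hidden layers):

def forward(self, X):
    self.z = np.dot(X, self.W1)                        # input -> hidden 1
    self.z2 = self.sigmoid(self.z)                     # hidden 1 activation
    self.z3 = self.sigmoid(np.dot(self.z2, self.W2))   # hidden 2 activation
    o = self.sigmoid(np.dot(self.z3, self.W3))         # output activation
    return o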
bartekspitza

Nice guide. I have one question:

Shouldn't the input to the NN be a vector? Right now the NN is receiving the whole training matrix as its input. The network has two input neurons, so I can't see why we wouldn't pass it a single vector from the training data.

Tried googling this but couldn't find anything useful, so I would really appreciate your response!

ayeo • Edited

I am not a Python expert, but it is probably the famous vectorized operations at work ;)
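To make that concrete, here is a toy sketch (my own example, not the article's code): because np.dot operates row by row, forward propagation gives the same result for a single example of shape (1, 2) as it does for that row inside the whole training matrix of shape (3, 2).

import numpy as np

sigmoid = lambda z: 1 / (1 + np.exp(-z))

W1 = np.random.randn(2, 3)            # input -> hidden weights
W2 = np.random.randn(3, 1)            # hidden -> output weights

X_all = np.random.rand(3, 2)          # whole training matrix
x_one = X_all[0:1, :]                 # a single training example, kept 2-D

out_all = sigmoid(sigmoid(X_all.dot(W1)).dot(W2))   # shape (3, 1)
out_one = sigmoid(sigmoid(x_one.dot(W1)).dot(W2))   # shape (1, 1)
print(np.allclose(out_all[0], out_one[0]))          # True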

Pavlo

What is the 'input later'?

David Roth

Pretty sure the author meant 'input layer'.

Great article!

Samay Shamdasani

Yep! Just fixed it :)

Nishan Rayamajhee • Edited

Great Tutorial!

I translated this tutorial to Rust with my own matrix operation implementation, which is terribly inefficient compared to numpy but still produces similar results to this tutorial. Here are the docs: docs.rs/artha/0.1.0/artha/ and the code: gitlab.com/nrayamajhee/artha

Tamilarasu U

Excellent article for a beginner, but I just noticed that bias is missing from your neural network. Isn't it required for simple neural networks?

And also, you haven't applied any learning rate. Won't that make the gradient descent miss the minimum?
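For reference, a learning rate is usually folded into the weight updates. A sketch mirroring the update lines quoted earlier in this thread (lr is a made-up name here, not something from the article):

lr = 0.1                                       # hypothetical learning rate
self.W1 += lr * X.T.dot(self.z2_delta)         # scaled update, input -> hidden
self.W2 += lr * self.z2.T.dot(self.o_delta)    # scaled update, hidden -> output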

ayeo • Edited

Great introduction! I have used it to implement this:

github.com/ayeo/letter_recognizer

Breener96

Great tutorial, explained everything so clearly!!

Nnamdi

Nice!

Antonio

(2 * .6) + (9 * .3) = 7.5 is wrong.
It is 1.2 + 2.7 = 3.9.

Samay Shamdasani

Good catch! That is definitely my mistake. If one replaces it with 3.9, the final score would only be changed by one hundredth (.857 --> .858)!

Xingdu Qiao

Great article for beginners like me! Thank you very much!

Muhammad

Great article, it actually helped me understand how a neural network works.

MohamedNedal

Hi, in this line:
for i in xrange(1000):
it told me that 'xrange' is not defined. Could you please explain how to fix it?

ayeo

In newer Python versions (Python 3), the function was renamed to range, so just replace xrange with range.
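For example, under Python 3 the training loop would look like this (assuming the article's NN.train(X, y) method):

for i in range(1000):    # xrange() only exists in Python 2
    NN.train(X, y)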