DEV Community

loading...

Discussion on: Build a flexible Neural Network with Backpropagation in Python

Collapse
frenkel2008 profile image
max frenkel

Nice, but never seems to converge on array([[ 0.92, 0.86, 0.89]]). What's a good learning rate for the W update step? It should probably get smaller as error diminishes.

Actually, there is a bug in sigmoidPrime(), your derivative is wrong. It should return self.sigmoid(s) * (1 - self.sigmoid(s))

Collapse
shamdasani profile image
Samay Shamdasani Author

Hey! I'm not a very well-versed in calculus, but are you sure that would be the derivative? As I understand, self.sigmoid(s) * (1 - self.sigmoid(s)), takes the input s, runs it through the sigmoid function, gets the output and then uses that output as the input in the derivative. I tested it out and it works, but if I run the code the way it is right now (using the derivative in the article), I get a super low loss and it's more or less accurate after training ~100k times.

I'd really love to know what's really wrong. Could you explain why the derivative is wrong, perhaps from the Calculus perspective?

Collapse
zhaytam profile image
Haytam Zanid

There is nothing wrong with your derivative. max is talking about the actual derivative definition but he's forgeting that you actually calculated sigmoid(s) and stored it in the layers so no need to calculate it again when using the derivative.

Collapse
justinpchang profile image
Justin Chang

The derivation for the sigmoid prime function can be found here.

Collapse
shamdasani profile image
Samay Shamdasani Author

Hey Max,

I looked into this and with some help from my friend, I understood what was happening.

Your derivative is indeed correct. However, see how we return o in the forward propagation function (with the sigmoid function already defined to it). Then, in the backward propagation function we pass o into the sigmoidPrime() function, which if you look back, is equal to self.sigmoid(self.z3). So, the code is correct.