Discussion on: Build a flexible Neural Network with Backpropagation in Python


Nice, but it never seems to converge on array([[ 0.92, 0.86, 0.89]]). What's a good learning rate for the W update step? It should probably get smaller as the error diminishes.
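One common way to make the step size shrink as training goes on is inverse-time decay, lr = lr0 / (1 + decay * epoch). The sketch below is a toy, not the article's network: a single sigmoid neuron with one weight, a hypothetical target of 0.9, and hypothetical values for `lr0`, `decay` and `epochs`. The update line plays the role of the W update step, with an explicit learning rate multiplied in.

```python
import math


def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))


def decayed_lr(lr0, decay, epoch):
    """Inverse-time decay: the step size shrinks as training goes on."""
    return lr0 / (1.0 + decay * epoch)


def train_single_neuron(x, y, w=0.0, lr0=1.0, decay=0.01, epochs=1000):
    """Fit o = sigmoid(w * x) to target y by gradient descent on 0.5 * (y - o)**2.

    Hypothetical toy setup: one input, one weight, no bias.
    """
    for epoch in range(epochs):
        o = sigmoid(w * x)               # forward pass
        o_delta = (y - o) * o * (1 - o)  # error times sigmoid derivative (in terms of o)
        w += decayed_lr(lr0, decay, epoch) * x * o_delta  # learning-rate-scaled update
    return w


w = train_single_neuron(x=1.0, y=0.9)
print(sigmoid(w))  # close to the 0.9 target
```

Whether a schedule like this beats a smaller fixed learning rate on the article's network is something to try rather than assume.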

Actually, there is a bug in sigmoidPrime(): your derivative is wrong. It should return self.sigmoid(s) * (1 - self.sigmoid(s)).
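The formula above can be checked numerically against a central finite difference of the sigmoid. A minimal sketch using only the standard library; the helper names are hypothetical, and `s` here is the raw input to the sigmoid, not its output:

```python
import math


def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))


def sigmoid_prime_of_input(s):
    """Sigmoid derivative evaluated at the raw input (pre-activation) s."""
    return sigmoid(s) * (1.0 - sigmoid(s))


def numerical_derivative(f, s, h=1e-6):
    """Central finite-difference approximation of f'(s)."""
    return (f(s + h) - f(s - h)) / (2.0 * h)


for s in (-2.0, 0.0, 0.5, 3.0):
    analytic = sigmoid_prime_of_input(s)
    numeric = numerical_derivative(sigmoid, s)
    print(f"s={s:+.1f}  analytic={analytic:.6f}  numeric={numeric:.6f}")
```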

Samay Shamdasani • Sep 24 '17

Hey! I'm not very well-versed in calculus, but are you sure that would be the derivative? As I understand it, self.sigmoid(s) * (1 - self.sigmoid(s)) takes the input s, runs it through the sigmoid function, gets the output, and then uses that output as the input to the derivative. I tested it and it works, but if I run the code the way it is right now (using the derivative in the article), I get a super low loss and the network is more or less accurate after training ~100k times.

I'd love to know what's actually wrong. Could you explain why the derivative is wrong, perhaps from a calculus perspective?

Haytam Zanid • Aug 13 '18

There is nothing wrong with your derivative. Max is talking about the textbook definition of the derivative, but he's forgetting that you already calculated sigmoid(s) and stored it in the layers, so there is no need to calculate it again when applying the derivative.
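The point about stored activations can be made concrete: once `o = sigmoid(s)` has been computed, `o * (1 - o)` is the same number as `sigmoid(s) * (1 - sigmoid(s))`, whereas feeding `o` into the input-form formula runs the sigmoid twice and gives a different value. A minimal sketch; the function names are hypothetical, and `sigmoid_prime_of_output` assumes, as the replies in this thread indicate, that the article's `sigmoidPrime()` returns `s * (1 - s)` for an already-activated `s`.

```python
import math


def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))


def sigmoid_prime_of_input(s):
    """Derivative of sigmoid, given the raw input s."""
    return sigmoid(s) * (1.0 - sigmoid(s))


def sigmoid_prime_of_output(o):
    """Derivative of sigmoid, given the already-activated value o = sigmoid(s)."""
    return o * (1.0 - o)


s = 0.8
o = sigmoid(s)                     # the value the forward pass stores
print(sigmoid_prime_of_input(s))   # derivative computed from the raw input
print(sigmoid_prime_of_output(o))  # same value, reusing the stored activation
print(sigmoid_prime_of_input(o))   # sigmoid applied twice: not the derivative at s
```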

Justin Chang • Oct 22 '17

The derivation for the sigmoid prime function can be found here.
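For reference, the standard derivation, with s the raw input to the sigmoid and o = σ(s) its output:

```latex
\sigma(s) = \frac{1}{1 + e^{-s}}

\sigma'(s) = \frac{e^{-s}}{(1 + e^{-s})^{2}}
           = \frac{1}{1 + e^{-s}} \cdot \frac{e^{-s}}{1 + e^{-s}}
           = \sigma(s)\,\bigl(1 - \sigma(s)\bigr),
\qquad \text{since } 1 - \sigma(s) = \frac{e^{-s}}{1 + e^{-s}}

\text{With } o = \sigma(s): \quad \sigma'(s) = o\,(1 - o)
```

Both forms discussed in this thread are the same derivative; they differ only in whether they take s or o as their argument.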

Samay Shamdasani • Dec 21 '17

Hey Max,

I looked into this and with some help from my friend, I understood what was happening.

Your derivative is indeed correct. However, note that the forward propagation function returns o, which already has the sigmoid function applied to it. Then, in the backward propagation function, we pass o into sigmoidPrime(), and o, if you look back, is equal to self.sigmoid(self.z3). So the code is correct.
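This argument can be verified with a gradient check on a stripped-down version of the network. The sketch is hypothetical: it uses scalars instead of NumPy matrices (one input, one hidden unit, one output) and only borrows the article's variable names (`z`, `z2`, `z3`, `o`, `W1`, `W2`, `o_delta`, `z2_delta`), with a `sigmoid_prime` that, like the article's `sigmoidPrime()` as described in this thread, expects an already-activated value. If passing `o` is right, the backward-pass updates should equal minus the finite-difference gradients of the loss 0.5 * (y - o)**2.

```python
import math


def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))


def sigmoid_prime(a):
    """Sigmoid derivative expressed in terms of an activation a = sigmoid(s)."""
    return a * (1.0 - a)


def forward(x, W1, W2):
    """Scalar forward pass: input -> hidden unit -> output, sigmoid on both layers."""
    z = x * W1
    z2 = sigmoid(z)
    z3 = z2 * W2
    o = sigmoid(z3)  # the output already has the sigmoid applied
    return z2, o


def backward(x, y, W2, z2, o):
    """Return the W1 and W2 updates, passing the stored activations o and z2 to sigmoid_prime."""
    o_delta = (y - o) * sigmoid_prime(o)
    z2_delta = o_delta * W2 * sigmoid_prime(z2)
    return x * z2_delta, z2 * o_delta


def loss(x, y, W1, W2):
    _, o = forward(x, W1, W2)
    return 0.5 * (y - o) ** 2


x, y, W1, W2 = 0.7, 0.2, 0.4, -1.3  # arbitrary example values
z2, o = forward(x, W1, W2)
dW1, dW2 = backward(x, y, W2, z2, o)

h = 1e-6
num_grad_W1 = (loss(x, y, W1 + h, W2) - loss(x, y, W1 - h, W2)) / (2 * h)
num_grad_W2 = (loss(x, y, W1, W2 + h) - loss(x, y, W1, W2 - h)) / (2 * h)

# The updates point downhill, so each should match minus its numerical gradient.
print(dW1, -num_grad_W1)
print(dW2, -num_grad_W2)
```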