Nice, but never seems to converge on array([[ 0.92, 0.86, 0.89]]). What's a good learning rate for the W update step? It should probably get smaller as error diminishes.
Actually, there is a bug in sigmoidPrime(), your derivative is wrong. It should return self.sigmoid(s) * (1 - self.sigmoid(s))
Hey! I'm not a very well-versed in calculus, but are you sure that would be the derivative? As I understand, self.sigmoid(s) * (1 - self.sigmoid(s)), takes the input s, runs it through the sigmoid function, gets the output and then uses that output as the input in the derivative. I tested it out and it works, but if I run the code the way it is right now (using the derivative in the article), I get a super low loss and it's more or less accurate after training ~100k times.
I'd really love to know what's really wrong. Could you explain why the derivative is wrong, perhaps from the Calculus perspective?
There is nothing wrong with your derivative. max is talking about the actual derivative definition but he's forgeting that you actually calculated sigmoid(s) and stored it in the layers so no need to calculate it again when using the derivative.
I looked into this and with some help from my friend, I understood what was happening.
Your derivative is indeed correct. However, see how we return o in the forward propagation function (with the sigmoid function already defined to it). Then, in the backward propagation function we pass o into the sigmoidPrime() function, which if you look back, is equal to self.sigmoid(self.z3). So, the code is correct.
For further actions, you may consider blocking this person and/or reporting abuse
We're a place where coders share, stay up-to-date and grow their careers.
Nice, but never seems to converge on array([[ 0.92, 0.86, 0.89]]). What's a good learning rate for the W update step? It should probably get smaller as error diminishes.
Actually, there is a bug in sigmoidPrime(), your derivative is wrong. It should return self.sigmoid(s) * (1 - self.sigmoid(s))
Hey! I'm not a very well-versed in calculus, but are you sure that would be the derivative? As I understand, self.sigmoid(s) * (1 - self.sigmoid(s)), takes the input s, runs it through the sigmoid function, gets the output and then uses that output as the input in the derivative. I tested it out and it works, but if I run the code the way it is right now (using the derivative in the article), I get a super low loss and it's more or less accurate after training ~100k times.
I'd really love to know what's really wrong. Could you explain why the derivative is wrong, perhaps from the Calculus perspective?
There is nothing wrong with your derivative. max is talking about the actual derivative definition but he's forgeting that you actually calculated sigmoid(s) and stored it in the layers so no need to calculate it again when using the derivative.
The derivation for the sigmoid prime function can be found here.
Hey Max,
I looked into this and with some help from my friend, I understood what was happening.
Your derivative is indeed correct. However, see how we return
o
in the forward propagation function (with the sigmoid function already defined to it). Then, in the backward propagation function we pass o into the sigmoidPrime() function, which if you look back, is equal to self.sigmoid(self.z3). So, the code is correct.