Hello, I'm Shrijith. I'm building git-lrc, an AI code reviewer that runs on every commit. It is free, unlimited, and source-available on GitHub. Star us to help devs discover the project. Do give it a try and share your feedback for improving the product.
## A Bug In Our Code

In the previous post, we got automatic gradient calculation going for the whole expression graph.
However, it has a tricky bug. Here's a sample program that triggers it:

```python
a = Value(3.0, label='a')
b = a + a ; b.label = 'b'
b.backward()
draw_dot(b)
```
In the above, the forward pass looks alright:

```
b = a + a = 3 + 3 = 6
```

But think about the backward pass:

```
b = a + a
db/da = 1 + 1 = 2
```

The answer should be `2`, but we get `1` as the `a.grad` value.
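To double-check the expected value, here's a quick finite-difference estimate of `db/da` using plain floats (independent of the `Value` class):

```python
# Numerically estimate db/da for b = a + a around a = 3.0
def b_of(a):
    return a + a

h = 1e-6
a = 3.0
grad_estimate = (b_of(a + h) - b_of(a)) / h
print(grad_estimate)  # ~2.0, matching db/da = 1 + 1 = 2
```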
The problem is in the `__add__` operation of the `Value` class:

```python
class Value:
    def __init__(self, data, _children=(), _op='', label=''):
        self.data = data
        self._prev = set(_children)
        self._op = _op
        self.label = label
        self.grad = 0.0
        # by default, backward does nothing (e.g. for a leaf node)
        self._backward = lambda: None

    def __repr__(self):
        return f"Value(data={self.data})"

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other), '+')
        # out.grad = 1 here
        # derivative of '+' is just distributing the grad of the output to inputs
        def backward():
            self.grad = 1.0 * out.grad   # a.grad = 1
            other.grad = 1.0 * out.grad  # again a.grad = 1
        out._backward = backward
        return out
```
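To see why this yields `1`: in `b = a + a`, both operands of `__add__` are the same object, so the second assignment in `backward` simply overwrites the first. A stripped-down sketch of the aliasing (using a hypothetical `Node` stand-in, not the full `Value` class):

```python
class Node:
    def __init__(self):
        self.grad = 0.0

a = Node()
# In b = a + a, self and other inside __add__ alias the same object:
self_, other = a, a
out_grad = 1.0
self_.grad = 1.0 * out_grad   # first write: a.grad = 1
other.grad = 1.0 * out_grad   # second write overwrites: a.grad is still 1
print(a.grad)  # 1.0, not the correct 2.0
```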
Here is another example that triggers the same bug:

```python
a = Value(-2.0, label='a')
b = Value(3.0, label='b')
d = a * b ; d.label = 'd'
e = a + b ; e.label = 'e'
f = d * e ; f.label = 'f'
f.backward()
draw_dot(f)
```
We know that for the multiplication operation:

```
self.grad = other.data * out.grad
d.grad = e.data * f.grad = 1 * 1 = 1
e.grad = d.data * f.grad = -6 * 1 = -6
```

So far, so good.
Let's look at the next stage:

```
self.grad = other.data * out.grad
b.grad = a.data * d.grad = -2 * 1 = -2
```
But if we consider the expression:

```
e = a + b
a.grad = b.grad = e.grad = -6
```

So we have a conflict: `b.grad = -6` (from the addition) vs `b.grad = -2` (from the multiplication).
The general problem is that when a `Value` is used multiple times, each use writes to the same `.grad` field and the writes clash. First the grad contribution from the addition is stored, but then in another iteration the grad contribution from the multiplication is stored as well, overwriting the previous value.
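As a sanity check on what the correct `b.grad` should be, here's a finite-difference estimate of `df/db` for this example (plain floats, same values as above):

```python
# f = d * e, where d = a * b and e = a + b.
# Multivariable chain rule: df/db = e * (dd/db) + d * (de/db) = e * a + d * 1
def f_of(a, b):
    return (a * b) * (a + b)

a, b, h = -2.0, 3.0, 1e-6
grad_b = (f_of(a, b + h) - f_of(a, b)) / h
print(grad_b)  # ~-8.0: the sum of the two contributions (-2 and -6)
```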
## Solving the bug - "Accumulate Gradients" rather than Replacing Them
The Wikipedia page for the [Chain Rule](https://en.wikipedia.org/wiki/Chain_rule#Multivariable_case) has a section on the multivariable case.
The gist of the general solution is that gradients must be accumulated, rather than replaced.
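In symbols: when a variable x feeds into f through several intermediate values u1, ..., un, the chain rule sums the contribution from every path:

```latex
\frac{\partial f}{\partial x} = \sum_{i=1}^{n} \frac{\partial f}{\partial u_i} \, \frac{\partial u_i}{\partial x}
```

Each use of a `Value` in the graph is one such path, which is why `backward` must add up contributions instead of overwriting them.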
So, here is the new `Value` class, where `_backward` accumulates, rather than replaces, gradients:
```python
import math

class Value:
    def __init__(self, data, _children=(), _op='', label=''):
        self.data = data
        self._prev = set(_children)
        self._op = _op
        self.label = label
        self.grad = 0.0
        # by default, backward does nothing (e.g. for a leaf node)
        self._backward = lambda: None

    def __repr__(self):
        return f"Value(data={self.data})"

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other), '+')
        # derivative of '+' is just distributing the grad of the output to inputs
        def backward():
            self.grad += 1.0 * out.grad
            other.grad += 1.0 * out.grad
        out._backward = backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other), '*')
        # derivative of '*' is the grad of the output times the sibling's data
        def backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = backward
        return out

    def tanh(self):
        x = self.data
        t = (math.exp(2*x) - 1) / (math.exp(2*x) + 1)
        out = Value(t, (self, ), 'tanh')
        # derivative of tanh = 1 - (tanh)^2
        def backward():
            self.grad += (1 - t**2) * out.grad
        out._backward = backward
        return out

    def backward(self):
        # topologically sort the graph, then apply _backward in reverse order
        topo = []
        visited = set()
        def build_topo(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build_topo(child)
                topo.append(v)
        build_topo(self)
        self.grad = 1.0
        for node in reversed(topo):
            node._backward()
```
Now the gradient calculations are correct.
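To confirm, here's a quick re-run of both examples against the fixed class (`draw_dot` and `tanh` are omitted for brevity; this is just the accumulate-gradients `Value` from above):

```python
class Value:
    def __init__(self, data, _children=(), _op='', label=''):
        self.data = data
        self._prev = set(_children)
        self._op = _op
        self.label = label
        self.grad = 0.0
        self._backward = lambda: None

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other), '+')
        def backward():
            self.grad += 1.0 * out.grad
            other.grad += 1.0 * out.grad
        out._backward = backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other), '*')
        def backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = backward
        return out

    def backward(self):
        topo, visited = [], set()
        def build_topo(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build_topo(child)
                topo.append(v)
        build_topo(self)
        self.grad = 1.0
        for node in reversed(topo):
            node._backward()

# First example: b = a + a
a = Value(3.0, label='a')
b = a + a ; b.label = 'b'
b.backward()
print(a.grad)  # 2.0

# Second example: f = (a * b) * (a + b)
a = Value(-2.0, label='a')
b = Value(3.0, label='b')
d = a * b ; d.label = 'd'
e = a + b ; e.label = 'e'
f = d * e ; f.label = 'f'
f.backward()
print(b.grad)  # -8.0: -2 from the '*' path plus -6 from the '+' path
```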
## Reference

The spelled-out intro to neural networks and backpropagation: building micrograd - YouTube
*AI agents write code fast. They also silently remove logic, change behavior, and introduce bugs -- without telling you. You often find out in production.
git-lrc fixes this. It hooks into git commit and reviews every diff before it lands. 60-second setup. Completely free.*
Any feedback or contributors are welcome! It's online, source-available, and ready for anyone to use.
⭐ Star it on GitHub: HexmosTech/git-lrc