Shrijith Venkatramana

Getting micrograd to Perfectly Predict Answers To A Sample Problem

Hello, I'm Shrijith. I'm building git-lrc, an AI code reviewer that runs on every commit. It is free, unlimited, and source-available on GitHub. Star us to help devs discover the project, and do give it a try and share your feedback for improving the product.

## Training: Changing weight.data based on weight.grad slightly (according to the learning rate)

From the previous post, we have n.parameters(), a list of all the weights and biases in the neural network.

In total we have 41 parameters:

*(screenshot: parameter count)*
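As a sanity check, the 41 can be derived from the layer sizes. A minimal sketch, assuming the MLP(3, [4, 4, 1]) architecture from the earlier posts in this series:

```python
# Sanity check of the parameter count, assuming an MLP(3, [4, 4, 1]) network.
sizes = [3, 4, 4, 1]
# each neuron carries one weight per input plus one bias
count = sum(n_out * (n_in + 1) for n_in, n_out in zip(sizes, sizes[1:]))
print(count)  # 41
```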

The data value of one of these parameters is shown below:

*(screenshot: parameter data value)*

Now the goal is to change the data value of this parameter, in accordance with the gradient feedback.

```python
for p in n.parameters():
    p.data += 0.01 * p.grad  # something like that (wip)
```

We also note that the gradient is negative for this parameter:

*(screenshot: negative grad value)*

### Determining the sign of the step factor

So there's a bit of reasoning to do here, to determine the sign of the step factor `0.01`.

The goal is to minimize the loss (bring it to 0).

p.data is positive: 0.85.

p.grad is negative: -0.27.

The gradient tells us how the loss changes when p.data increases: a positive gradient means increasing p.data increases the loss, and a negative gradient means increasing p.data decreases the loss.

With p.data += 0.01 * p.grad, p.data decreases a bit (since p.grad is negative) — moving *with* a negative gradient, which increases the loss.

But with p.data += -0.01 * p.grad, p.data increases a bit — moving *against* the gradient, which reduces the loss.

So the correct option is a negative step factor.

Corrected code:

```python
for p in n.parameters():
    p.data += -0.01 * p.grad
```
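The step direction can also be checked numerically, without micrograd. A minimal sketch with a hypothetical one-parameter model — loss = (w - 2)\*\*2, so the gradient is 2\*(w - 2) — showing that stepping against the gradient lowers the loss:

```python
# Numeric check of the step direction (no micrograd needed).
# Hypothetical one-parameter model: loss = (w - 2)**2, gradient = 2*(w - 2).
w = 0.85
grad = 2 * (w - 2)            # -2.3: negative, like p.grad above
loss_before = (w - 2) ** 2

w += -0.01 * grad             # negative step factor: move against the gradient
loss_after = (w - 2) ** 2

print(loss_after < loss_before)  # True: the loss went down
```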

Now we can compare the loss before and after the weight adjustment, and conclude that through the backward pass plus gradient descent we got a more accurate result:

*(screenshot: loss before/after the weight update)*

## Automating Gradient Descent To Get a Highly Accurate Network

Setting the right learning rate is a subtle art. If it is too low, convergence takes too long. If the step size is too large, the process becomes unstable and the loss may explode.
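This sensitivity is easy to reproduce on a toy problem. A sketch, assuming nothing from micrograd — just a hypothetical one-parameter loss w\*\*2 with gradient 2\*w:

```python
# Toy illustration of learning-rate sensitivity (no micrograd needed).
# Hypothetical one-parameter problem: loss = w**2, gradient = 2*w.
def final_loss(lr, steps=10):
    w = 1.0
    for _ in range(steps):
        w += -lr * (2 * w)  # gradient descent step
    return w ** 2

print(final_loss(0.01))  # small rate: loss shrinks, but slowly
print(final_loss(0.4))   # well-chosen rate: converges fast
print(final_loss(1.1))   # too large: each step overshoots, loss explodes
```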

### Implementing a Training Loop

We wrap the forward pass, backward pass, and weight update in a loop:

```python
for k in range(20):

  # forward pass
  ypred = [n(x) for x in xs]
  loss = sum((yout - ygt)**2 for ygt, yout in zip(ys, ypred))

  # backward pass
  loss.backward()

  # update - gradient descent
  for p in n.parameters():
    p.data += -0.05 * p.grad

  print(k, loss.data)
```

The training gives an output like this:

```
0 4.149044341397712
1 2.8224176124705482
2 1.0767374634555338
3 0.4436221441110331
4 0.048639680823661345
5 0.0007984305003777319
6 5.758159329954795e-06
7 1.1072290005342024e-07
8 1.1331571852917713e-08
9 1.8004031247688252e-09
10 3.886667439780539e-10
11 1.190170455797565e-10
12 5.491701244447392e-11
13 4.086071696354591e-11
14 5.2487460541263784e-11
15 1.235857710202349e-10
16 5.557297068527374e-10
17 4.829530833029305e-09
18 7.912558681799505e-08
19 2.2910484425631455e-06
```

You can see the loss reaching really small numbers around the middle passes. (It actually creeps back up in the final few iterations — a hint of the subtle bug we fix below.)

Now we compare actual y to predicted y:

```python
print("actual", ys)
print("predicted", ypred)
```

And we get near-perfect results:

```
actual [1.0, -1.0, -1.0, 1.0]
predicted [
    Value(data=1.0, grad=0.0, label=''),
    Value(data=-0.9986538494836703, grad=0.0026923010326593833, label=''),
    Value(data=-0.9993079543151291, grad=0.0013840913697418245, label=''),
    Value(data=1.0, grad=0.0, label='')
]
```
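The predictions come back as Value objects, so for a plain numeric comparison you can read out the `.data` attribute. A sketch with a stand-in Value class and numbers close to the run above:

```python
# The predictions are Value objects; to compare plain numbers, read .data.
class Value:  # stand-in for micrograd's Value, just enough for this demo
    def __init__(self, data):
        self.data = data

ypred = [Value(1.0), Value(-0.9987), Value(-0.9993), Value(1.0)]
print([round(p.data, 3) for p in ypred])  # [1.0, -0.999, -0.999, 1.0]
```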

### Fixing a subtle bug in the training loop

*(screenshot: common mistakes)*

Each parameter in the net has data and grad attributes.

In our training loop, the first iteration is fine: when we call loss.backward(), it fills in the grad value for each parameter.

But from the second iteration onward, backward() keeps accumulating into grad — the values are never reset to 0.

So the feedback given to each parameter is wrong: it mixes in gradients from earlier iterations. We have to reset grad to 0 before each backward pass.
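The accumulation is easy to demonstrate in isolation. A minimal sketch with a stand-in Param class, assuming (as in micrograd) that the backward pass adds to grad with `+=`:

```python
# Minimal illustration of the bug: grads accumulate across iterations.
class Param:  # stand-in for a micrograd parameter
    def __init__(self, data):
        self.data = data
        self.grad = 0.0

p = Param(0.5)
for step in range(3):
    # suppose the true gradient is 1.0 on every iteration;
    # without a reset, each backward pass keeps adding to the old value
    p.grad += 1.0

print(p.grad)  # 3.0, not the 1.0 a single backward pass would give
```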

Corrected training loop:


```python
for k in range(20):

  # forward pass
  ypred = [n(x) for x in xs]
  loss = sum((yout - ygt)**2 for ygt, yout in zip(ys, ypred))

  # reset grad to zero before the backward pass
  for p in n.parameters():
    p.grad = 0.0

  # backward pass
  loss.backward()

  # update - gradient descent
  for p in n.parameters():
    p.data += -0.05 * p.grad

  print(k, loss.data)
```
We get a similar result in this case, since the problem is quite simple. In neural networks it sometimes happens that we get a seemingly successful result even when the logic is slightly buggy. For complex problems, these sorts of bugs can derail the training process entirely — so one has to watch out for common mistakes like this.



## Reference

[The spelled-out intro to neural networks and backpropagation: building micrograd - YouTube](https://www.youtube.com/watch?v=VMj-3S1tku0)
