Shrijith Venkatramana

Fully Automated Gradient Calculation on Expression Graph (As Explained By Karpathy)

Hello, I'm Shrijith. I'm building git-lrc, an AI code reviewer that runs on every commit. It is free, unlimited, and source-available on GitHub. Star us to help devs discover the project, and do give it a try and share your feedback for improving the product.


In the previous posts, we manually calculated gradient values for each node via backpropagation.

In this post, we will see how these manual calculations can be automated, step by step, into fully automatic backpropagation.

Implementing Automatic Back-Propagation for Each Node

Object Model

We implement the node-level grad computation at the point where each Value is created.

First we do the "forward pass", that is, transform the input(s) into an output using the operation.

Then, immediately, we differentiate the operation and define how to calculate the grad for the input(s).
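As a toy illustration of this local-derivative-times-upstream-gradient pattern (with made-up numbers and plain floats, not Value objects), consider a single multiply:

```python
# Chain rule for c = a * b, with hypothetical values.
a, b = 2.0, 3.0
c = a * b               # forward pass: c = 6.0

c_grad = 0.5            # upstream gradient flowing into c (assumed)
a_grad = b * c_grad     # dc/da = b, so a_grad = 3.0 * 0.5 = 1.5
b_grad = a * c_grad     # dc/db = a, so b_grad = 2.0 * 0.5 = 1.0
```

This is exactly what the `backward` closures below do, except they read the upstream gradient from `out.grad` and write the results into the inputs' `.grad` fields.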

You can see in the code below that for each operation, we attach a custom _backward function which triggers that operation's local backpropagation step.

import math

class Value:
  def __init__(self, data, _children=(), _op='', label=''):
    self.data = data
    self._prev = set(_children)
    self._op = _op
    self.label = label
    self.grad = 0.0
    self._backward = lambda: None # by default doesn't do anything (for a leaf
                                  # node for ex)

  def __repr__(self):
    return f"Value(data={self.data})"

  def __add__(self, other):
    out = Value(self.data + other.data, (self, other), '+')

    # derivative of '+' is just distributing the grad of the output to inputs
    def backward():
      self.grad = 1.0 * out.grad
      other.grad = 1.0 * out.grad

    out._backward = backward

    return out

  def __mul__(self, other):
    out = Value(self.data * other.data, (self, other), '*')

    # derivative of `mul` is gradient of result multiplied by sibling's data
    def backward():
      self.grad = other.data * out.grad
      other.grad = self.data * out.grad

    out._backward = backward

    return out

  def tanh(self):
      x = self.data
      t = (math.exp(2*x) - 1) / (math.exp(2*x) + 1)
      out = Value(t, (self, ), 'tanh')

      # derivative of tanh = 1 - (tanh)^2
      def backward():
        self.grad = (1 - t**2) * out.grad

      out._backward = backward
      return out
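The tanh derivative used above, 1 - tanh², can be sanity-checked numerically with a central finite difference. A minimal sketch, at an arbitrarily chosen point x = 0.5:

```python
import math

# Compare the analytic derivative of tanh against a finite-difference estimate.
x, h = 0.5, 1e-6
numeric = (math.tanh(x + h) - math.tanh(x - h)) / (2 * h)  # central difference
analytic = 1 - math.tanh(x) ** 2                           # claimed derivative
assert abs(numeric - analytic) < 1e-8
```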


To start with, we initialize the expression graph without doing any backprop activities, so grad in each node is 0:

# inputs x1, x2
x1 = Value(2.0, label='x1')
x2 = Value(0.0, label='x2')

# weights w1, w2
w1 = Value(-3.0, label='w1')
w2 = Value(1.0, label='w2')

# bias of the neuron
b = Value(6.8813735870195432, label='b')

x1w1 = x1 * w1; x1w1.label = 'x1*w1'
x2w2 = x2 * w2; x2w2.label = 'x2*w2'
x1w1x2w2 = x1w1 + x2w2; x1w1x2w2.label = 'x1*w1 + x2*w2'

n = x1w1x2w2 + b; n.label = 'n'

o = n.tanh(); o.label = 'o'

draw_dot(o)

Before Backprop

Now we are ready to trigger backprop on each operation:

o.grad = 1.0 # base case - needs to be set explicitly
o._backward()
n._backward()
b._backward() # does nothing - leaf node
x1w1x2w2._backward()
x1w1._backward()
x2w2._backward()
draw_dot(o)

The result matches our manual calculations from the previous post:

After

Fully Automated Backpropagation

In the previous section, we ended up with only a semi-automated backprop.

We still had to manually pick nodes from "last" to "first", and call _backward in that order.

So the first task is to get a list of nodes in that "last" to "first" order.

For this, we implement the topological sort idea in our program, so that we always process nodes in the correct order.

Topological Sort


Code:

topo = []
visited = set()
def build_topo(v):
    if v not in visited:
        visited.add(v)
        for child in v._prev:
            build_topo(child)
        topo.append(v)
build_topo(o)
topo

The above code gives us the nodes in the "first" to "last" order:

TopoResult
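The same post-order DFS can be seen on a plain dictionary, independent of the Value class. This is a hypothetical sketch where each node name maps to the children it was built from, mirroring our expression graph:

```python
# Each node maps to the children it was built from (leaves map to []).
prev = {
    'o': ['n'],
    'n': ['x1w1x2w2', 'b'],
    'x1w1x2w2': ['x1w1', 'x2w2'],
    'x1w1': ['x1', 'w1'],
    'x2w2': ['x2', 'w2'],
    'b': [], 'x1': [], 'w1': [], 'x2': [], 'w2': [],
}

topo, visited = [], set()
def build_topo(v):
    if v not in visited:
        visited.add(v)
        for child in prev[v]:
            build_topo(child)
        topo.append(v)  # appended only after all its children, so children come first

build_topo('o')
print(topo)  # leaves come first, 'o' comes last
```

Because a node is appended only after all of its children, reversing this list guarantees every node's _backward runs before its children's.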

Using Topological Sort Results to Automate Backprop For the Entire Graph

Code:

o.grad = 1.0

topo = []
visited = set()
def build_topo(v):
    if v not in visited:
        visited.add(v)
        for child in v._prev:
            build_topo(child)
        topo.append(v)
build_topo(o)

for node in reversed(topo):
    node._backward()

draw_dot(o)

And voila, we get the gradient value for each node in a fully automated manner. This is the core idea of autograd:

Automated Result

Integrating the automated backprop logic into the Value class

We add a new method backward to the Value class to perform backprop automatically from that node to all the previous nodes:

import math

class Value:
  def __init__(self, data, _children=(), _op='', label=''):
    self.data = data
    self._prev = set(_children)
    self._op = _op
    self.label = label
    self.grad = 0.0
    self._backward = lambda: None # by default doesn't do anything (for a leaf
                                  # node for ex)

  def __repr__(self):
    return f"Value(data={self.data})"

  def __add__(self, other):
    out = Value(self.data + other.data, (self, other), '+')

    # derivative of '+' is just distributing the grad of the output to inputs
    def backward():
      self.grad = 1.0 * out.grad
      other.grad = 1.0 * out.grad

    out._backward = backward

    return out

  def __mul__(self, other):
    out = Value(self.data * other.data, (self, other), '*')

    # derivative of `mul` is gradient of result multiplied by sibling's data
    def backward():
      self.grad = other.data * out.grad
      other.grad = self.data * out.grad

    out._backward = backward

    return out

  def tanh(self):
      x = self.data
      t = (math.exp(2*x) - 1) / (math.exp(2*x) + 1)
      out = Value(t, (self, ), 'tanh')

      # derivative of tanh = 1 - (tanh)^2
      def backward():
        self.grad = (1 - t**2) * out.grad

      out._backward = backward
      return out

  def backward(self):
    topo = []
    visited = set()
    def build_topo(v):
        if v not in visited:
            visited.add(v)
            for child in v._prev:
                build_topo(child)
            topo.append(v)
    build_topo(self)

    self.grad = 1.0
    for node in reversed(topo):
        node._backward()

Now we can just call the following to calculate gradients for all the nodes automatically:

o.backward()
draw_dot(o)
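To sanity-check the whole pipeline end to end, here is a self-contained sketch: a minimal Value class (just the pieces shown above, with math.tanh standing in for the equivalent exp-based formula) run on the same neuron, checked against the gradients we computed by hand in the previous post:

```python
import math

class Value:
    def __init__(self, data, _children=()):
        self.data = data
        self._prev = set(_children)
        self.grad = 0.0
        self._backward = lambda: None  # leaf nodes have nothing to do

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def backward():
            self.grad = 1.0 * out.grad
            other.grad = 1.0 * out.grad
        out._backward = backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def backward():
            self.grad = other.data * out.grad
            other.grad = self.data * out.grad
        out._backward = backward
        return out

    def tanh(self):
        t = math.tanh(self.data)
        out = Value(t, (self,))
        def backward():
            self.grad = (1 - t**2) * out.grad
        out._backward = backward
        return out

    def backward(self):
        topo, visited = [], set()
        def build_topo(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build_topo(child)
                topo.append(v)
        build_topo(self)
        self.grad = 1.0  # base case
        for node in reversed(topo):
            node._backward()

# Same neuron as before
x1, x2 = Value(2.0), Value(0.0)
w1, w2 = Value(-3.0), Value(1.0)
b = Value(6.8813735870195432)
o = (x1*w1 + x2*w2 + b).tanh()
o.backward()

print(round(x1.grad, 4), round(w1.grad, 4))  # -1.5 1.0
print(round(x2.grad, 4), round(w2.grad, 4))  # 0.5 0.0
```

These match the manually derived values, confirming the automated backward pass.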

Reference

The spelled-out intro to neural networks and backpropagation: building micrograd - YouTube

git-lrc
AI agents write code fast. They also silently remove logic, change behavior, and introduce bugs, without telling you. You often find out in production.

git-lrc fixes this. It hooks into git commit and reviews every diff before it lands. 60-second setup. Completely free.

Feedback and contributions are welcome! It's online, source-available, and ready for anyone to use.

⭐ Star it on GitHub: HexmosTech / git-lrc