Shrijith Venkatramana

Back-Propagation Spelled Out - As Explained by Karpathy

Hi there! I'm Shrijith Venkatramana, founder of Hexmos. Right now, I'm building LiveAPI, a tool that makes generating API docs from your code ridiculously easy.

Adding Labels To Improve Graph Readability

Add label parameter to Value class:

class Value:
  def __init__(self, data, _children=(), _op='', label=''):
    self.data = data
    self._prev = set(_children)
    self._op = _op
    self.label = label

  def __repr__(self):
    return f"Value(data={self.data})"

  def __add__(self, other):
    return Value(self.data + other.data, (self, other), '+')

  def __mul__(self, other):
    return Value(self.data * other.data, (self, other), '*')

a = Value(2.0, label='a')
b = Value(-3.0, label='b')
c = Value(10, label='c')
e = a * b; e.label = 'e'
d = e + c; d.label = 'd'
print(d._prev)
print(d._op)
print("---")
print(e._prev)
print(e._op)

Update draw_dot to include the label in the graph

Originally we had the node expression as:

dot.node(name=uid, label="{ data %.4f }" % (n.data,), shape='record')

Replace with:

dot.node(name=uid, label="{ %s | data %.4f }" % (n.label, n.data), shape='record')
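draw_dot itself isn't shown in full here; the key step is walking the expression graph to collect every node and edge before handing them to graphviz. A minimal, self-contained sketch of that traversal (the trace helper, as in Karpathy's micrograd notebook, assuming the Value class above):

```python
class Value:
  def __init__(self, data, _children=(), _op='', label=''):
    self.data = data
    self._prev = set(_children)
    self._op = _op
    self.label = label

  def __add__(self, other):
    return Value(self.data + other.data, (self, other), '+')

  def __mul__(self, other):
    return Value(self.data * other.data, (self, other), '*')

def trace(root):
  # walk backwards from the root, collecting every node and edge once
  nodes, edges = set(), set()
  def build(v):
    if v not in nodes:
      nodes.add(v)
      for child in v._prev:
        edges.add((child, v))
        build(child)
  build(root)
  return nodes, edges

a = Value(2.0, label='a')
b = Value(-3.0, label='b')
e = a * b; e.label = 'e'
nodes, edges = trace(e)
print(len(nodes), len(edges))  # 3 nodes (a, b, e), 2 edges
```

draw_dot then iterates over these nodes and edges, calling dot.node and dot.edge for each.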

Now draw_dot(d) returns:

Graph with Labels

Let's add a few more nodes, f and L, to the expression:

a = Value(2.0, label='a')
b = Value(-3.0, label='b')
c = Value(10, label='c')
e = a * b; e.label = 'e'
d = e + c; d.label = 'd'
f = Value(-2.0, label='f')
L = d * f; L.label = 'L'
L

Generate graph:

draw_dot(L)

More Complex Expression

The graph we've built above represents the forward pass: each node is laid out and its value is computed from its children.

What We Want to Calculate

We want to know how each node in the graph (a, b, c, d, e, f) affects the output (the loss L). So we want to find: dL/dL, dL/df, dL/de, dL/dd, dL/dc, dL/db, dL/da.

Add the grad attribute to accommodate backpropagation

class Value:
  def __init__(self, data, _children=(), _op='', label=''):
    self.data = data
    self._prev = set(_children)
    self._op = _op
    self.label = label
    self.grad = 0.0 # 0 means no impact on output to start with

Update the node graphics information

dot.node(name=uid, label="{ %s | data %.4f | grad %.4f }" % (n.label, n.data, n.grad), shape='record')

Graph with grad property

Manually Performing Back-Propagation for The Given Graph

Node L

What is dL/dL? That is, if we change L by a tiny amount, how will it affect the output L? The answer is obviously 1.

That is,

L.grad = 1

The Expression

a = Value(2.0, label='a')
b = Value(-3.0, label='b')
c = Value(10, label='c')
e = a * b; e.label = 'e'
d = e + c; d.label = 'd'
f = Value(-2.0, label='f')
L = d * f; L.label = 'L'
L

Node d

L = d * f

By known rules:

dL/dd = f

By derivation, using the limit definition (treating L as a function of d, with f held fixed):

dL/dd =

((d+h)*f - d*f)/h =

(d*f + h*f - d*f)/h =

h*f/h =

f

That is, dL/dd = f = -2.0

So, we do

d.grad = -2.0

Node f

By symmetry, we get that dL/df = d = e + c = -6.0 + 10.0 = 4.0

That is,

f.grad = 4.0

The new updated graph is like this:

Updated Graph

How to do Numerical Verification of the Derivatives

def verify_dL_by_df():
  h = 0.001

  a = Value(2.0, label='a')
  b = Value(-3.0, label='b')
  c = Value(10, label='c')
  e = a * b; e.label = 'e'
  d = e + c; d.label = 'd'
  f = Value(-2.0, label='f')
  L = d * f; L.label = 'L'
  L1 = L.data

  a = Value(2.0, label='a')
  b = Value(-3.0, label='b')
  c = Value(10, label='c')
  e = a * b; e.label = 'e'
  d = e + c; d.label = 'd'
  f = Value(-2.0 + h, label='f') # bump f a little bit
  L = d * f; L.label = 'L'
  L2 = L.data

  print((L2 - L1)/h)

verify_dL_by_df() # prints out 3.9999 ~ 4
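The same pattern can be factored into a reusable checker. Here's a sketch (numerical_grad and build_L are my own helper names, not from the video) that rebuilds the expression with plain floats and bumps one input at a time:

```python
def numerical_grad(build, inputs, name, h=0.001):
  # build(inputs) -> scalar output; bump inputs[name] by h and compare
  L1 = build(inputs)
  bumped = dict(inputs)
  bumped[name] = bumped[name] + h
  L2 = build(bumped)
  return (L2 - L1) / h

def build_L(v):
  e = v['a'] * v['b']
  d = e + v['c']
  return d * v['f']

inputs = {'a': 2.0, 'b': -3.0, 'c': 10.0, 'f': -2.0}
print(numerical_grad(build_L, inputs, 'f'))  # ~ 4.0
print(numerical_grad(build_L, inputs, 'a'))  # ~ 6.0
```

Because the expression is rebuilt from scratch on each call, the two evaluations never share state, which is exactly what verify_dL_by_df does by hand.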

The Challenge - How do we calculate dL/dc?

We know dL/dd = -2.0, so we know how L is affected by d.

The question is: how does c impact L through d?

First, we can calculate the "local derivative", i.e. how c impacts d.

That is,

dd/dc = ?

We know that:

d = c + e

So once we differentiate by c, we get: dd/dc = 1

Similarly, dd/de = 1.

Now the question is, how to put together dd/dc and dL/dd?

We need something called the Chain Rule: if L depends on d, and d depends on c, then dL/dc is the product of the two intermediate derivatives.

Chain Rule

So, applying chain rule, we get:

dL/dc = dL/dd * dd/dc
dL/dc = -2.0 * 1.0 = -2.0
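We can confirm this chain-rule result numerically, mirroring verify_dL_by_df (a sketch with plain floats; L_of_c is my own helper name):

```python
h = 0.001

def L_of_c(c):
  e = 2.0 * -3.0   # a * b
  d = e + c
  return d * -2.0  # d * f

print((L_of_c(10.0 + h) - L_of_c(10.0)) / h)  # ~ -2.0
```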

Similarly, dL/de = -2.0

Let's set the values in Python, and redraw the graph:

c.grad = -2.0
e.grad = -2.0

Graph with grads for c & e

Figuring out dL/da and dL/db

We know:

dL/de = -2.0

We want to know:

dL/da = dL/de * de/da

We know that:

e = a * b
de/da = b
de/da = b = -3.0

We can also find:

e = a * b
de/db = a
de/db = a = 2.0

So, now to get what we need:

dL/da = dL/de * de/da = -2.0 * -3.0 = 6.0
dL/db = dL/de * de/db = -2.0 * 2.0 = -4.0

We set the values in Python, and redraw to get the full graph:

a.grad = 6.0
b.grad = -4.0
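Putting the whole manual backward pass together (a sketch with plain floats rather than the Value class; each grad is a chain-rule product matching the graph above):

```python
# forward pass
a, b, c, f = 2.0, -3.0, 10.0, -2.0
e = a * b        # -6.0
d = e + c        #  4.0
L = d * f        # -8.0

# backward pass, applying the chain rule node by node
L_grad = 1.0
d_grad = f * L_grad    # dL/dd = f          -> -2.0
f_grad = d * L_grad    # dL/df = d          ->  4.0
c_grad = 1.0 * d_grad  # dd/dc = 1          -> -2.0
e_grad = 1.0 * d_grad  # dd/de = 1          -> -2.0
a_grad = b * e_grad    # de/da = b          ->  6.0
b_grad = a * e_grad    # de/db = a          -> -4.0

print(a_grad, b_grad, c_grad, e_grad, d_grad, f_grad)
# 6.0 -4.0 -2.0 -2.0 -2.0 4.0
```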

Final graph

Reference

The spelled-out intro to neural networks and backpropagation: building micrograd - YouTube
