Here is a gentle introduction to automatic differentiation in modern machine learning and deep learning libraries like PyTorch, JAX, or even TensorFlow. One of their most powerful features is auto gradient: we don't have to write out the derivative of some function with respect to some variable by hand, because the library computes the gradient for us. You can see automatic gradients at work whenever we run gradient descent to minimize the error with respect to the model parameters.
So, what exactly is auto differentiation? If you jumped into this article without a machine learning background, you might be confused about "auto diff" versus "auto gradient"; in this article I will treat them as the same thing, and from here on I will use the term auto gradient. So, once more: what exactly is auto gradient? Basically, auto gradient is a numerical method for efficiently calculating the derivative of any function using programming techniques.
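To make this concrete, here is a minimal sketch of what auto gradient looks like in PyTorch (assuming PyTorch is installed; the toy function y = x**3 + 2*x is just an example I picked, not something from the PyTorch docs):

import torch

x = torch.tensor(2.0, requires_grad=True)  # tell PyTorch to track gradients of x
y = x**3 + 2*x                             # dy/dx = 3x^2 + 2
y.backward()                               # the library computes the gradient for us
print(x.grad)                              # tensor(14.) since 3*(2**2) + 2 = 14

Nobody wrote the derivative 3x^2 + 2 by hand anywhere; the library worked it out on its own.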
For this part I will keep the mathematical terms and explanations to a minimum, because I'm not a math person and I don't know the theory behind it deeply. So I will explain it as a self-proclaimed ordinary engineer.
This all comes from the definition of the derivative, which can be turned into a concrete formula. If we slightly add and then subtract a small value to the element x_i while the values of the other elements stay the same, the resulting difference in the value of the function f, divided by the size of the nudge, approximates the derivative with respect to x_i. This technique is called the finite difference method.
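Written out, this is the standard central-difference approximation (the same formula the Python code further below computes):

\frac{\partial f}{\partial x_i} \approx \frac{f(x_1, \dots, x_i + \varepsilon, \dots, x_n) - f(x_1, \dots, x_i - \varepsilon, \dots, x_n)}{2\varepsilon}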
We also need to talk about derivatives: the ordinary derivative, the partial derivative, and the chain rule. All three are needed for this idea.
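As a quick reminder, these are the three pieces of standard calculus I mean (nothing specific to this article):

\frac{d}{dx} f(x), \qquad \frac{\partial}{\partial x_i} f(x_1, \dots, x_n), \qquad \frac{d}{dx} f(g(x)) = f'(g(x)) \, g'(x)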
It is worth mentioning that Larry Gonick explains derivatives in a very friendly way in his book Cartoon Calculus.
In machine learning or deep learning, auto gradients are used for optimization, usually with the best algorithm ever, gradient descent (obviously a hot take of mine). With this optimization, the machine learning model learns by updating its parameters during the forward and backward propagation passes.
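The update rule itself is the usual gradient descent step, where \theta are the parameters, \eta is the learning rate, and L is the loss:

\theta_{t+1} = \theta_t - \eta \, \nabla_\theta L(\theta_t)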
Here is a simple implementation of this idea in Python.
import itertools
from typing import Callable, Sequence

import numpy as np


def auto_gradient(
    f: Callable[..., np.ndarray],
    args: Sequence[np.ndarray],
    eps: float = 1e-7,
) -> Sequence[np.ndarray]:
    """
    Compute the gradient of f with respect to each argument in args,
    using the central finite-difference approximation.
    """
    grads = []
    for arg in args:
        grad = np.empty_like(arg)  # allocate space for the gradient
        for ix in itertools.product(*[range(n) for n in arg.shape]):  # iterate over all indices
            v = arg[ix].copy()  # save the original value
            arg[ix] = v + eps
            v1 = float(f(*args))  # f must return a scalar
            arg[ix] = v - eps
            v2 = float(f(*args))
            arg[ix] = v  # restore the original value
            grad[ix] = (v1 - v2) / (2 * eps)  # the slope (central difference)
        grads.append(grad)
    return grads
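To sketch how this connects back to gradient descent, here is a tiny usage example; the quadratic loss and the learning rate of 0.1 are made-up values for illustration only:

# toy loss: L(w) = sum((w - 3)^2), whose true gradient is 2 * (w - 3)
def loss(w: np.ndarray) -> np.ndarray:
    return np.sum((w - 3.0) ** 2)

w = np.array([0.0, 10.0])
for step in range(100):
    (grad_w,) = auto_gradient(loss, [w])  # gradient of the loss w.r.t. w
    w = w - 0.1 * grad_w                  # gradient descent update
print(w)  # both entries end up very close to 3.0

Each call to auto_gradient evaluates the loss twice per element of w, which is already a hint of why this gets expensive for models with millions of parameters.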
Unfortunately, this finite-difference approach to auto gradient is not good enough: it needs two evaluations of f for every single parameter element and it is sensitive to rounding error, so it becomes very expensive and imprecise for large models. That is why it is more advisable to use a computational graph, which is what the libraries mentioned at the start actually do.
Disclaimer: 100% written by the human brain.