DEV Community

Yuxing


Softmax Derivation: A Step-by-Step Approach

f(x1, x2, ..., xn) = (e^x1 / Σ(e^xj), e^x2 / Σ(e^xj), ..., e^xn / Σ(e^xj)) (j = 1 to n)
where f[xi] denotes the i-th component of f: f[xi] = e^xi / Σ(e^xj)
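As a minimal sketch of the definition above, assuming NumPy is available: the function name `softmax` and the max-subtraction step are illustrative choices, not part of the original post. Subtracting the maximum does not change the result, because the common factor cancels between numerator and denominator; it only guards against overflow in `exp`.

```python
import numpy as np

def softmax(x):
    """Return the softmax of a 1-D array x: f[xi] = e^xi / sum_j e^xj."""
    x = np.asarray(x, dtype=float)
    # Shifting by max(x) leaves the ratio unchanged but keeps exp() from overflowing.
    shifted = np.exp(x - np.max(x))
    return shifted / shifted.sum()

probs = softmax([1.0, 2.0, 3.0])
print(probs)        # one positive value per input
print(probs.sum())  # the components sum to 1, up to floating-point rounding
```

Larger inputs get larger outputs, and every component lies strictly between 0 and 1.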

Let's begin by differentiating every component of f with respect to a single input xi.

∂f/∂xi = (∂(e^x1 / Σ(e^xj))/∂xi, ∂(e^x2 / Σ(e^xj))/∂xi, ..., ∂(e^xi / Σ(e^xj))/∂xi, ..., ∂(e^xn / Σ(e^xj))/∂xi)
(j = 1 to n)
∂f/∂xi = (-e^x1 * e^xi / (Σ(e^xj))^2, -e^x2 * e^xi / (Σ(e^xj))^2, ..., e^xi * (Σ(e^xj) - e^xi) / (Σ(e^xj))^2, ...,
-e^xn * e^xi / (Σ(e^xj))^2) (j = 1 to n)
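The step from the first expression to the second can be spelled out with the quotient rule. The shorthand S for the sum is an assumption introduced here for readability; it does not appear in the original. Since only the term e^xi of S depends on xi, ∂S/∂xi = e^xi, and the k-th component works out as:

```latex
\[
S = \sum_{j=1}^{n} e^{x_j}, \qquad \frac{\partial S}{\partial x_i} = e^{x_i}
\]
\[
\frac{\partial}{\partial x_i}\left(\frac{e^{x_k}}{S}\right)
= \frac{\dfrac{\partial e^{x_k}}{\partial x_i}\, S - e^{x_k}\, e^{x_i}}{S^{2}}
= \begin{cases}
\dfrac{e^{x_i}\,(S - e^{x_i})}{S^{2}}, & k = i,\\[2ex]
\dfrac{-\,e^{x_k}\, e^{x_i}}{S^{2}}, & k \neq i.
\end{cases}
\]
```

For k ≠ i the numerator e^xk does not depend on xi, so its derivative vanishes and only the negative term survives.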

When k=i (k indexes the components of ∂f/∂xi),
(∂f/∂xi)[k] = e^xi * (Σ(e^xj) - e^xi) / (Σ(e^xj))^2 (j = 1 to n)
(∂f/∂xi)[k] = (e^xi / Σ(e^xj)) * ((Σ(e^xj) - e^xi) / Σ(e^xj)) (j = 1 to n)
(∂f/∂xi)[k] = f[xi] * (1 - f[xi])

When k≠i,
(∂f/∂xi)[k] = -e^xk * e^xi / (Σ(e^xj))^2 (j = 1 to n)
(∂f/∂xi)[k] = (-e^xk / Σ(e^xj)) * (e^xi / Σ(e^xj)) (j = 1 to n)
(∂f/∂xi)[k] = -f[xk] * f[xi]

In summary,
(∂f/∂xi)[k] = f[xi] * (1 - f[xi]) (k=i)
(∂f/∂xi)[k] = -f[xk] * f[xi] (k≠i)
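The two cases in the summary can be checked numerically. The sketch below assumes NumPy; the helper names `softmax_jacobian` and `numerical_jacobian` are hypothetical, and the central finite-difference comparison is a sanity check added here, not something the original post performs. It builds the full matrix J with J[k, i] = (∂f/∂xi)[k] and compares it against a numerical approximation.

```python
import numpy as np

def softmax(x):
    x = np.asarray(x, dtype=float)
    e = np.exp(x - np.max(x))  # shift by max(x) for numerical stability
    return e / e.sum()

def softmax_jacobian(x):
    """J[k, i] = (∂f/∂xi)[k], built from the two cases in the summary."""
    f = softmax(x)
    # -outer(f, f) gives -f[xk] * f[xi] everywhere; adding diag(f) turns the
    # diagonal entries -f[xi]^2 into f[xi] * (1 - f[xi]).
    return np.diag(f) - np.outer(f, f)

def numerical_jacobian(x, eps=1e-6):
    """Central finite-difference approximation of the same matrix."""
    x = np.asarray(x, dtype=float)
    n = x.size
    J = np.zeros((n, n))
    for i in range(n):
        step = np.zeros(n)
        step[i] = eps
        # Column i holds the change in every component of f when only xi moves.
        J[:, i] = (softmax(x + step) - softmax(x - step)) / (2 * eps)
    return J

x = np.array([0.5, -1.0, 2.0])
analytic = softmax_jacobian(x)
numeric = numerical_jacobian(x)
print(np.max(np.abs(analytic - numeric)))  # largest disagreement between the two
```

Both cases also collapse into the single expression f[xk] * (δ_ki - f[xi]), where δ_ki is 1 when k=i and 0 otherwise, which is what `np.diag(f) - np.outer(f, f)` computes. The matrix is symmetric, and each column sums to zero because the softmax outputs always sum to 1.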
