DEV Community

Rijul Rajesh

Cross Entropy Derivatives, Part 4: Solving for other output classes

In the previous article, we solved the derivative for one of the output classes. Now we will do the same for the others.

Now let us see what happens when the observed species is Virginica and we compute the derivative of the cross-entropy loss with respect to $b_3$.

The predicted probability for Virginica is produced by the softmax function. The inputs to the softmax function are the raw output values for Setosa, Versicolor, and Virginica.
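To make the softmax step concrete, here is a minimal Python sketch. The raw output values are made up for illustration and do not come from the article's network, and the function name `softmax` is our own choice.

```python
import math

def softmax(raw_outputs):
    """Convert raw output values into predicted probabilities that sum to 1."""
    # Subtracting the largest value before exponentiating avoids overflow
    # and does not change the resulting probabilities.
    largest = max(raw_outputs)
    exps = [math.exp(x - largest) for x in raw_outputs]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw outputs, in the order (Setosa, Versicolor, Virginica)
raw = [1.43, -0.40, 0.23]
p_setosa, p_versicolor, p_virginica = softmax(raw)
print(p_setosa, p_versicolor, p_virginica)
```

Every predicted probability depends on all three raw outputs, which is why changing the raw output for Setosa also changes the predicted probability for Virginica.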

Only the green crinkled surface is directly influenced by $b_3$. This green surface represents the raw output for Setosa and is one of the inputs to the softmax function. The green surface itself is formed by summing the blue and orange surfaces and then adding $b_3$.
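Because the green surface is a sum plus a bias, nudging $b_3$ moves the raw output for Setosa by exactly the same amount, so its derivative with respect to $b_3$ is 1. The sketch below checks this with a central finite difference; the surface heights `blue` and `orange` are hypothetical numbers at a single input point, not values from the article.

```python
def raw_setosa(blue, orange, b3):
    """Green surface: add the blue and orange surfaces, then add b_3."""
    return blue + orange + b3

# Hypothetical heights of the blue and orange surfaces at one input point
blue, orange, b3 = 0.8, -0.2, 1.0

# Central finite difference: how much does the raw Setosa output move per unit of b_3?
h = 1e-6
slope = (raw_setosa(blue, orange, b3 + h) - raw_setosa(blue, orange, b3 - h)) / (2 * h)
print(slope)  # approximately 1.0
```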

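Because the cross-entropy loss is linked to $b_3$ through the predicted probability for Virginica and the raw output for Setosa, we can apply the chain rule to compute the derivative of the cross entropy with respect to $b_3$:

$$
\frac{\partial\,\text{CE}}{\partial b_3} = \frac{\partial\,\text{CE}}{\partial p_{\text{Virginica}}} \times \frac{\partial p_{\text{Virginica}}}{\partial \text{Raw}_{\text{Setosa}}} \times \frac{\partial \text{Raw}_{\text{Setosa}}}{\partial b_3}
$$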

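As before, we start by computing the derivative of the cross entropy with respect to the predicted probability for Virginica. Substituting the cross-entropy equation, $\text{CE} = -\log(p_{\text{Virginica}})$, and simplifying, we obtain:

$$
\frac{\partial\,\text{CE}}{\partial p_{\text{Virginica}}} = -\frac{1}{p_{\text{Virginica}}}
$$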
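A quick numerical sanity check that the derivative of $-\log(p_{\text{Virginica}})$ is $-1/p_{\text{Virginica}}$. This sketch assumes the natural logarithm and a hypothetical predicted probability of 0.21 for Virginica; the helper names are our own.

```python
import math

def cross_entropy(p_observed):
    """Cross entropy for one observation, given the predicted probability of the observed class."""
    return -math.log(p_observed)

def d_cross_entropy_d_p(p_observed):
    """Analytical derivative of -log(p) with respect to p."""
    return -1.0 / p_observed

p_virginica = 0.21  # hypothetical predicted probability for Virginica

# Central finite difference versus the analytical formula
h = 1e-6
numeric = (cross_entropy(p_virginica + h) - cross_entropy(p_virginica - h)) / (2 * h)
analytic = d_cross_entropy_d_p(p_virginica)
print(numeric, analytic)  # both close to -4.76
```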

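The other two factors in the chain rule are the softmax derivative of the Virginica probability with respect to a different class's raw output, $\partial p_{\text{Virginica}} / \partial \text{Raw}_{\text{Setosa}} = -p_{\text{Virginica}}\, p_{\text{Setosa}}$, and $\partial \text{Raw}_{\text{Setosa}} / \partial b_3 = 1$, because $b_3$ is simply added to the raw output. Multiplying the three factors, when the predicted probability for Virginica is used to compute the cross entropy, the derivative of the cross entropy with respect to $b_3$ is:

$$
\frac{\partial\,\text{CE}}{\partial b_3} = \left(-\frac{1}{p_{\text{Virginica}}}\right) \times \left(-p_{\text{Virginica}}\, p_{\text{Setosa}}\right) \times 1 = p_{\text{Setosa}}
$$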
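The sketch below multiplies the three chain-rule factors and compares the product with a direct finite-difference derivative of the cross entropy with respect to $b_3$; both should equal $p_{\text{Setosa}}$. It assumes hypothetical raw outputs, with $b_3$ added only to the Setosa raw output, and the helper names (`softmax`, `probs`, `ce_virginica`) are illustrative.

```python
import math

def softmax(raw_outputs):
    largest = max(raw_outputs)
    exps = [math.exp(x - largest) for x in raw_outputs]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical values: the Setosa raw output is SETOSA_SUM + b_3, while the
# Versicolor and Virginica raw outputs do not depend on b_3 at all.
SETOSA_SUM, VERSICOLOR_RAW, VIRGINICA_RAW = 0.43, -0.40, 0.23

def probs(b3):
    """Predicted probabilities (Setosa, Versicolor, Virginica) as a function of b_3."""
    return softmax([SETOSA_SUM + b3, VERSICOLOR_RAW, VIRGINICA_RAW])

def ce_virginica(b3):
    """Cross entropy when the observed species is Virginica (index 2)."""
    return -math.log(probs(b3)[2])

b3 = 1.0
p_setosa, p_versicolor, p_virginica = probs(b3)

# Chain rule: dCE/dp_Virginica * dp_Virginica/dRaw_Setosa * dRaw_Setosa/db_3
chain = (-1.0 / p_virginica) * (-p_virginica * p_setosa) * 1.0

# Direct check with a central finite difference on b_3
h = 1e-6
numeric = (ce_virginica(b3 + h) - ce_virginica(b3 - h)) / (2 * h)
print(chain, numeric, p_setosa)  # all three agree up to rounding
```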

Now let us apply the same reasoning for Versicolor.

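When the observed measurements correspond to Versicolor, the cross entropy is $-\log(p_{\text{Versicolor}})$. Following the same steps used for Virginica, the resulting derivative is again the predicted probability for Setosa:

$$
\frac{\partial\,\text{CE}}{\partial b_3} = \left(-\frac{1}{p_{\text{Versicolor}}}\right) \times \left(-p_{\text{Versicolor}}\, p_{\text{Setosa}}\right) \times 1 = p_{\text{Setosa}}
$$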
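For the Versicolor case, the factor that changes is the softmax derivative. This sketch, using hypothetical raw outputs and our own helper names, compares $-p_{\text{Versicolor}}\,p_{\text{Setosa}}$ with a finite-difference estimate of how $p_{\text{Versicolor}}$ responds to the raw output for Setosa, then multiplies the three chain-rule factors, which should give $p_{\text{Setosa}}$.

```python
import math

def softmax(raw_outputs):
    largest = max(raw_outputs)
    exps = [math.exp(x - largest) for x in raw_outputs]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw outputs (Setosa, Versicolor, Virginica)
SETOSA_RAW, VERSICOLOR_RAW, VIRGINICA_RAW = 1.43, -0.40, 0.23

def p_versicolor_at(raw_setosa):
    """Predicted probability for Versicolor as the raw Setosa output varies."""
    return softmax([raw_setosa, VERSICOLOR_RAW, VIRGINICA_RAW])[1]

p_setosa, p_versicolor, _ = softmax([SETOSA_RAW, VERSICOLOR_RAW, VIRGINICA_RAW])

# Softmax derivative with respect to a different class's raw output
analytic_slope = -p_versicolor * p_setosa
h = 1e-6
numeric_slope = (p_versicolor_at(SETOSA_RAW + h) - p_versicolor_at(SETOSA_RAW - h)) / (2 * h)

# Chain rule for the Versicolor case; the last factor is dRaw_Setosa/db_3 = 1
d_ce_d_b3 = (-1.0 / p_versicolor) * analytic_slope * 1.0
print(numeric_slope, analytic_slope, d_ce_d_b3, p_setosa)
```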

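At this point, we can summarize the derivative of the cross entropy with respect to $b_3$ for each observed species:

$$
\frac{\partial\,\text{CE}}{\partial b_3} =
\begin{cases}
p_{\text{Setosa}} - 1 & \text{if the observed species is Setosa} \\
p_{\text{Setosa}} & \text{if the observed species is Versicolor} \\
p_{\text{Setosa}} & \text{if the observed species is Virginica}
\end{cases}
$$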
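The summary can be written as a small function and checked against finite differences for all three observed species. The raw outputs and the helper names below are illustrative assumptions, not part of the original article.

```python
import math

def softmax(raw_outputs):
    largest = max(raw_outputs)
    exps = [math.exp(x - largest) for x in raw_outputs]
    total = sum(exps)
    return [e / total for e in exps]

CLASSES = ["Setosa", "Versicolor", "Virginica"]

def d_ce_d_b3(p_setosa, observed):
    """Summary: p_Setosa - 1 if the observed species is Setosa, otherwise p_Setosa."""
    return p_setosa - 1.0 if observed == "Setosa" else p_setosa

def cross_entropy(raw_outputs, observed):
    """-log of the predicted probability for the observed species."""
    return -math.log(softmax(raw_outputs)[CLASSES.index(observed)])

def shift_setosa(raw_outputs, delta):
    """Nudging b_3 by delta nudges only the Setosa raw output by delta."""
    return [raw_outputs[0] + delta] + raw_outputs[1:]

raw = [1.43, -0.40, 0.23]  # hypothetical raw outputs (Setosa, Versicolor, Virginica)
p_setosa = softmax(raw)[0]

h = 1e-6
for observed in CLASSES:
    numeric = (cross_entropy(shift_setosa(raw, h), observed)
               - cross_entropy(shift_setosa(raw, -h), observed)) / (2 * h)
    print(observed, numeric, d_ce_d_b3(p_setosa, observed))
```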

It is important to note that we are currently targeting $b_3$, which only influences the raw output for Setosa. To influence the raw outputs for Versicolor and Virginica, we must instead target $b_4$ and $b_5$, respectively.

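The corresponding derivatives are:

$$
\frac{\partial\,\text{CE}}{\partial b_4} =
\begin{cases}
p_{\text{Versicolor}} & \text{if the observed species is Setosa} \\
p_{\text{Versicolor}} - 1 & \text{if the observed species is Versicolor} \\
p_{\text{Versicolor}} & \text{if the observed species is Virginica}
\end{cases}
$$

$$
\frac{\partial\,\text{CE}}{\partial b_5} =
\begin{cases}
p_{\text{Virginica}} & \text{if the observed species is Setosa} \\
p_{\text{Virginica}} & \text{if the observed species is Versicolor} \\
p_{\text{Virginica}} - 1 & \text{if the observed species is Virginica}
\end{cases}
$$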
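Because each bias shifts exactly one raw output, the derivatives for $b_3$, $b_4$, and $b_5$ follow one pattern: a bias's derivative is its class's predicted probability, minus 1 if that class is the observed one. Here is a sketch of that pattern, checked against finite differences; the raw outputs and function names are hypothetical.

```python
import math

def softmax(raw_outputs):
    largest = max(raw_outputs)
    exps = [math.exp(x - largest) for x in raw_outputs]
    total = sum(exps)
    return [e / total for e in exps]

CLASSES = ["Setosa", "Versicolor", "Virginica"]  # biases b_3, b_4, b_5 in this order

def bias_gradients(raw_outputs, observed):
    """Analytical [dCE/db_3, dCE/db_4, dCE/db_5] for one observation."""
    probs = softmax(raw_outputs)
    # p_k - 1 for the observed class, p_k for the other two
    return [p - (1.0 if cls == observed else 0.0) for p, cls in zip(probs, CLASSES)]

def numeric_bias_gradients(raw_outputs, observed, h=1e-6):
    """Central finite differences; bias k shifts only raw output k."""
    def ce(r):
        return -math.log(softmax(r)[CLASSES.index(observed)])
    grads = []
    for k in range(len(raw_outputs)):
        up = [x + h if i == k else x for i, x in enumerate(raw_outputs)]
        down = [x - h if i == k else x for i, x in enumerate(raw_outputs)]
        grads.append((ce(up) - ce(down)) / (2 * h))
    return grads

raw = [1.43, -0.40, 0.23]  # hypothetical raw outputs
for observed in CLASSES:
    print(observed, bias_gradients(raw, observed), numeric_bias_gradients(raw, observed))
```

Since the predicted probabilities sum to 1, the three bias derivatives for any single observation sum to 0.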

Now that all the derivatives have been calculated, we will begin optimizing the bias terms using backpropagation in the next article.

Looking for an easier way to install tools, libraries, or entire repositories?
Try Installerpedia: a community-driven, structured installation platform that lets you install almost anything with minimal hassle and clear, reliable guidance.

Just run:

ipm install repo-name

… and you’re done! 🚀


🔗 Explore Installerpedia here
