*Memos:
- My post explains layers in PyTorch.
- My post explains activation functions in PyTorch.
- My post explains optimizers in PyTorch.
A loss function is a function that computes the mean (average) of the sum of the losses (differences) between a model's predictions and the train or test data, either to optimize a model during training or to evaluate how good a model is during testing. *A loss function is also called a Cost Function or Error Function.
There are popular loss functions as shown below:
(1) L1 Loss:
- can compute the mean(average) of the sum of the absolute losses(differences) between model's predictions and train or test data.
- 's formula: (1/n) * Σ |y_true - y_pred|, where the sum runs over the n predictions.
- is used for a regression model.
- is also called Mean Absolute Error(MAE).
- is L1Loss() in PyTorch (a usage sketch is shown after this list). *My post explains L1Loss().
- 's pros:
- It's less sensitive to outliers and anomalies.
- The losses can be easily compared because they are just made absolute, so their range is not big.
- 's cons:
- Not all losses can be differentiable because the absolute value is not differentiable at zero.
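A minimal usage sketch of L1Loss() (the predictions and targets below are made-up values, just for illustration):
import torch
import torch.nn as nn
loss_fn = nn.L1Loss()  # default reduction='mean'
y_pred = torch.tensor([2.0, -1.0, 3.0])  # made-up model predictions
y_true = torch.tensor([1.5, 0.0, 3.0])   # made-up train or test data
# mean(|0.5| + |-1.0| + |0.0|) = 1.5 / 3 = 0.5
print(loss_fn(y_pred, y_true))  # tensor(0.5000)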
(2) L2 Loss:
- can compute the mean(average) of the sum of the squared losses(differences) between model's predictions and train or test data.
- 's formula: (1/n) * Σ (y_true - y_pred)^2, where the sum runs over the n predictions.
- is used for a regression model.
- is also called Mean Squared Error(MSE).
- is MSELoss() in PyTorch (a usage sketch is shown after this list). *My post explains MSELoss().
- 's pros:
- All squared losses can be differentiable.
- 's cons:
- It's sensitive to outliers and anomalies.
- The losses cannot be easily compared because they are squared, so their range is big.
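A minimal usage sketch of MSELoss() with the same made-up values as the L1 Loss sketch above:
import torch
import torch.nn as nn
loss_fn = nn.MSELoss()  # default reduction='mean'
y_pred = torch.tensor([2.0, -1.0, 3.0])  # made-up model predictions
y_true = torch.tensor([1.5, 0.0, 3.0])   # made-up train or test data
# mean(0.5^2 + (-1.0)^2 + 0.0^2) = 1.25 / 3 ≈ 0.4167
print(loss_fn(y_pred, y_true))  # tensor(0.4167)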
(3) Huber Loss:
- can do a computation similar to either L1 Loss or L2 Loss, depending on how the absolute losses(differences) between model's predictions and train or test data compare with delta which you set. *Memos:
- delta is 1.0 basically.
- Be careful, the computation is not exactly the same as L1 Loss or L2 Loss according to the formulas below.
- 's formula. *The 1st one is the L2 Loss-like one and the 2nd one is the L1 Loss-like one, and the mean over the n predictions is then taken:
- 0.5 * (y_true - y_pred)^2 if |y_true - y_pred| <= delta
- delta * (|y_true - y_pred| - 0.5 * delta) otherwise
- is used for a regression model.
- is HuberLoss() in PyTorch (a usage sketch is shown after this list). *My post explains HuberLoss().
- with delta of 1.0 is the same as Smooth L1 Loss, which is SmoothL1Loss() in PyTorch.
- 's pros:
- It's less sensitive to outliers and anomalies.
- All losses can be differentiable.
- The losses can be more easily compared than with L2 Loss because only small losses are squared, so their range is smaller than with L2 Loss.
- 's cons:
- The computation is heavier than L1 Loss and L2 Loss because the formula is more complex than theirs.
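A minimal usage sketch of HuberLoss(), with made-up values chosen so that both branches of the formula are used:
import torch
import torch.nn as nn
loss_fn = nn.HuberLoss(delta=1.0)  # delta is 1.0 by default
y_pred = torch.tensor([2.0, -1.0, 3.0])  # made-up model predictions
y_true = torch.tensor([1.5, 1.0, 3.0])   # made-up train or test data
# |0.5| <= 1.0 -> 0.5 * 0.5^2 = 0.125 (L2 Loss-like)
# |-2.0| > 1.0 -> 1.0 * (2.0 - 0.5) = 1.5 (L1 Loss-like)
# |0.0| <= 1.0 -> 0.0 (L2 Loss-like)
# mean = 1.625 / 3 ≈ 0.5417
print(loss_fn(y_pred, y_true))  # tensor(0.5417)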
(4) BCE(Binary Cross Entropy) Loss:
- can compute the mean(average) of the sum of the losses(differences) between model's binary predictions and binary train or test data.
- 's formula: -(1/n) * Σ [ y_true * log(y_pred) + (1 - y_true) * log(1 - y_pred) ], where the sum runs over the n predictions.
- is used for Binary Classification in Computer Vision:
*Memos:
- Binary Classification is the technology to classify data into two classes.
- Computer Vision is the technology which enables a computer to understand objects.
- is also called Binary Cross Entropy or Log(Logarithmic) Loss.
- is BCELoss() in PyTorch:
*Memos:
- My post explains BCELoss().
- A usage sketch is shown after this list.
- Basically, Sigmoid is applied before BCE Loss:
*Memos:
- My post explains Sigmoid.
- There is BCEWithLogitsLoss() in PyTorch which is the combination of Sigmoid and BCE Loss. *My post explains BCEWithLogitsLoss().
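A minimal usage sketch of BCELoss() and BCEWithLogitsLoss() (the logits and binary targets below are made-up values, just for illustration):
import torch
import torch.nn as nn
y_logits = torch.tensor([1.5, -0.8, 2.2])  # made-up raw model outputs (logits)
y_true = torch.tensor([1.0, 0.0, 1.0])     # made-up binary train or test data
bce = nn.BCELoss()  # expects probabilities, so Sigmoid is applied first
print(bce(torch.sigmoid(y_logits), y_true))  # tensor(0.2259)
bce_logits = nn.BCEWithLogitsLoss()  # combines Sigmoid and BCE Loss, so it takes the raw logits
print(bce_logits(y_logits, y_true))  # same value as above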
(5) Cross Entropy Loss:
- can compute the mean(average) of the sum of the losses(differences) between model's predictions and train or test data:
- 's formula: -(1/n) * Σ_samples Σ_classes y_true * log(y_pred), where y_pred are the predicted class probabilities (softmax of the model's outputs) and n is the number of samples.
- is used for Multiclass Classification in Computer Vision. *Multiclass Classification is the technology to classify data into multiple classes.
- is CrossEntropyLoss() in PyTorch. *My post explains CrossEntropyLoss().
- 's code from scratch in PyTorch:
import torch
y_pred = torch.tensor([7.4, 2.8, -0.6])
y_train = torch.tensor([3.9, -5.1, 9.3])
def cross_entropy(y_pred, y_train):
    # Sum of -target * log(predicted probability) over all elements.
    return -torch.sum(y_train * torch.log(y_pred))
print(cross_entropy(y_pred.softmax(dim=0), y_train.softmax(dim=0)))
# tensor(7.9744)
y_pred = torch.tensor([[7.4, 2.8, -0.6], [1.3, 0.0, 4.2]])
y_train = torch.tensor([[3.9, -5.1, 9.3], [-5.3, 7.2, -8.4]])
print(cross_entropy(y_pred.softmax(dim=1), y_train.softmax(dim=1)))
# tensor(12.2420)
- 's code with mean from scratch in PyTorch:
import torch
y_pred = torch.tensor([7.4, 2.8, -0.6])
y_train = torch.tensor([3.9, -5.1, 9.3])
def cross_entropy(y_pred, y_train):
    # Divide the summed loss by the number of samples to get the mean.
    n_samples = y_pred.shape[0] if y_pred.ndim > 1 else 1
    return -torch.sum(y_train * torch.log(y_pred)) / n_samples
print(cross_entropy(y_pred.softmax(dim=0), y_train.softmax(dim=0)))
# tensor(7.9744)
y_pred = torch.tensor([[7.4, 2.8, -0.6], [1.3, 0.0, 4.2]])
y_train = torch.tensor([[3.9, -5.1, 9.3], [-5.3, 7.2, -8.4]])
print(cross_entropy(y_pred.softmax(dim=1), y_train.softmax(dim=1)))
# tensor(6.1210)
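For comparison, a sketch (my own check, not from the posts above) using the built-in CrossEntropyLoss(), which applies log softmax to the raw predictions internally and, with its default reduction='mean', averages over the samples:
import torch
import torch.nn as nn
loss_fn = nn.CrossEntropyLoss()  # default reduction='mean'
y_pred = torch.tensor([[7.4, 2.8, -0.6], [1.3, 0.0, 4.2]])    # raw outputs (logits)
y_train = torch.tensor([[3.9, -5.1, 9.3], [-5.3, 7.2, -8.4]])
# Only the targets are turned into probabilities here;
# CrossEntropyLoss() softmaxes y_pred internally.
print(loss_fn(y_pred, y_train.softmax(dim=1)))
# tensor(6.1210), matching the code with mean from scratch above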