Loss Functions in Deep Learning with PyTorch
Goal¶
This post compares common loss functions in deep learning and shows how to compute them with PyTorch.
The following loss functions are covered in this post:
- Mean Absolute Error (L1 Loss)
- Mean Square Error (L2 Loss)
- Binary Cross Entropy (BCE)
- Kullback-Leibler divergence (KL divergence)
Libraries¶
In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
%matplotlib inline
Create data for calculating the loss¶
In [2]:
x = torch.Tensor(np.linspace(-1, 1, 100))
y = torch.Tensor(np.zeros(100))
In [3]:
plt.plot(x.numpy(), y.numpy(), '.-');
Loss Functions¶
Mean Absolute Error (L1 Loss)¶
$$loss(x, y) = |x - y|$$
In [4]:
criterion = nn.L1Loss()
loss = criterion(x, y)
loss
Out[4]:
In [5]:
plt.plot(x.numpy(), np.abs(x.numpy() - y.numpy()));
plt.title('MAE - L1 Loss');
plt.xlabel('predicted y');
plt.ylabel('loss');
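As a quick sanity check (an added snippet, not in the original notebook), the value returned by nn.L1Loss above can be reproduced by hand, assuming its default reduction='mean':

# Manual mean absolute error; should match nn.L1Loss() with reduction='mean'
torch.mean(torch.abs(x - y))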
Mean Square Error Loss (L2 Loss)¶
$$loss(x, y) = (x - y)^{2}$$
In [6]:
criterion = nn.MSELoss()
criterion(x, y)
Out[6]:
In [7]:
plt.plot(x.numpy(), (x.numpy() - y.numpy())**2);
plt.title('MSE - L2 Loss');
plt.xlabel('predicted y');
plt.ylabel('loss');
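Similarly (an added check, not in the original notebook), the MSE value can be reproduced by hand, assuming the default reduction='mean':

# Manual mean squared error; should match nn.MSELoss() with reduction='mean'
torch.mean((x - y) ** 2)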
Binary Cross-Entropy Loss¶
When the target $y$ is binary, cross entropy is called Binary Cross Entropy (BCE):
$$loss(x, y) = -\left[ y \log x + (1 - y) \log (1 - x) \right]$$
In [47]:
x = torch.Tensor(np.linspace(0.01, .99, 100))
y_label = torch.Tensor(np.ones(100))
y_pred = torch.Tensor(np.linspace(0.01, .6, 100))
m = nn.Sigmoid()
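The nn.Sigmoid module defined above is not used in the cells below. As a side note (an added sketch, not part of the original notebook), when a model outputs raw, unbounded logits, nn.BCEWithLogitsLoss applies the sigmoid internally and is more numerically stable than chaining nn.Sigmoid with nn.BCELoss; the logits tensor here is a hypothetical example:

# Hypothetical raw model outputs (logits); BCEWithLogitsLoss applies the sigmoid internally
logits = torch.linspace(-3, 3, 100)
criterion_logits = nn.BCEWithLogitsLoss()
criterion_logits(logits, y_label)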
In [52]:
criterion = nn.BCELoss()
criterion(y_pred, y_label)
Out[52]:
In [49]:
plt.plot(x.numpy(), -np.multiply(y_label.numpy(), np.log(x.numpy())) - np.multiply(1 - y_label.numpy(), np.log(1 - x.numpy())));
plt.title('Binary Cross Entropy Loss');
plt.xlabel('Predicted y');
plt.ylabel('Loss');
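As with the other losses, the BCE value can be reproduced by hand (an added check, assuming the default reduction='mean'):

# Manual binary cross entropy; should match nn.BCELoss() with reduction='mean'
(-(y_label * torch.log(y_pred) + (1 - y_label) * torch.log(1 - y_pred))).mean()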
Kullback-Leibler divergence¶
$$loss(x, y) = y \cdot (\log y - \log x) = y \cdot \log \frac{y}{x}$$
In [51]:
criterion = nn.KLDivLoss(reduction='batchmean')
criterion(torch.log(y_pred), y_label)  # nn.KLDivLoss expects the input to be log-probabilities
Out[51]:
Note: reduction='batchmean' is used here to suppress the warning raised by reduction='mean', which does not return the true KL divergence value. According to the warning, 'mean' will be changed to behave like 'batchmean' in the next major release.
In [50]:
plt.plot(x.numpy(), y_label.numpy() * np.log(y_label.numpy() / x.numpy()));
plt.title('Kullback-Leibler divergence');
plt.xlabel('Predicted y');
plt.ylabel('KL Divergence');
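The value returned by nn.KLDivLoss can also be reproduced by hand (an added check): with reduction='batchmean', PyTorch sums the pointwise terms $y \cdot (\log y - \log x)$ and divides by the batch size.

# Manual KL divergence with 'batchmean' reduction; y_label is all ones, so log(y_label) is zero
(y_label * (torch.log(y_label) - torch.log(y_pred))).sum() / y_label.shape[0]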