SmoothL1Loss
- class torch.nn.SmoothL1Loss(size_average=None, reduce=None, reduction='mean', beta=1.0)
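Creates a criterion that uses a squared term if the absolute element-wise error falls below beta and an L1 term otherwise. It is less sensitive to outliers than torch.nn.MSELoss and in some cases prevents exploding gradients (e.g. see the paper Fast R-CNN by Ross Girshick).
For a batch of size $N$, the unreduced loss can be described as:
$$\ell(x, y) = L = \{l_1, \dots, l_N\}^T$$
with
$$l_n = \begin{cases} 0.5 (x_n - y_n)^2 / \beta, & \text{if } |x_n - y_n| < \beta \\ |x_n - y_n| - 0.5 \beta, & \text{otherwise} \end{cases}$$
If reduction is not 'none', then:
$$\ell(x, y) = \begin{cases} \operatorname{mean}(L), & \text{if reduction} = \text{'mean'} \\ \operatorname{sum}(L), & \text{if reduction} = \text{'sum'} \end{cases}$$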
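The piecewise definition above can be checked against the built-in module with a small sketch. The helper `smooth_l1_reference` is a hypothetical name introduced here for illustration only; it is not part of PyTorch, and the sample values are arbitrary.

```python
import torch
import torch.nn as nn


def smooth_l1_reference(x, y, beta=1.0, reduction="mean"):
    """Hypothetical reference helper following the piecewise definition."""
    diff = (x - y).abs()
    if beta == 0:
        # With beta = 0 there is no quadratic region (and no division by zero).
        loss = diff
    else:
        # Quadratic term below beta, shifted L1 term otherwise.
        loss = torch.where(diff < beta, 0.5 * diff ** 2 / beta, diff - 0.5 * beta)
    if reduction == "mean":
        return loss.mean()
    if reduction == "sum":
        return loss.sum()
    return loss  # reduction == "none"


x = torch.tensor([0.0, 0.5, 2.0, -3.0])
y = torch.zeros(4)

manual = smooth_l1_reference(x, y, beta=1.0, reduction="none")
builtin = nn.SmoothL1Loss(reduction="none", beta=1.0)(x, y)

print(manual)   # per-element losses from the formula
print(builtin)  # per-element losses from torch.nn.SmoothL1Loss
```

For these inputs, the two errors below beta (0.0 and 0.5) fall in the quadratic region, and the two larger errors fall in the linear region.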
Note
Smooth L1 loss can be seen as exactly L1Loss, but with the $|x - y| < \beta$ portion replaced with a quadratic function such that its slope is 1 at $|x - y| = \beta$. The quadratic segment smooths the L1 loss near $|x - y| = 0$.
Note
Smooth L1 loss is closely related to HuberLoss, being equivalent to $\text{huber}(x, y) / \beta$ (note that Smooth L1's beta hyper-parameter is also known as delta for Huber). This leads to the following differences:
  - As beta -> 0, Smooth L1 loss converges to L1Loss, while HuberLoss converges to a constant 0 loss. When beta is 0, Smooth L1 loss is equivalent to L1 loss.
  - As beta -> $+\infty$, Smooth L1 loss converges to a constant 0 loss, while HuberLoss converges to MSELoss.
  - For Smooth L1 loss, as beta varies, the L1 segment of the loss has a constant slope of 1. For HuberLoss, the slope of the L1 segment is beta.
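The relationship to HuberLoss and L1Loss described in the notes can be illustrated numerically. This is a minimal sketch using the functional forms `smooth_l1_loss`, `huber_loss`, and `l1_loss` from `torch.nn.functional`; the input values and the choice `beta = 2.0` are arbitrary assumptions made for the example.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([-3.0, -0.4, 0.0, 0.7, 5.0])
y = torch.zeros(5)
beta = 2.0

smooth = F.smooth_l1_loss(x, y, reduction="none", beta=beta)
huber = F.huber_loss(x, y, reduction="none", delta=beta)
l1 = F.l1_loss(x, y, reduction="none")

# Smooth L1 equals Huber divided by beta (with delta set to beta).
same_up_to_scale = torch.allclose(smooth, huber / beta)

# With beta = 0, Smooth L1 is equivalent to L1.
smooth_zero = F.smooth_l1_loss(x, y, reduction="none", beta=0.0)
matches_l1 = torch.allclose(smooth_zero, l1)

# Slope in the linear region: 1 for Smooth L1, beta for Huber.
p = torch.tensor(5.0, requires_grad=True)
zero = torch.tensor(0.0)
(g_smooth,) = torch.autograd.grad(F.smooth_l1_loss(p, zero, beta=beta), p)
(g_huber,) = torch.autograd.grad(F.huber_loss(p, zero, delta=beta), p)

print(same_up_to_scale, matches_l1)
print(g_smooth.item(), g_huber.item())
```

The gradient check evaluates both losses at an error of 5.0, which lies beyond `beta`, so it probes only the linear segment.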
- Parameters
size_average (bool, optional) – Deprecated (see reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field size_average is set to False, the losses are instead summed for each minibatch. Ignored when reduce is False. Default: True
reduce (bool, optional) – Deprecated (see reduction). By default, the losses are averaged or summed over observations for each minibatch depending on size_average. When reduce is False, returns a loss per batch element instead and ignores size_average. Default: True
reduction (str, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Note: size_average and reduce are in the process of being deprecated, and in the meantime, specifying either of those two args will override reduction. Default: 'mean'
beta (float, optional) – Specifies the threshold at which to change between L1 and L2 loss. The value must be non-negative. Default: 1.0
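As a brief illustration of the `reduction` parameter, the sketch below applies all three modes to the same inputs with the default `beta` of 1.0. The tensor values are arbitrary and chosen only so that both the quadratic and the linear branch are exercised.

```python
import torch
import torch.nn as nn

pred = torch.tensor([[0.2, 1.5], [3.0, -1.0]])
target = torch.zeros(2, 2)

# 'none' keeps one loss value per element.
per_element = nn.SmoothL1Loss(reduction="none")(pred, target)
# 'sum' adds all element losses together.
total = nn.SmoothL1Loss(reduction="sum")(pred, target)
# 'mean' (the default) divides the sum by the number of elements.
average = nn.SmoothL1Loss(reduction="mean")(pred, target)

print(per_element)
print(total.item(), average.item())
```

Here `average` equals `total` divided by `pred.numel()`, which matches the description of 'mean' above.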
- Shape:
Input: $(*)$, where $*$ means any number of dimensions.
Target: $(*)$, same shape as the input.
Output: scalar. If reduction is 'none', then $(*)$, same shape as the input.
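The shape rules can be confirmed with a short sketch. The 4-dimensional shape used here is an arbitrary assumption; any shape should behave the same way as long as input and target match.

```python
import torch
import torch.nn as nn

inp = torch.randn(4, 3, 8, 8)
tgt = torch.randn(4, 3, 8, 8)

reduced = nn.SmoothL1Loss()(inp, tgt)                    # default reduction='mean'
unreduced = nn.SmoothL1Loss(reduction="none")(inp, tgt)  # no reduction

print(reduced.shape)    # scalar (0-dimensional) tensor
print(unreduced.shape)  # same shape as the input
```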