SmoothL1Loss

class torch.nn.SmoothL1Loss(size_average=None, reduce=None, reduction='mean', beta=1.0)

Creates a criterion that uses a squared term if the absolute element-wise error falls below beta and an L1 term otherwise. It is less sensitive to outliers than torch.nn.MSELoss and in some cases prevents exploding gradients (e.g. see the paper Fast R-CNN by Ross Girshick).

For a batch of size N, the unreduced loss can be described as:

$$\ell(x, y) = L = \{l_1, \dots, l_N\}^T$$

with

$$l_n = \begin{cases}
0.5 (x_n - y_n)^2 / beta, & \text{if } |x_n - y_n| < beta \\
|x_n - y_n| - 0.5 \cdot beta, & \text{otherwise}
\end{cases}$$

If reduction is not 'none', then:

$$\ell(x, y) = \begin{cases}
\operatorname{mean}(L), & \text{if reduction} = \text{'mean';} \\
\operatorname{sum}(L), & \text{if reduction} = \text{'sum'.}
\end{cases}$$
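The piecewise rule above can be sketched in plain Python. This is an illustration of the formula only, not PyTorch's actual implementation:

```python
# Pure-Python sketch of the per-element smooth L1 rule (illustration only,
# not the library's implementation).

def smooth_l1(x, y, beta=1.0):
    """Unreduced smooth L1 losses between predictions x and targets y."""
    losses = []
    for xn, yn in zip(x, y):
        d = abs(xn - yn)
        if d < beta:
            losses.append(0.5 * d ** 2 / beta)   # quadratic segment
        else:
            losses.append(d - 0.5 * beta)        # linear (L1) segment
    return losses

# reduction='mean' averages the unreduced losses L
unreduced = smooth_l1([0.5, 2.0], [0.0, 0.0])    # [0.125, 1.5]
mean_loss = sum(unreduced) / len(unreduced)      # 0.8125
```

Note how the error 0.5 falls in the quadratic segment (0.5 · 0.5² / 1 = 0.125) while the error 2.0 falls in the linear segment (2.0 − 0.5 = 1.5).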

Note

Smooth L1 loss can be seen as exactly L1Loss, but with the $|x - y| < beta$ portion replaced with a quadratic function such that its slope is 1 at $|x - y| = beta$. The quadratic segment smooths the L1 loss near $|x - y| = 0$.

Note

Smooth L1 loss is closely related to HuberLoss, being equivalent to huber(x, y) / beta (note that Smooth L1's beta hyper-parameter is also known as delta for Huber). This leads to the following differences:

  • As beta -> 0, Smooth L1 loss converges to L1Loss, while HuberLoss converges to a constant 0 loss. When beta is 0, Smooth L1 loss is equivalent to L1 loss.

  • As beta -> +∞, Smooth L1 loss converges to a constant 0 loss, while HuberLoss converges to MSELoss.

  • For Smooth L1 loss, as beta varies, the L1 segment of the loss has a constant slope of 1. For HuberLoss, the slope of the L1 segment is beta.
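The huber(x, y) / beta relation can be checked numerically. Both helpers below are illustrative re-implementations of the per-element formulas, not the torch.nn modules:

```python
# Numeric check that smooth_l1(d, beta) == huber(d, delta=beta) / beta
# for an element-wise error d (illustrative, not the library code).

def smooth_l1_elem(d, beta):
    d = abs(d)
    return 0.5 * d ** 2 / beta if d < beta else d - 0.5 * beta

def huber_elem(d, delta):
    d = abs(d)
    return 0.5 * d ** 2 if d < delta else delta * (d - 0.5 * delta)

beta = 2.0
for diff in [0.1, 1.0, 1.9, 2.5, 10.0]:
    assert abs(smooth_l1_elem(diff, beta) - huber_elem(diff, beta) / beta) < 1e-12
```

Dividing Huber by its delta rescales the quadratic segment by 1/delta and fixes the linear segment's slope at 1, which is exactly the slope difference listed above.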

Parameters
  • size_average (bool, optional) – Deprecated (see reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field size_average is set to False, the losses are instead summed for each minibatch. Ignored when reduce is False. Default: True

  • reduce (bool, optional) – Deprecated (see reduction). By default, the losses are averaged or summed over observations for each minibatch depending on size_average. When reduce is False, returns a loss per batch element instead and ignores size_average. Default: True

  • reduction (str, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Note: size_average and reduce are in the process of being deprecated, and in the meantime, specifying either of those two args will override reduction. Default: 'mean'

  • beta (float, optional) – Specifies the threshold at which to change between L1 and L2 loss. The value must be non-negative. Default: 1.0

Shape:
  • Input: (*), where * means any number of dimensions.

  • Target: (*), same shape as the input.

  • Output: scalar. If reduction is 'none', then (*), same shape as the input.
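A minimal usage example, assuming PyTorch is installed:

```python
import torch
import torch.nn as nn

loss_fn = nn.SmoothL1Loss(beta=1.0)            # default reduction='mean'
input = torch.tensor([0.5, 2.0], requires_grad=True)
target = torch.tensor([0.0, 0.0])

loss = loss_fn(input, target)                  # (0.125 + 1.5) / 2 = 0.8125
loss.backward()                                # gradients flow back to `input`
```

With reduction='none' the same call would instead return the per-element tensor [0.125, 1.5].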

forward(input, target)

Runs the forward pass.

Return type

Tensor