BCEWithLogitsLoss

class torch.nn.BCEWithLogitsLoss(weight=None, size_average=None, reduce=None, reduction='mean', pos_weight=None)

This loss combines a Sigmoid layer and the BCELoss in one single class. This version is more numerically stable than using a plain Sigmoid followed by a BCELoss because, by combining the operations into one layer, it can take advantage of the log-sum-exp trick.
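The stability claim can be illustrated with a single extreme logit. This is a minimal sketch, assuming float32 tensors; the value -200 is an arbitrary choice made only to force underflow in the two-step version.

```python
import torch
import torch.nn as nn

# One extreme negative logit with a positive target. The exact loss is
# -log(sigmoid(-200)) = log(1 + exp(200)), which is approximately 200.
logit = torch.tensor([-200.0])
target = torch.tensor([1.0])

# Fused version: operates on the logit directly.
stable = nn.BCEWithLogitsLoss()(logit, target)

# Two-step version: sigmoid(-200) underflows to 0 in float32, so the
# logarithm inside BCELoss no longer sees the true probability.
naive = nn.BCELoss()(torch.sigmoid(logit), target)

print(stable.item())  # close to 200
print(naive.item())   # does not recover the true value
```

The fused loss stays close to the analytic value, while the two-step result depends on how BCELoss handles the logarithm of zero.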

The unreduced (i.e. with reduction set to 'none') loss can be described as:

$$\ell(x, y) = L = \{l_1,\dots,l_N\}^\top, \quad l_n = - w_n \left[ y_n \cdot \log \sigma(x_n) + (1 - y_n) \cdot \log (1 - \sigma(x_n)) \right],$$

where $N$ is the batch size. If reduction is not 'none' (default 'mean'), then

$$\ell(x, y) = \begin{cases} \operatorname{mean}(L), & \text{if reduction} = \text{`mean';}\\ \operatorname{sum}(L), & \text{if reduction} = \text{`sum'.} \end{cases}$$
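The relationship between the three reduction modes can be checked numerically. The sketch below assumes unit weights ($w_n = 1$) and rewrites $\log(1 - \sigma(x))$ as `logsigmoid(-x)` so the hand-written formula avoids taking the logarithm of zero; the tensor shapes and the seed are arbitrary.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(4, 3)                    # logits
y = torch.randint(0, 2, (4, 3)).float()  # binary targets

per_element = nn.BCEWithLogitsLoss(reduction="none")(x, y)
mean_loss = nn.BCEWithLogitsLoss(reduction="mean")(x, y)
sum_loss = nn.BCEWithLogitsLoss(reduction="sum")(x, y)

# Unreduced formula with w_n = 1; log(1 - sigmoid(x)) == logsigmoid(-x).
manual = -(y * F.logsigmoid(x) + (1 - y) * F.logsigmoid(-x))

print(per_element.shape)                                # same shape as the input
print(torch.allclose(per_element, manual, atol=1e-6))   # matches the formula
print(torch.allclose(mean_loss, per_element.mean()))    # 'mean' averages all elements
print(torch.allclose(sum_loss, per_element.sum()))      # 'sum' adds all elements
```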

This is used for measuring the error of a reconstruction in, for example, an auto-encoder. Note that the targets t[i] should be numbers between 0 and 1.
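Targets do not have to be exactly 0 or 1; any value in between is accepted. As a rough illustration (the target value 0.3 and the offsets of 1.0 are arbitrary assumptions), the loss for a soft target is smallest when the sigmoid of the logit equals that target:

```python
import torch
import torch.nn as nn

criterion = nn.BCEWithLogitsLoss()
target = torch.tensor([0.3])  # a soft target strictly between 0 and 1

# Logit whose sigmoid equals the target: log(t / (1 - t)).
best_logit = torch.log(target / (1 - target))

at_optimum = criterion(best_logit, target)
below = criterion(best_logit - 1.0, target)
above = criterion(best_logit + 1.0, target)

print(at_optimum.item() < below.item())  # moving the logit down increases the loss
print(at_optimum.item() < above.item())  # moving the logit up increases the loss
```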

It’s possible to trade off recall and precision by adding weights to positive examples. In the case of multi-label classification the loss can be described as:

$$\ell_c(x, y) = L_c = \{l_{1,c},\dots,l_{N,c}\}^\top, \quad l_{n,c} = - w_{n,c} \left[ p_c y_{n,c} \cdot \log \sigma(x_{n,c}) + (1 - y_{n,c}) \cdot \log (1 - \sigma(x_{n,c})) \right],$$

where $c$ is the class number ($c > 1$ for multi-label binary classification, $c = 1$ for single-label binary classification), $n$ is the number of the sample in the batch and $p_c$ is the weight of the positive answer for the class $c$.

$p_c > 1$ increases the recall, $p_c < 1$ increases the precision.

For example, if a dataset contains 100 positive and 300 negative examples of a single class, then pos_weight for the class should be equal to $\frac{300}{100} = 3$. The loss would act as if the dataset contains $3 \times 100 = 300$ positive examples.
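The 100-positive / 300-negative case can be sketched as follows. The logit value 0.5 and the variable names are illustrative assumptions; the point is only that pos_weight rescales the loss of positive targets and leaves negative targets untouched.

```python
import torch
import torch.nn as nn

num_pos, num_neg = 100, 300
pos_weight = torch.tensor([num_neg / num_pos])  # 300 / 100 = 3.0

logits = torch.tensor([[0.5], [0.5]])
targets = torch.tensor([[1.0], [0.0]])  # one positive, one negative example

plain = nn.BCEWithLogitsLoss(reduction="none")(logits, targets)
weighted = nn.BCEWithLogitsLoss(reduction="none", pos_weight=pos_weight)(logits, targets)

# Positive row is scaled by pos_weight, negative row is unchanged.
ratio = weighted / plain
print(ratio)
```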

Examples

>>> import torch
>>> target = torch.ones([10, 64], dtype=torch.float32)  # 64 classes, batch size = 10
>>> output = torch.full([10, 64], 1.5)  # A prediction (logit)
>>> pos_weight = torch.ones([64])  # All weights are equal to 1
>>> criterion = torch.nn.BCEWithLogitsLoss(pos_weight=pos_weight)
>>> criterion(output, target)  # -log(sigmoid(1.5))
tensor(0.20...)

In the above example, the pos_weight tensor’s elements correspond to the 64 distinct classes in a multi-label binary classification scenario. Each element in pos_weight is designed to adjust the loss function based on the imbalance between negative and positive samples for the respective class. This approach is useful in datasets with varying levels of class imbalance, ensuring that the loss calculation accurately accounts for the distribution in each class.

Parameters
  • weight (Tensor, optional) – a manual rescaling weight given to the loss of each batch element. If given, has to be a Tensor of size nbatch.

  • size_average (bool, optional) – Deprecated (see reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field size_average is set to False, the losses are instead summed for each minibatch. Ignored when reduce is False. Default: True

  • reduce (bool, optional) – Deprecated (see reduction). By default, the losses are averaged or summed over observations for each minibatch depending on size_average. When reduce is False, returns a loss per batch element instead and ignores size_average. Default: True

  • reduction (str, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Note: size_average and reduce are in the process of being deprecated, and in the meantime, specifying either of those two args will override reduction. Default: 'mean'

  • pos_weight (Tensor, optional) – a weight of positive examples to be broadcast with target. Must be a tensor with equal size along the class dimension to the number of classes. Pay close attention to PyTorch’s broadcasting semantics in order to achieve the desired operations. For a target of size [B, C, H, W] (where B is batch size), a pos_weight of size [B, C, H, W] will apply different pos_weights to each element of the batch, while a pos_weight of size [C, H, W] will apply the same pos_weights across the batch. To apply the same positive weight along all spatial dimensions for a 2D multi-class target [C, H, W] use: [C, 1, 1]. Default: None
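The broadcasting behaviour of pos_weight described above can be sketched for a [B, C, H, W] target. The sizes and the per-class weights (1, 2, 5) are made-up values for illustration; the sketch compares a [C, 1, 1] weight against the same weight expanded by hand to the full target shape.

```python
import torch
import torch.nn as nn

B, C, H, W = 2, 3, 4, 4
torch.manual_seed(0)
logits = torch.randn(B, C, H, W)
targets = torch.randint(0, 2, (B, C, H, W)).float()

# One positive weight per class, shared across the batch and all spatial positions.
pos_weight = torch.tensor([1.0, 2.0, 5.0]).view(C, 1, 1)

criterion = nn.BCEWithLogitsLoss(reduction="none", pos_weight=pos_weight)
loss = criterion(logits, targets)

# The same weights expanded explicitly to the full target shape.
expanded = pos_weight.expand(B, C, H, W).contiguous()
loss_expanded = nn.BCEWithLogitsLoss(reduction="none", pos_weight=expanded)(logits, targets)

print(loss.shape)                           # same shape as the target
print(torch.allclose(loss, loss_expanded))  # broadcasting gives the same result
```

Using the compact [C, 1, 1] shape avoids materialising a full-size weight tensor when the weight only varies by class.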

Shape:
  • Input: $(*)$, where $*$ means any number of dimensions.

  • Target: $(*)$, same shape as the input.

  • Output: scalar. If reduction is 'none', then $(*)$, same shape as input.

Examples

>>> import torch
>>> import torch.nn as nn
>>> loss = nn.BCEWithLogitsLoss()
>>> input = torch.randn(3, requires_grad=True)
>>> target = torch.empty(3).random_(2)
>>> output = loss(input, target)
>>> output.backward()

forward(input, target)

Runs the forward pass.

Return type

Tensor