
CTCLoss

class torch.nn.modules.loss.CTCLoss(blank=0, reduction='mean', zero_infinity=False)

The Connectionist Temporal Classification loss.

Calculates loss between a continuous (unsegmented) time series and a target sequence. CTCLoss sums over the probability of possible alignments of input to target, producing a loss value which is differentiable with respect to each input node. The alignment of input to target is assumed to be “many-to-one”, which limits the length of the target sequence such that it must be ≤ the input length.
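The sum over alignments can be computed with the forward (alpha) recursion described by Graves et al. The sketch below is a plain-Python, single-sequence illustration, not PyTorch's implementation. The names `ctc_neg_log_likelihood`, `brute_force_nll` and `collapse` are hypothetical. For tiny inputs, the brute-force helper enumerates every length-T path, so the recursion can be checked against the literal "sum over alignments".

```python
import math
from itertools import product

NEG_INF = float("-inf")


def logsumexp(*xs):
    """Numerically stable log(sum(exp(x))) over a few scalars."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_neg_log_likelihood(log_probs, target, blank=0):
    """-log p(target | input) for one sequence via the CTC forward recursion.

    log_probs: T rows, each a list of C log-probabilities.
    target: class indices, none equal to `blank`.
    """
    T = len(log_probs)
    # Extended label sequence: blank, y1, blank, y2, ..., yS, blank
    ext = [blank]
    for label in target:
        ext += [label, blank]
    L = len(ext)

    # alpha[s]: log-prob of all partial alignments that end at ext[s] at time t
    alpha = [NEG_INF] * L
    alpha[0] = log_probs[0][ext[0]]
    if L > 1:
        alpha[1] = log_probs[0][ext[1]]

    for t in range(1, T):
        new = [NEG_INF] * L
        for s in range(L):
            terms = [alpha[s]]            # stay on the same symbol
            if s >= 1:
                terms.append(alpha[s - 1])  # advance by one position
            # Skipping a blank is only allowed between two different labels
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                terms.append(alpha[s - 2])
            new[s] = logsumexp(*terms) + log_probs[t][ext[s]]
        alpha = new

    # Valid alignments end on the last label or on the trailing blank
    total = alpha[0] if L == 1 else logsumexp(alpha[L - 1], alpha[L - 2])
    return -total  # inf when no alignment fits in T frames


def collapse(path, blank=0):
    """Merge repeated symbols, then drop blanks (the many-to-one mapping)."""
    out, prev = [], None
    for p in path:
        if p != prev and p != blank:
            out.append(p)
        prev = p
    return out


def brute_force_nll(log_probs, target, blank=0):
    """Same quantity by enumerating every path (only feasible for tiny T, C)."""
    T, C = len(log_probs), len(log_probs[0])
    total = 0.0
    for path in product(range(C), repeat=T):
        if collapse(path, blank) == list(target):
            total += math.exp(sum(log_probs[t][k] for t, k in enumerate(path)))
    return math.inf if total == 0.0 else -math.log(total)
```

For a target with repeated labels such as `[1, 1, 1]` and T = 4, no path collapses to the target (each repeat needs a separating blank), so both functions return `inf`. That is the situation the `zero_infinity` option below is meant to handle.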

Parameters
  • blank (int, optional) – blank label. Default: 0.

  • reduction (str, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied; 'mean': the output losses will be divided by the target lengths and then the mean over the batch is taken; 'sum': the output losses will be summed. Default: 'mean'

  • zero_infinity (bool, optional) – Whether to zero infinite losses and the associated gradients. Default: False. Infinite losses mainly occur when the inputs are too short to be aligned to the targets.
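The effect of `reduction` and `zero_infinity` on per-sample losses can be sketched in plain Python. This is a minimal sketch based only on the parameter descriptions above. `reduce_ctc_losses` is a hypothetical helper that assumes positive target lengths. It does not model gradients, and PyTorch's internal handling of edge cases, such as zero-length targets, may differ.

```python
import math


def reduce_ctc_losses(losses, target_lengths, reduction="mean", zero_infinity=False):
    """Apply CTCLoss-style reduction to per-sample losses (hypothetical helper).

    losses: per-sample negative log-likelihoods (what reduction='none' yields).
    target_lengths: positive target length for each sample.
    """
    if zero_infinity:
        # Infinite losses (no feasible alignment) are replaced by zero.
        losses = [0.0 if math.isinf(x) else x for x in losses]
    if reduction == "none":
        return list(losses)
    if reduction == "sum":
        return sum(losses)
    if reduction == "mean":
        # Divide each loss by its own target length, then average over the batch.
        per_sample = [x / n for x, n in zip(losses, target_lengths)]
        return sum(per_sample) / len(per_sample)
    raise ValueError(f"unknown reduction: {reduction!r}")
```

With losses `[6.0, 4.0]` and target lengths `[3, 2]`, 'mean' gives (6/3 + 4/2) / 2 = 2.0, while 'sum' gives 10.0. Because of the per-sample division, 'mean' is not simply the batch average of the raw losses.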

Shape:
  • Log_probs: Tensor of size (T, N, C) or (T, C), where T = input length, N = batch size, and C = number of classes (including blank). The logarithmized probabilities of the outputs (e.g. obtained with torch.nn.functional.log_softmax()).

  • Targets: Tensor of size (N, S) or (sum(target_lengths)), where N = batch size and S = max target length, if shape is (N, S). It represents the target sequences. Each element in the target sequence is a class index, and the target index cannot be blank (default = 0). In the (N, S) form, targets are padded to the length of the longest sequence and stacked. In the (sum(target_lengths)) form, the targets are assumed to be un-padded and concatenated within 1 dimension.

  • Input_lengths: Tuple or tensor of size (N) or (), where N = batch size. It represents the lengths of the inputs (each must be ≤ T). The lengths are specified for each sequence to achieve masking under the assumption that sequences are padded to equal lengths.

  • Target_lengths: Tuple or tensor of size (N) or (), where N = batch size. It represents the lengths of the targets. Lengths are specified for each sequence to achieve masking under the assumption that sequences are padded to equal lengths. If the target shape is (N, S), target_lengths are effectively the stop index s_n for each target sequence, such that target_n = targets[n, 0:s_n] for each target in a batch. Lengths must each be ≤ S. If the targets are given as a 1d tensor that is the concatenation of individual targets, the target_lengths must add up to the total length of the tensor.

  • Output: scalar if reduction is 'mean' (default) or 'sum'. If reduction is 'none', then (N) if input is batched or () if input is unbatched, where N = batch size.
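The two target layouts and the length constraints can be illustrated with nested Python lists instead of tensors. Both helper names are hypothetical. `padded_to_concatenated` keeps `targets[n][0:s_n]` for each row, as described for Target_lengths. `min_input_length` encodes the standard CTC requirement that consecutive identical labels need a blank frame between them. Because of that requirement, an input can be too short even when its length is at least the target length.

```python
def padded_to_concatenated(targets, target_lengths):
    """Convert (N, S) padded targets to the 1-D concatenated form.

    Row n contributes targets[n][0:s_n]; the padding after s_n is dropped.
    """
    flat = []
    for row, s_n in zip(targets, target_lengths):
        if s_n > len(row):
            raise ValueError("each target length must be <= S")
        flat.extend(row[:s_n])
    return flat  # len(flat) == sum(target_lengths)


def min_input_length(target):
    """Smallest input length that can be aligned to `target` under CTC.

    Every label needs one frame, plus one blank frame between each pair
    of consecutive identical labels.
    """
    repeats = sum(1 for a, b in zip(target, target[1:]) if a == b)
    return len(target) + repeats
```

For example, the target `[3, 3, 3, 4]` has length 4 but needs at least 6 input frames. With fewer frames its loss is infinite.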

Examples

>>> import torch
>>> from torch import nn
>>>
>>> # Targets are to be padded
>>> T = 50      # Input sequence length
>>> C = 20      # Number of classes (including blank)
>>> N = 16      # Batch size
>>> S = 30      # Target sequence length of longest target in batch (padding length)
>>> S_min = 10  # Minimum target length, for demonstration purposes
>>>
>>> # Initialize random batch of input vectors, for *size = (T,N,C)
>>> input = torch.randn(T, N, C).log_softmax(2).detach().requires_grad_()
>>>
>>> # Initialize random batch of targets (0 = blank, 1:C = classes)
>>> target = torch.randint(low=1, high=C, size=(N, S), dtype=torch.long)
>>>
>>> input_lengths = torch.full(size=(N,), fill_value=T, dtype=torch.long)
>>> target_lengths = torch.randint(
...     low=S_min,
...     high=S,
...     size=(N,),
...     dtype=torch.long,
... )
>>> ctc_loss = nn.CTCLoss()
>>> loss = ctc_loss(input, target, input_lengths, target_lengths)
>>> loss.backward()
>>>
>>>
>>> # Targets are to be un-padded
>>> T = 50  # Input sequence length
>>> C = 20  # Number of classes (including blank)
>>> N = 16  # Batch size
>>>
>>> # Initialize random batch of input vectors, for *size = (T,N,C)
>>> input = torch.randn(T, N, C).log_softmax(2).detach().requires_grad_()
>>> input_lengths = torch.full(size=(N,), fill_value=T, dtype=torch.long)
>>>
>>> # Initialize random batch of targets (0 = blank, 1:C = classes)
>>> target_lengths = torch.randint(low=1, high=T, size=(N,), dtype=torch.long)
>>> target = torch.randint(
...     low=1,
...     high=C,
...     size=(sum(target_lengths),),
...     dtype=torch.long,
... )
>>> ctc_loss = nn.CTCLoss()
>>> loss = ctc_loss(input, target, input_lengths, target_lengths)
>>> loss.backward()
>>>
>>>
>>> # Targets are to be un-padded and unbatched (effectively N=1)
>>> T = 50  # Input sequence length
>>> C = 20  # Number of classes (including blank)
>>>
>>> # Initialize random input vectors, for *size = (T,C)
>>> input = torch.randn(T, C).log_softmax(1).detach().requires_grad_()
>>> input_lengths = torch.tensor(T, dtype=torch.long)
>>>
>>> # Initialize random targets (0 = blank, 1:C = classes)
>>> target_lengths = torch.randint(low=1, high=T, size=(), dtype=torch.long)
>>> target = torch.randint(
...     low=1,
...     high=C,
...     size=(target_lengths,),
...     dtype=torch.long,
... )
>>> ctc_loss = nn.CTCLoss()
>>> loss = ctc_loss(input, target, input_lengths, target_lengths)
>>> loss.backward()
Reference:

A. Graves et al.: Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks: https://www.cs.toronto.edu/~graves/icml_2006.pdf

Note

In order to use CuDNN, the following must be satisfied: targets must be in concatenated format, all input_lengths must be T, blank = 0, target_lengths ≤ 256, and the integer arguments must be of dtype torch.int32.

The regular implementation uses the (more common in PyTorch) torch.long dtype.
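The CuDNN requirements in this note can be expressed as a single predicate. This is a sketch over plain values. `meets_cudnn_ctc_conditions` is a hypothetical helper, and the integer dtype is passed as a name string such as "int32" purely for illustration. It encodes only the conditions listed in this note. PyTorch's actual dispatch may check additional conditions before choosing the CuDNN path, such as the device or the floating-point dtype of log_probs.

```python
def meets_cudnn_ctc_conditions(targets_concatenated, input_lengths, T,
                               blank, target_lengths, integer_dtype):
    """True if the CuDNN conditions listed in the note all hold (hypothetical helper)."""
    return (
        targets_concatenated                          # 1-D concatenated targets
        and all(n == T for n in input_lengths)        # every input uses all T frames
        and blank == 0                                # blank index must be 0
        and all(n <= 256 for n in target_lengths)     # each target length <= 256
        and integer_dtype == "int32"                  # lengths/targets as torch.int32
    )
```

The padded examples above fail the first and last conditions, because they use (N, S) targets and torch.long lengths. They therefore take the regular implementation.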

Note

In some circumstances when using the CUDA backend with CuDNN, this operator may select a nondeterministic algorithm to increase performance. If this is undesirable, you can try to make the operation deterministic (potentially at a performance cost) by setting torch.backends.cudnn.deterministic = True. Please see the notes on Reproducibility for background.

forward(log_probs, targets, input_lengths, target_lengths)

Runs the forward pass.

Return type

Tensor