
CTCLoss

class torch.nn.CTCLoss(blank=0, reduction='mean', zero_infinity=False)

The Connectionist Temporal Classification loss.

Calculates loss between a continuous (unsegmented) time series and a target sequence. CTCLoss sums over the probability of possible alignments of input to target, producing a loss value which is differentiable with respect to each input node. The alignment of input to target is assumed to be "many-to-one", which limits the length of the target sequence such that it must be ≤ the input length.
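For intuition, here is a minimal sketch (toy sizes chosen for illustration, not part of the reference example set) showing that a target longer than the input cannot be aligned and therefore yields an infinite loss:

>>> import torch
>>> import torch.nn as nn
>>> # Toy setup: T=5 input steps, N=1, C=4 classes, index 0 is the blank
>>> log_probs = torch.randn(5, 1, 4).log_softmax(2)
>>> ctc = nn.CTCLoss()
>>> # A 3-label target fits into 5 input steps, so the loss is finite
>>> ctc(log_probs, torch.tensor([[1, 2, 3]]), torch.tensor([5]), torch.tensor([3])).isfinite()
tensor(True)
>>> # A 6-label target cannot be aligned to 5 input steps, so the loss is infinite
>>> ctc(log_probs, torch.tensor([[1, 2, 3, 1, 2, 3]]), torch.tensor([5]), torch.tensor([6])).isinf()
tensor(True)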

Parameters:
  • blank (int, optional) – blank label. Default: 0.

  • reduction (str, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the output losses will be divided by the target lengths and then the mean over the batch is taken, 'sum': the output losses will be summed. Default: 'mean'

  • zero_infinity (bool, optional) – Whether to zero infinite losses and the associated gradients. Default: False. Infinite losses mainly occur when the inputs are too short to be aligned to the targets. (A short sketch of the reduction and zero_infinity behavior follows this list.)
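The following sketch (toy sizes assumed) illustrates the reduction modes and the zero_infinity option:

>>> import torch
>>> import torch.nn as nn
>>> T, C, N, S = 10, 5, 3, 4  # toy sizes, assumed for illustration
>>> log_probs = torch.randn(T, N, C).log_softmax(2)
>>> targets = torch.randint(1, C, (N, S))
>>> input_lengths = torch.full((N,), T, dtype=torch.long)
>>> target_lengths = torch.full((N,), S, dtype=torch.long)
>>> per_sample = nn.CTCLoss(reduction='none')(log_probs, targets, input_lengths, target_lengths)
>>> mean_loss = nn.CTCLoss(reduction='mean')(log_probs, targets, input_lengths, target_lengths)
>>> # 'mean' divides each per-sample loss by its target length, then averages over the batch
>>> torch.allclose(mean_loss, (per_sample / target_lengths).mean())
True
>>> # zero_infinity=True replaces infinite losses (from unalignable samples) with zero
>>> safe_ctc = nn.CTCLoss(reduction='none', zero_infinity=True)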

Shape:
  • Log_probs: Tensor of size (T, N, C) or (T, C), where T = input length, N = batch size, and C = number of classes (including blank). The logarithmized probabilities of the outputs (e.g. obtained with torch.nn.functional.log_softmax()).

  • Targets: Tensor of size (N, S) or (sum(target_lengths)), where N = batch size and S = max target length, if shape is (N, S). It represents the target sequences. Each element in the target sequence is a class index, and the target index cannot be blank (default = 0). In the (N, S) form, targets are padded to the length of the longest sequence, and stacked. In the (sum(target_lengths)) form, the targets are assumed to be un-padded and concatenated within 1 dimension. (Both layouts are sketched after this list.)

  • Input_lengths: Tuple or tensor of size (N) or (), where N = batch size. It represents the lengths of the inputs (must each be ≤ T). The lengths are specified for each sequence to achieve masking under the assumption that sequences are padded to equal lengths.

  • Target_lengths: Tuple or tensor of size (N) or (), where N = batch size. It represents the lengths of the targets. Lengths are specified for each sequence to achieve masking under the assumption that sequences are padded to equal lengths. If the target shape is (N, S), target_lengths are effectively the stop index s_n for each target sequence, such that target_n = targets[n, 0:s_n] for each target in a batch. Lengths must each be ≤ S. If the targets are given as a 1d tensor that is the concatenation of individual targets, the target_lengths must add up to the total length of the tensor.

  • Output: scalar if reduction is 'mean' (default) or 'sum'. If reduction is 'none', then (N) if input is batched or () if input is unbatched, where N = batch size.
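As a sketch of the two accepted target layouts (toy sizes assumed; both describe the same label sequences and give the same per-sample losses):

>>> import torch
>>> import torch.nn as nn
>>> T, C, N = 12, 6, 2  # toy sizes
>>> log_probs = torch.randn(T, N, C).log_softmax(2)
>>> input_lengths = torch.full((N,), T, dtype=torch.long)
>>> # (N, S) padded form: true lengths 3 and 5, padded to S=5 (padding values past each length are ignored)
>>> padded = torch.tensor([[1, 2, 3, 0, 0], [4, 5, 1, 2, 3]])
>>> # 1-D concatenated form: the same labels, un-padded and joined end to end
>>> concatenated = torch.tensor([1, 2, 3, 4, 5, 1, 2, 3])
>>> target_lengths = torch.tensor([3, 5])
>>> ctc = nn.CTCLoss(reduction='none')
>>> loss_padded = ctc(log_probs, padded, input_lengths, target_lengths)
>>> loss_concat = ctc(log_probs, concatenated, input_lengths, target_lengths)
>>> loss_padded.shape  # per-sample losses of shape (N,) with reduction='none'
torch.Size([2])
>>> torch.allclose(loss_padded, loss_concat)
True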

Examples

>>> # Target are to be padded
>>> T = 50      # Input sequence length
>>> C = 20      # Number of classes (including blank)
>>> N = 16      # Batch size
>>> S = 30      # Target sequence length of longest target in batch (padding length)
>>> S_min = 10  # Minimum target length, for demonstration purposes
>>>
>>> # Initialize random batch of input vectors, for *size = (T,N,C)
>>> input = torch.randn(T, N, C).log_softmax(2).detach().requires_grad_()
>>>
>>> # Initialize random batch of targets (0 = blank, 1:C = classes)
>>> target = torch.randint(low=1, high=C, size=(N, S), dtype=torch.long)
>>>
>>> input_lengths = torch.full(size=(N,), fill_value=T, dtype=torch.long)
>>> target_lengths = torch.randint(
...     low=S_min,
...     high=S,
...     size=(N,),
...     dtype=torch.long,
... )
>>> ctc_loss = nn.CTCLoss()
>>> loss = ctc_loss(input, target, input_lengths, target_lengths)
>>> loss.backward()
>>>
>>>
>>> # Target are to be un-padded
>>> T = 50  # Input sequence length
>>> C = 20  # Number of classes (including blank)
>>> N = 16  # Batch size
>>>
>>> # Initialize random batch of input vectors, for *size = (T,N,C)
>>> input = torch.randn(T, N, C).log_softmax(2).detach().requires_grad_()
>>> input_lengths = torch.full(size=(N,), fill_value=T, dtype=torch.long)
>>>
>>> # Initialize random batch of targets (0 = blank, 1:C = classes)
>>> target_lengths = torch.randint(low=1, high=T, size=(N,), dtype=torch.long)
>>> target = torch.randint(
...     low=1,
...     high=C,
...     size=(sum(target_lengths),),
...     dtype=torch.long,
... )
>>> ctc_loss = nn.CTCLoss()
>>> loss = ctc_loss(input, target, input_lengths, target_lengths)
>>> loss.backward()
>>>
>>>
>>> # Target are to be un-padded and unbatched (effectively N=1)
>>> T = 50  # Input sequence length
>>> C = 20  # Number of classes (including blank)
>>>
>>> # Initialize random batch of input vectors, for *size = (T,C)
>>> input = torch.randn(T, C).log_softmax(1).detach().requires_grad_()
>>> input_lengths = torch.tensor(T, dtype=torch.long)
>>>
>>> # Initialize random batch of targets (0 = blank, 1:C = classes)
>>> target_lengths = torch.randint(low=1, high=T, size=(), dtype=torch.long)
>>> target = torch.randint(
...     low=1,
...     high=C,
...     size=(target_lengths,),
...     dtype=torch.long,
... )
>>> ctc_loss = nn.CTCLoss()
>>> loss = ctc_loss(input, target, input_lengths, target_lengths)
>>> loss.backward()
Reference:

A. Graves et al.: Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks: https://www.cs.toronto.edu/~graves/icml_2006.pdf

Note

In order to use CuDNN, the following must be satisfied: the targets must be in concatenated format, all input_lengths must be T, blank=0, target_lengths ≤ 256, the integer arguments must be of dtype torch.int32, and the log_probs itself must be of dtype torch.float32.

The regular implementation uses the (more common in PyTorch) torch.long dtype.
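For example, a setup along these lines would satisfy those conditions (a sketch only, assuming a CUDA build with CuDNN available; sizes are illustrative):

>>> import torch
>>> import torch.nn as nn
>>> T, C, N, S = 50, 20, 16, 10  # toy sizes; S stays well below 256
>>> ctc_loss = nn.CTCLoss(blank=0)  # blank must be 0
>>> # log_probs of dtype torch.float32 on the GPU
>>> log_probs = torch.randn(T, N, C, device="cuda").log_softmax(2).detach().requires_grad_()
>>> # targets in concatenated (1-D) form, integer arguments as torch.int32
>>> target_lengths = torch.full((N,), S, dtype=torch.int32)
>>> targets = torch.randint(1, C, (N * S,), dtype=torch.int32)
>>> # every input length equal to T
>>> input_lengths = torch.full((N,), T, dtype=torch.int32)
>>> loss = ctc_loss(log_probs, targets, input_lengths, target_lengths)
>>> loss.backward()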

Note

In some circumstances when using the CUDA backend with CuDNN, this operator may select a nondeterministic algorithm to increase performance. If this is undesirable, you can try to make the operation deterministic (potentially at a performance cost) by setting torch.backends.cudnn.deterministic = True. Please see the notes on Reproducibility for background.
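For instance (a minimal sketch of the settings mentioned above):

>>> import torch
>>> # Opt out of the nondeterministic CuDNN algorithm (may reduce performance)
>>> torch.backends.cudnn.deterministic = True
>>> # Alternatively, request deterministic algorithms globally; warn_only avoids hard errors
>>> torch.use_deterministic_algorithms(True, warn_only=True)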

forward(log_probs, targets, input_lengths, target_lengths)

Runs the forward pass.

Return type:

Tensor