CTCLoss #

classtorch.nn.modules.loss.CTCLoss(blank=0,reduction='mean',zero_infinity=False)[source]#

The Connectionist Temporal Classification loss.

Calculates loss between a continuous (unsegmented) time series and a target sequence. CTCLoss sums over theprobability of possible alignments of input to target, producing a loss value which is differentiablewith respect to each input node. The alignment of input to target is assumed to be “many-to-one”, whichlimits the length of the target sequence such that it must be $\leq \leq$ the input length.

Parameters

blank (int,optional) – blank label. Default $00$ .
reduction (str,optional) – Specifies the reduction to apply to the output:'none' |'mean' |'sum'.'none': no reduction will be applied,'mean': the output losses will be divided by the target lengths andthen the mean over the batch is taken,'sum': the output losses will be summed.Default:'mean'
zero_infinity (bool,optional) – Whether to zero infinite losses and the associated gradients.Default:FalseInfinite losses mainly occur when the inputs are too shortto be aligned to the targets.

Shape:

Log_probs: Tensor of size $(T, N, C) (T, N, C)$ or $(T, C) (T, C)$ ,where $T = input length T = \text{input length}$ , $N = batch size N = \text{batch size}$ , and $C = number of classes (including blank) C = \text{number of classes (including blank)}$ .The logarithmized probabilities of the outputs (e.g. obtained withtorch.nn.functional.log_softmax()).
Targets: Tensor of size $(N, S) (N, S)$ or $(sum (target_lengths)) (\operatorname{sum}(\text{target\_lengths}))$ ,where $N = batch size N = \text{batch size}$ and $S = max target length, if shape is (N, S) S = \text{max target length, if shape is } (N, S)$ .It represents the target sequences. Each element in the targetsequence is a class index. And the target index cannot be blank (default=0).In the $(N, S) (N, S)$ form, targets are padded to thelength of the longest sequence, and stacked.In the $(sum (target_lengths)) (\operatorname{sum}(\text{target\_lengths}))$ form,the targets are assumed to be un-padded andconcatenated within 1 dimension.
Input_lengths: Tuple or tensor of size $(N) (N)$ or $() ()$ ,where $N = batch size N = \text{batch size}$ . It represents the lengths of theinputs (must each be $\leq T \leq T$ ). And the lengths are specifiedfor each sequence to achieve masking under the assumption that sequencesare padded to equal lengths.
Target_lengths: Tuple or tensor of size $(N) (N)$ or $() ()$ ,where $N = batch size N = \text{batch size}$ . It represents lengths of the targets.Lengths are specified for each sequence to achieve masking under theassumption that sequences are padded to equal lengths. If target shape is $(N, S) (N,S)$ , target_lengths are effectively the stop index $s_{n} s_n$ for each target sequence, such thattarget_n=targets[n,0:s_n] foreach target in a batch. Lengths must each be $\leq S \leq S$ If the targets are given as a 1d tensor that is the concatenation of individualtargets, the target_lengths must add up to the total length of the tensor.
Output: scalar ifreduction is'mean' (default) or'sum'. Ifreduction is'none', then $(N) (N)$ if input is batched or $() ()$ if input is unbatched, where $N = batch size N = \text{batch size}$ .

Examples

>>># Target are to be padded>>>T=50# Input sequence length>>>C=20# Number of classes (including blank)>>>N=16# Batch size>>>S=30# Target sequence length of longest target in batch (padding length)>>>S_min=10# Minimum target length, for demonstration purposes>>>>>># Initialize random batch of input vectors, for *size = (T,N,C)>>>input=torch.randn(T,N,C).log_softmax(2).detach().requires_grad_()>>>>>># Initialize random batch of targets (0 = blank, 1:C = classes)>>>target=torch.randint(low=1,high=C,size=(N,S),dtype=torch.long)>>>>>>input_lengths=torch.full(size=(N,),fill_value=T,dtype=torch.long)>>>target_lengths=torch.randint(...low=S_min,...high=S,...size=(N,),...dtype=torch.long,...)>>>ctc_loss=nn.CTCLoss()>>>loss=ctc_loss(input,target,input_lengths,target_lengths)>>>loss.backward()>>>>>>>>># Target are to be un-padded>>>T=50# Input sequence length>>>C=20# Number of classes (including blank)>>>N=16# Batch size>>>>>># Initialize random batch of input vectors, for *size = (T,N,C)>>>input=torch.randn(T,N,C).log_softmax(2).detach().requires_grad_()>>>input_lengths=torch.full(size=(N,),fill_value=T,dtype=torch.long)>>>>>># Initialize random batch of targets (0 = blank, 1:C = classes)>>>target_lengths=torch.randint(low=1,high=T,size=(N,),dtype=torch.long)>>>target=torch.randint(...low=1,...high=C,...size=(sum(target_lengths),),...dtype=torch.long,...)>>>ctc_loss=nn.CTCLoss()>>>loss=ctc_loss(input,target,input_lengths,target_lengths)>>>loss.backward()>>>>>>>>># Target are to be un-padded and unbatched (effectively N=1)>>>T=50# Input sequence length>>>C=20# Number of classes (including blank)>>>>>># Initialize random batch of input vectors, for *size = (T,C)>>>input=torch.randn(T,C).log_softmax(1).detach().requires_grad_()>>>input_lengths=torch.tensor(T,dtype=torch.long)>>>>>># Initialize random batch of targets (0 = blank, 1:C = classes)>>>target_lengths=torch.randint(low=1,high=T,size=(),dtype=torch.long)>>>target=torch.randint(...low=1,...high=C,...size=(target_lengths,),...dtype=torch.long,...)>>>ctc_loss=nn.CTCLoss()>>>loss=ctc_loss(input,target,input_lengths,target_lengths)>>>loss.backward()

Reference:: A. Graves et al.: Connectionist Temporal Classification:Labelling Unsegmented Sequence Data with Recurrent Neural Networks:https://www.cs.toronto.edu/~graves/icml_2006.pdf

Note

In order to use CuDNN, the following must be satisfied:targets must bein concatenated format, allinput_lengths must beT. $b l a n k = 0 blank=0$ ,target_lengths $\leq 256 \leq 256$ , the integer arguments must be ofdtypetorch.int32.

The regular implementation uses the (more common in PyTorch)torch.long dtype.

Note

In some circumstances when using the CUDA backend with CuDNN, this operatormay select a nondeterministic algorithm to increase performance. If this isundesirable, you can try to make the operation deterministic (potentially ata performance cost) by settingtorch.backends.cudnn.deterministic=True.Please see the notes onReproducibility for background.

forward(log_probs,targets,input_lengths,target_lengths)[source]#

Runs the forward pass.

Return type: Tensor

On this page

Show Source

PyTorch Libraries

Movatterモバイル変換

CTCLoss #

Docs

Tutorials

Resources

Movatterモバイル変換

CTCLoss#

Docs

Tutorials

Resources

CTCLoss #