
AdaptiveLogSoftmaxWithLoss#

class torch.nn.AdaptiveLogSoftmaxWithLoss(in_features, n_classes, cutoffs, div_value=4.0, head_bias=False, device=None, dtype=None)[source]#

Efficient softmax approximation.

As described in Efficient softmax approximation for GPUs by Edouard Grave, Armand Joulin, Moustapha Cissé, David Grangier, and Hervé Jégou.

Adaptive softmax is an approximate strategy for training models with large output spaces. It is most effective when the label distribution is highly imbalanced, for example in natural language modelling, where the word frequency distribution approximately follows Zipf's law.

Adaptive softmax partitions the labels into several clusters, according to their frequency. These clusters may contain a different number of targets each. Additionally, clusters containing less frequent labels assign lower-dimensional embeddings to those labels, which speeds up the computation. For each minibatch, only the clusters for which at least one target is present are evaluated.

The idea is that the clusters which are accessed frequently (like the first one, containing the most frequent labels) should also be cheap to compute – that is, contain a small number of assigned labels.

We highly recommend taking a look at the original paper for more details.

  • cutoffs should be an ordered Sequence of integers sorted in increasing order. It controls the number of clusters and the partitioning of targets into clusters. For example, setting cutoffs=[10, 100, 1000] means that the first 10 targets will be assigned to the 'head' of the adaptive softmax, targets 11, 12, …, 100 will be assigned to the first cluster, and targets 101, 102, …, 1000 will be assigned to the second cluster, while targets 1001, 1002, …, n_classes - 1 will be assigned to the last, third cluster.

  • div_value is used to compute the size of each additional cluster, which is given as $\left\lfloor\frac{\texttt{in\_features}}{\texttt{div\_value}^{idx}}\right\rfloor$, where $idx$ is the cluster index (with clusters for less frequent words having larger indices, and indices starting from $1$).

  • head_bias if set to True, adds a bias term to the 'head' of the adaptive softmax. See the paper for details. Set to False in the official implementation.
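As a quick check of the div_value formula above, the tail-cluster projection sizes can be computed directly. This is a sketch only; the in_features, div_value, and cutoffs values here are illustrative assumptions, not defaults:

```python
# Sketch: projection size of each tail cluster, computed as
# floor(in_features / div_value**idx). All values are illustrative.
in_features = 64
div_value = 4.0
cutoffs = [10, 100, 1000]  # three tail clusters beyond the head

# idx runs from 1 (most frequent tail cluster) upward
sizes = [int(in_features // div_value ** idx) for idx in range(1, len(cutoffs) + 1)]
print(sizes)  # [16, 4, 1]
```

With div_value=4.0 each successive cluster's projection is four times smaller, so rarely accessed clusters are progressively cheaper to evaluate.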

Warning

Labels passed as inputs to this module should be sorted according to their frequency. This means that the most frequent label should be represented by the index 0, and the least frequent label should be represented by the index n_classes - 1.

Note

This module returns a NamedTuple with output and loss fields. See further documentation for details.

Note

To compute log-probabilities for all classes, the log_prob method can be used.

Parameters
  • in_features (int) – Number of features in the input tensor

  • n_classes (int) – Number of classes in the dataset

  • cutoffs (Sequence) – Cutoffs used to assign targets to their buckets

  • div_value (float, optional) – value used as an exponent to compute the sizes of the clusters. Default: 4.0

  • head_bias (bool, optional) – If True, adds a bias term to the 'head' of the adaptive softmax. Default: False

Returns

  • output is a Tensor of size N containing computed target log probabilities for each example

  • loss is a Scalar representing the computed negative log likelihood loss

Return type

NamedTuple with output and loss fields

Shape:
  • input: (N, in_features) or (in_features)
  • target: (N) or () where each value satisfies 0 <= target[i] <= n_classes
  • output1: (N) or ()
  • output2: Scalar
forward(input_, target_)[source]#

Runs the forward pass.

Return type

_ASMoutput
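A minimal usage sketch of the forward pass; the dimensions, n_classes, and cutoffs below are illustrative assumptions, not defaults. The returned NamedTuple carries per-example target log-probabilities in output and the scalar loss in loss:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Illustrative sizes: 64-dim hidden states, 2000 classes, three cutoffs.
asm = nn.AdaptiveLogSoftmaxWithLoss(
    in_features=64, n_classes=2000, cutoffs=[10, 100, 1000]
)
hidden = torch.randn(32, 64)            # minibatch of 32 examples
target = torch.randint(0, 2000, (32,))  # frequency-sorted class indices
out = asm(hidden, target)
print(out.output.shape)  # torch.Size([32]): per-example target log-probabilities
print(out.loss)          # scalar negative log likelihood
```

out.loss is the mean negative log likelihood over the minibatch, so it can be used directly with loss.backward() during training.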

log_prob(input)[source]#

Compute log probabilities for all n_classes classes.

Parameters

input (Tensor) – a minibatch of examples

Returns

log-probabilities for each class $c$ in range $0 \le c < \texttt{n\_classes}$, where n_classes is a parameter passed to the AdaptiveLogSoftmaxWithLoss constructor.

Return type

Tensor

Shape:
  • input: (N, in_features)
  • output: (N, n_classes)
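A short sketch of log_prob; the sizes here are illustrative assumptions. Each row of the result is a full, normalized log-probability distribution over all n_classes classes:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Illustrative sizes: 16-dim inputs, 50 classes, two cutoffs.
asm = nn.AdaptiveLogSoftmaxWithLoss(in_features=16, n_classes=50, cutoffs=[5, 20])
logp = asm.log_prob(torch.randn(8, 16))
print(logp.shape)  # torch.Size([8, 50]): one row per example, one column per class
# Each row exponentiates to a proper probability distribution:
print(torch.allclose(logp.exp().sum(dim=1), torch.ones(8), atol=1e-5))  # True
```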
predict(input)[source]#

Return the class with the highest probability for each example in the input minibatch.

This is equivalent to self.log_prob(input).argmax(dim=1), but is more efficient in some cases.

Parameters

input (Tensor) – a minibatch of examples

Returns

the class with the highest probability for each example

Return type

output (Tensor)

Shape:
  • input: (N, in_features)
  • output: (N)
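A sketch of predict and its equivalence to the explicit argmax (module sizes are illustrative assumptions):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Illustrative sizes: 16-dim inputs, 50 classes, two cutoffs.
asm = nn.AdaptiveLogSoftmaxWithLoss(in_features=16, n_classes=50, cutoffs=[5, 20])
x = torch.randn(8, 16)
pred = asm.predict(x)
print(pred.shape)  # torch.Size([8]): one predicted class index per example
# Same result as the explicit argmax over full log-probabilities:
print(torch.equal(pred, asm.log_prob(x).argmax(dim=1)))  # True
```

predict avoids materializing the full (N, n_classes) log-probability matrix when every head argmax lands in the shortlist, which is where the efficiency gain comes from.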
reset_parameters()[source]#

Resets parameters based on their initialization used in __init__.