
torch.nn.functional.embedding_bag

torch.nn.functional.embedding_bag(input, weight, offsets=None, max_norm=None, norm_type=2, scale_grad_by_freq=False, mode='mean', sparse=False, per_sample_weights=None, include_last_offset=False, padding_idx=None)

Compute sums, means or maxes of bags of embeddings.

Calculation is done without instantiating the intermediate embeddings. See torch.nn.EmbeddingBag for more details.

Note

This operation may produce nondeterministic gradients when given tensors on a CUDA device. See Reproducibility for more information.

Parameters
  • input (LongTensor) – Tensor containing bags of indices into the embedding matrix

  • weight (Tensor) – The embedding matrix with number of rows equal to the maximum possible index + 1, and number of columns equal to the embedding size

  • offsets (LongTensor, optional) – Only used when input is 1D. offsets determines the starting index position of each bag (sequence) in input.

  • max_norm (float, optional) – If given, each embedding vector with norm larger than max_norm is renormalized to have norm max_norm. Note: this will modify weight in-place.

  • norm_type (float, optional) – The p in the p-norm to compute for the max_norm option. Default 2.

  • scale_grad_by_freq (bool, optional) – if given, this will scale gradients by the inverse of frequency of the words in the mini-batch. Default False. Note: this option is not supported when mode="max".

  • mode (str, optional) – "sum", "mean" or "max". Specifies the way to reduce the bag. Default: "mean"

  • sparse (bool, optional) – if True, gradient w.r.t. weight will be a sparse tensor. See Notes under torch.nn.Embedding for more details regarding sparse gradients. Note: this option is not supported when mode="max".

  • per_sample_weights (Tensor, optional) – a tensor of float / double weights, or None to indicate all weights should be taken to be 1. If specified, per_sample_weights must have exactly the same shape as input and is treated as having the same offsets, if those are not None.

  • include_last_offset (bool, optional) – if True, the size of offsets is equal to the number of bags + 1. The last element is the size of the input, or the ending index position of the last bag (sequence). See the sketch after this parameter list.

  • padding_idx (int, optional) – If specified, the entries at padding_idx do not contribute to the gradient; therefore, the embedding vector at padding_idx is not updated during training, i.e. it remains as a fixed “pad”. Note that the embedding vector at padding_idx is excluded from the reduction.
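A minimal usage sketch (not part of the official examples) showing include_last_offset together with per_sample_weights; it assumes the usual import torch and import torch.nn.functional as F, and since the weights here are random only the output shape is shown:

>>> weight = torch.rand(5, 3)
>>> input = torch.tensor([0, 1, 2, 3])
>>> # with include_last_offset=True, offsets has num_bags + 1 entries;
>>> # the last entry (4) is the total number of indices, giving 2 bags
>>> offsets = torch.tensor([0, 2, 4])
>>> # per_sample_weights requires mode='sum'
>>> psw = torch.tensor([1.0, 0.5, 2.0, 0.1])
>>> out = F.embedding_bag(input, weight, offsets, mode='sum',
...                       per_sample_weights=psw, include_last_offset=True)
>>> out.shape
torch.Size([2, 3])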

Return type

Tensor

Shape:
  • input (LongTensor) and offsets (LongTensor, optional)

    • If input is 2D of shape (B, N), it will be treated as B bags (sequences) each of fixed length N, and this will return B values aggregated in a way depending on the mode. offsets is ignored and required to be None in this case (see the sketch after this list).

    • If input is 1D of shape (N), it will be treated as a concatenation of multiple bags (sequences). offsets is required to be a 1D tensor containing the starting index positions of each bag in input. Therefore, for offsets of shape (B), input will be viewed as having B bags. Empty bags (i.e., having 0-length) will have returned vectors filled by zeros.

  • weight (Tensor): the learnable weights of the module of shape (num_embeddings, embedding_dim)

  • per_sample_weights (Tensor, optional). Has the same shape as input.

  • output: aggregated embedding values of shape (B, embedding_dim)
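
To illustrate the two equivalent input layouts described above, a short sketch (assuming import torch and import torch.nn.functional as F; weights are random, so only shapes are shown):

>>> weight = torch.rand(10, 3)
>>> # 2D input of shape (B, N): each row is one fixed-length bag, offsets must be None
>>> input_2d = torch.tensor([[1, 2, 4, 5], [4, 3, 2, 9]])
>>> F.embedding_bag(input_2d, weight).shape
torch.Size([2, 3])
>>> # the equivalent 1D form with explicit offsets
>>> F.embedding_bag(input_2d.reshape(-1), weight, torch.tensor([0, 4])).shape
torch.Size([2, 3])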

Examples:

>>> # an Embedding module containing 10 tensors of size 3
>>> embedding_matrix = torch.rand(10, 3)
>>> # a batch of 2 samples of 4 indices each
>>> input = torch.tensor([1, 2, 4, 5, 4, 3, 2, 9])
>>> offsets = torch.tensor([0, 4])
>>> F.embedding_bag(input, embedding_matrix, offsets)
tensor([[ 0.3397,  0.3552,  0.5545],
        [ 0.5893,  0.4386,  0.5882]])

>>> # example with padding_idx
>>> embedding_matrix = torch.rand(10, 3)
>>> input = torch.tensor([2, 2, 2, 2, 4, 3, 2, 9])
>>> offsets = torch.tensor([0, 4])
>>> F.embedding_bag(input, embedding_matrix, offsets, padding_idx=2, mode='sum')
tensor([[ 0.0000,  0.0000,  0.0000],
        [-0.7082,  3.2145, -2.6251]])