torch.nn.functional #

Created On: Jun 11, 2019 | Last Updated On: Dec 08, 2025

Convolution functions#

`conv1d`	Applies a 1D convolution over an input signal composed of several input planes.
`conv2d`	Applies a 2D convolution over an input image composed of several input planes.
`conv3d`	Applies a 3D convolution over an input image composed of several input planes.
`conv_transpose1d`	Applies a 1D transposed convolution operator over an input signal composed of several input planes, sometimes also called "deconvolution".
`conv_transpose2d`	Applies a 2D transposed convolution operator over an input image composed of several input planes, sometimes also called "deconvolution".
`conv_transpose3d`	Applies a 3D transposed convolution operator over an input image composed of several input planes, sometimes also called "deconvolution".
`unfold`	Extract sliding local blocks from a batched input tensor.
`fold`	Combine an array of sliding local blocks into a large containing tensor.
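
The relationship between `conv2d` and `unfold` can be illustrated with a small sketch. The tensor shapes and the `padding=1` choice below are assumptions made for illustration only; the check confirms that extracting sliding blocks and multiplying by the flattened kernel gives the same values as the direct convolution.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(2, 3, 8, 8)        # (batch, in_channels, height, width)
weight = torch.randn(4, 3, 3, 3)   # (out_channels, in_channels, kH, kW)

# Direct 2D convolution; padding=1 keeps the 8x8 spatial size for a 3x3 kernel.
y = F.conv2d(x, weight, padding=1)

# Same computation via unfold: each column holds one flattened 3x3x3 patch.
patches = F.unfold(x, kernel_size=3, padding=1)            # (2, 27, 64)
y_unfold = (weight.view(4, -1) @ patches).view(2, 4, 8, 8)

conv_out_shape = tuple(y.shape)
conv_matches_unfold = torch.allclose(y, y_unfold, atol=1e-4)
print(conv_out_shape, conv_matches_unfold)
```

The comparison uses a tolerance rather than exact equality because the two paths may accumulate floating-point sums in a different order.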

Pooling functions#

`avg_pool1d`	Applies a 1D average pooling over an input signal composed of several input planes.
`avg_pool2d`	Applies 2D average-pooling operation in $kH \times kW$ regions by step size $sH \times sW$ steps.
`avg_pool3d`	Applies 3D average-pooling operation in $kT \times kH \times kW$ regions by step size $sT \times sH \times sW$ steps.
`max_pool1d`	Applies a 1D max pooling over an input signal composed of several input planes.
`max_pool2d`	Applies a 2D max pooling over an input signal composed of several input planes.
`max_pool3d`	Applies a 3D max pooling over an input signal composed of several input planes.
`max_unpool1d`	Compute a partial inverse of `MaxPool1d`.
`max_unpool2d`	Compute a partial inverse of `MaxPool2d`.
`max_unpool3d`	Compute a partial inverse of `MaxPool3d`.
`lp_pool1d`	Apply a 1D power-average pooling over an input signal composed of several input planes.
`lp_pool2d`	Apply a 2D power-average pooling over an input signal composed of several input planes.
`lp_pool3d`	Apply a 3D power-average pooling over an input signal composed of several input planes.
`adaptive_max_pool1d`	Applies a 1D adaptive max pooling over an input signal composed of several input planes.
`adaptive_max_pool2d`	Applies a 2D adaptive max pooling over an input signal composed of several input planes.
`adaptive_max_pool3d`	Applies a 3D adaptive max pooling over an input signal composed of several input planes.
`adaptive_avg_pool1d`	Applies a 1D adaptive average pooling over an input signal composed of several input planes.
`adaptive_avg_pool2d`	Apply a 2D adaptive average pooling over an input signal composed of several input planes.
`adaptive_avg_pool3d`	Apply a 3D adaptive average pooling over an input signal composed of several input planes.
`fractional_max_pool2d`	Applies 2D fractional max pooling over an input signal composed of several input planes.
`fractional_max_pool3d`	Applies 3D fractional max pooling over an input signal composed of several input planes.
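
As a rough illustration of how the pooling functions relate, the sketch below pools a small 4x4 input and then uses the returned indices to compute the partial inverse with `max_unpool2d`. The input values and the 2x2 window are assumptions chosen so the results are easy to verify by hand.

```python
import torch
import torch.nn.functional as F

# A single 4x4 plane holding the values 0..15, shaped (batch, channels, H, W).
x = torch.arange(16, dtype=torch.float32).view(1, 1, 4, 4)

# Max pooling over non-overlapping 2x2 windows, keeping the argmax indices.
pooled, indices = F.max_pool2d(x, kernel_size=2, stride=2, return_indices=True)

# Partial inverse: maxima return to their original positions, the rest is zero.
restored = F.max_unpool2d(pooled, indices, kernel_size=2, stride=2)

# Average pooling over the same windows, and a global average via adaptive pooling.
averaged = F.avg_pool2d(x, kernel_size=2, stride=2)
global_avg = F.adaptive_avg_pool2d(x, output_size=1)

print(pooled.flatten().tolist())     # [5.0, 7.0, 13.0, 15.0]
print(averaged.flatten().tolist())   # [2.5, 4.5, 10.5, 12.5]
print(global_avg.item())             # 7.5
```

The unpooled tensor is only a partial inverse: the non-maximal values in each window are lost and come back as zeros.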

Attention Mechanisms#

The `torch.nn.attention.bias` module contains attention_biases that are designed to be used with `scaled_dot_product_attention`.

scaled_dot_product_attention

scaled_dot_product_attention(query, key, value, attn_mask=None, dropout_p=0.0,
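
A minimal sketch of what `scaled_dot_product_attention` computes, compared against an explicit softmax-based reference. The shapes are assumed for illustration, and the `is_causal` keyword used in the second call is not visible in the truncated signature above; it is taken here as an assumption about the installed PyTorch version (2.0 or later).

```python
import math
import torch
import torch.nn.functional as F

torch.manual_seed(0)
# Assumed layout: (batch, heads, sequence length, head dimension).
q = torch.randn(2, 4, 5, 8)
k = torch.randn(2, 4, 5, 8)
v = torch.randn(2, 4, 5, 8)

out = F.scaled_dot_product_attention(q, k, v)

# Explicit reference: softmax(Q K^T / sqrt(d)) V.
scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
ref = F.softmax(scores, dim=-1) @ v

# Causal variant: each position attends only to itself and earlier positions.
causal_out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
causal_mask = torch.ones(5, 5, dtype=torch.bool).tril()
causal_ref = F.softmax(scores.masked_fill(~causal_mask, float("-inf")), dim=-1) @ v

sdpa_matches = torch.allclose(out, ref, atol=1e-4)
causal_matches = torch.allclose(causal_out, causal_ref, atol=1e-4)
print(sdpa_matches, causal_matches)
```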

Non-linear activation functions#

`threshold`	Apply a threshold to each element of the input Tensor.
`threshold_`	In-place version of `threshold()`.
`relu`	Applies the rectified linear unit function element-wise.
`relu_`	In-place version of `relu()`.
`hardtanh`	Applies the HardTanh function element-wise.
`hardtanh_`	In-place version of `hardtanh()`.
`hardswish`	Apply hardswish function, element-wise.
`relu6`	Applies the element-wise function $\text{ReLU6}(x) = \min(\max(0,x), 6)$.
`elu`	Apply the Exponential Linear Unit (ELU) function element-wise.
`elu_`	In-place version of `elu()`.
`selu`	Applies element-wise, $\text{SELU}(x) = scale * (\max(0,x) + \min(0, \alpha * (\exp(x) - 1)))$, with $\alpha=1.6732632423543772848170429916717$ and $scale=1.0507009873554804934193349852946$.
`celu`	Applies element-wise, $\text{CELU}(x) = \max(0,x) + \min(0, \alpha * (\exp(x/\alpha) - 1))$.
`leaky_relu`	Applies element-wise, $\text{LeakyReLU}(x) = \max(0, x) + \text{negative\_slope} * \min(0, x)$.
`leaky_relu_`	In-place version of `leaky_relu()`.
`prelu`	Applies element-wise the function $\text{PReLU}(x) = \max(0,x) + \text{weight} * \min(0,x)$ where weight is a learnable parameter.
`rrelu`	Randomized leaky ReLU.
`rrelu_`	In-place version of `rrelu()`.
`glu`	The gated linear unit.
`gelu`	When the approximate argument is 'none', it applies element-wise the function $\text{GELU}(x) = x * \Phi(x)$.
`logsigmoid`	Applies element-wise $\text{LogSigmoid}(x_i) = \log \left(\frac{1}{1 + \exp(-x_i)}\right)$.
`hardshrink`	Applies the hard shrinkage function element-wise.
`tanhshrink`	Applies element-wise, $\text{Tanhshrink}(x) = x - \text{Tanh}(x)$.
`softsign`	Applies element-wise, the function $\text{SoftSign}(x) = \frac{x}{1 + |x|}$.
`softplus`	Applies element-wise, the function $\text{Softplus}(x) = \frac{1}{\beta} * \log(1 + \exp(\beta * x))$.
`softmin`	Apply a softmin function.
`softmax`	Apply a softmax function.
`softshrink`	Applies the soft shrinkage function element-wise.
`gumbel_softmax`	Sample from the Gumbel-Softmax distribution and optionally discretize.
`log_softmax`	Apply a softmax followed by a logarithm.
`tanh`	Applies element-wise, $\text{Tanh}(x) = \tanh(x) = \frac{\exp(x) - \exp(-x)}{\exp(x) + \exp(-x)}$.
`sigmoid`	Applies the element-wise function $\text{Sigmoid}(x) = \frac{1}{1 + \exp(-x)}$.
`hardsigmoid`	Apply the Hardsigmoid function element-wise.
`silu`	Apply the Sigmoid Linear Unit (SiLU) function, element-wise.
`mish`	Apply the Mish function, element-wise.
`batch_norm`	Apply Batch Normalization for each channel across a batch of data.
`group_norm`	Apply Group Normalization for last certain number of dimensions.
`instance_norm`	Apply Instance Normalization independently for each channel in every data sample within a batch.
`layer_norm`	Apply Layer Normalization for last certain number of dimensions.
`local_response_norm`	Apply local response normalization over an input signal.
`rms_norm`	Apply Root Mean Square Layer Normalization.
`normalize`	Perform $L_p$ normalization of inputs over specified dimension.
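
Several of the closed-form definitions in the table can be checked numerically. The sketch below compares a handful of the functions against their formulas on an assumed set of sample points; `negative_slope=0.1` and `beta=2.0` are arbitrary illustrative values, and the `close` helper is a local convenience, not part of the library.

```python
import math
import torch
import torch.nn.functional as F

def close(a, b):
    # Local helper (hypothetical): tolerant comparison for float32 results.
    return torch.allclose(a, b, atol=1e-5)

x = torch.linspace(-3.0, 8.0, steps=12)   # sample points -3, -2, ..., 8
beta = 2.0

checks = {
    "relu6": close(F.relu6(x), torch.clamp(x, min=0.0, max=6.0)),
    "leaky_relu": close(
        F.leaky_relu(x, negative_slope=0.1),
        torch.clamp(x, min=0.0) + 0.1 * torch.clamp(x, max=0.0),
    ),
    # Phi is the standard normal CDF, written here with erf.
    "gelu": close(F.gelu(x), x * 0.5 * (1.0 + torch.erf(x / math.sqrt(2.0)))),
    "logsigmoid": close(F.logsigmoid(x), torch.log(torch.sigmoid(x))),
    "tanhshrink": close(F.tanhshrink(x), x - torch.tanh(x)),
    "softsign": close(F.softsign(x), x / (1.0 + x.abs())),
    "softplus": close(F.softplus(x, beta=beta), torch.log1p(torch.exp(beta * x)) / beta),
}

# log_softmax is a softmax followed by a logarithm, computed in one step.
torch.manual_seed(0)
logits = torch.randn(3, 5)
checks["log_softmax"] = close(F.log_softmax(logits, dim=-1), F.softmax(logits, dim=-1).log())

# L2 normalization along dim=1 rescales each row to unit length.
v = torch.tensor([[3.0, 4.0], [0.0, 2.0]])
unit_rows = F.normalize(v, p=2.0, dim=1)

print(checks)
print(unit_rows)
```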

Linear functions#

linear

Applies a linear transformation to the incoming data: $y = xA^T + b$.

bilinear

Applies a bilinear transformation to the incoming data: $y = x_1^T A x_2 + b$.
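
The two formulas above can be written out explicitly. In this sketch the feature sizes are assumptions, `A` follows the `(out_features, in_features)` weight layout, and the bilinear weight `W` is assumed to have shape `(out_features, in1_features, in2_features)`.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(5, 3)    # batch of 5 inputs with 3 features
A = torch.randn(4, 3)    # (out_features, in_features)
b = torch.randn(4)

# linear: y = x A^T + b
y = F.linear(x, A, b)
linear_ok = torch.allclose(y, x @ A.T + b, atol=1e-5)

# bilinear: y_o = x1^T W_o x2 + b_o for each output feature o
x1 = torch.randn(5, 3)
x2 = torch.randn(5, 2)
W = torch.randn(4, 3, 2)
z = F.bilinear(x1, x2, W, b)
bilinear_ok = torch.allclose(z, torch.einsum("bi,oij,bj->bo", x1, W, x2) + b, atol=1e-4)

print(linear_ok, bilinear_ok)
```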

Dropout functions#

`dropout`	During training, randomly zeroes some elements of the input tensor with probability `p`.
`alpha_dropout`	Apply alpha dropout to the input.
`feature_alpha_dropout`	Randomly masks out entire channels (a channel is a feature map).
`dropout1d`	Randomly zero out entire channels (a channel is a 1D feature map).
`dropout2d`	Randomly zero out entire channels (a channel is a 2D feature map).
`dropout3d`	Randomly zero out entire channels (a channel is a 3D feature map).
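
A short sketch of the difference between element-wise and channel-wise dropout. Because the functional forms take an explicit `training` flag, the example passes it directly; the tensor sizes and `p=0.5` are assumptions. Surviving elements are rescaled by `1 / (1 - p)`, so with `p=0.5` an input of ones becomes either 0 or 2.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.ones(1000)

# Element-wise dropout in training mode: each value becomes 0 or 1 / (1 - p).
y_train = F.dropout(x, p=0.5, training=True)
train_values = set(y_train.unique().tolist())

# With training=False the input passes through unchanged.
y_eval = F.dropout(x, p=0.5, training=False)

# dropout2d zeroes whole (H, W) feature maps, so every channel is constant here.
maps = torch.ones(4, 8, 5, 5)   # (batch, channels, H, W)
y_maps = F.dropout2d(maps, p=0.5, training=True)
channelwise = bool((y_maps.amax(dim=(2, 3)) == y_maps.amin(dim=(2, 3))).all())

print(train_values, torch.equal(y_eval, x), channelwise)
```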

Sparse functions#

embedding

Generate a simple lookup table that looks up embeddings in a fixed dictionary and size.

embedding_bag

Compute sums, means or maxes of bags of embeddings.

one_hot

Takes a LongTensor with index values of shape `(*)` and returns a tensor of shape `(*, num_classes)` that has zeros everywhere except where the index of the last dimension matches the corresponding value of the input tensor, in which case it will be 1.
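
The three sparse functions can be sketched on a tiny table. The weight matrix and indices below are made up for illustration; for `embedding_bag`, a 2D index tensor is used so that each row is treated as one bag.

```python
import torch
import torch.nn.functional as F

# A lookup table with 4 embeddings of dimension 3 (values 0..11 for readability).
weight = torch.arange(12, dtype=torch.float32).view(4, 3)
idx = torch.tensor([[0, 2], [3, 1]])

# embedding: plain row lookup, output shape is idx.shape + (embedding_dim,).
emb = F.embedding(idx, weight)

# embedding_bag: reduce each row of indices (one bag per row) with a sum.
bags = F.embedding_bag(idx, weight, mode="sum")

# one_hot: class indices of shape (*) become vectors of shape (*, num_classes).
labels = torch.tensor([0, 2, 1])
encoded = F.one_hot(labels, num_classes=3)

print(tuple(emb.shape))    # (2, 2, 3)
print(bags.tolist())       # [[6.0, 8.0, 10.0], [12.0, 14.0, 16.0]]
print(encoded.tolist())    # [[1, 0, 0], [0, 0, 1], [0, 1, 0]]
```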

Distance functions#

pairwise_distance

See `torch.nn.PairwiseDistance` for details.

cosine_similarity

Returns cosine similarity between `x1` and `x2`, computed along `dim`.

pdist

Computes the p-norm distance between every pair of row vectors in the input.
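
A small worked example of the three distance functions on hand-picked points (the values are assumptions chosen so the distances are easy to verify). `pairwise_distance` adds a small `eps` to the difference for numerical stability, so its results are compared with a tolerance rather than exactly.

```python
import torch
import torch.nn.functional as F

a = torch.tensor([[3.0, 4.0], [1.0, 0.0]])
b = torch.zeros(2, 2)

# Row-wise Euclidean distance between a and b: approximately [5, 1].
row_dist = F.pairwise_distance(a, b)

# Row-wise cosine similarity along dim=1.
c = torch.tensor([[4.0, 3.0], [0.0, 1.0]])
cos = F.cosine_similarity(a, c, dim=1)    # approximately [0.96, 0.0]

# pdist returns the condensed distances for pairs (0,1), (0,2), (1,2).
points = torch.tensor([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
condensed = F.pdist(points, p=2)          # approximately [5, 10, 5]

print(row_dist, cos, condensed)
```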

Loss functions#

`binary_cross_entropy`	Compute Binary Cross Entropy between the target and input probabilities.
`binary_cross_entropy_with_logits`	Compute Binary Cross Entropy between target and input logits.
`poisson_nll_loss`	Compute the Poisson negative log likelihood loss.
`cosine_embedding_loss`	Compute the cosine embedding loss.
`cross_entropy`	Compute the cross entropy loss between input logits and target.
`ctc_loss`	Compute the Connectionist Temporal Classification loss.
`gaussian_nll_loss`	Compute the Gaussian negative log likelihood loss.
`hinge_embedding_loss`	Compute the hinge embedding loss.
`kl_div`	Compute the KL Divergence loss.
`l1_loss`	Compute the L1 loss, with optional weighting.
`mse_loss`	Compute the element-wise mean squared error, with optional weighting.
`margin_ranking_loss`	Compute the margin ranking loss.
`multilabel_margin_loss`	Compute the multilabel margin loss.
`multilabel_soft_margin_loss`	Compute the multilabel soft margin loss.
`multi_margin_loss`	Compute the multi margin loss, with optional weighting.
`nll_loss`	Compute the negative log likelihood loss.
`huber_loss`	Compute the Huber loss, with optional weighting.
`smooth_l1_loss`	Compute the Smooth L1 loss.
`soft_margin_loss`	Compute the soft margin loss.
`triplet_margin_loss`	Compute the triplet loss between given input tensors and a margin greater than 0.
`triplet_margin_with_distance_loss`	Compute the triplet margin loss for input tensors using a custom distance function.
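
Some of the losses in the table are compositions of others, which a short sketch can make concrete. The inputs are random and the targets arbitrary; the identities checked are that `cross_entropy` on logits matches `nll_loss` on log-probabilities, that `binary_cross_entropy_with_logits` matches `binary_cross_entropy` on sigmoid outputs, and that `huber_loss` with `delta=1.0` matches `smooth_l1_loss` with `beta=1.0`.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Multi-class: cross_entropy(logits) == nll_loss(log_softmax(logits)).
logits = torch.randn(4, 3)
target = torch.tensor([0, 2, 1, 2])
ce = F.cross_entropy(logits, target)
nll = F.nll_loss(F.log_softmax(logits, dim=1), target)

# Binary: the "with_logits" variant applies the sigmoid internally.
z = torch.randn(6)
t = torch.tensor([0.0, 1.0, 1.0, 0.0, 1.0, 0.0])
bce_logits = F.binary_cross_entropy_with_logits(z, t)
bce_probs = F.binary_cross_entropy(torch.sigmoid(z), t)

# Regression: default reduction is the mean of the element-wise losses.
pred = torch.randn(5) * 3.0
truth = torch.zeros(5)
mse_mean = F.mse_loss(pred, truth)
mse_manual = F.mse_loss(pred, truth, reduction="none").mean()
huber = F.huber_loss(pred, truth, delta=1.0)
smooth = F.smooth_l1_loss(pred, truth, beta=1.0)

print(torch.allclose(ce, nll, atol=1e-5))
print(torch.allclose(bce_logits, bce_probs, atol=1e-5))
print(torch.allclose(huber, smooth, atol=1e-6))
```

The logits-based variants are generally the safer choice in practice, since they avoid taking the logarithm of a probability that has already been rounded to 0 or 1.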

Vision functions#

`pixel_shuffle`	Rearranges elements in a tensor of shape $(*, C \times r^2, H, W)$ to a tensor of shape $(*, C, H \times r, W \times r)$, where r is the `upscale_factor`.
`pixel_unshuffle`	Reverses the `PixelShuffle` operation by rearranging elements in a tensor of shape $(*, C, H \times r, W \times r)$ to a tensor of shape $(*, C \times r^2, H, W)$, where r is the `downscale_factor`.
`pad`	Pads tensor.
`interpolate`	Down/up samples the input.
`upsample`	Upsample input.
`upsample_nearest`	Upsamples the input, using nearest neighbours' pixel values.
`upsample_bilinear`	Upsamples the input, using bilinear upsampling.
`grid_sample`	Compute grid sample.
`affine_grid`	Generate 2D or 3D flow field (sampling grid), given a batch of affine matrices `theta`.
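
The shape changes described in the table can be traced with a brief sketch. The input size, the padding amounts, and the identity `theta` are assumptions for illustration; `align_corners=False` is passed explicitly to both `affine_grid` and `grid_sample` so that the two calls agree on the coordinate convention.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(1, 8, 3, 3)   # (N, C * r^2, H, W) with C=2, r=2

# pixel_shuffle trades channels for resolution; pixel_unshuffle reverses it.
up = F.pixel_shuffle(x, upscale_factor=2)            # (1, 2, 6, 6)
back = F.pixel_unshuffle(up, downscale_factor=2)     # (1, 8, 3, 3)

# pad: the tuple lists (left, right) amounts starting from the last dimension.
padded = F.pad(x, (1, 2))                            # (1, 8, 3, 6)

# interpolate: nearest-neighbour upsampling by a factor of 2.
resized = F.interpolate(x, scale_factor=2, mode="nearest")   # (1, 8, 6, 6)

# An identity affine matrix yields a grid that samples every pixel in place.
theta = torch.tensor([[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]])
grid = F.affine_grid(theta, size=x.shape, align_corners=False)   # (1, 3, 3, 2)
resampled = F.grid_sample(x, grid, align_corners=False)

print(tuple(up.shape), tuple(padded.shape), tuple(resized.shape))
print(torch.equal(back, x), torch.allclose(resampled, x, atol=1e-4))
```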

DataParallel functions (multi-GPU, distributed)#

data_parallel#

torch.nn.parallel.data_parallel

Evaluate module(input) in parallel across the GPUs given in device_ids.

Low-Precision functions#

`ScalingType`	alias of `_ScalingType`
`SwizzleType`	alias of `_SwizzleType`
`grouped_mm`	Computes a grouped matrix multiply that shares weight shapes across experts but allows jagged token counts per expert, which is common in Mixture-of-Experts (MoE) layers.
`scaled_mm`	scaled_mm(mat_a, mat_b, scale_a, scale_recipe_a, scale_b, scale_recipe_b, swizzle_a, swizzle_b, bias, output_dtype,
`scaled_grouped_mm`	scaled_grouped_mm(mat_a, mat_b, scale_a, scale_recipe_a, scale_b, scale_recipe_b, swizzle_a, swizzle_b, bias, offs,
