torch.nn.functional
Created On: Jun 11, 2019 | Last Updated On: Dec 08, 2025
Convolution functions
conv1d | Applies a 1D convolution over an input signal composed of several input planes. |
conv2d | Applies a 2D convolution over an input image composed of several input planes. |
conv3d | Applies a 3D convolution over an input image composed of several input planes. |
conv_transpose1d | Applies a 1D transposed convolution operator over an input signal composed of several input planes, sometimes also called "deconvolution". |
conv_transpose2d | Applies a 2D transposed convolution operator over an input image composed of several input planes, sometimes also called "deconvolution". |
conv_transpose3d | Applies a 3D transposed convolution operator over an input image composed of several input planes, sometimes also called "deconvolution". |
unfold | Extract sliding local blocks from a batched input tensor. |
fold | Combine an array of sliding local blocks into a large containing tensor. |
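The convolution entries above share the same per-dimension output-size arithmetic, and unfold uses it to count its sliding blocks. The following is a minimal plain-Python sketch (no PyTorch required) that assumes a single input channel, a single output channel and no batch dimension; conv_out_len, conv1d_single and unfold_num_blocks are hypothetical helper names, not torch.nn.functional API.

```python
import math


def conv_out_len(length, kernel_size, stride=1, padding=0, dilation=1):
    # Output size of one spatial dimension:
    # floor((L + 2*padding - dilation*(kernel_size - 1) - 1) / stride) + 1
    return (length + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


def conv1d_single(x, w, bias=0.0, stride=1, padding=0, dilation=1):
    # Like conv1d, this is a cross-correlation: the kernel is NOT flipped.
    xp = [0.0] * padding + list(x) + [0.0] * padding  # zero padding on both sides
    out = []
    for i in range(conv_out_len(len(x), len(w), stride, padding, dilation)):
        start = i * stride
        acc = bias
        for k, wk in enumerate(w):
            acc += wk * xp[start + k * dilation]  # dilation spaces out kernel taps
        out.append(acc)
    return out


def unfold_num_blocks(spatial, kernel, stride, padding, dilation):
    # unfold produces one block per output position: the product of the
    # per-dimension output sizes.
    return math.prod(
        conv_out_len(n, k, s, p, d)
        for n, k, s, p, d in zip(spatial, kernel, stride, padding, dilation)
    )
```

For example, conv1d_single([1, 2, 3, 4, 5], [1, 0, -1], stride=2, padding=1) pads to [0, 1, 2, 3, 4, 5, 0] and yields three outputs, matching conv_out_len(5, 3, 2, 1).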
Pooling functions
avg_pool1d | Applies a 1D average pooling over an input signal composed of several input planes. |
avg_pool2d | Applies 2D average-pooling operation in kH × kW regions by step size sH × sW steps. |
avg_pool3d | Applies 3D average-pooling operation in kT × kH × kW regions by step size sT × sH × sW steps. |
max_pool1d | Applies a 1D max pooling over an input signal composed of several input planes. |
max_pool2d | Applies a 2D max pooling over an input signal composed of several input planes. |
max_pool3d | Applies a 3D max pooling over an input signal composed of several input planes. |
max_unpool1d | Compute a partial inverse of MaxPool1d. |
max_unpool2d | Compute a partial inverse of MaxPool2d. |
max_unpool3d | Compute a partial inverse of MaxPool3d. |
lp_pool1d | Apply a 1D power-average pooling over an input signal composed of several input planes. |
lp_pool2d | Apply a 2D power-average pooling over an input signal composed of several input planes. |
lp_pool3d | Apply a 3D power-average pooling over an input signal composed of several input planes. |
adaptive_max_pool1d | Applies a 1D adaptive max pooling over an input signal composed of several input planes. |
adaptive_max_pool2d | Applies a 2D adaptive max pooling over an input signal composed of several input planes. |
adaptive_max_pool3d | Applies a 3D adaptive max pooling over an input signal composed of several input planes. |
adaptive_avg_pool1d | Applies a 1D adaptive average pooling over an input signal composed of several input planes. |
adaptive_avg_pool2d | Apply a 2D adaptive average pooling over an input signal composed of several input planes. |
adaptive_avg_pool3d | Apply a 3D adaptive average pooling over an input signal composed of several input planes. |
fractional_max_pool2d | Applies 2D fractional max pooling over an input signal composed of several input planes. |
fractional_max_pool3d | Applies 3D fractional max pooling over an input signal composed of several input planes. |
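Two ideas from the pooling table can be made concrete in a few lines: max_unpool1d is only a partial inverse because it needs the argmax indices recorded by max pooling and fills every other position with zeros, and adaptive pooling picks its windows from the requested output size rather than from a kernel size. This is a plain-Python sketch under stated assumptions (1D, no batch, no padding, first maximum wins on ties); the helper names are hypothetical, and the bin rule [floor(i * in / out), ceil((i + 1) * in / out)) is the commonly used adaptive-pooling convention, assumed here.

```python
def max_pool1d_with_indices(x, kernel_size, stride=None):
    # stride defaults to kernel_size; no padding or dilation in this sketch.
    stride = stride or kernel_size
    n_out = (len(x) - kernel_size) // stride + 1
    values, indices = [], []
    for i in range(n_out):
        window = range(i * stride, i * stride + kernel_size)
        best = max(window, key=lambda j: x[j])  # first maximum wins on ties here
        values.append(x[best])
        indices.append(best)
    return values, indices


def max_unpool1d(values, indices, output_size):
    # Partial inverse: each max goes back to its recorded index; all
    # non-maximal positions are lost and become zero.
    out = [0] * output_size
    for v, j in zip(values, indices):
        out[j] = v
    return out


def adaptive_bins(in_size, out_size):
    # Bin i covers [floor(i*in/out), ceil((i+1)*in/out)); bins may overlap.
    bins = []
    for i in range(out_size):
        start = (i * in_size) // out_size
        end = -((-(i + 1) * in_size) // out_size)  # integer ceiling division
        bins.append((start, end))
    return bins


def adaptive_avg_pool1d(x, out_size):
    return [sum(x[s:e]) / (e - s) for s, e in adaptive_bins(len(x), out_size)]
```

With five inputs and three outputs the bins are (0, 2), (1, 4) and (3, 5), so neighbouring windows share elements when the sizes do not divide evenly.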
Attention Mechanisms
The torch.nn.attention.bias module contains attention biases that are designed to be used with scaled_dot_product_attention.
scaled_dot_product_attention | scaled_dot_product_attention(query, key, value, attn_mask=None, dropout_p=0.0, |
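The core computation behind scaled_dot_product_attention is softmax(Q Kᵀ · scale) V, with scale defaulting to 1 / sqrt(E) for embedding size E, and is_causal restricting each query to keys at or before its own position. The sketch below is plain Python for a single head with no batch dimension and no dropout; sdpa is a hypothetical name, and the masking convention shown (query i sees keys j <= i) is an assumption that holds for equal query and key lengths.

```python
import math


def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]  # exp(-inf) == 0.0 for masked entries
    total = sum(exps)
    return [e / total for e in exps]


def sdpa(query, key, value, is_causal=False, scale=None):
    # query: L x E, key: S x E, value: S x Ev, all as lists of lists.
    scale = 1.0 / math.sqrt(len(query[0])) if scale is None else scale
    out = []
    for i, q in enumerate(query):
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in key]
        if is_causal:
            # Query position i may only attend to key positions j <= i.
            scores = [s if j <= i else float("-inf") for j, s in enumerate(scores)]
        weights = softmax(scores)
        # Output row is the attention-weighted sum of the value rows.
        out.append([sum(w * v[c] for w, v in zip(weights, value)) for c in range(len(value[0]))])
    return out
```

With causal masking, the first query can only see the first key, so its output is exactly the first value row.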
Non-linear activation functions
threshold | Apply a threshold to each element of the input Tensor. |
threshold_ | In-place version of threshold(). |
relu | Applies the rectified linear unit function element-wise. |
relu_ | In-place version of relu(). |
hardtanh | Applies the HardTanh function element-wise. |
hardtanh_ | In-place version of hardtanh(). |
hardswish | Apply hardswish function, element-wise. |
relu6 | Applies the element-wise function ReLU6(x) = min(max(0, x), 6). |
elu | Apply the Exponential Linear Unit (ELU) function element-wise. |
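elu_ | In-place version of elu(). |
selu | Applies element-wise SELU(x) = scale * (max(0, x) + min(0, α * (exp(x) - 1))), with α = 1.6732632423543772848170429916717 and scale = 1.0507009873554804934193349852946. |
celu | Applies element-wise CELU(x) = max(0, x) + min(0, α * (exp(x / α) - 1)). |
leaky_relu | Applies element-wise LeakyReLU(x) = max(0, x) + negative_slope * min(0, x). |
leaky_relu_ | In-place version of leaky_relu(). |
prelu | Applies element-wise the function PReLU(x) = max(0, x) + weight * min(0, x), where weight is a learnable parameter. |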
rrelu | Randomized leaky ReLU. |
rrelu_ | In-place version of rrelu(). |
glu | The gated linear unit. |
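gelu | When the approximate argument is 'none', it applies element-wise the function GELU(x) = x * Φ(x), where Φ(x) is the cumulative distribution function of the standard Gaussian distribution. |
logsigmoid | Applies element-wise LogSigmoid(x_i) = log(1 / (1 + exp(-x_i))). |
hardshrink | Applies the hard shrinkage function element-wise. |
tanhshrink | Applies element-wise Tanhshrink(x) = x - tanh(x). |
softsign | Applies element-wise the function SoftSign(x) = x / (1 + abs(x)). |
softplus | Applies element-wise the function Softplus(x) = (1 / β) * log(1 + exp(β * x)). |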
softmin | Apply a softmin function. |
softmax | Apply a softmax function. |
softshrink | Applies the soft shrinkage function element-wise. |
gumbel_softmax | Sample from the Gumbel-Softmax distribution and optionally discretize. |
log_softmax | Apply a softmax followed by a logarithm. |
tanh | Applies element-wise Tanh(x) = tanh(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x)). |
sigmoid | Applies the element-wise function Sigmoid(x) = 1 / (1 + exp(-x)). |
hardsigmoid | Apply the Hardsigmoid function element-wise. |
silu | Apply the Sigmoid Linear Unit (SiLU) function, element-wise. |
mish | Apply the Mish function, element-wise. |
Normalization functions
batch_norm | Apply Batch Normalization for each channel across a batch of data. |
group_norm | Apply Group Normalization: split the channels into num_groups groups and normalize each group separately within every sample. |
instance_norm | Apply Instance Normalization independently for each channel in every data sample within a batch. |
layer_norm | Apply Layer Normalization over the trailing dimensions given by normalized_shape. |
local_response_norm | Apply local response normalization over an input signal. |
rms_norm | Apply Root Mean Square Layer Normalization. |
normalize | Perform Lp normalization of inputs over the specified dimension. |
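Several activation and normalization entries above are one-line formulas, and writing a few of them out shows the details that matter in practice, such as the overflow guards in softplus and log_softmax and the biased variance in layer normalization. These are plain-Python scalar or single-row stand-ins that reuse the table's names for readability; they are illustrative sketches, not the torch functions, and the defaults shown (negative_slope=0.01, threshold=20.0, lambd=0.5, eps=1e-5) are the commonly documented ones, assumed here.

```python
import math

# SELU constants from the selu row above (Python rounds them to float).
SELU_ALPHA = 1.6732632423543772848170429916717
SELU_SCALE = 1.0507009873554804934193349852946


def relu6(x):
    return min(max(0.0, x), 6.0)


def leaky_relu(x, negative_slope=0.01):
    return max(0.0, x) + negative_slope * min(0.0, x)


def selu(x):
    return SELU_SCALE * (max(0.0, x) + min(0.0, SELU_ALPHA * (math.exp(x) - 1.0)))


def gelu(x):
    # approximate='none': x * Phi(x), Phi being the standard normal CDF.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def softplus(x, beta=1.0, threshold=20.0):
    # For large beta * x, fall back to the identity to avoid exp overflow.
    if beta * x > threshold:
        return x
    return math.log1p(math.exp(beta * x)) / beta


def hardshrink(x, lambd=0.5):
    return x if abs(x) > lambd else 0.0


def log_softmax(row):
    # Subtract the max before exponentiating so large logits do not overflow.
    m = max(row)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - log_sum_exp for v in row]


def softmax(row):
    return [math.exp(v) for v in log_softmax(row)]


def layer_norm(row, eps=1e-5):
    # Normalize over the last dimension only; no learnable weight or bias.
    mean = sum(row) / len(row)
    var = sum((v - mean) ** 2 for v in row) / len(row)  # biased variance
    return [(v - mean) / math.sqrt(var + eps) for v in row]
```

Computing softmax through log_softmax keeps inputs such as [1000.0, 1001.0, 1002.0] finite, where a naive exp would overflow.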
Linear functions
Dropout functions
dropout | During training, randomly zeroes some elements of the input tensor with probability p. |
alpha_dropout | Apply alpha dropout to the input. |
feature_alpha_dropout | Randomly masks out entire channels (a channel is a feature map). |
dropout1d | Randomly zero out entire channels (a channel is a 1D feature map). |
dropout2d | Randomly zero out entire channels (a channel is a 2D feature map). |
dropout3d | Randomly zero out entire channels (a channel is a 3D feature map). |
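The dropout entries differ mainly in what gets zeroed: dropout draws per element, while the channel variants make one draw per channel. The plain-Python sketch below uses the inverted-dropout convention of scaling survivors by 1 / (1 - p) during training and returning the input unchanged when training is False; dropout and dropout_channels here are hypothetical stand-ins operating on lists, and the seeded random.Random generator is an assumption made for reproducibility.

```python
import random


def dropout(x, p=0.5, training=True, rng=None):
    # Element-wise inverted dropout on a flat list.
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if not training or p == 0.0:
        return list(x)  # identity outside training
    if p == 1.0:
        return [0.0] * len(x)
    rng = rng or random.Random()
    scale = 1.0 / (1.0 - p)  # keeps the expected value of each element unchanged
    return [0.0 if rng.random() < p else v * scale for v in x]


def dropout_channels(x, p=0.5, training=True, rng=None):
    # Channel-wise variant: x is a list of channels, and a single draw decides
    # whether an entire channel (feature map) is zeroed.
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if not training or p == 0.0:
        return [list(ch) for ch in x]
    if p == 1.0:
        return [[0.0] * len(ch) for ch in x]
    rng = rng or random.Random()
    scale = 1.0 / (1.0 - p)
    return [[0.0] * len(ch) if rng.random() < p else [v * scale for v in ch] for ch in x]
```

With p=0.5, every surviving element of an all-ones input becomes 2.0, and in the channel variant each channel is either all zeros or all 2.0.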
Sparse functions
embedding | Generate a simple lookup table that looks up embeddings in a fixed dictionary and size. |
embedding_bag | Compute sums, means or maxes of bags of embeddings. |
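one_hot | Takes a LongTensor with index values of shape (*) and returns a tensor of shape (*, num_classes) that is zero everywhere except at the position given by each index value, where it is 1. |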
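The sparse entries are all index-driven: embedding is a row lookup, embedding_bag reduces the rows of each bag to a single vector, and one_hot turns each index into an indicator row. The plain-Python sketch below assumes embedding_bag's flat indices plus offsets layout, where offsets[b] is the start of bag b and the last bag runs to the end; the functions are hypothetical stand-ins sharing the table's names, and returning zeros for an empty bag is a choice made in this sketch.

```python
def embedding(indices, weight):
    # Plain lookup: row weight[i] for every index i.
    return [list(weight[i]) for i in indices]


def embedding_bag(indices, offsets, weight, mode="mean"):
    # Bag b is indices[offsets[b]:offsets[b + 1]]; the last bag runs to the end.
    bounds = list(offsets) + [len(indices)]
    dim = len(weight[0])
    out = []
    for b in range(len(offsets)):
        rows = [weight[i] for i in indices[bounds[b]:bounds[b + 1]]]
        if not rows:
            out.append([0.0] * dim)  # empty bag -> zeros in this sketch
            continue
        cols = list(zip(*rows))  # one tuple per embedding dimension
        if mode == "sum":
            out.append([sum(c) for c in cols])
        elif mode == "mean":
            out.append([sum(c) / len(c) for c in cols])
        elif mode == "max":
            out.append([max(c) for c in cols])
        else:
            raise ValueError(f"unknown mode {mode!r}")
    return out


def one_hot(indices, num_classes=None):
    # num_classes=None infers max(indices) + 1 in this sketch.
    n = max(indices) + 1 if num_classes is None else num_classes
    return [[1 if c == i else 0 for c in range(n)] for i in indices]
```

For instance, indices [0, 1, 2, 2] with offsets [0, 2] form two bags, {0, 1} and {2, 2}, each reduced to one row.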
Distance functions
pairwise_distance | See torch.nn.PairwiseDistance for details. |
cosine_similarity | Returns cosine similarity between x1 and x2, computed along dim. |
pdist | Computes the p-norm distance between every pair of row vectors in the input. |
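The three distance entries differ in which vectors they pair up: pairwise_distance compares corresponding rows of two inputs, cosine_similarity compares directions, and pdist compares every row of one input with every later row. The sketch below is plain Python on lists of floats; the names mirror the table for readability but are hypothetical stand-ins, pairwise_distance omits the eps argument of the real function, and the simplified eps guard in cosine_similarity is an assumption of this sketch.

```python
def p_norm(v, p=2.0):
    return sum(abs(a) ** p for a in v) ** (1.0 / p)


def pairwise_distance(x1, x2, p=2.0):
    # Distance between corresponding rows of x1 and x2 (same number of rows).
    return [p_norm([a - b for a, b in zip(r1, r2)], p) for r1, r2 in zip(x1, x2)]


def cosine_similarity(v1, v2, eps=1e-8):
    # Dot product divided by the product of L2 norms, guarded against zero norms.
    dot = sum(a * b for a, b in zip(v1, v2))
    return dot / max(p_norm(v1) * p_norm(v2), eps)


def pdist(rows, p=2.0):
    # Upper triangle (i < j) of the row-to-row distance matrix in row-major
    # order: n * (n - 1) / 2 values for n rows.
    n = len(rows)
    return [
        p_norm([a - b for a, b in zip(rows[i], rows[j])], p)
        for i in range(n)
        for j in range(i + 1, n)
    ]
```

For three rows, pdist returns the distances for the pairs (0, 1), (0, 2) and (1, 2), in that order.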
Loss functions
binary_cross_entropy | Compute Binary Cross Entropy between the target and input probabilities. |
binary_cross_entropy_with_logits | Compute Binary Cross Entropy between target and input logits. |
poisson_nll_loss | Compute the Poisson negative log likelihood loss. |
cosine_embedding_loss | Compute the cosine embedding loss. |
cross_entropy | Compute the cross entropy loss between input logits and target. |
ctc_loss | Compute the Connectionist Temporal Classification loss. |
gaussian_nll_loss | Compute the Gaussian negative log likelihood loss. |
hinge_embedding_loss | Compute the hinge embedding loss. |
kl_div | Compute the KL Divergence loss. |
l1_loss | Compute the L1 loss, with optional weighting. |
mse_loss | Compute the element-wise mean squared error, with optional weighting. |
margin_ranking_loss | Compute the margin ranking loss. |
multilabel_margin_loss | Compute the multilabel margin loss. |
multilabel_soft_margin_loss | Compute the multilabel soft margin loss. |
multi_margin_loss | Compute the multi margin loss, with optional weighting. |
nll_loss | Compute the negative log likelihood loss. |
huber_loss | Compute the Huber loss, with optional weighting. |
smooth_l1_loss | Compute the Smooth L1 loss. |
soft_margin_loss | Compute the soft margin loss. |
triplet_margin_loss | Compute the triplet loss between given input tensors and a margin greater than 0. |
triplet_margin_with_distance_loss | Compute the triplet margin loss for input tensors using a custom distance function. |
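A few of the losses above are closely related: cross_entropy on logits is nll_loss applied to log_softmax of those logits, binary_cross_entropy_with_logits is binary_cross_entropy of the sigmoid written in an overflow-safe form, and huber_loss differs from smooth_l1_loss by a factor of delta when beta equals delta. The plain-Python sketch below checks these relations with mean reduction over the batch; the functions are hypothetical stand-ins that omit weights, ignore_index and label smoothing.

```python
import math


def log_softmax(row):
    m = max(row)
    lse = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - lse for v in row]


def nll_loss(log_probs, targets):
    # Mean over the batch of the negated log-probability of the target class.
    return sum(-lp[t] for lp, t in zip(log_probs, targets)) / len(targets)


def cross_entropy(logits, targets):
    # cross_entropy(logits) == nll_loss(log_softmax(logits))
    return nll_loss([log_softmax(row) for row in logits], targets)


def binary_cross_entropy(probs, targets):
    return -sum(y * math.log(q) + (1 - y) * math.log(1 - q) for q, y in zip(probs, targets)) / len(probs)


def binary_cross_entropy_with_logits(logits, targets):
    # Stable form of BCE(sigmoid(x), y): max(x, 0) - x*y + log(1 + exp(-|x|))
    return sum(max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x))) for x, y in zip(logits, targets)) / len(logits)


def huber_loss(pred, target, delta=1.0):
    total = 0.0
    for a, b in zip(pred, target):
        d = abs(a - b)
        total += 0.5 * d * d if d < delta else delta * (d - 0.5 * delta)
    return total / len(pred)


def smooth_l1_loss(pred, target, beta=1.0):
    total = 0.0
    for a, b in zip(pred, target):
        d = abs(a - b)
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total / len(pred)
```

Working from logits is preferable whenever it is available: the stable BCE form never evaluates log(0), even for very large |x|.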
Vision functions
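pixel_shuffle | Rearranges elements in a tensor of shape (*, C × r², H, W) to a tensor of shape (*, C, H × r, W × r), where r is the upscale_factor. |
pixel_unshuffle | Reverses the PixelShuffle operation by rearranging elements in a tensor of shape (*, C, H × r, W × r) to a tensor of shape (*, C × r², H, W), where r is the downscale_factor. |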
pad | Pads tensor. |
interpolate | Down/up samples the input. |
upsample | Upsample input. |
upsample_nearest | Upsamples the input, using nearest neighbours' pixel values. |
upsample_bilinear | Upsamples the input, using bilinear upsampling. |
grid_sample | Compute grid sample. |
affine_grid | Generate a 2D or 3D flow field (sampling grid), given a batch of affine matrices theta. |
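pixel_shuffle and pixel_unshuffle only move values around, so they are easiest to understand as an index mapping: input channel c × r² + i × r + j at position (h, w) lands at output channel c, position (h × r + i, w × r + j). The sketch below is plain Python on nested lists with no batch dimension; the two functions are hypothetical stand-ins named after the table entries, and this particular channel ordering is an assumption of the sketch.

```python
def pixel_shuffle(x, r):
    # x: (C * r * r) x H x W nested lists -> C x (H * r) x (W * r)
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    if c_in % (r * r):
        raise ValueError("channel count must be divisible by r * r")
    c_out = c_in // (r * r)
    out = [[[None] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for i in range(r):          # row offset inside each r x r output cell
            for j in range(r):      # column offset inside each r x r output cell
                src = x[c * r * r + i * r + j]
                for y in range(h):
                    for xx in range(w):
                        out[c][y * r + i][xx * r + j] = src[y][xx]
    return out


def pixel_unshuffle(x, r):
    # Exact inverse: C x (H * r) x (W * r) -> (C * r * r) x H x W
    c, hr, wr = len(x), len(x[0]), len(x[0][0])
    if hr % r or wr % r:
        raise ValueError("height and width must be divisible by r")
    h, w = hr // r, wr // r
    out = [[[None] * w for _ in range(h)] for _ in range(c * r * r)]
    for ch in range(c):
        for i in range(r):
            for j in range(r):
                dst = out[ch * r * r + i * r + j]
                for y in range(h):
                    for xx in range(w):
                        dst[y][xx] = x[ch][y * r + i][xx * r + j]
    return out
```

Four 1 × 1 channels holding 0, 1, 2 and 3 shuffle with r = 2 into a single 2 × 2 channel [[0, 1], [2, 3]], and unshuffling restores the original four channels.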
DataParallel functions (multi-GPU, distributed)
data_parallel | Evaluate module(input) in parallel across the GPUs given in device_ids. |
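Conceptually, evaluating module(input) across several devices means splitting the batch along dimension 0, running one replica of the module on each shard, and concatenating the per-shard outputs in order. The single-process sketch below models only that data flow with plain lists. It runs the shards sequentially, uses no GPUs, and assumes contiguous chunks of size ceil(N / number of devices); chunk and data_parallel_sketch are hypothetical names, not the torch implementation.

```python
def chunk(batch, n):
    # Split along dim 0 into at most n contiguous chunks of size ceil(len / n).
    if not batch:
        return []
    size = -(-len(batch) // n)  # integer ceiling division
    return [batch[i:i + size] for i in range(0, len(batch), size)]


def data_parallel_sketch(module, inputs, device_ids):
    # 1. scatter: one shard of the batch per device
    shards = chunk(inputs, len(device_ids))
    # 2. apply: in the real thing each shard runs on its own replica/device
    outputs = [module(shard) for shard in shards]
    # 3. gather: concatenate the per-shard outputs back along dim 0
    return [y for out in outputs for y in out]
```

Because the shards are concatenated in their original order, the result matches a single call of module on the whole batch whenever the module treats samples independently.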
Low-Precision functions
ScalingType | alias of |
SwizzleType | alias of |
grouped_mm | Computes a grouped matrix multiply that shares weight shapes across experts but allows jagged token counts per expert, which is common in Mixture-of-Experts (MoE) layers. |
scaled_mm | scaled_mm(mat_a, mat_b, scale_a, scale_recipe_a, scale_b, scale_recipe_b, swizzle_a, swizzle_b, bias, output_dtype, |
scaled_grouped_mm | scaled_grouped_mm(mat_a, mat_b, scale_a, scale_recipe_a, scale_b, scale_recipe_b, swizzle_a, swizzle_b, bias, offs, |
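The grouped_mm entry describes one matrix multiply per expert, where all experts share a weight shape but receive different numbers of tokens. The plain-Python sketch below shows that idea and leaves out low-precision scaling entirely. It assumes the token rows are already sorted by expert and that a hypothetical offs list gives, for each group, the cumulative row index where that group ends; this offsets convention and the name grouped_mm_sketch are assumptions of this sketch, not the documented signature.

```python
def matmul(a, b):
    # Dense (M x K) @ (K x N) on nested lists; an empty `a` yields [].
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def grouped_mm_sketch(tokens, weights, offs):
    # tokens: T x K rows sorted by expert; weights: G x K x N, one matrix per
    # expert; offs[g]: cumulative end row of group g (assumed convention).
    out, start = [], 0
    for g, end in enumerate(offs):
        # Jagged group sizes: group g owns rows start..end-1 (possibly none).
        out.extend(matmul(tokens[start:end], weights[g]))
        start = end
    return out
```

An expert that receives no tokens simply contributes no output rows, which is the jagged case the table entry mentions for Mixture-of-Experts layers.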