Aliases in torch.nn#
Created On: Jul 25, 2025 | Last Updated On: Jul 25, 2025
The following are aliases to their counterparts in torch.nn in nested namespaces.
torch.nn.modules#
The following are aliases to their counterparts in torch.nn in the torch.nn.modules namespace.
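A quick way to confirm the aliasing (the classes picked here are arbitrary examples): names exported from torch.nn.modules are the same objects as the ones in torch.nn.

>>> import torch.nn as nn
>>> import torch.nn.modules as modules
>>> modules.Conv2d is nn.Conv2d
True
>>> modules.Linear is nn.Linear
True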
Containers (Aliases)#
Sequential | A sequential container.
ModuleList | Holds submodules in a list.
ModuleDict | Holds submodules in a dictionary.
ParameterList | Holds parameters in a list.
ParameterDict | Holds parameters in a dictionary.
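A minimal sketch of the containers in use (layer sizes are arbitrary; the canonical torch.nn spellings are used, which is what these aliases resolve to):

>>> import torch
>>> from torch import nn
>>> block = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
>>> layers = nn.ModuleList([nn.Linear(8, 8) for _ in range(3)])
>>> params = nn.ParameterDict({'scale': nn.Parameter(torch.ones(4))})
>>> block(torch.randn(2, 8)).shape
torch.Size([2, 4])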
Convolution Layers (Aliases)#
Conv1d | Applies a 1D convolution over an input signal composed of several input planes.
Conv2d | Applies a 2D convolution over an input signal composed of several input planes.
Conv3d | Applies a 3D convolution over an input signal composed of several input planes.
ConvTranspose1d | Applies a 1D transposed convolution operator over an input image composed of several input planes.
ConvTranspose2d | Applies a 2D transposed convolution operator over an input image composed of several input planes.
ConvTranspose3d | Applies a 3D transposed convolution operator over an input image composed of several input planes.
LazyConv1d | A torch.nn.Conv1d module with lazy initialization of the in_channels argument.
LazyConv2d | A torch.nn.Conv2d module with lazy initialization of the in_channels argument.
LazyConv3d | A torch.nn.Conv3d module with lazy initialization of the in_channels argument.
LazyConvTranspose1d | A torch.nn.ConvTranspose1d module with lazy initialization of the in_channels argument.
LazyConvTranspose2d | A torch.nn.ConvTranspose2d module with lazy initialization of the in_channels argument.
LazyConvTranspose3d | A torch.nn.ConvTranspose3d module with lazy initialization of the in_channels argument.
Unfold | Extracts sliding local blocks from a batched input tensor.
Fold | Combines an array of sliding local blocks into a large containing tensor.
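A minimal convolution sketch (channel counts, kernel size, and input shape are arbitrary); LazyConv2d infers in_channels from the first input it sees:

>>> import torch
>>> from torch import nn
>>> conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)
>>> lazy = nn.LazyConv2d(16, kernel_size=3, padding=1)   # in_channels inferred from the input
>>> x = torch.randn(1, 3, 32, 32)
>>> conv(x).shape, lazy(x).shape
(torch.Size([1, 16, 32, 32]), torch.Size([1, 16, 32, 32]))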
Pooling layers (Aliases)#
MaxPool1d | Applies a 1D max pooling over an input signal composed of several input planes.
MaxPool2d | Applies a 2D max pooling over an input signal composed of several input planes.
MaxPool3d | Applies a 3D max pooling over an input signal composed of several input planes.
MaxUnpool1d | Computes a partial inverse of MaxPool1d.
MaxUnpool2d | Computes a partial inverse of MaxPool2d.
MaxUnpool3d | Computes a partial inverse of MaxPool3d.
AvgPool1d | Applies a 1D average pooling over an input signal composed of several input planes.
AvgPool2d | Applies a 2D average pooling over an input signal composed of several input planes.
AvgPool3d | Applies a 3D average pooling over an input signal composed of several input planes.
FractionalMaxPool2d | Applies a 2D fractional max pooling over an input signal composed of several input planes.
FractionalMaxPool3d | Applies a 3D fractional max pooling over an input signal composed of several input planes.
LPPool1d | Applies a 1D power-average pooling over an input signal composed of several input planes.
LPPool2d | Applies a 2D power-average pooling over an input signal composed of several input planes.
LPPool3d | Applies a 3D power-average pooling over an input signal composed of several input planes.
AdaptiveMaxPool1d | Applies a 1D adaptive max pooling over an input signal composed of several input planes.
AdaptiveMaxPool2d | Applies a 2D adaptive max pooling over an input signal composed of several input planes.
AdaptiveMaxPool3d | Applies a 3D adaptive max pooling over an input signal composed of several input planes.
AdaptiveAvgPool1d | Applies a 1D adaptive average pooling over an input signal composed of several input planes.
AdaptiveAvgPool2d | Applies a 2D adaptive average pooling over an input signal composed of several input planes.
AdaptiveAvgPool3d | Applies a 3D adaptive average pooling over an input signal composed of several input planes.
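A small pooling sketch (shapes are arbitrary); adaptive pooling produces a fixed output size regardless of the input size:

>>> import torch
>>> from torch import nn
>>> x = torch.randn(1, 8, 32, 32)
>>> nn.MaxPool2d(kernel_size=2)(x).shape
torch.Size([1, 8, 16, 16])
>>> nn.AdaptiveAvgPool2d(1)(x).shape
torch.Size([1, 8, 1, 1])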
Padding Layers (Aliases)#
ReflectionPad1d | Pads the input tensor using the reflection of the input boundary.
ReflectionPad2d | Pads the input tensor using the reflection of the input boundary.
ReflectionPad3d | Pads the input tensor using the reflection of the input boundary.
ReplicationPad1d | Pads the input tensor using replication of the input boundary.
ReplicationPad2d | Pads the input tensor using replication of the input boundary.
ReplicationPad3d | Pads the input tensor using replication of the input boundary.
ZeroPad1d | Pads the input tensor boundaries with zero.
ZeroPad2d | Pads the input tensor boundaries with zero.
ZeroPad3d | Pads the input tensor boundaries with zero.
ConstantPad1d | Pads the input tensor boundaries with a constant value.
ConstantPad2d | Pads the input tensor boundaries with a constant value.
ConstantPad3d | Pads the input tensor boundaries with a constant value.
CircularPad1d | Pads the input tensor using circular padding of the input boundary.
CircularPad2d | Pads the input tensor using circular padding of the input boundary.
CircularPad3d | Pads the input tensor using circular padding of the input boundary.
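A padding sketch (padding amounts are arbitrary); 2D padding tuples are ordered (left, right, top, bottom):

>>> import torch
>>> from torch import nn
>>> x = torch.randn(1, 3, 8, 8)
>>> nn.ReflectionPad2d(2)(x).shape
torch.Size([1, 3, 12, 12])
>>> nn.ZeroPad2d((1, 1, 2, 2))(x).shape
torch.Size([1, 3, 12, 10])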
Non-linear Activations (weighted sum, nonlinearity) (Aliases)#
ELU | Applies the Exponential Linear Unit (ELU) function, element-wise.
Hardshrink | Applies the Hard Shrinkage (Hardshrink) function element-wise.
Hardsigmoid | Applies the Hardsigmoid function element-wise.
Hardtanh | Applies the HardTanh function element-wise.
Hardswish | Applies the Hardswish function, element-wise.
LeakyReLU | Applies the LeakyReLU function element-wise.
LogSigmoid | Applies the Logsigmoid function element-wise.
MultiheadAttention | Allows the model to jointly attend to information from different representation subspaces.
PReLU | Applies the element-wise PReLU function.
ReLU | Applies the rectified linear unit function element-wise.
ReLU6 | Applies the ReLU6 function element-wise.
RReLU | Applies the randomized leaky rectified linear unit function, element-wise.
SELU | Applies the SELU function element-wise.
CELU | Applies the CELU function element-wise.
GELU | Applies the Gaussian Error Linear Units function.
Sigmoid | Applies the Sigmoid function element-wise.
SiLU | Applies the Sigmoid Linear Unit (SiLU) function, element-wise.
Mish | Applies the Mish function, element-wise.
Softplus | Applies the Softplus function element-wise.
Softshrink | Applies the soft shrinkage function element-wise.
Softsign | Applies the element-wise Softsign function.
Tanh | Applies the Hyperbolic Tangent (Tanh) function element-wise.
Tanhshrink | Applies the element-wise Tanhshrink function.
Threshold | Thresholds each element of the input Tensor.
GLU | Applies the gated linear unit function.
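These activations are applied element-wise as (mostly) stateless modules; a minimal sketch with arbitrary values:

>>> import torch
>>> from torch import nn
>>> x = torch.tensor([-2.0, -0.5, 0.0, 1.5])
>>> y_relu = nn.ReLU()(x)                                 # negatives clamped to 0
>>> y_htanh = nn.Hardtanh(min_val=-1.0, max_val=1.0)(x)   # values clipped to [-1, 1]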
Non-linear Activations (other) (Aliases)#
Softmin | Applies the Softmin function to an n-dimensional input Tensor.
Softmax | Applies the Softmax function to an n-dimensional input Tensor.
Softmax2d | Applies SoftMax over features to each spatial location.
LogSoftmax | Applies the log(Softmax(x)) function to an n-dimensional input Tensor.
AdaptiveLogSoftmaxWithLoss | Efficient softmax approximation.
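A sketch of the softmax-family modules (dimensions are arbitrary); dim selects the axis that is normalized to sum to 1:

>>> import torch
>>> from torch import nn
>>> logits = torch.randn(2, 5)
>>> probs = nn.Softmax(dim=-1)(logits)       # each row sums to 1
>>> logp = nn.LogSoftmax(dim=-1)(logits)     # log of the same probabilities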
Normalization Layers (Aliases)#
BatchNorm1d | Applies Batch Normalization over a 2D or 3D input.
BatchNorm2d | Applies Batch Normalization over a 4D input.
BatchNorm3d | Applies Batch Normalization over a 5D input.
LazyBatchNorm1d | A torch.nn.BatchNorm1d module with lazy initialization of the num_features argument.
LazyBatchNorm2d | A torch.nn.BatchNorm2d module with lazy initialization of the num_features argument.
LazyBatchNorm3d | A torch.nn.BatchNorm3d module with lazy initialization of the num_features argument.
GroupNorm | Applies Group Normalization over a mini-batch of inputs.
SyncBatchNorm | Applies Batch Normalization over an N-dimensional input.
InstanceNorm1d | Applies Instance Normalization.
InstanceNorm2d | Applies Instance Normalization.
InstanceNorm3d | Applies Instance Normalization.
LazyInstanceNorm1d | A torch.nn.InstanceNorm1d module with lazy initialization of the num_features argument.
LazyInstanceNorm2d | A torch.nn.InstanceNorm2d module with lazy initialization of the num_features argument.
LazyInstanceNorm3d | A torch.nn.InstanceNorm3d module with lazy initialization of the num_features argument.
LayerNorm | Applies Layer Normalization over a mini-batch of inputs.
LocalResponseNorm | Applies local response normalization over an input signal.
RMSNorm | Applies Root Mean Square Layer Normalization over a mini-batch of inputs.
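A normalization sketch (shapes are arbitrary): BatchNorm2d takes the channel count, while LayerNorm takes the trailing normalized shape:

>>> import torch
>>> from torch import nn
>>> x = torch.randn(4, 8, 16, 16)
>>> bn = nn.BatchNorm2d(8)
>>> ln = nn.LayerNorm([8, 16, 16])
>>> bn(x).shape, ln(x).shape
(torch.Size([4, 8, 16, 16]), torch.Size([4, 8, 16, 16]))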
Recurrent Layers (Aliases)#
RNNBase | Base class for RNN modules (RNN, LSTM, GRU).
RNN | Apply a multi-layer Elman RNN with tanh or ReLU non-linearity to an input sequence.
LSTM | Apply a multi-layer long short-term memory (LSTM) RNN to an input sequence.
GRU | Apply a multi-layer gated recurrent unit (GRU) RNN to an input sequence.
RNNCell | An Elman RNN cell with tanh or ReLU non-linearity.
LSTMCell | A long short-term memory (LSTM) cell.
GRUCell | A gated recurrent unit (GRU) cell.
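A recurrent-layer sketch (sizes are arbitrary); with batch_first=True inputs are (batch, seq_len, features):

>>> import torch
>>> from torch import nn
>>> lstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=2, batch_first=True)
>>> x = torch.randn(3, 5, 10)
>>> output, (h_n, c_n) = lstm(x)
>>> output.shape, h_n.shape
(torch.Size([3, 5, 20]), torch.Size([2, 3, 20]))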
Transformer Layers (Aliases)#
Transformer | A basic transformer layer.
TransformerEncoder | TransformerEncoder is a stack of N encoder layers.
TransformerDecoder | TransformerDecoder is a stack of N decoder layers.
TransformerEncoderLayer | TransformerEncoderLayer is made up of self-attn and feedforward network.
TransformerDecoderLayer | TransformerDecoderLayer is made up of self-attn, multi-head-attn and feedforward network.
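A transformer sketch (model size, head count, and sequence length are arbitrary):

>>> import torch
>>> from torch import nn
>>> layer = nn.TransformerEncoderLayer(d_model=32, nhead=4, batch_first=True)
>>> encoder = nn.TransformerEncoder(layer, num_layers=2)
>>> x = torch.randn(8, 10, 32)                # (batch, seq_len, d_model)
>>> encoder(x).shape
torch.Size([8, 10, 32])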
Linear Layers (Aliases)#
Identity | A placeholder identity operator that is argument-insensitive.
Linear | Applies an affine linear transformation to the incoming data: y = xA^T + b.
Bilinear | Applies a bilinear transformation to the incoming data: y = x1^T A x2 + b.
LazyLinear | A torch.nn.Linear module where in_features is inferred.
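A sketch of the linear layers (feature sizes are arbitrary); LazyLinear fills in in_features on first use:

>>> import torch
>>> from torch import nn
>>> lazy = nn.LazyLinear(out_features=4)
>>> y = lazy(torch.randn(2, 7))
>>> lazy.in_features, y.shape
(7, torch.Size([2, 4]))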
Dropout Layers (Aliases)#
Dropout | During training, randomly zeroes some of the elements of the input tensor with probability p.
Dropout1d | Randomly zero out entire channels.
Dropout2d | Randomly zero out entire channels.
Dropout3d | Randomly zero out entire channels.
AlphaDropout | Applies Alpha Dropout over the input.
FeatureAlphaDropout | Randomly masks out entire channels.
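A dropout sketch (the probability is arbitrary); dropout layers only have an effect in training mode:

>>> import torch
>>> from torch import nn
>>> drop = nn.Dropout(p=0.5).train()
>>> x = torch.ones(4, 4)
>>> y = drop(x)               # roughly half the entries zeroed, survivors scaled by 1 / (1 - p)
>>> y_eval = drop.eval()(x)   # identity in eval mode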
Sparse Layers (Aliases)#
Embedding | A simple lookup table that stores embeddings of a fixed dictionary and size.
EmbeddingBag | Compute sums or means of 'bags' of embeddings, without instantiating the intermediate embeddings.
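An embedding sketch (vocabulary and embedding sizes are arbitrary); integer indices map to embedding vectors:

>>> import torch
>>> from torch import nn
>>> emb = nn.Embedding(num_embeddings=100, embedding_dim=16)
>>> idx = torch.tensor([[1, 2, 5], [7, 0, 3]])
>>> emb(idx).shape
torch.Size([2, 3, 16])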
Distance Functions (Aliases)#
CosineSimilarity | Returns cosine similarity between x1 and x2, computed along dim.
PairwiseDistance | Computes the pairwise distance between input vectors, or between columns of input matrices.
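A distance-function sketch (shapes are arbitrary); both modules reduce along the chosen dimension:

>>> import torch
>>> from torch import nn
>>> a, b = torch.randn(4, 32), torch.randn(4, 32)
>>> cos = nn.CosineSimilarity(dim=1)(a, b)   # shape (4,), values in [-1, 1]
>>> dist = nn.PairwiseDistance(p=2)(a, b)    # Euclidean distance per row, shape (4,)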
Loss Functions (Aliases)#
L1Loss | Creates a criterion that measures the mean absolute error (MAE) between each element in the input and target.
MSELoss | Creates a criterion that measures the mean squared error (squared L2 norm) between each element in the input and target.
CrossEntropyLoss | This criterion computes the cross entropy loss between input logits and target.
CTCLoss | The Connectionist Temporal Classification loss.
NLLLoss | The negative log likelihood loss.
PoissonNLLLoss | Negative log likelihood loss with Poisson distribution of target.
GaussianNLLLoss | Gaussian negative log likelihood loss.
KLDivLoss | The Kullback-Leibler divergence loss.
BCELoss | Creates a criterion that measures the Binary Cross Entropy between the target and the input probabilities.
BCEWithLogitsLoss | This loss combines a Sigmoid layer and the BCELoss in one single class.
MarginRankingLoss | Creates a criterion that measures the loss given inputs x1 and x2, two 1D mini-batch or 0D Tensors, and a label 1D mini-batch or 0D Tensor y (containing 1 or -1).
HingeEmbeddingLoss | Measures the loss given an input tensor x and a labels tensor y (containing 1 or -1).
MultiLabelMarginLoss | Creates a criterion that optimizes a multi-class multi-classification hinge loss (margin-based loss) between input x (a 2D mini-batch Tensor) and output y (a 2D Tensor of target class indices).
HuberLoss | Creates a criterion that uses a squared term if the absolute element-wise error falls below delta and a delta-scaled L1 term otherwise.
SmoothL1Loss | Creates a criterion that uses a squared term if the absolute element-wise error falls below beta and an L1 term otherwise.
SoftMarginLoss | Creates a criterion that optimizes a two-class classification logistic loss between input tensor x and target tensor y (containing 1 or -1).
MultiLabelSoftMarginLoss | Creates a criterion that optimizes a multi-label one-versus-all loss based on max-entropy, between input x and target y.
CosineEmbeddingLoss | Creates a criterion that measures the loss given input tensors x1 and x2 and a Tensor label y with values 1 or -1.
MultiMarginLoss | Creates a criterion that optimizes a multi-class classification hinge loss (margin-based loss) between input x (a 2D mini-batch Tensor) and output y (a 1D tensor of target class indices).
TripletMarginLoss | Creates a criterion that measures the triplet loss given input tensors x1, x2, x3 and a margin with a value greater than 0.
TripletMarginWithDistanceLoss | Creates a criterion that measures the triplet loss given input tensors a, p, and n (representing anchor, positive, and negative examples, respectively), and a nonnegative, real-valued function ("distance function") used to compute the relationship between the anchor and positive example ("positive distance") and the anchor and negative example ("negative distance").
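A loss sketch (batch size and class count are arbitrary); CrossEntropyLoss expects raw logits and integer class indices:

>>> import torch
>>> from torch import nn
>>> logits = torch.randn(8, 5)
>>> target = torch.randint(0, 5, (8,))
>>> ce = nn.CrossEntropyLoss()(logits, target)
>>> l1 = nn.L1Loss()(torch.randn(8, 3), torch.randn(8, 3))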
Vision Layers (Aliases)#
PixelShuffle | Rearrange elements in a tensor according to an upscaling factor.
PixelUnshuffle | Reverse the PixelShuffle operation.
Upsample | Upsamples a given multi-channel 1D (temporal), 2D (spatial) or 3D (volumetric) data.
UpsamplingNearest2d | Applies a 2D nearest neighbor upsampling to an input signal composed of several input channels.
UpsamplingBilinear2d | Applies a 2D bilinear upsampling to an input signal composed of several input channels.
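A vision-layer sketch (channel counts and sizes are arbitrary); PixelShuffle trades channels for spatial resolution:

>>> import torch
>>> from torch import nn
>>> x = torch.randn(1, 16, 8, 8)
>>> nn.PixelShuffle(upscale_factor=2)(x).shape     # (C / r^2, H * r, W * r)
torch.Size([1, 4, 16, 16])
>>> nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False)(x).shape
torch.Size([1, 16, 16, 16])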
Shuffle Layers (Aliases)#
ChannelShuffle | Divides and rearranges the channels in a tensor.
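A channel-shuffle sketch (the group count is arbitrary); the channel dimension must be divisible by groups and the output shape is unchanged:

>>> import torch
>>> from torch import nn
>>> x = torch.randn(1, 8, 4, 4)
>>> y = nn.ChannelShuffle(groups=2)(x)    # channels interleaved across the 2 groups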
torch.nn.utils#
The following are aliases to their counterparts in torch.nn.utils in nested namespaces.
Utility functions to clip parameter gradients.
clip_grad_norm_ | Clip the gradient norm of an iterable of parameters.
clip_grad_norm | Clip the gradient norm of an iterable of parameters.
clip_grad_value_ | Clip the gradients of an iterable of parameters at specified value.
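A gradient-clipping sketch (model and thresholds are arbitrary), using the canonical torch.nn.utils entry points that these aliases resolve to:

>>> import torch
>>> from torch import nn
>>> from torch.nn.utils import clip_grad_norm_, clip_grad_value_
>>> model = nn.Linear(10, 2)
>>> model(torch.randn(4, 10)).sum().backward()
>>> total_norm = clip_grad_norm_(model.parameters(), max_norm=1.0)   # returns the pre-clip norm
>>> clip_grad_value_(model.parameters(), clip_value=0.5)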
Utility functions to flatten and unflatten Module parameters to and from a single vector.
parameters_to_vector | Flatten an iterable of parameters into a single vector.
vector_to_parameters | Copy slices of a vector into an iterable of parameters.
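A flatten/unflatten sketch (layer sizes are arbitrary):

>>> import torch
>>> from torch import nn
>>> from torch.nn.utils import parameters_to_vector, vector_to_parameters
>>> model = nn.Linear(3, 2)
>>> vec = parameters_to_vector(model.parameters())       # shape (3 * 2 + 2,) = (8,)
>>> vector_to_parameters(vec * 0.5, model.parameters())  # copy scaled values back into the parameters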
Utility functions to fuse Modules with BatchNorm modules.
fuse_conv_bn_eval | Fuse a convolutional module and a BatchNorm module into a single, new convolutional module.
fuse_conv_bn_weights | Fuse convolutional module parameters and BatchNorm module parameters into new convolutional module parameters.
fuse_linear_bn_eval | Fuse a linear module and a BatchNorm module into a single, new linear module.
fuse_linear_bn_weights | Fuse linear module parameters and BatchNorm module parameters into new linear module parameters.
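A fusion sketch (channel counts are arbitrary); both modules must be in eval mode, and the fused convolution reproduces bn(conv(x)):

>>> import torch
>>> from torch import nn
>>> from torch.nn.utils import fuse_conv_bn_eval
>>> conv = nn.Conv2d(3, 8, kernel_size=3, padding=1).eval()
>>> bn = nn.BatchNorm2d(8).eval()
>>> fused = fuse_conv_bn_eval(conv, bn)
>>> x = torch.randn(1, 3, 16, 16)
>>> same = torch.allclose(fused(x), bn(conv(x)), atol=1e-5)   # numerically equivalent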
Utility functions to convert Module parameter memory formats.
convert_conv2d_weight_memory_format | Convert memory_format of nn.Conv2d.weight to the given memory_format.
convert_conv3d_weight_memory_format | Convert memory_format of nn.Conv3d.weight to the given memory_format.
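A memory-format sketch; the conversion updates the module's weight layout and is mainly useful on hardware that prefers channels_last:

>>> import torch
>>> from torch import nn
>>> from torch.nn.utils import convert_conv2d_weight_memory_format
>>> conv = nn.Conv2d(3, 8, kernel_size=3)
>>> conv = convert_conv2d_weight_memory_format(conv, torch.channels_last)
>>> # conv.weight is now laid out in channels_last memory format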
Utility functions to apply and remove weight normalization from Module parameters.
weight_norm | Apply weight normalization to a parameter in the given module.
remove_weight_norm | Remove the weight normalization reparameterization from a module.
spectral_norm | Apply spectral normalization to a parameter in the given module.
remove_spectral_norm | Remove the spectral normalization reparameterization from a module.
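A weight/spectral normalization sketch (layer sizes are arbitrary); each call reparameterizes the module's weight and returns the module:

>>> from torch import nn
>>> from torch.nn.utils import weight_norm, remove_weight_norm, spectral_norm
>>> layer = weight_norm(nn.Linear(16, 8))    # weight expressed as weight_g * weight_v / ||weight_v||
>>> layer = remove_weight_norm(layer)        # restore a plain weight parameter
>>> sn = spectral_norm(nn.Conv2d(3, 8, 3))   # constrains the spectral norm of the weight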
Utility functions for initializing Module parameters.
skip_init | Given a module class object and args / kwargs, instantiate the module without initializing parameters / buffers.
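A skip_init sketch (layer sizes are arbitrary); the module is constructed without running its default parameter initialization, so the caller is responsible for initializing it afterwards:

>>> from torch import nn
>>> from torch.nn.utils import skip_init
>>> layer = skip_init(nn.Linear, 10, 5)      # weight / bias allocated but left uninitialized
>>> _ = nn.init.zeros_(layer.weight)         # initialize explicitly afterwards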