RMSNorm

class torch.nn.modules.normalization.RMSNorm(normalized_shape, eps=None, elementwise_affine=True, device=None, dtype=None)

Applies Root Mean Square Layer Normalization over a mini-batch of inputs.

This layer implements the operation as described in the paper Root Mean Square Layer Normalization:

$$y_i = \frac{x_i}{\mathrm{RMS}(x)} \gamma_i, \quad \text{where} \quad \mathrm{RMS}(x) = \sqrt{\epsilon + \frac{1}{n} \sum_{i=1}^{n} x_i^2}$$

The RMS is taken over the last D dimensions, where D is the dimension of normalized_shape. For example, if normalized_shape is (3, 5) (a 2-dimensional shape), the RMS is computed over the last 2 dimensions of the input.

Parameters
  • normalized_shape (int or list or torch.Size) –

    input shape from an expected input of size

    $[* \times \text{normalized\_shape}[0] \times \text{normalized\_shape}[1] \times \ldots \times \text{normalized\_shape}[-1]]$

    If a single integer is used, it is treated as a singleton list, and this module will normalize over the last dimension, which is expected to be of that specific size.

  • eps (Optional[float]) – a value added to the denominator for numerical stability. Default: torch.finfo(x.dtype).eps

  • elementwise_affine (bool) – if set to True, this module has learnable per-element affine weight parameters, initialized to ones. Default: True.

Shape:
  • Input: $(N, *)$

  • Output: $(N, *)$ (same shape as input)

Examples:

>>> rms_norm = nn.RMSNorm([2, 3])
>>> input = torch.randn(2, 2, 3)
>>> rms_norm(input)
extra_repr()

Return the extra representation of the module.

Return type

str

forward(x)

Runs the forward pass.

Return type

Tensor

reset_parameters()

Resets parameters based on the initialization used in __init__.