torch.nn.init#

Warning

All the functions in this module are intended to be used to initialize neural network parameters, so they all run in torch.no_grad() mode and will not be taken into account by autograd.

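A minimal doctest-style sketch of the consequence (the layer shape is illustrative): because the fill runs under torch.no_grad(), it can modify an nn.Parameter in place without raising the usual autograd error for leaf tensors that require grad.

>>> layer = nn.Linear(5, 3)
>>> _ = nn.init.xavier_uniform_(layer.weight)  # in-place fill; no autograd history recorded
>>> layer.weight.requires_grad  # the parameter still participates in training
True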

torch.nn.init.calculate_gain(nonlinearity,param=None)[source]#

Return the recommended gain value for the given nonlinearity function.

The values are as follows:

nonlinearity          gain
Linear / Identity     1
Conv{1,2,3}D          1
Sigmoid               1
Tanh                  \frac{5}{3}
ReLU                  \sqrt{2}
Leaky ReLU            \sqrt{\frac{2}{1 + \text{negative\_slope}^2}}
SELU                  \frac{3}{4}

Warning

In order to implement Self-Normalizing Neural Networks, you should use nonlinearity='linear' instead of nonlinearity='selu'. This gives the initial weights a variance of 1/N, which is necessary to induce a stable fixed point in the forward pass. In contrast, the default gain for SELU sacrifices the normalization effect for more stable gradient flow in rectangular layers.

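A hedged sketch of the recommendation (the layer width 256 and the choice of kaiming_normal_ are illustrative, not prescribed here): a fan_in-based init with nonlinearity='linear' yields weight variance 1/N.

>>> w = torch.empty(256, 256)  # N = fan_in = 256
>>> nn.init.kaiming_normal_(w, mode="fan_in", nonlinearity="linear")  # std = 1/sqrt(N), i.e. variance 1/N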

Parameters
  • nonlinearity (Literal['linear','conv1d','conv2d','conv3d','conv_transpose1d','conv_transpose2d','conv_transpose3d','sigmoid','tanh','relu','leaky_relu','selu']) – the non-linear function (nn.functional name)

  • param (Optional[Union[int,float]]) – optional parameter for the non-linear function

Return type

float

Examples

>>> gain = nn.init.calculate_gain("leaky_relu", 0.2)  # leaky_relu with negative_slope=0.2

torch.nn.init.uniform_(tensor,a=0.0,b=1.0,generator=None)[source]#

Fill the input Tensor with values drawn from the uniform distribution \mathcal{U}(a, b).

Parameters
  • tensor (Tensor) – an n-dimensional torch.Tensor

  • a (float) – the lower bound of the uniform distribution

  • b (float) – the upper bound of the uniform distribution

  • generator (Optional[Generator]) – the torch Generator to sample from (default: None)

Return type

Tensor

Examples

>>> w = torch.empty(3, 5)
>>> nn.init.uniform_(w)

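The generator argument pins down the draw for reproducibility; a short illustrative sketch (the seed and bounds are arbitrary):

>>> g = torch.Generator().manual_seed(0)
>>> nn.init.uniform_(w, a=-0.1, b=0.1, generator=g)
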
torch.nn.init.normal_(tensor,mean=0.0,std=1.0,generator=None)[source]#

Fill the input Tensor with values drawn from the normal distribution \mathcal{N}(\text{mean}, \text{std}^2).

Parameters
  • tensor (Tensor) – an n-dimensional torch.Tensor

  • mean (float) – the mean of the normal distribution

  • std (float) – the standard deviation of the normal distribution

  • generator (Optional[Generator]) – the torch Generator to sample from (default: None)

Return type

Tensor

Examples

>>> w = torch.empty(3, 5)
>>> nn.init.normal_(w)

torch.nn.init.constant_(tensor,val)[source]#

Fill the input Tensor with the value \text{val}.

Parameters
  • tensor (Tensor) – an n-dimensional torch.Tensor

  • val (float) – the value to fill the tensor with

Return type

Tensor

Examples

>>> w = torch.empty(3, 5)
>>> nn.init.constant_(w, 0.3)

torch.nn.init.ones_(tensor)[source]#

Fill the input Tensor with the scalar value1.

Parameters

tensor (Tensor) – an n-dimensional torch.Tensor

Return type

Tensor

Examples

>>> w = torch.empty(3, 5)
>>> nn.init.ones_(w)

torch.nn.init.zeros_(tensor)[source]#

Fill the input Tensor with the scalar value0.

Parameters

tensor (Tensor) – an n-dimensional torch.Tensor

Return type

Tensor

Examples

>>> w = torch.empty(3, 5)
>>> nn.init.zeros_(w)

torch.nn.init.eye_(tensor)[source]#

Fill the 2-dimensional input Tensor with the identity matrix.

Preserves the identity of the inputs in Linear layers, where as many inputs are preserved as possible.

Parameters

tensor (Tensor) – a 2-dimensional torch.Tensor

Return type

Tensor

Examples

>>> w = torch.empty(3, 5)
>>> nn.init.eye_(w)

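An illustrative check of the identity-preserving claim (not from the original docs): with the eye-initialized 3×5 weight above, a Linear-style product reproduces the first three input features.

>>> x = torch.randn(2, 5)
>>> bool(torch.allclose(x @ w.T, x[:, :3]))
True
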
torch.nn.init.dirac_(tensor,groups=1)[source]#

Fill the {3, 4, 5}-dimensional input Tensor with the Dirac delta function.

Preserves the identity of the inputs in Convolutional layers, where as many input channels are preserved as possible. In case of groups > 1, each group of channels preserves identity.

Parameters
  • tensor (Tensor) – a {3, 4, 5}-dimensional torch.Tensor

  • groups (int,optional) – number of groups in the conv layer (default: 1)

Return type

Tensor

Examples

>>> w = torch.empty(3, 16, 5, 5)
>>> nn.init.dirac_(w)
>>> w = torch.empty(3, 24, 5, 5)
>>> nn.init.dirac_(w, 3)

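An illustrative sketch of identity preservation (the shapes are chosen for the demo): 1×1 Dirac kernels with matching channel counts pass the input through a convolution unchanged.

>>> import torch.nn.functional as F
>>> w = torch.empty(3, 3, 1, 1)
>>> _ = nn.init.dirac_(w)
>>> x = torch.randn(1, 3, 8, 8)
>>> bool(torch.allclose(F.conv2d(x, w), x))
True
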
torch.nn.init.xavier_uniform_(tensor,gain=1.0,generator=None)[source]#

Fill the input Tensor with values using a Xavier uniform distribution.

The method is described in Understanding the difficulty of training deep feedforward neural networks - Glorot, X. & Bengio, Y. (2010). The resulting tensor will have values sampled from \mathcal{U}(-a, a) where

a = \text{gain} \times \sqrt{\frac{6}{\text{fan\_in} + \text{fan\_out}}}

Also known as Glorot initialization.

Parameters
  • tensor (Tensor) – an n-dimensional torch.Tensor

  • gain (float) – an optional scaling factor

  • generator (Optional[Generator]) – the torch Generator to sample from (default: None)

Return type

Tensor

Examples

>>> w = torch.empty(3, 5)
>>> nn.init.xavier_uniform_(w, gain=nn.init.calculate_gain("relu"))

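A quick numeric sanity check of the bound above (an illustrative sketch): for a 3×5 weight, fan_in = 5 and fan_out = 3, so with gain = 1 the bound is \sqrt{6/8} \approx 0.866.

>>> import math
>>> _ = nn.init.xavier_uniform_(w)  # default gain = 1
>>> bool(w.abs().max() <= math.sqrt(6 / (5 + 3)))
True
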
torch.nn.init.xavier_normal_(tensor,gain=1.0,generator=None)[source]#

Fill the input Tensor with values using a Xavier normal distribution.

The method is described in Understanding the difficulty of training deep feedforward neural networks - Glorot, X. & Bengio, Y. (2010). The resulting tensor will have values sampled from \mathcal{N}(0, \text{std}^2) where

\text{std} = \text{gain} \times \sqrt{\frac{2}{\text{fan\_in} + \text{fan\_out}}}

Also known as Glorot initialization.

Parameters
  • tensor (Tensor) – an n-dimensional torch.Tensor

  • gain (float) – an optional scaling factor

  • generator (Optional[Generator]) – the torch Generator to sample from (default: None)

Return type

Tensor

Examples

>>> w = torch.empty(3, 5)
>>> nn.init.xavier_normal_(w)

torch.nn.init.kaiming_uniform_(tensor,a=0,mode='fan_in',nonlinearity='leaky_relu',generator=None)[source]#

Fill the input Tensor with values using a Kaiming uniform distribution.

The method is described in Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification - He, K. et al. (2015). The resulting tensor will have values sampled from \mathcal{U}(-\text{bound}, \text{bound}) where

\text{bound} = \text{gain} \times \sqrt{\frac{3}{\text{fan\_mode}}}

Also known as He initialization.

Parameters
  • tensor (Tensor) – an n-dimensional torch.Tensor

  • a (float) – the negative slope of the rectifier used after this layer (only used with 'leaky_relu')

  • mode (Literal['fan_in','fan_out']) – either 'fan_in' (default) or 'fan_out'. Choosing 'fan_in' preserves the magnitude of the variance of the weights in the forward pass. Choosing 'fan_out' preserves the magnitudes in the backward pass.

  • nonlinearity (Literal['linear','conv1d','conv2d','conv3d','conv_transpose1d','conv_transpose2d','conv_transpose3d','sigmoid','tanh','relu','leaky_relu','selu']) – the non-linear function (nn.functional name), recommended for use only with 'relu' or 'leaky_relu' (default).

  • generator (Optional[Generator]) – the torch Generator to sample from (default: None)

Return type

Tensor

Examples

>>> w = torch.empty(3, 5)
>>> nn.init.kaiming_uniform_(w, mode="fan_in", nonlinearity="relu")

Note

Be aware that fan_in and fan_out are calculated assuming that the weight matrix is used in a transposed manner (i.e., x @ w.T in Linear layers, where w.shape = [fan_out, fan_in]). This is important for correct initialization. If you plan to use x @ w, where w.shape = [fan_in, fan_out], pass in a transposed weight matrix, i.e. nn.init.kaiming_uniform_(w.T, ...).

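A short sketch of the transposed-use case described in the note (the shapes are illustrative):

>>> w = torch.empty(128, 64)  # to be used as x @ w, so w.shape = [fan_in, fan_out]
>>> nn.init.kaiming_uniform_(w.T, mode="fan_in", nonlinearity="relu")  # fans computed from the [fan_out, fan_in] view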

torch.nn.init.kaiming_normal_(tensor,a=0,mode='fan_in',nonlinearity='leaky_relu',generator=None)[source]#

Fill the input Tensor with values using a Kaiming normal distribution.

The method is described in Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification - He, K. et al. (2015). The resulting tensor will have values sampled from \mathcal{N}(0, \text{std}^2) where

\text{std} = \frac{\text{gain}}{\sqrt{\text{fan\_mode}}}

Also known as He initialization.

Parameters
  • tensor (Tensor) – an n-dimensional torch.Tensor

  • a (float) – the negative slope of the rectifier used after this layer (only used with 'leaky_relu')

  • mode (Literal['fan_in','fan_out']) – either 'fan_in' (default) or 'fan_out'. Choosing 'fan_in' preserves the magnitude of the variance of the weights in the forward pass. Choosing 'fan_out' preserves the magnitudes in the backward pass.

  • nonlinearity (Literal['linear','conv1d','conv2d','conv3d','conv_transpose1d','conv_transpose2d','conv_transpose3d','sigmoid','tanh','relu','leaky_relu','selu']) – the non-linear function (nn.functional name), recommended for use only with 'relu' or 'leaky_relu' (default).

  • generator (Optional[Generator]) – the torch Generator to sample from (default: None)

Return type

Tensor

Examples

>>> w = torch.empty(3, 5)
>>> nn.init.kaiming_normal_(w, mode="fan_out", nonlinearity="relu")

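A quick empirical check of the std formula above (an illustrative sketch): with fan_in = 400 and the ReLU gain \sqrt{2}, the target std is \sqrt{2/400} \approx 0.0707, which the sample std of a 100×400 tensor should approximate closely.

>>> import math
>>> w = torch.empty(100, 400)  # fan_in = 400
>>> _ = nn.init.kaiming_normal_(w, mode="fan_in", nonlinearity="relu")
>>> bool(abs(w.std().item() - math.sqrt(2 / 400)) < 0.005)
True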

Note

Be aware that fan_in and fan_out are calculated assuming that the weight matrix is used in a transposed manner (i.e., x @ w.T in Linear layers, where w.shape = [fan_out, fan_in]). This is important for correct initialization. If you plan to use x @ w, where w.shape = [fan_in, fan_out], pass in a transposed weight matrix, i.e. nn.init.kaiming_normal_(w.T, ...).

torch.nn.init.trunc_normal_(tensor,mean=0.0,std=1.0,a=-2.0,b=2.0,generator=None)[source]#

Fill the input Tensor with values drawn from a truncated normal distribution.

The values are effectively drawn from the normal distribution \mathcal{N}(\text{mean}, \text{std}^2) with values outside [a, b] redrawn until they are within the bounds. The method used for generating the random values works best when a \leq \text{mean} \leq b.

Parameters
  • tensor (Tensor) – an n-dimensional torch.Tensor

  • mean (float) – the mean of the normal distribution

  • std (float) – the standard deviation of the normal distribution

  • a (float) – the minimum cutoff value

  • b (float) – the maximum cutoff value

  • generator (Optional[Generator]) – the torch Generator to sample from (default: None)

Return type

Tensor

Examples

>>> w = torch.empty(3, 5)
>>> nn.init.trunc_normal_(w)

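A doctest-style check that the cutoffs hold (the parameters are illustrative):

>>> _ = nn.init.trunc_normal_(w, mean=0.0, std=0.5, a=-1.0, b=1.0)
>>> bool(((w >= -1.0) & (w <= 1.0)).all())
True
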
torch.nn.init.orthogonal_(tensor,gain=1,generator=None)[source]#

Fill the input Tensor with a (semi) orthogonal matrix.

Described in Exact solutions to the nonlinear dynamics of learning in deep linear neural networks - Saxe, A. et al. (2013). The input tensor must have at least 2 dimensions, and for tensors with more than 2 dimensions the trailing dimensions are flattened.

Parameters
  • tensor (Tensor) – an n-dimensional torch.Tensor, where n \geq 2

  • gain (float) – optional scaling factor

  • generator (Optional[Generator]) – the torch Generator to sample from (default: None)

Return type

Tensor

Examples

>>> w = torch.empty(3, 5)
>>> nn.init.orthogonal_(w)

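An illustrative check (not from the original docs) that the rows come out orthonormal when the row count does not exceed the column count:

>>> _ = nn.init.orthogonal_(w)
>>> bool(torch.allclose(w @ w.T, torch.eye(3), atol=1e-6))
True
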
torch.nn.init.sparse_(tensor,sparsity,std=0.01,generator=None)[source]#

Fill the 2D input Tensor as a sparse matrix.

The non-zero elements will be drawn from the normal distribution \mathcal{N}(0, \text{std}^2), as described in Deep learning via Hessian-free optimization - Martens, J. (2010).

Parameters
  • tensor (Tensor) – a 2-dimensional torch.Tensor

  • sparsity (float) – the fraction of elements in each column to be set to zero

  • std (float) – the standard deviation of the normal distribution used to generate the non-zero values

  • generator (Optional[Generator]) – the torch Generator to sample from (default: None)

Return type

Tensor

Examples

>>> w = torch.empty(3, 5)
>>> nn.init.sparse_(w, sparsity=0.1)

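A hedged check of the per-column sparsity (assuming none of the Gaussian draws lands exactly on zero): with 4 rows and sparsity=0.5, each column receives ceil(0.5 × 4) = 2 zeros.

>>> w = torch.empty(4, 5)
>>> _ = nn.init.sparse_(w, sparsity=0.5)
>>> (w == 0).sum(dim=0)
tensor([2, 2, 2, 2, 2])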