RNN
- class torch.nn.modules.rnn.RNN(input_size, hidden_size, num_layers=1, nonlinearity='tanh', bias=True, batch_first=False, dropout=0.0, bidirectional=False, device=None, dtype=None)
Apply a multi-layer Elman RNN with tanh or ReLU non-linearity to an input sequence. For each element in the input sequence, each layer computes the following function:

$$h_t = \tanh(x_t W_{ih}^T + b_{ih} + h_{t-1} W_{hh}^T + b_{hh})$$

where $h_t$ is the hidden state at time t, $x_t$ is the input at time t, and $h_{t-1}$ is the hidden state of the same layer at time t-1 or the initial hidden state at time 0. If nonlinearity is 'relu', then ReLU is used instead of tanh.

    # Efficient implementation equivalent to the following with bidirectional=False
    # (input_size, hidden_size and num_layers are assumed to be defined by the caller)
    import torch
    from torch import nn

    rnn = nn.RNN(input_size, hidden_size, num_layers)
    params = dict(rnn.named_parameters())

    def forward(x, hx=None, batch_first=False):
        if batch_first:
            x = x.transpose(0, 1)  # work in (seq, batch, feature) layout
        seq_len, batch_size, _ = x.size()
        if hx is None:
            hx = torch.zeros(rnn.num_layers, batch_size, rnn.hidden_size)
        h_t_minus_1 = hx.clone()
        h_t = hx.clone()
        output = []
        for t in range(seq_len):
            for layer in range(rnn.num_layers):
                # Layer 0 reads the input; deeper layers read the layer below.
                input_t = x[t] if layer == 0 else h_t[layer - 1]
                h_t[layer] = torch.tanh(
                    input_t @ params[f"weight_ih_l{layer}"].T
                    + h_t_minus_1[layer] @ params[f"weight_hh_l{layer}"].T
                    + params[f"bias_hh_l{layer}"]
                    + params[f"bias_ih_l{layer}"]
                )
            output.append(h_t[-1].clone())  # keep the top layer's state
            h_t_minus_1 = h_t.clone()
        output = torch.stack(output)
        if batch_first:
            output = output.transpose(0, 1)
        return output, h_t
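As a quick sanity check of the update rule, the sketch below applies it by hand for a single-layer RNN with `nonlinearity='relu'` and compares the result with the module's own output. The tensor sizes and variable names (`manual`, `expected`, `max_abs_diff`) are arbitrary choices for illustration, not part of the API.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Illustrative sizes, chosen arbitrarily for this sketch.
seq_len, batch_size, input_size, hidden_size = 4, 2, 3, 5

rnn = nn.RNN(input_size, hidden_size, num_layers=1, nonlinearity="relu")
x = torch.randn(seq_len, batch_size, input_size)
h0 = torch.zeros(1, batch_size, hidden_size)

with torch.no_grad():
    expected, _ = rnn(x, h0)

    # Apply the update rule by hand, one time step at a time.
    h = h0[0]
    manual_steps = []
    for t in range(seq_len):
        h = torch.relu(
            x[t] @ rnn.weight_ih_l0.T + rnn.bias_ih_l0
            + h @ rnn.weight_hh_l0.T + rnn.bias_hh_l0
        )
        manual_steps.append(h)
    manual = torch.stack(manual_steps)

max_abs_diff = (manual - expected).abs().max().item()
print(f"max abs difference: {max_abs_diff:.2e}")
```

The two results should agree up to floating-point rounding; swapping `torch.relu` for `torch.tanh` and dropping the `nonlinearity` argument checks the default case the same way.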
- Parameters
input_size – The number of expected features in the input x
hidden_size – The number of features in the hidden state h
num_layers – Number of recurrent layers. E.g., setting num_layers=2 would mean stacking two RNNs together to form a stacked RNN, with the second RNN taking in outputs of the first RNN and computing the final results. Default: 1
nonlinearity – The non-linearity to use. Can be either 'tanh' or 'relu'. Default: 'tanh'
bias – If False, then the layer does not use bias weights b_ih and b_hh. Default: True
batch_first – If True, then the input and output tensors are provided as (batch, seq, feature) instead of (seq, batch, feature). Note that this does not apply to hidden states. See the Inputs/Outputs sections below for details. Default: False
dropout – If non-zero, introduces a Dropout layer on the outputs of each RNN layer except the last layer, with dropout probability equal to dropout. Default: 0
bidirectional – If True, becomes a bidirectional RNN. Default: False
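To make the effect of `batch_first`, `num_layers` and `bidirectional` on tensor layouts concrete, here is a small sketch. The sizes are arbitrary, and the second call shows an unbatched input, for which `batch_first` has no effect.

```python
import torch
from torch import nn

batch, seq_len, input_size, hidden_size = 3, 5, 10, 20

rnn = nn.RNN(input_size, hidden_size, num_layers=2,
             batch_first=True, bidirectional=True)

# Batched input in (batch, seq, feature) layout because batch_first=True.
x = torch.randn(batch, seq_len, input_size)
output, h_n = rnn(x)
print(output.shape)  # torch.Size([3, 5, 40]): two directions concatenated
print(h_n.shape)     # torch.Size([4, 3, 20]): hidden state stays layer-first

# Unbatched input is always (seq, feature).
x_unbatched = torch.randn(seq_len, input_size)
output_u, h_n_u = rnn(x_unbatched)
print(output_u.shape)  # torch.Size([5, 40])
print(h_n_u.shape)     # torch.Size([4, 20])
```

Note that only `output` follows the `batch_first` layout; `h_n` keeps its leading `num_directions * num_layers` dimension in both cases.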
- Inputs: input, hx
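input: tensor of shape $(L, H_{in})$ for unbatched input, $(L, N, H_{in})$ when batch_first=False or $(N, L, H_{in})$ when batch_first=True containing the features of the input sequence. The input can also be a packed variable length sequence. See torch.nn.utils.rnn.pack_padded_sequence() or torch.nn.utils.rnn.pack_sequence() for details.
hx: tensor of shape $(D * \text{num\_layers}, H_{out})$ for unbatched input or $(D * \text{num\_layers}, N, H_{out})$ containing the initial hidden state for the input sequence batch. Defaults to zeros if not provided.
where:
$N$ = batch size
$L$ = sequence length
$D$ = 2 if bidirectional=True, otherwise 1
$H_{in}$ = input_size
$H_{out}$ = hidden_size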
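The packed-sequence input path can be sketched as follows. The lengths and sizes are made up for illustration; `pack_padded_sequence` and `pad_packed_sequence` are the utilities from `torch.nn.utils.rnn`, and the lengths are given in descending order because that is what the default `enforce_sorted=True` expects.

```python
import torch
from torch import nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)

rnn = nn.RNN(input_size=4, hidden_size=6)

# Three padded sequences with true lengths 5, 3 and 2, in (seq, batch, feature) layout.
lengths = torch.tensor([5, 3, 2])
padded = torch.randn(5, 3, 4)

packed = pack_padded_sequence(padded, lengths)
packed_out, h_n = rnn(packed)          # output is a PackedSequence as well

# Convert back to a padded tensor; positions past each length are zero-filled.
output, out_lengths = pad_packed_sequence(packed_out)
print(output.shape)       # torch.Size([5, 3, 6])
print(out_lengths)        # tensor([5, 3, 2])
```

With packed input, `h_n` holds the state at each sequence's own last valid step, so for the second sequence (length 3) `h_n[0, 1]` should match `output[2, 1]` rather than `output[-1, 1]`.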
- Outputs: output, h_n
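output: tensor of shape $(L, D * H_{out})$ for unbatched input, $(L, N, D * H_{out})$ when batch_first=False or $(N, L, D * H_{out})$ when batch_first=True containing the output features (h_t) from the last layer of the RNN, for each t. If a torch.nn.utils.rnn.PackedSequence has been given as the input, the output will also be a packed sequence.
h_n: tensor of shape $(D * \text{num\_layers}, H_{out})$ for unbatched input or $(D * \text{num\_layers}, N, H_{out})$ containing the final hidden state for each element in the batch.
Here $L$ is the sequence length, $N$ the batch size, $D$ the number of directions (2 if bidirectional=True, otherwise 1), and $H_{out}$ = hidden_size.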
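The relationship between `output` and `h_n` is easiest to see on a single-layer bidirectional RNN. The sketch below (arbitrary sizes, illustrative variable names) splits the last dimension of `output` into its two directions and compares the relevant time steps with `h_n`.

```python
import torch
from torch import nn

torch.manual_seed(0)

seq_len, batch, input_size, hidden_size = 5, 3, 10, 20
rnn = nn.RNN(input_size, hidden_size, num_layers=1, bidirectional=True)

x = torch.randn(seq_len, batch, input_size)
output, h_n = rnn(x)          # output: (5, 3, 40), h_n: (2, 3, 20)

# Separate the two directions: index 0 is forward, index 1 is backward.
split = output.view(seq_len, batch, 2, hidden_size)
forward_out, backward_out = split[:, :, 0], split[:, :, 1]

# The forward direction finishes at the last time step,
# the backward direction finishes at the first one.
forward_matches = torch.allclose(forward_out[-1], h_n[0])
backward_matches = torch.allclose(backward_out[0], h_n[1])
print(forward_matches, backward_matches)
```

Both comparisons are expected to hold. A common mistake is to take `output[-1]` as the "final" state of a bidirectional RNN, which mixes the finished forward state with the backward state after only one step.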
- Variables
weight_ih_l[k] – the learnable input-hidden weights of the k-th layer, of shape (hidden_size, input_size) for k = 0. Otherwise, the shape is (hidden_size, num_directions * hidden_size)
weight_hh_l[k] – the learnable hidden-hidden weights of the k-th layer, of shape (hidden_size, hidden_size)
bias_ih_l[k] – the learnable input-hidden bias of the k-th layer, of shape (hidden_size)
bias_hh_l[k] – the learnable hidden-hidden bias of the k-th layer, of shape (hidden_size)
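One way to confirm these shapes is to list the module's named parameters. The sketch below uses a two-layer bidirectional RNN with arbitrary sizes; the `_reverse` suffix on the backward-direction parameters is what PyTorch reports at runtime and is not described in the list above.

```python
from torch import nn

rnn = nn.RNN(input_size=10, hidden_size=20, num_layers=2, bidirectional=True)

# Map each parameter name to its shape.
shapes = {name: tuple(p.shape) for name, p in rnn.named_parameters()}
for name, shape in shapes.items():
    print(f"{name:24s} {shape}")
```

For layer 0 the input-hidden weight has `input_size` columns, while for layer 1 it has `num_directions * hidden_size = 40` columns, because it consumes the concatenated outputs of both directions of the layer below.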
Note
All the weights and biases are initialized from $\mathcal{U}(-\sqrt{k}, \sqrt{k})$ where $k = \frac{1}{\text{hidden\_size}}$
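The initialization range can be checked empirically. The sketch below assumes the uniform range $\mathcal{U}(-\sqrt{k}, \sqrt{k})$ with $k = 1 / \text{hidden\_size}$ and verifies that every freshly created parameter lies within it; `hidden_size = 64` is picked so that the bound is exactly 0.125.

```python
import math
from torch import nn

hidden_size = 64
rnn = nn.RNN(input_size=8, hidden_size=hidden_size, num_layers=2)

# sqrt(k) with k = 1 / hidden_size
bound = math.sqrt(1.0 / hidden_size)

# Every weight and bias entry should fall inside [-bound, bound].
within_bound = all(
    bool((p.detach().abs() <= bound).all()) for p in rnn.parameters()
)
print(f"bound = {bound:.4f}, all parameters within bound: {within_bound}")
```

This only confirms the range, not that the distribution is uniform; a histogram of a large weight matrix would be the next step for that.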
Note
For bidirectional RNNs, forward and backward are directions 0 and 1 respectively. Example of splitting the output layers when batch_first=False: output.view(seq_len, batch, num_directions, hidden_size).
Note
The batch_first argument is ignored for unbatched inputs.
Warning
There are known non-determinism issues for RNN functions on some versions of cuDNN and CUDA. You can enforce deterministic behavior by setting the following environment variables:
On CUDA 10.1, set environment variable CUDA_LAUNCH_BLOCKING=1. This may affect performance.
On CUDA 10.2 or later, set environment variable (note the leading colon symbol) CUBLAS_WORKSPACE_CONFIG=:16:8 or CUBLAS_WORKSPACE_CONFIG=:4096:2.
See the cuDNN 8 Release Notes for more information.
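A minimal sketch of applying the CUDA 10.2+ setting from Python is shown below. It assumes the variable has to be in the environment before any CUDA work starts, so it is set before `torch` is imported; exporting it in the shell before launching the process works just as well. The helper `run_once` is hypothetical and simply repeats the same seeded computation, falling back to the CPU when no GPU is present.

```python
import os

# Set before CUDA/cuBLAS is initialized; note the leading colon in the value.
os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:2")

import torch
from torch import nn

device = "cuda" if torch.cuda.is_available() else "cpu"


def run_once(seed: int = 0) -> torch.Tensor:
    """Build a fresh RNN and input from the same seed and return its output."""
    torch.manual_seed(seed)
    rnn = nn.RNN(10, 20, 2).to(device)
    x = torch.randn(5, 3, 10, device=device)
    output, _ = rnn(x)
    return output.detach().cpu()


first, second = run_once(), run_once()
print(torch.equal(first, second))
```

Comparing two seeded runs like this is a cheap way to see whether a given CUDA and cuDNN combination is affected, although a single matching pair does not prove determinism in general.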
Note
If the following conditions are satisfied: 1) cudnn is enabled, 2) input data is on the GPU, 3) input data has dtype torch.float16, 4) V100 GPU is used, 5) input data is not in PackedSequence format, then the persistent algorithm can be selected to improve performance.
Examples:
>>> import torch
>>> from torch import nn
>>> rnn = nn.RNN(10, 20, 2)
>>> input = torch.randn(5, 3, 10)
>>> h0 = torch.randn(2, 3, 20)
>>> output, hn = rnn(input, h0)
- forward(input: Tensor, hx: Optional[Tensor] = None) → tuple[torch.Tensor, torch.Tensor]
- forward(input: PackedSequence, hx: Optional[Tensor] = None) → tuple[torch.nn.utils.rnn.PackedSequence, torch.Tensor]
Runs the forward pass.
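Because `forward` accepts an optional `hx` and returns the final hidden state, a long sequence can be processed in pieces by feeding the `h_n` of one call into the next. The sketch below (arbitrary sizes and an arbitrary split point, with `dropout` left at 0) compares that with a single call over the whole sequence.

```python
import torch
from torch import nn

torch.manual_seed(0)

rnn = nn.RNN(10, 20, 2)
x = torch.randn(6, 3, 10)              # (seq, batch, feature)

with torch.no_grad():
    # hx omitted, so the initial hidden state defaults to zeros.
    full_out, full_h = rnn(x)

    # Same sequence in two chunks, carrying the hidden state across.
    out_a, h_a = rnn(x[:4])
    out_b, h_b = rnn(x[4:], h_a)
    chunked_out = torch.cat([out_a, out_b], dim=0)

outputs_match = torch.allclose(full_out, chunked_out, atol=1e-5)
states_match = torch.allclose(full_h, h_b, atol=1e-5)
print(outputs_match, states_match)
```

The results should agree up to floating-point rounding. During training, the carried state is usually detached between chunks so that gradients do not flow through the entire history.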