LSTM
- class torch.nn.LSTM(input_size, hidden_size, num_layers=1, bias=True, batch_first=False, dropout=0.0, bidirectional=False, proj_size=0, device=None, dtype=None)
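Apply a multi-layer long short-term memory (LSTM) RNN to an input sequence. For each element in the input sequence, each layer computes the following function:

$$
\begin{aligned}
i_t &= \sigma(W_{ii} x_t + b_{ii} + W_{hi} h_{t-1} + b_{hi}) \\
f_t &= \sigma(W_{if} x_t + b_{if} + W_{hf} h_{t-1} + b_{hf}) \\
g_t &= \tanh(W_{ig} x_t + b_{ig} + W_{hg} h_{t-1} + b_{hg}) \\
o_t &= \sigma(W_{io} x_t + b_{io} + W_{ho} h_{t-1} + b_{ho}) \\
c_t &= f_t \odot c_{t-1} + i_t \odot g_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$

where $h_t$ is the hidden state at time $t$, $c_t$ is the cell state at time $t$, $x_t$ is the input at time $t$, $h_{t-1}$ is the hidden state of the layer at time $t-1$ or the initial hidden state at time $0$, and $i_t$, $f_t$, $g_t$, $o_t$ are the input, forget, cell, and output gates, respectively. $\sigma$ is the sigmoid function, and $\odot$ is the Hadamard product.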
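As a numerical check of the per-step computation described above, the sketch below evaluates one time step by hand from the layer's registered parameters and compares it with `nn.LSTM`. It is an illustration, not how PyTorch implements the layer internally. It assumes the gate blocks in `weight_ih_l0` / `weight_hh_l0` are stacked in (input, forget, cell, output) order, as the Variables section describes; `lstm_cell_step` is a hypothetical helper name.

```python
import torch
from torch import nn

torch.manual_seed(0)


def lstm_cell_step(x_t, h_prev, c_prev, w_ih, w_hh, b_ih, b_hh):
    """One time step of the LSTM equations (hypothetical helper).

    w_ih / w_hh stack the gate weights as (W_i | W_f | W_g | W_o) along dim 0,
    the same layout as nn.LSTM's weight_ih_l0 / weight_hh_l0.
    """
    gates = x_t @ w_ih.T + b_ih + h_prev @ w_hh.T + b_hh
    i, f, g, o = gates.chunk(4, dim=-1)
    i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
    g = torch.tanh(g)
    c_t = f * c_prev + i * g          # element-wise (Hadamard) products
    h_t = o * torch.tanh(c_t)
    return h_t, c_t


input_size, hidden_size, batch = 4, 3, 2
lstm = nn.LSTM(input_size, hidden_size)      # one layer, (seq, batch, feature) layout
x = torch.randn(1, batch, input_size)        # a single time step
h0 = torch.randn(1, batch, hidden_size)
c0 = torch.randn(1, batch, hidden_size)

with torch.no_grad():
    out, (hn, cn) = lstm(x, (h0, c0))
    h1, c1 = lstm_cell_step(
        x[0], h0[0], c0[0],
        lstm.weight_ih_l0, lstm.weight_hh_l0, lstm.bias_ih_l0, lstm.bias_hh_l0,
    )

print(torch.allclose(h1, hn[0], atol=1e-5), torch.allclose(c1, cn[0], atol=1e-5))
```

Tolerances are used rather than exact equality because the library may sum the four terms in a different order.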
In a multilayer LSTM, the input $x^{(l)}_t$ of the $l$-th layer ($l \ge 2$) is the hidden state $h^{(l-1)}_t$ of the previous layer multiplied by dropout $\delta^{(l-1)}_t$, where each $\delta^{(l-1)}_t$ is a Bernoulli random variable which is $0$ with probability dropout.

If proj_size > 0 is specified, LSTM with projections will be used. This changes the LSTM cell in the following way. First, the dimension of $h_t$ will be changed from hidden_size to proj_size (dimensions of $W_{hi}$ will be changed accordingly). Second, the output hidden state of each layer will be multiplied by a learnable projection matrix: $h_t = W_{hr} h_t$. Note that, as a consequence, the output of the LSTM network will have a different shape as well. See the Inputs/Outputs sections below for the exact dimensions of all variables. You can find more details in https://arxiv.org/abs/1402.1128.
- Parameters
input_size – The number of expected features in the input x
hidden_size – The number of features in the hidden state h
num_layers – Number of recurrent layers. E.g., setting num_layers=2 would mean stacking two LSTMs together to form a stacked LSTM, with the second LSTM taking in outputs of the first LSTM and computing the final results. Default: 1
bias – If False, then the layer does not use bias weights b_ih and b_hh. Default: True
batch_first – If True, then the input and output tensors are provided as (batch, seq, feature) instead of (seq, batch, feature). Note that this does not apply to hidden or cell states. See the Inputs/Outputs sections below for details. Default: False
dropout – If non-zero, introduces a Dropout layer on the outputs of each LSTM layer except the last layer, with dropout probability equal to dropout. Default: 0
bidirectional – If True, becomes a bidirectional LSTM. Default: False
proj_size – If > 0, will use LSTM with projections of corresponding size. Default: 0
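To illustrate what num_layers and dropout mean in practice, the sketch below copies each layer of a two-layer LSTM into its own single-layer LSTM and chains them by hand. It assumes the per-layer parameter names listed under Variables (`weight_ih_l0`, `weight_ih_l1`, ...); `copy_layer` is a hypothetical helper. In eval mode the chained result should match the stacked module; in training mode the inter-layer dropout changes the output.

```python
import torch
from torch import nn

torch.manual_seed(0)

stacked = nn.LSTM(input_size=8, hidden_size=16, num_layers=2, dropout=0.5)

# Two single-layer LSTMs that will receive copies of the stacked model's weights.
layer0 = nn.LSTM(8, 16)
layer1 = nn.LSTM(16, 16)


def copy_layer(src: nn.LSTM, k: int, dst: nn.LSTM) -> None:
    """Copy layer k of a multi-layer LSTM into a single-layer LSTM (hypothetical helper)."""
    with torch.no_grad():
        for name in ("weight_ih", "weight_hh", "bias_ih", "bias_hh"):
            getattr(dst, f"{name}_l0").copy_(getattr(src, f"{name}_l{k}"))


copy_layer(stacked, 0, layer0)
copy_layer(stacked, 1, layer1)

x = torch.randn(5, 3, 8)  # (seq_len, batch, input_size)

with torch.no_grad():
    stacked.eval()                     # eval mode: no dropout between layers
    ref, _ = stacked(x)
    manual, _ = layer1(layer0(x)[0])   # layer 1 consumes layer 0's h_t sequence

    stacked.train()                    # training mode: dropout(p=0.5) on layer 0's outputs
    noisy, _ = stacked(x)

print(torch.allclose(ref, manual, atol=1e-6))
print(torch.allclose(ref, noisy))
```

The first comparison should hold and the second should not, because dropout zeroes and rescales some of layer 0's outputs before layer 1 sees them.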
- Inputs: input, (h_0, c_0)
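input: tensor of shape $(L, H_{in})$ for unbatched input, $(L, N, H_{in})$ when batch_first=False or $(N, L, H_{in})$ when batch_first=True containing the features of the input sequence. The input can also be a packed variable length sequence. See torch.nn.utils.rnn.pack_padded_sequence() or torch.nn.utils.rnn.pack_sequence() for details.
h_0: tensor of shape $(D \times \text{num\_layers}, H_{out})$ for unbatched input or $(D \times \text{num\_layers}, N, H_{out})$ containing the initial hidden state for each element in the input sequence. Defaults to zeros if (h_0, c_0) is not provided.
c_0: tensor of shape $(D \times \text{num\_layers}, H_{cell})$ for unbatched input or $(D \times \text{num\_layers}, N, H_{cell})$ containing the initial cell state for each element in the input sequence. Defaults to zeros if (h_0, c_0) is not provided.
where:

$$
\begin{aligned}
N &= \text{batch size} \\
L &= \text{sequence length} \\
D &= 2 \text{ if bidirectional=True otherwise } 1 \\
H_{in} &= \text{input\_size} \\
H_{cell} &= \text{hidden\_size} \\
H_{out} &= \text{proj\_size if proj\_size} > 0 \text{ otherwise hidden\_size}
\end{aligned}
$$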
- Outputs: output, (h_n, c_n)
output: tensor of shape $(L, D \times H_{out})$ for unbatched input, $(L, N, D \times H_{out})$ when batch_first=False or $(N, L, D \times H_{out})$ when batch_first=True containing the output features (h_t) from the last layer of the LSTM, for each t. If a torch.nn.utils.rnn.PackedSequence has been given as the input, the output will also be a packed sequence. When bidirectional=True, output will contain a concatenation of the forward and reverse hidden states at each time step in the sequence.
h_n: tensor of shape $(D \times \text{num\_layers}, H_{out})$ for unbatched input or $(D \times \text{num\_layers}, N, H_{out})$ containing the final hidden state for each element in the sequence. When bidirectional=True, h_n will contain a concatenation of the final forward and reverse hidden states, respectively.
c_n: tensor of shape $(D \times \text{num\_layers}, H_{cell})$ for unbatched input or $(D \times \text{num\_layers}, N, H_{cell})$ containing the final cell state for each element in the sequence. When bidirectional=True, c_n will contain a concatenation of the final forward and reverse cell states, respectively.
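The shape rules for inputs and outputs can be checked directly. The sketch below assumes a configuration that exercises every term at once (batch_first=True, bidirectional=True, proj_size > 0, two layers); the variable names N, L, H_in, H_cell and P simply mirror the symbols in the shape descriptions (batch size, sequence length, input_size, hidden_size, proj_size).

```python
import torch
from torch import nn

N, L, H_in, H_cell, P, num_layers = 3, 7, 10, 20, 5, 2
D = 2        # bidirectional=True, so two directions
H_out = P    # proj_size > 0, so h_t has proj_size features

lstm = nn.LSTM(H_in, H_cell, num_layers=num_layers, batch_first=True,
               bidirectional=True, proj_size=P)

x = torch.randn(N, L, H_in)              # batch_first=True: (N, L, H_in)
output, (h_n, c_n) = lstm(x)
print(output.shape)  # torch.Size([3, 7, 10])  -> (N, L, D * H_out)
print(h_n.shape)     # torch.Size([4, 3, 5])   -> (D * num_layers, N, H_out)
print(c_n.shape)     # torch.Size([4, 3, 20])  -> (D * num_layers, N, H_cell)

# Unbatched input: no batch dimension anywhere, and batch_first is ignored.
x_unbatched = torch.randn(L, H_in)
out_u, (h_u, c_u) = lstm(x_unbatched)
print(out_u.shape, h_u.shape, c_u.shape)
# torch.Size([7, 10]) torch.Size([4, 5]) torch.Size([4, 20])
```

Note that the cell state keeps hidden_size features even when the projection shrinks the hidden state.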
- Variables
weight_ih_l[k] – the learnable input-hidden weights of the k-th layer (W_ii|W_if|W_ig|W_io), of shape (4*hidden_size, input_size) for k = 0. Otherwise, the shape is (4*hidden_size, num_directions * hidden_size). If proj_size > 0 was specified, the shape will be (4*hidden_size, num_directions * proj_size) for k > 0.
weight_hh_l[k] – the learnable hidden-hidden weights of the k-th layer (W_hi|W_hf|W_hg|W_ho), of shape (4*hidden_size, hidden_size). If proj_size > 0 was specified, the shape will be (4*hidden_size, proj_size).
bias_ih_l[k] – the learnable input-hidden bias of the k-th layer (b_ii|b_if|b_ig|b_io), of shape (4*hidden_size)
bias_hh_l[k] – the learnable hidden-hidden bias of the k-th layer (b_hi|b_hf|b_hg|b_ho), of shape (4*hidden_size)
weight_hr_l[k] – the learnable projection weights of the k-th layer of shape (proj_size, hidden_size). Only present when proj_size > 0 was specified.
weight_ih_l[k]_reverse – Analogous to weight_ih_l[k] for the reverse direction. Only present when bidirectional=True.
weight_hh_l[k]_reverse – Analogous to weight_hh_l[k] for the reverse direction. Only present when bidirectional=True.
bias_ih_l[k]_reverse – Analogous to bias_ih_l[k] for the reverse direction. Only present when bidirectional=True.
bias_hh_l[k]_reverse – Analogous to bias_hh_l[k] for the reverse direction. Only present when bidirectional=True.
weight_hr_l[k]_reverse – Analogous to weight_hr_l[k] for the reverse direction. Only present when bidirectional=True and proj_size > 0 was specified.
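The parameter names and shapes listed above can be inspected through `named_parameters()`. The sketch below uses an assumed configuration (input_size=8, hidden_size=20, proj_size=5, two bidirectional layers) chosen so that input_size and num_directions * proj_size differ, which makes the k = 0 and k > 0 cases of weight_ih_l[k] easy to tell apart. It also splits weight_ih_l0 into its four gate blocks.

```python
from torch import nn

lstm = nn.LSTM(input_size=8, hidden_size=20, num_layers=2,
               bidirectional=True, proj_size=5)

shapes = {name: tuple(p.shape) for name, p in lstm.named_parameters()}
print(shapes["weight_ih_l0"])          # (80, 8):  (4*hidden_size, input_size)
print(shapes["weight_ih_l1"])          # (80, 10): (4*hidden_size, num_directions * proj_size)
print(shapes["weight_hh_l0"])          # (80, 5):  (4*hidden_size, proj_size)
print(shapes["bias_ih_l0"])            # (80,):    (4*hidden_size)
print(shapes["weight_hr_l0_reverse"])  # (5, 20):  (proj_size, hidden_size)

# The four gate blocks W_ii | W_if | W_ig | W_io are stacked along dim 0.
W_ii, W_if, W_ig, W_io = lstm.weight_ih_l0.chunk(4, dim=0)
print(tuple(W_ii.shape))               # (20, 8)
```

With bias and projections enabled, each (layer, direction) pair contributes five tensors, so this module has 2 × 2 × 5 = 20 parameters.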
Note
All the weights and biases are initialized from $\mathcal{U}(-\sqrt{k}, \sqrt{k})$ where $k = \frac{1}{\text{hidden\_size}}$.
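A quick way to confirm the initialization range on a freshly constructed module is sketched below. Since $\sqrt{k} = 1/\sqrt{\text{hidden\_size}}$, every parameter should lie inside that bound; the small slack only absorbs float32 rounding and is an assumption of this check, not part of the documented rule.

```python
import math

import torch
from torch import nn

torch.manual_seed(0)
hidden_size = 20
lstm = nn.LSTM(input_size=10, hidden_size=hidden_size, num_layers=2)

bound = 1.0 / math.sqrt(hidden_size)   # sqrt(k) with k = 1 / hidden_size
largest = max(p.detach().abs().max().item() for p in lstm.parameters())
print(largest <= bound + 1e-6)
```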
Note
For bidirectional LSTMs, forward and backward are directions 0 and 1 respectively. Example of splitting the output layers when batch_first=False: output.view(seq_len, batch, num_directions, hidden_size).
Note
For bidirectional LSTMs, h_n is not equivalent to the last element of output; the former contains the final forward and reverse hidden states, while the latter contains the final forward hidden state and the initial reverse hidden state.
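The two bidirectional notes can be demonstrated together. The sketch below assumes a single-layer bidirectional LSTM with batch_first=False, splits output into its two directions with `view`, and shows where each half of h_n appears in output: the forward state at the last time step, and the reverse state at the first time step (where the reverse pass finishes).

```python
import torch
from torch import nn

torch.manual_seed(0)
seq_len, batch, input_size, hidden_size = 6, 2, 3, 4
lstm = nn.LSTM(input_size, hidden_size, bidirectional=True)  # batch_first=False

x = torch.randn(seq_len, batch, input_size)
output, (h_n, c_n) = lstm(x)

# Split the feature dimension into (num_directions, hidden_size); direction 0 is forward.
per_dir = output.view(seq_len, batch, 2, hidden_size)
forward_out, backward_out = per_dir[:, :, 0], per_dir[:, :, 1]

print(torch.allclose(h_n[0], forward_out[-1]))   # forward state after the last step
print(torch.allclose(h_n[1], backward_out[0]))   # reverse state after the whole sequence
```

By contrast, backward_out[-1] (the reverse half of the last element of output) reflects only one step of the reverse pass, which is why it differs from h_n[1].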
Note
batch_first argument is ignored for unbatched inputs.
Note
proj_size should be smaller than hidden_size.
Warning
There are known non-determinism issues for RNN functions on some versions of cuDNN and CUDA. You can enforce deterministic behavior by setting the following environment variables:
- On CUDA 10.1, set environment variable CUDA_LAUNCH_BLOCKING=1. This may affect performance.
- On CUDA 10.2 or later, set environment variable (note the leading colon symbol) CUBLAS_WORKSPACE_CONFIG=:16:8 or CUBLAS_WORKSPACE_CONFIG=:4096:2.
See the cuDNN 8 Release Notes for more information.
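One way to apply the CUDA 10.2+ setting from inside a Python script is sketched below. It assumes the assignment runs at the very top of the script, before any CUDA work, because cuBLAS reads the variable when it initializes; exporting it in the shell before launching Python works as well. The sketch falls back to the CPU when no GPU is present, in which case the variable has no effect.

```python
import os

# Assumption: this runs before any CUDA operation in the process.
# ":4096:2" is one of the two documented values; ":16:8" is the other.
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:2"

import torch  # imported after the environment is configured
from torch import nn

device = "cuda" if torch.cuda.is_available() else "cpu"
torch.manual_seed(0)
lstm = nn.LSTM(10, 20, 2).to(device)
x = torch.randn(5, 3, 10, device=device)
output, (hn, cn) = lstm(x)
```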
Note
If the following conditions are satisfied:
1) cudnn is enabled,
2) input data is on the GPU,
3) input data has dtype torch.float16,
4) a V100 GPU is used,
5) input data is not in PackedSequence format,
then a persistent algorithm can be selected to improve performance.
Examples:
>>> import torch
>>> from torch import nn
>>> rnn = nn.LSTM(10, 20, 2)
>>> input = torch.randn(5, 3, 10)
>>> h0 = torch.randn(2, 3, 20)
>>> c0 = torch.randn(2, 3, 20)
>>> output, (hn, cn) = rnn(input, (h0, c0))
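The Inputs section notes that input may also be a packed variable-length sequence. The sketch below assumes three zero-padded sequences of lengths 5, 3 and 2 (already sorted by decreasing length, which `pack_padded_sequence` expects by default), runs them through the same `nn.LSTM(10, 20, 2)` configuration as the example above, and unpacks the result. For packed input, h_n holds each sequence's state at its own last valid step rather than at the padded end.

```python
import torch
from torch import nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)
lstm = nn.LSTM(10, 20, 2)

# Three sequences of lengths 5, 3 and 2, zero-padded to length 5: (L, N, H_in).
lengths = torch.tensor([5, 3, 2])
padded = torch.zeros(5, 3, 10)
for b, n in enumerate(lengths.tolist()):
    padded[:n, b] = torch.randn(n, 10)

packed = pack_padded_sequence(padded, lengths)  # lengths sorted in decreasing order
packed_out, (hn, cn) = lstm(packed)              # output is also a PackedSequence
output, out_lengths = pad_packed_sequence(packed_out)

print(output.shape)   # torch.Size([5, 3, 20])
print(out_lengths)    # tensor([5, 3, 2])
```

Positions past each sequence's length are filled with zeros by `pad_packed_sequence`, and hn[-1, b] matches output[lengths[b] - 1, b] for the top layer.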