MachineLearningMastery.com

Making developers awesome at machine learning

How to Develop LSTM Models for Time Series Forecasting

Long Short-Term Memory networks, or LSTMs for short, can be applied to time series forecasting.

There are many types of LSTM models that can be used for each specific type of time series forecasting problem.

In this tutorial, you will discover how to develop a suite of LSTM models for a range of standard time series forecasting problems.

The objective of this tutorial is to provide standalone examples of each model on each type of time series problem as a template that you can copy and adapt for your specific time series forecasting problem.

After completing this tutorial, you will know:

  • How to develop LSTM models for univariate time series forecasting.
  • How to develop LSTM models for multivariate time series forecasting.
  • How to develop LSTM models for multi-step time series forecasting.

This is a large and important post; you may want to bookmark it for future reference.

Let’s get started.

How to Develop LSTM Models for Time Series Forecasting
Photo by N i c o l a, some rights reserved.

Tutorial Overview

In this tutorial, we will explore how to develop a suite of different types of LSTM models for time series forecasting.

The models are demonstrated on small contrived time series problems intended to give the flavor of the type of time series problem being addressed. The chosen configuration of the models is arbitrary and not optimized for each problem; that was not the goal.

This tutorial is divided into four parts; they are:

  1. Univariate LSTM Models
    1. Data Preparation
    2. Vanilla LSTM
    3. Stacked LSTM
    4. Bidirectional LSTM
    5. CNN LSTM
    6. ConvLSTM
  2. Multivariate LSTM Models
    1. Multiple Input Series
    2. Multiple Parallel Series
  3. Multi-Step LSTM Models
    1. Data Preparation
    2. Vector Output Model
    3. Encoder-Decoder Model
  4. Multivariate Multi-Step LSTM Models
    1. Multiple Input Multi-Step Output
    2. Multiple Parallel Input and Multi-Step Output

Univariate LSTM Models

LSTMs can be used to model univariate time series forecasting problems.

These are problems comprised of a single series of observations and a model is required to learn from the series of past observations to predict the next value in the sequence.

We will demonstrate a number of variations of the LSTM model for univariate time series forecasting.

This section is divided into six parts; they are:

  1. Data Preparation
  2. Vanilla LSTM
  3. Stacked LSTM
  4. Bidirectional LSTM
  5. CNN LSTM
  6. ConvLSTM

Each of these models is demonstrated for one-step univariate time series forecasting, but each can easily be adapted and used as the input part of a model for other types of time series forecasting problems.

Data Preparation

Before a univariate series can be modeled, it must be prepared.

The LSTM model will learn a function that maps a sequence of past observations as input to an output observation. As such, the sequence of observations must be transformed into multiple examples from which the LSTM can learn.

Consider a given univariate sequence:

[10, 20, 30, 40, 50, 60, 70, 80, 90]

We can divide the sequence into multiple input/output patterns called samples, where three time steps are used as input and one time step is used as output for the one-step prediction that is being learned.

X,           y
10, 20, 30   40
20, 30, 40   50
30, 40, 50   60
...

The split_sequence() function below implements this behavior and will split a given univariate sequence into multiple samples where each sample has a specified number of time steps and the output is a single time step.

# split a univariate sequence into samples
def split_sequence(sequence, n_steps):
    X, y = list(), list()
    for i in range(len(sequence)):
        # find the end of this pattern
        end_ix = i + n_steps
        # check if we are beyond the sequence
        if end_ix > len(sequence)-1:
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequence[i:end_ix], sequence[end_ix]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)

We can demonstrate this function on our small contrived dataset above.

The complete example is listed below.

# univariate data preparation
from numpy import array

# split a univariate sequence into samples
def split_sequence(sequence, n_steps):
    X, y = list(), list()
    for i in range(len(sequence)):
        # find the end of this pattern
        end_ix = i + n_steps
        # check if we are beyond the sequence
        if end_ix > len(sequence)-1:
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequence[i:end_ix], sequence[end_ix]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)

# define input sequence
raw_seq = [10, 20, 30, 40, 50, 60, 70, 80, 90]
# choose a number of time steps
n_steps = 3
# split into samples
X, y = split_sequence(raw_seq, n_steps)
# summarize the data
for i in range(len(X)):
    print(X[i], y[i])

Running the example splits the univariate series into six samples where each sample has three input time steps and one output time step.

[10 20 30] 40
[20 30 40] 50
[30 40 50] 60
[40 50 60] 70
[50 60 70] 80
[60 70 80] 90
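The same windowing can be written without an explicit loop using NumPy's sliding_window_view (available in NumPy 1.20 and later). This is a minimal sketch, assuming a one-step-ahead target as above; the name split_sequence_vectorized is hypothetical and not part of the tutorial. Note that sliding_window_view returns a read-only view, so call .copy() on X if you need to modify it in place.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def split_sequence_vectorized(sequence, n_steps):
    """Hypothetical vectorized equivalent of split_sequence()."""
    seq = np.asarray(sequence)
    # every window of n_steps values that still has a following value to predict
    X = sliding_window_view(seq[:-1], n_steps)
    # the value immediately after each window
    y = seq[n_steps:]
    return X, y

X, y = split_sequence_vectorized([10, 20, 30, 40, 50, 60, 70, 80, 90], 3)
print(X.shape, y.shape)  # (6, 3) (6,)
```

Dropping the last value before windowing guarantees that every window has a target, which is the same condition the loop checks with end_ix > len(sequence)-1.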

Now that we know how to prepare a univariate series for modeling, let’s look at developing LSTM models that can learn the mapping of inputs to outputs, starting with a Vanilla LSTM.

Vanilla LSTM

A Vanilla LSTM is an LSTM model that has a single hidden layer of LSTM units, and an output layer used to make a prediction.

We can define a Vanilla LSTM for univariate time series forecasting as follows.

...
# define model
model = Sequential()
model.add(LSTM(50, activation='relu', input_shape=(n_steps, n_features)))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')

The key part of the definition is the shape of the input, that is, what the model expects as input for each sample in terms of the number of time steps and the number of features.

We are working with a univariate series, so the number of features is one, for one variable.

The number of time steps as input is the number we chose when preparing our dataset as an argument to the split_sequence() function.

The shape of the input for each sample is specified in the input_shape argument in the definition of the first hidden layer.

We almost always have multiple samples; therefore, the model expects the input component of the training data to have the dimensions or shape:

[samples, timesteps, features]

Our split_sequence() function in the previous section outputs X with the shape [samples, timesteps], so we can easily reshape it to have an additional dimension for the one feature.

...
# reshape from [samples, timesteps] into [samples, timesteps, features]
n_features = 1
X = X.reshape((X.shape[0], X.shape[1], n_features))
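To see what this reshape does, here is a small NumPy-only check (no Keras required) on the six samples produced earlier. Indexing with np.newaxis is shown only as an equivalent spelling; the tutorial itself uses reshape().

```python
import numpy as np

# the six [samples, timesteps] windows produced by split_sequence() above
X = np.array([[10, 20, 30], [20, 30, 40], [30, 40, 50],
              [40, 50, 60], [50, 60, 70], [60, 70, 80]])
n_features = 1
X3d = X.reshape((X.shape[0], X.shape[1], n_features))
print(X.shape, '->', X3d.shape)  # (6, 3) -> (6, 3, 1)

# adding a trailing axis with np.newaxis gives the same array
assert np.array_equal(X3d, X[:, :, np.newaxis])
```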

In this case, we define a model with 50 LSTM units in the hidden layer and an output layer that predicts a single numerical value.
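As a rough check on model size, the trainable parameters can be counted by hand. This sketch assumes the standard Keras LSTM layout of an input kernel, a recurrent kernel, and one bias vector, each covering the four gates; the helper names are hypothetical.

```python
def lstm_params(units, n_features):
    # input kernel (n_features x 4*units) + recurrent kernel (units x 4*units) + bias (4*units)
    return 4 * units * (n_features + units + 1)

def dense_params(n_inputs, n_outputs):
    # one weight per input/output pair plus one bias per output
    return n_inputs * n_outputs + n_outputs

lstm = lstm_params(50, 1)     # 50 units reading 1 feature
dense = dense_params(50, 1)   # 50 LSTM outputs -> 1 prediction
print(lstm, dense, lstm + dense)  # 10400 51 10451
```

The count does not depend on the number of time steps, because the same weights are applied at every step.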

The model is fit using the efficient Adam version of stochastic gradient descent and optimized using the mean squared error, or 'mse', loss function.

Once the model is defined, we can fit it on the training dataset.

...
# fit model
model.fit(X, y, epochs=200, verbose=0)

After the model is fit, we can use it to make a prediction.

We can predict the next value in the sequence by providing the input:

[70, 80, 90]

And expecting the model to predict something like:

[100]

The model expects the input to be three-dimensional with shape [samples, timesteps, features]; therefore, we must reshape the single input sample before making the prediction.

...
# demonstrate prediction
x_input = array([70, 80, 90])
x_input = x_input.reshape((1, n_steps, n_features))
yhat = model.predict(x_input, verbose=0)

We can tie all of this together and demonstrate how to develop a Vanilla LSTM for univariate time series forecasting and make a single prediction.

# univariate lstm example
from numpy import array
from keras.models import Sequential
from keras.layers import LSTM
from keras.layers import Dense

# split a univariate sequence into samples
def split_sequence(sequence, n_steps):
    X, y = list(), list()
    for i in range(len(sequence)):
        # find the end of this pattern
        end_ix = i + n_steps
        # check if we are beyond the sequence
        if end_ix > len(sequence)-1:
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequence[i:end_ix], sequence[end_ix]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)

# define input sequence
raw_seq = [10, 20, 30, 40, 50, 60, 70, 80, 90]
# choose a number of time steps
n_steps = 3
# split into samples
X, y = split_sequence(raw_seq, n_steps)
# reshape from [samples, timesteps] into [samples, timesteps, features]
n_features = 1
X = X.reshape((X.shape[0], X.shape[1], n_features))
# define model
model = Sequential()
model.add(LSTM(50, activation='relu', input_shape=(n_steps, n_features)))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
# fit model
model.fit(X, y, epochs=200, verbose=0)
# demonstrate prediction
x_input = array([70, 80, 90])
x_input = x_input.reshape((1, n_steps, n_features))
yhat = model.predict(x_input, verbose=0)
print(yhat)

Running the example prepares the data, fits the model, and makes a prediction.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

We can see that the model predicts a value close to 100, the next value in the sequence.

[[102.09213]]

Stacked LSTM

Multiple hidden LSTM layers can be stacked one on top of another in what is referred to as a Stacked LSTM model.

An LSTM layer requires a three-dimensional input, and by default an LSTM layer produces a two-dimensional output as an interpretation from the end of the sequence.

We can address this by having the LSTM output a value for each time step in the input data by setting the return_sequences=True argument on the layer. This allows the 3D output of one hidden LSTM layer to serve as the input to the next.
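The shape difference can be illustrated without Keras. The following toy LSTM forward pass in NumPy uses random weights and is a minimal sketch for shapes only: lstm_forward is hypothetical, it uses the standard tanh cell activation rather than the 'relu' used in this tutorial, the gate order (input, forget, candidate, output) is an assumed convention, and it is not how Keras implements the layer internally.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_forward(x, units, return_sequences=False, seed=0):
    """Hypothetical toy LSTM with random weights; x is [samples, timesteps, features]."""
    rng = np.random.default_rng(seed)
    samples, timesteps, features = x.shape
    # input kernel, recurrent kernel and bias, with the 4 gates stacked side by side
    W = rng.normal(scale=0.1, size=(features, 4 * units))
    U = rng.normal(scale=0.1, size=(units, 4 * units))
    b = np.zeros(4 * units)
    h = np.zeros((samples, units))  # hidden state
    c = np.zeros((samples, units))  # cell state
    outputs = []
    for t in range(timesteps):
        z = x[:, t, :] @ W + h @ U + b
        i, f, g, o = np.split(z, 4, axis=1)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        outputs.append(h)
    if return_sequences:
        return np.stack(outputs, axis=1)  # [samples, timesteps, units]
    return h  # [samples, units], the state after the last time step

x = np.linspace(0, 1, 18).reshape(6, 3, 1)
seq = lstm_forward(x, 50, return_sequences=True)
last = lstm_forward(x, 50)
# a second layer can only read the 3D output of the first
stacked = lstm_forward(seq, 50, seed=1)
print(seq.shape, last.shape, stacked.shape)  # (6, 3, 50) (6, 50) (6, 50)
```

With the same weights, the last time step of the full sequence output is exactly the default 2D output, which is why return_sequences=True only adds information rather than changing it.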

We can therefore define a Stacked LSTM as follows.

...
# define model
model = Sequential()
model.add(LSTM(50, activation='relu', return_sequences=True, input_shape=(n_steps, n_features)))
model.add(LSTM(50, activation='relu'))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')

We can tie this together; the complete code example is listed below.

# univariate stacked lstm example
from numpy import array
from keras.models import Sequential
from keras.layers import LSTM
from keras.layers import Dense

# split a univariate sequence
def split_sequence(sequence, n_steps):
    X, y = list(), list()
    for i in range(len(sequence)):
        # find the end of this pattern
        end_ix = i + n_steps
        # check if we are beyond the sequence
        if end_ix > len(sequence)-1:
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequence[i:end_ix], sequence[end_ix]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)

# define input sequence
raw_seq = [10, 20, 30, 40, 50, 60, 70, 80, 90]
# choose a number of time steps
n_steps = 3
# split into samples
X, y = split_sequence(raw_seq, n_steps)
# reshape from [samples, timesteps] into [samples, timesteps, features]
n_features = 1
X = X.reshape((X.shape[0], X.shape[1], n_features))
# define model
model = Sequential()
model.add(LSTM(50, activation='relu', return_sequences=True, input_shape=(n_steps, n_features)))
model.add(LSTM(50, activation='relu'))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
# fit model
model.fit(X, y, epochs=200, verbose=0)
# demonstrate prediction
x_input = array([70, 80, 90])
x_input = x_input.reshape((1, n_steps, n_features))
yhat = model.predict(x_input, verbose=0)
print(yhat)

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

Running the example predicts the next value in the sequence, which we expect to be 100.

[[102.47341]]

Bidirectional LSTM

On some sequence prediction problems, it can be beneficial to allow the LSTM model to learn the input sequence both forward and backwards and concatenate both interpretations.

This is called a Bidirectional LSTM.

We can implement a Bidirectional LSTM for univariate time series forecasting by wrapping the first hidden layer in a wrapper layer called Bidirectional.

An example of defining a Bidirectional LSTM to read input both forward and backward is as follows.

...
# define model
model = Sequential()
model.add(Bidirectional(LSTM(50, activation='relu'), input_shape=(n_steps, n_features)))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
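Conceptually, the Bidirectional wrapper runs one copy of the layer over the time steps in order and a second copy over them in reverse, then (with Keras's default merge_mode='concat') joins the two outputs. This toy NumPy sketch shows the idea; simple_rnn is a hypothetical plain tanh RNN standing in for an LSTM to keep the code short.

```python
import numpy as np

def simple_rnn(x, units, seed):
    """Hypothetical minimal tanh RNN; returns the last hidden state [samples, units]."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(x.shape[2], units))
    U = rng.normal(scale=0.1, size=(units, units))
    h = np.zeros((x.shape[0], units))
    for t in range(x.shape[1]):
        h = np.tanh(x[:, t, :] @ W + h @ U)
    return h

def bidirectional(x, units):
    forward = simple_rnn(x, units, seed=1)
    # the backward copy has its own weights and reads the time steps in reverse order
    backward = simple_rnn(x[:, ::-1, :], units, seed=2)
    return np.concatenate([forward, backward], axis=1)

x = np.linspace(0, 1, 18).reshape(6, 3, 1)
out = bidirectional(x, 50)
print(out.shape)  # (6, 100)
```

This is why, with the default merge mode, the Dense layer after Bidirectional(LSTM(50)) receives 100 values per sample rather than 50.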

The complete example of the Bidirectional LSTM for univariate time series forecasting is listed below.

# univariate bidirectional lstm example
from numpy import array
from keras.models import Sequential
from keras.layers import LSTM
from keras.layers import Dense
from keras.layers import Bidirectional

# split a univariate sequence
def split_sequence(sequence, n_steps):
    X, y = list(), list()
    for i in range(len(sequence)):
        # find the end of this pattern
        end_ix = i + n_steps
        # check if we are beyond the sequence
        if end_ix > len(sequence)-1:
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequence[i:end_ix], sequence[end_ix]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)

# define input sequence
raw_seq = [10, 20, 30, 40, 50, 60, 70, 80, 90]
# choose a number of time steps
n_steps = 3
# split into samples
X, y = split_sequence(raw_seq, n_steps)
# reshape from [samples, timesteps] into [samples, timesteps, features]
n_features = 1
X = X.reshape((X.shape[0], X.shape[1], n_features))
# define model
model = Sequential()
model.add(Bidirectional(LSTM(50, activation='relu'), input_shape=(n_steps, n_features)))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
# fit model
model.fit(X, y, epochs=200, verbose=0)
# demonstrate prediction
x_input = array([70, 80, 90])
x_input = x_input.reshape((1, n_steps, n_features))
yhat = model.predict(x_input, verbose=0)
print(yhat)

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

Running the example predicts the next value in the sequence, which we expect to be 100.

[[101.48093]]

CNN LSTM

A convolutional neural network, or CNN for short, is a type of neural network developed for working with two-dimensional image data.

The CNN can be very effective at automatically extracting and learning features from one-dimensional sequence data such as univariate time series data.

A CNN model can be used in a hybrid model with an LSTM backend where the CNN is used to interpret subsequences of input that together are provided as a sequence to an LSTM model to interpret. This hybrid model is called a CNN-LSTM.

The first step is to split the input sequences into subsequences that can be processed by the CNN model. For example, we can first split our univariate time series data into input/output samples with four steps as input and one as output. Each sample can then be split into two sub-samples, each with two time steps. The CNN can interpret each subsequence of two time steps and provide a time series of interpretations of the subsequences to the LSTM model to process as input.

We can parameterize this and define the number of subsequences as n_seq and the number of time steps per subsequence as n_steps. The input data can then be reshaped to have the required structure:

[samples, subsequences, timesteps, features]

For example:

...
# choose a number of time steps
n_steps = 4
# split into samples
X, y = split_sequence(raw_seq, n_steps)
# reshape from [samples, timesteps] into [samples, subsequences, timesteps, features]
n_features = 1
n_seq = 2
n_steps = 2
X = X.reshape((X.shape[0], n_seq, n_steps, n_features))
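The reshape can be checked with NumPy alone. One thing to watch in the snippet above is that n_steps is reassigned from 4 (the window length) to 2 (the subsequence length); this sketch uses separate, hypothetical names (window, sub_steps) to keep the two apart, and builds the windows with a list comprehension equivalent to split_sequence().

```python
import numpy as np

raw_seq = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90])
window = 4                          # time steps per sample
n_seq, sub_steps, n_features = 2, 2, 1
assert n_seq * sub_steps == window  # the subsequences must exactly tile the window

# same samples as split_sequence(raw_seq, 4)
X = np.array([raw_seq[i:i + window] for i in range(len(raw_seq) - window)])
y = raw_seq[window:]
X4d = X.reshape((X.shape[0], n_seq, sub_steps, n_features))
print(X4d.shape)                    # (5, 2, 2, 1)
print(X4d[0, :, :, 0].tolist())     # [[10, 20], [30, 40]]
```

Each sample of four steps becomes two consecutive pairs, so the CNN sees [10, 20] and then [30, 40] for the first sample.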

We want to reuse the same CNN model when reading in each sub-sequence of data separately.

This can be achieved by wrapping the entire CNN model in a TimeDistributed wrapper that will apply the entire model once per input, in this case, once per input subsequence.

The CNN model first has a convolutional layer for reading across the subsequence that requires a number of filters and a kernel size to be specified. The number of filters is the number of reads or interpretations of the input sequence. The kernel size is the number of time steps included in each 'read' operation of the input sequence.

The convolution layer is followed by a max pooling layer that distills the filter maps down to half their size, keeping the most salient features. These structures are then flattened down to a single one-dimensional vector to be used as a single input time step to the LSTM layer.

...
model.add(TimeDistributed(Conv1D(filters=64, kernel_size=1, activation='relu'), input_shape=(None, n_steps, n_features)))
model.add(TimeDistributed(MaxPooling1D(pool_size=2)))
model.add(TimeDistributed(Flatten()))
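To make the shapes concrete: with kernel_size=1, each subsequence of 2 steps yields 2 rows of 64 filter outputs, pooling with pool_size=2 reduces that to 1 row, and flattening leaves a 64-value vector per subsequence, so the LSTM receives a sequence of 2 such vectors. This toy NumPy sketch of the per-subsequence pipeline assumes 'valid' padding, stride 1, and random filter weights; conv1d_valid and max_pool1d are hypothetical helpers, not Keras functions.

```python
import numpy as np

def conv1d_valid(x, kernels, bias):
    """Hypothetical 1D convolution ('valid' padding, stride 1) followed by relu.

    x: [steps, features]; kernels: [kernel_size, features, filters]
    """
    k = kernels.shape[0]
    out_len = x.shape[0] - k + 1
    out = np.empty((out_len, kernels.shape[2]))
    for t in range(out_len):
        window = x[t:t + k]  # [kernel_size, features]
        out[t] = np.tensordot(window, kernels, axes=([0, 1], [0, 1])) + bias
    return np.maximum(out, 0.0)

def max_pool1d(x, pool_size):
    """Hypothetical non-overlapping max pooling over the time axis of [steps, filters]."""
    n = x.shape[0] // pool_size  # leftover steps that do not fill a window are dropped
    return x[:n * pool_size].reshape(n, pool_size, x.shape[1]).max(axis=1)

rng = np.random.default_rng(0)
kernels = rng.normal(size=(1, 1, 64))  # kernel_size=1, 1 feature, 64 filters
bias = np.zeros(64)

# one sample [60, 70, 80, 90] as 2 subsequences of 2 steps with 1 feature
sample = np.array([60, 70, 80, 90], dtype=float).reshape(2, 2, 1)
# apply the same conv -> pool -> flatten to each subsequence, as TimeDistributed does
features = np.stack([max_pool1d(conv1d_valid(sub, kernels, bias), 2).ravel()
                     for sub in sample])
print(features.shape)  # (2, 64)
```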

Next, we can define the LSTM part of the model that interprets the CNN model’s read of the input sequence and makes a prediction.

...
model.add(LSTM(50, activation='relu'))
model.add(Dense(1))

We can tie all of this together; the complete example of a CNN-LSTM model for univariate time series forecasting is listed below.

# univariate cnn lstm example
from numpy import array
from keras.models import Sequential
from keras.layers import LSTM
from keras.layers import Dense
from keras.layers import Flatten
from keras.layers import TimeDistributed
# Conv1D and MaxPooling1D are importable from keras.layers in both Keras 2 and Keras 3
from keras.layers import Conv1D
from keras.layers import MaxPooling1D

# split a univariate sequence into samples
def split_sequence(sequence, n_steps):
    X, y = list(), list()
    for i in range(len(sequence)):
        # find the end of this pattern
        end_ix = i + n_steps
        # check if we are beyond the sequence
        if end_ix > len(sequence)-1:
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequence[i:end_ix], sequence[end_ix]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)

# define input sequence
raw_seq = [10, 20, 30, 40, 50, 60, 70, 80, 90]
# choose a number of time steps
n_steps = 4
# split into samples
X, y = split_sequence(raw_seq, n_steps)
# reshape from [samples, timesteps] into [samples, subsequences, timesteps, features]
n_features = 1
n_seq = 2
n_steps = 2
X = X.reshape((X.shape[0], n_seq, n_steps, n_features))
# define model
model = Sequential()
model.add(TimeDistributed(Conv1D(filters=64, kernel_size=1, activation='relu'), input_shape=(None, n_steps, n_features)))
model.add(TimeDistributed(MaxPooling1D(pool_size=2)))
model.add(TimeDistributed(Flatten()))
model.add(LSTM(50, activation='relu'))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
# fit model
model.fit(X, y, epochs=500, verbose=0)
# demonstrate prediction
x_input = array([60, 70, 80, 90])
x_input = x_input.reshape((1, n_seq, n_steps, n_features))
yhat = model.predict(x_input, verbose=0)
print(yhat)

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

Running the example predicts the next value in the sequence, which we expect to be 100.

[[101.69263]]

ConvLSTM

A type of LSTM related to the CNN-LSTM is the ConvLSTM, where the convolutional reading of input is built directly into each LSTM unit.

The ConvLSTM was developed for reading two-dimensional spatial-temporal data, but can be adapted for use with univariate time series forecasting.

The layer expects input as a sequence of two-dimensional images; therefore, the shape of the input data must be:

[samples, timesteps, rows, columns, features]

For our purposes, we can split each sample into subsequences where timesteps will become the number of subsequences, or n_seq, and columns will be the number of time steps for each subsequence, or n_steps. The number of rows is fixed at 1 as we are working with one-dimensional data.

We can now reshape the prepared samples into the required structure.

...
# choose a number of time steps
n_steps = 4
# split into samples
X, y = split_sequence(raw_seq, n_steps)
# reshape from [samples, timesteps] into [samples, timesteps, rows, columns, features]
n_features = 1
n_seq = 2
n_steps = 2
X = X.reshape((X.shape[0], n_seq, 1, n_steps, n_features))

We can define the ConvLSTM as a single layer in terms of the number of filters and a two-dimensional kernel size in terms of (rows, columns). As we are working with a one-dimensional series, the number of rows is always fixed to 1 in the kernel.

The output of the model must then be flattened before it can be interpreted and a prediction made.

...
model.add(ConvLSTM2D(filters=64, kernel_size=(1, 2), activation='relu', input_shape=(n_seq, 1, n_steps, n_features)))
model.add(Flatten())
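The Flatten layer here receives a very small tensor. This sketch works through the shape arithmetic, assuming ConvLSTM2D's defaults of 'valid' padding, stride 1, channels-last input, and return_sequences=False; the helper convlstm2d_output_shape is hypothetical.

```python
import math

def convlstm2d_output_shape(input_shape, filters, kernel_size, return_sequences=False):
    """Hypothetical helper: ConvLSTM2D output shape, excluding the batch axis.

    Assumes 'valid' padding, stride 1, and input of (timesteps, rows, cols, channels).
    """
    timesteps, rows, cols, _ = input_shape
    out_rows = rows - kernel_size[0] + 1
    out_cols = cols - kernel_size[1] + 1
    if return_sequences:
        return (timesteps, out_rows, out_cols, filters)
    return (out_rows, out_cols, filters)

# (n_seq, 1, n_steps, n_features) = (2, 1, 2, 1) with a (1, 2) kernel and 64 filters
shape = convlstm2d_output_shape((2, 1, 2, 1), filters=64, kernel_size=(1, 2))
print(shape, math.prod(shape))  # (1, 1, 64) 64
```

A (1, 2) kernel covers each two-step subsequence completely, so each filter yields a single value and Flatten passes 64 values to the Dense layer.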

The complete example of a ConvLSTM for one-step univariate time series forecasting is listed below.

# univariate convlstm example
from numpy import array
from keras.models import Sequential
from keras.layers import LSTM
from keras.layers import Dense
from keras.layers import Flatten
from keras.layers import ConvLSTM2D

# split a univariate sequence into samples
def split_sequence(sequence, n_steps):
    X, y = list(), list()
    for i in range(len(sequence)):
        # find the end of this pattern
        end_ix = i + n_steps
        # check if we are beyond the sequence
        if end_ix > len(sequence)-1:
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequence[i:end_ix], sequence[end_ix]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)

# define input sequence
raw_seq = [10, 20, 30, 40, 50, 60, 70, 80, 90]
# choose a number of time steps
n_steps = 4
# split into samples
X, y = split_sequence(raw_seq, n_steps)
# reshape from [samples, timesteps] into [samples, timesteps, rows, columns, features]
n_features = 1
n_seq = 2
n_steps = 2
X = X.reshape((X.shape[0], n_seq, 1, n_steps, n_features))
# define model
model = Sequential()
model.add(ConvLSTM2D(filters=64, kernel_size=(1, 2), activation='relu', input_shape=(n_seq, 1, n_steps, n_features)))
model.add(Flatten())
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
# fit model
model.fit(X, y, epochs=500, verbose=0)
# demonstrate prediction
x_input = array([60, 70, 80, 90])
x_input = x_input.reshape((1, n_seq, 1, n_steps, n_features))
yhat = model.predict(x_input, verbose=0)
print(yhat)

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

Running the example predicts the next value in the sequence, which we expect to be 100.

[[103.68166]]

Now that we have looked at LSTM models for univariate data, let’s turn our attention to multivariate data.

Multivariate LSTM Models

Multivariate time series data means data where there is more than one observation for each time step.

There are two main models that we may require with multivariate time series data; they are:

  1. Multiple Input Series
  2. Multiple Parallel Series

Let’s take a look at each in turn.

Multiple Input Series

A problem may have two or more parallel input time series and an output time series that is dependent on the input time series.

The input time series are parallel because each series has an observation at the same time steps.

We can demonstrate this with a simple example of two parallel input time series where the output series is the simple addition of the input series.

...
# define input sequence
in_seq1 = array([10, 20, 30, 40, 50, 60, 70, 80, 90])
in_seq2 = array([15, 25, 35, 45, 55, 65, 75, 85, 95])
out_seq = array([in_seq1[i]+in_seq2[i] for i in range(len(in_seq1))])

We can reshape these three arrays of data as a single dataset where each row is a time step, and each column is a separate time series. This is a standard way of storing parallel time series in a CSV file.

...
# convert to [rows, columns] structure
in_seq1 = in_seq1.reshape((len(in_seq1), 1))
in_seq2 = in_seq2.reshape((len(in_seq2), 1))
out_seq = out_seq.reshape((len(out_seq), 1))
# horizontally stack columns
dataset = hstack((in_seq1, in_seq2, out_seq))
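As an aside, the same [rows, columns] array round-trips through CSV text with NumPy's savetxt() and loadtxt(). This is a minimal sketch: column_stack() is used here as an alternative to the reshape-then-hstack steps above, and an in-memory io.StringIO buffer stands in for a real file on disk.

```python
import io
import numpy as np

in_seq1 = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90])
in_seq2 = np.array([15, 25, 35, 45, 55, 65, 75, 85, 95])
out_seq = in_seq1 + in_seq2
# column_stack treats each 1D array as a column, so no reshape is needed
dataset = np.column_stack((in_seq1, in_seq2, out_seq))

buffer = io.StringIO()  # stands in for a CSV file
np.savetxt(buffer, dataset, fmt='%d', delimiter=',')
buffer.seek(0)
loaded = np.loadtxt(buffer, delimiter=',', dtype=int)
print(loaded.shape)  # (9, 3)
```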

The complete example is listed below.

# multivariate data preparation
from numpy import array
from numpy import hstack
# define input sequence
in_seq1 = array([10, 20, 30, 40, 50, 60, 70, 80, 90])
in_seq2 = array([15, 25, 35, 45, 55, 65, 75, 85, 95])
out_seq = array([in_seq1[i]+in_seq2[i] for i in range(len(in_seq1))])
# convert to [rows, columns] structure
in_seq1 = in_seq1.reshape((len(in_seq1), 1))
in_seq2 = in_seq2.reshape((len(in_seq2), 1))
out_seq = out_seq.reshape((len(out_seq), 1))
# horizontally stack columns
dataset = hstack((in_seq1, in_seq2, out_seq))
print(dataset)

Running the example prints the dataset with one row per time step and one column for each of the two input and one output parallel time series.

[[ 10  15  25]
 [ 20  25  45]
 [ 30  35  65]
 [ 40  45  85]
 [ 50  55 105]
 [ 60  65 125]
 [ 70  75 145]
 [ 80  85 165]
 [ 90  95 185]]

As with the univariate time series, we must structure these data into samples with input and output elements.

An LSTM model needs sufficient context to learn a mapping from an input sequence to an output value. LSTMs can support parallel input time series as separate variables or features. Therefore, we need to split the data into samples maintaining the order of observations across the two input sequences.

If we choose three input time steps, then the first sample would look as follows:

Input:

10, 15
20, 25
30, 35

Output:

65

That is, the first three time steps of each parallel series are provided as input to the model and the model associates this with the value in the output series at the third time step, in this case, 65.

We can see that, in transforming the time series into input/output samples to train the model, we will have to discard some values from the output time series where we do not have values in the input time series at prior time steps. In turn, the choice of the number of input time steps will have an important effect on how much of the training data is used.
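The trade-off can be made concrete with the output series above. This minimal sketch assumes, as in this section, that each output is aligned with the last of its n_steps input rows; usable_outputs is a hypothetical helper. With 9 rows, n_steps input steps leave 9 - n_steps + 1 training samples.

```python
import numpy as np

def usable_outputs(out_seq, n_steps):
    """Hypothetical helper: split the output series into targets that have
    n_steps of input history and the leading values that must be discarded."""
    out_seq = np.asarray(out_seq)
    return out_seq[n_steps - 1:], out_seq[:n_steps - 1]

out_seq = [25, 45, 65, 85, 105, 125, 145, 165, 185]
for n_steps in (2, 3, 5):
    used, dropped = usable_outputs(out_seq, n_steps)
    print(n_steps, len(used), dropped.tolist())
# 2 8 [25]
# 3 7 [25, 45]
# 5 5 [25, 45, 65, 85]
```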

We can define a function named split_sequences() that will take a dataset as we have defined it with rows for time steps and columns for parallel series and return input/output samples.

# split a multivariate sequence into samples
def split_sequences(sequences, n_steps):
    X, y = list(), list()
    for i in range(len(sequences)):
        # find the end of this pattern
        end_ix = i + n_steps
        # check if we are beyond the dataset
        if end_ix > len(sequences):
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequences[i:end_ix, :-1], sequences[end_ix-1, -1]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)
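For larger datasets, the same split can be vectorized with NumPy's sliding_window_view (NumPy 1.20 and later). This is a minimal sketch; split_sequences_vectorized is a hypothetical name, it assumes the last column is the output series as above, and the returned X is a read-only view (call .copy() if you need to modify it).

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def split_sequences_vectorized(sequences, n_steps):
    """Hypothetical vectorized equivalent of split_sequences().

    sequences: [rows, columns] array whose last column is the output series.
    """
    sequences = np.asarray(sequences)
    # windows over the input columns: [samples, n_inputs, n_steps]
    windows = sliding_window_view(sequences[:, :-1], n_steps, axis=0)
    # move the time axis before the feature axis: [samples, n_steps, n_inputs]
    X = windows.transpose(0, 2, 1)
    # output aligned with the last time step of each window
    y = sequences[n_steps - 1:, -1]
    return X, y

dataset = np.array([[10, 15, 25], [20, 25, 45], [30, 35, 65],
                    [40, 45, 85], [50, 55, 105], [60, 65, 125],
                    [70, 75, 145], [80, 85, 165], [90, 95, 185]])
X, y = split_sequences_vectorized(dataset, 3)
print(X.shape, y.shape)  # (7, 3, 2) (7,)
```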

We can test this function on our dataset using three time steps for each input time series as input.

The complete example is listed below.

# multivariate data preparation
from numpy import array
from numpy import hstack

# split a multivariate sequence into samples
def split_sequences(sequences, n_steps):
    X, y = list(), list()
    for i in range(len(sequences)):
        # find the end of this pattern
        end_ix = i + n_steps
        # check if we are beyond the dataset
        if end_ix > len(sequences):
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequences[i:end_ix, :-1], sequences[end_ix-1, -1]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)

# define input sequence
in_seq1 = array([10, 20, 30, 40, 50, 60, 70, 80, 90])
in_seq2 = array([15, 25, 35, 45, 55, 65, 75, 85, 95])
out_seq = array([in_seq1[i]+in_seq2[i] for i in range(len(in_seq1))])
# convert to [rows, columns] structure
in_seq1 = in_seq1.reshape((len(in_seq1), 1))
in_seq2 = in_seq2.reshape((len(in_seq2), 1))
out_seq = out_seq.reshape((len(out_seq), 1))
# horizontally stack columns
dataset = hstack((in_seq1, in_seq2, out_seq))
# choose a number of time steps
n_steps = 3
# convert into input/output
X, y = split_sequences(dataset, n_steps)
print(X.shape, y.shape)
# summarize the data
for i in range(len(X)):
    print(X[i], y[i])

Running the example first prints the shape of the X and y components.

We can see that the X component has a three-dimensional structure.

The first dimension is the number of samples, in this case 7. The second dimension is the number of time steps per sample, in this case 3, the value specified to the function. Finally, the last dimension specifies the number of parallel time series or the number of variables, in this case 2 for the two parallel series.

This is the exact three-dimensional structure expected by an LSTM as input. The data is ready to use without further reshaping.

We can then see that the input and output for each sample are printed, showing the three time steps for each of the two input series and the associated output for each sample.

(7, 3, 2) (7,)

[[10 15]
 [20 25]
 [30 35]] 65
[[20 25]
 [30 35]
 [40 45]] 85
[[30 35]
 [40 45]
 [50 55]] 105
[[40 45]
 [50 55]
 [60 65]] 125
[[50 55]
 [60 65]
 [70 75]] 145
[[60 65]
 [70 75]
 [80 85]] 165
[[70 75]
 [80 85]
 [90 95]] 185

We are now ready to fit an LSTM model on this data.

Any of the varieties of LSTMs in the previous section can be used, such as a Vanilla, Stacked, Bidirectional, CNN-LSTM, or ConvLSTM model.

We will use a Vanilla LSTM where the number of time steps and parallel series (features) are specified for the input layer via the input_shape argument.

...
# define model
model = Sequential()
model.add(LSTM(50, activation='relu', input_shape=(n_steps, n_features)))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')

When making a prediction, the model expects three time steps for two input time series.

We can predict the next value in the output series providing the input values of:

80, 85
90, 95
100, 105

The shape of the one sample with three time steps and two variables must be [1, 3, 2].

We would expect the next value in the sequence to be 100 + 105, or 205.

...
# demonstrate prediction
x_input = array([[80, 85], [90, 95], [100, 105]])
x_input = x_input.reshape((1, n_steps, n_features))
yhat = model.predict(x_input, verbose=0)

The complete example is listed below.

# multivariate lstm example
from numpy import array
from numpy import hstack
from keras.models import Sequential
from keras.layers import LSTM
from keras.layers import Dense

# split a multivariate sequence into samples
def split_sequences(sequences, n_steps):
    X, y = list(), list()
    for i in range(len(sequences)):
        # find the end of this pattern
        end_ix = i + n_steps
        # check if we are beyond the dataset
        if end_ix > len(sequences):
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequences[i:end_ix, :-1], sequences[end_ix-1, -1]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)

# define input sequence
in_seq1 = array([10, 20, 30, 40, 50, 60, 70, 80, 90])
in_seq2 = array([15, 25, 35, 45, 55, 65, 75, 85, 95])
out_seq = array([in_seq1[i]+in_seq2[i] for i in range(len(in_seq1))])
# convert to [rows, columns] structure
in_seq1 = in_seq1.reshape((len(in_seq1), 1))
in_seq2 = in_seq2.reshape((len(in_seq2), 1))
out_seq = out_seq.reshape((len(out_seq), 1))
# horizontally stack columns
dataset = hstack((in_seq1, in_seq2, out_seq))
# choose a number of time steps
n_steps = 3
# convert into input/output
X, y = split_sequences(dataset, n_steps)
# the dataset knows the number of features, e.g. 2
n_features = X.shape[2]
# define model
model = Sequential()
model.add(LSTM(50, activation='relu', input_shape=(n_steps, n_features)))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
# fit model
model.fit(X, y, epochs=200, verbose=0)
# demonstrate prediction
x_input = array([[80, 85], [90, 95], [100, 105]])
x_input = x_input.reshape((1, n_steps, n_features))
yhat = model.predict(x_input, verbose=0)
print(yhat)

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

Running the example prepares the data, fits the model, and makes a prediction.

[[208.13531]]

Multiple Parallel Series

An alternate time series problem is the case where there are multiple parallel time series and a value must be predicted for each.

For example, given the data from the previous section:

[[ 10  15  25]
[ 20  25  45]
[ 30  35  65]
[ 40  45  85]
[ 50  55 105]
[ 60  65 125]
[ 70  75 145]
[ 80  85 165]
[ 90  95 185]]

We may want to predict the value for each of the three time series for the next time step.

This might be referred to as multivariate forecasting.

Again, the data must be split into input/output samples in order to train a model.

The first sample of this dataset would be:

Input:

10, 15, 25
20, 25, 45
30, 35, 65

Output:

40, 45, 85

The split_sequences() function below will split multiple parallel time series, with rows for time steps and one series per column, into the required input/output shape.

# split a multivariate sequence into samples
def split_sequences(sequences, n_steps):
    X, y = list(), list()
    for i in range(len(sequences)):
        # find the end of this pattern
        end_ix = i + n_steps
        # check if we are beyond the dataset
        if end_ix > len(sequences)-1:
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequences[i:end_ix, :], sequences[end_ix, :]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)

We can demonstrate this on the contrived problem; the complete example is listed below.

# multivariate output data prep
from numpy import array
from numpy import hstack

# split a multivariate sequence into samples
def split_sequences(sequences, n_steps):
    X, y = list(), list()
    for i in range(len(sequences)):
        # find the end of this pattern
        end_ix = i + n_steps
        # check if we are beyond the dataset
        if end_ix > len(sequences)-1:
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequences[i:end_ix, :], sequences[end_ix, :]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)

# define input sequence
in_seq1 = array([10, 20, 30, 40, 50, 60, 70, 80, 90])
in_seq2 = array([15, 25, 35, 45, 55, 65, 75, 85, 95])
out_seq = array([in_seq1[i] + in_seq2[i] for i in range(len(in_seq1))])
# convert to [rows, columns] structure
in_seq1 = in_seq1.reshape((len(in_seq1), 1))
in_seq2 = in_seq2.reshape((len(in_seq2), 1))
out_seq = out_seq.reshape((len(out_seq), 1))
# horizontally stack columns
dataset = hstack((in_seq1, in_seq2, out_seq))
# choose a number of time steps
n_steps = 3
# convert into input/output
X, y = split_sequences(dataset, n_steps)
print(X.shape, y.shape)
# summarize the data
for i in range(len(X)):
    print(X[i], y[i])

Running the example first prints the shape of the prepared X and y components.

The shape of X is three-dimensional, including the number of samples (6), the number of time steps chosen per sample (3), and the number of parallel time series or features (3).

The shape of y is two-dimensional, as we might expect, covering the number of samples (6) and the number of variables to be predicted per sample (3).

The data is ready to use in an LSTM model that expects three-dimensional input and two-dimensional output shapes for the X and y components of each sample.

Then, each of the samples is printed showing the input and output components of each sample.

(6, 3, 3) (6, 3)
 
[[10 15 25]
[20 25 45]
[30 35 65]] [40 45 85]
[[20 25 45]
[30 35 65]
[40 45 85]] [ 50  55 105]
[[ 30  35  65]
[ 40  45  85]
[ 50  55 105]] [ 60  65 125]
[[ 40  45  85]
[ 50  55 105]
[ 60  65 125]] [ 70  75 145]
[[ 50  55 105]
[ 60  65 125]
[ 70  75 145]] [ 80  85 165]
[[ 60  65 125]
[ 70  75 145]
[ 80  85 165]] [ 90  95 185]
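The loop in split_sequences() is easy to follow, but the same samples can also be produced without an explicit loop. The sketch below is an alternative, not the tutorial's method: it assumes NumPy 1.20 or later (which added `sliding_window_view`) and uses a hypothetical helper name, `split_parallel()`. Each window of `n_steps + 1` rows is split into `n_steps` input rows and one output row.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def split_parallel(dataset, n_steps):
    # windows of n_steps + 1 rows along the time axis; sliding_window_view
    # appends the window axis last, giving (n_windows, n_series, n_steps + 1),
    # so swap the last two axes to get (n_windows, n_steps + 1, n_series)
    windows = sliding_window_view(dataset, n_steps + 1, axis=0).transpose(0, 2, 1)
    # first n_steps rows of each window are the input, the last row is the output;
    # copy because sliding_window_view returns read-only views
    return windows[:, :-1, :].copy(), windows[:, -1, :].copy()

in_seq1 = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90])
in_seq2 = np.array([15, 25, 35, 45, 55, 65, 75, 85, 95])
dataset = np.column_stack((in_seq1, in_seq2, in_seq1 + in_seq2))
X, y = split_parallel(dataset, 3)
print(X.shape, y.shape)  # (6, 3, 3) (6, 3)
```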

We are now ready to fit an LSTM model on this data.

Any of the varieties of LSTMs in the previous section can be used, such as a Vanilla, Stacked, Bidirectional, CNN, or ConvLSTM model.

We will use a Stacked LSTM where the number of time steps and parallel series (features) are specified for the input layer via the input_shape argument. The number of parallel series is also used to specify the number of values the model predicts in the output layer; again, this is three.

...
# define model
model = Sequential()
model.add(LSTM(100, activation='relu', return_sequences=True, input_shape=(n_steps, n_features)))
model.add(LSTM(100, activation='relu'))
model.add(Dense(n_features))
model.compile(optimizer='adam', loss='mse')

We can predict the next value in each of the three parallel series by providing an input of three time steps for each series.

70, 75, 145
80, 85, 165
90, 95, 185

The shape of the input for making a single prediction must be 1 sample, 3 time steps, and 3 features, or [1, 3, 3].

...
# demonstrate prediction
x_input = array([[70, 75, 145], [80, 85, 165], [90, 95, 185]])
x_input = x_input.reshape((1, n_steps, n_features))
yhat = model.predict(x_input, verbose=0)

We would expect the vector output to be:

[100, 105, 205]
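The expected vector follows from the fact that every series in this contrived dataset changes by a constant amount per time step (+10, +10, and +20). Under that assumption, which holds only for this toy data, the next row is the last row plus the most recent step. The sketch below is just a way of deriving the target, not a forecasting method.

```python
from numpy import array

# the three most recent time steps of the three parallel series
window = array([[70, 75, 145], [80, 85, 165], [90, 95, 185]])

# assumes each series moves by a constant amount per step (true for this
# contrived data): next value = last value + (last value - previous value)
expected = window[-1] + (window[-1] - window[-2])
print(expected)  # [100 105 205]
```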

We can tie all of this together and demonstrate a Stacked LSTM for multivariate output time series forecasting below.

# multivariate output stacked lstm example
from numpy import array
from numpy import hstack
from keras.models import Sequential
from keras.layers import LSTM
from keras.layers import Dense

# split a multivariate sequence into samples
def split_sequences(sequences, n_steps):
    X, y = list(), list()
    for i in range(len(sequences)):
        # find the end of this pattern
        end_ix = i + n_steps
        # check if we are beyond the dataset
        if end_ix > len(sequences)-1:
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequences[i:end_ix, :], sequences[end_ix, :]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)

# define input sequence
in_seq1 = array([10, 20, 30, 40, 50, 60, 70, 80, 90])
in_seq2 = array([15, 25, 35, 45, 55, 65, 75, 85, 95])
out_seq = array([in_seq1[i] + in_seq2[i] for i in range(len(in_seq1))])
# convert to [rows, columns] structure
in_seq1 = in_seq1.reshape((len(in_seq1), 1))
in_seq2 = in_seq2.reshape((len(in_seq2), 1))
out_seq = out_seq.reshape((len(out_seq), 1))
# horizontally stack columns
dataset = hstack((in_seq1, in_seq2, out_seq))
# choose a number of time steps
n_steps = 3
# convert into input/output
X, y = split_sequences(dataset, n_steps)
# the dataset knows the number of features, here 3
n_features = X.shape[2]
# define model
model = Sequential()
model.add(LSTM(100, activation='relu', return_sequences=True, input_shape=(n_steps, n_features)))
model.add(LSTM(100, activation='relu'))
model.add(Dense(n_features))
model.compile(optimizer='adam', loss='mse')
# fit model
model.fit(X, y, epochs=400, verbose=0)
# demonstrate prediction
x_input = array([[70, 75, 145], [80, 85, 165], [90, 95, 185]])
x_input = x_input.reshape((1, n_steps, n_features))
yhat = model.predict(x_input, verbose=0)
print(yhat)

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

Running the example prepares the data, fits the model, and makes a prediction.

[[101.76599 108.730484 206.63577 ]]

Multi-Step LSTM Models

A time series forecasting problem that requires a prediction of multiple time steps into the future can be referred to as multi-step time series forecasting.

Specifically, these are problems where the forecast horizon or interval is more than one time step.

There are two main types of LSTM models that can be used for multi-step forecasting; they are:

  1. Vector Output Model
  2. Encoder-Decoder Model

Before we look at these models, let’s first look at the preparation of data for multi-step forecasting.

Data Preparation

As with one-step forecasting, a time series used for multi-step time series forecasting must be split into samples with input and output components.

Both the input and output components will consist of multiple time steps and may or may not have the same number of steps.

For example, given the univariate time series:

[10, 20, 30, 40, 50, 60, 70, 80, 90]

We could use the last three time steps as input and forecast the next two time steps.

The first sample would look as follows:

Input:

[10, 20, 30]

Output:

[40, 50]

The split_sequence() function below implements this behavior and will split a given univariate time series into samples with a specified number of input and output time steps.

# split a univariate sequence into samples
def split_sequence(sequence, n_steps_in, n_steps_out):
    X, y = list(), list()
    for i in range(len(sequence)):
        # find the end of this pattern
        end_ix = i + n_steps_in
        out_end_ix = end_ix + n_steps_out
        # check if we are beyond the sequence
        if out_end_ix > len(sequence):
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequence[i:end_ix], sequence[end_ix:out_end_ix]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)

We can demonstrate this function on the small contrived dataset.

The complete example is listed below.

# multi-step data preparation
from numpy import array

# split a univariate sequence into samples
def split_sequence(sequence, n_steps_in, n_steps_out):
    X, y = list(), list()
    for i in range(len(sequence)):
        # find the end of this pattern
        end_ix = i + n_steps_in
        out_end_ix = end_ix + n_steps_out
        # check if we are beyond the sequence
        if out_end_ix > len(sequence):
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequence[i:end_ix], sequence[end_ix:out_end_ix]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)

# define input sequence
raw_seq = [10, 20, 30, 40, 50, 60, 70, 80, 90]
# choose a number of time steps
n_steps_in, n_steps_out = 3, 2
# split into samples
X, y = split_sequence(raw_seq, n_steps_in, n_steps_out)
# summarize the data
for i in range(len(X)):
    print(X[i], y[i])

Running the example splits the univariate series into input and output time steps and prints the input and output components of each.

[10 20 30] [40 50]
[20 30 40] [50 60]
[30 40 50] [60 70]
[40 50 60] [70 80]
[50 60 70] [80 90]
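Each sample consumes `n_steps_in + n_steps_out` consecutive observations, which is why nine observations yield five samples here. The helper below, `n_multistep_samples()`, is a hypothetical name used only to make that count explicit; it mirrors the stopping condition in split_sequence(). Setting `n_steps_out=1` recovers the one-step case.

```python
def n_multistep_samples(n_obs, n_steps_in, n_steps_out):
    # a sample needs n_steps_in inputs immediately followed by n_steps_out
    # outputs, so the last valid start index is n_obs - n_steps_in - n_steps_out
    return max(0, n_obs - n_steps_in - n_steps_out + 1)

print(n_multistep_samples(9, 3, 2))  # 5
print(n_multistep_samples(9, 3, 1))  # 6
```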

Now that we know how to prepare data for multi-step forecasting, let’s look at some LSTM models that can learn this mapping.

Vector Output Model

Like other types of neural network models, the LSTM can output a vector directly that can be interpreted as a multi-step forecast.

This approach was seen in the previous section, where one time step of each output time series was forecasted as a vector.

As with the LSTMs for univariate data in a prior section, the prepared samples must first be reshaped. The LSTM expects data to have a three-dimensional structure of [samples, timesteps, features], and in this case, we only have one feature so the reshape is straightforward.

...
# reshape from [samples, timesteps] into [samples, timesteps, features]
n_features = 1
X = X.reshape((X.shape[0], X.shape[1], n_features))

With the number of input and output steps specified in the n_steps_in and n_steps_out variables, we can define a multi-step time-series forecasting model.

Any of the presented LSTM model types could be used, such as Vanilla, Stacked, Bidirectional, CNN-LSTM, or ConvLSTM. Below defines a Stacked LSTM for multi-step forecasting.

...
# define model
model = Sequential()
model.add(LSTM(100, activation='relu', return_sequences=True, input_shape=(n_steps_in, n_features)))
model.add(LSTM(100, activation='relu'))
model.add(Dense(n_steps_out))
model.compile(optimizer='adam', loss='mse')

The model can make a prediction for a single sample. We can predict the next two steps beyond the end of the dataset by providing the input:

[70, 80, 90]

We would expect the predicted output to be:

[100, 110]

As expected by the model, the shape of the single sample of input data when making the prediction must be [1, 3, 1] for the 1 sample, 3 time steps of the input, and the single feature.

...
# demonstrate prediction
x_input = array([70, 80, 90])
x_input = x_input.reshape((1, n_steps_in, n_features))
yhat = model.predict(x_input, verbose=0)

Tying all of this together, the Stacked LSTM for multi-step forecasting with a univariate time series is listed below.

# univariate multi-step vector-output stacked lstm example
from numpy import array
from keras.models import Sequential
from keras.layers import LSTM
from keras.layers import Dense

# split a univariate sequence into samples
def split_sequence(sequence, n_steps_in, n_steps_out):
    X, y = list(), list()
    for i in range(len(sequence)):
        # find the end of this pattern
        end_ix = i + n_steps_in
        out_end_ix = end_ix + n_steps_out
        # check if we are beyond the sequence
        if out_end_ix > len(sequence):
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequence[i:end_ix], sequence[end_ix:out_end_ix]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)

# define input sequence
raw_seq = [10, 20, 30, 40, 50, 60, 70, 80, 90]
# choose a number of time steps
n_steps_in, n_steps_out = 3, 2
# split into samples
X, y = split_sequence(raw_seq, n_steps_in, n_steps_out)
# reshape from [samples, timesteps] into [samples, timesteps, features]
n_features = 1
X = X.reshape((X.shape[0], X.shape[1], n_features))
# define model
model = Sequential()
model.add(LSTM(100, activation='relu', return_sequences=True, input_shape=(n_steps_in, n_features)))
model.add(LSTM(100, activation='relu'))
model.add(Dense(n_steps_out))
model.compile(optimizer='adam', loss='mse')
# fit model
model.fit(X, y, epochs=50, verbose=0)
# demonstrate prediction
x_input = array([70, 80, 90])
x_input = x_input.reshape((1, n_steps_in, n_features))
yhat = model.predict(x_input, verbose=0)
print(yhat)

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

Running the example forecasts and prints the next two time steps in the sequence.

[[100.98096 113.28924]]

Encoder-Decoder Model

A model specifically developed for forecasting variable-length output sequences is called the Encoder-Decoder LSTM.

The model was designed for prediction problems where there are both input and output sequences, so-called sequence-to-sequence, or seq2seq problems, such as translating text from one language to another.

This model can be used for multi-step time series forecasting.

As its name suggests, the model consists of two sub-models: the encoder and the decoder.

The encoder is a model responsible for reading and interpreting the input sequence. The output of the encoder is a fixed length vector that represents the model’s interpretation of the sequence. The encoder is traditionally a Vanilla LSTM model, although other encoder models can be used such as Stacked, Bidirectional, and CNN models.

...
model.add(LSTM(100, activation='relu', input_shape=(n_steps_in, n_features)))

The decoder uses the output of the encoder as an input.

First, the fixed-length output of the encoder is repeated, once for each required time step in the output sequence.

...
model.add(RepeatVector(n_steps_out))
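To make the shape change concrete, here is a NumPy illustration of what the RepeatVector layer computes: the encoder's fixed-length vector for each sample is copied once per output time step. This is not Keras's implementation, and `repeat_vector()` is a hypothetical helper; the batch size and unit count are arbitrary.

```python
import numpy as np

def repeat_vector(encoded, n):
    # encoded has shape (batch, units); the result has shape (batch, n, units),
    # holding n identical copies of each sample's encoding along a new time axis
    return np.repeat(encoded[:, np.newaxis, :], n, axis=1)

encoded = np.arange(8, dtype=float).reshape(2, 4)  # 2 samples, 4 encoder units
repeated = repeat_vector(encoded, 3)               # 3 output time steps
print(repeated.shape)  # (2, 3, 4)
```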

This sequence is then provided to an LSTM decoder model. The decoder must output a value for each time step in the output sequence, and each of these values can then be interpreted by a single output model.

...
model.add(LSTM(100, activation='relu', return_sequences=True))

We can use the same output layer or layers to make each one-step prediction in the output sequence. This can be achieved by wrapping the output part of the model in a TimeDistributed wrapper.

...
model.add(TimeDistributed(Dense(1)))
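Conceptually, TimeDistributed(Dense(1)) applies one set of Dense weights to every time step of the decoder output. The NumPy sketch below illustrates that idea with random stand-in weights (not trained values, and not Keras's internal code); it uses a linear output, matching Dense with no activation, and checks that a per-step loop and a single broadcast matrix product agree.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, n_steps_out, units = 2, 2, 100

# decoder output: one 100-unit vector per output time step
decoder_out = rng.standard_normal((batch, n_steps_out, units))
# stand-in weights for a single Dense(1) layer, shared by all time steps
W = rng.standard_normal((units, 1))
b = rng.standard_normal(1)

# apply the same weights to each time step in turn
per_step = np.stack([decoder_out[:, t, :] @ W + b for t in range(n_steps_out)], axis=1)

# the same computation at once: matmul broadcasts over the time axis
vectorized = decoder_out @ W + b
print(per_step.shape)  # (2, 2, 1)
```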

The full definition for an Encoder-Decoder model for multi-step time series forecasting is listed below.

...
# define model
model = Sequential()
model.add(LSTM(100, activation='relu', input_shape=(n_steps_in, n_features)))
model.add(RepeatVector(n_steps_out))
model.add(LSTM(100, activation='relu', return_sequences=True))
model.add(TimeDistributed(Dense(1)))
model.compile(optimizer='adam', loss='mse')

As with other LSTM models, the input data must be reshaped into the expected three-dimensional shape of [samples, timesteps, features].

...
X = X.reshape((X.shape[0], X.shape[1], n_features))

In the case of the Encoder-Decoder model, the output, or y part, of the training dataset must also have this shape. This is because the model will predict a given number of time steps with a given number of features for each input sample.

...
y = y.reshape((y.shape[0], y.shape[1], n_features))
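For the univariate example, the targets produced by split_sequence() with three input steps and two output steps have shape (5, 2). The sketch below uses those same five target pairs to show the reshape to (5, 2, 1), which lines up with the one-value-per-step output of TimeDistributed(Dense(1)).

```python
from numpy import array

# targets from split_sequence(raw_seq, 3, 2): five samples of two future steps
y = array([[40, 50], [50, 60], [60, 70], [70, 80], [80, 90]])
n_features = 1

# [samples, timesteps] -> [samples, timesteps, features]
y3d = y.reshape((y.shape[0], y.shape[1], n_features))
print(y.shape, y3d.shape)  # (5, 2) (5, 2, 1)
```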

The complete example of an Encoder-Decoder LSTM for multi-step time series forecasting is listed below.

# univariate multi-step encoder-decoder lstm example
from numpy import array
from keras.models import Sequential
from keras.layers import LSTM
from keras.layers import Dense
from keras.layers import RepeatVector
from keras.layers import TimeDistributed

# split a univariate sequence into samples
def split_sequence(sequence, n_steps_in, n_steps_out):
    X, y = list(), list()
    for i in range(len(sequence)):
        # find the end of this pattern
        end_ix = i + n_steps_in
        out_end_ix = end_ix + n_steps_out
        # check if we are beyond the sequence
        if out_end_ix > len(sequence):
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequence[i:end_ix], sequence[end_ix:out_end_ix]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)

# define input sequence
raw_seq = [10, 20, 30, 40, 50, 60, 70, 80, 90]
# choose a number of time steps
n_steps_in, n_steps_out = 3, 2
# split into samples
X, y = split_sequence(raw_seq, n_steps_in, n_steps_out)
# reshape from [samples, timesteps] into [samples, timesteps, features]
n_features = 1
X = X.reshape((X.shape[0], X.shape[1], n_features))
y = y.reshape((y.shape[0], y.shape[1], n_features))
# define model
model = Sequential()
model.add(LSTM(100, activation='relu', input_shape=(n_steps_in, n_features)))
model.add(RepeatVector(n_steps_out))
model.add(LSTM(100, activation='relu', return_sequences=True))
model.add(TimeDistributed(Dense(1)))
model.compile(optimizer='adam', loss='mse')
# fit model
model.fit(X, y, epochs=100, verbose=0)
# demonstrate prediction
x_input = array([70, 80, 90])
x_input = x_input.reshape((1, n_steps_in, n_features))
yhat = model.predict(x_input, verbose=0)
print(yhat)

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

Running the example forecasts and prints the next two time steps in the sequence.

[[[101.9736  ]
  [116.213615]]]

Multivariate Multi-Step LSTM Models

In the previous sections, we have looked at univariate, multivariate, and multi-step time series forecasting.

It is possible to mix and match the different types of LSTM models presented so far for the different problems. This too applies to time series forecasting problems that involve multivariate and multi-step forecasting, but it may be a little more challenging.

In this section, we will provide short examples of data preparation and modeling for multivariate multi-step time series forecasting as a template to ease this challenge, specifically:

  1. Multiple Input Multi-Step Output.
  2. Multiple Parallel Input and Multi-Step Output.

Perhaps the biggest stumbling block is in the preparation of data, so this is where we will focus our attention.

Multiple Input Multi-Step Output

There are those multivariate time series forecasting problems where the output series is separate but dependent upon the input time series, and multiple time steps are required for the output series.

For example, consider our multivariate time series from a prior section:

[[ 10  15  25]
[ 20  25  45]
[ 30  35  65]
[ 40  45  85]
[ 50  55 105]
[ 60  65 125]
[ 70  75 145]
[ 80  85 165]
[ 90  95 185]]

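We may use three prior time steps of each of the two input time series to predict two time steps of the output time series. In this framing, the first output value is aligned with the last input time step, so the sample below predicts the output series at the third and fourth time steps.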

Input:

10, 15
20, 25
30, 35

Output:

65
85

The split_sequences() function below implements this behavior.

# split a multivariate sequence into samples
def split_sequences(sequences, n_steps_in, n_steps_out):
    X, y = list(), list()
    for i in range(len(sequences)):
        # find the end of this pattern
        end_ix = i + n_steps_in
        out_end_ix = end_ix + n_steps_out-1
        # check if we are beyond the dataset
        if out_end_ix > len(sequences):
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequences[i:end_ix, :-1], sequences[end_ix-1:out_end_ix, -1]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)

We can demonstrate this on our contrived dataset.

The complete example is listed below.

# multivariate multi-step data preparation
from numpy import array
from numpy import hstack

# split a multivariate sequence into samples
def split_sequences(sequences, n_steps_in, n_steps_out):
    X, y = list(), list()
    for i in range(len(sequences)):
        # find the end of this pattern
        end_ix = i + n_steps_in
        out_end_ix = end_ix + n_steps_out-1
        # check if we are beyond the dataset
        if out_end_ix > len(sequences):
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequences[i:end_ix, :-1], sequences[end_ix-1:out_end_ix, -1]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)

# define input sequence
in_seq1 = array([10, 20, 30, 40, 50, 60, 70, 80, 90])
in_seq2 = array([15, 25, 35, 45, 55, 65, 75, 85, 95])
out_seq = array([in_seq1[i] + in_seq2[i] for i in range(len(in_seq1))])
# convert to [rows, columns] structure
in_seq1 = in_seq1.reshape((len(in_seq1), 1))
in_seq2 = in_seq2.reshape((len(in_seq2), 1))
out_seq = out_seq.reshape((len(out_seq), 1))
# horizontally stack columns
dataset = hstack((in_seq1, in_seq2, out_seq))
# choose a number of time steps
n_steps_in, n_steps_out = 3, 2
# convert into input/output
X, y = split_sequences(dataset, n_steps_in, n_steps_out)
print(X.shape, y.shape)
# summarize the data
for i in range(len(X)):
    print(X[i], y[i])

Running the example first prints the shape of the prepared training data.

We can see that the shape of the input portion of the samples is three-dimensional, comprising six samples, three time steps, and two variables for the two input time series.

The output portion of the samples is two-dimensional for the six samples and the two time steps for each sample to be predicted.

The prepared samples are then printed to confirm that the data was prepared as we specified.

(6, 3, 2) (6, 2)
 
[[10 15]
[20 25]
[30 35]] [65 85]
[[20 25]
[30 35]
[40 45]] [ 85 105]
[[30 35]
[40 45]
[50 55]] [105 125]
[[40 45]
[50 55]
[60 65]] [125 145]
[[50 55]
[60 65]
[70 75]] [145 165]
[[60 65]
[70 75]
[80 85]] [165 185]
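Because this version of split_sequences() starts the output window at `end_ix-1`, the first output row overlaps the last input row. That overlap is why nine rows yield six samples here rather than the five produced by the non-overlapping split used for the univariate multi-step case. The hypothetical helper `window_indices()` below simply lists the row indices each sample uses, following the same arithmetic as the function above.

```python
def window_indices(i, n_steps_in, n_steps_out):
    # rows used by sample i in the multiple input multi-step split_sequences()
    end_ix = i + n_steps_in
    out_end_ix = end_ix + n_steps_out - 1
    inputs = list(range(i, end_ix))                # rows of the input series
    outputs = list(range(end_ix - 1, out_end_ix))  # rows of the output series
    return inputs, outputs

print(window_indices(0, 3, 2))  # ([0, 1, 2], [2, 3])

n_rows, n_steps_in, n_steps_out = 9, 3, 2
# count start indices whose output window still fits inside the dataset
n_samples = sum(1 for i in range(n_rows)
                if i + n_steps_in + n_steps_out - 1 <= n_rows)
print(n_samples)  # 6
```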

We can now develop an LSTM model for multi-step predictions.

A vector output or an encoder-decoder model could be used. In this case, we will demonstrate a vector output with a Stacked LSTM.

The complete example is listed below.

# multivariate multi-step stacked lstm example
from numpy import array
from numpy import hstack
from keras.models import Sequential
from keras.layers import LSTM
from keras.layers import Dense

# split a multivariate sequence into samples
def split_sequences(sequences, n_steps_in, n_steps_out):
    X, y = list(), list()
    for i in range(len(sequences)):
        # find the end of this pattern
        end_ix = i + n_steps_in
        out_end_ix = end_ix + n_steps_out-1
        # check if we are beyond the dataset
        if out_end_ix > len(sequences):
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequences[i:end_ix, :-1], sequences[end_ix-1:out_end_ix, -1]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)

# define input sequence
in_seq1 = array([10, 20, 30, 40, 50, 60, 70, 80, 90])
in_seq2 = array([15, 25, 35, 45, 55, 65, 75, 85, 95])
out_seq = array([in_seq1[i] + in_seq2[i] for i in range(len(in_seq1))])
# convert to [rows, columns] structure
in_seq1 = in_seq1.reshape((len(in_seq1), 1))
in_seq2 = in_seq2.reshape((len(in_seq2), 1))
out_seq = out_seq.reshape((len(out_seq), 1))
# horizontally stack columns
dataset = hstack((in_seq1, in_seq2, out_seq))
# choose a number of time steps
n_steps_in, n_steps_out = 3, 2
# convert into input/output
X, y = split_sequences(dataset, n_steps_in, n_steps_out)
# the dataset knows the number of features, e.g. 2
n_features = X.shape[2]
# define model
model = Sequential()
model.add(LSTM(100, activation='relu', return_sequences=True, input_shape=(n_steps_in, n_features)))
model.add(LSTM(100, activation='relu'))
model.add(Dense(n_steps_out))
model.compile(optimizer='adam', loss='mse')
# fit model
model.fit(X, y, epochs=200, verbose=0)
# demonstrate prediction
x_input = array([[70, 75], [80, 85], [90, 95]])
x_input = x_input.reshape((1, n_steps_in, n_features))
yhat = model.predict(x_input, verbose=0)
print(yhat)

Running the example fits the model and predicts the next two time steps of the output sequence beyond the dataset.

We would expect the next two steps to be: [185, 205]

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

It is a challenging framing of the problem with very little data, and the arbitrarily configured version of the model gets close.

[[188.70619 210.16513]]

Multiple Parallel Input and Multi-Step Output

A problem with parallel time series may require the prediction of multiple time steps of each time series.

For example, consider our multivariate time series from a prior section:

[[ 10  15  25]
[ 20  25  45]
[ 30  35  65]
[ 40  45  85]
[ 50  55 105]
[ 60  65 125]
[ 70  75 145]
[ 80  85 165]
[ 90  95 185]]

We may use the last three time steps from each of the three time series as input to the model and predict the next two time steps of each of the three time series as output.

The first sample in the training dataset would be the following.

Input:

10, 15, 25
20, 25, 45
30, 35, 65

Output:

40, 45, 85
50, 55, 105

The split_sequences() function below implements this behavior.

# split a multivariate sequence into samples
def split_sequences(sequences, n_steps_in, n_steps_out):
    X, y = list(), list()
    for i in range(len(sequences)):
        # find the end of this pattern
        end_ix = i + n_steps_in
        out_end_ix = end_ix + n_steps_out
        # check if we are beyond the dataset
        if out_end_ix > len(sequences):
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequences[i:end_ix, :], sequences[end_ix:out_end_ix, :]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)

We can demonstrate this function on the small contrived dataset.

The complete example is listed below.

# multivariate multi-step data preparation
from numpy import array
from numpy import hstack

# split a multivariate sequence into samples
def split_sequences(sequences, n_steps_in, n_steps_out):
    X, y = list(), list()
    for i in range(len(sequences)):
        # find the end of this pattern
        end_ix = i + n_steps_in
        out_end_ix = end_ix + n_steps_out
        # check if we are beyond the dataset
        if out_end_ix > len(sequences):
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequences[i:end_ix, :], sequences[end_ix:out_end_ix, :]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)

# define input sequence
in_seq1 = array([10, 20, 30, 40, 50, 60, 70, 80, 90])
in_seq2 = array([15, 25, 35, 45, 55, 65, 75, 85, 95])
out_seq = array([in_seq1[i] + in_seq2[i] for i in range(len(in_seq1))])
# convert to [rows, columns] structure
in_seq1 = in_seq1.reshape((len(in_seq1), 1))
in_seq2 = in_seq2.reshape((len(in_seq2), 1))
out_seq = out_seq.reshape((len(out_seq), 1))
# horizontally stack columns
dataset = hstack((in_seq1, in_seq2, out_seq))
# choose a number of time steps
n_steps_in, n_steps_out = 3, 2
# convert into input/output
X, y = split_sequences(dataset, n_steps_in, n_steps_out)
print(X.shape, y.shape)
# summarize the data
for i in range(len(X)):
    print(X[i], y[i])

Running the example first prints the shape of the prepared training dataset.

We can see that both the input (X) and output (y) elements of the dataset are three-dimensional, covering the number of samples, time steps, and variables or parallel time series, respectively.

The input and output elements of each sample are then printed side by side so that we can confirm that the data was prepared as we expected.

(5, 3, 3) (5, 2, 3)
 
[[10 15 25]
[20 25 45]
[30 35 65]] [[ 40  45  85]
[ 50  55 105]]
[[20 25 45]
[30 35 65]
[40 45 85]] [[ 50  55 105]
[ 60  65 125]]
[[ 30  35  65]
[ 40  45  85]
[ 50  55 105]] [[ 60  65 125]
[ 70  75 145]]
[[ 40  45  85]
[ 50  55 105]
[ 60  65 125]] [[ 70  75 145]
[ 80  85 165]]
[[ 50  55 105]
[ 60  65 125]
[ 70  75 145]] [[ 80  85 165]
[ 90  95 185]]

We can use either the Vector Output or Encoder-Decoder LSTM to model this problem. In this case, we will use the Encoder-Decoder model.
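If you prefer the Vector Output model for this problem instead, the three-dimensional targets need flattening, because a Dense output layer produces one flat vector per sample. The sketch below assumes an output layer of Dense(n_steps_out * n_features); the `np.arange` values are placeholders standing in for the real (5, 2, 3) targets, and the "prediction" is a stand-in row rather than real model output.

```python
import numpy as np

# placeholder targets shaped [samples, n_steps_out, n_features], like (5, 2, 3) above
n_samples, n_steps_out, n_features = 5, 2, 3
y = np.arange(n_samples * n_steps_out * n_features).reshape(n_samples, n_steps_out, n_features)

# flatten to [samples, n_steps_out * n_features] to match a flat Dense output
y_flat = y.reshape((n_samples, n_steps_out * n_features))
print(y_flat.shape)  # (5, 6)

# a flat prediction can be restored to one row per future time step
yhat_flat = y_flat[:1]  # stand-in for model.predict(...) output, shape (1, 6)
yhat = yhat_flat.reshape((1, n_steps_out, n_features))
print(yhat.shape)  # (1, 2, 3)
```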

The complete example is listed below.

# multivariate multi-step encoder-decoder lstm example
from numpy import array
from numpy import hstack
from keras.models import Sequential
from keras.layers import LSTM
from keras.layers import Dense
from keras.layers import RepeatVector
from keras.layers import TimeDistributed

# split a multivariate sequence into samples
def split_sequences(sequences, n_steps_in, n_steps_out):
    X, y = list(), list()
    for i in range(len(sequences)):
        # find the end of this pattern
        end_ix = i + n_steps_in
        out_end_ix = end_ix + n_steps_out
        # check if we are beyond the dataset
        if out_end_ix > len(sequences):
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequences[i:end_ix, :], sequences[end_ix:out_end_ix, :]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)

# define input sequence
in_seq1 = array([10, 20, 30, 40, 50, 60, 70, 80, 90])
in_seq2 = array([15, 25, 35, 45, 55, 65, 75, 85, 95])
out_seq = array([in_seq1[i] + in_seq2[i] for i in range(len(in_seq1))])
# convert to [rows, columns] structure
in_seq1 = in_seq1.reshape((len(in_seq1), 1))
in_seq2 = in_seq2.reshape((len(in_seq2), 1))
out_seq = out_seq.reshape((len(out_seq), 1))
# horizontally stack columns
dataset = hstack((in_seq1, in_seq2, out_seq))
# choose a number of time steps
n_steps_in, n_steps_out = 3, 2
# convert into input/output
X, y = split_sequences(dataset, n_steps_in, n_steps_out)
# the dataset knows the number of features, here 3
n_features = X.shape[2]
# define model
model = Sequential()
model.add(LSTM(200, activation='relu', input_shape=(n_steps_in, n_features)))
model.add(RepeatVector(n_steps_out))
model.add(LSTM(200, activation='relu', return_sequences=True))
model.add(TimeDistributed(Dense(n_features)))
model.compile(optimizer='adam', loss='mse')
# fit model
model.fit(X, y, epochs=300, verbose=0)
# demonstrate prediction
x_input = array([[60, 65, 125], [70, 75, 145], [80, 85, 165]])
x_input = x_input.reshape((1, n_steps_in, n_features))
yhat = model.predict(x_input, verbose=0)
print(yhat)

Running the example fits the model and predicts the values of each of the three time series for the next two time steps beyond the end of the dataset.

We would expect the values for these series and time steps to be as follows:

90, 95, 185
100, 105, 205

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

We can see that the model forecast gets reasonably close to the expected values.

[[[ 91.86044   97.77231  189.66768 ]
  [103.299355 109.18123  212.6863  ]]]
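To put a number on "reasonably close", the forecast can be compared with the expected values. The sketch below simply copies the two arrays shown above; your own run will print a different forecast.

```python
import numpy as np

# expected next two rows of the three series, and the forecast printed above
expected = np.array([[90, 95, 185], [100, 105, 205]], dtype=float)
forecast = np.array([[91.86044, 97.77231, 189.66768],
                     [103.299355, 109.18123, 212.6863]])

abs_error = np.abs(forecast - expected)   # one error per step and series
mae = abs_error.mean()                    # mean absolute error over all six values
rmse = np.sqrt(((forecast - expected) ** 2).mean())
print(abs_error.round(2))
print('MAE: %.3f, RMSE: %.3f' % (mae, rmse))
```

For this particular run, the error is larger at the second forecast step than at the first for every series, and largest for the third series, which is the sum of the other two.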

Further Reading

Summary

In this tutorial, you discovered how to develop a suite of LSTM models for a range of standard time series forecasting problems.

Specifically, you learned:

  • How to develop LSTM models for univariate time series forecasting.
  • How to develop LSTM models for multivariate time series forecasting.
  • How to develop LSTM models for multi-step time series forecasting.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.


991 Responses to How to Develop LSTM Models for Time Series Forecasting

  1. Jenna Ma, November 16, 2018 at 12:09 am

    This tutorial is so helpful to me. Thank you very much!
    It would be more helpful for real projects if the dataset were split into batches. I hope you will cover this in the future.

    • Jason Brownlee, November 16, 2018 at 6:16 am

      Keras will split the dataset into batches.

      • Jenna Ma, November 16, 2018 at 7:27 pm

        I think this blog (https://machinelearningmastery.com/use-different-batch-sizes-training-predicting-python-keras/) may answer my question. I will do more research. Thanks a lot.

      • maria, October 8, 2019 at 8:52 pm

        Hi!

        I would like to cite your book “Deep Learning for Time Series Forecasting: Predict the Future with MLPs, CNNs and LSTMs in Python.” Is there an appropriate format for doing this?

      • H, October 31, 2019 at 2:31 am

        Hi Jason,
        Could you please share an example of sliding-window-based support vector regression for prediction?
        Do you have such an example?

        Thanks a lot

        • Jason Brownlee, October 31, 2019 at 5:35 am

          Thanks for the suggestion.

          • Sruthi, March 29, 2021 at 11:14 pm

            Hi Jason, it was a great tutorial.
            I have a question:

            In the Multiple Parallel Input case, the output of the LSTM Encoder-Decoder model is 3D; how do we transform it back to 2D? I am asking because I scaled the data using MinMaxScaler(), which expects a 2D array.

            In order to compare the predicted values with the original values, I need to perform inverse scaling, but I am stuck on how to reshape the 3D input and output back to 2D without losing any data.

          • Jason Brownlee, March 30, 2021 at 6:05 am

            You might need to write custom code to collect values for each variable before inverting the scale.
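The collect-then-invert idea can be sketched as follows: stack every time step of every sample as a row, invert the scaling per column, then restore the 3D shape. To stay self-contained, the sketch hand-rolls per-feature min-max scaling with NumPy instead of using scikit-learn; the same reshape should work with MinMaxScaler.inverse_transform(), which expects a 2D [rows, features] array. inverse_minmax_3d is a hypothetical helper name.

```python
import numpy as np

def inverse_minmax_3d(scaled_3d, data_min, data_range):
    """Undo per-feature min-max scaling on a [samples, steps, features] array."""
    n_samples, n_steps, n_features = scaled_3d.shape
    # every (sample, step) pair becomes one row, so columns are still features
    flat = scaled_3d.reshape((n_samples * n_steps, n_features))
    restored = flat * data_range + data_min
    # put the rows back into the model's [samples, steps, features] layout
    return restored.reshape((n_samples, n_steps, n_features))

# training data laid out as [rows, features], as a 2D scaler expects
dataset = np.array([[10, 15, 25], [20, 25, 45], [30, 35, 65],
                    [40, 45, 85], [50, 55, 105]], dtype=float)
data_min = dataset.min(axis=0)
data_range = dataset.max(axis=0) - data_min
scaled = (dataset - data_min) / data_range

# stand-in for model output: two samples of two steps each, in scaled units
yhat_scaled = np.stack([scaled[1:3], scaled[3:5]])
yhat = inverse_minmax_3d(yhat_scaled, data_min, data_range)
print(yhat.shape)
print(yhat[1])
```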

    • chi, August 27, 2019 at 7:07 am

      Hello Jason,

      Thank you so, so much for your post; it was super helpful. For the multiple time step output LSTM model, I am wondering what the difference in performance would be between model-1 and model-2. Model-1 is your multiple time step output LSTM model: for example, we input the last 7 days of data features, and the output is the next 5 days of prices. Model-2 is a simple 1-time-step output LSTM model, where the input is the last 7 days of data features and the output is the next day’s price. We then use the predicted price as new input to predict future prices until we have predicted all of the next 5 days.
      What are the key differences between these 2 strategies for predicting the next 5 days of prices? What are the advantages and disadvantages of these 2 LSTM models?

      Thank you,
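The second strategy in this question (often called recursive or iterated forecasting) can be sketched in a few lines. predict_one_step is a hypothetical stand-in for a fitted one-step model; here it is a toy rule so the loop runs without Keras. As a general rule, the recursive approach feeds its own errors back in as inputs, so they can compound over the horizon, while a direct multi-step model predicts all five days at once and avoids that feedback at the cost of learning every horizon jointly.

```python
import numpy as np

def recursive_forecast(predict_one_step, history, n_steps_in, n_steps_out):
    """Forecast n_steps_out values by feeding each prediction back as input."""
    window = [float(v) for v in history[-n_steps_in:]]
    forecasts = []
    for _ in range(n_steps_out):
        x = np.array(window[-n_steps_in:])
        yhat = float(predict_one_step(x))
        forecasts.append(yhat)
        # the prediction becomes the newest value in the next input window
        window.append(yhat)
    return forecasts

def toy_model(x):
    # hypothetical one-step model: extend the last observed difference
    return x[-1] + (x[-1] - x[-2])

forecasts = recursive_forecast(toy_model, [10, 20, 30, 40, 50, 60, 70], 3, 5)
print(forecasts)
```

With a Keras model, predict_one_step would wrap model.predict() and reshape x to [1, n_steps_in, n_features].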

    • Rick, March 28, 2020 at 5:45 pm

      Hey Jason,
      Thanks for the blogs. They are really helpful, and I have learned a lot from machinelearningmastery.
      This blog post about LSTMs is very informative, but I have a question.

      I have a set of amplitude scans, and I want to predict the next scan (a many-to-one problem). So my data has shape (6, 590) and the result should be (1, 590), where 590 is the number of amplitude values in each scan.

      A. Is it possible to address this problem with an LSTM, and
      B. if so, how accurate do you think the system might be, given the number of time steps and features it is predicting?

      Thanks

  2. Amy, November 16, 2018 at 7:17 am

    Thanks Jason for this good tutorial. I have a question. When we have two different time series, 1 and 2. Time series 1 will influence time series 2 and our goal is to predict the future value of time series 2. How can we use LSTM for this case?

  3. Kwan, November 22, 2018 at 8:03 pm

    Thanks Jason for this good tutorial; I have been reading your tutorials for a long time. I have a question: how can an LSTM model be used to forecast multi-site multivariate time series, such as the EMC Data Science Global Hackathon dataset? Thank you very much!

  4. Caiyuan, November 29, 2018 at 1:33 pm

    Thank you for sharing. I found that the time series predictions from the LSTM look similar to the original sequence lagged by one step. What do you think?

    • Jason Brownlee, November 29, 2018 at 2:40 pm

      Sounds like the model has learned a persistence model and may not be skillful.
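One way to check for this is to score a persistence baseline, which forecasts that the next value equals the current one, on the same test data. If the LSTM's error is not clearly below the baseline's, the model has probably learned little more than a lagged copy of its input. A minimal NumPy sketch:

```python
import numpy as np

def persistence_rmse(series):
    """RMSE of forecasting each value as the previous observation."""
    series = np.asarray(series, dtype=float)
    predicted = series[:-1]   # forecast for t+1 is the observation at t
    actual = series[1:]
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

score = persistence_rmse([10, 20, 30, 40, 50, 60, 70, 80, 90])
print(score)  # 10.0: every one-step forecast is off by 10
```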

      • Sudrit Saisa-ing, August 9, 2019 at 8:48 pm

        I have a question.
        If I have a model from an LSTM, I want to know the percent accuracy of a new prediction.

        How can I know the percent accuracy of a new forecast?

        Thank you

  5. WLF, December 5, 2018 at 6:04 pm

    Thanks a lot! I have read your websites for a long time!
    I have a question, in “Time Series Prediction with LSTM Recurrent Neural Networks in Python with Keras” you said that:
    “LSTMs are sensitive to the scale of the input data, specifically when the sigmoid (default) or tanh activation functions are used. It can be a good practice to rescale the data to the range of 0-to-1, also called normalizing. ”
    So why don’t you normalize input here?
    Because you used relu? Because the data is increasing (so we can’t normalize the future input)? Or because you just give us an example?
    Do you suggest normalizing here?

    • Jason Brownlee, December 6, 2018 at 5:51 am

      It would be a good idea to prepare the data with normalization or similar here.

      I chose not to because it seems to confuse more readers than it helps. Also, choice of relu does make the model a lot more robust to unscaled data.
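A minimal sketch of normalizing before the data is split into windows, assuming the first six rows are used for training and the last three for testing (a hypothetical split). The minimum and range come from the training rows only, so nothing from the test period leaks into the scaling. It also illustrates the concern raised in the question: with a trending series, scaled test values can fall outside [0, 1].

```python
import numpy as np

def minmax_fit_transform(train, test):
    """Scale both splits with statistics taken from the training rows only."""
    lo = train.min(axis=0)
    span = train.max(axis=0) - lo
    span[span == 0] = 1.0  # guard against constant columns
    return (train - lo) / span, (test - lo) / span, lo, span

dataset = np.array([[10, 15, 25], [20, 25, 45], [30, 35, 65], [40, 45, 85],
                    [50, 55, 105], [60, 65, 125], [70, 75, 145],
                    [80, 85, 165], [90, 95, 185]], dtype=float)
n_train = 6  # hypothetical split: first 6 rows train, last 3 test
train_s, test_s, lo, span = minmax_fit_transform(dataset[:n_train], dataset[n_train:])
print(train_s.min(), train_s.max())  # training data lies in [0, 1]
print(test_s[:, 0])                  # test values exceed 1 for this trending series
```

Keep lo and span to invert the scaling on the forecasts before computing errors in the original units.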

  6. rkk621, December 6, 2018 at 2:27 am

    Thanks for a great article. Minor typo or confusion:

    For the Multiple input case in Multivariate series, if we use three time steps and

    10,15
    20,25
    30,35

    as our inputs, shouldn’t the output (predicted val used for training) be

    85

    instead of 65?

    • Jason Brownlee, December 6, 2018 at 5:57 am

      In the chosen framing of the problem, we want to predict the output at t not t+1, given inputs up to and including t.

      You can choose to frame the problem differently if you like. It is arbitrary.

    • WLF, December 6, 2018 at 11:02 pm

      You can also refer to ‘Multiple Parallel …’

      There you can see the differences in the function ‘split_sequences’.

      If you want to predict 85, you can change the code to:

      if end_ix > len(sequences)-1:
          break
      seq_x, seq_y = sequences[i:end_ix, :-1], sequences[end_ix, -1]

      Notice ‘len(sequences)-1’, and ‘sequences[end_ix, -1]’
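A runnable version of the modification above, assuming the tutorial's multiple input series dataset; split_sequences_next is a hypothetical name. The inputs are the first two columns and the target is the third column one step after the input window, so the first sample maps [10, 15], [20, 25], [30, 35] to 85.

```python
from numpy import array, hstack

def split_sequences_next(sequences, n_steps):
    X, y = list(), list()
    for i in range(len(sequences)):
        end_ix = i + n_steps
        # the target sits at row end_ix, so stop one row earlier than before
        if end_ix > len(sequences) - 1:
            break
        # inputs: all columns but the last; target: last column at t+1
        seq_x, seq_y = sequences[i:end_ix, :-1], sequences[end_ix, -1]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)

in_seq1 = array([10, 20, 30, 40, 50, 60, 70, 80, 90]).reshape((9, 1))
in_seq2 = array([15, 25, 35, 45, 55, 65, 75, 85, 95]).reshape((9, 1))
dataset = hstack((in_seq1, in_seq2, in_seq1 + in_seq2))
X, y = split_sequences_next(dataset, 3)
print(X.shape, y.shape)
print(X[0].tolist(), y[0])
```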

  7. Ida, December 10, 2018 at 7:55 pm

    Thanks sooooo much Jason.
    It helped me a lot.

  8. John, December 12, 2018 at 9:21 am

    Hi Jason,

    Thanks for this nice blog! I am new to LSTM in time-series, and I need your help.

    Most info on the internet is for a single time series and for next-step forecasting. I want to produce a 6-month-ahead forecast using the previous 15 months for 100 different time series, each of length 54 months.

    So, there are 34 windows for each time series if we use sliding windows, and my initial X_train has a shape of (3400, 15). Then, I am reshaping my X_train to [samples, timesteps, features] as follows: (3400, 15, 1). Is this reshaping correct? In general, how can we choose the “timesteps” and “features” arguments in this multi-input multi-step forecast?

    Also, how can I choose “batch_size” and “units”? Since I want a 6-month-ahead forecast, my output should be a matrix with dimensions (100, 6). I chose units=6 and batch_size=1. Are these numbers correct?

    Thanks for your help!
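The window count in this question checks out: each series of 54 months yields 54 - 15 - 6 + 1 = 34 windows. The sketch below uses random placeholder data in place of the real series and windows each series separately (so no sample spans two series) before stacking them; with one value per month, the features dimension is 1. make_windows is a hypothetical helper name.

```python
import numpy as np

def make_windows(series, n_in, n_out):
    """Cut one series into (n_in inputs, n_out outputs) training windows."""
    n_windows = len(series) - n_in - n_out + 1
    X = np.array([series[i:i + n_in] for i in range(n_windows)])
    y = np.array([series[i + n_in:i + n_in + n_out] for i in range(n_windows)])
    return X, y

rng = np.random.default_rng(1)
all_series = rng.random((100, 54))  # placeholder for 100 series of 54 months
pairs = [make_windows(s, 15, 6) for s in all_series]
X = np.concatenate([X_s for X_s, _ in pairs])  # (3400, 15)
y = np.concatenate([y_s for _, y_s in pairs])  # (3400, 6)
X = X.reshape((X.shape[0], X.shape[1], 1))     # [samples, timesteps, features]
print(X.shape, y.shape)
```

Note that the training targets are (3400, 6); a (100, 6) array would be the final forecasts, one input window per series, and a Dense output layer with 6 nodes would produce the six months.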

    • Jason Brownlee, December 12, 2018 at 2:14 pm

      Looks good.

      Time steps is really problem specific – e.g. how much history do you need to make a prediction. Perhaps test with your data.

      Batch size and units – again, depends on your problem. Test. 6 units is too few. Start with 100, try 500, 1000, etc. Batch size of 1 seems small, perhaps also try 32, 64, etc.

      Let me know how you go.

      • John, December 13, 2018 at 2:06 am

        Hi Jason,

        Thanks for your response.

        I don’t understand “6 units is too few”. In documentation of lstm functions in R, units is defined as “dimensionality of the output space”. Since I need an output with 6 columns (6 months forecast), I define units=6. Any other number does not produce the output I want. Is there anything wrong in my interpretation?

        • Jason Brownlee, December 13, 2018 at 7:55 am

          I recommend using a Dense layer as the output rather than outputting from the LSTM directly.

          Then dramatically increase the capacity of the model by increasing the number of LSTM units.

    • Ravi Varma Injeti, December 18, 2019 at 12:55 am

      Hi Jason, that’s a great tutorial. I have time series data of size 2245 containing bus travel times from the starting station to the destination station. I want to find the pattern; is it possible with an LSTM, without categorical responses?

  9. Shaifali, December 16, 2018 at 1:21 am

    Bidirectional LSTMs work better than LSTMs. Can you please explain how a bidirectional LSTM works? Since we do not know future values, how do we make predictions?

  10. Jenna Ma, December 16, 2018 at 4:24 pm

    In the last encoder-decoder model, if I have different numbers of input and output features, is it correct to change the code like this?
    model = Sequential()
    model.add(LSTM(200, activation='relu', input_shape=(n_steps_in, n_features_in)))
    model.add(RepeatVector(n_steps_out))
    model.add(LSTM(200, activation='relu', return_sequences=True))
    model.add(TimeDistributed(Dense(n_features_out)))
    model.compile(optimizer='adam', loss='mse')

    • Jason Brownlee, December 17, 2018 at 6:19 am

      I’m not sure I understand; what do you mean exactly?

      • Jenna Ma, December 18, 2018 at 1:50 pm

        I am sorry for not expressing my question clearly.
        In the last part of your tutorial, you gave an example like this:
        [[10 15 25]
        [20 25 45]
        [30 35 65]]
        [[ 40 45 85]
        [ 50 55 105]]
        Then, you introduced the Encoder-Decoder LSTM to model this problem.
        If I want to use the last three time steps from each of the three time series as input to the model and predict the next two time steps of the third time series as output. Namely, my input and output elements are like the following. The shapes of input and output are (5, 3, 3) and (5, 2, 1) respectively.
        [[10 15 25]
        [20 25 45]
        [30 35 65]]
        [[85]
        [105]]
        When I define the Encoder-Decoder LSTM model, the code will be like this:
        model = Sequential()
        model.add(LSTM(200, activation='relu', input_shape=(3,3)))
        model.add(RepeatVector(2))
        model.add(LSTM(200, activation='relu', return_sequences=True))
        model.add(TimeDistributed(Dense(1)))
        model.compile(optimizer='adam', loss='mse')
        Is it correct?
        Thank you very much!

        • Jason Brownlee, December 18, 2018 at 2:36 pm

          It looks correct, but I don’t have the capacity to test the code to be sure.

          • Jenna Ma, December 18, 2018 at 6:05 pm

            Thank you!
            I tested the code, and I want to show you what I got.
            I assume the input sequence:
            in_seq1 = np.arange(10,1000,10)
            in_seq2 = np.arange(15,1005,10)
            Define the prediction input:
            x_input = np.array([[960, 965, 1925], [970, 975, 1945], [980, 985, 1965]])
            I expect the output values would be as follows:
            [ [1985] [2005] ]
            And the model forecasts: [ [1997.1425] [2026.6136] ]
            I think this means that the model can work.

          • Jason Brownlee, December 19, 2018 at 6:31 am

            Nice work! Now you can start tuning the model to lift skill.
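For the variant discussed in this thread (all three series as input, only the last series as output), the targets need shape (samples, n_steps_out, 1) to match TimeDistributed(Dense(1)). A minimal sketch of that data preparation, adapted from the tutorial's split_sequences(); split_sequences_last is a hypothetical name.

```python
from numpy import array, hstack

def split_sequences_last(sequences, n_steps_in, n_steps_out):
    X, y = list(), list()
    for i in range(len(sequences)):
        end_ix = i + n_steps_in
        out_end_ix = end_ix + n_steps_out
        if out_end_ix > len(sequences):
            break
        X.append(sequences[i:end_ix, :])
        # slicing with -1: keeps the column axis, so each target is (n_steps_out, 1)
        y.append(sequences[end_ix:out_end_ix, -1:])
    return array(X), array(y)

in_seq1 = array([10, 20, 30, 40, 50, 60, 70, 80, 90]).reshape((9, 1))
in_seq2 = array([15, 25, 35, 45, 55, 65, 75, 85, 95]).reshape((9, 1))
dataset = hstack((in_seq1, in_seq2, in_seq1 + in_seq2))
X, y = split_sequences_last(dataset, 3, 2)
print(X.shape, y.shape)
print(y[0].tolist())
```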

  11. dani, December 19, 2018 at 12:55 pm

    How can we test these examples if we have a big Excel dataset of time series data? Could you kindly refer me to a link?

  12. mk, December 20, 2018 at 7:13 pm

    Can multivariate time series be modeled with a CNN-LSTM model?

  13. Lionel, December 21, 2018 at 6:16 pm

    I want to predict visibility at an airport for the next 120 hours.
    I have already built an LSTM to predict the visibility for the next hour, based solely on visibility observations. (Basically, the network learned that persistence is a good algorithm.)

    My next step is to include a weather model forecast of, say, humidity as input.

    I then have as input:
    visibility observations at the airport (past and present)
    a humidity forecast for the next 120 hours.

    I have trouble combining these two sources of information.
    Do you have suggestions?

    • Jason Brownlee, December 22, 2018 at 6:03 am

      What trouble are you having exactly?

      • Lionel, December 22, 2018 at 7:21 pm

        Let’s say:
        Input: the last 120 h of measured visibility
        the weather forecast for the next 120 h

        Output: visibility prediction for the next 120 h

        Implementation:
        make a visibility prediction every hour for the next 120 h

        I have trouble seeing how the LSTM will update its state every hour, since the only new information it will get is the measured visibility for the last hour, and nothing about the full 120 h prediction.

        I must say that I’m a newbie in ML.

        • Jason Brownlee, December 23, 2018 at 6:04 am

          The model is only aware of the data that you provide it.

  14. Potofski, December 22, 2018 at 3:48 am

    Thanks a lot for your post. Your work is a great resource on forecasting with LSTMs!

    Assume I have dependent time series (heating costs and temperature) and I want to predict the dependent one (heating costs). How could I incorporate temperature predictions (from external weather forecasts) into my model for heating cost predictions?

    Do you know of any common approaches to this? Or any papers on how to handle external forecasts for independent variables?

  15. Jenna Ma, January 4, 2019 at 9:35 pm

    Hi Jason,
    I think I saw you mention that the activation function ‘relu’ usually works better than ‘tanh’ in LSTM models, but I forget which post it was in. I couldn’t find any post on your blog that focuses on how to choose the activation function, so I am submitting this question under this post and hope you don’t mind.
    Is it true that ‘relu’ often works better than ‘tanh’ in your experience? If you have any post about activation functions, please give me the title or URL.
    Thank you very much!

    • Jason Brownlee, January 5, 2019 at 6:55 am

      It really depends on the dataset, I have found LSTMs with relu more robust on some problems.

  16. Jenna Ma, January 7, 2019 at 12:50 am

    Thank you! So, the way I can make sure which activation function is the best for my dataset is to enumerate and see the results?

  17. Matt, January 9, 2019 at 3:21 am

    This is awesome for someone starting out with LSTM.

    All the content on your site is amazing, I really appreciate it. Thank you.

  18. Andrew Jabbitt, January 10, 2019 at 4:23 am

    Hi Jason,

    Still lovin’ your work!

    1 question: can you please explain the purpose of the out_seq series in the Multiple Parallel Series example?

    Many thanks,
    Andrew

    • Jason Brownlee, January 10, 2019 at 7:57 am

      It is the output sequence, dependent upon the input sequences.

      • Andrei, February 20, 2020 at 2:59 am

        Correct me if I’m wrong, but isn’t the prediction the output? I mean, besides the way you obtained the out_seq sequence in the first place, it’s no different than in_seq1 or in_seq2. It could even be considered an engineered feature that expands the data.

        • Jason Brownlee, February 20, 2020 at 6:19 am

          Prediction is the output of the model.

          Perhaps I don’t follow your question?

  19. sophia, January 22, 2019 at 8:29 am

    Another great article, Jason! I’m trying to get started on a project that is similar to the LSTM model described in this article: https://medium.com/bcggamma/using-deep-learning-to-predict-not-just-what-but-when-fae6515acb1b

    I’d greatly appreciate your input on how to develop an LSTM model that can predict ‘what’ a consumer may buy and ‘when’ they will buy it.

    Based on your article, it looks like the right model to choose would be Multiple Parallel Input and Multi-Step Output. Would you agree, or do you think I should choose a different model? Any pointers or links to relevant articles would help!

    Thanks,

    • Jason Brownlee, January 22, 2019 at 11:42 am

      I’d encourage you to prototype and explore a suite of different framings of the problem in order to discover what works best for your specific dataset.

  20. Raman, January 22, 2019 at 3:17 pm

    from numpy import array
    from keras.models import Sequential
    from keras.layers import LSTM

    # split a univariate sequence into samples
    def split_sequence(sequence, n_steps):
        X, y = list(), list()
        for i in range(len(sequence)):
            # find the end of this pattern
            end_ix = i + n_steps
            # check if we are beyond the sequence
            if end_ix > len(sequence)-1:
                break
            # gather input and output parts of the pattern
            seq_x, seq_y = sequence[i:end_ix], sequence[end_ix]
            X.append(seq_x)
            y.append(seq_y)
        return array(X), array(y)

    # define input sequence
    raw_seq = [10, 20, 30, 40, 50, 60, 70, 80, 90]
    # choose a number of time steps
    n_steps = 3
    # split into samples
    X, y = split_sequence(raw_seq, n_steps)
    # reshape from [samples, timesteps] into [samples, timesteps, features]
    n_features = 1
    X = X.reshape((X.shape[0], X.shape[1], n_features))
    # define model
    model = Sequential()
    model.add(LSTM(50, activation='relu', input_shape=(n_steps, n_features)))

    I have used your code to get started; at the last step I am getting the error below:
    NameError: name ‘to_list’ is not defined

    Could you please help? I am not sure what I am missing here.

    Thanks for your help

  21. Raman, January 23, 2019 at 4:09 pm

    Hi Jason,

    Thanks for taking the time. I have copied your code line by line and checked it a couple of times as well. The example is from the Vanilla LSTM.

    Checks done:
    I was getting an error, so I followed Stack Overflow and downgraded my Keras to version 2.1.5.
    I searched Stack Overflow for related questions and even posted my question there.

    Your help is appreciated.

    • Jason Brownlee, January 24, 2019 at 6:38 am

      I recommend using the latest version of Keras and TensorFlow.

  22. Sarra, January 30, 2019 at 3:02 am

    Do you have an example of an LSTM encoder-decoder with train/test evaluation partitions?

    I tried, but it does not work like this:

    # split into samples

    trainX, trainy = split_sequence(train, n_steps_in, n_steps_out)
    testX, testy = split_sequence(test, n_steps_in, n_steps_out)

    # reshape

    trainX = trainX.reshape((trainX.shape[0], trainX.shape[1], n_features))
    testX = testX.reshape((testX.shape[0], testX.shape[1], n_features))
    ….

    # fit model
    model.fit(trainX, trainy, epochs=5, verbose=2)

    # make predictions
    trainPredict = model.predict(trainX)
    testPredict = model.predict(testX)

    # calculate root mean squared error
    trainScore = math.sqrt(mean_squared_error(trainY[0], trainPredict[:,0]))
    print('Train Score: %.2f RMSE' % (trainScore))
    testScore = math.sqrt(mean_squared_error(testY[0], testPredict[:,0]))
    print('Test Score: %.2f RMSE' % (testScore))

    thank you very much

    • Jason Brownlee, January 30, 2019 at 8:14 am

      I may have; you can use the search box to look at all tutorials that use the encoder-decoder pattern.
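Two details in the snippet above would break the evaluation: it splits into trainy/testy but scores against trainY/testY, and it compares trainY[0] (the first sample) with trainPredict[:,0] (the first step of every sample). A minimal sketch of a chronological split and a per-step RMSE follows; the targets and forecasts are placeholders standing in for testy and model.predict(testX), and the helper names are hypothetical.

```python
import numpy as np

def split_train_test(series, n_test):
    """Chronological split: the last n_test values are held out, no shuffling."""
    return series[:-n_test], series[-n_test:]

def rmse_per_step(actual, predicted):
    """RMSE per forecast step; accepts [samples, steps] or [samples, steps, 1]."""
    actual = np.asarray(actual, dtype=float).reshape((len(actual), -1))
    predicted = np.asarray(predicted, dtype=float).reshape((len(predicted), -1))
    return np.sqrt(np.mean((actual - predicted) ** 2, axis=0))

series = np.arange(10, 210, 10)  # 20 values
train, test = split_train_test(series, 6)
# placeholder targets and encoder-decoder style forecasts: 3 windows, 2 steps
actual = np.array([[150, 160], [160, 170], [170, 180]])
predicted = np.array([[151, 163], [159, 166], [171, 183]]).reshape((3, 2, 1))
print(len(train), len(test))
print(rmse_per_step(actual, predicted))
```

Reporting one RMSE per step shows how quickly skill degrades across the forecast horizon; the mean of the per-step values is not identical to a single pooled RMSE, so state which one you report.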

  23. Gunay, February 6, 2019 at 2:43 am

    Hi Jason,

    Thanks for this tutorial. I am quite new to time series forecasting with LSTMs. I have a question about the part “Multiple Parallel Input and Multi-Step Output”. The output data shape is (5, 2, 3), meaning each output instance is not just a sequence; it is a sequence of sequences. You have shown the example there with an Encoder-Decoder. I just want to implement one of the Stacked or Bidirectional LSTM methods, but I am not sure what number I should put in the Dense layer. For example, in the previous examples the output shape is like (6, 2), and it is obvious we should put 2 in the Dense layer. But I cannot figure out the right value for the Stacked LSTM. Do you have any example tutorial for this?

    Kind Regards,
    Gunay

    • Jason Brownlee, February 6, 2019 at 7:51 am

      With multi-step output, the number of nodes in the output layer must match the number of output time steps.

      With multivariate multi-step, a vanilla or bidirectional LSTM is not suited. You could force it, but you will need n x m nodes in the output for n time steps for m time series. The time steps of each series would be flattened in this structure. You must interpret each of the outputs as a specific time step for a specific series consistently during training and prediction.

      I don’t have an example, it is not an ideal approach.
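The flattening described in this reply can be sketched with NumPy alone: a (samples, n_steps, n_series) target is reshaped to (samples, n_steps * n_series) so a plain Dense output layer with n_steps * n_series nodes can be used, and predictions are reshaped back using the same order. The model itself is omitted; the stand-in prediction is an assumption.

```python
import numpy as np

n_samples, n_steps_out, n_series = 5, 2, 3
y = np.arange(n_samples * n_steps_out * n_series).reshape((n_samples, n_steps_out, n_series))

# flatten targets for a Dense(n_steps_out * n_series) output layer
y_flat = y.reshape((n_samples, n_steps_out * n_series))
# C order: columns 0-2 are step 0 of series 0-2, columns 3-5 are step 1 of series 0-2
print(y_flat[0])

# after prediction, undo the flattening with the same order
yhat_flat = y_flat.astype(float)  # stand-in for the model's predictions
yhat = yhat_flat.reshape((-1, n_steps_out, n_series))
print(yhat.shape)
```

Using the same reshape in both directions is what keeps each output node tied to one specific step of one specific series during training and prediction.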

      • Gunay, February 6, 2019 at 7:27 pm

        Thank you!

  24. Gunay, February 6, 2019 at 7:29 pm

    Is there any alternative structure for this kind of problems except Encoder-Decoder?

    • Jason Brownlee, February 7, 2019 at 6:37 am

      Yes, the one I described. There may be others, it is good to brainstorm and prototype approaches.

  25. Tian, February 10, 2019 at 5:29 pm

    Thanks for your great tutorial. I just wonder should we avoid using bidirectional LSTM for time series data? Does it mean we use future data to train the past model parameters?

    • Jason Brownlee, February 11, 2019 at 7:56 am

      No, it means the model will process the input sequence forwards and backwards at the same time.

  26. Gunay, February 15, 2019 at 8:56 am

    Hi Jason,

    I faced a problem and wonder whether you have tackled it before. My forecasting problem is like Multiple Input Multi-Step Output, but a little bit different. Let’s assume my input (feature) and output (the target we want to forecast) datasets have historic data, and I should forecast one week ahead for the target. But I also have a one-week-ahead forecast of the input dataset (produced by another system). I should use both the historic input and the one-week-ahead forecasted input to forecast the one-week-ahead output, but I do not know how to use that forecasted input data during the learning process. Can you give me any hint?
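One possible framing, which is not from the tutorial, pairs each window of history with the inputs known for the forecast horizon: the past n_in rows of [features, target] form one array, and the next n_out rows of features form a second. During training the horizon inputs are the observed values (or archived forecasts, if you have them); at prediction time they would be the external forecasts. A minimal sketch with placeholder data; split_with_future is a hypothetical name.

```python
import numpy as np

def split_with_future(features, target, n_in, n_out):
    """Pair past [features, target] rows with the next n_out rows of features."""
    X_past, X_future, y = [], [], []
    for i in range(len(target) - n_in - n_out + 1):
        end = i + n_in
        # history: features and target for the last n_in steps
        X_past.append(np.column_stack((features[i:end], target[i:end])))
        # horizon inputs: observed when training, externally forecast when predicting
        X_future.append(features[end:end + n_out])
        # values to forecast
        y.append(target[end:end + n_out])
    return np.array(X_past), np.array(X_future), np.array(y)

features = np.arange(40, dtype=float).reshape((20, 2))  # placeholder inputs, 2 features
target = np.arange(20, dtype=float) * 10                 # placeholder target
X_past, X_future, y = split_with_future(features, target, n_in=7, n_out=7)
print(X_past.shape, X_future.shape, y.shape)
```

The two input arrays could feed a two-input model built with the Keras functional API, or X_future could be flattened and concatenated with the encoding of X_past; both are options to prototype rather than a prescribed design.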

  27. Anirban, February 15, 2019 at 4:08 pm

    What if we want to predict something for each of the next 20 days? Here we have to predict sequentially for 20 days. How can we apply an LSTM here?

  28. Aaron, March 6, 2019 at 2:47 pm

    Hi Jason, thanks for all the tutorials. They are really helpful. I am looking to implement an LSTM that returns a sequence, and I read this tutorial – https://machinelearningmastery.com/multi-step-time-series-forecasting-long-short-term-memory-networks-python/

    One thing I am having trouble understanding is how to shape the input data and get a sequence output using TensorFlow / Keras. I am looking to predict the sequence T to T+12 hours using T-1 to T-48 hours, so predicting the next 12 hours from the last 48 hours in 1-hour increments. Each hour of data has a dozen or so features for that time step. From what I have read of yours so far, it seems as if each of the 48 previous time steps should be considered features of time step T when predicting a sequence for the next 12 hours. So basically, from what I gather, I would end up with the input for time step T having 576 columns (48 time steps, each with 12 features). Does that seem right? I am also a bit unsure of which model I should use… is it going to be a multi-step, multi-input network? I am just a bit confused by the jargon as well, and maybe that’s why I’m having trouble figuring out what I need to do.

    Looking at some of your books too, but not sure what might be the right one to help guide me through a problem like this.

    Thanks,
    Aaron

      • Aaron, March 6, 2019 at 11:59 pm

        Thanks! That definitely makes sense now from the input shape standpoint. If I have 20 samples with 48 time steps and 12 features, the input shape would be [20, 48, 12].

        For the output, however, looking through the Keras docs (https://keras.io/layers/recurrent/), I am trying to get a return sequence. Would I be using a 3D tensor (batch_size, timesteps, units) that would look like (20, 12, 1), since I am trying to find 1 value at each of the 12 time steps for each of the 20 samples?

        Thanks again!
        Aaron

        • Jason Brownlee, March 7, 2019 at 6:52 am

          I don’t recommend returning a sequence from the LSTM itself, instead use an encoder-decoder model:
          https://machinelearningmastery.com/start-here/#lstm

          • Aaron, March 7, 2019 at 10:05 am

            Why don’t you recommend returning a sequence from the LSTM? If I was using the below encoder-decoder model from another one of your posts, what would the output of the first LSTM be?

            model = Sequential()
            model.add(LSTM(…, input_shape=(…)))
            model.add(RepeatVector(…))
            model.add(LSTM(…, return_sequences=True))
            model.add(TimeDistributed(Dense(…)))

          • Jason Brownlee, March 7, 2019 at 2:32 pm

            Generally, the output sequence from an LSTM is the activation of the nodes at each step of the input sequence. It is unlikely to capture anything meaningful.

            It is better to interpret these activations, or the final activations, with more LSTM or Dense layers, and then output a sequence of the same or a different length using a separate model.

          • Gideon, May 2, 2019 at 6:10 am

            Hi there,
            I love this tutorial, all of your tutorials actually but this one I have found the most helpful. Questions about the MIMO LSTM output shape has come up a few times, and I am also having trouble with it.

            I am trying to use a Dense layer as my final layer as you suggest, passing it n_steps_out as an argument. I am predicting 3 variables and n_steps_out is 10.

            Keras complains that it is expecting the dense layer to have 2 dimensions, but I am passing it an array with shape (n_samples,n_steps_out,n_features)

            Can you help me make sense of this?

            Thank you

          • Jason Brownlee, May 2, 2019 at 8:09 am

            I would recommend a model with a time distributed wrapper or decoder for multivariate multi-step output, so you can output one vector for each time step.

  29. Abderrahim, March 15, 2019 at 5:20 pm

    Hi Jason,
    I have a question: are LSTMs suitable for predicting on a test set whose inputs are of the same nature as those of the train set? That is, as in other prediction cases, the model works on input signals like those in the train set, plus memory based on the fact that entries are ordered.
    I trained an LSTM on top of a CNN model acting on time-ordered images to predict a time series. In the test set I have the following time-ordered set of images. I guess there is no concept of horizon here; how should I improve my model, and what is the starting point for predicting the test set in this case?

    Many thanks.

    • Jason Brownlee, March 16, 2019 at 7:48 am

      I would recommend modeling the raw time series directly, instead of images of the time series.

  30. Tayson, March 26, 2019 at 12:06 am

    Hello Jason,

    Many thanks for the helpful article.
    I copied the code from “Multiple Parallel Input and Multi-Step Output” and ran it exactly as is, without any changes, but I got different results from yours.

    [ [
    [147.56306 167.8626 312.92883]
    [185.38152 205.36024 385.96536] ] ]

    Is there any reason for that?

    Best regards,
    Tayson

  31. Chris, March 26, 2019 at 1:27 am

    Hi Jason,
    How would you handle building the LSTM model for time series data with irregular time intervals (e.g. Jan 1, Jan 2, Jan 4, Jan 7, Jan 13, Jan 14, etc…)?

    It appears this model presupposes a regular time-interval spacing.

    You could fill the “missing” days with zeros or impute them with, say, the mean of the last 3 values, but I would like to know how to make the LSTM model without filling/imputing the time series data. How would you handle this?

    Thanks, and great lesson.

    • Jason Brownlee, March 26, 2019 at 8:10 am

      Yes, I would try many approaches and compare results, such as:

      – model as is
      – normalize interval with padding
      – upsample/downsample to new intervals
      – etc.
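The "normalize interval" option can be sketched with NumPy: place the irregular observations on a daily grid and fill the gaps by linear interpolation. The dates follow the question, but the year and the values are made up for illustration, and linear interpolation is only one of several possible fill rules.

```python
import numpy as np

dates = np.array(['2019-01-01', '2019-01-02', '2019-01-04',
                  '2019-01-07', '2019-01-13', '2019-01-14'], dtype='datetime64[D]')
values = np.array([1.0, 2.0, 4.0, 7.0, 13.0, 14.0])  # hypothetical observations

# a regular daily grid from the first to the last observed date
grid = np.arange(dates[0], dates[-1] + np.timedelta64(1, 'D'), dtype='datetime64[D]')
# np.interp needs numbers, so measure time in days since the first date
t_obs = (dates - dates[0]).astype(int)
t_grid = (grid - dates[0]).astype(int)
filled = np.interp(t_grid, t_obs, values)
# optional extra input feature: 1 where a real observation exists, 0 where filled
observed = np.isin(t_grid, t_obs).astype(int)
print(len(grid), filled[:5], observed[:5])
```

Feeding the observed flag alongside the filled values lets the model tell real measurements from imputed ones; whether that helps is something to test on your data.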

      • neb, August 21, 2019 at 7:17 am

        Follow-up to this question:
        Holding the number of features constant,

        are the various combinations of models above able to cope when the number of time steps per sample is variable?

        Or do the underlying model assumptions break in some way?

        • Jason Brownlee, August 21, 2019 at 1:57 pm

          Yes, you can either pad all samples to the same length or use a dynamic RNN. Assumptions of the model hold for both cases.
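A minimal NumPy sketch of the padding option: each sample is left-padded to the longest length, so the real values end at the last time step, nearest the forecast. Keras also offers a pad_sequences utility (its module path differs between versions) and a Masking layer that can skip steps equal to the pad value; pick a pad value that cannot occur in the real data. pad_samples is a hypothetical helper name.

```python
import numpy as np

def pad_samples(samples, n_features, pad_value=0.0):
    """Left-pad variable-length samples to [n_samples, max_steps, n_features]."""
    arrays = [np.asarray(s, dtype=float).reshape((-1, n_features)) for s in samples]
    max_len = max(len(a) for a in arrays)
    padded = np.full((len(arrays), max_len, n_features), pad_value)
    for i, a in enumerate(arrays):
        # pre-padding: real values occupy the final time steps
        padded[i, max_len - len(a):, :] = a
    return padded

samples = [[10, 20, 30], [40, 50], [60, 70, 80, 90]]
X = pad_samples(samples, n_features=1)
print(X.shape)
print(X[1].ravel())
```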

  32. Ron, March 27, 2019 at 1:06 am

    If we are forecasting in monthly buckets and using 5 years of data, how do we know how many months of data to have on each row?

    • Jason Brownlee, March 27, 2019 at 9:05 am

      Perhaps perform a sensitivity analysis of the model to see how history impacts model performance.

      There will be a sweet spot for a given dataset.

      • RonMarch 27, 2019 at 1:16 pm#

        Thanks Jason! If the history has distinct patterns for each quarter, should we have 3 months in each row? How would the results differ when we keep 12 months on each row versus 3 months on each row versus 1 month on each row?

        • Jason BrownleeMarch 27, 2019 at 2:07 pm#

          It depends on the dataset. I recommend testing to discover the specific answers with your data and model.

  33. PeterMarch 28, 2019 at 5:57 am#

    Hi Jason,

    I am trying to predict the high and low values of a time series over the next X days. The output layer of my RNN is:

    model.add(Dense(2, activation='linear'))

    so the output vector is [y_high, y_low]. The model works pretty well, but it sometimes outputs y_low > y_high, which of course doesn’t make any sense. Is there a way to force the model to always satisfy the condition y_high >= y_low?

    • Jason BrownleeMarch 28, 2019 at 8:24 am#

      Interesting. Perhaps you could simplify the problem and predict a value in a discrete ordinal interval, e.g. each category is a block of values?

      • PeterMarch 29, 2019 at 3:00 am#

        I tried to modify the loss function, but I am unable to access the individual members of y_pred. I don’t even know whether it is possible at all.

  34. JoeMarch 29, 2019 at 2:43 am#

    Hi Jason, a colleague and I are thinking of trying an LSTM model for time series forecasting. We are faced with over a thousand potential predictors, and would like to select only a smaller number for the final model. In particular, I have recently become fascinated by SHAP values; e.g., see this informal blog post by Scott Lundberg himself, in the context of XGBoost.
    https://towardsdatascience.com/interpretable-machine-learning-with-xgboost-9ec80d148d27

    Tantalizingly, Scott L. demonstrates SHAP values in the context of an LSTM model here:
    https://slundberg.github.io/shap/notebooks/deep_explainer/Keras%20LSTM%20for%20IMDB%20Sentiment%20Classification.html
    But that is using text input (sentiment classification in the IMDB data set), which involves an Embedding layer just before the LSTM layer. For a non-text problem like time series forecasting, we would exclude the Embedding layer. But doing so breaks the code.

    Do you have any suggestions how SHAP values might be used in the context of LSTMs for time series forecasting (not text processing)? If not, do you have any suggestions for feature selection in that context?

    Thanks!

    • Jason BrownleeMarch 29, 2019 at 8:42 am#

      I don’t know what SHAP is, sorry.

    • SamApril 14, 2020 at 6:31 pm#

      Hi Joe. I am running into exactly the same topic. Have you found a way to apply SHAP to multivariate time series forecasting?

  35. HsinMarch 29, 2019 at 5:26 pm#

    Hi Jason,
    Thanks for this useful tutorial.
    I am confused about how to invert the scaling of my data after splitting it into the form:
    x(data_length, n_step, feature)
    because the scaler can only be used on 2D data.

    What I want to do is evaluate the RMSE between the predictions and the true values, so I have to
    inverse transform the data. Could you please tell me how to deal with this problem?
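One common way to apply a 2D-only scaler to 3D data is to flatten the sample and time-step axes together, transform, and reshape back. The sketch below assumes per-feature scaling. `apply_2d` is a hypothetical helper, and the hand-written min-max functions stand in for a fitted scaler's `transform` and `inverse_transform` (e.g. scikit-learn's `MinMaxScaler`), which take the same [rows, features] input.

```python
import numpy as np

def apply_2d(fn, X):
    """Apply a 2D (rows, features) function to a 3D (samples, steps, features) array."""
    n_samples, n_steps, n_features = X.shape
    flat = X.reshape(-1, n_features)  # (samples * steps, features)
    return fn(flat).reshape(n_samples, n_steps, n_features)

# toy data: 4 samples, 3 time steps, 2 features
X = np.arange(24, dtype=float).reshape(4, 3, 2)

# per-feature min-max statistics on the 2D view (stand-in for scaler.fit)
flat = X.reshape(-1, X.shape[2])
lo, hi = flat.min(axis=0), flat.max(axis=0)

def scale(a):
    return (a - lo) / (hi - lo)

def unscale(a):
    return a * (hi - lo) + lo

X_scaled = apply_2d(scale, X)
X_restored = apply_2d(unscale, X_scaled)
print(np.allclose(X, X_restored))  # True
```

Predictions from a single-step model are usually already 2D (samples, features), so the inverse transform can be applied to them directly before computing the RMSE.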

  36. PratikMarch 30, 2019 at 12:12 am#

    Hi Jason,
    Firstly, I must say you have a fabulous chunk of articles on ML/DL. Thanks for helping out the community at large.

    Coming to LSTMs, I have been stuck on one problem for the last few days. Here is how it goes:
    I have 3 columns, namely customer id, basket_index and timestamp. For every customer, each row represents one timestamp. Let’s say there are 3 customers with different numbers of timestamps: the first has 30, the second has 25 and the third has 50, so the total number of rows is 105. In the basket_index column, each row is a list of product keys bought by a customer at a particular timestamp. Here is a snapshot of the dataset:

    CustomerID basket_index timestamp predicted_basket
    111 [1,2,3] 1 [4,5]
    111 [4,5] 2 [9,7]
    111 [9,7] 3 [3,5,6,1]
    .
    .
    222 [6,2,3] 1 [1,0,2,5]
    222 [1,0,2,5] 2 [7,5]
    .
    .
    333
    .
    . and so on..
    Now, since every customer has a different time series,

    1) How to pass everything into one network?
    2) Do I have to build multiple LSTM models (one for each customer) in this case?

    3) Also, I am creating an embedding layer for both customer and product keys (taking mean for every basket). How to specify how many steps back does every time series look in such cases?
    4) How should I specify batch size in this case?

    Your help will be really appreciated. Thanks!

  37. HuipingMarch 30, 2019 at 1:13 am#

    Thanks Jason for the nice post.

    I have one question and hope to get your guidance: for LSTM work, we can’t stop at saying the model is good; what matters most is how to use the outcome of a good model.

    For example, predicting whether or not patients get the flu. I want to predict flu for the next half year (Jun-2019 to Dec-2019), but what I have is historical data (the past 4 years of those people’s flu data, with the model’s target being the half year from 6-1-2018 to 12-31-2018).

    How can I apply the LSTM trained on historical data to predict the future?

    Can I get a list of important features from the historical model with some value (like a weight) and apply this to my future data?

    Or can I get the list of important features from a well-fit LSTM model, i.e. the features that are more important than the others?

    I appreciate your guidance!

  38. Jeyson HernándezApril 3, 2019 at 11:40 am#

    Hi Jason,

    Amazing work! Thanks for sharing your knowledge with us; this tutorial was very helpful.

    I’m new to ML/DL, and I’m trying to predict sales in a company for the next six months using an LSTM. But I have an issue: I’m not sure how to get more than 1 future step from your code using just one x vector as input. I’m using a monthly time step.

    Could you help me to understand a little bit better how to get it?

  39. Md. Abul Kalam AzadApril 4, 2019 at 3:05 pm#

    Dear Sir,

    Thanks for sharing your examples. I have collected traffic information (road properties, weather, datetime, adjacent road speed, target road speed and more) for predicting road speed. Currently, I have prepared my code using the Vanilla LSTM model for one-step as well as multi-step-ahead prediction. Can you suggest which of the models below will be best for road speed prediction with higher accuracy?

    Models are:
    Data Preparation
    Vanilla LSTM
    Stacked LSTM
    Bidirectional LSTM
    CNN LSTM
    ConvLSTM

    I am waiting for your response.

    Thanks,
    Azad

  40. FazanoApril 8, 2019 at 8:58 pm#

    Hi Jason, I’m using a vanilla LSTM for forecasting, and I want to forecast 10 days ahead using this code:

    # Forecast the real future
    import numpy as np

    # number of desired forecasts into the future
    L = 10
    # create empty input and output matrices for future forecasting
    Future_input = np.zeros((L, 3))
    Future = np.zeros((L, 1))

    # add the last 3 forecasts as input for forecasting the next day (tomorrow)
    Future_input[0, :] = [predict[-3], predict[-2], predict[-1]]
    # create 3D input for the LSTM: [samples, time steps, features]
    Future_input = np.reshape(Future_input, (Future_input.shape[0], 1, Future_input.shape[1]))
    # predict tomorrow's value
    Future[0, 0] = model.predict(np.expand_dims(Future_input[0], axis=0))[0, 0]

    # loop to predict the next 9 days' values
    for i in range(0, L - 1):
        # shift the window left and put the latest forecast in the last slot
        Future_input[i+1, 0, :] = np.roll(Future_input[i, 0, :], -1, axis=0)
        Future_input[i+1, 0, 2] = Future[i, 0]
        # predict from the updated window
        Future[i+1, 0] = model.predict(np.expand_dims(Future_input[i+1], axis=0))[0, 0]

    # print the 10-day-ahead values
    print(Future)

    Can it be done like that?
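The recursive idea, feeding each forecast back in as the newest input, can be written as a small self-contained loop. This is a sketch under assumptions: `recursive_forecast` is a hypothetical helper, `predict_fn` stands in for a trained model's `predict` on input shaped [1, n_steps, 1] (the tutorial's univariate layout), and the toy predictor simply adds 10 to the last value so the output is easy to check.

```python
import numpy as np

def recursive_forecast(predict_fn, history, n_steps, n_ahead):
    """Forecast n_ahead values by feeding each prediction back as input."""
    window = [float(v) for v in history[-n_steps:]]  # most recent observations
    forecasts = []
    for _ in range(n_ahead):
        x = np.array(window).reshape(1, n_steps, 1)
        yhat = float(np.asarray(predict_fn(x)).ravel()[0])
        forecasts.append(yhat)
        # slide the window: drop the oldest value, append the new forecast
        window = window[1:] + [yhat]
    return np.array(forecasts)

# toy stand-in for model.predict: next value = last value + 10
def toy_predict(x):
    return x[:, -1, :] + 10.0

forecasts = recursive_forecast(toy_predict, [70, 80, 90], n_steps=3, n_ahead=10)
print(forecasts)
```

With a real model, errors compound because later forecasts are built on earlier ones; a model that outputs several steps directly is an alternative worth comparing.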

  41. SherApril 13, 2019 at 2:41 am#

    Hi, do you have any tips for implementing univariate ConvLSTM for two-dimensional spatial-temporal data? I’m trying to input 10 time steps of 55 x 55 images for single-step time series forecasting.

    The following error appears:
    “ValueError: Error when checking target: expected dense_10 to have 2 dimensions, but got array with shape (10, 55, 55)”

  42. RandyApril 13, 2019 at 6:21 am#

    Dear Sir,
    I have a sequence of 1247 data points and I want to forecast the next 30, so the data would then be 1277 points long.
    I followed this tutorial, but it only gives 1 or 2 forecasts. I also followed this tutorial:

    https://machinelearningmastery.com/how-to-develop-lstm-models-for-multi-step-time-series-forecasting-of-household-power-consumption/

    but I got a little confused. Do you have any advice for me?
    It’s actually stock price data.

  43. LloydApril 19, 2019 at 6:23 am#

    Amazing Tutorial, thank you.

    I have a question, is there a model where the outputs can influence each other?

    I.e. you have multiple sequences, all of which move independently but can influence the others?

    Thank you

    • Jason BrownleeApril 19, 2019 at 3:03 pm#

      Thanks.

      Yes, an encoder-decoder model that outputs a time step for each series in concert might be such an approach.

  44. JulesApril 19, 2019 at 10:07 pm#

    Awesome. Great explanation as always. I have always got rather frustrated and confused over the shape of data going into Keras models, so I relied on your tutorials to make it clear.

    Anyway, using your examples I have been able to demonstrate the use of LSTMs in simple 2-D ballistics prediction calculations. I have used your code to help me here:

    https://github.com/JulesVerny/BallisticsRNNPredictions

    Pygame is required to animate the simulations.

  45. KishoreApril 19, 2019 at 11:48 pm#

    Dear Prof,

    Imagine I have raw text containing only the words ‘N1, N2, N3, …, N1000’ in shuffled order, i.e. 1 million words, each of which is one of these 1000 words.

    I want to set the number of time steps to 5 and predict the next word.
    E.g. an input of [N1, N6, N5, N88, N32] would be followed by ‘N73’.

    Now, assume that I have tokenized all 1000 possible words into numbers.

    This is a scenario with 1000 possible output classes.
    So should I replace model.add(Dense(1)) with model.add(Dense(1000, activation='softmax'))?
    If not, what is the main change I need to make compared to your univariate stacked LSTM code?

    • Jason BrownleeApril 20, 2019 at 7:39 am#

      If the words are shuffled, then there would be no structure for a model to learn.

  46. HKApril 23, 2019 at 8:06 pm#

    Dear Jason!

    I’m trying to use stacked lstm for this problem – Multiple Parallel Input and Multi-Step Output.
    However, I’m not sure what the final Dense layer should look like. Could you give me some hints, please?

    • Jason BrownleeApril 24, 2019 at 7:57 am#

      Perhaps start with the example in the above post and then add an additional LSTM layer?

      • HKApril 29, 2019 at 6:11 am#

        Which example do you mean? I can’t find any example for Multiple parallel input and multi step output LSTM, which uses stacked LSTM layers instead of encoder decoder.

        • Jason BrownleeApril 29, 2019 at 8:28 am#

          Yes, under the section “Multivariate Multi-Step LSTM Models”

          Specifically the subsection “Multiple Parallel Input and Multi-Step Output”

          The examples can be adapted to use any models you wish.

  47. Raman SinghApril 25, 2019 at 8:27 am#

    Thanks Jason for detailed explanation.

    Could you please tell me how we can add hyperparameters for tuning the “Forget Gate”, “Input Gate” and “Output Gate” in the LSTM compile or fit methods, or is it done internally so that we can’t control these gates?

  48. willApril 25, 2019 at 1:36 pm#

    How can I predict from a longer input such as x_input = array([[70,75,145], [80,85,165], [90,95,185], …, [200,205,405]]) and get the expected next output [210,215,425]?
    The article uses the input x_input = array([[70,75,145], [80,85,165], [90,95,185]]) and predicts [[101.76599 108.730484 206.63577 ]]. But then I don’t understand why you need to enter sequences such as in_seq1 = array([10, 20, 30, 40, 50, 60, 70, 80, 90]) and in_seq2 = array([15, 25, 35, 45, 55, 65, 75, 85, 95]).
    Thanks

    • Jason BrownleeApril 25, 2019 at 2:45 pm#

      I believe there are a few multi-step models listed above that will provide a good starting point.

  49. JohnApril 25, 2019 at 4:33 pm#

    Hi Jason,

    Thanks for the article.
    I was working with your code and planning to use it in my work, but I have noticed some unexpected behavior. If I compile and run the code several times, it gives a different result each time, although I didn’t change anything in your code. I have tried with your example data and run it several times, and each time I got different results. I tried with my own dataset and saw the same thing.

    Now I am unsure about using an LSTM in my work.
    Could you please explain this behavior?

  50. ShivaApril 26, 2019 at 10:59 pm#

    Hi jason,

    Say we have 3 variates (X) and 1 dependent variable (Y).
    Two of the variates in X relate to Y over about 3 lags, and the third over 30 lags.

    What is your advice for modeling such a case?

  51. RaghuApril 28, 2019 at 3:45 pm#

    Hi Jason,

    Thanks for the very informative tutorial. Can you please shed more light on how to come up with confidence intervals for the predicted values?

  52. parsaApril 28, 2019 at 10:42 pm#

    Hi Jason
    Thanks for your helpful tutorial
    Could you please tell me how we can predict future values for which we don’t have data available?
    For example, once I have finalized my LSTM model, how can I predict the values in 2050?

  53. willApril 28, 2019 at 11:41 pm#

    Thanks for the article. However, I have a problem: every prediction result is different. For example, with Multiple Parallel Series, the first time I got [[101.25582 106.49429 207.8928 ]], and the second time it became [[101.82945 107.527626 209.8016 ]]. Why is this?
    Thanks

  54. shivaApril 29, 2019 at 4:39 am#

    I want to restate my question.
    Suppose we are trying to model a water bucket that has 1 open inlet at the top and 2 outlets at the side, one near the top and one near the bottom.
    This means that the outlet near the top can only release when the water level is high.
    The outlet near the bottom has a release that is an exponential function of the water above it.

    Now say such systems are arranged in parallel (one above another, say 2) and in series (say 2, with the final outlet from each parallel series joining at the final output), for a total of 4 buckets.

    Can this be modeled by an LSTM?
    I have done this analytically and the results are OK.
    Now I am trying to use an LSTM for this.

  55. AliApril 29, 2019 at 7:53 pm#

    Hello Jason,

    Regarding this step:

    # define input sequence
    in_seq1 = array([10, 20, 30, 40, 50, 60, 70, 80, 90])
    in_seq2 = array([15, 25, 35, 45, 55, 65, 75, 85, 95])

    I want to ask how I can load a full column from a dataset.
    I don’t want to type in each value because I have more than 22 million rows. After that, I want to split it into sequences of 200-400 time steps.

    Regarding this step:

    out_seq = array([in_seq1[i]+in_seq2[i] for i in range(len(in_seq1))])

    I don’t have an exact mathematical equation. I want to predict the output without any knowledge of the relationship between the input signals.

    I hope you can help me.

    Kind regards

    Ali

  56. SreeApril 29, 2019 at 8:59 pm#

    Hi Jason,

    Thanks for these explanations and sample codes!

    I was interested in the example you provided for the multivariate version of the LSTM. You have provided an example of a simple addition case. How can this be extended to cases where there are multiple inputs, but the exact relation between the inputs is not known, even though it is known that they are correlated? Thanks very much for your guidance!

    • Jason BrownleeApril 30, 2019 at 6:54 am#

      The model will learn the relationship, addition was just for demonstration.

  57. SreeMay 1, 2019 at 9:34 am#

    Thanks Jason! That’s perfect.

    In that case, what should the statement “out_seq = array([in_seq1[i]+in_seq2[i] for i in range(len(in_seq1))])” be replaced by, since we don’t know the exact relation between the variables? Thanks again!

  58. SreeMay 2, 2019 at 8:17 pm#

    Thanks Jason. I shall read the content on that link.

    Cheers,
    Sree.

  59. GideonMay 3, 2019 at 5:37 am#

    Hello, thank you again. I think my previous question could be made clearer.
    I would like to use the vector output approach for a MIMO LSTM, making multi-step predictions into the future, similar to your encoder-decoder example.

    I have tried using the split_sequences method from the encoder-decoder example with the vector output example, and the dimensions don’t work out. I end up with a ValueError:

    ValueError: Error when checking target: expected dense_2 to have 2 dimensions, but got array with shape (5, 2, 3)

    I greatly appreciate your help, I have been struggling with this for a while. I would imagine the output should be a matrix (number of features X prediction horizon) so I think there is something conceptually I am not understanding.

    Thank you, and thank you for all of the wonderful tutorials

    Gideon

    • Jason BrownleeMay 3, 2019 at 6:25 am#

      Perhaps start with the code example you want to use and slowly change it for your needs.

      If the data shape does not match the model’s expectations, you will need to change the data shape or change the model’s expectations.

      • GideonMay 3, 2019 at 7:30 am#

        I will toil away some more, but I just want to be sure it is possible to use a dense layer/vector output approach for Multiple Parallel Input and Multi-Step Output LSTM in Keras.
        Thanks again for your time.

        Gideon

        • Jason BrownleeMay 3, 2019 at 2:40 pm#

          It is possible to use a Dense layer for multi-step multivariate output without a decoder or TimeDistributed wrapper layer; it is just ugly.

          E.g. the output would be a vector with n x m nodes, where n is number of variates and m is the number of steps.
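The n x m vector output can be sketched at the data level: targets shaped [samples, n_steps_out, n_features] are flattened to match a `Dense(n_steps_out * n_features)` output layer, and predictions are reshaped back afterwards. The shapes mirror the (5, 2, 3) array in the error message above; the array contents are made up for illustration.

```python
import numpy as np

n_samples, n_steps_out, n_features = 5, 2, 3

# hypothetical targets: 5 samples, 2 future steps, 3 parallel series
y = np.arange(n_samples * n_steps_out * n_features, dtype=float)
y = y.reshape(n_samples, n_steps_out, n_features)

# flatten each target into one vector of n_steps_out * n_features values,
# the 2D shape a final Dense(n_steps_out * n_features) layer expects
y_flat = y.reshape(n_samples, n_steps_out * n_features)
print(y_flat.shape)  # (5, 6)

# after model.predict, restore the [steps, series] layout of each forecast
y_back = y_flat.reshape(n_samples, n_steps_out, n_features)
print(np.array_equal(y, y_back))  # True
```

The flattened order is row-major (all series for step 1, then all series for step 2), so the same reshape must be used for training targets and for predictions. A Keras `Reshape((n_steps_out, n_features))` layer after the Dense layer does the equivalent inside the model.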

          • Gideon PriorMay 4, 2019 at 8:20 am#

            I’ve figured it out, and it’s not too ugly and is exactly what I needed. I was unaware of the Reshape layer in Keras.

            from keras.layers import Reshape

            model.add(Dense(n_steps_out*n_features))
            model.add(Reshape((n_steps_out,n_features)))

            Thank you again for your help. I am buying your book right now.

            Cheers

            Gideon

          • Jason BrownleeMay 5, 2019 at 6:18 am#

            Nice work.

        • GeorgeJuly 17, 2019 at 11:48 pm#

          Hi Gideon,

          I was struggling around something similar and applying your solution, solved all the matters! Do you have any more documentation on this?

  60. AlbertoMay 3, 2019 at 11:07 pm#

    Hello Jason,

    Great article, very useful. I want to use an LSTM to predict sun irradiance 12 hours ahead, using 8 features (including sun irradiance) from the last 24 hours as inputs. Thus, it would be a multivariate multi-step LSTM where the output is a sequence of 12 time steps. I have 8 years of data, and I want to use the first 6 for training and the last 2 for testing. I have some questions:

    1) Should I overlap the input sequences?

    2) Should I use a vector output model or an encoder-decoder model?

    • Jason BrownleeMay 4, 2019 at 7:08 am#

      I recommend testing both approaches and using the data to make the decision, e.g. choose the model that gives the best result.

  61. aravindMay 5, 2019 at 3:54 am#

    Hi Jason,
    The article was very helpful.
    Can you tell me which approach I should take if I have two columns in my dataset?
    One is time in ddmmyyyy format and the other is the stock price.
    I have data for the last 12 months.
    I want to predict the stock price for the next 4 months.
    How can I do that?
    One more doubt: if the time column does not have the same interval between entries, is there anything more I should do or consider to predict the stock price for the next 4 months?

  62. shivaMay 7, 2019 at 7:33 am#

    In Multiple Input Series,
    (7, 3, 2) (7,)

    [[10 15]
    [20 25]
    [30 35]] 65
    [[20 25]
    [30 35]
    [40 45]] 85
    [[30 35]
    [40 45]
    [50 55]] 105
    [[40 45]
    [50 55]
    [60 65]] 125
    [[50 55]
    [60 65]
    [70 75]] 145
    [[60 65]
    [70 75]
    [80 85]] 165
    [[70 75]
    [80 85]
    [90 95]] 185

    1. How many LSTM blocks will there be in this example (7 samples in X)?
    If the batch size is 3, is the number of LSTM blocks equal to the number of samples in a batch,
    or to the number of time steps?

    2. Are time steps, neurons and batch size all hyperparameters? How do we optimize them?
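The printed shapes above, (7, 3, 2) for X and (7,) for y, come from sliding a 3-step window over the two input columns and taking the output column at the window's last row. A minimal NumPy sketch, assuming the columns are stacked with the output series last; this `split_sequences` is written in the spirit of the tutorial's function and may differ from it in detail.

```python
import numpy as np

def split_sequences(sequences, n_steps):
    """Split a [rows, inputs + 1] array into input windows and output values."""
    X, y = [], []
    for i in range(len(sequences) - n_steps + 1):
        end = i + n_steps
        X.append(sequences[i:end, :-1])   # n_steps rows of the input columns
        y.append(sequences[end - 1, -1])  # output column at the window's last row
    return np.array(X), np.array(y)

in_seq1 = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90])
in_seq2 = np.array([15, 25, 35, 45, 55, 65, 75, 85, 95])
out_seq = in_seq1 + in_seq2
dataset = np.column_stack((in_seq1, in_seq2, out_seq))

X, y = split_sequences(dataset, n_steps=3)
print(X.shape, y.shape)  # (7, 3, 2) (7,)
print(X[0], y[0])
```

With real data, `out_seq` is simply the observed target column; the addition is only there to create a toy target, and the model learns whatever relationship the data contains. The number of samples (7) is a property of the data, not of the network: it does not determine how many LSTM units a layer has.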

  63. shivaMay 8, 2019 at 4:56 am#

    Thanks.
    Then what is the total number of LSTM blocks?
    For every epoch, are the weights reinitialized and the states reset?

    • Jason BrownleeMay 8, 2019 at 6:46 am#

      The number of LSTM units is specified in each hidden LSTM layer.

      LSTM states are reset at the end of every batch.

  64. shivaMay 8, 2019 at 8:15 am#

    Sorry, but I don’t get this.

    In model.add(LSTM(50, activation='relu', input_shape=(n_steps, n_features)))
    input_shape here is the shape of the input to each LSTM node, right?

    And here 50 means h (the hidden state) is a vector of 50 x 1, right?

    My question is: is the number of individual LSTM nodes (blocks) equal to the number of samples in a batch?

    • Jason BrownleeMay 8, 2019 at 2:08 pm#

      Yes, the shape defines the shape of each input sample (time steps and features).

      Yes, 50 refers to units in the first hidden layer.

      The number of units and sample shape are both unrelated to the batch size. Unless you are working with a stateful LSTM, in which case the input shape must also specify the batch size.

      Does that help?

    • shivaMay 8, 2019 at 11:08 pm#

      Yeah, one follow-up question:
      [10 15]
      [20 25]
      [30 35]] 65
      Here, is it many-to-one?

      Is this fed in as x_t (a single input)?
      In this case, what is the size of the weights?

      • Jason BrownleeMay 9, 2019 at 6:43 am#

        Yes, multivariate multistep input to one output.

        • shivaMay 9, 2019 at 11:05 am#

          How does this input get concatenated with the hidden layer? I cannot visualize this.
          I was thinking the input was a vector [n x 1].

          • Jason BrownleeMay 9, 2019 at 2:05 pm#

            Each node in the hidden layer gets the complete input sequence.

          • shivaMay 12, 2019 at 9:33 am#

            Thank you so much.

            [10 15]
            [20 25]
            [30 35]] 65

            So in this case, what is the size of x_t and of the weight matrix?

          • Jason BrownleeMay 13, 2019 at 6:42 am#

            You can calculate it based on the number of nodes in your network.
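As a worked example of that calculation, assuming a standard LSTM layer: each of its four gates has input weights (features x units), recurrent weights (units x units) and a bias (units). For the sample [[10 15] [20 25] [30 35]], x_t is one row of the window, so it has 2 values. The layer's weights are shared across all time steps, so the window length does not change the count. The sketch uses the stacked layout of Keras' `kernel`, `recurrent_kernel` and `bias` weights; `lstm_param_counts` is a hypothetical helper.

```python
def lstm_param_counts(n_features, n_units):
    """Weight shapes and total parameter count of one standard LSTM layer."""
    W = (n_features, 4 * n_units)  # input weights: x_t -> all 4 gates
    U = (n_units, 4 * n_units)     # recurrent weights: h_{t-1} -> all 4 gates
    b = (4 * n_units,)             # one bias per gate per unit
    total = W[0] * W[1] + U[0] * U[1] + b[0]
    return W, U, b, total

# 2 features per time step (the two input series), 50 LSTM units
W, U, b, total = lstm_param_counts(n_features=2, n_units=50)
print(W, U, b, total)
```

This gives 4 * (50 * (50 + 2) + 50) parameters, the same total `model.summary()` reports for `LSTM(50, input_shape=(3, 2))`, and the total stays the same for any number of time steps.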

          • shivaMay 15, 2019 at 10:32 pm#

            Thank you, Jason. You have been so kind and helpful.

  65. shivaMay 8, 2019 at 11:07 am#

    “The number of cells is equal to the number of fixed time steps.”
    The blog says so. I am very confused about the number of cells and what controls it.

    https://stackoverflow.com/questions/37901047/what-is-num-units-in-tensorflow-basiclstmcell#39440218

    Sorry for trouble

  66. PhilippMay 13, 2019 at 2:26 am#

    Dear Jason,

    Thank you for writing all these awesome tutorials!

    My question:
    As I understand it, an LSTM network learns the information in a time series by backpropagation through a specific length (in time) over which the LSTM cells are unrolled during training.
    So, during training it is necessary to define the number of time steps provided in the training data. But shouldn’t it be possible to use the (trained) network with ANY number of input time steps to make a prediction (because of the recurrent way in which LSTM cells work)?
    Am I getting something wrong here from the beginning?

    Thank you for any hints on this.
    Philipp

  67. sumitraMay 13, 2019 at 3:41 pm#

    Dear Jason,

    I am currently working on a disease outbreak prediction model. I have 4 years of data with over 100 input variables, and each year has 365 data points. I would like to create an LSTM model that can predict a future outbreak (outbreak = 1, no outbreak = 0) based on the given input variables. For example, given 7 days of data points, I would like to predict the occurrence of an outbreak (0 or 1) on the 8th day.

    However, I am not sure which LSTM model will best fit my case. Will “multiple input multi-step output” be the best approach? Your guidance will be much appreciated.

    Thank you

  68. NitinMay 26, 2019 at 1:54 pm#

    Hi Jason,

    Can you please provide some pointers to help us minimize the step loss during model fitting?

    Thanks

  69. ICHaLiLMay 29, 2019 at 12:26 am#

    Dear Jason,

    Thank you for your tutorials. They are really useful for us.

    I have one question about LSTMs. I have more than one time series (for example, 100). I need to train the network on 100 different time series and test it on 10 other time series. Which method should I use?

    Thanks for your helps.

  70. QuantCubMay 30, 2019 at 2:00 pm#

    Hi Jason,

    Thank you for sharing. I wonder if there is a way to use more than 1 time step without doing the subsequence sampling you did in data preparation, e.g. converting a 9-by-1 time series into a 6-by-3 dataset. After the conversion, the 3-feature dataset is no longer time dependent, and you could use any kind of ML model (say OLS) to predict y. So why an LSTM? Shouldn’t an LSTM be able to select (forget) previous information without this conversion?

    • Jason BrownleeMay 30, 2019 at 2:55 pm#

      LSTM does have the benefit that it can remember across samples.

      This may or may not be useful, and is often not useful for simple autoregressions.
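The 9-by-1 to 6-by-3 conversion described in the question is a sliding window. A brief NumPy sketch; the name `split_sequence` echoes the tutorial, but this version is written for illustration.

```python
import numpy as np

def split_sequence(sequence, n_steps):
    """Turn a 1D series into (n_steps inputs, next value) pairs."""
    X, y = [], []
    for i in range(len(sequence) - n_steps):
        X.append(sequence[i:i + n_steps])
        y.append(sequence[i + n_steps])
    return np.array(X), np.array(y)

series = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90])
X, y = split_sequence(series, n_steps=3)
print(X.shape, y.shape)  # (6, 3) (6,)

# an LSTM reads the same windows as [samples, time steps, features]
X_lstm = X.reshape((X.shape[0], X.shape[1], 1))
print(X_lstm.shape)  # (6, 3, 1)
```

The data is identical in both cases. A model such as OLS gives each of the 3 columns its own coefficient, while the LSTM processes them as 3 ordered time steps with shared weights. Whether that helps on a simple autoregression is something to test.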

  71. NeelJune 11, 2019 at 9:08 pm#

    For a classification LSTM, using a Seed I get the same classification matrix each time I run it. However, when I vary the batch size in model.predict, I get the following:

    Prediction Batch Sizes:

    32 = Different Classification Matrix on each repeat

    Batch size in prediction is merely for RAM management, correct? If so, what do you think, Dr. Jason, would cause these irregularities?

      • NeelJune 12, 2019 at 4:06 pm#

        Hi Jason,

        Sorry I didn’t explain my concern well. I was referring to the Batch Size parameter that we mention in “model.predict i.e. predicting” and not while training. I agree that batch size during training will have an impact. During prediction, the default size is 32 as defined by keras but when I change that to anything but 32 I get a different classification matrix even though I use a seed. When I leave the batch size as default, my seed is able to produce the same results.

        • Jason BrownleeJune 13, 2019 at 6:11 am#

          Recall that with the LSTM, the state is reset at the end of each batch. This explains why you are getting different results for the same model with different inference batch sizes.

  72. DiegoJune 24, 2019 at 5:23 am#

    Hi Jason,

    Thanks for the tutorial.
    I’d like to apply this example to a real case.

    I have to forecast how much money will be withdrawn every day from a group of ATMs.
    Currently I am using a time series for every ATM. (100 ATMs = 100 time series).

    Which method from this tutorial do you think would work best?
    I need to use historical information and external information such as holidays, day of week, etc.
    Thanks in advance.

  73. Liang ZhaoJune 25, 2019 at 5:58 am#

    Hi Jason, I want to use some kind of machine learning method to demonstrate that there is a relationship between the score gap of two basketball teams and the demand for a taxi outside the stadium.

    I have time series of pick-ups near a stadium. I have the score gap time series between two basketball teams.

    What I want to achieve is to train a machine learning model that can tell me, based on the taxi pick-ups at time t, what the taxi pick-ups at time t+1 will be.
    I also want to see whether also having the score gap at time t can improve my prediction accuracy for pick-ups at time t+1.

    Which machine learning model should I use?

    Thank you so much!

  74. JamesJuly 1, 2019 at 1:11 am#

    Hi Jason,

    Thanks for the tutorial.

    Suppose I have several time series showing cumulative bookings for different trains last year. I don’t want to forecast, just classify those time series to see whether some of them have similar patterns. Can I include all those series in one LSTM model? Are there any risks in doing so?

    Thanks in advance.

    • Jason BrownleeJuly 1, 2019 at 6:35 am#

      Sure, it means you are learning/modeling across books. Sounds reasonable.

      • JamesJuly 1, 2019 at 11:44 pm#

        Thanks Jason!

        So is it the same as a multivariate LSTM? Sorry, I’m new to modelling, so I still find things confusing.

        • Jason BrownleeJuly 2, 2019 at 7:32 am#

          Probably not, each example is a separate sample or input-output pair for the model to learn from.

  75. IriniJuly 1, 2019 at 9:35 am#

    Hi Jason,

    thanks for the nice tutorial!

    I have a dataset with 3000 univariate time series (i.e. 3000 samples), and each sample has 4000 time steps. When I use [samples, time steps, features] = [3000, 4000, 1], the code is extremely slow and performs badly.
    On the other hand, if I use [3000, 1, 4000] instead of [3000, 4000, 1], the code is very fast and performs very well.
    But is the reshape [3000, 1, 4000] correct? According to the rule [samples, time steps, features], and given that each of my samples has 4000 time steps with one feature per time step, the correct shape should be [3000, 4000, 1].

    So is [3000, 1, 4000] correct? And if it is not (logically it is not), why does it work much better than [3000, 4000, 1]?

    Thanks in advance

    • Jason BrownleeJuly 1, 2019 at 11:35 am#

      I would recommend not using more than 200 to 400 time steps per sample. Perhaps you can truncate your data?
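Truncating can be as simple as keeping the most recent time steps of each sample. The sketch assumes a [3000, 4000, 1] array (random placeholder data) and a 400-step cutoff. It also shows one alternative, an assumption not from the reply above: cutting each series into ten consecutive 400-step chunks and repeating each series' label for its chunks. This keeps all the data but multiplies the number of samples.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_steps, n_keep = 3000, 4000, 400

# placeholder data in [samples, time steps, features] layout, plus class labels
X = rng.standard_normal((n_samples, n_steps, 1), dtype=np.float32)
y = rng.integers(0, 2, size=n_samples)

# option 1: keep only the last n_keep time steps of every sample
X_trunc = X[:, -n_keep:, :]
print(X_trunc.shape)  # (3000, 400, 1)

# option 2: split each series into consecutive n_keep-step chunks
n_chunks = n_steps // n_keep
X_chunks = X.reshape(n_samples * n_chunks, n_keep, 1)
y_chunks = np.repeat(y, n_chunks)  # each chunk inherits its series' label
print(X_chunks.shape, y_chunks.shape)  # (30000, 400, 1) (30000,)
```

The reshape in option 2 works because NumPy's row-major order places each series' chunks next to each other, which matches the element-wise repetition done by `np.repeat`.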

      • IriniJuly 1, 2019 at 8:15 pm#

        I also ran an experiment where I truncated my data and used [samples, time steps, features] = [3000, 400, 1] as input. It was quicker, but I got a mean accuracy of 42% (over 10 random splits).
        As I said in my previous post, when I swap time steps and features, i.e. when I use [3000, 1, 4000], I get an accuracy of 90%.
        But doesn’t using 1 time step mean that I don’t exploit the memory, which is the defining characteristic of an LSTM?

        I am confused about whether I should use [3000, 1, 4000], which is very quick and gives very good results but may not be correct, or whether it is just as correct as using [3000, 400, 1] (if I truncated my data to 400).

        • Jason BrownleeJuly 2, 2019 at 7:30 am#

          The state of the LSTM is reset at the end of each batch by default, so you can get some across-sample memory.

          I recommend testing a suite of different configurations to see what works well or best for your specific dataset. I cannot know what will work well, you must discover the answer.

  76. ManishJuly 2, 2019 at 2:39 am#

    Hello Jason,

    I am quite new to ML and LSTMs. I have a scenario where I intend to train a model on my hourly sensor values. For example:

    12-1-2019 12:00:00 12
    12-1-2019 13:00:00 16

    12-5-2019 12:00:00 14

    Once training is done, I intend to predict values every hour and compare them with live sensor values. I am planning to use an LSTM; which approach do you recommend?

  77. HarishJuly 2, 2019 at 7:18 pm#

    Jason, this is very useful. I’m trying to make some predictions about IT incidents. Based on historical data, I want to predict what type of incident I can expect next month/week/day. Have you done anything similar? If so, please share it.

  78. Lopa, July 3, 2019 at 12:35 am

    Hi Jason,

    Thanks for answering my question in your other tutorials. I have a minor doubt: suppose my data has a continuous time series (non-stationary) and other categorical variables (which are already encoded). In that circumstance, what is the best way to difference the data? Categorical data are not differenced, but they have to be used while training the model.

    The function written above differences all the variables irrespective of whether they are continuous or categorical. It would be great if you can help.

    • Jason Brownlee, July 3, 2019 at 8:36 am

      Difference the real-values only, and only if they are non-stationary.

  79. myro, July 3, 2019 at 5:16 pm

    Hi Jason,
    I copied and pasted your first example from Multi-Step LSTM Models, the one with the vector output of two values and the input being one.

    You report as an output the values:

    input [[70 80 90]]
    output [[100.98096 113.28924]]

    but with those parameters I cannot get any closer than

    input [[70 80 90]]
    output [[122.678955 139.9465 ]]

    Did you use the parameters you report? Is the result really this dependent on the architecture?

    • Jason Brownlee, July 4, 2019 at 7:40 am

      Results depend on the model, the model configuration and the data; performance is also stochastic, subject to random variance.

      • myro, July 19, 2019 at 7:49 pm

        Hi, thanks for your reply.
        I understand that; that's why I am asking.
        I have the same model, the same model configuration and the same data, and the stochasticity should be symmetrically distributed (?). So I assume that the results you report were not produced with the parameters in the code examples.

  80. Lopa, July 3, 2019 at 7:10 pm

    My data is non-stationary and has a seasonality of 7 days (as evident from the ADF tests and ETS plots), and first-order differencing makes it stationary.

    I totally get that I have to difference only the real values, and that is what I have been aiming to do. I asked this question because the moment I difference the real values they get shifted by one place. If the original data has 100 observations, the differenced data will have 99 observations (with first-order differencing). The categorical data, which cannot be differenced, remain at 100 observations. How do I deal with this?

    • Jason Brownlee, July 4, 2019 at 7:44 am

      You discard the first observation, so that each difference value corresponds to the categorical value at the same time step.
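Here is a minimal NumPy sketch of that advice. It assumes a hypothetical 2D array `data` whose first column is the non-stationary real-valued series and whose second column is an already-encoded categorical variable.

```python
import numpy as np

# Hypothetical data: column 0 is a real-valued series, column 1 an encoded category
data = np.array([
    [10.0, 0],
    [12.0, 1],
    [15.0, 1],
    [19.0, 0],
    [24.0, 2],
])

real = data[:, 0]
categorical = data[:, 1]

# First-order difference of the real values only: diff[t] = real[t] - real[t-1]
diff = real[1:] - real[:-1]          # one fewer observation than the original

# Discard the first categorical value so each difference lines up with
# the categorical value observed at the same time step
aligned = np.column_stack([diff, categorical[1:]])
print(aligned)
```

To invert the transform later, keep `real[0]`. Adding the cumulative sum of the differences to it recovers `real[1:]`.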

  81. Lopa, July 3, 2019 at 8:06 pm

    I think I have been able to solve the issue. Thanks, Jason, for addressing my query.

  82. Leon, July 5, 2019 at 5:58 am

    In the Vector Output Model section,
    I copied your code and tried it; the actual output is not the expected [100, 110], it is actually [110, 120].

    • Jason Brownlee, July 5, 2019 at 8:11 am

      Perhaps try running the example a few times? It can vary given the stochastic nature of the learning algorithm.

      • Leon, July 11, 2019 at 1:30 am

        I never get anywhere near [100, 110]. I ran it many times, and the output is always around [110, 120] with some variation.

        No kidding 🙂 you can try that part of the code. The output looks ridiculous.

  83. Matthew, July 5, 2019 at 5:56 pm

    Hi Jason, I am doing an electrical demand forecast and am trying to build a model that predicts the demand for the following 24 hours given the last 90 hours. I have implemented two approaches. The first is a 24-step prediction. The second is a recursive prediction: it predicts the next hour, then uses the previous 89 true values plus the new predicted value to predict the following hour, and so on. I am wondering which method you believe to be best (if either). I would also welcome any tips for improving my model, as the forecast accuracy can vary massively depending on the time of year. I currently have an LSTM(50) connected to a Dense(20) connected to an output Dense(1) in both cases.
    Any help would be greatly appreciated. Thank you. Matthew

    • Jason Brownlee, July 6, 2019 at 8:29 am

      Well done, very cool!

      I recommend testing each method and using the one with the lowest error.

      Also, get creative and test a suite of other configurations. Ensure your test harness is robust and reliable so that you can trust the decisions you make.
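This is a sketch of the recursive strategy Matthew describes. `one_step_model` is a hypothetical stand-in for a fitted one-step model. It is a naive linear extrapolation, not an LSTM, so the loop runs without Keras.

```python
import numpy as np

def one_step_model(window):
    # Hypothetical stand-in for model.predict on a single window:
    # extrapolate the last observed change (NOT an LSTM).
    return window[-1] + (window[-1] - window[-2])

def recursive_forecast(history, n_in, n_out):
    """Predict n_out steps by feeding each prediction back in as input."""
    window = [float(v) for v in history[-n_in:]]
    predictions = []
    for _ in range(n_out):
        yhat = float(one_step_model(np.array(window)))
        predictions.append(yhat)
        # slide the window: drop the oldest value, append the prediction
        window = window[1:] + [yhat]
    return predictions

history = [10, 20, 30, 40, 50, 60, 70, 80, 90]
print(recursive_forecast(history, n_in=3, n_out=3))
```

The direct strategy instead outputs all 24 steps in one call (for example, with a `Dense(24)` output layer), so predictions are never fed back as inputs. Which one works better has to be measured on your own data.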

  84. skyrim4ever, July 8, 2019 at 8:13 pm

    Hello, this example was nice to follow and seemed a little simpler than other LSTM examples because there are no pre-processing transformations (normalization, standardization, making the data stationary, etc.). However, should I perform these pre-processing transformations in general for time series prediction? Should I do so for examples like this too, even though the dataset is simple?

    • Jason Brownlee, July 9, 2019 at 8:09 am

      Yes, test to see if the data preparation improves model performance.

      I keep it out of examples for brevity.

  85. Nutakki, July 11, 2019 at 8:59 pm

    In the Multiple Parallel Series section,
    I have defined the input like this:
    # define input sequence
    in_seq1 = array([10, 20, 30, 40, 50, 60, 70, 80, 90])
    in_seq2 = array([15, 25, 35, 45, 55, 65, 75, 85, 95])
    x_input = array([[70,75,1,4], [80,85,165,5], [90,95,185,6]])
    n_steps=4
    n_features=X.shape[2]

    How is the input looped over to obtain the following output: [[ 72.74373 106.51455 251.78499]]?
    Can you explain clearly what n_steps=4 and n_features=X.shape[2] really mean and how they work?
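This NumPy sketch shows how the Multiple Parallel Series data can be framed, in the spirit of the post's `split_sequences()` for that section. Check it against the post before relying on it. It shows what `n_steps` and `n_features` refer to, and why a new `x_input` must have exactly `n_features` columns.

```python
import numpy as np

def split_sequences(sequences, n_steps):
    """Frame parallel series: n_steps rows in, the following row out."""
    X, y = [], []
    for i in range(len(sequences)):
        end_ix = i + n_steps
        # stop when there is no row left to serve as the output
        if end_ix > len(sequences) - 1:
            break
        X.append(sequences[i:end_ix, :])   # all columns, n_steps rows
        y.append(sequences[end_ix, :])     # all columns, next row
    return np.array(X), np.array(y)

in_seq1 = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90])
in_seq2 = np.array([15, 25, 35, 45, 55, 65, 75, 85, 95])
out_seq = in_seq1 + in_seq2
dataset = np.column_stack((in_seq1, in_seq2, out_seq))

n_steps = 3
X, y = split_sequences(dataset, n_steps)
n_features = X.shape[2]   # number of parallel series (columns) = 3
print(X.shape, y.shape, n_features)   # (6, 3, 3) (6, 3) 3

# A new input must have shape (1, n_steps, n_features): 3 rows of 3 values
x_input = np.array([[70, 75, 145], [80, 85, 165], [90, 95, 185]])
x_input = x_input.reshape((1, n_steps, n_features))
```

Nutakki's `x_input` has 3 rows of 4 values, which does not match rows of `n_features` = 3 columns. With `n_steps=4`, reshaping it to `(1, 4, 3)` would succeed only because both shapes hold 12 values, so the numbers would silently be regrouped into different rows.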

  86. Amelie, July 12, 2019 at 2:10 am

    Please, is there a method to find the correct parameters of an ANN model such as an LSTM or MLP (number of hidden layers, activation function, loss function, etc.)?

    What does it mean when my train and validation loss curves are parallel while the Train Score and Test Score are small?

    Is there a method to optimize all these results?

  87. Samil, July 17, 2019 at 8:16 am

    Thanks for the tutorial. I have applied the multi-step, multivariate logic to my own dataset. Namely, I have a 12-step look-back, a 12-step look-ahead and 41 features (all with the same look-back as the main variable of interest). Trying the TimeDistributed code snippet gave me an RMSE that increases progressively over the look-ahead steps. Is this due to the nature of my time series, or is it a sign of a mistake made while constructing the model? It is hard for you to tell, but maybe you can share your take on this issue. Thanks

    • Jason Brownlee, July 17, 2019 at 8:33 am

      It could be either.

      Perhaps try fewer features and evaluate impact?
      Perhaps try different models and evaluate impact?

      • Samil, July 19, 2019 at 9:42 am

        Thanks. I tried an encoder-decoder and a stacked LSTM. Both give me increasing RMSE for further look-aheads. That is understandable for the encoder-decoder, as it uses the output as an input (so the associated error comes with the prediction and builds up over time). I'm not sure why I see the same thing with the stacked LSTM. Anyway, thanks again for the response and the post!

        • Samil, July 19, 2019 at 9:48 am

          Also, one quick related question. You use "-1" in the multi-step, multivariate split_sequences() functions (e.g. n_steps_out-1). This reduces the number of resulting features by one compared with the other split_sequences() snippets. I tested it with the other multi-step split_sequences() code you shared above. I'm not sure, but aren't we supposed to have the same number of features? Thanks

        • Jason Brownlee, July 19, 2019 at 2:20 pm

          Well done on the improvement!

  88. Aziz Ahmad, July 22, 2019 at 4:56 am

    Sir, please suggest good learning resources for my project (carbon emission forecasting using LSTM).

  89. Mans Oshanov, July 22, 2019 at 6:20 pm

    Thank you for the great tutorial. Is it possible to get the probability of a prediction (as a percentage), or the second-best prediction, out of these models? Thank you :)

  90. Kennard, July 26, 2019 at 11:57 am

    Hi, Jason

    Your tutorial helps me a lot, thank you very much!

    I also have a question: how do I adjust the learning rate of the LSTM network in the CNN-LSTM code you've mentioned above?

    I'm looking forward to your reply, thank you!

    (The reply I left in https://machinelearningmastery.com/how-to-develop-lstm-models-for-multi-step-time-series-forecasting-of-household-power-consumption/?unapproved=494293&moderation-hash=2b6d045a4e1ff047d0720753b2b1e418#comment-494293 is in the wrong place, sorry about that.)

  91. Luis, July 27, 2019 at 3:10 am

    This is amazing. I love the blog

  92. Armande Kertanio, July 28, 2019 at 8:31 am

    Thanks for this nice explanation.

    I have a problem when reshaping the data for a multiple-output architecture.

    the architecture is:

    outputs = []

    main_input = Input(shape=(seq_length, feature_cnt), name='main_input')
    lstm = LSTM(32, return_sequences=True)(main_input)
    for _ in range(5):
        prediction = LSTM(8, return_sequences=False)(lstm)
        out = Dense(1)(prediction)
        outputs.append(out)

    model = Model(inputs=main_input, outputs=outputs)
    model.compile(optimizer='rmsprop', loss='mse')

    and when reshaping the y using:

    y=y.reshape((len(y),5,1))

    I got a reshaping error:

    ValueError: Error when checking model target: the list of Numpy arrays that you are passing to your model is not the size the model expected. Expected to see 5 array(s), but instead got the following list of 1 arrays: [array([[0.35128802, 0.01439778, 0.60109704, 0.52722118, 0.25493708],

    would you please help?

    • Jason Brownlee, July 29, 2019 at 5:58 am

      Perhaps define what you want the output shape to be, e.g. n samples with m time steps, then confirm your data has that shape, or reshape it if not?
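The error message says the model expects a list of 5 target arrays, one per `Dense(1)` output. This minimal NumPy sketch splits a `(n, 5)` target matrix into that list. `y` here is hypothetical random data, and the `fit` call is left as a comment.

```python
import numpy as np

# Hypothetical targets: n samples, one column per output head (5 heads)
n_samples, n_outputs = 8, 5
y = np.random.default_rng(1).random((n_samples, n_outputs))

# A Keras model built with outputs=[out_0, ..., out_4] expects one target
# array per output, each of shape (n_samples, 1) to match Dense(1)
y_list = [y[:, i].reshape((n_samples, 1)) for i in range(n_outputs)]

print(len(y_list), y_list[0].shape)
# model.fit(X, y_list, ...)  # hypothetical call; X is the input array
```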

  93. Florian, July 29, 2019 at 2:00 am

    You use “model.add(TimeDistributed(MaxPooling1D(pool_size=2)))” and write “max pooling layer that distills the filter maps down to 1/4 of their size”. A typo or is there a different reason explaining the use of 2 vs. 4 here?

    • Jason Brownlee, July 29, 2019 at 6:16 am

      Sorry for the confusion.

      If the map is 8×8 and we apply a 2×2 pooling layer, then we get a 4×4 output, i.e. 1/4 of the area (64 down to 16).

      For time series, if we have 1×8 and apply a 1×2 pooling, we get 1×4, you're right: 1/2 the size, not 1/4 as with image data.

      Fixed. Thanks!
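This NumPy sketch shows what `MaxPooling1D(pool_size=2)` does along the time axis, assuming the Keras defaults of a stride equal to the pool size and 'valid' padding. A length-8 sequence becomes length 4.

```python
import numpy as np

def max_pool_1d(x, pool_size=2):
    """Non-overlapping max pooling over the time axis of a (steps, channels) array."""
    n_windows = x.shape[0] // pool_size          # 'valid': drop any leftover steps
    trimmed = x[: n_windows * pool_size]
    return trimmed.reshape(n_windows, pool_size, x.shape[1]).max(axis=1)

# One feature map: 8 time steps, 1 channel
fmap = np.array([3, 1, 4, 1, 5, 9, 2, 6], dtype=float).reshape(8, 1)
pooled = max_pool_1d(fmap, pool_size=2)
print(pooled.ravel())   # [3. 4. 9. 6.]
```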

  94. Nic, July 29, 2019 at 5:53 pm

    Hi Jason,

    First of all, thanks for that awesome introduction to LSTM models.
    There is just one thing I don't get.

    In the section “Multiple Input Series” you used the following example:
    [[ 10 15 25]
    [ 20 25 45]
    [ 30 35 65]
    [ 40 45 85]
    [ 50 55 105]
    [ 60 65 125]
    [ 70 75 145]
    [ 80 85 165]
    [ 90 95 185]]

    As you mentioned, the first two entries in each row refer to the two time series and the last one to the corresponding target variable. To train the LSTM you split the data into input and output samples like:
    [[10 15]
    [20 25]
    [30 35]] 65

    Why do we drop the first two target entries (25 and 45)? Isn't that information my network loses for training? Why don't we use each single row as a sample, like x = [10 15], y = [25], to train on the time series? Isn't it easier to learn the series if I have the target for each step?

    • Jason Brownlee, July 30, 2019 at 6:04 am

      Good question.

      We must create samples of inputs and outputs.

      Some of the observations at the beginning of the dataset do not have enough prior data to form a complete input window, so they cannot be used as outputs and must be removed.
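This NumPy sketch applies that framing to the Multiple Input Series example, modeled on the post's `split_sequences()` for that section. Each sample takes `n_steps` rows of the two input series plus the target from the last of those rows. The targets 25 and 45 have no complete 3-step input window, so they are never used as outputs.

```python
import numpy as np

def split_sequences(sequences, n_steps):
    """Inputs: n_steps rows of all columns but the last; output: last column of the final row."""
    X, y = [], []
    for i in range(len(sequences)):
        end_ix = i + n_steps
        if end_ix > len(sequences):
            break
        X.append(sequences[i:end_ix, :-1])
        y.append(sequences[end_ix - 1, -1])
    return np.array(X), np.array(y)

in_seq1 = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90])
in_seq2 = np.array([15, 25, 35, 45, 55, 65, 75, 85, 95])
dataset = np.column_stack((in_seq1, in_seq2, in_seq1 + in_seq2))

X, y = split_sequences(dataset, n_steps=3)
print(X[0].tolist(), y[0])   # [[10, 15], [20, 25], [30, 35]] 65
print(y.tolist())            # [65, 85, 105, 125, 145, 165, 185]
```

Nic's alternative of one row per sample (`n_steps=1`) also works with this function. It keeps every target but gives the model no history to learn from.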

  95. Joel, August 1, 2019 at 12:04 am

    Good work. However, you should provide the library imports to make it easier for beginners.

    • Jason Brownlee, August 1, 2019 at 6:53 am

      All library imports are provided in the "complete example" listed in the post.

      Sorry for the confusion.

      • GODFREY JOSEPH SAQWARE, October 8, 2021 at 4:45 pm

        Hello Sir, I am very happy with your illustration, but I have a problem with how to do forecasting based on your demonstration. I would be happy to get your email.

  96. will, August 11, 2019 at 12:27 am

    Hi Jason, I need to predict a hundred thousand sequences like this: [10, 20, 30, 40, 50, 60, 70, 80, 90]. How do I do it? Do I loop over them one by one? When I loop, it feels like it's going to take a long time.

    • Jason Brownlee, August 11, 2019 at 5:59 am

      If the model is read only and you are not dependent upon state across samples, you can run the model in parallel on different machines and prepare batches of samples for each model to make predictions.
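Rather than calling the model once per sequence, you can stack the sequences into one 3D array and predict them in a single call. This sketch assumes every sequence already matches the model's input window length. `predict_batch` is a hypothetical stand-in for a fitted Keras model's `model.predict(X, batch_size=...)`.

```python
import numpy as np

# 100,000 hypothetical univariate sequences, each 9 steps long
rng = np.random.default_rng(2)
sequences = rng.random((100_000, 9), dtype=np.float32)

# Reshape into the [samples, time steps, features] layout the LSTM expects
X = sequences.reshape((sequences.shape[0], sequences.shape[1], 1))

def predict_batch(X):
    # Hypothetical stand-in for model.predict(X, batch_size=1024):
    # returns one prediction per sample (here just the last observed value).
    return X[:, -1, :]

yhat = predict_batch(X)
print(X.shape, yhat.shape)   # (100000, 9, 1) (100000, 1)
```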

  97. PRADEEP CHAKRAVARTHI NUTAKKI, August 11, 2019 at 3:42 am

    Hi, I am very happy to have this LSTM example to practice with.

    I have a problem as follows:

    I have 300 Excel workbooks, each of which has a sheet with 3 values.

    The 3 values are in this format: [1.02, 2.20, 1.0]; [2.9, 3.5, 3.3]; and so on, for 300 sets.

    Now I want to train and test my model with the data from the 300 Excel workbooks as input, and the model has to predict the 301st set, for example [5, 3.3, 2.4], depending on the sequence of previous values.
    Note: the output shouldn’t be the probability set from the 300 sets, the output should be a new set.

    Can you suggest me any solution to this problem?

    • Jason Brownlee, August 11, 2019 at 6:04 am

      Perhaps you can use some custom code to extract all of the data from the excel files into a csv file ready for modeling?

  98. y jing, August 12, 2019 at 4:45 pm

    How do I construct three parallel LSTMs and then add a DNN in series?

  99. Doron, August 13, 2019 at 4:42 pm

    Hi Jason,

    Thanks for this wonderful post. I have been trying to digest LSTMs (metaphorically), and one particular aspect was not clear to me. I know the general structure of LSTMs, but I'm having a hard time understanding:

    model.add(LSTM(50, activation='relu', input_shape=(n_steps, n_features)))

    When ReLU is set as the activation function, but not in the output layer, what exactly happens behind the scenes? To be clear, I am aware of the gates and their respective activation functions: sigmoid and tanh. But if we set ReLU as above, does that mean that each unit/LSTM cell outputs a hidden state -> passes it to a ReLU -> passes it to the next unit/LSTM cell?

    Thanks!

    • Jason Brownlee, August 14, 2019 at 6:33 am

      Yes, that is correct. It controls the output gate, not the internal gates which are governed by a sigmoid.
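This NumPy sketch of one LSTM time step shows where `activation` enters. It follows the standard formulation as we understand the Keras implementation. The input, forget and output gates use `recurrent_activation`, which is sigmoid by default. `activation` is tanh by default, or `relu` in the post's examples. It is applied to the candidate cell state, and to the cell state before the output gate scales it. The hidden state passed to the next step is therefore the output gate times ReLU of the cell state. Weights are random placeholders, and biases are omitted for brevity.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(z, 0.0)

def lstm_step(x_t, h_prev, c_prev, W, U, activation=np.tanh):
    """One LSTM step; W: (n_in, 4*units), U: (units, 4*units). Biases omitted."""
    z = x_t @ W + h_prev @ U
    z_i, z_f, z_c, z_o = np.split(z, 4)
    i = sigmoid(z_i)                      # input gate (recurrent_activation)
    f = sigmoid(z_f)                      # forget gate (recurrent_activation)
    o = sigmoid(z_o)                      # output gate (recurrent_activation)
    c = f * c_prev + i * activation(z_c)  # candidate uses `activation`
    h = o * activation(c)                 # cell state passes through `activation`
    return h, c

units, n_in = 4, 1
rng = np.random.default_rng(3)
W = rng.normal(size=(n_in, 4 * units))
U = rng.normal(size=(units, 4 * units))
h, c = np.zeros(units), np.zeros(units)

# Feed a short univariate sequence; h is passed on to the next step
for value in [0.1, 0.2, 0.3]:
    h, c = lstm_step(np.array([value]), h, c, W, U, activation=relu)
print(h)
```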

  100. Dawjidda, August 17, 2019 at 1:46 am

    Hello Mr Jason Brownlee, my dataset is in matrix form; I want to convert it to fit a GRU or LSTM Sequential model.

  101. Tommy, August 18, 2019 at 9:22 pm

    Hi Jason,

    A question is on my mind; if possible, I would like to know your opinion.

    What will happen if we use both LSTM and GRU layers together in the model? Does this make sense?
    For example this architecture:

    model=Sequential()
    model.add(GRU(256 , input_shape = (x.shape[1], x.shape[2]) , return_sequences=True))
    model.add(LSTM(256))
    model.add(Dense(64))
    model.add(Dense(1))

    I ask because I used this model and got good results compared to using each layer type separately.

  102. Ali Altin, August 21, 2019 at 6:14 pm

    Hello Jason and community,

    I have a question. My dataset has 27 features. 26 of them I want to use as input and the last one as output (this feature is also the last column in my dataset). I use the multiple input multi-step output code from above. After using the function “def split_sequences(sequences, n_steps_in, n_steps_out)”, I split the dataset into train and test sets and choose a number of time steps for n_steps_in and n_steps out. After transforming from 2D to 3D with “split_sequences(train, n_steps_in, n_steps_out)” I printed the shape of train_X, train_y, test_X and test_y. The results are:

    (14476887, 25, 26) (14476887, 20) (7130386, 25, 26) (7130386, 20)

    My three questions are:

    1.) Does python count from 0 upwards, so that 0 is my first feature or does it count from 1 upwards?

    2.) Does python work from left to right, so that the left feature in the csv file is my first feature and so on?

    3.) Is the shape above (7130386, 20) equal to (7130386, 20, 1) or why is it 2D?

    I hope that I have explained my problem and the questions well enough.

    Many thanks in advance.

    Ali

    • Jason Brownlee, August 22, 2019 at 6:23 am

      Yes, array indexes start at 0.

      Yes, arrays run from left to right.

      Yes, you can transform (7130386, 20) to (7130386, 20, 1) directly. They are the same thing.
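Both shapes hold the same values; they differ only by a trailing axis of length 1. Here is a small NumPy sketch with a hypothetical stand-in for `test_y`.

```python
import numpy as np

# Hypothetical multi-step targets: 5 samples, 20 output steps
test_y = np.arange(100, dtype=float).reshape((5, 20))

# Add a trailing feature axis: (samples, steps) -> (samples, steps, 1)
test_y_3d = test_y.reshape((test_y.shape[0], test_y.shape[1], 1))
# Equivalent: test_y[..., np.newaxis]

print(test_y.shape, test_y_3d.shape)   # (5, 20) (5, 20, 1)
```

Which shape to pass depends on the output layer. A `Dense(20)` output produces 2D predictions, while a `TimeDistributed(Dense(1))` output produces 3D ones.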

      • Ali Altin, August 22, 2019 at 5:59 pm

        Hello Jason,

        Thank you so much for the answer. I have two other questions:

        I take the ‘split a multivariate sequence into samples’ code from above:

        def split_sequences(sequences, n_steps_in, n_steps_out):
            X, y = list(), list()
            for i in range(len(sequences)):
                # find the end of this pattern
                end_ix = i + n_steps_in
                out_end_ix = end_ix + n_steps_out-1
                # check if we are beyond the dataset
                if out_end_ix > len(sequences):
                    break
                # gather input and output parts of the pattern
                seq_x, seq_y = sequences[i:end_ix, :-1], sequences[end_ix-1:out_end_ix, -1]
                X.append(seq_x)
                y.append(seq_y)
            return array(X), array(y)

        After that I split the dataset into train and test sets:

        train_size = int(len(values) * 0.67)
        test_size = len(values) - train_size
        train, test = values[0:train_size,:], values[train_size:len(values),:]
        print(len(train), len(test))

        The result is:

        14476930 7130429

        The next step is to define the number of time steps:

        n_steps_in, n_steps_out = 25, 20

        train_X, train_y = split_sequences(train, n_steps_in, n_steps_out)
        test_X, test_y = split_sequences(test, n_steps_in, n_steps_out)
        print(train_X.shape, train_y.shape, test_X.shape, test_y.shape)

        The result is:

        (14476887, 25, 26) (14476887, 20) (7130386, 25, 26) (7130386, 20)

        The last point is to create and fit the LSTM network:

        n_features = 26

        model = Sequential()
        model.add(LSTM(50, input_shape=(n_steps_in, n_features)))

        A lot of code, sorry for that. Now the short questions:

        I want to predict the last column (column 27) in my csv-file. The first 26 are the input features (columns).

        1.) Where in the codes above do I explicitly define my input features and my output feature?

        2.) Do I have to explicitly use n_features in the code 'model.add(LSTM(50, input_shape=(n_steps_in, n_features)))'? My aim is to train the model with the input features and the output feature, and then test it only with the test data without the output feature. The output feature should be predicted.

        Is the code with n_features = 26 in my case wrong?

        Sorry to bother you with these banal questions, but I do not have enough experience.

        Many thanks in advance.
        Ali
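Here is a runnable answer to the first question. In Ali's `split_sequences()`, `sequences[i:end_ix, :-1]` selects every column except the last as input. `sequences[end_ix-1:out_end_ix, -1]` selects the last column as the output. The `-1` in `out_end_ix` makes the output window start at the last input time step. This small NumPy sketch uses hypothetical data with 2 input columns and 1 target column.

```python
import numpy as np

def split_sequences(sequences, n_steps_in, n_steps_out):
    X, y = [], []
    for i in range(len(sequences)):
        end_ix = i + n_steps_in
        out_end_ix = end_ix + n_steps_out - 1
        if out_end_ix > len(sequences):
            break
        # inputs: all columns except the last; output: the last column only
        X.append(sequences[i:end_ix, :-1])
        y.append(sequences[end_ix - 1:out_end_ix, -1])
    return np.array(X), np.array(y)

# Hypothetical data: 10 time steps, columns = [input_a, input_b, target]
t = np.arange(10)
data = np.column_stack((t, t * 10, t * 100))

X, y = split_sequences(data, n_steps_in=3, n_steps_out=2)
print(X.shape, y.shape)       # (7, 3, 2) (7, 2)
print(X[0].tolist(), y[0].tolist())
```

With Ali's 27 columns, `X.shape[2]` is 26, matching the (14476887, 25, 26) shape he printed. Setting `n_features = 26`, or more robustly `n_features = train_X.shape[2]`, is therefore consistent.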

  103. Joe, August 21, 2019 at 6:42 pm

    Hi Jason,

    Thanks for your post.

    I would like to use a network architecture like:

    cnn = Sequential([
        Conv1D(filters=16, kernel_size=4, strides=2, activation='relu', input_shape=(n_steps, n_features)),
        BatchNormalization(),
        MaxPooling1D(pool_size=2)
    ])

    model = Sequential()
    model.add(cnn)
    model.add(LSTM(50, activation='relu'))
    model.add(Dense(1))

    The reason is that when the true process is path-dependent, a longer look-back period should be used. However, LSTMs are not very efficient at dealing with a large number of time steps, so I use a CNN to reduce the number of time steps and encode some predictive information.

    Does this make sense to you?
    Do you think pre-training would contribute anything in a stacked network structure?

    Joe

    • Jason Brownlee, August 22, 2019 at 6:25 am

      Don’t put stock into my speculations, perhaps try it and see?

  104. Ahmad, August 23, 2019 at 7:28 pm

    Dear Jason,

    Thank you for your great tutorial. I just have a question:

    As I understood from your explanations, bidirectional neural networks need both past and future input data to predict the current time step. So, in the case of a univariate LSTM, if we want to predict the energy use at the current time, for example, do we need to know the future energy use? This is a bit confusing to me. Would you please explain?

    Thank you

    • Jason Brownlee, August 24, 2019 at 7:47 am

      No, the future is predicted from the past.

      Or you can frame your prediction problem any way you wish.

      • Ahmad, August 24, 2019 at 7:04 pm

        Thank you for your answer. Can you explain a bit more to make it clear? I just checked the mathematical formulation of bidirectional RNNs, and I see that the hidden state of the next time step is used as an input: (x(t), h(t-1) and h(t+1) are used to calculate y(t)).

        So, when a hidden state from the next time step is an input, how is it possible to use only past data in a univariate bidirectional RNN?

        Thank you in advance for your guidance

  105. Amirreza, August 24, 2019 at 10:23 pm

    Actually, I applied the bidirectional layer, but I got a much higher error than with a typical LSTM network. Is that possible, or am I doing something wrong?

    When I write 50 neurons, does it mean that each direction of the bidirectional layer has 50 neurons, or is it the sum of the two directions?

    • Jason Brownlee, August 25, 2019 at 6:37 am

      Bidirectional may require more training.

      Each direction has 50.
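This NumPy sketch shows what a bidirectional layer computes. To keep it short it uses a plain tanh RNN cell instead of an LSTM, so it illustrates the wrapper idea, not Keras' implementation. One pass reads the input window forward, a second pass reads the same window backward, and the final states are concatenated. Keras' `Bidirectional` wrapper concatenates by default (`merge_mode='concat'`), so 50 units per direction give 100 outputs. Both passes only see the input window. The "future" state h(t+1) in Ahmad's formula refers to a later step inside that window, which is still in the past relative to the value being forecast.

```python
import numpy as np

def rnn_final_state(xs, W, U):
    """Run a simple tanh RNN over xs (steps, n_in) and return the final hidden state."""
    h = np.zeros(U.shape[0])
    for x_t in xs:
        h = np.tanh(x_t @ W + h @ U)
    return h

units, n_in = 50, 1
rng = np.random.default_rng(4)
W_fwd, U_fwd = rng.normal(size=(n_in, units)) * 0.1, rng.normal(size=(units, units)) * 0.1
W_bwd, U_bwd = rng.normal(size=(n_in, units)) * 0.1, rng.normal(size=(units, units)) * 0.1

# Input window of past observations only: 3 time steps, 1 feature
window = np.array([[70.0], [80.0], [90.0]]) / 100.0

h_fwd = rnn_final_state(window, W_fwd, U_fwd)        # reads t-2, t-1, t
h_bwd = rnn_final_state(window[::-1], W_bwd, U_bwd)  # reads t, t-1, t-2
h_bi = np.concatenate([h_fwd, h_bwd])                # 'concat' merge
print(h_bi.shape)   # (100,)
```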

  106. helloworld, August 27, 2019 at 4:31 pm

    Hi, I have a question regarding data normalization (scaling values to a specific range such as [0, 1]). Should I perform it before transforming the dataset into the supervised form, as in this example, or after?

    I noticed that if I do it after, the columns look a little different from each other, because the scaling is done per column. Here is example output if done after:

    t-1 t t+1
    -1.000000 -1.000000 -0.870529
    -1.000000 -0.869976 -0.895359
    -0.869976 -0.894799 -0.897133
    -0.894799 -0.896572 -0.901271

    Is this a problem for forecasting with an LSTM?
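The column differences come from fitting the scaling separately on each shifted column, and each of those columns is missing a few values at its ends. This NumPy sketch compares that approach with scaling the raw series once before framing. A plain min-max formula to [-1, 1] stands in for scikit-learn's `MinMaxScaler(feature_range=(-1, 1))`, and the series is hypothetical.

```python
import numpy as np

def minmax(x, lo, hi):
    # scale to [-1, 1] using the given min and max
    return 2 * (x - lo) / (hi - lo) - 1

def frame(series, n_lags=1, n_ahead=1):
    """Rows of [t-n_lags, ..., t, ..., t+n_ahead] built from a 1D series."""
    width = n_lags + 1 + n_ahead
    return np.array([series[i:i + width] for i in range(len(series) - width + 1)])

series = np.array([5.0, 1.0, 3.0, 8.0, 6.0, 2.0, 9.0])

# Option A: scale the raw series once, then frame -> every column shares one scale
framed_a = frame(minmax(series, series.min(), series.max()))

# Option B: frame first, then scale each column with its own min/max
framed_raw = frame(series)
framed_b = np.column_stack([
    minmax(col, col.min(), col.max()) for col in framed_raw.T
])

# In A, the value 3.0 maps to the same number wherever it appears; in B it may not
print(framed_a[:2])
print(framed_b[:2])
```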

  107. Samit, August 28, 2019 at 7:26 pm

    Hi Jason.

    Great tutorial. I have electronic health record data with multivariate time series inputs. Is it better to use a normal LSTM or a bidirectional LSTM for prediction?

    Thanks

  108. Radhouane Baba, August 28, 2019 at 8:20 pm

    Hi Jason,

    I am trying to train my model to forecast 144 data points (1 day, one value every 10 minutes; a load forecast for a home) based on 5 days (144*5 values). I have more data, but so far I haven't found a good result, so I am training my model on less data (it takes so long).
    There is a daily seasonality, so I chose n_input to be 144.
    I am varying the batch size from 1 to 6 and the epochs from 25 to 150,
    but each time I get a result, I have one of these problems:
    1- The values converge to a constant (I thought maybe it is underfitting), so I try reducing the batch size and increasing the epochs.
    2- When I do so, I always get a loss value of NaN, and then I get no predictions from the model.

    Can you please recommend something?

    Thank you so much!!!!
    I appreciate it!

  109. Radhouane Baba, August 28, 2019 at 8:28 pm

    Hello Jason,

    I have another question:

    Is it better to forecast the 144 values at once through a Dense(144) output layer,
    or
    do what I am doing now, forecasting only 1 value and then appending it to my history:

    history.append(yhat_sequence)

    model.add(Dense(1))

    Thank you so Much!!!

    • Jason Brownlee, August 29, 2019 at 6:05 am

      Perhaps compare a few approaches for your dataset and discover what works best.

  110. Ken, August 30, 2019 at 7:15 pm

    Hello Jason,
    Thanks for this great tutorial and dive into LSTMs.
    For Multiple Parallel Input and Multi-Step Output you also mention that it is possible to use the vector version of LSTMs. I cannot get my head around it how that model should look like.

    model = Sequential()
    model.add(LSTM(100, activation='relu', return_sequences=True, input_shape=(n_steps_in, n_features)))
    model.add(LSTM(100, activation='relu', return_sequences=True))
    # What is needed here??? dim to n_steps_out
    model.add(TimeDistributed(Dense(n_features)))
    model.compile(optimizer='adam', loss='mse')

    The above architecture doesn't do what I intend. In the end there should be an output of shape (batch_size, n_steps_out, n_features), but what I get is (batch_size, 100, n_features) or an error. So how do I make the above architecture work without the encoder-decoder version of your snippets?

    Thanks a lot for all of your hard work

    • Jason Brownlee, August 31, 2019 at 6:03 am

      Perhaps you can use the example in the post as a starting point?

  111. Vishal Saini, August 30, 2019 at 8:37 pm

    Hi Jason,

    Really interesting article!!
    I have a doubt: I am currently trying to forecast business sales based on the discounting burn. The future values of the dependent variable are usually fixed; is there any code that deals with such a problem?

    Thanks and Regards

  112. Sai Krishna Natesan, September 2, 2019 at 6:23 am

    Dear Jason,

    I have a question about Multivariate LSTM Models.

    In the Multiple Input Series, your input is

    80, 85
    90, 95
    100, 105

    And you’re trying to predict the output of 205.

    In the Multiple Parallel Series, your input is

    70, 75, 145
    80, 85, 165
    90, 95, 185

    And you are trying to predict the output of
    [100, 105, 205]

    My question is: in the first model, you know more about the past values of the output than you pass to the model.

    So, the actual input should be
    80, 85, 165
    90, 95, 185
    100, 105, X
    Where we are trying to predict X

    Similarly, in the second model let us assume that you know the first two fields 100 and 105 and you only want to predict the 205.
    70, 75, 145
    80, 85, 165
    90, 95, 185
    100, 105, X
    Again, we are unnecessarily trying to predict values that are already known.

    Is there a model where I can use all available information from the previous time series and try to predict X?

    I learnt a lot from this post and the above question is something I am trying to answer. Thanks a lot for sharing your knowledge. It is helping us a lot.

    • Jason Brownlee, September 2, 2019 at 1:48 pm

      Yes, you can frame the problem any way you wish.

      In your proposed framing, you could use a new token to indicate a missing value and then use a Masking input layer.

      Or use a multiple-input model with one input for the dependent variables and another for the univariate series that you're predicting.

      Perhaps experiment and see what model you prefer and what works best for your specific dataset.

  113. Jem, September 3, 2019 at 2:13 am

    Hi Jason,

    I was implementing cross-validation for the LSTM encoder-decoder model. I wanted to ask whether it is better to recreate the model at each step, or to reuse the old one and call the fit method again.

    Thanks and Regards

  114. sara j, September 4, 2019 at 12:31 am

    Hi Jason,

    Suppose I have to train my model with data like this:
    The inputs are two columns, i.e. temperature and pressure, for the first 25% of the data, and the outputs are the same two columns, temperature and pressure, for the remaining 75%.
    My goal is to predict temperature and pressure together, giving the LSTM a small input and receiving a larger output.
    If I train my model by giving the input [x, y], can I predict [x, y] without giving a time stamp? Which method should I follow?

    I have already prepared my data according to your blog, and I am now confused about how to train the model without time steps.

  115. rita, September 10, 2019 at 12:20 am

    Hi Jason,

    Why haven't you used the MinMaxScaler here when training the LSTM model on the input sequence?

    • Jason Brownlee, September 10, 2019 at 5:50 am

      Good question, I skipped scaling to keep the example simpler – e.g. for brevity.

  116. rita, September 10, 2019 at 6:08 pm

    Thank you very much. I have one more question: if I have 200,000 data points and have to create time steps for them, would dividing the data into 5 time steps with 40,000 points in each time step give good training for the LSTM? Or can you suggest something else, so that I can prepare the data properly?

    I have multivariate data with 2 variables and want to predict both of them, so basically 2 inputs and 2 outputs. But do I have to frame them as a supervised problem first, given that they are temperature and viscosity and depend on each other over time?

    So, should I frame them as a supervised problem first, or can I directly use the multivariate time series for prediction by dividing the data into 5 time steps and predicting 2 outputs?

    Do you also provide consultations?

  117. Himanshi, September 13, 2019 at 5:28 pm

    Hi,

    Can you please tell me how to visualize the results? When I reshape the array, it cannot be reshaped from 3D into 2D.

    Thank you and have a nice day!

  118. christina, September 16, 2019 at 7:52 pm

    I think I did not frame my question properly. For example: I trained my LSTM model with n_steps_in = 300 and n_steps_out = 300. After training, yhat has the shape (20000, 300, 2). When I try to reshape it to 2D to see the results, I get an error and it cannot be reshaped.
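A reshape only succeeds when the total number of elements stays the same. An array of shape (20000, 300, 2) holds 12,000,000 values, so a target shape such as (20000, 2) fails. This sketch shows two common 2D views of a (samples, n_steps_out, n_features) prediction array, using a smaller hypothetical `yhat`.

```python
import numpy as np

# Hypothetical predictions: 4 samples, 3 output steps, 2 features
yhat = np.arange(24, dtype=float).reshape((4, 3, 2))

# View 1: one row per (sample, step), one column per feature -> (12, 2)
per_step = yhat.reshape((-1, yhat.shape[2]))

# View 2: one feature at a time -> (samples, n_steps_out), e.g. for plotting
feature_0 = yhat[:, :, 0]
feature_1 = yhat[:, :, 1]

print(per_step.shape, feature_0.shape)   # (12, 2) (4, 3)
# For (20000, 300, 2) these give (6000000, 2) and (20000, 300).
```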

  119. sx, September 17, 2019 at 3:54 pm

    Hi, can I add an extra layer below this one, and if so, how should I do that?
    model.add(LSTM(200, activation='relu', input_shape=(n_timesteps, n_features)))

    Thanks in advance.

  120. bhavna, September 17, 2019 at 6:21 pm

    Hi, can you please tell me whether this type of prediction is only suitable for sequential data?

  121. suraj, September 17, 2019 at 8:09 pm

    Hi Jason,

    Suppose I have unsupervised data and I transform it into a supervised problem for training the LSTM model.
    My question is about what happens after the data is made supervised. We give input data points and predict output data points, but the output is just the (n+1)th point after the inputs, and in the end we only predict 1 point from the whole data. Basically, we give the model all the points during training. What is the model actually doing?

    • Jason Brownlee, September 18, 2019 at 6:05 am

      The model learns a function that takes input points and predicts the next point.

      • Suraj, September 18, 2019 at 4:57 pm

        But what if I want the model to take only a few points as input, rather than all of the data, and predict the remaining data? What strategy is used then?

        • Jason Brownlee, September 19, 2019 at 5:52 am

          You control what data goes in and out of the model.

          Prepare the data you want to feed in and make a prediction.

          The examples above will provide a template you can use to start with and adapt for your problem.

  122. sx, September 18, 2019 at 12:24 am

    Yes, I want to apply it to a time series.

  123. Peter, September 26, 2019 at 5:52 pm

    Sorry, you are talking about time series; what if there is a date with a time? (I didn't see a date/time feature in your generated data.)

    • Jason Brownlee, September 27, 2019 at 7:47 am

      Date and time are removed from the dataset and the series of observations is worked with directly.

  124. Luis, September 29, 2019 at 9:32 am

    Hi Jason,

    I have really enjoyed many of your articles over the last half year. I have a question on your vector output model using a stacked LSTM: under the hood, what type of architecture is used here for 3 input time steps and 2 output time steps? I'm sure it is a many-to-many problem, but can you help me with the exact visual connection? Is the first output time step laid directly over the second time step of the input series?

  125. Al October 4, 2019 at 8:57 pm

    Hi Jason, thanks for your great posts and prompt replies. With the Multi-Step LSTM models, when I loaded my dataset I first noticed that the number of steps should divide the length of the dataset evenly (i.e. if my data is 1239 rows, 59 steps is suitable since 1239/59 = 21). In fact, assigning a non-divisible number to n_steps_in resulted in nan loss values when fitting the model. I was indeed able to run all 50 epochs using 59 steps over 1239 rows; however, something I cannot explain happened: after re-running the code without making any changes, the loss over the epochs (with verbose set to 1) went back to nan. Running it again, it would start populating some values and then end up at nan along the way. It is very erratic and unpredictable, and getting through all the epochs seems like luck. Could you help me understand what is wrong? Thanks!

    • Jason Brownlee October 6, 2019 at 8:09 am

      Yes, it might help to scale your data prior to modeling.

      • Al October 7, 2019 at 10:52 pm

        Yes, you are correct, as always. Scaling not only got rid of the nan values but also made each epoch faster to run. Thanks Jason!
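For reference, a minimal sketch of min-max scaling a univariate series before framing it, written with NumPy only (scikit-learn's MinMaxScaler does the same job). The min and max are taken from the training data and reused to invert predictions back to the original units. The values and the pretend prediction are illustrative.

```python
import numpy as np

train = np.array([10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0])

# "Fit": remember the range of the training data.
lo, hi = train.min(), train.max()

def scale(x):
    return (x - lo) / (hi - lo)

def invert(x_scaled):
    return x_scaled * (hi - lo) + lo

train_scaled = scale(train)                     # values now lie in [0, 1]
print(train_scaled.min(), train_scaled.max())   # 0.0 1.0

# A prediction made on the scaled data must be inverted before use.
yhat_scaled = np.array([1.125])                 # pretend model output
print(invert(yhat_scaled))                      # [100.]
```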

  126. Addi October 7, 2019 at 1:32 am

    Thanks Jason. I apologize if this was addressed somewhere in the list of comments but in the case of predicting a continuous variable, how would you compare the performance of LSTM vs. another algorithm such as Random Forest?

    Other than comparing the actual value vs. predicted value from both models, is there a separate way to assess accuracy of both models?
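One common approach is to evaluate both models on the same held-out observations, using the same input windows, and compare a single error metric such as RMSE or MAE (both are in the units of the target; lower is better). A minimal NumPy sketch with made-up numbers; in practice y_test, pred_lstm and pred_rf (hypothetical names) would come from your own models.

```python
import numpy as np

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mae(y_true, y_pred):
    return float(np.mean(np.abs(y_true - y_pred)))

# Made-up numbers for illustration only.
y_test = np.array([3.0, 5.0, 2.0, 7.0])
pred_lstm = np.array([2.5, 5.0, 2.0, 8.0])
pred_rf = np.array([3.0, 4.0, 3.0, 7.0])

# Same test set, same metrics, so the scores are directly comparable.
for name, pred in [("LSTM", pred_lstm), ("Random Forest", pred_rf)]:
    print(name, "RMSE:", round(rmse(y_test, pred), 3), "MAE:", round(mae(y_test, pred), 3))
```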

  127. ABDULKARIM GIZZINI October 8, 2019 at 7:43 pm

    Thanks Jason,
    all your work is clear! Thank you very much. I have some questions, if you please. What are the differences between all the LSTM models you applied above? Is there any performance trade-off between them? I ask because you repeat that we can use any of them for time series forecasting.
    On another note, I'm working in the domain of wireless channel prediction. It's a complex-number problem. Can I split it into real and imaginary parts, apply your LSTM models to each part separately, and then concatenate the output results?

    • Jason Brownlee October 9, 2019 at 8:10 am

      Good question.

      Not so much a performance trade-off as different framings of the problem, or different problem types.

      The goal was to show you how flexible the method is and that you should adapt it to your problem, not your problem to the method.

      Not sure about imaginary numbers in neural nets or Keras, sorry.
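One common workaround, offered here as an assumption rather than something tested in this thread, is an alternative to fitting two separate models: keep a single model and treat the real and imaginary parts as two features of one multivariate series, so the model can use both parts together, then recombine each predicted pair into a complex number. A NumPy sketch of the conversion in both directions (the channel values are made up):

```python
import numpy as np

# Hypothetical complex channel samples over 4 time steps.
h = np.array([1 + 2j, 0.5 - 1j, -1 + 0j, 2 + 0.5j])

# [time steps, 2 features]: column 0 = real part, column 1 = imaginary part.
features = np.stack([h.real, h.imag], axis=-1)
print(features.shape)  # (4, 2)

def to_complex(pairs):
    """Turn [..., 2] (real, imag) predictions back into complex values."""
    return pairs[..., 0] + 1j * pairs[..., 1]

recovered = to_complex(features)
print(np.allclose(recovered, h))  # True
```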

  128. Mayank Prakash October 9, 2019 at 10:46 pm

    I wanted to know how to approach this problem. Let's say we have a time series with 2 features, indexed from 0 to n:
    [a0, b0], [a1, b1], [a2, b2], up to [an, bn]
    A sample of the series would then be:
    [a0, b0], [a1, b1], [a2, b2] -> [a3]

    The issue is that [b3] also plays an important role in determining [a3].

    My question is how do I incorporate this, so that I am able to feed a0, a1, a2, b0, b1, b2, b3 into the model and predict [a3]?

  129. Yawar Abbas October 10, 2019 at 4:30 am

    Great tutorial.
    I have a question related to an LSTM model for a time series forecasting problem. I have a dataset with four input features, e.g. 78, 153.23, 77.25, 4.33.
    The first input changes in larger steps, like 78, 80, 87, 96, and so on.
    The other inputs change smoothly, like 77.25, 77.35, 77.40, …
    I have used an LSTM model with one previous time step as input to predict the next time step. It predicts the last three inputs well but the first one poorly, i.e.
    Actual: 78, 153.23, 77.25, 4.33
    Predicted: 82, 153.01, 77.02, 4.12
    How can I tune this model to get a good result for the first input?

  130. ovi95 October 12, 2019 at 12:01 am

    Hi Jason,
    I want to make a model to predict the inflow to a reservoir from past rainfall data, temperature data, and past inflow data.
    I want the model to be able to predict the inflow for a week ahead (7 time steps) when given the past week's rainfall and temperature data.
    What model should I use for this?

  131. Ulia October 17, 2019 at 6:34 pm

    Hi Jason,

    Can I put time on the X axis to predict wind speed on the Y axis?

    Best Regards

  132. James A October 21, 2019 at 6:43 am

    Hi Jason,

    In the “Multiple Parallel Input and Multi-Step Output” example, you stated that it could be done with the vector output method, or the encoder/decoder, and proceeded to demonstrate the encoder/decoder.

    I’ve been wondering how the example would look in vector output form. Would the target, y, for each sample need to be merged into a single 1D array, or vector?

    For example,
    If y for one sample looks like:
    [a1,b1,c1],
    [a2,b2,c2],
    …
    [an,bn,cn]

    Would we reshape it into something that looks like this?
    [a1,b1,c1,a2,b2,c2,…,an,bn,cn]

    • Jason Brownlee October 21, 2019 at 1:38 pm

      Probably one long 1D vector with all time steps, which you can then choose to interpret any way you wish (e.g. by the structure of the expected/target y).
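To illustrate the reply, here is a NumPy sketch that assumes y has shape [samples, n_steps_out, n_features], as produced by split_sequences() in the multi-step examples (dummy values here). Flattening each sample row by row keeps each time step's features together ([a1, b1, c1, a2, b2, c2, ...]). A Dense(n_steps_out * n_features) output layer could then be trained on the flat targets, and its predictions reshaped back.

```python
import numpy as np

n_samples, n_steps_out, n_features = 2, 3, 3
# Dummy targets with shape [samples, n_steps_out, n_features].
y = np.arange(n_samples * n_steps_out * n_features).reshape(n_samples, n_steps_out, n_features)

# Vector-output target: one long row per sample, time step by time step.
y_flat = y.reshape((y.shape[0], n_steps_out * n_features))
print(y_flat[0])  # [0 1 2 3 4 5 6 7 8]

# Flat predictions can be reshaped back to [samples, n_steps_out, n_features].
y_back = y_flat.reshape((-1, n_steps_out, n_features))
print(np.array_equal(y_back, y))  # True
```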

  133. Kannu October 22, 2019 at 12:55 am

    Hey,

    How can we see the root mean square error during the training of the model here?

    Best Regards,
    Kannu

  134. Vishnu Suresh October 24, 2019 at 3:55 am

    Hello Jason,

    I am trying to use the CNN-LSTM for forecasting.

    The split_sequences() function gives an output of

    (175196, 4, 4) (175196, 1)

    where 175196 is the number of samples, 4 is the number of steps and 4 is the number of features (variables).

    Then I reshape the input as directed in the tutorial, but when I run the model

    I get this error:

    TypeError Traceback (most recent call last)
    in ()
    22 model.add(TimeDistributed(MaxPooling1D(pool_size=2)))
    23 model.add(TimeDistributed(Flatten()))
    ---> 24 model.add(LSTM(50, activation='relu'))
    25 model.add(Dense(1))
    26 model.compile(optimizer='adam', loss='mse')

    TypeError: while_loop() got an unexpected keyword argument 'maximum_iterations'

    I know it is hard to debug this way, but any idea what could be wrong here?

    • Jason Brownlee October 24, 2019 at 5:46 am

      Do other Keras examples work for you?

      Is it possible that there is a fault with your Keras/TF installation?

      • Vishnu Suresh October 24, 2019 at 6:04 am

        Yes, other Keras examples work: CNN, multi-headed CNN, etc.

        • Vishnu Suresh October 24, 2019 at 8:54 am

          You were right 🙂 I updated to a higher version of TensorFlow and Keras and it worked! Thanks!

        • Jason Brownlee October 24, 2019 at 1:59 pm

          That is surprising, not sure I have good advice, sorry.

          Perhaps try simplifying the example to see what the cause of the fault could be on your workstation?

  135. FRI October 24, 2019 at 9:31 pm

    Hi Jason,
    Thank you for this interesting article. Can I create one model for all sites with an LSTM? That is, if we have, for example, a group of people and each person has their own time series data with different features, can an LSTM model learn from all of these time series at once?
    Best regards

  136. Sarveswara Rao October 30, 2019 at 10:22 pm

    Hi Jason, how do we choose n_steps in split_sequence()? Should we consider n_steps a hyperparameter, or can it be set by a statistical test? Thank you for your work, Jason. I have been following your site for the past 2 years. Your content is the best in the ML community.

    • Jason Brownlee October 31, 2019 at 5:29 am

      A hyperparameter.

      Thanks, I deeply appreciate your support!
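Since n_steps is a hyperparameter, one way to choose it is to try a few candidate values and keep the one with the lowest error on held-out data. The sketch below shows that loop. To keep it fast and self-contained, it uses a least-squares linear autoregression as a stand-in for the LSTM (an assumption; in practice you would fit and score the LSTM in its place) and a synthetic series.

```python
import numpy as np

def split_sequence(sequence, n_steps):
    X = np.array([sequence[i:i + n_steps] for i in range(len(sequence) - n_steps)])
    y = np.array(sequence[n_steps:])
    return X, y

def fit_and_score(train, test, n_steps):
    """Stand-in model: linear autoregression fitted by least squares."""
    X_tr, y_tr = split_sequence(train, n_steps)
    A = np.hstack([X_tr, np.ones((len(X_tr), 1))])        # add a bias column
    coef, *_ = np.linalg.lstsq(A, y_tr, rcond=None)
    # Score on the test period; the first windows reach back into train.
    history = np.concatenate([train[-n_steps:], test])
    X_te, y_te = split_sequence(history, n_steps)
    pred = np.hstack([X_te, np.ones((len(X_te), 1))]) @ coef
    return float(np.sqrt(np.mean((y_te - pred) ** 2)))    # RMSE

# Synthetic series: a cycle of period 12 plus a little noise.
rng = np.random.default_rng(0)
t = np.arange(240)
series = np.sin(2 * np.pi * t / 12) + 0.1 * rng.standard_normal(len(t))
train, test = series[:200], series[200:]

scores = {n: fit_and_score(train, test, n) for n in (1, 2, 3, 6, 12)}
best = min(scores, key=scores.get)
print(scores)
print("best n_steps:", best)
```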

  137. jasper November 9, 2019 at 3:32 pm

    Hi Jason,
    I have some questions about the LSTM model.
    First, about the definition of the LSTM input x. In the time series forecast case, we divide the input data into portions according to the batch size parameter. These 2D portions are then transformed into 3D tensors and fed to the model for training. After all portions have been fed to the model and forward/backward propagation is complete, one epoch is done. My question is: at input time x[t], does the LSTM input x refer to only the first portion of the x data, or to all portions?
    Second, what is the definition of the LSTM units parameter? My understanding is that it is the number of elements in the LSTM input vector x. For example, if I have 10 inputs, units should be 10 to capture the whole input vector. But it is not always required to use higher numbers such as 20, and so on.
    Third, is there any "feature importance" example for LSTMs yet? I am looking forward to one and am quite frustrated at the moment. Could an LSTM and XGBoost give similar feature importance results?

    Many thanks

  138. mingkai November 11, 2019 at 7:31 pm

    Hi Jason, I ran the first example, but it failed. It shows: TypeError: Input 'b' of 'MatMul' Op has type float32 that does not match type int32 of argument 'a'. Do you know what the problem is?

      • mingkai November 14, 2019 at 2:03 pm

        Thank you, Jason. I have solved the problem. The reason is that I installed TensorFlow 2.0 + Keras 2.2.4, but these two versions do not match, so I used tensorflow.keras instead of keras. I added the line x_input = x_input.astype('float32') to the code, and it ran smoothly. Another way is to install TensorFlow version 1.15.0, with which no problem occurs.

        • Jason Brownlee November 15, 2019 at 7:41 am

          Happy to hear that you solved the problem.

          You can use Keras 2.3 with TensorFlow 2.0, or Keras 2.2 with TensorFlow 1.15.
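A small NumPy illustration of the cast mingkai describes. An array built from Python integers gets an integer dtype, which, in some TensorFlow/Keras combinations, clashes with the model's float32 weights, as in the MatMul error above. Casting the input explicitly is a cheap safeguard. The values mirror the tutorial's prediction input.

```python
from numpy import array

n_steps, n_features = 3, 1
x_input = array([70, 80, 90])
print(x_input.dtype.kind)  # i  (an integer dtype; int64 on most platforms)

# Cast to float32, the default floating-point type used by Keras layers.
x_input = x_input.astype('float32')
x_input = x_input.reshape((1, n_steps, n_features))
print(x_input.dtype, x_input.shape)  # float32 (1, 3, 1)
```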

  139. Bonnard November 12, 2019 at 7:43 pm

    Hi Jason,

    I have a problem to model, and I think LSTM networks are the best-suited models for it.

    I want to predict the true trajectory of an airplane before it takes off. I have two sets of data: the first is the trajectories announced before departure (the fake), and the second is the trajectories announced after landing (the true).
    I want to predict the true trajectory, given the fake one.

    I have a list of arrays; each array represents a flight made by a plane. Each flight is described by several variables, and after interpolation I have 50 observation points per flight.
    At each point we observe a vector of our variables, like latitude, longitude, etc.
    Let's assume I have N variables like that.
    I have 2200 flights, so my input data is an array with shape (2200, 50, N).

    I already tried a small model, but oddly the model seems to follow the fake trajectory and not the true one.
    Do you have an idea of what architecture I can use?

    Thanks a lot

    • Jason Brownlee November 13, 2019 at 5:40 am

      Perhaps test a suite of different approaches and discover what works best for your specific dataset?

      • Bonnard November 13, 2019 at 7:45 pm

        Yeah, this is what I am doing, but maybe you can help with the last layer; I think the error comes from there.
        As I said, I have a vector of shape (50, N), which represents a flight with 50 points and N features, and I want to predict a (50, 2) vector, which is 50 points of (latitude, longitude).

        I cannot use a Dense layer at the end of the model because it does not return the right shape.

        • Jason Brownlee November 14, 2019 at 8:01 am

          Encoder-decoder with 2 nodes in the output layer and 50 in the repeat vector layer – this would achieve the desired output.
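To see why a 2-unit output layer gives a (50, 2) result here: in an encoder-decoder, the decoder LSTM returns one hidden vector per output step, and TimeDistributed(Dense(2)) applies the same weights to each of those vectors. A NumPy sketch of that last step, with random numbers standing in for the decoder output and the Dense weights (the 100 decoder units are an assumed value):

```python
import numpy as np

rng = np.random.default_rng(1)
batch, n_steps_out, units = 4, 50, 100

# Stand-in for the decoder LSTM output with return_sequences=True.
decoder_out = rng.standard_normal((batch, n_steps_out, units))

# One Dense(2) layer: a single weight matrix and bias shared by all time steps.
W = rng.standard_normal((units, 2))
b = np.zeros(2)

# Applied at every step: [batch, 50, units] -> [batch, 50, 2] (latitude, longitude).
out = decoder_out @ W + b
print(out.shape)  # (4, 50, 2)

# Same result as looping over the time steps explicitly.
loop = np.stack([decoder_out[:, t, :] @ W + b for t in range(n_steps_out)], axis=1)
print(np.allclose(out, loop))  # True
```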

  140. Marlon November 13, 2019 at 12:51 am

    Hello,

    Thanks for your tutorials; they are amazing! I'm running into the following pitfall when implementing your ideas: I use your split_sequences() to prepare the network input and, accordingly, I train my network and save the model. When I feed the same input to the trained model and plot the result, I get a very weird plot, with many lines plotted over each other. Do you know what my problem is?

  141. Michaela November 13, 2019 at 6:27 am

    Hi Jason,

    I'm building a Multiple Parallel Input and Multi-Step Output model, and I'm curious why you repeat the same LSTM output in model.add(RepeatVector(n_steps_out)). The alternative I was thinking of is using the Keras functional API: training n_steps_out LSTMs on the input, concatenating the outputs of these LSTMs, and feeding the result into the next LSTM. It would look something like this:

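    from tensorflow.keras.layers import Input, LSTM, Dense, TimeDistributed, Reshape, Concatenate
    from tensorflow.keras.models import Model

    # n_steps_in, n_features and n_steps_out (at least 2, for Concatenate) as in the tutorial
    inputs = Input(shape=(n_steps_in, n_features))
    branches = []
    for i in range(n_steps_out):
        h = LSTM(200, activation='relu')(inputs)   # one encoder per output step: (None, 200)
        branches.append(Reshape((1, 200))(h))     # add a time axis: (None, 1, 200)
    x = Concatenate(axis=1)(branches)             # (None, n_steps_out, 200)
    x = LSTM(200, activation='relu', return_sequences=True)(x)
    x = TimeDistributed(Dense(n_features))(x)
    model = Model(inputs, x)
    model.compile(optimizer='adam', loss='mse')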

    The biggest drawback that I can see is there will be a ton more parameters, but are there other issues that I’m missing? For instance, does this get rid of some relationship between the different timesteps that the previous model maintains better?

    Thanks!

    • Jason Brownlee November 13, 2019 at 1:41 pm

      The reason is that it is an encoder-decoder model, where the same encoding of the input is used in the generation of each output time step.

      Perhaps try it and see? It's good to test a suite of different models in order to discover what works best for your specific dataset.
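A NumPy sketch of what RepeatVector(n_steps_out) does to the encoder output, to illustrate the reply. The single fixed-length encoding of the input sequence is copied once per output step, so the decoder LSTM receives the same context at every step, and its own recurrent state is what lets the steps differ. In Michaela's variant, each output step would instead get its own learned encoding. Sizes and values are arbitrary.

```python
import numpy as np

batch, units, n_steps_out = 2, 4, 3
# Stand-in for the encoder LSTM output: one vector per sample.
encoding = np.arange(batch * units, dtype=float).reshape(batch, units)

# RepeatVector: [batch, units] -> [batch, n_steps_out, units]
repeated = np.repeat(encoding[:, np.newaxis, :], n_steps_out, axis=1)
print(repeated.shape)  # (2, 3, 4)

# Every output step sees an identical copy of the encoding.
print(all(np.array_equal(repeated[:, t, :], encoding) for t in range(n_steps_out)))  # True
```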

  142. fan November 19, 2019 at 9:01 pm

    Dear Jason,

    Thanks for the tutorial, that is very helpful! However, I am having a hard time understanding the input shape given in the CNN LSTM example below:

    X, y = split_sequence(raw_seq, n_steps)
    # reshape from [samples, timesteps] into [samples, subsequences, timesteps, features]
    n_features = 1
    n_seq = 2
    n_steps = 2
    X = X.reshape((X.shape[0], n_seq, n_steps, n_features))
    # define model
    model = Sequential()
    model.add(TimeDistributed(Conv1D(filters=64, kernel_size=1, activation='relu'), input_shape=(None, n_steps, n_features)))

    Here, X is first reshaped into 4 dimensions; however, the input_shape defined and used in the model's Conv1D layer has 3 dimensions. Does the None in “input_shape=(None, n_steps, n_features)” refer to the n_seq dimension of X or to the number of samples of X?
    And then, the data used to fit and predict are 4-dimensional again…
    Could you please kindly explain a bit? I am really confused…

    thanks a lot!

    • Jason Brownlee November 20, 2019 at 6:13 am

      Yes, the CNN must process subsequences, and then groups of processed subsequences are passed to the LSTM.

    • Fang He December 11, 2023 at 8:51 pm

      Actually, each sample of X is 3-dimensional (n_seq, n_steps, n_features), and in this CNN-LSTM case the model accepts one such sample at a time.

      I think the None refers to n_seq, but n_seq is handled through TimeDistributed(), so the None stands in for that first dimension.
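A NumPy sketch of the shapes involved, using the tutorial's univariate CNN-LSTM setup: windows of 4 values split into n_seq=2 subsequences of n_steps=2, with one feature (the dummy windows are illustrative). Keras input_shape never includes the samples dimension, so for the 4D array X the per-sample shape is (n_seq, n_steps, n_features). Writing None in place of n_seq leaves the number of subsequences unspecified. TimeDistributed then runs the Conv1D over each subsequence, which is why n_seq effectively becomes the time-step axis seen by the LSTM.

```python
import numpy as np

n_seq, n_steps, n_features = 2, 2, 1
# Five dummy windows of 4 consecutive values: [samples, timesteps]
X = np.array([[10, 20, 30, 40],
              [20, 30, 40, 50],
              [30, 40, 50, 60],
              [40, 50, 60, 70],
              [50, 60, 70, 80]])

# [samples, timesteps] -> [samples, subsequences, timesteps, features]
X4 = X.reshape((X.shape[0], n_seq, n_steps, n_features))
print(X4.shape)      # (5, 2, 2, 1)
print(X4.shape[1:])  # (2, 2, 1): what a single sample looks like to the model

# The first window [10, 20, 30, 40] becomes two subsequences: [10, 20] and [30, 40].
print(X4[0, :, :, 0])
```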

  143. Arjun November 21, 2019 at 8:34 pm

    Hi Jason,
    What if we had a dataset with a year's worth of daily sales data and we wanted to predict, for example, 10 days of sales based on the sales data of the previous 30 days? What form should the output take? And what is the code for getting the predicted value? Is it model.predict(X_test)?

  144. Arne November 22, 2019 at 12:58 am

    Hey Jason, I am halfway through and reading this stuff is pure joy! Thank you for your tremendous efforts and for making this available! I've become an instant fan of your site.

  145. jasper November 25, 2019 at 1:15 am

    Hi Jason,

    One practical question about LSTMs. If the input features have different ranges, how should the LSTM forecast model deal with them? For example, if input vector one spans 0~100 and vector two spans 0~0.5, can we still put these two input vectors together to fit the model? I use the SHAP package to analyze the weights. In this case, vector one always comes out much stronger than vector two. From a mathematical point of view, this result is correct. What do you think about this case?

    jasper

  146. Saeed November 27, 2019 at 1:44 pm

    Hi Jason,

    Thank you for such a detailed explanation. I am having an issue with scaling data for a multistep multivariate LSTM problem. I am using the last 14/21 days of data to predict the next 7 days. Can you please give any idea of the proper way to scale data using MinMax for these types of problems? I am lost in the shapes of the matrices.

      • Saeed November 27, 2019 at 3:38 pm

        Thank you. I know how scaling works and I have implemented it in single-step forecasting. However, when it comes to multistep, we split the data and it becomes 3D after using the split_sequences() function, which means we have 3D matrices for X and y.
        The scaler doesn't work on 3D matrices.

        If I scale before splitting, I end up with a matrix shape that I can't recover after prediction, and so I am stuck, unable to do the inverse_transform for scaling. I would appreciate your help in this matter.

        • Jason Brownlee November 28, 2019 at 6:31 am

          Yes, it is sticky. You may have to write some custom code, as the libraries don't accommodate it.

          Perhaps try using relu and no scaling, at least as a starting point.
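One way to write the custom code Jason mentions, sketched with NumPy only: fit the per-feature min and max on the 2D training data [rows, features] before splitting, scale, then split. Because the scaling is per feature, predictions of shape [samples, n_steps_out, n_features] can be inverted by broadcasting over the last axis, with no reshaping for a scaler. The data are made up, and split_sequences() here is a compact variant of the tutorial's function. If the model predicts only some of the columns, index lo and hi with those columns.

```python
import numpy as np

def split_sequences(sequences, n_steps_in, n_steps_out):
    X, y = [], []
    for i in range(len(sequences) - n_steps_in - n_steps_out + 1):
        X.append(sequences[i:i + n_steps_in])
        y.append(sequences[i + n_steps_in:i + n_steps_in + n_steps_out])
    return np.array(X), np.array(y)

# Made-up 2D training data: 30 rows (days), 2 features on different scales.
rng = np.random.default_rng(0)
data = np.column_stack([rng.uniform(0, 100, 30), rng.uniform(0, 0.5, 30)])

# Fit the scaler on the 2D data: one min and one max per feature (column).
lo, hi = data.min(axis=0), data.max(axis=0)
scaled = (data - lo) / (hi - lo)

# Split after scaling: X is [samples, 14, 2], y is [samples, 7, 2].
X, y = split_sequences(scaled, n_steps_in=14, n_steps_out=7)
print(X.shape, y.shape)  # (10, 14, 2) (10, 7, 2)

# Invert 3D predictions directly: lo and hi broadcast over the last axis.
yhat_scaled = y                      # stand-in for model.predict(X)
yhat = yhat_scaled * (hi - lo) + lo
print(np.allclose(yhat[0], data[14:21]))  # True
```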

  147. juntao December 19, 2019 at 2:25 pm

    Hi Jason,
    I want to introduce the attention mechanism to the Encoder-Decoder model for a regression problem (with multiple inputs). Is there any other article that can help me solve this problem?

  148. husfe December 25, 2019 at 8:27 pm

    Hi Jason.
    Is there some simple method to add attention to the Encoder-Decoder model in this article?
    I've tried to use the AttentionWrapper class to achieve it, but I failed, because it's hard for me to do in a short time. Can you give me some guidance?
    Thank you!
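Attention is not covered in this post, but as a starting point, the core of dot-product (Luong-style) attention is small enough to sketch in NumPy. At each decoder step, the current decoder state is scored against every encoder hidden state, which would require the encoder LSTM to use return_sequences=True. The scores are softmax-normalized into weights, and the weighted sum of encoder states gives a context vector to combine with the decoder state. Recent TensorFlow releases also include tf.keras.layers.Attention, which implements this dot-product scoring. The shapes and random values below are assumptions for illustration, not a drop-in Keras layer.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def dot_product_attention(decoder_state, encoder_states):
    """decoder_state: [batch, units]; encoder_states: [batch, n_steps_in, units]."""
    # Score each encoder step by its dot product with the decoder state.
    scores = np.einsum('bu,btu->bt', decoder_state, encoder_states)
    weights = softmax(scores, axis=1)                      # [batch, n_steps_in]
    # Context vector: attention-weighted sum of the encoder states.
    context = np.einsum('bt,btu->bu', weights, encoder_states)
    return context, weights

rng = np.random.default_rng(0)
batch, n_steps_in, units = 2, 5, 8
encoder_states = rng.standard_normal((batch, n_steps_in, units))
decoder_state = rng.standard_normal((batch, units))

context, weights = dot_product_attention(decoder_state, encoder_states)
print(context.shape, weights.shape)           # (2, 8) (2, 5)
print(np.allclose(weights.sum(axis=1), 1.0))  # True
```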

  149. San December 26, 2019 at 10:33 pm

    Hi Jason,

    Thank you so much for this valuable tutorial. Really appreciate it.

    Jason, I'm a bit new to DL with RNNs. I have two small doubts to clear up. In my problem I want to predict how many steps (i.e., step counts) a participant will walk tomorrow, based on their previous step counts. For this we have collected the step counts of a large number of participants over n days.

    Is this a univariate problem, where each participant's step counts are taken as a univariate sequence to train the model? And do you think an RNN is a good move for this problem?

    Do I have to scale each participant's step-count sequence (using that participant's own mean and SD), or can I use the raw counts?

    Thank you so much in advance again. All the best for your future work too!
    San

  150. Abhishek Singhal January 5, 2020 at 4:06 am

    Thanks a lot, Jason, for sharing such a knowledgeable article.

    I have a doubt in my case,
    for the last example, Multiple Parallel Input and Multi-Step Output.

    I am trying to predict the next 6 or 12 hours of data. For now I am trying to predict the next 6 hours, training with n_steps_in = 72 and expecting n_steps_out = 6 with 6 features,
    but I am getting nan as the output.

    Please see if I am doing something wrong.

    def split_sequences(sequences, n_steps_in, n_steps_out):
        X, y = list(), list()
        pt = progress_timer(description='Split Sequences', n_iter=len(sequences))
        for i in range(len(sequences)):
            # find the end of this pattern
            end_ix = i + n_steps_in
            out_end_ix = end_ix + n_steps_out
            # check if we are beyond the dataset
            if out_end_ix > len(sequences):
                break
            # gather input and output parts of the pattern
            seq_x, seq_y = sequences[i:end_ix, :], sequences[end_ix:out_end_ix, :]
            X.append(seq_x)
            y.append(seq_y)
            pt.update()
        pt.finish()
        return array(X), array(y)

    dataset = df_104902.values
    # choose a number of time steps
    n_steps_in, n_steps_out = 72, 6
    # convert into input/output
    X, y = split_sequences(dataset, n_steps_in, n_steps_out)
    # the dataset knows the number of features, e.g. 2
    n_features = X.shape[2]
    # define model
    model = Sequential()
    model.add(LSTM(200, activation='relu', input_shape=(n_steps_in, n_features)))
    model.add(RepeatVector(n_steps_out))
    model.add(LSTM(200, activation='relu', return_sequences=True))
    model.add(TimeDistributed(Dense(n_features)))
    model.compile(optimizer='adam', loss='mse')
    # fit model
    model.fit(X, y, epochs=30, verbose=0)
    # demonstrate prediction
    x_input = array(df_104902[-72:])
    x_input = x_input.reshape((1, n_steps_in, n_features))
    yhat = model.predict(x_input, verbose=0)
    print(yhat)

    Output is coming-

    [[[nan nan nan nan nan nan]
    [nan nan nan nan nan nan]
    [nan nan nan nan nan nan]
    [nan nan nan nan nan nan]
    [nan nan nan nan nan nan]
    [nan nan nan nan nan nan]]]

    • Jason Brownlee January 5, 2020 at 7:08 am

      nan output is not good.

      Perhaps check the scale of your input data and normalize or standardize prior to fitting the model?
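Before standardizing, it is also worth confirming that the raw data contain no NaN or infinite values, since a single one propagates through training and turns every prediction into nan. A NumPy sketch that checks a [rows, features] array and standardizes each column using statistics from the training rows only (the data, the column scales and the 80/20 boundary are made up):

```python
import numpy as np

def check_finite(data):
    """Raise if the array contains NaN or +/-inf, reporting the first location."""
    bad = ~np.isfinite(data)
    if bad.any():
        rows, cols = np.nonzero(bad)
        raise ValueError(f"found {bad.sum()} non-finite value(s), first at row {rows[0]}, column {cols[0]}")

rng = np.random.default_rng(0)
dataset = rng.normal(loc=[5.0, 500.0, 0.01], scale=[1.0, 100.0, 0.001], size=(100, 3))
check_finite(dataset)

# Standardize per feature using only the training rows.
n_train = 80
mean = dataset[:n_train].mean(axis=0)
std = dataset[:n_train].std(axis=0)
standardized = (dataset - mean) / std

# The check catches a single missing value.
bad_data = dataset.copy()
bad_data[10, 2] = np.nan
try:
    check_finite(bad_data)
except ValueError as err:
    print(err)  # found 1 non-finite value(s), first at row 10, column 2
```

Predictions made on standardized data are converted back with yhat * std + mean.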

  151. Abhishek Singhal January 5, 2020 at 4:14 am

    Also, my X and y shapes are (20875, 72, 6) and (20875, 6, 6) respectively,

    and x_input is (1, 72, 6).

  152. Willian Alcoser January 11, 2020 at 1:57 am

    Greetings Jason, a question: how can I validate the prediction method? I have seen in other examples that the series is divided into two parts, training and test, but in this case that is not done. Why is that?
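For time series, the usual approach is to split chronologically rather than at random, so that the test period lies entirely after the training period. The windows are then framed within each part, so no test value is used as a training target. A minimal NumPy sketch with a made-up series and a 3-step window. The test windows are allowed to look back into the end of the training data, which is known at prediction time.

```python
import numpy as np

def split_sequence(sequence, n_steps):
    X = np.array([sequence[i:i + n_steps] for i in range(len(sequence) - n_steps)])
    return X, np.asarray(sequence[n_steps:])

series = np.arange(10, 210, 10, dtype=float)   # 20 made-up observations
n_steps, n_test = 3, 5

# Chronological split: the last n_test observations are held out.
train, test = series[:-n_test], series[-n_test:]
X_train, y_train = split_sequence(train, n_steps)
# Test windows may start in the training period, but all their targets are test values.
X_test, y_test = split_sequence(series[-(n_test + n_steps):], n_steps)

print(X_train.shape, X_test.shape)    # (12, 3) (5, 3)
print(y_train.max() < y_test.min())   # True
```

The model is fitted on X_train, y_train only, and scored on X_test, y_test.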

  153. Mehdi January 17, 2020 at 12:06 pm

    Dear Jason,
    First of all, thank you so much for your time and great content.
    Second, I have studied your website for a long time. I have a question: I have developed a model that predicts share prices, and my model predicts the X_test data well. Now how can I forecast sequences for future times that have not happened yet?

    • Jason Brownlee January 17, 2020 at 1:50 pm

      You’re welcome.

      Call model.predict(newData) to make predictions on new data.

      • Mehdi January 19, 2020 at 4:18 am

        newData is not available, i.e. the future days have not happened yet, so how do I prepare them for the model?

        • Jason Brownlee January 19, 2020 at 7:20 am

          You must design and train your model based on the data you will have available at the time a prediction is required.

          For example, if you have 7 days prior data at the time of prediction when predicting the next week, then design your model around that and train it on that type of data.

          Then when you start using your model on new data, you will have the data available.
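Following this advice, here is a sketch of one common way to forecast several steps into the future with a one-step model. Feed the last n_steps known observations, append the prediction to the history, and repeat. This is recursive forecasting, and errors can accumulate over the horizon. The predict_next function is a hypothetical stand-in for something like model.predict(x_input.reshape((1, n_steps, n_features))), and the prices are made up.

```python
import numpy as np

def predict_next(window):
    """Hypothetical stand-in for a trained one-step model: continues a linear trend."""
    return window[-1] + (window[-1] - window[0]) / (len(window) - 1)

def recursive_forecast(history, n_steps, horizon):
    history = list(history)
    forecast = []
    for _ in range(horizon):
        window = np.array(history[-n_steps:])   # only data available at this point
        yhat = float(predict_next(window))
        forecast.append(yhat)
        history.append(yhat)                    # the prediction becomes an input
    return forecast

prices = [100.0, 101.0, 102.0, 103.0]
print(recursive_forecast(prices, n_steps=3, horizon=3))  # [104.0, 105.0, 106.0]
```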

  154. Mehdi January 19, 2020 at 5:28 pm

    Dear Jason,
    Thank you so much for your time and attention. I will try your approach.

  155. Pietro FUSCO January 21, 2020 at 3:07 am

    Dear Jason,
    Thank you so much for your time and attention.
    I was wondering if I can use time as a univariate sequence.

    Regards

  156. Adonis El Hajj January 23, 2020 at 1:49 am

    Hello Jason,

    My model will learn from past forecast data and past actual AC power data.
    My input is the forecast for the next 7 days, as a CSV file.
    My goal is to predict the AC power data based on that input.
    I don't know how to apply your model here to what I want.
    Can you please help me?

  157. Anshu Shah January 25, 2020 at 6:30 pm

    Thank you so much. I was struggling to understand LSTM.
    Your work helped me a lot.

    • Jason Brownlee January 26, 2020 at 5:15 am

      You’re welcome, I’m happy to hear that.

      • Patrick February 7, 2020 at 11:13 pm

        Dear Jason,

        Thank you for your contributions. You helped me a lot when I was starting out in deep learning.
        I have a question. I am working on a model and, surprisingly, the predicted output shape is different from the target shape of the training data:
        Training: X (12000, 12, 8), Y (12000,)
        Test: X (3000, 12, 8), Y (3000,)
        pred = model.predict(X (3000, 12, 8))
        and the pred shape is (3000, 12, 1), but I was expecting (3000,).
        What am I doing wrong?
        Please help me.

        • Jason Brownlee February 8, 2020 at 7:13 am

          Perhaps double check the structure of your model, e.g. the output layer/model.

  158. wang February 3, 2020 at 5:49 pm

    Dear Jason,

    Thanks for the tutorial, that is very helpful! However, when I apply data normalization to the input data (10, 20, 30, …) and run your Multi-Step LSTM model, an error occurs. I don't know how to resolve it. Please see the program below. Thanks!

    from numpy import array
    from keras.models import Sequential
    from keras.layers import LSTM
    from keras.layers import Dense
    import numpy as np
    import matplotlib.pyplot as plt
    import pandas as pd
    import collections

    # split a univariate sequence into samples
    def split_sequence(sequence, n_steps_in, n_steps_out):
        X, y = list(), list()
        for i in range(len(sequence)):
            # find the end of this pattern
            end_ix = i + n_steps_in
            out_end_ix = end_ix + n_steps_out
            # check if we are beyond the sequence
            if out_end_ix > len(sequence):
                break
            # gather input and output parts of the pattern
            seq_x, seq_y = sequence[i:end_ix], sequence[end_ix:out_end_ix]
            X.append(seq_x)
            y.append(seq_y)
        return array(X), array(y)

    # define input sequence

    training_set = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90])

    training_set = training_set.reshape(-1,1)
    from sklearn.preprocessing import MinMaxScaler
    sc = MinMaxScaler(feature_range = (0, 1))
    raw_seq = sc.fit_transform(training_set)

    print(raw_seq)

    # choose a number of time steps
    n_steps_in, n_steps_out = 3, 2
    # split into samples
    X, y = split_sequence(raw_seq, n_steps_in, n_steps_out)
    # reshape from [samples, timesteps] into [samples, timesteps, features]
    n_features = 1
    X = X.reshape((X.shape[0], X.shape[1], n_features))

    # define model
    model = Sequential()
    model.add(LSTM(40, activation='relu', return_sequences=True, input_shape=(n_steps_in, n_features)))
    model.add(LSTM(40, activation='relu'))
    model.add(Dense(n_steps_out))
    model.compile(optimizer='adam', loss='mse')
    # fit model
    print('X: \n', X)
    print('y: \n', y)
    model.fit(X, y, epochs=60, verbose=0)

    # demonstrate prediction
    #x_input = array([70, 80, 90])
    x_input = np.array([70, 80, 90])
    x_input = x_input.reshape(-1,1)
    x_input = sc.transform(x_input)
    x_input = x_input.reshape((1, n_steps_in, n_features))
    yhat = model.predict(x_input, verbose=0)
    yhat = sc.inverse_transform(yhat)
    print(100,110)
    print(yhat)

  159. wang February 4, 2020 at 3:09 pm

    Thank you so much.
    I have resolved the problem.
    Thank for your tutorial.
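wang does not say what the fix was. One likely cause, offered as an assumption based on the code above: because raw_seq is a 2D (9, 1) column, split_sequence() returns y with shape (5, 2, 1), while Dense(n_steps_out) produces (None, 2). In addition, yhat of shape (1, 2) does not match a scaler fitted on a single column, so sc.inverse_transform(yhat) would also fail; with scikit-learn the equivalent fix is sc.inverse_transform(yhat.reshape(-1, 1)). A NumPy sketch of the two reshapes, with a hand-written min-max in place of MinMaxScaler and a pretend prediction:

```python
import numpy as np

training_set = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90], dtype=float).reshape(-1, 1)
lo, hi = training_set.min(), training_set.max()
raw_seq = (training_set - lo) / (hi - lo)          # shape (9, 1), like fit_transform

n_steps_in, n_steps_out = 3, 2
n_samples = len(raw_seq) - n_steps_in - n_steps_out + 1
X = np.array([raw_seq[i:i + n_steps_in] for i in range(n_samples)])
y = np.array([raw_seq[i + n_steps_in:i + n_steps_in + n_steps_out] for i in range(n_samples)])
print(X.shape, y.shape)   # (5, 3, 1) (5, 2, 1)

# Fix 1: drop the trailing feature axis so y matches the Dense(n_steps_out) output.
y = y.reshape((y.shape[0], n_steps_out))
print(y.shape)            # (5, 2)

# Fix 2: a (1, 2) prediction must become a single column before inverting.
yhat = np.array([[1.125, 1.25]])                   # pretend scaled output for 100, 110
yhat = (yhat.reshape(-1, 1) * (hi - lo) + lo).reshape(1, -1)
print(yhat)               # [[100. 110.]]
```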

  160. Wandy February 10, 2020 at 7:36 pm

    X, y = split_sequence(raw_seq, n_steps)
    # reshape from [samples, timesteps] into [samples, subsequences, timesteps, features]
    n_features = 1
    n_seq = 2
    n_steps = 2
    X = X.reshape((X.shape[0], n_seq, n_steps, n_features))
    # define model
    model = Sequential()
    model.add(TimeDistributed(Conv1D(filters=64, kernel_size=1, activation='relu'), input_shape=(None, n_steps, n_features)))
    model.add(TimeDistributed(MaxPooling1D(pool_size=2)))
    model.add(TimeDistributed(Flatten()))
    model.add(LSTM(50, activation='relu'))

    This is the code from your post that uses CNN+LSTM for the univariate models above.
    I am confused about the value of n_seq: why is it 2? And can I consider n_seq as the time steps of the LSTM?

  161. Francis February 12, 2020 at 7:25 pm

    Thank you for your great tutorial!
    BTW I found a more pythonic way to write the split_sequence() function.
    Regards,

    import numpy as np

    def split_sequence(sequence, n_steps):
        # each row holds n_steps inputs followed by the value to predict
        splitted_seq = np.array([sequence[i:i + n_steps + 1] for i in range(len(sequence) - n_steps)])
        return splitted_seq[:, :n_steps], splitted_seq[:, n_steps]

  162. Amelie February 12, 2020 at 11:58 pm

    Hello Mr. Jason,

    Please, I have a technical question about the LSTM model.
    The LSTM is defined with default activation functions, such as:
    3 sigmoids for the input gate, the forget gate and the output gate,
    and 2 tanh for updating the internal states of the recurrent layer.

    In your code:
    #########################
    # define model
    model = Sequential()
    model.add(LSTM(50, activation='relu', return_sequences=True, input_shape=(n_steps, n_features)))
    model.add(LSTM(50, activation='relu'))
    model.add(Dense(1))
    #########################

    Have you replaced the sigmoid or the tanh with relu?
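For reference, in Keras the activation argument of LSTM replaces the two tanh functions: the candidate cell update, and the squashing of the cell state before the output gate. The three gates use a separate recurrent_activation argument, which defaults to sigmoid (hard_sigmoid in some older standalone Keras releases). So activation='relu' leaves the sigmoid gates unchanged. A NumPy sketch of one LSTM time step showing where each function is applied; the weights are random, and the i, f, c, o block ordering is an assumption of this sketch.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return np.maximum(x, 0.0)

def lstm_step(x, h_prev, c_prev, W, U, b, activation=np.tanh, recurrent_activation=sigmoid):
    """One LSTM time step. W: [n_in, 4*units], U: [units, 4*units], b: [4*units]."""
    z = x @ W + h_prev @ U + b
    z_i, z_f, z_c, z_o = np.split(z, 4, axis=-1)
    i = recurrent_activation(z_i)            # input gate
    f = recurrent_activation(z_f)            # forget gate
    o = recurrent_activation(z_o)            # output gate
    c = f * c_prev + i * activation(z_c)     # candidate update uses `activation`
    h = o * activation(c)                    # so does the output
    return h, c

rng = np.random.default_rng(0)
n_in, units = 1, 50
W = 0.1 * rng.standard_normal((n_in, 4 * units))
U = 0.1 * rng.standard_normal((units, 4 * units))
b = np.zeros(4 * units)
h = c = np.zeros((1, units))

# activation='relu' swaps the tanh calls; the gates stay sigmoid.
for x_t in (0.1, 0.2, 0.3):                  # a short, already-scaled input sequence
    h, c = lstm_step(np.array([[x_t]]), h, c, W, U, b, activation=relu)
print(h.shape, bool((h >= 0).all()))  # (1, 50) True
```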

  163. bunty sahoo February 13, 2020 at 4:28 pm

    Thanks for the wonderful explanation. I have a query regarding which category my dataset and requirement fall into.

    I want to forecast the number of defects for each of the 3 parts.
    I have a dataset like this (parts a, b and c are components of the tool):

    date        part    tools shipped  num of defects (of parts)
    2019-01-01  part a  2              0
    2019-01-01  part b  1              2
    2019-01-01  part c  2              2
    2019-01-08  part a  2              0
    2019-01-08  part b  1              1
    2019-01-08  part c  2              1
    2019-01-15  part a  2              0
    2019-01-15  part b  1              1
    2019-01-15  part c  2              3

    I want to forecast the number of defects of all parts over the next 2 weeks, for example.
    The tools shipped column is related to the number of defects. I have future data for tools shipped too, so the desired output is:

    2019-01-22  part a  2              ??
    2019-01-22  part b  2              ??
    2019-01-22  part c  2              ??

    Tools shipped for a particular week is constant.

  164. Ayush February 13, 2020 at 8:04 pm

    Hi Jason,
    Thank you for such an informative tutorial. I am planning on implementing an LSTM for multivariate time series data. The input dimension is (1000*7*24) and the output is (1000*30). I wanted to understand how I can decide how many layers and units to use, and similarly what batch size would be appropriate in this case. It would be great if you could comment on some standard heuristics or point to a reliable resource for the same.

  165. Andrei February 19, 2020 at 9:07 pm

    Hi Jason,
    Great tutorial, it really helped me get on my feet and get started.

    I have a question on multiple parallel series. Does parallel mean the input features and the output are treated as independent across columns?

    To be more specific, using a 3-feature vector and 4 steps as input:
    [ [F1_t1, F2_t1, F3_t1],
    [F1_t2, F2_t2, F3_t2],
    [F1_t3, F2_t3, F3_t3],
    [F1_t4, F2_t4, F3_t4] ]

    to predict:
    [F1_t5, F2_t5, F3_t5]

    Does F1 (t1 to t4) have no effect on the prediction of F2_t5 or F3_t5?

    Also, how would you go about combining multiple input and multiple parallel series in a case where the input is an N-feature vector and, using 3 time steps, we predict M features (many-to-one), where M < N (and the M features are included in the N features)?

    And on a separate note, any literature suggestions for using this with categorical data? I tried encoding them as numbers, but then they are not treated as categories.

  166. zhouhua February 20, 2020 at 3:14 pm

    Hi Jason,

    I am new to LSTMs. Could you add a picture of the topology for each model type, as a visualization?

  167. Roberto February 22, 2020 at 5:07 am

    Hi Jason,

    Thank you very much for your effort and for offering us your great tutorials. I enjoy them a lot!
    I do not have much experience with LSTMs, so I already run into problems with definitions that are probably clear to most readers. For the Vanilla LSTM you say you use 50 LSTM units. Does it mean you have 1 LSTM whose input is 3-dimensional and whose output is 50-dimensional, or do you actually have 50 LSTMs accepting 3-dimensional vectors with 1-dimensional outputs?

    • Jason Brownlee February 22, 2020 at 6:34 am

      Yes, 50 units, each of which takes the full input and produces an output.
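One way to see what "50 units" means: LSTM(50) is a single layer whose hidden state is a 50-dimensional vector, and its input weight matrix connects every input feature to every unit, for each of the four gate/candidate blocks. The parameter count follows from those shapes. A sketch, assuming the standard Keras LSTM with biases and one input feature, as in the vanilla univariate example:

```python
def lstm_param_count(units, n_features):
    """Weights of a Keras-style LSTM layer with bias (4 blocks: i, f, c, o)."""
    kernel = n_features * 4 * units       # every input feature -> every unit
    recurrent = units * 4 * units         # previous hidden state -> every unit
    bias = 4 * units
    return kernel + recurrent + bias

print(lstm_param_count(units=50, n_features=1))  # 10400
```

Note that n_steps=3 does not appear in the count: the same weights are applied at each of the three time steps, and the layer outputs one 50-dimensional vector (the final hidden state) per sample.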

  168. Ben February 27, 2020 at 3:15 am

    Hi Jason, where you state "Vanilla LSTM for univariate time series forecasting and make a single prediction", is it possible to predict more than a single variable? How would I modify it to make 5 value predictions?

  169. Alvaro Fierro ClaveroMarch 4, 2020 at 12:59 am#

    Brilliant post. Very enlightening.

  170. manjeet kumar yadavMarch 6, 2020 at 6:18 pm#

    Hi Jason
    I need help. I am working on a project for HAR on a video dataset; could you help me make a model
    that uses CNN-LSTM?

  171. DavidMarch 7, 2020 at 6:07 am#

    Hi, Jason.
    Great job. I have built a model that performed well, but when I close the program, open it and run it again, it doesn’t perform the same; when I restart the PC it does work properly. I am running on CPU. What could be causing this problem, and how do I avoid it? Your answer will be most appreciated.

    • Jason BrownleeMarch 7, 2020 at 7:21 am#

      I have not heard of this kind of problem before, sorry.

      Perhaps try posting your experience on stackoverflow?

  172. Peter YocoteMarch 19, 2020 at 7:59 am#

    Hello Jason

    I was wondering how we could know the accuracy and have some sort of validation_data (the parameter used in model.fit).

    This is to obtain the loss and accuracy curves for training and validation.

    Could you please give me some guidance on this?
    Thanks a lot
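A sketch of one option, assuming NumPy and a hypothetical helper `chronological_split`: hold out the most recent samples (not a random subset, so the validation data lies after the training data in time) and pass them to `fit` through its `validation_data` argument. Accuracy is a classification metric; for the regression models in this tutorial, trained with `loss='mse'`, the curves to plot are `loss` and `val_loss` from the returned history. The Keras lines are left as comments and assume an already compiled `model`.

```python
import numpy as np

def chronological_split(X, y, val_fraction=0.2):
    # keep time order: the validation samples are the most recent ones
    n_val = int(len(X) * val_fraction)
    split = len(X) - n_val
    return (X[:split], y[:split]), (X[split:], y[split:])

X = np.arange(20, dtype=float).reshape(10, 2, 1)   # 10 samples, 2 steps, 1 feature
y = np.arange(10, dtype=float)
(X_train, y_train), (X_val, y_val) = chronological_split(X, y)
print(X_train.shape, X_val.shape)  # (8, 2, 1) (2, 2, 1)

# hypothetical Keras usage, not executed here:
# history = model.fit(X_train, y_train, epochs=200,
#                     validation_data=(X_val, y_val), verbose=0)
# history.history['loss'] and history.history['val_loss'] hold the two curves
```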

  173. MikeMarch 31, 2020 at 2:22 am#

    Hello Jason,

    Thanks for the valuable efforts

    Do you think that TS Deep Learning has proved itself successful when applied to stock market forecasting?

  174. MarlonApril 1, 2020 at 7:41 am#

    When I train my model it has a two-dimension output – it is (none, 1) – corresponding to the time series I’m trying to predict. But whenever I load the saved model in order to make predictions, it has a three-dimensional output – (none, 40, 1) – corresponding to the reshaping of the network input training dataset. What is wrong?

    Here is the code:

    df = np.load('Principal.npy')

    # Conv1D
    #model = load_model('ModeloConv1D.h5')
    model = autoencoder_conv1D((2, 20, 17), n_passos=40)

    model.load_weights('weights_35067.hdf5')

    # summarize model.
    model.summary()

    # load dataset
    df = df

    # split into input (X) and output (Y) variables
    X = f.separar_interface(df, n_steps=40)
    # THE X INPUT SHAPE (59891, 17) length and attributes, respectively ##

    # conv1D input format
    X = X.reshape(X.shape[0], 2, 20, X.shape[2])

    # Make predictions

    test_predictions = model.predict(X)
    ## test_predictions.shape = (59891, 40, 1)

    test_predictions = model.predict(X).flatten()
    ## test_predictions.shape = (2395640,)

    plt.figure(3)
    plt.plot(test_predictions)
    plt.legend(['Prediction'])
    plt.show()

      • MarlonApril 1, 2020 at 10:34 pm#

        Hello,

        Thank you very much for your reply. Anyway, it didn’t help. I’ve changed the input size of my Conv1D from (2, 20, 17) to (40, 1, 17), but it didn’t accept it – it tells me that it has a negative dimension. I don’t understand why this doesn’t happen when training the network but does when I use the saved model to predict.

        • MarlonApril 1, 2020 at 10:36 pm#

          Layer (type)                 Output Shape              Param #
          =================================================================
          time_distributed_14 (TimeDis (None, 4, 1, 24)          4104
          _________________________________________________________________
          time_distributed_15 (TimeDis (None, 4, 1, 24)          0
          _________________________________________________________________
          time_distributed_16 (TimeDis (None, 4, 1, 48)          9264
          _________________________________________________________________
          time_distributed_17 (TimeDis (None, 4, 1, 48)          0
          _________________________________________________________________
          time_distributed_18 (TimeDis (None, 4, 1, 64)          12352
          _________________________________________________________________
          time_distributed_19 (TimeDis (None, 4, 1, 64)          0
          _________________________________________________________________
          time_distributed_20 (TimeDis (None, 4, 64)             0
          _________________________________________________________________
          lstm_3 (LSTM)                (None, 100)               66000
          _________________________________________________________________
          repeat_vector_2 (RepeatVecto (None, 40, 100)           0
          _________________________________________________________________
          lstm_4 (LSTM)                (None, 40, 100)           80400
          _________________________________________________________________
          time_distributed_21 (TimeDis (None, 40, 1024)          103424
          _________________________________________________________________
          dropout_2 (Dropout)          (None, 40, 1024)          0
          _________________________________________________________________
          dense_4 (Dense)              (None, 40, 1)             1025
          =================================================================

        • Jason BrownleeApril 2, 2020 at 5:54 am#

          Perhaps there is a bug in your code.

          I am happy to make some suggestions:

          – Consider aggressively cutting the code back to the minimum required. This will help you isolate the problem and focus on it.
          – Consider cutting the problem back to just one or a few simple examples.
          – Consider finding other similar code examples that do work and slowly modify them to meet your needs. This might expose your misstep.
          – Consider posting your question and code to StackOverflow.

  175. MarlonApril 2, 2020 at 2:33 am#

    Let me tell you how I’ve provisionally solved the problem:

    I’ve used your split_sequences() for multivariate data with 40 steps. Therefore, the dataset was built by taking the 40 steps starting at i, then the 40 steps starting at i+1, and so on. Each subsequence has one new item at the end; all the rest are equal to the previous subsequence.

    The output layer, for some reason that I still couldn’t figure out, is making a prediction for every subsequence. So I designed a function that takes a single item (index 30) from each subsequence:

    def separador_output(sequence):
        X = list()
        for i in range(len(sequence)):
            x = sequence[i][30]
            X.append(x)
        return np.array(X)

    As a result, I’ve got the 1-dimensional time series I was trying to reproduce.

    I’m sharing this because I still believe there should be a way of doing this without introducing such a function.
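The model summary posted above ends in `RepeatVector(40)` followed by an LSTM and time-distributed layers with output `(None, 40, 1)`, so the decoder emits a 40-step sequence for every input window, which is why `predict` returns a 3-D array. A NumPy slice does the same job as `separador_output`. A minimal check, using random numbers as a stand-in for the real predictions:

```python
import numpy as np

def separador_output(sequence):
    X = list()
    for i in range(len(sequence)):
        x = sequence[i][30]
        X.append(x)
    return np.array(X)

# stand-in for model.predict(X): 5 windows x 40 predicted steps x 1 value
test_predictions = np.random.default_rng(0).random((5, 40, 1))

loop_version = separador_output(test_predictions)
slice_version = test_predictions[:, 30, :]   # same selection, no Python loop
print(np.array_equal(loop_version, slice_version), slice_version.shape)  # True (5, 1)
```

If only one value per window is needed, a model whose output layer predicts a single step (for example ending in `Dense(1)` without the `RepeatVector` decoder) would avoid producing the other 39 in the first place.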

    Best regards!

  176. JimApril 4, 2020 at 11:26 am#

    Thank you for the excellent article!

    I am trying to build an LSTM model of time series data following the strategy you outline in this article.

    I have one input (feature) at multiple timepoints in the past, and I use your code “split_sequence()” to split the univariate sequence into multiple samples, each with a specified number of time steps and a single output.

    I have to standardize my “train” dataset, for which I had planned on using StandardScaler (per your other excellent articles, including: https://machinelearningmastery.com/how-to-develop-lstm-models-for-time-series-forecasting/). I am performing the standardization prior to performing the SPLIT into multiple samples for the LSTM. This seems straightforward (although please comment if you think this plan is inappropriate).

    The complication is that at any given timepoint, my single input feature actually has multiple values, each derived from one of many “related but independent” sources. While I can perform the LSTM on each source separately, I would like to try maximizing my sample size by performing the LSTM on the aggregate of all the sources (since the sources seem to follow similar behavior to each other, but not necessarily within the same time window). Or at least I would like to see what the results of that aggregated model look like. My only question is: does it make more sense to perform the input standardization separately for each source (so each source is standardized to a mean of zero and SD of 1, and has equal weighting in the model), or to standardize once across all sources in the aggregated data?

    (I am relatively new to machine learning, so I apologize if my question is a bit naive.)
    Thank you for your thoughts.

    Jim
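Which option works better is an empirical question, so one way is to try both and compare them on held-out data. A minimal NumPy sketch with made-up sources and values: it scales each source with its own mean and standard deviation (option 1) or with pooled statistics (option 2), then builds windows inside each source, so no window mixes two sources, and stacks them into one training set. In a real evaluation the statistics would be computed from the training portion only.

```python
import numpy as np

# made-up data: three related sources with different scales
sources = {
    "a": np.array([1.0, 2.0, 3.0, 4.0, 5.0]),
    "b": np.array([10.0, 20.0, 30.0, 40.0, 50.0]),
    "c": np.array([100.0, 110.0, 120.0, 130.0, 140.0]),
}

def standardize(x, mean, std):
    return (x - mean) / std

# option 1: each source gets its own mean 0 and std 1
per_source = {k: standardize(v, v.mean(), v.std()) for k, v in sources.items()}

# option 2: one mean and std computed over all sources pooled together
pooled = np.concatenate(list(sources.values()))
global_scaled = {k: standardize(v, pooled.mean(), pooled.std())
                 for k, v in sources.items()}

def split_sequence(sequence, n_steps):
    X, y = [], []
    for i in range(len(sequence) - n_steps):
        X.append(sequence[i:i + n_steps])
        y.append(sequence[i + n_steps])
    return np.array(X), np.array(y)

# windows are built within each source, then stacked into one dataset
parts = [split_sequence(s, n_steps=3) for s in per_source.values()]
X = np.concatenate([p[0] for p in parts]).reshape((-1, 3, 1))
y = np.concatenate([p[1] for p in parts])
print(X.shape, y.shape)  # (6, 3, 1) (6,)
```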

  177. JamApril 5, 2020 at 7:58 pm#

    Hey Jason, thanks for your helpful blog. Could you please help me with a case?
    My data has a fixed input size of (1, 16, 2), but the output varies in its number of time steps: one may be (1, 2, 2) and another (1, 20, 2). I thought of using an encoder-decoder format, but the problem is determining the dimension of “RepeatVector()”. How should I do that?
    Is it possible to adjust its size for each input?

    • Jason BrownleeApril 6, 2020 at 6:04 am#

      Perhaps try padding all output sequences to the same length and use an encoder-decoder model that outputs sequences of that length.
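A minimal sketch of that padding step, assuming NumPy and a hypothetical helper `pad_outputs`: every target sequence is padded to the longest length, 20 here, which then fixes the `RepeatVector` size (`RepeatVector(20)`). The boolean mask records which steps are real, so padded steps can be dropped when reading predictions or excluded when computing errors. The pad value of 0.0 is an arbitrary choice.

```python
import numpy as np

def pad_outputs(seqs, max_len, pad_value=0.0):
    """Hypothetical helper: pad (n_steps_i, n_features) arrays to (max_len, n_features)."""
    n_features = seqs[0].shape[1]
    padded = np.full((len(seqs), max_len, n_features), pad_value, dtype=float)
    mask = np.zeros((len(seqs), max_len), dtype=bool)
    for i, s in enumerate(seqs):
        padded[i, :len(s)] = s
        mask[i, :len(s)] = True   # True where the step is real data
    return padded, mask

y_short = np.ones((2, 2))    # 2 output steps, 2 features
y_long = np.ones((20, 2))    # 20 output steps, 2 features
y, mask = pad_outputs([y_short, y_long], max_len=20)
print(y.shape, mask.sum())  # (2, 20, 2) 22
```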

  178. Abhishek NeemaApril 7, 2020 at 7:41 am#

    Sir, could you please explain
    why the input shape is (3, 2) for multiple input series, while for multiple parallel series it is (3, 3)?
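The input shape is `(n_steps, n_features)`, and the two examples differ in how many columns are used as input. With multiple input series, the third column is the target, so only two columns go in; with parallel series, all three columns are both inputs and targets. A condensed NumPy sketch of both framings (simplified from, not identical to, the tutorial's `split_sequences()` functions):

```python
import numpy as np

in_seq1 = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90])
in_seq2 = np.array([15, 25, 35, 45, 55, 65, 75, 85, 95])
out_seq = in_seq1 + in_seq2
dataset = np.column_stack((in_seq1, in_seq2, out_seq))   # 9 rows x 3 columns

def split_multiple_input(sequences, n_steps):
    # inputs: the first two columns; target: last column at the window's final row
    X, y = [], []
    for i in range(len(sequences) - n_steps + 1):
        end_ix = i + n_steps
        X.append(sequences[i:end_ix, :-1])
        y.append(sequences[end_ix - 1, -1])
    return np.array(X), np.array(y)

def split_parallel(sequences, n_steps):
    # inputs: all three columns; target: all three columns at the next row
    X, y = [], []
    for i in range(len(sequences) - n_steps):
        end_ix = i + n_steps
        X.append(sequences[i:end_ix, :])
        y.append(sequences[end_ix, :])
    return np.array(X), np.array(y)

X1, y1 = split_multiple_input(dataset, 3)
X2, y2 = split_parallel(dataset, 3)
print(X1.shape, X2.shape)  # (7, 3, 2) (6, 3, 3)
```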

  179. Nuwan MadhusankaApril 7, 2020 at 12:56 pm#

    What about the EarlyStopping, ModelCheckpoint, and ReduceLROnPlateau callbacks with LSTMs? Also, I want to update my model as new data arrives, i.e. train my model again after every new data point. How can I do it?

  180. 郑锋淇April 8, 2020 at 5:08 pm#

    Don’t you need to test whether the data fits?

  181. DanApril 9, 2020 at 12:35 am#

    Hey Jason, very well written!

    I have a question on your 1DConv LSTM network below:

    model.add(TimeDistributed(Conv1D(filters=64, kernel_size=1, activation='relu'), input_shape=(None, n_steps, n_features)))
    model.add(TimeDistributed(MaxPooling1D(pool_size=2)))
    model.add(TimeDistributed(Flatten()))
    model.add(LSTM(50, activation='relu'))
    model.add(Dense(1))

    I’m wondering what the intuition behind applying a convolution with a 1D kernel on a sequence of data is? What does this involve – is this equivalent to taking a single value as a feature, to represent the input sequence?

    Thanks for this resource!

    • Jason BrownleeApril 9, 2020 at 8:05 am#

      Thanks.

      It attempts to extract patterns from the sequence.

      • danielApril 9, 2020 at 6:57 pm#

        Will it not just be applying the same constant filter across the whole sequence equally, transforming it by some constant?

        • Jason BrownleeApril 10, 2020 at 8:26 am#

          Each filter will extract different patterns from the sequence – in an analogy to filters extracting patterns from an image.
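To make the kernel_size=1 case concrete: each filter then sees one time step at a time, so the layer applies the same affine map (a weighted sum over the features plus a bias, then ReLU) at every step. With one feature, each filter is just `w * x + b` per step, as daniel suggests; patterns spanning neighbouring steps need `kernel_size > 1`, or come from the later pooling and LSTM layers. A small NumPy check of that equivalence, with random weights as assumed values and ReLU omitted because it is applied elementwise afterwards:

```python
import numpy as np

rng = np.random.default_rng(0)
n_steps, n_features, n_filters = 4, 1, 3
x = rng.normal(size=(n_steps, n_features))        # one window
W = rng.normal(size=(1, n_features, n_filters))   # kernel_size = 1
b = rng.normal(size=(n_filters,))

def conv1d_valid(x, W, b):
    # plain 'valid' 1D convolution (cross-correlation, as in deep learning libraries)
    k = W.shape[0]
    out = np.empty((x.shape[0] - k + 1, W.shape[2]))
    for t in range(out.shape[0]):
        window = x[t:t + k]                              # (k, n_features)
        out[t] = np.einsum("kf,kfo->o", window, W) + b
    return out

conv_out = conv1d_valid(x, W, b)
pointwise = x @ W[0] + b   # same thing: one linear map applied at every time step
print(np.allclose(conv_out, pointwise))  # True
```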

  182. RicardoApril 12, 2020 at 10:17 am#

    Hi Jason,

    What are the limits of LSTM models on multi-step prediction length? For example, if we have N samples and M features and we are predicting K future samples of a 1-D variable, is there a way to relate N, M and K, or a quick rule of thumb on how large K can go before it doesn’t make sense anymore?

    Thanks!

    • Jason BrownleeApril 12, 2020 at 1:15 pm#

      The further you predict into the future, the more errors will compound.

      Harder problems are more challenging to forecast.

      That is about as general as we can go – you will need to test specific models on specific datasets to learn more.

  183. younesApril 16, 2020 at 8:42 pm#

    Hi Jason,
    Is there a method to choose the best n_steps_in? I need to make a prediction for 3 days, and I have 1 year of data (8760 observations).

    • Jason BrownleeApril 17, 2020 at 6:19 am#

      Test different values for your dataset and use the configuration that results in the best average performance.
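A sketch of that search loop, assuming NumPy and a synthetic hourly series. To keep it fast and runnable, it swaps the LSTM for a linear autoregression fitted by least squares; with an LSTM you would fit and predict the network inside `evaluate_n_steps` instead, and repeat each configuration several times to average out random initialization. Every candidate is scored on the same final week of targets.

```python
import numpy as np

def split_sequence(sequence, n_steps):
    X, y = [], []
    for i in range(len(sequence) - n_steps):
        X.append(sequence[i:i + n_steps])
        y.append(sequence[i + n_steps])
    return np.array(X), np.array(y)

def evaluate_n_steps(series, n_steps, n_test):
    """Score one n_steps value; the model is a least-squares linear
    autoregression standing in for the LSTM (NOT an LSTM)."""
    X, y = split_sequence(series, n_steps)
    X_train, y_train = X[:-n_test], y[:-n_test]
    X_test, y_test = X[-n_test:], y[-n_test:]   # same final n_test targets for every n_steps
    A = np.column_stack([X_train, np.ones(len(X_train))])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    preds = np.column_stack([X_test, np.ones(len(X_test))]) @ coef
    return np.sqrt(np.mean((preds - y_test) ** 2))   # RMSE

# synthetic hourly series: daily cycle plus noise, 60 days long
t = np.arange(24 * 60)
series = np.sin(2 * np.pi * t / 24) + 0.1 * np.random.default_rng(1).normal(size=t.size)

scores = {n: evaluate_n_steps(series, n, n_test=24 * 7) for n in (3, 6, 12, 24, 48)}
best = min(scores, key=scores.get)
print(best, scores[best])
```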

  184. DkbApril 16, 2020 at 10:24 pm#

    Sir, can you write the same code using the functional API?

  185. Abdül MeralApril 17, 2020 at 4:40 am#

    Thank you, Dr. Jason,
    it is very helpful as always.

    I applied it to my dataset:
    https://www.kaggle.com/abdulmeral/rnn-4-models-for-lstm

  186. esyraq ekramApril 18, 2020 at 1:18 am#

    omg omg omg omg omg, I just graduated last year and did my internship. There I learned the great wonders of machine learning. For 1 year I have looked at so many tutorials saying this and that, blah blah blah, and then it takes me a couple of hours to try to run that nonsense. Then I saw this, omg, I can’t stop saying that. This is what I want. This is it. I just want simple code that I can run myself; I don’t need a million lines of explanation. I just want to know what works. You, sir, are my hero. Thank you so much from the bottom of my heart; I really feel that I don’t deserve this kindness of free, usable knowledge. Thank you, Dr Jason Brownlee, my hero.

  187. Jung Hwan KimApril 29, 2020 at 4:59 pm#

    Professor Brownlee, what makes you put the relu activation function on the LSTM? When I tested it on my own project, the loss value increased astronomically (e.g. Loss: 2382585115.4067 Acc: 0.23).
    When I removed the relu function from the LSTM, it ran like a charm. Could you explain more about this topic?

    • Jason BrownleeApril 30, 2020 at 6:36 am#

      I find relu works well in LSTMs when we don’t scale inputs.

      If it doesn’t work well for your data, don’t use it. Find what works best on your project.

  188. H. JonerMay 4, 2020 at 3:37 am#

    Hi professor Brownlee,

    Thank you for this excellent work.

    I’m wondering why, when I use “Multiple Input Multi-Step Output” and set n_steps_out to 1, I don’t get the same result as when I make a simple Multiple Input Series model predicting the next output.
    Shouldn’t I get the same result?

    Thank you,

    • Jason BrownleeMay 4, 2020 at 6:26 am#

      The models and data are very small in these cases; they are meant to show you how to use the models, not to actually solve the tiny prediction tasks.

  189. Ehsan AmeriMay 9, 2020 at 8:30 pm#

    Hello everyone.
    Thanks to Mr. Jason Brownlee.

    I reviewed some examples on the site (airplane passengers and shampoo) and I am totally confused about the concept of timesteps versus features.

    here we assume that data is like:
    X          | y
    10, 20, 30 | 40
    20, 30, 40 | 50
    30, 40, 50 | 60

    and then it is concluded that timesteps is 3 and features is 1.
    But in the shampoo sales prediction we always change the data shape to
    X = X.reshape(X.shape[0], 1, X.shape[1])
    no matter how many lags we take in the model. This means we assume timesteps to be 1 and features to equal the number of lags.

    I’d appreciate it if anyone could help me understand these concepts.
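Both reshapes are valid LSTM inputs of shape `[samples, timesteps, features]`; they just present the same lag values differently. A NumPy sketch using the three samples above:

```python
import numpy as np

X = np.array([[10, 20, 30],
              [20, 30, 40],
              [30, 40, 50]])   # 3 samples, 3 lag observations each

# lags as time steps: the LSTM reads one value per step, three steps per sample
X_as_steps = X.reshape((X.shape[0], X.shape[1], 1))
# lags as features: the LSTM reads all three values at once, in a single step
X_as_features = X.reshape((X.shape[0], 1, X.shape[1]))
print(X_as_steps.shape, X_as_features.shape)  # (3, 3, 1) (3, 1, 3)
```

With one time step, the LSTM sees all lags at once as a feature vector, and its recurrence is not used across the lags; with three time steps, it reads them in order. Which framing works better for a given dataset is something to test.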

  190. Juan PabloMay 15, 2020 at 8:49 am#

    Hi Jason, thanks for such a nice article.
    I have a large set of number sequences, labeled as “good” or “bad”. I need to build a model so that, given a new sequence, it can classify it as “good” or “bad” based on training. I’m not sure what model to use, because I need to classify, not to predict the next value.
    It’s like classifying dogs and cats from pictures, but instead of pictures I have sequences of numbers where the order matters.
    Thank you!

  191. Hrishikesh BawaneMay 16, 2020 at 6:29 am#

    Thank you so much for this. Preparing data is always a great task. I have a chat dataset which I want to use to create a chatbot. How should I prepare data for the encoder-decoder model?

  192. Suraj BhatiaMay 20, 2020 at 6:46 am#

    Hi Jason, thank you so much for this article. I really liked it and learned many things.
    I have time series data: I have 60 input data points and I have to predict 1 output at the last layer of the LSTM, so basically I want my LSTM to be

    first_day_data–>lstm_unit1–>second_day_data–>lstm_unit2–>….60th_day_data–>lstm_unit60–>denseLayer–>output.

    something like this:
    the data is [1, 2, 3, 4, …, 60] and the output is only a single value, e.g. 5.6. How do I construct this model using Keras?

  193. Makarand DatarMay 20, 2020 at 1:01 pm#

    Hi Jason,
    If I have a time series with, say, 600000 time steps as output and 3 time series for 3 features, all of the same length, I then form sequences from my data (say 60 consecutive time steps at a time: the first sequence will be 1-60, the second 2-61, the third 3-62, and so on), and now I want to split them into training and testing sets.
    If I shuffle the sequences (within a sequence, all the points will still be chronologically sequential) and then do an 80/20 train-test split, is that OK, or would it lead to data leakage into the test set?
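With overlapping windows (1-60, 2-61, ...), a random split after shuffling will likely put near-duplicates of each test window into the training set, which leaks information. A minimal sketch of a safer option, assuming NumPy and working with window start indices: split in time order first, leave a gap so no test window shares rows with a training window or its target, and only then shuffle the training windows. The target is assumed to be the row right after each window.

```python
import numpy as np

n_rows, window = 1000, 60
starts = np.arange(n_rows - window)   # windows [s, s + window), target at row s + window

# chronological split: first 80% of windows for training
cut = int(len(starts) * 0.8)
train_starts = starts[:cut].copy()
# skip `window` windows so test inputs never overlap training inputs or targets
test_starts = starts[cut + window:]

rng = np.random.default_rng(0)
rng.shuffle(train_starts)   # shuffling within the training set is fine

last_train_row = train_starts.max() + window   # row of the last training target
first_test_row = test_starts.min()
print(last_train_row < first_test_row)  # True
```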

  194. Makarand DatarMay 20, 2020 at 1:02 pm#

    Also, thank you for this and all the other articles!

  195. MatheusMay 23, 2020 at 4:17 am#

    Is there some tutorial for LSTM + Time series in R?

    With best regards

  196. tnuichMay 31, 2020 at 5:16 am#

    Hi! Great job with this website! It is very useful.

    I have a question: Does the order of input train samples matter?
    Example:
    product_id | day_1 | day_2 | day_3 | day_4 | day_5 | day_6
    1 | 10 | 3 | 2 | 5 | 9 | 10
    2 | 11 | 5 | 2 | 4 | 3 | 2
    3 | 14 | 8 | 5 | 0 | 2 | 14
    4 | 10 | 0 | 1 | 5 | 1 | 1
    train dataset:
    [10,3,2,5] -> [9,10] #item 1
    [11,5,2,4] -> [3,2] #item 2
    [14,8,5,0] -> [2,14] #item 3
    [10,0,1,5] -> [1,1] #item 4

    (To be more precise, I have x products with sales for z days and I want to make each product a training sample, but by doing this the days will be repeated for each product. Is this correct, or should I build a training sample that contains all items?) * I should mention that I’ve implemented the sliding window approach to build the dataset for each item.

    • Jason BrownleeMay 31, 2020 at 6:32 am#

      Thanks!

      Yes, the order of samples probably matters both in splitting data for train/eval and within the training and test set themselves.

      • tnuichJune 10, 2020 at 4:48 pm#

        In model.fit the samples are automatically shuffled, so in this case does the order of the samples (items) in the training set still matter?

  197. tutejaJune 3, 2020 at 5:45 am#

    Hi Jason,
    Could you give your opinion on this use case? I am working on a multivariate, multi-step time series problem to forecast sales for each city. I understand from your tutorials how to use an LSTM with vector output on such a problem, but how do I handle forecasting by city? One way I read about is to build a separate model for each city and then concatenate the models at the end. What are your thoughts? Do you have a post on it that I can refer to?

    Your blog has been a “go to” solution for all my problems. Thanks for sharing knowledge and keeping it simple!

  198. Gopikrishna K SJune 8, 2020 at 3:21 am#

    Thanks a lot for the article. Could you please explain briefly how to do the same in Java using the Deeplearning4j library?

  199. AdidevaJune 10, 2020 at 4:04 am#

    Hey Jason

    Thanks for the wonderful article. Can you help me with the data reshaping for Multiple Parallel Series for a CNN-LSTM model? It would be great if you could provide a Python function for the same. As a beginner, it is a bit tricky to understand the data shapes needed for the different models. Thanks.
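A minimal NumPy sketch of such a function, written for this thread rather than taken from the tutorial: it cuts windows of `n_seq * n_steps_sub` rows from a `(rows, n_features)` array, uses the next row of every series as the target, and reshapes the inputs to `[samples, subsequences, timesteps, features]`. A CNN-LSTM in the tutorial's style would then use `input_shape=(None, n_steps_sub, n_features)` in its first `TimeDistributed(Conv1D(...))` layer and end with `Dense(n_features)`.

```python
import numpy as np

def split_parallel_for_cnn_lstm(sequences, n_seq, n_steps_sub):
    """Sketch: windows of n_seq * n_steps_sub rows from a (rows, n_features)
    array, reshaped to [samples, subsequences, timesteps, features]."""
    n_steps = n_seq * n_steps_sub
    n_features = sequences.shape[1]
    X, y = [], []
    for i in range(len(sequences) - n_steps):
        X.append(sequences[i:i + n_steps, :])
        y.append(sequences[i + n_steps, :])   # next row of every series
    X = np.array(X).reshape((-1, n_seq, n_steps_sub, n_features))
    return X, np.array(y)

in_seq1 = np.arange(10, 100, 10)
in_seq2 = in_seq1 + 5
dataset = np.column_stack((in_seq1, in_seq2, in_seq1 + in_seq2))   # 9 rows x 3 series
X, y = split_parallel_for_cnn_lstm(dataset, n_seq=2, n_steps_sub=2)
print(X.shape, y.shape)  # (5, 2, 2, 3) (5, 3)
```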

  200. OnurJune 14, 2020 at 8:43 pm#

    Hi Jason ,

    What should we do when using string data as input?

    I get an error that string data could not be converted to float.

    How can I solve this problem?
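A common fix is to convert each string category to numbers before it reaches the model. A minimal NumPy sketch with a made-up column: `np.unique` maps the strings to integer codes, and one-hot encoding turns the codes into binary columns so the model does not read a false ordering into them (this also bears on the categorical-data question earlier in this thread). The resulting columns can be stacked next to the numeric features.

```python
import numpy as np

colors = np.array(["red", "green", "red", "blue", "green"])   # made-up string column

# integer code per distinct string (categories come back sorted)
categories, codes = np.unique(colors, return_inverse=True)

# one-hot: one binary column per category, so no order is implied between them
one_hot = np.eye(len(categories))[codes]

numeric = np.array([1.5, 2.0, 0.5, 3.0, 1.0])       # made-up numeric feature
features = np.column_stack((numeric, one_hot))      # 1 numeric + 3 one-hot columns
print(categories, features.shape)  # ['blue' 'green' 'red'] (5, 4)
```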

  201. Higo Felipe PiresJune 16, 2020 at 4:31 pm#

    Hi, Jason. Thank you for your helpful blog and post.

    I’m doing a project to predict COVID-19 growth in countries/regions. My plan is to use data from a handful of chosen countries for training and do the prediction for only one country (dataset: https://github.com/datasets/covid-19/blob/master/data/time-series-19-covid-combined.csv). Is this possible with the knowledge presented in the post? If yes, which type of time series model will I have to apply? Univariate? Multivariate? Multi-step?

    Best regards,

    Higo

    • Higo Felipe Silva PiresJune 17, 2020 at 5:45 am#

      To be a little more specific:

      I want to use “Confirmed”, “Recovered” and “Deaths” to predict “Cases” (and eventually “Deaths”).

    • Jason BrownleeJune 17, 2020 at 6:18 am#

      The growth rate can be modelled directly with an exponential function; use the GROWTH() function in Excel.

      • Higo Felipe PiresJune 17, 2020 at 7:03 am#

        Jason, thanks for the reply, but I don’t think I expressed myself in the best way.

        What I intend to do on my project is to train an LSTM with data from confirmed cases, recovered patients and deaths from a certain set of countries and try to predict the number of cases in another country. The dataset is that on my first comment.

        For example: training the LSTM with data from Australia, Costa Rica, Greece, Hungary and Israel (from 2020-01-22 to 2020-06-15) and trying to predict the number of cases in Brazil (here i would like to try two approaches: a validation with predictions in the same range 2020-01-22 to 2020-06-15, and another aimed at predicting future cases, beyond the date 2020-06-15).

        Which of the approaches exposed in the article should I use? It is not yet clear to me which would be the best.

        Thanks in advance.

  202. Syed Nazir HussainJune 20, 2020 at 1:05 am#

    Good day sir,

    I would like to know how I can get next week’s forecast from the vanilla LSTM model. In the example on this site, we only get a single forecast value.
    Can you help me with this scenario?
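Two common options are a vector output (frame the data with 7 future values as the target and end the model with `Dense(7)`) or a recursive loop that feeds each one-step prediction back in as input. A minimal sketch of the recursive version, assuming NumPy; `fake_predict` is a stand-in for a fitted model, and with Keras you might pass something like `lambda x: model.predict(x, verbose=0)[0, 0]`. Errors can compound over the 7 steps, since later inputs are predictions rather than observations.

```python
import numpy as np

def recursive_forecast(predict_one, history, n_steps_in, n_ahead):
    """Feed each one-step prediction back in as the newest input value."""
    window = [float(v) for v in history[-n_steps_in:]]
    forecasts = []
    for _ in range(n_ahead):
        # shape [1 sample, n_steps_in time steps, 1 feature], as the vanilla model expects
        x = np.array(window[-n_steps_in:]).reshape((1, n_steps_in, 1))
        yhat = float(predict_one(x))
        forecasts.append(yhat)
        window.append(yhat)
    return forecasts

def fake_predict(x):
    # stand-in for a fitted model: continues the trend of the last two inputs
    return x[0, -1, 0] + (x[0, -1, 0] - x[0, -2, 0])

history = [10, 20, 30, 40, 50, 60, 70, 80, 90]
week = recursive_forecast(fake_predict, history, n_steps_in=3, n_ahead=7)
print(week)  # [100.0, 110.0, 120.0, 130.0, 140.0, 150.0, 160.0]
```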

  203. KyuJune 22, 2020 at 8:26 am#

    Hi Jason,

    Thank you for the valuable post. I have a question regarding multi-step LSTM model. I was trying to apply CNN-LSTM for the multi-step model, but I am a bit confused on reshaping [sample, timesteps] into [sample, subsequences, time steps, features].

    The example code for the stacked LSTM is
    X = X.reshape((X.shape[0], X.shape[1], n_features))

    but in case of CNN-LSTM, we need the number of subsequence for the CNN model. But whenever I input n_seq=2 and run the code
    X = X.reshape((X.shape[0], n_seq, X.shape[1], n_features))

    , the error occurred: ValueError: cannot reshape array of size 15 into shape (5,2,3,3)

    Would you please help me resolve the problem?

    Thank you in advance.

    • Jason BrownleeJune 22, 2020 at 1:27 pm#

      You’re welcome.

      You may need to experiment with different input shapes that are divisible by the number of timesteps in each sample.
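The reshape fails because the target shape must hold exactly as many values as the array: `(5, 2, 3, 3)` asks for 90, but the data has 15 (5 samples by 3 steps by 1 feature). The two middle dimensions have to split the existing time steps (`n_seq * n_steps_sub == n_steps`), and the last one must be the real number of features. A NumPy sketch of the failure and of one fix, which assumes you can use 4 time steps per sample:

```python
import numpy as np

X = np.arange(15).reshape((5, 3))   # 5 samples x 3 time steps (1 feature) = 15 values
n_seq, n_features = 2, 1

try:
    X.reshape((X.shape[0], n_seq, X.shape[1], 3))   # needs 5 * 2 * 3 * 3 = 90 values
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print(err)

# 3 steps cannot be split into 2 equal subsequences, but 4 steps can
X4 = np.arange(20).reshape((5, 4))
n_steps_sub = X4.shape[1] // n_seq                  # 2 steps per subsequence
X4 = X4.reshape((X4.shape[0], n_seq, n_steps_sub, n_features))
print(X4.shape)  # (5, 2, 2, 1)
```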

  204. mike mirzaJune 24, 2020 at 4:58 am#

    Hi, and thank you for the great explanation.
    I have another situation; let’s say I have
    20 33
    30 43
    40 53
    50 63
    60 ?

    So I need to predict a time series with the help of another one that I already have. What’s the best approach?

  205. busssardJune 25, 2020 at 7:43 am#

    Hi Jason, thank you so much for all your work!
    It is a blessing to have such a talented educator as you to teach the practical side of ML.

    I used this tutorial to create a time series forecast for COVID-19.
    I was wondering, can I use different data generators (in my specific practice case: countries) to learn the behavior?
    In your example of the shampoo sales: can I use different companies’ sales numbers to predict?
    Or do I have to fit one network per data generator?

  206. AmelieJuly 6, 2020 at 8:50 am#

    Hello,

    I am testing some forecasting algorithms, including the LSTM model, and I wanted to find out their complexity in terms of memory and computing time.
    So, if you allow me: what are the complexities of the univariate time series forecasting example presented above?

    Thank you so much

    • Jason BrownleeJuly 6, 2020 at 2:05 pm#

      Not sure off hand, sorry. You might have to check the literature if anyone has estimated the big-O for the method.

  207. JulianJuly 10, 2020 at 5:07 pm#

    Hi Jason,

    I have a slightly complicated, but I think not so rare, forecasting problem I’d like to solve.

    Example description:
    Let’s say we do climate measurements at ground level but also at 15 km height. For the last 2 years we have launched weather balloons every day to measure, e.g., the pressure at 15 km height. Weather balloons are expensive and not really environmentally friendly, so we would like to reduce the number of weather balloons we need.
    The idea:
    From now on, we could launch a weather balloon only every Sunday. For the following 6 days we would predict the pressure at 15 km height based on the current ground-level measurements of each day, and on Sundays we could ‘refocus’ our model using the real-world measurement.
    This sounds feasible to me, but I do not know where to start.

    My first idea:
    Not really what I want, but possible:
    put all the input data for last week together ((Sunday+)? Monday-Sunday) in one feature set and build a standard RNN to predict the pressure at 15 km height for Monday-Saturday. I think this would work, but then I would only get the values for last week. If I wanted an estimate for today, I do not see a way.

    I think there are many processes you could optimize this way, also in industry, where products of one batch often have rather similar properties. We could drastically reduce the production time if we predicted calibration measurements, which take a long time to perform, based on rather simple measurements.

    Do you have an idea how I could start building such a model? Do you know a good book with a similar example?

    Cheers,
    Julian

    • Jason BrownleeJuly 11, 2020 at 6:05 am#

      That sounds like a fun project.

      Generally, I would encourage you to prototype and evaluate each approach you can think of, rather than guess a priori what might be best – use results to guide you.

  208. JoeJuly 13, 2020 at 11:56 pm#

    Thank you for your insightful work.

    Why does the input shape contain the number of steps:
    model.add(LSTM(50, activation='relu', input_shape=(n_steps, n_features)))
    model.add(Dense(1))

    It seems that the actual shape of the input can do the job:
    model.add(LSTM(50, activation='relu', input_shape=(X.shape[1], n_features)))

    Thanks again.

    • Jason BrownleeJuly 14, 2020 at 6:27 am#

      You can use either, as long as the model matches the data.

  209. EmmanuelJuly 16, 2020 at 5:27 pm#

    Great tutorial; your work has always been of help to me. I am trying to develop a predictive model for a belt drive. In this case, my time series data is not really for forecasting; instead, the trained model predicts the status of the belt drive based on new time series data. Is LSTM nevertheless optimal, or are there two or three neural networks you can recommend in this case?

    • Jason BrownleeJuly 17, 2020 at 6:02 am#

      You’re welcome.

      Good question. I recommend testing a suite of algorithms and algorithm configurations in order to discover what works best for your specific dataset.

  210. MelissaJuly 22, 2020 at 2:58 am#

    Hey Jason, I have very much enjoyed your tutorial! In your opinion, is there a ‘right’ amount of data points (e.g., rows) to feed into an LSTM model? I was thinking of using around 500000 – 1M data points and I was wondering if that is too much, and what the limitations would be of using a small dataset vs a very large one?

    Thanks, love your website!

  211. EnesAugust 3, 2020 at 7:19 am#

    Hello Jason,

    Thanks for your great tutorials, they have been always very helpful.

    I’m interested in the calculation process behind LSTM. I’m familiar with all the formulas used in LSTM, but I’m not sure what the input is at each calculation step in the Vanilla LSTM example.

    For example, let’s suppose that the input time series is [30, 40, 50].

    So, at the first step, using C_{0} (cell memory), H_{0} (cell output) and the number 30 (from the time series above), we calculate C_{1} and H_{1}.

    Next, using C_{1}, H_{1} and 40, we calculate C_{2} and H_{2}, and so on. Right?

    I’m a little confused because in a sentence time series, each word can be represented as a one-hot vector; in that example, the sentence would be a time series of one-hot vectors, and at each calculation step the input in the formula would be one one-hot vector.

    Regards, Enes
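That is the right picture for the vanilla example: with `n_features=1`, the input at each step is a length-1 vector (30, then 40, then 50), just as it would be one one-hot vector per step for a sentence. A minimal NumPy sketch of the standard LSTM equations over `[30, 40, 50]`, with random weights and the gate order i, f, c, o as assumptions; only the final `H_3` would be passed on to the `Dense` layer. In Keras, the layer's `activation` argument (tanh by default) is the function used in the two `tanh` places below, so the tutorial's `activation='relu'` replaces them.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_forward(xs, W, U, b, units):
    """Sketch of one LSTM layer over a sequence; W: (n_features, 4*units),
    U: (units, 4*units), b: (4*units,), gate order i, f, c, o (assumed)."""
    h = np.zeros(units)   # H_0
    c = np.zeros(units)   # C_0
    for x_t in xs:        # one time step per loop: 30, then 40, then 50
        z = x_t @ W + h @ U + b
        i = sigmoid(z[:units])                  # input gate
        f = sigmoid(z[units:2 * units])         # forget gate
        g = np.tanh(z[2 * units:3 * units])     # candidate cell values
        o = sigmoid(z[3 * units:])              # output gate
        c = f * c + i * g                       # C_t
        h = o * np.tanh(c)                      # H_t
    return h, c

rng = np.random.default_rng(0)
units, n_features = 50, 1
W = rng.normal(scale=0.1, size=(n_features, 4 * units))
U = rng.normal(scale=0.1, size=(units, 4 * units))
b = np.zeros(4 * units)
xs = np.array([[30.0], [40.0], [50.0]])   # 3 time steps, 1 feature each
h3, c3 = lstm_forward(xs, W, U, b, units)
print(h3.shape)  # (50,)
```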

  212. JinhuiAugust 3, 2020 at 8:05 pm#

    Hi, Jason, thank you so much for your great tutorials.

    I am using a multivariate, multi-step encoder-decoder LSTM. In my case, the input steps, output steps, and n_features are 150, 15, and 11, respectively. But I have a really large number of timesteps (~100,000).

    So the input [100000, 150, 11] and output [100000, 15, 11] are used for training. I set the epochs to 50 and got the model after 4 hours of training. But I find that all the predictions of this model stay constant, i.e. [0, 15, 11], [1, 15, 11], [2, 15, 11], … are the same.

    I will be grateful if you could give me some possible reasons that I should check.

    Thank you!

  213. GabAugust 7, 2020 at 4:42 am#

    hi jason, great article!!!

    I have a dataset with 3 years of historical precipitation and radiation data.

    Which of the above models would be more logical to use so that I could predict both variables at the same time?

    Is this enough data for a forecast?

    How would I predict the next 30 days of the month from the last dataset date?

    Sorry for so many questions!

  214. Sumedha Sandip BordeAugust 8, 2020 at 6:31 pm#

    Hello
    I have an EEG (brain signal) dataset which I want to use for classification. 64 electrodes are attached to every subject (patient) and 5012 samples are recorded for every electrode. This way every subject has 64 series of 5012 samples and one class label. Likewise, there are 108 such subjects.
    Can you suggest the right deep learning method that can be used for classification?

  215. MikeAugust 17, 2020 at 6:42 pm#

    I have a question about building a test harness for testing LSTMs against various other models.

    My data is structured as follows:

    Input: Information on weather, construction works, accidents in a road network
    Output: cars passing a counter

    Accidents that happened in the morning would affect traffic in the afternoon and traffic patterns that developed in the morning due to these accidents will as well. Hence I thought an LSTM could help. But I want to test against simpler models.

    I would imagine model performance varies over the course of the day so my performance measure would be a graph showing the errors of the model over the course of the day as a distribution as the test set would include multiple days.

    Where I am stuck is the training part: I selected a few characteristic days over the past years that I want to pass to the model. I assume that no effects spill over from one day to the next, as there’s almost no traffic at night. So in effect my training dataset consists of a number of days that shall be taken individually. That way I don’t have to pass years of data but can select typical days and train only on these. How do I pass these to the model and, at the same time, prevent the “memory” from taking info from previous days into consideration?

    Should I just use one model.fit(X, y) where I add a dummy variable to the X representing the day? that doesn’t seem like good practice to me. If I do not point out the day specifically the model may think that the state of the neural network from the day before would affect the following day.

    Or fit the model multiple times, e. g.

    for day in sample_days:
        model.test(X_day, y_day)

    • MikeAugust 17, 2020 at 6:43 pm#

      Sorry, mistake in the last code snippet. That would have to be:

      for day in sample_days:
          model.fit(X_day, y_day)

    • Jason BrownleeAugust 18, 2020 at 6:00 am#

      Perhaps you can use all prior data up to the day you want to test as training, then test on the hold out day. Repeat for each day you want to evaluate.
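A minimal sketch of that per-day loop, assuming NumPy and that each day's samples are already framed into windows that stay inside the day. On the memory question: a default (stateless) Keras LSTM starts every sample from a zero state, so a window built only from one day's rows never sees the previous day, and stacking all selected days into one training set is fine; no day dummy is needed for that purpose. The random data and the helper `day_walk_forward` are hypothetical.

```python
import numpy as np

# hypothetical data: X_days[d], y_days[d] hold day d's windows, in date order
rng = np.random.default_rng(0)
n_days, samples_per_day, n_steps, n_features = 6, 24, 4, 3
X_days = [rng.normal(size=(samples_per_day, n_steps, n_features)) for _ in range(n_days)]
y_days = [rng.normal(size=samples_per_day) for _ in range(n_days)]

def day_walk_forward(X_days, y_days, first_test_day):
    """Yield (day, train, test): train on all earlier days, test on day d."""
    for d in range(first_test_day, len(X_days)):
        X_train = np.concatenate(X_days[:d])
        y_train = np.concatenate(y_days[:d])
        yield d, (X_train, y_train), (X_days[d], y_days[d])

splits = list(day_walk_forward(X_days, y_days, first_test_day=3))
for d, (X_tr, y_tr), (X_te, y_te) in splits:
    # with a real model: fit a fresh model on (X_tr, y_tr), then score it on (X_te, y_te)
    print(d, X_tr.shape, X_te.shape)
```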

  216. emmanuelAugust 19, 2020 at 10:58 pm#

    Hello Jason, thank you for this tutorial which is very useful. I’m working on panel data right now, i.e. I observe certain variables on several individuals at different times. I have a dataset of 719 individuals and 11 variables observed daily over 10 years (2010 to 2019).

    Can we apply an LSTM model on these data?
    If yes, how should I prepare (reshape) the data?

    Thank you.

    • Jason BrownleeAugust 20, 2020 at 6:42 am#

      LSTM might be appropriate if each subject is a time series and you want to learn across subjects.

      This will help you prepare the data:
      https://machinelearningmastery.com/faq/single-faq/what-is-the-difference-between-samples-timesteps-and-features-for-lstm-input
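      As a rough sketch of the reshape for panel data, assuming the data is held as a NumPy array of shape [individuals, days, variables]: each individual's series is cut into non-overlapping windows of n_steps days, and the windows from all individuals are pooled into one [samples, timesteps, features] array, so a single model learns across subjects. The helper name `panel_to_samples`, the toy sizes and the non-overlapping windows are assumptions; matching targets would be sliced the same way.

```python
import numpy as np

def panel_to_samples(panel, n_steps):
    """panel: array [individuals, days, variables].
    Returns X [samples, n_steps, variables] built from non-overlapping
    windows of each individual's series, pooled across individuals."""
    n_ind, n_days, n_vars = panel.shape
    n_windows = n_days // n_steps            # drop any leftover days
    trimmed = panel[:, :n_windows * n_steps, :]
    # (ind, days, vars) -> (ind * windows, n_steps, vars), keeping each
    # window's days in order and each individual's windows together.
    return trimmed.reshape(n_ind * n_windows, n_steps, n_vars)

# Toy panel: 3 individuals, 10 days, 11 variables (the real data would
# have 719 individuals and roughly 3,650 days).
panel = np.arange(3 * 10 * 11, dtype=float).reshape(3, 10, 11)
X = panel_to_samples(panel, n_steps=5)
print(X.shape)  # (6, 5, 11)
```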

      • GuantaJanuary 19, 2023 at 5:28 am#

        Thank you Jason, Emmanuel. I read the link on data preparation; very useful. I have a question of clarification:

        I have panel data on 200 different companies; each company belongs to one of 12 sectors, labelled numerically as 1-12.

        For each company there are 8 different pieces of price information such as price, market capitalisation, volume, and so forth.

        I then have a column of the future company stock price, 10 days ahead. My aim is to predict this column.

        The date range is from 2010 – 2012. Weekly, 104 dates for all 200 companies.

        My understanding is that this means 200 samples, 104 timesteps, 9 features including the 10 day ahead stock price.

        Would this mean I need to train 200 different models? How would you go about this problem if you were given this dataset?

        Sorry if this is a daft question. I am new to ML.

  217. Malik ElamAugust 23, 2020 at 8:53 pm#

    Hi Jason,

    Your answers were always inspiring for me. I am always thankful for that.

    I have a question please.

    I’m working on a stock price forecasting problem.

    Taking t as the current time and t+1 as the future time, I prepared my dataset as Xt -> Yt+1, so that the data links the current input features with the future change in price.

    As I understand it, RNNs try to map Yt -> Yt+1. In my case I cannot include (Yt+1, Yt+2, …) in the training inputs, because when the trained model is used for forecasting these will be unknown future values that cannot be fed into the model.

    On the other hand, a dataset of Xt -> Yt does not contain the core information I want to forecast: the future price change of the stock.

    What would be your advice?

    How can I make use of, say, Xt-n, …, Xt, Yt-n, …, Yt to forecast Yt+1?
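    One common framing, sketched here under assumptions (toy numbers and a hypothetical `make_windows` helper), is to put the target's own history into the input window next to the features and use the next target value as the label. At forecast time only the most recent n_steps rows are needed, and all of them are already observed, so no future values have to be fed in.

```python
import numpy as np

def make_windows(features, target, n_steps):
    """features: [time, n_features]; target: [time].
    X[i] holds features AND target for steps i..i+n_steps-1 (t-n..t);
    y[i] is the target at the following step (t+1)."""
    data = np.column_stack([features, target])   # target is the last column
    X, y = [], []
    for i in range(len(data) - n_steps):
        X.append(data[i:i + n_steps, :])          # past X and past Y
        y.append(target[i + n_steps])             # next-step Y only
    return np.array(X), np.array(y)

features = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
target = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
X, y = make_windows(features, target, n_steps=3)
print(X.shape, y)  # (2, 3, 2) [40. 50.]
```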

  218. AsishAugust 25, 2020 at 4:29 am#

    Hi Jason, I have time series eye data such as diameter, number of blinks and fixation duration, and each feature has a different threshold; for example, a diameter above 3.5 indicates a high cognitive load. Which LSTM can I use for this dataset to measure cognitive load? Or would another ML model fit this problem better?

    • Jason BrownleeAugust 25, 2020 at 6:44 am#

      I recommend testing a suite of different models and model configurations, not just LSTMs, in order to discover what works best for your specific dataset.

      • AsishAugust 28, 2020 at 6:27 am#

        Hi Jason,
        Thanks for your suggestions.
        I don’t have ground truth data. I’m recording data using a device, and I’m thinking of asking the user to label the last 2 to 3 minutes of recorded data. The downside is that many rows get the same label. Is there any way to generate ground truth data?

        Thanks

        • Jason BrownleeAugust 28, 2020 at 6:58 am#

          You can take each candidate answer as a separate row, or try consolidating each row using the mode or mean estimate.
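          A minimal sketch of the "consolidate" option, assuming the recording is split into fixed-size blocks of rows (the `window` size is hypothetical, e.g. the number of rows in one labelled 2 to 3 minute segment): each block becomes one row holding the mean of its features and the mode of its labels.

```python
import numpy as np
from collections import Counter

def consolidate(rows, labels, window):
    """Collapse each block of `window` consecutive rows into one row:
    the mean of the features and the mode (most common) label.
    Any incomplete block at the end is dropped."""
    X, y = [], []
    for start in range(0, len(rows) - window + 1, window):
        block = rows[start:start + window]
        X.append(block.mean(axis=0))
        y.append(Counter(labels[start:start + window]).most_common(1)[0][0])
    return np.array(X), y

# Toy rows of (diameter, blinks) with one user label per row.
rows = np.array([[3.2, 10], [3.6, 12], [3.8, 11],
                 [2.9, 4], [3.0, 6], [3.1, 5]], dtype=float)
labels = ["high", "high", "low", "low", "low", "low"]
X, y = consolidate(rows, labels, window=3)
print(y)  # ['high', 'low']
```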

  219. Simon PERROTTAugust 25, 2020 at 6:52 pm#

    Thank you Jason,

    I love your articles so much I’ve bought several of your books which I find excellent.

    I have a data prep question….

    I’m training an LSTM multi-class classification model;
    I find that the classes in my training set (training data is chronologically before the val & test data) are very unbalanced.
    I’m particularly interested in the minority classes (their accurate prediction is more important to me).
    Given the dependent nature of time series observations and how I’m training in batches with each batch maintaining state (even in the stateless LSTM)…
    Am I correct in saying that I cannot upsample or downsample the training data to balance the classes? (Omitting or adding any data points, in this case there’s a data point for every day, would mess up the time series in a batch.)

    Do you have any advice for how I can balance out my training dataset?

    Appreciate your insight,
    Thanks,
    Simon

    • Jason BrownleeAugust 26, 2020 at 6:48 am#

      Thank you deeply Simon!

      Great question.

      First, select an appropriate metric, not accuracy.

      Second, try a cost-sensitive LSTM (and other neural nets). Try weights that balance the classes first, then try more aggressive over-corrective weights and see if you can do better.

      Finally, try simple duplication of input patterns for the minority class with Gaussian noise added to the observations, i.e. a primitive form of random oversampling.

      Let me know how you go.
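      The second and third suggestions can be sketched with NumPy, assuming the data is already framed as independent windows [samples, timesteps, features] with one class label per window. `balanced_class_weights` uses the common n_samples / (n_classes * count) heuristic; a dict like this can be passed to Keras `model.fit()` through its `class_weight` argument. `oversample_with_noise` duplicates whole minority-class windows, which keeps the timestep order inside each window intact. The function names, noise scale and toy data are illustrative assumptions.

```python
import numpy as np

def balanced_class_weights(y):
    """Weight each class by n_samples / (n_classes * class_count), so
    rare classes contribute as much total loss as common ones."""
    classes, counts = np.unique(y, return_counts=True)
    weights = len(y) / (len(classes) * counts)
    return {int(c): float(w) for c, w in zip(classes, weights)}

def oversample_with_noise(X, y, minority, n_copies, scale, seed=0):
    """Duplicate every minority-class window n_copies times and add
    Gaussian noise; whole windows are copied, so the order of the
    timesteps inside each window is preserved."""
    rng = np.random.default_rng(seed)
    idx = np.where(y == minority)[0]
    X_new = np.repeat(X[idx], n_copies, axis=0)
    X_new = X_new + rng.normal(0.0, scale, size=X_new.shape)
    y_new = np.full(len(X_new), minority)
    return np.concatenate([X, X_new]), np.concatenate([y, y_new])

y = np.array([0, 0, 0, 0, 0, 0, 1, 1])
X = np.zeros((8, 5, 3))                   # [samples, timesteps, features]
print(balanced_class_weights(y))          # {0: 0.6666666666666666, 1: 2.0}
X_bal, y_bal = oversample_with_noise(X, y, minority=1, n_copies=2, scale=0.01)
print(X_bal.shape, np.bincount(y_bal))    # (12, 5, 3) [6 6]
```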

      • Simon PerrottAugust 26, 2020 at 8:51 pm#

        Brilliant suggestions Jason, thank you!

        I’ll try those out. I’m learning a lot from you and appreciate your explanations.

  220. Khin Thida SanAugust 27, 2020 at 2:16 pm#

    Thanks so much for your articles. I have been learning deep NNs and LSTMs, and they help me a lot to understand the details and to build my own model for time series analysis.

  221. Shinichiro ImotoAugust 31, 2020 at 12:09 pm#

    Hi Jason!
    I always appreciate your blogs. They help me understand the ideas behind DNNs with valuable sample code.
    Now I’m struggling a little with a CNN + LSTM model for a multivariate, multi-step time series forecasting problem.
    I experimentally added a CNN before the LSTM layer, and your blog made me notice that I needed a TimeDistributed wrapper on the layers before the LSTM layers. To do so, I reshaped the input (and the x validation set) as follows.

    [Before adding CNN]
    InputLayer(input_shape=(x_train.shape[1], x_train.shape[2]), batch_size=BATCH_SIZE)
    x_train.shape[1]: time steps (e.g. 600)
    x_train.shape[2]: # of features (e.g. 4, since it’s multivariate)
    batch_size: I specified it as 128 or 256, since stateful=True is set in the LSTM arguments.

    [Now]
    InputLayer(input_shape=(x_train_multi.shape[1],x_train_multi.shape[2],x_train_multi.shape[3]), batch_size=BATCH_SIZE)
    x_train.shape[1]: subsequences (e.g. 600)
    x_train.shape[2]: time steps (e.g. 1)
    x_train.shape[3]: # of features, no change.
    batch_size: No change.
    I adjusted the ratio of [1]:[2], then found 600:1 is the best.

    After all, the following is my current model snippet.

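    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import InputLayer, TimeDistributed, Conv1D, MaxPooling1D, Flatten, LSTM, Dense

    model = Sequential()
    model.add(InputLayer(input_shape=(x_train_multi.shape[1], x_train_multi.shape[2], x_train_multi.shape[3]), batch_size=BATCH_SIZE))  # as in [Now] above
    model.add(TimeDistributed(Conv1D(filters=200, kernel_size=3, strides=1, padding="causal", activation="relu")))
    # model.add(TimeDistributed(MaxPooling1D(pool_size=2)))
    model.add(TimeDistributed(Flatten()))
    model.add(LSTM(150, stateful=True, return_sequences=True))
    model.add(LSTM(150, stateful=True, return_sequences=False))
    model.add(Dense(150, activation="relu"))
    model.add(Dense(8))  # forecast 8 time points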

    “fit()” works normally, and the accuracy is almost the same as before I added the CNN.
    However, when I enable the MaxPooling1D layer after Conv1D, it throws a ValueError regarding the input shape. When I remove padding="causal" from the Conv1D arguments, Conv1D throws the same ValueError too.

    I’m sorry for this long question, but if you see any wrong part especially about the input shape, please give me your comment.
    Thank you.

    • Jason BrownleeAugust 31, 2020 at 1:24 pm#

      Well done!

      Sorry, I don’t have a good off-the-cuff answer for you. You will need to tune the model for your problem, including ensuring the architecture is a good match for the shape of the data flowing through the model. I cannot debug the model for you.

      • Shinichiro ImotoSeptember 1, 2020 at 12:39 pm#

        Thank you for your reply, Jason!
        Your comment cheers me up since I’m the only one who is doing ML in my office.

        I found the cause of this ValueError: the number of “timesteps” has to be at least the MaxPooling1D pool size. As I posted, I reshaped the original 600 time steps into 600 x 1 (subsequences x “timesteps” in [samples, subsequences, “timesteps”, features]).
        It has to be 300 x *2* (or more), since the pool size is *2*.

        But the absence of errors does not mean that it is correct. I hope it will fit well.
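        The arithmetic behind this error can be sketched in plain Python, assuming stride 1 for Conv1D and the MaxPooling1D defaults (stride equal to pool_size, 'valid' padding). Inside TimeDistributed, both layers only see the timesteps of one subsequence, so with 1 timestep the pooled length drops to 0, and a 'valid' convolution with kernel_size=3 would give a negative length. The helper functions are hypothetical, not Keras APIs.

```python
def conv1d_len(n, kernel_size, padding):
    # 'valid' drops kernel_size - 1 steps; 'causal' and 'same' pad so the
    # output keeps n steps (stride 1 and dilation 1 assumed).
    return n - kernel_size + 1 if padding == "valid" else n

def maxpool1d_len(n, pool_size):
    # Default pooling: stride = pool_size, 'valid' padding.
    return n // pool_size

for steps in (1, 2):
    after_conv = conv1d_len(steps, kernel_size=3, padding="causal")
    print(steps, after_conv, maxpool1d_len(after_conv, pool_size=2))
# 1 1 0  -> pooling leaves zero steps, hence the ValueError
# 2 2 1
print(conv1d_len(1, kernel_size=3, padding="valid"))  # -1, also invalid
```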

  222. SuweiSeptember 9, 2020 at 10:40 pm#

    Hi Jason, thank you for your tutorial. I have a question: I want to predict floods, but my data is not continuous. For 2019 I have data for some weeks of months 5 and 8, and for 2020 I have data for months 3 and 6. How should I go about making the prediction?

  223. VictorSeptember 16, 2020 at 6:52 pm#

    Sorry Jason, I have read this many times, along with some of the questions people asked above, but I still don’t understand the difference between using a RepeatVector and an LSTM with return_sequences=True. Is there an easy way to understand the main difference? I would like to understand when each method is ideal to use.

    Much appreciated!

    • Jason BrownleeSeptember 17, 2020 at 6:43 am#

      RepeatVector uses the same single output vector from the encoder to create each output step of the decoder.

      With return_sequences=True, the encoder returns its output for each input time step.
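      A NumPy illustration of the shapes involved, assuming 2 samples, 3 input steps, 2 output steps and 4 LSTM units (all arbitrary). It mimics the tensors, not the Keras layers themselves: return_sequences=True yields one vector per input step, so its length is tied to n_steps_in, while RepeatVector(n_steps_out) copies the encoder's single final vector once per output step, so the decoder's length can differ from the input's.

```python
import numpy as np

samples, n_in, n_out, units = 2, 3, 2, 4
rng = np.random.default_rng(0)

# Stand-in for encoder hidden states at every input step,
# i.e. what LSTM(units, return_sequences=True) outputs.
all_states = rng.random((samples, n_in, units))
print(all_states.shape)   # (2, 3, 4): one vector per INPUT step

# LSTM(units) with return_sequences=False keeps only the last state.
last_state = all_states[:, -1, :]
print(last_state.shape)   # (2, 4)

# RepeatVector(n_out) copies that single vector once per OUTPUT step.
repeated = np.repeat(last_state[:, np.newaxis, :], n_out, axis=1)
print(repeated.shape)     # (2, 2, 4)
print(np.array_equal(repeated[:, 0], repeated[:, 1]))  # True
```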

  224. RudigerSeptember 18, 2020 at 6:28 am#

    Hi Jason,
    Thank you for this wonderful post!
    I have tried your multi-step “Vector Output Model” example with exactly the same numbers and the same code. Some of the important data:

    raw_seq = [10, 20, 30, 40, 50, 60, 70, 80, 90]
    n_steps_in, n_steps_out = 3, 2
    x_input = array([70, 80, 90])

    print(yhat)
    [[124.500435 137.70433 ]]

    I would expect yhat to be close to 100 and 110. Do you have an explanation for what is happening or what might be going wrong?

  225. RudigerSeptember 22, 2020 at 6:48 am#

    I am going back to your multi-step LSTM example.
    You have the following parameters in the example:
    raw_seq = [10, 20, 30, 40, 50, 60, 70, 80, 90]
    n_steps_in, n_steps_out = 3, 2
    I am wondering how many steps could be predicted at most.
    Imagine that I have a series of 100 time points. What would be the maximum reliable multi-step forecast, and the optimal split of X and y?
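    Apart from the hard limit n_steps_out <= L - n_steps_in, the data cost of a longer horizon is easy to count. A sketch, assuming the tutorial-style windowing (reproduced here as `split_sequence`): every window needs n_steps_in inputs followed by n_steps_out outputs, so a series of length L yields L - n_steps_in - n_steps_out + 1 training samples, and each extra output step removes one sample. How reliable a given horizon is still has to be measured on held-out data; the choice of n_steps_in=20 below is arbitrary.

```python
import numpy as np

def split_sequence(sequence, n_steps_in, n_steps_out):
    # Same windowing as the tutorial's multi-step split_sequence().
    X, y = [], []
    for i in range(len(sequence)):
        end_ix = i + n_steps_in
        out_end_ix = end_ix + n_steps_out
        if out_end_ix > len(sequence):
            break
        X.append(sequence[i:end_ix])
        y.append(sequence[end_ix:out_end_ix])
    return np.array(X), np.array(y)

def n_samples(series_length, n_steps_in, n_steps_out):
    # Number of windows split_sequence() produces.
    return max(series_length - n_steps_in - n_steps_out + 1, 0)

print(n_samples(9, 3, 2))  # 5, the raw_seq example above
for n_out in (1, 10, 30, 50):
    print(n_out, n_samples(100, 20, n_out))
# 1 80, 10 71, 30 51, 50 31
```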

  226. ArminOctober 6, 2020 at 6:42 am#

    Hello Jason,
    thank you very much for your great work. Please let me ask you two questions:

    a) Can I train (fit) an LSTM on a series with, e.g., 5 time steps, but make predictions with data that has 1 time step?

    b) Is there a relationship between the number of time steps and the number of hidden layers or neurons per layer?

    Thanks
    BR
    Armin

    • Jason BrownleeOctober 6, 2020 at 7:02 am#

      I don’t see why not. You might have to re-define the model input layer after it is fit.

      Yes, it varies for dataset and models. Run a sensitivity analysis to see how performance varies with model capacity in your case.

  227. ArminOctober 6, 2020 at 7:39 am#

    Thank you very much and greetings from Bavaria.

  228. jeanOctober 8, 2020 at 6:30 am#

    Thanks for the algorithms. I have a question about how to optimize the LSTM hyperparameters. Is there an algorithm that does this?

    • Jason BrownleeOctober 8, 2020 at 8:36 am#

      Yes, a grid search or a random search are a good start.
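      A minimal grid search sketch in plain Python, assuming each configuration is scored by a function that returns a validation error (lower is better). `toy_evaluate` is a hypothetical stand-in so the example runs; in practice it would build, fit and evaluate the LSTM, e.g. with walk-forward validation. A random search would instead sample a fixed number of configurations from the same grid rather than trying every combination.

```python
from itertools import product

def grid_search(evaluate, grid):
    """Score every combination in `grid` with `evaluate` (lower is
    better, e.g. validation RMSE) and return them sorted, best first."""
    keys = list(grid)
    results = []
    for values in product(*(grid[k] for k in keys)):
        config = dict(zip(keys, values))
        results.append((evaluate(config), config))
    return sorted(results, key=lambda r: r[0])

def toy_evaluate(config):
    # Hypothetical stand-in: in practice, build, fit and validate the
    # LSTM with these settings and return its validation error.
    return abs(config["n_units"] - 50) / 50 + abs(config["n_steps"] - 3)

grid = {"n_units": [25, 50, 100], "n_steps": [3, 6], "n_epochs": [100, 200]}
best_score, best_config = grid_search(toy_evaluate, grid)[0]
print(best_score, best_config)
```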

  229. ManojOctober 20, 2020 at 9:28 pm#

    I have difficulty understanding the LSTM input shape. For example, I have 50 videos; 25 of them are categorized as Awake (0) and 25 as Drowsy (1). I preprocessed them to extract the Eye Aspect Ratio and Mouth Aspect Ratio as features every second.

    Now my data has ( VideoFileName, Time Series, EAR, MAR, Label )

    Video1 1 0.30 0.25 0
    Video1 2 0.31 0.27 0
    Video1 3 0.35 0.25 0
    Video2 1 0.30 0.25 1
    Video2 2 0.27 0.28 1
    Video2 3 0.31 0.29 1
    Video2 4 0.33 0.30 1

    I extracted the data above from the first 3 and 4 seconds of the two videos respectively, since the videos can have different lengths.

    I have a very basic question here. How should I feed this data to the LSTM? Any code example would be fine. I know the input shape should be [Batch Size, Time Step, Features], but I’m confused about how to feed this to the LSTM: should I feed each video’s data in a loop?

    Please help me to clear my doubt.
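    One possible arrangement, sketched under assumptions: treat each video as one sample, its seconds as timesteps and EAR/MAR as the 2 features, and pad shorter videos to the longest length. All videos then go into a single model.fit() call as one [videos, max_seconds, 2] array with one label per video, rather than a loop. The helper name and zero padding are illustrative; a Keras `Masking(mask_value=0.0)` layer placed first can tell the LSTM to skip padded steps, provided 0.0 never occurs as a real observation.

```python
import numpy as np

# Rows: (video, second, EAR, MAR, label), as in the comment above.
rows = [
    ("Video1", 1, 0.30, 0.25, 0), ("Video1", 2, 0.31, 0.27, 0),
    ("Video1", 3, 0.35, 0.25, 0), ("Video2", 1, 0.30, 0.25, 1),
    ("Video2", 2, 0.27, 0.28, 1), ("Video2", 3, 0.31, 0.29, 1),
    ("Video2", 4, 0.33, 0.30, 1),
]

def to_lstm_input(rows, pad_value=0.0):
    """One sample per video: X [videos, max_seconds, 2], y [videos].
    Shorter videos are padded at the end with pad_value."""
    videos = {}
    for name, second, ear, mar, label in sorted(rows, key=lambda r: (r[0], r[1])):
        videos.setdefault(name, {"steps": [], "label": label})
        videos[name]["steps"].append([ear, mar])
    max_len = max(len(v["steps"]) for v in videos.values())
    X = np.full((len(videos), max_len, 2), pad_value)
    y = np.zeros(len(videos), dtype=int)
    for i, v in enumerate(videos.values()):
        X[i, :len(v["steps"])] = v["steps"]
        y[i] = v["label"]
    return X, y

X, y = to_lstm_input(rows)
print(X.shape, y)  # (2, 4, 2) [0 1]
```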

  230. AdrienOctober 23, 2020 at 1:21 am#

    Hello Jason,

    Great article!

    Just a quick question about the split_sequences() function for Multiple Input Multi-Step Output.

    On this line, you select only the first two features in X and the last feature in y.

    seq_x, seq_y = sequences[i:end_ix, :-1], sequences[end_ix-1:out_end_ix, -1]

    Why not include the 3 features in X?

    That is to say, use the 3 features to predict only the 3rd.

    Would that be a problem?

    Thank you
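    A sketch of the variant the question describes, assuming the tutorial's toy dataset (two input series and their sum as the output). In the quoted line, y starts at end_ix-1, the same time step as the last input row; that works because the target column is excluded from X, but if X keeps all 3 columns, the first output value would already sit in the input. The hypothetical `split_sequences_with_target` below therefore takes y strictly after the window, and the model's n_features would become 3.

```python
import numpy as np

def split_sequences_with_target(sequences, n_steps_in, n_steps_out):
    """Like split_sequences(), but X keeps ALL columns (including the
    target, the last column) and y is the target for the n_steps_out
    steps strictly AFTER the input window."""
    X, y = [], []
    for i in range(len(sequences)):
        end_ix = i + n_steps_in
        out_end_ix = end_ix + n_steps_out
        if out_end_ix > len(sequences):
            break
        X.append(sequences[i:end_ix, :])             # all 3 columns
        y.append(sequences[end_ix:out_end_ix, -1])   # future target only
    return np.array(X), np.array(y)

in_seq1 = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90])
in_seq2 = np.array([15, 25, 35, 45, 55, 65, 75, 85, 95])
out_seq = in_seq1 + in_seq2
dataset = np.column_stack([in_seq1, in_seq2, out_seq])
X, y = split_sequences_with_target(dataset, n_steps_in=3, n_steps_out=2)
print(X.shape, y.shape)  # (5, 3, 3) (5, 2)
print(y[0])              # [ 85 105]
```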

  231. Xu JiOctober 29, 2020 at 1:13 pm#

    Thank you very much for this. I learned a lot from your various posts, especially on LSTMs. Just wondering if you have a recommendation for using LSTMs for anomaly detection? Thank you!

  232. yjkNovember 3, 2020 at 12:33 am#

    Thanks for sharing this! I learned a lot about LSTM models, and I am wondering why you used a Vanilla LSTM in the ‘Multiple Input Series’ part, and why I can’t use other models such as a Stacked LSTM, Bidirectional LSTM, or ConvLSTM. Is it because of the dimensionality of the input?

    • Jason BrownleeNovember 3, 2020 at 6:55 am#

      In some cases, yes; in other cases it is because one model performs better than the others for a given dataset.

  233. EugeneNovember 15, 2020 at 1:25 pm#

    Thanks for sharing. How would you model a regression problem that predicts at various arbitrary time steps? For example, predicting an inflection point, where we are interested in both the value at which the inflection occurs and the time step at which it will happen, e.g. the next predicted inflection point at 12345 occurs at t+136.

    Would it be the same multi-step LSTM model as above, or is it a completely different problem? How can we approach this?

    • Jason BrownleeNovember 16, 2020 at 6:24 am#

      There are many ways to approach the problem, perhaps prototype a few and discover what works well/best for your dataset.

      e.g. time series classification – is an event expected to occur in the next interval.
      or multi-step forecast and use an if-statement to post-process the predictions.
      etc.
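      The second option might look like this, as a sketch under assumptions: the model outputs a multi-step forecast (index 0 is t+1), and a discrete inflection is taken to be the first step where the sign of the second difference flips. The helper `first_inflection` and the toy forecast are hypothetical; a curvature of exactly zero never counts as part of a flip.

```python
import numpy as np

def first_inflection(forecast):
    """Return (index, value) of the first forecast step whose curvature
    sign differs from the previous step's, or None. Curvature is the
    second difference; curvature[k] is centred on forecast[k + 1]."""
    curvature = np.diff(forecast, n=2)
    for k in range(1, len(curvature)):
        if curvature[k - 1] * curvature[k] < 0:   # the post-processing if-statement
            step = k + 1
            return step, float(forecast[step])
    return None

# Toy multi-step forecast: accelerating, then slowing down.
forecast = np.array([0.0, 1.0, 3.0, 6.0, 8.0, 9.0, 9.5])
print(first_inflection(forecast))  # (3, 6.0), i.e. at t+4
```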

  234. ArslanNovember 15, 2020 at 11:22 pm#

    First, thanks for this great article; I just found it on LinkedIn.

    Currently I am working on a project where I want to predict how many pieces of a material should be ordered for the next three months.
    I have monthly purchasing data for 20,000 materials (different time series), which correlate with each other in terms of seasonality but are very short (50-80 data points).
    For example:
    date | mat | amount | workload
    2020-08 | A | 20.0 | 0.8

    Does it make sense to build a LSTM model for this kind of problem?

    As regressors I could include the month (for seasonality) and also the workload for that month.
    I could train the model on all time series using 80-90% of the data points (and keep the other 10-20% as the test set).

    Maybe another model is better? (S)ARIMA is only a univariate approach, so I can’t include the workload.

    Thank you!

    • Jason BrownleeNovember 16, 2020 at 6:26 am#

      Good question, I recommend evaluating a suite of different algorithms/configs and discover what works well or best for your dataset.

      This framework may help:
      https://machinelearningmastery.com/how-to-develop-a-skilful-time-series-forecasting-model/

      • ArslanDecember 3, 2020 at 3:46 am#

        Thank you for your response!
        Do you have an approach for handling time series data that is affected by COVID-19?

        For example: I have time series from 2017 to 2020 on a monthly basis.
        For two months (March and April 2020) production went down, so I have small values for these two months and also some high values in the following months, because production went up again (in total there are outliers over 4-6 months).

        I have tried several approaches, but these COVID-19 outliers make it hard to get good forecasting results (in training and testing). Also, as I mentioned before, there are thousands of time series that are affected (differently, but of course with some correlation).

        Do you have any advice?

  235. KonstantinosNovember 25, 2020 at 7:42 am#

    Is the model trained only on the training data, or, for every prediction, is the actual observation added to the training data and the model retrained?

    • Jason BrownleeNovember 25, 2020 at 7:53 am#

      You can re-train the model as new observations become available if you like – both in walk forward validation and when the model is deployed.

  236. TiagoNovember 29, 2020 at 10:19 pm#

    Hi Jason, amazing article covering many of the shapes of the LSTM!

    I have one question:

    I am using PyTorch instead of Keras and would like to reproduce your vanilla LSTM. Could you please explain more about what the ‘input’ parameter of the LSTM is?

    Thanks!

  237. sarahDecember 7, 2020 at 6:39 am#

    Hi jason,

    I am trying to build an LSTM for time series data; unfortunately I am unable to reshape 4D input data into 3D input data to fit my LSTM model. Do you know how this is possible?
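    A minimal NumPy sketch, assuming the 4D array is laid out as [samples, subsequences, timesteps, features] (as in the CNN-LSTM framing) and that the subsequences are consecutive in time: merging the two middle axes gives the [samples, timesteps, features] shape an LSTM expects. If the axes are in a different order, they would need to be transposed first.

```python
import numpy as np

# Assumed 4D layout: [samples, subsequences, timesteps, features].
X4 = np.arange(2 * 2 * 3 * 4).reshape(2, 2, 3, 4)

# Merge the subsequence and timestep axes into a single time axis;
# the second subsequence's steps follow the first subsequence's.
samples, n_seq, n_steps, n_features = X4.shape
X3 = X4.reshape(samples, n_seq * n_steps, n_features)
print(X3.shape)  # (2, 6, 4)
```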

  238. Ítalo RomaniDecember 8, 2020 at 11:34 am#

    Hi Jason, I am trying to develop a custom loss function for my LSTM model, which is based on yours:

    model = Sequential()
    model.add(LSTM(neurons, activation='relu', input_shape=(n_steps_in, n_features)))
    model.add(RepeatVector(n_steps_out))
    model.add(LSTM(neurons, activation='relu', return_sequences=True))
    model.add(TimeDistributed(Dense(1)))
    model.compile(optimizer='adam', loss=my_loss, run_eagerly=True)

    The custom loss function my_loss receives the parameters (y_true, y_pred) from Keras; in my understanding they should have a shape like input_shape. However, regardless of the input shape I use, y_true comes with shape (32,1,1), even if I remove all layers and leave a bare Sequential model.

    I am trying to understand the logic of this, googled around but so far nothing helped me to explain this.

  239. Ítalo RomaniDecember 8, 2020 at 12:16 pm#

    Actually, I phrased this question confusingly. Let me try to explain again: imagine that I am trying to predict a series with a single (1-dimensional) value at each time step, so y_true and y_pred should have shape (total_time_steps,1). However, I always get shape (32,1,1), with values that bear no resemblance to the actual values.

    • Jason BrownleeDecember 8, 2020 at 1:32 pm#

      Sorry, I don’t understand what you’re asking exactly. Perhaps you can rephrase.

      • Italo RomaniDecember 8, 2020 at 9:14 pm#

        Hi Jason, thanks so much for the reply. I did a further search and found the answers to my problem. First, y_true and y_pred come with sizes defined by batch_size; second, by default their values come shuffled, so I have to use shuffle=False. Third, and most important, I don’t know if what I am trying to do is even possible with Keras, because all the operations in the loss function have to use tensor operations; otherwise the loss function cannot provide gradients to the optimiser. My intended loss function goes sequentially over each element of y_true and y_pred, compares each pair and updates an accumulation function not definable by custom algebraic/symbolic functions. It’s a bit large and too specific to share here, but if you are interested I can share the details of what I am trying to do.

        • Italo RomaniDecember 8, 2020 at 9:23 pm#

          Perhaps there’s an optimiser in Keras that does not require gradients, but not that I know of

        • Jason BrownleeDecember 9, 2020 at 6:18 am#

          Well done!

  240. JamDecember 9, 2020 at 11:48 am#

    Hi Jason, I learn a lot from your articles. Could you please help with a network? I have an input of, say, shape (4, 10, 2), where (10, 2) are the time steps and features, respectively. There is a lot of data in this shape, and for each one I propose to train an LSTM and then apply a convolution layer across them. So with a Conv1D(1), I expect the output (3, 10, 2).
    Please correct me if I am wrong. I reshaped the data into (1, 4, 10, 2) and then used a TimeDistributed wrapper for prediction, but then I am not able to convolve over shape[0] (i.e. 4); what I get is a convolution over shape[2] (i.e. 2). Can you help me arrange the data for the network, or tell me whether my network is correct?

    • Jason BrownleeDecember 9, 2020 at 1:26 pm#

      Typically you would use a CNN followed by an LSTM, not the other way around.

      I have not tried LSTM-CNN, but I expect it would be challenging and you may need to debug the model yourself.

  241. John WhiteDecember 25, 2020 at 2:42 pm#

    Hello Jason!

    First off, thanks for being here for my machine learning journey! So I have a base scenario to check for understanding:

    Context: I have a supervised binary classification dataset on weather temperatures with 4 features. The target variable at time t is 0 or 1: 0 if the temperature at t+30 is down, 1 if it is up.

    Framing the Problem for LSTM: Say timesteps is 60. So we take the previous 60 timesteps of data to predict 0 or 1 for t+1. In doing so, we can predict if the weather temperature is up or down in 30 days. Input shape would be (60, 4). I would have to chunk the training dataset and reshape it to be compatible with the (60, 4) input shape.

    Is my understanding correct? Thank you!

  242. MaryDecember 27, 2020 at 5:24 am#

    Dear Jason,
    The tutorial was really useful, as always.
    But I have not seen in your tutorials that you applied any **BiLSTM** network for regression to predict **multivariate and multi-step ahead** data.

    I have created a BiLSTM to forecast 9 features 3 time steps ahead.

    model = Sequential()
    model.add(Bidirectional(LSTM(200, return_sequences=True), activation='relu', input_shape=(n_steps_in, n_features)))
    model.add(RepeatVector(n_steps_out))
    model.add(Dropout(0.5))
    model.add(Bidirectional(LSTM(100, activation='relu', return_sequences=False)))
    model.add(Dense(3))
    model.add(TimeDistributed(Dense(n_features)))
    model.compile(optimizer='adam', loss='mse')

    I am really eager to know whether the given model is correct or not.
    Moreover, the output predictions are not good, so I would like to know the answers to some questions.

    1- Is it common to use a **BiLSTM** for regression in the case of **multivariate and multi-step ahead** forecasting?

    2- What is the best model for regression in the case of **multivariate and multi-step ahead** forecasting?

    3- Is the given model created correctly or not?

    I am really sorry for writing so much, but I am really looking forward to getting an answer.

    Best
    Mary

    • Jason BrownleeDecember 27, 2020 at 6:14 am#

      No, typically bidirectional LSTMs are not used in the encoder-decoder architecture, but I don’t see any reason why they couldn’t be used.

      We cannot know the best model for a given dataset; the job of a machine learning practitioner is to use careful experiments and discover what works well or best.

  243. MaryDecember 27, 2020 at 7:16 am#

    Dear Jason,
    I really appreciate your quick reply.

    But I did not get the answer to this question:

    1- Is the below architecture correct logically?

    ( I am a beginner in using Bilstm in regression, so I am not sure whether I made the layers correctly or not)?

    model = Sequential()
    model.add(Bidirectional(LSTM(200, return_sequences=True), activation='relu', input_shape=(n_steps_in, n_features)))
    model.add(RepeatVector(n_steps_out))
    model.add(Dropout(0.5))
    model.add(Bidirectional(LSTM(100, activation='relu', return_sequences=False)))
    model.add(Dense(3))
    model.add(TimeDistributed(Dense(n_features)))
    model.compile(optimizer='adam', loss='mse')

    I have created a BiLSTM to forecast 9 features 3 time steps ahead.

    2- Is it common to use a **BiLSTM** for regression in the case of **multivariate and multi-step ahead** forecasting?

    I am really looking forward to your clear answer, as I did not understand the meaning of your previous answer.

    Best

    Mary

    • Jason BrownleeDecember 27, 2020 at 9:25 am#

      I don’t have the capacity to review and comment on your model architecture:
      https://machinelearningmastery.com/faq/single-faq/can-you-read-review-or-debug-my-code

      LSTMs are used for sequence prediction, not regression. In numeric sequence prediction, bidirectional are rarely used – but if it gives the best results for your dataset, then use it.

      • MaryDecember 28, 2020 at 4:48 am#

        Dear Jason,
        I am grateful for your reply.
        As you mentioned “LSTMs are used for sequence prediction, not regression”,

        So do you have a tutorial post that introduces the best techniques for numeric sequence regression?

        I cannot differentiate between sequence prediction and sequence regression.

        I want to predict 9 features 3 time steps ahead.

        Would you please introduce me to some useful methods?

        Best
        Mary

        • Jason BrownleeDecember 28, 2020 at 6:04 am#

          “regression” is a row of data without sequence.

          “sequence prediction” or “sequence regression” are the same kind of thing. The above examples fall into this.

          You cannot accurately refer to “sequence prediction” or “sequence regression” as “regression” as an LSTM cannot be used for the latter, but can be used for the former.

          I hope that is clearer.

          • MaryDecember 29, 2020 at 10:37 pm#

            Dear Jason,
            Thank you very much for the time you spent answering so clearly.

            Best

            Mary

          • Jason BrownleeDecember 30, 2020 at 6:37 am#

            You’re welcome.

  244. DevendraJanuary 4, 2021 at 1:30 am#

    I want to make an earthquake prediction using an RNN (LSTM). I am having difficulty with the code. Can you please help me?

    • Jason BrownleeJanuary 4, 2021 at 6:09 am#

      Is there a specific problem you are having that I can perhaps address?

      • DevendraJanuary 4, 2021 at 3:25 pm#

        Can I get it from the code you have provided here? If yes, which LSTM should I follow?

        • Jason BrownleeJanuary 5, 2021 at 6:16 am#

          I have no examples of “earthquake prediction”.

          Perhaps you can start with a model listed above and adapt it for your specific dataset.

  245. Bensayah AbdallahJanuary 8, 2021 at 5:25 am#

    Many thanks

    This is my situation: I have several companies, each described by 20 measurable features that vary from year to year. For ten years, we have a binary classification of Fail/Success.
    My question is: what model is adequate for this problem, to train the machine to predict the probable success or failure of a given company from its successive features?

    Many thanks

  246. HninJanuary 9, 2021 at 8:10 pm#

    Thanks Jason for the insights. I have one question regarding ConvLSTM. It can extract spatio-temporal features. How can we input the spatial data? Is there example code for it?

    • Jason BrownleeJanuary 10, 2021 at 5:39 am#

      Yes, ConvLSTM is designed for spatio-temporal data.

      It takes a sequence of images as input.
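      As a sketch of what "a sequence of images" means in array terms, assuming hypothetical gridded data (100 time steps of a 10 x 12 grid with 2 variables per cell): each sample stacks several consecutive frames, giving the 5D [samples, timesteps, rows, columns, channels] input that ConvLSTM2D expects with its default channels_last format. The grid sizes, the helper name `make_clips` and the next-frame target are assumptions.

```python
import numpy as np

# Hypothetical spatio-temporal data: 100 time steps of a 10 x 12 grid
# with 2 measured variables (channels) per grid cell.
frames = np.random.default_rng(0).random((100, 10, 12, 2))

def make_clips(frames, n_steps):
    """Stack n_steps consecutive frames per sample, giving
    [samples, timesteps, rows, columns, channels]; the target for each
    sample is the frame that follows the clip."""
    X, y = [], []
    for i in range(len(frames) - n_steps):
        X.append(frames[i:i + n_steps])
        y.append(frames[i + n_steps])
    return np.array(X), np.array(y)

X, y = make_clips(frames, n_steps=5)
print(X.shape, y.shape)  # (95, 5, 10, 12, 2) (95, 10, 12, 2)
```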

      • HninJanuary 13, 2021 at 6:29 pm#

        Thanks a lot for your kind reply.

        Do you mean that, to extract spatio-temporal features, we need to input a sequence of images rather than a sequence of values?

  247. HyundongJanuary 11, 2021 at 7:44 pm#

    Thank you for your great tutorial. I learned a lot from this article, but I have a question about the number of samples.
    If [10, 20, 30, 40, 50, 60, 70, 80, 90] is one sample, I have about 10,000 such samples; each is independent and has the same characteristics.
    For instance, [10.01, 20, 30.035, 40.102, 50.1, 60, 70.364, 80.112, 90.623], [10.541, 20.983, 30.097, 40.152, 50.2, 60.942, 70.73, 80, 90.53], [10.543, 20.486, 30.897, 40, 50.766, 60.519, 70.132, 80.11, 90.445], …
    In this case, I am wondering if there is a way to use all 10,000 samples to train the model.

    Thank you.
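    One way to use all of them, sketched under assumptions: window each independent sequence on its own so that no window spans two sequences, then concatenate the windows into one training set. `split_sequence` mirrors the tutorial's univariate windowing; `split_many` and the use of the three sequences quoted above are illustrative.

```python
import numpy as np

def split_sequence(sequence, n_steps):
    # Tutorial-style univariate windowing: n_steps inputs -> next value.
    X, y = [], []
    for i in range(len(sequence) - n_steps):
        X.append(sequence[i:i + n_steps])
        y.append(sequence[i + n_steps])
    return np.array(X), np.array(y)

def split_many(sequences, n_steps):
    """Window each independent sequence separately (so no window spans
    two sequences), then pool all windows into one training set."""
    parts = [split_sequence(np.asarray(s, dtype=float), n_steps) for s in sequences]
    X = np.concatenate([p[0] for p in parts])
    y = np.concatenate([p[1] for p in parts])
    return X.reshape(X.shape[0], n_steps, 1), y  # [samples, timesteps, features]

sequences = [
    [10.01, 20, 30.035, 40.102, 50.1, 60, 70.364, 80.112, 90.623],
    [10.541, 20.983, 30.097, 40.152, 50.2, 60.942, 70.73, 80, 90.53],
    [10.543, 20.486, 30.897, 40, 50.766, 60.519, 70.132, 80.11, 90.445],
]
X, y = split_many(sequences, n_steps=3)
print(X.shape, y.shape)  # (18, 3, 1) (18,)
```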

  248. YilmaJanuary 12, 2021 at 12:50 am#

    Dear Jason,
    I am not familiar with Python. Do you have this tutorial in R?
    Best
    Yilma

  249. AlexJanuary 28, 2021 at 2:06 am#

    Hi Jason

    Regarding the ‘Multiple Parallel Series’ case: my problem has 150k time series, and for each one I need to predict the future value.
    I guess this means that I will have 150k features.
    So the input array for my LSTM NN will have dimensions [n_samples, n_steps, 150k].
    The size of the array is too large! I get the error:
    ‘Unable to allocate 606. MiB for an array with shape (365, 3, 150000) and data type float32’.

    What should I do? is this the right way to approach the problem?

    Many thanks!

  250. AlexJanuary 29, 2021 at 2:21 am#

    Hi Jason

    I want to train my LSTM NN with random samples taken from a timeseries.
    Should I normalize the whole series or each sample individually?

    Thanks

  251. Raheel AnjumFebruary 8, 2021 at 1:51 pm#

    Hi Jason,
    I intend to do my research on electricity price forecasting. I have 6 years of electricity price data, from 2012 to 2017, and I need to forecast the values for 2017 using a neural network autoregressive (NN-AR) model in R. The data set ranges from January 1st 2012 to December 31st 2017 (52608 observations, covering 2192 days). Each day of the data set comprises 24 observations, where each observation corresponds to a load period.

    For modeling and forecasting purposes, the data set is further divided into two sets: January 1st 2012 to December 31st 2016 (43848 observations, covering 1827 days) for identification and estimation of the models, and January 1st 2017 to December 31st 2017 (8760 observations, covering 365 days) for evaluating the one-day-ahead out-of-sample forecasting accuracy of the models.

    I need your help. I’ve tried searching but couldn’t find specific code for one-day-ahead forecasting with NN-AR. Can you kindly send me the code for neural network autoregression to make one-day-ahead out-of-sample forecasts for the complete year 2017? I will be highly obliged for this favor. Thank you and have a nice day.

  252. NigelFebruary 14, 2021 at 12:32 pm#

    Hi Jason,

    Great book! Wish I understood more, but I’m on my way.

    About the tutorial. You’ve stacked the output sequence with the input sequence, and I’m trying to understand how it differentiates x from y.

    Let’s say I have 10 input_seq and 1 out_seq; how would you approach this?

    I tried it myself with some random numbers, but the code predicts all values along the x-axis, which takes forever with an LSTM. Should I stack the out_seq at the end of the input_seqs?

    Thanks in advance!

    • Jason BrownleeFebruary 14, 2021 at 2:17 pm#

      Thanks.

      They are past observations of the target that we believe will help to predict future values of the target.
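      The split between X and y comes from column position. A sketch, assuming 10 hypothetical input series and an output series stacked as the last column: `split_sequences` (written as in the tutorial's multiple input series example) slices every column except the last into X, and the last column at the final step of each window into y, so only the output series is predicted and n_features is 10.

```python
import numpy as np

def split_sequences(sequences, n_steps):
    # As in the tutorial: X uses every column except the last,
    # y is the last column at the final step of each window.
    X, y = [], []
    for i in range(len(sequences) - n_steps + 1):
        end_ix = i + n_steps
        X.append(sequences[i:end_ix, :-1])
        y.append(sequences[end_ix - 1, -1])
    return np.array(X), np.array(y)

rng = np.random.default_rng(0)
inputs = rng.random((20, 10))        # 10 input series, 20 time steps
output = inputs.sum(axis=1)          # hypothetical output series
dataset = np.hstack([inputs, output.reshape(-1, 1)])  # output goes LAST
X, y = split_sequences(dataset, n_steps=3)
print(X.shape, y.shape)  # (18, 3, 10) (18,)
```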

  253. RobetFebruary 16, 2021 at 8:08 am#

    Hello I am new to machine learning and trying to wrap my head around some of the examples to find the best use cases for each.

    In the section on ‘Multiple Parallel Series’, is this processed as multiple parallel univariate predictions or multiple multivariate predictions?

    I am looking for a solution where it is the latter. I was considering creating separate multivariate models for each output, but I am wondering if the parallel series might be the better way to go.

    • Jason Brownlee, February 16, 2021 at 10:04 am

      Multiple parallel univariate time series, which is a multivariate input time series.

      Perhaps experiment with a few of the approaches and see what is a good fit for your data.
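For the ‘Multiple Parallel Series’ framing, a small sketch of the tutorial-style `split_sequences()` shows what the model actually sees: each sample contains every series over n_steps rows, and the target is the next row of all series. So a single model predicts all series jointly, and each predicted value can depend on every input series. The toy dataset mirrors the tutorial’s (two input series and their sum).

```python
import numpy as np

def split_sequences(sequences, n_steps):
    """Windows of all series -> next value of all series (parallel framing)."""
    X, y = [], []
    for i in range(len(sequences)):
        end_ix = i + n_steps
        if end_ix > len(sequences) - 1:
            break
        X.append(sequences[i:end_ix, :])  # every series, n_steps rows
        y.append(sequences[end_ix, :])    # the following row of every series
    return np.array(X), np.array(y)

in_seq1 = np.arange(10, 100, 10)
in_seq2 = np.arange(15, 105, 10)
out_seq = in_seq1 + in_seq2
dataset = np.column_stack((in_seq1, in_seq2, out_seq))

X, y = split_sequences(dataset, n_steps=3)
print(X.shape, y.shape)  # (6, 3, 3) (6, 3)
print(X[0].tolist(), y[0].tolist())  # [[10, 15, 25], [20, 25, 45], [30, 35, 65]] [40, 45, 85]
```

In the tutorial, the model’s final layer is then `Dense(n_features)`, one output unit per series.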

  254. Gerard Church, February 16, 2021 at 9:54 am

    Hi Jason,

    Really great and informative article. It is my first time working with LSTMs, but the input format is really clear and has been easy to understand.

    I am trying to adapt this to a problem I am trying to solve. I am trying to predict net income on a financial income statement from 31 balance sheet and income statement items. I am using 3 years of quarterly data to predict this, thus 12 time steps. For each yhat, my x_train contains 12 lists, one per quarter, each containing the 31 independent balance sheet/income statement variables being used to try to predict yhat.

    Because my y_train has a length of 63, my input data is 63 x 12 x 31. This is stored as a list of arrays, each with 12 lists containing the 31 variable values for each quarter. The LSTM model really doesn’t like this format and gives the error:

    ValueError: Failed to find data adapter that can handle input: ( containing values of types {“”}), ( containing values of types {“”})

    Do you have any advice as to how to format this input into my LSTM? Hope the question is clear and thanks for the help!
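One likely cause of that error is handing Keras a Python list of arrays (or nested lists) rather than a single NumPy array. A minimal sketch, assuming the 63 samples are held in a list named `x_list` (a hypothetical name, filled here with random numbers only so the snippet runs), stacks them into the [samples, timesteps, features] array an LSTM expects:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in: 63 samples, each 12 quarters x 31 line items
x_list = [rng.normal(size=(12, 31)) for _ in range(63)]
y_list = rng.normal(size=63).tolist()

# Stack into one float32 array of shape [samples, timesteps, features]
X = np.stack(x_list).astype("float32")
y = np.asarray(y_list, dtype="float32")
print(X.shape, y.shape)  # (63, 12, 31) (63,)
```

The LSTM’s input_shape would then be (12, 31). `np.stack` raises a ValueError if any sample has a different shape, which is a quick way to find ragged entries.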

  255. Anshuka Anshuka, February 24, 2021 at 3:05 pm

    Hi Jason,

    I have a question regarding multivariate predictions.

    Say, for example, I have two multivariate datasets, each with parallel input series.

    How can we use dataset (X), which is multivariate and has parallel input time series, to predict dataset (Y), which is also a multivariate dataset with parallel input series?

    Looking forward to your response.

    • Jason Brownlee, February 25, 2021 at 5:23 am

      The above examples under “Multivariate LSTM Models” can be used as a starting point and adapted directly.

  256. Alireza, February 26, 2021 at 3:01 am

    Hi Jason,

    Do you have any examples of univariate multi-step time series forecasting?

    Thanks

    • Jason Brownlee, February 26, 2021 at 5:04 am

      Yes many, you can use the search box at the top of the page.

  257. H., March 2, 2021 at 6:57 am

    There is an issue with the line


    model.add(LSTM(100, activation='relu', input_shape=(n_steps_in, n_features)))

    from the sections ‘Vector Output Model’ and ‘Encoder-Decoder Model’,

    since the following exception is thrown:


    NotImplementedError: Cannot convert a symbolic Tensor (lstm/strided_slice:0) to a numpy array. This error may indicate that you’re trying to pass a Tensor to a NumPy call, which is not supported

    How can this be resolved?

  258. Martin, March 2, 2021 at 5:22 pm

    Hello Jason,

    Thanks for this brilliant blog post. It has really been helpful to me.

    However, I have a real-world spatio-temporal traffic dataset, and I reckon that the procedure to model it as a supervised learning problem would be quite different from a multivariate time series (as the order of the spatial variables matters).

    As an example, take the spatio-temporal matrix:

       | T1 | T2 | T3 | T4 | T5 | T6 |
    S1 | 67 | 34 | 24 | 54 | 49 | 67 |
    S2 | 61 | 55 | 23 | 42 | 53 | 78 |
    S3 | 74 | 83 | 55 | 50 | 62 | 68 |
    S4 | 48 | 73 | 78 | 56 | 61 | 78 |
    S5 | 80 | 58 | 67 | 54 | 51 | 89 |

    where the rows represent the spatial identity (the position of the detectors) and the columns represent the time interval for collection of the data.

    In formulating this as a supervised learning problem with 5 time steps per sample and a 1-step prediction made at S3, would this be a logical formulation?

    Input:
       | T1 | T2 | T3 | T4 | T5 |
    S1 | 67 | 34 | 24 | 54 | 49 |
    S2 | 61 | 55 | 23 | 42 | 53 |
    S3 | 74 | 83 | 55 | 50 | 62 |
    S4 | 48 | 73 | 78 | 56 | 61 |
    S5 | 80 | 58 | 67 | 54 | 51 |

    Output:
    68

    Also, since I am working with a real spatio-temporal dataset, do I need to split the samples into subsequences when using the ConvLSTM module?

    If not, for the example above, would this input to the ConvLSTM be correct:
    [no of samples, time-step=5, rows=spatial, columns=temporal, features=1]

    • Jason Brownlee, March 3, 2021 at 5:26 am

      It’s hard to be prescriptive, perhaps experiment and see what works/makes sense for your dataset.
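One possible way to frame the matrix above, sketched under the assumption that the target is S3 one step after a 5-step window: slide the window along the time axis, transpose each window so time is the first axis, and take S3 at the next time step as y. The same windows can be reshaped to the 5D `(samples, time, rows, cols, channels)` layout that Keras’ `ConvLSTM2D` expects with channels-last data, treating the 5 detectors as a 5 x 1 "image" per time step. The helper `spatio_temporal_windows()` is hypothetical, and whether that spatial layout is sensible depends on the detector geometry.

```python
import numpy as np

# Rows are detectors S1..S5, columns are time intervals T1..T6 (values from the comment)
M = np.array([
    [67, 34, 24, 54, 49, 67],
    [61, 55, 23, 42, 53, 78],
    [74, 83, 55, 50, 62, 68],
    [48, 73, 78, 56, 61, 78],
    [80, 58, 67, 54, 51, 89],
])

def spatio_temporal_windows(matrix, n_steps, target_row):
    X, y = [], []
    n_times = matrix.shape[1]
    for t in range(n_times - n_steps):
        X.append(matrix[:, t:t + n_steps].T)       # (time, detectors)
        y.append(matrix[target_row, t + n_steps])  # target detector, next step
    return np.array(X), np.array(y)

X, y = spatio_temporal_windows(M, n_steps=5, target_row=2)  # row 2 is S3
print(X.shape, y)  # (1, 5, 5) [68]

# Same data in a 5D (samples, time, rows, cols, channels) layout for ConvLSTM2D
X_conv = X.reshape(X.shape[0], 5, 5, 1, 1)
print(X_conv.shape)  # (1, 5, 5, 1, 1)
```

Note that each sample here is the transpose of the "Input" table in the comment: time runs down the first axis, as Keras recurrent layers expect.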

  259. Rigveda Sengupta, March 3, 2021 at 11:25 pm

    Hi, just a quick question: I am working with multiple multivariate time series. Will the structure remain the same as in the Multiple Input Series model discussed above?

  260. Zhou, March 5, 2021 at 11:45 am

    Hi Jason,
    Thank you for such an informative tutorial.
    But I’m having problems using the ConvLSTM module for multivariate time series prediction. I hope you can answer this for me; it is really important to me and I would appreciate it.

    My goal is to learn from the training dataset to perform outlier detection on the test dataset. If the test set has no outliers, the ConvLSTM module predicts well. However, when I add outliers, the predictions change and I can’t do outlier detection. I can’t explain it very well.

    I can only give a simple example.
    Suppose a feature in my training set is [1, 2, 3, 4, 5, 6, 7]
    and the corresponding test set is [8, 9, 10, 11, 11, 11, 14].

    Ideally, the prediction generated by learning the training set would be [8, 9, 10, 11, 12, 13, 14], which would show that there are 2 outliers in my test set. But in reality I get predictions similar to [8, 9, 10, 11, 11.1241, 11.3661, 14].

    Questions:
    1) The predictions and the test data are too close to each other, so I can’t do outlier detection.
    2) How can I use the ConvLSTM module to perform multi-step prediction for multivariate sequences? I guess the reason for the first problem is that I am using the ConvLSTM module for single-step time series prediction.
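The numbers in the example above show why a residual-based check struggles here. A minimal sketch, assuming points are flagged where |observed - predicted| exceeds a hypothetical threshold of 0.5 (the helper `flag_outliers()` is also hypothetical): with the ideal predictions two points are flagged, but with the predictions actually obtained none are. A one-step model whose input window already contains the anomalous observations will plausibly track them, which shrinks the residuals.

```python
import numpy as np

test = np.array([8, 9, 10, 11, 11, 11, 14], dtype=float)
ideal_pred = np.array([8, 9, 10, 11, 12, 13, 14], dtype=float)
actual_pred = np.array([8, 9, 10, 11, 11.1241, 11.3661, 14])

def flag_outliers(observed, predicted, threshold):
    """Return the indices where the absolute residual exceeds the threshold."""
    residuals = np.abs(observed - predicted)
    return np.flatnonzero(residuals > threshold)

print(flag_outliers(test, ideal_pred, 0.5))   # [4 5]
print(flag_outliers(test, actual_pred, 0.5))  # []
```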

    • Jason Brownlee, March 5, 2021 at 1:38 pm

      You’re welcome.

      Sorry, I don’t understand your first question. Outlier detection would probably occur prior to modeling, as a data prep step.

      You can perform multi-step prediction a few ways, all described above, e.g. a vector output model or an encoder-decoder model that outputs each time step.

  261. ZHAO, WENYU, March 8, 2021 at 8:47 pm

    Hi Jason!

    I would like to ask another question. After training a multivariate LSTM model, how do we know if the model is good or not?

  262. Aditya, March 12, 2021 at 6:35 am

    Hi Jason,

    I’m currently working on stock price prediction. So far, I’ve used ONLY the historical end-of-day ‘Closing’ prices as univariate sequences. My aim is to further improve the model by giving it more than just old ‘Closing’ prices: I want to give it the open, high and low prices too. From your article, I understand that I can achieve this using multivariate sequences. I have gained so much knowledge from this, and I can make my project even better.

    Thanks a lot! I would be really happy if you can give me some tips!

  263. Hooman, March 19, 2021 at 3:32 am

    Hello Jason,

    Now I know how to develop a multivariate multi-step forecasting model for the hourly weather forecasting task.

    But if we are also given a day-ahead “weather guess” dataset, how can I use these guessed values in a model? Do you know of any tutorial or blog post?

    In fact, we have a history of guesses and a history of actual values.
    Then a day-ahead guess is passed to us, and we should make an accurate prediction using the history of these guesses and the actual values.

  264. Michele, April 1, 2021 at 1:28 am

    Hi,

    Thank you for this nice tutorial.
    I would like to know how to modify the multivariate multi-step forecasting example in order to use Keras’ SimpleRNN instead of LSTM.
    In particular, I would like to use an Elman RNN. I have read that it can be implemented by connecting one SimpleRNN layer with a TimeDistributed(Dense) layer, but it is not clear to me how to do it.

    I have tried the following code:

    model = Sequential()
    model.add(SimpleRNN(100, return_sequences=True, input_shape=(n_steps_in, n_features)))
    model.add(TimeDistributed(Dense(n_steps_out, activation='tanh')))
    model.compile(optimizer='rmsprop', loss='mse')

    # fit model
    model.fit(X, y, epochs=200, verbose=0)

    but fit() fails raising the error:

    tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [6,3,2] vs. [6,2]

    Thank you in advance

    • Jason Brownlee, April 1, 2021 at 8:19 am

      You’re welcome.

      Sorry, I don’t have an example, perhaps use a little trial and error and discover how to make the required changes.

  265. Green, April 1, 2021 at 5:17 am

    Hello Jason,

    Is it possible to use an LSTM without time, just for coordinates? Input = coordinates, output = a value (like temperature), for an extrapolation or interpolation task.

  266. Michele, April 3, 2021 at 12:20 am

    Actually, I am using your multivariate multi-step example (version with one LSTM layer), just replacing LSTM with SimpleRNN and Dense with RimeDistributed(Dense).

    Apparently, the problem is the shape of the y data structure. I made the following change:

    X, y = split_sequences(dataset, n_steps_in, n_steps_out)
    y = y.reshape(y.shape[0], 1, y.shape[1])  # <-- added this line
    print(X.shape, y.shape)

    Now, the model design, train and test is:

    # define model (Elman RNN)
    model = Sequential()
    model.add(SimpleRNN(100, activation="sigmoid", return_sequences=True, input_shape=(n_steps_in, n_features)))
    model.add(TimeDistributed(Dense(n_steps_out, activation='tanh')))
    model.compile(optimizer='rmsprop', loss='mse')
    # fit model
    model.fit(X, y, epochs=200, verbose=0)
    # demonstrate prediction
    x_input = array([[70, 75], [80, 85], [90, 95]])
    x_input = x_input.reshape((1, n_steps_in, n_features))
    yhat = model.predict(x_input, verbose=0)
    print(yhat)

    The resulting yhat is:

    [[[1. 1.]
    [1. 1.]
    [1. 1.]]]

    which is not good in shape and values. What am I still missing?

    • Jason Brownlee, April 3, 2021 at 5:34 am

      Sorry, I have not used “SimpleRNN” and “RimeDistributed”. I don’t know the cause of your problem.

      Perhaps these tips will help:
      https://machinelearningmastery.com/faq/single-faq/can-you-read-review-or-debug-my-code

      • Michele, April 3, 2021 at 8:14 am

        RimeDistributed was a typo; I actually meant TimeDistributed.

        I was not asking you to debug my code, of course.

        Recently, I purchased a couple of your books, which unfortunately do not help me solve the problem. I thought you would at least be able to provide useful hints, not just a link to the FAQ.

        Nevermind, I will find the solution and publish it for free. 😀
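Two details in the model above may explain the output. First, with `return_sequences=True`, `TimeDistributed(Dense(n_steps_out))` emits one vector per input step, so a (1, 3, 2) prediction is to be expected; the tutorial’s vector-output model instead uses `return_sequences=False` with a plain `Dense(n_steps_out)`. Second, a `tanh` output activation can only produce values in (-1, 1), while the tutorial’s toy targets are in the tens to hundreds, so the predictions saturate at 1. The NumPy sketch below (not Keras code) shows the saturation and one common remedy: scaling the targets into [-1, 1] and mapping predictions back. Using a linear output activation is another option.

```python
import numpy as np

# tanh squashes any pre-activation into (-1, 1); large inputs saturate at 1.0
print(np.tanh(np.array([0.5, 5.0, 100.0])))

# Hypothetical targets in the same range as the tutorial's output series
y = np.array([[65.0, 85.0], [85.0, 105.0], [105.0, 125.0]])

# Min-max scale the targets into [-1, 1] so a tanh output layer can reach them
y_min, y_max = y.min(), y.max()
y_scaled = 2 * (y - y_min) / (y_max - y_min) - 1

def inverse_scale(pred):
    """Map model outputs in [-1, 1] back to the original units."""
    return (pred + 1) / 2 * (y_max - y_min) + y_min

print(y_scaled)
print(inverse_scale(y_scaled))
```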

  267. Lu, April 7, 2021 at 12:41 am

    Hi Jason,

    Thanks for the post. It is very helpful.

    I created an LSTM model:

    model = Sequential()
    model.add(LSTM(20, activation='relu', return_sequences=True, input_shape=(5, 12)))
    model.add(Dense(20, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

    As you can see, the dimension of the model output is (1,).

    However, after training, when I run model prediction I get multiple outputs:

    predictions = model.predict_classes(X_test[0].reshape((1, 5, 12)))
    predictions.shape, predictions

    Output:

    ((1, 5, 1),
    array([[[0],
    [0],
    [0],
    [0],
    [0]]]))
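The extra dimension comes from `return_sequences=True`: the LSTM then returns one output vector per time step, and a Keras `Dense` layer applied to a 3D tensor operates on the last axis, so it produces one prediction per step. A NumPy sketch of that shape arithmetic (random numbers standing in for real activations) is below. Setting `return_sequences=False` on the LSTM, which is only needed when stacking LSTM layers, should give one output per sample.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for LSTM(20, return_sequences=True) output: (batch=1, steps=5, units=20)
seq_out = rng.normal(size=(1, 5, 20))

# A Dense(1) layer on a 3D tensor applies the same kernel to every time step
W, b = rng.normal(size=(20, 1)), np.zeros(1)
per_step = seq_out @ W + b            # (1, 5, 1): one prediction per step

# return_sequences=False keeps only the final step's output
last_step = seq_out[:, -1, :]         # (1, 20)
single = last_step @ W + b            # (1, 1): one prediction per sample
print(per_step.shape, single.shape)   # (1, 5, 1) (1, 1)
```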

  268. SURBHI SINGH, April 13, 2021 at 4:54 am

    I have a question: if I want to get back the test data used in the model in its original form, so as to plot it against the predicted values with the dates on the x-axis, is there a way to do it?
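Assuming the tutorial’s one-step `split_sequence()` (windows sliding by one step, target immediately after each window), the target of window i sits at position i + n_steps in the original series, so the dates can be recovered by slicing. A NumPy sketch with hypothetical daily dates and values:

```python
import numpy as np

# Hypothetical series: 10 daily observations with their dates
dates = np.arange("2021-01-01", "2021-01-11", dtype="datetime64[D]")
values = np.arange(10, 110, 10)
n_steps = 3

# Window i covers values[i:i + n_steps]; its target is values[i + n_steps],
# so the target dates are simply the dates from position n_steps onwards.
target_dates = dates[n_steps:]
targets = values[n_steps:]

# If the last n_test samples were held out as the test set:
n_test = 3
test_dates = target_dates[-n_test:]
test_actual = targets[-n_test:]
print(test_dates, test_actual)
```

If the values were scaled before training, the predictions need the inverse transform (e.g. `inverse_transform()` on a scikit-learn scaler) before being plotted against `test_dates`, for example with `plt.plot(test_dates, test_actual)`.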

  269. Trishala, April 14, 2021 at 3:42 pm

    Hello Jason,

    I have a question regarding creating samples. I want to create samples of the closing price using a 60-day window, but give them labels. I am using this code:

    from numpy import array

    # split a univariate sequence into samples
    def split_sequence(sequence, n_steps_in, n_steps_out):
        X, y = list(), list()
        for i in range(len(sequence)):
            # find the end of this pattern
            end_ix = i + n_steps_in
            out_end_ix = end_ix + n_steps_out
            # check if we are beyond the sequence
            if out_end_ix > len(sequence):
                break
            # gather input and output parts of the pattern
            seq_x, seq_y = sequence[i:end_ix], sequence[end_ix:out_end_ix]
            X.append(seq_x)
            y.append(seq_y)
        return array(X), array(y)

    # define input sequence (df is a DataFrame with a 'Close' column)
    raw_seq = df['Close'].values
    # choose a number of time steps
    n_steps_in, n_steps_out = 60, 60
    # split into samples
    X, y = split_sequence(raw_seq, n_steps_in, n_steps_out)
    # summarize the data
    for i in range(len(X)):
        print(X[i], y[i])

    I am able to create the X samples, but for the Y samples I want to give them labels 0 and 1, based on the condition:

    X_T: T+1, T+2, …, T+60

    Y_T: = 1 if the price increases by 6% before going down 3% within 3 trading days; = 0 otherwise.

    How should I do this?
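A minimal sketch of the labeling rule, under two assumptions not stated in the comment: the reference price is the last close in each 60-day window, and the 3-trading-day horizon starts on the next row. `barrier_label()` and `make_labeled_samples()` are hypothetical helpers; a window’s label is 1 if a close at or above +6% is reached before any close at or below -3% within 3 days, and 0 otherwise.

```python
import numpy as np

def barrier_label(prices, t, up=0.06, down=0.03, horizon=3):
    """1 if price rises by `up` before falling by `down` within `horizon` steps after t."""
    base = prices[t]
    for p in prices[t + 1:t + 1 + horizon]:
        if p >= base * (1 + up):
            return 1
        if p <= base * (1 - down):
            return 0
    return 0

def make_labeled_samples(prices, n_steps=60, horizon=3):
    X, y = [], []
    # stop early enough that every window has `horizon` future prices
    for i in range(len(prices) - n_steps - horizon + 1):
        t = i + n_steps - 1                  # last day inside the window
        X.append(prices[i:i + n_steps])
        y.append(barrier_label(prices, t, horizon=horizon))
    return np.array(X), np.array(y)

print(barrier_label(np.array([100, 101, 107, 90]), 0))  # 1: +7% reached first
print(barrier_label(np.array([100, 96, 110]), 0))       # 0: -4% reached first
```

With the code above, `make_labeled_samples(raw_seq)` would return the 60-day windows together with their 0/1 labels.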

  270. Marimuthu S, April 14, 2021 at 5:34 pm

    Hello Jason, greetings.

    Hope you are doing well. Thanks for the post. This is very useful. However, I have a doubt.

    How do I choose the optimal number of time steps for my data? Should I use an ACF or PACF plot to choose it? Please advise. Thanks in advance.

    • Jason Brownlee, April 15, 2021 at 5:24 am

      Perhaps ACF/PACF plots will help, perhaps grid search, perhaps trial and error.
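As a starting point for the ACF route, here is a minimal NumPy sketch of the sample autocorrelation function. Lags whose autocorrelation falls outside roughly ±1.96/√n are candidates for the window length (statsmodels’ `plot_acf()` draws the same picture graphically). The series is a hypothetical sine wave with a 7-step cycle, so the peak should appear at lag 7; the chosen value is still worth checking with a small grid search.

```python
import numpy as np

def sample_acf(x, nlags):
    """Sample autocorrelation for lags 0..nlags."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    return np.array([np.dot(x[:len(x) - k], x[k:]) / denom for k in range(nlags + 1)])

t = np.arange(140)
series = np.sin(2 * np.pi * t / 7)     # hypothetical series with a 7-step cycle
acf = sample_acf(series, nlags=10)
bound = 1.96 / np.sqrt(len(series))    # approximate 95% significance band
significant_lags = [k for k in range(1, 11) if abs(acf[k]) > bound]
print(np.round(acf, 2))
print(significant_lags)
```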

  271. Kerry, April 16, 2021 at 7:08 am

    Dear Jason,
    I appreciate your instructive blog. I learn a lot from you.
    I am trying to train several LSTMs in a supervised way and then apply max pooling across their hidden states. Is such an ability built into LSTMs, or do I need to implement it myself?

    • Jason Brownlee, April 17, 2021 at 6:01 am

      You may need to write some custom code or a custom layer.
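If the goal is an element-wise max over the hidden states of several LSTMs, Keras may not need a custom layer: its `Maximum` merge layer takes a list of same-shaped tensors and returns their element-wise maximum. The NumPy sketch below shows only the pooling arithmetic, with made-up hidden-state values standing in for the outputs of three LSTMs.

```python
import numpy as np

# Made-up final hidden states from three LSTMs, each (batch=2, units=3)
h1 = np.array([[0.1, 0.9, -0.3], [0.5, -0.2, 0.4]])
h2 = np.array([[0.4, 0.2, -0.1], [0.0, 0.3, 0.1]])
h3 = np.array([[-0.2, 0.5, 0.6], [0.2, 0.1, 0.8]])

# Element-wise max across the models (axis 0 indexes the LSTMs)
pooled = np.max(np.stack([h1, h2, h3]), axis=0)
print(pooled)
```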

  272. Marimuthu S, April 19, 2021 at 1:27 pm

    Thanks for the reply, Jason.

  273. Bahadir, April 19, 2021 at 11:22 pm

    Hello Jason, thanks for this great tutorial!
    I have a question and I would be glad if you could share your thoughts.

    I have a dataset of frames obtained from a gameplay video, and each frame (row) in the dataset has the following columns (simplified): time, bitrate_kbps, game stage (0: Exploration, 1: Combat).
    As an example, here are 6 random adjacent frames:
    2.2, 208, 1
    2.3, 211, 1
    2.5, 215, 1
    2.6, 219, 0
    2.7, 222, 0
    2.9, 221, 1

    My goal is to train a model (e.g. an LSTM) on this time series data so that it can classify game stages from the bitrate data. The model should be able to assign the correct game stage labels to unlabeled time series of frames such as: time, bitrate_kbps.
    What kind of approach would be a good way to train such a model? Thanks!

  274. George G, April 23, 2021 at 12:09 am

    Hi Jason, and thanks for your posts!

    In your multivariate multi-step stacked LSTM example, suppose I had:

    n_steps_in, n_steps_out = 3, 2

    and added one more sample to x_input:

    x_input = array([[[70, 75], [80, 85], [90, 95]],
    [[100, 105], [110, 115], [120, 125]]])

    then the output would be:

    yhat = array([[182.84283, 212.43597],
    [247.65134, 288.84436]], dtype=float32)

    Now, let’s say that I also have date information
    (so each time step corresponds to one day).

    So for the first sample, which is on 1/4/21:

    [[[ 70, 75],
    [ 80, 85],
    [ 90, 95]] the +1 day value is 182.84283 (2/4/21) and the +2 days value is 212.43597 (3/4/21)?

    And for the next input sample, which is on 2/4/21:

    [[100, 105],
    [110, 115],
    [120, 125]] the +1 day value is 247.65134 (3/4/21) and the +2 days value is 288.84436 (4/4/21)?

    But for 3/4/21 I now have two values!

    Could you please clarify? I am confused!

    Thank you!

      • George G, April 23, 2021 at 5:08 pm

        Hi Jason,

        So, since I have 2 samples and 3 timesteps:

        1st sample
        ----------
        [[[ 70, 75] -> 1/4/21
        [ 80, 85] -> 2/4/21
        [ 90, 95]] -> 3/4/21

        the output is:
        182.84283 is on 4/4/21 and 212.43597 on 5/4/21, right?

        2nd sample
        ----------
        [[100, 105] -> 4/4/21
        [110, 115] -> 5/4/21
        [120, 125]] -> 6/4/21

        the output is:
        247.65134 is on 7/4/21 and 288.84436 on 8/4/21, right?

        So, I am predicting for the 4th, 5th, 7th and 8th of April?
        Where is the prediction for 6/4?

        • Jason Brownlee, April 24, 2021 at 5:17 am

          You can frame the data any way you want.

          I think it would be better to shift each sample down by one time step, instead of 3, but you can do whatever you think is best for your dataset and model. If you’re not sure, perhaps try a few different approaches and compare results.

          • George G, April 24, 2021 at 5:55 am

            Ok, but what if I have this frame as above?
            3 steps in and 2 steps out. How to deal with the dates, that’s my problem.

          • Jason Brownlee, April 25, 2021 at 5:12 am

            My point is you can prepare your data so you have [3,4,5]->[6,7] if you want.
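The difference between the two framings in this thread can be checked with dates alone. A sketch with a hypothetical `window_dates()` helper (3 days in, 2 days out, daily dates from 1 April 2021): with a stride of 3, as in the example above, 6 April is only ever an input and never a target, whereas with a stride of 1, as suggested, every date after the first window is predicted, some of them by more than one sample. Each forecast is then attached to the dates of its own output window.

```python
import numpy as np

def window_dates(dates, n_in, n_out, stride=1):
    """Return (input_dates, output_dates) for each sample."""
    pairs = []
    for i in range(0, len(dates) - n_in - n_out + 1, stride):
        pairs.append((dates[i:i + n_in], dates[i + n_in:i + n_in + n_out]))
    return pairs

dates = np.arange("2021-04-01", "2021-04-11", dtype="datetime64[D]")  # 1-10 April

for stride in (3, 1):
    pairs = window_dates(dates, n_in=3, n_out=2, stride=stride)
    predicted = sorted({str(d) for _, out in pairs for d in out})
    print(f"stride={stride}: {len(pairs)} samples, predicted dates {predicted}")
```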

  275. Abraham Rodarte, April 26, 2021 at 7:03 pm

    Do you have any work on Multiple Parallel Input, Multi-Step Output and Multiple Output for time series forecasting?
    My problem is that I have 6 features and I want to predict 3 of them, with their respective training and test sets, like in the air pollution blog post.

  276. Mingkai, April 29, 2021 at 10:29 am

    Hi Jason,

    Thanks for your post; it was very helpful for getting started with LSTMs.
    My problem is to predict a time series, say prices over time. Apart from the historical real prices, I also have some forecasted prices from another source for the next n time intervals, and I want to use them as additional features.

    To test the accuracy of the model, I substitute the forecasted prices with the real prices. Say I want to predict a price that follows pi: [3 1 4 1 5 9 2 6 7 ..]; I use an input data structure that looks like this:

    X[0,:,:] =
    [[ 3 1 4]
    [ 1 4 1]
    [ 4 1 5]
    [ 1 5 9]]

    Y[0,:] = [5 9]

    X[1,:,:] =

    [[ 1 4 1]
    [ 4 1 5]
    [ 1 5 9]
    [ 5 9 2]]

    Y[1,:] = [9 2]

    and so on,

    As a test, I used a simple single layer LSTM + a dense layer as output.

    model.add(LSTM(10, activation='relu', return_sequences=False, input_shape=(4, 3)))
    model.add(Dropout(0.1))
    model.add(Dense(2))

    But it seems the current configuration cannot figure out the relationship between the diagonal elements in the input, even though the inputs already contain the answer. The error is quite large.

    Is there any LSTM or other model structure you see will be helpful?

    Thank you very much!

    MK
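The input structure described above can be reproduced with a short sketch (the `build_samples()` helper and its parameter names are hypothetical): row j of sample i holds the prices at positions i + j, i + j + 1 and i + j + 2, and the target is the two prices that follow the window. The final check confirms the point made in the comment: the target is exactly the last two columns of the last row, so a poor fit here is not caused by missing information. Scaling the inputs or tuning the model are the usual next things to try.

```python
import numpy as np

def build_samples(prices, n_steps=4, n_cols=3, n_out=2):
    X, y = [], []
    for i in range(len(prices)):
        last_row_end = i + n_steps - 1 + n_cols   # end of the last row's slice
        out_end = i + n_steps + n_out             # end of the target slice
        if max(last_row_end, out_end) > len(prices):
            break
        X.append([prices[i + j:i + j + n_cols] for j in range(n_steps)])
        y.append(prices[i + n_steps:out_end])
    return np.array(X), np.array(y)

prices = np.array([3, 1, 4, 1, 5, 9, 2, 6, 7])
X, y = build_samples(prices)
print(X.shape, y.shape)                  # (4, 4, 3) (4, 2)
print(X[0].tolist(), y[0].tolist())      # [[3, 1, 4], [1, 4, 1], [4, 1, 5], [1, 5, 9]] [5, 9]
print(np.array_equal(y, X[:, -1, 1:]))   # True: the answer is inside the input
```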

  277. Vishnu, May 1, 2021 at 6:39 pm

    Hello,

    I want to know the significance of the number of steps we use.

    In these examples the number of steps used is 3. Does this mean that every time the LSTM is trained, it looks only at the last 3 time steps?

    Does this mean that if we want the LSTM to learn temporal dependencies over a longer time period, we need to increase the number of steps accordingly?

    I don’t understand this part.

    • Jason Brownlee, May 2, 2021 at 5:29 am

      The configuration was arbitrary. I recommend tuning the problem representation and model for your specific dataset.

  278. emma, May 2, 2021 at 2:31 am

    Hello Jason,
    Could you please explain the function split_sequence? I can’t understand how the function works, specifically these two lines:
    # gather input and output parts of the pattern
    seq_x, seq_y = sequence[i:end_ix], sequence[end_ix]
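Tracing the tutorial’s `split_sequence()` on its example sequence may help. For a window starting at index i, `sequence[i:end_ix]` takes the n_steps values from i up to, but not including, end_ix (Python slices exclude the end index), and `sequence[end_ix]` is the single value immediately after the window, which becomes the target. The copy below is the tutorial’s function with only a `print()` added to show each step.

```python
from numpy import array

# split a univariate sequence into samples (tutorial version plus a print)
def split_sequence(sequence, n_steps):
    X, y = list(), list()
    for i in range(len(sequence)):
        # find the end of this pattern
        end_ix = i + n_steps
        # check if we are beyond the sequence
        if end_ix > len(sequence) - 1:
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequence[i:end_ix], sequence[end_ix]
        print(f"i={i}: X={seq_x} -> y={seq_y}")
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)

X, y = split_sequence([10, 20, 30, 40, 50, 60, 70, 80, 90], 3)
# i=0: X=[10, 20, 30] -> y=40
# ...
# i=5: X=[60, 70, 80] -> y=90
```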

  279. Atul Upadhyay, May 7, 2021 at 5:04 am

    Hey Jason, I need some help with my project.
    I am working on a project to predict future demand.
    It’s univariate forecasting (only two columns, i.e. Date and Demand).

    I have trained my model on 2015-2016 (I only have data for these two years) and want to predict 2017 (the next 365 days).

    How can I do this?
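One common option is a recursive forecast: predict one step, append the prediction to the input window, and repeat 365 times. A model-agnostic sketch is below; `predict_one` is a hypothetical stand-in for a trained model (with the tutorial’s Keras models it would reshape the window to `(1, n_steps, 1)` and call `model.predict()`). Errors compound over a horizon this long, so the tutorial’s direct vector-output approach, which predicts several steps at once, is worth comparing.

```python
import numpy as np

def recursive_forecast(predict_one, history, n_steps, horizon):
    """Forecast `horizon` steps by feeding each prediction back into the window."""
    window = list(history[-n_steps:])
    forecasts = []
    for _ in range(horizon):
        yhat = float(predict_one(np.array(window)))
        forecasts.append(yhat)
        window = window[1:] + [yhat]   # drop the oldest value, append the forecast
    return forecasts

# Hypothetical stand-in for a trained model: predicts the last value plus one
def naive_model(window):
    return window[-1] + 1

print(recursive_forecast(naive_model, [1, 2, 3], n_steps=3, horizon=5))
# [4.0, 5.0, 6.0, 7.0, 8.0]
```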

  280. Eva, May 7, 2021 at 11:31 pm

    Thanks for this great tutorial, Dr. Jason.

    In the univariate CNN-LSTM model, which uses a CNN for feature extraction, you use a kernel of size 1.

    model.add(TimeDistributed(Conv1D(filters=64, kernel_size=1, activation='relu'),
    input_shape=(None, n_steps, n_features)))
    model.add(TimeDistributed(MaxPooling1D(pool_size=2)))
    model.add(TimeDistributed(Flatten()))

    The configuration may have been chosen arbitrarily, but the model performed better with kernel size 1. What is the intuition behind this size?

    Thanks

    • Jason Brownlee, May 8, 2021 at 6:37 am

      It may suggest the CNN is not adding any value to the model.
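One way to see the intuition: a `Conv1D` with `kernel_size=1` applies the same linear map to each time step independently, so it never combines neighboring steps, which is the usual reason to use a convolution in the first place. A NumPy sketch of a "valid" 1D convolution (kernel laid out as `(kernel_size, in_channels, filters)`, like Keras’ `Conv1D` weights; `conv1d_valid()` is a hypothetical helper) makes this visible: with kernel size 1, changing the second time step leaves the first output untouched; with kernel size 2 it does not.

```python
import numpy as np

def conv1d_valid(x, kernel, bias):
    """x: (steps, in_ch); kernel: (k, in_ch, out_ch); 'valid' padding, stride 1."""
    k = kernel.shape[0]
    return np.stack([np.einsum("ki,kio->o", x[t:t + k], kernel) + bias
                     for t in range(x.shape[0] - k + 1)])

rng = np.random.default_rng(1)
x = rng.normal(size=(2, 1))         # one subsequence: 2 steps, 1 feature
k1 = rng.normal(size=(1, 1, 64))    # kernel_size=1, 64 filters
k2 = rng.normal(size=(2, 1, 64))    # kernel_size=2, 64 filters
bias = np.zeros(64)

out1 = conv1d_valid(x, k1, bias)    # (2, 64)
out2 = conv1d_valid(x, k2, bias)    # (1, 64)
print(np.allclose(out1, x @ k1[0]))  # True: a per-step dense layer

# Perturb only the second time step
x_new = x.copy()
x_new[1] += 1.0
print(np.allclose(conv1d_valid(x_new, k1, bias)[0], out1[0]))  # True: step 0 unaffected
print(np.allclose(conv1d_valid(x_new, k2, bias)[0], out2[0]))  # False: steps are mixed
```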

  281. Nima, May 9, 2021 at 10:33 am

    Hi Jason. I want to use the last model, “Multiple Parallel Input and Multi-Step Output”, for stock prediction, but I get this error: “AttributeError: module 'tensorflow.python.framework.ops' has no attribute '_TensorLike'”

    The code that I have been using is as follows. I copied the code exactly and transformed my data to fit the model, but I got an error.

    Thanks

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    %matplotlib inline
    import yfinance as yf
    from datetime import date
    from dateutil.relativedelta import *
    from copy import deepcopy
    import pickle

    import warnings
    warnings.filterwarnings("ignore")

    from numpy import array
    from numpy import hstack
    from keras.models import Sequential
    from keras.layers import LSTM
    from keras.layers import Dense
    from keras.layers import RepeatVector
    from keras.layers import TimeDistributed

    stocks = ['AAPL', 'TSLA', 'UPS', 'FDX', 'FB']
    today = date.today()
    Initial_period = today + relativedelta(months=-24)

    data = pd.DataFrame(columns=stocks)

    for s in stocks:
        dt = yf.download(s, Initial_period, today)
        data[s] = dt.reset_index()['Close'].values

    # split a multivariate sequence into samples
    def split_sequences(sequences, n_steps_in, n_steps_out):
        X, y = list(), list()
        for i in range(len(sequences)):
            # find the end of this pattern
            end_ix = i + n_steps_in
            out_end_ix = end_ix + n_steps_out
            # check if we are beyond the dataset
            if out_end_ix > len(sequences):
                break
            # gather input and output parts of the pattern
            seq_x, seq_y = sequences[i:end_ix, :], sequences[end_ix:out_end_ix, :]
            X.append(seq_x)
            y.append(seq_y)
        return array(X), array(y)

    # choose a number of time steps
    n_steps_in, n_steps_out = 50, 7
    # convert into input/output
    X, y = split_sequences(data.values, n_steps_in, n_steps_out)
    print(X.shape, y.shape)
    # the number of features (one per stock) must be defined before building the model
    n_features = X.shape[2]

    model = Sequential()
    model.add(LSTM(200, activation='relu', input_shape=(n_steps_in, n_features)))
    model.add(RepeatVector(n_steps_out))
    model.add(LSTM(200, activation='relu', return_sequences=True))
    model.add(TimeDistributed(Dense(n_features)))
    model.compile(optimizer='adam', loss='mse')

  282. George G, May 10, 2021 at 7:48 pm

    Hi Jason, I have one question. Could you please take a look at it?

    https://stackoverflow.com/questions/67467590/lstm-timesteps-and-features-selection

    Thanks!

      • George G, May 11, 2021 at 4:02 pm

        Ok Jason, so

        I am using 6 features and each feature has 7 timesteps, so I have:

        feature1(t-7) feature2(t-7) feature3(t-7) … feature6(t-7)… feature5(t) feature6(t) .. feature5(t+1) feature6(t+1)

        I am predicting the t and t+1 timesteps.

        So, my input data is [?, 7, 42] (6 features * 7 timesteps).

        Now, at first I was doing:

        X_train = X_train.reshape((X_train.shape[0] , 1 , X_train.shape[1]))

        and

        nb_timesteps, nb_features = 7, X_train.shape[2]

        I want to use 7 timesteps, but as you can see the input data has shape

        [?, 1, 42] and not [?, 7, 42]

        so I get a warning about that.

        How can I overcome this, if I want to use 7 timesteps?

        My solution is to reshape data (after confirming that my length of data is a multiple of 7)

        X_train = X_train.reshape((X_train.shape[0] , 7 , X_train.shape[1] // 7))

        but now I am using 7 timesteps (ok I want that) and 6 features instead of 42.

        I want to ask if this is OK. I mean, with this setup I am using the 6 features only for the (t-7) step, and at the same time I am using 7 timesteps.

          • George G, May 12, 2021 at 4:31 pm

            I was just saying that if I reshape, the data gets mixed up.
            Then, which features should I place in the last dimension of [samples, timesteps, features]?

            Should I have all 42 features? (t-7), (t-6) … (t-1)?

            Or should I have 6 features? And at what time reference? (t-7), (t-6) .. (t-1)?

          • Jason Brownlee, May 13, 2021 at 6:00 am

            I try to avoid being prescriptive, as I never have all of the details of a reader’s dataset.

            I guess it is a design decision, likely based on the native structure of the data you are working with.

            The link I provided should help you think it through; otherwise, prototype some approaches with pen and paper or some vanilla Python and print() the results to see what makes sense.

          • Sam B, June 19, 2022 at 5:23 pm

            Hi, I’ve got a silly question but I see variables named like nb_timesteps, nb_features. What does nb actually mean? Thanks!

          • James Carmichael, June 20, 2022 at 11:39 am

            Hi Sam… I do not see what you are referencing; however, there would be no significance to it, as it is just part of the variable name. In other words, you could also just call them “nx_timesteps”, “ab_features” and the like.
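Whether a [samples, 42] array can be reshaped to [samples, 7, 6] depends entirely on the column order discussed in this thread. A NumPy sketch, using values that encode their own (time step, feature) position so any mix-up is visible: if the columns are time-major (all 6 features at t-7, then all 6 at t-6, and so on), a plain `reshape(n, 7, 6)` is correct; if they are feature-major (feature1 at t-7 through t-1, then feature2, and so on), the reshape must go to `(n, 6, 7)` followed by swapping the last two axes.

```python
import numpy as np

n, n_steps, n_features = 2, 7, 6
# cube[s, t, f] = 100*s + 10*t + f, so each value records where it belongs
s, t, f = np.meshgrid(np.arange(n), np.arange(n_steps), np.arange(n_features), indexing="ij")
cube = 100 * s + 10 * t + f                                               # (samples, timesteps, features)

time_major = cube.reshape(n, n_steps * n_features)                        # f varies fastest
feature_major = cube.transpose(0, 2, 1).reshape(n, n_steps * n_features)  # t varies fastest

print(np.array_equal(time_major.reshape(n, n_steps, n_features), cube))     # True
print(np.array_equal(feature_major.reshape(n, n_steps, n_features), cube))  # False: mixed up
print(np.array_equal(feature_major.reshape(n, n_features, n_steps).transpose(0, 2, 1), cube))  # True
```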

  283. Jose, May 11, 2021 at 1:59 pm

    Good evening, and thanks for all the material you have published; as a newbie, it has been a great help to me. In my case, I am working on a time series problem consisting of the disaggregation of residential electrical energy. My problem can be summarized as follows: I have two time series as input, which can in a certain way be interpreted as the sum of the output series. I have the two input series and 22 output time series. The objective is that once the model receives the two input series, it can reconstruct the 22 series that compose them. Could you please point me to which of your tutorials and books may be the most appropriate for my case? Can I reference the book that I purchase? Thank you.

  284. Ming, May 13, 2021 at 11:59 am

    Hi Jason, why do I get bad results when I use the encoder-decoder model for multi-step forecasting (24 steps)? It can only predict the trend for me. Can you help me? Thank you very much.

    • Jason Brownlee, May 14, 2021 at 6:19 am

      It may or may not give a good result for a given dataset. We cannot know beforehand.

  285. Sean, May 13, 2021 at 7:12 pm

    Why do you define input_shape as a 2D shape? What is the difference between input_shape and batch_input_shape?

  286. David Espinosa, May 21, 2021 at 10:33 am

    Good day Jason, first, thanks for the awesome tutorial.

    Second, I have two doubts regarding RNNs in general.

    1) I have read in some forums that “each sample ‘should’ be of an integer type”, and in others that “RNNs can deal with series of numbers, no matter the type”. Plus, the examples used in some of your other tutorials (https://machinelearningmastery.com/reshape-input-data-long-short-term-memory-networks-keras/) show float numbers used inside each sample. Which type is “preferred” for working with RNNs?
    2) Related to the previous question, I am exploring the behaviour of some DNN architectures for binary classification. I have a mixed-type dataset (with both integer and float numbers), but I don’t know if I can use it “as is” or should turn it into some specific format (all integer, all float, one-hot encoding for categories, standardize/normalize)…

    I think both questions overlap quite a bit, but I want to make sure I am well understood.

    Thanks in advance for your thoughts about my query, and stay safe.

    • David Espinosa, May 21, 2021 at 10:35 am

      Hello Jason,

      I just wanted to clarify, regarding my ‘doubt #2’, that I am focusing specifically on LSTM RNNs.

      Thank you again.

    • Jason Brownlee, May 22, 2021 at 5:30 am

      Yes, generally RNNs should take small floats as input.

      Try your model on the raw data and compare to scaled data and use whatever works best for you.
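A minimal sketch of one common preparation for mixed columns, assuming a small hypothetical table with an integer count, a float measurement and an integer-coded category: standardize the numeric columns, one-hot encode the category, and cast everything to float32. In practice, the mean and standard deviation should be computed on the training split only and then reused on the test split.

```python
import numpy as np

# Hypothetical mixed-type table: [integer count, float measurement, category code]
raw = np.array([[3, 12.5, 0],
                [7, 10.1, 2],
                [5, 11.3, 1],
                [1, 13.0, 2]])

numeric = raw[:, :2].astype("float32")
mean, std = numeric.mean(axis=0), numeric.std(axis=0)   # fit on training data only
numeric_std = (numeric - mean) / std

n_categories = 3
one_hot = np.eye(n_categories, dtype="float32")[raw[:, 2].astype(int)]

X = np.hstack([numeric_std, one_hot]).astype("float32")
print(X.shape, X.dtype)  # (4, 5) float32
```

For an LSTM, the prepared rows would then be windowed into [samples, timesteps, features] as in the tutorial.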

  287. Rasoul, May 25, 2021 at 12:15 am

    The code worked for me with the following changes:

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import LSTM
    from tensorflow.keras.layers import Dense

    I had to install tensorflow==1.12.0 and keras==2.2.4 (my Python version is 3.6.8, and I am on Windows 10, no Anaconda!)

    Hope this is helpful for other people facing problems regarding package incompatibility.

  288. Krisna Gita, June 7, 2021 at 10:59 am

    Hi Jason,
    I tried your CNN-LSTM model and got an error message: “ValueError: Please initialize `TimeDistributed` layer with a `tf.keras.layers.Layer` instance. You passed: ”

    Could you help me solve this error? Thank you

    • Jason Brownlee, June 8, 2021 at 7:10 am

      I recommend using the Keras API directly instead of tf.keras.

  289. mhr, June 7, 2021 at 2:00 pm

    I was thinking about making a single model for multiple separate series (sales forecasts of a shop for different products). I have found different approaches, but they are not concrete. I have studied the ESRNN library on GitHub, but it seems the magnitude of my data is too low, like:
    product_id,date,count
    1101,1-5-2020,1
    1101,2-5-2020,4
    1101,3-5-2020,0
    1101,4-5-2020,0
    1101,5-5-2020,4 ….
    Is it possible to add an embedding layer for the product id and then use your split_sequences method to train a model that works for all products?
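A data-preparation sketch for a single "global" model, assuming rows are sorted by date within each product (`windows_per_product()` is a hypothetical helper): windows are built separately per product so that no window crosses two products, and each sample keeps a 0-based product index that could feed a Keras `Embedding` layer alongside the LSTM input in a two-input functional model. The toy rows reuse the counts above plus a second, made-up product.

```python
import numpy as np

def windows_per_product(ids, counts, n_steps):
    """Build windows per product so that no window mixes two products."""
    product_ids, index = np.unique(ids, return_inverse=True)
    X, y, pid = [], [], []
    for p in range(len(product_ids)):
        series = counts[index == p]
        for i in range(len(series) - n_steps):
            X.append(series[i:i + n_steps])
            y.append(series[i + n_steps])
            pid.append(p)   # 0-based index, usable by an Embedding layer
    return np.array(X), np.array(y), np.array(pid), product_ids

ids = np.array([1101] * 5 + [1102] * 4)
counts = np.array([1, 4, 0, 0, 4, 2, 2, 3, 1])
X, y, pid, product_ids = windows_per_product(ids, counts, n_steps=3)
print(X.shape, y.tolist(), pid.tolist())  # (3, 3) [0, 4, 1] [0, 0, 1]
```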

  290. Peter, June 7, 2021 at 8:33 pm

    Thanks Jason for the tutorial. I have a question regarding the Multiple Input Multi-Step Output model. You use the last 3 timesteps of the 2 time series [(10, 15); (20, 25); (30, 35)] to predict the next 2 timesteps [65, 85]. But 65 is from the same timestep as (30, 35). So why would you want to predict a value from a timeslot that you have already observed (otherwise you would not have the input (30, 35))? Would it not make more sense to predict the next 2 timeslots after the one with (30, 35), which led to 65? So basically you should predict [85, 105] when having [(10, 15); (20, 25); (30, 35)] as input.

    I’d appreciate every comment and would be quite thankful for your help.

    • Jason Brownlee, June 8, 2021 at 7:15 am

      We are evaluating the model using walk-forward validation.

      Once you choose a model and config, you fit the model on all data and start making predictions on new data.

      Perhaps this will help:
      https://machinelearningmastery.com/make-predictions-long-short-term-memory-models-keras/

      • Peter, June 8, 2021 at 7:37 pm

        Thanks Jason for your answer. I know that you are using walk-forward validation and I know how it works. That was not the point of my question. I am wondering why you forecast the values of the same timeslot for which you have the inputs. Normally you should forecast the values of the NEXT timeslot, because this is, by definition, what a forecast is supposed to do.

        • PeterJune 8, 2021 at 7:39 pm#

          You are forecasting the output (Timeseries_3) of Timeslot_3 which is 65 using – amongst others – the inputs of Timeslot_3 (Timeseries_1:30 and Timeseries_2: 35). For me this does not make sense. Surely it makes sense to forecast Timeseries_3 of Timeslot_4 because this is a future value while Timeseries_3 for Timeslot_3 is not a future value when you are in Timeslot_3.

          So why do you not use Timeslot_1, Timeslot_2 and Timeslot_3 to forecast Timeslot_4 and Timeslot_5? You are using Timeslot_1, Timeslot_2 and Timeslot_3 to forecast Timeslot_3 and Timeslot_4.

        • PeterJune 8, 2021 at 7:41 pm#

          Timeslot_1: Timeseries_1: 10, Timeseries_2: 15, Timeseries_3: 25

          Timeslot_2: Timeseries_1: 20, Timeseries_2: 25, Timeseries_3: 45

          Timeslot_3: Timeseries_1: 30, Timeseries_2: 35, Timeseries_3: 65

        • Jason BrownleeJune 9, 2021 at 5:43 am#

          That was the framing of the problem I was solving. You can frame the prediction problem any way you like.
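To make the two framings concrete, here is a sketch of a split function with a hypothetical `offset` parameter. `offset=0` reproduces the tutorial's Multiple Input Multi-Step Output framing, where the first target shares a time step with the last input row; `offset=1` gives the framing Peter describes, where every target lies strictly after the input window. The data is the tutorial's contrived dataset (in_seq1 = 10..90, in_seq2 = 15..95, output = their sum).

```python
import numpy as np

def split_sequences(sequences, n_steps_in, n_steps_out, offset=0):
    """offset=0: targets start at the last input step (tutorial framing).
    offset=1: targets start at the step after the input window."""
    X, y = [], []
    for i in range(len(sequences)):
        end_ix = i + n_steps_in
        out_start = end_ix - 1 + offset
        out_end = out_start + n_steps_out
        # stop once the target window runs past the end of the data
        if out_end > len(sequences):
            break
        X.append(sequences[i:end_ix, :-1])
        y.append(sequences[out_start:out_end, -1])
    return np.array(X), np.array(y)

in_seq1 = np.arange(10, 100, 10)
in_seq2 = np.arange(15, 105, 10)
out_seq = in_seq1 + in_seq2
dataset = np.column_stack((in_seq1, in_seq2, out_seq))

X0, y0 = split_sequences(dataset, 3, 2, offset=0)
X1, y1 = split_sequences(dataset, 3, 2, offset=1)
print(X0[0].tolist(), y0[0].tolist())
print(X1[0].tolist(), y1[0].tolist())
```

With `offset=1` one sample is lost at the end of the series, and whichever framing is chosen for training must also be used when making real forecasts.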

          • PeterJune 9, 2021 at 5:14 pm#

            Ah okay. Thanks a lot for your tremendous help. I really appreciate it.

  291. AmelieJune 15, 2021 at 12:20 am#

    Hello,

    The LSTM models my time series well, with acceptable errors.

    However, the forecast values (beyond the test set of my real time series) are very far from what I would consider normal values.

    Is this normal?
    Can you tell me more?

    • Jason BrownleeJune 15, 2021 at 6:07 am#

      Perhaps you need to prepare the data prior to modeling?
      Perhaps you need to tune the model?
      Perhaps the model is not appropriate for your dataset?
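One data-preparation issue that can produce forecasts on the wrong scale is fitting the scaler on all of the data, or forgetting to invert the scaling before comparing forecasts with real values. A minimal NumPy sketch, assuming min-max scaling fitted on the training portion only; the series and the 40/10 split are hypothetical, and `model_output` is a stand-in for whatever your model predicts in the scaled space.

```python
import numpy as np

series = np.linspace(100.0, 200.0, 50)  # hypothetical series
n_train = 40
train, test = series[:n_train], series[n_train:]

# fit scaling parameters on the training portion only
lo, hi = train.min(), train.max()

def scale(x):
    return (x - lo) / (hi - lo)

def invert(x_scaled):
    return x_scaled * (hi - lo) + lo

train_scaled = scale(train)
test_scaled = scale(test)   # may fall outside [0, 1]; that is expected

# stand-in for model predictions, which come out in the scaled space
model_output = test_scaled.copy()
forecast = invert(model_output)  # back in the original units
print(train_scaled.min(), train_scaled.max(), forecast[:3])
```

If forecasts are fed back as inputs for later steps, they should stay in the scaled space and only be inverted at the end.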

  292. Ibrahim AdigunJune 21, 2021 at 12:28 am#

    Hello,

    I am about to use an LSTM for a price prediction case, but I have additional data such as age, region, town, payment method, different dates (first and last payment) and so on.

    I want to know whether I can use these features in an LSTM model. This is my first project with neural networks.

    Thank you
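LSTM inputs must be numeric, so categorical fields such as region, town or payment method are usually encoded first, for example with one-hot encoding, and then stacked with the numeric features at each time step. A minimal NumPy sketch; the helper `one_hot` and all record values are hypothetical. Dates such as the first and last payment would also need to be turned into numbers (for example, days since a reference date).

```python
import numpy as np

def one_hot(values):
    """Return (encoded matrix, list of categories) for a list of labels."""
    categories = sorted(set(values))
    index = {c: k for k, c in enumerate(categories)}
    encoded = np.zeros((len(values), len(categories)))
    for row, v in enumerate(values):
        encoded[row, index[v]] = 1.0
    return encoded, categories

# hypothetical per-time-step records
region = ["north", "south", "north", "east"]
payment = ["card", "cash", "card", "card"]
age = np.array([34.0, 51.0, 34.0, 27.0])

region_oh, region_cats = one_hot(region)
payment_oh, payment_cats = one_hot(payment)
# one row per time step: [age, region one-hot..., payment one-hot...]
features = np.column_stack((age, region_oh, payment_oh))
print(region_cats, payment_cats, features.shape)
```

The resulting 2D array can then be windowed into [samples, timesteps, features] in the same way as the tutorial's multivariate examples.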

  293. Anu AJune 21, 2021 at 6:02 pm#

    Hi Jason, thank you for the informative and detailed tutorials! I noted that you use the ‘relu’ activation function for the LSTM layers instead of the default ‘tanh’ activation. May I ask why? Thank you!

  294. Anu AJune 22, 2021 at 1:25 pm#

    Thank you very much for your reply! Sorry, but could you please clarify in what way it is more effective, and in what cases it might be preferred? Thank you!

    • Jason BrownleeJune 23, 2021 at 5:32 am#

      I noticed empirically that using ReLU was more effective on some simple univariate time series problems.

      I recommend that you test a suite of model configurations and discover what works best for your specific dataset and model.

      • Anu AJune 23, 2021 at 11:40 am#

        Thank you for your clarification!

  295. IreneJune 22, 2021 at 5:20 pm#

    Thank you for your informative post!
    I have a question for ‘Multiple Input Multi-Step Output’ process.

    When training, I'd like to add a validation set.
    Is adding a validation set a good idea?
    And if so, how can I set it up?

    Is it right to split the data into disjoint train/validation/test sets?

    Thank you in advance!

    • Jason BrownleeJune 23, 2021 at 5:35 am#

      I don’t think using a validation set with an LSTM model is appropriate.

  296. IreneJune 23, 2021 at 9:47 am#

    Can I ask why? I don't know much about CNNs or LSTMs yet…

    • Jason BrownleeJune 24, 2021 at 5:57 am#

      Because we cannot perform walk-forward validation on future time steps and use the same time steps (or different future time steps) for validation.
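For reference, walk-forward validation can be sketched as a sequence of expanding training windows, each followed by the next block of time steps as its test window, so the model is never evaluated on time steps that come before the end of its training data. The function name `walk_forward_splits` and its parameters are hypothetical.

```python
import numpy as np

def walk_forward_splits(n_samples, n_initial, n_test):
    """Yield (train_idx, test_idx) pairs in time order, with no shuffling.

    The training window grows by n_test samples after each step.
    """
    end = n_initial
    while end + n_test <= n_samples:
        yield np.arange(0, end), np.arange(end, end + n_test)
        end += n_test

for train_idx, test_idx in walk_forward_splits(n_samples=10, n_initial=4, n_test=2):
    print(train_idx.tolist(), "->", test_idx.tolist())
```

At each step the model would be refit (or updated) on `train_idx` and asked to forecast `test_idx`.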

  297. GeorgeJune 24, 2021 at 6:44 pm#

    Hi Jason,

    In your example Multivariate Multi-Step LSTM Models -> Multiple Input Multi-Step Output,

    where you use n_steps_in, n_steps_out = 3, 2, if we use, for example, a sigmoid activation in the last layer and binary cross-entropy loss:


    n_steps_in, n_steps_out = 3, 3

    X, y = split_sequences(dataset, n_steps_in, n_steps_out)

    n_features = X.shape[2]

    model = Sequential()
    model.add(LSTM(5, activation='relu', return_sequences=True, input_shape=(n_steps_in, n_features)))

    model.add(Dense(1, activation='sigmoid'))
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

    model.fit(X, y, epochs=20, verbose=0, batch_size=1)

    It runs OK.

    BUT, if we use n_steps_in, n_steps_out = 3, 2, it gives:


    ValueError: Dimensions must be equal, but are 2 and 3 for ‘{{node binary_crossentropy/mul}} = Mul[T=DT_FLOAT](binary_crossentropy/Cast, binary_crossentropy/Log)’ with input shapes: [1,2], [1,3].

    Any ideas what is that and how to deal with it?

    Thank you!

    • Jason BrownleeJune 25, 2021 at 6:12 am#

      Sorry, it’s not clear what the issue may be. You may need to use a little trial and error in adapting the model for your specific use case.
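A likely source of the mismatch, based on how Keras layers behave: with `return_sequences=True`, the LSTM emits one vector per input time step, so the following `Dense(1)` produces `n_steps_in` values per sample, while the tutorial's `split_sequences` builds targets with `n_steps_out` values. The two only line up when `n_steps_in == n_steps_out`, which would explain why 3, 3 runs and 3, 2 fails with the [1,2] vs [1,3] shapes in the error. The shapes can be checked with NumPy alone; the random binary-target dataset below is hypothetical.

```python
import numpy as np

def split_sequences(sequences, n_steps_in, n_steps_out):
    # tutorial-style split: inputs are all columns but the last, target is the last
    X, y = [], []
    for i in range(len(sequences)):
        end_ix = i + n_steps_in
        out_end_ix = end_ix + n_steps_out - 1
        if out_end_ix > len(sequences):
            break
        X.append(sequences[i:end_ix, :-1])
        y.append(sequences[end_ix - 1:out_end_ix, -1])
    return np.array(X), np.array(y)

rng = np.random.default_rng(0)
# hypothetical dataset: two inputs and a binary target column
dataset = np.column_stack((rng.random(20), rng.random(20), rng.integers(0, 2, 20)))

n_steps_in, n_steps_out = 3, 2
X, y = split_sequences(dataset, n_steps_in, n_steps_out)
# LSTM(return_sequences=True) + Dense(1) would give one value per input step,
# i.e. X.shape[1] values per sample, but each target row has n_steps_out values
print(X.shape, y.shape, "per-step outputs:", X.shape[1])
```

One way to make the shapes agree for 3, 2, not tested here, is to drop `return_sequences=True` and end the model with `Dense(n_steps_out, activation='sigmoid')`, so each sample gets a single vector of `n_steps_out` outputs.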

  298. fan zhangJune 25, 2021 at 11:13 pm#

    Hi Jason, thanks for the tutorial, it's very helpful. I found that changing the batch_size in the predict() method changes the prediction values (I used your univariate stacked LSTM example and only changed the batch_size in the predict() calls below).
    The yhat values are almost the same as yhat1 (because the default batch size of 32 is close to 41), but the yhat2 values differ a lot from yhat1 and yhat. Since this is a stateless LSTM, how can changing the batch size in the predict method change the prediction values?

    i really appreciate your time and help in advance 🙂

    # univariate stacked lstm example
    from numpy import array
    from keras.models import Sequential
    from keras.layers import LSTM
    from keras.layers import Dense
    from keras.utils import plot_model

    # split a univariate sequence
    def split_sequence(sequence, n_steps):
        X, y = list(), list()
        for i in range(len(sequence)):
            # find the end of this pattern
            end_ix = i + n_steps
            # check if we are beyond the sequence
            if end_ix > len(sequence)-1:
                break
            # gather input and output parts of the pattern
            seq_x, seq_y = sequence[i:end_ix], sequence[end_ix]
            X.append(seq_x)
            y.append(seq_y)
        return array(X), array(y)

    # define input sequence
    raw_seq = list(range(1,65))
    # choose a number of time steps
    n_steps = 2

    # split into samples
    X, y = split_sequence(raw_seq, n_steps)
    # reshape from [samples, timesteps] into [samples, timesteps, features]
    n_features = 1
    X = X.reshape((X.shape[0], X.shape[1], n_features))
    # define model
    model = Sequential()
    model.add(LSTM(50, activation='relu', return_sequences=True, input_shape=(n_steps, n_features)))
    model.add(LSTM(50, activation='relu'))
    model.add(Dense(1))
    model.compile(optimizer='adam', loss='mse')
    # fit model
    model.fit(X, y, epochs=200, verbose=0)
    plot_model(model)

    # demonstrate prediction
    x_input = array(list(range(2,166)))
    x_input = x_input.reshape((-1, n_steps, n_features))

    yhat = model.predict(x_input, verbose=0, batch_size=41)

    yhat1 = model.predict(x_input, verbose=0)

    yhat2 = model.predict(x_input, verbose=0, batch_size=2)

    and yhat != yhat2 != yhat1

  299. fan zhangJune 25, 2021 at 11:37 pm#

    Just a follow-up comment: the differences are quite minor (they can probably be ignored):

    yhat2[-1]
    Out[3]: array([169.57353], dtype=float32)

    yhat1[-1]
    Out[4]: array([169.57355], dtype=float32)

    yhat1[-2]
    Out[5]: array([167.4769], dtype=float32)

    yhat2[-2]
    Out[6]: array([167.47688], dtype=float32)

    yhat2[-4]
    Out[7]: array([163.28676], dtype=float32)

    yhat1[-4]
    Out[8]: array([163.28674], dtype=float32)
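The differences above are in the last digit or so of a float32 value, which points to floating-point rounding rather than any state carried between batches. A plausible explanation (not checked against TensorFlow internals) is that different batch sizes lead to different computation paths and summation orders, and float32 addition is not associative. A small NumPy illustration:

```python
import numpy as np

a, b, c = np.float32(1e8), np.float32(-1e8), np.float32(1.0)
# the same three numbers summed in two different orders
left = (a + b) + c   # 0 + 1 gives 1.0
right = a + (b + c)  # b + c rounds back to -1e8, so the result is 0.0
print(left, right)

# gap between adjacent float32 values near the predictions in the comment
gap = np.spacing(np.float32(169.57355))
print(gap)  # 2**-16, about 1.5e-05
diff = np.float32(169.57355) - np.float32(169.57353)
print(diff)
```

The two reported predictions differ by one or two steps of this float32 grid, which is the size of difference reordered float32 arithmetic can produce.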

  300. sinferJuly 8, 2021 at 3:31 am#

    HI Jason,

    Can you give me an idea of how to choose the number of time steps for an LSTM model used for fault detection and diagnosis? My time series data (8 time series) includes labeled data for 7 fault types and the normal condition. I have decided to use 8 time steps since there are 8 types of conditions (7 faults and normal).

    Finally, I want to pass the last 10 data points to the predict function and get back the condition (fault type or normal): multiple data points as input, and a predicted class label based on those data points. It is a multi-class classification problem.

    Thanks

    • Jason BrownleeJuly 8, 2021 at 6:09 am#

      Perhaps you can test a suite of configurations and discover what works best for your specific dataset.
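For a fault-classification framing, one common layout is to slide a window of `n_steps` rows over the multivariate series and label each window with the condition at its last row, one-hot encoded for a softmax output with 8 classes (7 faults plus normal). The window length does not have to equal the number of classes; it is a separate choice to tune. The helper `make_windows` and the random data below are hypothetical.

```python
import numpy as np

def make_windows(data, labels, n_steps, n_classes):
    """data: (time, n_series); labels: (time,) integer class per row.

    Returns X (samples, n_steps, n_series) and one-hot y (samples, n_classes),
    where each window takes the label of its last row.
    """
    X, y = [], []
    for end in range(n_steps, len(data) + 1):
        X.append(data[end - n_steps:end])
        y.append(labels[end - 1])
    y_onehot = np.eye(n_classes)[np.array(y)]
    return np.array(X), y_onehot

rng = np.random.default_rng(1)
data = rng.normal(size=(100, 8))        # hypothetical: 8 sensor series
labels = rng.integers(0, 8, size=100)   # hypothetical: 0 = normal, 1..7 = faults
X, y = make_windows(data, labels, n_steps=10, n_classes=8)
print(X.shape, y.shape)
```

With `n_steps=10`, a prediction would take the last 10 rows reshaped to (1, 10, 8), matching the use case in the question.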

  301. LakminiJuly 9, 2021 at 3:05 pm#

    Hi Jason,

    This is a great article. Can we use LSTM to impute missing data in time series?

    • Jason BrownleeJuly 10, 2021 at 6:05 am#

      Yes, perhaps try it and compare results to other methods.
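A simple baseline to compare an LSTM imputer against is linear interpolation between the nearest observed values. A NumPy sketch, assuming missing values are marked with NaN; the helper `interpolate_missing` and the series are hypothetical.

```python
import numpy as np

def interpolate_missing(series):
    """Fill NaNs by linear interpolation over the time index."""
    series = np.asarray(series, dtype=float)
    t = np.arange(len(series))
    missing = np.isnan(series)
    filled = series.copy()
    # np.interp holds the edge values constant beyond the first/last observation
    filled[missing] = np.interp(t[missing], t[~missing], series[~missing])
    return filled

series = [1.0, 2.0, np.nan, 4.0, np.nan, np.nan, 7.0]
print(interpolate_missing(series))
```

To compare methods, you could hide some known values, impute them with each method, and measure the error at those positions.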

  302. Robin BartmannJuly 9, 2021 at 11:12 pm#

    Hey Jason,

    Thanks for these fantastic blogposts!
    I used a lot of your material to develop the code for my thesis: Forecasting carbon market prices with Bayesian and Machine Learning methods. I performed one-step and four-step-ahead forecasts with a multivariate (6 covariates), direct, rolling-window forecast using 3 models for comparison:
    1) normal linear regression
    2) a shrinkage time varying parameter model (shrinkTVP in R)
    3) LSTM model (from your blogposts)

    I am still finalizing the results and will post them here to compare the performance of these models over time. I use weekly data from 2013-2020. Let me know if you are interested in anything in particular, or if there is something that would help this community most.

    A really big thank you for the great resources. I am an economist and will continue to use all the resources here to advance econometric methods!

    • Jason BrownleeJuly 10, 2021 at 6:11 am#

      Well done!

      Sharing may help other people using the same methods or working on the same problem.

  303. LioJuly 12, 2021 at 11:53 pm#

    Hi Jason,
    Thank you for providing such a good article for us!
    In the process of learning LSTMs, I have encountered a problem, and I hope to get your advice.
    I find that the predicted values lag behind the actual values. It looks as if the predicted curve is the actual curve shifted forward in time. What causes this phenomenon? Is there any solution?
    I hope to hear from you soon.

  304. LioJuly 15, 2021 at 6:39 pm#

    OK, thank you for your reply. I hope I can learn more from your articles.
    If you learn more about this lag, I hope you can tell me. I would be very grateful.

    • Jason BrownleeJuly 16, 2021 at 5:22 am#

      You can vary the amount of lag used as input in order to discover what works well or best for your specific dataset and model.
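A forecast that looks like the actual series shifted by one step is often a sign that the model has learned to repeat the last observed value. One way to check, sketched here with NumPy, is to compare the model's error with a persistence forecast (predict the value at t as the value at t-1) and to find the lag at which the predictions best match the actual values. The random-walk series and the `model_pred` stand-in, which simply copies persistence, are hypothetical.

```python
import numpy as np

def rmse(a, b):
    return float(np.sqrt(np.mean((np.asarray(a) - np.asarray(b)) ** 2)))

def best_lag(actual, pred, max_lag=3):
    """Return the lag k that minimises RMSE between actual[t] and pred[t + k]."""
    n = len(actual)
    return min(range(max_lag + 1), key=lambda k: rmse(actual[:n - k], pred[k:]))

rng = np.random.default_rng(2)
series = np.cumsum(rng.normal(size=200))  # hypothetical random-walk series

actual = series[1:]          # values to forecast, t = 1..199
persistence = series[:-1]    # persistence forecast: value at t-1

# stand-in for your model's one-step forecasts aligned with `actual`;
# here it copies persistence to show what a "lagging" model looks like
model_pred = persistence.copy()

print("persistence RMSE:", rmse(actual, persistence))
print("model RMSE:", rmse(actual, model_pred))
print("best lag:", best_lag(actual, model_pred))
```

If the model's RMSE is close to the persistence RMSE and the best lag is 1, the model is adding little beyond repeating the previous value.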

      • LioJuly 19, 2021 at 12:52 pm#

        Thank you very much. I will try as you say.

  305. AnshJuly 19, 2021 at 5:40 pm#

    Hi Jason,

    I have a question regarding the splitting of data for multivariate analysis.

    According to the book Deep Learning for Time Series Forecasting: Predict the Future with MLPs, CNNs and LSTMs in Python, for the following example:

    time, measure1, measure2
    1, 0.2, 88
    2, 0.5, 89
    3, 0.7, 87

    The data can be converted into a supervised learning problem as follows:
    time, measure1, measure2
    1, ?, 88
    2, 0.2, 89
    3, 0.5, 87
    4, 0.7, ?

    Which means the first and last rows fall off.

    However, in your multivariate example for this dataset and window = 3
    [[10, 15, 25]
    [ 20, 25, 45]
    [ 30, 35, 65]
    [40, 45, 85]
    [50, 55, 105]
    [60, 65, 125]
    ………………..]]

    When given an input of:
    10, 15
    20, 25
    30, 35

    The output is :
    65
    85

    Shouldn't the output be [85, 105], assuming the first target value [65] falls off, as in the first example?

    I also reran the same example with a window size of 1, and for the first row of data [10, 15] the output was 25. Shouldn't it be 45 instead, given that there is no previous data to predict 25 and the first row should fall off?

    This is the split function I am using:
    def msplit_sequence(sequences, n_steps_in, n_steps_out):
        X, y = list(), list()
        for i in range(len(sequences)):
            # find the end of this pattern
            end_ix = i + n_steps_in
            out_end_ix = end_ix + n_steps_out-1
            # check if we are beyond the dataset
            if out_end_ix > len(sequences):
                break
            # gather input and output parts of the pattern
            seq_x, seq_y = sequences[i:end_ix, :-1], sequences[end_ix-1:out_end_ix, -1]
            X.append(seq_x)
            y.append(seq_y)
        return array(X), array(y)

    Looking forward to your response.

    • Jason BrownleeJuly 20, 2021 at 5:34 am#

      Each problem has different requirements and expectations. You can define the input and output of your problem any way you like.
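The book's framing in the question can be reproduced with a shift: measure1 is moved down one row, so each row pairs the previous measure1 with the current measure2, and any row with a missing value is dropped. A NumPy sketch using the three rows from the comment; the helper `shift` is hypothetical.

```python
import numpy as np

def shift(values, k):
    """Shift a 1D array down by k rows (k > 0), padding the top with NaN."""
    out = np.full(len(values) + k, np.nan)
    out[k:] = values
    return out

measure1 = np.array([0.2, 0.5, 0.7])
measure2 = np.array([88.0, 89.0, 87.0])

# lagged input (previous measure1) next to the current measure2
lagged_m1 = shift(measure1, 1)              # [nan, 0.2, 0.5, 0.7]
current_m2 = np.append(measure2, np.nan)    # [88, 89, 87, nan]
table = np.column_stack((lagged_m1, current_m2))

# drop rows with a missing value: the first and last rows fall off
supervised = table[~np.isnan(table).any(axis=1)]
print(supervised)
```

In this framing the input comes strictly from earlier time steps. The tutorial's Multiple Input Multi-Step Output example instead aligns the first target with the last input row, which is why it returns 65 for the first window; either is usable as long as the same framing is applied when making real forecasts.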

      • DionAugust 16, 2022 at 6:19 am#

        Good day
        I came across your blog while searching Google; it is very interesting. I'm a beginner just starting to learn prediction.

        I need to know whether this can be done:

        Race : 1
        5 runners, 400m track race

        1. At 100m, 10.5s / 200m, 19.8s / 300m, 30.1s / 400m, 43.5s
        Then runner 2,
        Then runner 3,
        Then runner 4,
        Then runner 5,
        each with their own times at 100m, 200m, 300m and 400m.

        Can I predict who’ll be 1st/2nd/3rd/4th with new predicted times for each runner at 100m/200m/300m/400m intervals.

        Race :2
        Another scenario: I have a 400m race where, for each runner, I only have the 300m sectional time (e.g. 30.1s) and the final 400m time (e.g. 43.5s). Can I still predict each runner's intervals and position at 100m/200m/300m/400m? Is this possible?

        Can the result be in this format?

        Example
        100m – 5, 10.5s / 3, 10.3s /1, 09.9s/4, 10.2s / 2, 11.3
        200m- same like 100m calculations
        300m- same like 100m calculations
        400m – same like 100m calculations

        Appreciate your assistance….

        Await your response

        Thanks

        • James CarmichaelAugust 16, 2022 at 9:39 am#

          Hi Dion…Please narrow your query to a single question so that we may better assist you.

  306. OmarJuly 23, 2021 at 9:10 pm#

    Hi Jason,
    I'm new to time series forecasting and would appreciate your help. I am currently trying to predict how much a person drinks each day. I have a timestamp every 30 minutes and a corresponding value for the amount drunk within those 30 minutes. As you can imagine, there are a lot of 0 values in between. Moreover, a person only drinks from 8AM until 8PM, but the data nevertheless spans the whole day (so there are always 0s from 8PM until 8AM the next day, and 1 day is 48 entries). I also have another version of the dataset where the data spans only 8AM to 8PM (1 day is 24 entries).
    I already tried Croston's method, but I am looking for a more dynamic solution, so I am trying to implement a neural network for this. Could you point me in the right direction? Will LSTMs, for example, work for intermittent data? Which version of the data would make the model less complicated?
    Ps: Your blog is extremely helpful, thanks a lot.
    Best,
    Omar

    • Jason BrownleeJuly 24, 2021 at 5:14 am#

      I would recommend testing a suite of different framing of the problem, different models, different configurations until you find a technique that works well for your dataset.
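Because each day has a fixed number of 30-minute slots, switching between the two versions of the data, or aggregating to daily totals, is a reshape. A NumPy sketch, assuming the data starts at midnight and contains only whole days; the random intake values are hypothetical.

```python
import numpy as np

slots_per_day = 48      # 24 hours x 2 half-hour slots
n_days = 7
rng = np.random.default_rng(3)

# hypothetical half-hourly intake, one row per day, mostly zeros
by_day = rng.choice([0.0, 0.0, 0.0, 150.0, 250.0], size=(n_days, slots_per_day))
by_day[:, :16] = 0.0    # slots 0..15 are midnight to 8AM: no drinking
by_day[:, 40:] = 0.0    # slots 40..47 are 8PM to midnight: no drinking

full_series = by_day.reshape(-1)       # 48 entries per day
daytime = by_day[:, 16:40]             # 8AM-8PM, 24 entries per day
daytime_series = daytime.reshape(-1)
daily_total = by_day.sum(axis=1)       # one value per day

print(full_series.shape, daytime_series.shape, daily_total.shape)
```

The daytime version drops slots that are zero by construction, and daily totals give a much less intermittent target if the goal is the amount per day; which framing works better is something to test.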

  307. Mohamed Elhaj AbdouAugust 4, 2021 at 2:23 pm#

    I have a time series forecasting dataset that includes both categorical and numeric columns.

    Here is a sample of it:

    Date | categorical_feature_1 | categorical_feature_2 | Feature_1_numeric | feature_2_numeric | price

    1-1-2020 | USA | A | 5.5 | 7.6 | 100

    1-1-2020 | USA | B | 8.3 | 1.7| 20

    1-1-2020 | USA | C | 3.6 | 2.1 | 17

    1-2-2020 | USA | D | 5.5 | 7.6 | 40

    1-2-2020 | USA | E | 77.5 | 35 | 22

    1-2-2020 | USA | F | 69.5 | 2 | 22

    As you can see in the sample, if we pick the date 1-1-2020, we have multiple observations on the same date.

    I want to predict the Price column as the y label, taking categorical_feature_1, categorical_feature_2, Feature_1_numeric and Feature_2_numeric as the X features.

    So, from my understanding, since I am using multiple features to forecast the Price column, this is called multivariate time series forecasting.

    My questions are:

    1- How can I handle multiple observations at the same time from different features? For example, on 1-1-2020 we have three different observations.

    2- I believe that if we have multiple observations at the same time/date, then we have a different kind of time series forecasting. What is it: multi-timestep multivariate time series forecasting, or something else?

    thanks

    • Jason BrownleeAugust 5, 2021 at 5:15 am#

      Perhaps you can test different framings of the problem and discover what works well or best for you, e.g. multiple-input model vs treating the observations as separate time steps.
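As an illustration of the multiple-input framing, rows that share a date can be pivoted into one row per date with one column per category, turning each category into its own parallel series. A standard-library plus NumPy sketch; the records are hypothetical, and a category missing on a date is left as NaN, which then needs its own handling (for example, imputation).

```python
import numpy as np

# hypothetical long-format records: (date, category, price)
records = [
    ("2020-01-01", "A", 100.0), ("2020-01-01", "B", 20.0), ("2020-01-01", "C", 17.0),
    ("2020-01-02", "A", 98.0), ("2020-01-02", "B", 21.0), ("2020-01-02", "C", 18.0),
    ("2020-01-03", "A", 101.0), ("2020-01-03", "C", 16.0),
]

dates = sorted({d for d, _, _ in records})
categories = sorted({c for _, c, _ in records})
row = {d: i for i, d in enumerate(dates)}
col = {c: j for j, c in enumerate(categories)}

# one row per date, one column per category
wide = np.full((len(dates), len(categories)), np.nan)
for d, c, price in records:
    wide[row[d], col[c]] = price

print(categories)
print(wide)
```

The numeric features could be pivoted the same way and stacked as extra columns; the alternative the reply mentions, treating each observation as its own time step, keeps the long format instead.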

  308. okasAugust 17, 2021 at 9:19 am#

    Hi Jason, thank you for your amazing tutorial. I have a dataset that contains test results and multiple features for multiple users, for example:

    date | user_Id | feature_1 | feature_2 | test_output
    1-Jan-2020 | A | 5.5 | 7.6 | 100

    2-Jan-2020 | A | 8.3 | 1.7 | 20

    3-Jan-2020 | A | 3.6 | 2.1 | 17

    1-Jan-2020 | B | 5.5 | 7.6 | 40

    2-Jan-2020 | B | 77.5 | 35 | 22

    3-Jan-2020 | B | 69.5 | 2 | 22

    I want to predict the output for the next day, and I want to achieve it using LSTMs if possible; all suggestions are welcome.
    I want to train my model on multiple users so that it can predict the next-day output for any given user (including unseen users), but I could not find a way to create/reshape my data before feeding it into the LSTM.

    • Adrian Tam
      Adrian TamAugust 17, 2021 at 11:45 am#

      A quick way is to use groupby() on the dataframe to create a subset for each user, then set the target with dataframe["target"]=dataframe["feature"].shift(-1) so you can see the next-period data as a column. Is that what you mean by reshape?

  309. okasAugust 17, 2021 at 8:45 pm#

    thank you for your reply

    1- I want to understand and visualize the data preparation process (as in the examples above) before feeding the data into the LSTM model, and how to deal with such data since, as I mentioned, it relates to multiple users.

    2- Shouldn't I put the output in the "next-period" column instead of the features?
    dataframe["target"]=dataframe["output"].shift(-1) ?
    3- If I want to prepare my code in general to handle multi-step forecasting, what should I change in the examples above?

    • Adrian Tam
      Adrian TamAugust 18, 2021 at 3:17 am#

      You’re correct for (2). For (1), I don’t see any issue with multiple users here. You still train the model the same way as long as you do not mix the data from different time series. For (3), that depends on your design. One way is to feed the LSTM output back into the input so we can predict for one more step, then repeat for yet one more step, etc.
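The recursive approach described above (feed each prediction back in as input) can be sketched independently of the model type. Here a least-squares linear model on lagged values stands in for the LSTM; `make_windows`, `fit_linear` and `recursive_forecast` are hypothetical helpers, and with an LSTM you would call `model.predict` on the reshaped window at the marked line instead.

```python
import numpy as np

def make_windows(series, n_steps):
    """Inputs are n_steps consecutive values; the target is the next value."""
    X = np.array([series[i:i + n_steps] for i in range(len(series) - n_steps)])
    y = np.array(series[n_steps:])
    return X, y

def fit_linear(X, y):
    # least-squares weights: a stand-in for training an LSTM on (X, y)
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def recursive_forecast(w, history, n_steps, n_ahead):
    window = list(history[-n_steps:])
    preds = []
    for _ in range(n_ahead):
        # one-step prediction (with an LSTM: model.predict on the reshaped window)
        yhat = float(np.dot(w, window))
        preds.append(yhat)
        # slide the window: drop the oldest value and append the prediction
        window = window[1:] + [yhat]
    return preds

# Fibonacci-style series: each value is the sum of the previous two
series = [1.0, 1.0, 2.0, 3.0, 5.0, 8.0, 13.0, 21.0, 34.0, 55.0]
X, y = make_windows(series, n_steps=2)
w = fit_linear(X, y)
preds = recursive_forecast(w, series, n_steps=2, n_ahead=3)
print(preds)
```

Because each step consumes earlier predictions, errors can accumulate over the horizon; the alternatives are a model that outputs all steps at once, as in the tutorial's vector-output and encoder-decoder models.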

      • okasAugust 18, 2021 at 5:00 am#

        Regarding point 1, can you explain what you mean by "as long as you do not mix the data from different time series", and how can I make sure that I am not mixing the data during the training phase? In other words, how can I make sure my model understands that there are multiple users that share the same time series?

  310. LilianaAugust 24, 2021 at 6:55 am#

    Hi Jason:

    I have a question: when using an LSTM to forecast a time series of the Multiple Parallel Input and Multi-Step Output type, the Vector Output and Encoder-Decoder LSTM models can be used, but can Vanilla LSTM, Stacked LSTM, Bidirectional LSTM, CNN-LSTM and ConvLSTM also be used in both cases?

    Thanks for your attention.

    • Adrian Tam
      Adrian TamAugust 24, 2021 at 11:52 am#

      Yes, there are different variations of LSTM. All have the feature that they can learn and remember the state, but each variant will have some subtle differences.

      • LilianaDecember 19, 2021 at 9:54 am#

        Hello Jason:

        I would like to know: if I want to forecast a time series of the Multiple Parallel Input and Multi-Step Output type using an LSTM Encoder-Decoder, to obtain a multi-vector output, could I do the following?

        Configure the Encoder in any of the following ways:

        Vanilla LSTM
        Stacked LSTM
        Bidirectional LSTM
        CNN-LSTM
        ConvLSTM

        And, configure the Decoder in any of the following ways:

        Vanilla LSTM
        Stacked LSTM
        Bidirectional LSTM
        CNN-LSTM
        ConvLSTM

        And do any combination of LSTM Encoder-Decoder settings to get my multi-step, multi-vector forecast?

        Or are there any of these combinations that I cannot do for an LSTM Encoder-Decoder?

        Thanks for your attention.

        • Adrian Tam
          Adrian TamDecember 19, 2021 at 2:18 pm#

          All of them seem possible. Did you try anything?

          • LilianaDecember 21, 2021 at 9:06 am#

            Hi Adrian, yes, now that you mention it, I’m testing each of these combinations.

            Thank you so much.

          • LilianaJanuary 19, 2022 at 9:15 am#

            Hello Adrian

            While doing these tests, I would like to ask you: in an LSTM Encoder-Decoder model, could I really use a CNN-LSTM model or a ConvLSTM model as the decoder?

            I ask this because these two models use an input with specific characteristics and in the case of being used as Decoders, the input comes with a RepeatVector layer that does not correspond to the input form for a CNN-LSTM model or a ConvLSTM model.

            Thanks for your attention.

  311. davidgSeptember 5, 2021 at 5:17 am#

    Hi Jason,
    I'm trying to learn how LSTMs actually work under the hood (as opposed to how to use them). One very confusing point is this: what exactly is an LSTM unit? There seem to be contradictory definitions in the literature. In particular, referring to your very first example, in which you separate a 10-long integer sequence into six sets of three consecutive terms with the next term as the desired output, the best interpretation I have come up with so far is that by a "unit" you mean a set of six LSTM cells wired in series, where each cell takes a 3-dimensional vector as input and outputs a scalar. Here a "cell" is the usual collection of 4 (or 3, depending again on murky definitions) gates. So there would be a total of 6×50 = 300 cells all wired up in series, all having the same set of affine parameters (weights and biases). Another unanswered question then is: what is the dimension of the state vector?

    It would be great if you could notify my email when you respond, or better yet, copy your response to my email.

    Thanks so much for any help!

    • Jason BrownleeSeptember 6, 2021 at 5:16 am#

      In Keras, there are no cells, just units/nodes. Or a cell is a unit is a node.
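To connect the terms: in Keras, `LSTM(50)` is one layer whose hidden state and cell state are each vectors of length 50 (the number of units). The same weights are reused at every time step, so in the first example (3 time steps, 1 feature) the layer is applied 3 times in sequence per sample, and the 6 samples are independent sequences rather than cells wired in series. Below is a minimal NumPy forward pass written for illustration; the random initialisation and the i, f, candidate, o split of the weight matrices are assumptions of this sketch, and it uses the standard tanh where the tutorial's `activation='relu'` would apply instead.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_forward(x_seq, W, U, b, units):
    """x_seq: (timesteps, n_features). Returns final hidden state h and cell state c."""
    h = np.zeros(units)   # hidden state, length = units
    c = np.zeros(units)   # cell state, length = units
    for x_t in x_seq:     # the same W, U, b are reused at every time step
        z = x_t @ W + h @ U + b                # shape (4 * units,)
        i = sigmoid(z[:units])                 # input gate
        f = sigmoid(z[units:2 * units])        # forget gate
        g = np.tanh(z[2 * units:3 * units])    # candidate values
        o = sigmoid(z[3 * units:])             # output gate
        c = f * c + i * g
        h = o * np.tanh(c)
    return h, c

units, n_features = 50, 1
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(n_features, 4 * units))  # input weights
U = rng.normal(scale=0.1, size=(units, 4 * units))       # recurrent weights
b = np.zeros(4 * units)                                  # biases

x_seq = np.array([[10.0], [20.0], [30.0]])  # one sample, like [10, 20, 30]
h, c = lstm_forward(x_seq, W, U, b, units)
n_params = W.size + U.size + b.size
print(h.shape, c.shape, n_params)
```

The parameter count, 4 × (units × (units + n_features) + units) = 10,400, is the figure Keras reports in `model.summary()` for `LSTM(50)` on a single input feature.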

  312. Jacques MusondaSeptember 8, 2021 at 1:51 am#

    Thank you for this clear and helpful tutorial.

  313. PreetSeptember 22, 2021 at 5:11 pm#

    Thanks a lot! Amazing tutorial.

    • Adrian Tam
      Adrian TamSeptember 23, 2021 at 3:37 am#

      Glad you like it!

  314. ilovepythonOctober 5, 2021 at 10:51 pm#

    from numpy import array
    from keras.models import Sequential
    from keras.layers import LSTM, Dense, Flatten, TimeDistributed
    from keras.layers import Conv1D, MaxPooling1D

    def split_sequence(sequence, n_steps):
        X, y = list(), list()
        for i in range(len(sequence)):
            # find the end of this pattern
            end_ix = i + n_steps
            # check if we are beyond the sequence
            if end_ix > len(sequence)-1:
                break
            # gather input and output parts of the pattern
            seq_x, seq_y = sequence[i:end_ix], sequence[end_ix]
            X.append(seq_x)
            y.append(seq_y)
        return array(X), array(y)

    # define input sequence
    raw_seq = [2456, 1829, 2141, 1362, 1634, 1241, 1617, 1434, 2279, 1131,
    1192, 1065, 725, 997, 1161, 2033, 1815, 1123, 1136, 929, 1340,
    1476, 1962, 2199, 1276, 1351, 1201, 1078, 1397, 2181, 2042, 1117,
    1284, 1114, 1416, 1163, 1931, 1753, 1073, 1168, 1022, 1251, 3167,
    3958, 4002, 2033, 1362, 1099, 1506, 1614, 2838, 2569, 1708, 1536,
    1443, 1734, 1970, 2755, 3101, 1790, 1223, 1369, 1651, 2101, 3255,
    2559, 1711, 1738, 1612, 1878, 2064, 3504, 3855, 3425, 2829, 2846,
    4503, 4300, 4099, 3829, 1694, 1633, 1579, 2404, 2520, 4544, 4435,
    2227, 2173, 1690]

    # choose a number of time steps
    n_steps = 7
    # split into samples
    X, y = split_sequence(raw_seq, n_steps)
    # reshape from [samples, timesteps] into [samples, subsequences, timesteps, features]
    n_seq = 1
    n_steps = 2
    n_features = 1
    X = X.reshape((X.shape[0], n_seq, n_steps, n_features))
    # define model
    model = Sequential()
    model.add(TimeDistributed(Conv1D(filters=64, kernel_size=1, activation='relu'), input_shape=(None, n_steps, n_features)))
    model.add(TimeDistributed(MaxPooling1D(pool_size=2)))
    model.add(TimeDistributed(Flatten()))
    model.add(LSTM(50, activation='relu'))
    model.add(Dense(1))
    model.compile(optimizer='adam', loss='mse')
    # fit model
    model.fit(X, y, epochs=500, verbose=0)
    # demonstrate prediction
    x_input = array([4300, 4099, 3829, 1694, 1633, 1579, 2404])
    x_input = x_input.reshape((1, n_seq, n_steps, n_features))
    yhat = model.predict(x_input, verbose=0)
    print(yhat)

    Sir, I tried replicating your code and changed n_steps to 7, but it gave me this error: ValueError: cannot reshape array of size 581 into shape (83,2,2,1). What should I do? Sorry, I am very new. Thank you. 🙁

    • Adrian Tam
      Adrian TamOctober 6, 2021 at 10:37 am#

      You redefined n_steps to 2 later on.
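The CNN-LSTM reshape splits each input window into `n_seq` subsequences of equal length, so `n_seq` times the subsequence length must equal the window length used in `split_sequence`. With a window of 7 (a prime), the only options are 1 x 7 or 7 x 1; with 7 x 1, the `MaxPooling1D(pool_size=2)` layer would have only one step to pool, so 1 x 7 is the simpler fit here. A NumPy sketch of the shape logic, using a stand-in series of 90 values (the same length as `raw_seq` above); the names `window` and `n_sub` are hypothetical renamings chosen so that `n_steps` is not reused for two different things.

```python
import numpy as np

def split_sequence(sequence, n_steps):
    X, y = [], []
    for i in range(len(sequence) - n_steps):
        X.append(sequence[i:i + n_steps])
        y.append(sequence[i + n_steps])
    return np.array(X), np.array(y)

raw_seq = np.arange(90, dtype=float)  # stand-in with the same length as raw_seq
window = 7                            # time steps per sample (the original n_steps = 7)
X, y = split_sequence(raw_seq, window)

# choose n_seq and n_sub so that n_seq * n_sub == window
n_seq, n_sub, n_features = 1, 7, 1
assert n_seq * n_sub == window, "subsequences must exactly cover the window"
X = X.reshape((X.shape[0], n_seq, n_sub, n_features))
print(X.shape)
```

The model's `input_shape=(None, n_sub, n_features)` and the prediction reshape `(1, n_seq, n_sub, n_features)` would then use the same values.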

  315. ZHuangOctober 11, 2021 at 12:19 am#

    Jason, thank you for your great post.
    I am just wondering whether this approach can be used to predict a non-parallel series problem,
    for example:
    out_seq = array([in_seq1[i-10]+in_seq2[i-5] for i in range(len(in_seq1))])
    I tried using random noise for in_seq1 and in_seq2 to predict out_seq. The whole purpose is to let the network learn the hidden mapping between the different lags of seq1/seq2. The result is not good. Any idea how to tackle this kind of problem, or did I miss something?

    • Adrian Tam
      Adrian TamOctober 13, 2021 at 7:09 am#

      Garbage in garbage out. If your input is random noise, usually the result would not make sense.
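One thing worth checking in this setup: out_seq here is a deterministic function of the inputs, so the mapping is learnable in principle, but only if each input window reaches back at least 10 steps. Also, for i < 10, `in_seq1[i-10]` wraps around to the end of the array in Python, so those first rows contain a mismatched target and should be dropped. The NumPy sketch below uses a linear least-squares fit as a stand-in for the LSTM and recovers the mapping exactly when the window covers lags 0 to 10; the noise inputs are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500
in_seq1 = rng.normal(size=n)
in_seq2 = rng.normal(size=n)

max_lag = 10
window = max_lag + 1  # the window must include time steps t-10 .. t
rows, targets = [], []
for t in range(max_lag, n):  # start at t=10 so no index wraps around
    rows.append(np.concatenate((in_seq1[t - max_lag:t + 1], in_seq2[t - max_lag:t + 1])))
    targets.append(in_seq1[t - 10] + in_seq2[t - 5])
X, y = np.array(rows), np.array(targets)

coef, *_ = np.linalg.lstsq(X, y, rcond=None)
# position 0 is in_seq1[t-10]; position window + 5 is in_seq2[t-5]
print(np.round(coef, 6))
```

If the LSTM's window is shorter than 11 steps, in_seq1[t-10] is simply not in its input, and no amount of training can recover that part of the target.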

  316. AbbasOctober 20, 2021 at 9:13 am#

    Jason, thank you for your helpful post.
    I am a PhD student. I used a bidirectional LSTM with a CNN to forecast solar energy. I got good accuracy when I compared my results with another model on the same dataset, but I need some advice on how to make a contribution with the model.

  317. huang huiNovember 20, 2021 at 11:52 am#

    Hi Jason,

    I have followed your website since 2018, and it has benefited me a lot. Thank you very much for sharing these tutorials and code publicly.

    I used ConvLSTM for a spatio-temporal forecast. My dataset is [2880, 6]: 6 is the number of spatial points and 2880 is the length of the time series.

    n_features = 6
    n_seq = 6
    n_steps = 2
    model.add(ConvLSTM2D(filters=6, kernel_size=(6,2), activation='relu', input_shape=(n_seq, 6, n_steps, n_features)))

    But meet the error:

    ValueError:
    Input 0 of layer sequential is incompatible with the layer: expected ndim=5, found ndim=3. Full shape received: [None, 5, 6]

    I cannot find a solution. Could you give me any advice? Thanks!

    • Adrian Tam
      Adrian TamNovember 20, 2021 at 1:44 pm#

      ndim=5 because you set "input_shape=(n_seq, 6, n_steps, n_features)", and ndim=3 refers to your input dataset. I think you need to check how you shape your input and pass it into the network.
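A sketch of one way to turn a (2880, 6) array into the 5D input ConvLSTM2D expects: cut windows of `n_seq * n_steps` time steps, split each window into `n_seq` subsequences of `n_steps`, put the 6 spatial points on the rows axis and add a single channel axis. This layout, with spatial points as rows, one channel and the next step of all 6 points as the target, is an assumption; it corresponds to `input_shape=(n_seq, 6, n_steps, 1)`, and `kernel_size` must then fit within (6, n_steps). The random data is a stand-in.

```python
import numpy as np

rng = np.random.default_rng(5)
data = rng.normal(size=(2880, 6))   # stand-in for the (2880, 6) dataset

n_seq, n_steps = 6, 2
window = n_seq * n_steps            # 12 time steps per sample
X, y = [], []
for end in range(window, len(data)):
    X.append(data[end - window:end])   # (12, 6)
    y.append(data[end])                 # next step for all 6 points
X, y = np.array(X), np.array(y)        # (samples, 12, 6), (samples, 6)

# (samples, n_seq, n_steps, 6) -> (samples, n_seq, 6, n_steps) -> add a channel axis
X = X.reshape((X.shape[0], n_seq, n_steps, 6)).transpose(0, 1, 3, 2)[..., np.newaxis]
print(X.shape, y.shape)
```

The same reshape has to be applied to any input passed to `predict`, since the error above comes from feeding a 3D array to a layer that expects 5D input.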

  318. Alper OzelDecember 3, 2021 at 6:01 am#

    This was a great tutorial, the most comprehensive one out there. Thank you for your work. I have one question: do you have a comparison of neural network algorithms for time series prediction? Is there any that is better than LSTM?

    • Adrian Tam
      Adrian TamDecember 8, 2021 at 6:39 am#

      I don't think any comparison would be absolutely fair; it is more about which problem fits which model. On the question of LSTM, people have seen GRU as a faster alternative, but it is not always better.

  319. RitiDecember 5, 2021 at 6:42 pm#

    Hi Jason,

    Thanks a lot for this wonderful tutorial. Extremely helpful for me!
    I have a query regarding the input shape of the LSTM model. I would like to provide an 8-dimensional time series (i.e. 8 features) where each time sample has a label (or output) associated with it. So, I want the network to learn the mapping from the time series to the label series (where the time series features also have temporal dependencies). For example, let's say my input series is 10000 x 8, and the corresponding output is 10000 x 1. If I set time_steps=10 and feat_size=8, I will have (1000, 10, 8) as the input size and (1000, 10) as the output size. How can I train an LSTM for this? Should I set return_seq to True so that it learns the mapping from the features to the corresponding labels? I am not sure if I am correct here and would like to know if this approach is fine. Thanks again!

    • Adrian Tam
      Adrian TamDecember 8, 2021 at 7:36 am#

      If you set return_seq to True, your output is (1000,10), but if it is False, you have (1000,1). The sequence length in an LSTM just means that after this many steps the memory is reset.
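With non-overlapping windows, the reshape described in the question is a plain NumPy reshape, and each time step in a window keeps its own label. In Keras, an LSTM with `return_sequences=True` followed by `Dense(1)` applies the Dense layer at every time step, giving one prediction per step to compare with labels shaped (samples, 10, 1). A sketch with random stand-in data:

```python
import numpy as np

rng = np.random.default_rng(6)
features = rng.normal(size=(10000, 8))   # stand-in for the 10000 x 8 input
labels = rng.normal(size=(10000, 1))     # stand-in for the 10000 x 1 labels

time_steps, feat_size = 10, 8
n_samples = features.shape[0] // time_steps   # 1000 non-overlapping windows
X = features[:n_samples * time_steps].reshape(n_samples, time_steps, feat_size)
Y = labels[:n_samples * time_steps].reshape(n_samples, time_steps, 1)
print(X.shape, Y.shape)
```

Overlapping (sliding) windows would give more training samples from the same series; whether that helps is worth testing.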

  320. daliaDecember 5, 2021 at 7:55 pm#

    Thank you for this clear and helpful tutorial,

    What if I need to use CSV data as input instead of the sample data above?
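One way to go from a CSV file to the tutorial's arrays, assuming a file with a header row and the values in the second column, is `numpy.loadtxt`. The inline CSV text and column layout below are hypothetical; with a real file you would pass its path instead of the `StringIO` object.

```python
import io
import numpy as np

csv_text = io.StringIO(
    "date,value\n"
    "2021-01-01,10\n"
    "2021-01-02,20\n"
    "2021-01-03,30\n"
    "2021-01-04,40\n"
    "2021-01-05,50\n"
)
# skip the header row and read only the numeric column
raw_seq = np.loadtxt(csv_text, delimiter=",", skiprows=1, usecols=1)

def split_sequence(sequence, n_steps):
    X, y = [], []
    for i in range(len(sequence) - n_steps):
        X.append(sequence[i:i + n_steps])
        y.append(sequence[i + n_steps])
    return np.array(X), np.array(y)

n_steps, n_features = 3, 1
X, y = split_sequence(raw_seq, n_steps)
X = X.reshape((X.shape[0], X.shape[1], n_features))
print(X.shape, y)
```

For several columns, `usecols` can take a tuple, which gives a 2D array in the layout the multivariate examples expect.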

  321. IWDecember 6, 2021 at 4:48 pm#

    Hi

    This blog is super helpful, thank you!

    I am really stuck on this matter and maybe you could help me?

    I have 500 different observations, each of shape (100, 2) (100 data points, 2 features).
    I am reshaping my data to predict 5 time steps ahead based on the past 3 time steps, so after reshaping my data I have
    input_shape = (94,3,2)
    output_shape = (94,5,2)

    But because I have 500 different observations, I essentially have the data in the shape of
    input_shape = (500,94,3,2)
    output_shape = (500,94,5,2)

    The only way I could train my model is by using a for loop to feed in each of the 500 observations.

    Is there a better way to do this?

    • Adrian Tam
      Adrian TamDecember 8, 2021 at 7:41 am#

      You have the shape wrong here. Your LSTM predicts from 3 steps and 2 features, so your input is (N,3,2). You should combine the 500 observations together.
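If each observation's windows are already built, combining them is a reshape that merges the first two axes: every window stays intact, and no window spans two observations. A NumPy sketch with stand-in data and the shapes from the question:

```python
import numpy as np

rng = np.random.default_rng(7)
inputs = rng.normal(size=(500, 94, 3, 2))    # (observations, windows, steps in, features)
outputs = rng.normal(size=(500, 94, 5, 2))   # (observations, windows, steps out, features)

X = inputs.reshape(-1, 3, 2)    # (47000, 3, 2)
Y = outputs.reshape(-1, 5, 2)   # (47000, 5, 2)
print(X.shape, Y.shape)
```

The combined arrays can then be passed to a single `fit` call instead of looping over the 500 observations.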

  322. BharathiDecember 15, 2021 at 7:25 pm#

    Can you please tell me how you chose the values below?
    I understood the 3 time steps for input and 1 for output, but not the ones below.

    n_steps_in, n_steps_out = 3, 2
    n_features = X.shape[2]

    • Adrian Tam
      Adrian TamDecember 17, 2021 at 6:57 am#

      For example you have data [10, 20, 30, 40, 50, …] it means you use [10, 20, 30] to predict [40, 50], hence you use 3 steps in input and 2 steps in output. In this case, each time step is a single number, hence the n_features is 1.

    • James Carmichael, December 21, 2021 at 11:23 pm

      Hi Bharathi…Could you please post the exact code block you have questions about?

      -Regards,

  323. Alex, December 18, 2021 at 4:49 pm

    Hi Jason

    My name is Alex. I’m looking for an algorithm such as a Multi-Modal Deep Prediction Model using LSTM.

    • Adrian Tam, December 19, 2021 at 1:49 pm

      Can you explain what you mean by multi-modal prediction?

    • James Carmichael, December 21, 2021 at 11:30 am

      Hi Alex…Please explain more about what you are specifically trying to accomplish.

  324. Luigi, January 5, 2022 at 3:18 am

    Hi Jason,
    amazing post, thanks a lot for it! Super, super!

    I have a question, if you do not mind.
    I have a dataset of 100 financial indices.

    I want to predict 1 or more samples ahead (it doesn’t matter which).
    However, since my variables share some information (common variance), there is some redundancy, so I would like to compress my dataset the way PCA or factor analysis does, but using an LSTM autoencoder (what you call here the Encoder-Decoder Model).

    The point is that I want to run the autoencoder as you coded it here, but what I would keep at the end are the compressed variables at the bottleneck of the autoencoder (the end of the encoder). I would then remove the decoder and make predictions only on that compressed set,
    because I believe those compressed variables represent my dataset better (removing redundancy).

    This would also be useful for denoising (I would let hyperparameter tuning choose the dimension of the bottleneck).

    Do you have a reference for coding this?
    Or can you briefly show me how to modify your Encoder-Decoder Model?

    My idea is that the training code you show here stays the same, but something must be added, such as the number of dimensions of the bottleneck (which I cannot see in your code), and predict() has to be run using the model without the decoder.

    Many thanks in advance
    Luigi

    • James Carmichael, January 7, 2022 at 6:33 am

      Hi Luigi…I appreciate the kind words! I would be able to help you better if you could direct any questions to specific code listings and examples provided on machinelearningmastery.com.

      Regards,

  325. Luigi, January 7, 2022 at 7:55 pm

    Hi James,
    thanks for being willing to help me.

    I found your post https://machinelearningmastery.com/lstm-autoencoders/
    more relevant to my case, so I will open/continue the discussion there if you don’t mind.

    Thanks again for your offer to help, very kind
    Luigi

    • James Carmichael, January 8, 2022 at 11:05 am

      Hi Luigi…You are very welcome! Yes, please feel free to continue the discussion in the indicated post.

      Regards,

  326. Mocha, January 11, 2022 at 6:35 pm

    Hello, Sir! Thanks for your explanation. I want to ask about ConvLSTM. Can I use it for weather data with spatial and temporal features, stored in GRIB2 or NetCDF (.nc) files? We can get spatial features from the longitude and latitude and temporal features from the time. I want to use that data to predict rain. Also, can I use ConvLSTM for probabilistic prediction?

    I hope you’ll answer my question, thank you Sir.

  327. Mocha, January 12, 2022 at 12:10 pm

    Thanks for your answer, Sir!

    But can I still use ConvLSTM, or just LSTM? My data aren’t images, Sir.

  328. jesu, January 20, 2022 at 4:40 am

    How do I work out how to build the model?
    I mean, how many LSTM layers, for example? How many dense layers? Dropout?

    I have a multivariate time series with 5 features.

    • James Carmichael, January 20, 2022 at 7:47 am

      Hello Jesu…More nodes and layers means more capacity for the network to learn, but results in a model that is more challenging and slower to train.

      You must find the right balance of network capacity and trainability for your specific problem.

      There is no reliable analytical way to calculate the number of nodes or the number of layers required in a neural network for a specific predictive modeling problem.

      My general suggestion is to use experimentation to discover what configuration works best for your problem.

      This post has advice on systematically evaluating neural network models:

      How to Evaluate the Skill of Deep Learning Models
      Some further ideas include:

      Use intuition about the domain or about how to configure neural networks.
      Use deep networks, as empirically, deeper networks have been shown to perform better on hard problems.
      Use ideas from the literature, such as papers published on predictive problems similar to your problem.
      Use a search across network configurations, such as a random search, grid search, heuristic search, or exhaustive search.
      Use heuristic methods to configure the network; there are hundreds of published methods, but none appear reliable to me.
      More information here:

      How to Configure the Number of Layers and Nodes in a Neural Network
      Regardless of the configuration you choose, you must carefully and systematically evaluate the configuration of the model on your dataset and compare it to a baseline method in order to demonstrate skill.
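To make the search idea above concrete, here is a minimal grid-search skeleton. The hyperparameter values are arbitrary, and evaluate_config is a hypothetical placeholder: in practice it would build, fit, and score an LSTM on held-out data (for example, validation RMSE), ideally averaged over several runs because neural network results vary from run to run. The dummy score exists only so the loop runs.

```python
import itertools

# Candidate values (arbitrary choices for illustration)
grid = {
    "n_layers": [1, 2],
    "n_units": [32, 64, 128],
    "dropout": [0.0, 0.2],
}

def evaluate_config(config):
    """Hypothetical placeholder: replace with code that trains an LSTM using
    `config` and returns a validation error (lower is better)."""
    # Dummy score so the example runs; it has no real meaning
    return config["n_layers"] + config["n_units"] / 100 + config["dropout"]

def grid_search(grid):
    """Evaluate every combination in the grid and return results sorted best-first."""
    keys = list(grid)
    results = []
    for values in itertools.product(*(grid[k] for k in keys)):
        config = dict(zip(keys, values))
        results.append((evaluate_config(config), config))
    results.sort(key=lambda r: r[0])  # lowest error first
    return results

results = grid_search(grid)
print(len(results))   # 12
print(results[0][1])  # {'n_layers': 1, 'n_units': 32, 'dropout': 0.0}
```

A random search follows the same pattern but samples a fixed number of configurations (for example with random.choice on each list) instead of enumerating all of them, which scales better when the grid is large.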

  329. Liliana, January 21, 2022 at 6:35 am

    Hello Jason

    While doing these tests, I would like to ask you: in an LSTM Encoder-Decoder model, could I really use a CNN-LSTM model or a ConvLSTM model as the decoder?

    I ask because these two models require input with a specific shape, and when they are used as decoders, the input comes from a RepeatVector layer that does not match the input shape a CNN-LSTM or ConvLSTM model needs.

    Thanks for your attention.

    • James Carmichael, January 21, 2022 at 9:32 am

      Hi Liliana…You should try both and compare the results in my opinion. Also, it would be a good idea to try SARIMA. Sometimes it even outperforms newer deep learning methods!

      https://machinelearningmastery.com/sarima-for-time-series-forecasting-in-python/

      • Liliana, January 21, 2022 at 10:30 am

        Yes, I have already tried it, and I have the problem I described: I cannot make the CNN-LSTM or the ConvLSTM serve as a decoder because of the input shape they require, which is not what the previous layer of the model, a RepeatVector layer, provides. Hence my question: can I actually use these models as a decoder?

        Thanks for the advice; I already use a VAR model.

        I look forward to your reply, thank you.

  330. Dwiki Setiawan, January 27, 2022 at 4:05 pm

    How about this:
    I have time series data (2 years) with one variable (amount per day), and I want to make predictions based on that data. How do I do that?

    *I’m a 100% newbie

  331. Ugur Kahveci, January 27, 2022 at 11:48 pm

    Hello Jason, great tutorial as always!

    I am having trouble getting any sensible result from my LSTM. I am trying to use EarlyStopping and ModelCheckpoint together, but when I monitor validation accuracy for the model checkpoint, validation accuracy becomes zero and does not improve over the epochs. I changed the monitor parameter to validation loss, and now the validation loss seems to be very high. After the model completes training, both train and test accuracies are zero.

    I wonder if I made a mistake separating the dataset into train and test sets, because in your article you mention that the data must be in a certain format to use an LSTM.
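For a regression problem such as forecasting, accuracy is a classification metric: with continuous targets, predictions almost never match the true values exactly, so accuracy stays at or near zero regardless of how good the model is. Monitoring the validation loss (and reporting an error metric such as MAE) is the usual alternative. The sketch below assumes tf.keras; the patience value and the weights filename are arbitrary choices, and TensorFlow is imported inside the function only so the snippet can be loaded without it.

```python
def make_callbacks(weights_path="best.weights.h5", patience=10):
    """Early stopping and checkpointing driven by validation loss, not accuracy."""
    from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

    early_stop = EarlyStopping(monitor="val_loss", mode="min",
                               patience=patience, restore_best_weights=True)
    checkpoint = ModelCheckpoint(weights_path, monitor="val_loss", mode="min",
                                 save_best_only=True, save_weights_only=True)
    return [early_stop, checkpoint]

# Usage (assuming `model`, X_train, y_train, X_val, y_val already exist):
# model.compile(optimizer="adam", loss="mse", metrics=["mae"])
# model.fit(X_train, y_train, validation_data=(X_val, y_val),
#           epochs=200, callbacks=make_callbacks())
```

If the validation loss is still very high, it is worth comparing it against a naive baseline (such as predicting the previous value) before changing the model.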

  332. Paniz, January 28, 2022 at 5:13 am

    Hi,
    Thank you so much for the thorough tutorial.

    As for out_seq, I see that in almost all examples it is the sum of the input sequences. I understand these are examples. But what if you know there is a dependency between the input and output sequences but do NOT know exactly what it is? Then how do you set this up? Any tips? Thanks

  333. Kostas, February 22, 2022 at 2:38 am

    Hello, thanks for the tutorial.

    I tried to use a Vector Output to model your last example (Multiple Parallel Input and Multi-Step Output) instead of an encoder-decoder model, but I keep getting an error.

    Here’s the code.

    from numpy import array
    from numpy import hstack
    from keras.models import Sequential
    from keras.layers import LSTM, Dense, TimeDistributed

    # split a multivariate sequence into samples
    def split_sequences(sequences, n_steps_in, n_steps_out):
        X, y = list(), list()
        for i in range(len(sequences)):
            # find the end of this pattern
            end_ix = i + n_steps_in
            out_end_ix = end_ix + n_steps_out
            # check if we are beyond the dataset
            if out_end_ix > len(sequences):
                break
            # gather input and output parts of the pattern
            seq_x, seq_y = sequences[i:end_ix, :], sequences[end_ix:out_end_ix, :]
            X.append(seq_x)
            y.append(seq_y)
        return array(X), array(y)

    # define input sequence
    in_seq1 = array([10, 20, 30, 40, 50, 60, 70, 80, 90])
    in_seq2 = array([15, 25, 35, 45, 55, 65, 75, 85, 95])
    out_seq = array([in_seq1[i]+in_seq2[i] for i in range(len(in_seq1))])
    # convert to [rows, columns] structure
    in_seq1 = in_seq1.reshape((len(in_seq1), 1))
    in_seq2 = in_seq2.reshape((len(in_seq2), 1))
    out_seq = out_seq.reshape((len(out_seq), 1))
    # horizontally stack columns
    dataset = hstack((in_seq1, in_seq2, out_seq))
    # choose a number of time steps
    n_steps_in, n_steps_out = 3, 2
    # convert into input/output
    X, y = split_sequences(dataset, n_steps_in, n_steps_out)
    # the dataset knows the number of features (3 here)
    n_features = X.shape[2]
    model = Sequential()
    model.add(LSTM(200, activation='relu', return_sequences=True, input_shape=(n_steps_in, n_features)))
    model.add(LSTM(200, activation='relu', return_sequences=True))
    model.add(TimeDistributed(Dense(n_features)))
    model.compile(optimizer='adam', loss='mse')
    model.fit(X, y, epochs=300, verbose=0)
    # demonstrate prediction
    x_input = array([[60, 65, 125], [70, 75, 145], [80, 85, 165]])
    x_input = x_input.reshape((1, n_steps_in, n_features))
    yhat = model.predict(x_input, verbose=0)
    print(yhat)

    Please help!
    Thanks in advance

    • James Carmichael, February 26, 2022 at 12:45 pm

      Hi Kostas…Please clarify your question so that we may better assist you.

  334. kostas, February 22, 2022 at 8:26 pm

    Thank you for the tutorial, but I have a question.

    I tried implementing a Multiple Parallel Input and Multi-Step Output model using a vector output model instead of an encoder-decoder (as you did at the end of your tutorial), but I keep getting errors.

    The code is presented below. Could you please help me out ?

    Thanks in advance!

    from numpy import array
    from numpy import hstack
    from keras.models import Sequential
    from keras.layers import LSTM
    from keras.layers import Dense
    from keras.layers import Bidirectional
    from keras.layers import Flatten
    from keras.layers import TimeDistributed
    from keras.layers import Conv1D
    from keras.layers import MaxPooling1D
    from keras.layers import ConvLSTM2D
    from keras.layers import RepeatVector

    # split a multivariate sequence into samples
    def split_sequences(sequences, n_steps_in, n_steps_out):
        X, y = list(), list()
        for i in range(len(sequences)):
            # find the end of this pattern
            end_ix = i + n_steps_in
            out_end_ix = end_ix + n_steps_out
            # check if we are beyond the dataset
            if out_end_ix > len(sequences):
                break
            # gather input and output parts of the pattern
            seq_x, seq_y = sequences[i:end_ix, :], sequences[end_ix:out_end_ix, :]
            X.append(seq_x)
            y.append(seq_y)
        return array(X), array(y)

    # define input sequence
    in_seq1 = array([10, 20, 30, 40, 50, 60, 70, 80, 90])
    in_seq2 = array([15, 25, 35, 45, 55, 65, 75, 85, 95])
    out_seq = array([in_seq1[i]+in_seq2[i] for i in range(len(in_seq1))])
    # convert to [rows, columns] structure
    in_seq1 = in_seq1.reshape((len(in_seq1), 1))
    in_seq2 = in_seq2.reshape((len(in_seq2), 1))
    out_seq = out_seq.reshape((len(out_seq), 1))
    # horizontally stack columns
    dataset = hstack((in_seq1, in_seq2, out_seq))
    # choose a number of time steps
    n_steps_in, n_steps_out = 3, 2
    # convert into input/output
    X, y = split_sequences(dataset, n_steps_in, n_steps_out)
    # the dataset knows the number of features (3 here)
    n_features = X.shape[2]

    model = Sequential()
    model.add(LSTM(200, activation='relu', return_sequences=True, input_shape=(n_steps_in, n_features)))
    model.add(LSTM(200, activation='relu', return_sequences=True))
    model.add(TimeDistributed(Dense(2)))
    model.compile(optimizer='adam', loss='mse')

    model.summary()

    # fit model
    model.fit(X, y, epochs=300, verbose=0)
    # demonstrate prediction
    x_input = array([[60, 65, 125], [70, 75, 145], [80, 85, 165]])
    x_input = x_input.reshape((1, n_steps_in, n_features))
    yhat = model.predict(x_input, verbose=0)
    print(yhat)

    • James Carmichael, February 23, 2022 at 12:24 pm

      Hi Kostas…Thanks for asking.

      I’m eager to help, but I just don’t have the capacity to debug code for you.

      I am happy to make some suggestions:

      Consider aggressively cutting the code back to the minimum required. This will help you isolate the problem and focus on it.
      Consider cutting the problem back to just one or a few simple examples.
      Consider finding other similar code examples that do work and slowly modify them to meet your needs. This might expose your misstep.
      Consider posting your question and code to StackOverflow.

      • kostas, February 23, 2022 at 9:36 pm

        Thanks for the reply, but the code I posted is actually a copy of your last implementation, the “Multiple Parallel Input and Multi-Step Output” implementation.

        In your article, I quote:
        “A vector output or an encoder-decoder model could be used. In this case, we will demonstrate a vector output with a Stacked LSTM.”

        I tried using a Stacked LSTM instead of an encoder-decoder model, but it did not work, because I’m using three time steps for training and I’m trying to predict a 2-time-step series.

  335. kostas, February 24, 2022 at 1:55 am

    #Correction

    Thanks for the reply, but the code I posted is actually a copy of your last implementation, which is “Multiple Parallel Input and Multi-Step Output” implementation.

    In your article, I quote :
    “We can use either the Vector Output or Encoder-Decoder LSTM to model this problem. In this case, we will use the Encoder-Decoder model”

    I tried using a Stacked LSTM instead of an encoder-decoder model, but it did not work, because I’m using three time steps for training and I’m trying to predict a 2-time-step series.

    How can I solve this issue, please?

    • James Carmichael, February 24, 2022 at 2:42 pm

      Hi Kostas…What error(s) are you encountering?
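The error in the code posted above most likely comes from a shape mismatch: with return_sequences=True on the last LSTM, the model emits one output per input step, (samples, 3, …), while y holds 2 output steps of 3 features, (samples, 2, 3). Note also that n_features is 3 here, not 2, because the stacked dataset has three columns. A vector output model avoids the mismatch by flattening y to (samples, n_steps_out * n_features) and reshaping predictions back afterwards. The sketch below assumes tf.keras and keeps the layer sizes from the posted code; TensorFlow is imported inside build_model() only so the data preparation can run on its own.

```python
import numpy as np

def split_sequences(sequences, n_steps_in, n_steps_out):
    """Windows of n_steps_in rows as input, the next n_steps_out rows as output."""
    X, y = [], []
    for i in range(len(sequences)):
        end_ix = i + n_steps_in
        out_end_ix = end_ix + n_steps_out
        if out_end_ix > len(sequences):
            break
        X.append(sequences[i:end_ix, :])
        y.append(sequences[end_ix:out_end_ix, :])
    return np.array(X), np.array(y)

in_seq1 = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90])
in_seq2 = np.array([15, 25, 35, 45, 55, 65, 75, 85, 95])
dataset = np.column_stack((in_seq1, in_seq2, in_seq1 + in_seq2))

n_steps_in, n_steps_out = 3, 2
X, y = split_sequences(dataset, n_steps_in, n_steps_out)
n_features = X.shape[2]                                     # 3 columns
y_flat = y.reshape((y.shape[0], n_steps_out * n_features))  # (5, 6)

def build_model(n_steps_in, n_steps_out, n_features):
    """Stacked LSTM with a vector output of n_steps_out * n_features values."""
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Input, LSTM, Dense

    model = Sequential([
        Input(shape=(n_steps_in, n_features)),
        LSTM(200, activation="relu", return_sequences=True),
        LSTM(200, activation="relu"),  # last LSTM returns only its final output
        Dense(n_steps_out * n_features),
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

# Usage (requires TensorFlow):
# model = build_model(n_steps_in, n_steps_out, n_features)
# model.fit(X, y_flat, epochs=300, verbose=0)
# x_input = np.array([[60, 65, 125], [70, 75, 145], [80, 85, 165]])
# yhat = model.predict(x_input.reshape((1, n_steps_in, n_features)), verbose=0)
# print(yhat.reshape((n_steps_out, n_features)))  # 2 steps x 3 series
```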

  336. Emmy, February 26, 2022 at 7:14 pm

    Dear Sir,

    Thank you so much for this great tutorial.

    I am working on a project that requires me to feed real-time IoT data (with four variables) to the vanilla LSTM model to predict an outcome.

    Kindly provide me with a guide on this.

    Thank you

    • James Carmichael, February 27, 2022 at 12:24 pm

      Hi Emmy…Thanks for asking.

      Sorry, I cannot help you with your project.

      I’m eager to help, but I don’t have the capacity to get involved in your project at the level you need or at a level to do a good job.

      I’m sure you can understand my position, as I get many requests for help with projects each day.

      Nevertheless, I am happy to answer any specific questions you have about machine learning.

  337. Marimuthu S, March 2, 2022 at 5:56 pm

    Hello Jason,

    Greetings.

    Does RNN use one-hot encoding in each time step for time series data forecasting?

    for instance, input=[10,20, 30]

    In 1st time step input is [10, 0, 0],

    In 2nd time step input is [0, 20, 0], and

    In 3rd time step input is [0, 0, 30]

    Is that right?

    Thanks in advance.
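No one-hot encoding is needed for numeric series. In the tutorial's examples, each time step is fed as the raw value itself, reshaped to the [samples, timesteps, features] layout the LSTM expects, so [10, 20, 30] becomes three time steps with one feature each. One-hot vectors are used for categorical inputs, such as words or class labels, rather than for real-valued observations. A small NumPy illustration:

```python
import numpy as np

x_input = np.array([10, 20, 30])
# [samples, timesteps, features]: 1 sample, 3 time steps, 1 feature per step
x_input = x_input.reshape((1, 3, 1))
for t in range(3):
    print(f"time step {t + 1}:", x_input[0, t])  # [10], then [20], then [30]
print(x_input.shape)  # (1, 3, 1)
```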

  338. McanP, March 7, 2022 at 5:34 pm

    Hello. First, thank you for your support to developers.

    I am having a lot of trouble trying to determine whether my univariate data is forecastable or not.
    What I am doing:
    1- Using StandardScaler to scale my data
    2- Using the “difference” method to make my data stationary.
    3- Testing my data’s stationarity with a null-hypothesis test.

    My questions:
    - When I use MinMaxScaler, my predictions are absolutely flat (I tried relu, sigmoid, even None). Why do you think that is?
    - My validation loss increases, then stabilizes. Why?

    I can publish my code if you want.
    Thanks in advance!
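For reference, a minimal NumPy sketch of differencing and min-max scaling, with the inverse transforms needed to return a forecast to the original scale. It uses one common ordering (difference first, then scale the differences) and computes the scaling statistics on the training portion only, to avoid leaking test information; the series values are made up. Note that test values can fall outside [0, 1] when they exceed the training range, as the last difference does here.

```python
import numpy as np

# Made-up univariate series; the first 7 values are treated as training data
series = np.array([112.0, 118.0, 132.0, 129.0, 121.0, 135.0, 148.0, 148.0, 136.0, 119.0])
n_train = 7

# 1) Difference: diff[t] = series[t + 1] - series[t]
diff = np.diff(series)

# 2) Min-max scale using only the differences inside the training part
#    (7 training values give 6 differences)
train_diff = diff[: n_train - 1]
d_min, d_max = train_diff.min(), train_diff.max()
scaled = (diff - d_min) / (d_max - d_min)

def invert_forecast(scaled_value, last_observation):
    """Undo the scaling, then the differencing, for a one-step-ahead forecast."""
    change = scaled_value * (d_max - d_min) + d_min
    return last_observation + change

# Round trip: inverting the last true scaled difference recovers the last value
print(round(invert_forecast(scaled[-1], series[-2]), 6))  # 119.0
```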

  339. Lochan Luca, March 18, 2022 at 1:47 pm

    Firstly, thanks for this blog. I am developing an LSTM forecasting model for stock prices. For company X, an LSTM model with 2 layers, 5 epochs, and batch size 1 works well with 10 future steps (recursive multi-step forecast): the RMSE between predicted and actual values is less than 5. But the same model on company Y, with the same number of rows of data, does not work well: the RMSE is larger than 20. I am not able to figure out why this happens.
    Apart from RMSE, can you suggest a method to check how accurate the model’s predictions are?

    • James Carmichael, March 20, 2022 at 7:25 am

      Hi Lochan…Machine learning model performance is relative, not absolute.

      Start by evaluating a baseline method, for example:

      Classification: Predict the most common class value.
      Regression: Predict the average output value.
      Time Series: Predict the previous time step as the current time step.
      Evaluate the performance of the baseline method.

      A model has skill if the performance is better than the performance of the baseline model. This is what we mean when we talk about model skill being relative, not absolute, it is relative to the skill of the baseline method.

      Additionally, model skill is best interpreted by experts in the problem domain.

      For more on this topic, see the post:

      How To Know if Your Machine Learning Model Has Good Performance
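A persistence baseline (the time series option above) takes only a few lines. The sketch below compares a model's forecasts with persistence on the same test points using RMSE; the numbers are invented, and model_pred is a hypothetical stand-in for a model's one-step predictions. Comparing each model against its own persistence RMSE also helps when two series differ in price level or volatility, since raw RMSE values are not directly comparable across them.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

# Invented example values
series = np.array([100.0, 102.0, 101.0, 105.0, 107.0, 106.0])
actual = series[1:]          # values to forecast
persistence = series[:-1]    # forecast = previous observed value
model_pred = np.array([101.0, 102.5, 103.0, 106.0, 106.5])  # hypothetical model output

print("persistence RMSE:", rmse(actual, persistence))
print("model RMSE:", rmse(actual, model_pred))
```

A model has skill on this data only if its RMSE is lower than the persistence RMSE.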

  340. Lochan Luca, March 18, 2022 at 1:58 pm

    When I feed the test dataset to the model for predictions, the predictions deviate almost not at all from the test data for the first 70% of the test set. I am predicting only a single outcome, and for the next prediction I use the original test value, not my predicted value. Still, for the last 30% of the data, the deviation between the test data and the predictions starts increasing. Plotting it, I found that for the last 30% of the test set the deviation between expected and predicted values is even bigger than 25. No matter how big or small a dataset I use, the results are always bad for the last 30% of predictions. What should I do to get more accurate predictions?

  341. Ilenia, April 12, 2022 at 7:25 pm

    Hi!
    Thank you very much for this useful tutorial.
    I have a question on the first example (Vanilla LSTM). You showed how to make one prediction, but how can I make more?
    I mean, should I use the same model and just pass as input the last two training values plus the first prediction (if n_steps = 3, for instance)? Or should I retrain the model with the first prediction added to the training set, and continue like that?

    Thanks for the help!
    Ilenia

    • James Carmichael, April 14, 2022 at 2:41 am

      Hi Ilenia…Are you wanting to extend the forecast time period?

      • Ilenia, April 14, 2022 at 9:58 pm

        Hi James!
        Yes, basically, that’s what I would like to do. Let’s say I want to forecast up to 3 future values, instead of just one, what should I do?

        Thanks!

  342. javvv, April 20, 2022 at 7:07 pm

    Hey, I’m new to LSTMs. I have to start learning them for my final-year project (FYP), where I have to train a model to predict future sensor values. Can you guide me on how to start? What are the prerequisites, and how can I do better? Which language, tools, and software should I use? I’m familiar with Python and practice in VS Code, but I’m not sure where to run all this.

  343. Ye, May 9, 2022 at 11:19 am

    Hi Jason,

    Thank you for the tutorials. They are very helpful.

    Suppose I have a multivariate time series with a dependent series, but instead of forecasting the series, I would like to predict the target output from the multiple input variables at the same time stamp.

    For example, the first column is input variable 1, the 2nd column is the input variable 2, and the 3rd column is the target variable.
    [[ 10 15 25]
    [ 20 25 45]
    [ 30 35 65]
    [ 40 45 85]
    [ 50 55 105]]

    I would like the inputs 10, 15 to give the output 25; 20, 25 to give 45; 30, 35 to give 65; and so on.

    Can I simply follow the examples you discussed in the “Multivariate LSTM Models” section but set n_steps=1? Or are there other methods to deal with such a situation?
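Setting n_steps=1 fits the tutorial's data layout: each sample becomes a single time step holding the two input values, and the target is the value in the same row. A NumPy sketch of the resulting shapes is below. With one time step, the LSTM has no temporal context to exploit, so a plain Dense network is a reasonable alternative to compare against.

```python
import numpy as np

dataset = np.array([[10, 15, 25],
                    [20, 25, 45],
                    [30, 35, 65],
                    [40, 45, 85],
                    [50, 55, 105]])

n_steps, n_features = 1, 2
# [samples, timesteps, features]: every row is one sample with a single time step
X = dataset[:, :-1].reshape((len(dataset), n_steps, n_features))
y = dataset[:, -1]

print(X.shape)              # (5, 1, 2)
print(X[0].tolist(), y[0])  # [[10, 15]] 25
```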

    Thank you

  344. mat, May 10, 2022 at 6:57 am

    Hi James,
    Awesome tutorial.
    If I want to train the same model on several sequences, how would you do this?
    Thanks in advance for the answer.

  345. mat, May 10, 2022 at 6:57 pm

    Thanks James for the link. I implemented the model and iterated it over several sequences.
    LSTM is clearly very heavy (it takes a very long time to run 100 epochs on only 1 sequence).
    I have to find another solution. But thanks for the support, I really appreciate it, and your blog is a huge source of information. Thanks for the work and the knowledge you share, and congratulations.

  346. Brijesh Soni, June 1, 2022 at 12:56 am

    Hi Jason, thanks for your tutorials.

    Is it possible to train an LSTM with different lookback values in different epochs/iterations? Kindly share your views.

  347. Brijesh Soni, June 3, 2022 at 9:12 am

    Thanks James! What I mean is: instead of a fixed lookback, is it possible for the LSTM network to learn the lookback value on its own?

  348. skr, June 26, 2022 at 8:19 am

    Hi Jason
    I am using LSTM for sequence-to-sequence modelling in a computer networking scenario. I am considering multiple parallel series and multi-step forecasting. However, in my scenario the number of parallel input series is not fixed. How can I handle this? I would appreciate your guidance.
    Regards

  349. Budha, July 7, 2022 at 9:37 pm

    Hi Jason,

    Thank you for this. I am new to LSTM, so this really helped me. I would like to ask a question. I have a small dataset of 24 time points with a clear increasing trend over time. Is it fine to use LSTM, or should I go with classical time series methods such as ARIMA?

    Thanks once again,

    • James Carmichael, July 8, 2022 at 5:59 am

      Hi Budha…My recommendation would be to apply ARIMA and an LSTM model and compare results. One is not necessarily the best option in all cases.

      • Budha, July 8, 2022 at 12:40 pm

        Thank you so much for the reply. I will definitely try both models. Love reading your tutorials.

  350. ewind, July 18, 2022 at 12:31 pm

    In the section “Multiple Input Series”, it is very strange to see that the result is not 100% precise. Shouldn’t it be very easy for the network to learn the addition operation? (The output is just the sum of the current time step’s inputs.)

  351. Hilton Fernandes, July 25, 2022 at 9:15 pm

    Interestingly, I could only replicate your results with the Multi-Step LSTM Models when I increased the number of iterations, the length of the sample time series, and the size of the input data. Was that because I don’t have any GPU hardware for TensorFlow to use? BTW, in my current setup, TensorFlow is complaining about how Keras uses it.

  352. Olaitan Folashade, August 2, 2022 at 9:31 pm

    Hi Jason, thank you for the tutorial. I have a question about the Multiple Parallel Input and Multi-Step Output model.

    The number of features is specified in the Dense output layer for the multivariate, multi-step, multiple parallel forecast, as in the last example above, where the number of features in the input and output sequences is the same.

    How is this done when the number of input and output features is not the same? For example, I am using 15 input variables and only want to forecast 4 in a multi-step forecast.
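One way to handle different input and output feature counts, sketched under assumptions: the input windows use all 15 columns, while the targets keep only the 4 columns to forecast (here, hypothetically, the first four), flattened so that the final Dense layer has n_steps_out * 4 units instead of n_features. Random data stands in for the real dataset; the column choice, window lengths, and helper name are illustrative.

```python
import numpy as np

def split_sequences_subset(sequences, n_steps_in, n_steps_out, target_cols):
    """Inputs: all columns; outputs: only `target_cols` for the next n_steps_out rows."""
    X, y = [], []
    for i in range(len(sequences)):
        end_ix = i + n_steps_in
        out_end_ix = end_ix + n_steps_out
        if out_end_ix > len(sequences):
            break
        X.append(sequences[i:end_ix, :])
        y.append(sequences[end_ix:out_end_ix, target_cols])
    return np.array(X), np.array(y)

data = np.random.rand(200, 15)   # stand-in: 200 time steps, 15 variables
target_cols = [0, 1, 2, 3]       # hypothetical: the 4 variables to forecast
n_steps_in, n_steps_out = 10, 3

X, y = split_sequences_subset(data, n_steps_in, n_steps_out, target_cols)
y_flat = y.reshape((y.shape[0], n_steps_out * len(target_cols)))
print(X.shape, y.shape, y_flat.shape)  # (188, 10, 15) (188, 3, 4) (188, 12)
# The model's input shape would be (n_steps_in, 15) and its last layer Dense(12)
```

After prediction, reshaping the output to (n_steps_out, 4) recovers one row per future step.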

    I will appreciate your response. Thank you

  353. Andre, August 27, 2022 at 4:15 am

    Hi Jason,

    Thanks for your tutorial. It is very useful for me.
    I have one question: what if I have multiple series with different dimensions?

    Thanks for your answer

    • James Carmichael, August 27, 2022 at 6:11 am

      Hi Andre…You are very welcome! With limited knowledge of your application, you may want to investigate ensemble learning:

      https://machinelearningmastery.com/ensemble-machine-learning-with-python-7-day-mini-course/

      • Andre, August 27, 2022 at 1:08 pm

        Hi James,

        Sorry, I didn’t explain it well. My question is about “Multiple Input Series”. I have data from multiple sites and I want to forecast the precipitation area of each site. These sites have the same features but different time steps. I understood that an LSTM can learn from parallel input series. Can I apply it in this case too? If yes, how would you recommend I start?

        Thank you

  354. Pranav, August 30, 2022 at 9:34 pm

    Hi Jason, thank you so much for the tutorial. I have one question.

    I have 2 series of x,y coordinates

    s1 = [[x1,y1],[x2,y2],[x3,y3],[x4,y4],[x5,y5],[x6,y6],[x7,y7],[x8,y8]]

    s2 = [[a1,b1],[a2,b2],[a3,b3],[a4,b4],[a5,b5],[a6,b6],[a7,b7],[a8,b8]]

    I need to send both of them as inputs to an LSTM. What would you suggest I do? Multiple input series with more than one value in each instance?
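One straightforward option, sketched with NumPy: treat each coordinate as a feature and stack the two series side by side, so each time step becomes a vector [x, y, a, b] and the input has 4 features. Numeric placeholders stand in for the coordinates, and the window length of 3 and next-step targets are arbitrary choices for illustration.

```python
import numpy as np

# Placeholder coordinates for 8 time steps (stand-ins for the real points)
s1 = np.array([[i, 2 * i] for i in range(8)], dtype=float)        # [x, y]
s2 = np.array([[10 + i, 20 + i] for i in range(8)], dtype=float)  # [a, b]

# One row per time step: [x, y, a, b]
combined = np.hstack((s1, s2))

n_steps = 3
X, y = [], []
for i in range(len(combined) - n_steps):
    X.append(combined[i:i + n_steps])  # 3 past steps, 4 features each
    y.append(combined[i + n_steps])    # the next step's [x, y, a, b]
X, y = np.array(X), np.array(y)
print(combined.shape, X.shape, y.shape)  # (8, 4) (5, 3, 4) (5, 4)
```

This matches the layout of the tutorial's multiple parallel series example: an LSTM with an input shape of (3, 4) and a Dense(4) output layer would fit these arrays.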

  355. Inam, September 4, 2022 at 11:14 pm

    import numpy as np
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense, LSTM

    # Our input data X
    X = q_cqi
    X = X.reshape(1, -1)[0]
    X.shape

    # Creating a window of 10
    window_size = 10
    X_train = []
    y = []
    inc = 0
    for i in range(len(X) - window_size):
        if inc + window_size + 2 > len(X):
            break
        row = [[a] for a in X[inc:inc + window_size]]
        X_train.append(row)
        idx = inc + window_size + 1
        y.append(X[idx])
        inc += 1
    X = X_train

    # Converting the lists back into arrays
    X = np.array(X)
    y = np.array(y)

    # Splitting data into train, test and validation
    X_train, y_train = X[:25000], y[:25000]
    X_val, y_val = X[25000:27200], y[25000:27200]
    X_test, y_test = X[27200:], y[27200:]

    n_steps = 10
    n_features = 1

    # define model
    model = Sequential()
    model.add(LSTM(128, return_sequences=True, activation='linear', input_shape=(n_steps, n_features)))
    model.add(LSTM(64, activation='linear'))
    model.add(Dense(32, 'linear'))
    model.add(Dense(16, 'linear'))
    model.add(Dense(1))

    # Compiling the model
    #model.compile(loss=MeanAbsoluteError(), optimizer='Adam', metrics=[RootMeanSquaredError()])
    model.summary()

    So above are my input data and my LSTM model. Now I am confused about how to generate new data. What I mean is, when I create the new vector q_cqi again like this

    # Our Input data X
    X = q_cqi
    X = X.reshape(1, -1)[0]
    what would be the next step? How can I reshape it, and how should I change the following section (i.e. creating the window, etc.)? Do I need the target value y in this new data? Also, suppose I want to choose data from this input vector of length 35000: if I want to make predictions on the last 1500 or the first 1000 values, how could I do this?

    window_size = 10
    X_train = []
    y = []
    inc = 0
    for i in range(len(X) - window_size):
        if inc + window_size + 2 > len(X):
            break
        row = [[a] for a in X[inc:inc + window_size]]
        X_train.append(row)
        idx = inc + window_size + 1
        y.append(X[idx])
        inc += 1
    X = X_train

    Do I need the target value y? How can I choose the new input? Could you please explain how I can generate the new data and how to apply my trained model to it?
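For prediction on new data, no target values are needed: the model only needs input windows built exactly as in training (same window_size, same [samples, timesteps, features] layout, and any scaling applied the same way). A NumPy sketch, assuming new_data is a 1D array like q_cqi (random values stand in for it here) and that model is the trained network from the code above; make_windows is a hypothetical helper. Selecting the first 1000 or last 1500 values is plain slicing before windowing. Note that in the training loop above the target is X[inc + window_size + 1], i.e. two steps after each window's last value, so predictions correspond to that offset.

```python
import numpy as np

def make_windows(series, window_size=10):
    """All complete windows of `window_size` values, shaped [samples, timesteps, 1]."""
    series = np.asarray(series, dtype=float).ravel()
    n_windows = len(series) - window_size + 1
    windows = np.array([series[i:i + window_size] for i in range(n_windows)])
    return windows.reshape((n_windows, window_size, 1))

new_data = np.random.rand(35000)  # stand-in for a newly generated q_cqi

last_part = make_windows(new_data[-1500:])   # windows from the last 1500 values
first_part = make_windows(new_data[:1000])   # windows from the first 1000 values
print(last_part.shape, first_part.shape)  # (1491, 10, 1) (991, 10, 1)

# With the trained model (not defined here):
# predictions = model.predict(last_part)   # one value per window
```

If actual values for the new period become available later, they can be compared with these predictions, but they are not needed to make them.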

  356. Inam, September 5, 2022 at 8:12 am

    Hi James! Thank you for your great posts. I am working on a project. It is a regression problem, and I am using an LSTM model to predict the next value. I trained my LSTM model and tested and validated it on the same data. Now I want to generate new data like the previous data, but I am confused about whether this new data will have the target value or not. Also, how can I reshape it to use it with my trained LSTM model? Below are my LSTM model and input data; my input vector has around 35000 values.

    import numpy as np
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense, LSTM

    # Our input data X
    X = q_cqi
    X = X.reshape(1, -1)[0]
    X.shape

    # Creating a window of 10
    window_size = 10
    X_train = []
    y = []
    inc = 0
    for i in range(len(X) - window_size):
        if inc + window_size + 2 > len(X):
            break
        row = [[a] for a in X[inc:inc + window_size]]
        X_train.append(row)
        idx = inc + window_size + 1
        y.append(X[idx])
        inc += 1
    X = X_train

    # Converting the lists back into arrays
    X = np.array(X)
    y = np.array(y)

    # Splitting data into train, test and validation
    X_train, y_train = X[:25000], y[:25000]
    X_val, y_val = X[25000:27200], y[25000:27200]
    X_test, y_test = X[27200:], y[27200:]

    n_steps = 10
    n_features = 1

    # define model
    model = Sequential()
    model.add(LSTM(128, return_sequences=True, activation='linear', input_shape=(n_steps, n_features)))
    model.add(LSTM(64, activation='linear'))
    model.add(Dense(32, 'linear'))
    model.add(Dense(16, 'linear'))
    model.add(Dense(1))

    # Compiling the model
    #model.compile(loss=MeanAbsoluteError(), optimizer='Adam', metrics=[RootMeanSquaredError()])
    model.summary()

    Thanks in advance

  357. Inam, September 5, 2022 at 11:26 pm

    Thank you James!

  358. Francis Tucket, October 5, 2022 at 2:23 am

    Hello, I’ve been able to create an LSTM model for my fourth-year project, which is about forex price movement forecasting, but the problem comes when I want to implement it in real time. I trained the model on 30-minute data, so the idea was to turn the model into an API that takes around 10-20 closing prices of a particular forex pair, e.g. GBP/USD, has the model predict at least 2 hours into the future (i.e. four 30-minute periods), and then returns that prediction. Thank you in advance for your help.

    • James Carmichael, October 5, 2022 at 7:28 am

      Hi Francis…While we cannot recommend any particular model for your project, it would be helpful if you could elaborate on a specific question regarding our content so that we may better assist you.

  359. Arun, October 13, 2022 at 10:05 pm

    Which model is best for time series prediction, such as stock price prediction?

  360. Inam, October 14, 2022 at 9:41 pm

    Hello James! I hope you are well. Thanks for your great posts.
    I am trying to plot a performance evaluation of 2 methods (the LSTM and the Ideal)
    to compare them. I also want to plot [e_DRNN1, thr_DRNN1],
    i.e. bit error rate against achieved throughput. How could I do this? Below is my code with
    the respective output for each method.

    #Method LSTM
    [e_DRNN1,thr_DRNN1]=e_short_pkts(p.L_pkt,gamma_real,gamma_DRNN1,p)
    e_DRNN1,thr_DRNN1
    (array([[0.00000000e+00, 9.83990470e-01, 4.78419178e-07, …,
    0.00000000e+00, 1.62437153e-03, 4.77800111e-02]]),
    array([[2.20861316, 0.05908644, 3.07646398, …, 3.5582583 , 4.1410422 ,
    4.0731042 ]]))

    #Method Ideal
    [e_ideal,thr_ideal]=e_short_pkts(p.L_pkt,gamma_real,gamma_ideal,p)
    e_ideal,thr_ideal

    (array([[0. , 0.98399047, 0.97368655, …, 0.08990624, 0.15850721,
    0.12215858]]),
    array([[2.20861316, 0.05908644, 0.09711517, …, 3.89260928, 3.63755763,
    3.79468346]]))

    Thank you

  361. Anwar Ali, November 18, 2022 at 1:46 am

    Awesome tutorial!

    • James Carmichael, November 18, 2022 at 6:02 am

      Thank you, Anwar, for your feedback! We appreciate it!

  362. Avi Ofek, November 21, 2022 at 10:02 pm

    Thank you very much for making it easy to understand, James.
    As a beginner, I tried to get one output from 5 random sets of numbers, letting the model learn by itself.
    How can I get a single output from the 5 sets of input, please?
    Thank you very much anyway,
    Avi Ofek

  363. Sagar Padhiyar, December 28, 2022 at 4:35 am

    Hello Jason,

    Thank you for this blog. It is helpful as always.

    I have one question. How do I prepare data for future prediction? Let's say I want to forecast energy consumption for the next 3 years at an hourly resolution. For training, we have the date and the hourly energy consumption. How do I prepare the data for the forecast period, where I only have dates?

    Thank you
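    When only future dates are known, the lagged input values for the forecast period do not exist yet. One option, sketched below under the assumption of a model trained on one-step-ahead windows, is recursive forecasting: predict one step, append the prediction to the input window, and repeat. `recursive_forecast` and the averaging stand-in model are hypothetical; errors accumulate with every step, so a 3-year hourly horizon (about 26,000 steps) is likely to drift. Calendar features such as hour of day or weekday can still be computed directly from the future dates.

```python
import numpy as np


def recursive_forecast(one_step_model, history, n_steps_in, horizon):
    """Forecast `horizon` future values by feeding predictions back in as inputs.

    `one_step_model` is any callable that maps an array of shape
    [1, n_steps_in, 1] to the next value (e.g. a wrapper around model.predict).
    """
    window = [float(v) for v in history[-n_steps_in:]]  # last observed values
    forecasts = []
    for _ in range(horizon):
        x = np.array(window).reshape(1, n_steps_in, 1)
        yhat = np.asarray(one_step_model(x)).item()  # works for scalars or (1, 1) arrays
        forecasts.append(yhat)
        window = window[1:] + [yhat]  # slide the window: drop oldest, add the prediction
    return np.array(forecasts)


# Stand-in "model" for illustration only: predicts the mean of its input window
def mean_model(x):
    return x.mean()


hourly_load = np.array([10.0, 12.0, 14.0, 16.0])
future = recursive_forecast(mean_model, hourly_load, n_steps_in=3, horizon=2)
```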

  364. mayan, January 16, 2023 at 7:27 pm

    Hi,
    Thanks for the tutorial. For the univariate series, is there a reason to use ConvLSTM2D rather than ConvLSTM1D?

  365. mayan, January 16, 2023 at 11:39 pm

    Hi,
    I did not really understand why it was necessary to use subsequences instead of whole sequences in the CNN-LSTM model. Could you please explain that in more detail?
    Thanks
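    In the CNN-LSTM, the CNN layers are wrapped in `TimeDistributed`, so the CNN is applied to each subsequence separately and the LSTM then reads the sequence of per-subsequence feature vectors. That is why each input window is split into subsequences first: the CNN extracts local features within each short piece, and the LSTM models how those features change from piece to piece. A minimal sketch of the reshape, assuming 4-step windows split into 2 subsequences of 2 steps each:

```python
import numpy as np

# Three univariate input windows of 4 time steps each (made-up values)
X = np.array([[10, 20, 30, 40],
              [20, 30, 40, 50],
              [30, 40, 50, 60]], dtype=float)

n_seq, n_steps, n_features = 2, 2, 1  # 4 time steps = 2 subsequences x 2 steps
X_sub = X.reshape((X.shape[0], n_seq, n_steps, n_features))

# For the first window, the CNN (inside TimeDistributed) sees [10, 20] and then
# [30, 40] as two separate inputs; the LSTM reads the two resulting feature
# vectors in order.
first_window_subsequences = X_sub[0, :, :, 0]
```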

  366. mayan, January 16, 2023 at 11:41 pm

    Hi again,

    In the ConvLSTM, could we have used ConvLSTM1D instead of ConvLSTM2D?
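    The tutorial treats each subsequence as a one-row image for `ConvLSTM2D`, which expects 5D input [samples, time, rows, columns, channels]. Newer Keras releases also include `ConvLSTM1D`, which takes 4D input [samples, time, rows, channels]; it was likely not available when the tutorial was written. The sketch below only shows how the same windows would be reshaped for each layer (2 subsequences of 2 steps, 1 feature); it does not build either model.

```python
import numpy as np

X = np.arange(12, dtype=float).reshape(3, 4)  # 3 made-up windows of 4 time steps
n_seq, n_steps, n_features = 2, 2, 1

# ConvLSTM2D: 5D input [samples, time, rows, columns, channels];
# each subsequence is treated as an image with a single row.
X_2d = X.reshape((X.shape[0], n_seq, 1, n_steps, n_features))

# ConvLSTM1D: 4D input [samples, time, rows, channels]; no dummy row axis needed.
X_1d = X.reshape((X.shape[0], n_seq, n_steps, n_features))
```

    Both arrays hold exactly the same numbers; the 2D version simply carries an extra axis of length 1.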

  367. Guantan, January 19, 2023 at 3:12 am

    Hi all, I am trying to solve a similar problem and wonder if you can help.

    I have panel data on 200 different stocks; each stock belongs to one of 12 sectors, one-hot encoded (1-12). For each stock there are 8 different pieces of price information, such as price, market capitalisation, volume, and so forth. I also have a column of future stock prices on which to train the model.

    Would this mean I need to train 200 different models? How would you go about this problem if you were given this dataset?

    Sorry if this is a daft question. I am new to ML.
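    A common alternative to 200 separate models is a single global model trained on windows pooled from all stocks, with time-invariant information such as the one-hot sector repeated at every time step so the model can learn sector effects. The sketch below is a hypothetical illustration with random stand-in data; it assumes the target is the last of the 8 dynamic columns, which you would adapt if your future-price column is separate.

```python
import numpy as np


def windows_with_static(series, static, n_steps):
    """Build windows of shape [n_steps, n_dynamic + n_static] for ONE stock.

    series: [time, n_dynamic] time-varying features; the last column is the target
    static: 1-D array of time-invariant features (e.g. a one-hot sector code)
    """
    X, y = [], []
    for end in range(n_steps, len(series)):
        dynamic = series[end - n_steps:end]          # the past n_steps rows
        tiled = np.tile(static, (n_steps, 1))        # repeat the static info at every step
        X.append(np.hstack([dynamic, tiled]))
        y.append(series[end, -1])                    # next value of the target column
    return np.array(X), np.array(y)


n_sectors = 12
rng = np.random.default_rng(0)
X_parts, y_parts = [], []
for stock_id in range(3):                            # e.g. 3 of the 200 stocks
    prices = rng.normal(size=(30, 8))                # 30 days x 8 price features (made up)
    sector = np.eye(n_sectors)[stock_id % n_sectors] # hypothetical one-hot sector
    X_s, y_s = windows_with_static(prices, sector, n_steps=5)
    X_parts.append(X_s)
    y_parts.append(y_s)

# Windows never cross from one stock to another, but all stocks feed ONE model.
X_all, y_all = np.concatenate(X_parts), np.concatenate(y_parts)
```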

  368. Arnold, January 21, 2023 at 1:21 pm

    Hi Jason, massive fan of your work throughout the years.
    Keeping it short as I assume you have hundreds of messages a day!

    Suppose one has a dataset on 400 patients' health through time.
    X variables are: patient ID, age group (binary, i.e., Old = 1 and Young = 2), distance walked during the day, and number of calories eaten that day.
    Y variable to be predicted is: number of non-fatal heart attacks.

    My idea was that one could run 400 different LSTM time series models, one for each individual, to predict the number of non-fatal heart attacks.

    My question: these models would gain no information from each other. Is there a way you know of to link this information?

    For example, if one were to train a model on an OLD patient, is there any way the model can learn that OLD patients have tended to have more non-fatal heart attacks in the other regressions, so that it predicts more non-fatal heart attacks for this old patient?

    Maybe I am thinking about it the wrong way; please help!

  369. frr, June 11, 2023 at 3:04 pm

    Hi, is there a "multi parallel & multi inputs (features)" LSTM model? Thanks!
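    The multiple parallel series setup can be combined with extra input features: the input window can contain every column, while the output holds only the parallel series to forecast. A sketch of the data preparation under these assumptions (two target series, two made-up extra features, one-step-ahead output); `split_parallel` is a hypothetical helper, and in Keras the output layer would then have one unit per target series.

```python
import numpy as np


def split_parallel(inputs, targets, n_steps):
    """Pair n_steps rows of ALL input columns with the next row of the targets.

    inputs: [time, n_inputs]   every feature the model may look at
    targets: [time, n_targets] the parallel series to forecast one step ahead
    """
    X, y = [], []
    for end in range(n_steps, len(inputs)):
        X.append(inputs[end - n_steps:end])
        y.append(targets[end])
    return np.array(X), np.array(y)


t = np.arange(10, dtype=float)
targets = np.column_stack([t * 10, t * 10 + 5])   # two parallel series to predict
extra = np.column_stack([t % 3, np.sqrt(t)])      # two extra (made-up) input features
inputs = np.hstack([targets, extra])              # the model sees the history of all 4 columns
X, y = split_parallel(inputs, targets, n_steps=3)
# X: [samples, 3, 4]; y: [samples, 2], so the output layer would have 2 units
```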

  370. Iman, June 29, 2023 at 6:25 am

    Hi, I searched a lot and even used ChatGPT, but I'm still confused. I have a company dataset and need to build a customer churn model using an LSTM. I have the behavior of each customer (identified by ID) over 12 months, and I know the churn label for each month; for example, for ID 1445 I know the label in the first month, the second month, and so on. The dataset has features such as monthly_visit, customer age, sim_type, contract_type, and so on. How can I define the LSTM input and output? For example, I want to predict churn for customer 1445 in month 12 based on months 11, 10, 9, and 8, then predict month 11 based on months 10, 9, 8, and 7, and so on, and then move on to the next customer and do the same. How can I use an LSTM for this problem? Sorry for the long explanation.

    • James Carmichael, June 29, 2023 at 8:50 am

      Hi Iman… Please narrow your query to a single question so that we may better assist you.

  371. Iman, June 29, 2023 at 7:10 pm

    Sorry… Is it possible to predict customer churn using an LSTM when you have the monthly behavior of customers? I mean, what are the X (input) and y (output) for the LSTM?

  372. Iman, June 29, 2023 at 7:14 pm

    Is it possible to use an LSTM for customer churn prediction when you have the monthly behavior of each customer and the churn label for each month? I mean, what should the X (input) and y (output) be for the LSTM?

  373. Mani, June 29, 2023 at 7:23 pm

    Is it possible to use an LSTM for customer churn prediction? What are the X and y for the LSTM model? Note that I have the monthly behavior of each customer over 12 months.

  374. David, June 30, 2023 at 8:20 pm

    Hi, I have a dataset representing the monthly behavior of customers, with 1 million rows and 8 columns; every 12 rows of the dataset belong to one customer, and I want to build a churn model for these customers using an LSTM. How should I build the input and output for my LSTM model from this monthly behavior dataset?
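    For the churn questions above, one possible framing is: X holds a customer's behaviour over the previous few months and y is that customer's churn label in the following month, with windows never crossing from one customer to the next. A sketch assuming rows are sorted by customer and then by month; `customer_windows` and the data are hypothetical. Static attributes such as age or contract type can simply be kept as extra feature columns.

```python
import numpy as np


def customer_windows(ids, features, labels, n_months):
    """Predict churn in month t from months t-n_months .. t-1 of the SAME customer.

    ids: [rows] customer id per row (rows sorted by customer, then by month)
    features: [rows, n_features] monthly behaviour
    labels: [rows] churn flag (0/1) for that month
    """
    X, y = [], []
    for cid in np.unique(ids):
        rows = np.flatnonzero(ids == cid)          # this customer's months, in order
        f, l = features[rows], labels[rows]
        for t in range(n_months, len(rows)):
            X.append(f[t - n_months:t])
            y.append(l[t])
    return np.array(X), np.array(y)


# Two hypothetical customers with 12 monthly rows each and 3 features
ids = np.repeat([1445, 1446], 12)
features = np.arange(24 * 3, dtype=float).reshape(24, 3)
labels = np.zeros(24)
labels[11] = 1                                      # customer 1445 churns in month 12
X, y = customer_windows(ids, features, labels, n_months=4)
# X: [samples, 4, 3]; a model ending in Dense(1, activation='sigmoid') trained
# with binary cross-entropy would treat this as sequence classification.
```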

  375. Justin Goh, October 8, 2023 at 12:32 pm

    Hi Jason,

    I appreciate your guide to LSTM time series models. It is really helpful.
    I followed your steps to build my own time series LSTM model but ran into a question.

    At stage 1, I had multivariate single-step forecasting (a simple LSTM model with 3 dense layers).
    At stage 2, I converted it to multivariate multi-step forecasting using an Encoder-Decoder model.
    But in doing so, the complexity of my dense layers dropped, which I didn't want.
    Can you suggest how to maintain the complexity of the dense layers while using an Encoder-Decoder model?

    Please see my code and model summaries below.

    At stage 1(Simple LSTM model)

    model = tf.keras.models.Sequential([
        tf.keras.layers.Input(shape=(window_size, n_character)),
        tf.keras.layers.LSTM(100, return_sequences=True),
        tf.keras.layers.LSTM(100),
        tf.keras.layers.Dense(100, activation="relu"),
        tf.keras.layers.Dense(100, activation="relu"),
        tf.keras.layers.Dense(n_outPut_charactor)
    ])

    model.summary()
    Model: "sequential"
    ________________________________________________________________
    Layer (type) Output Shape Param #
    =================================================================
    lstm (LSTM) (None, 20, 100) 59600

    lstm_1 (LSTM) (None, 100) 80400

    dense (Dense) (None, 100) 10100

    dense_1 (Dense) (None, 100) 10100

    dense_2 (Dense) (None, 44) 4444

    =================================================================
    Total params: 164644 (643.14 KB)
    Trainable params: 164644 (643.14 KB)

    At stage 2 (Encoder- Decoder model)

    model = tf.keras.models.Sequential([
        tf.keras.layers.Input(shape=(window_size, n_character)),
        tf.keras.layers.LSTM(100),
        tf.keras.layers.RepeatVector(n_step_out),
        tf.keras.layers.LSTM(100, return_sequences=True),
        tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(n_outPut_charactor, activation='relu')),
    ])
    Model: "sequential"
    _________________________________________________________________
    Layer (type) Output Shape Param #
    =================================================================
    lstm (LSTM) (None, 100) 59600

    repeat_vector (RepeatVecto (None, 3, 100) 0
    r)

    lstm_1 (LSTM) (None, 3, 100) 80400

    time_distributed (TimeDist (None, 3, 44) 4444
    ributed)

    =================================================================
    Total params: 144444 (564.23 KB)
    Trainable params: 144444 (564.23 KB)
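    One way to keep dense capacity in the encoder-decoder model is to add `TimeDistributed(Dense(100, activation='relu'))` layers between the decoder LSTM and the output layer. Wrapping a `Dense` layer in `TimeDistributed` reuses the same weights at every output step, so each such layer adds the same number of parameters as the equivalent plain `Dense` layer. The arithmetic below uses the standard LSTM and Dense parameter formulas, with an input width of 48 inferred from the printed summaries; it reproduces both totals and shows that two such layers would restore the stage-1 count. Separately, note that the stage-2 output layer uses `activation='relu'` whereas the stage-1 output layer is linear, so the two models cannot produce the same range of outputs.

```python
def lstm_params(units, input_dim):
    # 4 gates, each with input weights, recurrent weights and a bias vector
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    # weight matrix + bias; TimeDistributed(Dense) shares these across time steps
    return input_dim * units + units


n_in, n_out = 48, 44  # inferred: lstm_params(100, 48) == 59600, dense_params(100, 44) == 4444

stage1 = (lstm_params(100, n_in) + lstm_params(100, 100)
          + dense_params(100, 100) + dense_params(100, 100)
          + dense_params(100, n_out))
stage2 = lstm_params(100, n_in) + lstm_params(100, 100) + dense_params(100, n_out)  # RepeatVector adds 0

# Encoder-decoder with two extra TimeDistributed(Dense(100)) layers before the output
stage2_deeper = stage2 + 2 * dense_params(100, 100)

print(stage1, stage2, stage2_deeper)  # 164644 144444 164644
```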

  376. Ahmad, December 26, 2023 at 4:11 pm

    Hi James,

    I aim to develop an ML predictive (forecasting) model to predict the next failure time.

    I have the following data:

    - Failure date (dd/mm/yy)
    - Failure time (11:00 am)
    - Recovery date (dd/mm/yy)
    - Recovery time (11:30 am)
    - Operational delay (30 min)
    - Age of equipment
    - Number of failures last time

    Q:

    1. Can you suggest models to be used for prediction?
    2. Is there an example of this type of prediction?
    3. How should I preprocess the date and time data?

    Regards,
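    For question 3, a common first step is to combine each date with its time into a single timestamp, then derive quantities such as downtime and the time between consecutive failures (a typical target when predicting the next failure), plus simple calendar features. A minimal sketch with the standard library, assuming dd/mm/yy dates and 12-hour times with AM/PM; the records are made up.

```python
from datetime import datetime

# Hypothetical records: (failure date, failure time, recovery date, recovery time)
records = [
    ("03/01/23", "11:00 AM", "03/01/23", "11:30 AM"),
    ("10/01/23", "09:15 PM", "10/01/23", "10:00 PM"),
    ("24/01/23", "11:59 PM", "25/01/23", "12:40 AM"),
]

FMT = "%d/%m/%y %I:%M %p"  # dd/mm/yy plus a 12-hour clock with AM/PM


def parse(date_str, time_str):
    return datetime.strptime(f"{date_str} {time_str}", FMT)


failures = [parse(fd, ft) for fd, ft, _, _ in records]
recoveries = [parse(rd, rt) for _, _, rd, rt in records]

# Downtime per failure (minutes) and time between consecutive failures (hours)
downtime_min = [(r - f).total_seconds() / 60 for f, r in zip(failures, recoveries)]
tbf_hours = [(b - a).total_seconds() / 3600 for a, b in zip(failures, failures[1:])]

# Simple calendar features a model can use directly: hour, weekday, month
features = [(f.hour, f.weekday(), f.month) for f in failures]
```

    Note that the third failure's recovery falls on the next day, which combining date and time handles automatically.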

  377. Julia, December 29, 2023 at 11:36 pm

    Thank you, it seems that you explained the LSTM model implementation quite well, but I cannot run your code. Why is there no indented block in the for loops and if statements?

    • James Carmichael, December 30, 2023 at 9:30 am

      Hi Julia… Did you type the code or copy and paste it? There could be formatting issues resulting from the way the code was entered into your Python environment.

      • Julia, December 31, 2023 at 7:04 am

        Hi James, thank you for your answer. I found a way to copy the code properly by clicking "toggle plain code".

  378. Bassel, January 16, 2024 at 1:00 pm

    Thank you Jason for the great resource. I have a question: I am trying to train an LSTM autoencoder on a multivariate time series to detect anomalies using reconstruction error. I want to train the model on the normal operating mode, and I have 2 years of data. A fault occurs 4 months into the time series, so I have normal-operation data before the fault and more normal data after it. How can I use these two sub-series, before and after the fault, to train the model? As far as I know, a time series should have a consistent time interval without gaps. What do you suggest? I was considering adding time features to the existing inputs, explicitly feeding the model time information. Another option might be to train the model on the first sub-series before the fault and then update it with the second sub-series after the fault, but I am not sure this is possible.
    Thank you again for your time.
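    One commonly used option is to build training windows inside each normal segment separately, so that no window straddles the fault or the gap, and then pool the windows from both segments into one training set for the autoencoder. A sketch with random stand-in data (4 variables, 10-step windows); `windows_per_segment` is a hypothetical helper.

```python
import numpy as np


def windows_per_segment(segments, n_steps):
    """Make autoencoder training windows inside each contiguous segment only."""
    parts = []
    for seg in segments:
        for start in range(len(seg) - n_steps + 1):
            parts.append(seg[start:start + n_steps])
    return np.array(parts)


rng = np.random.default_rng(1)
before_fault = rng.normal(size=(100, 4))   # normal operation before the fault (made up)
after_fault = rng.normal(size=(80, 4))     # normal operation after the fault (made up)
X = windows_per_segment([before_fault, after_fault], n_steps=10)
# X: [samples, 10, 4]; no window spans the fault/gap, so both periods can be
# shuffled together and used to train one LSTM autoencoder (input = target).
```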

  379. martin, February 1, 2024 at 3:18 pm

    Hello Dr. Brownlee,

    Thank you for putting this together! It really helped me understand the operations behind LSTMs.

    I have a couple of questions, if you can help:
    1. In the vanilla/stacked/etc. LSTM examples you use "model.add(LSTM(50," ... why 50? The Keras LSTM docs describe this argument as "units: Positive integer, dimensionality of the output space.", which makes me think we should use n_steps or n_features, but when I tried running it with either of those two options, the result was nowhere near what it should be.
    2. In the Multiple Input Series example, shouldn't the "Output" be 85 and not 65, since 85 is the output at the next time step in the data series? Similarly, for 10, 20, 30 the output was 40.

    • James Carmichael, February 2, 2024 at 10:36 am

      Hi Martin…

      Determining the input and output parameters of Long Short-Term Memory (LSTM) models is crucial for designing neural networks that can effectively process sequence data (e.g., time series, natural language text). LSTM models are a type of recurrent neural network (RNN) capable of learning long-term dependencies in data, making them suitable for tasks like language modeling, time series forecasting, and more.

      ### Input Parameters

      1. **Input Shape:**
      – The input shape to an LSTM layer is typically (batch_size, time_steps, features):
      – **batch_size**: How many sequences you’re passing through the network at once. It can be left unspecified (None) during model definition for flexibility.
      – **time_steps**: The length of the sequence, i.e., how many time steps or elements are in each sequence.
      – **features**: The number of features in each time step. For instance, in text processing, it could be the size of the word embedding vector; in time series, the number of variables at each time step.

      2. **Timesteps and Feature Selection:**
      – Based on the problem, decide how many past observations (time steps) your model should consider for predicting the future value or next sequence element. This will define your window size or the sequence length.
      – The features depend on the data available and the nature of the problem. For instance, in a stock price forecasting problem, features could include past prices, volume, and other technical indicators.

      ### Output Parameters

      1. **Output Shape:**
      – The output of an LSTM can be tailored based on the task:
      – **Many-to-One**: For tasks like sentiment analysis, where the entire sequence maps to a single label. The output shape would be (batch_size, units), where units refers to the number of LSTM units (neurons).
      – **Many-to-Many**: For tasks like machine translation or sequence generation, where each input time step corresponds to an output time step. This can be achieved by setting return_sequences=True in LSTM layers, resulting in an output shape of (batch_size, time_steps, units).
      – **Custom**: Using techniques like sequence-to-sequence models, where an encoder LSTM’s output is used as an input to a decoder LSTM, allowing for flexible input-output configurations.

      2. **Number of Units:**
      – This parameter defines the dimensionality of the output space of the LSTM layer, i.e., the size of the hidden state vector the layer outputs at each time step. It is independent of the number of time steps and input features, and it is a crucial parameter to tune based on the complexity of the task and the amount of data available.

      ### Design Considerations

      – **Sequence Padding:** If your input sequences have variable lengths, you’ll need to pad them to ensure they have the same length for batch processing.
      – **Batch Size:** The choice of batch size can affect training dynamics and performance. Smaller batches might lead to faster convergence but can be noisier. Larger batches provide more stable but potentially slower convergence.
      – **Statefulness:** Decide whether your LSTM model should remember its state (hidden states) across batches. Stateful LSTMs can be beneficial for time series data where the sequence continuity across batches is important.

      ### Practical Steps

      1. **Preprocessing**:
      – Normalize/standardize your input data.
      – Convert text data into numerical form (e.g., embeddings for NLP tasks).
      – Ensure sequences have a fixed length (padding/truncating where necessary).

      2. **Model Definition**:
      – Choose the appropriate architecture (e.g., stacked LSTMs, bidirectional LSTMs) based on your problem.
      – Experiment with different numbers of units, batch sizes, and sequence lengths.

      3. **Training**:
      – Use a validation set to monitor performance and avoid overfitting.
      – Adjust learning rate, optimization algorithm, and other hyperparameters as needed.

      Determining the optimal input and output parameters for LSTM models often requires experimentation and is guided by the specific requirements and constraints of your application.
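      On the second question (65 versus 85): in the multiple input series example, the output series is computed at the same time step as the inputs (`out_seq = in_seq1 + in_seq2`), and the tutorial's `split_sequences` takes y from the last row of each input window (`end_ix - 1`), so `[[10, 15], [20, 25], [30, 35]]` maps to 30 + 35 = 65. Predicting 85 would be a different, next-step framing. The sketch below reproduces both; the `next_step` flag is an addition for illustration, not part of the tutorial.

```python
import numpy as np

in_seq1 = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90])
in_seq2 = np.array([15, 25, 35, 45, 55, 65, 75, 85, 95])
out_seq = in_seq1 + in_seq2
dataset = np.column_stack([in_seq1, in_seq2, out_seq])


def split_sequences(sequences, n_steps, next_step=False):
    X, y = [], []
    for i in range(len(sequences)):
        end_ix = i + n_steps
        # Row that supplies the target: last input row, or the row after it
        last = end_ix if next_step else end_ix - 1
        if last >= len(sequences):
            break
        X.append(sequences[i:end_ix, :-1])
        y.append(sequences[last, -1])
    return np.array(X), np.array(y)


X, y = split_sequences(dataset, 3)                    # output aligned with the last input step
X2, y2 = split_sequences(dataset, 3, next_step=True)  # output one step ahead
print(X[0].tolist(), y[0], y2[0])
```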

  380. Mesabo Mesman, February 2, 2024 at 5:02 pm

    With your tutorials, it took me only a week to learn the LSTM knowledge necessary for working on a real-world problem. Thank you so much!

    • James Carmichael, February 3, 2024 at 9:45 am

      Hi Mesabo… You are very welcome! Thank you for sharing your success!

  381. Arsalan, February 20, 2024 at 1:09 am

    Hello,
    I have 21 images (TIFF files), each with 60 bands and each covering one year (2000-2020). One of the bands is the land cover of each pixel.
    I want to forecast land cover change for the next year.
    Which model do you suggest? ConvLSTM?

    • James Carmichael, February 20, 2024 at 7:02 am

      Hi Arsalan… That would be a great model type to start with! Let us know how it goes!
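      As a starting point for a ConvLSTM, the yearly rasters can be stacked into an array of shape [years, rows, cols, bands] and cut into sliding windows of past years, giving input shaped [samples, time, rows, cols, channels], which is what `ConvLSTM2D` expects with channels last. A sketch with tiny random stand-in tiles; the land-cover band index, the 5-year window, and the tile size are assumptions. With 21 years this yields only 16 samples, so splitting the scene into spatial patches may be needed to get enough training data, and a categorical land-cover band would usually be one-hot encoded for a softmax output.

```python
import numpy as np

rng = np.random.default_rng(2)
n_years, rows, cols, bands = 21, 8, 8, 60          # tiny 8x8 tiles for illustration
stack = rng.random((n_years, rows, cols, bands))   # made-up stand-in for the 21 rasters
land_cover_band = 0                                # hypothetical index of the land-cover band
n_in = 5                                           # use 5 past years to predict the next one

X, y = [], []
for end in range(n_in, n_years):
    X.append(stack[end - n_in:end])                # [5, rows, cols, bands]
    y.append(stack[end, :, :, land_cover_band])    # next year's land-cover map
X, y = np.array(X), np.array(y)
# X: [samples, time, rows, cols, channels], the channels-last layout for ConvLSTM2D
```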

  382. Charitini, February 25, 2024 at 11:12 pm

    Hello,

    First, I would like to say that this is an amazing tutorial!

    My question is: in the Multiple Parallel Series example, where we have three input series and three output series (3 features) in a single LSTM network, how is the loss computed? Is it the average of the losses for each of the three parallel series?

    Best!!
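    For Keras' built-in mean squared error, the squared errors are averaged over the last axis (here the three outputs of the final `Dense(3)` layer) for each sample, and the per-sample values are then averaged over the batch by default. So, ignoring sample weights, the loss equals the average of the three per-series MSEs, weighted equally; a series on a larger scale will therefore dominate unless the data is scaled. A small NumPy check of that equivalence, with random numbers standing in for targets and predictions:

```python
import numpy as np

rng = np.random.default_rng(3)
y_true = rng.normal(size=(5, 3))   # 5 samples, 3 parallel series
y_pred = rng.normal(size=(5, 3))

# Keras-style MSE: mean over the last axis (the 3 outputs) per sample,
# then averaged over the samples in the batch
per_sample = np.mean((y_true - y_pred) ** 2, axis=-1)
batch_loss = per_sample.mean()

# The same number as averaging the three per-series MSEs
per_series = np.mean((y_true - y_pred) ** 2, axis=0)
assert np.isclose(batch_loss, per_series.mean())
```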

  383. Kai, April 18, 2024 at 1:52 am

    ModuleNotFoundError Traceback (most recent call last)
    Cell In[32], line 1
    ----> 1 from keras.layers.convolutional import Conv1D
    2 from keras.layers.convolutional import MaxPooling1D

    ModuleNotFoundError: No module named 'keras.layers.convolutional'

    What version of Keras is used above?

  384. Kai, April 18, 2024 at 2:31 am

    from keras.layers import Conv1D
    from keras.layers import MaxPooling1D

    This seems to resolve the above error.

    • James Carmichael, April 18, 2024 at 8:47 am

      Thank you for your feedback, Kai!

  385. Saurabh D, May 27, 2024 at 2:55 pm

    I have 1500 data points in total, which is about 25 minutes of per-second data. I want to use 850-900 data points for training and forecast the rest; ideally, I would use 15 minutes of data to forecast the next 10 minutes. I want to try both univariate and multivariate time series forecasting. First of all, I wanted to know which method will work better given that I have little data.

    • James Carmichael, May 28, 2024 at 1:39 am

      Hi Saurabh… When dealing with time series forecasting, especially with a limited amount of data, the choice between univariate and multivariate methods and the specific approach you take can significantly impact your results. Here are some considerations and recommendations for your scenario:

      ### Univariate vs. Multivariate Forecasting

      **Univariate Forecasting:**
      – **Definition**: Forecasting using only the historical values of the single variable you want to predict.
      – **Advantages**: Simpler models, less data required, easier to interpret.
      – **Disadvantages**: May miss important information from other related variables.

      **Multivariate Forecasting:**
      – **Definition**: Forecasting using multiple variables that may have predictive power for the target variable.
      – **Advantages**: Can capture relationships between variables, potentially leading to more accurate forecasts.
      – **Disadvantages**: More complex models, requires more data to capture these relationships effectively.

      ### Recommendations for Limited Data

      Given that you have only 1500 data points (25 minutes of data at one-second intervals), you are dealing with a relatively small dataset. Here are some recommendations:

      1. **Start with Univariate Models:**
      – Begin with univariate models to establish a baseline performance. Use simple models like ARIMA or even basic LSTM.
      – Advantages: You can quickly see how well the historical data alone predicts future values.

      2. **Experiment with Multivariate Models:**
      – If you have additional variables that are likely to influence the target variable, incorporate them into multivariate models.
      – Use techniques to avoid overfitting, such as regularization and dropout layers in neural networks.

      ### Implementation Steps

      **1. Data Preparation:**
      – Split your data into training (first 850-900 points) and testing (remaining points).
      – Scale your data using MinMaxScaler or StandardScaler.

      **2. Model Building:**

      **Univariate LSTM Example:**

      import numpy as np
      from sklearn.preprocessing import MinMaxScaler
      from keras.models import Sequential
      from keras.layers import LSTM, Dense

      time_steps = 60  # Example: use the past 60 seconds to predict the next value
      n_train = 850    # number of training windows

      # Assuming 'data' is your time series as a 1-D numpy array.
      # Fit the scaler on the training portion only, so test statistics do not leak in.
      series = data.reshape(-1, 1)
      scaler = MinMaxScaler()
      scaler.fit(series[:n_train + time_steps])
      series = scaler.transform(series)

      def create_sequences(data, time_steps=1):
          X, y = [], []
          for i in range(len(data) - time_steps):
              X.append(data[i:(i + time_steps), 0])
              y.append(data[i + time_steps, 0])
          return np.array(X), np.array(y)

      X, y = create_sequences(series, time_steps)
      X_train, y_train = X[:n_train], y[:n_train]
      X_test, y_test = X[n_train:], y[n_train:]

      # Reshape to [samples, time steps, features]
      X_train = X_train.reshape((X_train.shape[0], X_train.shape[1], 1))
      X_test = X_test.reshape((X_test.shape[0], X_test.shape[1], 1))

      model = Sequential()
      model.add(LSTM(50, return_sequences=True, input_shape=(time_steps, 1)))
      model.add(LSTM(50))
      model.add(Dense(1))
      model.compile(optimizer='adam', loss='mean_squared_error')

      model.fit(X_train, y_train, epochs=20, batch_size=32, validation_data=(X_test, y_test))

      **Multivariate LSTM Example:**

      import numpy as np
      from sklearn.preprocessing import MinMaxScaler
      from keras.models import Sequential
      from keras.layers import LSTM, Dense

      time_steps = 60  # Example: use the past 60 seconds to predict the next value
      n_train = 850    # number of training windows

      # Assuming 'data' is a DataFrame with one column per variable and the target in the LAST column
      values = data.values.astype('float32')
      scaler = MinMaxScaler()
      scaler.fit(values[:n_train + time_steps])  # fit on the training portion only
      values = scaler.transform(values)

      def create_sequences(data, time_steps=1):
          X, y = [], []
          for i in range(len(data) - time_steps):
              X.append(data[i:(i + time_steps), :-1])  # past values of the input variables
              y.append(data[i + time_steps, -1])       # next value of the target
          return np.array(X), np.array(y)

      X, y = create_sequences(values, time_steps)
      X_train, y_train = X[:n_train], y[:n_train]
      X_test, y_test = X[n_train:], y[n_train:]

      model = Sequential()
      model.add(LSTM(50, return_sequences=True, input_shape=(time_steps, X_train.shape[2])))
      model.add(LSTM(50))
      model.add(Dense(1))
      model.compile(optimizer='adam', loss='mean_squared_error')

      model.fit(X_train, y_train, epochs=20, batch_size=32, validation_data=(X_test, y_test))

      ### Evaluation

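      – **Cross-Validation**: Use time-ordered validation (e.g., walk-forward or expanding-window splits) rather than shuffled k-fold cross-validation, so the model is always evaluated on data that comes after its training data.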
      – **Error Metrics**: Evaluate using metrics like Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and Mean Absolute Error (MAE).
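      For time series, cross-validation is usually done with time-ordered splits rather than shuffled folds. A minimal sketch of an expanding-window (walk-forward) split, in which every test block comes after all of its training data; `walk_forward_splits` is a hypothetical helper and the sizes are just examples. scikit-learn's `TimeSeriesSplit` implements a similar scheme.

```python
def walk_forward_splits(n_samples, initial_train, test_size):
    """Yield (train_indices, test_indices); training always ends before the test block."""
    end = initial_train
    while end + test_size <= n_samples:
        yield range(0, end), range(end, end + test_size)
        end += test_size  # expand the training window by one test block


for train_idx, test_idx in walk_forward_splits(n_samples=1500, initial_train=900, test_size=150):
    # Fit on train_idx, evaluate (e.g. RMSE/MAE) on test_idx, then average the scores
    print(len(train_idx), test_idx.start, test_idx.stop)
```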

      ### Conclusion

      – **Start with simpler univariate models to establish a baseline.**
      – **Experiment with multivariate models if you have additional relevant variables.**
      – **Use appropriate data scaling and model evaluation techniques to ensure your model’s reliability.**

      By following these steps and recommendations, you should be able to determine which approach works better for your specific time series forecasting problem with limited data.

  386. Virginia, June 1, 2024 at 12:54 am

    Hello, I have a question.

    I have my own data, and I have been trying to implement the multi-step LSTM model (the vector output version). I have been training on a series of real-life data, using n_steps_in = 28, n_steps_out = 4, and a vanilla LSTM with 100 units. I obtain the model, and then I predict the same signal used for training (just for comparison purposes).

    My problem is that when I do this, the output vector I get is almost a copy of the input data, but shifted.

    I noticed this because I fed predict a vector of length 28, got an output of length 4, then shifted one step to the right, input 28 numbers again, got 4 more values, and so on. I then plotted the original signal from Python index (n_steps_in + n_steps_out - 1) onward against the 4th value of each output vector, and the predicted signal is almost the same as the original signal but shifted by 4 samples! If I do the same with n_steps_out = 7, the signal is shifted by 7!

    Do you have any idea what is happening? I haven't been able to figure out what to do. Thank you in advance.

    • James Carmichael, June 1, 2024 at 7:04 am

      Hi Virginia… It sounds like your LSTM model is learning to replicate the input sequence rather than learning the underlying patterns necessary for multi-step prediction. This can happen for several reasons, including insufficient model complexity, inadequate training data, or inappropriate loss function and evaluation metric for your task.

      Here are some potential steps to diagnose and fix this issue:

      ### 1. **Check Your Data Preparation:**

      Ensure that your data preparation for training and prediction is correct. Specifically, verify that your input sequences and output sequences are correctly aligned and that your model is trained on diverse enough data to learn the underlying patterns.

      ### 2. **Review Your Model Architecture:**

      You mentioned using a Vanilla LSTM with 100 cells. Consider experimenting with more complex architectures, such as:

      – Adding more layers.
      – Using a higher number of LSTM units.
      – Implementing a sequence-to-sequence (seq2seq) model architecture.

      ### 3. **Tune Hyperparameters:**

      Hyperparameter tuning can significantly affect model performance. Try different values for:

      – Learning rate
      – Number of epochs
      – Batch size
      – Dropout rates

      ### 4. **Change the Loss Function:**

      Ensure that the loss function used is appropriate for your task. For time series forecasting, mean squared error (MSE) or mean absolute error (MAE) are commonly used.

      ### 5. **Increase Training Data:**

      LSTM models typically require a significant amount of data to capture temporal dependencies. Ensure that you have enough training data and that it’s properly normalized.

      ### 6. **Evaluate with Different Metrics:**

      While you’re evaluating by plotting, also consider using metrics like RMSE, MAE, or R^2 to quantitatively assess the prediction performance.

      ### Example Code for Multi-Step LSTM:

      Here’s an example of a multi-step LSTM model. Ensure that your data preparation and model training are correctly set up.

      import numpy as np
      import matplotlib.pyplot as plt
      from sklearn.preprocessing import MinMaxScaler
      from tensorflow.keras.models import Sequential
      from tensorflow.keras.layers import LSTM, Dense

      # Function to create (input window, output window) pairs
      def create_sequences(data, n_steps_in, n_steps_out):
          X, y = [], []
          for i in range(len(data)):
              end_ix = i + n_steps_in
              out_end_ix = end_ix + n_steps_out
              if out_end_ix > len(data):
                  break
              seq_x, seq_y = data[i:end_ix], data[end_ix:out_end_ix]
              X.append(seq_x)
              y.append(seq_y)
          return np.array(X), np.array(y)

      # Example dataset
      data = np.sin(np.linspace(0, 100, 1000))  # Replace with your dataset

      # Scale the data
      scaler = MinMaxScaler()
      data = scaler.fit_transform(data.reshape(-1, 1)).reshape(-1)

      # Parameters
      n_steps_in, n_steps_out = 28, 4

      # Create sequences
      X, y = create_sequences(data, n_steps_in, n_steps_out)

      # Reshape for LSTM [samples, timesteps, features]
      X = X.reshape((X.shape[0], X.shape[1], 1))

      # Model definition
      model = Sequential()
      model.add(LSTM(100, activation='relu', input_shape=(n_steps_in, 1)))
      model.add(Dense(n_steps_out))
      model.compile(optimizer='adam', loss='mse')

      # Train the model
      model.fit(X, y, epochs=200, verbose=0)

      # Make predictions: one vector of n_steps_out values per input window
      predictions = model.predict(X)

      # Window i forecasts data[i + n_steps_in : i + n_steps_in + n_steps_out], so the
      # last value of each forecast lines up with data[n_steps_in + n_steps_out - 1:].
      plt.plot(data[n_steps_in + n_steps_out - 1:], label='Original Signal')
      plt.plot(predictions[:, -1], label=f'Predicted Signal (step {n_steps_out})')
      plt.legend()
      plt.show()

      ### Tips:

      1. **Data Preparation:**
      – Make sure your sequences are created correctly.
      – Verify the shape and content of your input (X) and output (y) data.

      2. **Model Training:**
      – Train the model for enough epochs but avoid overfitting.
      – Monitor the loss during training and adjust hyperparameters if necessary.

      3. **Prediction and Evaluation:**
      – Ensure the way you predict new data aligns with the way you trained the model.
      – Compare the predicted signal with the original signal correctly, accounting for any shifts.

      By following these steps and tips, you should be able to address the issue of the shifted predictions and improve the performance of your multi-step LSTM model.
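      A prediction that looks like the input shifted by n_steps_out often means the model has learned something close to a persistence forecast, i.e., repeating the last value it saw, which scores well on smooth or random-walk-like series. A simple check is to compare the model's error with that of a naive persistence baseline on the same windows: if the LSTM is not clearly better, it has probably not learned more than persistence, and modelling differences (changes between steps) instead of levels is a common next step. A sketch of the baseline with a made-up signal windowed into 28-step inputs and 4-step outputs:

```python
import numpy as np


def persistence_forecast(X, n_steps_out):
    """Naive multi-step forecast: repeat the last input value n_steps_out times.

    X can be [samples, n_steps_in] or [samples, n_steps_in, 1].
    """
    last = X[:, -1].reshape(-1, 1)
    return np.repeat(last, n_steps_out, axis=1)


def mae(a, b):
    return float(np.mean(np.abs(a - b)))


# Made-up smooth signal
signal = np.sin(np.linspace(0, 20, 400))
n_in, n_out = 28, 4
starts = range(len(signal) - n_in - n_out + 1)
X = np.array([signal[i:i + n_in] for i in starts])
y = np.array([signal[i + n_in:i + n_in + n_out] for i in starts])

baseline_mae = mae(y, persistence_forecast(X, n_out))
# Compare with your model, e.g. mae(y, model.predict(X.reshape(-1, n_in, 1)));
# if the two are close, the LSTM is doing little more than repeating the last value.
```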

  387. Nastaran H, August 22, 2024 at 5:43 pm

    Hello,
    To predict a phenomenon, I selected 7 input variables and downloaded 41 years of ERA5 data in NetCDF (.nc) format.
    I want to use this data as input to an LSTM model.
    I need to convert this data into an Excel file.
    Using Python, I converted the ERA5 data into an Excel file, but the amount of data is too large, and I don't know which data to keep when preparing the Excel file. Please advise. Thanks.

    • James Carmichael, August 23, 2024 at 8:11 am

      Hi Nastaran… When dealing with large datasets like the ERA5 data over 41 years, it's important to strategically reduce and organize the data before feeding it into an LSTM model. Here's how you can approach this:

      ### Steps to Prepare the Data:

      1. **Identify Relevant Variables:**
      – Focus on the 7 variables that are most relevant to the phenomenon you’re trying to predict. These variables should have a strong correlation with your target variable or are theoretically important.

      2. **Spatial and Temporal Aggregation:**
      – **Spatial Aggregation:** If your data is spatially resolved (i.e., it has latitude and longitude dimensions), you might want to average the data over a region of interest or select specific grid points that are most relevant to your study.
      – **Temporal Aggregation:** Depending on the phenomenon, you may not need data at every time step. For example, you could aggregate the data to daily, weekly, or monthly means if the phenomenon occurs over a longer period.

      3. **Dimensionality Reduction:**
      – Use techniques like Principal Component Analysis (PCA) to reduce the number of features while retaining most of the variance in the data. This can help reduce the size of the dataset and focus on the most important patterns.

      4. **Sampling:**
      – If you have an overwhelming amount of data, consider sampling it. You can either use random sampling or select specific time periods that are most relevant to your analysis (e.g., specific seasons, years with notable events).

      5. **Extract Relevant Time Frames:**
      – Depending on the phenomenon, you may not need all 41 years of data. Focus on periods that are most indicative of the phenomenon or where it shows significant variations.

      6. **Reshape Data for LSTM Input:**
      – LSTM models require sequential input. Ensure your data is organized in time steps, where each row corresponds to a specific time step, and the columns represent the variables. You may need to create sequences of input data to feed into the LSTM.

      ### Example Code to Aggregate and Reduce Data:

      import xarray as xr
      import pandas as pd
      from sklearn.decomposition import PCA
      from sklearn.preprocessing import StandardScaler

      # Load the ERA5 data from the NetCDF file
      nc_file = 'path_to_your_nc_file.nc'
      ds = xr.open_dataset(nc_file)

      # Select relevant variables
      variables = ['var1', 'var2', 'var3', 'var4', 'var5', 'var6', 'var7']  # replace with your variables

      # Spatial aggregation: average over the grid so each time step becomes one row
      # (assumes the spatial dimensions are named 'latitude' and 'longitude' and that
      # time is the only remaining dimension; check ds.dims for your file)
      data = ds[variables].mean(dim=['latitude', 'longitude']).to_dataframe()[variables]

      # Temporal aggregation (e.g., monthly means, labelled by the start of each month)
      data_agg = data.resample('MS').mean()

      # PCA is sensitive to scale, so standardize the variables first
      scaled = StandardScaler().fit_transform(data_agg[variables])
      pca = PCA(n_components=7)  # 7 keeps every component; choose fewer for actual reduction
      reduced_data = pca.fit_transform(scaled)

      # Convert to DataFrame and keep the time index
      reduced_df = pd.DataFrame(reduced_data, index=data_agg.index, columns=[f'PC{i+1}' for i in range(reduced_data.shape[1])])

      # Save to Excel (requires openpyxl; a worksheet holds at most 1,048,576 rows)
      reduced_df.to_excel('reduced_ERA5_data.xlsx')

      # Sample or focus on relevant periods if needed
      # sampled_df = reduced_df.loc['2000-01-01':'2010-12-31']  # Example of focusing on 10 years

      ### Key Considerations:
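      – **Model Requirements:** Ensure the final Excel file matches the input format required by your LSTM model. The file is a 2D table (rows are time steps, columns are variables), so it still has to be reshaped into [samples, time steps, features] sequences before training.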
      – **Data Size vs. Quality:** Balancing the amount of data with its quality is crucial. Too much irrelevant data can slow down the training process and lead to overfitting.
      – **Relevance to Prediction:** Focus on data that directly contributes to predicting the phenomenon you’re studying.

      By carefully selecting, aggregating, and possibly reducing the data, you can create a manageable and relevant dataset for your LSTM model. If you need more specific advice on which data to keep or further help with this process, feel free to ask!

  388. muhammad umar bello, August 29, 2024 at 10:06 pm

    Thanks a lot for this wonderful lesson.
    I need some guidance. I am working on a model that forecasts electricity prices up to four days ahead with an LSTM, using historical prices and load/demand data as inputs.
    Please help me, sir.

    • James Carmichael, August 30, 2024 at 8:28 am

      Creating an LSTM model to forecast electricity prices is a great approach, given the time series nature of the data. Here’s a step-by-step guide and some tips to help you with this project:

      ### 1. **Data Preparation**
      – **Historical Data**: Gather historical electricity price data and corresponding load/demand data. Ensure that the data is time-stamped and spans a sufficient period to capture patterns and seasonality.
      – **Feature Engineering**:
      – **Lag Features**: Create lagged versions of your target variable (electricity price) and input features (demand/load) to capture the temporal dependencies.
      – **Time Features**: Add features like hour of the day, day of the week, and month to capture any cyclical patterns in electricity prices.
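      – **Normalization**: Normalize or standardize the data so that all features have similar scales, which is important for LSTM models. Fit the scaler on the training data only and reuse it to transform the validation and test data; fitting it on the full dataset leaks information about the test period into training.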
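      The lag, time, and scaling steps above could look roughly like this in pandas and scikit-learn. This is a sketch on synthetic hourly data. The column names price and load, the lags of 1 and 24 hours, and the choice of MinMaxScaler are assumptions to adapt to your own dataset.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Synthetic hourly data standing in for real price/load records (assumed column names)
idx = pd.date_range('2024-01-01', periods=24 * 14, freq=pd.offsets.Hour())
rng = np.random.default_rng(0)
df = pd.DataFrame({'price': rng.normal(50, 5, len(idx)),
                   'load': rng.normal(1000, 50, len(idx))}, index=idx)

# Lag features: values from 1 hour and 24 hours earlier
for lag in (1, 24):
    df[f'price_lag{lag}'] = df['price'].shift(lag)
    df[f'load_lag{lag}'] = df['load'].shift(lag)

# Time features taken from the DatetimeIndex
df['hour'] = df.index.hour
df['dayofweek'] = df.index.dayofweek
df['month'] = df.index.month

# The first 24 rows have missing lag values; drop them
df = df.dropna()

# Fit the scaler on the first 80% of rows only, then apply it to all rows,
# so that statistics from later periods do not leak into training
n_train = int(len(df) * 0.8)
scaler = MinMaxScaler()
scaler.fit(df.iloc[:n_train])
scaled = scaler.transform(df)
print(df.shape, scaled.shape)
```

      In practice you may also want a separate scaler fitted on the price column alone, so that forecasts can be mapped back to price units with its inverse_transform method.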

      ### 2. **Data Splitting**
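      – **Training and Testing**: Split your data chronologically into training, validation, and testing sets. A common approach is an 80-10-10 split: the first 80% for training, the next 10% for validation, and the last 10% for testing. Do not shuffle before splitting, or information from the future leaks into training.
      – **Sequence Generation**: Since LSTM models work with sequences, you need to convert your data into input/output windows. For example, to predict the price for the next 4 days (96 values with hourly data), use the previous n time steps (e.g., 24, 48, or 72 hours) as input. Then either predict all 96 values at once with a multi-output layer, or predict one step ahead and feed each prediction back in as input for the next step.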
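      Here is a minimal sketch of a chronological 80-10-10 split followed by multi-step window generation. It assumes hourly data, 72 hours of input, and a 96-hour (4-day) output vector. The array contents are random placeholders, and make_multistep_windows is a hypothetical helper, not a library function.

```python
import numpy as np

def make_multistep_windows(values, target, n_in, n_out):
    """Build samples: n_in past rows of all features -> next n_out target values."""
    X, y = [], []
    for start in range(len(values) - n_in - n_out + 1):
        end_in = start + n_in
        X.append(values[start:end_in])
        y.append(target[end_in:end_in + n_out])
    return np.array(X), np.array(y)

# Placeholder hourly series: 90 days, 2 features (price, load); price is the target
rng = np.random.default_rng(1)
values = rng.normal(size=(24 * 90, 2))
target = values[:, 0]

# Chronological 80/10/10 split (no shuffling)
n = len(values)
train_end, val_end = int(n * 0.8), int(n * 0.9)
splits = {
    'train': (values[:train_end], target[:train_end]),
    'val': (values[train_end:val_end], target[train_end:val_end]),
    'test': (values[val_end:], target[val_end:]),
}

# 72 hours of input -> 96 hours (4 days) of output
n_in, n_out = 72, 96
X_train, y_train = make_multistep_windows(*splits['train'], n_in, n_out)
X_val, y_val = make_multistep_windows(*splits['val'], n_in, n_out)
X_test, y_test = make_multistep_windows(*splits['test'], n_in, n_out)
print(X_train.shape, y_train.shape)  # (1561, 72, 2) (1561, 96)
```

      Windowing each split separately keeps every sample inside a single split, at the cost of fewer samples near each boundary. An alternative is to let validation and test windows take their inputs from the end of the preceding split, since those inputs lie in the past anyway.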

      ### 3. **Model Building**
      – **LSTM Architecture**:
      – Start with a simple LSTM architecture with one or two LSTM layers, followed by dense layers.
      – When you stack multiple LSTM layers, set return_sequences=True on every LSTM layer except the last, so that each layer passes a full sequence to the next one.
      – Use dropout layers to prevent overfitting.
      – The output layer should have a single neuron if you’re predicting one value (e.g., the price at the next time step) or multiple neurons if you’re predicting multiple future time steps.
      – **Loss Function**: For regression tasks like this, Mean Squared Error (MSE) or Mean Absolute Error (MAE) is commonly used.
      – **Optimizer**: Start with the Adam optimizer, which generally performs well with LSTM networks.
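      One possible starting architecture in Keras is sketched below: two stacked LSTM layers with dropout, a dense layer, and a 96-unit output layer that predicts all 96 hourly prices at once. The layer sizes, dropout rate, and window shape are arbitrary assumptions, not tuned values, and TensorFlow must be installed.

```python
import numpy as np
from tensorflow.keras import Input
from tensorflow.keras.layers import LSTM, Dense, Dropout
from tensorflow.keras.models import Sequential

n_in, n_features, n_out = 72, 2, 96  # assumed: 72 h of input, 2 features, 96 h ahead

model = Sequential([
    Input(shape=(n_in, n_features)),
    # return_sequences=True passes the full sequence on to the next LSTM layer
    LSTM(64, return_sequences=True),
    Dropout(0.2),
    # The last LSTM layer returns only its final hidden state
    LSTM(32),
    Dropout(0.2),
    Dense(32, activation='relu'),
    # One output neuron per forecast hour (vector output)
    Dense(n_out),
])
model.compile(optimizer='adam', loss='mse', metrics=['mae'])

# Quick shape check on a dummy batch of one window
print(model.predict(np.zeros((1, n_in, n_features)), verbose=0).shape)  # (1, 96)
```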

      ### 4. **Training**
      – **Epochs and Batch Size**: Begin with a moderate number of epochs (e.g., 50-100) and a batch size that suits your computational resources. Monitor the training and validation loss to avoid overfitting.
      – **Early Stopping**: Implement early stopping to prevent the model from overfitting. If the validation loss doesn’t improve after a certain number of epochs, stop the training.
      – **Model Validation**: Validate the model on unseen data (your validation set) to check for generalization.
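      Early stopping and validation monitoring map onto Keras's EarlyStopping callback and the validation_data argument of fit. The sketch below uses small random arrays in place of real windows and a tiny model so that it runs quickly. The patience of 10 epochs is an assumption.

```python
import numpy as np
from tensorflow.keras import Input
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.layers import LSTM, Dense
from tensorflow.keras.models import Sequential

# Random placeholder arrays with the shapes a windowing step would produce
rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(200, 24, 2)), rng.normal(size=(200, 4))
X_val, y_val = rng.normal(size=(40, 24, 2)), rng.normal(size=(40, 4))

model = Sequential([Input(shape=(24, 2)), LSTM(16), Dense(4)])
model.compile(optimizer='adam', loss='mse')

# Stop when val_loss has not improved for 10 epochs and keep the best weights
early_stop = EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)
history = model.fit(
    X_train, y_train,
    validation_data=(X_val, y_val),
    epochs=100,
    batch_size=32,
    callbacks=[early_stop],
    verbose=0,
)
print(f"Stopped after {len(history.history['loss'])} epochs")
```

      When early stopping triggers, restore_best_weights=True rolls the model back to the weights from the epoch with the lowest validation loss, rather than keeping those from the final epoch.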

      ### 5. **Evaluation**
      – **Metrics**: Use metrics like RMSE (Root Mean Squared Error) and MAE to evaluate the model’s performance. These will give you an idea of the average prediction error.
      – **Prediction Horizon**: Evaluate the model’s performance over different forecast horizons (e.g., 1 day ahead, 2 days ahead, etc.) to ensure it performs well across the entire forecast window.
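      To evaluate each forecast day separately, you can slice the errors by horizon. Here is a minimal NumPy sketch. It assumes y_true and y_pred have shape (samples, 96) with hourly steps and have already been converted back to price units. horizon_errors is a hypothetical helper.

```python
import numpy as np

def horizon_errors(y_true, y_pred, steps_per_day=24):
    """RMSE and MAE for each forecast day, given arrays of shape (samples, n_out)."""
    errors = y_pred - y_true
    n_days = y_true.shape[1] // steps_per_day
    results = []
    for day in range(n_days):
        block = errors[:, day * steps_per_day:(day + 1) * steps_per_day]
        rmse = np.sqrt(np.mean(block ** 2))
        mae = np.mean(np.abs(block))
        results.append((day + 1, rmse, mae))
    return results

# Toy example: 3 samples, 96 hourly steps (4 days), error equal to the day number
y_true = np.zeros((3, 96))
y_pred = np.concatenate([np.full((3, 24), d + 1.0) for d in range(4)], axis=1)
for day, rmse, mae in horizon_errors(y_true, y_pred):
    print(f'day {day}: RMSE={rmse:.2f}, MAE={mae:.2f}')
# prints day 1: RMSE=1.00, MAE=1.00 through day 4: RMSE=4.00, MAE=4.00
```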

      ### 6. **Hyperparameter Tuning**
      – Experiment with the architecture (number of LSTM layers and units per layer) and with training settings such as learning rate and batch size.
      – Consider using grid search or random search to find the best combination of hyperparameters.
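      You can write a simple random search without extra libraries. In this sketch, evaluate is a hypothetical function you would supply: it would build and train an LSTM with the given settings and return its best validation loss. A cheap stand-in scoring function is used here only to show the mechanics, and the search space values are assumptions.

```python
import itertools
import random

def random_search(evaluate, space, n_iter, seed=0):
    """Try n_iter random configurations from a grid and return (loss, config) of the best.

    evaluate(config) must return a validation loss (lower is better).
    """
    grid = [dict(zip(space, values)) for values in itertools.product(*space.values())]
    rng = random.Random(seed)
    candidates = rng.sample(grid, min(n_iter, len(grid)))
    scored = [(evaluate(cfg), cfg) for cfg in candidates]
    return min(scored, key=lambda pair: pair[0])

space = {
    'n_layers': [1, 2],
    'units': [32, 64, 128],
    'learning_rate': [1e-3, 1e-4],
    'batch_size': [32, 64],
}

def dummy_evaluate(cfg):
    # Stand-in for training a model; a real version would return min(val_loss)
    return abs(cfg['units'] - 64) / 64 + 0.1 * cfg['n_layers']

best_loss, best_cfg = random_search(dummy_evaluate, space, n_iter=10)
print(best_loss, best_cfg)
```

      Setting n_iter to the size of the grid (24 here) turns the random search into an exhaustive grid search.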

      ### 7. **Model Deployment**
      – Once satisfied with the model’s performance, save it and prepare it for deployment.
      – Consider setting up a pipeline that continuously retrains the model as new data becomes available to keep the predictions accurate over time.
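      When you save the model, also save the scaler fitted on the training data, because new inputs must be scaled exactly as they were during training. This sketch assumes a recent TensorFlow version with the native .keras format and uses joblib for the scaler. The file names and the tiny placeholder model are illustrative only.

```python
import joblib
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras import Input
from tensorflow.keras.layers import LSTM, Dense
from tensorflow.keras.models import Sequential, load_model

# Stand-ins for the trained model and the scaler fitted on the training data
scaler = MinMaxScaler().fit(np.random.default_rng(0).normal(size=(100, 2)))
model = Sequential([Input(shape=(24, 2)), LSTM(8), Dense(4)])
model.compile(optimizer='adam', loss='mse')

# Save both artifacts side by side
model.save('price_lstm.keras')
joblib.dump(scaler, 'price_scaler.joblib')

# Later, in the forecasting or retraining job
model = load_model('price_lstm.keras')
scaler = joblib.load('price_scaler.joblib')
forecast = model.predict(np.zeros((1, 24, 2)), verbose=0)
```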

      ### 8. **Considerations for Improvement**
      – **Exogenous Variables**: Apart from historical prices and load/demand, consider including external factors like weather data, holidays, or events that might influence electricity prices.
      – **Ensemble Models**: If the LSTM model alone doesn’t perform satisfactorily, consider combining it with other models (e.g., ARIMA, Prophet) to create an ensemble model.
      – **Error Analysis**: Regularly analyze the prediction errors to understand the model’s weaknesses and refine the features or model architecture accordingly.
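      For error analysis, grouping absolute errors by calendar features often shows where a model struggles, for example at particular hours. The sketch below uses synthetic actual and predicted prices in which the evening hours are deliberately biased. With real forecasts, you would use your own timestamps and values.

```python
import numpy as np
import pandas as pd

# Synthetic hourly actual and predicted prices (placeholders, not real results)
idx = pd.date_range('2024-03-01', periods=24 * 7, freq=pd.offsets.Hour())
hours = idx.hour.to_numpy()
rng = np.random.default_rng(2)
actual = 50 + rng.normal(0, 5, len(idx))
predicted = actual + rng.normal(0, 1, len(idx))
predicted[hours >= 17] += 5  # simulate a model that misses the evening peak

errors = pd.DataFrame({'abs_error': np.abs(predicted - actual)}, index=idx)

# Mean absolute error by hour of day and by day of week
by_hour = errors.groupby(errors.index.hour)['abs_error'].mean()
by_weekday = errors.groupby(errors.index.dayofweek)['abs_error'].mean()
print(by_hour.sort_values(ascending=False).head(5))
```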

      ### Tools and Libraries
      – **TensorFlow/Keras**: For building and training the LSTM model.
      – **Pandas and NumPy**: For data manipulation and preprocessing.
      – **Scikit-learn**: For data splitting, scaling, and evaluating the model using different metrics.
      – **Matplotlib/Seaborn**: For visualizing the data and model predictions.

      By following this roadmap, you should be able to develop a robust LSTM model for forecasting electricity prices. If you encounter specific issues or need further clarification on any of these steps, feel free to ask!

  389. Andrew, May 14, 2025 at 1:00 pm

    Just wanted to drop a quick note of appreciation for all the work you did here.

    • James Carmichael, May 15, 2025 at 7:11 am

      Thank you for your feedback and support!
