
The GitHub repository for the paper "Informer" accepted by AAAI 2021.


zhouhaoyi/Informer2020


Python 3.6 | PyTorch 1.2 | cuDNN 7.3.1 | License CC BY-NC-SA

This is the original PyTorch implementation of Informer from the following paper: Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting. Special thanks to Jieqi Peng (@cookieminions) for building this repo.

🚩 News (Mar 27, 2023): We will release Informer V2 soon.

🚩 News (Feb 28, 2023): The Informer extension paper is online in AIJ.

🚩 News (Mar 25, 2021): We have updated all experiment results together with their hyperparameter settings.

🚩 News (Feb 22, 2021): We provide Colab examples for easy usage.

🚩 News (Feb 8, 2021): Our Informer paper has received the AAAI'21 Best Paper Award [Official][Beihang][Rutgers]! We will continue this line of research and post updates to this repo. Please star this repo and cite our paper if you find our work helpful.



Figure 1. The architecture of Informer.

ProbSparse Attention

The self-attention scores form a long-tail distribution: the "active" queries lie in the "head" scores and the "lazy" queries lie in the "tail" area. We designed ProbSparse Attention to select the "active" queries rather than the "lazy" ones. ProbSparse Attention with the Top-u queries forms a sparse Transformer based on this probability distribution. Why not use the Top-u keys instead? The output of a self-attention layer is a re-representation of its input, formulated as a weighted combination of the values with respect to the dot-product scores. Selecting the top queries together with the full set of keys encourages a complete re-representation of the leading components of the input, and it is equivalent to selecting the "head" scores among all dot-product pairs. If we chose the Top-u keys instead, the attention would only preserve the trivial sum of values within the "long tail" scores and would wreck the re-representation of the leading components.



Figure 2. The illustration of ProbSparse Attention.
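To make the query selection concrete, here is a minimal NumPy sketch of single-head ProbSparse self-attention. It scores each query with the max-mean sparsity measurement from the Informer paper (the largest scaled dot product minus the average one), keeps the Top-u queries with u = factor * ceil(ln L_Q) (`factor` playing the role of the `--factor` argument), and gives the remaining "lazy" queries the mean of V. This is an illustration under simplifying assumptions, not the repo's implementation: the function name `probsparse_attention` is hypothetical, there is no masking and no multi-head logic, and it computes the full score matrix instead of sampling keys, so it shows *which* queries are kept but gives none of the efficiency gains.

```python
import math
import numpy as np

def probsparse_attention(Q, K, V, factor=5):
    """Simplified single-head ProbSparse self-attention (no mask, no key sampling).

    Q: (L_Q, d), K: (L_K, d), V: (L_K, d_v).
    Returns (output of shape (L_Q, d_v), indices of the active queries).
    """
    L_Q, d = Q.shape
    scores = Q @ K.T / math.sqrt(d)  # (L_Q, L_K) scaled dot products

    # Max-mean sparsity measurement: a large gap marks a peaked ("active") query.
    sparsity = scores.max(axis=1) - scores.mean(axis=1)

    # Keep the Top-u queries, u = factor * ceil(ln L_Q), capped at L_Q.
    u = min(L_Q, factor * math.ceil(math.log(L_Q)))
    active = np.argsort(-sparsity)[:u]

    # "Lazy" queries fall back to the mean of V (a uniform attention distribution).
    out = np.tile(V.mean(axis=0), (L_Q, 1))

    # Active queries get ordinary softmax attention over all keys.
    s = scores[active]
    s = s - s.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(s)
    w = w / w.sum(axis=1, keepdims=True)
    out[active] = w @ V
    return out, active


rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(96, 64)) for _ in range(3))
out, active = probsparse_attention(Q, K, V, factor=5)
print(out.shape, len(active))  # (96, 64) 25
```

With L_Q = 96 and factor 5, only 5 * ceil(ln 96) = 25 of the 96 queries receive full attention; when u reaches L_Q, the sketch reduces to ordinary full attention.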

Requirements

  • Python 3.6
  • matplotlib == 3.1.1
  • numpy == 1.19.4
  • pandas == 0.25.1
  • scikit_learn == 0.21.3
  • torch == 1.8.0

Dependencies can be installed using the following command:

pip install -r requirements.txt

Data

The ETT dataset used in the paper can be downloaded from the ETDataset repo. The required data files should be put into the data/ETT/ folder. A demo slice of the ETT data is illustrated in the following figure. Note that the input of each dataset is zero-mean normalized in this implementation.



Figure 3. An example of the ETT data.

The ECL data and Weather data can be downloaded here.
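The zero-mean normalization mentioned above can be pictured with the following NumPy sketch of per-feature standardization (subtract the column mean, divide by the column standard deviation), similar in spirit to scikit-learn's StandardScaler. Two points are assumptions for illustration, not taken from the repo: the statistics are fitted on a training split only (common practice to avoid leaking test information), and the class name `ZeroMeanScaler` and the 600-row split are hypothetical. The `inverse_transform` method shows the kind of mapping back to the original scale that the `--inverse` argument refers to.

```python
import numpy as np

class ZeroMeanScaler:
    """Standardize each feature column to zero mean and unit variance."""

    def fit(self, train):
        self.mean = train.mean(axis=0)
        std = train.std(axis=0)
        # Guard against constant columns to avoid division by zero.
        self.std = np.where(std == 0, 1.0, std)
        return self

    def transform(self, x):
        return (x - self.mean) / self.std

    def inverse_transform(self, z):
        return z * self.std + self.mean


# Hypothetical data: 1000 time steps x 7 features (shaped like an ETT table).
rng = np.random.default_rng(0)
data = rng.normal(loc=30.0, scale=5.0, size=(1000, 7))
n_train = 600  # assumed training split; statistics come from this part only
scaler = ZeroMeanScaler().fit(data[:n_train])
scaled = scaler.transform(data)
restored = scaler.inverse_transform(scaled)
```

After fitting, the training rows of `scaled` have (numerically) zero mean and unit variance in every column, and `restored` matches `data` up to floating-point error.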

Reproducibility

To easily reproduce the results, you can follow these steps:

  1. Initialize the docker image using make init.
  2. Download the datasets using make dataset.
  3. Run each script in scripts/ using make run_module module="bash ETTh1.sh", substituting the name of each script.
  4. Alternatively, run all the scripts at once:
for script in `ls scripts`; do make run_module module="bash scripts/$script"; done

Usage

Colab Examples: We provide Google Colab notebooks to help reproduce and customize our repo, covering experiments (train and test), prediction, visualization, and custom data.

Commands for training and testing the model with ProbSparse self-attention on the ETTh1, ETTh2, and ETTm1 datasets, respectively:

# ETTh1
python -u main_informer.py --model informer --data ETTh1 --attn prob --freq h

# ETTh2
python -u main_informer.py --model informer --data ETTh2 --attn prob --freq h

# ETTm1
python -u main_informer.py --model informer --data ETTm1 --attn prob --freq t

For more information on the parameters, please refer to main_informer.py.

We provide a more detailed and complete command description for training and testing the model:

python -u main_informer.py --model <model> --data <data> \
  --root_path <root_path> --data_path <data_path> --features <features> \
  --target <target> --freq <freq> --checkpoints <checkpoints> \
  --seq_len <seq_len> --label_len <label_len> --pred_len <pred_len> \
  --enc_in <enc_in> --dec_in <dec_in> --c_out <c_out> --d_model <d_model> \
  --n_heads <n_heads> --e_layers <e_layers> --d_layers <d_layers> \
  --s_layers <s_layers> --d_ff <d_ff> --factor <factor> --padding <padding> \
  --distil --dropout <dropout> --attn <attn> --embed <embed> \
  --activation <activation> --output_attention --do_predict --mix \
  --cols <cols> --itr <itr> --num_workers <num_workers> \
  --train_epochs <train_epochs> --batch_size <batch_size> --patience <patience> \
  --des <des> --learning_rate <learning_rate> --loss <loss> --lradj <lradj> \
  --use_amp --inverse --use_gpu <use_gpu> --gpu <gpu> \
  --use_multi_gpu --devices <devices>

The arguments are described in detail below:

| Parameter name | Description of parameter |
|---|---|
| model | The model of the experiment. This can be set to informer, informerstack, informerlight (TBD) |
| data | The dataset name |
| root_path | The root path of the data file (defaults to ./data/ETT/) |
| data_path | The data file name (defaults to ETTh1.csv) |
| features | The forecasting task (defaults to M). This can be set to M, S, MS (M: multivariate predict multivariate, S: univariate predict univariate, MS: multivariate predict univariate) |
| target | Target feature in the S or MS task (defaults to OT) |
| freq | Frequency for time features encoding (defaults to h). This can be set to s, t, h, d, b, w, m (s: secondly, t: minutely, h: hourly, d: daily, b: business days, w: weekly, m: monthly). You can also use more detailed frequencies such as 15min or 3h |
| checkpoints | Location of model checkpoints (defaults to ./checkpoints/) |
| seq_len | Input sequence length of the Informer encoder (defaults to 96) |
| label_len | Start token length of the Informer decoder (defaults to 48) |
| pred_len | Prediction sequence length (defaults to 24) |
| enc_in | Encoder input size (defaults to 7) |
| dec_in | Decoder input size (defaults to 7) |
| c_out | Output size (defaults to 7) |
| d_model | Dimension of the model (defaults to 512) |
| n_heads | Number of heads (defaults to 8) |
| e_layers | Number of encoder layers (defaults to 2) |
| d_layers | Number of decoder layers (defaults to 1) |
| s_layers | Number of stack encoder layers (defaults to 3,2,1) |
| d_ff | Dimension of the fully connected feed-forward network (defaults to 2048) |
| factor | ProbSparse attention factor (defaults to 5) |
| padding | Padding type (defaults to 0) |
| distil | Whether to use distilling in the encoder; passing this flag disables distilling (defaults to True) |
| dropout | The probability of dropout (defaults to 0.05) |
| attn | Attention used in the encoder (defaults to prob). This can be set to prob (Informer), full (Transformer) |
| embed | Time features encoding (defaults to timeF). This can be set to timeF, fixed, learned |
| activation | Activation function (defaults to gelu) |
| output_attention | Whether to output attention in the encoder; passing this flag outputs attention (defaults to False) |
| do_predict | Whether to predict unseen future data; passing this flag makes predictions (defaults to False) |
| mix | Whether to use mix attention in the generative decoder; passing this flag disables mix attention (defaults to True) |
| cols | Specific columns from the data file to use as input features |
| num_workers | Number of data loader workers (defaults to 0) |
| itr | Number of experiment runs (defaults to 2) |
| train_epochs | Number of training epochs (defaults to 6) |
| batch_size | Batch size of the training input data (defaults to 32) |
| patience | Early stopping patience (defaults to 3) |
| learning_rate | Optimizer learning rate (defaults to 0.0001) |
| des | Experiment description (defaults to test) |
| loss | Loss function (defaults to mse) |
| lradj | Method used to adjust the learning rate (defaults to type1) |
| use_amp | Whether to use automatic mixed precision training; passing this flag enables AMP (defaults to False) |
| inverse | Whether to inverse-transform the output data; passing this flag inverse-transforms the output (defaults to False) |
| use_gpu | Whether to use GPU (defaults to True) |
| gpu | The GPU number used for training and inference (defaults to 0) |
| use_multi_gpu | Whether to use multiple GPUs; passing this flag enables multiple GPUs (defaults to False) |
| devices | Device IDs of the multiple GPUs (defaults to 0,1,2,3) |
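The interplay of seq_len, label_len, and pred_len can be illustrated with the NumPy sketch below. It assumes a sliding-window layout consistent with the table: the encoder sees seq_len steps, and the decoder target spans label_len + pred_len steps, starting label_len steps before the end of the encoder window. For the generative-style decoder, the decoder input is then the last label_len known steps (the "start token") followed by pred_len placeholders, which are set to zeros here. All function names are hypothetical, and the zero placeholder is an assumption of this sketch rather than a quote from the repo's data loader.

```python
import numpy as np

def make_window(data, index, seq_len=96, label_len=48, pred_len=24):
    """Cut one (encoder input, decoder target) pair from a (T, n_features) array."""
    s_begin = index
    s_end = s_begin + seq_len           # encoder input: data[s_begin:s_end]
    r_begin = s_end - label_len         # decoder span starts label_len steps earlier
    r_end = r_begin + label_len + pred_len
    return data[s_begin:s_end], data[r_begin:r_end]


def decoder_input(seq_y, label_len=48, pred_len=24):
    """Start token (the known label_len steps) followed by zero placeholders."""
    placeholder = np.zeros((pred_len, seq_y.shape[1]), dtype=seq_y.dtype)
    return np.concatenate([seq_y[:label_len], placeholder], axis=0)


def num_windows(T, seq_len=96, pred_len=24):
    """Number of complete windows that fit into a series of length T."""
    return T - seq_len - pred_len + 1


data = np.arange(200 * 7, dtype=float).reshape(200, 7)  # hypothetical 7-feature series
seq_x, seq_y = make_window(data, index=0)
dec_in = decoder_input(seq_y)
print(seq_x.shape, seq_y.shape, dec_in.shape)  # (96, 7) (72, 7) (72, 7)
```

In this layout, the predictions made at the placeholder positions would be compared against the last pred_len rows of seq_y, and the start token is simply the tail of the encoder input.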

Results

We have updated the experiment results of all methods due to a change in data scaling. Fortunately, Informer's performance improved as a result. Thank you to @lk1983823 for pointing out the data scaling problem in issue 41.

In addition, the experiment parameters for each dataset are provided in the .sh files in the ./scripts/ directory. You can refer to these parameters for experiments, and you can also adjust them to obtain better MSE and MAE results or draw better prediction figures.



Figure 4. Univariate forecasting results.



Figure 5. Multivariate forecasting results.
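For reference, the MSE and MAE metrics used in the results can be written as plain averages over every predicted element (all windows, time steps, and features). The NumPy sketch below assumes that simple element-wise averaging; the function names are illustrative, and the same formulas apply whether the arrays are on the normalized or the original scale.

```python
import numpy as np

def mse(pred, true):
    """Mean squared error over all elements."""
    return np.mean((pred - true) ** 2)

def mae(pred, true):
    """Mean absolute error over all elements."""
    return np.mean(np.abs(pred - true))


pred = np.array([[0.5, 1.0], [1.5, 2.0]])
true = np.array([[0.0, 1.0], [1.0, 3.0]])
print(mse(pred, true), mae(pred, true))  # 0.375 0.5
```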

FAQ

If you run into a problem like RuntimeError: The size of tensor a (98) must match the size of tensor b (96) at non-singleton dimension 1, check your torch version or modify the Conv1d code of TokenEmbedding in models/embed.py, because the circular padding mode of Conv1d behaves differently across torch versions.
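The sizes 98 and 96 in that error are consistent with the standard Conv1d output-length formula, L_out = floor((L_in + 2 * padding - dilation * (kernel_size - 1) - 1) / stride) + 1. The arithmetic sketch below assumes a kernel size of 3 for the token embedding convolution (check models/embed.py for the actual value): with seq_len = 96, a padding of 1 per side keeps the length at 96, while 2 per side produces 98, two steps longer than the other embeddings it is added to. The helper name is hypothetical.

```python
def conv1d_out_len(length, kernel_size, padding, stride=1, dilation=1):
    """Output length of a 1-D convolution (standard formula)."""
    return (length + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


seq_len = 96
print(conv1d_out_len(seq_len, kernel_size=3, padding=1))  # 96
print(conv1d_out_len(seq_len, kernel_size=3, padding=2))  # 98
```

So if your torch version pads `padding` positions on each side in circular mode, choosing the padding so that the output length equals seq_len removes the mismatch.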

Citation

If you find this repository useful in your research, please consider citing the following papers:

@article{haoyietal-informerEx-2023,
  author    = {Haoyi Zhou and
               Jianxin Li and
               Shanghang Zhang and
               Shuai Zhang and
               Mengyi Yan and
               Hui Xiong},
  title     = {Expanding the prediction capacity in long sequence time-series forecasting},
  journal   = {Artificial Intelligence},
  volume    = {318},
  pages     = {103886},
  issn      = {0004-3702},
  year      = {2023},
}
@inproceedings{haoyietal-informer-2021,
  author    = {Haoyi Zhou and
               Shanghang Zhang and
               Jieqi Peng and
               Shuai Zhang and
               Jianxin Li and
               Hui Xiong and
               Wancai Zhang},
  title     = {Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting},
  booktitle = {The Thirty-Fifth {AAAI} Conference on Artificial Intelligence, {AAAI} 2021, Virtual Conference},
  volume    = {35},
  number    = {12},
  pages     = {11106--11115},
  publisher = {{AAAI} Press},
  year      = {2021},
}

Contact

If you have any questions, feel free to contact Haoyi Zhou by email (zhouhaoyi1991@gmail.com) or through GitHub issues. Pull requests are very welcome!

Acknowledgments

Thanks to the Beijing Advanced Innovation Center for Big Data and Brain Computing (BDBC) for providing the computing infrastructure. At the same time, thank you all for your attention to this work!

