[ICML 2024] Outlier-Efficient Hopfield Layers for Large Transformer-Based Models

MAGICS-LAB/OutEffHop

This is the code for the paper OutEffHop. You can use this repo to reproduce the results in the paper.

Outlier Efficiency of OutEffHop

Environmental Setup

You can set up the experimental environment by running the following command line:

Set locale variables and add the project root directory to your PYTHONPATH:

    $ export LC_ALL=C.UTF-8
    $ export LANG=C.UTF-8
    $ cd OutEffHop/
    $ pip install --upgrade --no-deps pip
    $ export PYTHONPATH=${PYTHONPATH}:$(realpath "$PWD")

Create a suitable environment for each experiment.

  1. For the experiments in paper Section 4.1 on the outlier efficiency of BERT and OPT:

    $ conda create -n outlier python==3.9
    # Run the pip module as a script.
    $ python -m pip install -r /your_path/OutEffHop/OutEffHop/requirements.txt

  2. For the experiment in paper Section 4.1 on STanHop:

    $ conda create -n STHM python==3.8
    # Run the pip module as a script.
    $ python -m pip install -r /your_path/OutEffHop/STanHop_time_seeries/requirements.txt

  3. To run the STanHop quantization experiment, create the following environment:

    $ conda create -n quantize_STHM python==3.8
    $ python -m pip install -r /your_path/OutEffHop/OutEffHop/STanHop_outlier/quantize_requirements.txt

Pre-training commands

All the training scripts (batch size, etc.) are set up to fit on two A100 80GB GPUs on a Slurm machine.

| Model | Softmax | Script |
| --- | --- | --- |
| BERT-base | vanilla, clipped softmax, gated attention, gated OutEffHop, clipped OutEffHop, OutEffHop | OutEffHop_script/submit_outlier_bert.sh |
| OPT-125m | vanilla, clipped softmax, gated attention, gated OutEffHop, clipped OutEffHop, OutEffHop | OutEffHop_script/submit_outlier_opt.sh |
| STanHop | vanilla, clipped softmax, gated attention, gated OutEffHop, clipped OutEffHop, OutEffHop | OutEffHop_script/submit_STHM_outlier.sh |

Validation commands

After the model is trained, you can run evaluation (both floating-point and quantized) using the following commands. Make sure to pass the same softmax method arguments that were used for pre-training (e.g., --attn_softmax vanilla, --attn_softmax "clipped(-.025:1)", --attn_softmax softmax1, --attn_gate_type conditional_per_token --attn_gate_mlp, --attn_gate_type conditional_per_token --attn_gate_init 0.25, etc.).
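To make the softmax options above more concrete, here is a minimal pure-Python sketch of three of the variants. It is not the repo's implementation. It assumes that `softmax1` adds 1 to the softmax denominator, and that `clipped(-.025:1)` denotes a clipped softmax with stretch parameters gamma = -0.025 and zeta = 1, following the formulation used in outlier-free-transformers. The function names are hypothetical.

```python
import math

def softmax(scores):
    """Standard softmax over a list of floats (numerically stabilized)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def softmax1(scores):
    """Assumed form of softmax1: exp(x_i) / (1 + sum_j exp(x_j))."""
    m = max(max(scores), 0.0)  # the shift also covers the implicit zero logit
    exps = [math.exp(s - m) for s in scores]
    total = math.exp(-m) + sum(exps)
    return [e / total for e in exps]

def clipped_softmax(scores, gamma=-0.025, zeta=1.0):
    """Assumed form: stretch softmax outputs to [gamma, zeta], then clip to [0, 1]."""
    return [min(1.0, max(0.0, (zeta - gamma) * p + gamma)) for p in softmax(scores)]

scores = [4.0, 0.0, -4.0]
probs_vanilla = softmax(scores)
probs_softmax1 = softmax1(scores)
probs_clipped = clipped_softmax(scores)

print(probs_vanilla)
print(probs_softmax1)
print(probs_clipped)
```

In this sketch, vanilla softmax always sums to 1, softmax1 can sum to less than 1, and the clipped variant can return exact zeros for low-scoring positions.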

FP16 validation for BERT models

Run command:

$ accelerate launch --config_file accelerate_configs/1gpu_no_mp.yaml validate_mlm_config.py \
    --seed 3000 \
    --dataset_setup bookcorpus_and_wiki \
    --preprocessing_num_workers 8 \
    --model_type bert \
    --max_seq_length 128 \
    --mlm_probability 0.15 \
    --per_device_eval_batch_size 32 \
    --attn_softmax "clippedsoftmax1(-.025:1)" \
    --data_cache_dir .hf_data \
    --model_cache_dir .hf_cache \
    --model_name_or_path output/clipped_softmax1 \
    --output_dir /output_metrics/clipped_softmax1-3000

INT8 validation for BERT models

Run command:

$ accelerate launch --config_file accelerate_configs/1gpu_no_mp.yaml validate_mlm_config.py \
    --quantize \
    --est_num_batches 16 \
    --seed 4000 \
    --dataset_setup bookcorpus_and_wiki \
    --preprocessing_num_workers 8 \
    --model_type bert \
    --max_seq_length 128 \
    --mlm_probability 0.15 \
    --per_device_eval_batch_size 32 \
    --attn_softmax "clippedsoftmax1(-.025:1)" \
    --data_cache_dir .hf_data \
    --model_cache_dir .hf_cache \
    --model_name_or_path output/clipped_softmax1 \
    --output_dir output_metrics/bert_quantize_clipped_softmax1-4000

FP16 validation for OPT models

Run command:

$ accelerate launch --config_file accelerate_configs/1gpu_no_mp.yaml validate_clm.py \
    --seed 5678 \
    --dataset_setup bookcorpus_and_wiki \
    --preprocessing_num_workers 32 \
    --model_type opt \
    --block_size 512 \
    --per_device_eval_batch_size 4 \
    --attn_gate_type conditional_per_token \
    --attn_gate_init 0.25 \
    --data_cache_dir .hf_data \
    --model_cache_dir .hf_cache \
    --model_name_or_path output/gate_opt \
    --output_dir output_metrics/opt_gate_attention-5678

INT8 validation for OPT models

Run command:

$ accelerate launch --config_file accelerate_configs/1gpu_no_mp.yaml validate_clm.py \
    --quantize \
    --quant_setup fp32_head \
    --ranges_acts running_minmax \
    --qmethod_acts asymmetric_uniform \
    --percentile 99.999 \
    --est_num_batches 4 \
    --seed 6789 \
    --dataset_setup bookcorpus_and_wiki \
    --preprocessing_num_workers 32 \
    --model_type opt \
    --block_size 512 \
    --per_device_eval_batch_size 1 \
    --attn_gate_type conditional_per_token \
    --attn_gate_init 0.25 \
    --data_cache_dir .hf_data \
    --model_cache_dir .hf_cache \
    --model_name_or_path output/gate_opt \
    --output_dir output_metrics/opt_quantize_gate_attention-6789
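As background for the `--qmethod_acts asymmetric_uniform` flag, the sketch below shows a generic 8-bit asymmetric uniform quantizer and how one large activation outlier inflates the rounding error on ordinary values. It is illustrative only and is not the repo's quantizer. It takes the range from the plain min/max of one list rather than a running estimate, it ignores `--percentile`, and all names are hypothetical.

```python
def asymmetric_uniform_quantize(values, num_bits=8):
    """Quantize floats to unsigned ints; return (dequantized, ints, scale, zero_point)."""
    q_max = 2 ** num_bits - 1
    # The representable range is widened to include zero.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / q_max if hi > lo else 1.0
    zero_point = round(-lo / scale)
    ints = [min(q_max, max(0, round(v / scale) + zero_point)) for v in values]
    dequant = [(q - zero_point) * scale for q in ints]
    return dequant, ints, scale, zero_point

def roundtrip_error(values, num_eval):
    """Largest absolute quantization error over the first num_eval entries."""
    dequant, _, _, _ = asymmetric_uniform_quantize(values)
    return max(abs(v - d) for v, d in zip(values[:num_eval], dequant[:num_eval]))

typical = [-1.0, -0.5, 0.0, 0.25, 1.0]
with_outlier = typical + [60.0]  # one large activation stretches the range

err_typical = roundtrip_error(typical, len(typical))
err_with_outlier = roundtrip_error(with_outlier, len(typical))
_, ints_with_outlier, _, _ = asymmetric_uniform_quantize(with_outlier)

print(err_typical, err_with_outlier)
```

The same five values are reconstructed less accurately once the outlier widens the quantization range, which is the effect the INT8 evaluations are designed to measure.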

FP16 validation for STanHop models

Run command:

$ python main_stanhop.py  --data ETTh1 --in_len 168 --out_len 24 --seg_len 6 --learning_rate 1e-4 --itr 1 --mode softmax1 --use_gpu --gpu 0  --batch_size 128 --run_name STHM_softmax1 --e_layers 11 --save_np --with_tracking

INT8 validation for STanHop models

Run command:

$ python quantized_main_stanhop.py \
    --data ETTh1 \
    --in_len 168 \
    --out_len 24 \
    --seg_len 6 \
    --learning_rate 1e-4 \
    --itr 1 \
    --mode softmax \
    --use_gpu \
    --gpu 0 \
    --batch_size 128 \
    --run_name STHM_softmax \
    --e_layers 11 \
    --quantize \
    --quantize_model_path OutEffHop/OutEffHop/STanHop_outlier/checkpoints1/stanhop_ETTh1_il168_ol24_sl6_win1_fa10_dm256_nh4_el11_itr0_softmax/checkpoint.pth \
    --seed $((i*1000)) > OutEffHop/OutEffHop/STanHop_outlier/results/stanhop_ETTh1_quantized/softmax_seq24_$i.txt

Here $i is a shell variable that must be set before running the command; it determines both the seed and the name of the output file.
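The command above derives its seed and output file from a run index `i`. A minimal sketch of how several such runs could be generated is shown below. The range of indices (1 to 3) is an assumption, since the README does not state it, and `build_run` is a hypothetical helper; the flags and paths are copied from the command above.

```python
import shlex

CHECKPOINT = (
    "OutEffHop/OutEffHop/STanHop_outlier/checkpoints1/"
    "stanhop_ETTh1_il168_ol24_sl6_win1_fa10_dm256_nh4_el11_itr0_softmax/checkpoint.pth"
)
RESULTS_DIR = "OutEffHop/OutEffHop/STanHop_outlier/results/stanhop_ETTh1_quantized"

def build_run(i):
    """Return (argv, output_path) for run index i, mirroring the command above."""
    argv = [
        "python", "quantized_main_stanhop.py",
        "--data", "ETTh1",
        "--in_len", "168",
        "--out_len", "24",
        "--seg_len", "6",
        "--learning_rate", "1e-4",
        "--itr", "1",
        "--mode", "softmax",
        "--use_gpu",
        "--gpu", "0",
        "--batch_size", "128",
        "--run_name", "STHM_softmax",
        "--e_layers", "11",
        "--quantize",
        "--quantize_model_path", CHECKPOINT,
        "--seed", str(i * 1000),  # same rule as $((i*1000)) in the shell command
    ]
    return argv, f"{RESULTS_DIR}/softmax_seq24_{i}.txt"

# Assumption: three runs with indices 1..3.
runs = [build_run(i) for i in range(1, 4)]
for argv, output_path in runs:
    # Print the equivalent shell command; nothing is executed here.
    print(shlex.join(argv), ">", output_path)
```

To actually launch a run, each `argv` could be passed to `subprocess.run` with `stdout` set to the opened output file.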

OutEffHop Case study

Environmental Setup

You can set up the experimental environment by running the following command line:

$ cd STanHop_time_seeries
$ pip3 install -r requirements.txt
$ export PYTHONPATH=$PYTHONPATH:$PWD

Reproducibility

  1. Put the datasets for your experiments into the folder datasets/. We have already put ETTh1 and ETTm1 into it. WTH and ECL can be downloaded from https://github.com/zhouhaoyi/Informer2020. ILI and Traffic can be downloaded from https://github.com/thuml/Autoformer. Note that the WTH we used in the paper is the one with 12 dimensions from Informer, not the one with 21 dimensions from Autoformer.

  2. To get results of Crossformer with $T=168, \tau = 48, L_{seg} = 6$ on the ETTh1 dataset, run:

python main_stanhop.py  --data ETTh1 --in_len 168 --out_len 48 --seg_len 6 --learning_rate 1e-4 --itr 1 --mode softmax1 --use_gpu --gpu 0  --batch_size 128 --run_name STHM_softmax1  --e_layers 11

The model will be automatically trained and tested. The trained model will be saved in the folder checkpoints/ and the evaluation metrics will be saved in the folder results/.

  3. To reproduce all results in the paper, run the following script:

batch OutEffHop_script/submit_STHM.sh

main_stanhop is the entry point of our model and there are other parameters that can be tuned. Here we describe them in detail:

| Parameter name | Description of parameter |
| --- | --- |
| data | The dataset name |
| root_path | The root path of the data file (defaults to ./datasets/) |
| data_path | The data file name (defaults to ETTh1.csv) |
| data_split | Train/Val/Test split, can be ratios (e.g. 0.7,0.1,0.2) or row counts (e.g. 16800,2880,2880) (defaults to 0.7,0.1,0.2) |
| checkpoints | Location to store the trained model (defaults to ./checkpoints/) |
| in_len | Length of input/history sequence, i.e. $T$ in the paper (defaults to 96) |
| out_len | Length of output/future sequence, i.e. $\tau$ in the paper (defaults to 24) |
| seg_len | Length of each segment in DSW embedding, i.e. $L_{seg}$ in the paper (defaults to 6) |
| win_size | How many adjacent segments are merged into one in the segment merging of HED (defaults to 2) |
| factor | Number of routers in the Cross-Dimension Stage of TSA, i.e. $c$ in the paper (defaults to 10) |
| data_dim | Number of dimensions of the MTS data, i.e. $D$ in the paper (defaults to 7 for ETTh and ETTm) |
| d_model | Dimension of hidden states, i.e. $d_{model}$ in the paper (defaults to 256) |
| d_ff | Dimension of MLP in MSA (defaults to 512) |
| n_heads | Number of heads in MSA (defaults to 4) |
| e_layers | Number of encoder layers, i.e. $N$ in the paper (defaults to 3) |
| dropout | The probability of dropout (defaults to 0.2) |
| weight_decay | The weight decay |
| num_workers | The num_workers of the data loader (defaults to 0) |
| batch_size | The batch size for training and testing (defaults to 32) |
| train_epochs | Number of training epochs (defaults to 20) |
| patience | Early stopping patience (defaults to 3) |
| learning_rate | The initial learning rate for the optimizer (defaults to 1e-4) |
| lradj | How to adjust the learning rate (defaults to type1) |
| itr | Number of times the experiment is repeated (defaults to 1) |
| save_pred | Whether to save the predicted results. If True, the predicted results will be saved in the folder results in NumPy array form. This costs a lot of time and memory for datasets with large $D$ (defaults to False) |
| use_gpu | Whether to use the GPU (defaults to True) |
| gpu | The GPU number used for training and inference (defaults to 0) |
| use_multi_gpu | Whether to use multiple GPUs (defaults to False) |
| devices | Device IDs of multiple GPUs (defaults to 0,1,2,3) |
| mode | The type of the Hopfield Network (Hopfield, SparseHopfield, STanHop, OutEffHop) |
| run_name | The name of the experiment |
| eta | The eta value of Entmax |
| gamma | The gamma value of Entmax |
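The `data_split` parameter accepts either ratios or absolute row counts. The sketch below shows one way such a value could be resolved into train/val/test sizes. `resolve_split` is a hypothetical helper written for illustration and is not taken from the repo; in particular, treating values of at most 1 as ratios and giving leftover rows to the validation set are assumptions.

```python
def resolve_split(data_split, total_rows):
    """Turn a 'train,val,test' string into three row counts."""
    parts = [float(p) for p in data_split.split(",")]
    if len(parts) != 3:
        raise ValueError("data_split needs three comma-separated values")
    if all(p <= 1 for p in parts):
        # Ratio form, e.g. "0.7,0.1,0.2": scale by the dataset length.
        train = int(total_rows * parts[0])
        test = int(total_rows * parts[2])
        val = total_rows - train - test  # assumption: remainder goes to validation
        return train, val, test
    # Count form, e.g. "16800,2880,2880": use the numbers directly.
    return tuple(int(p) for p in parts)

# total_rows=10000 is an arbitrary illustrative dataset length.
ratio_split = resolve_split("0.7,0.1,0.2", total_rows=10000)
count_split = resolve_split("16800,2880,2880", total_rows=22560)

print(ratio_split)
print(count_split)
```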

Experimental Validation of Theoretical Results

Environmental Setup

You can set up the experimental environment by running the following command line:

$ conda create -n theory_verify python=3.8
$ conda activate theory_verify
$ cd theory_verification
$ pip3 install -r requirements.txt

Plotting

$ python3 plotting.py

Acknowledgment

The experiments in this work benefit from the following open-source codebases:

https://github.com/zhouhaoyi/Informer2020

https://github.com/thuml/Autoformer

https://github.com/MAGICS-LAB/STanHop

https://github.com/Qualcomm-AI-research/outlier-free-transformers

Citation

If you find our work useful, please consider citing our paper:

@inproceedings{hu2024outlier,
  title={Outlier-Efficient Hopfield Layers for Large Transformer-Based Models},
  author={Hu, Jerry Yao-Chieh and Chang, Pei-Hsuan and Luo, Robin and Chen, Hong-Yu and Li, Weijian and Wang, Wei-Po and Liu, Han},
  booktitle={Forty-first International Conference on Machine Learning (ICML)},
  year={2024}
}
