No Parameters Left Behind: Sensitivity Guided Adaptive Learning Rate for Training Large Transformer Models (ICLR 2022)


This repo contains our code for the paper "No Parameters Left Behind: Sensitivity Guided Adaptive Learning Rate for Training Large Transformer Models" (ICLR 2022).


Getting Started

  1. Pull and run the Docker image (a command sketch follows this list)
    pytorch/pytorch:1.5.1-cuda10.1-cudnn7-devel
  2. Install the requirements
    pip install -r requirements.txt
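
The Docker step above could look roughly like the following; the --gpus and volume-mount flags are assumptions about a typical GPU setup, not options prescribed by this repo:

    docker pull pytorch/pytorch:1.5.1-cuda10.1-cudnn7-devel
    docker run -it --gpus all -v $(pwd):/workspace pytorch/pytorch:1.5.1-cuda10.1-cudnn7-devel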

Data and Model

  1. Download the data and pre-trained models
    ./download.sh
    Please refer to this link for details on the GLUE benchmark.
  2. Preprocess the data
    ./experiments/glue/prepro.sh
    For the most up-to-date data processing details, please refer to the mt-dnn repo.

Fine-tuning Pre-trained Models using SAGE

We provide an example script for fine-tuning a pre-trained BERT-base model on MNLI using Adamax-SAGE:

./scripts/train_mnli_usadamax.sh GPUID
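
At a high level, SAGE rescales each parameter's update according to an estimate of its sensitivity, so that less sensitive, under-trained parameters receive larger effective learning rates. The snippet below is only a toy illustration of that idea, not the implementation in module/bert_optim.py: it assumes the common first-order sensitivity approximation |theta * g| and a beta3-controlled moving average, while the final scaling function is a simplified stand-in (see the paper and module/bert_optim.py for the exact formulation).

    # Toy illustration of sensitivity-guided learning-rate scaling.
    # NOT the repo's module/bert_optim.py implementation; the exact scaling
    # function in the paper differs. All names here are illustrative.
    import torch

    def sensitivity_scaled_update(param, lr, beta3, state):
        grad = param.grad
        # First-order sensitivity approximation: |theta * g|, elementwise.
        sensitivity = (param.data * grad).abs()
        # Moving average of sensitivity, controlled by beta3.
        if "ema" not in state:
            state["ema"] = torch.zeros_like(param.data)
        state["ema"].mul_(beta3).add_(sensitivity, alpha=1.0 - beta3)
        # Give less sensitive parameters larger updates ("no parameters left
        # behind"); this normalization is a simplified stand-in.
        scale = 1.0 - state["ema"] / (state["ema"].max() + 1e-12)
        param.data.add_(grad * scale, alpha=-lr)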

A few notes:

  • learning_rate and beta3 are two of the most important hyper-parameters. A learning_rate that works well for Adamax/AdamW-SAGE is usually 2 to 5 times larger than one that works well for Adamax/AdamW, depending on the task. A beta3 that works well for Adamax/AdamW-SAGE is usually in the range of 0.6 to 0.9, depending on the task.

  • To use AdamW-SAGE, set the argument --optim=usadamw. The current codebase only contains implementations of Adamax-SAGE and AdamW-SAGE; please refer to module/bert_optim.py for details, and to our paper for guidance on integrating SAGE with other optimizers.

  • To fine-tune a pre-trained RoBERTa-base model, set the argument --init_checkpoint to the model path and set --encoder_type to 2. Other supported models are listed in pretrained_models.py.

  • To fine-tune on other tasks, set the arguments --train_datasets and --test_datasets to the corresponding task names (an example invocation is sketched after this list).
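
Putting these options together, a fine-tuning run might be launched roughly as follows. This is only a sketch: the entry-point name (train.py, as in the mt-dnn codebase) and every value below are placeholders rather than settings prescribed by this repo; the provided scripts under ./scripts/ remain the reference.

    python train.py \
      --optim=usadamw \
      --init_checkpoint <path_to_pretrained_model> \
      --encoder_type 2 \
      --train_datasets mnli \
      --test_datasets mnli \
      --learning_rate 1e-4 \
      --beta3 0.7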


Citation

@inproceedings{liang2022no,
  title={No Parameters Left Behind: Sensitivity Guided Adaptive Learning Rate for Training Large Transformer Models},
  author={Chen Liang and Haoming Jiang and Simiao Zuo and Pengcheng He and Xiaodong Liu and Jianfeng Gao and Weizhu Chen and Tuo Zhao},
  booktitle={International Conference on Learning Representations},
  year={2022},
  url={https://openreview.net/forum?id=cuvga_CiVND}
}

Contact Information

For help or issues related to this package, please submit a GitHub issue. For personal questions related to this paper, please contact Chen Liang (cliang73@gatech.edu).
