dot-agi/DynamicViT

forked from raoyongming/DynamicViT

[NeurIPS 2021] DynamicViT: Efficient Vision Transformers with Dynamic Token Sparsification

dynamicvit.ivg-research.xyz/

Efficient Vision Transformers and CNNs with Dynamic Spatial Sparsification

This repository contains the PyTorch implementation of DynamicViT (NeurIPS 2021).

DynamicViT is a dynamic token sparsification framework that prunes redundant tokens in vision transformers progressively and dynamically based on the input. Our method reduces FLOPs by over 30% and improves throughput by over 40% while keeping the accuracy drop within 0.5% for various vision transformers.
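To make the keeping ratio concrete, here is a minimal sketch of how the token count could shrink across pruning stages. It assumes a 224x224 input split into 16x16 patches (196 patch tokens), three pruning stages, and a geometric schedule in which stage s keeps a fraction rho^s of the tokens. The function name `kept_tokens` and these defaults are illustrative assumptions, not code taken from this repository.

```python
from typing import List


def kept_tokens(num_tokens: int, base_rate: float, num_stages: int = 3) -> List[int]:
    """Return the number of tokens kept after each pruning stage.

    Assumption: the keep ratio at stage s is base_rate ** s.
    """
    if not 0.0 < base_rate <= 1.0:
        raise ValueError("base_rate must be in (0, 1]")
    return [int(num_tokens * base_rate ** s) for s in range(1, num_stages + 1)]


if __name__ == "__main__":
    # 224x224 image with 16x16 patches -> 14 * 14 = 196 patch tokens.
    print(kept_tokens(196, 0.7))  # [137, 96, 67]
```

Under these assumptions, a keeping ratio of 0.7 leaves roughly a third of the original tokens after the last stage, which is where the FLOPs savings come from.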

[Project Page] [arXiv (NeurIPS 2021)]

🔥Updates

We extend our method to more network architectures (i.e., ConvNeXt and Swin Transformers) and more tasks (i.e., object detection and semantic segmentation) with an improved dynamic spatial sparsification framework. Please refer to the extended version of our paper for details. The extended version has been accepted by T-PAMI.

[arXiv (T-PAMI, Journal Version)]

Image Examples

Video Examples

Model Zoo

We provide our DynamicViT models pretrained on ImageNet:

| name | model | rho | acc@1 | acc@5 | FLOPs | url |
| --- | --- | --- | --- | --- | --- | --- |
| DynamicViT-DeiT-256/0.7 | `deit-256` | 0.7 | 76.53 | 93.12 | 1.3G | Google Drive / Tsinghua Cloud |
| DynamicViT-DeiT-S/0.7 | `deit-s` | 0.7 | 79.32 | 94.68 | 2.9G | Google Drive / Tsinghua Cloud |
| DynamicViT-DeiT-B/0.7 | `deit-b` | 0.7 | 81.43 | 95.46 | 11.4G | Google Drive / Tsinghua Cloud |
| DynamicViT-LVViT-S/0.5 | `lvvit-s` | 0.5 | 81.97 | 95.76 | 3.7G | Google Drive / Tsinghua Cloud |
| DynamicViT-LVViT-S/0.7 | `lvvit-s` | 0.7 | 83.08 | 96.25 | 4.6G | Google Drive / Tsinghua Cloud |
| DynamicViT-LVViT-M/0.7 | `lvvit-m` | 0.7 | 83.82 | 96.58 | 8.5G | Google Drive / Tsinghua Cloud |

🔥Updates: We provide our DynamicCNN and DynamicSwin models pretrained on ImageNet:

| name | model | rho | acc@1 | acc@5 | FLOPs | url |
| --- | --- | --- | --- | --- | --- | --- |
| DynamicCNN-T/0.7 | `convnext-t` | 0.7 | 81.59 | 95.72 | 3.6G | Google Drive / Tsinghua Cloud |
| DynamicCNN-T/0.9 | `convnext-t` | 0.9 | 82.06 | 95.89 | 3.9G | Google Drive / Tsinghua Cloud |
| DynamicCNN-S/0.7 | `convnext-s` | 0.7 | 82.57 | 96.29 | 5.8G | Google Drive / Tsinghua Cloud |
| DynamicCNN-S/0.9 | `convnext-s` | 0.9 | 83.12 | 96.42 | 6.8G | Google Drive / Tsinghua Cloud |
| DynamicCNN-B/0.7 | `convnext-b` | 0.7 | 83.45 | 96.56 | 10.2G | Google Drive / Tsinghua Cloud |
| DynamicCNN-B/0.9 | `convnext-b` | 0.9 | 83.96 | 96.76 | 11.9G | Google Drive / Tsinghua Cloud |
| DynamicSwin-T/0.7 | `swin-t` | 0.7 | 80.91 | 95.42 | 4.0G | Google Drive / Tsinghua Cloud |
| DynamicSwin-S/0.7 | `swin-s` | 0.7 | 83.21 | 96.33 | 6.9G | Google Drive / Tsinghua Cloud |
| DynamicSwin-B/0.7 | `swin-b` | 0.7 | 83.43 | 96.45 | 12.1G | Google Drive / Tsinghua Cloud |

Usage

Requirements

torch>=1.8.0
torchvision>=0.9.0
timm==0.3.2
tensorboardX
six
fvcore
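Because `timm` is pinned to an exact version, it can help to check the environment before launching a long run. The following is a rough sketch using only the standard library. The helper names (`parse_version`, `satisfies`, `check_environment`) are hypothetical, and the simple numeric comparison ignores pre-release and local build suffixes.

```python
import re
from importlib import metadata

# Constraints copied from the requirements list above.
REQUIREMENTS = {
    "torch": (">=", "1.8.0"),
    "torchvision": (">=", "0.9.0"),
    "timm": ("==", "0.3.2"),
}


def parse_version(text):
    """Turn '1.8.0+cu111' into (1, 8, 0); local build suffixes are dropped."""
    return tuple(int(n) for n in re.findall(r"\d+", text.split("+")[0])[:3])


def satisfies(installed, op, required):
    """Compare two version strings with '>=' or '=='."""
    have, want = parse_version(installed), parse_version(required)
    return have >= want if op == ">=" else have == want


def check_environment():
    """Return {package: problem} for every unmet requirement."""
    problems = {}
    for name, (op, required) in REQUIREMENTS.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if not satisfies(installed, op, required):
            problems[name] = f"found {installed}, need {op}{required}"
    return problems


if __name__ == "__main__":
    print(check_environment() or "all pinned requirements satisfied")
```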

Data preparation: download and extract ImageNet images from http://image-net.org/. The directory structure should be

│ILSVRC2012/
├──train/
│  ├── n01440764
│  │   ├── n01440764_10026.JPEG
│  │   ├── n01440764_10027.JPEG
│  │   ├── ......
│  ├── ......
├──val/
│  ├── n01440764
│  │   ├── ILSVRC2012_val_00000293.JPEG
│  │   ├── ILSVRC2012_val_00002138.JPEG
│  │   ├── ......
│  ├── ......
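A quick sanity check of the extracted layout can catch a flat `val/` folder or a missing split before training starts. This is a small sketch, not part of the repository: `summarize_imagenet` is a hypothetical helper that counts class folders and `.JPEG` files in each split.

```python
from pathlib import Path


def summarize_imagenet(root):
    """Count class folders and JPEG files under root/train and root/val."""
    summary = {}
    for split in ("train", "val"):
        split_dir = Path(root) / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"missing split directory: {split_dir}")
        class_dirs = [d for d in split_dir.iterdir() if d.is_dir()]
        num_images = sum(1 for d in class_dirs for _ in d.glob("*.JPEG"))
        summary[split] = {"classes": len(class_dirs), "images": num_images}
    return summary


# Example call (the path is a placeholder):
#   summarize_imagenet("/path/to/ILSVRC2012")
# A complete ImageNet-1k copy should report 1000 classes in both splits.
```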

Model preparation: download pre-trained models if necessary:

| model | url | model | url |
| --- | --- | --- | --- |
| DeiT-Small | link | LVViT-S | link |
| DeiT-Base | link | LVViT-M | link |
| ConvNeXt-T | link | Swin-T | link |
| ConvNeXt-S | link | Swin-S | link |
| ConvNeXt-B | link | Swin-B | link |

Demo

You can try DynamicViT on Colab. Thanks to @dirtycomputer for the contribution.

We also provide a Jupyter notebook where you can run the visualization of DynamicViT.

To run the demo, you need to install matplotlib.

Evaluation

To evaluate a pre-trained DynamicViT model on the ImageNet validation set with a single GPU, run:

python infer.py --data_path /path/to/ILSVRC2012/ --model model_name \
    --model_path /path/to/model --base_rate 0.7
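To evaluate several checkpoints in a row, the command above can be assembled programmatically. The sketch below uses only the flags shown in that command. `build_eval_command` is a hypothetical helper and the checkpoint file names are placeholders, not files shipped with the repository.

```python
import shlex


def build_eval_command(data_path, model, model_path, base_rate):
    """Assemble the infer.py invocation shown above as an argument list."""
    return [
        "python", "infer.py",
        "--data_path", data_path,
        "--model", model,
        "--model_path", model_path,
        "--base_rate", str(base_rate),
    ]


if __name__ == "__main__":
    # (model name, checkpoint path, keeping ratio); paths are placeholders.
    runs = [
        ("deit-s", "/path/to/deit-s_0.7.pth", 0.7),
        ("lvvit-s", "/path/to/lvvit-s_0.5.pth", 0.5),
    ]
    for model, ckpt, rate in runs:
        cmd = build_eval_command("/path/to/ILSVRC2012/", model, ckpt, rate)
        # Print only; pass cmd to subprocess.run(cmd, check=True) to execute.
        print(shlex.join(cmd))
```

The `--base_rate` value should match the keeping ratio the checkpoint was trained with, as listed in the rho column of the Model Zoo tables.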

Training

To train Dynamic Spatial Sparsification models on ImageNet, run:

(You can train models with different keeping ratios by adjusting base_rate.)

DeiT-S

python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --output_dir logs/dynamicvit_deit-s --model deit-s --input_size 224 --batch_size 128 --data_path /path/to/ILSVRC2012/ --epochs 30 --base_rate 0.7 --lr 1e-3 --warmup_epochs 5

DeiT-B

python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --output_dir logs/dynamicvit_deit-b --model deit-b --input_size 224 --batch_size 128 --data_path /path/to/ILSVRC2012/ --epochs 30 --base_rate 0.7 --lr 1e-3 --warmup_epochs 5 --drop_path 0.2 --ratio_weight 5.0

LV-ViT-S

python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --output_dir logs/dynamicvit_lvvit-s --model lvvit-s --input_size 224 --batch_size 128 --data_path /path/to/ILSVRC2012/ --epochs 30 --base_rate 0.7 --lr 1e-3 --warmup_epochs 5

LV-ViT-M

python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --output_dir logs/dynamicvit_lvvit-m --model lvvit-m --input_size 224 --batch_size 128 --data_path /path/to/ILSVRC2012/ --epochs 30 --base_rate 0.7 --lr 1e-3 --warmup_epochs 5

DynamicViT can also achieve comparable performance with only 15 epochs of training (around 0.1% lower accuracy than with 30 epochs).

ConvNeXt-T

Train on 8 GPUs:

python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --output_dir logs/dynamic_conv-t --model convnext-t --input_size 224 --batch_size 128 --data_path /path/to/ILSVRC2012/ --epochs 120 --base_rate 0.7 --lr 4e-3 --drop_path 0.2 --update_freq 4 --lr_scale 0.2

Train on 4 8-GPU nodes:

python run_with_submitit.py --nodes 4 --ngpus 8 --output_dir logs/dynamic_conv-t --model convnext-t --input_size 224 --batch_size 128 --data_path /path/to/ILSVRC2012/ --epochs 120 --base_rate 0.7 --lr 4e-3 --drop_path 0.2 --update_freq 1 --lr_scale 0.2
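The two ConvNeXt-T commands above differ only in the number of GPUs and in `--update_freq`. A short calculation shows why, assuming that `--batch_size` is the per-GPU batch size and that `--update_freq` is the number of gradient-accumulation steps per optimizer update. Both readings are assumptions about `main.py`, and `effective_batch_size` is a hypothetical helper.

```python
def effective_batch_size(nodes, gpus_per_node, batch_size, update_freq=1):
    """Number of samples contributing to one optimizer step.

    Assumes batch_size is per GPU and update_freq counts
    gradient-accumulation steps.
    """
    return nodes * gpus_per_node * batch_size * update_freq


if __name__ == "__main__":
    single_node = effective_batch_size(1, 8, 128, update_freq=4)
    multi_node = effective_batch_size(4, 8, 128, update_freq=1)
    print(single_node, multi_node)  # 4096 4096
```

Under these assumptions both setups use the same effective batch size, so the learning rate of 4e-3 can stay unchanged between them.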

ConvNeXt-S

Train on 8 GPUs:

python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --output_dir logs/dynamic_conv-s --model convnext-s --input_size 224 --batch_size 128 --data_path /path/to/ILSVRC2012/ --epochs 120 --base_rate 0.7 --lr 4e-3 --drop_path 0.2 --update_freq 4 --lr_scale 0.2

Train on 4 8-GPU nodes:

python run_with_submitit.py --nodes 4 --ngpus 8 --output_dir logs/dynamic_conv-s --model convnext-s --input_size 224 --batch_size 128 --data_path /path/to/ILSVRC2012/ --epochs 120 --base_rate 0.7 --lr 4e-3 --drop_path 0.2 --update_freq 1 --lr_scale 0.2

ConvNeXt-B

Train on 8 GPUs:

python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --output_dir logs/dynamic_conv-b --model convnext-b --input_size 224 --batch_size 128 --data_path /path/to/ILSVRC2012/ --epochs 120 --base_rate 0.7 --lr 4e-3 --drop_path 0.5 --update_freq 4 --lr_scale 0.2

Train on 4 8-GPU nodes:

python run_with_submitit.py --nodes 4 --ngpus 8 --output_dir logs/dynamic_conv-b --model convnext-b --input_size 224 --batch_size 128 --data_path /path/to/ILSVRC2012/ --epochs 120 --base_rate 0.7 --lr 4e-3 --drop_path 0.5 --update_freq 1 --lr_scale 0.2

Swin-T

Train on 8 GPUs:

python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --output_dir logs/dynamic_swin-t --model swin-t --input_size 224 --batch_size 128 --data_path /path/to/ILSVRC2012/ --epochs 120 --base_rate 0.7 --lr 4e-3 --drop_path 0.2 --update_freq 4 --lr_scale 0.2

Train on 4 8-GPU nodes:

python run_with_submitit.py --nodes 4 --ngpus 8 --output_dir logs/dynamic_swin-t --model swin-t --input_size 224 --batch_size 128 --data_path /path/to/ILSVRC2012/ --epochs 120 --base_rate 0.7 --lr 4e-3 --drop_path 0.2 --update_freq 1 --lr_scale 0.2

Swin-S

Train on 8 GPUs:

python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --output_dir logs/dynamic_swin-s --model swin-s --input_size 224 --batch_size 128 --data_path /path/to/ILSVRC2012/ --epochs 120 --base_rate 0.7 --lr 4e-3 --drop_path 0.2 --update_freq 4 --lr_scale 0.2

Train on 4 8-GPU nodes:

python run_with_submitit.py --nodes 4 --ngpus 8 --output_dir logs/dynamic_swin-s --model swin-s --input_size 224 --batch_size 128 --data_path /path/to/ILSVRC2012/ --epochs 120 --base_rate 0.7 --lr 4e-3 --drop_path 0.2 --update_freq 1 --lr_scale 0.2

Swin-B

Train on 8 GPUs:

python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --output_dir logs/dynamic_swin-b --model swin-b --input_size 224 --batch_size 128 --data_path /path/to/ILSVRC2012/ --epochs 120 --base_rate 0.7 --lr 4e-3 --drop_path 0.5 --update_freq 4 --lr_scale 0.2

Train on 4 8-GPU nodes:

python run_with_submitit.py --nodes 4 --ngpus 8 --output_dir logs/dynamic_swin-b --model swin-b --input_size 224 --batch_size 128 --data_path /path/to/ILSVRC2012/ --epochs 120 --base_rate 0.7 --lr 4e-3 --drop_path 0.5 --update_freq 1 --lr_scale 0.2

License

MIT License

Acknowledgements

Our code is based on pytorch-image-models, DeiT, LV-ViT, ConvNeXt and Swin-Transformer.

Citation

If you find our work useful in your research, please consider citing:

@inproceedings{rao2021dynamicvit,
  title={DynamicViT: Efficient Vision Transformers with Dynamic Token Sparsification},
  author={Rao, Yongming and Zhao, Wenliang and Liu, Benlin and Lu, Jiwen and Zhou, Jie and Hsieh, Cho-Jui},
  booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
  year={2021}
}

@article{rao2022dynamicvit,
  title={Dynamic Spatial Sparsification for Efficient Vision Transformers and Convolutional Neural Networks},
  author={Rao, Yongming and Liu, Zuyan and Zhao, Wenliang and Zhou, Jie and Lu, Jiwen},
  journal={arXiv preprint arXiv:2207.01580},
  year={2022}
}
