sdc17/UPop

[ICML 2023] UPop: Unified and Progressive Pruning for Compressing Vision-Language Transformers.

License

BSD-3-Clause license

UPop: Unified and Progressive Pruning for Compressing Vision-Language Transformers

🧐 A Quick Look

What is it: UPop is the first structured pruning framework for vision-language Transformers. It enables effective structured pruning on various multi-modal & uni-modal tasks (including Visual Reasoning, Image Captioning, Visual Question Answer, Image-Text Retrieval, Text-Image Retrieval, Image Classification and Image Segmentation), datasets (including NLVR2, COCO Caption, VQAv2, COCO, Flickr30K, ImageNet and ADE20K), and model architectures (including BLIP, CLIP, DeiT and Segmenter).
overview.mp4
What challenge does it tackle: The above video demonstrates that Unified Search, adopted by UPop, spares us the burden of repeated experiments (e.g., grid search) to find optimal compression ratios across different modalities and structures. Furthermore, Progressive Pruning, adopted by UPop, eliminates the weight gap between the searched model and the pruned subnet to be retrained, and therefore gains better convergence and performance, especially at high compression ratios.
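
The idea of a single shared ranking can be illustrated with a toy sketch. It is not UPop's implementation: the group names, the importance scores, and the per-group z-score normalisation are all assumptions made for illustration. The point is only that one global ranking yields a different pruning ratio per group without a separate search for each ratio.

```python
from statistics import mean, pstdev


def unified_prune(scores_by_group, p):
    """Pool all units, rank them once, and prune the lowest fraction p.

    scores_by_group maps a (hypothetical) group name to a list of
    importance scores, where a higher score means more important.
    Returns the fraction of units pruned in each group.
    """
    pooled = []
    for group, scores in scores_by_group.items():
        mu, sigma = mean(scores), pstdev(scores)
        for s in scores:
            # Assumption: normalise per group so score scales are comparable.
            z = (s - mu) / sigma if sigma > 0 else 0.0
            pooled.append((z, group))
    pooled.sort(key=lambda item: item[0])

    n_prune = int(len(pooled) * p)
    pruned = {group: 0 for group in scores_by_group}
    for _, group in pooled[:n_prune]:
        pruned[group] += 1
    return {g: pruned[g] / len(scores_by_group[g]) for g in scores_by_group}


# Made-up scores for two made-up groups.
ratios = unified_prune(
    {"vision_heads": [1.0, 1.0, 1.0, 0.0], "text_heads": [4.0, 3.0, 2.0, 1.0]},
    p=0.5,
)
print(ratios)  # {'vision_heads': 0.25, 'text_heads': 0.75}
```

Half of all units are pruned overall, but the split between the two groups falls out of the ranking rather than being chosen by hand.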

How about the performance: On multimodal tasks, for example, UPop can achieve 2x compression with only 1.2% and 2.0% accuracy loss on the VQAv2 dataset for Visual Question Answer and the NLVR2 dataset for Visual Reasoning, respectively. On unimodal tasks, for example, UPop can achieve 1.5x and 1.2x compression without any loss of accuracy on the ImageNet dataset for Image Classification and the ADE20K dataset for Image Segmentation, respectively. Some examples of vector-level structured granularity are as follows.

| Example (Task • Dataset • Model • Metric) | Performance | Parameters (M) | FLOPs (G) |
| --- | --- | --- | --- |
| Visual Reasoning • NLVR2 • BLIP • Acc | $83.1 \rightarrow 81.1_{\color{red}\downarrow 2.0}$ | $259.5 \rightarrow 150.2_{\color{ForestGreen}\downarrow 42\%}$ | $132.5 \rightarrow 89.4_{\color{ForestGreen}\downarrow 33\%}$ |
| Image Caption • Caption COCO • BLIP • SPICE | $23.8 \rightarrow 23.3_{\color{red}\downarrow 0.5}$ | $224.0 \rightarrow 127.1_{\color{ForestGreen}\downarrow 43\%}$ | $65.7 \rightarrow 39.8_{\color{ForestGreen}\downarrow 39\%}$ |
| Visual Question Answer • VQAv2 • BLIP • Acc | $77.5 \rightarrow 76.3_{\color{red}\downarrow 1.2}$ | $361.6 \rightarrow 211.3_{\color{ForestGreen}\downarrow 42\%}$ | $186.1 \rightarrow 109.4_{\color{ForestGreen}\downarrow 41\%}$ |
| Image-Text Retrieval • COCO • BLIP • R@1 | $81.9 \rightarrow 77.4_{\color{red}\downarrow 4.5}$ | $447.6 \rightarrow 248.9_{\color{ForestGreen}\downarrow 44\%}$ | $153.2\rightarrow 88.3_{\color{ForestGreen}\downarrow 42\%}$ |
| Image-Text Retrieval • COCO • CLIP • R@1 | $71.5 \rightarrow 70.8_{\color{red}\downarrow 0.7}$ | $856.0 \rightarrow 473.7_{\color{ForestGreen}\downarrow 45\%}$ | $395.7\rightarrow 196.3_{\color{ForestGreen}\downarrow 50\%}$ |
| Text-Image Retrieval • COCO • BLIP • R@1 | $64.3\rightarrow 59.8_{\color{red}\downarrow 4.5}$ | $447.6 \rightarrow 248.9_{\color{ForestGreen}\downarrow 44\%}$ | $153.2\rightarrow 88.3_{\color{ForestGreen}\downarrow 42\%}$ |
| Text-Image Retrieval • COCO • CLIP • R@1 | $56.8\rightarrow 53.1_{\color{red}\downarrow 3.7}$ | $856.0 \rightarrow 473.7_{\color{ForestGreen}\downarrow 45\%}$ | $395.7\rightarrow 196.3_{\color{ForestGreen}\downarrow 50\%}$ |
| Image-Text Retrieval • Flickr30K • BLIP • R@1 | $96.8\rightarrow 92.2_{\color{red}\downarrow 4.4}$ | $447.6\rightarrow 250.5_{\color{ForestGreen}\downarrow 44\%}$ | $153.2\rightarrow 91.0_{\color{ForestGreen}\downarrow 41\%}$ |
| Image-Text Retrieval • Flickr30K • CLIP • R@1 | $96.8\rightarrow 93.2_{\color{red}\downarrow 3.6}$ | $856.0\rightarrow 474.3_{\color{ForestGreen}\downarrow 45\%}$ | $395.7 \rightarrow 201.1_{\color{ForestGreen}\downarrow 49\%}$ |
| Text-Image Retrieval • Flickr30K • BLIP • R@1 | $86.9 \rightarrow 82.0_{\color{red}\downarrow 4.9}$ | $447.6\rightarrow 250.5_{\color{ForestGreen}\downarrow 44\%}$ | $153.2\rightarrow 91.0_{\color{ForestGreen}\downarrow 41\%}$ |
| Text-Image Retrieval • Flickr30K • CLIP • R@1 | $86.6\rightarrow 80.5_{\color{red}\downarrow 6.1}$ | $856.0\rightarrow 474.3_{\color{ForestGreen}\downarrow 45\%}$ | $395.7 \rightarrow 201.1_{\color{ForestGreen}\downarrow 49\%}$ |
| Classification • ImageNet • DeiT • Acc@1 | $79.9\rightarrow 80.2_{\color{ForestGreen}\uparrow 0.3}$ | $22.0 \rightarrow 15.7_{\color{ForestGreen}\downarrow 29\%}$ | $4.6 \rightarrow 3.2_{\color{ForestGreen}\downarrow 30\%}$ |
| Classification • ImageNet • DeiT • Acc@5 | $95.0 \rightarrow 95.1_{\color{ForestGreen}\uparrow 0.1}$ | $22.0 \rightarrow 15.7_{\color{ForestGreen}\downarrow 29\%}$ | $4.6 \rightarrow 3.2_{\color{ForestGreen}\downarrow 30\%}$ |
| Segmentation • ADE20K • Segmenter • $\text{mIoU}^s$ | $45.3\rightarrow 45.3_{\color{ForestGreen}\uparrow 0.0}$ | $26.4 \rightarrow 21.5_{\color{ForestGreen}\downarrow 19\%}$ | $38.6 \rightarrow 30.4_{\color{ForestGreen}\downarrow 21\%}$ |
| Segmentation • ADE20K • Segmenter • $\text{mIoU}^m$ | $46.9 \rightarrow 47.1_{\color{ForestGreen}\uparrow 0.2}$ | $26.4 \rightarrow 21.5_{\color{ForestGreen}\downarrow 19\%}$ | $38.6 \rightarrow 30.4_{\color{ForestGreen}\downarrow 21\%}$ |
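
The percentage subscripts in the Parameters and FLOPs columns are relative reductions of the value before compression. As a small worked example (the helper name is ours, not part of the repository), the NLVR2 row can be reproduced as follows:

```python
def reduction_percent(before, after):
    """Relative reduction from `before` to `after`, as a whole percentage."""
    return round(100 * (before - after) / before)


# Values taken from the Visual Reasoning • NLVR2 • BLIP row of the table.
params_drop = reduction_percent(259.5, 150.2)
flops_drop = reduction_percent(132.5, 89.4)
print(params_drop, flops_drop)  # 42 33
```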

🥳 What's New

(Jun 2023) We worked on a new project, CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers, which effectively reduces computational cost to accelerate vision-language Transformers. [Paper] [Code]
(Apr 2023) Our work UPop: Unified and Progressive Pruning for Compressing Vision-Language Transformers was accepted by ICML 2023.

🏃 Installation

The code is tested on Pytorch==1.11.0, cuda==11.3.1, and python==3.8.13. The dependencies can be installed by:

conda env create -f environment.yml
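
After creating the environment, a quick check against the tested versions can catch a mismatched setup early. The sketch below is a convenience of ours, not part of the repository; it compares only the major and minor version numbers and treats a missing `torch` package as a mismatch.

```python
import sys
from importlib import metadata

# Versions the README reports as tested.
TESTED = {"python": "3.8.13", "torch": "1.11.0"}


def installed_versions():
    """Collect the running Python version and the installed torch version."""
    found = {"python": ".".join(str(part) for part in sys.version_info[:3])}
    try:
        found["torch"] = metadata.version("torch")
    except metadata.PackageNotFoundError:
        found["torch"] = None
    return found


def mismatches(found, tested=TESTED):
    """Return {name: (found, tested)} where major.minor versions differ."""
    result = {}
    for name, want in tested.items():
        have = found.get(name)
        if have is None or have.split(".")[:2] != want.split(".")[:2]:
            result[name] = (have, want)
    return result


print(mismatches(installed_versions()))
```

An empty dictionary means the environment matches the tested versions at the major.minor level.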

🚀 Visual Reasoning on the NLVR2 Dataset

Dataset & Annotation
Download the NLVR2 dataset, unzip it under the datasets folder, and modify image_root in the config accordingly. Download the all-in-one annotations (including annotations for Visual Reasoning, Image Caption, VQA, Image-Text Retrieval, and Text-Image Retrieval tasks) from Google Drive or Baidu Drive, unzip them under the annotation folder, and modify annotation in the config accordingly. See Expected Structures for the expected folder structure.

Evaluation

Download the compressed checkpoints from the table below, put them under the output folder, and modify the --pretrained option of the scripts accordingly. For example, to evaluate a 2x compressed model:

python -m torch.distributed.run --nproc_per_node=8 compress_nlvr.py --evaluate \
--pretrained output/nlvr_nlvr2_compression_2x/model_base_nlvr_nlvr2_2x_compressed.pth \
--config ./configs/nlvr.yaml \
--output_dir output/nlvr_nlvr2_compression_2x

Compression

Download the uncompressed model from the table below, put it under the pretrained folder, and modify pretrained in the config accordingly. For example, to conduct a 2x compression:

python -m torch.distributed.run --nproc_per_node=8 compress_nlvr.py --p 0.5 --epoch 15 \
--pretrained pretrained/model_base_nlvr.pth \
--config ./configs/nlvr.yaml \
--output_dir output/nlvr_nlvr2_compression_2x
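
When several runs need to be launched, the command can be assembled programmatically. The helper below is a hypothetical convenience, not part of the repository. It uses only the flags shown in this README and falls back to a plain `python script.py` invocation when a single process is requested.

```python
import shlex


def build_command(script, nproc=8, **options):
    """Return an argv list for a compression or evaluation run.

    Keyword names are turned into flags verbatim (output_dir -> --output_dir).
    A value of True adds a bare flag; False or None skips the option.
    """
    if nproc > 1:
        argv = ["python", "-m", "torch.distributed.run",
                f"--nproc_per_node={nproc}", script]
    else:
        argv = ["python", script]
    for name, value in options.items():
        flag = "--" + name
        if value is True:
            argv.append(flag)
        elif value is not False and value is not None:
            argv += [flag, str(value)]
    return argv


argv = build_command(
    "compress_nlvr.py",
    p=0.5,
    epoch=15,
    pretrained="pretrained/model_base_nlvr.pth",
    config="./configs/nlvr.yaml",
    output_dir="output/nlvr_nlvr2_compression_2x",
)
print(shlex.join(argv))
```

The resulting list can be passed to `subprocess.run(argv, check=True)` from the repository root.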

Download

| Reduction | Uncompressed Model | Compression Script | Training Log | Compressed Checkpoint | Evaluation Script |
| --- | --- | --- | --- | --- | --- |
| 2x | Google/Baidu | Link | Google/Baidu | Google/Baidu | Link |
| 3x | Google/Baidu | Link | Google/Baidu | Google/Baidu | Link |
| 4x | Google/Baidu | Link | Google/Baidu | Google/Baidu | Link |
| 5x | Google/Baidu | Link | Google/Baidu | Google/Baidu | Link |
| 10x | Google/Baidu | Link | Google/Baidu | Google/Baidu | Link |
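
This README shows --p 0.5 only for the 2x case. If --p is read as the fraction to be pruned, a reduction factor x would correspond to p = 1 - 1/x. That reading is an assumption on our part; the linked compression scripts remain the authority for the exact values used at 3x, 4x, 5x and 10x.

```python
def p_for_reduction(factor):
    """Assumed mapping from a reduction factor (e.g. 2 for 2x) to --p."""
    if factor <= 1:
        raise ValueError("reduction factor must be greater than 1")
    return 1 - 1 / factor


for factor in (2, 3, 4, 5, 10):
    print(f"{factor}x -> --p {p_for_reduction(factor):.3f}")
```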

🚀 Image Caption on the COCO Caption Dataset

Dataset & Annotation
Download the COCO Caption dataset, unzip it under the datasets folder, and modify image_root in the config accordingly. Download the all-in-one annotations from Google Drive or Baidu Drive, unzip them under the annotation folder, and modify annotation in the config accordingly. See Expected Structures for the expected folder structure.

Evaluation

Download the compressed checkpoints from the table below, put them under the output folder, and modify the --pretrained option of the scripts accordingly. For example, to evaluate a 2x compressed model:

python -m torch.distributed.run --nproc_per_node=8 compress_caption.py --evaluate \
--pretrained output/caption_coco_compression_2x/model_base_caption_capfilt_large_coco_2x_compressed.pth \
--config ./configs/caption_coco.yaml \
--output_dir output/caption_coco_compression_2x

Compression

Download the uncompressed model from the table below, put it under the pretrained folder, and modify pretrained in the config accordingly. For example, to conduct a 2x compression:

python -m torch.distributed.run --nproc_per_node=8 compress_caption.py --p 0.5 --epoch 5 \
--pretrained pretrained/model_base_caption_capfilt_large.pth \
--config ./configs/caption_coco.yaml \
--output_dir output/caption_coco_compression_2x

Download

| Reduction | Uncompressed Model | Compression Script | Training Log | Compressed Checkpoint | Evaluation Script |
| --- | --- | --- | --- | --- | --- |
| 2x | Google/Baidu | Link | Google/Baidu | Google/Baidu | Link |
| 4x | Google/Baidu | Link | Google/Baidu | Google/Baidu | Link |

🚀 Visual Question Answer on the VQAv2 Dataset

Dataset & Annotation
Download the VQAv2 dataset and Visual Genome dataset, unzip them under the datasets folder, and modify image_root in the config accordingly. Download the all-in-one annotations from Google Drive or Baidu Drive, unzip them under the annotation folder, and modify annotation in the config accordingly. See Expected Structures for the expected folder structure.

Evaluation

Download the compressed checkpoints from the table below, put them under the output folder, and modify the --pretrained option of the scripts accordingly.

[!Note]
The scripts generate the answers file vqa_result.json, which should be submitted to the official server to obtain evaluation results.

For example, to evaluate a 2x compressed model:
```
python -m torch.distributed.run --nproc_per_node=8 compress_vqa.py --evaluate \
--pretrained output/vqa_vqa2_compression_2x/model_base_vqa_capfilt_large_vqa2_2x_compressed.pth \
--config ./configs/vqa.yaml \
--output_dir output/vqa_vqa2_compression_2x
```
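
Before submitting vqa_result.json, it can be worth checking that the file is structurally sound. The sketch below assumes the file is a JSON list of records with `question_id` and `answer` keys. That layout is an assumption to verify against the server's submission instructions, and the demo file is made up.

```python
import json
import os
import tempfile


def check_vqa_results(path):
    """Return the number of answer records after basic structural checks."""
    with open(path, encoding="utf-8") as f:
        results = json.load(f)
    if not isinstance(results, list):
        raise ValueError("expected a JSON list of answer records")
    seen = set()
    for i, entry in enumerate(results):
        # Assumed record layout: {"question_id": ..., "answer": ...}
        if not isinstance(entry, dict) or not {"question_id", "answer"} <= set(entry):
            raise ValueError(f"record {i} lacks question_id or answer")
        if entry["question_id"] in seen:
            raise ValueError(f"duplicate question_id {entry['question_id']}")
        seen.add(entry["question_id"])
    return len(results)


# Demo on a tiny made-up file (not real model output).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "vqa_result.json")
    with open(demo_path, "w", encoding="utf-8") as f:
        json.dump([{"question_id": 1, "answer": "yes"},
                   {"question_id": 2, "answer": "2"}], f)
    n_checked = check_vqa_results(demo_path)
print(n_checked)  # 2
```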

Compression

Download the uncompressed model from the table below, put it under the pretrained folder, and modify pretrained in the config accordingly. For example, to conduct a 2x compression:

python -m torch.distributed.run --nproc_per_node=8 compress_vqa.py --p 0.5 --epoch 10 \
--pretrained pretrained/model_base_vqa_capfilt_large.pth \
--config ./configs/vqa.yaml \
--output_dir output/vqa_vqa2_compression_2x

Download

| Reduction | Uncompressed Model | Compression Script | Training Log | Compressed Checkpoint | Evaluation Script |
| --- | --- | --- | --- | --- | --- |
| 2x | Google/Baidu | Link | Google/Baidu | Google/Baidu | Link |
| 4x | Google/Baidu | Link | Google/Baidu | Google/Baidu | Link |

🚀 Image-Text and Text-Image Retrieval on the COCO Dataset

Dataset & Annotation
Download the COCO dataset, unzip it under the datasets folder, and modify image_root in the config accordingly. Download the all-in-one annotations from Google Drive or Baidu Drive, unzip them under the annotation folder, and modify annotation in the config accordingly. See Expected Structures for the expected folder structure.

Evaluation

Download the compressed checkpoints from the table below, put them under the output folder, and modify the --pretrained option of the scripts accordingly. For example, to evaluate a 2x compressed model:

python -m torch.distributed.run --nproc_per_node=8 compress_retrieval.py --evaluate \
--pretrained output/retrieval_coco_compression_2x/model_base_retrieval_coco_2x_compressed.pth \
--config ./configs/retrieval_coco.yaml \
--output_dir output/retrieval_coco_compression_2x

Compression

Download the uncompressed model from the table below, put it under the pretrained folder, and modify pretrained in the config accordingly. For example, to conduct a 2x compression:

python -m torch.distributed.run --nproc_per_node=8 compress_retrieval.py --p 0.5 --epoch 6 \
--pretrained pretrained/model_base_retrieval_coco.pth \
--config ./configs/retrieval_coco.yaml \
--output_dir output/retrieval_coco_compression_2x

Download

| Reduction | Uncompressed Model | Compression Script | Training Log | Compressed Checkpoint | Evaluation Script |
| --- | --- | --- | --- | --- | --- |
| 2x | Google/Baidu | Link | Google/Baidu | Google/Baidu | Link |
| 4x | Google/Baidu | Link | Google/Baidu | Google/Baidu | Link |

🚀 Image-Text and Text-Image Retrieval on the Flickr30K Dataset

Dataset & Annotation
Download the Flickr30K dataset, unzip it under the datasets folder, and modify image_root in the config accordingly. Download the all-in-one annotations from Google Drive or Baidu Drive, unzip them under the annotation folder, and modify annotation in the config accordingly. See Expected Structures for the expected folder structure.

Evaluation

Download the compressed checkpoints from the table below, put them under the output folder, and modify the --pretrained option of the scripts accordingly. For example, to evaluate a 2x compressed model:

python -m torch.distributed.run --nproc_per_node=8 compress_retrieval_flickr.py --evaluate \
--pretrained output/retrieval_flickr_compression_2x/model_base_retrieval_flickr_2x_compressed.pth \
--config ./configs/retrieval_flickr.yaml \
--output_dir output/retrieval_flickr_compression_2x

Compression

Download the uncompressed model from the table below, put it under the pretrained folder, and modify pretrained in the config accordingly. For example, to conduct a 2x compression:

python -m torch.distributed.run --nproc_per_node=8 compress_retrieval_flickr.py --p 0.5 --epoch 12 \
--pretrained pretrained/model_base_retrieval_flickr.pth \
--config ./configs/retrieval_flickr.yaml \
--output_dir output/retrieval_flickr_compression_2x

Download

| Reduction | Uncompressed Model | Compression Script | Training Log | Compressed Checkpoint | Evaluation Script |
| --- | --- | --- | --- | --- | --- |
| 2x | Google/Baidu | Link | Google/Baidu | Google/Baidu | Link |
| 4x | Google/Baidu | Link | Google/Baidu | Google/Baidu | Link |

🚀 Image-Text and Text-Image Retrieval on the COCO Dataset with CLIP

Dataset & Annotation
Download the COCO dataset, unzip it under the datasets folder, and modify image_root in the config accordingly. Download the all-in-one annotations from Google Drive or Baidu Drive, unzip them under the annotation folder, and modify annotation in the config accordingly. See Expected Structures for the expected folder structure.

Evaluation

Download the compressed checkpoints from the table below, put them under the output folder, and modify the --pretrained option of the scripts accordingly. For example, to evaluate a 2x compressed model:

python -m torch.distributed.run --nproc_per_node=8 compress_retrieval_clip.py --evaluate \
--pretrained output/retrieval_coco_clip_compression_2x/clip_large_retrieval_coco_2x_compressed.pth \
--config ./configs/retrieval_coco_clip.yaml \
--output_dir output/retrieval_coco_clip_compression_2x

Compression

Download the uncompressed model from the table below, put it under the pretrained folder, and modify pretrained in the config accordingly. For example, to conduct a 2x compression:

python -m torch.distributed.run --nproc_per_node=8 compress_retrieval_clip.py --p 0.5 --epoch 6 \
--pretrained pretrained/clip_large_retrieval_coco.pth \
--config ./configs/retrieval_coco_clip.yaml \
--output_dir output/retrieval_coco_clip_compression_2x

Download

| Reduction | Uncompressed Model | Compression Script | Training Log | Compressed Checkpoint | Evaluation Script |
| --- | --- | --- | --- | --- | --- |
| 2x | Google/Baidu | Link | Google/Baidu | Google/Baidu | Link |
| 4x | Google/Baidu | Link | Google/Baidu | Google/Baidu | Link |

🚀 Image-Text and Text-Image Retrieval on the Flickr30K Dataset with CLIP

Dataset & Annotation
Download the Flickr30K dataset, unzip it under the datasets folder, and modify image_root in the config accordingly. Download the all-in-one annotations from Google Drive or Baidu Drive, unzip them under the annotation folder, and modify annotation in the config accordingly. See Expected Structures for the expected folder structure.

Evaluation

Download the compressed checkpoints from the table below, put them under the output folder, and modify the --pretrained option of the scripts accordingly. For example, to evaluate a 2x compressed model:

python -m torch.distributed.run --nproc_per_node=8 compress_retrieval_clip.py --evaluate \
--pretrained output/retrieval_flickr_clip_compression_2x/clip_large_retrieval_flickr_2x_compressed.pth \
--config ./configs/retrieval_flickr_clip.yaml \
--output_dir output/retrieval_flickr_clip_compression_2x

Compression

Download the uncompressed model from the table below, put it under the pretrained folder, and modify pretrained in the config accordingly. For example, to conduct a 2x compression:

python -m torch.distributed.run --nproc_per_node=8 compress_retrieval_clip.py --p 0.5 --epoch 12 \
--pretrained pretrained/clip_large_retrieval_flickr.pth \
--config ./configs/retrieval_flickr_clip.yaml \
--output_dir output/retrieval_flickr_clip_compression_2x

Download

| Reduction | Uncompressed Model | Compression Script | Training Log | Compressed Checkpoint | Evaluation Script |
| --- | --- | --- | --- | --- | --- |
| 2x | Google/Baidu | Link | Google/Baidu | Google/Baidu | Link |
| 4x | Google/Baidu | Link | Google/Baidu | Google/Baidu | Link |

🚀 Image Classification on the ImageNet Dataset

Dataset & Annotation
Download the ImageNet dataset, unzip it under the datasets folder, and modify the --data-path option in the compression and evaluation scripts accordingly. See Expected Structures for the expected folder structure.

Evaluation

Download the compressed checkpoints from the table below, put them under the output folder, and modify the --resume option of the scripts accordingly. For example, to evaluate a 50% compressed model:

python -m torch.distributed.run --nproc_per_node=8 compress_deit.py --eval --dist-eval \
--data-path datasets/vision/imagenet \
--model deit_small_patch16_224 \
--resume output/train_deit_small_patch16_224_60s_300r_050x/deit_small_patch16_224_050x_compressed.pth

Compression

Download the uncompressed model from the table below, put it under the pretrained folder, and modify the --finetune option of the scripts accordingly. For example, to conduct a 50% compression:

python -m torch.distributed.run --nproc_per_node=8 compress_deit.py \
--data-path datasets/vision/imagenet \
--finetune pretrained/deit_small_patch16_224-cd65a155.pth \
--model deit_small_patch16_224 \
--epochs-search 60 \
--epochs 300 \
--batch-size 512 \
--lr-search 1e-4 \
--lr 1e-4 \
--warmup-epochs 0 \
--p 0.5 \
--interval 800 \
--output_dir output/train_deit_small_patch16_224_60s_300r_050x
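
The example output directories appear to encode the run settings: `60s` for --epochs-search, `300r` for --epochs, and `050x` for --p 0.5. The helper below reproduces that pattern. The pattern is inferred from the two directory names shown in this README (DeiT and Segmenter), so treat it as a naming suggestion rather than something the scripts require.

```python
def run_name(prefix, epochs_search, epochs, p):
    """Build a directory name such as train_deit_small_patch16_224_60s_300r_050x."""
    return f"{prefix}_{epochs_search}s_{epochs}r_{round(p * 100):03d}x"


deit_dir = run_name("train_deit_small_patch16_224", 60, 300, 0.5)
seg_dir = run_name("seg_small_mask", 16, 64, 0.30)
print(deit_dir)
print(seg_dir)
```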

Download

| Reduction | Uncompressed Model | Compression Script | Training Log | Compressed Checkpoint | Evaluation Script |
| --- | --- | --- | --- | --- | --- |
| 10% | Google/Baidu | Link | Google/Baidu | Google/Baidu | Link |
| 20% | Google/Baidu | Link | Google/Baidu | Google/Baidu | Link |
| 30% | Google/Baidu | Link | Google/Baidu | Google/Baidu | Link |
| 40% | Google/Baidu | Link | Google/Baidu | Google/Baidu | Link |
| 50% | Google/Baidu | Link | Google/Baidu | Google/Baidu | Link |

🚀 Image Segmentation on the ADE20K Dataset

Dataset & Annotation
Download the ADE20K dataset, unzip it under the datasets folder, and modify the --dataset option in the compression and evaluation scripts accordingly. See Expected Structures for the expected folder structure.

Evaluation

Download the compressed checkpoints from the table below, put them under the output folder, modify the path option of the scripts accordingly, and export the datasets folder as the environment variable DATASET. For example, to evaluate a 30% compressed model:

export DATASET=datasets/vision

# for single-scale testing
python -m torch.distributed.run --nproc_per_node=4 segm/eval/miou.py \
output/seg_small_mask_16s_64r_030x/seg_small_mask_030x_compressed.pth ade20k --singlescale

# for multi-scale testing
python -m torch.distributed.run --nproc_per_node=4 segm/eval/miou.py \
output/seg_small_mask_16s_64r_030x/seg_small_mask_030x_compressed.pth ade20k --multiscale
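
When the evaluation is launched from Python rather than a shell, the DATASET variable has to be passed through the child process environment. The sketch below shows one way to do that. The function name is hypothetical, and the command mirrors the shell example above.

```python
import os


def segm_eval_job(checkpoint, dataset_root, multiscale=False, nproc=4):
    """Return (argv, env) for a Segmenter mIoU evaluation run."""
    # Copy the current environment and add DATASET, as `export` would.
    env = dict(os.environ, DATASET=dataset_root)
    argv = [
        "python", "-m", "torch.distributed.run", f"--nproc_per_node={nproc}",
        "segm/eval/miou.py", checkpoint, "ade20k",
        "--multiscale" if multiscale else "--singlescale",
    ]
    return argv, env


argv, env = segm_eval_job(
    "output/seg_small_mask_16s_64r_030x/seg_small_mask_030x_compressed.pth",
    "datasets/vision",
)
print(argv[-1], env["DATASET"])
```

The pair can then be handed to `subprocess.run(argv, env=env, check=True)`.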

Compression

Download the uncompressed model from the table below, put it under the pretrained folder, modify the --pretrained option of the scripts accordingly, and export the datasets folder as the environment variable DATASET. For example, to conduct a 30% compression:

export DATASET=datasets/vision
python -m torch.distributed.run --nproc_per_node=4 segm/train.py --dataset ade20k \
--backbone vit_small_patch16_384 --decoder mask_transformer --no-resume \
--pretrained pretrained/seg_small_mask.pth \
--epochs-search 16 \
--epochs 64 \
--batch-size 64 \
--lr-search 4e-3 \
-lr 4e-3 \
--p 0.30 \
--interval 200 \
--log-dir output/seg_small_mask_16s_64r_030x

Download

| Reduction | Uncompressed Model | Compression Script | Training Log | Compressed Checkpoint | Evaluation Script |
| --- | --- | --- | --- | --- | --- |
| 10% | Google/Baidu | Link | Google/Baidu | Google/Baidu | Link |
| 15% | Google/Baidu | Link | Google/Baidu | Google/Baidu | Link |
| 20% | Google/Baidu | Link | Google/Baidu | Google/Baidu | Link |
| 30% | Google/Baidu | Link | Google/Baidu | Google/Baidu | Link |

📑 Other Issues

1. Evaluation with a single GPU

For BLIP and CLIP models, for example, to evaluate the 2x compressed BLIP model on the NLVR2 dataset:

python compress_nlvr.py --evaluate \
--pretrained output/nlvr_nlvr2_compression_2x/model_base_nlvr_nlvr2_2x_compressed.pth \
--config ./configs/nlvr.yaml \
--output_dir output/nlvr_nlvr2_compression_2x

For DeiT, for example, to evaluate the 50% compressed model on the ImageNet dataset:

[!Note]
The option --dist-eval is omitted here.

python compress_deit.py --eval \
--data-path datasets/vision/imagenet \
--model deit_small_patch16_224 \
--resume output/train_deit_small_patch16_224_60s_300r_050x/deit_small_patch16_224_050x_compressed.pth

For Segmenter, for example, to evaluate the 30% compressed model on the ADE20K dataset:

export DATASET=datasets/vision

# for single-scale testing
python segm/eval/miou.py \
output/seg_small_mask_16s_64r_030x/seg_small_mask_030x_compressed.pth ade20k --singlescale

# for multi-scale testing
python segm/eval/miou.py \
output/seg_small_mask_16s_64r_030x/seg_small_mask_030x_compressed.pth ade20k --multiscale

2. Compress with a single GPU

For BLIP and CLIP models, for example, to compress the BLIP model to half on the NLVR2 dataset:

python compress_nlvr.py --p 0.5 --epoch 15 \
--pretrained pretrained/model_base_nlvr.pth \
--config ./configs/nlvr.yaml \
--output_dir output/nlvr_nlvr2_compression_2x

For DeiT, for example, to conduct a 50% compression on the ImageNet dataset:

python compress_deit.py \
--data-path datasets/vision/imagenet \
--finetune pretrained/deit_small_patch16_224-cd65a155.pth \
--model deit_small_patch16_224 \
--epochs-search 60 \
--epochs 300 \
--batch-size 512 \
--lr-search 1e-4 \
--lr 1e-4 \
--warmup-epochs 0 \
--p 0.5 \
--interval 800 \
--output_dir output/train_deit_small_patch16_224_60s_300r_050x

For Segmenter, for example, to conduct a 30% compression on the ADE20K dataset:

export DATASET=datasets/vision
python segm/train.py --dataset ade20k \
--backbone vit_small_patch16_384 --decoder mask_transformer --no-resume \
--pretrained pretrained/seg_small_mask.pth \
--epochs-search 16 \
--epochs 64 \
--batch-size 64 \
--lr-search 4e-3 \
-lr 4e-3 \
--p 0.30 \
--interval 200 \
--log-dir output/seg_small_mask_16s_64r_030x

3. Out of memory during the evaluation

For BLIP and CLIP models, change batch_size_test (or batch_size for the Image Caption task) in the corresponding config file to a smaller number.
For DeiT, set the --batch-size option of the scripts to a smaller number.
For Segmenter, the default evaluation batch size is 1. In single-scale testing, peak GPU memory usage on a single card is below 5 GB, which should fit on most GPUs. In multi-scale testing, peak GPU memory usage on a single card is about 13 GB, which may require a GPU with more memory.
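
For the config-based models, the batch size can also be lowered with a short script instead of by hand. The sketch below rewrites a `key: integer` line in the config text. It assumes the key sits on its own line in that simple form, and the sample text and values are made up rather than copied from the repository's configs.

```python
import re


def set_batch_size(yaml_text, key, value):
    """Replace the integer after `key:` and return the new config text."""
    pattern = re.compile(
        rf"^([ \t]*{re.escape(key)}[ \t]*:[ \t]*)\d+", flags=re.MULTILINE
    )
    new_text, count = pattern.subn(lambda m: f"{m.group(1)}{value}", yaml_text)
    if count == 0:
        raise KeyError(f"{key} not found in config text")
    return new_text


# Made-up sample; real config files contain more keys.
sample = "batch_size_train: 16\nbatch_size_test: 64\n"
smaller = set_batch_size(sample, "batch_size_test", 16)
print(smaller)
```

Comments and all other keys are left untouched because only the matched number is replaced.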

4. Out of memory during the compression

For BLIP and CLIP models, change batch_size_train and batch_size_test (or batch_size for the Image Caption task) in the corresponding config file to a smaller number. In addition, the --amp option of the compression scripts enables mixed precision. For example, to compress the BLIP model to half on the NLVR2 dataset:
```
python -m torch.distributed.run --nproc_per_node=8 compress_nlvr.py --p 0.5 --epoch 15 --amp \
--pretrained pretrained/model_base_nlvr.pth \
--config ./configs/nlvr.yaml \
--output_dir output/nlvr_nlvr2_compression_2x
```
[!WARNING]
Using mixed precision may produce NaN gradients. Since UPop takes gradients as the metric for determining pruned positions, NaN gradients may disrupt this determination and degrade performance.

For DeiT and Segmenter, set the --batch-size option of the scripts to a smaller number. Mixed precision is currently not supported for them, as it frequently causes NaN gradients.
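
Why NaN values are harmful to a ranking-based selection can be seen in a toy example: every comparison with NaN is false, so a sort over scores that contain NaN has no reliable order. The guard below is our illustration of that point in plain Python, not code from UPop. In a PyTorch training loop, a comparable check would use `torch.isnan` on the gradients before they are used.

```python
import math


def lowest_k(scores, k):
    """Indices of the k smallest scores; refuses to rank if any score is NaN."""
    bad = [i for i, s in enumerate(scores) if math.isnan(s)]
    if bad:
        raise ValueError(f"NaN score at positions {bad}; ranking would be unreliable")
    return sorted(range(len(scores)), key=scores.__getitem__)[:k]


picked = lowest_k([0.3, 0.1, 0.2], k=2)
print(picked)  # [1, 2]

try:
    lowest_k([0.3, float("nan"), 0.2], k=2)
except ValueError as err:
    print(err)
```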

🌲 Expected Structures

├── annotation
│   ├── answer_list.json
│   ├── coco_gt
│   │   ├── coco_karpathy_test_gt.json
│   │   └── coco_karpathy_val_gt.json
│   ├── ...
├── clip
├── compress_caption.py
├── compress_deit.py
├── compress_nlvr.py
├── compress ...
├── configs
├── data
├── datasets
│   └── vision
│       ├── coco
│       ├── flickr
│       ├── NLVR2
│       ├── ...
├── deit
├── log
├── models
├── output
├── pretrained
│   ├── bert-base-uncased
│   ├── clip_large_retrieval_coco.pth
│   ├── clip_large_retrieval_flickr.pth
│   ├── ...
├── segm
├── transform
└── utils.py
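
A quick way to confirm that a checkout matches this layout is to test for the top-level folders that the instructions in this README ask you to create or fill. The list below is our selection from the tree above, and the demo runs against a temporary directory rather than a real checkout.

```python
import tempfile
from pathlib import Path

# Folders referenced by the dataset, evaluation and compression steps.
EXPECTED_DIRS = ["annotation", "configs", "datasets/vision", "output", "pretrained"]


def missing_dirs(root, expected=EXPECTED_DIRS):
    """Return the expected sub-folders that do not exist under root."""
    root = Path(root)
    return [d for d in expected if not (root / d).is_dir()]


# Demo: a temporary tree that has only some of the folders.
with tempfile.TemporaryDirectory() as tmp:
    for d in ("annotation", "configs", "datasets/vision"):
        (Path(tmp) / d).mkdir(parents=True)
    demo_missing = missing_dirs(tmp)
print(demo_missing)  # ['output', 'pretrained']
```

Calling `missing_dirs(".")` from the repository root reports what still needs to be created.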

💬 Acknowledgments

This code is built upon BLIP, CLIP, DeiT, Segmenter, and timm. Thanks for these awesome open-source projects!

✨ Citation

@InProceedings{pmlr-v202-shi23e,
  title = {{UP}op: Unified and Progressive Pruning for Compressing Vision-Language Transformers},
  author = {Shi, Dachuan and Tao, Chaofan and Jin, Ying and Yang, Zhendong and Yuan, Chun and Wang, Jiaqi},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages = {31292--31311},
  year = {2023},
  volume = {202},
  publisher = {PMLR}
}

About

dachuanshi.com/UPop-Project/
