Osilly/TokenExpansion

The official PyTorch implementation of Token Expansion (ToE).

[arXiv (CVPR 2024)]

Update

  • 2024.4.17: Added an example of applying ToE to YOLOS for object detection.
  • 2023.11.26: Released the code of ToE.

🔥 : We find that ToE is highly effective for PEFT methods. For example, ToE achieves 2x training speed and 30% GPU memory savings for the training of Visual Prompt Tuning (VPT), while improving accuracy by 2%-6% on the VTAB-1k benchmark. We will release the code soon. Stay tuned!

Pipeline

Pipeline of ToE.

The ''initialization-expansion-merging'' pipeline of the proposed ToE. We take the 1st training stage ($\delta=1$), the kept rate $r_1=2r_0=\frac{2}{3}$, and the repetition step $k=1$ as an example. ToE is only added after the first Transformer block to guide token selection and usage. During training, steps (1), (2), and (3) are performed in each iteration to reduce the number of tokens. First, seed tokens are selected in step (1) for token initialization. Then, the set of tokens is expanded in step (2) for token expansion. Finally, in step (3) for token merging, the unselected tokens (blue boxes) are merged into the selected ones (red boxes) with the closest feature distributions. During testing, ToE can be safely removed, yielding the same Transformer architecture as the original full-token Transformer.
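For intuition, below is a minimal, self-contained sketch of one plausible realization of the three steps. The seed-selection criterion (feature norm), the farthest-point expansion, and the cosine-similarity merging rule are illustrative assumptions, not the exact logic of token_select.py:

import torch
import torch.nn.functional as F


def toe_select(tokens, init_keep_rate=0.25, keep_rate=2 / 3):
    # Illustrative "initialization-expansion-merging" step.
    # tokens: (B, N, C) features after the first Transformer block.
    B, N, C = tokens.shape
    n_seed = max(1, int(N * init_keep_rate))  # step (1): seed-set size
    n_keep = max(n_seed, int(N * keep_rate))  # size after expansion

    feats = F.normalize(tokens, dim=-1)  # unit vectors for cosine similarity

    # (1) Token initialization: pick seed tokens. Feature norm is an
    # illustrative stand-in for the paper's seed-selection criterion.
    seed_idx = tokens.norm(dim=-1).topk(n_seed, dim=1).indices  # (B, n_seed)

    # (2) Token expansion: grow the selection with the tokens least similar
    # to it, keeping the selected set diverse.
    seed_feats = torch.gather(feats, 1, seed_idx.unsqueeze(-1).expand(-1, -1, C))
    sim_to_sel = (feats @ seed_feats.transpose(1, 2)).amax(dim=-1)  # (B, N)
    sim_to_sel.scatter_(1, seed_idx, float("inf"))  # never re-pick seeds
    extra_idx = (-sim_to_sel).topk(n_keep - n_seed, dim=1).indices
    keep_idx = torch.cat([seed_idx, extra_idx], dim=1)  # (B, n_keep)

    # (3) Token merging: average each unselected token into its most similar
    # kept token so its information is not simply discarded.
    kept_feats = torch.gather(feats, 1, keep_idx.unsqueeze(-1).expand(-1, -1, C))
    assign = (feats @ kept_feats.transpose(1, 2)).argmax(dim=-1)  # (B, N)
    dropped = torch.ones(B, N, device=tokens.device)
    dropped.scatter_(1, keep_idx, 0.0)  # 1 = token to be merged in
    merged = torch.gather(tokens, 1, keep_idx.unsqueeze(-1).expand(-1, -1, C)).clone()
    counts = torch.ones(B, n_keep, device=tokens.device)
    merged.scatter_add_(1, assign.unsqueeze(-1).expand(-1, -1, C),
                        tokens * dropped.unsqueeze(-1))
    counts.scatter_add_(1, assign, dropped)
    return merged / counts.unsqueeze(-1)

Because the module only subsamples and merges tokens during training, dropping it at test time leaves the architecture untouched.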

Requirements

torch>=1.12.0
torchvision>=0.13.0
timm==0.9.2

How to apply ToE

We provide the main code in token_select.py. It can be seamlessly integrated into the training of ViTs.

You can find examples of applying ToE to popular ViTs (e.g., DeiT in ToE/deit and LV-ViT in ToE/lvvit) and to existing efficient training frameworks (e.g., EfficientTrain in ToE/EfficientTrain).

Only small changes to the existing code are required; the changes we make to the original model code are wrapped between two # ---------------------# markers.

For example (ToE/deit/main.py):

model = create_model(
    args.model,
    pretrained=False,
    num_classes=args.nb_classes,
    drop_rate=args.drop,
    drop_path_rate=args.drop_path,
    drop_block_rate=None,
    img_size=args.input_size,
)
# ---------------------#
model.token_select = TokenSelect(
    expansion_step=args.expansion_step,
    keep_rate=args.keep_rate,
    initialization_keep_rate=args.initialization_keep_rate,
    expansion_multiple_stage=args.expansion_multiple_stage,
    distance=args.distance,
)
# ---------------------#
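For intuition, the following sketch shows where the attached module would sit in a DeiT-style forward pass, assuming a hypothetical call signature token_select(x). Per the pipeline above, ToE runs after the first block and only during training; the actual integration details live in token_select.py and the wrapped code in ToE/deit:

import torch

# Illustrative only: where token selection sits in a DeiT-style forward pass.
# `model.token_select` is the module attached in main.py above; its real call
# signature may differ (see token_select.py).
def forward_features(model, x):
    x = model.patch_embed(x)
    cls_token = model.cls_token.expand(x.shape[0], -1, -1)
    x = torch.cat((cls_token, x), dim=1)
    x = model.pos_drop(x + model.pos_embed)

    x = model.blocks[0](x)                 # first block runs on all tokens
    if model.training and model.token_select is not None:
        x = model.token_select(x)          # ToE reduces tokens during training only
    for blk in model.blocks[1:]:
        x = blk(x)                         # remaining blocks see fewer tokens
    return model.norm(x)                   # at test time ToE is skipped entirely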

Data preparation

ImageNet-1k

Download and extract the ImageNet images from http://image-net.org/. The directory structure should be:

ILSVRC2012/
├── train/
│   ├── n01440764
│   │   ├── n01440764_10026.JPEG
│   │   ├── n01440764_10027.JPEG
│   │   ├── ......
│   ├── ......
├── val/
│   ├── n01440764
│   │   ├── ILSVRC2012_val_00000293.JPEG
│   │   ├── ILSVRC2012_val_00002138.JPEG
│   │   ├── ......
│   ├── ......
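Since this is the standard ImageFolder layout, you can sanity-check it before launching training. A minimal sketch, with placeholder paths:

import torchvision.datasets as datasets

# Placeholders: point these at your extracted ILSVRC2012 directory.
train_set = datasets.ImageFolder("ILSVRC2012/train")
val_set = datasets.ImageFolder("ILSVRC2012/val")
print(len(train_set.classes), "classes;", len(train_set), "train /", len(val_set), "val images")
# Expect 1000 classes, 1,281,167 train and 50,000 val images.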

COCO

Download and extract the COCO 2017 train and val images with annotations from http://cocodataset.org. We expect the directory structure to be the following:

path/to/coco/
  annotations/  # annotation json files
  train2017/    # train images
  val2017/      # val images
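A quick sanity check of the annotations with pycocotools (a sketch; the path is a placeholder):

from pycocotools.coco import COCO

# Placeholder path: point at your extracted COCO 2017 directory.
coco = COCO("path/to/coco/annotations/instances_val2017.json")
print(len(coco.getImgIds()), "val images,", len(coco.getCatIds()), "categories")
# Expect 5000 val images and 80 categories for COCO 2017.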

Label data for LV-ViT

Follow the original LV-ViT repo to prepare the label data.

We provide the NFNet-F6 generated dense label maps on Google Drive and Baidu Yun (password: y6j2). As NFNet-F6 is trained purely on ImageNet data, no extra training data is involved.

Pretrained model preparation

Pretrained DeiT model for YOLOS

Follow the original YOLOS repo to prepare the pretrained models.

[FB] Pretrained model for YOLOS-T

[FB] Pretrained model for YOLOS-S

Training

Detailed training scripts are provided in the corresponding code paths (e.g., for DeiT in ToE/deit/run/). You should prepare the environments and datasets first.

We give a few simple training examples below:

DeiT-small

We train DeiT-small on four GPUs; the ImageNet-1K dataset is required.

cd ToE/deit
bash run/imagenet_small_run1.sh

run/imagenet_small_run1.sh:

result_dir=[your_result_path]
dataset_dir=[your_dataset_path]
device=0,1,2,3
master_port=6666

CUDA_VISIBLE_DEVICES=$device torchrun --nproc_per_node=4 --master_port=$master_port main.py \
    --patch-size 16 \
    --model deit_small_patch16_224 \
    --batch-size 256 \
    --data-path $dataset_dir \
    --output_dir $result_dir \
    --num_workers 8 \
    --seed 3407 \
    --expansion-step 0 100 200 \
    --keep-rate 0.5 0.75 1.0 \
    --initialization-keep-rate 0.25 \
    --expansion-multiple-stage 2
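Note how --expansion-step and --keep-rate pair up: epochs 0, 100, and 200 open training stages with keep rates 0.5, 0.75, and 1.0, so the final stage trains on full tokens. A minimal sketch of such a stage lookup (illustrative; the actual scheduling is implemented in token_select.py):

def current_keep_rate(epoch, expansion_step=(0, 100, 200), keep_rate=(0.5, 0.75, 1.0)):
    # Each entry in expansion_step opens a training stage; the paired
    # keep_rate applies until the next stage begins.
    rate = keep_rate[0]
    for start, r in zip(expansion_step, keep_rate):
        if epoch >= start:
            rate = r
    return rate

assert current_keep_rate(50) == 0.5    # stage 1: half the tokens
assert current_keep_rate(150) == 0.75  # stage 2: three quarters
assert current_keep_rate(250) == 1.0   # final stage: full tokens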

LV-ViT-S

We train LV-ViT-S on four GPUs; the ImageNet-1K dataset and the label data (see the original LV-ViT repo) are required.

cd ToE/lvvit
bash run/imagenet_small_run1.sh

run/imagenet_small_run1.sh:

#!/bin/bash
result_dir=[your_result_path]
dataset_dir=[your_dataset_path]
label_dir=[your_token_labeling_dataset_path]
device=0,1,2,3
master_port=6666
shift

CUDA_VISIBLE_DEVICES=$device python3 -m torch.distributed.launch --nproc_per_node=4 main.py "$@" $dataset_dir \
    --output $result_dir \
    --model lvvit_s \
    -b 256 \
    --img-size 224 \
    --drop-path 0.1 \
    --token-label \
    --token-label-data $label_dir \
    --token-label-size 14 \
    --model-ema \
    --apex-amp \
    --expansion-step 0 100 200 \
    --keep-rate 0.4 0.7 1.0 \
    --initialization-keep-rate 0.2 \
    --expansion-multiple-stage 2

EfficientTrain (DeiT-small)

We train EfficientTrain (DeiT-small) on eight GPUs; the ImageNet-1K dataset is required.

cd ToE/EfficientTrain
bash run/imagenet_small_run1.sh

run/imagenet_small_run1.sh:

result_dir=[your_result_path]
dataset_dir=[your_dataset_path]

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python ET_training.py \
    --data_path $dataset_dir \
    --output_dir $result_dir \
    --model deit_small_patch16_224 \
    --final_bs 256 --epochs 300 \
    --num_gpus 8 --num_workers 8

The ToE speedup configurations are provided in ToE/EfficientTrain/ET_training.py. For example:

"deit_tiny_patch16_224": " --use_amp true --clip_grad 5.0 \--expansion-step 0 100 200 --keep-rate 0.6 0.8 1.0 \--initialization-keep-rate 0.3 --expansion-multiple-stage 2 "

YOLOS-S

We train YOLOS-S on eight GPUs; the COCO dataset and the pretrained model (see the original YOLOS repo) are required.

cd ToE/YOLOS
bash run/coco_small_run1.sh

run/coco_small_run1.sh:

result_dir=[your_result_path]
dataset_dir=[your_dataset_path]
pretrain_path=[your_pretrain_model_path]
device=0,1,2,3,4,5,6,7
master_port=6666

CUDA_VISIBLE_DEVICES=$device python -m torch.distributed.launch --nproc_per_node=8 --master_port=$master_port --use_env main.py \
    --coco_path $dataset_dir \
    --batch_size 1 \
    --lr 2.5e-5 \
    --epochs 150 \
    --backbone_name small \
    --pre_trained $pretrain_path \
    --eval_size 800 \
    --init_pe_size 512 864 \
    --mid_pe_size 512 864 \
    --output_dir $result_dir \
    --num_workers 8 \
    --expansion-step 5 50 100 \
    --keep-rate 0.5 0.75 1.0 \
    --initialization-keep-rate 0.25 \
    --expansion-multiple-stage 2

Acknowledgements

This project is based on DeiT, LV-ViT, EfficientTrain, YOLOS, and timm. Thanks for their wonderful work.

Citation

If you find this work useful in your research, please consider citing:

@article{huang2024general,
  title={A General and Efficient Training for Transformer via Token Expansion},
  author={Huang, Wenxuan and Shen, Yunhang and Xie, Jiao and Zhang, Baochang and He, Gaoqi and Li, Ke and Sun, Xing and Lin, Shaohui},
  journal={CVPR},
  year={2024}
}
