S2-BNN: Bridging the Gap Between Self-Supervised Real and 1-bit Neural Networks via Guided Distribution Calibration (CVPR 2021)

S²-BNN (Self-supervised Binary Neural Networks Using Distillation Loss)

This is the official PyTorch implementation of our paper:

"S2-BNN: Bridging the Gap Between Self-Supervised Real and 1-bit Neural Networks via Guided Distribution Calibration" (CVPR 2021)

by Zhiqiang Shen, Zechun Liu, Jie Qin, Lei Huang, Kwang-Ting Cheng and Marios Savvides.

In this paper, we introduce a simple yet effective self-supervised approach using distillation loss for learning efficient binary neural networks. Our proposed method can outperform the simple contrastive learning baseline (MoCo V2) by an absolute gain of 5.5∼15% on ImageNet.

The student models are not restricted to binary neural networks; you can replace them with any efficient/compact model.
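
To make the idea of a distillation loss concrete, here is a minimal pure-Python sketch of a generic soft-target loss: the cross-entropy between the teacher's and the student's softmax distributions. The function names, the temperature parameter, and the toy logits are illustrative assumptions, not taken from this repository; the loss actually used for training is the one defined in the training scripts and the paper.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """Cross-entropy H(p_teacher, p_student) for one sample (hypothetical helper)."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -sum(pt * math.log(ps) for pt, ps in zip(p_teacher, p_student))

# Toy logits (made up for illustration): one student close to the teacher, one far from it.
teacher = [2.0, 0.5, -1.0]
close_student = [1.8, 0.6, -0.9]
far_student = [-1.0, 0.5, 2.0]

print(distillation_loss(close_student, teacher))
print(distillation_loss(far_student, teacher))
```

The loss is smallest when the student's distribution matches the teacher's, so the first printed value is lower than the second.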

News

[Aug. 18, 2021] Added a ResNet-50 result, with ResNet-50 as the student (real-valued model) and SwAV/RN50-w4 as the teacher.

| Models | Pre-train epochs | Batch size | Linear cls. log | Top-1 (%) | Pre-train models |
|---|---|---|---|---|---|
| SimSiam | 100 | 512 | log | 68.1 | GitHub |
| S2-BNN (Distillation_only) | 100 | 512 | log | 68.7 | Download |

Note that we use the same linear evaluation procedure as SimSiam.

Citation

If you find our code helpful for your research, please cite:

@InProceedings{Shen_2021_CVPR,
    author    = {Shen, Zhiqiang and Liu, Zechun and Qin, Jie and Huang, Lei and Cheng, Kwang-Ting and Savvides, Marios},
    title     = {S2-BNN: Bridging the Gap Between Self-Supervised Real and 1-Bit Neural Networks via Guided Distribution Calibration},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    year      = {2021}
}

Preparation

1. Requirements:

Python
PyTorch
Torchvision

2. Data:

Download the ImageNet dataset following https://github.com/pytorch/examples/tree/master/imagenet#requirements.
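
Before launching a long training run, it can help to confirm that the dataset folder has the expected shape. The sketch below assumes the layout used by the linked PyTorch ImageNet example: a root folder containing train and val, each holding one subfolder per class. The helper name `check_imagenet_layout` is hypothetical, and the demonstration runs on a temporary toy directory rather than real data.

```python
import os
import tempfile

def check_imagenet_layout(root):
    """Return (ok, message) for a folder expected to hold train/ and val/ class subfolders."""
    for split in ("train", "val"):
        split_dir = os.path.join(root, split)
        if not os.path.isdir(split_dir):
            return False, f"missing folder: {split_dir}"
        classes = [d for d in os.listdir(split_dir)
                   if os.path.isdir(os.path.join(split_dir, d))]
        if not classes:
            return False, f"no class subfolders in: {split_dir}"
    return True, "layout looks fine"

# Toy demonstration on a temporary directory (not real ImageNet data).
with tempfile.TemporaryDirectory() as root:
    for split in ("train", "val"):
        os.makedirs(os.path.join(root, split, "n01440764"))
    demo_result = check_imagenet_layout(root)
    print(demo_result)
```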

Training & Testing

To train a model, run the following scripts. All our models are trained with 8 GPUs.

1. Standard Two-Step Training:

Our enhanced MoCo V2:

Step 1:

cd Contrastive_only/step1
python main_moco.py --lr 0.0003 --batch-size 256 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 [imagenet-folder with train and val folders] --mlp --moco-t 0.2 --aug-plus --cos -j 48

Step 2:

cd Contrastive_only/step2
python main_moco.py --lr 0.0003 --batch-size 256 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 [imagenet-folder with train and val folders] --mlp --moco-t 0.2 --aug-plus --cos -j 48 --model-path ../step1/checkpoint_0199.pth.tar
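
Step 2 starts from the last checkpoint written by step 1, passed through --model-path. The file name checkpoint_0199.pth.tar suggests a zero-based, four-digit epoch index after 200 epochs of training. The sketch below encodes that naming pattern as an assumption inferred from the command above; the helper `last_checkpoint_name` is hypothetical, and you should check the actual files in the step1 folder if you change the number of epochs.

```python
def last_checkpoint_name(num_epochs):
    """Name of the final checkpoint, assuming zero-based 4-digit epoch indices."""
    if num_epochs < 1:
        raise ValueError("num_epochs must be at least 1")
    return "checkpoint_{:04d}.pth.tar".format(num_epochs - 1)

# With 200 epochs, the last epoch index is 199.
print(last_checkpoint_name(200))  # checkpoint_0199.pth.tar
```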

Our MoCo V2 + Distillation Loss:

Download the real-valued teacher network here. We use the MoCo V2 800-epoch pretrained model, but you can choose other, stronger self-supervised models as teachers.

Step 1:

cd Contrastive+Distillation/step1
python main_moco.py --lr 0.0003 --batch-size 256 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 [imagenet-folder with train and val folders] --mlp --moco-t 0.2 --aug-plus --cos -j 48 --wd 0 --teacher-path ../../moco_v2_800ep_pretrain.pth.tar

Step 2:

cd Contrastive+Distillation/step2
python main_moco.py --lr 0.0003 --batch-size 256 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 [imagenet-folder with train and val folders] --mlp --moco-t 0.2 --aug-plus --cos -j 48 --wd 0 --teacher-path ../../moco_v2_800ep_pretrain.pth.tar --model-path ../step1/checkpoint_0199.pth.tar

Our Distillation Loss Only:

Step 1:

cd Distillation_only/step1
python main_moco.py --lr 0.0003 --batch-size 256 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 [imagenet-folder with train and val folders] --mlp --moco-t 0.2 --aug-plus --cos -j 48 --wd 0 --teacher-path ../../moco_v2_800ep_pretrain.pth.tar

Step 2:

cd Distillation_only/step2
python main_moco.py --lr 0.0003 --batch-size 256 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 [imagenet-folder with train and val folders] --mlp --moco-t 0.2 --aug-plus --cos -j 48 --wd 0 --teacher-path ../../moco_v2_800ep_pretrain.pth.tar --model-path ../step1/checkpoint_0199.pth.tar
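
All training commands in this section combine --lr 0.0003 with --cos. In the upstream MoCo scripts, --cos enables a half-cosine learning-rate decay applied once per epoch; the sketch below assumes the same behavior here and a 200-epoch run (the epoch count listed under Results & Models). The function name `cosine_lr` is illustrative only.

```python
import math

def cosine_lr(base_lr, epoch, total_epochs):
    """Half-cosine decay from base_lr (epoch 0) toward 0 (epoch == total_epochs)."""
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * epoch / total_epochs))

# Learning rate at a few epochs of an assumed 200-epoch run.
for epoch in (0, 50, 100, 150, 199):
    print(epoch, cosine_lr(0.0003, epoch, 200))
```

Under this assumption the rate starts at the base value, reaches half of it at the midpoint, and is close to zero in the final epoch.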

2. Simple One-Step Training (Conventional):

Our enhanced MoCo V2:

cd Contrastive_only/step2
python main_moco.py --lr 0.0003 --batch-size 256 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 [imagenet-folder with train and val folders] --mlp --moco-t 0.2 --aug-plus --cos -j 48
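
The --moco-t 0.2 flag in the command above sets, in upstream MoCo, the softmax temperature of the contrastive (InfoNCE) loss. As a rough illustration of what that temperature does, here is a pure-Python sketch of InfoNCE for a single query with one positive key and a few negative keys. The helper names and the toy 2-D vectors are assumptions made for this example; the repository's own loss lives in its training scripts.

```python
import math

def normalize(vec):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def info_nce(query, positive_key, negative_keys, temperature=0.2):
    """InfoNCE loss for one query; the positive key sits at index 0 of the logits."""
    logits = [dot(query, positive_key)] + [dot(query, k) for k in negative_keys]
    logits = [value / temperature for value in logits]
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return log_sum_exp - logits[0]  # equals -log softmax(logits)[0]

# Toy unit vectors (made up for illustration).
query = normalize([1.0, 0.0])
aligned = normalize([0.9, 0.1])
negatives = [normalize([0.0, 1.0]), normalize([-1.0, 0.0])]

loss_good = info_nce(query, aligned, negatives)                     # positive is similar to the query
loss_bad = info_nce(query, negatives[0], [aligned, negatives[1]])   # positive is orthogonal to the query
print(loss_good, loss_bad)
```

The loss is low when the query is most similar to its positive key and high when a negative key is more similar.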

Our MoCo V2 + Distillation Loss:

cd Contrastive+Distillation/step2
python main_moco.py --lr 0.0003 --batch-size 256 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 [imagenet-folder with train and val folders] --mlp --moco-t 0.2 --aug-plus --cos -j 48 --wd 0 --teacher-path ../../moco_v2_800ep_pretrain.pth.tar

Our Distillation Loss Only:

cd Distillation_only/step2
python main_moco.py --lr 0.0003 --batch-size 256 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 [imagenet-folder with train and val folders] --mlp --moco-t 0.2 --aug-plus --cos -j 48 --wd 0 --teacher-path ../../moco_v2_800ep_pretrain.pth.tar

In one-step training, you can replace the binary neural network with any kind of efficient/compact model.
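
The training commands pass --batch-size 256 and -j 48 while running on 8 GPUs with --multiprocessing-distributed. In the upstream MoCo scripts these values are totals for the node and are divided across the GPU processes; the sketch below assumes this repository's main_moco.py keeps that behavior, which is worth verifying in the script before changing the GPU count. The function name `per_gpu_settings` is hypothetical.

```python
def per_gpu_settings(total_batch_size, total_workers, gpus_per_node):
    """Split node-level batch size and data-loader workers across GPUs (assumed upstream-MoCo behavior)."""
    batch_per_gpu = total_batch_size // gpus_per_node
    # Ceiling division so that every GPU process gets at least its share of workers.
    workers_per_gpu = (total_workers + gpus_per_node - 1) // gpus_per_node
    return batch_per_gpu, workers_per_gpu

# 8 GPUs, --batch-size 256, -j 48
print(per_gpu_settings(256, 48, 8))  # (32, 6)
```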

3. Testing:

To linearly evaluate a model, run the following script:

python main_lincls.py --lr 0.1 -j 24 --batch-size 256 --pretrained /home/szq/projects/s2bnn/checkpoint_0199.pth.tar --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 [imagenet-folder with train and val folders]
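
Linear evaluation trains a new classifier on top of a frozen pre-trained backbone, so only the backbone weights from the checkpoint passed via --pretrained are reused. The sketch below shows the kind of key filtering the upstream MoCo linear-evaluation script performs, using plain dictionaries in place of real tensors. The prefix module.encoder_q. and the head name fc are assumptions carried over from upstream MoCo; the checkpoints in this repository may use different names, so inspect the checkpoint keys before relying on this.

```python
def extract_backbone_weights(state_dict, prefix="module.encoder_q.", head="fc"):
    """Keep only query-encoder backbone entries and strip the prefix (hypothetical helper)."""
    backbone = {}
    for key, value in state_dict.items():
        if not key.startswith(prefix):
            continue  # skip key encoder, queue, and other training-only entries
        name = key[len(prefix):]
        if name == head or name.startswith(head + "."):
            continue  # skip the projection head used only during pre-training
        backbone[name] = value
    return backbone

# Toy state dict with placeholder values instead of tensors.
toy_state_dict = {
    "module.encoder_q.conv1.weight": 1,
    "module.encoder_q.fc.0.weight": 2,
    "module.encoder_k.conv1.weight": 3,
    "module.queue": 4,
}
print(extract_backbone_weights(toy_state_dict))  # {'conv1.weight': 1}
```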

Results & Models

We provide pre-trained models for the different training strategies. The table reports the number of epochs, FLOPs, OPs, and Top-1 accuracy on the ImageNet validation set:

| Models | #Epoch | FLOPs (×10⁸) | OPs (×10⁸) | Top-1 (%) | Trained models |
|---|---|---|---|---|---|
| MoCo V2 baseline | 200 | 0.12 | 0.87 | 46.9 | Download |
| Our enhanced MoCo V2 | 200 | 0.12 | 0.87 | 52.5 | Download |
| Our MoCo V2 + Distillation Loss | 200 | 0.12 | 0.87 | 56.0 | Download |
| Our Distillation Loss Only | 200 | 0.12 | 0.87 | 61.5 | Download |
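
The short script below works through the arithmetic behind the table: the absolute Top-1 gain of each method over the MoCo V2 baseline, and the binary operation count implied by the FLOPs and OPs columns. The second part assumes the convention common in the binary-network literature (for example ReActNet), OPs = BOPs / 64 + FLOPs; the README itself does not state this formula, so treat the BOPs figure as an inference.

```python
# Top-1 accuracies copied from the table above.
results = {
    "MoCo V2 baseline": 46.9,
    "Our enhanced MoCo V2": 52.5,
    "Our MoCo V2 + Distillation Loss": 56.0,
    "Our Distillation Loss Only": 61.5,
}

baseline = results["MoCo V2 baseline"]
gains = {
    name: round(acc - baseline, 1)
    for name, acc in results.items()
    if name != "MoCo V2 baseline"
}
for name, gain in gains.items():
    print(f"{name}: +{gain} points over the baseline")

# Assumed convention: OPs = BOPs / 64 + FLOPs (all in units of 10^8).
flops, ops = 0.12, 0.87
implied_bops = round((ops - flops) * 64, 1)
print(f"implied BOPs: {implied_bops} x 10^8")
```

The gains come out to 5.6, 9.1, and 14.6 points, in line with the 5.5∼15% range quoted in the introduction.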

Training Logs

Our linear evaluation logs are available here.

Acknowledgement

MoCo V2 (Improved Baselines with Momentum Contrastive Learning)

ReActNet (ReActNet: Towards Precise Binary Neural Network with Generalized Activation Functions)

MEAL V2 (MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks)

Contact

Zhiqiang Shen, CMU (zhiqiangshen0214 at gmail.com)
