carrier-of-tricks-for-classification-pytorch

Carrier of tricks for image classification tutorials using PyTorch. Based on "Bag of Tricks for Image Classification with Convolutional Neural Networks" (CVPR 2019), this repository implements a classification codebase on a custom dataset.

0. Experimental Setup (I used 1 GTX 1080 Ti GPU!)

0-1. Prepare Library

```
pip install -r requirements.txt
```

0-2. Download dataset (Kaggle Intel Image Classification)

This dataset contains around 25k images of size 150x150, distributed across 6 categories: {'buildings' -> 0, 'forest' -> 1, 'glacier' -> 2, 'mountain' -> 3, 'sea' -> 4, 'street' -> 5}.
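The repository ships its own data pipeline, but as a minimal sketch (assuming the Kaggle folder layout with one subdirectory per class; the `data/seg_train` path is hypothetical), torchvision's `ImageFolder` reproduces the same label mapping, since it sorts class folders alphabetically:

```python
from torchvision import datasets, transforms

# Minimal sketch, not the repository's own loader: ImageFolder assigns
# labels by sorting subdirectory names, which matches the mapping above.
transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ToTensor(),
])
train_set = datasets.ImageFolder("data/seg_train", transform=transform)
print(train_set.class_to_idx)
# {'buildings': 0, 'forest': 1, 'glacier': 2, 'mountain': 3, 'sea': 4, 'street': 5}
```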

0-3. Download ImageNet-Pretrained Weights (EfficientNet, RegNet)

1. Baseline Training Setting

  • ImageNet Pretrained ResNet-50 from torchvision.models
  • 1080 Ti 1 GPU / Batch Size 64 / Epochs 120 / Initial Learning Rate 0.1
  • Training Augmentation: Resize((256, 256)), RandomHorizontalFlip()
  • SGD + Momentum(0.9) + learning rate step decay (×0.1 at epochs 30, 60, and 90; see the sketch below)
```
python main.py --checkpoint_name baseline;
```
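A minimal PyTorch sketch of this optimizer and schedule (illustrative; the actual setup lives in main.py):

```python
import torch
from torchvision import models

model = models.resnet50(pretrained=True)  # ImageNet-pretrained baseline
                                          # (newer torchvision uses weights=...)

# SGD with momentum 0.9; MultiStepLR multiplies the LR by 0.1
# at epochs 30, 60, and 90, as described above.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[30, 60, 90], gamma=0.1)

for epoch in range(120):
    # ... one training epoch over the dataloader ...
    scheduler.step()
```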

1-1. Simple Trials

  • Randomly initialized ResNet-50 (trained from scratch)

```
python main.py --checkpoint_name baseline_scratch --pretrained 0;
```

  • Adam optimizer with a small learning rate (1e-4 works best!)

```
python main.py --checkpoint_name baseline_Adam --optimizer ADAM --learning_rate 0.0001
```

2. Bag of Tricks from Original Papers

Before starting, note that I did not try No bias decay, Low-precision Training, ResNet Model Tweaks, or Knowledge Distillation.

2-1. Learning Rate Warmup

  • The first 5 epochs are used for warmup.

```
python main.py --checkpoint_name baseline_warmup --decay_type step_warmup;
python main.py --checkpoint_name baseline_Adam_warmup --optimizer ADAM --learning_rate 0.0001 --decay_type step_warmup;
```
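A minimal sketch of linear warmup followed by step decay (illustrative, assuming warmup scales the LR linearly from near zero over the first 5 epochs; function and parameter names are my own):

```python
def warmup_step_lr(epoch, warmup_epochs=5, milestones=(30, 60, 90), gamma=0.1):
    """LR multiplier: linear warmup, then step decay (illustrative)."""
    if epoch < warmup_epochs:
        return (epoch + 1) / warmup_epochs
    return gamma ** sum(epoch >= m for m in milestones)

# Usage with any optimizer:
# scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=warmup_step_lr)
```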

2-2. Zero gamma in Batch Normalization

  • Zero-initialize the gamma of the last BatchNorm layer in each residual branch

```
python main.py --checkpoint_name baseline_zerogamma --zero_gamma;
python main.py --checkpoint_name baseline_warmup_zerogamma --decay_type step_warmup --zero_gamma;
```
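A minimal sketch of this trick for torchvision's ResNet, where `bn3` is the last BatchNorm of each Bottleneck block (this mirrors the idea, not necessarily the repository's exact code):

```python
import torch.nn as nn
from torchvision.models.resnet import Bottleneck

def zero_init_last_bn(model):
    # With gamma = 0, each residual branch initially outputs zero, so every
    # block starts as an identity mapping, which eases early optimization.
    for m in model.modules():
        if isinstance(m, Bottleneck):
            nn.init.zeros_(m.bn3.weight)  # gamma of the block's last BN
```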

2-3. Cosine Learning Rate Annealing

```
python main.py --checkpoint_name baseline_Adam_warmup_cosine --optimizer ADAM --learning_rate 0.0001 --decay_type cosine_warmup;
```
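A minimal sketch of the cosine_warmup idea: linear warmup, then the LR follows half a cosine period down to zero (illustrative; the function name is my own):

```python
import math

def warmup_cosine_lr(epoch, warmup_epochs=5, total_epochs=120):
    """LR multiplier: linear warmup, then cosine decay to zero (illustrative)."""
    if epoch < warmup_epochs:
        return (epoch + 1) / warmup_epochs
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return 0.5 * (1.0 + math.cos(math.pi * progress))
```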

2-4. Label Smoothing

  • The paper uses a smoothing coefficient of 0.1; I use the same value.
  • The number of classes in ImageNet (1000) differs from the number of classes in our dataset (6), but I did not re-tune the coefficient.

```
python main.py --checkpoint_name baseline_Adam_warmup_cosine_labelsmooth --optimizer ADAM --learning_rate 0.0001 --decay_type cosine_warmup --label_smooth 0.1;
python main.py --checkpoint_name baseline_Adam_warmup_labelsmooth --optimizer ADAM --learning_rate 0.0001 --decay_type step_warmup --label_smooth 0.1;
```
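A minimal sketch of the smoothed cross-entropy loss (recent PyTorch also accepts `nn.CrossEntropyLoss(label_smoothing=0.1)`; the manual form below is illustrative):

```python
import torch.nn.functional as F

def label_smooth_ce(logits, target, eps=0.1):
    """Cross entropy against a smoothed target distribution (illustrative)."""
    log_probs = F.log_softmax(logits, dim=-1)
    # Smoothed targets put 1 - eps on the true class and eps/K uniformly.
    nll = -log_probs.gather(-1, target.unsqueeze(-1)).squeeze(-1)
    smooth = -log_probs.mean(dim=-1)
    return ((1.0 - eps) * nll + eps * smooth).mean()
```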

2-5. MixUp Augmentation

  • MixUp paper link
  • lambda is a random number drawn from a Beta(alpha, alpha) distribution.
  • I use alpha = 0.2, as in the paper.

```
python main.py --checkpoint_name baseline_Adam_warmup_mixup --optimizer ADAM --learning_rate 0.0001 --decay_type step_warmup --mixup 0.2;
python main.py --checkpoint_name baseline_Adam_warmup_cosine_mixup --optimizer ADAM --learning_rate 0.0001 --decay_type cosine_warmup --mixup 0.2;
python main.py --checkpoint_name baseline_Adam_warmup_labelsmooth_mixup --optimizer ADAM --learning_rate 0.0001 --decay_type step_warmup --label_smooth 0.1 --mixup 0.2;
python main.py --checkpoint_name baseline_Adam_warmup_cosine_labelsmooth_mixup --optimizer ADAM --learning_rate 0.0001 --decay_type cosine_warmup --label_smooth 0.1 --mixup 0.2;
```
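A minimal sketch of MixUp on a batch (illustrative; the loss is then the same lambda-weighted mix of the two targets' losses):

```python
import numpy as np
import torch

def mixup_batch(x, y, alpha=0.2):
    """Convexly combine a batch with a shuffled copy of itself (illustrative)."""
    lam = np.random.beta(alpha, alpha)
    index = torch.randperm(x.size(0), device=x.device)
    mixed_x = lam * x + (1.0 - lam) * x[index]
    return mixed_x, y, y[index], lam

# loss = lam * criterion(output, y_a) + (1 - lam) * criterion(output, y_b)
```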

3. Additional Tricks from hoya012's survey note

3-1. CutMix Augmentation

  • CutMix paper link
  • I use the same hyper-parameters (cutmix alpha = 1.0, cutmix prob = 1.0) as the paper's ImageNet experimental setting.

```
python main.py --checkpoint_name baseline_Adam_warmup_cosine_cutmix --optimizer ADAM --learning_rate 0.0001 --decay_type cosine_warmup --cutmix_alpha 1.0 --cutmix_prob 1.0;
```
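A minimal sketch of CutMix (illustrative): cut a random box whose area ratio is 1 - lambda, paste the corresponding patch from a shuffled batch, then re-derive lambda from the actual pasted area.

```python
import numpy as np
import torch

def cutmix_batch(x, y, alpha=1.0):
    """Paste a random box from a shuffled batch into x (illustrative)."""
    lam = np.random.beta(alpha, alpha)
    index = torch.randperm(x.size(0), device=x.device)
    h, w = x.shape[2], x.shape[3]
    # Box with area ratio (1 - lam), centered at a random point.
    cut_ratio = np.sqrt(1.0 - lam)
    cut_h, cut_w = int(h * cut_ratio), int(w * cut_ratio)
    cy, cx = np.random.randint(h), np.random.randint(w)
    y1, y2 = np.clip(cy - cut_h // 2, 0, h), np.clip(cy + cut_h // 2, 0, h)
    x1, x2 = np.clip(cx - cut_w // 2, 0, w), np.clip(cx + cut_w // 2, 0, w)
    x[:, :, y1:y2, x1:x2] = x[index, :, y1:y2, x1:x2]
    # Re-derive lam from the box that was actually pasted.
    lam = 1.0 - (y2 - y1) * (x2 - x1) / (h * w)
    return x, y, y[index], lam
```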

3-2. RAdam Optimizer

```
python main.py --checkpoint_name baseline_RAdam_warmup_cosine_labelsmooth --optimizer RADAM --learning_rate 0.0001 --decay_type cosine_warmup --label_smooth 0.1;
python main.py --checkpoint_name baseline_RAdam_warmup_cosine_cutmix --optimizer RADAM --learning_rate 0.0001 --decay_type cosine_warmup --cutmix_alpha 1.0 --cutmix_prob 1.0;
```
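RAdam rectifies the high variance of Adam's adaptive learning rate early in training. A usage sketch (recent PyTorch, >= 1.10, ships torch.optim.RAdam; older codebases, likely including this one, vendor the authors' implementation):

```python
import torch
import torch.nn as nn

model = nn.Linear(16, 6)  # placeholder model for illustration
# Drop-in replacement for Adam; same lr as the Adam runs above.
optimizer = torch.optim.RAdam(model.parameters(), lr=1e-4)
```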

3-3. RandAugment

```
python main.py --checkpoint_name baseline_Adam_warmup_cosine_labelsmooth_randaug --optimizer ADAM --learning_rate 0.0001 --decay_type cosine_warmup --randaugment;
```
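RandAugment applies N randomly chosen augmentation ops at magnitude M to each image. A sketch using torchvision's built-in version (available in torchvision >= 0.11; the repository may bundle its own, and N=2, M=9 here are common defaults, not necessarily its settings):

```python
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.RandomHorizontalFlip(),
    transforms.RandAugment(num_ops=2, magnitude=9),  # applied on PIL images
    transforms.ToTensor(),
])
```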

3-4. EvoNorm

```
python main.py --checkpoint_name baseline_Adam_warmup_cosine_labelsmmoth_evonorm --optimizer ADAM --learning_rate 0.0001 --decay_type cosine_warmup --label_smooth 0.1 --norm evonorm;
```
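EvoNorm replaces the BatchNorm + ReLU pair with a single searched normalization-activation op. A minimal sketch of the batch-independent EvoNorm-S0 variant (illustrative, assuming 4D inputs; not necessarily the repository's implementation):

```python
import torch
import torch.nn as nn

class EvoNormS0(nn.Module):
    """EvoNorm-S0: y = x * sigmoid(v * x) / group_std(x) * gamma + beta (sketch)."""
    def __init__(self, channels, groups=8, eps=1e-5):
        super().__init__()
        self.groups, self.eps = groups, eps
        self.gamma = nn.Parameter(torch.ones(1, channels, 1, 1))
        self.beta = nn.Parameter(torch.zeros(1, channels, 1, 1))
        self.v = nn.Parameter(torch.ones(1, channels, 1, 1))

    def forward(self, x):
        n, c, h, w = x.shape
        g = x.view(n, self.groups, c // self.groups, h, w)
        std = (g.var(dim=(2, 3, 4), keepdim=True) + self.eps).sqrt()
        std = std.expand_as(g).reshape(n, c, h, w)
        return x * torch.sigmoid(self.v * x) / std * self.gamma + self.beta
```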

3-5. Other Architecture (EfficientNet, RegNet)

  • I use EfficientNet-B2, which performs comparably to ResNet-50.
    • However, due to GPU memory limits, I use a smaller batch size (48)...
  • I use RegNetY-1.6GF, which has FLOPS and performance comparable to ResNet-50.

```
python main.py --checkpoint_name efficientnet_Adam_warmup_cosine_labelsmooth --model EfficientNet --optimizer ADAM --learning_rate 0.0001 --decay_type cosine_warmup --label_smooth 0.1;
python main.py --checkpoint_name efficientnet_Adam_warmup_cosine_labelsmooth_mixup --model EfficientNet --optimizer ADAM --learning_rate 0.0001 --decay_type cosine_warmup --label_smooth 0.1 --mixup 0.2;
python main.py --checkpoint_name efficientnet_Adam_warmup_cosine_cutmix --model EfficientNet --optimizer ADAM --learning_rate 0.0001 --decay_type cosine_warmup --cutmix_alpha 1.0 --cutmix_prob 1.0;
python main.py --checkpoint_name efficientnet_RAdam_warmup_cosine_labelsmooth --model EfficientNet --optimizer RADAM --learning_rate 0.0001 --decay_type cosine_warmup --label_smooth 0.1;
python main.py --checkpoint_name efficientnet_RAdam_warmup_cosine_cutmix --model EfficientNet --optimizer RADAM --learning_rate 0.0001 --decay_type cosine_warmup --cutmix_alpha 1.0 --cutmix_prob 1.0;
```

```
python main.py --checkpoint_name regnet_Adam_warmup_cosine_labelsmooth --model RegNet --optimizer ADAM --learning_rate 0.0001 --decay_type cosine_warmup --label_smooth 0.1;
python main.py --checkpoint_name regnet_Adam_warmup_cosine_labelsmooth_mixup --model RegNet --optimizer ADAM --learning_rate 0.0001 --decay_type cosine_warmup --label_smooth 0.1 --mixup 0.2;
python main.py --checkpoint_name regnet_Adam_warmup_cosine_cutmix --model RegNet --optimizer ADAM --learning_rate 0.0001 --decay_type cosine_warmup --cutmix_alpha 1.0 --cutmix_prob 1.0;
python main.py --checkpoint_name regnet_RAdam_warmup_cosine_labelsmooth --model RegNet --optimizer RADAM --learning_rate 0.0001 --decay_type cosine_warmup --label_smooth 0.1;
python main.py --checkpoint_name regnet_RAdam_warmup_cosine_cutmix --model RegNet --optimizer RADAM --learning_rate 0.0001 --decay_type cosine_warmup --cutmix_alpha 1.0 --cutmix_prob 1.0;
```
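Step 0-3 above downloads pretrained weights separately, so the repository presumably loads both architectures from dedicated packages or checkpoints. As a rough sketch only, recent torchvision (>= 0.11) also provides both models directly:

```python
from torchvision import models

# Illustrative only; the repository loads its own downloaded checkpoints.
# Newer torchvision versions use the weights= argument instead of pretrained=.
effnet = models.efficientnet_b2(pretrained=True)
regnet = models.regnet_y_1_6gf(pretrained=True)
```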

4. Performance Table

  • B : Baseline
  • A : Adam Optimizer
  • W : Warmup
  • C : Cosine Annealing
  • S : Label Smoothing
  • M : MixUp Augmentation
  • CM : CutMix Augmentation
  • R : RAdam Optimizer
  • RA : RandAugment
  • E : EvoNorm
  • EN : EfficientNet
  • RN : RegNet

| Algorithm | Validation Accuracy (%) | Test Accuracy (%) |
|:----------|:-----------------------:|:-----------------:|
| B from scratch | 86.68 | 86.10 |
| B | 86.14 | 87.93 |
| B + A | 93.34 | 93.90 |
| B + A + W | 93.77 | 94.17 |
| B + A + W + C | 93.66 | 93.67 |
| B + A + W + S | 93.94 | 93.77 |
| B + A + W + C + S | 93.80 | 93.63 |
| B + A + W + M | 94.09 | 94.20 |
| B + A + W + S + M | 93.69 | 94.40 |
| B + A + W + C + S + M | 93.77 | 93.77 |
| BAWC + CM | 94.44 | 93.97 |
| BWCS + R | 93.27 | 93.73 |
| BAWCS + RA | 93.94 | 93.80 |
| BAWCS + E | 93.55 | 93.70 |
| BWC + CM + R | 94.23 | 93.90 |
| EN + AWCSM | 93.48 | 93.50 |
| EN + AWC + CM | 94.19 | 94.03 |
| EN + WCS + R | 93.91 | 94.03 |
| EN + WC + CM + R | 93.98 | 94.27 |
| RN + AWCSM | 94.30 | 94.30 |
| RN + AWC + CM | 93.91 | 94.97 |
| RN + WCS + R | 93.91 | 94.10 |
| RN + WC + CM + R | 94.48 | 94.37 |

5. How to run all of the experiments?

6. Code Reference
