
PyTorch implementation of image classification models for CIFAR-10/CIFAR-100/MNIST/FashionMNIST/Kuzushiji-MNIST/ImageNet


The following papers are implemented using PyTorch (see the References section for the full list).

Requirements

  • Ubuntu (It's only tested on Ubuntu, so it may not work on Windows.)
  • Python >= 3.7
  • PyTorch >= 1.4.0
  • torchvision
  • NVIDIA Apex
pip install -r requirements.txt
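
As a quick sanity check after installation, the imports below should succeed in your environment. This is only a hedged convenience snippet and is not part of the repository.

```python
# check_env.py -- minimal environment sanity check (not part of this repository)
import sys

import torch
import torchvision

print(f"Python        : {sys.version.split()[0]}")   # expected >= 3.7
print(f"PyTorch       : {torch.__version__}")        # expected >= 1.4.0
print(f"torchvision   : {torchvision.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")

try:
    from apex import amp  # noqa: F401  (NVIDIA Apex, needed for the mixed-precision experiments)
    print("NVIDIA Apex   : available")
except ImportError:
    print("NVIDIA Apex   : not installed")
```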

Usage

python train.py --config configs/cifar/resnet_preact.yaml
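
Each command takes a YAML config file via `--config`, followed by optional dotted key/value overrides (for example `train.output_dir experiments/exp00`), as in the commands throughout this README. The sketch below shows a typical yacs-style way such overrides are handled; it is an assumption for illustration, not this repository's exact code.

```python
# config_override_sketch.py -- hypothetical illustration of YAML config + dotted overrides
import argparse

from yacs.config import CfgNode as CN

# A tiny default config tree; the real project defines many more keys.
_C = CN()
_C.train = CN()
_C.train.base_lr = 0.1
_C.train.batch_size = 128
_C.train.output_dir = "experiments/exp00"

def load_config() -> CN:
    parser = argparse.ArgumentParser()
    parser.add_argument("--config", type=str, required=True)
    parser.add_argument("options", nargs=argparse.REMAINDER,
                        help="dotted key/value overrides, e.g. train.batch_size 64")
    args = parser.parse_args()

    cfg = _C.clone()
    cfg.merge_from_file(args.config)    # YAML file overrides the defaults
    cfg.merge_from_list(args.options)   # command-line pairs override the YAML
    cfg.freeze()
    return cfg

if __name__ == "__main__":
    print(load_config())
```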

Results on CIFAR-10

Results using almost the same settings as in the papers

| Model | Test Error (median of 3 runs) | Test Error (in paper) | Training Time |
|---|---|---|---|
| VGG-like (depth 15, w/ BN, channel 64) | 7.29 | N/A | 1h20m |
| ResNet-110 | 6.52 | 6.43 (best), 6.61 +/- 0.16 | 3h06m |
| ResNet-preact-110 | 6.47 | 6.37 (median of 5 runs) | 3h05m |
| ResNet-preact-164 bottleneck | 5.90 | 5.46 (median of 5 runs) | 4h01m |
| ResNet-preact-1001 bottleneck | | 4.62 (median of 5 runs), 4.69 +/- 0.20 | |
| WRN-28-10 | 4.03 | 4.00 (median of 5 runs) | 16h10m |
| WRN-28-10 w/ dropout | | 3.89 (median of 5 runs) | |
| DenseNet-100 (k=12) | 3.87 (1 run) | 4.10 (1 run) | 24h28m* |
| DenseNet-100 (k=24) | | 3.74 (1 run) | |
| DenseNet-BC-100 (k=12) | 4.69 | 4.51 (1 run) | 15h20m |
| DenseNet-BC-250 (k=24) | | 3.62 (1 run) | |
| DenseNet-BC-190 (k=40) | | 3.46 (1 run) | |
| PyramidNet-110 (alpha=84) | 4.40 | 4.26 +/- 0.23 | 11h40m |
| PyramidNet-110 (alpha=270) | 3.92 (1 run) | 3.73 +/- 0.04 | 24h12m* |
| PyramidNet-164 bottleneck (alpha=270) | 3.44 (1 run) | 3.48 +/- 0.20 | 32h37m* |
| PyramidNet-272 bottleneck (alpha=200) | | 3.31 +/- 0.08 | |
| ResNeXt-29 4x64d | 3.89 | ~3.75 (from Figure 7) | 31h17m |
| ResNeXt-29 8x64d | 3.97 (1 run) | 3.65 (average of 10 runs) | 42h50m* |
| ResNeXt-29 16x64d | | 3.58 (average of 10 runs) | |
| shake-shake-26 2x32d (S-S-I) | 3.68 | 3.55 (average of 3 runs) | 33h49m |
| shake-shake-26 2x64d (S-S-I) | 2.88 (1 run) | 2.98 (average of 3 runs) | 78h48m |
| shake-shake-26 2x96d (S-S-I) | 2.90 (1 run) | 2.86 (average of 5 runs) | 101h32m* |

Notes

  • Differences from the papers in training settings:
    • Trained WRN-28-10 with batch size 64 (128 in the paper).
    • Trained DenseNet-BC-100 (k=12) with batch size 32 and initial learning rate 0.05 (batch size 64 and initial learning rate 0.1 in the paper).
    • Trained ResNeXt-29 4x64d with a single GPU, batch size 32, and initial learning rate 0.025 (8 GPUs, batch size 128, and initial learning rate 0.1 in the paper).
    • Trained shake-shake models with a single GPU (2 GPUs in the paper).
    • Trained shake-shake-26 2x64d (S-S-I) with batch size 64 and initial learning rate 0.1.
  • Test errors reported above are those at the last epoch.
  • Experiments with only 1 run were done on a different computer from the one used for the experiments with 3 runs.
  • A GeForce GTX 980 was used in these experiments.

VGG-like

python train.py --config configs/cifar/vgg.yaml

ResNet

python train.py --config configs/cifar/resnet.yaml

ResNet-preact

python train.py --config configs/cifar/resnet_preact.yaml \
    train.output_dir experiments/resnet_preact_basic_110/exp00

python train.py --config configs/cifar/resnet_preact.yaml \
    model.resnet_preact.depth 164 \
    model.resnet_preact.block_type bottleneck \
    train.output_dir experiments/resnet_preact_bottleneck_164/exp00

WRN

python train.py --config configs/cifar/wrn.yaml

DenseNet

python train.py --config configs/cifar/densenet.yaml

PyramidNet

python train.py --config configs/cifar/pyramidnet.yaml \
    model.pyramidnet.depth 110 \
    model.pyramidnet.block_type basic \
    model.pyramidnet.alpha 84 \
    train.output_dir experiments/pyramidnet_basic_110_84/exp00

python train.py --config configs/cifar/pyramidnet.yaml \
    model.pyramidnet.depth 110 \
    model.pyramidnet.block_type basic \
    model.pyramidnet.alpha 270 \
    train.output_dir experiments/pyramidnet_basic_110_270/exp00

ResNeXt

python train.py --config configs/cifar/resnext.yaml \
    model.resnext.cardinality 4 \
    train.batch_size 32 \
    train.base_lr 0.025 \
    train.output_dir experiments/resnext_29_4x64d/exp00

python train.py --config configs/cifar/resnext.yaml \
    train.batch_size 64 \
    train.base_lr 0.05 \
    train.output_dir experiments/resnext_29_8x64d/exp00

shake-shake

python train.py --config configs/cifar/shake_shake.yaml \
    model.shake_shake.initial_channels 32 \
    train.output_dir experiments/shake_shake_26_2x32d_SSI/exp00

python train.py --config configs/cifar/shake_shake.yaml \
    model.shake_shake.initial_channels 64 \
    train.batch_size 64 \
    train.base_lr 0.1 \
    train.output_dir experiments/shake_shake_26_2x64d_SSI/exp00

python train.py --config configs/cifar/shake_shake.yaml \
    model.shake_shake.initial_channels 96 \
    train.batch_size 64 \
    train.base_lr 0.1 \
    train.output_dir experiments/shake_shake_26_2x96d_SSI/exp00

Results

| Model | Test Error (1 run) | # of Epochs | Training Time |
|---|---|---|---|
| ResNet-preact-20, widening factor 4 | 4.91 | 200 | 1h26m |
| ResNet-preact-20, widening factor 4 | 4.01 | 400 | 2h53m |
| ResNet-preact-20, widening factor 4 | 3.99 | 1800 | 12h53m |
| ResNet-preact-20, widening factor 4, Cutout 16 | 3.71 | 200 | 1h26m |
| ResNet-preact-20, widening factor 4, Cutout 16 | 3.46 | 400 | 2h53m |
| ResNet-preact-20, widening factor 4, Cutout 16 | 3.76 | 1800 | 12h53m |
| ResNet-preact-20, widening factor 4, RICAP (beta=0.3) | 3.45 | 200 | 1h26m |
| ResNet-preact-20, widening factor 4, RICAP (beta=0.3) | 3.11 | 400 | 2h53m |
| ResNet-preact-20, widening factor 4, RICAP (beta=0.3) | 3.15 | 1800 | 12h53m |

| Model | Test Error (1 run) | # of Epochs | Training Time |
|---|---|---|---|
| WRN-28-10, Cutout 16 | 3.19 | 200 | 6h35m |
| WRN-28-10, mixup (alpha=1) | 3.32 | 200 | 6h35m |
| WRN-28-10, RICAP (beta=0.3) | 2.83 | 200 | 6h35m |
| WRN-28-10, Dual-Cutout (alpha=0.1) | 2.87 | 200 | 12h42m |
| WRN-28-10, Cutout 16 | 3.07 | 400 | 13h10m |
| WRN-28-10, mixup (alpha=1) | 3.04 | 400 | 13h08m |
| WRN-28-10, RICAP (beta=0.3) | 2.71 | 400 | 13h08m |
| WRN-28-10, Dual-Cutout (alpha=0.1) | 2.76 | 400 | 25h20m |
| shake-shake-26 2x64d, Cutout 16 | 2.64 | 1800 | 78h55m* |
| shake-shake-26 2x64d, mixup (alpha=1) | 2.63 | 1800 | 35h56m |
| shake-shake-26 2x64d, RICAP (beta=0.3) | 2.29 | 1800 | 35h10m |
| shake-shake-26 2x64d, Dual-Cutout (alpha=0.1) | 2.64 | 1800 | 68h34m |
| shake-shake-26 2x96d, Cutout 16 | 2.50 | 1800 | 60h20m |
| shake-shake-26 2x96d, mixup (alpha=1) | 2.36 | 1800 | 60h20m |
| shake-shake-26 2x96d, RICAP (beta=0.3) | 2.10 | 1800 | 60h20m |
| shake-shake-26 2x96d, Dual-Cutout (alpha=0.1) | 2.41 | 1800 | 113h09m |
| shake-shake-26 2x128d, Cutout 16 | 2.58 | 1800 | 85h04m |
| shake-shake-26 2x128d, RICAP (beta=0.3) | 1.97 | 1800 | 85h06m |

Note

  • Results reported in the tables are the test errors at the last epoch.
  • All models were trained using cosine annealing with an initial learning rate of 0.2.
  • A GeForce GTX 1080 Ti was used in these experiments, except for the ones marked with *, which were run on a GeForce GTX 980.
python train.py --config configs/cifar/wrn.yaml \
    train.batch_size 64 \
    train.output_dir experiments/wrn_28_10_cutout16 \
    scheduler.type cosine \
    augmentation.use_cutout True

python train.py --config configs/cifar/shake_shake.yaml \
    model.shake_shake.initial_channels 64 \
    train.batch_size 64 \
    train.base_lr 0.1 \
    scheduler.epochs 300 \
    train.output_dir experiments/shake_shake_26_2x64d_SSI_cutout16/exp00 \
    augmentation.use_cutout True

Results using multiple GPUs

| Model | batch size | #GPUs | Test Error (1 run) | # of Epochs | Training Time* |
|---|---|---|---|---|---|
| WRN-28-10, RICAP (beta=0.3) | 512 | 1 | 2.63 | 200 | 3h41m |
| WRN-28-10, RICAP (beta=0.3) | 256 | 2 | 2.71 | 200 | 2h14m |
| WRN-28-10, RICAP (beta=0.3) | 128 | 4 | 2.89 | 200 | 1h01m |
| WRN-28-10, RICAP (beta=0.3) | 64 | 8 | 2.75 | 200 | 34m |

Note

  • A Tesla V100 was used in these experiments.

Using 1 GPU

python train.py --config configs/cifar/wrn.yaml \
    train.base_lr 0.2 \
    train.batch_size 512 \
    scheduler.epochs 200 \
    scheduler.type cosine \
    train.output_dir experiments/wrn_28_10_ricap_1gpu/exp00 \
    augmentation.use_ricap True \
    augmentation.use_random_crop False

Using 2 GPUs

python -m torch.distributed.launch --nproc_per_node 2 \
    train.py --config configs/cifar/wrn.yaml \
    train.distributed True \
    train.base_lr 0.2 \
    train.batch_size 256 \
    scheduler.epochs 200 \
    scheduler.type cosine \
    train.output_dir experiments/wrn_28_10_ricap_2gpus/exp00 \
    augmentation.use_ricap True \
    augmentation.use_random_crop False

Using 4 GPUs

python -m torch.distributed.launch --nproc_per_node 4 \
    train.py --config configs/cifar/wrn.yaml \
    train.distributed True \
    train.base_lr 0.2 \
    train.batch_size 128 \
    scheduler.epochs 200 \
    scheduler.type cosine \
    train.output_dir experiments/wrn_28_10_ricap_4gpus/exp00 \
    augmentation.use_ricap True \
    augmentation.use_random_crop False

Using 8 GPUs

python -m torch.distributed.launch --nproc_per_node 8 \
    train.py --config configs/cifar/wrn.yaml \
    train.distributed True \
    train.base_lr 0.2 \
    train.batch_size 64 \
    scheduler.epochs 200 \
    scheduler.type cosine \
    train.output_dir experiments/wrn_28_10_ricap_8gpus/exp00 \
    augmentation.use_ricap True \
    augmentation.use_random_crop False
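
For reference, the `torch.distributed.launch` commands above follow the standard DistributedDataParallel pattern: one process per GPU, a DistributedSampler so each process sees its own shard, and the per-process batch size chosen so the total batch size stays at 512. The sketch below is a generic, hypothetical illustration of that pattern, not this repository's training script.

```python
# ddp_sketch.py -- hedged illustration of the torch.distributed.launch / DDP pattern
import argparse

import torch
import torch.distributed as dist
import torchvision
import torchvision.transforms as T
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--local_rank", type=int, default=0)  # filled in by torch.distributed.launch
    args = parser.parse_args()

    dist.init_process_group(backend="nccl")       # MASTER_ADDR etc. are set by the launcher
    torch.cuda.set_device(args.local_rank)

    model = torchvision.models.resnet18(num_classes=10).cuda()
    model = DDP(model, device_ids=[args.local_rank])

    # In practice the dataset should be downloaded once (e.g. on rank 0) before launching.
    dataset = torchvision.datasets.CIFAR10("./data", train=True, download=True,
                                           transform=T.ToTensor())
    sampler = DistributedSampler(dataset)          # each process gets a different shard
    loader = DataLoader(dataset, batch_size=256, sampler=sampler, num_workers=2)
    # Per epoch: sampler.set_epoch(epoch), then the usual forward/backward/step loop.

if __name__ == "__main__":
    main()
```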

Results on FashionMNIST

| Model | Test Error (1 run) | # of Epochs | Training Time |
|---|---|---|---|
| ResNet-preact-20, widening factor 4, Cutout 12 | 4.17 | 200 | 1h32m |
| ResNet-preact-20, widening factor 4, Cutout 14 | 4.11 | 200 | 1h32m |
| ResNet-preact-50, Cutout 12 | 4.45 | 200 | 57m |
| ResNet-preact-50, Cutout 14 | 4.38 | 200 | 57m |
| ResNet-preact-50, widening factor 4, Cutout 12 | 4.07 | 200 | 3h37m |
| ResNet-preact-50, widening factor 4, Cutout 14 | 4.13 | 200 | 3h39m |
| shake-shake-26 2x32d (S-S-I), Cutout 12 | 4.08 | 400 | 3h41m |
| shake-shake-26 2x32d (S-S-I), Cutout 14 | 4.05 | 400 | 3h39m |
| shake-shake-26 2x96d (S-S-I), Cutout 12 | 3.72 | 400 | 13h46m |
| shake-shake-26 2x96d (S-S-I), Cutout 14 | 3.85 | 400 | 13h39m |
| shake-shake-26 2x96d (S-S-I), Cutout 12 | 3.65 | 800 | 26h42m |
| shake-shake-26 2x96d (S-S-I), Cutout 14 | 3.60 | 800 | 26h42m |

| Model | Test Error (median of 3 runs) | # of Epochs | Training Time |
|---|---|---|---|
| ResNet-preact-20 | 5.04 | 200 | 26m |
| ResNet-preact-20, Cutout 6 | 4.84 | 200 | 26m |
| ResNet-preact-20, Cutout 8 | 4.64 | 200 | 26m |
| ResNet-preact-20, Cutout 10 | 4.74 | 200 | 26m |
| ResNet-preact-20, Cutout 12 | 4.68 | 200 | 26m |
| ResNet-preact-20, Cutout 14 | 4.64 | 200 | 26m |
| ResNet-preact-20, Cutout 16 | 4.49 | 200 | 26m |
| ResNet-preact-20, RandomErasing | 4.61 | 200 | 26m |
| ResNet-preact-20, Mixup | 4.92 | 200 | 26m |
| ResNet-preact-20, Mixup | 4.64 | 400 | 52m |

Note

  • Results reported in the tables are the test errors at the last epoch.
  • All models were trained using cosine annealing with an initial learning rate of 0.2.
  • The following data augmentations are applied to the training data (see the torchvision sketch below):
    • Images are padded with 4 pixels on each side, and 28x28 patches are randomly cropped from the padded images.
    • Images are randomly flipped horizontally.
  • A GeForce GTX 1080 Ti was used in these experiments.
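
The augmentation described in the note above corresponds to standard torchvision transforms; the snippet below is a hedged sketch, and the normalization constants are approximate FashionMNIST values rather than necessarily those used in these experiments.

```python
# Hedged sketch of the training-time augmentation described above, using torchvision.
import torchvision.transforms as T

train_transform = T.Compose([
    T.RandomCrop(28, padding=4),        # pad 4 px on each side, then crop a 28x28 patch
    T.RandomHorizontalFlip(),           # random horizontal flip
    T.ToTensor(),
    T.Normalize((0.286,), (0.353,)),    # approximate FashionMNIST mean/std (assumption)
])
```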

Results on MNIST

| Model | Test Error (median of 3 runs) | # of Epochs | Training Time |
|---|---|---|---|
| ResNet-preact-20 | 0.40 | 100 | 12m |
| ResNet-preact-20, Cutout 6 | 0.32 | 100 | 12m |
| ResNet-preact-20, Cutout 8 | 0.25 | 100 | 12m |
| ResNet-preact-20, Cutout 10 | 0.27 | 100 | 12m |
| ResNet-preact-20, Cutout 12 | 0.26 | 100 | 12m |
| ResNet-preact-20, Cutout 14 | 0.26 | 100 | 12m |
| ResNet-preact-20, Cutout 16 | 0.25 | 100 | 12m |
| ResNet-preact-20, Mixup (alpha=1) | 0.40 | 100 | 12m |
| ResNet-preact-20, Mixup (alpha=0.5) | 0.38 | 100 | 12m |
| ResNet-preact-20, widening factor 4, Cutout 14 | 0.26 | 100 | 45m |
| ResNet-preact-50, Cutout 14 | 0.29 | 100 | 28m |
| ResNet-preact-50, widening factor 4, Cutout 14 | 0.25 | 100 | 1h50m |
| shake-shake-26 2x96d (S-S-I), Cutout 14 | 0.24 | 100 | 3h22m |

Note

  • Results reported in the table are the test errors at the last epoch.
  • All models were trained using cosine annealing with an initial learning rate of 0.2.
  • A GeForce GTX 1080 Ti was used in these experiments.

Results on Kuzushiji-MNIST

| Model | Test Error (median of 3 runs) | # of Epochs | Training Time |
|---|---|---|---|
| ResNet-preact-20, Cutout 14 | 0.82 (best 0.67) | 200 | 24m |
| ResNet-preact-20, widening factor 4, Cutout 14 | 0.72 (best 0.67) | 200 | 1h30m |
| PyramidNet-110-270, Cutout 14 | 0.72 (best 0.70) | 200 | 10h05m |
| shake-shake-26 2x96d (S-S-I), Cutout 14 | 0.66 (best 0.63) | 200 | 6h46m |

Note

  • Results reported in the table are the test errors at the last epoch.
  • All models were trained using cosine annealing with an initial learning rate of 0.2.
  • A GeForce GTX 1080 Ti was used in these experiments.

Experiments

Experiment on residual units, learning rate scheduling, and data augmentation

In this experiment, the effects of the following on classification accuracy are investigated:

  • PyramidNet-like residual units
  • Cosine annealing of learning rate
  • Cutout
  • Random Erasing
  • Mixup
  • Preactivation of shortcuts after downsampling

ResNet-preact-56 is trained on CIFAR-10 with an initial learning rate of 0.2 in this experiment.

Note

  • The PyramidNet paper (arXiv:1610.02915) showed that removing the first ReLU in residual units and adding BN after the last convolution in residual units both improve classification accuracy.
  • The SGDR paper (arXiv:1608.03983) showed that cosine annealing improves classification accuracy even without restarts.

Results

  • PyramidNet-like residual units work.
    • It might be better not to preactivate shortcuts after downsampling when using PyramidNet-like units.
  • Cosine annealing slightly improves accuracy.
  • Cutout, RandomErasing, and Mixup all work well (a Cutout sketch is given below the results table).
    • Mixup needs longer training.

| Model | Test Error (median of 5 runs) | Training Time |
|---|---|---|
| w/ 1st ReLU, w/o last BN, preactivate shortcut after downsampling | 6.45 | 95 min |
| w/ 1st ReLU, w/o last BN | 6.47 | 95 min |
| w/o 1st ReLU, w/o last BN | 6.14 | 89 min |
| w/ 1st ReLU, w/ last BN | 6.43 | 104 min |
| w/o 1st ReLU, w/ last BN | 5.85 | 98 min |
| w/o 1st ReLU, w/ last BN, preactivate shortcut after downsampling | 6.27 | 98 min |
| w/o 1st ReLU, w/ last BN, Cosine annealing | 5.72 | 98 min |
| w/o 1st ReLU, w/ last BN, Cutout | 4.96 | 98 min |
| w/o 1st ReLU, w/ last BN, RandomErasing | 5.22 | 98 min |
| w/o 1st ReLU, w/ last BN, Mixup (300 epochs) | 5.11 | 191 min |
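
Cutout (and, similarly, RandomErasing) masks out a random patch of each training image; "Cutout 16" denotes a 16x16 patch. The transform below is a minimal, hypothetical sketch, not the repository's implementation (which follows the reference implementation cited in References).

```python
# Hedged sketch of a Cutout-style transform operating on a CHW tensor.
import torch

class Cutout:
    """Zero out one random square patch of side `length` (e.g. 16 for "Cutout 16")."""

    def __init__(self, length: int = 16):
        self.length = length

    def __call__(self, img: torch.Tensor) -> torch.Tensor:
        _, h, w = img.shape
        cy = torch.randint(h, (1,)).item()   # patch center; the patch may stick out of the image
        cx = torch.randint(w, (1,)).item()
        y1, y2 = max(0, cy - self.length // 2), min(h, cy + self.length // 2)
        x1, x2 = max(0, cx - self.length // 2), min(w, cx + self.length // 2)
        img = img.clone()
        img[:, y1:y2, x1:x2] = 0.0           # zero out the masked region
        return img
```
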
preactivate shortcut after downsampling
python train.py --config configs/cifar/resnet_preact.yaml \
    train.base_lr 0.2 \
    model.resnet_preact.depth 56 \
    model.resnet_preact.preact_stage '[True, True, True]' \
    model.resnet_preact.remove_first_relu False \
    model.resnet_preact.add_last_bn False \
    train.output_dir experiments/resnet_preact_after_downsampling/exp00

w/ 1st ReLU, w/o last BN
python train.py --config configs/cifar/resnet_preact.yaml \
    train.base_lr 0.2 \
    model.resnet_preact.depth 56 \
    model.resnet_preact.preact_stage '[True, False, False]' \
    model.resnet_preact.remove_first_relu False \
    model.resnet_preact.add_last_bn False \
    train.output_dir experiments/resnet_preact_w_relu_wo_bn/exp00

w/o 1st ReLU, w/o last BN
python train.py --config configs/cifar/resnet_preact.yaml \
    train.base_lr 0.2 \
    model.resnet_preact.depth 56 \
    model.resnet_preact.preact_stage '[True, False, False]' \
    model.resnet_preact.remove_first_relu True \
    model.resnet_preact.add_last_bn False \
    train.output_dir experiments/resnet_preact_wo_relu_wo_bn/exp00

w/ 1st ReLU, w/ last BN
python train.py --config configs/cifar/resnet_preact.yaml \
    train.base_lr 0.2 \
    model.resnet_preact.depth 56 \
    model.resnet_preact.preact_stage '[True, False, False]' \
    model.resnet_preact.remove_first_relu False \
    model.resnet_preact.add_last_bn True \
    train.output_dir experiments/resnet_preact_w_relu_w_bn/exp00

w/o 1st ReLU, w/ last BN
python train.py --config configs/cifar/resnet_preact.yaml \
    train.base_lr 0.2 \
    model.resnet_preact.depth 56 \
    model.resnet_preact.preact_stage '[True, False, False]' \
    model.resnet_preact.remove_first_relu True \
    model.resnet_preact.add_last_bn True \
    train.output_dir experiments/resnet_preact_wo_relu_w_bn/exp00

w/o 1st ReLU, w/ last BN, preactivate shortcut after downsampling
python train.py --config configs/cifar/resnet_preact.yaml \
    train.base_lr 0.2 \
    model.resnet_preact.depth 56 \
    model.resnet_preact.preact_stage '[True, True, True]' \
    model.resnet_preact.remove_first_relu True \
    model.resnet_preact.add_last_bn True \
    train.output_dir experiments/resnet_preact_after_downsampling_wo_relu_w_bn/exp00

w/o 1st ReLU, w/ last BN, cosine annealing
python train.py --config configs/cifar/resnet_preact.yaml \
    train.base_lr 0.2 \
    model.resnet_preact.depth 56 \
    model.resnet_preact.preact_stage '[True, False, False]' \
    model.resnet_preact.remove_first_relu True \
    model.resnet_preact.add_last_bn True \
    scheduler.type cosine \
    train.output_dir experiments/resnet_preact_wo_relu_w_bn_cosine/exp00

w/o 1st ReLU, w/ last BN, Cutout
python train.py --config configs/cifar/resnet_preact.yaml \
    train.base_lr 0.2 \
    model.resnet_preact.depth 56 \
    model.resnet_preact.preact_stage '[True, False, False]' \
    model.resnet_preact.remove_first_relu True \
    model.resnet_preact.add_last_bn True \
    augmentation.use_cutout True \
    train.output_dir experiments/resnet_preact_wo_relu_w_bn_cutout/exp00

w/o 1st ReLU, w/ last BN, RandomErasing
python train.py --config configs/cifar/resnet_preact.yaml \
    train.base_lr 0.2 \
    model.resnet_preact.depth 56 \
    model.resnet_preact.preact_stage '[True, False, False]' \
    model.resnet_preact.remove_first_relu True \
    model.resnet_preact.add_last_bn True \
    augmentation.use_random_erasing True \
    train.output_dir experiments/resnet_preact_wo_relu_w_bn_random_erasing/exp00

w/o 1st ReLU, w/ last BN, Mixup
python train.py --config configs/cifar/resnet_preact.yaml \
    train.base_lr 0.2 \
    model.resnet_preact.depth 56 \
    model.resnet_preact.preact_stage '[True, False, False]' \
    model.resnet_preact.remove_first_relu True \
    model.resnet_preact.add_last_bn True \
    augmentation.use_mixup True \
    train.output_dir experiments/resnet_preact_wo_relu_w_bn_mixup/exp00

Experiments on label smoothing, Mixup, RICAP, and Dual-Cutout

Results on CIFAR-10

| Model | Test Error (median of 3 runs) | # of Epochs | Training Time |
|---|---|---|---|
| ResNet-preact-20 | 7.60 | 200 | 24m |
| ResNet-preact-20, label smoothing (epsilon=0.001) | 7.51 | 200 | 25m |
| ResNet-preact-20, label smoothing (epsilon=0.01) | 7.21 | 200 | 25m |
| ResNet-preact-20, label smoothing (epsilon=0.1) | 7.57 | 200 | 25m |
| ResNet-preact-20, mixup (alpha=1) | 7.24 | 200 | 26m |
| ResNet-preact-20, RICAP (beta=0.3), w/ random crop | 6.88 | 200 | 28m |
| ResNet-preact-20, RICAP (beta=0.3) | 6.77 | 200 | 28m |
| ResNet-preact-20, Dual-Cutout 16 (alpha=0.1) | 6.24 | 200 | 45m |
| ResNet-preact-20 | 7.05 | 400 | 49m |
| ResNet-preact-20, label smoothing (epsilon=0.001) | 7.20 | 400 | 49m |
| ResNet-preact-20, label smoothing (epsilon=0.01) | 6.97 | 400 | 49m |
| ResNet-preact-20, label smoothing (epsilon=0.1) | 7.16 | 400 | 49m |
| ResNet-preact-20, mixup (alpha=1) | 6.66 | 400 | 51m |
| ResNet-preact-20, RICAP (beta=0.3), w/ random crop | 6.30 | 400 | 56m |
| ResNet-preact-20, RICAP (beta=0.3) | 6.19 | 400 | 56m |
| ResNet-preact-20, Dual-Cutout 16 (alpha=0.1) | 5.55 | 400 | 1h36m |

Note

  • Results reported in the table are the test errors at the last epoch.
  • All models were trained using cosine annealing with an initial learning rate of 0.2.
  • A GeForce GTX 1080 Ti was used in these experiments.
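
For reference, mixup blends pairs of training examples and their labels using a coefficient drawn from a Beta(alpha, alpha) distribution. The sketch below is a generic illustration of a mixup training step, not this repository's code.

```python
# Hedged sketch of a mixup training step (alpha is the Beta distribution parameter).
import numpy as np
import torch
import torch.nn.functional as F

def mixup_step(model, x, y, alpha=1.0):
    """One forward/loss computation with mixup; x: (N, C, H, W), y: (N,) integer labels."""
    lam = np.random.beta(alpha, alpha) if alpha > 0 else 1.0
    index = torch.randperm(x.size(0), device=x.device)
    mixed_x = lam * x + (1.0 - lam) * x[index]     # convex combination of two images
    logits = model(mixed_x)
    # The loss is the same convex combination of the two cross-entropy terms.
    loss = lam * F.cross_entropy(logits, y) + (1.0 - lam) * F.cross_entropy(logits, y[index])
    return loss
```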

Experiments on batch size and learning rate

  • The following experiments were done on the CIFAR-10 dataset using a GeForce GTX 1080 Ti.
  • Results reported in the tables are the test errors at the last epoch.

Linear scaling rule for learning rate

| Model | batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
|---|---|---|---|---|---|---|
| ResNet-preact-20 | 4096 | 3.2 | cosine | 200 | 10.57 | 22m |
| ResNet-preact-20 | 2048 | 1.6 | cosine | 200 | 8.87 | 21m |
| ResNet-preact-20 | 1024 | 0.8 | cosine | 200 | 8.40 | 21m |
| ResNet-preact-20 | 512 | 0.4 | cosine | 200 | 8.22 | 20m |
| ResNet-preact-20 | 256 | 0.2 | cosine | 200 | 8.61 | 22m |
| ResNet-preact-20 | 128 | 0.1 | cosine | 200 | 8.09 | 24m |
| ResNet-preact-20 | 64 | 0.05 | cosine | 200 | 8.22 | 28m |
| ResNet-preact-20 | 32 | 0.025 | cosine | 200 | 8.00 | 43m |
| ResNet-preact-20 | 16 | 0.0125 | cosine | 200 | 7.75 | 1h17m |
| ResNet-preact-20 | 8 | 0.006125 | cosine | 200 | 7.70 | 2h32m |

| Model | batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
|---|---|---|---|---|---|---|
| ResNet-preact-20 | 4096 | 3.2 | multistep | 200 | 28.97 | 22m |
| ResNet-preact-20 | 2048 | 1.6 | multistep | 200 | 9.07 | 21m |
| ResNet-preact-20 | 1024 | 0.8 | multistep | 200 | 8.62 | 21m |
| ResNet-preact-20 | 512 | 0.4 | multistep | 200 | 8.23 | 20m |
| ResNet-preact-20 | 256 | 0.2 | multistep | 200 | 8.40 | 21m |
| ResNet-preact-20 | 128 | 0.1 | multistep | 200 | 8.28 | 24m |
| ResNet-preact-20 | 64 | 0.05 | multistep | 200 | 8.13 | 28m |
| ResNet-preact-20 | 32 | 0.025 | multistep | 200 | 7.58 | 43m |
| ResNet-preact-20 | 16 | 0.0125 | multistep | 200 | 7.93 | 1h18m |
| ResNet-preact-20 | 8 | 0.006125 | multistep | 200 | 8.31 | 2h34m |
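
The linear scaling rule applied in the tables above scales the learning rate in proportion to the batch size (Goyal et al., arXiv:1706.02677); from the tables, the anchor appears to be a learning rate of 0.1 at batch size 128. A one-line sketch:

```python
# Linear scaling rule as used in the tables above: learning rate proportional to batch size,
# anchored (as an assumption read off the tables) at lr = 0.1 for batch size 128.
def scaled_lr(batch_size: int, base_lr: float = 0.1, base_batch_size: int = 128) -> float:
    return base_lr * batch_size / base_batch_size

for bs in (4096, 2048, 1024, 512, 256, 128, 64, 32):
    print(bs, scaled_lr(bs))   # 3.2, 1.6, 0.8, 0.4, 0.2, 0.1, 0.05, 0.025
```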

Linear scaling + longer training

| Model | batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
|---|---|---|---|---|---|---|
| ResNet-preact-20 | 4096 | 3.2 | cosine | 400 | 8.97 | 44m |
| ResNet-preact-20 | 2048 | 1.6 | cosine | 400 | 7.85 | 43m |
| ResNet-preact-20 | 1024 | 0.8 | cosine | 400 | 7.20 | 42m |
| ResNet-preact-20 | 512 | 0.4 | cosine | 400 | 7.83 | 40m |
| ResNet-preact-20 | 256 | 0.2 | cosine | 400 | 7.65 | 42m |
| ResNet-preact-20 | 128 | 0.1 | cosine | 400 | 7.09 | 47m |
| ResNet-preact-20 | 64 | 0.05 | cosine | 400 | 7.17 | 44m |
| ResNet-preact-20 | 32 | 0.025 | cosine | 400 | 7.24 | 2h11m |
| ResNet-preact-20 | 16 | 0.0125 | cosine | 400 | 7.26 | 4h10m |
| ResNet-preact-20 | 8 | 0.006125 | cosine | 400 | 7.02 | 7h53m |

| Model | batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
|---|---|---|---|---|---|---|
| ResNet-preact-20 | 4096 | 3.2 | cosine | 800 | 8.14 | 1h29m |
| ResNet-preact-20 | 2048 | 1.6 | cosine | 800 | 7.74 | 1h23m |
| ResNet-preact-20 | 1024 | 0.8 | cosine | 800 | 7.15 | 1h31m |
| ResNet-preact-20 | 512 | 0.4 | cosine | 800 | 7.27 | 1h25m |
| ResNet-preact-20 | 256 | 0.2 | cosine | 800 | 7.22 | 1h26m |
| ResNet-preact-20 | 128 | 0.1 | cosine | 800 | 6.68 | 1h35m |
| ResNet-preact-20 | 64 | 0.05 | cosine | 800 | 7.18 | 2h20m |
| ResNet-preact-20 | 32 | 0.025 | cosine | 800 | 7.03 | 4h16m |
| ResNet-preact-20 | 16 | 0.0125 | cosine | 800 | 6.78 | 8h37m |
| ResNet-preact-20 | 8 | 0.006125 | cosine | 800 | 6.89 | 16h47m |

Effect of initial learning rate

| Model | batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
|---|---|---|---|---|---|---|
| ResNet-preact-20 | 4096 | 3.2 | cosine | 200 | 10.57 | 22m |
| ResNet-preact-20 | 4096 | 1.6 | cosine | 200 | 10.32 | 22m |
| ResNet-preact-20 | 4096 | 0.8 | cosine | 200 | 10.71 | 22m |

| Model | batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
|---|---|---|---|---|---|---|
| ResNet-preact-20 | 2048 | 3.2 | cosine | 200 | 11.34 | 21m |
| ResNet-preact-20 | 2048 | 2.4 | cosine | 200 | 8.69 | 21m |
| ResNet-preact-20 | 2048 | 2.0 | cosine | 200 | 8.81 | 21m |
| ResNet-preact-20 | 2048 | 1.6 | cosine | 200 | 8.73 | 22m |
| ResNet-preact-20 | 2048 | 0.8 | cosine | 200 | 9.62 | 21m |

| Model | batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
|---|---|---|---|---|---|---|
| ResNet-preact-20 | 1024 | 3.2 | cosine | 200 | 9.12 | 21m |
| ResNet-preact-20 | 1024 | 2.4 | cosine | 200 | 8.42 | 22m |
| ResNet-preact-20 | 1024 | 2.0 | cosine | 200 | 8.38 | 22m |
| ResNet-preact-20 | 1024 | 1.6 | cosine | 200 | 8.07 | 22m |
| ResNet-preact-20 | 1024 | 1.2 | cosine | 200 | 8.25 | 21m |
| ResNet-preact-20 | 1024 | 0.8 | cosine | 200 | 8.08 | 22m |
| ResNet-preact-20 | 1024 | 0.4 | cosine | 200 | 8.49 | 22m |

| Model | batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
|---|---|---|---|---|---|---|
| ResNet-preact-20 | 512 | 3.2 | cosine | 200 | 8.51 | 21m |
| ResNet-preact-20 | 512 | 1.6 | cosine | 200 | 7.73 | 20m |
| ResNet-preact-20 | 512 | 0.8 | cosine | 200 | 7.73 | 21m |
| ResNet-preact-20 | 512 | 0.4 | cosine | 200 | 8.22 | 20m |

| Model | batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
|---|---|---|---|---|---|---|
| ResNet-preact-20 | 256 | 3.2 | cosine | 200 | 9.64 | 22m |
| ResNet-preact-20 | 256 | 1.6 | cosine | 200 | 8.32 | 22m |
| ResNet-preact-20 | 256 | 0.8 | cosine | 200 | 7.45 | 21m |
| ResNet-preact-20 | 256 | 0.4 | cosine | 200 | 7.68 | 22m |
| ResNet-preact-20 | 256 | 0.2 | cosine | 200 | 8.61 | 22m |

| Model | batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
|---|---|---|---|---|---|---|
| ResNet-preact-20 | 128 | 1.6 | cosine | 200 | 9.03 | 24m |
| ResNet-preact-20 | 128 | 0.8 | cosine | 200 | 7.54 | 24m |
| ResNet-preact-20 | 128 | 0.4 | cosine | 200 | 7.28 | 24m |
| ResNet-preact-20 | 128 | 0.2 | cosine | 200 | 7.96 | 24m |
| ResNet-preact-20 | 128 | 0.1 | cosine | 200 | 8.09 | 24m |
| ResNet-preact-20 | 128 | 0.05 | cosine | 200 | 8.81 | 24m |
| ResNet-preact-20 | 128 | 0.025 | cosine | 200 | 10.07 | 24m |

| Model | batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
|---|---|---|---|---|---|---|
| ResNet-preact-20 | 64 | 0.4 | cosine | 200 | 7.42 | 35m |
| ResNet-preact-20 | 64 | 0.2 | cosine | 200 | 7.52 | 36m |
| ResNet-preact-20 | 64 | 0.1 | cosine | 200 | 7.78 | 37m |
| ResNet-preact-20 | 64 | 0.05 | cosine | 200 | 8.22 | 28m |

| Model | batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
|---|---|---|---|---|---|---|
| ResNet-preact-20 | 32 | 0.2 | cosine | 200 | 7.64 | 1h05m |
| ResNet-preact-20 | 32 | 0.1 | cosine | 200 | 7.25 | 1h08m |
| ResNet-preact-20 | 32 | 0.05 | cosine | 200 | 7.45 | 1h07m |
| ResNet-preact-20 | 32 | 0.025 | cosine | 200 | 8.00 | 43m |

Good learning rate + longer training

| Model | batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
|---|---|---|---|---|---|---|
| ResNet-preact-20 | 4096 | 1.6 | cosine | 200 | 10.32 | 22m |
| ResNet-preact-20 | 2048 | 1.6 | cosine | 200 | 8.73 | 22m |
| ResNet-preact-20 | 1024 | 1.6 | cosine | 200 | 8.07 | 22m |
| ResNet-preact-20 | 1024 | 0.8 | cosine | 200 | 8.08 | 22m |
| ResNet-preact-20 | 512 | 1.6 | cosine | 200 | 7.73 | 20m |
| ResNet-preact-20 | 512 | 0.8 | cosine | 200 | 7.73 | 21m |
| ResNet-preact-20 | 256 | 0.8 | cosine | 200 | 7.45 | 21m |
| ResNet-preact-20 | 128 | 0.4 | cosine | 200 | 7.28 | 24m |
| ResNet-preact-20 | 128 | 0.2 | cosine | 200 | 7.96 | 24m |
| ResNet-preact-20 | 128 | 0.1 | cosine | 200 | 8.09 | 24m |

| Model | batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
|---|---|---|---|---|---|---|
| ResNet-preact-20 | 4096 | 1.6 | cosine | 800 | 8.36 | 1h33m |
| ResNet-preact-20 | 2048 | 1.6 | cosine | 800 | 7.53 | 1h27m |
| ResNet-preact-20 | 1024 | 1.6 | cosine | 800 | 7.30 | 1h30m |
| ResNet-preact-20 | 1024 | 0.8 | cosine | 800 | 7.42 | 1h30m |
| ResNet-preact-20 | 512 | 1.6 | cosine | 800 | 6.69 | 1h26m |
| ResNet-preact-20 | 512 | 0.8 | cosine | 800 | 6.77 | 1h26m |
| ResNet-preact-20 | 256 | 0.8 | cosine | 800 | 6.84 | 1h28m |
| ResNet-preact-20 | 128 | 0.4 | cosine | 800 | 6.86 | 1h35m |
| ResNet-preact-20 | 128 | 0.2 | cosine | 800 | 7.05 | 1h38m |
| ResNet-preact-20 | 128 | 0.1 | cosine | 800 | 6.68 | 1h35m |

| Model | batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
|---|---|---|---|---|---|---|
| ResNet-preact-20 | 4096 | 1.6 | cosine | 1600 | 8.25 | 3h10m |
| ResNet-preact-20 | 2048 | 1.6 | cosine | 1600 | 7.34 | 2h50m |
| ResNet-preact-20 | 1024 | 1.6 | cosine | 1600 | 6.94 | 2h52m |
| ResNet-preact-20 | 512 | 1.6 | cosine | 1600 | 6.99 | 2h44m |
| ResNet-preact-20 | 256 | 0.8 | cosine | 1600 | 6.95 | 2h50m |
| ResNet-preact-20 | 128 | 0.4 | cosine | 1600 | 6.64 | 3h09m |

| Model | batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
|---|---|---|---|---|---|---|
| ResNet-preact-20 | 4096 | 1.6 | cosine | 3200 | 9.52 | 6h15m |
| ResNet-preact-20 | 2048 | 1.6 | cosine | 3200 | 6.92 | 5h42m |
| ResNet-preact-20 | 1024 | 1.6 | cosine | 3200 | 6.96 | 5h43m |

| Model | batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
|---|---|---|---|---|---|---|
| ResNet-preact-20 | 2048 | 1.6 | cosine | 6400 | 7.45 | 11h44m |

LARS

  • In the original papers (arXiv:1708.03888, arXiv:1801.03137), polynomial decay learning rate scheduling was used, but cosine annealing is used in these experiments.
  • In this implementation, the LARS coefficient is not used, so the learning rate should be adjusted accordingly (see the sketch below).
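
For reference, LARS (Layer-wise Adaptive Rate Scaling, arXiv:1708.03888) rescales each parameter tensor's update by the ratio of its weight norm to its gradient norm. The sketch below is a simplified, hedged illustration (plain SGD with momentum as the base rule, no LARS/trust coefficient, no special handling of BN or bias parameters), not this repository's optimizer.

```python
# Hedged sketch of a simplified LARS-style SGD step.
import torch

@torch.no_grad()
def lars_sgd_step(params, momentum_buffers, lr, momentum=0.9, weight_decay=5e-4, eps=1e-9):
    """One step over `params` (tensors with .grad); momentum_buffers is a dict keyed by parameter."""
    for p in params:
        if p.grad is None:
            continue
        grad = p.grad + weight_decay * p               # add L2 regularization
        local_lr = p.norm() / (grad.norm() + eps)      # layer-wise (trust-ratio-like) scaling
        buf = momentum_buffers.setdefault(p, torch.zeros_like(p))
        buf.mul_(momentum).add_(grad, alpha=float(lr * local_lr))
        p.add_(buf, alpha=-1.0)                        # apply the scaled, momentum-filtered update
```
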
python train.py --config configs/cifar/resnet_preact.yaml \
    model.resnet_preact.depth 20 \
    train.optimizer lars \
    train.base_lr 0.02 \
    train.batch_size 4096 \
    scheduler.type cosine \
    train.output_dir experiments/resnet_preact_lars/exp00

| Model | optimizer | batch size | initial lr | lr schedule | # of Epochs | Test Error (median of 3 runs) | Training Time |
|---|---|---|---|---|---|---|---|
| ResNet-preact-20 | SGD | 4096 | 3.2 | cosine | 200 | 10.57 (1 run) | 22m |
| ResNet-preact-20 | SGD | 4096 | 1.6 | cosine | 200 | 10.20 | 22m |
| ResNet-preact-20 | SGD | 4096 | 0.8 | cosine | 200 | 10.71 (1 run) | 22m |
| ResNet-preact-20 | LARS | 4096 | 0.04 | cosine | 200 | 9.58 | 22m |
| ResNet-preact-20 | LARS | 4096 | 0.03 | cosine | 200 | 8.46 | 22m |
| ResNet-preact-20 | LARS | 4096 | 0.02 | cosine | 200 | 8.21 | 22m |
| ResNet-preact-20 | LARS | 4096 | 0.015 | cosine | 200 | 8.47 | 22m |
| ResNet-preact-20 | LARS | 4096 | 0.01 | cosine | 200 | 9.33 | 22m |
| ResNet-preact-20 | LARS | 4096 | 0.005 | cosine | 200 | 14.31 | 22m |

| Model | optimizer | batch size | initial lr | lr schedule | # of Epochs | Test Error (median of 3 runs) | Training Time |
|---|---|---|---|---|---|---|---|
| ResNet-preact-20 | SGD | 2048 | 3.2 | cosine | 200 | 11.34 (1 run) | 21m |
| ResNet-preact-20 | SGD | 2048 | 2.4 | cosine | 200 | 8.69 (1 run) | 21m |
| ResNet-preact-20 | SGD | 2048 | 2.0 | cosine | 200 | 8.81 (1 run) | 21m |
| ResNet-preact-20 | SGD | 2048 | 1.6 | cosine | 200 | 8.73 (1 run) | 22m |
| ResNet-preact-20 | SGD | 2048 | 0.8 | cosine | 200 | 9.62 (1 run) | 21m |
| ResNet-preact-20 | LARS | 2048 | 0.04 | cosine | 200 | 11.58 | 21m |
| ResNet-preact-20 | LARS | 2048 | 0.02 | cosine | 200 | 8.05 | 22m |
| ResNet-preact-20 | LARS | 2048 | 0.01 | cosine | 200 | 8.07 | 22m |
| ResNet-preact-20 | LARS | 2048 | 0.005 | cosine | 200 | 9.65 | 22m |

| Model | optimizer | batch size | initial lr | lr schedule | # of Epochs | Test Error (median of 3 runs) | Training Time |
|---|---|---|---|---|---|---|---|
| ResNet-preact-20 | SGD | 1024 | 3.2 | cosine | 200 | 9.12 (1 run) | 21m |
| ResNet-preact-20 | SGD | 1024 | 2.4 | cosine | 200 | 8.42 (1 run) | 22m |
| ResNet-preact-20 | SGD | 1024 | 2.0 | cosine | 200 | 8.38 (1 run) | 22m |
| ResNet-preact-20 | SGD | 1024 | 1.6 | cosine | 200 | 8.07 (1 run) | 22m |
| ResNet-preact-20 | SGD | 1024 | 1.2 | cosine | 200 | 8.25 (1 run) | 21m |
| ResNet-preact-20 | SGD | 1024 | 0.8 | cosine | 200 | 8.08 (1 run) | 22m |
| ResNet-preact-20 | SGD | 1024 | 0.4 | cosine | 200 | 8.49 (1 run) | 22m |
| ResNet-preact-20 | LARS | 1024 | 0.02 | cosine | 200 | 9.30 | 22m |
| ResNet-preact-20 | LARS | 1024 | 0.01 | cosine | 200 | 7.68 | 22m |
| ResNet-preact-20 | LARS | 1024 | 0.005 | cosine | 200 | 8.88 | 23m |

| Model | optimizer | batch size | initial lr | lr schedule | # of Epochs | Test Error (median of 3 runs) | Training Time |
|---|---|---|---|---|---|---|---|
| ResNet-preact-20 | SGD | 512 | 3.2 | cosine | 200 | 8.51 (1 run) | 21m |
| ResNet-preact-20 | SGD | 512 | 1.6 | cosine | 200 | 7.73 (1 run) | 20m |
| ResNet-preact-20 | SGD | 512 | 0.8 | cosine | 200 | 7.73 (1 run) | 21m |
| ResNet-preact-20 | SGD | 512 | 0.4 | cosine | 200 | 8.22 (1 run) | 20m |
| ResNet-preact-20 | LARS | 512 | 0.015 | cosine | 200 | 9.84 | 23m |
| ResNet-preact-20 | LARS | 512 | 0.01 | cosine | 200 | 8.05 | 23m |
| ResNet-preact-20 | LARS | 512 | 0.0075 | cosine | 200 | 7.58 | 23m |
| ResNet-preact-20 | LARS | 512 | 0.005 | cosine | 200 | 7.96 | 23m |
| ResNet-preact-20 | LARS | 512 | 0.0025 | cosine | 200 | 8.83 | 23m |

| Model | optimizer | batch size | initial lr | lr schedule | # of Epochs | Test Error (median of 3 runs) | Training Time |
|---|---|---|---|---|---|---|---|
| ResNet-preact-20 | SGD | 256 | 3.2 | cosine | 200 | 9.64 (1 run) | 22m |
| ResNet-preact-20 | SGD | 256 | 1.6 | cosine | 200 | 8.32 (1 run) | 22m |
| ResNet-preact-20 | SGD | 256 | 0.8 | cosine | 200 | 7.45 (1 run) | 21m |
| ResNet-preact-20 | SGD | 256 | 0.4 | cosine | 200 | 7.68 (1 run) | 22m |
| ResNet-preact-20 | SGD | 256 | 0.2 | cosine | 200 | 8.61 (1 run) | 22m |
| ResNet-preact-20 | LARS | 256 | 0.01 | cosine | 200 | 8.95 | 27m |
| ResNet-preact-20 | LARS | 256 | 0.005 | cosine | 200 | 7.75 | 28m |
| ResNet-preact-20 | LARS | 256 | 0.0025 | cosine | 200 | 8.21 | 28m |

| Model | optimizer | batch size | initial lr | lr schedule | # of Epochs | Test Error (median of 3 runs) | Training Time |
|---|---|---|---|---|---|---|---|
| ResNet-preact-20 | SGD | 128 | 1.6 | cosine | 200 | 9.03 (1 run) | 24m |
| ResNet-preact-20 | SGD | 128 | 0.8 | cosine | 200 | 7.54 (1 run) | 24m |
| ResNet-preact-20 | SGD | 128 | 0.4 | cosine | 200 | 7.28 (1 run) | 24m |
| ResNet-preact-20 | SGD | 128 | 0.2 | cosine | 200 | 7.96 (1 run) | 24m |
| ResNet-preact-20 | LARS | 128 | 0.005 | cosine | 200 | 7.96 | 37m |
| ResNet-preact-20 | LARS | 128 | 0.0025 | cosine | 200 | 7.98 | 37m |
| ResNet-preact-20 | LARS | 128 | 0.00125 | cosine | 200 | 9.21 | 37m |

| Model | optimizer | batch size | initial lr | lr schedule | # of Epochs | Test Error (median of 3 runs) | Training Time |
|---|---|---|---|---|---|---|---|
| ResNet-preact-20 | SGD | 4096 | 1.6 | cosine | 200 | 10.20 | 22m |
| ResNet-preact-20 | SGD | 4096 | 1.6 | cosine | 800 | 8.36 (1 run) | 1h33m |
| ResNet-preact-20 | SGD | 4096 | 1.6 | cosine | 1600 | 8.25 (1 run) | 3h10m |
| ResNet-preact-20 | LARS | 4096 | 0.02 | cosine | 200 | 8.21 | 22m |
| ResNet-preact-20 | LARS | 4096 | 0.02 | cosine | 400 | 7.53 | 44m |
| ResNet-preact-20 | LARS | 4096 | 0.02 | cosine | 800 | 7.48 | 1h29m |
| ResNet-preact-20 | LARS | 4096 | 0.02 | cosine | 1600 | 7.37 (1 run) | 2h58m |

Ghost BN

python train.py --config configs/cifar/resnet_preact.yaml \
    model.resnet_preact.depth 20 \
    train.base_lr 1.5 \
    train.batch_size 4096 \
    train.subdivision 32 \
    scheduler.type cosine \
    train.output_dir experiments/resnet_preact_ghost_batch/exp00
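
Ghost Batch Normalization (Hoffer et al., arXiv:1705.08741) computes BN statistics over fixed-size "ghost" sub-batches of a large batch; `train.subdivision 32` above suggests the 4096-sample batch is split into 32 chunks of 128. The class below is a hedged, generic sketch of the idea, not this repository's implementation.

```python
# Hedged sketch of ghost batch normalization: apply BN separately to fixed-size chunks.
import torch
import torch.nn as nn

class GhostBatchNorm2d(nn.BatchNorm2d):
    """BatchNorm2d whose statistics are computed over sub-batches of `ghost_batch_size`."""

    def __init__(self, num_features, ghost_batch_size=128, **kwargs):
        super().__init__(num_features, **kwargs)
        self.ghost_batch_size = ghost_batch_size

    def forward(self, x):
        if not self.training or x.size(0) <= self.ghost_batch_size:
            return super().forward(x)
        chunks = x.split(self.ghost_batch_size, dim=0)   # e.g. 4096 -> 32 chunks of 128
        # Note: running statistics are updated once per chunk in this simple sketch.
        return torch.cat(
            [super(GhostBatchNorm2d, self).forward(c) for c in chunks], dim=0)
```
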
| Model | batch size | ghost batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
|---|---|---|---|---|---|---|---|
| ResNet-preact-20 | 8192 | N/A | 1.6 | cosine | 200 | 12.35 | 25m* |
| ResNet-preact-20 | 4096 | N/A | 1.6 | cosine | 200 | 10.32 | 22m |
| ResNet-preact-20 | 2048 | N/A | 1.6 | cosine | 200 | 8.73 | 22m |
| ResNet-preact-20 | 1024 | N/A | 1.6 | cosine | 200 | 8.07 | 22m |
| ResNet-preact-20 | 128 | N/A | 0.4 | cosine | 200 | 7.28 | 24m |

| Model | batch size | ghost batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
|---|---|---|---|---|---|---|---|
| ResNet-preact-20 | 8192 | 128 | 1.6 | cosine | 200 | 11.51 | 27m |
| ResNet-preact-20 | 4096 | 128 | 1.6 | cosine | 200 | 9.73 | 25m |
| ResNet-preact-20 | 2048 | 128 | 1.6 | cosine | 200 | 8.77 | 24m |
| ResNet-preact-20 | 1024 | 128 | 1.6 | cosine | 200 | 7.82 | 22m |

| Model | batch size | ghost batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
|---|---|---|---|---|---|---|---|
| ResNet-preact-20 | 8192 | N/A | 1.6 | cosine | 1600 | | |
| ResNet-preact-20 | 4096 | N/A | 1.6 | cosine | 1600 | 8.25 | 3h10m |
| ResNet-preact-20 | 2048 | N/A | 1.6 | cosine | 1600 | 7.34 | 2h50m |
| ResNet-preact-20 | 1024 | N/A | 1.6 | cosine | 1600 | 6.94 | 2h52m |

| Model | batch size | ghost batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
|---|---|---|---|---|---|---|---|
| ResNet-preact-20 | 8192 | 128 | 1.6 | cosine | 1600 | 11.83 | 3h37m |
| ResNet-preact-20 | 4096 | 128 | 1.6 | cosine | 1600 | 8.95 | 3h15m |
| ResNet-preact-20 | 2048 | 128 | 1.6 | cosine | 1600 | 7.23 | 3h05m |
| ResNet-preact-20 | 1024 | 128 | 1.6 | cosine | 1600 | 7.08 | 2h59m |

No weight decay on BN

python train.py --config configs/cifar/resnet_preact.yaml \
    model.resnet_preact.depth 20 \
    train.base_lr 1.6 \
    train.batch_size 4096 \
    train.no_weight_decay_on_bn True \
    train.weight_decay 5e-4 \
    scheduler.type cosine \
    train.output_dir experiments/resnet_preact_no_weight_decay_on_bn/exp00
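
Excluding BN parameters from weight decay is usually done by placing them in a separate optimizer parameter group with `weight_decay=0`. The helper below is a hedged, generic sketch of that pattern, not necessarily how this repository splits the parameters.

```python
# Hedged sketch: SGD parameter groups with weight decay disabled for BN parameters.
import torch
import torch.nn as nn

def build_optimizer(model: nn.Module, lr=1.6, momentum=0.9, weight_decay=5e-4):
    bn_params, other_params = [], []
    for module in model.modules():
        params = list(module.parameters(recurse=False))
        if isinstance(module, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
            bn_params.extend(params)
        else:
            other_params.extend(params)
    return torch.optim.SGD(
        [{"params": other_params, "weight_decay": weight_decay},
         {"params": bn_params, "weight_decay": 0.0}],   # no weight decay on BN
        lr=lr, momentum=momentum, nesterov=True)
```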

| Model | weight decay on BN | weight decay | batch size | initial lr | lr schedule | # of Epochs | Test Error (median of 3 runs) | Training Time |
|---|---|---|---|---|---|---|---|---|
| ResNet-preact-20 | yes | 5e-4 | 4096 | 1.6 | cosine | 200 | 10.81 | 22m |
| ResNet-preact-20 | yes | 4e-4 | 4096 | 1.6 | cosine | 200 | 10.88 | 22m |
| ResNet-preact-20 | yes | 3e-4 | 4096 | 1.6 | cosine | 200 | 10.96 | 22m |
| ResNet-preact-20 | yes | 2e-4 | 4096 | 1.6 | cosine | 200 | 9.30 | 22m |
| ResNet-preact-20 | yes | 1e-4 | 4096 | 1.6 | cosine | 200 | 10.20 | 22m |
| ResNet-preact-20 | no | 5e-4 | 4096 | 1.6 | cosine | 200 | 8.78 | 22m |
| ResNet-preact-20 | no | 4e-4 | 4096 | 1.6 | cosine | 200 | 9.83 | 22m |
| ResNet-preact-20 | no | 3e-4 | 4096 | 1.6 | cosine | 200 | 9.90 | 22m |
| ResNet-preact-20 | no | 2e-4 | 4096 | 1.6 | cosine | 200 | 9.64 | 22m |
| ResNet-preact-20 | no | 1e-4 | 4096 | 1.6 | cosine | 200 | 10.38 | 22m |

| Model | weight decay on BN | weight decay | batch size | initial lr | lr schedule | # of Epochs | Test Error (median of 3 runs) | Training Time |
|---|---|---|---|---|---|---|---|---|
| ResNet-preact-20 | yes | 5e-4 | 2048 | 1.6 | cosine | 200 | 8.46 | 20m |
| ResNet-preact-20 | yes | 4e-4 | 2048 | 1.6 | cosine | 200 | 8.35 | 20m |
| ResNet-preact-20 | yes | 3e-4 | 2048 | 1.6 | cosine | 200 | 7.76 | 20m |
| ResNet-preact-20 | yes | 2e-4 | 2048 | 1.6 | cosine | 200 | 8.09 | 20m |
| ResNet-preact-20 | yes | 1e-4 | 2048 | 1.6 | cosine | 200 | 8.83 | 20m |
| ResNet-preact-20 | no | 5e-4 | 2048 | 1.6 | cosine | 200 | 8.49 | 20m |
| ResNet-preact-20 | no | 4e-4 | 2048 | 1.6 | cosine | 200 | 7.98 | 20m |
| ResNet-preact-20 | no | 3e-4 | 2048 | 1.6 | cosine | 200 | 8.26 | 20m |
| ResNet-preact-20 | no | 2e-4 | 2048 | 1.6 | cosine | 200 | 8.47 | 20m |
| ResNet-preact-20 | no | 1e-4 | 2048 | 1.6 | cosine | 200 | 9.27 | 20m |

| Model | weight decay on BN | weight decay | batch size | initial lr | lr schedule | # of Epochs | Test Error (median of 3 runs) | Training Time |
|---|---|---|---|---|---|---|---|---|
| ResNet-preact-20 | yes | 5e-4 | 1024 | 1.6 | cosine | 200 | 8.45 | 21m |
| ResNet-preact-20 | yes | 4e-4 | 1024 | 1.6 | cosine | 200 | 7.91 | 21m |
| ResNet-preact-20 | yes | 3e-4 | 1024 | 1.6 | cosine | 200 | 7.81 | 21m |
| ResNet-preact-20 | yes | 2e-4 | 1024 | 1.6 | cosine | 200 | 7.69 | 21m |
| ResNet-preact-20 | yes | 1e-4 | 1024 | 1.6 | cosine | 200 | 8.26 | 21m |
| ResNet-preact-20 | no | 5e-4 | 1024 | 1.6 | cosine | 200 | 8.08 | 21m |
| ResNet-preact-20 | no | 4e-4 | 1024 | 1.6 | cosine | 200 | 7.73 | 21m |
| ResNet-preact-20 | no | 3e-4 | 1024 | 1.6 | cosine | 200 | 7.92 | 21m |
| ResNet-preact-20 | no | 2e-4 | 1024 | 1.6 | cosine | 200 | 7.93 | 21m |
| ResNet-preact-20 | no | 1e-4 | 1024 | 1.6 | cosine | 200 | 8.53 | 21m |

Experiments on half-precision and mixed-precision training

  • The following experiments need NVIDIA Apex.
  • The following experiments were done on the CIFAR-10 dataset using a GeForce GTX 1080 Ti, which does not have Tensor Cores.
  • Results reported in the tables are the test errors at the last epoch.

FP16 training

python train.py --config configs/cifar/resnet_preact.yaml \
    model.resnet_preact.depth 20 \
    train.base_lr 1.6 \
    train.batch_size 4096 \
    train.precision O3 \
    scheduler.type cosine \
    train.output_dir experiments/resnet_preact_fp16/exp00

Mixed-precision training

python train.py --config configs/cifar/resnet_preact.yaml \
    model.resnet_preact.depth 20 \
    train.base_lr 1.6 \
    train.batch_size 4096 \
    train.precision O1 \
    scheduler.type cosine \
    train.output_dir experiments/resnet_preact_mixed_precision/exp00
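
The `train.precision O1`/`O3` options above correspond to NVIDIA Apex amp opt levels (O1 is mixed precision, O3 is essentially pure FP16). The snippet below is a hedged sketch of the usual apex.amp pattern, not the repository's exact training loop, and requires Apex plus a CUDA device.

```python
# Hedged sketch of the NVIDIA Apex amp pattern (O1 = mixed precision, O3 = pure FP16).
import torch
import torch.nn as nn
import torch.nn.functional as F
from apex import amp

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10)).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

model, optimizer = amp.initialize(model, optimizer, opt_level="O1")  # or "O3" for FP16

images = torch.randn(128, 3, 32, 32, device="cuda")    # dummy CIFAR-10-sized batch
targets = torch.randint(0, 10, (128,), device="cuda")

optimizer.zero_grad()
loss = F.cross_entropy(model(images), targets)
with amp.scale_loss(loss, optimizer) as scaled_loss:    # dynamic loss scaling
    scaled_loss.backward()
optimizer.step()
```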

Results

| Model | precision | batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
|---|---|---|---|---|---|---|---|
| ResNet-preact-20 | FP32 | 8192 | 1.6 | cosine | 200 | | |
| ResNet-preact-20 | FP32 | 4096 | 1.6 | cosine | 200 | 10.32 | 22m |
| ResNet-preact-20 | FP32 | 2048 | 1.6 | cosine | 200 | 8.73 | 22m |
| ResNet-preact-20 | FP32 | 1024 | 1.6 | cosine | 200 | 8.07 | 22m |
| ResNet-preact-20 | FP32 | 512 | 0.8 | cosine | 200 | 7.73 | 21m |
| ResNet-preact-20 | FP32 | 256 | 0.8 | cosine | 200 | 7.45 | 21m |
| ResNet-preact-20 | FP32 | 128 | 0.4 | cosine | 200 | 7.28 | 24m |

| Model | precision | batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
|---|---|---|---|---|---|---|---|
| ResNet-preact-20 | FP16 | 8192 | 1.6 | cosine | 200 | 48.52 | 33m |
| ResNet-preact-20 | FP16 | 4096 | 1.6 | cosine | 200 | 49.84 | 28m |
| ResNet-preact-20 | FP16 | 2048 | 1.6 | cosine | 200 | 75.63 | 27m |
| ResNet-preact-20 | FP16 | 1024 | 1.6 | cosine | 200 | 19.09 | 27m |
| ResNet-preact-20 | FP16 | 512 | 0.8 | cosine | 200 | 7.89 | 26m |
| ResNet-preact-20 | FP16 | 256 | 0.8 | cosine | 200 | 7.40 | 28m |
| ResNet-preact-20 | FP16 | 128 | 0.4 | cosine | 200 | 7.59 | 32m |

| Model | precision | batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
|---|---|---|---|---|---|---|---|
| ResNet-preact-20 | mixed | 8192 | 1.6 | cosine | 200 | 11.78 | 28m |
| ResNet-preact-20 | mixed | 4096 | 1.6 | cosine | 200 | 10.48 | 27m |
| ResNet-preact-20 | mixed | 2048 | 1.6 | cosine | 200 | 8.98 | 26m |
| ResNet-preact-20 | mixed | 1024 | 1.6 | cosine | 200 | 8.05 | 26m |
| ResNet-preact-20 | mixed | 512 | 0.8 | cosine | 200 | 7.81 | 28m |
| ResNet-preact-20 | mixed | 256 | 0.8 | cosine | 200 | 7.58 | 32m |
| ResNet-preact-20 | mixed | 128 | 0.4 | cosine | 200 | 7.37 | 41m |

Results using Tesla V100

| Model | precision | batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
|---|---|---|---|---|---|---|---|
| ResNet-preact-20 | FP32 | 8192 | 1.6 | cosine | 200 | 12.35 | 25m |
| ResNet-preact-20 | FP32 | 4096 | 1.6 | cosine | 200 | 9.88 | 19m |
| ResNet-preact-20 | FP32 | 2048 | 1.6 | cosine | 200 | 8.87 | 17m |
| ResNet-preact-20 | FP32 | 1024 | 1.6 | cosine | 200 | 8.45 | 18m |
| ResNet-preact-20 | mixed | 8192 | 1.6 | cosine | 200 | 11.92 | 25m |
| ResNet-preact-20 | mixed | 4096 | 1.6 | cosine | 200 | 10.16 | 19m |
| ResNet-preact-20 | mixed | 2048 | 1.6 | cosine | 200 | 9.10 | 17m |
| ResNet-preact-20 | mixed | 1024 | 1.6 | cosine | 200 | 7.84 | 16m |

References

Model architecture

  • He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Deep Residual Learning for Image Recognition." The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016. arXiv:1512.03385
  • He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Identity Mappings in Deep Residual Networks." In European Conference on Computer Vision (ECCV), 2016. arXiv:1603.05027, Torch implementation
  • Zagoruyko, Sergey, and Nikos Komodakis. "Wide Residual Networks." Proceedings of the British Machine Vision Conference (BMVC), 2016. arXiv:1605.07146, Torch implementation
  • Huang, Gao, Zhuang Liu, Kilian Q. Weinberger, and Laurens van der Maaten. "Densely Connected Convolutional Networks." The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017. arXiv:1608.06993, Torch implementation
  • Han, Dongyoon, Jiwhan Kim, and Junmo Kim. "Deep Pyramidal Residual Networks." The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017. arXiv:1610.02915, Torch implementation, Caffe implementation, PyTorch implementation
  • Xie, Saining, Ross Girshick, Piotr Dollar, Zhuowen Tu, and Kaiming He. "Aggregated Residual Transformations for Deep Neural Networks." The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017. arXiv:1611.05431, Torch implementation
  • Gastaldi, Xavier. "Shake-Shake regularization of 3-branch residual networks." In International Conference on Learning Representations (ICLR) Workshop, 2017. arXiv:1705.07485, Torch implementation
  • Hu, Jie, Li Shen, and Gang Sun. "Squeeze-and-Excitation Networks." The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 7132-7141. arXiv:1709.01507, Caffe implementation
  • Huang, Gao, Zhuang Liu, Geoff Pleiss, Laurens van der Maaten, and Kilian Q. Weinberger. "Convolutional Networks with Dense Connectivity." IEEE Transactions on Pattern Analysis and Machine Intelligence (2019). arXiv:2001.02394

Regularization, data augmentation

  • Szegedy, Christian, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew Wojna. "Rethinking the Inception Architecture for Computer Vision." The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016. arXiv:1512.00567
  • DeVries, Terrance, and Graham W. Taylor. "Improved Regularization of Convolutional Neural Networks with Cutout." arXiv preprint arXiv:1708.04552 (2017). PyTorch implementation
  • Abu-El-Haija, Sami. "Proportionate Gradient Updates with PercentDelta." arXiv preprint arXiv:1708.07227 (2017).
  • Zhong, Zhun, Liang Zheng, Guoliang Kang, Shaozi Li, and Yi Yang. "Random Erasing Data Augmentation." arXiv preprint arXiv:1708.04896 (2017). PyTorch implementation
  • Zhang, Hongyi, Moustapha Cisse, Yann N. Dauphin, and David Lopez-Paz. "mixup: Beyond Empirical Risk Minimization." In International Conference on Learning Representations (ICLR), 2017. arXiv:1710.09412
  • Kawaguchi, Kenji, Yoshua Bengio, Vikas Verma, and Leslie Pack Kaelbling. "Towards Understanding Generalization via Analytical Learning Theory." arXiv preprint arXiv:1802.07426 (2018). PyTorch implementation
  • Takahashi, Ryo, Takashi Matsubara, and Kuniaki Uehara. "Data Augmentation using Random Image Cropping and Patching for Deep CNNs." Proceedings of The 10th Asian Conference on Machine Learning (ACML), 2018. arXiv:1811.09030
  • Yun, Sangdoo, Dongyoon Han, Seong Joon Oh, Sanghyuk Chun, Junsuk Choe, and Youngjoon Yoo. "CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features." arXiv preprint arXiv:1905.04899 (2019).

Large batch

  • Keskar, Nitish Shirish, Dheevatsa Mudigere, Jorge Nocedal, Mikhail Smelyanskiy, and Ping Tak Peter Tang. "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima." In International Conference on Learning Representations (ICLR), 2017. arXiv:1609.04836
  • Hoffer, Elad, Itay Hubara, and Daniel Soudry. "Train longer, generalize better: closing the generalization gap in large batch training of neural networks." In Advances in Neural Information Processing Systems (NIPS), 2017. arXiv:1705.08741, PyTorch implementation
  • Goyal, Priya, Piotr Dollar, Ross Girshick, Pieter Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, and Kaiming He. "Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour." arXiv preprint arXiv:1706.02677 (2017).
  • You, Yang, Igor Gitman, and Boris Ginsburg. "Large Batch Training of Convolutional Networks." arXiv preprint arXiv:1708.03888 (2017).
  • You, Yang, Zhao Zhang, Cho-Jui Hsieh, James Demmel, and Kurt Keutzer. "ImageNet Training in Minutes." arXiv preprint arXiv:1709.05011 (2017).
  • Smith, Samuel L., Pieter-Jan Kindermans, Chris Ying, and Quoc V. Le. "Don't Decay the Learning Rate, Increase the Batch Size." In International Conference on Learning Representations (ICLR), 2018. arXiv:1711.00489
  • Gitman, Igor, Deepak Dilipkumar, and Ben Parr. "Convergence Analysis of Gradient Descent Algorithms with Proportional Updates." arXiv preprint arXiv:1801.03137 (2018). TensorFlow implementation
  • Jia, Xianyan, Shutao Song, Wei He, Yangzihao Wang, Haidong Rong, Feihu Zhou, Liqiang Xie, Zhenyu Guo, Yuanzhou Yang, Liwei Yu, Tiegang Chen, Guangxiao Hu, Shaohuai Shi, and Xiaowen Chu. "Highly Scalable Deep Learning Training System with Mixed-Precision: Training ImageNet in Four Minutes." arXiv preprint arXiv:1807.11205 (2018).
  • Shallue, Christopher J., Jaehoon Lee, Joseph Antognini, Jascha Sohl-Dickstein, Roy Frostig, and George E. Dahl. "Measuring the Effects of Data Parallelism on Neural Network Training." arXiv preprint arXiv:1811.03600 (2018).
  • Ying, Chris, Sameer Kumar, Dehao Chen, Tao Wang, and Youlong Cheng. "Image Classification at Supercomputer Scale." In Advances in Neural Information Processing Systems (NeurIPS) Workshop, 2018. arXiv:1811.06992

Others

  • Loshchilov, Ilya, and Frank Hutter. "SGDR: Stochastic Gradient Descent with Warm Restarts." In International Conference on Learning Representations (ICLR), 2017. arXiv:1608.03983, Lasagne implementation
  • Micikevicius, Paulius, Sharan Narang, Jonah Alben, Gregory Diamos, Erich Elsen, David Garcia, Boris Ginsburg, Michael Houston, Oleksii Kuchaiev, Ganesh Venkatesh, and Hao Wu. "Mixed Precision Training." In International Conference on Learning Representations (ICLR), 2018. arXiv:1710.03740
  • Recht, Benjamin, Rebecca Roelofs, Ludwig Schmidt, and Vaishaal Shankar. "Do CIFAR-10 Classifiers Generalize to CIFAR-10?" arXiv preprint arXiv:1806.00451 (2018).
  • He, Tong, Zhi Zhang, Hang Zhang, Zhongyue Zhang, Junyuan Xie, and Mu Li. "Bag of Tricks for Image Classification with Convolutional Neural Networks." arXiv preprint arXiv:1812.01187 (2018).
