
SimpNet

This repository contains the architectures, pretrained models, logs, etc. pertaining to the SimpNet paper (Towards Principled Design of Deep Convolutional Networks: Introducing SimpNet): https://arxiv.org/abs/1802.06205

Abstract :

Major winning Convolutional Neural Networks (CNNs), such as VGGNet, ResNet, DenseNet, etc., include tens to hundreds of millions of parameters, which impose considerable computation and memory overheads. This limits their practical usage in training and optimizing for real-world applications. On the contrary, light-weight architectures, such as SqueezeNet, are being proposed to address this issue. However, they mainly suffer from low accuracy, as they have compromised between processing power and efficiency. These inefficiencies mostly stem from following an ad-hoc design procedure. In this work, we discuss and propose several crucial design principles for an efficient architecture design and elaborate intuitions concerning different aspects of the design procedure. Furthermore, we introduce a new layer called SAF-pooling to improve the generalization power of the network while keeping it simple by choosing the best features. Based on such principles, we propose a simple architecture called SimpNet. We empirically show that SimpNet provides a good trade-off between computation/memory efficiency and accuracy solely based on these primitive but crucial principles. SimpNet outperforms deeper and more complex architectures such as VGGNet, ResNet, WideResidualNet, etc., on several well-known benchmarks, while having 2 to 25 times fewer parameters and operations. We obtain state-of-the-art results (in terms of a balance between accuracy and the number of involved parameters) on standard datasets, such as CIFAR10, CIFAR100, MNIST and SVHN.

The main contributions of this work are as follows:

  1. Introducing several crucial principles for designing deep convolutional architectures, which are backed up by extensive experiments and discussions in comparison with the literature.

  2. Based on such principles, it puts to the test the validity of some previously considered best practices, such as strided convolutions vs. max-pooling and overlapped vs. non-overlapped pooling. Furthermore, it tries to provide an intuitive understanding of why one should be used instead of the other.

  3. A new architecture called SimpNet is proposed to verify the mentioned principles. Based on these design principles, the architecture becomes superior to its predecessor (SimpleNet) while retaining the same number of parameters and the same simplicity of design, and it outperforms deeper and more complex (2 to 25X larger) architectures, such as Wide Residual Networks, ResNet, FMax, etc., on a series of highly competitive benchmark datasets (e.g., CIFAR10/100, SVHN and MNIST). A rough sketch of the kind of block stack these principles lead to is given after this list.
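
For orientation only, here is a minimal PyTorch sketch of the kind of homogeneous 3×3 conv + Batch-Normalization + ReLU stack with pooled downsampling that these principles point to. The widths, depth and pooling placement below are assumptions made for illustration, not the published SimpNet configuration; see the architecture files in this repository for the actual definitions.

```python
# Hypothetical SimpNet-style stack: uniform 3x3 conv + BN + ReLU blocks with
# occasional max-pooling. Widths/depth/pooling placement are placeholders.
import torch
import torch.nn as nn

def conv_block(cin, cout):
    # 3x3 convolution followed by Batch-Normalization and ReLU
    return nn.Sequential(
        nn.Conv2d(cin, cout, kernel_size=3, padding=1),
        nn.BatchNorm2d(cout),
        nn.ReLU(inplace=True),
    )

class SimpNetLikeNet(nn.Module):
    def __init__(self, num_classes=10, widths=(64, 128, 128, 256)):
        super().__init__()
        layers, cin = [], 3
        for i, w in enumerate(widths):
            layers.append(conv_block(cin, w))
            cin = w
            if i in (1, 2):  # downsample a couple of times (placement is illustrative)
                layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
        self.features = nn.Sequential(*layers)
        self.classifier = nn.Linear(cin, num_classes)

    def forward(self, x):
        x = self.features(x)
        x = torch.amax(x, dim=(2, 3))  # global max-pool over the spatial dims
        return self.classifier(x)

print(SimpNetLikeNet()(torch.randn(2, 3, 32, 32)).shape)  # torch.Size([2, 10])
```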

Citation

If you find SimpNet useful in your research, please consider citing:

```
@article{hasanpour2018towards,
  title={Towards Principled Design of Deep Convolutional Networks: Introducing SimpNet},
  author={Hasanpour, Seyyed Hossein and Rouhani, Mohammad and Fayyaz, Mohsen and Sabokrou, Mohammad and Adeli, Ehsan},
  journal={arXiv preprint arXiv:1802.06205},
  year={2018}
}
```

Contents :

1- Results Overview

2- Data-Augmentation / Preprocessing

3- Principle Experiments Overview

Results Overview :

Top CIFAR10/100 results:

| Method | #Params | CIFAR10 | CIFAR100 |
| --- | --- | --- | --- |
| VGGNet(16L) / Enhanced | 138m | 91.4 / 92.45 | - |
| ResNet-110L / 1202L * | 1.7/10.2m | 93.57 / 92.07 | 74.84 / 72.18 |
| SD-110L / 1202L | 1.7/10.2m | 94.77 / 95.09 | 75.42 / - |
| WRN-(16/8)/(28/10) | 11/36m | 95.19 / 95.83 | 77.11 / 79.5 |
| DenseNet | 27.2m | 96.26 | 80.75 |
| Highway Network | N/A | 92.40 | 67.76 |
| FitNet | 1M | 91.61 | 64.96 |
| FMP* (1 test) | 12M | 95.50 | 73.61 |
| Max-out (k=2) | 6M | 90.62 | 65.46 |
| Network in Network | 1M | 91.19 | 64.32 |
| DSN | 1M | 92.03 | 65.43 |
| Max-out NIN | - | 93.25 | 71.14 |
| LSUV | N/A | 94.16 | N/A |
| SimpNet | 5.4M | 95.69 | 78.16 |
| SimpNet | 8.9M | 96.12 | 79.53 |
| SimpNet (†) | 15M | 96.20 | 80.29 |
| SimpNet (†) | 25M | 96.29 | N/A |

(†): Unfinished tests; the results are not final and training continues. These models were tested without any hyperparameter tuning, only to show how they perform compared to DenseNet and WRN. As the preliminary results show, they outperform both architectures. Full details will be provided after the tests are finished.

Top SVHN results:

| Method | Error rate |
| --- | --- |
| Network in Network | 2.35 |
| Deeply Supervised Net | 1.92 |
| ResNet (reported by (2016)) | 2.01 |
| ResNet with Stochastic Depth | 1.75 |
| DenseNet | 1.79 - 1.59 |
| Wide ResNet | 2.08 - 1.64 |
| SimpNet | 1.648 |
  • The slim version achieves 1.95% error rate.

Top MNIST results:

| Method | Error rate |
| --- | --- |
| Batch-normalized Max-out NIN | 0.24% |
| Max-out network (k=2) | 0.45% |
| Network In Network | 0.45% |
| Deeply Supervised Network | 0.39% |
| RCNN-96 | 0.31% |
| SimpNet | 0.25% |
  • The slim version achieves 99.73% accuracy.

Slim Version Results on CIFAR10/100 :

| Model | Params | CIFAR10 | CIFAR100 |
| --- | --- | --- | --- |
| SimpNet | 300K - 600K | 93.25 - 94.03 | 68.47 - 71.74 |
| Maxout | 6M | 90.62 | 65.46 |
| DSN | 1M | 92.03 | 65.43 |
| ALLCNN | 1.3M | 92.75 | 66.29 |
| dasNet | 6M | 90.78 | 66.22 |
| ResNet (Depth 32, tested by us) | 475K | 93.22 | 67.37 - 68.95 |
| WRN | 600K | 93.15 | 69.11 |
| NIN | 1M | 91.19 | - |

Data-Augmentation and Preprocessing :

As indicated in the paper, CIFAR10/100 use zero-padding and horizontal flipping. The script used for preprocessing CIFAR10/100 can be accessed from here.
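
A minimal sketch of the stated augmentation (zero-padding with random crop plus horizontal flipping), written here with torchvision for convenience; the repository's own preprocessing script (linked above) is the authoritative version, and the normalization statistics below are commonly used CIFAR-10 values, not taken from it.

```python
# Zero-padding + random crop + horizontal flip for CIFAR-10, as an illustration.
import torchvision.transforms as T
from torchvision.datasets import CIFAR10

train_transform = T.Compose([
    T.RandomCrop(32, padding=4),   # zero-pad by 4 px, then crop back to 32x32
    T.RandomHorizontalFlip(),      # horizontal flipping
    T.ToTensor(),
    # assumed normalization constants (common CIFAR-10 statistics)
    T.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
])

train_set = CIFAR10(root="./data", train=True, download=True, transform=train_transform)
```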

Principle Experiments : A Quick Overview :

Here is a quick overview of the tests conducted for every principle.
For the complete discussion and further explanations concerning these experiments, please read the paper.

Gradual Expansion with Minimum Allocation:

| Network Properties | Parameters | Accuracy (%) |
| --- | --- | --- |
| Arch1, 8 Layers | 300K | 90.21 |
| Arch1, 9 Layers | 300K | 90.55 |
| Arch1, 10 Layers | 300K | 90.61 |
| Arch1, 13 Layers | 300K | 89.78 |

Demonstrating how gradually expanding the network helps obtain better performance. Increasing the depth up to a certain point (10 layers here) improves the accuracy, after which it starts to degrade, indicating that the PLD issue is taking place.

| Network Properties | Parameters | Accuracy (%) |
| --- | --- | --- |
| Arch1, 6 Layers | 1.1M | 92.18 |
| Arch1, 10 Layers | 570K | 92.23 |

Shallow vs. deep: a gradual increase in depth can yield better performance with fewer parameters.
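
The trade behind this principle is easy to make concrete with some arithmetic: if the total parameter budget is held roughly fixed (here ~300K, as in the table above) while depth grows, each layer has to get narrower. The sketch below assumes a plain uniform-width 3×3 conv stack; the actual Arch1 width schedule is not given in this README.

```python
# Back-of-the-envelope view of "gradual expansion with minimum allocation":
# fixed ~300K budget, varying depth, uniform-width 3x3 conv stack (assumed).
def stack_params(depth, width, cin=3, k=3):
    """Parameter count of `depth` conv layers, all `width` channels wide."""
    total, c = 0, cin
    for _ in range(depth):
        total += c * width * k * k + width   # weights + biases
        c = width
    return total

def max_width(depth, budget=300_000):
    """Largest uniform width whose stack still fits in the budget."""
    w = 1
    while stack_params(depth, w + 1) <= budget:
        w += 1
    return w

for depth in (8, 9, 10, 13):
    w = max_width(depth)
    print(f"{depth:2d} layers -> width {w:3d} ({stack_params(depth, w):,} params)")
```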

Correlation Preservation:

| Network Properties | Parameters | Accuracy (%) |
| --- | --- | --- |
| Arch4, (3 × 3) | 300K | 90.21 |
| Arch4, (3 × 3) | 1.6M | 92.14 |
| Arch4, (5 × 5) | 1.6M | 90.99 |
| Arch4, (7 × 7) | 300K (v1) | 86.09 |
| Arch4, (7 × 7) | 300K (v2) | 88.57 |
| Arch4, (7 × 7) | 1.6M | 89.22 |

Accuracy for different combinations of kernel sizes and number of network parameters, which demonstrates how correlation preservation can directly affect the overall accuracy.
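
The pattern in the table also has a simple arithmetic side: at a fixed parameter budget, a larger kernel forces narrower layers. The channel counts and per-layer budget below are hypothetical, chosen only to illustrate the effect, and are not the Arch4 configuration.

```python
# Illustrative only: how much output width fits in a fixed per-layer budget
# as the kernel size grows (hypothetical channel counts, not Arch4).
def conv_params(cin, cout, k):
    return cin * cout * k * k + cout        # weights + biases

budget = 160_000                            # rough per-layer share of a 1.6M-param net
cin = 128                                   # assumed input width
for k in (3, 5, 7):
    cout = budget // (cin * k * k + 1)      # widest output layer the budget allows
    print(f"{k}x{k}: {cout} output channels, {conv_params(cin, cout, k):,} params")

# Two stacked 3x3 layers also cover the same receptive field as one 5x5 layer
# with fewer weights: 2 * (9 * c * c) vs. 25 * c * c for equal channel width c.
```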

| Network Properties | Params | Accuracy (%) |
| --- | --- | --- |
| Arch5, 13 Layers, (1 × 1) / (2 × 2) (early layers) | 128K | 87.71 / 88.50 |
| Arch5, 13 Layers, (1 × 1) / (2 × 2) (middle layers) | 128K | 88.16 / 88.51 |
| Arch5, 13 Layers, (1 × 1) / (3 × 3) (smaller / bigger, end-avg) | 128K | 89.45 / 89.60 |
| Arch5, 11 Layers, (2 × 2) / (3 × 3) (bigger learned feature-maps) | 128K | 89.30 / 89.44 |

Different kernel sizes applied to different parts of a network affect the overall performance; the kernel sizes that preserve the correlation the most yield the best accuracy. Correlation is also more important in early layers than in later ones.

SqueezeNet test on CIFAR10 vs SimpNet (slim version).

| Network | Params | Accuracy (%) |
| --- | --- | --- |
| SqueezeNet1.1_default | 768K | 88.60 |
| SqueezeNet1.1_optimized | 768K | 92.20 |
| SimpNet_Slim | 300K | 93.25 |
| SimpNet_Slim | 600K | 94.03 |

Correlation preservation: SqueezeNet vs. SimpNet on CIFAR10. By optimized we mean that we added Batch-Normalization to all layers and used the same optimization policy we used to train SimpNet.

Maximum Information Utilization:

| Network Properties | Parameters | Accuracy (%) |
| --- | --- | --- |
| Arch3, L5 default | 53K | 79.09 |
| Arch3, L3 early pooling | 53K | 77.34 |
| Arch3, L7 delayed pooling | 53K | 79.44 |

The effect of using pooling at different layers. Applying pooling early in the network adversely affects the performance.

| Network Properties | Depth | Parameters | Accuracy (%) |
| --- | --- | --- | --- |
| SimpNet (*) | 13 | 360K | 69.28 |
| SimpNet (*) | 15 | 360K | 68.89 |
| SimpNet (†) | 15 | 360K | 68.10 |
| ResNet (*) | 32 | 460K | 93.75 |
| ResNet (†) | 32 | 460K | 93.46 |

Effect of using strided convolution (†) vs. max-pooling (*). Max-pooling outperforms strided convolution regardless of the specific architecture. The first three rows are tested on CIFAR100 and the last two on CIFAR10.
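
For clarity, here is a minimal sketch of the two downsampling options being compared, written in PyTorch: (†) a strided 3×3 convolution that downsamples by itself, and (*) a stride-1 convolution followed by 2×2 max-pooling. The channel widths are placeholders, not taken from the architectures above.

```python
# Two ways to halve spatial resolution; channel widths are placeholders.
import torch.nn as nn

def downsample_strided(cin, cout):
    # (†) the convolution does the downsampling itself
    return nn.Sequential(
        nn.Conv2d(cin, cout, kernel_size=3, stride=2, padding=1),
        nn.BatchNorm2d(cout),
        nn.ReLU(inplace=True),
    )

def downsample_maxpool(cin, cout):
    # (*) the convolution keeps full resolution, pooling halves it afterwards
    return nn.Sequential(
        nn.Conv2d(cin, cout, kernel_size=3, stride=1, padding=1),
        nn.BatchNorm2d(cout),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(kernel_size=2, stride=2),
    )
```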

Maximum Performance Utilization:

The table below shows the performance and elapsed time when different kernels are used; (3 × 3) performs best among them.

| Network Properties | (3 × 3) | (5 × 5) | (7 × 7) |
| --- | --- | --- | --- |
| Accuracy (higher is better) | 92.14 | 90.99 | 89.22 |
| Elapsed time in minutes (lower is better) | 41.32 | 45.29 | 64.52 |

Maximum performance utilization using Caffe and cuDNN v6; the networks have 1.6M parameters and the same depth.
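
The timing comparison is easy to reproduce in spirit. The sketch below simply times same-width convolutions of different kernel sizes with PyTorch; the numbers in the table above were measured with Caffe and cuDNN v6 at equal parameter counts, so the absolute values will differ.

```python
# Rough kernel-size timing sketch (not the paper's Caffe/cuDNN v6 setup).
import time
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(64, 128, 32, 32, device=device)

for k in (3, 5, 7):
    conv = nn.Conv2d(128, 128, kernel_size=k, padding=k // 2).to(device)
    for _ in range(3):            # warm-up
        conv(x)
    if device == "cuda":
        torch.cuda.synchronize()
    t0 = time.perf_counter()
    with torch.no_grad():
        for _ in range(20):       # timed forward passes
            conv(x)
    if device == "cuda":
        torch.cuda.synchronize()
    print(f"{k}x{k}: {time.perf_counter() - t0:.3f}s for 20 forward passes")
```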

Balanced Distribution Scheme:

| Network Properties | Parameters | Accuracy (%) |
| --- | --- | --- |
| Arch2, 10 Layers (wide end) | 8M | 95.19 |
| Arch2, 10 Layers (balanced width) | 8M | 95.51 |
| Arch2, 13 Layers (wide end) | 128K | 87.20 |
| Arch2, 13 Layers (balanced width) | 128K | 89.70 |

The balanced distribution scheme is demonstrated using two variants of the SimpNet architecture with 10 and 13 layers; each shows how the difference in allocation results in varying performance, with the variant that distributes its units in a balanced way ultimately coming out ahead.
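
To make "same budget, different allocation" concrete, the sketch below counts the parameters of two made-up 3×3 width schedules with roughly equal (~1M) totals: one that piles most units at the end and one that spreads them more evenly. The schedules are invented for illustration; the real Arch2 widths are defined in the paper and architecture files.

```python
# Compare two hypothetical width schedules at a roughly equal parameter budget.
def stack_params(widths, cin=3, k=3):
    total, c = 0, cin
    for w in widths:
        total += c * w * k * k + w   # weights + biases per layer
        c = w
    return total

wide_end = [32, 32, 32, 32, 32, 32, 32, 64, 192, 496]         # units piled at the end
balanced = [64, 64, 96, 96, 96, 128, 128, 128, 160, 160]      # spread more evenly

print("wide end :", f"{stack_params(wide_end):,}", "params")
print("balanced :", f"{stack_params(balanced):,}", "params")
```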

Rapid Prototyping In Isolation:

| Network Properties | Accuracy (%) |
| --- | --- |
| Use of (3 × 3) filters | 90.21 |
| Use of (5 × 5) instead of (3 × 3) | 90.99 |

The importance of experiment isolation: the same architecture, once using (3 × 3) and then using (5 × 5) kernels.

| Network Properties | Accuracy (%) |
| --- | --- |
| Use of (5 × 5) filters at the beginning | 89.53 |
| Use of (5 × 5) filters at the end | 90.15 |

Wrong interpretation of results when experiments are not compared under equal conditions (experimental isolation).

Simple Adaptive Feature Composition Pooling (SAFC Pooling) :

SAFC Pooling

| Network Properties | With SAF | Without SAF |
| --- | --- | --- |
| SqueezeNet v1.1 | 88.05 (avg) | 87.74 (avg) |
| SimpNet-Slim | 94.76 | 94.68 |

Using the SAF-pooling operation improves architecture performance. Tests are run on CIFAR10.
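
As one rough, unofficial reading of the idea, SAF-pooling can be thought of as max-pooling applied to feature maps in which some activations have been dropped, so the pooled output has to rely on the strongest surviving features. The sketch below encodes that reading with a plain Dropout2d + MaxPool2d pair; it is an interpretation for illustration only, not the authors' SAF-pooling layer. Refer to the paper and the files in this repository for the actual definition.

```python
# Unofficial sketch of one reading of SAF-pooling: max-pool a dropout-perturbed
# feature map so that strong features must be picked even when some are missing.
import torch.nn as nn

class SAFPoolSketch(nn.Module):
    def __init__(self, drop_p=0.1, kernel_size=2, stride=2):
        super().__init__()
        self.drop = nn.Dropout2d(p=drop_p)   # randomly zero whole feature maps
        self.pool = nn.MaxPool2d(kernel_size=kernel_size, stride=stride)

    def forward(self, x):
        return self.pool(self.drop(x))
```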

Dropout Utilization:

Generalization Examples:
