yashkant/enas-quantized-nets
This project combines the architecture search strategy (micro search only) from Efficient Neural Architecture Search with the search space of Quantized Neural Networks.
Efficient Neural Architecture Search (ENAS) removes a major computational bottleneck of NAS algorithms by sharing (reusing) parameters across child models, and it delivers strong empirical performance.
In ENAS, a controller discovers neural network architectures by searching for an optimal subgraph within a large computational graph, which can be visualized as a directed acyclic graph (DAG). Each child model is one such subgraph.
The controller is trained with policy gradient to select a subgraph that maximizes the expected reward on a validation set. Meanwhile, the model corresponding to the selected subgraph is trained to minimize a canonical cross-entropy loss. Sharing parameters among child models allows ENAS to deliver strong empirical performance.
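The policy-gradient idea can be illustrated with a toy sketch. It is not the controller from this repository: it assumes a single categorical decision, a hypothetical `reward_fn` standing in for validation accuracy, and a moving-average baseline to reduce variance. All names (`reinforce_step`, `toy_rewards`, `bl_decay`) are made up for illustration.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()

def reinforce_step(logits, reward_fn, baseline, rng, lr=0.1, bl_decay=0.9):
    """One REINFORCE update for a single categorical decision (toy setting)."""
    probs = softmax(logits)
    action = rng.choice(len(logits), p=probs)
    reward = reward_fn(action)
    # d/d(logits) of log pi(action) equals one_hot(action) - probs.
    grad_log_prob = -probs
    grad_log_prob[action] += 1.0
    # Gradient ascent on (reward - baseline) * log pi(action).
    new_logits = logits + lr * (reward - baseline) * grad_log_prob
    # Exponential moving average of observed rewards.
    new_baseline = bl_decay * baseline + (1.0 - bl_decay) * reward
    return new_logits, new_baseline, action, reward

# Hypothetical rewards: pretend choice 1 yields the best validation accuracy.
toy_rewards = np.array([0.2, 0.9, 0.5])
rng = np.random.default_rng(0)
logits = np.zeros(3)
baseline = 0.0
for _ in range(2000):
    logits, baseline, _, _ = reinforce_step(
        logits, lambda a: toy_rewards[a], baseline, rng
    )
best_action = int(np.argmax(logits))
```

In the real search the controller makes a sequence of such decisions per cell, and the reward comes from evaluating the sampled child model on validation data.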
During the forward pass, Quantized Neural Networks drastically reduce memory size and accesses, and replace most arithmetic operations with bit-wise operations. As a result, power consumption is expected to be drastically reduced.
In this project the child model, defined in micro_child.py, is built using binary, ternary, or quantized convolutional layers.
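To make the three weight representations concrete, here is a minimal NumPy sketch of the forward mappings only. The function names, the ternary `threshold`, and the uniform quantization scheme are assumptions for illustration; the actual rules in binary_ops.py, ternary_ops.py, and quantized_ops.py may differ, and training additionally needs a gradient estimator that is not shown here.

```python
import numpy as np

def binarize(w):
    """Deterministic binarization: map every weight to -1 or +1."""
    return np.where(w >= 0, 1.0, -1.0)

def ternarize(w, threshold=0.5):
    """Map weights to {-1, 0, +1}; weights with |w| <= threshold become 0."""
    return np.sign(w) * (np.abs(w) > threshold)

def quantize(w, num_bits=3):
    """Uniform quantization of weights clipped to [-1, 1] (assumed scheme)."""
    scale = 2 ** (num_bits - 1) - 1
    return np.round(np.clip(w, -1.0, 1.0) * scale) / scale

weights = np.array([-1.3, -0.4, 0.0, 0.2, 0.8])
binary_weights = binarize(weights)
ternary_weights = ternarize(weights)
quantized_weights = quantize(weights)
```

With binary or ternary weights, multiplications reduce to sign flips or skips, which is where the memory and power savings described above come from.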
To run this project you will have to create a new virtualenv with Python 3 and install the needed dependencies there. I have written a small bash script, setup.sh, that creates the virtual environment, installs the needed dependencies, and replaces the base_layer.py file in your virtual environment's Keras installation.
    chmod +x setup.sh
    ./setup.sh
Once you execute this you're ready to go!
The weight sharing mechanism works by initializing the weights of the DAG only once and reusing them over the iterations. The methods used for this are create_weight and create_bias, defined in common_ops.py.
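The create-once, reuse-afterwards idea can be sketched without any deep learning framework. The function names below mirror those in common_ops.py, but the bodies are hypothetical: the `_SHARED` registry, the NumPy arrays, and the initializers are stand-ins for whatever variables the repository actually creates.

```python
import numpy as np

# Hypothetical registry mapping a weight name to its (single) array.
_SHARED = {}

def create_weight(name, shape, rng=None):
    """Return the weight registered under `name`, creating it on first use."""
    if name not in _SHARED:
        if rng is None:
            rng = np.random.default_rng(0)
        _SHARED[name] = rng.normal(0.0, 0.1, size=shape)
    return _SHARED[name]

def create_bias(name, shape):
    """Return the bias registered under `name`, creating zeros on first use."""
    if name not in _SHARED:
        _SHARED[name] = np.zeros(shape)
    return _SHARED[name]

# Two child models asking for the same edge of the DAG get the same array.
w_first = create_weight("cell_0/node_2/conv_3x3", (3, 3, 16, 16))
w_second = create_weight("cell_0/node_2/conv_3x3", (3, 3, 16, 16))
shared = w_first is w_second
```

Because both lookups return the same object, an update made while training one child model is visible to every later child model that uses the same edge.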
In the author's code, these weights are added to the layers using the tf.nn module in TensorFlow, which allows the user to set custom weights on a new layer.
To implement the quantization we use custom Keras layers, and there is no provision to set reusable weights for these. After following the call stack, I found that the needed functionality could be added by tweaking the self.add_weight method, which is defined in the ./environment/lib/python3.x/site-packages/keras/engine/base_layer.py file of the Keras installation folder.
I modified this method slightly so that it now allows custom weights to be set on the layers. It is definitely not a good idea to make such changes in your global installation of Keras, and I strongly suggest using a virtual environment for this.
Please read this blog to learn how custom Keras layers are written. I also have a separate mini-project that contains code to build these quantized networks, and it may be a good idea to look there before reading the code in this repository.
The skeletal overview of the project is as follows:
├── binarize/
│   ├── binary_layers.py                  # Custom binary layers are defined in Keras
│   └── binary_ops.py                     # Binarization functions for weights and activations
│
├── ternarize/
│   ├── ternary_layers.py                 # Custom ternarized layers are defined in Keras
│   └── ternary_ops.py                    # Ternarization functions for weights and activations
│
├── quantize/
│   ├── quantized_layers.py               # Custom quantized layers are defined in Keras
│   └── quantized_ops.py                  # Quantization functions for weights and activations
│
├── enas/
│   ├── data_utils.py & data_utils_cifar.py   # Code to pre-process and import datasets
│   ├── micro_controller.py               # Builds the controller graph
│   ├── common_ops.py                     # Contains methods needed for reusing weights
│   ├── models.py & controller.py         # Base classes for MicroChild and MicroController
│   ├── utils.py                          # Methods to build training operations graph
│   └── micro_child.py                    # Builds the graph for child model from the architecture string
│
├── main_controller_child_trainer.py      # Defines experiment settings and runs architecture search
└── main_child_trainer.py                 # Trains given architecture till convergence
For the MNIST experiment, extract the three zip files stored in data/mnist into the same folder. For the CIFAR10 experiment, follow the directions in the file cifar10_dataset.txt.
To run the architecture search, you can edit the experiment configurations in search_arc_cifar.py and search_arc_mnist.py for CIFAR10 and MNIST respectively.
Then use one of the following commands to run the experiment:

    python search_arc_cifar.py >> cifar_search.txt
    python search_arc_mnist.py >> mnist_search.txt
All the output will be redirected to the cifar_search.txt / mnist_search.txt file.
In the output file, after each training cycle for the controller, we sample 10 architectures and output the validation accuracy of each of them.
Image Source: Efficient Neural Architecture Search via Parameter Sharing
The output for the architectures will be logged as follows:
    Epoch 181: Eval
    Eval at 77830
    valid_accuracy: 0.9612
    Eval at 77830
    Test Num examples: 10000
    test_accuracy: 0.9622
    epoch = 181 ch_step = 77850 loss = 0.127491 lr = 0.0456 |g| = 0.2030 tr_acc = 108/128 mins = 549.07
    ..
    Epoch 182: Training controller
    ctrl_step = 5430 loss = 0.266 ent = 49.17 lr = 0.0035 |g| = 0.0002 acc = 0.9688 bl = 0.97 mins = 550.96
    ..
    Here are 10 architectures
    [0 2 1 4 1 3 0 1 1 0 1 0 1 2 0 4 0 0 0 1]   # Denotes the architecture for normal cell
    [1 3 1 4 0 1 1 1 1 2 1 4 3 2 0 2 1 1 0 3]   # Denotes the architecture for reduction cell
    val_acc = 0.9688
    ---------------------------------------------------
    ..
    [0 0 0 1 1 0 1 3 1 1 0 3 1 1 1 0 0 4 0 0]
    [1 0 1 4 1 1 1 0 1 2 0 4 0 1 4 0 0 0 0 2]
    val_acc = 0.9531
    ---------------------------------------------------
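Picking the best sampled architecture out of a long log by hand is tedious, so a small helper can do it. The sketch below is not part of the repository; it assumes that each bracketed list and each `val_acc = ...` entry sits on its own line, with the normal cell printed before the reduction cell. The name `best_architecture` is hypothetical.

```python
import re

ARC_LINE = re.compile(r"^\[([0-9 ]+)\]")
ACC_LINE = re.compile(r"^val_acc\s*=\s*([0-9.]+)")

def best_architecture(log_text):
    """Return (val_acc, normal_cell, reduction_cell) with the highest val_acc,
    or None if the log contains no complete architecture entry."""
    best = None
    pending = []  # bracketed lists seen since the last val_acc line
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        arc_match = ARC_LINE.match(line)
        if arc_match:
            pending.append([int(tok) for tok in arc_match.group(1).split()])
            continue
        acc_match = ACC_LINE.match(line)
        if acc_match:
            if len(pending) == 2:
                score = float(acc_match.group(1))
                if best is None or score > best[0]:
                    best = (score, pending[0], pending[1])
            pending = []
    return best

example_log = """Here are 10 architectures
[0 2 1 4]  # Denotes the architecture for normal cell
[1 3 1 4]  # Denotes the architecture for reduction cell
val_acc = 0.9688
---------------------------------------------------
[0 0 0 1]
[1 0 1 4]
val_acc = 0.9531
"""
result = best_architecture(example_log)
```

For a real run, read the file first, for example `best_architecture(open("mnist_search.txt").read())`.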
The architecture with the highest validation accuracy needs to be trained till convergence. The two lists printed above denote the architecture of the normal cell and the reduction cell.
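A cell list can be unpacked into per-node decisions. The sketch below assumes the encoding of the original ENAS micro search space, where each node is described by four integers: the index of a first input, the operation applied to it, the index of a second input, and the operation applied to that. Whether this repository keeps exactly that layout, and which operation each id refers to, should be checked in micro_child.py; `decode_cell` is a hypothetical helper.

```python
def decode_cell(arc):
    """Split a flat cell list into (x_input, x_op, y_input, y_op) tuples,
    assuming four integers per node as in the original ENAS micro search."""
    if len(arc) % 4 != 0:
        raise ValueError("expected a multiple of 4 entries, got %d" % len(arc))
    return [tuple(arc[i:i + 4]) for i in range(0, len(arc), 4)]

normal_cell = [0, 2, 1, 4, 1, 3, 0, 1, 1, 0, 1, 0, 1, 2, 0, 4, 0, 0, 0, 1]
for node_id, (x_in, x_op, y_in, y_op) in enumerate(decode_cell(normal_cell)):
    print("node %d: op %d on input %d, op %d on input %d"
          % (node_id, x_op, x_in, y_op, y_in))
```

Under this assumption a 20-entry list describes a cell with five nodes.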
To train an architecture till convergence, pass the architecture string as a parameter to the train_arc_mnist.py or train_arc_cifar.py file.
The architecture string is just the concatenation of the normal cell and the reduction cell, see below:
    Given Architecture:
    [0 2 1 4 1 3 0 1 1 0 1 0 1 2 0 4 0 0 0 1]   # Denotes the architecture for normal cell
    [1 3 1 4 0 1 1 1 1 2 1 4 3 2 0 2 1 1 0 3]   # Denotes the architecture for reduction cell

    The architecture string becomes:
    "0 2 1 4 1 3 0 1 1 0 1 0 1 2 0 4 0 0 0 1 1 3 1 4 0 1 1 1 1 2 1 4 3 2 0 2 1 1 0 3"
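The concatenation is easy to get wrong when copying by hand, so here is a small sketch that builds the string from the two lists. `to_fixed_arc` is a hypothetical helper, not a function from this repository.

```python
def to_fixed_arc(normal_cell, reduction_cell):
    """Join the normal cell followed by the reduction cell into one
    space-separated architecture string."""
    return " ".join(str(v) for v in list(normal_cell) + list(reduction_cell))

normal = [0, 2, 1, 4, 1, 3, 0, 1, 1, 0, 1, 0, 1, 2, 0, 4, 0, 0, 0, 1]
reduction = [1, 3, 1, 4, 0, 1, 1, 1, 1, 2, 1, 4, 3, 2, 0, 2, 1, 1, 0, 3]
fixed_arc = to_fixed_arc(normal, reduction)
```

The resulting `fixed_arc` is the value to quote on the command line when training the architecture.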
Now, to train the architecture in the above example you can use the commands below:

    python train_arc_mnist.py -fixed_arc "0 2 1 4 1 3 0 1 1 0 1 0 1 2 0 4 0 0 0 1 1 3 1 4 0 1 1 1 1 2 1 4 3 2 0 2 1 1 0 3" >> mnist_arc.txt
    python train_arc_cifar.py -fixed_arc "0 2 1 4 1 3 0 1 1 0 1 0 1 2 0 4 0 0 0 1 1 3 1 4 0 1 1 1 1 2 1 4 3 2 0 2 1 1 0 3" >> cifar_arc.txt
All the output will be redirected to the mnist_arc.txt / cifar_arc.txt file.
If you find this code useful, please consider citing the original work by the authors:
@article{pham2018efficient,
  title={Efficient Neural Architecture Search via Parameter Sharing},
  author={Pham, Hieu and Guan, Melody Y and Zoph, Barret and Le, Quoc V and Dean, Jeff},
  journal={arXiv preprint arXiv:1802.03268},
  year={2018}
}

@article{Hubara2017QuantizedNN,
  title={Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations},
  author={Itay Hubara and Matthieu Courbariaux and Daniel Soudry and Ran El-Yaniv and Yoshua Bengio},
  journal={Journal of Machine Learning Research},
  year={2017},
  volume={18},
  pages={187:1-187:30}
}
This work wouldn't have been possible without the help from the following repos: