tianyic/only_train_once_personal_footprint (Public)


OTOv1-v3, NeurIPS, ICLR, TMLR, DNN Training, Compression, Structured Pruning, Erasing Operators, CNN, Diffusion, LLM

License

MIT license


Only Train Once (OTO): Automatic One-Shot DNN Training And Compression Framework

Note: Repository Migration!

We greatly thank our community for its support and interest. The Only-Train-Once (OTO) project will be migrated to and maintained under the Microsoft open-source site. Please check out the new home, microsoft/only_train_once, and kindly help star ⭐, fork, or watch.

To distinguish the two, the current repository will be renamed as only_train_once_personal_footprint to commemorate the past research and development efforts on this series of works 😊.

This repository is the (deprecated) PyTorch implementation of Only-Train-Once (OTO). OTO is an $\color{LimeGreen}{\textbf{automatic}}$, $\color{LightCoral}{\textbf{architecture}}$ $\color{LightCoral}{\textbf{agnostic}}$ DNN $\color{Orange}{\textbf{training}}$ and $\color{Violet}{\textbf{compression}}$ (via $\color{CornflowerBlue}{\textbf{structure pruning}}$ and $\color{DarkGoldenRod}{\textbf{erasing}}$ operators) framework. With OTO, users can train a general DNN, either from scratch or from a pretrained checkpoint, to achieve both high performance and a slimmer architecture simultaneously in a one-shot manner (without fine-tuning).

Publications

Please find our series of works below, with BibTeX entries for citation.

HESSO: Towards Automatic Efficient and User Friendly Any Neural Network Training and Pruning preprint.
OTOv3: Automatic Architecture-Agnostic Neural Network Training and Compression from Structured Pruning to Erasing Operators preprint.
LoRAShear: Efficient Large Language Model Structured Pruning and Knowledge Recovery, Hugging Face #1 Paper of the Day.
An Adaptive Half-Space Projection Method for Stochastic Optimization Problems with Group Sparse Regularization in TMLR 2023.
OTOv2: Automatic, Generic, User-Friendly in ICLR 2023.
Only Train Once (OTO): A One-Shot Neural Network Training And Pruning Framework in NeurIPS 2021.

In addition, we recommend the following efficient ML works of ours.

DREAM: Diffusion Rectification and Estimation-Adaptive Models, efficient diffusion training, in CVPR 2024.
DISTILLM: Towards Streamlined Distillation for Large Language Models, LLM distillation, in ICML 2024.

Note: we will release the report on the HESSO optimizer this June. Thanks to our community for the interest and support.

Installation

We recommend running the framework under pytorch>=2.0. Use pip or git clone to install.

pip install only_train_once

git clone https://github.com/tianyic/only_train_once.git
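After installing, it can be handy to confirm that the environment matches the recommendation above. The following is a minimal sketch using only the Python standard library. It assumes the importable module names are torch and only_train_once; the helper names parse_major_minor and check_environment are hypothetical and not part of OTO.

```python
import importlib.util
import re
from importlib import metadata


def parse_major_minor(version):
    """Extract (major, minor) from a version string such as '2.1.0+cu118'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def check_environment(min_torch=(2, 0)):
    """Report whether torch and only_train_once look usable, without importing them."""
    report = {"torch_installed": False, "torch_ok": False, "oto_installed": False}
    if importlib.util.find_spec("torch") is not None:
        report["torch_installed"] = True
        try:
            installed = parse_major_minor(metadata.version("torch"))
            report["torch_ok"] = installed >= min_torch
        except (metadata.PackageNotFoundError, ValueError):
            # Version metadata is missing or unparsable; leave torch_ok as False.
            pass
    # Assumed module name, taken from the pip package name above.
    report["oto_installed"] = importlib.util.find_spec("only_train_once") is not None
    return report


if __name__ == "__main__":
    print(check_environment())
```

If oto_installed is False after a git clone, the cloned folder is probably not on the Python path yet.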

Quick Start

We provide an example of OTO framework usage. More detailed explanations can be found in tutorials.

Minimal usage example.

import torch
from sanity_check.backends import densenet121
from only_train_once import OTO

# Create OTO instance
model = densenet121()
dummy_input = torch.zeros(1, 3, 32, 32)
oto = OTO(model=model.cuda(), dummy_input=dummy_input.cuda())

# Create HESSO optimizer
optimizer = oto.hesso(variant='sgd', lr=0.1, target_group_sparsity=0.7)

# Train the DNN as normal via HESSO
model.train()
model.cuda()
criterion = torch.nn.CrossEntropyLoss()
for epoch in range(max_epoch):
    f_avg_val = 0.0
    for X, y in trainloader:
        X, y = X.cuda(), y.cuda()
        y_pred = model.forward(X)
        f = criterion(y_pred, y)
        optimizer.zero_grad()
        f.backward()
        optimizer.step()

# A compressed densenet will be generated.
oto.construct_subnet(out_dir='./')
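To give a feel for the target_group_sparsity=0.7 argument in the example, here is a small pure-Python sketch. It assumes the argument means "the fraction of variable groups that end up entirely zero", which is our reading rather than a documented definition. The selection rule below is plain magnitude-based ranking, used only as a stand-in; it is not the HESSO algorithm, and all function names are hypothetical.

```python
import math


def group_norm(group):
    """Euclidean norm of one group of weights."""
    return math.sqrt(sum(w * w for w in group))


def group_sparsity(groups):
    """Fraction of groups whose weights are all exactly zero."""
    zero_groups = sum(1 for g in groups if all(w == 0.0 for w in g))
    return zero_groups / len(groups)


def zero_smallest_groups(groups, target_group_sparsity):
    """Zero out the lowest-norm groups until the target fraction is reached.

    Magnitude-based selection, a stand-in for illustration only.
    """
    num_to_zero = round(target_group_sparsity * len(groups))
    order = sorted(range(len(groups)), key=lambda i: group_norm(groups[i]))
    to_zero = set(order[:num_to_zero])
    return [[0.0] * len(g) if i in to_zero else list(g) for i, g in enumerate(groups)]


# Ten toy groups of two weights each; three of them have clearly larger norms.
groups = [
    [0.9, -1.2], [0.01, 0.02], [0.5, 0.4], [0.0, 0.03], [-0.02, 0.01],
    [1.5, 0.3], [0.04, -0.01], [0.02, 0.0], [-0.03, 0.02], [0.01, -0.01],
]
sparse_groups = zero_smallest_groups(groups, target_group_sparsity=0.7)
print("before:", group_sparsity(groups), "after:", group_sparsity(sparse_groups))
```

Note that a group counts as sparse only when every weight in it is zero, which is what makes the whole structure removable afterwards.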

How the pruning mode in OTO works.

Pruning Zero-Invariant Group Partition. OTO first automatically analyzes the dependencies inside the target DNN to build a pruning dependency graph. It then partitions the DNN's trainable variables into so-called Pruning Zero-Invariant Groups (PZIGs). A PZIG describes a minimally removable structure of the DNN, and can largely be interpreted as the minimal group of variables that must be pruned together.

Hybrid Structured Sparse Optimizer. A structured sparsity optimization problem is formulated. A hybrid structured sparse optimizer, such as HESSO, DHSPG, or LHSPG, is then employed to determine which PZIGs are redundant and which are important for the model's predictions. The selected hybrid optimizer explores group sparsity more reliably and typically achieves higher generalization performance than other sparse optimizers.

Construct pruned model. The structures corresponding to redundant PZIGs (those that are zero) are removed to form the pruned model. Due to the zero-invariant property of PZIGs, the pruned model returns exactly the same output as the full model. Therefore, no further fine-tuning is required.
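The zero-invariance argument in the last step can be illustrated on a toy network. The sketch below is pure Python and not OTO code. It assumes a two-layer MLP with ReLU, where the group for a hidden unit is taken to be its row of w1 together with its bias. If that group is zero, the unit outputs zero for every input, so the unit and the matching column of w2 can be removed without changing the output. The names zero_groups and remove_zero_groups are hypothetical.

```python
def relu(values):
    return [max(0.0, v) for v in values]


def linear(weight, bias, x):
    """Compute weight @ x + bias for a list-of-rows weight matrix."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def forward(params, x):
    """Two-layer MLP: linear -> ReLU -> linear."""
    hidden = relu(linear(params["w1"], params["b1"], x))
    return linear(params["w2"], params["b2"], hidden)


def zero_groups(params):
    """Indices of hidden units whose group (row of w1 plus its bias) is all zero."""
    return [
        j for j, (row, b) in enumerate(zip(params["w1"], params["b1"]))
        if b == 0.0 and all(w == 0.0 for w in row)
    ]


def remove_zero_groups(params):
    """Drop zero groups and the matching input columns of the next layer."""
    drop = set(zero_groups(params))
    keep = [j for j in range(len(params["w1"])) if j not in drop]
    return {
        "w1": [params["w1"][j] for j in keep],
        "b1": [params["b1"][j] for j in keep],
        "w2": [[row[j] for j in keep] for row in params["w2"]],
        "b2": list(params["b2"]),
    }


# Hidden units 1 and 3 are zero groups; their w2 columns are deliberately nonzero.
full = {
    "w1": [[0.5, -1.0], [0.0, 0.0], [2.0, 0.25], [0.0, 0.0]],
    "b1": [0.1, 0.0, -0.3, 0.0],
    "w2": [[1.0, 7.0, -2.0, 3.0], [0.5, -4.0, 1.5, 9.0]],
    "b2": [0.2, -0.1],
}
pruned = remove_zero_groups(full)
x = [1.5, -0.5]
print("same output:", forward(full, x) == forward(pruned, x))
```

The equivalence relies on the activation mapping zero to zero, as ReLU does. For layers where that is not the case, the grouping would need to include more variables, which is the role the dependency graph plays in the description above.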

Sanity Check

The sanity check provides tests for the pruning mode in OTO on various DNNs, from CNNs to LLMs. Passing the sanity check indicates that OTO is compatible with the target DNN.

python sanity_check/sanity_check.py

Note that some tests require additional dependencies. Comment out unnecessary tests. We highly recommend running a sanity check on any new customized DNN to test compliance.

Visualization

The visual_examples folder provides visualizations of pruning dependency graphs and erasing dependency graphs. Visualization is a frequently used tool for diagnosing errors when applying OTO to new, unseen DNNs.

To do list

Add more explanations to the current repository.
Release a technical report on the HESSO optimizer, which is not yet discussed in our papers.
Release refactored DHSPG and LHSPG.
Release the full pipeline of LoRAShear (upon business administration).
Provide more tutorials to cover the experiments in the pruning mode. Main experiments in OTOv2 can be found at otov2_branch.
Release the official erasing mode after the review process of OTOv3.
Provide documentation of the OTO API.

Welcome Contribution

We greatly appreciate contributions in any form, such as bug fixes, new features, and new tutorials, from our open-source community.

We are humbled to contribute to the AI community. We look forward to working with the community to make DNN training and compression more automatic and convenient.

Open for collaboration.

We are open to and happy about collaborations. Feel free to reach out to tiachen@microsoft.com if you have any interesting ideas.

Legacy OTOv2 repository

The previous OTOv2 repo has been moved into legacy_branch for academic replication.

Citation

If you find the repo useful, please kindly star this repository and cite our papers:

For OTOv3 preprint

@article{chen2023otov3,
  title={OTOv3: Automatic Architecture-Agnostic Neural Network Training and Compression from Structured Pruning to Erasing Operators},
  author={Chen, Tianyi and Ding, Tianyu and Zhu, Zhihui and Chen, Zeyu and Wu, HsiangTao and Zharkov, Ilya and Liang, Luming},
  journal={arXiv preprint arXiv:2312.09411},
  year={2023}
}

For LoRAShear preprint

@article{chen2023lorashear,
  title={LoRAShear: Efficient Large Language Model Structured Pruning and Knowledge Recovery},
  author={Chen, Tianyi and Ding, Tianyu and Yadav, Badal and Zharkov, Ilya and Liang, Luming},
  journal={arXiv preprint arXiv:2310.18356},
  year={2023}
}

For AdaHSPG+ publication in TMLR (theoretical optimization paper)

@article{dai2023adahspg,
  title={An adaptive half-space projection method for stochastic optimization problems with group sparse regularization},
  author={Dai, Yutong and Chen, Tianyi and Wang, Guanyi and Robinson, Daniel P},
  journal={Transactions on machine learning research},
  year={2023}
}

For OTOv2 publication in ICLR 2023

@inproceedings{chen2023otov2,
  title={OTOv2: Automatic, Generic, User-Friendly},
  author={Chen, Tianyi and Liang, Luming and Tianyu, DING and Zhu, Zhihui and Zharkov, Ilya},
  booktitle={International Conference on Learning Representations},
  year={2023}
}

For OTOv1 publication in NeurIPS 2021

@inproceedings{chen2021otov1,
  title={Only Train Once: A One-Shot Neural Network Training And Pruning Framework},
  author={Chen, Tianyi and Ji, Bo and Tianyu, DING and Fang, Biyi and Wang, Guanyi and Zhu, Zhihui and Liang, Luming and Shi, Yixin and Yi, Sheng and Tu, Xiao},
  booktitle={Thirty-Fifth Conference on Neural Information Processing Systems},
  year={2021}
}

Releases (2)

v3.0.1-pruning (latest), Jan 20, 2024
