# IST-DASLab/OBC

Code for the NeurIPS 2022 paper "Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning".
This repository contains efficient implementations of ExactOBS for quantization, unstructured-, block- and N:M pruning, introduced in the NeurIPS 2022 paper "Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning".
- `trueobs.py`: efficient implementations of ExactOBS for all compression types
- `main_trueobs.py`: code to run ExactOBS
- `postproc.py`: post-processing operations like statistics corrections
- `database.py`: generating databases for non-uniform compression
- `spdy.py`: implementation of the DP algorithm for finding non-uniform compression configurations; adapted from code provided by the authors of SPDY [9]
- `modelutils.py`: model utilities
- `datautils.py`: data utilities
- `quant.py`: quantization utilities
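As background for the quantization features, the sketch below shows a minimal symmetric round-to-nearest (RTN) uniform quantizer. This is only an illustrative baseline (ExactOBS improves over RTN by updating remaining weights with second-order information); the function name `quantize_rtn` is ours, not part of `quant.py`:

```python
import torch

def quantize_rtn(w: torch.Tensor, bits: int = 4):
    """Symmetric per-tensor round-to-nearest uniform quantization.
    Illustrative baseline only, not the ExactOBS algorithm."""
    qmax = 2 ** (bits - 1) - 1            # e.g. 7 for 4 bits
    scale = w.abs().max() / qmax          # uniform grid step
    q = torch.round(w / scale).clamp(-qmax - 1, qmax)
    return q * scale, scale

w = torch.randn(64, 64)
wq, scale = quantize_rtn(w, bits=4)       # at most 2**4 = 16 distinct levels
```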
NOTE: The code as provided here only fully supports torchvision ResNet variants (the full integration of YOLO and BERT models is omitted due to large amounts of complex dependencies).
First, make sure ImageNet is located/linked to `../imagenet` (alternatively, you can specify the `--datapath` argument for all commands).
```shell
# Quantize weights and activations
python main_trueobs.py rn18 imagenet quant --wbits 4 --abits 4 --save rn18_4w4a.pth

# Prune to the N:M pattern
python main_trueobs.py rn18 imagenet nmprune --prunen 2 --prunem 4 --save rn18_24.pth

# Generate an unstructured pruning database
mkdir models_unstr
python main_trueobs.py rn18 imagenet unstr --sparse-dir models_unstr

# Generate a 4-block pruning database
mkdir models_4block
python main_trueobs.py rn18 imagenet blocked --sparse-dir models_4block

# Quantize a 2:4 pruned model
python main_trueobs.py rn18 imagenet quant --wbits 4 --abits 4 --load rn18_24.pth --save rn18_24_4w4a.pth
```
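For intuition on the `nmprune` mode, the sketch below builds an N:M sparsity mask by magnitude: every group of M consecutive input weights keeps its N largest entries. Note that ExactOBS selects masks using second-order (Hessian) information, so this is only an illustrative baseline, and the function name is ours:

```python
import torch

def nm_prune_mask(weight: torch.Tensor, n: int = 2, m: int = 4) -> torch.Tensor:
    """Magnitude-based N:M mask: keep the n largest-magnitude weights
    in every group of m along the input dimension. Illustrative only."""
    out_dim, in_dim = weight.shape
    assert in_dim % m == 0
    groups = weight.abs().reshape(out_dim, in_dim // m, m)
    # indices of the (m - n) smallest entries per group: these get pruned
    idx = torch.topk(groups, m - n, dim=-1, largest=False).indices
    mask = torch.ones_like(groups, dtype=torch.bool)
    mask.scatter_(-1, idx, False)
    return mask.reshape(out_dim, in_dim)

w = torch.randn(8, 16)
mask = nm_prune_mask(w, n=2, m=4)   # 2:4 pattern, 50% sparsity
```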
```shell
# Batchnorm tuning
python postproc.py rn18 imagenet rn18_24.pth --bnt

# Statistics correction
python postproc.py rn18 imagenet rn18_24.pth --statcorr --statcorr-samples 1024
```
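Batchnorm tuning conceptually amounts to recomputing the BatchNorm running statistics of the compressed model on a few calibration batches. A minimal sketch of that idea, assuming a standard PyTorch model and data loader (our helper, not the repository's `postproc.py` implementation):

```python
import torch

@torch.no_grad()
def reset_bn_stats(model, loader, n_batches: int = 100):
    """Recompute BatchNorm running statistics on calibration data:
    reset the stats, then forward a few batches in train mode."""
    for m in model.modules():
        if isinstance(m, torch.nn.BatchNorm2d):
            m.reset_running_stats()
            m.momentum = None  # None => exact cumulative moving average
    model.train()              # BN only updates running stats in train mode
    for i, (x, _) in enumerate(loader):
        if i >= n_batches:
            break
        model(x)
    model.eval()

# Tiny demo model and synthetic "calibration" batches
model = torch.nn.Sequential(torch.nn.Conv2d(3, 4, 3), torch.nn.BatchNorm2d(4))
loader = [(torch.randn(8, 3, 8, 8) + 1.0, None) for _ in range(4)]
reset_bn_stats(model, loader, n_batches=4)
```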
```shell
mkdir scores

# Unstructured pruning
# Setup database
mkdir models_unstr
python main_trueobs.py rn18 imagenet unstr --sparse-dir models_unstr
# Compute corresponding losses
python database.py rn18 imagenet unstr loss
# Run DP algorithm to determine per-layer compression targets
python spdy.py rn18 imagenet 2 unstr --dp
# Stitch profile, apply batchnorm resetting and compute validation accuracy
python postproc.py rn18 imagenet rn18_unstr_200x_dp.txt --database unstr --bnt

# Mixed quantization + 2:4 pruning
mkdir models_nm
mkdir models_quant
mkdir models_nm_quant
python main_trueobs.py rn18 imagenet nmprune --save models_nm/rn18_24.pth
python main_trueobs.py rn18 imagenet quant --wbits 8 --abits 8 --save models_quant/rn18_8w8a.pth
python main_trueobs.py rn18 imagenet quant --wbits 4 --abits 4 --save models_quant/rn18_4w4a.pth
python main_trueobs.py rn18 imagenet quant --wbits 8 --abits 8 --load models_nm/rn18_24.pth --save models_nm_quant/rn18_24_8w8a.pth
python main_trueobs.py rn18 imagenet quant --wbits 4 --abits 4 --load models_nm/rn18_24.pth --save models_nm_quant/rn18_24_4w4a.pth
python database.py rn18 imagenet mixed loss
python spdy.py rn18 imagenet 8 mixed --dp
python postproc.py rn18 imagenet rn18_mixed_800x_dp.txt --database mixed --bnt
```
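The per-layer selection behind the DP step can be illustrated with a knapsack-style dynamic program: each layer offers several compression levels with a (cost, loss) pair, and we pick one option per layer to minimize total loss under a cost budget. This is a simplified sketch of the idea, assuming small integer costs; the function name and example data are ours, not the actual `spdy.py` implementation:

```python
def dp_select(candidates, budget):
    """Pick one (cost, loss) option per layer minimizing total loss
    subject to sum of costs <= budget. Costs are small non-negative ints."""
    best = {0: (0.0, ())}  # total cost -> (min total loss, option indices)
    for options in candidates:
        new = {}
        for cost, (loss, picks) in best.items():
            for i, (c, l) in enumerate(options):
                nc = cost + c
                if nc > budget:
                    continue
                cand = (loss + l, picks + (i,))
                if nc not in new or cand[0] < new[nc][0]:
                    new[nc] = cand
        best = new
    return min(best.values())  # (minimal total loss, per-layer choices)

# Two "layers", three compression levels each: (timing cost, loss proxy)
layers = [[(4, 0.0), (2, 0.1), (1, 0.5)],
          [(4, 0.0), (2, 0.3), (1, 1.0)]]
loss, picks = dp_select(layers, budget=6)
```

With a budget of 6, the best trade-off compresses the first layer moderately and leaves the second uncompressed.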
Before using our BERT integration, please download our pretrained checkpoints and move them to the `bertsquad` folder. Then you should be able to use most features described above by passing `bertsquad` (or `bertsquad6` for smaller variants) as the model name and `squad` as the dataset name. The code was tested with `transformers==4.21.2` and `datasets==1.17.0`.
```
@article{frantar2022obc,
  title={{Optimal Brain Compression:} A Framework for Accurate Post-Training Quantization and Pruning},
  author={Frantar, Elias and Singh, Sidak Pal and Alistarh, Dan},
  journal={Advances in Neural Information Processing Systems},
  volume={36},
  year={2022}
}
```