Qualcomm-AI-research/FP8-quantization

License

BSD-3-Clause-Clear license

Folders and files

models
quantization
utils
.gitignore
LICENSE
README.md
compute_quant_error.py
image_net.py
requirements.txt

FP8 Quantization: The Power of the Exponent

This repository contains the implementation and experiments for the paper presented in

Andrey Kuzmin*¹, Mart van Baalen*¹, Yuwei Ren¹, Markus Nagel¹, Jorn Peters¹, Tijmen Blankevoort¹, "FP8 Quantization: The Power of the Exponent", NeurIPS 2022. [ArXiv]

*Equal contribution. ¹Qualcomm AI Research (Qualcomm AI Research is an initiative of Qualcomm Technologies, Inc.)

You can use this code to recreate the results in the paper.

Method and Results

In this repository we share the code to reproduce the analytical and experimental results on the performance of the FP8 format with different mantissa/exponent divisions versus INT8. The first part of the repository allows the user to reproduce the analytical computations of SQNR (signal-to-quantization-noise ratio) for uniform, Gaussian, and Student's t-distributions. Varying the mantissa/exponent bit-width division changes the trade-off between representing the data accurately around the mean of the distribution and capturing its tails. The more outliers are present in the data, the more exponent bits should be allocated for the best results. In the second part we provide the code to reproduce the post-training quantization (PTQ) results for MobileNetV2 and ResNet-18 pre-trained on ImageNet.
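The trade-off described above can be explored with a small simulation. The following is a minimal sketch in pure Python, not the repository's implementation: the function names (`fp8_quantize`, `int8_quantize`, `sqnr_db`) are hypothetical, the FP8 grid is assumed to use one sign bit with a real-valued exponent bias chosen so that the largest representable magnitude equals `maxval`, and SQNR is assumed to be measured in dB as the ratio of signal power to quantization-noise power.

```python
import math
import random


def fp8_quantize(x, mantissa_bits, maxval, total_bits=8):
    """Round x to a simulated FP8 grid (1 sign bit, rest exponent + mantissa)."""
    exponent_bits = total_bits - 1 - mantissa_bits
    if exponent_bits < 1:
        raise ValueError("need at least one exponent bit")
    # Assumption: real-valued bias so that the largest magnitude equals maxval.
    bias = (2 ** exponent_bits - math.log2(maxval)
            + math.log2(2 - 2 ** -mantissa_bits) - 1)
    x = max(-maxval, min(maxval, x))  # clip to the representable range
    if x == 0.0:
        return 0.0
    # Step size depends on the magnitude of x: finer near zero, coarser in the tails.
    log_scale = math.floor(math.log2(abs(x)) + bias) - bias - mantissa_bits
    log_scale = max(log_scale, 1 - bias - mantissa_bits)  # subnormal range
    scale = 2.0 ** log_scale
    return round(x / scale) * scale


def int8_quantize(x, maxval):
    """Symmetric uniform quantizer with 127 steps on each side of zero."""
    scale = maxval / 127
    return max(-127, min(127, round(x / scale))) * scale


def sqnr_db(values, quantized):
    """Signal-to-quantization-noise ratio in dB."""
    signal = sum(v * v for v in values)
    noise = sum((v - q) ** 2 for v, q in zip(values, quantized))
    return math.inf if noise == 0 else 10 * math.log10(signal / noise)


if __name__ == "__main__":
    rng = random.Random(0)
    samples = [rng.gauss(0.0, 1.0) for _ in range(20000)]
    maxval = max(abs(v) for v in samples)  # min-max style range setting
    print("INT8:", sqnr_db(samples, [int8_quantize(v, maxval) for v in samples]))
    for m in (2, 3, 4, 5):
        q = [fp8_quantize(v, m, maxval) for v in samples]
        print(f"FP8 with {m} mantissa bits:", sqnr_db(samples, q))
```

Replacing the Gaussian samples with a heavier-tailed distribution is one way to observe how the preferred number of exponent bits shifts. For the figures reported in the paper, use `compute_quant_error.py` rather than this sketch.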

How to install

Make sure to have Python ≥3.8 (tested with Python 3.8.10) and ensure the latest version of pip (tested with 21.3.1):

python3 -m venv env
source env/bin/activate
pip install --upgrade --no-deps pip

Next, install PyTorch 1.11.0 with the appropriate CUDA version (tested with CUDA 10.0):

pip install torch==1.11.0 torchvision==0.12.0

Finally, install the remaining dependencies using pip:

pip install -r requirements.txt
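After installation, it can help to confirm that the environment matches the versions listed above. The snippet below is a minimal sketch using only the standard library; the helper name `check_environment` and the `EXPECTED` table are illustrative assumptions, and the version pins are taken from the install commands in this README.

```python
import sys
from importlib import metadata

# Versions pinned by the install commands above.
EXPECTED = {"torch": "1.11.0", "torchvision": "0.12.0"}


def check_environment(expected=EXPECTED):
    """Return {package: (installed_version_or_None, matches_expected)}."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            installed = None
        # Ignore a local build suffix such as "+cu113" when comparing.
        matches = installed is not None and installed.split("+")[0] == wanted
        report[package] = (installed, matches)
    return report


if __name__ == "__main__":
    print("Python >= 3.8:", sys.version_info >= (3, 8))
    for package, (installed, ok) in check_environment().items():
        print(f"{package}: installed={installed}, matches={ok}")
```

The check reads package metadata only, so it does not import PyTorch or require a GPU.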

Running experiments

Analytical expected SQNR computations

The main run file to compute the expected SQNR for different distributions using different formats is compute_quant_error.py. The script takes no input arguments and computes the SQNR for different distributions and formats:

python compute_quant_error.py

ImageNet experiments

The main run file to reproduce the ImageNet experiments is image_net.py. It contains commands for validating models quantized with post-training quantization. You can see the full list of options for each command using python image_net.py [COMMAND] --help.

Usage: image_net.py [OPTIONS] COMMAND [ARGS]...

Options:
  --help  Show this message and exit.

Commands:
  validate-quantized

To reproduce the experiments run:

# --model-dir is only needed for MobileNet-V2
python image_net.py validate-quantized --images-dir </PATH/TO/IMAGENET> \
    --architecture <ARCHITECTURE_NAME> --batch-size 64 --seed 10 \
    --model-dir </PATH/TO/PRETRAINED/MODEL> \
    --n-bits 8 --cuda --load-type fp32 --quant-setup all --qmethod fp_quantizer --per-channel \
    --fp8-mantissa-bits=5 --fp8-set-maxval --no-fp8-mse-include-mantissa-bits \
    --weight-quant-method=current_minmax --act-quant-method=allminmax --num-est-batches=1

where <ARCHITECTURE_NAME> can be mobilenet_v2_quantized or resnet18_quantized. Please note that only MobileNet-V2 requires pre-trained weights, which can be downloaded here (the tar file is used as is; there is no need to untar it):

MobileNetV2
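When running both architectures, the command above can be assembled programmatically. The following is a minimal sketch: the helper `build_validate_command` is hypothetical, it uses only the flags shown in this README, and exposing `mantissa_bits` as a parameter assumes that `--fp8-mantissa-bits` accepts values other than 5 (check `python image_net.py validate-quantized --help` to confirm).

```python
import shlex

ARCHITECTURES = ("mobilenet_v2_quantized", "resnet18_quantized")


def build_validate_command(images_dir, architecture, model_dir=None, mantissa_bits=5):
    """Build the argument list for the validate-quantized command."""
    if architecture not in ARCHITECTURES:
        raise ValueError(f"unknown architecture: {architecture}")
    if architecture == "mobilenet_v2_quantized" and model_dir is None:
        raise ValueError("MobileNet-V2 requires --model-dir with pre-trained weights")
    cmd = [
        "python", "image_net.py", "validate-quantized",
        "--images-dir", images_dir,
        "--architecture", architecture,
        "--batch-size", "64", "--seed", "10",
    ]
    if model_dir is not None:
        cmd += ["--model-dir", model_dir]
    cmd += [
        "--n-bits", "8", "--cuda", "--load-type", "fp32",
        "--quant-setup", "all", "--qmethod", "fp_quantizer", "--per-channel",
        f"--fp8-mantissa-bits={mantissa_bits}",
        "--fp8-set-maxval", "--no-fp8-mse-include-mantissa-bits",
        "--weight-quant-method=current_minmax",
        "--act-quant-method=allminmax",
        "--num-est-batches=1",
    ]
    return cmd


if __name__ == "__main__":
    # The paths below are placeholders, not real locations.
    cmd = build_validate_command("/path/to/imagenet", "resnet18_quantized")
    print(shlex.join(cmd))
    # To launch the run: subprocess.run(cmd, check=True)
```

Returning a list instead of a single string avoids shell quoting problems when paths contain spaces; `shlex.join` is used only to print a copy-pasteable form.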

Reference

If you find our work useful, please cite

@article{kuzmin2022fp8,
  title={FP8 Quantization: The Power of the Exponent},
  author={Kuzmin, Andrey and Van Baalen, Mart and Ren, Yuwei and Nagel, Markus and Peters, Jorn and Blankevoort, Tijmen},
  journal={arXiv preprint arXiv:2208.09225},
  year={2022}
}
