Qualcomm-AI-research/FP8-quantization
This repository contains the implementation and experiments for the paper:
Andrey Kuzmin*1, Mart van Baalen*1, Yuwei Ren1, Markus Nagel1, Jorn Peters1, Tijmen Blankevoort1. "FP8 Quantization: The Power of the Exponent", NeurIPS 2022. [ArXiv]
*Equal contribution. 1 Qualcomm AI Research (Qualcomm AI Research is an initiative of Qualcomm Technologies, Inc.)
You can use this code to recreate the results in the paper.
This repository provides code to reproduce the analytical and experimental results on the performance of the FP8 format with different mantissa/exponent divisions versus INT8. The first part of the repository reproduces the analytical computations of SQNR for uniform, Gaussian, and Student's-t distributions. Varying the mantissa/exponent bit-width division changes the trade-off between accurately representing the data around the mean of the distribution and capturing its tails. The more outliers are present in the data, the more exponent bits should be allocated for the best results. The second part provides the code to reproduce the post-training quantization (PTQ) results for MobileNetV2 and ResNet-18 pre-trained on ImageNet.
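The mantissa/exponent trade-off described above can be illustrated with a small empirical sketch. This is not the repository's implementation: the function names `fp_quantize`, `int_quantize`, `sqnr_db`, and `demo` are hypothetical, the floating-point grid is simplified (no codes reserved for inf/NaN, a real-valued bias chosen so the largest grid point equals `maxval`), and the SQNR is estimated from random samples rather than computed analytically.

```python
import math
import random


def fp_quantize(x, n_mantissa, n_exponent, maxval):
    """Round x to the nearest point of a simplified signed floating-point grid."""
    # Real-valued exponent bias chosen so the largest grid point equals maxval
    # (assumption: no exponent codes are reserved for inf/NaN).
    bias = 2 ** n_exponent - 1 - math.log2(maxval) + math.log2(2 - 2 ** -n_mantissa)
    x = max(-maxval, min(maxval, x))  # clip to the representable range
    if x == 0.0:
        return 0.0
    # Biased exponent of the binade holding |x|; small values share the
    # lowest binade's step, which mimics the subnormal range.
    p = max(math.floor(math.log2(abs(x)) + bias), 1)
    step = 2.0 ** (p - n_mantissa - bias)
    return round(x / step) * step


def int_quantize(x, n_bits, maxval):
    """Round x to a symmetric uniform (integer) grid covering [-maxval, maxval]."""
    levels = 2 ** (n_bits - 1) - 1  # 127 positive levels for 8 bits
    step = maxval / levels
    x = max(-maxval, min(maxval, x))
    return round(x / step) * step


def sqnr_db(samples, quantize):
    """Signal-to-quantization-noise ratio in dB, estimated from samples."""
    signal = sum(x * x for x in samples)
    noise = sum((x - quantize(x)) ** 2 for x in samples)
    return 10.0 * math.log10(signal / noise)


def demo(n_samples=20000, seed=0):
    rng = random.Random(seed)
    samples = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    # Assumption: the quantization range is set to the sample absolute maximum.
    maxval = max(abs(x) for x in samples)
    print(f"INT8: {sqnr_db(samples, lambda x: int_quantize(x, 8, maxval)):.2f} dB")
    # One sign bit plus m mantissa bits and e exponent bits gives 8 bits in total.
    for m, e in [(5, 2), (4, 3), (3, 4), (2, 5)]:
        value = sqnr_db(samples, lambda x: fp_quantize(x, m, e, maxval))
        print(f"FP8 {m}M{e}E: {value:.2f} dB")


if __name__ == "__main__":
    demo()
```

Replacing `rng.gauss` with a heavier-tailed sampler is one way to explore how outliers shift the preferred number of exponent bits. The numbers printed by this sketch depend on the sampling and range-setting assumptions above and are not expected to match the paper's analytical values.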
Make sure to have Python ≥3.8 (tested with Python 3.8.10) and the latest version of pip (tested with 21.3.1):
python3 -m venv env
source env/bin/activate
pip install --upgrade --no-deps pip
Next, install PyTorch 1.11.0 with the appropriate CUDA version (tested with CUDA 10.0):
pip install torch==1.11.0 torchvision==0.12.0
Finally, install the remaining dependencies using pip:
pip install -r requirements.txt
The main run file to compute the expected SQNR for different distributions using different formats is compute_quant_error.py. The script takes no input arguments and computes the SQNR for different distributions and formats:
python compute_quant_error.py
The main run file to reproduce the ImageNet experiments is image_net.py. It contains commands for validating models quantized with post-training quantization. You can see the full list of options for each command using python image_net.py [COMMAND] --help.
Usage: image_net.py [OPTIONS] COMMAND [ARGS]...

Options:
  --help  Show this message and exit.

Commands:
  validate-quantized
To reproduce the experiments run:
python image_net.py validate-quantized --images-dir </PATH/TO/IMAGENET>
--architecture <ARCHITECTURE_NAME> --batch-size 64 --seed 10
--model-dir </PATH/TO/PRETRAINED/MODEL> # only needed for MobileNet-V2
--n-bits 8 --cuda --load-type fp32 --quant-setup all --qmethod fp_quantizer --per-channel
--fp8-mantissa-bits=5 --fp8-set-maxval --no-fp8-mse-include-mantissa-bits
--weight-quant-method=current_minmax --act-quant-method=allminmax --num-est-batches=1
where <ARCHITECTURE_NAME> can be mobilenet_v2_quantized or resnet18_quantized. Please note that only MobileNet-V2 requires pre-trained weights, which can be downloaded here (the tar file is used as is, with no need to untar it).
If you find our work useful, please cite
@article{kuzmin2022fp8,
  title={FP8 Quantization: The Power of the Exponent},
  author={Kuzmin, Andrey and Van Baalen, Mart and Ren, Yuwei and Nagel, Markus and Peters, Jorn and Blankevoort, Tijmen},
  journal={arXiv preprint arXiv:2208.09225},
  year={2022}
}