Repository for the QUIK project, enabling the use of 4-bit kernels for generative inference (EMNLP 2024).
This repository contains the code for QUIK, a method for quantizing the majority of the weights and activations to 4-bit post-training.
QUIK is described in the following paper: https://arxiv.org/abs/2310.09259
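For intuition, the snippet below shows generic symmetric round-to-nearest quantization to 4 bits in PyTorch. This is only a toy illustration of the numerics, not QUIK's actual procedure, which uses GPTQ and keeps a small set of outlier features in higher precision.

```python
import torch

def quantize_rtn_4bit(w: torch.Tensor):
    """Toy symmetric round-to-nearest 4-bit quantization (not GPTQ)."""
    qmax = 2 ** (4 - 1) - 1  # 7: symmetric signed 4-bit range [-7, 7]
    scale = w.abs().max() / qmax
    q = torch.round(w / scale).clamp(-qmax, qmax).to(torch.int8)
    return q, scale  # packing two 4-bit values per byte is omitted

w = torch.randn(4096, 4096)
q, scale = quantize_rtn_4bit(w)
w_hat = q.float() * scale              # dequantize
print((w - w_hat).abs().max().item())  # worst-case quantization error
```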
Dependencies:

- cmake
- C++ compiler (GCC/clang/...)
- nvcc
Installation:

```bash
git clone https://github.com/IST-DASLab/QUIK.git
cd QUIK
pip install -e .  # or: pip install .
```
To run the LLaMA experiments:

```bash
cd experiments
pip install -r requirements.txt
python llama.py --fp_features_num 256 --model meta-llama/Llama-2-7b-hf --hf_token <your_hf_token> --dataset c4 \
    --w_bits 4 --w_clip --a_bits 4 --save_qmodel_path save_gptq_model_path --int8_down_proj --sim_eval --benchmark
```
The benchmark runs on all available GPUs. Linear layer benchmarks can be run with `python layer_benchmark.py`; input sizes can be varied via command-line parameters.
First, quantize the model weights using the GPTQ algorithm; in `llama.py` this is done by the `llama_sequential` function, which produces quantized weights that are still stored in `torch.float16`. Then create QUIK Linear layers with `qlinear.MixedQLinear.from_float` and use them to replace the original Linear layers (see `llama_replace_with_kernels` in `llama.py`). The quantized model is then ready for use, as sketched below.
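The sketch below condenses this pipeline. The call signatures of `llama_sequential` and `llama_replace_with_kernels` shown here are assumptions for illustration; the authoritative versions are in `experiments/llama.py`.

```python
# A condensed, hypothetical sketch of the pipeline above. The signatures of
# llama_sequential and llama_replace_with_kernels are assumptions; consult
# experiments/llama.py for the actual argument lists.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", torch_dtype=torch.float16
)

# 1. GPTQ weight quantization (in llama.py this is driven by CLI flags and a
#    calibration dataloader built from the chosen dataset, e.g. c4):
# quantizers = llama_sequential(model, dataloader, dev="cuda")

# 2. Replace the original nn.Linear modules with QUIK mixed-precision layers
#    built via qlinear.MixedQLinear.from_float:
# llama_replace_with_kernels(model, quantizers)

# 3. The model can now be used for inference as usual, e.g. model.generate(...).
```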
To run the fake quantization example, check the `fake_quant` directory.
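Fake quantization simulates quantized numerics by quantizing and immediately dequantizing tensors in the forward pass, without the fast CUDA kernels. Below is a minimal sketch of that idea; `FakeQuantLinear` is an illustrative name, not taken from the `fake_quant` code.

```python
import torch
import torch.nn as nn

class FakeQuantLinear(nn.Module):
    """Quantize-dequantize weights and activations on the fly.
    Illustrative sketch only, not the repo's fake_quant implementation."""

    def __init__(self, linear: nn.Linear, bits: int = 4):
        super().__init__()
        self.linear = linear
        self.qmax = 2 ** (bits - 1) - 1  # e.g. 7 for 4-bit

    def _fake_quant(self, t: torch.Tensor) -> torch.Tensor:
        scale = t.abs().amax().clamp_min(1e-8) / self.qmax
        return torch.round(t / scale).clamp(-self.qmax, self.qmax) * scale

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self._fake_quant(self.linear.weight)
        return nn.functional.linear(self._fake_quant(x), w, self.linear.bias)

# Usage: wrap an existing layer and run a forward pass.
layer = FakeQuantLinear(nn.Linear(512, 512), bits=4)
y = layer(torch.randn(8, 512))
```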
The full paper is available on arXiv. The citation is:
```bibtex
@article{QUIK,
  title={QUIK: Towards End-to-end 4-Bit Inference on Generative Large Language Models},
  author={Ashkboos, Saleh and Markov, Ilia and Frantar, Elias and Zhong, Tingxuan and Wang, Xincheng and Ren, Jie and Hoefler, Torsten and Alistarh, Dan},
  journal={arXiv preprint arXiv:2310.09259},
  year={2023}
}
```