HALO: Hadamard-Assisted Low-Precision Optimization and Training method for fine-tuning LLMs. 🚀 The official implementation of https://arxiv.org/abs/2501.02625
HALO is a novel quantization-aware training method for fine-tuning Large Language Models (LLMs) with low-precision matrix multiplications. It integrates Hadamard transformations to mitigate outliers, enabling accurate INT8 and FP6 fine-tuning while maintaining compute efficiency. HALO achieves up to 1.41× speedup over full-precision fine-tuning while preserving accuracy, supporting both full and parameter-efficient fine-tuning (PEFT).
HALO is implemented with efficient CUDA kernels and integrates seamlessly with Fully Sharded Data Parallel (FSDP) for low-precision communication, making it ideal for large-scale distributed training. 💡
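To make the role of the Hadamard transform concrete, here is a minimal PyTorch sketch (not HALO's actual kernels; the shapes and the injected outlier value are illustrative assumptions). Rotating both operands of a matmul by an orthonormal Hadamard matrix leaves the product mathematically unchanged while spreading activation outliers, so symmetric INT8 quantization introduces much less error:

```python
# Minimal illustration (not HALO's CUDA kernels): Hadamard rotation before INT8 matmul.
import torch

def hadamard_matrix(n: int) -> torch.Tensor:
    """Orthonormal n x n Hadamard matrix via Sylvester's construction (n a power of two)."""
    H = torch.ones(1, 1)
    while H.shape[0] < n:
        H = torch.cat([torch.cat([H, H], dim=1),
                       torch.cat([H, -H], dim=1)], dim=0)
    return H / (n ** 0.5)  # H @ H.T == I

def int8_quantize(t: torch.Tensor):
    """Symmetric per-tensor quantization to the INT8 range."""
    scale = t.abs().max() / 127
    return torch.round(t / scale).clamp(-127, 127), scale

torch.manual_seed(0)
x = torch.randn(64, 256)
x[0, 0] = 50.0                      # an activation outlier (assumed for illustration)
w = torch.randn(256, 128)
H = hadamard_matrix(256)

# Plain INT8 matmul: the outlier inflates the per-tensor scale.
qx, sx = int8_quantize(x)
qw, sw = int8_quantize(w)
plain = (qx @ qw) * (sx * sw)

# Hadamard-rotated INT8 matmul: (x H)(H^T w) == x w in exact arithmetic,
# but the rotation spreads the outlier, giving a much finer scale.
qxh, sxh = int8_quantize(x @ H)
qwh, swh = int8_quantize(H.T @ w)
rotated = (qxh @ qwh) * (sxh * swh)

ref = x @ w
print("plain INT8 error:  ", (plain - ref).abs().mean().item())
print("rotated INT8 error:", (rotated - ref).abs().mean().item())
```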
Installation 🛠️
First, start by cloning the repository with its submodules:
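For example (the repository URL is left as a placeholder):

```bash
git clone --recursive <repository-url>
```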
Or, if you have already cloned the repository, you can update the submodules with:
```bash
git submodule update --init --recursive
```
Create a new environment (Python 3.10 is tested). Our code currently supports CUDA >= 12.4; older CUDA versions should also work by disabling some of the CUDA kernels.
```bash
# Create an environment, with whatever method you want
conda create -n halo python=3.10
conda activate halo
```
Then run the following commands in order:
```bash
# Install the requirements
source install.sh
```
Training 👨🏫
To fine-tune a Llama-3-8B model, you can run:
```bash
cd scripts
CUDA_VISIBLE_DEVICES=0,1,2,3 bash train_halo.sh DATASET=<dataset> LR=<lr> KERNEL_TYPE=<kernel_type>
```
For `DATASET` and `LR`, you can try the following combinations: (sql, 3e-5), (viggo, 4e-5), (gsm8k, 6e-6). For `KERNEL_TYPE`, you can choose any of the following:
- `base`: runs the baseline BF16 experiment, with HALO disabled.
- `halo0_fp8`: runs HALO level 0 with FP8 precision.
- `halo2_int8`: runs HALO level 2 with INT8 precision.
You can append `_qfsdp` to enable HQ-FSDP, for example: `halo0_fp8_qfsdp`. Other combinations of precision and HALO levels also work, e.g., `halo1_int8_qfsdp`.
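For example, to fine-tune on GSM8K with HALO level 0 in FP8 and HQ-FSDP enabled, using the learning rate suggested above:

```bash
cd scripts
CUDA_VISIBLE_DEVICES=0,1,2,3 bash train_halo.sh DATASET=gsm8k LR=6e-6 KERNEL_TYPE=halo0_fp8_qfsdp
```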
Benchmarks 📊
The benchmark files are located in the `tests` directory:
```bash
cd tests
```
Linear Module
You can run the single layer benchmarks using the following command:
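(The script name below is an assumption for illustration; check the `tests` directory for the actual benchmark file.)

```bash
# Hypothetical script name -- see the tests directory for the actual benchmark file
NCCL_NTHREADS=64 python linear_benchmark.py
```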
Note that `NCCL_NTHREADS=64` is tuned for the RTX 4090. On newer GPUs, you may leave it unset and use the default value.
Citation 📖
If you use HALO in your research, please cite our paper:
```bibtex
@article{halo2025,
  title={HALO: Hadamard-Assisted Lower-Precision Optimization for LLMs},
  author={Saleh Ashkboos and Mahdi Nikdan and Soroush Tabesh and Roberto L. Castro and Torsten Hoefler and Dan Alistarh},
  year={2025},
  eprint={2501.02625},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2501.02625},
}
```