GraphPKU/PiSSAPublic

NotificationsYou must be signed in to change notification settings
Fork19
Star386

PiSSA: Principal Singular Values and Singular Vectors Adaptation of Large Language Models(NeurIPS 2024 Spotlight)

proceedings.neurips.cc/paper_files/paper/2024/file/db36f4d603cc9e3a2a5e10b93e6428f2-Paper-Conference.pdf

386 stars 19 forks Branches Tags Activity

Star

Notifications

You must be signed in to change notification settings

Branches Tags

Folders and files

Name		Name	Last commit message	Last commit date
Latest commit History 104 Commits
assets		assets
configs		configs
loss_landscape		loss_landscape
scripts		scripts
utils		utils
.gitignore		.gitignore
README.md		README.md
pissa-sdxl.ipynb		pissa-sdxl.ipynb
requirements.txt		requirements.txt
train.py		train.py

Repository files navigation

PrincipalSingular values andSingular vectorsAdaptation

Introduction

We introduce a parameter-efficient fine-tuning (PEFT) method,PrincipalSingular values andSingular vectorsAdaptation (PiSSA), which optimizes the essential singular values and vectors while freezing the "noisy" parts. In comparison, LoRA freezes the original matrix and updates the "noise". This distinction enables PiSSA to convergence much faster than LoRA and also achieve better performance in the end. On five common benchmarks, PiSSA outperforms LoRA on all of them using exactly the same setups except for a different initialization. On GSM8K, Mistral-7B fine-tuned with PiSSA achieves an accuracy of 72.86%, outperforming LoRA's 67.7% by 5.16%.Due to the same architecture, PiSSA inherits many of LoRA's advantages, such as parameter efficiency and compatibility with quantization.Furthermore, PiSSA reduces the 4-bit quantization error in LLaMA 2-7B by 18.97%, resulting in a substantial improvement in fine-tuning performance. On the GSM8K benchmark, PiSSA achieves an accuracy of 49.13%, surpassing the performances of QLoRA at 39.8% and LoftQ at 40.71%.Leveraging a fast SVD technique, the initialization of PiSSA takes only a few seconds, inducing negligible cost of switching LoRA to PiSSA.

News

[2025.01.09] ProvideDocument and中文文档 to help you better use PiSSA for training and testing.
[2024.07.17] PiSSA now support Conv2d and Embedding,here is an example for using PiSSA on SDXL.
[2024.07.16] PiSSA now support deepspeed.
[2024.05.16] PiSSA has been merged into themain branch of peft as an optional initialization method for LoRA.

Quick Start

Install PiSSA via pip:

git clone https://github.com/GraphPKU/PiSSA.gitcd PiSSA/# export HF_ENDPOINT=https://hf-mirror.compip install -U huggingface_hubhuggingface-cli download --repo-type dataset --resume-download fxmeng/pissa-dataset --local-dir pissa-datasetconda create -n pissa python=3.10conda activate pissaconda install nvidia/label/cuda-12.1.0::cuda-toolkitconda install pytorch==2.4.0 torchvision=0.19.0 pytorch-cuda=12.1 -c pytorch -c nvidiapip install -r requirements.txtpip install flash-attn --no-build-isolation

Reproduce the Results

All the datasets we used are publicly available atDataset.

The PiSSA-initialized models are shared onModels for easy reuse. They retain the same input and output as the original models but are split into residual models and PiSSA adapters for more effective fine-tuning.

	PiSSA	QPiSSA
LLaMA-2-7B	r128	r16,32,64,128
LLaMA-3-8B	r16,32,64,128	r64,128
LLaMA-3-8B-Instruct	r16,32,64,128	--
LLaMA-3-70B	--	r64,128
LLaMA-3-70B-Instruct	--	r128
Qwen2-7B	r128	r128
Qwen2-7B-Instruct	r128	r128
Qwen2-72B	--	r64,128
Qwen2-72B-Instruct	--	r64,128

Training

Running the following script will automatically download the model, then start training:

sh scripts/*/run_full_finetune.shsh scripts/*/lora.shsh scripts/*/pissa.shsh scripts/*/loftq.shsh scripts/*/qlora.shsh scripts/*/qpissa.sh

Evaluation

To evaluate the performance of your fine-tuned model, please follow the instructions infxmeng/pissa-dataset.

Advanced Usage

We recommend downloading decomposed models directly from theHugging Face Collections instead of performing SVD every time.If the existing models do not meet your needs, apply PiSSA initialization to a pre-trained model and store the decomposed model locally:

importtorchimportosfrompeftimportLoraConfig,get_peft_modelfromtransformersimportAutoTokenizer,AutoModelForCausalLMMODEL_ID="meta-llama/Llama-2-7b-hf"model=AutoModelForCausalLM.from_pretrained(MODEL_ID,torch_dtype=torch.bfloat16,device_map="auto")tokenizer=AutoTokenizer.from_pretrained(MODEL_ID)tokenizer.pad_token_id=tokenizer.eos_token_idlora_config=LoraConfig(# init_lora_weights="pissa", # Configure the initialization method to "pissa", which may take several minutes to execute SVD on the pre-trained model.init_lora_weights="pissa_niter_4",# Initialize the PiSSA with fast SVD, which completes in just a few seconds.r=128,lora_alpha=128,lora_dropout=0,# Since the component of the PiSSA adapter are the principal singular values and vectors, dropout should be set to 0 to avoid random discarding.target_modules=["q_proj","o_proj","k_proj","v_proj","gate_proj","up_proj","down_proj"],task_type="CAUSAL_LM",)peft_model=get_peft_model(model,lora_config)peft_model.print_trainable_parameters()OUTPUT_DIR="PiSSA-Llama-2-7b-hf-r128"# Save PiSSA modules:peft_model.peft_config["default"].init_lora_weights=True# Importantpeft_model.save_pretrained(os.path.join(OUTPUT_DIR,"pissa_init"))# Save residual model:peft_model=peft_model.unload()peft_model.save_pretrained(OUTPUT_DIR)# Save the tokenizer:tokenizer.save_pretrained(OUTPUT_DIR)

Load a pre-processed model and finetune it on IMDB dataset:

fromtrlimportSFTTrainerfromdatasetsimportload_datasetfromtransformersimportAutoTokenizer,AutoModelForCausalLMfrompeftimportPeftModelMODEL_ID="PiSSA-Llama-2-7b-hf-r128"residual_model=AutoModelForCausalLM.from_pretrained(MODEL_ID,device_map="auto")model=PeftModel.from_pretrained(residual_model,MODEL_ID,subfolder="pissa_init",is_trainable=True)tokenizer=AutoTokenizer.from_pretrained(MODEL_ID)dataset=load_dataset("imdb",split="train[:1%]")# Only use 1% of the datasettrainer=SFTTrainer(model=peft_model,train_dataset=dataset,dataset_text_field="text",max_seq_length=128,tokenizer=tokenizer,)trainer.train()peft_model.save_pretrained("pissa-llama-2-7b-ft")

Convert PiSSA to LoRA

When usingpeft_model.save_pretrained, ifpath_initial_model_for_weight_conversion=None, the fine-tuned matrices$A$ and$B$ are saved and should be combined with the residual model. However, when specifyingpath_initial_model_for_weight_conversion="pissa_init_dir", the saving function converts PiSSA to LoRA by$\Delta W = A B - A_0 B_0 = [A | A_0] [B | -B_0]^T=A^{'}B^{'}$. This conversion enables the loading of LoRA on top of a standard base model:

importtorchfrompeftimportPeftModelfromtransformersimportAutoModelForCausalLMmodel=AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf",torch_dtype=torch.bfloat16,device_map="auto")# No SVD is performed during this step, and the base model remains unaltered.peft_model=PeftModel.from_pretrained(model,"pissa-llama-2-7b-lora")

Utilizing the converted LoRA does not require modifying the parameters of the base model. When multiple converted LoRAs are needed simultaneously, each adapter operates independently without interference, allowing for the adapters to be freely deleted or added.

Citation

@article{meng2024pissa,  title={Pissa: Principal singular values and singular vectors adaptation of large language models},  author={Meng, Fanxu and Wang, Zhaohui and Zhang, Muhan},  journal={arXiv preprint arXiv:2404.02948},  year={2024}}

Star History

Follow-up Work

2024, May 27,LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters performs basis adaption for principal singular values and singular vectors.
2024, May 30,SVFT: Parameter-Efficient Fine-Tuning with Singular Vectors freeze the singular vectors while fintune the singular values in a sparse manner.
2024, Jun 3,OLoRA: Orthonormal Low-Rank Adaptation of Large Language Models, leverages orthonormal matrix initialization through QR decomposition.
2024, Jun 7,CorDA: Context-Oriented Decomposition Adaptation of Large Language Models, leverages knowledge-preserved adaptation and the instruction-previewed adaptation through Context-oriented Decomposition.
2024, Jun 7,MiLoRA: Harnessing Minor Singular Components for Parameter-Efficient LLM Finetuning, Minor Singular Components Adaption.
2024, Jun 18,LaMDA: Large Model Fine-Tuning via Spectrally Decomposed Low-Dimensional Adaptation performs basis adaption for principal singular values and singular vectors.
2024, Jul 6,LoRA-GA: Low-Rank Adaptation with Gradient Approximation aligns the gradients of low-rank matrix product with those of full fine-tuning at the first step.
2024, Jul 25,LoRA-Pro: Are Low-Rank Adapters Properly Optimized? strategically adjusts the gradients of adapters, enabling the low-rank gradients to more accurately approximate the full fine-tuning gradients.
2024, Oct 9,One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation initialize adapter in a data-driven manner by computing singular value decomposition on minibatches of activation vectors.
2024, Nov 7,SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models consolidate the outliers by shifting them from activations to weights, then employ a high-precision low-rank branch to take in the weight outliers with SVD.

About

PiSSA: Principal Singular Values and Singular Vectors Adaptation of Large Language Models(NeurIPS 2024 Spotlight)

proceedings.neurips.cc/paper_files/paper/2024/file/db36f4d603cc9e3a2a5e10b93e6428f2-Paper-Conference.pdf

Movatterモバイル変換

Navigation Menu

Search code, repositories, users, issues, pull requests...

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

PrincipalSingular values andSingular vectorsAdaptation

Introduction

News

Quick Start

Reproduce the Results

Training

Evaluation

Advanced Usage

Convert PiSSA to LoRA

Citation

Star History

Follow-up Work

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages

Uh oh!

Contributors4

Uh oh!

Languages

Movatterモバイル変換

GraphPKU/PiSSA

Folders and files

Latest commit

History

Repository files navigation

PrincipalSingular values andSingular vectorsAdaptation

Introduction

News

Quick Start

Reproduce the Results

Training

Evaluation

Advanced Usage

Convert PiSSA to LoRA

Citation

Star History

Follow-up Work

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages0

Uh oh!

Contributors4

Uh oh!

Languages

Packages