An Efficient "Factory" to Build Multiple LoRA Adapters
mLoRA (a.k.a. Multi-LoRA Fine-Tune) is an open-source framework designed for efficient fine-tuning of multiple Large Language Models (LLMs) using LoRA and its variants. Key features of mLoRA include:

- Concurrent fine-tuning of multiple LoRA adapters.
- Shared base model among multiple LoRA adapters.
- Efficient pipeline parallelism algorithm.
- Support for multiple LoRA variant algorithms and various base models.
- Support for multiple reinforcement learning preference alignment algorithms.
The end-to-end architecture of mLoRA is shown in the figure below:
- [2025/01] mLoRA has been accepted by VLDB'25
First, clone this repository and install its dependencies (or use our Docker image):
```bash
# Clone the repository
git clone https://github.com/TUDB-Labs/mLoRA
cd mLoRA
# Install requirements (requires Python >= 3.12)
pip install .
```
The mlora_train.py script is the entry point for batch fine-tuning LoRA adapters.
```bash
python mlora_train.py \
  --base_model TinyLlama/TinyLlama-1.1B-Chat-v0.4 \
  --config demo/lora/lora_case_1.yaml
```
You can check the adapter configurations in the demo folder; it contains example configurations for different LoRA variants and reinforcement learning preference alignment algorithms.
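For orientation only, the sketch below shows roughly what such a YAML configuration could look like; every field name here is hypothetical, so refer to demo/lora/lora_case_1.yaml for the actual schema used by mLoRA:

```yaml
# Hypothetical sketch of an adapter configuration; the real field names and
# structure are defined by the files in the demo/ folder, not by this example.
adapters:
  - name: lora_demo        # adapter identifier (assumed field)
    type: lora             # could instead be a variant such as qlora / dora / vera
    r: 8                   # LoRA rank (assumed field)
    alpha: 16              # LoRA scaling factor (assumed field)
    dropout: 0.05
tasks:
  - name: train_lora_demo  # training task referencing the adapter and data (assumed field)
    adapter: lora_demo
    dataset: data/demo_train.json
    batch_size: 16
    num_epochs: 2
```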
For further usage details, use the --help option:
```bash
python mlora_train.py --help
```
Similar to the Quickstart, the commands to start training in a two-node environment are as follows:
NOTE 1: Use the environment variables MASTER_ADDR/MASTER_PORT to set the master node.
NOTE 2: Set --balance to specify the number of decoder layers allocated to each rank.
```bash
# in the first node
export MASTER_ADDR=master.svc.cluster.local
export MASTER_PORT=12355
python mlora_pp_train.py \
  --base_model TinyLlama/TinyLlama-1.1B-Chat-v0.4 \
  --config demo/lora/lora_case_1.yaml \
  --pipeline \
  --device "cuda:0" \
  --rank 0 \
  --balance 12 13 \
  --no-recompute \
  --precision fp32

# in the second node
export MASTER_ADDR=master.svc.cluster.local
export MASTER_PORT=12355
python mlora_pp_train.py \
  --base_model TinyLlama/TinyLlama-1.1B-Chat-v0.4 \
  --config demo/lora/lora_case_1.yaml \
  --pipeline \
  --device "cuda:1" \
  --rank 1 \
  --balance 12 13 \
  --no-recompute \
  --precision fp32
```
mLoRA offers an official Docker image for quick start and development. The image is available on the Docker Hub registry.
First, pull the latest image (the same image is also used for development):
```bash
docker pull yezhengmaolove/mlora:latest
```
Deploy and enter a container to run mLoRA:
```bash
docker run -itd --runtime nvidia --gpus all \
  -v ~/your_dataset_dir:/dataset \
  -v ~/your_model_dir:/model \
  -p <host_port>:22 \
  --name mlora \
  yezhengmaolove/mlora:latest
# once the container has started, log in via ssh
# the default password is mlora@123
ssh root@localhost -p <host_port>
# pull the latest code and run mLoRA
cd /mLoRA
git pull
python mlora_train.py \
  --base_model TinyLlama/TinyLlama-1.1B-Chat-v0.4 \
  --config demo/lora/lora_case_1.yaml
```
We can deploy mLoRA as a service to continuously receive user requests and perform fine-tuning tasks.
First, pull the latest image (the same image is used for deployment):
```bash
docker pull yezhengmaolove/mlora:latest
```
Deploy our mLoRA server:
```bash
docker run -itd --runtime nvidia --gpus all \
  -v ~/your_dataset_cache_dir:/cache \
  -v ~/your_model_dir:/model \
  -p <host_port>:8000 \
  --name mlora_server \
  -e "BASE_MODEL=TinyLlama/TinyLlama-1.1B-Chat-v0.4" \
  -e "STORAGE_DIR=/cache" \
  yezhengmaolove/mlora:latest /bin/bash /opt/deploy.sh
```
Once the service is deployed, install and use mlora_cli.py to interact with the server.
```bash
# install the client tool
pip install mlora-cli
# use the mlora cli tool to connect to the mLoRA server
mlora_cli
(mLoRA) set port <host_port>
(mLoRA) set host http://<host_ip>
# and enjoy it!!
```
Step-by-step
```bash
docker pull yezhengmaolove/mlora:latest
pip install mlora-cli
```
```bash
# first, we create a cache dir on the host for caching files
mkdir ~/cache
# second, we manually download the model weights from Hugging Face
mkdir ~/model && cd ~/model
git clone https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0
# we map port 8000 used by the mlora server to port 1288 on the host machine.
# the BASE_MODEL environment variable indicates the path of the base model used by mlora.
# the STORAGE_DIR environment variable indicates the path where datasets and lora adapters are stored.
# we use the script /opt/deploy.sh in the container to start the server.
docker run -itd --runtime nvidia --gpus all \
  -v ~/cache:/cache \
  -v ~/model:/model \
  -p 1288:8000 \
  --name mlora_server \
  -e "BASE_MODEL=/model/TinyLlama-1.1B-Chat-v1.0" \
  -e "STORAGE_DIR=/cache" \
  yezhengmaolove/mlora:latest /bin/bash /opt/deploy.sh
```
We use mlora_cli to connect to the server at http://127.0.0.1:1288 (the http protocol must be used):
```
(mLoRA) set port 1288
(mLoRA) set host http://127.0.0.1
```
We use the Stanford Alpaca dataset as a demo; the data looks like this:

```json
[{"instruction": "", "input": "", "output": ""}, {...}]
```

```
(mLoRA) file upload
? file type: train data
? name: alpaca
? file path: /home/yezhengmao/alpaca-lora/alpaca_data.json
```
The prompt template is a YAML file written with the Jinja2 templating language; see the demo/prompt.yaml file.
The data file you upload can be thought of as an array whose elements are dictionaries; each element is treated as one data point to be rendered through the template.
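For illustration only, a Jinja2 template over Alpaca-style fields might look like the sketch below; the top-level key name is an assumption, and the real structure is the one in demo/prompt.yaml:

```yaml
# Hypothetical sketch of a prompt template; the actual schema is defined by
# demo/prompt.yaml. Each uploaded data point (a dictionary) is rendered
# through the Jinja2 template, so its keys can be referenced directly.
template: |
  Below is an instruction that describes a task. Write a response that appropriately completes the request.

  ### Instruction:
  {{ instruction }}
  {% if input %}
  ### Input:
  {{ input }}
  {% endif %}
  ### Response:
  {{ output }}
```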
```
(mLoRA) file upload
? file type: prompt template
? name: simple_prompt
? file path: /home/yezhengmao/mLoRA/demo/prompt.yaml
```
We create a dataset consisting of the data, a template, and the corresponding prompter. We can use the dataset showcase command to check whether the prompts are generated correctly.
```
(mLoRA) dataset create
? name: alpaca_dataset
? train data file: alpaca
? prompt template file: simple_prompt
? prompter: instruction
? data preprocessing: default
(mLoRA) dataset showcase
? dataset name: alpaca_dataset
```
Now we can use the adapter create command to create an adapter for training.
Finally, we can submit a task to train our adapter with the defined dataset. NOTE: you can continuously submit or terminate training tasks. Use adapter ls or task ls to check the tasks' status.
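The snippet below is only a hypothetical illustration of that interactive flow, written in the same style as the dataset step above; the exact prompts, field names, and even the command used to submit a task may differ in the actual mlora_cli:

```
(mLoRA) adapter create
? name: alpaca_adapter
? ...                    # remaining fields depend on the chosen adapter type
(mLoRA) task create      # hypothetical name for the task-submission command
? adapter: alpaca_adapter
? dataset: alpaca_dataset
? ...
(mLoRA) adapter ls
(mLoRA) task ls
```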
Using mLoRA can save significant computational and memory resources when training multiple adapters simultaneously.
We fine-tuned multiple LoRA adapters using four A6000 graphics cards with fp32 precision, without checkpointing and without any quantization techniques:
| Model | mLoRA (tokens/s) | PEFT-LoRA with FSDP (tokens/s) | PEFT-LoRA with TP (tokens/s) |
|---|---|---|---|
| llama-2-7b (fp32) | 2364 | 1750 | 1500 |
| llama-2-13b (fp32) | 1280 | OOM | 875 |
| Supported | Model |
|---|---|
| ✓ | LLaMA |
| Supported | LoRA Variant |
|---|---|
| ✓ | QLoRA, NeurIPS 2023 |
| ✓ | LoRA+, ICML 2024 |
| ✓ | VeRA, ICLR 2024 |
| ✓ | DoRA, ICML 2024 |
| Supported | Preference Alignment Algorithm |
|---|---|
| ✓ | DPO, NeurIPS 2023 |
| ✓ | CPO, ICML 2024 |
| ✓ | CIT, arXiv 2024 |
- Help Document [TODO]
- Design Document
- How to develop a new adapter
- How to reproduce the paper
We welcome contributions to improve this repository! Please review the contribution guidelines before submitting pull requests or issues.
1. Fork the repository.
2. Create a new branch for your feature or fix.
3. Submit a pull request with a detailed explanation of your changes.
You can use pre-commit to check your code:
```bash
# Install requirements
pip install .[ci_test]
ln -s ../../.github/workflows/pre-commit .git/hooks/pre-commit
```
Or just call the script to check your code:
```bash
.github/workflows/pre-commit
```
Please cite this repository if you use its code.
```bibtex
@misc{ye2024mlorafinetuningloraadapters,
  title={mLoRA: Fine-Tuning LoRA Adapters via Highly-Efficient Pipeline Parallelism in Multiple GPUs},
  author={Zhengmao Ye and Dengchun Li and Zetao Hu and Tingfeng Lan and Jian Sha and Sicong Zhang and Lei Duan and Jie Zuo and Hui Lu and Yuanchun Zhou and Mingjie Tang},
  year={2024},
  eprint={2312.02515},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2312.02515},
}
```
Copyright © 2024 All Rights Reserved.
This project is licensed under the Apache 2.0 License.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.