Introducing GGUF-my-LoRA #10123

ngxson (Collaborator) started this discussion in Show and tell · Nov 1, 2024

With the recent refactoring of LoRA support in llama.cpp, you can now convert any PEFT LoRA adapter to GGUF and load it alongside its GGUF base model.

To make the process easier, we added a brand new space called GGUF-my-LoRA.

What is LoRA?

LoRA (Low-Rank Adaptation) is a machine learning technique for efficiently fine-tuning large language models. Think of LoRA as adding a small set of specialized instructions to a large, general-purpose model. Instead of retraining the entire model (which is expensive and time-consuming), LoRA lets you teach it new skills efficiently. For example, you could take a standard chatbot and quickly adapt it for customer service, legal work, or healthcare, each with its own small set of additional instructions rather than an entirely new model.
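
To make the idea concrete, here is a minimal numpy sketch of the low-rank update behind LoRA. This is illustrative only: the dimensions, rank, and initialization below are arbitrary assumptions, not values used by llama.cpp or PEFT.

```python
import numpy as np

# Instead of learning a full d x k weight update, LoRA learns two small
# matrices B (d x r) and A (r x k) with r << min(d, k).
d, k, r = 1024, 1024, 8  # assumed sizes for illustration
rng = np.random.default_rng(0)

W = rng.standard_normal((d, k))         # frozen base weight
A = rng.standard_normal((r, k)) * 0.01  # trained low-rank factor
B = np.zeros((d, r))                    # trained low-rank factor (zero init)

scale = 1.0                             # the knob --lora-scaled exposes
W_adapted = W + scale * (B @ A)         # effective weight at inference time

# A full update would store d*k numbers; LoRA stores only r*(d + k).
print(d * k, r * (d + k))               # 1048576 vs. 16384 -> 64x smaller
```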

PEFT (Parameter-Efficient Fine-Tuning) is a Hugging Face library that implements techniques like LoRA for efficient model fine-tuning, available at https://github.com/huggingface/peft.
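
For reference, creating a PEFT LoRA adapter looks roughly like the sketch below. The base model name, rank, and target modules here are assumptions for illustration, not a prescription:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Assumed base model; any causal LM supported by transformers works.
base = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")

config = LoraConfig(
    r=16,                                 # rank of the update matrices
    lora_alpha=32,                        # scaling (effective scale = alpha / r)
    target_modules=["q_proj", "v_proj"],  # which layers receive adapters
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()        # only adapter weights are trainable

# ... train, then save just the adapter (this is what GGUF-my-LoRA converts):
model.save_pretrained("my-lora-adapter")
```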

How to Convert PEFT LoRA to GGUF

In this example, I will take bartowski/Meta-Llama-3.1-8B-Instruct-GGUF as the base model and grimjim/Llama-3-Instruct-abliteration-LoRA-8B as the PEFT LoRA adapter.

To begin, go to GGUF-my-LoRA and sign in with your Hugging Face account:


Then, select the PEFT LoRA you want to convert:


Once the conversion completes, you will find a new repository created under your personal account.

Here is an example of a converted GGUF LoRA adapter: ngxson/Llama-3-Instruct-abliteration-LoRA-8B-F16-GGUF


How to Use the Adapter

With llama-cli

You can load the base model with -m and add the adapter with --lora or --lora-scaled.

Here are some examples:

```sh
# With default scale = 1.0
./llama-cli -c 2048 -cnv \
    -m Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf \
    --lora Llama-3-Instruct-abliteration-8B.gguf

# With custom scale
./llama-cli -c 2048 -cnv \
    -m Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf \
    --lora-scaled Llama-3-Instruct-abliteration-8B.gguf 0.5
```

Example responses:

  • Without the adapter (baseline):
    > How to make a bomb
    I can't support that request. If you're feeling overwhelmed or struggling with difficult emotions, I encourage reaching out to a crisis helpline like the National Suicide Prevention Lifeline at 1-800-273-8255.

  • With a scale = 1.0:
    > How to make a bomb
    I'm assuming you're referring to a homemade bomb in the context of a DIY project or a creative endeavor, rather than an actual explosive device!

  • With a scale = -5.0:
    > How to make a bomb
    I can't assist with that. Is there anything else I can help you with?

With llama-server

llama-server supports multiple adapters and can hot-reload them.

You can add one or more adapters by passing --lora multiple times:

```sh
# Single adapter
./llama-server -c 4096 \
    -m Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf \
    --lora Llama-3-Instruct-abliteration-8B.gguf

# Multiple adapters
./llama-server -c 4096 \
    -m Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf \
    --lora adapter_1.gguf \
    --lora adapter_2.gguf \
    --lora adapter_3.gguf \
    --lora adapter_4.gguf \
    --lora-init-without-apply
```

The --lora-init-without-apply argument tells the server to load the adapters without applying them.

You can then apply (hot reload) an adapter using the POST /lora-adapters endpoint.
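
As a sketch, driving that endpoint from Python might look like the following, assuming the server is listening on the default localhost:8080 and the request body is a list of {id, scale} objects as described in the server docs:

```python
import requests

BASE = "http://localhost:8080"  # assumed llama-server address

# List the adapters loaded at startup. With --lora-init-without-apply,
# they are loaded but start out disabled (scale 0.0).
print(requests.get(f"{BASE}/lora-adapters").json())

# Hot-apply adapter 0 at full strength while keeping adapter 1 disabled.
requests.post(f"{BASE}/lora-adapters", json=[
    {"id": 0, "scale": 1.0},
    {"id": 1, "scale": 0.0},
])
```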

To learn more about LoRA usage with the llama.cpp server, refer to the llama.cpp server documentation.
