Finetune Qwen3, Llama 4, TTS, DeepSeek-R1 & Gemma 3 LLMs 2x faster with 70% less memory! 🦥
Notebooks are beginner friendly. Read our guide. Add your dataset, click "Run All", and export your finetuned model to GGUF, Ollama, vLLM or Hugging Face.
| Unsloth supports | Performance | Memory use |
|---|---|---|
| Qwen3 (14B) | 2x faster | 70% less |
| Qwen3 (4B): GRPO | 2x faster | 80% less |
| Gemma 3 (4B) | 1.6x faster | 60% less |
| Llama 3.2 (3B) | 2x faster | 70% less |
| Phi-4 (14B) | 2x faster | 70% less |
| Llama 3.2 Vision (11B) | 2x faster | 50% less |
| Llama 3.1 (8B) | 2x faster | 70% less |
| Mistral v0.3 (7B) | 2.2x faster | 75% less |
| Sesame-CSM (1B) | 1.5x faster | 50% less |
- See all our notebooks for: Kaggle, GRPO, TTS & Vision
- See all our models and our Synthetic Dataset notebook in collaboration with Meta
- See detailed documentation for Unsloth here
- Install with pip (recommended) for Linux devices: `pip install unsloth`. For Windows install instructions, see here.
- 📣 NEW! Text-to-Speech (TTS) is now supported, including `sesame/csm-1b`, and STT `openai/whisper-large-v3`.
- 📣 NEW! Qwen3 is now supported. Qwen3-30B-A3B fits on 17.5GB VRAM.
- 📣 NEW! Introducing Dynamic 2.0 quants that set new benchmarks on 5-shot MMLU & KL Divergence.
- 📣 Llama 4 by Meta, including Scout & Maverick, is now supported.
- 📣 EVERYTHING is now supported - all models (BERT, diffusion, Cohere, Mamba), FFT, etc. Multi-GPU coming soon. Enable FFT with `full_finetuning = True`, 8-bit with `load_in_8bit = True`.
- 📣 Gemma 3 by Google: Read Blog. We uploaded GGUFs, 4-bit models.
- 📣 Introducing Long-context Reasoning (GRPO) in Unsloth. Train your own reasoning model with just 5GB VRAM. Transform Llama, Phi, Mistral etc. into reasoning LLMs!
- 📣 DeepSeek-R1 - run or fine-tune them with our guide. All model uploads: here.
Click for more news
- 📣 Introducing Unsloth Dynamic 4-bit Quantization! We dynamically opt not to quantize certain parameters, and this greatly increases accuracy while only using <10% more VRAM than BnB 4-bit. See our collection on Hugging Face here.
- 📣 Phi-4 by Microsoft: We also fixed bugs in Phi-4 and uploaded GGUFs, 4-bit.
- 📣 Vision models now supported! Llama 3.2 Vision (11B), Qwen 2.5 VL (7B) and Pixtral (12B) 2409.
- 📣 Llama 3.3 (70B), Meta's latest model, is supported.
- 📣 We worked with Apple to add Cut Cross Entropy. Unsloth now supports 89K context for Meta's Llama 3.3 (70B) on an 80GB GPU - 13x longer than HF+FA2. For Llama 3.1 (8B), Unsloth enables 342K context, surpassing its native 128K support.
- 📣 We found and helped fix a gradient accumulation bug! Please update Unsloth and transformers.
- 📣 We cut memory usage by a further 30% and now support 4x longer context windows!
| Type | Links |
|---|---|
| 📚 Documentation & Wiki | Read Our Docs |
| Twitter (X) | Follow us on X |
| 💾 Installation | Pip install |
| 🔮 Our Models | Unsloth Releases |
| ✍️ Blog | Read our Blogs |
| Reddit | Join our Reddit page |
- Supports full-finetuning, pretraining, 4-bit, 16-bit and 8-bit training
- Supports all transformer-style models including TTS, STT, multimodal, diffusion, BERT and more!
- All kernels written in OpenAI's Triton language. Manual backprop engine.
- 0% loss in accuracy - no approximation methods - all exact.
- No change of hardware. Supports NVIDIA GPUs from 2018+. Minimum CUDA Capability 7.0 (V100, T4, Titan V, RTX 20, 30, 40x, A100, H100, L40 etc). Check your GPU! GTX 1070 and 1080 work, but are slow. A quick capability check is sketched below this list.
- Works on Linux and Windows
- If you trained a model with 🦥 Unsloth, you can use this cool sticker!

You can also see our documentation for more detailed installation and updating instructions here.
Install with pip (recommended) for Linux devices:

```bash
pip install unsloth
```

To update Unsloth:

```bash
pip install --upgrade --force-reinstall --no-cache-dir unsloth unsloth_zoo
```

See here for advanced pip install instructions.
Warning
Unsloth does not support Python 3.13. Use 3.12, 3.11 or 3.10.
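A quick way to confirm your interpreter version before installing - plain Python, nothing Unsloth-specific:

```python
import sys

# Unsloth supports Python 3.10, 3.11 and 3.12, but not 3.13.
if not ((3, 10) <= sys.version_info[:2] <= (3, 12)):
    raise RuntimeError(f"Python {sys.version.split()[0]} is unsupported; use 3.10, 3.11 or 3.12.")
print("Python version OK:", sys.version.split()[0])
```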
Install NVIDIA Video Driver: You should install the latest version of your GPU's driver. Download drivers here: NVIDIA GPU Drivers.
Install Visual Studio C++: You will need Visual Studio with C++ installed. By default, C++ is not installed with Visual Studio, so make sure you select all of the C++ options. Also select options for the Windows 10/11 SDK. For detailed instructions with options, see here.
Install CUDA Toolkit: Follow the instructions to install the CUDA Toolkit.
Install PyTorch: You will need the correct version of PyTorch that is compatible with your CUDA drivers, so make sure to select them carefully. Install PyTorch.
Install Unsloth:

```bash
pip install unsloth
```
To run Unsloth directly on Windows:
- Install Triton from this Windows fork and follow the instructions here (be aware that the Windows fork requires PyTorch >= 2.4 and CUDA 12).
- In the SFTTrainer, set `dataset_num_proc=1` to avoid a crashing issue:

```python
trainer = SFTTrainer(dataset_num_proc=1, ...)
```
For advanced installation instructions, or if you see weird errors during installation:
1. Install `torch` and `triton`. Go to https://pytorch.org to install them, for example `pip install torch torchvision torchaudio triton`.
2. Confirm that CUDA is installed correctly. Try `nvcc`. If that fails, you need to install `cudatoolkit` or CUDA drivers.
3. Install `xformers` manually. You can try installing `vllm` and seeing if `vllm` succeeds. Check if `xformers` succeeded with `python -m xformers.info`. Go to https://github.com/facebookresearch/xformers. Another option is to install `flash-attn` for Ampere GPUs.
4. Double check that your versions of Python, CUDA, CUDNN, `torch`, `triton`, and `xformers` are compatible with one another. The PyTorch Compatibility Matrix may be useful.
5. Finally, install `bitsandbytes` and check it with `python -m bitsandbytes`. A small diagnostic sketch covering these checks follows this list.
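As referenced above, here is a rough diagnostic sketch for those steps. It only prints installed versions and then defers to the deeper checks the steps already name; adjust as needed for your environment:

```python
import importlib
import subprocess
import sys

# Print the versions of the key dependencies so they can be compared against
# the PyTorch compatibility matrix mentioned in step 4.
for name in ("torch", "triton", "xformers", "bitsandbytes"):
    try:
        module = importlib.import_module(name)
        print(f"{name:12s} {getattr(module, '__version__', 'unknown')}")
    except ImportError as err:
        print(f"{name:12s} NOT INSTALLED ({err})")

try:
    import torch
    print("CUDA available:", torch.cuda.is_available(), "| CUDA version:", torch.version.cuda)
except ImportError:
    print("torch missing - install it first (step 1)")

# The deeper checks suggested in steps 3 and 5:
subprocess.run([sys.executable, "-m", "xformers.info"], check=False)
subprocess.run([sys.executable, "-m", "bitsandbytes"], check=False)
```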
⚠️ Only use Conda if you have it. If not, use pip. Select either `pytorch-cuda=11.8` or `pytorch-cuda=12.1` for CUDA 11.8 or CUDA 12.1. We support `python=3.10,3.11,3.12`.
```bash
conda create --name unsloth_env \
    python=3.11 \
    pytorch-cuda=12.1 \
    pytorch cudatoolkit xformers -c pytorch -c nvidia -c xformers \
    -y
conda activate unsloth_env
pip install unsloth
```
If you're looking to install Conda in a Linux environment, read here, or run the below 🔽
```bash
mkdir -p ~/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm -rf ~/miniconda3/miniconda.sh
~/miniconda3/bin/conda init bash
~/miniconda3/bin/conda init zsh
```
⚠️ Do **NOT** use this if you have Conda. Pip is a bit more complex since there are dependency issues. The pip command is different for `torch 2.2`, `2.3`, `2.4`, `2.5` and for different CUDA versions.
For other torch versions, we support `torch211`, `torch212`, `torch220`, `torch230` and `torch240`, and for CUDA versions, we support `cu118`, `cu121` and `cu124`. For Ampere devices (A100, H100, RTX 3090) and above, use `cu118-ampere`, `cu121-ampere` or `cu124-ampere`.
For example, if you have `torch 2.4` and `CUDA 12.1`, use:

```bash
pip install --upgrade pip
pip install "unsloth[cu121-torch240] @ git+https://github.com/unslothai/unsloth.git"
```

Another example, if you have `torch 2.5` and `CUDA 12.4`, use:

```bash
pip install --upgrade pip
pip install "unsloth[cu124-torch250] @ git+https://github.com/unslothai/unsloth.git"
```

And other examples:

```bash
pip install "unsloth[cu121-ampere-torch240] @ git+https://github.com/unslothai/unsloth.git"
pip install "unsloth[cu118-ampere-torch240] @ git+https://github.com/unslothai/unsloth.git"
pip install "unsloth[cu121-torch240] @ git+https://github.com/unslothai/unsloth.git"
pip install "unsloth[cu118-torch240] @ git+https://github.com/unslothai/unsloth.git"
pip install "unsloth[cu121-torch230] @ git+https://github.com/unslothai/unsloth.git"
pip install "unsloth[cu121-ampere-torch230] @ git+https://github.com/unslothai/unsloth.git"
pip install "unsloth[cu121-torch250] @ git+https://github.com/unslothai/unsloth.git"
pip install "unsloth[cu124-ampere-torch250] @ git+https://github.com/unslothai/unsloth.git"
```
Or, run the below in a terminal to get the optimal pip installation command:

```bash
wget -qO- https://raw.githubusercontent.com/unslothai/unsloth/main/unsloth/_auto_install.py | python -
```

Or, run the below manually in a Python REPL:

```python
try: import torch
except: raise ImportError('Install torch via `pip install torch`')
from packaging.version import Version as V
v = V(torch.__version__)
cuda = str(torch.version.cuda)
is_ampere = torch.cuda.get_device_capability()[0] >= 8
if cuda != "12.1" and cuda != "11.8" and cuda != "12.4": raise RuntimeError(f"CUDA = {cuda} not supported!")
if   v <= V('2.1.0'): raise RuntimeError(f"Torch = {v} too old!")
elif v <= V('2.1.1'): x = 'cu{}{}-torch211'
elif v <= V('2.1.2'): x = 'cu{}{}-torch212'
elif v  < V('2.3.0'): x = 'cu{}{}-torch220'
elif v  < V('2.4.0'): x = 'cu{}{}-torch230'
elif v  < V('2.5.0'): x = 'cu{}{}-torch240'
elif v  < V('2.6.0'): x = 'cu{}{}-torch250'
else: raise RuntimeError(f"Torch = {v} too new!")
x = x.format(cuda.replace(".", ""), "-ampere" if is_ampere else "")
print(f'pip install --upgrade pip && pip install "unsloth[{x}] @ git+https://github.com/unslothai/unsloth.git"')
```
- Go to our official Documentation for saving to GGUF, checkpointing, evaluation and more!
- We support Hugging Face's TRL, Trainer, Seq2SeqTrainer or even plain PyTorch code!
- We're in 🤗 Hugging Face's official docs! Check out the SFT docs and DPO docs!
- If you want to download models from the ModelScope community, set the environment variable `UNSLOTH_USE_MODELSCOPE=1` and install the modelscope library with `pip install modelscope -U`.
- unsloth_cli.py also supports `UNSLOTH_USE_MODELSCOPE=1` to download models and datasets. Please remember to use the model and dataset id as it appears in the ModelScope community (see the sketch below).
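A minimal sketch of the ModelScope path under those settings; the model id below is purely illustrative, so substitute one that exists in the ModelScope community:

```python
import os

# Set before importing unsloth so model downloads are routed through ModelScope.
os.environ["UNSLOTH_USE_MODELSCOPE"] = "1"

from unsloth import FastLanguageModel

# Requires `pip install modelscope -U`; use a model id as listed on ModelScope
# (the id below is only an example).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "Qwen/Qwen2.5-7B-Instruct",
    max_seq_length = 2048,
    load_in_4bit = True,
)
```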
```python
from unsloth import FastLanguageModel, FastModel
import torch
from trl import SFTTrainer, SFTConfig
from datasets import load_dataset
max_seq_length = 2048 # Supports RoPE Scaling internally, so choose any!

# Get LAION dataset
url = "https://huggingface.co/datasets/laion/OIG/resolve/main/unified_chip2.jsonl"
dataset = load_dataset("json", data_files = {"train" : url}, split = "train")

# 4bit pre quantized models we support for 4x faster downloading + no OOMs.
fourbit_models = [
    "unsloth/Meta-Llama-3.1-8B-bnb-4bit",      # Llama-3.1 2x faster
    "unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit",
    "unsloth/Meta-Llama-3.1-70B-bnb-4bit",
    "unsloth/Meta-Llama-3.1-405B-bnb-4bit",    # 4bit for 405b!
    "unsloth/Mistral-Small-Instruct-2409",     # Mistral 22b 2x faster!
    "unsloth/mistral-7b-instruct-v0.3-bnb-4bit",
    "unsloth/Phi-3.5-mini-instruct",           # Phi-3.5 2x faster!
    "unsloth/Phi-3-medium-4k-instruct",
    "unsloth/gemma-2-9b-bnb-4bit",
    "unsloth/gemma-2-27b-bnb-4bit",            # Gemma 2x faster!
    "unsloth/Llama-3.2-1B-bnb-4bit",           # NEW! Llama 3.2 models
    "unsloth/Llama-3.2-1B-Instruct-bnb-4bit",
    "unsloth/Llama-3.2-3B-bnb-4bit",
    "unsloth/Llama-3.2-3B-Instruct-bnb-4bit",
    "unsloth/Llama-3.3-70B-Instruct-bnb-4bit"  # NEW! Llama 3.3 70B!
] # More models at https://huggingface.co/unsloth

model, tokenizer = FastModel.from_pretrained(
    model_name = "unsloth/gemma-3-4B-it",
    max_seq_length = 2048,      # Choose any for long context!
    load_in_4bit = True,        # 4 bit quantization to reduce memory
    load_in_8bit = False,       # [NEW!] A bit more accurate, uses 2x memory
    full_finetuning = False,    # [NEW!] We have full finetuning now!
    # token = "hf_...",         # use one if using gated models
)

# Do model patching and add fast LoRA weights
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    max_seq_length = max_seq_length,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)

trainer = SFTTrainer(
    model = model,
    train_dataset = dataset,
    tokenizer = tokenizer,
    args = SFTConfig(
        max_seq_length = max_seq_length,
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 10,
        max_steps = 60,
        logging_steps = 1,
        output_dir = "outputs",
        optim = "adamw_8bit",
        seed = 3407,
    ),
)
trainer.train()

# Go to https://github.com/unslothai/unsloth/wiki for advanced tips like
# (1) Saving to GGUF / merging to 16bit for vLLM
# (2) Continued training from a saved LoRA adapter
# (3) Adding an evaluation loop / OOMs
# (4) Customized chat templates
```
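As a follow-up to the GGUF/vLLM pointers in the comments above, here is a hedged sketch of exporting once `trainer.train()` has finished. The `save_pretrained_merged` and `save_pretrained_gguf` helpers and the `q4_k_m` quantization method are taken from Unsloth's saving docs; check those docs for the exact options available:

```python
# Merge the LoRA adapters into 16-bit weights for vLLM / Hugging Face:
model.save_pretrained_merged("outputs/merged_16bit", tokenizer, save_method = "merged_16bit")

# Or export a GGUF file for llama.cpp / Ollama (quantization method is one common choice):
model.save_pretrained_gguf("outputs/gguf", tokenizer, quantization_method = "q4_k_m")
```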
RL including DPO, GRPO, PPO, Reward Modelling and Online DPO all work with Unsloth. We're in 🤗 Hugging Face's official docs! We're on the GRPO docs and the DPO docs! List of RL notebooks:
- Advanced Qwen3 GRPO notebook: Link
- ORPO notebook: Link
- DPO Zephyr notebook: Link
- KTO notebook: Link
- SimPO notebook: Link
Click for DPO code
```python
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0" # Optional set GPU device ID

from unsloth import FastLanguageModel
import torch
from trl import DPOTrainer, DPOConfig
max_seq_length = 2048

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/zephyr-sft-bnb-4bit",
    max_seq_length = max_seq_length,
    load_in_4bit = True,
)

# Do model patching and add fast LoRA weights
model = FastLanguageModel.get_peft_model(
    model,
    r = 64,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 64,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    max_seq_length = max_seq_length,
)

dpo_trainer = DPOTrainer(
    model = model,
    ref_model = None,
    train_dataset = YOUR_DATASET_HERE,
    # eval_dataset = YOUR_DATASET_HERE,
    tokenizer = tokenizer,
    args = DPOConfig(
        per_device_train_batch_size = 4,
        gradient_accumulation_steps = 8,
        warmup_ratio = 0.1,
        num_train_epochs = 3,
        logging_steps = 1,
        optim = "adamw_8bit",
        seed = 42,
        output_dir = "outputs",
        max_length = 1024,
        max_prompt_length = 512,
        beta = 0.1,
    ),
)
dpo_trainer.train()
```
- For our most detailed benchmarks, read our Llama 3.3 Blog.
- Benchmarking of Unsloth was also conducted by 🤗 Hugging Face.
We tested using the Alpaca Dataset, a batch size of 2, gradient accumulation steps of 4, rank = 32, and applied QLoRA on all linear layers (q, k, v, o, gate, up, down):
| Model | VRAM | 🦥 Unsloth speed | 🦥 VRAM reduction | 🦥 Longer context | 😊 Hugging Face + FA2 |
|---|---|---|---|---|---|
| Llama 3.3 (70B) | 80GB | 2x | >75% | 13x longer | 1x |
| Llama 3.1 (8B) | 80GB | 2x | >70% | 12x longer | 1x |
We tested Llama 3.1 (8B) Instruct and did 4-bit QLoRA on all linear layers (Q, K, V, O, gate, up and down) with rank = 32 and a batch size of 1. We padded all sequences to a certain maximum sequence length to mimic long-context finetuning workloads.
| GPU VRAM | 🦥 Unsloth context length | Hugging Face + FA2 |
|---|---|---|
| 8 GB | 2,972 | OOM |
| 12 GB | 21,848 | 932 |
| 16 GB | 40,724 | 2,551 |
| 24 GB | 78,475 | 5,789 |
| 40 GB | 153,977 | 12,264 |
| 48 GB | 191,728 | 15,502 |
| 80 GB | 342,733 | 28,454 |
We tested Llama 3.3 (70B) Instruct on an 80GB A100 and did 4-bit QLoRA on all linear layers (Q, K, V, O, gate, up and down) with rank = 32 and a batch size of 1. We padded all sequences to a certain maximum sequence length to mimic long-context finetuning workloads.
| GPU VRAM | 🦥 Unsloth context length | Hugging Face + FA2 |
|---|---|---|
| 48 GB | 12,106 | OOM |
| 80 GB | 89,389 | 6,916 |
You can cite the Unsloth repo as follows:
```bibtex
@software{unsloth,
  author = {Daniel Han, Michael Han and Unsloth team},
  title = {Unsloth},
  url = {http://github.com/unslothai/unsloth},
  year = {2023}
}
```
- The llama.cpp library that lets users save models with Unsloth
- The Hugging Face team and their TRL library
- Erik for his help adding Apple's ML Cross Entropy in Unsloth
- Etherl for adding support for TTS, diffusion and BERT models
- And of course, every single person who has contributed to or used Unsloth!