Finetune Qwen3, Llama 4, TTS, DeepSeek-R1 & Gemma 3 LLMs 2x faster with 70% less memory! 🦥
Notebooks are beginner friendly. Read our guide. Add your dataset, click "Run All", and export your finetuned model to GGUF, Ollama, vLLM or Hugging Face.
| Unsloth supports | Performance | Memory use |
|---|---|---|
| Qwen3 (14B) | 2x faster | 70% less |
| Qwen3 (4B): GRPO | 2x faster | 80% less |
| Gemma 3 (4B) | 1.6x faster | 60% less |
| Llama 3.2 (3B) | 2x faster | 70% less |
| Phi-4 (14B) | 2x faster | 70% less |
| Llama 3.2 Vision (11B) | 2x faster | 50% less |
| Llama 3.1 (8B) | 2x faster | 70% less |
| Mistral v0.3 (7B) | 2.2x faster | 75% less |
| Sesame-CSM (1B) | 1.5x faster | 50% less |
- See all our notebooks for: Kaggle, GRPO, TTS & Vision
- See all our models and our Synthetic Dataset notebook in collaboration with Meta
- See detailed documentation for Unsloth here
- Install with pip (recommended) for Linux devices: `pip install unsloth`. For Windows install instructions, see here.
- 📣 NEW! Text-to-Speech (TTS) is now supported, including `sesame/csm-1b`, and STT `openai/whisper-large-v3`.
- 📣 NEW! Qwen3 is now supported. Qwen3-30B-A3B fits on 17.5GB VRAM.
- 📣 NEW! Introducing Dynamic 2.0 quants that set new benchmarks on 5-shot MMLU & KL Divergence.
- 📣 Llama 4 by Meta, including Scout & Maverick, is now supported.
- 📣 EVERYTHING is now supported - all models (BERT, diffusion, Cohere, Mamba), FFT, etc. Multi-GPU coming soon. Enable FFT with `full_finetuning = True`, 8-bit with `load_in_8bit = True`.
- 📣 Gemma 3 by Google: Read Blog. We uploaded GGUFs, 4-bit models.
- 📣 Introducing Long-context Reasoning (GRPO) in Unsloth. Train your own reasoning model with just 5GB VRAM. Transform Llama, Phi, Mistral etc. into reasoning LLMs!
- 📣 DeepSeek-R1 - run or fine-tune them with our guide. All model uploads: here.
Click for more news
- 📣 Introducing Unsloth Dynamic 4-bit Quantization! We dynamically opt not to quantize certain parameters, and this greatly increases accuracy while only using <10% more VRAM than BnB 4-bit. See our collection on Hugging Face here.
- 📣 Phi-4 by Microsoft: We also fixed bugs in Phi-4 and uploaded GGUFs, 4-bit.
- 📣 Vision models now supported! Llama 3.2 Vision (11B), Qwen 2.5 VL (7B) and Pixtral (12B) 2409.
- 📣 Llama 3.3 (70B), Meta's latest model, is supported.
- 📣 We worked with Apple to add Cut Cross Entropy. Unsloth now supports 89K context for Meta's Llama 3.3 (70B) on an 80GB GPU - 13x longer than HF+FA2. For Llama 3.1 (8B), Unsloth enables 342K context, surpassing its native 128K support.
- 📣 We found and helped fix a gradient accumulation bug! Please update Unsloth and transformers.
- 📣 We cut memory usage by a further 30% and now support 4x longer context windows!
| Type | Links |
|---|---|
| 📚 Documentation & Wiki | Read Our Docs |
| Twitter (X) | Follow us on X |
| 💾 Installation | Pip install |
| 🔮 Our Models | Unsloth Releases |
| ✍️ Blog | Read our Blogs |
| Reddit | Join our Reddit page |
- Supports full-finetuning, pretraining, 4-bit, 16-bit and 8-bit training
- Supports all transformer-style models including TTS, STT, multimodal, diffusion, BERT and more!
- All kernels written in OpenAI's Triton language. Manual backprop engine.
- 0% loss in accuracy - no approximation methods - all exact.
- No change of hardware. Supports NVIDIA GPUs from 2018+. Minimum CUDA Capability 7.0 (V100, T4, Titan V, RTX 20, 30, 40x, A100, H100, L40 etc). Check your GPU! GTX 1070 and 1080 work, but are slow. A quick capability check is sketched below this list.
- Works on Linux and Windows
- If you trained a model with 🦥 Unsloth, you can use this cool sticker!

You can also see our documentation for more detailed installation and updating instructions here.
Install with pip (recommended) for Linux devices:

```bash
pip install unsloth
```

To update Unsloth:

```bash
pip install --upgrade --force-reinstall --no-cache-dir unsloth unsloth_zoo
```

See here for advanced pip install instructions.
Warning
Unsloth does not support Python 3.13. Use 3.12, 3.11 or 3.10.
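A quick way to confirm your interpreter version before installing - plain Python, nothing Unsloth-specific:

```python
import sys

# Unsloth supports Python 3.10, 3.11 and 3.12, but not 3.13.
if not ((3, 10) <= sys.version_info[:2] <= (3, 12)):
    raise RuntimeError(f"Python {sys.version.split()[0]} is unsupported; use 3.10, 3.11 or 3.12.")
print("Python version OK:", sys.version.split()[0])
```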
Install NVIDIA Video Driver: You should install the latest version of your GPU's driver. Download drivers here: NVIDIA GPU Drivers.
Install Visual Studio C++: You will need Visual Studio with C++ installed. By default, C++ is not installed with Visual Studio, so make sure you select all of the C++ options. Also select options for the Windows 10/11 SDK. For detailed instructions with options, see here.
Install CUDA Toolkit: Follow the instructions to install the CUDA Toolkit.
Install PyTorch: You will need the correct version of PyTorch that is compatible with your CUDA drivers, so make sure to select them carefully. Install PyTorch.
Install Unsloth:

```bash
pip install unsloth
```
To run Unsloth directly on Windows:
- Install Triton from this Windows fork and follow the instructions here (be aware that the Windows fork requires PyTorch >= 2.4 and CUDA 12).
- In the SFTTrainer, set `dataset_num_proc=1` to avoid a crashing issue:

```python
trainer = SFTTrainer(dataset_num_proc=1, ...)
```
For advanced installation instructions, or if you see weird errors during installation:
1. Install `torch` and `triton`. Go to https://pytorch.org to install them, for example `pip install torch torchvision torchaudio triton`.
2. Confirm that CUDA is installed correctly. Try `nvcc`. If that fails, you need to install `cudatoolkit` or CUDA drivers.
3. Install `xformers` manually. You can try installing `vllm` and seeing if `vllm` succeeds. Check if `xformers` succeeded with `python -m xformers.info`. Go to https://github.com/facebookresearch/xformers. Another option is to install `flash-attn` for Ampere GPUs.
4. Double check that your versions of Python, CUDA, CUDNN, `torch`, `triton`, and `xformers` are compatible with one another. The PyTorch Compatibility Matrix may be useful.
5. Finally, install `bitsandbytes` and check it with `python -m bitsandbytes`. A small diagnostic sketch covering these checks follows this list.
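As referenced above, here is a rough diagnostic sketch for those steps. It only prints installed versions and then defers to the deeper checks the steps already name; adjust as needed for your environment:

```python
import importlib
import subprocess
import sys

# Print the versions of the key dependencies so they can be compared against
# the PyTorch compatibility matrix mentioned in step 4.
for name in ("torch", "triton", "xformers", "bitsandbytes"):
    try:
        module = importlib.import_module(name)
        print(f"{name:12s} {getattr(module, '__version__', 'unknown')}")
    except ImportError as err:
        print(f"{name:12s} NOT INSTALLED ({err})")

try:
    import torch
    print("CUDA available:", torch.cuda.is_available(), "| CUDA version:", torch.version.cuda)
except ImportError:
    print("torch missing - install it first (step 1)")

# The deeper checks suggested in steps 3 and 5:
subprocess.run([sys.executable, "-m", "xformers.info"], check=False)
subprocess.run([sys.executable, "-m", "bitsandbytes"], check=False)
```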
⚠️ Only use Conda if you have it. If not, use pip. Select either `pytorch-cuda=11.8` or `pytorch-cuda=12.1` for CUDA 11.8 or CUDA 12.1. We support `python=3.10,3.11,3.12`.
```bash
conda create --name unsloth_env \
    python=3.11 \
    pytorch-cuda=12.1 \
    pytorch cudatoolkit xformers -c pytorch -c nvidia -c xformers \
    -y
conda activate unsloth_env
pip install unsloth
```
If you're looking to install Conda in a Linux environment, read here, or run the below 🔽
```bash
mkdir -p ~/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm -rf ~/miniconda3/miniconda.sh
~/miniconda3/bin/conda init bash
~/miniconda3/bin/conda init zsh
```
⚠️ Do **NOT** use this if you have Conda. Pip is a bit more complex since there are dependency issues. The pip command is different for `torch 2.2`, `2.3`, `2.4`, `2.5` and for different CUDA versions.
For other torch versions, we support `torch211`, `torch212`, `torch220`, `torch230` and `torch240`, and for CUDA versions, we support `cu118`, `cu121` and `cu124`. For Ampere devices (A100, H100, RTX 3090) and above, use `cu118-ampere`, `cu121-ampere` or `cu124-ampere`.
For example, if you have `torch 2.4` and `CUDA 12.1`, use:

```bash
pip install --upgrade pip
pip install "unsloth[cu121-torch240] @ git+https://github.com/unslothai/unsloth.git"
```

Another example, if you have `torch 2.5` and `CUDA 12.4`, use:

```bash
pip install --upgrade pip
pip install "unsloth[cu124-torch250] @ git+https://github.com/unslothai/unsloth.git"
```

And other examples:

```bash
pip install "unsloth[cu121-ampere-torch240] @ git+https://github.com/unslothai/unsloth.git"
pip install "unsloth[cu118-ampere-torch240] @ git+https://github.com/unslothai/unsloth.git"
pip install "unsloth[cu121-torch240] @ git+https://github.com/unslothai/unsloth.git"
pip install "unsloth[cu118-torch240] @ git+https://github.com/unslothai/unsloth.git"
pip install "unsloth[cu121-torch230] @ git+https://github.com/unslothai/unsloth.git"
pip install "unsloth[cu121-ampere-torch230] @ git+https://github.com/unslothai/unsloth.git"
pip install "unsloth[cu121-torch250] @ git+https://github.com/unslothai/unsloth.git"
pip install "unsloth[cu124-ampere-torch250] @ git+https://github.com/unslothai/unsloth.git"
```
Or, run the below in a terminal to get the optimal pip installation command:

```bash
wget -qO- https://raw.githubusercontent.com/unslothai/unsloth/main/unsloth/_auto_install.py | python -
```

Or, run the below manually in a Python REPL:

```python
try: import torch
except: raise ImportError('Install torch via `pip install torch`')
from packaging.version import Version as V
v = V(torch.__version__)
cuda = str(torch.version.cuda)
is_ampere = torch.cuda.get_device_capability()[0] >= 8
if cuda != "12.1" and cuda != "11.8" and cuda != "12.4": raise RuntimeError(f"CUDA = {cuda} not supported!")
if   v <= V('2.1.0'): raise RuntimeError(f"Torch = {v} too old!")
elif v <= V('2.1.1'): x = 'cu{}{}-torch211'
elif v <= V('2.1.2'): x = 'cu{}{}-torch212'
elif v  < V('2.3.0'): x = 'cu{}{}-torch220'
elif v  < V('2.4.0'): x = 'cu{}{}-torch230'
elif v  < V('2.5.0'): x = 'cu{}{}-torch240'
elif v  < V('2.6.0'): x = 'cu{}{}-torch250'
else: raise RuntimeError(f"Torch = {v} too new!")
x = x.format(cuda.replace(".", ""), "-ampere" if is_ampere else "")
print(f'pip install --upgrade pip && pip install "unsloth[{x}] @ git+https://github.com/unslothai/unsloth.git"')
```
- Go to our official Documentation for saving to GGUF, checkpointing, evaluation and more!
- We support Hugging Face's TRL, Trainer, Seq2SeqTrainer or even plain PyTorch code!
- We're in 🤗 Hugging Face's official docs! Check out the SFT docs and DPO docs!
- If you want to download models from the ModelScope community, set the environment variable `UNSLOTH_USE_MODELSCOPE=1` and install the modelscope library with `pip install modelscope -U`.
- unsloth_cli.py also supports `UNSLOTH_USE_MODELSCOPE=1` to download models and datasets. Please remember to use the model and dataset id as it appears in the ModelScope community (see the sketch below).
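A minimal sketch of the ModelScope path under those settings; the model id below is purely illustrative, so substitute one that exists in the ModelScope community:

```python
import os

# Set before importing unsloth so model downloads are routed through ModelScope.
os.environ["UNSLOTH_USE_MODELSCOPE"] = "1"

from unsloth import FastLanguageModel

# Requires `pip install modelscope -U`; use a model id as listed on ModelScope
# (the id below is only an example).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "Qwen/Qwen2.5-7B-Instruct",
    max_seq_length = 2048,
    load_in_4bit = True,
)
```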
```python
from unsloth import FastLanguageModel, FastModel
import torch
from trl import SFTTrainer, SFTConfig
from datasets import load_dataset
max_seq_length = 2048 # Supports RoPE Scaling internally, so choose any!

# Get LAION dataset
url = "https://huggingface.co/datasets/laion/OIG/resolve/main/unified_chip2.jsonl"
dataset = load_dataset("json", data_files = {"train" : url}, split = "train")

# 4bit pre quantized models we support for 4x faster downloading + no OOMs.
fourbit_models = [
    "unsloth/Meta-Llama-3.1-8B-bnb-4bit",      # Llama-3.1 2x faster
    "unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit",
    "unsloth/Meta-Llama-3.1-70B-bnb-4bit",
    "unsloth/Meta-Llama-3.1-405B-bnb-4bit",    # 4bit for 405b!
    "unsloth/Mistral-Small-Instruct-2409",     # Mistral 22b 2x faster!
    "unsloth/mistral-7b-instruct-v0.3-bnb-4bit",
    "unsloth/Phi-3.5-mini-instruct",           # Phi-3.5 2x faster!
    "unsloth/Phi-3-medium-4k-instruct",
    "unsloth/gemma-2-9b-bnb-4bit",
    "unsloth/gemma-2-27b-bnb-4bit",            # Gemma 2x faster!
    "unsloth/Llama-3.2-1B-bnb-4bit",           # NEW! Llama 3.2 models
    "unsloth/Llama-3.2-1B-Instruct-bnb-4bit",
    "unsloth/Llama-3.2-3B-bnb-4bit",
    "unsloth/Llama-3.2-3B-Instruct-bnb-4bit",
    "unsloth/Llama-3.3-70B-Instruct-bnb-4bit"  # NEW! Llama 3.3 70B!
] # More models at https://huggingface.co/unsloth

model, tokenizer = FastModel.from_pretrained(
    model_name = "unsloth/gemma-3-4B-it",
    max_seq_length = 2048,      # Choose any for long context!
    load_in_4bit = True,        # 4 bit quantization to reduce memory
    load_in_8bit = False,       # [NEW!] A bit more accurate, uses 2x memory
    full_finetuning = False,    # [NEW!] We have full finetuning now!
    # token = "hf_...",         # use one if using gated models
)

# Do model patching and add fast LoRA weights
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    max_seq_length = max_seq_length,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)

trainer = SFTTrainer(
    model = model,
    train_dataset = dataset,
    tokenizer = tokenizer,
    args = SFTConfig(
        max_seq_length = max_seq_length,
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 10,
        max_steps = 60,
        logging_steps = 1,
        output_dir = "outputs",
        optim = "adamw_8bit",
        seed = 3407,
    ),
)
trainer.train()

# Go to https://github.com/unslothai/unsloth/wiki for advanced tips like
# (1) Saving to GGUF / merging to 16bit for vLLM
# (2) Continued training from a saved LoRA adapter
# (3) Adding an evaluation loop / OOMs
# (4) Customized chat templates
```
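As a follow-up to the GGUF/vLLM pointers in the comments above, here is a hedged sketch of exporting once `trainer.train()` has finished. The `save_pretrained_merged` and `save_pretrained_gguf` helpers and the `q4_k_m` quantization method are taken from Unsloth's saving docs; check those docs for the exact options available:

```python
# Merge the LoRA adapters into 16-bit weights for vLLM / Hugging Face:
model.save_pretrained_merged("outputs/merged_16bit", tokenizer, save_method = "merged_16bit")

# Or export a GGUF file for llama.cpp / Ollama (quantization method is one common choice):
model.save_pretrained_gguf("outputs/gguf", tokenizer, quantization_method = "q4_k_m")
```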
RL including DPO, GRPO, PPO, Reward Modelling and Online DPO all work with Unsloth. We're in 🤗 Hugging Face's official docs! We're on the GRPO docs and the DPO docs! List of RL notebooks:
- Advanced Qwen3 GRPO notebook: Link
- ORPO notebook: Link
- DPO Zephyr notebook: Link
- KTO notebook: Link
- SimPO notebook: Link
Click for DPO code
```python
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0" # Optional set GPU device ID

from unsloth import FastLanguageModel
import torch
from trl import DPOTrainer, DPOConfig
max_seq_length = 2048

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/zephyr-sft-bnb-4bit",
    max_seq_length = max_seq_length,
    load_in_4bit = True,
)

# Do model patching and add fast LoRA weights
model = FastLanguageModel.get_peft_model(
    model,
    r = 64,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 64,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    max_seq_length = max_seq_length,
)

dpo_trainer = DPOTrainer(
    model = model,
    ref_model = None,
    train_dataset = YOUR_DATASET_HERE,
    # eval_dataset = YOUR_DATASET_HERE,
    tokenizer = tokenizer,
    args = DPOConfig(
        per_device_train_batch_size = 4,
        gradient_accumulation_steps = 8,
        warmup_ratio = 0.1,
        num_train_epochs = 3,
        logging_steps = 1,
        optim = "adamw_8bit",
        seed = 42,
        output_dir = "outputs",
        max_length = 1024,
        max_prompt_length = 512,
        beta = 0.1,
    ),
)
dpo_trainer.train()
```
- For our most detailed benchmarks, read our Llama 3.3 Blog.
- Benchmarking of Unsloth was also conducted by 🤗 Hugging Face.
We tested using the Alpaca Dataset, a batch size of 2, gradient accumulation steps of 4, rank = 32, and applied QLoRA on all linear layers (q, k, v, o, gate, up, down):
| Model | VRAM | 🦥 Unsloth speed | 🦥 VRAM reduction | 🦥 Longer context | 😊 Hugging Face + FA2 |
|---|---|---|---|---|---|
| Llama 3.3 (70B) | 80GB | 2x | >75% | 13x longer | 1x |
| Llama 3.1 (8B) | 80GB | 2x | >70% | 12x longer | 1x |
We tested Llama 3.1 (8B) Instruct and did 4-bit QLoRA on all linear layers (Q, K, V, O, gate, up and down) with rank = 32 and a batch size of 1. We padded all sequences to a certain maximum sequence length to mimic long-context finetuning workloads.
| GPU VRAM | 🦥 Unsloth context length | Hugging Face + FA2 |
|---|---|---|
| 8 GB | 2,972 | OOM |
| 12 GB | 21,848 | 932 |
| 16 GB | 40,724 | 2,551 |
| 24 GB | 78,475 | 5,789 |
| 40 GB | 153,977 | 12,264 |
| 48 GB | 191,728 | 15,502 |
| 80 GB | 342,733 | 28,454 |
We tested Llama 3.3 (70B) Instruct on an 80GB A100 and did 4-bit QLoRA on all linear layers (Q, K, V, O, gate, up and down) with rank = 32 and a batch size of 1. We padded all sequences to a certain maximum sequence length to mimic long-context finetuning workloads.
| GPU VRAM | 🦥 Unsloth context length | Hugging Face + FA2 |
|---|---|---|
| 48 GB | 12,106 | OOM |
| 80 GB | 89,389 | 6,916 |
You can cite the Unsloth repo as follows:
```bibtex
@software{unsloth,
  author = {Daniel Han, Michael Han and Unsloth team},
  title = {Unsloth},
  url = {http://github.com/unslothai/unsloth},
  year = {2023}
}
```
- The llama.cpp library that lets users save models with Unsloth
- The Hugging Face team and their TRL library
- Erik for his help adding Apple's ML Cross Entropy in Unsloth
- Etherl for adding support for TTS, diffusion and BERT models
- And of course, every single person who has contributed to or used Unsloth!