Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train OpenAI gpt-oss, DeepSeek-R1, Qwen3, Gemma 3, TTS 2x faster with 70% less VRAM.
Notebooks are beginner friendly. Read our guide. Add your dataset, run all cells, then export your trained model to GGUF, llama.cpp, Ollama, vLLM, SGLang or Hugging Face.
| Model | Performance | Memory use |
|---|---|---|
| gpt-oss (20B) | 1.5x faster | 70% less |
| Mistral Ministral 3 (3B) | 1.5x faster | 60% less |
| gpt-oss (20B): GRPO | 2x faster | 80% less |
| Qwen3: Advanced GRPO | 2x faster | 50% less |
| Qwen3-VL (8B): GSPO | 1.5x faster | 80% less |
| Gemma 3 (270M) | 1.7x faster | 60% less |
| Gemma 3n (4B) | 1.5x faster | 50% less |
| DeepSeek-OCR (3B) | 1.5x faster | 30% less |
| Llama 3.1 (8B) Alpaca | 2x faster | 70% less |
| Llama 3.2 Conversational | 2x faster | 70% less |
| Orpheus-TTS (3B) | 1.5x faster | 50% less |
- See all our notebooks for: Kaggle, GRPO, TTS & Vision
- See all our models and all our notebooks
- See detailed documentation for Unsloth here
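The end-to-end flow (load a model, train, then export) is short. Below is a minimal sketch: the model name and output paths are illustrative placeholders, and the `save_pretrained_gguf` / `save_pretrained_merged` helpers are documented in our docs, so check there for the full list of quantization options.

```python
# Minimal sketch: load, fine-tune (omitted), then export.
# The model name and output paths are illustrative; see the docs for details.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Llama-3.2-1B-Instruct",  # any supported model works
    max_seq_length = 2048,
    load_in_4bit = True,
)

# ... add LoRA adapters and train as in the full example further down ...

# Export to GGUF for llama.cpp / Ollama (quantization names follow llama.cpp).
model.save_pretrained_gguf("outputs/gguf", tokenizer, quantization_method = "q4_k_m")

# Or merge the adapter and save 16-bit weights for vLLM / SGLang / Hugging Face.
model.save_pretrained_merged("outputs/merged_16bit", tokenizer, save_method = "merged_16bit")
```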
Install Unsloth with pip:

```bash
pip install unsloth
```
For Windows, `pip install unsloth` works only if you have PyTorch installed. Read our Windows Guide.
Or use our official Unsloth Docker image, the `unsloth/unsloth` container. Read our Docker Guide.
For RTX 50x, B200 and RTX 6000 GPUs: `pip install unsloth`. Read our Blackwell Guide and DGX Spark Guide for more details.
- New RoPE & MLP Triton Kernels & Padding Free + Packing: 3x faster training & 30% less VRAM. Blog
- Ministral 3 by Mistral: Run Ministral 3 or fine-tune with our vision/RL sudoku notebooks. Guide • Notebooks
- 500K Context: Training a 20B model with >500K context is now possible on an 80GB GPU. Blog
- FP8 Reinforcement Learning: You can now do FP8 GRPO on consumer GPUs. Blog • Notebook
- DeepSeek-OCR: Fine-tune to improve language understanding by 89%. Guide • Notebook
- Docker: Use Unsloth with no setup & environment issues with our new image. Guide • Docker image
- gpt-oss RL: Introducing the fastest possible inference for gpt-oss RL! Read blog
- Vision RL: You can now train VLMs with GRPO or GSPO in Unsloth! Read guide
- gpt-oss by OpenAI: Read our Unsloth Flex Attention blog and gpt-oss Guide. 20B works on 14GB VRAM; 120B on 65GB.
Click for more news
- Quantization-Aware Training: We collaborated with PyTorch, recovering ~70% accuracy. Read blog
- Memory-efficient RL: We're introducing even better RL. Our new kernels & algorithms allow faster RL with 50% less VRAM & 10× more context. Read blog
- Gemma 3n by Google: Read our blog. We uploaded GGUFs and 4-bit models.
- Text-to-Speech (TTS) is now supported, including `sesame/csm-1b`, and STT `openai/whisper-large-v3`.
- Qwen3 is now supported. Qwen3-30B-A3B fits on 17.5GB VRAM.
- Introducing Dynamic 2.0 quants that set new benchmarks on 5-shot MMLU & Aider Polyglot.
- EVERYTHING is now supported: all models (TTS, BERT, Mamba), FFT, etc. Multi-GPU coming soon. Enable FFT with `full_finetuning = True`, 8-bit with `load_in_8bit = True`.
- 📣 DeepSeek-R1: run or fine-tune them with our guide. All model uploads: here.
- 📣 Introducing Long-context Reasoning (GRPO) in Unsloth. Train your own reasoning model with just 5GB VRAM. Transform Llama, Phi, Mistral etc. into reasoning LLMs!
- 📣 Introducing Unsloth Dynamic 4-bit Quantization! We dynamically opt not to quantize certain parameters, which greatly increases accuracy while using <10% more VRAM than BnB 4-bit. See our collection on Hugging Face here.
- 📣 Llama 4 by Meta, including Scout & Maverick, is now supported.
- 📣 Phi-4 by Microsoft: We also fixed bugs in Phi-4 and uploaded GGUFs and 4-bit models.
- 📣 Vision models are now supported! Llama 3.2 Vision (11B), Qwen 2.5 VL (7B) and Pixtral (12B) 2409.
- 📣 Llama 3.3 (70B), Meta's latest model, is supported.
- 📣 We worked with Apple to add Cut Cross Entropy. Unsloth now supports 89K context for Meta's Llama 3.3 (70B) on an 80GB GPU - 13x longer than HF+FA2. For Llama 3.1 (8B), Unsloth enables 342K context, surpassing its native 128K support.
- 📣 We found and helped fix a gradient accumulation bug! Please update Unsloth and transformers.
- 📣 We cut memory usage by a further 30% and now support 4x longer context windows!
| Type | Links |
|---|---|
| r/unsloth Reddit | Join the Reddit community |
| 📚 Documentation & Wiki | Read Our Docs |
| X (Twitter) | Follow us on X |
| 💾 Installation | Pip & Docker Install |
| 🔮 Our Models | Unsloth Catalog |
| ✍️ Blog | Read our Blogs |
- Supports full fine-tuning, pretraining, 4-bit, 16-bit and FP8 training
- Supports all models including TTS, multimodal, BERT and more! Any model that works in transformers works in Unsloth.
- The most efficient library for Reinforcement Learning (RL), using 80% less VRAM. Supports GRPO, GSPO, DrGRPO, DAPO etc.
- 0% loss in accuracy - no approximation methods - all exact.
- Supports NVIDIA GPUs since 2018, plus AMD and Intel GPUs. Minimum CUDA Capability 7.0 (V100, T4, Titan V, RTX 20/30/40 series, A100, H100, L40 etc.)
- Works on Linux, WSL and Windows
- All kernels are written in OpenAI's Triton language. Manual backprop engine.
- If you trained a model with 🦥 Unsloth, you can use this cool sticker!

You can also see our docs for more detailed installation and updating instructions here.
Unsloth supports Python 3.13 or lower.
Install with pip (recommended) for Linux devices:
```bash
pip install unsloth
```
To update Unsloth:
```bash
pip install --upgrade --force-reinstall --no-cache-dir unsloth unsloth_zoo
```
See here for advanced pip install instructions.
Install NVIDIA Video Driver: You should install the latest driver for your GPU. Download drivers here: NVIDIA GPU Driver.

Install Visual Studio C++: You will need Visual Studio with C++ installed. By default, C++ is not installed with Visual Studio, so make sure you select all of the C++ options. Also select options for the Windows 10/11 SDK. For detailed instructions with options, see here.

Install CUDA Toolkit: Follow the instructions to install the CUDA Toolkit.

Install PyTorch: You will need the version of PyTorch that is compatible with your CUDA drivers, so make sure to select them carefully. Install PyTorch.

Install Unsloth:

```bash
pip install unsloth
```
To run Unsloth directly on Windows:
- Install Triton from this Windows fork and follow the instructions here (be aware that the Windows fork requires PyTorch >= 2.4 and CUDA 12).
- In the `SFTConfig`, set `dataset_num_proc=1` to avoid a crashing issue:

  ```python
  SFTConfig(dataset_num_proc=1, ...)
  ```
For advanced installation instructions, or if you see weird errors during installation:
- First try using an isolated environment, then `pip install unsloth`:

  ```bash
  python -m venv unsloth
  source unsloth/bin/activate
  pip install unsloth
  ```
- Install `torch` and `triton`. Go to https://pytorch.org to install them. For example: `pip install torch torchvision torchaudio triton`
- Confirm that CUDA is installed correctly. Try `nvcc`. If that fails, you need to install `cudatoolkit` or CUDA drivers.
- Install `xformers` manually via:

  ```bash
  pip install ninja
  pip install -v --no-build-isolation -U git+https://github.com/facebookresearch/xformers.git@main#egg=xformers
  ```
  Check if `xformers` succeeded with `python -m xformers.info`. Go to https://github.com/facebookresearch/xformers. Another option is to install `flash-attn` for Ampere GPUs and ignore `xformers`.
- For GRPO runs, try installing `vllm` and see if `pip install vllm` succeeds.
- Double check that your versions of Python, CUDA, cuDNN, `torch`, `triton`, and `xformers` are compatible with one another. The PyTorch Compatibility Matrix may be useful.
- Finally, install `bitsandbytes` and check it with `python -m bitsandbytes`.
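If anything is still in doubt, a quick version check often pinpoints the mismatch. A minimal sketch (plain Python; only the final `unsloth` import is Unsloth-specific):

```python
# Quick environment sanity check: prints the versions that must agree
# (Python, torch, CUDA, xformers, bitsandbytes). Purely illustrative.
import sys
import importlib

import torch

print("Python       :", sys.version.split()[0])
print("torch        :", torch.__version__)
print("CUDA (torch) :", torch.version.cuda)
print("GPU available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device       :", torch.cuda.get_device_name(0))
    print("Capability   :", torch.cuda.get_device_capability(0))  # needs >= (7, 0)

for pkg in ("triton", "xformers", "bitsandbytes", "unsloth"):
    try:
        mod = importlib.import_module(pkg)
        print(f"{pkg:12}: {getattr(mod, '__version__', 'installed')}")
    except Exception as exc:  # missing or broken install
        print(f"{pkg:12}: NOT OK ({exc.__class__.__name__})")
```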
⚠️ Only use Conda if you have it. If not, use pip. Select either `pytorch-cuda=11.8` for CUDA 11.8 or `pytorch-cuda=12.1` for CUDA 12.1. We support `python=3.10`, `3.11` and `3.12`.

```bash
conda create --name unsloth_env \
    python=3.11 \
    pytorch-cuda=12.1 \
    pytorch cudatoolkit xformers -c pytorch -c nvidia -c xformers \
    -y
conda activate unsloth_env
pip install unsloth
```

If you're looking to install Conda in a Linux environment, read here, or run the commands below 🔽

```bash
mkdir -p ~/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm -rf ~/miniconda3/miniconda.sh
~/miniconda3/bin/conda init bash
~/miniconda3/bin/conda init zsh
```
⚠️ Do **NOT** use this if you have Conda. Pip is a bit more complex since there are dependency issues. The pip command is different for torch 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9 and for different CUDA versions.

For other torch versions, we support `torch211`, `torch212`, `torch220`, `torch230`, `torch240`, `torch250`, `torch260`, `torch270`, `torch280` and `torch290`, and for CUDA versions, we support `cu118`, `cu121` and `cu124`. For Ampere devices (A100, H100, RTX 3090) and above, use `cu118-ampere`, `cu121-ampere` or `cu124-ampere`.

For example, if you have torch 2.4 and CUDA 12.1, use:

```bash
pip install --upgrade pip
pip install "unsloth[cu121-torch240] @ git+https://github.com/unslothai/unsloth.git"
```

Another example, if you have torch 2.9 and CUDA 13.0, use:

```bash
pip install --upgrade pip
pip install "unsloth[cu130-torch290] @ git+https://github.com/unslothai/unsloth.git"
```

And other examples:

```bash
pip install "unsloth[cu121-ampere-torch240] @ git+https://github.com/unslothai/unsloth.git"
pip install "unsloth[cu118-ampere-torch240] @ git+https://github.com/unslothai/unsloth.git"
pip install "unsloth[cu121-torch240] @ git+https://github.com/unslothai/unsloth.git"
pip install "unsloth[cu118-torch240] @ git+https://github.com/unslothai/unsloth.git"
pip install "unsloth[cu121-torch230] @ git+https://github.com/unslothai/unsloth.git"
pip install "unsloth[cu121-ampere-torch230] @ git+https://github.com/unslothai/unsloth.git"
pip install "unsloth[cu121-torch250] @ git+https://github.com/unslothai/unsloth.git"
pip install "unsloth[cu124-ampere-torch250] @ git+https://github.com/unslothai/unsloth.git"
```
Or, run the below in a terminal to get the optimal pip installation command:

```bash
wget -qO- https://raw.githubusercontent.com/unslothai/unsloth/main/unsloth/_auto_install.py | python -
```

Or, run the below manually in a Python REPL:

```python
try: import torch
except: raise ImportError('Install torch via `pip install torch`')
from packaging.version import Version as V
import re
v = V(re.match(r"[0-9\.]{3,}", torch.__version__).group(0))
cuda = str(torch.version.cuda)
is_ampere = torch.cuda.get_device_capability()[0] >= 8
USE_ABI = torch._C._GLIBCXX_USE_CXX11_ABI
if cuda not in ("11.8", "12.1", "12.4", "12.6", "12.8", "13.0"): raise RuntimeError(f"CUDA = {cuda} not supported!")
if   v <= V('2.1.0'): raise RuntimeError(f"Torch = {v} too old!")
elif v <= V('2.1.1'): x = 'cu{}{}-torch211'
elif v <= V('2.1.2'): x = 'cu{}{}-torch212'
elif v <  V('2.3.0'): x = 'cu{}{}-torch220'
elif v <  V('2.4.0'): x = 'cu{}{}-torch230'
elif v <  V('2.5.0'): x = 'cu{}{}-torch240'
elif v <  V('2.5.1'): x = 'cu{}{}-torch250'
elif v <= V('2.5.1'): x = 'cu{}{}-torch251'
elif v <  V('2.7.0'): x = 'cu{}{}-torch260'
elif v <  V('2.7.9'): x = 'cu{}{}-torch270'
elif v <  V('2.8.0'): x = 'cu{}{}-torch271'
elif v <  V('2.8.9'): x = 'cu{}{}-torch280'
elif v <  V('2.9.1'): x = 'cu{}{}-torch290'
elif v <  V('2.9.2'): x = 'cu{}{}-torch291'
else: raise RuntimeError(f"Torch = {v} too new!")
if v > V('2.6.9') and cuda not in ("11.8", "12.6", "12.8", "13.0"): raise RuntimeError(f"CUDA = {cuda} not supported!")
x = x.format(cuda.replace(".", ""), "-ampere" if False else "")  # is_ampere is broken due to flash-attn
print(f'pip install --upgrade pip && pip install --no-deps git+https://github.com/unslothai/unsloth-zoo.git && pip install "unsloth[{x}] @ git+https://github.com/unslothai/unsloth.git" --no-build-isolation')
```
You can use our pre-built Docker container with all dependencies to use Unsloth instantly with no setup required. Read our guide.

This container requires installing NVIDIA's Container Toolkit.

```bash
docker run -d -e JUPYTER_PASSWORD="mypassword" \
    -p 8888:8888 -p 2222:22 \
    -v $(pwd)/work:/workspace/work \
    --gpus all \
    unsloth/unsloth
```

Access Jupyter Lab at http://localhost:8888 and start fine-tuning!
- Go to our official Documentation for running models, saving to GGUF, checkpointing, evaluation and more!
- Read our Guides for: Fine-tuning, Reinforcement Learning, Text-to-Speech (TTS), Vision and any model.
- We support Hugging Face's transformers, TRL, Trainer, Seq2SeqTrainer and plain PyTorch code.
Unsloth example code to fine-tune gpt-oss-20b:
```python
from unsloth import FastLanguageModel, FastModel
import torch
from trl import SFTTrainer, SFTConfig
from datasets import load_dataset

max_seq_length = 2048  # Supports RoPE Scaling internally, so choose any!

# Get LAION dataset
url = "https://huggingface.co/datasets/laion/OIG/resolve/main/unified_chip2.jsonl"
dataset = load_dataset("json", data_files = {"train": url}, split = "train")

# 4bit pre quantized models we support for 4x faster downloading + no OOMs.
fourbit_models = [
    "unsloth/gpt-oss-20b-unsloth-bnb-4bit",  # or choose any model
]  # More models at https://huggingface.co/unsloth

model, tokenizer = FastModel.from_pretrained(
    model_name = "unsloth/gpt-oss-20b",
    max_seq_length = 2048,     # Choose any for long context!
    load_in_4bit = True,       # 4-bit quantization. False = 16-bit LoRA.
    load_in_8bit = False,      # 8-bit quantization
    load_in_16bit = False,     # [NEW!] 16-bit LoRA
    full_finetuning = False,   # Use for full fine-tuning.
    # token = "hf_...",        # use one if using gated models
)

# Do model patching and add fast LoRA weights
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    lora_alpha = 16,
    lora_dropout = 0,  # Supports any, but = 0 is optimized
    bias = "none",     # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth",  # True or "unsloth" for very long context
    random_state = 3407,
    max_seq_length = max_seq_length,
    use_rslora = False,   # We support rank stabilized LoRA
    loftq_config = None,  # And LoftQ
)

trainer = SFTTrainer(
    model = model,
    train_dataset = dataset,
    tokenizer = tokenizer,
    args = SFTConfig(
        max_seq_length = max_seq_length,
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 10,
        max_steps = 60,
        logging_steps = 1,
        output_dir = "outputs",
        optim = "adamw_8bit",
        seed = 3407,
    ),
)
trainer.train()

# Go to https://docs.unsloth.ai for advanced tips like
# (1) Saving to GGUF / merging to 16bit for vLLM or SGLang
# (2) Continued training from a saved LoRA adapter
# (3) Adding an evaluation loop / OOMs
# (4) Customized chat templates
```
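Once `trainer.train()` finishes, a quick generation pass is the easiest sanity check. A minimal sketch, reusing `model` and `tokenizer` from the script above (the prompt is arbitrary; `FastLanguageModel.for_inference` switches the model into Unsloth's faster inference mode):

```python
# Quick inference sanity check, reusing `model` and `tokenizer` from above.
from unsloth import FastLanguageModel

FastLanguageModel.for_inference(model)  # enable faster inference mode

messages = [{"role": "user", "content": "Explain LoRA fine-tuning in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt = True,
    return_tensors = "pt",
).to(model.device)

outputs = model.generate(input_ids = inputs, max_new_tokens = 128)
print(tokenizer.decode(outputs[0], skip_special_tokens = True))
```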
RL, including GRPO, GSPO, FP8 training, DrGRPO, DAPO, PPO, Reward Modelling and Online DPO, all work with Unsloth. Read our Reinforcement Learning Guide or our advanced RL docs for batching, generation & training parameters. A minimal GRPO sketch follows the notebook list below.
List of RL notebooks:
- gpt-oss GSPO notebook: Link
- Qwen2.5-VL GSPO notebook: Link
- Advanced Qwen3 GRPO notebook: Link
- FP8 Qwen3-8B GRPO notebook (L4): Link
- ORPO notebook: Link
- DPO Zephyr notebook: Link
- KTO notebook: Link
- SimPO notebook: Link
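As promised above, here is a minimal GRPO sketch. It is illustrative only: the model name, dataset and reward function are placeholder assumptions, and it relies on TRL's `GRPOConfig`/`GRPOTrainer`; the notebooks listed above are the authoritative recipes (they also enable vLLM-backed generation for much faster rollouts).

```python
# Illustrative GRPO sketch: an Unsloth model driven by TRL's GRPOTrainer.
# Assumptions: a recent TRL with GRPO support, a toy prompt dataset, and
# a toy length-based reward instead of a real task reward.
from unsloth import FastLanguageModel
from trl import GRPOConfig, GRPOTrainer
from datasets import Dataset

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Qwen3-4B-Base",  # placeholder: any supported model
    max_seq_length = 1024,
    load_in_4bit = True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    lora_alpha = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
)

# GRPO expects a "prompt" column; this toy set just repeats one prompt.
dataset = Dataset.from_dict({"prompt": ["Write a haiku about sloths."] * 64})

# Toy reward: prefer completions near 200 characters. Real runs score
# correctness, formatting, etc.
def length_reward(completions, **kwargs):
    return [-abs(len(c) - 200) / 200.0 for c in completions]

trainer = GRPOTrainer(
    model = model,
    processing_class = tokenizer,
    reward_funcs = [length_reward],
    args = GRPOConfig(
        per_device_train_batch_size = 8,
        num_generations = 8,          # completions sampled per prompt
        max_completion_length = 128,
        max_steps = 30,
        output_dir = "grpo_outputs",
    ),
    train_dataset = dataset,
)
trainer.train()
```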
- For our most detailed benchmarks, read our Llama 3.3 Blog.
- Benchmarking of Unsloth was also conducted by 🤗 Hugging Face.
We tested using the Alpaca Dataset, a batch size of 2, gradient accumulation steps of 4, rank = 32, and applied QLoRA on all linear layers (q, k, v, o, gate, up, down):
| Model | VRAM | 🦥 Unsloth speed | 🦥 VRAM reduction | 🦥 Longer context | 😊 Hugging Face + FA2 |
|---|---|---|---|---|---|
| Llama 3.3 (70B) | 80GB | 2x | >75% | 13x longer | 1x |
| Llama 3.1 (8B) | 80GB | 2x | >70% | 12x longer | 1x |
We tested Llama 3.1 (8B) Instruct and applied 4-bit QLoRA on all linear layers (Q, K, V, O, gate, up and down) with rank = 32 and a batch size of 1. We padded all sequences to a fixed maximum sequence length to mimic long-context fine-tuning workloads.
| GPU VRAM | 🦥 Unsloth context length | Hugging Face + FA2 |
|---|---|---|
| 8 GB | 2,972 | OOM |
| 12 GB | 21,848 | 932 |
| 16 GB | 40,724 | 2,551 |
| 24 GB | 78,475 | 5,789 |
| 40 GB | 153,977 | 12,264 |
| 48 GB | 191,728 | 15,502 |
| 80 GB | 342,733 | 28,454 |
We tested Llama 3.3 (70B) Instruct on an 80GB A100 and applied 4-bit QLoRA on all linear layers (Q, K, V, O, gate, up and down) with rank = 32 and a batch size of 1. We padded all sequences to a fixed maximum sequence length to mimic long-context fine-tuning workloads.
| GPU VRAM | 🦥 Unsloth context length | Hugging Face + FA2 |
|---|---|---|
| 48 GB | 12,106 | OOM |
| 80 GB | 89,389 | 6,916 |
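For readers who want to mimic this setup, the padding trick is just fixed-length tokenization. A rough sketch, not the exact benchmark harness (the model, dataset and `MAX_LEN` below are illustrative assumptions):

```python
# Rough sketch of the long-context benchmark setup: pad (and truncate) every
# example to a fixed sequence length before fine-tuning. Names are illustrative.
from datasets import load_dataset
from transformers import AutoTokenizer

MAX_LEN = 8192  # pick the target context length being benchmarked
tokenizer = AutoTokenizer.from_pretrained("unsloth/Meta-Llama-3.1-8B-Instruct")
tokenizer.pad_token = tokenizer.pad_token or tokenizer.eos_token

dataset = load_dataset("yahma/alpaca-cleaned", split = "train[:100]")

def pad_to_max(example):
    # Constant-length batches so every step hits the target context size.
    return tokenizer(
        example["instruction"] + "\n" + example["output"],
        padding = "max_length",
        truncation = True,
        max_length = MAX_LEN,
    )

padded = dataset.map(pad_to_max)
print(len(padded[0]["input_ids"]))  # always MAX_LEN
```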
You can cite the Unsloth repo as follows:
```bibtex
@software{unsloth,
  author = {Daniel Han, Michael Han and Unsloth team},
  title = {Unsloth},
  url = {http://github.com/unslothai/unsloth},
  year = {2023}
}
```
- The llama.cpp library that lets users save models with Unsloth
- The Hugging Face team and their libraries: transformers and TRL
- The PyTorch and Torch AO teams for their contributions
- Erik for his help adding Apple's ML Cross Entropy in Unsloth
- Etherl for adding support for TTS, diffusion and BERT models
- And of course every single person who has contributed to or used Unsloth!