unslothai/unsloth


Train gpt-oss, DeepSeek, Gemma, Qwen & Llama 2x faster with 70% less VRAM!

✨ Train for Free

Notebooks are beginner friendly. Read our guide. Add your dataset, run the notebook, then deploy your trained model.

| Model | Free Notebooks | Performance | Memory use |
|---|---|---|---|
| gpt-oss (20B) | ▶️ Start for free | 1.5x faster | 70% less |
| gpt-oss (20B): GRPO | ▶️ Start for free | 2x faster | 80% less |
| Qwen3: Advanced GRPO | ▶️ Start for free | 2x faster | 50% less |
| Qwen3-VL (8B): GSPO | ▶️ Start for free | 1.5x faster | 80% less |
| Gemma 3 (4B) Vision | ▶️ Start for free | 1.7x faster | 60% less |
| Gemma 3n (e4B) | ▶️ Start for free | 1.5x faster | 50% less |
| embeddinggemma (300M) | ▶️ Start for free | 2x faster | 20% less |
| Mistral Ministral 3 (3B) | ▶️ Start for free | 1.5x faster | 60% less |
| Llama 3.1 (8B) Alpaca | ▶️ Start for free | 2x faster | 70% less |
| Llama 3.2 Conversational | ▶️ Start for free | 2x faster | 70% less |
| Orpheus-TTS (3B) | ▶️ Start for free | 1.5x faster | 50% less |

⚡ Quickstart

Linux or WSL

pip install unsloth

Windows

For Windows, `pip install unsloth` works only if you already have PyTorch installed. Read our Windows Guide.

Docker

Use our official Unsloth Docker image, the `unsloth/unsloth` container. Read our Docker Guide.

AMD, Intel, Blackwell & DGX Spark

For RTX 50x, B200 and 6000 GPUs: `pip install unsloth`. Read our guides for Blackwell and DGX Spark.
To install Unsloth on AMD and Intel GPUs, follow our AMD Guide and Intel Guide.

🦥 Unsloth News

  • Train MoE LLMs 12x faster with 35% less VRAM - DeepSeek, GLM, Qwen and gpt-oss. Blog
  • Embedding models: Unsloth now supports ~1.8-3.3x faster embedding fine-tuning. Blog · Notebooks
  • New: 7x longer context RL vs. all other setups, via our new batching algorithms. Blog
  • New RoPE & MLP Triton kernels & padding-free + packing: 3x faster training & 30% less VRAM. Blog
  • 500K Context: Training a 20B model with >500K context is now possible on an 80GB GPU. Blog
  • FP8 Reinforcement Learning: You can now do FP8 GRPO on consumer GPUs. Blog · Notebook
  • Docker: Use Unsloth with no setup & environment issues with our new image. Guide · Docker image
  • Vision RL: You can now train VLMs with GRPO or GSPO in Unsloth! Read guide
  • gpt-oss by OpenAI: Read our RL blog, Flex Attention blog and gpt-oss Guide. 20B works on 14GB VRAM, 120B on 65GB.

🔗 Links and Resources

| Type | Links |
|---|---|
| r/unsloth Reddit | Join Reddit community |
| 📚 Documentation & Wiki | Read Our Docs |
| Twitter (aka X) | Follow us on X |
| 💾 Installation | Pip & Docker Install |
| 🔮 Our Models | Unsloth Catalog |
| ✍️ Blog | Read our Blogs |

⭐ Key Features

  • Supports full fine-tuning, pretraining, 4-bit, 16-bit and FP8 training
  • Supports all models including TTS, multimodal, embedding and more! Any model that works in transformers works in Unsloth.
  • The most efficient library for Reinforcement Learning (RL), using 80% less VRAM. Supports GRPO, GSPO, DrGRPO, DAPO, etc.
  • 0% loss in accuracy - no approximation methods - all exact.
  • Export and deploy your model to GGUF (llama.cpp), vLLM, SGLang and Hugging Face.
  • Supports NVIDIA GPUs (since 2018), plus AMD and Intel GPUs. Minimum CUDA capability 7.0 (V100, T4, Titan V, RTX 20/30/40 series, A100, H100, L40, etc.); see the sanity-check snippet after this list.
  • Works on Linux, WSL and Windows
  • All kernels written in OpenAI's Triton language. Manual backprop engine.
  • If you trained a model with 🦥 Unsloth, you can use this cool sticker!
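
As a quick sanity check for the CUDA capability 7.0 requirement above, here is a minimal hedged sketch using plain PyTorch (this is not part of Unsloth's API):

```python
# Hedged sketch: verify the local GPU meets the minimum CUDA compute
# capability of 7.0 mentioned in the feature list above.
import torch

if not torch.cuda.is_available():
    raise SystemExit("No CUDA GPU detected.")
major, minor = torch.cuda.get_device_capability()
print(f"{torch.cuda.get_device_name()}: compute capability {major}.{minor}")
if (major, minor) < (7, 0):
    print("This GPU is below CUDA capability 7.0 and is not supported.")
```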

💾 Install Unsloth

You can also see our docs for more detailed installation and updating instructions here.

Unsloth supports Python 3.13 or lower.

Pip Installation

Install with pip (recommended) for Linux devices:

pip install unsloth

To update Unsloth:

pip install --upgrade --force-reinstall --no-cache-dir unsloth unsloth_zoo

See here for advanced pip install instructions.

Windows Installation

  1. Install NVIDIA Video Driver: You should install the latest driver for your GPU. Download drivers here: NVIDIA GPU Driver.

  2. Install Visual Studio C++: You will need Visual Studio with C++ installed. By default, C++ is not installed with Visual Studio, so make sure you select all of the C++ options. Also select options for the Windows 10/11 SDK. For detailed instructions with options, see here.

  3. Install CUDA Toolkit: Follow the instructions to install the CUDA Toolkit.

  4. Install PyTorch: You will need the version of PyTorch that is compatible with your CUDA drivers, so make sure to select it carefully. Install PyTorch.

  5. Install Unsloth:

pip install unsloth

Advanced/Troubleshooting

For advanced installation instructions, or if you see strange errors during installation:

First, try using an isolated environment, then `pip install unsloth`:

```bash
python -m venv unsloth
source unsloth/bin/activate
pip install unsloth
```

  1. Install torch and triton. Go to https://pytorch.org to install them, for example `pip install torch torchvision torchaudio triton`.
  2. Confirm that CUDA is installed correctly. Try `nvcc`. If that fails, you need to install cudatoolkit or the CUDA drivers.
  3. Install xformers manually via:

```bash
pip install ninja
pip install -v --no-build-isolation -U git+https://github.com/facebookresearch/xformers.git@main#egg=xformers
```

     Check whether `xformers` succeeded with `python -m xformers.info`; see https://github.com/facebookresearch/xformers. Another option is to install `flash-attn` for Ampere GPUs and skip `xformers`.
  4. For GRPO runs, try installing vllm and check that `pip install vllm` succeeds.
  5. Double-check that your versions of Python, CUDA, cuDNN, torch, triton, and xformers are compatible with one another; the snippet after this list prints them. The PyTorch Compatibility Matrix may be useful.
  6. Finally, install bitsandbytes and check it with `python -m bitsandbytes`.
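
To gather the versions for step 5, a short check helps; the following is a minimal sketch using only standard tooling (an assumption, not an official Unsloth utility):

```python
# Hedged diagnostic sketch: print the versions that must agree per the
# PyTorch Compatibility Matrix referenced above.
import importlib.metadata
import platform

import torch

print("Python :", platform.python_version())
print("torch  :", torch.__version__)
print("CUDA   :", torch.version.cuda)
for pkg in ("triton", "xformers", "bitsandbytes", "unsloth"):
    try:
        print(f"{pkg:>12} :", importlib.metadata.version(pkg))
    except importlib.metadata.PackageNotFoundError:
        print(f"{pkg:>12} : not installed")
```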

Conda Installation (Optional)

⚠️ Only use Conda if you have it. If not, use pip. We support python=3.10, 3.11, 3.12 and 3.13.

```bash
conda create --name unsloth_env python==3.12 -y
conda activate unsloth_env
```

Use `nvidia-smi` to get your CUDA version, e.g. 13.0, which becomes `cu130`:

```bash
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu130
pip3 install unsloth
```

If you're looking to install Conda in a Linux environment, read here, or run the below 🔽

```bash
mkdir -p ~/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm -rf ~/miniconda3/miniconda.sh
~/miniconda3/bin/conda init bash
~/miniconda3/bin/conda init zsh
```

Advanced Pip Installation

⚠️ Do **NOT** use this if you have Conda. Pip is a bit more complex since there are dependency issues. The pip command differs for torch 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9 and 2.10, and for CUDA versions.

For torch versions, we support torch211, torch212, torch220, torch230, torch240, torch250, torch260, torch270, torch280, torch290 and torch2100; for CUDA versions, we support cu118, cu121 and cu124. For Ampere devices (A100, H100, RTX 3090) and above, use cu118-ampere, cu121-ampere or cu124-ampere. Note: torch 2.10 only supports CUDA 12.6, 12.8 and 13.0.

For example, if you have torch 2.4 and CUDA 12.1, use:

```bash
pip install --upgrade pip
pip install "unsloth[cu121-torch240] @ git+https://github.com/unslothai/unsloth.git"
```

Another example, if you have torch 2.9 and CUDA 13.0, use:

```bash
pip install --upgrade pip
pip install "unsloth[cu130-torch290] @ git+https://github.com/unslothai/unsloth.git"
```

Another example, if you have torch 2.10 and CUDA 12.6, use:

```bash
pip install --upgrade pip
pip install "unsloth[cu126-torch2100] @ git+https://github.com/unslothai/unsloth.git"
```

And other examples:

pip install"unsloth[cu121-ampere-torch240] @ git+https://github.com/unslothai/unsloth.git"pip install"unsloth[cu118-ampere-torch240] @ git+https://github.com/unslothai/unsloth.git"pip install"unsloth[cu121-torch240] @ git+https://github.com/unslothai/unsloth.git"pip install"unsloth[cu118-torch240] @ git+https://github.com/unslothai/unsloth.git"pip install"unsloth[cu121-torch230] @ git+https://github.com/unslothai/unsloth.git"pip install"unsloth[cu121-ampere-torch230] @ git+https://github.com/unslothai/unsloth.git"pip install"unsloth[cu121-torch250] @ git+https://github.com/unslothai/unsloth.git"pip install"unsloth[cu124-ampere-torch250] @ git+https://github.com/unslothai/unsloth.git"

Or, run the below in a terminal to get the optimal pip installation command:

```bash
wget -qO- https://raw.githubusercontent.com/unslothai/unsloth/main/unsloth/_auto_install.py | python -
```

Or, run the below manually in a Python REPL:

```python
try:
    import torch
except:
    raise ImportError('Install torch via `pip install torch`')
from packaging.version import Version as V
import re
v = V(re.match(r"[0-9\.]{3,}", torch.__version__).group(0))
cuda = str(torch.version.cuda)
is_ampere = torch.cuda.get_device_capability()[0] >= 8
USE_ABI = torch._C._GLIBCXX_USE_CXX11_ABI
if cuda not in ("11.8", "12.1", "12.4", "12.6", "12.8", "13.0"): raise RuntimeError(f"CUDA = {cuda} not supported!")
if   v <= V('2.1.0'): raise RuntimeError(f"Torch = {v} too old!")
elif v <= V('2.1.1'): x = 'cu{}{}-torch211'
elif v <= V('2.1.2'): x = 'cu{}{}-torch212'
elif v <  V('2.3.0'): x = 'cu{}{}-torch220'
elif v <  V('2.4.0'): x = 'cu{}{}-torch230'
elif v <  V('2.5.0'): x = 'cu{}{}-torch240'
elif v <  V('2.5.1'): x = 'cu{}{}-torch250'
elif v <= V('2.5.1'): x = 'cu{}{}-torch251'
elif v <  V('2.7.0'): x = 'cu{}{}-torch260'
elif v <  V('2.7.9'): x = 'cu{}{}-torch270'
elif v <  V('2.8.0'): x = 'cu{}{}-torch271'
elif v <  V('2.8.9'): x = 'cu{}{}-torch280'
elif v <  V('2.9.1'): x = 'cu{}{}-torch290'
elif v <  V('2.9.2'): x = 'cu{}{}-torch291'
elif v <  V('2.10.1'): x = 'cu{}{}-torch2100'
else: raise RuntimeError(f"Torch = {v} too new!")
if v > V('2.6.9') and cuda not in ("11.8", "12.6", "12.8", "13.0"): raise RuntimeError(f"CUDA = {cuda} not supported!")
if v >= V('2.10.0') and cuda not in ("12.6", "12.8", "13.0"): raise RuntimeError(f"Torch 2.10 requires CUDA 12.6, 12.8, or 13.0! Got CUDA = {cuda}")
x = x.format(cuda.replace(".", ""), "-ampere" if False else "")  # is_ampere is broken due to flash-attn
print(f'pip install --upgrade pip && pip install --no-deps git+https://github.com/unslothai/unsloth-zoo.git && pip install "unsloth[{x}] @ git+https://github.com/unslothai/unsloth.git" --no-build-isolation')
```

Docker Installation

You can use our pre-built Docker container with all dependencies to use Unsloth instantly with no setup required. Read our guide.

This container requires installing NVIDIA's Container Toolkit.

```bash
docker run -d -e JUPYTER_PASSWORD="mypassword" \
  -p 8888:8888 -p 2222:22 \
  -v $(pwd)/work:/workspace/work \
  --gpus all \
  unsloth/unsloth
```

Access Jupyter Lab at http://localhost:8888 and start fine-tuning!

📜 Documentation

Unsloth example code to fine-tune gpt-oss-20b:

```python
from unsloth import FastLanguageModel, FastModel, FastVisionModel
import torch
from trl import SFTTrainer, SFTConfig
from datasets import load_dataset

max_seq_length = 2048  # Supports RoPE Scaling internally, so choose any!

# Get LAION dataset
url = "https://huggingface.co/datasets/laion/OIG/resolve/main/unified_chip2.jsonl"
dataset = load_dataset("json", data_files={"train": url}, split="train")

# 4bit pre quantized models we support for 4x faster downloading + no OOMs.
fourbit_models = [
    "unsloth/gpt-oss-20b-unsloth-bnb-4bit",  # or choose any model
]  # More models at https://huggingface.co/unsloth

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/gpt-oss-20b",
    max_seq_length = max_seq_length,  # Choose any for long context!
    load_in_4bit = True,      # 4-bit quantization. False = 16-bit LoRA.
    load_in_8bit = False,     # 8-bit quantization
    load_in_16bit = False,    # 16-bit LoRA
    full_finetuning = False,  # Use for full fine-tuning.
    trust_remote_code = False,  # Enable to support new models
    # token = "hf_...",  # use one if using gated models
)

# Do model patching and add fast LoRA weights
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    lora_alpha = 16,
    lora_dropout = 0,  # Supports any, but = 0 is optimized
    bias = "none",     # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth",  # True or "unsloth" for very long context
    random_state = 3407,
    max_seq_length = max_seq_length,
    use_rslora = False,   # We support rank stabilized LoRA
    loftq_config = None,  # And LoftQ
)

trainer = SFTTrainer(
    model = model,
    train_dataset = dataset,
    tokenizer = tokenizer,
    args = SFTConfig(
        max_seq_length = max_seq_length,
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 10,
        max_steps = 60,
        logging_steps = 1,
        output_dir = "outputs",
        optim = "adamw_8bit",
        seed = 3407,
    ),
)
trainer.train()

# Go to https://unsloth.ai/docs for advanced tips like
# (1) Saving to GGUF / merging to 16bit for vLLM or SGLang
# (2) Continued training from a saved LoRA adapter
# (3) Adding an evaluation loop / OOMs
# (4) Customized chat templates
```
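
The closing comments above point to GGUF export and 16-bit merging. As a hedged follow-up sketch (the method names follow Unsloth's saving docs, but the output paths and the `q4_k_m` quantization choice are just placeholders), saving the trained model could look like:

```python
# Hedged sketch: exporting after trainer.train(). Paths and the
# quantization method are illustrative; see Unsloth's saving docs.
model.save_pretrained("lora_model")      # LoRA adapters only
tokenizer.save_pretrained("lora_model")

# Merge LoRA into 16-bit weights for vLLM / SGLang:
model.save_pretrained_merged("merged_model", tokenizer, save_method = "merged_16bit")

# Export a GGUF file for llama.cpp:
model.save_pretrained_gguf("gguf_model", tokenizer, quantization_method = "q4_k_m")
```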

💡 Reinforcement Learning

RL methods including GRPO, GSPO, FP8 training, DrGRPO, DAPO, PPO, reward modelling and Online DPO all work with Unsloth.

Read our Reinforcement Learning Guide or our advanced RL docs for batching, generation & training parameters.
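
For a feel of the API, here is a minimal hedged sketch built on TRL's `GRPOTrainer`; the model name, dataset, LoRA settings and toy reward function are illustrative assumptions, not Unsloth's prescribed recipe:

```python
# Hedged GRPO sketch: Unsloth model + TRL's GRPOTrainer. All names and
# hyperparameters here are placeholders for illustration only.
from unsloth import FastLanguageModel
from trl import GRPOConfig, GRPOTrainer
from datasets import load_dataset

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Qwen3-4B-Base",  # placeholder model
    max_seq_length = 1024,
    load_in_4bit = True,
)
model = FastLanguageModel.get_peft_model(model, r = 16, lora_alpha = 16)

def reward_short_answers(completions, **kwargs):
    # Toy reward: prefer concise completions (replace with a real reward).
    return [-len(c) / 100.0 for c in completions]

trainer = GRPOTrainer(
    model = model,
    reward_funcs = reward_short_answers,
    args = GRPOConfig(output_dir = "grpo_outputs", max_steps = 50),
    train_dataset = load_dataset("trl-lib/tldr", split = "train"),  # has a "prompt" column
)
trainer.train()
```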

List of RL notebooks:

  • gpt-oss GRPO notebook: Link
  • FP8 Qwen3-8B GRPO notebook (L4): Link
  • Qwen3-VL GSPO notebook: Link
  • Advanced Qwen3 GRPO notebook: Link
  • ORPO notebook: Link
  • DPO Zephyr notebook: Link
  • KTO notebook: Link
  • SimPO notebook: Link

🥇 Performance Benchmarking

We tested using the Alpaca Dataset, a batch size of 2, gradient accumulation steps of 4, rank = 32, and applied QLoRA on all linear layers (q, k, v, o, gate, up, down); a hedged config sketch follows the table:

| Model | VRAM | 🦥 Unsloth speed | 🦥 VRAM reduction | 🦥 Longer context | 😊 Hugging Face + FA2 |
|---|---|---|---|---|---|
| Llama 3.3 (70B) | 80GB | 2x | >75% | 13x longer | 1x |
| Llama 3.1 (8B) | 80GB | 2x | >70% | 12x longer | 1x |
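
A hedged sketch of that configuration (the model name and sequence length are assumptions; the actual benchmark scripts are not reproduced here):

```python
# Hedged sketch of the benchmark setup described above: Alpaca-style
# QLoRA at rank 32 on all linear layers, batch size 2, grad accum 4.
from unsloth import FastLanguageModel
from trl import SFTConfig

model, tokenizer = FastLanguageModel.from_pretrained(
    "unsloth/Meta-Llama-3.1-8B", load_in_4bit = True, max_seq_length = 2048
)
model = FastLanguageModel.get_peft_model(
    model, r = 32,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
)
args = SFTConfig(per_device_train_batch_size = 2,
                 gradient_accumulation_steps = 4,
                 output_dir = "bench_outputs")
```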

Context length benchmarks

Llama 3.1 (8B) max. context length

We tested Llama 3.1 (8B) Instruct and did 4bit QLoRA on all linear layers (Q, K, V, O, gate, up and down) with rank = 32 with a batch size of 1. We padded all sequences to a certain maximum sequence length to mimic long context finetuning workloads.

| GPU VRAM | 🦥 Unsloth context length | Hugging Face + FA2 |
|---|---|---|
| 8 GB | 2,972 | OOM |
| 12 GB | 21,848 | 932 |
| 16 GB | 40,724 | 2,551 |
| 24 GB | 78,475 | 5,789 |
| 40 GB | 153,977 | 12,264 |
| 48 GB | 191,728 | 15,502 |
| 80 GB | 342,733 | 28,454 |

Llama 3.3 (70B) max. context length

We tested Llama 3.3 (70B) Instruct on an 80GB A100 and did 4bit QLoRA on all linear layers (Q, K, V, O, gate, up and down) with rank = 32 and a batch size of 1. We padded all sequences to a certain maximum sequence length to mimic long context finetuning workloads.

| GPU VRAM | 🦥 Unsloth context length | Hugging Face + FA2 |
|---|---|---|
| 48 GB | 12,106 | OOM |
| 80 GB | 89,389 | 6,916 |


Citation

You can cite the Unsloth repo as follows:

```bibtex
@software{unsloth,
  author = {Daniel Han, Michael Han and Unsloth team},
  title = {Unsloth},
  url = {https://github.com/unslothai/unsloth},
  year = {2023}
}
```

Thank You to

  • The llama.cpp library that lets users save models with Unsloth
  • The Hugging Face team and their libraries: transformers and TRL
  • The PyTorch and Torch AO teams for their contributions
  • And of course for every single person who has contributed or has used Unsloth!
