huggingface/optimum-neuron

Easy, fast and very cheap training and inference on AWS Trainium and Inferentia chips.
🤗 Optimum Neuron is the interface between the 🤗 Transformers library and AWS accelerators, including AWS Trainium and AWS Inferentia.

Key Features:
- 🔄 Drop-in replacement for standard Transformers training and inference
- ⚡ Distributed training support with minimal code changes
- 🎯 Optimized models for AWS accelerators
- 📈 Production-ready inference with compiled models
To install the latest release of this package:

- For AWS Trainium (trn1) or AWS Inferentia2 (inf2):

```bash
pip install --upgrade-strategy eager optimum-neuron[neuronx]
```

- For AWS Inferentia (inf1):

```bash
pip install --upgrade-strategy eager optimum-neuron[neuron]
```
Optimum Neuron is a fast-moving project, and you may want to install it from source:

```bash
pip install git+https://github.com/huggingface/optimum-neuron.git
```
Make sure that you have installed the Neuron driver and tools before installing `optimum-neuron`; a more extensive guide is available here.
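To check that the installation worked, a quick smoke test can help (a minimal sketch; it assumes `optimum.neuron` re-exports `__version__`, which may vary across releases):

```python
# Smoke test: confirm optimum-neuron is importable and print its version.
import optimum.neuron

print(optimum.neuron.__version__)
```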
Optimum Neuron makes AWS accelerator adoption seamless for Transformers users.
Training on AWS Trainium requires minimal changes to your existing code:
```python
import torch
import torch_xla.runtime as xr
from datasets import load_dataset
from transformers import AutoTokenizer

# Optimum Neuron's drop-in replacements for standard training components
from optimum.neuron import NeuronSFTConfig, NeuronSFTTrainer, NeuronTrainingArguments
from optimum.neuron.models.training import NeuronModelForCausalLM


def format_dolly_dataset(example):
    """Format Dolly dataset into instruction-following format."""
    instruction = f"### Instruction\n{example['instruction']}"
    context = f"### Context\n{example['context']}" if example["context"] else None
    response = f"### Answer\n{example['response']}"

    # Combine all parts with double newlines
    parts = [instruction, context, response]
    return "\n\n".join(part for part in parts if part)


def main():
    # 📊 Load instruction-following dataset
    dataset = load_dataset("databricks/databricks-dolly-15k", split="train")

    # 🔧 Model configuration
    model_id = "Qwen/Qwen3-1.7B"
    output_dir = "qwen3-1.7b-finetuned"

    # 🔤 Setup tokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    tokenizer.pad_token = tokenizer.eos_token

    # ⚙️ Configure training for Trainium
    training_args = NeuronTrainingArguments(
        learning_rate=1e-4,
        tensor_parallel_size=8,  # Split model across 8 accelerators
        per_device_train_batch_size=1,  # Batch size per device
        gradient_accumulation_steps=8,
        logging_steps=1,
        output_dir=output_dir,
    )

    # 🧠 Load model optimized for Trainium
    model = NeuronModelForCausalLM.from_pretrained(
        model_id,
        training_args.trn_config,
        torch_dtype=torch.bfloat16,
        use_flash_attention_2=True,  # Enable fast attention
    )

    # 📝 Setup supervised fine-tuning
    sft_config = NeuronSFTConfig(
        max_seq_length=2048,
        packing=True,  # Pack multiple samples for efficiency
        **training_args.to_dict(),
    )

    # 🚀 Initialize trainer and start training
    trainer = NeuronSFTTrainer(
        model=model,
        args=sft_config,
        tokenizer=tokenizer,
        train_dataset=dataset,
        formatting_func=format_dolly_dataset,
    )
    trainer.train()

    # 🤗 Share your model with the community
    trainer.push_to_hub(
        commit_message="Fine-tuned on Databricks Dolly dataset",
        blocking=True,
        model_name=output_dir,
    )

    if xr.local_ordinal() == 0:
        print(f"✅ Training complete! Model saved to {output_dir}")


if __name__ == "__main__":
    main()
```
This example demonstrates supervised fine-tuning on the Databricks Dolly dataset using `NeuronSFTTrainer` and `NeuronModelForCausalLM`, the Trainium-optimized versions of standard Transformers components.
Compilation (optional for the first run; pre-compiling populates the Neuron compiler cache so the training run itself does not pay the graph-compilation cost):

```bash
NEURON_CC_FLAGS="--model-type transformer" neuron_parallel_compile torchrun --nproc_per_node 32 sft_finetune_qwen3.py
```

Training (`--nproc_per_node 32` matches the 32 Neuron cores of a trn1.32xlarge instance):

```bash
NEURON_CC_FLAGS="--model-type transformer" torchrun --nproc_per_node 32 sft_finetune_qwen3.py
```
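Once training finishes, the checkpoint can be compiled for generation with the inference-side `NeuronModelForCausalLM`. The sketch below is hedged: it assumes the saved checkpoint has been consolidated into a standard Transformers layout (the trainer shards it across tensor-parallel ranks during training), and the export parameters follow the text-generation documentation.

```python
from transformers import AutoTokenizer

# Note: this is the inference class from optimum.neuron, not the training
# class from optimum.neuron.models.training used above.
from optimum.neuron import NeuronModelForCausalLM

# Compile the fine-tuned checkpoint for generation (shapes are assumptions).
model = NeuronModelForCausalLM.from_pretrained(
    "qwen3-1.7b-finetuned",  # hypothetical local path from the training run
    export=True,
    batch_size=1,
    sequence_length=2048,
    auto_cast_type="bf16",
)
tokenizer = AutoTokenizer.from_pretrained("qwen3-1.7b-finetuned")

inputs = tokenizer("### Instruction\nTell me a joke.\n\n", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])
```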
You can compile and export your 🤗 Transformers models to a serialized format before inference on Neuron devices:
```bash
optimum-cli export neuron \
  --model distilbert-base-uncased-finetuned-sst-2-english \
  --batch_size 1 \
  --sequence_length 32 \
  --auto_cast matmul \
  --auto_cast_type bf16 \
  distilbert_base_uncased_finetuned_sst2_english_neuron/
```
The command above will export `distilbert-base-uncased-finetuned-sst-2-english` with static shapes `batch_size=1` and `sequence_length=32`, and cast all `matmul` operations from FP32 to BF16. Check out the exporter guide for more compilation options.
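The same export can also be done from Python by passing `export=True` to `from_pretrained`, with the static shapes and compiler arguments as keyword arguments. This is a sketch mirroring the CLI flags above; the argument names follow the exporter documentation and may differ between releases.

```python
from optimum.neuron import NeuronModelForSequenceClassification

# Compile with the same static shapes and casting as the CLI command above.
model = NeuronModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english",
    export=True,
    batch_size=1,
    sequence_length=32,
    auto_cast="matmul",
    auto_cast_type="bf16",
)

# Save the compiled artifacts for later reuse.
model.save_pretrained("distilbert_base_uncased_finetuned_sst2_english_neuron/")
```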
Then you can run the exported Neuron model on Neuron devices with the `NeuronModelForXXX` classes, which are similar to the `AutoModelForXXX` classes in 🤗 Transformers:
```diff
from transformers import AutoTokenizer
-from transformers import AutoModelForSequenceClassification
+from optimum.neuron import NeuronModelForSequenceClassification

# PyTorch checkpoint
-model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")
+model = NeuronModelForSequenceClassification.from_pretrained("distilbert_base_uncased_finetuned_sst2_english_neuron")

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")
inputs = tokenizer("Hamilton is considered to be the best musical of past years.", return_tensors="pt")

logits = model(**inputs).logits
print(model.config.id2label[logits.argmax().item()])  # 'POSITIVE'
```
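For a higher-level entry point, `optimum.neuron` also exposes a Transformers-style `pipeline` factory. A sketch, assuming it accepts a pre-loaded Neuron model and tokenizer like its Transformers counterpart:

```python
from transformers import AutoTokenizer
from optimum.neuron import NeuronModelForSequenceClassification, pipeline

model = NeuronModelForSequenceClassification.from_pretrained(
    "distilbert_base_uncased_finetuned_sst2_english_neuron"
)
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")

# Runs the familiar text-classification pipeline on the compiled model.
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
print(classifier("Hamilton is considered to be the best musical of past years."))
```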
Check out the documentation of Optimum Neuron for more advanced usage. If you run into any issue while using the library, please open an issue or a pull request.