Optimum Inference with ONNX Runtime
Optimum is a utility package for building and running inference with accelerated runtimes like ONNX Runtime. Optimum can be used to load optimized models from the Hugging Face Hub and create pipelines to run accelerated inference without rewriting your APIs.
Loading
Transformers models
Once your model has been exported to the ONNX format, you can load it by replacing `AutoModelForXxx` with the corresponding `ORTModelForXxx` class.
```diff
  from transformers import AutoTokenizer, pipeline
- from transformers import AutoModelForCausalLM
+ from optimum.onnxruntime import ORTModelForCausalLM

- model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")  # PyTorch checkpoint
+ model = ORTModelForCausalLM.from_pretrained("onnx-community/Llama-3.2-1B", subfolder="onnx")  # ONNX checkpoint
  tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")

  pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
  result = pipe("He never went out without a book under his arm")
```
More information on all the supported `ORTModelForXxx` classes can be found in our documentation.
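Because `ORTModelForCausalLM` implements the same generation API as its `transformers` counterpart, you can also skip the pipeline and call `generate()` directly. A minimal sketch reusing the checkpoints from the example above (the generation parameters are illustrative, not prescribed):

```python
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForCausalLM

# Same ONNX checkpoint as above; the tokenizer still comes from the original repo
model = ORTModelForCausalLM.from_pretrained("onnx-community/Llama-3.2-1B", subfolder="onnx")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")

inputs = tokenizer("He never went out without a book under his arm", return_tensors="pt")
# generate() works the same way as on a regular transformers model
generated = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```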
Diffusers models
Once your model has been exported to the ONNX format, you can load it by replacing `DiffusionPipeline` with the corresponding `ORTDiffusionPipeline` class.
```diff
- from diffusers import DiffusionPipeline
+ from optimum.onnxruntime import ORTDiffusionPipeline

  model_id = "runwayml/stable-diffusion-v1-5"
- pipeline = DiffusionPipeline.from_pretrained(model_id)
+ pipeline = ORTDiffusionPipeline.from_pretrained(model_id, revision="onnx")
  prompt = "sailing ship in storm by Leonardo da Vinci"
  image = pipeline(prompt).images[0]
```
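The returned `image` is a standard PIL image, so saving it works as usual. If you have `onnxruntime-gpu` installed, you can also request the CUDA execution provider at load time; a minimal sketch under that assumption, reusing the same checkpoint:

```python
from optimum.onnxruntime import ORTDiffusionPipeline

# Assumes onnxruntime-gpu is installed; loading fails otherwise
pipeline = ORTDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    revision="onnx",
    provider="CUDAExecutionProvider",
)
image = pipeline("sailing ship in storm by Leonardo da Vinci").images[0]
image.save("ship.png")  # the output is a regular PIL.Image
```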
Sentence Transformers models
Once your model has been exported to the ONNX format, you can load it by replacing `AutoModel` with the corresponding `ORTModelForFeatureExtraction` class.
```diff
  from transformers import AutoTokenizer
- from transformers import AutoModel
+ from optimum.onnxruntime import ORTModelForFeatureExtraction

  tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
- model = AutoModel.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
+ model = ORTModelForFeatureExtraction.from_pretrained("optimum/all-MiniLM-L6-v2")

  inputs = tokenizer("This is an example sentence", return_tensors="pt")
  outputs = model(**inputs)
```
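Note that `ORTModelForFeatureExtraction` returns token-level hidden states, so producing a single sentence embedding still requires pooling, exactly as with the plain `transformers` model. A minimal sketch using mean pooling over the attention mask (the pooling strategy is an assumption; check the model card for what the checkpoint was trained with):

```python
import torch
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForFeatureExtraction

tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
model = ORTModelForFeatureExtraction.from_pretrained("optimum/all-MiniLM-L6-v2")

inputs = tokenizer("This is an example sentence", return_tensors="pt")
outputs = model(**inputs)

# Mean-pool the token embeddings, ignoring padding via the attention mask
mask = inputs["attention_mask"].unsqueeze(-1).float()
embedding = (outputs.last_hidden_state * mask).sum(1) / mask.sum(1)
```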
You can also load your ONNX model directly using the `sentence_transformers.SentenceTransformer` class; just make sure you have `sentence-transformers>=3.2` installed. If the model hasn't already been converted to ONNX, it will be converted automatically on-the-fly.
```diff
  from sentence_transformers import SentenceTransformer

  model_id = "sentence-transformers/all-MiniLM-L6-v2"
- model = SentenceTransformer(model_id)
+ model = SentenceTransformer(model_id, backend="onnx")

  sentences = ["This is an example sentence", "Each sentence is converted"]
  embeddings = model.encode(sentences)
```
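The embeddings behave exactly as with the default PyTorch backend; for instance, the `similarity()` helper available in `sentence-transformers>=3.0` works unchanged:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2", backend="onnx")
sentences = ["This is an example sentence", "Each sentence is converted"]
embeddings = model.encode(sentences)

# Cosine similarity between every pair of sentences
similarities = model.similarity(embeddings, embeddings)
print(similarities)  # 2x2 matrix with ones on the diagonal
```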
Timm models
Once your model has been exported to the ONNX format, you can load it by replacing the `create_model` function with the corresponding `ORTModelForImageClassification` class.
```diff
  import requests
  from PIL import Image
- from timm import create_model
  from timm.data import resolve_data_config, create_transform
+ from optimum.onnxruntime import ORTModelForImageClassification

- model = create_model("timm/mobilenetv3_large_100.ra_in1k", pretrained=True)
+ model = ORTModelForImageClassification.from_pretrained("optimum/mobilenetv3_large_100.ra_in1k")
  transform = create_transform(**resolve_data_config(model.config.pretrained_cfg, model=model))

  url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png"
  image = Image.open(requests.get(url, stream=True).raw)
  inputs = transform(image).unsqueeze(0)
  outputs = model(inputs)
```
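The ORT model returns logits just like the original timm model, so post-processing is unchanged. Continuing from the snippet above, a small sketch that turns `outputs.logits` into top-5 class probabilities (mapping indices to label names is omitted, since it depends on the checkpoint's config):

```python
import torch

# Continuing from above: outputs.logits has shape (1, num_classes)
probabilities = torch.softmax(outputs.logits, dim=-1)
top5_prob, top5_idx = torch.topk(probabilities, k=5)
print(top5_idx[0].tolist(), top5_prob[0].tolist())
```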
Converting your model to ONNX on-the-fly
In case your model wasn't already converted to ONNX, `ORTModel` includes a method to convert it on-the-fly. Simply pass `export=True` to the `from_pretrained()` method, and your model will be loaded and converted to ONNX automatically:
```python
>>> from optimum.onnxruntime import ORTModelForSequenceClassification

>>> # Load the model from the hub and export it to the ONNX format
>>> model_id = "distilbert-base-uncased-finetuned-sst-2-english"
>>> model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
```
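The exported model is a drop-in replacement, so it plugs straight into a `transformers` pipeline, mirroring the first example in this guide:

```python
>>> from transformers import AutoTokenizer, pipeline

>>> # Reuse the exported model from above in a regular transformers pipeline
>>> tokenizer = AutoTokenizer.from_pretrained(model_id)
>>> pipe = pipeline("text-classification", model=model, tokenizer=tokenizer)
>>> result = pipe("He never went out without a book under his arm")
```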
Pushing your model to the Hub
You can also call `push_to_hub` directly on your model to upload it to the Hub.
```python
>>> from optimum.onnxruntime import ORTModelForSequenceClassification

>>> # Load the model from the hub and export it to the ONNX format
>>> model_id = "distilbert-base-uncased-finetuned-sst-2-english"
>>> model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)

>>> # Save the converted model locally
>>> output_dir = "a_local_path_for_convert_onnx_model"
>>> model.save_pretrained(output_dir)

>>> # Push the onnx model to HF Hub
>>> model.push_to_hub(output_dir, repository_id="my-onnx-repo")
```
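Once pushed, the ONNX weights can be loaded back from the Hub like any other checkpoint. A minimal sketch; the repository id below assumes the illustrative `my-onnx-repo` name used above, prefixed with your username:

```python
>>> from optimum.onnxruntime import ORTModelForSequenceClassification

>>> # "your-username/my-onnx-repo" is the hypothetical repo pushed above
>>> model = ORTModelForSequenceClassification.from_pretrained("your-username/my-onnx-repo")
```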