🤗 Optimum ONNX: Export your model to ONNX and run inference with ONNX Runtime
Before you begin, make sure you install all necessary libraries by running:
```bash
pip install "optimum-onnx[onnxruntime]"@git+https://github.com/huggingface/optimum-onnx.git
```
If you want to use the GPU version of ONNX Runtime, make sure the CUDA and cuDNN requirements are satisfied, and install the additional dependencies by running:
```bash
pip install "optimum-onnx[onnxruntime-gpu]"@git+https://github.com/huggingface/optimum-onnx.git
```
To avoid conflicts between `onnxruntime` and `onnxruntime-gpu`, make sure the package `onnxruntime` is not installed by running `pip uninstall onnxruntime` prior to installing Optimum.
It is possible to export 🤗 Transformers, Diffusers, Timm and Sentence Transformers models to the ONNX format and easily perform graph optimization as well as quantization:
```bash
optimum-cli export onnx --model meta-llama/Llama-3.2-1B onnx_llama/
```
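The same export can also be done from Python. Here is a minimal sketch, assuming the `main_export` helper from `optimum.exporters.onnx`; the output directory and the explicit task are illustrative choices:

```python
# Minimal programmatic equivalent of the CLI export above (sketch).
# Assumes `main_export` is exposed by `optimum.exporters.onnx`.
from optimum.exporters.onnx import main_export

main_export(
    "meta-llama/Llama-3.2-1B",  # Hub model id or local path
    output="onnx_llama/",       # where the ONNX graph and configs are written
    task="text-generation",     # can be left as "auto" to let Optimum infer it
)
```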
The model can also be optimized and quantized with `onnxruntime`. For more information on the ONNX export, please check the documentation.
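For example, dynamic quantization can be applied right after the export. The following is a minimal sketch, assuming the `ORTQuantizer` and `AutoQuantizationConfig` APIs from 🤗 Optimum are available in this package; the model id and save directory are illustrative:

```python
# Sketch: export a checkpoint to ONNX on the fly, then apply dynamic INT8 quantization.
# Assumes ORTQuantizer / AutoQuantizationConfig are provided by optimum.onnxruntime.
from optimum.onnxruntime import ORTModelForSequenceClassification, ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig

model = ORTModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english", export=True  # export to ONNX on load
)
quantizer = ORTQuantizer.from_pretrained(model)
qconfig = AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=False)  # dynamic quantization
quantizer.quantize(save_dir="distilbert_quantized_onnx/", quantization_config=qconfig)
```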
Once the model is exported to the ONNX format, we provide Python classes enabling you to run the exported ONNX model in a seamless manner, using ONNX Runtime as the backend:
```diff
  from transformers import AutoTokenizer, pipeline
- from transformers import AutoModelForCausalLM
+ from optimum.onnxruntime import ORTModelForCausalLM

- model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B") # PyTorch checkpoint
+ model = ORTModelForCausalLM.from_pretrained("onnx-community/Llama-3.2-1B", subfolder="onnx") # ONNX checkpoint

  tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")
  pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
  result = pipe("He never went out without a book under his arm")
```
More details on how to run ONNX models with the `ORTModelForXXX` classes can be found in the documentation.
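If the GPU build of ONNX Runtime is installed, the execution provider can be selected when loading the model. Here is a minimal sketch, assuming the `provider` argument of the `ORTModelForXXX` classes; the checkpoint is the one used above:

```python
# Sketch: load the exported model on GPU via the CUDA execution provider.
# Requires onnxruntime-gpu plus matching CUDA/cuDNN versions.
from optimum.onnxruntime import ORTModelForCausalLM

model = ORTModelForCausalLM.from_pretrained(
    "onnx-community/Llama-3.2-1B",
    subfolder="onnx",
    provider="CUDAExecutionProvider",
)
```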