
Export your Hugging Face models to ONNX

Documentation | ONNX | Hub

Installation

Before you begin, make sure you install all necessary libraries by running:

pip install"optimum-onnx[onnxruntime]"@git+https://github.com/huggingface/optimum-onnx.git

If you want to use the GPU version of ONNX Runtime, make sure the CUDA and cuDNN requirements are satisfied, and install the additional dependencies by running:

pip install"optimum-onnx[onnxruntime-gpu]"@git+https://github.com/huggingface/optimum-onnx.git

To avoid conflicts between onnxruntime and onnxruntime-gpu, make sure the package onnxruntime is not installed by running pip uninstall onnxruntime prior to installing Optimum.
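
If you are unsure which build ended up installed, a quick sanity check (not part of the original instructions) is to list the execution providers your ONNX Runtime installation exposes:

import onnxruntime

# With onnxruntime-gpu and a working CUDA/cuDNN setup, "CUDAExecutionProvider"
# should appear alongside "CPUExecutionProvider".
print(onnxruntime.get_available_providers())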

ONNX export

It is possible to export 🤗 Transformers, Diffusers, Timm and Sentence Transformers models to the ONNX format and perform graph optimization as well as quantization easily:

optimum-cli export onnx --model meta-llama/Llama-3.2-1B onnx_llama/

The model can also be optimized and quantized with onnxruntime.
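
As a rough sketch of what post-export quantization can look like, assuming the ORTQuantizer and AutoQuantizationConfig classes from optimum.onnxruntime and the onnx_llama/ folder produced by the command above (class locations and options may differ across versions):

from optimum.onnxruntime import ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig

# Load the exported ONNX model; if the folder contains several .onnx files,
# pass file_name="..." to pick one explicitly.
quantizer = ORTQuantizer.from_pretrained("onnx_llama/")

# Dynamic int8 quantization targeting AVX512-VNNI CPUs.
qconfig = AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=False)

quantizer.quantize(save_dir="onnx_llama_quantized/", quantization_config=qconfig)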

For more information on the ONNX export, please check the documentation.

Inference

Once the model is exported to the ONNX format, we provide Python classes enabling you to run the exported ONNX model in a seamless manner using ONNX Runtime in the backend:

  from transformers import AutoTokenizer, pipeline
- from transformers import AutoModelForCausalLM
+ from optimum.onnxruntime import ORTModelForCausalLM

- model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")  # PyTorch checkpoint
+ model = ORTModelForCausalLM.from_pretrained("onnx-community/Llama-3.2-1B", subfolder="onnx")  # ONNX checkpoint

  tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")
  pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
  result = pipe("He never went out without a book under his arm")
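
Since ORTModelForCausalLM implements the standard generate() interface, you can also decode without the pipeline helper. A minimal sketch reusing the same checkpoints (the max_new_tokens value is arbitrary):

from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForCausalLM

# Same ONNX checkpoint as above; inference runs through ONNX Runtime.
model = ORTModelForCausalLM.from_pretrained("onnx-community/Llama-3.2-1B", subfolder="onnx")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")

inputs = tokenizer("He never went out without a book under his arm", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)  # standard Transformers decoding options apply
print(tokenizer.decode(outputs[0], skip_special_tokens=True))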

More details on how to run ONNX models with ORTModelForXXX classes here.
