🤗 Optimum ONNX: Export your model to ONNX and run inference with ONNX Runtime
Before you begin, make sure you install all necessary libraries by running:
```bash
pip install "optimum-onnx[onnxruntime]"@git+https://github.com/huggingface/optimum-onnx.git
```
If you want to use the GPU version of ONNX Runtime, make sure the CUDA and cuDNN requirements are satisfied, and install the additional dependencies by running:
```bash
pip install "optimum-onnx[onnxruntime-gpu]"@git+https://github.com/huggingface/optimum-onnx.git
```
To avoid conflicts between `onnxruntime` and `onnxruntime-gpu`, make sure the package `onnxruntime` is not installed by running `pip uninstall onnxruntime` prior to installing Optimum.
It is possible to export 🤗 Transformers, Diffusers, Timm and Sentence Transformers models to the ONNX format and easily perform graph optimization as well as quantization:
```bash
optimum-cli export onnx --model meta-llama/Llama-3.2-1B onnx_llama/
```
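The same export can also be done from Python. Here is a minimal sketch, assuming the `main_export` helper from `optimum.exporters.onnx`; the output directory and the explicit task are illustrative choices:

```python
# Minimal programmatic equivalent of the CLI export above (sketch).
# Assumes `main_export` is exposed by `optimum.exporters.onnx`.
from optimum.exporters.onnx import main_export

main_export(
    "meta-llama/Llama-3.2-1B",  # Hub model id or local path
    output="onnx_llama/",       # where the ONNX graph and configs are written
    task="text-generation",     # can be left as "auto" to let Optimum infer it
)
```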
The model can also be optimized and quantized with `onnxruntime`. For more information on the ONNX export, please check the documentation.
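For example, dynamic quantization can be applied right after the export. The following is a minimal sketch, assuming the `ORTQuantizer` and `AutoQuantizationConfig` APIs from 🤗 Optimum are available in this package; the model id and save directory are illustrative:

```python
# Sketch: export a checkpoint to ONNX on the fly, then apply dynamic INT8 quantization.
# Assumes ORTQuantizer / AutoQuantizationConfig are provided by optimum.onnxruntime.
from optimum.onnxruntime import ORTModelForSequenceClassification, ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig

model = ORTModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english", export=True  # export to ONNX on load
)
quantizer = ORTQuantizer.from_pretrained(model)
qconfig = AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=False)  # dynamic quantization
quantizer.quantize(save_dir="distilbert_quantized_onnx/", quantization_config=qconfig)
```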
Once the model is exported to the ONNX format, we provide Python classes enabling you to run the exported ONNX model in a seamless manner, using ONNX Runtime as the backend:
```diff
  from transformers import AutoTokenizer, pipeline
- from transformers import AutoModelForCausalLM
+ from optimum.onnxruntime import ORTModelForCausalLM

- model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B") # PyTorch checkpoint
+ model = ORTModelForCausalLM.from_pretrained("onnx-community/Llama-3.2-1B", subfolder="onnx") # ONNX checkpoint

  tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")
  pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
  result = pipe("He never went out without a book under his arm")
```
More details on how to run ONNX models with the `ORTModelForXXX` classes can be found in the documentation.
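If the GPU build of ONNX Runtime is installed, the execution provider can be selected when loading the model. Here is a minimal sketch, assuming the `provider` argument of the `ORTModelForXXX` classes; the checkpoint is the one used above:

```python
# Sketch: load the exported model on GPU via the CUDA execution provider.
# Requires onnxruntime-gpu plus matching CUDA/cuDNN versions.
from optimum.onnxruntime import ORTModelForCausalLM

model = ORTModelForCausalLM.from_pretrained(
    "onnx-community/Llama-3.2-1B",
    subfolder="onnx",
    provider="CUDAExecutionProvider",
)
```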