Update NVMO quantization pass - make quantization settings configurable and add RTN support #1985


Open

vishalpandya1990 wants to merge 3 commits into microsoft:main from vishalpandya1990:nv_ep_support_and_generalize

Conversation

@vishalpandya1990 commented Jul 18, 2025 (edited)

Describe your changes

Update the NVMO quantization pass for the following:

  1. Make some important quantization settings configurable, e.g. calib_size, dataset, the add_position_ids input, etc. Remove hard-coding of such params.
  2. Add RTN to the supported algorithms.
  3. Make calibration_providers configurable (added a new config field): the user can provide calibration_providers explicitly in the config JSON; otherwise, the available EPs will be used for calibration. (An illustrative sketch follows this list.)
  4. With configurable position_ids inputs in the calibration data and with support for available/specified calibration_providers, the NvTensorRtRtx EP should also be supported.
  5. Rename some of the config fields, like calibration_method.
  6. Documentation update:
  • Moved "More Inference Examples" outside of the TensorRT Model-Optimizer section, as it was before. It looks like it may have been inadvertently clubbed under TensorRT Model-Optimizer in an earlier commit 16ffab8.
  • Added a section on using different execution providers with the example config.
  7. Cleanup: removed some packages from the requirements files that should get installed with nvidia-modelopt[onnx].
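
For illustration, here is a minimal sketch of what such a pass entry might look like. The pass type name and field names below (algorithm, calib_size, add_position_ids, calibration_providers) are inferred from the description above; the exact schema is defined by this PR, so treat them as assumptions rather than the final interface.

```python
import json

# Hypothetical Olive pass entry showing the newly configurable settings.
# Field names are assumptions inferred from the PR description, not the
# authoritative schema.
nvmo_pass = {
    "type": "NVModelOptQuantization",
    "algorithm": "RTN",            # RTN is newly added to the supported algorithms
    "calib_size": 32,              # previously hard-coded calibration size
    "add_position_ids": True,      # include position_ids in the calibration inputs
    "calibration_providers": [     # new field; falls back to the available EPs if omitted
        # EP name as referenced in this PR; check ORT for the exact identifier.
        "NvTensorRtRtxExecutionProvider",
    ],
}

# A workflow config JSON would embed this under its passes section.
print(json.dumps({"passes": {"quantize": nvmo_pass}}, indent=2))
```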

Checklist before requesting a review

  • Add unit tests for this change.
  • Make sure all tests can pass.
  • Update documents if necessary.
  • Lint and apply fixes to your code by running lintrunner -a
  • Is this a user-facing change? If yes, give a description of this change to be included in the release notes.
  • Is this PR including examples changes? If yes, please remember to update example documentation in a follow-up PR.

(Optional) Issue link

@vishalpandya1990 (Author) commented:

@microsoft-github-policy-service agree company="NVIDIA"

@vishalpandya1990 please read the following Contributor License Agreement (CLA). If you agree with the CLA, please reply with the following information.

@microsoft-github-policy-service agree [company="{your company}"]

Options:

  • (default - no company specified) I have sole ownership of intellectual property rights to my Submissions and I am not making Submissions in the course of work for my employer.
@microsoft-github-policy-service agree
  • (when company given) I am making Submissions in the course of work for my employer (or my employer has intellectual property rights in my Submissions by contract or applicable law). I have permission from my employer to make Submissions and enter into this Agreement on behalf of my employer. By signing below, the defined term “You” includes me and my employer.
@microsoft-github-policy-service agree company="Microsoft"


@vishalpandya1990 (Author) commented:

CC @jambayk

    logger.debug("No tokenizer directory specified. Skipping calibration input preparation.")
    logger.warning("Not providing calibration data for quantization.")

    logger.info("===== Quantization Settings =====")
@jambayk (Contributor) commented on the context above:

nit: we use info-level logs for workflow-level logs, so debug would be preferable here
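
A sketch of the change the reviewer appears to be suggesting, assuming the comment targets the logger.info line at the end of the quoted context:

```python
import logging

logger = logging.getLogger(__name__)

# Before: the settings dump was emitted at info level.
logger.info("===== Quantization Settings =====")

# After: info is reserved for workflow-level logs, so drop this to debug.
logger.debug("===== Quantization Settings =====")
```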


    - [Web chat APP with Phi-3 and ONNX Runtime Web](https://github.com/microsoft/onnxruntime-inference-examples/tree/gs/chat/js/chat)
    The example `phi3_nvmo_ptq.json` demonstrates model building and quantization with DirectML execution-provider (EP). In order to use any other EP for the passes:
    - Use corresponding onnxruntime-genai and onnxruntime packages, along with suitable setup of thier dependencies/requirements as needed. Refer documentation for [execution-providers](https://onnxruntime.ai/docs/execution-providers/).
@jambayk (Contributor) commented on the context above:

spellcheck found a typo here with thier

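As a companion to the documentation change quoted above, here is a minimal sketch of pinning an ONNX Runtime session to a specific EP. The model path and the CUDA EP are placeholders; substitute the EP that matches the onnxruntime package variant you installed.

```python
import onnxruntime as ort

# EPs available in the installed onnxruntime package; the list depends on
# which package variant (e.g. onnxruntime-gpu) is installed.
print(ort.get_available_providers())

# Create a session pinned to a specific EP, with CPU as the fallback.
session = ort.InferenceSession(
    "model.onnx",  # placeholder path
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
```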
Reviewers

@jambayk left review comments

At least 1 approving review is required to merge this pull request.

Assignees
No one assigned
Labels
None yet
Projects
None yet
Milestone
No milestone

2 participants
@vishalpandya1990, @jambayk
