Update NVMO quantization pass - make quantization settings configurable and add RTN support #1985


Open

vishalpandya1990 wants to merge 3 commits into microsoft:main from vishalpandya1990:nv_ep_support_and_generalize

Conversation

@vishalpandya1990 commented Jul 18, 2025 (edited)

Describe your changes

Update the NVMO quantization pass for the following:

  1. Make some important quantization settings configurable, e.g. calib_size, dataset, the add_position_ids input, etc. Remove hard-coding of such params.
  2. Add RTN to the supported algorithms.
  3. Make calibration_providers configurable (added a new config field): the user can provide calibration_providers explicitly in the config JSON; otherwise, the available EPs will be used for calibration. (An illustrative sketch follows this list.)
  4. With configurable position_ids inputs in the calibration data and with support for available/specified calibration_providers, the NvTensorRtRtx EP should also be supported.
  5. Rename some of the config fields, like calibration_method.
  6. Documentation update:
  • Moved "More Inference Examples" outside of the TensorRT Model-Optimizer section, as it was before. It looks like it may have been inadvertently clubbed under TensorRT Model-Optimizer in an earlier commit 16ffab8.
  • Added a section on using different execution providers with the example config.
  7. Cleanup: removed some packages from the requirements files that should get installed with nvidia-modelopt[onnx].
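
For illustration, here is a minimal sketch of what such a pass entry might look like. The pass type name and field names below (algorithm, calib_size, add_position_ids, calibration_providers) are inferred from the description above; the exact schema is defined by this PR, so treat them as assumptions rather than the final interface.

```python
import json

# Hypothetical Olive pass entry showing the newly configurable settings.
# Field names are assumptions inferred from the PR description, not the
# authoritative schema.
nvmo_pass = {
    "type": "NVModelOptQuantization",
    "algorithm": "RTN",            # RTN is newly added to the supported algorithms
    "calib_size": 32,              # previously hard-coded calibration size
    "add_position_ids": True,      # include position_ids in the calibration inputs
    "calibration_providers": [     # new field; falls back to the available EPs if omitted
        # EP name as referenced in this PR; check ORT for the exact identifier.
        "NvTensorRtRtxExecutionProvider",
    ],
}

# A workflow config JSON would embed this under its passes section.
print(json.dumps({"passes": {"quantize": nvmo_pass}}, indent=2))
```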

Checklist before requesting a review

  • Add unit tests for this change.
  • Make sure all tests can pass.
  • Update documents if necessary.
  • Lint and apply fixes to your code by running lintrunner -a
  • Is this a user-facing change? If yes, give a description of this change to be included in the release notes.
  • Is this PR including examples changes? If yes, please remember to update example documentation in a follow-up PR.

(Optional) Issue link

@vishalpandya1990 (Author) commented:

@microsoft-github-policy-service agree company="NVIDIA"

@vishalpandya1990 please read the following Contributor License Agreement (CLA). If you agree with the CLA, please reply with the following information.

@microsoft-github-policy-service agree [company="{your company}"]

Options:

  • (default - no company specified) I have sole ownership of intellectual property rights to my Submissions and I am not making Submissions in the course of work for my employer.
@microsoft-github-policy-service agree
  • (when company given) I am making Submissions in the course of work for my employer (or my employer has intellectual property rights in my Submissions by contract or applicable law). I have permission from my employer to make Submissions and enter into this Agreement on behalf of my employer. By signing below, the defined term “You” includes me and my employer.
@microsoft-github-policy-service agree company="Microsoft"


@vishalpandya1990 (Author) commented:

CC @jambayk

    logger.debug("No tokenizer directory specified. Skipping calibration input preparation.")
    logger.warning("Not providing calibration data for quantization.")

    logger.info("===== Quantization Settings =====")
@jambayk (Contributor) commented on the context above:

nit: we use info-level logs for workflow-level logs, so debug would be preferable here
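
A sketch of the change the reviewer appears to be suggesting, assuming the comment targets the logger.info line at the end of the quoted context:

```python
import logging

logger = logging.getLogger(__name__)

# Before: the settings dump was emitted at info level.
logger.info("===== Quantization Settings =====")

# After: info is reserved for workflow-level logs, so drop this to debug.
logger.debug("===== Quantization Settings =====")
```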


    - [Web chat APP with Phi-3 and ONNX Runtime Web](https://github.com/microsoft/onnxruntime-inference-examples/tree/gs/chat/js/chat)
    The example `phi3_nvmo_ptq.json` demonstrates model building and quantization with DirectML execution-provider (EP). In order to use any other EP for the passes:
    - Use corresponding onnxruntime-genai and onnxruntime packages, along with suitable setup of thier dependencies/requirements as needed. Refer documentation for [execution-providers](https://onnxruntime.ai/docs/execution-providers/).
@jambayk (Contributor) commented on the context above:

spellcheck found a typo here with thier

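As a companion to the documentation change quoted above, here is a minimal sketch of pinning an ONNX Runtime session to a specific EP. The model path and the CUDA EP are placeholders; substitute the EP that matches the onnxruntime package variant you installed.

```python
import onnxruntime as ort

# EPs available in the installed onnxruntime package; the list depends on
# which package variant (e.g. onnxruntime-gpu) is installed.
print(ort.get_available_providers())

# Create a session pinned to a specific EP, with CPU as the fallback.
session = ort.InferenceSession(
    "model.onnx",  # placeholder path
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
```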
Reviewers

@jambayk left review comments

At least 1 approving review is required to merge this pull request.

Assignees
No one assigned
Labels
None yet
Projects
None yet
Milestone
No milestone

2 participants
@vishalpandya1990, @jambayk
