Update NVMO quantization pass - make quantization settings configurable and add RTN support #1985
Conversation
@microsoft-github-policy-service agree company="NVIDIA"
CC @jambayk
logger.debug("No tokenizer directory specified. Skipping calibration input preparation.")
logger.warning("Not providing calibration data for quantization.")
logger.info("===== Quantization Settings =====")
nit: we use info-level logs for workflow-level logs, so debug would be preferable here
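A minimal sketch of what this suggestion could look like, using only Python's standard `logging` module. The logger name, the helper functions, and the setting keys are hypothetical and are not taken from the PR; the point is only that pass-internal settings logged at debug level stay silent when the logger runs at info level.

```python
import io
import logging

# Hypothetical logger name, used only for this illustration.
logger = logging.getLogger("nvmo_quantization_sketch")


def log_quant_settings(settings: dict) -> None:
    """Log pass-internal settings at debug level instead of info level."""
    logger.debug("===== Quantization Settings =====")
    for key, value in settings.items():
        logger.debug("%s: %s", key, value)


def capture(level: int, settings: dict) -> str:
    """Run log_quant_settings at the given level and return what was emitted."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    logger.addHandler(handler)
    logger.setLevel(level)
    logger.propagate = False  # keep output out of any root handlers
    try:
        log_quant_settings(settings)
    finally:
        logger.removeHandler(handler)
    return stream.getvalue()


if __name__ == "__main__":
    # Hypothetical setting keys and values.
    settings = {"algorithm": "RTN", "precision": "int4"}
    print(repr(capture(logging.INFO, settings)))  # nothing is emitted at info level
    print(capture(logging.DEBUG, settings))       # settings appear at debug level
```

With this split, info-level output stays reserved for workflow-level messages, while the settings dump is still available when debug logging is enabled.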
- [Web chat APP with Phi-3 and ONNX Runtime Web](https://github.com/microsoft/onnxruntime-inference-examples/tree/gs/chat/js/chat)
The example `phi3_nvmo_ptq.json` demonstrates model building and quantization with DirectML execution-provider (EP). In order to use any other EP for the passes:
- Use corresponding onnxruntime-genai and onnxruntime packages, along with suitable setup of thier dependencies/requirements as needed. Refer documentation for [execution-providers](https://onnxruntime.ai/docs/execution-providers/).
spellcheck found a typo here with `thier`
Describe your changes
Update NVMO quantization pass for the following:
Checklist before requesting a review
lintrunner -a
(Optional) Issue link