Commit 5842d73
[OMNIML-2244] enable fp8 and int8 ONNX export (#594)
## What does this PR do?

**Type of change:** Example update

**Overview:**

- Support ONNX export for fp8 and int8 precisions
- Added utility functions to check for fp8 and int8 quantization (will be used in ONNXExporter)
- Fixed a bug in the evaluation API for high batch sizes
- Added a function to replace zeros in scales with the smallest positive value in fp16

## Usage

<!-- You can potentially add a usage example below. -->

```bash
python torch_quant_to_onnx.py \
    --quantize_mode fp8/int8 \
    --onnx_save_path <onnx_path>
```

## Testing

Validated the accuracy and latency of int8 and fp8 models:

| Metric | INT8 | FP8 |
|--------|------|-----|
| Top1 Accuracy | 84.584% | 85.062% |
| Top5 Accuracy | 97.3% | 97.534% |
| Inference Latency | 8.4825 ms | 8.15096 ms |

## Before your PR is "*Ready for review*"

<!-- If you haven't finished some of the above items you can still open `Draft` PR. -->

- **Make sure you read and follow [Contributor guidelines](https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/main/CONTRIBUTING.md)** and your commits are signed.
- **Is this change backward compatible?**: Yes
- **Did you write any new necessary tests?**: No
- **Did you add or update any necessary documentation?**: Yes
- **Did you update [Changelog](https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/main/CHANGELOG.rst)?**: No

---------

Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>

1 parent a5025a2, commit 5842d73
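
The overview mentions replacing zeros in scales with the smallest positive fp16 value. The diff body is not reproduced here, so the following is only a minimal standalone sketch of that idea, not the PR's implementation. The names `replace_zero_scales`, `to_fp16`, and `FP16_SMALLEST_POSITIVE` are hypothetical, and the choice of the subnormal value 2^-24 (rather than the smallest normal value 2^-14) is an assumption. The motivation assumed here is that a quantizer divides by the scale, so a zero scale would produce infinities or NaNs.

```python
import struct

# Smallest positive (subnormal) float16 value, 2**-24 (about 5.96e-08).
# Assumption: the PR may instead use the smallest *normal* fp16 value (2**-14);
# swap the constant if that is what is intended.
FP16_SMALLEST_POSITIVE = 2.0 ** -24


def to_fp16(value: float) -> float:
    """Round a Python float to the nearest float16 and convert it back."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


def replace_zero_scales(scales: list[float]) -> list[float]:
    """Hypothetical helper: return a copy with zero scales replaced."""
    return [FP16_SMALLEST_POSITIVE if s == 0.0 else s for s in scales]


scales = [0.02, 0.0, 0.5]
fixed = replace_zero_scales(scales)

# Every scale is still strictly positive after rounding to fp16,
# so dividing by it is safe.
assert all(to_fp16(s) > 0.0 for s in fixed)
```

A real exporter would apply the same replacement to the scale tensors stored in the model rather than to a Python list.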
File tree

5 files changed, +53 -7 lines changed

- examples/onnx_ptq
- modelopt
  - onnx/quantization
  - torch/_deploy/utils