Commit 7a36ccc (parent 5842d73)
Added support to export BF16 weights and amax for vLLM fakequant QAT (#579)
## What does this PR do?

**Type of change:** New Feature

**Overview:** Support for evaluating vLLM fake-quantized QAT/QAD checkpoints. This PR adds a function to export a checkpoint as BF16 weights and amax values, via `export_hf_checkpoint` for HF and `export_mcore_gpt_to_hf` for MCore, using the `export_bf16_weights_amax` option. The exported weights and amax values can then be used with the [vllm_serve_fakequant.py](https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/096ee13ea62bbb0ce0a4e4128c439651374d6235/examples/vllm_serve/vllm_serve_fakequant.py) script to serve the saved checkpoint.

## Usage

Refer to the [README.md](https://github.com/NVIDIA/TensorRT-Model-Optimizer/tree/096ee13ea62bbb0ce0a4e4128c439651374d6235/examples/vllm_serve#load-qatptq-model-and-serve-in-vllm-wip).

## Testing

- Tested the HF approach by exporting a BF16 model with the QAT script and running the vLLM server; verified that the amax values match.
- Tested the MCore approach by quantizing and exporting a BF16 model with the quantize.sh and export.sh scripts and running the vLLM server; verified that the amax values match.

## Before your PR is "*Ready for review*"

- **Make sure you read and follow the [Contributor guidelines](https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/main/CONTRIBUTING.md)** and your commits are signed.
- **Is this change backward compatible?**: Yes
- **Did you write any new necessary tests?**: No
- **Did you add or update any necessary documentation?**: Yes
- **Did you update the [Changelog](https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/main/CHANGELOG.rst)?**: Yes

## Additional Information

The MCore export script does not currently expose an option to enable this export.

---------

Signed-off-by: Kinjal Patel <kinjalpravin@nvidia.com>
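The idea behind the exported format can be sketched as follows. This is an illustrative toy, not the actual modelopt `export_hf_checkpoint` implementation: it only shows the concept of storing BF16 weights alongside per-tensor amax (max absolute value) calibration statistics, which a fake-quant server can use to reconstruct quantizer scales at load time. The helper name and the `._amax` key suffix here are assumptions for illustration.

```python
# Toy sketch (NOT the modelopt API): export each weight in BF16 together
# with its amax, so a fake-quant server can rebuild quantizer scales.
import torch


def export_bf16_weights_amax(state_dict: dict) -> dict:
    """Return a flat dict of BF16 weights plus per-tensor amax values."""
    exported = {}
    for name, tensor in state_dict.items():
        exported[name] = tensor.to(torch.bfloat16)       # BF16 weight
        exported[f"{name}._amax"] = tensor.abs().amax()  # calibration amax
    return exported


weights = {"linear.weight": torch.tensor([[0.5, -2.0], [1.5, 0.25]])}
out = export_bf16_weights_amax(weights)
print(out["linear.weight"].dtype)         # torch.bfloat16
print(out["linear.weight._amax"].item())  # 2.0
```

At serving time, a fake-quant path would read each `._amax` entry to set the corresponding quantizer's scale instead of re-calibrating on the fly.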
File tree (9 files changed: +547 −240 lines)

- examples/vllm_serve
- modelopt/torch
  - export
    - plugins
  - quantization/plugins
- tests/gpu/torch/export