Moved vllm fq export code to separate files #612
Conversation
Signed-off-by: Kinjal Patel <kinjalpravin@nvidia.com>
kinjalpatel27 commented Nov 26, 2025
@coderabbitai review

✅ Actions performed: Review triggered.
Walkthrough

This pull request refactors vLLM fakequant export functionality by splitting the monolithic `vllm_fakequant.py` plugin into dedicated HuggingFace and Megatron modules.

Changes
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 2
🧹 Nitpick comments (2)
modelopt/torch/export/plugins/vllm_fq_hf.py (1)
28-43: Consider documenting the in-place model mutation.

The function modifies the model in place by removing quantizer attributes. Users may not expect this side effect. Consider adding a note in the docstring:

```diff
 """Exports the torch model weights and amax values separately.

 This function:
 1. Extracts amax values for calibration
 2. Deletes all quantizer parameters from state dict to store only weights in original dtype
 3. Saves the model weights

+Note:
+    This function modifies the model in place by removing quantizer attributes.
+    The model should not be used for inference after calling this function.
+
 Args:
     model: The quantized model to export
     export_dir: Directory to save the amax values
 """
```

tests/gpu/torch/export/test_vllm_fq_hf_export.py (1)
91-93: Minor: Consider a more explicit assertion message for the key comparison.

While `amax_dict.keys() == amax_state_dict.keys()` works correctly, the assertion message could be more specific about which keys are present vs. expected, to aid debugging. Consider this diff for clearer failure messages:

```diff
-assert amax_dict.keys() == amax_state_dict.keys(), (
-    "amax keys mismatch between before and after export"
-)
+assert set(amax_dict.keys()) == set(amax_state_dict.keys()), (
+    f"amax keys mismatch: exported has {len(amax_dict)} keys, "
+    f"expected {len(amax_state_dict)} keys. "
+    f"Missing: {set(amax_state_dict.keys()) - set(amax_dict.keys())}, "
+    f"Extra: {set(amax_dict.keys()) - set(amax_state_dict.keys())}"
+)
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
- examples/vllm_serve/README.md (1 hunk)
- modelopt/torch/export/__init__.py (1 hunk)
- modelopt/torch/export/plugins/__init__.py (1 hunk)
- modelopt/torch/export/plugins/vllm_fakequant.py (0 hunks)
- modelopt/torch/export/plugins/vllm_fq_hf.py (1 hunk)
- modelopt/torch/export/plugins/vllm_fq_megatron.py (1 hunk)
- modelopt/torch/export/unified_export_hf.py (1 hunk)
- modelopt/torch/export/unified_export_megatron.py (8 hunks)
- tests/gpu/torch/export/test_vllm_fq_hf_export.py (1 hunk)
- tests/gpu/torch/export/test_vllm_fq_megatron_export.py (2 hunks)
💤 Files with no reviewable changes (1)
- modelopt/torch/export/plugins/vllm_fakequant.py
🧰 Additional context used
🧬 Code graph analysis (5)
tests/gpu/torch/export/test_vllm_fq_hf_export.py (2)
- tests/_test_utils/torch/transformers_models.py (1): create_tiny_llama_dir (121-135)
- modelopt/torch/export/plugins/vllm_fq_hf.py (1): export_hf_vllm_fq_checkpoint (28-62)

tests/gpu/torch/export/test_vllm_fq_megatron_export.py (1)
- modelopt/torch/export/plugins/vllm_fq_megatron.py (1): export_mcore_gpt_to_hf_vllm_fq (84-112)

modelopt/torch/export/plugins/vllm_fq_hf.py (4)
- modelopt/torch/export/layer_utils.py (1): is_quantlinear (346-348)
- modelopt/torch/quantization/utils.py (1): get_quantizer_state_dict (492-502)
- modelopt/torch/export/plugins/vllm_fq_megatron.py (1): save_pretrained (71-78)
- modelopt/torch/export/unified_export_megatron.py (2): save_pretrained (263-464), state_dict (467-471)

modelopt/torch/export/unified_export_megatron.py (3)
- modelopt/torch/export/plugins/vllm_fq_megatron.py (1): _get_quantization_format (80-81)
- modelopt/torch/export/quant_utils.py (5): get_weight_block_size (412-429), get_weight_scaling_factor (251-293), get_weight_scaling_factor_2 (296-320), get_activation_scaling_factor (234-248), get_quantization_format (432-533)
- modelopt/torch/export/model_config.py (2): weight (145-150), bias (153-163)

modelopt/torch/export/plugins/vllm_fq_megatron.py (1)
- modelopt/torch/export/unified_export_megatron.py (4): GPTModelExporter (115-1210), state_dict (467-471), save_pretrained (263-464), _get_quantization_format (563-564)
🪛 GitHub Actions: Code Quality
modelopt/torch/export/plugins/vllm_fq_megatron.py
[error] 71-71: E501 Line too long (129 > 120). Ruff: Line too long in def signature.
[error] 75-75: E501 Line too long (141 > 120). Ruff: Line too long in multi-line string or statement.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: linux
- GitHub Check: wait-checks / wait
- GitHub Check: wait-checks / wait
- GitHub Check: build-docs
🔇 Additional comments (16)
modelopt/torch/export/unified_export_megatron.py (4)
507-564: Well-structured internal APIs for extension.

The new `_get_quantized_state` and `_get_quantization_format` methods provide clean extension points that allow subclasses (like `VllmFqGPTModelExporter` in `vllm_fq_megatron.py`) to override quantization behavior. The implementation correctly extracts weights, scales, and quantization format in a reusable manner.
285-286: LGTM - quantization format determination refactored correctly.

The quantization format is now obtained through the internal `_get_quantization_format` method, enabling subclasses to override this behavior for specialized export paths.
333-333: Condition simplified appropriately.

The condition now checks `quantization is not None` directly, which is cleaner than relying on a separate export flag. This aligns with the removal of the `export_vllm_fq_weights_qstate` parameter.
594-594: Consistent usage of internal method.

All mapping functions now use `self._get_quantized_state` instead of the removed external function, maintaining consistency throughout the class.

modelopt/torch/export/__init__.py (1)
22-22: Plugin exports now accessible at the package level.

This addition exposes the new vLLM fakequant export functions (`export_hf_vllm_fq_checkpoint`, `export_mcore_gpt_to_hf_vllm_fq`) at the `modelopt.torch.export` namespace level, providing a convenient import path for consumers.

modelopt/torch/export/plugins/__init__.py (1)
24-27: Appropriate import guards for new plugin modules.

The `vllm_fq_hf` module is imported directly since it only depends on torch, while `vllm_fq_megatron` is correctly guarded with `import_plugin` to handle its megatron dependencies gracefully. This follows the established pattern used for `megatron_importer`.

examples/vllm_serve/README.md (1)
60-63: Documentation updated to reflect new export APIs.

The README correctly directs users to the new dedicated export functions for HuggingFace and MCore models, aligning with the refactored code structure.
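The `import_plugin` guard praised in the `plugins/__init__.py` comment above can be illustrated with this standalone sketch. The context-manager shape is an assumption about the helper, not its actual implementation:

```python
from contextlib import contextmanager


@contextmanager
def import_plugin(plugin_name: str, verbose: bool = False):
    """Sketch of an optional-import guard: imports attempted inside the
    block are skipped silently when an optional dependency is missing,
    instead of crashing package import."""
    try:
        yield
    except ImportError:
        if verbose:
            print(f"Skipping plugin '{plugin_name}': dependency not installed")


# Guarded import: the plugin's symbols are only exported when the
# optional dependency (here, megatron) is actually installed.
with import_plugin("megatron"):
    import megatron  # noqa: F401  # absent in most environments; silently skipped
```

This keeps `import modelopt.torch.export` working for HuggingFace-only users who never install the Megatron stack.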
modelopt/torch/export/plugins/vllm_fq_hf.py (2)
47-51: Amax extraction logic is correct.

The dictionary comprehension properly extracts `_amax` values from the quantizer state dict, cloning and moving them to CPU for serialization.
60-62: Based on my verification, the `save_modelopt_state` parameter is fully supported and properly implemented:

Parameter Definition: The parameter is defined in the ModelOpt patch for HuggingFace models in `modelopt/torch/opt/plugins/huggingface.py` (line 99), where it's extracted via `kwargs.pop("save_modelopt_state", True)`.

Implementation: The patch correctly handles the parameter by:
- Extracting it from kwargs with a default value of `True`
- Passing the remaining kwargs to the original `save_pretrained` method
- Conditionally saving ModelOpt state based on the parameter value

Usage Pattern: The test file (`test_vllm_fq_hf_export.py`) confirms the expected usage:
- Models are quantized via `mtq.quantize()`, which applies the patches
- Then `export_hf_vllm_fq_checkpoint()` is called on the patched model
- This is consistent with other export functions in the codebase

Correctness: The use of `save_modelopt_state=False` on line 62 is appropriate because the quantizers have been removed from the model (lines 56-58), so there is no ModelOpt state to save.

The parameter is fully supported and the code is correct.
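The `kwargs.pop` patch pattern verified above can be sketched as a generic wrapper. The wrapper and the `save_state_fn` hook are hypothetical names for illustration; the real patch lives in `modelopt/torch/opt/plugins/huggingface.py`:

```python
import functools


def patch_save_pretrained(original_save_pretrained, save_state_fn):
    """Sketch: wrap save_pretrained so callers can pass an extra
    save_modelopt_state flag without the original method ever seeing it."""

    @functools.wraps(original_save_pretrained)
    def wrapper(self, save_directory, **kwargs):
        # Pop the custom flag before delegating; the default of True
        # matches the behavior described above.
        save_modelopt_state = kwargs.pop("save_modelopt_state", True)
        result = original_save_pretrained(self, save_directory, **kwargs)
        if save_modelopt_state:
            # Persist ModelOpt metadata alongside the weights.
            save_state_fn(self, save_directory)
        return result

    return wrapper
```

Popping the flag is the key design choice: the wrapped `save_pretrained` receives only the kwargs it understands, so the patch stays compatible with any transformers version.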
tests/gpu/torch/export/test_vllm_fq_hf_export.py (1)
26-93: LGTM! Comprehensive test coverage.

The test properly verifies:
- Model quantization with FP8
- Export to vLLM fakequant format
- File existence checks (quant_amax.pth should exist, hf_quant_config.json should not)
- Weight preservation across export/reload
- Amax value extraction and storage
tests/gpu/torch/export/test_vllm_fq_megatron_export.py (2)
25-25: LGTM! Import updated correctly.

The import now uses the new dedicated vLLM fakequant export function for Megatron.
93-98: LGTM! Export call updated correctly.

The test now calls `export_mcore_gpt_to_hf_vllm_fq` with the appropriate parameters, properly replacing the old conditional export path.

modelopt/torch/export/plugins/vllm_fq_megatron.py (3)
84-112: LGTM! Export function properly integrates with the exporter class.

The `export_mcore_gpt_to_hf_vllm_fq` function correctly instantiates `VllmFqGPTModelExporter` and delegates to its `save_pretrained` method. The parameter forwarding is complete and the docstring is clear.
29-67: Verify distributed gather correctness; key collision handling is unclear but likely safe in practice, and barrier placement is correct.

The code correctly uses `torch.distributed.barrier()` after gather operations to ensure synchronization. However, three points warrant clarification:

1. Key collision handling: The `dict.update()` silently overwrites duplicate keys. While amax keys likely include model parameter names (reducing collision risk across ranks in typical distributed setups), this could cause silent data loss if the model sharding strategy results in overlapping parameter ownership. Consider adding validation or merging logic that detects and handles collisions explicitly.
2. Barrier timing is correct: The barrier is placed after both the gather operation and the `torch.save()` call on rank 0, ensuring non-rank-0 processes wait for the file write to complete before proceeding.
3. File write completeness: The barrier provides the necessary synchronization for downstream operations that depend on the saved file, assuming the file system ensures durability by the barrier point.

Recommend adding either (a) an assertion or log statement to detect whether merged keys differ significantly from individual rank keys, or (b) explicit handling if overlapping amax entries should be averaged or combined rather than overwritten.

modelopt/torch/export/unified_export_hf.py (1)

582-582: The claim about `export_vllm_fq_weights_qstate` parameter removal is incorrect.

The parameter `export_vllm_fq_weights_qstate` never existed in the `export_hf_checkpoint` API. The actual architecture implements vLLM fakequant functionality through a separate dedicated function, `export_hf_vllm_fq_checkpoint` (in `modelopt/torch/export/plugins/vllm_fq_hf.py`), not as a parameter flag. Users requiring vLLM fakequant export call `export_hf_vllm_fq_checkpoint()` directly rather than passing a parameter to `export_hf_checkpoint()`. This is not a breaking API change.

Likely an incorrect or invalid review comment.
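The collision check recommended for the distributed gather could be sketched like this. The merge itself needs no distributed context, so this pure-Python sketch assumes the per-rank amax dicts have already been gathered (e.g. via `torch.distributed.gather_object`); the function name is hypothetical:

```python
def merge_rank_amax_dicts(per_rank_dicts):
    """Merge amax dicts gathered from all ranks, raising on key
    collisions instead of silently overwriting via dict.update().

    Assumes each rank owns a disjoint set of parameters; any overlap
    signals a sharding bug and is surfaced loudly.
    """
    merged = {}
    for rank, rank_dict in enumerate(per_rank_dicts):
        overlap = merged.keys() & rank_dict.keys()
        if overlap:
            raise ValueError(
                f"rank {rank} re-supplied amax keys already gathered: {sorted(overlap)}"
            )
        merged.update(rank_dict)
    return merged
```

Raising (rather than averaging) is the conservative choice when overlapping ownership is unexpected; if overlaps were legitimate, the branch could instead combine entries explicitly.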
codecov bot commented Nov 26, 2025 • edited
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

```
@@           Coverage Diff            @@
##             main     #612    +/-   ##
==========================================
- Coverage   74.80%   74.64%   -0.17%
==========================================
  Files         183      183
  Lines       18626    18542      -84
==========================================
- Hits        13933    13840      -93
- Misses       4693     4702       +9
```

☔ View full report in Codecov by Sentry.
Signed-off-by: Kinjal Patel <kinjalpravin@nvidia.com>
263b2b7 into main
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
What does this PR do?
Type of change: Bug fix
Overview:
Moved vLLM fakequant checkpoint export code to separate files: `modelopt/torch/export/plugins/vllm_fq_hf.py` (HuggingFace) and `modelopt/torch/export/plugins/vllm_fq_megatron.py` (Megatron Core).
Usage
Refer to README.md
Testing
tests/gpu/torch/export/test_vllm_fq_hf_export.py and tests/gpu/torch/export/test_vllm_fq_megatron_export.py

Before your PR is "Ready for review"
Additional Information
Summary by CodeRabbit
Release Notes
New Features
Refactor