Moved vllm fq export code to separate files #612


Merged
kinjalpatel27 merged 3 commits into main from kinjal/move_export_code on Nov 27, 2025

Conversation

@kinjalpatel27 (Contributor) commented Nov 26, 2025
edited by the coderabbitai bot

What does this PR do?

Type of change: Bug fix

Overview:
Moved the vLLM fakequant checkpoint export code to separate files:

  1. for HF export -> modelopt.torch.export.plugins.vllm_fq_hf
  2. for megatron export -> modelopt.torch.export.plugins.vllm_fq_megatron
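The HF-side entry point can be sketched as follows. This is a hedged illustration, not the repository's own example code: the wrapper name `export_for_vllm` and the default directory are hypothetical, while the `export_hf_vllm_fq_checkpoint(model, export_dir)` call shape follows the new module's docstring. The import guard lets the sketch load even where modelopt is not installed.

```python
# Hypothetical usage sketch of the relocated HF export entry point.
try:
    from modelopt.torch.export import export_hf_vllm_fq_checkpoint
except ImportError:  # modelopt not available in this environment
    export_hf_vllm_fq_checkpoint = None


def export_for_vllm(model, export_dir="vllm_fq_ckpt"):
    """Export a quantized HF model to the vLLM fakequant checkpoint format."""
    if export_hf_vllm_fq_checkpoint is None:
        raise RuntimeError("modelopt is required for this export path")
    # Writes the weights in their original dtype plus a quant_amax.pth file.
    export_hf_vllm_fq_checkpoint(model, export_dir)
```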

Usage

Refer to examples/vllm_serve/README.md

Testing

  • Tested the HF approach by exporting a bf16 model using the QAT script and running the vLLM server; verified that amax values match
  • Tested the MCore approach by quantizing and exporting a bf16 model using the quantize.sh and export.sh scripts and running the vLLM server; verified that amax values match
  • Tested using the unit tests in tests/gpu/torch/export/test_vllm_fq_hf_export.py and tests/gpu/torch/export/test_vllm_fq_megatron_export.py

Before your PR is "Ready for review"

  • Make sure you read and follow the Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes
  • Did you write any new necessary tests?: NA
  • Did you add or update any necessary documentation?: Yes
  • Did you update the Changelog?: NA

Additional Information

Summary by CodeRabbit

Release Notes

  • New Features

    • Added dedicated export functions for vLLM fakequant checkpoint format, supporting both HuggingFace and Megatron Core models.
  • Refactor

    • Simplified export API by removing conditional export flags for cleaner, more predictable behavior.
    • Reorganized export functionality into focused plugin modules for improved maintainability.


Signed-off-by: Kinjal Patel <kinjalpravin@nvidia.com>
@kinjalpatel27 (Contributor, Author)

@coderabbitai review

@coderabbitai (Contributor)

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai (Contributor)

Walkthrough

This pull request refactors the vLLM fakequant export functionality by splitting the monolithic vllm_fakequant.py module into separate HuggingFace- and Megatron-specific export modules, removing the public export_vllm_fq_weights_qstate parameter from the unified export APIs, and introducing internal quantization-state retrieval methods to GPTModelExporter.

Changes

Cohort / File(s) | Summary

Documentation Updates
examples/vllm_serve/README.md
Updated Step 1 export instructions to reference the new vLLM-specific export functions export_hf_vllm_fq_checkpoint and export_mcore_gpt_to_hf_vllm_fq instead of the generic unified export paths.

Package Initialization
modelopt/torch/export/__init__.py, modelopt/torch/export/plugins/__init__.py
Added imports to expose the new vLLM fakequant export plugins (vllm_fq_hf and vllm_fq_megatron) at the package level, through from .plugins import * in the export module and module imports in the plugins package.

vLLM HuggingFace Export
modelopt/torch/export/plugins/vllm_fq_hf.py
New module introducing export_hf_vllm_fq_checkpoint() to export quantized HuggingFace models to the vLLM fakequant checkpoint format, extracting amax values, stripping quantizer attributes, and saving to quant_amax.pth.

vLLM Megatron Export
modelopt/torch/export/plugins/vllm_fq_megatron.py
New module introducing export_mcore_gpt_to_hf_vllm_fq() and the VllmFqGPTModelExporter class for distributed Megatron-Core GPT export with per-rank amax state gathering and synchronization across ranks.

Legacy Module Removal
modelopt/torch/export/plugins/vllm_fakequant.py
Removed the entire module containing export_hf_vllm_fq_checkpoint(), get_mcore_vllm_fq_quantized_state(), and gather_mcore_vllm_fq_quantized_state_dict(), as the functionality is now split into dedicated modules.

Unified HuggingFace Export
modelopt/torch/export/unified_export_hf.py
Removed the export_vllm_fq_weights_qstate parameter and the associated conditional export path; export_hf_checkpoint() now follows the standard HF export flow unconditionally.

Unified Megatron Export
modelopt/torch/export/unified_export_megatron.py
Removed the export_vllm_fq_weights_qstate parameter from GPTModelExporter.__init__() and export_mcore_gpt_to_hf(); added internal methods _get_quantized_state() and _get_quantization_format() to encapsulate quantization-state retrieval; removed the dependency on legacy vllm_fakequant helpers.

vLLM HuggingFace Tests
tests/gpu/torch/export/test_vllm_fq_hf_export.py
New test file validating end-to-end export of quantized HuggingFace models via export_hf_vllm_fq_checkpoint(), verifying amax file creation, weight preservation, and quantization-state consistency.

vLLM Megatron Tests
tests/gpu/torch/export/test_vllm_fq_megatron_export.py
Updated test to use the new export_mcore_gpt_to_hf_vllm_fq() function; removed the HuggingFace-specific test flow and unused imports related to the legacy export path.
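The vllm_fq_hf behavior summarized above (extract amax values, strip quantizer state, keep plain weights) can be sketched without any dependencies. The flat-dict representation and the function name `split_fakequant_state` are illustrative simplifications, not the module's actual code:

```python
# Dependency-free sketch of the vllm_fq_hf export steps: split a state dict
# into plain weights (original dtype) and amax calibration values.

def split_fakequant_state(state_dict):
    """Separate plain weights from quantizer amax entries."""
    amax = {k: v for k, v in state_dict.items() if k.endswith("._amax")}
    # Drop every quantizer entry so only original-dtype weights remain.
    weights = {k: v for k, v in state_dict.items() if "quantizer" not in k}
    return weights, amax


state = {
    "model.layers.0.q_proj.weight": [0.1, 0.2],
    "model.layers.0.q_proj.input_quantizer._amax": 0.75,
    "model.layers.0.q_proj.weight_quantizer._amax": 1.5,
}
weights, amax = split_fakequant_state(state)
# weights keeps only the weight tensor; in the real exporter the amax dict
# would be saved as quant_amax.pth.
```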

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Primary concern: modelopt/torch/export/unified_export_megatron.py contains the most complex refactoring, with internal method extraction, parameter removal, and control-flow changes involving quantization-state retrieval across multiple helper functions. Verify that _get_quantized_state() and _get_quantization_format() correctly replace all usages of the removed legacy functions.
  • Secondary concern: modelopt/torch/export/plugins/vllm_fq_megatron.py introduces distributed rank-coordination logic with gather_mcore_vllm_fq_quantized_state_dict() and a custom exporter class; review barrier synchronization, rank-0 aggregation, and compatibility constraints.
  • Note: Homogeneous parameter removals across unified_export_hf.py and the legacy module deletion reduce review complexity despite the file count.

Poem

🐰Hop hop, refactor's done!
Old fakequant melts to two—
HF and Megatron now shine,
Each export path shines bright and new.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 59.09%, which is below the required 80.00% threshold. | Run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled.
Title check | ✅ Passed | The title accurately summarizes the primary refactoring: moving vLLM fakequant export code into separate dedicated modules (vllm_fq_hf and vllm_fq_megatron).
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch kinjal/move_export_code




@coderabbitai[bot] (Contributor) left a comment
Actionable comments posted: 2

🧹 Nitpick comments (2)
modelopt/torch/export/plugins/vllm_fq_hf.py (1)

28-43: Consider documenting the in-place model mutation.

The function modifies the model in place by removing quantizer attributes. Users may not expect this side effect. Consider adding a note in the docstring.

     """Exports the torch model weights and amax values separately.     This function:     1. Extracts amax values for calibration     2. Deletes all quantizer parameters from state dict to store only weights in original dtype     3. Saves the model weights+    Note:+        This function modifies the model in place by removing quantizer attributes.+        The model should not be used for inference after calling this function.+     Args:         model: The quantized model to export         export_dir: Directory to save the amax values     """
tests/gpu/torch/export/test_vllm_fq_hf_export.py (1)

91-93: Minor: Consider a more explicit assertion message for the key comparison.

While amax_dict.keys() == amax_state_dict.keys() works correctly, the assertion message could be more specific about which keys are present vs. expected, to aid debugging.

Consider this diff for clearer failure messages:

```diff
-    assert amax_dict.keys() == amax_state_dict.keys(), (
-        "amax keys mismatch between before and after export"
-    )
+    assert set(amax_dict.keys()) == set(amax_state_dict.keys()), (
+        f"amax keys mismatch: exported has {len(amax_dict)} keys, "
+        f"expected {len(amax_state_dict)} keys. "
+        f"Missing: {set(amax_state_dict.keys()) - set(amax_dict.keys())}, "
+        f"Extra: {set(amax_dict.keys()) - set(amax_state_dict.keys())}"
+    )
```
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0a4f0a8 and f4281c1.

📒 Files selected for processing (10)
  • examples/vllm_serve/README.md (1 hunks)
  • modelopt/torch/export/__init__.py (1 hunks)
  • modelopt/torch/export/plugins/__init__.py (1 hunks)
  • modelopt/torch/export/plugins/vllm_fakequant.py (0 hunks)
  • modelopt/torch/export/plugins/vllm_fq_hf.py (1 hunks)
  • modelopt/torch/export/plugins/vllm_fq_megatron.py (1 hunks)
  • modelopt/torch/export/unified_export_hf.py (1 hunks)
  • modelopt/torch/export/unified_export_megatron.py (8 hunks)
  • tests/gpu/torch/export/test_vllm_fq_hf_export.py (1 hunks)
  • tests/gpu/torch/export/test_vllm_fq_megatron_export.py (2 hunks)
💤 Files with no reviewable changes (1)
  • modelopt/torch/export/plugins/vllm_fakequant.py
🧰 Additional context used
🧬 Code graph analysis (5)
tests/gpu/torch/export/test_vllm_fq_hf_export.py (2)
tests/_test_utils/torch/transformers_models.py (1)
  • create_tiny_llama_dir (121-135)
modelopt/torch/export/plugins/vllm_fq_hf.py (1)
  • export_hf_vllm_fq_checkpoint (28-62)
tests/gpu/torch/export/test_vllm_fq_megatron_export.py (1)
modelopt/torch/export/plugins/vllm_fq_megatron.py (1)
  • export_mcore_gpt_to_hf_vllm_fq (84-112)
modelopt/torch/export/plugins/vllm_fq_hf.py (4)
modelopt/torch/export/layer_utils.py (1)
  • is_quantlinear (346-348)
modelopt/torch/quantization/utils.py (1)
  • get_quantizer_state_dict (492-502)
modelopt/torch/export/plugins/vllm_fq_megatron.py (1)
  • save_pretrained (71-78)
modelopt/torch/export/unified_export_megatron.py (2)
  • save_pretrained (263-464)
  • state_dict (467-471)
modelopt/torch/export/unified_export_megatron.py (3)
modelopt/torch/export/plugins/vllm_fq_megatron.py (1)
  • _get_quantization_format (80-81)
modelopt/torch/export/quant_utils.py (5)
  • get_weight_block_size (412-429)
  • get_weight_scaling_factor (251-293)
  • get_weight_scaling_factor_2 (296-320)
  • get_activation_scaling_factor (234-248)
  • get_quantization_format (432-533)
modelopt/torch/export/model_config.py (2)
  • weight (145-150)
  • bias (153-163)
modelopt/torch/export/plugins/vllm_fq_megatron.py (1)
modelopt/torch/export/unified_export_megatron.py (4)
  • GPTModelExporter (115-1210)
  • state_dict (467-471)
  • save_pretrained (263-464)
  • _get_quantization_format (563-564)
🪛 GitHub Actions: Code Quality
modelopt/torch/export/plugins/vllm_fq_megatron.py

[error] 71-71: E501 Line too long (129 > 120). Ruff: Line too long in def signature.


[error] 75-75: E501 Line too long (141 > 120). Ruff: Line too long in multi-line string or statement.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: linux
  • GitHub Check: wait-checks / wait
  • GitHub Check: wait-checks / wait
  • GitHub Check: build-docs
🔇 Additional comments (16)
modelopt/torch/export/unified_export_megatron.py (4)

507-564: Well-structured internal APIs for extension.

The new _get_quantized_state and _get_quantization_format methods provide clean extension points that allow subclasses (like VllmFqGPTModelExporter in vllm_fq_megatron.py) to override quantization behavior. The implementation correctly extracts weights, scales, and quantization format in a reusable manner.


285-286: LGTM, quantization format determination refactored correctly.

The quantization format is now obtained through the internal _get_quantization_format method, enabling subclasses to override this behavior for specialized export paths.


333: Condition simplified appropriately.

The condition now checks quantization is not None directly, which is cleaner than relying on a separate export flag. This aligns with the removal of the export_vllm_fq_weights_qstate parameter.


594: Consistent usage of the internal method.

All mapping functions now use self._get_quantized_state instead of the removed external function, maintaining consistency throughout the class.

modelopt/torch/export/__init__.py (1)

22: Plugin exports now accessible at the package level.

This addition exposes the new vLLM fakequant export functions (export_hf_vllm_fq_checkpoint, export_mcore_gpt_to_hf_vllm_fq) at the modelopt.torch.export namespace level, providing a convenient import path for consumers.

modelopt/torch/export/plugins/__init__.py (1)

24-27: Appropriate import guards for the new plugin modules.

The vllm_fq_hf module is imported directly since it only depends on torch, while vllm_fq_megatron is correctly guarded with import_plugin to handle its megatron dependencies gracefully. This follows the established pattern used for megatron_importer.

examples/vllm_serve/README.md (1)

60-63: Documentation updated to reflect the new export APIs.

The README correctly directs users to the new dedicated export functions for HuggingFace and MCore models, aligning with the refactored code structure.

modelopt/torch/export/plugins/vllm_fq_hf.py (2)

47-51: Amax extraction logic is correct.

The dictionary comprehension properly extracts _amax values from the quantizer state dict, cloning and moving them to CPU for serialization.
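The extraction pattern described here can be mimicked in plain Python. In the real code the values are torch tensors that are cloned and moved to CPU; `copy.deepcopy` stands in for that step so the sketch runs without torch, and all key names are hypothetical:

```python
import copy

# Plain-Python stand-in for the amax-extraction comprehension.
quantizer_state = {
    "input_quantizer._amax": [0.5],
    "weight_quantizer._amax": [1.25],
    "weight_quantizer._pre_quant_scale": [1.0],
}
amax_values = {
    name: copy.deepcopy(value)  # real code clones tensors and moves them to CPU
    for name, value in quantizer_state.items()
    if name.endswith("_amax")
}
# The copies are independent of the originals, as required for safe serialization.
assert amax_values["input_quantizer._amax"] is not quantizer_state["input_quantizer._amax"]
```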


60-62: Based on my verification, the save_modelopt_state parameter is fully supported and properly implemented:

  1. Parameter definition: The parameter is defined in the ModelOpt patch for HuggingFace models in modelopt/torch/opt/plugins/huggingface.py (line 99), where it is extracted via kwargs.pop("save_modelopt_state", True).

  2. Implementation: The patch correctly handles the parameter by:

    • Extracting it from kwargs with a default value of True
    • Passing the remaining kwargs to the original save_pretrained method
    • Conditionally saving ModelOpt state based on the parameter value

  3. Usage pattern: The test file (test_vllm_fq_hf_export.py) confirms the expected usage:

    • Models are quantized via mtq.quantize(), which applies the patches
    • Then export_hf_vllm_fq_checkpoint() is called on the patched model
    • This is consistent with other export functions in the codebase

  4. Correctness: The use of save_modelopt_state=False in line 62 is appropriate because the quantizers have been removed from the model (lines 56-58), so there is no ModelOpt state to save.

The parameter is fully supported and the code is correct.
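The kwargs.pop patching pattern described in point 2 can be shown with a minimal, dependency-free sketch. The function names here are illustrative stand-ins; the real patch wraps HuggingFace's save_pretrained in modelopt/torch/opt/plugins/huggingface.py:

```python
def original_save_pretrained(save_dir, **kwargs):
    # Stand-in for the unpatched HF method; records what it received.
    return {"saved_to": save_dir, "extra_kwargs": kwargs}


def patched_save_pretrained(save_dir, **kwargs):
    # Extract the ModelOpt-specific flag with a default of True, then
    # forward the remaining kwargs untouched to the original method.
    save_modelopt_state = kwargs.pop("save_modelopt_state", True)
    result = original_save_pretrained(save_dir, **kwargs)
    if save_modelopt_state:
        result["modelopt_state"] = "saved"
    return result


out = patched_save_pretrained("ckpt", save_modelopt_state=False, safe_serialization=True)
# → {'saved_to': 'ckpt', 'extra_kwargs': {'safe_serialization': True}}
```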

tests/gpu/torch/export/test_vllm_fq_hf_export.py (1)

26-93: LGTM! Comprehensive test coverage.

The test properly verifies:

  • Model quantization with FP8
  • Export to vLLM fakequant format
  • File existence checks (quant_amax.pth should exist, hf_quant_config.json should not)
  • Weight preservation across export/reload
  • Amax value extraction and storage
tests/gpu/torch/export/test_vllm_fq_megatron_export.py (2)

25: LGTM! Import updated correctly.

The import now uses the new dedicated vLLM fakequant export function for Megatron.


93-98: LGTM! Export call updated correctly.

The test now calls export_mcore_gpt_to_hf_vllm_fq with the appropriate parameters, properly replacing the old conditional export path.

modelopt/torch/export/plugins/vllm_fq_megatron.py (3)

84-112: LGTM! Export function properly integrates with the exporter class.

The export_mcore_gpt_to_hf_vllm_fq function correctly instantiates VllmFqGPTModelExporter and delegates to its save_pretrained method. The parameter forwarding is complete and the docstring is clear.


29-67: Verify distributed gather correctness: key collision handling is unclear but likely safe in practice; barrier placement is correct.

The code correctly uses torch.distributed.barrier() after the gather operations to ensure synchronization. However, three points warrant clarification:

  1. Key collision handling: dict.update() silently overwrites duplicate keys. While amax keys likely include model parameter names (reducing collision risk across ranks in typical distributed setups), this could cause silent data loss if the model sharding strategy results in overlapping parameter ownership. Consider adding validation or merging logic that detects and handles collisions explicitly.
  2. Barrier timing is correct: the barrier is placed after both the gather operation and the torch.save() call on rank 0, ensuring non-rank-0 processes wait for the file write to complete before proceeding.
  3. File write completeness: the barrier provides the necessary synchronization for downstream operations that depend on the saved file, assuming the file system ensures durability by the barrier point.

Recommend adding either (a) an assertion or log statement to detect whether merged keys differ significantly from individual rank keys, or (b) explicit handling if overlapping amax entries should be averaged or combined rather than overwritten.

modelopt/torch/export/unified_export_hf.py (1)

582: The claim about the export_vllm_fq_weights_qstate parameter removal is incorrect.

The parameter export_vllm_fq_weights_qstate never existed in the export_hf_checkpoint API. The actual architecture implements the vLLM functionality through a separate dedicated function, export_hf_vllm_fq_checkpoint (in modelopt/torch/export/plugins/vllm_fq_hf.py), not as a parameter flag. Users requiring vLLM fakequant export call export_hf_vllm_fq_checkpoint() directly rather than passing a parameter to export_hf_checkpoint(). This is not a breaking API change.

Likely an incorrect or invalid review comment.
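The key-collision recommendation for the per-rank gather can be sketched without any dependencies. All names here are hypothetical; in the real exporter the inputs come from torch.distributed.gather_object() on rank 0 and the merged dict is saved with torch.save() before the barrier:

```python
# Dependency-free sketch: merge per-rank amax state dicts while detecting
# key collisions instead of silently overwriting them with dict.update().

def merge_gathered_states(per_rank_states):
    """Merge per-rank quantizer state dicts, failing loudly on collisions."""
    merged = {}
    for rank, state in enumerate(per_rank_states):
        overlap = merged.keys() & state.keys()
        if overlap:
            raise ValueError(
                f"rank {rank} resent keys already merged: {sorted(overlap)}"
            )
        merged.update(state)
    return merged


merged = merge_gathered_states([
    {"layers.0._amax": 0.5},
    {"layers.1._amax": 0.7},
])
# → {'layers.0._amax': 0.5, 'layers.1._amax': 0.7}
```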

@codecov

codecov bot commented Nov 26, 2025 (edited)

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.64%. Comparing base (261858c) to head (947c872).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
```
@@            Coverage Diff             @@
##             main     #612      +/-   ##
==========================================
- Coverage   74.80%   74.64%   -0.17%
==========================================
  Files         183      183
  Lines       18626    18542      -84
==========================================
- Hits        13933    13840      -93
- Misses       4693     4702       +9
```

☔ View full report in Codecov by Sentry.


Signed-off-by: Kinjal Patel <kinjalpravin@nvidia.com>
Signed-off-by: Kinjal Patel <kinjalpravin@nvidia.com>
@kinjalpatel27 merged commit 263b2b7 into main on Nov 27, 2025
27 checks passed
@kinjalpatel27 deleted the kinjal/move_export_code branch on November 27, 2025 01:15

Reviewers

@coderabbitai[bot] left review comments

@cjluo-nv approved these changes

@sugunav14: awaiting requested review; sugunav14 is a code owner automatically assigned from NVIDIA/modelopt-examples-llm_ptq-codeowners

Assignees

No one assigned

Labels

None yet

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

3 participants

@kinjalpatel27, @cjluo-nv
