[TRTLLM-6975][test] Add multi-turn test cases for VLM models #6749


Merged
LarryXFly merged 12 commits into NVIDIA:release/1.0 from crazydemo:add_multi-turn_test
Aug 13, 2025

Conversation

@crazydemo (Collaborator) commented Aug 8, 2025 (edited by coderabbitai bot)

Summary by CodeRabbit

  • New Features

    • Multi-turn multimodal conversation mode with CLI options to enable and configure turn count and media-driven turns.
  • Tests

    • Added multimodal multi-turn integration tests and test-list entries for multiple models; extended model coverage and adjusted the quickstart entry point. A duplicate test definition was accidentally introduced. Updated accuracy reference expectations for select models. Added resource- and skip-based gating for several tests.
  • Chores

    • Migrated CUDA-graph handling in tests to a config-based approach.

Description

Test Coverage

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test [pipeline-id] (OPTIONAL): Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline, or from the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will always be ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL): Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensures that all builds and tests are run regardless of previous successes.

--disable-fail-fast (OPTIONAL): Disable fail-fast on build/test/infra failures.

--skip-test (OPTIONAL): Skip all test stages, but still run build stages, package stages, and sanity-check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL): Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL): Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL): Skip test stages that don't match the specified backends. Only supports [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with the tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL): Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL): Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL): Force-run the multi-GPU tests in addition to running the L0 pre-merge pipeline.

--post-merge (OPTIONAL): Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL): Run the ordinary L0 pre-merge pipeline plus the specified test stages. Example: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".

--detailed-log (OPTIONAL): Enable flushing all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL): Experimental feature. Enable access to the CI container for debugging purposes. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md
and the scripts/test_to_stage_mapping.py helper.

kill

kill

Kill all running builds associated with pull request.

skip

skip --comment COMMENT

Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

@coderabbitai bot (Contributor) commented Aug 8, 2025 (edited)

📝 Walkthrough

Walkthrough

Adds multiturn multimodal conversation handling to examples/llm-api/quickstart_multimodal.py, introduces multimodal multiturn integration tests and test-list entries, migrates CUDA-graph booleans to a cuda_graph_config object across accuracy tests, tightens resource gating in disaggregated serving tests, and adds FP8 accuracy variants to reference YAMLs.

Changes

Cohort / File(s) — Change summary

Quickstart multimodal example
examples/llm-api/quickstart_multimodal.py
Added --multiturn and --conversation_turns CLI flags and implemented a per-turn multimodal loop that loads inputs via default_multimodal_input_loader (num_frames=8, image_data_format="pt", device="cpu"), optionally builds a LoRA request, calls llm.generate, captures per-turn metadata, and maintains a running conversation_history. The single-turn path is unchanged when multiturn is disabled.

End-to-end integration tests
tests/integration/defs/test_e2e.py
Switched some example_root values to examples/llm-api; added test_ptp_quickstart_multimodal_multiturn exercising multimodal multiturn flows with per-model overrides and image inputs (models: gemma-3-27b-it, mistral-small-3.1-24b-instruct, Phi-4-multimodal-instruct). Note: the new test function was duplicated in the file.

Integration test lists
tests/integration/test_lists/qa/llm_function_full.txt
Added three new multiturn multimodal test entries referencing test_ptp_quickstart_multimodal_multiturn for the gemma, mistral, and Phi-4 multimodal models.

Accuracy tests — CUDA-graph migration
tests/integration/defs/accuracy/test_llm_api_pytorch.py
Replaced boolean use_cuda_graph with cuda_graph_config (either CudaGraphConfig() or None) across tests and LLM initializations; added @pytest.mark.skip_less_device_memory(80000) to one test.

Disaggregated serving tests
tests/integration/defs/accuracy/test_disaggregated_serving.py
Replaced the failing resource check (pytest.fail) with pytest.skip; added @pytest.mark.skip_less_device(4) and @skip_pre_hopper gating on several classes/tests to gate runs by device/architecture.

Accuracy references
tests/integration/defs/accuracy/references/gsm8k.yaml, .../mmlu.yaml
Added FP8 / kv_cache FP8 entries (and an FP8 + spec_dec_algo: Eagle variant) under meta-llama/Llama-4-Maverick-17B-128E-Instruct with the specified accuracy values.

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant Quickstart as quickstart_multimodal.py
    participant Loader as default_multimodal_input_loader
    participant LLM as LLM
    participant LoRA as LoRA (optional)

    User->>Quickstart: run with --multiturn + prompts + media
    loop for each turn
        Quickstart->>Loader: load inputs (conversation_history, media)
        alt LoRA enabled
            Quickstart->>LoRA: build LoRA request
        end
        Quickstart->>LLM: llm.generate(inputs, sampling_params, lora_request)
        LLM-->>Quickstart: assistant response
        Quickstart->>Quickstart: append response (and next user prompt if present) to conversation_history
    end
    Quickstart-->>User: print per-turn results and exit
```
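The flow in the diagram can be sketched in plain Python. The load_inputs and generate stubs below stand in for default_multimodal_input_loader and llm.generate; they are illustrative, not the quickstart's actual implementation:

```python
# Minimal sketch of the multi-turn loop described above. The stubs below
# stand in for the real loader and LLM; only the history-threading logic
# mirrors the change under review.

def load_inputs(prompt, media):
    # Real code would tokenize the prompt and attach image/video tensors.
    return {"prompt": prompt, "media": media}


def generate(inputs):
    # Stub assistant: returns a canned reply derived from the prompt tail.
    return f"(reply to: {inputs['prompt'][-30:]})"


def run_multiturn(prompts, media, turns):
    conversation_history = ""
    results = []
    for i in range(min(turns, len(prompts))):
        user = prompts[i]
        # Prepend the running history so the model sees prior turns.
        cur_prompt = (f"{conversation_history} user: {user}"
                      if conversation_history else user)
        inputs = load_inputs(cur_prompt, media)
        response = generate(inputs)
        results.append({"turn": i + 1,
                        "user_input": user,
                        "assistant_response": response})
        # Persist context for the next turn.
        conversation_history = f"{cur_prompt} assistant: {response}"
    return results


outs = run_multiturn(["Describe the image.", "What color is it?"],
                     ["cat.png"], turns=2)
for o in outs:
    print(o["turn"], o["assistant_response"])
```

The key point is that conversation_history is carried across iterations rather than rebuilt from a fresh prompt each turn, which is exactly what one of the review comments below flags.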

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~35 minutes

Possibly related PRs

Suggested labels

Speculative Decoding, Disaggregated Serving, CI

Suggested reviewers

  • brb-nv
  • Shixiaowei02
  • litaotju
  • QiJune
  • amukkara

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 4

🧹 Nitpick comments (2)
tests/integration/test_lists/qa/llm_function_full.txt (1)

603-605: Optional: expand coverage and watch runtime

  • Consider adding one multiturn case for a Qwen2-VL or NVILA model later for broader backend coverage, if the runtime budget allows.
  • If these prove slow or flaky in CI, annotate them with TIMEOUT like other heavy tests in this file, or place them in a nightly-focused list.

Happy to draft the additional test list entries or help profile runtime to set TIMEOUTs.

tests/integration/defs/test_e2e.py (1)

2600-2601: Avoid dumping full generated output to CI logs

Printing the entire model output can flood CI logs and isn’t needed given the detailed assertion. Remove or gate behind a debug flag.

```diff
-    print("output:", output)
```
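One way to implement the suggested gating is an environment-variable switch. The TLLM_TEST_DEBUG name below is a hypothetical choice for illustration, not an existing flag in the repository:

```python
import os

# Gate noisy debug prints behind an environment variable so CI logs stay
# clean by default. TLLM_TEST_DEBUG is an illustrative name, not a real flag.
DEBUG = os.environ.get("TLLM_TEST_DEBUG", "0") == "1"


def debug_print(*args):
    """Print only when debugging is explicitly enabled."""
    if DEBUG:
        print(*args)


debug_print("output:", "very long generated text ...")  # silent unless enabled
```

Run with TLLM_TEST_DEBUG=1 to re-enable the output locally without touching the test code.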
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 09038be and b2c460e.

📒 Files selected for processing (3)
  • examples/llm-api/quickstart_multimodal.py (3 hunks)
  • tests/integration/defs/test_e2e.py (1 hunks)
  • tests/integration/test_lists/qa/llm_function_full.txt (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

**/*.py: Python code should conform to Python 3.8+.
Indent Python code with 4 spaces. Do not use tabs.
Always maintain the namespace when importing in Python, even if only one class or function from a module is used.
Python filenames should use snake_case (e.g., some_file.py).
Python classes should use PascalCase (e.g., class SomeClass).
Python functions and methods should use snake_case (e.g., def my_awesome_function():).
Python local variables should use snake_case. Prefix k for variable names that start with a number (e.g., k_99th_percentile).
Python global variables should use upper snake_case and prefix G (e.g., G_MY_GLOBAL).
Python constants should use upper snake_case (e.g., MY_CONSTANT).
Avoid shadowing variables declared in an outer scope in Python.
Initialize all externally visible members of a Python class in the constructor.
For interfaces that may be used outside a Python file, prefer docstrings over comments.
Comments in Python should be reserved for code within a function, or interfaces that are local to a file.
Use Google style docstrings for Python classes and functions, which can be parsed by Sphinx.
Attributes and variables in Python can be documented inline; attribute docstrings will be rendered under the class docstring.
Avoid using reflection in Python when functionality can be easily achieved without it.
When using try-except blocks in Python, limit the except to the smallest set of errors possible.
When using try-except blocks to handle multiple possible variable types in Python, keep the body of the try as small as possible, using the else block to implement the logic.

Files:

  • tests/integration/defs/test_e2e.py
  • examples/llm-api/quickstart_multimodal.py
**/*.{cpp,h,hpp,cc,cxx,cu,py}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.

Files:

  • tests/integration/defs/test_e2e.py
  • examples/llm-api/quickstart_multimodal.py
🧠 Learnings (3)
📓 Common learnings
Learnt from: moraxu (PR NVIDIA/TensorRT-LLM#6303, tests/integration/test_lists/qa/examples_test_list.txt:494, 2025-07-28T17:06:08.621Z): In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
Learnt from: galagam (PR NVIDIA/TensorRT-LLM#6487, tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py:1-12, 2025-08-06T13:58:07.506Z): In TensorRT-LLM, test files (files under tests/ directories) do not require NVIDIA copyright headers, unlike production source code files. Test files typically start directly with imports, docstrings, or code.
📚 Learning: 2025-07-28T17:06:08.621Z (moraxu, same learning as above)

Applied to files:

  • tests/integration/test_lists/qa/llm_function_full.txt
  • tests/integration/defs/test_e2e.py
📚 Learning: 2025-08-06T13:58:07.506Z (galagam, same learning as above)

Applied to files:

  • tests/integration/test_lists/qa/llm_function_full.txt
  • tests/integration/defs/test_e2e.py
🪛 Ruff (0.12.2)
tests/integration/defs/test_e2e.py

2608-2608: Line too long (240 > 120)

(E501)

examples/llm-api/quickstart_multimodal.py

151-151: First line should end with a period, question mark, or exclamation point

Add closing punctuation

(D415)


156-156: Do not use a bare except

(E722)

🔇 Additional comments (5)
tests/integration/test_lists/qa/llm_function_full.txt (2)

603-605: LGTM: Multiturn VLM cases added in the right section with consistent IDs

Names and placement align with existing quickstart multimodal entries (e.g., Lines 600-602). Model IDs/paths look consistent with prior single-turn/2GPU variants.


603-605: Multiturn tests and CLI logic verified
I’ve confirmed that:

  • test_ptp_quickstart_multimodal_multiturn is defined in tests/integration/defs/test_e2e.py and parametrizes exactly the three new models (gemma-3-27b-it, mistral-small-3.1-24b-instruct, Phi-4-multimodal-instruct).
  • The CLI in examples/llm-api/quickstart_multimodal.py supports --multiturn and --conversation_turns and includes a torch.cuda.empty_cache() step.

No changes needed.

tests/integration/defs/test_e2e.py (2)

2571-2582: Consider explicitly passing conversation_turns to match the number of prompts

AI summary notes quickstart_multimodal.py adds a --conversation_turns option. If the script doesn’t infer turns from provided prompts reliably, explicitly set it to len(prompts) to avoid ambiguity.

Would the CLI require or benefit from:

  • f"--conversation_turns={len(accuracy_inputs['image']['prompt'])}"

2608-2610: Fix line length violation flagged by Ruff (E501)

Static analysis reports a too-long line here. The refactor above splits the zip args onto separate lines, resolving E501.

Likely an incorrect or invalid review comment.

examples/llm-api/quickstart_multimodal.py (1)

111-120: CLI flag looks good – consider clarifying the help text.

--conversation_turns controls the maximum number of turns (it is clipped by prompt length).
Maybe say “Maximum number of …” to avoid confusion, but otherwise OK.

@crazydemo marked this pull request as ready for review August 8, 2025 09:46
@crazydemo requested a review from a team as a code owner August 8, 2025 09:46
@crazydemo (Collaborator, Author)

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #14600 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator)

PR_Github #14600 [ run ] completed with state SUCCESS
/LLM/release-1.0/L0_MergeRequest_PR pipeline #30 completed with status: 'FAILURE'

@crazydemo force-pushed the add_multi-turn_test branch 3 times, most recently from 7cc9490 to 302df88, August 8, 2025 15:22
@crazydemo (Collaborator, Author)

/bot run

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 2

♻️ Duplicate comments (4)
examples/llm-api/quickstart_multimodal.py (3)

150-158: Narrow the except and punctuate the docstring (Ruff D415, E722).

Apply this diff:

```diff
-def clear_cuda_cache():
-    """Clear CUDA cache to prevent memory issues"""
-    try:
-        import torch
-        if torch.cuda.is_available():
-            torch.cuda.empty_cache()
-    except:
-        pass
+def clear_cuda_cache():
+    """Clear CUDA cache to prevent memory issues."""
+    try:
+        import torch
+    except ImportError:
+        # Torch not installed – nothing to clear.
+        return
+    try:
+        if torch.cuda.is_available():
+            torch.cuda.empty_cache()
+    except Exception as err:
+        # Log once, but don't kill the caller.
+        print(f"[warn] Failed to clear CUDA cache: {err}")
```

196-209: Respect CLI parameters instead of hard-coded loader defaults.

Apply this diff:

```diff
-                inputs = default_multimodal_input_loader(
+                inputs = default_multimodal_input_loader(
                     tokenizer=llm.tokenizer,
                     model_dir=llm._hf_model_dir,
                     model_type=model_type,
                     modality=args.modality,
                     prompts=[cur_prompt],
                     media=args.media,
-                    image_data_format="pt",
-                    num_frames=8,
-                    device="cpu")
+                    image_data_format=args.image_format,
+                    num_frames=args.num_frames,
+                    device=args.device)
```

193-239: Conversation history is not persisted between turns – context never grows.

Each iteration uses a fresh args.prompt[i]; the assistant reply appended to cur_prompt is discarded. Persist a running history so the model receives prior turns.

Apply this diff:

```diff
-        generated_outputs = []  # Store generated outputs for return
+        generated_outputs = []  # Store generated outputs for return
+        conversation_history = ""
@@
-                # Use multimodal input loader to process input with conversation context
-                cur_prompt = args.prompt[i]
+                # Use multimodal input loader to process input with conversation context
+                cur_user = args.prompt[i]
+                cur_prompt = (f"{conversation_history} user: {cur_user}"
+                              if conversation_history else cur_user)
@@
-                generated_outputs.append({
+                generated_outputs.append({
                     "turn": i + 1,
-                    "user_input": cur_prompt,
+                    "user_input": cur_user,
                     "assistant_response": response,
                     "media": args.media
                 })
@@
-                # Add to cur_prompt
-                cur_prompt = cur_prompt + response
+                # Persist conversation context for next turn
+                conversation_history = f"{cur_prompt} assistant: {response}"
```
tests/integration/defs/test_e2e.py (1)

2601-2620: Harden matching, fix Ruff E501, and remove noisy prints in the multi-turn check

  • Add an assertion that parsed outputs count equals number of prompts to catch parser/format drift early.
  • Make keyword matching robust and case-insensitive on both sides.
  • Remove debug prints that spam CI logs.
  • Split long lines (Line 2610, assert message) to satisfy E501.
```diff
-    output = llm_venv.run_cmd(cmd, caller=check_output)
-    print("output:", output)
-    # Set match ratio based on model
-    match_ratio = 4.0 / 5
-    if model_name == "Phi-4-multimodal-instruct":
-        match_ratio = 0.6
-
-    # Check output accuracy
-    for prompt_output, prompt_keywords in zip(
-            parse_output(output), expected_keywords[model_name]["image"]):
-        matches = [
-            keyword in prompt_output.lower() for keyword in prompt_keywords
-        ]
-        obs_match_ratio = 1. * sum(matches) / len(matches)
-        print("prompt_output:", prompt_output)
-        print("prompt_keywords:", prompt_keywords)
-        print("matches:", matches)
-        print("obs_match_ratio:", obs_match_ratio)
-        assert obs_match_ratio >= match_ratio, f"Incorrect output!\nGenerated \"{prompt_output}\"\nExpected keywords \"{prompt_keywords}\"\n Matched keywords: {matches}\n Observed match ratio {obs_match_ratio} below threshold {match_ratio}"
+    output = llm_venv.run_cmd(cmd, caller=check_output)
+    outputs = parse_output(output)
+    expected = expected_keywords[model_name]["image"]
+    assert len(outputs) == len(expected), (
+        f"Expected {len(expected)} outputs, got {len(outputs)}. "
+        f"Raw output (truncated): {output[:1000]}..."
+    )
+    # Set match ratio based on model
+    match_ratio = 4.0 / 5
+    if model_name == "Phi-4-multimodal-instruct":
+        match_ratio = 0.6
+
+    # Check output accuracy
+    for prompt_output, prompt_keywords in zip(outputs, expected):
+        matches = [keyword.lower() in prompt_output.lower()
+                   for keyword in prompt_keywords]
+        obs_match_ratio = 1.0 * sum(matches) / len(matches)
+        assert obs_match_ratio >= match_ratio, (
+            f"Incorrect output!\nGenerated \"{prompt_output}\"\n"
+            f"Expected keywords \"{prompt_keywords}\"\n"
+            f"Matched keywords: {matches}\n"
+            f"Observed match ratio {obs_match_ratio} below threshold {match_ratio}"
+        )
```
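Pulled out as a standalone helper, the suggested case-insensitive match-ratio check behaves like this (the function names here are illustrative, not taken from the test file):

```python
def keyword_match_ratio(prompt_output, prompt_keywords):
    """Fraction of expected keywords found (case-insensitive) in the output."""
    matches = [kw.lower() in prompt_output.lower() for kw in prompt_keywords]
    return sum(matches) / len(matches)


def check_keywords(prompt_output, prompt_keywords, threshold=4.0 / 5):
    """Assert that enough expected keywords appear in the generated text."""
    ratio = keyword_match_ratio(prompt_output, prompt_keywords)
    assert ratio >= threshold, (
        f"Match ratio {ratio} below threshold {threshold} "
        f"for keywords {prompt_keywords}")
    return ratio


# "cat" and "sofa" match, "dog" does not -> 2/3.
print(keyword_match_ratio("A Tabby CAT on a sofa", ["cat", "sofa", "dog"]))
```

Lower-casing both sides is what makes the check robust to capitalization drift in model outputs.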
🧹 Nitpick comments (2)
examples/llm-api/quickstart_multimodal.py (2)

111-119: Validate --conversation_turns to be >= 1.

Prevent a silent no-op when negative by enforcing a positive integer at parse time.

Apply this diff:

```diff
-    parser.add_argument(
-        "--conversation_turns",
-        type=int,
-        default=2,
-        help="Number of conversation turns for automated testing.")
+    parser.add_argument(
+        "--conversation_turns",
+        type=positive_int,
+        default=2,
+        help="Number of conversation turns for automated testing (>= 1).")
```

Add this helper (outside the shown range, e.g., above add_multimodal_args):

```python
def positive_int(value: str) -> int:
    iv = int(value)
    if iv < 1:
        raise argparse.ArgumentTypeError("conversation_turns must be >= 1")
    return iv
```
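A self-contained sketch of how the suggested validator plugs into argparse; the parser here is minimal and illustrative, not the quickstart's actual parser:

```python
import argparse


def positive_int(value: str) -> int:
    """argparse type callback that rejects values below 1 at parse time."""
    iv = int(value)
    if iv < 1:
        raise argparse.ArgumentTypeError("conversation_turns must be >= 1")
    return iv


parser = argparse.ArgumentParser()
parser.add_argument("--conversation_turns", type=positive_int, default=2)

args = parser.parse_args(["--conversation_turns", "3"])
print(args.conversation_turns)  # → 3
```

Raising ArgumentTypeError inside the type callback makes argparse print a proper usage error (instead of the loop silently doing nothing for a negative count).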

212-217: Redundant model_class is None check inside if args.load_lora.

model_class is guaranteed to be set in the earlier if args.load_lora: block; this branch cannot see model_class as None.

Apply this diff:

```diff
-                if args.load_lora:
-                    if model_class is None:
-                        raise ValueError(
-                            "model_class must be provided when load_lora is True"
-                        )
-                    lora_request = model_class.lora_request(
+                if args.load_lora:
+                    lora_request = model_class.lora_request(
                         len(inputs), args.modality, llm._hf_model_dir)
```
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7cc9490 and 302df88.

📒 Files selected for processing (3)
  • examples/llm-api/quickstart_multimodal.py (3 hunks)
  • tests/integration/defs/test_e2e.py (1 hunks)
  • tests/integration/test_lists/qa/llm_function_full.txt (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/integration/test_lists/qa/llm_function_full.txt
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: moraxu (PR NVIDIA/TensorRT-LLM#6303, tests/integration/test_lists/qa/examples_test_list.txt:494, 2025-07-28T17:06:08.621Z): In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
Learnt from: galagam (PR NVIDIA/TensorRT-LLM#6487, tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py:1-12, 2025-08-06T13:58:07.506Z): In TensorRT-LLM, test files (files under tests/ directories) do not require NVIDIA copyright headers, unlike production source code files. Test files typically start directly with imports, docstrings, or code.
📚 Learning: 2025-07-28T17:06:08.621Z (moraxu, same learning as above)

Applied to files:

  • tests/integration/defs/test_e2e.py
📚 Learning: 2025-08-06T13:58:07.506Z (galagam, same learning as above)

Applied to files:

  • tests/integration/defs/test_e2e.py
📚 Learning: 2025-08-01T15:14:45.673Z
Learnt from: yibinl-nvidia (PR NVIDIA/TensorRT-LLM#6506, examples/models/core/mixtral/requirements.txt:3): In TensorRT-LLM, the examples directory can have different dependency versions than the root requirements.txt file. Version conflicts between root and examples dependencies are acceptable because examples are designed to be standalone and self-contained.

Applied to files:

  • tests/integration/defs/test_e2e.py
📚 Learning: 2025-08-06T21:22:55.018Z
Learnt from: CR (CODING_GUIDELINES.md): Applies to **/*.py: When using try-except blocks in Python, limit the except to the smallest set of errors possible.

Applied to files:

  • examples/llm-api/quickstart_multimodal.py
🪛 Ruff (0.12.2)
tests/integration/defs/test_e2e.py

2610-2610: Line too long (240 > 120)

(E501)

examples/llm-api/quickstart_multimodal.py

151-151: First line should end with a period, question mark, or exclamation point

Add closing punctuation

(D415)


156-156: Do not use bare except

(E722)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (2)
examples/llm-api/quickstart_multimodal.py (2)

253-253: No action needed.

Comment-only change for section separation.


199-206: default_multimodal_input_loader already handles 1-prompt -> N-media broadcasting.
The if len(media) > len(prompts) and len(prompts) == 1 branch wraps your list of media into a single-element media list to match the single prompt, and the subsequent assert len(media_or_embeddings) == len(prompts) enforces correct pairing. No changes needed in the example for prompt/media length alignment.
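For reference, the branch described above can be exercised standalone; this is a minimal sketch with a hypothetical broadcast_media helper, not the loader's actual code:

```python
def broadcast_media(prompts, media):
    """Pair prompts with media; wrap N media items when there is one prompt."""
    if len(media) > len(prompts) and len(prompts) == 1:
        media = [media]  # one prompt, many media: treat the list as a single input
    assert len(media) == len(prompts), "prompt/media count mismatch"
    return list(zip(prompts, media))

pairs = broadcast_media(["Describe the scene."], ["a.png", "b.png"])
# One prompt is paired with the full media list.
```

With equal-length inputs the helper falls through to plain one-to-one pairing.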

@tensorrt-cicd
Copy link
Collaborator

PR_Github #14620 [ run ] triggered by Bot

@tensorrt-cicd
Copy link
Collaborator

PR_Github #14620 [ run ] completed with state SUCCESS
/LLM/release-1.0/L0_MergeRequest_PR pipeline #33 completed with status: 'FAILURE'

Copy link
Contributor

coderabbitai[bot] left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (6)
examples/llm-api/quickstart_multimodal.py (4)

150-158: Fix docstring punctuation and avoid bare except in clear_cuda_cache()

Align with Ruff D415/E722; narrow exceptions and log unexpected errors.

Apply:

 def clear_cuda_cache():
-    """Clear CUDA cache to prevent memory issues"""
+    """Clear CUDA cache to prevent memory issues."""
     try:
         import torch
         if torch.cuda.is_available():
             torch.cuda.empty_cache()
-    except:
-        pass
+    except ImportError:
+        # Torch not installed – nothing to clear.
+        return
+    except Exception as err:
+        # Log once, but don't kill the caller.
+        print(f"[warn] Failed to clear CUDA cache: {err}")
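A runnable sketch of the narrowed-exception pattern suggested above (the boolean return value is an illustrative addition, not part of the script):

```python
def clear_cuda_cache():
    """Best-effort CUDA cache clear; returns True only when a clear happened."""
    try:
        import torch  # may be absent in lightweight environments
    except ImportError:
        return False  # nothing to clear
    try:
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
            return True
    except Exception as err:  # report, but don't crash the caller
        print(f"[warn] Failed to clear CUDA cache: {err}")
    return False
```

Keeping the ImportError branch separate makes the "torch missing" case explicit instead of hiding it behind a bare except.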

196-209: Respect CLI args for image format, frames, and device

Hard-coded values ignore user input; pass through args to the loader.

Apply:

                     prompts=[cur_prompt],
                     media=args.media,
-                    image_data_format="pt",
-                    num_frames=8,
-                    device="cpu")
+                    image_data_format=args.image_format,
+                    num_frames=args.num_frames,
+                    device=args.device)

239-246: Always clear CUDA cache via finally

Errors skip clear_cuda_cache(); use finally to guarantee cleanup.

Apply:

-            except Exception as e:
-                print(f"Error in turn {i+1}: {e}")
-                import traceback
-                traceback.print_exc()
-                continue
-
-            clear_cuda_cache()
+            except Exception as e:
+                print(f"Error in turn {i+1}: {e}")
+                import traceback
+                traceback.print_exc()
+            finally:
+                clear_cuda_cache()
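The try/finally shape can be demonstrated in isolation; a minimal sketch with stand-in turn callables and a generic cleanup hook (names are illustrative):

```python
def run_turns(turns, cleanup):
    """Run each turn; report errors but always run cleanup afterwards."""
    results = []
    for i, turn in enumerate(turns):
        try:
            results.append(turn())
        except Exception as e:
            print(f"Error in turn {i + 1}: {e}")
        finally:
            cleanup()  # runs even when the turn raised
    return results

calls = []
results = run_turns([lambda: "ok", lambda: 1 / 0], lambda: calls.append("cleared"))
# results == ["ok"]; cleanup ran for both turns: calls == ["cleared", "cleared"]
```

With the original continue-based flow, the failing second turn would have skipped cleanup entirely.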

236-238: Multi-turn: conversation context is not persisted to next turn

The updated prompt isn’t saved; next iteration reuses the original prompt. Persist it for true multi-turn behavior.

Apply:

-                # Add to cur_prompt
-                cur_prompt = cur_prompt + response
+                # Persist conversation context for next turn
+                args.prompt[i] = f"{cur_prompt} assistant: {response}"
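The persistence idea can be sketched end to end; generate is a stand-in for the model call and the "assistant:"/"user:" framing is illustrative, not the script's actual prompt format:

```python
def run_conversation(generate, prompt, followups):
    """Accumulate assistant responses into the prompt turn by turn."""
    history = prompt
    for question in followups:
        response = generate(history)
        # The growing history is what makes the next turn context-aware.
        history = f"{history} assistant: {response} user: {question}"
    return history

fake_llm = lambda text: "noted"
context = run_conversation(fake_llm, "user: hello", ["what next?"])
# context == "user: hello assistant: noted user: what next?"
```

Without the reassignment, every turn would see only the original prompt, which is exactly the bug flagged above.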
tests/integration/defs/test_e2e.py (2)

2573-2584: Ensure the test exercises all prompts by adding --conversation_turns

Avoid coupling to script defaults; make intent explicit.

Apply:

         "--modality",
         "image",
         "--multiturn",
+        "--conversation_turns",
+        str(len(accuracy_inputs["image"]["prompt"])),
         "--prompt",

2601-2620: Harden checks, reduce log noise, and fix E501 with structured assertions

  • Parse once and assert counts.
  • Case-insensitive matching.
  • Remove verbose prints.
  • Break long assert to satisfy E501.

Apply:

-    output = llm_venv.run_cmd(cmd, caller=check_output)
-    print("output:", output)
+    output = llm_venv.run_cmd(cmd, caller=check_output)
+    outputs = parse_output(output)
+    expected = expected_keywords[model_name]["image"]
+    assert len(outputs) == len(expected), (
+        f"Expected {len(expected)} outputs, got {len(outputs)}. "
+        f"Raw output (truncated): {output[:1000]}..."
+    )
@@
-    for prompt_output, prompt_keywords in zip(
-            parse_output(output), expected_keywords[model_name]["image"]):
-        matches = [
-            keyword in prompt_output.lower() for keyword in prompt_keywords
-        ]
-        obs_match_ratio = 1. * sum(matches) / len(matches)
-        print("prompt_output:", prompt_output)
-        print("prompt_keywords:", prompt_keywords)
-        print("matches:", matches)
-        print("obs_match_ratio:", obs_match_ratio)
-        assert obs_match_ratio >= match_ratio, f"Incorrect output!\nGenerated \"{prompt_output}\"\nExpected keywords \"{prompt_keywords}\"\n Matched keywords: {matches}\n Observed match ratio {obs_match_ratio} below threshold {match_ratio}"
+    for prompt_output, prompt_keywords in zip(outputs, expected):
+        matches = [keyword.lower() in prompt_output.lower()
+                   for keyword in prompt_keywords]
+        obs_match_ratio = 1.0 * sum(matches) / len(matches)
+        assert obs_match_ratio >= match_ratio, (
+            f"Incorrect output!\nGenerated \"{prompt_output}\"\n"
+            f"Expected keywords \"{prompt_keywords}\"\n"
+            f"Matched keywords: {matches}\n"
+            f"Observed match ratio {obs_match_ratio} below threshold {match_ratio}"
+        )
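The case-insensitive match-ratio check boils down to a small helper; a standalone sketch (function name is illustrative):

```python
def keyword_match_ratio(output, keywords):
    """Fraction of expected keywords found in the output, case-insensitively."""
    matches = [keyword.lower() in output.lower() for keyword in keywords]
    return sum(matches) / len(matches)

ratio = keyword_match_ratio("A cat sat on the Mat", ["cat", "mat", "dog"])
# 2 of 3 keywords matched -> ratio == 2/3
```

Lower-casing both sides is what removes the flakiness from models that vary capitalization between runs.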
🧹 Nitpick comments (2)
examples/llm-api/quickstart_multimodal.py (2)

111-120: Make conversation_turns optional; default to number of provided prompts

Defaulting to 2 is surprising. Let users omit it and derive from len(prompts).

Apply:

-    parser.add_argument(
-        "--conversation_turns",
-        type=int,
-        default=2,
-        help="Number of conversation turns for automated testing.")
+    parser.add_argument(
+        "--conversation_turns",
+        type=int,
+        default=None,
+        help="Number of conversation turns (default: use number of provided prompts).")

190-191: Derive max_turns when conversation_turns is omitted

Guard for None to run all provided prompts by default.

Apply:

-        max_turns = min(args.conversation_turns, len(args.prompt))
+        max_turns = (len(args.prompt) if args.conversation_turns is None
+                     else min(args.conversation_turns, len(args.prompt)))
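The None-guarded derivation is easy to verify in isolation; a minimal sketch (resolve_turns is an illustrative name, not the script's function):

```python
def resolve_turns(conversation_turns, prompts):
    """Run every prompt when the flag is omitted; otherwise cap at the flag."""
    if conversation_turns is None:
        return len(prompts)
    return min(conversation_turns, len(prompts))

# Omitted flag runs all prompts; an explicit flag caps the turn count.
assert resolve_turns(None, ["a", "b", "c"]) == 3
assert resolve_turns(2, ["a", "b", "c"]) == 2
assert resolve_turns(5, ["a"]) == 1
```

The min() keeps an oversized flag from indexing past the provided prompts.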
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 302df88 and 22670bf.

📒 Files selected for processing (4)
  • examples/llm-api/quickstart_multimodal.py (3 hunks)
  • tests/integration/defs/accuracy/test_llm_api_pytorch.py (6 hunks)
  • tests/integration/defs/test_e2e.py (1 hunks)
  • tests/integration/test_lists/qa/llm_function_full.txt (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/integration/test_lists/qa/llm_function_full.txt
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

**/*.py: Python code should conform to Python 3.8+.
Indent Python code with 4 spaces. Do not use tabs.
Always maintain the namespace when importing in Python, even if only one class or function from a module is used.
Python filenames should use snake_case (e.g., some_file.py).
Python classes should use PascalCase (e.g., class SomeClass).
Python functions and methods should use snake_case (e.g., def my_awesome_function():).
Python local variables should use snake_case. Prefix k for variable names that start with a number (e.g., k_99th_percentile).
Python global variables should use upper snake_case and prefix G (e.g., G_MY_GLOBAL).
Python constants should use upper snake_case (e.g., MY_CONSTANT).
Avoid shadowing variables declared in an outer scope in Python.
Initialize all externally visible members of a Python class in the constructor.
For interfaces that may be used outside a Python file, prefer docstrings over comments.
Comments in Python should be reserved for code within a function, or interfaces that are local to a file.
Use Google style docstrings for Python classes and functions, which can be parsed by Sphinx.
Attributes and variables in Python can be documented inline; attribute docstrings will be rendered under the class docstring.
Avoid using reflection in Python when functionality can be easily achieved without it.
When using try-except blocks in Python, limit the except to the smallest set of errors possible.
When using try-except blocks to handle multiple possible variable types in Python, keep the body of the try as small as possible, using the else block to implement the logic.

Files:

  • tests/integration/defs/accuracy/test_llm_api_pytorch.py
  • examples/llm-api/quickstart_multimodal.py
  • tests/integration/defs/test_e2e.py
**/*.{cpp,h,hpp,cc,cxx,cu,py}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.

Files:

  • tests/integration/defs/accuracy/test_llm_api_pytorch.py
  • examples/llm-api/quickstart_multimodal.py
  • tests/integration/defs/test_e2e.py
🧠 Learnings (1)
📚 Learning: 2025-07-28T17:06:08.621Z
Learnt from: moraxu (PR NVIDIA/TensorRT-LLM#6303, file tests/integration/test_lists/qa/examples_test_list.txt:494-494, 2025-07-28T17:06:08.621Z)
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.

Applied to files:

  • tests/integration/defs/accuracy/test_llm_api_pytorch.py
  • tests/integration/defs/test_e2e.py
🪛 Ruff (0.12.2)
examples/llm-api/quickstart_multimodal.py

151-151: First line should end with a period, question mark, or exclamation point

Add closing punctuation

(D415)


156-156: Do not use bare except

(E722)

tests/integration/defs/test_e2e.py

2610-2610: Line too long (240 > 120)

(E501)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (6)
tests/integration/defs/accuracy/test_llm_api_pytorch.py (6)

506-507: Migration to CudaGraphConfig looks correct

Using cuda_graph_config=CudaGraphConfig() if cuda_graph else None is the right replacement for the deprecated boolean flag. No issues spotted.


530-531: LGTM: consistent CudaGraphConfig usage in chunked prefill test

Same pattern applied consistently. Good.


651-652: LGTM: FP8 chunked prefill with CudaGraphConfig

Change aligns with new API; parameters remain unchanged.


675-676: LGTM: FP4 chunked prefill with CudaGraphConfig

Correct and consistent with other tests.


1036-1037: LGTM: Conditional CUDA graph config in CUTEDSL test

Conditional construction via CudaGraphConfig() is correct.


1194-1195: LGTM: Conditional CUDA graph config in 4-GPU CUTEDSL test

Matches the new API across the suite.

@crazydemo
Copy link
CollaboratorAuthor

/bot run

@tensorrt-cicd
Copy link
Collaborator

PR_Github #14714 [ run ] triggered by Bot

@tensorrt-cicd
Copy link
Collaborator

PR_Github #14714 [ run ] completed with state SUCCESS
/LLM/release-1.0/L0_MergeRequest_PR pipeline #48 completed with status: 'SUCCESS'

@crazydemo
Copy link
CollaboratorAuthor

/bot reuse-pipeline

@tensorrt-cicd
Copy link
Collaborator

PR_Github #14748 [ reuse-pipeline ] triggered by Bot

Copy link
Contributor

coderabbitai[bot] left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (6)
examples/llm-api/quickstart_multimodal.py (4)

150-158: Fix bare except and finish the docstring sentence.

Ruff flags D415 and E722. Limit exceptions and add a period.

-def clear_cuda_cache():
-    """Clear CUDA cache to prevent memory issues"""
-    try:
-        import torch
-        if torch.cuda.is_available():
-            torch.cuda.empty_cache()
-    except:
-        pass
+def clear_cuda_cache():
+    """Clear CUDA cache to prevent memory issues."""
+    try:
+        import torch
+    except ImportError:
+        # Torch not installed – nothing to clear.
+        return
+    try:
+        if torch.cuda.is_available():
+            torch.cuda.empty_cache()
+    except Exception as err:
+        # Log once, but don't kill the caller.
+        print(f"[warn] Failed to clear CUDA cache: {err}")

196-209: Respect user CLI args instead of hard-coded defaults.

Use args.image_format, args.num_frames, and args.device to match single-turn behavior and user expectations.

-                inputs = default_multimodal_input_loader(
+                inputs = default_multimodal_input_loader(
                     tokenizer=llm.tokenizer,
                     model_dir=llm._hf_model_dir,
                     model_type=model_type,
                     modality=args.modality,
                     prompts=[cur_prompt],
                     media=args.media,
-                    image_data_format="pt",
-                    num_frames=8,
-                    device="cpu")
+                    image_data_format=image_format,
+                    num_frames=args.num_frames,
+                    device=args.device)

239-246: Always clear CUDA cache using finally.

On exceptions, the continue skips clear_cuda_cache(). Use a finally block.

-            except Exception as e:
-                print(f"Error in turn {i+1}: {e}")
-                import traceback
-                traceback.print_exc()
-                continue
-
-            clear_cuda_cache()
+            except Exception as e:
+                print(f"Error in turn {i+1}: {e}")
+                import traceback
+                traceback.print_exc()
+            finally:
+                clear_cuda_cache()

236-238: Multi-turn context is not persisted — conversation state never grows.

You build cur_prompt but don't save it, so the next turn reloads the original prompt. Persist it.

-                # Add to cur_prompt
-                cur_prompt = cur_prompt + response
+                # Persist conversation context for next turn
+                args.prompt[i] = f"{cur_prompt} assistant: {response}"

Alternative: maintain a history list per sample and concatenate before calling the loader for each turn.

tests/integration/defs/test_e2e.py (2)

2573-2584: Make the number of turns explicit to avoid silent truncation.

--conversation_turns defaults to 2; if more prompts are passed later, only two turns would run. Set it to the number of provided prompts for clarity and robustness.

         "--multiturn",
+        "--conversation_turns",
+        str(len(accuracy_inputs["image"]["prompt"])),
         "--prompt",

2609-2620: Harden matching (case-insensitive), remove noisy prints, and fix E501 long line.

This reduces flakiness, cleans logs, and complies with line length rules.

-    for prompt_output, prompt_keywords in zip(
-            parse_output(output), expected_keywords[model_name]["image"]):
-        matches = [
-            keyword in prompt_output.lower() for keyword in prompt_keywords
-        ]
-        obs_match_ratio = 1. * sum(matches) / len(matches)
-        print("prompt_output:", prompt_output)
-        print("prompt_keywords:", prompt_keywords)
-        print("matches:", matches)
-        print("obs_match_ratio:", obs_match_ratio)
-        assert obs_match_ratio >= match_ratio, f"Incorrect output!\nGenerated \"{prompt_output}\"\nExpected keywords \"{prompt_keywords}\"\n Matched keywords: {matches}\n Observed match ratio {obs_match_ratio} below threshold {match_ratio}"
+    for prompt_output, prompt_keywords in zip(outputs, expected):
+        matches = [keyword.lower() in prompt_output.lower()
+                   for keyword in prompt_keywords]
+        obs_match_ratio = 1.0 * sum(matches) / len(matches)
+        assert obs_match_ratio >= match_ratio, (
+            f"Incorrect output!\nGenerated \"{prompt_output}\"\n"
+            f"Expected keywords \"{prompt_keywords}\"\n"
+            f"Matched keywords: {matches}\n"
+            f"Observed match ratio {obs_match_ratio} below threshold {match_ratio}"
+        )
🧹 Nitpick comments (1)
tests/integration/defs/test_e2e.py (1)

2602-2602: Reduce CI log noise — drop raw output print and pre-parse instead.

Printing the entire raw output is noisy. Parse once and assert; include truncated raw text only in failure messages if needed.

-    output = llm_venv.run_cmd(cmd, caller=check_output)
-    print("output:", output)
+    output = llm_venv.run_cmd(cmd, caller=check_output)
+    outputs = parse_output(output)
+    expected = expected_keywords[model_name]["image"]
+    assert len(outputs) == len(expected), (
+        f"Expected {len(expected)} outputs, got {len(outputs)}. "
+        f"Raw output (truncated): {output[:1000]}..."
+    )
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 22670bf and eaf5a2d.

📒 Files selected for processing (5)
  • examples/llm-api/quickstart_multimodal.py (3 hunks)
  • tests/integration/defs/accuracy/test_disaggregated_serving.py (1 hunks)
  • tests/integration/defs/accuracy/test_llm_api_pytorch.py (6 hunks)
  • tests/integration/defs/test_e2e.py (1 hunks)
  • tests/integration/test_lists/qa/llm_function_full.txt (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • tests/integration/test_lists/qa/llm_function_full.txt
  • tests/integration/defs/accuracy/test_llm_api_pytorch.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

**/*.py: Python code should conform to Python 3.8+.
Indent Python code with 4 spaces. Do not use tabs.
Always maintain the namespace when importing in Python, even if only one class or function from a module is used.
Python filenames should use snake_case (e.g., some_file.py).
Python classes should use PascalCase (e.g., class SomeClass).
Python functions and methods should use snake_case (e.g., def my_awesome_function():).
Python local variables should use snake_case. Prefix k for variable names that start with a number (e.g., k_99th_percentile).
Python global variables should use upper snake_case and prefix G (e.g., G_MY_GLOBAL).
Python constants should use upper snake_case (e.g., MY_CONSTANT).
Avoid shadowing variables declared in an outer scope in Python.
Initialize all externally visible members of a Python class in the constructor.
For interfaces that may be used outside a Python file, prefer docstrings over comments.
Comments in Python should be reserved for code within a function, or interfaces that are local to a file.
Use Google style docstrings for Python classes and functions, which can be parsed by Sphinx.
Attributes and variables in Python can be documented inline; attribute docstrings will be rendered under the class docstring.
Avoid using reflection in Python when functionality can be easily achieved without it.
When using try-except blocks in Python, limit the except to the smallest set of errors possible.
When using try-except blocks to handle multiple possible variable types in Python, keep the body of the try as small as possible, using the else block to implement the logic.

Files:

  • tests/integration/defs/accuracy/test_disaggregated_serving.py
  • examples/llm-api/quickstart_multimodal.py
  • tests/integration/defs/test_e2e.py
**/*.{cpp,h,hpp,cc,cxx,cu,py}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.

Files:

  • tests/integration/defs/accuracy/test_disaggregated_serving.py
  • examples/llm-api/quickstart_multimodal.py
  • tests/integration/defs/test_e2e.py
🧠 Learnings (1)
📚 Learning: 2025-07-28T17:06:08.621Z
Learnt from: moraxu (PR NVIDIA/TensorRT-LLM#6303, file tests/integration/test_lists/qa/examples_test_list.txt:494-494, 2025-07-28T17:06:08.621Z)
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.

Applied to files:

  • tests/integration/defs/test_e2e.py
🧬 Code Graph Analysis (2)
examples/llm-api/quickstart_multimodal.py (3)
tensorrt_llm/llmapi/llm.py (4)
  • prompt (79-80)
  • tokenizer (691-695)
  • tokenizer (698-699)
  • generate (235-313)
tensorrt_llm/inputs/utils.py (1)
  • default_multimodal_input_loader (475-635)
tensorrt_llm/_torch/models/modeling_phi4mm.py (1)
  • lora_request (267-288)
tests/integration/defs/test_e2e.py (4)
tests/integration/defs/conftest.py (2)
  • llm_models_root (77-83)
  • llm_root (180-181)
tests/integration/defs/triton_server/conftest.py (1)
  • llm_models_root (16-25)
tests/integration/defs/common.py (1)
  • parse_output (665-681)
tests/integration/defs/examples/test_llama.py (1)
  • parse_output (129-139)
🪛 Ruff (0.12.2)
examples/llm-api/quickstart_multimodal.py

151-151: First line should end with a period, question mark, or exclamation point

Add closing punctuation

(D415)


156-156: Do not use bare except

(E722)

tests/integration/defs/test_e2e.py

2610-2610: Line too long (240 > 120)

(E501)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (2)
tests/integration/defs/accuracy/test_disaggregated_serving.py (1)

624-624: Appropriate Hopper-gating for FP8 model — add an explicit reason if supported.

Adding @skip_pre_hopper to test_auto_dtype for Qwen3-8B-FP8 is correct (FP8 requires Hopper+). If skip_pre_hopper supports a reason message, consider adding one for clarity in CI logs.

examples/llm-api/quickstart_multimodal.py (1)

111-119: New CLI flags look good.

--multiturn and --conversation_turns are sensible and scoped. No issues.

@tensorrt-cicd
Copy link
Collaborator

PR_Github #14748 [ reuse-pipeline ] completed with state SUCCESS
Reusing PR_Github #14714 for commit eaf5a2d

@tensorrt-cicd
Copy link
Collaborator

PR_Github #15056 [ reuse-pipeline ] completed with state SUCCESS
Reusing PR_Github #14714 for commit 9a298af

StanleySun639 enabled auto-merge (squash) on August 13, 2025 04:40
LarryXFly merged commit 15bcf80 into NVIDIA:release/1.0 on Aug 13, 2025
5 checks passed
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request on Aug 22, 2025
…6749) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
joyang-nv pushed a commit that referenced this pull request on Sep 1, 2025
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>

Reviewers

coderabbitai[bot] left review comments

tijyojwad approved these changes

StanleySun639 approved these changes

LarryXFly approved these changes

chzblych: review requested (automatically assigned from NVIDIA/trt-llm-doc-owners)

QiJune: review requested (automatically assigned from NVIDIA/trt-llm-doc-owners)

Wanli-Jiang: review requested

xinhe-nv: review requested

