[TRTLLM-6975][test] Add multi-turn test cases for VLM models #6749


Merged
LarryXFly merged 12 commits into NVIDIA:release/1.0 from crazydemo:add_multi-turn_test
Aug 13, 2025

Conversation

@crazydemo (Collaborator) commented Aug 8, 2025 (edited by coderabbitai bot)

Summary by CodeRabbit

  • New Features

    • Multi-turn multimodal conversation mode with CLI options to enable and configure turn count and media-driven turns.
  • Tests

    • Added multimodal multi-turn integration tests and test-list entries for multiple models; extended model coverage and adjusted the quickstart entry point. A duplicate test definition was accidentally introduced. Updated accuracy reference expectations for select models. Added resource- and skip-based gating for several tests.
  • Chores

    • Migrated CUDA-graph handling in tests to a config-based approach.

Description

Test Coverage

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test [pipeline-id] (OPTIONAL): Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline, or from the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will always be ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL): Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensures that all builds and tests are run regardless of previous successes.

--disable-fail-fast (OPTIONAL): Disable fail-fast on build/test/infra failures.

--skip-test (OPTIONAL): Skip all test stages, but still run build stages, package stages, and sanity-check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL): Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL): Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL): Skip test stages that don't match the specified backends. Only supports [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with the tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL): Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL): Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL): Force-run the multi-GPU tests in addition to running the L0 pre-merge pipeline.

--post-merge (OPTIONAL): Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL): Run the ordinary L0 pre-merge pipeline plus the specified test stages. Example: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".

--detailed-log (OPTIONAL): Enable flushing all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL): Experimental feature. Enable access to the CI container for debugging purposes. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md
and the scripts/test_to_stage_mapping.py helper.

kill

kill

Kill all running builds associated with pull request.

skip

skip --comment COMMENT

Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

@coderabbitai bot (Contributor) commented Aug 8, 2025 (edited)

📝 Walkthrough

Walkthrough

Adds multiturn multimodal conversation handling to examples/llm-api/quickstart_multimodal.py, introduces multimodal multiturn integration tests and test-list entries, migrates CUDA-graph booleans to a cuda_graph_config object across accuracy tests, tightens resource gating in disaggregated serving tests, and adds FP8 accuracy variants to reference YAMLs.

Changes

Cohort / File(s) — Change summary

Quickstart multimodal example
examples/llm-api/quickstart_multimodal.py
Added --multiturn and --conversation_turns CLI flags and implemented a per-turn multimodal loop that loads inputs via default_multimodal_input_loader (num_frames=8, image_data_format="pt", device="cpu"), optionally builds a LoRA request, calls llm.generate, captures per-turn metadata, and maintains a running conversation_history. The single-turn path is unchanged when multiturn is disabled.

End-to-end integration tests
tests/integration/defs/test_e2e.py
Switched some example_root values to examples/llm-api; added test_ptp_quickstart_multimodal_multiturn exercising multimodal multiturn flows with per-model overrides and image inputs (models: gemma-3-27b-it, mistral-small-3.1-24b-instruct, Phi-4-multimodal-instruct). Note: the new test function was duplicated in the file.

Integration test lists
tests/integration/test_lists/qa/llm_function_full.txt
Added three new multiturn multimodal test entries referencing test_ptp_quickstart_multimodal_multiturn for the gemma, mistral, and Phi-4 multimodal models.

Accuracy tests — CUDA-graph migration
tests/integration/defs/accuracy/test_llm_api_pytorch.py
Replaced boolean use_cuda_graph with cuda_graph_config (either CudaGraphConfig() or None) across tests and LLM initializations; added @pytest.mark.skip_less_device_memory(80000) to one test.

Disaggregated serving tests
tests/integration/defs/accuracy/test_disaggregated_serving.py
Replaced the failing resource check (pytest.fail) with pytest.skip; added @pytest.mark.skip_less_device(4) and @skip_pre_hopper gating on several classes/tests to gate runs by device/architecture.

Accuracy references
tests/integration/defs/accuracy/references/gsm8k.yaml, .../mmlu.yaml
Added FP8 / kv_cache FP8 entries (and an FP8 + spec_dec_algo: Eagle variant) under meta-llama/Llama-4-Maverick-17B-128E-Instruct with the specified accuracy values.

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant Quickstart as quickstart_multimodal.py
    participant Loader as default_multimodal_input_loader
    participant LLM as LLM
    participant LoRA as LoRA (optional)

    User->>Quickstart: run with --multiturn + prompts + media
    loop for each turn
        Quickstart->>Loader: load inputs (conversation_history, media)
        alt LoRA enabled
            Quickstart->>LoRA: build LoRA request
        end
        Quickstart->>LLM: llm.generate(inputs, sampling_params, lora_request)
        LLM-->>Quickstart: assistant response
        Quickstart->>Quickstart: append response (and next user prompt if present) to conversation_history
    end
    Quickstart-->>User: print per-turn results and exit
```
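The flow in the diagram can be sketched in plain Python. The load_inputs and generate stubs below stand in for default_multimodal_input_loader and llm.generate; they are illustrative, not the quickstart's actual implementation:

```python
# Minimal sketch of the multi-turn loop described above. The stubs below
# stand in for the real loader and LLM; only the history-threading logic
# mirrors the change under review.

def load_inputs(prompt, media):
    # Real code would tokenize the prompt and attach image/video tensors.
    return {"prompt": prompt, "media": media}


def generate(inputs):
    # Stub assistant: returns a canned reply derived from the prompt tail.
    return f"(reply to: {inputs['prompt'][-30:]})"


def run_multiturn(prompts, media, turns):
    conversation_history = ""
    results = []
    for i in range(min(turns, len(prompts))):
        user = prompts[i]
        # Prepend the running history so the model sees prior turns.
        cur_prompt = (f"{conversation_history} user: {user}"
                      if conversation_history else user)
        inputs = load_inputs(cur_prompt, media)
        response = generate(inputs)
        results.append({"turn": i + 1,
                        "user_input": user,
                        "assistant_response": response})
        # Persist context for the next turn.
        conversation_history = f"{cur_prompt} assistant: {response}"
    return results


outs = run_multiturn(["Describe the image.", "What color is it?"],
                     ["cat.png"], turns=2)
for o in outs:
    print(o["turn"], o["assistant_response"])
```

The key point is that conversation_history is carried across iterations rather than rebuilt from a fresh prompt each turn, which is exactly what one of the review comments below flags.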

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~35 minutes

Possibly related PRs

Suggested labels

Speculative Decoding, Disaggregated Serving, CI

Suggested reviewers

  • brb-nv
  • Shixiaowei02
  • litaotju
  • QiJune
  • amukkara

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 4

🧹 Nitpick comments (2)
tests/integration/test_lists/qa/llm_function_full.txt (1)

603-605: Optional: expand coverage and watch runtime

  • Consider adding one multiturn case for a Qwen2-VL or NVILA model later for broader backend coverage, if the runtime budget allows.
  • If these prove slow or flaky in CI, annotate them with TIMEOUT like other heavy tests in this file, or place them in a nightly-focused list.

Happy to draft the additional test list entries or help profile runtime to set TIMEOUTs.

tests/integration/defs/test_e2e.py (1)

2600-2601: Avoid dumping full generated output to CI logs

Printing the entire model output can flood CI logs and isn’t needed given the detailed assertion. Remove or gate behind a debug flag.

```diff
-    print("output:", output)
```
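One way to implement the suggested gating is an environment-variable switch. The TLLM_TEST_DEBUG name below is a hypothetical choice for illustration, not an existing flag in the repository:

```python
import os

# Gate noisy debug prints behind an environment variable so CI logs stay
# clean by default. TLLM_TEST_DEBUG is an illustrative name, not a real flag.
DEBUG = os.environ.get("TLLM_TEST_DEBUG", "0") == "1"


def debug_print(*args):
    """Print only when debugging is explicitly enabled."""
    if DEBUG:
        print(*args)


debug_print("output:", "very long generated text ...")  # silent unless enabled
```

Run with TLLM_TEST_DEBUG=1 to re-enable the output locally without touching the test code.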
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 09038be and b2c460e.

📒 Files selected for processing (3)
  • examples/llm-api/quickstart_multimodal.py (3 hunks)
  • tests/integration/defs/test_e2e.py (1 hunks)
  • tests/integration/test_lists/qa/llm_function_full.txt (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

**/*.py: Python code should conform to Python 3.8+.
Indent Python code with 4 spaces. Do not use tabs.
Always maintain the namespace when importing in Python, even if only one class or function from a module is used.
Python filenames should use snake_case (e.g., some_file.py).
Python classes should use PascalCase (e.g., class SomeClass).
Python functions and methods should use snake_case (e.g., def my_awesome_function():).
Python local variables should use snake_case. Prefix k for variable names that start with a number (e.g., k_99th_percentile).
Python global variables should use upper snake_case and prefix G (e.g., G_MY_GLOBAL).
Python constants should use upper snake_case (e.g., MY_CONSTANT).
Avoid shadowing variables declared in an outer scope in Python.
Initialize all externally visible members of a Python class in the constructor.
For interfaces that may be used outside a Python file, prefer docstrings over comments.
Comments in Python should be reserved for code within a function, or interfaces that are local to a file.
Use Google style docstrings for Python classes and functions, which can be parsed by Sphinx.
Attributes and variables in Python can be documented inline; attribute docstrings will be rendered under the class docstring.
Avoid using reflection in Python when functionality can be easily achieved without it.
When using try-except blocks in Python, limit the except to the smallest set of errors possible.
When using try-except blocks to handle multiple possible variable types in Python, keep the body of the try as small as possible, using the else block to implement the logic.

Files:

  • tests/integration/defs/test_e2e.py
  • examples/llm-api/quickstart_multimodal.py
**/*.{cpp,h,hpp,cc,cxx,cu,py}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.

Files:

  • tests/integration/defs/test_e2e.py
  • examples/llm-api/quickstart_multimodal.py
🧠 Learnings (3)
📓 Common learnings
Learnt from: moraxu (PR NVIDIA/TensorRT-LLM#6303, tests/integration/test_lists/qa/examples_test_list.txt:494, 2025-07-28T17:06:08.621Z): In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
Learnt from: galagam (PR NVIDIA/TensorRT-LLM#6487, tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py:1-12, 2025-08-06T13:58:07.506Z): In TensorRT-LLM, test files (files under tests/ directories) do not require NVIDIA copyright headers, unlike production source code files. Test files typically start directly with imports, docstrings, or code.
📚 Learning: 2025-07-28T17:06:08.621Z (moraxu, same learning as above)

Applied to files:

  • tests/integration/test_lists/qa/llm_function_full.txt
  • tests/integration/defs/test_e2e.py
📚 Learning: 2025-08-06T13:58:07.506Z (galagam, same learning as above)

Applied to files:

  • tests/integration/test_lists/qa/llm_function_full.txt
  • tests/integration/defs/test_e2e.py
🪛 Ruff (0.12.2)
tests/integration/defs/test_e2e.py

2608-2608: Line too long (240 > 120)

(E501)

examples/llm-api/quickstart_multimodal.py

151-151: First line should end with a period, question mark, or exclamation point

Add closing punctuation

(D415)


156-156: Do not use a bare except

(E722)

🔇 Additional comments (5)
tests/integration/test_lists/qa/llm_function_full.txt (2)

603-605: LGTM: Multiturn VLM cases added in the right section with consistent IDs

Names and placement align with existing quickstart multimodal entries (e.g., Lines 600-602). Model IDs/paths look consistent with prior single-turn/2GPU variants.


603-605: Multiturn tests and CLI logic verified
I’ve confirmed that:

  • test_ptp_quickstart_multimodal_multiturn is defined in tests/integration/defs/test_e2e.py and parametrizes exactly the three new models (gemma-3-27b-it, mistral-small-3.1-24b-instruct, Phi-4-multimodal-instruct).
  • The CLI in examples/llm-api/quickstart_multimodal.py supports --multiturn and --conversation_turns and includes a torch.cuda.empty_cache() step.

No changes needed.

tests/integration/defs/test_e2e.py (2)

2571-2582: Consider explicitly passing conversation_turns to match the number of prompts

AI summary notes quickstart_multimodal.py adds a --conversation_turns option. If the script doesn’t infer turns from provided prompts reliably, explicitly set it to len(prompts) to avoid ambiguity.

Would the CLI require or benefit from:

  • f"--conversation_turns={len(accuracy_inputs['image']['prompt'])}"

2608-2610: Fix line length violation flagged by Ruff (E501)

Static analysis reports a too-long line here. The refactor above splits the zip args onto separate lines, resolving E501.

Likely an incorrect or invalid review comment.

examples/llm-api/quickstart_multimodal.py (1)

111-120: CLI flag looks good – consider clarifying the help text.

--conversation_turns controls the maximum number of turns (it is clipped by prompt length).
Maybe say “Maximum number of …” to avoid confusion, but otherwise OK.

@crazydemo marked this pull request as ready for review August 8, 2025 09:46
@crazydemo requested a review from a team as a code owner August 8, 2025 09:46
@crazydemo (Collaborator, Author)

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #14600 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator)

PR_Github #14600 [ run ] completed with state SUCCESS
/LLM/release-1.0/L0_MergeRequest_PR pipeline #30 completed with status: 'FAILURE'

@crazydemo force-pushed the add_multi-turn_test branch 3 times, most recently from 7cc9490 to 302df88, August 8, 2025 15:22
@crazydemo (Collaborator, Author)

/bot run

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 2

♻️ Duplicate comments (4)
examples/llm-api/quickstart_multimodal.py (3)

150-158: Narrow the except and punctuate the docstring (Ruff D415, E722).

Apply this diff:

```diff
-def clear_cuda_cache():
-    """Clear CUDA cache to prevent memory issues"""
-    try:
-        import torch
-        if torch.cuda.is_available():
-            torch.cuda.empty_cache()
-    except:
-        pass
+def clear_cuda_cache():
+    """Clear CUDA cache to prevent memory issues."""
+    try:
+        import torch
+    except ImportError:
+        # Torch not installed – nothing to clear.
+        return
+    try:
+        if torch.cuda.is_available():
+            torch.cuda.empty_cache()
+    except Exception as err:
+        # Log once, but don't kill the caller.
+        print(f"[warn] Failed to clear CUDA cache: {err}")
```

196-209: Respect CLI parameters instead of hard-coded loader defaults.

Apply this diff:

```diff
-                inputs = default_multimodal_input_loader(
+                inputs = default_multimodal_input_loader(
                     tokenizer=llm.tokenizer,
                     model_dir=llm._hf_model_dir,
                     model_type=model_type,
                     modality=args.modality,
                     prompts=[cur_prompt],
                     media=args.media,
-                    image_data_format="pt",
-                    num_frames=8,
-                    device="cpu")
+                    image_data_format=args.image_format,
+                    num_frames=args.num_frames,
+                    device=args.device)
```

193-239: Conversation history is not persisted between turns – context never grows.

Each iteration uses a fresh args.prompt[i]; the assistant reply appended to cur_prompt is discarded. Persist a running history so the model receives prior turns.

Apply this diff:

```diff
-        generated_outputs = []  # Store generated outputs for return
+        generated_outputs = []  # Store generated outputs for return
+        conversation_history = ""
@@
-                # Use multimodal input loader to process input with conversation context
-                cur_prompt = args.prompt[i]
+                # Use multimodal input loader to process input with conversation context
+                cur_user = args.prompt[i]
+                cur_prompt = (f"{conversation_history} user: {cur_user}"
+                              if conversation_history else cur_user)
@@
-                generated_outputs.append({
+                generated_outputs.append({
                     "turn": i + 1,
-                    "user_input": cur_prompt,
+                    "user_input": cur_user,
                     "assistant_response": response,
                     "media": args.media
                 })
@@
-                # Add to cur_prompt
-                cur_prompt = cur_prompt + response
+                # Persist conversation context for next turn
+                conversation_history = f"{cur_prompt} assistant: {response}"
```
tests/integration/defs/test_e2e.py (1)

2601-2620: Harden matching, fix Ruff E501, and remove noisy prints in the multi-turn check

  • Add an assertion that parsed outputs count equals number of prompts to catch parser/format drift early.
  • Make keyword matching robust and case-insensitive on both sides.
  • Remove debug prints that spam CI logs.
  • Split long lines (Line 2610, assert message) to satisfy E501.
```diff
-    output = llm_venv.run_cmd(cmd, caller=check_output)
-    print("output:", output)
-    # Set match ratio based on model
-    match_ratio = 4.0 / 5
-    if model_name == "Phi-4-multimodal-instruct":
-        match_ratio = 0.6
-
-    # Check output accuracy
-    for prompt_output, prompt_keywords in zip(
-            parse_output(output), expected_keywords[model_name]["image"]):
-        matches = [
-            keyword in prompt_output.lower() for keyword in prompt_keywords
-        ]
-        obs_match_ratio = 1. * sum(matches) / len(matches)
-        print("prompt_output:", prompt_output)
-        print("prompt_keywords:", prompt_keywords)
-        print("matches:", matches)
-        print("obs_match_ratio:", obs_match_ratio)
-        assert obs_match_ratio >= match_ratio, f"Incorrect output!\nGenerated \"{prompt_output}\"\nExpected keywords \"{prompt_keywords}\"\n Matched keywords: {matches}\n Observed match ratio {obs_match_ratio} below threshold {match_ratio}"
+    output = llm_venv.run_cmd(cmd, caller=check_output)
+    outputs = parse_output(output)
+    expected = expected_keywords[model_name]["image"]
+    assert len(outputs) == len(expected), (
+        f"Expected {len(expected)} outputs, got {len(outputs)}. "
+        f"Raw output (truncated): {output[:1000]}..."
+    )
+    # Set match ratio based on model
+    match_ratio = 4.0 / 5
+    if model_name == "Phi-4-multimodal-instruct":
+        match_ratio = 0.6
+
+    # Check output accuracy
+    for prompt_output, prompt_keywords in zip(outputs, expected):
+        matches = [keyword.lower() in prompt_output.lower()
+                   for keyword in prompt_keywords]
+        obs_match_ratio = 1.0 * sum(matches) / len(matches)
+        assert obs_match_ratio >= match_ratio, (
+            f"Incorrect output!\nGenerated \"{prompt_output}\"\n"
+            f"Expected keywords \"{prompt_keywords}\"\n"
+            f"Matched keywords: {matches}\n"
+            f"Observed match ratio {obs_match_ratio} below threshold {match_ratio}"
+        )
```
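Pulled out as a standalone helper, the suggested case-insensitive match-ratio check behaves like this (the function names here are illustrative, not taken from the test file):

```python
def keyword_match_ratio(prompt_output, prompt_keywords):
    """Fraction of expected keywords found (case-insensitive) in the output."""
    matches = [kw.lower() in prompt_output.lower() for kw in prompt_keywords]
    return sum(matches) / len(matches)


def check_keywords(prompt_output, prompt_keywords, threshold=4.0 / 5):
    """Assert that enough expected keywords appear in the generated text."""
    ratio = keyword_match_ratio(prompt_output, prompt_keywords)
    assert ratio >= threshold, (
        f"Match ratio {ratio} below threshold {threshold} "
        f"for keywords {prompt_keywords}")
    return ratio


# "cat" and "sofa" match, "dog" does not -> 2/3.
print(keyword_match_ratio("A Tabby CAT on a sofa", ["cat", "sofa", "dog"]))
```

Lower-casing both sides is what makes the check robust to capitalization drift in model outputs.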
🧹 Nitpick comments (2)
examples/llm-api/quickstart_multimodal.py (2)

111-119: Validate --conversation_turns to be >= 1.

Prevent a silent no-op when negative by enforcing a positive integer at parse time.

Apply this diff:

```diff
-    parser.add_argument(
-        "--conversation_turns",
-        type=int,
-        default=2,
-        help="Number of conversation turns for automated testing.")
+    parser.add_argument(
+        "--conversation_turns",
+        type=positive_int,
+        default=2,
+        help="Number of conversation turns for automated testing (>= 1).")
```

Add this helper (outside the shown range, e.g., above add_multimodal_args):

```python
def positive_int(value: str) -> int:
    iv = int(value)
    if iv < 1:
        raise argparse.ArgumentTypeError("conversation_turns must be >= 1")
    return iv
```
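A self-contained sketch of how the suggested validator plugs into argparse; the parser here is minimal and illustrative, not the quickstart's actual parser:

```python
import argparse


def positive_int(value: str) -> int:
    """argparse type callback that rejects values below 1 at parse time."""
    iv = int(value)
    if iv < 1:
        raise argparse.ArgumentTypeError("conversation_turns must be >= 1")
    return iv


parser = argparse.ArgumentParser()
parser.add_argument("--conversation_turns", type=positive_int, default=2)

args = parser.parse_args(["--conversation_turns", "3"])
print(args.conversation_turns)  # → 3
```

Raising ArgumentTypeError inside the type callback makes argparse print a proper usage error (instead of the loop silently doing nothing for a negative count).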

212-217: Redundant model_class is None check inside if args.load_lora.

model_class is guaranteed to be set in the earlier if args.load_lora: block; this branch cannot see model_class as None.

Apply this diff:

```diff
-                if args.load_lora:
-                    if model_class is None:
-                        raise ValueError(
-                            "model_class must be provided when load_lora is True"
-                        )
-                    lora_request = model_class.lora_request(
+                if args.load_lora:
+                    lora_request = model_class.lora_request(
                         len(inputs), args.modality, llm._hf_model_dir)
```
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7cc9490 and 302df88.

📒 Files selected for processing (3)
  • examples/llm-api/quickstart_multimodal.py (3 hunks)
  • tests/integration/defs/test_e2e.py (1 hunks)
  • tests/integration/test_lists/qa/llm_function_full.txt (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/integration/test_lists/qa/llm_function_full.txt
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: moraxu (PR NVIDIA/TensorRT-LLM#6303, tests/integration/test_lists/qa/examples_test_list.txt:494, 2025-07-28T17:06:08.621Z): In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
Learnt from: galagam (PR NVIDIA/TensorRT-LLM#6487, tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py:1-12, 2025-08-06T13:58:07.506Z): In TensorRT-LLM, test files (files under tests/ directories) do not require NVIDIA copyright headers, unlike production source code files. Test files typically start directly with imports, docstrings, or code.
📚 Learning: 2025-07-28T17:06:08.621Z (moraxu, same learning as above)

Applied to files:

  • tests/integration/defs/test_e2e.py
📚 Learning: 2025-08-06T13:58:07.506Z (galagam, same learning as above)

Applied to files:

  • tests/integration/defs/test_e2e.py
📚 Learning: 2025-08-01T15:14:45.673Z
Learnt from: yibinl-nvidia (PR NVIDIA/TensorRT-LLM#6506, examples/models/core/mixtral/requirements.txt:3): In TensorRT-LLM, the examples directory can have different dependency versions than the root requirements.txt file. Version conflicts between root and examples dependencies are acceptable because examples are designed to be standalone and self-contained.

Applied to files:

  • tests/integration/defs/test_e2e.py
📚 Learning: 2025-08-06T21:22:55.018Z
Learnt from: CR (CODING_GUIDELINES.md): Applies to **/*.py: When using try-except blocks in Python, limit the except to the smallest set of errors possible.

Applied to files:

  • examples/llm-api/quickstart_multimodal.py
🪛 Ruff (0.12.2)
tests/integration/defs/test_e2e.py

2610-2610: Line too long (240 > 120)

(E501)

examples/llm-api/quickstart_multimodal.py

151-151: First line should end with a period, question mark, or exclamation point

Add closing punctuation

(D415)


156-156: Do not use bare except

(E722)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (2)
examples/llm-api/quickstart_multimodal.py (2)

253-253: No action needed.

Comment-only change for section separation.


199-206: default_multimodal_input_loader already handles 1-prompt -> N-media broadcasting.
The if len(media) > len(prompts) and len(prompts) == 1 branch wraps your list of media into a single-element media list to match the single prompt, and the subsequent assert len(media_or_embeddings) == len(prompts) enforces correct pairing. No changes needed in the example for prompt/media length alignment.
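For reference, the branch described above can be exercised standalone; this is a minimal sketch with a hypothetical broadcast_media helper, not the loader's actual code:

```python
def broadcast_media(prompts, media):
    """Pair prompts with media; wrap N media items when there is one prompt."""
    if len(media) > len(prompts) and len(prompts) == 1:
        media = [media]  # one prompt, many media: treat the list as a single input
    assert len(media) == len(prompts), "prompt/media count mismatch"
    return list(zip(prompts, media))

pairs = broadcast_media(["Describe the scene."], ["a.png", "b.png"])
# One prompt is paired with the full media list.
```

With equal-length inputs the helper falls through to plain one-to-one pairing.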

@tensorrt-cicd
Copy link
Collaborator

PR_Github #14620 [ run ] triggered by Bot

@tensorrt-cicd
Copy link
Collaborator

PR_Github #14620 [ run ] completed with state SUCCESS
/LLM/release-1.0/L0_MergeRequest_PR pipeline #33 completed with status: 'FAILURE'

Copy link
Contributor

coderabbitai[bot] left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (6)
examples/llm-api/quickstart_multimodal.py (4)

150-158: Fix docstring punctuation and avoid bare except in clear_cuda_cache()

Align with Ruff D415/E722; narrow exceptions and log unexpected errors.

Apply:

 def clear_cuda_cache():
-    """Clear CUDA cache to prevent memory issues"""
+    """Clear CUDA cache to prevent memory issues."""
     try:
         import torch
         if torch.cuda.is_available():
             torch.cuda.empty_cache()
-    except:
-        pass
+    except ImportError:
+        # Torch not installed – nothing to clear.
+        return
+    except Exception as err:
+        # Log once, but don't kill the caller.
+        print(f"[warn] Failed to clear CUDA cache: {err}")
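A runnable sketch of the narrowed-exception pattern suggested above (the boolean return value is an illustrative addition, not part of the script):

```python
def clear_cuda_cache():
    """Best-effort CUDA cache clear; returns True only when a clear happened."""
    try:
        import torch  # may be absent in lightweight environments
    except ImportError:
        return False  # nothing to clear
    try:
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
            return True
    except Exception as err:  # report, but don't crash the caller
        print(f"[warn] Failed to clear CUDA cache: {err}")
    return False
```

Keeping the ImportError branch separate makes the "torch missing" case explicit instead of hiding it behind a bare except.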

196-209: Respect CLI args for image format, frames, and device

Hard-coded values ignore user input; pass through args to the loader.

Apply:

                     prompts=[cur_prompt],
                     media=args.media,
-                    image_data_format="pt",
-                    num_frames=8,
-                    device="cpu")
+                    image_data_format=args.image_format,
+                    num_frames=args.num_frames,
+                    device=args.device)

239-246: Always clear CUDA cache via finally

Errors skip clear_cuda_cache(); use finally to guarantee cleanup.

Apply:

-            except Exception as e:
-                print(f"Error in turn {i+1}: {e}")
-                import traceback
-                traceback.print_exc()
-                continue
-
-            clear_cuda_cache()
+            except Exception as e:
+                print(f"Error in turn {i+1}: {e}")
+                import traceback
+                traceback.print_exc()
+            finally:
+                clear_cuda_cache()
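The try/finally shape can be demonstrated in isolation; a minimal sketch with stand-in turn callables and a generic cleanup hook (names are illustrative):

```python
def run_turns(turns, cleanup):
    """Run each turn; report errors but always run cleanup afterwards."""
    results = []
    for i, turn in enumerate(turns):
        try:
            results.append(turn())
        except Exception as e:
            print(f"Error in turn {i + 1}: {e}")
        finally:
            cleanup()  # runs even when the turn raised
    return results

calls = []
results = run_turns([lambda: "ok", lambda: 1 / 0], lambda: calls.append("cleared"))
# results == ["ok"]; cleanup ran for both turns: calls == ["cleared", "cleared"]
```

With the original continue-based flow, the failing second turn would have skipped cleanup entirely.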

236-238: Multi-turn: conversation context is not persisted to next turn

The updated prompt isn’t saved; next iteration reuses the original prompt. Persist it for true multi-turn behavior.

Apply:

-                # Add to cur_prompt
-                cur_prompt = cur_prompt + response
+                # Persist conversation context for next turn
+                args.prompt[i] = f"{cur_prompt} assistant: {response}"
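The persistence idea can be sketched end to end; generate is a stand-in for the model call and the "assistant:"/"user:" framing is illustrative, not the script's actual prompt format:

```python
def run_conversation(generate, prompt, followups):
    """Accumulate assistant responses into the prompt turn by turn."""
    history = prompt
    for question in followups:
        response = generate(history)
        # The growing history is what makes the next turn context-aware.
        history = f"{history} assistant: {response} user: {question}"
    return history

fake_llm = lambda text: "noted"
context = run_conversation(fake_llm, "user: hello", ["what next?"])
# context == "user: hello assistant: noted user: what next?"
```

Without the reassignment, every turn would see only the original prompt, which is exactly the bug flagged above.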
tests/integration/defs/test_e2e.py (2)

2573-2584: Ensure the test exercises all prompts by adding --conversation_turns

Avoid coupling to script defaults; make intent explicit.

Apply:

         "--modality",
         "image",
         "--multiturn",
+        "--conversation_turns",
+        str(len(accuracy_inputs["image"]["prompt"])),
         "--prompt",

2601-2620: Harden checks, reduce log noise, and fix E501 with structured assertions

  • Parse once and assert counts.
  • Case-insensitive matching.
  • Remove verbose prints.
  • Break long assert to satisfy E501.

Apply:

-    output = llm_venv.run_cmd(cmd, caller=check_output)
-    print("output:", output)
+    output = llm_venv.run_cmd(cmd, caller=check_output)
+    outputs = parse_output(output)
+    expected = expected_keywords[model_name]["image"]
+    assert len(outputs) == len(expected), (
+        f"Expected {len(expected)} outputs, got {len(outputs)}. "
+        f"Raw output (truncated): {output[:1000]}..."
+    )
@@
-    for prompt_output, prompt_keywords in zip(
-            parse_output(output), expected_keywords[model_name]["image"]):
-        matches = [
-            keyword in prompt_output.lower() for keyword in prompt_keywords
-        ]
-        obs_match_ratio = 1. * sum(matches) / len(matches)
-        print("prompt_output:", prompt_output)
-        print("prompt_keywords:", prompt_keywords)
-        print("matches:", matches)
-        print("obs_match_ratio:", obs_match_ratio)
-        assert obs_match_ratio >= match_ratio, f"Incorrect output!\nGenerated \"{prompt_output}\"\nExpected keywords \"{prompt_keywords}\"\n Matched keywords: {matches}\n Observed match ratio {obs_match_ratio} below threshold {match_ratio}"
+    for prompt_output, prompt_keywords in zip(outputs, expected):
+        matches = [keyword.lower() in prompt_output.lower()
+                   for keyword in prompt_keywords]
+        obs_match_ratio = 1.0 * sum(matches) / len(matches)
+        assert obs_match_ratio >= match_ratio, (
+            f"Incorrect output!\nGenerated \"{prompt_output}\"\n"
+            f"Expected keywords \"{prompt_keywords}\"\n"
+            f"Matched keywords: {matches}\n"
+            f"Observed match ratio {obs_match_ratio} below threshold {match_ratio}"
+        )
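The case-insensitive match-ratio check boils down to a small helper; a standalone sketch (function name is illustrative):

```python
def keyword_match_ratio(output, keywords):
    """Fraction of expected keywords found in the output, case-insensitively."""
    matches = [keyword.lower() in output.lower() for keyword in keywords]
    return sum(matches) / len(matches)

ratio = keyword_match_ratio("A cat sat on the Mat", ["cat", "mat", "dog"])
# 2 of 3 keywords matched -> ratio == 2/3
```

Lower-casing both sides is what removes the flakiness from models that vary capitalization between runs.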
🧹 Nitpick comments (2)
examples/llm-api/quickstart_multimodal.py (2)

111-120: Make conversation_turns optional; default to number of provided prompts

Defaulting to 2 is surprising. Let users omit it and derive from len(prompts).

Apply:

-    parser.add_argument(
-        "--conversation_turns",
-        type=int,
-        default=2,
-        help="Number of conversation turns for automated testing.")
+    parser.add_argument(
+        "--conversation_turns",
+        type=int,
+        default=None,
+        help="Number of conversation turns (default: use number of provided prompts).")

190-191: Derive max_turns when conversation_turns is omitted

Guard for None to run all provided prompts by default.

Apply:

-        max_turns = min(args.conversation_turns, len(args.prompt))
+        max_turns = (len(args.prompt) if args.conversation_turns is None
+                     else min(args.conversation_turns, len(args.prompt)))
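The None-guarded derivation is easy to verify in isolation; a minimal sketch (resolve_turns is an illustrative name, not the script's function):

```python
def resolve_turns(conversation_turns, prompts):
    """Run every prompt when the flag is omitted; otherwise cap at the flag."""
    if conversation_turns is None:
        return len(prompts)
    return min(conversation_turns, len(prompts))

# Omitted flag runs all prompts; an explicit flag caps the turn count.
assert resolve_turns(None, ["a", "b", "c"]) == 3
assert resolve_turns(2, ["a", "b", "c"]) == 2
assert resolve_turns(5, ["a"]) == 1
```

The min() keeps an oversized flag from indexing past the provided prompts.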
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 302df88 and 22670bf.

📒 Files selected for processing (4)
  • examples/llm-api/quickstart_multimodal.py (3 hunks)
  • tests/integration/defs/accuracy/test_llm_api_pytorch.py (6 hunks)
  • tests/integration/defs/test_e2e.py (1 hunks)
  • tests/integration/test_lists/qa/llm_function_full.txt (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/integration/test_lists/qa/llm_function_full.txt
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

**/*.py: Python code should conform to Python 3.8+.
Indent Python code with 4 spaces. Do not use tabs.
Always maintain the namespace when importing in Python, even if only one class or function from a module is used.
Python filenames should use snake_case (e.g., some_file.py).
Python classes should use PascalCase (e.g., class SomeClass).
Python functions and methods should use snake_case (e.g., def my_awesome_function():).
Python local variables should use snake_case. Prefix k for variable names that start with a number (e.g., k_99th_percentile).
Python global variables should use upper snake_case and prefix G (e.g., G_MY_GLOBAL).
Python constants should use upper snake_case (e.g., MY_CONSTANT).
Avoid shadowing variables declared in an outer scope in Python.
Initialize all externally visible members of a Python class in the constructor.
For interfaces that may be used outside a Python file, prefer docstrings over comments.
Comments in Python should be reserved for code within a function, or interfaces that are local to a file.
Use Google style docstrings for Python classes and functions, which can be parsed by Sphinx.
Attributes and variables in Python can be documented inline; attribute docstrings will be rendered under the class docstring.
Avoid using reflection in Python when functionality can be easily achieved without it.
When using try-except blocks in Python, limit the except to the smallest set of errors possible.
When using try-except blocks to handle multiple possible variable types in Python, keep the body of the try as small as possible, using the else block to implement the logic.

Files:

  • tests/integration/defs/accuracy/test_llm_api_pytorch.py
  • examples/llm-api/quickstart_multimodal.py
  • tests/integration/defs/test_e2e.py
**/*.{cpp,h,hpp,cc,cxx,cu,py}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.

Files:

  • tests/integration/defs/accuracy/test_llm_api_pytorch.py
  • examples/llm-api/quickstart_multimodal.py
  • tests/integration/defs/test_e2e.py
🧠 Learnings (1)
📚 Learning: 2025-07-28T17:06:08.621Z
Learnt from: moraxu (PR NVIDIA/TensorRT-LLM#6303, file tests/integration/test_lists/qa/examples_test_list.txt:494-494, 2025-07-28T17:06:08.621Z)
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.

Applied to files:

  • tests/integration/defs/accuracy/test_llm_api_pytorch.py
  • tests/integration/defs/test_e2e.py
🪛 Ruff (0.12.2)
examples/llm-api/quickstart_multimodal.py

151-151: First line should end with a period, question mark, or exclamation point

Add closing punctuation

(D415)


156-156: Do not use bare except

(E722)

tests/integration/defs/test_e2e.py

2610-2610: Line too long (240 > 120)

(E501)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (6)
tests/integration/defs/accuracy/test_llm_api_pytorch.py (6)

506-507: Migration to CudaGraphConfig looks correct

Using cuda_graph_config=CudaGraphConfig() if cuda_graph else None is the right replacement for the deprecated boolean flag. No issues spotted.


530-531: LGTM: consistent CudaGraphConfig usage in chunked prefill test

Same pattern applied consistently. Good.


651-652: LGTM: FP8 chunked prefill with CudaGraphConfig

Change aligns with new API; parameters remain unchanged.


675-676: LGTM: FP4 chunked prefill with CudaGraphConfig

Correct and consistent with other tests.


1036-1037: LGTM: Conditional CUDA graph config in CUTEDSL test

Conditional construction via CudaGraphConfig() is correct.


1194-1195: LGTM: Conditional CUDA graph config in 4-GPU CUTEDSL test

Matches the new API across the suite.

@crazydemo
Copy link
CollaboratorAuthor

/bot run

@tensorrt-cicd
Copy link
Collaborator

PR_Github #14714 [ run ] triggered by Bot

@tensorrt-cicd
Copy link
Collaborator

PR_Github #14714 [ run ] completed with state SUCCESS
/LLM/release-1.0/L0_MergeRequest_PR pipeline #48 completed with status: 'SUCCESS'

@crazydemo
Copy link
CollaboratorAuthor

/bot reuse-pipeline

@tensorrt-cicd
Copy link
Collaborator

PR_Github #14748 [ reuse-pipeline ] triggered by Bot

Copy link
Contributor

coderabbitai[bot] left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (6)
examples/llm-api/quickstart_multimodal.py (4)

150-158: Fix bare except and finish the docstring sentence.

Ruff flags D415 and E722. Limit exceptions and add a period.

-def clear_cuda_cache():
-    """Clear CUDA cache to prevent memory issues"""
-    try:
-        import torch
-        if torch.cuda.is_available():
-            torch.cuda.empty_cache()
-    except:
-        pass
+def clear_cuda_cache():
+    """Clear CUDA cache to prevent memory issues."""
+    try:
+        import torch
+    except ImportError:
+        # Torch not installed – nothing to clear.
+        return
+    try:
+        if torch.cuda.is_available():
+            torch.cuda.empty_cache()
+    except Exception as err:
+        # Log once, but don't kill the caller.
+        print(f"[warn] Failed to clear CUDA cache: {err}")

196-209: Respect user CLI args instead of hard-coded defaults.

Use args.image_format, args.num_frames, and args.device to match single-turn behavior and user expectations.

-                inputs = default_multimodal_input_loader(
+                inputs = default_multimodal_input_loader(
                     tokenizer=llm.tokenizer,
                     model_dir=llm._hf_model_dir,
                     model_type=model_type,
                     modality=args.modality,
                     prompts=[cur_prompt],
                     media=args.media,
-                    image_data_format="pt",
-                    num_frames=8,
-                    device="cpu")
+                    image_data_format=image_format,
+                    num_frames=args.num_frames,
+                    device=args.device)

239-246: Always clear CUDA cache using finally.

On exceptions, the continue skips clear_cuda_cache(). Use a finally block.

-            except Exception as e:
-                print(f"Error in turn {i+1}: {e}")
-                import traceback
-                traceback.print_exc()
-                continue
-
-            clear_cuda_cache()
+            except Exception as e:
+                print(f"Error in turn {i+1}: {e}")
+                import traceback
+                traceback.print_exc()
+            finally:
+                clear_cuda_cache()

236-238: Multi-turn context is not persisted — conversation state never grows.

You build cur_prompt but don't save it, so the next turn reloads the original prompt. Persist it.

-                # Add to cur_prompt
-                cur_prompt = cur_prompt + response
+                # Persist conversation context for next turn
+                args.prompt[i] = f"{cur_prompt} assistant: {response}"

Alternative: maintain a history list per sample and concatenate before calling the loader for each turn.

tests/integration/defs/test_e2e.py (2)

2573-2584: Make the number of turns explicit to avoid silent truncation.

--conversation_turns defaults to 2; if more prompts are passed later, only two turns would run. Set it to the number of provided prompts for clarity and robustness.

         "--multiturn",
+        "--conversation_turns",
+        str(len(accuracy_inputs["image"]["prompt"])),
         "--prompt",

2609-2620: Harden matching (case-insensitive), remove noisy prints, and fix E501 long line.

This reduces flakiness, cleans logs, and complies with line length rules.

-    for prompt_output, prompt_keywords in zip(
-            parse_output(output), expected_keywords[model_name]["image"]):
-        matches = [
-            keyword in prompt_output.lower() for keyword in prompt_keywords
-        ]
-        obs_match_ratio = 1. * sum(matches) / len(matches)
-        print("prompt_output:", prompt_output)
-        print("prompt_keywords:", prompt_keywords)
-        print("matches:", matches)
-        print("obs_match_ratio:", obs_match_ratio)
-        assert obs_match_ratio >= match_ratio, f"Incorrect output!\nGenerated \"{prompt_output}\"\nExpected keywords \"{prompt_keywords}\"\n Matched keywords: {matches}\n Observed match ratio {obs_match_ratio} below threshold {match_ratio}"
+    for prompt_output, prompt_keywords in zip(outputs, expected):
+        matches = [keyword.lower() in prompt_output.lower()
+                   for keyword in prompt_keywords]
+        obs_match_ratio = 1.0 * sum(matches) / len(matches)
+        assert obs_match_ratio >= match_ratio, (
+            f"Incorrect output!\nGenerated \"{prompt_output}\"\n"
+            f"Expected keywords \"{prompt_keywords}\"\n"
+            f"Matched keywords: {matches}\n"
+            f"Observed match ratio {obs_match_ratio} below threshold {match_ratio}"
+        )
🧹 Nitpick comments (1)
tests/integration/defs/test_e2e.py (1)

2602-2602: Reduce CI log noise — drop raw output print and pre-parse instead.

Printing the entire raw output is noisy. Parse once and assert; include truncated raw text only in failure messages if needed.

-    output = llm_venv.run_cmd(cmd, caller=check_output)
-    print("output:", output)
+    output = llm_venv.run_cmd(cmd, caller=check_output)
+    outputs = parse_output(output)
+    expected = expected_keywords[model_name]["image"]
+    assert len(outputs) == len(expected), (
+        f"Expected {len(expected)} outputs, got {len(outputs)}. "
+        f"Raw output (truncated): {output[:1000]}..."
+    )
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 22670bf and eaf5a2d.

📒 Files selected for processing (5)
  • examples/llm-api/quickstart_multimodal.py (3 hunks)
  • tests/integration/defs/accuracy/test_disaggregated_serving.py (1 hunks)
  • tests/integration/defs/accuracy/test_llm_api_pytorch.py (6 hunks)
  • tests/integration/defs/test_e2e.py (1 hunks)
  • tests/integration/test_lists/qa/llm_function_full.txt (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • tests/integration/test_lists/qa/llm_function_full.txt
  • tests/integration/defs/accuracy/test_llm_api_pytorch.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

**/*.py: Python code should conform to Python 3.8+.
Indent Python code with 4 spaces. Do not use tabs.
Always maintain the namespace when importing in Python, even if only one class or function from a module is used.
Python filenames should use snake_case (e.g., some_file.py).
Python classes should use PascalCase (e.g., class SomeClass).
Python functions and methods should use snake_case (e.g., def my_awesome_function():).
Python local variables should use snake_case. Prefix k for variable names that start with a number (e.g., k_99th_percentile).
Python global variables should use upper snake_case and prefix G (e.g., G_MY_GLOBAL).
Python constants should use upper snake_case (e.g., MY_CONSTANT).
Avoid shadowing variables declared in an outer scope in Python.
Initialize all externally visible members of a Python class in the constructor.
For interfaces that may be used outside a Python file, prefer docstrings over comments.
Comments in Python should be reserved for code within a function, or interfaces that are local to a file.
Use Google style docstrings for Python classes and functions, which can be parsed by Sphinx.
Attributes and variables in Python can be documented inline; attribute docstrings will be rendered under the class docstring.
Avoid using reflection in Python when functionality can be easily achieved without it.
When using try-except blocks in Python, limit the except to the smallest set of errors possible.
When using try-except blocks to handle multiple possible variable types in Python, keep the body of the try as small as possible, using the else block to implement the logic.

Files:

  • tests/integration/defs/accuracy/test_disaggregated_serving.py
  • examples/llm-api/quickstart_multimodal.py
  • tests/integration/defs/test_e2e.py
**/*.{cpp,h,hpp,cc,cxx,cu,py}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.

Files:

  • tests/integration/defs/accuracy/test_disaggregated_serving.py
  • examples/llm-api/quickstart_multimodal.py
  • tests/integration/defs/test_e2e.py
🧠 Learnings (1)
📚 Learning: 2025-07-28T17:06:08.621Z
Learnt from: moraxu (PR NVIDIA/TensorRT-LLM#6303, file tests/integration/test_lists/qa/examples_test_list.txt:494-494, 2025-07-28T17:06:08.621Z)
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.

Applied to files:

  • tests/integration/defs/test_e2e.py
🧬 Code Graph Analysis (2)
examples/llm-api/quickstart_multimodal.py (3)
tensorrt_llm/llmapi/llm.py (4)
  • prompt (79-80)
  • tokenizer (691-695)
  • tokenizer (698-699)
  • generate (235-313)
tensorrt_llm/inputs/utils.py (1)
  • default_multimodal_input_loader (475-635)
tensorrt_llm/_torch/models/modeling_phi4mm.py (1)
  • lora_request (267-288)
tests/integration/defs/test_e2e.py (4)
tests/integration/defs/conftest.py (2)
  • llm_models_root (77-83)
  • llm_root (180-181)
tests/integration/defs/triton_server/conftest.py (1)
  • llm_models_root (16-25)
tests/integration/defs/common.py (1)
  • parse_output (665-681)
tests/integration/defs/examples/test_llama.py (1)
  • parse_output (129-139)
🪛 Ruff (0.12.2)
examples/llm-api/quickstart_multimodal.py

151-151: First line should end with a period, question mark, or exclamation point

Add closing punctuation

(D415)


156-156: Do not use bare except

(E722)

tests/integration/defs/test_e2e.py

2610-2610: Line too long (240 > 120)

(E501)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (2)
tests/integration/defs/accuracy/test_disaggregated_serving.py (1)

624-624: Appropriate Hopper-gating for FP8 model — add an explicit reason if supported.

Adding @skip_pre_hopper to test_auto_dtype for Qwen3-8B-FP8 is correct (FP8 requires Hopper+). If skip_pre_hopper supports a reason message, consider adding one for clarity in CI logs.

examples/llm-api/quickstart_multimodal.py (1)

111-119: New CLI flags look good.

--multiturn and --conversation_turns are sensible and scoped. No issues.

@tensorrt-cicd
Copy link
Collaborator

PR_Github #14748 [ reuse-pipeline ] completed with state SUCCESS
Reusing PR_Github #14714 for commit eaf5a2d

@tensorrt-cicd
Copy link
Collaborator

PR_Github #15056 [ reuse-pipeline ] completed with state SUCCESS
Reusing PR_Github #14714 for commit 9a298af

StanleySun639 enabled auto-merge (squash) on August 13, 2025 04:40
LarryXFly merged commit 15bcf80 into NVIDIA:release/1.0 on Aug 13, 2025
5 checks passed
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request on Aug 22, 2025
…6749) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
joyang-nv pushed a commit that referenced this pull request on Sep 1, 2025
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>

Reviewers

coderabbitai[bot] left review comments

tijyojwad approved these changes

StanleySun639 approved these changes

LarryXFly approved these changes

chzblych: review requested (automatically assigned from NVIDIA/trt-llm-doc-owners)

QiJune: review requested (automatically assigned from NVIDIA/trt-llm-doc-owners)

Wanli-Jiang: review requested

xinhe-nv: review requested

