Commit 8030b54

[https://nvbugs/5522462][fix] Fix FP8 scout illegal memory access (#7845)

Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>

1 parent fbe325c, commit 8030b54

File tree

3 files changed, +13 -2 lines changed

cpp/tensorrt_llm/kernels/decoderMaskedMultiheadAttention
- decoderMaskedMultiheadAttentionTemplate.h
tests/integration/test_lists
- test-db
  - l0_dgx_b200.yml
- waives.txt

`‎cpp/tensorrt_llm/kernels/decoderMaskedMultiheadAttention/decoderMaskedMultiheadAttentionTemplate.h‎`

Lines changed: 2 additions & 1 deletion

Original file line number	Diff line number	Diff line change
`@@ -1363,7 +1363,8 @@ __global__ void __launch_bounds__(MAX_THEADS_PER_BLOCK, MIN_BLOCKS_PER_SM) maske`
`1363`	`1363`	`#ifndef MMHA_USE_FP32_ACCUM_FOR_LOGITS`
`1364`	`1364`	`if (sizeof(Tk) != 4)`
`1365`	`1365`	`{`
`1366`		`-auto const max_timesteps = min(timestep, static_cast<unsigned>(cyclic_kv_cache_len));`
	`1366`	`+auto const max_timesteps`
	`1367`	`+ = min(timestep, min(chunked_attention_size, static_cast<unsigned>(cyclic_kv_cache_len)));`
`1367`	`1368`	`logits_smem_ += divUp(max_timesteps + 1, 4u) * 16;`
`1368`	`1369`	`}`
`1369`	`1370`	`Tk* logits_smem = reinterpret_cast<Tk*>(logits_smem_);`
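
The effect of the extra clamp can be checked on the host with a small sketch. This is not the kernel code: it only reproduces the offset arithmetic from the hunk above in plain C++. The function names `logitsOffsetBefore` and `logitsOffsetAfter` are hypothetical, `divUp` is re-implemented here as a ceiling division, and treating `chunked_attention_size` as an `unsigned` is an assumption based on how it is used in the diff.

```cpp
#include <algorithm>

// Ceiling division; a stand-in for the divUp helper used in the kernel (assumed semantics).
constexpr unsigned divUp(unsigned a, unsigned b)
{
    return (a + b - 1) / b;
}

// Byte offset added to logits_smem_ before the change (hypothetical helper name).
inline unsigned logitsOffsetBefore(unsigned timestep, int cyclic_kv_cache_len)
{
    unsigned const max_timesteps = std::min(timestep, static_cast<unsigned>(cyclic_kv_cache_len));
    return divUp(max_timesteps + 1, 4u) * 16;
}

// Byte offset added to logits_smem_ after the change: max_timesteps is
// additionally clamped by chunked_attention_size (hypothetical helper name).
inline unsigned logitsOffsetAfter(unsigned timestep, unsigned chunked_attention_size, int cyclic_kv_cache_len)
{
    unsigned const max_timesteps
        = std::min(timestep, std::min(chunked_attention_size, static_cast<unsigned>(cyclic_kv_cache_len)));
    return divUp(max_timesteps + 1, 4u) * 16;
}
```

With illustrative values `timestep = 20000`, `chunked_attention_size = 8192`, and `cyclic_kv_cache_len = 32768`, the old formula yields an offset of 80016 bytes and the new one 32784 bytes. One plausible reading of the fix, not confirmed by the commit itself, is that the shared-memory buffer is sized for the chunk-limited window, so the unclamped offset pushed `logits_smem` past the allocation.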

`‎tests/integration/test_lists/test-db/l0_dgx_b200.yml‎`

Lines changed: 11 additions & 0 deletions

Original file line number	Diff line number	Diff line change
`@@ -39,7 +39,18 @@ l0_dgx_b200:`
`39`	`39`	`- accuracy/test_llm_api_pytorch.py::TestQwen3_30B_A3B::test_nvfp4[tep4_latency_moe_trtllm-torch_compile=True]`
`40`	`40`	`- accuracy/test_llm_api_pytorch.py::TestQwen3_30B_A3B::test_nvfp4[dep4_latency_moe_trtllm-torch_compile=False]`
`41`	`41`	`- accuracy/test_llm_api_pytorch.py::TestQwen3_30B_A3B::test_nvfp4[dep4_latency_moe_cutlass-torch_compile=False]`
	`42`	`+ - accuracy/test_llm_api_pytorch.py::TestLlama4ScoutInstruct::test_auto_dtype[tp8-cuda_graph=False]`
	`43`	`+ - accuracy/test_llm_api_pytorch.py::TestLlama4ScoutInstruct::test_auto_dtype[tp8ep4-cuda_graph=True]`
	`44`	`+ - accuracy/test_llm_api_pytorch.py::TestLlama4ScoutInstruct::test_auto_dtype[tp8ep8-cuda_graph=True]`
	`45`	`+ - accuracy/test_llm_api_pytorch.py::TestLlama4ScoutInstruct::test_auto_dtype[tp4-cuda_graph=False]`
	`46`	`+ - accuracy/test_llm_api_pytorch.py::TestLlama4ScoutInstruct::test_auto_dtype[tp4ep2-cuda_graph=True]`
	`47`	`+ - accuracy/test_llm_api_pytorch.py::TestLlama4ScoutInstruct::test_auto_dtype[tp4ep4-cuda_graph=True]`
	`48`	`+ - accuracy/test_llm_api_pytorch.py::TestLlama4ScoutInstruct::test_fp8[tp8ep8-cuda_graph=True]`
	`49`	`+ - accuracy/test_llm_api_pytorch.py::TestLlama4ScoutInstruct::test_fp8[tp4-cuda_graph=True]`
	`50`	`+ - accuracy/test_llm_api_pytorch.py::TestLlama4ScoutInstruct::test_fp4[tp8ep8-cuda_graph=True]`
`42`	`51`	`- accuracy/test_llm_api_pytorch.py::TestLlama4ScoutInstruct::test_fp4[tp4-cuda_graph=True]`
	`52`	`+ - accuracy/test_llm_api_pytorch.py::TestLlama4ScoutInstruct::test_fp8_chunked_prefill[tp4ep4-cuda_graph=True]`
	`53`	`+ - accuracy/test_llm_api_pytorch.py::TestLlama4ScoutInstruct::test_fp4_chunked_prefill[tp4ep4-cuda_graph=True]`
`43`	`54`	`- disaggregated/test_disaggregated.py::test_disaggregated_deepseek_v3_lite_fp8_ucx[DeepSeek-V3-Lite-fp8]`
`44`	`55`	`- accuracy/test_llm_api_pytorch.py::TestGPTOSS::test_w4_4gpus[tp4-trtllm-auto]`
`45`	`56`	`- accuracy/test_llm_api_pytorch.py::TestGPTOSS::test_w4_4gpus[ep4-cutlass-auto]`

`‎tests/integration/test_lists/waives.txt‎`

Lines changed: 0 additions & 1 deletion

Original file line number	Diff line number	Diff line change
`@@ -334,7 +334,6 @@ full:H100/accuracy/test_llm_api_pytorch.py::TestLlama4MaverickInstruct::test_fp8`
`334`	`334`	`full:H100/accuracy/test_llm_api_pytorch.py::TestLlama4MaverickInstruct::test_fp8_eagle3[tp8-torch_compile=True] SKIP (https://nvbugs/5483534)`
`335`	`335`	`full:A100/test_e2e.py::test_ptp_quickstart_multimodal[NVILA-8B-FP16-vila/NVILA-8B-video-False] SKIP (https://nvbugs/5453725)`
`336`	`336`	`test_e2e.py::test_ptp_scaffolding[DeepSeek-R1-Distill-Qwen-7B-DeepSeek-R1/DeepSeek-R1-Distill-Qwen-7B] SKIP (https://nvbugs/5517260)`
`337`		`-accuracy/test_llm_api_pytorch.py::TestLlama4ScoutInstruct::test_fp8_chunked_prefill[tp4ep4-cuda_graph=True] SKIP (https://nvbugs/5522462)`
`338`	`337`	`accuracy/test_llm_api_pytorch.py::TestDeepSeekV3Lite::test_bfloat16_4gpus[ep4-mtp_nextn=2-attention_dp=True-cuda_graph=True-overlap_scheduler=True-torch_compile=False] SKIP (https://nvbugs/5522746)`
`339`	`338`	`accuracy/test_llm_api_pytorch.py::TestDeepSeekV3Lite::test_nvfp4_4gpus[moe_backend=CUTLASS-mtp_nextn=0-tp2pp2-fp8kv=False-attention_dp=False-cuda_graph=False-overlap_scheduler=False-torch_compile=False] SKIP (https://nvbugs/5522746)`
`340`	`339`	`test_e2e.py::test_ptp_quickstart_multimodal[NVILA-8B-FP16-vila/NVILA-8B-image-False] SKIP (https://nvbugs/5523925)`
