Commit bdd4170

achartier authored and yufeiwu-nv committed

[None][fix] Disable DeepGEMM for Qwen3 MoE Attention layers (NVIDIA#8087)

Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>

1 parent 76415e6 · commit bdd4170

File tree

2 files changed: +3 -0 lines changed

tensorrt_llm/_torch/models/modeling_qwen3.py

Lines changed: 2 additions & 0 deletions

@@ -34,6 +34,7 @@ def __init__(
         fuse_qk_norm_rope: bool = True,
         attn_output_gate: bool = False,
         use_gemma_rms_norm: bool = False,
+        disable_deep_gemm: bool = False,
     ):
         config = model_config.pretrained_config
         self.pretrained_config = config
@@ -71,6 +72,7 @@ def __init__(
             config=model_config,
             attn_output_gate=self.attn_output_gate,
             use_gemma_rms_norm=use_gemma_rms_norm,
+            disable_deep_gemm=disable_deep_gemm,
         )

tensorrt_llm/_torch/models/modeling_qwen3_moe.py

Lines changed: 1 addition & 0 deletions

@@ -167,6 +167,7 @@ def __init__(self, model_config: ModelConfig[Qwen3MoeConfig],
         self.self_attn = Qwen3Attention(
             model_config,
             layer_idx=layer_idx,
+            disable_deep_gemm=True,
         )
         self.mapping = model_config.mapping
         self.enable_attention_dp = self.mapping.enable_attention_dp
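The change follows a simple flag-plumbing pattern: the Qwen3 MoE decoder layer hard-codes disable_deep_gemm=True when constructing its attention sub-module, and Qwen3Attention's __init__ forwards the flag to its base attention class. Below is a minimal, self-contained sketch of that pattern; the class names with the "Sketch" suffix and the backend labels are illustrative assumptions, not TensorRT-LLM's actual API, and the real consumer of the flag lives elsewhere in the library.

    # Sketch of threading a boolean constructor flag from a decoder layer
    # into its attention module. Names and backend labels are hypothetical.

    class AttentionSketch:
        def __init__(self, *, hidden_size: int, disable_deep_gemm: bool = False):
            self.hidden_size = hidden_size
            # If DeepGEMM is disabled, fall back to the default GEMM path
            # (labels chosen for illustration only).
            self.gemm_backend = "default" if disable_deep_gemm else "deep_gemm"

    class Qwen3AttentionSketch(AttentionSketch):
        def __init__(self, *, hidden_size: int, disable_deep_gemm: bool = False):
            # Forward the flag unchanged, mirroring how the commit adds
            # disable_deep_gemm to Qwen3Attention.__init__ and passes it down.
            super().__init__(hidden_size=hidden_size,
                             disable_deep_gemm=disable_deep_gemm)

    class Qwen3MoEDecoderLayerSketch:
        def __init__(self, hidden_size: int):
            # The MoE decoder layer hard-codes disable_deep_gemm=True for its
            # attention sub-module, as in modeling_qwen3_moe.py above.
            self.self_attn = Qwen3AttentionSketch(hidden_size=hidden_size,
                                                  disable_deep_gemm=True)

    if __name__ == "__main__":
        layer = Qwen3MoEDecoderLayerSketch(hidden_size=4096)
        print(layer.self_attn.gemm_backend)  # -> "default"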

