88from ...distributed import AllReduce ,allgather
99from ...modules .linear import AllReduceFusionOp ,AllReduceParams ,AllReduceStrategy
1010
# Cache AllReduce modules to avoid recreating on every call.
# This is critical for CUDA graph compatibility - recreating modules during
# warmup causes hangs due to workspace allocation with CPU synchronization.
# Keyed by (rank, world_size, dtype) — see trtllm_allreduce, which is the
# only writer/reader of this dict in this module.
_allreduce_cache: dict = {}
1116def trtllm_allgather (tensor ,dim ,sizes = None ):
1217rank ,world_size = get_rank_world_size ()
1318p_config = Mapping (world_size = world_size ,tp_size = world_size ,rank = rank )
@@ -16,9 +21,17 @@ def trtllm_allgather(tensor, dim, sizes=None):
def trtllm_allreduce(tensor, op, all_reduce_params=None):
    """Sum-reduce *tensor* across the tensor-parallel group via TRT-LLM.

    Only ``ReduceOp.SUM`` is supported. The underlying ``AllReduce`` module
    is cached per ``(rank, world_size, dtype)`` so it is constructed once
    per configuration — rebuilding it on every call allocates workspace
    with a CPU synchronization, which hangs CUDA-graph warmup.

    Args:
        tensor: input tensor to reduce across ranks.
        op: reduction op; must be ``ReduceOp.SUM``.
        all_reduce_params: optional ``AllReduceParams`` forwarded to the op.

    Returns:
        The reduced tensor as produced by the cached ``AllReduce`` module.
    """
    rank, world_size = get_rank_world_size()
    assert op == ReduceOp.SUM, "TRT-LLM all reduce only supports SUM op."

    # One cached module per (rank, world_size, dtype) configuration.
    key = (rank, world_size, tensor.dtype)
    reducer = _allreduce_cache.get(key)
    if reducer is None:
        mapping = Mapping(world_size=world_size, tp_size=world_size, rank=rank)
        # Strategy.AUTO lets TRT-LLM pick the best-performing implementation.
        reducer = AllReduce(
            mapping=mapping, strategy=AllReduceStrategy.AUTO, dtype=tensor.dtype
        )
        _allreduce_cache[key] = reducer

    return reducer(tensor, all_reduce_params=all_reduce_params)
2336
2437@torch .library .custom_op (