Question
I want to implement a tree All-Reduce across the GPUs within a single node. However, even after setting NCCL_ALGO=allreduce:tree, the reduction still appears to happen sequentially, passing from one GPU to the next. Is there a way to force intra-node All-Reduce to use a tree-structured topology?
Example log:
[2] NCCL INFO Trees [0] 3/-1/-1->2->1 [1] 3/-1/-1->2->1
[1] NCCL INFO Trees [0] 2/-1/-1->1->0 [1] 2/-1/-1->1->0
[3] NCCL INFO Trees [0] -1/-1/-1->3->2 [1] -1/-1/-1->3->2
[0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1
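To see what shape NCCL actually built, the `Trees` lines can be decoded per channel. Below is a minimal Python sketch. It assumes each entry has the form `[channel] c0/c1/c2->rank->parent`, where `c0..c2` are child ranks and `-1` means "no rank". That is a reading of the log format, not a documented API, and `parse_trees`, `follow_first_child` and `max_fanout` are hypothetical helpers.

```python
import re
from collections import defaultdict

# Assumed entry format: "[channel] c0/c1/c2->rank->parent", -1 meaning "none".
TREE_RE = re.compile(r"\[(\d+)\] (-?\d+)/(-?\d+)/(-?\d+)->(-?\d+)->(-?\d+)")

LOG = """\
[2] NCCL INFO Trees [0] 3/-1/-1->2->1 [1] 3/-1/-1->2->1
[1] NCCL INFO Trees [0] 2/-1/-1->1->0 [1] 2/-1/-1->1->0
[3] NCCL INFO Trees [0] -1/-1/-1->3->2 [1] -1/-1/-1->3->2
[0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1
"""


def parse_trees(lines):
    """Return {channel: {rank: (children, parent)}} from 'Trees' log lines."""
    trees = defaultdict(dict)
    for line in lines:
        if "Trees" not in line:
            continue
        # Drop the "[rank] NCCL INFO Trees" prefix; the rest holds one entry per channel.
        body = line.split("Trees", 1)[1]
        for ch, c0, c1, c2, rank, parent in TREE_RE.findall(body):
            children = [int(c) for c in (c0, c1, c2) if int(c) != -1]
            trees[int(ch)][int(rank)] = (children, int(parent))
    return dict(trees)


def max_fanout(tree):
    """Largest number of children of any rank; 1 means the tree is a chain."""
    return max(len(children) for children, _ in tree.values())


def follow_first_child(tree):
    """Walk from the root (parent == -1), always taking the first child."""
    root = next(rank for rank, (_, parent) in tree.items() if parent == -1)
    path = [root]
    while tree[path[-1]][0]:
        path.append(tree[path[-1]][0][0])
    return path


if __name__ == "__main__":
    for ch, tree in sorted(parse_trees(LOG.splitlines()).items()):
        print(f"channel {ch}: path {follow_first_child(tree)}, max fan-out {max_fanout(tree)}")
```

Applied to the log above, every rank has at most one child, so on both channels the "tree" is the chain 0 -> 1 -> 2 -> 3. This matches the sequential behavior described in the question.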