Commit 1606899

jaglinux authored and facebook-github-bot committed
distributed_test: Map rank to GPU accordingly (#47898)
Summary:
If world_size is less than or equal to the number of GPUs available, then each rank can be mapped directly to the corresponding GPU. This fixes the issue referenced in #45435 and #47629.

For world_size = 3 and 8 GPUs, the rank-to-GPU mapping will be 0, 2, 4. This is due to the introduction of barrier (refer to PR #45181): the tensors in barrier are mapped to cuda 0, 1, 2, while the tensors in the actual test cases are mapped to cuda 0, 2, 4, resulting in different streams and leading to a timeout. This issue is specific to the default process group. The issue is not observed in a new process group, since the streams are created again after the initial barrier call.

This patch maps each rank to the corresponding GPU when world_size is less than or equal to the number of GPUs, in this case 0, 1, 2.

Note: The barrier function in distributed_c10d.py should include a new parameter to specify the tensor or rank-to-GPU mapping. In that case, this patch will be redundant but harmless, since the tests can specify the tensors with the appropriate GPU rankings.

Fixes #47629
Pull Request resolved: #47898
Reviewed By: smessmer
Differential Revision: D24956021
Pulled By: rohan-varma
fbshipit-source-id: a88257f22a7991ba36566329766c106d3360bb4e
1 parent 982ae98 commit 1606899
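
To make the failure mode described in the summary concrete, here is a small self-contained sketch (an illustration only, not part of the patch) that assumes visible_devices is simply range(nGPUs) and compares the first test device per rank under the old and the patched mapping for the world_size = 3, 8-GPU case:

# Illustration of the rank-to-GPU mapping before and after this patch.
# Assumes visible_devices == range(nGPUs); all names are local to this sketch.
def first_test_device_per_rank(world_size, nGPUs, patched):
    if patched and world_size <= nGPUs:
        nGPUs_per_process = 1                        # patched: one GPU per rank
    else:
        nGPUs_per_process = nGPUs // world_size      # old: split GPUs evenly
    visible_devices = list(range(nGPUs))
    rank_to_GPU = {
        i: visible_devices[i * nGPUs_per_process:(i + 1) * nGPUs_per_process]
        for i in range(world_size)
    }
    # Test cases allocate on the first GPU of each rank; per the summary,
    # the barrier tensors land on cuda 0, 1, 2 (one per rank).
    return [gpus[0] for gpus in rank_to_GPU.values()]

print(first_test_device_per_rank(3, 8, patched=False))  # [0, 2, 4] -> mismatch with barrier devices
print(first_test_device_per_rank(3, 8, patched=True))   # [0, 1, 2] -> matches barrier devices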

File tree

1 file changed: +5 −1 lines changed

torch/testing/_internal/distributed/distributed_test.py

Lines changed: 5 additions & 1 deletion

@@ -364,7 +364,11 @@ def _init_multigpu_helper(self):
         if BACKEND == "nccl":
             apply_hack_for_nccl()

-        nGPUs_per_process = nGPUs // world_size
+        # If rank is lesser than or equal to number of available GPU's
+        # then each rank can be mapped to corresponding GPU.
+        nGPUs_per_process = 1
+        if world_size > nGPUs:
+            nGPUs_per_process = nGPUs // world_size
         rank_to_GPU = {
             i: list(
                 visible_devices[i * nGPUs_per_process: (i + 1) * nGPUs_per_process]
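
For reference, the patched hunk can be mirrored as a standalone helper. This is a sketch under the assumption that visible_devices is range(nGPUs) and that world_size and nGPUs are passed in directly rather than read from dist.get_world_size() and torch.cuda.device_count(); it shows the full rank_to_GPU dictionary the test base class now builds:

# Standalone mirror of the patched mapping logic in _init_multigpu_helper.
def build_rank_to_gpu(world_size, nGPUs):
    visible_devices = list(range(nGPUs))
    nGPUs_per_process = 1                        # one GPU per rank when possible
    if world_size > nGPUs:
        nGPUs_per_process = nGPUs // world_size  # fall back to the old even split
    return {
        i: visible_devices[i * nGPUs_per_process:(i + 1) * nGPUs_per_process]
        for i in range(world_size)
    }

print(build_rank_to_gpu(3, 8))   # {0: [0], 1: [1], 2: [2]} -> same devices the barrier uses
print(build_rank_to_gpu(2, 8))   # {0: [0], 1: [1]} instead of the old {0: [0, 1, 2, 3], 1: [4, 5, 6, 7]}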
