torch.cuda.comm.reduce_add_coalesced
- torch.cuda.comm.reduce_add_coalesced(inputs, destination=None, buffer_size=10485760)
Sum tensors from multiple GPUs.
Small tensors are first coalesced into a buffer to reduce the number of synchronizations.
- Parameters
- Returns
A tuple of tensors containing an elementwise sum of each group of inputs, placed on the
destination device.
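The following is a minimal sketch of how the call might be used, not an official example. It assumes `inputs` is a list with one entry per device, where each entry is a list of tensors on that device and shapes match position by position. The helpers `make_inputs` and `reference_sum` are hypothetical names introduced here for illustration; `reference_sum` is a plain CPU fallback so the snippet still runs on a machine with fewer than two GPUs.

```python
import torch
import torch.cuda.comm as comm


def make_inputs(devices):
    """Hypothetical helper: build one list of tensors per device.

    Tensor i on every device has the same shape, so the tensors can be
    summed position by position across devices.
    """
    return [
        [
            torch.full((3,), float(i + 1), device=d),
            torch.full((2, 2), float(10 * (i + 1)), device=d),
        ]
        for i, d in enumerate(devices)
    ]


def reference_sum(inputs):
    """Hypothetical CPU fallback: elementwise sum of each group of inputs."""
    # zip(*inputs) regroups the tensors so each group holds one tensor per device.
    return tuple(
        torch.stack([t.cpu() for t in group]).sum(dim=0)
        for group in zip(*inputs)
    )


if torch.cuda.is_available() and torch.cuda.device_count() >= 2:
    # Real multi-GPU path: results are placed on device 0.
    inputs = make_inputs(["cuda:0", "cuda:1"])
    summed = comm.reduce_add_coalesced(inputs, destination=0)
else:
    # Fewer than two GPUs: show the same semantics with the CPU fallback.
    inputs = make_inputs(["cpu", "cpu"])
    summed = reference_sum(inputs)

for t in summed:
    print(t.shape, t.device)
```

In this sketch the result has one tensor per position in the per-device lists: the first is the sum of the two length-3 tensors and the second is the sum of the two 2x2 tensors.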