
torch.cuda.comm.reduce_add_coalesced

torch.cuda.comm.reduce_add_coalesced(inputs, destination=None, buffer_size=10485760)

Sum tensors from multiple GPUs.

Small tensors are first coalesced into a buffer to reduce the number of synchronizations.
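To illustrate the coalescing idea, here is a simplified pure-Python model. It is not PyTorch's implementation: it assumes each "tensor" is a flat list of floats, that every element occupies 4 bytes (`ELEMENT_SIZE`), and it ignores dtype grouping and sparse tensors. The names `bucket_indices` and `reduce_add_coalesced_model` are hypothetical. Tensors are packed into buckets of at most `buffer_size` bytes (a single oversized tensor gets its own bucket), and each bucket is summed across devices with one elementwise reduction instead of one reduction per tensor.

```python
from typing import List, Sequence

ELEMENT_SIZE = 4  # assumption: every element is a 4-byte float32


def bucket_indices(sizes: Sequence[int], buffer_size: int) -> List[List[int]]:
    """Group tensor indices into buckets whose total byte size stays within
    buffer_size; a tensor larger than buffer_size gets a bucket of its own."""
    buckets: List[List[int]] = []
    current: List[int] = []
    current_bytes = 0
    for i, n in enumerate(sizes):
        nbytes = n * ELEMENT_SIZE
        # Flush the current bucket if adding this tensor would overflow it.
        if current and current_bytes + nbytes > buffer_size:
            buckets.append(current)
            current, current_bytes = [], 0
        current.append(i)
        current_bytes += nbytes
    if current:
        buckets.append(current)
    return buckets


def reduce_add_coalesced_model(inputs, buffer_size=10485760):
    """inputs[d][i] is tensor i (a flat list of floats) on device d."""
    sizes = [len(t) for t in inputs[0]]
    outputs = [None] * len(sizes)
    for bucket in bucket_indices(sizes, buffer_size):
        # Coalesce: concatenate this bucket's tensors into one flat buffer per device.
        flat = [[x for i in bucket for x in dev[i]] for dev in inputs]
        # One elementwise reduction per bucket instead of one per tensor.
        summed = [sum(vals) for vals in zip(*flat)]
        # Uncoalesce: split the summed buffer back into the original tensors.
        offset = 0
        for i in bucket:
            outputs[i] = summed[offset:offset + sizes[i]]
            offset += sizes[i]
    return tuple(outputs)


gpu0 = [[1.0, 2.0], [3.0]]
gpu1 = [[10.0, 20.0], [30.0]]
print(reduce_add_coalesced_model([gpu0, gpu1]))                 # ([11.0, 22.0], [33.0])
print(reduce_add_coalesced_model([gpu0, gpu1], buffer_size=8))  # same sums, two buckets
```

The bucket size only changes how many reductions are issued, not the result: with `buffer_size=8` the two tensors above fall into separate buckets, while the default size places them in one.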

Parameters
  • inputs (Iterable[Iterable[Tensor]]) – an iterable of iterables, where each inner iterable contains the tensors from a single device.

  • destination (int, optional) – the device on which the output will be placed (default: current device).

  • buffer_size (int) – maximum size, in bytes, of the buffer used for coalescing (default: 10485760, i.e. 10 MiB).

Returns

A tuple of tensors containing an elementwise sum of each group of inputs, placed on the destination device.
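A minimal usage sketch, assuming a CUDA build with at least two visible GPUs for the actual call. The hypothetical helper `reduce_add_reference` computes the documented result on the CPU (element i of the output is the elementwise sum of tensor i across all devices), so the example also runs without GPUs; the call to `torch.cuda.comm.reduce_add_coalesced` is skipped in that case. The destination is set to device 0, which also holds one of the inputs.

```python
import torch
import torch.cuda.comm


def reduce_add_reference(inputs):
    """CPU reference for the documented semantics: element i of the result
    is the elementwise sum of tensor i across all devices (no coalescing)."""
    return tuple(
        torch.stack([t.cpu() for t in group]).sum(dim=0)
        for group in zip(*inputs)
    )


# Two "devices", each holding tensors with the same shapes in the same order.
dev0 = [torch.ones(2, 3), torch.arange(4.0)]
dev1 = [torch.full((2, 3), 2.0), torch.arange(4.0)]

expected = reduce_add_reference([dev0, dev1])

if torch.cuda.device_count() >= 2:
    # Each inner list holds the tensors that live on one GPU.
    gpu_inputs = [
        [t.cuda(0) for t in dev0],
        [t.cuda(1) for t in dev1],
    ]
    result = torch.cuda.comm.reduce_add_coalesced(gpu_inputs, destination=0)
    for r, e in zip(result, expected):
        # Every summed tensor is placed on the destination device.
        assert r.device == torch.device("cuda", 0)
        assert torch.allclose(r.cpu(), e)
```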