
Experimental Object Oriented Distributed API

Created On: Jul 09, 2025 | Last Updated On: Jul 30, 2025

This is an experimental API for PyTorch Distributed. It is under active development and subject to change or removal entirely.

This is intended as a proving ground for more flexible and object-oriented distributed APIs.

class torch.distributed._dist2.ProcessGroup

Bases: pybind11_object

A ProcessGroup is a communication primitive that allows for collective operations across a group of processes.

This is a base class that provides the interface for all ProcessGroups. It is not meant to be used directly, but rather extended by subclasses.

class BackendType

Bases: pybind11_object

The type of the backend used for the process group.

Members:

UNDEFINED

GLOO

NCCL

XCCL

UCC

MPI

CUSTOM

CUSTOM = <BackendType.CUSTOM: 6>
GLOO = <BackendType.GLOO: 1>
MPI = <BackendType.MPI: 4>
NCCL = <BackendType.NCCL: 2>
UCC = <BackendType.UCC: 3>
UNDEFINED = <BackendType.UNDEFINED: 0>
XCCL = <BackendType.XCCL: 5>
property name
property value
abort(self: torch._C._distributed_c10d.ProcessGroup) -> None

Aborts all operations and connections, if supported by the backend.

allgather(*args, **kwargs)

Overloaded function.

  1. allgather(self: torch._C._distributed_c10d.ProcessGroup, output_tensors: collections.abc.Sequence[collections.abc.Sequence[torch.Tensor]], input_tensors: collections.abc.Sequence[torch.Tensor], opts: torch._C._distributed_c10d.AllgatherOptions = <torch._C._distributed_c10d.AllgatherOptions object at 0x7fad5a376870>) -> c10d::Work

Allgathers the input tensors from all processes across the process group.

See torch.distributed.all_gather() for more details.

  2. allgather(self: torch._C._distributed_c10d.ProcessGroup, output_tensors: collections.abc.Sequence[torch.Tensor], input_tensor: torch.Tensor, timeout: datetime.timedelta | None = None) -> c10d::Work

Allgathers the input tensors from all processes across the process group.

See torch.distributed.all_gather() for more details.

allgather_coalesced(self: torch._C._distributed_c10d.ProcessGroup, output_lists: collections.abc.Sequence[collections.abc.Sequence[torch.Tensor]], input_list: collections.abc.Sequence[torch.Tensor], opts: torch._C._distributed_c10d.AllgatherOptions = <torch._C._distributed_c10d.AllgatherOptions object at 0x7fad59f0ec70>) -> c10d::Work

Allgathers the input tensors from all processes across the process group.

See torch.distributed.all_gather() for more details.

allgather_into_tensor_coalesced(self: torch._C._distributed_c10d.ProcessGroup, outputs: collections.abc.Sequence[torch.Tensor], inputs: collections.abc.Sequence[torch.Tensor], opts: torch._C._distributed_c10d.AllgatherOptions = <torch._C._distributed_c10d.AllgatherOptions object at 0x7fad5a6ef270>) -> c10d::Work

Allgathers the input tensors from all processes across the process group.

See torch.distributed.all_gather() for more details.

allreduce(*args, **kwargs)

Overloaded function.

  1. allreduce(self: torch._C._distributed_c10d.ProcessGroup, tensors: collections.abc.Sequence[torch.Tensor], opts: torch._C._distributed_c10d.AllreduceOptions = <torch._C._distributed_c10d.AllreduceOptions object at 0x7fad5a366130>) -> c10d::Work

Allreduces the provided tensors across all processes in the process group.

See torch.distributed.all_reduce() for more details.

  2. allreduce(self: torch._C._distributed_c10d.ProcessGroup, tensors: collections.abc.Sequence[torch.Tensor], op: torch._C._distributed_c10d.ReduceOp = <RedOpType.SUM: 0>, timeout: datetime.timedelta | None = None) -> c10d::Work

Allreduces the provided tensors across all processes in the process group.

See torch.distributed.all_reduce() for more details.

  3. allreduce(self: torch._C._distributed_c10d.ProcessGroup, tensor: torch.Tensor, op: torch._C._distributed_c10d.ReduceOp = <RedOpType.SUM: 0>, timeout: datetime.timedelta | None = None) -> c10d::Work

Allreduces the provided tensors across all processes in the process group.

See torch.distributed.all_reduce() for more details.

allreduce_coalesced(self: torch._C._distributed_c10d.ProcessGroup, tensors: collections.abc.Sequence[torch.Tensor], opts: torch._C._distributed_c10d.AllreduceCoalescedOptions = <torch._C._distributed_c10d.AllreduceCoalescedOptions object at 0x7fad59f0e5b0>) -> c10d::Work

Allreduces the provided tensors across all processes in the process group.

See torch.distributed.all_reduce() for more details.

alltoall(self: torch._C._distributed_c10d.ProcessGroup, output_tensors: collections.abc.Sequence[torch.Tensor], input_tensors: collections.abc.Sequence[torch.Tensor], opts: torch._C._distributed_c10d.AllToAllOptions = <torch._C._distributed_c10d.AllToAllOptions object at 0x7fad5a411ff0>) -> c10d::Work

Alltoalls the input tensors from all processes across the process group.

See torch.distributed.all_to_all() for more details.

alltoall_base(*args, **kwargs)

Overloaded function.

  1. alltoall_base(self: torch._C._distributed_c10d.ProcessGroup, output: torch.Tensor, input: torch.Tensor, output_split_sizes: collections.abc.Sequence[typing.SupportsInt], input_split_sizes: collections.abc.Sequence[typing.SupportsInt], opts: torch._C._distributed_c10d.AllToAllOptions = <torch._C._distributed_c10d.AllToAllOptions object at 0x7fad5a3384b0>) -> c10d::Work

Alltoalls the input tensors from all processes across the process group.

See torch.distributed.all_to_all() for more details.

  2. alltoall_base(self: torch._C._distributed_c10d.ProcessGroup, output: torch.Tensor, input: torch.Tensor, output_split_sizes: collections.abc.Sequence[typing.SupportsInt], input_split_sizes: collections.abc.Sequence[typing.SupportsInt], timeout: datetime.timedelta | None = None) -> c10d::Work

Alltoalls the input tensors from all processes across the process group.

See torch.distributed.all_to_all() for more details.

barrier(*args, **kwargs)

Overloaded function.

  1. barrier(self: torch._C._distributed_c10d.ProcessGroup, opts: torch._C._distributed_c10d.BarrierOptions = <torch._C._distributed_c10d.BarrierOptions object at 0x7fad59f0fa70>) -> c10d::Work

Blocks until all processes in the group enter the call, and then all leave the call together.

See torch.distributed.barrier() for more details.

  2. barrier(self: torch._C._distributed_c10d.ProcessGroup, timeout: datetime.timedelta | None = None) -> c10d::Work

Blocks until all processes in the group enter the call, and then all leave the call together.

See torch.distributed.barrier() for more details.

property bound_device_id
boxed(self: torch._C._distributed_c10d.ProcessGroup) -> object
broadcast(*args, **kwargs)

Overloaded function.

  1. broadcast(self: torch._C._distributed_c10d.ProcessGroup, tensors: collections.abc.Sequence[torch.Tensor], opts: torch._C._distributed_c10d.BroadcastOptions = <torch._C._distributed_c10d.BroadcastOptions object at 0x7fad5a355ff0>) -> c10d::Work

Broadcasts the tensor to all processes in the process group.

See torch.distributed.broadcast() for more details.

  2. broadcast(self: torch._C._distributed_c10d.ProcessGroup, tensor: torch.Tensor, root: typing.SupportsInt, timeout: datetime.timedelta | None = None) -> c10d::Work

Broadcasts the tensor to all processes in the process group.

See torch.distributed.broadcast() for more details.

gather(*args, **kwargs)

Overloaded function.

  1. gather(self: torch._C._distributed_c10d.ProcessGroup, output_tensors: collections.abc.Sequence[collections.abc.Sequence[torch.Tensor]], input_tensors: collections.abc.Sequence[torch.Tensor], opts: torch._C._distributed_c10d.GatherOptions = <torch._C._distributed_c10d.GatherOptions object at 0x7fad5a6eea70>) -> c10d::Work

Gathers the input tensors from all processes across the process group.

See torch.distributed.gather() for more details.

  2. gather(self: torch._C._distributed_c10d.ProcessGroup, output_tensors: collections.abc.Sequence[torch.Tensor], input_tensor: torch.Tensor, root: typing.SupportsInt, timeout: datetime.timedelta | None = None) -> c10d::Work

Gathers the input tensors from all processes across the process group.

See torch.distributed.gather() for more details.

get_group_store(self: torch._C._distributed_c10d.ProcessGroup) -> torch._C._distributed_c10d.Store

Get the store of this process group.

property group_desc

Gets this process group's description.

property group_name

Gets this process group's name, which is unique across the cluster.

merge_remote_group(self: torch._C._distributed_c10d.ProcessGroup, store: torch._C._distributed_c10d.Store, size: SupportsInt, timeout: datetime.timedelta = datetime.timedelta(seconds=1800), group_name: str | None = None, group_desc: str | None = None) -> torch._C._distributed_c10d.ProcessGroup

monitored_barrier(self: torch._C._distributed_c10d.ProcessGroup, timeout: datetime.timedelta | None = None, wait_all_ranks: bool = False) -> None

Blocks until all processes in the group enter the call, and then all leave the call together.

See torch.distributed.monitored_barrier() for more details.

name(self: torch._C._distributed_c10d.ProcessGroup) -> str

Get the name of this process group.

rank(self: torch._C._distributed_c10d.ProcessGroup) -> int

Get the rank of this process within the process group.

recv(self: torch._C._distributed_c10d.ProcessGroup, tensors: collections.abc.Sequence[torch.Tensor], srcRank: SupportsInt, tag: SupportsInt) -> c10d::Work

Receives the tensor from the specified rank.

See torch.distributed.recv() for more details.

recv_anysource(self: torch._C._distributed_c10d.ProcessGroup, arg0: collections.abc.Sequence[torch.Tensor], arg1: SupportsInt) -> c10d::Work

Receives the tensor from any source.

See torch.distributed.recv() for more details.

reduce(*args, **kwargs)

Overloaded function.

  1. reduce(self: torch._C._distributed_c10d.ProcessGroup, tensors: collections.abc.Sequence[torch.Tensor], opts: torch._C._distributed_c10d.ReduceOptions = <torch._C._distributed_c10d.ReduceOptions object at 0x7fad5a413930>) -> c10d::Work

Reduces the provided tensors across all processes in the process group.

See torch.distributed.reduce() for more details.

  2. reduce(self: torch._C._distributed_c10d.ProcessGroup, tensor: torch.Tensor, root: typing.SupportsInt, op: torch._C._distributed_c10d.ReduceOp = <RedOpType.SUM: 0>, timeout: datetime.timedelta | None = None) -> c10d::Work

Reduces the provided tensors across all processes in the process group.

See torch.distributed.reduce() for more details.

reduce_scatter(*args, **kwargs)

Overloaded function.

  1. reduce_scatter(self: torch._C._distributed_c10d.ProcessGroup, output_tensors: collections.abc.Sequence[torch.Tensor], input_tensors: collections.abc.Sequence[collections.abc.Sequence[torch.Tensor]], opts: torch._C._distributed_c10d.ReduceScatterOptions = <torch._C._distributed_c10d.ReduceScatterOptions object at 0x7fad5a367fb0>) -> c10d::Work

Reduces and scatters the input tensors from all processes across the process group.

  2. reduce_scatter(self: torch._C._distributed_c10d.ProcessGroup, output: torch.Tensor, input: collections.abc.Sequence[torch.Tensor], op: torch._C._distributed_c10d.ReduceOp = <RedOpType.SUM: 0>, timeout: datetime.timedelta | None = None) -> c10d::Work

Reduces and scatters the input tensors from all processes across the process group.

reduce_scatter_tensor_coalesced(self: torch._C._distributed_c10d.ProcessGroup, outputs: collections.abc.Sequence[torch.Tensor], inputs: collections.abc.Sequence[torch.Tensor], opts: torch._C._distributed_c10d.ReduceScatterOptions = <torch._C._distributed_c10d.ReduceScatterOptions object at 0x7fad5a376db0>) -> c10d::Work

Reduces and scatters the input tensors from all processes across the process group.

scatter(*args, **kwargs)

Overloaded function.

  1. scatter(self: torch._C._distributed_c10d.ProcessGroup, output_tensors: collections.abc.Sequence[torch.Tensor], input_tensors: collections.abc.Sequence[collections.abc.Sequence[torch.Tensor]], opts: torch._C._distributed_c10d.ScatterOptions = <torch._C._distributed_c10d.ScatterOptions object at 0x7fad59f16ff0>) -> c10d::Work

Scatters the input tensors from all processes across the process group.

See torch.distributed.scatter() for more details.

  2. scatter(self: torch._C._distributed_c10d.ProcessGroup, output_tensor: torch.Tensor, input_tensors: collections.abc.Sequence[torch.Tensor], root: typing.SupportsInt, timeout: datetime.timedelta | None = None) -> c10d::Work

Scatters the input tensors from all processes across the process group.

See torch.distributed.scatter() for more details.

send(self: torch._C._distributed_c10d.ProcessGroup, tensors: collections.abc.Sequence[torch.Tensor], dstRank: SupportsInt, tag: SupportsInt) -> c10d::Work

Sends the tensor to the specified rank.

See torch.distributed.send() for more details.

set_timeout(self: torch._C._distributed_c10d.ProcessGroup, timeout: datetime.timedelta) -> None

Sets the default timeout for all future operations.

shutdown(self: torch._C._distributed_c10d.ProcessGroup) -> None

Shuts down the process group.

size(self: torch._C._distributed_c10d.ProcessGroup) -> int

Get the size of this process group.

split_group(self: torch._C._distributed_c10d.ProcessGroup, ranks: collections.abc.Sequence[typing.SupportsInt], timeout: datetime.timedelta | None = None, opts: c10d::Backend::Options | None = None, group_name: str | None = None, group_desc: str | None = None) -> torch._C._distributed_c10d.ProcessGroup

static unbox(arg0: object) -> torch._C._distributed_c10d.ProcessGroup

class torch.distributed._dist2.ProcessGroupFactory(*args, **kwargs)[source]

Bases: Protocol

Protocol for process group factories.

torch.distributed._dist2.current_process_group()[source]

Get the current process group. This state is thread-local.

Returns

The current process group.

Return type

ProcessGroup

torch.distributed._dist2.new_group(backend, timeout, device, **kwargs)[source]

Create a new process group with the given backend and options. This group is independent and will not be globally registered, and is thus not usable via the standard torch.distributed.* APIs.

Parameters
  • backend (str) – The backend to use for the process group.

  • timeout (timedelta) – The timeout for collective operations.

  • device (Union[str, device]) – The device to use for the process group.

  • **kwargs (object) – All remaining arguments are passed to the backend constructor. See the backend-specific documentation for details.

Returns

A new process group.

Return type

ProcessGroup

torch.distributed._dist2.process_group(pg)[source]

Context manager for process groups. The current group is tracked in thread-local state.

Parameters

pg (ProcessGroup) – The process group to use.

Return type

Generator[None, None, None]

torch.distributed._dist2.register_backend(name, func)[source]

Register a new process group backend.

Parameters
  • name (str) – The name of the backend.

  • func (ProcessGroupFactory) – The function to create the process group.
