Distributed runtime#
Initializing the distributed runtime#
To use the distributed APIs, you must first initialize the distributed runtime. This is done by having each process provide a local CUDA device ID (referring to a GPU on the host on which that process runs), an MPI communicator, and the desired communication backends:
```python
import nvmath.distributed
from mpi4py import MPI

comm = MPI.COMM_WORLD  # can use any MPI communicator
nvmath.distributed.initialize(device_id, comm, backends=["nvshmem", "nccl"])
```
Note
nvmath-python uses MPI for bootstrapping, and other bootstrapping modes may become available in the future.

Under the hood, the distributed math libraries use additional communication backends, such as NVSHMEM and NCCL.
You are free to use MPI in other parts of your application.
After initializing the distributed runtime, you may use the distributed APIs. Certain APIs such as FFT and Reshape require GPU operands to be allocated on the symmetric memory heap. Refer to Distributed API Utilities for examples and details of how to manage GPU operands on symmetric memory.
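As an illustration, here is a minimal sketch of managing an operand on the symmetric heap. It assumes the `allocate_symmetric_memory` and `free_symmetric_memory` helpers from the distributed utilities, with CuPy as the array package; consult the Distributed API Utilities page for the authoritative interface:

```python
import cupy as cp
import nvmath.distributed

# Assumes the runtime has already been initialized as shown above.
# allocate_symmetric_memory is expected to return a CuPy array backed
# by the symmetric heap; every process must call it with the same
# shape and dtype, since allocation is a collective operation.
a = nvmath.distributed.allocate_symmetric_memory(
    (128, 128), cp, dtype=cp.complex64
)

# Fill the local portion with data as usual.
a[:] = cp.random.rand(128, 128) + 1j * cp.random.rand(128, 128)

# ... pass `a` to a distributed API such as FFT or Reshape ...

# Symmetric memory is not garbage-collected; it must be freed
# explicitly, again collectively on all processes.
nvmath.distributed.free_symmetric_memory(a)
```

Because both allocation and deallocation are collective, every process in the communicator must reach these calls, or the application will hang.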
API Reference#
| `initialize` | Initialize nvmath.distributed runtime. |
| `finalize` | Finalize nvmath.distributed runtime (this is called automatically at exit if the runtime is initialized). |
| `get_context` | Return the distributed runtime's context or None if not initialized. |
| `DistributedContext` | Context of initialized nvmath.distributed runtime. |
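For example, a sketch of querying the runtime state via `get_context` (the exact attributes of the returned context, such as `device_id`, are assumptions here; see the `DistributedContext` reference for the definitive list):

```python
import nvmath.distributed

# get_context returns None when the runtime has not been initialized,
# so it can be used as a safe initialization check.
ctx = nvmath.distributed.get_context()
if ctx is None:
    print("distributed runtime not initialized")
else:
    # Hypothetical attribute access; the context is expected to expose
    # the settings passed to initialize(), e.g. the local device ID.
    print("running on device", ctx.device_id)
```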