Communicator Creation and Management Functions¶
The following functions are public APIs exposed by NCCL to create and manage the communicators used for collective communication operations.
ncclGetLastError¶
- const char*
ncclGetLastError(ncclComm_t comm)¶
Returns a human-readable string corresponding to the last error that occurred in NCCL. Note: the error is not cleared by calling this function. Please note that the string returned by ncclGetLastError could be unrelated to the current call and can be a result of previously launched asynchronous operations, if any.
ncclGetErrorString¶
- const char*
ncclGetErrorString(ncclResult_t result)¶
Returns a human-readable string corresponding to the passed error code.
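For instance, both helpers can be combined in a small error-checking macro. The macro below is a sketch, not part of the NCCL API; it assumes the communicator argument of ncclGetLastError accepts NULL when no communicator is in scope.

    #include <stdio.h>
    #include <stdlib.h>
    #include <nccl.h>

    /* Hypothetical helper macro: abort on any NCCL failure, printing both the
     * symbolic result string and NCCL's last recorded error message. */
    #define NCCLCHECK(cmd) do {                                               \
        ncclResult_t res = (cmd);                                             \
        if (res != ncclSuccess && res != ncclInProgress) {                    \
            fprintf(stderr, "NCCL error %s:%d '%s'\n",                        \
                    __FILE__, __LINE__, ncclGetErrorString(res));             \
            fprintf(stderr, "Last NCCL error: %s\n", ncclGetLastError(NULL)); \
            exit(EXIT_FAILURE);                                               \
        }                                                                     \
    } while (0)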
ncclGetVersion¶
- ncclResult_t
ncclGetVersion(int* version)¶
The ncclGetVersion function returns the version number of the currently linked NCCL library. The NCCL version number is returned in version and encoded as an integer which includes the NCCL_MAJOR, NCCL_MINOR and NCCL_PATCH levels. The version number returned will be the same as the NCCL_VERSION_CODE defined in nccl.h. NCCL version numbers can be compared using the supplied macro NCCL_VERSION(MAJOR,MINOR,PATCH).
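As a sketch, the returned code can be compared against a minimum required version using the NCCL_VERSION macro; the 2.19.0 threshold below is only an example.

    #include <stdio.h>
    #include <nccl.h>

    int main(void) {
        int version = 0;
        ncclGetVersion(&version);
        printf("Linked NCCL version code: %d\n", version);

        /* Compare against an arbitrary minimum version using NCCL_VERSION. */
        if (version < NCCL_VERSION(2, 19, 0)) {
            printf("NCCL is older than 2.19.0\n");
        }
        return 0;
    }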
ncclGetUniqueId¶
- ncclResult_t
ncclGetUniqueId(ncclUniqueId* uniqueId)¶
Generates an Id to be used in ncclCommInitRank. ncclGetUniqueId should be called once when creating a communicator and the Id should be distributed to all ranks in the communicator before calling ncclCommInitRank. uniqueId should point to a ncclUniqueId object allocated by the user.
ncclCommInitRank¶
- ncclResult_t
ncclCommInitRank(ncclComm_t* comm, int nranks, ncclUniqueId commId, int rank)¶
Creates a new communicator (multi thread/process version). rank must be between 0 and nranks-1 and unique within a communicator clique. Each rank is associated to a CUDA device, which has to be set before calling ncclCommInitRank. ncclCommInitRank implicitly synchronizes with other ranks, hence it must be called by different threads/processes or used within ncclGroupStart/ncclGroupEnd.
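A minimal sketch of the usual multi-process flow, assuming one process per GPU and MPI as the out-of-band transport for the unique Id (any other broadcast mechanism works equally well):

    #include <mpi.h>
    #include <nccl.h>
    #include <cuda_runtime.h>

    int main(int argc, char* argv[]) {
        MPI_Init(&argc, &argv);
        int rank, nranks;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nranks);

        /* Rank 0 creates the unique Id and distributes it out-of-band. */
        ncclUniqueId id;
        if (rank == 0) ncclGetUniqueId(&id);
        MPI_Bcast(&id, sizeof(id), MPI_BYTE, 0, MPI_COMM_WORLD);

        /* Each rank selects its CUDA device before creating the communicator. */
        cudaSetDevice(rank % 8);          /* assumes at most 8 GPUs per node */
        ncclComm_t comm;
        ncclCommInitRank(&comm, nranks, id, rank);

        /* ... collective operations ... */

        ncclCommDestroy(comm);
        MPI_Finalize();
        return 0;
    }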
ncclCommInitAll¶
- ncclResult_t
ncclCommInitAll(ncclComm_t* comms, int ndev, const int* devlist)¶
Creates a clique of communicators (single process version) in a blocking way. This is a convenience function to create a single-process communicator clique. Returns an array of ndev newly initialized communicators in comms. comms should be pre-allocated with size at least ndev*sizeof(ncclComm_t). devlist defines the CUDA devices associated with each rank. If devlist is NULL, the first ndev CUDA devices are used, in order.
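A minimal single-process sketch, assuming four visible GPUs:

    #include <nccl.h>

    int main(void) {
        const int ndev = 4;                    /* assumes 4 visible CUDA devices */
        int devs[4] = {0, 1, 2, 3};
        ncclComm_t comms[4];

        /* One communicator per device, all managed by this single process.
         * Passing NULL instead of devs would use devices 0..ndev-1 in order. */
        ncclCommInitAll(comms, ndev, devs);

        /* ... launch collectives on each communicator from this process ... */

        for (int i = 0; i < ndev; i++) ncclCommDestroy(comms[i]);
        return 0;
    }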
ncclCommInitRankConfig¶
- ncclResult_t
ncclCommInitRankConfig(ncclComm_t* comm, int nranks, ncclUniqueId commId, int rank, ncclConfig_t* config)¶
This function works the same way as ncclCommInitRank but accepts a configuration argument of extra attributes for the communicator. If config is passed as NULL, the communicator will have the default behavior, as if ncclCommInitRank was called.
See the Creating a communicator with options section for details on configuration options.
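For example, a non-blocking communicator can be requested through the configuration structure. The fragment below is a sketch; it assumes id, rank and nranks were obtained as in the ncclCommInitRank example above.

    ncclConfig_t config = NCCL_CONFIG_INITIALIZER;
    config.blocking = 0;                 /* request a non-blocking communicator */

    ncclComm_t comm;
    ncclCommInitRankConfig(&comm, nranks, id, rank, &config);

    /* With a non-blocking communicator, initialization may still be in
     * progress when the call returns; poll until it completes. */
    ncclResult_t state;
    do {
        ncclCommGetAsyncError(comm, &state);
    } while (state == ncclInProgress);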
ncclCommInitRankScalable¶
- ncclResult_t
ncclCommInitRankScalable(ncclComm_t* newcomm, int nranks, int myrank, int nId, ncclUniqueId* commIds, ncclConfig_t* config)¶
This function works the same way as ncclCommInitRankConfig but accepts a list of ncclUniqueIds instead of a single one. If only one ncclUniqueId is passed, the communicator will be initialized as if ncclCommInitRankConfig was called. The provided ncclUniqueIds will all be used to initialize the single communicator given in argument.
See the Creating a communicator with options section for details on how to create and distribute the list of ncclUniqueIds.
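A sketch of one possible flow, again assuming MPI is initialized and rank/nranks are known as in the earlier examples: the first nId ranks each generate an Id, every Id is broadcast to all ranks, and all ranks then pass the full list. The choice of nId = 2 is arbitrary.

    int nId = 2;                              /* number of Ids; arbitrary choice */
    ncclUniqueId ids[2];

    /* The first nId ranks each create one Id, then broadcast it to everyone. */
    for (int i = 0; i < nId; i++) {
        if (rank == i) ncclGetUniqueId(&ids[i]);
        MPI_Bcast(&ids[i], sizeof(ncclUniqueId), MPI_BYTE, i, MPI_COMM_WORLD);
    }

    ncclConfig_t config = NCCL_CONFIG_INITIALIZER;
    ncclComm_t comm;
    ncclCommInitRankScalable(&comm, nranks, rank, nId, ids, &config);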
ncclCommSplit¶
- ncclResult_t
ncclCommSplit(ncclComm_t comm, int color, int key, ncclComm_t* newcomm, ncclConfig_t* config)¶
ncclCommSplit is a collective function that creates a set of new communicators from an existing one. Ranks which pass the same color value will be part of the same group; color must be a non-negative value. If it is passed as NCCL_SPLIT_NOCOLOR, the rank will not be part of any group and NULL is returned as newcomm. The value of key determines the rank order: a smaller key means a smaller rank in the new communicator. If keys are equal between ranks, the rank in the original communicator is used to order ranks. If the new communicator needs to have a special configuration, it can be passed as config; otherwise, setting config to NULL will make the new communicator inherit the original communicator's configuration. When splitting, there should not be any outstanding NCCL operations on comm. Otherwise, it might cause a deadlock.
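For example, an existing communicator can be split into even and odd groups. This is a sketch assuming comm is an already initialized communicator:

    int rank;
    ncclCommUserRank(comm, &rank);

    /* Even ranks form one group, odd ranks another; key keeps the relative order. */
    ncclComm_t subComm;
    int color = rank % 2;
    ncclCommSplit(comm, color, rank, &subComm, NULL);   /* NULL: inherit config */

    /* A rank that should not join any new communicator would instead pass
     * NCCL_SPLIT_NOCOLOR as color and receive NULL in subComm. */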
ncclCommShrink¶
- ncclResult_t
ncclCommShrink(ncclComm_t comm, int* excludeRanksList, int excludeRanksCount, ncclComm_t* newcomm, ncclConfig_t* config, int shrinkFlags)¶
The ncclCommShrink function creates a new communicator by removing specified ranks from an existing communicator. It is a collective function that must be called by all participating ranks in the newly created communicator. Ranks that are part of excludeRanksList should not call this function. The original ranks listed in excludeRanksList (of size excludeRanksCount) will be excluded from the new communicator. Within the new communicator, ranks will be updated to maintain a contiguous set of ids. If the new communicator needs a special configuration, it can be passed as config; otherwise, setting config to NULL will make the new communicator inherit the configuration of the parent communicator.
The shrinkFlags parameter controls the behavior of the operation. Use NCCL_SHRINK_DEFAULT (or 0) for normal operation, or NCCL_SHRINK_ABORT when shrinking after an error on the parent communicator. Specifically, when using NCCL_SHRINK_DEFAULT, there should not be any outstanding NCCL operations on comm to avoid potential deadlocks. Further, if the parent communicator has the flag config.shrinkShare set to 1, NCCL will reuse the parent communicator resources. On the other hand, when using NCCL_SHRINK_ABORT, NCCL will automatically abort any outstanding operations on the parent communicator, and no resources will be shared between the parent and the newly created communicator.
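As a sketch, removing a single rank (rank 3 here, chosen arbitrarily) from an existing communicator comm could look as follows; every surviving rank calls the function with the same exclusion list.

    int excluded[] = {3};                      /* ranks to remove from comm */
    ncclComm_t shrunkComm;

    /* Normal shrink: no outstanding operations should be pending on comm. */
    ncclCommShrink(comm, excluded, 1, &shrunkComm, NULL, NCCL_SHRINK_DEFAULT);

    /* After an error on comm, NCCL_SHRINK_ABORT could be used instead so that
     * outstanding operations are aborted before the new communicator is built. */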
ncclCommRevoke¶
- ncclResult_t
ncclCommRevoke(ncclComm_t comm, int revokeFlags)¶
Revokes in-flight operations on a communicator without destroying resources. Successful return may be ncclInProgress (non-blocking) while revocation completes asynchronously; applications can query ncclCommGetAsyncError until it returns ncclSuccess.
revokeFlags must be set to NCCL_REVOKE_DEFAULT (0). Other values are reserved for future use.
After revoke completes, the communicator is quiesced and safe for destroy, split, and shrink. Launching new collectives on a revoked communicator returns ncclInvalidUsage. Calling ncclCommFinalize after revoke is not supported. Resource sharing via splitShare/shrinkShare is disabled when the parent communicator is revoked.
ncclCommFinalize¶
- ncclResult_t
ncclCommFinalize(ncclComm_t comm)¶
Finalize a communicator object comm. When the communicator is marked as nonblocking, ncclCommFinalize is a nonblocking function. Successful return from it will set the communicator state as ncclInProgress and indicates the communicator is under finalization, where all uncompleted operations and the network-related resources are being flushed and freed. Once all NCCL operations are complete, the communicator will transition to the ncclSuccess state. Users can query that state with ncclCommGetAsyncError.
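A sketch of the recommended teardown sequence for a (possibly non-blocking) communicator, assuming comm is an existing communicator:

    ncclResult_t state;

    ncclCommFinalize(comm);

    /* For a non-blocking communicator, finalization may complete asynchronously;
     * poll until the communicator reaches ncclSuccess. */
    do {
        ncclCommGetAsyncError(comm, &state);
    } while (state == ncclInProgress);

    ncclCommDestroy(comm);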
ncclCommDestroy¶
- ncclResult_t
ncclCommDestroy(ncclComm_t comm)¶
Destroy a communicator object comm. ncclCommDestroy only frees the local resources that are allocated to the communicator object comm if ncclCommFinalize was previously called on the communicator; otherwise, ncclCommDestroy will call ncclCommFinalize internally. If ncclCommFinalize is called by users, users should guarantee that the state of the communicator becomes ncclSuccess before calling ncclCommDestroy. In all cases, the communicator should no longer be accessed after ncclCommDestroy returns. It is recommended that users call ncclCommFinalize and then ncclCommDestroy. This function is an intra-node collective call, which all ranks on the same node should call to avoid a hang.
ncclCommAbort¶
- ncclResult_t
ncclCommAbort(ncclComm_t comm)¶
ncclCommAbort frees resources that are allocated to a communicator object comm and aborts any uncompleted operations before destroying the communicator. All active ranks are required to call this function in order to abort the NCCL communicator successfully. For more use cases, please check Fault Tolerance.
ncclCommGetAsyncError¶
- ncclResult_t
ncclCommGetAsyncError(ncclComm_t comm, ncclResult_t* asyncError)¶
Queries the progress and potential errors of asynchronous NCCL operations. Operations which do not require a stream argument (e.g. ncclCommFinalize) can be considered complete as soon as the function returns ncclSuccess; operations with a stream argument (e.g. ncclAllReduce) will return ncclSuccess as soon as the operation is posted on the stream but may also report errors through ncclCommGetAsyncError() until they are completed. If the return code of any NCCL function is ncclInProgress, it means the operation is in the process of being enqueued in the background, and users must query the states of the communicators until all the states become ncclSuccess before calling another NCCL function. Before the states change into ncclSuccess, users should not issue CUDA kernels to the streams being used by NCCL. If there has been an error on the communicator, the user should destroy the communicator with ncclCommAbort(). If an error occurs on the communicator, nothing can be assumed about the completion or correctness of operations enqueued on that communicator.
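A sketch of a typical polling loop after issuing asynchronous operations, aborting the communicator when an error is reported; waitOrAbort is a hypothetical helper, not an NCCL function.

    #include <stdio.h>
    #include <nccl.h>

    /* Hypothetical helper: wait for a communicator to leave ncclInProgress
     * and abort it if an asynchronous error has been recorded. */
    static void waitOrAbort(ncclComm_t comm) {
        ncclResult_t asyncErr;
        do {
            ncclCommGetAsyncError(comm, &asyncErr);
        } while (asyncErr == ncclInProgress);

        if (asyncErr != ncclSuccess) {
            fprintf(stderr, "NCCL async error: %s\n", ncclGetErrorString(asyncErr));
            ncclCommAbort(comm);          /* all active ranks must call abort */
        }
    }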
ncclCommCount¶
- ncclResult_t
ncclCommCount(const ncclComm_t comm, int* count)¶
Returns in count the number of ranks in the NCCL communicator comm.
ncclCommCuDevice¶
- ncclResult_t
ncclCommCuDevice(const ncclComm_t comm, int* device)¶
Returns in device the CUDA device associated with the NCCL communicator comm.
ncclCommUserRank¶
- ncclResult_t
ncclCommUserRank(const ncclComm_t comm, int* rank)¶
Returns in rank the rank of the caller in the NCCL communicator comm.
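For instance, these three query functions can be combined to describe a communicator, assuming comm has already been created:

    int nranks, device, rank;
    ncclCommCount(comm, &nranks);
    ncclCommCuDevice(comm, &device);
    ncclCommUserRank(comm, &rank);
    printf("rank %d of %d, bound to CUDA device %d\n", rank, nranks, device);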
ncclCommRegister¶
- ncclResult_t
ncclCommRegister(const ncclComm_t comm, void* buff, size_t size, void** handle)¶
Registers the buffer buff with size under communicator comm for zero-copy communication; handle is returned for future deregistration. See buff and size requirements and more instructions in User Buffer Registration.
ncclCommDeregister¶
- ncclResult_t
ncclCommDeregister(const ncclComm_t comm, void* handle)¶
Deregisters the buffer represented by handle under communicator comm.
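A sketch of the register/use/deregister lifecycle, using ncclMemAlloc (described below) to obtain a buffer that satisfies NCCL's requirements and ncclMemFree, its companion free function, afterwards; the 64 MiB size is arbitrary and comm is assumed to exist.

    void* sendbuf = NULL;
    void* handle = NULL;
    size_t size = 1 << 26;                     /* 64 MiB, chosen arbitrarily */

    ncclMemAlloc(&sendbuf, size);
    ncclCommRegister(comm, sendbuf, size, &handle);

    /* ... collectives using sendbuf on comm ... */

    ncclCommDeregister(comm, handle);
    ncclMemFree(sendbuf);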
ncclCommWindowRegister¶
- ncclResult_t
ncclCommWindowRegister(ncclComm_t comm, void* buff, size_t size, ncclWindow_t* win, int winFlags)¶
Collectively register the local buffer buff with size under communicator comm into an NCCL window. Since this is a collective call, every rank in the communicator needs to participate in the registration, and size by default needs to be equal among the ranks. win is returned for future deregistration (if called within a group, the value may not be filled in until ncclGroupEnd() has completed). See buff requirements and more instructions in User Buffer Registration. Users can also pass different window flags to control the registration behavior. For more information on window flags, please refer to Window Registration Flags.
ncclCommWindowDeregister¶
- ncclResult_t
ncclCommWindowDeregister(ncclComm_t comm, ncclWindow_t win)¶
Deregisters the NCCL window represented by win under communicator comm. Deregistration is local to the rank, and the caller needs to make sure the corresponding buffer within the window is not being accessed by any NCCL operation.
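A sketch of window registration and deregistration across all ranks of a communicator comm. NCCL_WIN_COLL_SYMMETRIC is used here only as an assumed example flag; consult the Window Registration Flags section for the flags available in your NCCL version.

    void* buf = NULL;
    size_t size = 1 << 26;                     /* same size on every rank */
    ncclWindow_t win;

    ncclMemAlloc(&buf, size);
    /* Every rank in comm must make this call; the flag value is an assumption. */
    ncclCommWindowRegister(comm, buf, size, &win, NCCL_WIN_COLL_SYMMETRIC);

    /* ... collectives using buf ... */

    ncclCommWindowDeregister(comm, win);
    ncclMemFree(buf);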
ncclMemAlloc¶
- ncclResult_t
ncclMemAlloc(void **ptr, size_t size)¶
Allocates a GPU buffer of size bytes. The head address of the allocated buffer is returned in ptr, and the actual allocated size can be larger than requested because of the buffer granularity requirements of the various NCCL optimizations.
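A minimal sketch pairing ncclMemAlloc with ncclMemFree (its companion free function, not covered in this section); the requested size is arbitrary.

    void* buf = NULL;
    size_t size = 1 << 26;                     /* request 64 MiB */

    ncclMemAlloc(&buf, size);                  /* may round the size up internally */
    /* ... use buf as a send/receive buffer, optionally registering it ... */
    ncclMemFree(buf);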