Point To Point Communication Functions

(Since NCCL 2.7) Point-to-point communication primitives need to be used when ranks need to send and receive arbitrary data from each other, which cannot be expressed as a broadcast or allgather, i.e. when all data sent and received is different.

ncclSend

ncclResult_t ncclSend(const void* sendbuff, size_t count, ncclDataType_t datatype, int peer, ncclComm_t comm, cudaStream_t stream)

Send data from sendbuff to rank peer.

Rank peer needs to call ncclRecv with the same datatype and the same count as this rank.

This operation is blocking for the GPU. If multiple ncclSend() and ncclRecv() operations need to progress concurrently to complete, they must be fused within a ncclGroupStart()/ncclGroupEnd() section.

Related links: Point-to-point communication.

ncclRecv

ncclResult_t ncclRecv(void* recvbuff, size_t count, ncclDataType_t datatype, int peer, ncclComm_t comm, cudaStream_t stream)

Receive data from rank peer into recvbuff.

Rank peer needs to call ncclSend with the same datatype and the same count as this rank.

This operation is blocking for the GPU. If multiple ncclSend() and ncclRecv() operations need to progress concurrently to complete, they must be fused within a ncclGroupStart()/ncclGroupEnd() section.

Related links: Point-to-point communication.
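Because every ncclSend must be matched by an ncclRecv on the peer rank, it can help to derive both peer arguments from one rule so that the pairs always line up. The following is a minimal sketch of such a rule for a ring exchange, where each rank sends to its successor and receives from its predecessor. The RingPeers type and the ring_peers() helper are hypothetical names introduced for illustration; they are not part of NCCL, and the code runs on the host without calling NCCL.

```c
#include <assert.h>

/* Hypothetical helper type, not part of NCCL: the two peers of a rank in a ring. */
typedef struct {
    int send_to;   /* value to pass as `peer` to ncclSend */
    int recv_from; /* value to pass as `peer` to ncclRecv */
} RingPeers;

/* Compute the ring neighbors of `rank` among `nranks` ranks.
 * Rank r sends to (r + 1) mod nranks and receives from (r - 1) mod nranks,
 * so the send issued by r is matched by the receive issued by r's successor. */
static RingPeers ring_peers(int rank, int nranks) {
    assert(nranks > 0 && rank >= 0 && rank < nranks);
    RingPeers p;
    p.send_to = (rank + 1) % nranks;
    p.recv_from = (rank + nranks - 1) % nranks; /* add nranks to avoid a negative operand */
    return p;
}
```

Under these assumptions, each rank would pass send_to as the peer of its ncclSend and recv_from as the peer of its ncclRecv, with the same count and datatype on both sides. Since every rank both sends and receives in this pattern, the two calls need to progress concurrently and would therefore be placed between ncclGroupStart() and ncclGroupEnd().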