This section describes the virtual memory management functions of the low-level CUDA driver application programming interface.
CUDA_SUCCESS, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_PERMITTED, CUDA_ERROR_NOT_SUPPORTED
Frees a virtual address range reserved by cuMemAddressReserve. The size must match the size passed to cuMemAddressReserve, and ptr must match the address returned by cuMemAddressReserve.
See also:
CUDA_SUCCESS, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_OUT_OF_MEMORY, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_PERMITTED, CUDA_ERROR_NOT_SUPPORTED
Reserves a virtual address range based on the given parameters and returns the starting address of the range in ptr. This API requires a system that supports UVA. The size and address parameters must be a multiple of the host page size, and the alignment must be a power of two, or zero for default alignment. If addr is 0, the driver chooses the address at which to place the start of the reservation; when it is non-zero, the driver treats it as a hint about where to place the reservation.
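For illustration, a minimal sketch of reserving and later freeing a range (error checking omitted; the 64 MiB size is an arbitrary example):

```cpp
#include <cuda.h>

// Reserve a 64 MiB virtual address range; addr == 0 lets the driver
// choose where to place the reservation.
CUdeviceptr ptr = 0;
size_t size = 64ull << 20;  // multiple of the host page size
cuMemAddressReserve(&ptr, size, 0 /* default alignment */, 0 /* no hint */, 0);

// ... map physical allocations into [ptr, ptr + size) with cuMemMap ...

// ptr and size must match the original reservation exactly.
cuMemAddressFree(ptr, size);
```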
See also:
CUDA_SUCCESS, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_OUT_OF_MEMORY, CUDA_ERROR_INVALID_DEVICE, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_PERMITTED, CUDA_ERROR_NOT_SUPPORTED
This creates a memory allocation on the target device specified through the prop structure. The created allocation will not have any device or host mappings. The generic memory handle for the allocation can be mapped to the address space of the calling process via cuMemMap. This handle cannot be transmitted directly to other processes (see cuMemExportToShareableHandle). On Windows, the caller must also pass an LPSECURITYATTRIBUTES in prop to be associated with this handle, which limits or allows access to this handle for a recipient process (see CUmemAllocationProp::win32HandleMetaData for more). The size of this allocation must be a multiple of the value given via cuMemGetAllocationGranularity with the CU_MEM_ALLOC_GRANULARITY_MINIMUM flag.

To create a CPU allocation that doesn't target any specific NUMA nodes, applications must set CUmemAllocationProp::CUmemLocation::type to CU_MEM_LOCATION_TYPE_HOST. CUmemAllocationProp::CUmemLocation::id is ignored for HOST allocations. HOST allocations are not IPC capable, and CUmemAllocationProp::requestedHandleTypes must be 0; any other value will result in CUDA_ERROR_INVALID_VALUE.

To create a CPU allocation targeting a specific host NUMA node, applications must set CUmemAllocationProp::CUmemLocation::type to CU_MEM_LOCATION_TYPE_HOST_NUMA, and CUmemAllocationProp::CUmemLocation::id must specify the NUMA ID of the CPU. On systems where NUMA is not available, CUmemAllocationProp::CUmemLocation::id must be set to 0. Specifying CU_MEM_LOCATION_TYPE_HOST_NUMA_CURRENT as the CUmemLocation::type will result in CUDA_ERROR_INVALID_VALUE.
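As a minimal sketch of the flow described above (error checking omitted; `desiredBytes` is a placeholder for the application's requested size):

```cpp
#include <cuda.h>

// Describe a pinned allocation physically resident on device 0.
CUmemAllocationProp prop = {};
prop.type = CU_MEM_ALLOCATION_TYPE_PINNED;
prop.location.type = CU_MEM_LOCATION_TYPE_DEVICE;
prop.location.id = 0;  // device ordinal

// The size passed to cuMemCreate must be a multiple of the minimum granularity.
size_t gran = 0;
cuMemGetAllocationGranularity(&gran, &prop, CU_MEM_ALLOC_GRANULARITY_MINIMUM);
size_t size = ((desiredBytes + gran - 1) / gran) * gran;  // round up

CUmemGenericAllocationHandle handle;
cuMemCreate(&handle, size, &prop, 0);
// ... map with cuMemMap / cuMemSetAccess; release with cuMemRelease ...
```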
Applications that intend to use CU_MEM_HANDLE_TYPE_FABRIC based memory sharing must ensure that: (1) the `nvidia-caps-imex-channels` character device is created by the driver and is listed under /proc/devices, and (2) at least one IMEX channel file is accessible by the user launching the application.
When exporter and importer CUDA processes have been granted access to the same IMEX channel, they can securely share memory.
The IMEX channel security model works on a per-user basis: all processes under a user can share memory if the user has access to a valid IMEX channel. When multi-user isolation is desired, a separate IMEX channel is required for each user.
These channel files exist in /dev/nvidia-caps-imex-channels/channel* and can be created using standard OS native calls such as mknod on Linux. For example, to create channel0 with the major number from /proc/devices, users can execute the following command: `mknod /dev/nvidia-caps-imex-channels/channel0 c <major number> 0`
If CUmemAllocationProp::allocFlags::usage contains the CU_MEM_CREATE_USAGE_TILE_POOL flag, then the memory allocation is intended only to be used as a backing tile pool for sparse CUDA arrays and sparse CUDA mipmapped arrays (see cuMemMapArrayAsync).
Note that this function may also return error codes from previous, asynchronous launches.
See also:
cuMemRelease, cuMemExportToShareableHandle, cuMemImportFromShareableHandle
CUDA_SUCCESS, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_PERMITTED, CUDA_ERROR_NOT_SUPPORTED
Given a CUDA memory handle, create a shareable memory allocation handle that can be used to share the memory with other processes. The recipient process can convert the shareable handle back into a CUDA memory handle using cuMemImportFromShareableHandle and map it with cuMemMap. The implementation of what this handle is and how it can be transferred is defined by the requested handle type in handleType.
Once all shareable handles are closed and the allocation is released, the allocated memory referenced will be released back to the OS and uses of the CUDA handle afterward will lead to undefined behavior.
This API can also be used in conjunction with other APIs (e.g. Vulkan, OpenGL) that support importing memory from the shareable type.
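A minimal sketch of exporting an allocation as a POSIX file descriptor, assuming the allocation was created with CUmemAllocationProp::requestedHandleTypes set to CU_MEM_HANDLE_TYPE_POSIX_FILE_DESCRIPTOR (error checking omitted):

```cpp
#include <cuda.h>

// handle: a CUmemGenericAllocationHandle from cuMemCreate whose properties
// requested CU_MEM_HANDLE_TYPE_POSIX_FILE_DESCRIPTOR.
int fd = -1;
cuMemExportToShareableHandle(&fd, handle,
                             CU_MEM_HANDLE_TYPE_POSIX_FILE_DESCRIPTOR, 0);
// ... transfer fd to the recipient process (e.g. over a Unix-domain socket),
// then close(fd) locally once it has been sent ...
```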
See also:
CUDA_SUCCESS, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_PERMITTED, CUDA_ERROR_NOT_SUPPORTED
Calculates either the minimal or recommended granularity for a given allocation specification and returns it in granularity. This granularity can be used as a multiple for alignment, size, or address mapping.
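For example, both granularities can be queried for the same allocation properties (a sketch; `prop` is the CUmemAllocationProp that will later be passed to cuMemCreate):

```cpp
size_t minGran = 0, recGran = 0;
// Smallest legal multiple for sizes, offsets and mapping addresses.
cuMemGetAllocationGranularity(&minGran, &prop, CU_MEM_ALLOC_GRANULARITY_MINIMUM);
// Larger multiple recommended for best performance.
cuMemGetAllocationGranularity(&recGran, &prop, CU_MEM_ALLOC_GRANULARITY_RECOMMENDED);
```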
See also:
CUDA_SUCCESS, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_PERMITTED, CUDA_ERROR_NOT_SUPPORTED
If the current process cannot support the memory described by this shareable handle, this API will return CUDA_ERROR_NOT_SUPPORTED.
If shHandleType is CU_MEM_HANDLE_TYPE_FABRIC and the importer process has not been granted access to the same IMEX channel as the exporter process, this API will return CUDA_ERROR_NOT_PERMITTED.
Importing shareable handles exported from some graphics APIs (Vulkan, OpenGL, etc.) created on devices under an SLI group may not be supported, and thus this API will return CUDA_ERROR_NOT_SUPPORTED. There is no guarantee that the contents of handle will be the same CUDA memory handle for the same given OS shareable handle, or the same underlying allocation.
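A sketch of the recipient side, assuming a POSIX file descriptor `fd` received from the exporting process (for this handle type the descriptor is passed cast to void*; error checking omitted):

```cpp
#include <cuda.h>
#include <cstdint>

CUmemGenericAllocationHandle imported;
cuMemImportFromShareableHandle(&imported,
                               (void*)(uintptr_t)fd,
                               CU_MEM_HANDLE_TYPE_POSIX_FILE_DESCRIPTOR);
// Map with cuMemMap and set accessibility with cuMemSetAccess before use;
// eventually drop the reference with cuMemRelease(imported).
```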
See also:
CUDA_SUCCESS, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_INVALID_DEVICE, CUDA_ERROR_OUT_OF_MEMORY, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_PERMITTED, CUDA_ERROR_NOT_SUPPORTED, CUDA_ERROR_ILLEGAL_STATE
Maps size bytes of the memory represented by handle, starting at byte offset, into the address range [addr, addr + size]. This range must be an address reservation previously reserved with cuMemAddressReserve, and offset + size must be less than the size of the memory allocation. ptr, size, and offset must each be a multiple of the value given via cuMemGetAllocationGranularity with the CU_MEM_ALLOC_GRANULARITY_MINIMUM flag. If handle represents a multicast object, ptr, size and offset must be aligned to the value returned by cuMulticastGetGranularity with the flag CU_MULTICAST_MINIMUM_GRANULARITY. For best performance however, it is recommended that ptr, size and offset be aligned to the value returned by cuMulticastGetGranularity with the flag CU_MULTICAST_RECOMMENDED_GRANULARITY.
When handle represents a multicast object, this call may return CUDA_ERROR_ILLEGAL_STATE if the system configuration is in an illegal state. In such cases, to continue using multicast, verify that the system configuration is in a valid state and all required driver daemons are running properly.
Please note that calling cuMemMap does not make the address accessible; the caller needs to update the accessibility of a contiguous mapped VA range by calling cuMemSetAccess.
Once a recipient process obtains a shareable memory handle from cuMemImportFromShareableHandle, the process must use cuMemMap to map the memory into its address ranges before setting accessibility with cuMemSetAccess.
cuMemMap can only create mappings on VA range reservations that are not currently mapped.
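Putting the pieces together, a minimal sketch of the reserve/map/access flow (error checking omitted; `handle` and `size` come from the cuMemCreate sketch above):

```cpp
#include <cuda.h>

CUdeviceptr va = 0;
cuMemAddressReserve(&va, size, 0, 0, 0);        // reserve a VA range
cuMemMap(va, size, 0 /* offset */, handle, 0);  // back it with physical memory

// Mappings start with no accessibility; grant device 0 read-write access.
CUmemAccessDesc access = {};
access.location.type = CU_MEM_LOCATION_TYPE_DEVICE;
access.location.id = 0;
access.flags = CU_MEM_ACCESS_FLAGS_PROT_READWRITE;
cuMemSetAccess(va, size, &access, 1);

// va can now be used like any device pointer.
```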
Note that this function may also return error codes from previous, asynchronous launches.
See also:
cuMemUnmap, cuMemSetAccess, cuMemCreate, cuMemAddressReserve, cuMemImportFromShareableHandle
Performs map or unmap operations on subregions of sparse CUDA arrays and sparse CUDA mipmapped arrays. Each operation is specified by a CUarrayMapInfo entry in the mapInfoList array of size count. The structure CUarrayMapInfo is defined as follows:
```cpp
typedef struct CUarrayMapInfo_st {
    CUresourcetype resourceType;
    union {
        CUmipmappedArray mipmap;
        CUarray array;
    } resource;
    CUarraySparseSubresourceType subresourceType;
    union {
        struct {
            unsigned int level;
            unsigned int layer;
            unsigned int offsetX;
            unsigned int offsetY;
            unsigned int offsetZ;
            unsigned int extentWidth;
            unsigned int extentHeight;
            unsigned int extentDepth;
        } sparseLevel;
        struct {
            unsigned int layer;
            unsigned long long offset;
            unsigned long long size;
        } miptail;
    } subresource;
    CUmemOperationType memOperationType;
    CUmemHandleType memHandleType;
    union {
        CUmemGenericAllocationHandle memHandle;
    } memHandle;
    unsigned long long offset;
    unsigned int deviceBitMask;
    unsigned int flags;
    unsigned int reserved[2];
} CUarrayMapInfo;
```

where CUarrayMapInfo::resourceType specifies the type of resource to be operated on. If CUarrayMapInfo::resourceType is set to CUresourcetype::CU_RESOURCE_TYPE_ARRAY then CUarrayMapInfo::resource::array must be set to a valid sparse CUDA array handle. The CUDA array must be either a 2D, 2D layered or 3D CUDA array and must have been allocated using cuArrayCreate or cuArray3DCreate with the flag CUDA_ARRAY3D_SPARSE or CUDA_ARRAY3D_DEFERRED_MAPPING. For CUDA arrays obtained using cuMipmappedArrayGetLevel, CUDA_ERROR_INVALID_VALUE will be returned. If CUarrayMapInfo::resourceType is set to CUresourcetype::CU_RESOURCE_TYPE_MIPMAPPED_ARRAY then CUarrayMapInfo::resource::mipmap must be set to a valid sparse CUDA mipmapped array handle. The CUDA mipmapped array must be either a 2D, 2D layered or 3D CUDA mipmapped array and must have been allocated using cuMipmappedArrayCreate with the flag CUDA_ARRAY3D_SPARSE or CUDA_ARRAY3D_DEFERRED_MAPPING.
CUarrayMapInfo::subresourceType specifies the type of subresource within the resource. CUarraySparseSubresourceType_enum is defined as:
```cpp
typedef enum CUarraySparseSubresourceType_enum {
    CU_ARRAY_SPARSE_SUBRESOURCE_TYPE_SPARSE_LEVEL = 0,
    CU_ARRAY_SPARSE_SUBRESOURCE_TYPE_MIPTAIL = 1
} CUarraySparseSubresourceType;
```

where CUarraySparseSubresourceType::CU_ARRAY_SPARSE_SUBRESOURCE_TYPE_SPARSE_LEVEL indicates a sparse miplevel which spans at least one tile in every dimension. The remaining miplevels which are too small to span at least one tile in any dimension constitute the mip tail region, as indicated by the CUarraySparseSubresourceType::CU_ARRAY_SPARSE_SUBRESOURCE_TYPE_MIPTAIL subresource type.
If CUarrayMapInfo::subresourceType is set to CUarraySparseSubresourceType::CU_ARRAY_SPARSE_SUBRESOURCE_TYPE_SPARSE_LEVEL then the CUarrayMapInfo::subresource::sparseLevel struct must contain valid array subregion offsets and extents. CUarrayMapInfo::subresource::sparseLevel::offsetX, CUarrayMapInfo::subresource::sparseLevel::offsetY and CUarrayMapInfo::subresource::sparseLevel::offsetZ must specify valid X, Y and Z offsets respectively. CUarrayMapInfo::subresource::sparseLevel::extentWidth, CUarrayMapInfo::subresource::sparseLevel::extentHeight and CUarrayMapInfo::subresource::sparseLevel::extentDepth must specify valid width, height and depth extents respectively. These offsets and extents must be aligned to the corresponding tile dimension. For CUDA mipmapped arrays, CUarrayMapInfo::subresource::sparseLevel::level must specify a valid mip level index; otherwise, it must be zero. For layered CUDA arrays and layered CUDA mipmapped arrays, CUarrayMapInfo::subresource::sparseLevel::layer must specify a valid layer index; otherwise, it must be zero. CUarrayMapInfo::subresource::sparseLevel::offsetZ must be zero and CUarrayMapInfo::subresource::sparseLevel::extentDepth must be set to 1 for 2D and 2D layered CUDA arrays and CUDA mipmapped arrays. Tile extents can be obtained by calling cuArrayGetSparseProperties and cuMipmappedArrayGetSparseProperties.
If CUarrayMapInfo::subresourceType is set to CUarraySparseSubresourceType::CU_ARRAY_SPARSE_SUBRESOURCE_TYPE_MIPTAIL then the CUarrayMapInfo::subresource::miptail struct must contain a valid mip tail offset in CUarrayMapInfo::subresource::miptail::offset and size in CUarrayMapInfo::subresource::miptail::size. Both the mip tail offset and the mip tail size must be aligned to the tile size. For layered CUDA mipmapped arrays which don't have the flag CU_ARRAY_SPARSE_PROPERTIES_SINGLE_MIPTAIL set in CUDA_ARRAY_SPARSE_PROPERTIES::flags as returned by cuMipmappedArrayGetSparseProperties, CUarrayMapInfo::subresource::miptail::layer must specify a valid layer index; otherwise, it must be zero.
If CUarrayMapInfo::resource::array or CUarrayMapInfo::resource::mipmap was created with the CUDA_ARRAY3D_DEFERRED_MAPPING flag set, then CUarrayMapInfo::subresourceType and the contents of CUarrayMapInfo::subresource will be ignored.
CUarrayMapInfo::memOperationType specifies the type of operation. CUmemOperationType is defined as:
```cpp
typedef enum CUmemOperationType_enum {
    CU_MEM_OPERATION_TYPE_MAP = 1,
    CU_MEM_OPERATION_TYPE_UNMAP = 2
} CUmemOperationType;
```

If CUarrayMapInfo::memOperationType is set to CUmemOperationType::CU_MEM_OPERATION_TYPE_MAP then the subresource will be mapped onto the tile pool memory specified by CUarrayMapInfo::memHandle at offset CUarrayMapInfo::offset. The tile pool allocation has to be created by specifying the CU_MEM_CREATE_USAGE_TILE_POOL flag when calling cuMemCreate. Also, CUarrayMapInfo::memHandleType must be set to CUmemHandleType::CU_MEM_HANDLE_TYPE_GENERIC. If CUarrayMapInfo::memOperationType is set to CUmemOperationType::CU_MEM_OPERATION_TYPE_UNMAP then an unmapping operation is performed, and CUarrayMapInfo::memHandle must be NULL.
CUarrayMapInfo::deviceBitMask specifies the list of devices that must map or unmap physical memory. Currently, this mask must have exactly one bit set, and the corresponding device must match the device associated with the stream. If CUarrayMapInfo::memOperationType is set to CUmemOperationType::CU_MEM_OPERATION_TYPE_MAP, the device must also match the device associated with the tile pool memory allocation as specified by CUarrayMapInfo::memHandle.
CUarrayMapInfo::flags and CUarrayMapInfo::reserved[] are unused and must be set to zero.
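As an illustrative sketch, the following maps one tile of mip level 0 of a 2D sparse CUDA array onto a tile pool (error checking omitted; `sparseArray`, `tilePool`, `tileWidth`, `tileHeight`, `deviceOrdinal` and `stream` are placeholders; tile extents would come from cuArrayGetSparseProperties):

```cpp
#include <cuda.h>

CUarrayMapInfo info = {};
info.resourceType = CU_RESOURCE_TYPE_ARRAY;
info.resource.array = sparseArray;  // created with CUDA_ARRAY3D_SPARSE

info.subresourceType = CU_ARRAY_SPARSE_SUBRESOURCE_TYPE_SPARSE_LEVEL;
info.subresource.sparseLevel.level = 0;
info.subresource.sparseLevel.offsetX = 0;  // tile-aligned offsets
info.subresource.sparseLevel.offsetY = 0;
info.subresource.sparseLevel.extentWidth  = tileWidth;   // one tile
info.subresource.sparseLevel.extentHeight = tileHeight;
info.subresource.sparseLevel.extentDepth  = 1;           // 2D array

info.memOperationType = CU_MEM_OPERATION_TYPE_MAP;
info.memHandleType = CU_MEM_HANDLE_TYPE_GENERIC;
info.memHandle.memHandle = tilePool;  // created with CU_MEM_CREATE_USAGE_TILE_POOL
info.offset = 0;                      // offset into the tile pool
info.deviceBitMask = 1u << deviceOrdinal;  // must match the stream's device

cuMemMapArrayAsync(&info, 1, stream);
```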
See also:
cuMipmappedArrayCreate, cuArrayCreate, cuArray3DCreate, cuMemCreate, cuArrayGetSparseProperties, cuMipmappedArrayGetSparseProperties
CUDA_SUCCESS, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_PERMITTED, CUDA_ERROR_NOT_SUPPORTED
Frees the memory that was allocated on a device through cuMemCreate.
The memory allocation will be freed when all outstanding mappings to the memory are unmapped and when all outstanding references to the handle (including its shareable counterparts) are also released. The generic memory handle can be freed even when there are still outstanding mappings made with this handle. Each time a recipient process imports a shareable handle, it needs to pair the import with a cuMemRelease for the handle to be freed. If handle is not a valid handle the behavior is undefined.
Note that this function may also return error codes from previous, asynchronous launches.
See also:
CUDA_SUCCESS, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_PERMITTED, CUDA_ERROR_NOT_SUPPORTED
The handle is guaranteed to be the same handle value used to map the memory. If the address requested is not mapped, the function will fail. The returned handle must be released with a corresponding number of calls to cuMemRelease.
The address addr can be any address in a range previously mapped by cuMemMap, and not necessarily the start address.
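A brief sketch, assuming `va` is an address inside a range previously mapped with cuMemMap:

```cpp
#include <cuda.h>

// Recover the allocation handle backing the mapping at va. The returned
// handle holds an extra reference that must be dropped with cuMemRelease.
CUmemGenericAllocationHandle h;
cuMemRetainAllocationHandle(&h, (void*)va);
// ... inspect or export the allocation ...
cuMemRelease(h);
```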
See also:
Given the virtual address range via ptr and size, and the locations in the array given by desc and count, set the access flags for the target locations. The range must be a fully mapped address range containing all allocations created by cuMemMap / cuMemCreate. Users cannot specify CU_MEM_LOCATION_TYPE_HOST_NUMA accessibility for allocations created with other location types. Note: when CUmemAccessDesc::CUmemLocation::type is CU_MEM_LOCATION_TYPE_HOST_NUMA, CUmemAccessDesc::CUmemLocation::id is ignored. When setting the access flags for a virtual address range mapping a multicast object, ptr and size must be aligned to the value returned by cuMulticastGetGranularity with the flag CU_MULTICAST_MINIMUM_GRANULARITY. For best performance however, it is recommended that ptr and size be aligned to the value returned by cuMulticastGetGranularity with the flag CU_MULTICAST_RECOMMENDED_GRANULARITY.
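For example, access can be granted to several locations in one call by passing an array of descriptors (a sketch; `va` and `size` describe a fully mapped range, and two devices are assumed):

```cpp
#include <cuda.h>

CUmemAccessDesc desc[2] = {};
desc[0].location.type = CU_MEM_LOCATION_TYPE_DEVICE;
desc[0].location.id = 0;
desc[0].flags = CU_MEM_ACCESS_FLAGS_PROT_READWRITE;

desc[1].location.type = CU_MEM_LOCATION_TYPE_DEVICE;
desc[1].location.id = 1;
desc[1].flags = CU_MEM_ACCESS_FLAGS_PROT_READWRITE;

cuMemSetAccess(va, size, desc, 2);  // one call covers both locations
```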
Note that this function may also return error codes from previous, asynchronous launches.
This function exhibits synchronous behavior for most use cases.
See also:
CUDA_SUCCESS, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_PERMITTED, CUDA_ERROR_NOT_SUPPORTED
The range must be the entire contiguous address range that was mapped to. In other words, cuMemUnmap cannot unmap a sub-range of an address range mapped by cuMemCreate / cuMemMap. Any backing memory allocations will be freed if there are no existing mappings and there are no unreleased memory handles.
When cuMemUnmap returns successfully, the address range is converted back to an address reservation and can be used for future calls to cuMemMap. Any new mapping to this virtual address will need to have access granted through cuMemSetAccess, as all mappings start with no accessibility set up.
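A sketch of a full teardown, continuing the mapping example above:

```cpp
#include <cuda.h>

cuMemUnmap(va, size);        // must cover the entire mapped range
cuMemRelease(handle);        // backing memory freed once unmapped and released
cuMemAddressFree(va, size);  // return the VA reservation to the driver
```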
Note that this function may also return error codes from previous, asynchronous launches.
This function exhibits synchronous behavior for most use cases.
See also: