This section describes the library management functions of the CUDA runtime application programming interface.
This call sets the value of a specified attribute attr on the kernel kernel for the requested device device to an integer value specified by value. This function returns cudaSuccess if the new value of the attribute could be successfully set. If the set fails, this call will return an error. Not all attributes can have values set. Attempting to set a value on a read-only attribute will result in an error (cudaErrorInvalidValue).
Note that attributes set using cudaFuncSetAttribute() will override the attribute set by this API irrespective of whether the call to cudaFuncSetAttribute() is made before or after this API call. Because of this and the stricter locking requirements mentioned below, it is suggested that this call be used during the initialization path and not on each thread accessing kernel, such as on kernel launches or on the critical path.
Valid values for attr are:
cudaFuncAttributeMaxDynamicSharedMemorySize: The requested maximum size in bytes of dynamically-allocated shared memory. The sum of this value and the function attribute sharedSizeBytes cannot exceed the device attribute cudaDevAttrMaxSharedMemoryPerBlockOptin. The maximal size of requestable dynamic shared memory may differ by GPU architecture.
cudaFuncAttributePreferredSharedMemoryCarveout: On devices where the L1 cache and shared memory use the same hardware resources, this sets the shared memory carveout preference, in percent of the total shared memory. See cudaDevAttrMaxSharedMemoryPerMultiprocessor. This is only a hint, and the driver can choose a different ratio if required to execute the function.
cudaFuncAttributeRequiredClusterWidth: The required cluster width in blocks. The width, height, and depth values must either all be 0 or all be positive. The validity of the cluster dimensions is checked at launch time. If the value is set during compile time, it cannot be set at runtime. Setting it at runtime will return cudaErrorNotPermitted.
cudaFuncAttributeRequiredClusterHeight: The required cluster height in blocks. The width, height, and depth values must either all be 0 or all be positive. The validity of the cluster dimensions is checked at launch time. If the value is set during compile time, it cannot be set at runtime. Setting it at runtime will return cudaErrorNotPermitted.
cudaFuncAttributeRequiredClusterDepth: The required cluster depth in blocks. The width, height, and depth values must either all be 0 or all be positive. The validity of the cluster dimensions is checked at launch time. If the value is set during compile time, it cannot be set at runtime. Setting it at runtime will return cudaErrorNotPermitted.
cudaFuncAttributeNonPortableClusterSizeAllowed: Indicates whether the function can be launched with non-portable cluster size. 1 is allowed, 0 is disallowed.
cudaFuncAttributeClusterSchedulingPolicyPreference: The block scheduling policy of a function. The value type is cudaClusterSchedulingPolicy.
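Two of the constraints in the list above are simple enough to pre-check on the host before calling into the runtime. The following is a minimal sketch of such checks; the helper names clusterDimsConsistent and dynamicSharedFits are hypothetical and not part of the CUDA API, and the runtime remains the authority on what is accepted.

```cpp
#include <cstddef>

// Hypothetical helper (not a CUDA API): the required cluster width, height,
// and depth must either all be 0 or all be positive.
bool clusterDimsConsistent(int width, int height, int depth) {
    const bool allZero = (width == 0 && height == 0 && depth == 0);
    const bool allPositive = (width > 0 && height > 0 && depth > 0);
    return allZero || allPositive;
}

// Hypothetical helper (not a CUDA API): the requested dynamic shared memory
// plus the kernel's static shared memory (sharedSizeBytes) must not exceed
// the device's opt-in per-block limit.
bool dynamicSharedFits(std::size_t requestedDynamicBytes,
                       std::size_t staticBytes,
                       std::size_t optinLimitBytes) {
    // Written as a subtraction so the comparison cannot overflow.
    return staticBytes <= optinLimitBytes &&
           requestedDynamicBytes <= optinLimitBytes - staticBytes;
}
```

In practice the limit would come from querying cudaDevAttrMaxSharedMemoryPerBlockOptin for the target device; it is passed in here so the check stays independent of any device.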
The API has stricter locking requirements than its legacy counterpart cudaFuncSetAttribute() due to device-wide semantics. If multiple threads try to set the same attribute on the same device simultaneously, the resulting attribute value depends on the interleaving chosen by the OS scheduler and on memory consistency.
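As a rough illustration of the initialization-path advice, the sketch below raises the dynamic shared-memory limit once per device at start-up instead of before every launch. It assumes the call described here is the runtime entry point cudaKernelSetAttributeForDevice(kernel, attr, value, device); the helper name configureDynamicSharedMemory and its parameters are hypothetical, and for brevity the check ignores the kernel's static shared memory.

```cpp
#include <cuda_runtime.h>
#include <cstdio>

// Hypothetical helper: intended to be called once during initialization,
// from a single thread, for a kernel handle obtained from a loaded library.
cudaError_t configureDynamicSharedMemory(cudaKernel_t kernel, int requestedBytes) {
    int deviceCount = 0;
    cudaError_t err = cudaGetDeviceCount(&deviceCount);
    if (err != cudaSuccess) return err;

    for (int device = 0; device < deviceCount; ++device) {
        // Upper bound on opt-in shared memory per block for this device.
        int optinLimit = 0;
        err = cudaDeviceGetAttribute(&optinLimit,
                                     cudaDevAttrMaxSharedMemoryPerBlockOptin,
                                     device);
        if (err != cudaSuccess) return err;

        if (requestedBytes > optinLimit) {
            std::fprintf(stderr, "device %d: %d bytes exceeds opt-in limit %d\n",
                         device, requestedBytes, optinLimit);
            return cudaErrorInvalidValue;
        }

        // Assumed signature: (kernel, attribute, value, device).
        err = cudaKernelSetAttributeForDevice(
            kernel, cudaFuncAttributeMaxDynamicSharedMemorySize,
            requestedBytes, device);
        if (err != cudaSuccess) return err;
    }
    return cudaSuccess;
}
```

Keeping this in a single start-up routine also sidesteps the race described above, since no two threads set the same attribute on the same device concurrently.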
See also:
cudaLibraryLoadData, cudaLibraryLoadFromFile, cudaLibraryUnload, cudaLibraryGetKernel, cudaLaunchKernel, cudaFuncSetAttribute, cuKernelSetAttribute
Returns: cudaSuccess, cudaErrorCudartUnloading, cudaErrorInitializationError, cudaErrorInvalidValue, cudaErrorInvalidResourceHandle
Returns in kernels up to numKernels kernel handles from lib. The returned kernel handles become invalid when the library is unloaded.
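A common pattern is to query the kernel count first and then size the output array to match. The sketch below assumes the runtime signatures cudaLibraryGetKernelCount(&count, lib) and cudaLibraryEnumerateKernels(kernels, numKernels, lib); the helper name listKernels is hypothetical.

```cpp
#include <cuda_runtime.h>
#include <vector>

// Hypothetical helper: collects every kernel handle exposed by a loaded
// library. The handles stay valid only until the library is unloaded.
cudaError_t listKernels(cudaLibrary_t lib, std::vector<cudaKernel_t>& out) {
    unsigned int count = 0;
    cudaError_t err = cudaLibraryGetKernelCount(&count, lib);
    if (err != cudaSuccess) return err;

    out.resize(count);
    if (count == 0) return cudaSuccess;  // nothing to enumerate

    // At most `count` handles are written into the vector's storage.
    return cudaLibraryEnumerateKernels(out.data(), count, lib);
}
```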
Returns: cudaSuccess, cudaErrorCudartUnloading, cudaErrorInitializationError, cudaErrorInvalidValue, cudaErrorInvalidResourceHandle, cudaErrorSymbolNotFound, cudaErrorDeviceUninitialized, cudaErrorContextIsDestroyed
Returns in *dptr and *bytes the base pointer and size of the global with name name for the requested library library and the current device. If no global for the requested name name exists, the call returns cudaErrorSymbolNotFound. One of the parameters dptr or bytes (not both) can be NULL, in which case it is ignored. The returned dptr cannot be passed to the Symbol APIs such as cudaMemcpyToSymbol, cudaMemcpyFromSymbol, cudaGetSymbolAddress, or cudaGetSymbolSize.
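Because the returned pointer cannot go through the Symbol APIs, an ordinary cudaMemcpy is one way to read or write the global. The sketch below assumes the library defines a device global of type float named "scale"; that name and the helper setScale are hypothetical.

```cpp
#include <cuda_runtime.h>
#include <cstddef>

// Hypothetical helper: overwrites a float global named "scale" (an assumed
// symbol) in the given library, for the current device.
cudaError_t setScale(cudaLibrary_t library, float value) {
    void* dptr = nullptr;
    std::size_t bytes = 0;
    cudaError_t err = cudaLibraryGetGlobal(&dptr, &bytes, library, "scale");
    if (err != cudaSuccess) return err;  // e.g. cudaErrorSymbolNotFound

    // Guard against the symbol having a different type than expected.
    if (bytes != sizeof(float)) return cudaErrorInvalidValue;

    // dptr is a plain device pointer, so a regular host-to-device copy is used
    // rather than cudaMemcpyToSymbol.
    return cudaMemcpy(dptr, &value, sizeof(float), cudaMemcpyHostToDevice);
}
```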
See also:
cudaLibraryLoadData, cudaLibraryLoadFromFile, cudaLibraryUnload, cudaLibraryGetManaged, cuLibraryGetGlobal
Returns: cudaSuccess, cudaErrorCudartUnloading, cudaErrorInitializationError, cudaErrorInvalidValue, cudaErrorInvalidResourceHandle, cudaErrorSymbolNotFound
Returns in pKernel the handle of the kernel with name name located in library library. If the kernel is not found, the call returns cudaErrorSymbolNotFound.
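A lookup typically distinguishes a missing symbol from other failures. The following sketch assumes the signature cudaLibraryGetKernel(pKernel, library, name); the helper name findKernel is hypothetical. As a further assumption, kernels not declared extern "C" would need to be looked up by their mangled names.

```cpp
#include <cuda_runtime.h>
#include <cstdio>

// Hypothetical helper: resolves a kernel by name and reports why a lookup
// failed. Returns true and fills *kernelOut on success.
bool findKernel(cudaLibrary_t library, const char* name, cudaKernel_t* kernelOut) {
    cudaError_t err = cudaLibraryGetKernel(kernelOut, library, name);
    if (err == cudaErrorSymbolNotFound) {
        std::fprintf(stderr, "kernel '%s' is not present in the library\n", name);
        return false;
    }
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaLibraryGetKernel failed: %s\n",
                     cudaGetErrorString(err));
        return false;
    }
    return true;
}
```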
See also:
cudaLibraryLoadData, cudaLibraryLoadFromFile, cudaLibraryUnload, cuLibraryGetKernel
Returns: cudaSuccess, cudaErrorCudartUnloading, cudaErrorInitializationError, cudaErrorInvalidValue, cudaErrorInvalidResourceHandle
Returns in count the number of kernels in lib.
See also:
cudaLibraryEnumerateKernels, cudaLibraryLoadFromFile, cudaLibraryLoadData, cuLibraryGetKernelCount
Returns: cudaSuccess, cudaErrorCudartUnloading, cudaErrorInitializationError, cudaErrorInvalidValue, cudaErrorInvalidResourceHandle, cudaErrorSymbolNotFound
Returns in *dptr and *bytes the base pointer and size of the managed memory with name name for the requested library library. If no managed memory with the requested name name exists, the call returns cudaErrorSymbolNotFound. One of the parameters dptr or bytes (not both) can be NULL, in which case it is ignored. Note that managed memory for library library is shared across devices and is registered when the library is loaded. The returned dptr cannot be passed to the Symbol APIs such as cudaMemcpyToSymbol, cudaMemcpyFromSymbol, cudaGetSymbolAddress, or cudaGetSymbolSize.
See also:
cudaLibraryLoadData, cudaLibraryLoadFromFile, cudaLibraryUnload, cudaLibraryGetGlobal, cuLibraryGetManaged
Returns: cudaSuccess, cudaErrorCudartUnloading, cudaErrorInitializationError, cudaErrorInvalidValue, cudaErrorInvalidResourceHandle, cudaErrorSymbolNotFound
Returns in *fptr the function pointer to a unified function denoted by symbol. If no unified function with name symbol exists, the call returns cudaErrorSymbolNotFound. If there is no device with attribute cudaDeviceProp::unifiedFunctionPointers present in the system, the call may return cudaErrorSymbolNotFound.
See also:
cudaLibraryLoadData, cudaLibraryLoadFromFile, cudaLibraryUnload, cuLibraryGetUnifiedFunction
Returns: cudaSuccess, cudaErrorInvalidValue, cudaErrorMemoryAllocation, cudaErrorInitializationError, cudaErrorCudartUnloading, cudaErrorInvalidPtx, cudaErrorUnsupportedPtxVersion, cudaErrorNoKernelImageForDevice, cudaErrorSharedObjectSymbolNotFound, cudaErrorSharedObjectInitFailed, cudaErrorJitCompilerNotFound
Takes a pointer code and loads the corresponding library library based on the application-defined library loading mode:
If module loading is set to EAGER, via the environment variables described in "Module loading", library is loaded eagerly into all contexts at the time of the call and into future contexts at the time of their creation, until the library is unloaded with cudaLibraryUnload().
If the environment variables are set to LAZY, library is not immediately loaded onto all existing contexts and will only be loaded when a function is needed for that context, such as a kernel launch.
These environment variables are described in the CUDA programming guide under the "CUDA environment variables" section.
The code may be a cubin or fatbin as output by nvcc, or a NULL-terminated PTX, either as output by nvcc or hand-written. A fatbin should also contain relocatable code when doing separate compilation. Please also see the documentation for nvrtc (https://docs.nvidia.com/cuda/nvrtc/index.html), nvjitlink (https://docs.nvidia.com/cuda/nvjitlink/index.html), and nvfatbin (https://docs.nvidia.com/cuda/nvfatbin/index.html) for more information on generating loadable code at runtime.
Options are passed as an array via jitOptions and any corresponding parameters are passed in jitOptionsValues. The total number of JIT options is supplied via numJitOptions. Any outputs will be returned via jitOptionsValues.
Library load options are passed as an array via libraryOptions and any corresponding parameters are passed in libraryOptionValues. The total number of library load options is supplied via numLibraryOptions.
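As an illustration of the in-memory path, the sketch below loads a hand-written, NULL-terminated PTX string with no JIT or library options and then unloads it. It assumes the call described here is cudaLibraryLoadData with the parameter order (library, code, jitOptions, jitOptionsValues, numJitOptions, libraryOptions, libraryOptionValues, numLibraryOptions). The kernel name noop and the .version and .target values in the PTX are placeholders that may need adjusting for a given toolkit and GPU.

```cpp
#include <cuda_runtime.h>
#include <cstdio>

// Hand-written PTX for an empty kernel. The version and target lines are
// assumptions chosen for illustration, not requirements of the API.
static const char* kNoopPtx =
    ".version 7.0\n"
    ".target sm_52\n"
    ".address_size 64\n"
    ".visible .entry noop()\n"
    "{\n"
    "    ret;\n"
    "}\n";

int main() {
    cudaLibrary_t library = nullptr;

    // No JIT options and no library options: null arrays with zero counts.
    cudaError_t err = cudaLibraryLoadData(&library, kNoopPtx,
                                          nullptr, nullptr, 0,
                                          nullptr, nullptr, 0);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaLibraryLoadData failed: %s\n",
                     cudaGetErrorString(err));
        return 1;
    }

    // Handles obtained from the library must not be used after this point.
    err = cudaLibraryUnload(library);
    return err == cudaSuccess ? 0 : 1;
}
```

Under LAZY loading, a successful return here would not by itself show that the code was loaded into every context, since that work is deferred until a function is needed.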
Returns: cudaSuccess, cudaErrorInvalidValue, cudaErrorMemoryAllocation, cudaErrorInitializationError, cudaErrorCudartUnloading, cudaErrorInvalidPtx, cudaErrorUnsupportedPtxVersion, cudaErrorNoKernelImageForDevice, cudaErrorSharedObjectSymbolNotFound, cudaErrorSharedObjectInitFailed, cudaErrorJitCompilerNotFound
Takes a pointer code and loads the corresponding library library based on the application-defined library loading mode:
If module loading is set to EAGER, via the environment variables described in "Module loading", library is loaded eagerly into all contexts at the time of the call and into future contexts at the time of their creation, until the library is unloaded with cudaLibraryUnload().
If the environment variables are set to LAZY, library is not immediately loaded onto all existing contexts and will only be loaded when a function is needed for that context, such as a kernel launch.
These environment variables are described in the CUDA programming guide under the "CUDA environment variables" section.
The file should be a cubin file as output by nvcc, or a PTX file either as output by nvcc or hand-written, or a fatbin file as output by nvcc. A fatbin should also contain relocatable code when doing separate compilation. Please also see the documentation for nvrtc (https://docs.nvidia.com/cuda/nvrtc/index.html), nvjitlink (https://docs.nvidia.com/cuda/nvjitlink/index.html), and nvfatbin (https://docs.nvidia.com/cuda/nvfatbin/index.html) for more information on generating loadable code at runtime.
Options are passed as an array via jitOptions and any corresponding parameters are passed in jitOptionsValues. The total number of JIT options is supplied via numJitOptions. Any outputs will be returned via jitOptionsValues.
Library load options are passed as an array via libraryOptions and any corresponding parameters are passed in libraryOptionValues. The total number of library load options is supplied via numLibraryOptions.
Unloads the library specified with library.
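Since kernel handles obtained from a library become invalid once it is unloaded, it can help to tie the unload to a scope. The sketch below is one possible RAII wrapper, assuming the file-based loader is cudaLibraryLoadFromFile with the file name as its second parameter; the class name ScopedLibrary is hypothetical.

```cpp
#include <cuda_runtime.h>
#include <stdexcept>
#include <string>

// Hypothetical RAII wrapper: loads a cubin, PTX, or fatbin file on
// construction and unloads the library when the object goes out of scope.
class ScopedLibrary {
public:
    explicit ScopedLibrary(const char* fileName) {
        // No JIT options and no library options are passed in this sketch.
        cudaError_t err = cudaLibraryLoadFromFile(&library_, fileName,
                                                  nullptr, nullptr, 0,
                                                  nullptr, nullptr, 0);
        if (err != cudaSuccess) {
            throw std::runtime_error(std::string("cudaLibraryLoadFromFile: ") +
                                     cudaGetErrorString(err));
        }
    }

    ~ScopedLibrary() {
        // Errors are ignored here because destructors should not throw.
        if (library_ != nullptr) cudaLibraryUnload(library_);
    }

    // Non-copyable so the library is unloaded exactly once.
    ScopedLibrary(const ScopedLibrary&) = delete;
    ScopedLibrary& operator=(const ScopedLibrary&) = delete;

    cudaLibrary_t get() const { return library_; }

private:
    cudaLibrary_t library_ = nullptr;
};
```

Any kernel handles or pointers looked up through get() should not outlive the wrapper.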