nvmath-python Bindings#
Overview#
Warning
All Python bindings documented in this section are experimental and subject to future changes. Use them at your own risk.
Low-level Python bindings for C APIs from NVIDIA Math Libraries are exposed under the corresponding modules in nvmath.bindings. To access the Python bindings, use the modules for the corresponding libraries. Under the hood, nvmath-python handles the run-time linking to the libraries for you lazily.
The currently supported libraries along with the corresponding module names are listed as follows:
| Library name | Python access |
|---|---|
| cuBLAS | nvmath.bindings.cublas |
| cuBLASLt | nvmath.bindings.cublaslt |
| cuBLASMp | nvmath.bindings.cublasMp |
| cuDSS | nvmath.bindings.cudss |
| cuFFT | nvmath.bindings.cufft |
| cuRAND | nvmath.bindings.curand |
| cuSOLVER | nvmath.bindings.cusolver |
| cuSOLVERDn | nvmath.bindings.cusolverDn |
| cuSPARSE | nvmath.bindings.cusparse |
| NVPL BLAS | nvmath.bindings.nvpl.blas |
| NVPL FFT | nvmath.bindings.nvpl.fft |
Support for more libraries will be added in the future.
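As a minimal sketch, accessing a set of bindings is an ordinary module import; per the lazy run-time linking described above, the underlying shared library is only loaded when one of its APIs is first called:

```python
# Import the low-level cuFFT bindings; the cuFFT shared library itself is
# linked lazily, i.e. only when one of its functions is first invoked.
from nvmath.bindings import cufft

# The module exposes the C APIs under Pythonic names (see the next section).
print([name for name in dir(cufft) if not name.startswith("_")][:10])
```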
Naming & Calling Convention#
Inside each of the modules, all public APIs of the corresponding NVIDIA Math library are exposed following the PEP 8 style guide, along with the following changes:
- All library name prefixes are stripped
- The function names are broken by words and follow the snake case
- The first letter in each word in the enum names is capitalized
- Each enum's name prefix is stripped from its values' names
- Whenever applicable, the outputs are stripped away from the function arguments and returned directly as Python objects
- Pointers are passed as Python int
- Exceptions are raised instead of returning the C error code
Below is a non-exhaustive list of examples of such C-to-Python mappings:
- Function: cublasDgemm -> cublas.dgemm()
- Function: curandSetGeneratorOrdering -> curand.set_generator_ordering()
- Enum type: cublasLtMatmulTile_t -> cublasLt.MatmulTile
- Enum type: cufftXtSubFormat -> cufft.XtSubFormat
- Enum value name: CUSOLVER_EIG_MODE_NOVECTOR -> cusolver.EigMode.NOVECTOR
- Enum value name: CUSPARSE_STATUS_MATRIX_TYPE_NOT_SUPPORTED -> cusparse.Status.MATRIX_TYPE_NOT_SUPPORTED
- Returns: The outputs of cusolverDnXpotrf_bufferSize are the workspace sizes on device and host, which are wrapped as a 2-tuple in the corresponding cusolverDn.xpotrf_buffer_size() Python API.
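To illustrate these rules at the Python level, the renamed enums listed above can be referenced directly from the corresponding binding modules (a small sketch based on the mappings shown):

```python
from nvmath.bindings import cusolver, cusparse

# CUSOLVER_EIG_MODE_NOVECTOR becomes cusolver.EigMode.NOVECTOR
mode = cusolver.EigMode.NOVECTOR

# CUSPARSE_STATUS_MATRIX_TYPE_NOT_SUPPORTED becomes
# cusparse.Status.MATRIX_TYPE_NOT_SUPPORTED
status = cusparse.Status.MATRIX_TYPE_NOT_SUPPORTED

print(mode, status)
```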
There may be exceptions to the above rules, but they would be self-evident and will be properly documented. In the next section we discuss pointer passing in Python.
Memory management#
Pointer and data lifetime#
Unlike in C/C++, Python does not provide low-level primitives to allocate/deallocate host memory (nor device memory). In order to make the C APIs work with Python, it is important that memory management is properly done through Python proxy objects. In nvmath-python, we ask users to address such needs using NumPy (for host memory) and CuPy (for device memory).
Note
It is also possible to use array.array (plus memoryview as needed) to manage host memory. However, it is more laborious compared to using numpy.ndarray, especially when it comes to array manipulation and computation.
Note
It is also possible to use CUDA Python to manage device memory, but as of CUDA 11 there is no simple, pythonic way to modify the contents stored on the GPU, which requires custom kernels. CuPy is a lightweight, NumPy-compatible array library that addresses this need.
To pass data from Python to C, using pointer addresses (as Python int) of various objects is required. We illustrate this using NumPy/CuPy arrays as follows:
```python
import numpy
import cupy

# create a host buffer to hold 5 int
buf = numpy.empty((5,), dtype=numpy.int32)
# pass buf's pointer to the wrapper
# buf could get modified in-place if the function writes to it
my_func(..., buf.ctypes.data, ...)
# examine/use buf's data
print(buf)

# create a device buffer to hold 10 double
buf = cupy.empty((10,), dtype=cupy.float64)
# pass buf's pointer to the wrapper
# buf could get modified in-place if the function writes to it
my_func(..., buf.data.ptr, ...)
# examine/use buf's data
print(buf)

# create an untyped device buffer of 128 bytes
buf = cupy.cuda.alloc(128)
# pass buf's pointer to the wrapper
# buf could get modified in-place if the function writes to it
my_func(..., buf.ptr, ...)
# buf is automatically destroyed when going out of scope
```
The underlying assumption is that the arrays must be contiguous in memory (unless the C interface allows for specifying the array strides).
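If an array may be non-contiguous (for example, a strided view), it can be made contiguous before its pointer is passed. A minimal sketch using NumPy follows; the same idea applies to CuPy via cupy.ascontiguousarray:

```python
import numpy

a = numpy.arange(12, dtype=numpy.float64).reshape(3, 4)
view = a[:, ::2]                      # a strided, non-contiguous view
print(view.flags["C_CONTIGUOUS"])     # False

buf = numpy.ascontiguousarray(view)   # copy into a contiguous buffer
print(buf.flags["C_CONTIGUOUS"])      # True
# buf.ctypes.data is now safe to pass to a C API expecting contiguous data
```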
As a consequence, all C structs in NVIDIA Math libraries (including handles and descriptors) are not exposed as Python classes; that is, they do not have their own types and are simply cast to plain Python int for passing around. Any downstream consumer should create a wrapper class to hold the pointer address if so desired. In other words, users have full control (and responsibility) for managing the pointer lifetime.
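For instance, a downstream library might wrap a cuBLAS handle in a small class that owns the pointer and releases it deterministically. The sketch below is illustrative only; it assumes the handle-management functions follow the naming convention above (cublasCreate/cublasDestroy exposed as cublas.create()/cublas.destroy()):

```python
from nvmath.bindings import cublas


class CublasHandle:
    """Illustrative wrapper owning a cuBLAS handle (a plain Python int)."""

    def __init__(self):
        # Assumed mapping: cublasCreate -> cublas.create(), returning the
        # handle as an int (per the convention described above).
        self.handle = cublas.create()

    def close(self):
        if self.handle is not None:
            # Assumed mapping: cublasDestroy -> cublas.destroy(handle).
            cublas.destroy(self.handle)
            self.handle = None

    def __enter__(self):
        return self.handle

    def __exit__(self, *exc):
        self.close()


# Usage: the raw int handle is what gets passed to other cublas.* calls.
with CublasHandle() as handle:
    print(hex(handle))
```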
However, in certain cases we are able to convert Python objects for users (if readonly, host arrays are needed) so as to alleviate users' burden. For example, in functions that require a sequence or a nested sequence, the following operations are equivalent:
```python
import numpy

# passing a host buffer of int type can be done like this
buf = numpy.array([0, 1, 3, 5, 6], dtype=numpy.int32)
my_func(..., buf.ctypes.data, ...)

# or just this
buf = [0, 1, 3, 5, 6]
my_func(..., buf, ...)  # the underlying data type is determined by the C API
```
which is particularly useful when users need to pass multiple sequences or nested sequences to C (for example, nvmath.).
Note
Some functions require their arguments to be in device memory. You need to pass device memory (for example, cupy.ndarray) to such arguments. nvmath-python neither validates the memory pointers nor implicitly transfers the data. Passing host memory where device memory is expected (and vice versa) results in undefined behavior.
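Because no implicit transfer happens, host data must be copied to the device explicitly before its pointer is handed to such an argument. A minimal sketch using CuPy; my_device_func is a hypothetical placeholder for whatever binding expects a device pointer:

```python
import numpy
import cupy

h_buf = numpy.arange(8, dtype=numpy.float64)   # host data
d_buf = cupy.asarray(h_buf)                    # explicit host -> device copy

# Pass the device pointer to a function that expects device memory.
# my_device_func is hypothetical; substitute the actual binding you call.
my_device_func(..., d_buf.data.ptr, ...)

# Copy the (possibly modified) result back to the host if needed.
h_out = cupy.asnumpy(d_buf)
```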
API Reference#
This reference describes all of nvmath-python's math primitives.
- cuBLAS (nvmath.bindings.cublas)
- cuBLASLt (nvmath.bindings.cublaslt)
- cuBLASMp (nvmath.bindings.cublasMp)
- cuDSS (nvmath.bindings.cudss)
- cuFFT (nvmath.bindings.cufft)
- cuSOLVER (nvmath.bindings.cusolver)
- cuSOLVERDn (nvmath.bindings.cusolverDn)
- cuSPARSE (nvmath.bindings.cusparse)
- cuRAND (nvmath.bindings.curand)
- NVPL BLAS (nvmath.bindings.nvpl.blas)
- NVPL FFT (nvmath.bindings.nvpl.fft)