Host APIs

The following modules of nvmath-python offer integration with NVIDIA’s high-performance computing libraries such as cuBLAS, cuDSS, cuFFT, and cuTENSOR (and their NVPL counterparts) through host APIs. Host APIs are called from host code but can execute in any supported execution space (CPU or GPU).

Key Concepts

Matrix and Tensor Qualifiers

Recall that nvmath-python is not an array library, but it interoperates with array and tensor libraries including NumPy, CuPy, and PyTorch. Therefore we need a way to provide an operation with additional information about its operands that is not contained in the ndarray or tensor type (lazy conjugation or triangular matrix structure are examples). This is done via the notion of qualifiers on the tensor operands, provided as a NumPy ndarray with the appropriate qualifiers dtype and the same length as the number of operands. Each qualifier in the qualifiers array provides auxiliary information about the corresponding operand.
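To illustrate the shape of such a qualifiers array without depending on nvmath, the sketch below builds a structured NumPy dtype with one record per operand. The field names (`structure`, `uplo`) are hypothetical and chosen for illustration only; the real dtypes are `nvmath.linalg.matrix_qualifiers_dtype` and `nvmath.tensor.tensor_qualifiers_dtype`, whose fields may differ.

```python
import numpy as np

# Hypothetical qualifiers dtype for illustration only; the actual fields of
# nvmath.linalg.matrix_qualifiers_dtype may differ.
qualifiers_dtype = np.dtype([("structure", np.int32), ("uplo", np.int32)])

# One qualifier record per operand: here, two operands, both defaulting to a
# "general" (zero-valued) qualifier.
qualifiers = np.zeros(2, dtype=qualifiers_dtype)

# Mark the second operand as structured (e.g. triangular) by updating its record.
qualifiers[1]["structure"] = 1
```

The key point is that the qualifiers array is ordinary NumPy data: its length matches the number of operands, and index `i` qualifies operand `i`.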

The following example shows a matrix multiplication between two matrices, \(a\) and \(b\), where \(a\) should be treated as a regular dense matrix and \(b\) as a lower triangular matrix. Note how the qualifier is used to inform the API of \(b\)’s triangular structure.

```python
import numpy as np
import nvmath

# Prepare sample input data.
m, k = 123, 789
a = np.random.rand(m, k).astype(np.float32)
b = np.tril(np.random.rand(k, k).astype(np.float32))

# We can choose the execution space for the matrix multiplication using ExecutionCUDA or
# ExecutionCPU. By default, the execution space matches the operands, so in order to execute
# a matrix multiplication on NumPy arrays using CUDA we need to specify ExecutionCUDA.
# Tip: use help(nvmath.linalg.ExecutionCUDA) to see available options.
execution = nvmath.linalg.ExecutionCUDA()

# We can use structured matrices as inputs by providing the corresponding qualifier which
# describes the matrix. By default, all inputs are assumed to be general matrices.
# MatrixQualifiers are provided as an array of custom NumPy dtype,
# nvmath.linalg.matrix_qualifiers_dtype.
qualifiers = np.full(
    (2,),
    nvmath.linalg.GeneralMatrixQualifier.create(),
    dtype=nvmath.linalg.matrix_qualifiers_dtype,
)
qualifiers[1] = nvmath.linalg.TriangularMatrixQualifier.create(uplo=nvmath.linalg.FillMode.LOWER)

result = nvmath.linalg.matmul(a, b, execution=execution, qualifiers=qualifiers)
```
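Mathematically, the triangular qualifier tells the library to use only the lower-triangular part of \(b\); since \(b\) is constructed with `np.tril` here, the result coincides with a plain dense product. A minimal NumPy-only sketch of that reference computation (an illustration of what is computed, not of how nvmath computes it — the qualifier lets the library dispatch a specialized triangular kernel instead):

```python
import numpy as np

# Same sample data as above.
m, k = 123, 789
a = np.random.rand(m, k).astype(np.float32)
b = np.tril(np.random.rand(k, k).astype(np.float32))

# Because b is already lower triangular, the qualified matmul is numerically
# equivalent to an ordinary dense matrix product.
expected = a @ b
```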

The following example shows how a qualifier is used to conjugate a CuPy tensor operand as part of the contraction operation. Since complex conjugation is a memory-bound operation, this fusion improves performance compared to the alternative of performing the conjugation a priori using CuPy.

```python
import cupy as cp
import numpy as np
import nvmath

a = cp.random.rand(8, 8, 8, 8) + 1j * cp.random.rand(8, 8, 8, 8)
b = cp.random.rand(8, 8, 8, 8) + 1j * cp.random.rand(8, 8, 8, 8)
c = cp.random.rand(8, 8, 8, 8) + 1j * cp.random.rand(8, 8, 8, 8)
d = cp.random.rand(8, 8, 8, 8) + 1j * cp.random.rand(8, 8, 8, 8)

# Create an array of qualifiers (of length # of operands) with the default identity operator.
qualifiers = np.full(4, nvmath.tensor.Operator.OP_IDENTITY, dtype=nvmath.tensor.tensor_qualifiers_dtype)

# Set the qualifier for operand b to conjugate.
qualifiers[1] = nvmath.tensor.Operator.OP_CONJ

# result[i,j,p,q] = \sum_{k,l,m,n} a[i,j,k,l] * b[k,l,m,n].conj() * c[m,n,p,q] + d[i,j,p,q]
result = nvmath.tensor.ternary_contraction(
    "ijkl,klmn,mnpq->ijpq", a, b, c, d=d, qualifiers=qualifiers, beta=1
)
```
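For reference, the contraction with the `OP_CONJ` qualifier on \(b\) corresponds to the following `einsum`, shown here as a NumPy-only sketch on small arrays (rather than CuPy): conjugating \(b\) up front produces the same values, except that nvmath fuses the conjugation into the contraction kernel instead of materializing `b.conj()`.

```python
import numpy as np

rng = np.random.default_rng(0)
shape = (4, 4, 4, 4)
a = rng.random(shape) + 1j * rng.random(shape)
b = rng.random(shape) + 1j * rng.random(shape)
c = rng.random(shape) + 1j * rng.random(shape)
d = rng.random(shape) + 1j * rng.random(shape)

# Reference semantics of the qualified ternary contraction with beta=1:
# result[i,j,p,q] = sum_{k,l,m,n} a[i,j,k,l] * conj(b[k,l,m,n]) * c[m,n,p,q] + d[i,j,p,q]
result = np.einsum("ijkl,klmn,mnpq->ijpq", a, np.conj(b), c) + d
```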

Examples using qualifiers can be found in the examples directory on GitHub.

Contents