Installation #

Install nvmath-python#

nvmath-python, like most modern Python packages, provides pre-built binaries (wheels andconda packages) to the end users. The full source code is hosted in theNVIDIA/nvmath-python repository.

In terms of CUDA Toolkit (CTK) choices, nvmath-python is designed and implemented to allowbuilding and running against 1.pip-wheel, 2.conda, or 3. system installation ofCTK. Having a full CTK installation at either build- or run-time is not necessary; only asmall subset, as explained below, is enough.

Host & device APIs (seeOverview) have different run-time dependencies andrequirements. Even among host APIs the needed underlying libraries are different (forexample,fft() on GPUs only needs cuFFT and not cuBLAS). Libraries areloaded when only needed. Therefore, nvmath-python is designed to have most of itsdependenciesoptional, but provides convenient installation commands for users to quicklyspin up a working Python environment.

Thecheatsheet below captures nvmath-python’s required and optionalbuild-time and run-time dependencies. Using the installation commands from the sectionsbelow should support most of your needs.

Install from PyPI#

The pre-built wheels can bepip-installed from the public PyPI. There are severaloptional dependencies expressible in the standard “extras” bracket notation. The followingassumes thatCTK components are also installed via pip (so no extra step from users isneeded; the dependencies are pulled via extras).

Important

Using at least one of thepip extras described below is required for allpipinstalls to ensure that nvmath-python’s dependencies are correctly constrained bypip.

pipinstallnvmath-python (no extras) is a bare installation (very lightweight) forsystem admins or expert users. This requires that the user manage of all dependencies.

Command	Description
`pipinstallnvmath-python[cu12]`	Install nvmath-python along with all CUDA 12 optionaldependencies (wheels for cuBLAS/cuFFT/… and CuPy) to supportnvmath host APIs.
`pipinstallnvmath-python[cu12-dx]`	Install nvmath-python along with all CUDA 12 optionaldependencies (wheels for cuBLAS/cuFFT/…, CuPy, Numba, …) to supportnvmath host & device APIs (which only supports CUDA 12)[8].
`pipinstallnvmath-python[cpu]`	Install nvmath-python along with all CPU optional dependencies(wheels for NVPL or MKL) to support optimized CPU FFT APIs.[1] Note: NVPL supports only ARM architecture, while MKL or another FFTW3[9]compatible library may be substituted for x86 architecture. The environment variable`NVMATH_FFT_CPU_LIBRARY` may be used toprovide the path to an alternate shared object which implements theFFTW3 (non-guru) API. Ensure`LD_LIBRARY_PATH` includes thislibrary if it is not already in the PATH. The environment variable`NVMATH_BLAS_CPU_LIBRARY` may be used toprovide the path to an alternate shared object which implements theBLAS ABI.`LD_LIBRARY_PATH` should be set properly toinclude this library if it is not already in the PATH.
`pipinstallnvmath-python[cu12-distributed]`	Install nvmath-python along with all MGMN optional dependencies (wheels for mpi4py,NVSHMEM, cuFFTMp, …) to support multi-GPU multi-node APIs. Note: Users must provide an MPI implementation.
`pipinstallnvmath-python[cu12-dx]'cuda-toolkit[nvcc,nvrtc]==12.8.*'--extra-index-urlhttps://download.pytorch.org/whl/cu128torch`	Install nvmath-python along with all CUDA 12 optional dependencies to supportnvmath.device APIs and a PyTorch built with CTK 12.8. Note: PyTorch has strict pinnings for some CUDA components, and builds of PyTorchlag behind the latest CUDA component releases. We must therefore explicitly requirethat the nvcc and nvrtc components installed for device extensions match; otherwise,pip will create a mismatched environment with CTK components from different releases.Verbose installation commands such as the the one here are necessary because oflimitations of the current wheel format and Python package index. Please see thePyTorch installation instructions for releases built with other CTK versions.

The options below are for adventurous users who want to manage most of the dependenciesthemselves. The following assumes thatsystem CTK is installed.

Command	Description
`pipinstallnvmath-python[sysctk12]`	Install nvmath-python along with CuPy for CUDA 12 to supportnvmath host APIs. Note: Set`LD_LIBRARY_PATH` to include the CUDA libraries.
`pipinstallnvmath-python[sysctk12-dx]`	Install nvmath-python along with CuPy for CUDA 12 to supportnvmath host & device APIs. Note: Set`LD_LIBRARY_PATH` to include the CUDA libraries. To use`nvmath.device` APIs, set`CUDA_HOME` (or`CUDA_PATH`)to point to the system CTK.
`pipinstallnvmath-python[sysctk12]mpi4py`	Install nvmath-python and mpi4py along with no MGMN optional dependencies to supportmulti-GPU multi-node APIs. Note: Users must provide an MPI implementation and the required cuMp librariesand dependencies (NVSHMEM, cuFFTMp, …).

Command

Description

pipinstallnvmath-python[sysctk12]

Install nvmath-python along with CuPy for CUDA 12 to supportnvmath host APIs.

Note: SetLD_LIBRARY_PATH to include the CUDA libraries.

pipinstallnvmath-python[sysctk12-dx]

Install nvmath-python along with CuPy for CUDA 12 to supportnvmath host & device APIs.

Note:

SetLD_LIBRARY_PATH to include the CUDA libraries.
To usenvmath.device APIs, setCUDA_HOME (orCUDA_PATH)to point to the system CTK.

pipinstallnvmath-python[sysctk12]mpi4py

Install nvmath-python and mpi4py along with no MGMN optional dependencies to supportmulti-GPU multi-node APIs.

Note: Users must provide an MPI implementation and the required cuMp librariesand dependencies (NVSHMEM, cuFFTMp, …).

Hint

To install extras for CUDA 13, use13 in the extra names instead of12.

Install from conda#

Conda packages can be installed from theconda-forge channel.

Command	Description
`condainstall-cconda-forgenvmath-pythoncuda-version=12`	Install nvmath-python along with all CUDA 12 optionaldependencies (packages for cuBLAS/cuFFT/… and CuPy) to supportnvmath host APIs.
`condainstall-cconda-forgenvmath-python-dxcuda-version=12`	Install nvmath-python along with all CUDA 12 optionaldependencies (packages for cuBLAS/cuFFT/…, CuPy, Numba, …) to supportnvmath host & device APIs (which only supports CUDA 12). Note: `nvmath-python-dx` is a metapackage for ease of installing`nvmath-python` and other dependencies.
`condainstall-cconda-forgenvmath-python-cpu`	Install nvmath-python along with all CPU optional dependencies(NVPL or other) to support optimized CPU FFT APIs.[1] Note: `nvmath-python-cpu` is a meta-package for ease of installing`nvmath-python` and other dependencies. NVPL is for ARM architecture only. MKL or another FFTW3[9] compatiblelibrary may be substituted for x86 architecture. The environment variable`NVMATH_FFT_CPU_LIBRARY` may be used toprovide the path to an alternate shared object which implements theFFTW3 (non-guru) API.`LD_LIBRARY_PATH` should be set properly toinclude this library if it is not already in the PATH. The environment variable`NVMATH_BLAS_CPU_LIBRARY` may be used toprovide the path to an alternate shared object which implements theBLAS ABI.`LD_LIBRARY_PATH` should be set properly toinclude this library if it is not already in the PATH.
`condainstall-cconda-forgenvmath-python-distributed`	Install nvmath-python along with all MGMN optional dependencies (packages for mpi4py,NVSHMEM, cuFFTMp, MPI, …) to support multi-GPU multi-node APIs. Note: conda-forge provides a pass-through MPI package variant that may be used inorder to use a system-installed MPI instead of the conda-forge-provided MPIimplementations.

Notes:

For expert users,condainstall-cconda-forgenvmath-python=*=core* would be a bareminimal installation (very lightweight). This allows fully explicit control of alldependencies.
If you installedconda fromminiforge,most likely the conda-forge channel is already set as the default, then the-cconda-forge part in the above instruction can be omitted.

Build from source#

Once you clone the repository and go into the root directory, you can build the project fromsource. There are several ways to build it since we need some CUDA headers at build time.

Command	Description
`pipinstall-v.`	Set up a build isolation (as perPEP 517),install CUDA wheels and other build-time dependencies to thebuild environment, build the project, and install it to thecurrent user environment together with the run-timedependencies. Note: in this case we get CUDA headers by installing pip wheels to the isolatedbuild environment.
`CUDA_PATH=/path/to/your/cuda/installationpipinstall--no-build-isolation-v.`	Skip creating a build isolation (it would use CUDA headers from`$CUDA_PATH/include` instead), build the project, and install it to the currentuser environment together with the run-time dependencies. Use: conda: After installing CUDA 12 conda packages, set the environment variable`CUDA_PATH` linux-64:`CUDA_PATH=$CONDA_PREFIX/targets/x86_64-linux/` linux-aarch64:`CUDA_PATH=$CONDA_PREFIX/targets/sbsa-linux/` win-64:`CUDA_PATH=$CONDA_PREFIX\Library` system CTK: Just set`CUDA_PATH` to the system CTK location.

Command

Description

pipinstall-v.

Set up a build isolation (as perPEP 517),install CUDA wheels and other build-time dependencies to thebuild environment, build the project, and install it to thecurrent user environment together with the run-timedependencies.

Note: in this case we get CUDA headers by installing pip wheels to the isolatedbuild environment.

CUDA_PATH=/path/to/your/cuda/installationpipinstall--no-build-isolation-v.

Skip creating a build isolation (it would use CUDA headers from$CUDA_PATH/include instead), build the project, and install it to the currentuser environment together with the run-time dependencies. Use:

conda: After installing CUDA 12 conda packages, set the environment variableCUDA_PATH
- linux-64:CUDA_PATH=$CONDA_PREFIX/targets/x86_64-linux/
- linux-aarch64:CUDA_PATH=$CONDA_PREFIX/targets/sbsa-linux/
- win-64:CUDA_PATH=$CONDA_PREFIX\Library
system CTK: Just setCUDA_PATH to the system CTK location.

Notes:

If you add the “extras” notation after the dot. (for example.[cpu],.[cu12-dx], …), it has the same meaning as explained in theprevious section.
If you don’t want the run-time dependencies to be automatically handled, add--no-depsafter thepipinstall command above; in this case, however, it’s your responsibilityto make sure that all the run-time requirements are met.
By replacinginstall bywheel, a wheel can be built targeting the current OS andCPython version.
If you want inplace/editable install, add the-e flag to the command above (before thedot.). This is suitable for local development with a system-installed CTK. However,our wheels rely onnon-editable builds so that the RPATH hack can kick in. DO NOT passthe-e flag when building wheels!
All optional run-time dependencies as listed below need to be manually installed.

Cheatsheet#

Below we provide a summary of requirements to support all nvmath-python functionalities. Adependency isrequired unless stated otherwise.

	When Building	When Running - host APIs	When Running - device APIs	When Running - host API callbacks	When Running - distributed APIs
CPU architecture & OS	linux-64, linux-aarch64, win-64	linux-64, linux-aarch64, win-64	linux-64, linux-aarch64[1]	linux-64, linux-aarch64	linux-64, linux-aarch64
GPU hardware		All hardware supported by the underlying CUDA Toolkit[5] Optional: needed if the execution space is GPU.	Compute Capability 7.0+ (Volta and above)	Compute Capability 7.0+ (Volta and above)	Data Center GPUwith Compute Capability 7.0+ (Volta and above).GPU connectivity:p2p or GPUDirect RDMA over IB requirements
CUDA driver[2]		525.60.13+ (Linux) / 527.41+ (Windows) with CUDA >=12.0 580+ with CUDA >=13.0 Optional: needed if the execution space is GPU or for loading any CUDA library.	525.60.13+ (Linux) with CUDA >=12.0	525.60.13+ (Linux) with CUDA >=12.0	525.60.13+ (Linux) with CUDA >=12.0
Python	3.10-3.13	3.10-3.13	3.10-3.13	3.10-3.13	3.10-3.13
pip	22.3.1+
setuptools	>=77.0.3
wheel	>=0.34.0
Cython	>=3.0.4,<3.1
CUDA	CUDA >=12.0 (only need headers from NVCC & CUDART[6])	CUDA >=12.0 Optional: depending on the math operations in use	CUDA >=12.0,!=12.4.*,!=12.5.0[7] (NVRTC, NVVM, CCCL[8], CUDART)	CUDA >=12.0	CUDA >=12.0
cuda-pathfinder		>=1.2.1	>=1.2.1	>=1.2.1	>=1.2.1
cuda-core		>=0.3.2	>=0.3.2	>=0.3.2	>=0.3.2
NumPy		>=1.25	>=1.25	>=1.25	>=1.25
CuPy (seeCuPy installation guide)		>=12.1[4]		>=12.1[4]	>=12.1[4]
PyTorch (seePyTorch installation guide)		>=1.12 (optional)[10]		>=1.12 (optional)	>=1.12 (optional)
libmathdx (cuBLASDx, cuFFTDx, …)			>=0.2.3,<0.3
numba-cuda			>=0.18.1	>=0.18.1
Math Kernel Library (MKL)		>=2024 (optional)
NVIDIA Performance Libraries (NVPL)		24.7 (optional)

Test Configuration#

nvmath-python is tested in the following environments:

CUDA	12.0, 12.9, 13.0
Driver	R525, R575, R580
GPU model	H100, B200, RTX 4090, CG1 (Grace-Hopper)
Python	3.10, 3.11, 3.12, 3.13
CPU architecture	x86_64, aarch64
Operating system	Ubuntu 22.04, Ubuntu 20.04, RHEL 9, Windows 11

Run nvmath-python#

As mentioned earlier, nvmath-python can be run with all methods of CUDA installation,including wheels, conda packages, and system CTK. As a result, there is detection logic todiscover shared libraries (for host APIs) and headers (for device APIs to do JITcompilation).

Shared libraries#

pip wheels: Will be auto-discovered if installed
conda packages: Will be auto-discovered if installed, after wheel
system CTK: On Linux, the users needs to ensure the shared libraries are discoverable bythe dynamic linker, say by settingLD_LIBRARY_PATH or updating system search paths toinclude the DSO locations.

Headers#

This includes libraries such as CCCL and MathDx.

pip wheels: Will be auto-discovered if installed
conda packages: Will be auto-discovered if installed, after wheel
system CTK: Need to setCUDA_HOME (orCUDA_PATH) andMATHDX_HOME (for MathDxheaders)

Host APIs#

This terminology is explained in theHost APIs.

Examples#

See theexamples directory in the repo. Currently we have:

examples/fft
examples/linalg

Tests#

Thepyproject.toml file lists dependencies required forpip-controlledenvironments to run tests. These requirements are installed via thedev dependencygroup. e.g.pipinstall--groupdev

Running functionality tests#

pytesttests/example_teststests/nvmath_tests/ffttests/nvmath_tests/linalg

Running performance tests#

This will currently run two tests for fft and one test for linalg:

pytest-v-s-k'perf'tests/nvmath_tests/fft/pytest-v-s-k'perf'tests/nvmath_tests/linalg/

Device APIs#

This terminlogy is explained in theDevice APIs.

Examples#

See theexamples/device directory in the repo.

Tests#

Running functionality tests#

pytesttests/nvmath_tests/deviceexamples/device

Running performance tests#

pytest-v-s-k'perf'tests/nvmath_tests/device/

Troubleshooting#

Forpip-users, there are known limitations (many of which are nicely captured in thepypackaging community project) in Python packagingtools. For a complex library such as nvmath-python that interacts with many nativelibraries, there are user-visible caveats.

Be sure that there are no packages with both-cu12 (for CUDA 12) and-cu13 (forCUDA 13) suffixes coexisting in your Python environment. For example, this is a corruptedenvironment:
```
$piplistPackageVersion---------------------------nvidia-cublas-cu1212.5.2.13nvidia-cublas13.0.2.14pip24.0setuptools70.0.0wheel0.43.0
```
Sometimes such conflicts could come from a dependency of the libraries that you use, sopay extra attention to what’s installed.
pip does not attempt to check if the installed packages can actually be run againstthe installed GPU driver (CUDA GPU driver cannot be installed bypip), so make sureyour GPU driver is new enough to support the installed-cuXX packages[2]. Thedriver version can be checked by executingnvidia-smi and inspecting theDriverVersion field on the output table.
CuPy installed frompip currently (as of v13.3.0) only supports conda and system CTK,and notpip-installed CUDA wheels. nvmath-python can help CuPy use the CUDA librariesinstalled tosite-packages (where wheels are installed to) ifnvmath is imported.From beta 2 (v0.2.0) onwards the libraries are “soft-loaded” (no error is raised if alibrary is not installed) whenimportnvmath happens. This behavior may change in afuture release.
Numba installed frompip currently (as of v0.60.0) only supports conda and systemCTK, and notpip-installed CUDA wheels. nvmath-python can also help Numba use theCUDA compilers installed tosite-packages ifnvmath is imported. Same as above,this behavior may change in a future release.
PyTorch installed frompip pins some CUDA wheels packages to version v12.6 (or v12.8depending on the installation method). However, nvmath-python does not pin CUDA wheelspackages, so they will float up the latest version. This can cause a mismatch betweencompiler components when using thedx extra. In this case, it’s recommended tomanually constraincuda-cccl,cuda-nvcc,cuda-nvrtc, andcuda-runtimepackages to match the variant of PyTorch installed.

In general, mixing-and-matching CTK packages frompip,conda, and the system ispossible but can be very fragile, so it’s important to understand what you’re doing. Thenvmath-python internals are designed to work with everything installed either viapip,conda, or local system (system CTK, includingtarball extractions, are the fallback solution in the detection logic),but mix-n-match makes the detection logic impossible to get right.

To help you perform an integrity check, the rule of thumb is that every single packageshould only come from one place (eitherpip, orconda, or local system). Forexample, if bothnvidia-cufft-cu12 (which is frompip) andlibcufft (fromconda) appear in the output ofcondalist, something is almost certainly wrong.Below is the package name mapping betweenpip andconda, withXX=12denoting CUDA’s major version:

pip (`cuda-toolkit==12`)	pip (`cuda-toolkit>=13`)	conda (`cuda-version>=12`)
`nvidia-cuda-nvcc-cuXX`	`nvidia-cuda-nvcc`	`cuda-nvcc`
`nvidia-cuda-nvrtc-cuXX`	`nvidia-cuda-nvrtc`	`cuda-nvrtc`
`nvidia-cuda-runtime-cuXX`	`nvidia-cuda-runtime`	`cuda-cudart-dev`
`nvidia-cuda-cccl-cuXX`	`nvidia-cuda-cccl`	`cuda-cccl`
`nvidia-cublas-cuXX`	`nvidia-cublas`	`libcublas`
`nvidia-cusolver-cuXX`	`nvidia-cusolver`	`libcusolver`
`nvidia-cusparse-cuXX`	`nvidia-cusparse`	`libcusparse`
`nvidia-cufft-cuXX`	`nvidia-cufft`	`libcufft`
`nvidia-curand-cuXX`	`nvidia-curand`	`libcurand`

Note that system packages (by design) do not show up in the output ofcondalist orpiplist. Linux users should check the installation list from your distro packagemanager (apt,yum,dnf, …). See also theLinux Package Manager InstallationGuide for additional information.

For more information with regard to the new CUDA 12+ package layout on conda-forge, see theCUDA recipe README.

Footnotes

[1](1,2,3)

Windows support will be added in a future release.

[2](1,2)

nvmath-python relies onCUDA minor version compatibility.

[4](1,2,3)

As of Beta 7.0 (v0.7.0), CuPy is an optional run-time dependency. It is not included inthe extras/meta-packages, and must be installed separately if desired.

[5]

For example, Blackwell GPUs are supported starting CUDA 12.8, so they would not workwith libraries from CUDA 12.6 or below (There is no CUDA 12.7).

[6]

While we need some CUDA headers at build time, there is no limitation in the CUDAversion seen at build time.

[7]

These versions are not supported due to a known compiler bug; the[dx] extrasalready takes care of this.

[8](1,2)

If CCCL is installed viapip manually it needs to be constrained with"nvidia-cuda-cccl-cu12>=12.4.127" due to a packaging issue; the[dx] extrasalready takes care of this.

[9](1,2)

The library must ship FFTW3 symbols for single and double precision transforms in asingleso file.

[10]

To usematmul with FP8 or MXFP8 you need PyTorch version built with CUDA 12.8(>=2.7.0 or nightly version)

On this page

Movatterモバイル変換

Installation#

Install nvmath-python#

Install from PyPI#

Install from conda#

Build from source#

Cheatsheet#

Test Configuration#

Run nvmath-python#

Shared libraries#

Headers#

Host APIs#

Examples#

Tests#

Running functionality tests#

Running performance tests#

Device APIs#

Examples#

Tests#

Running functionality tests#

Running performance tests#

Troubleshooting#

Installation #