This section provides an overview of the major components of the NVIDIA® CUDA® Toolkit and points to their locations after installation.
Code samples that illustrate how to use various CUDA and library APIs are available in the samples/ directory on Linux and Mac, and are installed to C:\ProgramData\NVIDIA Corporation\CUDA Samples on Windows. On Linux and Mac, the samples/ directory is read-only, and the samples must be copied to another location if they are to be modified. Further instructions can be found in the Getting Started Guides for Linux and Mac.
The most current version of these release notes can be found online at http://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html. In addition, the version.txt file in the root directory of the toolkit contains the version and build number of the installed toolkit.
Documentation can be found in PDF form in the doc/pdf/ directory, or in HTML form at doc/html/index.html and online at http://docs.nvidia.com/cuda/index.html.
The release notes for the CUDA® Toolkit can be found online athttp://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html.
This section summarizes the changes in CUDA 11.0 Update 1 since the 11.0 GA release.
New Features
Resolved Issues
Known Issues
Deprecations
None.
This section summarizes the changes in CUDA 11.0 GA since the 11.0 RC release.
General CUDA
NPP New Features
nvJPEG New Features
cuBLAS New Features
cuSOLVER New Features
cuSOLVER Resolved Issues
cuSPARSE New Features
This release adds the following features for cusparseSpMM:
$C_i = A \cdot B_i$
$C_i = A_i \cdot B$
$C_i = A_i \cdot B_i$
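The three batched forms differ only in which operand is shared across the batch. The following host-side C++ reference sketch shows those semantics; it is not the cuSPARSE API. In cusparseSpMM, A is sparse while B and C are dense, but here every matrix is stored densely for readability. The names `Matrix`, `multiply`, and `batchedMultiply` are hypothetical.

```cpp
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Dense row-major matrix. Illustration only: in cusparseSpMM, A is sparse.
struct Matrix {
    std::size_t rows;
    std::size_t cols;
    std::vector<double> data;  // rows * cols values, row-major

    Matrix(std::size_t r, std::size_t c, std::vector<double> d)
        : rows(r), cols(c), data(std::move(d)) {
        if (data.size() != rows * cols) throw std::invalid_argument("size mismatch");
    }
    double at(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// Reference product C = A * B.
Matrix multiply(const Matrix& a, const Matrix& b) {
    if (a.cols != b.rows) throw std::invalid_argument("inner dimensions differ");
    Matrix c(a.rows, b.cols, std::vector<double>(a.rows * b.cols, 0.0));
    for (std::size_t i = 0; i < a.rows; ++i)
        for (std::size_t k = 0; k < a.cols; ++k)
            for (std::size_t j = 0; j < b.cols; ++j)
                c.data[i * c.cols + j] += a.at(i, k) * b.at(k, j);
    return c;
}

// Batched product. A batch containing a single matrix is reused for every i:
//   as.size() == 1          -> C_i = A   * B_i
//   bs.size() == 1          -> C_i = A_i * B
//   as.size() == bs.size()  -> C_i = A_i * B_i
std::vector<Matrix> batchedMultiply(const std::vector<Matrix>& as,
                                    const std::vector<Matrix>& bs) {
    if (as.empty() || bs.empty()) throw std::invalid_argument("empty batch");
    if (as.size() != 1 && bs.size() != 1 && as.size() != bs.size())
        throw std::invalid_argument("incompatible batch counts");
    const std::size_t count = as.size() > bs.size() ? as.size() : bs.size();
    std::vector<Matrix> cs;
    cs.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        const Matrix& a = as.size() == 1 ? as[0] : as[i];
        const Matrix& b = bs.size() == 1 ? bs[0] : bs[i];
        cs.push_back(multiply(a, b));
    }
    return cs;
}
```

For example, `batchedMultiply({a}, {b0, b1})` computes $C_0 = A \cdot B_0$ and $C_1 = A \cdot B_1$ while reusing the single `a`, which corresponds to the first form above.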
cuFFT New Features
Known Issues
Deprecations
Starting with CUDA 11, the various components in the toolkit are versioned independently.
For CUDA 11.0 Update 1, the table below lists the component versions:
| Component Name | Version Information | Supported Architectures |
|---|---|---|
| CUDA Runtime (cudart) | 11.0.221 | x86_64, POWER, Arm64 |
| cuobjdump | 11.0.221 | x86_64, POWER, Arm64 |
| CUPTI | 11.0.221 | x86_64, POWER, Arm64 |
| CUDA Demo Suite | 11.0.167 | x86_64 |
| CUDA GDB | 11.0.221 | x86_64, POWER, Arm64 |
| CUDA Memcheck | 11.0.221 | x86_64, POWER |
| CUDA NVCC | 11.0.221 | x86_64, POWER, Arm64 |
| CUDA nvdisasm | 11.0.221 | x86_64, POWER, Arm64 |
| CUDA NVML Headers | 11.0.167 | x86_64, POWER, Arm64 |
| CUDA nvprof | 11.0.221 | x86_64, POWER, Arm64 |
| CUDA nvprune | 11.0.221 | x86_64, POWER, Arm64 |
| CUDA NVRTC | 11.0.221 | x86_64, POWER, Arm64 |
| CUDA NVTX | 11.0.167 | x86_64, POWER, Arm64 |
| CUDA NVVP | 11.0.221 | x86_64, POWER |
| CUDA Samples | 11.0.221 | x86_64, POWER, Arm64 |
| CUDA Compute Sanitizer API | 11.0.221 | x86_64, POWER, Arm64 |
| CUDA cuBLAS | 11.2.0.252 | x86_64, POWER, Arm64 |
| CUDA cuFFT | 10.2.1.245 | x86_64, POWER, Arm64 |
| CUDA cuRAND | 10.2.1.245 | x86_64, POWER, Arm64 |
| CUDA cuSOLVER | 10.6.0.245 | x86_64, POWER, Arm64 |
| CUDA cuSPARSE | 11.1.1.245 | x86_64, POWER, Arm64 |
| CUDA NPP | 11.1.0.245 | x86_64, POWER, Arm64 |
| CUDA nvJPEG | 11.1.1.245 | x86_64, POWER, Arm64 |
| Nsight Eclipse Plugins | 11.0.221 | x86_64, POWER |
| Nsight Compute | 2020.1.2.4 | x86_64, POWER, Arm64 |
| Nsight Windows NVTX | 1.21018621 | x86_64, POWER, Arm64 |
| Nsight Systems | 2020.3.2.6 | x86_64, POWER, Arm64 |
| Nsight Visual Studio Edition (VSE) | 2020.1.2.20203 | x86_64 (Windows) |
| NVIDIA Linux Driver | 450.51.06 | x86_64, POWER, Arm64 |
| NVIDIA Windows Driver | 451.82 | x86_64 (Windows) |
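Because each component carries its own version (for example, cuBLAS 11.2.0.252 and cuFFT 10.2.1.245 in the same release), a build script cannot compare a library's version against the toolkit version. A minimal sketch, assuming the script already has the component's dotted version string, is to compare dotted versions field by field. The helper names `parseVersion` and `versionAtLeast` are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Split a dotted version such as "11.2.0.252" into numeric fields {11, 2, 0, 252}.
std::vector<long> parseVersion(const std::string& text) {
    std::vector<long> fields;
    std::istringstream stream(text);
    std::string part;
    while (std::getline(stream, part, '.'))
        fields.push_back(std::stol(part));  // throws std::invalid_argument on non-numeric input
    return fields;
}

// True if `have` >= `need`, comparing field by field; missing fields count as 0,
// so "11.2" and "11.2.0.0" compare equal.
bool versionAtLeast(const std::string& have, const std::string& need) {
    const std::vector<long> h = parseVersion(have);
    const std::vector<long> n = parseVersion(need);
    const std::size_t len = std::max(h.size(), n.size());
    for (std::size_t i = 0; i < len; ++i) {
        const long x = i < h.size() ? h[i] : 0;
        const long y = i < n.size() ? n[i] : 0;
        if (x != y) return x > y;
    }
    return true;
}
```

Comparing numerically rather than as strings matters here: as strings, "10.2.1.245" and "9.x" would sort in the wrong order, and leading zeros such as the "06" in driver numbers would be significant.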
Running a CUDA application requires a system with at least one CUDA-capable GPU and a driver that is compatible with the CUDA Toolkit. See Table 2. For more information on the various GPU products that are CUDA capable, visit https://developer.nvidia.com/cuda-gpus.
Each release of the CUDA Toolkit requires a minimum version of the CUDA driver. The CUDA driver is backward compatible, meaning that applications compiled against a particular version of CUDA will continue to work on subsequent (later) driver releases.
More information on compatibility can be found at https://docs.nvidia.com/cuda/cuda-c-best-practices-guide/index.html#cuda-runtime-and-driver-api-version.
Note: Starting with CUDA 11.0, the toolkit components are individually versioned, and the toolkit itself is versioned as shown in the table below.
| CUDA Toolkit | Linux x86_64 Driver Version | Windows x86_64 Driver Version |
|---|---|---|
| CUDA 11.0.3 Update 1 | >= 450.51.06 | >= 451.82 |
| CUDA 11.0.2 GA | >= 450.51.05 | >= 451.48 |
| CUDA 11.0.1 RC | >= 450.36.06 | >= 451.22 |
| CUDA 10.2.89 | >= 440.33 | >= 441.22 |
| CUDA 10.1 (10.1.105 general release, and updates) | >= 418.39 | >= 418.96 |
| CUDA 10.0.130 | >= 410.48 | >= 411.31 |
| CUDA 9.2 (9.2.148 Update 1) | >= 396.37 | >= 398.26 |
| CUDA 9.2 (9.2.88) | >= 396.26 | >= 397.44 |
| CUDA 9.1 (9.1.85) | >= 390.46 | >= 391.29 |
| CUDA 9.0 (9.0.76) | >= 384.81 | >= 385.54 |
| CUDA 8.0 (8.0.61 GA2) | >= 375.26 | >= 376.51 |
| CUDA 8.0 (8.0.44) | >= 367.48 | >= 369.30 |
| CUDA 7.5 (7.5.16) | >= 352.31 | >= 353.66 |
| CUDA 7.0 (7.0.28) | >= 346.46 | >= 347.62 |
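Since the driver is backward compatible, checking a system against this table reduces to "installed driver version >= minimum for the toolkit". The following sketch encodes a few Linux x86_64 rows from the table and performs that check. How the installed driver version is obtained is outside the sketch (it is passed in as a string), and the names `kMinLinuxDriver`, `driverFields`, and `driverSupportsToolkit` are hypothetical.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Minimum Linux x86_64 driver per toolkit, copied from part of the table above.
const std::map<std::string, std::string> kMinLinuxDriver = {
    {"11.0.3", "450.51.06"},   // CUDA 11.0.3 Update 1
    {"11.0.2", "450.51.05"},   // CUDA 11.0.2 GA
    {"11.0.1", "450.36.06"},   // CUDA 11.0.1 RC
    {"10.2.89", "440.33"},     // CUDA 10.2.89
    {"10.0.130", "410.48"},    // CUDA 10.0.130
};

// Parse a driver version such as "450.51.06" into {450, 51, 6}.
std::vector<int> driverFields(const std::string& version) {
    std::vector<int> fields;
    std::istringstream stream(version);
    std::string part;
    while (std::getline(stream, part, '.')) fields.push_back(std::stoi(part));
    return fields;
}

// A driver satisfies a toolkit whenever installed >= minimum (missing fields
// compare as 0). Throws std::out_of_range for toolkits not in the table.
bool driverSupportsToolkit(const std::string& installedDriver, const std::string& toolkit) {
    const auto entry = kMinLinuxDriver.find(toolkit);
    if (entry == kMinLinuxDriver.end())
        throw std::out_of_range("toolkit not in table: " + toolkit);
    const std::vector<int> have = driverFields(installedDriver);
    const std::vector<int> need = driverFields(entry->second);
    const std::size_t len = have.size() > need.size() ? have.size() : need.size();
    for (std::size_t i = 0; i < len; ++i) {
        const int x = i < have.size() ? have[i] : 0;
        const int y = i < need.size() ? need[i] : 0;
        if (x != y) return x > y;
    }
    return true;
}
```

For example, driver 450.51.06 satisfies both CUDA 11.0.3 and the older CUDA 10.2.89, whereas 450.51.05 meets the 11.0.2 GA minimum but not the 11.0.3 Update 1 minimum.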
For convenience, the NVIDIA driver is installed as part of the CUDA Toolkit installation. Note that this driver is for development purposes and is not recommended for use in production with Tesla GPUs.
For running CUDA applications in production with Tesla GPUs, it is recommended to download the latest driver for Tesla GPUs from the NVIDIA driver downloads site at http://www.nvidia.com/drivers.
During the installation of the CUDA Toolkit, the installation of the NVIDIA driver may be skipped on Windows (when using the interactive or silent installation) or on Linux (by using meta packages).
For more information on customizing the install process on Windows, see http://docs.nvidia.com/cuda/cuda-installation-guide-microsoft-windows/index.html#install-cuda-software.
For meta packages on Linux, see https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#package-manager-metas.
CUDA 11.0 adds support for the Arm server platform (arm64 SBSA). Note that with this release, only the following platforms are supported with the Tesla V100 GPU:
HPE Apollo 70 (using Marvell ThunderX2™ CN99XX)
Gigabyte R2851 (using Marvell ThunderX2™ CN99XX)
Huawei TaiShan 2280 V2 (using Huawei Kunpeng 920)
CUDA supports a wide range of Linux and Windows distributions. For a full list of supported operating systems, see the system requirements. The following new Linux distributions are supported in CUDA 11.0:
For x86 (x86_64):
Red Hat Enterprise Linux (RHEL) 8.1
Ubuntu 18.04.4 LTS
For Arm (arm64):
SUSE SLES 15.1
For POWER (ppc64le):
Red Hat Enterprise Linux (RHEL) 8.1
CUDA 11.0 adds the following support for WMMA:
Added support for double (FP64) to the list of available input/output types for 8x8x4 shapes (DMMA.884)
Added support for __nv_bfloat16 and tf32 precision formats for the HMMA 16x16x8 shape
The following new compilers are supported as host compilers for the CUDA compiler (nvcc)
nvcc adds two new flags, -Wdefault-stream-launch and -Werror=default-stream-launch, which generate a warning and an error, respectively, when a stream argument is not explicitly specified in the <<<...>>> kernel launch syntax. For example:
$ cat j1.cu
__global__ void foo() { }
int main() { foo<<<1,1>>>();
}
$ nvcc -Wdefault-stream-launch j1.cu -ptx
j1.cu(2): warning: explicit stream argument not provided in kernel launch
$ nvcc -Werror=default-stream-launch j1.cu -c
j1.cu(2): error: explicit stream argument not provided in kernel launch
The compiler optimizer now implements more aggressive dead code elimination for __shared__ variables whose value is not used. For example:
__device__ void foo() {
  __shared__ int xxx;
  xxx = 1;
}
In previous CUDA toolkits, the variable "xxx" was still present in the generated PTX. With CUDA 11 or later, the variable may be removed from the generated PTX because its value is never used. Marking the variable as volatile inhibits this compiler optimization.
In previous CUDA toolkits, NVRTC on Linux incorrectly added "/usr/include" to the default header file search path. This issue has been fixed; NVRTC in CUDA 11.0 and later does not implicitly add "/usr/include" to the header file search path.
If a program includes files located in /usr/include, the "/usr/include" path must now be passed explicitly to the NVRTC nvrtcCompileProgram() API call with the "-I" flag.
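A minimal sketch of preparing the option list for nvrtcCompileProgram(), assuming the caller keeps its options as std::string values. It appends "-I/usr/include" once (written here in the attached form "-I<dir>") and converts the list to the `const char* const*` array that nvrtcCompileProgram() accepts. The helper names `buildNvrtcOptions` and `toCStrings` are hypothetical, and the sketch deliberately does not include nvrtc.h.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: returns `options` with "-I/usr/include" appended, because
// NVRTC in CUDA 11.0 and later no longer searches /usr/include implicitly.
std::vector<std::string> buildNvrtcOptions(std::vector<std::string> options) {
    const std::string usrInclude = "-I/usr/include";
    if (std::find(options.begin(), options.end(), usrInclude) == options.end())
        options.push_back(usrInclude);  // add only once
    return options;
}

// nvrtcCompileProgram() takes a count plus an array of C strings. The returned
// pointers stay valid only while `owned` is alive and unmodified.
std::vector<const char*> toCStrings(const std::vector<std::string>& owned) {
    std::vector<const char*> ptrs;
    ptrs.reserve(owned.size());
    for (const std::string& s : owned) ptrs.push_back(s.c_str());
    return ptrs;
}
```

With an existing `nvrtcProgram prog`, the call would then be `nvrtcCompileProgram(prog, static_cast<int>(ptrs.size()), ptrs.data())`, where `ptrs` is the result of `toCStrings` applied to a list that outlives the call.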
nvcc now allows options that take a single argument to be redefined. If the redefinition is incompatible with the earlier instance, a warning is issued. For example:
// the following command line is now accepted, previously nvcc gave an error
$ nvcc -rdc=true -rdc=true -c j1.cu
// the following command line is now accepted with a warning (due to incompatible redefinition of '-rdc' argument), previously nvcc gave an error
$ nvcc -rdc=true -rdc=false -c j1.cu
nvcc warning : incompatible redefinition for option 'relocatable-device-code'
This release of the toolkit includes the following updates:
This release adds the following features for cusparseSpMM:
$C_i = A \cdot B_i$
$C_i = A_i \cdot B$
$C_i = A_i \cdot B_i$
This release adds more control options and helpful functionality for the Tensor Core Accelerated Iterative Refinement Solver (TCAIRS).
Added nppiSegmentWatershed functions.
Support for the following compute capabilities is deprecated in the CUDA Toolkit:
sm_35 (Kepler)
sm_37 (Kepler)
sm_50 (Maxwell)
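Projects that still build for these targets may want to flag them before invoking nvcc. The following is a minimal sketch of such a check for a hypothetical build script that collects its target architectures as "sm_XX" strings; the names `kDeprecatedArchs` and `findDeprecatedArchs` are illustrative only.

```cpp
#include <set>
#include <string>
#include <vector>

// Architectures listed as deprecated in the CUDA 11.0 release notes.
const std::set<std::string> kDeprecatedArchs = {"sm_35", "sm_37", "sm_50"};

// Return the entries of `requested` that are deprecated, in their original order.
std::vector<std::string> findDeprecatedArchs(const std::vector<std::string>& requested) {
    std::vector<std::string> hits;
    for (const std::string& arch : requested)
        if (kDeprecatedArchs.count(arch) != 0) hits.push_back(arch);
    return hits;
}
```

A script could print the returned list as a warning, or fail the build if it is non-empty, depending on how strictly the project wants to track the deprecation.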
For more information on GPU products and compute capability, see https://developer.nvidia.com/cuda-gpus.
CUDA 11.0 no longer supports development of CUDA applications on the following Windows distributions:
The following multiphase APIs have been removed:
nvjpegStatus_t NVJPEGAPI nvjpegDecodePhaseOne
nvjpegStatus_t NVJPEGAPI nvjpegDecodePhaseTwo
nvjpegStatus_t NVJPEGAPI nvjpegDecodePhaseThree
nvjpegStatus_t NVJPEGAPI nvjpegDecodeBatchedPhaseOne
nvjpegStatus_t NVJPEGAPI nvjpegDecodeBatchedPhaseTwo
Corrected documented maximum ulp error thresholds in erfcinvf and powf.
NVIDIA extends thanks to Professor Mike Giles of Oxford University for providing the initial code for the optimized version of the device implementation of the double-precision exp() function found in this release of the CUDA toolkit.
NVIDIA acknowledges Scott Gray for his work on small-tile GEMM kernels for Pascal. These kernels were originally developed for OpenAI and have been included in cuBLAS since version 8.0.61.2.
ALL NVIDIA DESIGN SPECIFICATIONS, REFERENCE BOARDS, FILES, DRAWINGS, DIAGNOSTICS, LISTS, AND OTHER DOCUMENTS (TOGETHER AND SEPARATELY, "MATERIALS") ARE BEING PROVIDED "AS IS." NVIDIA MAKES NO WARRANTIES, EXPRESSED, IMPLIED, STATUTORY, OR OTHERWISE WITH RESPECT TO THE MATERIALS, AND EXPRESSLY DISCLAIMS ALL IMPLIED WARRANTIES OF NONINFRINGEMENT, MERCHANTABILITY, AND FITNESS FOR A PARTICULAR PURPOSE.
Information furnished is believed to be accurate and reliable. However, NVIDIA Corporation assumes no responsibility for the consequences of use of such information or for any infringement of patents or other rights of third parties that may result from its use. No license is granted by implication or otherwise under any patent rights of NVIDIA Corporation. Specifications mentioned in this publication are subject to change without notice. This publication supersedes and replaces all other information previously supplied. NVIDIA Corporation products are not authorized as critical components in life support devices or systems without express written approval of NVIDIA Corporation.