Using XGBoost with RAPIDS Memory Manager (RMM) plugin
The RAPIDS Memory Manager (RMM) library provides a collection of efficient memory allocators for NVIDIA GPUs. It is now possible to use XGBoost with memory allocators provided by RMM, by enabling the RMM integration plugin.
The demos in this directory highlight one RMM allocator in particular: the pool sub-allocator. This allocator addresses the slow speed of cudaMalloc() by allocating a large chunk of memory upfront. Subsequent allocations draw from the pool of already-allocated memory and thus avoid the overhead of calling cudaMalloc() directly. See this GTC talk's slides for more details.
Before running the demos, ensure that XGBoost is compiled with the RMM plugin enabled. To do this, run CMake with the option -DPLUGIN_RMM=ON (-DUSE_CUDA=ON is also required):
```shell
cmake -B build -S . -DUSE_CUDA=ON -DUSE_NCCL=ON -DPLUGIN_RMM=ON
cmake --build build -j$(nproc)
```
CMake will attempt to locate the RMM library in your build environment. You may choose to build RMM from source or install it using the Conda package manager. If CMake cannot find RMM, specify its location with CMAKE_PREFIX_PATH:
```shell
# If using Conda:
cmake -B build -S . -DUSE_CUDA=ON -DUSE_NCCL=ON -DPLUGIN_RMM=ON -DCMAKE_PREFIX_PATH=$CONDA_PREFIX
# If using RMM installed at a custom location:
cmake -B build -S . -DUSE_CUDA=ON -DUSE_NCCL=ON -DPLUGIN_RMM=ON -DCMAKE_PREFIX_PATH=/path/to/rmm
```
Informing XGBoost about RMM pool
When XGBoost is compiled with RMM, most large allocations go through the RMM allocators, but some small allocations in performance-critical areas use a different caching allocator so that we can have better control over memory allocation behavior. Users can override this behavior and force the use of RMM for all allocations by setting the global configuration use_rmm:
```python
with xgb.config_context(use_rmm=True):
    clf = xgb.XGBClassifier(tree_method="hist", device="cuda")
```
Depending on the choice of memory pool size and the type of allocator, this can make memory usage more consistent, at the cost of slightly degraded performance.
No Device Ordinal for Multi-GPU
Since RMM pre-allocates the memory pool on a specific device, changing the CUDA device ordinal in XGBoost can result in the memory error cudaErrorIllegalAddress. Use the CUDA_VISIBLE_DEVICES environment variable instead of the device="cuda:1" parameter to select the device. For distributed training, distributed computing frameworks like dask-cuda are responsible for device management. For Scala-Spark, see the XGBoost4J-Spark-GPU Tutorial for more info.
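For example, to train on the second GPU of a machine while keeping the device parameter set to plain "cuda" (train.py here is a hypothetical training script):

```shell
# Expose only GPU 1 to the process; inside the script keep
# device="cuda" rather than device="cuda:1".
CUDA_VISIBLE_DEVICES=1 python train.py
```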
Memory Over-Subscription
Warning
This feature is still experimental and is under active development.
Newer NVIDIA platforms like Grace Hopper use NVLink-C2C, which allows the CPU and GPU to share a coherent memory model. Users can use the SamHeadroomMemoryResource in the latest RMM to utilize system memory for storing data. This can help XGBoost use host memory for GPU computation, but it may reduce performance due to slower CPU memory speed and page migration overhead.