rocSHMEM intra-kernel networking runtime for AMD dGPUs on the ROCm platform.
The ROCm OpenSHMEM (rocSHMEM) runtime is part of an AMD and AMD Research initiative to provide GPU-centric networking through an OpenSHMEM-like interface. This intra-kernel networking library reduces application code complexity and enables finer-grained communication/computation overlap than traditional host-driven networking. rocSHMEM uses a single symmetric heap that is allocated in GPU memory.
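For orientation, the sketch below shows what host-side use of the symmetric heap might look like. The header path and every function name in it (rocshmem_init, rocshmem_my_pe, rocshmem_malloc, and so on) are assumptions based on OpenSHMEM naming conventions; consult the headers under ./include/rocshmem/ for the actual rocSHMEM API.

```cpp
// Hypothetical host-side sketch of the symmetric-heap model.
// All names and signatures below are assumptions modeled on OpenSHMEM
// conventions; see ./include/rocshmem/ for the real rocSHMEM interface.
#include <rocshmem/rocshmem.hpp>  // assumed header path

int main() {
  rocshmem::rocshmem_init();                    // start the runtime (one PE per process)

  int my_pe = rocshmem::rocshmem_my_pe();       // this process's PE id
  int n_pes = rocshmem::rocshmem_n_pes();       // total number of PEs
  (void)my_pe; (void)n_pes;

  // Allocate from the symmetric heap, which lives in GPU memory.
  // Every PE makes the same allocation, so the buffer can be targeted
  // remotely by any other PE through the rocSHMEM API.
  int *buf = static_cast<int *>(rocshmem::rocshmem_malloc(1024 * sizeof(int)));

  // ... launch HIP kernels that communicate through the device-side API ...

  rocshmem::rocshmem_free(buf);
  rocshmem::rocshmem_finalize();
  return 0;
}
```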
There are currently three backends for rocSHMEM: IPC, Reverse Offload (RO), and GPU Direct Async (GDA). The backends primarily differ in how they implement intra-kernel networking.
The IPC backend implements communication primitives using load/store operations issued from the GPU.
The Reverse Offload (RO) backend has the GPU runtime forward rocSHMEM networking operations to the host-side runtime, which calls into a traditional MPI or OpenSHMEM implementation. This forwarding of requests is transparent to the programmer, who only sees the GPU-side interface.
The GPU Direct Async (GDA) backend allows rocSHMEM to issue communication operations to the NIC directly from device-side code, without involving a CPU proxy. During initialization, network resources are prepared for each NIC vendor using the vendor-appropriate Direct Verbs APIs. When the device-side rocSHMEM API is called, device threads construct Work Queue Entries (WQEs) and post the communication directly to the NIC's send queues. Completion Queues (CQs) are polled from device-side code as well.
The RO and GDA backends are provided as-is with limited support from AMD and AMD Research.
rocSHMEM base requirements:
- ROCm v6.2.2 onwards
  - May work with other versions, but this has not been tested
The following AMD GPUs have been fully tested for compatibility with rocSHMEM:
- MI250X
- MI300X
- MI350X (Requires ROCm 7.0 or later)
Note: Other AMD GPUs might function with unknown limitations. For the complete list of supported hardware, see ROCm System Requirements.
ROCm-aware Open MPI and UCX, as described in Building the Dependencies, are also required.
rocSHMEM only supports HIP applications. There are no plans to port to OpenCL.
rocSHMEM uses the CMake build system. The CMakeLists file contains additional details about library options.
To create an out-of-source build for the IPC backend for single-node use-cases:
```
mkdir build
cd build
../scripts/build_configs/ipc_single
```

To create an out-of-source build for the RO backend for multi-node use-cases that can also utilize the IPC mechanisms for certain intra-node operations:
```
mkdir build
cd build
../scripts/build_configs/ro_ipc
```

To create an out-of-source build for the GDA backend, run the following, making sure to select the correct NIC vendor script:
```
mkdir build
cd build
../scripts/build_configs/gda_<vendor>
```

The build script passes configuration options to CMake to set up canonical builds. There are other scripts in the ./scripts/build_configs directory, but currently only ipc_single and ro_ipc are supported.
By default, the library is installed in ~/rocshmem. You may provide a custom install path by supplying it as an argument. For example:
```
../scripts/build_configs/ipc_single /path/to/install
```

If you have built the dependencies in a non-standard path (for example, using the instructions from Building the Dependencies), you may have to set the following variables to find the dependencies:
```
MPI_ROOT=/path/to/openmpi UCX_ROOT=/path/to/ucx \
CMAKE_PREFIX_PATH="/path/to/rocm:$CMAKE_PREFIX_PATH" \
../scripts/build_configs/ipc_single /path/to/install
```

rocSHMEM is built as a library that can be statically linked to your application during compilation using hipcc.
When compiling your application with hipcc, include the rocSHMEM header files and link against the rocSHMEM library. Since rocSHMEM depends on MPI, you will also need to link against an MPI library; the MPI linkage arguments must be added manually, as opposed to using mpicc.
When using hipcc directly (as opposed to through a build system), we recommend performing the compilation and linking steps separately. Example compile and link commands are provided at the top of the example files (./examples/*):
```
# Compile
hipcc -c -fgpu-rdc -x hip rocshmem_allreduce_test.cc \
  -I/opt/rocm/include \
  -I$ROCSHMEM_INSTALL_DIR/include \
  -I$OPENMPI_UCX_INSTALL_DIR/include/

# Link
hipcc -fgpu-rdc --hip-link rocshmem_allreduce_test.o -o rocshmem_allreduce_test \
  $ROCSHMEM_INSTALL_DIR/lib/librocshmem.a \
  $OPENMPI_UCX_INSTALL_DIR/lib/libmpi.so \
  -L/opt/rocm/lib -lamdhip64 -lhsa-runtime64
```

If your project uses CMake, you may find the Using CMake with AMD ROCm page useful.
rocSHMEM has the following environment variables:
- ROCSHMEM_HEAP_SIZE (default: 1 GB): Defines the size of the rocSHMEM symmetric heap. Note that the heap resides in GPU memory.
- ROCSHMEM_RO_DISABLE_MIXED_IPC (default: 0): Disables IPC support for the RO or GDA backends.
- ROCSHMEM_MAX_NUM_CONTEXTS (default: 1024): Maximum number of contexts used in the library.
- ROCSHMEM_MAX_NUM_TEAMS (default: 40): Maximum number of teams supported by the library.
- ROCSHMEM_GDA_ALTERNATE_QP_PORTS (default: 1): Enables/disables having QPs alternate their mappings across rocSHMEM contexts.

rocSHMEM is similar to OpenSHMEM and should be familiar to programmers who have experience with OpenSHMEM or other PGAS network programming APIs in the context of CPUs. The best way to learn how to use rocSHMEM is to read the functions described in the headers under ./include/rocshmem/, or to look at the provided example code in the ./examples/ directory.
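For a rough sense of what the device-side interface looks like, here is a hypothetical sketch of a kernel issuing a put. The function names (rocshmem_wg_init, rocshmem_putmem, rocshmem_quiet, and so on) and their signatures are assumptions modeled on OpenSHMEM, so treat the headers and the example programs as the source of truth.

```cpp
// Hypothetical device-side sketch: all names and signatures are
// assumptions modeled on OpenSHMEM; see ./include/rocshmem/ and
// ./examples/ for the real rocSHMEM interface.
#include <hip/hip_runtime.h>
#include <rocshmem/rocshmem.hpp>  // assumed header path

using namespace rocshmem;

__global__ void put_kernel(int *dst, const int *src, size_t nbytes, int target_pe) {
  // Per-workgroup runtime setup, assumed to be required before any
  // device-side rocSHMEM call.
  rocshmem_wg_init();

  if (threadIdx.x == 0) {
    // Copy nbytes from the local symmetric buffer src into the symmetric
    // buffer dst on target_pe, then wait for remote completion.
    rocshmem_putmem(dst, src, nbytes, target_pe);
    rocshmem_quiet();
  }

  rocshmem_wg_finalize();
}
```

A host program would allocate dst and src with the symmetric-heap allocator shown earlier and launch this kernel like any other HIP kernel.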
The examples can be run like so:

```
mpirun --map-by numa --mca pml ucx --mca osc ucx -np 2 ./build/examples/rocshmem_getmem_test
```

rocSHMEM ships with functional and unit test suites for the supported rocSHMEM API. They cover Puts, Gets, nonblocking Puts, nonblocking Gets, Quiets, Atomics, Tests, Wait-untils, Broadcasts, Reductions, and more. To run the tests, use the driver scripts provided in the ./scripts/ directory:
```
# Run Functional Tests
./scripts/functional_tests/driver.sh ./build/tests/functional_tests/rocshmem_functional_tests all <log_directory>

# Run Unit Tests
./scripts/unit_tests/driver.sh ./build/tests/unit_tests/rocshmem_unit_tests all
```

rocSHMEM targets 80% code coverage in both unit and functional tests. To check the coverage report for your changes, we provide a helper script that builds, tests, and generates the coverage report in a single step.
Because all three configurations (ipc, ro_net, and ro_ipc) need to be built, the codecov script is run from the build/ directory and will create and build the three configurations with instrumented code. It will then start a Python HTTP server, and you can navigate to the printed link to view the coverage report.
```
cd rocSHMEM
mkdir build && cd build
../scripts/build_configs/codecov
```

rocSHMEM requires a ROCm-aware Open MPI and UCX. Other MPI implementations, such as MPICH, should be compatible with rocSHMEM, but they have not been thoroughly tested.
We provide a script to install the dependencies. However, it is not guaranteed to work or perform optimally on all platforms, since configuration options are platform dependent.
```
BUILD_DIR=/path/to/not_rocshmem_src_or_build/dependencies /path/to/rocshmem_src/scripts/install_dependencies.sh
```

After compiling and installing UCX and Open MPI, please update your PATH and LD_LIBRARY_PATH to point to the installation locations. The exact locations are printed when the script runs successfully.
To build and configure ROCm-Aware UCX (1.17.0 or later), you need to:
```
git clone https://github.com/openucx/ucx.git -b v1.17.x
cd ucx
./autogen.sh
./configure --prefix=<ucx_install_dir> --with-rocm=<rocm_path> --enable-mt
make -j 8
make -j 8 install
```

Then, you need to build Open MPI (5.0.6 or later) with UCX support:
```
git clone --recursive https://github.com/open-mpi/ompi.git -b v5.0.x
cd ompi
./autogen.pl
./configure --prefix=<ompi_install_dir> --with-rocm=<rocm_path> --with-ucx=<ucx_install_dir>
make -j 8
make -j 8 install
```

After compiling and installing UCX and Open MPI, please update your PATH and LD_LIBRARY_PATH to point to the installation locations, e.g.:
```
export PATH=<ompi_install_dir>/bin:$PATH
export LD_LIBRARY_PATH=<ompi_install_dir>/lib:<ucx_install_dir>/lib:$LD_LIBRARY_PATH
```

For more information on Open MPI and UCX support, please visit: https://rocm.docs.amd.com/en/latest/how-to/gpu-enabled-mpi.html