rocWMMA

Welcome! rocWMMA is a C++ library for accelerating mixed-precision matrix multiply-accumulate (MMA) operations on AMD GPU hardware. rocWMMA makes it easier to break down MMA problems into fragments and distribute block-wise MMA operations in parallel across GPU wavefronts. The API consists of a header library that can be used to compile MMA acceleration directly into GPU kernel device code. This can benefit from compiler optimization in the generation of kernel assembly, and doesn't incur the additional overhead of linking to external runtime libraries or launching separate kernels.
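To make the idea of block-wise decomposition concrete, here is a minimal CPU-only sketch in plain C++. It does not use the rocWMMA API: the names `mma_block`, `blocked_gemm`, `reference_gemm` and the `BLOCK_*` tile sizes are hypothetical and chosen only for illustration. It assumes row-major matrices whose dimensions are multiples of the tile sizes, and uses 8-bit inputs with a 32-bit accumulator to mimic mixed precision. On a GPU, each call to `mma_block` would roughly correspond to work handled by one wavefront on hardware-sized fragments.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical tile sizes for this sketch. Real fragment sizes depend on the
// GPU architecture and the data types in use.
constexpr int BLOCK_M = 2;
constexpr int BLOCK_N = 2;
constexpr int BLOCK_K = 2;

// Accumulate one BLOCK_M x BLOCK_N output tile: D_tile += A_tile * B_tile.
// Inputs are 8-bit integers, the accumulator is 32-bit (mixed precision).
void mma_block(const std::vector<std::int8_t>& a,
               const std::vector<std::int8_t>& b,
               std::vector<std::int32_t>&      d,
               int n, int k, int row0, int col0, int k0)
{
    for(int i = 0; i < BLOCK_M; ++i)
    {
        for(int j = 0; j < BLOCK_N; ++j)
        {
            std::int32_t acc = 0;
            for(int kk = 0; kk < BLOCK_K; ++kk)
            {
                acc += std::int32_t(a[(row0 + i) * k + (k0 + kk)])
                       * std::int32_t(b[(k0 + kk) * n + (col0 + j)]);
            }
            d[(row0 + i) * n + (col0 + j)] += acc;
        }
    }
}

// D = A * B for row-major A (m x k) and B (k x n), computed tile by tile.
// Assumes m, n and k are multiples of the block sizes above.
std::vector<std::int32_t> blocked_gemm(const std::vector<std::int8_t>& a,
                                       const std::vector<std::int8_t>& b,
                                       int m, int n, int k)
{
    std::vector<std::int32_t> d(m * n, 0);
    for(int row0 = 0; row0 < m; row0 += BLOCK_M)
        for(int col0 = 0; col0 < n; col0 += BLOCK_N)
            for(int k0 = 0; k0 < k; k0 += BLOCK_K)
                mma_block(a, b, d, n, k, row0, col0, k0);
    return d;
}

// Naive triple-loop reference used to check the blocked result.
std::vector<std::int32_t> reference_gemm(const std::vector<std::int8_t>& a,
                                         const std::vector<std::int8_t>& b,
                                         int m, int n, int k)
{
    std::vector<std::int32_t> d(m * n, 0);
    for(int i = 0; i < m; ++i)
        for(int j = 0; j < n; ++j)
            for(int kk = 0; kk < k; ++kk)
                d[i * n + j] += std::int32_t(a[i * k + kk]) * std::int32_t(b[kk * n + j]);
    return d;
}
```

Calling `blocked_gemm(a, b, m, n, k)` and comparing the result against `reference_gemm` with the same arguments is a quick way to confirm that the tiling does not change the answer. The output tiles are independent of each other, which is what allows the block-wise work to be spread across wavefronts.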

rocWMMA includes sample projects to validate and demonstrate API usage. These include simple GEMMs, performant GEMMs, DLRM, GEMV and hipRTC integration.

The test suite includes validation and benchmarking projects that focus on unit testing, GEMMs and DLRM.

Note

The published rocWMMA documentation is available on the rocWMMA documentation site in an organized, easy-to-read format, with search and a table of contents. The documentation source files reside in the rocWMMA/docs folder of this repository. As with all ROCm projects, the documentation is open source. For more information, see Contribute to ROCm documentation.

Requirements

rocWMMA currently supports the following AMD GPU architectures:

  • CDNA class GPU featuring matrix core support: gfx908, gfx90a, gfx940, gfx941, gfx942 as 'gfx9'
  • RDNA3 class GPU featuring AI acceleration support: gfx1100, gfx1101, gfx1102 as 'gfx11'

Dependencies:

  • Minimum ROCm version support is 6.4.
  • Minimum cmake version support is 3.14.
  • Minimum ROCm-cmake version support is 0.8.0.
  • Minimum rocBLAS version support is rocBLAS 4.0.0 for ROCm 6.0* (or ROCm packages rocblas and rocblas-dev).
  • Minimum HIP runtime version support is 4.3.0 (or ROCm package hip-runtime-amd).
  • Minimum LLVM OpenMP runtime dev package version support is 10.0 (available as ROCm package rocm-llvm-dev).
    * = if using rocBLAS for validation.
    It is best to use available ROCm packages from the same release where applicable.

Build with CMake

For more detailed information, please refer to the rocWMMA installation guide.

Project options

| Option | Description | Default value |
|---|---|---|
| GPU_TARGETS | Build code for specific GPU target(s) | gfx908:xnack-;gfx90a:xnack-;gfx90a:xnack+;gfx942;gfx1100;gfx1101;gfx1102;gfx1200;gfx1201 |
| AMDGPU_TARGETS | (Deprecated) Build code for specific GPU target(s) | gfx908:xnack-;gfx90a:xnack-;gfx90a:xnack+;gfx942;gfx1100;gfx1101;gfx1102;gfx1200;gfx1201 |
| ROCWMMA_BUILD_TESTS | Build Tests | ON |
| ROCWMMA_BUILD_SAMPLES | Build Samples | ON |
| ROCWMMA_BUILD_DOCS | Build doxygen documentation from code | OFF |
| ROCWMMA_BUILD_ASSEMBLY | Generate assembly files | OFF |
| ROCWMMA_BUILD_VALIDATION_TESTS | Build validation tests | ON (requires ROCWMMA_BUILD_TESTS=ON) |
| ROCWMMA_BUILD_BENCHMARK_TESTS | Build benchmark tests | OFF (requires ROCWMMA_BUILD_TESTS=ON) |
| ROCWMMA_BUILD_EXTENDED_TESTS | Build extended testing coverage | OFF (requires ROCWMMA_BUILD_TESTS=ON) |
| ROCWMMA_VALIDATE_WITH_ROCBLAS | Use rocBLAS for validation tests | ON (requires ROCWMMA_BUILD_VALIDATION_TESTS=ON) |
| ROCWMMA_BENCHMARK_WITH_ROCBLAS | Include rocBLAS benchmarking data | OFF (requires ROCWMMA_BUILD_BENCHMARK_TESTS=ON) |
| ROCWMMA_USE_SYSTEM_GOOGLETEST | Use system Google Test library instead of downloading and building it | OFF (requires ROCWMMA_BUILD_TESTS=ON) |

Example configurations

By default, the project is configured in release mode and is linked against rocBLAS for validating results. Here are some configuration examples:

| Configuration | Command |
|---|---|
| Basic | CC=/opt/rocm/bin/amdclang CXX=/opt/rocm/bin/amdclang++ cmake -B<build_dir> . |
| Targeting gfx908 | CC=/opt/rocm/bin/amdclang CXX=/opt/rocm/bin/amdclang++ cmake -B<build_dir> . -DGPU_TARGETS=gfx908:xnack- |
| Debug build | CC=/opt/rocm/bin/amdclang CXX=/opt/rocm/bin/amdclang++ cmake -B<build_dir> . -DCMAKE_BUILD_TYPE=Debug |
| Build without rocBLAS (default on) | CC=/opt/rocm/bin/amdclang CXX=/opt/rocm/bin/amdclang++ cmake -B<build_dir> . -DROCWMMA_VALIDATE_WITH_ROCBLAS=OFF -DROCWMMA_BENCHMARK_WITH_ROCBLAS=OFF |

After configuration, build with:

    cmake --build <build_dir> -- -j<nproc>

Documentation

For more comprehensive documentation on installation, samples and test contents, the API reference and the programmer's guide, you can build the documentation locally as either HTML or PDF.

HTML

    cd docs
    pip3 install -r sphinx/requirements.txt
    python3 -m sphinx -T -E -b html -d _build/doctrees -D language=en . _build/html

The HTML documentation can be viewed in your browser by opening the resulting docs/_build/html/index.html file.

PDF

    cd docs
    sudo apt-get update
    sudo apt-get install doxygen
    sudo apt-get install texlive-latex-base texlive-latex-extra
    pip3 install -r sphinx/requirements.txt
    python3 -m sphinx -T -E -b latex -d _build/doctrees -D language=en . _build/latex
    cd _build/latex
    pdflatex rocwmma.tex

Running the above commands generates rocwmma.pdf.

The latest official documentation for rocWMMA is available at: https://rocm.docs.amd.com/projects/rocWMMA/en/latest/index.html

Contributing to the rocWMMA Library

Community collaboration is encouraged! If you are considering contributing, please follow the rocWMMA Contribution Guide to get started.

