alpaka - Abstraction Library for Parallel Kernel Acceleration

The alpaka library is a header-only C++20 abstraction library for accelerator development.

Its aim is to provide performance portability across accelerators through the abstraction (not hiding!) of the underlying levels of parallelism.

It is platform independent and supports the concurrent and cooperative use of multiple devices such as the host's CPU (x86, ARM, RISC-V and Power 8+) and GPU accelerators from different vendors (NVIDIA, AMD and Intel). A multitude of accelerator back-end variants using CUDA, HIP, SYCL, OpenMP 2.0+, std::thread and also serial execution is provided and can be selected depending on the device. Only one implementation of a user kernel is required, because kernels are represented as function objects with a special interface. There is no need to write special CUDA, HIP, SYCL, OpenMP or custom threading code. Accelerator back-ends can be mixed and synchronized via compute device queues. The decision which accelerator back-end executes which kernel can be made at runtime.
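The idea of writing a kernel once as a function object and choosing the back-end at runtime can be sketched in plain C++. This is only an illustration under stated assumptions: `ForwardBackend`, `ReverseBackend`, `Backend` and `axpy` are hypothetical names invented for this sketch, both stand-in back-ends run on the host, and none of this is the alpaka API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// The kernel is written once as a function object. It only knows the
// index of the element it is responsible for.
struct AxpyKernel
{
    void operator()(std::size_t i, float a, float const* x, float* y) const
    {
        y[i] = a * x[i] + y[i];
    }
};

// Two hypothetical stand-in back-ends (NOT alpaka types). Both execute on
// the host and differ only in the order in which they visit the indices.
struct ForwardBackend
{
    template<typename Kernel, typename... Args>
    static void run(std::size_t n, Kernel const& kernel, Args... args)
    {
        for(std::size_t i = 0; i < n; ++i)
            kernel(i, args...);
    }
};

struct ReverseBackend
{
    template<typename Kernel, typename... Args>
    static void run(std::size_t n, Kernel const& kernel, Args... args)
    {
        for(std::size_t i = n; i-- > 0;)
            kernel(i, args...);
    }
};

enum class Backend
{
    Forward,
    Reverse
};

// The back-end is picked from a run-time value; the kernel source is shared.
std::vector<float> axpy(Backend backend, float a, std::vector<float> const& x, std::vector<float> y)
{
    assert(x.size() == y.size());
    AxpyKernel const kernel{};
    switch(backend)
    {
    case Backend::Forward:
        ForwardBackend::run(x.size(), kernel, a, x.data(), y.data());
        break;
    case Backend::Reverse:
        ReverseBackend::run(x.size(), kernel, a, x.data(), y.data());
        break;
    }
    return y;
}
```

Calling `axpy(Backend::Forward, 2.0f, {1.0f, 2.0f}, {3.0f, 4.0f})` and the same call with `Backend::Reverse` produce the same vector, because the kernel does not depend on the order in which indices are visited.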

The abstraction used is very similar to the CUDA grid-blocks-threads domain decomposition strategy. Algorithms that should be parallelized have to be divided into a multi-dimensional grid consisting of small uniform work items. The functions that process these work items are called kernels and are executed in parallel threads. The threads in the grid are organized in blocks. All threads in a block are executed in parallel and can interact via fast shared memory and low-level synchronization methods. Blocks are executed independently and cannot interact in any way. The block execution order is unspecified and depends on the accelerator in use. By using this abstraction the execution can be optimally adapted to the available hardware.
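The grid-blocks-threads decomposition described above can be emulated serially in a few lines of plain C++. The sketch below is an assumption-laden illustration, not alpaka code: `ThreadContext`, `launch`, `SquareKernel` and `squareAll` are hypothetical names, the grid is one-dimensional, and blocks are deliberately visited in reverse order to show that a kernel must not rely on any particular block order.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical index bundle handed to every kernel invocation
// (not an alpaka type).
struct ThreadContext
{
    std::size_t blockIdx; // index of the block within the grid
    std::size_t threadIdx; // index of the thread within its block
    std::size_t blockDim; // number of threads per block

    std::size_t globalIdx() const
    {
        return blockIdx * blockDim + threadIdx;
    }
};

// Each thread handles one work item. The bounds check guards the last
// block, which may be only partially filled.
struct SquareKernel
{
    void operator()(ThreadContext const& ctx, std::vector<int>& data) const
    {
        std::size_t const i = ctx.globalIdx();
        if(i < data.size())
            data[i] = data[i] * data[i];
    }
};

// Serial emulation of a 1D grid launch. Blocks are visited in reverse
// order on purpose: the block order is unspecified.
template<typename Kernel, typename... Args>
void launch(std::size_t gridDim, std::size_t blockDim, Kernel const& kernel, Args&... args)
{
    for(std::size_t b = gridDim; b-- > 0;)
        for(std::size_t t = 0; t < blockDim; ++t)
            kernel(ThreadContext{b, t, blockDim}, args...);
}

std::vector<int> squareAll(std::vector<int> data, std::size_t blockDim)
{
    assert(blockDim > 0);
    // Round up so that every element is covered by some block.
    std::size_t const gridDim = (data.size() + blockDim - 1) / blockDim;
    launch(gridDim, blockDim, SquareKernel{}, data);
    return data;
}
```

With five elements and two threads per block, `squareAll` launches three blocks; the sixth thread falls outside the data and is skipped by the bounds check.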

Software License

alpaka is licensed under MPL-2.0.

Documentation

The alpaka documentation can be found in the online manual (alpaka.readthedocs.io). The documentation files in .rst (reStructuredText) format are located in the docs subfolder of this repository. The source code documentation is generated with doxygen.

Accelerator Back-ends

Accelerator Back-end	Lib/API	Devices	Execution strategy grid-blocks	Execution strategy block-threads
Serial	n/a	Host CPU (single core)	sequential	sequential (only 1 thread per block)
OpenMP 2.0+ blocks	OpenMP 2.0+	Host CPU (multi core)	parallel (preemptive multitasking)	sequential (only 1 thread per block)
OpenMP 2.0+ threads	OpenMP 2.0+	Host CPU (multi core)	sequential	parallel (preemptive multitasking)
std::thread	std::thread	Host CPU (multi core)	sequential	parallel (preemptive multitasking)
TBB	TBB 2.2+	Host CPU (multi core)	parallel (preemptive multitasking)	sequential (only 1 thread per block)
CUDA	CUDA 12.0+	NVIDIA GPUs	parallel (undefined)	parallel (lock-step within warps)
HIP(clang)	HIP 6.0+	AMD GPUs	parallel (undefined)	parallel (lock-step within warps)
SYCL(oneAPI)	oneAPI 2024.2+	CPUs, Intel GPUs and FPGAs	parallel (undefined)	parallel (lock-step within warps)

Supported Compilers

This library uses C++20 (or newer when available).

Accelerator Back-end	gcc 11.1 (Linux)	gcc 12.3 (Linux)	gcc 13.1 (Linux)	clang 14 (Linux)	clang 15 (Linux)	clang 16 (Linux)	clang 17 (Linux)	clang 18 (Linux)	clang 19 (Linux)	icpx 2025.0 (Linux)	Xcode 15.4 / 16.1 (macOS)	Visual Studio 2022 (Windows)
Serial	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅
OpenMP 2.0+ blocks	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅¹	✅	✅
OpenMP 2.0+ threads	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅¹	✅	✅
std::thread	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅
TBB	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅
CUDA (nvcc)	✅ (CUDA 12.0)	✅ (CUDA 12.0 - 12.5)	✅ (CUDA 12.4 - 12.5)	✅ (CUDA 12.0)	✅ (CUDA 12.2)	✅ (CUDA 12.3)	✅ (CUDA 12.4 - 12.5)	✅ (CUDA 12.4 - 12.5)	❌	❌	-	❌
CUDA (clang)	-	-	-	❌	❌	❌	❌	❌	❌	❌	-	-
HIP (clang)	-	-	-	❌	❌	❌	✅ (HIP 6.0 - 6.1)	✅ (HIP 6.2)	❌	❌	-	-
SYCL	❌	❌	❌	❌	❌	❌	❌	❌	❌	✅²	-	❌

Other compilers or combinations marked with ❌ in the table above may work but are not tested in CI and are therefore not explicitly supported.

Dependencies

The alpaka library itself only requires header-only libraries. However, some of the accelerator back-end implementations require different Boost libraries to be built.

When an accelerator back-end using CUDA is enabled, version 12.0 (with nvcc as CUDA compiler) or version 12.0 (with clang as CUDA compiler) of the CUDA SDK is the minimum requirement.
NOTE: When using clang as a native CUDA compiler, the CUDA accelerator back-end cannot be enabled together with any OpenMP accelerator back-end because this combination is currently unsupported.
NOTE: Separable compilation is disabled by default and can be enabled via the CMake flag CMAKE_CUDA_SEPARABLE_COMPILATION.

When an accelerator back-end using OpenMP is enabled, the compiler and the platform have to support the corresponding minimum OpenMP version.

When an accelerator back-end using TBB is enabled, the compiler and the platform have to support the corresponding minimum TBB version.

Boost 1.78.0+ is an optional external dependency. It is only needed if the C++ standard library in use does not support std::atomic_ref.
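Whether a standard library provides std::atomic_ref can be checked with the standard C++20 feature-test macro `__cpp_lib_atomic_ref`. The sketch below shows such a check. How alpaka itself detects the feature is not stated here, so treat this as an assumption; `fetchAdd` and `countUp` are hypothetical helper names, and the fallback branch is a non-atomic placeholder where a Boost-based implementation would go.

```cpp
#include <atomic> // also defines __cpp_lib_atomic_ref when the feature exists
#include <cassert>

#if defined(__cpp_lib_atomic_ref)
constexpr bool kHasStdAtomicRef = true;
#else
constexpr bool kHasStdAtomicRef = false;
#endif

// Adds `delta` to `value` and returns the previous value.
int fetchAdd(int& value, int delta)
{
#if defined(__cpp_lib_atomic_ref)
    // Atomic read-modify-write on an ordinary int object.
    return std::atomic_ref<int>(value).fetch_add(delta);
#else
    // Placeholder for this sketch only: NOT atomic. A real fallback would
    // use an external atomic_ref implementation here.
    int const old = value;
    value += delta;
    return old;
#endif
}

// Increments a counter n times through fetchAdd and returns the result.
int countUp(int n)
{
    assert(n >= 0);
    int counter = 0;
    for(int i = 0; i < n; ++i)
        fetchAdd(counter, 1);
    return counter;
}
```

`kHasStdAtomicRef` can be inspected at compile time, for example in a `static_assert`, to see which branch a given toolchain selects.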

Usage

The library is header only so nothing has to be built. CMake 3.22+ is required to provide the correct defines and include paths. Just call alpaka_add_executable instead of add_executable and the difficulties of the CUDA nvcc compiler in handling .cu and .cpp files are automatically taken care of. Source files do not need any special file ending. Examples of how to utilize alpaka within CMake can be found in the example folder.

The whole alpaka library can be included with:

#include <alpaka/alpaka.hpp>

Code that is not intended to be utilized by the user is hidden in the detail namespace.

Furthermore, for a CUDA-like experience when adopting alpaka we provide the library cupla. It enables a simple and straightforward way of porting existing CUDA applications to alpaka and thus to a variety of accelerators.

Single header

The CI creates a single-header version of alpaka on each commit, which you can find on the single-header branch.

This is especially useful if you would like to play with alpaka on Compiler Explorer. Just include alpaka like

#include <https://raw.githubusercontent.com/alpaka-group/alpaka/single-header/include/alpaka/alpaka.hpp>

and enable the desired back-end on the compiler's command line using the corresponding macro, e.g. via -DALPAKA_ACC_CPU_B_SEQ_T_SEQ_ENABLED.
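A macro passed with `-D` is visible to the preprocessor, so code can branch on it. The following minimal sketch only demonstrates that mechanism for the macro named above. `requestedBackend` is a hypothetical helper invented for this illustration, and the sketch does not include or depend on alpaka.

```cpp
#include <cassert>
#include <string>

// Reports whether the serial CPU back-end macro was defined when this
// translation unit was compiled, e.g. via
//   -DALPAKA_ACC_CPU_B_SEQ_T_SEQ_ENABLED
std::string requestedBackend()
{
#if defined(ALPAKA_ACC_CPU_B_SEQ_T_SEQ_ENABLED)
    return "serial";
#else
    return "none";
#endif
}
```

Compiled without the flag, the function returns "none"; with the flag on the command line it returns "serial".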

Introduction

For a quick introduction, feel free to play back the recording of our presentation at GTC 2016:

E. Zenker, R. Widera, G. Juckeland et al., Porting the Plasma Simulation PIConGPU to Heterogeneous Architectures with Alpaka, video link (39 min), slides (PDF), DOI:10.5281/zenodo.6336086

Citing alpaka

Currently all authors of alpaka are scientists or connected with research. For us to justify the importance and impact of our work, please consider citing us accordingly in your derived work and publications:

% Peer-Reviewed Publication %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Peer reviewed and accepted publication in
%   "2nd International Workshop on Performance Portable
%    Programming Models for Accelerators (P^3MA)"
% colocated with the
%   "2017 ISC High Performance Conference"
%   in Frankfurt, Germany
@inproceedings{MathesP3MA2017,
  author    = {{Matthes}, A. and {Widera}, R. and {Zenker}, E. and {Worpitz}, B. and
               {Huebl}, A. and {Bussmann}, M.},
  title     = {Tuning and optimization for a variety of many-core architectures without changing a single line of implementation code
               using the Alpaka library},
  archivePrefix = "arXiv",
  eprint    = {1706.10086},
  keywords  = {Computer Science - Distributed, Parallel, and Cluster Computing},
  day       = {30},
  month     = {Jun},
  year      = {2017},
  url       = {https://arxiv.org/abs/1706.10086},
}

% Peer-Reviewed Publication %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Peer reviewed and accepted publication in
%   "The Sixth International Workshop on
%    Accelerators and Hybrid Exascale Systems (AsHES)"
% at the
%   "30th IEEE International Parallel and Distributed
%    Processing Symposium" in Chicago, IL, USA
@inproceedings{ZenkerAsHES2016,
  author    = {Erik Zenker and Benjamin Worpitz and Ren{\'{e}} Widera
               and Axel Huebl and Guido Juckeland and
               Andreas Kn{\"{u}}pfer and Wolfgang E. Nagel and Michael Bussmann},
  title     = {Alpaka - An Abstraction Library for Parallel Kernel Acceleration},
  archivePrefix = "arXiv",
  eprint    = {1602.08477},
  keywords  = {Computer science;CUDA;Mathematical Software;nVidia;OpenMP;Package;
               performance portability;Portability;Tesla K20;Tesla K80},
  day       = {23},
  month     = {May},
  year      = {2016},
  publisher = {IEEE Computer Society},
  url       = {http://arxiv.org/abs/1602.08477},
}

% Original Work: Benjamin Worpitz' Master Thesis %%%%%%%%%%%
@MasterThesis{Worpitz2015,
  author = {Benjamin Worpitz},
  title  = {Investigating performance portability of a highly scalable
            particle-in-cell simulation code on various multi-core
            architectures},
  school = {{Technische Universit{\"{a}}t Dresden}},
  month  = {Sep},
  year   = {2015},
  type   = {Master Thesis},
  doi    = {10.5281/zenodo.49768},
  url    = {http://dx.doi.org/10.5281/zenodo.49768}
}

Contributing

Rules for contributions can be found in CONTRIBUTING.md. Any pull request will be reviewed by a maintainer.

Thanks to all active and former contributors.

Footnotes

¹ Due to an LLVM bug in debug mode, only release builds are supported.
² Currently, the unit tests are compiled but not executed.
