C++ wrappers for SIMD intrinsics and parallelized, optimized mathematical functions (SSE, AVX, AVX512, NEON, SVE)
C++ wrappers for SIMD intrinsics
SIMD (Single Instruction, Multiple Data) is a feature of microprocessors that has been available for many years. SIMD instructions perform a single operation on a batch of values at once, and thus provide a way to significantly accelerate code execution. However, these instructions differ between microprocessor vendors and compilers.
xsimd provides a unified means for using these features for library authors. Namely, it enables manipulation of batches of numbers with the same arithmetic operators as for single values. It also provides accelerated implementations of common mathematical functions operating on batches.
Beyond Xtensor, Xsimd has been adopted by major open-source projects, such as Mozilla Firefox, Apache Arrow, Pythran, and Krita.
The xsimd project started with a series of blog articles by Johan Mabille on how to implement wrappers for SIMD intrinsics. The archives of the blog can be found here: The C++ Scientist. The design described in the articles remained close to the actual architecture of xsimd up until version 8.0.
The mathematical functions are a lightweight implementation of the algorithms originally implemented in the now deprecated boost.SIMD project.
xsimd requires a C++11 compliant compiler. The following C++ compilers are supported:
| Compiler | Version |
|---|---|
| Microsoft Visual Studio | MSVC 2015 update 2 and above |
| g++ | 4.9 and above |
| clang | 4.0 and above |
The following SIMD instruction set extensions are supported:
| Architecture | Instruction set extensions |
|---|---|
| x86 | SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2, FMA3+SSE, FMA3+AVX, FMA3+AVX2 |
| x86 | AVX512BW, AVX512CD, AVX512DQ, AVX512F (gcc7 and higher) |
| x86 AMD | FMA4 |
| ARM | NEON, NEON64, SVE128/256/512 (fixed vector size) |
| WebAssembly | WASM |
| RISC-V | RISC-V128/256/512 (fixed vector size) |
A package for xsimd is available on the mamba (or conda) package manager.
```
mamba install -c conda-forge xsimd
```
A package for xsimd is available on the Spack package manager.
```
spack install xsimd
spack load xsimd
```
You can directly install it from the sources with cmake:

```
cmake -DCMAKE_INSTALL_PREFIX=your_install_prefix
make install
```
To get started with using xsimd, check out the full documentation.
xsimd has an optional dependency on the xtl library:
| xsimd | xtl (optional) |
|---|---|
| master | ^0.7.0 |
| 12.x | ^0.7.0 |
| 11.x | ^0.7.0 |
| 10.x | ^0.7.0 |
| 9.x | ^0.7.0 |
| 8.x | ^0.7.0 |
The dependency on xtl is required if you want to support vectorization for xtl::xcomplex. In this case, you must build your project with C++14 support enabled.
Version 8 of the library is a complete rewrite, and there are some slight differences with the 7.x versions. A migration guide will be available soon. In the meantime, the following examples show how to use both versions 7 and 8 of the library.
Here is an example that computes the mean of two sets of 4 double floating point values, assuming the AVX extension is supported:
```cpp
#include <iostream>
#include "xsimd/xsimd.hpp"

namespace xs = xsimd;

int main(int argc, char* argv[])
{
    xs::batch<double, xs::avx2> a = {1.5, 2.5, 3.5, 4.5};
    xs::batch<double, xs::avx2> b = {2.5, 3.5, 4.5, 5.5};
    auto mean = (a + b) / 2;
    std::cout << mean << std::endl;
    return 0;
}
```
Do not forget to enable the AVX extension when building the example. With gcc or clang this is done with the `-mavx` flag; on MSVC you have to pass the `/arch:AVX` option.
This example outputs:
(2.0,3.0,4.0,5.0)
The same computation operating on vectors and using the most performant instruction set available:
```cpp
#include <cstddef>
#include <vector>
#include "xsimd/xsimd.hpp"

namespace xs = xsimd;
using vector_type = std::vector<double, xsimd::aligned_allocator<double>>;

void mean(const vector_type& a, const vector_type& b, vector_type& res)
{
    std::size_t size = a.size();
    constexpr std::size_t simd_size = xsimd::simd_type<double>::size;
    std::size_t vec_size = size - size % simd_size;

    for (std::size_t i = 0; i < vec_size; i += simd_size)
    {
        auto ba = xs::load_aligned(&a[i]);
        auto bb = xs::load_aligned(&b[i]);
        auto bres = (ba + bb) / 2.;
        bres.store_aligned(&res[i]);
    }
    for (std::size_t i = vec_size; i < size; ++i)
    {
        res[i] = (a[i] + b[i]) / 2.;
    }
}
```
Building the tests requires cmake. cmake is available as a package for most Linux distributions. It can also be installed with the conda package manager (even on Windows):

```
conda install -c conda-forge cmake
```
Once cmake is installed, you can build and run the tests:

```
mkdir build
cd build
cmake ../ -DBUILD_TESTS=ON
make xtest
```
In the context of continuous integration with Travis CI, tests are run in a conda environment, which can be activated with

```
cd test
conda env create -f ./test-environment.yml
source activate test-xsimd
cd ..
cmake . -DBUILD_TESTS=ON
make xtest
```
xsimd's documentation is built with three tools: doxygen, sphinx, and breathe. While doxygen must be installed separately, you can install breathe by typing
```
pip install breathe
```
Breathe can also be installed with conda:

```
conda install -c conda-forge breathe
```
Finally, build the documentation with

```
make html
```

from the `docs` subdirectory.
We use a shared copyright model that enables all contributors to maintain the copyright on their contributions.
This software is licensed under the BSD-3-Clause license. See the LICENSE file for details.