Uh oh!
There was an error while loading.Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork69
C++ template library for high performance SIMD based sorting algorithms
License
numpy/x86-simd-sort
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
C++ template library for high performance SIMD based sorting routines forbuilt-in integers and floats (16-bit, 32-bit and 64-bit data types) and customdefined C++ objects. The sorting routines are accelerated using AVX-512/AVX2when available. The library auto picks the best version depending on theprocessor it is run on. If you are looking for the AVX-512 or AVX2 specificimplementations, please seeREADME fileundersrc/
directory. The following routines are currently supported:
template<typename T,typename U,typename Func>voidx86simdsort::object_qsort(T *arr, U arrsize, Func key_func)
T
is any user defined struct or class andarr
is a pointer to the firstelement in the array of objects of typeT
. Thearrsize
parameter can be any32-bit or 64-bit integer type.Func
is a lambda function that computes thekey
value for each object which is the metric used to sort the objects.Func
needs to have the following signature:
[] (T obj) ->key_t {key_t key;/* compute key for obj*/return key; }
Note that the return type of the keykey_t
needs to be one of the following :[float, uint32_t, int32_t, double, uint64_t, int64_t]
.object_qsort
has aspace complexity ofO(N)
. Specifically, it requiresarrsize * sizeof(key_t)
bytes to store a vector with all the keys and an additionalarrsize * sizeof(uint32_t)
bytes to store the indexes of the object array. Forperformance reasons, we recommend usingobject_qsort
when the array sizeis less than or equal toUINT32_MAX
. An example usage ofobject_qsort
isprovided in theexamplessection. Refer tosection to get a sense ofhow fast this is relative tostd::sort
.
voidx86simdsort::qsort(T* arr,size_t size,bool hasnan,bool descending);voidx86simdsort::qselect(T* arr,size_t k,size_t size,bool hasnan,bool descending);voidx86simdsort::partial_qsort(T* arr,size_t k,size_t size,bool hasnan,bool descending);
Supported datatypes:T
[_Float16, uint16_t, int16_t, float, uint32_t, int32_t, double, uint64_t, int64_t]
voidx86simdsort::keyvalue_qsort(T1* key, T2* val,size_t size,bool hasnan,bool descending);voidx86simdsort::keyvalue_select(T1* key, T2* val,size_t k,size_t size,bool hasnan,bool descending);voidx86simdsort::keyvalue_partial_sort(T1* key, T2* val,size_t k,size_t size,bool hasnan,bool descending);
Supported datatypes:T1
,T2
[float, uint32_t, int32_t, double, uint64_t, int64_t]
Note that keyvalue sort is not yet supported for 16-bitdata types.
std::vector<size_t> arg = x86simdsort::argsort(T* arr,size_t size,bool hasnan,bool descending);std::vector<size_t> arg = x86simdsort::argselect(T* arr,size_t k,size_t size,bool hasnan);
Supported datatypes:T
[_Float16, uint16_t, int16_t, float, uint32_t, int32_t, double, uint64_t, int64_t]
Note that argsort and argselect are not accelerated with SIMD when using 16-bitdata types.
meson is the used build system. Commandto build and install the library:
meson setup --buildtype release builddir && cd builddirmeson compilesudo meson install
Once installed, you can usepkg-config --cflags --libs x86simdsortcpp
topopulate the right cflags and ldflags to compile and link your C++ program.This repository also contains a test suite and benchmarking suite which arewritten usinggoogletest andgooglebenchmark (>= v1.9.2) frameworksrespectively. You can configure meson to build them both by using-Dbuild_tests=true
and-Dbuild_benchmarks=true
.
qsort
,argsort
, andkeyvalue_qsort
can achieve even greater performance(up-to 3x speedup) through parallelization withOpenMP. By default, OpenMP support is disabled; toenable it, set the-Duse_openmp=true
flag when configuring Meson. If you areusing only the static SIMD implementations, compile with-fopenmp -DXSS_USE_OPENMP
.
OpenMP-based parallel sorting routines are used for arrays larger than aspecific threshold where threading makes sense. The number of threads islimited to a maximum of 16. You can control the number of threads by settingtheOMP_NUM_THREADS
environment variable.
If you would like to use this as a Meson subproject, then createsubprojects
directory and copyx86-simd-sort
into it. Add these two linesin your meson.build.
xss = subproject('x86-simd-sort')xss_dep = xss.get_variable('x86simdsortcpp_dep')
For more detailed instructions please refer to Mesondocumentation.
#include"x86simdsort.h"intmain() { std::vector<float> arr{1000};x86simdsort::qsort(arr.data(),1000,true);return0;}
#include"x86simdsort.h"#include<cmath>structPoint {double x, y, z;};intmain() { std::vector<Point> arr{1000};// Sort an array of Points by its x value:x86simdsort::object_qsort(arr.data(),1000, [](Point p) {return p.x; });// Sort an array of Points by its distance from origin:x86simdsort::object_qsort(arr.data(),1000, [](Point p) {returnsqrt(p.x*p.x+p.y*p.y+p.z*p.z); });return0;}
x86simdsort::qsort
is equivalent toqsort
inCorstd::sort
inC++.x86simdsort::qselect
is equivalent tostd::nth_element
inC++ ornp.partition
inNumPy.x86simdsort::partial_qsort
is equivalent tostd::partial_sort
inC++.x86simdsort::argsort
is equivalent tonp.argsort
inNumPy.x86simdsort::argselect
is equivalent tonp.argpartition
inNumPy.
Supported datatypes:uint16_t, int16_t, _Float16, uint32_t, int32_t, float, uint64_t, int64_t, double
. Note that_Float16
will require building thislibrary with g++ >= 12.x. All the functions have an optional argumentbool hasnan
set tofalse
by default (these are relevant to floating point datatypes only). If your array has NAN's, the the behaviour of the sorting routineis undefined. Ifhasnan
is set to true, NAN's are always sorted to the end ofthe array. In addition to that, qsort will replace all your NAN's withstd::numeric_limits<T>::quiet_NaN
. The original bit-exact NaNs inthe input are not preserved. Also note that the arg methods (argsort andargselect) will not use the SIMD based algorithms if they detect NAN's in thearray. You can read details of all the implementationshere.
Performance ofobject_qsort
can vary significantly depending on the defintionof the custom class and we highly recommend benchmarking before using it. Forthe sake of illustration, we provide a few examples in./benchmarks/bench-objsort.hpp which measuresperformance ofobject_qsort
relative tostd::sort
when sorting an array of3D points represented by the class:struct Point {double x, y, z;}
andstruct Point {float x, y, x;}
. We sort these points based on severaldifferent metrics:
- sort by coordinate
x
- sort by manhanttan distance (relative to origin):
abs(x) + abx(y) + abs(z)
- sort by Euclidean distance (relative to origin):
sqrt(x*x + y*y + z*z)
- sort by Chebyshev distance (relative to origin):
max(abs(x), abs(y), abs(z))
The performance data (shown in the plot below) can be collected by building thebenchmarks suite and running./builddir/benchexe --benchmark_filter==*obj*
.The data plot shown below was collected on a processor with AVX-512. For thesimplest of cases where we want to sort an array of struct by one of itsmembers,object_qsort
can be up-to 5x faster for 32-bit data type and about4x for 64-bit data type. It tends to do even better when the metric to sort bygets more complicated. Sorting by Euclidean distance can be up-to 10x faster.
- NumPy uses this as asubmodule to accelerate
np.sort, np.argsort, np.partition and np.argpartition
. - PyTorch uses this as asubmodule to accelerate
torch.sort, torch.argsort
. - A slightly modifed version this library has been integrated intoopenJDK.
- GRAPE: C++ library for parallel graph processing.
- AVX-512 version of the key-value sort has been submitted toOceanbase.
About
C++ template library for high performance SIMD based sorting algorithms
Topics
Resources
License
Code of conduct
Contributing
Security policy
Uh oh!
There was an error while loading.Please reload this page.
Stars
Watchers
Forks
Sponsor this project
Uh oh!
There was an error while loading.Please reload this page.
Packages0
Uh oh!
There was an error while loading.Please reload this page.