[ARCHIVED] The C++ parallel algorithms library. See https://github.com/NVIDIA/cccl
Thrust is the C++ parallel algorithms library which inspired the introduction of parallel algorithms to the C++ Standard Library. Thrust's high-level interface greatly enhances programmer productivity while enabling performance portability between GPUs and multicore CPUs. It builds on top of established parallel programming frameworks (such as CUDA, TBB, and OpenMP). It also provides a number of general-purpose facilities similar to those found in the C++ Standard Library.
Thrust is an open source project; it is available on GitHub and included in the NVIDIA HPC SDK and the CUDA Toolkit. If you have one of those SDKs installed, no additional installation or compiler flags are needed to use Thrust.
Thrust is best learned through examples.
The following example generates random numbers serially and then transfers themto a parallel device where they are sorted.
```cpp
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/generate.h>
#include <thrust/sort.h>
#include <thrust/copy.h>
#include <thrust/random.h>

int main() {
  // Generate 32M random numbers serially.
  thrust::default_random_engine rng(1337);
  thrust::uniform_int_distribution<int> dist;
  thrust::host_vector<int> h_vec(32 << 20);
  thrust::generate(h_vec.begin(), h_vec.end(), [&] { return dist(rng); });

  // Transfer data to the device.
  thrust::device_vector<int> d_vec = h_vec;

  // Sort data on the device.
  thrust::sort(d_vec.begin(), d_vec.end());

  // Transfer data back to host.
  thrust::copy(d_vec.begin(), d_vec.end(), h_vec.begin());
}
```
This example demonstrates computing the sum of some random numbers in parallel:
```cpp
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/generate.h>
#include <thrust/reduce.h>
#include <thrust/functional.h>
#include <thrust/random.h>

int main() {
  // Generate random data serially.
  thrust::default_random_engine rng(1337);
  thrust::uniform_real_distribution<double> dist(-50.0, 50.0);
  thrust::host_vector<double> h_vec(32 << 20);
  thrust::generate(h_vec.begin(), h_vec.end(), [&] { return dist(rng); });

  // Transfer to device and compute the sum.
  thrust::device_vector<double> d_vec = h_vec;
  double x = thrust::reduce(d_vec.begin(), d_vec.end(),
                            0.0, thrust::plus<double>());
}
```
This example shows how to perform such a reduction asynchronously:
```cpp
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/generate.h>
#include <thrust/async/copy.h>
#include <thrust/async/reduce.h>
#include <thrust/functional.h>
#include <thrust/random.h>
#include <numeric>

int main() {
  // Generate 32M random numbers serially.
  thrust::default_random_engine rng(123456);
  thrust::uniform_real_distribution<double> dist(-50.0, 50.0);
  thrust::host_vector<double> h_vec(32 << 20);
  thrust::generate(h_vec.begin(), h_vec.end(), [&] { return dist(rng); });

  // Asynchronously transfer to the device.
  thrust::device_vector<double> d_vec(h_vec.size());
  thrust::device_event e = thrust::async::copy(h_vec.begin(), h_vec.end(),
                                               d_vec.begin());

  // After the transfer completes, asynchronously compute the sum on the device.
  thrust::device_future<double> f0 = thrust::async::reduce(
      thrust::device.after(e), d_vec.begin(), d_vec.end(),
      0.0, thrust::plus<double>());

  // While the sum is being computed on the device, compute the sum serially on
  // the host.
  double f1 = std::accumulate(h_vec.begin(), h_vec.end(), 0.0,
                              thrust::plus<double>());
}
```
Thrust is a header-only library; there is no need to build or install the project unless you want to run the Thrust unit tests.
The CUDA Toolkit provides a recent release of the Thrust source code in `include/thrust`. This will be suitable for most users.
Users that wish to contribute to Thrust or try out newer features should recursively clone the Thrust GitHub repository:

```shell
git clone --recursive https://github.com/NVIDIA/thrust.git
```
For CMake-based projects, we provide a CMake package for use with `find_package`. See the CMake README for more information. Thrust can also be added via `add_subdirectory` or tools like the CMake Package Manager.
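As a rough sketch of the `find_package` route (`my_app` and `main.cpp` are hypothetical names; `thrust_create_target` is the helper provided by Thrust's CMake package, which by default selects the CPP host system and CUDA device system):

```cmake
cmake_minimum_required(VERSION 3.15)
project(my_app LANGUAGES CXX)

# Locate the Thrust package shipped with the CUDA Toolkit or a cloned repo.
find_package(Thrust REQUIRED CONFIG)

# Create an interface target using the default host/device systems.
thrust_create_target(Thrust)

add_executable(my_app main.cpp)
target_link_libraries(my_app Thrust)
```

The host and device systems can also be passed as options to `thrust_create_target`; see the CMake README for the full set of knobs.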
For non-CMake projects, compile with:

- The Thrust include path (`-I<thrust repo root>`)
- The libcu++ include path (`-I<thrust repo root>/dependencies/libcudacxx/`)
- The CUB include path, if using the CUDA device system (`-I<thrust repo root>/dependencies/cub/`)

By default, the CPP host system and CUDA device system are used. These can be changed using compiler definitions:

- `-DTHRUST_HOST_SYSTEM=THRUST_HOST_SYSTEM_XXX`, where `XXX` is `CPP` (serial, default), `OMP` (OpenMP), or `TBB` (Intel TBB)
- `-DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_XXX`, where `XXX` is `CPP`, `OMP`, `TBB`, or `CUDA` (default)
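As an illustration, the flags above might be combined as follows (a sketch, not a tested command line; `<thrust repo root>` is the placeholder path from the list above, and `prog.cu`/`prog.cpp` are hypothetical source files):

```sh
# CUDA device system (default) -- compile with nvcc:
nvcc -I<thrust repo root> -I<thrust repo root>/dependencies/libcudacxx/ \
     -I<thrust repo root>/dependencies/cub/ prog.cu -o prog

# OpenMP device system -- an ordinary host compiler suffices:
g++ -I<thrust repo root> -I<thrust repo root>/dependencies/libcudacxx/ \
    -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_OMP -fopenmp prog.cpp -o prog
```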
Thrust uses the CMake build system to build unit tests, examples, and header tests. To build Thrust as a developer, it is recommended that you use our containerized development system:
```shell
# Clone Thrust and CUB repos recursively:
git clone --recursive https://github.com/NVIDIA/thrust.git
cd thrust

# Build and run tests and examples:
ci/local/build.bash
```
That does the equivalent of the following, but in a clean containerized environment which has all dependencies installed:
```shell
# Clone Thrust and CUB repos recursively:
git clone --recursive https://github.com/NVIDIA/thrust.git
cd thrust

# Create build directory:
mkdir build
cd build

# Configure -- use one of the following:
cmake ..    # Command line interface.
ccmake ..   # ncurses GUI (Linux only).
cmake-gui   # Graphical UI, set source/build directories in the app.

# Build:
cmake --build . -j ${NUM_JOBS}  # Invokes make (or ninja, etc).

# Run tests and examples:
ctest
```
By default, a serial `CPP` host system, `CUDA` accelerated device system, and C++14 standard are used. This can be changed in CMake and via flags to `ci/local/build.bash`.
More information on configuring your Thrust build and creating a pull request can be found in the contributing section.
Thrust is an open source project developed on GitHub. Thrust is distributed under the Apache License v2.0 with LLVM Exceptions; some parts are distributed under the Apache License v2.0 and the Boost License v1.0.