gkrls/cuIdx

A single header CUDA C++ library to assist with linear indexing

License

MIT license

cuIdx

cuIdx is a single-header CUDA/C++ library that aims to assist with indexing in CUDA kernels. In particular, it provides functions to retrieve:

The 1D local index of a thread (within its block)
The 1D global index of a thread
The 1D local warp index of a thread (within its block)
The 1D global warp index of a thread
The index of a thread within its warp (lane)
The 1D block size of a thread block

for any number of grid and block dimensions.
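The quantities listed above all come down to linearizing a multi-dimensional index with x varying fastest. The following is a minimal host-side C++ sketch of that arithmetic, not cuIdx's actual implementation. `Dim3`, `local_tid`, `block_size`, `linear_bid` and `global_tid` are hypothetical names chosen for illustration; in a real kernel the inputs would be CUDA's built-in `threadIdx`, `blockDim`, `blockIdx` and `gridDim`.

```cpp
#include <cassert>

// Hypothetical host-side stand-in for CUDA's dim3/uint3 (illustration only).
struct Dim3 { unsigned x, y, z; };

// Linear index of a thread inside its block; x varies fastest.
constexpr unsigned local_tid(Dim3 threadIdx, Dim3 blockDim) {
  return threadIdx.z * blockDim.y * blockDim.x
       + threadIdx.y * blockDim.x
       + threadIdx.x;
}

// Number of threads in one block.
constexpr unsigned block_size(Dim3 blockDim) {
  return blockDim.x * blockDim.y * blockDim.z;
}

// Linear index of a block inside the grid; x varies fastest.
constexpr unsigned linear_bid(Dim3 blockIdx, Dim3 gridDim) {
  return blockIdx.z * gridDim.y * gridDim.x
       + blockIdx.y * gridDim.x
       + blockIdx.x;
}

// Linear index of a thread across the whole grid.
constexpr unsigned global_tid(Dim3 threadIdx, Dim3 blockDim,
                              Dim3 blockIdx, Dim3 gridDim) {
  return linear_bid(blockIdx, gridDim) * block_size(blockDim)
       + local_tid(threadIdx, blockDim);
}

// Worked example: 4x2x1 blocks in a 3x1x1 grid.
inline void example_checks() {
  assert(local_tid(Dim3{1, 1, 0}, Dim3{4, 2, 1}) == 5);   // 1*4 + 1
  assert(block_size(Dim3{4, 2, 1}) == 8);
  assert(global_tid(Dim3{1, 1, 0}, Dim3{4, 2, 1},
                    Dim3{2, 0, 0}, Dim3{3, 1, 1}) == 21); // 2*8 + 5
}
```

Unused dimensions have extent 1 and index 0, so their terms vanish and the same formula covers 1D, 2D and 3D launches.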

All library functions are meant to be called from within CUDA kernels. (Almost) all functions accept template arguments that denote the number of grid/block dimensions specified at kernel launch. Non-template versions also exist; they cover any number of grid/block dimensions at the expense of a slightly more costly computation, because they assume 3 dimensions to cover all cases.
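To illustrate why a dimension-count template argument can make the computation cheaper, here is a host-side C++ sketch under stated assumptions. It is not taken from cuIdx: `Idx3` and `local_tid_nd` are hypothetical names, and the library may implement the idea differently. Because the dimension count is a compile-time constant, the compiler can drop the terms for unused dimensions.

```cpp
#include <cassert>

// Hypothetical host-side stand-in for CUDA's dim3/uint3 (illustration only).
struct Idx3 { unsigned x, y, z; };

// Local thread index for a block with BlockDims dimensions (1, 2 or 3).
// The conditions are compile-time constants, so terms for unused
// dimensions can be removed by the compiler. The default of 3 mirrors
// the "assume 3 dimensions" fallback described above.
template <int BlockDims = 3>
constexpr unsigned local_tid_nd(Idx3 t, Idx3 b) {
  static_assert(BlockDims >= 1 && BlockDims <= 3, "BlockDims must be 1, 2 or 3");
  return (BlockDims >= 3 ? t.z * b.y * b.x : 0u)
       + (BlockDims >= 2 ? t.y * b.x : 0u)
       + t.x;
}

inline void example_checks_nd() {
  assert(local_tid_nd<1>(Idx3{7, 0, 0}, Idx3{32, 1, 1}) == 7);  // only x is read
  assert(local_tid_nd<2>(Idx3{3, 2, 0}, Idx3{8, 4, 1}) == 19);  // 2*8 + 3
  assert(local_tid_nd<>(Idx3{3, 2, 1}, Idx3{8, 4, 2}) == 51);   // 1*4*8 + 2*8 + 3
}
```

The generic 3D version returns the same value as the specialized one whenever the unused dimensions have index 0 and extent 1; it just performs the extra multiplications and additions.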

cuidx.cuh should be compiled with NVCC. Minimum supported CUDA version: 2.1

Documentation

Todo

Example Usage

Retrieve global thread index

    #include "cuidx.cuh"

    using namespace cuidx;

    __global__ void vecadd(int *A, int *B, int *C, unsigned len) {
      // auto pos = blockIdx.x * blockDim.x + threadIdx.x;    // vanilla cuda, 1D grid/blocks

      // auto pos = blockIdx.x * blockDim.x * blockDim.y
      //            + threadIdx.y * blockDim.x + threadIdx.x; // vanilla cuda, 1D grid, 2D blocks

      // auto pos = blockIdx.x * blockDim.x * blockDim.y * blockDim.z
      //            + threadIdx.z * blockDim.y * blockDim.x
      //            + threadIdx.y * blockDim.x
      //            + threadIdx.x;                            // vanilla cuda, 1D grid, 3D blocks

      auto pos = gtid();                                      // <- cuidx

      if (pos < len)
        C[pos] = A[pos] + B[pos];
    }

Retrieve global thread index (faster)

    #include "cuidx.cuh"

    using namespace cuidx;

    __global__ void vecadd(int *A, int *B, int *C, unsigned len) {
      // auto pos = gtid();     // <- cuidx, assumes 3D grid/blocks
      auto pos = gtid<1,1>();   // <- cuidx, for 1D grid/blocks (faster)

      if (pos < len)
        C[pos] = A[pos] + B[pos];
    }

Warp level

    #include "cuidx.cuh"

    using namespace cuidx;

    // BLOCK_SIZE and WARPSIZE are assumed to be compile-time constants defined elsewhere.
    __global__ void kernel(int *A, int *B, int *C, unsigned len) {
      // __shared__ variables cannot have an initializer in their declaration
      __shared__ int shmem[BLOCK_SIZE / WARPSIZE];
      int warp_sum = 0;
      ...
      __syncwarp();

      // vanilla cuda
      auto tid = threadIdx.y * blockDim.x + threadIdx.x; // get tid in 2D block
      if (tid % 32 == 0)                                 // check laneid == 0 (leader)
        shmem[tid / 32] = warp_sum;                      // write at warp index

      // cuidx
      if (wleader())
        shmem[wid()] = warp_sum;
    }
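The warp-level quantities used in the example above follow from the linear thread index within the block. The sketch below shows that arithmetic on the host, assuming a warp size of 32 (the value on current NVIDIA GPUs). The names `kWarpSize`, `lane_of`, `warp_of`, `is_warp_leader` and `warps_per_block` are hypothetical and are not cuIdx functions.

```cpp
#include <cassert>

// Assumed warp size; a kernel can read the built-in warpSize instead.
constexpr unsigned kWarpSize = 32;

// Lane: position of a thread inside its warp, given its linear index in the block.
constexpr unsigned lane_of(unsigned tid) { return tid % kWarpSize; }

// Local warp index: which warp of the block the thread belongs to.
constexpr unsigned warp_of(unsigned tid) { return tid / kWarpSize; }

// The warp leader is the thread in lane 0.
constexpr bool is_warp_leader(unsigned tid) { return lane_of(tid) == 0; }

// Number of warps needed for a block, rounding up for a partial last warp.
constexpr unsigned warps_per_block(unsigned block_size) {
  return (block_size + kWarpSize - 1) / kWarpSize;
}

inline void example_checks_warp() {
  assert(lane_of(70) == 6);          // 70 = 2*32 + 6
  assert(warp_of(70) == 2);
  assert(is_warp_leader(64));        // first thread of warp 2
  assert(!is_warp_leader(70));
  assert(warps_per_block(100) == 4); // 3 full warps plus 1 partial
}
```

For multi-dimensional blocks, the linear index passed in must be computed first (x fastest, then y, then z), since warps are formed from consecutive linear thread indices.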
