KLAP is a source-to-source compiler that optimizes CUDA code which uses dynamic parallelism to implement applications with nested parallelism. KLAP aggregates dynamic launches across warps, blocks, and grids to reduce the total number of grid launches and increase their granularity.
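As an illustration of the kind of code the description above refers to, here is a minimal sketch of nested parallelism written with CUDA dynamic parallelism. It is not taken from the KLAP sources or benchmarks: the kernel names (`process_rows`, `scale_row`), the data layout, and the workload are hypothetical. Each parent thread launches its own small child grid, which is the pattern of many fine-grained launches that launch aggregation aims to reduce.

```cuda
// Hypothetical example; assumed build line: nvcc -rdc=true example.cu -o example -lcudadevrt
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Child grid: doubles the elements of one row (illustrative workload).
__global__ void scale_row(const int* in, int* out, int start, int count) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < count) out[start + i] = 2 * in[start + i];
}

// Parent grid: one thread per row. Every thread that has work launches
// its own child grid, so the number of launches grows with the row count.
__global__ void process_rows(const int* row_offsets, const int* in, int* out, int num_rows) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= num_rows) return;
    int start = row_offsets[row];
    int count = row_offsets[row + 1] - start;
    if (count > 0) {
        const int threads = 64;
        scale_row<<<(count + threads - 1) / threads, threads>>>(in, out, start, count);
    }
}

int main() {
    // Four rows of irregular length (3, 0, 5, 2) in CSR-style offsets.
    std::vector<int> offsets = {0, 3, 3, 8, 10};
    const int num_rows = 4, total = 10;
    std::vector<int> in(total), out(total, 0);
    for (int i = 0; i < total; ++i) in[i] = i + 1;

    int *d_offsets, *d_in, *d_out;
    cudaMalloc(&d_offsets, offsets.size() * sizeof(int));
    cudaMalloc(&d_in, total * sizeof(int));
    cudaMalloc(&d_out, total * sizeof(int));
    cudaMemcpy(d_offsets, offsets.data(), offsets.size() * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_in, in.data(), total * sizeof(int), cudaMemcpyHostToDevice);

    process_rows<<<1, 32>>>(d_offsets, d_in, d_out, num_rows);
    // The parent grid is not complete until all of its child grids finish.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    cudaMemcpy(out.data(), d_out, total * sizeof(int), cudaMemcpyDeviceToHost);

    // Verify on the host that every element was doubled.
    int mismatches = 0;
    for (int i = 0; i < total; ++i) mismatches += (out[i] != 2 * in[i]);
    std::printf("mismatches: %d\n", mismatches);

    cudaFree(d_offsets);
    cudaFree(d_in);
    cudaFree(d_out);
    return mismatches != 0;
}
```

In this sketch, three of the four parent threads each launch a child grid of only a few threads. Whether KLAP accepts this exact code is not stated here; the build and usage instructions referenced below are the authoritative source.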
Refer to src for instructions on how to build the compiler.
Refer to include for instructions on how to set up the runtime.
Refer to test for instructions on how to run the benchmarks.
Please cite the following paper if you find this work useful:
- I. El Hajj, J. Gómez-Luna, C. Li, L.-W. Chang, D. Milojicic, W.-M. Hwu. KLAP: Kernel Launch Aggregation and Promotion for Optimizing Dynamic Parallelism. In Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2016.