CUDA Dynamic Memory Allocator for SOA Data Layout


DynaSOAr: A CUDA Framework for Single-Method Multiple-Objects Applications

SMMO (Single-Method Multiple-Objects) is a widespread pattern in parallel, object-oriented, high-performance code. It is OOP-speak for SIMD (Single-Instruction Multiple-Data) and means that a method should be executed for all objects of a type.

DynaSOAr comes with a parallel, lock-free, dynamic memory allocator that lets programmers create/delete objects in device code. In contrast to other allocators, this allocator is an object allocator for structured data. While other allocators allocate X number of bytes, this allocator can only allocate objects of C++ classes/structs that were defined within DynaSOAr. This allows us to apply additional data layout optimizations.
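To give a feel for what "lock-free allocation of fixed-size object slots" can look like, here is a minimal host-side C++ sketch. It is not DynaSOAr's implementation: the `SlotBitmap` class, its 64-slot capacity, and its method names are hypothetical and chosen only for illustration. Each bit marks one object slot, and a slot is claimed with a compare-and-swap loop instead of a lock.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical 64-slot pool (illustration only, not part of DynaSOAr).
// Bit i of bits_ is 1 while slot i is allocated.
class SlotBitmap {
 public:
  // Claims a free slot and returns its index, or -1 if all 64 slots are taken.
  int allocate() {
    std::uint64_t old_bits = bits_.load();
    while (old_bits != ~std::uint64_t{0}) {
      const int slot = first_zero(old_bits);
      const std::uint64_t new_bits = old_bits | (std::uint64_t{1} << slot);
      // If another thread changed bits_ in the meantime, the exchange fails,
      // old_bits is refreshed with the current value, and we retry.
      if (bits_.compare_exchange_weak(old_bits, new_bits)) return slot;
    }
    return -1;
  }

  // Releases a slot that was previously returned by allocate().
  void deallocate(int slot) {
    bits_.fetch_and(~(std::uint64_t{1} << slot));
  }

 private:
  // Index of the lowest zero bit. Only called when at least one bit is zero.
  static int first_zero(std::uint64_t bits) {
    int i = 0;
    while (bits & (std::uint64_t{1} << i)) ++i;
    return i;
  }

  std::atomic<std::uint64_t> bits_{0};
};
```

Because every slot has the same size and type, the allocator only has to track which slots are occupied. A byte-granular allocator cannot make that simplification.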

As an example, an n-body simulation consists of n body objects, for each of which a move method for computing the next position of a body should be executed. DynaSOAr is a CUDA framework (C++ template library) that facilitates the development of such programs. The four main features of DynaSOAr are:

SOA Data Layout: Objects are stored in the SIMD-friendly Structure of Arrays data layout. Other layouts may be supported in the future.
Dynamic Memory Management on Device: New objects can be created at any time in the CUDA kernel and existing objects can be deleted (new/delete).
Parallel Enumeration: DynaSOAr provides an efficient way to run a member function (method) for all objects of a type in parallel.
Memory Defragmentation: Can lower overall memory usage and speed up application code if the object space becomes too fragmented.
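The difference between the usual Array of Structures (AOS) layout and the SOA layout can be sketched in plain C++ as follows. This is a hand-written illustration of the n-body example, not code generated by or taken from DynaSOAr. The names `BodyAos`, `BodiesSoa`, `move` and `move_all` are assumptions made for this sketch, and the loop in `move_all` only stands in for what would be a parallel enumeration on the GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Array of Structures (AOS): all fields of one body are adjacent in memory.
struct BodyAos {
  float pos_x, pos_y;
  float vel_x, vel_y;
};

// Structure of Arrays (SOA): each field lives in its own contiguous array,
// so neighboring threads (or SIMD lanes) access neighboring addresses.
struct BodiesSoa {
  std::vector<float> pos_x, pos_y;
  std::vector<float> vel_x, vel_y;

  explicit BodiesSoa(std::size_t n)
      : pos_x(n, 0.0f), pos_y(n, 0.0f), vel_x(n, 0.0f), vel_y(n, 0.0f) {}

  std::size_t size() const { return pos_x.size(); }

  // The "move" method for the body stored at index i.
  void move(std::size_t i, float dt) {
    pos_x[i] += vel_x[i] * dt;
    pos_y[i] += vel_y[i] * dt;
  }

  // SMMO: run the same method for every object. On a GPU, each iteration
  // would be handled by a different thread instead of a sequential loop.
  void move_all(float dt) {
    for (std::size_t i = 0; i < size(); ++i) move(i, dt);
  }
};
```

With the SOA layout, body `i` is identified by its index rather than by the address of a struct. That is why a framework is helpful for keeping an object-oriented programming style on top of such a layout.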

Documentation/Papers

Matthias Springer. CompactGpu: Massively GPU Memory Compaction. ACM Student Research Competition (PLDI 2019).
Matthias Springer, Hidehiko Masuhara. Massively Parallel GPU Memory Compaction. ISMM 2019.
Matthias Springer, Hidehiko Masuhara. DynaSOAr: A Parallel Memory Allocator for Object-oriented Programming on GPUs with Efficient Memory Access. ECOOP 2019.
Matthias Springer. SoaAlloc: Accelerating Single-Method Multiple-Objects Applications on GPUs. ACM Student Research Competition (SPLASH 2018).
Matthias Springer, Yaozhu Sun, Hidehiko Masuhara. Inner Array Inlining for Structure of Arrays Layout. ARRAY 2018.
Matthias Springer, Hidehiko Masuhara. Ikra-Cpp: A C++/CUDA DSL for Object-oriented Programming with Structure-of-Arrays Data Layout. WPMVP 2018.
Matthias Springer. A C++/CUDA DSL for Object-oriented Programming with Structure-of-Arrays Data Layout. ACM Student Research Competition (CGO 2018).

Prerequisites

Tested with CUDA Toolkit versions 9.1 (gcc version 5.4.0) and 10.1 on an Nvidia Titan Xp machine (Ubuntu 16.04.1). A device with a minimum compute capability of 5.0 is required. libsdl2 is required for graphical visualizations in the example code. We provide build scripts for compiling the examples. See the Wiki for more information.

API Overview

All classes/structs that should be managed by DynaSOAr must inherit from AllocatorT::Base, where AllocatorT is the fully configured type of the allocator. The first template argument to SoaAllocator is the maximum number of objects that can exist within the allocator at any given time; this number determines the memory usage of the allocator. The following arguments are all classes/structs that are managed by DynaSOAr.

DynaSOAr has a host-side API (AllocatorHandle<AllocatorT>) and a device-side API (AllocatorT). These APIs provide the following functionality.

AllocatorHandle::AllocatorHandle(): This constructor allocates all necessary memory on GPU.
AllocatorHandle::device_pointer(): Returns a pointer to the device allocator handle (AllocatorT*).
AllocatorHandle::parallel_do<C, &C::foo>(): Runs a member function C::foo() in parallel for all objects of type C that were created with the allocator. This launches a CUDA kernel and returns once the kernel has finished processing all objects.
new(allocator) C(args...): Creates a new object of type C and returns a pointer to the new object. C must be a type that is managed by the allocator. allocator is a pointer to the device allocator.
destroy(allocator, ptr): Deletes an existing object ptr of type C that was created with the allocator. This is similar to C++ delete.
AllocatorT::device_do<C>(&C::foo, args...): Runs C::foo(args...) for all objects of type C that were created with the allocator. Note that this does not spawn a new CUDA kernel; execution is sequential.
AllocatorT::parallel_defrag<C>(): Runs a defragmentation pass on all objects of type C. This pass compacts objects and rewrites pointers to objects that were relocated.
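The idea behind a defragmentation pass (compact objects, then rewrite pointers to relocated objects) can be illustrated with a small sequential C++ sketch. This is only a conceptual model under simplifying assumptions: the actual pass in DynaSOAr runs in parallel on the GPU, and the `Heap` struct, the `defrag` function, and the use of slot indices as "pointers" are hypothetical names and choices made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int kNone = -1;  // "no reference" / "object was not moved"

// Hypothetical object space in SOA form (illustration only).
struct Heap {
  std::vector<bool> used;  // used[i]: slot i holds a live object
  std::vector<int> value;  // payload field of each object
  std::vector<int> next;   // reference to another object (slot index) or kNone
};

// Moves live objects from the tail into holes in the dense prefix and
// rewrites references to moved objects. Returns the number of live objects.
std::size_t defrag(Heap& h) {
  const std::size_t n = h.used.size();
  std::size_t live = 0;
  for (bool u : h.used) live += u ? 1 : 0;

  std::vector<int> new_slot(n, kNone);  // forwarding table: old slot -> new slot
  std::size_t dst = 0;
  for (std::size_t src = live; src < n; ++src) {
    if (!h.used[src]) continue;
    while (h.used[dst]) ++dst;  // next hole; one exists for every tail object
    h.value[dst] = h.value[src];
    h.next[dst] = h.next[src];
    h.used[dst] = true;
    h.used[src] = false;
    new_slot[src] = static_cast<int>(dst);
  }

  // Rewrite references that still point to a relocated object's old slot.
  for (std::size_t i = 0; i < live; ++i) {
    const int target = h.next[i];
    if (target != kNone && new_slot[static_cast<std::size_t>(target)] != kNone) {
      h.next[i] = new_slot[static_cast<std::size_t>(target)];
    }
  }
  return live;
}
```

After the pass, all live objects occupy slots `0` to `live - 1`, so the remaining slots form one contiguous free region and an enumeration over the objects touches a dense range of memory.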

API Example

This example does not compute anything meaningful and is only meant to show the API. Take a look at the DynaSOAr tutorial and at the code in the example directory for more interesting examples.

```cpp
#include "dynasoar.h"

// Pre-declare all classes.
class Foo;
class Bar;

// Declare allocator type. First argument is max. number of objects that can be created.
using AllocatorT = SoaAllocator<64*64*64*64, Foo, Bar>;

// Allocator handles.
__device__ AllocatorT* device_allocator;
AllocatorHandle<AllocatorT>* allocator_handle;

class Foo : public AllocatorT::Base {
 public:
  // Pre-declare types of all fields.
  declare_field_types(Foo, float, int, char)

  // Declare fields.
  SoaField<Foo, 0> field1_;  // float
  SoaField<Foo, 1> field2_;  // int
  SoaField<Foo, 2> field3_;  // char

  __device__ Foo(float f1, int f2, char f3) : field1_(f1), field2_(f2), field3_(f3) {}

  __device__ void qux() {
    field1_ = field2_ + field3_;
  }

  __device__ void baz() {
    // Run Bar::foo(42) sequentially for all objects of type Bar. Note that
    // Foo::baz may run in parallel though.
    device_allocator->template device_do<Bar>(&Bar::foo, 42);
  }
};

class Bar : public SoaBase<AllocatorT> { /* ... */ };

__global__ void create_objects() {
  Foo* object = new(device_allocator) Foo(1.23f, 4, 0);
  // Delete object: destroy(device_allocator, object);
}

int main(int argc, char** argv) {
  // Optional, for debugging.
  AllocatorT::DBG_print_stats();

  // Create new allocator.
  allocator_handle = new AllocatorHandle<AllocatorT>();
  AllocatorT* dev_ptr = allocator_handle->device_pointer();
  cudaMemcpyToSymbol(device_allocator, &dev_ptr, sizeof(AllocatorT*), 0,
                     cudaMemcpyHostToDevice);

  // Create 2048 objects.
  create_objects<<<32, 64>>>();
  cudaDeviceSynchronize();

  // Call Foo::qux on all 2048 objects.
  allocator_handle->parallel_do<Foo, &Foo::qux>();
}
```
