LouisJenkinsCS/Distributed-Data-StructuresPublic

NotificationsYou must be signed in to change notification settings
Fork2
Star15

[GSoC] Distributed Data Structures - Collections Framework for Chapel language

summerofcode.withgoogle.com/archive/2017/projects/6530769430249472/

License

BSD-3-Clause license

15 stars 2 forks Branches Tags Activity

Star

Notifications

You must be signed in to change notification settings

Branches Tags

Folders and files

Name		Name	Last commit message	Last commit date
Latest commit History 245 Commits
bin		bin
collections		collections
docs		docs
results		results
testing		testing
.gitignore		.gitignore
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md

Repository files navigation

Distributed Data Structures

This repository hosts the first framework for distributed data structures for theChapel programming language. Here we introduce a 'Collections' module, based on Java'sCollections framework, as well as two scalable data structures (one ordered, one unordered).Documentation can be seenhere.

Changelog for Chapel version 1.16 can be foundhere, slide 13 - 17 contains the highlights of the project.

GSoC Information

This project was made possible through the Google Summer of Code program, who provided me an amplestipend to live on, gave me the once-in-a-lifetime chance to design and develop new solutions in thearea of distributed computing (PGAS in particular), provided the more-than-necessary resources (Cray-XC40cluster), and a way to learn more exciting and useful knowledge. As well, I would like to thank both of mymentors,@e-kayrakli and@mppf, who Ihave had the honor to server under. Finally, I would like to thank the Chapel project itself.

Issues, Pull Requests & Discussions

For documentation purposes, I will list the most important information that should be taken into account for GSoCfinal evaluations.

Issues Honorable Mentions

While I have had many issues with Chapel's compiler so far, I will only list the ones that are relatively significant.

LICM (Loop-Invariant-Code-Motion), a compiler optimization, causes tuples to cause undefined behavior due to improperhoisting. This issue has caused me a large amount of grief and countless hours (estimated to cause me to lose a weekof development time), and it deserves to be mentioned first. All-in-all I am glad it was caught while developing andnot, say, for a unsuspecting user.Link.
Code generation phase inaccurately performed comparison directly on references. In the compiler, references are actuallypointers, and so during comparison it would compare it by pointer to something else. Now if you were to perform thiscomparison, it could lead to code inaccurately taking branches of code it was not meant to. This only caused me to losea day.Link.
Returning a tuple from an overloaded method, which would result in dynamically dispatching the method at runtime, wouldcause the compiler to crash if not captured at the callsite. This is no normal internal error, where you get a decenterror message and a line number, this resulted in an actual segmentation fault during compilation. Currently, this bugis still not patched and a careless user can trigger it by not capturing the return value ofremove. This caused meto lose a weekend of development time.Link.

GlobalAtomicObject - Spin-Off Project

A spin-off project that was created from the desire to solve a core problem to creating scalable data structures:performing atomic operations on class instances. It has played an important role in experimentation, and it wouldbe invaluable as a tool to create more scalable distributed data structures, and is useful even for any arbitraryapplication. This sub-project could not be completed as it would require some Chapel runtime and compiler changesto make it work more efficiently, but even now it proves to be a scalable solution, and as such deserves to bementioned here. There will be more improvements made on this, as it will become my next (unfortunately non-funded)project, as it will be another first and novel solution.

Discussion - Is currently on hold as the GSoC project is unfinishedand is over larger priority.

Pull Request - Currently is closed as this requires a lot more work.

Repository - It has been moved to its own repository and strippedfrom this project.

Collections Module

The 'Collections' module is an attempt to bring Java-like data structures to distributed computing. Prior to this module, theonly available data structure was theList, a singly-linked list that offered no parallel-safety nor took advantage of distributingmemory across nodes. The pattern before Collection was to create your own distributed data structures, specific to your own needs.While this makes sense for raw performance, the core focus of Chapel is to bring abstraction and make it easier to make distributeddata structures; tie on the fact that, while I'm not trying to exaggerate my achievements but, making distributed data structures ishard. With abstraction comes a lack of control, as many fine-grained communications are made implicitly betwen nodes without realizing,easily degrading performance when not needed. Developing these data structures is also time consuming, and developing them each time willdetract from the time needed to create appplication software.

Not only have I created two excellent data structures, they both make use of low-level techniques that the user should avoid, such asprivatization (allocating a clone on each node that is used to prevent excess implicit communications caused by accessing class fields).Furthermore, not only do my data structures perform well, they are also very useful, as they offer abstractions people may commonlyrequire, such as iteration, reductions, utility methods, etc. Lastly, the Collections module offers something that was also lacking: aninterface. Currently Chapel does not support interfaces (yet...), and this will be the first. We provide a contract between the user and developer that a certainobject will support insertion, removal, and iteration, with lookup, bulk insertion, and bulk removal being added for 'free'. This allowsthe user to reliably use any Collection, and it allows developers who do not to implement specialized versions of the 'for free' methods.

Discussion - The original discussions for the Collections module.

Pull Request - The initial PR that was merged upstream. Needed to perform nightlytesting.

Pull Request (Improvements) - The next PR that contain multiple improvements to the data structures.

Performance Testing

All benchmarks performed on a Cray-XC40 cluster, at 64-nodes, with Intel Broadwell processors.

Deque

Provides a strict ordering without sacrificing too much performance. Supports insertionand removal from both ends, allowing FIFO, LIFO, and a Total ordering, which ispreserved across all nodes in a cluster, and employs a wait-free round-robin approachto load distribution that ensures fairness in memory, bandwidth, and computation.

Disclaimer: The deque provided, while scalable, rely heavily on network atomics.The benchmark results are produced using said network atomic operations.

Bag

With performance that scales both in the number of nodes in a cluster and thenumber of cores per node, we offer a multiset implementation, called a 'Bag',which is a medium that allows storing and retrieving data in any arbitrary order.This type of data structure is ideal for work queues as it employs it's own loadbalancing, and offers unparalleled performance.

Performance

We compare our data structures to a naive synchronized list implementationas that is all that is available in the Chapel standard library.In all cases, the data structures scale and outperform the naive implementation.

Insert

Implementation	Performance over Naive (at 64 nodes)
SynchronizedList	1x
DistributedDeque	63x
DistributedBag	403x

Remove

Implementation	Performance over Naive (at 64 nodes)
SynchronizedList	1x
DistributedDeque	123x
DistributedBag	651x

About

[GSoC] Distributed Data Structures - Collections Framework for Chapel language

summerofcode.withgoogle.com/archive/2017/projects/6530769430249472/

Releases

No releases published

Packages

No packages published

Movatterモバイル変換

Navigation Menu

Search code, repositories, users, issues, pull requests...

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

License

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

Distributed Data Structures

GSoC Information

Issues, Pull Requests & Discussions

Issues Honorable Mentions

GlobalAtomicObject - Spin-Off Project

Collections Module

Performance Testing

Deque

Bag

Performance

Insert

Remove

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages

Languages

Movatterモバイル変換

License

LouisJenkinsCS/Distributed-Data-Structures

Folders and files

Latest commit

History

Repository files navigation

Distributed Data Structures

GSoC Information

Issues, Pull Requests & Discussions

Issues Honorable Mentions

GlobalAtomicObject - Spin-Off Project

Collections Module

Performance Testing

Deque

Bag

Performance

Insert

Remove

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages0

Languages

Packages