PERF-#4494: Get partition widths/lengths in parallel instead of serially #4683
base: master
Conversation
codecov bot commented Jul 19, 2022 (edited)
Codecov Report
```
@@            Coverage Diff             @@
##           master    #4683       +/-   ##
===========================================
- Coverage   85.28%   72.15%   -13.13%
===========================================
  Files         259      259
  Lines       19378    19496      +118
===========================================
- Hits        16527    14068     -2459
- Misses       2851     5428     +2577
```
This logic is duplicated from the PartitionManager classes above, but I'm not sure how to access the correct partition manager from here.
pyrito commented Jul 21, 2022
Haven't taken a closer look at the implementation details, but do you have any benchmarks or performance measurements to compare with master?
noloerino commented Jul 21, 2022 via email
Sadly no, and I'd appreciate some suggestions on what code to run. Rehan suggested manually invalidating the ._row_lengths_cache and .length_cache fields on a dataframe and its partitions, then ensuring they're recomputed properly. It succeeds for simple examples, but I had trouble producing a Ray timeline, and I'm not sure how else to benchmark it (most API-level dataframe manipulations would probably hit the cached length/width).
modin/core/execution/ray/implementations/pandas_on_ray/partitioning/partition_manager.py (outdated; resolved)
mvashishtha commented Jul 21, 2022 (edited)
I spent a while today trying to get a script that showcases the performance here without breaking anything in Modin, but I failed. Getting a reproducer is hard for a few reasons. For one thing, this optimization is only useful for unusual cases like in #4493, where the partitions' call queues include costly operations. When there is no call queue, the partitions will execute all dataframe functions eagerly, simultaneously calculating shapes. The call queues are generally meant to carry cheap operations like transpose and reindexing, but the reproducer in that issue has a frame that is very expensive to serialize, so that even the transpose was expensive.

Looking at all the serial shape computations I listed here, most are in internal length computations. I think it's good practice to get multiple ray objects in parallel (see also this note about a similar improvement).
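The "get multiple ray objects in parallel" point above can be sketched without Modin or Ray at all. The snippet below is a hypothetical stand-in: `remote_shape` and the thread pool play the role of remote shape computations and a single batched `ray.get(list_of_refs)`, which is what this PR moves toward instead of awaiting each object ID one at a time.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def remote_shape(i):
    # Stand-in for computing one partition's (length, width) remotely.
    time.sleep(0.05)
    return (100 * (i + 1), 8)

# Serial: each shape is awaited before the next request is issued,
# like calling ray.get on one object ref at a time.
t0 = time.perf_counter()
serial = [remote_shape(i) for i in range(4)]
serial_time = time.perf_counter() - t0

# Batched: all four requests are in flight at once, analogous to a
# single ray.get(list_of_refs).
t0 = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    batched = list(pool.map(remote_shape, range(4)))
batched_time = time.perf_counter() - t0

assert serial == batched            # same answers
assert batched_time < serial_time   # roughly one wait instead of four
```

The answers are identical either way; only the number of round trips changes.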
vnlitvinov commented Jul 21, 2022
This adds a certain amount of complexity (judging by the number of lines changed; I haven't looked at the diff yet), and I haven't yet seen any performance proof for it. I would like to see some measurements before increasing our (already huge) codebase...
RehanSD left a comment
Left some comments, but great work!
Why do we need to compute dimensions here?
The length and width values of each partition are accessed in the local compute_part_size, defined immediately below. The double for loop structure where compute_part_size is called makes it hard to parallelize the computation of these dimensions, so I thought it would be simplest to precompute the relevant dimensions before the loop.
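The "precompute before the loop" idea can be sketched in isolation. This is a hypothetical illustration (the `precompute_dimensions` helper and the thread pool are stand-ins, not Modin code): all dimension computations are launched up front, so the nested loops only ever read finished values.

```python
from concurrent.futures import ThreadPoolExecutor

def precompute_dimensions(partitions, measure):
    # Launch every measurement at once so the double for loop below
    # reads completed results instead of blocking one partition at a time.
    with ThreadPoolExecutor() as pool:
        futures = [[pool.submit(measure, part) for part in row]
                   for row in partitions]
        return [[f.result() for f in row] for row in futures]

# A 2x2 grid of toy "partitions" measured by len().
grid = [[list(range(3)), list(range(5))],
        [list(range(2)), list(range(4))]]
lengths = precompute_dimensions(grid, len)

for row_lengths in lengths:
    for n in row_lengths:   # the serial loop now only reads cached values
        assert isinstance(n, int)
```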
We need to unwrap _length_cache here, since its type will be PandasDataframePartition
What do you mean by unwrap? Also, as far as I can tell, the logic for this method should be the same as it originally was (the code was just moved into try_build_length_cache), so does this mean the original code returned PandasDataframePartition as well?
modin/core/execution/dask/implementations/pandas_on_dask/partitioning/virtual_partition.py (outdated; resolved)
Shouldn't this just be i as well?
No, since new_lengths may have fewer elements than caches in the case where some length values were already computed (and are filtered out by the isinstance(cache, Future) check). The value computed at new_lengths[dask_idx] should correspond to the promise at caches[i].
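The two-counter bookkeeping described here can be shown with a toy example. The `Future` class below is a hypothetical stand-in for a pending dask Future; `dask_idx` advances only when a pending entry is consumed, so it lags behind `i` whenever an already-known length is skipped.

```python
class Future:
    """Stand-in for a pending dask Future (illustrative only)."""
    def __init__(self, value):
        self.value = value

caches = [3, Future(5), 7, Future(9)]       # mix of known lengths and promises
pending = [c for c in caches if isinstance(c, Future)]
new_lengths = [f.value for f in pending]    # "materialized" in one batched call

# dask_idx walks new_lengths while i walks caches; the two indices
# diverge whenever a non-Future (already-cached) entry is skipped.
dask_idx = 0
for i, cache in enumerate(caches):
    if isinstance(cache, Future):
        caches[i] = new_lengths[dask_idx]
        dask_idx += 1

assert caches == [3, 5, 7, 9]
```

Here new_lengths has 2 elements while caches has 4, which is why indexing new_lengths with i would be wrong.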
modin/core/execution/dask/implementations/pandas_on_dask/partitioning/virtual_partition.py (outdated; resolved)
modin/core/execution/dask/implementations/pandas_on_dask/partitioning/partition.py (outdated; resolved)
modin/core/execution/ray/implementations/pandas_on_ray/partitioning/partition.py (outdated; resolved)
modin/core/execution/ray/implementations/pandas_on_ray/partitioning/partition_manager.py (outdated; resolved)
modin/core/execution/ray/implementations/pandas_on_ray/partitioning/partition_manager.py (outdated; resolved)
modin/core/execution/ray/implementations/pandas_on_ray/partitioning/virtual_partition.py (outdated; resolved)
noloerino commented Jul 26, 2022
@vnlitvinov that makes sense, I'll look into coming up with concrete benchmarks.
vnlitvinov commented Jul 27, 2022 (edited)
@pyrito please have a look at https://github.com/vnlitvinov/modin/tree/speedup-masking and #4726, it might be doing somewhat the same in terms of getting the sizes in parallel
YarShev commented Jul 27, 2022
Related discussion on handling metadata (index and columns) in #3673.
Force-pushed from 6a17fc3 to e0bb5fa.
Signed-off-by: Jonathan Shi <jhshi@ponder.io>
Co-authored-by: Rehan Sohail Durrani <rdurrani@berkeley.edu>
What do these changes do?
Computes widths and lengths of block partitions in parallel, as batched calls to ray.get/DaskWrapper.materialize, rather than in serial.

This adds the try_build_[length|width]_cache and try_set_[length|width]_cache methods to block partitions; the former returns a promise/future for computing the partition's length, and the latter should be called by the partition manager to inform the block partition of the computation's value. This also adds _update_partition_dimension_caches to the PartitionManager class, which will call the length/width futures returned by its constituent partitions.

- passes flake8 modin/ asv_bench/benchmarks scripts/doc_checker.py
- passes black --check modin/ asv_bench/benchmarks scripts/doc_checker.py
- signed commit with git commit -s
- docs/development/architecture.rst is up-to-date