bpo-29842: Make Executor.map less eager so it handles large/unbounded input iterables appropriately #707
Conversation
the-knights-who-say-ni commented Mar 18, 2017
Hello, and thanks for your contribution! I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA). Unfortunately we couldn't find an account corresponding to your GitHub username on bugs.python.org (b.p.o) to verify you have signed the CLA. This is necessary for legal reasons before we can look at your contribution. Please follow these steps to help rectify the issue:
Thanks again for your contribution and we look forward to looking at it!
mention-bot commented Mar 18, 2017
@MojoVampire, thanks for your PR! By analyzing the history of the files in this pull request, we identified @birkenfeld, @brianquinlan and @ezio-melotti to be potential reviewers.
MojoVampire commented Mar 18, 2017
I've already submitted a contributor agreement pre-GitHub migration. I just updated my b.p.o. user profile (josh.r) to link to my GitHub account name. Is that sufficient, or do I need to submit a new contributor agreement based on my GitHub e-mail address?
MojoVampire commented Mar 18, 2017
Hmm... Is the failure of continuous-integration/travis-ci/pr something real? Clicking Details just tells me it can't find a python/cpython repository at all...
pkch commented May 17, 2017
You can also take a look at my implementation that I uploaded to https://github.com/pkch/executors. It does something more like what I described in the issue tracker, the main benefit being that it's not blocking.
methane left a comment
LGTM at code level.
bedevere-bot commented Jul 25, 2018
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase "I have made the requested changes; please review again".
leezu commented Nov 13, 2018
@MojoVampire could you share your plans about this PR? Do you plan to drive it forward?
MojoVampire commented May 6, 2019
I have made the requested changes; please review again. I did not add a Misc/NEWS entry since the file no longer exists (it's autogenerated from commit messages now, correct?).
bedevere-bot commented May 6, 2019
Thanks for making the requested changes! @methane: please review the changes made to this pull request.
tirkarthi commented May 6, 2019
NEWS entries can be generated using blurb or blurb-it. Please see: https://devguide.python.org/committing/?highlight=news#what-s-new-and-news-entries
MojoVampire commented May 6, 2019 • edited
I have made the requested changes; please review again. Actually made the Misc/NEWS entry properly. Sorry for the confusion; I haven't made a PR since the new Misc/NEWS regime began and didn't know about the blurb tool. Thanks for the assist @tirkarthi
bedevere-bot commented May 6, 2019
Thanks for making the requested changes! @methane: please review the changes made to this pull request.
pitrou left a comment
Some comments below. A significant issue is that this changes the behaviour of shutdown(wait=True) to not wait for completion of all pending futures. I don't think that's an acceptable change.
By default, a reasonable number of tasks are queued beyond the number of workers; an explicit *prefetch* count may be provided to specify how many extra tasks should be queued.
Using "chunks" here would be more precise than "tasks".
The documentation for chunksize uses the phrasing "this method chops iterables into a number of chunks which it submits to the pool as separate tasks", and since not all executors even use chunks (ThreadPoolExecutor ignores the argument), I figured I'd stick with "tasks". It does kind of leave out a term to describe a single work item; the docs use chunks and tasks as synonyms, with no term for a single work item.
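For illustration, a hypothetical usage sketch of the proposed keyword (the prefetch argument is what this PR adds; it does not exist in the released concurrent.futures API, and the fetch workload below is made up):

```python
from concurrent.futures import ThreadPoolExecutor

def fetch(url):
    # Stand-in for real work; in practice this might be an HTTP request.
    return len(url)

urls = (f"https://example.com/page/{i}" for i in range(1_000_000))

with ThreadPoolExecutor(max_workers=4) as executor:
    # With a bounded prefetch, only max_workers + prefetch work items are
    # queued at any moment, so the huge generator is consumed incrementally
    # instead of being submitted in full before the first result appears.
    for size in executor.map(fetch, urls, prefetch=8):
        if size > 40:
            break  # stop early; the rest of the input is never pulled
```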
self.assertCountEqual(finished, range(10))
# No guarantees on how many tasks dispatched,
# but at least one should have been dispatched
self.assertGreater(len(finished), 0)
I think this change breaks compatibility. The doc for shutdown says:
If wait is True then this method will not return until all the pending futures are done executing and the resources associated with the executor have been freed.
So all futures should have executed, instead of being cancelled.
At the time I wrote it, it didn't conflict with the documentation precisely; the original documentation said that map was "Equivalent to map(func, *iterables) except func is executed asynchronously and several calls to func may be made concurrently.", but it didn't guarantee that any actual futures exist (the method is implemented in terms of submit and futures, but nothing in the docs actually requires such a design).
That said, it looks like you updated the documentation to add "the iterables are collected immediately rather than lazily;", which, if considered a guarantee, rather than a warning, would make this a breaking change even ignoring the "cancel vs. wait" issue.
Do you have any suggestions? If strict adherence to your newly (as of late 2017) documented behavior is needed, I suppose I could change the default behavior from "reasonable prefetch" to "exhaustive prefetch", so when prefetch isn't passed, every task is submitted, but it would be kind of annoying to lose the "good by default" behavior of limited prefetching.
The reason I cancelled rather than waiting on the result is that I was trying to follow the normal use pattern for map; since the results are yielded lazily, if the iterator goes away or is closed explicitly (or you explicitly shut down the executor), you're done; having the outstanding futures complete when you're not able to see the results means you're either:
- Expecting the tasks to complete without running out the Executor.map (which doesn't work with Py3's map at all, so the analogy to map should allow it; if you don't run it out, you have no guarantees anything was done)
- Not planning to use any further results (in which case running any submitted but unscheduled futures means doing work no one can see the results of)
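A minimal standalone sketch of the lazy, cancel-on-close pattern described above (illustrative only, not the PR's implementation; the lazy_map name and the prefetch default are made up):

```python
import collections
import itertools
from concurrent.futures import ThreadPoolExecutor

def lazy_map(executor, fn, iterable, prefetch=8):
    """Yield fn(x) lazily, keeping at most `prefetch` futures queued."""
    args = iter(iterable)
    pending = collections.deque(
        executor.submit(fn, a) for a in itertools.islice(args, prefetch)
    )
    try:
        while pending:
            yield pending.popleft().result()
            # Top the window back up with one more work item, if any remain.
            for a in itertools.islice(args, 1):
                pending.append(executor.submit(fn, a))
    finally:
        # If the caller closes the generator early (or drops it), cancel any
        # work whose result no one will ever observe.
        for fut in pending:
            fut.cancel()

with ThreadPoolExecutor(max_workers=2) as ex:
    results = lazy_map(ex, lambda x: x * x, itertools.count())
    print(next(results), next(results))  # 0 1
    results.close()  # still-queued futures are cancelled here
```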
Actually, I think there are two problems to discuss:
- What happens when shutdown(wait=True) is called. Currently it waits for all outstanding tasks. I don't think we can change that (the explicit wait flag exists for a reason).
- Whether map() can be silently switched to a lazy mode of operation. There's a (perhaps minor) problem with that. Currently, if one of the iterables raises an error, map() propagates the exception. With your proposal, the exception may be raised later inside the result iterator.
I think 2) might easily be worked around by introducing a separate method (lazy_map?).
It seems it would be good to discuss those questions on the mailing-list.
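A small illustration of point 2 (a hypothetical snippet; flaky_args is made up to show where the exception surfaces):

```python
from concurrent.futures import ThreadPoolExecutor

def flaky_args():
    yield 1
    yield 2
    raise ValueError("broken input iterable")

with ThreadPoolExecutor() as ex:
    try:
        # Today map() consumes the whole input up front, so the ValueError
        # escapes from this call, before any result has been requested.
        results = ex.map(abs, flaky_args())
    except ValueError:
        print("raised at map() call time")
    # Under a lazy map(), the same error would only surface while iterating:
    #     for r in results: ...   # ValueError raised here instead
```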
MojoVampire May 6, 2019 • edited
Yeah, the problem with using the "lazy_map" name is that it feels like recreating the same annoying distinctions between map and imap from the Py2 era, and it would actually have Executor.map (which is supposed to match map, which lazily consumes the input(s)) be less similar to map than Executor.lazy_map.
If it's necessary to gain acceptance, I could change the default behavior to use prefetch=sys.maxsize - self._max_workers. It would match the pre-existing behavior for just about anything that conceivably worked before (modulo the tiny differences in memory usage of deque vs. list for storing the futures) since:
- All tasks would be submitted fully up front, so shutdown(wait=True) would in fact wait on them (and no further calls to submit would occur in the generator, so submitting wouldn't occur post-shutdown, which would raise a RuntimeError and cause the cancellation)
- It wouldn't be lazy for anything by default (it would either work eagerly or crash, in the same manner it currently behaves)
If you passed a reasonable prefetch, you wouldn't have these behaviors (and we should document that interaction), but at least existing code would continue to work identically.
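A tiny sketch of the arithmetic behind that proposed default (names are illustrative; nothing below is the PR's actual code):

```python
import sys

def effective_prefetch(max_workers, prefetch=None):
    """Return how many extra work items may sit queued beyond the workers."""
    if prefetch is None:
        # Proposed default: effectively unlimited, so every input item is
        # submitted up front, matching today's eager map() behaviour.
        return sys.maxsize - max_workers
    if prefetch < 0:
        raise ValueError("prefetch count may not be negative")
    return prefetch

# Default: submission window (workers + prefetch) is sys.maxsize, i.e. eager.
print(4 + effective_prefetch(4))              # sys.maxsize on this platform
# An explicit prefetch bounds the window, enabling lazy consumption.
print(4 + effective_prefetch(4, prefetch=8))  # 12
```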
I don't have a strong opinion. I think discussing those alternatives on the ML, to gather more opinions and arguments, would be useful.
bedevere-bot commented May 6, 2019
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase "I have made the requested changes; please review again".
…lds no reference to result at the moment it yields
Reduce line lengths to PEP8 limits
flavianh commented Jul 20, 2019
@MojoVampire I think you need to comment your PR with "I have made the requested changes; please review again".
This PR is stale because it has been open for 30 days with no activity.
erlend-aasland commented Jun 29, 2022
@MojoVampire, are you going to follow up this PR?
MojoVampire commented Jul 12, 2022
@erlend-aasland: I'd like to apply this, but I never got any idea of what would constitute an acceptable final result. Executor.map is, frankly, useless for many of its intended purposes right now; in an effort to improve performance on huge inputs, you end up prefetching the entire input and pre-scheduling all the tasks before you can process any of them.
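To make the failure mode concrete (an illustrative snippet, not from the PR): the built-in map is lazy, while the current Executor.map consumes its entire input before yielding anything, so an unbounded iterable can never be processed.

```python
import itertools
from concurrent.futures import ThreadPoolExecutor

with ThreadPoolExecutor(max_workers=4) as ex:
    # The built-in map is lazy: this prints five squares and stops.
    for n in map(lambda x: x * x, itertools.islice(itertools.count(), 5)):
        print(n)

    # The current eager Executor.map would never get this far on the same
    # unbounded input:
    #     ex.map(lambda x: x * x, itertools.count())  # submits forever, never yields
```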
erlend-aasland commented Jul 14, 2022
cc @kumaraditya303, is this something you'd be interested in reviewing?
kumaraditya303 commented Jul 15, 2022
I'll take a look soon, but it seems there are two PRs for the same thing and this one has conflicts so maybe we should continue on #18566
kumaraditya303 commented Jul 18, 2022
Closing this in favor of #18566, thanks for the PR!
https://bugs.python.org/issue29842