GH-98363: Presize the list for batched() #98419


Merged

rhettinger merged 6 commits into python:main from rhettinger:better_batched on Oct 20, 2022

Conversation

@rhettinger (Contributor) commented Oct 18, 2022 (edited by bedevere-bot):

Presizing the result lists saves space from list over-allocation and is faster than the PyList_Append() approach.

This is a short PR, but it took a while to figure out that PyList_GetSlice() would make short work of the problem. Ideally, we need something like _PyTuple_Resize() to resize the list in place using a realloc() and updating the allocated field.
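For reference, the behavior the C code implements can be modeled in pure Python roughly like this (a sketch based on the documented batched() recipe; the name batched_py is illustrative, and at this point in development each batch was a list rather than a tuple):

```python
from itertools import islice

def batched_py(iterable, n):
    # Rough pure-Python model of itertools.batched() as discussed in
    # this PR. The C version presizes each batch with PyList_New(n)
    # and trims a short final batch with PyList_GetSlice() instead of
    # growing the list one item at a time with PyList_Append().
    if n < 1:
        raise ValueError("n must be at least one")
    it = iter(iterable)
    while batch := list(islice(it, n)):
        yield batch

print(list(batched_py(range(7), 3)))  # [[0, 1, 2], [3, 4, 5], [6]]
```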

@rhettinger added the 🔨 test-with-buildbots (Test PR w/ buildbots; report in status section) label on Oct 19, 2022
@bedevere-bot commented:

🤖 New build scheduled with the buildbot fleet by @rhettinger for commit fd5b508 🤖

If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again.

@bedevere-bot removed the 🔨 test-with-buildbots (Test PR w/ buildbots; report in status section) label on Oct 19, 2022
    for (i = 0; i < n; i++) {
        item = PyIter_Next(it);
        if (item == NULL) {
            break;
Member commented:

Do we care how this handles strange iterators that keep going after having raised StopIteration?

>>> from itertools import batched
>>> class A:
...     def __init__(self):
...         self.i = 0
...     def __iter__(self):
...         return self
...     def __next__(self):
...         i = self.i
...         self.i += 1
...         if i == 7:
...             raise StopIteration()
...         return i
...
>>> x = batched(A(), 5)
>>> next(x)
[0, 1, 2, 3, 4]
>>> next(x)
[5, 6]
>>> next(x)
[8, 9, 10, 11, 12]

Maybe the Py_CLEAR(bo->it) could be moved to before this break;.

@rhettinger (Contributor, Author) commented Oct 20, 2022 (edited):

I'm thinking of leaving this as is. The current PR matches the behavior of the pure Python version, and the strange iterator works the same way with other tools:

>>> it = zip(range(10), A())
>>> list(it)
[(0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6)]
>>> list(it)
[(8, 8), (9, 9)]
>>> it = enumerate(A())
>>> list(it)
[(0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6)]
>>> next(it)
(7, 8)
>>> next(it)
(8, 9)

That said, I'm not really sure about this one and could easily be persuaded to add a Py_CLEAR(bo->it) to the code path for the short list.

@sweeneyde (Member) commented Oct 20, 2022 (edited):

This does seem like an edge case that could go either way.

I'm noticing that some iterators do Py_CLEAR(obj->it) as soon as the next() call fails, while other objects have internal iterators that are never NULL. My concern is that batched() clears the iterator sometimes (when the first next() of a batch fails), but not other times (when a subsequent next() fails).

So I would think to do one or the other, but not both:

  1. Clear the iterator immediately after any internal next() call fails, similar to pairwise, islice, chain, and cycle (sort of).

  2. Never clear the iterator and remove the if (it == NULL) check altogether, similar to dropwhile, takewhile, accumulate, compress, zip and enumerate. Unfortunately, it would mean allocating and throwing away PyList_New(n), but that's in the rare case that someone is abusing the iterator.

Maybe removing the Py_CLEAR and the NULL check is most natural?
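Option 1 can be modeled in Python (a hypothetical sketch, not the C code; the class names are illustrative) to show how clearing the source iterator on the first failed next() makes the "strange iterator" stay exhausted:

```python
class BatchedClearing:
    # Hypothetical Python model of option 1: drop the source iterator
    # as soon as any internal next() raises StopIteration, the way
    # pairwise() and islice() do in C.
    def __init__(self, iterable, n):
        self.it = iter(iterable)
        self.n = n

    def __iter__(self):
        return self

    def __next__(self):
        if self.it is None:
            raise StopIteration
        batch = []
        for _ in range(self.n):
            try:
                batch.append(next(self.it))
            except StopIteration:
                self.it = None  # clear immediately; later resumption is ignored
                break
        if not batch:
            raise StopIteration
        return batch


class Strange:
    # The "strange iterator" from the review thread: raises
    # StopIteration at 7, then keeps yielding values if called again.
    def __init__(self):
        self.i = 0

    def __iter__(self):
        return self

    def __next__(self):
        i = self.i
        self.i += 1
        if i == 7:
            raise StopIteration
        return i


x = BatchedClearing(Strange(), 5)
print(next(x))  # [0, 1, 2, 3, 4]
print(next(x))  # [5, 6]
# A third next(x) now raises StopIteration instead of yielding [8, 9, ...].
```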

@sweeneyde (Member) left a comment:
Regardless of the decision aboutPy_CLEAR, I think this is a good improvement.

Out of curiosity, before this PR, I measured consume(batched(repeat(None, 1_000_000), 100)) to have a bimodal distribution of timings, with about half of runs at 41ms and half at 8ms (cache effects?). After this PR, I measured about 3ms consistently. This was on Windows, without PGO.

@rhettinger (Contributor, Author) commented Oct 20, 2022 (edited):

Thanks for the interesting line of thought. I like having this match the pure Python equivalent. Also, I don't think that it would ever matter. Code like enumerate() and zip() has been deployed for 20 years with no ill effects. More-itertools has iterators that also succumb to your "strange iterator" and no one seems to have had an issue.

The likely reason that you're getting a bimodal timing distribution is that PyList_Append() calls realloc(), which uncontrollably runs in either of two modes: fast when the memory block can be extended in place, and slow when the resize has to move all the data because there isn't room to extend in place. The new code doesn't have that issue because PyList_New() makes a single call to malloc(), which never moves data. The new code also has better memory utilization because it never returns an overallocated list.
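The memory-utilization point can be observed from Python (a small illustrative check, not part of the PR; exact byte counts vary by CPython version and platform):

```python
import sys

n = 1000

# Grown via append: mirrors the PyList_Append() path, which
# over-allocates capacity as the list grows.
appended = []
for i in range(n):
    appended.append(i)

# Presized: mirrors PyList_New(n), a single exact-sized allocation.
presized = [None] * n

# The append-grown list typically carries spare capacity, so it
# occupies at least as many bytes as the exact-sized one.
print(sys.getsizeof(appended) >= sys.getsizeof(presized))  # True
```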


carljm added a commit to carljm/cpython that referenced this pull request on Oct 20, 2022 (a merge of main, 40 commits, including pythonGH-98363: Presize the list for batched() (pythonGH-98419)).
@rhettinger deleted the better_batched branch on October 20, 2022 21:38

Reviewers

@sweeneyde approved these changes

Assignees

No one assigned

Labels

performance (Performance or resource usage), skip news

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

3 participants

@rhettinger @bedevere-bot @sweeneyde
