GH-98363: Presize the list for batched() #98419
Conversation
bedevere-bot commented Oct 19, 2022
🤖 New build scheduled with the buildbot fleet by @rhettinger for commit fd5b508 🤖 If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again.
```c
for (i = 0; i < n; i++) {
    item = PyIter_Next(it);
    if (item == NULL) {
        break;
```
Do we care how this handles strange iterators that keep going after having raised StopIteration?
```
>>> from itertools import batched
>>> class A:
...     def __init__(self):
...         self.i = 0
...     def __iter__(self):
...         return self
...     def __next__(self):
...         i = self.i
...         self.i += 1
...         if i == 7:
...             raise StopIteration()
...         return i
...
>>> x = batched(A(), 5)
>>> next(x)
[0, 1, 2, 3, 4]
>>> next(x)
[5, 6]
>>> next(x)
[8, 9, 10, 11, 12]
```
Maybe the Py_CLEAR(bo->it) could be moved to before this break;.
rhettinger Oct 20, 2022 • edited
I'm thinking of leaving this as is. The current PR matches the behavior of the pure python version and the strange iterator works the same way with other tools:
```
>>> it = zip(range(10), A())
>>> list(it)
[(0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6)]
>>> list(it)
[(8, 8), (9, 9)]
>>> it = enumerate(A())
>>> list(it)
[(0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6)]
>>> next(it)
(7, 8)
>>> next(it)
(8, 9)
```
That said, I'm not really sure about this one and could easily be persuaded to add a Py_CLEAR(bo->it) to the code path for the short list.
sweeneyde Oct 20, 2022 • edited
This does seem like an edge case that could go either way.
I'm noticing that some iterators do Py_CLEAR(obj->it) as soon as the next() call fails, while other objects have internal iterators that are never NULL. My concern is that batched() clears the iterator sometimes (when the first next() of a batch fails), but not other times (when a subsequent next() fails).
So I would think to do one or the other, but not both:
1. Clear the iterator immediately after any internal next() call fails, similar to pairwise, islice, chain, and cycle (sort of).
2. Never clear the iterator and remove the if (it == NULL) check altogether, similar to dropwhile, takewhile, accumulate, compress, zip and enumerate. Unfortunately, it would mean allocating and throwing away PyList_New(n), but that's in the rare case that someone is abusing the iterator.
Maybe removing the Py_CLEAR and the NULL check is most natural?
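(For orientation only, a rough C sketch of where option 1's eager clearing would sit in the loop quoted earlier; bo->it is taken from the thread above, while the surrounding names such as result are assumed and this is not the PR's actual code.)

```c
/* Option 1, sketched: drop the source iterator the moment any
   PyIter_Next() call fails, so later calls can never "revive" it. */
for (i = 0; i < n; i++) {
    item = PyIter_Next(it);
    if (item == NULL) {
        Py_CLEAR(bo->it);              /* eager clear, not only when i == 0 */
        break;
    }
    PyList_SET_ITEM(result, i, item);  /* steals the reference to item */
}

/* Option 2, sketched: never clear bo->it and drop the "if (it == NULL)"
   fast path; an exhausted iterator then just yields an empty batch
   (and one wasted PyList_New(n)) on each call. */
```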
sweeneyde left a comment
Regardless of the decision about Py_CLEAR, I think this is a good improvement.
Out of curiosity, before this PR, I measured consume(batched(repeat(None, 1_000_000), 100)) to have a bimodal distribution of timings, with about half of runs at 41ms and half at 8ms (cache effects?). After this PR, I measured about 3ms consistently. This was on Windows, without PGO.
rhettinger commented Oct 20, 2022 • edited
Thanks for the interesting line of thought. I like having this match the pure python equivalent. Also, I don't think that it would ever matter. Code like … The likely reason that you're getting a bimodal timing distribution is that …
* main: (40 commits)
  pythongh-98461: Fix source location in comprehensions bytecode (pythonGH-98464)
  pythongh-98421: Clean Up PyObject_Print (pythonGH-98422)
  pythongh-98360: multiprocessing now spawns children on Windows with correct argv[0] in virtual environments (pythonGH-98462)
  CODEOWNERS: Become a typing code owner (python#98480)
  [doc] Improve logging cookbook example. (pythonGH-98481)
  Add more tkinter.Canvas tests (pythonGH-98475)
  pythongh-95023: Added os.setns and os.unshare functions (python#95046)
  pythonGH-98363: Presize the list for batched() (pythonGH-98419)
  pythongh-98374: Suppress ImportError for invalid query for help() command. (pythongh-98450)
  typing tests: `_overload_dummy` raises `NotImplementedError`, not `RuntimeError` (python#98351)
  pythongh-98354: Add unicode check for 'name' attribute in _imp_create_builtin (pythonGH-98412)
  pythongh-98257: Make _PyEval_SetTrace() reentrant (python#98258)
  pythongh-98414: py.exe launcher does not use defaults for -V:company/ option (pythonGH-98460)
  pythongh-98417: Store int_max_str_digits on the Interpreter State (pythonGH-98418)
  Doc: Remove title text from internal links (python#98409)
  [doc] Refresh the venv introduction documentation, and correct the statement about VIRTUAL_ENV (pythonGH-98350)
  Docs: Bump sphinx-lint and fix unbalanced inline literal markup (python#98441)
  pythongh-92886: Replace assertion statements in `handlers.BaseHandler` to support running with optimizations (`-O`) (pythonGH-93231)
  pythongh-92886: Fix tests that fail when running with optimizations (`-O`) in `_test_multiprocessing.py` (pythonGH-93233)
  pythongh-92886: Fix tests that fail when running with optimizations (`-O`) in `test_py_compile.py` (pythonGH-93235)
  ...
Presizing the result lists saves space from list over-allocation and is faster than the PyList_Append() approach. This is a short PR, but it took a while to figure out that PyList_GetSlice() would make short work of the problem. Ideally, we need something like _PyTuple_Resize() to resize the list in place using a realloc() and updating the allocated field.
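To make the approach concrete, here is a minimal, hedged C sketch of the presize-then-trim pattern described above. The struct layout and names (batchedobject, bo->it, bo->batch_size, batched_next_sketch) are assumptions for illustration, not necessarily the PR's exact code:

```c
#include "Python.h"

/* Assumed object layout, for illustration only. */
typedef struct {
    PyObject_HEAD
    PyObject *it;           /* source iterator, or NULL once exhausted */
    Py_ssize_t batch_size;  /* n */
} batchedobject;

static PyObject *
batched_next_sketch(batchedobject *bo)
{
    Py_ssize_t i, n = bo->batch_size;
    PyObject *it = bo->it;

    if (it == NULL) {
        return NULL;                       /* already exhausted */
    }
    /* Presize the batch instead of growing it with PyList_Append(). */
    PyObject *result = PyList_New(n);
    if (result == NULL) {
        return NULL;
    }
    for (i = 0; i < n; i++) {
        PyObject *item = PyIter_Next(it);
        if (item == NULL) {
            break;                         /* StopIteration or an error */
        }
        PyList_SET_ITEM(result, i, item);  /* steals the reference */
    }
    if (PyErr_Occurred()) {                /* a real error, not StopIteration */
        Py_DECREF(result);
        return NULL;
    }
    if (i == 0) {                          /* nothing left: end the iteration */
        Py_DECREF(result);
        Py_CLEAR(bo->it);
        return NULL;
    }
    if (i < n) {
        /* Short final batch: PyList_GetSlice() copies the filled prefix.
           A list analogue of _PyTuple_Resize() would avoid this copy by
           shrinking in place, but no such public helper exists today. */
        Py_SETREF(result, PyList_GetSlice(result, 0, i));
    }
    return result;
}
```

The trade-off this sketch illustrates: a presized list avoids repeated over-allocation during the fill loop, and the one extra copy from PyList_GetSlice() only happens for the final, short batch.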