Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

gh-97933: inline list/dict/set comprehensions#101441

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Merged
carljm merged 56 commits intopython:mainfromcarljm:inlinecomp2
May 9, 2023

Conversation

carljm
Copy link
Member

@carljmcarljm commentedJan 30, 2023
edited
Loading

Closes#97933.

In builds with--enable-optimizations:

➜ ./python -m pyperf timeit -s 'l = [1, 2, 3, 4, 5]' '[x for x in l]' --compare-to ../mainopt/python/home/carljm/build/mainopt/python: ..................... 200 ns +- 3 ns/home/carljm/build/ic2opt/python: ..................... 120 ns +- 2 nsMean +- std dev: [/home/carljm/build/mainopt/python] 200 ns +- 3 ns -> [/home/carljm/build/ic2opt/python] 120 ns +- 2 ns: 1.67x faster

Credit to@vladima for authoring theoriginal version of this code in Cinder 3.8. The compiler approach is different in this version (so as to support inlining all comprehensions, regardless of name clashes); some of the symbol table changes remain the same.

A couple implementation notes:

  • We need a new opcodeLOAD_FAST_AND_CLEAR because we need to push/pop the value of possibly-undefined variables on the stack while clearing them. To do this with existing opcodes would require eliminating the invariant/assert thatLOAD_FAST never loadsNULL, and would add inefficient refcount operations and interpreter loop cycles.
  • If a comprehension iteration variable is a cell inside the comprehension (i.e. the comprehension builds closures) we end up including that variable in bothco_varnames andco_cellvars for the inlined-into outer function and using the non-offset (varnames) storage location for it. This is how function arguments that are also cells are currently handled, so we just piggyback on that handling. This makes the needed push/pop of the outer cell simpler as we can just useLOAD_FAST_AND_CLEAR/STORE_FAST as-is for it, and it will push/pop the cell itself. Otherwise we would need some new handling in the compiler for allowing push/pop of a cell in offset storage.

This PR also adds a lot of new tests for comprehension scoping edge cases, and runs all new and existing scoping
tests in module, function, and class scopes (without requiring duplication of the test code.) All comprehension
scoping tests added in this diff also pass with comprehension inlining disabled (ensuring semantics did not change.)

ponimas, NoahTheDuke, 5j9, erlend-aasland, and bdraco reacted with heart emojiitamaro, TeamSpen210, vladima, sigma67, CAM-Gerlach, adamchainz, 5j9, corona10, erlend-aasland, scastlara, and 2 more reacted with rocket emoji
@carljm
Copy link
MemberAuthor

@markshannon@iritkatriel would be grateful for a pyperformance run on this one (presuming signal all looks good.)

@carljmcarljm changed the titlegh-97933: inline sync list/dict/set comprehensionsgh-97933: inline list/dict/set comprehensions in functionsJan 31, 2023
@carljmcarljm added the performancePerformance or resource usage labelJan 31, 2023
@markshannon
Copy link
Member

I'll run the benchmarks now

@markshannon
Copy link
Member

Benchmarking is delayed a bit, but should be back tomorrow at the latest.

carljm reacted with thumbs up emoji

@markshannon
Copy link
Member

markshannon commentedJan 31, 2023
edited
Loading

You don't seem to be clearing the inner variable before use, so that

deff():l= [None ]    [ (l[0],l)in [[1,2]] ]print(l)

will, I think, print[1] instead of raising an UnboundLocalError

You can actually make the code more efficient by clearing the local when stashing it on the stack:

inst(LOAD_FAST_AND_CLEAR, (--val)) {val=LOCALS[oparg];LOCALS[oparg]=NULL;}

as there is no need for aNULL check orINCREF.

@markshannon
Copy link
Member

markshannon commentedJan 31, 2023
edited
Loading

A couple of obscure issues, but this looks like a very nice improvement.

I think the gain in performance is well worth the cost in code size, given how common comprehensions are.

carljm, itamaro, and erlend-aasland reacted with thumbs up emoji

* main:pythongh-101440: fix json snippet error in logging-cookbook.rst (python#101439)pythongh-99276 - Updated Doc/faq/general.rst (python#101396)  Add JOBS parameter to docs Makefile (python#101395)pythongh-98831: rewrite GET_LEN, GET_ITER, BEFORE_WITH and a few simple opcodes in the instruction definition DSL (python#101443)pythongh-77607: Improve accuracy of os.path.join docs (python#101406)  Fixes typo in asyncio.TaskGroup context manager code example (python#101449)pythongh-98831: Clean up and add cache size static_assert to macro (python#101442)pythongh-99955: use SUCCESS/ERROR return values in optimizer and assembler. Use RETURN_IF_ERROR where appropriate. Fix a couple of bugs. (python#101412)
@carljm
Copy link
MemberAuthor

carljm commentedJan 31, 2023
edited
Loading

You don't seem to be clearing the inner variable before use

Your example code doesn't involve a comprehension, I'm assuming you meant something like

def f():    l = [ None ]    [ (l[0], l) for l in [[1,2]] ]    print(l)

But this doesn't raiseUnboundLocalError on main either, it works fine; the iteration variablel is always bound within the comprehension before the expression is evaluated.

So far I haven't been able to construct a test case where failing to clear an incoming variable bound inside the comprehension has any visible effect, because AFAICT any locals in a comprehension are always (necessarily by the syntactic limitations of how a comprehension is built) defined before use. Let me know if you can think of a test case I'm missing!

You can actually make the code more efficient by clearing the local when stashing it on the stack

But I think this is sufficient reason regardless to justify switching fromLOAD_FAST_OR_NULL toLOAD_FAST_AND_CLEAR, so I'll do that!

@carljm
Copy link
MemberAuthor

carljm commentedJan 31, 2023
edited
Loading

Actually I don't thinkLOAD_FAST_AND_CLEAR will work, because of cases like this:

def f(x):    return [x for x in x]

We need to save/restore the outer value ofx, but we cannot clear it on entry because we are also relying on being able to load it when we evaluate the iterable part of the comprehension (which in the non-inlined case would have been evaluated outside the function and passed in as the.0 parameter.) So I think we need to stick with a non-clearingLOAD_FAST_OR_NULL on entry.

If there were a correctness reason for needing to clear in other cases, I could add an additionalDELETE_FAST after theLOAD_FAST_OR_NULL, but so far I haven't found any such correctness need, and clearing this way is not an efficiency improvement.

EDIT: there is also the option of doing clearing pushes and also moving evaluation of the outermost iterable to before the pushes; this would require some non-elidable SWAPs to preserve it as TOS through the pushes, though...

@carljm
Copy link
MemberAuthor

@markshannon I thinkLOAD_FAST_OR_NULL could also become a pseudo-instruction that resolves toLOAD_FAST (which would get around having it turned intoLOAD_FAST_CHECK), but we'd have to remove theassert(value != NULL) in theLOAD_FAST implementation. Do you prefer keeping that assertion at the cost of another (non-pseudo) opcode?

@carljm
Copy link
MemberAuthor

carljm commentedJan 31, 2023
edited
Loading

I've addressed the review comments. I have some TODOs for potential optimizations that could be done here or in a follow-up diff:

  • Eliminate some unnecessary push/pops in cases where the value is never loaded after the comprehension. This would fall into the general category of dead store elimination. I'm not sure if we can actually do this or if it violates some guarantees about introspectable frame state in tracing?
  • Better handling of SWAPs on the post-comprehension pops.apply_static_swaps can't help us here if the comprehension value is immediately returned (because we can't "swap" aRETURN_VALUE -- this case would go away if we can do the dead store elimination, since if it's right before aRETURN_VALUE it's clearly a dead store) or if it's immediately popped (because thePOP_TOP will have a lineno of-1 andapply_static_swaps will thus refuse to swap it). Instead of having oneSWAP 2 per pop, we could have a singleSWAP N and do the last pop first.
  • Teachadd_checks_for_loads_of_uninitialized_variables to understand and trace through the push/pop pairs.

* main:pythonGH-100288: Skip extra work when failing to specialize LOAD_ATTR (pythonGH-101354)pythongh-101409: Improve generated clinic code for self type checks (python#101411)pythongh-98831: rewrite BEFORE_ASYNC_WITH and END_ASYNC_FOR in the instruction definition DSL (python#101458)pythongh-101469: Optimise get_io_state() by using _PyModule_GetState() (pythonGH-101470)
@markshannon
Copy link
Member

Sorry for posting a gibberish example.
Here's what I should have posted:

deff():l= [None ]    [1for (l[0],l)in [[1,2]] ]print(l)f()

3.11

$ python3.11 ~/test/test.py Traceback (most recent call last):   ...UnboundLocalError: cannot access local variable 'l' where it is not associated with a value

This PR

$ ./python ~/test/test.py [1]
carljm reacted with thumbs up emoji

Copy link
MemberAuthor

@carljmcarljm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Thanks for review!

carljmand others added3 commitsMay 8, 2023 23:19
Co-authored-by: Irit Katriel <1055913+iritkatriel@users.noreply.github.com>
* main: (47 commits)pythongh-97696 Remove unnecessary check for eager_start kwarg (python#104188)pythonGH-104308: socket.getnameinfo should release the GIL (python#104307)pythongh-104310: Add importlib.util.allowing_all_extensions() (pythongh-104311)pythongh-99113: A Per-Interpreter GIL! (pythongh-104210)pythonGH-104284: Fix documentation gettext build (python#104296)pythongh-89550: Buffer GzipFile.write to reduce execution time by ~15% (python#101251)pythongh-104223: Fix issues with inheriting from buffer classes (python#104227)pythongh-99108: fix typo in Modules/Setup (python#104293)pythonGH-104145: Use fully-qualified cross reference types for the bisect module (python#104172)pythongh-103193: Improve `getattr_static` test coverage (python#104286)  Trim trailing whitespace and test on CI (python#104275)pythongh-102500: Remove mention of bytes shorthand (python#104281)pythongh-97696: Improve and fix documentation for asyncio eager tasks (python#104256)pythongh-99108: Replace SHA3 implementation HACL* version (python#103597)pythongh-104273: Remove redundant len() calls in argparse function (python#104274)pythongh-64660: Don't hardcode Argument Clinic return converter result variable name (python#104200)pythongh-104265 Disallow instantiation of `_csv.Reader` and `_csv.Writer` (python#104266)pythonGH-102613: Improve performance of `pathlib.Path.rglob()` (pythonGH-104244)pythongh-103650: Fix perf maps address format (python#103651)pythonGH-89812: Churn `pathlib.Path` methods (pythonGH-104243)  ...
@carljmcarljm added the 🔨 test-with-refleak-buildbotsTest PR w/ refleak buildbots; report in status section labelMay 9, 2023
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by@carljm for commita8425a6 🤖

If you want to schedule another build, you need to add the🔨 test-with-refleak-buildbots label again.

@bedevere-botbedevere-bot removed the 🔨 test-with-refleak-buildbotsTest PR w/ refleak buildbots; report in status section labelMay 9, 2023
Copy link
Contributor

@erlend-aaslanderlend-aasland left a comment
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I see Jelle and Irit did the heavy review, so this is yourPEP-7 bot chiming in for some style suggestions 🤖

carljm reacted with heart emoji
carljmand others added3 commitsMay 9, 2023 07:08
Co-authored-by: Erlend E. Aasland <erlend.aasland@protonmail.com>
Co-authored-by: Erlend E. Aasland <erlend.aasland@protonmail.com>
* main:pythongh-104276: Make `_struct.unpack_iterator` type use type flag instead of custom constructor (python#104277)pythongh-97696: Move around and update the whatsnew entry for asyncio eager task factory (python#104298)pythongh-103193: Fix refleaks in `test_inspect` and `test_typing` (python#104320)  require-pr-label.yml: Add missing "permissions:" (python#104309)pythongh-90656: Add platform triplets for 64-bit LoongArch (LA64) (python#30939)pythongh-104180: Read SOCKS proxies from macOS System Configuration (python#104181)
@carljmcarljm added the 🔨 test-with-buildbotsTest PR w/ buildbots; report in status section labelMay 9, 2023
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by@carljm for commit95401fe 🤖

If you want to schedule another build, you need to add the🔨 test-with-buildbots label again.

@bedevere-botbedevere-bot removed the 🔨 test-with-buildbotsTest PR w/ buildbots; report in status section labelMay 9, 2023
@carljm
Copy link
MemberAuthor

Thes390x Fedora LTO + PGO PR ands390x Fedora LTO PR buildbots look generally unhealthy. The latter has a full disk.

AMD64 Debian root PR is weird. The failure looks a bit flaky (segfault intest_threads_join) and unlikely to be related, but that failure isn't showing up on any of its other builds. Requested a re-build to see if it repros.

@carljm
Copy link
MemberAuthor

The buildbot worker hosting the variouss390x Fedora ... builds appears to be out of disk space and unhealthy. So I'm satisfied that there aren't any buildbot issues caused by this PR.

Merging!

JelleZijlstra, itamaro, AlexWaygood, erlend-aasland, and Eclips4 reacted with hooray emoji

@carljmcarljm merged commitc3b595e intopython:mainMay 9, 2023
@carljmcarljm deleted the inlinecomp2 branchMay 9, 2023 17:02
carljm added a commit to carljm/cpython that referenced this pull requestMay 9, 2023
* main:pythongh-97696 Add documentation for get_coro() behavior with eager tasks (python#104304)pythongh-97933: (PEP 709) inline list/dict/set comprehensions (python#101441)pythongh-99889: Fix directory traversal security flaw in uu.decode() (python#104096)pythongh-104184: fix building --with-pydebug --enable-pystats (python#104217)pythongh-104139: Add itms-services to uses_netloc urllib.parse. (python#104312)pythongh-104240: return code unit metadata from codegen (python#104300)
carljm added a commit to carljm/cpython that referenced this pull requestMay 9, 2023
* main: (156 commits)pythongh-97696 Add documentation for get_coro() behavior with eager tasks (python#104304)pythongh-97933: (PEP 709) inline list/dict/set comprehensions (python#101441)pythongh-99889: Fix directory traversal security flaw in uu.decode() (python#104096)pythongh-104184: fix building --with-pydebug --enable-pystats (python#104217)pythongh-104139: Add itms-services to uses_netloc urllib.parse. (python#104312)pythongh-104240: return code unit metadata from codegen (python#104300)pythongh-104276: Make `_struct.unpack_iterator` type use type flag instead of custom constructor (python#104277)pythongh-97696: Move around and update the whatsnew entry for asyncio eager task factory (python#104298)pythongh-103193: Fix refleaks in `test_inspect` and `test_typing` (python#104320)  require-pr-label.yml: Add missing "permissions:" (python#104309)pythongh-90656: Add platform triplets for 64-bit LoongArch (LA64) (python#30939)pythongh-104180: Read SOCKS proxies from macOS System Configuration (python#104181)pythongh-97696 Remove unnecessary check for eager_start kwarg (python#104188)pythonGH-104308: socket.getnameinfo should release the GIL (python#104307)pythongh-104310: Add importlib.util.allowing_all_extensions() (pythongh-104311)pythongh-99113: A Per-Interpreter GIL! (pythongh-104210)pythonGH-104284: Fix documentation gettext build (python#104296)pythongh-89550: Buffer GzipFile.write to reduce execution time by ~15% (python#101251)pythongh-104223: Fix issues with inheriting from buffer classes (python#104227)pythongh-99108: fix typo in Modules/Setup (python#104293)  ...
carljm added a commit to carljm/cpython that referenced this pull requestMay 9, 2023
* main: (35 commits)pythongh-97696 Add documentation for get_coro() behavior with eager tasks (python#104304)pythongh-97933: (PEP 709) inline list/dict/set comprehensions (python#101441)pythongh-99889: Fix directory traversal security flaw in uu.decode() (python#104096)pythongh-104184: fix building --with-pydebug --enable-pystats (python#104217)pythongh-104139: Add itms-services to uses_netloc urllib.parse. (python#104312)pythongh-104240: return code unit metadata from codegen (python#104300)pythongh-104276: Make `_struct.unpack_iterator` type use type flag instead of custom constructor (python#104277)pythongh-97696: Move around and update the whatsnew entry for asyncio eager task factory (python#104298)pythongh-103193: Fix refleaks in `test_inspect` and `test_typing` (python#104320)  require-pr-label.yml: Add missing "permissions:" (python#104309)pythongh-90656: Add platform triplets for 64-bit LoongArch (LA64) (python#30939)pythongh-104180: Read SOCKS proxies from macOS System Configuration (python#104181)pythongh-97696 Remove unnecessary check for eager_start kwarg (python#104188)pythonGH-104308: socket.getnameinfo should release the GIL (python#104307)pythongh-104310: Add importlib.util.allowing_all_extensions() (pythongh-104311)pythongh-99113: A Per-Interpreter GIL! (pythongh-104210)pythonGH-104284: Fix documentation gettext build (python#104296)pythongh-89550: Buffer GzipFile.write to reduce execution time by ~15% (python#101251)pythongh-104223: Fix issues with inheriting from buffer classes (python#104227)pythongh-99108: fix typo in Modules/Setup (python#104293)  ...
Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment
Reviewers

@erlend-aaslanderlend-aaslanderlend-aasland left review comments

@AlexWaygoodAlexWaygoodAlexWaygood left review comments

@JelleZijlstraJelleZijlstraJelleZijlstra approved these changes

@iritkatrieliritkatrieliritkatriel approved these changes

@markshannonmarkshannonAwaiting requested review from markshannonmarkshannon is a code owner

Assignees
No one assigned
Labels
performancePerformance or resource usage
Projects
None yet
Milestone
No milestone
Development

Successfully merging this pull request may close these issues.

Inline dict/list/set comprehensions in the compiler for better performance
9 participants
@carljm@markshannon@bedevere-bot@brianm78@ericsnowcurrently@JelleZijlstra@iritkatriel@erlend-aasland@AlexWaygood

[8]ページ先頭

©2009-2025 Movatter.jp