gh-87729: add LOAD_SUPER_ATTR instruction for faster super() #103497


Merged
carljm merged 23 commits into python:main from carljm:superopt
Apr 24, 2023

@carljm (Member) commented Apr 13, 2023 (edited)

This PR speeds up super() (by around 85%, for a simple one-level super().meth() microbenchmark) by avoiding allocation of a new single-use super() object on each use.

Microbenchmark results

With this PR:

    ➜ ./python -m pyperf timeit -s 'from superbench import b' 'b.meth()'
    .....................
    Mean +- std dev: 70.4 ns +- 1.4 ns

Without this PR:

    ➜ ./python -m pyperf timeit -s 'from superbench import b' 'b.meth()'
    .....................
    Mean +- std dev: 130 ns +- 1 ns

Microbenchmark code

    ➜ cat superbench.py
    class A:
        def meth(self):
            return 1

    class B(A):
        def meth(self):
            return super().meth()

    b = B()

Microbenchmark numbers are the same (both pre and post) if the microbenchmark is switched to use return super(B, self).meth() instead.

super() is already special-cased in the compiler to ensure the presence of the __class__ cell needed by zero-argument super(). This extends that special-casing a bit in order to compile super().meth() as

              4 LOAD_GLOBAL              0 (super)
             14 LOAD_DEREF               1 (__class__)
             16 LOAD_FAST                0 (self)
             18 LOAD_SUPER_ATTR          5 (NULL|self + meth)
             20 CALL                     0

instead of the current:

              4 LOAD_GLOBAL              1 (NULL + super)
             14 CALL                     0
             22 LOAD_ATTR                3 (NULL|self + meth)
             42 CALL                     0

Bytecode comparison for simple attribute

And compile super().attr as

              4 LOAD_GLOBAL              0 (super)
             14 LOAD_DEREF               1 (__class__)
             16 LOAD_FAST                0 (self)
             18 LOAD_SUPER_ATTR          4 (attr)

instead of the current:

              4 LOAD_GLOBAL              1 (NULL + super)
             14 CALL                     0
             22 LOAD_ATTR                2 (attr)
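To check which of the two shapes a given interpreter produces, the standard dis module can list the opcodes for a method that uses super(). This is a small sketch, not part of the PR; the classes mirror superbench.py, and the exact offsets and surrounding instructions will differ between CPython versions.

```python
import dis
import sys


class A:
    def meth(self):
        return 1


class B(A):
    def meth(self):
        return super().meth()


# Collect the opcode names the compiler emitted for B.meth.
opnames = [ins.opname for ins in dis.get_instructions(B.meth)]
print(sys.version_info[:2], opnames)

# LOAD_SUPER_ATTR only exists on interpreters that include this change.
uses_super_attr = "LOAD_SUPER_ATTR" in opnames
print("LOAD_SUPER_ATTR emitted:", uses_super_attr)
```

On an interpreter without this change, the list should instead show the LOAD_GLOBAL / CALL / LOAD_ATTR sequence from the "current" listings.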

The new bytecode has one more instruction, but still ends up executing much faster, because it eliminates the cost of allocating a new single-use super object each time. For zero-arg super, it also eliminates dynamically figuring out each time via frame introspection where to find the self argument and __class__ cell, even though the location of both is already known at compile time.
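The compile-time knowledge mentioned here is visible from Python: a method that mentions super gets a __class__ free variable, and self is simply its first local. A minimal illustration (the plain method is a hypothetical addition for contrast):

```python
class A:
    def meth(self):
        return 1


class B(A):
    def meth(self):
        return super().meth()

    def plain(self):
        return 2


# Mentioning `super` makes the compiler give the method a `__class__` cell.
freevars = B.meth.__code__.co_freevars
cell_is_class = B.meth.__closure__[0].cell_contents is B
print(freevars, cell_is_class)

# A method that never mentions `super` or `__class__` gets no such cell.
plain_freevars = B.plain.__code__.co_freevars
plain_closure = B.plain.__closure__
print(plain_freevars, plain_closure)
```

Because the cell and the first argument are ordinary variables of the code object, they can be loaded with LOAD_DEREF and LOAD_FAST instead of being rediscovered from the frame on every call.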

The LOAD_GLOBAL of super remains only in order to support existing semantics in case the name super is re-bound to some other callable besides the built-in super type.
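The semantics being preserved can be seen with a small experiment: if the global name super is bound to something else, a zero-argument super() call in a method must still call that object with no arguments. The sketch below assumes nothing beyond the standard library; FakeSuper, SOURCE and namespace are hypothetical names used only for illustration.

```python
SOURCE = '''
class A:
    def meth(self):
        return "A.meth"

class B(A):
    def meth(self):
        return super().meth()
'''

calls = []


class FakeSuper:
    """Hypothetical stand-in bound to the global name `super`."""

    def __init__(self, *args):
        calls.append(args)  # record how the compiled code called us

    def meth(self):
        return "FakeSuper.meth"


# Run the class definitions with `super` re-bound in their globals.
namespace = {"super": FakeSuper}
exec(SOURCE, namespace)

result = namespace["B"]().meth()
print(result, calls)
```

The replacement is called with zero arguments and its meth() is used, which is the behavior the retained LOAD_GLOBAL makes it possible to keep.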

Besides being faster, the new bytecode is preferable because it regularizes the loading of self and __class__ to use the normal LOAD_FAST and LOAD_DEREF opcodes, instead of custom code in the super object (not part of the interpreter) relying on private details of interpreter frames to load these in a bespoke way. This helps optimizers like the Cinder JIT that fully support LOAD_FAST and LOAD_DEREF but may not maintain frame locals in the same way. It also makes the bytecode more easily amenable to future optimization by a type-specializing tier 2 interpreter, because __class__ and self will now be surfaced and visible to the optimizer in the usual way, rather than hidden inside the super object.

I'll follow up with a specialization of LOAD_SUPER_ATTR for the case where we are looking up a method and a method is found (because this is a common case, and a case where the output of LOAD_SUPER_ATTR depends only on the type of self and not on the actual instance). But to simplify review, I'll do this in a separate PR. I think the benefits of this PR stand alone, even without further benefits of specialization. (ETA: the specialization is now also ready at https://github.com/carljm/cpython/compare/superopt...carljm:cpython:superopt_spec?expand=1 and increases the microbenchmark win from 85% to 2.3x.)

The frame introspection code for runtime/dynamic zero-arg super() still remains, but after this PR it would only ever be used in an odd edge case like super(*args) (if args turns out to be empty at runtime), where we can't detect at compile time whether we will have zero-arg or two-arg super().
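A short sketch of that edge case: the number of arguments to super is only known at run time, so the call goes through the ordinary super constructor, and the new opcode is not expected in the method's bytecode. The class names are illustrative only.

```python
import dis


class A:
    def meth(self):
        return "A.meth"


class B(A):
    def meth(self, *args):
        # The argument count is only known at run time.
        return super(*args).meth()


b = B()
zero_arg = b.meth()      # behaves like super().meth()
two_arg = b.meth(B, b)   # behaves like super(B, b).meth()
print(zero_arg, two_arg)

# The starred call cannot be classified statically as zero- or two-arg.
opnames = {ins.opname for ins in dis.get_instructions(B.meth)}
print("LOAD_SUPER_ATTR" in opnames)
```

Both calls resolve to A.meth; the zero-argument one relies on super finding self and the __class__ cell through the calling frame.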

"Odd" uses of super() (like one-argument super, use of a super object as a descriptor, etc.) are still supported and experience no change; the compiler will not emit the new LOAD_SUPER_ATTR opcode for them.

I chose to make the new opcode more general by using it for both (statically detectable) zero- and two-arg super. Optimizing zero-arg super is more important because it is more common in modern Python code, and because it also eliminates the frame introspection. But supporting two-arg super costs only one extra bit smuggled via the oparg; this seems worth it.
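As a rough illustration of how little room this takes, the sketch below packs and unpacks such an oparg. The layout is an assumption inferred from this thread (the listings show oparg 5 for the method load and 4 for the plain attribute load of the name at index 1, and the review discussion refers to oparg & 2 for the zero- vs two-arg distinction); encode, decode and SuperAttrOparg are hypothetical helper names, not CPython APIs.

```python
from typing import NamedTuple


class SuperAttrOparg(NamedTuple):
    name_index: int    # index of the attribute name in co_names
    two_arg: bool      # assumed bit 1: super(cls, self) rather than super()
    load_method: bool  # assumed bit 0: the attribute is about to be called


def encode(name_index: int, *, two_arg: bool, load_method: bool) -> int:
    """Pack the three fields into one integer (assumed layout)."""
    return (name_index << 2) | (int(two_arg) << 1) | int(load_method)


def decode(oparg: int) -> SuperAttrOparg:
    """Inverse of encode()."""
    return SuperAttrOparg(oparg >> 2, bool(oparg & 2), bool(oparg & 1))


print(decode(5))  # zero-arg method load of name 1
print(decode(4))  # zero-arg attribute load of name 1
print(encode(1, two_arg=True, load_method=True))
```

Under this layout, supporting two-arg super costs only the single flag bit beside the one already used for method loads.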

Real-world results and macrobenchmarks

This approach provides a speed-up of about 0.5% globally on the Instagram server real-world workload (measured recently on Python 3.10). I can work on a macrobenchmark for the pyperformance suite that exercises super() (currently it isn't significantly exercised by any benchmark). (ETA: benchmark is now ready at python/pyperformance#271 -- this diff improves its performance by 10%, the specialization follow-up by another 10%.)

Prior art

This PR is essentially an updated version of #24936 -- thanks to @vladima for the original inspiration for this approach. Notable differences from that PR:

  • I avoid turning the oparg for the new opcode into a const load, preferring to pass the needed bits of information by bit-shifting the oparg instead (following the precedent of LOAD_ATTR).
  • I prioritize code simplicity over performance in edge cases like when a super() attribute access raises AttributeError, which also reduces the footprint of the PR.

#30992 was an attempt to optimize super() solely using the specializing interpreter, but it was never merged because there are too many problems caused by adaptive super-instructions in the tier 1 specializing interpreter.

@carljm changed the title from "gh-87729: add instruction for faster zero-arg super()" to "gh-87729: add LOAD_SUPER_ATTR instruction for faster super()" on Apr 13, 2023
@corona10 (Member)

cc @Fidget-Spinner

@carljm (Member, Author)

https://github.com/carljm/cpython/compare/superopt...carljm:cpython:superopt_spec?expand=1 has a draft of the first specialization of LOAD_SUPER_ATTR built on top of this, specializing for the method case.

With that specialization, the ./python -m pyperf timeit -s 'from superbench import b' 'b.meth()' microbenchmark shown above now runs in 56ns, down from 130ns originally and 70ns without the specialization. That's 2.3x better than the current main-branch speed. For reference, a version of the same benchmark that uses return A.meth(self) in place of return super().meth() runs in 48ns. So we are getting pretty close to zero-cost super method calls.
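For readers without pyperf installed, a rough version of the same comparison can be run with the standard timeit module. This is only a sketch: timeit is noisier than pyperf, absolute numbers depend on the machine and interpreter build, and ViaSuper, ViaDirect and best_ns are hypothetical names.

```python
import timeit


class A:
    def meth(self):
        return 1


class ViaSuper(A):
    def meth(self):
        return super().meth()


class ViaDirect(A):
    def meth(self):
        return A.meth(self)


def best_ns(obj, number=100_000, repeat=5):
    """Best-of-`repeat` time per call to obj.meth(), in nanoseconds."""
    timings = timeit.repeat(obj.meth, number=number, repeat=repeat)
    return min(timings) / number * 1e9


super_ns = best_ns(ViaSuper())
direct_ns = best_ns(ViaDirect())
print(f"super().meth(): {super_ns:.1f} ns, A.meth(self): {direct_ns:.1f} ns")
```

The gap between the two figures is the per-call overhead of super() that this PR and the follow-up specialization aim to shrink.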

(If reviewers would prefer to just have the specialization(s) included directly in this PR and all reviewed together, let me know and I can push everything here.)

carljm and others added 2 commits on April 14, 2023 08:35
@markshannon (Member)

Is the microbenchmark code correct? It doesn't look like you call meth()

@carljm (Member, Author)

> Is the microbenchmark code correct? It doesn't look like you call meth()

The call to b.meth() happens in the actual invocation of pyperf timeit: ./python -m pyperf timeit -s 'from superbench import b' 'b.meth()'

@markshannon (Member) left a comment

There is a fair bit of branching in LOAD_SUPER_ATTR, which suggests that either it needs reworking or splitting up.

I've made a few suggestions as to how it can be made less branchy.
We'll see if that is sufficient.

The compiler code looks OK to me, but I'll leave it to @iritkatriel to review it properly.

@carljm (Member, Author) commented Apr 19, 2023 (edited)

@markshannon

Thanks for the review!

> There is a fair bit of branching in LOAD_SUPER_ATTR, which suggests that either it needs reworking or splitting up.

The causes of branching are these:

  1. Has built-in super been replaced or shadowed? This branching is unavoidable, since it can happen dynamically; we have to check at runtime.
  2. Are we loading a method that will be called?
  3. Is this zero-arg or two-arg super? (We only need to know this if super has been shadowed, so we can reconstruct the right call to whatever it is now.)

(2) and (3) are both known at compile time, so we could split the opcode in two along either axis (i.e. LOAD_SUPER_ATTR vs LOAD_SUPER_METHOD, or LOAD_ZERO_SUPER_ATTR vs LOAD_TWO_SUPER_ATTR). I considered both splits, and decided neither made sense: the second split would result in two separate opcodes that we'll later want to specialize to the same opcode, which is awkward, and the first split loses the parallel to how LOAD_ATTR works. (Both splits would result in code duplication.)

Your suggestion above about how to handle oparg & 2 eliminates the branching for zero-arg vs two-arg super in the shadowing case; hopefully that's enough.
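The branching described above can be modelled in a few lines of Python. This is a hypothetical sketch of the control flow only, not the C implementation: super_lookup is a simplified stand-in for the attribute lookup (plain instances only, no classmethod handling), the method-load bit from cause (2) is ignored because Python has no equivalent of the NULL|self stack convention, and the oparg layout is the one assumed from this thread.

```python
import builtins


def super_lookup(cls, obj, name):
    """Walk the MRO of type(obj) past `cls`; simplified to plain instances."""
    start_type = type(obj)
    mro = start_type.__mro__
    for klass in mro[mro.index(cls) + 1:]:
        namespace = vars(klass)
        if name in namespace:
            attr = namespace[name]
            getter = getattr(type(attr), "__get__", None)
            # Bind descriptors (such as functions) to obj, as super would.
            return attr if getter is None else getter(attr, obj, start_type)
    raise AttributeError(name)


def load_super_attr(global_super, cls, obj, name, oparg):
    """Hypothetical model of the opcode's two paths."""
    # Cause 1: is the global name `super` still the built-in type?
    if global_super is builtins.super and isinstance(cls, type):
        return super_lookup(cls, obj, name)  # no super object allocated
    # Cause 3: rebuild the original call; `oparg & 2` is 0 or 2 arguments.
    shadow = global_super(*(cls, obj)[: oparg & 2])
    return getattr(shadow, name)


class A:
    def meth(self):
        return 1


class B(A):
    def meth(self):
        return 2


class Recorder:
    """Hypothetical replacement for `super` that remembers its arguments."""

    def __init__(self, *args):
        self.args = args

    def meth(self):
        return self.args


b = B()
fast = load_super_attr(super, B, b, "meth", 5)()
shadow_zero = load_super_attr(Recorder, B, b, "meth", 5)()
shadow_two = load_super_attr(Recorder, B, b, "meth", 7)()
print(fast, shadow_zero, shadow_two)
```

Slicing the (cls, obj) pair by oparg & 2 passes either no arguments or both, so the shadowed path needs no separate branch for zero-arg versus two-arg super.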

@carljm (Member, Author)

@markshannon I've now addressed or replied to all comments, if you want to take another look.

@markshannon self-requested a review on April 24, 2023 21:12
@carljm enabled auto-merge (squash) on April 24, 2023 21:27
@carljm merged commit 0dc8b50 into python:main on Apr 24, 2023
@carljm deleted the superopt branch on April 28, 2023 18:34
Reviewers

@iritkatriel left review comments
@TeamSpen210 left review comments
@Fidget-Spinner left review comments
@corona10 approved these changes
@markshannon approved these changes
@brandtbucher: awaiting requested review

7 participants
@carljm @corona10 @bedevere-bot @markshannon @iritkatriel @TeamSpen210 @Fidget-Spinner