bpo-46564: Optimize super().meth() calls via adaptive superinstructions #30992


Closed

Conversation

Fidget-Spinner (Member) commented Jan 28, 2022 (edited):

super().meth() calls should now have almost no overhead over a corresponding self.meth() call.

Summary of changes:

  • typeobject.c -- refactoring to reuse code during specialization; also uses InterpreterFrame instead of PyFrameObject for lazy frame benefits. Some changes here are partially taken from bpo-43563: Introduce dedicated opcodes for super calls #24936. All credit to @vladima (I've tried to properly include them in the news item too).
  • specialize.c -- specialize for the 0-argument and 2-argument forms of super().
  • ceval.c -- does both a CALL and a LOAD_METHOD without intermediates (and both are specialized forms too).
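For readers less familiar with the two call shapes that specialize.c targets, here is a minimal Python sketch. The classes A and B and their method names are illustrative and not taken from the PR. The bytecode that dis prints differs between CPython versions, so no expected listing is shown.

```python
import dis


class A:
    def meth(self):
        return "A.meth"


class B(A):
    def zero_arg(self):
        # 0-argument form: super() finds __class__ and the first
        # argument from the calling frame.
        return super().meth()

    def two_arg(self):
        # 2-argument form: the class and the instance are passed explicitly.
        return super(B, self).meth()

    def plain(self):
        # The ordinary call used as the baseline for overhead.
        return self.meth()


b = B()
results = (b.zero_arg(), b.two_arg(), b.plain())
print(results)

# Inspect how the running interpreter compiles the 0-argument form.
dis.dis(B.zero_arg)
```

All three methods return the same value here; the difference the PR cares about is only in how much work the interpreter does to get there.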

TODO:
benchmarks!

https://bugs.python.org/issue46564

Fidget-Spinner changed the title from "bpo-46564: Optimize super().meth() calls" to "bpo-46564: Optimize super().meth() calls via adaptive superinstructions" on Jan 28, 2022
markshannon self-assigned this on Jan 28, 2022
Fidget-Spinner marked this pull request as draft on January 28, 2022, 18:14

Fidget-Spinner (Member, Author) commented:

Marking as draft as I need to make this work with the new CALL convention.


DEOPT_IF(_PyType_CAST(super_callable) != &PySuper_Type, CALL);
/* super() - zero argument form */
if (_PySuper_GetTypeArgs(frame, frame->f_code, &su_type, &su_obj) < 0) {
Review comment (Member):

Can't we do this at specialization time? The number of locals, the index of "self", and whether it is a cell are all known then. Likewise, the nature of __class__ is also known.
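The facts listed in this comment are visible on the code object from Python, which gives a feel for what is already fixed once a function is compiled. This is only an illustrative sketch using public attributes (co_varnames, co_freevars, co_cellvars, co_nlocals); the class names are made up, and it says nothing about how specialize.c would actually record this information.

```python
class A:
    def f(self):
        return "A.f"


class B(A):
    def g(self):
        return super().f()

    def captures_self(self):
        # The lambda closes over self, so self becomes a cell variable here.
        get_self = lambda: self
        return super().f()


code = B.g.__code__
first_arg = code.co_varnames[0]                    # name of the first positional argument
has_class_cell = "__class__" in code.co_freevars   # zero-argument super() relies on this free variable
self_is_cell = first_arg in code.co_cellvars       # False for g, since nothing closes over self

print(first_arg, has_class_cell, self_is_cell, code.co_nlocals)

# The __class__ cell is filled in once the class object has been created.
class_cell = B.g.__closure__[code.co_freevars.index("__class__")]
print(class_cell.cell_contents is B)

# In captures_self, self is a cell because the lambda refers to it.
print("self" in B.captures_self.__code__.co_cellvars)
```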

}
assert(su_obj != NULL);
DEOPT_IF(lm_adaptive->version != Py_TYPE(su_obj)->tp_version_tag, CALL);
DEOPT_IF(cache0->version != su_type->tp_version_tag, CALL);
Review comment (Member):

When can this fail?
Isn't the next item in the MRO determined solely by type(self) and __class__, both of which are known at this point?
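The point about the MRO can be sketched in pure Python: given type(self) and __class__, the class where the lookup resumes is found by position in the MRO. Note that next_in_mro is a hypothetical helper, not a CPython function, and it ignores what super does after finding that position (walking the remaining MRO entries for the attribute).

```python
def next_in_mro(instance_type, current_class):
    """Return the class that follows current_class in instance_type's MRO."""
    mro = instance_type.__mro__
    return mro[mro.index(current_class) + 1]


class A:
    def f(self):
        return "A"


class B(A):
    def f(self):
        return "B>" + super().f()


class C(A):
    def f(self):
        return "C>" + super().f()


class D(B, C):
    pass


# For the same __class__ (B), the successor depends on type(self).
print(next_in_mro(D, B).__name__)
print(next_in_mro(B, B).__name__)
print(D().f(), B().f())
```

With a D instance, super() inside B.f continues at C; with a B instance it continues at A. That is why both inputs are needed, and why nothing else is needed as long as the MRO itself stays the same.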

Fidget-Spinner (Member, Author) replied Jan 29, 2022 (edited):

I wanted assurance that __class__ didn't change. Then again, I'm not sure if it can?
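One Python-level situation in which the same type(self) and the same __class__ resolve to a different method is a runtime change to a class's bases. The sketch below only demonstrates that behaviour; the thread does not say whether this is the case the version check was meant to cover, and the class names are made up.

```python
class A:
    def f(self):
        return "A.f"


class Other:
    def f(self):
        return "Other.f"


class B(A):
    def g(self):
        return super().f()


b = B()
before = b.g()

# Rebinding __bases__ recomputes B's MRO. type(b) and the __class__ cell
# in g are still B, but the class that follows B in the MRO is now Other.
B.__bases__ = (Other,)
after = b.g()

print(before, after)
```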

markshannon (Member) commented:

Maybe we should merge #31002 first, as that PR is simpler.
It would also allow us to compare the performance of just the specialization, without the removal of frame object allocation.

Fidget-Spinner (Member, Author) commented:

> Maybe we should merge #31002 first, as that PR is simpler. It would also allow us to compare the performance of just the specialization, without the removal of frame object allocation.

👍

Fidget-Spinner (Member, Author) commented:

Mark, I'm going to run benchmarks on deltablue first, since it uses the 2-argument form of super. I'll address your optimization ideas for the 0-arg form once those results come back. Fingers crossed.

arhadthedev (Member) left a review comment:

A couple of indentation-related inconsistencies:

static int
super_init_without_args(InterpreterFrame *cframe, PyCodeObject *co,
int
_PySuper_GetTypeArgs(InterpreterFrame *cframe, PyCodeObject *co,
PyTypeObject **type_p, PyObject **obj_p)
Review comment (Member):

Suggested change
PyTypeObject **type_p, PyObject **obj_p)
PyTypeObject **type_p, PyObject **obj_p)

The line was aligned with an opening parenthesis of a parameter list.

Comment on lines +278 to +279
PyObject *kwnames, SpecializedCacheEntry *cache, PyObject *builtins,
PyObject **stack_pointer, InterpreterFrame *frame, PyObject *names);
Review comment (Member):

Suggested change
PyObject *kwnames, SpecializedCacheEntry *cache, PyObject *builtins,
PyObject **stack_pointer, InterpreterFrame *frame, PyObject *names);
PyObject *kwnames, SpecializedCacheEntry *cache, PyObject *builtins,
PyObject **stack_pointer, InterpreterFrame *frame, PyObject *names);

as in a removed line, or even:

Suggested change
PyObject *kwnames, SpecializedCacheEntry *cache, PyObject *builtins,
PyObject **stack_pointer, InterpreterFrame *frame, PyObject *names);
PyObject *kwnames, SpecializedCacheEntry *cache, PyObject *builtins,
PyObject **stack_pointer, InterpreterFrame *frame, PyObject *names);

as in _Py_Specialize_BinaryOp right below.

Fidget-Spinner (Member, Author) commented:

> Mark, I'm going to run benchmarks on deltablue first since it uses 2-argument form super. I'll address your optimization ideas for the 0-arg form once those results come back. Fingers crossed.

Well, that was depressing. deltablue only shows a 1.03x speedup. Looking closer at the code, super isn't called in any tight loops, so that might be why. Maybe I need to pull out microbenchmarks now.

Fidget-Spinner (Member, Author) commented Feb 1, 2022 (edited):

Microbenchmarks show that super() has sped up by more than 2.2x. This is faster than that other attempt because there are also speedups from LOAD_METHOD_CACHED:
(Extremely unscientific; I'm short on time to set up pyperf right now.)

import timeit

setup = """
class A:
    def f(self): pass

class B(A):
    def g(self): super().f()
    def h(self): self.f()

b = B()
"""

# super() call
print(timeit.timeit("b.g()", setup=setup, number=20_000_000))
# reference
print(timeit.timeit("b.h()", setup=setup, number=20_000_000))

Results:

# Main
5.796037399995839
2.4094066999969073
# This branch
2.4578273000006448
2.3718886000060593

So super().meth() is now only ~4% slower than the corresponding self.meth() call, whereas it was about 2.4x as slow previously. If I manage to incorporate your suggestions correctly, this will effectively just be a competition between LOAD_GLOBAL_BUILTIN (super) and LOAD_FAST (self).
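The ratios behind this summary can be recomputed from the four timings reported in the results. The snippet below is plain arithmetic on those numbers; the variable names are illustrative, and a single unscientific run like this one carries noise that the ratios do not capture.

```python
# Timings in seconds, copied from the results listed in this comment.
main_super, main_self = 5.796037399995839, 2.4094066999969073
branch_super, branch_self = 2.4578273000006448, 2.3718886000060593

speedup = main_super / branch_super          # how much faster super().f() became
overhead_before = main_super / main_self     # super().f() relative to self.f() on main
overhead_after = branch_super / branch_self  # the same ratio on this branch

print(f"super() speedup on this branch: {speedup:.2f}x")
print(f"super() vs self on main:        {overhead_before:.2f}x")
print(f"super() vs self on this branch: {overhead_after:.2f}x")
```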

Reviewers: markshannon (left review comments), arhadthedev (requested changes)
Assignees: markshannon
Labels: awaiting core review, performance (Performance or resource usage)
Projects: None yet
Milestone: No milestone
Participants (7): @Fidget-Spinner, @markshannon, @arhadthedev, @iritkatriel, @the-knights-who-say-ni, @ezio-melotti, @bedevere-bot
