Uh oh!
There was an error while loading.Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork33.7k
GH-118095: Use broader specializations in tier 1, for better tier 2 support of calls.#118322
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Uh oh!
There was an error while loading.Please reload this page.
Conversation
c6c1f31 todce9b23Comparebrandtbucher commentedMay 3, 2024
Finally found the bug (took meway too long). This fixes it: diff --git a/Python/bytecodes.c b/Python/bytecodes.cindex c54763ce719..689bb9cbb57 100644--- a/Python/bytecodes.c+++ b/Python/bytecodes.c@@ -3171,15 +3171,10 @@ dummy_func( frame->return_offset = (uint16_t)(1 + INLINE_CACHE_ENTRIES_CALL); }- op(_CHECK_IS_FUNCTION, (callable, unused, unused[oparg] -- callable, unused, unused[oparg])) {- DEOPT_IF(!PyFunction_Check(callable));- }- macro(CALL_PY_GENERAL) = unused/1 + // Skip over the counter- unused/2 + _CHECK_PEP_523 +- _CHECK_IS_FUNCTION ++ _CHECK_FUNCTION + _CALL_PY_GENERAL + _PUSH_FRAME; |
markshannon commentedMay 3, 2024 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
I created an issue for this:#118540, but I'll do the simpler fix you suggest for 3.13. |
markshannon commentedMay 3, 2024
Stats look good. ~13% reduction in tier 1 instructions executed and a ~5% increase in tier 2 uops executed with an increase of uops/trace from 26.3 to 28.5 and 3% reduction in the number of traces executed. |
markshannon commentedMay 3, 2024
Tier 1 performance is neutral (1.00x faster) |
Uh oh!
There was an error while loading.Please reload this page.
Python/bytecodes.c Outdated
| if (new_frame==NULL) { | ||
| ERROR_NO_POP(); | ||
| } | ||
| frame->return_offset= (uint16_t)(1+INLINE_CACHE_ENTRIES_CALL); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Could we use_SAVE_RETURN_OFFSET for this bit?
| op(_CHECK_IS_NOT_PY_CALLABLE, (callable,unused,unused[oparg]--callable,unused,unused[oparg])) { | ||
| DEOPT_IF(PyFunction_Check(callable)); | ||
| DEOPT_IF(Py_TYPE(callable)==&PyMethod_Type); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Maybe these are better as exits so we can handle more cases? If we trace through this, it'd be a shame if we couldn't then stitch a function or method into the call site if its polymorphic.
| op(_CHECK_IS_NOT_PY_CALLABLE, (callable,unused,unused[oparg]--callable,unused,unused[oparg])) { | |
| DEOPT_IF(PyFunction_Check(callable)); | |
| DEOPT_IF(Py_TYPE(callable)==&PyMethod_Type); | |
| } | |
| op(_CHECK_IS_NOT_PY_CALLABLE, (callable,unused,unused[oparg]--callable,unused,unused[oparg])) { | |
| EXIT_IF(PyFunction_Check(callable)); | |
| EXIT_IF(Py_TYPE(callable)==&PyMethod_Type); | |
| } |
| _SAVE_RETURN_OFFSET+ | ||
| _PUSH_FRAME; | ||
| inst(CALL_PY_WITH_DEFAULTS, (unused/1,func_version/2,callable,self_or_null,args[oparg]--unused)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Just curious, this isn't worth specializing anymore? I forget exactly why it can't be converted to tier two (I'm guessing that there's more than one reason, since the use ofthis_instr can be easily worked around).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Two reasons:
- Because this loops over both arguments and defaults, it probably isn't much faster than
CALL_PY_GENERALfor tier 1. It is almost certainly slower in the JIT because of its size.CALL_PY_GENERALcalls a help function, so is more compact. - We expect that, for 3.14, the tier 2 optimizer will convert them both to the same optimal sequence of operations.
| #ifdefLLTRACE | ||
| staticvoid | ||
| dump_stack(_PyInterpreterFrame*frame,PyObject**stack_pointer) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Are the changes in this function intentional, or left over from debugging?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Left over from debugging, I needed them asdump_stack was too slow for a realistic number (many millions) of instructions for finding bugs.
It might be worth making this change, but that's for another PR.
| gotogeneric; | ||
| } | ||
| if (tp->tp_new==PyBaseObject_Type.tp_new) { | ||
| PyFunctionObject*init=get_init_for_simple_managed_python_class(tp); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
GH won't let me comment outside the diff, but below this line there's a failure path if we're out of type versions. We could go generic in that case, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
We could, but I don't want to hide any errors.
| intargcount=-1; | ||
| if (kind==SPEC_FAIL_CODE_NOT_OPTIMIZED) { | ||
| SPECIALIZATION_FAIL(CALL,SPEC_FAIL_CODE_NOT_OPTIMIZED); | ||
| return-1; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
CALL_PY_GENERAL can handle this, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Yes, but I don't want to hide these cases, otherwise we will never know if they are a problem.
We could record stats for failed specialization that have a fall back, like this case, but I think that's a bit too intrusive for 3.13.
| intversion=_PyFunction_GetVersionForCurrentState(func); | ||
| if (version==0) { | ||
| SPECIALIZATION_FAIL(CALL,SPEC_FAIL_OUT_OF_VERSIONS); | ||
| return-1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Once we fix the version checks in tier two, we could handle this too, right? Since we don't care about function version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Yes.
| } | ||
| SPECIALIZATION_FAIL(CALL,tp==&PyUnicode_Type ? | ||
| SPEC_FAIL_CALL_STR :SPEC_FAIL_CALL_CLASS_NO_VECTORCALL); | ||
| return-1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Can we handle this withCALL_NON_PY_GENERAL?
markshannonMay 4, 2024 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Yes, but inspecialize_class_call.
It keeps the logic cleaner if we don't make assumptions about howspecialize_class_call could fail here.
| void | ||
| _Py_Specialize_Call(PyObject*callable,_Py_CODEUNIT*instr,intnargs) |
brandtbucherMay 3, 2024 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
This is really cool. Am I correct that we are now (in theory) able to specialize all calls, except for those where:
- We're out of versions somewhere.
- The call itself is invalid.
And even thesecan be specialized, even if it doesn't make a whole lot of sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Yes, more or less. There might be a few other cases, but they are all very rare.
| first_valid_check_stack=corresponding_check_stack; | ||
| } | ||
| else { | ||
| elseif (corresponding_check_stack){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
corresponding_check_stack may be NULL forPY_CALL_GENERAL.
…etter tier 2 support of calls. (pythonGH-118322)* Add CALL_PY_GENERAL, CALL_BOUND_METHOD_GENERAL and call CALL_NON_PY_GENERAL specializations.* Remove CALL_PY_WITH_DEFAULTS specialization* Use CALL_NON_PY_GENERAL in more cases when otherwise failing to specialize
Co-Authored-By: Ken Jin <kenjin@python.org>Signed-off-by: Manjusaka <me@manjusaka.me>
Co-Authored-By: Ken Jin <kenjin@python.org>Signed-off-by: Manjusaka <me@manjusaka.me>
Co-Authored-By: Ken Jin <kenjin@python.org>Signed-off-by: Manjusaka <me@manjusaka.me>
… introduced inpythongh-118322Co-Authored-By: Ken Jin <kenjin@python.org>Signed-off-by: Manjusaka <me@manjusaka.me>
… introduced inpythongh-118322Co-Authored-By: Ken Jin <kenjin@python.org>Signed-off-by: Manjusaka <me@manjusaka.me>
…duced inpythongh-118322 (pythonGH-120712)Co-authored-by: Ken Jin <kenjin4096@gmail.com>
…duced inpythongh-118322 (pythonGH-120712)Co-authored-by: Ken Jin <kenjin4096@gmail.com>
…duced inpythongh-118322 (pythonGH-120712)Co-authored-by: Ken Jin <kenjin4096@gmail.com>
Uh oh!
There was an error while loading.Please reload this page.
There is a bug in the tier 2 code, and this needs to be benchmarked.Bug is fixed. See below for benchmarking and stats