gh-116968: Reimplement Tier 2 counters#117144
Conversation
Results of the benchmark are available here: 1% faster geometric mean, 0% faster HPT, 1% less memory. Notably, this strongly improves a lot of the interpreter-heavy benchmarks. https://github.com/faster-cpython/benchmarking-public/tree/main/results/bm-20240321-3.13.0a5+-716c0c6-JIT
Additionally, concerning the results, it's probably safe to say this PR makes things better, but given that #116206 increased the overall benchmark times from 1h15 to 2h30, they are probably noisier than usual.
I missed that. Do we run every benchmark twice for incremental GC? Or did that make CPython twice as slow? Regardless, it seems unfortunate that the benchmarks now take 2h30.

Regarding the benchmark numbers, I'm guessing the improvements come from not wasting so much time on fruitless efforts like in hexiom.

There's still a lot of cleanup to do in this PR. @markshannon What do you think of my general approach?
Nothing changed in how we run the benchmarks; #116206 just seems to be a large regression overall, though more than made up for by the follow-up in #117120. Once that's merged we should hopefully have working pystats and closer-to-baseline timings again.
Exciting!
ericsnowcurrently commented Mar 25, 2024 (edited)
Regarding the WASI failures, "call stack exhausted" means a stack overflow. Our WASI builds have a stack of about 8 MB. From the notes in the build script, that's derived from the stack size on Linux. There are two possibilities with the failures here:
In either case I'd normally expect a new "call stack exhausted" crash to indicate that we were already close to the limit. I'm not sure how well that does or doesn't apply for this PR. That said, given that the stack size on WASI is meant to be similar to Linux, I'd expect a problem on WASI to manifest on Linux too. Perhaps WASI is simply our canary in a coal mine here? The next step is probably to take a look at how close we are getting to the stack size on Linux (since we know we're hitting it on WASI). If we're not getting close then we'll need to see what's so special about WASI here. For similar failures see https://github.com/python/cpython/issues?q=is%3Aissue+%22call+stack+exhausted%22. There were cases where lowering the Python recursion limit was the solution. However, I don't think that applies here.
Also note that 226 tests did pass.
gvanrossum commented Mar 26, 2024 (edited)
This is still in draft mode. Here's my plan:
Once that's all done (EDIT: and the tests pass) I'll request reviews.
This changes a lot of things but the end result is arguably better.

- The initial exit temperature is 64; this must be greater than the specialization cooldown value (52), otherwise we might create a trace before we have re-specialized the Tier 1 bytecode.
- There's now a handy helper function for every counter initialization.
I'm starting 3 benchmarks:
I also have prepared a blurb, but I'll merge it only when I have something else to merge (to save CI resources):

Introduce a unified 16-bit backoff counter type (``_Py_BackoffCounter``), shared between the Tier 1 adaptive specializer and the Tier 2 optimizer. The API used for adaptive specialization counters is changed but the behavior is (supposed to be) identical.

The behavior of the Tier 2 counters is changed:

- There are no longer dynamic thresholds (we never varied these).
- All counters now use the same exponential backoff.
- The counter for ``JUMP_BACKWARD`` starts counting down from 16.
- The ``temperature`` in side exits starts counting down from 64.
@@ -477,13 +473,9 @@ write_location_entry_start(uint8_t *ptr, int code, int length)
#define ADAPTIVE_COOLDOWN_VALUE 52
#define ADAPTIVE_COOLDOWN_BACKOFF 0
#define MAX_BACKOFF_VALUE (16 - ADAPTIVE_BACKOFF_BITS)
static inline uint16_t
adaptive_counter_bits(uint16_t value, uint16_t backoff) {
I could dispense with adaptive_counter_bits(), using make_backoff_counter() directly below, but I would oppose getting rid of the cooldown() and warmup() helpers, because they are used in many places.
@@ -89,7 +89,7 @@ static inline uint16_t uop_get_error_target(const _PyUOpInstruction *inst)
typedef struct _exit_data {
    uint32_t target;
-   int16_t temperature;
+   _Py_BackoffCounter temperature;
temperature is now a bit of a misnomer, since it counts down. Maybe it should be renamed to counter (same as in CODEUNIT)?
I'm fine with either.
I'll keep temperature, despite the misnomer; it stands out and makes it easy to grep for this particular concept.
{
    return counter.value == 0;
}
static inline uint16_t
initial_backoff_counter(void)
/* Initial JUMP_BACKWARD counter.
There's a lot of boilerplate for these initial values; I followed your lead for adaptive_counter_warmup() and adaptive_counter_cooldown(), more or less.
gvanrossum commented Apr 4, 2024 (edited)
Benchmarking results comment (will update as they complete):
Looks good. Thanks for doing this.
A possible further improvement (for another PR) would be to make the code generator aware of counters, e.g. changing

specializing op(_SPECIALIZE_TO_BOOL, (counter/1, value -- value)) {
    if (ADAPTIVE_COUNTER_TRIGGERS(counter)) {
        ...
        ADVANCE_ADAPTIVE_COUNTER(this_instr[1].counter);

to

specializing op(_SPECIALIZE_TO_BOOL, (counter/1, value -- value)) {
    if (backoff_counter_triggers(counter)) {
        ...
        advance_backoff_counter(counter);

by having the code generator generate:

    _Py_BackoffCounter *counter = &this_instr[1].counter;

instead of
bedevere-bot commented Apr 4, 2024
This change appears to have broken building scipy
Confirmed scipy builds with 63bbe77.
Introduce a unified 16-bit backoff counter type (``_Py_BackoffCounter``), shared between the Tier 1 adaptive specializer and the Tier 2 optimizer. The API used for adaptive specialization counters is changed but the behavior is (supposed to be) identical.

The behavior of the Tier 2 counters is changed:

- There are no longer dynamic thresholds (we never varied these).
- All counters now use the same exponential backoff.
- The counter for ``JUMP_BACKWARD`` starts counting down from 16.
- The ``temperature`` in side exits starts counting down from 64.
- Fix a few places where we were not using atomics to (de)instrument opcodes.
- Fix a few places where we weren't using atomics to reset adaptive counters.
- Remove some redundant non-atomic resets of adaptive counters that presumably snuck in as merge artifacts of python#118064 and python#117144 landing close together.
This introduces a unified 16-bit backoff counter type (``_Py_BackoffCounter``), shared between the Tier 1 adaptive specialization machinery and the Tier 2 optimizer. The latter's side exit temperature now uses exponential backoff and starts initially at 64, to avoid creating side exit traces for code that hasn't been re-specialized yet (since re-specialization only happens after the cooldown counter has reached zero from an initial value of 52).
The threshold value for back-edge optimizations is no longer dynamic; we just use a backoff counter initialized to (16, 4).