
GH-118036: Fix a bug with CALL_STAT_INC #117933


Merged

gvanrossum merged 5 commits into python:main from gvanrossum:call-stat-inc

Apr 18, 2024


Conversation


Member

gvanrossum commented Apr 16, 2024

We were under-counting calls in `_PyEvalFramePushAndInit` because the `CALL_STAT_INC` macro was redefined to a no-op for the Tier 2 interpreter. The fix is a little convoluted (I had wanted to move the code around, but that would require moving something else around, and in the end I figured it was easier to tweak the macros; @markshannon might disagree, though?). This ought to result in ~37% more "Frames pushed" reported under "Call stats". The new count is the correct one (I presume).

@mdboom can you review? This is one commit from my experiment about removing Tier 2 entirely (gh-117908).

To see the effect, look at these pystat diffs.

Issue: Call stats are incorrect for tier 2 and maybe for tier 1 as well #118036
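The under-counting described above can be reproduced in miniature. The following is only a sketch under stated assumptions: the counter `frames_pushed` and the functions `tier1_call`, `push_frame_and_init`, and `count_two_calls` are hypothetical stand-ins, not CPython's real stats structures or code. It shows how a macro that is redefined to a no-op partway through a file silently disables counting in every function defined after that point.

```c
/* Hypothetical stand-in for a pystats counter (not CPython's real struct). */
static int frames_pushed = 0;

/* The stats macro as first defined: it really increments the counter. */
#define CALL_STAT_INC(counter) ((counter)++)

/* Defined while the counting version of the macro is active. */
static void tier1_call(void) { CALL_STAT_INC(frames_pushed); }

/* Later in the same file the macro is turned into a no-op
 * (assumed here to mirror what was done for the Tier 2 interpreter). */
#undef CALL_STAT_INC
#define CALL_STAT_INC(counter) ((void)0)

/* Defined after the redefinition, so this expands to a no-op. */
static void push_frame_and_init(void) { CALL_STAT_INC(frames_pushed); }

/* Makes two calls and reports how many of them were counted. */
int count_two_calls(void) {
    frames_pushed = 0;
    tier1_call();
    push_frame_and_init();
    return frames_pushed;
}
```

Calling `count_two_calls()` returns 1 rather than 2, because the preprocessor expands each use of the macro with whatever definition is in effect at that point in the file.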

Fix a bug with CALL_STAT_INC

b0ac767

We were under-counting calls in `_PyEvalFramePushAndInit` because the `CALL_STAT_INC` macro was redefined to a no-op for the Tier 2 interpreter. The fix is a little convoluted. This ought to result in ~37% more "Frames pushed" reported under "Call stats". The new count is the correct one (I presume).

gvanrossum added skip issue skip news labels

Apr 16, 2024

gvanrossum requested a review from mdboom

April 16, 2024 15:30

gvanrossum assigned mdboom

Apr 16, 2024

gvanrossum requested a review from markshannon as a code owner

April 16, 2024 15:30

bedevere-appbot added the awaiting core review label

Apr 16, 2024

mdboom reviewed

Apr 16, 2024


Contributor

mdboom left a comment

I understand how this is broken and why this change fixes it, so I'm marking as approved, but I agree it's "weird" / more convoluted than it needs to be. Changing the order of functions in `ceval.c` would result in less weirdness (probably just moving `_PyEval_EvalFrameDefault` to the bottom, since that's where all the macro updates happen), but that would create a lot of churn.

Alternatively, what if you rename `REAL_CALL_STAT_INC` to `CALL_STAT_INC_ALWAYS` and then just use `CALL_STAT_INC_ALWAYS` directly from `_PyEvalFramePushAndInit` (which I think is the only call site impacted by this change)? That would get rid of the weird dance of `#undef` and restoring just `CALL_STAT_INC` (but admittedly that would just replace it with another subtlety in `_PyEvalFramePushAndInit`).
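A minimal sketch of the alternative suggested above, assuming a hypothetical counter and hypothetical function names (only `CALL_STAT_INC_ALWAYS` and `CALL_STAT_INC` are names taken from the review comment). The idea is that a second macro, which is never redefined, keeps counting even after `CALL_STAT_INC` has been turned into a no-op.

```c
/* Hypothetical stand-in for a pystats counter (not CPython's real struct). */
static int frames_pushed = 0;

/* Never redefined anywhere, so it counts regardless of file position. */
#define CALL_STAT_INC_ALWAYS(counter) ((counter)++)

/* The regular macro starts out as an alias for the always-counting one. */
#define CALL_STAT_INC(counter) CALL_STAT_INC_ALWAYS(counter)

/* Later the regular macro is disabled (assumed to mirror the Tier 2 setup). */
#undef CALL_STAT_INC
#define CALL_STAT_INC(counter) ((void)0)

/* This call site opts in to the macro that is never disabled. */
static void push_frame_and_init(void) { CALL_STAT_INC_ALWAYS(frames_pushed); }

/* A late call site that still uses the regular macro is not counted. */
static void other_late_function(void) { CALL_STAT_INC(frames_pushed); }

/* Calls both functions once and reports how many calls were counted. */
int count_with_always_macro(void) {
    frames_pushed = 0;
    push_frame_and_init();
    other_late_function();
    return frames_pushed;
}
```

Here `count_with_always_macro()` returns 1: the call through `CALL_STAT_INC_ALWAYS` is counted, while the one through the disabled `CALL_STAT_INC` is not. That is the "other subtlety" the comment mentions, since a reader has to know why one call site uses a different macro.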

mdboom approved these changes

Apr 16, 2024


gvanrossum added 2 commits

April 16, 2024 14:22

Revert "Fix a bug with CALL_STAT_INC"

779f7f4

Let's try a different fix instead.

Move _PyEval_EvalFrameDefault to the end of the ceval.c

34584c3

This should fix the issue with CALL_STAT_INC in a cleaner way (even if the diff is much larger).

Member Author

gvanrossum commented Apr 16, 2024

Here's another version, where I moved the interpreter loop (and a small assortment of related stuff) to the end of the file.

I ran Pystats on a single loop of the Richards benchmark, and found that most stats are approximately (though not exactly) the same, except that "Frames pushed" is about 10% larger, which indicates that the fix works. (The diff is much more annoying to review, but I promise I just moved the stuff from the original lines 606 - 1120 to the bottom of the file, except I had to move `#include "ceval_macros"` to nearly the top, but after `LLTRACE` is defined.)

Member Author

gvanrossum commented Apr 17, 2024

Benchmark says speed and memory unchanged.

Member

markshannon commented Apr 17, 2024

I think the correct fix for `CALL_STAT_INC` is to not `#undef` it at all.
It doesn't matter whether it happens in tier 1 or tier 2. A call is a call.

In `executor_cases.c.h`, `CALL_STAT_INC` occurs only once, in `_PUSH_FRAME`, and that should be counted.
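As a rough illustration of this suggestion (a sketch only; the counter and all three functions are hypothetical stand-ins, not the real interpreter code): if the macro is defined once and never undefined, every call site counts, whichever tier it belongs to and wherever it sits in the file.

```c
/* Hypothetical stand-in for a pystats counter (not CPython's real struct). */
static int frames_pushed = 0;

/* Defined once and never #undef'd or redefined. */
#define CALL_STAT_INC(counter) ((counter)++)

/* Stand-in for a Tier 1 call site. */
static void tier1_call(void) { CALL_STAT_INC(frames_pushed); }

/* Stand-in for the single Tier 2 use mentioned above (in _PUSH_FRAME). */
static void tier2_push_frame(void) { CALL_STAT_INC(frames_pushed); }

/* Stand-in for a helper defined later in the file. */
static void push_frame_and_init(void) { CALL_STAT_INC(frames_pushed); }

/* Calls each function once and reports how many calls were counted. */
int count_three_calls(void) {
    frames_pushed = 0;
    tier1_call();
    tier2_push_frame();
    push_frame_and_init();
    return frames_pushed;
}
```

With a single definition, `count_three_calls()` returns 3, so no call site depends on where it appears relative to a redefinition.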

Member Author

gvanrossum commented Apr 17, 2024

> I think the correct fix for `CALL_STAT_INC` is to not `#undef` it at all.

Okay, I'll try that next.

gvanrossum added 2 commits

April 17, 2024 10:17

Revert "Move _PyEval_EvalFrameDefault to the end of the ceval.c"

681ca31

I'm going to try yet another approach.

Just don't redefine CALL_STAT_INC

5e20f0f

Member Author

gvanrossum commented Apr 17, 2024

The proof will be in the pudding. I'll fire off two benchmark runs, with pystats, one plain, one with Tier 2. (The JIT pystats ought to be similar but I don't want to wait.)

Member Author

gvanrossum commented Apr 17, 2024

Benchmark using Tier 1 only shows 36.8% more frames pushed, which is very close to what I measured with the first version of this PR, so I think that suggests this fixes that issue. Everything else I looked at is basically unchanged, suggesting I'm not breaking anything.

Still waiting for the Tier 2 benchmark, will update when I see those numbers.

Member Author

gvanrossum commented Apr 17, 2024

Benchmark with Tier 2 poses a bit of a mystery, at least the pystats diff.

Go to Call stats and open the details box.

Frames pushed is 40% higher
Calls to Python functions inlined is 30% higher (it wasn't in the Tier 1 pystats diff)

@markshannon, @mdboom, @brandtbucher: could there be some kind of double counting going on? Or is this an expected result? The only change from main now is that we don't undefine `CALL_STAT_INC`, which means that where that macro is called from Tier 2 it actually updates the count.