Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

GH-98831: Implement super-instruction generation#99084

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Merged
gvanrossum merged 6 commits intopython:mainfromgvanrossum:super-instrs
Nov 6, 2022

Conversation

@gvanrossum
Copy link
Member

@gvanrossumgvanrossum commentedNov 4, 2022
edited by bedevere-bot
Loading

@markshannon Please see "PLEASE NOTE" below.

This is pretty straightforward: the expansion of

super(FOO__BAR) = FOO + BAR;

generates code like this:

TARGET(FOO__BAR) {    {        <code block for FOO>    }    NEXTOPARG();    next_instr++;    {        <code block for BAR>    }    DISPATCH();}

The only tricky thing is that the code block forLOAD_CONST contains aPREDICTED() macro call, so we strip ththoseat out. (Eventually we'll auto-generate thePREDICTED() macros so we won't need to do this any more, but super-instruction generation is a simpler project than auto-PREDICTED insertion, so I figured I'd tackle it first.)

The new super-instructions are all placed at the end (the parser doesn't keep track of how instruction definitions are interspersed), so the diff for generated_cases.c.h looks perhaps a bit more imposing than it is.

PLEASE NOTE: The code for some of the generated super-instructions isdifferent than the hand-written super-instructions because somehow the hand-written versions moved theNEXTOPARG(); next_instr++; pair up a few lines. I don't know why the originals were written that way --@markshannon do you remember? Do you think it matters? (Another difference is that the parts each define their own variables, whereas the hand-written versions reusedPyObject *value, but I assume the optimizer will handle that.)

@gvanrossum
Copy link
MemberAuthor

@CAM-Gerlach Can you double check the markup in the NEWS entry? What's the recommended way to refer to a markdown file in a different repo owned by a different org?

Copy link
Member

@CAM-GerlachCAM-Gerlach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I had a few small suggestions to clarify the news entry and move the link to be inline where the DSL introduced, instead of a bare URL at the end of the body, but nothing too major.

Maybe out of scope for this particular PR, but while it may not affect too many people's work directly, this#98831 overall seems a significant enough change to deserve a What's New entry in the (e.g. in the Build Section, or as appropriate), which this NEWS entry seems like a great basis for.

@CAM-Gerlach
Copy link
Member

What's the recommended way to refer to a markdown file in a different repo owned by a different org?

Right now, same or different repo/org, there isn't (AFAIK) currently an easy way to do this with a nice role as opposed to just a plain reST link.

However, following discussion and some experimentation in the devguide repo and Discord, I'm working on a little Sphinx extension with a base "SmarkLink" role that, with just a couple lines of customization can be used to link files on the CPython repo, other repos/orgs or even other websites (like PEPs, RFCs, etc, as a smarter and more powerful version of the built-in PEP/RFC roles). If we implement that, you'll be able to do e.g.:gh-file:`Tools/cases_generator` to generateTools/case-generator (for the current branch by default), and also supports files on other repos, e.g.:gh-file:`faster-cpython/ideas:3.12/interpreter_definition.md`, as well as specifying branch/tag/commit, standard Sphinx~,! and<> syntax,# for subsections, etc.

Here'san earlier prototype.

Co-authored-by: C.A.M. Gerlach <CAM.Gerlach@Gerlach.CAM>
@gvanrossum
Copy link
MemberAuthor

Right now, same or different repo/org, there isn't (AFAIK) currently an easy way to do this with a nice role as opposed to just a plain reST link.

Thanks for your suggestions! All applied.

I'm looking forward to using your "SnarkLink" extension next time. :-)

CAM-Gerlach reacted with thumbs up emojiCAM-Gerlach and AlexWaygood reacted with laugh emoji

@markshannon
Copy link
Member

We need the code generator to have an understanding of stack effects before breaking instructions into their parts.
Consider a hypotheticalLOAD_FAST__STORE_FAST
Simple concatenation would produce this:

TARGET(STORE_FAST__LOAD_FAST) {    {PyObject*value=GETLOCAL(oparg);assert(value!=NULL);Py_INCREF(value);PUSH(value);    }NEXTOPARG();next_instr++;    {PyObject*value=POP();SETLOCAL(oparg,value);    }DISPATCH();}

This is clearly inefficient.
The output should look something like:

TARGET(STORE_FAST__LOAD_FAST) {PyObject*_stack_tmp1;    {PyObject*value=GETLOCAL(oparg);assert(value!=NULL);Py_INCREF(value);_stack_tmp1=value;    }NEXTOPARG();next_instr++;    {PyObject*value=_stack_tmp1SETLOCAL(oparg,value);    }DISPATCH();}

In order to do this, we first need to have stack effects that the tooling understands.

This probably doesn't matter for superinstructions, but does matter if we want to split specialized instructions into guard and action parts.

@markshannon
Copy link
Member

markshannon commentedNov 4, 2022
edited by gvanrossum
Loading

I think the next enhancements to generated code should be generating the code for dispatch related macros.

  • Strip out remainingDISPATCH() macros from bytecodes.c
  • Strip out allPREDICTED() macros. The code generator can insert them as required byPREDICT(name) macros.
  • Replace allgotos in bytecodes.c withERROR,UNWIND,RESUME_FRAME, etc., so that they are easily identifiable by code-gen. This is necessary, so that the code generator can ensure that the stack in consistent in case of error or unwind. It will also allow us to experiment with different layouts and inlining of code sequences.

Implementing stack effects can be done before or after the above.

Our goal is to expose "micro ops" for optimization:
E.g.

op(FLOAT_CHECK2, (left right -- left right)) {    DEOPT_IF(!PyFloat_CheckExact(left));    DEOPT_IF(Py_TYPE(right) != Py_TYPE(left));}op(FLOAT_ADD, (left right  -- sum)) {    STAT_INC(BINARY_OP, hit);    double dsum = ((PyFloatObject *)left)->ob_fval +        ((PyFloatObject *)right)->ob_fval;    PyObject *sum = PyFloat_FromDouble(dsum);    if (sum == NULL) {        ERROR();    }    _Py_DECREF_SPECIALIZED(right, _PyFloat_ExactDealloc);    _Py_DECREF_SPECIALIZED(left, _PyFloat_ExactDealloc);}BINARY_OP_ADD_FLOAT = counter/1 + FLOAT_CHECK2 + FLOAT_ADD;

And be able to generate code forBINARY_OP_ADD_FLOAT that is at least as efficient as what we have now.

@gvanrossum
Copy link
MemberAuthor

gvanrossum commentedNov 4, 2022
edited
Loading

@markshannon, I asked you one question you didn't answer yet: Does it matter that the placement of theNEXTOPARG(); next_instr++; pairs in the generated super-instructions is different than it was in the hand-written versions? (You can compare them in the diff.)

I think the next enhancements to generated code should be generating the code for dispatch related macros.

  • Strip out remainingDISPATCH() macros from bytecodes.c

I imagine this is important for opcodes that can be combined into others. It doesn't matter for top-level instructions, right?

Doing this everywhere seems complicated, since there are quite a few that have conditionalDISPATCH() all over the place (e.g.UNARY_NOT). What do you propose I do in those cases? Add a label at the end of the instruction andgoto there? That seems to defeat half the purpose ofDISPATCH(). (There are also easy cases, e.g. for some reason the extract_cases script didn't elide theDISPATCH() at the end ofSTOPITERATION_ERROR. Those I can do of course.)

  • Strip out allPREDICTED() macros. The code generator can insert them as required byPREDICT(name) macros.

That was going to be my next project, indeed.

  • Replace allgotos in bytecodes.c withERROR,UNWIND,RESUME_FRAME, etc., so that they are easily identifiable by code-gen. This is necessary, so that the code generator can ensure that the stack in consistent in case of error or unwind. It will also allow us to experiment with different layouts and inlining of code sequences.

What purpose does this serve? The generator can recognizegoto error just as easily.

Anyway, my own planned sequence of projects is roughly:

After that we should decide what's next, e.g. combining uops into instrs, cache effects, array stack effects, conditional stack effects, register opcodes, who knows.

@markshannon
Copy link
Member

Does it matter that the placement of the NEXTOPARG(); next_instr++; pairs in the generated super-instructions is different than it was in the hand-written versions?

Only if it impairs performance. I don't think it should.

@markshannon
Copy link
Member

We need to implement stack effects before superinstructions. Without the stack effects the code generator cannot efficiently move values from one op to the next.

@markshannon
Copy link
Member

The generator can recognizegoto error just as easily.

Sure, it is just about of consistency. It is easier to see what's going on ifSHOUTY_CASE_MACROS need to be handled by the code generator, and everything can be treated as an opaque blob.

@markshannon
Copy link
Member

Replacing all theDISPATCH()s withgotos to the end of the instruction means that the code generator doesn't need to handleDISPATCH. It should be fine to leave it for now, as the code generator doesn't do anything with theDISPATCH() anyway.

@gvanrossum
Copy link
MemberAuthor

We need to implement stack effects before superinstructions. Without the stack effects the code generator cannot efficiently move values from one op to the next.

Okay, but thecurrent crop of 5 superinstructions don't need this. There's not a single case where the second half does something with the result of the first case (no LOAD_FAST__STORE_FAST).

@gvanrossumgvanrossum merged commit7dcd28e intopython:mainNov 6, 2022
@gvanrossumgvanrossum deleted the super-instrs branchNovember 6, 2022 17:40
Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment

Reviewers

@CAM-GerlachCAM-GerlachCAM-Gerlach left review comments

@brandtbucherbrandtbucherAwaiting requested review from brandtbucher

@markshannonmarkshannonAwaiting requested review from markshannonmarkshannon is a code owner

Assignees

No one assigned

Labels

None yet

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

4 participants

@gvanrossum@CAM-Gerlach@markshannon@bedevere-bot

[8]ページ先頭

©2009-2025 Movatter.jp