Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

gh-117139: Set up the tagged evaluation stack#117186

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Conversation

@Fidget-Spinner
Copy link
Member

@Fidget-SpinnerFidget-Spinner commentedMar 23, 2024
edited by bedevere-appbot
Loading

This PR is up mostly just for discussion. It compiles (with warnings) and passes (Ubuntu) tests as of now.

The main point of contention is how do we deal with bytecode that expect arrays of untagged objects? E.g. vectorcall takes an array from the stack of objects. There are two general approaches I can think of:

  1. Untag all values used by the bytecode on the stack, then let them do the usual. This has really bad performance.
  2. Secretly check for tagged values inside API. That is -- make them support tagged values internally, but don't change their signature on the outside. This way we don't break C API. This is kind of bad from a type safety standpoint (and needs us to cast things around), but it's the one with the least performance loss.

For example I currently have a_PyList_FromArraySteal and a_PyList_FromTaggedArraySteal. With the 2nd approach, I just need some bitwise operations in_PyList_FromArraySteal and it should be good to go. Then I can remove_PyList_FromTaggedArraySteal. What do y'all think?

However, we still have to untag everything if we call escaping functions that call 3rd party code (e.g. vectorcall). This is only safe right now for CPython's own C API.

corona10 reacted with eyes emoji
@colesbury
Copy link
Contributor

colesbury commentedMar 26, 2024
edited
Loading

I'm a confused by your description of approach 2:

Secretly check for tagged values inside API. That is -- make them support tagged values internally, but don't change their signature on the outside.

That does not look like the approach in the PR, which adds new functions that take tagged values. The approach in the PR seems correct. We can change the signatures of internal-only APIs as needed.

However, we still have to untag everything if we call escaping functions that call 3rd party code (e.g. vectorcall)

If you untag in-place on the stack, you need toPy_INCREF() deferred references: if you mark them as non-deferred on the evaluation stack then they need to reflect that.

I think it's better to do the untagging off the evaluation stack. In the common case, you can have some fixed sizePyObject *buf[N] on the C stack and untaginto it, for some reasonable small fixedN. In the general case,

Comment on lines 231 to 247
typedefunion {
PyObject*obj;
uintptr_tbits;
}_Py_TaggedObject;

#definePy_OBJECT_TAG (0b0)

#ifdefPy_GIL_DISABLED
#definePy_CLEAR_TAG(tagged) ((PyObject *)((tagged).bits & ~(Py_OBJECT_TAG)))
#else
#definePy_CLEAR_TAG(tagged) ((PyObject *)(uintptr_t)((tagged).bits))
#endif

#definePy_OBJ_PACK(obj) ((_Py_TaggedObject){.bits = (uintptr_t)(obj)})

#definePy_TAG_CAST(o) ((_Py_TaggedObject){.obj = (o)})

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

We should not make any of these macros or data structures part of the public C API.

Fidget-Spinner reacted with thumbs up emoji
@Fidget-Spinner
Copy link
MemberAuthor

I'm a confused by your description of approach 2:

Secretly check for tagged values inside API. That is -- make them support tagged values internally, but don't change their signature on the outside.

That does not look like the approach in the PR, which adds new functions that take tagged values. The approach in the PR seems correct for internal-only APIs -- we can change their signatures as needed.

I did a mix of both -- for internal functions I converted them to tagged form. For stuff that might be used by other people eg._PyList_FromArraySteal, I made them internally support tagged pointers.

I think it's better to do the untagging off the evaluation stack. In the common case, you can have some fixed sizePyObject *buf[N] on the C stack and untaginto it, for some reasonable small fixedN. In the general case,

Ok I will think about that. That does make sense -- to pass to a scratch buffer for now. If we exceed the buffer I assume we need to untag the C stack then. One thing we might want to be wary of is overflowing the C recursion stack with the current recursion limit. Since this will make_Py_EvalEvalFrameDefault a little larger.

@Fidget-Spinner
Copy link
MemberAuthor

Fidget-Spinner commentedMar 26, 2024
edited
Loading

Also to note: everyPy_DECREF(Py_CLEAR_TAG(x)) andPy_INCREF(Py_CLEAR_TAG(x)) and so on is an opportunity to swap out withPy_DECREF_TAGGED(x) and so on in a future PR, when we implement the actual deferring. To keep this PR small and easier to review, I didn't do them for now. Though I think I might as well since this PR is quite bulky already, and to reduce code churn in the future.

@Fidget-SpinnerFidget-Spinner changed the titlegh-117139: Tagged evaluation stackgh-117139: Set up the tagged evaluation stackMar 26, 2024
@colesbury
Copy link
Contributor

For stuff that might be used by other people eg. _PyList_FromArraySteal, I made them internally support tagged pointers.

That's undefined behavior in C -- let's avoid it if possible. If we're concerned about external users, add a new function that takes tagged references (and keep the old one unused, if desired).

If we exceed the buffer I assume we need to untag the C stack then...

If we exceed the fixed size buffer we should allocate enough space for a temporary buffer usingPyMem_Malloc().

One thing we might want to be wary of is overflowing the C recursion stack...

I would structure this so that the small, temporary buffer is only used when we perform a vectorcall (and not to a Python function). For example, in pseudo-code:

#defineN 5PyObject*PyObject_VectorcallTagged(PyObject*callable,const_Py_TaggedObject*tagged,size_tnargs,PyObject*kwnames){PyObject*args[N];if (nargs >=N) {returnPyObject_VectorcallTaggedSlow(callable,tagged,nargs,kwnames);  }untag(args,tagged,nargs);returnPyObject_Vectorcall(callable,args,nargs,kwnames);}PyObject*PyObject_VectorcallTaggedSlow(PyObject*callable,const_Py_TaggedObject*tagged,size_tnargs,PyObject*kwnames){PyObject*args=PyMem_Malloc(nargs*sizeof(PyObject*));if (args==NULL) ...untag(args,tagged,nargs);PyObject*res=PyObject_Vectorcall(callable,args,nargs,kwnames);PyMem_Free(args);returnres;}

@Fidget-Spinner
Copy link
MemberAuthor

Thanks for the clear explanation! I will address them tomorrow.

@gvanrossum
Copy link
Member

Hm, I now recall that the problem of untagging arrays of arguments sunk my attempt (I got it to work, but it was too slow). Hope you fare better!

@nineteendo
Copy link
Contributor

Could you also configure pre-commit?https://devguide.python.org/getting-started/setup-building/#install-pre-commit

index 2bd9c40..3aa7dea 100644--- a/Tools/cases_generator/analyzer.py+++ b/Tools/cases_generator/analyzer.py@@ -359,7 +359,7 @@ def has_error_without_pop(op: parser.InstDef) -> bool:     "Py_XDECREF_STACKREF",     "Py_INCREF_STACKREF",     "Py_XINCREF_TAGGED",-    "Py_NewRef_StackRef",+    "Py_NewRef_StackRef",     "Py_INCREF",     "_PyManagedDictPointer_IsValues",     "_PyObject_GetManagedDict",

@Fidget-Spinner
Copy link
MemberAuthor

!buildbot nogil

@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by@Fidget-Spinner for commitd59145b 🤖

The command will test the builders whose names match following regular expression:nogil

The builders matched are:

  • x86-64 MacOS Intel ASAN NoGIL PR
  • AMD64 Ubuntu NoGIL Refleaks PR
  • ARM64 MacOS M1 NoGIL PR
  • AMD64 Windows Server 2022 NoGIL PR
  • AMD64 Ubuntu NoGIL PR
  • ARM64 MacOS M1 Refleaks NoGIL PR
  • x86-64 MacOS Intel NoGIL PR

@Fidget-Spinner
Copy link
MemberAuthor

Fidget-Spinner commentedApr 26, 2024
edited
Loading

Here are results from Sam'smicrobenchmark suite testing scalability. Note that I've only implemented deferred refcounting for method and function calls.

Specs:20 virtual cores (hyper threading)pyperf system tuneMain:object_cfunction MEGA FAILED: 1.1x slowercmodule_function MEGA FAILED: 2.1x slowergenerator FAILED: 3.1x fasterpymethod MEGA FAILED: 1.4x slowerpyfunction MEGA FAILED: 2.7x slowermodule_function MEGA FAILED: 1.9x slowerload_string_const MEGA FAILED: 2.1x slowerload_tuple_const MEGA FAILED: 1.9x slowercreate_closure MEGA FAILED: 3.6x slowercreate_pyobject MEGA FAILED: 2.9x slowerMy branch:object_cfunction FAILED: 9.3x fastercmodule_function MEGA FAILED: 2.0x slowergenerator FAILED: 2.9x fasterpymethod MEGA FAILED: 1.1x slowerpyfunction MEGA FAILED: 2.4x slowermodule_function MEGA FAILED: 1.9x slowerload_string_const MEGA FAILED: 2.2x slowerload_tuple_const MEGA FAILED: 2.1x slowercreate_closure MEGA FAILED: 3.6x slowercreate_pyobject MEGA FAILED: 3.1x slower

Notice thatobject_cfunction went from1.1x slower to9.3x faster on a 20 thread workload!

All tests pass except ASAN. I am now happy with the state this PR is in. So I will close this and start upstreaming things in peices.

@Fidget-Spinner
Copy link
MemberAuthor

Fidget-Spinner commentedApr 26, 2024
edited
Loading

Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment

Reviewers

@colesburycolesburycolesbury left review comments

@gvanrossumgvanrossumAwaiting requested review from gvanrossum

@markshannonmarkshannonAwaiting requested review from markshannonmarkshannon is a code owner

@methanemethaneAwaiting requested review from methanemethane is a code owner

@brandtbucherbrandtbucherAwaiting requested review from brandtbucherbrandtbucher is a code owner

@ericsnowcurrentlyericsnowcurrentlyAwaiting requested review from ericsnowcurrentlyericsnowcurrently will be requested when the pull request is marked ready for reviewericsnowcurrently is a code owner

@ZeroIntensityZeroIntensityAwaiting requested review from ZeroIntensityZeroIntensity will be requested when the pull request is marked ready for reviewZeroIntensity is a code owner

@savannahostrowskisavannahostrowskiAwaiting requested review from savannahostrowskisavannahostrowski will be requested when the pull request is marked ready for reviewsavannahostrowski is a code owner

@diegorussodiegorussoAwaiting requested review from diegorussodiegorusso will be requested when the pull request is marked ready for reviewdiegorusso is a code owner

@brettcannonbrettcannonAwaiting requested review from brettcannonbrettcannon will be requested when the pull request is marked ready for reviewbrettcannon is a code owner

@ncoghlanncoghlanAwaiting requested review from ncoghlanncoghlan will be requested when the pull request is marked ready for reviewncoghlan is a code owner

@warsawwarsawAwaiting requested review from warsawwarsaw will be requested when the pull request is marked ready for reviewwarsaw is a code owner

@kumaraditya303kumaraditya303Awaiting requested review from kumaraditya303kumaraditya303 will be requested when the pull request is marked ready for reviewkumaraditya303 is a code owner

@erlend-aaslanderlend-aaslandAwaiting requested review from erlend-aaslanderlend-aasland will be requested when the pull request is marked ready for reviewerlend-aasland is a code owner

@AA-TurnerAA-TurnerAwaiting requested review from AA-TurnerAA-Turner will be requested when the pull request is marked ready for reviewAA-Turner is a code owner

@emmatypingemmatypingAwaiting requested review from emmatypingemmatyping will be requested when the pull request is marked ready for reviewemmatyping is a code owner

Assignees

No one assigned

Labels

None yet

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

8 participants

@Fidget-Spinner@colesbury@gvanrossum@brandtbucher@corona10@markshannon@nineteendo@bedevere-bot

[8]ページ先頭

©2009-2025 Movatter.jp