Uh oh!
There was an error while loading.Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork32.1k
GH-115802: JIT "small" code for Windows#115964
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Uh oh!
There was an error while loading.Please reload this page.
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I'm sorry, I realized halfway through the review that this touches upon several large blank spaces in my brain. So this is not a very useful review. I do think that_PyOptimize_Optimize()
belongs in an internal header though.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
assert(oparg >= FVC_STR && oparg <= FVC_ASCII); | ||
conv_fn =CONVERSION_FUNCTIONS[oparg]; | ||
conv_fn =_PyEval_ConversionFuncs[oparg]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
We could makeinst(CONVERT_VALUE
replicate(4) inst(CONVERT_VALUE
and rely on clang removing the lookup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Is there a way to do that where a fifth generic version isn't generated too? Otherwise, it doesn't help, so we might as well stay with this?
Uh oh!
There was an error while loading.Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Looks good, I've a few comments.
As a general comment regarding JIT development:
Don't be afraid to make small changes tobytecodes.c
and other interpreter files to enable the JIT to work more smoothly.
PyAPI_DATA(const binaryfunc) _PyEval_BinaryOps[]; | ||
PyAPI_DATA(const conversion_func) _PyEval_ConversionFuncs[]; | ||
PyAPI_FUNC(int) _PyEval_CheckExceptStarTypeValid(PyThreadState *tstate, PyObject* right); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Rather than exposing all these symbols, could we put the function pointers in a struct, and pass that to the JIT as an argument?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
The issue isn't finding symbols when jitting the code (jit.c
is linked into the main executable and can find everything just fine).
The purpose of addingPyAPI_FUNC(...)
andPyAPI_DATA(...)
to the declarations is so they are declared with__declspec(dllimport)
when compiling the templates. This makes Clang emit indirect memory accesses on Windows (similar to position-independent code on other platforms).
bedevere-bot commentedFeb 29, 2024
|
Uh oh!
There was an error while loading.Please reload this page.
For 32-bit Windows, the small code model works fine out-of-the-box.
For 64-bit Windows, we don't have position-independent code with GOT relocations like Linux and macOS. Instead, we compile the stencils like a
Py_BUILD_CORE_MODULE
binary extension module, which creates a level of indirection similar to a GOT (__impl_Py_XXX
holds the address ofPy_XXX
). We process this just like the GOT on other platforms, and everything just works.This does require adding
PyAPI_FUNC
andPyAPI_DATA
to some symbols in internal headers to get the correct visibility, which is why this PR touches so many files.Looks like this makes the Windows JIT~3-4% faster.