Uh oh!
There was an error while loading.Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork32.1k
GH-94526: Force utf8 encoding in _bootstrap_python#96889
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Uh oh!
There was an error while loading.Please reload this page.
Conversation
I have to ask@tiran to review this, I don't know how that stuff works. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
LGTM, with one minor suggestion.
Like@gvanrossum, I would like to know if@tiran has any concerns.
Uh oh!
There was an error while loading.Please reload this page.
bedevere-bot commentedSep 26, 2022
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com>
Programs/_bootstrap_python.c Outdated
PyPreConfig preconfig; | ||
PyPreConfig_InitIsolatedConfig(&preconfig); | ||
// Force utf8 encoding | ||
preconfig.utf8_mode = 1; | ||
status = Py_PreInitialize(&preconfig); | ||
if (PyStatus_Exception(status)) { | ||
Py_ExitStatusException(status); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
tl;dr this PR initializes the preconfig differently, but thatmight be okay (or even what we actually want).
FWIW, there is a subtle difference here.
analysis:
before:
- initialize the config (
PyConfig
) - set argv on the config
- call
PyConfig_Read()
a. call_Py_PreInitializeFromConfig()
- call
_PyPreConfig_InitFromConfig()
<-- initialize the preconfig
a. callPyPreConfig_InitIsolatedConfig()
b. call_PyPreConfig_GetConfig()
-- copies from config:parse_argv
=1 (set on the original line 71)isolated
=0 (set on the original line 72)use_environment
=0 (set inPyConfig_InitIsolatedConfig()
)dev_mode
=0 (set inPyConfig_InitIsolatedConfig()
)
- call
_Py_PreInitializeFromPyArgv()
<-- use the preconfig
a. call_PyPreConfig_Read()
<-- uses argv from config
b. call_PyPreConfig_Write()
<-- updates the runtime state
- call
- call
Py_InitializeFromConfig()
after:
- initialize the preconfig (
PyPreConfig
)
a. callPyPreConfig_InitIsolatedConfig()
<-- initialize the preconfigparse_argv
=0 (set in_PyPreConfig_InitCompatConfig()
)isolated
=1use_environment
=0dev_mode
=0
b. set utf8_mode
- call
Py_PreInitialize()
- call
_Py_PreInitializeFromPyArgv()
<-- use the preconfig
a. call_PyPreConfig_Read()
<-- argv not used
b. call_PyPreConfig_Write()
<-- updates the runtime state
- call
- initialize the config (
PyConfig
) - set argv on the config
- call
PyConfig_Read()
a. call_Py_PreInitializeFromConfig()
- skip (
runtime->preinitialized
is true)
- skip (
- call
Py_InitializeFromConfig()
The key difference is how the preconfig is initialized:
- before: with
_PyPreConfig_InitFromConfig()
(viaPyConfig_Read()
) using the already initialized config - after: with
PyPreConfig_InitIsolatedConfig()
(When the preconfig is initialized is also different, but that doesn't really affect the outcome.)
Consequently,preconfig.parse_argv
andpreconfig.isolated
are different (which slightly changes how the runtime is initialized) and_PyPreCmdline_SetArgv()
is never called (in_PyPreConfig_Read()
).
If we wanted thehow to be strictly equivalent, we'd initialize the preconfig after the config:
PyConfigconfig;PyConfig_InitIsolatedConfig(&config);// don't warn, pybuilddir.txt does not exist yetconfig.pathconfig_warnings=0;// parse argumentsconfig.parse_argv=1;// add current script dir to sys.pathconfig.isolated=0;config.safe_path=0;#ifdefMS_WINDOWSstatus=PyConfig_SetArgv(&config,argc,argv);#elsestatus=PyConfig_SetBytesArgv(&config,argc,argv);#endifif (PyStatus_Exception(status)) { gotoerror; }PyPreConfigpreconfig;_PyPreConfig_InitFromConfig(&preconfig,&config);// Force utf8 encodingpreconfig.utf8_mode=1;_PyArgvconfig_args= { .use_bytes_argv=0, .argc=config->argv.length, .wchar_argv=config->argv.items};status=_Py_PreInitializeFromPyArgv(&preconfig,&config_args);if (PyStatus_Exception(status)) {Py_ExitStatusException(status); }status=PyConfig_Read(&config);
All that said, it isn't clear that difference in runtime init is bad. In fact, it might be what we want for_bootstrap_python
. The difference might also not matter for_bootstrap_python
, especially for how we're parsingargv
. Regardless, my point is that it isn't clear that these details were considered. It would be worth making sure that the resulting runtime state is something we're okay with.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
FWIW for what_bootstrap_python
is used, which is just to run one or two scripts this seems sufficient. It isn't something user facing or anything and it is "internal". I am hoping@vstinner can take a look as he authored thePyConfig
PEPhttps://peps.python.org/pep-0587/ and he is most familiar with these APIs.
I did this PR mostly by reading the source code and in particularthis example from the PEP.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
You must not and cannot pre-initialize (PyPreConfig) Python once it's already initialized (PyConfig).
PyConfig_Read() does not pre-initialize Python if it's already pre-initialized. (If it's the case, it's would a bug.)
PyStatus_Py_PreInitializeFromConfig(constPyConfig*config,const_PyArgv*args){ ...if (runtime->preinitialized) {/* Already initialized: do nothing */return_PyStatus_OK(); } ...}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Calling PyConfig_SetArgv() first to then call _Py_PreInitializeFromPyArgv() doesn't work, since PyConfig_SetArgv() does pre-initialize Python :-)
https://docs.python.org/dev/c-api/init_config.html#c.PyConfig.PyConfig_SetArgv
It's unfortunate that PyPreConfig and PyConfig are separated and that some options are redundant. You might have to set the same options in the two structures:
But I'm not sure why you start from an isolated config to disable isolation and use parse_argv=1. If you want a Python which looks like regular Python, use the PythonConfig rather than the IsolatedConfig. |
PyPreConfig parses argv to set isolated ( I'm curious why isolated=0 is used. Currently, _bootstrap_python.c uses:
Hum, that's the purpose of the new safe_path option. Problem: if isolated=1, safe_path is always set to 1. Currently, it's not possible to have isolated=1and safe_path=0. Maybe the API should be enhanced to support this use case. In general, I tried to always take the highest priority for PyConfig members if a member is set explicitly. |
@vstinner: I agree with your analysis (after all you are the author :)). I was also confused Can you review the utf8 part and if it is set correctly? Thank you! |
Uh oh!
There was an error while loading.Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I'm not really convinced that this change is the correct fix for the root issue. Please see my analysis in my comment#94526 (comment) and my PR#97645 which fix the root issue.
Superseded by#97645 |
IMO#97645 fixed the root issue and this change is no longer needed. I'm not convinced that _bootstrap_python requires to enforce the UTF-8 Mode, since I spent significant time over the last years to make sure that it's possible to start Python with basically any possible locale encoding ;-) During Python startup, Py_EncodeLocale() and Py_DecodeLocale() use the C library (libc) to encode/decode strings, until Python gets access to its own codecs. |
At least, for me, it was interesting to see how the Python preconfiguration API can be used :-D |
Well, I'm just glad I didn't blindly accepted it. :-) |
As long as the bug is fixed I'm happy :-) But as I pointed out in the other PR that the fix only fixed only specific cases, it is possible that this happens again if runtime initialization changes. This PR would have avoided that entirely but it is rare so itshould be fine. |
Uh oh!
There was an error while loading.Please reload this page.
Before fix:
After fix it compiles successfully in paths containing non ascii characters.
Closes#94526