Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

GH-94526: Force utf8 encoding in _bootstrap_python#96889

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Closed

Conversation

kumaraditya303
Copy link
Contributor

@kumaraditya303kumaraditya303 commentedSep 17, 2022
edited by bedevere-bot
Loading

Before fix:

Current thread 0x00007f716f57f280 (most recent call first):  <no Python frame>make: *** [Makefile:1247: Python/frozen_modules/_collections_abc.h] Error 1Exception ignored error evaluating path:Traceback (most recent call last):  File "<frozen getpath>", line 349, in <module>ModuleNotFoundError: No module named 'encodings'Fatal Python error: error evaluating pathPython runtime state: core initializedCurrent thread 0x00007ff524566280 (most recent call first):  <no Python frame>make: *** [Makefile:1250: Python/frozen_modules/_sitebuiltins.h] Error 1

After fix it compiles successfully in paths containing non ascii characters.

Closes#94526

@kumaraditya303kumaraditya303 added type-bugAn unexpected behavior, bug, or error interpreter-core(Objects, Python, Grammar, and Parser dirs) needs backport to 3.10only security fixes needs backport to 3.11only security fixes and removed needs backport to 3.10only security fixes labelsSep 17, 2022
@kumaraditya303kumaraditya303 self-assigned thisSep 17, 2022
@gvanrossum
Copy link
Member

I have to ask@tiran to review this, I don't know how that stuff works.

Copy link
Member

@ericsnowcurrentlyericsnowcurrently left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

LGTM, with one minor suggestion.

Like@gvanrossum, I would like to know if@tiran has any concerns.

@bedevere-bot
Copy link

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phraseI have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

Comment on lines 66 to 73
PyPreConfig preconfig;
PyPreConfig_InitIsolatedConfig(&preconfig);
// Force utf8 encoding
preconfig.utf8_mode = 1;
status = Py_PreInitialize(&preconfig);
if (PyStatus_Exception(status)) {
Py_ExitStatusException(status);
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

tl;dr this PR initializes the preconfig differently, but thatmight be okay (or even what we actually want).

FWIW, there is a subtle difference here.

analysis:

before:

  1. initialize the config (PyConfig)
  2. set argv on the config
  3. callPyConfig_Read()
    a. call_Py_PreInitializeFromConfig()
    1. call_PyPreConfig_InitFromConfig()<-- initialize the preconfig
      a. callPyPreConfig_InitIsolatedConfig()
      b. call_PyPreConfig_GetConfig() -- copies from config:
      • parse_argv =1 (set on the original line 71)
      • isolated =0 (set on the original line 72)
      • use_environment =0 (set inPyConfig_InitIsolatedConfig())
      • dev_mode =0 (set inPyConfig_InitIsolatedConfig())
    2. call_Py_PreInitializeFromPyArgv()<-- use the preconfig
      a. call_PyPreConfig_Read()<-- uses argv from config
      b. call_PyPreConfig_Write()<-- updates the runtime state
  4. callPy_InitializeFromConfig()

after:

  1. initialize the preconfig (PyPreConfig)
    a. callPyPreConfig_InitIsolatedConfig()<-- initialize the preconfig
    • parse_argv =0 (set in_PyPreConfig_InitCompatConfig())
    • isolated =1
    • use_environment =0
    • dev_mode =0
      b. set utf8_mode
  2. callPy_PreInitialize()
    1. call_Py_PreInitializeFromPyArgv()<-- use the preconfig
      a. call_PyPreConfig_Read()<-- argv not used
      b. call_PyPreConfig_Write()<-- updates the runtime state
  3. initialize the config (PyConfig)
  4. set argv on the config
  5. callPyConfig_Read()
    a. call_Py_PreInitializeFromConfig()
    1. skip (runtime->preinitialized is true)
  6. callPy_InitializeFromConfig()

The key difference is how the preconfig is initialized:

  • before: with_PyPreConfig_InitFromConfig() (viaPyConfig_Read()) using the already initialized config
  • after: withPyPreConfig_InitIsolatedConfig()

(When the preconfig is initialized is also different, but that doesn't really affect the outcome.)

Consequently,preconfig.parse_argv andpreconfig.isolated are different (which slightly changes how the runtime is initialized) and_PyPreCmdline_SetArgv() is never called (in_PyPreConfig_Read()).

If we wanted thehow to be strictly equivalent, we'd initialize the preconfig after the config:

PyConfigconfig;PyConfig_InitIsolatedConfig(&config);// don't warn, pybuilddir.txt does not exist yetconfig.pathconfig_warnings=0;// parse argumentsconfig.parse_argv=1;// add current script dir to sys.pathconfig.isolated=0;config.safe_path=0;#ifdefMS_WINDOWSstatus=PyConfig_SetArgv(&config,argc,argv);#elsestatus=PyConfig_SetBytesArgv(&config,argc,argv);#endifif (PyStatus_Exception(status)) {        gotoerror;    }PyPreConfigpreconfig;_PyPreConfig_InitFromConfig(&preconfig,&config);// Force utf8 encodingpreconfig.utf8_mode=1;_PyArgvconfig_args= {        .use_bytes_argv=0,        .argc=config->argv.length,        .wchar_argv=config->argv.items};status=_Py_PreInitializeFromPyArgv(&preconfig,&config_args);if (PyStatus_Exception(status)) {Py_ExitStatusException(status);    }status=PyConfig_Read(&config);

All that said, it isn't clear that difference in runtime init is bad. In fact, it might be what we want for_bootstrap_python. The difference might also not matter for_bootstrap_python, especially for how we're parsingargv. Regardless, my point is that it isn't clear that these details were considered. It would be worth making sure that the resulting runtime state is something we're okay with.

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

FWIW for what_bootstrap_python is used, which is just to run one or two scripts this seems sufficient. It isn't something user facing or anything and it is "internal". I am hoping@vstinner can take a look as he authored thePyConfig PEPhttps://peps.python.org/pep-0587/ and he is most familiar with these APIs.

I did this PR mostly by reading the source code and in particularthis example from the PEP.

ericsnowcurrently reacted with thumbs up emoji
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

You must not and cannot pre-initialize (PyPreConfig) Python once it's already initialized (PyConfig).

PyConfig_Read() does not pre-initialize Python if it's already pre-initialized. (If it's the case, it's would a bug.)

PyStatus_Py_PreInitializeFromConfig(constPyConfig*config,const_PyArgv*args){    ...if (runtime->preinitialized) {/* Already initialized: do nothing */return_PyStatus_OK();    }    ...}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Calling PyConfig_SetArgv() first to then call _Py_PreInitializeFromPyArgv() doesn't work, since PyConfig_SetArgv() does pre-initialize Python :-)

https://docs.python.org/dev/c-api/init_config.html#c.PyConfig.PyConfig_SetArgv

@vstinner
Copy link
Member

Consequently, preconfig.parse_argv and preconfig.isolated are different (which slightly changes how the runtime is initialized) and _PyPreCmdline_SetArgv() is never called (in _PyPreConfig_Read()).

It's unfortunate that PyPreConfig and PyConfig are separated and that some options are redundant. You might have to set the same options in the two structures:

preconfig.isolated = 0;preconfig.parse_argv = 1;...config.isolated = 0;config.parse_argv = 1;

But I'm not sure why you start from an isolated config to disable isolation and use parse_argv=1. If you want a Python which looks like regular Python, use the PythonConfig rather than the IsolatedConfig.

@vstinner
Copy link
Member

PyPreConfig parses argv to set isolated (-I), use_environment (-E) and dev_mode (-X utf8 or-X utf8=1). But it seems like _bootstrap_python should not depend on environment variables and should not enable the development mode.

I'm curious why isolated=0 is used. Currently, _bootstrap_python.c uses:

    // add current script dir to sys.path    config.isolated = 0;

Hum, that's the purpose of the new safe_path option. Problem: if isolated=1, safe_path is always set to 1. Currently, it's not possible to have isolated=1and safe_path=0. Maybe the API should be enhanced to support this use case. In general, I tried to always take the highest priority for PyConfig members if a member is set explicitly.

@kumaraditya303
Copy link
ContributorAuthor

@vstinner: I agree with your analysis (after all you are the author :)). I was also confusedconfig.isolated = 0; but this PR needs to be backported so I left it for another PR to fix up.

Can you review the utf8 part and if it is set correctly? Thank you!

Copy link
Member

@vstinnervstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I'm not really convinced that this change is the correct fix for the root issue. Please see my analysis in my comment#94526 (comment) and my PR#97645 which fix the root issue.

@kumaraditya303
Copy link
ContributorAuthor

Superseded by#97645

@kumaraditya303kumaraditya303 deleted the utf8-deepfreeze branchSeptember 30, 2022 13:06
@vstinner
Copy link
Member

IMO#97645 fixed the root issue and this change is no longer needed.

I'm not convinced that _bootstrap_python requires to enforce the UTF-8 Mode, since I spent significant time over the last years to make sure that it's possible to start Python with basically any possible locale encoding ;-) During Python startup, Py_EncodeLocale() and Py_DecodeLocale() use the C library (libc) to encode/decode strings, until Python gets access to its own codecs.

@vstinner
Copy link
Member

At least, for me, it was interesting to see how the Python preconfiguration API can be used :-D

@gvanrossum
Copy link
Member

Well, I'm just glad I didn't blindly accepted it. :-)

@kumaraditya303
Copy link
ContributorAuthor

As long as the bug is fixed I'm happy :-)

But as I pointed out in the other PR that the fix only fixed only specific cases, it is possible that this happens again if runtime initialization changes. This PR would have avoided that entirely but it is rare so itshould be fine.

Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment
Reviewers

@ericsnowcurrentlyericsnowcurrentlyericsnowcurrently requested changes

@vstinnervstinnervstinner requested changes

@gvanrossumgvanrossumAwaiting requested review from gvanrossum

@tirantiranAwaiting requested review from tiran

Assignees

@kumaraditya303kumaraditya303

Labels
awaiting changesinterpreter-core(Objects, Python, Grammar, and Parser dirs)needs backport to 3.11only security fixestype-bugAn unexpected behavior, bug, or error
Projects
None yet
Milestone
No milestone
Development

Successfully merging this pull request may close these issues.

Exception in ucs2lib_utf8_encoder in _bootstrap_python
5 participants
@kumaraditya303@gvanrossum@bedevere-bot@vstinner@ericsnowcurrently

[8]ページ先頭

©2009-2025 Movatter.jp