Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

gh-94526: getpath_dirname() no longer encodes the path#97645

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Merged
vstinner merged 1 commit intopython:mainfromvstinner:getpath_unicode
Sep 30, 2022
Merged

gh-94526: getpath_dirname() no longer encodes the path#97645

vstinner merged 1 commit intopython:mainfromvstinner:getpath_unicode
Sep 30, 2022

Conversation

vstinner
Copy link
Member

@vstinnervstinner commentedSep 29, 2022
edited
Loading

Fix the Python path configuration used to initialized sys.path at
Python startup. getpath_basename() and getpath_dirname() functions no
longer encode the path to UTF-8/strict to avoid encoding errors if it
contains surrogate characters (created by decoding a bytes path with
the surrogateescape error handler).

The functions now use PyUnicode_FindChar() and PyUnicode_Substring()
on the Unicode path, rather than strrchr() on the encoded bytes
string.

@vstinner
Copy link
MemberAuthor

Fix building Python in a non-ASCII path

Well. In fact, the issue is broader: no only _bootstrap_python is affected, anypython program is affected since theModules/getpath.c code is used by all Python executables.

@vstinner
Copy link
MemberAuthor

I rebased and updated the PR to clarify that this issue affects the Python path configuration (sys.path creation).

@vstinner
Copy link
MemberAuthor

Sadly,Modules/getpath.c is not a regular extension module, it cannot be loaded in test_getpath to easily write unit tests.

There are getpath_methods which are injected inside a namespace (dict) by funcs_to_dict() function.

It may be interesting to convert it to a regular extension module (_getpath?).

const char *path;
if (!PyArg_ParseTuple(args, "s", &path)) {
PyObject *path;
if (!PyArg_ParseTuple(args, "U", &path)) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

BTW, I would useMETH_O andPyArg_Parse() in these functions, but this is another issue.

Why cannot they be implemented in Python?

Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

BTW, I would use METH_O and PyArg_Parse() in these functions, but this is another issue.

I tried to minimize the changes.

Why cannot they be implemented in Python?

Ask@zooba who designed this. Maybe it can be changed?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Perf, mostly. These trivial ones probably could be, but don't fall into the trap of trying to port the full ntpath/posixpath implementations into getpath - we don't have a lot of the functionality needed to handle those at this stage (e.g. no codecs, no os module).

Fix the Python path configuration used to initialized sys.path atPython startup. Paths are no longer encoded to UTF-8/strict to avoidencoding errors if it contains surrogate characters (bytes paths aredecoded with the surrogateescape error handler).getpath_basename() and getpath_dirname() functions no longer encodethe path to UTF-8/strict, but work directly on Unicode strings. Thesefunctions now use PyUnicode_FindChar() and PyUnicode_Substring() onthe Unicode path, rather than strrchr() on the encoded bytes string.
@vstinner
Copy link
MemberAuthor

@serhiy-storchaka:

The NEWS entry is meant to be read by a common Python user, not a core dev. getpath_basename and getpath_dirname are not Python function. Could you rewrite this?

I rephrased the NEWS entry to omit function names. Is it better? I only named functions in the commit message.

@kumaraditya303
Copy link
Contributor

Although this fixes the issue, this is a bit fragile since it can break if any other function were to use utf8 handler for encoding.

This PR avoids the case, can you add a comment that utf8 should be avoided here?

@vstinnervstinner merged commit9f2f1dd intopython:mainSep 30, 2022
@miss-islington
Copy link
Contributor

Thanks@vstinner for the PR 🌮🎉.. I'm working now to backport this PR to: 3.11.
🐍🍒⛏🤖

@miss-islington
Copy link
Contributor

Sorry@vstinner, I had trouble checking out the3.11 backport branch.
Please backport usingcherry_picker on command line.
cherry_picker 9f2f1dd131b912e224cd0269adde8879799686c4 3.11

@vstinnervstinner deleted the getpath_unicode branchSeptember 30, 2022 12:58
@vstinner
Copy link
MemberAuthor

Although this fixes the issue, this is a bit fragile since it can break if any other function were to use utf8 handler for encoding.

Yes, a regression can be introduced again tomorrow. Well, we can fix it in this case :-)

This PR avoids the case, can you add a comment that utf8 should be avoided here?

I'm not sure about the intent of a comment explaining that UTF-8 should not be used, since the modified functions now use Unicode (no encode/decode).

@vstinnervstinner added needs backport to 3.11only security fixes and removed needs backport to 3.11only security fixes labelsSep 30, 2022
@miss-islington
Copy link
Contributor

Thanks@vstinner for the PR 🌮🎉.. I'm working now to backport this PR to: 3.11.
🐍🍒⛏🤖

@bedevere-bot
Copy link

GH-97677 is a backport of this pull request to the3.11 branch.

@bedevere-botbedevere-bot removed the needs backport to 3.11only security fixes labelSep 30, 2022
miss-islington pushed a commit to miss-islington/cpython that referenced this pull requestSep 30, 2022
…H-97645)Fix the Python path configuration used to initialized sys.path atPython startup. Paths are no longer encoded to UTF-8/strict to avoidencoding errors if it contains surrogate characters (bytes paths aredecoded with the surrogateescape error handler).getpath_basename() and getpath_dirname() functions no longer encodethe path to UTF-8/strict, but work directly on Unicode strings. Thesefunctions now use PyUnicode_FindChar() and PyUnicode_Substring() onthe Unicode path, rather than strrchr() on the encoded bytes string.(cherry picked from commit9f2f1dd)Co-authored-by: Victor Stinner <vstinner@python.org>
miss-islington added a commit that referenced this pull requestSep 30, 2022
Fix the Python path configuration used to initialized sys.path atPython startup. Paths are no longer encoded to UTF-8/strict to avoidencoding errors if it contains surrogate characters (bytes paths aredecoded with the surrogateescape error handler).getpath_basename() and getpath_dirname() functions no longer encodethe path to UTF-8/strict, but work directly on Unicode strings. Thesefunctions now use PyUnicode_FindChar() and PyUnicode_Substring() onthe Unicode path, rather than strrchr() on the encoded bytes string.(cherry picked from commit9f2f1dd)Co-authored-by: Victor Stinner <vstinner@python.org>
serhiy-storchaka pushed a commit to serhiy-storchaka/cpython that referenced this pull requestOct 2, 2022
…97645)Fix the Python path configuration used to initialized sys.path atPython startup. Paths are no longer encoded to UTF-8/strict to avoidencoding errors if it contains surrogate characters (bytes paths aredecoded with the surrogateescape error handler).getpath_basename() and getpath_dirname() functions no longer encodethe path to UTF-8/strict, but work directly on Unicode strings. Thesefunctions now use PyUnicode_FindChar() and PyUnicode_Substring() onthe Unicode path, rather than strrchr() on the encoded bytes string.
Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment
Reviewers

@zoobazoobazooba left review comments

@serhiy-storchakaserhiy-storchakaserhiy-storchaka approved these changes

Assignees

@vstinnervstinner

Labels
None yet
Projects
None yet
Milestone
No milestone
Development

Successfully merging this pull request may close these issues.

6 participants
@vstinner@kumaraditya303@miss-islington@bedevere-bot@zooba@serhiy-storchaka

[8]ページ先頭

©2009-2025 Movatter.jp