GH-133711: Enable UTF-8 mode by default (PEP 686) #133712


Open
AA-Turner wants to merge 8 commits into python:main from AA-Turner:utf-8-mode

@AA-Turner
Member

@AA-Turner commented May 8, 2025

@StanFromIreland
Contributor

Can someone please (I don't have permissions to order it) run this buildbot to verify it clears up #133677 for 3.15?

@AA-Turner

This comment was marked as resolved.

@methane
Member

test_python_legacy_windows_stdio tests pipe encoding, but it should test console I/O encoding.
cc: @zooba

@methane
Member

assert_python_ok uses PIPE for stdin/stdout/stderr.
spawn_python doesn't use PIPE for stderr, so this test can be rewritten like this.
But the test still requires that it is run in a console.

diff --git a/Lib/test/test_cmd_line.py b/Lib/test/test_cmd_line.py
index 1b40e0d05fe..243069aeb18 100644
--- a/Lib/test/test_cmd_line.py
+++ b/Lib/test/test_cmd_line.py
@@ -972,10 +972,12 @@ def test_python_legacy_windows_fs_encoding(self):
     @unittest.skipUnless(support.MS_WINDOWS, 'Test only applicable on Windows')
     def test_python_legacy_windows_stdio(self):
-        code = "import sys; print(sys.stdin.encoding, sys.stdout.encoding)"
+        # stdin and stdout are PIPE. So we check stderr encoding for Console I/O.
+        code = "import sys; print(sys.stderr.encoding)"
         expected = 'cp'
-        rc, out, err = assert_python_ok('-c', code, PYTHONLEGACYWINDOWSSTDIO='1')
-        self.assertIn(expected.encode(), out)
+        p = spawn_python('-c', code, env={"PYTHONLEGACYWINDOWSSTDIO": "1"})
+        out, rc = _kill_python_and_exit_code(p)
+        self.assertRegex(rb'\Acp\d+\Z', out.strip())

(I haven't tested it yet because I don't use Windows daily.)
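
The point about pipes can be illustrated outside the test suite. The following is a minimal sketch that uses only the standard subprocess module, not the test.support helpers named above; the names CHILD_CODE and child_stdout_is_tty are made up for this illustration. It shows that a child interpreter whose stdout is a pipe does not see a terminal or console on that stream, which is why checking piped streams cannot exercise console I/O.

```python
import subprocess
import sys

# Code executed by the child interpreter.
CHILD_CODE = "import sys; print(sys.stdout.isatty())"


def child_stdout_is_tty():
    """Run a child interpreter with stdout redirected to a pipe and
    return what the child reports for sys.stdout.isatty()."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        stdout=subprocess.PIPE,  # the child's stdout is a pipe, not a console
        text=True,
        check=True,
    )
    return result.stdout.strip()


print(child_stdout_is_tty())
```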

If the UTF-8 mode is disabled, the interpreter defaults to using
the current locale settings, *unless* the current locale is identified
as a legacy ASCII-based locale (as described for :envvar:`PYTHONCOERCECLOCALE`),
and locale coercion is either disabled or fails.

Member

Is PEP 538 (Coercing the legacy C locale to a UTF-8 based locale) still relevant if UTF-8 mode is enabled by default? It may make disabling the UTF-8 mode more complicated. It's just an open question, I don't have the answer.

vasily-v-ryabov reacted with thumbs up emoji
Member

When UTF-8 mode is disabled:

  • If locale is not C or POSIX: locale encoding is used.
  • If locale is C or POSIX and PYTHONCOERCECLOCALE is not set, locale is changed to C.UTF-8.
    • Although UTF-8 mode is disabled, locale encoding is UTF-8.
  • If locale is C or POSIX and PYTHONCOERCECLOCALE is set to 0 (coercion disabled), locale encoding will be ASCII.
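
The three cases above can be written down as a small decision function. This is only a toy model of the list, not CPython's actual start-up code: the function name, its parameters, and the LEGACY_LOCALES set are hypothetical, and it assumes that coercion is switched off only when the environment variable is exactly "0".

```python
from typing import Optional

# Locale names treated as legacy ASCII-based locales in this toy model.
LEGACY_LOCALES = {"C", "POSIX"}


def encoding_when_utf8_mode_disabled(
    locale_name: str,
    locale_codec: str,
    coerce_env: Optional[str] = None,
) -> str:
    """Return the text encoding for the cases listed above.

    locale_name  -- e.g. "C", "POSIX" or "fr_FR"
    locale_codec -- the codec that locale would normally imply
    coerce_env   -- value of the coercion environment variable, or None
    """
    if locale_name not in LEGACY_LOCALES:
        # Case 1: an ordinary locale, so its own encoding is used.
        return locale_codec
    if coerce_env == "0":
        # Case 3: legacy locale and coercion disabled.
        return "ascii"
    # Case 2: legacy locale coerced to a UTF-8 based locale.
    return "utf-8"


print(encoding_when_utf8_mode_disabled("fr_FR", "ISO-8859-1"))
print(encoding_when_utf8_mode_disabled("C", "ascii"))
print(encoding_when_utf8_mode_disabled("POSIX", "ascii", coerce_env="0"))
```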

* Python UTF-8 mode is now enabled by default.
It may be disabled by setting :envvar:`PYTHONUTF8=0 <PYTHONUTF8>` as
an environment variable or by using the :option:`-X utf8=0 <-X>` flag.
See :pep:`686` for further details.

Member

I feel like we can probably put some more explanation in here, such as that it affects TextIOWrapper and hence open(). The current description doesn't sound as scary as it needs to, in my opinion.

Along the lines of: "Python UTF-8 mode is now enabled by default. This means that (files/console/etc.) will now use UTF-8 regardless of system settings, unless specifically overridden in code (typically with an encoding= argument)."
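
To make the open() effect concrete, here is a minimal sketch. The helper name write_text and the file name demo.txt are made up for the example. It writes the same file once with the default encoding and once with an explicit encoding= argument, then reports which encoding each file object used. The default depends on the interpreter's mode and the system settings, so no particular value is claimed for it.

```python
import os
import sys
import tempfile


def write_text(path, text, encoding=None):
    """Write text and return the encoding the file object actually used.

    encoding=None defers to Python's default text encoding.
    """
    with open(path, "w", encoding=encoding) as f:
        f.write(text)
        return f.encoding


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")
    # Default encoding: depends on UTF-8 mode and the locale.
    default_used = write_text(path, "ascii only")
    # Explicit encoding: independent of mode and locale.
    explicit_used = write_text(path, "caf\u00e9", encoding="utf-8")
    with open(path, "rb") as f:
        raw = f.read()

print("utf8_mode:", sys.flags.utf8_mode)
print("default encoding:", default_used)
print("explicit encoding:", explicit_used, raw)
```

Code that passes encoding= explicitly behaves the same whether or not UTF-8 mode is on, which is the override the suggested wording refers to.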

vstinner reacted with thumbs up emoji
Member

To be clear, it's nothing new. But we shouldn't assume that everyone already knows what UTF-8 mode implies. There are many more people out there who haven't ever thought about it than those who are waiting for it to be the default.

Member

Another effect of the UTF-8 Mode is that Python ignores the locale encoding.
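
A quick way to observe this is to start a child interpreter with UTF-8 mode forced on and ask it what locale.getpreferredencoding() reports. This is a sketch using only the standard library; CHILD_CODE and probe are names invented for the example, and the result is lower-cased because the exact spelling of the codec name has varied between Python versions.

```python
import subprocess
import sys

# The child prints the UTF-8 mode flag and the preferred encoding.
CHILD_CODE = (
    "import locale, sys; "
    "print(sys.flags.utf8_mode, locale.getpreferredencoding(False).lower())"
)


def probe(utf8_flag):
    """Run a child interpreter with -X utf8=<flag> and return its report
    as a list of two strings: [utf8_mode, preferred_encoding]."""
    result = subprocess.run(
        [sys.executable, "-X", f"utf8={utf8_flag}", "-c", CHILD_CODE],
        stdout=subprocess.PIPE,
        text=True,
        check=True,
    )
    return result.stdout.split()


print(probe(1))
```

Calling probe(0) instead shows the locale-dependent answer for the machine it runs on.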

Member, Author

@FFY00 removed their request for review May 10, 2025 02:28
@@ -75,7 +75,30 @@ New features
Other language changes
======================


* Python now uses UTF-8_ as the default encoding, independent of the system's

Member

You might mention the UTF-8 Mode earlier since it has other side effects documented in the UTF-8 Mode section, such as changing sys.stdout error handler and ignoring the locale encoding.
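
The error-handler side effect can be inspected in the same spirit. The sketch below starts a child interpreter with UTF-8 mode forced on and prints the error handlers of its standard streams; CHILD_CODE and stdio_error_handlers are names made up for the example. It assumes PYTHONIOENCODING is not meant to take part, so that variable is removed from the child's environment.

```python
import os
import subprocess
import sys

# The child prints the error handlers of its three standard streams.
CHILD_CODE = "import sys; print(sys.stdin.errors, sys.stdout.errors, sys.stderr.errors)"


def stdio_error_handlers():
    """Return [stdin, stdout, stderr] error handler names as seen by a
    child interpreter started with -X utf8=1."""
    env = dict(os.environ)
    # Assumption: an inherited PYTHONIOENCODING would override the handlers.
    env.pop("PYTHONIOENCODING", None)
    result = subprocess.run(
        [sys.executable, "-X", "utf8=1", "-c", CHILD_CODE],
        stdin=subprocess.DEVNULL,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
        check=True,
        env=env,
    )
    return result.stdout.split()


print(stdio_error_handlers())
```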

@AA-Turner

This comment was marked as resolved.

@bedevere-bot

This comment was marked as resolved.

code = 'import sys; print(type(sys.stderr.buffer.raw))'
env = {'PYTHONLEGACYWINDOWSSTDIO': str(int(legacy_windows_stdio))}
# use stderr=None as legacy_windows_stdio doesn't affect pipes
p = spawn_python('-c', code, env=env, stderr=None)

Member

GitHub Actions runs the tests with pipes, not with a console.
In that case, stderr=None is still a pipe.

Adding creationflags=CREATE_NEW_CONSOLE would allocate a new console for the subprocess.

Member

creationflags=CREATE_NEW_CONSOLE didn't fix the test on GitHub Actions...
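
For reference, the creationflags idea could look roughly like the sketch below when written with plain subprocess instead of spawn_python. The helper names new_console_kwargs and spawn_with_new_console are hypothetical. As reported above, this approach did not fix the test on GitHub Actions, so treat it as an illustration of the flag, not as a working fix.

```python
import subprocess
import sys


def new_console_kwargs():
    """Extra Popen keyword arguments that request a fresh console on
    Windows. The flag only exists on Windows, so nothing is added elsewhere."""
    if sys.platform == "win32":
        return {"creationflags": subprocess.CREATE_NEW_CONSOLE}
    return {}


def spawn_with_new_console(code):
    """Start a child interpreter running `code`, asking Windows for a new
    console. The standard streams are left unredirected on purpose."""
    return subprocess.Popen([sys.executable, "-c", code], **new_console_kwargs())
```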

@@ -972,10 +976,19 @@ def test_python_legacy_windows_fs_encoding(self):

@unittest.skipUnless(support.MS_WINDOWS, 'Test only applicable on Windows')

Member

Suggested change
-    @unittest.skipUnless(support.MS_WINDOWS, 'Test only applicable on Windows')
+    @unittest.skipUnless(type(sys.stderr.buffer.raw).__name__ == "_WindowsConsoleIO",
+                         "Test only applicable on Windows with console IO")
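
The check in the suggested decorator assumes sys.stderr has both a .buffer and a .raw attribute, which may not hold when a test runner replaces the stream. A slightly more defensive variant is sketched below; the helper names stream_raw_type_name and stderr_is_windows_console are made up and are not part of the suggestion.

```python
import sys


def stream_raw_type_name(stream):
    """Return the class name of stream.buffer.raw, or None if the stream
    does not expose that chain of attributes."""
    buffer = getattr(stream, "buffer", None)
    raw = getattr(buffer, "raw", None)
    return type(raw).__name__ if raw is not None else None


def stderr_is_windows_console():
    """True only if sys.stderr is backed by the Windows console I/O class."""
    return stream_raw_type_name(sys.stderr) == "_WindowsConsoleIO"


print(stream_raw_type_name(sys.stderr), stderr_is_windows_console())
```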

@StanFromIreland
Contributor

Fails on buildbot:

======================================================================
FAIL: test_nonascii (test.test_readline.TestReadline.test_nonascii)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/stan/buildarea/pull_request.stan-raspbian/build/Lib/test/test_readline.py", line 315, in test_nonascii
    self.assertIn(b"result " + expected + b"\r\n", output)
    ~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: b"result '[\\xefnserted]|t\\xebxt[after]'\r\n" not found in bytearray(b"^A^B^B^B^B^B^B^B\t\tx\t\r\n[\xefnserted]|t\xeb[after]\x08\x08\x08\x08\x08\x08\x08text \'t\\xeb\'\r\nline \'[\\xefnserted]|t\\xeb[after]\'\r\nindexes 11 13\r\n\x07text \'t\\xeb\'\r\nline \'[\\xefnserted]|t\\xeb[after]\'\r\nindexes 11 13\r\nsubstitution \'t\\xeb\'\r\nmatches [\'t\\xebnt\', \'t\\xebxt\']\r\n\x1b[1@x\x1b[1@t\r\nresult \'[\\udcefnserted]|t\\udcebxt[after]\'\r\nhistory \'[\\xefnserted]|t\\xebxt[after]\'\r\n")
----------------------------------------------------------------------
Ran 14 tests in 2.151s

Presumably a result of this PR, since I have never seen this fail before. At least all the tests in #133677 are no longer problematic.

@vstinner
Member

FAIL: test_nonascii (test.test_readline.TestReadline.test_nonascii)

I suppose that you looked at ARM64 Raspbian PR. This buildbot has a special locale encoding:

== encodings: locale=ISO-8859-1 FS=utf-8

test.pythoninfo:

locale.getencoding: ISO-8859-1
os.environ[LANG]: en_IE
os.environ[LC_ALL]: en_IE

The locale en_IE doesn't use UTF-8 but ISO-8859-1. I can reproduce the issue with the fr_FR locale, which also uses ISO-8859-1:

$ LANG=fr_FR ./python -m test -v test_readline  -u all
...
FAIL: test_nonascii (test.test_readline.TestReadline.test_nonascii)
...

I can also reproduce the issue in the main branch using the fr_FR locale and the command:

PYTHONUTF8=1 LANG=fr_FR ./python -m test -v test_readline  -u all
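
One plausible reading of the buildbot output, offered as an illustration and not as a diagnosis confirmed in this thread, is an encoding mismatch: the bytes are ISO-8859-1, but with UTF-8 mode on they are decoded as UTF-8 with the surrogateescape handler. The sketch below decodes the same byte string both ways. The two results correspond to the "history" and "result" lines in the failing output.

```python
# Bytes as an ISO-8859-1 terminal would deliver them (taken from the test output).
RAW = b"[\xefnserted]|t\xebxt[after]"

# Decoded with the locale's real encoding.
as_latin1 = RAW.decode("iso-8859-1")

# Decoded as UTF-8: 0xEF and 0xEB are invalid here, so surrogateescape
# maps them to the lone surrogates U+DCEF and U+DCEB.
as_utf8 = RAW.decode("utf-8", errors="surrogateescape")

print(ascii(as_latin1))  # '[\xefnserted]|t\xebxt[after]'
print(ascii(as_utf8))    # '[\udcefnserted]|t\udcebxt[after]'
```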

@vstinner
Member

test_cmd_line fails on Windows. You can try @methane's suggestion.

FAIL: test_python_legacy_windows_stdio (test.test_cmd_line.CmdLineTest.test_python_legacy_windows_stdio)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "D:\a\cpython\cpython\Lib\test\test_cmd_line.py", line 991, in test_python_legacy_windows_stdio
    self.assertEqual('_io._WindowsConsoleIO', out)
    ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: '_io._WindowsConsoleIO' != '_io.FileIO'
- _io._WindowsConsoleIO
+ _io.FileIO

@AA-Turner
Member, Author

I've paused work on this PR as Serhiy asked to wait until all issues with running tests in a non-ASCII/UTF-8 locale have been fixed.

I'd like to try to find a way to properly test legacy Windows stdio on CI, but I agree that Inada-san's suggestion will work otherwise.

@methane
Member

I have created a PR to fix test_python_legacy_windows_stdio: #134080

@methane
Member

I can reproduce the test_readline failure without UTF-8 mode on macOS, on the main branch.

$ ./python.exe -c 'import sys; print(sys.flags.utf8_mode)'
0
$ export LANG=en_US.ISO8859-1
$ locale
LANG="en_US.ISO8859-1"
LC_COLLATE="en_US.ISO8859-1"
LC_CTYPE="en_US.ISO8859-1"
LC_MESSAGES="en_US.ISO8859-1"
LC_MONETARY="en_US.ISO8859-1"
LC_NUMERIC="en_US.ISO8859-1"
LC_TIME="en_US.ISO8859-1"
LC_ALL=
$ ./python.exe Lib/test/test_readline.py
readline version: 0x402
readline runtime version: 0x402
readline library version: 'EditLine wrapper'
use libedit emulation? True
s..s.....s.F..
======================================================================
FAIL: test_nonascii (__main__.TestReadline.test_nonascii)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/inada-n/work/python/cpython/Lib/test/test_readline.py", line 298, in test_nonascii
    self.assertIn(b"text 't\\xeb'\r\n", output)
    ~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: b"text 't\\xeb'\r\n" not found in bytearray(b"^A^B^B^B^B^B^B^B\t\tx\t\r\n[\xefnserted]|t\xeb[after]\x08\x08\x08\x08\x08\x08\x08\x1b[1@x[\x08\r\nresult \'[\\xefnserted]|t\\xebx[after]\'\r\nhistory \'[\\xefnserted]|t\\xebx[after]\'\r\n")
----------------------------------------------------------------------
Ran 14 tests in 0.235s

FAILED (failures=1, skipped=3)

Reviewers

@vstinner left review comments

@methane left review comments

@zooba left review comments

@gpshead: awaiting requested review (gpshead is a code owner)

@ericsnowcurrently: awaiting requested review (ericsnowcurrently is a code owner)


6 participants
@AA-Turner @StanFromIreland @methane @bedevere-bot @vstinner @zooba
