GH-133711: Enable UTF-8 mode by default (PEP 686) #133712
test_python_legacy_windows_stdio tests pipe encoding, but it should test console I/O encoding.
assert_python_ok uses PIPE for stdin/stdout/stderr.
(I haven't tested it yet because I don't use Windows daily.)
If the UTF-8 mode is disabled, the interpreter defaults to using
the current locale settings, *unless* the current locale is identified
as a legacy ASCII-based locale (as described for :envvar:`PYTHONCOERCECLOCALE`),
and locale coercion is either disabled or fails.
Is PEP 538 (Coercing the legacy C locale to a UTF-8 based locale) still relevant if UTF-8 mode is enabled by default? It may make disabling the UTF-8 mode more complicated. It's just an open question, I don't have the answer.
When UTF-8 mode is disabled:
- If the locale is not C or POSIX, the locale encoding is used.
- If the locale is C or POSIX and PYTHONCOERCECLOCALE is not set, the locale is changed to C.UTF-8.
  - Although UTF-8 mode is disabled, the locale encoding is UTF-8.
- If the locale is C or POSIX and PYTHONCOERCECLOCALE is set to 0 (coercion disabled), the locale encoding will be ASCII.
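To see which of these cases applies to a running interpreter, something like the following may help. It is only a minimal sketch: the helper name encoding_report is hypothetical, and it assumes nothing beyond sys.flags.utf8_mode, the locale module, and the two environment variables discussed in this thread. locale.getencoding() was added in Python 3.11, so the sketch reports None for it on older versions.

```python
import locale
import os
import sys


def encoding_report():
    """Collect the settings that influence Python's default text encoding.

    Hypothetical helper for illustration; not part of CPython.
    """
    # locale.getencoding() (Python 3.11+) is looked up defensively so the
    # sketch also runs on older interpreters.
    get_locale_encoding = getattr(locale, "getencoding", None)
    return {
        # True when UTF-8 mode is active (-X utf8, PYTHONUTF8=1, or default).
        "utf8_mode": bool(sys.flags.utf8_mode),
        # Encoding Python uses by default for text I/O.
        "preferred_encoding": locale.getpreferredencoding(False),
        # Encoding of the current locale, where the API is available.
        "locale_encoding": get_locale_encoding() if get_locale_encoding else None,
        # Current LC_CTYPE locale name (query only, nothing is changed).
        "lc_ctype": locale.setlocale(locale.LC_CTYPE),
        # Raw environment settings, None when unset.
        "PYTHONUTF8": os.environ.get("PYTHONUTF8"),
        "PYTHONCOERCECLOCALE": os.environ.get("PYTHONCOERCECLOCALE"),
    }


if __name__ == "__main__":
    for key, value in encoding_report().items():
        print(f"{key}: {value!r}")
```

Running it under different combinations of LC_ALL, PYTHONUTF8 and PYTHONCOERCECLOCALE is one way to check the list above on a given platform. The results depend on the locales installed there.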
Doc/whatsnew/3.15.rst Outdated
* Python UTF-8 mode is now enabled by default.
  It may be disabled by setting :envvar:`PYTHONUTF8=0 <PYTHONUTF8>` as
  an environment variable or by using the :option:`-X utf8=0 <-X>` flag.
  See :pep:`686` for further details.
I feel like we can probably put some more explanation in here, such as that it affects TextIOWrapper and hence open(). The current description doesn't sound as scary as it needs to, in my opinion.
Along the lines of: "Python UTF-8 mode is now enabled by default. This means that (files/console/etc.) will now use UTF-8 regardless of system settings, unless specifically overridden in code (typically with an encoding= argument)."
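As an illustration of the "overridden in code" part, here is a small sketch (the helper name roundtrip is hypothetical) that passes encoding= explicitly to open(). With an explicit encoding, the bytes written do not depend on UTF-8 mode or on the locale.

```python
import os
import tempfile


def roundtrip(text, encoding):
    """Write text to a temporary file and read it back.

    The encoding is named explicitly for both the write and the read, so
    the result is independent of the interpreter's default encoding.
    Returns (raw_bytes_on_disk, decoded_text).
    """
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        with open(path, "w", encoding=encoding) as f:
            f.write(text)
        with open(path, "rb") as f:
            raw = f.read()
        with open(path, "r", encoding=encoding) as f:
            return raw, f.read()
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(roundtrip("café", "utf-8"))
    print(roundtrip("café", "latin-1"))
```

Code that omits encoding= is what the default change affects. Code written like the sketch should behave the same before and after it.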
To be clear, it's nothing new. But we shouldn't assume that everyone already knows what UTF-8 mode implies. There are many more people out there who haven't ever thought about it than those who are waiting for it to be the default.
Another effect of the UTF-8 Mode is that Python ignores the locale encoding.
I've expanded the What's New entry, please see https://cpython-previews--133712.org.readthedocs.build/en/133712/whatsnew/3.15.html#other-language-changes
@@ -75,7 +75,30 @@ New features
Other language changes
======================
* Python now uses UTF-8_ as the default encoding, independent of the system's
You might mention the UTF-8 Mode earlier since it has other side effects documented in the UTF-8 Mode section, such as changing sys.stdout error handler and ignoring the locale encoding.
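One way to observe these side effects is to inspect the encoding and error handler of the standard streams. This is a rough sketch with hypothetical helper names. What it reports depends on the interpreter version, the platform, and whether the streams are redirected.

```python
import io
import sys


def describe_stream(stream):
    """Return (encoding, errors) for a text stream, or None for missing attributes."""
    return (getattr(stream, "encoding", None), getattr(stream, "errors", None))


def describe_std_streams():
    """Map each standard stream name to its (encoding, error handler) pair."""
    return {
        name: describe_stream(getattr(sys, name, None))
        for name in ("stdin", "stdout", "stderr")
    }


if __name__ == "__main__":
    print("utf8_mode:", sys.flags.utf8_mode)
    for name, (encoding, errors) in describe_std_streams().items():
        print(f"{name}: encoding={encoding!r} errors={errors!r}")
```

Comparing the output of `python -X utf8=0` and `python -X utf8=1` on the same machine should show the difference this comment refers to.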
# Conflicts:
# Doc/whatsnew/3.15.rst
code = 'import sys; print(type(sys.stderr.buffer.raw))'
env = {'PYTHONLEGACYWINDOWSSTDIO': str(int(legacy_windows_stdio))}
# use stderr=None as legacy_windows_stdio doesn't affect pipes
p = spawn_python('-c', code, env=env, stderr=None)
GitHub Actions would run the test with a pipe, not with a console.
In that case, stderr=None is still a pipe.
Adding creationflags=CREATE_NEW_CONSOLE would allocate a new console for the subprocess.
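A sketch of that idea using plain subprocess rather than the test helpers. The names CHILD_CODE and child_sees_console are hypothetical. subprocess.CREATE_NEW_CONSOLE exists only on Windows, so the flag is applied only there, and whether the child really ends up attached to a console on a given CI runner is not guaranteed.

```python
import subprocess
import sys

# Child program: exit status 0 if sys.stderr is backed by the Windows
# console I/O class, 1 otherwise (always 1 on non-Windows platforms).
CHILD_CODE = (
    "import sys\n"
    "raw = getattr(getattr(sys.stderr, 'buffer', None), 'raw', None)\n"
    "sys.exit(0 if type(raw).__name__ == '_WindowsConsoleIO' else 1)\n"
)


def child_sees_console(new_console=False):
    """Run a child interpreter and report whether its stderr is a console.

    stderr is not redirected, so the child inherits the parent's stderr
    unless a new console is requested (Windows only).
    """
    kwargs = {}
    if new_console and sys.platform == "win32":
        # Ask Windows to allocate a fresh console for the child process.
        kwargs["creationflags"] = subprocess.CREATE_NEW_CONSOLE
    proc = subprocess.run([sys.executable, "-c", CHILD_CODE], **kwargs)
    return proc.returncode == 0


if __name__ == "__main__":
    print("inherited stderr is a console:", child_sees_console())
    print("with new console:", child_sees_console(new_console=True))
```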
creationflags=CREATE_NEW_CONSOLE didn't fix the test on GitHub Actions...
@@ -972,10 +976,19 @@ def test_python_legacy_windows_fs_encoding(self):
@unittest.skipUnless(support.MS_WINDOWS, 'Test only applicable on Windows')
@unittest.skipUnless(support.MS_WINDOWS, 'Test only applicable on Windows')
@unittest.skipUnless(type(sys.stderr.buffer.raw).__name__ == "_WindowsConsoleIO",
                     "Test only applicable on Windows with console IO")
Fails on buildbot:
Presumably a result of this PR, since I have never seen this fail before. At least all the tests in #133677 are no longer problematic.
I suppose that you looked at ARM64 Raspbian PR. This buildbot has a special locale encoding:
test.pythoninfo:
The locale en_IE doesn't use UTF-8 but ISO-8859-1. I can reproduce the issue with
I can also reproduce the issue in the main branch using the
test_cmd_line fails on Windows. You can try @methane's suggestion.
I've paused work on this PR as Serhiy asked to wait until all issues with running tests in a non-ASCII/UTF-8 locale have been fixed. I'd like to try and find a solution to properly testing legacy Windows stdio whilst on CI, but I agree that Inada-san's suggestion will work otherwise.
I have created a PR to fix test_python_legacy_windows_stdio.
I can reproduce the test_readline failure without UTF-8 mode on macOS on the main branch.
📚 Documentation preview 📚: https://cpython-previews--133712.org.readthedocs.build/