Commit c89a66f

GH-133711: Enable UTF-8 mode by default (PEP 686) (#133712)
Co-authored-by: Victor Stinner <vstinner@python.org>
1 parent f320c95, commit c89a66f

14 files changed: +93 -85 lines changed


‎Doc/c-api/init_config.rst

Lines changed: 1 addition & 3 deletions
@@ -975,9 +975,7 @@ PyPreConfig
       Set to ``0`` or ``1`` by the :option:`-X utf8 <-X>` command line option
       and the :envvar:`PYTHONUTF8` environment variable.
 
-      Also set to ``1`` if the ``LC_CTYPE`` locale is ``C`` or ``POSIX``.
-
-      Default: ``-1`` in Python config and ``0`` in isolated config.
+      Default: ``1``.
 
 
 .. _c-preinit:

‎Doc/library/os.rst

Lines changed: 17 additions & 20 deletions
@@ -108,6 +108,12 @@ Python UTF-8 Mode
 .. versionadded:: 3.7
    See :pep:`540` for more details.
 
+.. versionchanged:: next
+
+   Python UTF-8 mode is now enabled by default (:pep:`686`).
+   It may be disabled with by setting :envvar:`PYTHONUTF8=0 <PYTHONUTF8>` as
+   an environment variable or by using the :option:`-X utf8=0 <-X>` command line option.
+
 The Python UTF-8 Mode ignores the :term:`locale encoding` and forces the usage
 of the UTF-8 encoding:
 

@@ -139,31 +145,22 @@ level APIs also exhibit different default behaviours:
   default so that attempting to open a binary file in text mode is likely
   to raise an exception rather than producing nonsense data.
 
-The :ref:`Python UTF-8 Mode <utf8-mode>` is enabled if the LC_CTYPE locale is
-``C`` or ``POSIX`` at Python startup (see the :c:func:`PyConfig_Read`
-function).
-
-It can be enabled or disabled using the :option:`-X utf8 <-X>` command line
-option and the :envvar:`PYTHONUTF8` environment variable.
-
-If the :envvar:`PYTHONUTF8` environment variable is not set at all, then the
-interpreter defaults to using the current locale settings, *unless* the current
-locale is identified as a legacy ASCII-based locale (as described for
-:envvar:`PYTHONCOERCECLOCALE`), and locale coercion is either disabled or
-fails. In such legacy locales, the interpreter will default to enabling UTF-8
-mode unless explicitly instructed not to do so.
-
-The Python UTF-8 Mode can only be enabled at the Python startup. Its value
+The :ref:`Python UTF-8 Mode <utf8-mode>` is enabled by default.
+It can be disabled using the :option:`-X utf8=0 <-X>` command line
+option or the :envvar:`PYTHONUTF8=0 <PYTHONUTF8>` environment variable.
+The Python UTF-8 Mode can only be disabled at Python startup. Its value
 can be read from :data:`sys.flags.utf8_mode <sys.flags>`.
 
+If the UTF-8 mode is disabled, the interpreter defaults to using
+the current locale settings, *unless* the current locale is identified
+as a legacy ASCII-based locale (as described for :envvar:`PYTHONCOERCECLOCALE`),
+and locale coercion is either disabled or fails.
+In such legacy locales, the interpreter will default to enabling UTF-8 mode
+unless explicitly instructed not to do so.
+
 See also the :ref:`UTF-8 mode on Windows <win-utf8-mode>`
 and the :term:`filesystem encoding and error handler`.
 
-.. seealso::
-
-   :pep:`686`
-      Python 3.15 will make :ref:`utf8-mode` default.
-
 
 .. _os-procinfo:
 
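
The hunk above says the mode is fixed at interpreter startup and can be read from `sys.flags.utf8_mode`. The following is a minimal sketch of how that could be checked, and is not part of the commit. `utf8_mode_in_child` is a hypothetical helper name; it starts a child interpreter with an explicit `-X utf8=...` option and reads the flag back.

```python
import subprocess
import sys


def utf8_mode_in_child(x_option):
    """Return sys.flags.utf8_mode as reported by a child interpreter.

    x_option is the value passed after -X, e.g. "utf8=0" or "utf8=1".
    (Hypothetical helper for illustration; not part of CPython.)
    """
    cmd = [sys.executable, "-X", x_option, "-c",
           "import sys; print(sys.flags.utf8_mode)"]
    proc = subprocess.run(cmd, stdout=subprocess.PIPE, text=True, check=True)
    return int(proc.stdout.strip())


if __name__ == "__main__":
    # The value for the current process cannot be changed after startup.
    print("this process:", sys.flags.utf8_mode)
    print("child with -X utf8=0:", utf8_mode_in_child("utf8=0"))
    print("child with -X utf8=1:", utf8_mode_in_child("utf8=1"))
```

An explicit `-X utf8=...` option is used in both calls so that the result does not depend on which default the running Python version applies.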

‎Doc/using/windows.rst

Lines changed: 17 additions & 12 deletions
@@ -1006,6 +1006,9 @@ UTF-8 mode
 ==========
 
 .. versionadded:: 3.7
+.. versionchanged:: next
+
+   Python UTF-8 mode is now enabled by default (:pep:`686`).
 
 Windows still uses legacy encodings for the system encoding (the ANSI Code
 Page). Python uses it for the default encoding of text files (e.g.
@@ -1014,20 +1017,22 @@ Page). Python uses it for the default encoding of text files (e.g.
 This may cause issues because UTF-8 is widely used on the internet
 and most Unix systems, including WSL (Windows Subsystem for Linux).
 
-You can use the :ref:`Python UTF-8 Mode <utf8-mode>` to change the default text
-encoding to UTF-8. You can enable the :ref:`Python UTF-8 Mode <utf8-mode>` via
-the ``-X utf8`` command line option, or the ``PYTHONUTF8=1`` environment
-variable. See :envvar:`PYTHONUTF8` for enabling UTF-8 mode, and
-:ref:`setting-envvars` for how to modify environment variables.
-
-When the :ref:`Python UTF-8 Mode <utf8-mode>` is enabled, you can still use the
+The :ref:`Python UTF-8 Mode <utf8-mode>`, enabled by default, can help by
+changing the default text encoding to UTF-8.
+When the :ref:`UTF-8 mode <utf8-mode>` is enabled, you can still use the
 system encoding (the ANSI Code Page) via the "mbcs" codec.
 
-Note that adding ``PYTHONUTF8=1`` to the default environment variables
-will affect all Python 3.7+ applications on your system.
-If you have any Python 3.7+ applications which rely on the legacy
-system encoding, it is recommended to set the environment variable
-temporarily or use the ``-X utf8`` command line option.
+You can disable the :ref:`Python UTF-8 Mode <utf8-mode>` via
+the ``-X utf8=0`` command line option, or the ``PYTHONUTF8=0`` environment
+variable. See :envvar:`PYTHONUTF8` for disabling UTF-8 mode, and
+:ref:`setting-envvars` for how to modify environment variables.
+
+.. hint::
+   Adding ``PYTHONUTF8={0,1}`` to the default environment variables
+   will affect all Python 3.7+ applications on your system.
+   If you have any Python 3.7+ applications which rely on the legacy
+   system encoding, it is recommended to set the environment variable
+   temporarily or use the ``-X utf8`` command line option.
 
 .. note::
    Even when UTF-8 mode is disabled, Python uses UTF-8 by default

‎Doc/whatsnew/3.15.rst

Lines changed: 25 additions & 1 deletion
@@ -172,11 +172,35 @@ production systems where traditional profiling approaches would be too intrusive
 Other language changes
 ======================
 
+* Python now uses UTF-8_ as the default encoding, independent of the system's
+  environment. This means that I/O operations without an explicit encoding,
+  e.g. ``open('flying-circus.txt')``, will use UTF-8.
+  UTF-8 is a widely-supported Unicode_ character encoding that has become a
+  *de facto* standard for representing text, including nearly every webpage
+  on the internet, many common file formats, programming languages, and more.
+
+  This only applies when no ``encoding`` argument is given. For best
+  compatibility between versions of Python, ensure that an explicit ``encoding``
+  argument is always provided. The :ref:`opt-in encoding warning <io-encoding-warning>`
+  can be used to identify code that may be affected by this change.
+  The special special ``encoding='locale'`` argument uses the current locale
+  encoding, and has been supported since Python 3.10.
+
+  To retain the previous behaviour, Python's UTF-8 mode may be disabled with
+  the :envvar:`PYTHONUTF8=0 <PYTHONUTF8>` environment variable or the
+  :option:`-X utf8=0 <-X>` command line option.
+
+  .. seealso:: :pep:`686` for further details.
+
+  .. _UTF-8: https://en.wikipedia.org/wiki/UTF-8
+  .. _Unicode: https://home.unicode.org/
+
+  (Contributed by Adam Turner in :gh:`133711`; PEP 686 written by Inada Naoki.)
+
 * Several error messages incorrectly using the term "argument" have been corrected.
   (Contributed by Stan Ulbrych in :gh:`133382`.)
 
 
-
 New modules
 ===========
 
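
The What's New entry above recommends always passing an explicit ``encoding`` argument so that behaviour is the same across Python versions. The following is a minimal sketch of that advice, not code from the commit. `roundtrip` is a hypothetical helper name and the sample string is arbitrary.

```python
import os
import tempfile


def roundtrip(text, encoding="utf-8"):
    """Write text to a temporary file and read it back.

    The encoding is passed explicitly on both open() calls, so the
    result does not depend on UTF-8 mode or on the locale encoding.
    (Hypothetical helper for illustration.)
    """
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        with open(path, "w", encoding=encoding) as f:
            f.write(text)
        with open(path, "r", encoding=encoding) as f:
            return f.read()
    finally:
        os.remove(path)


if __name__ == "__main__":
    sample = "e:\xe9, euro:\u20ac"
    assert roundtrip(sample) == sample
```

Code that really does want the locale encoding can pass ``encoding='locale'``, which the entry above notes has been supported since Python 3.10.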

‎Include/cpython/initconfig.h

Lines changed: 6 additions & 7 deletions
@@ -102,15 +102,14 @@ typedef struct PyPreConfig {
 
     /* Enable UTF-8 mode? (PEP 540)
 
-       Disabled by default (equals to 0).
+       If equal to 1, use the UTF-8 encoding and use "surrogateescape" for the
+       stdin & stdout error handlers.
 
-       Set to 1 by "-X utf8" and "-X utf8=1" command line options.
-       Set to 1 by PYTHONUTF8=1 environment variable.
+       Enabled by default (equal to 1; PEP 686), or if Py_UTF8Mode=1,
+       or if "-X utf8=1" or PYTHONUTF8=1.
 
-       Set to 0 by "-X utf8=0" and PYTHONUTF8=0.
-
-       If equals to -1, it is set to 1 if the LC_CTYPE locale is "C" or
-       "POSIX", otherwise it is set to 0. Inherit Py_UTF8Mode value value. */
+       Set to 0 by "-X utf8=0" or PYTHONUTF8=0.
+    */
     int utf8_mode;
 
     /* If non-zero, enable the Python Development Mode.

‎Lib/locale.py

Lines changed: 2 additions & 1 deletion
@@ -651,7 +651,8 @@ def getpreferredencoding(do_setlocale=True):
         if sys.flags.warn_default_encoding:
             import warnings
             warnings.warn(
-                "UTF-8 Mode affects locale.getpreferredencoding(). Consider locale.getencoding() instead.",
+                "UTF-8 Mode affects locale.getpreferredencoding(). "
+                "Consider locale.getencoding() instead.",
                 EncodingWarning, 2)
         if sys.flags.utf8_mode:
             return 'utf-8'

‎Lib/subprocess.py

Lines changed: 1 addition & 2 deletions
@@ -380,8 +380,7 @@ def _text_encoding():
 
     if sys.flags.utf8_mode:
         return "utf-8"
-    else:
-        return locale.getencoding()
+    return locale.getencoding()
 
 
 def call(*popenargs, timeout=None, **kwargs):
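
The two branches visible in this hunk can be reproduced outside `subprocess` as a standalone function. The following is a simplified sketch, not the library's implementation: `text_encoding_sketch` is a hypothetical name, and the fallback to `locale.getpreferredencoding(False)` is an assumption added here so the example also runs on Python versions older than 3.11, where `locale.getencoding()` does not exist.

```python
import locale
import sys


def text_encoding_sketch():
    """Choose a default text encoding from UTF-8 mode and the locale.

    Covers only the two branches shown in the diff above; the real
    subprocess._text_encoding() may do more. (Hypothetical helper.)
    """
    if sys.flags.utf8_mode:
        return "utf-8"
    # locale.getencoding() was added in Python 3.11; the fallback below
    # is an assumption of this sketch for older interpreters.
    getencoding = getattr(locale, "getencoding", None)
    if getencoding is not None:
        return getencoding()
    return locale.getpreferredencoding(False)


if __name__ == "__main__":
    print(text_encoding_sketch())
```

Dropping the `else:` as the commit does leaves the behaviour unchanged, because the first branch always returns.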

‎Lib/test/test_cmd_line.py

Lines changed: 6 additions & 1 deletion
@@ -300,6 +300,10 @@ def run_utf8_mode(arg):
             cmd = [sys.executable, '-X', 'utf8', '-c', code, arg]
             return subprocess.run(cmd, stdout=subprocess.PIPE, text=True)
 
+        def run_no_utf8_mode(arg):
+            cmd = [sys.executable, '-X', 'utf8=0', '-c', code, arg]
+            return subprocess.run(cmd, stdout=subprocess.PIPE, text=True)
+
         valid_utf8 = 'e:\xe9, euro:\u20ac, non-bmp:\U0010ffff'.encode('utf-8')
         # invalid UTF-8 byte sequences with a valid UTF-8 sequence
         # in the middle.
@@ -312,7 +316,8 @@ def run_utf8_mode(arg):
         )
         test_args = [valid_utf8, invalid_utf8]
 
-        for run_cmd in (run_default, run_c_locale, run_utf8_mode):
+        for run_cmd in (run_default, run_c_locale, run_utf8_mode,
+                        run_no_utf8_mode):
             with self.subTest(run_cmd=run_cmd):
                 for arg in test_args:
                     proc = run_cmd(arg)

‎Lib/test/test_embed.py

Lines changed: 2 additions & 8 deletions
@@ -543,7 +543,7 @@ class InitConfigTests(EmbeddingTestsMixin, unittest.TestCase):
         'configure_locale': True,
         'coerce_c_locale': False,
         'coerce_c_locale_warn': False,
-        'utf8_mode': False,
+        'utf8_mode': True,
     }
     if MS_WINDOWS:
         PRE_CONFIG_COMPAT.update({
@@ -560,7 +560,7 @@ class InitConfigTests(EmbeddingTestsMixin, unittest.TestCase):
         configure_locale=False,
         isolated=True,
         use_environment=False,
-        utf8_mode=False,
+        utf8_mode=True,
         dev_mode=False,
         coerce_c_locale=False,
     )
@@ -805,12 +805,6 @@ def get_expected_config(self, expected_preconfig, expected,
                         'stdio_encoding', 'stdio_errors'):
                 expected[key] = self.IGNORE_CONFIG
 
-        if not expected_preconfig['configure_locale']:
-            # UTF-8 Mode depends on the locale. There is no easy way
-            # to guess if UTF-8 Mode will be enabled or not if the locale
-            # is not configured.
-            expected_preconfig['utf8_mode'] = self.IGNORE_CONFIG
-
         if expected_preconfig['utf8_mode'] == 1:
             if expected['filesystem_encoding'] is self.GET_DEFAULT_CONFIG:
                 expected['filesystem_encoding'] = 'utf-8'

‎Lib/test/test_utf8_mode.py

Lines changed: 3 additions & 3 deletions
@@ -89,8 +89,8 @@ def test_env_var(self):
         # the UTF-8 mode
         if not self.posix_locale():
             # PYTHONUTF8 should be ignored if -E is used
-            out = self.get_output('-E', '-c', code, PYTHONUTF8='1')
-            self.assertEqual(out, '0')
+            out = self.get_output('-E', '-c', code, PYTHONUTF8='0')
+            self.assertEqual(out, '1')
 
         # invalid mode
         out = self.get_output('-c', code, PYTHONUTF8='xxx', failure=True)
@@ -116,7 +116,7 @@ def test_filesystemencoding(self):
             # PYTHONLEGACYWINDOWSFSENCODING disables the UTF-8 mode
             # and has the priority over -X utf8 and PYTHONUTF8
             out = self.get_output('-X', 'utf8', '-c', code,
-                                  PYTHONUTF8='strict',
+                                  PYTHONUTF8='xxx',
                                   PYTHONLEGACYWINDOWSFSENCODING='1')
             self.assertEqual(out, 'mbcs/replace')
 
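
The first hunk in this file relies on `-E` making the interpreter ignore `PYTHONUTF8`. The following is a minimal sketch of that interaction outside the test suite, not code from the commit. `utf8_mode_with_env` is a hypothetical helper name. The sketch only compares results between runs, because the value reported under `-E` depends on the Python version's default.

```python
import os
import subprocess
import sys

CODE = "import sys; print(sys.flags.utf8_mode)"


def utf8_mode_with_env(value, ignore_env):
    """Report sys.flags.utf8_mode of a child started with PYTHONUTF8=value.

    When ignore_env is true the child is started with -E, so PYTHON*
    environment variables are ignored. (Hypothetical helper.)
    """
    env = dict(os.environ, PYTHONUTF8=value)
    cmd = [sys.executable]
    if ignore_env:
        cmd.append("-E")
    cmd += ["-c", CODE]
    proc = subprocess.run(cmd, stdout=subprocess.PIPE, text=True,
                          env=env, check=True)
    return proc.stdout.strip()


if __name__ == "__main__":
    # Under -E the variable has no effect, so both runs report the same value.
    print(utf8_mode_with_env("0", True), utf8_mode_with_env("1", True))
    # Without -E the variable is honoured.
    print(utf8_mode_with_env("1", False))
```

With the change in this commit, the updated test expects the `-E` run to report `1` even when `PYTHONUTF8=0` is set, since the default is now enabled.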
