bpo-43510: Implement PEP 597 opt-in EncodingWarning. #19481


Merged
methane merged 51 commits into python:master from methane:open-encoding
Mar 29, 2021

51 commits
a3c014b  Raise a warning when encoding is omitted (methane, Apr 7, 2020)
050bd1b  add test (methane, Apr 12, 2020)
939f4a0  wrap encoding=None with text_encoding. (methane, Apr 12, 2020)
3c99777  Add io.LOCALE_ENCODING = "locale" (methane, Jan 29, 2021)
4016278  Add EncodingWarning. (methane, Jan 29, 2021)
c5c556c  Add sys.warn_default_encoding (methane, Jan 29, 2021)
d9a08c2  shorten option names (methane, Jan 30, 2021)
772648e  EncodingWarning extends Warning (methane, Jan 30, 2021)
1a8e305  make clinic (methane, Jan 30, 2021)
20966cd  fix test (methane, Jan 30, 2021)
2b80f42  remove wrong test case (methane, Jan 30, 2021)
760308c  fix exception_hierarchy.txt (methane, Jan 30, 2021)
a95dff2  Make sys.flags.encoding_warning int (methane, Jan 31, 2021)
31fb411  Fix text_embed. (methane, Jan 31, 2021)
096a0a3  Fix test_pickle (methane, Jan 31, 2021)
99fc938  configparser: use io.text_encoding() (methane, Feb 13, 2021)
6fdbcbc  Rename option names (methane, Feb 22, 2021)
3f362bc  Merge remote-tracking branch 'upstream/master' into open-encoding (methane, Mar 16, 2021)
674feff  Update docs (methane, Mar 16, 2021)
d9d850f  Add NEWS entry (methane, Mar 16, 2021)
16463ea  Add document for text_encoding and encoding="locale". (methane, Mar 17, 2021)
412d633  Suppress EncodingWarning from site.py (methane, Mar 17, 2021)
ee883d1  Remove io.LOCALE_ENCODING (methane, Mar 18, 2021)
6a15e2a  text_encoding() first argument is mandatory. (methane, Mar 18, 2021)
5d474b4  Apply suggestions from code review (methane, Mar 18, 2021)
c17016f  Simplify _PyPreCmdline and PyConfig (methane, Mar 18, 2021)
03f971c  Update EncodingWarning doc (methane, Mar 18, 2021)
9d26b7a  Update document (methane, Mar 19, 2021)
60e74cf  tweak warning message (methane, Mar 19, 2021)
a505b5f  Use stacklevel=2 for text_encoding() default (methane, Mar 19, 2021)
cbe22e2  fixup (methane, Mar 19, 2021)
a9f9f04  tweak for readability (methane, Mar 19, 2021)
3bea88f  make clinic (methane, Mar 19, 2021)
d260a4c  fix doc build error (methane, Mar 19, 2021)
049a269  tweak warning message (methane, Mar 19, 2021)
018ba64  fixup (methane, Mar 19, 2021)
3a9623e  Fix subprocess (methane, Mar 23, 2021)
737059e  Update Doc/library/io.rst (methane, Mar 23, 2021)
6a62211  Update Doc/library/io.rst (methane, Mar 23, 2021)
54c7dc6  Update Doc/library/io.rst (methane, Mar 23, 2021)
5b2830b  Update Doc/library/io.rst (methane, Mar 23, 2021)
14f2a6e  Apply suggestions from code review (methane, Mar 23, 2021)
06e2a32  Move EncodingWarnings (methane, Mar 23, 2021)
27d49d2  fix comment (methane, Mar 23, 2021)
80f4644  fix text_encoding() docstring (methane, Mar 23, 2021)
6ad0e7f  update what's new (methane, Mar 23, 2021)
73b27f1  fix doc build (methane, Mar 23, 2021)
c149d65  Update Doc/library/io.rst (methane, Mar 24, 2021)
4eb7655  Apply suggestions from code review (methane, Mar 24, 2021)
e3bce76  Apply suggestions from code review (methane, Mar 24, 2021)
c089fd7  Update Doc/library/io.rst (methane, Mar 24, 2021)
9 changes: 9 additions & 0 deletions Doc/c-api/init_config.rst
@@ -583,6 +583,15 @@ PyConfig

Default: ``0``.

.. c:member:: int warn_default_encoding

If non-zero, emit an :exc:`EncodingWarning` when :class:`io.TextIOWrapper`
uses its default encoding. See :ref:`io-encoding-warning` for details.

Default: ``0``.

.. versionadded:: 3.10

.. c:member:: wchar_t* check_hash_pycs_mode

Control the validation behavior of hash-based ``.pyc`` files:
9 changes: 9 additions & 0 deletions Doc/library/exceptions.rst
@@ -741,6 +741,15 @@ The following exceptions are used as warning categories; see the
Base class for warnings related to Unicode.


.. exception:: EncodingWarning

Base class for warnings related to encodings.

See :ref:`io-encoding-warning` for details.

.. versionadded:: 3.10


.. exception:: BytesWarning

Base class for warnings related to :class:`bytes` and :class:`bytearray`.
81 changes: 81 additions & 0 deletions Doc/library/io.rst
@@ -106,6 +106,56 @@ stream by opening a file in binary mode with buffering disabled::
The raw stream API is described in detail in the docs of :class:`RawIOBase`.


.. _io-text-encoding:

Text Encoding
-------------

The default encoding of :class:`TextIOWrapper` and :func:`open` is
locale-specific (:func:`locale.getpreferredencoding(False) <locale.getpreferredencoding>`).

However, many developers forget to specify the encoding when opening text files
encoded in UTF-8 (e.g. JSON, TOML, Markdown) since most Unix
platforms use a UTF-8 locale by default. This causes bugs because the locale
encoding is not UTF-8 for most Windows users. For example::

   # May not work on Windows when the file contains non-ASCII characters.
   with open("README.md") as f:
       long_description = f.read()

Additionally, while there is no concrete plan as of yet, Python may change
the default text file encoding to UTF-8 in the future.

Accordingly, it is highly recommended that you specify the encoding
explicitly when opening text files. If you want to use UTF-8, pass
``encoding="utf-8"``. To use the current locale encoding,
``encoding="locale"`` is supported in Python 3.10.

When you need to run existing code on Windows that attempts to open
UTF-8 files using the default locale encoding, you can enable the UTF-8
mode. See :ref:`UTF-8 mode on Windows <win-utf8-mode>`.

.. _io-encoding-warning:

Opt-in EncodingWarning
^^^^^^^^^^^^^^^^^^^^^^

.. versionadded:: 3.10
See :pep:`597` for more details.

To find where the default locale encoding is used, you can enable
the ``-X warn_default_encoding`` command line option or set the
:envvar:`PYTHONWARNDEFAULTENCODING` environment variable, which will
emit an :exc:`EncodingWarning` when the default encoding is used.

If you are providing an API that uses :func:`open` or
:class:`TextIOWrapper` and passes ``encoding=None`` as a parameter, you
can use :func:`text_encoding` so that callers of the API will emit an
:exc:`EncodingWarning` if they don't pass an ``encoding``. However,
please consider using UTF-8 by default (i.e. ``encoding="utf-8"``) for
new APIs.


High-level Module Interface
---------------------------

@@ -143,6 +193,32 @@ High-level Module Interface
.. versionadded:: 3.8


.. function:: text_encoding(encoding, stacklevel=2)

This is a helper function for callables that use :func:`open` or
:class:`TextIOWrapper` and have an ``encoding=None`` parameter.

This function returns *encoding* if it is not ``None`` and ``"locale"`` if
*encoding* is ``None``.

This function emits an :class:`EncodingWarning` if
:data:`sys.flags.warn_default_encoding <sys.flags>` is true and *encoding*
is ``None``. *stacklevel* specifies where the warning is emitted.
For example::

   def read_text(path, encoding=None):
       encoding = io.text_encoding(encoding)  # stacklevel=2
       with open(path, encoding=encoding) as f:
           return f.read()

In this example, an :class:`EncodingWarning` is emitted for the caller of
``read_text()``.

See :ref:`io-text-encoding` for more information.

.. versionadded:: 3.10


.. exception:: BlockingIOError

This is a compatibility alias for the builtin :exc:`BlockingIOError`
@@ -879,6 +955,8 @@ Text I/O
*encoding* gives the name of the encoding that the stream will be decoded or
encoded with. It defaults to
:func:`locale.getpreferredencoding(False) <locale.getpreferredencoding>`.
``encoding="locale"`` can be used to specify the current locale's encoding
explicitly. See :ref:`io-text-encoding` for more information.

*errors* is an optional string that specifies how encoding and decoding
errors are to be handled. Pass ``'strict'`` to raise a :exc:`ValueError`
@@ -930,6 +1008,9 @@ Text I/O
locale encoding using :func:`locale.setlocale`, use the current locale
encoding instead of the user preferred encoding.

.. versionchanged:: 3.10
The *encoding* argument now supports the ``"locale"`` dummy encoding name.

:class:`TextIOWrapper` provides these data attributes and methods in
addition to those from :class:`TextIOBase` and :class:`IOBase`:

15 changes: 15 additions & 0 deletions Doc/using/cmdline.rst
@@ -453,6 +453,9 @@ Miscellaneous options
* ``-X pycache_prefix=PATH`` enables writing ``.pyc`` files to a parallel
tree rooted at the given directory instead of to the code tree. See also
:envvar:`PYTHONPYCACHEPREFIX`.
* ``-X warn_default_encoding`` issues an :class:`EncodingWarning` when the
locale-specific default encoding is used for opening files.
See also :envvar:`PYTHONWARNDEFAULTENCODING`.

It also allows passing arbitrary values and retrieving them through the
:data:`sys._xoptions` dictionary.
@@ -482,6 +485,9 @@ Miscellaneous options

The ``-X showalloccount`` option has been removed.

.. versionadded:: 3.10
The ``-X warn_default_encoding`` option.

.. deprecated-removed:: 3.9 3.10
The ``-X oldparser`` option.

@@ -907,6 +913,15 @@ conflict.

.. versionadded:: 3.7

.. envvar:: PYTHONWARNDEFAULTENCODING

If this environment variable is set to a non-empty string, issue an
:class:`EncodingWarning` when the locale-specific default encoding is used.

See :ref:`io-encoding-warning` for details.

.. versionadded:: 3.10


Debug-mode variables
~~~~~~~~~~~~~~~~~~~~
24 changes: 24 additions & 0 deletions Doc/whatsnew/3.10.rst
@@ -444,6 +444,30 @@ For the full specification see :pep:`634`. Motivation and rationale
are in :pep:`635`, and a longer tutorial is in :pep:`636`.


.. _whatsnew310-pep597:

Optional ``EncodingWarning`` and ``encoding="locale"`` option
-------------------------------------------------------------

The default encoding of :class:`TextIOWrapper` and :func:`open` is
platform and locale dependent. Since UTF-8 is used on most Unix
platforms, omitting the ``encoding`` option when opening UTF-8 files
(e.g. JSON, YAML, TOML, Markdown) is a very common bug. For example::

   # BUG: "rb" mode or encoding="utf-8" should be used.
   with open("data.json") as f:
       data = json.load(f)

To find this type of bug, an optional ``EncodingWarning`` is added.
It is emitted when :data:`sys.flags.warn_default_encoding <sys.flags>`
is true and the locale-specific default encoding is used.

The ``-X warn_default_encoding`` option and :envvar:`PYTHONWARNDEFAULTENCODING`
are added to enable the warning.

See :ref:`io-text-encoding` for more information.


New Features Related to Type Annotations
========================================

1 change: 1 addition & 0 deletions Include/cpython/initconfig.h
@@ -153,6 +153,7 @@ typedef struct PyConfig {
    PyWideStringList warnoptions;
    int site_import;
    int bytes_warning;
    int warn_default_encoding;
    int inspect;
    int interactive;
    int optimization_level;
1 change: 1 addition & 0 deletions Include/internal/pycore_initconfig.h
@@ -102,6 +102,7 @@ typedef struct {
    int isolated;             /* -I option */
    int use_environment;      /* -E option */
    int dev_mode;             /* -X dev and PYTHONDEVMODE */
    int warn_default_encoding;  /* -X warn_default_encoding and PYTHONWARNDEFAULTENCODING */
} _PyPreCmdline;

#define _PyPreCmdline_INIT \
1 change: 1 addition & 0 deletions Include/pyerrors.h
@@ -146,6 +146,7 @@ PyAPI_DATA(PyObject *) PyExc_FutureWarning;
PyAPI_DATA(PyObject *) PyExc_ImportWarning;
PyAPI_DATA(PyObject *) PyExc_UnicodeWarning;
PyAPI_DATA(PyObject *) PyExc_BytesWarning;
PyAPI_DATA(PyObject *) PyExc_EncodingWarning;
PyAPI_DATA(PyObject *) PyExc_ResourceWarning;


47 changes: 37 additions & 10 deletions Lib/_pyio.py
@@ -40,6 +40,29 @@
_CHECK_ERRORS = _IOBASE_EMITS_UNRAISABLE


def text_encoding(encoding, stacklevel=2):
    """
    A helper function to choose the text encoding.

    When encoding is not None, just return it.
    Otherwise, return the default text encoding (i.e. "locale").

    This function emits an EncodingWarning if *encoding* is None and
    sys.flags.warn_default_encoding is true.

    This can be used in APIs with an encoding=None parameter
    that pass it to TextIOWrapper or open.
    However, please consider using encoding="utf-8" for new APIs.
    """
    if encoding is None:
        encoding = "locale"
        if sys.flags.warn_default_encoding:
            import warnings
            warnings.warn("'encoding' argument not specified.",
                          EncodingWarning, stacklevel + 1)
    return encoding


def open(file, mode="r", buffering=-1, encoding=None, errors=None,
         newline=None, closefd=True, opener=None):

@@ -248,6 +271,7 @@ def open(file, mode="r", buffering=-1, encoding=None, errors=None,
        result = buffer
        if binary:
            return result
        encoding = text_encoding(encoding)
        text = TextIOWrapper(buffer, encoding, errors, newline, line_buffering)
        result = text
        text.mode = mode
@@ -2004,19 +2028,22 @@ class TextIOWrapper(TextIOBase):
     def __init__(self, buffer, encoding=None, errors=None, newline=None,
                  line_buffering=False, write_through=False):
         self._check_newline(newline)
-        if encoding is None:
+        encoding = text_encoding(encoding)
+
+        if encoding == "locale":
             try:
-                encoding = os.device_encoding(buffer.fileno())
+                encoding = os.device_encoding(buffer.fileno()) or "locale"
             except (AttributeError, UnsupportedOperation):
                 pass
-            if encoding is None:
-                try:
-                    import locale
-                except ImportError:
-                    # Importing locale may fail if Python is being built
-                    encoding = "ascii"
-                else:
-                    encoding = locale.getpreferredencoding(False)
+
+        if encoding == "locale":
+            try:
+                import locale
+            except ImportError:
+                # Importing locale may fail if Python is being built
+                encoding = "utf-8"

Review comment (Member):
I saw what you did there! :-D Mention it in the final commit message (I didn't read your 24 commit messages, GitHub UI isn't convenient for that :-( ).

+            else:
+                encoding = locale.getpreferredencoding(False)

         if not isinstance(encoding, str):
             raise ValueError("invalid encoding: %r" % encoding)
1 change: 1 addition & 0 deletions Lib/bz2.py
@@ -311,6 +311,7 @@ def open(filename, mode="rb", compresslevel=9,
    binary_file = BZ2File(filename, bz_mode, compresslevel=compresslevel)

    if "t" in mode:
        encoding = io.text_encoding(encoding)
        return io.TextIOWrapper(binary_file, encoding, errors, newline)
    else:
        return binary_file
1 change: 1 addition & 0 deletions Lib/configparser.py
@@ -690,6 +690,7 @@ def read(self, filenames, encoding=None):
        """
        if isinstance(filenames, (str, bytes, os.PathLike)):
            filenames = [filenames]
        encoding = io.text_encoding(encoding)
        read_ok = []
        for filename in filenames:
            try:
1 change: 1 addition & 0 deletions Lib/gzip.py
@@ -62,6 +62,7 @@ def open(filename, mode="rb", compresslevel=_COMPRESS_LEVEL_BEST,
        raise TypeError("filename must be a str or bytes object, or a file")

    if "t" in mode:
        encoding = io.text_encoding(encoding)
        return io.TextIOWrapper(binary_file, encoding, errors, newline)
    else:
        return binary_file
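
From the caller's side, the text-mode branch above means that ``gzip.open()`` (and likewise ``bz2.open()`` and ``lzma.open()``) only falls back to the locale encoding when no ``encoding`` is given. A minimal sketch of the explicit form follows; the helper name ``gzip_roundtrip`` and the sample text are hypothetical.

```python
import gzip
import os
import tempfile


def gzip_roundtrip(text):
    """Compress text to a temporary .gz file and read it back as UTF-8.

    Hypothetical helper, used only for this illustration.
    """
    fd, path = tempfile.mkstemp(suffix=".gz")
    os.close(fd)
    try:
        # "wt"/"rt" select text mode; the explicit encoding means the
        # locale-dependent default is never consulted.
        with gzip.open(path, "wt", encoding="utf-8") as f:
            f.write(text)
        with gzip.open(path, "rt", encoding="utf-8") as f:
            return f.read()
    finally:
        os.remove(path)


restored = gzip_roundtrip("日本語 text")
print(restored)
```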
2 changes: 1 addition & 1 deletion Lib/io.py
@@ -54,7 +54,7 @@
 from _io import (DEFAULT_BUFFER_SIZE, BlockingIOError, UnsupportedOperation,
                  open, open_code, FileIO, BytesIO, StringIO, BufferedReader,
                  BufferedWriter, BufferedRWPair, BufferedRandom,
-                 IncrementalNewlineDecoder, TextIOWrapper)
+                 IncrementalNewlineDecoder, text_encoding, TextIOWrapper)

 OpenWrapper = _io.open # for compatibility with _pyio

1 change: 1 addition & 0 deletions Lib/lzma.py
@@ -302,6 +302,7 @@ def open(filename, mode="rb", *,
                           preset=preset, filters=filters)

    if "t" in mode:
        encoding = io.text_encoding(encoding)
        return io.TextIOWrapper(binary_file, encoding, errors, newline)
    else:
        return binary_file
4 changes: 4 additions & 0 deletions Lib/pathlib.py
@@ -1241,6 +1241,8 @@ def open(self, mode='r', buffering=-1, encoding=None,
        Open the file pointed by this path and return a file object, as
        the built-in open() function does.
        """
        if "b" not in mode:
            encoding = io.text_encoding(encoding)
        return io.open(self, mode, buffering, encoding, errors, newline,
                       opener=self._opener)

@@ -1255,6 +1257,7 @@ def read_text(self, encoding=None, errors=None):
        """
        Open the file in text mode, read it, and close the file.
        """
        encoding = io.text_encoding(encoding)
        with self.open(mode='r', encoding=encoding, errors=errors) as f:
            return f.read()

@@ -1274,6 +1277,7 @@ def write_text(self, data, encoding=None, errors=None, newline=None):
        if not isinstance(data, str):
            raise TypeError('data must be str, not %s' %
                            data.__class__.__name__)
        encoding = io.text_encoding(encoding)
        with self.open(mode='w', encoding=encoding, errors=errors, newline=newline) as f:
            return f.write(data)
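
Because ``read_text()`` and ``write_text()`` now route ``encoding=None`` through ``io.text_encoding()``, callers that omit the argument are the ones reported when the warning is enabled. A minimal sketch of the explicit form that avoids the warning follows; the helper name ``pathlib_roundtrip`` and the file name ``note.txt`` are hypothetical.

```python
import tempfile
from pathlib import Path


def pathlib_roundtrip(text):
    """Write text with Path.write_text() and read it back, both as UTF-8.

    Hypothetical helper, used only for this illustration.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "note.txt"
        # The explicit encoding keeps the result independent of the locale.
        path.write_text(text, encoding="utf-8")
        return path.read_text(encoding="utf-8")


restored_text = pathlib_roundtrip("Grüße")
print(restored_text)
```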

4 changes: 3 additions & 1 deletion Lib/site.py
@@ -170,7 +170,9 @@ def addpackage(sitedir, name, known_paths):
     fullname = os.path.join(sitedir, name)
     _trace(f"Processing .pth file: {fullname!r}")
     try:
-        f = io.TextIOWrapper(io.open_code(fullname))
+        # locale encoding is not ideal especially on Windows. But we have used
+        # it for a long time. setuptools uses the locale encoding too.
+        f = io.TextIOWrapper(io.open_code(fullname), encoding="locale")
     except OSError:
         return
     with f: