
Python Enhancement Proposals

PEP 597 – Add optional EncodingWarning

Author:
Inada Naoki <songofacandy at gmail.com>
Status:
Final
Type:
Standards Track
Created:
05-Jun-2019
Python-Version:
3.10


Abstract

Add a new warning category EncodingWarning. It is emitted when the encoding argument to open() is omitted and the default locale-specific encoding is used.

The warning is disabled by default. A new -X warn_default_encoding command-line option and a new PYTHONWARNDEFAULTENCODING environment variable can be used to enable it.

A "locale" argument value for encoding is added too. It explicitly specifies that the locale encoding should be used, silencing the warning.

Motivation

Using the default encoding is a common mistake

Developers using macOS or Linux may forget that the default encoding is not always UTF-8.

For example, using long_description = open("README.md").read() in setup.py is a common mistake. Many Windows users cannot install such packages if there is at least one non-ASCII character (e.g. emoji, author names, copyright symbols, and the like) in their UTF-8-encoded README.md file.

Of the 4000 most downloaded packages from PyPI, 489 use non-ASCII characters in their README, and 82 fail to install from source on non-UTF-8 locales due to not specifying an encoding for a non-ASCII file. [1]

Another example is logging.basicConfig(filename="log.txt"). Some users might expect it to use UTF-8 by default, but the locale encoding is actually what is used. [2]

Even Python experts may assume that the default encoding is UTF-8. This creates bugs that only happen on Windows; see [3], [4], [5], and [6] for example.

Emitting a warning when the encoding argument is omitted will help find such mistakes.
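The failure mode described above can be reproduced on any platform with a minimal sketch. It assumes a Windows machine whose locale encoding is cp1252 and simulates that by passing encoding="cp1252" explicitly. The file contents and the read_as() helper are illustrative only and are not part of this PEP.

```python
import tempfile
from pathlib import Path

# Sample README text with non-ASCII characters (accents, curly quotes, ©).
TEXT = "Project by Jos\u00e9 \u201cPepe\u201d P\u00e9rez \u00a9"


def read_as(path, encoding):
    """Read *path* as text using an explicitly chosen encoding."""
    with open(path, encoding=encoding) as f:
        return f.read()


with tempfile.TemporaryDirectory() as tmp:
    readme = Path(tmp) / "README.md"
    # The file is stored as UTF-8, as most README files are.
    readme.write_bytes(TEXT.encode("utf-8"))

    # Reading with the encoding the file was written in works everywhere.
    utf8_result = read_as(readme, "utf-8")

    # Simulated cp1252 locale: the UTF-8 bytes of the closing curly quote
    # include 0x9D, which cp1252 leaves undefined, so decoding fails.
    try:
        read_as(readme, "cp1252")
        cp1252_failed = False
    except UnicodeDecodeError:
        cp1252_failed = True
```

With other non-ASCII characters the cp1252 read may succeed but return mojibake instead of raising, which is arguably harder to notice.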

Explicit way to use locale-specific encoding

open(filename) isn’t explicit about which encoding is expected:

  • If ASCII is assumed, this isn’t a bug, but may result in decreased performance on Windows, particularly with non-Latin-1 locale encodings
  • If UTF-8 is assumed, this may be a bug or a platform-specific script
  • If the locale encoding is assumed, the behavior is as expected (but could change if future versions of Python modify the default)

From this point of view, open(filename) is not readable code.

encoding=locale.getpreferredencoding(False) can be used to specify the locale encoding explicitly, but it is too long and easy to misuse (e.g. one can forget to pass False as its argument).

This PEP provides an explicit way to specify the locale encoding.

Prepare to change the default encoding to UTF-8

Since UTF-8 has become the de-facto standard text encoding, we might default to it for opening files in the future.

However, such a change will affect many applications and libraries. If we start emitting DeprecationWarning everywhere the encoding argument is omitted, it will be too noisy and painful.

Although this PEP doesn’t propose changing the default encoding, it will help enable that change by:

  • Reducing the number of omitted encoding arguments in libraries before we start emitting a DeprecationWarning by default.
  • Allowing users to pass encoding="locale" to suppress the current warning and any DeprecationWarning added in the future, as well as retaining consistent behavior if later Python versions change the default, ensuring support for any Python version >=3.10.

Specification

EncodingWarning

Add a new EncodingWarning warning class as a subclass of Warning. It is emitted when the encoding argument is omitted and the default locale-specific encoding is used.

Options to enable the warning

The -X warn_default_encoding option and the PYTHONWARNDEFAULTENCODING environment variable are added. They are used to enable EncodingWarning.

sys.flags.warn_default_encoding is also added. The flag is true when EncodingWarning is enabled.

When the flag is set, io.TextIOWrapper(), open() and other modules using them will emit EncodingWarning when the encoding argument is omitted.

Since EncodingWarning is a subclass of Warning, it is shown by default (if the warn_default_encoding flag is set), unlike DeprecationWarning.
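The effect of the option can be observed by launching a child interpreter with and without it. This is a minimal sketch that assumes Python 3.10 or later; the CHILD program and the run_child() helper are hypothetical names used only here.

```python
import os
import subprocess
import sys

# Child program: report the flag, then open a file without an encoding.
CHILD = (
    "import os, sys\n"
    "print(bool(sys.flags.warn_default_encoding))\n"
    "open(os.devnull).close()\n"
)


def run_child(*interpreter_args):
    """Run CHILD in a fresh interpreter and return the CompletedProcess."""
    env = dict(os.environ)
    # Start from a clean state so that only the -X option decides.
    env.pop("PYTHONWARNDEFAULTENCODING", None)
    env.pop("PYTHONWARNINGS", None)
    return subprocess.run(
        [sys.executable, *interpreter_args, "-c", CHILD],
        capture_output=True,
        encoding="utf-8",
        errors="replace",
        env=env,
        check=True,
    )


default_run = run_child()
flagged_run = run_child("-X", "warn_default_encoding")

# The warning, if any, is written to the child's standard error stream.
print(flagged_run.stderr)
```

Without the option the child runs silently; with it, the flag is true and the open() call is reported on standard error.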

encoding="locale"

io.TextIOWrapper will accept "locale" as a valid argument to encoding. It has the same meaning as the current encoding=None, except that io.TextIOWrapper doesn’t emit EncodingWarning when encoding="locale" is specified.

io.text_encoding()

io.text_encoding() is a helper for functions with an encoding=None parameter that pass it to io.TextIOWrapper() or open().

A pure Python implementation will look like this:

    def text_encoding(encoding, stacklevel=1):
        """A helper function to choose the text encoding.

        When *encoding* is not None, just return it.
        Otherwise, return the default text encoding (i.e. "locale").

        This function emits an EncodingWarning if *encoding* is None and
        sys.flags.warn_default_encoding is true.

        This function can be used in APIs with an encoding=None parameter
        that pass it to TextIOWrapper or open.
        However, please consider using encoding="utf-8" for new APIs.
        """
        if encoding is None:
            if sys.flags.warn_default_encoding:
                import warnings
                warnings.warn(
                    "'encoding' argument not specified.",
                    EncodingWarning, stacklevel + 2)
            encoding = "locale"
        return encoding

For example, pathlib.Path.read_text() can use it like this:

    def read_text(self, encoding=None, errors=None):
        encoding = io.text_encoding(encoding)
        with self.open(mode='r', encoding=encoding, errors=errors) as f:
            return f.read()

By using io.text_encoding(), EncodingWarning is emitted for the caller of read_text() instead of read_text() itself.

Affected standard library modules

Many standard library modules will be affected by this change.

Most APIs accepting encoding=None will use io.text_encoding() as written in the previous section.

Where using the locale encoding as the default encoding is reasonable, encoding="locale" will be used instead. For example, the subprocess module will use the locale encoding as the default for pipes.

Many tests use open() without encoding specified to read ASCII text files. They should be rewritten with encoding="ascii".

Rationale

Opt-in warning

Although DeprecationWarning is suppressed by default, always emitting DeprecationWarning when the encoding argument is omitted would be too noisy.

Noisy warnings may lead developers to dismiss the DeprecationWarning.

“locale” is not a codec alias

We don’t add “locale” as a codec alias because the locale can be changed at runtime.

Additionally, TextIOWrapper checks os.device_encoding() when encoding=None. This behavior cannot be implemented in a codec.
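A brief sketch of what this means in practice: "locale" is understood by the text I/O layer but not by the codec registry. The is_codec() helper is a hypothetical name for this example.

```python
import codecs
import os
import tempfile


def is_codec(name):
    """Return True if *name* can be found in the codec registry."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True


utf8_is_codec = is_codec("utf-8")
locale_is_codec = is_codec("locale")

# os.device_encoding() depends on the file descriptor, not on a codec name:
# it returns None for anything that is not connected to a terminal.
with tempfile.TemporaryFile() as f:
    regular_file_encoding = os.device_encoding(f.fileno())
```

Consequently, str.encode("locale") and bytes.decode("locale") are not valid spellings; "locale" only has meaning as the encoding argument of TextIOWrapper, open(), and APIs that forward to them.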

Backward Compatibility

The new warning is not emitted by default, so this PEP is 100% backwards-compatible.

Forward Compatibility

Passing "locale" as the argument to encoding is not forward-compatible. Code using it will not work on Python older than 3.10, and will instead raise LookupError: unknown encoding: locale.

Until developers can drop Python 3.9 support, EncodingWarning can only be used for finding missing encoding="utf-8" arguments.
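Code that must run on both sides of the 3.10 boundary and genuinely wants the locale encoding could select the argument at run time. This is one possible workaround, not something the PEP prescribes; locale_encoding_argument() is a hypothetical helper.

```python
import os
import sys
import tempfile


def locale_encoding_argument():
    """Return a value for *encoding* that requests the locale encoding.

    Hypothetical helper: "locale" is only accepted on Python 3.10+,
    so older versions fall back to None, which has the same meaning there.
    """
    return "locale" if sys.version_info >= (3, 10) else None


fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
try:
    with open(path, "w", encoding=locale_encoding_argument()) as f:
        f.write("plain ASCII text")
    with open(path, encoding=locale_encoding_argument()) as f:
        content = f.read()
finally:
    os.remove(path)
```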

How to Teach This

For new users

Since EncodingWarning is used to write cross-platform code, there is no need to teach it to new users.

We can just recommend using UTF-8 for text files and using encoding="utf-8" when opening them.

For experienced users

Using open(filename) to read text files encoded in UTF-8 is a common mistake. It may not work on Windows because UTF-8 is not the default encoding.

You can use -X warn_default_encoding or PYTHONWARNDEFAULTENCODING=1 to find this type of mistake.

Omitting the encoding argument is not a bug when opening text files encoded in the locale encoding, but encoding="locale" is recommended in Python 3.10 and later because it is more explicit.

Reference Implementation

https://github.com/python/cpython/pull/19481

Discussions

The latest discussion thread is: https://mail.python.org/archives/list/python-dev@python.org/thread/SFYUP2TWD5JZ5KDLVSTZ44GWKVY4YNCV/

  • Why not implement this in linters?
    • encoding="locale" and io.text_encoding() must be implemented in Python.
    • It is difficult to find all callers of functions wrapping open() or TextIOWrapper() (see the io.text_encoding() section).
  • Many developers will not use the option.
    • Some will, and report the warnings to libraries they use, so the option is worth it even if many developers don’t enable it.
    • For example, I found [7] and [8] by running pip install -U pip, and [9] by running tox with the reference implementation. This demonstrates how this option can be used to find potential issues.

References

[1]
“Packages can’t be installed when encoding is not UTF-8” (https://github.com/methane/pep597-pypi-ascii)
[2]
“Logging - Inconsistent behaviour when handling unicode” (https://bugs.python.org/issue37111)
[3]
Packaging tutorial in packaging.python.org didn’t specify encoding to read a README.md (https://github.com/pypa/packaging.python.org/pull/682)
[4]
json.tool had used locale encoding to read JSON files. (https://bugs.python.org/issue33684)
[5]
site: Potential UnicodeDecodeError when handling pth file (https://bugs.python.org/issue33684)
[6]
pypa/pip: “Installing packages fails if Python 3 installed into path with non-ASCII characters” (https://github.com/pypa/pip/issues/9054)
[7]
“site: Potential UnicodeDecodeError when handling pth file” (https://bugs.python.org/issue43214)
[8]
“[pypa/pip] Use encoding option or binary mode for open()” (https://github.com/pypa/pip/pull/9608)
[9]
“Possible UnicodeError caused by missing encoding="utf-8"” (https://github.com/tox-dev/tox/issues/1908)

Copyright

This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.


Source: https://github.com/python/peps/blob/main/peps/pep-0597.rst

Last modified: 2025-02-01 08:56:52 GMT

