Add a new warning category EncodingWarning. It is emitted when the encoding argument to open() is omitted and the default locale-specific encoding is used.
The warning is disabled by default. A new -X warn_default_encoding command-line option and a new PYTHONWARNDEFAULTENCODING environment variable can be used to enable it.
A "locale" argument value for encoding is added too. It explicitly specifies that the locale encoding should be used, silencing the warning.
Developers using macOS or Linux may forget that the default encoding is not always UTF-8.
For example, using long_description=open("README.md").read() in setup.py is a common mistake. Many Windows users cannot install such packages if there is at least one non-ASCII character (e.g. emoji, author names, copyright symbols, and the like) in their UTF-8-encoded README.md file.
Of the 4000 most downloaded packages from PyPI, 489 use non-ASCII characters in their README, and 82 fail to install from source on non-UTF-8 locales due to not specifying an encoding for a non-ASCII file. [1]
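The README mistake described above can be avoided by naming the encoding explicitly. The following is a minimal sketch, not taken from any particular package; the helper name read_long_description and the sample README content are hypothetical and exist only for illustration.

```python
import tempfile
from pathlib import Path


def read_long_description(path):
    """Read a UTF-8 README regardless of the platform's locale encoding."""
    # Naming the encoding means this no longer depends on the locale.
    with open(path, encoding="utf-8") as f:
        return f.read()


# Demonstration with a temporary README that contains non-ASCII characters.
with tempfile.TemporaryDirectory() as tmp:
    readme = Path(tmp) / "README.md"
    readme.write_text("Caf\u00e9 \u00a9 2021\n", encoding="utf-8")
    long_description = read_long_description(readme)
```

In a real setup.py, the result would be passed as long_description=read_long_description("README.md").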
Another example is logging.basicConfig(filename="log.txt"). Some users might expect it to use UTF-8 by default, but the locale encoding is actually what is used. [2]
Even Python experts may assume that the default encoding is UTF-8. This creates bugs that only happen on Windows; see [3], [4], [5], and [6] for example.
Emitting a warning when the encoding argument is omitted will help find such mistakes.
open(filename) isn’t explicit about which encoding is expected: a reader cannot tell whether ASCII, UTF-8, or the locale encoding is assumed.
From this point of view, open(filename) is not readable code.
encoding=locale.getpreferredencoding(False) can be used to specify the locale encoding explicitly, but it is too long and easy to misuse (e.g. one can forget to pass False as its argument).
This PEP provides an explicit way to specify the locale encoding.
Since UTF-8 has become the de-facto standard text encoding,we might default to it for opening files in the future.
However, such a change will affect many applications and libraries. If we start emitting DeprecationWarning everywhere the encoding argument is omitted, it will be too noisy and painful.
Although this PEP doesn’t propose changing the default encoding,it will help enable that change by:
* Reducing the number of omitted encoding arguments in libraries before we start emitting a DeprecationWarning by default.
* Allowing users to pass encoding="locale" to suppress the current warning and any DeprecationWarning added in the future, as well as retaining consistent behavior if later Python versions change the default, ensuring support for any Python version >=3.10.
EncodingWarning
Add a new EncodingWarning warning class as a subclass of Warning. It is emitted when the encoding argument is omitted and the default locale-specific encoding is used.
The -X warn_default_encoding option and the PYTHONWARNDEFAULTENCODING environment variable are added. They are used to enable EncodingWarning.
sys.flags.warn_default_encoding is also added. The flag is true when EncodingWarning is enabled.
When the flag is set, io.TextIOWrapper(), open() and other modules using them will emit EncodingWarning when the encoding argument is omitted.
Since EncodingWarning is a subclass of Warning, it is shown by default (if the warn_default_encoding flag is set), unlike DeprecationWarning.
encoding="locale"
io.TextIOWrapper will accept "locale" as a valid argument to encoding. It has the same meaning as the current encoding=None, except that io.TextIOWrapper doesn’t emit EncodingWarning when encoding="locale" is specified.
io.text_encoding()
io.text_encoding() is a helper for functions with an encoding=None parameter that pass it to io.TextIOWrapper() or open().
A pure Python implementation will look like this:
    import sys

    def text_encoding(encoding, stacklevel=1):
        """A helper function to choose the text encoding.

        When *encoding* is not None, just return it.
        Otherwise, return the default text encoding (i.e. "locale").

        This function emits an EncodingWarning if *encoding* is None and
        sys.flags.warn_default_encoding is true.

        This function can be used in APIs with an encoding=None parameter
        that pass it to TextIOWrapper or open.
        However, please consider using encoding="utf-8" for new APIs.
        """
        if encoding is None:
            if sys.flags.warn_default_encoding:
                import warnings
                # stacklevel + 2 attributes the warning to the caller of the
                # function that called text_encoding().
                warnings.warn(
                    "'encoding' argument not specified.",
                    EncodingWarning, stacklevel + 2)
            encoding = "locale"
        return encoding
For example, pathlib.Path.read_text() can use it like this:
    def read_text(self, encoding=None, errors=None):
        encoding = io.text_encoding(encoding)
        with self.open(mode='r', encoding=encoding, errors=errors) as f:
            return f.read()
By using io.text_encoding(), EncodingWarning is emitted for the caller of read_text() instead of read_text() itself.
Many standard library modules will be affected by this change.
Most APIs accepting encoding=None will use io.text_encoding() as written in the previous section.
Where using the locale encoding as the default encoding is reasonable, encoding="locale" will be used instead. For example, the subprocess module will use the locale encoding as the default for pipes.
Many tests use open() without encoding specified to read ASCII text files. They should be rewritten with encoding="ascii".
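A test rewritten this way might look like the following sketch. The test name and fixture content are made up for illustration and do not come from the standard library test suite.

```python
import os
import tempfile


def test_reads_ascii_fixture():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "fixture.txt")
        with open(path, "w", encoding="ascii") as f:
            f.write("key = value\n")
        # Before the rewrite this was open(path), which relied on the
        # locale encoding and would trigger EncodingWarning when enabled.
        with open(path, encoding="ascii") as f:
            assert f.read() == "key = value\n"
```

Using "ascii" also makes the test fail loudly if a non-ASCII character is ever added to the fixture.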
Although DeprecationWarning is suppressed by default, always emitting DeprecationWarning when the encoding argument is omitted would be too noisy.
Noisy warnings may lead developers to dismiss the DeprecationWarning.
We don’t add “locale” as a codec alias because the locale can be changed at runtime.
Additionally, TextIOWrapper checks os.device_encoding() when encoding=None. This behavior cannot be implemented in a codec.
The new warning is not emitted by default, so this PEP is 100% backwards-compatible.
Passing "locale" as the argument to encoding is not forward-compatible. Code using it will not work on Python older than 3.10, and will instead raise LookupError: unknown encoding: locale.
Until developers can drop Python 3.9 support, EncodingWarning can only be used for finding missing encoding="utf-8" arguments.
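Code that must still run on Python 3.9 or older but wants to be explicit on newer versions could select the argument by version. This is only a sketch of one possible workaround, not something the PEP prescribes; LOCALE_ENCODING and read_locale_text are hypothetical names.

```python
import os
import sys
import tempfile

# Hypothetical constant: "locale" on Python 3.10+, None on older versions,
# where omitting the argument already means "use the locale encoding".
LOCALE_ENCODING = "locale" if sys.version_info >= (3, 10) else None


def read_locale_text(path):
    """Read a file that is known to use the locale encoding."""
    with open(path, encoding=LOCALE_ENCODING) as f:
        return f.read()


# Round-trip a small ASCII file to show the helper in use.
fd, demo_path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
try:
    with open(demo_path, "w", encoding=LOCALE_ENCODING) as f:
        f.write("locale-encoded text\n")
    demo_text = read_locale_text(demo_path)
finally:
    os.remove(demo_path)
```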
Since EncodingWarning is used to write cross-platform code, there is no need to teach it to new users.
We can just recommend using UTF-8 for text files and using encoding="utf-8" when opening them.
Using open(filename) to read text files encoded in UTF-8 is a common mistake. It may not work on Windows because UTF-8 is not the default encoding.
You can use -X warn_default_encoding or PYTHONWARNDEFAULTENCODING=1 to find this type of mistake.
Omitting the encoding argument is not a bug when opening text files encoded in the locale encoding, but encoding="locale" is recommended in Python 3.10 and later because it is more explicit.
The latest discussion thread is: https://mail.python.org/archives/list/python-dev@python.org/thread/SFYUP2TWD5JZ5KDLVSTZ44GWKVY4YNCV/
encoding="locale" and io.text_encoding() must be implemented in Python.
[...] open() or TextIOWrapper() (see the io.text_encoding() section).
[...] pip install -U pip, and [9] by running tox with the reference implementation. This demonstrates how this option can be used to find potential issues.
[...] README.md (https://github.com/pypa/packaging.python.org/pull/682)
json.tool had used locale encoding to read JSON files. (https://bugs.python.org/issue33684)
[...] encoding option or binary mode for open()” (https://github.com/pypa/pip/pull/9608)
This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.
Source: https://github.com/python/peps/blob/main/peps/pep-0597.rst
Last modified: 2025-02-01 08:56:52 GMT