Closed
Description
Feature or enhancement
Currently, ElementTree.tostring(root, encoding="unicode", xml_declaration=True) uses the locale encoding in the XML declaration.
I think ElementTree should use UTF-8, instead of locale encoding.
Example:
$ LANG=ja_JP.eucJP ./python.exe
Python 3.11.0a7+ (heads/bytes-alloc-dirty:7fbc7f6128, Apr 19 2022, 16:53:54) [Clang 12.0.0 (clang-1200.0.32.29)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import xml.etree.ElementTree as ET
>>> et = ET.fromstring("<t>hello</t>")
>>> ET.tostring(et, encoding="unicode", xml_declaration=True)
"<?xml version='1.0' encoding='eucJP'?>\n<t>hello</t>"
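Until the behavior changes, one possible workaround is to request UTF-8 explicitly and decode the resulting bytes. This is only a sketch: it relies on `tostring()` returning `bytes` for any encoding other than `"unicode"`, and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<t>hello</t>")

# Passing a real encoding name makes tostring() return bytes and
# write that same name into the XML declaration.
xml_bytes = ET.tostring(root, encoding="utf-8", xml_declaration=True)

# Decode to get a str whose declaration does not depend on the locale.
xml_text = xml_bytes.decode("utf-8")
print(xml_text)
```

The declaration in `xml_text` then names `utf-8` regardless of `LANG`.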
Code:
cpython/Lib/xml/etree/ElementTree.py
Lines 732 to 742 in bcf14ae
with _get_writer(file_or_filename, enc_lower) as write:
    if method == "xml" and (xml_declaration or
            (xml_declaration is None and
             enc_lower not in ("utf-8", "us-ascii", "unicode"))):
        declared_encoding = encoding
        if enc_lower == "unicode":
            # Retrieve the default encoding for the xml declaration
            import locale
            declared_encoding = locale.getpreferredencoding()
        write("<?xml version='1.0' encoding='%s'?>\n" % (
            declared_encoding,))
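The proposed behavior could be sketched roughly as follows. `pick_declared_encoding` is a hypothetical helper written for illustration, not a function in `ElementTree.py`, and an actual patch may well look different.

```python
def pick_declared_encoding(encoding):
    """Return the encoding name to write into the XML declaration.

    Hypothetical helper: for the pseudo-encoding "unicode", declare
    UTF-8 instead of consulting locale.getpreferredencoding().
    """
    if encoding.lower() == "unicode":
        return "utf-8"
    return encoding


def make_declaration(encoding):
    # Same format string as the excerpt above, with the locale lookup removed.
    return "<?xml version='1.0' encoding='%s'?>\n" % (
        pick_declared_encoding(encoding),)


print(make_declaration("unicode"), end="")
```

With this sketch, the declaration for `encoding="unicode"` no longer varies with the user's locale settings.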
Pitch
- UTF-8 is the most common encoding for XML.
- Locale encoding name (e.g. cp932 or eucJP) would be different from the XML encoding name recommended by W3C (e.g. Shift_JIS or EUC-JP).