Closed as not planned
Description
Minor bug with decoding of EUC-JP character "㎝".
Bug report
The character "㎝" is part of the JIS X 0208 encoding. The Python core libraries include the EUC-JP encoding, which represents the JIS X 0208, JIS X 0212, and JIS X 0201 encodings. However, attempting to decode the "㎝" character with the EUC-JP codec results in decoding errors.
Example
As taken from https://stackoverflow.com/questions/73255012/python-fails-to-decode-euc-jp-strings-with-the-character:
print(b"58\xad\xd1".decode("EUC-JP"))
throws
Traceback (most recent call last):
  File "<pyshell#53>", line 1, in <module>
    print(b"58\xad\xd1".decode("EUC-JP"))
UnicodeDecodeError: 'euc_jp' codec can't decode byte 0xad in position 2: illegal multibyte sequence
However, decoding with alternative codecs works:
content = b"\xa5\xb5\xa5\xa4\xa5\xba\xa1\xa7XL \xcc\xf377\xad\xd1\xa1\xdf\xcc\xf358\xad\xd1"
print(b"58\xad\xd1".decode("euc_jisx0213"))
>58㎝
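A possible workaround, sketched here under the assumption that falling back to the `euc_jisx0213` codec is acceptable for the data at hand: try the strict `euc_jp` codec first and only fall back when it raises. The helper name `decode_with_fallback` and its default codec order are hypothetical, chosen for illustration; they are not part of the standard library.

```python
def decode_with_fallback(data, codecs=("euc_jp", "euc_jisx0213")):
    """Try each codec in order; return (text, codec_name) for the first that works.

    Hypothetical helper: the codec order is an assumption, not a recommendation
    from the CPython developers.
    """
    last_error = None
    for name in codecs:
        try:
            return data.decode(name), name
        except UnicodeDecodeError as exc:
            # Remember the failure and move on to the next candidate codec.
            last_error = exc
    if last_error is None:
        raise ValueError("no codecs were supplied")
    # Every candidate failed: re-raise the most recent decoding error.
    raise last_error


text, used = decode_with_fallback(b"58\xad\xd1")
print(text, used)
```

With the bytes from the report, `euc_jp` raises and the helper falls through to `euc_jisx0213`; plain ASCII input is still decoded by the first codec.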
Your environment
- CPython versions tested on: 3.9, 3.10
- Operating system and architecture: Windows x64