Parser gives UnicodeDecodeError on what should be good code #139516

Closed

Labels

interpreter-core (Objects, Python, Grammar, and Parser dirs), topic-parser, topic-unicode, type-bug (An unexpected behavior, bug, or error)

Description

tom-pytel opened on Oct 2, 2025

Bug report

Bug description:

In 3.12 and below this parses fine. Note the special Unicode character in the inner string: it's '\u3001' (UTF-8 b'\xe3\x80\x81'). The code goes from good to failing by removing a space or by parenthesizing the whole expression, so is it positional?

>>> from io import BytesIO
>>> from tokenize import tokenize
>>>
>>> src_good = '''f"{f(a=lambda: '、' \n)}"'''
>>> src_bad1 = '''f"{f(a=lambda: '、'\n)}"'''
>>> src_bad2 = '''(f"{f(a=lambda: '、' \n)}")'''
>>>
>>> for token in tokenize(BytesIO(src_good.encode()).readline): print(token)
...
TokenInfo(type=65 (ENCODING), string='utf-8', start=(0, 0), end=(0, 0), line='')
TokenInfo(type=59 (FSTRING_START), string='f"', start=(1, 0), end=(1, 2), line='f"{f(a=lambda: \'、\' \n')
TokenInfo(type=55 (OP), string='{', start=(1, 2), end=(1, 3), line='f"{f(a=lambda: \'、\' \n')
TokenInfo(type=1 (NAME), string='f', start=(1, 3), end=(1, 4), line='f"{f(a=lambda: \'、\' \n')
TokenInfo(type=55 (OP), string='(', start=(1, 4), end=(1, 5), line='f"{f(a=lambda: \'、\' \n')
TokenInfo(type=1 (NAME), string='a', start=(1, 5), end=(1, 6), line='f"{f(a=lambda: \'、\' \n')
TokenInfo(type=55 (OP), string='=', start=(1, 6), end=(1, 7), line='f"{f(a=lambda: \'、\' \n')
TokenInfo(type=1 (NAME), string='lambda', start=(1, 7), end=(1, 13), line='f"{f(a=lambda: \'、\' \n')
TokenInfo(type=55 (OP), string=':', start=(1, 13), end=(1, 14), line='f"{f(a=lambda: \'、\' \n')
TokenInfo(type=3 (STRING), string="'、'", start=(1, 15), end=(1, 18), line='f"{f(a=lambda: \'、\' \n')
TokenInfo(type=63 (NL), string='\n', start=(1, 19), end=(1, 20), line='f"{f(a=lambda: \'、\' \n')
TokenInfo(type=55 (OP), string=')', start=(2, 0), end=(2, 1), line=')}"')
TokenInfo(type=55 (OP), string='}', start=(2, 1), end=(2, 2), line=')}"')
TokenInfo(type=61 (FSTRING_END), string='"', start=(2, 2), end=(2, 3), line=')}"')
TokenInfo(type=4 (NEWLINE), string='', start=(2, 3), end=(2, 4), line=')}"')
TokenInfo(type=0 (ENDMARKER), string='', start=(3, 0), end=(3, 0), line='')

>>> for token in tokenize(BytesIO(src_bad1.encode()).readline): print(token)
...
TokenInfo(type=65 (ENCODING), string='utf-8', start=(0, 0), end=(0, 0), line='')
TokenInfo(type=59 (FSTRING_START), string='f"', start=(1, 0), end=(1, 2), line='f"{f(a=lambda: \'、\'\n')
TokenInfo(type=55 (OP), string='{', start=(1, 2), end=(1, 3), line='f"{f(a=lambda: \'、\'\n')
TokenInfo(type=1 (NAME), string='f', start=(1, 3), end=(1, 4), line='f"{f(a=lambda: \'、\'\n')
TokenInfo(type=55 (OP), string='(', start=(1, 4), end=(1, 5), line='f"{f(a=lambda: \'、\'\n')
TokenInfo(type=1 (NAME), string='a', start=(1, 5), end=(1, 6), line='f"{f(a=lambda: \'、\'\n')
TokenInfo(type=55 (OP), string='=', start=(1, 6), end=(1, 7), line='f"{f(a=lambda: \'、\'\n')
TokenInfo(type=1 (NAME), string='lambda', start=(1, 7), end=(1, 13), line='f"{f(a=lambda: \'、\'\n')
TokenInfo(type=55 (OP), string=':', start=(1, 13), end=(1, 14), line='f"{f(a=lambda: \'、\'\n')
TokenInfo(type=3 (STRING), string="'、'", start=(1, 15), end=(1, 18), line='f"{f(a=lambda: \'、\'\n')
TokenInfo(type=63 (NL), string='\n', start=(1, 18), end=(1, 19), line='f"{f(a=lambda: \'、\'\n')
TokenInfo(type=55 (OP), string=')', start=(2, 0), end=(2, 1), line=')}"')
Traceback (most recent call last):
  File "<python-input-8>", line 1, in <module>
    for token in tokenize(BytesIO(src_bad1.encode()).readline): print(token)
                 ~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.13/tokenize.py", line 492, in tokenize
    yield from _generate_tokens_from_c_tokenizer(rl_gen.__next__, encoding, extra_tokens=True)
  File "/usr/local/lib/python3.13/tokenize.py", line 582, in _generate_tokens_from_c_tokenizer
    for info in it:
                ^^
UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 13-14: unexpected end of data
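For reference, "unexpected end of data" is the reason the UTF-8 codec reports when a multi-byte sequence is cut short. The sketch below shows this outside the tokenizer, using the three-byte encoding of '\u3001' from the description. The helper name `decode_failure` is hypothetical. That the tokenizer is slicing through this character is only a guess suggested by the two-byte span in the message; the report does not establish it.

```python
# U+3001 encodes to three bytes in UTF-8, as stated in the description.
ENCODED = "\u3001".encode("utf-8")


def decode_failure(data):
    """Return (start, end, reason) if data is not valid UTF-8, else None.

    Hypothetical helper for illustration; it only wraps bytes.decode().
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start, exc.end, exc.reason
    return None


# The complete sequence decodes cleanly.
print(decode_failure(ENCODED))

# Dropping the last byte leaves an incomplete sequence, which the codec
# rejects with the same reason string that appears in the traceback.
print(decode_failure(ENCODED[:2]))
```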

The other bad source and other permutations give the same error. You also get an immediate error when typing the bad source interactively.
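To compare the variants on a given interpreter without reading full token dumps, something like the following sketch could be used. The helper names `try_tokenize` and `check_all` are hypothetical, the sources are reconstructed from the session above with '\u3001' written as an escape, and the outcome depends on the Python version, so no expected output is shown.

```python
from io import BytesIO
from tokenize import tokenize

# Sources from the report. The only difference between "good" and "bad1"
# is the space before the newline; "bad2" is "good" wrapped in parentheses.
SOURCES = {
    "good": '''f"{f(a=lambda: '\u3001' \n)}"''',
    "bad1": '''f"{f(a=lambda: '\u3001'\n)}"''',
    "bad2": '''(f"{f(a=lambda: '\u3001' \n)}")''',
}


def try_tokenize(src):
    """Tokenize src; return (True, token_count) or (False, exception_name)."""
    try:
        tokens = list(tokenize(BytesIO(src.encode("utf-8")).readline))
    except Exception as exc:  # diagnostic helper: report any failure type
        return False, type(exc).__name__
    return True, len(tokens)


def check_all(sources):
    """Run try_tokenize over a name -> source mapping."""
    return {name: try_tokenize(src) for name, src in sources.items()}


for name, (ok, detail) in check_all(SOURCES).items():
    status = "ok" if ok else "FAILED"
    print(f"{name}: {status} ({detail})")
```

Catching `Exception` broadly is deliberate here, since the point is to record whichever error a given version raises rather than to handle it.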

CPython versions tested on:

3.13, 3.14, 3.15

Operating systems tested on:

Linux
