Closed
Description
Bug report
Bug description:
In 3.12 and below this parses fine. Note the special Unicode character in the inner string: it's `'\u3001'` (UTF-8 `b'\xe3\x80\x81'`). The source goes from good to failing just by removing the space after the inner string literal or by parenthesizing the whole expression. So is this position-dependent?
```pycon
>>> from io import BytesIO
>>> from tokenize import tokenize
>>>
>>> src_good = '''f"{f(a=lambda: '、' \n)}"'''
>>> src_bad1 = '''f"{f(a=lambda: '、'\n)}"'''
>>> src_bad2 = '''(f"{f(a=lambda: '、' \n)}")'''
>>>
>>> for token in tokenize(BytesIO(src_good.encode()).readline): print(token)
...
TokenInfo(type=65 (ENCODING), string='utf-8', start=(0, 0), end=(0, 0), line='')
TokenInfo(type=59 (FSTRING_START), string='f"', start=(1, 0), end=(1, 2), line='f"{f(a=lambda: \'、\' \n')
TokenInfo(type=55 (OP), string='{', start=(1, 2), end=(1, 3), line='f"{f(a=lambda: \'、\' \n')
TokenInfo(type=1 (NAME), string='f', start=(1, 3), end=(1, 4), line='f"{f(a=lambda: \'、\' \n')
TokenInfo(type=55 (OP), string='(', start=(1, 4), end=(1, 5), line='f"{f(a=lambda: \'、\' \n')
TokenInfo(type=1 (NAME), string='a', start=(1, 5), end=(1, 6), line='f"{f(a=lambda: \'、\' \n')
TokenInfo(type=55 (OP), string='=', start=(1, 6), end=(1, 7), line='f"{f(a=lambda: \'、\' \n')
TokenInfo(type=1 (NAME), string='lambda', start=(1, 7), end=(1, 13), line='f"{f(a=lambda: \'、\' \n')
TokenInfo(type=55 (OP), string=':', start=(1, 13), end=(1, 14), line='f"{f(a=lambda: \'、\' \n')
TokenInfo(type=3 (STRING), string="'、'", start=(1, 15), end=(1, 18), line='f"{f(a=lambda: \'、\' \n')
TokenInfo(type=63 (NL), string='\n', start=(1, 19), end=(1, 20), line='f"{f(a=lambda: \'、\' \n')
TokenInfo(type=55 (OP), string=')', start=(2, 0), end=(2, 1), line=')}"')
TokenInfo(type=55 (OP), string='}', start=(2, 1), end=(2, 2), line=')}"')
TokenInfo(type=61 (FSTRING_END), string='"', start=(2, 2), end=(2, 3), line=')}"')
TokenInfo(type=4 (NEWLINE), string='', start=(2, 3), end=(2, 4), line=')}"')
TokenInfo(type=0 (ENDMARKER), string='', start=(3, 0), end=(3, 0), line='')
>>> for token in tokenize(BytesIO(src_bad1.encode()).readline): print(token)
...
TokenInfo(type=65 (ENCODING), string='utf-8', start=(0, 0), end=(0, 0), line='')
TokenInfo(type=59 (FSTRING_START), string='f"', start=(1, 0), end=(1, 2), line='f"{f(a=lambda: \'、\'\n')
TokenInfo(type=55 (OP), string='{', start=(1, 2), end=(1, 3), line='f"{f(a=lambda: \'、\'\n')
TokenInfo(type=1 (NAME), string='f', start=(1, 3), end=(1, 4), line='f"{f(a=lambda: \'、\'\n')
TokenInfo(type=55 (OP), string='(', start=(1, 4), end=(1, 5), line='f"{f(a=lambda: \'、\'\n')
TokenInfo(type=1 (NAME), string='a', start=(1, 5), end=(1, 6), line='f"{f(a=lambda: \'、\'\n')
TokenInfo(type=55 (OP), string='=', start=(1, 6), end=(1, 7), line='f"{f(a=lambda: \'、\'\n')
TokenInfo(type=1 (NAME), string='lambda', start=(1, 7), end=(1, 13), line='f"{f(a=lambda: \'、\'\n')
TokenInfo(type=55 (OP), string=':', start=(1, 13), end=(1, 14), line='f"{f(a=lambda: \'、\'\n')
TokenInfo(type=3 (STRING), string="'、'", start=(1, 15), end=(1, 18), line='f"{f(a=lambda: \'、\'\n')
TokenInfo(type=63 (NL), string='\n', start=(1, 18), end=(1, 19), line='f"{f(a=lambda: \'、\'\n')
TokenInfo(type=55 (OP), string=')', start=(2, 0), end=(2, 1), line=')}"')
Traceback (most recent call last):
  File "<python-input-8>", line 1, in <module>
    for token in tokenize(BytesIO(src_bad1.encode()).readline): print(token)
                 ~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.13/tokenize.py", line 492, in tokenize
    yield from _generate_tokens_from_c_tokenizer(rl_gen.__next__, encoding, extra_tokens=True)
  File "/usr/local/lib/python3.13/tokenize.py", line 582, in _generate_tokens_from_c_tokenizer
    for info in it:
                ^^
UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 13-14: unexpected end of data
```
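The error reports bytes 13-14 and "unexpected end of data". That means some UTF-8 decode stopped partway through a multi-byte character. The sketch below only shows the arithmetic behind that, using nothing but the standard library. `'、'` is one character but three bytes. One particular byte slice of the first line of `src_bad1`, picked by hand and not taken from CPython's tokenizer, reproduces the message word for word. Which buffer the tokenizer actually decodes is not established here. `LINE` and `try_decode` are illustrative names only.

```python
from typing import Optional

# First physical line of src_bad1 from the session: no space before the newline.
LINE = "f\"{f(a=lambda: '\u3001'\n"
data = LINE.encode("utf-8")


def try_decode(blob: bytes) -> Optional[str]:
    """Decode blob as UTF-8; return None on success, else the error message."""
    try:
        blob.decode("utf-8")
    except UnicodeDecodeError as exc:
        return str(exc)
    return None


# U+3001 is one character but three UTF-8 bytes.
print(len("\u3001"), len("\u3001".encode("utf-8")))  # 1 3

# After it, character columns and byte offsets differ by 2: tokenize reports
# the NL token at column 18, but the newline is byte 20 of the encoded line.
print(LINE.index("\n"), data.index(b"\n"))  # 18 20

# Bytes 3..17 are the expression text after 'f"{', cut off after the first
# two bytes of U+3001 (a hand-picked slice, not the tokenizer's own bounds).
print(try_decode(data[3:18]))
# 'utf-8' codec can't decode bytes in position 13-14: unexpected end of data

# One more byte completes the character, and the decode succeeds.
print(try_decode(data[3:19]))  # None
```

A single byte decides between success and failure here. This is consistent with the observation that one space changes the outcome, but it does not by itself explain the parenthesized case.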
The other bad source (`src_bad2`) and other permutations give the same error. Typing a bad source directly into the interactive interpreter also fails immediately.
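To check the permutations on another build, a small standalone script can be used. This is a sketch under stated assumptions. The four sources combine the two triggers described above: the space before `\n`, and outer parentheses. Treating the "parens, space" case as `src_bad2` is an assumption. `tokenize_outcome` and `SOURCES` are hypothetical names, not part of the `tokenize` module. The script only calls the public `tokenize.tokenize()` API, the same way the session does.

```python
import io
import sys
import tokenize

# With/without the space before the newline, with/without outer parentheses.
# "plain, space" is src_good and "plain, no space" is src_bad1 from the report;
# "parens, space" is assumed to be src_bad2.
SOURCES = {
    "plain, space": "f\"{f(a=lambda: '\u3001' \n)}\"",
    "plain, no space": "f\"{f(a=lambda: '\u3001'\n)}\"",
    "parens, space": "(f\"{f(a=lambda: '\u3001' \n)}\")",
    "parens, no space": "(f\"{f(a=lambda: '\u3001'\n)}\")",
}


def tokenize_outcome(src: str) -> str:
    """Tokenize src as UTF-8 bytes; return 'ok' or 'ExceptionName: message'."""
    readline = io.BytesIO(src.encode("utf-8")).readline
    try:
        for _ in tokenize.tokenize(readline):
            pass
    except Exception as exc:  # record any failure instead of stopping the run
        return f"{type(exc).__name__}: {exc}"
    return "ok"


if __name__ == "__main__":
    print(sys.version)
    for name, src in SOURCES.items():
        print(f"{name:18} {tokenize_outcome(src)}")
```

Going by this report, on an affected 3.13+ build only "plain, space" should print `ok`. On 3.12 and below, all four should pass.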
CPython versions tested on:
3.13, 3.14, 3.15
Operating systems tested on:
Linux