Closed
Description
Bug report
Bug description:
Code which contains backslash line continuations is not round-trip invariant:
```python
import tokenize, io

source_code = r"""
1 + \
    2
"""

tokens = list(tokenize.generate_tokens(io.StringIO(source_code).readline))
x = tokenize.untokenize(tokens)
print(x)
# 1 +\
#     2
```
Notice that the space between `+` and `\` is now missing. The current tokenizer code simply inserts a backslash and a newline when it encounters two consecutive tokens on different rows, without restoring any whitespace that preceded the backslash:
Lines 179 to 182 in 9c2bb7d

```python
row_offset = row - self.prev_row
if row_offset:
    self.tokens.append("\\\n" * row_offset)
    self.prev_col = 0
```
I think this should be fixed. The docstring of `tokenize.untokenize` says:
Round-trip invariant for full input:
Untokenized source will match input source exactly
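As a minimal sketch of what that invariant means in practice, the check below tokenizes a string and compares the untokenized result with the input. The helper name `roundtrips` is made up for illustration and is not part of the `tokenize` module.

```python
import io
import tokenize


def roundtrips(source: str) -> bool:
    """Return True if tokenizing and untokenizing reproduces `source` exactly.

    Hypothetical helper for illustration only.
    """
    readline = io.StringIO(source).readline
    tokens = list(tokenize.generate_tokens(readline))
    return tokenize.untokenize(tokens) == source


# A plain statement is reproduced exactly.
simple_ok = roundtrips("x = 1\n")

# The input from this report; the result depends on whether the
# interpreter in use still has the behaviour described here.
continuation_ok = roundtrips("\n1 + \\\n    2\n")
```

On an interpreter that shows the behaviour reported here, `continuation_ok` would be `False`, because the space before the backslash is dropped.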
To fix this, it will probably be necessary to inspect the raw line contents and count how much whitespace there is at the end of the line.
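The whitespace count could look roughly like the sketch below. It is only an illustration of the idea, not a patch for the untokenizer: the helper name `whitespace_before_backslash` is hypothetical, and it assumes the raw physical line is available (for example from the `line` field that each token carries).

```python
def whitespace_before_backslash(raw_line: str) -> str:
    """Return the whitespace run directly before a trailing backslash.

    Hypothetical helper for illustration; returns "" if the line does not
    end in a backslash.
    """
    body = raw_line.rstrip("\r\n")
    if not body.endswith("\\"):
        return ""
    body = body[:-1]  # drop the backslash itself
    stripped = body.rstrip(" \t\f")
    return body[len(stripped):]


# Raw physical lines of the example from this report.
lines = "1 + \\\n    2\n".splitlines(keepends=True)
padding = whitespace_before_backslash(lines[0])

# What could be emitted instead of a bare "\\\n" for this line.
restored = padding + "\\\n"
```

The helper only looks at the end of the line, so it does not distinguish a real continuation from a comment that happens to end in a backslash; a real fix would call it only when consecutive tokens are on different rows.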
CPython versions tested on:
CPython main branch
Operating systems tested on:
Linux