untokenize() does not round-trip for code containing line breaks (`\` + `\n`) #125553

Closed

Labels

stdlib (Standard Library Python modules in the Lib/ directory), topic-parser, type-bug (An unexpected behavior, bug, or error)

Description

tomasr8 opened on Oct 15, 2024

Bug report

Bug description:

Code which contains line breaks is not round-trip invariant:

```python
import tokenize, io

source_code = r"""
1 + \
    2
"""

tokens = list(tokenize.generate_tokens(io.StringIO(source_code).readline))
x = tokenize.untokenize(tokens)
print(x)
# 1 +\
#     2
```

Notice that the space between `+` and `\` is now missing. The current untokenizer code simply inserts a backslash when it encounters two consecutive tokens with a differing row offset:

cpython/Lib/tokenize.py, lines 179 to 182 in 9c2bb7d:

```python
row_offset = row - self.prev_row
if row_offset:
    self.tokens.append("\\\n" * row_offset)
    self.prev_col = 0
```

I think this should be fixed. The docstring of `tokenize.untokenize` says:

> Round-trip invariant for full input:
> Untokenized source will match input source exactly
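The invariant quoted from the docstring can be written as a small check. This is only a sketch: `roundtrips` is a hypothetical helper name, not part of the `tokenize` module, and it relies solely on `tokenize.generate_tokens` and `tokenize.untokenize`.

```python
import io
import tokenize


def roundtrips(source: str) -> bool:
    """Return True if tokenizing and then untokenizing `source` reproduces it exactly."""
    tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
    return tokenize.untokenize(tokens) == source


# A plain statement without a backslash continuation.
print(roundtrips("x = 1\n"))

# The backslash continuation from this report; the result depends on
# whether the interpreter in use is affected by the bug.
print(roundtrips("1 + \\\n    2\n"))
```

On an affected interpreter the second call should report a mismatch, because the space before the backslash is dropped.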

To fix this, it will probably be necessary to inspect the raw line contents and count how much whitespace there is at the end of the line.

CPython versions tested on:

CPython main branch

Operating systems tested on:

Linux

