Issue 2180

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: tokenize: mishandles line joining
Type: behavior
Stage: commit review
Components: Extension Modules
Versions: Python 3.9, Python 3.8

process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: gregory.p.smith
Nosy List: Anthony Sottile, gregory.p.smith, jaredgrubb, jhylton, meador.inge, miss-islington, rhettinger
Priority: normal
Keywords: patch

Created on 2008-02-25 01:55 by jaredgrubb, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Pull Requests
URL       Status  Linked                              Edit
PR 13401  merged  Anthony Sottile, 2019-05-18 01:39
Messages (8)
msg62956 - Author: Jared Grubb (jaredgrubb) Date: 2008-02-25 01:59
tokenize does not handle line joining properly, as the following string fails the CPython tokenizer but passes the tokenize module.

Example 1:
    >>> s = "if 1:\n  \\\n  #hey\n  print 1"
    >>> exec s
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<string>", line 3
        #hey
           ^
    SyntaxError: invalid syntax
    >>> tokenize.tokenize(StringIO(s).readline)
    1,0-1,2:    NAME       'if'
    1,3-1,4:    NUMBER     '1'
    1,4-1,5:    OP         ':'
    1,5-1,6:    NEWLINE    '\n'
    2,0-2,2:    INDENT     '  '
    3,2-3,6:    COMMENT    '#hey'
    3,6-3,7:    NEWLINE    '\n'
    4,2-4,7:    NAME       'print'
    4,8-4,9:    NUMBER     '1'
    5,0-5,0:    DEDENT     ''
    5,0-5,0:    ENDMARKER  ''
msg62960 - Author: Jared Grubb (jaredgrubb) Date: 2008-02-25 02:22
CPython allows \ at EOF, but tokenize does not.

    >>> s = 'print 1\\\n'
    >>> exec s
    1
    >>> tokenize.tokenize(StringIO(s).readline)
    1,0-1,5:    NAME       'print'
    1,6-1,7:    NUMBER     '1'
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/tokenize.py", line 153, in tokenize
        tokenize_loop(readline, tokeneater)
      File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/tokenize.py", line 159, in tokenize_loop
        for token_info in generate_tokens(readline):
      File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/tokenize.py", line 283, in generate_tokens
        raise TokenError, ("EOF in multi-line statement", (lnum, 0))
    tokenize.TokenError: ('EOF in multi-line statement', (2, 0))
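
For readers on Python 3, the two reports above can be approximated with the sketch below. It is not part of the original thread: the helper names compile_outcome and tokenize_outcome are hypothetical, `print 1` is ported to `print(1)`, and the built-in compile() stands in for the Python 2 `exec` statement. The outcomes depend on the interpreter version (the changeset referenced later in this issue changed how a continuation at EOF is treated), so no expected results are shown.

```python
import io
import tokenize


def compile_outcome(source):
    """Return 'ok', or the name of the exception compile() raised."""
    try:
        compile(source, "<string>", "exec")
    except SyntaxError as exc:  # also covers IndentationError
        return type(exc).__name__
    return "ok"


def tokenize_outcome(source):
    """Return 'ok', or the name of the exception the tokenize module raised."""
    try:
        # generate_tokens is lazy, so errors only surface during iteration.
        for _ in tokenize.generate_tokens(io.StringIO(source).readline):
            pass
    except (tokenize.TokenError, SyntaxError) as exc:
        return type(exc).__name__
    return "ok"


# Python 3 ports of the two strings from the reports (an assumption:
# only the print statement was changed to a function call).
EXAMPLES = {
    "backslash followed by comment": "if 1:\n  \\\n  #hey\n  print(1)",
    "backslash at EOF": "print(1)\\\n",
}

if __name__ == "__main__":
    for label, source in EXAMPLES.items():
        print(f"{label}: compile={compile_outcome(source)}, "
              f"tokenize={tokenize_outcome(source)}")
```

A mismatch between the two columns for the same string is the kind of disagreement this issue describes.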
msg116977 - Author: Mark Lawrence (BreamoreBoy) * Date: 2010-09-20 21:26
Nobody appears to be interested, so I'll close this in a couple of weeks unless someone objects or a patch is provided.
msg116985 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2010-09-20 21:51
Mark, please stop closing these based on age.

There needs to be a determination whether this is a valid bug. If so, then a patch is needed. If not, it can be closed.
msg143716 - Author: Meador Inge (meador.inge) * (Python committer) Date: 2011-09-08 01:39
That syntax error is coming from the CPython parser and *not* the tokenizer. Both CPython and the 'tokenize' module produce the same tokenization:

    [meadori@motherbrain cpython]$ cat repro.py
    if 1:
      \
      pass
    [meadori@motherbrain cpython]$ ./python tokenize.py repro.py
    0,0-0,0:        ENCODING        'utf-8'
    1,0-1,2:        NAME            'if'
    1,3-1,4:        NUMBER          '1'
    1,4-1,5:        OP              ':'
    1,5-1,6:        NEWLINE         '\n'
    2,0-2,2:        INDENT          '  '
    3,0-3,1:        NEWLINE         '\n'
    4,2-4,6:        NAME            'pass'
    4,6-4,7:        NEWLINE         '\n'
    5,0-5,0:        DEDENT          ''
    5,0-5,0:        ENDMARKER       ''
    [44319 refs]
    [meadori@motherbrain cpython]$ ./python -d repro.py | grep Token | tail -10
      File "repro.py", line 3
            ^
    SyntaxError: invalid syntax
    [44305 refs]
    Token NEWLINE/'' ... It's a token we know
    Token DEDENT/'' ... It's a token we know
    Token NEWLINE/'' ... It's a token we know
    Token ENDMARKER/'' ... It's a token we know
    Token NAME/'if' ... It's a keyword
    Token NUMBER/'1' ... It's a token we know
    Token COLON/':' ... It's a token we know
    Token NEWLINE/'' ... It's a token we know
    Token INDENT/'' ... It's a token we know
    Token NEWLINE/'' ... It's a token we know

The NEWLINE INDENT NEWLINE tokenization causes the parser to choke because 'suite' nonterminals [1]:

    suite: simple_stmt | NEWLINE INDENT stmt+ DEDENT

are defined as NEWLINE INDENT.

It seems appropriate that the NEWLINE after INDENT should be dropped by both tokenizers. In other words, I think:

    """
    if 1:
      \
      pass
    """

should produce the same tokenization as:

    """
    if 1:
        pass
    """

This seems consistent with how explicit line joining is defined [2].

[1] http://hg.python.org/cpython/file/92842e347d98/Grammar/Grammar
[2] http://docs.python.org/reference/lexical_analysis.html#explicit-line-joining
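
The comparison proposed above can be checked on a current interpreter with a short sketch like the following. It is an illustration only and not part of the thread: token_summary, JOINED and PLAIN are hypothetical names, and tokenize.generate_tokens is used because it accepts a str-based readline and therefore emits no ENCODING token. Whether the two summaries match, or whether the joined form raises, depends on the Python version, so no output is asserted here.

```python
import io
import tokenize


def token_summary(source):
    """Return [(token name, token text), ...] for a source string."""
    readline = io.StringIO(source).readline
    return [
        (tokenize.tok_name[tok.type], tok.string)
        for tok in tokenize.generate_tokens(readline)
    ]


# The two snippets from the message: a backslash-only continuation line
# inside the suite, and the form it is argued to be equivalent to.
JOINED = "if 1:\n  \\\n  pass\n"
PLAIN = "if 1:\n    pass\n"

if __name__ == "__main__":
    for label, source in (("joined", JOINED), ("plain", PLAIN)):
        try:
            print(label, token_summary(source))
        except (tokenize.TokenError, SyntaxError) as exc:
            # Some versions may reject the input instead of tokenizing it.
            print(label, "failed:", exc)
```

Comparing only the token names (the first element of each pair) is the closest match to the claim in the message, since the INDENT text and positions necessarily differ between the two spellings.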
msg339576 - Author: Anthony Sottile (Anthony Sottile) * Date: 2019-04-07 14:32
Here's an example in the wild which still reproduces with python3.8a3:
https://github.com/SecureAuthCorp/impacket/blob/194b22ed2fc85c4f241375fb7ebe4e0d89626c8c/impacket/examples/remcomsvc.py#L1669

This was reported as a bug on flake8: https://gitlab.com/pycqa/flake8/issues/532

Here's the reproduction with python3.8:

    $ python3.8 --version --version
    Python 3.8.0a3 (default, Mar 27 2019, 03:46:44)
    [GCC 7.3.0]
    $ python3.8 impacket/examples/remcomsvc.py
    $ python3.8 -mtokenize impacket/examples/remcomsvc.py
    impacket/examples/remcomsvc.py:1670:0: error: EOF in multi-line statement
msg342807 - Author: miss-islington (miss-islington) Date: 2019-05-18 18:27
New changeset abea73bf4a320ff658c9a98fef3d948a142e61a9 by Miss Islington (bot) (Anthony Sottile) in branch 'master':
bpo-2180: Treat line continuation at EOF as a `SyntaxError` (GH-13401)
https://github.com/python/cpython/commit/abea73bf4a320ff658c9a98fef3d948a142e61a9
msg342817 - Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2019-05-18 21:02
Thanks for figuring this one out Anthony! :)
History
Date                 User             Action  Args
2022-04-11 14:56:31  admin            set     github: 46433
2019-05-18 21:02:56  gregory.p.smith  set     status: open -> closed
                                              resolution: fixed
                                              messages: +msg342817
                                              stage: patch review -> commit review
2019-05-18 18:27:30  miss-islington   set     nosy: +miss-islington
                                              messages: +msg342807
2019-05-18 07:16:47  gregory.p.smith  set     assignee: gregory.p.smith
                                              nosy: +gregory.p.smith
2019-05-18 01:39:12  Anthony Sottile  set     keywords: +patch
                                              stage: needs patch -> patch review
                                              pull_requests: +pull_request13312
2019-04-07 14:32:44  Anthony Sottile  set     nosy: +Anthony Sottile
                                              messages: +msg339576
                                              versions: + Python 3.8, Python 3.9, - Python 3.1, Python 2.7, Python 3.2
2014-02-03 19:15:35  BreamoreBoy      set     nosy: -BreamoreBoy
2011-09-08 01:39:11  meador.inge      set     messages: +msg143716
                                              stage: test needed -> needs patch
2010-09-27 03:19:42  meador.inge      set     nosy: +meador.inge
2010-09-20 21:51:51  rhettinger       set     status: pending -> open
                                              nosy: +rhettinger
                                              messages: +msg116985
                                              assignee: jhylton -> (no value)
2010-09-20 21:26:23  BreamoreBoy      set     status: open -> pending
                                              nosy: +BreamoreBoy
                                              messages: +msg116977
2010-08-21 17:06:34  BreamoreBoy      unlink  issue1230484 dependencies
2010-08-21 17:03:39  BreamoreBoy      set     versions: + Python 3.1, Python 2.7, Python 3.2, - Python 2.6
2009-02-16 02:26:11  ajaksu2          link    issue1230484 dependencies
2009-02-16 02:20:41  ajaksu2          set     stage: test needed
                                              versions: + Python 2.6, - Python 2.5
2008-03-20 03:08:15  jafo             set     assignee: jhylton
                                              nosy: +jhylton
2008-02-25 02:22:29  jaredgrubb       set     messages: +msg62960
2008-02-25 01:59:17  jaredgrubb       set     messages: +msg62956
2008-02-25 01:55:51  jaredgrubb       create