bpo-40612: Fix SyntaxError edge cases in traceback formatting #20072


Merged
miss-islington merged 8 commits into master from fix-traceback-syntax-error
May 15, 2020
Refactor print_error_text()
gvanrossum committed May 13, 2020
commit ea1c6f23492ea2baad8cefe7a47d07c0c7807579
Python/pythonrun.c: 64 changes (43 additions, 21 deletions)
@@ -554,36 +554,58 @@ parse_syntax_error(PyObject *err, PyObject **message, PyObject **filename,
 static void
 print_error_text(PyObject *f, int offset, PyObject *text_obj)
 {
-    const char *text;
-    const char *nl;
-
-    text = PyUnicode_AsUTF8(text_obj);
+    /* Convert text to a char pointer; return if error */
+    const char *text = PyUnicode_AsUTF8(text_obj);
     if (text == NULL)
         return;
 
-    if (offset >= 0) {
-        if (offset > 0 && (size_t)offset == strlen(text) && text[offset - 1] == '\n')
-            offset--;
-        for (;;) {
-            nl = strchr(text, '\n');
-            if (nl == NULL || nl-text >= offset)
-                break;
-            offset -= (int)(nl+1-text);
Member

Just FYI, this code right here is the one that I mentioned in we-like-parsers#121 (comment).

It is what makes a SyntaxError with an offset relative to the start of the file end up pointing to the right place. I'm tempted to say we should eventually just remove it, since the new parser will always provide line-relative offsets.

Member, Author

Hm... Why would the offset in the SyntaxError object ever end up being file-relative? Do you know of any code that produces such SyntaxErrors?

Member @ammaraskar, May 15, 2020 (edited)

In the old parser it was this code:

    err_ret->offset = col_offset != -1 ? col_offset + 1 : ((int)(tok->cur - tok->buf));
    len = tok->inp - tok->buf;
    err_ret->text = (char *) PyObject_MALLOC(len + 1);
    if (err_ret->text != NULL) {
        if (len > 0)
            strncpy(err_ret->text, tok->buf, len);
        err_ret->text[len] = '\0';

tok->cur is the current read index of the tokenizer and tok->buf is the start of the file. Also see how it copies the entire file up to the error into the SyntaxError.text field.

Try this out with the old parser as a quick example:

    code = """\
    a = \\
        \\
        \\?"""
    try:
        compile(code, '<stdin>', 'exec')
    except SyntaxError as e:
        print(e)
        print(e.lineno, e.offset)

Member, Author

Oh, you mean the opposite of file. :-) It occurs when it's not read from a file.

Also it doesn't occur all the time -- perhaps only when there's a continuation line? E.g. here all is good:

    >>> try: compile("def f():\n 1+\n", "", "exec")
    ... except SyntaxError as e: e
    ...
    SyntaxError('invalid syntax', ('', 2, 4, ' 1+\n'))
    >>>

Member, Author @gvanrossum, May 15, 2020 (edited)

FWIW, your example does show up weird in the old parser:

    >>> code = """\
    ... a = \\
    ...     \\
    ...     \\?"""
    >>> code
    'a = \\\n    \\\n    \\?'
    >>> try: compile(code, "", "exec")
    ... except SyntaxError as e: e
    ...
    SyntaxError('unexpected character after line continuation character', ('', 3, 19, 'a = \\\n    \\\n    \\?\n'))
    >>>

The new parser seems to solve the dilemma by suppressing the source text (also the offset is set to zero, meaning unknown):

    >>> code = """\
    ... a = \\
    ...     \\
    ...     \\?"""
    >>>
    >>> try: compile(code, "", "exec")
    ... except Exception as e: e; e.lineno, e.offset, e.text
    ...
    SyntaxError('unexpected character after line continuation character')
    (3, 0, None)
    >>>
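To make the offset arithmetic concrete, here is a rough Python rendering of the pre-refactor loop in print_error_text(), applied to the text and offset from the old-parser output quoted above. This is only a sketch for illustration, not code from the PR; the function name `old_skip_to_error_line` is made up.

```python
def old_skip_to_error_line(text, offset):
    """Mimic the old C loop: return (line, 1-based offset within that line).

    Hypothetical helper for illustration; not part of CPython.
    """
    if offset >= 0:
        # If the offset points just past a trailing newline, pull it back by one.
        if offset > 0 and offset == len(text) and text[offset - 1] == "\n":
            offset -= 1
        # Drop every whole line that ends before the offset.
        while True:
            nl = text.find("\n")
            if nl == -1 or nl >= offset:
                break
            offset -= nl + 1
            text = text[nl + 1:]
        # Strip leading whitespace, keeping the offset aligned with the text.
        stripped = text.lstrip(" \t\f")
        offset -= len(text) - len(stripped)
        text = stripped
    return text, offset


# Text and offset as reported by the old parser for the continuation-line example.
line, col = old_skip_to_error_line("a = \\\n    \\\n    \\?\n", 19)
print(repr(line), col)
```

With these inputs the buffer-relative offset 19 is reduced to a line-relative offset of 2 on the last line, which is the `?` character. That is the one case this loop seems to exist for.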

Member

Aah yes, it seems like it's just an issue with line continuations. I thought I had another example but it must have been a misunderstanding, because tok->buf gets advanced with newlines.

So I guess this code here in pythonrun.c is just for this one case?

Member, Author

Okay, so I have a serious question here. Is there a regression due to this PR for people using -X oldparser? I have tried to research this and cannot find a regression. The C code in pythonrun.c still skips through newlines. The traceback.py code doesn't, but it never did.

Here is what I did for research. First Python 3.8:

    >>> import traceback; traceback.print_exception(None, SyntaxError("msg", ("f.py", 3, 10, "aaa\nbbb\nccc\n")), None)
      File "f.py", line 3
        aaa
    bbb
    ccc
           
       
     ^
    SyntaxError: msg
    >>> raise SyntaxError("msg", ("f.py", 3, 10, "aaa\nbbb\nccc\n"))
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "f.py", line 3
        ccc
         ^
    SyntaxError: msg
    >>>

Then the master branch:

    >>> import traceback; traceback.print_exception(None, SyntaxError("msg", ("f.py", 3, 10, "aaa\nbbb\nccc\n")), None)
      File "f.py", line 3
        aaa
    bbb
    ccc
           
       
     ^
    SyntaxError: msg
    >>> raise SyntaxError("msg", ("f.py", 3, 10, "aaa\nbbb\nccc\n"))
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "f.py", line 3
        ccc
         ^
    SyntaxError: msg
    >>>

To me that looks like in both versions, the C formatter (invoked by raise) does the right thing, while traceback.py messes up the output.
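For anyone who wants to repeat the pure-Python half of this experiment as a script rather than at the REPL, a minimal sketch follows. It only exercises traceback.py; the C formatter is reached by actually raising the exception at top level. The exact lines printed depend on the Python version, so no expected output is shown.

```python
import traceback

# Same hand-built exception as in the transcript: three lines of text,
# reported on line 3 with an offset of 10.
exc = SyntaxError("msg", ("f.py", 3, 10, "aaa\nbbb\nccc\n"))

# format_exception_only() goes through Lib/traceback.py, not pythonrun.c.
lines = traceback.format_exception_only(SyntaxError, exc)
print("".join(lines), end="")
```

Comparing this output with what `raise` prints on the same interpreter shows whether the two formatters agree on that version.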

-            text = nl+1;
-        }
-        while (*text == ' ' || *text == '\t' || *text == '\f') {
-            text++;
-            offset--;
-        }
+    /* Convert offset from 1-based to 0-based */
+    offset--;
+
+    /* Strip leading whitespace from text, adjusting offset as we go */
+    while (*text == ' ' || *text == '\t' || *text == '\f') {
+        text++;
+        offset--;
+    }
+
+    /* Calculate text length excluding trailing newline */
+    Py_ssize_t len = strlen(text);
+    if (len > 0 && text[len-1] == '\n')
+        len--;
+
+    /* Clip offset to at most len */
+    if (offset > len)
+        offset = len;
+
+    /* Skip past newlines embedded in text */
+    for (;;) {
+        const char *nl = strchr(text, '\n');
+        if (nl == NULL)
+            break;
+        Py_ssize_t inl = nl - text;
+        if (inl >= (Py_ssize_t)offset)
+            break;
+        inl += 1;
+        text += inl;
+        len -= inl;
+        offset -= (int)inl;
     }
+
+    /* Print text */
     PyFile_WriteString("    ", f);
     PyFile_WriteString(text, f);
-    if (*text == '\0' || text[strlen(text)-1] != '\n')
+
+    /* Make sure there's a newline at the end */
+    if (text[len] != '\n')
         PyFile_WriteString("\n", f);
-    if (offset == -1)
+
+    /* Don't print caret if it points to the left of the text */
+    if (offset < 0)
         return;
+
+    /* Write caret line */
     PyFile_WriteString("    ", f);
-    while (--offset > 0)
+    while (--offset >= 0)
         PyFile_WriteString(" ", f);
     PyFile_WriteString("^\n", f);
 }
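
The control flow of the refactored function is easier to follow when it returns a string instead of writing to a file object. Below is a rough Python port of the new side of this diff, as it stands in this commit. It is a sketch for experimentation only: the name `format_error_text` is invented here, and it assumes the two outer PyFile_WriteString() calls emit a four-space indent as in the CPython source.

```python
def format_error_text(offset, text):
    """Sketch of the refactored print_error_text(); returns the output as a string.

    Hypothetical helper for illustration; not part of CPython.
    """
    # Convert offset from 1-based to 0-based.
    offset -= 1

    # Strip leading whitespace from text, adjusting offset as we go.
    stripped = text.lstrip(" \t\f")
    offset -= len(text) - len(stripped)
    text = stripped

    # Calculate text length excluding one trailing newline.
    length = len(text)
    if length > 0 and text[length - 1] == "\n":
        length -= 1

    # Clip offset to at most length.
    if offset > length:
        offset = length

    # Skip past newlines embedded in text.
    while True:
        nl = text.find("\n")
        if nl == -1 or nl >= offset:
            break
        text = text[nl + 1:]
        length -= nl + 1
        offset -= nl + 1

    # Print text, making sure there is a newline at the end.
    out = "    " + text
    if text[length:length + 1] != "\n":
        out += "\n"

    # Don't print a caret if it points to the left of the text.
    if offset < 0:
        return out

    # Write caret line: indent, then `offset` spaces, then the caret.
    return out + "    " + " " * offset + "^\n"


print(format_error_text(10, "aaa\nbbb\nccc\n"), end="")
```

For the `("f.py", 3, 10, "aaa\nbbb\nccc\n")` case discussed in the thread, this yields the `ccc` line with the caret under its second character, matching what `raise` printed. Note that in this version leading whitespace is stripped before the newline-skipping loop, so indentation on a later embedded line is left in place.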