capture more errors in fstring replacement field #67


Conversation

sunmy2019

This PR aims to capture all possible partial failures in fstring_replacement_field.

It breaks some tests, mostly due to changes in the error messages.

RAISE_SYNTAX_ERROR_ON_NEXT_TOKEN was introduced to help users locate the exact failure location.

@sunmy2019
Author

This reopens #66.

It has 3 main features:

Correct Place

It outputs the exact place of the first error, thanks to the newly introduced RAISE_SYNTAX_ERROR_ON_NEXT_TOKEN; the old one cannot do that:

```
File "<stdin>", line 1
    f"{123:{234+}}"
                 ^
old: SyntaxError: f-string: expecting '}'

File "<stdin>", line 1
    f"{123:{234+}}"
                ^
new: SyntaxError: f-string: expecting '=', or '!', or ':', or '}'
```

This is because the parser needs to look ahead, so RAISE_SYNTAX_ERROR reports on the farthest token seen. In this example, it points to a `}` while expecting a `}`, which is kind of confusing.
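As a quick sanity check (an illustrative sketch of mine, not part of this PR's diff), the failing example can be reproduced with `compile()`; only the fact that a SyntaxError is raised is stable across interpreters, since the message text is exactly what this PR changes:

```python
# The nested replacement field {234+} inside the format spec holds an
# invalid expression, so compiling the f-string fails. The error *message*
# varies by interpreter version; the SyntaxError itself does not.
def raises_syntax_error(src):
    try:
        compile(src, "<demo>", "eval")
    except SyntaxError:
        return True
    return False

print(raises_syntax_error('f"{123:{234+}}"'))  # True
print(raises_syntax_error('f"{123:{234}}"'))   # False (valid nested field)
```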

Capture All Kinds of Errors

Actually, we can prove this rule will capture nearly all possible kinds of failure in fstring_replacement_field.

Friendlier Message

Sometimes you are expecting more than '}'; the new version will list more expected tokens, thanks to the enriched rules.

```
File "<string>", line 1
    f'{"s"!r{":10"}}'
            ^
old: SyntaxError: f-string: expecting '}'

File "<string>", line 1
    f'{"s"!r{":10"}}'
            ^
new: SyntaxError: f-string: expecting ':' or '}'
```

CC: @pablogsal @lysnikolaou @isidentical

Collaborator

@lysnikolaou left a comment

Generally looks great! Only one comment we could discuss.

@sunmy2019
Author

I will add a commit to fix the test cases once the grammar in this PR is formalized and approved.


@pablogsal
Owner

The grammar changes make sense to me, but for this PR to get landed we need to fix the new failing tests for sure.


@sunmy2019
Author

sunmy2019 commented Apr 12, 2023 (edited)

During testing with the -u cpu flag, I discovered a new problem:

tokenize is a handwritten tokenizer in pure Python, and its definition of tokens differs from what is used in C.

I think these token updates should cause at least a modification of tokenize._tokenize, or even a huge one.

I am wondering if we can utilize what we have in the C module. Since f-strings now support recursion, e.g. f'{f'{f'{f'{0}'}'}'}', it is much harder for a handwritten tokenizer to keep track of things. And maintaining two systems sounds error-prone.

MWE

```python
import tokenize
from io import BytesIO

f1 = BytesIO(b"f'{0}'")
f2 = BytesIO(b"f'{0\n}'")
print(*tokenize.tokenize(f1.readline), sep="\n")
print("-" * 80)
print(*tokenize.tokenize(f2.readline), sep="\n")
```
```
TokenInfo(type=67 (ENCODING), string='utf-8', start=(0, 0), end=(0, 0), line='')
TokenInfo(type=3 (STRING), string="f'{0}'", start=(1, 0), end=(1, 6), line="f'{0}'")
TokenInfo(type=4 (NEWLINE), string='', start=(1, 6), end=(1, 7), line='')
TokenInfo(type=0 (ENDMARKER), string='', start=(2, 0), end=(2, 0), line='')
--------------------------------------------------------------------------------
TokenInfo(type=67 (ENCODING), string='utf-8', start=(0, 0), end=(0, 0), line='')
>> TokenInfo(type=1 (NAME), string='f', start=(1, 0), end=(1, 1), line="f'{0\n")
>> TokenInfo(type=64 (ERRORTOKEN), string="'", start=(1, 1), end=(1, 2), line="f'{0\n")
>> TokenInfo(type=55 (OP), string='{', start=(1, 2), end=(1, 3), line="f'{0\n")
>> TokenInfo(type=2 (NUMBER), string='0', start=(1, 3), end=(1, 4), line="f'{0\n")
>> TokenInfo(type=66 (NL), string='\n', start=(1, 4), end=(1, 5), line="f'{0\n")
>> TokenInfo(type=55 (OP), string='}', start=(2, 0), end=(2, 1), line="}'")
>> TokenInfo(type=64 (ERRORTOKEN), string="'", start=(2, 1), end=(2, 2), line="}'")
TokenInfo(type=4 (NEWLINE), string='', start=(2, 2), end=(2, 3), line='')
TokenInfo(type=0 (ENDMARKER), string='', start=(3, 0), end=(3, 0), line='')
```

Also, a case:

```python
f1 = BytesIO(b"f'{f\"{0}\"}'")
f2 = BytesIO(b"f'{f'{0}'}'")
print(*tokenize.tokenize(f1.readline), sep="\n")
print("-" * 80)
print(*tokenize.tokenize(f2.readline), sep="\n")
```
```
TokenInfo(type=67 (ENCODING), string='utf-8', start=(0, 0), end=(0, 0), line='')
TokenInfo(type=3 (STRING), string='f\'{f"{0}"}\'', start=(1, 0), end=(1, 11), line='f\'{f"{0}"}\'')
TokenInfo(type=4 (NEWLINE), string='', start=(1, 11), end=(1, 12), line='')
TokenInfo(type=0 (ENDMARKER), string='', start=(2, 0), end=(2, 0), line='')
--------------------------------------------------------------------------------
TokenInfo(type=67 (ENCODING), string='utf-8', start=(0, 0), end=(0, 0), line='')
TokenInfo(type=3 (STRING), string="f'{f'", start=(1, 0), end=(1, 5), line="f'{f'{0}'}'")
TokenInfo(type=55 (OP), string='{', start=(1, 5), end=(1, 6), line="f'{f'{0}'}'")
TokenInfo(type=2 (NUMBER), string='0', start=(1, 6), end=(1, 7), line="f'{f'{0}'}'")
TokenInfo(type=55 (OP), string='}', start=(1, 7), end=(1, 8), line="f'{f'{0}'}'")
TokenInfo(type=3 (STRING), string="'}'", start=(1, 8), end=(1, 11), line="f'{f'{0}'}'")
TokenInfo(type=4 (NEWLINE), string='', start=(1, 11), end=(1, 12), line='')
TokenInfo(type=0 (ENDMARKER), string='', start=(2, 0), end=(2, 0), line='')
```

@sunmy2019
Author

We might need to reach out to the maintainers of tokenize for opinions.

@sunmy2019
Author

sunmy2019 commented Apr 12, 2023 (edited)

One possible solution is to use the C tokenizer, with python#27924.

With the C tokenizer:

```
TokenInfo(type=61 (FSTRING_START), string="f'", start=(1, 0), end=(1, 2), line="f'{f'{{':{<123}}'\n")
TokenInfo(type=25 (LBRACE), string='{', start=(1, 2), end=(1, 3), line="f'{f'{{':{<123}}'\n")
TokenInfo(type=61 (FSTRING_START), string="f'", start=(1, 3), end=(1, 5), line="f'{f'{{':{<123}}'\n")
TokenInfo(type=62 (FSTRING_MIDDLE), string='{', start=(1, 5), end=(1, 6), line="f'{f'{{':{<123}}'\n")
TokenInfo(type=63 (FSTRING_END), string="'", start=(1, 7), end=(1, 8), line="f'{f'{{':{<123}}'\n")
TokenInfo(type=11 (COLON), string=':', start=(1, 8), end=(1, 9), line="f'{f'{{':{<123}}'\n")
TokenInfo(type=25 (LBRACE), string='{', start=(1, 9), end=(1, 10), line="f'{f'{{':{<123}}'\n")
TokenInfo(type=20 (LESS), string='<', start=(1, 10), end=(1, 11), line="f'{f'{{':{<123}}'\n")
TokenInfo(type=2 (NUMBER), string='123', start=(1, 11), end=(1, 14), line="f'{f'{{':{<123}}'\n")
TokenInfo(type=26 (RBRACE), string='}', start=(1, 14), end=(1, 15), line="f'{f'{{':{<123}}'\n")
TokenInfo(type=26 (RBRACE), string='}', start=(1, 15), end=(1, 16), line="f'{f'{{':{<123}}'\n")
TokenInfo(type=63 (FSTRING_END), string="'", start=(1, 16), end=(1, 17), line="f'{f'{{':{<123}}'\n")
TokenInfo(type=4 (NEWLINE), string='', start=(1, 17), end=(1, 17), line="f'{f'{{':{<123}}'\n")
```

@lysnikolaou
Collaborator

@sunmy2019 Thanks for raising this! @mgmacias95 is already working on rewriting the tokenize module to incorporate the new changes. The latest update was that she'll have a first version by the end of the week.

Collaborator

@lysnikolaou left a comment (edited)


One more typo nit (probably my fault, sorry).

| '{' a='!' { RAISE_SYNTAX_ERROR_KNOWN_LOCATION(a, "f-string: valid expression required before '!'") }
| '{' a=':' { RAISE_SYNTAX_ERROR_KNOWN_LOCATION(a, "f-string: valid expression required before ':'") }
| '{' a='}' { RAISE_SYNTAX_ERROR_KNOWN_LOCATION(a, "f-string: valid expression required before '}'") }
| '{' a='lambda' { RAISE_SYNTAX_ERROR_KNOWN_LOCATION(a, "f-string: lambda expression are not allowed without parentheses") }
Collaborator

Suggested change:

- | '{' a='lambda' { RAISE_SYNTAX_ERROR_KNOWN_LOCATION(a, "f-string: lambda expression are not allowed without parentheses") }
+ | '{' a='lambda' { RAISE_SYNTAX_ERROR_KNOWN_LOCATION(a, "f-string: lambda expressions are not allowed without parentheses") }

Author

Yeah, but one thing to notice: this won't capture all forbidden usages of lambda, such as f"{1, lambda x: x}".

I got a new idea to capture all possible errors, maybe you can check the following commit?
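For reference, a small sketch (mine, not from the PR) of the behavior the error message steers users toward: parenthesizing the lambda makes it a valid replacement-field expression, while the bare form fails to compile in every CPython version (only the message wording differs):

```python
# Parenthesized lambdas are valid inside f-string replacement fields;
# the bare form is rejected at compile time.
print(f"{(lambda x: x)(42)}")  # 42

try:
    compile('f"{lambda x: x}"', "<demo>", "eval")
    print("compiled")
except SyntaxError:
    print("rejected")  # this branch is taken
```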

@sunmy2019
Author

sunmy2019 commented Apr 12, 2023 (edited)

@lysnikolaou Hi, check my latest commit 91faa31, which handles lambda properly.

```
File "<stdin>", line 1
    f"{lambda x: x}"
       ^^^^^^^^^
SyntaxError: f-string: lambda expressions are not allowed without parentheses
```

Collaborator

@lysnikolaou left a comment

Looks great!

@sunmy2019
Author

Oh, I'm gonna revert that, since I found some corner cases.


@sunmy2019
Author

sunmy2019 commented Apr 12, 2023 (edited)

Oh, no need to revert.

I discovered that this case

f"{1, lambda x: {123}}"

compiles. It matches the lambdef rule, since {123} is indeed a valid expression that evaluates to a set, and there is no FSTRING_MIDDLE inside.

It is not relevant to my change, so I don't need to revert.

And with my change, we can inform the user when to add parens.

To correct this behavior, we can insert an empty FSTRING_MIDDLE in front of the replacement field.
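The corner case can be probed directly (a sketch of mine, assuming a recent CPython; whether the unparenthesized form compiles depends on which grammar version is in use, so it is only probed here, not asserted):

```python
# The parenthesized tuple form always compiles and formats as a tuple repr;
# the unparenthesized corner case may or may not compile, depending on
# the grammar version the interpreter ships.
val = eval('f"{(1, lambda x: x)}"')
print(val.startswith("(1, "))  # True

try:
    compile('f"{1, lambda x: {123}}"', "<demo>", "eval")
    print("edge case compiles")
except SyntaxError:
    print("edge case rejected")
```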

@pablogsal merged commit ef9c43a into pablogsal:fstring-grammar-rebased-after-sprint on Apr 13, 2023
@pablogsal
Owner

Merging this for the time being, as this is a great improvement and is getting quite large already.

Thanks a lot for the fantastic work @sunmy2019 🚀

@lysnikolaou
Collaborator

Does this mean what I think it means? 🎉

@lysnikolaou
Collaborator

We should have a passing test suite now!

@pablogsal
Owner

pablogsal commented Apr 13, 2023 (edited)

> We should have a passing test suite now!

Almost! We still have a failure in the buildbots:

https://buildbot.python.org/all/#/builders/802/builds/760

I also pushed a fix for python#102856 (comment)

But we are super super super close!!!

@sunmy2019 deleted the capture-more-errors-in-fstring-replacement-field branch on May 10, 2023
Reviewers

@lysnikolaou approved these changes
@pablogsal Awaiting requested review from pablogsal

3 participants: @sunmy2019 @pablogsal @lysnikolaou
