capture more errors in fstring replacement field #67


Conversation

sunmy2019

This PR aims to capture all possible partial failures in fstring_replacement_field.

It breaks some tests, mostly due to changes in the error messages.

RAISE_SYNTAX_ERROR_ON_NEXT_TOKEN was introduced to help users locate the exact failure location.

@sunmy2019
Author

This reopens #66.

It has 3 main features:

Correct Place

It outputs the exact place of the first error, thanks to the newly introduced RAISE_SYNTAX_ERROR_ON_NEXT_TOKEN; the old one cannot do that:

```
File "<stdin>", line 1
    f"{123:{234+}}"
                 ^
old: SyntaxError: f-string: expecting '}'

File "<stdin>", line 1
    f"{123:{234+}}"
                ^
new: SyntaxError: f-string: expecting '=', or '!', or ':', or '}'
```

This is because the parser needs to look ahead, so RAISE_SYNTAX_ERROR reports on the farthest token seen. In this example, it points to a `}` while expecting a `}`, which is kind of confusing.
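As a quick sanity check (an illustrative sketch of mine, not part of this PR's diff), the failing example can be reproduced with `compile()`; only the fact that a SyntaxError is raised is stable across interpreters, since the message text is exactly what this PR changes:

```python
# The nested replacement field {234+} inside the format spec holds an
# invalid expression, so compiling the f-string fails. The error *message*
# varies by interpreter version; the SyntaxError itself does not.
def raises_syntax_error(src):
    try:
        compile(src, "<demo>", "eval")
    except SyntaxError:
        return True
    return False

print(raises_syntax_error('f"{123:{234+}}"'))  # True
print(raises_syntax_error('f"{123:{234}}"'))   # False (valid nested field)
```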

Capture All Kinds of Errors

Actually, we can prove this rule will capture nearly all possible kinds of failure in fstring_replacement_field.

Friendlier Message

Sometimes you are expecting more than '}'; the new version will list more expected tokens, thanks to the enriched rules.

```
File "<string>", line 1
    f'{"s"!r{":10"}}'
            ^
old: SyntaxError: f-string: expecting '}'

File "<string>", line 1
    f'{"s"!r{":10"}}'
            ^
new: SyntaxError: f-string: expecting ':' or '}'
```

CC: @pablogsal @lysnikolaou @isidentical

Collaborator

@lysnikolaou left a comment

Generally looks great! Only one comment we could discuss.

@sunmy2019
Author

I will add a commit to fix the test cases once the grammar in this PR is formalized and approved.


@pablogsal
Owner

The grammar changes make sense to me, but for this PR to get landed we need to fix the new failing tests for sure.


@sunmy2019
Author

sunmy2019 commented Apr 12, 2023 (edited)

During testing with the -u cpu flag, I discovered a new problem:

tokenize is a handwritten tokenizer in pure Python, and its definition of tokens differs from what is used in C.

I think these token updates should cause at least a modification of tokenize._tokenize, or even a huge one.

I am wondering if we can utilize what we have in the C module. Since f-strings now support recursion, e.g. f'{f'{f'{f'{0}'}'}'}', it is much harder for a handwritten tokenizer to keep track of things. And maintaining two systems sounds error-prone.

MWE

```python
import tokenize
from io import BytesIO

f1 = BytesIO(b"f'{0}'")
f2 = BytesIO(b"f'{0\n}'")
print(*tokenize.tokenize(f1.readline), sep="\n")
print("-" * 80)
print(*tokenize.tokenize(f2.readline), sep="\n")
```
```
TokenInfo(type=67 (ENCODING), string='utf-8', start=(0, 0), end=(0, 0), line='')
TokenInfo(type=3 (STRING), string="f'{0}'", start=(1, 0), end=(1, 6), line="f'{0}'")
TokenInfo(type=4 (NEWLINE), string='', start=(1, 6), end=(1, 7), line='')
TokenInfo(type=0 (ENDMARKER), string='', start=(2, 0), end=(2, 0), line='')
--------------------------------------------------------------------------------
TokenInfo(type=67 (ENCODING), string='utf-8', start=(0, 0), end=(0, 0), line='')
>> TokenInfo(type=1 (NAME), string='f', start=(1, 0), end=(1, 1), line="f'{0\n")
>> TokenInfo(type=64 (ERRORTOKEN), string="'", start=(1, 1), end=(1, 2), line="f'{0\n")
>> TokenInfo(type=55 (OP), string='{', start=(1, 2), end=(1, 3), line="f'{0\n")
>> TokenInfo(type=2 (NUMBER), string='0', start=(1, 3), end=(1, 4), line="f'{0\n")
>> TokenInfo(type=66 (NL), string='\n', start=(1, 4), end=(1, 5), line="f'{0\n")
>> TokenInfo(type=55 (OP), string='}', start=(2, 0), end=(2, 1), line="}'")
>> TokenInfo(type=64 (ERRORTOKEN), string="'", start=(2, 1), end=(2, 2), line="}'")
TokenInfo(type=4 (NEWLINE), string='', start=(2, 2), end=(2, 3), line='')
TokenInfo(type=0 (ENDMARKER), string='', start=(3, 0), end=(3, 0), line='')
```

Also, a case:

```python
f1 = BytesIO(b"f'{f\"{0}\"}'")
f2 = BytesIO(b"f'{f'{0}'}'")
print(*tokenize.tokenize(f1.readline), sep="\n")
print("-" * 80)
print(*tokenize.tokenize(f2.readline), sep="\n")
```
```
TokenInfo(type=67 (ENCODING), string='utf-8', start=(0, 0), end=(0, 0), line='')
TokenInfo(type=3 (STRING), string='f\'{f"{0}"}\'', start=(1, 0), end=(1, 11), line='f\'{f"{0}"}\'')
TokenInfo(type=4 (NEWLINE), string='', start=(1, 11), end=(1, 12), line='')
TokenInfo(type=0 (ENDMARKER), string='', start=(2, 0), end=(2, 0), line='')
--------------------------------------------------------------------------------
TokenInfo(type=67 (ENCODING), string='utf-8', start=(0, 0), end=(0, 0), line='')
TokenInfo(type=3 (STRING), string="f'{f'", start=(1, 0), end=(1, 5), line="f'{f'{0}'}'")
TokenInfo(type=55 (OP), string='{', start=(1, 5), end=(1, 6), line="f'{f'{0}'}'")
TokenInfo(type=2 (NUMBER), string='0', start=(1, 6), end=(1, 7), line="f'{f'{0}'}'")
TokenInfo(type=55 (OP), string='}', start=(1, 7), end=(1, 8), line="f'{f'{0}'}'")
TokenInfo(type=3 (STRING), string="'}'", start=(1, 8), end=(1, 11), line="f'{f'{0}'}'")
TokenInfo(type=4 (NEWLINE), string='', start=(1, 11), end=(1, 12), line='')
TokenInfo(type=0 (ENDMARKER), string='', start=(2, 0), end=(2, 0), line='')
```

@sunmy2019
Author

We might need to reach out to the maintainers of tokenize for opinions.

@sunmy2019
Author

sunmy2019 commented Apr 12, 2023 (edited)

One possible solution is to use the C tokenizer, with python#27924.

With the C tokenizer:

```
TokenInfo(type=61 (FSTRING_START), string="f'", start=(1, 0), end=(1, 2), line="f'{f'{{':{<123}}'\n")
TokenInfo(type=25 (LBRACE), string='{', start=(1, 2), end=(1, 3), line="f'{f'{{':{<123}}'\n")
TokenInfo(type=61 (FSTRING_START), string="f'", start=(1, 3), end=(1, 5), line="f'{f'{{':{<123}}'\n")
TokenInfo(type=62 (FSTRING_MIDDLE), string='{', start=(1, 5), end=(1, 6), line="f'{f'{{':{<123}}'\n")
TokenInfo(type=63 (FSTRING_END), string="'", start=(1, 7), end=(1, 8), line="f'{f'{{':{<123}}'\n")
TokenInfo(type=11 (COLON), string=':', start=(1, 8), end=(1, 9), line="f'{f'{{':{<123}}'\n")
TokenInfo(type=25 (LBRACE), string='{', start=(1, 9), end=(1, 10), line="f'{f'{{':{<123}}'\n")
TokenInfo(type=20 (LESS), string='<', start=(1, 10), end=(1, 11), line="f'{f'{{':{<123}}'\n")
TokenInfo(type=2 (NUMBER), string='123', start=(1, 11), end=(1, 14), line="f'{f'{{':{<123}}'\n")
TokenInfo(type=26 (RBRACE), string='}', start=(1, 14), end=(1, 15), line="f'{f'{{':{<123}}'\n")
TokenInfo(type=26 (RBRACE), string='}', start=(1, 15), end=(1, 16), line="f'{f'{{':{<123}}'\n")
TokenInfo(type=63 (FSTRING_END), string="'", start=(1, 16), end=(1, 17), line="f'{f'{{':{<123}}'\n")
TokenInfo(type=4 (NEWLINE), string='', start=(1, 17), end=(1, 17), line="f'{f'{{':{<123}}'\n")
```

@lysnikolaou
Collaborator

@sunmy2019 Thanks for raising this! @mgmacias95 is already working on rewriting the tokenize module to incorporate the new changes. The latest update was that she'll have a first version by the end of the week.

Collaborator

@lysnikolaou left a comment (edited)


One more typo nit (probably my fault, sorry).

| '{' a='!' { RAISE_SYNTAX_ERROR_KNOWN_LOCATION(a, "f-string: valid expression required before '!'") }
| '{' a=':' { RAISE_SYNTAX_ERROR_KNOWN_LOCATION(a, "f-string: valid expression required before ':'") }
| '{' a='}' { RAISE_SYNTAX_ERROR_KNOWN_LOCATION(a, "f-string: valid expression required before '}'") }
| '{' a='lambda' { RAISE_SYNTAX_ERROR_KNOWN_LOCATION(a, "f-string: lambda expression are not allowed without parentheses") }
Collaborator

Suggested change:

- | '{' a='lambda' { RAISE_SYNTAX_ERROR_KNOWN_LOCATION(a, "f-string: lambda expression are not allowed without parentheses") }
+ | '{' a='lambda' { RAISE_SYNTAX_ERROR_KNOWN_LOCATION(a, "f-string: lambda expressions are not allowed without parentheses") }

Author

Yeah, but one thing to notice: this won't capture all forbidden usages of lambda, such as f"{1, lambda x: x}".

I got a new idea to capture all possible errors, maybe you can check the following commit?
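For reference, a small sketch (mine, not from the PR) of the behavior the error message steers users toward: parenthesizing the lambda makes it a valid replacement-field expression, while the bare form fails to compile in every CPython version (only the message wording differs):

```python
# Parenthesized lambdas are valid inside f-string replacement fields;
# the bare form is rejected at compile time.
print(f"{(lambda x: x)(42)}")  # 42

try:
    compile('f"{lambda x: x}"', "<demo>", "eval")
    print("compiled")
except SyntaxError:
    print("rejected")  # this branch is taken
```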

@sunmy2019
Author

sunmy2019 commented Apr 12, 2023 (edited)

@lysnikolaou Hi, check my latest commit 91faa31, which handles lambda properly.

```
File "<stdin>", line 1
    f"{lambda x: x}"
       ^^^^^^^^^
SyntaxError: f-string: lambda expressions are not allowed without parentheses
```

Collaborator

@lysnikolaou left a comment

Looks great!

@sunmy2019
Author

Oh, I'm gonna revert that, since I found some corner cases.


@sunmy2019
Author

sunmy2019 commented Apr 12, 2023 (edited)

Oh, no need to revert.

I discovered that this case

f"{1, lambda x: {123}}"

compiles. It matches the lambdef rule, since {123} is indeed a valid expression that evaluates to a set, and there is no FSTRING_MIDDLE inside.

It is not relevant to my change, so I don't need to revert.

And with my change, we can inform the user when to add parens.

To correct this behavior, we can insert an empty FSTRING_MIDDLE in front of the replacement field.
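The corner case can be probed directly (a sketch of mine, assuming a recent CPython; whether the unparenthesized form compiles depends on which grammar version is in use, so it is only probed here, not asserted):

```python
# The parenthesized tuple form always compiles and formats as a tuple repr;
# the unparenthesized corner case may or may not compile, depending on
# the grammar version the interpreter ships.
val = eval('f"{(1, lambda x: x)}"')
print(val.startswith("(1, "))  # True

try:
    compile('f"{1, lambda x: {123}}"', "<demo>", "eval")
    print("edge case compiles")
except SyntaxError:
    print("edge case rejected")
```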

@pablogsal merged commit ef9c43a into pablogsal:fstring-grammar-rebased-after-sprint on Apr 13, 2023
@pablogsal
Owner

Merging this for the time being, as this is a great improvement and is getting quite large already.

Thanks a lot for the fantastic work @sunmy2019 🚀

@lysnikolaou
Collaborator

Does this mean what I think it means? 🎉

@lysnikolaou
Collaborator

We should have a passing test suite now!

@pablogsal
Owner

pablogsal commented Apr 13, 2023 (edited)

> We should have a passing test suite now!

Almost! We still have a failure in the buildbots:

https://buildbot.python.org/all/#/builders/802/builds/760

I also pushed a fix for python#102856 (comment)

But we are super super super close!!!

@sunmy2019 deleted the capture-more-errors-in-fstring-replacement-field branch on May 10, 2023
Reviewers

@lysnikolaou approved these changes
@pablogsal Awaiting requested review from pablogsal

3 participants: @sunmy2019 @pablogsal @lysnikolaou
