
gh-102856: Initial implementation of PEP 701 #102855


Merged

Conversation

@pablogsal (Member) commented Mar 20, 2023 (edited by bedevere-bot)
@pablogsal changed the title from "Initial implementation of PEP 701" to "gh-102856: Initial implementation of PEP 701" on Mar 20, 2023
@pablogsal force-pushed the fstring-grammar-rebased-after-sprint branch from 7cb2e44 to ed0ef34 on March 20, 2023 22:29
@ghost commented Mar 26, 2023 (edited by ghost)

All commit authors signed the Contributor License Agreement.
CLA signed

@pablogsal marked this pull request as ready for review on April 13, 2023 10:05
@sunmy2019 (Member)

One issue is that, with the current grammar,

`f"{lambda x:{123}}"`

will be recognized as a valid lambda, but

`f"{lambda x: {123}}"`
`f"{lambda x:{123}}"`

won't. It definitely confuses users.
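For reference, a minimal runnable sketch (not from this PR) of the two roles the `:` can play inside a replacement field, and the usual way to sidestep the conflict by parenthesising the lambda:

```python
# Hypothetical illustration, runnable on any Python with f-strings (3.6+).
value = f"{(lambda x: x + 1)(41)}"  # parenthesised lambda: the ':' unambiguously belongs to the lambda
width = f"{123:{10}}"               # unparenthesised ':' starts a format spec (here with a nested field)
print(value)        # 42
print(repr(width))  # '       123'  (123 right-aligned in a field of width 10)
```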

I can't figure out an elegant way to fix this with the current tokens. Since the info of `in_format_spec` only exists while the token is being tokenized, the information is lost when exiting that state.

One workaround is to emit an empty `fstring_middle` to prevent any further match by the `lambdef`.

Another workaround is to add two tokens, `FSTRING_REPLACEMENT_FIELD_START`/`END`; this preserves the `in_format_spec` info when passed to the parser.

@pablogsal (Member, Author)

@sunmy2019 the changes may also make `test_tokenize.TestRoundtrip.test_random_files` fail for some cases, but that may be an older failure.

@sunmy2019 (Member)

> @sunmy2019 the changes may also make `test_tokenize.TestRoundtrip.test_random_files` fail for some cases, but that may be an older failure.

I ran CPU-heavy tests yesterday and found this failure. See here: pablogsal#67 (comment)

Both the tokenize and the untokenize functions need a rewrite.

@pablogsal added the 🔨 test-with-refleak-buildbots label (Test PR w/ refleak buildbots; report in status section) on Apr 13, 2023
@bedevere-bot

🤖 New build scheduled with the buildbot fleet by @pablogsal for commit 18f69e6 🤖

If you want to schedule another build, you need to add the 🔨 test-with-refleak-buildbots label again.

@bedevere-bot removed the 🔨 test-with-refleak-buildbots label on Apr 13, 2023
@pablogsal (Member, Author)

We are almost there! We have a failing test on some buildbots:

https://buildbot.python.org/all/#/builders/802/builds/760

I cannot reproduce it on my Mac machine. Maybe someone has more luck with a Linux system.

@isidentical (Member)

> I cannot reproduce it on my Mac machine. Maybe someone has more luck with a Linux system.

No luck on my side either (with a Linux machine + debug build + refleaks) for test_ast/test_fstring/test_tokenize. Trying the whole test suite (is there a specific option I might be missing?)

@lysnikolaou (Member)

I'm able to reproduce on a Debian container using Docker on my macOS. The problem has to do with code like `eval('f""')`. When the f-string is too small, it results in either `start_char` or `peek1` or both (here) being EOF. For some reason, on this machine with this configuration they're not -1 (EOF), but rather 255, which means that the relevant check in `tok_backup` fails and we have a fatal error raised from here. I can't explain why they wouldn't be EOF until now, but I'm looking.

@lysnikolaou (Member)

More info. When running it with Python, I get the following error:

root@9ee555036b0f:/usr/src/cpython# cat t.py
eval('f"a"')
root@9ee555036b0f:/usr/src/cpython# ./python t.py
Fatal Python error: tok_backup: tok_backup: wrong character
Python runtime state: initialized

Current thread 0x0000ffff9de38750 (most recent call first):
  File "/usr/src/cpython/t.py", line 1 in <module>
Aborted

Here's a simple step through tok_get_fstring_mode on gdb in the last pass that generates the error:

(gdb) file ./python
Reading symbols from ./python...
warning: File "/usr/src/cpython/python-gdb.py" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load".
To enable execution of this file add
        add-auto-load-safe-path /usr/src/cpython/python-gdb.py
line to your configuration file "/root/.gdbinit".
To completely disable this security protection add
        set auto-load safe-path /
line to your configuration file "/root/.gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
        info "(gdb)Auto-loading safe path"
(gdb) break tok_get_fstring_mode
Breakpoint 1 at 0x17119c: file Parser/tokenizer.c, line 2442.
(gdb) r t.py
Starting program: /usr/src/cpython/python t.py
warning: Error disabling address space randomization: Operation not permitted
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".

Breakpoint 1, tok_get_fstring_mode (tok=0xaaab0ecfec60, current_tok=0xaaab0ecff7d0, token=0xffffc95f8278) at Parser/tokenizer.c:2442
2442    {
(gdb) c
Continuing.

Breakpoint 1, tok_get_fstring_mode (tok=0xaaab0ecfec60, current_tok=0xaaab0ecff7d0, token=0xffffc95f8278) at Parser/tokenizer.c:2442
2442    {
(gdb) p tok->cur
$1 = 0xffff883fc1e3 "\""
(gdb) p tok->buf
$2 = 0xffff883fc1e0 "f\"a\""
(gdb) n
2448        tok->start = tok->cur;
(gdb)
2449        tok->first_lineno = tok->lineno;
(gdb)
2450        tok->starting_col_offset = tok->col_offset;
(gdb)
2454        char start_char = tok_nextc(tok);
(gdb)
2455        char peek1 = tok_nextc(tok);
(gdb) p start_char
$3 = 34 '"'
(gdb) s
tok_nextc (tok=0xaaab0ecfec60) at Parser/tokenizer.c:1169
1169    {
(gdb) n
1172        if (tok->cur != tok->inp) {
(gdb)
1176        if (tok->done != E_OK) {
(gdb)
1179        if (tok->fp == NULL) {
(gdb)
1180            rc = tok_underflow_string(tok);
(gdb) s
tok_underflow_string (tok=0xaaab0ecfec60) at Parser/tokenizer.c:965
965     tok_underflow_string(struct tok_state *tok) {
(gdb) list
960         } while (tok->inp[-1] != '\n');
961         return 1;
962     }
963
964     static int
965     tok_underflow_string(struct tok_state *tok) {
966         char *end = strchr(tok->inp, '\n');
967         if (end != NULL) {
968             end++;
969         }
(gdb)
970         else {
971             end = strchr(tok->inp, '\0');
972             if (end == tok->inp) {
973                 tok->done = E_EOF;
974                 return 0;
975             }
976         }
977         if (tok->start == NULL) {
978             tok->buf = tok->cur;
979         }
(gdb) n
966         char *end = strchr(tok->inp, '\n');
(gdb)
967         if (end != NULL) {
(gdb)
971             end = strchr(tok->inp, '\0');
(gdb)
972             if (end == tok->inp) {
(gdb)
973                 tok->done = E_EOF;
(gdb)
974                 return 0;
(gdb)
tok_nextc (tok=0xaaab0ecfec60) at Parser/tokenizer.c:1189
1189        if (tok->debug) {
(gdb) list
1184        }
1185        else {
1186            rc = tok_underflow_file(tok);
1187        }
1188    #if defined(Py_DEBUG)
1189        if (tok->debug) {
1190            fprintf(stderr, "line[%d] = ", tok->lineno);
1191            print_escape(stderr, tok->cur, tok->inp - tok->cur);
1192            fprintf(stderr, "  tok->done = %d\n", tok->done);
1193        }
(gdb)
1194    #endif
1195        if (!rc) {
1196            tok->cur = tok->inp;
1197            return EOF;
1198        }
1199        tok->line_start = tok->cur;
1200
1201        if (contains_null_bytes(tok->line_start, tok->inp - tok->line_start)) {
1202            syntaxerror(tok, "source code cannot contain null bytes");
1203            tok->cur = tok->inp;
(gdb) n
1195        if (!rc) {
(gdb)
1196            tok->cur = tok->inp;
(gdb)
1197            return EOF;
(gdb)
tok_get_fstring_mode (tok=0xaaab0ecfec60, current_tok=0xaaab0ecff7d0, token=0xffffc95f8278) at Parser/tokenizer.c:2456
2456        tok_backup(tok, peek1);
(gdb) p peek1
$4 = 255 '\377'

@isidentical (Member) commented Apr 13, 2023 (edited)

> When the f-string is too small, it results in either `start_char` or `peek1` or both (here) being EOF.

Oh, this kind of makes sense, at least as to how we got there. I wonder whether we could simply look at `peek1` only if `start_char` is `{`/`}`. This would prevent the secondary `tok_nextc`/`tok_backup` pair in case the string is too small.

E.g. something like this (just as a hack to test if it works):

diff --git a/Parser/tokenizer.c b/Parser/tokenizer.c
index d88d737860..34f291cf89 100644
--- a/Parser/tokenizer.c
+++ b/Parser/tokenizer.c
@@ -2452,8 +2452,14 @@ tok_get_fstring_mode(struct tok_state *tok, tokenizer_mode* current_tok, struct
     // If we start with a bracket, we defer to the normal mode as there is nothing for us to tokenize
     // before it.
     char start_char = tok_nextc(tok);
-    char peek1 = tok_nextc(tok);
-    tok_backup(tok, peek1);
+    char peek1;
+    if (start_char == '{' || start_char == '}') {
+        peek1 = tok_nextc(tok);
+        tok_backup(tok, peek1);
+    }
+    else {
+        peek1 = '0';
+    }
     tok_backup(tok, start_char);
 
     if ((start_char == '{' && peek1 != '{') || (start_char == '}' && peek1 != '}')) {

For me, `eval(f"a")`

@pablogsal added the 🔨 test-with-refleak-buildbots label (Test PR w/ refleak buildbots; report in status section) on Apr 13, 2023
@bedevere-bot

🤖 New build scheduled with the buildbot fleet by @pablogsal for commit d28efe1 🤖

If you want to schedule another build, you need to add the 🔨 test-with-refleak-buildbots label again.

@bedevere-bot removed the 🔨 test-with-refleak-buildbots label on Apr 13, 2023
@lysnikolaou (Member) commented Apr 13, 2023 (edited)

> When the f-string is too small, it results in either `start_char` or `peek1` or both (here) being EOF.

> Oh, this kind of makes sense, at least as to how we got there. I wonder whether we could simply look at `peek1` only if `start_char` is `{`/`}`. This would prevent the secondary `tok_nextc`/`tok_backup` pair in case the string is too small.

Not sure whether this is the actual problem though. `tok_backup` is okay to handle `EOF` and, on the other platforms we're testing, everything seems to work okay. The reason is that every check will fail until we reach this, which should be able to handle things correctly.

The big question to me is how we end up with `peek1 == 255`, when it very clearly came from `return EOF` and the subsequent check `c == EOF` in `tok_backup` fails.

@sunmy2019 (Member) commented Apr 13, 2023 (edited)

> The big question to me is how we end up with `peek1 == 255`, when it very clearly came from `return EOF` and the subsequent check `c == EOF` in `tok_backup` fails.

`char` is unsigned on those platforms (ARM). Thus,

    char start_char = tok_nextc(tok);
    char peek1 = tok_nextc(tok);

will lead to a 255.

Then the 255 is converted to an int again in `tok_backup`.

I can reproduce this problem on x86 with

    unsigned char start_char = tok_nextc(tok);
    unsigned char peek1 = tok_nextc(tok);

@sunmy2019 (Member)

C allows any `int` to convert to `char`, which may silently change its value. That is exactly what we have in this case.

Explicitly using `signed char` or `int` should fix this 255 problem. (I prefer the `int` one.)
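A rough, runnable sketch (not from the PR; the actual code is C) of the conversion being described, using `ctypes` to mimic the char widths: once EOF (-1) is stored in an unsigned 8-bit char it becomes 255, so a later `== EOF` comparison fails, which is exactly why the check in `tok_backup` stopped firing on ARM.

```python
# Hypothetical illustration of the C signedness pitfall via ctypes.
import ctypes

EOF = -1
as_unsigned_char = ctypes.c_ubyte(EOF).value  # what plain `char` holds where char is unsigned (e.g. ARM Linux)
as_signed_char = ctypes.c_byte(EOF).value     # what plain `char` holds where char is signed (e.g. x86-64 Linux)
as_int = EOF                                  # the suggested fix: keep the value in an int

print(as_unsigned_char, as_unsigned_char == EOF)  # 255 False  -> the `c == EOF` check fails
print(as_signed_char, as_signed_char == EOF)      # -1 True
print(as_int, as_int == EOF)                      # -1 True
```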

@lysnikolaou (Member)

Oooh, that's right! Didn't know that ARM has unsigned chars by default. Pushed a fix.

@isidentical (Member)

Wow, that's a nice find!!

@pablogsal (Member, Author)

Same thing for @lysnikolaou 😉

@pablogsal added the 🔨 test-with-buildbots label (Test PR w/ buildbots; report in status section) on Apr 18, 2023
@bedevere-bot

🤖 New build scheduled with the buildbot fleet by @pablogsal for commit afb310d 🤖

If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again.

@bedevere-bot removed the 🔨 test-with-buildbots label on Apr 18, 2023
@lysnikolaou (Member) left a comment

LGTM! Let's merge! 🚀
@isidentical (Member) left a comment

💫 Looks great, thanks everyone for their amazing work!!
@pablogsal merged commit 1ef61cf into python:main on Apr 19, 2023
@pablogsal deleted the fstring-grammar-rebased-after-sprint branch on April 19, 2023 16:18
@python deleted a comment from bedevere-bot on Apr 19, 2023
carljm added a commit to carljm/cpython that referenced this pull request on Apr 20, 2023

* main: (24 commits)
  pythongh-98040: Move the Single-Phase Init Tests Out of test_imp (pythongh-102561)
  pythongh-83861: Fix datetime.astimezone() method (pythonGH-101545)
  pythongh-102856: Clean some of the PEP 701 tokenizer implementation (python#103634)
  pythongh-102856: Skip test_mismatched_parens in WASI builds (python#103633)
  pythongh-102856: Initial implementation of PEP 701 (python#102855)
  pythongh-103583: Add ref. dependency between multibytecodec modules (python#103589)
  pythongh-83004: Harden msvcrt further (python#103420)
  pythonGH-88342: clarify that `asyncio.as_completed` accepts generators yielding tasks (python#103626)
  pythongh-102778: IDLE - make sys.last_exc available in Shell after traceback (python#103314)
  pythongh-103582: Remove last references to `argparse.REMAINDER` from docs (python#103586)
  pythongh-103583: Always pass multibyte codec structs as const (python#103588)
  pythongh-103617: Fix compiler warning in _iomodule.c (python#103618)
  pythongh-103596: [Enum] do not shadow mixed-in methods/attributes (pythonGH-103600)
  pythonGH-100530: Change the error message for non-class class patterns (pythonGH-103576)
  pythongh-95299: Remove lingering setuptools reference in installer scripts (pythonGH-103613)
  [Doc] Fix a typo in optparse.rst (python#103504)
  pythongh-101100: Fix broken reference `__format__` in `string.rst` (python#103531)
  pythongh-95299: Stop installing setuptools as a part of ensurepip and venv (python#101039)
  pythonGH-103484: Docs: add linkcheck allowed redirects entries for most cases (python#103569)
  pythongh-67230: update whatsnew note for csv changes (python#103598)
  ...
@lysnikolaou restored the fstring-grammar-rebased-after-sprint branch on July 22, 2023 09:14
@lysnikolaou deleted the fstring-grammar-rebased-after-sprint branch on July 22, 2023 09:18
dhruvmanila added a commit to astral-sh/ruff that referenced this pull request on Sep 14, 2023
## Summary

This PR adds support for PEP 701 in the parser to use the new tokens emitted by the lexer to construct the f-string node.

### Grammar

Without an official grammar, the f-strings were parsed manually. Now that we have the specification, that is being used in the LALRPOP grammar to parse the f-strings.

### `string.rs`

This file includes the logic for parsing string literals and joining the implicit string concatenation. Now that we don't require parsing f-strings manually, a lot of code involving the same is removed.

Earlier, there were 2 entry points to this module:
* `parse_string`: Used to parse a single string literal
* `parse_strings`: Used to parse strings which were implicitly concatenated

Now, there are 3 entry points:
* `parse_string_literal`: Renamed from `parse_string`
* `parse_fstring_middle`: Used to parse a `FStringMiddle` token, which is basically a string literal without the quotes
* `concatenate_strings`: Renamed from `parse_strings`, but now it takes the parsed nodes instead. So, we just need to concatenate them into a single node.

> A short primer on the `FStringMiddle` token: this includes the portion of text inside the f-string that's not part of the expression and isn't an opening or closing brace. For example, in `f"foo {bar:.3f{x}} bar"`, the `foo `, `.3f` and ` bar` are `FStringMiddle` token content.

### `Constant::kind` changed in the AST

Discussion in the official implementation: python/cpython#102855 (comment)

This change in the AST is when unicode strings (prefixed with `u`) and f-strings are used in an implicitly concatenated string value. For example,

```python
u"foo" f"{bar}" "baz" " some"
```

Pre Python 3.12, the kind field would be assigned only if the prefix was on the first string. So, taking the above example, both `"foo"` and `"baz some"` (implicit concatenation) would be given the `u` kind:

<details><summary>Pre 3.12 AST:</summary>

```python
Constant(value='foo', kind='u'),
FormattedValue(
  value=Name(id='bar', ctx=Load()),
  conversion=-1),
Constant(value='baz some', kind='u')
```

</details>

But, post Python 3.12, only the string with the `u` prefix will be assigned the value:

<details><summary>Post 3.12 AST:</summary>

```python
Constant(value='foo', kind='u'),
FormattedValue(
  value=Name(id='bar', ctx=Load()),
  conversion=-1),
Constant(value='baz some')
```

</details>

Here are some more iterations around the change:

1. `"foo" f"{bar}" u"baz" "no"`

<details><summary>Pre 3.12</summary>

```python
Constant(value='foo'),
FormattedValue(
  value=Name(id='bar', ctx=Load()),
  conversion=-1),
Constant(value='bazno')
```

</details>

<details><summary>3.12</summary>

```python
Constant(value='foo'),
FormattedValue(
  value=Name(id='bar', ctx=Load()),
  conversion=-1),
Constant(value='bazno', kind='u')
```

</details>

2. `"foo" f"{bar}" "baz" u"no"`

<details><summary>Pre 3.12</summary>

```python
Constant(value='foo'),
FormattedValue(
  value=Name(id='bar', ctx=Load()),
  conversion=-1),
Constant(value='bazno')
```

</details>

<details><summary>3.12</summary>

```python
Constant(value='foo'),
FormattedValue(
  value=Name(id='bar', ctx=Load()),
  conversion=-1),
Constant(value='bazno')
```

</details>

3. `u"foo" f"bar {baz} realy" u"bar" "no"`

<details><summary>Pre 3.12</summary>

```python
Constant(value='foobar ', kind='u'),
FormattedValue(
  value=Name(id='baz', ctx=Load()),
  conversion=-1),
Constant(value=' realybarno', kind='u')
```

</details>

<details><summary>3.12</summary>

```python
Constant(value='foobar ', kind='u'),
FormattedValue(
  value=Name(id='baz', ctx=Load()),
  conversion=-1),
Constant(value=' realybarno')
```

</details>

### Errors

With the hand-written parser, we were able to provide better error messages in case of any errors such as the following, but now they are all removed and in those cases an "unexpected token" error will be thrown by LALRPOP:
* A closing delimiter was not opened properly
* An opening delimiter was not closed properly
* Empty expression not allowed

The "Too many nested expressions in an f-string" error was removed; instead we can create a lint rule for that.

And, "The f-string expression cannot include the given character" was removed because f-strings now support those characters, which are mainly the same quotes as the outer ones, escape sequences, comments, etc.

## Test Plan

1. Refactor existing test cases to use `parse_suite` instead of `parse_fstrings` (doesn't exist anymore)
2. Additional test cases are added as required

Updated the snapshots. The change from `parse_fstrings` to `parse_suite` means that the snapshot would produce the module node instead of just a list of f-string parts. I've manually verified that the parts are still the same along with the node ranges.

## Benchmarks

#7263 (comment)

fixes: #7043
fixes: #6835
dhruvmanila added a commit to astral-sh/ruff that referenced this pull requestSep 18, 2023
## SummaryThis PR adds support for PEP 701 in the parser to use the new tokensemitted by the lexer to construct the f-string node.### GrammarWithout an official grammar, the f-strings were parsed manually. Nowthat we've the specification, that is being used in the LALRPOP to parsethe f-strings.### `string.rs`This file includes the logic for parsing string literals and joining theimplicit string concatenation. Now that we don't require parsingf-strings manually a lot of code involving the same is removed.Earlier, there were 2 entry points to this module:* `parse_string`: Used to parse a single string literal* `parse_strings`: Used to parse strings which were implicitlyconcatenatedNow, there are 3 entry points:* `parse_string_literal`: Renamed from `parse_string`* `parse_fstring_middle`: Used to parse a `FStringMiddle` token which isbasically a string literal without the quotes* `concatenate_strings`: Renamed from `parse_strings` but now it takesthe parsed nodes instead. So, we just need to concatenate them into asingle node.> A short primer on `FStringMiddle` token: This includes the portion oftext inside the f-string that's not part of the expression and isn't anopening or closing brace. For example, in `f"foo {bar:.3f{x}} bar"`, the`foo `, `.3f` and ` bar` are `FStringMiddle` token content.### `Constant::kind` changed in the AST***Discussion in the official implementation:python/cpython#102855 (comment)This change in the AST is when unicode strings (prefixed with `u`) andf-strings are used in an implicitly concatenated string value. Forexample,```pythonu"foo" f"{bar}" "baz" " some"```Pre Python 3.12, the kind field would be assigned only if the prefix wason the first string. So, taking the above example, both `"foo"` and`"baz some"` (implicit concatenation) would be given the `u` kind:<details><summary>Pre 3.12 AST:</summary><p>```pythonConstant(value='foo', kind='u'),FormattedValue(  value=Name(id='bar', ctx=Load()),  conversion=-1),Constant(value='baz some', kind='u')```</p></details> But, post Python 3.12, only the string with the `u` prefix will beassigned the value:<details><summary>Pre 3.12 AST:</summary><p>```pythonConstant(value='foo', kind='u'),FormattedValue(  value=Name(id='bar', ctx=Load()),  conversion=-1),Constant(value='baz some')```</p></details>Here are some more iterations around the change:1. `"foo" f"{bar}" u"baz" "no"`<details><summary>Pre 3.12</summary><p>```pythonConstant(value='foo'),FormattedValue(  value=Name(id='bar', ctx=Load()),  conversion=-1),Constant(value='bazno')```</p></details><details><summary>3.12</summary><p>```pythonConstant(value='foo'),FormattedValue(  value=Name(id='bar', ctx=Load()),  conversion=-1),Constant(value='bazno', kind='u')```</p></details> 2. `"foo" f"{bar}" "baz" u"no"`<details><summary>Pre 3.12</summary><p>```pythonConstant(value='foo'),FormattedValue(  value=Name(id='bar', ctx=Load()),  conversion=-1),Constant(value='bazno')```</p></details><details><summary>3.12</summary><p>```pythonConstant(value='foo'),FormattedValue(  value=Name(id='bar', ctx=Load()),  conversion=-1),Constant(value='bazno')```</p></details>3. 
`u"foo" f"bar {baz} realy" u"bar" "no"`<details><summary>Pre 3.12</summary><p>```pythonConstant(value='foobar ', kind='u'),FormattedValue(  value=Name(id='baz', ctx=Load()),  conversion=-1),Constant(value=' realybarno', kind='u')```</p></details><details><summary>3.12</summary><p>```pythonConstant(value='foobar ', kind='u'),FormattedValue(  value=Name(id='baz', ctx=Load()),  conversion=-1),Constant(value=' realybarno')```</p></details> ### ErrorsWith the hand written parser, we were able to provide better errormessages in case of any errors such as the following but now they allare removed and in those cases an "unexpected token" error will bethrown by lalrpop:* A closing delimiter was not opened properly* An opening delimiter was not closed properly* Empty expression not allowedThe "Too many nested expressions in an f-string" was removed and insteadwe can create a lint rule for that.And, "The f-string expression cannot include the given character" wasremoved because f-strings now support those characters which are mainlysame quotes as the outer ones, escape sequences, comments, etc.## Test Plan1. Refactor existing test cases to use `parse_suite` instead of`parse_fstrings` (doesn't exists anymore)2. Additional test cases are added as requiredUpdated the snapshots. The change from `parse_fstrings` to `parse_suite`means that the snapshot would produce the module node instead of just alist of f-string parts. I've manually verified that the parts are stillthe same along with the node ranges.## Benchmarks#7263 (comment)fixes:#7043fixes:#6835
dhruvmanila added a commit to astral-sh/ruff that referenced this pull requestSep 19, 2023
## SummaryThis PR adds support for PEP 701 in the parser to use the new tokensemitted by the lexer to construct the f-string node.### GrammarWithout an official grammar, the f-strings were parsed manually. Nowthat we've the specification, that is being used in the LALRPOP to parsethe f-strings.### `string.rs`This file includes the logic for parsing string literals and joining theimplicit string concatenation. Now that we don't require parsingf-strings manually a lot of code involving the same is removed.Earlier, there were 2 entry points to this module:* `parse_string`: Used to parse a single string literal* `parse_strings`: Used to parse strings which were implicitlyconcatenatedNow, there are 3 entry points:* `parse_string_literal`: Renamed from `parse_string`* `parse_fstring_middle`: Used to parse a `FStringMiddle` token which isbasically a string literal without the quotes* `concatenate_strings`: Renamed from `parse_strings` but now it takesthe parsed nodes instead. So, we just need to concatenate them into asingle node.> A short primer on `FStringMiddle` token: This includes the portion oftext inside the f-string that's not part of the expression and isn't anopening or closing brace. For example, in `f"foo {bar:.3f{x}} bar"`, the`foo `, `.3f` and ` bar` are `FStringMiddle` token content.### `Constant::kind` changed in the AST***Discussion in the official implementation:python/cpython#102855 (comment)This change in the AST is when unicode strings (prefixed with `u`) andf-strings are used in an implicitly concatenated string value. Forexample,```pythonu"foo" f"{bar}" "baz" " some"```Pre Python 3.12, the kind field would be assigned only if the prefix wason the first string. So, taking the above example, both `"foo"` and`"baz some"` (implicit concatenation) would be given the `u` kind:<details><summary>Pre 3.12 AST:</summary><p>```pythonConstant(value='foo', kind='u'),FormattedValue(  value=Name(id='bar', ctx=Load()),  conversion=-1),Constant(value='baz some', kind='u')```</p></details> But, post Python 3.12, only the string with the `u` prefix will beassigned the value:<details><summary>Pre 3.12 AST:</summary><p>```pythonConstant(value='foo', kind='u'),FormattedValue(  value=Name(id='bar', ctx=Load()),  conversion=-1),Constant(value='baz some')```</p></details>Here are some more iterations around the change:1. `"foo" f"{bar}" u"baz" "no"`<details><summary>Pre 3.12</summary><p>```pythonConstant(value='foo'),FormattedValue(  value=Name(id='bar', ctx=Load()),  conversion=-1),Constant(value='bazno')```</p></details><details><summary>3.12</summary><p>```pythonConstant(value='foo'),FormattedValue(  value=Name(id='bar', ctx=Load()),  conversion=-1),Constant(value='bazno', kind='u')```</p></details> 2. `"foo" f"{bar}" "baz" u"no"`<details><summary>Pre 3.12</summary><p>```pythonConstant(value='foo'),FormattedValue(  value=Name(id='bar', ctx=Load()),  conversion=-1),Constant(value='bazno')```</p></details><details><summary>3.12</summary><p>```pythonConstant(value='foo'),FormattedValue(  value=Name(id='bar', ctx=Load()),  conversion=-1),Constant(value='bazno')```</p></details>3. 
`u"foo" f"bar {baz} realy" u"bar" "no"`<details><summary>Pre 3.12</summary><p>```pythonConstant(value='foobar ', kind='u'),FormattedValue(  value=Name(id='baz', ctx=Load()),  conversion=-1),Constant(value=' realybarno', kind='u')```</p></details><details><summary>3.12</summary><p>```pythonConstant(value='foobar ', kind='u'),FormattedValue(  value=Name(id='baz', ctx=Load()),  conversion=-1),Constant(value=' realybarno')```</p></details> ### ErrorsWith the hand written parser, we were able to provide better errormessages in case of any errors such as the following but now they allare removed and in those cases an "unexpected token" error will bethrown by lalrpop:* A closing delimiter was not opened properly* An opening delimiter was not closed properly* Empty expression not allowedThe "Too many nested expressions in an f-string" was removed and insteadwe can create a lint rule for that.And, "The f-string expression cannot include the given character" wasremoved because f-strings now support those characters which are mainlysame quotes as the outer ones, escape sequences, comments, etc.## Test Plan1. Refactor existing test cases to use `parse_suite` instead of`parse_fstrings` (doesn't exists anymore)2. Additional test cases are added as requiredUpdated the snapshots. The change from `parse_fstrings` to `parse_suite`means that the snapshot would produce the module node instead of just alist of f-string parts. I've manually verified that the parts are stillthe same along with the node ranges.## Benchmarks#7263 (comment)fixes:#7043fixes:#6835
dhruvmanila added a commit to astral-sh/ruff that referenced this pull requestSep 20, 2023
## SummaryThis PR adds support for PEP 701 in the parser to use the new tokensemitted by the lexer to construct the f-string node.### GrammarWithout an official grammar, the f-strings were parsed manually. Nowthat we've the specification, that is being used in the LALRPOP to parsethe f-strings.### `string.rs`This file includes the logic for parsing string literals and joining theimplicit string concatenation. Now that we don't require parsingf-strings manually a lot of code involving the same is removed.Earlier, there were 2 entry points to this module:* `parse_string`: Used to parse a single string literal* `parse_strings`: Used to parse strings which were implicitlyconcatenatedNow, there are 3 entry points:* `parse_string_literal`: Renamed from `parse_string`* `parse_fstring_middle`: Used to parse a `FStringMiddle` token which isbasically a string literal without the quotes* `concatenate_strings`: Renamed from `parse_strings` but now it takesthe parsed nodes instead. So, we just need to concatenate them into asingle node.> A short primer on `FStringMiddle` token: This includes the portion oftext inside the f-string that's not part of the expression and isn't anopening or closing brace. For example, in `f"foo {bar:.3f{x}} bar"`, the`foo `, `.3f` and ` bar` are `FStringMiddle` token content.### `Constant::kind` changed in the AST***Discussion in the official implementation:python/cpython#102855 (comment)This change in the AST is when unicode strings (prefixed with `u`) andf-strings are used in an implicitly concatenated string value. Forexample,```pythonu"foo" f"{bar}" "baz" " some"```Pre Python 3.12, the kind field would be assigned only if the prefix wason the first string. So, taking the above example, both `"foo"` and`"baz some"` (implicit concatenation) would be given the `u` kind:<details><summary>Pre 3.12 AST:</summary><p>```pythonConstant(value='foo', kind='u'),FormattedValue(  value=Name(id='bar', ctx=Load()),  conversion=-1),Constant(value='baz some', kind='u')```</p></details> But, post Python 3.12, only the string with the `u` prefix will beassigned the value:<details><summary>Pre 3.12 AST:</summary><p>```pythonConstant(value='foo', kind='u'),FormattedValue(  value=Name(id='bar', ctx=Load()),  conversion=-1),Constant(value='baz some')```</p></details>Here are some more iterations around the change:1. `"foo" f"{bar}" u"baz" "no"`<details><summary>Pre 3.12</summary><p>```pythonConstant(value='foo'),FormattedValue(  value=Name(id='bar', ctx=Load()),  conversion=-1),Constant(value='bazno')```</p></details><details><summary>3.12</summary><p>```pythonConstant(value='foo'),FormattedValue(  value=Name(id='bar', ctx=Load()),  conversion=-1),Constant(value='bazno', kind='u')```</p></details> 2. `"foo" f"{bar}" "baz" u"no"`<details><summary>Pre 3.12</summary><p>```pythonConstant(value='foo'),FormattedValue(  value=Name(id='bar', ctx=Load()),  conversion=-1),Constant(value='bazno')```</p></details><details><summary>3.12</summary><p>```pythonConstant(value='foo'),FormattedValue(  value=Name(id='bar', ctx=Load()),  conversion=-1),Constant(value='bazno')```</p></details>3. 
`u"foo" f"bar {baz} realy" u"bar" "no"`<details><summary>Pre 3.12</summary><p>```pythonConstant(value='foobar ', kind='u'),FormattedValue(  value=Name(id='baz', ctx=Load()),  conversion=-1),Constant(value=' realybarno', kind='u')```</p></details><details><summary>3.12</summary><p>```pythonConstant(value='foobar ', kind='u'),FormattedValue(  value=Name(id='baz', ctx=Load()),  conversion=-1),Constant(value=' realybarno')```</p></details> ### ErrorsWith the hand written parser, we were able to provide better errormessages in case of any errors such as the following but now they allare removed and in those cases an "unexpected token" error will bethrown by lalrpop:* A closing delimiter was not opened properly* An opening delimiter was not closed properly* Empty expression not allowedThe "Too many nested expressions in an f-string" was removed and insteadwe can create a lint rule for that.And, "The f-string expression cannot include the given character" wasremoved because f-strings now support those characters which are mainlysame quotes as the outer ones, escape sequences, comments, etc.## Test Plan1. Refactor existing test cases to use `parse_suite` instead of`parse_fstrings` (doesn't exists anymore)2. Additional test cases are added as requiredUpdated the snapshots. The change from `parse_fstrings` to `parse_suite`means that the snapshot would produce the module node instead of just alist of f-string parts. I've manually verified that the parts are stillthe same along with the node ranges.## Benchmarks#7263 (comment)fixes:#7043fixes:#6835
dhruvmanila added a commit to astral-sh/ruff that referenced this pull requestSep 22, 2023
This PR adds support for PEP 701 in the parser to use the new tokensemitted by the lexer to construct the f-string node.Without an official grammar, the f-strings were parsed manually. Nowthat we've the specification, that is being used in the LALRPOP to parsethe f-strings.This file includes the logic for parsing string literals and joining theimplicit string concatenation. Now that we don't require parsingf-strings manually a lot of code involving the same is removed.Earlier, there were 2 entry points to this module:* `parse_string`: Used to parse a single string literal* `parse_strings`: Used to parse strings which were implicitlyconcatenatedNow, there are 3 entry points:* `parse_string_literal`: Renamed from `parse_string`* `parse_fstring_middle`: Used to parse a `FStringMiddle` token which isbasically a string literal without the quotes* `concatenate_strings`: Renamed from `parse_strings` but now it takesthe parsed nodes instead. So, we just need to concatenate them into asingle node.> A short primer on `FStringMiddle` token: This includes the portion oftext inside the f-string that's not part of the expression and isn't anopening or closing brace. For example, in `f"foo {bar:.3f{x}} bar"`, the`foo `, `.3f` and ` bar` are `FStringMiddle` token content.***Discussion in the official implementation:python/cpython#102855 (comment)This change in the AST is when unicode strings (prefixed with `u`) andf-strings are used in an implicitly concatenated string value. Forexample,```pythonu"foo" f"{bar}" "baz" " some"```Pre Python 3.12, the kind field would be assigned only if the prefix wason the first string. So, taking the above example, both `"foo"` and`"baz some"` (implicit concatenation) would be given the `u` kind:<details><summary>Pre 3.12 AST:</summary><p>```pythonConstant(value='foo', kind='u'),FormattedValue(  value=Name(id='bar', ctx=Load()),  conversion=-1),Constant(value='baz some', kind='u')```</p></details>But, post Python 3.12, only the string with the `u` prefix will beassigned the value:<details><summary>Pre 3.12 AST:</summary><p>```pythonConstant(value='foo', kind='u'),FormattedValue(  value=Name(id='bar', ctx=Load()),  conversion=-1),Constant(value='baz some')```</p></details>Here are some more iterations around the change:1. `"foo" f"{bar}" u"baz" "no"`<details><summary>Pre 3.12</summary><p>```pythonConstant(value='foo'),FormattedValue(  value=Name(id='bar', ctx=Load()),  conversion=-1),Constant(value='bazno')```</p></details><details><summary>3.12</summary><p>```pythonConstant(value='foo'),FormattedValue(  value=Name(id='bar', ctx=Load()),  conversion=-1),Constant(value='bazno', kind='u')```</p></details>2. `"foo" f"{bar}" "baz" u"no"`<details><summary>Pre 3.12</summary><p>```pythonConstant(value='foo'),FormattedValue(  value=Name(id='bar', ctx=Load()),  conversion=-1),Constant(value='bazno')```</p></details><details><summary>3.12</summary><p>```pythonConstant(value='foo'),FormattedValue(  value=Name(id='bar', ctx=Load()),  conversion=-1),Constant(value='bazno')```</p></details>3. 
`u"foo" f"bar {baz} realy" u"bar" "no"`<details><summary>Pre 3.12</summary><p>```pythonConstant(value='foobar ', kind='u'),FormattedValue(  value=Name(id='baz', ctx=Load()),  conversion=-1),Constant(value=' realybarno', kind='u')```</p></details><details><summary>3.12</summary><p>```pythonConstant(value='foobar ', kind='u'),FormattedValue(  value=Name(id='baz', ctx=Load()),  conversion=-1),Constant(value=' realybarno')```</p></details>With the hand written parser, we were able to provide better errormessages in case of any errors such as the following but now they allare removed and in those cases an "unexpected token" error will bethrown by lalrpop:* A closing delimiter was not opened properly* An opening delimiter was not closed properly* Empty expression not allowedThe "Too many nested expressions in an f-string" was removed and insteadwe can create a lint rule for that.And, "The f-string expression cannot include the given character" wasremoved because f-strings now support those characters which are mainlysame quotes as the outer ones, escape sequences, comments, etc.1. Refactor existing test cases to use `parse_suite` instead of`parse_fstrings` (doesn't exists anymore)2. Additional test cases are added as requiredUpdated the snapshots. The change from `parse_fstrings` to `parse_suite`means that the snapshot would produce the module node instead of just alist of f-string parts. I've manually verified that the parts are stillthe same along with the node ranges.#7263 (comment)fixes:#7043fixes:#6835
dhruvmanila added a commit to astral-sh/ruff that referenced this pull requestSep 22, 2023
This PR adds support for PEP 701 in the parser to use the new tokensemitted by the lexer to construct the f-string node.Without an official grammar, the f-strings were parsed manually. Nowthat we've the specification, that is being used in the LALRPOP to parsethe f-strings.This file includes the logic for parsing string literals and joining theimplicit string concatenation. Now that we don't require parsingf-strings manually a lot of code involving the same is removed.Earlier, there were 2 entry points to this module:* `parse_string`: Used to parse a single string literal* `parse_strings`: Used to parse strings which were implicitlyconcatenatedNow, there are 3 entry points:* `parse_string_literal`: Renamed from `parse_string`* `parse_fstring_middle`: Used to parse a `FStringMiddle` token which isbasically a string literal without the quotes* `concatenate_strings`: Renamed from `parse_strings` but now it takesthe parsed nodes instead. So, we just need to concatenate them into asingle node.> A short primer on `FStringMiddle` token: This includes the portion oftext inside the f-string that's not part of the expression and isn't anopening or closing brace. For example, in `f"foo {bar:.3f{x}} bar"`, the`foo `, `.3f` and ` bar` are `FStringMiddle` token content.***Discussion in the official implementation:python/cpython#102855 (comment)This change in the AST is when unicode strings (prefixed with `u`) andf-strings are used in an implicitly concatenated string value. Forexample,```pythonu"foo" f"{bar}" "baz" " some"```Pre Python 3.12, the kind field would be assigned only if the prefix wason the first string. So, taking the above example, both `"foo"` and`"baz some"` (implicit concatenation) would be given the `u` kind:<details><summary>Pre 3.12 AST:</summary><p>```pythonConstant(value='foo', kind='u'),FormattedValue(  value=Name(id='bar', ctx=Load()),  conversion=-1),Constant(value='baz some', kind='u')```</p></details>But, post Python 3.12, only the string with the `u` prefix will beassigned the value:<details><summary>Pre 3.12 AST:</summary><p>```pythonConstant(value='foo', kind='u'),FormattedValue(  value=Name(id='bar', ctx=Load()),  conversion=-1),Constant(value='baz some')```</p></details>Here are some more iterations around the change:1. `"foo" f"{bar}" u"baz" "no"`<details><summary>Pre 3.12</summary><p>```pythonConstant(value='foo'),FormattedValue(  value=Name(id='bar', ctx=Load()),  conversion=-1),Constant(value='bazno')```</p></details><details><summary>3.12</summary><p>```pythonConstant(value='foo'),FormattedValue(  value=Name(id='bar', ctx=Load()),  conversion=-1),Constant(value='bazno', kind='u')```</p></details>2. `"foo" f"{bar}" "baz" u"no"`<details><summary>Pre 3.12</summary><p>```pythonConstant(value='foo'),FormattedValue(  value=Name(id='bar', ctx=Load()),  conversion=-1),Constant(value='bazno')```</p></details><details><summary>3.12</summary><p>```pythonConstant(value='foo'),FormattedValue(  value=Name(id='bar', ctx=Load()),  conversion=-1),Constant(value='bazno')```</p></details>3. 
`u"foo" f"bar {baz} realy" u"bar" "no"`<details><summary>Pre 3.12</summary><p>```pythonConstant(value='foobar ', kind='u'),FormattedValue(  value=Name(id='baz', ctx=Load()),  conversion=-1),Constant(value=' realybarno', kind='u')```</p></details><details><summary>3.12</summary><p>```pythonConstant(value='foobar ', kind='u'),FormattedValue(  value=Name(id='baz', ctx=Load()),  conversion=-1),Constant(value=' realybarno')```</p></details>With the hand written parser, we were able to provide better errormessages in case of any errors such as the following but now they allare removed and in those cases an "unexpected token" error will bethrown by lalrpop:* A closing delimiter was not opened properly* An opening delimiter was not closed properly* Empty expression not allowedThe "Too many nested expressions in an f-string" was removed and insteadwe can create a lint rule for that.And, "The f-string expression cannot include the given character" wasremoved because f-strings now support those characters which are mainlysame quotes as the outer ones, escape sequences, comments, etc.1. Refactor existing test cases to use `parse_suite` instead of`parse_fstrings` (doesn't exists anymore)2. Additional test cases are added as requiredUpdated the snapshots. The change from `parse_fstrings` to `parse_suite`means that the snapshot would produce the module node instead of just alist of f-string parts. I've manually verified that the parts are stillthe same along with the node ranges.#7263 (comment)fixes:#7043fixes:#6835
dhruvmanila added a commit to astral-sh/ruff that referenced this pull requestSep 22, 2023
This PR adds support for PEP 701 in the parser to use the new tokensemitted by the lexer to construct the f-string node.Without an official grammar, the f-strings were parsed manually. Nowthat we've the specification, that is being used in the LALRPOP to parsethe f-strings.This file includes the logic for parsing string literals and joining theimplicit string concatenation. Now that we don't require parsingf-strings manually a lot of code involving the same is removed.Earlier, there were 2 entry points to this module:* `parse_string`: Used to parse a single string literal* `parse_strings`: Used to parse strings which were implicitlyconcatenatedNow, there are 3 entry points:* `parse_string_literal`: Renamed from `parse_string`* `parse_fstring_middle`: Used to parse a `FStringMiddle` token which isbasically a string literal without the quotes* `concatenate_strings`: Renamed from `parse_strings` but now it takesthe parsed nodes instead. So, we just need to concatenate them into asingle node.> A short primer on `FStringMiddle` token: This includes the portion oftext inside the f-string that's not part of the expression and isn't anopening or closing brace. For example, in `f"foo {bar:.3f{x}} bar"`, the`foo `, `.3f` and ` bar` are `FStringMiddle` token content.***Discussion in the official implementation:python/cpython#102855 (comment)This change in the AST is when unicode strings (prefixed with `u`) andf-strings are used in an implicitly concatenated string value. Forexample,```pythonu"foo" f"{bar}" "baz" " some"```Pre Python 3.12, the kind field would be assigned only if the prefix wason the first string. So, taking the above example, both `"foo"` and`"baz some"` (implicit concatenation) would be given the `u` kind:<details><summary>Pre 3.12 AST:</summary><p>```pythonConstant(value='foo', kind='u'),FormattedValue(  value=Name(id='bar', ctx=Load()),  conversion=-1),Constant(value='baz some', kind='u')```</p></details>But, post Python 3.12, only the string with the `u` prefix will beassigned the value:<details><summary>Pre 3.12 AST:</summary><p>```pythonConstant(value='foo', kind='u'),FormattedValue(  value=Name(id='bar', ctx=Load()),  conversion=-1),Constant(value='baz some')```</p></details>Here are some more iterations around the change:1. `"foo" f"{bar}" u"baz" "no"`<details><summary>Pre 3.12</summary><p>```pythonConstant(value='foo'),FormattedValue(  value=Name(id='bar', ctx=Load()),  conversion=-1),Constant(value='bazno')```</p></details><details><summary>3.12</summary><p>```pythonConstant(value='foo'),FormattedValue(  value=Name(id='bar', ctx=Load()),  conversion=-1),Constant(value='bazno', kind='u')```</p></details>2. `"foo" f"{bar}" "baz" u"no"`<details><summary>Pre 3.12</summary><p>```pythonConstant(value='foo'),FormattedValue(  value=Name(id='bar', ctx=Load()),  conversion=-1),Constant(value='bazno')```</p></details><details><summary>3.12</summary><p>```pythonConstant(value='foo'),FormattedValue(  value=Name(id='bar', ctx=Load()),  conversion=-1),Constant(value='bazno')```</p></details>3. 
`u"foo" f"bar {baz} realy" u"bar" "no"`<details><summary>Pre 3.12</summary><p>```pythonConstant(value='foobar ', kind='u'),FormattedValue(  value=Name(id='baz', ctx=Load()),  conversion=-1),Constant(value=' realybarno', kind='u')```</p></details><details><summary>3.12</summary><p>```pythonConstant(value='foobar ', kind='u'),FormattedValue(  value=Name(id='baz', ctx=Load()),  conversion=-1),Constant(value=' realybarno')```</p></details>With the hand written parser, we were able to provide better errormessages in case of any errors such as the following but now they allare removed and in those cases an "unexpected token" error will bethrown by lalrpop:* A closing delimiter was not opened properly* An opening delimiter was not closed properly* Empty expression not allowedThe "Too many nested expressions in an f-string" was removed and insteadwe can create a lint rule for that.And, "The f-string expression cannot include the given character" wasremoved because f-strings now support those characters which are mainlysame quotes as the outer ones, escape sequences, comments, etc.1. Refactor existing test cases to use `parse_suite` instead of`parse_fstrings` (doesn't exists anymore)2. Additional test cases are added as requiredUpdated the snapshots. The change from `parse_fstrings` to `parse_suite`means that the snapshot would produce the module node instead of just alist of f-string parts. I've manually verified that the parts are stillthe same along with the node ranges.#7263 (comment)fixes:#7043fixes:#6835
dhruvmanila added a commit to astral-sh/ruff that referenced this pull requestSep 26, 2023
This PR adds support for PEP 701 in the parser to use the new tokensemitted by the lexer to construct the f-string node.Without an official grammar, the f-strings were parsed manually. Nowthat we've the specification, that is being used in the LALRPOP to parsethe f-strings.This file includes the logic for parsing string literals and joining theimplicit string concatenation. Now that we don't require parsingf-strings manually a lot of code involving the same is removed.Earlier, there were 2 entry points to this module:* `parse_string`: Used to parse a single string literal* `parse_strings`: Used to parse strings which were implicitlyconcatenatedNow, there are 3 entry points:* `parse_string_literal`: Renamed from `parse_string`* `parse_fstring_middle`: Used to parse a `FStringMiddle` token which isbasically a string literal without the quotes* `concatenate_strings`: Renamed from `parse_strings` but now it takesthe parsed nodes instead. So, we just need to concatenate them into asingle node.> A short primer on `FStringMiddle` token: This includes the portion oftext inside the f-string that's not part of the expression and isn't anopening or closing brace. For example, in `f"foo {bar:.3f{x}} bar"`, the`foo `, `.3f` and ` bar` are `FStringMiddle` token content.***Discussion in the official implementation:python/cpython#102855 (comment)This change in the AST is when unicode strings (prefixed with `u`) andf-strings are used in an implicitly concatenated string value. Forexample,```pythonu"foo" f"{bar}" "baz" " some"```Pre Python 3.12, the kind field would be assigned only if the prefix wason the first string. So, taking the above example, both `"foo"` and`"baz some"` (implicit concatenation) would be given the `u` kind:<details><summary>Pre 3.12 AST:</summary><p>```pythonConstant(value='foo', kind='u'),FormattedValue(  value=Name(id='bar', ctx=Load()),  conversion=-1),Constant(value='baz some', kind='u')```</p></details>But, post Python 3.12, only the string with the `u` prefix will beassigned the value:<details><summary>Pre 3.12 AST:</summary><p>```pythonConstant(value='foo', kind='u'),FormattedValue(  value=Name(id='bar', ctx=Load()),  conversion=-1),Constant(value='baz some')```</p></details>Here are some more iterations around the change:1. `"foo" f"{bar}" u"baz" "no"`<details><summary>Pre 3.12</summary><p>```pythonConstant(value='foo'),FormattedValue(  value=Name(id='bar', ctx=Load()),  conversion=-1),Constant(value='bazno')```</p></details><details><summary>3.12</summary><p>```pythonConstant(value='foo'),FormattedValue(  value=Name(id='bar', ctx=Load()),  conversion=-1),Constant(value='bazno', kind='u')```</p></details>2. `"foo" f"{bar}" "baz" u"no"`<details><summary>Pre 3.12</summary><p>```pythonConstant(value='foo'),FormattedValue(  value=Name(id='bar', ctx=Load()),  conversion=-1),Constant(value='bazno')```</p></details><details><summary>3.12</summary><p>```pythonConstant(value='foo'),FormattedValue(  value=Name(id='bar', ctx=Load()),  conversion=-1),Constant(value='bazno')```</p></details>3. 
`u"foo" f"bar {baz} realy" u"bar" "no"`<details><summary>Pre 3.12</summary><p>```pythonConstant(value='foobar ', kind='u'),FormattedValue(  value=Name(id='baz', ctx=Load()),  conversion=-1),Constant(value=' realybarno', kind='u')```</p></details><details><summary>3.12</summary><p>```pythonConstant(value='foobar ', kind='u'),FormattedValue(  value=Name(id='baz', ctx=Load()),  conversion=-1),Constant(value=' realybarno')```</p></details>With the hand written parser, we were able to provide better errormessages in case of any errors such as the following but now they allare removed and in those cases an "unexpected token" error will bethrown by lalrpop:* A closing delimiter was not opened properly* An opening delimiter was not closed properly* Empty expression not allowedThe "Too many nested expressions in an f-string" was removed and insteadwe can create a lint rule for that.And, "The f-string expression cannot include the given character" wasremoved because f-strings now support those characters which are mainlysame quotes as the outer ones, escape sequences, comments, etc.1. Refactor existing test cases to use `parse_suite` instead of`parse_fstrings` (doesn't exists anymore)2. Additional test cases are added as requiredUpdated the snapshots. The change from `parse_fstrings` to `parse_suite`means that the snapshot would produce the module node instead of just alist of f-string parts. I've manually verified that the parts are stillthe same along with the node ranges.#7263 (comment)fixes:#7043fixes:#6835
dhruvmanila added a commit to astral-sh/ruff that referenced this pull requestSep 27, 2023
This PR adds support for PEP 701 in the parser to use the new tokensemitted by the lexer to construct the f-string node.Without an official grammar, the f-strings were parsed manually. Nowthat we've the specification, that is being used in the LALRPOP to parsethe f-strings.This file includes the logic for parsing string literals and joining theimplicit string concatenation. Now that we don't require parsingf-strings manually a lot of code involving the same is removed.Earlier, there were 2 entry points to this module:* `parse_string`: Used to parse a single string literal* `parse_strings`: Used to parse strings which were implicitlyconcatenatedNow, there are 3 entry points:* `parse_string_literal`: Renamed from `parse_string`* `parse_fstring_middle`: Used to parse a `FStringMiddle` token which isbasically a string literal without the quotes* `concatenate_strings`: Renamed from `parse_strings` but now it takesthe parsed nodes instead. So, we just need to concatenate them into asingle node.> A short primer on `FStringMiddle` token: This includes the portion oftext inside the f-string that's not part of the expression and isn't anopening or closing brace. For example, in `f"foo {bar:.3f{x}} bar"`, the`foo `, `.3f` and ` bar` are `FStringMiddle` token content.***Discussion in the official implementation:python/cpython#102855 (comment)This change in the AST is when unicode strings (prefixed with `u`) andf-strings are used in an implicitly concatenated string value. Forexample,```pythonu"foo" f"{bar}" "baz" " some"```Pre Python 3.12, the kind field would be assigned only if the prefix wason the first string. So, taking the above example, both `"foo"` and`"baz some"` (implicit concatenation) would be given the `u` kind:<details><summary>Pre 3.12 AST:</summary><p>```pythonConstant(value='foo', kind='u'),FormattedValue(  value=Name(id='bar', ctx=Load()),  conversion=-1),Constant(value='baz some', kind='u')```</p></details>But, post Python 3.12, only the string with the `u` prefix will beassigned the value:<details><summary>Pre 3.12 AST:</summary><p>```pythonConstant(value='foo', kind='u'),FormattedValue(  value=Name(id='bar', ctx=Load()),  conversion=-1),Constant(value='baz some')```</p></details>Here are some more iterations around the change:1. `"foo" f"{bar}" u"baz" "no"`<details><summary>Pre 3.12</summary><p>```pythonConstant(value='foo'),FormattedValue(  value=Name(id='bar', ctx=Load()),  conversion=-1),Constant(value='bazno')```</p></details><details><summary>3.12</summary><p>```pythonConstant(value='foo'),FormattedValue(  value=Name(id='bar', ctx=Load()),  conversion=-1),Constant(value='bazno', kind='u')```</p></details>2. `"foo" f"{bar}" "baz" u"no"`<details><summary>Pre 3.12</summary><p>```pythonConstant(value='foo'),FormattedValue(  value=Name(id='bar', ctx=Load()),  conversion=-1),Constant(value='bazno')```</p></details><details><summary>3.12</summary><p>```pythonConstant(value='foo'),FormattedValue(  value=Name(id='bar', ctx=Load()),  conversion=-1),Constant(value='bazno')```</p></details>3. 
`u"foo" f"bar {baz} realy" u"bar" "no"`<details><summary>Pre 3.12</summary><p>```pythonConstant(value='foobar ', kind='u'),FormattedValue(  value=Name(id='baz', ctx=Load()),  conversion=-1),Constant(value=' realybarno', kind='u')```</p></details><details><summary>3.12</summary><p>```pythonConstant(value='foobar ', kind='u'),FormattedValue(  value=Name(id='baz', ctx=Load()),  conversion=-1),Constant(value=' realybarno')```</p></details>With the hand written parser, we were able to provide better errormessages in case of any errors such as the following but now they allare removed and in those cases an "unexpected token" error will bethrown by lalrpop:* A closing delimiter was not opened properly* An opening delimiter was not closed properly* Empty expression not allowedThe "Too many nested expressions in an f-string" was removed and insteadwe can create a lint rule for that.And, "The f-string expression cannot include the given character" wasremoved because f-strings now support those characters which are mainlysame quotes as the outer ones, escape sequences, comments, etc.1. Refactor existing test cases to use `parse_suite` instead of`parse_fstrings` (doesn't exists anymore)2. Additional test cases are added as requiredUpdated the snapshots. The change from `parse_fstrings` to `parse_suite`means that the snapshot would produce the module node instead of just alist of f-string parts. I've manually verified that the parts are stillthe same along with the node ranges.#7263 (comment)fixes:#7043fixes:#6835
dhruvmanila added a commit to astral-sh/ruff that referenced this pull request Sep 28, 2023
dhruvmanila added a commit to astral-sh/ruff that referenced this pull request Sep 29, 2023
dhruvmanila added a commit to astral-sh/ruff that referenced this pull request Sep 29, 2023
Reviewers

@sunmy2019 requested changes

@Eclips4 left review comments

@lysnikolaou approved these changes

@isidentical approved these changes

Assignees
No one assigned
Labels
None yet
Projects
None yet
Milestone
No milestone
Development

Successfully merging this pull request may close these issues.

6 participants
@pablogsal @sunmy2019 @bedevere-bot @isidentical @lysnikolaou @Eclips4
