This document proposes to lift some of the restrictions originally formulated in PEP 498 and to provide a formalized grammar for f-strings that can be integrated into the parser directly. The proposed syntactic formalization of f-strings will have some small side-effects on how f-strings are parsed and interpreted, allowing for a considerable number of advantages for end users and library developers, while also dramatically reducing the maintenance cost of the code dedicated to parsing f-strings.
When f-strings were originally introduced in PEP 498, the specification was provided without a formal grammar for f-strings. Additionally, the specification contains several restrictions that were imposed so that the parsing of f-strings could be implemented in CPython without modifying the existing lexer. These limitations have been recognized previously and previous attempts have been made to lift them in PEP 536, but none of that work was ever implemented. Some of these limitations (collected originally by PEP 536) are:
- It is impossible to use the quote character delimiting the f-string within the expression portion:

>>> f'Magic wand: {bag['wand']}'
                            ^
SyntaxError: invalid syntax

- A previously considered way around it would lead to escape sequences in executed code and is prohibited in f-strings:

>>> f'Magic wand { bag[\'wand\'] } string'
SyntaxError: f-string expression portion cannot include a backslash

- Comments, using the '#' character, are not allowed in the expression portion, even in multi-line f-strings:

>>> f'''A complex trick: {
... bag['bag']  # recursive bags!
... }'''
SyntaxError: f-string expression part cannot include '#'

- Arbitrary nesting of expressions without expansion of escape sequences is available in many other languages that employ a string interpolation method that uses expressions instead of just variable names. Some examples:

# Ruby
"#{ "#{1+2}" }"

# JavaScript
`${`${1+2}`}`

# Swift
"\("\(1+2)")"

# C#
$"{$"{1+2}"}"

These limitations serve no purpose from a language user perspective and can be lifted by giving f-string literals a regular grammar without exceptions and implementing it using dedicated parse code.
The other issue that f-strings have is that the current implementation in CPython relies on tokenising f-strings as STRING tokens and a post-processing of these tokens, which adds maintenance cost and subtle special cases to the parser (for example, f"{y:=3}" is not an assignment expression, because the top-level : starts the format specification).

A version of this proposal was originally discussed on Python-Dev and presented at the Python Language Summit 2022, where it was enthusiastically received.
By building on top of the new Python PEG Parser (PEP 617), this PEP proposes to redefine “f-strings”, especially emphasizing the clear separation of the string component and the expression (or replacement, {...}) component. PEP 498 summarizes the syntactical part of “f-strings” as the following:
In Python source code, an f-string is a literal string, prefixed with ‘f’, which contains expressions inside braces. The expressions are replaced with their values.
However, PEP 498 also contained a formal list of exclusions on what can or cannot be contained inside the expression component (primarily due to the limitations of the existing parser). By clearly establishing the formal grammar, we now also have the ability to define the expression component of an f-string as truly “any applicable Python expression” (in that particular context) without being bound by the limitations imposed by the details of our implementation.
The formalization effort and the premise above also have a significant benefit for Python programmers, since they simplify and eliminate obscure limitations. This reduces the mental burden and the cognitive complexity of f-string literals (as well as of the Python language in general).
For instance, the following will become valid:

>>> f"These are the things: {", ".join(things)}"

>>> f"{source.removesuffix(".py")}.c: $(srcdir)/{source}"

>>> f"{f"{f"infinite"}"}" + " " + f"{f"nesting!!!"}"
This “feature” is not universally agreed to be desirable, and some users find this unreadable. For a discussion on the different views on this, see the considerations regarding quote reuse section.
Another of the restrictions that the existing parser imposed is that backslashes cannot appear inside the expression part of an f-string:

>>> a = ["hello", "world"]
>>> f"{'\n'.join(a)}"
  File "<stdin>", line 1
    f"{'\n'.join(a)}"
                     ^
SyntaxError: f-string expression part cannot include a backslash
A common work-around for this was to either assign the newline to an intermediate variable or pre-create the whole string prior to creating the f-string:
>>> a = ["hello", "world"]
>>> joined = '\n'.join(a)
>>> f"{joined}"
'hello\nworld'
It only feels natural to allow backslashes in the expression part now that the new PEG parser can easily support it:
>>> a = ["hello", "world"]
>>> f"{'\n'.join(a)}"
'hello\nworld'
Prior to this change, the only way to nest f-strings was to use a different quote type at every level, which limits nesting to four levels:

>>> f"""{f'''{f'{f"{1+1}"}'}'''}"""
'2'
As this PEP allows placing any valid Python expression inside the expression component of f-strings, it is now possible to reuse quotes and therefore it is possible to nest f-strings arbitrarily:
>>> f"{f"{f"{f"{f"{f"{1+1}"}"}"}"}"}"
'2'
Although this is just a consequence of allowing arbitrary expressions, the authors of this PEP do not believe that this is a fundamental benefit and we have decided that the language specification will not explicitly mandate that this nesting can be arbitrary. This is because allowing arbitrarily-deep nesting imposes a lot of extra complexity to the lexer implementation (particularly as lexer/parser pipelines need to allow “untokenizing” to support the ‘f-string debugging expressions’ and this is especially taxing when arbitrary nesting is allowed). Implementations are therefore free to impose a limit on the nesting depth if they need to. Note that this is not an uncommon situation, as the CPython implementation already imposes several limits all over the place, including a limit on the nesting depth of parentheses and brackets, a limit on the nesting of the blocks, a limit on the number of branches in if statements, a limit on the number of expressions in star-unpacking, etc.
The formal proposed PEG grammar specification for f-strings is (see PEP 617 for details on the syntax):
fstring
    | FSTRING_START fstring_middle* FSTRING_END
fstring_middle
    | fstring_replacement_field
    | FSTRING_MIDDLE
fstring_replacement_field
    | '{' (yield_expr | star_expressions) "="? [ "!" NAME ] [ ':' fstring_format_spec* ] '}'
fstring_format_spec:
    | FSTRING_MIDDLE
    | fstring_replacement_field
The new tokens (FSTRING_START, FSTRING_MIDDLE, FSTRING_END) are defined later in this document.
This PEP leaves up to the implementation the level of f-string nesting allowed (f-strings within the expression parts of other f-strings) but specifies a lower bound of 5 levels of nesting. This is to ensure that users can have a reasonable expectation of being able to nest f-strings with “reasonable” depth. This PEP implies that limiting nesting is not part of the language specification but also that the language specification doesn’t mandate arbitrary nesting.
Similarly, this PEP leaves up to the implementation the level of expression nesting in format specifiers but specifies a lower bound of 2 levels of nesting. This means that the following should always be valid:
f"{'':*^{1:{1}}}"
but the following can be valid or not depending on the implementation:
f"{'':*^{1:{1:{1}}}}"
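As an informal illustration (not part of the specification above), this is how the guaranteed two-level form evaluates on an implementation that supports it: the inner {1:{1}} produces the string '1', which then becomes part of the outer format specification.

>>> f"{'':*^{1:{1}}}"   # the format spec becomes '*^1', so '' is centered in a width-1 field
'*'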
The new grammar will preserve the Abstract Syntax Tree (AST) of the current implementation. This means that no semantic changes will be introduced by this PEP on existing code that uses f-strings.
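For illustration, the AST shape being preserved can be inspected with the standard ast module (this is existing behaviour, not new API introduced by this PEP): an f-string parses to a JoinedStr node whose children are Constant (literal text) and FormattedValue (replacement field) nodes.

import ast

# Parse a small f-string and inspect the (unchanged) AST structure.
tree = ast.parse('f"hello {name:>10}"', mode="eval")
print(ast.dump(tree.body, indent=2))
# The dump shows a JoinedStr containing a Constant ('hello ') and a
# FormattedValue for the replacement field, with its format_spec.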
Since Python 3.8, f-strings can be used to debug expressions by using the = operator. For example:
>>> a = 1
>>> f"{1+1=}"
'1+1=2'
These semantics were not introduced formally in a PEP; they were implemented in the current string parser as a special case in bpo-36817 and documented in the f-string lexical analysis section.
This feature is not affected by the changes proposed in this PEP, but it is important to specify that the formal handling of this feature requires the lexer to be able to “untokenize” the expression part of the f-string. This is not a problem for the current string parser as it can operate directly on the string token contents. However, incorporating this feature into a given parser implementation requires the lexer to keep track of the raw string contents of the expression part of the f-string and make them available to the parser when the parse tree is constructed for f-string nodes. A pure “untokenization” is not enough because, as specified currently, f-string debug expressions preserve whitespace in the expression, including spaces after the { and the = characters. This means that the raw string contents of the expression part of the f-string must be kept intact and not just the associated tokens.
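To make the whitespace point concrete, here is a small example of existing behaviour (output from a current CPython): the text of the expression, including the spaces around it, is reproduced verbatim in the result.

>>> x = 10
>>> f"{ x + 1 = }"
' x + 1 = 11'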
How parser/lexer implementations deal with this problem is of course up to the implementation.
Three new tokens are introduced: FSTRING_START, FSTRING_MIDDLE and FSTRING_END. Different lexers may have different implementations that may be more efficient than the ones proposed here given the context of the particular implementation. However, the following definitions will be used as part of the public APIs of CPython (such as the tokenize module) and are also provided as a reference so that the reader can have a better understanding of the proposed grammar changes and how the tokens are used:
- FSTRING_START: This token includes the f-string prefix (f/F/fr) and the opening quote(s).
- FSTRING_MIDDLE: This token includes a portion of text inside the string that’s not part of the expression part and isn’t an opening or closing brace. This can include the text between the opening quote and the first expression brace ({), the text between two expression braces (} and {) and the text between the last expression brace (}) and the closing quote.
- FSTRING_END: This token includes the closing quote.

These tokens are always string parts and they are semantically equivalent to the STRING token with the restrictions specified. These tokens must be produced by the lexer when lexing f-strings. This means that the tokenizer cannot produce a single token for f-strings anymore. How the lexer emits this token is not specified as this will heavily depend on every implementation (even the Python version of the lexer in the standard library is implemented differently to the one used by the PEG parser).
As an example:
f'some words {a+b:.3f} more words {c+d=} final words'
will be tokenized as:
FSTRING_START - "f'"
FSTRING_MIDDLE - 'some words '
LBRACE - '{'
NAME - 'a'
PLUS - '+'
NAME - 'b'
OP - ':'
FSTRING_MIDDLE - '.3f'
RBRACE - '}'
FSTRING_MIDDLE - ' more words '
LBRACE - '{'
NAME - 'c'
PLUS - '+'
NAME - 'd'
OP - '='
RBRACE - '}'
FSTRING_MIDDLE - ' final words'
FSTRING_END - "'"
while f"""some words""" will be tokenized simply as:
FSTRING_START - 'f"""'
FSTRING_MIDDLE - 'some words'
FSTRING_END - '"""'
The tokenize module will be adapted to emit these tokens as described in the previous section when parsing f-strings, so tools can take advantage of this new tokenization schema and avoid having to implement their own f-string tokenizer and parser.
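As an illustration of what this looks like from the user side on an interpreter that implements this PEP (for example CPython 3.12, where these tokens are emitted), the tokenize module can be driven in the usual way and the new token types simply appear in the stream:

import io
import token
import tokenize

code = 'f"hello {name}!"\n'
# generate_tokens() takes a readline callable; with this PEP the f-string is
# no longer a single STRING token but a sequence of FSTRING_START /
# FSTRING_MIDDLE / FSTRING_END tokens with the expression tokens in between.
for tok in tokenize.generate_tokens(io.StringIO(code).readline):
    print(token.tok_name[tok.type], repr(tok.string))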
One way existing lexers can be adapted to emit these tokens is to incorporate a stack of “lexer modes” or to use a stack of different lexers. This is because the lexer needs to switch from “regular Python lexing” to “f-string lexing” when it encounters an f-string start token, and as f-strings can be nested, the context needs to be preserved until the f-string closes. Also, the “lexer mode” inside an f-string expression part needs to behave as a “super-set” of the regular Python lexer (as it needs to be able to switch back to f-string lexing when it encounters the } terminator for the expression part as well as handling f-string formatting and debug expressions). For reference, here is a draft of the algorithm to modify a CPython-like tokenizer to emit these new tokens:

1. When the lexer detects that an f-string is starting (by detecting the letter 'f/F' followed by one of the possible quotes), keep consuming characters until a valid starting quote is detected (one of ", """, ' or ''') and emit a FSTRING_START token with the contents captured (the 'f/F' and the starting quote). Push a new tokenizer mode to the tokenizer mode stack for “F-string tokenization”. Go to step 2.

2. Keep consuming characters until one of the following is encountered:

   - A closing quote equal to the opening quote.
   - If in “format specifier mode” (see step 3), an opening brace ({), a closing brace (}), or a newline token (\n).
   - If not in “format specifier mode” (see step 3), an opening brace ({) or a closing brace (}) that is not immediately followed by another opening/closing brace.

   In all cases, if the character buffer is not empty, emit a FSTRING_MIDDLE token with the contents captured so far but transform any double opening/closing braces into single opening/closing braces. Now, proceed as follows depending on the character encountered:

   - If an opening brace ({) is encountered, emit the corresponding token and go to step 3.
   - If a closing quote matching the opening quote is encountered, emit a FSTRING_END token with the contents captured, pop the current tokenizer mode (corresponding to “F-string tokenization”) and go back to “Regular Python mode”.

3. Push a new tokenizer mode to the tokenizer mode stack that tokenizes as the regular Python tokenizer until a : or a } character is encountered with the same level of nesting as the opening bracket token that was pushed when we entered the f-string part. Using this mode, emit tokens until one of the stop points is reached. When this happens, emit the corresponding token for the stopping character encountered, pop the current tokenizer mode from the tokenizer mode stack and go to step 2. If the stopping point is a : character, enter step 2 in “format specifier” mode.

Of course, as mentioned before, it is not possible to provide a precise specification of how this should be done for an arbitrary tokenizer as it will depend on the specific implementation and nature of the lexer to be changed.
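To make the mode-switching idea more concrete, below is a deliberately simplified, self-contained sketch. It is not the CPython algorithm: it only handles single (non-triple) quotes, assumes there are no nested f-strings, strings or comments inside the expression part, emits the whole expression as one opaque EXPR chunk instead of real Python tokens, and inlines the switch to “regular Python scanning” instead of keeping it on an explicit mode stack.

def toy_fstring_tokens(source):
    # Toy splitter for f-strings of the form f'...' or f"..." (no triple
    # quotes, no nesting, no strings/comments inside the expression part).
    assert source[0] in "fF" and source[1] in "'\""
    quote = source[1]
    yield ("FSTRING_START", source[:2])          # prefix + opening quote
    i, buf = 2, ""
    while i < len(source):
        ch = source[i]
        if ch == quote:                          # closing quote: leave "f-string mode"
            if buf:
                yield ("FSTRING_MIDDLE", buf)
            yield ("FSTRING_END", ch)
            return
        if ch in "{}" and source[i + 1 : i + 2] == ch:
            buf += ch                            # '{{' / '}}' collapse to a single brace
            i += 2
            continue
        if ch == "{":                            # switch to "regular Python" scanning
            if buf:
                yield ("FSTRING_MIDDLE", buf)
                buf = ""
            yield ("LBRACE", "{")
            depth, j = 0, i + 1
            while depth or source[j] not in ":}":    # stop at a top-level ':' or '}'
                if source[j] in "([{":
                    depth += 1
                elif source[j] in ")]}":
                    depth -= 1
                j += 1
            yield ("EXPR", source[i + 1 : j])
            if source[j] == ":":                 # a top-level ':' starts the format spec
                end = source.index("}", j)
                yield ("OP", ":")
                yield ("FSTRING_MIDDLE", source[j + 1 : end])
                j = end
            yield ("RBRACE", "}")
            i = j + 1
            continue
        buf += ch
        i += 1


for tok in toy_fstring_tokens("f'some words {a+b:.3f} more words'"):
    print(tok)

Running it on the example used earlier in this section prints a token stream that mirrors the one shown above, with EXPR standing in for the real NAME/PLUS/NAME tokens.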
All restrictions mentioned in the PEP are lifted from f-string literals, as explained below:
The expression component can now span multiple lines, with or without parentheses:

>>> x = 1
>>> f"___{
...     x
... }___"
'___1___'

>>> f"___{(
...     x
... )}___"
'___1___'
Comments, using the # character, are allowed within the expression part of an f-string. Note that comments require that the closing bracket (}) of the expression part be present on a different line from the one the comment is on, or otherwise it will be ignored as part of the comment.

One of the consequences of the grammar proposed here is that, as mentioned above, f-string expressions can now contain strings delimited with the same kind of quote that is used to delimit the external f-string literal. For example:
>>> f" something { my_dict["key"] } something else "
In the discussion thread for this PEP, several concerns have been raised regarding this aspect and we want to collect them here, as these should be taken into consideration when accepting or rejecting this PEP.
Some of these objections include:
Here are some of the arguments in favour:
One of these arguments appeals to substitution: if we call f(x+1) then, assuming a is a brand new variable, it should behave the same as a = x+1; f(a). And vice versa. So if we have:

def py2c(source):
    prefix = source.removesuffix(".py")
    return f"{prefix}.c"
It should be expected that if we replace the variable prefix with its definition, the answer should be the same:
def py2c(source):
    return f"{source.removesuffix(".py")}.c"
To gather feedback from the community, a poll has been initiated to get a sense of how the community feels about this aspect of the PEP.
This PEP does not introduce any backwards incompatible syntactic or semantic changes to the Python language. However, the tokenize module (a quasi-public part of the standard library) will need to be updated to support the new f-string tokens (to allow tool authors to correctly tokenize f-strings). See changes to the tokenize module for more details regarding how the public API of tokenize will be affected.
As the concept of f-strings is already ubiquitous in the Python community, there is no fundamental need for users to learn anything new. However, as the formalized grammar allows some new possibilities, it is important that the formal grammar is added to the documentation and explained in detail, explicitly mentioning what constructs are possible, since this PEP is aiming to avoid confusion.
It is also beneficial to provide users with a simple framework for understanding what can be placed inside an f-string expression. In this case the authors think that this work will make it even simpler to explain this aspect of the language, since it can be summarized as:
You can place any valid Python expression inside an f-string expression.
With the changes in this PEP, there is no need to clarify that string quotes are limited to be different from the quotes of the enclosing string, because this is now allowed: as an arbitrary Python string can contain any possible choice of quotes, so can any f-string expression. Additionally there is no need to clarify that certain things are not allowed in the expression part because of implementation restrictions such as comments, new line characters or backslashes.
The only “surprising” difference is that, as f-strings allow specifying a format, expressions that contain a : character at the top level still need to be enclosed in parentheses. This is not new to this work, but it is important to emphasize that this restriction is still in place. This allows for an easier modification of the summary:
You can place any valid Python expression inside an f-string expression, and everything after a : character at the top level will be identified as a format specification.
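As a small illustration of that rule (outputs from a current CPython), everything after the first top-level : is handed to the formatting machinery:

>>> f"{3.14159:.2f}"                 # '.2f' is the format specification
'3.14'
>>> f"{'a' if True else 'b':>5}"     # the expression ends at the top-level ':'
'    a'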
A reference implementation can be found in the implementation fork.
We have decided to keep the restriction that certain expression portions need to wrap ':' and '!' in parentheses at the top level, e.g.:

>>> f'Useless use of lambdas: { lambda x: x*2 }'
SyntaxError: unexpected EOF while parsing
The reason is that this would introduce a considerable amount of complexity for no real benefit. This is due to the fact that the : character normally separates the f-string format specification. This format specification is currently tokenized as a string. As the tokenizer MUST tokenize what’s on the right of the : as either a string or a stream of tokens, this won’t allow the parser to differentiate between the different semantics, as that would require the tokenizer to backtrack and produce a different set of tokens (that is, first try as a stream of tokens, and if it fails, try as a string for a format specifier).
As there is no fundamental advantage in being able to allow lambdas and similar expressions at the top level, we have decided to keep the restriction that these must be parenthesized if needed:
>>> f'Useless use of lambdas: { (lambda x: x*2) }'
It has also been proposed to allow escaping braces with a backslash (\{ and \}) in addition to the {{ and }} syntax. Although the authors of the PEP believe that allowing escaped braces is a good idea, we have decided to not include it in this PEP, as it is not strictly necessary for the formalization of f-strings proposed here, and it can be added independently in a regular CPython issue.
This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.