
PEP 501 – General purpose template literal strings

Author: Alyssa Coghlan <ncoghlan at gmail.com>, Nick Humrich <nick at humrich.us>
Discussions-To:


Important

This PEP has been superseded by PEP 750.

Abstract

Though easy and elegant to use, Python f-strings can be vulnerable to injection attacks when used to construct shell commands, SQL queries, HTML snippets and similar (for example, os.system(f"echo {message_from_user}")). This PEP introduces template literal strings (or “t-strings”), which have syntax and semantics that are similar to f-strings, but with rendering deferred until format() or another template rendering function is called on them. This will allow standard library calls, helper functions and third party tools to safely and intelligently perform appropriate escaping and other string processing on inputs while retaining the usability and convenience of f-strings.

PEP Withdrawal

When PEP 750 was first published as a “tagged strings” proposal (allowing for arbitrary string prefixes), this PEP was kept open to continue championing the simpler “template literal” approach that used a single dedicated string prefix to produce instances of a new “interpolation template” type.

The October 2024 updates to PEP 750 agreed that template strings were a better fit for Python than the broader tagged strings concept.

All of the other concerns the authors of this PEP had with PEP 750 were also either addressed in those updates, or else left in a state where they could reasonably be addressed in a future change proposal.

Due to the clear improvements in the updated PEP 750 proposal, this PEP has been withdrawn in favour of PEP 750.

Important

The remainder of this PEP still reflects the state of the tagged strings proposal in August 2024. It has not been updated to reflect the October 2024 changes to PEP 750, since the PEP withdrawal makes doing so redundant.

Relationship with other PEPs

This PEP is inspired by and builds on top of the f-string syntax first implemented in PEP 498 and formalised in PEP 701.

This PEP complements the literal string typing support added to Python’s formal type system in PEP 675 by introducing a safe way to do dynamic interpolation of runtime values into security sensitive strings.

This PEP competes with some aspects of the tagged string proposal in PEP 750 (most notably in whether template rendering is expressed as render(t"template literal") or as render"template literal"), but also shares many common features (after PEP 750 was published, this PEP was updated with several new changes inspired by the tagged strings proposal).

This PEP does NOT propose an alternative to PEP 292 for user interface internationalization use cases (but does note the potential for future syntactic enhancements aimed at that use case that would benefit from the compiler-supported value interpolation machinery that this PEP and PEP 750 introduce).

Motivation

PEP 498 added new syntactic support for string interpolation that is transparent to the compiler, allowing name references from the interpolation operation full access to containing namespaces (as with any other expression), rather than being limited to explicit name references. These are referred to in the PEP (and elsewhere) as “f-strings” (a mnemonic for “formatted strings”).

Since acceptance of PEP 498, f-strings have become well-established and very popular. f-strings became even more useful and flexible with the formalised grammar in PEP 701. While f-strings are great, eager rendering has its limitations. For example, the eagerness of f-strings has made code like the following unfortunately plausible:

os.system(f"echo {message_from_user}")

This kind of code is superficially elegant, but poses a significant problem if the interpolated value message_from_user is in fact provided by an untrusted user: it’s an opening for a form of code injection attack, where the supplied user data has not been properly escaped before being passed to the os.system call.
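To make the risk concrete, the following sketch splits two versions of the command into shell-style tokens with the standard shlex module instead of running them. The untrusted value is a hypothetical example and shell_tokens() is an illustrative helper. Escaping the value with shlex.quote() keeps it as one argument, while the unescaped version lets ';' start a second command.

```python
import shlex


def shell_tokens(command: str) -> list[str]:
    """Split a command roughly the way a POSIX shell would, keeping ';' as its own token."""
    return list(shlex.shlex(command, posix=True, punctuation_chars=True))


# Hypothetical untrusted input that smuggles in a second command
message_from_user = "hello; rm -rf ~"

unsafe_command = f"echo {message_from_user}"             # eager f-string, no escaping
safe_command = f"echo {shlex.quote(message_from_user)}"  # value escaped before interpolation

print(shell_tokens(unsafe_command))  # ['echo', 'hello', ';', 'rm', '-rf', '~']
print(shell_tokens(safe_command))    # ['echo', 'hello; rm -rf ~']
```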

While the LiteralString type annotation introduced in PEP 675 means that type checkers are able to report a type error for this kind of unsafe function usage, those errors don’t help make it easier to write code that uses safer alternatives (such as subprocess.run()).

To address that problem (and a number of other concerns), this PEP proposes the complementary introduction of “t-strings” (a mnemonic for “template literal strings”), where format(t"Message with {data}") would produce the same result as f"Message with {data}", but the template literal instance can instead be passed to other template rendering functions which process the contents of the template differently.

Proposal

Dedicated template literal syntax

This PEP proposes a new string prefix that declares the string to be a template literal rather than an ordinary string:

template = t"Substitute {names:>{field_width}} and {expressions()!r} at runtime"

This would be effectively interpreted as:

template = TemplateLiteral(
    r"Substitute {names:>{field_width}} and {expressions()!r} at runtime",
    TemplateLiteralText(r"Substitute "),
    TemplateLiteralField("names", names, f">{field_width}", ""),
    TemplateLiteralText(r" and "),
    TemplateLiteralField("expressions()", expressions(), f"", "r"),
    TemplateLiteralText(r" at runtime"),
)

(Note: this is an illustrative example implementation. The exact compile time construction syntax of types.TemplateLiteral is considered an implementation detail not specified by the PEP. In particular, the compiler may bypass the default constructor’s runtime logic that detects consecutive text segments and merges them into a single text segment, as well as checking the runtime types of all supplied arguments.)

The __format__ method on types.TemplateLiteral would then implement the following str.format() inspired semantics:

>>> import datetime
>>> name = 'Jane'
>>> age = 50
>>> anniversary = datetime.date(1991, 10, 12)
>>> format(t'My name is {name}, my age next year is {age+1}, my anniversary is {anniversary:%A, %B %d, %Y}.')
'My name is Jane, my age next year is 51, my anniversary is Saturday, October 12, 1991.'
>>> format(t'She said her name is {name!r}.')
"She said her name is 'Jane'."
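Because format(t"...") is intended to produce the same result as the equivalent f-string, the expected outputs shown above can be cross-checked with plain f-strings. This only exercises the eager half of the comparison, since the t-string form proposed here cannot be run.

```python
import datetime

name = 'Jane'
age = 50
anniversary = datetime.date(1991, 10, 12)

# Eager f-string counterparts of the two t-string examples above
first = f'My name is {name}, my age next year is {age+1}, my anniversary is {anniversary:%A, %B %d, %Y}.'
second = f'She said her name is {name!r}.'
print(first)
print(second)
```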

The syntax of template literals would be based on PEP 701, and largely use the same syntax for the string portion of the template. Aside from using a different prefix, the one other syntactic change is in the definition and handling of conversion specifiers, both to allow !() as a standard conversion specifier to request evaluation of a field at rendering time, and to allow custom renderers to also define custom conversion specifiers.

This PEP does not propose to remove or deprecate any of the existing string formatting mechanisms, as those will remain valuable when formatting strings that are not present directly in the source code of the application.

Lazy field evaluation conversion specifier

In addition to the existing support for the a, r, and s conversion specifiers, str.format(), str.format_map(), and string.Formatter will be updated to accept () as a conversion specifier that means “call the interpolated value”.

To support application of the standard conversion specifiers in custom template rendering functions, a new operator.convert_field() function will be added.

The signature and behaviour of the format() builtin will also be updated to accept a conversion specifier as a third optional parameter. If a non-empty conversion specifier is given, the value will be converted with operator.convert_field() before looking up the __format__ method.

Custom conversion specifiers

To allow additional field-specific directives to be passed to custom rendering functions in a way that still allows formatting of the template with the default renderer, the conversion specifier field will be allowed to contain a second ! character.

operator.convert_field() and format() (and hence the default TemplateLiteral.render template rendering method) will ignore that character and any subsequent text in the conversion specifier field.

str.format(), str.format_map(), and string.Formatter will also be updated to accept (and ignore) custom conversion specifiers.

Template renderer for POSIX shell commands

As both a practical demonstration of the benefits of delayed rendering support, and as a valuable feature in its own right, a new sh template renderer will be added to the shlex module. This renderer will produce strings where all interpolated fields are escaped with shlex.quote().

The subprocess.Popen API (and higher level APIs that depend on it, such as subprocess.run()) will be updated to accept interpolation templates and handle them in accordance with the new shlex.sh renderer.
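The renderer's code is not spelled out here, so the following is only a sketch of the idea under stated assumptions. sh() is a standalone function, not the proposed shlex.sh. Field is a hypothetical stand-in for TemplateLiteralField, needed because the proposed t-string syntax cannot be run. Literal template text is passed through as written, while each field is formatted and then escaped with shlex.quote().

```python
import shlex
from collections.abc import Sequence
from typing import Any, NamedTuple, Union


class Field(NamedTuple):
    """Hypothetical stand-in for TemplateLiteralField (conversion specifiers omitted)."""
    expr: str
    value: Any
    format_spec: str = ""


def sh(segments: Sequence[Union[str, Field]]) -> str:
    """Render template segments into a shell command, quoting every interpolated field."""
    rendered = []
    for segment in segments:
        if isinstance(segment, Field):
            # Apply the format specifier first, then quote the result as a single shell word
            rendered.append(shlex.quote(format(segment.value, segment.format_spec)))
        else:
            # Literal template text comes from the source code and is used as written
            rendered.append(segment)
    return "".join(rendered)


message_from_user = "hello; rm -rf ~"
# Hand-built equivalent of the segments of t"echo {message_from_user}"
command = sh(["echo ", Field("message_from_user", message_from_user)])
print(command)  # echo 'hello; rm -rf ~'
```

On a POSIX shell, the quoted field stays a single argument to echo, so the user data cannot change the structure of the command.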

Background

This PEP was initially proposed as a competitor to PEP 498. After it became clear that the eager rendering proposal had substantially more immediate support, it then spent several years in a deferred state, pending further experience with PEP 498’s simpler approach of only supporting eager rendering without the additional complexity of also supporting deferred rendering.

Since then, f-strings have become very popular and PEP 701 was introduced to tidy up some rough edges and limitations in their syntax and semantics. The template literal proposal was updated in 2023 to reflect current knowledge of f-strings, and improvements from PEP 701.

In 2024, PEP 750 was published, proposing a general purpose mechanism for custom tagged string prefixes, rather than the narrower template literal proposal in this PEP. This PEP was again updated, both to incorporate new ideas inspired by the tagged strings proposal, and to describe the perceived benefits of the narrower template literal syntax proposal in this PEP over the more general tagged string proposal.

Summary of differences from f-strings

The key differences between f-strings and t-strings are:

the t (template literal) prefix indicates delayed rendering, but otherwise largely uses the same syntax and semantics as formatted strings
template literals are available at runtime as a new kind of object (types.TemplateLiteral)
the default rendering used by formatted strings is invoked on a template literal object by calling format(template) rather than being done implicitly in the compiled code
unlike f-strings (where conversion specifiers are handled directly in the compiler), t-string conversion specifiers are handled at rendering time by the rendering function
the new !() conversion specifier indicates that the field expression is a callable that should be called when using the default format() rendering function. This specifier is specifically not being added to f-strings (since it is pointless there).
a second ! is allowed in t-string conversion specifiers (with any subsequent text being ignored) as a way to allow custom template rendering functions to accept custom conversion specifiers without breaking the default TemplateLiteral.render() rendering method. This feature is specifically not being added to f-strings (since it is pointless there).
while f-string f"Message {here}" would be semantically equivalent to format(t"Message {here}"), f-strings will continue to be supported directly in the compiler and hence avoid the runtime overhead of actually using the delayed rendering machinery that is needed for t-strings

Summary of differences from tagged strings

When tagged strings were first proposed, there were several notable differences from the proposal in PEP 501 beyond the surface syntax difference between whether rendering function invocations are written as render(t"template literal") or as render"template literal".

Over the course of the initial PEP 750 discussion, many of those differences were eliminated, either by PEP 501 adopting that aspect of PEP 750’s proposal (such as lazily applying conversion specifiers), or by PEP 750 changing to retain some aspect of PEP 501’s proposal (such as defining a dedicated type to hold template segments rather than representing them as simple sequences).

The main remaining significant difference is that this PEP argues that adding only the t-string prefix is a sufficient enhancement to give all the desired benefits described in PEP 750. The expansion to a generalised “tagged string” syntax isn’t necessary, and causes additional problems that can be avoided.

The two PEPs also differ in their proposed approaches to handling lazy evaluation of template fields.

While there are other differences between the two proposals, those differences are more cosmetic than substantive. In particular:

this PEP proposes different names for the structural typing protocols
this PEP proposes specific names for the concrete implementation types
this PEP proposes exact details for the proposed APIs of the concrete implementation types (including concatenation and repetition support, which are not part of the structural typing protocols)
this PEP proposes changes to the existing format() builtin to make it usable directly as a template field renderer

The two PEPs also differ in how they make their case for delayed rendering support. This PEP focuses more on the concrete implementation concept of using template literals to allow the “interpolation” and “rendering” steps in f-string processing to be separated in time, and then taking advantage of that to reduce the potential code injection risks associated with misuse of f-strings. PEP 750 focuses more on the way that native templating support allows behaviours that are difficult or impossible to achieve via existing string based templating methods. As with the cosmetic differences noted above, this is more a difference in style than a difference in substance.

Rationale

f-strings (PEP 498) made interpolating values into strings with full access to Python’s lexical namespace semantics simpler, but they did so at the cost of creating a situation where interpolating values into sensitive targets like SQL queries, shell commands and HTML templates will enjoy a much cleaner syntax when handled without regard for code injection attacks than when they are handled correctly.

This PEP proposes to provide the option of delaying the actual rendering of a template literal to a formatted string until its __format__ method is called, allowing the use of other template renderers by passing the template around as a first class object.

While very different in the technical details, the types.TemplateLiteral interface proposed in this PEP is conceptually quite similar to the FormattableString type underlying the native interpolation support introduced in C# 6.0, as well as the JavaScript template literals introduced in ES6.

While not the original motivation for developing the proposal, many of the benefits for defining domain specific languages described in PEP 750 also apply to this PEP (including the potential for per-DSL semantic highlighting in code editors based on the type specifications of declared template variables and rendering function parameters).

Specification

This PEP proposes a new t string prefix that results in the creation of an instance of a new type, types.TemplateLiteral.

Template literals are Unicode strings (bytes literals are not permitted), and string literal concatenation operates as normal, with the entire combined literal forming the template literal.

The template string is parsed into literals, expressions, format specifiers, and conversion specifiers as described for f-strings in PEP 498 and PEP 701. The syntax for conversion specifiers is relaxed such that arbitrary strings are accepted (excluding those containing {, } or :) rather than being restricted to valid Python identifiers.

However, rather than being rendered directly into a formatted string, these components are instead organised into instances of new types with the following behaviour:

from __future__ import annotations

import operator
from collections.abc import Iterable, Sequence
from typing import Any, NamedTuple

class TemplateLiteralText(str):
    # This is a renamed and extended version of the DecodedConcrete type in PEP 750
    # Real type would be implemented in C, this is an API compatible Python equivalent
    _raw: str

    def __new__(cls, raw: str):
        decoded = raw.encode("utf-8").decode("unicode-escape")
        if decoded == raw:
            decoded = raw
        text = super().__new__(cls, decoded)
        text._raw = raw
        return text

    @staticmethod
    def merge(text_segments: Sequence[TemplateLiteralText]) -> TemplateLiteralText:
        if len(text_segments) == 1:
            return text_segments[0]
        return TemplateLiteralText("".join(t._raw for t in text_segments))

    @property
    def raw(self) -> str:
        return self._raw

    def __repr__(self) -> str:
        return f"{type(self).__name__}(r{self._raw!r})"

    def __add__(self, other: Any) -> TemplateLiteralText | NotImplemented:
        if isinstance(other, TemplateLiteralText):
            return TemplateLiteralText(self._raw + other._raw)
        return NotImplemented

    def __mul__(self, other: Any) -> TemplateLiteralText | NotImplemented:
        try:
            factor = operator.index(other)
        except TypeError:
            return NotImplemented
        return TemplateLiteralText(self._raw * factor)
    __rmul__ = __mul__

class TemplateLiteralField(NamedTuple):
    # This is mostly a renamed version of the InterpolationConcrete type in PEP 750
    # However:
    #    - value is eagerly evaluated (values were all originally lazy in PEP 750)
    #    - conversion specifiers are allowed to be arbitrary strings
    #    - order of fields is adjusted so the text form is the first field and the
    #      remaining parameters match the updated signature of the `format()` builtin
    # Real type would be implemented in C, this is an API compatible Python equivalent

    expr: str
    value: Any
    format_spec: str | None = None
    conversion_spec: str | None = None

    def __repr__(self) -> str:
        return (f"{type(self).__name__}({self.expr!r}, {self.value!r}, "
                f"{self.format_spec!r}, {self.conversion_spec!r})")

    def __str__(self) -> str:
        return format(self.value, self.format_spec, self.conversion_spec)

    def __format__(self, format_override) -> str:
        if format_override:
            format_spec = format_override
        else:
            format_spec = self.format_spec
        return format(self.value, format_spec, self.conversion_spec)

class TemplateLiteral:
    # This type corresponds to the TemplateConcrete type in PEP 750
    # Real type would be implemented in C, this is an API compatible Python equivalent
    _raw_template: str
    _segments: tuple[TemplateLiteralText | TemplateLiteralField, ...]

    def __new__(cls, raw_template: str, *segments: TemplateLiteralText | TemplateLiteralField):
        self = super().__new__(cls)
        self._raw_template = raw_template
        # Check if there are any adjacent text segments that need merging
        # or any empty text segments that need discarding
        type_err = "Template literal segments must be template literal text or field instances"
        text_expected = True
        needs_merge = False
        for segment in segments:
            match segment:
                case TemplateLiteralText():
                    if not text_expected or not segment:
                        needs_merge = True
                        break
                    text_expected = False
                case TemplateLiteralField():
                    text_expected = True
                case _:
                    raise TypeError(type_err)
        if not needs_merge:
            # Match loop above will have checked all segments
            self._segments = segments
            return self
        # Merge consecutive runs of text fields and drop any empty text fields
        merged_segments: list[TemplateLiteralText | TemplateLiteralField] = []
        pending_merge: list[TemplateLiteralText] = []
        for segment in segments:
            match segment:
                case TemplateLiteralText() as text_segment:
                    if text_segment:
                        pending_merge.append(text_segment)
                case TemplateLiteralField():
                    if pending_merge:
                        merged_segments.append(TemplateLiteralText.merge(pending_merge))
                        pending_merge.clear()
                    merged_segments.append(segment)
                case _:
                    # First loop above may not check all segments when a merge is needed
                    raise TypeError(type_err)
        if pending_merge:
            merged_segments.append(TemplateLiteralText.merge(pending_merge))
            pending_merge.clear()
        self._segments = tuple(merged_segments)
        return self

    @property
    def raw_template(self) -> str:
        return self._raw_template

    @property
    def segments(self) -> tuple[TemplateLiteralText | TemplateLiteralField, ...]:
        return self._segments

    def __len__(self) -> int:
        return len(self._segments)

    def __iter__(self) -> Iterable[TemplateLiteralText | TemplateLiteralField]:
        return iter(self._segments)

    # Note: template literals do NOT define any relative ordering
    def __eq__(self, other):
        if not isinstance(other, TemplateLiteral):
            return NotImplemented
        # Field values, format specifiers and conversion specifiers are all
        # compared as part of the segments
        return (
            self._raw_template == other._raw_template
            and self._segments == other._segments
        )

    def __repr__(self) -> str:
        return (f"{type(self).__name__}(r{self._raw_template!r}, "
                f"{', '.join(map(repr, self._segments))})")

    def __format__(self, format_specifier) -> str:
        # When formatted, render to a string, and then use string formatting
        return format(self.render(), format_specifier)

    def render(self, *, render_template=''.join, render_text=str, render_field=format):
        ...  # See definition of the template rendering semantics below

    def __add__(self, other) -> TemplateLiteral | NotImplemented:
        if isinstance(other, TemplateLiteral):
            combined_raw_text = self._raw_template + other._raw_template
            combined_segments = self._segments + other._segments
            return TemplateLiteral(combined_raw_text, *combined_segments)
        if isinstance(other, str):
            # Treat the given string as a new raw text segment
            combined_raw_text = self._raw_template + other
            combined_segments = self._segments + (TemplateLiteralText(other),)
            return TemplateLiteral(combined_raw_text, *combined_segments)
        return NotImplemented

    def __radd__(self, other) -> TemplateLiteral | NotImplemented:
        if isinstance(other, str):
            # Treat the given string as a new raw text segment. This effectively
            # has precedence over string concatenation in CPython due to
            # https://github.com/python/cpython/issues/55686
            combined_raw_text = other + self._raw_template
            combined_segments = (TemplateLiteralText(other),) + self._segments
            return TemplateLiteral(combined_raw_text, *combined_segments)
        return NotImplemented

    def __mul__(self, other) -> TemplateLiteral | NotImplemented:
        try:
            factor = operator.index(other)
        except TypeError:
            return NotImplemented
        if not self or factor == 1:
            return self
        if factor < 1:
            return TemplateLiteral("")
        repeated_text = self._raw_template * factor
        repeated_segments = self._segments * factor
        return TemplateLiteral(repeated_text, *repeated_segments)
    __rmul__ = __mul__

(Note: this is an illustrative example implementation; the exact compile time construction method and internal data management details of types.TemplateLiteral are considered an implementation detail not specified by the PEP. However, the expected post-construction behaviour of the public APIs on types.TemplateLiteral instances is specified by the above code, as is the constructor signature for building template instances at runtime.)

The result of a template literal expression is an instance of this type, rather than an already rendered string. Rendering only takes place when the instance’s render method is called (either directly, or indirectly via __format__).

The compiler will pass the following details to the template literal for later use:

a string containing the raw template as written in the source code
a sequence of template segments, with each segment being either:
- a literal text segment (a regular Python string that also provides access to its raw form)
- a parsed template interpolation field, specifying the text of the interpolated expression (as a regular string), its evaluated result, the format specifier text (with any substitution fields eagerly evaluated as an f-string), and the conversion specifier text (as a regular string)

The raw template is just the template literal as a string. By default, it is used to provide a human-readable representation for the template literal, but template renderers may also use it for other purposes (e.g. as a cache lookup key).

The parsed template structure is taken from PEP 750 and consists of a sequence of template segments corresponding to the text segments and interpolation fields in the template string.

This approach is designed to allow compilers to fully process each segment of the template in order, before finally emitting code to pass all of the template segments to the template literal constructor.

For example, assuming the following runtime values:

names = ["Alice", "Bob", "Carol", "Eve"]
field_width = 10
def expressions():
    return 42

The template from the proposal section would be represented at runtime as:

TemplateLiteral(
    r"Substitute {names:>{field_width}} and {expressions()!r} at runtime",
    TemplateLiteralText(r"Substitute "),
    TemplateLiteralField("names", ["Alice", "Bob", "Carol", "Eve"], ">10", ""),
    TemplateLiteralText(r" and "),
    TemplateLiteralField("expressions()", 42, "", "r"),
    TemplateLiteralText(r" at runtime"),
)

Rendering templates

The TemplateLiteral.render implementation defines the rendering process in terms of the following renderers:

an overall render_template operation that defines how the sequence of rendered text and field segments are composed into a fully rendered result. The default template renderer is string concatenation using ''.join.
a per text segment render_text operation that receives the individual literal text segments within the template. The default text renderer is the builtin str constructor.
a per field segment render_field operation that receives the field value, format specifier, and conversion specifier for substitution fields within the template. The default field renderer is the format() builtin.

Given the parsed template representation above, the semantics of template rendering would then be equivalent to the following:

def render(self, *, render_template=''.join, render_text=str, render_field=format):
    rendered_segments = []
    for segment in self._segments:
        match segment:
            case TemplateLiteralText() as text_segment:
                rendered_segments.append(render_text(text_segment))
            case TemplateLiteralField() as field_segment:
                # field_segment[1:] is (value, format_spec, conversion_spec)
                rendered_segments.append(render_field(*field_segment[1:]))
    return render_template(rendered_segments)
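The same loop can be tried out with plain data. In this sketch the segments are str objects plus a hypothetical Field tuple, standing in for TemplateLiteralText and TemplateLiteralField. The default field renderer ignores conversion specifiers, because the current format() builtin has no third parameter. Passing list as render_template shows that the overall result does not have to be a string.

```python
from typing import Any, NamedTuple


class Field(NamedTuple):
    """Hypothetical stand-in for TemplateLiteralField, using the same field order."""
    expr: str
    value: Any
    format_spec: str = ""
    conversion_spec: str = ""


def default_render_field(value, format_spec, conversion_spec):
    # Simplification: the current format() has no conversion parameter, so it is ignored here
    return format(value, format_spec)


def render(segments, *, render_template="".join, render_text=str,
           render_field=default_render_field):
    """Standalone version of the rendering loop, operating on a plain list of segments."""
    rendered_segments = []
    for segment in segments:
        if isinstance(segment, Field):
            # segment[1:] is (value, format_spec, conversion_spec)
            rendered_segments.append(render_field(*segment[1:]))
        else:
            rendered_segments.append(render_text(segment))
    return render_template(rendered_segments)


segments = [
    "Substitute ", Field("name", "Alice", ">10"),
    " and ", Field("expressions()", 42),
    " at runtime",
]
print(render(segments))                        # one concatenated string
print(render(segments, render_template=list))  # a list of rendered pieces instead
```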

Format specifiers

The syntax and processing of format specifiers in t-strings is defined to be the same as it is for f-strings.

This includes allowing format specifiers to themselves contain f-string substitution fields. The raw text of the format specifiers (without processing any substitution fields) is retained as part of the full raw template string.

The parsed format specifiers receive the format specifier string with those substitutions already resolved. The : prefix is also omitted.

Aside from separating them out from the substitution expression during parsing, format specifiers are otherwise treated as opaque strings by the interpolation template parser; assigning semantics to those (or, alternatively, prohibiting their use) is handled at rendering time by the field renderer.
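An ordinary nested f-string specifier shows the difference between the raw and parsed format specifier; the names here are illustrative. The last part shows what "opaque" means in practice. Nothing checks the specifier until a field renderer applies it, and at that point the builtin format() rejects a non-empty specifier for a list.

```python
field_width = 10

# The raw template keeps the nested substitution field unprocessed...
raw_format_spec = ">{field_width}"
# ...while the parsed field receives it already resolved, without the ":" prefix
parsed_format_spec = f">{field_width}"

print(raw_format_spec)                      # >{field_width}
print(parsed_format_spec)                   # >10
print(format("Alice", parsed_format_spec))  # Alice right-aligned in 10 columns

# Nothing validates the specifier until a field renderer applies it:
# object.__format__ (used by list) rejects any non-empty specifier
rejected_at_render_time = False
try:
    format(["Alice", "Bob"], parsed_format_spec)
except TypeError:
    rejected_at_render_time = True
print(rejected_at_render_time)              # True
```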

Conversion specifiers

In addition to the existing support for a, r, and s conversion specifiers, str.format() and str.format_map() will be updated to accept () as a conversion specifier that means “call the interpolated value”.

Where PEP 701 restricts conversion specifiers to NAME tokens, this PEP will instead allow FSTRING_MIDDLE tokens (such that only {, } and : are disallowed). This change is made primarily to support lazy field rendering with the !() conversion specifier, but also allows custom rendering functions more flexibility when defining their own conversion specifiers in preference to those defined for the default format() field renderer.

Conversion specifiers are still handled as plain strings, and do NOT support the use of substitution fields.

The parsed conversion specifiers receive the conversion specifier string with the ! prefix omitted.

To allow custom template renderers to define their own custom conversion specifiers without causing the default renderer to fail, conversion specifiers will be permitted to contain a custom suffix prefixed with a second ! character. That is, !!<custom>, !a!<custom>, !r!<custom>, !s!<custom>, and !()!<custom> would all be valid conversion specifiers in a template literal.

As described above, the default rendering supports the original !a, !r and !s conversion specifiers defined in PEP 3101, together with the new !() lazy field evaluation conversion specifier defined in this PEP. The default rendering ignores any custom conversion specifier suffixes.
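A custom renderer can separate the standard part of a parsed conversion specifier from its custom suffix with str.partition("!"). This is the same approach the proposed operator.convert_field() uses to ignore suffixes. The upper suffix and both helper functions below are hypothetical, and only the empty and r standard conversions are handled to keep the sketch short.

```python
def split_conversion_spec(conversion_spec: str) -> tuple[str, str]:
    """Split a parsed conversion spec (leading '!' removed) into (standard, custom) parts."""
    std_spec, _, custom_spec = conversion_spec.partition("!")
    return std_spec, custom_spec


def render_field_with_upper(value, format_spec="", conversion_spec=""):
    """Hypothetical field renderer that understands a custom 'upper' suffix."""
    std_spec, custom_spec = split_conversion_spec(conversion_spec)
    if std_spec == "r":
        value = repr(value)
    elif std_spec != "":
        # Only '' and 'r' are handled in this sketch
        raise ValueError(f"unsupported conversion {std_spec!r}")
    text = format(value, format_spec)
    return text.upper() if custom_spec == "upper" else text


print(split_conversion_spec("!upper"))   # ('', 'upper'), written as !!upper in a template
print(split_conversion_spec("r!upper"))  # ('r', 'upper')
print(split_conversion_spec("()!json"))  # ('()', 'json')
print(render_field_with_upper("jane", "", "r!upper"))   # 'JANE' (including the quotes)
print(render_field_with_upper("jane", ">6", "!upper"))  # JANE right-aligned in 6 columns
```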

The full mapping between the standard conversion specifiers and the special methods called on the interpolated value when the field is rendered:

No conversion (empty string): __format__ (with format specifier as parameter)
a: __repr__ (as per the ascii() builtin)
r: __repr__ (as per the repr() builtin)
s: __str__ (as per the str builtin)
(): __call__ (with no parameters)

When a conversion occurs, __format__ (with the format specifier) is called on the result of the conversion rather than being called on the original object.
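The conversion-then-format order can be checked with existing builtins by applying each conversion by hand. The !() case is written as an explicit call because that specifier does not exist yet. The final assertion checks that current f-strings already apply !r before the format specifier.

```python
name = "Jane"


def get_ratio():
    return 2 / 3


# No conversion: __format__ is called on the original value
print(format(name, ">8"))          # '    Jane'
# 'r' conversion: repr() first, then __format__ on the resulting string
print(format(repr(name), ">8"))    # "  'Jane'"
# 'a' conversion: like repr(), but non-ASCII characters are escaped
print(format(ascii("café"), ""))   # 'caf\xe9'
# '()' conversion (proposed): the value is called, then the result is formatted
print(format(get_ratio(), ".2f"))  # 0.67

# Current f-strings already convert with !r before applying the format specifier
assert f"{name!r:>8}" == format(repr(name), ">8")
```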

The changes to format() and the addition of operator.convert_field() make it straightforward for custom renderers to also support the standard conversion specifiers.

f-strings themselves will NOT support the new !() conversion specifier (as it is redundant when value interpolation and value rendering always occur at the same time). They also will NOT support the use of custom conversion specifiers (since the rendering function is known at compile time and doesn’t make use of the custom specifiers).

New field conversion API in the `operator` module

To support application of the standard conversion specifiers in custom template rendering functions, a new operator.convert_field() function will be added:

def convert_field(value, conversion_spec=''):
    """Apply the given string formatting conversion specifier to the given value"""
    std_spec, sep, custom_spec = conversion_spec.partition("!")
    match std_spec:
        case '':
            return value
        case 'a':
            return ascii(value)
        case 'r':
            return repr(value)
        case 's':
            return str(value)
        case '()':
            return value()
    if not sep:
        err = f"Invalid conversion specifier {std_spec!r}"
    else:
        err = f"Invalid conversion specifier {std_spec!r} in {conversion_spec!r}"
    raise ValueError(f"{err}: expected '', 'a', 'r', 's' or '()'")

Conversion specifier parameter added to `format()`

The signature and behaviour of the format() builtin will be updated:

def format(value, format_spec='', conversion_spec=''):
    if conversion_spec:
        value_to_format = operator.convert_field(value, conversion_spec)
    else:
        value_to_format = value
    return type(value_to_format).__format__(value_to_format, format_spec)

If a non-empty conversion specifier is given, the value will be converted with operator.convert_field() before looking up the __format__ method.
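operator.convert_field() and the three-argument format() are only proposals. The sketch below therefore copies the convert_field() definition above into a local function (using an if chain instead of match) and wraps it in format_with_conversion(). That is a hypothetical name, chosen so the builtin format() is not shadowed.

```python
def convert_field(value, conversion_spec=""):
    """Local copy of the proposed operator.convert_field() (not in the current stdlib)."""
    std_spec, sep, _custom_spec = conversion_spec.partition("!")
    if std_spec == "":
        return value
    if std_spec == "a":
        return ascii(value)
    if std_spec == "r":
        return repr(value)
    if std_spec == "s":
        return str(value)
    if std_spec == "()":
        return value()
    if not sep:
        err = f"Invalid conversion specifier {std_spec!r}"
    else:
        err = f"Invalid conversion specifier {std_spec!r} in {conversion_spec!r}"
    raise ValueError(f"{err}: expected '', 'a', 'r', 's' or '()'")


def format_with_conversion(value, format_spec="", conversion_spec=""):
    """Sketch of the proposed three-argument format(), under a hypothetical name."""
    if conversion_spec:
        value = convert_field(value, conversion_spec)
    # Look up __format__ on the (possibly converted) value's type
    return type(value).__format__(value, format_spec)


print(format_with_conversion("Jane", ">8", "r"))        # "  'Jane'"
print(format_with_conversion(lambda: 42, "04d", "()"))  # 0042
print(format_with_conversion("Jane", "", "s!custom"))   # Jane (custom suffix ignored)
```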

The signature of the __format__ special method does NOT change (only format specifiers are handled by the object being formatted).

Structural typing and duck typing

To allow custom renderers to accept alternative interpolation template implementations (rather than being tightly coupled to the native template literal types), the following structural protocols will be added to the typing module:

from collections.abc import Iterable
from typing import Any, Protocol, runtime_checkable

@runtime_checkable
class TemplateText(Protocol):
    # Renamed version of PEP 750's Decoded protocol
    def __str__(self) -> str: ...

    raw: str

@runtime_checkable
class TemplateField(Protocol):
    # Renamed and modified version of PEP 750's Interpolation protocol
    def __len__(self): ...

    def __getitem__(self, index: int): ...

    def __str__(self) -> str: ...

    expr: str
    value: Any
    format_spec: str | None = None
    conversion_spec: str | None = None

@runtime_checkable
class InterpolationTemplate(Protocol):
    # Corresponds to PEP 750's Template protocol
    def __iter__(self) -> Iterable[TemplateText | TemplateField]: ...

    raw_template: str

Note that the structural protocol APIs are substantially narrower than the full implementation APIs defined for TemplateLiteralText, TemplateLiteralField, and TemplateLiteral.

Code that wants to accept interpolation templates and define specific handling for them without introducing a dependency on the typing module, or restricting the code to handling the concrete template literal types, should instead perform an attribute existence check on raw_template.
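Both options can be sketched with the current typing module. HasRawTemplate is a hypothetical, deliberately minimal protocol (not one of the protocols proposed above). MyTemplate is a hypothetical third-party template class. describe() shows the dependency-free hasattr() check on raw_template.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class HasRawTemplate(Protocol):
    """Hypothetical minimal protocol: anything exposing a raw_template string."""
    raw_template: str


class MyTemplate:
    """Hypothetical third-party template type, unrelated to the native template literals."""

    def __init__(self, raw_template: str, *segments):
        self.raw_template = raw_template
        self._segments = segments

    def __iter__(self):
        return iter(self._segments)


def describe(obj) -> str:
    # Dependency-free duck typing: only check for the raw_template attribute
    if hasattr(obj, "raw_template"):
        return f"template: {obj.raw_template}"
    return f"plain value: {obj!r}"


template = MyTemplate("Hello {name}", "Hello ", ("name", "Jane"))
print(describe(template))                    # template: Hello {name}
print(describe("Hello Jane"))                # plain value: 'Hello Jane'
print(isinstance(template, HasRawTemplate))  # True, via the typing-based check
```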

Writing custom renderers

Writing a custom renderer doesn’t require any special syntax. Instead, custom renderers are ordinary callables that process an interpolation template directly, either by calling the render() method with alternate render_template, render_text, and/or render_field implementations, or by accessing the template’s data attributes directly.

For example, the following function would render a template using objects’ repr implementations rather than their native formatting support:

import operator

def repr_format(template):
    def render_field(value, format_spec, conversion_spec):
        # operator.convert_field() is the function proposed earlier in this PEP
        converted_value = operator.convert_field(value, conversion_spec)
        return format(repr(converted_value), format_spec)
    return template.render(render_field=render_field)

The custom renderer shown respects the conversion specifiers in the original template, but it is also possible to ignore them and render the interpolated values directly:

def input_repr_format(template):
    def render_field(value, format_spec, __):
        return format(repr(value), format_spec)
    return template.render(render_field=render_field)
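The other approach mentioned above is to read the segments directly instead of calling render(). This gives a renderer access to information that render_field never receives, such as the field's expression text. The following sketch uses hypothetical Template and Field stand-ins for the proposed types and renders each field in a debugging style, as expr=repr(value).

```python
from typing import Any, NamedTuple


class Field(NamedTuple):
    """Hypothetical stand-in for TemplateLiteralField."""
    expr: str
    value: Any
    format_spec: str = ""
    conversion_spec: str = ""


class Template:
    """Hypothetical stand-in for TemplateLiteral: raw text plus iterable segments."""

    def __init__(self, raw_template: str, *segments):
        self.raw_template = raw_template
        self.segments = segments

    def __iter__(self):
        return iter(self.segments)


def debug_format(template) -> str:
    """Render each field as expr=repr(value) by reading the segments directly."""
    parts = []
    for segment in template:
        if isinstance(segment, Field):
            # render_field would only see value/format_spec/conversion_spec, not expr
            parts.append(f"{segment.expr}={format(repr(segment.value), segment.format_spec)}")
        else:
            parts.append(segment)
    return "".join(parts)


name = "Jane"
age = 50
# Hand-built equivalent of t"Hello {name}, next year you'll be {age+1}"
template = Template(
    "Hello {name}, next year you'll be {age+1}",
    "Hello ", Field("name", name), ", next year you'll be ", Field("age+1", age + 1),
)
print(debug_format(template))  # Hello name='Jane', next year you'll be age+1=51
```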

When writing custom renderers, note that the return type of the overall rendering operation is determined by the return type of the passed in render_template callable. While this will still be a string for formatting-related use cases, producing non-string objects is permitted. For example, a custom SQL template renderer could involve a sqlalchemy.sql.text call that produces an SQLAlchemy query object. A subprocess-related template renderer could produce a string sequence suitable for passing to subprocess.run, or it could even call subprocess.run directly and return the result.

Non-strings may also be returned from render_text and render_field, as long as they are paired with a render_template implementation that expects that behaviour.
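As a sketch of a rendering operation whose result is not a string, the hypothetical `to_argv()` function below produces an argument list of the kind `subprocess.run()` accepts: its text hook returns a list of words, its field hook returns a one-item list, and its template step flattens them. The `(kind, payload)` tuple representation of a template is an assumption made to keep the example self-contained.

```python
from typing import Any, List, Tuple

# Hypothetical template representation: ("text", str) or ("field", value) pairs
Segment = Tuple[str, Any]


def to_argv(segments: List[Segment]) -> List[str]:
    """Render a template to an argv list instead of a single string."""

    # render_text: literal text may hold several arguments, so return a list
    def render_text(text: str) -> List[str]:
        return text.split()

    # render_field: an interpolated value is always exactly one argument
    def render_field(value: Any) -> List[str]:
        return [str(value)]

    # render_template: flatten the per-segment lists into one argv list
    argv: List[str] = []
    for kind, payload in segments:
        argv.extend(render_text(payload) if kind == "text" else render_field(payload))
    return argv


# Roughly models t"cat {filename} --flag {value}"
argv = to_argv([("text", "cat "), ("field", "my file; rm -rf ~"), ("text", " --flag "), ("field", 3)])
print(argv)  # ['cat', 'my file; rm -rf ~', '--flag', '3']
```

The sketch assumes every interpolated field is separated from the surrounding literal text by whitespace; text that directly abuts a field, as in `--out={path}`, would need additional handling.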

Custom renderers using the pattern matching style described in PEP 750 are also supported:

```python
# Use the structural typing protocols rather than the concrete implementation types
# (InterpolationTemplate, TemplateText and TemplateField are proposed by this PEP)
from typing import InterpolationTemplate, TemplateText, TemplateField

def greet(template: InterpolationTemplate) -> str:
    """Render an interpolation template using structural pattern matching."""
    result = []
    for segment in template:
        match segment:
            case TemplateText() as text_segment:
                # The protocol only guarantees __str__, so convert explicitly
                result.append(str(text_segment))
            case TemplateField() as field_segment:
                result.append(str(field_segment).upper())
    return f"{''.join(result)}!"
```

Expression evaluation

As with f-strings, the subexpressions that are extracted from the interpolation template are evaluated in the context where the template literal appears. This means the expression has full access to local, nonlocal and global variables. Any valid Python expression can be used inside {}, including function and method calls.

Because the substitution expressions are evaluated where the string appears in the source code, there are no additional security concerns related to the contents of the expression itself, as you could have also just written the same expression and used runtime field parsing:

```
>>> bar = 10
>>> def foo(data):
...     return data + 20
...
>>> str(t'input={bar}, output={foo(bar)}')
'input=10, output=30'
```

Is essentially equivalent to:

```
>>> 'input={}, output={}'.format(bar, foo(bar))
'input=10, output=30'
```

Handling code injection attacks

The PEP 498 formatted string syntax makes it potentially attractive to write code like the following:

```python
runquery(f"SELECT {column} FROM {table};")
runcommand(f"cat {filename}")
return_response(f"<html><body>{response.body}</body></html>")
```

These all represent potential vectors for code injection attacks if any of the variables being interpolated happen to come from an untrusted source. The specific proposal in this PEP is designed to make it straightforward to write use case specific renderers that take care of quoting interpolated values appropriately for the relevant security context:

```python
runquery(sql(t"SELECT {column} FROM {table} WHERE column={value};"))
runcommand(sh(t"cat {filename}"))
return_response(html(t"<html><body>{response.body}</body></html>"))
```

This PEP does not cover adding all such renderers to the standard library immediately (though one for shell escaping is proposed), but rather proposes to ensure that they can be readily provided by third party libraries, and potentially incorporated into the standard library at a later date.
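For illustration only, here is a sketch of how an HTML renderer along these lines might escape interpolated values. The template shape (a list of literal strings and hypothetical `Field` objects) and the function name `html_render()` are assumptions, the latter chosen so the function does not shadow the standard `html` module; `html.escape()` is the real standard library function doing the quoting.

```python
import html
from dataclasses import dataclass
from typing import Any, List, Union


@dataclass
class Field:
    """Hypothetical stand-in for an interpolation field."""
    value: Any
    format_spec: str = ""


def html_render(segments: List[Union[str, Field]]) -> str:
    """Escape interpolated values while trusting the literal template text."""
    parts = []
    for segment in segments:
        if isinstance(segment, Field):
            # Untrusted data: format it, then escape &, <, >, " and '
            parts.append(html.escape(format(segment.value, segment.format_spec), quote=True))
        else:
            # Literal text written by the programmer passes through unchanged
            parts.append(segment)
    return "".join(parts)


# Roughly models html(t"<html><body>{response_body}</body></html>")
response_body = "<script>alert('hi')</script>"
page = html_render(["<html><body>", Field(response_body), "</body></html>"])
print(page)
```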

Over time, it is expected that APIs processing potentially dangerous string inputs may be updated to accept interpolation templates natively, allowing problematic code examples to be fixed simply by replacing the f string prefix with a t:

```python
runquery(t"SELECT {column} FROM {table};")
runcommand(t"cat {filename}")
return_response(t"<html><body>{response.body}</body></html>")
```

It is proposed that a renderer is included in the shlex module, aiming to offer a more POSIX shell style experience for accessing external programs, without the significant risks posed by running os.system or enabling the system shell when using the subprocess module APIs. This renderer will provide an interface for running external programs inspired by that offered by the Julia programming language, only with the backtick based `cat $filename` syntax replaced by t"cat {filename}" style template literals. See more in the Renderer for shell escaping added to shlex section.

Error handling

Either compile time or run time errors can occur when processing interpolation expressions. Compile time errors are limited to those errors that can be detected when parsing a template string into its component tuples. These errors all raise SyntaxError.

Unmatched braces:

```
>>> t'x={x'
  File "<stdin>", line 1
    t'x={x'
       ^
SyntaxError: missing '}' in template literal expression
```

Invalid expressions:

```
>>> t'x={!x}'
  File "<fstring>", line 1
    !x
    ^
SyntaxError: invalid syntax
```

Run time errors occur when evaluating the expressions inside a template string before creating the template literal object. See PEP 498 for some examples.

Different renderers may also impose additional runtime constraints on acceptable interpolated expressions and other formatting details, which will be reported as runtime exceptions.
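Since t-strings are not available to experiment with in current Python, the closest runnable analogue uses f-strings, whose grammar t-strings build on. In this sketch (the helper names `compile_error()` and `runtime_error()` are illustrative), malformed fields are rejected by `compile()` with `SyntaxError` before any code runs, while a well-formed field that refers to an undefined name only fails with `NameError` when the expression is evaluated.

```python
def compile_error(source: str) -> str:
    """Report whether compiling *source* fails, without running it."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return "SyntaxError"
    return "ok"


def runtime_error(source: str) -> str:
    """Compile *source* (which must succeed), then report what evaluating it raises."""
    code = compile(source, "<example>", "eval")
    try:
        eval(code, {})  # empty globals, so 'undefined_name' is not bound
    except NameError:
        return "NameError"
    return "ok"


print(compile_error("f'x={x'"))                # SyntaxError: unmatched brace
print(compile_error("f'x={!x}'"))              # SyntaxError: missing expression
print(runtime_error("f'x={undefined_name}'"))  # NameError: only fails when evaluated
```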

Renderer for shell escaping added to `shlex`

As a reference implementation, a renderer for safe POSIX shell escaping can be added to the shlex module. This renderer would be called sh and would be equivalent to calling shlex.quote on each field value in the template literal.

Thus:

```python
os.system(shlex.sh(t'cat {myfile}'))
```

would have the same behavior as:

```python
os.system('cat ' + shlex.quote(myfile))
```

The implementation would be:

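```python
def sh(template: TemplateLiteral):
    # Defined inside the shlex module, so quote() refers to shlex.quote()
    def render_field(value, format_spec, conversion_spec):
        # Relies on the updated format() builtin, which also applies the conversion
        field_text = format(value, format_spec, conversion_spec)
        return quote(field_text)
    return template.render(render_field=render_field)
```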
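The implementation above relies on the proposed `render()` method and the updated `format()` builtin, neither of which exists in current Python. The following sketch reproduces the same behaviour under stated assumptions: a template is modelled as a list of literal strings and hypothetical `Field` objects, `convert()` is a local stand-in for the standard conversions, and `sh_sketch()` is an illustrative name. Only `shlex.quote()` is a real API here.

```python
import shlex
from dataclasses import dataclass
from typing import Any, List, Optional, Union


@dataclass
class Field:
    """Hypothetical stand-in for TemplateLiteralField."""
    value: Any
    format_spec: str = ""
    conversion_spec: Optional[str] = None


def convert(value: Any, conversion_spec: Optional[str]) -> Any:
    """Handle the standard !a, !r and !s conversions (None means no conversion)."""
    if conversion_spec is None:
        return value
    return {"a": ascii, "r": repr, "s": str}[conversion_spec](value)


def sh_sketch(segments: List[Union[str, Field]]) -> str:
    """Quote every field value with shlex.quote(); leave literal text untouched."""
    parts = []
    for segment in segments:
        if isinstance(segment, Field):
            field_text = format(convert(segment.value, segment.conversion_spec), segment.format_spec)
            parts.append(shlex.quote(field_text))
        else:
            parts.append(segment)
    return "".join(parts)


# Roughly models shlex.sh(t'cat {myfile}')
myfile = "notes; rm -rf ~"
command = sh_sketch(["cat ", Field(myfile)])
print(command)  # cat 'notes; rm -rf ~'
```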

The addition of shlex.sh will NOT change the existing admonishments in the subprocess documentation that passing shell=True is best avoided, nor the reference from the os.system() documentation to the higher level subprocess APIs.

Changes to subprocess module

With the additional renderer in the shlex module, and the addition of template literals, the subprocess module can be changed to handle accepting template literals as an additional input type to Popen, as it already accepts a sequence or a string, with different behavior for each.

With the addition of template literals, subprocess.Popen (and in turn, all its higher level functions such as subprocess.run()) could accept strings in a safe way (at least on POSIX systems).

For example:

```python
subprocess.run(t'cat {myfile}', shell=True)
```

would automatically use the shlex.sh renderer provided in this PEP. Therefore, using shlex inside a subprocess.run call like so:

```python
subprocess.run(shlex.sh(t'cat {myfile}'), shell=True)
```

would be redundant, as run would automatically render any template literals through shlex.sh.

Alternatively, when subprocess.Popen is run without shell=True, template literals could still provide subprocess with a more ergonomic syntax. For example:

```python
subprocess.run(t'cat {myfile} --flag {value}')
```

would be equivalent to:

```python
subprocess.run(['cat', myfile, '--flag', value])
```

or, more accurately:

```python
subprocess.run(shlex.split(f'cat {shlex.quote(myfile)} --flag {shlex.quote(value)}'))
```

It would do this by first using the shlex.sh renderer, as above, then using shlex.split on the result.
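That equivalence can be checked today using only the real `shlex` functions: quoting each interpolated value and then splitting the result on POSIX shell rules yields exactly one argument per value, even when a value contains spaces, quotes or shell metacharacters. The sample values are arbitrary.

```python
import shlex

myfile = "my notes.txt"
value = "it's; rm -rf ~"

# Render as the proposed shlex.sh would (quote only the field values) ...
rendered = f"cat {shlex.quote(myfile)} --flag {shlex.quote(value)}"
# ... then split on POSIX shell rules to recover an argument list
argv = shlex.split(rendered)

print(rendered)
print(argv)  # ['cat', 'my notes.txt', '--flag', "it's; rm -rf ~"]
```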

The implementation inside subprocess.Popen._execute_child would look like:

```python
if hasattr(args, "raw_template"):
    import shlex
    if shell:
        args = [shlex.sh(args)]
    else:
        args = shlex.split(shlex.sh(args))
```
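A runnable model of that dispatch, under explicit assumptions: `prepare_args()` is a hypothetical helper standing in for the relevant part of `Popen._execute_child`, `QuotingTemplate` is a hypothetical template stand-in that only supports simple named fields, and `render_sh()` plays the role of the proposed `shlex.sh`. Only `shlex.quote()` and `shlex.split()` are real APIs.

```python
import shlex
from typing import Any, Sequence, Union


class QuotingTemplate:
    """Hypothetical stand-in for a template literal such as t'cat {myfile}'."""

    def __init__(self, raw_template: str, **values: Any) -> None:
        self.raw_template = raw_template
        self.values = values


def render_sh(template: QuotingTemplate) -> str:
    """Stand-in for shlex.sh: quote each value, then substitute it into the text."""
    quoted = {name: shlex.quote(str(v)) for name, v in template.values.items()}
    return template.raw_template.format(**quoted)


def prepare_args(args: Union[str, Sequence[str], QuotingTemplate], shell: bool) -> Any:
    """Mirror the proposed check: only objects with raw_template are rewritten."""
    if hasattr(args, "raw_template"):
        if shell:
            # shell=True: a single pre-quoted command string for the shell
            return [render_sh(args)]
        # shell=False: split the safely quoted command back into argv
        return shlex.split(render_sh(args))
    return args  # strings and sequences keep their existing behaviour


template = QuotingTemplate("cat {myfile}", myfile="my notes.txt")
print(prepare_args(template, shell=True))   # ["cat 'my notes.txt'"]
print(prepare_args(template, shell=False))  # ['cat', 'my notes.txt']
```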

How to Teach This

This PEP intentionally includes two standard renderers that will always be available in teaching environments: the format() builtin and the new shlex.sh POSIX shell renderer.

Together, these two renderers can be used to build an initial understanding of delayed rendering on top of a student’s initial introduction to string formatting with f-strings. This initial understanding would have the goal of allowing students to use template literals effectively, in combination with pre-existing template rendering functions.

For example, f"{'some text'}", f"{value}", f"{value!r}", and f"{callable()}" could all be introduced.

Those same operations could then be rewritten as format(t"{'some text'}"), format(t"{value}"), format(t"{value!r}"), and format(t"{callable()}") to illustrate the relationship between the eager rendering form and the delayed rendering form.

The difference between “template definition time” (or “interpolation time”) and “template rendering time” can then be investigated further by storing the template literals as local variables and looking at their representations separately from the results of the format calls. At this point, the t"{callable!()}" syntax can be introduced to distinguish between field expressions that are called at template definition time and those that are called at template rendering time.

Finally, the differences between the results of f"{'some text'}", format(t"{'some text'}"), and shlex.sh(t"{'some text'}") could be explored to illustrate the potential for differences between the default rendering function and custom rendering functions.

Actually defining your own custom template rendering functions would then be a separate moreadvanced topic (similar to the way students are routinely taught to use decorators andcontext managers well before they learn how to write their own custom ones).

PEP 750 includes further ideas for teaching aspects of the delayed rendering topic.

Discussion

Refer to PEP 498 for previous discussion, as several of the points there also apply to this PEP. PEP 750’s design discussions are also highly relevant, as that PEP inspired several aspects of the current design.

Support for binary interpolation

As f-strings don’t handle byte strings, neither will t-strings.

Interoperability with str-only interfaces

For interoperability with interfaces that only accept strings, interpolation templates can still be prerendered with format(), rather than delegating the rendering to the called function.

This reflects the key difference from PEP 498, which always eagerly applies the default rendering, without any way to delegate the choice of renderer to another section of the code.

Preserving the raw template string

Earlier versions of this PEP failed to make the raw template string available on the template literal. Retaining it makes it possible to provide a more attractive template representation, as well as providing the ability to precisely reconstruct the original string, including both the expression text and the details of any eagerly rendered substitution fields in format specifiers.

Creating a rich object rather than a global name lookup

Earlier versions of this PEP used an __interpolate__ builtin, rather than creating a new kind of object for later consumption by interpolation functions. Creating a rich descriptive object with a useful default renderer made it much easier to support customisation of the semantics of interpolation.

Building atop f-strings rather than replacing them

Earlier versions of this PEP attempted to serve as a complete substitute for PEP 498 (f-strings). With the acceptance of that PEP and the more recent PEP 701, this PEP can instead build a more flexible delayed rendering capability on top of the existing f-string eager rendering.

Assuming the presence of f-strings as a supporting capability simplified a number of aspects of the proposal in this PEP (such as how to handle substitution fields in format specifiers).

Defining repetition and concatenation semantics

This PEP explicitly defines repetition and concatenation semantics for TemplateLiteral and TemplateLiteralText. While not strictly necessary, defining these is expected to make the types easier to work with in code that historically only supported regular strings.

New conversion specifier for lazy field evaluation

The initially published version of PEP 750 defaulted to lazy evaluation for all interpolation fields. While it was subsequently updated to default to eager evaluation (as happens for f-strings and this PEP), the discussions around the topic prompted the idea of providing a way to indicate to rendering functions that the interpolated field value should be called at rendering time rather than being used without modification.

Since PEP 750 also deferred the processing of conversion specifiers until evaluation time, the suggestion was put forward that invoking __call__ without arguments could be seen as similar to the existing conversion specifiers that invoke __repr__ (!a, !r) or __str__ (!s).

Accordingly, this PEP was updated to also make conversion specifier processing the responsibility of rendering functions, and to introduce !() as a new conversion specifier for lazy evaluation.

Adding operator.convert_field() and updating the format() builtin was then a matter of providing appropriate support to rendering function implementations that wanted to accept the default conversion specifiers.
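Because `operator.convert_field()` is only proposed, the following is a sketch of the semantics described in this section, written as a hypothetical local `convert_field()` function: no specifier leaves the value unchanged, `a`, `r` and `s` apply `ascii()`, `repr()` and `str()`, and `()` calls the value with no arguments at rendering time. Raising `ValueError` for any other specifier is an assumption of the sketch.

```python
from typing import Any, Optional


def convert_field(value: Any, conversion_spec: Optional[str] = None) -> Any:
    """Hypothetical local version of the proposed operator.convert_field()."""
    if conversion_spec is None:
        return value
    if conversion_spec == "()":
        # Lazy field: the interpolated value is a callable invoked at render time
        return value()
    converters = {"a": ascii, "r": repr, "s": str}
    try:
        return converters[conversion_spec](value)
    except KeyError:
        raise ValueError(f"unknown conversion specifier {conversion_spec!r}") from None


calls = []


def expensive_call() -> str:
    calls.append("called")
    return "result"


# Defining the field does not call the function; converting (rendering) it does.
lazy_field = (expensive_call, "()")
print(calls)                       # []
print(convert_field(*lazy_field))  # result
print(calls)                       # ['called']
```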

Allowing arbitrary conversion specifiers in custom renderers

Accepting !() as a new conversion specifier necessarily requires updating the syntax that the parser accepts for conversion specifiers (they are currently restricted to identifiers). This then raised the question of whether t-string compilation should enforce the additional restriction that f-string compilation imposes: that the conversion specifier be exactly one of !a, !r, or !s.

With t-strings already being updated to allow !() when compiled, it made sense to treat conversion specifiers as relating to rendering functions in the same way that format specifiers relate to the formatting of individual objects: aside from some characters that are excluded for parsing reasons, they are otherwise free text fields with the meaning decided by the consuming function or object. This reduces the temptation to introduce renderer specific metaformatting into the template’s format specifiers (since any renderer specific information can be placed in the conversion specifier instead).
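As a hypothetical illustration of a renderer assigning its own meaning to a conversion specifier, the sketch below treats `!safe` as marking trusted markup that should bypass HTML escaping. The `safe` specifier, the `Field` stand-in, the list-based template shape and the `render_html()` name are all assumptions rather than part of the proposal; `html.escape()` is the real standard library function.

```python
import html
from dataclasses import dataclass
from typing import Any, List, Optional, Union


@dataclass
class Field:
    """Hypothetical stand-in for an interpolation field."""
    value: Any
    format_spec: str = ""
    conversion_spec: Optional[str] = None


def render_html(segments: List[Union[str, Field]]) -> str:
    """Escape fields by default; the renderer-defined '!safe' disables escaping."""
    parts = []
    for segment in segments:
        if not isinstance(segment, Field):
            parts.append(segment)
            continue
        text = format(segment.value, segment.format_spec)
        if segment.conversion_spec == "safe":
            parts.append(text)  # the caller vouches for this markup
        elif segment.conversion_spec is None:
            parts.append(html.escape(text))
        else:
            raise ValueError(f"unsupported conversion specifier: {segment.conversion_spec!r}")
    return "".join(parts)


# Roughly models t"<p>{comment}</p>{footer!safe}"
page = render_html(["<p>", Field("<b>hi</b>"), "</p>", Field("<hr>", conversion_spec="safe")])
print(page)  # <p>&lt;b&gt;hi&lt;/b&gt;</p><hr>
```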

Only reserving a single new string prefix

The primary difference between this PEP and PEP 750 is that the latter aims to enable the use of arbitrary string prefixes, rather than requiring the creation of template literal instances that are then passed to other APIs. For example, PEP 750 would allow the sh renderer described in this PEP to be used as sh"cat {somefile}" rather than requiring the template literal to be created explicitly and then passed to a regular function call (as in sh(t"cat {somefile}")).

The main reason the PEP authors prefer the second spelling is that it makes it clearer to a reader what is going on: a template literal instance is being created, and then passed to a callable that knows how to do something useful with interpolation template instances.

A draft proposal from one of the PEP 750 authors also suggests that static type checkers will be able to infer the use of particular domain specific languages just as readily from the form that uses an explicit function call as they would be able to infer it from a directly tagged string.

With the tagged string syntax at least arguably reducing clarity for human readers without increasing the overall expressiveness of the construct, it seems reasonable to start with the smallest viable proposal (a single new string prefix), and then revisit the potential value of generalising to arbitrary prefixes in the future.

As a lesser, but still genuine, consideration, only using a single new string prefix for this use case leaves open the possibility of defining alternate prefixes in the future that still produce TemplateLiteral objects, but use a different syntax within the string to define the interpolation fields (see the i18n discussion below).

Deferring consideration of more concise delayed evaluation syntax

During the discussions of delayed evaluation, {->expr} was suggested as potential syntactic sugar for the already supported lambda based syntax: {(lambda:expr)} (the parentheses are required in the existing syntax to avoid misinterpretation of the : character as indicating the start of the format specifier).

While adding such a spelling would complement the rendering time function call syntax proposed in this PEP (that is, writing {->expr!()} to evaluate arbitrary expressions at rendering time), it is a topic that the PEP authors consider to be better left to a future PEP if this PEP or PEP 750 is accepted.

Deferring consideration of possible logging integration

One of the challenges with the logging module has been that we have previously been unable to devise a reasonable migration strategy away from the use of printf-style formatting. While the logging module does allow formatters to specify the use of str.format() or string.Template style substitution, it can be awkward to ensure that messages written that way are only ever processed by log record formatters that are expecting that syntax.

The runtime parsing and interpolation overhead for logging messages also poses a problem for extensive logging of runtime events for monitoring purposes.

While beyond the scope of this initial PEP, template literal support could potentially be added to the logging module’s event reporting APIs, permitting relevant details to be captured using forms like:

```python
logging.debug(t"Event: {event}; Details: {data}")
logging.critical(t"Error: {error}; Details: {data}")
```

Rather than the historical mod-formatting style:

```python
logging.debug("Event: %s; Details: %s", event, data)
logging.critical("Error: %s; Details: %s", error, data)
```

As the template literal is passed in as an ordinary argument, other keyword arguments would also remain available:

```python
logging.critical(t"Error: {error}; Details: {data}", exc_info=True)
```

The approach to standardising lazy field evaluation described in this PEP is primarily based on the anticipated needs of this hypothetical integration into the logging module:

```python
logging.debug(t"Eager evaluation of {expensive_call()}")
logging.debug(t"Lazy evaluation of {expensive_call!()}")
logging.debug(t"Eager evaluation of {expensive_call_with_args(x, y, z)}")
logging.debug(t"Lazy evaluation of {(lambda: expensive_call_with_args(x, y, z))!()}")
```
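A sketch of how a logging integration could benefit from `!()`, under explicit assumptions: the hypothetical `log_template()` helper receives a template as a list of literal strings and `(value, conversion_spec)` pairs, checks the real `Logger.isEnabledFor()` before rendering anything, and only calls `!()` fields when the message will actually be emitted. None of this is an existing `logging` API.

```python
import logging
from typing import Any, List, Optional, Tuple, Union

# Hypothetical template shape: literal text, or (value, conversion_spec) pairs
Segment = Union[str, Tuple[Any, Optional[str]]]


def log_template(logger: logging.Logger, level: int, segments: List[Segment]) -> None:
    """Render and log the template only if *level* is enabled for *logger*."""
    if not logger.isEnabledFor(level):
        return  # lazy '!()' fields are never called for suppressed messages
    parts = []
    for segment in segments:
        if isinstance(segment, str):
            parts.append(segment)
        else:
            value, conversion_spec = segment
            parts.append(str(value() if conversion_spec == "()" else value))
    logger.log(level, "%s", "".join(parts))


calls = []


def expensive_call() -> str:
    calls.append("called")
    return "expensive result"


logger = logging.getLogger("pep501.sketch")
logger.setLevel(logging.INFO)

# Roughly models logging.debug(t"Lazy evaluation of {expensive_call!()}")
log_template(logger, logging.DEBUG, ["Lazy evaluation of ", (expensive_call, "()")])
print(calls)  # [] because DEBUG is disabled, so expensive_call never ran
log_template(logger, logging.INFO, ["Lazy evaluation of ", (expensive_call, "()")])
print(calls)  # ['called']
```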

It’s an open question whether the definition of logging formatters would be updated to support template strings, but if they were, the most likely way of defining fields which should be looked up on the log record instead of being interpreted eagerly is simply to escape them so they’re available as part of the literal text:

```python
proc_id = get_process_id()
formatter = logging.Formatter(t"{{asctime}}:{proc_id}:{{name}}:{{levelname}}{{message}}")
```
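The effect of doubling the braces can be previewed with an f-string, which uses the same escaping rule: `{{asctime}}` becomes the literal text `{asctime}`, while `{proc_id}` is interpolated immediately, leaving the remaining fields for a `{`-style `logging.Formatter` to fill in later. The sketch uses `os.getpid()` in place of the hypothetical `get_process_id()`, and adds a `:` separator between `levelname` and `message` for readability.

```python
import logging
import os

proc_id = os.getpid()  # stands in for the hypothetical get_process_id()

# Doubled braces survive as literal {name} fields for the formatter to fill later;
# {proc_id} is interpolated immediately.
fmt = f"{{asctime}}:{proc_id}:{{name}}:{{levelname}}:{{message}}"
formatter = logging.Formatter(fmt, style="{")

record = logging.LogRecord("app", logging.INFO, "example.py", 1, "started", None, None)
line = formatter.format(record)
print(fmt)   # e.g. {asctime}:12345:{name}:{levelname}:{message}
print(line)  # e.g. 2024-01-01 12:00:00,000:12345:app:INFO:started
```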

Deferring consideration of possible use in i18n use cases

The initial motivating use case for this PEP was providing a cleaner syntax for i18n (internationalization) translation, as that requires access to the original unmodified template. As such, it focused on compatibility with the substitution syntax used in Python’s string.Template formatting and Mozilla’s L20n project.

However, subsequent discussion revealed there are significant additional considerations to be taken into account in the i18n use case, which don’t impact the simpler cases of handling interpolation into security sensitive contexts (like HTML, system shells, and database queries), or producing application debugging messages in the preferred language of the development team (rather than the native language of end users).

Due to that realisation, the PEP was switched to use the str.format() substitution syntax originally defined in PEP 3101 and subsequently used as the basis for PEP 498.

While it would theoretically be possible to update string.Template to support the creation of instances from native template literals, and to implement the structural typing.InterpolationTemplate protocol, the PEP authors have not identified any practical benefit in doing so.

However, one significant benefit of the “only one string prefix” approach used in this PEP is that while it generalises the existing f-string interpolation syntax to support delayed rendering through t-strings, it doesn’t imply that that should be the only compiler supported interpolation syntax that Python should ever offer.

Most notably, it leaves the door open to an alternate “t$-string” syntax that would allow TemplateLiteral instances to be created using a PEP 292 based interpolation syntax rather than a PEP 3101 based syntax:

```python
template = t$"Substitute $words and ${other_values} at runtime"
```

The only runtime distinction between templates created that way and templates created from regular t-strings would be in the contents of their raw_template attributes.
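The `t$` prefix is only a possibility left open here, but the PEP 292 placeholder syntax it would reuse can already be tried at run time through `string.Template` (the substitution values below are arbitrary):

```python
from string import Template

template = Template("Substitute $words and ${other_values} at runtime")

# substitute() requires every placeholder; safe_substitute() leaves missing ones intact
print(template.substitute(words="names", other_values="expressions"))
print(template.safe_substitute(words="names"))
```

Unlike the hypothetical `t$`-strings, `string.Template` looks names up when `substitute()` is called, rather than evaluating them where the literal appears.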

Deferring escaped rendering support for non-POSIX shells

shlex.quote() works by classifying the regex character set [\w@%+=:,./-] to be safe, deeming all other characters to be unsafe, and hence requiring quoting of the string containing them. The quoting mechanism used is then specific to the way that string quoting works in POSIX shells, so it cannot be trusted when running a shell that doesn’t follow POSIX shell string quoting rules.

For example, running subprocess.run(f'echo {shlex.quote(sys.argv[1])}', shell=True) is safe when using a shell that follows POSIX quoting rules:

```
$ cat > run_quoted.py
import sys, shlex, subprocess
subprocess.run(f"echo {shlex.quote(sys.argv[1])}", shell=True)
$ python3 run_quoted.py pwd
pwd
$ python3 run_quoted.py '; pwd'
; pwd
$ python3 run_quoted.py "'pwd'"
'pwd'
```

but remains unsafe when the shell invoked from Python is cmd.exe (or PowerShell):

```
S:\> echo import sys, shlex, subprocess > run_quoted.py
S:\> echo subprocess.run(f"echo {shlex.quote(sys.argv[1])}", shell=True) >> run_quoted.py
S:\> type run_quoted.py
import sys, shlex, subprocess
subprocess.run(f"echo {shlex.quote(sys.argv[1])}", shell=True)
S:\> python3 run_quoted.py "echo OK"
'echo OK'
S:\> python3 run_quoted.py "'& echo Oh no!"
''"'"'
Oh no!'
```
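The character classification described earlier in this section can be observed directly with the real `shlex.quote()`: strings made only of safe characters come back unchanged, anything else is wrapped in single quotes, and embedded single quotes are spliced in with the POSIX-specific `'"'"'` sequence that cmd.exe does not understand.

```python
import shlex

examples = ["notes.txt", "user@host:/tmp/a-b", "my file", "; pwd", "'pwd'", ""]
for text in examples:
    # repr on the left so empty and quoted inputs stay visible
    print(f"{text!r:24} -> {shlex.quote(text)}")
```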

Resolving this standard library limitation is beyond the scope of this PEP.

Acknowledgements

Eric V. Smith for creating PEP 498 and demonstrating the feasibility of arbitrary expression substitution in string interpolation
The authors of PEP 750 for the substantial design improvements that tagged strings inspired for this PEP, their general advocacy for the value of language level delayed template rendering support, and their efforts to ensure that any native interpolation template support lays a strong foundation for future efforts in providing robust syntax highlighting and static type checking support for domain specific languages
Barry Warsaw, Armin Ronacher, and Mike Miller for their contributions to exploring the feasibility of using this model of delayed rendering in i18n use cases (even though the ultimate conclusion was that it was a poor fit, at least for current approaches to i18n in Python)

References

%-formatting
str.format
string.Template documentation
PEP 215: String Interpolation
PEP 292: Simpler String Substitutions
PEP 3101: Advanced String Formatting
PEP 498: Literal string formatting
PEP 675: Arbitrary Literal String Type
PEP 701: Syntactic formalization of f-strings
FormattableString and C# native string interpolation
IFormattable interface in C# (see remarks for globalization notes)
TemplateLiterals in Javascript
Running external commands in Julia

Copyright

This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.

Source: https://github.com/python/peps/blob/main/peps/pep-0501.rst

Last modified: 2024-10-19 14:00:43 GMT


New field conversion API in the `operator` module