Python Enhancement Proposals

PEP 675 – Arbitrary Literal String Type

Author:
Pradeep Kumar Srinivasan <gohanpra at gmail.com>, Graham Bleaney <gbleaney at gmail.com>
Sponsor:
Jelle Zijlstra <jelle.zijlstra at gmail.com>
Discussions-To:
Typing-SIG thread
Status:
Final
Type:
Standards Track
Topic:
Typing
Created:
30-Nov-2021
Python-Version:
3.11
Post-History:
07-Feb-2022
Resolution:
Python-Dev message

Important

This PEP is a historical document: see LiteralString and typing.LiteralString for up-to-date specs and documentation. Canonical typing specs are maintained at the typing specs site; runtime typing behaviour is described in the CPython documentation.

See the typing specification update process for how to propose changes to the typing spec.

Abstract

There is currently no way to specify, using type annotations, that a function parameter can be of any literal string type. We have to specify a precise literal string type, such as Literal["foo"]. This PEP introduces a supertype of literal string types: LiteralString. This allows a function to accept arbitrary literal string types, such as Literal["foo"] or Literal["bar"].

Motivation

Powerful APIs that execute SQL or shell commands often recommend that they be invoked with literal strings, rather than arbitrary user-controlled strings. There is no way to express this recommendation in the type system, however, meaning security vulnerabilities sometimes occur when developers fail to follow it. For example, a naive way to look up a user record from a database is to accept a user id and insert it into a predefined SQL query:

    def query_user(conn: Connection, user_id: str) -> User:
        query = f"SELECT * FROM data WHERE user_id = {user_id}"
        conn.execute(query)
        ...  # Transform data to a User object and return it

    query_user(conn, "user123")  # OK.

However, the user-controlled data user_id is being mixed with the SQL command string, which means a malicious user could run arbitrary SQL commands:

    # Delete the table.
    query_user(conn, "user123; DROP TABLE data;")

    # Fetch all users (since 1 = 1 is always true).
    query_user(conn, "user123 OR 1 = 1")

To prevent such SQL injection attacks, SQL APIs offer parameterized queries, which separate the executed query from user-controlled data and make it impossible to run arbitrary queries. For example, with sqlite3, our original function would be written safely as a query with parameters:

    def query_user(conn: Connection, user_id: str) -> User:
        query = "SELECT * FROM data WHERE user_id = ?"
        conn.execute(query, (user_id,))
        ...

The problem is that there is no way to enforce this discipline. sqlite3’s own documentation can only admonish the reader to not dynamically build the sql argument from external input; the API’s authors cannot express that through the type system. Users can (and often do) still use a convenient f-string as before and leave their code vulnerable to SQL injection.
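
The difference between the two styles can be observed at runtime with a small, self-contained sketch. It assumes an in-memory sqlite3 database with a hypothetical two-column data table, and the helper names (make_db, query_unsafe, query_safe) are illustrative only. Unlike the example above, the unsafe query quotes the user id because the column is declared as TEXT here.

```python
from __future__ import annotations

import sqlite3


def make_db() -> sqlite3.Connection:
    # Hypothetical schema, used only for this illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE data (user_id TEXT, name TEXT)")
    conn.executemany(
        "INSERT INTO data VALUES (?, ?)",
        [("user123", "Alice"), ("user456", "Bob")],
    )
    return conn


def query_unsafe(conn: sqlite3.Connection, user_id: str) -> list[tuple[str, str]]:
    # User-controlled data is spliced into the SQL command string.
    query = f"SELECT user_id, name FROM data WHERE user_id = '{user_id}'"
    return conn.execute(query).fetchall()


def query_safe(conn: sqlite3.Connection, user_id: str) -> list[tuple[str, str]]:
    # The query is a literal; the data travels separately as a parameter.
    query = "SELECT user_id, name FROM data WHERE user_id = ?"
    return conn.execute(query, (user_id,)).fetchall()


conn = make_db()
payload = "user123' OR '1' = '1"

unsafe_rows = query_unsafe(conn, payload)  # the payload rewrites the WHERE clause
safe_rows = query_safe(conn, payload)      # the payload is compared as plain data
legit_rows = query_safe(conn, "user123")
```

In this sketch the f-string version returns every row for the crafted payload, while the parameterized version treats the payload as an ordinary (non-matching) user id.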

Existing tools, such as the popular security linter Bandit, attempt to detect unsafe external data used in SQL APIs, by inspecting the AST or by other semantic pattern-matching. These tools, however, preclude common idioms like storing a large multi-line query in a variable before executing it, adding literal string modifiers to the query based on some conditions, or transforming the query string using a function. (We survey existing tools in the Rejected Alternatives section.) For example, many tools will report a false positive for this benign snippet:

    def query_data(conn: Connection, user_id: str, limit: bool) -> None:
        query = """
            SELECT
                user.name,
                user.age
            FROM data
            WHERE user_id = ?
        """
        if limit:
            query += " LIMIT 1"

        conn.execute(query, (user_id,))

We want to forbid harmful execution of user-controlled data while still allowing benign idioms like the above and not requiring extra user work.

To meet this goal, we introduce the LiteralString type, which only accepts string values that are known to be made of literals. This is a generalization of the Literal["foo"] type from PEP 586. A string of type LiteralString cannot contain user-controlled data. Thus, any API that only accepts LiteralString will be immune to injection vulnerabilities (with pragmatic limitations).

Since we want the sqlite3 execute method to disallow strings built with user input, we would make its typeshed stub accept a sql query that is of type LiteralString:

    from typing import LiteralString

    def execute(self, sql: LiteralString, parameters: Iterable[str] = ...) -> Cursor: ...

This successfully forbids our unsafe SQL example. The variable query below is inferred to have type str, since it is created from a format string using user_id, and cannot be passed to execute:

    def query_user(conn: Connection, user_id: str) -> User:
        query = f"SELECT * FROM data WHERE user_id = {user_id}"
        conn.execute(query)  # Error: Expected LiteralString, got str.
        ...

The method remains flexible enough to allow our more complicated example:

    def query_data(conn: Connection, user_id: str, limit: bool) -> None:
        # This is a literal string.
        query = """
            SELECT
                user.name,
                user.age
            FROM data
            WHERE user_id = ?
        """

        if limit:
            # Still has type LiteralString because we added a literal string.
            query += " LIMIT 1"

        conn.execute(query, (user_id,))  # OK

Notice that the user did not have to change their SQL code at all. The type checker was able to infer the literal string type and complain only in case of violations.

LiteralString is also useful in other cases where we want strict command-data separation, such as when building shell commands or when rendering a string into an HTML response without escaping (see Appendix A: Other Uses). Overall, this combination of strictness and flexibility makes it easy to enforce safer API usage in sensitive code without burdening users.

Usage statistics

In a sample of open-source projects using sqlite3, we found that conn.execute was called ~67% of the time with a safe string literal and ~33% of the time with a potentially unsafe, local string variable. Using this PEP’s literal string type along with a type checker would prevent the unsafe portion of that 33% of cases (i.e., the ones where user-controlled data is incorporated into the query), while seamlessly allowing the safe ones to remain.

Rationale

Firstly, why use types to prevent security vulnerabilities?

Warning users in documentation is insufficient - most users either never see these warnings or ignore them. Using an existing dynamic or static analysis approach is too restrictive - these prevent natural idioms, as we saw in the Motivation section (and will discuss more extensively in the Rejected Alternatives section). The typing-based approach in this PEP strikes a user-friendly balance between strictness and flexibility.

Runtime approaches do not work because, at runtime, the query string is a plain str. While we could prevent some exploits using heuristics, such as regex-filtering for obviously malicious payloads, there will always be a way to work around them (perfectly distinguishing good and bad queries reduces to the halting problem).
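
As a rough illustration of why such heuristics fall short, the sketch below uses a hypothetical deny-list regex (the pattern and the looks_malicious helper are assumptions made for this example, not part of any real library). It catches the two payloads shown in the Motivation section but misses a trivially reworded variant.

```python
import re

# Hypothetical deny-list: matches "; DROP TABLE" and "OR 1 = 1" style payloads.
SUSPICIOUS = re.compile(r";\s*DROP\s+TABLE|\bOR\s+1\s*=\s*1\b", re.IGNORECASE)


def looks_malicious(user_input: str) -> bool:
    # Returns True only for payloads the pattern happens to anticipate.
    return SUSPICIOUS.search(user_input) is not None


caught_drop = looks_malicious("user123; DROP TABLE data;")
caught_tautology = looks_malicious("user123 OR 1 = 1")
# Same effect as "OR 1 = 1", but spelled differently, so the filter misses it.
missed_variant = looks_malicious("user123 OR 2 > 1")
```

Extending the pattern to cover this variant only invites the next rewording, which is the work-around problem described above.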

Static approaches, such as checking the AST to see if the query string is a literal string expression, cannot tell when a string is assigned to an intermediate variable or when it is transformed by a benign function. This makes them overly restrictive.
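
A minimal sketch of such an AST check, written with the standard ast module, shows the limitation. The flag_non_literal_execute function is a hypothetical checker invented for this illustration (it is not how Bandit or any specific tool is implemented): it flags any call to a method named execute whose first argument is not a string constant.

```python
from __future__ import annotations

import ast


def flag_non_literal_execute(source: str) -> list[int]:
    """Return line numbers of execute(...) calls whose first argument
    is not a string literal at the AST level."""
    flagged = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "execute"
            and node.args
        ):
            first = node.args[0]
            is_literal = isinstance(first, ast.Constant) and isinstance(first.value, str)
            if not is_literal:
                flagged.append(node.lineno)
    return flagged


# Benign code: the query is a literal, but it is stored in a variable first.
benign = (
    'query = "SELECT * FROM data WHERE user_id = ?"\n'
    "conn.execute(query, (user_id,))\n"
)
direct = 'conn.execute("SELECT * FROM data WHERE user_id = ?", (user_id,))\n'

benign_flags = flag_non_literal_execute(benign)
direct_flags = flag_non_literal_execute(direct)
```

The benign snippet is flagged on its second line simply because the literal passes through a variable, while the equivalent inline call is accepted.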

The type checker, surprisingly, does better than both because it has access to information not available in the runtime or static analysis approaches. Specifically, the type checker can tell us whether an expression has a literal string type, say Literal["foo"]. The type checker already propagates types across variable assignments or function calls.

In the current type system itself, if the SQL or shell command execution function only accepted three possible input strings, our job would be done. We would just say:

    def execute(query: Literal["foo", "bar", "baz"]) -> None: ...

But, of course, execute can accept any possible query. How do we ensure that the query does not contain an arbitrary, user-controlled string?

We want to specify that the value must be of some type Literal[<...>] where <...> is some string. This is what LiteralString represents. LiteralString is the “supertype” of all literal string types. In effect, this PEP just introduces a type in the type hierarchy between Literal["foo"] and str. Any particular literal string, such as Literal["foo"] or Literal["bar"], is compatible with LiteralString, but not the other way around. The “supertype” of LiteralString itself is str. So, LiteralString is compatible with str, but not the other way around.

Note that a Union of literal types is naturally compatible with LiteralString because each element of the Union is individually compatible with LiteralString. So, Literal["foo", "bar"] is compatible with LiteralString.

However, recall that we don’t just want to represent exact literal queries. We also want to support composition of two literal strings, such as query + " LIMIT 1". This too is possible with the above concept. If x and y are two values of type LiteralString, then x + y will also be of type compatible with LiteralString. We can reason about this by looking at specific instances such as Literal["foo"] and Literal["bar"]; the value of the added string x + y can only be "foobar", which has type Literal["foobar"] and is thus compatible with LiteralString. The same reasoning applies when x and y are unions of literal types; the result of pairwise adding any two literal types from x and y respectively is a literal type, which means that the overall result is a Union of literal types and is thus compatible with LiteralString.

In this way, we are able to leverage Python’s concept of a Literal string type to specify that our API can only accept strings that are known to be constructed from literals. More specific details follow in the remaining sections.
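
The hierarchy described in this section can be summarized in a short sketch. The function names are hypothetical, and the comments describe what a type checker implementing this PEP is expected to report; nothing is enforced when the code runs. The LiteralString import sits behind TYPE_CHECKING so that the snippet also executes on interpreters older than 3.11, on the assumption that the type checker itself targets 3.11 or later.

```python
from __future__ import annotations

from typing import TYPE_CHECKING, Literal

if TYPE_CHECKING:
    # Only the type checker needs this name (assumes it targets Python 3.11+).
    from typing import LiteralString


def takes_literal_string(s: LiteralString) -> str:
    return s


def takes_str(s: str) -> str:
    return s


foo: Literal["foo"] = "foo"
foo_or_bar: Literal["foo", "bar"] = "bar"

takes_literal_string(foo)         # Literal["foo"] is compatible with LiteralString
takes_literal_string(foo_or_bar)  # every member of the union is compatible
combined = takes_literal_string(foo + foo_or_bar)  # addition of literal strings
takes_str(combined)               # anything string-typed is compatible with str

plain: str = str(len(combined))   # str(...) yields a plain str, not a literal type
# takes_literal_string(plain)     # expected to be rejected: str is not LiteralString
```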

Specification

Runtime Behavior

We propose adding LiteralString to typing.py, with an implementation similar to typing.NoReturn.

Note that LiteralString is a special form used solely for type checking. There is no expression for which type(<expr>) will produce LiteralString at runtime. So, we do not specify in the implementation that it is a subclass of str.
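
The following sketch illustrates the static-only nature of the type. The expect_literal_string function here is a stand-in defined for this example, and the import is guarded by TYPE_CHECKING so that the snippet runs on interpreters that predate typing.LiteralString.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only the type checker needs this name (assumes it targets Python 3.11+).
    from typing import LiteralString


def expect_literal_string(s: LiteralString) -> str:
    # The annotation is not checked when the function is called.
    return s


literal = "SELECT 1"
dynamic = "SELECT " + str(1)  # a type checker sees plain str here

runtime_type_of_literal = type(literal)               # just str
accepted_at_runtime = expect_literal_string(dynamic)  # no runtime error is raised
```

A type checker implementing this PEP is expected to flag the last call, but the interpreter runs it without complaint, which is why the protection depends on a type checker actually being used.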

Valid Locations for LiteralString

LiteralString can be used where any other type can be used:

    variable_annotation: LiteralString

    def my_function(literal_string: LiteralString) -> LiteralString: ...

    class Foo:
        my_attribute: LiteralString

    type_argument: List[LiteralString]

    T = TypeVar("T", bound=LiteralString)

It cannot be nested within unions of Literal types:

    bad_union: Literal["hello", LiteralString]  # Not OK
    bad_nesting: Literal[LiteralString]  # Not OK

Type Inference

Inferring LiteralString

Any literal string type is compatible with LiteralString. For example, x: LiteralString = "foo" is valid because "foo" is inferred to be of type Literal["foo"].

As per the Rationale, we also infer LiteralString in the following cases:

  • Addition: x + y is of type LiteralString if both x and y are compatible with LiteralString.
  • Joining: sep.join(xs) is of type LiteralString if sep’s type is compatible with LiteralString and xs’s type is compatible with Iterable[LiteralString].
  • In-place addition: If s has type LiteralString and x has type compatible with LiteralString, then s += x preserves s’s type as LiteralString.
  • String formatting: An f-string has type LiteralString if and only if its constituent expressions are literal strings. s.format(...) has type LiteralString if and only if s and the arguments have types compatible with LiteralString.
  • Literal-preserving methods: In Appendix C, we have provided an exhaustive list of str methods that preserve the LiteralString type.

In all other cases, if one or more of the composed values has a non-literal type str, the composition of types will have type str. For example, if s has type str, then "hello" + s has type str. This matches the pre-existing behavior of type checkers.

LiteralString is compatible with the type str. It inherits all methods from str. So, if we have a variable s of type LiteralString, it is safe to write s.startswith("hello").

Some type checkers refine the type of a string when doing an equality check:

    def foo(s: str) -> None:
        if s == "bar":
            reveal_type(s)  # => Literal["bar"]

Such a refined type in the if-block is also compatible with LiteralString because its type is Literal["bar"].

Examples

See the examples below to help clarify the above rules:

    literal_string: LiteralString
    s: str = literal_string  # OK

    literal_string: LiteralString = s  # Error: Expected LiteralString, got str.
    literal_string: LiteralString = "hello"  # OK

Addition of literal strings:

    def expect_literal_string(s: LiteralString) -> None: ...

    expect_literal_string("foo" + "bar")  # OK
    expect_literal_string(literal_string + "bar")  # OK

    literal_string2: LiteralString
    expect_literal_string(literal_string + literal_string2)  # OK

    plain_string: str
    expect_literal_string(literal_string + plain_string)  # Not OK.

Join using literal strings:

    expect_literal_string(",".join(["foo", "bar"]))  # OK
    expect_literal_string(literal_string.join(["foo", "bar"]))  # OK
    expect_literal_string(literal_string.join([literal_string, literal_string2]))  # OK

    xs: List[LiteralString]
    expect_literal_string(literal_string.join(xs))  # OK
    expect_literal_string(plain_string.join([literal_string, literal_string2]))
    # Not OK because the separator has type 'str'.

In-place addition using literal strings:

    literal_string += "foo"  # OK
    literal_string += literal_string2  # OK
    literal_string += plain_string  # Not OK

Format strings using literal strings:

    literal_name: LiteralString
    expect_literal_string(f"hello {literal_name}")
    # OK because it is composed from literal strings.

    expect_literal_string("hello {}".format(literal_name))  # OK

    expect_literal_string(f"hello")  # OK

    username: str
    expect_literal_string(f"hello {username}")
    # NOT OK. The format-string is constructed from 'username',
    # which has type 'str'.

    expect_literal_string("hello {}".format(username))  # Not OK

Other literal types, such as literal integers, are not compatible with LiteralString:

    some_int: int
    expect_literal_string(some_int)  # Error: Expected LiteralString, got int.

    literal_one: Literal[1] = 1
    expect_literal_string(literal_one)  # Error: Expected LiteralString, got Literal[1].

We can call functions on literal strings:

    def add_limit(query: LiteralString) -> LiteralString:
        return query + " LIMIT 1"

    def my_query(query: LiteralString, user_id: str) -> None:
        sql_connection().execute(add_limit(query), (user_id,))  # OK

Conditional statements and expressions work as expected:

    def return_literal_string() -> LiteralString:
        return "foo" if condition1() else "bar"  # OK

    def return_literal_str2(literal_string: LiteralString) -> LiteralString:
        return "foo" if condition1() else literal_string  # OK

    def return_literal_str3() -> LiteralString:
        if condition1():
            result: Literal["foo"] = "foo"
        else:
            result: LiteralString = "bar"

        return result  # OK

Interaction with TypeVars and Generics

TypeVars can be bound to LiteralString:

    from typing import Literal, LiteralString, TypeVar

    TLiteral = TypeVar("TLiteral", bound=LiteralString)

    def literal_identity(s: TLiteral) -> TLiteral:
        return s

    hello: Literal["hello"] = "hello"
    y = literal_identity(hello)
    reveal_type(y)  # => Literal["hello"]

    s: LiteralString
    y2 = literal_identity(s)
    reveal_type(y2)  # => LiteralString

    s_error: str
    literal_identity(s_error)
    # Error: Expected TLiteral (bound to LiteralString), got str.

LiteralString can be used as a type argument for generic classes:

    class Container(Generic[T]):
        def __init__(self, value: T) -> None:
            self.value = value

    literal_string: LiteralString = "hello"
    x: Container[LiteralString] = Container(literal_string)  # OK

    s: str
    x_error: Container[LiteralString] = Container(s)  # Not OK

Standard containers like List work as expected:

    xs: List[LiteralString] = ["foo", "bar", "baz"]

Interactions with Overloads

Literal strings and overloads do not need to interact in a special way: the existing rules work fine. LiteralString can be used as a fallback overload where a specific Literal["foo"] type does not match:

    @overload
    def foo(x: Literal["foo"]) -> int: ...
    @overload
    def foo(x: LiteralString) -> bool: ...
    @overload
    def foo(x: str) -> str: ...

    x1: int = foo("foo")  # First overload.
    x2: bool = foo("bar")  # Second overload.
    s: str
    x3: str = foo(s)  # Third overload.

Backwards Compatibility

We propose adding typing_extensions.LiteralString for use in earlier Python versions.
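
One common way to write code that type-checks across versions is a version-conditional import, sketched below. The TYPE_CHECKING guard is an assumption of this sketch: it keeps the snippet runnable even where typing_extensions is not installed, at the cost of LiteralString being available only in annotations. The expect_literal_string function is a stand-in defined for this example.

```python
from __future__ import annotations

import sys
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Evaluated only by the type checker, never at runtime.
    if sys.version_info >= (3, 11):
        from typing import LiteralString
    else:
        from typing_extensions import LiteralString


def expect_literal_string(s: LiteralString) -> None:
    ...


x: LiteralString = "hello"
result = expect_literal_string(x)
```

Code that needs LiteralString at runtime (for example, to introspect annotations) would instead perform the same conditional import outside the TYPE_CHECKING block and depend on typing_extensions for older interpreters.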

As PEP 586 mentions, type checkers “should feel free to experiment with more sophisticated inference techniques”. So, if the type checker infers a literal string type for an unannotated variable that is initialized with a literal string, the following example should be OK:

    x = "hello"
    expect_literal_string(x)
    # OK, because x is inferred to have type 'Literal["hello"]'.

This enables precise type checking of idiomatic SQL query code without annotating the code at all (as seen in the Motivation section example).

However, like PEP 586, this PEP does not mandate the above inference strategy. In case the type checker doesn’t infer x to have type Literal["hello"], users can aid the type checker by explicitly annotating it as x: LiteralString:

    x: LiteralString = "hello"
    expect_literal_string(x)

Rejected Alternatives

Why not use tool X?

Tools to catch issues such as SQL injection seem to come in three flavors: AST based, function level analysis, and taint flow analysis.

AST-based tools: Bandit has a plugin to warn when SQL queries are not literal strings. The problem is that many perfectly safe SQL queries are dynamically built out of string literals, as shown in the Motivation section. At the AST level, the resultant SQL query is not going to appear as a string literal anymore and is thus indistinguishable from a potentially malicious string. To use these tools would require significantly restricting developers’ ability to build SQL queries. LiteralString can provide similar safety guarantees with fewer restrictions.

Semgrep and pyanalyze: Semgrep supports a more sophisticated function level analysis, including constant propagation within a function. This allows us to prevent injection attacks while permitting some forms of safe dynamic SQL queries within a function. pyanalyze has a similar extension. But neither handles function calls that construct and return safe SQL queries. For example, in the code sample below, build_insert_query is a helper function to create a query that inserts multiple values into the corresponding columns. Semgrep and pyanalyze forbid this natural usage whereas LiteralString handles it with no burden on the programmer:

    def build_insert_query(
        table: LiteralString,
        insert_columns: Iterable[LiteralString],
    ) -> LiteralString:
        sql = "INSERT INTO " + table

        column_clause = ", ".join(insert_columns)
        value_clause = ", ".join(["?"] * len(insert_columns))

        sql += f" ({column_clause}) VALUES ({value_clause})"
        return sql

    def insert_data(
        conn: Connection,
        kvs_to_insert: Dict[LiteralString, str]
    ) -> None:
        query = build_insert_query("data", kvs_to_insert.keys())
        conn.execute(query, kvs_to_insert.values())

    # Example usage
    data_to_insert = {
        "column_1": value_1,  # Note: values are not literals
        "column_2": value_2,
        "column_3": value_3,
    }
    insert_data(conn, data_to_insert)

Taint flow analysis: Tools such as Pysa or CodeQL are capable of tracking data flowing from a user controlled input into a SQL query. These tools are powerful but involve considerable overhead in setting up the tool in CI, defining “taint” sinks and sources, and teaching developers how to use them. They also usually take longer to run than a type checker (minutes instead of seconds), which means feedback is not immediate. Finally, they move the burden of preventing vulnerabilities on to library users instead of allowing the libraries themselves to specify precisely how their APIs must be called (as is possible with LiteralString).

One final reason to prefer using a new type over a dedicated tool is that type checkers are more widely used than dedicated security tooling; for example, MyPy was downloaded over 7 million times in Jan 2022 vs less than 2 million times for Bandit. Having security protections built right into type checkers will mean that more developers benefit from them.

Why not use a NewType for str?

Any API for which LiteralString would be suitable could instead be updated to accept a different type created within the Python type system, such as NewType("SafeSQL", str):

    SafeSQL = NewType("SafeSQL", str)

    def execute(self, sql: SafeSQL, parameters: Iterable[str] = ...) -> Cursor: ...

    execute(SafeSQL("SELECT * FROM data WHERE user_id = ?"), user_id)  # OK

    user_query: str
    execute(user_query)  # Error: Expected SafeSQL, got str.

Having to create a new type to call an API might give some developers pause and encourage more caution, but it doesn’t guarantee that developers won’t just turn a user controlled string into the new type, and pass it into the modified API anyway:

    query = f"SELECT * FROM data WHERE user_id = {user_id}"
    execute(SafeSQL(query))  # No error!

We are back to square one with the problem of preventing arbitrary inputs to SafeSQL. This is not a theoretical concern either. Django uses the above approach with SafeString and mark_safe. Issues such as CVE-2020-13596 show how this technique can fail.

Also note that this requires invasive changes to the source code (wrapping the query with SafeSQL) whereas LiteralString requires no such changes. Users can remain oblivious to it as long as they pass in literal strings to sensitive APIs.

Why not try to emulate Trusted Types?

Trusted Types is a W3C specification for preventing DOM-based Cross Site Scripting (XSS). XSS occurs when dangerous browser APIs accept raw user-controlled strings. The specification modifies these APIs to accept only the “Trusted Types” returned by designated sanitizing functions. These sanitizing functions must take in a potentially malicious string and validate it or render it benign somehow, for example by verifying that it is a valid URL or HTML-encoding it.

It can be tempting to assume porting the concept of Trusted Types to Python could solve the problem. The fundamental difference, however, is that the output of a Trusted Types sanitizer is usually intended to not be executable code. Thus it’s easy to HTML encode the input, strip out dangerous tags, or otherwise render it inert. With a SQL query or shell command, the end result still needs to be executable code. There is no way to write a sanitizer that can reliably figure out which parts of an input string are benign and which ones are potentially malicious.

Runtime Checkable LiteralString

The LiteralString concept could be extended beyond static type checking to be a runtime checkable property of str objects. This would provide some benefits, such as allowing frameworks to raise errors on dynamic strings. Such runtime errors would be a more robust defense mechanism than type errors, which can potentially be suppressed, ignored, or never even seen if the author does not use a type checker.

This extension to the LiteralString concept would dramatically increase the scope of the proposal by requiring changes to one of the most fundamental types in Python. While runtime taint checking on strings, similar to Perl’s taint, has been considered and attempted in the past, and others may consider it in the future, such extensions are out of scope for this PEP.

Rejected Names

We considered a variety of names for the literal string type and solicited ideas on typing-sig. Some notable alternatives were:

  • Literal[str]: This is a natural extension of the Literal["foo"] type name, but typing-sig objected that users could mistake this for the literal type of the str class.
  • LiteralStr: This is shorter than LiteralString but looks weird to the PEP authors.
  • LiteralDerivedString: This (along with MadeFromLiteralString) best captures the technical meaning of the type. It represents not just the type of literal expressions, such as "foo", but also that of expressions composed from literals, such as "foo" + "bar". However, both names seem wordy.
  • StringLiteral: Users might confuse this with the existing concept of “string literals” where the string exists as a syntactic token in the source code, whereas our concept is more general.
  • SafeString: While this comes close to our intended meaning, it may mislead users into thinking that the string has been sanitized in some way, perhaps by escaping HTML tags or shell-related special characters.
  • ConstantStr: This does not capture the idea of composing literal strings.
  • StaticStr: This suggests that the string is statically computable, i.e., computable without running the program, which is not true. The literal string may vary based on runtime flags, as seen in the Motivation examples.
  • LiteralOnly[str]: This has the advantage of being extensible to other literal types, such as bytes or int. However, we did not find the extensibility worth the loss of readability.

Overall, there was no clear winner on typing-sig over a long period, so we decided to tip the scales in favor of LiteralString.

LiteralBytes

We could generalize literal byte types, such as Literal[b"foo"], to LiteralBytes. However, literal byte types are used much less frequently than literal string types and we did not find much user demand for LiteralBytes, so we decided not to include it in this PEP. Others may, however, consider it in future PEPs.

Reference Implementation

This is implemented in Pyre v0.9.8 and is actively being used.

The implementation simply extends the type checker with LiteralString as a supertype of literal string types.

To support composition via addition, join, etc., it was sufficient to overload the stubs for str in Pyre’s copy of typeshed.

Appendix A: Other Uses

To simplify the discussion and require minimal security knowledge, we focused on SQL injections throughout the PEP. LiteralString, however, can also be used to prevent many other kinds of injection vulnerabilities.

Command Injection

APIs such as subprocess.run accept a string which can be run as a shell command:

    subprocess.run(f"echo 'Hello {name}'", shell=True)

If user-controlled data is included in the command string, the code is vulnerable to “command injection”; i.e., an attacker can run malicious commands. For example, a value of ' && rm -rf / # would result in the following destructive command being run:

    echo 'Hello ' && rm -rf / #'

This vulnerability could be prevented by updating run to only accept LiteralString when used in shell=True mode. Here is one simplified stub:

    def run(command: LiteralString, *args: str, shell: bool = ...): ...
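
The same command-data separation can be sketched without executing anything. The build_shell_command helper below is hypothetical (it is not part of subprocess): the command itself must be a literal, while each dynamic argument is passed through shlex.quote. shlex.split is then used only to show how a POSIX shell would tokenize the result.

```python
from __future__ import annotations

import shlex
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only the type checker needs this name (assumes it targets Python 3.11+).
    from typing import LiteralString


def build_shell_command(command: LiteralString, *args: str) -> str:
    """Hypothetical helper: a literal command followed by shell-quoted data."""
    return " ".join([command, *(shlex.quote(arg) for arg in args)])


name = "' && rm -rf / #"  # the malicious value from the example above
command_line = build_shell_command("echo", "Hello " + name)

# Tokenize the way a POSIX shell would, without running the command.
tokens = shlex.split(command_line)
```

With the data quoted separately, the payload stays inside a single argument to echo instead of becoming a second command.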

Cross Site Scripting (XSS)

Most popular Python web frameworks, such as Django, use a templating engine to produce HTML from user data. These templating languages auto-escape user data before inserting it into the HTML template and thus prevent cross site scripting (XSS) vulnerabilities.

But a common way to bypass auto-escaping and render HTML as-is is to use functions like mark_safe in Django or do_mark_safe in Jinja2, which cause XSS vulnerabilities:

    dangerous_string = django.utils.safestring.mark_safe(f"<script>{user_input}</script>")
    return(dangerous_string)

This vulnerability could be prevented by updating mark_safe to only accept LiteralString:

    def mark_safe(s: LiteralString) -> str: ...
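
A framework-independent sketch of the same idea follows, using only the standard html module. The render_fragment function is a hypothetical helper written for this illustration and is not a Django or Jinja2 API: the markup must be a literal, and user data is escaped before being substituted into it.

```python
from __future__ import annotations

import html
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only the type checker needs this name (assumes it targets Python 3.11+).
    from typing import LiteralString


def render_fragment(template: LiteralString, user_input: str) -> str:
    # The markup is trusted because it is a literal; the data is escaped.
    return template.format(html.escape(user_input))


rendered = render_fragment("<p>{}</p>", "<script>alert(1)</script>")
```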

Server Side Template Injection (SSTI)

Templating frameworks, such as Jinja, allow Python expressions which will be evaluated and substituted into the rendered result:

    template_str = "There are {{ len(values) }} values: {{ values }}"
    template = jinja2.Template(template_str)
    template.render(values=[1, 2])
    # Result: "There are 2 values: [1, 2]"

If an attacker controls all or part of the template string, they can insert expressions which execute arbitrary code and compromise the application:

    malicious_str = "{{''.__class__.__base__.__subclasses__()[408]('rm - rf /',shell=True)}}"
    template = jinja2.Template(malicious_str)
    template.render()
    # Result: The shell command 'rm - rf /' is run

Template injection exploits like this could be prevented by updating the Template API to only accept LiteralString:

    class Template:
        def __init__(self, source: LiteralString): ...

Logging Format String Injection

Logging frameworks often allow their input strings to contain formatting directives. At its worst, allowing users to control the logged string has led to CVE-2021-44228 (colloquially known as log4shell), which has been described as the “most critical vulnerability of the last decade”. While no Python frameworks are currently known to be vulnerable to a similar attack, the built-in logging framework does provide formatting options which are vulnerable to Denial of Service attacks from externally controlled logging strings. The following example illustrates a simple denial of service scenario:

    external_string = "%(foo)999999999s"
    ...
    # Tries to add > 1GB of whitespace to the logged string:
    logger.info(f'Received: {external_string}', some_dict)

This kind of attack could be prevented by requiring that the format string passed to the logger be a LiteralString and that all externally controlled data be passed separately as arguments (as proposed in Issue 46200):

    def info(msg: LiteralString, *args: object) -> None: ...
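
The safe calling convention already works with the standard logging module today, as the sketch below shows. The logger name and the in-memory stream are assumptions made so that the example is self-contained; the format string is a literal and the external data is passed as a separate argument.

```python
import io
import logging

# Capture log output in memory so the result can be inspected.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(message)s"))

logger = logging.getLogger("pep675_logging_demo")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(handler)

external_string = "%(foo)999999999s"

# Literal format string; the external data is only ever an argument.
logger.info("Received: %s", external_string)

logged = stream.getvalue()
```

Because the directives inside external_string are never interpreted as part of the format string, the message is logged verbatim rather than being expanded.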

Appendix B: Limitations

There are a number of ways LiteralString could still fail to prevent users from passing strings built from non-literal data to an API:

1. If the developer does not use a type checker or does not add type annotations, then violations will go uncaught.

2. cast(LiteralString, non_literal_string) could be used to lie to the type checker and allow a dynamic string value to masquerade as a LiteralString. The same goes for a variable that has type Any.

3. Comments such as # type: ignore could be used to ignore warnings about non-literal strings.

4. Trivial functions could be constructed to convert a str to a LiteralString:

    def make_literal(s: str) -> LiteralString:
        letters: Dict[str, LiteralString] = {
            "A": "A",
            "B": "B",
            ...
        }
        output: List[LiteralString] = [letters[c] for c in s]
        return "".join(output)

We could mitigate the above using linting, code review, etc., but ultimately a clever, malicious developer attempting to circumvent the protections offered by LiteralString will always succeed. The important thing to remember is that LiteralString is not intended to protect against malicious developers; it is meant to protect against benign developers accidentally using sensitive APIs in a dangerous way (without getting in their way otherwise).

Without LiteralString, the best enforcement tool API authors have is documentation, which is easily ignored and often not seen. With LiteralString, API misuse requires conscious thought and artifacts in the code that reviewers and future developers can notice.

Appendix C: str methods that preserve LiteralString

The str class has several methods that would benefit from LiteralString. For example, users might expect "hello".capitalize() to have the type LiteralString similar to the other examples we have seen in the Inferring LiteralString section. Inferring the type LiteralString is correct because the string is not an arbitrary user-supplied string - we know that it has the type Literal["Hello"], which is compatible with LiteralString. In other words, the capitalize method preserves the LiteralString type. There are several other str methods that preserve LiteralString.

We propose updating the stub for str in typeshed so that the methods are overloaded with the LiteralString-preserving versions. This means type checkers do not have to hardcode LiteralString behavior for each method. It also lets us easily support new methods in the future by updating the typeshed stub.

For example, to preserve literal types for the capitalize method, we would change the stub as below:

    # before
    def capitalize(self) -> str: ...

    # after
    @overload
    def capitalize(self: LiteralString) -> LiteralString: ...
    @overload
    def capitalize(self) -> str: ...

The downside of changing the str stub is that the stub becomes more complicated and can make error messages harder to understand. Type checkers may need to special-case str to make error messages understandable for users.
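
The same overload pattern can be applied to user-defined helpers that should preserve literal-ness. The quote_identifier function below is a hypothetical example, not part of typeshed: when called with a LiteralString it is typed as returning a LiteralString, and otherwise it falls back to str.

```python
from __future__ import annotations

from typing import TYPE_CHECKING, overload

if TYPE_CHECKING:
    # Only the type checker needs this name (assumes it targets Python 3.11+).
    from typing import LiteralString


@overload
def quote_identifier(name: LiteralString) -> LiteralString: ...
@overload
def quote_identifier(name: str) -> str: ...
def quote_identifier(name: str) -> str:
    # Single runtime implementation shared by both overloads.
    return '"' + name + '"'


quoted = quote_identifier("data")  # expected to be inferred as LiteralString
```

The order of the overloads matters: the LiteralString overload comes first so that it is preferred whenever the argument is known to be made of literals.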

Below is an exhaustive list of str methods which, when called with arguments of type LiteralString, must be treated as returning a LiteralString. If this PEP is accepted, we will update these method signatures in typeshed:

    @overload
    def capitalize(self: LiteralString) -> LiteralString: ...
    @overload
    def capitalize(self) -> str: ...

    @overload
    def casefold(self: LiteralString) -> LiteralString: ...
    @overload
    def casefold(self) -> str: ...

    @overload
    def center(self: LiteralString, __width: SupportsIndex, __fillchar: LiteralString = ...) -> LiteralString: ...
    @overload
    def center(self, __width: SupportsIndex, __fillchar: str = ...) -> str: ...

    if sys.version_info >= (3, 8):
        @overload
        def expandtabs(self: LiteralString, tabsize: SupportsIndex = ...) -> LiteralString: ...
        @overload
        def expandtabs(self, tabsize: SupportsIndex = ...) -> str: ...

    else:
        @overload
        def expandtabs(self: LiteralString, tabsize: int = ...) -> LiteralString: ...
        @overload
        def expandtabs(self, tabsize: int = ...) -> str: ...

    @overload
    def format(self: LiteralString, *args: LiteralString, **kwargs: LiteralString) -> LiteralString: ...
    @overload
    def format(self, *args: str, **kwargs: str) -> str: ...

    @overload
    def join(self: LiteralString, __iterable: Iterable[LiteralString]) -> LiteralString: ...
    @overload
    def join(self, __iterable: Iterable[str]) -> str: ...

    @overload
    def ljust(self: LiteralString, __width: SupportsIndex, __fillchar: LiteralString = ...) -> LiteralString: ...
    @overload
    def ljust(self, __width: SupportsIndex, __fillchar: str = ...) -> str: ...

    @overload
    def lower(self: LiteralString) -> LiteralString: ...
    @overload
    def lower(self) -> str: ...

    @overload
    def lstrip(self: LiteralString, __chars: LiteralString | None = ...) -> LiteralString: ...
    @overload
    def lstrip(self, __chars: str | None = ...) -> str: ...

    @overload
    def partition(self: LiteralString, __sep: LiteralString) -> tuple[LiteralString, LiteralString, LiteralString]: ...
    @overload
    def partition(self, __sep: str) -> tuple[str, str, str]: ...

    @overload
    def replace(self: LiteralString, __old: LiteralString, __new: LiteralString, __count: SupportsIndex = ...) -> LiteralString: ...
    @overload
    def replace(self, __old: str, __new: str, __count: SupportsIndex = ...) -> str: ...

    if sys.version_info >= (3, 9):
        @overload
        def removeprefix(self: LiteralString, __prefix: LiteralString) -> LiteralString: ...
        @overload
        def removeprefix(self, __prefix: str) -> str: ...

        @overload
        def removesuffix(self: LiteralString, __suffix: LiteralString) -> LiteralString: ...
        @overload
        def removesuffix(self, __suffix: str) -> str: ...

    @overload
    def rjust(self: LiteralString, __width: SupportsIndex, __fillchar: LiteralString = ...) -> LiteralString: ...
    @overload
    def rjust(self, __width: SupportsIndex, __fillchar: str = ...) -> str: ...

    @overload
    def rpartition(self: LiteralString, __sep: LiteralString) -> tuple[LiteralString, LiteralString, LiteralString]: ...
    @overload
    def rpartition(self, __sep: str) -> tuple[str, str, str]: ...

    @overload
    def rsplit(self: LiteralString, sep: LiteralString | None = ..., maxsplit: SupportsIndex = ...) -> list[LiteralString]: ...
    @overload
    def rsplit(self, sep: str | None = ..., maxsplit: SupportsIndex = ...) -> list[str]: ...

    @overload
    def rstrip(self: LiteralString, __chars: LiteralString | None = ...) -> LiteralString: ...
    @overload
    def rstrip(self, __chars: str | None = ...) -> str: ...

    @overload
    def split(self: LiteralString, sep: LiteralString | None = ..., maxsplit: SupportsIndex = ...) -> list[LiteralString]: ...
    @overload
    def split(self, sep: str | None = ..., maxsplit: SupportsIndex = ...) -> list[str]: ...

    @overload
    def splitlines(self: LiteralString, keepends: bool = ...) -> list[LiteralString]: ...
    @overload
    def splitlines(self, keepends: bool = ...) -> list[str]: ...

    @overload
    def strip(self: LiteralString, __chars: LiteralString | None = ...) -> LiteralString: ...
    @overload
    def strip(self, __chars: str | None = ...) -> str: ...

    @overload
    def swapcase(self: LiteralString) -> LiteralString: ...
    @overload
    def swapcase(self) -> str: ...

    @overload
    def title(self: LiteralString) -> LiteralString: ...
    @overload
    def title(self) -> str: ...

    @overload
    def upper(self: LiteralString) -> LiteralString: ...
    @overload
    def upper(self) -> str: ...

    @overload
    def zfill(self: LiteralString, __width: SupportsIndex) -> LiteralString: ...
    @overload
    def zfill(self, __width: SupportsIndex) -> str: ...

    @overload
    def __add__(self: LiteralString, __s: LiteralString) -> LiteralString: ...
    @overload
    def __add__(self, __s: str) -> str: ...

    @overload
    def __iter__(self: LiteralString) -> Iterator[str]: ...
    @overload
    def __iter__(self) -> Iterator[str]: ...

    @overload
    def __mod__(self: LiteralString, __x: Union[LiteralString, Tuple[LiteralString, ...]]) -> str: ...
    @over
loaddef__mod__(self,__x:Union[str,Tuple[str,...]])->str:...@overloaddef__mul__(self:LiteralString,__n:SupportsIndex)->LiteralString:...@overloaddef__mul__(self,__n:SupportsIndex)->str:...@overloaddef__repr__(self:LiteralString)->LiteralString:...@overloaddef__repr__(self)->str:...@overloaddef__rmul__(self:LiteralString,n:SupportsIndex)->LiteralString:...@overloaddef__rmul__(self,n:SupportsIndex)->str:...@overloaddef__str__(self:LiteralString)->LiteralString:...@overloaddef__str__(self)->str:...
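
As a rough illustration of how a type checker might apply these overloads, the sketch below chains a few literal-preserving operations. The names `run_query`, `build_query`, and `unsafe_query` are hypothetical, and the `try`/`except` import fallback is an assumption that lets the snippet run on interpreters older than 3.11. The distinction is enforced statically; at runtime every value is an ordinary `str`.

```python
try:
    from typing import LiteralString  # available from Python 3.11
except ImportError:
    # Assumption: fall back to str so the sketch still runs on older versions.
    LiteralString = str  # type: ignore[misc, assignment]


def run_query(sql: LiteralString) -> str:
    """Hypothetical sink that should only receive literal-derived strings."""
    return sql


def build_query(table: LiteralString, *columns: LiteralString) -> LiteralString:
    # join, format and + each select the LiteralString overload here,
    # because the receiver and every argument are LiteralString.
    column_list = ", ".join(columns)
    return "SELECT {} FROM ".format(column_list) + table


def unsafe_query(user_input: str) -> str:
    # Here __add__ falls back to the plain-str overload, so the result is
    # only `str`; passing it to run_query is expected to be a type error.
    return "SELECT * FROM users WHERE id = " + user_input


query = build_query("users", "id", "name")
run_query(query)  # accepted by the checker: built only from literals
```

Only the second overload of each pair matches once any operand is a plain `str`, which is what demotes the result of `unsafe_query` to `str`.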

Appendix D: Guidelines for using LiteralString in Stubs

Libraries that do not contain type annotations within their source may specify type stubs in Typeshed. Libraries written in other languages, such as those for machine learning, may also provide Python type stubs. This means the type checker cannot verify that the type annotations match the source code and must trust the type stub. Thus, authors of type stubs need to be careful when using LiteralString, since a function may falsely appear to be safe when it is not.

We recommend the following guidelines for using LiteralString in stubs:

  • If the stub is for a pure function, we recommend using LiteralString in the return type of the function or of its overloads only if all the corresponding parameters have literal types (i.e., LiteralString or Literal["a", "b"]).
        # OK
        @overload
        def my_transform(x: LiteralString, y: Literal["a", "b"]) -> LiteralString: ...
        @overload
        def my_transform(x: str, y: str) -> str: ...

        # Not OK
        @overload
        def my_transform(x: LiteralString, y: str) -> LiteralString: ...
        @overload
        def my_transform(x: str, y: str) -> str: ...
  • If the stub is for a staticmethod, we recommend the same guideline as above.
  • If the stub is for any other kind of method, we recommend against using LiteralString in the return type of the method or any of its overloads. This is because, even if all the explicit parameters have type LiteralString, the object itself may be created using user data and thus the return type may be user-controlled.
  • If the stub is for a class attribute or global variable, we also recommend against using LiteralString because the untyped code may write arbitrary values to the attribute.
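
To illustrate why the guideline for methods is cautious, consider a hypothetical untyped class `Greeter` (not taken from any real library) whose state comes from arbitrary input. If a stub declared `greet` as returning `LiteralString`, the type checker would have to trust it, even though the result embeds user-controlled data:

```python
class Greeter:
    """Hypothetical untyped library class; its state may come from user input."""

    def __init__(self, name):
        self.name = name  # arbitrary, possibly user-controlled

    def greet(self, greeting):
        # Even when `greeting` is a literal, the result embeds self.name.
        return greeting + ", " + self.name


# A stub such as
#     def greet(self, greeting: LiteralString) -> LiteralString: ...
# would be misleading: the explicit argument is a literal, the receiver is not.
user_supplied = "'; DROP TABLE users; --"  # stands in for user-controlled input
result = Greeter(user_supplied).greet("Hello")
```

Here `result` contains the injected text although the only explicit argument was the literal `"Hello"`, so a safer stub would return `str`.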

However, we leave the final call to the library author. They may use LiteralString if they feel confident that the string returned by the method or function or the string stored in the attribute is guaranteed to have a literal type - i.e., the string is created by applying only literal-preserving str operations to a string literal.

Note that these guidelines do not apply to inline type annotations since the type checker can verify that, say, a method returning LiteralString does in fact return an expression of that type.
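
A minimal sketch of that difference, using hypothetical helpers `table_name` and `bad_table_name`; the import fallback is an assumption for interpreters older than 3.11. Because the annotations are inline, a type checker can inspect each function body instead of trusting a stub:

```python
try:
    from typing import LiteralString  # available from Python 3.11
except ImportError:
    # Assumption: fall back to str so the sketch still runs on older versions.
    LiteralString = str  # type: ignore[misc, assignment]


def table_name(archived: bool) -> LiteralString:
    # Both branches are string literals, so the declared return type
    # can be verified from the body.
    return "users_archive" if archived else "users"


def bad_table_name(suffix: str) -> LiteralString:
    # A checker is expected to report an error here: `suffix` is a plain
    # str, so the concatenation is str, not LiteralString.
    return "users_" + suffix
```

The second function still runs, since nothing is enforced at runtime, but the mismatch is visible to the checker rather than hidden behind a stub.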

Resources

Literal String Types in Scala

Scala uses Singleton as the supertype for singleton types, which includes literal string types, such as "foo". Singleton is Scala’s generalized analogue of this PEP’s LiteralString.

Tamer Abdulradi showed how Scala’s literal string types can be used for “Preventing SQL injection at compile time”, Scala Days talk Literal types: What are they good for? (slides 52 to 68).

Thanks

Thanks to the following people for their feedback on the PEP:

Edward Qiu, Jia Chen, Shannon Zhu, Gregory P. Smith, Никита Соболев, CAM Gerlach, Arie Bovenberg, David Foster, and Shengye Wan

Copyright

This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.


Source: https://github.com/python/peps/blob/main/peps/pep-0675.rst

Last modified: 2024-06-11 22:12:09 GMT

