Commit 0e4cd89

hartwork and picnixz authored

[3.12] gh-90949: add Expat API to prevent XML deadly allocations (CVE-2025-59375) (GH-139234) (#139527)

* [3.12] gh-90949: add Expat API to prevent XML deadly allocations (CVE-2025-59375) (GH-139234)

Expose the XML Expat 2.7.2 mitigation APIs to disallow use of disproportional amounts of dynamic memory from within an Expat parser (see CVE-2025-59375 for instance).

The exposed APIs are available on Expat parsers, that is, parsers created by `xml.parsers.expat.ParserCreate()`, as:

- `parser.SetAllocTrackerActivationThreshold(threshold)`, and
- `parser.SetAllocTrackerMaximumAmplification(max_factor)`.

(cherry picked from commit f04bea4)
(cherry picked from commit 68a1778)

Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
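As a rough illustration of how calling code might opt into the two setters named in the commit message, here is a minimal sketch. The helper name `make_hardened_parser` and the chosen limits (8 MiB, factor 50.0) are hypothetical and not part of the commit. The `getattr` guard is a defensive assumption so the sketch also runs on interpreters that do not provide the methods.

```python
from xml.parsers import expat


def make_hardened_parser(threshold_bytes=8 * 1024 * 1024, max_factor=50.0):
    """Create a root parser and tighten its allocation limits if possible.

    Hypothetical helper: the name and default values are illustrative only.
    """
    parser = expat.ParserCreate()
    # The setters may be missing on older interpreters, so look them up
    # defensively instead of assuming they exist.
    set_threshold = getattr(parser, "SetAllocTrackerActivationThreshold", None)
    set_factor = getattr(parser, "SetAllocTrackerMaximumAmplification", None)
    if set_threshold is not None and set_factor is not None:
        set_threshold(threshold_bytes)  # integer number of bytes
        set_factor(max_factor)          # float, must be >= 1.0
    return parser


parser = make_hardened_parser()
parser.Parse(b"<doc>hello</doc>", True)
```

The limits are set before any data is parsed, on the root parser itself, since the documentation in this commit states that calling either setter on a non-root parser raises `ExpatError`.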

1 parent d849cf5, commit 0e4cd89

File tree

7 files changed, +585 -29 lines changed

Doc/library
- pyexpat.rst
Include
- pyexpat.h
Lib/test
- test_pyexpat.py
Misc/NEWS.d/next/Library
- 2025-09-22-14-40-11.gh-issue-90949.UM35nb.rst
Modules
- clinic
  - pyexpat.c.h
- expat
  - pyexpatns.h
- pyexpat.c

`‎Doc/library/pyexpat.rst‎`

Lines changed: 57 additions & 0 deletions

Original file line number	Diff line number	Diff line change
@@ -73,6 +73,13 @@ The :mod:`xml.parsers.expat` module contains two functions:
`73`	`73`	`   encoding [1]_ is given it will override the implicit or explicit encoding of the`
`74`	`74`	`   document.`
`75`	`75`
	`76`	`+   .. _xmlparser-non-root:`
	`77`	`+`
	`78`	+   Parsers created through :func:`!ParserCreate` are called "root" parsers,
	`79`	`+   in the sense that they do not have any parent parser attached. Non-root`
	`80`	+   parsers are created by :meth:`parser.ExternalEntityParserCreate
	`81`	+   <xmlparser.ExternalEntityParserCreate>`.
	`82`	`+`
`76`	`83`	`   Expat can optionally do XML namespace processing for you, enabled by providing a`
`77`	`84`	`   value for *namespace_separator*. The value must be a one-character string; a`
`78`	`85`	   :exc:`ValueError` will be raised if the string has an illegal length (``None``
`@@ -232,6 +239,55 @@ XMLParser Objects`
`232`	`239`	`   .. versionadded:: 3.12.3`
`233`	`240`
`234`	`241`
	`242`	+:class:`!xmlparser` objects have the following methods to mitigate some
	`243`	`+common XML vulnerabilities.`
	`244`	`+`
	`245`	`+.. method:: xmlparser.SetAllocTrackerActivationThreshold(threshold, /)`
	`246`	`+`
	`247`	`+   Sets the number of allocated bytes of dynamic memory needed to activate`
	`248`	`+   protection against disproportionate use of RAM.`
	`249`	`+`
	`250`	`+   By default, parser objects have an allocation activation threshold of 64 MiB,`
	`251`	`+   or equivalently 67,108,864 bytes.`
	`252`	`+`
	`253`	+   An :exc:`ExpatError` is raised if this method is called on a
	`254`	`+   |xml-non-root-parser| parser.`
	`255`	+   The corresponding :attr:`~ExpatError.lineno` and :attr:`~ExpatError.offset`
	`256`	`+   should not be used as they may have no special meaning.`
	`257`	`+`
	`258`	`+   .. versionadded:: next`
	`259`	`+`
	`260`	`+.. method:: xmlparser.SetAllocTrackerMaximumAmplification(max_factor, /)`
	`261`	`+`
	`262`	`+   Sets the maximum amplification factor between direct input and bytes`
	`263`	`+   of dynamic memory allocated.`
	`264`	`+`
	`265`	+   The amplification factor is calculated as ``allocated / direct``
	`266`	+   while parsing, where ``direct`` is the number of bytes read from
	`267`	+   the primary document in parsing and ``allocated`` is the number
	`268`	`+   of bytes of dynamic memory allocated in the parser hierarchy.`
	`269`	`+`
	`270`	+   The *max_factor* value must be a non-NaN :class:`float` value greater than
	`271`	`+   or equal to 1.0. Amplification factors greater than 100.0 can be observed`
	`272`	`+   near the start of parsing even with benign files in practice. In particular,`
	`273`	`+   the activation threshold should be carefully chosen to avoid false positives.`
	`274`	`+`
	`275`	`+   By default, parser objects have a maximum amplification factor of 100.0.`
	`276`	`+`
	`277`	+   An :exc:`ExpatError` is raised if this method is called on a
	`278`	`+   |xml-non-root-parser| parser or if *max_factor* is outside the valid range.`
	`279`	+   The corresponding :attr:`~ExpatError.lineno` and :attr:`~ExpatError.offset`
	`280`	`+   should not be used as they may have no special meaning.`
	`281`	`+`
	`282`	`+   .. note::`
	`283`	`+`
	`284`	`+      The maximum amplification factor is only considered if the threshold`
	`285`	+      that can be adjusted by :meth:`.SetAllocTrackerActivationThreshold`
	`286`	`+      is exceeded.`
	`287`	`+`
	`288`	`+   .. versionadded:: next`
	`289`	`+`
	`290`	`+`
`235`	`291`	:class:`xmlparser` objects have the following attributes:
`236`	`292`
`237`	`293`
@@ -948,3 +1004,4 @@ The ``errors`` module has the following attributes:
`948`	`1004`	`not. See https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-EncodingDecl`
`949`	`1005`	`and https://www.iana.org/assignments/character-sets/character-sets.xhtml.`
`950`	`1006`
	`1007`	+.. |xml-non-root-parser| replace:: :ref:`non-root <xmlparser-non-root>`
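The interplay between the activation threshold and the amplification factor described in the documentation above can be modelled with a few lines of arithmetic. This is only a simplified sketch of the documented rule, not Expat's actual accounting: the function names are hypothetical, and the exact comparison operators (strict versus non-strict) are assumptions.

```python
MIB = 1024 * 1024
DEFAULT_ACTIVATION_THRESHOLD = 64 * MIB  # 67,108,864 bytes, per the docs above
DEFAULT_MAX_FACTOR = 100.0               # default factor, per the docs above


def amplification_factor(allocated, direct):
    """Return allocated / direct, as defined in the documentation."""
    if direct <= 0:
        # Assumption: treat "no input read yet" as unbounded amplification.
        return float("inf") if allocated > 0 else 0.0
    return allocated / direct


def protection_triggers(allocated, direct, *,
                        threshold=DEFAULT_ACTIVATION_THRESHOLD,
                        max_factor=DEFAULT_MAX_FACTOR):
    """Simplified model: the factor only matters once the threshold is exceeded."""
    if allocated <= threshold:  # assumption: strict "exceeded" semantics
        return False
    return amplification_factor(allocated, direct) > max_factor


# 10 MiB allocated from 1 KiB of input: huge factor, but below the threshold.
print(protection_triggers(10 * MIB, 1024))
# 128 MiB allocated from 1 MiB of input: factor 128 with the threshold exceeded.
print(protection_triggers(128 * MIB, 1 * MIB))
```

This also shows why the note about false positives matters: with a very low threshold, the large factors that benign files produce near the start of parsing would already count against the limit.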

`‎Include/pyexpat.h‎`

Lines changed: 5 additions & 0 deletions

Original file line number	Diff line number	Diff line change
`@@ -52,6 +52,11 @@ struct PyExpat_CAPI`
`52`	`52`	`    int (*SetHashSalt)(XML_Parser parser, unsigned long hash_salt);`
`53`	`53`	`    /* might be NULL for expat < 2.6.0 */`
`54`	`54`	`    XML_Bool (*SetReparseDeferralEnabled)(XML_Parser parser, XML_Bool enabled);`
	`55`	`+    /* might be NULL for expat < 2.7.2 */`
	`56`	`+    XML_Bool (*SetAllocTrackerActivationThreshold)(`
	`57`	`+        XML_Parser parser, unsigned long long activationThresholdBytes);`
	`58`	`+    XML_Bool (*SetAllocTrackerMaximumAmplification)(`
	`59`	`+        XML_Parser parser, float maxAmplificationFactor);`
`55`	`60`	`    /* always add new stuff to the end! */`
`56`	`61`	`};`
`57`	`62`

`‎Lib/test/test_pyexpat.py‎`

Lines changed: 199 additions & 1 deletion

Original file line number	Diff line number	Diff line change
`@@ -1,15 +1,19 @@`
`1`	`1`	`# XXX TypeErrors on calling handlers, or on bad return values from a`
`2`	`2`	`# handler, are obscure and unhelpful.`
`3`	`3`
	`4`	`+import abc`
	`5`	`+import functools`
`4`	`6`	`import os`
`5`	`7`	`import platform`
	`8`	`+import re`
`6`	`9`	`import sys`
`7`	`10`	`import sysconfig`
	`11`	`+import textwrap`
`8`	`12`	`import unittest`
`9`	`13`	`import traceback`
`10`	`14`	`from io import BytesIO`
`11`	`15`	`from test import support`
`12`		`-from test.support import os_helper`
	`16`	`+from test.support import import_helper, os_helper`
`13`	`17`
`14`	`18`	`from xml.parsers import expat`
`15`	`19`	`from xml.parsers.expat import errors`
`@@ -848,5 +852,199 @@ def start_element(name, _):`
`848`	`852`	`self.assertEqual(started, ['doc'])`
`849`	`853`
`850`	`854`
	`855`	`+class AttackProtectionTestBase(abc.ABC):`
	`856`	`+    """`
	`857`	`+    Base class for testing protections against XML payloads with`
	`858`	`+    disproportionate amplification.`
	`859`	`+`
	`860`	`+    The protections being tested should detect and prevent attacks`
	`861`	`+    that leverage disproportionate amplification from small inputs.`
	`862`	`+    """`
	`863`	`+`
	`864`	`+    @staticmethod`
	`865`	`+    def exponential_expansion_payload(*, nrows, ncols, text='.'):`
	`866`	`+        """Create a billion laughs attack payload.`
	`867`	`+`
	`868`	`+        Be careful: the number of total items is pow(n, k), thereby`
	`869`	`+        requiring at least pow(ncols, nrows) * sizeof(text) memory!`
	`870`	`+        """`
	`871`	`+        template = textwrap.dedent(f"""\`
	`872`	`+            <?xml version="1.0"?>`
	`873`	`+            <!DOCTYPE doc [`
	`874`	`+                <!ENTITY row0 "{text}">`
	`875`	`+                <!ELEMENT doc (#PCDATA)>`
	`876`	`+            {{body}}`
	`877`	`+            ]>`
	`878`	`+            <doc>&row{nrows};</doc>`
	`879`	`+        """).rstrip()`
	`880`	`+`
	`881`	`+        body = '\n'.join(`
	`882`	`+            f'<!ENTITY row{i + 1} "{f"&row{i};" * ncols}">'`
	`883`	`+            for i in range(nrows)`
	`884`	`+        )`
	`885`	`+        body = textwrap.indent(body, ' ' * 4)`
	`886`	`+        return template.format(body=body)`
	`887`	`+`
	`888`	`+    def test_payload_generation(self):`
	`889`	`+        # self-test for exponential_expansion_payload()`
	`890`	`+        payload = self.exponential_expansion_payload(nrows=2, ncols=3)`
	`891`	`+        self.assertEqual(payload, textwrap.dedent("""\`
	`892`	`+            <?xml version="1.0"?>`
	`893`	`+            <!DOCTYPE doc [`
	`894`	`+                <!ENTITY row0 ".">`
	`895`	`+                <!ELEMENT doc (#PCDATA)>`
	`896`	`+                <!ENTITY row1 "&row0;&row0;&row0;">`
	`897`	`+                <!ENTITY row2 "&row1;&row1;&row1;">`
	`898`	`+            ]>`
	`899`	`+            <doc>&row2;</doc>`
	`900`	`+        """).rstrip())`
	`901`	`+`
	`902`	`+    def assert_root_parser_failure(self, func, /, *args, **kwargs):`
	`903`	`+        """Check that func(*args, **kwargs) is invalid for a sub-parser."""`
	`904`	`+        msg = "parser must be a root parser"`
	`905`	`+        self.assertRaisesRegex(expat.ExpatError, msg, func, *args, **kwargs)`
	`906`	`+`
	`907`	`+    @abc.abstractmethod`
	`908`	`+    def assert_rejected(self, func, /, *args, **kwargs):`
	`909`	`+        """Assert that func(*args, **kwargs) triggers the attack protection.`
	`910`	`+`
	`911`	`+        Note: this method must ensure that the attack protection being tested`
	`912`	`+        is the one that is actually triggered at runtime, e.g., by matching`
	`913`	`+        the exact error message.`
	`914`	`+        """`
	`915`	`+`
	`916`	`+    @abc.abstractmethod`
	`917`	`+    def set_activation_threshold(self, parser, threshold):`
	`918`	`+        """Set the activation threshold for the tested protection."""`
	`919`	`+`
	`920`	`+    @abc.abstractmethod`
	`921`	`+    def set_maximum_amplification(self, parser, max_factor):`
	`922`	`+        """Set the maximum amplification factor for the tested protection."""`
	`923`	`+`
	`924`	`+    @abc.abstractmethod`
	`925`	`+    def test_set_activation_threshold__threshold_reached(self):`
	`926`	`+        """Test when the activation threshold is exceeded."""`
	`927`	`+`
	`928`	`+    @abc.abstractmethod`
	`929`	`+    def test_set_activation_threshold__threshold_not_reached(self):`
	`930`	`+        """Test when the activation threshold is not exceeded."""`
	`931`	`+`
	`932`	`+    def test_set_activation_threshold__invalid_threshold_type(self):`
	`933`	`+        parser = expat.ParserCreate()`
	`934`	`+        setter = functools.partial(self.set_activation_threshold, parser)`
	`935`	`+`
	`936`	`+        self.assertRaises(TypeError, setter, 1.0)`
	`937`	`+        self.assertRaises(TypeError, setter, -1.5)`
	`938`	`+        self.assertRaises(ValueError, setter, -5)`
	`939`	`+`
	`940`	`+    def test_set_activation_threshold__invalid_threshold_range(self):`
	`941`	`+        _testcapi = import_helper.import_module("_testcapi")`
	`942`	`+        parser = expat.ParserCreate()`
	`943`	`+        setter = functools.partial(self.set_activation_threshold, parser)`
	`944`	`+`
	`945`	`+        self.assertRaises(OverflowError, setter, _testcapi.ULLONG_MAX + 1)`
	`946`	`+`
	`947`	`+    def test_set_activation_threshold__fail_for_subparser(self):`
	`948`	`+        parser = expat.ParserCreate()`
	`949`	`+        subparser = parser.ExternalEntityParserCreate(None)`
	`950`	`+        setter = functools.partial(self.set_activation_threshold, subparser)`
	`951`	`+        self.assert_root_parser_failure(setter, 12345)`
	`952`	`+`
	`953`	`+    @abc.abstractmethod`
	`954`	`+    def test_set_maximum_amplification__amplification_exceeded(self):`
	`955`	`+        """Test when the amplification factor is exceeded."""`
	`956`	`+`
	`957`	`+    @abc.abstractmethod`
	`958`	`+    def test_set_maximum_amplification__amplification_not_exceeded(self):`
	`959`	`+        """Test when the amplification factor is not exceeded."""`
	`960`	`+`
	`961`	`+    def test_set_maximum_amplification__infinity(self):`
	`962`	`+        inf = float('inf')  # an 'inf' threshold is allowed by Expat`
	`963`	`+        parser = expat.ParserCreate()`
	`964`	`+        self.assertIsNone(self.set_maximum_amplification(parser, inf))`
	`965`	`+`
	`966`	`+    def test_set_maximum_amplification__invalid_max_factor_type(self):`
	`967`	`+        parser = expat.ParserCreate()`
	`968`	`+        setter = functools.partial(self.set_maximum_amplification, parser)`
	`969`	`+`
	`970`	`+        self.assertRaises(TypeError, setter, None)`
	`971`	`+        self.assertRaises(TypeError, setter, 'abc')`
	`972`	`+`
	`973`	`+    def test_set_maximum_amplification__invalid_max_factor_range(self):`
	`974`	`+        parser = expat.ParserCreate()`
	`975`	`+        setter = functools.partial(self.set_maximum_amplification, parser)`
	`976`	`+`
	`977`	`+        msg = re.escape("'max_factor' must be at least 1.0")`
	`978`	`+        self.assertRaisesRegex(expat.ExpatError, msg, setter, float('nan'))`
	`979`	`+        self.assertRaisesRegex(expat.ExpatError, msg, setter, 0.99)`
	`980`	`+`
	`981`	`+    def test_set_maximum_amplification__fail_for_subparser(self):`
	`982`	`+        parser = expat.ParserCreate()`
	`983`	`+        subparser = parser.ExternalEntityParserCreate(None)`
	`984`	`+        setter = functools.partial(self.set_maximum_amplification, subparser)`
	`985`	`+        self.assert_root_parser_failure(setter, 123.45)`
	`986`	`+`
	`987`	`+`
	`988`	`+@unittest.skipIf(expat.version_info < (2, 7, 2), "requires Expat >= 2.7.2")`
	`989`	`+class MemoryProtectionTest(AttackProtectionTestBase, unittest.TestCase):`
	`990`	`+`
	`991`	`+    # NOTE: with the default Expat configuration, the billion laughs protection`
	`992`	`+    # may hit before the allocation limiter if exponential_expansion_payload()`
	`993`	`+    # is not carefully parametrized. As such, the payloads should be chosen so`
	`994`	`+    # that either the allocation limiter is hit before other protections are`
	`995`	`+    # triggered or no protection at all is triggered.`
	`996`	`+`
	`997`	`+    def assert_rejected(self, func, /, *args, **kwargs):`
	`998`	`+        """Check that func(*args, **kwargs) hits the allocation limit."""`
	`999`	`+        msg = r"out of memory: line \d+, column \d+"`
	`1000`	`+        self.assertRaisesRegex(expat.ExpatError, msg, func, *args, **kwargs)`
	`1001`	`+`
	`1002`	`+    def set_activation_threshold(self, parser, threshold):`
	`1003`	`+        return parser.SetAllocTrackerActivationThreshold(threshold)`
	`1004`	`+`
	`1005`	`+    def set_maximum_amplification(self, parser, max_factor):`
	`1006`	`+        return parser.SetAllocTrackerMaximumAmplification(max_factor)`
	`1007`	`+`
	`1008`	`+    def test_set_activation_threshold__threshold_reached(self):`
	`1009`	`+        parser = expat.ParserCreate()`
	`1010`	`+        # Choose a threshold expected to be always reached.`
	`1011`	`+        self.set_activation_threshold(parser, 3)`
	`1012`	`+        # Check that the threshold is reached by choosing a small factor`
	`1013`	`+        # and a payload whose peak amplification factor exceeds it.`
	`1014`	`+        self.assertIsNone(self.set_maximum_amplification(parser, 1.0))`
	`1015`	`+        payload = self.exponential_expansion_payload(ncols=10, nrows=4)`
	`1016`	`+        self.assert_rejected(parser.Parse, payload, True)`
	`1017`	`+`
	`1018`	`+    def test_set_activation_threshold__threshold_not_reached(self):`
	`1019`	`+        parser = expat.ParserCreate()`
	`1020`	`+        # Choose a threshold expected to be never reached.`
	`1021`	`+        self.set_activation_threshold(parser, pow(10, 5))`
	`1022`	`+        # Check that the threshold is reached by choosing a small factor`
	`1023`	`+        # and a payload whose peak amplification factor exceeds it.`
	`1024`	`+        self.assertIsNone(self.set_maximum_amplification(parser, 1.0))`
	`1025`	`+        payload = self.exponential_expansion_payload(ncols=10, nrows=4)`
	`1026`	`+        self.assertIsNotNone(parser.Parse(payload, True))`
	`1027`	`+`
	`1028`	`+    def test_set_maximum_amplification__amplification_exceeded(self):`
	`1029`	`+        parser = expat.ParserCreate()`
	`1030`	`+        # Unconditionally enable maximum activation factor.`
	`1031`	`+        self.set_activation_threshold(parser, 0)`
	`1032`	`+        # Choose a max amplification factor expected to always be exceeded.`
	`1033`	`+        self.assertIsNone(self.set_maximum_amplification(parser, 1.0))`
	`1034`	`+        # Craft a payload for which the peak amplification factor is > 1.0.`
	`1035`	`+        payload = self.exponential_expansion_payload(ncols=1, nrows=2)`
	`1036`	`+        self.assert_rejected(parser.Parse, payload, True)`
	`1037`	`+`
	`1038`	`+    def test_set_maximum_amplification__amplification_not_exceeded(self):`
	`1039`	`+        parser = expat.ParserCreate()`
	`1040`	`+        # Unconditionally enable maximum activation factor.`
	`1041`	`+        self.set_activation_threshold(parser, 0)`
	`1042`	`+        # Choose a max amplification factor expected to never be exceeded.`
	`1043`	`+        self.assertIsNone(self.set_maximum_amplification(parser, 1e4))`
	`1044`	`+        # Craft a payload for which the peak amplification factor is < 1e4.`
	`1045`	`+        payload = self.exponential_expansion_payload(ncols=1, nrows=2)`
	`1046`	`+        self.assertIsNotNone(parser.Parse(payload, True))`
	`1047`	`+`
	`1048`	`+`
`851`	`1049`	`if __name__ == "__main__":`
`852`	`1050`	`    unittest.main()`
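To make the growth rate of the test payload concrete, the generator from the test base class can be lifted into a standalone script and the expanded character data counted. This is a sketch: the generator body mirrors the diff above, while `expanded_text` is a hypothetical helper added here for illustration, and the parameters are kept tiny so no protection is expected to trigger.

```python
import textwrap
from xml.parsers import expat


def exponential_expansion_payload(*, nrows, ncols, text="."):
    """Standalone copy of the payload generator from the test above."""
    template = textwrap.dedent(f"""\
        <?xml version="1.0"?>
        <!DOCTYPE doc [
            <!ENTITY row0 "{text}">
            <!ELEMENT doc (#PCDATA)>
        {{body}}
        ]>
        <doc>&row{nrows};</doc>
    """).rstrip()
    body = "\n".join(
        f'<!ENTITY row{i + 1} "{f"&row{i};" * ncols}">'
        for i in range(nrows)
    )
    return template.format(body=textwrap.indent(body, " " * 4))


def expanded_text(payload):
    """Hypothetical helper: parse the payload and return the character data."""
    chunks = []
    parser = expat.ParserCreate()
    parser.CharacterDataHandler = chunks.append
    parser.Parse(payload, True)
    return "".join(chunks)


payload = exponential_expansion_payload(nrows=2, ncols=3)
# Each row repeats the previous one ncols times, so the expansion
# holds pow(ncols, nrows) copies of the text.
print(len(payload), len(expanded_text(payload)), pow(3, 2))
```

The document size grows roughly linearly in `nrows * ncols`, while the expanded text grows as `pow(ncols, nrows)`, which is the disproportion the allocation limiter is meant to bound.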

`‎Misc/NEWS.d/next/Library/2025-09-22-14-40-11.gh-issue-90949.UM35nb.rst‎`

Lines changed: 5 additions & 0 deletions

Original file line number	Diff line number	Diff line change
`@@ -0,0 +1,5 @@`
	`1`	+Add :meth:`~xml.parsers.expat.xmlparser.SetAllocTrackerActivationThreshold`
	`2`	+and :meth:`~xml.parsers.expat.xmlparser.SetAllocTrackerMaximumAmplification`
	`3`	+to :ref:`xmlparser <xmlparser-objects>` objects to prevent use of
	`4`	`+disproportional amounts of dynamic memory from within an Expat parser.`
	`5`	`+Patch by Bénédikt Tran.`
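From the caller's side, a rejected document surfaces as an `ExpatError`. The tests in this commit match the message `out of memory: line N, column M`, so one plausible way to tell this case apart from ordinary syntax errors is to compare the error string. The sketch below assumes that behaviour; the function name `parse_with_limits` and its string return values are hypothetical, and the `hasattr` guard is there so the code degrades gracefully where the setters are unavailable.

```python
from xml.parsers import expat
from xml.parsers.expat import errors

# Same shape as exponential_expansion_payload(ncols=1, nrows=2) in the tests.
PAYLOAD = """<?xml version="1.0"?>
<!DOCTYPE doc [
    <!ENTITY row0 ".">
    <!ELEMENT doc (#PCDATA)>
    <!ENTITY row1 "&row0;">
    <!ENTITY row2 "&row1;">
]>
<doc>&row2;</doc>"""


def parse_with_limits(data, *, threshold, max_factor):
    """Hypothetical helper: return 'accepted', 'rejected' or 'unsupported'."""
    parser = expat.ParserCreate()
    if not hasattr(parser, "SetAllocTrackerActivationThreshold"):
        return "unsupported"  # setters not available on this interpreter
    parser.SetAllocTrackerActivationThreshold(threshold)
    parser.SetAllocTrackerMaximumAmplification(max_factor)
    try:
        parser.Parse(data, True)
    except expat.ExpatError as exc:
        # Assumption: the allocation limiter reports "out of memory",
        # as matched by assert_rejected() in the tests of this commit.
        if expat.ErrorString(exc.code) == errors.XML_ERROR_NO_MEMORY:
            return "rejected"
        raise
    return "accepted"


print(parse_with_limits(PAYLOAD, threshold=0, max_factor=1.0))
print(parse_with_limits(PAYLOAD, threshold=0, max_factor=1e4))
```

As the documentation in this commit warns, `lineno` and `offset` on such an error may carry no special meaning, so the sketch only inspects the error code.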

0 commit comments
