Uh oh!
There was an error while loading.Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork33.7k
gh-90949: add Expat API to prevent XML deadly allocations#139234
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Uh oh!
There was an error while loading.Please reload this page.
Conversation
3fa5b10 to9c7371fComparepicnixz commentedSep 22, 2025
I plan to also expose the billion laugh mitigation API, but in a separate PR. |
Uh oh!
There was an error while loading.Please reload this page.
hartwork commentedSep 22, 2025
@picnixz thanks for taking on this project — very much appreciated! 🙏
@picnixz there is mention of the activation threshold in the final note ofhttps://libexpat.github.io/doc/api/latest/#XML_SetAllocTrackerMaximumAmplification but I'm happy if the topic gets more attention in CPython docs. I already started review and spotted a few things. I will likely submit a first round of review some time later today. |
picnixz commentedSep 22, 2025
I assume you are actually talking about this:
I actually understood it only as "if you use a smaller factor thenin practice, you might want to change the activation threshold" but I didn't understand it as "the amplification factor is only used if we exceed the threshold in the first place". I assumed from the note on
|
picnixz commentedSep 22, 2025
Thank you! I'll hold the work on the billion laugh API as it'll be easier to add the API once this one is merged. |
4ba478d tod636685Compare
hartwork left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
@picnixz here's everything I found so far. Happy to learn what I missed in these findings 🍻
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
| } | ||
| #endif | ||
| #ifXML_COMBINED_VERSION >=20702 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I understand that Expat >=2.7.2 is needed to offer this functionality, but maybe the the Python API should be available in any case and raise exceptions that the compile time Expat was not recent enough to support this feature. Is that what's happening? I remember we had this very question when introducingSetReparseDeferralEnabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Argument Clinic will handle this automatically by "not" having the function (an AttributeError will be raised). ForSetReparseDeferralEnabled, it appears that the API call will be a no-op otherwise.
Maybe it's better to raiseNotImplementedError instead of the default AttributeError, as we usually do in ssl when the libssl backend is not the latest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
@picnixz I'd like to vote forNotImplementedError or no-op rather thanAttributeError for now. I'll need to sleep over this and potentially dig up past communication on the reparse deferall thing. There was past rationale on this somewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I would prefer aNotImplementedError over a no-op, as otherwise people might assume that they are protected against some attacks while they are not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
@picnixz I should note that protection is active by default (or it would not have fixed the vulnerability). So a no-op in the tuning function would leave them protected under default settings, not unprotected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I like the#error approach. It simplifies the checks. I think I'll do a separate PR for that one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
@picnixz please feel free to CC me for review there 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Actually, if I includeexpat_config.h before, those symbols will anyway be defined (unless someone changesexpat_config.h). But I don't think I need to assume thatexpat_config.h should be changed. Until now, it was not even used... (it wasn't included). I prefer not to overcomplicate the file (but at least now I know that it should be fine)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
@picnixz fileexpat_config.h is auto-generated by the Expat build system (if systemwide Expat is used) and hardcoded by CPython when the bundle is used. So in the former case,XML_DTD andXML_GE could be defined or not: it's under the sysadmin's control.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Oh thanks for the explanation. I thought it was a python-only file.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
hartwork commentedSep 22, 2025
@picnixz I confirm, yes, exactly 👍
I see. I think that means that upstream docs should be adjusted some way to make it easier to read that (or them) as interconnected. How about I try a related pull request upstream and you take the review seat there?
Understood, good plan 👍 |
…on API (python#139366)Fix some typos left inf04bea4,and simplify some internal functions to ease maintenance of futuremitigation APIs.
hartwork commentedSep 28, 2025
@picnixz any ideas about what to do about Regarding |
picnixz commentedSep 28, 2025 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
I think we could just add a method for tuning the protections? maybe a single method for all protections? or a configuration object for that? I don't really know which one would be the best here. |
picnixz commentedSep 28, 2025 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
Note: I think having the user calling |
hartwork commentedSep 28, 2025
@picnixz it's a pity that etree and sax are not reusing pyexpat internally (as can be seen in |
picnixz commentedSep 28, 2025
But they do I think? at least there is some ExpatReader in the sax module and Ithink there is some |
hartwork commentedSep 28, 2025
@picnixz maybe we need to look at sax and etree separately. Other than sax, etree has a C and a pure-Python version of the same module that — I think — need to satisfy the same interface. The C version of etree does not seem to use |
picnixz commentedSep 28, 2025 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
Indeed, for I would instead suggest that we don't do anything for this one, and that by default, it's left untouched (namely make those methods a no-op). For the C implementation, we would add new methods at the level of For SAX, it's also a pure Python implementation (no C), so I suggest leaving the API empty as well. The pure Python implementation however seems toonly use I think checking for the existence of the underlying C API through |
CVE-2025-59375) (pythonGH-139234)Expose the XML Expat 2.7.2 mitigation APIs to disallow use ofdisproportional amounts of dynamic memory from within an Expatparser (seeCVE-2025-59375 for instance).The exposed APIs are available on Expat parsers, that is,parsers created by `xml.parsers.expat.ParserCreate()`, as:- `parser.SetAllocTrackerActivationThreshold(threshold)`, and- `parser.SetAllocTrackerMaximumAmplification(max_factor)`.(cherry picked from commitf04bea4)Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
…on API (python#139366)Fix some typos left inf04bea4,and simplify some internal functions to ease maintenance of futuremitigation APIs.(cherry picked from commit68a1778)
GH-139527 is a backport of this pull request to the3.12 branch. |
…2025-59375) (python#139234)Expose the XML Expat 2.7.2 mitigation APIs to disallow use ofdisproportional amounts of dynamic memory from within an Expatparser (seeCVE-2025-59375 for instance).The exposed APIs are available on Expat parsers, that is,parsers created by `xml.parsers.expat.ParserCreate()`, as:- `parser.SetAllocTrackerActivationThreshold(threshold)`, and- `parser.SetAllocTrackerMaximumAmplification(max_factor)`.(cherry picked from commitf04bea4)
…on API (python#139366)Fix some typos left inf04bea4,and simplify some internal functions to ease maintenance of futuremitigation APIs.(cherry picked from commit68a1778)
GH-139529 is a backport of this pull request to the3.11 branch. |
…2025-59375) (python#139234)Expose the XML Expat 2.7.2 mitigation APIs to disallow use ofdisproportional amounts of dynamic memory from within an Expatparser (seeCVE-2025-59375 for instance).The exposed APIs are available on Expat parsers, that is,parsers created by `xml.parsers.expat.ParserCreate()`, as:- `parser.SetAllocTrackerActivationThreshold(threshold)`, and- `parser.SetAllocTrackerMaximumAmplification(max_factor)`.(cherry picked from commitf04bea4)
…on API (python#139366)Fix some typos left inf04bea4,and simplify some internal functions to ease maintenance of futuremitigation APIs.(cherry picked from commit68a1778)
GH-139532 is a backport of this pull request to the3.10 branch. |
pythonGH-139234 (python#139558)Fix a compiler warning `-Wunused-function` afterf04bea4.The `set_invalid_arg` function in `Modules/pyexpat.c` may be unused if the underlying Expatversion is less than 2.4.0.
…-2025-59375) (GH-139234) (#139359)* [3.14]gh-90949: add Expat API to prevent XML deadly allocations (CVE-2025-59375) (GH-139234)Expose the XML Expat 2.7.2 mitigation APIs to disallow use ofdisproportional amounts of dynamic memory from within an Expatparser (seeCVE-2025-59375 for instance).The exposed APIs are available on Expat parsers, that is,parsers created by `xml.parsers.expat.ParserCreate()`, as:- `parser.SetAllocTrackerActivationThreshold(threshold)`, and- `parser.SetAllocTrackerMaximumAmplification(max_factor)`.(cherry picked from commitf04bea4)(cherry picked from commit68a1778)
…-2025-59375) (GH-139234) (#139367)*gh-90949: add Expat API to prevent XML deadly allocations (CVE-2025-59375) (#139234)Expose the XML Expat 2.7.2 mitigation APIs to disallow use ofdisproportional amounts of dynamic memory from within an Expatparser (seeCVE-2025-59375 for instance).The exposed APIs are available on Expat parsers, that is,parsers created by `xml.parsers.expat.ParserCreate()`, as:- `parser.SetAllocTrackerActivationThreshold(threshold)`, and- `parser.SetAllocTrackerMaximumAmplification(max_factor)`.(cherry picked from commitf04bea4)(cherry picked from commit68a1778)
…-2025-59375) (GH-139234) (#139532)Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
…-2025-59375) (GH-139234) (#139529)Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
…-2025-59375) (GH-139234) (#139527)* [3.12]gh-90949: add Expat API to prevent XML deadly allocations (CVE-2025-59375) (GH-139234)Expose the XML Expat 2.7.2 mitigation APIs to disallow use ofdisproportional amounts of dynamic memory from within an Expatparser (seeCVE-2025-59375 for instance).The exposed APIs are available on Expat parsers, that is,parsers created by `xml.parsers.expat.ParserCreate()`, as:- `parser.SetAllocTrackerActivationThreshold(threshold)`, and- `parser.SetAllocTrackerMaximumAmplification(max_factor)`.(cherry picked from commitf04bea4)(cherry picked from commit68a1778)Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
Uh oh!
There was an error while loading.Please reload this page.
@hartwork I observed that the maximum allocation factor is only checked if we exceed the activation threshold. The online docs for Expat didn't explicitly mention it so I mentioned it in our docs.
I eventually decided to redo the float-check in case of an error so that we can give a better message to the user (as it relates to security practices, I think it's good to explain why the API call failed as much as we can).
📚 Documentation preview 📚:https://cpython-previews--139234.org.readthedocs.build/