Uh oh!
There was an error while loading.Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork33.3k
gh-95778: Add pre-check for int-to-str conversion#96537
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Uh oh!
There was an error while loading.Please reload this page.
Conversation
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
mdickinson commentedSep 3, 2022 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
To give an idea of how much the crude check misses: assuming 30 bits per digit and EDIT: Whoops, sorry; the bound is |
A RPi4 takes 10 seconds for the int_to_str test now.
skip rather than fail if we find unexpectedly high performance.
bedevere-bot commentedSep 4, 2022
bedevere-bot commentedSep 4, 2022
miss-islington commentedSep 4, 2022
Thanks@mdickinson for the PR, and@gpshead for merging it 🌮🎉.. I'm working now to backport this PR to: 3.10, 3.11. |
…GH-96537)Converting a large enough `int` to a decimal string raises `ValueError` as expected. However, the raise comes _after_ the quadratic-time base-conversion algorithm has run to completion. For effective DOS prevention, we need some kind of check before entering the quadratic-time loop. Oops! =)The quick fix: essentially we catch _most_ values that exceed the threshold up front. Those that slip through will still be on the small side (read: sufficiently fast), and will get caught by the existing check so that the limit remains exact.The justification for the current check. The C code check is:```cmax_str_digits / (3 * PyLong_SHIFT) <= (size_a - 11) / 10```In GitHub markdown math-speak, writing $M$ for `max_str_digits`, $L$ for `PyLong_SHIFT` and $s$ for `size_a`, that check is:$$\left\lfloor\frac{M}{3L}\right\rfloor \le \left\lfloor\frac{s - 11}{10}\right\rfloor$$From this it follows that$$\frac{M}{3L} < \frac{s-1}{10}$$hence that$$\frac{L(s-1)}{M} > \frac{10}{3} > \log_2(10).$$So$$2^{L(s-1)} > 10^M.$$But our input integer $a$ satisfies $|a| \ge 2^{L(s-1)}$, so $|a|$ is larger than $10^M$. This shows that we don't accidentally capture anything _below_ the intended limit in the check.<!-- gh-issue-number:pythongh-95778 -->* Issue:pythongh-95778<!-- /gh-issue-number -->Co-authored-by: Gregory P. Smith [Google LLC] <greg@krypto.org>(cherry picked from commitb126196)Co-authored-by: Mark Dickinson <dickinsm@gmail.com>
bedevere-bot commentedSep 4, 2022
GH-96562 is a backport of this pull request to the3.11 branch. |
miss-islington commentedSep 4, 2022
Sorry,@mdickinson and@gpshead, I could not cleanly backport this to |
…ythonGH-96537)Converting a large enough `int` to a decimal string raises `ValueError` as expected. However, the raise comes _after_ the quadratic-time base-conversion algorithm has run to completion. For effective DOS prevention, we need some kind of check before entering the quadratic-time loop. Oops! =)The quick fix: essentially we catch _most_ values that exceed the threshold up front. Those that slip through will still be on the small side (read: sufficiently fast), and will get caught by the existing check so that the limit remains exact.The justification for the current check. The C code check is:```cmax_str_digits / (3 * PyLong_SHIFT) <= (size_a - 11) / 10```In GitHub markdown math-speak, writing $M$ for `max_str_digits`, $L$ for `PyLong_SHIFT` and $s$ for `size_a`, that check is:$$\left\lfloor\frac{M}{3L}\right\rfloor \le \left\lfloor\frac{s - 11}{10}\right\rfloor$$From this it follows that$$\frac{M}{3L} < \frac{s-1}{10}$$hence that$$\frac{L(s-1)}{M} > \frac{10}{3} > \log_2(10).$$So$$2^{L(s-1)} > 10^M.$$But our input integer $a$ satisfies $|a| \ge 2^{L(s-1)}$, so $|a|$ is larger than $10^M$. This shows that we don't accidentally capture anything _below_ the intended limit in the check.<!-- gh-issue-number:pythongh-95778 -->* Issue:pythongh-95778<!-- /gh-issue-number -->Co-authored-by: Gregory P. Smith [Google LLC] <greg@krypto.org>(cherry picked from commitb126196)Co-authored-by: Mark Dickinson <dickinsm@gmail.com>
bedevere-bot commentedSep 4, 2022
GH-96563 is a backport of this pull request to the3.10 branch. |
…#96537)Converting a large enough `int` to a decimal string raises `ValueError` as expected. However, the raise comes _after_ the quadratic-time base-conversion algorithm has run to completion. For effective DOS prevention, we need some kind of check before entering the quadratic-time loop. Oops! =)The quick fix: essentially we catch _most_ values that exceed the threshold up front. Those that slip through will still be on the small side (read: sufficiently fast), and will get caught by the existing check so that the limit remains exact.The justification for the current check. The C code check is:```cmax_str_digits / (3 * PyLong_SHIFT) <= (size_a - 11) / 10```In GitHub markdown math-speak, writing $M$ for `max_str_digits`, $L$ for `PyLong_SHIFT` and $s$ for `size_a`, that check is:$$\left\lfloor\frac{M}{3L}\right\rfloor \le \left\lfloor\frac{s - 11}{10}\right\rfloor$$From this it follows that$$\frac{M}{3L} < \frac{s-1}{10}$$hence that$$\frac{L(s-1)}{M} > \frac{10}{3} > \log_2(10).$$So$$2^{L(s-1)} > 10^M.$$But our input integer $a$ satisfies $|a| \ge 2^{L(s-1)}$, so $|a|$ is larger than $10^M$. This shows that we don't accidentally capture anything _below_ the intended limit in the check.<!-- gh-issue-number:pythongh-95778 -->* Issue:pythongh-95778<!-- /gh-issue-number -->Co-authored-by: Gregory P. Smith [Google LLC] <greg@krypto.org>
Converting a large enough `int` to a decimal string raises `ValueError` as expected. However, the raise comes _after_ the quadratic-time base-conversion algorithm has run to completion. For effective DOS prevention, we need some kind of check before entering the quadratic-time loop. Oops! =)The quick fix: essentially we catch _most_ values that exceed the threshold up front. Those that slip through will still be on the small side (read: sufficiently fast), and will get caught by the existing check so that the limit remains exact.The justification for the current check. The C code check is:```cmax_str_digits / (3 * PyLong_SHIFT) <= (size_a - 11) / 10```In GitHub markdown math-speak, writing $M$ for `max_str_digits`, $L$ for `PyLong_SHIFT` and $s$ for `size_a`, that check is:$$\left\lfloor\frac{M}{3L}\right\rfloor \le \left\lfloor\frac{s - 11}{10}\right\rfloor$$From this it follows that$$\frac{M}{3L} < \frac{s-1}{10}$$hence that$$\frac{L(s-1)}{M} > \frac{10}{3} > \log_2(10).$$So$$2^{L(s-1)} > 10^M.$$But our input integer $a$ satisfies $|a| \ge 2^{L(s-1)}$, so $|a|$ is larger than $10^M$. This shows that we don't accidentally capture anything _below_ the intended limit in the check.<!-- gh-issue-number:gh-95778 -->* Issue:gh-95778<!-- /gh-issue-number -->Co-authored-by: Gregory P. Smith [Google LLC] <greg@krypto.org>(cherry picked from commitb126196)Co-authored-by: Mark Dickinson <dickinsm@gmail.com>…#96537)Converting a large enough `int` to a decimal string raises `ValueError` as expected. However, the raise comes _after_ the quadratic-time base-conversion algorithm has run to completion. For effective DOS prevention, we need some kind of check before entering the quadratic-time loop. Oops! =)The quick fix: essentially we catch _most_ values that exceed the threshold up front. Those that slip through will still be on the small side (read: sufficiently fast), and will get caught by the existing check so that the limit remains exact.The justification for the current check. The C code check is:```cmax_str_digits / (3 * PyLong_SHIFT) <= (size_a - 11) / 10```In GitHub markdown math-speak, writing $M$ for `max_str_digits`, $L$ for `PyLong_SHIFT` and $s$ for `size_a`, that check is:$$\left\lfloor\frac{M}{3L}\right\rfloor \le \left\lfloor\frac{s - 11}{10}\right\rfloor$$From this it follows that$$\frac{M}{3L} < \frac{s-1}{10}$$hence that$$\frac{L(s-1)}{M} > \frac{10}{3} > \log_2(10).$$So$$2^{L(s-1)} > 10^M.$$But our input integer $a$ satisfies $|a| \ge 2^{L(s-1)}$, so $|a|$ is larger than $10^M$. This shows that we don't accidentally capture anything _below_ the intended limit in the check.<!-- gh-issue-number:pythongh-95778 -->* Issue:pythongh-95778<!-- /gh-issue-number -->Co-authored-by: Gregory P. Smith [Google LLC] <greg@krypto.org>
…#96537)Converting a large enough `int` to a decimal string raises `ValueError` as expected. However, the raise comes _after_ the quadratic-time base-conversion algorithm has run to completion. For effective DOS prevention, we need some kind of check before entering the quadratic-time loop. Oops! =)The quick fix: essentially we catch _most_ values that exceed the threshold up front. Those that slip through will still be on the small side (read: sufficiently fast), and will get caught by the existing check so that the limit remains exact.The justification for the current check. The C code check is:```cmax_str_digits / (3 * PyLong_SHIFT) <= (size_a - 11) / 10```In GitHub markdown math-speak, writing $M$ for `max_str_digits`, $L$ for `PyLong_SHIFT` and $s$ for `size_a`, that check is:$$\left\lfloor\frac{M}{3L}\right\rfloor \le \left\lfloor\frac{s - 11}{10}\right\rfloor$$From this it follows that$$\frac{M}{3L} < \frac{s-1}{10}$$hence that$$\frac{L(s-1)}{M} > \frac{10}{3} > \log_2(10).$$So$$2^{L(s-1)} > 10^M.$$But our input integer $a$ satisfies $|a| \ge 2^{L(s-1)}$, so $|a|$ is larger than $10^M$. This shows that we don't accidentally capture anything _below_ the intended limit in the check.<!-- gh-issue-number:pythongh-95778 -->* Issue:pythongh-95778<!-- /gh-issue-number -->Co-authored-by: Gregory P. Smith [Google LLC] <greg@krypto.org>
) (#96563)Converting a large enough `int` to a decimal string raises `ValueError` as expected. However, the raise comes _after_ the quadratic-time base-conversion algorithm has run to completion. For effective DOS prevention, we need some kind of check before entering the quadratic-time loop. Oops! =)The quick fix: essentially we catch _most_ values that exceed the threshold up front. Those that slip through will still be on the small side (read: sufficiently fast), and will get caught by the existing check so that the limit remains exact.The justification for the current check. The C code check is:```cmax_str_digits / (3 * PyLong_SHIFT) <= (size_a - 11) / 10```In GitHub markdown math-speak, writing $M$ for `max_str_digits`, $L$ for `PyLong_SHIFT` and $s$ for `size_a`, that check is:$$\left\lfloor\frac{M}{3L}\right\rfloor \le \left\lfloor\frac{s - 11}{10}\right\rfloor$$From this it follows that$$\frac{M}{3L} < \frac{s-1}{10}$$hence that$$\frac{L(s-1)}{M} > \frac{10}{3} > \log_2(10).$$So$$2^{L(s-1)} > 10^M.$$But our input integer $a$ satisfies $|a| \ge 2^{L(s-1)}$, so $|a|$ is larger than $10^M$. This shows that we don't accidentally capture anything _below_ the intended limit in the check.<!-- gh-issue-number:gh-95778 -->* Issue:gh-95778<!-- /gh-issue-number -->Co-authored-by: Gregory P. Smith [Google LLC] <greg@krypto.org>(cherry picked from commitb126196)Co-authored-by: Mark Dickinson <dickinsm@gmail.com>
* Correctly pre-check for int-to-str conversion (#96537)Converting a large enough `int` to a decimal string raises `ValueError` as expected. However, the raise comes _after_ the quadratic-time base-conversion algorithm has run to completion. For effective DOS prevention, we need some kind of check before entering the quadratic-time loop. Oops! =)The quick fix: essentially we catch _most_ values that exceed the threshold up front. Those that slip through will still be on the small side (read: sufficiently fast), and will get caught by the existing check so that the limit remains exact.The justification for the current check. The C code check is:```cmax_str_digits / (3 * PyLong_SHIFT) <= (size_a - 11) / 10```In GitHub markdown math-speak, writing $M$ for `max_str_digits`, $L$ for `PyLong_SHIFT` and $s$ for `size_a`, that check is:$$\left\lfloor\frac{M}{3L}\right\rfloor \le \left\lfloor\frac{s - 11}{10}\right\rfloor$$From this it follows that$$\frac{M}{3L} < \frac{s-1}{10}$$hence that$$\frac{L(s-1)}{M} > \frac{10}{3} > \log_2(10).$$So$$2^{L(s-1)} > 10^M.$$But our input integer $a$ satisfies $|a| \ge 2^{L(s-1)}$, so $|a|$ is larger than $10^M$. This shows that we don't accidentally capture anything _below_ the intended limit in the check.<!-- gh-issue-number:gh-95778 -->* Issue:gh-95778<!-- /gh-issue-number -->Co-authored-by: Gregory P. Smith [Google LLC] <greg@krypto.org>Co-authored-by: Christian Heimes <christian@python.org>Co-authored-by: Mark Dickinson <dickinsm@gmail.com>
Summary:cherry-picked the upstream 3.10 backport```git cherry-pick8f0fa4beace09e```this one ispython/cpython#96537original commit message below--------Integer to and from text conversions via CPython's bignum `int` type is not safe against denial of service attacks due to malicious input. Very large input strings with hundred thousands of digits can consume several CPU seconds.This PR comes fresh from a pile of work done in our private PSRT security response team repo.This backportspython/cpython#96499 aka511ca94Signed-off-by: Christian Heimes [Red Hat] <christian@python.org>Tons-of-polishing-up-by: Gregory P. Smith [Google] <greg@krypto.org>Reviews via the private PSRT repo via many others (see the NEWS entry in the PR).<!-- gh-issue-number: gh-95778 -->* Issue: gh-95778<!-- /gh-issue-number -->I wrote up [a one pager for the release managers](https://docs.google.com/document/d/1KjuF_aXlzPUxTK4BMgezGJ2Pn7uevfX7g0_mvgHlL7Y/edit#).Reviewed By: alexmalyshevDifferential Revision: D39369518fbshipit-source-id:6ed2def
Uh oh!
There was an error while loading.Please reload this page.
On current
main, converting a large enoughintto a decimal string raisesValueErroras expected. However, the raise comesafter the quadratic-time base-conversion algorithm has run to completion. For effective DOS prevention, we need some kind of check before entering the quadratic-time loop.This PR gives a proof-of-concept quick fix: essentially we catchmost values that exceed the threshold up front. Those that slip through will still be on the small side, and will get caught by the existing check.
For the record, here's the justification for the current check. The C code check is:
In math-speak, writing$M$ for$L$ for$s$ for
$$\left\lfloor\frac{M}{3L}\right\rfloor \le \left\lfloor\frac{s - 11}{10}\right\rfloor$$
max_str_digits,PyLong_SHIFTandsize_a, that check is:From this it follows that
$$\frac{M}{3L} < \frac{s-1}{10}$$
$$\frac{L(s-1)}{M} > \frac{10}{3} > \log_2(10).$$
$$2^{L(s-1)} > 10^M.$$ $a$ satisfies$|a| \ge 2^{L(s-1)}$ , so$|a|$ is larger than$10^M$ . This shows that we don't accidentally capture anythingbelow the intended limit in the check.
hence that
So
But our input integer
I don't think this is ready to merge as-is - there are some details to figure out, and I'll add line-by-line comments for those.