Uh oh!
There was an error while loading.Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork32k
bpo-39280: Don't allow datetime parsing to accept non-Ascii digits#17931
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Closed
Uh oh!
There was an error while loading.Please reload this page.
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters
9957578
to75bc88f
CompareSign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading.Please reload this page.
I've been doing some research into the use of
\d
in regular expressions in CPython, and any security vulnerabilities that might happen as a result of the fact that it accepts non-Ascii digits like ٢ and 5.In most places in the CPython codebase, the
re.ASCII
flag is used for such cases, thus ensuring there
module prohibits these non-Ascii digits. Personally, my preference is to never use\d
and always use[0-9]
. I think that it's rule that's more easy to enforce and less likely to result in a slipup, but that's a matter of personal taste.I found a few places where we don't use the
re.ASCII
flag and we do accept non-Ascii digits.The first and less interesting place is platform.py, where we define patterns used for detecting versions of PyPy and IronPython. I don't know how anyone would exploit that, but personally I'd change that to a [0-9] just to be safe. I've openedbpo-39279 for that.
The more sensitive place is the
datetime
module.Happily, the
datetime.datetime.fromisoformat
function rejects non-Ascii digits. But thedatetime.datetime.strptime
function does not:If user code were to check for uniqueness of a datetime by comparing it as a string, this is where an attacker could fool this logic, by using a non-Ascii digit.
Two more interesting points about this:
strptime
, we're referencing the 1989 C standard. Since the first version of Unicode was published in 1991, it's reasonable not to expect the standard to support digits that were introduced in Unicode.If you'd scroll down in that documentation, you'll see that we also implement the less-known ISO 8601 standard, where
%G-%V-%u
represents a year, week number, and day of week. The%G
is vulnerable:I looked at the ISO 8601:2004 document, and under the "Fundamental principles" chapter, it says:
Note the "unique and unambiguous". By accepting non-Ascii digits, we're breaking the uniqueness requirement of ISO 8601.
https://bugs.python.org/issue39280