Description
Bug report
Bug description:
While tinkering around with email.utils.parsedate_to_datetime, I found some behavior that may be worth adjusting.
1. low-number years aren't handled according to spec:
The year is any numeric year 1900 or later. [section 3.3]
[section 4.3] The syntax for the obsolete date format allows a 2 digit year.
[..]
Where a two or three digit year occurs in a date, the year is to be
interpreted as follows: If a two digit year is encountered whose
value is between 00 and 49, the year is interpreted by adding 2000,
ending up with a value between 2000 and 2049. If a two digit year is
encountered with a value between 50 and 99, or any three digit year
is encountered, the year is interpreted by adding 1900.
>>> parsedate_to_datetime("Sat, 15 Aug 0001 23:12:09 +0500")
datetime.datetime(2001, 8, 15, 23, 12, 9, ...)
expected: either year 1, or a parsing failure. Neither the new nor the old format interprets 4-digit years this way.
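To make the quoted rules concrete, here is a minimal sketch of how the year field could be interpreted based on its digit count. The helper name interpret_rfc2822_year is hypothetical and is not part of email.utils. Rejecting 4-digit years below 1900 is an assumption: it is one of the two acceptable outcomes mentioned above, not necessarily what CPython should do.

```python
def interpret_rfc2822_year(year_text: str) -> int:
    """Hypothetical helper: map an RFC 2822 year field to a calendar year."""
    # Accept ASCII digits only; str.isdigit() would also accept other scripts.
    if not year_text or any(ch not in "0123456789" for ch in year_text):
        raise ValueError(f"year must consist of ASCII digits: {year_text!r}")

    value = int(year_text)
    digits = len(year_text)

    if digits == 2:
        # Section 4.3: 00-49 maps to 2000-2049, 50-99 maps to 1950-1999.
        return value + (2000 if value <= 49 else 1900)
    if digits == 3:
        # Section 4.3: any three-digit year has 1900 added.
        return value + 1900
    if digits >= 4:
        # Section 3.3: "any numeric year 1900 or later".
        # Assumption: earlier years are rejected rather than returned as-is.
        if value < 1900:
            raise ValueError(f"year before 1900: {year_text!r}")
        return value

    # A one-digit year is covered by neither rule.
    raise ValueError(f"unsupported year field: {year_text!r}")
```

With this sketch, "01" becomes 2001 through the obsolete two-digit rule, while "0001" is treated as a 4-digit year and rejected instead of being silently shifted to 2001.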
2. offset minutes larger than 59 don't lead to parsing failure
>>> parsedate_to_datetime('Sat, 15 Aug 0001 23:12:09 +0590')
datetime.datetime(2001, 8, 15, 23, 12, 9, tzinfo=datetime.timezone(datetime.timedelta(seconds=23400)))
expected: parse failure. Instead, the "90 minutes" component is parsed without issue (+0590 is treated as equal to +0630). The spec is not explicit about this, although it does say "A date-time specification MUST be semantically valid". Note that a "90" value as the minute in the time component does give the appropriate parsing failure.
Note: datetime.fromisoformat() has the same behavior. Also in this case, I can't determine whether ISO 8601 explicitly disallows it. RFC 3339 is clear on disallowing this.
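A stricter zone parser could look roughly like the sketch below. The function name parse_numeric_zone is hypothetical, and treating a minute value above 59 as an error is the assumption under discussion here, not current behavior. The sketch only covers the numeric +hhmm / -hhmm form.

```python
import re
from datetime import timedelta, timezone

# Sign, two hour digits, two minute digits; ASCII digits only.
_ZONE_RE = re.compile(r"([+-])([0-9]{2})([0-9]{2})")


def parse_numeric_zone(zone: str) -> timezone:
    """Hypothetical helper: parse '+hhmm' / '-hhmm', rejecting minutes > 59."""
    match = _ZONE_RE.fullmatch(zone)
    if match is None:
        raise ValueError(f"not a numeric zone: {zone!r}")

    sign, hours_text, minutes_text = match.groups()
    hours, minutes = int(hours_text), int(minutes_text)
    if minutes > 59:
        # Assumption: "+0590" is an error rather than a synonym for "+0630".
        raise ValueError(f"zone minutes out of range: {zone!r}")

    delta = timedelta(hours=hours, minutes=minutes)
    # timezone() itself raises ValueError for offsets of 24 hours or more.
    return timezone(-delta if sign == "-" else delta)
```

Here parse_numeric_zone("+0500") yields a five-hour offset, while parse_numeric_zone("+0590") raises ValueError.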
3. Invalid day-of-week doesn't lead to parsing failure
>>> parsedate_to_datetime('Sun, 15 Aug 0001 23:12:09 +0520')  # 15 Aug is actually a Wednesday, not a Sunday
expected: parsing failure
A date-time specification MUST be semantically valid. That is, the
day-of-the-week (if included) MUST be the day implied by the date,
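The semantic check quoted above could be sketched as follows. The helper check_day_of_week is hypothetical, and it relies on Python's proleptic Gregorian calendar via datetime.date, which is an assumption about how "the day implied by the date" should be computed.

```python
from datetime import date

# Abbreviations in the order used by date.weekday(): Monday is 0.
_DAY_NAMES = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")


def check_day_of_week(day_name: str, year: int, month: int, day: int) -> None:
    """Hypothetical helper: raise ValueError if day_name contradicts the date."""
    expected = _DAY_NAMES[date(year, month, day).weekday()]
    if day_name != expected:
        raise ValueError(
            f"day-of-week {day_name!r} does not match the date (expected {expected!r})"
        )
```

For example, check_day_of_week("Wed", 2001, 8, 15) passes silently, while check_day_of_week("Sun", 2001, 8, 15) raises ValueError.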
4. Non-ASCII digits don't lead to parsing failure
If I'm reading the RFC correctly, only ASCII characters are valid.
>>> parsedate_to_datetime('Sat, 15 Aug 01 𝟚𝟛:𝟝𝟛:𝟛𝟛 +0500')  # note the fancy numbers
datetime.datetime(2001, 8, 15, 23, 53, 33, ...)
expected: parsing failure
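A likely reason for this is that Python's int() accepts any Unicode decimal digit, not only ASCII ones. The sketch below shows one way a parser could guard against that; parse_ascii_int is a hypothetical helper name, not an existing function.

```python
_ASCII_DIGITS = frozenset("0123456789")


def parse_ascii_int(text: str) -> int:
    """Hypothetical helper: like int(), but only for non-empty ASCII digit strings."""
    # str.isdigit() is not strict enough: it is True for "𝟚𝟛" as well.
    if not text or any(ch not in _ASCII_DIGITS for ch in text):
        raise ValueError(f"expected ASCII digits only: {text!r}")
    return int(text)


# int() on its own accepts the double-struck digits used in the example above.
assert int("𝟚𝟛") == 23
```

With this guard, parse_ascii_int("23") returns 23, while parse_ascii_int("𝟚𝟛") raises ValueError.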
5. Handling of the -0000 case may be inconsistent with the drive to eliminate the practice of "naive UTC" datetimes.
Lately, the datetime module appears to discourage the usage of naive datetimes to mean UTC, as evidenced by the deprecation of utcnow() and other methods.
However, parsedate_to_datetime will return a naive datetime in the -0000 case.
>>> parsedate_to_datetime("Sat, 15 Aug 01 23:53:33 -0000")
datetime.datetime(2001, 8, 15, 23, 53, 33)
expected: tzinfo=UTC
The spec says:
"-0000" also indicates Universal Time, it is
used to indicate that the time was generated on a system that may be
in a local time zone other than Universal Time and therefore
indicates that the date-time contains no information about the local
time zone.
The spec again is a bit fuzzy, but my reading here is that -0000 means "UTC, with no offset known". In contrast, +0000 means "UTC offset known to be 0". My impression would be that only omission of the offset should result in a naive datetime. What do you think?
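Until that question is settled, a caller who prefers the "UTC, with no offset known" reading can apply it on top of the existing function. The wrapper below is only a sketch: the name parse_date_minus_zero_as_utc is hypothetical, and detecting the zone with a plain endswith("-0000") check is a simplifying assumption that ignores trailing comments such as "(UTC)".

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_date_minus_zero_as_utc(value: str) -> datetime:
    """Hypothetical wrapper: treat a '-0000' zone as aware UTC, not naive."""
    parsed = parsedate_to_datetime(value)
    # parsedate_to_datetime is documented to return a naive datetime for -0000.
    if parsed.tzinfo is None and value.rstrip().endswith("-0000"):
        return parsed.replace(tzinfo=timezone.utc)
    return parsed
```

Inputs with any other offset pass through unchanged, so "+0500" still produces a datetime with a five-hour offset.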
CPython versions tested on:
3.13
Operating systems tested on:
macOS
edit: typo
Linked PRs
- gh-126845: Some edge cases in email.utils.parsedate_to_datetime seem to differ from RFC2822 spec #134311
- gh-126845: Some edge cases in email.utils.parsedate_to_datetime seem to differ from RFC2822 spec #126845 #134350
- gh-126845: Some edge cases in email.utils.parsedate_to_datetime seem to differ from RFC2822 spec #134438