Uh oh!
There was an error while loading.Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork34.1k
Open
Description
Bug report
Bug description:
OSS-Fuzz has found a heap buffer overflow in_PyTokenizer_ensure_utf8.Link to OSS-Fuzz bug report.
The root cause is thatvalid_utf8() inParser/tokenizer/helpers.c checks continuation bytes in reverse order thus readers[expected] befores[1] on these lines:
cpython/Parser/tokenizer/helpers.c
Lines 497 to 499 in8b7b5a9
| for (;expected;expected--) | |
| if (s[expected]<0x80||s[expected] >=0xC0) | |
| return0; |
When a multi-byte UTF-8 sequence is truncated - such as a 3-byte lead\xEA followed immediately by a null terminator - the backward loop reads past the end of the valid data before encountering the null byte that would stop it.
This is not a security-critical issue.
CPython versions tested on:
CPython main branch
Operating systems tested on:
No response