Uh oh!
There was an error while loading.Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork33.7k
Description
PyPy received the following performance bug today:https://foss.heptapod.net/pypy/pypy/-/issues/3961
Somebody who was trying to process a lot of emails from an mbox file was complaining about terrible performance on PyPy. The problem turned out to be fact thatemail.feedparser.FeedParser._parsegen is compiling a new regular expression for every multipart message in the mbox file. On PyPy this is particularly bad, because those regular expressions are jitted and that costs even more time. However, even on CPython compiling these regular expressions takes a noticeable portion of the benchmark.
Ifixed this problem in PyPy by simply usingstr.startswith with the multipart separator, followed by a generic regular expression that can be used for arbitrary boundaries. In PyPy this helps massively, but in CPython it's still a 20% performance improvement. Will open a PR for it.