gh-132042: Prebuild mro_dict for find_name_in_mro to speedup class creation #132618
base: main
Conversation
All optimizations combined give about a 40% speedup on tests.
Would you mind running benchmarks on this PR?
First, rebase the PR on the main branch to pick up the "Do not lookup tp_dict each time to speedup class creation" change.
Yes, but I can do it on the weekend (at least I will try later today on my traveling laptop, but the results will not be the same as from the original PR).
sergey-miryanov commented Apr 18, 2025 • edited
@vstinner Benchmark results: Details
There are two mro columns, one for each of the two runs. I believe the results are not very stable due to throttling (I ran the benchmarks on a MacBook Retina 2013 under Windows, CPU: i5-4258U @ 2.40GHz).
@vstinner Please take a look.
sergey-miryanov commented Apr 20, 2025 • edited
Updated benchmarks (run on a Windows 11 x64 desktop, CPU: 11th Gen Intel(R) Core(TM) i5-11600K @ 3.90GHz): Details
…hile type initialized: in the old implementation this exception was swallowed and this base was not checked when searching the MRO, so this change does not alter observed behavior.
Updated results
LGTM. I just have a remark on a comment.
I'm surprised that the latest benchmark run no longer shows slower tests.
Co-authored-by: Victor Stinner <vstinner@python.org>
Yeah, ref2 became a bit slower than ref (ref was run on main 3 days ago, ref2 on the current main). Details
Merged with main due to conflicts after #131174
Updated benchmarks against main (a bit slower again; ref3 is today's main)
This PR: 1.20x faster
I also benchmarked with free threading (FT):
main vs main-ft (34% slower)
main-ft vs PR-ft (20% faster)
Yhg1s commented Apr 29, 2025 • edited
FYI, I've scheduled full benchmark runs (regular and free-threaded) on https://github.com/facebookexperimental/free-threading-benchmarking. (It'll be a couple of hours before they show up.) I'm a little concerned about the need for the test change, but I guess it's rare enough to have custom
sergey-miryanov commented Apr 29, 2025 • edited
Thanks for benchmarking. I'm looking forward to the results :)
IIUC,
This is an interesting optimization and worth considering IMHO. However, I am also concerned about unexpected behavioral differences caused by it. Doing a merge on each type dict is not the same as doing a dict lookup on each slot. As the broken unit test shows, you can have code that executes on the dict lookup and that can change the result of the operation (e.g. the hash method on a name value). To be fair, code that does that is probably "cursed" and deserves to break. However, I'd be a little worried about applying this optimization in the 3.14 release. Maybe we should defer and revisit in the 3.15 cycle?
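To illustrate the behavioral concern above, here is a Python-level sketch (not CPython's C code; `NoisyKey` is a made-up class for demonstration): a dict key with a side-effecting `__eq__` shows how merging namespaces can run user code that a per-name lookup with a plain string never would.

```python
calls = []

class NoisyKey:
    """Hypothetical dict key whose __eq__ has a visible side effect."""
    def __init__(self, label):
        self.label = label

    def __hash__(self):
        return hash(self.label)

    def __eq__(self, other):
        calls.append(self.label)  # user code runs on key comparison
        if isinstance(other, NoisyKey):
            return self.label == other.label
        return NotImplemented

base_ns = {NoisyKey("x"): 1, "attr": 2}   # stands in for a base's __dict__
child_ns = {NoisyKey("x"): 3}             # stands in for a subclass's __dict__

calls.clear()
# Per-name lookup with a str key: NoisyKey.__eq__ is only consulted if the
# full hashes collide, which effectively never happens for distinct strings.
assert base_ns.get("attr") == 2
assert calls == []

# Merging the namespaces, as a prebuilt MRO dict must, hits the collision
# between the two NoisyKey("x") instances and invokes __eq__:
merged = dict(child_ns)
merged.update(base_ns)
assert calls == ["x"]
```

Code like this is exactly the "cursed" case mentioned: the merge-based approach hashes and compares every key eagerly at type-creation time, while the per-slot lookup only ever touches the names actually probed.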
sergey-miryanov commentedApr 30, 2025 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
Thanks! I'm crying! 😿
Yeah, I agree with you. In the initial version I just stopped building the MRO dict and fell back to the original version of
It is up to the core devs; I just write code a bit :)
nascheme commented Apr 30, 2025 • edited
Don't cry, this is the nature of doing performance optimizations of Python. The low-hanging fruit is pretty much gone, so optimizations are either quite complex or they only benefit a specific subset of code. Just because the pyperformance benchmarks don't show a win doesn't mean this is not worth pursuing. I'm sure some code out there does create more type objects and would see a significant benefit. So it's a matter of how costly the optimization is, in terms of code maintenance and risk of changing behavior (breaking currently working programs). In this case, the maintenance cost looks pretty minor and the implementation is not complex. It's the change in behavior that's the concern, and I think we are too close to the beta of 3.14 to risk it. Again, I think this is worth revisiting, and I hope you are not discouraged from looking for other kinds of optimizations.
@nascheme Thank you for the kind words!
sergey-miryanov commented May 12, 2025 • edited
I have built a table with results like in PR #132156. The following tests call
The only 'expected' 'faster' result is for
So, I repeat myself: IMHO this set of benchmarks is not suitable to catch the differences from this PR.
This is one of the optimizations from #132156, moved to a separate PR.
All three optimizations from the original PR give about a 40% speedup on tests.
This optimization alone gives about a 15%-18% speedup.
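The idea behind the PR can be sketched at the Python level (this is an analogue for illustration, not the C implementation; the function names mirror the PR's terminology, and `SLOT_NAMES` is an illustrative subset): type creation looks up many slot names in the MRO, so instead of walking every base's `__dict__` once per name, build one merged dict up front and do a single lookup per name.

```python
SLOT_NAMES = ["__init__", "__repr__", "__hash__", "__eq__"]  # illustrative subset

def find_name_in_mro(cls, name):
    """Per-name walk over the MRO (the existing per-lookup approach)."""
    for base in cls.__mro__:
        if name in base.__dict__:
            return base.__dict__[name]
    return None

def build_mro_dict(cls):
    """Prebuild one merged dict: later bases first, so earlier bases win."""
    merged = {}
    for base in reversed(cls.__mro__):
        merged.update(base.__dict__)
    return merged

class A:
    def __repr__(self):
        return "A"

class B(A):
    pass

# One merge up front, then each slot lookup is a single dict probe.
mro_dict = build_mro_dict(B)
for name in SLOT_NAMES:
    assert mro_dict.get(name) is find_name_in_mro(B, name)
```

The trade-off discussed in this thread follows directly from the sketch: the merge touches every key of every base's dict once at type creation, which is cheaper overall when many slot names are probed, but it can execute key hashing/comparison code that the per-name walk would never reach.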