gh-142145: Avoid timing measurements in quadratic behavior test #143105
Conversation
Count the number of Element attribute accesses as a proxy for work done. With double the amount of work, a ratio of 2.0 indicates linear scaling and 4.0 quadratic scaling. Use 3.2 as an intermediate threshold.
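For illustration, a minimal sketch of the counting approach (not the exact test from the patch; the helper name, node counts, and document shape here are illustrative). support.swap_attr from test.support temporarily replaces Element.__getattribute__ with a counting wrapper:

```python
from test import support
from xml.dom.minidom import Document, Element

def count_attribute_accesses(num_children):
    total_calls = 0

    def getattribute_counter(self, attr):
        nonlocal total_calls
        total_calls += 1
        return object.__getattribute__(self, attr)

    # Temporarily count every attribute access on Element instances.
    with support.swap_attr(Element, "__getattribute__", getattribute_counter):
        dom = Document()
        root = dom.appendChild(dom.createElement("root"))
        for _ in range(num_children):
            root.appendChild(dom.createTextNode("Hello"))
    dom.unlink()
    return total_calls

# Doubling the work should roughly double the count if appendChild() is
# linear; quadratic behavior roughly quadruples it. 3.2 splits the two.
w1 = count_attribute_accesses(1_000)
w2 = count_attribute_accesses(2_000)
assert w1 > 0              # guard against the counter silently breaking
assert w2 / w1 < 3.2
```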
colesbury commented Dec 23, 2025
The context for this is that I'm trying to get the full test suite in a state where we can run it under TSan. TSan tends to be much slower, so timing tests are even more flaky. Let me know what you think about this approach. For context, I used ChatGPT to generate the test and then edited it. (It was pretty good, but not perfect.)
sethmlarson commented Dec 23, 2025
Thanks @colesbury, did you happen to run this test with the commit in question reverted to see that this test would indeed catch quadratic behavior in the number of accesses?
colesbury commented Dec 23, 2025
Yes, if you revert the fix then it fails with:
sethmlarson left a comment
LGTM, thanks @colesbury!
Lib/test/test_minidom.py Outdated
@support.requires_resource('cpu')
def testAppendChildNoQuadraticComplexity(self):
    # Don't use wall-clock timing (too flaky). Instead count a proxy for the
Elide the model's explanatory comment about the old, no-longer-present code.
self.assertEqual(dom.documentElement.childNodes[-1].data, "Hello")
dom.unlink()
@support.requires_resource('cpu')
why remove this? i realize the test shouldn't be "slow" now, but the intent of adding it was to mark it as a test that may not be suitable for slow builds and loaded systems. some of our buildbots run with the cpu resource disabled for that reason. it's more of a performance test no matter how it is implemented.
for tests where it isn't platform specific and we just need something in our support tiers to catch an unlikely regression of the issue, resource tags save effort.
I'll add it back. I removed it because my understanding was that "cpu" was for tests that were CPU-heavy, and now this test is very fast (~30 ms in a debug build).
total_calls += 1
return object.__getattribute__(self, attr)
with support.swap_attr(Element, "__getattribute__", getattribute_counter):
doing this is... gross, but likely works. your assertGreater(w1, 0) below covers the case when it stops doing so. LGTM
the other thought i was having as i did the simple "raise the timeout to 4s" to unblock the earlier ones was that we don't need absolute times at all. measuring relative time taken as we increase the amount of work to ensure that it scales - similar to this scaling measurement of attribute accesses - would be sufficient and should produce similar results on super slow builds or heavily loaded systems. it would still be time based (so maybe use resource.getrusage(resource.RUSAGE_SELF).ru_utime instead of time APIs... though i suspect rusage is low-res) but far less "works on my system"
we have a smattering of other timing-based cpu performance-y tests in the repo. i don't think we've codified a reusable best practice.
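As a rough sketch of that relative-timing idea (assuming the Unix-only resource module and that ru_utime's resolution is adequate; the workload and input sizes below are placeholders, not from this PR):

```python
import resource  # Unix-only

def cpu_time():
    # User CPU time consumed by this process; unlike wall-clock time it
    # is largely unaffected by other processes competing for the machine.
    return resource.getrusage(resource.RUSAGE_SELF).ru_utime

def measure(workload, n):
    start = cpu_time()
    workload(n)
    return cpu_time() - start

def workload(n):
    # Placeholder for the code under test (e.g. an appendChild() loop);
    # this append loop is linear by construction.
    items = []
    for i in range(n):
        items.append(i)

# Doubling the input should roughly double the CPU time for linear code;
# a ratio near 4 indicates quadratic scaling. Inputs must be large enough
# that ru_utime's resolution does not dominate the measurement.
t1 = measure(workload, 2_000_000)
t2 = measure(workload, 4_000_000)
assert t1 > 0, "workload too small for ru_utime resolution"
assert t2 / t1 < 3.2
```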
> measuring relative time taken as we increase the amount of work ... would be sufficient
It'd be great to have some sort of standard best practice for this. If resource.getrusage(resource.RUSAGE_SELF).ru_utime isn't too affected by other processes, that would be good.
Some other things I was thinking about:
- Count cycles (via perf_event_open?)
- Count bytecodes executed (by instrumenting ceval.c in debug builds)
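For the bytecode-counting idea, a rough pure-Python approximation (not the ceval.c instrumentation suggested above) is opcode-level tracing via sys.settrace. It only sees Python-level bytecode, not work done in C, and slows execution considerably, but the counts are deterministic across machines:

```python
import sys

def count_opcodes(func, *args):
    # Count bytecode instructions executed while running func(*args).
    count = 0

    def tracer(frame, event, arg):
        nonlocal count
        if event == 'call':
            frame.f_trace_opcodes = True  # request 'opcode' events
        elif event == 'opcode':
            count += 1
        return tracer

    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(None)
    return count

def workload(n):
    total = 0
    for i in range(n):
        total += i
    return total

# For a linear function, doubling n roughly doubles the opcode count.
c1 = count_opcodes(workload, 1_000)
c2 = count_opcodes(workload, 2_000)
print(c1, c2, c2 / c1)
```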