Uh oh!
There was an error while loading.Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork8.1k
FIX: Move all tests using subprocess to the same worker on windows#29981
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Uh oh!
There was an error while loading.Please reload this page.
Conversation
dstansby commentedApr 28, 2025
Well it passes here 🎉 Comparing with latest main (that failed), the same number of tests were run, and the test time was about a minute slower here, which all checks out. |
timhoffm commentedApr 28, 2025
Trying to power-cycle, but apparently azure pipeline results are cached and not rerun 😞. |
timhoffm commentedApr 28, 2025 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
Ok second try (only reworded commit message to let azure believe this is new). Let's see if we can get this passing multiple times. |
timhoffm commentedApr 28, 2025
Second run also successful. Yet another rewording. Three's the charm. |
tacaswell commentedApr 29, 2025
4 errors (two look like timeouts, one looks like a failure to read back a png and the last is some random test later when the warning about unclosed files got raised during GC and pytest failed the test it was in). |
timhoffm commentedApr 29, 2025
One of the failing tests was not marked with the xdist_group. -> fix and retry. |
This is a somewhat wild guess based onmatplotlib#29797 (comment)> which makes me think we are somehow crossing state what launching the subprocesses.Using the `xdist_group` with `--dist=loadgroup` should put all tests ofthat group to the same worker according tohttps://pytest-xdist.readthedocs.io/en/stable/distribution.html.I've only added `--dist=loadgroup` to the windows pipelines, so tests onother systems are not affected at all.The first test is to see whether this works in the PR. - But to be sure,it think we would need to put this on master and monitor whether thetimeouts disappear.
timhoffm commentedApr 29, 2025
First test run after fix successful. Force-push to re-run a second time. |
tacaswell commentedApr 29, 2025
The annulus test that is failing is a bit pathological and there are two tests that share the same baseline image, I'm going to push a commit merging them. |
QuLogic commentedApr 29, 2025
There is a lock for re-used filenames, but it is only used if you re-use a filename in thesame test, so it is best to merge these, even if it would just be two figures in one test (which I don't think you did anyway.) |
lib/matplotlib/tests/test_patches.py Outdated
| ax.set_aspect('equal') | ||
| @pytest.mark.parametrize('mode', ('a','b')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
It seems likea is (setting by)axis andb isradius.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
yeah, that is what it is doing. I went with the most obvious to me solution, making two figures with the same name is a better solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
On third thought, I left it paramaterized, but gave better names (and re-ordered the code a bit to only have one if/elif).
tacaswell commentedApr 30, 2025
It is a bit aggressive to backport this PR, if anyone has the slightest concern lets not. |
tacaswell commentedApr 30, 2025
If we are correct that the problem is trying to launch simultaneous subprocesses from the pytest-xdist workers, do we have any theory as towhy this is a problem? This seems like something that should be safe. My best guess (I did look at the source of |
timhoffm commentedApr 30, 2025
I'm also not clear on the mechanics in the background, but the issues liked in#29797 (comment) indicate that there can be interplay between the subprocesses and timeout handling, and that could be specific to windows. It's a bit weird that it only(?) happens for Python 3.10/3.11. I'm inclined to limit the the |
timhoffm commentedApr 30, 2025
The recent change showed the timeout again. Can somebody with more pipeline configuration skills please verify whether my attempt to apply |
QuLogic commentedApr 30, 2025
I'm not sure that |
timhoffm commentedMay 1, 2025
I still have an echo statement in azure-pipelines.yml to veryify the code works as intended. It does. I'm letting the CI complete to get another data point on "this does not time out". When that's complete, I'll remove the echo. |
timhoffm commentedMay 1, 2025
Everything seems to work. Echo removed. |
Note: Some of them have been marked flaky - may be worth reconstructingwhether that was due to timeouts.
timhoffm commentedMay 1, 2025
Got a timeout, and found out there were more tests using subprocesses, which weren't yet in the xdist_group. Added them. |
timhoffm commentedMay 1, 2025
Let's put this on hold in favor of#29992. |
timhoffm commentedMay 5, 2025
Superseeded by#29992. |
Uh oh!
There was an error while loading.Please reload this page.
This is a somewhat wild guess based on#29797 (comment)
Using the
xdist_groupwith--dist=loadgroupshould put all tests of that group to the same worker according tohttps://pytest-xdist.readthedocs.io/en/stable/distribution.html. I've only added--dist=loadgroupto the windows pipelines, so tests on other systems are not affected at all.The first step is to see whether this works in the PR. - But to be sure, it think we would need to put this on master and monitor whether the timeouts disappear.