CI: Disable numpy avx512 instructions #22099


Merged
timhoffm merged 2 commits into matplotlib:main from greglucas:numpy-cpu-avx512
Jan 4, 2022

Conversation

greglucas
Contributor

PR Summary

This disables the AVX512 instruction set at runtime within the CI using the NPY_DISABLE_CPU_FEATURES environment variable. This is due to small floating point differences causing test failures when using that instruction set with the numpy 1.22 wheels on the GitHub Actions runners.
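As a rough sketch of the mechanism (the specific feature names below are illustrative, not the exact set disabled in the workflow files): numpy parses NPY_DISABLE_CPU_FEATURES once, at import time, and excludes the listed features from its runtime dispatch.

```python
import os

# NPY_DISABLE_CPU_FEATURES is read when numpy is first imported, so it
# must be set (or exported by the CI runner) before the import below.
# These feature names are illustrative, not the exact CI configuration.
os.environ["NPY_DISABLE_CPU_FEATURES"] = "AVX512F AVX512CD AVX512_SKX"

import numpy as np  # dispatch now skips the listed instruction sets

print(np.__version__)
```

Note that on machines whose CPU does not support a listed feature, numpy emits a RuntimeWarning while parsing the variable, which is why the flag cannot be blindly passed to every subprocess-based test.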

PR Checklist

Tests and Styling

  • Has pytest style unit tests (and pytest passes).
  • Is Flake 8 compliant (install flake8-docstrings and run flake8 --docstring-convention=all).

Documentation

  • New features are documented, with examples if plot related.
  • New features have an entry in doc/users/next_whats_new/ (follow instructions in README.rst there).
  • API changes documented in doc/api/next_api_changes/ (follow instructions in README.rst there).
  • Documentation is sphinx and numpydoc compliant (the docs should build without error).

This removes the NPY_DISABLE_CPU_FEATURES flag from the sphinx and tk tests, as they emit warnings on CI which leads to failure from the subprocess. These don't need to be disabled on these tests, so remove them from the environment variables that are passed in.
@timhoffm added this to the v3.5.2 milestone on Jan 3, 2022
@jklymak
Member

This is pretty obscure. Does it make sense to just up the test tolerance instead? Though I think a couple of tests were quite wrong. Why did these flags have such a large effect?

@dstansby
Member

> This is pretty obscure. Does it make sense to just up the test tolerance instead? Though I think a couple of tests were quite wrong. Why did these flags have such a large effect?

I don't think any were unexpectedly wrong. I downloaded the failed images and all were minor changes apart from errorbar_mixed, which was a large change because (I presume) a small difference was causing the axes limit to jump from 1e-2 to 1e-3.

I think this is the most pragmatic way to go forward, without having to play whack-a-mole with test tolerances across the test suite.
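The kind of jump described above is easy to reproduce in isolation: a tiny relative perturbation near a decade boundary flips a floor-of-log10 limit calculation by a full decade. This is illustrative code, not matplotlib's actual autoscale implementation:

```python
import numpy as np

def decade_floor(vmin):
    # Round a positive lower limit down to the nearest power of ten,
    # the way a log-scale autoscaler might.
    return 10.0 ** np.floor(np.log10(vmin))

exact = 1e-2
nudged = exact * (1 - 1e-12)  # perturbation far below any visual tolerance

# The limit drops from 1e-2 to 1e-3 even though the data barely moved.
print(decade_floor(exact), decade_floor(nudged))
```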

@jklymak
Member

Oh I have the opposite point of view. If we have tests that are susceptible to compiler foibles maybe we should fix the tests or increase their tolerance, because there will always be more compiler foibles.

@greglucas
Contributor, Author

I agree this is a pretty big hammer, saying we won't test against certain floating point instruction sets. However, I assume we rely on numpy for some verification that their floating point calculations are "good enough" for them and this is impacting us because we are doing pixel-perfect comparisons with floating point inputs. I can see both arguments here, so I'm not sure which one people would be more generally comfortable with (increased tolerances, or reduced instruction sets).

There are only a few tests failing, so the tolerances on a few of them could probably be bumped, but for the errorbar one we should probably update the autoscaling in that test by bumping the np.minimum() call to add in a small epsilon, rather than adding a large tolerance.
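A minimal sketch of that suggestion, assuming a helper around the limit computation (the function name and epsilon value are hypothetical; the real change would live in the test's autoscaling path):

```python
import numpy as np

def padded_min(data, eps=1e-10):
    # Hypothetical helper: widen the computed minimum downward by a small
    # relative margin so one-ulp floating point noise cannot flip the
    # lower axes limit across a decade boundary.
    vmin = np.minimum.reduce(np.asarray(data, dtype=float))
    return vmin - abs(vmin) * eps
```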

@timhoffm
Member

timhoffm commented Jan 3, 2022 (edited)

We have to live with the fact that our dependencies may introduce minor variations. First and foremost we don't want brittle tests. There are two ways to get there:

  1. Make the test environment as reproducible as possible
  2. Increase tolerances

I argue that if we can easily get away with (1) that's better than (2). We have more control and don't blindly accept other changes as well. For the same reason we pin freetype or remove text overall.

@timhoffm
Member

I'll merge as is to unbreak the builds. @jklymak, if you think tolerances are the better approach we can discuss this in the next call, and then maybe change.


@jklymak
Member

Sure, I've added to the call agenda.


dstansby added a commit (…099-on-v3.5.x) that referenced this pull request on Jan 4, 2022: Backport PR #22099 on branch v3.5.x (CI: Disable numpy avx512 instructions)
@greglucas deleted the numpy-cpu-avx512 branch on January 4, 2022 at 15:23
@tacaswell
Member

I may have over-learned from the issue we had with "rendering the wrong glyph but tests are passing", but I am very, very concerned about bumping tolerances on anything but a per-test basis.

@jklymak
Member

Sure, I wasn't suggesting a general large tolerance. But here we have one test that had a large change because of floating point slop. So in my opinion that test should be made more robust. The other changes are pretty small, so upping those tolerances would also make sense to me.

@greglucas
Contributor, Author

Here is a link to a commit that would do what @jklymak suggests and increase the tolerances/dtypes. All of the changes were pretty small.
greglucas@c030e05
I tried to move all of the failing tests to use longdouble rather than increasing tolerances, but there ended up being some unsafe downcasting to float64 in the qhull procedures of trisurf.
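The precision loss from such a downcast is easy to demonstrate in isolation (illustrative only; on platforms where np.longdouble is no wider than float64, such as MSVC builds, the difference below is zero):

```python
import numpy as np

one_third = np.longdouble(1) / np.longdouble(3)
# A float64-only code path silently rounds away the extra mantissa bits:
downcast = np.longdouble(np.float64(one_third))
print(abs(one_third - downcast))  # nonzero wherever longdouble is wider
```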

@tacaswell
Member

I came across https://etna.math.kent.edu/vol.52.2020/pp358-369.dir/pp358-369.pdf recently, which has a case where increasing the precision of floats can make things worse (sometimes the rounding is actually in your favor!).

@tacaswell
Member

@greglucas, can you open a PR with those tolerances? They seem reasonable to me.

@greglucas
ContributorAuthor

Sure, see #22132.

That is an interesting finding! I didn't think about trying single precision instead in the update...

Reviewers

@timhoffm approved these changes

@dstansby approved these changes

Milestone
v3.5.2

5 participants
@greglucas @jklymak @dstansby @timhoffm @tacaswell
