- Notifications
You must be signed in to change notification settings - Fork3.8k
blitters fix compile error when no SIMD#3498
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Uh oh!
There was an error while loading.Please reload this page.
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Look, I know there are lots of opportunities for cleanups around the SIMD code, but please separate the essential changes from the cleanups in this PR.
If I do that, it would make the diff smaller/easier, but the contents of the alphablit file would get more messier as stated in the opening comment. But if you insist, sure I'll separate the cleanups bits of this PR into a separate branch which I can PR post 2.1.3 |
Starbuck5 commentedOct 13, 2022 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
I do insist. There may be an easier way to do it than you're thinking, but I can't tell without seeing the actual changes on their own. Thanks for working on fixing this! I just don't want to have to parse a 900 line diff when we're in a merge freeze. |
Oh well, this PR is broken even in the current state. The main problem I just figured, is that we can't easily do compile-time AVX2 checks outside of files compiled with AVX2 (which ATM is only |
ankith26 commentedOct 13, 2022 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
I have now implemented a fix, but I'm not quite a fan. Now if AVX2 is not available at compile time, a runtime fallback is opened up. Due to this change, avx2 -> non-simd fallback is possible as well. This is the smallest-diff solution I can come up with. Another idea involves restructuring many more things. |
This PR has been testing on the ppc64le/s390x branch where the codebase now compiles after cherry picking this commit. Also locally tested on windows with |
I don't think this is true. SDL_HasAVX2() is a runtime check. If AVX2 was missing at compile but present at runtime, it diverts to the SSE2 implementation, which is guaranteed to be present if runtime AVX2 is. |
Yeah, SSE2 is guaranteed to be present atruntime if we already confirmed avx2 is available at runtime but not compiled. But is it also guaranteed to be compiled in? This PR is a compilation error fix like I mentioned in my opening comment. The previous implementation assumed sse2 was presentat compile time when avx2 was missing atcompile time which is ofc, true for intel platforms of today where sse2 is almost universally available, but untrue for all the other platforms. It just didn't fail on our ARM CI because sse2neon header we use, so it only became apparent when I tried to add ppc64le and s360x CI. If you want to see the run that failed due to the issue this PR is fixing, it's thishttps://github.com/pygame/pygame/actions/runs/3232565797 |
Isn't the solution then to have the AVX2 blitter functions fall all the way back to the normal C implementations when SSE is also not available at compile time? Or have the AVX2 blitter functions compile to no ops? It's hard to imagine a system where it is compiled without SSE or AVX and then run with AVX. |
That's exactly what this PR is doing, but in the
That is an alternative I considered. I also agree with the second part since it sounds like a really rare case, but it's probably notimpossible. If someone decides to compile pygame on an old 32-bit intel CPU that doesn't support sse2, or alternatively, an old compiler that does not have sse2 on by default and runs it on a modern CPU which has AVX2. In both these cases, having a warning and falling back to the non-simd path is better than having a no-op |
Starbuck5 commentedOct 15, 2022 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
I feel like there must be a better way. Is there a reason that the AVX2 function calls couldn't be completely ifdefed out if AVX isn't defined at compile time like the SSE ones are? Edit: oh, because AVX2 preprocessor is only defined in the avx2 blitter file, duh. Edit edit: New idea, PG_HasAVX2() function defined in simd_blitters_avx2. It checks both the runtime presence of AVX support as well as whether pygame was compiled to support it. When AVX2 is not available at compile time, the avx2 blitter functions compile to no-ops. Using PG_HasAVX2() instead of SDL_HasAVX2() guarantees the no-ops will never be called, it always falls down to the next level (SSE, if available, then C) all in alphablit.c Only thing we lose in that is the runtime warnings when using unoptimized pygame. |
I know there are better ways, but you asked for a minimal diff and a safe PR and this is the best I can come up with :) Your edit edit sounds very much like my original implementation, good idea except it didn't work. The alphablit file can't use macros defined in the avx2 C file. If those macros are defined in the header and the header is used by alphablit file, |
ankith26 commentedOct 15, 2022 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
So starbuck and I discussed this on discord and I properly understood the proposal he made in his latest comment, and I liked it as well (The key here is that we use a pygame avx checkfunction and not a macro. This function returns 0 when avx is not compiled in) I've also moved the simd warnings to surface init (and also added some warnings for sse2 and neon) in a separate commit, which I'm open to postponing to a future PR if other reviewers would prefer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
OK, I chatted about this with@ankith26 on discord and understand the original compiler errors and the resulting change now.
It is basically a sidestep around an issue with some badly scoped #else define blocks (by me), these just did not work when__SSE2__ was not defined at compile time.
Previously we used a fallback in the case ofruntime == AVX2 but compiled with only SSE2. Now we just force the runtime checks to fail with compiler defines if there is no compile time support for AVX2 or SSE2 so the code branches down the right path earlier on rather than in the AVX2 file.
Overall, I think it is a simpler solution that should scale better if we add AVX512 or beyond support in the future. Probably reduces the chance of any rogue AVX2 machine code escaping the avx2.c file too.
Cool this should fix ppc64le and s390x. I wonder how you tested this? Also wondered about adding tests to gh actions... The intel emulator (for win, linux, mac) is quite nice and seems very convenient for testing. Theoretically, this should work to test no-avx2 available: $ sde64 -snb -- python -m pygame.tests Dunno if it'll work on github actions. Qemu can take params like
Also saw Virtualbox (and vagrant) can disable instruction sets. > VBoxManage.exe setextradata"Dev" VBoxInternal/CPUM/IsaExts/AVX2 0 Vagrant: |
ankith26 commentedOct 16, 2022 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
I have a WIP PR adding s390x and ppc64le to CI (which I'm confident of getting working) in#3496 (that PR prompted this PR in the first place) I tested this PR by cherry picking a commit on there (but now that branch has an older version of this PR) I could also test the no-avx2 and no-sse2 cases on windows by manually undef-ing the relevant macros. |
That info about vagrant and qemu is definitely useful and interesting, thanks! I have had issues with qemu locally in the past, but never tried vagrant yet |
Cool. I'll add a new issue about having tests for different SIMD instructions being available. Probably running the tests an extra time no AVX2 under the intel sde64 emulator on each of (mac, linux, windows) is enough. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Hopefully everyone is ok with this now. Mergey mergey.
Thanks@ankith26 :)
Uh oh!
There was an error while loading.Please reload this page.
I'm making this PR to tackle the ppc64le and s390x compile fails from#3496
Basically, after the recent SIMD changes, the code fails to compile whenboth avx2 and sse2 (or neon on ARM) are missing. Currently, the avx2 functions are always defined, but when avx2 is missing at compile time, the sse2 fallback is used without testing for (compile time) sse2 presence.
As usual, while I was looking into fixing this, the urge to do related cleanups was also strong 😓 and I ended up cleaning up some repetition, and the like. Without doing this, the changes I wanted to make would only increase the ifdef hell in those parts of the code and it was getting fairly hard to read. So I'm hoping this cleanup actually makes review easier 😉EDIT: This has been reverted now, with an alternate fix in place