Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

blitters fix compile error when no SIMD#3498

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Merged
illume merged 1 commit intomainfromankith26-simd-cleanups-fixes
Oct 16, 2022

Conversation

@ankith26
Copy link
Contributor

@ankith26ankith26 commentedOct 12, 2022
edited
Loading

I'm making this PR to tackle the ppc64le and s390x compile fails from#3496

Basically, after the recent SIMD changes, the code fails to compile whenboth avx2 and sse2 (or neon on ARM) are missing. Currently, the avx2 functions are always defined, but when avx2 is missing at compile time, the sse2 fallback is used without testing for (compile time) sse2 presence.

As usual, while I was looking into fixing this, the urge to do related cleanups was also strong 😓 and I ended up cleaning up some repetition, and the like. Without doing this, the changes I wanted to make would only increase the ifdef hell in those parts of the code and it was getting fairly hard to read. So I'm hoping this cleanup actually makes review easier 😉
EDIT: This has been reverted now, with an alternate fix in place

Starbuck5
Starbuck5 previously requested changesOct 12, 2022
Copy link
Contributor

@Starbuck5Starbuck5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Look, I know there are lots of opportunities for cleanups around the SIMD code, but please separate the essential changes from the cleanups in this PR.

@ankith26
Copy link
ContributorAuthor

If I do that, it would make the diff smaller/easier, but the contents of the alphablit file would get more messier as stated in the opening comment.

But if you insist, sure I'll separate the cleanups bits of this PR into a separate branch which I can PR post 2.1.3

@Starbuck5
Copy link
Contributor

Starbuck5 commentedOct 13, 2022
edited
Loading

But if you insist, sure I'll separate the cleanups bits of this PR into a separate branch which I can PR post 2.1.3

I do insist. There may be an easier way to do it than you're thinking, but I can't tell without seeing the actual changes on their own.

Thanks for working on fixing this! I just don't want to have to parse a 900 line diff when we're in a merge freeze.

@ankith26
Copy link
ContributorAuthor

Oh well, this PR is broken even in the current state. The main problem I just figured, is that we can't easily do compile-time AVX2 checks outside of files compiled with AVX2 (which ATM is onlysimd_blitters_avx2.c). To all other files__AVX2__ is not defined by the compiler.

@ankith26
Copy link
ContributorAuthor

ankith26 commentedOct 13, 2022
edited
Loading

I have now implemented a fix, but I'm not quite a fan. Now if AVX2 is not available at compile time, a runtime fallback is opened up. Due to this change, avx2 -> non-simd fallback is possible as well.

This is the smallest-diff solution I can come up with. Another idea involves restructuring many more things.

@ankith26
Copy link
ContributorAuthor

This PR has been testing on the ppc64le/s390x branch where the codebase now compiles after cherry picking this commit. Also locally tested on windows withblend_fill andblit_blends examples to make sure that the results are consistent with pre-PR. The performance doesn't seems to have been impacted, and the results are also the same

@Starbuck5
Copy link
Contributor

when avx2 is missing at compile time, the sse2 fallback is used without testing for sse2 presence.

I don't think this is true.

SDL_HasAVX2() is a runtime check. If AVX2 was missing at compile but present at runtime, it diverts to the SSE2 implementation, which is guaranteed to be present if runtime AVX2 is.

@ankith26
Copy link
ContributorAuthor

it diverts to the SSE2 implementation, which is guaranteed to be present if runtime AVX2 is.

Yeah, SSE2 is guaranteed to be present atruntime if we already confirmed avx2 is available at runtime but not compiled. But is it also guaranteed to be compiled in? This PR is a compilation error fix like I mentioned in my opening comment. The previous implementation assumed sse2 was presentat compile time when avx2 was missing atcompile time which is ofc, true for intel platforms of today where sse2 is almost universally available, but untrue for all the other platforms. It just didn't fail on our ARM CI because sse2neon header we use, so it only became apparent when I tried to add ppc64le and s360x CI. If you want to see the run that failed due to the issue this PR is fixing, it's thishttps://github.com/pygame/pygame/actions/runs/3232565797

@Starbuck5
Copy link
Contributor

Isn't the solution then to have the AVX2 blitter functions fall all the way back to the normal C implementations when SSE is also not available at compile time? Or have the AVX2 blitter functions compile to no ops? It's hard to imagine a system where it is compiled without SSE or AVX and then run with AVX.

@ankith26
Copy link
ContributorAuthor

Isn't the solution then to have the AVX2 blitter functions fall all the way back to the normal C implementations when SSE is also not available at compile time?

That's exactly what this PR is doing, but in thealphablit file and not the avx file. Because including the non-simd functions in the avx file would need more code reworking and generate a larger diff.

Or have the AVX2 blitter functions compile to no ops? It's hard to imagine a system where it is compiled without SSE or AVX and then run with AVX.

That is an alternative I considered. I also agree with the second part since it sounds like a really rare case, but it's probably notimpossible. If someone decides to compile pygame on an old 32-bit intel CPU that doesn't support sse2, or alternatively, an old compiler that does not have sse2 on by default and runs it on a modern CPU which has AVX2. In both these cases, having a warning and falling back to the non-simd path is better than having a no-op

@Starbuck5
Copy link
Contributor

Starbuck5 commentedOct 15, 2022
edited
Loading

I feel like there must be a better way.

Is there a reason that the AVX2 function calls couldn't be completely ifdefed out if AVX isn't defined at compile time like the SSE ones are?

Edit: oh, because AVX2 preprocessor is only defined in the avx2 blitter file, duh.

Edit edit: New idea, PG_HasAVX2() function defined in simd_blitters_avx2. It checks both the runtime presence of AVX support as well as whether pygame was compiled to support it. When AVX2 is not available at compile time, the avx2 blitter functions compile to no-ops. Using PG_HasAVX2() instead of SDL_HasAVX2() guarantees the no-ops will never be called, it always falls down to the next level (SSE, if available, then C) all in alphablit.c

Only thing we lose in that is the runtime warnings when using unoptimized pygame.

@ankith26
Copy link
ContributorAuthor

I feel like there must be a better way.

I know there are better ways, but you asked for a minimal diff and a safe PR and this is the best I can come up with :)

Your edit edit sounds very much like my original implementation, good idea except it didn't work. The alphablit file can't use macros defined in the avx2 C file. If those macros are defined in the header and the header is used by alphablit file,__AVX2__ is invisible there as well. The only "fix" for this I can thing of is compiling alphablit with avx2 at compile time, which is a bad and risky idea, definitely not worth doing just before a release. Probably someone like MyreMylar knows the details of this better, but all I know is that it can mess up unrelated stuffs, and is ofc, going to give a larger diff than this PR which you complained about in the first place 😆

@ankith26
Copy link
ContributorAuthor

ankith26 commentedOct 15, 2022
edited
Loading

So starbuck and I discussed this on discord and I properly understood the proposal he made in his latest comment, and I liked it as well (The key here is that we use a pygame avx checkfunction and not a macro. This function returns 0 when avx is not compiled in)

I've also moved the simd warnings to surface init (and also added some warnings for sse2 and neon) in a separate commit, which I'm open to postponing to a future PR if other reviewers would prefer
EDIT: I've moved that to a separate PR

Copy link
Contributor

@MyreMylarMyreMylar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

OK, I chatted about this with@ankith26 on discord and understand the original compiler errors and the resulting change now.

It is basically a sidestep around an issue with some badly scoped #else define blocks (by me), these just did not work when__SSE2__ was not defined at compile time.

Previously we used a fallback in the case ofruntime == AVX2 but compiled with only SSE2. Now we just force the runtime checks to fail with compiler defines if there is no compile time support for AVX2 or SSE2 so the code branches down the right path earlier on rather than in the AVX2 file.

Overall, I think it is a simpler solution that should scale better if we add AVX512 or beyond support in the future. Probably reduces the chance of any rogue AVX2 machine code escaping the avx2.c file too.

ankith26 reacted with hooray emoji
@illume
Copy link
Member

Cool this should fix ppc64le and s390x.

I wonder how you tested this? Also wondered about adding tests to gh actions...

The intel emulator (for win, linux, mac) is quite nice and seems very convenient for testing.
https://stackoverflow.com/questions/55762372/disabling-avx2-in-cpu-for-testing-purposes

Theoretically, this should work to test no-avx2 available:

$ sde64 -snb -- python -m pygame.tests

Dunno if it'll work on github actions.

Qemu can take params like-cpu host,-sse4.1,-sse4.2

qemu-system-x86_64 -enable-kvm -cpu qemu64,+ssse3,+sse4.1,+sse4.2

https://ahelpme.com/howto/qemu-full-virtualization-cpu-emulations-enable-disable-cpu-flags-instruction-sets/

Also saw Virtualbox (and vagrant) can disable instruction sets.

> VBoxManage.exe setextradata"Dev" VBoxInternal/CPUM/IsaExts/AVX2 0

Vagrant:

Vagrant.configure('2') do |config|  config.vm.provider :virtualbox do |vb|    vb.customize ['setextradata', :id, 'VBoxInternal/CPUM/IsaExts/AVX', '0']    vb.customize ['setextradata', :id, 'VBoxInternal/CPUM/IsaExts/AVX2', '0']    vb.customize ['modifyvm', :id, '--ioapic', 'on']  endend
ankith26 reacted with rocket emoji

@ankith26
Copy link
ContributorAuthor

ankith26 commentedOct 16, 2022
edited
Loading

I have a WIP PR adding s390x and ppc64le to CI (which I'm confident of getting working) in#3496 (that PR prompted this PR in the first place)

I tested this PR by cherry picking a commit on there (but now that branch has an older version of this PR)

I could also test the no-avx2 and no-sse2 cases on windows by manually undef-ing the relevant macros.

illume reacted with thumbs up emoji

@illumeillume added this to the2.1.3 milestoneOct 16, 2022
@illumeillume added the bug labelOct 16, 2022
@ankith26
Copy link
ContributorAuthor

That info about vagrant and qemu is definitely useful and interesting, thanks! I have had issues with qemu locally in the past, but never tried vagrant yet

@illume
Copy link
Member

Cool.

I'll add a new issue about having tests for different SIMD instructions being available. Probably running the tests an extra time no AVX2 under the intel sde64 emulator on each of (mac, linux, windows) is enough.

Copy link
Member

@illumeillume left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Hopefully everyone is ok with this now. Mergey mergey.

Thanks@ankith26 :)

@illumeillume merged commit559b013 intomainOct 16, 2022
@illumeillume deleted the ankith26-simd-cleanups-fixes branchOctober 16, 2022 20:56
Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment

Reviewers

@illumeillumeillume approved these changes

+2 more reviewers

@MyreMylarMyreMylarMyreMylar approved these changes

@Starbuck5Starbuck5Starbuck5 left review comments

Reviewers whose approvals may not affect merge requirements

Assignees

No one assigned

Labels

bugSurfacepygame.Surface

Projects

None yet

Milestone

2.1.3

Development

Successfully merging this pull request may close these issues.

5 participants

@ankith26@Starbuck5@illume@MyreMylar

[8]ページ先頭

©2009-2025 Movatter.jp