Tracking SSE2 Optimisations #3370


Merged
Starbuck5 merged 4 commits into pygame:main from PurityLake:issue-3358
Nov 5, 2022

@PurityLake (Contributor) commented Aug 2, 2022 (edited)

Refers to #3358

  • Copy Starbuck's optimization into simd_blitters_sse2.c
  • Find a way to use SSE2 Blitters explicitly
  • Benchmark SSE2 code

This PR tracks progress on optimising the SSE2 blitters, based on Starbuck's AVX optimisations.

The optimisations are currently implemented for SSE2.

@Starbuck5 (Contributor) commented

I just tested how to use the SSE blitters on my recent PR. If you put `#undef __AVX2__` at the top of simd_blitters_avx2.c, it will compile without the AVX blitters so you can test the SSE ones.


@PurityLake (Contributor, Author) commented Oct 9, 2022 (edited)

Did some testing by blitting a surface 10000 times with various `special_flags` values passed to `Surface.blit`. As mentioned by @Starbuck5, using `#undef __AVX2__` in simd_blitters_avx2.c disables the AVX2 code and runs the SSE2 code in its stead.

Here are the results:

SSE2 without optimisations:

    Testing BLEND_RGBA_ADD: 9.718885699985549
    Testing BLEND_RGB_ADD: 9.687931499996921
    Testing BLEND_RGBA_MULT: 22.63133849998121
    Testing BLEND_RGB_MULT: 20.664427600015188
    Testing BLEND_RGBA_SUB: 9.188310300000012
    Testing BLEND_RGB_SUB: 9.442354999977397
    Testing BLEND_RGBA_MAX: 8.739872300007846
    Testing BLEND_RGB_MAX: 9.480382699985057
    Testing BLEND_RGBA_MIN: 8.817215299990494
    Testing BLEND_RGB_MIN: 9.041105799988145

SSE2 with optimisations:

    Testing BLEND_RGBA_ADD: 6.040507300000172
    Testing BLEND_RGB_ADD: 6.040672200004337
    Testing BLEND_RGBA_MULT: 6.910767900000792
    Testing BLEND_RGB_MULT: 6.879146999999648
    Testing BLEND_RGBA_SUB: 5.997558699978981
    Testing BLEND_RGB_SUB: 6.124182700004894
    Testing BLEND_RGBA_MAX: 6.010527999984333
    Testing BLEND_RGB_MAX: 6.077654499997152
    Testing BLEND_RGBA_MIN: 6.150629800016759
    Testing BLEND_RGB_MIN: 6.141219600016484

Code used in testing:

    import random
    from timeit import Timer

    import pygame
    from pygame.locals import *

    pygame.init()

    width = 800
    height = 600
    screen = pygame.display.set_mode((width, height))
    surface = pygame.image.load("Pygame.png").convert_alpha()

    # Map a printable name to each blend flag under test.
    blend_types = {
        "BLEND_RGBA_ADD": BLEND_RGBA_ADD,
        "BLEND_RGB_ADD": BLEND_RGB_ADD,
        "BLEND_RGBA_MULT": BLEND_RGBA_MULT,
        "BLEND_RGB_MULT": BLEND_RGB_MULT,
        "BLEND_RGBA_SUB": BLEND_RGBA_SUB,
        "BLEND_RGB_SUB": BLEND_RGB_SUB,
        "BLEND_RGBA_MAX": BLEND_RGBA_MAX,
        "BLEND_RGB_MAX": BLEND_RGB_MAX,
        "BLEND_RGBA_MIN": BLEND_RGBA_MIN,
        "BLEND_RGB_MIN": BLEND_RGB_MIN,
    }

    BLITS_TO_DO = 100000
    positions = [
        (random.randint(0, width - 50), random.randint(0, height - 50))
        for _ in range(BLITS_TO_DO)
    ]


    def do_the_blits(item):
        # positions only sets the number of blits; every blit lands at (50, 50).
        for pos in positions:
            screen.blit(surface, (50, 50), special_flags=item)


    # Time one full pass of BLITS_TO_DO blits per blend flag.
    for key, value in blend_types.items():
        print(f"Testing {key}:")
        print(Timer(lambda: do_the_blits(value)).timeit(number=1))
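A single `timeit(number=1)` pass per flag can be skewed by whatever else the machine is doing at the time. One possible refinement, sketched here with only the standard library, is to repeat each measurement and keep the fastest run. The helper name `best_of` and the stand-in workload are hypothetical and not part of the PR; in the benchmark script the workload would be `lambda: do_the_blits(value)`.

```python
import timeit


def best_of(func, repeats=5):
    """Call func once per repeat and return the fastest wall time in seconds."""
    # timeit.repeat returns one total per repeat; number=1 means one call each.
    times = timeit.repeat(func, number=1, repeat=repeats)
    return min(times)


def stand_in_workload():
    # Hypothetical placeholder for do_the_blits(flag); it only burns some CPU.
    return sum(i * i for i in range(10_000))


if __name__ == "__main__":
    print(f"best of 5: {best_of(stand_in_workload):.6f} s")
```

Taking the minimum rather than the mean follows the advice in the `timeit` documentation: slower runs usually reflect interference from other processes, not the code under test.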

@Starbuck5 (Contributor) commented

PurityLake's test numbers as percentage improvements:

    Testing BLEND_RGBA_ADD: 46.6817%
    Testing BLEND_RGB_ADD: 46.3774%
    Testing BLEND_RGBA_MULT: 106.428%
    Testing BLEND_RGB_MULT: 100.098%
    Testing BLEND_RGBA_SUB: 42.0226%
    Testing BLEND_RGB_SUB: 42.6321%
    Testing BLEND_RGBA_MAX: 37.0071%
    Testing BLEND_RGB_MAX: 44.7986%
    Testing BLEND_RGBA_MIN: 35.6309%
    Testing BLEND_RGB_MIN: 38.2008%

This is better than I was expecting. I'm especially suspicious of the MULT ones though, they seem too good to be true. I'll also see if I can replicate these testing numbers locally.
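The percentages quoted in this comment are consistent with a symmetric percent difference, i.e. the gap between the two timings divided by their mean, rather than the time saved relative to the old timing. That reading is inferred from the numbers, not stated in the thread. A minimal sketch (both function names are hypothetical) showing the two conventions for the BLEND_RGBA_ADD pair:

```python
def percent_difference(before, after):
    """Gap between two timings as a percentage of their mean."""
    return (before - after) / ((before + after) / 2) * 100


def percent_reduction(before, after):
    """Time saved as a percentage of the original timing."""
    return (before - after) / before * 100


# BLEND_RGBA_ADD timings from PurityLake's two SSE2 runs in this thread.
before, after = 9.718885699985549, 6.040507300000172

print(f"percent difference: {percent_difference(before, after):.4f}%")
print(f"percent reduction:  {percent_reduction(before, after):.4f}%")
```

Under this reading the first function gives about 46.68% for BLEND_RGBA_ADD, matching the quoted figure, while the time saved relative to the unoptimised run is smaller. It also explains how a value above 100% can appear for the MULT flags.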

@Starbuck5 (Contributor) commented

My Test Data

I used a 460 x 261 image as my surface basis.

Main (AVX):

    Testing BLEND_RGBA_ADD: 1.2763061
    Testing BLEND_RGB_ADD: 1.3237208999999996
    Testing BLEND_RGBA_MULT: 2.0318178999999996
    Testing BLEND_RGB_MULT: 2.0239459
    Testing BLEND_RGBA_SUB: 1.2739619999999992
    Testing BLEND_RGB_SUB: 1.3502557999999993
    Testing BLEND_RGBA_MAX: 1.2691035
    Testing BLEND_RGB_MAX: 1.3137679999999996
    Testing BLEND_RGBA_MIN: 1.2735774000000006
    Testing BLEND_RGB_MIN: 1.2812029000000003

Main (SSE):

    Testing BLEND_RGBA_ADD: 2.1543634
    Testing BLEND_RGB_ADD: 2.2162211999999997
    Testing BLEND_RGBA_MULT: 5.3097915
    Testing BLEND_RGB_MULT: 5.2853055000000015
    Testing BLEND_RGBA_SUB: 2.1399480000000004
    Testing BLEND_RGB_SUB: 2.2092721999999974
    Testing BLEND_RGBA_MAX: 2.1506679999999996
    Testing BLEND_RGB_MAX: 2.1953232000000007
    Testing BLEND_RGBA_MIN: 2.1411617000000014
    Testing BLEND_RGB_MIN: 2.1961531

This PR (SSE):

    Testing BLEND_RGBA_ADD: 1.7535319
    Testing BLEND_RGB_ADD: 1.8411389000000002
    Testing BLEND_RGBA_MULT: 4.414888299999999
    Testing BLEND_RGB_MULT: 4.397666000000001
    Testing BLEND_RGBA_SUB: 1.7444793
    Testing BLEND_RGB_SUB: 1.8644549999999995
    Testing BLEND_RGBA_MAX: 1.7534880999999984
    Testing BLEND_RGB_MAX: 1.8469790999999987
    Testing BLEND_RGBA_MIN: 1.7324574999999989
    Testing BLEND_RGB_MIN: 1.8187704999999994

Percent improvement in SSE blit mode performance in this PR over main:

    Testing BLEND_RGBA_ADD: 20.513932397318825%
    Testing BLEND_RGB_ADD: 18.48898252831931%
    Testing BLEND_RGBA_MULT: 18.40478490613132%
    Testing BLEND_RGB_MULT: 18.334031035824083%
    Testing BLEND_RGBA_SUB: 20.361750624088153%
    Testing BLEND_RGB_SUB: 16.9288311696472%
    Testing BLEND_RGBA_MAX: 20.34651739462986%
    Testing BLEND_RGB_MAX: 17.234935645461356%
    Testing BLEND_RGBA_MIN: 21.10192968890708%
    Testing BLEND_RGB_MIN: 18.79899283762215%

In my testing, I see that this PR achieves a 15-20% performance improvement with these blit modes over main (when just using SSE), as expected. I believe @PurityLake forgot to turn AVX2 off for one of their tests.
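Comparing runs by hand gets tedious with ten flags per run. As a rough sketch, the benchmark output can be parsed into a dictionary and two runs compared flag by flag. The helper names `parse_timings` and `compare` are hypothetical, and the regular expression assumes the `Testing NAME:` label is followed by a plain decimal number, whether on the same line or the next.

```python
import re

# "Testing <FLAG>:" followed by optional whitespace/newline and a decimal number.
TIMING_RE = re.compile(r"Testing (\w+):\s*(\d+(?:\.\d+)?)")


def parse_timings(output):
    """Turn benchmark output into {flag_name: seconds}."""
    return {name: float(value) for name, value in TIMING_RE.findall(output)}


def compare(baseline, candidate):
    """Return {flag_name: baseline_time / candidate_time} for shared flags."""
    return {
        name: baseline[name] / candidate[name]
        for name in baseline
        if name in candidate
    }


# Two flags from the "Main (SSE)" and "This PR (SSE)" runs in this thread.
main_sse = parse_timings("""
Testing BLEND_RGBA_ADD:
2.1543634
Testing BLEND_RGBA_MULT:
5.3097915
""")
pr_sse = parse_timings("""
Testing BLEND_RGBA_ADD:
1.7535319
Testing BLEND_RGBA_MULT:
4.414888299999999
""")

for name, ratio in compare(main_sse, pr_sse).items():
    print(f"{name}: {ratio:.2f}x faster")
```

A ratio well above the others for one run, as with the MULT flags in the earlier numbers, is a hint that the two runs were not built with the same SIMD path enabled.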

@Starbuck5 (Contributor) left a comment

This looks good to me, thanks!

@MyreMylar (Contributor) left a comment

LGTM 👍


Reviewers: @MyreMylar (approved), @Starbuck5 (approved)

Assignees: No one assigned

Labels: Performance (related to the speed or resource usage of the project), Surface (pygame.Surface)

Projects: None yet

Milestone: 2.2.0

Linked issue (may be closed by merging this pull request): Efficiently track pixels in SSE blitters

Participants: @PurityLake, @Starbuck5, @MyreMylar
