SSE alpha blitter optimization #3378


Merged: illume merged 2 commits into main from sse-alpha-blitter-enhancements on Sep 13, 2022

@Starbuck5 (Contributor)

Specifically in alphablit_alpha_sse2_argb_no_surf_alpha_opaque_dst, I found an opportunity to reduce the number of CPU clocks / instructions needed to move the alpha component from the src pixels into an interleaved 16-bit format.

In my testing, this took 10k 512x512 alpha blits (of this format) from around 2 seconds to around 1.6 seconds, a 15-20% improvement.

    import pygame
    import time

    pygame.init()

    screen = pygame.Surface((1920, 1080), depth=32)
    screen.fill((255, 0, 23))

    surf = pygame.Surface((512, 512), pygame.SRCALPHA)
    surf.fill((22, 156, 77, 192))

    print(screen, surf)

    start = time.time()
    for _ in range(10000):
        screen.blit(surf, (51, 67))
    print(time.time() - start)

This is my test program. Note that the "screen" surface has a bit depth of 32 but does not have per pixel alpha.
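To make the "interleaved 16-bit format" in the description concrete, here is a rough scalar sketch in Python of what a two-pixel blend step might look like. It is not pygame's implementation. The 0xAARRGGBB pixel layout, the helper names (`widen_channels`, `broadcast_alpha`, `blend_two_pixels`) and the textbook blend formula are all assumptions chosen for illustration, and the real SSE2 lane order may differ.

```python
def widen_channels(pixel):
    """Split one 0xAARRGGBB pixel (assumed layout) into four lanes: A, R, G, B.

    Each 8-bit channel gets its own 16-bit lane so that a later multiply
    by alpha cannot overflow.
    """
    return [(pixel >> shift) & 0xFF for shift in (24, 16, 8, 0)]


def broadcast_alpha(pixel):
    """Return the source alpha repeated once per lane of the same pixel."""
    alpha = (pixel >> 24) & 0xFF
    return [alpha] * 4


def blend_two_pixels(src_pair, dst_pair):
    """Blend two source pixels onto two destination pixels, lane by lane.

    Uses a textbook formula (an assumption, not necessarily pygame's):
        out = (src * a + dst * (255 - a)) // 255
    The destination's top byte is copied through unchanged, since the
    destination has no per-pixel alpha.
    """
    out = []
    for src, dst in zip(src_pair, dst_pair):
        s = widen_channels(src)
        d = widen_channels(dst)
        a = broadcast_alpha(src)
        lanes = [(sc * al + dc * (255 - al)) // 255
                 for sc, dc, al in zip(s, d, a)]
        lanes[0] = d[0]  # keep the destination's unused top byte
        out.append(lanes[0] << 24 | lanes[1] << 16 | lanes[2] << 8 | lanes[3])
    return out


# The fill colours from the test program, packed under the assumed layout.
SRC = 0xC0169C4D  # (22, 156, 77) with alpha 192
DST = 0x00FF0017  # (255, 0, 23), top byte unused
result = blend_two_pixels([SRC, SRC], [DST, DST])
print([hex(p) for p in result])
```

The step the PR speeds up corresponds to `broadcast_alpha`: getting each pixel's alpha into the lanes that line up with that pixel's colour channels.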

@Starbuck5 (Contributor, Author)

This depends on #3375, which should be merged first.

Saves a net 3 CPU instructions in each double pixel operation.
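As a back-of-the-envelope check, the figures quoted in this thread can be combined to estimate how much time each double-pixel operation saves. The sketch below assumes that every blit is unclipped, that the 512-pixel rows split evenly into pixel pairs, and that the rounded "around 2 seconds" and "around 1.6 seconds" timings are exact. It is therefore only an order-of-magnitude estimate.

```python
# Figures quoted in the PR description (timings are approximate).
WIDTH = HEIGHT = 512
BLITS = 10_000
TIME_BEFORE_S = 2.0
TIME_AFTER_S = 1.6

pixels_per_blit = WIDTH * HEIGHT

# Assumption: each loop iteration blends two pixels with no leftovers.
double_pixel_ops = (pixels_per_blit // 2) * BLITS

saved_s = TIME_BEFORE_S - TIME_AFTER_S
saved_ns_per_op = saved_s / double_pixel_ops * 1e9
time_reduction = saved_s / TIME_BEFORE_S

print(f"double-pixel operations: {double_pixel_ops:,}")
print(f"time saved per operation: {saved_ns_per_op:.3f} ns")
print(f"reduction in run time: {time_reduction:.0%}")
```

With these inputs the saving comes to roughly a third of a nanosecond per double-pixel operation. That is plausible for a handful of removed instructions, though the real figure depends on the CPU and on how the instructions pipeline.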
@MyreMylar (Contributor) left a comment

LGTM 👍

All seems to work in testing, and I was able to follow the logic of the changes. I added a couple of minor suggestions, but they don't change the actual code, just the comments around it, which might be helpful for future intrinsic dabblers.

🍰 🎉

@Starbuck5 (Contributor, Author)

Thanks for the suggestions and reviews, @MyreMylar.

There is now a much more in-depth comment. I also found a mistaken "16 byte" reference that should have been "16 bit".


@illume removed the "Awaiting Merge" label on Sep 13, 2022
    mm_alpha_mask_1),
    _mm_and_si128(_mm_srli_si128(src1, 5),
                  mm_alpha_mask_2));
    _mm_shufflelo_epi16(mm_src_alpha, 0b11110101);
Member

These are very helpful comments. Thanks.

I wonder if either of you has good docs in your code editor for these intrinsics?

It's possible to look them up, but it would be quicker if they were shown in the editor. https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_shufflelo_epi16&ig_expand=6448

@Starbuck5 (Contributor, Author)

Occasionally I see short text descriptions of the intrinsics in VS Code (pulled from the header, I believe), but mainly I just rely on the Intel Intrinsics Guide for these things.
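For readers without the guide at hand, the immediate in the `_mm_shufflelo_epi16(mm_src_alpha, 0b11110101)` call quoted earlier in this thread can be decoded by hand. The sketch below is a small Python emulation of the documented behaviour of that intrinsic: each 2-bit field of the immediate selects which of the four low 16-bit words lands in that position, and the four high words pass through untouched. The helper name `shufflelo_epi16` and the list-of-words representation are illustrative assumptions, not part of any library.

```python
def shufflelo_epi16(words, imm8):
    """Emulate _mm_shufflelo_epi16 on eight 16-bit words (index 0 = lowest).

    Bits [2i+1 : 2i] of imm8 pick the source word for low destination
    word i; words 4..7 are copied unchanged.
    """
    if len(words) != 8:
        raise ValueError("expected eight 16-bit words")
    low = [words[(imm8 >> (2 * i)) & 0b11] for i in range(4)]
    return low + list(words[4:])


# Label each word by its index to see where it ends up.
print(shufflelo_epi16(list(range(8)), 0b11110101))
# → [1, 1, 3, 3, 4, 5, 6, 7]
```

So with 0b11110101, word 1 is duplicated into positions 0 and 1, and word 3 into positions 2 and 3. That is one way to spread two values across adjacent 16-bit lanes in a single instruction.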

@illume added this to the 2.1.3 milestone on Sep 13, 2022

@illume (Member) left a comment

🎉🎈thanks

@illume merged commit bae1d72 into main on Sep 13, 2022
@illume deleted the sse-alpha-blitter-enhancements branch on September 13, 2022 10:16

Reviewers: @illume (approved), @MyreMylar (approved)

Labels: Performance, Surface (pygame.Surface)

Milestone: 2.1.3

4 participants: @Starbuck5, @illume, @MyreMylar
