SSE alpha blitter optimization #3378
This depends on #3375, which should be merged first.
Saves a net 3 CPU instructions in each double pixel operation.
LGTM 👍
All seems to work in testing and I was able to follow the logic of the changes. I added a couple of minor suggestions, but they don't change the actual code, just the comments around it, which might be helpful for future intrinsic dabblers.
🍰 🎉
Thanks for the suggestions and reviews, @MyreMylar. There is now a much more in-depth comment. I also found a mistaken "16 byte" reference that should have been "16 bit".
mm_alpha_mask_1),
_mm_and_si128(_mm_srli_si128(src1,5),
mm_alpha_mask_2));
_mm_shufflelo_epi16(mm_src_alpha,0b11110101);
These are very helpful comments. Thanks.
I wonder if either of you have good docs in your code editor for these intrinsics?
It's possible to look them up, but it would be quicker if it's shown in the editor. https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_shufflelo_epi16&ig_expand=6448
Occasionally I see short text descriptions of the intrinsics in VS Code (pulled from the header, I believe), but mainly I just rely on the Intel Intrinsics Guide for these things.
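For anyone reading along without the guide open, here is a rough scalar model of what `_mm_shufflelo_epi16` does with the `0b11110101` (0xF5) control byte from the line above. It is only an illustrative sketch, not code from this PR: `model_shufflelo_epi16` and `shufflelo_f5_check` are made-up names, the lane values are arbitrary, and it models the intrinsic alone rather than the surrounding mask and shift steps.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model (hypothetical helper, not from the PR) of
 * _mm_shufflelo_epi16(src, imm): each of the four low 16-bit lanes of the
 * result picks one of the four low lanes of src, selected by two bits of
 * imm; the four high lanes are copied through unchanged. */
static void
model_shufflelo_epi16(const uint16_t src[8], uint16_t dst[8], unsigned imm)
{
    uint16_t tmp[8];
    int i;

    for (i = 0; i < 4; i++) {
        /* bits [2i+1:2i] of imm choose the source lane for result lane i */
        tmp[i] = src[(imm >> (2 * i)) & 3];
    }
    for (i = 4; i < 8; i++) {
        tmp[i] = src[i];
    }
    memcpy(dst, tmp, sizeof(tmp));
}

/* Returns 1 if control byte 0xF5 (0b11110101) selects lanes
 * {1, 1, 3, 3} for the low half and leaves lanes 4..7 alone. */
static int
shufflelo_f5_check(void)
{
    /* arbitrary marker values so each lane is recognisable */
    const uint16_t src[8] = {0x00A0, 0x00A1, 0x00A2, 0x00A3,
                             0x00A4, 0x00A5, 0x00A6, 0x00A7};
    const uint16_t want[8] = {0x00A1, 0x00A1, 0x00A3, 0x00A3,
                              0x00A4, 0x00A5, 0x00A6, 0x00A7};
    uint16_t got[8];

    model_shufflelo_epi16(src, got, 0xF5);
    return memcmp(got, want, sizeof(want)) == 0;
}
```

So with that control byte, low lanes 1 and 3 are each duplicated into the lane below them, and the upper 64 bits pass through untouched.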
🎉🎈thanks
Specifically, in alphablit_alpha_sse2_argb_no_surf_alpha_opaque_dst I found an opportunity to reduce the number of CPU clocks / instructions needed to move the alpha component from the src pixels into an interleaved 16 bit format.
In my testing, this got 10k 512x512 alpha blits (of this format) from around 2 seconds to around 1.6 seconds, a 15-20% improvement.
This is my test program. Note that the "screen" surface has a bit depth of 32 but does not have per pixel alpha.