SIMD versions of RGB_ADD, RGBA_ADD, RGB_MUL & RGBA_MUL #3170
Super cool! Is the AVX2 version available to all, or is it platform/hardware dependent?
MyreMylar commented May 8, 2022 • edited
AVX2 is available on x86/x64, which is basically all Windows, pre-Arm Macs and Linux flavours on that hardware. Arm platforms should still get a ported SSE2 version via SSE2neon. Eventually we may try an upgrade from SSE2neon to SIMDe (https://github.com/simd-everywhere/simde), which ports AVX2 instructions to Arm as well, but I wouldn't want to attempt such a switchover without an Arm development platform. That is something I don't have, but perhaps some future contributor will as Arm Macs become more common.
wonderful
Based on how long the last PR stayed up here... I'm not sure how many people want to read through this low-level SIMD code. So I'll just merge this one now rather than wait for another reviewer.
Using the same blueprint as in the RGBA_MUL/RGB_MUL pull requests, this one adds SIMD versions of the ADD and MUL special blend modes for blitting, in SSE2 and AVX2 flavours. These were much simpler to create.
If you found the SIMD code tricky to understand in the other two PRs, this one is a lot more approachable, especially for the RGBA flavours, as the 'active' code that actually blends the pixels from one surface with another is just one instruction, helpfully named exactly the same thing as the blend mode. See:
which, as Intel describes it, does:
Saturation means the value of each channel is clamped between 0 and 255. In SSE2 we have 128-bit registers, so with four channels per pixel (red, green, blue and alpha) at 8 bits each, every pixel takes 32 bits, and 128 / 32 gives us four pixels' worth of data in each register. We can thus speed through our adding operation pretty swiftly.
Everything else is just setup to get us to that single line that does the blend.
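To make the "one instruction plus setup" idea concrete, here is a minimal standalone sketch of a saturating RGBA add in SSE2. It is not the code from this PR: the function name `rgba_add_sse2`, the flat `uint32_t` pixel arrays and the scalar tail loop are all assumptions made for illustration. The only piece taken from the real intrinsics API is `_mm_adds_epu8` (plus the unaligned load/store helpers from `<emmintrin.h>`).

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not pygame's actual function): adds src onto dst
 * in place. Each uint32_t is treated as one pixel made of four 8-bit
 * channels; every channel is added independently with saturation. */
void
rgba_add_sse2(uint32_t *dst, const uint32_t *src, size_t n_pixels)
{
    size_t i = 0;

    /* Main loop: one 128-bit register holds 4 pixels (16 channel bytes). */
    for (; i + 4 <= n_pixels; i += 4) {
        __m128i s = _mm_loadu_si128((const __m128i *)(src + i));
        __m128i d = _mm_loadu_si128((const __m128i *)(dst + i));
        /* The single "active" line: unsigned 8-bit add, clamped at 255. */
        d = _mm_adds_epu8(s, d);
        _mm_storeu_si128((__m128i *)(dst + i), d);
    }

    /* Scalar tail for the last 0-3 pixels, doing the same clamp by hand. */
    for (; i < n_pixels; i++) {
        const uint8_t *sp = (const uint8_t *)(src + i);
        uint8_t *dp = (uint8_t *)(dst + i);
        for (int c = 0; c < 4; c++) {
            unsigned sum = (unsigned)sp[c] + (unsigned)dp[c];
            dp[c] = (uint8_t)(sum > 255 ? 255 : sum);
        }
    }
}
```

The scalar tail doubles as a plain-C description of what the intrinsic does to each of the 16 bytes in the register.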
The AVX2 version works the same, except we can do 8 pixels at a time thanks to the larger 256-bit registers.
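A rough sketch of the 8-pixels-at-a-time variant follows, again as an illustration rather than the code in this PR. The names `rgba_add_scalar`, `rgba_add_avx2` and `rgba_add` are hypothetical, and the runtime check via `__builtin_cpu_supports` together with the `target("avx2")` attribute are GCC/Clang features assumed here so the example builds without `-mavx2`. How pygame itself selects the AVX2 path is not shown in this description.

```c
#include <immintrin.h> /* AVX2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical plain-C fallback: per-channel add clamped to 255. */
void
rgba_add_scalar(uint32_t *dst, const uint32_t *src, size_t n_pixels)
{
    for (size_t i = 0; i < n_pixels; i++) {
        const uint8_t *sp = (const uint8_t *)(src + i);
        uint8_t *dp = (uint8_t *)(dst + i);
        for (int c = 0; c < 4; c++) {
            unsigned sum = (unsigned)sp[c] + (unsigned)dp[c];
            dp[c] = (uint8_t)(sum > 255 ? 255 : sum);
        }
    }
}

/* Hypothetical AVX2 path. The target attribute (GCC/Clang only) enables
 * AVX2 for just this function. */
__attribute__((target("avx2"))) void
rgba_add_avx2(uint32_t *dst, const uint32_t *src, size_t n_pixels)
{
    size_t i = 0;

    /* One 256-bit register holds 8 pixels (32 channel bytes). */
    for (; i + 8 <= n_pixels; i += 8) {
        __m256i s = _mm256_loadu_si256((const __m256i *)(src + i));
        __m256i d = _mm256_loadu_si256((const __m256i *)(dst + i));
        _mm256_storeu_si256((__m256i *)(dst + i), _mm256_adds_epu8(s, d));
    }

    /* Leftover 0-7 pixels go through the scalar path. */
    rgba_add_scalar(dst + i, src + i, n_pixels - i);
}

/* Hypothetical dispatcher: use AVX2 only when the CPU reports it. */
void
rgba_add(uint32_t *dst, const uint32_t *src, size_t n_pixels)
{
    if (__builtin_cpu_supports("avx2")) {
        rgba_add_avx2(dst, src, n_pixels);
    }
    else {
        rgba_add_scalar(dst, src, n_pixels);
    }
}
```

Calling `rgba_add(dst, src, n)` gives the same result on either path; only the number of pixels handled per loop iteration changes.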
There is one more of these PRs, after this one, for the remaining special effect blend modes MIN & MAX.
N.B.
I expect there is some future improvement work to be done on the standard alpha blend and the pre-multiplied blend, but I'd like to see how these special effect modes work out in user land first - plus I would need some time to work on those improvements when work is less busy :)