SIMD versions of RGB_ADD, RGBA_ADD, RGB_SUB & RGBA_SUB #3170


Merged
illume merged 3 commits into pygame:main from MyreMylar:rgb_rgba_add_sub_simd
May 16, 2022

Conversation

@MyreMylar
Contributor

Using the same blueprint as in the RGBA_MUL/RGB_MUL pull requests, this one adds SIMD versions of the ADD and SUB special blend modes for blitting, in SSE2 and AVX2 flavours. These were much simpler to create.

If you found the SIMD code tricky to understand in the other two PRs, this one is a lot more approachable - especially for the RGBA flavours, as the 'active' code that actually blends the pixels from one surface with another is just one instruction, helpfully named after the blend mode itself. See:

mm_dst = _mm_adds_epu8(mm_dst, mm_src);

which, as Intel describes it, does:

Add packed unsigned 8-bit integers in a and b using saturation, and store the results in dst.
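To make the quoted description concrete, here is a minimal scalar sketch of what the instruction does to each byte. The helper names `adds_u8` and `add_pixel_saturated` are made up for illustration and are not part of pygame or of the intrinsics headers.

```c
#include <stdint.h>

/* Scalar model of one lane of _mm_adds_epu8: add two unsigned bytes and
 * clamp the result to 255 instead of letting it wrap around. */
static uint8_t adds_u8(uint8_t a, uint8_t b)
{
    unsigned int sum = (unsigned int)a + (unsigned int)b;
    return (uint8_t)(sum > 255u ? 255u : sum);
}

/* Apply the same operation to every channel of one 32-bit pixel.
 * The channel order does not matter because every byte is treated alike. */
static uint32_t add_pixel_saturated(uint32_t dst, uint32_t src)
{
    uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        uint8_t d = (uint8_t)(dst >> shift);
        uint8_t s = (uint8_t)(src >> shift);
        out |= (uint32_t)adds_u8(d, s) << shift;
    }
    return out;
}
```

The SIMD instruction performs this byte-wise clamp on all 16 bytes of a 128 bit register at once.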

Saturation means the value of each channel is clamped between 0 and 255 rather than wrapping around. In SSE2 we have 128 bit registers, and each pixel has four channel values between 0 and 255 (red, green, blue and alpha) at 8 bits per channel. So 128 divided by 4 channels, then divided by 8 bits per channel, gives four pixels' worth of data in each register, and we can thus speed through our adding operation pretty swiftly.
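The four-pixels-per-register arithmetic can be sketched as a small loop. This is an illustration under assumptions, not the code from the PR: `rgba_add_sse2` is a hypothetical helper, it treats both surfaces as flat arrays of 32-bit pixels, and it ignores the pitch and pixel-format handling a real blitter needs.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not pygame's actual function: saturating add of
 * n 32-bit pixels from src into dst. */
static void rgba_add_sse2(uint32_t *dst, const uint32_t *src, size_t n)
{
    size_t i = 0;

    /* 128 bits / 32 bits per pixel = 4 pixels per register. */
    for (; i + 4 <= n; i += 4) {
        __m128i mm_src = _mm_loadu_si128((const __m128i *)(src + i));
        __m128i mm_dst = _mm_loadu_si128((const __m128i *)(dst + i));
        mm_dst = _mm_adds_epu8(mm_dst, mm_src); /* the actual blend */
        _mm_storeu_si128((__m128i *)(dst + i), mm_dst);
    }

    /* Leftover pixels (fewer than four): move one 32-bit pixel at a time
     * into the low lane of a register and blend it the same way. */
    for (; i < n; i++) {
        __m128i mm_src = _mm_cvtsi32_si128((int)src[i]);
        __m128i mm_dst = _mm_cvtsi32_si128((int)dst[i]);
        mm_dst = _mm_adds_epu8(mm_dst, mm_src);
        dst[i] = (uint32_t)_mm_cvtsi128_si32(mm_dst);
    }
}
```

The unaligned load and store variants are used here so the sketch works on any pointer; whether the real implementation aligns its accesses is not stated in this conversation.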

Everything else is just setup to get us to that single line that does the blend.
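The RGB flavours presumably need a little more than the single instruction, because the destination alpha should come through unchanged. One plausible way to do that, sketched here as an assumption and not taken from the PR, is to zero the source alpha before the saturating add. The `ALPHA_MASK` value and the helper name `rgb_add_sse2_x4` are hypothetical; a real surface carries its channel masks in its pixel format.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Assumed layout for illustration only: alpha in the top byte. */
#define ALPHA_MASK 0xFF000000u

/* Hypothetical helper: blend exactly four pixels, leaving dst alpha alone. */
static void rgb_add_sse2_x4(uint32_t *dst, const uint32_t *src)
{
    __m128i mm_rgb_mask = _mm_set1_epi32((int)~ALPHA_MASK);
    __m128i mm_src = _mm_loadu_si128((const __m128i *)src);
    __m128i mm_dst = _mm_loadu_si128((const __m128i *)dst);

    /* Zero the source alpha so that adding it leaves dst alpha as it was. */
    mm_src = _mm_and_si128(mm_src, mm_rgb_mask);
    mm_dst = _mm_adds_epu8(mm_dst, mm_src);

    _mm_storeu_si128((__m128i *)dst, mm_dst);
}
```

Adding zero to the alpha byte is a no-op under saturating addition, so the colour channels still go through the same one-instruction blend.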

The AVX2 version works the same way, except that we can do 8 pixels at a time thanks to the larger 256 bit registers.
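As a rough sketch of the wider variant, the loop below uses the 256 bit counterpart of the same instruction, `_mm256_adds_epu8`. The helper name `rgba_add_avx2` is hypothetical, and the `target("avx2")` attribute is a GCC/Clang-specific way to compile one function for AVX2; how the PR itself enables AVX2 is not shown in this conversation.

```c
#include <immintrin.h> /* AVX2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* GCC/Clang-specific: lets this one function use AVX2 without -mavx2.
 * Only call it on a CPU that actually supports AVX2. */
__attribute__((target("avx2")))
static void rgba_add_avx2(uint32_t *dst, const uint32_t *src, size_t n)
{
    size_t i = 0;

    /* 256 bits / 32 bits per pixel = 8 pixels per register. */
    for (; i + 8 <= n; i += 8) {
        __m256i mm_src = _mm256_loadu_si256((const __m256i *)(src + i));
        __m256i mm_dst = _mm256_loadu_si256((const __m256i *)(dst + i));
        mm_dst = _mm256_adds_epu8(mm_dst, mm_src);
        _mm256_storeu_si256((__m256i *)(dst + i), mm_dst);
    }

    /* Leftover pixels (fewer than eight): plain scalar clamp per channel. */
    for (; i < n; i++) {
        uint32_t out = 0;
        for (int shift = 0; shift < 32; shift += 8) {
            uint32_t sum = ((dst[i] >> shift) & 0xFFu) +
                           ((src[i] >> shift) & 0xFFu);
            out |= (sum > 255u ? 255u : sum) << shift;
        }
        dst[i] = out;
    }
}
```

Apart from the register width and the intrinsic prefix, the structure is the same as in the 128 bit case.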

There is one more of these PRs, after this one, for the remaining special effect blend modes MIN & MAX.

N.B.
I expect there is some future improvement work to be done on the standard alpha blend and the pre-multiplied blend, but I'd like to see how these special effect modes work out in user land first - plus I would need some time to work on those improvements when work is less busy :)

illume, ankith26, and s0lst1ce reacted with hooray emoji
@itzpr3d4t0r
Contributor

Super cool! Is the AVX2 version available to all, or is it platform/hardware dependent?

@MyreMylar
Contributor, Author

MyreMylar commented May 8, 2022 (edited)

> Super cool! Is the AVX2 version available to all, or is it platform/hardware dependent?

AVX2 is available on x86/x64, which is basically all Windows, pre-Arm Macs and Linux flavours on that hardware, as long as the CPU itself supports AVX2. Arm platforms should still get a ported SSE2 version via SSE2neon.

Eventually we may try an upgrade from SSE2neon to SIMDe (https://github.com/simd-everywhere/simde), which ports AVX2 instructions to Arm as well. I wouldn't want to attempt such a switchover without an Arm development platform, though, which is something I don't have. Perhaps some future contributor will have one as Arm Macs become more common.
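The platform split described above could be probed at runtime roughly as follows. This is only a sketch under assumptions: `pixels_per_step` is a hypothetical helper, `__builtin_cpu_supports` is a GCC/Clang builtin for x86, and pygame's real detection code is not shown in this conversation and may well work differently.

```c
#include <stddef.h>

/* Hypothetical helper: how many 32-bit pixels one blend step could cover
 * on the machine this runs on. */
static size_t pixels_per_step(void)
{
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return 8; /* 256 bit registers */
    if (__builtin_cpu_supports("sse2"))
        return 4; /* 128 bit registers */
    return 1;     /* scalar fallback */
#elif defined(__ARM_NEON)
    return 4;     /* 128 bit NEON registers, reached via an SSE2 port */
#else
    return 1;     /* unknown platform: scalar fallback */
#endif
}
```

A blitter could use a check like this once at startup to pick between an AVX2, SSE2 or scalar code path.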

@illume left a comment
Member

wonderful

@illume
Member

Based on how long the last PR stayed up here... I'm not sure how many people want to read through this low level SIMD code. So I'll just merge this one now rather than wait for another reviewer.

@illume merged commit 1b270d8 into pygame:main on May 16, 2022

Reviewers

@illume approved these changes

Assignees

No one assigned

Labels

Performance (Related to the speed or resource usage of the project), Surface (pygame.Surface)

Projects

None yet

Milestone

2.1.3


4 participants

@MyreMylar, @itzpr3d4t0r, @illume, @ankith26
