Uh oh!
There was an error while loading.Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork56.4k
Use intrinsics forcvRound on x86_64__GNUC__ (clang/gcc linux) too.#24001
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Use intrinsics forcvRound on x86_64__GNUC__ (clang/gcc linux) too.#24001
Conversation
asmorkalov commentedJul 18, 2023
@legrosbuffle Thanks a lot for the contribution. I made quick benchmark of imgproc module with and without the patch and outcome is the following:
|
asmorkalov commentedJul 18, 2023
@legrosbuffle could you describe your use case? |
legrosbuffle commentedJul 18, 2023
We're mostly interested in Diff between base and patch on machines we care about: |
vpisarev commentedJul 19, 2023
@legrosbuffle, thanks for PR. We just want to give compiler more chances to vectorize code. I believe, with higher-level intrinsics like __builtin_lrintf() the chances are higher than with manually embedded scalar SSE2 intrinsics. Another reason is that the code may be compiled with AVX2 or AVX512 and then those SSE2 intrinsics would look weird. May we get more detailed information about your test environment:
|
legrosbuffle commentedJul 19, 2023
Do you have an example where the buitin version vectorizes ? Neither GCC nor clang seem to vectorize any version of
When building with AVX2, the compiler will select the proper encoding (see godbolt link).
Google's linux disrib. Results are similar ongLinux and prod.
We're using
Skylake
HEAD
The only relevant one is |
…linux) too.We've measured a 7x improvement in speed for `cvRound` using the intrinsic.
23118c9 to3cce299Compareasmorkalov commentedJul 21, 2023
@legrosbuffle Thanks a lot for the patch. We discussed the solution on OpenCV core team meeting and decide to merge it. I removedx86_64 check to cover x86 32-bit configuration too. I'll merge it as soon as CI passed. |
asmorkalov left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
👍
There is no reason to enable this only on windows. We've measured a 7x improvement in speed for
cvRoundusing the intrinsic.Pull Request Readiness Checklist
See details athttps://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
Patch to opencv_extra has the same branch name.