Uh oh!
There was an error while loading.Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork56.4k
Add Neon optimised RGB2Lab conversion#19883
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Uh oh!
There was an error while loading.Please reload this page.
Conversation
alalek commentedApr 9, 2021
Marking thisRFC, because it doesn't follow OpenCV guidelines to avoid using of raw native intrinsics in OpenCV modules. |
jondea commentedApr 9, 2021
Hi@alalek, thank you for looking into this. We (me and@fpetrogalli) submitted it like this because we weren't sure what correct approach was. Also, sorry if this is an silly question, but what does it mean to mark it as RFC? One possible solution to the raw intrinsics is to keep the |
vpisarev commentedApr 15, 2021
@jondea, thank you for the contribution! I'd start with the first option that you suggested - keep the separate branch under CV_NEON, but rewrite it using HAL intrinsics. I briefly looked at the current implementation and I found it too bulky for the equivalent C code that it accelerates. So, I'm 60-80% sure that the HAL code that you write will be faster than the existing implementation not just on ARM, but on the other platforms as well. And then we will just replace that code with yours, i.e. remove |
jondea commentedApr 27, 2021
The changes have been rewritten to use just HAL intrinsics, any feedback would be appreciated. |
fpetrogalli left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
fpetrogalli left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
LGTM, with a nit.
Final word on the maintainers, of course!
Thank you, Francesco
Uh oh!
There was an error while loading.Please reload this page.
Co-authored-by: Francesco Petrogalli <25690309+fpetrogalli@users.noreply.github.com>
jondea commentedMay 10, 2021
jondea commentedMay 19, 2021
vpisarev commentedMay 25, 2021
@jondea, thank you very much! I tested the code both on Mac-Intel and Mac-ARM (M1), it works well, the claimed acceleration is achieved. On Intel it's no slower than the previous version, but, unfortunately, it's 128-bit only. In any case, it can be merged as-is, and later we can modify this code to use some new variations of 👍 |
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
fpetrogalli commentedMay 25, 2021
@vpisarev ,@jondea is working on an equivalent version that uses SVE2 intrinsics. He is using the intrinsic |
Uh oh!
There was an error while loading.Please reload this page.
A Neon specific implementation of RGB2Lab increases single threaded performance by ~25%, here's the numbers run on aws c6gd.4xlarge with gcc 9.3 (numbers are similar using gcc 10)
Pull Request Readiness Checklist
See details athttps://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
Patch to opencv_extra has the same branch name.