WIP: Vectorize cv::resize for INTER_AREA #23525
asmorkalov commented Apr 22, 2023
The QR code decoding pipeline uses resize with INTER_AREA internally, so the Python test crash may be caused by this optimization.
vrabaud commented Apr 27, 2023 (edited)
@asmorkalov, I am trying to get accurate perf results with `modules/ts/misc/run.py`, following https://github.com/opencv/opencv/wiki/HowToUsePerfTests. Which options can I use to get more accurate results? Like @vpisarev, to speed up loading and float conversion, I copied code from opencv/modules/core/src/convert.hpp (line 16 at commit 6dbc5e0).
vrabaud commented Apr 28, 2023
BTW, I believe this is ready for review. Here are the speed-ups I get:
asmorkalov commented May 23, 2023
Hello @vrabaud, I did some research on your patch and the results are very mixed. On my AMD Ryzen 7 2700X it does not improve performance at all: the difference is comparable to statistical fluctuations. On Jetson Nano (NVIDIA, ARMv8 64-bit) I see a speed-up for small images, which looks related to more efficient cache reuse, while HD and larger images with 3-4 channels become slower. On Jetson TK1 (ARMv7) I see the same behavior, but the size threshold is even lower than on Jetson Nano, and most resolutions show a degradation. I attached an archive with all experiments (XML).
asmorkalov left a comment
The PR does not demonstrate a stable performance improvement.
vrabaud commented May 23, 2023
Thx for checking. Let me try two things:
asmorkalov commented Sep 20, 2023
@vrabaud, do you plan to work on it, or may I close the PR?
vrabaud commented Sep 20, 2023
I renamed it to WIP to avoid confusion. I can close it and reopen it later if you prefer. I thought it might interest others in the meantime.
opencv-alalek commented Sep 21, 2023
@vrabaud BTW, there is
Speed up line merging in INTER_AREA #24412

This provides a 10 to 20% speed-up.

Related perf test fix: #24417

This is a split of #23525 that will be updated to only deal with column merging.

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
  Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
asmorkalov commented Oct 20, 2023
vrabaud commented Oct 20, 2023
Actually no: the other PR is just about line merging. I will now specialize this one for column merging. Almost done :)
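To make the line/column split concrete: with an integer downscale factor, an area average over each block can be computed as a vertical pass that sums consecutive source rows ("line merging", assuming that is what the term refers to here) followed by a horizontal pass over the result ("column merging"). The sketch below shows only the vertical pass. The function name `mergeLines` and the flat, interleaved 8-bit buffer layout are hypothetical; this is an illustration, not OpenCV's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not OpenCV API): sums `scale` consecutive rows of an
// interleaved 8-bit image, starting at `firstRow`, into a float row buffer.
// Each row has `rowLen` = width * channels elements.
std::vector<float> mergeLines(const std::vector<std::uint8_t>& src,
                              std::size_t rowLen, std::size_t firstRow,
                              std::size_t scale) {
    assert((firstRow + scale) * rowLen <= src.size());
    std::vector<float> acc(rowLen, 0.0f);
    for (std::size_t r = firstRow; r < firstRow + scale; ++r) {
        const std::uint8_t* row = src.data() + r * rowLen;
        // Element-wise add of two contiguous rows: channel layout does not
        // matter here, so this loop maps directly onto SIMD loads and adds.
        for (std::size_t i = 0; i < rowLen; ++i)
            acc[i] += static_cast<float>(row[i]);
    }
    return acc;
}
```

For example, `mergeLines(img, 3, 0, 2)` on a 4-row image with 3 elements per row returns the sum of rows 0 and 1. Because rows are contiguous, this step vectorizes easily, which is consistent with the line-merging split bringing a measurable gain on its own.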
asmorkalov commented Mar 14, 2025
@vrabaud Is the PR still relevant?
vrabaud commented Mar 20, 2025
#24412 actually captured most of the speed-up. Doing the transpose here is too slow. One last option is to first pack the image vertically, then process the columns in parallel. However, the loads/stores are slow because the memory is not contiguous, and the operations are already partially parallel for 2, 3 and 4 channels. Parts of the registers are unused in those cases, but that does not translate into any measurable speed-up.
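The column step of the "pack vertically, then process the columns" approach can be sketched as follows, assuming an integer horizontal scale and an interleaved float row that already holds the vertical sums. `mergeColumns` and its parameters are hypothetical names, not OpenCV API. The inner loop shows the access pattern behind the slow loads/stores: samples of the same channel sit `cn` elements apart, so a SIMD version needs strided loads or a deinterleave/shuffle step.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not OpenCV API): merges `scale` adjacent pixels of an
// interleaved row with `cn` channels and divides by `area`
// (= horizontal scale * vertical scale) to produce the destination row.
std::vector<float> mergeColumns(const std::vector<float>& acc, std::size_t cn,
                                std::size_t scale, float area) {
    assert(cn > 0 && scale > 0 && acc.size() % (cn * scale) == 0);
    const std::size_t dstWidth = acc.size() / (cn * scale);
    std::vector<float> dst(dstWidth * cn, 0.0f);
    for (std::size_t x = 0; x < dstWidth; ++x) {
        for (std::size_t c = 0; c < cn; ++c) {
            float sum = 0.0f;
            // Same-channel samples are `cn` elements apart: a strided,
            // non-contiguous access for any cn > 1.
            for (std::size_t k = 0; k < scale; ++k)
                sum += acc[(x * scale + k) * cn + c];
            dst[x * cn + c] = sum / area;
        }
    }
    return dst;
}
```

For 2 to 4 channels, a vector register holding several whole pixels already processes all channels at once, with some lanes left over; this is one way to read the remark that the column step is "already partially parallel" for those channel counts.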
This is just a vectorization of the original code. I'll run speed tests once this PR is reviewed.
Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
Patch to opencv_extra has the same branch name.