WIP: Vectorize cv::resize for INTER_AREA#23525

Closed
vrabaud wants to merge 9 commits into opencv:4.x from vrabaud:avif

@vrabaud
Contributor

This is just a vectorization of the original code. I'll make speed tests once this PR is reviewed.

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
  • The PR is proposed to the proper branch
  • There is a reference to the original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake

kallaballa reacted with thumbs up emoji
@asmorkalov
Contributor

QR code decoding pipeline uses resize INTER_AREA internally. The python test crash may be caused by the optimization.

@asmorkalov added this to the 4.8.0 milestone on Apr 22, 2023
@vrabaud force-pushed the avif branch 7 times, most recently from 192bbe0 to c5ffc7d on April 27, 2023 09:34
@vrabaud
Contributor, Author

vrabaud commented on Apr 27, 2023 (edited)

@asmorkalov, I am trying to get accurate perf results with "modules/ts/misc/run.py", following https://github.com/opencv/opencv/wiki/HowToUsePerfTests. What options can I use to get more accurate results? Something like --perf_force_samples?

@vpisarev, to speed up loading and float conversion, I copied code from

    static inline void vx_load_as(const uchar* ptr, v_float32& a)

Should that be made part of the HAL?
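As its signature suggests, the helper reads 8-bit pixels and widens them into a 32-bit float vector. A scalar sketch of those semantics in plain C++ follows. The name `load_as_float` and the fixed lane count `kLanes` are hypothetical and are not OpenCV API; a real `v_float32` width depends on the target instruction set.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical lane count; the real vector width depends on the target ISA.
constexpr std::size_t kLanes = 4;

// Scalar stand-in for "load kLanes uchar values and convert them to float".
inline std::array<float, kLanes> load_as_float(const std::uint8_t* ptr) {
    std::array<float, kLanes> out{};
    for (std::size_t i = 0; i < kLanes; ++i) {
        // Widening u8 -> f32 is exact for every value in 0..255.
        out[i] = static_cast<float>(ptr[i]);
    }
    return out;
}
```

A SIMD version would do the same widening for all lanes in a couple of instructions, which is why sharing one helper across modules is attractive.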

@vrabaud force-pushed the avif branch 2 times, most recently from be4331c to 2c2b901 on April 27, 2023 14:52
@vrabaud
Contributor, Author

BTW, I believe this is ready to be reviewed. Here are the speed-ups I get:

Geometric mean (ms)

                        Name of Test                        imgproc imgproc  imgproc
                                                              old     new      new
                                                                                vs
                                                                              imgproc
                                                                                old
                                                                            (x-factor)
CreateHanningWindow::CreateHanningWindowFixture::640x480    0.177   0.177     1.00
CreateHanningWindow::CreateHanningWindowFixture::1920x1080  1.066   1.060     1.01
Resize::OCL_ResizeAreaFixture::(640x480, 8UC1, 0.3)         0.900   0.491     1.83
Resize::OCL_ResizeAreaFixture::(640x480, 8UC1, 0.5)         0.019   0.019     1.00
Resize::OCL_ResizeAreaFixture::(640x480, 8UC1, 0.6)         1.230   0.219     5.61
Resize::OCL_ResizeAreaFixture::(640x480, 32FC1, 0.3)        0.501   0.416     1.21
Resize::OCL_ResizeAreaFixture::(640x480, 32FC1, 0.5)        0.063   0.056     1.13
Resize::OCL_ResizeAreaFixture::(640x480, 32FC1, 0.6)        0.405   0.185     2.18
Resize::OCL_ResizeAreaFixture::(640x480, 8UC3, 0.3)         2.084   0.579     3.60
Resize::OCL_ResizeAreaFixture::(640x480, 8UC3, 0.5)         0.134   0.135     0.99
Resize::OCL_ResizeAreaFixture::(640x480, 8UC3, 0.6)         4.189   0.576     7.27
Resize::OCL_ResizeAreaFixture::(640x480, 32FC3, 0.3)        0.797   0.442     1.80
Resize::OCL_ResizeAreaFixture::(640x480, 32FC3, 0.5)        0.756   0.757     1.00
Resize::OCL_ResizeAreaFixture::(640x480, 32FC3, 0.6)        0.632   0.630     1.00
Resize::OCL_ResizeAreaFixture::(640x480, 8UC4, 0.3)         2.849   0.750     3.80
Resize::OCL_ResizeAreaFixture::(640x480, 8UC4, 0.5)         0.098   0.099     0.99
Resize::OCL_ResizeAreaFixture::(640x480, 8UC4, 0.6)         4.795   0.799     6.00
Resize::OCL_ResizeAreaFixture::(640x480, 32FC4, 0.3)        1.255   0.542     2.32
Resize::OCL_ResizeAreaFixture::(640x480, 32FC4, 0.5)        0.236   0.236     1.00
Resize::OCL_ResizeAreaFixture::(640x480, 32FC4, 0.6)        0.990   0.898     1.10
Resize::OCL_ResizeAreaFixture::(1280x720, 8UC1, 0.3)        2.679   0.535     5.00
Resize::OCL_ResizeAreaFixture::(1280x720, 8UC1, 0.5)        0.053   0.138     0.38
Resize::OCL_ResizeAreaFixture::(1280x720, 8UC1, 0.6)        2.449   0.260     9.42
Resize::OCL_ResizeAreaFixture::(1280x720, 32FC1, 0.3)       1.507   0.410     3.67
Resize::OCL_ResizeAreaFixture::(1280x720, 32FC1, 0.5)       0.105   0.131     0.80
Resize::OCL_ResizeAreaFixture::(1280x720, 32FC1, 0.6)       0.468   0.260     1.80
Resize::OCL_ResizeAreaFixture::(1280x720, 8UC3, 0.3)        6.095   1.726     3.53
Resize::OCL_ResizeAreaFixture::(1280x720, 8UC3, 0.5)        0.307   0.189     1.63
Resize::OCL_ResizeAreaFixture::(1280x720, 8UC3, 0.6)        5.023   0.954     5.26
Resize::OCL_ResizeAreaFixture::(1280x720, 32FC3, 0.3)       2.379   1.812     1.31
Resize::OCL_ResizeAreaFixture::(1280x720, 32FC3, 0.5)       0.658   0.683     0.96
Resize::OCL_ResizeAreaFixture::(1280x720, 32FC3, 0.6)       0.852   1.414     0.60
Resize::OCL_ResizeAreaFixture::(1280x720, 8UC4, 0.3)        8.442   2.121     3.98
Resize::OCL_ResizeAreaFixture::(1280x720, 8UC4, 0.5)        0.150   0.131     1.14
Resize::OCL_ResizeAreaFixture::(1280x720, 8UC4, 0.6)        6.653   1.670     3.98
Resize::OCL_ResizeAreaFixture::(1280x720, 32FC4, 0.3)       4.122   2.461     1.67
Resize::OCL_ResizeAreaFixture::(1280x720, 32FC4, 0.5)       0.243   0.253     0.96
Resize::OCL_ResizeAreaFixture::(1280x720, 32FC4, 0.6)       1.630   2.636     0.62
Resize::OCL_ResizeAreaFixture::(1920x1080, 8UC1, 0.3)       3.461   0.495     7.00
Resize::OCL_ResizeAreaFixture::(1920x1080, 8UC1, 0.5)       0.029   0.076     0.38
Resize::OCL_ResizeAreaFixture::(1920x1080, 8UC1, 0.6)       3.029   0.379     7.99
Resize::OCL_ResizeAreaFixture::(1920x1080, 32FC1, 0.3)      1.504   0.484     3.11
Resize::OCL_ResizeAreaFixture::(1920x1080, 32FC1, 0.5)      0.141   0.155     0.92
Resize::OCL_ResizeAreaFixture::(1920x1080, 32FC1, 0.6)      0.687   0.407     1.69
Resize::OCL_ResizeAreaFixture::(1920x1080, 8UC3, 0.3)       6.281   3.112     2.02
Resize::OCL_ResizeAreaFixture::(1920x1080, 8UC3, 0.5)       0.190   0.235     0.81
Resize::OCL_ResizeAreaFixture::(1920x1080, 8UC3, 0.6)       6.523   2.035     3.21
Resize::OCL_ResizeAreaFixture::(1920x1080, 32FC3, 0.3)      3.496   3.736     0.94
Resize::OCL_ResizeAreaFixture::(1920x1080, 32FC3, 0.5)      0.854   0.874     0.98
Resize::OCL_ResizeAreaFixture::(1920x1080, 32FC3, 0.6)      1.294   3.213     0.40
Resize::OCL_ResizeAreaFixture::(1920x1080, 8UC4, 0.3)       6.389   3.846     1.66
Resize::OCL_ResizeAreaFixture::(1920x1080, 8UC4, 0.5)       0.200   0.135     1.49
Resize::OCL_ResizeAreaFixture::(1920x1080, 8UC4, 0.6)       9.901   2.996     3.30
Resize::OCL_ResizeAreaFixture::(1920x1080, 32FC4, 0.3)      4.487   4.302     1.04
Resize::OCL_ResizeAreaFixture::(1920x1080, 32FC4, 0.5)      0.435   0.438     0.99
Resize::OCL_ResizeAreaFixture::(1920x1080, 32FC4, 0.6)      2.919   3.718     0.79
Resize::OCL_ResizeAreaFixture::(3840x2160, 8UC1, 0.3)       4.462   0.987     4.52
Resize::OCL_ResizeAreaFixture::(3840x2160, 8UC1, 0.5)       0.058   0.055     1.06
Resize::OCL_ResizeAreaFixture::(3840x2160, 8UC1, 0.6)       5.996   1.144     5.24
Resize::OCL_ResizeAreaFixture::(3840x2160, 32FC1, 0.3)      3.119   1.562     2.00
Resize::OCL_ResizeAreaFixture::(3840x2160, 32FC1, 0.5)      0.390   0.391     1.00
Resize::OCL_ResizeAreaFixture::(3840x2160, 32FC1, 0.6)      2.273   1.701     1.34
Resize::OCL_ResizeAreaFixture::(3840x2160, 8UC3, 0.3)      10.888   4.802     2.27
Resize::OCL_ResizeAreaFixture::(3840x2160, 8UC3, 0.5)       0.381   0.348     1.09
Resize::OCL_ResizeAreaFixture::(3840x2160, 8UC3, 0.6)      13.213   6.481     2.04
Resize::OCL_ResizeAreaFixture::(3840x2160, 32FC3, 0.3)      4.749   6.773     0.70
Resize::OCL_ResizeAreaFixture::(3840x2160, 32FC3, 0.5)      2.764   3.308     0.84
Resize::OCL_ResizeAreaFixture::(3840x2160, 32FC3, 0.6)      4.155   8.645     0.48
Resize::OCL_ResizeAreaFixture::(3840x2160, 8UC4, 0.3)      14.787   7.452     1.98
Resize::OCL_ResizeAreaFixture::(3840x2160, 8UC4, 0.5)       0.443   0.539     0.82
Resize::OCL_ResizeAreaFixture::(3840x2160, 8UC4, 0.6)      17.201   9.058     1.90
Resize::OCL_ResizeAreaFixture::(3840x2160, 32FC4, 0.3)      6.319   8.625     0.73
Resize::OCL_ResizeAreaFixture::(3840x2160, 32FC4, 0.5)      3.057   3.578     0.85
Resize::OCL_ResizeAreaFixture::(3840x2160, 32FC4, 0.6)      6.035  10.741     0.56
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 640x480, 1.3)   0.097   0.142     0.69
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 640x480, 2.4)   0.237   0.227     1.04
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 640x480, 3.4)   0.239   0.347     0.69
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 960x540, 1.3)   0.117   0.183     0.64
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 960x540, 2.4)   0.221   0.340     0.65
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 960x540, 3.4)   0.449   0.444     1.01
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 1280x720, 1.3)  0.178   0.205     0.87
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 1280x720, 2.4)  0.324   0.431     0.75
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 1280x720, 3.4)  0.688   0.654     1.05
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 640x480, 1.3)   0.312   0.243     1.28
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 640x480, 2.4)   0.499   0.458     1.09
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 640x480, 3.4)   0.725   0.629     1.15
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 960x540, 1.3)   0.291   0.311     0.93
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 960x540, 2.4)   0.630   0.594     1.06
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 960x540, 3.4)   1.012   1.009     1.00
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 1280x720, 1.3)  0.527   0.465     1.13
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 1280x720, 2.4)  0.991   1.080     0.92
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 1280x720, 3.4)  1.746   1.856     0.94
ResizeAreaFast::MatInfo_Size_Scale::(8UC1, 640x480, 2)      0.018   0.018     1.01
ResizeAreaFast::MatInfo_Size_Scale::(8UC1, 960x540, 2)      0.032   0.035     0.90
ResizeAreaFast::MatInfo_Size_Scale::(8UC1, 1280x720, 2)     0.036   0.033     1.08
ResizeAreaFast::MatInfo_Size_Scale::(8UC1, 1920x1080, 2)    0.044   0.037     1.18
ResizeAreaFast::MatInfo_Size_Scale::(16UC1, 640x480, 2)     0.035   0.034     1.03
ResizeAreaFast::MatInfo_Size_Scale::(16UC1, 960x540, 2)     0.048   0.048     1.01
ResizeAreaFast::MatInfo_Size_Scale::(16UC1, 1280x720, 2)    0.049   0.038     1.29
ResizeAreaFast::MatInfo_Size_Scale::(16UC1, 1920x1080, 2)   0.057   0.058     0.98
ResizeAreaFast::MatInfo_Size_Scale::(8UC3, 640x480, 2)      0.133   0.135     0.99
ResizeAreaFast::MatInfo_Size_Scale::(8UC3, 960x540, 2)      0.131   0.134     0.97
ResizeAreaFast::MatInfo_Size_Scale::(8UC3, 1280x720, 2)     0.125   0.123     1.01
ResizeAreaFast::MatInfo_Size_Scale::(8UC3, 1920x1080, 2)    0.149   0.159     0.94
ResizeAreaFast::MatInfo_Size_Scale::(16UC3, 640x480, 2)     0.127   0.132     0.96
ResizeAreaFast::MatInfo_Size_Scale::(16UC3, 960x540, 2)     0.126   0.127     0.99
ResizeAreaFast::MatInfo_Size_Scale::(16UC3, 1280x720, 2)    0.117   0.123     0.96
ResizeAreaFast::MatInfo_Size_Scale::(16UC3, 1920x1080, 2)   0.151   0.156     0.97
ResizeAreaFast::MatInfo_Size_Scale::(8UC4, 640x480, 2)      0.097   0.098     0.99
ResizeAreaFast::MatInfo_Size_Scale::(8UC4, 960x540, 2)      0.084   0.101     0.84
ResizeAreaFast::MatInfo_Size_Scale::(8UC4, 1280x720, 2)     0.094   0.095     0.99
ResizeAreaFast::MatInfo_Size_Scale::(8UC4, 1920x1080, 2)    0.117   0.118     1.00
ResizeAreaFast::MatInfo_Size_Scale::(16UC4, 640x480, 2)     0.133   0.144     0.93
ResizeAreaFast::MatInfo_Size_Scale::(16UC4, 960x540, 2)     0.131   0.137     0.96
ResizeAreaFast::MatInfo_Size_Scale::(16UC4, 1280x720, 2)    0.125   0.126     0.99
ResizeAreaFast::MatInfo_Size_Scale::(16UC4, 1920x1080, 2)   0.177   0.188     0.95
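To read the last column: the x-factor appears to be the old time divided by the new time, so values above 1 mean the patch is faster and values below 1 mean it is slower. For example, 0.900 / 0.491 ≈ 1.83 matches the (640x480, 8UC1, 0.3) row. A minimal sketch of the two quantities, with hypothetical helper names and assuming every timing sample is positive and the sample list is non-empty:

```cpp
#include <cmath>
#include <vector>

// Geometric mean of per-sample timings in ms.
// Assumes a non-empty vector of strictly positive values.
inline double geometric_mean(const std::vector<double>& samples) {
    double log_sum = 0.0;
    for (double s : samples) {
        log_sum += std::log(s);
    }
    return std::exp(log_sum / static_cast<double>(samples.size()));
}

// x-factor as it appears to be used in the table: old time / new time.
inline double x_factor(double old_ms, double new_ms) {
    return old_ms / new_ms;
}
```

Small differences between a recomputed ratio and the printed one are expected, since the table shows times rounded to three decimals.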

@asmorkalov self-requested a review on May 23, 2023 10:12
@asmorkalov
Contributor

Hello @vrabaud, I did some research on your patch and the results are very mixed. On my AMD Ryzen 7 2700X it does not improve performance at all; the difference is comparable to statistical fluctuation. On Jetson Nano (64-bit ARMv8 by NVIDIA) I see a speed-up for small images, which looks related to more efficient cache reuse. HD and larger images with 3-4 channels become slower. On Jetson TK1 (ARMv7) I see the same behavior, but the size threshold is even lower than on Jetson Nano, and most resolutions show a degradation.

I attached an archive with all experiments (XML): resize_perf.zip

@asmorkalov left a comment

The PR does not demonstrate a stable performance improvement.

@asmorkalov removed this from the 4.8.0 milestone on May 23, 2023
@vrabaud
Contributor, Author

Thx for checking. Let me try two things:

  • a load/store using a step for the second pass so that we do not do two transpositions
  • a vectorization of the multiplication but not the summation. That should beat the loop unrolling at least.
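The second idea can be sketched in plain C++ as two passes over a list of weighted contributions. This is only an illustration under assumptions: the `Tap` struct and `merge_columns` are hypothetical names, the row is single-channel, and this is not the code from the patch.

```cpp
#include <cstddef>
#include <vector>

// One contribution of a source pixel to a destination pixel (hypothetical layout).
struct Tap {
    int src_index;
    int dst_index;
    float weight;
};

// Merges one row of single-channel pixels horizontally.
inline std::vector<float> merge_columns(const std::vector<float>& src,
                                        const std::vector<Tap>& taps,
                                        std::size_t dst_width) {
    // Pass 1: multiplications only. Iterations are independent,
    // so this is the part that SIMD code can process in batches.
    std::vector<float> products(taps.size());
    for (std::size_t k = 0; k < taps.size(); ++k) {
        products[k] = src[taps[k].src_index] * taps[k].weight;
    }

    // Pass 2: scalar summation. Several taps write to the same
    // destination pixel, so this loop stays sequential.
    std::vector<float> dst(dst_width, 0.0f);
    for (std::size_t k = 0; k < taps.size(); ++k) {
        dst[taps[k].dst_index] += products[k];
    }
    return dst;
}
```

For a factor-2 reduction of the row {10, 20, 30, 40}, four taps with weight 0.5 yield {15, 35}.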
asmorkalov reacted with thumbs up emoji

@asmorkalov
Contributor

@vrabaud do you plan to keep working on it, or may I close the PR?

@vrabaud changed the title from "Vectorize cv::resize for INTER_AREA" to "WIP: Vectorize cv::resize for INTER_AREA" on Sep 20, 2023
@vrabaud
Contributor, Author

I renamed it as WIP to avoid confusion. I can close it and re-open later if you prefer. I thought this might get others interested in the meantime.

asmorkalov reacted with thumbs up emoji

@opencv-alalek
Contributor

@vrabaud BTW, there is a Draft feature for PRs:

"Still in progress? Convert to draft" under the "Reviewers" section

@vrabaud marked this pull request as draft on September 22, 2023 09:37
@vrabaud mentioned this pull request on Oct 16, 2023
asmorkalov pushed a commit that referenced this pull request on Oct 19, 2023

Speed up line merging in INTER_AREA #24412

This provides a 10 to 20% speed-up.
Related perf test fix: #24417
This is a split of #23525 that will be updated to only deal with column merging.

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake

@asmorkalov
Contributor

@vrabaud Should I close this after #24412 is merged?

@vrabaud
Contributor, Author

Actually no: the other PR is just about line merging. I will now specialize this one for column merging. Almost done :)

asmorkalov reacted with thumbs up emoji

IskXCr pushed a commit to Haosonn/opencv that referenced this pull request on Dec 20, 2023
Speed up line merging in INTER_AREA opencv#24412
thewoz pushed a commit to thewoz/opencv that referenced this pull request on Jan 4, 2024
Speed up line merging in INTER_AREA opencv#24412
thewoz pushed a commit to thewoz/opencv that referenced this pull request on May 29, 2024
Speed up line merging in INTER_AREA opencv#24412
@asmorkalov
Contributor

@vrabaud Is the PR still relevant?

@vrabaud
Contributor, Author

#24412 actually got most of the speed-up. Doing the transpose here is too slow. A last solution is to first pack the image vertically, then process the columns in parallel. But the loads/stores are slow because the memory is not contiguous, and the operations are already partially parallel for 2, 3 and 4 channels. Parts of the registers are not used in those cases, but that does not bring any measurable speed-up.
Stopping here.
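One way to picture the non-contiguous access mentioned above is a layout where each SIMD lane handles a different row. The sketch below is one interpretation and not the code that was tried: `merge_pairs_lane_per_row` and `kLanes` are hypothetical, it handles only a factor-2 horizontal reduction of a single-channel row-major image, and the inner loop stands in for what would be one vector operation.

```cpp
#include <cstddef>
#include <vector>

constexpr std::size_t kLanes = 4;  // hypothetical SIMD width

// Averages horizontal pixel pairs for kLanes consecutive rows at once.
// Lane r works on row y0 + r, so neighbouring lanes are `stride`
// elements apart in memory for every load and every store.
inline void merge_pairs_lane_per_row(const std::vector<float>& src,
                                     std::size_t stride,
                                     std::size_t y0,
                                     std::size_t width,
                                     std::vector<float>& dst,
                                     std::size_t dst_stride) {
    for (std::size_t x = 0; x + 1 < width; x += 2) {
        for (std::size_t r = 0; r < kLanes; ++r) {  // stands in for one SIMD op
            const float a = src[(y0 + r) * stride + x];      // strided load
            const float b = src[(y0 + r) * stride + x + 1];  // strided load
            dst[(y0 + r) * dst_stride + x / 2] = 0.5f * (a + b);  // strided store
        }
    }
}
```

On real hardware the strided accesses turn into gathers or several scalar loads, which is consistent with the observation that the loads and stores dominate.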


Reviewers

@asmorkalov requested changes

3 participants

@vrabaud @asmorkalov @opencv-alalek
