dnn: parallelize nary elementwise forward implementation & enable related conformance tests #25630
double nstripes = getNumThreads();
parallel_for_(Range(0, nplanes), worker, nstripes);
nstripes = getNumThreads();
This should not be used.
This was already discussed several months ago, e.g. in #23047.
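For readers following the review point, one common alternative is to derive the stripe hint from the amount of work rather than from the thread count. The sketch below is an illustration only and is not the code adopted in this PR: `stripesForWorkload` and `minElemsPerStripe` are hypothetical names, and the granularity value would need to be tuned per layer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper (not from the PR): derive a stripe-count hint from the
// workload size instead of from the number of threads.
//   totalElems:        number of output elements to compute
//   minElemsPerStripe: assumed smallest chunk worth scheduling on its own
inline double stripesForWorkload(std::size_t totalElems, std::size_t minElemsPerStripe)
{
    assert(minElemsPerStripe > 0);
    const double n = static_cast<double>(totalElems) /
                     static_cast<double>(minElemsPerStripe);
    // Never ask for fewer than one stripe; tiny inputs then stay on one stripe.
    return std::max(1.0, n);
}
```

The returned value could then be passed as the third argument of `parallel_for_`, so that the hint scales with the input size and the scheduler decides how stripes map onto threads.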
Thank you for the review, but take it easy; this PR is still a draft. I still remember our discussion.
Changed. Performance results are also updated.
asmorkalov commented Jun 10, 2024
My results with Jetson TK1 (armv7+neon):
asmorkalov commented Jun 11, 2024 • edited
My results for Intel(R) Core(TM) i5-2500K CPU @ 3.30GHz (no AVX2):
fengyuentau commented Jun 12, 2024
Thank you @asmorkalov for adding more performance results :)
fengyuentau commented Jun 14, 2024
Any review comments?
asmorkalov commented Jun 19, 2024
The patch leads to significant performance degradation in OpenCL pipelines, e.g.:
I use an NVIDIA GF 1080 for the benchmark. It looks like the patch prevents some graph fusion or another inference optimization.
fengyuentau commented Jun 19, 2024
OK, I will take a look at the problem.
fengyuentau commented Jun 24, 2024 • edited
@asmorkalov The performance "degradation" is due to a very out-of-date code base (>450 commits behind 4.x). I have updated the code base. Performance tests (on Intel UHD 770) seem to be okay on my side. Feel free to retest on your side. On the positive side, those commits brought a large performance boost (OCL is ~4x faster and CPU is ~1.3x faster). Maybe I can add the OCL backend for this layer later :)
asmorkalov commented Jun 28, 2024 • edited
perf-dnn.zip
asmorkalov commented Jul 1, 2024
I also tried a Xiaomi Mi 10 phone. The results are volatile (maybe due to power management), but I do not see a significant performance gain, except for NCHW_C_sum and NCHW_NCHW_pow.
fengyuentau commented Jul 2, 2024
It is tuned to use multi-threading only if the input scale is large enough. Traditional convolutional nets do not have such a large input scale for elementwise layers.
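The size-gated behaviour described above can be sketched roughly as follows. This is a standalone illustration using `std::thread`, not the PR's implementation (which goes through OpenCV's `parallel_for_`); `addElementwise`, `kMinElemsForThreads`, and the threshold value are hypothetical, since the actual tuning value is not stated in this thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical threshold; the PR's real tuning value is not shown here.
constexpr std::size_t kMinElemsForThreads = 1 << 16;

// Element-wise sum c[i] = a[i] + b[i]. Small inputs run on the calling thread;
// larger ones are split into contiguous chunks, one chunk per worker thread.
// Returns the number of threads that did the work.
inline std::size_t addElementwise(const std::vector<float>& a,
                                  const std::vector<float>& b,
                                  std::vector<float>& c,
                                  std::size_t maxThreads = 4)
{
    assert(a.size() == b.size());
    const std::size_t n = a.size();
    c.resize(n);

    auto addRange = [&](std::size_t begin, std::size_t end) {
        for (std::size_t i = begin; i < end; ++i)
            c[i] = a[i] + b[i];
    };

    // Below the threshold, thread start-up cost would outweigh the work.
    if (n < kMinElemsForThreads || maxThreads <= 1) {
        addRange(0, n);
        return 1;
    }

    const std::size_t chunk = (n + maxThreads - 1) / maxThreads;
    std::vector<std::thread> workers;
    for (std::size_t begin = 0; begin < n; begin += chunk)
        workers.emplace_back(addRange, begin, std::min(n, begin + chunk));
    for (auto& w : workers)
        w.join();
    return workers.size();
}
```

With a gate like this, the small elementwise inputs typical of convolutional nets stay single-threaded, which is consistent with the benchmarks above showing a gain only on the larger cases.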
…_thread
dnn: merge #25630 to 5.x #25900
Sync changes from #25630 to 5.x.
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
This PR introduces the following changes:
Performance
i7-12700K, RAM 64GB, Ubuntu 22.04
Apple M1, RAM 16GB, macOS 14.4.1
Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
Patch to opencv_extra has the same branch name.