Improve and refactor softmax layer #24466
WanliZhong commented Oct 29, 2023 (edited)
The performance test results have been updated, and the speedup is substantial. BTW, I am not sure why the Windows CI failed; it seems unrelated to this PR.
fengyuentau commented Oct 30, 2023
Please take a look at the failed log from
fengyuentau commented Oct 30, 2023
@asmorkalov This build actually failed, but somehow the workflow did not catch the failure signal and continued: https://github.com/opencv/opencv/actions/runs/6682987045/job/18158738007?pr=24466. It seems
asmorkalov commented Oct 30, 2023
Windows:
asmorkalov commented Oct 30, 2023
Windows:
WanliZhong commented Oct 30, 2023
Thanks @asmorkalov. I found the code will throw
asmorkalov commented Oct 30, 2023
I just tried the armv7 configuration locally. It produces the following warning (Ubuntu 16.04):
WanliZhong commented Oct 30, 2023 (edited)
@asmorkalov That's because the operators
asmorkalov commented Oct 30, 2023
Armv7 (Jetson TK1) perf results with and without NEON:
asmorkalov commented Oct 30, 2023
Jetson TK1 with 2 GB of RAM:
WanliZhong commented Oct 30, 2023 (edited)
The performance test has a large input with
WanliZhong commented Oct 30, 2023 (edited)
The error on Windows occurs because a macro was defined as
vpisarev commented Nov 1, 2023
@WanliZhong, excellent job, great acceleration numbers! As we discussed, please refactor the code to reduce duplication. Then we will gladly merge it.
cbf0474 to 790da1b

WanliZhong commented Nov 2, 2023
Update: As discussed with Vadim, I now use only universal intrinsics to accelerate the softmax layer. The results show this is even faster than implementing it separately for each platform. Note: I added performance tests on different axes. The results show that some cases are slower than before, especially small-size softmax and axis 0 or 1.
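To make the axis dependence concrete, here is a minimal scalar sketch of softmax along one axis; it is not the PR's universal-intrinsics kernel. It assumes a contiguous tensor viewed as `[outer, axisSize, inner]`, where `outer` is the product of the dimensions before the softmax axis and `inner` the product of those after it, so consecutive elements of one softmax slice sit `inner` floats apart. The function name `softmaxAxis` and this layout are illustrative assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical scalar reference (not the PR's code): softmax over one axis of a
// contiguous tensor viewed as [outer, axisSize, inner].
std::vector<float> softmaxAxis(const std::vector<float>& src,
                               std::size_t outer, std::size_t axisSize, std::size_t inner)
{
    assert(axisSize > 0 && src.size() == outer * axisSize * inner);
    std::vector<float> dst(src.size());
    for (std::size_t o = 0; o < outer; ++o) {
        for (std::size_t i = 0; i < inner; ++i) {
            // First element of this softmax slice; the rest are `inner` apart.
            const std::size_t base = o * axisSize * inner + i;

            // Pass 1: slice maximum, subtracted below so exp() cannot overflow.
            float maxVal = src[base];
            for (std::size_t a = 1; a < axisSize; ++a)
                maxVal = std::max(maxVal, src[base + a * inner]);

            // Pass 2: exponentiate and accumulate the denominator.
            float sum = 0.f;
            for (std::size_t a = 0; a < axisSize; ++a) {
                const float e = std::exp(src[base + a * inner] - maxVal);
                dst[base + a * inner] = e;
                sum += e;
            }

            // Pass 3: normalize so the slice sums to 1.
            for (std::size_t a = 0; a < axisSize; ++a)
                dst[base + a * inner] /= sum;
        }
    }
    return dst;
}
```

For example, softmax over axis 1 of a `(2, 3, 4)` tensor uses `outer = 2`, `axisSize = 3`, `inner = 4`. A SIMD kernel would vectorize these three passes, but the exact lane layout used in this PR is not reproduced in this sketch.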
WanliZhong commented Nov 2, 2023
I have no idea why this error occurs on some platforms:

    /home/ci/opencv/modules/dnn/src/layers/cpu_kernels/softmax.cpp:78:32: error: 'cv::hal_baseline::v_float32x4::<unnamed enum> cv::hal_baseline::v_float32x4::nlanes' is private within this context
       78 |     size_t nlanes = v_float32::nlanes;
          |                                ^~~~~~
    In file included from /home/ci/opencv/modules/core/include/opencv2/core/hal/intrin.hpp:221,
                     from /home/ci/opencv/modules/dnn/src/layers/cpu_kernels/softmax.hpp:15,
                     from /home/ci/opencv/modules/dnn/src/layers/cpu_kernels/softmax.cpp:13:
    /home/ci/opencv/modules/core/include/opencv2/core/hal/intrin_neon.hpp:301:12: note: declared private here
      301 |     enum { nlanes = 4 };
          |            ^~~~~~
asmorkalov commented Nov 3, 2023 (edited)
OpenCV migrated to the new Universal Intrinsics approach to support scalable intrinsics such as RISC-V RVV. The vector size is not defined at compile time and may differ at runtime. You need to replace:
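In loop terms, a runtime vector size means the lane count has to be treated as a variable: full-width iterations while at least one whole vector of input remains, then a scalar tail. The sketch below emulates that pattern in plain C++ so it compiles without OpenCV. `reduceMax` and its `vlanes` parameter are hypothetical; `vlanes` stands in for the value OpenCV's current API returns from `VTraits<v_float32>::vlanes()`, and real code would keep the accumulator in a vector register rather than a `std::vector`.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <vector>

// Hypothetical emulation of a scalable-vector loop: `vlanes` is only known at
// runtime (in OpenCV it would come from VTraits<v_float32>::vlanes()).
float reduceMax(const float* x, int n, int vlanes)
{
    assert(vlanes > 0 && n >= 0);
    // One accumulator per lane, standing in for a vector register.
    std::vector<float> acc(vlanes, -std::numeric_limits<float>::infinity());

    int i = 0;
    // Main loop: consume one full "vector" of vlanes elements per iteration.
    for (; i + vlanes <= n; i += vlanes)
        for (int l = 0; l < vlanes; ++l)
            acc[l] = std::max(acc[l], x[i + l]);

    // Horizontal reduction across lanes.
    float m = *std::max_element(acc.begin(), acc.end());

    // Scalar tail for the remaining n % vlanes elements.
    for (; i < n; ++i)
        m = std::max(m, x[i]);
    return m;
}
```

Because the lane count can differ between machines, per-lane scratch buffers cannot be sized from it at compile time; OpenCV's universal intrinsics also expose a compile-time upper bound, `VTraits<v_float32>::max_nlanes`, which is the usual way to size such buffers.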
Enable softmax layer vectorization on RISC-V RVV #24510

Related: #24466

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake

* improve and refactor softmax layer
* fix building error
* compatible region layer
* fix axisStep when disable SIMD
* fix dynamic array
* try to fix error
* use nlanes from VTraits
* move axisBias to srcOffset
* fix bug caused by axisBias
* remove macro
* replace #ifdef with #if for CV_SIMD
This PR improves softmax from ficus nn.
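For reference, this is the quantity a softmax layer computes for each slice $x_1, \dots, x_N$ along the chosen axis. Subtracting the slice maximum $m$ leaves the result unchanged, because the factor $e^{-m}$ cancels between numerator and denominator, while keeping every exponent $\le 0$ so `exp` cannot overflow. That the ficus nn kernel uses exactly this max-subtraction form is an assumption here, although it is the standard formulation.

$$
\operatorname{softmax}(x)_i
= \frac{e^{x_i}}{\sum_{j=1}^{N} e^{x_j}}
= \frac{e^{-m}\, e^{x_i}}{e^{-m} \sum_{j=1}^{N} e^{x_j}}
= \frac{e^{x_i - m}}{\sum_{j=1}^{N} e^{x_j - m}},
\qquad m = \max_{1 \le k \le N} x_k
$$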
Performance test results (minimum values, multi-threaded):
macOS M2
Ubuntu, Intel Core i7-12700K: 8 Performance-cores (3.60 GHz, turbo up to 4.90 GHz), 4 Efficient-cores (2.70 GHz, turbo up to 3.80 GHz), 20 threads.
Ubuntu Loongnix