dnn (cuda): support broadcasting if a.rank() != b.rank() #24834


Merged

Conversation

@fengyuentau (Member) commented Jan 9, 2024 (edited)

Inspired by #24786. This PR keeps the fusion of NaryEltwise and Concat while addressing the missing-data problem by supporting broadcasting when a.rank() != b.rank().
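For context, ONNX/NumPy-style broadcasting first aligns ranks by prepending size-1 axes to the lower-rank shape, then matches dimensions pairwise. A minimal sketch of that shape inference, with a hypothetical helper name (this is not the actual OpenCV CUDA code):

#include <stdexcept>
#include <vector>

// Sketch of NumPy/ONNX-style broadcast shape inference for inputs of
// different ranks. Hypothetical helper, not OpenCV's implementation.
std::vector<int> broadcastShape(std::vector<int> a, std::vector<int> b)
{
    // Align ranks: prepend 1s to the lower-rank shape.
    if (a.size() < b.size())
        a.insert(a.begin(), b.size() - a.size(), 1);
    else if (b.size() < a.size())
        b.insert(b.begin(), a.size() - b.size(), 1);

    // Match dimensions pairwise; a size-1 axis stretches to the other size.
    std::vector<int> out(a.size());
    for (std::size_t i = 0; i < a.size(); i++)
    {
        if (a[i] == b[i] || b[i] == 1)
            out[i] = a[i];
        else if (a[i] == 1)
            out[i] = b[i];
        else
            throw std::runtime_error("shapes are not broadcastable");
    }
    return out;
}

For example, shapes [1, 3, 4, 4] and [4] align to [1, 3, 4, 4] and [1, 1, 1, 4], giving the output shape [1, 3, 4, 4].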

Resolves #23977
Resolves #24606
Resolves #24635
Resolves #24721

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on code under GPL or another license that is incompatible with OpenCV
  • The PR is proposed to the proper branch
  • There is a reference to the original bug report and related work
  • There are accuracy tests, performance tests and test data in the opencv_extra repository, if applicable.
    The patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake

@fengyuentau added the bug, category: gpu/cuda (contrib) (OpenCV 4.0+: moved to opencv_contrib) and category: dnn labels Jan 9, 2024
@fengyuentau added this to the 4.10.0 milestone Jan 9, 2024
@fengyuentau (Member, Author) commented

Tried to add yolov8n to the tests on different backends, but it turns out we may have more problems, especially with the CUDA_FP16 target:

[ RUN      ] DNNTestNetwork.YOLOv8n/0, where GetParam() = CUDA/CUDA
/workspace/cuda_naryeltwise_broadcast/opencv/modules/dnn/test/test_common.impl.hpp:79: Failure
Expected: (normInf) <= (lInf), actual: 0.0010376 vs 0.0001
First run  |ref| = 638.03076171875
/workspace/cuda_naryeltwise_broadcast/opencv/modules/dnn/test/test_common.impl.hpp:79: Failure
Expected: (normInf) <= (lInf), actual: 0.00109863 vs 0.0001
Second run  |ref| = 638.5064697265625
[  FAILED  ] DNNTestNetwork.YOLOv8n/0, where GetParam() = CUDA/CUDA (288 ms)
[ RUN      ] DNNTestNetwork.YOLOv8n/1, where GetParam() = CUDA/CUDA_FP16
/workspace/cuda_naryeltwise_broadcast/opencv/modules/dnn/test/test_common.impl.hpp:76: Failure
Expected: (normL1) <= (l1), actual: 0.0118579 vs 0.004
First run  |ref| = 638.03076171875
/workspace/cuda_naryeltwise_broadcast/opencv/modules/dnn/test/test_common.impl.hpp:79: Failure
Expected: (normInf) <= (lInf), actual: 6.54901 vs 0.02
First run  |ref| = 638.03076171875
/workspace/cuda_naryeltwise_broadcast/opencv/modules/dnn/test/test_common.impl.hpp:76: Failure
Expected: (normL1) <= (l1), actual: 0.0119177 vs 0.004
Second run  |ref| = 638.5064697265625
/workspace/cuda_naryeltwise_broadcast/opencv/modules/dnn/test/test_common.impl.hpp:79: Failure
Expected: (normInf) <= (lInf), actual: 8.83636 vs 0.02
Second run  |ref| = 638.5064697265625
[  FAILED  ] DNNTestNetwork.YOLOv8n/1, where GetParam() = CUDA/CUDA_FP16 (601 ms)
[ RUN      ] DNNTestNetwork.YOLOv8n/2, where GetParam() = OCV/OCL
[ WARN:0@1.061] global ocl4dnn_conv_spatial.cpp:1931 loadTunedConfig OpenCV(ocl4dnn): consider to specify kernel configuration cache directory through OPENCV_OCL4DNN_CONFIG_PATH parameter.
OpenCL program build log: dnn/dummy
Status -11: CL_BUILD_PROGRAM_FAILURE
-cl-no-subgroup-ifp
Error in processing command line: Don't understand command line argument "-cl-no-subgroup-ifp"!
/workspace/cuda_naryeltwise_broadcast/opencv/modules/dnn/test/test_common.impl.hpp:79: Failure
Expected: (normInf) <= (lInf), actual: 0.00161743 vs 0.0001
First run  |ref| = 638.03076171875
/workspace/cuda_naryeltwise_broadcast/opencv/modules/dnn/test/test_common.impl.hpp:79: Failure
Expected: (normInf) <= (lInf), actual: 0.00117493 vs 0.0001
Second run  |ref| = 638.5064697265625
[  FAILED  ] DNNTestNetwork.YOLOv8n/2, where GetParam() = OCV/OCL (3184 ms)
[ RUN      ] DNNTestNetwork.YOLOv8n/3, where GetParam() = OCV/OCL_FP16
[       OK ] DNNTestNetwork.YOLOv8n/3 (545 ms)
[----------] 4 tests from DNNTestNetwork (4618 ms total)
[----------] Global test environment tear-down
[==========] 4 tests from 1 test case ran. (4618 ms total)
[  PASSED  ] 1 test.
[  FAILED  ] 3 tests, listed below:
[  FAILED  ] DNNTestNetwork.YOLOv8n/0, where GetParam() = CUDA/CUDA
[  FAILED  ] DNNTestNetwork.YOLOv8n/1, where GetParam() = CUDA/CUDA_FP16
[  FAILED  ] DNNTestNetwork.YOLOv8n/2, where GetParam() = OCV/OCL
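For reference, the normL1/normInf figures in this log are distances between the tested backend's output and the reference output. A minimal sketch of the kind of comparison test_common.impl.hpp performs (the tolerance defaults here are illustrative, not the suite's actual values):

#include <opencv2/core.hpp>

// Sketch of the accuracy check behind "Expected: (normInf) <= (lInf)":
// compare a backend's output against a reference within tolerances.
bool outputsMatch(const cv::Mat& out, const cv::Mat& ref,
                  double l1 = 1e-4, double lInf = 1e-4)
{
    double normL1  = cv::norm(out, ref, cv::NORM_L1) / ref.total();  // mean absolute error
    double normInf = cv::norm(out, ref, cv::NORM_INF);               // max absolute error
    return normL1 <= l1 && normInf <= lInf;
}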

@Abdurrahheem (Contributor) commented

@fengyuentau once this PR is complete (currently yolov8 is not supported on CUDA here, AFAIK), does it mean that PR #24786 is going to be obsolete?

@fengyuentau (Member, Author) commented

> does it mean that PR #24786 is going to be obsolete?

Yes.

> currently yolov8 is not supported on CUDA here

That's not true. There are some minor differences in the results between CPU and CUDA/CUDA, which I think is OK, but the differences are much bigger for the CUDA_FP16 target. I guess we lose some accuracy in Sigmoid and similar layers. This needs an in-depth investigation.
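To illustrate the FP16 accuracy concern, a small standalone sketch (assuming a compiler with C++23 <stdfloat> and std::float16_t support; this only demonstrates the rounding error of the fp16 format, not how the CUDA backend computes Sigmoid):

#include <cmath>
#include <cstdio>
#include <stdfloat>  // C++23; assumes std::float16_t is available

// Compare sigmoid evaluated in fp32 against a value rounded through fp16.
int main()
{
    for (float x : {-8.0f, -2.5f, 0.1f, 3.7f})
    {
        float ref = 1.0f / (1.0f + std::exp(-x));
        std::float16_t xh = static_cast<std::float16_t>(x);
        float yh = static_cast<float>(static_cast<std::float16_t>(
            1.0f / (1.0f + std::exp(-static_cast<float>(xh)))));
        std::printf("x=%6.2f  fp32=%.7f  fp16=%.7f  diff=%.2e\n",
                    x, ref, yh, ref - yh);
    }
    return 0;
}

Per-element rounding errors of this magnitude (around 1e-4) can accumulate across layers, which would be consistent with the larger normL1/normInf gaps seen on CUDA_FP16.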

@asmorkalov (Contributor) commented

Locally I observe several test failures like this:

[----------] 1 test from Layer_Test_Eltwise_bcast
[ RUN      ] Layer_Test_Eltwise_bcast.brute_force/10, where GetParam() = ("sum", 3, CUDA/CUDA)
Exception message: OpenCV(4.9.0-dev) /mnt/projects/Projects/OpenCV/opencv-master/modules/dnn/src/cuda/../cuda4dnn/csl/tensor.hpp:1047: error: (-215:Assertion failed) rank() >= 2 in function 'squeeze'
/mnt/projects/Projects/OpenCV/opencv-master/modules/dnn/test/test_layers.cpp:2053: Failure
Expected: re = net.forward() doesn't throw an exception.
  Actual: it throws.
Exception message: OpenCV(4.9.0-dev) /mnt/projects/Projects/OpenCV/opencv-master/modules/dnn/src/cuda/../cuda4dnn/csl/tensor.hpp:1047: error: (-215:Assertion failed) rank() >= 2 in function 'squeeze'
/mnt/projects/Projects/OpenCV/opencv-master/modules/dnn/test/test_layers.cpp:2053: Failure
Expected: re = net.forward() doesn't throw an exception.
  Actual: it throws.
Exception message: OpenCV(4.9.0-dev) /mnt/projects/Projects/OpenCV/opencv-master/modules/dnn/src/cuda/../cuda4dnn/csl/tensor.hpp:1047: error: (-215:Assertion failed) rank() >= 2 in function 'squeeze'
/mnt/projects/Projects/OpenCV/opencv-master/modules/dnn/test/test_layers.cpp:2053: Failure
Expected: re = net.forward() doesn't throw an exception.
  Actual: it throws.
Exception message: OpenCV(4.9.0-dev) /mnt/projects/Projects/OpenCV/opencv-master/modules/dnn/src/cuda/../cuda4dnn/csl/tensor.hpp:1047: error: (-215:Assertion failed) rank() >= 2 in function 'squeeze'
/mnt/projects/Projects/OpenCV/opencv-master/modules/dnn/test/test_layers.cpp:2053: Failure
Expected: re = net.forward() doesn't throw an exception.
  Actual: it throws.
[  FAILED  ] Layer_Test_Eltwise_bcast.brute_force/10, where GetParam() = ("sum", 3, CUDA/CUDA) (1536 ms)

Full list:

[  FAILED  ] Layer_Test_Eltwise_bcast.brute_force/10, where GetParam() = ("sum", 3, CUDA/CUDA)
[  FAILED  ] Layer_Test_Eltwise_bcast.brute_force/11, where GetParam() = ("sum", 3, CUDA/CUDA_FP16)
[  FAILED  ] Layer_Test_Eltwise_bcast.brute_force/15, where GetParam() = ("sum", 4, CUDA/CUDA)
[  FAILED  ] Layer_Test_Eltwise_bcast.brute_force/16, where GetParam() = ("sum", 4, CUDA/CUDA_FP16)
[  FAILED  ] Layer_Test_Eltwise_bcast.brute_force/20, where GetParam() = ("sum", 5, CUDA/CUDA)
[  FAILED  ] Layer_Test_Eltwise_bcast.brute_force/21, where GetParam() = ("sum", 5, CUDA/CUDA_FP16)
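The assertion comes from the CSL tensor's squeeze(): removing a size-1 axis from a rank-1 tensor would leave a rank-0 tensor, which the CUDA backend's tensor types do not support. A simplified sketch of that guard (my paraphrase for illustration, not the actual tensor.hpp code):

#include <cassert>
#include <cstddef>
#include <vector>

// Simplified sketch of why squeeze() asserts rank() >= 2: squeezing a
// rank-1 tensor would produce a rank-0 tensor, which is unsupported.
struct TensorShape
{
    std::vector<int> dims;

    std::size_t rank() const { return dims.size(); }

    void squeeze(int axis)
    {
        assert(rank() >= 2 && "result must keep at least rank 1");
        assert(dims[axis] == 1 && "only size-1 axes can be squeezed");
        dims.erase(dims.begin() + axis);
    }
};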

@fengyuentau (Member, Author) commented

This is because these failing tests have inputs of shape [1] (1-d Mat). The CUDA backend has asserts checking rank >= 2, so it is not feasible to run these tests with the CUDA backend without bypassing those checks.

It worked previously because it was not actually testing the CUDA backend: if two inputs have different numbers of dimensions, it falls back to the CPU implementation, so nothing related to the CUDA backend is tested in these cases. See below for the fallback (lines 804-805; returning an empty Ptr<BackendNode> makes the framework use the default implementation):

auto input_0_shape = inputs[0].dynamicCast<CUDABackendWrapper>()->getShape();
for (int i = 1; i < inputs.size(); i++)
{
    auto input_i_shape = inputs[i].dynamicCast<CUDABackendWrapper>()->getShape();
    if (input_0_shape.size() != input_i_shape.size())
        return Ptr<BackendNode>();
    // check if the shape can be supported by `eltwise_ops.cu`, or return the default BackendNode
    for (int j = 0; j < input_0_shape.size(); j++)
        if (input_0_shape[j] != input_i_shape[j] &&
            input_0_shape[j] != 1 && input_i_shape[j] != 1)
            return Ptr<BackendNode>();
}


With that being said, I propose to disable these tests specifically for the CUDA backend. @asmorkalov What do you think?

@WanliZhong Please join this discussion as well.

@fengyuentau (Member, Author) commented

Or we could still fall back to CPU when the dimension is 1.

@WanliZhong (Member) commented

I propose falling back when the dimension is 1, so that CUDA runs correctly rather than throwing an error.

@fengyuentau (Member, Author) commented

That does not work, because the 1-d Mat is actually produced inside the broadcasting implementation in the CUDA backend itself. Let me find another solution to this.
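One conceivable fix, sketched below under my own assumptions (not necessarily what the new commits actually do), is to pad shapes with leading 1s inside the CUDA broadcasting path so no intermediate ever drops below rank 2; padding with size-1 axes changes neither the element count nor the broadcast semantics:

#include <cstddef>
#include <vector>

// Hypothetical workaround sketch: keep every shape at rank >= 2 by
// prepending size-1 axes, so the rank() >= 2 asserts are never hit.
std::vector<int> padToMinRank(std::vector<int> shape, std::size_t minRank = 2)
{
    if (shape.size() < minRank)
        shape.insert(shape.begin(), minRank - shape.size(), 1);
    return shape;
}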

@fengyuentau (Member, Author) commented

New commits should resolve this problem.

@asmorkalov (Contributor) commented

Tests pass with CUDA locally now.


@fengyuentau (Member, Author) commented

Sporadic crash in PR:4.x / macOS-ARM64-Vulkan / BuildAndTest (pull_request). This patch makes no changes to Caffe or Vulkan.


@asmorkalov merged commit e7ccff9 into opencv:4.x Jan 11, 2024
@fengyuentau deleted the cuda_naryeltwise_broadcast branch January 11, 2024 07:07
@Abdurrahheem mentioned this pull request Jan 11, 2024
@asmorkalov mentioned this pull request Jan 19, 2024

Reviewers

@dkurt approved these changes

@Abdurrahheem approved these changes

Awaiting requested review from @WanliZhong

Assignees

@dkurt

Labels

bug, category: dnn, category: gpu/cuda (contrib) (OpenCV 4.0+: moved to opencv_contrib)

Projects

None yet

Milestone

4.10.0

5 participants

@fengyuentau, @Abdurrahheem, @asmorkalov, @WanliZhong, @dkurt
