dnn: add attention layer #24476
Conversation
(force-pushed 304584b to 5671dec, then 32293fd to 816c331)

fengyuentau commented Nov 24, 2023
Benchmark results are added.
dkurt commented Nov 24, 2023

PR is good, thanks a lot. The only concern is a potential regression/fallback in backends due to the transition from separate ops to a new fused layer. However, there is no alternative for this problem right now, and I recommend keeping this PR to the default CPU implementation only. Please also take a look at this comment: opencv/opencv_extra#1128 (comment)
```cpp
net.setInput(bias, input_names[2]);
net.setPreferableBackend(backendId);
net.setPreferableTarget(targetId);
```
- I will benchmark on CUDA and with OpenVINO
The current numbers are a bit confusing. Let's wait for #24476 (comment) because it fails.
input shape: [1, 320, 48]
model from opencv/opencv_extra#1128
CPU: 12th Gen Intel(R) Core(TM) i9-12900
| backend | 4.x | PR |
|---|---|---|
| OpenCV, CPU | 18.46ms | 11.90ms (x0.64) |
| OpenVINO, CPU | 0.25ms | 11.91ms |
Well, the acc test has a reduced scale so that it does not take a long time to run. For benchmarking, I use [1, 197, 768] (I have two attention.onnx models, one with input [1, 197, 768], the other with [1, 320, 48]).

For OpenVINO, since we do not have backend-specific graph fusion, we can recreate this subgraph in initNgraph.
Updated results. Please ignore the table above:
input: 1x197x768
CPU: 12th Gen Intel(R) Core(TM) i9-12900
| backend | 4.x | PR |
|---|---|---|
| OpenCV, CPU | 16.93ms | 2.37ms (x7.14) |
| OpenVINO, CPU | 1.54ms | 2.33ms (x0.66) |
So there is a degradation in OpenVINO performance in the case of a fallback to the OpenCV layer.
fengyuentau commented Nov 24, 2023

I will try to implement different backends for this layer in another pull request, just to reduce the review scope and merge this ASAP.
(force-pushed 669b503 to b3068a3)

asmorkalov commented Dec 1, 2023

@dkurt @WanliZhong Friendly reminder.
modules/dnn/perf/perf_net.cpp Outdated
```
| vit_b_32 | 89.92   | 116.22  | 30.36  |
| vit_l_16 | 1593.32 | 1730.74 | 419.92 |
| vit_l_32 | 468.11  | 577.41  | 134.12 |
| VitTrack | 3.80    | 3.87    | 2.25   |
```
Please remove the results from the source code - it's enough to add them to the PR's description.
Running these models in perf_net.cpp takes too much time for now. How about we comment out PERF_TEST_P_(DNNTestNetwork, VIT_B_16) and the like until we get close to or even better inference speed than ORT? In that case, these performance results should be deleted from the source and kept in the first comment of this PR.
You may add a corresponding skip exception:

```cpp
applyTestTag(CV_TEST_TAG_LONG, CV_TEST_TAG_DEBUG_LONG);
```
There is no issue if we specify 4 inputs for "Slice", so there is no need to add new logic with optional inputs:

Track #24609 and proposal #24609 (comment)
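For context on the optional-inputs discussion: ONNX `Slice` takes `data` plus `starts`/`ends`, while `axes` and `steps` are optional 4th and 5th inputs. A hedged NumPy emulation of these semantics (a simplified sketch, not OpenCV's importer code; `onnx_slice` is a hypothetical helper name):

```python
import numpy as np

def onnx_slice(data, starts, ends, axes=None, steps=None):
    """Simplified emulation of ONNX Slice. When the optional inputs are
    omitted, axes default to 0..len(starts)-1 and steps default to 1."""
    if axes is None:
        axes = list(range(len(starts)))
    if steps is None:
        steps = [1] * len(starts)
    slices = [slice(None)] * data.ndim
    for a, s, e, st in zip(axes, starts, ends, steps):
        # ONNX clamps out-of-range ends (e.g. end=INT64_MAX, which the
        # fusion code must handle) to the dimension size; Python slicing
        # already behaves this way.
        slices[a] = slice(s, e, st)
    return data[tuple(slices)]

x = np.arange(12).reshape(3, 4)
print(onnx_slice(x, [0], [2]))            # rows 0..1, axes/steps omitted
print(onnx_slice(x, [1], [4], [1], [2]))  # columns 1 and 3
```

This is why a pattern matcher written for a fixed number of `Slice` inputs can miss equivalent graphs: exporters may emit 3, 4, or 5 inputs for the same slicing operation.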
fengyuentau commented Dec 1, 2023

I don't think the proposal is good enough. We actually have other models that have
dkurt commented Dec 4, 2023

@fengyuentau, got it. So in this PR, can you please do a simpler workaround, like a parameter with the number of Slice inputs? This problem should be solved separately.

```cpp
class AttentionSubGraph : public Subgraph {
public:
    AttentionSubGraph(int numSliceInps) {
        std::vector<std::string> inps(1 + numSliceInps, att_add);
        for (int i = 0; i < numSliceInps; ++i)
            inps[i + 1] = addNodeToMatch("");
        slice_v = addNodeToMatch("Slice", inps);
    }
};
```
…; add test for attention subgraph fusion
…t is matched; clean comments
(force-pushed 1bc226e to 846237d)

asmorkalov commented Dec 20, 2023

CUDA:
asmorkalov commented Dec 20, 2023

@opencv-alalek Please update test data on Buildbot.
fengyuentau commented Dec 20, 2023

I was also looking into this issue. It is so weird that it only fails on CUDA_FP16 with almost double the threshold value. Could we apply a skip tag?
asmorkalov commented Dec 20, 2023

`CV_TEST_TAG_DNN_SKIP_CUDA_FP16` - sure.
asmorkalov left a comment
👍
Make default axis of softmax in ONNX "-1" without opset option opencv#24613

Try to solve problem: opencv#24476 (comment)

**ONNX**
- `opset <= 11`: use 1
- else: use -1

**TensorFlow**
- TF version = 2.x: use -1
- else: use 1

**Darknet, Caffe, Torch**
- use 1 by definition
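The defaults above matter because, for inputs of rank higher than 2, softmax over axis 1 and softmax over the last axis give different results. A hypothetical NumPy illustration (not OpenCV code) for a [batch, seq, channels] tensor:

```python
import numpy as np

def softmax(x, axis):
    # subtract the max for numerical stability, then normalize along `axis`
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

x = np.arange(24, dtype=np.float64).reshape(2, 3, 4)

s_axis1 = softmax(x, axis=1)    # old ONNX default (opset <= 11)
s_last = softmax(x, axis=-1)    # new ONNX default / TF 2.x behavior

print(np.allclose(s_axis1.sum(axis=1), 1.0))   # normalized over axis 1
print(np.allclose(s_last.sum(axis=-1), 1.0))   # normalized over last axis
print(np.allclose(s_axis1, s_last))            # the two defaults disagree
```

So an importer that picks the wrong default silently changes the model's output rather than failing loudly.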
dnn: add attention layer opencv#24476

Resolves opencv#24609
Merge with: opencv/opencv_extra#1128.

Attention operator spec from onnxruntime: https://github.com/microsoft/onnxruntime/blob/v1.16.1/docs/ContribOperators.md#com.microsoft.Attention.

TODO:
- [x] benchmark (before this PR vs. with this PR vs. ORT).
- [x] Layer fusion: Take care of Slice with end=INT64_MAX.
- [x] Layer fusion: match more potential attention (ViT) patterns.
  - [x] Single-head attention is supported.
- [x] Test AttentionSubgraph fusion.
- [x] Add acc tests for VIT_B_32 and VitTrack
- [x] Add perf tests for VIT_B_32 and VitTrack

## Benchmarks

Platform: Macbook Air M1.

### Attention Subgraph

Input scale: [1, 197, 768].

| | mean (ms) | median (ms) | min (ms) |
| ---------------------- | --------- | ----------- | -------- |
| w/ Attention (this PR) | 3.75 | 3.68 | 3.22 |
| w/o Attention | 9.06 | 9.01 | 8.24 |
| ORT (python) | 4.32 | 2.63 | 2.50 |

### ViTs

All data in milliseconds (ms).

| ViTs | With Attention | Without Attention | ORT |
| -------- | -------------- | ----------------- | ------ |
| vit_b_16 | 302.77 | 365.35 | 109.70 |
| vit_b_32 | 89.92 | 116.22 | 30.36 |
| vit_l_16 | 1593.32 | 1730.74 | 419.92 |
| vit_l_32 | 468.11 | 577.41 | 134.12 |
| VitTrack | 3.80 | 3.87 | 2.25 |

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable. Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
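For reference, a minimal, framework-agnostic sketch of what the fused layer computes in the single-head case (scaled dot-product attention, following the onnxruntime com.microsoft.Attention spec linked above; the shapes and weight names here are illustrative, not OpenCV's implementation, and bias/mask inputs are omitted):

```python
import numpy as np

def softmax(x, axis=-1):
    # subtract the max for numerical stability before exponentiating
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(x, wq, wk, wv):
    """Single-head scaled dot-product attention.
    x: [batch, seq_len, hidden]; wq/wk/wv: [hidden, head_dim]."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(k.shape[-1])
    return softmax(scores, axis=-1) @ v   # [batch, seq_len, head_dim]

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 197, 768))   # ViT-like input, as in the benchmark
wq, wk, wv = (rng.standard_normal((768, 64)) for _ in range(3))
out = attention(x, wq, wk, wv)
print(out.shape)  # (1, 197, 64)
```

Fusing this chain of MatMul/Transpose/Softmax ops into one layer is what removes the intermediate tensors and gives the speedup reported in the tables above.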