dnn: add layer normalization for vision transformers#23047


Merged

alalek merged 12 commits into opencv:4.x from fengyuentau:layer_norm on Jan 27, 2023

Conversation

@fengyuentau (Member) commented on Dec 28, 2022 (edited):

Merge with opencv/opencv_extra#1032

  • add layer norm onnx parser
  • add layer norm impl
  • add layer norm onnx simplifier for both cases of constants being Constant and Initializer
  • add test model generation code for layer_norm_expanded and layer_norm_expanded_initializer
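For reference, the computation the new layer performs can be sketched as a standalone function. This is a minimal sketch of the LayerNormalization math only; the function name, buffer layout, and loop structure are illustrative, not the actual OpenCV implementation:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Normalize each row of a (rows x normSize) buffer over its last axis:
// y = (x - mean) / sqrt(var + epsilon) * scale + bias
void layerNorm(const std::vector<float>& src, std::vector<float>& dst,
               const std::vector<float>& scale, const std::vector<float>& bias,
               size_t rows, size_t normSize, float epsilon)
{
    dst.resize(src.size());
    for (size_t r = 0; r < rows; ++r)
    {
        const float* x = src.data() + r * normSize;
        float* y = dst.data() + r * normSize;

        // mean over the normalized axis
        float mean = 0.f;
        for (size_t i = 0; i < normSize; ++i) mean += x[i];
        mean /= normSize;

        // biased variance over the normalized axis
        float var = 0.f;
        for (size_t i = 0; i < normSize; ++i) var += (x[i] - mean) * (x[i] - mean);
        var /= normSize;

        // std::max guards against a tiny negative value before std::sqrt
        const float invStdDev = 1.f / std::sqrt(std::max(var + epsilon, 0.f));
        for (size_t i = 0; i < normSize; ++i)
            y[i] = (x[i] - mean) * invStdDev * scale[i] + bias[i];
    }
}
```

For the benchmark input of size 1x50x768 normalized over the last axis, `rows` would be 50 and `normSize` 768.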

Benchmark:

| Layer | Mean (ms) | Median (ms) | Min (ms) |
|---|---|---|---|
| layer norm expanded | 0.43 | 0.42 | 0.40 |
| layer norm (this PR) | 0.02 | 0.02 | 0.01 |

*: tested with size 1x50x768 on Apple M1.

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
  • The PR is proposed to the proper branch
  • There is a reference to the original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake
force_builders=Linux OpenCL

@fengyuentau (Member, Author) commented:

@rogday Could you review this pull request if possible?

@rogday (Member) left a comment (edited):

Thank you for the contribution! LGTM 👍

@alalek (Member) left a comment:

Left some optimization and minor comments.

```cpp
std::vector<Mat> inputs, outputs;
inputs_arr.getMatVector(inputs);
outputs_arr.getMatVector(outputs);
const int nstripes = getNumThreads();
```

@alalek (Member) commented:

> const int nstripes = getNumThreads();

This doesn't look like a reliable design.
This scheme assumes that all threads have the same speed and are never interrupted.
That is not true for OSes with preemptive execution (all widely used OSes of the last 25+ years); some cores may be handling background tasks or interrupts.

It is also not true at all for CPUs with a big+little design (modern ARM, Intel CPUs with P+E cores).

nstripes should be based on the subtask's "grain size" (subtask time >>> scheduling overhead) instead of the number of available threads.

Some information is available here: https://oneapi-src.github.io/oneTBB/main/tbb_userguide/Controlling_Chunking_os.html
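To illustrate the suggestion, here is a minimal sketch of deriving a stripe count from the amount of work rather than from the thread count. The helper name and the choice of grain size are hypothetical, not OpenCV API:

```cpp
#include <cassert>
#include <cstddef>

// Pick a stripe count so that each stripe covers roughly grainSize elements.
// With many small stripes, fast (big) cores naturally pick up more stripes
// than slow (little) cores, so no assumption about thread speed is needed.
size_t stripesForWork(size_t totalElems, size_t grainSize)
{
    if (grainSize == 0) grainSize = 1;   // guard against a degenerate grain
    size_t nstripes = totalElems / grainSize;
    return nstripes > 0 ? nstripes : 1;  // always schedule at least one stripe
}
```

The grain size should be chosen so that one stripe's work dominates the scheduling overhead; too small a grain wastes time on scheduling, too large a grain loses load balancing.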

@fengyuentau (Member, Author) commented on Jan 17, 2023 (edited):

> It is also not true at all for CPUs with a big+little design (modern ARM, Intel CPUs with P+E cores).

And this is why I happened to fix the OpenMP issue on macOS. I was benchmarking vision transformers on ONNX Runtime, MNN and OpenCV dnn the other day, and found that both ONNX Runtime and MNN run with 4 threads on my Apple M1 by default, while OpenCV dnn uses all 8 threads instead. Setting numThreads is not possible with GCD, and when I tried OpenMP I ran into build issues.

I think we should somehow improve the multi-threading functionality of OpenCV so that it detects big+little designs and returns the numThreads of the big cores if possible. cv::Range would also need to support a step in order to work with a "grain size"...

@alalek (Member) commented:

> I think we should somehow improve the multi-threading functionality of OpenCV so that it detects big+little designs and returns the numThreads of the big cores if possible

It doesn't really look like an improvement.

E.g., in the case of ARM-based phones we already have configurations with 2 big + 6 little cores.

Again, we should not assume or rely on threads having the same performance, or on subtasks having equal complexity.

@fengyuentau (Member, Author) commented:

I am not sure what exactly you are asking for here. Basically every other `**Invoker` in other layers uses the same strategy. If we want to take care of big+little-core CPUs, I think that should be another pull request.

@alalek (Member) left a comment:

Thank you for the update!


…square division outside of loop; use std::max to ensure positive value before std::sqrt
@alalek (Member) left a comment:

Refactored references and parallel_for usage.


```cpp
TEST_P(Test_ONNX_layers, LayerNorm)
{
    testONNXModels("test_layer_normalization_2d_axis0", pb, 0, 0, false, true, 3);
```
@alalek (Member) commented:

There are many error messages during the model import:

```
[ RUN      ] Test_ONNX_layers.LayerNorm/0, where GetParam() = OCV/CPU
[ INFO:0@132.156] global onnx_importer.cpp:831 populateNet DNN/ONNX: loading ONNX v8 model produced by 'backend-test'. Number of nodes = 1, initializers = 0, inputs = 3, outputs = 3
[ INFO:0@132.156] global onnx_importer.cpp:725 parseOperatorSet DNN/ONNX: ONNX opset version = 17
[ INFO:0@132.156] global onnx_importer.cpp:997 handleNode DNN/ONNX: processing node with 3 inputs and 3 outputs: [LayerNormalization]:(onnx_node_output_0!Y) from domain='ai.onnx'
[ERROR:0@132.156] global onnx_importer.cpp:924 populateNet DNN/ONNX: can't find layer for output name: 'Mean'. Does model imported properly?
[ERROR:0@132.156] global onnx_importer.cpp:924 populateNet DNN/ONNX: can't find layer for output name: 'InvStdDev'. Does model imported properly?
```

We should not have them.

@fengyuentau (Member, Author) commented on Jan 21, 2023 (edited):

The reasons why they exist:

  1. These ONNX models are taken from the ONNX conformance tests. I think it is better not to modify them.
  2. I took a look at the opencv-onnx functionality and did not find a way to remove them completely in our ONNX importer:

```cpp
// Remove additional outputs (Mean, InvStdDev)
if (node_proto.output_size() > 1)
{
    auto outputName = node_proto.output(0);
    opencv_onnx::NodeProto node_proto_ = node_proto;
    node_proto_.clear_output();
    node_proto_.add_output(outputName);
    addLayer(layerParams, node_proto_);
}
```

@rogday Do you happen to know how to remove optional (node & graph) output in ONNX importer?

@fengyuentau (Member, Author) commented:

I removed the optional outputs from those ONNX models in the end. It turned out not to be straightforward to modify the outputs of an ONNX graph proto.

```cpp
CV_CheckTypeEQ(src.type(), dst.type(), "");
CV_Assert(scale.isContinuous());

CV_CheckGE(epsilon, 0.0f, "");
```
@alalek (Member) commented:

Added this extra check

Comment on lines 107 to 108:

```cpp
double nstripes = ((size_t)p.total * p.normSize) * (1 / 1024.0);
parallel_for_(Range(0, p.total), p, nstripes);
```
@fengyuentau (Member, Author) commented on Jan 21, 2023 (edited):

Thanks for the change! I learned a lot!

So if I understand correctly, you make the grain size small enough that both big and little cores can be used, and the big cores naturally take more jobs. What about the "magic number" 1024? Was it taken from the "bathtub curve" in this link?

@alalek (Member) commented:

The parallel_for() strategy should rely on the subtask size and the scheduling overhead. 1024 is an empirical number here which specifies the size of a subtask.
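As a worked example of the quoted formula (assuming, as the surrounding code suggests, that `p.total` is the number of rows to normalize and `p.normSize` the per-row length; the helper name here is illustrative):

```cpp
#include <cassert>
#include <cstddef>

// The quoted snippet computes nstripes = total * normSize / 1024, i.e. one
// stripe per ~1024 elements of work. 1 / 1024.0 is exact in binary floating
// point, so for the benchmark input (1x50x768) this yields 37.5 stripes.
double nstripesFor(size_t total, size_t normSize)
{
    return (double)(total * normSize) * (1 / 1024.0);
}
```

With 37.5 stripes over Range(0, 50), each stripe covers one or two rows, which is small enough for the scheduler to balance work across cores of different speeds.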

@fengyuentau (Member, Author) commented:

Okay, thanks again; I benefited a lot from this. Another question: why multiply by p.normSize? Another empirical choice? I tried without it and the result is about twice as slow.

By the way, I found that all the other `**Invoker`s use the same strategy, assuming all threads have the same performance. I think they need to be upgraded as well.

@fengyuentau (Member, Author) commented:

@alalek What is the status of this pull request? Could we take a step forward?

@alalek merged commit 4d918ba into opencv:4.x on Jan 27, 2023
@alalek mentioned this pull request on Jan 28, 2023
@fengyuentau mentioned this pull request on Mar 14, 2023 (48 tasks)
a-sajjad72 pushed a commit to a-sajjad72/opencv that referenced this pull request on Mar 30, 2023:

dnn: add layer normalization for vision transformers
  • add layer norm onnx parser, impl and tests
  • add onnx graph simplifier for layer norm expanded
  • handle the case when constants are of type Initializer
  • add test case for layer norm expanded with initializers
  • use CV_Assert & CV_CheckType in place of CV_Assert_N; use forward_fallback for OCL_FP16
  • use const ref / ref in parameters of invoker::run; extract inner const if from nested loop; use size_t in place of ull
  • template hasBias
  • remove trailing whitespace
  • use pointer parameter with null check; move normSize division & mean_square division outside of loop; use std::max to ensure positive value before std::sqrt
  • refactor implementation, optimize parallel_for
  • disable layer norm expanded
  • remove the removal of layer norm optional outputs
@asmorkalov mentioned this pull request on May 31, 2023
geversonsto pushed a commit to stodev-com-br/opencv that referenced this pull request on Jun 3, 2023: dnn: add layer normalization for vision transformers
@dkurt mentioned this pull request on Aug 4, 2023 (9 tasks)
@fengyuentau deleted the layer_norm branch on October 19, 2023 07:34

Reviewers

@alalek approved these changes
@rogday approved these changes

Assignees

@rogday

Projects

None yet

Milestone

4.8.0

Development

Successfully merging this pull request may close these issues.

3 participants

@fengyuentau @alalek @rogday
