Optimization based on RISC-V P Packed SIMD Extension v0.5.2 #24556

Merged
asmorkalov merged 1 commit into opencv:4.x from plctlab:rvp
Jan 16, 2024

@Junyan721113
Contributor

Summary

Provides OpenCV optimizations for the RISC-V P extension (v0.5.2).

  1. Added RVP as a new backend to the OpenCV build system;
  2. Optimized some of the algorithms in the DNN, features2d (feature detection), and imgproc (image processing) modules using RVP Intrinsic functions;
  3. Verified the correctness of the optimized algorithms using the QEMU simulator.

The writer of the code and the author of the PR is an intern at ISCAS (Institute of Software, Chinese Academy of Sciences).

List of RVP optimizations

  • Optimization of three convolution functions for int8 layers of deep neural networks
        // modules/dnn/src/int8layers/layers_common.simd.hpp
        void cv::dnn::fastConv( ... );
        void cv::dnn::fastDepthwiseConv( ... );
        void cv::dnn::fastGEMM1T( ... );
  • Optimization of matrix affine transformations
        // modules/imgproc/src/imgwarp.rvp.cpp
        int cv::opt_RVP::warpAffineBlockline( ... );
  • Optimization of nearest neighbor interpolation for matrix scaling with pix_size 2 or 4
        // modules/imgproc/src/resize.rvp.cpp
        class cv::opt_RVP::resizeNNInvokerRVP4;
        class cv::opt_RVP::resizeNNInvokerRVP2;
  • Optimization of array accumulation with squares or element multiplication
        // modules/imgproc/src/accum.simd.hpp
        void accSqr_simd_( ... );
        void accProd_simd_( ... );
  • Optimization of integral for unsigned char arrays
        // modules/imgproc/src/sumpixels.simd.hpp
        template<> struct Integral_SIMD<uchar, int, double>;
  • Optimization of the FAST corner detection algorithm with patternSize 16
        // modules/features2d/src/fast.rvp.cpp
        class cv::opt_RVP::FAST_t_patternSize16_RVP;

Correctness validation (QEMU)

opencv_test_dnn_rvp Consistent with control (before adding RVP optimization)

opencv_test_imgproc_rvp Consistent with controls

opencv_test_features2d_rvp Consistent with controls

Q&A

Why RVP ?

As a lightweight extension, there is some potential for P extensions to be used in the embedded domain.

Why v0.5.2 ?

Although RVP is not frozen, Andes has massive plans based on version 0.5.2, just like T-Head and RVV071.

Why not Universal Intrinsics ?

RVP052 has no floating-point arithmetic and only supports parallel arithmetic up to 64 bits, which makes it less capable of implementing Universal Intrinsics, and thus most of its optimizations refer to existing function-specific optimizations.

How to perform tests ?

The correctness tests are as follows. (Due to hardware issues, performance test results are not available at this time)

Environment

export RISCV=/opt/andes
export OPENCV_TEST_DATA_PATH=**path_to_opencv_extra**/testdata

Toolchain

nds-gnu-toolchain

build_linux_toolchain.sh

TARGET=riscv64-linux
PREFIX=/opt/andes
ARCH=rv64imafdcxandes
ABI=lp64d
CPU=andes-25-series
XLEN=64
BUILD=`pwd`/build-nds64le-linux-glibc-v5d

Qemu

qemu

../configure --prefix=/opt/andes --target-list=riscv32-linux-user,riscv64-linux-user --disable-werror --static

Build

cmake -D CMAKE_BUILD_TYPE=Debug -D CMAKE_INSTALL_PREFIX=/opt/andes -D BUILD_SHARED_LIBS=OFF --toolchain ../platforms/linux/riscv64-andes-gcc.toolchain.cmake ..

Related Tests

dnn module test

qemu-riscv64 -cpu andes-ax25 -L /opt/riscv/sysroot opencv_test_dnn
# int8layers/layers_common.simd.hpp
# --gtest_filter=*Int8*
# --gtest_filter=*Conv*
# --gtest_filter=*Gemm*

imgproc module test

qemu-riscv64 -cpu andes-ax25 -L /opt/riscv/sysroot opencv_test_imgproc
# imgwarp.rvp.cpp
# --gtest_filter=*Affine*
#
# resize.rvp.cpp
# --gtest_filter=*Resize*
#
# sumpixels.simd.hpp
# --gtest_filter=*Integ*

features2d module test

qemu-riscv64 -cpu andes-ax25 -L /opt/riscv/sysroot opencv_test_features2d
# fast.rvp.cpp
# --gtest_filter=*FAST*
# --gtest_filter=*ORB*

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
  • The PR is proposed to the proper branch
  • There is a reference to the original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake

hanliutong, Junyan721113, asmorkalov, and dkurt reacted with hooray emoji
@asmorkalov
Contributor

cc @hanliutong @vpisarev

@asmorkalov
Contributor

@mshabunin Is it possible to add P extension to QEMU configuration on CI? It should help a lot.

@vpisarev self-requested a review November 20, 2023 10:37
@vpisarev
Contributor

vpisarev commented Nov 20, 2023

@Junyan721113, thank you for the contribution! This is a useful effort.

In the long term, however, it will be extremely difficult for our small team to maintain 1000 different branches of the same code. We do it, sometimes, for critical paths in critical modules, such as deep learning convolution etc., but for general-purpose functions using platform-specific intrinsics is too much. Please, consider implementing universal intrinsics backend instead: https://github.com/opencv/opencv/tree/4.x/modules/core/include/opencv2/core/hal.

In this case many hundreds of optimized loops in OpenCV can immediately make use of these instructions. Many other backends rely on 128-bit extensions, whereas P-extension is 64-bit, as far as I know. The solution could be to use a pair of registers to emulate 128-bit simd register.
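
The register-pair idea can be modelled in plain Python as a rough sketch. It is not RVP code: Python integers stand in for 64-bit general-purpose registers, and `add8_64` is a hypothetical helper that imitates a packed 8-bit add with a bit trick. On real hardware it is assumed that one packed-add instruction per half would take its place.

```python
MASK64 = (1 << 64) - 1
LOW7 = 0x7F7F7F7F7F7F7F7F   # low seven bits of every 8-bit lane
HIGH1 = 0x8080808080808080  # top bit of every 8-bit lane


def add8_64(a: int, b: int) -> int:
    """Wrapping add of eight 8-bit lanes packed into one 64-bit word."""
    # Adding only the low 7 bits of each lane cannot carry into the next lane;
    # the top bit of each lane is then patched in with XOR.
    low = (a & LOW7) + (b & LOW7)
    return (low ^ ((a ^ b) & HIGH1)) & MASK64


def add8_128(a, b):
    """Same operation on a (lo, hi) pair of words emulating a 128-bit register."""
    return (add8_64(a[0], b[0]), add8_64(a[1], b[1]))


def to_pair(lanes):
    """Pack sixteen 8-bit values into a (lo, hi) pair of 64-bit words."""
    lo = int.from_bytes(bytes(lanes[:8]), "little")
    hi = int.from_bytes(bytes(lanes[8:]), "little")
    return lo, hi


def from_pair(pair):
    """Unpack a (lo, hi) pair back into sixteen 8-bit values."""
    lo, hi = pair
    return list(lo.to_bytes(8, "little") + hi.to_bytes(8, "little"))


a = list(range(240, 256))
b = [20] * 16
print(from_pair(add8_128(to_pair(a), to_pair(b))))
```

Lane-wise operations like this split cleanly across the two halves; operations that move data between lanes would need extra work at the 64-bit boundary, which this sketch does not cover.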

@mshabunin
Contributor

I have several questions, concerns and suggestions.

Lower level or technical:

  • CPU check uses __nd__ prefix while other code uses __rv__v_ prefix
  • code uses nds_intrinsic.h header, but I have seen another variant, riscv-dsp.h, in the T-Head toolchain.
  • you claim that this is v0.5.2, but the P-extension revision history states that the __nds__ prefix has been replaced with __rv__ in v0.8
  • you used the -mext-dsp GCC option for enabling this extension, but it seems to be a toolchain-specific option because generic GCC doesn't have it. T-Head toolchain, for example, uses common ISA-string syntax: -mcpu=rv64gcp.

Higher level or more strategic questions and proposals:

  • As you wrote, the P-extension differs from RVV and thus cannot be easily implemented via the Universal Intrinsics mechanism, but there is another HAL mechanism for lower-level CPU optimizations which is used by the Carotene library on ARM platforms. I suggest moving all non-dnn code to a similar third-party component. For example, the FAST algorithm should allow such an optimization shortcut: see https://github.com/opencv/opencv/blob/4.x/modules/features2d/src/hal_replacement.hpp
    Reference documentation is here:
  • Is T-Head DSP implementation compatible with Andes? Is it possible to implement this optimization in a way compatible with both platforms?
  • P-extension documentation has v0.9.11 already, several incompatible changes have been added there since v0.5.2 and v0.8. For example, all intrinsics should now have __rv_ prefix instead of __rv__. Is it possible to distinguish between the extension revisions and either support multiple of them or only a single one? We already had similar problems with RVV and RVV intrinsics specifications: new spec comes out and our code becomes broken and now we have to support multiple revisions.
  • Is there any consumer-grade hardware available for purchase for real tests?
  • Do you know about any plans to add P-extension support to the mainline GCC and LLVM toolchains and the mainline QEMU? It is OK to use a custom toolchain for development for a specific device, but we try to use more generic approaches to optimizations.
opencv-alalek reacted with thumbs up emoji

@Junyan721113
Contributor Author

> @Junyan721113, thank you for the contribution! This is a useful effort.
>
> In the long term, however, it will be extremely difficult for our small team to maintain 1000 different branches of the same code. We do it, sometimes, for critical paths in critical modules, such as deep learning convolution etc., but for general-purpose functions using platform-specific intrinsics is too much. Please, consider implementing universal intrinsics backend instead: https://github.com/opencv/opencv/tree/4.x/modules/core/include/opencv2/core/hal.

Thank you for your guidance! Most of the current optimizations for P extensions are where other platform-specific optimizations already exist (such as int8layers/layers_common.simd.hpp). I would like to know exactly what parts of the code "critical paths in critical modules" refer to, so that P extensions can be optimized in other ways if Universal Intrinsics is not possible.

> In this case many hundreds of optimized loops in OpenCV can immediately make use of these instructions. Many other backends rely on 128-bit extensions, whereas P-extension is 64-bit, as far as I know. The solution could be to use a pair of registers to emulate 128-bit simd register.

However, I'm sorry to say that I'm currently having trouble implementing Universal Intrinsics with the P extension for the following reasons:

  1. P extensions do not have floating point instructions, thus making it difficult to implement the floating point vector part of Universal Intrinsics; moreover, P extensions do not have vector registers, limiting many optimization operations.
  2. Another solution is to fall back to a pure C++ implementation of Universal Intrinsics on floating-point vectors, but this may lead to negative optimizations, just as RVV generates redundant Load/Stores. (modules/core/include/opencv2/core/hal/intrin_rvv.hpp)

@Junyan721113
Contributor Author

> • CPU check uses __nd__ prefix while other code uses __rv__v_ prefix
> • you claim that this is v0.5.2, but P-extension revision history states that __nds__ prefix has been replaced with to __rv__ in v0.8

This is my fault. RVP v0.5.2 should use the __nds__ prefix rather than the __rv__ prefix.

> • code uses nds_intrinsic.h header, but I have seen other variant - riscv-dsp.h in the T-Head toolchain.
> • you used -mext-dsp GCC option for enabling this extension, but it seem to be toolchain-specific option because generic GCC doesn't have it. T-Head toolchain, for example, uses common ISA-string syntax: -mcpu=rv64gcp.

I'm sorry, but the Andes toolchain uses nds_intrinsic.h as its header, and the -mext-dsp option is documented in Andes DSP Library.

As a test outside of this PR, a 3rdparty component called ndsrvp has been created, containing one piece of the non-dnn code (integral_SIMD), and it works very well.
All the non-dnn code in this PR has been removed, so this PR can now focus on dnn optimizations.
This HAL mechanism is quite suitable for RVP optimizations; all the non-dnn code is expected to be moved into ndsrvp soon.

> • Is T-Head DSP implementation compatible with Andes? Is it possible to implement this optimization in a way compatible with both platforms?

The T-Head DSP implementation does not support the __nds__ prefix, and has different intrinsic function definitions using intXLEN_t and uintXLEN_t, so it is possibly incompatible. This PR is only intended to add optimizations based on RVP v0.5.2, which is Andes RVP.

> • P-extension documentation has v0.9.11 already, several incompatible changes have been added there since v0.5.2 and v0.8. For example, all intrinsics should now have __rv_ prefix instead of __rv__. Is it possible to distinguish between the extension revisions and either support multiple of them or only a single one? We already had similar problems with RVV and RVV intrinsics specifications: new spec comes out and our code becomes broken and now we have to support multiple revisions.

Supporting only v0.5.2 might be the best solution for this PR. RVP is renamed to RVP052 in order to distinguish RVP revisions.
Andes has plans for RVP052, just as T-Head has plans for RVV071.

> • Is there any consumer-grade harware available for purchase for real tests?

Communication has been made with Andes; a development board will soon be available for performance tests.

> • Do you know about any plans to add P-extension support to the mainline GCC and LLVM toolchains and the mainline QEMU? It is OK to use custom toolchain for development for specific device, but we try to use more generic approaches to optimizations.

I'm sorry, but currently I don't know about any plans related to Andes adding support to mainline.

@mshabunin
Contributor

mshabunin left a comment

I suggest simplifying CPU-feature part: instead of adding RVP052 as a separate CPU feature, let's use custom macro defined in cmake toolchain file, like it is done in platforms/linux/riscv64-071-gcc.toolchain.cmake.

Basically you have to revert all core modifications and add some macro definition to the riscv64-andes-gcc.toolchain.cmake (e.g. -D__riscv_andes_rvp052 or maybe there is one built into the compiler already?). Then use a plain #ifdef guard for optimized code sections.

Tricky part is dispatched fastConv, fastDepthwiseConv and fastGEMM - I suggest adding new files conv_depthwise.rvp052.cpp/.hpp with your implementation and include/call it if that macro is enabled.

Probably some additional cmake variable should be set in the toolchain file, so that dnn/CMakeLists.txt would know when to add new rvp052.cpp files to the build (or it can be just guarded by the same macro and added to the build unconditionally).

cc @opencv-alalek, what do you think?

@opencv-alalek
Contributor

CPU features uses common principles for detection / control / compilation / execution and diagnostic.
We could work without all of this, but it doesn't look like a reliable process.

> platforms/linux/riscv64-071-gcc.toolchain.cmake

Could we reuse generic RISC-V toolchains? (with appropriate CPU_BASELINE/CPU_DISPATCH CMake parameters)

@mshabunin
Contributor

> CPU features uses common principles for detection / control / compilation / execution and diagnostic.
> We could work without all of this, but it doesn't look like a reliable process.

Yes, in general I agree, but in this specific case (limited HW availability, specialized toolchain, non-ratified extension which is not available in generic toolchains) it looks more like RVV 0.7.1. Also there is no actual runtime check for this extension, so dispatched implementations do not make sense; in this PR dispatching was implemented only because of DNN module specifics (no hal::, no Universal Intrinsics, raw SIMD blocks, existing dispatching).

So, IMHO experimental less-invasive approach similar to early RVV 0.7.1 would fit better than generalized P-extension support. Later, when various implementations converge to some stable form and the extension is supported in the upstream, we will implement it as a full-fledged CPU feature.

@Junyan721113
Contributor Author

Junyan721113 commented Dec 12, 2023

> Tricky part is dispatched fastConv, fastDepthwiseConv and fastGEMM - I suggest adding new files conv_depthwise.rvp052.cpp/.hpp with your implementation and include/call it if that macro is enabled.

Files with the .rvp052.cpp suffix could trigger the CMake CPU dispatch filter, resulting in "Excluding from source files list: modules/dnn/src/int8layers/conv_depthwise.rvp052.cpp", so conv_depthwise.dispatch.cpp may be a better solution.

As for macros, there are 2 macros called __ANDES and __riscv_dsp that fill the need.

Meanwhile, I wonder if it is acceptable to implement all 3 convolution functions inside one conv_depthwise.dispatch.cpp file (maybe renaming it to layers_common.dispatch.cpp is better?), rather than putting them in 3 .cpp files.

In total, is the following code acceptable?

// modules/core/include/opencv2/core/cv_cpu_dispatch.h
#if defined(__riscv) && defined(__riscv_dsp) && defined(__ANDES)
#include <nds_intrinsic.h>
#define CV_RVP052 1
#endif

// modules/dnn/src/int8layers/layers_common.simd.hpp
#include "layers_common.dispatch.hpp"

// modules/dnn/src/int8layers/layers_common.dispatch.cpp
namespace cv {
namespace dnn {
namespace opt_RVP052 {
#if CV_RVP052
// RVP Optimizations

// modules/dnn/src/int8layers/convolution_layer.cpp
#if CV_RVP052
if(isConv2D)
    opt_RVP052::fastDepthwiseConv(wptr, kernel_h, kernel_w,
            stride_h, stride_w, dilation_h, dilation_w, pad_t, pad_l,
            biasptr, multptr, inptr_, height, width, outptr_, out_d, outH, outW, inpZp, outZp);
else

Contributor

@mshabunin Dec 20, 2023

I suggest renaming files to something like layers_rvp052.cpp/.hpp to avoid confusion with .dispatch files in other modules because they usually serve a different purpose.

Disable the whole .cpp body if the macro is not defined or is false, and include the .hpp file into layers_common.hpp with the same macro condition.

Junyan721113 reacted with thumbs up emoji
Contributor Author

Done

else
#endif
#if CV_RVP052
if(useRVP052)
Contributor

useRVP052 is always the same as CV_RVP052 and does not have an external interface, so I suggest removing the boolean flag completely. Here and in other files.

Junyan721113 reacted with thumbs up emoji
Contributor Author

In fully_connected_layer.cpp this is absolutely right. But in convolution_layer.cpp, useRVP052 is not always the same as CV_RVP052, because line 769, p.useRVP052 = CV_RVP052 && isConv2D;, introduces a small difference.
So changing this boolean flag into isConv2D might be better.

mshabunin reacted with thumbs up emoji
Contributor

I suggest moving these changes to the dnn module, maybe to int8layers/layers_common.hpp?

Junyan721113 reacted with thumbs up emoji
Contributor Author

In layers_rvp052.cpp, including layers_common.hpp to get CV_RVP052 could cause a HAVE_OPENCL malfunction as follows:

In file included from /home/junyan/opencv_rvp/modules/dnn/src/int8layers/./layers_common.hpp:17,
                 from /home/junyan/opencv_rvp/modules/dnn/src/int8layers/layers_rvp052.cpp:5:
/home/junyan/opencv_rvp/modules/dnn/src/int8layers/./../ocl4dnn/include/ocl4dnn.hpp:196:9: error: 'ocl' does not name a type; did you mean 'ogl'?
  196 |         ocl::Program compileKernel();
      |         ^~~
      |         ogl

So maybe moving them into layers_rvp052.hpp is better.

mshabunin reacted with thumbs up emoji
Contributor

Modifications in this file will not be necessary.

Junyan721113 reacted with thumbs up emoji
Contributor Author

Done

@asmorkalov added this to the 4.9.0 milestone Dec 20, 2023
@asmorkalov merged commit 99c86bb into opencv:4.x Jan 16, 2024
@asmorkalov mentioned this pull request Jan 23, 2024
@Junyan721113
Contributor Author

Development boards for the accuracy and performance tests have been set up; results will come out soon.

@Junyan721113
Contributor Author

Junyan721113 commented Mar 2, 2024

Here are the accuracy and performance test results!

TL;DR: EfficientDet_int8 in opencv_perf_dnn gained a 1.95x performance boost.

The 3 functions optimized by RVP only appeared in the following tests:

./opencv_test_dnn --gtest_filter=*EfficientDet_int8*:*Quant*:*Int8* --gtest_output=xml
./opencv_perf_dnn --gtest_filter=*EfficientDet_int8* --gtest_output=xml

Meanwhile, Test_Int8_nets.CaffeNet and Test_Int8_nets.RCNN_ILSVRC13 took up too much memory to be run on the board.

So the final filter is:

./opencv_test_dnn --gtest_filter=*EfficientDet_int8*:*Quant*:*Int8*--*CaffeNet*:*RCNN_ILSVRC13* --gtest_output=xml
./opencv_perf_dnn --gtest_filter=*EfficientDet_int8* --gtest_output=xml

opencv_perf_dnn summary

> python .\misc\summary.py .\opencv_bin_blank\opencv_perf_dnn.xml .\opencv_bin_rvp\opencv_perf_dnn.xml

Geometric mean (ms)

               Name of Test                 opencv    opencv     opencv
                                             perf      perf       perf
                                              dnn       dnn       dnn
                                                                   vs
                                                                 opencv
                                                                  perf
                                                                  dnn
                                                               (x-factor)
EfficientDet_int8::DNNTestNetwork::OCV/CPU 42451.011 21728.436    1.95
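
For reference, the x-factor column can be reproduced by hand. The sketch below assumes the x-factor is simply the control geometric mean divided by the optimized one, which matches the figures in the summary; `x_factor` is a hypothetical helper, not part of summary.py.

```python
# Geometric means in milliseconds, copied from the summary above.
control_ms = 42451.011    # opencv_bin_blank (before RVP optimization)
optimized_ms = 21728.436  # opencv_bin_rvp (with RVP optimization)


def x_factor(baseline_ms: float, candidate_ms: float) -> float:
    """Speedup of the candidate over the baseline (higher is better)."""
    return baseline_ms / candidate_ms


speedup = x_factor(control_ms, optimized_ms)
print(f"{speedup:.2f}x")
```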

opencv_perf_dnn optimized

<?xml version="1.0" encoding="UTF-8"?>
<testsuites tests="1" failures="0" disabled="0" errors="0" timestamp="2020-09-22T16:39:16" time="241.631" cv_module_name="dnn" cv_implementation="plain" cv_num_threads="-1" test_tags="" test_tags_skip="mem_6gb,verylong,debug_verylong" test_tags_force="" cv_version="4.9.0-dev" cv_version_build="4.9.0-dev" cv_vcs_version="4.9.0-216-g09c6961694-dirty" cv_build_type="Debug" cv_build_type_build="Debug" cv_compiler="/home/junyan/opt/andes/bin/riscv64-linux-g++  (ver 10.3.0)" cv_parallel_framework="pthreads" cv_parallel_threads="1" cv_cpu_features="" cv_ocl="disabled" name="AllTests">
  <testsuite name="DNNTestNetwork" tests="1" failures="0" disabled="0" errors="0" time="241.625">
    <testcase name="EfficientDet_int8/0" value_param="OCV/CPU" status="run" time="241.623" classname="DNNTestNetwork">
      <properties>
        <property name="ocl_memory_usage" value="0"/>
        <property name="bytesIn" value="0"/>
        <property name="bytesOut" value="0"/>
        <property name="term" value="1"/>
        <property name="samples" value="10"/>
        <property name="outliers" value="0"/>
        <property name="frequency" value="1000000000"/>
        <property name="min" value="21683049745"/>
        <property name="median" value="21712831994"/>
        <property name="gmean" value="21728435820"/>
        <property name="gstddev" value="0.002588"/>
        <property name="mean" value="21728501353"/>
        <property name="stddev" value="56310680"/>
      </properties>
    </testcase>
  </testsuite>
</testsuites>

opencv_perf_dnn control

<?xml version="1.0" encoding="UTF-8"?>
<testsuites tests="1" failures="0" disabled="0" errors="0" timestamp="2020-09-22T15:10:47" time="469.468" cv_module_name="dnn" cv_implementation="plain" cv_num_threads="-1" test_tags="" test_tags_skip="mem_6gb,verylong,debug_verylong" test_tags_force="" cv_version="4.9.0-dev" cv_version_build="4.9.0-dev" cv_vcs_version="4.9.0-212-g0e44f3a544-dirty" cv_build_type="Debug" cv_build_type_build="Debug" cv_compiler="/home/junyan/opt/andes/bin/riscv64-linux-g++  (ver 10.3.0)" cv_parallel_framework="pthreads" cv_parallel_threads="1" cv_cpu_features="" cv_ocl="disabled" name="AllTests">
  <testsuite name="DNNTestNetwork" tests="1" failures="0" disabled="0" errors="0" time="469.462">
    <testcase name="EfficientDet_int8/0" value_param="OCV/CPU" status="run" time="469.46" classname="DNNTestNetwork">
      <properties>
        <property name="ocl_memory_usage" value="0"/>
        <property name="bytesIn" value="0"/>
        <property name="bytesOut" value="0"/>
        <property name="term" value="1"/>
        <property name="samples" value="10"/>
        <property name="outliers" value="0"/>
        <property name="frequency" value="1000000000"/>
        <property name="min" value="42387522406"/>
        <property name="median" value="42406298532"/>
        <property name="gmean" value="42451010572"/>
        <property name="gstddev" value="0.001703"/>
        <property name="mean" value="42451066023"/>
        <property name="stddev" value="72351781"/>
      </properties>
    </testcase>
  </testsuite>
</testsuites>
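
The raw `gmean` properties in the two reports can be tied back to the millisecond figures in the summary. The sketch below assumes that `gmean` is a tick count and `frequency` is ticks per second, which is consistent with the summary values. The inline XML is a trimmed-down stand-in holding only those two properties, and `gmean_ms` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Minimal stand-in for the reports: only the two properties used here,
# with gmean values copied from the opencv_perf_dnn XML files.
REPORT_TEMPLATE = """<testsuites><testsuite name="DNNTestNetwork">
<testcase name="EfficientDet_int8/0"><properties>
<property name="frequency" value="1000000000"/>
<property name="gmean" value="{gmean}"/>
</properties></testcase></testsuite></testsuites>"""

OPTIMIZED_XML = REPORT_TEMPLATE.format(gmean=21728435820)
CONTROL_XML = REPORT_TEMPLATE.format(gmean=42451010572)


def gmean_ms(report_xml: str, test_name: str) -> float:
    """Return the geometric-mean run time of one test case in milliseconds."""
    root = ET.fromstring(report_xml)
    for case in root.iter("testcase"):
        if case.attrib["name"] != test_name:
            continue
        props = {p.attrib["name"]: p.attrib["value"] for p in case.iter("property")}
        ticks = int(props["gmean"])
        ticks_per_second = int(props["frequency"])  # assumed unit: ticks per second
        return ticks / ticks_per_second * 1000.0
    raise KeyError(test_name)


optimized = gmean_ms(OPTIMIZED_XML, "EfficientDet_int8/0")
control = gmean_ms(CONTROL_XML, "EfficientDet_int8/0")
print(f"control {control:.3f} ms, optimized {optimized:.3f} ms")
```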

opencv_test_dnn summary
[Image: opencv_test_dnn]
Testcases shorter than 1s are not shown above.

How the graph came out:

import xml.etree.ElementTree as ET
import matplotlib.pyplot as plt

# Read the XML files and extract the run times
rvp_file = 'opencv_bin_rvp/opencv_test_dnn.xml'
blank_file = 'opencv_bin_blank/opencv_test_dnn.xml'

# Parse the XML files
rvp_data = ET.parse(rvp_file).getroot()
blank_data = ET.parse(blank_file).getroot()
print(rvp_data.tag, rvp_data.attrib)

# keyw = 'mean'
keyw = 'time'

# Collect names and times together so that both lists stay aligned
test_names = []
rvp_means = []
for testsuite in rvp_data.iter(tag='testsuite'):
    test_names.append('Total: ' + testsuite.attrib['name'])
    rvp_means.append(float(testsuite.attrib['time']))
    for testcase in testsuite.iter(tag='testcase'):
        if keyw not in testcase.attrib:
            continue
        test_names.append(testcase.attrib['name'])
        rvp_means.append(float(testcase.attrib[keyw]))

blank_means = []
for testsuite in blank_data.iter(tag='testsuite'):
    blank_means.append(float(testsuite.attrib['time']))
    for testcase in testsuite.iter(tag='testcase'):
        if keyw not in testcase.attrib:
            continue
        blank_means.append(float(testcase.attrib[keyw]))

print(rvp_means)
print(blank_means)

# Remove trivial cases (shorter than 1 s); filter names with the same
# indices so that every bar keeps its own label
keep = [i for i in range(len(rvp_means)) if rvp_means[i] >= 1.0]
ratio = [blank_means[i] / rvp_means[i] for i in keep]
test_names = [test_names[i] for i in keep]

# Plot the bar chart
fig, ax = plt.subplots()
ax.bar(range(len(ratio)), ratio, color='b')
ax.set_xlabel('Test case')
ax.set_ylabel('Speedup')
ax.set_title('Speedup of RVP over blank')
ax.set_xticks(range(len(ratio)))
ax.set_xticklabels(test_names, rotation=90)
ax.set_yticks(range(0, 6, 1))
ax.set_yticklabels([f'{i}x' for i in range(0, 6, 1)])
ax.axhline(y=1, color='r', linestyle='--')
ax.grid(True, axis='y')

# margin the plot
plt.tight_layout()

# Save the plot
# plt.savefig('speedup.png')

# Show the plot
plt.show()

opencv_test_dnn optimized

<?xml version="1.0" encoding="UTF-8"?>
<testsuites tests="86" failures="0" disabled="2" errors="0" timestamp="2020-09-22T13:21:45" time="2233.4" cv_version="4.9.0-dev" cv_version_build="4.9.0-dev" cv_vcs_version="4.9.0-216-g09c6961694-dirty" cv_build_type="Debug" cv_build_type_build="Debug" cv_compiler="/home/junyan/opt/andes/bin/riscv64-linux-g++  (ver 10.3.0)" cv_parallel_framework="pthreads" cv_parallel_threads="1" cv_cpu_features="" cv_ocl="disabled" test_tags="" test_tags_skip="mem_6gb,verylong,debug_verylong,dnn_skip_opencv_backend,dnn_skip_cpu,dnn_skip_cpu_fp16,dnn_skip_ocl,dnn_skip_ocl_fp16,dnn_skip_onnx_conformance,dnn_skip_parser" test_tags_force="" name="AllTests">
  <testsuite name="Test_Int8_layers" tests="40" failures="0" disabled="2" errors="0" time="6.545">
    <testcase name="Convolution1D/0" value_param="OCV/CPU" status="run" time="0.091" classname="Test_Int8_layers"><properties><property name="ocl_memory_usage" value="0"/></properties></testcase>
    <testcase name="Convolution2D/0" value_param="OCV/CPU" status="run" time="0.856" classname="Test_Int8_layers"><properties><property name="ocl_memory_usage" value="0"/></properties></testcase>
    <testcase name="Convolution3D/0" value_param="OCV/CPU" status="run" time="0.074" classname="Test_Int8_layers"><properties><property name="ocl_memory_usage" value="0"/></properties></testcase>
    <testcase name="Flatten/0" value_param="OCV/CPU" status="run" time="0.114" classname="Test_Int8_layers"><properties><property name="ocl_memory_usage" value="0"/></properties></testcase>
    <testcase name="Padding/0" value_param="OCV/CPU" status="run" time="0.266" classname="Test_Int8_layers"><properties><property name="ocl_memory_usage" value="0"/></properties></testcase>
    <testcase name="AvePooling/0" value_param="OCV/CPU" status="run" time="0.348" classname="Test_Int8_layers"><properties><property name="ocl_memory_usage" value="0"/></properties></testcase>
    <testcase name="MaxPooling/0" value_param="OCV/CPU" status="run" time="0.445" classname="Test_Int8_layers"><properties><property name="ocl_memory_usage" value="0"/></properties></testcase>
    <testcase name="Reduce/0" value_param="OCV/CPU" status="run" time="0.24" classname="Test_Int8_layers"><properties><property name="ocl_memory_usage" value="0"/></properties></testcase>
    <testcase name="ReLU/0" value_param="OCV/CPU" status="run" time="0.126" classname="Test_Int8_layers"><properties><property name="ocl_memory_usage" value="0"/></properties></testcase>
    <testcase name="LeakyReLU/0" value_param="OCV/CPU" status="run" time="0.015" classname="Test_Int8_layers"><properties><property name="ocl_memory_usage" value="0"/></properties></testcase>
    <testcase name="ReLU6/0" value_param="OCV/CPU" status="run" time="0.065" classname="Test_Int8_layers"><properties><property name="ocl_memory_usage" value="0"/></properties></testcase>
    <testcase name="Sigmoid/0" value_param="OCV/CPU" status="run" time="0.026" classname="Test_Int8_layers"><properties><property name="ocl_memory_usage" value="0"/></properties></testcase>
    <testcase name="Sigmoid_dynamic_axes/0" value_param="OCV/CPU" status="run" time="0.037" classname="Test_Int8_layers"><properties><property name="ocl_memory_usage" value="0"/></properties></testcase>
    <testcase name="Sigmoid_1d/0" value_param="OCV/CPU" status="run" time="0.045" classname="Test_Int8_layers"><properties><property name="ocl_memory_usage" value="0"/></properties></testcase>
    <testcase name="Mish/0" value_param="OCV/CPU" status="run" time="0.035" classname="Test_Int8_layers"><properties><property name="ocl_memory_usage" value="0"/></properties></testcase>
    <testcase name="Softmax_Caffe/0" value_param="OCV/CPU" status="run" time="0.174" classname="Test_Int8_layers"><properties><property name="ocl_memory_usage" value="0"/></properties></testcase>
    <testcase name="Softmax_keras_TF/0" value_param="OCV/CPU" status="run" time="0.02" classname="Test_Int8_layers"><properties><property name="ocl_memory_usage" value="0"/></properties></testcase>
    <testcase name="Softmax_slim_TF/0" value_param="OCV/CPU" status="run" time="0.026" classname="Test_Int8_layers"><properties><property name="ocl_memory_usage" value="0"/></properties></testcase>
    <testcase name="Softmax_slim_v2_TF/0" value_param="OCV/CPU" status="run" time="0.036" classname="Test_Int8_layers"><properties><property name="ocl_memory_usage" value="0"/></properties></testcase>
    <testcase name="Softmax_ONNX/0" value_param="OCV/CPU" status="run" time="0.021" classname="Test_Int8_layers"><properties><property name="ocl_memory_usage" value="0"/></properties></testcase>
    <testcase name="Softmax_log_ONNX/0" value_param="OCV/CPU" status="run" time="0.02" classname="Test_Int8_layers"><properties><property name="ocl_memory_usage" value="0"/></properties></testcase>
    <testcase name="DISABLED_Softmax_unfused_ONNX/0" value_param="OCV/CPU" status="notrun" time="0" classname="Test_Int8_layers" />
    <testcase name="Concat/0" value_param="OCV/CPU" status="run" time="0.22" classname="Test_Int8_layers"><properties><property name="ocl_memory_usage" value="0"/></properties></testcase>
    <testcase name="BatchNorm/0" value_param="OCV/CPU" status="run" time="0.411" classname="Test_Int8_layers"><properties><property name="ocl_memory_usage" value="0"/></properties></testcase>
    <testcase name="Scale/0" value_param="OCV/CPU" status="run" time="0.143" classname="Test_Int8_layers"><properties><property name="ocl_memory_usage" value="0"/></properties></testcase>
    <testcase name="InnerProduct/0" value_param="OCV/CPU" status="run" time="1.244" classname="Test_Int8_layers"><properties><property name="ocl_memory_usage" value="0"/></properties></testcase>
<testcasename="Reshape/0"value_param="OCV/CPU"status="run"time="0.412"classname="Test_Int8_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="Permute/0"value_param="OCV/CPU"status="run"time="0.045"classname="Test_Int8_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="Identity/0"value_param="OCV/CPU"status="run"time="0.077"classname="Test_Int8_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="Slice_split_tf/0"value_param="OCV/CPU"status="run"time="0.02"classname="Test_Int8_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="Slice_4d_tf/0"value_param="OCV/CPU"status="run"time="0.022"classname="Test_Int8_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="Slice_strided_tf/0"value_param="OCV/CPU"status="run"time="0.024"classname="Test_Int8_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="DISABLED_Slice_onnx/0"value_param="OCV/CPU"status="notrun"time="0"classname="Test_Int8_layers" />    <testcasename="Slice_dynamic_axes_onnx/0"value_param="OCV/CPU"status="run"time="0.026"classname="Test_Int8_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="Slice_steps_2d_onnx11/0"value_param="OCV/CPU"status="run"time="0.042"classname="Test_Int8_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="Slice_steps_3d_onnx11/0"value_param="OCV/CPU"status="run"time="0.053"classname="Test_Int8_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    
<testcasename="Slice_steps_4d_onnx11/0"value_param="OCV/CPU"status="run"time="0.041"classname="Test_Int8_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="Slice_steps_5d_onnx11/0"value_param="OCV/CPU"status="run"time="0.042"classname="Test_Int8_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="Dropout/0"value_param="OCV/CPU"status="run"time="0.143"classname="Test_Int8_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="Eltwise/0"value_param="OCV/CPU"status="run"time="0.433"classname="Test_Int8_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>  </testsuite>  <testsuitename="Test_Int8_nets"tests="24"failures="0"disabled="0"errors="0"time="2172.39">    <testcasename="AlexNet/0"value_param="OCV/CPU"status="run"time="81.558"classname="Test_Int8_nets"><properties><propertyname="ocl_memory_usage"value="0"/><propertyname="tags"value="mem_512mb"/><propertyname="tags_implied"value=""/></properties>    </testcase>    <testcasename="GoogLeNet/0"value_param="OCV/CPU"status="run"time="237.368"classname="Test_Int8_nets"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="ResNet50/0"value_param="OCV/CPU"status="run"time="0.065"classname="Test_Int8_nets"><properties><propertyname="tags"value="mem_512mb,debug_verylong"/><propertyname="tags_implied"value="debug_long"/></properties>    </testcase>    <testcasename="DenseNet121/0"value_param="OCV/CPU"status="run"time="215.475"classname="Test_Int8_nets"><properties><propertyname="ocl_memory_usage"value="0"/><propertyname="tags"value="mem_512mb"/><propertyname="tags_implied"value=""/></properties>    </testcase>    
<testcasename="SqueezeNet_v1_1/0"value_param="OCV/CPU"status="run"time="30.185"classname="Test_Int8_nets"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="Inception_v2/0"value_param="OCV/CPU"status="run"time="168.073"classname="Test_Int8_nets"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="MobileNet_v2/0"value_param="OCV/CPU"status="run"time="38.848"classname="Test_Int8_nets"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="Shufflenet/0"value_param="OCV/CPU"status="run"time="15.797"classname="Test_Int8_nets"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="MobileNet_SSD/0"value_param="OCV/CPU"status="run"time="89.717"classname="Test_Int8_nets"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="MobileNet_v1_SSD/0"value_param="OCV/CPU"status="run"time="99.273"classname="Test_Int8_nets"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="MobileNet_v1_SSD_PPN/0"value_param="OCV/CPU"status="run"time="92.041"classname="Test_Int8_nets"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="Inception_v2_SSD/0"value_param="OCV/CPU"status="run"time="368.112"classname="Test_Int8_nets"><properties><propertyname="ocl_memory_usage"value="0"/><propertyname="tags"value="mem_512mb"/><propertyname="tags_implied"value=""/></properties>    </testcase>    <testcasename="opencv_face_detector/0"value_param="OCV/CPU"status="run"time="238.913"classname="Test_Int8_nets"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    
<testcasename="EfficientDet/0"value_param="OCV/CPU"status="run"time="0.002"classname="Test_Int8_nets"><properties><propertyname="tags"value="debug_verylong"/><propertyname="tags_implied"value="debug_long"/></properties>    </testcase>    <testcasename="FasterRCNN_resnet50/0"value_param="OCV/CPU"status="run"time="0.001"classname="Test_Int8_nets"><properties><propertyname="tags"value="mem_1gb,long,debug_verylong"/><propertyname="tags_implied"value="mem_512mb,debug_long"/></properties>    </testcase>    <testcasename="FasterRCNN_inceptionv2/0"value_param="OCV/CPU"status="run"time="0.001"classname="Test_Int8_nets"><properties><propertyname="tags"value="mem_1gb,long,debug_verylong"/><propertyname="tags_implied"value="mem_512mb,debug_long"/></properties>    </testcase>    <testcasename="FasterRCNN_vgg16/0"value_param="OCV/CPU"status="run"time="0.001"classname="Test_Int8_nets"><properties><propertyname="tags"value="mem_2gb,long,debug_verylong"/><propertyname="tags_implied"value="mem_1gb,mem_512mb,debug_long"/></properties>    </testcase>    <testcasename="FasterRCNN_zf/0"value_param="OCV/CPU"status="run"time="0.001"classname="Test_Int8_nets"><properties><propertyname="tags"value="mem_512mb,debug_verylong"/><propertyname="tags_implied"value="debug_long"/></properties>    </testcase>    <testcasename="RFCN/0"value_param="OCV/CPU"status="run"time="0.001"classname="Test_Int8_nets"><properties><propertyname="tags"value="mem_512mb,long,debug_verylong"/><propertyname="tags_implied"value="debug_long"/></properties>    </testcase>    <testcasename="YoloVoc/0"value_param="OCV/CPU"status="run"time="0.001"classname="Test_Int8_nets"><properties><propertyname="tags"value="mem_1gb,long,debug_verylong"/><propertyname="tags_implied"value="mem_512mb,debug_long"/></properties>    </testcase>    
<testcasename="TinyYoloVoc/0"value_param="OCV/CPU"status="run"time="0.001"classname="Test_Int8_nets"><properties><propertyname="tags"value="mem_512mb,debug_verylong"/><propertyname="tags_implied"value="debug_long"/></properties>    </testcase>    <testcasename="YOLOv3/0"value_param="OCV/CPU"status="run"time="0"classname="Test_Int8_nets"><properties><propertyname="tags"value="long,mem_1gb,debug_verylong"/><propertyname="tags_implied"value="debug_long,mem_512mb"/></properties>    </testcase>    <testcasename="YOLOv4/0"value_param="OCV/CPU"status="run"time="0.001"classname="Test_Int8_nets"><properties><propertyname="tags"value="long,mem_1gb,debug_verylong"/><propertyname="tags_implied"value="debug_long,mem_512mb"/></properties>    </testcase>    <testcasename="YOLOv4_tiny/0"value_param="OCV/CPU"status="run"time="496.879"classname="Test_Int8_nets"><properties><propertyname="tags"value="mem_512mb"/><propertyname="tags_implied"value=""/></properties>    </testcase>  </testsuite>  <testsuitename="Test_ONNX_layers"tests="20"failures="0"disabled="0"errors="0"time="1.548">    <testcasename="Quantized_Convolution/0"value_param="OCV/CPU"status="run"time="0.391"classname="Test_ONNX_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="Quantized_MatMul/0"value_param="OCV/CPU"status="run"time="0.133"classname="Test_ONNX_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="Quantized_Gemm/0"value_param="OCV/CPU"status="run"time="0.039"classname="Test_ONNX_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="Quantized_MatMul_Variable_Weights/0"value_param="OCV/CPU"status="run"time="0.09"classname="Test_ONNX_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    
<testcasename="Quantized_Eltwise/0"value_param="OCV/CPU"status="run"time="0.051"classname="Test_ONNX_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="Quantized_Eltwise_Scalar/0"value_param="OCV/CPU"status="run"time="0.041"classname="Test_ONNX_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="Quantized_Eltwise_Broadcast/0"value_param="OCV/CPU"status="run"time="0.042"classname="Test_ONNX_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="Quantized_LeakyReLU/0"value_param="OCV/CPU"status="run"time="0.035"classname="Test_ONNX_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="Quantized_Sigmoid/0"value_param="OCV/CPU"status="run"time="0.034"classname="Test_ONNX_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="Quantized_MaxPool/0"value_param="OCV/CPU"status="run"time="0.036"classname="Test_ONNX_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="Quantized_AvgPool/0"value_param="OCV/CPU"status="run"time="0.037"classname="Test_ONNX_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="Quantized_Split/0"value_param="OCV/CPU"status="run"time="0.045"classname="Test_ONNX_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="Quantized_Pad/0"value_param="OCV/CPU"status="run"time="0.045"classname="Test_ONNX_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="Quantized_Reshape/0"value_param="OCV/CPU"status="run"time="0.036"classname="Test_ONNX_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    
<testcasename="Quantized_Transpose/0"value_param="OCV/CPU"status="run"time="0.035"classname="Test_ONNX_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="Quantized_Squeeze/0"value_param="OCV/CPU"status="run"time="0.035"classname="Test_ONNX_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="Quantized_Unsqueeze/0"value_param="OCV/CPU"status="run"time="0.035"classname="Test_ONNX_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="Quantized_Resize/0"value_param="OCV/CPU"status="run"time="0.112"classname="Test_ONNX_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="Quantized_Concat/0"value_param="OCV/CPU"status="run"time="0.081"classname="Test_ONNX_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="Quantized_Constant/0"value_param="OCV/CPU"status="run"time="0.159"classname="Test_ONNX_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>  </testsuite>  <testsuitename="Test_ONNX_nets"tests="1"failures="0"disabled="0"errors="0"time="28.347">    <testcasename="ResNet50_Int8/0"value_param="OCV/CPU"status="run"time="28.345"classname="Test_ONNX_nets"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>  </testsuite>  <testsuitename="Test_TFLite"tests="1"failures="0"disabled="0"errors="0"time="24.551">    <testcasename="EfficientDet_int8/0"value_param="OCV/CPU"status="run"time="24.55"classname="Test_TFLite"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>  </testsuite></testsuites>

opencv_test_dnn control

<?xml version="1.0" encoding="UTF-8"?><testsuitestests="86"failures="0"disabled="2"errors="0"timestamp="2020-09-22T13:59:53"time="2899.68"cv_version="4.9.0-dev"cv_version_build="4.9.0-dev"cv_vcs_version="4.9.0-212-g0e44f3a544-dirty"cv_build_type="Debug"cv_build_type_build="Debug"cv_compiler="/home/junyan/opt/andes/bin/riscv64-linux-g++  (ver 10.3.0)"cv_parallel_framework="pthreads"cv_parallel_threads="1"cv_cpu_features=""cv_ocl="disabled"test_tags=""test_tags_skip="mem_6gb,verylong,debug_verylong,dnn_skip_opencv_backend,dnn_skip_cpu,dnn_skip_cpu_fp16,dnn_skip_ocl,dnn_skip_ocl_fp16,dnn_skip_onnx_conformance,dnn_skip_parser"test_tags_force=""name="AllTests">  <testsuitename="Test_Int8_layers"tests="40"failures="0"disabled="2"errors="0"time="6.676">    <testcasename="Convolution1D/0"value_param="OCV/CPU"status="run"time="0.108"classname="Test_Int8_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="Convolution2D/0"value_param="OCV/CPU"status="run"time="0.902"classname="Test_Int8_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="Convolution3D/0"value_param="OCV/CPU"status="run"time="0.073"classname="Test_Int8_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="Flatten/0"value_param="OCV/CPU"status="run"time="0.114"classname="Test_Int8_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="Padding/0"value_param="OCV/CPU"status="run"time="0.267"classname="Test_Int8_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="AvePooling/0"value_param="OCV/CPU"status="run"time="0.292"classname="Test_Int8_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    
<testcasename="MaxPooling/0"value_param="OCV/CPU"status="run"time="0.489"classname="Test_Int8_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="Reduce/0"value_param="OCV/CPU"status="run"time="0.238"classname="Test_Int8_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="ReLU/0"value_param="OCV/CPU"status="run"time="0.127"classname="Test_Int8_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="LeakyReLU/0"value_param="OCV/CPU"status="run"time="0.014"classname="Test_Int8_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="ReLU6/0"value_param="OCV/CPU"status="run"time="0.064"classname="Test_Int8_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="Sigmoid/0"value_param="OCV/CPU"status="run"time="0.025"classname="Test_Int8_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="Sigmoid_dynamic_axes/0"value_param="OCV/CPU"status="run"time="0.027"classname="Test_Int8_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="Sigmoid_1d/0"value_param="OCV/CPU"status="run"time="0.024"classname="Test_Int8_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="Mish/0"value_param="OCV/CPU"status="run"time="0.023"classname="Test_Int8_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="Softmax_Caffe/0"value_param="OCV/CPU"status="run"time="0.213"classname="Test_Int8_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    
<testcasename="Softmax_keras_TF/0"value_param="OCV/CPU"status="run"time="0.032"classname="Test_Int8_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="Softmax_slim_TF/0"value_param="OCV/CPU"status="run"time="0.025"classname="Test_Int8_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="Softmax_slim_v2_TF/0"value_param="OCV/CPU"status="run"time="0.034"classname="Test_Int8_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="Softmax_ONNX/0"value_param="OCV/CPU"status="run"time="0.021"classname="Test_Int8_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="Softmax_log_ONNX/0"value_param="OCV/CPU"status="run"time="0.021"classname="Test_Int8_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="DISABLED_Softmax_unfused_ONNX/0"value_param="OCV/CPU"status="notrun"time="0"classname="Test_Int8_layers" />    <testcasename="Concat/0"value_param="OCV/CPU"status="run"time="0.231"classname="Test_Int8_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="BatchNorm/0"value_param="OCV/CPU"status="run"time="0.409"classname="Test_Int8_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="Scale/0"value_param="OCV/CPU"status="run"time="0.095"classname="Test_Int8_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="InnerProduct/0"value_param="OCV/CPU"status="run"time="1.34"classname="Test_Int8_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    
<testcasename="Reshape/0"value_param="OCV/CPU"status="run"time="0.41"classname="Test_Int8_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="Permute/0"value_param="OCV/CPU"status="run"time="0.045"classname="Test_Int8_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="Identity/0"value_param="OCV/CPU"status="run"time="0.078"classname="Test_Int8_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="Slice_split_tf/0"value_param="OCV/CPU"status="run"time="0.02"classname="Test_Int8_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="Slice_4d_tf/0"value_param="OCV/CPU"status="run"time="0.022"classname="Test_Int8_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="Slice_strided_tf/0"value_param="OCV/CPU"status="run"time="0.024"classname="Test_Int8_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="DISABLED_Slice_onnx/0"value_param="OCV/CPU"status="notrun"time="0"classname="Test_Int8_layers" />    <testcasename="Slice_dynamic_axes_onnx/0"value_param="OCV/CPU"status="run"time="0.026"classname="Test_Int8_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="Slice_steps_2d_onnx11/0"value_param="OCV/CPU"status="run"time="0.042"classname="Test_Int8_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="Slice_steps_3d_onnx11/0"value_param="OCV/CPU"status="run"time="0.057"classname="Test_Int8_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    
<testcasename="Slice_steps_4d_onnx11/0"value_param="OCV/CPU"status="run"time="0.044"classname="Test_Int8_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="Slice_steps_5d_onnx11/0"value_param="OCV/CPU"status="run"time="0.043"classname="Test_Int8_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="Dropout/0"value_param="OCV/CPU"status="run"time="0.116"classname="Test_Int8_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="Eltwise/0"value_param="OCV/CPU"status="run"time="0.476"classname="Test_Int8_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>  </testsuite>  <testsuitename="Test_Int8_nets"tests="24"failures="0"disabled="0"errors="0"time="2740.59">    <testcasename="AlexNet/0"value_param="OCV/CPU"status="run"time="97.623"classname="Test_Int8_nets"><properties><propertyname="ocl_memory_usage"value="0"/><propertyname="tags"value="mem_512mb"/><propertyname="tags_implied"value=""/></properties>    </testcase>    <testcasename="GoogLeNet/0"value_param="OCV/CPU"status="run"time="300.924"classname="Test_Int8_nets"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="ResNet50/0"value_param="OCV/CPU"status="run"time="0.031"classname="Test_Int8_nets"><properties><propertyname="tags"value="mem_512mb,debug_verylong"/><propertyname="tags_implied"value="debug_long"/></properties>    </testcase>    <testcasename="DenseNet121/0"value_param="OCV/CPU"status="run"time="272.641"classname="Test_Int8_nets"><properties><propertyname="ocl_memory_usage"value="0"/><propertyname="tags"value="mem_512mb"/><propertyname="tags_implied"value=""/></properties>    </testcase>    
<testcasename="SqueezeNet_v1_1/0"value_param="OCV/CPU"status="run"time="38.057"classname="Test_Int8_nets"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="Inception_v2/0"value_param="OCV/CPU"status="run"time="208.418"classname="Test_Int8_nets"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="MobileNet_v2/0"value_param="OCV/CPU"status="run"time="47.593"classname="Test_Int8_nets"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="Shufflenet/0"value_param="OCV/CPU"status="run"time="18.378"classname="Test_Int8_nets"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="MobileNet_SSD/0"value_param="OCV/CPU"status="run"time="112.532"classname="Test_Int8_nets"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="MobileNet_v1_SSD/0"value_param="OCV/CPU"status="run"time="123.763"classname="Test_Int8_nets"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="MobileNet_v1_SSD_PPN/0"value_param="OCV/CPU"status="run"time="115.315"classname="Test_Int8_nets"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="Inception_v2_SSD/0"value_param="OCV/CPU"status="run"time="464.413"classname="Test_Int8_nets"><properties><propertyname="ocl_memory_usage"value="0"/><propertyname="tags"value="mem_512mb"/><propertyname="tags_implied"value=""/></properties>    </testcase>    <testcasename="opencv_face_detector/0"value_param="OCV/CPU"status="run"time="304.788"classname="Test_Int8_nets"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    
<testcasename="EfficientDet/0"value_param="OCV/CPU"status="run"time="0.001"classname="Test_Int8_nets"><properties><propertyname="tags"value="debug_verylong"/><propertyname="tags_implied"value="debug_long"/></properties>    </testcase>    <testcasename="FasterRCNN_resnet50/0"value_param="OCV/CPU"status="run"time="0.001"classname="Test_Int8_nets"><properties><propertyname="tags"value="mem_1gb,long,debug_verylong"/><propertyname="tags_implied"value="mem_512mb,debug_long"/></properties>    </testcase>    <testcasename="FasterRCNN_inceptionv2/0"value_param="OCV/CPU"status="run"time="0.001"classname="Test_Int8_nets"><properties><propertyname="tags"value="mem_1gb,long,debug_verylong"/><propertyname="tags_implied"value="mem_512mb,debug_long"/></properties>    </testcase>    <testcasename="FasterRCNN_vgg16/0"value_param="OCV/CPU"status="run"time="0.001"classname="Test_Int8_nets"><properties><propertyname="tags"value="mem_2gb,long,debug_verylong"/><propertyname="tags_implied"value="mem_1gb,mem_512mb,debug_long"/></properties>    </testcase>    <testcasename="FasterRCNN_zf/0"value_param="OCV/CPU"status="run"time="0.002"classname="Test_Int8_nets"><properties><propertyname="tags"value="mem_512mb,debug_verylong"/><propertyname="tags_implied"value="debug_long"/></properties>    </testcase>    <testcasename="RFCN/0"value_param="OCV/CPU"status="run"time="0.002"classname="Test_Int8_nets"><properties><propertyname="tags"value="mem_512mb,long,debug_verylong"/><propertyname="tags_implied"value="debug_long"/></properties>    </testcase>    <testcasename="YoloVoc/0"value_param="OCV/CPU"status="run"time="0.001"classname="Test_Int8_nets"><properties><propertyname="tags"value="mem_1gb,long,debug_verylong"/><propertyname="tags_implied"value="mem_512mb,debug_long"/></properties>    </testcase>    
<testcasename="TinyYoloVoc/0"value_param="OCV/CPU"status="run"time="0.001"classname="Test_Int8_nets"><properties><propertyname="tags"value="mem_512mb,debug_verylong"/><propertyname="tags_implied"value="debug_long"/></properties>    </testcase>    <testcasename="YOLOv3/0"value_param="OCV/CPU"status="run"time="0.001"classname="Test_Int8_nets"><properties><propertyname="tags"value="long,mem_1gb,debug_verylong"/><propertyname="tags_implied"value="debug_long,mem_512mb"/></properties>    </testcase>    <testcasename="YOLOv4/0"value_param="OCV/CPU"status="run"time="0.001"classname="Test_Int8_nets"><properties><propertyname="tags"value="long,mem_1gb,debug_verylong"/><propertyname="tags_implied"value="debug_long,mem_512mb"/></properties>    </testcase>    <testcasename="YOLOv4_tiny/0"value_param="OCV/CPU"status="run"time="636.056"classname="Test_Int8_nets"><properties><propertyname="tags"value="mem_512mb"/><propertyname="tags_implied"value=""/></properties>    </testcase>  </testsuite>  <testsuitename="Test_ONNX_layers"tests="20"failures="0"disabled="0"errors="0"time="1.534">    <testcasename="Quantized_Convolution/0"value_param="OCV/CPU"status="run"time="0.343"classname="Test_ONNX_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="Quantized_MatMul/0"value_param="OCV/CPU"status="run"time="0.132"classname="Test_ONNX_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="Quantized_Gemm/0"value_param="OCV/CPU"status="run"time="0.038"classname="Test_ONNX_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="Quantized_MatMul_Variable_Weights/0"value_param="OCV/CPU"status="run"time="0.082"classname="Test_ONNX_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    
<testcasename="Quantized_Eltwise/0"value_param="OCV/CPU"status="run"time="0.054"classname="Test_ONNX_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="Quantized_Eltwise_Scalar/0"value_param="OCV/CPU"status="run"time="0.043"classname="Test_ONNX_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="Quantized_Eltwise_Broadcast/0"value_param="OCV/CPU"status="run"time="0.059"classname="Test_ONNX_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="Quantized_LeakyReLU/0"value_param="OCV/CPU"status="run"time="0.059"classname="Test_ONNX_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="Quantized_Sigmoid/0"value_param="OCV/CPU"status="run"time="0.035"classname="Test_ONNX_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="Quantized_MaxPool/0"value_param="OCV/CPU"status="run"time="0.037"classname="Test_ONNX_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="Quantized_AvgPool/0"value_param="OCV/CPU"status="run"time="0.04"classname="Test_ONNX_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="Quantized_Split/0"value_param="OCV/CPU"status="run"time="0.049"classname="Test_ONNX_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="Quantized_Pad/0"value_param="OCV/CPU"status="run"time="0.046"classname="Test_ONNX_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="Quantized_Reshape/0"value_param="OCV/CPU"status="run"time="0.037"classname="Test_ONNX_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    
<testcasename="Quantized_Transpose/0"value_param="OCV/CPU"status="run"time="0.037"classname="Test_ONNX_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="Quantized_Squeeze/0"value_param="OCV/CPU"status="run"time="0.037"classname="Test_ONNX_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="Quantized_Unsqueeze/0"value_param="OCV/CPU"status="run"time="0.037"classname="Test_ONNX_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="Quantized_Resize/0"value_param="OCV/CPU"status="run"time="0.118"classname="Test_ONNX_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="Quantized_Concat/0"value_param="OCV/CPU"status="run"time="0.077"classname="Test_ONNX_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>    <testcasename="Quantized_Constant/0"value_param="OCV/CPU"status="run"time="0.134"classname="Test_ONNX_layers"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>  </testsuite>  <testsuitename="Test_ONNX_nets"tests="1"failures="0"disabled="0"errors="0"time="105.911">    <testcasename="ResNet50_Int8/0"value_param="OCV/CPU"status="run"time="105.91"classname="Test_ONNX_nets"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>  </testsuite>  <testsuitename="Test_TFLite"tests="1"failures="0"disabled="0"errors="0"time="44.96">    <testcasename="EfficientDet_int8/0"value_param="OCV/CPU"status="run"time="44.958"classname="Test_TFLite"><properties><propertyname="ocl_memory_usage"value="0"/></properties>    </testcase>  </testsuite></testsuites>
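As a rough illustration of how the two reports above can be compared, the sketch below hard-codes the per-suite `time` attributes that appear in the control report and in the report preceding it, then divides the control time by the other time. The dictionary and function names are illustrative only. Treating the preceding report as the optimized run is an assumption. The control report is a Debug build, so the ratios are indicative at best.

```python
# Suite-level wall times (seconds) copied from the two gtest XML reports above.
# "other" is the report that precedes "opencv_test_dnn control"; assuming it is
# the optimized run is our reading, not something the XML itself states.
other_times = {
    "Test_Int8_nets": 2172.39,
    "Test_ONNX_layers": 1.548,
    "Test_ONNX_nets": 28.347,
    "Test_TFLite": 24.551,
}
control_times = {
    "Test_Int8_nets": 2740.59,
    "Test_ONNX_layers": 1.534,
    "Test_ONNX_nets": 105.911,
    "Test_TFLite": 44.96,
}


def time_ratios(control, other):
    """Return control/other per suite; values above 1 mean the other run was faster."""
    return {name: control[name] / other[name] for name in control if name in other}


ratios = time_ratios(control_times, other_times)

if __name__ == "__main__":
    for suite, ratio in sorted(ratios.items()):
        print(f"{suite}: {ratio:.2f}x")
```

Suites dominated by very short layer tests (such as `Test_ONNX_layers`) are mostly measuring fixed overhead, so their ratios say little about the convolution kernels.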

asmorkalov pushed a commit that referenced this pull request May 28, 2024
3rdparty: NDSRVP - A New 3rdparty Library with Optimizations Based on RISC-V P Extension v0.5.2 - Part 1: Basic Functions #25167

# Summary

### Previous context

From PR #24556:

> > * As you wrote, the P-extension differs from RVV and thus cannot be easily implemented via the Universal Intrinsics mechanism, but there is another HAL mechanism for lower-level CPU optimizations, which is used by the [Carotene](https://github.com/opencv/opencv/tree/4.x/3rdparty/carotene) library on ARM platforms. I suggest moving all non-dnn code to a similar third-party component. For example, the FAST algorithm should allow such an optimization shortcut: see https://github.com/opencv/opencv/blob/4.x/modules/features2d/src/hal_replacement.hpp
> >
> >   Reference documentation is here:
> >
> >   * https://docs.opencv.org/4.x/d1/d1b/group__core__hal__interface.html
> >   * https://docs.opencv.org/4.x/dd/d8b/group__imgproc__hal__interface.html
> >   * https://docs.opencv.org/4.x/db/d47/group__features2d__hal__interface.html
> >   * Carotene library is turned on here: https://github.com/opencv/opencv/blob/8bbf08f0de9c387c12afefdb05af7780d989e4c3/CMakeLists.txt#L906-L911
>
> As a test outside of this PR, a 3rdparty component called ndsrvp was created, containing one piece of the non-dnn code (integral_SIMD), and it works very well.
> All the non-dnn code in this PR has been removed, so this PR can now focus on dnn optimizations.
> This HAL mechanism is quite suitable for rvp optimizations; all the non-dnn code is expected to be moved into ndsrvp soon.

### Progress

#### Part 1 (This PR)

- [Core](https://docs.opencv.org/4.x/d1/d1b/group__core__hal__interface.html)
  - [x] Element-wise add and subtract
  - [x] Element-wise minimum or maximum
  - [x] Element-wise absolute difference
  - [x] Bitwise logical operations
  - [x] Element-wise compare
- [ImgProc](https://docs.opencv.org/4.x/dd/d8b/group__imgproc__hal__interface.html)
  - [x] Integral
  - [x] Threshold
  - [x] WarpAffine
  - [x] WarpPerspective
- [Features2D](https://docs.opencv.org/4.x/db/d47/group__features2d__hal__interface.html)

#### Part 2 (Next PR)

**Rough Estimate. Todo List May Change.**

- [Core](https://docs.opencv.org/4.x/d1/d1b/group__core__hal__interface.html)
- [ImgProc](https://docs.opencv.org/4.x/dd/d8b/group__imgproc__hal__interface.html)
  - smaller remap HAL interface
  - AdaptiveThreshold
  - BoxFilter
  - Canny
  - Convert
  - Filter
  - GaussianBlur
  - MedianBlur
  - Morph
  - Pyrdown
  - Resize
  - Scharr
  - SepFilter
  - Sobel
- [Features2D](https://docs.opencv.org/4.x/db/d47/group__features2d__hal__interface.html)
  - FAST

### Performance Tests

The optimization does not contain floating point operations.

**Absolute Difference**

Geometric mean (ms)

|Name of Test|opencv perf core Absdiff|opencv perf core Absdiff|opencv perf core Absdiff vs opencv perf core Absdiff (x-factor)|
|---|:-:|:-:|:-:|
|Absdiff::OCL_AbsDiffFixture::(640x480, 8UC1)|23.104|5.972|3.87|
|Absdiff::OCL_AbsDiffFixture::(640x480, 32FC1)|39.500|40.830|0.97|
|Absdiff::OCL_AbsDiffFixture::(640x480, 8UC3)|69.155|15.051|4.59|
|Absdiff::OCL_AbsDiffFixture::(640x480, 32FC3)|118.715|120.509|0.99|
|Absdiff::OCL_AbsDiffFixture::(640x480, 8UC4)|93.001|19.770|4.70|
|Absdiff::OCL_AbsDiffFixture::(640x480, 32FC4)|161.136|160.791|1.00|
|Absdiff::OCL_AbsDiffFixture::(1280x720, 8UC1)|69.211|15.140|4.57|
|Absdiff::OCL_AbsDiffFixture::(1280x720, 32FC1)|118.762|119.263|1.00|
|Absdiff::OCL_AbsDiffFixture::(1280x720, 8UC3)|212.414|44.692|4.75|
|Absdiff::OCL_AbsDiffFixture::(1280x720, 32FC3)|367.512|366.569|1.00|
|Absdiff::OCL_AbsDiffFixture::(1280x720, 8UC4)|285.337|59.708|4.78|
|Absdiff::OCL_AbsDiffFixture::(1280x720, 32FC4)|490.395|491.118|1.00|
|Absdiff::OCL_AbsDiffFixture::(1920x1080, 8UC1)|158.827|33.462|4.75|
|Absdiff::OCL_AbsDiffFixture::(1920x1080, 32FC1)|273.503|273.668|1.00|
|Absdiff::OCL_AbsDiffFixture::(1920x1080, 8UC3)|484.175|100.520|4.82|
|Absdiff::OCL_AbsDiffFixture::(1920x1080, 32FC3)|828.758|829.689|1.00|
|Absdiff::OCL_AbsDiffFixture::(1920x1080, 8UC4)|648.592|137.195|4.73|
|Absdiff::OCL_AbsDiffFixture::(1920x1080, 32FC4)|1116.755|1109.587|1.01|
|Absdiff::OCL_AbsDiffFixture::(3840x2160, 8UC1)|648.715|134.875|4.81|
|Absdiff::OCL_AbsDiffFixture::(3840x2160, 32FC1)|1115.939|1113.818|1.00|
|Absdiff::OCL_AbsDiffFixture::(3840x2160, 8UC3)|1944.791|413.420|4.70|
|Absdiff::OCL_AbsDiffFixture::(3840x2160, 32FC3)|3354.193|3324.672|1.01|
|Absdiff::OCL_AbsDiffFixture::(3840x2160, 8UC4)|2594.585|553.486|4.69|
|Absdiff::OCL_AbsDiffFixture::(3840x2160, 32FC4)|4473.543|4438.453|1.01|

**Bitwise Operation**

Geometric mean (ms)

|Name of Test|opencv perf core Bit|opencv perf core Bit|opencv perf core Bit vs opencv perf core Bit (x-factor)|
|---|:-:|:-:|:-:|
|Bitwise_and::OCL_BitwiseAndFixture::(640x480, 8UC1)|22.542|4.971|4.53|
|Bitwise_and::OCL_BitwiseAndFixture::(640x480, 32FC1)|90.210|19.917|4.53|
|Bitwise_and::OCL_BitwiseAndFixture::(640x480, 8UC3)|68.429|15.037|4.55|
|Bitwise_and::OCL_BitwiseAndFixture::(640x480, 32FC3)|280.168|59.239|4.73|
|Bitwise_and::OCL_BitwiseAndFixture::(640x480, 8UC4)|90.565|19.735|4.59|
|Bitwise_and::OCL_BitwiseAndFixture::(640x480, 32FC4)|374.695|79.257|4.73|
|Bitwise_and::OCL_BitwiseAndFixture::(1280x720, 8UC1)|67.824|14.873|4.56|
|Bitwise_and::OCL_BitwiseAndFixture::(1280x720, 32FC1)|279.514|59.232|4.72|
|Bitwise_and::OCL_BitwiseAndFixture::(1280x720, 8UC3)|208.337|44.234|4.71|
|Bitwise_and::OCL_BitwiseAndFixture::(1280x720, 32FC3)|851.211|182.522|4.66|
|Bitwise_and::OCL_BitwiseAndFixture::(1280x720, 8UC4)|279.529|59.095|4.73|
|Bitwise_and::OCL_BitwiseAndFixture::(1280x720, 32FC4)|1132.065|244.877|4.62|
|Bitwise_and::OCL_BitwiseAndFixture::(1920x1080, 8UC1)|155.685|33.078|4.71|
|Bitwise_and::OCL_BitwiseAndFixture::(1920x1080, 32FC1)|635.253|137.482|4.62|
|Bitwise_and::OCL_BitwiseAndFixture::(1920x1080, 8UC3)|474.494|100.166|4.74|
|Bitwise_and::OCL_BitwiseAndFixture::(1920x1080, 32FC3)|1907.340|412.841|4.62|
|Bitwise_and::OCL_BitwiseAndFixture::(1920x1080, 8UC4)|635.538|134.544|4.72|
|Bitwise_and::OCL_BitwiseAndFixture::(1920x1080, 32FC4)|2552.666|556.397|4.59|
|Bitwise_and::OCL_BitwiseAndFixture::(3840x2160, 8UC1)|634.736|136.355|4.66|
|Bitwise_and::OCL_BitwiseAndFixture::(3840x2160, 32FC1)|2548.283|561.827|4.54|
|Bitwise_and::OCL_BitwiseAndFixture::(3840x2160, 8UC3)|1911.454|421.571|4.53|
|Bitwise_and::OCL_BitwiseAndFixture::(3840x2160, 32FC3)|7663.803|1677.289|4.57|
|Bitwise_and::OCL_BitwiseAndFixture::(3840x2160, 8UC4)|2543.983|562.780|4.52|
|Bitwise_and::OCL_BitwiseAndFixture::(3840x2160, 32FC4)|10211.693|2237.393|4.56|
|Bitwise_not::OCL_BitwiseNotFixture::(640x480, 8UC1)|22.341|4.811|4.64|
|Bitwise_not::OCL_BitwiseNotFixture::(640x480, 32FC1)|89.975|19.288|4.66|
|Bitwise_not::OCL_BitwiseNotFixture::(640x480, 8UC3)|67.237|14.643|4.59|
|Bitwise_not::OCL_BitwiseNotFixture::(640x480, 32FC3)|276.324|58.609|4.71|
|Bitwise_not::OCL_BitwiseNotFixture::(640x480, 8UC4)|89.587|19.554|4.58|
|Bitwise_not::OCL_BitwiseNotFixture::(640x480, 32FC4)|370.986|77.136|4.81|
|Bitwise_not::OCL_BitwiseNotFixture::(1280x720, 8UC1)|67.227|14.541|4.62|
|Bitwise_not::OCL_BitwiseNotFixture::(1280x720, 32FC1)|276.357|58.076|4.76|
|Bitwise_not::OCL_BitwiseNotFixture::(1280x720, 8UC3)|206.752|43.376|4.77|
|Bitwise_not::OCL_BitwiseNotFixture::(1280x720, 32FC3)|841.638|177.787|4.73|
|Bitwise_not::OCL_BitwiseNotFixture::(1280x720, 8UC4)|276.773|57.784|4.79|
|Bitwise_not::OCL_BitwiseNotFixture::(1280x720, 32FC4)|1127.740|237.472|4.75|
|Bitwise_not::OCL_BitwiseNotFixture::(1920x1080, 8UC1)|153.808|32.531|4.73|
|Bitwise_not::OCL_BitwiseNotFixture::(1920x1080, 32FC1)|627.765|129.990|4.83|
|Bitwise_not::OCL_BitwiseNotFixture::(1920x1080, 8UC3)|469.799|98.249|4.78|
|Bitwise_not::OCL_BitwiseNotFixture::(1920x1080, 32FC3)|1893.591|403.694|4.69|
|Bitwise_not::OCL_BitwiseNotFixture::(1920x1080, 8UC4)|627.724|129.962|4.83|
|Bitwise_not::OCL_BitwiseNotFixture::(1920x1080, 32FC4)|2529.967|540.744|4.68|
|Bitwise_not::OCL_BitwiseNotFixture::(3840x2160, 8UC1)|628.089|130.277|4.82|
|Bitwise_not::OCL_BitwiseNotFixture::(3840x2160, 32FC1)|2521.817|540.146|4.67|
|Bitwise_not::OCL_BitwiseNotFixture::(3840x2160, 8UC3)|1905.004|404.704|4.71|
|Bitwise_not::OCL_BitwiseNotFixture::(3840x2160, 32FC3)|7567.971|1627.898|4.65|
|Bitwise_not::OCL_BitwiseNotFixture::(3840x2160, 8UC4)|2531.476|540.181|4.69|
|Bitwise_not::OCL_BitwiseNotFixture::(3840x2160, 32FC4)|10075.594|2181.654|4.62|
|Bitwise_or::OCL_BitwiseOrFixture::(640x480, 8UC1)|22.566|5.076|4.45|
|Bitwise_or::OCL_BitwiseOrFixture::(640x480, 32FC1)|90.391|19.928|4.54|
|Bitwise_or::OCL_BitwiseOrFixture::(640x480, 8UC3)|67.758|14.740|4.60|
|Bitwise_or::OCL_BitwiseOrFixture::(640x480, 32FC3)|279.253|59.844|4.67|
|Bitwise_or::OCL_BitwiseOrFixture::(640x480, 8UC4)|90.296|19.802|4.56|
|Bitwise_or::OCL_BitwiseOrFixture::(640x480, 32FC4)|373.972|79.815|4.69|
|Bitwise_or::OCL_BitwiseOrFixture::(1280x720, 8UC1)|67.815|14.865|4.56|
|Bitwise_or::OCL_BitwiseOrFixture::(1280x720, 32FC1)|279.398|60.054|4.65|
|Bitwise_or::OCL_BitwiseOrFixture::(1280x720, 8UC3)|208.643|45.043|4.63|
|Bitwise_or::OCL_BitwiseOrFixture::(1280x720, 32FC3)|850.042|180.985|4.70|
|Bitwise_or::OCL_BitwiseOrFixture::(1280x720, 8UC4)|279.363|60.385|4.63|
|Bitwise_or::OCL_BitwiseOrFixture::(1280x720, 32FC4)|1134.858|243.062|4.67|
|Bitwise_or::OCL_BitwiseOrFixture::(1920x1080, 8UC1)|155.212|33.155|4.68|
|Bitwise_or::OCL_BitwiseOrFixture::(1920x1080, 32FC1)|634.985|134.911|4.71|
|Bitwise_or::OCL_BitwiseOrFixture::(1920x1080, 8UC3)|474.648|100.407|4.73|
|Bitwise_or::OCL_BitwiseOrFixture::(1920x1080, 32FC3)|1912.049|414.184|4.62|
|Bitwise_or::OCL_BitwiseOrFixture::(1920x1080, 8UC4)|635.252|132.587|4.79|
|Bitwise_or::OCL_BitwiseOrFixture::(1920x1080, 32FC4)|2544.471|560.737|4.54|
|Bitwise_or::OCL_BitwiseOrFixture::(3840x2160, 8UC1)|634.574|134.966|4.70|
|Bitwise_or::OCL_BitwiseOrFixture::(3840x2160, 32FC1)|2545.129|561.498|4.53|
|Bitwise_or::OCL_BitwiseOrFixture::(3840x2160, 8UC3)|1910.900|419.365|4.56|
|Bitwise_or::OCL_BitwiseOrFixture::(3840x2160, 32FC3)|7662.603|1685.812|4.55|
|Bitwise_or::OCL_BitwiseOrFixture::(3840x2160, 8UC4)|2548.971|560.787|4.55|
|Bitwise_or::OCL_BitwiseOrFixture::(3840x2160, 32FC4)|10201.407|2237.552|4.56|
|Bitwise_xor::OCL_BitwiseXorFixture::(640x480, 8UC1)|22.718|4.961|4.58|
|Bitwise_xor::OCL_BitwiseXorFixture::(640x480, 32FC1)|91.496|19.831|4.61|
|Bitwise_xor::OCL_BitwiseXorFixture::(640x480, 8UC3)|67.910|15.151|4.48|
|Bitwise_xor::OCL_BitwiseXorFixture::(640x480, 32FC3)|279.612|59.792|4.68|
|Bitwise_xor::OCL_BitwiseXorFixture::(640x480, 8UC4)|91.073|19.853|4.59|
|Bitwise_xor::OCL_BitwiseXorFixture::(640x480, 32FC4)|374.641|79.155|4.73|
|Bitwise_xor::OCL_BitwiseXorFixture::(1280x720, 8UC1)|67.704|15.008|4.51|
|Bitwise_xor::OCL_BitwiseXorFixture::(1280x720, 32FC1)|279.229|60.088|4.65|
|Bitwise_xor::OCL_BitwiseXorFixture::(1280x720, 8UC3)|208.156|44.426|4.69|
|Bitwise_xor::OCL_BitwiseXorFixture::(1280x720, 32FC3)|849.501|180.848|4.70|
|Bitwise_xor::OCL_BitwiseXorFixture::(1280x720, 8UC4)|279.642|59.728|4.68|
|Bitwise_xor::OCL_BitwiseXorFixture::(1280x720, 32FC4)|1129.826|242.880|4.65|
|Bitwise_xor::OCL_BitwiseXorFixture::(1920x1080, 8UC1)|155.585|33.354|4.66|
|Bitwise_xor::OCL_BitwiseXorFixture::(1920x1080, 32FC1)|634.090|134.995|4.70|
|Bitwise_xor::OCL_BitwiseXorFixture::(1920x1080, 8UC3)|474.931|99.598|4.77|
|Bitwise_xor::OCL_BitwiseXorFixture::(1920x1080, 32FC3)|1910.519|413.138|4.62|
|Bitwise_xor::OCL_BitwiseXorFixture::(1920x1080, 8UC4)|635.026|135.155|4.70|
|Bitwise_xor::OCL_BitwiseXorFixture::(1920x1080, 32FC4)|2560.167|560.838|4.56|
|Bitwise_xor::OCL_BitwiseXorFixture::(3840x2160, 8UC1)|634.893|134.883|4.71|
|Bitwise_xor::OCL_BitwiseXorFixture::(3840x2160, 32FC1)|2548.166|560.831|4.54|
|Bitwise_xor::OCL_BitwiseXorFixture::(3840x2160, 8UC3)|1911.392|419.816|4.55|
|Bitwise_xor::OCL_BitwiseXorFixture::(3840x2160, 32FC3)|7646.634|1677.988|4.56|
|Bitwise_xor::OCL_BitwiseXorFixture::(3840x2160, 8UC4)|2560.637|560.805|4.57|
|Bitwise_xor::OCL_BitwiseXorFixture::(3840x2160, 32FC4)|10227.044|2249.458|4.55|

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
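To make the HAL idea from the commit message more concrete, here is a minimal, self-contained sketch of how a HAL-style replacement for 8-bit element-wise absolute difference could be shaped. It is not the ndsrvp code and uses no real RVP intrinsics: `sketch_absdiff8u`, `packed_absdiff_u8x4`, and the `SKETCH_HAL_*` constants are hypothetical names chosen for illustration. The return codes only mimic the role of OpenCV's `CV_HAL_ERROR_OK` and `CV_HAL_ERROR_NOT_IMPLEMENTED`, and the packed step is emulated in portable C++ where a packed-SIMD target would presumably use a single instruction on four bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-ins for the HAL return codes (assumed values 0 and 1).
constexpr int SKETCH_HAL_OK = 0;
constexpr int SKETCH_HAL_NOT_IMPLEMENTED = 1;

// Portable emulation of a packed operation: absolute difference of the four
// unsigned bytes held in each 32-bit word, computed lane by lane.
static inline std::uint32_t packed_absdiff_u8x4(std::uint32_t a, std::uint32_t b)
{
    std::uint32_t out = 0;
    for (int lane = 0; lane < 4; ++lane) {
        const std::uint32_t x = (a >> (8 * lane)) & 0xFFu;
        const std::uint32_t y = (b >> (8 * lane)) & 0xFFu;
        const std::uint32_t d = x > y ? x - y : y - x;
        out |= d << (8 * lane);
    }
    return out;
}

// HAL-style entry point: pointer + row stride (in bytes) per image, then size.
// Returning SKETCH_HAL_NOT_IMPLEMENTED signals "use the generic path instead".
int sketch_absdiff8u(const std::uint8_t* src1, std::size_t step1,
                     const std::uint8_t* src2, std::size_t step2,
                     std::uint8_t* dst, std::size_t dst_step,
                     int width, int height)
{
    assert(src1 && src2 && dst);
    if (width <= 0 || height <= 0)
        return SKETCH_HAL_NOT_IMPLEMENTED;  // this sketch declines empty images

    for (int y = 0; y < height; ++y) {
        const std::uint8_t* a = src1 + static_cast<std::size_t>(y) * step1;
        const std::uint8_t* b = src2 + static_cast<std::size_t>(y) * step2;
        std::uint8_t* d = dst + static_cast<std::size_t>(y) * dst_step;

        int x = 0;
        // Main loop: four pixels per iteration through one 32-bit word.
        for (; x + 4 <= width; x += 4) {
            std::uint32_t wa, wb;
            std::memcpy(&wa, a + x, 4);
            std::memcpy(&wb, b + x, 4);
            const std::uint32_t wd = packed_absdiff_u8x4(wa, wb);
            std::memcpy(d + x, &wd, 4);
        }
        // Scalar tail for the remaining 0..3 pixels of the row.
        for (; x < width; ++x)
            d[x] = static_cast<std::uint8_t>(a[x] > b[x] ? a[x] - b[x] : b[x] - a[x]);
    }
    return SKETCH_HAL_OK;
}
```

Because the same lane-wise operation is applied to every byte, the word-at-a-time loop gives the same result regardless of byte order. In a HAL-style design the caller would check the return code and fall back to its generic implementation whenever the replacement declines the input.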

Reviewers

@mshabunin approved these changes

@vpisarev: awaiting requested review

Milestone

4.10.0

5 participants

@Junyan721113 @asmorkalov @vpisarev @mshabunin @opencv-alalek
