Uh oh!
There was an error while loading.Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork56.4k
hal/riscv-rvv: implement FAST keypoint detection#27391
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Uh oh!
There was an error while loading.Please reload this page.
Conversation
| @param type FAST type | ||
| */ | ||
| inlineinthal_ni_FAST(const uchar* src_data,size_t src_step,int width,int height,uchar* keypoints_data,size_t* keypoints_count,int threshold,bool nonmax_suppression,int/*cv::FastFeatureDetector::DetectorType*/ type) {return CV_HAL_ERROR_NOT_IMPLEMENTED; } | ||
| inlineinthal_ni_FAST(const uchar* src_data,size_t src_step,int width,int height,std::vector<cv::KeyPoint>& keypoints,int threshold,bool nonmax_suppression,int/*cv::FastFeatureDetector::DetectorType*/ type) {return CV_HAL_ERROR_NOT_IMPLEMENTED; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
It's dangerous replacement. HAL uses C style API without std::vector on purpose. HAL may be a binary built with own STL. It means that std::vector on HAL side and OpenCV side may be different with different memory layout.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Indeed. But I haven't figured out a way to modify the membersize instd::vector<cv::KeyPoint>& keypoints by simply reinterpretingkeypoints.data() touchar*. The originalCALL_HAL, causes wrong keypoint count in accuracy test since it compares the membersize of the outputstd::vector<cv::KeyPoint>. Is there any solution for the problem?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
A simple solution is that you can extract the macro of calling HAL, manually call HAL function and do a resize on the vector before returning. E.g.
diff --git a/modules/features2d/src/fast.cpp b/modules/features2d/src/fast.cppindex a9615da5bd..b00b80818d 100644--- a/modules/features2d/src/fast.cpp+++ b/modules/features2d/src/fast.cpp@@ -438,8 +438,12 @@ void FAST(InputArray _img, std::vector<KeyPoint>& keypoints, int threshold, bool size_t keypoints_count = 10000; keypoints.clear(); keypoints.resize(keypoints_count);- CALL_HAL(fast, cv_hal_FAST, img.data, img.step, img.cols, img.rows,- (uchar*)(keypoints.data()), &keypoints_count, threshold, nonmax_suppression, type);+ int hal_ret = cv_hal_FAST(img.data, img.step, img.cols, img.rows, (uchar *)(keypoints.data()),+ &keypoints_count, threshold, nonmax_suppression, type);+ if (hal_ret == CV_HAL_ERROR_OK) {+ keypoints.resize(keypoints_count);+ return;+ } switch(type) { case FastFeatureDetector::TYPE_5_8:
Uh oh!
There was an error while loading.Please reload this page.
64415e0 to2ed698eCompare
fengyuentau left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Also take a look at the trailing whitespaces:
hal/riscv-rvv/CMakeLists.txt:20: trailing whitespace.+ ${CMAKE_SOURCE_DIR}/modules/imgproc/include hal/riscv-rvv/src/features2d/fast.cpp:60: trailing whitespace.+inline uint8_t cornerScore(const uint8_t* ptr, const vuint16m2_t& v_offset, int64_t row_stride) hal/riscv-rvv/src/features2d/fast.cpp:64: trailing whitespace.+ hal/riscv-rvv/src/features2d/fast.cpp:76: trailing whitespace.+ hal/riscv-rvv/src/features2d/fast.cpp:108: trailing whitespace.+inline int fast_16(const uchar* src_data, size_t src_step, int width, int height, std::vector<KeyPoint>& keypoints, int threshold, bool nonmax_suppression) hal/riscv-rvv/src/features2d/fast.cpp:[19](https://github.com/opencv/opencv/actions/runs/15389955155/job/43296989535?pr=27391#step:14:20)0: trailing whitespace.+ hal/riscv-rvv/src/features2d/fast.cpp:[20](https://github.com/opencv/opencv/actions/runs/15389955155/job/43296989535?pr=27391#step:14:21)7: trailing whitespace.Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
fengyuentau left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
LGTM 👍
asmorkalov commentedJun 16, 2025
Muse Pi v30 board (./opencv_test_features2d): |
Haosonn commentedJun 16, 2025
I've checked this test. This cause of this failure is that the count of keypoints in this case is greater than 10000 which is set in the initialization in |
asmorkalov commentedJun 16, 2025
Looks like we need some API change here. |
Uh oh!
There was an error while loading.Please reload this page.
| @param realloc_func function for reallocation | ||
| */ | ||
| inlineinthal_ni_FAST(const uchar* src_data,size_t src_step,int width,int height,uchar* keypoints_data,size_t* keypoints_count,int threshold,bool nonmax_suppression,int/*cv::FastFeatureDetector::DetectorType*/ type) {return CV_HAL_ERROR_NOT_IMPLEMENTED; } | ||
| inlineinthal_ni_FAST(const uchar* src_data,size_t src_step,int width,int height,void** keypoints_data,size_t* keypoints_count,int threshold,bool nonmax_suppression,int/*cv::FastFeatureDetector::DetectorType*/ type,void* (*realloc_func)(void*,size_t)) {return CV_HAL_ERROR_NOT_IMPLEMENTED; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
We usually presume backward compatibility in HAL. I propose to presume the original API and add new entrypoint with realloc, e.g.hal_ni_FAST_withRealloc.
On caller side you need to try the new one, then the old one and then default impl.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
hal_ni_FAST_withRealloc is quite bad name, I would say. We may have some next revisions of API and describing all improvements in the function name is a super-verbose method. I'd suggest to simply add "v2": hal_ni_FASTv2.
Are you fine that inside cv::FAST() we will just call the new HAL API (v2)? Or we need to call the old API as well?
fengyuentau commentedAug 1, 2025
Found regression on the CI configuration But these two test were fine on spacemit k1. |
| { | ||
| pixel_u[i] = pixel_u[i -16]; | ||
| } | ||
| v_offset =__riscv_vle16_v_u16m2(pixel_u,25); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
CI node has 128x2-bit width of register, so max number of uint16_t elements in a lane is 16. 25 is just wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Reminder.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Fixed.
Uh oh!
There was an error while loading.Please reload this page.
| { | ||
| pixel_u[i] = pixel_u[i -16]; | ||
| } | ||
| v_offset =__riscv_vle16_v_u16m2(pixel_u,25); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Reminder.
fengyuentau commentedAug 7, 2025
@asmorkalov Regression is fixed. Manually tested on my side both on the QEMU setup and K1. Performance is slightly better thanbefore: GCC: Clang: |
asmorkalov commentedAug 7, 2025
Build issue on Mac: |
75d9ac3 intoopencv:4.xUh oh!
There was an error while loading.Please reload this page.
hal/riscv-rvv: implement FAST keypoint detectionopencv#27391An implementation of FAST keypoint detection with NMS/noNMS version.A new perf test is written, and the perf test is evaluated in two platforms: K1/K230.Accelaration is achieved when threshold is high, however, weird stat shows that the acceleration doesn't work when threshold is low (the number of keypoint candidates is high).K1:```# GCC Name of Test scalar rvv rvv vs scalar (x-factor)detect::Fast_Params::(20, 2, false, "cv/cameracalibration/chess9.png") 22.113 23.721 0.93detect::Fast_Params::(20, 2, false, "cv/inpaint/orig.png") 4.605 7.168 0.64detect::Fast_Params::(20, 2, true, "cv/cameracalibration/chess9.png") 26.228 24.689 1.06detect::Fast_Params::(20, 2, true, "cv/inpaint/orig.png") 7.134 7.561 0.94detect::Fast_Params::(30, 2, false, "cv/cameracalibration/chess9.png") 19.488 21.407 0.91detect::Fast_Params::(30, 2, false, "cv/inpaint/orig.png") 3.481 5.404 0.64detect::Fast_Params::(30, 2, true, "cv/cameracalibration/chess9.png") 22.309 22.145 1.01detect::Fast_Params::(30, 2, true, "cv/inpaint/orig.png") 4.826 5.654 0.85detect::Fast_Params::(100, 2, false, "cv/cameracalibration/chess9.png") 14.108 8.205 1.72detect::Fast_Params::(100, 2, false, "cv/inpaint/orig.png") 2.520 1.072 2.35detect::Fast_Params::(100, 2, true, "cv/cameracalibration/chess9.png") 14.133 8.410 1.68detect::Fast_Params::(100, 2, true, "cv/inpaint/orig.png") 2.556 1.097 2.33# Clang Name of Test scalar rvv rvv vs scalar (x-factor)detect::Fast_Params::(20, 2, false, "cv/cameracalibration/chess9.png") 25.130 23.695 1.06detect::Fast_Params::(20, 2, false, "cv/inpaint/orig.png") 4.987 7.168 0.70detect::Fast_Params::(20, 2, true, "cv/cameracalibration/chess9.png") 28.035 24.467 1.15detect::Fast_Params::(20, 2, true, "cv/inpaint/orig.png") 6.760 7.503 0.90detect::Fast_Params::(30, 2, false, "cv/cameracalibration/chess9.png") 22.954 21.373 1.07detect::Fast_Params::(30, 2, false, "cv/inpaint/orig.png") 3.838 5.330 0.72detect::Fast_Params::(30, 2, true, "cv/cameracalibration/chess9.png") 24.523 21.998 1.11detect::Fast_Params::(30, 2, true, "cv/inpaint/orig.png") 4.795 5.543 0.87detect::Fast_Params::(100, 2, false, "cv/cameracalibration/chess9.png") 16.799 8.102 2.07detect::Fast_Params::(100, 2, false, "cv/inpaint/orig.png") 2.874 1.024 2.81detect::Fast_Params::(100, 2, true, "cv/cameracalibration/chess9.png") 16.950 8.073 2.10detect::Fast_Params::(100, 2, true, "cv/inpaint/orig.png") 2.899 1.027 2.82```K230```# GCC Name of Test scalar rvv rvv vs scalar (x-factor)detect::Fast_Params::(20, 2, false, "cv/cameracalibration/chess9.png") 21.082 32.090 0.66detect::Fast_Params::(20, 2, false, "cv/inpaint/orig.png") 4.837 9.157 0.53detect::Fast_Params::(20, 2, true, "cv/cameracalibration/chess9.png") 25.479 33.576 0.76detect::Fast_Params::(20, 2, true, "cv/inpaint/orig.png") 7.549 9.716 0.78detect::Fast_Params::(30, 2, false, "cv/cameracalibration/chess9.png") 18.463 30.087 0.61detect::Fast_Params::(30, 2, false, "cv/inpaint/orig.png") 3.716 6.544 0.57detect::Fast_Params::(30, 2, true, "cv/cameracalibration/chess9.png") 21.548 31.374 0.69detect::Fast_Params::(30, 2, true, "cv/inpaint/orig.png") 5.107 6.928 0.74detect::Fast_Params::(100, 2, false, "cv/cameracalibration/chess9.png") 13.763 8.712 1.58detect::Fast_Params::(100, 2, false, "cv/inpaint/orig.png") 2.578 1.284 2.01detect::Fast_Params::(100, 2, true, "cv/cameracalibration/chess9.png") 13.804 8.831 1.56detect::Fast_Params::(100, 2, true, "cv/inpaint/orig.png") 2.615 1.289 2.03# Clang Name of Test scalar rvv rvv vs scalar (x-factor)detect::Fast_Params::(20, 2, false, "cv/cameracalibration/chess9.png") 23.424 35.072 0.67detect::Fast_Params::(20, 2, false, "cv/inpaint/orig.png") 5.284 10.107 0.52detect::Fast_Params::(20, 2, true, "cv/cameracalibration/chess9.png") 26.487 35.978 0.74detect::Fast_Params::(20, 2, true, "cv/inpaint/orig.png") 7.146 10.612 0.67detect::Fast_Params::(30, 2, false, "cv/cameracalibration/chess9.png") 21.155 32.858 0.64detect::Fast_Params::(30, 2, false, "cv/inpaint/orig.png") 4.101 7.153 0.57detect::Fast_Params::(30, 2, true, "cv/cameracalibration/chess9.png") 23.321 33.505 0.70detect::Fast_Params::(30, 2, true, "cv/inpaint/orig.png") 5.106 7.415 0.69detect::Fast_Params::(100, 2, false, "cv/cameracalibration/chess9.png") 15.597 8.792 1.77detect::Fast_Params::(100, 2, false, "cv/inpaint/orig.png") 2.922 1.228 2.38detect::Fast_Params::(100, 2, true, "cv/cameracalibration/chess9.png") 15.626 8.817 1.77detect::Fast_Params::(100, 2, true, "cv/inpaint/orig.png") 2.963 1.240 2.39```### Pull Request Readiness ChecklistSee details athttps://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request- [x] I agree to contribute to the project under Apache 2 License.- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV- [ ] The PR is proposed to the proper branch- [ ] There is a reference to the original bug report and related work- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name.- [ ] The feature is well documented and sample code can be built with the project CMake
An implementation of FAST keypoint detection with NMS/noNMS version.
A new perf test is written, and the perf test is evaluated in two platforms: K1/K230.
Accelaration is achieved when threshold is high, however, weird stat shows that the acceleration doesn't work when threshold is low (the number of keypoint candidates is high).
K1:
K230
Pull Request Readiness Checklist
See details athttps://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
Patch to opencv_extra has the same branch name.