Disclosure of Invention
In view of the above, embodiments of the present application provide a feature point matching method, a training method of a feature point matching model, and related devices, which can improve the accuracy of feature point matching.
The embodiment of the present application provides a feature point matching method, applied to an electronic device in which a pre-trained feature point matching model is deployed, the feature point matching model including a bidirectional feature matching layer, and the feature point matching method including the following steps:
acquiring a first image and a second image to be subjected to feature point matching;
extracting feature maps of the first image and the second image to obtain a first feature map and a second feature map;
performing bidirectional feature matching on the first feature map and the second feature map based on the bidirectional feature matching layer to obtain bidirectional feature matching information;
wherein the bidirectional feature matching information includes feature point pairs obtained by feature point matching with the first feature map and the second feature map as references respectively;
and obtaining a feature point matching result of the first image and the second image based on the bidirectional feature matching information.
According to the embodiment of the present application, feature point matching is performed by a pre-trained feature point matching model, which can improve the accuracy of feature matching. The feature point matching model includes the bidirectional feature matching layer, and the bidirectional feature matching layer can perform feature point matching with each of the two feature maps as a reference, so that unmatched feature points in the two feature maps can be removed, further improving the matching accuracy.
In some embodiments, performing bidirectional feature matching on the first feature map and the second feature map based on the bidirectional feature matching layer to obtain bidirectional feature matching information, including:
searching the second feature map, based on the bidirectional feature matching layer, for feature points that respectively match the feature points in the first feature map, to obtain a first feature point pair set;
searching the first feature map, based on the bidirectional feature matching layer, for feature points that respectively match the feature points in the second feature map, to obtain a second feature point pair set;
obtaining the feature point matching result of the first image and the second image based on the bidirectional feature matching information includes:
taking the intersection of the first feature point pair set and the second feature point pair set as the feature point matching result of the first image and the second image.
The embodiment of the present application takes the intersection as the feature point matching result, so that each mutually matched feature point pair in the feature point matching result is an element of both the first feature point pair set and the second feature point pair set. Mismatched feature point pairs in the first feature point pair set and the second feature point pair set can thus be effectively removed, ensuring the accuracy of feature point matching.

In some embodiments, the feature point matching model further includes a feature point extraction layer and a feature description layer;
extracting the feature maps of the first image and the second image to obtain the first feature map and the second feature map includes:
extracting features of the first image and the second image based on the feature point extraction layer to obtain feature points of the first image and the second image;
determining feature descriptors of the feature points based on the feature description layer, and generating a feature map corresponding to each image based on the feature descriptors of the feature points, to obtain the first feature map and the second feature map.
According to the embodiment of the present application, an artificial intelligence model is used to extract the feature points and their feature descriptors, which can reduce the dependence of the feature point extraction process on image texture information. Even if image texture is lost due to occlusion, illumination changes, and the like, the feature map can still be extracted relatively accurately.
In some embodiments, the electronic device includes a neural network processor or a graphics processor, and the feature point matching model is deployed on the neural network processor or the graphics processor.
The neural network processor or the graphics processor is well suited to large numbers of parallel matrix operations. Deploying the feature point matching model on the neural network processor or the graphics processor therefore improves the feature point matching efficiency and the operation precision of the feature point matching model, further improving the accuracy of feature point matching.
In some embodiments, the electronic device further includes a central processor, and the electronic device is communicatively connected to at least one camera; the camera is configured in a mobile device and is configured to capture images of the mobile environment while the mobile device moves. The acquiring a first image and a second image to be subjected to feature point matching includes:
acquiring the first image and the second image to be subjected to feature point matching from the camera;
after obtaining the feature point matching result of the first image and the second image based on the bidirectional feature matching information, the method further includes:
controlling the neural network processor or the graphics processor to transmit the feature point matching result to the central processor;
wherein the central processor is configured to locate the mobile device based on the feature point matching result.
The embodiment of the application also provides a training method of the feature point matching model, wherein the feature point matching model comprises a bidirectional feature matching layer, and the training method comprises the following steps:
acquiring a training data set, wherein the training data set includes a first sample image, a second sample image, and a feature point matching label of each sample image;
extracting feature maps of the first sample image and the second sample image to obtain a first sample feature map and a second sample feature map;
performing bidirectional feature matching on the first sample feature map and the second sample feature map based on the bidirectional feature matching layer to obtain bidirectional feature matching information;
wherein the bidirectional feature matching information includes feature point pairs obtained by feature point matching with the first sample feature map and the second sample feature map as references respectively;
based on the bidirectional feature matching information, obtaining feature point matching results of the first sample image and the second sample image;
and updating network parameters of the feature point matching model based on the feature point matching result and the feature point matching label.
In some embodiments, the feature point matching model further includes a feature point extraction layer and a feature description layer;
extracting the feature maps of the first sample image and the second sample image to obtain the first sample feature map and the second sample feature map includes:
extracting features of the first sample image and the second sample image based on the feature point extraction layer to obtain feature points of the first sample image and the second sample image;
determining feature descriptors of the feature points based on the feature description layer, and generating a feature map corresponding to each sample image based on the feature descriptors of the feature points, to obtain the first sample feature map and the second sample feature map.
In some embodiments, the step of acquiring the first sample image and the second sample image comprises:
acquiring a first original image and a second original image;
performing resolution scaling on the first original image and the second original image respectively based on a preset feature pyramid model to obtain a first image set and a second image set;
wherein the first set of images includes a plurality of images of different resolutions corresponding to the first original image, and the second set of images includes a plurality of images of different resolutions corresponding to the second original image;
selecting one image from each of the first image set and the second image set and combining them pairwise to obtain the first sample image and the second sample image.
According to the embodiment of the present application, the original images are scaled by the feature pyramid model to obtain images of different resolutions, so that the feature point matching model can learn and match image features at different resolutions, improving the robustness of the feature point matching model when facing images of different resolutions.
The embodiment of the present application also provides an electronic device, including a processor and a memory, where the memory is configured to store instructions, and the processor is configured to call the instructions in the memory so that the electronic device executes the above feature point matching method or training method of the feature point matching model.

The embodiment of the present application also provides a computer readable storage medium storing computer instructions that, when run on an electronic device, cause the electronic device to execute the above feature point matching method or training method of the feature point matching model.
Detailed Description
In order that the above objects, features, and advantages of the present application may be more clearly understood, the application is described in detail below with reference to the accompanying drawings and specific embodiments. The embodiments of the present application and the features in the embodiments may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application, and the described embodiments are merely some, rather than all, of the embodiments of the present application.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein in the description of the application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
The term "at least one" in the present application means one or more, and "a plurality" means two or more. "And/or" describes an association relationship between associated objects and covers three cases: for example, A and/or B may mean that A exists alone, that A and B both exist, or that B exists alone, where A and B may each be singular or plural. The terms "first," "second," "third," "fourth," and the like in the description, claims, and drawings, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order.
In embodiments of the application, words such as "exemplary" or "such as" are used to mean serving as an example, instance, or illustration. Any embodiment or design described as "exemplary" or "such as" herein should not be construed as preferred or advantageous over other embodiments or designs. Rather, the use of such words is intended to present related concepts in a concrete fashion.
Feature point matching may refer to a method of extracting features of a plurality of images, respectively, and then determining the same features in the plurality of images using the similarity of the features.
At present, feature point matching typically extracts two feature sets from two images respectively, and for each feature in the first feature set, finds the most similar feature in the second feature set as its match, thereby obtaining the mutually matched feature points in the two images.

However, a feature in the first feature set may have no truly matching feature point in the second feature set, for example because of a viewing angle change between the two images or a difference between the photographed scenes. The above method will nevertheless find the most similar feature point in the second feature set and treat it as the matching feature point, so that a false match is produced. Performing feature point matching in this way therefore causes many matching errors, and the accuracy of feature point matching is low.
In view of the foregoing, embodiments of the present application provide a feature point matching method, a training method of a feature point matching model, an electronic device, and a computer readable storage medium, so as to solve the problem of low accuracy of feature point matching in the related art.
First, referring to fig. 1, fig. 1 is a schematic view of an application scenario of a feature point matching system according to an embodiment of the present application. The feature point matching system may include an electronic device, where the electronic device is configured to execute the feature point matching method provided by the embodiment of the present application.
The electronic device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a processor, a microprogrammed control unit (Microprogrammed Control Unit, MCU), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA), a digital signal processor (Digital Signal Processor, DSP), an embedded device, and the like. The electronic device may be a portable electronic device, a personal computer, a server, etc.
The electronic device may also be configured with a processor, such as a central processing unit (Central Processing Unit, CPU), a neural network processor (Neural network Processing Unit, NPU), and/or a graphics processor (Graphics Processing Unit, GPU).
The feature point matching method uses a feature point matching model to match feature points. The feature point matching model may be deployed on a CPU, a GPU, an NPU, or the like; that is, the feature point matching model may run on a CPU, GPU, NPU, or the like.
The feature point matching model can identify feature points (also known as homonymous points) that match each other between two images, so as to obtain a feature matching result. The feature points of an image are, for example, its corner points and edges. A homonymous point refers to feature points in different images that correspond to the same position in the captured scene, for example, feature points that all correspond to a particular corner of a building.
After the electronic device obtains the feature matching result based on the feature point matching model, it can perform image stitching, target detection, positioning, and the like based on the feature point matching result, but is not limited thereto.
For example, in the application scenario shown in fig. 1, the electronic device may locate the mobile device based on Visual-Inertial Odometry (VIO) and feature matching results.
The mobile device may be a vehicle, an unmanned aerial vehicle, a mobile robot, etc., but is not limited thereto.
The electronic device may be configured in the mobile device or may be configured independently of the mobile device.
In the case where the electronic device is independent of the mobile device, the electronic device may be in communication with the mobile device, e.g., the electronic device may be remotely communicatively coupled to a camera and inertial sensor on the mobile device.
Fig. 1 illustrates an example in which the electronic device is provided independently of the mobile device, and in this application scenario, the electronic device is communicatively connected to at least one camera, the camera is configured to the mobile device, and the camera is configured to capture images of a mobile environment during movement of the mobile device.
The electronic device may obtain two images from the camera to be feature point matched, which may be denoted as a first image and a second image.
For example, among the images captured by the camera, two adjacent frames may be used as the two images to be subjected to feature point matching; alternatively, one image may be selected every preset number of frames, and two adjacently selected images used as the two images to be subjected to feature point matching. The two images to be subjected to feature point matching are then transmitted to the electronic device.
The electronic device can perform feature point matching on the first image and the second image based on the feature point matching method provided by the embodiment of the application so as to obtain feature point matching results of the first image and the second image.
Then, the electronic device may further determine the displacement generated by the mobile device between the capture times of the two images based on the feature point matching result, and locate the mobile device based on the displacement.

For example, the electronic device may employ VIO technology to locate the mobile device. Specifically, the mobile device is further provided with an inertial sensor, and the electronic device can use VIO technology to calculate the position of the mobile device by combining the displacement with the measurements of the inertial sensor, thereby locating the mobile device.
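As an illustrative sketch only (the patent does not prescribe this exact procedure), the displacement between the two frames can be estimated from the matched feature points with standard epipolar geometry routines; the function name, inputs, and use of OpenCV here are assumptions for illustration:

```python
import cv2
import numpy as np

def relative_pose(pts1: np.ndarray, pts2: np.ndarray, K: np.ndarray):
    """Estimate camera motion between two frames from matched feature points.

    pts1, pts2: (N, 2) pixel coordinates of mutually matched feature points in
    the two images; K: 3x3 camera intrinsic matrix. Both are assumed inputs.
    """
    # Essential matrix from the matched points, with RANSAC to tolerate outliers.
    E, _ = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC)
    # Decompose into rotation R and unit-scale translation t between the frames.
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K)
    return R, t
```

In a VIO pipeline, the unit-scale translation recovered this way would then be combined with the inertial sensor measurements to resolve the actual displacement.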
In some embodiments, referring to fig. 2, the electronic device includes a CPU and at least one of an NPU and a GPU.
The feature point matching model may be deployed on the NPU or the GPU. After the NPU or the GPU obtains a feature point matching result based on the feature point matching model, it can transmit the feature point matching result to the CPU, and the CPU can determine the displacement generated by the mobile device between the capture times of the two images based on the feature point matching result and locate the mobile device based on the displacement.
The NPU and the GPU are well suited to large numbers of parallel matrix operations, so deploying the feature point matching model on the GPU or the NPU can increase the feature point matching speed, reduce the time consumed by feature point matching, and improve the operation precision of the feature point matching model, thereby improving the accuracy of feature point matching.

The amount of matrix operations required in the positioning process based on the feature point matching result is relatively small, so the CPU can be used for that processing; in this way, each task is adapted to the operation characteristics of its processor.
The above assignment of the processor that deploys the feature point matching model and the processor that performs the positioning calculation is only an example, and may be set according to requirements in actual application, which is not limited by the embodiment of the present application.
For example, the feature point matching model may be deployed on the CPU, and the CPU performs positioning of the mobile device based on the feature point matching result, or for example, the feature point matching model is deployed on the NPU or the GPU, and the NPU or the GPU performs positioning of the mobile device based on the feature point matching result.
Referring to fig. 3, fig. 3 is a schematic diagram of a model structure of a feature point matching model according to an embodiment of the present application, where the feature point matching model at least includes a bidirectional feature matching layer. The feature point matching model may be trained based on a deep learning model, such as a convolutional neural network (Convolutional Neural Network, CNN).
The bidirectional feature matching layer is configured to perform bidirectional feature matching on the features of the two images to obtain bidirectional feature matching information. The bidirectional feature matching information includes feature point pairs obtained by feature point matching with the first feature map and the second feature map as references respectively.
The feature map of an image may be used to represent the feature points of the image and their feature descriptors. For example, the feature map presents the image coordinates of the feature points and their feature descriptors in vector form.

A feature descriptor, also called a descriptor, can uniquely represent the image area around a feature point, such as the brightness of the surrounding pixels.
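As a minimal sketch of the vector form described above (the class name and layout are assumptions for illustration, not the patent's actual data structure):

```python
import numpy as np

class FeatureMap:
    """A feature map bundling feature point coordinates with their descriptors."""

    def __init__(self, keypoints: np.ndarray, descriptors: np.ndarray):
        # keypoints: (N, 2) array of (x, y) image coordinates of feature points
        # descriptors: (N, D) array, one D-dimensional descriptor per feature point
        assert keypoints.shape[0] == descriptors.shape[0]
        self.keypoints = keypoints
        self.descriptors = descriptors
```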
In some embodiments, the electronic device may receive a first feature map and a second feature map, where the first feature map is a feature map of a first image and the second feature map is a feature map of a second image, and input the first feature map and the second feature map into a bi-directional feature matching layer, so that the bi-directional feature matching layer may perform bi-directional feature matching based on the feature maps corresponding to the two images.
In other embodiments, the electronic device may receive the first image and the second image, extract feature maps corresponding to the first image and the second image based on a preset feature extraction algorithm, and then use the first feature map and the second feature map as inputs of the bi-directional feature matching layer.
The preset feature extraction algorithm may be, but is not limited to, the scale-invariant feature transform (Scale-Invariant Feature Transform, SIFT), the oriented FAST and rotated BRIEF algorithm (Oriented FAST and Rotated BRIEF, ORB), the speeded-up robust features algorithm (Speeded-Up Robust Features, SURF), etc.
In other embodiments, the feature point matching model may further include a feature point extraction layer and a feature description layer.
For example, the feature point matching model may include one feature point extraction layer and one feature description layer, and the electronic device may input the first image and the second image into the feature point extraction layer and the feature description layer one after the other; that is, the feature maps of the first image and the second image may be extracted serially.
For another example, referring to fig. 3, the feature point matching model may include two feature point extraction layers and two feature description layers, and the first image and the second image may be input in parallel into the two feature point extraction layers and two feature description layers to obtain a first feature map corresponding to the first image and a second feature map corresponding to the second image; that is, the feature maps of the first image and the second image may be extracted in parallel.

In this embodiment, feature extraction is performed on the first image and the second image in parallel, which can reduce the time taken by the feature point matching model to extract the feature maps of the first image and the second image, improving the feature point matching efficiency of the feature point matching model.
The feature point extraction layer and the feature description layer may be trained based on a deep learning model including, but not limited to, a convolutional neural network, a deep convolutional neural network (AlexNet), and the like.
The feature point extraction layer may be configured to extract features of the first image and the second image based on the first image and the second image, and obtain feature points of the two images.
For example, in the case where the feature point extraction layer is created based on a convolutional neural network, when an image is input to the feature point extraction layer, the feature point extraction layer may calculate a feature point probability for each position in the image, and then select the positions with higher feature point probabilities as the feature points extracted from the image.
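A minimal sketch of this selection step, assuming the extraction layer outputs an (H, W) probability map (the threshold and point budget are illustrative values, not values from the patent):

```python
import numpy as np

def extract_keypoints(prob_map: np.ndarray, threshold: float = 0.5,
                      max_points: int = 500) -> np.ndarray:
    """Select positions whose feature point probability is high enough."""
    ys, xs = np.where(prob_map > threshold)        # candidate positions
    scores = prob_map[ys, xs]
    keep = np.argsort(-scores)[:max_points]        # highest-probability positions
    return np.stack([xs[keep], ys[keep]], axis=1)  # (N, 2) array of (x, y)
```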
The feature description layer is used for determining feature descriptors of feature points of the first image and the second image, and generating a feature map corresponding to each image based on the feature descriptors of the feature points.
For example, the feature description layer may include a plurality of convolution layers. The plurality of convolution layers enable the feature description layer to learn deep representations of the feature points of the image and their feature descriptors.
The number of the convolution layers can be set according to practical application requirements, and the embodiment of the application is not limited to the number.
Each convolution layer may include, but is not limited to, a convolution kernel, an activation function, a batch normalization process, and the like.
The size of the convolution kernel can be set according to practical application requirements, for example, the size of the convolution kernel can be set to be 3×3.
The type of the activation function may also be set according to practical application requirements. Different activation functions give the feature description layer different nonlinear capabilities, and configuring a reasonable activation function in the convolution layer can improve the nonlinear capability of the feature description layer; for example, the activation function may be a rectified linear unit (Rectified Linear Unit, ReLU), but is not limited thereto.
The batch normalization process can accelerate the training of the convolutional neural network and improve the performance of the model.
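For illustration, one such convolution layer could be sketched as follows in PyTorch (the channel sizes are assumptions; the patent does not fix them):

```python
import torch.nn as nn

def conv_block(in_channels: int, out_channels: int) -> nn.Sequential:
    """One convolution layer: 3x3 kernel, batch normalization, ReLU activation."""
    return nn.Sequential(
        nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_channels),
        nn.ReLU(inplace=True),
    )

# A feature description layer might stack several such blocks, e.g.:
descriptor_head = nn.Sequential(conv_block(64, 128), conv_block(128, 256))
```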
It can be understood that the feature point extraction layer and the feature description layer may belong to the same deep learning model, that is, one deep learning model is adopted to perform feature point extraction and feature point description, so as to improve the generation efficiency of the feature map.
In other embodiments, the feature point extraction layer and the feature description layer may also belong to different deep learning models; for example, the feature point extraction layer uses deep learning model 1, and the feature description layer uses deep learning model 2. By performing feature point extraction and feature point description through two independent deep learning models, this embodiment can learn the correspondence between feature points and feature descriptors more accurately, so as to adapt to more complex feature map generation scenarios.
After the feature point extraction layer and the feature description layer obtain the feature images of the two images, the feature description layer can transmit the first feature image and the second feature image to the bidirectional feature matching layer.
And after the two-way feature matching layer acquires the feature images of the two images, the feature images of the two images are subjected to two-way feature matching to obtain two-way feature matching information.
The bidirectional feature matching layer can perform feature point matching by taking feature graphs of the two images as references respectively so as to determine feature point pairs, thereby obtaining bidirectional feature matching information.
Then, the electronic device can obtain feature point matching results of the two images based on the bidirectional feature matching information.
For example, the bidirectional feature matching layer is configured to search the second feature map for feature points that respectively match the feature points in the first feature map to obtain a first feature point pair set, and to search the first feature map for feature points that respectively match the feature points in the second feature map to obtain a second feature point pair set.
That is, the bi-directional feature matching information may include a first set of feature point pairs and a second set of feature point pairs.
The electronic device may calculate an intersection of the first set of feature point pairs and the second set of feature point pairs, with the intersection being a feature point matching result.
The embodiment of the present application takes the intersection as the feature point matching result, so that each mutually matched feature point pair in the feature point matching result is an element of both the first feature point pair set and the second feature point pair set. Mismatched feature point pairs in the first feature point pair set and the second feature point pair set can thus be effectively removed, ensuring the accuracy of feature point matching.
The bi-directional feature matching layer may employ a convolutional neural network, but is not limited thereto.
The bidirectional feature matching layer may include one or more fully connected layers to evaluate the similarity between feature points. Thus, when the feature map of one image is used as the reference and matching feature points are searched for in the feature map of the other image, a matching probability or matching score can be output for every two feature points, and the matching feature points are determined based on the matching probability or matching score. For example, to determine the feature point matched with a given feature point, the feature point with the largest matching probability or matching score relative to that feature point may be selected as its match, so as to obtain a feature point pair.
The feature point matching result is used to describe two feature points that are bi-directionally matched, and in some embodiments, the matched feature points may be presented in the form of pixel coordinates of the image.
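A minimal sketch of this bidirectional selection, assuming the layers above have already produced an (N1, N2) matrix of matching scores (representing matches as index pairs is an illustrative choice):

```python
import numpy as np

def bidirectional_match(scores: np.ndarray) -> list:
    """Keep only feature point pairs that are each other's best-scoring match.

    scores[i, j] is the matching score between feature point i of the first
    feature map and feature point j of the second feature map.
    """
    # Forward pass: first feature map as reference.
    forward = {(i, int(np.argmax(scores[i]))) for i in range(scores.shape[0])}
    # Backward pass: second feature map as reference.
    backward = {(int(np.argmax(scores[:, j])), j) for j in range(scores.shape[1])}
    return sorted(forward & backward)  # intersection = mutual matches
```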
Referring to fig. 4, fig. 4 is a flowchart illustrating steps of an embodiment of a feature point matching method according to the present application. It will be appreciated that the order of the steps in the flow chart of the present embodiment may be changed and some steps may be omitted according to different needs.
The feature point matching method may include the following steps.
In step 401, a first image and a second image to be subjected to feature point matching are acquired.
For example, two images to be subjected to feature point matching may be acquired from the camera, for example, two adjacent frames of images captured by the camera may be respectively used as the first image and the second image.
Step 402, extracting the feature maps of the first image and the second image to obtain the first feature map and the second feature map.
The feature map of each image is used to represent the feature points of that image and their feature descriptors.
In some embodiments, the electronic device may extract the feature maps of the first image and the second image based on a preset feature extraction algorithm, which may be SIFT, ORB, or SURF, but is not limited thereto.
In other embodiments, the feature point matching model further includes a feature point extraction layer and a feature description layer as shown in fig. 3. In this case, the electronic device may obtain the feature points of the first image and the second image acquired in step 401 based on the feature point extraction layer, determine the feature descriptors of the feature points of each image based on the feature description layer, and generate the feature map corresponding to each image based on the feature descriptors of the feature points.
Both the feature point extraction layer and the feature description layer may be created based on a deep learning model.
According to the embodiment of the present application, a deep learning model is used to extract the feature points, which can reduce the dependence of feature point extraction on image texture information. Therefore, even if image texture is lost due to occlusion, illumination, and the like, the feature map can still be extracted relatively accurately.
Step 403, performing bidirectional feature matching on the first feature map and the second feature map based on the bidirectional feature matching layer to obtain bidirectional feature matching information.
The bidirectional feature matching information comprises feature point pairs obtained by feature point matching with the first feature map and the second feature map as references respectively.
Further, in some embodiments, the bidirectional feature matching information includes a first feature point pair set and a second feature point pair set.
Step 403 may include searching, based on the bi-directional feature matching layer, for feature points in the second feature map that are respectively matched with feature points in the first feature map, to obtain a first set of feature point pairs.
For example, suppose the first feature map includes feature point A1 and the second feature map includes feature points B1, B2, and B3. The bidirectional feature matching layer may calculate the similarity between A1 and B1 (for example, the similarity between the feature descriptor of A1 and the feature descriptor of B1), the similarity between A1 and B2, and the similarity between A1 and B3; if the similarity between A1 and B1 is the largest of the three, the first feature point pair set includes the feature point pair (A1, B1).
Step 403 may further include searching the first feature map, based on the bidirectional feature matching layer, for feature points that respectively match the feature points in the second feature map, to obtain the second feature point pair set.
For example, suppose the first feature map includes feature points A1 and A2 and the second feature map includes feature point B1. The bidirectional feature matching layer may calculate the similarity between A1 and B1 and the similarity between A2 and B1; if the similarity between A2 and B1 is the larger of the two, the second feature point pair set includes the feature point pair (A2, B1).
Step 404, obtaining a feature point matching result of the first image and the second image based on the bidirectional feature matching information.
The feature point matching result is used for describing the feature point pairs of the bidirectional matching, and the matched feature points can be presented in the form of image coordinates.
In some embodiments, step 404 may include taking an intersection of the first set of feature point pairs and the second set of feature point pairs as the feature point matching result.
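As a toy illustration with hypothetical index pairs, where each pair is (index in the first feature map, index in the second feature map):

```python
# Hypothetical forward matches (first feature map as reference)
first_pairs = {(0, 0), (1, 2), (2, 1)}
# Hypothetical backward matches (second feature map as reference)
second_pairs = {(0, 0), (2, 1), (3, 2)}

# The intersection keeps only bidirectionally consistent pairs.
matching_result = first_pairs & second_pairs   # {(0, 0), (2, 1)}
```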
In some embodiments, after step 404, the electronic device may further eliminate mismatched feature points from the feature point matching result based on a preset mismatch elimination algorithm.

The mismatch elimination algorithm includes a geometric consistency check algorithm and/or a bidirectional matching check algorithm, but is not limited thereto.
A geometric consistency check algorithm may be used to exclude false matches that are inconsistent with the spatial geometric model; the geometric consistency check algorithm may be the random sample consensus (Random Sample Consensus, RANSAC) algorithm, but is not limited thereto.
The bi-directional match checking algorithm may also be referred to as a symmetry test.
Assume the two images are denoted as a first image and a second image, and a feature point pair is selected from the feature point matching result, where the feature point located in the first image is denoted as a first feature point and the feature point located in the second image is denoted as a second feature point.

The symmetry test then includes: judging whether the second feature point is the feature point in the second image that best matches the first feature point, for example, by calculating the similarity between the first feature point and each feature point in the second image and checking whether the most similar feature point is the second feature point; and likewise judging whether the first feature point is the feature point in the first image that best matches the second feature point. When the first feature point is the best match of the second feature point among the feature points of the first image, and the second feature point is the best match of the first feature point among the feature points of the second image, the symmetry test passes and the feature point pair can be retained in the feature point matching result; otherwise, the feature point pair needs to be removed from the feature point matching result.
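A minimal sketch of the symmetry test, assuming descriptor similarity is measured by a dot product (the patent does not fix the similarity measure):

```python
import numpy as np

def symmetry_test(desc1: np.ndarray, desc2: np.ndarray, i: int, j: int) -> bool:
    """Check that point i of image 1 and point j of image 2 are mutual best matches.

    desc1: (N1, D) descriptors of the first image;
    desc2: (N2, D) descriptors of the second image.
    """
    sims_1_to_2 = desc2 @ desc1[i]  # similarity of point i to every point in image 2
    sims_2_to_1 = desc1 @ desc2[j]  # similarity of point j to every point in image 1
    return int(np.argmax(sims_1_to_2)) == j and int(np.argmax(sims_2_to_1)) == i
```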
In some embodiments, the mismatch elimination algorithm may further employ a ratio test. The ratio test may include searching the second image for the two feature points most similar to the first feature point; when the similarity between two feature points is measured by the distance between them, the ratio of the closest distance to the next-closest distance is calculated to obtain a distance ratio. If the distance ratio is less than a preset threshold, the match is distinctive and the feature point pair is retained; otherwise, the feature point pair is rejected.
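A minimal sketch of the ratio test (the 0.75 threshold is a commonly used illustrative value, not one given in the text):

```python
import numpy as np

def ratio_test(distances: np.ndarray, threshold: float = 0.75) -> bool:
    """distances: distances from one feature point of the first image to every
    feature point of the second image (smaller means more similar; at least
    two candidates are assumed)."""
    d1, d2 = np.sort(distances)[:2]  # closest and next-closest distances
    return d1 / d2 < threshold       # retain if distinctive, reject otherwise
```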
By eliminating mismatched feature point pairs, the embodiment of the present application further improves the accuracy of feature point matching.
In some embodiments, the electronic device may also employ VIO techniques and feature point matching results to locate the mobile device.
According to the embodiment of the present application, feature point matching is performed by a pre-trained feature point matching model, which can improve feature point matching accuracy. The feature point matching model includes the bidirectional feature matching layer, which can obtain bidirectional feature matching information with each of the two feature maps as a reference and remove abnormal feature points in the two feature maps, facilitating the feature point matching of the subsequent feature matching layer and further improving the matching accuracy.

In addition, the embodiment of the present application uses an artificial intelligence model to extract the feature points, which can reduce the dependence of feature point extraction on image texture information and reduce the loss of feature points caused by lost image texture. Therefore, even if image texture is lost due to occlusion, illumination, and the like, the feature map can still be extracted relatively accurately, improving the success rate of positioning.
In addition, deploying the feature point matching model of the embodiment of the present application in the neural network processor or the graphics processor can improve the feature point matching efficiency and the operation precision of the feature point matching model, further improving the accuracy of feature point matching.
The embodiment of the present application also provides a training method of the feature point matching model. The training method of the feature point matching model can be applied to an electronic device.
It may be understood that the electronic device performing the training method of the feature point matching model and the electronic device performing the feature point matching method may be the same electronic device or may be different electronic devices, which is not limited in the embodiment of the present application.
The feature point matching model may include a bi-directional feature matching layer as shown in fig. 3.
For details of the feature point matching model, reference may be made to description of the feature point matching model in the embodiment of the feature point matching method, which is not described herein.
Referring to fig. 5, the training method of the feature point matching model according to the embodiment of the present application may include:
Step 501, a training data set is acquired.
The training data set includes a first sample image, a second sample image, and a feature point matching label of each sample image.
The feature point matching label may include the feature map of each sample image (that is, its feature points and their feature descriptors) and the matching relationship of the feature points in the two sample images.
The feature point matching labels may be annotated manually, or a preset feature extraction algorithm may be used to extract the feature maps corresponding to the two sample images for annotation; the preset feature extraction algorithm may be SIFT, ORB, or SURF, but is not limited thereto.
For example, taking the following two sample images as an example, the step of generating the feature point matching label may include:
First, two sample images are acquired.
Then, a preset feature extraction algorithm, such as SIFT, ORB, SURF, is adopted to extract feature graphs corresponding to the two sample images.
After the feature maps corresponding to the two sample images are extracted, feature points generated by image noise, image distortion, image blurring, and the like can be manually removed from the feature maps corresponding to the two sample images.
Then, based on the feature maps corresponding to the two sample images, matching algorithms such as Hamming distance matching and brute-force matching can be used to obtain the matching relationship of the feature points in the two sample images.
After the matching relationship of the feature points in the two sample images is obtained, the falsely matched feature points in the matching relationship can be removed based on the ratio test, the symmetry test, and the RANSAC algorithm, so as to obtain the feature point matching label.
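The label-generation pipeline above could be sketched with OpenCV as follows. ORB is chosen here because its binary descriptors suit Hamming distance; the file names, the RANSAC threshold, and the use of a homography as the geometric model are assumptions for illustration:

```python
import cv2
import numpy as np

# Hypothetical sample image paths.
img1 = cv2.imread("sample1.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("sample2.png", cv2.IMREAD_GRAYSCALE)

# Extract feature points and binary descriptors.
orb = cv2.ORB_create()
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Brute-force Hamming matching; crossCheck=True enforces the symmetry test.
bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = bf.match(des1, des2)

# Geometric consistency check with RANSAC (at least 4 matches are assumed).
src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
_, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
label_matches = [m for m, keep in zip(matches, mask.ravel()) if keep]
```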
The feature point matching samples may be stored in a sample set, and the feature point matching samples may be taken out from the sample set during each training round.
In some embodiments, referring to fig. 6, the acquiring at least two sample images may include:
First, a first original image 601 and a second original image 602 are acquired. For example, the original images can be images captured by an unmanned aerial vehicle from various viewing angles and in various shooting environments (for example, with different illumination and occlusion), so that the trained feature point matching model can accurately match the feature points of images from various viewing angles and shooting environments.
And then, respectively carrying out resolution scaling processing on the first original image and the second original image based on a preset feature pyramid model to obtain a first image set and a second image set.
The first image set includes a plurality of images of different resolutions corresponding to the first original image. For example, the first image set includes a first reduced image 603 and a first enlarged image 605. The first reduced image 603 is an image obtained by reducing the resolution of the first original image through the feature pyramid model, and the first enlarged image 605 is an image obtained by increasing the resolution of the first original image through the feature pyramid model.

The second image set includes a plurality of images of different resolutions corresponding to the second original image. For example, the second image set includes a second reduced image 604 and a second enlarged image 606. The second reduced image 604 is an image obtained by reducing the resolution of the second original image through the feature pyramid model, and the second enlarged image 606 is an image obtained by increasing the resolution of the second original image through the feature pyramid model.
After the first image set and the second image set are acquired, the electronic device can select one image from each of the first image set and the second image set and combine them pairwise to obtain a first sample image and a second sample image.
For example, the two sample images to be subjected to matching training are denoted as a sample image group. The sample image groups formed by combining two sample images based on the first image set and the second image set shown in fig. 6 may include (first original image 601, second original image 602), (first original image 601, second reduced image 604), (first original image 601, second enlarged image 606), (first reduced image 603, second original image 602), (first reduced image 603, second reduced image 604), (first reduced image 603, second enlarged image 606), (first enlarged image 605, second original image 602), (first enlarged image 605, second reduced image 604), and (first enlarged image 605, second enlarged image 606). The electronic device can use these sample image groups as feature matching samples participating in training, so that the feature point matching model can learn and match image features at different resolutions, improving the robustness of the feature point matching model when facing images of different resolutions.
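A minimal sketch of this pairing procedure (the scale factors and file names are illustrative assumptions; a real feature pyramid model may produce the scaled images differently):

```python
import cv2
from itertools import product

def resolution_pyramid(image, scales=(0.5, 1.0, 2.0)):
    """Scale one original image to several resolutions."""
    return [cv2.resize(image, None, fx=s, fy=s) for s in scales]

first_set = resolution_pyramid(cv2.imread("original1.png"))
second_set = resolution_pyramid(cv2.imread("original2.png"))

# Pairwise combination: every image of the first set with every image of the
# second set, giving 3 x 3 = 9 sample image groups as enumerated above.
sample_image_groups = list(product(first_set, second_set))
```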
Step 502, extracting the feature maps of the first sample image and the second sample image to obtain the first sample feature map and the second sample feature map.
In some embodiments, the feature point matching model further includes a feature point extraction layer and a feature description layer, and the feature map includes the feature points of the sample image and the feature descriptors of the feature points. Step 502 may include: extracting features of the first sample image and the second sample image based on the feature point extraction layer to obtain feature points of the first sample image and the second sample image; and determining the feature descriptors of the feature points based on the feature description layer, and generating the feature map corresponding to each sample image based on the feature descriptors of the feature points, to obtain the first sample feature map and the second sample feature map.
Step 503, performing bidirectional feature matching on the first sample feature map and the second sample feature map based on the bidirectional feature matching layer to obtain bidirectional feature matching information.
The bidirectional feature matching information includes feature point pairs obtained by feature point matching with the first sample feature map and the second sample feature map as references respectively.
Further, in some embodiments, the bidirectional feature matching information includes a first sample feature point pair set and a second sample feature point pair set. Step 503 may include searching the second sample feature map, based on the bidirectional feature matching layer, for feature points that respectively match the feature points in the first sample feature map to obtain the first sample feature point pair set, and searching the first sample feature map for feature points that respectively match the feature points in the second sample feature map to obtain the second sample feature point pair set.
Step 504, obtaining a feature point matching result of the first sample image and the second sample image based on the bidirectional feature matching information.
In some embodiments, step 504 may include taking an intersection of the first set of sample feature point pairs and the second set of sample feature point pairs as the feature point matching result.
Step 505, updating the network parameters of the feature point matching model based on the feature point matching result and the feature point matching label.
In some embodiments, based on the feature point matching result and the feature point matching label, the network parameters of the feature point matching model can be updated by a gradient descent algorithm, and the feature point matching model is trained iteratively until the loss function converges, so as to obtain the trained feature point matching model.
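A minimal PyTorch-style sketch of this update step; `matching_model`, `matching_loss`, and `train_loader` are assumed placeholders, and the optimizer and learning rate are illustrative choices rather than values from the patent:

```python
import torch

# Assumed: matching_model implements the feature point matching model,
# matching_loss compares a matching result with its label, and train_loader
# yields (first sample image, second sample image, feature point matching label).
optimizer = torch.optim.SGD(matching_model.parameters(), lr=1e-3)

for img1, img2, label in train_loader:
    result = matching_model(img1, img2)  # forward pass: matching result
    loss = matching_loss(result, label)  # compare with the matching label
    optimizer.zero_grad()
    loss.backward()                      # gradients via backpropagation
    optimizer.step()                     # gradient descent parameter update
```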
The above embodiment trains the layers of the feature point matching model together as a whole.
In other embodiments, the feature point extraction layer, the feature description layer, the bi-directional feature matching layer, and the feature matching layer may be trained separately.
For example, based on the feature map of each sample image in the feature point matching label and the feature maps output in step 502, a gradient descent algorithm may be used to update the network parameters of the feature point extraction layer and the feature description layer, so as to obtain the trained feature point extraction layer and feature description layer.

Then, based on the matching relationship in the feature point matching label and the feature point matching result of step 504, a gradient descent algorithm may be used to update the network parameters of the bidirectional feature matching layer and the feature matching layer, so as to obtain the trained bidirectional feature matching layer and feature matching layer.
After the trained feature point matching model is obtained, it can be deployed in the electronic device shown in fig. 1. If the expected feature point extraction precision is not achieved when the electronic device shown in fig. 1 extracts feature points, the feature point extraction model can be trained again, for example fine-tuned and optimized, until the feature point extraction model achieves the expected feature point extraction precision.
Fig. 7 is a schematic diagram of an embodiment of an electronic device according to the present application.
The electronic device 100 comprises a memory 20, a processor 30 and a computer program 40 stored in the memory 20 and executable on the processor 30. The steps of the embodiment of the feature point matching method described above, such as steps 401 to 404 shown in fig. 4, are implemented when the processor 30 executes the computer program 40.
Alternatively, when the processor 30 executes the computer program 40, the steps in the embodiment of the training method of the feature point matching model are implemented, such as steps 501 to 505 shown in fig. 5.
By way of example, the computer program 40 may be partitioned into one or more modules/units that are stored in the memory 20 and executed by the processor 30. The one or more modules/units may be a series of computer program instruction segments capable of performing particular functions, and the instruction segments describe the execution of the computer program 40 in the electronic device 100.
It will be appreciated by those skilled in the art that the schematic diagram is merely an example of the electronic device 100 and is not meant to be limiting of the electronic device 100, and may include more or fewer components than shown, or may combine certain components, or different components, e.g., the electronic device 100 may also include input-output devices, network access devices, buses, etc.
The processor 30 may be a central processing unit (Central Processing Unit, CPU), but may also be another general purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general purpose processor may be a microprocessor or a single-chip microcomputer, or the processor 30 may be any conventional processor or the like.
The memory 20 may be used to store the computer program 40 and/or the modules/units, and the processor 30 implements various functions of the electronic device 100 by running or executing the computer programs and/or modules/units stored in the memory 20 and invoking the data stored in the memory 20. The memory 20 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, an application program required for at least one function (such as a sound playing function or an image playing function), and the like, and the data storage area may store data created according to the use of the electronic device 100 (such as audio data), and the like. In addition, the memory 20 may include a high-speed random access memory, and may also include a non-volatile memory, such as a hard disk, a memory, a plug-in hard disk, a smart media card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, a flash card (Flash Card), at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
The modules/units integrated in the electronic device 100 may be stored in a computer readable storage medium if implemented in the form of software functional units and sold or used as a stand-alone product. Based on such understanding, the present application may implement all or part of the flow of the method of the above embodiments by instructing the related hardware through a computer program, where the computer program may be stored in a computer readable storage medium, and the computer program, when executed by a processor, may implement the steps of each of the method embodiments described above. The computer program includes computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the content contained in the computer readable medium may be appropriately added or subtracted according to the requirements of legislation and patent practice in a jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, the computer readable medium does not include electrical carrier signals and telecommunications signals.
In the several embodiments provided in the present application, it should be understood that the disclosed electronic device and method may be implemented in other manners. For example, the above-described embodiments of the electronic device are merely illustrative, and for example, the division of the units is merely a logical function division, and there may be other manners of division when actually implemented.
In addition, each functional unit in the embodiments of the present application may be integrated in the same processing unit, or each unit may exist alone physically, or two or more units may be integrated in the same unit. The integrated units can be realized in a form of hardware or a form of hardware and a form of software functional modules.
It will be evident to those skilled in the art that the application is not limited to the details of the foregoing illustrative embodiments, and that the present application may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. Accordingly, the embodiments are to be considered in all respects as illustrative and not restrictive. Furthermore, it is evident that the word "comprising" does not exclude other elements or steps, and that the singular does not exclude a plurality. Multiple units or electronic devices recited in the electronic device claims may also be implemented in software or hardware by means of one and the same unit or electronic device. The terms first, second, etc. are used to denote a name, but not any particular order.
Finally, it should be noted that the above-mentioned embodiments are merely for illustrating the technical solution of the present application and not for limiting the same, and that it should be understood by those skilled in the art that the technical solution of the present application may be modified or substituted without departing from the spirit and scope of the technical solution of the present application.