Disclosure of Invention
The invention aims to provide a SLAM method based on semantic bundle adjustment to overcome the defects of the prior art; the method evaluates the feature consistency between an input frame and a feature library at a low level of abstraction. More importantly, because semantic information is added, the target detection channel can be seamlessly integrated into the bundle-adjustment-type optimization backend of any BA-based SLAM system without external equipment.
The invention is realized in the following manner: a SLAM method based on semantic bundle adjustment comprises the following steps:
Step 1, determining a series of features of the detection targets, and establishing a model database for each detection target.
Step 2, as each new frame becomes available, extracting descriptive features of the new frame and matching them against the model database, and then creating a determination graph for each given detection target.
Step 3, verifying the determination graph of step 2, removing erroneous edges and keeping correct edges; the global weighted mean error obtained from the previous global optimization is used as the verification threshold in the next target detection process:
where, with assumed weights w_ij, the expression can be distributed over all of the edges;
Step 3.1, for the 2D feature matching scheme, comparing each edge from a frame vertex to a landmark vertex with a threshold, and removing the edge if the following formula is satisfied:
where h(o, n) represents the index of the landmark vertex related to the n-th feature point on the o-th object, and α is a given parameter ranging from 4 to 9; if the removal of frame-to-landmark edges leaves a landmark vertex attached only to the object, the edge from the object to that landmark must also be deleted;
Step 3.2, for 3D feature matching, comparing each edge from a frame to an object with a threshold, and removing the edge if the following formula is satisfied:
Step 4, after the correct semantic edges have been determined, evaluating according to the edge thresholds whether the determination graph is added to the global graph.
Step 5, checking whether a new frame appears; if no new frame appears, finishing the global graph optimization; if a new frame appears, returning to step 2 and re-executing steps 2 to 5.
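By way of illustration only, the overall flow of steps 1 to 5 can be summarized by the following Python sketch; every name in it (semantic_ba_slam and the caller-supplied callables build_db, match, verify, evaluate, optimize) is a hypothetical placeholder and not part of the claimed method.

```python
from typing import Any, Callable, Iterable

def semantic_ba_slam(
    targets: Iterable[Any],
    frames: Iterable[Any],
    build_db: Callable[[Any], Any],        # step 1: build a model database per target
    match: Callable[[Any, Any], Any],      # step 2: match a frame, build a determination graph
    verify: Callable[[Any, Any], None],    # step 3: prune erroneous semantic edges
    evaluate: Callable[[Any, Any, Any], None],  # step 4: decide whether to add to the global graph
    optimize: Callable[[Any], None],       # step 5: final global graph optimization
    global_graph: Any,
) -> Any:
    """Drive steps 1-5 of the method; all callables are supplied by the caller."""
    databases = {t: build_db(t) for t in targets}           # step 1
    for frame in frames:                                    # step 5: loop while new frames appear
        for target, db in databases.items():
            det_graph = match(frame, db)                    # step 2
            verify(det_graph, global_graph)                 # step 3
            evaluate(det_graph, target, global_graph)       # step 4
    optimize(global_graph)                                  # step 5: no new frame, finish optimization
    return global_graph
```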
The edge threshold evaluation method is as follows:
This process can place the determination graph in one of 3 different states, defined by comparing the final number of semantic edges N_se with 2 thresholds η_f and η_t (η_f < η_t):
If N_se < η_f, the target is regarded as a false detection; the determination graph is deleted and the detection target is removed from the global map;
If η_f ≤ N_se < η_t, the target detection is ambiguous; the determination graph is saved to await more visual cues, but the detection target is removed from the global graph;
If N_se ≥ η_t, the target is detected and added to the global map.
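Purely as an illustration of this three-state rule, a minimal Python sketch follows; the names DetectionState and classify_detection are assumptions, and the choice of η_f and η_t is left to the application.

```python
from enum import Enum

class DetectionState(Enum):
    FALSE_DETECTION = 0  # N_se <  eta_f: delete the determination graph, remove the target
    AMBIGUOUS = 1        # eta_f <= N_se < eta_t: keep the graph, remove the target from the global map
    DETECTED = 2         # N_se >= eta_t: add the target to the global map

def classify_detection(n_semantic_edges: int, eta_f: int, eta_t: int) -> DetectionState:
    """Compare the final number of semantic edges N_se with the two thresholds (eta_f < eta_t)."""
    if n_semantic_edges < eta_f:
        return DetectionState.FALSE_DETECTION
    if n_semantic_edges < eta_t:
        return DetectionState.AMBIGUOUS
    return DetectionState.DETECTED
```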
The invention has the beneficial effects that:
1. The method of the invention combines 6DOF (six degrees of freedom, i.e., translation along three axes and rotation about three axes) object poses and camera poses in a new semantic global optimization, which, owing to the above features, can work with either 2D or 3D sensors.
2. Because semantic information is added, the target detection channel can be seamlessly integrated into the bundle-adjustment-type optimization backend of any SLAM system based on bundle adjustment, without external equipment.
3. The method of the invention is simple, easy to implement and highly practical; the SLAM constraints can be used for robust target detection and can adapt to more complex environments.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
The invention is realized in the following manner: a SLAM method based on semantic bundle adjustment comprises the following steps:
Step 1, determining a series of features of the detection targets, and establishing a model database for each detection target: if a full 3D model is available, 3D keypoint detectors and descriptors can be used; otherwise, the model requires a stack of standardized pictures together with 2D keypoint detectors and descriptors that provide the required features; the feature descriptors are saved for future matching; then, for the feature positions, the 3D coordinates are saved in the former case, while the 2D image coordinates and their associated viewpoint poses are saved in the latter case.
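As an illustration of the 2D branch of step 1 (the 3D branch is analogous, storing 3D coordinates instead), the sketch below builds a per-target database from a stack of standardized pictures; the use of ORB via OpenCV, the ModelDatabase container and the view_poses argument are assumptions made for this sketch only.

```python
import cv2
import numpy as np
from dataclasses import dataclass, field

@dataclass
class ModelDatabase:
    """Per-target database: descriptors, 2D image coordinates and associated view poses."""
    descriptors: list = field(default_factory=list)   # one descriptor array per standardized view
    keypoints_2d: list = field(default_factory=list)  # (x, y) image coordinates per view
    view_poses: list = field(default_factory=list)    # pose of each standardized view

def build_2d_model_database(images, view_poses):
    """Build the 2D model database of step 1 from a stack of standardized pictures.

    ORB is used here only as one concrete example of a 2D keypoint detector/descriptor.
    """
    orb = cv2.ORB_create()
    db = ModelDatabase()
    for img, pose in zip(images, view_poses):
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) if img.ndim == 3 else img
        keypoints, descriptors = orb.detectAndCompute(gray, None)
        if descriptors is None:
            continue  # skip views in which no features were detected
        db.descriptors.append(descriptors)
        db.keypoints_2d.append(np.array([kp.pt for kp in keypoints], dtype=np.float32))
        db.view_poses.append(pose)
    return db
```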
Step 2, as each new frame becomes available, extracting descriptive features of the new frame and matching them against the model database, and then creating a determination graph for each given detection target; for both the model and the frame, 3D keypoints are extracted at 3 different scales by an intrinsic shape signature detector and then described by spin images; for each scale, an index containing the descriptors of all models at that scale is created, and a k-nearest-neighbour search based on Euclidean distance is performed at that scale; finally, a weighted average difference is calculated.
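The per-scale nearest-neighbour search and the weighted average difference of step 2 can be illustrated as follows; the descriptors are assumed to have been extracted already (e.g. spin images at one scale), and the brute-force search and the weighting scheme are assumptions of this sketch.

```python
import numpy as np

def knn_match(frame_desc: np.ndarray, model_desc: np.ndarray, k: int = 1):
    """Brute-force Euclidean k-nearest-neighbour search at one scale.

    frame_desc: (N, D) descriptors extracted from the new frame at this scale.
    model_desc: (M, D) descriptors of all models at this scale (the per-scale index).
    Returns the indices (N, k) and distances (N, k) of the k nearest model descriptors.
    """
    # Pairwise squared Euclidean distances, shape (N, M).
    d2 = (np.sum(frame_desc ** 2, axis=1, keepdims=True)
          - 2.0 * frame_desc @ model_desc.T
          + np.sum(model_desc ** 2, axis=1))
    idx = np.argsort(d2, axis=1)[:, :k]
    dist = np.sqrt(np.take_along_axis(np.maximum(d2, 0.0), idx, axis=1))
    return idx, dist

def weighted_average_difference(distances: np.ndarray, weights: np.ndarray) -> float:
    """Weighted mean of the matching distances; the weighting scheme is an assumption."""
    return float(np.sum(weights * distances) / np.sum(weights))
```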
Step 2.1, if the 2D features match, a new landmark with unknown 3D position and a series of edges are created in the cost function to include its projection error; for example, considering a landmark position corresponding to a position x4 and the n-th target feature, the related constraint conditions are as follows:
In this formula, one quantity represents the n-th 2D feature point learned from the m-th object, another indicates the 2D feature point matched in the i-th frame under the corresponding likelihood, and V_i is the current pose estimate of the i-th vertex, consisting of a rotation and a translation in R^3.
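Since the precise constraint formula of step 2.1 is given in the drawings, only a generic pinhole reprojection residual of a landmark under the current pose estimate of the i-th vertex is sketched here for orientation; the intrinsic matrix K and all variable names are assumptions of the sketch rather than elements of the claimed formula.

```python
import numpy as np

def reprojection_residual(landmark_xyz: np.ndarray,
                          R: np.ndarray, t: np.ndarray,
                          K: np.ndarray,
                          observed_uv: np.ndarray) -> np.ndarray:
    """Generic 2D reprojection residual of a 3D landmark under a pose (R, t).

    landmark_xyz: (3,) landmark position in the world frame.
    R, t:         (3, 3) rotation and (3,) translation of the camera vertex.
    K:            (3, 3) pinhole intrinsic matrix (an assumption of this sketch).
    observed_uv:  (2,) matched 2D feature point in the frame.
    """
    p_cam = R @ landmark_xyz + t          # landmark expressed in the camera frame
    p_img = K @ p_cam                     # homogeneous pixel coordinates
    projected_uv = p_img[:2] / p_img[2]   # perspective division
    return projected_uv - observed_uv     # residual to be minimized in the cost function
```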
Step 2.2, when the V_i described in step 2.1 relates to an object pose, the image plane is one of the standard views of the object acquired during training, so the reprojection is constrained by the rigid transformation between the known object reference frame and the view reference frame; in this case, the constraints of the formula in step 2.1 can be expressed as follows:
and
Step 2.3, when 3D features are available, the camera frames can be connected directly to the objects; therefore, let m_ij denote that feature i and feature j match, with known probabilities Pr(m_ik) = s_ik and Pr(m_jk) = s_jk; assuming that m_ik and m_jk are independent, the following formula is satisfied:
Then Pr(m_ij) = s_ik · s_jk, where the feature points involved now represent 3D features.
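Under the stated independence assumption, the chaining of match probabilities reduces to a product; a one-line sketch (function name assumed):

```python
def chained_match_probability(s_ik: float, s_jk: float) -> float:
    """Pr(m_ij) = s_ik * s_jk, assuming the match events m_ik and m_jk are independent."""
    return s_ik * s_jk
```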
Step 3, verifying the determination graph of step 2, removing erroneous edges and keeping correct edges; the global weighted mean error obtained from the previous global optimization is used as the verification threshold in the next target detection process:
where, with assumed weights w_ij, the expression can be distributed over all of the edges;
Step 3.1, for the 2D feature matching scheme, removing edges from a frame vertex to a landmark vertex that satisfy the following formula:
where h(o, n) represents the index of the landmark vertex related to the n-th feature point on the o-th object, and α is set to 7; if the removal of frame-to-landmark edges leaves a landmark vertex attached only to the object, the edge from the object to that landmark must also be deleted.
Step 3.2, for the 3D feature matching scheme, comparing each edge from a frame to an object with a threshold; the edge is erased if the following formula is satisfied:
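The pruning of steps 3.1 and 3.2 can be illustrated as follows. Because the exact threshold formulas appear only in the drawings, the comparison error > α · (global weighted mean error) used here, the single shared threshold for both schemes, and the Edge and DeterminationGraph containers are assumptions of this sketch.

```python
from dataclasses import dataclass, field

@dataclass
class Edge:
    kind: str      # "frame_landmark", "object_landmark" or "frame_object"
    frame: int     # frame vertex id (ignored for object_landmark edges)
    landmark: int  # landmark vertex id (use -1 for frame_object edges)
    obj: int       # object vertex id
    error: float   # current residual of this edge

@dataclass
class DeterminationGraph:
    edges: list = field(default_factory=list)

def prune_edges(graph: DeterminationGraph, global_mean_error: float, alpha: float = 7.0):
    """Steps 3.1 / 3.2: drop edges whose error exceeds the (assumed) threshold, then drop
    object-to-landmark edges whose landmark has lost all frame-to-landmark support."""
    threshold = alpha * global_mean_error  # assumed stand-in for the patent's threshold formula

    kept = [e for e in graph.edges
            if e.kind == "object_landmark" or e.error <= threshold]

    # Landmark vertices still supported by at least one frame-to-landmark edge.
    supported = {e.landmark for e in kept if e.kind == "frame_landmark"}

    # Second rule of step 3.1: a landmark attached only to the object loses its
    # object-to-landmark edge as well.
    graph.edges = [e for e in kept
                   if e.kind != "object_landmark" or e.landmark in supported]
    return graph
```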
Step 4, after the correct semantic edges have been determined, evaluating according to the edge thresholds whether the determination graph is added to the global graph: first, the determination graph is placed in one of 3 different states, defined by comparing the final number of semantic edges N_se with 2 thresholds η_f and η_t (η_f < η_t).
If N_se < η_f, the target is regarded as a false detection; the determination graph is deleted and the detection target is removed from the global map;
If η_f ≤ N_se < η_t, the target detection is ambiguous; the determination graph is saved to await more visual cues, but the detection target is removed from the global graph;
If N_se ≥ η_t, the target is detected and added to the global map.
The 2 thresholds η_f and η_t are not critical, because at every frame the validation of new hypotheses is constrained by the matching of previously extracted features, and the final evaluation keeps only the best boundary; thus a higher η_f may only cause highly occluded objects that are matched in just a few views to be skipped, while the difference η_t − η_f concerns the robustness of this verification procedure: a higher value means that consistency across more frames is required before a detection is accepted, but fewer errors are passed to the global map, and those that are passed may be deleted in the following frames.
The global map in step 4 is a semantic global map comprising: first, the pose vertices of all cameras, together with the frame-to-frame constraints from the SLAM engine; second, the pose vertices of all objects that have successfully passed the verification process; and third, all frame-to-landmark and object-to-landmark constraints (in the case of 2D feature matching), or frame-to-object and virtual frame-to-frame constraints (in the case of 3D feature matching), taken from the verified determination graphs of the objects that have not been deleted.
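For illustration, the three groups of elements listed above can be gathered in a simple container; the class and field names below are assumptions and carry no limiting meaning.

```python
from dataclasses import dataclass, field

@dataclass
class SemanticGlobalMap:
    """Illustrative container for the contents of the semantic global map."""
    camera_pose_vertices: dict = field(default_factory=dict)        # frame id -> camera pose
    frame_to_frame_constraints: list = field(default_factory=list)  # from the SLAM engine
    object_pose_vertices: dict = field(default_factory=dict)        # verified objects only
    # 2D feature matching: frame-to-landmark and object-to-landmark constraints.
    frame_landmark_constraints: list = field(default_factory=list)
    object_landmark_constraints: list = field(default_factory=list)
    # 3D feature matching: frame-to-object and virtual frame-to-frame constraints.
    frame_object_constraints: list = field(default_factory=list)
    virtual_frame_constraints: list = field(default_factory=list)
```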
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention should be included in the scope of the present invention.