Disclosure of Invention
The invention aims to provide a SLAM method based on semantic bundle adjustment to overcome the defects of the prior art; the method evaluates the feature consistency between an input frame and a feature library at a low level of abstraction. More importantly, because semantic information is added, the target detection channel can be seamlessly integrated into the bundle-adjustment-type optimization backend of any BA-based SLAM system without external equipment.
The invention is realized in the following manner: a SLAM method based on semantic bundle adjustment comprises the following steps:
Step 1, determining a series of features of the detection targets, and establishing a model database for each detection target.
Step 2, as each new frame becomes available, extracting descriptive features of the new frame and matching them against the model database, and then creating a determination graph for each given detection target.
Step 3, verifying the determination graph of step 2, removing erroneous edges and keeping correct edges; the global weighted mean error obtained from the previous global optimization is used as the verification threshold in the next target detection process:
where, with assumed weights w_ij, the expression can be distributed over all of the edges;
Step 3.1, for the 2D feature matching scheme, comparing each edge from a frame vertex to a landmark vertex with a threshold, and removing the edge if the following formula is satisfied:
where h(o, n) represents the index of the landmark vertex related to the n-th feature point on the o-th object, and α is a given parameter ranging from 4 to 9; if the removal of frame-to-landmark edges leaves a landmark vertex attached only to the object, the edge from the object to that landmark must also be deleted;
Step 3.2, for 3D feature matching, comparing each edge from a frame to an object with a threshold, and removing the edge if the following formula is satisfied:
Step 4, after the correct semantic edges have been determined, evaluating according to the edge thresholds whether the determination graph is added to the global graph.
Step 5, checking whether a new frame appears; if no new frame appears, finishing the global graph optimization; if a new frame appears, returning to step 2 and re-executing steps 2 to 5.
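By way of illustration only, the overall flow of steps 1 to 5 can be summarized by the following Python sketch; every name in it (semantic_ba_slam and the caller-supplied callables build_db, match, verify, evaluate, optimize) is a hypothetical placeholder and not part of the claimed method.

```python
from typing import Any, Callable, Iterable

def semantic_ba_slam(
    targets: Iterable[Any],
    frames: Iterable[Any],
    build_db: Callable[[Any], Any],        # step 1: build a model database per target
    match: Callable[[Any, Any], Any],      # step 2: match a frame, build a determination graph
    verify: Callable[[Any, Any], None],    # step 3: prune erroneous semantic edges
    evaluate: Callable[[Any, Any, Any], None],  # step 4: decide whether to add to the global graph
    optimize: Callable[[Any], None],       # step 5: final global graph optimization
    global_graph: Any,
) -> Any:
    """Drive steps 1-5 of the method; all callables are supplied by the caller."""
    databases = {t: build_db(t) for t in targets}           # step 1
    for frame in frames:                                    # step 5: loop while new frames appear
        for target, db in databases.items():
            det_graph = match(frame, db)                    # step 2
            verify(det_graph, global_graph)                 # step 3
            evaluate(det_graph, target, global_graph)       # step 4
    optimize(global_graph)                                  # step 5: no new frame, finish optimization
    return global_graph
```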
The edge threshold evaluation method is as follows:
This process can place the determination graph in one of 3 different states, defined by comparing the final number of semantic edges N_se with 2 thresholds η_f and η_t (η_f < η_t):
If N_se < η_f, the target is regarded as a false detection; the determination graph is deleted and the detection target is removed from the global map;
If η_f ≤ N_se < η_t, the target detection is ambiguous; the determination graph is saved to await more visual cues, but the detection target is removed from the global graph;
If N_se ≥ η_t, the target is detected and added to the global map.
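Purely as an illustration of this three-state rule, a minimal Python sketch follows; the names DetectionState and classify_detection are assumptions, and the choice of η_f and η_t is left to the application.

```python
from enum import Enum

class DetectionState(Enum):
    FALSE_DETECTION = 0  # N_se <  eta_f: delete the determination graph, remove the target
    AMBIGUOUS = 1        # eta_f <= N_se < eta_t: keep the graph, remove the target from the global map
    DETECTED = 2         # N_se >= eta_t: add the target to the global map

def classify_detection(n_semantic_edges: int, eta_f: int, eta_t: int) -> DetectionState:
    """Compare the final number of semantic edges N_se with the two thresholds (eta_f < eta_t)."""
    if n_semantic_edges < eta_f:
        return DetectionState.FALSE_DETECTION
    if n_semantic_edges < eta_t:
        return DetectionState.AMBIGUOUS
    return DetectionState.DETECTED
```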
The invention has the beneficial effects that:
1. The method of the invention combines 6DOF (six degrees of freedom, i.e., translation along three axes and rotation about three axes) object poses and camera poses in a new semantic global optimization, which, owing to the above features, can work with either 2D or 3D sensors.
2. Because semantic information is added, the target detection channel can be seamlessly integrated into the bundle-adjustment-type optimization backend of any SLAM system based on bundle adjustment, without external equipment.
3. The method of the invention is simple, easy to implement and highly practical; the SLAM constraints can be used for robust target detection and can adapt to more complex environments.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
The invention is realized in the following manner: a SLAM method based on semantic bundle adjustment comprises the following steps:
Step 1, determining a series of features of the detection targets, and establishing a model database for each detection target: if a full 3D model is available, 3D keypoint detectors and descriptors can be used; otherwise, the model requires a stack of standardized pictures together with 2D keypoint detectors and descriptors that provide the required features; the feature descriptors are saved for future matching; then, for the feature positions, the 3D coordinates are saved in the former case, while the 2D image coordinates and their associated viewpoint poses are saved in the latter case.
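As an illustration of the 2D branch of step 1 (the 3D branch is analogous, storing 3D coordinates instead), the sketch below builds a per-target database from a stack of standardized pictures; the use of ORB via OpenCV, the ModelDatabase container and the view_poses argument are assumptions made for this sketch only.

```python
import cv2
import numpy as np
from dataclasses import dataclass, field

@dataclass
class ModelDatabase:
    """Per-target database: descriptors, 2D image coordinates and associated view poses."""
    descriptors: list = field(default_factory=list)   # one descriptor array per standardized view
    keypoints_2d: list = field(default_factory=list)  # (x, y) image coordinates per view
    view_poses: list = field(default_factory=list)    # pose of each standardized view

def build_2d_model_database(images, view_poses):
    """Build the 2D model database of step 1 from a stack of standardized pictures.

    ORB is used here only as one concrete example of a 2D keypoint detector/descriptor.
    """
    orb = cv2.ORB_create()
    db = ModelDatabase()
    for img, pose in zip(images, view_poses):
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) if img.ndim == 3 else img
        keypoints, descriptors = orb.detectAndCompute(gray, None)
        if descriptors is None:
            continue  # skip views in which no features were detected
        db.descriptors.append(descriptors)
        db.keypoints_2d.append(np.array([kp.pt for kp in keypoints], dtype=np.float32))
        db.view_poses.append(pose)
    return db
```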
Step 2, as each new frame becomes available, extracting descriptive features of the new frame and matching them against the model database, and then creating a determination graph for each given detection target; for both the model and the frame, 3D keypoints are extracted at 3 different scales by an intrinsic shape signature detector and then described by spin images; for each scale, an index containing the descriptors of all models at that scale is created, and a k-nearest-neighbour search based on Euclidean distance is performed at that scale; finally, a weighted average difference is calculated.
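The per-scale nearest-neighbour search and the weighted average difference of step 2 can be illustrated as follows; the descriptors are assumed to have been extracted already (e.g. spin images at one scale), and the brute-force search and the weighting scheme are assumptions of this sketch.

```python
import numpy as np

def knn_match(frame_desc: np.ndarray, model_desc: np.ndarray, k: int = 1):
    """Brute-force Euclidean k-nearest-neighbour search at one scale.

    frame_desc: (N, D) descriptors extracted from the new frame at this scale.
    model_desc: (M, D) descriptors of all models at this scale (the per-scale index).
    Returns the indices (N, k) and distances (N, k) of the k nearest model descriptors.
    """
    # Pairwise squared Euclidean distances, shape (N, M).
    d2 = (np.sum(frame_desc ** 2, axis=1, keepdims=True)
          - 2.0 * frame_desc @ model_desc.T
          + np.sum(model_desc ** 2, axis=1))
    idx = np.argsort(d2, axis=1)[:, :k]
    dist = np.sqrt(np.take_along_axis(np.maximum(d2, 0.0), idx, axis=1))
    return idx, dist

def weighted_average_difference(distances: np.ndarray, weights: np.ndarray) -> float:
    """Weighted mean of the matching distances; the weighting scheme is an assumption."""
    return float(np.sum(weights * distances) / np.sum(weights))
```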
Step 2.1, if the 2D features match, a new landmark with unknown 3D position and a series of edges are created in the cost function to include its projection error; for example, considering a landmark position corresponding to a position x4 and the n-th target feature, the related constraint conditions are as follows:
In this formula, one quantity represents the n-th 2D feature point learned from the m-th object, another indicates the 2D feature point matched in the i-th frame under the corresponding likelihood, and V_i is the current pose estimate of the i-th vertex, consisting of a rotation and a translation in R^3.
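Since the precise constraint formula of step 2.1 is given in the drawings, only a generic pinhole reprojection residual of a landmark under the current pose estimate of the i-th vertex is sketched here for orientation; the intrinsic matrix K and all variable names are assumptions of the sketch rather than elements of the claimed formula.

```python
import numpy as np

def reprojection_residual(landmark_xyz: np.ndarray,
                          R: np.ndarray, t: np.ndarray,
                          K: np.ndarray,
                          observed_uv: np.ndarray) -> np.ndarray:
    """Generic 2D reprojection residual of a 3D landmark under a pose (R, t).

    landmark_xyz: (3,) landmark position in the world frame.
    R, t:         (3, 3) rotation and (3,) translation of the camera vertex.
    K:            (3, 3) pinhole intrinsic matrix (an assumption of this sketch).
    observed_uv:  (2,) matched 2D feature point in the frame.
    """
    p_cam = R @ landmark_xyz + t          # landmark expressed in the camera frame
    p_img = K @ p_cam                     # homogeneous pixel coordinates
    projected_uv = p_img[:2] / p_img[2]   # perspective division
    return projected_uv - observed_uv     # residual to be minimized in the cost function
```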
Step 2.2, when the V_i described in step 2.1 relates to an object pose, the image plane is one of the standard views of the object acquired during training, so the reprojection is constrained by the rigid transformation between the known object reference frame and the view reference frame; in this case, the constraints of the formula in step 2.1 can be expressed as follows:
and
Step 2.3, when 3D features are available, the camera frames can be connected directly to the objects; therefore, let m_ij denote that feature i and feature j match, with known probabilities Pr(m_ik) = s_ik and Pr(m_jk) = s_jk; assuming that m_ik and m_jk are independent, the following formula is satisfied:
Then Pr(m_ij) = s_ik · s_jk, where the feature points involved now represent 3D features.
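Under the stated independence assumption, the chaining of match probabilities reduces to a product; a one-line sketch (function name assumed):

```python
def chained_match_probability(s_ik: float, s_jk: float) -> float:
    """Pr(m_ij) = s_ik * s_jk, assuming the match events m_ik and m_jk are independent."""
    return s_ik * s_jk
```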
Step 3, verifying the determination graph of step 2, removing erroneous edges and keeping correct edges; the global weighted mean error obtained from the previous global optimization is used as the verification threshold in the next target detection process:
where, with assumed weights w_ij, the expression can be distributed over all of the edges;
Step 3.1, for the 2D feature matching scheme, removing edges from a frame vertex to a landmark vertex that satisfy the following formula:
where h(o, n) represents the index of the landmark vertex related to the n-th feature point on the o-th object, and α is set to 7; if the removal of frame-to-landmark edges leaves a landmark vertex attached only to the object, the edge from the object to that landmark must also be deleted.
Step 3.2, for the 3D feature matching scheme, comparing each edge from a frame to an object with a threshold; the edge is erased if the following formula is satisfied:
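The pruning of steps 3.1 and 3.2 can be illustrated as follows. Because the exact threshold formulas appear only in the drawings, the comparison error > α · (global weighted mean error) used here, the single shared threshold for both schemes, and the Edge and DeterminationGraph containers are assumptions of this sketch.

```python
from dataclasses import dataclass, field

@dataclass
class Edge:
    kind: str      # "frame_landmark", "object_landmark" or "frame_object"
    frame: int     # frame vertex id (ignored for object_landmark edges)
    landmark: int  # landmark vertex id (use -1 for frame_object edges)
    obj: int       # object vertex id
    error: float   # current residual of this edge

@dataclass
class DeterminationGraph:
    edges: list = field(default_factory=list)

def prune_edges(graph: DeterminationGraph, global_mean_error: float, alpha: float = 7.0):
    """Steps 3.1 / 3.2: drop edges whose error exceeds the (assumed) threshold, then drop
    object-to-landmark edges whose landmark has lost all frame-to-landmark support."""
    threshold = alpha * global_mean_error  # assumed stand-in for the patent's threshold formula

    kept = [e for e in graph.edges
            if e.kind == "object_landmark" or e.error <= threshold]

    # Landmark vertices still supported by at least one frame-to-landmark edge.
    supported = {e.landmark for e in kept if e.kind == "frame_landmark"}

    # Second rule of step 3.1: a landmark attached only to the object loses its
    # object-to-landmark edge as well.
    graph.edges = [e for e in kept
                   if e.kind != "object_landmark" or e.landmark in supported]
    return graph
```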
Step 4, after the correct semantic edges have been determined, evaluating according to the edge thresholds whether the determination graph is added to the global graph: first, the determination graph is placed in one of 3 different states, defined by comparing the final number of semantic edges N_se with 2 thresholds η_f and η_t (η_f < η_t).
If N_se < η_f, the target is regarded as a false detection; the determination graph is deleted and the detection target is removed from the global map;
If η_f ≤ N_se < η_t, the target detection is ambiguous; the determination graph is saved to await more visual cues, but the detection target is removed from the global graph;
If N_se ≥ η_t, the target is detected and added to the global map.
The 2 thresholds η_f and η_t are not critical, because at every frame the validation of new hypotheses is constrained by the matching of previously extracted features, and the final evaluation keeps only the best boundary; thus a higher η_f may only cause highly occluded objects that are matched in just a few views to be skipped, while the difference η_t − η_f concerns the robustness of this verification procedure: a higher value means that consistency across more frames is required before a detection is accepted, but fewer errors are passed to the global map, and those that are passed may be deleted in the following frames.
The global map in step 4 is a semantic global map comprising: first, the pose vertices of all cameras, together with the frame-to-frame constraints from the SLAM engine; second, the pose vertices of all objects that have successfully passed the verification process; and third, all frame-to-landmark and object-to-landmark constraints (in the case of 2D feature matching), or frame-to-object and virtual frame-to-frame constraints (in the case of 3D feature matching), taken from the verified determination graphs of the objects that have not been deleted.
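For illustration, the three groups of elements listed above can be gathered in a simple container; the class and field names below are assumptions and carry no limiting meaning.

```python
from dataclasses import dataclass, field

@dataclass
class SemanticGlobalMap:
    """Illustrative container for the contents of the semantic global map."""
    camera_pose_vertices: dict = field(default_factory=dict)        # frame id -> camera pose
    frame_to_frame_constraints: list = field(default_factory=list)  # from the SLAM engine
    object_pose_vertices: dict = field(default_factory=dict)        # verified objects only
    # 2D feature matching: frame-to-landmark and object-to-landmark constraints.
    frame_landmark_constraints: list = field(default_factory=list)
    object_landmark_constraints: list = field(default_factory=list)
    # 3D feature matching: frame-to-object and virtual frame-to-frame constraints.
    frame_object_constraints: list = field(default_factory=list)
    virtual_frame_constraints: list = field(default_factory=list)
```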
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention should be included in the scope of the present invention.