CN110490900A - Binocular visual positioning method and system under dynamic environment - Google Patents

Binocular visual positioning method and system under dynamic environment

Info

Publication number
CN110490900A
Authority
CN
China
Prior art keywords
pose
feature point
map
feature points
frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910634021.1A
Other languages
Chinese (zh)
Other versions
CN110490900B (en)
Inventor
秦家虎
余雷
陈宇航
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Science and Technology of China USTC
Original Assignee
University of Science and Technology of China USTC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Science and Technology of China USTC
Priority to CN201910634021.1A
Publication of CN110490900A
Application granted
Publication of CN110490900B
Legal status: Active

Abstract


A binocular visual positioning method and system for dynamic environments, applied in the field of robot positioning technology. Image feature points are extracted, and feature points on moving objects are distinguished from feature points on non-moving objects by clustering the feature points. This approach effectively removes feature points lying on objects that are actually moving while avoiding the removal of feature points on stationary objects that merely have dynamic attributes, so more usable feature points are retained. The inter-frame pose transformation matrix obtained from an IMU pre-integration algorithm is used to select, among the poses estimated from different feature-point sets, the one closest to the IMU data, which effectively rejects outliers such as feature points belonging to moving objects. A pose and depth estimation method that fuses monocular motion vision with binocular stereo vision further improves estimation accuracy.

Description

Binocular Visual Positioning Method and System in a Dynamic Environment

Technical Field

The present invention relates to the technical field of robot positioning, and in particular to a binocular visual positioning method and system for use in a dynamic environment.

Background Art

Traditional GPS positioning relies heavily on satellite signals; its accuracy degrades sharply in environments with dense buildings and severe occlusion, and it is essentially unusable indoors. Inertial navigation based on an IMU (gyroscope and accelerometer) does not depend on external information: the current pose of the carrier is obtained by integrating the inertial measurements. However, the inertial positioning error grows with time, and long-term accuracy is poor because of accumulated error. Vision-based positioning determines the robot's pose by analyzing image sequences. It works in occluded environments such as indoor scenes, and the rich visual information also enables loop-closure detection, which eliminates accumulated error and achieves long-term high-precision positioning. In robotics, vision-based positioning mainly takes the form of vision-based simultaneous localization and mapping (SLAM), referred to as visual SLAM; visual SLAM that does not build a map is also called visual odometry, and both are collectively referred to here as visual SLAM. Depending on the sensor, visual SLAM is divided into monocular SLAM, binocular (stereo) SLAM, and RGB-D SLAM. Monocular SLAM suffers from scale ambiguity, while RGB-D SLAM is costly and less accurate; binocular SLAM avoids the monocular scale-ambiguity problem at a relatively low cost. Most current SLAM positioning techniques assume a static environment, whereas real environments are mostly complex dynamic scenes, in which the accuracy of SLAM algorithms built on the static assumption drops substantially. In recent years, semantic dynamic-scene SLAM techniques (semantic SLAM) have been proposed: semantic SLAM detects object categories with dynamic attributes in the scene through deep learning or other computer-vision methods and removes objects of those categories before using the image information for pose estimation, thereby reducing the influence of dynamic objects on pose estimation.

However, this approach has the following drawbacks. (1) Image detection itself is computationally expensive, while positioning has strict real-time requirements; adding deep-learning frameworks to a SLAM system severely degrades real-time performance. (2) In a SLAM system, and especially in visual odometry that does not build an environment map, what degrades pose estimation is not objects that merely have dynamic attributes but objects that are actually moving. For example, a car parked in a parking lot is an object with dynamic attributes, but it may remain parked for a long time; it does not harm the pose estimate of the current image and instead provides additional feature information. In feature-poor scenes in particular, such temporarily stationary dynamic objects provide more features and actually increase pose-estimation accuracy.

Summary of the Invention

The main purpose of the present invention is to provide a binocular visual positioning method and system for use in a dynamic environment.

To achieve the above purpose, a first aspect of the embodiments of the present invention provides a binocular visual positioning method in a dynamic environment, comprising:

obtaining a left-eye image and a right-eye image under binocular vision of a camera, and extracting the feature points in the current frame to which the left-eye and right-eye images belong as well as the feature points corresponding to the map points in the previous frame, a map point being a feature point that has been added to the map;

obtaining, from the feature points, the left-eye and right-eye motion-vision matched feature-point pairs and the stereo-vision matched feature-point pairs;

selecting the matched feature-point pairs shared by the stereo-vision matched pairs and the motion-vision matched pairs;

performing stereo depth estimation on each feature-point pair in the set of shared matched pairs to obtain stereo depth estimates;

clustering the left-eye and right-eye motion-vision matched feature-point pairs with the K-means++ algorithm according to the optical-flow velocity and rotation angle between the matched pairs, and dividing the motion-vision matched pairs into multiple categories according to the clustering result;

computing, with an IMU pre-integration algorithm, the IMU pose estimation result for the left and right eyes over the time period between the current frame and the previous frame, the pose estimation result including the inter-frame pose between the current frame and the previous frame;

performing EPnP pose estimation and map-point estimation separately on the motion-vision matched feature-point pairs of each category to obtain the EPnP pose estimation results and map-point estimation results;

selecting the EPnP pose estimation result closest to the IMU pose estimation result as the best EPnP pose and map-point estimate, and recording the valid matched feature-point pairs corresponding to that best estimate;

selecting, from the shared matched feature-point pairs, the feature-point pairs corresponding to the valid matched pairs, and fusing, by Kalman filtering, the stereo depth estimates of those pairs with the depths estimated by EPnP, to obtain the estimated pose and estimated map points.
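The patent does not give the equations of the depth fusion; the sketch below assumes a one-dimensional Kalman (inverse-variance) update per feature point, with the stereo and EPnP measurement variances as hypothetical tuning parameters.

```python
import numpy as np

def fuse_depths(z_stereo, var_stereo, z_epnp, var_epnp):
    """Per-point Kalman-style fusion of the stereo depth and the EPnP depth.

    The stereo depth is treated as the prior and the EPnP depth as the
    measurement; the result is the inverse-variance weighted mean.
    """
    k = var_stereo / (var_stereo + var_epnp)        # Kalman gain
    z_fused = z_stereo + k * (z_epnp - z_stereo)    # fused depth
    var_fused = (1.0 - k) * var_stereo              # fused variance
    return z_fused, var_fused

# Example: fuse the depths of three valid matched pairs (variances assumed).
z_stereo = np.array([4.02, 7.55, 12.10])   # depths from stereo triangulation (m)
z_epnp   = np.array([3.95, 7.60, 11.80])   # depths implied by the EPnP pose (m)
z_fused, _ = fuse_depths(z_stereo, 0.05 ** 2, z_epnp, 0.10 ** 2)
print(z_fused)
```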

Further, the IMU pre-integration algorithm is as follows:

where the pre-integrated terms denote, respectively, the displacement increment, the velocity increment, and the quaternion of the rotation change over the period from frame k to frame k+1, expressed in the camera coordinate system of frame k; a_t denotes the accelerometer measurement and w_t the angular-velocity measurement; b_at and b_wt denote the accelerometer and gyroscope biases; n_a and n_w denote the corresponding noise terms; the rotation transformation matrix of time t relative to the camera coordinate system of frame k is obtained from the corresponding formula; Ω is the matrix involved in the quaternion computation; Δt_k denotes the time interval from t_k to t_{k+1}; and g_w denotes the gravity vector in the world coordinate system.
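The pre-integration formulas themselves are rendered as images in the source and do not survive here; the block below restates the standard pre-integration terms that are consistent with the variable definitions above (position increment α, velocity increment β, rotation quaternion γ, all expressed in the camera frame of image k). The symbol names are assumptions, not the patent's own notation.

```latex
\alpha^{c_k}_{k+1} = \iint_{t \in [t_k,\,t_{k+1}]} R^{c_k}_{t}\,\bigl(a_t - b_{a_t} - n_a\bigr)\,\mathrm{d}t^2 \\
\beta^{c_k}_{k+1}  = \int_{t \in [t_k,\,t_{k+1}]}  R^{c_k}_{t}\,\bigl(a_t - b_{a_t} - n_a\bigr)\,\mathrm{d}t \\
\gamma^{c_k}_{k+1} = \int_{t \in [t_k,\,t_{k+1}]} \tfrac{1}{2}\,\Omega\bigl(w_t - b_{w_t} - n_w\bigr)\,\gamma^{c_k}_{t}\,\mathrm{d}t
```

The gravity vector g_w and the interval Δt_k enter when these frame-local increments are propagated to a world-frame position and velocity change, in the usual pre-integration fashion.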

Further, after the motion-vision matched feature-point pairs are divided into multiple categories according to the clustering result, the method includes:

performing histogram statistics on the number of feature points in each category;

deleting any category whose number of feature points is smaller than a first preset number and differs from the numbers of feature points of the other categories by more than a second preset number, together with the feature points of that category.
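A minimal sketch of this pruning step, assuming the clustering has already assigned an integer label to every matched pair; min_count and min_gap stand in for the first and second preset numbers, whose values the patent does not specify.

```python
import numpy as np

def prune_small_clusters(labels, min_count=15, min_gap=30):
    """Drop clusters whose point count is both small and far below the others.

    labels    : integer cluster label of every matched feature-point pair
    min_count : first preset number (absolute size threshold, assumed)
    min_gap   : second preset number (gap to the other clusters, assumed)
    Returns a boolean mask over the pairs that are kept.
    """
    ids, counts = np.unique(labels, return_counts=True)   # histogram of cluster sizes
    keep = set()
    for cid, cnt in zip(ids, counts):
        others = counts[ids != cid]
        gap = (others.max() - cnt) if others.size else 0
        if cnt < min_count and gap > min_gap:
            continue                                       # discard this cluster
        keep.add(cid)
    return np.isin(labels, list(keep))

labels = np.array([0] * 120 + [1] * 95 + [2] * 4)   # cluster 2 is tiny and is removed
print(prune_small_clusters(labels).sum(), "pairs kept")
```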

Further, performing EPnP pose estimation and map-point estimation separately on the motion-vision matched feature-point pairs of each category to obtain the EPnP pose estimation results and map-point estimation results includes:

performing EPnP pose estimation and map-point estimation on the feature points of each category, and removing outliers from the pose estimation result of each category with the RANSAC (random sample consensus) algorithm, to obtain the EPnP pose estimation result and map-point estimation result of each category.
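OpenCV exposes EPnP together with RANSAC outlier rejection in a single call, so the per-category estimate could look like the sketch below. The intrinsic matrix K, the correspondences, and the RANSAC parameters are placeholders, not values from the patent.

```python
import cv2
import numpy as np

def epnp_ransac_pose(map_points_3d, image_points_2d, K, dist=None):
    """EPnP pose estimation with RANSAC outlier removal for one cluster.

    map_points_3d   : (N, 3) map points recovered from the previous frame
    image_points_2d : (N, 2) matched pixel coordinates in the current frame
    Returns (R, t, inlier_indices), or None if the estimate fails.
    """
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        map_points_3d.astype(np.float64),
        image_points_2d.astype(np.float64),
        K, dist,
        iterationsCount=100,
        reprojectionError=2.0,          # RANSAC inlier threshold in pixels (assumed)
        flags=cv2.SOLVEPNP_EPNP,
    )
    if not ok or inliers is None:
        return None
    R, _ = cv2.Rodrigues(rvec)          # rotation vector -> rotation matrix
    return R, tvec, inliers.ravel()
```

The pose obtained for each cluster would then be compared against the IMU pre-integrated inter-frame pose, as described in the following steps.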

Further, after obtaining the estimated pose and estimated map points, the method includes:

optimizing the estimated pose and estimated map points with a graph-based optimization algorithm according to the inter-frame pose from the IMU, to obtain optimized map points;

the graph optimization is formulated as follows:

where the two pose terms denote, respectively, the estimated pose transformation matrix obtained from the EPnP pose estimation and the pose transformation matrix obtained from the IMU pre-integration algorithm; z_{k+1} denotes the pixel coordinates in the camera image; P_k denotes the 3D coordinates of a map point in the camera coordinate system of frame k; and π_c(·) is the projection transformation of the camera model, which maps 3D coordinates in the camera coordinate system to 2D pixel coordinates in the camera image.
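The optimization objective is likewise an image in the source and cannot be recovered verbatim. A plausible reconstruction from the variable definitions above minimizes the reprojection error of the map points together with a consistency term between the EPnP pose and the IMU pre-integrated pose; the notation and the equal weighting of the two terms below are assumptions.

```latex
\min_{T_{k+1,k},\;P_k}\;
\sum_{i}\Bigl\| z^{\,i}_{k+1} - \pi_c\!\bigl(T_{k+1,k}\,P^{\,i}_{k}\bigr) \Bigr\|^{2}
\;+\;
\Bigl\| \log\!\Bigl( \bigl(T^{\mathrm{imu}}_{k+1,k}\bigr)^{-1}\, T_{k+1,k} \Bigr)^{\vee} \Bigr\|^{2}
```

Here T_{k+1,k} is the estimated (EPnP-initialized) inter-frame pose being optimized and T^imu_{k+1,k} the pose from IMU pre-integration.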

Further, after optimizing the estimated pose and estimated map points to obtain the optimized map points, the method includes:

replacing the pre-optimization map points with the optimized map points, updating the local map, and removing map points that no longer belong to the local map;

comparing whether the number of covisible feature points between the current frame and the second-newest keyframe is above a preset threshold;

if it is above the preset threshold, replacing the newest keyframe with the current keyframe;

if it is below the preset threshold, adding the current frame to the keyframe set.
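A minimal sketch of this keyframe decision; covisible_count stands for the number of feature points the current frame shares with the second-newest keyframe, and the threshold value is an assumption.

```python
def update_keyframes(keyframes, current_frame, covisible_count, thresh=60):
    """Replace or append a keyframe depending on covisibility (threshold assumed)."""
    if covisible_count > thresh:
        keyframes[-1] = current_frame    # large overlap: replace the newest keyframe
    else:
        keyframes.append(current_frame)  # enough new information: add a new keyframe
    return keyframes
```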

Further, obtaining the left-eye and right-eye motion-vision matched feature-point pairs from the feature points includes:

performing ORB descriptor matching between the feature points in the current frame of the left-eye image and the feature points corresponding to the map points in the previous frame, to obtain the left-eye matched feature-point pairs;

performing ORB descriptor matching between the feature points in the current frame of the right-eye image and the feature points corresponding to the map points in the previous frame, to obtain the right-eye matched feature-point pairs; the left-eye and right-eye motion-vision matched feature-point pairs consist of the left-eye matched pairs and the right-eye matched pairs.

Further, obtaining the left-eye and right-eye stereo-vision matched feature-point pairs from the feature points includes:

performing ORB descriptor matching and SAD stereo matching between the feature points in the current frame of the left-eye image and the feature points in the current frame of the right-eye image, to obtain the left-eye and right-eye stereo-vision matched feature-point pairs.
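A minimal OpenCV sketch of these two matching stages for a rectified stereo pair: binary ORB descriptors matched by Hamming distance, followed by a crude SAD refinement of the matched column along the same image row. The window size, search range, and image paths are assumptions.

```python
import cv2
import numpy as np

img_left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)     # rectified stereo pair (paths assumed)
img_right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=1000)
kps_l, des_l = orb.detectAndCompute(img_left, None)
kps_r, des_r = orb.detectAndCompute(img_right, None)

# Stage 1: ORB descriptor matching (Hamming distance, cross-checked).
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = matcher.match(des_l, des_r)

# Stage 2: SAD refinement of the right-image column around the ORB match.
def sad_refine(pt_l, pt_r, win=5, search=5):
    """Slide a small window along the row of the right image and keep the
    column with the minimum sum of absolute differences."""
    x_l, y = int(round(pt_l[0])), int(round(pt_l[1]))
    x_r0 = int(round(pt_r[0]))
    patch_l = img_left[y - win:y + win + 1, x_l - win:x_l + win + 1].astype(np.int32)
    if patch_l.shape != (2 * win + 1, 2 * win + 1):
        return x_r0                                   # too close to the border: keep the ORB match
    best_x, best_sad = x_r0, np.inf
    for dx in range(-search, search + 1):
        x_r = x_r0 + dx
        patch_r = img_right[y - win:y + win + 1, x_r - win:x_r + win + 1].astype(np.int32)
        if patch_r.shape != patch_l.shape:
            continue
        sad = np.abs(patch_l - patch_r).sum()
        if sad < best_sad:
            best_sad, best_x = sad, x_r
    return best_x

stereo_pairs = []
for m in matches:
    pt_l, pt_r = kps_l[m.queryIdx].pt, kps_r[m.trainIdx].pt
    stereo_pairs.append((pt_l, (sad_refine(pt_l, pt_r), pt_r[1])))
```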

A second aspect of the embodiments of the present invention provides a binocular visual positioning system in a dynamic environment, comprising:

a main thread, a pose estimation thread, a loop-closure thread, and a back-end optimization thread;

the main thread is configured to obtain the left-eye and right-eye images under binocular vision of the camera and to perform initialization; after initialization has succeeded, it receives a restart signal sent by the pose estimation thread or the loop-closure thread and re-initializes according to that restart signal;

the pose estimation thread is configured to obtain the left-eye and right-eye images under binocular vision of the camera, extract the feature points in the current frame to which the left-eye and right-eye images belong and the feature points corresponding to the map points in the previous frame, and compute the estimated pose and estimated map points of the current frame from those feature points;

the loop-closure thread is configured to detect a loop-closure frame, estimate the pose transformation matrix between the current frame and the loop-closure frame, and, taking the pose of the loop-closure frame as the reference, update the pose of the current frame through that pose transformation matrix;

the back-end optimization thread is configured to optimize the poses of all frames according to the pose estimation result of the IMU pre-integration algorithm and the pose transformation matrix between the current frame and the loop-closure frame.

Further, computing the estimated pose and estimated map points of the current frame from the feature points includes:

obtaining, from the feature points, the left-eye and right-eye motion-vision matched feature-point pairs and the stereo-vision matched feature-point pairs;

selecting the matched feature-point pairs shared by the stereo-vision matched pairs and the motion-vision matched pairs;

performing stereo depth estimation on each feature-point pair in the set of shared matched pairs to obtain stereo depth estimates;

clustering the left-eye and right-eye motion-vision matched feature-point pairs with the K-means++ algorithm according to the optical-flow velocity and rotation angle between the matched pairs, and dividing the motion-vision matched pairs into multiple categories according to the clustering result;

computing, with the IMU pre-integration algorithm, the IMU pose estimation result for the left and right eyes over the time period between the current frame and the previous frame, the pose estimation result including the inter-frame pose between the current frame and the previous frame;

performing EPnP pose estimation and map-point estimation separately on the motion-vision matched feature-point pairs of each category to obtain the EPnP pose estimation results and map-point estimation results;

selecting the EPnP pose estimation result closest to the IMU pose estimation result as the best EPnP pose and map-point estimate, and recording the valid matched feature-point pairs corresponding to that best estimate;

selecting, from the shared matched feature-point pairs, the feature-point pairs corresponding to the valid matched pairs, and fusing, by Kalman filtering, the stereo depth estimates of those pairs with the depths estimated by EPnP, to obtain the estimated pose and estimated map points.

The present invention extracts image feature points and distinguishes feature points on moving objects from feature points on non-moving objects by clustering the feature points. This effectively removes feature points on objects that are actually moving while avoiding the removal of feature points on stationary objects that merely have dynamic attributes, so more usable feature points are retained. The inter-frame pose transformation matrix obtained from the IMU pre-integration algorithm is used to select, among the poses estimated from different feature-point sets, the one closest to the IMU data, which effectively rejects outliers such as feature points belonging to moving objects. A pose and depth estimation method that fuses monocular motion vision with binocular stereo vision effectively improves estimation accuracy.

Brief Description of the Drawings

To illustrate the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings required in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; other drawings can be obtained from them by those skilled in the art without creative effort.

FIG. 1 is a schematic flowchart of a binocular visual positioning method in a dynamic environment according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of the overall framework of the binocular visual positioning method in a dynamic environment according to an embodiment of the present invention;

FIG. 3 is a schematic flowchart of the main thread of the binocular visual positioning method in a dynamic environment according to an embodiment of the present invention;

FIG. 4 is a schematic flowchart of the pose estimation thread of the binocular visual positioning method in a dynamic environment according to an embodiment of the present invention;

FIG. 5 is a partial enlarged view of the flowchart of the pose estimation thread of the binocular visual positioning method in a dynamic environment according to an embodiment of the present invention;

FIG. 6 is a partial enlarged view of the flowchart of the pose estimation thread of the binocular visual positioning method in a dynamic environment according to an embodiment of the present invention;

FIG. 7 is a partial enlarged view of the flowchart of the pose estimation thread of the binocular visual positioning method in a dynamic environment according to an embodiment of the present invention;

FIG. 8 is a schematic diagram of the relative velocity and angle of a matched feature-point pair in the binocular visual positioning method in a dynamic environment according to an embodiment of the present invention;

FIG. 9 is a schematic flowchart of the loop-closure thread of the binocular visual positioning method in a dynamic environment according to an embodiment of the present invention;

FIG. 10 is a schematic diagram of loop-closure detection according to an embodiment of the present invention;

FIG. 11 is a schematic diagram of the back-end optimization process according to an embodiment of the present invention;

FIG. 12 is a schematic diagram of the back-end optimization effect according to an embodiment of the present invention.

Detailed Description of Embodiments

To make the objects, features, and advantages of the present invention more apparent and understandable, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort fall within the protection scope of the present invention.

In the present application, the method for screening out moving objects applies not only to binocular cameras but equally to monocular, multi-camera, and RGB-D cameras, so the purpose of the invention can also be achieved with monocular, multi-camera, or RGB-D cameras. The embodiments of the present application take a visual SLAM algorithm as an example, but for laser sensors the proposed mechanism of screening moving objects rather than dynamic objects can also be realized in laser SLAM: clustering similar features in laser SLAM can likewise accomplish pose estimation in a dynamic environment with moving objects removed.

Referring to FIG. 1, FIG. 1 is a schematic flowchart of a binocular visual positioning method in a dynamic environment according to an embodiment of the present invention. The method mainly includes:

S101. Obtain the left-eye and right-eye images under binocular vision of the camera.

S102. Extract the feature points in the current frame to which the left-eye and right-eye images belong and the feature points corresponding to the map points in the previous frame, a map point being a feature point that has been added to the map.

S103. Obtain, from the feature points, the left-eye and right-eye motion-vision matched feature-point pairs and the stereo-vision matched feature-point pairs.

S104. Select the matched feature-point pairs shared by the stereo-vision matched pairs and the motion-vision matched pairs.

S105. Perform stereo depth estimation on each feature-point pair in the set of shared matched pairs to obtain stereo depth estimates.

S106. Cluster the left-eye and right-eye motion-vision matched feature-point pairs with the K-means++ algorithm according to the optical-flow velocity and rotation angle between the matched pairs, and divide the motion-vision matched pairs into multiple categories according to the clustering result.

S107. Compute, with the IMU pre-integration algorithm, the IMU pose estimation result for the left and right eyes over the time period between the current frame and the previous frame.

S108. Perform EPnP pose estimation and map-point estimation separately on the motion-vision matched feature-point pairs of each category to obtain the EPnP pose estimation results and map-point estimation results.

S109. Select the EPnP pose estimation result closest to the IMU pose estimation result as the best EPnP pose and map-point estimate, and record the valid matched feature-point pairs corresponding to that best estimate.

S110. Select, from the shared matched feature-point pairs, the feature-point pairs corresponding to the valid matched pairs, and fuse, by Kalman filtering, the stereo depth estimates of those pairs with the depths estimated by EPnP, to obtain the estimated pose and estimated map points.

Further, the IMU pre-integration algorithm is as follows:

where the pre-integrated terms denote, respectively, the displacement increment, the velocity increment, and the quaternion of the rotation change over the period from frame k to frame k+1, expressed in the camera coordinate system of frame k; a_t denotes the accelerometer measurement and w_t the angular-velocity measurement; b_at and b_wt denote the accelerometer and gyroscope biases; n_a and n_w denote the corresponding noise terms; the rotation transformation matrix of time t relative to the camera coordinate system of frame k is obtained from the corresponding formula; Ω is the matrix involved in the quaternion computation; Δt_k denotes the time interval from t_k to t_{k+1}; and g_w denotes the gravity vector in the world coordinate system.

Further, after the motion-vision matched feature-point pairs are divided into multiple categories according to the clustering result, the method includes:

performing histogram statistics on the number of feature points in each category;

deleting any category whose number of feature points is smaller than a first preset number and differs from the numbers of feature points of the other categories by more than a second preset number, together with the feature points of that category.

Further, performing EPnP pose estimation and map-point estimation separately on the motion-vision matched feature-point pairs of each category to obtain the EPnP pose estimation results and map-point estimation results includes:

performing EPnP pose estimation and map-point estimation on the feature points of each category, and removing outliers from the pose estimation result of each category with the RANSAC (random sample consensus) algorithm, to obtain the EPnP pose estimation result and map-point estimation result of each category.

Further, after obtaining the estimated pose and estimated map points, the method includes:

optimizing the estimated pose and estimated map points with a graph-based optimization algorithm according to the pose estimation result of the IMU, to obtain optimized map points;

the graph optimization is formulated as follows:

where the two pose terms denote, respectively, the estimated pose transformation matrix obtained from the EPnP pose estimation and the pose transformation matrix obtained from the IMU pre-integration algorithm; z_{k+1} denotes the pixel coordinates in the camera image; P_k denotes the 3D coordinates of a map point in the camera coordinate system of frame k; and π_c(·) is the projection transformation of the camera model, which maps 3D coordinates in the camera coordinate system to 2D pixel coordinates in the camera image.

Further, after optimizing the estimated pose and estimated map points to obtain the optimized map points, the method includes:

replacing the pre-optimization map points with the optimized map points, updating the local map, and removing points that no longer belong to the local map;

comparing whether the number of covisible feature points between the current frame and the second-newest keyframe is above a preset threshold;

if it is above the preset threshold, replacing the newest keyframe with the current keyframe;

if it is below the preset threshold, adding the current frame to the keyframe set.

Further, obtaining the left-eye and right-eye motion-vision matched feature-point pairs from the feature points includes:

performing ORB descriptor matching between the feature points in the current frame of the left-eye image and the feature points corresponding to the map points in the previous frame, to obtain the left-eye matched feature-point pairs;

performing ORB descriptor matching between the feature points in the current frame of the right-eye image and the feature points corresponding to the map points in the previous frame, to obtain the right-eye matched feature-point pairs; the left-eye and right-eye motion-vision matched feature-point pairs consist of the left-eye matched pairs and the right-eye matched pairs.

Further, obtaining the left-eye and right-eye stereo-vision matched feature-point pairs from the feature points includes:

performing ORB descriptor matching and SAD stereo matching between the feature points in the current frame of the left-eye image and the feature points in the current frame of the right-eye image, to obtain the left-eye and right-eye stereo-vision matched feature-point pairs.

In the embodiments of the present invention, moving objects rather than dynamic objects are removed. Moving and non-moving objects are distinguished by clustering: the velocity and angle of the inter-frame matched feature-point pairs are clustered, separating the matched feature-point sets of moving objects from those of non-moving objects. The inter-frame pose transformation matrix obtained from the IMU pre-integration algorithm is used to select, among the poses estimated from different feature-point sets, the one closest to the IMU data, which effectively rejects outliers such as feature points belonging to moving objects. A pose and depth estimation method that fuses monocular motion vision with binocular stereo vision effectively improves estimation accuracy.

Referring to FIG. 2, FIG. 2 is a schematic structural diagram of a binocular visual positioning system in a dynamic environment according to another embodiment of the present invention. The system mainly includes:

a main thread, a pose estimation thread, a loop-closure thread, and a back-end optimization thread;

the main thread is configured to obtain the left-eye and right-eye images under binocular vision of the camera and to perform initialization; after initialization has succeeded, it receives a restart signal sent by the pose estimation thread or the loop-closure thread and re-initializes according to that restart signal.

Specifically, as shown in FIG. 3, the main thread mainly starts the other threads and initializes the map, providing initial, reliable data for the other threads. Its main flow and working principle are as follows:

Images are acquired from the left and right cameras of the stereo rig. FAST corner detection is used to extract feature points from the left-eye and right-eye images. Each image is then divided into 10 sub-regions, and in each sub-region the N corners with the largest Harris response values are selected as the final feature-point set. The rotation direction of each feature point is also computed, defined as the direction vector from the geometric center of the image patch centered on the feature point to its intensity centroid. Descriptors are then computed for the feature points, using the same binary BRIEF descriptor as ORB (Oriented FAST and Rotated BRIEF).
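A compact OpenCV approximation of this extraction step is sketched below: FAST detection with Harris scoring (as done by ORB when scoreType=ORB_HARRIS_SCORE), per-cell selection of the strongest corners, and oriented-BRIEF descriptors. The grid shape, per-cell count N, and image path are assumptions.

```python
import cv2

def extract_grid_features(img, grid=(5, 2), per_cell=50):
    """FAST corners ranked by Harris response, kept per image cell, with ORB
    (oriented BRIEF) descriptors. Grid shape and per-cell count are assumed."""
    orb = cv2.ORB_create(nfeatures=5000, scoreType=cv2.ORB_HARRIS_SCORE)
    kps = orb.detect(img, None)                      # FAST detection + Harris scoring
    h, w = img.shape[:2]
    cell_w, cell_h = w / grid[0], h / grid[1]
    selected = []
    for gx in range(grid[0]):
        for gy in range(grid[1]):
            cell = [k for k in kps
                    if gx * cell_w <= k.pt[0] < (gx + 1) * cell_w
                    and gy * cell_h <= k.pt[1] < (gy + 1) * cell_h]
            cell.sort(key=lambda k: k.response, reverse=True)
            selected.extend(cell[:per_cell])         # keep the N strongest corners per cell
    return orb.compute(img, selected)                # orientation + binary BRIEF descriptors

img = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)   # placeholder path
keypoints, descriptors = extract_grid_features(img)
```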

ORB descriptor matching is first performed between the left-image and right-image feature points, followed by SAD (sum of absolute differences) sub-pixel matching; valid feature points are then screened using the consistency of the feature-point rotation angles, yielding accurately matched feature-point pairs between the left and right camera images.

The real depth of each matched feature point is computed with the binocular camera model z = f·b/(u_L − u_R), where z is the real coordinate of the feature point along the z-axis in the camera reference frame, f is the focal length of the camera, b is the camera baseline, and u_L and u_R are the pixel x-coordinates of the feature point in the left and right images, respectively. If z is positive, the depth estimate is valid and the 3D coordinates of the feature point are recovered; these 3D coordinates are called a map point. If z is not positive, the depth estimation of that feature point has failed and the feature-point pair is discarded.
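A short sketch of this triangulation and validity check; the intrinsic values in the example are placeholders.

```python
import numpy as np

def triangulate_stereo(u_l, v_l, u_r, fx, fy, cx, cy, baseline):
    """Depth and 3D map point from a rectified stereo match.

    Returns the 3D point in the left-camera frame, or None when the
    disparity is not positive (depth estimation failed)."""
    disparity = u_l - u_r
    if disparity <= 0:
        return None                      # invalid match: discard the pair
    z = fx * baseline / disparity        # z = f * b / (uL - uR)
    x = (u_l - cx) * z / fx              # back-project to camera coordinates
    y = (v_l - cy) * z / fy
    return np.array([x, y, z])

# Example with assumed intrinsics: fx = fy = 700 px, baseline = 0.12 m.
print(triangulate_stereo(420.0, 240.0, 395.0, fx=700.0, fy=700.0,
                         cx=320.0, cy=240.0, baseline=0.12))
```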

If the number of successfully matched feature-point pairs exceeds a certain threshold, initialization is considered successful; otherwise initialization is repeated.

Once initialization is confirmed, the map points are added to the local-map and global-map classes, the current frame is added as a keyframe, and the current frame is marked as the previous frame for subsequent processing. The local map is the set of map points corresponding to all matched feature-point pairs in the current image frame and in past frames that share covisibility with it, whereas the global map is the set of all past map points. The keyframe set exists to keep the data set at a manageable size: only qualifying image frames are saved locally as keyframes, for use in subsequent loop-closure detection and back-end optimization.

After the whole system has been initialized successfully, it polls the signal states of the other threads; if a restart signal from another thread is received, it returns to acquiring images from the left and right cameras and re-initializes the system.

The pose estimation thread is configured to obtain the left-eye and right-eye images under binocular vision of the camera, extract the feature points in the current frame to which the left-eye and right-eye images belong and the feature points corresponding to the map points in the previous frame, and compute the estimated pose and estimated map points of the current frame from those feature points.

Specifically, referring to FIG. 4, FIG. 5, FIG. 6, and FIG. 7, the pose estimation thread works as follows.

Images are acquired from the left and right cameras of the stereo rig.

FAST corner detection extracts feature points from the left-eye and right-eye images; each image is divided into 10 sub-regions, and in each sub-region the N corners with the largest Harris response values are selected as the final feature-point set. The rotation direction of each feature point is computed, defined as the direction vector from the geometric center of the image patch centered on the feature point to its intensity centroid, and descriptors are computed using the same binary BRIEF descriptor as ORB (Oriented FAST and Rotated BRIEF). The acceleration a_t and angular velocity w_t are read from the onboard inertial navigation module (IMU), and both are filtered.

The IMU serves here as an auxiliary means of selecting the appropriate visual positioning result in a dynamic environment; sensors such as encoder discs, wheel encoders, or an electronic compass, which can provide the robot's displacement, velocity, angular velocity, and heading, can equally serve as the auxiliary means and achieve the same effect in cooperation with the visual sensor in the dynamic environment of the present invention.

The feature points of the current left-eye frame and of the current right-eye frame are matched, by ORB descriptors, against the feature points corresponding to the map points in the previous frame; DBoW3 bag-of-words indexing is used to accelerate the matching, and valid feature points are then screened using the consistency of the feature-point rotation angles, yielding the left-eye matched pairs and the right-eye matched pairs, referred to here as the left-eye and right-eye motion-vision matched feature-point pairs. ORB descriptor matching and SAD stereo-matching refinement between the current left-eye frame and the current right-eye frame yield more accurate left/right stereo matched pairs, referred to as the stereo-vision matched feature-point pairs. The matched pairs shared by the motion-vision matched pairs and the stereo-vision matched pairs are then selected, stereo depth estimation is performed on the corresponding feature points, and their 3D coordinates are recovered.

The optical-flow velocity v and rotation angle θ between the left-eye and right-eye motion-vision matched feature-point pairs are computed; the principle of the optical-flow velocity and rotation angle is illustrated in FIG. 8: the velocity is the pixel displacement of a matched pair over the inter-frame interval, and the rotation angle is the direction angle of that displacement vector.

The K-means++ clustering algorithm then clusters the left-eye and right-eye motion-vision matched feature-point pairs into M categories according to optical-flow velocity and rotation angle. In a real environment, the velocity and direction of a moving object differ considerably from those of the background elements, so clustering separates the feature points of moving objects at different velocities from those of stationary objects. After the clustering result is obtained, a histogram of the number of feature points in each category is computed, and categories whose counts are too small and differ greatly from the other categories are discarded together with their corresponding feature-point pairs.
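A minimal sketch of how the (v, θ) features could be computed and clustered, using scikit-learn's K-means++ initialization; the number of clusters and the toy data are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_matched_pairs(pts_prev, pts_curr, dt, n_clusters=4):
    """Cluster matched feature-point pairs by optical-flow speed and direction.

    pts_prev, pts_curr : (N, 2) pixel coordinates of the matched pairs in the
                         previous and current frames
    dt                 : time between the two frames (s)
    Returns the integer cluster label of every pair."""
    flow = pts_curr - pts_prev
    v = np.linalg.norm(flow, axis=1) / dt            # optical-flow speed of each pair
    theta = np.arctan2(flow[:, 1], flow[:, 0])       # direction (rotation) angle
    feats = np.column_stack([v, theta])
    km = KMeans(n_clusters=n_clusters, init="k-means++", n_init=10).fit(feats)
    return km.labels_

# Toy example: 30 background pairs plus 10 pairs on a fast-moving object.
rng = np.random.default_rng(0)
prev = rng.uniform(0, 640, size=(40, 2))
curr = prev + np.vstack([np.tile([1.0, 0.0], (30, 1)), np.tile([15.0, 5.0], (10, 1))])
print(cluster_matched_pairs(prev, curr, dt=0.05, n_clusters=2))
```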

IMU pre-integration is used to compute the inter-frame displacement of the left and right cameras over the interval between the previous frame and the current frame. The core formulas of IMU pre-integration are as follows:

where the pre-integrated terms denote, respectively, the displacement increment, the velocity increment, and the quaternion of the rotation change over the period from frame k to frame k+1, expressed in the camera coordinate system of frame k; a_t denotes the accelerometer measurement and w_t the angular-velocity measurement; b_at and b_wt denote the accelerometer and gyroscope biases; n_a and n_w denote the corresponding noise terms; the rotation transformation matrix of time t relative to the camera coordinate system of frame k is obtained from the corresponding formula; Ω is the matrix involved in the quaternion computation; Δt_k denotes the time interval from t_k to t_{k+1}; and g_w denotes the gravity vector in the world coordinate system.

Using the clusters of the motion-vision matched feature-point set obtained above, EPnP pose estimation and map-point estimation are carried out separately on the matched feature-point set of each category, and outliers are removed from each category's pose estimation result with the RANSAC (random sample consensus) algorithm. The pose estimation result of each category is then compared with the IMU pose result. In general, the filtered short-term IMU pose estimate is accurate, so the IMU inter-frame pose estimate can effectively select the true EPnP result and exclude the pose estimates derived from the feature-point sets of moving objects, making the image-based pose estimate more realistic and accurate and reducing the influence of moving objects on the result. Through this selection mechanism that incorporates the IMU data, a fairly accurate EPnP pose and map-point estimate is obtained, called the best EPnP pose and map-point estimate; the corresponding matched feature-point set is called the set of valid motion-vision matched feature-point pairs.
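A sketch of the selection step: the EPnP pose of each cluster is compared with the IMU pre-integrated inter-frame pose, and the closest one (smallest combined translation and rotation difference) is kept. The relative weighting of the two terms is an assumption.

```python
import numpy as np

def rotation_angle(R):
    """Angle (rad) of a rotation matrix, used as the rotation distance."""
    return np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))

def select_best_pose(epnp_poses, R_imu, t_imu, w_rot=1.0):
    """Pick the cluster whose EPnP pose is closest to the IMU inter-frame pose.

    epnp_poses   : list of (R, t) tuples, one per feature-point cluster
    R_imu, t_imu : inter-frame rotation / translation from IMU pre-integration
    Returns the index of the best cluster."""
    costs = []
    for R, t in epnp_poses:
        dt = np.linalg.norm(np.asarray(t).ravel() - np.asarray(t_imu).ravel())  # translation difference
        dr = rotation_angle(R_imu.T @ R)                                        # relative rotation angle
        costs.append(dt + w_rot * dr)                                           # combined distance (weight assumed)
    return int(np.argmin(costs))
```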

From the set of shared matched feature points, the feature points belonging to the set of valid matched pairs are selected, i.e., the feature-point pairs belonging to the feature-point category corresponding to the best EPnP pose estimate; the stereo depth estimate of each such pair is then fused with the depth estimated by EPnP through Kalman filtering, yielding more accurate map points.

The pose and map points are then optimized with a graph-based optimization algorithm. The result of the IMU pre-integration is incorporated into the optimization as one of the constraints, so that the optimization fuses both sources of information. The core formula of the optimization is as follows:

where the two pose terms denote, respectively, the estimated pose transformation matrix obtained from the EPnP pose estimation and the pose transformation matrix obtained from the IMU pre-integration algorithm; z_{k+1} denotes the pixel coordinates in the camera image; P_k denotes the 3D coordinates of a map point in the camera coordinate system of frame k; and π_c(·) is the projection transformation of the camera model, which maps 3D coordinates in the camera coordinate system to 2D pixel coordinates in the camera image.

After the pose estimation is completed, the correct map points are added to the local map and the global map, the local map is updated and replaced, and points that no longer belong to the local map are removed. At the same time, the number of covisible feature points between the current frame and the second-newest keyframe is compared with a threshold. If it is above the threshold, the current frame overlaps too much with the nearest keyframe; no new keyframe is added, and instead the current keyframe replaces the newest keyframe, with partial replacement and deletion of local and global map points. If it is below the threshold, the information difference between the current frame and the newest keyframe is large, and the current frame is added to the keyframe set to increase the amount of information captured about the environment.

Bundle adjustment and outlier removal are performed on the poses of the keyframes that share enough covisible feature points with the newest keyframe and on the map points associated with those keyframes. The previous frame and the keyframes that share covisibility with the current frame are added to the covisibility graph of the current frame, and the data-graph structure is updated. The thread then enters the next iteration of the loop.

The above process assumes that there are enough matches between the map points of the previous frame and the feature points of the current frame. If the matched feature points are insufficient, feature points must be supplemented, mainly in two ways: one is to search the local map to add more matchable map points; the other is to re-run feature matching between the previous frame and the current frame, no longer limiting the matching to the previous-frame feature points that correspond to map points but letting all feature points of the previous frame participate, thereby effectively filling the gap. Once the number of feature points meets the requirement, pose estimation continues; if it does not, the IMU records the short-term pose change to keep the pose trajectory continuous, and the thread enters the next loop iteration until enough feature points are found. If enough feature points cannot be found for a long time, positioning has failed; by then the IMU is also inaccurate because of accumulated error, and the system must be re-initialized.

The loop-closure thread is configured to detect a loop-closure frame, estimate the pose transformation matrix between the current frame and the loop-closure frame, and, taking the pose of the loop-closure frame as the reference, update the pose of the current frame through that pose transformation matrix.

Specifically, referring to FIG. 9 and FIG. 10, loop-closure detection determines whether the robot has returned to a place it has visited before. When such a return is detected, a geometric constraint can be established between the current frame and the loop-closure frame, and optimization then eliminates the error accumulated between the loop-closure frame and the current frame. Loop-closure detection resembles the feature-matching process and is mainly accelerated with the DBoW3 bag-of-words: among all past keyframes, a frame with a very high matching rate against the newest keyframe is searched for; once detected, that frame is very likely the loop-closure frame, and a series of screening mechanisms such as geometric consistency, group matching, and temporal consistency then confirm whether it is. Once a loop-closure frame is found in the keyframe set, the pose transformation matrix between the current frame and the loop-closure frame is estimated; bundle adjustment and outlier removal are applied to that transformation matrix and to the map points of the current frame; finally, taking the pose of the loop-closure frame as the reference, the pose of the current frame is updated through the transformation matrix between the loop-closure frame and the current frame, the poses of the frames with high covisibility with the current frame are updated, and the map points associated with the related frames are updated. The loop-closure optimization as a whole removes part of the accumulated error, establishes geometric constraints, and provides constraints for the global consistency optimization.

The back-end optimization thread is used to optimize the poses of all frames according to the pose estimation result of the IMU pre-integration algorithm and the pose transformation matrix between the current frame and the loop frame.

Specifically, referring to Fig. 11 and Fig. 12, the back-end optimization thread is the key to improving localization accuracy and reducing the global accumulated error. The back-end optimization module optimizes all keyframes, the covisible frames of the current frame, and the global map points. The optimization is carried out in two steps. The first step is pose graph optimization, in which only the keyframe poses are optimized: the inter-frame pose constraints from IMU pre-integration and the constraint between the current frame and the loop frame are introduced, and all poses are optimized; the main purpose of this step is to fuse the IMU data to refine the keyframe poses. The second step is global consistency optimization, which builds on the loop-closure optimization of the first step to further eliminate accumulated error: the global map points and the poses of all frames are adjusted under the loop constraint, thereby reducing trajectory drift and maintaining global consistency.
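The two-stage back-end can be pictured with the schematic Python sketch below; PoseGraph and BundleAdjustment are hypothetical wrapper classes standing in for a real graph optimizer (for example g2o or Ceres), so the interface shown is illustrative only, not the original implementation.

def backend_optimize(keyframes, map_points, imu_preintegrations, loop_constraints):
    # --- step 1: pose graph optimization over keyframe poses only ---
    graph = PoseGraph()                                   # hypothetical optimizer wrapper
    for kf in keyframes:
        graph.add_vertex(kf.id, kf.pose)
    for (i, j), delta in imu_preintegrations.items():     # IMU pre-integration inter-frame constraints
        graph.add_edge(i, j, delta)
    for (cur, loop), T in loop_constraints.items():       # current-frame / loop-frame constraints
        graph.add_edge(cur, loop, T)
    poses = graph.optimize()                              # fuses IMU data into the keyframe poses

    # --- step 2: global consistency optimization (all poses + global map points) ---
    ba = BundleAdjustment()                               # hypothetical full-BA wrapper
    for kf_id, pose in poses.items():
        ba.add_camera(kf_id, pose)
    for pt in map_points:
        ba.add_point(pt.id, pt.xyz)
        for kf_id, pixel in pt.observations.items():
            ba.add_reprojection_edge(kf_id, pt.id, pixel)
    return ba.optimize()                                  # reduces drift, keeps global consistency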

The present invention changes the existing way of handling dynamic environments: by removing moving objects rather than all objects of dynamic categories, a large number of useful feature points can be retained, and feature points on stationary objects of dynamic categories can be kept to improve the accuracy of pose estimation. Moreover, compared with removing all dynamic-category objects, the present invention can cope with extreme cases, for example a scene that contains only stationary dynamic-category objects: removing all dynamic-category objects would leave too few feature points, whereas removing only moving objects retains the feature points on stationary dynamic-category objects for pose estimation.

The present invention replaces the existing deep-learning-based semantic-label screening approach to SLAM in dynamic environments with a method that clusters the inter-frame matched feature point pairs by their speed and rotation direction to distinguish moving objects from non-moving objects (static objects and stationary objects of dynamic categories). IMU data are used as auxiliary constraint information: among the pose estimates computed from the different clustered data sets, the one closest to the IMU data is selected as the best pose estimate, thereby removing the interference of moving objects on pose estimation in dynamic scenes. Compared with deep-learning-based semantic methods, this greatly reduces the computational complexity and improves the real-time performance of the localization algorithm.
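The cluster-then-screen step can be illustrated with the short Python sketch below, which uses scikit-learn's K-means++ initialization and OpenCV's EPnP solver inside RANSAC. The cluster count, the minimum number of points per cluster, and the way the distance to the IMU-predicted pose is measured are assumptions made for illustration, not values taken from the original disclosure.

import numpy as np
import cv2
from sklearn.cluster import KMeans

def select_best_pose(pts3d_prev, pts2d_cur, flow_speed, flow_angle, K, T_imu, n_clusters=3):
    # pts3d_prev: Nx3 map points from the previous frame; pts2d_cur: Nx2 matched pixels;
    # flow_speed / flow_angle: per-match optical-flow speed and rotation angle;
    # T_imu: 4x4 inter-frame pose predicted by IMU pre-integration (assumed same frame convention)
    feats = np.column_stack([flow_speed, flow_angle])
    labels = KMeans(n_clusters=n_clusters, init="k-means++", n_init=10).fit_predict(feats)

    best_T, best_idx, best_err = None, None, np.inf
    for c in range(n_clusters):
        idx = np.where(labels == c)[0]
        if len(idx) < 6:                                   # assumed minimum for EPnP + RANSAC
            continue
        ok, rvec, tvec, inliers = cv2.solvePnPRansac(
            pts3d_prev[idx].astype(np.float64),
            pts2d_cur[idx].astype(np.float64),
            K, None, flags=cv2.SOLVEPNP_EPNP)
        if not ok:
            continue
        R, _ = cv2.Rodrigues(rvec)
        T = np.eye(4)
        T[:3, :3], T[:3, 3] = R, tvec.ravel()
        # distance to the IMU-predicted inter-frame pose: translation error + rotation angle
        dt = np.linalg.norm(T[:3, 3] - T_imu[:3, 3])
        dr = np.linalg.norm(cv2.Rodrigues(T[:3, :3] @ T_imu[:3, :3].T)[0])
        if dt + dr < best_err:                             # keep the pose closest to the IMU data
            best_T, best_err = T, dt + dr
            best_idx = idx if inliers is None else idx[inliers.ravel()]
    return best_T, best_idx                                # best pose and its valid matched pairs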

In addition, the functional modules in the embodiments of the present invention may be integrated into one processing module, each module may exist physically on its own, or two or more modules may be integrated into one module. The integrated modules may be implemented in the form of hardware or in the form of software functional modules.

It should be noted that, for brevity of description, the foregoing method embodiments are all expressed as a series of action combinations, but those skilled in the art should understand that the present invention is not limited by the described order of actions, because according to the present invention some steps may be performed in other orders or simultaneously. Secondly, those skilled in the art should also understand that the embodiments described in the specification are preferred embodiments, and the actions and modules involved are not necessarily required by the present invention.

In the foregoing embodiments, the description of each embodiment has its own emphasis; for parts not described in detail in a certain embodiment, reference may be made to the relevant descriptions of other embodiments.

The above is a description of the binocular visual positioning method and system in a dynamic environment provided by the present invention. Those of ordinary skill in the art may, following the idea of the embodiments of the present invention, make changes to the specific implementation and the scope of application. In summary, the contents of this specification should not be construed as limiting the present invention.

Claims (10)

Translated from Chinese
1. A binocular visual positioning method in a dynamic environment, characterized by comprising:
acquiring a left-eye image and a right-eye image under binocular vision of a camera, and extracting the feature points in the current frame to which the left-eye image and the right-eye image belong and the feature points corresponding to the map points in the previous frame, the map points being feature points that have been added to the map;
obtaining, according to the feature points, left-eye and right-eye motion-vision matched feature point pairs and stereo-vision matched feature point pairs;
screening the matched feature point pairs shared by the stereo-vision matched feature point pairs and the motion-vision matched feature point pairs;
performing stereo depth estimation on the feature point pairs in the set of shared matched feature point pairs, respectively, to obtain stereo depth estimates;
clustering, with the K-means++ clustering algorithm, the left-eye and right-eye motion-vision matched feature point pairs according to the optical-flow speed and rotation angle between the matched feature point pairs to obtain a clustering result, and dividing the motion-vision matched feature point pairs into a plurality of categories according to the clustering result;
calculating, with an IMU pre-integration algorithm, the IMU pose estimation result of the left eye and the right eye over the time period between the current frame and the previous frame, the pose estimation result including the inter-frame pose between the current frame and the previous frame;
performing EPNP pose estimation and map point estimation on the motion-vision matched feature point pairs of each category, respectively, to obtain EPNP pose estimation results and map point estimation results;
selecting the EPNP pose estimation result closest to the IMU pose estimation result as the best EPNP pose estimation result and map point estimation result, and recording the valid matched feature point pairs corresponding to the best EPNP pose estimation result and map point estimation result;
selecting, from the shared matched feature point pairs, the feature point pairs corresponding to the valid matched feature point pairs, and performing Kalman-filter fusion of the stereo depth estimates of these feature point pairs with the depth values estimated by EPNP, to obtain the estimated pose and the estimated map points.
2. The binocular visual positioning method in a dynamic environment according to claim 1, characterized in that, in the IMU pre-integration algorithm, the pre-integrated quantities respectively denote the displacement increment, the velocity increment and the quaternion of the rotation change over the period from frame k to frame k+1, expressed in the camera coordinate system of frame k; a_t denotes the accelerometer measurement, w_t denotes the angular-velocity measurement, b_at and b_wt denote the accelerometer and gyroscope biases respectively, n_a and n_w denote the corresponding noise terms, the rotation transformation matrix of time t relative to the camera coordinate system of frame k is obtained from the related quaternion formula, Ω is the matrix associated with quaternion computation, Δt_k denotes the time interval from t_k to t_{k+1}, and g_w denotes the gravity vector in the world coordinate system.
3. The binocular visual positioning method in a dynamic environment according to claim 1, characterized in that, after the motion-vision matched feature point pairs are divided into a plurality of categories according to the clustering result, the method comprises:
performing histogram statistics on the number of feature points in each category;
for a category in which the number of feature points is smaller than a first preset number and the difference from the number of feature points in the other categories is larger than a second preset number, deleting the category and the feature points belonging to it.
4. The binocular visual positioning method in a dynamic environment according to claim 3, characterized in that performing EPNP pose estimation and map point estimation on the motion-vision matched feature point pairs of each category to obtain the EPNP pose estimation results and map point estimation results comprises:
performing EPNP pose estimation and map point estimation on the feature points of each category, and removing outliers from the pose estimation result of each category with the RANSAC random sample consensus algorithm, to obtain the EPNP pose estimation result and map point estimation result of each category.
5. The binocular visual positioning method in a dynamic environment according to claim 4, characterized in that, after the estimated pose and the estimated map points are obtained, the method comprises:
optimizing the estimated pose and the estimated map points with a graph-based optimization algorithm according to the inter-frame pose of the IMU, to obtain optimized map points;
wherein, in the graph optimization algorithm, the estimated pose transformation matrix obtained by EPNP pose estimation and the pose transformation matrix obtained by the IMU pre-integration algorithm are used, z_{k+1} denotes the pixel coordinates in the camera image, P_k denotes the 3D coordinates of a map point in the camera coordinate system of frame k, and π_c(·) is the camera-model projection transformation used to convert 3D coordinates in the camera coordinate system into 2D pixel coordinates in the camera image.
6. The binocular visual positioning method in a dynamic environment according to claim 5, characterized in that, after the estimated pose and the estimated map points are optimized to obtain the optimized map points, the method comprises:
replacing the map points before optimization with the optimized map points, updating the local map, and removing map points that do not belong to the local map;
comparing whether the number of covisible feature points between the current frame and the second-newest keyframe is higher than a preset threshold;
if it is higher than the preset threshold, replacing the newest keyframe with the current keyframe;
if it is lower than the preset threshold, adding the current frame to the keyframe set.
7. The binocular visual positioning method in a dynamic environment according to claim 1, characterized in that obtaining the left-eye and right-eye motion-vision matched feature point pairs according to the feature points comprises:
performing ORB-descriptor matching between the feature points in the current frame to which the left-eye image belongs and the feature points corresponding to the map points in the previous frame, to obtain left-eye matched feature point pairs;
performing ORB-descriptor matching between the feature points in the current frame to which the right-eye image belongs and the feature points corresponding to the map points in the previous frame, to obtain right-eye matched feature point pairs; wherein the left-eye and right-eye motion-vision matched feature point pairs comprise the left-eye matched feature point pairs and the right-eye matched feature point pairs.
8. The binocular visual positioning method in a dynamic environment according to claim 1, characterized in that obtaining the left-eye and right-eye stereo-vision matched feature point pairs according to the feature points comprises:
performing ORB-descriptor matching and SAD stereo matching between the feature points in the current frame to which the left-eye image belongs and the feature points in the current frame to which the right-eye image belongs, to obtain the left-eye and right-eye stereo-vision matched feature point pairs.
9. A binocular visual positioning system in a dynamic environment, characterized by comprising:
a main thread, a pose estimation thread, a loop-closing thread and a back-end optimization thread;
the main thread is used to acquire the left-eye image and the right-eye image under binocular vision of the camera and to perform initialization; after initialization succeeds, it receives a restart signal sent by the pose estimation thread or the loop-closing thread and performs initialization again according to the restart signal;
the pose estimation thread is used to acquire the left-eye image and the right-eye image under binocular vision of the camera, extract the feature points in the current frame to which the left-eye image and the right-eye image belong and the feature points corresponding to the map points in the previous frame, and calculate the estimated pose and estimated map points of the current frame according to the feature points;
the loop-closing thread is used to detect a loop frame, estimate the pose transformation matrix between the current frame and the loop frame, and, taking the pose of the loop frame as reference, update the pose of the current frame through the pose transformation matrix;
the back-end optimization thread is used to optimize the poses of all frames according to the pose estimation result of the IMU pre-integration algorithm and the pose transformation matrix between the current frame and the loop frame.
10. The binocular visual positioning system in a dynamic environment according to claim 9, characterized in that calculating the estimated pose and estimated map points of the current frame according to the feature points comprises:
obtaining, according to the feature points, the left-eye and right-eye motion-vision matched feature point pairs and the stereo-vision matched feature point pairs;
screening the matched feature point pairs shared by the stereo-vision matched feature point pairs and the motion-vision matched feature point pairs;
performing stereo depth estimation on the feature point pairs in the set of shared matched feature point pairs, respectively, to obtain stereo depth estimates;
clustering, with the K-means++ clustering algorithm, the left-eye and right-eye motion-vision matched feature point pairs according to the optical-flow speed and rotation angle between the matched feature point pairs to obtain a clustering result, and dividing the motion-vision matched feature point pairs into a plurality of categories according to the clustering result;
calculating, with the IMU pre-integration algorithm, the IMU pose estimation result of the left eye and the right eye over the time period between the current frame and the previous frame, the pose estimation result including the inter-frame pose between the current frame and the previous frame;
performing EPNP pose estimation and map point estimation on the motion-vision matched feature point pairs of each category, respectively, to obtain EPNP pose estimation results and map point estimation results;
selecting the EPNP pose estimation result closest to the IMU pose estimation result as the best EPNP pose estimation result and map point estimation result, and recording the valid matched feature point pairs corresponding to them;
selecting, from the shared matched feature point pairs, the feature point pairs corresponding to the valid matched feature point pairs, and performing Kalman-filter fusion of the stereo depth estimates of these feature point pairs with the depth values estimated by EPNP, to obtain the estimated pose and the estimated map points.
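For reference, the pre-integrated quantities named in claim 2 correspond to the standard continuous-time IMU pre-integration. The LaTeX below states that commonly used form, consistent with the symbols defined in the claim; it is an assumed reconstruction, since the original formula images are not reproduced on this page and the exact printed expressions may differ.

\[
\begin{aligned}
\alpha^{c_k}_{c_{k+1}} &= \iint_{t\in[t_k,\,t_{k+1}]} R^{c_k}_{t}\left(a_t - b_{a_t} - n_a\right)\,\mathrm{d}t^{2},\\
\beta^{c_k}_{c_{k+1}}  &= \int_{t\in[t_k,\,t_{k+1}]} R^{c_k}_{t}\left(a_t - b_{a_t} - n_a\right)\,\mathrm{d}t,\\
\gamma^{c_k}_{c_{k+1}} &= \int_{t\in[t_k,\,t_{k+1}]} \tfrac{1}{2}\,\Omega\!\left(w_t - b_{w_t} - n_w\right)\gamma^{c_k}_{t}\,\mathrm{d}t,
\end{aligned}
\]

where \(\alpha\), \(\beta\) and \(\gamma\) are the displacement, velocity and rotation-quaternion increments expressed in the camera frame of frame \(k\), \(R^{c_k}_{t}\) is the rotation of time \(t\) relative to that frame, and the gravity vector \(g_w\) and the interval \(\Delta t_k\) enter when these increments are related to the world-frame position and velocity states.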
CN201910634021.1A — 2019-07-12 — 2019-07-12 — Binocular vision localization method and system in dynamic environment — Active — CN110490900B (en)

Priority Applications (1)

Application Number: CN201910634021.1A — Priority Date: 2019-07-12 — Filing Date: 2019-07-12 — Title: Binocular vision localization method and system in dynamic environment

Publications (2)

Publication Number: CN110490900A — Publication Date: 2019-11-22
Publication Number: CN110490900B (en) — Publication Date: 2022-04-19

Family ID: 68546103

Country Status (1)

Country: CN — CN110490900B (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number — Priority date — Publication date — Assignee — Title
CN105469405A (en)* — 2015-11-26 — 2016-04-06 — Tsinghua University — Visual ranging-based simultaneous localization and map construction method
CN105953796A (en)* — 2016-05-23 — 2016-09-21 — Beijing Baofeng Mojing Technology Co., Ltd. — Stable motion tracking method and stable motion tracking device based on integration of simple camera and IMU (inertial measurement unit) of smart cellphone
CN107193279A (en)* — 2017-05-09 — 2017-09-22 — Fudan University — Robot localization and map structuring system based on monocular vision and IMU information
CN109166149A (en)* — 2018-08-13 — 2019-01-08 — Wuhan University — A kind of positioning and three-dimensional wire-frame method for reconstructing and system of fusion binocular camera and IMU
CN109307508A (en)* — 2018-08-29 — 2019-02-05 — Hefei Institutes of Physical Science, Chinese Academy of Sciences — A Panoramic Inertial Navigation SLAM Method Based on Multiple Keyframes
CN109465832A (en)* — 2018-12-18 — 2019-03-15 — Harbin Institute of Technology (Shenzhen) — High-precision vision and IMU tight fusion positioning method and system
CN109631855A (en)* — 2019-01-25 — 2019-04-16 — Xidian University — High-precision vehicle positioning method based on ORB-SLAM
CN109993113A (en)* — 2019-03-29 — 2019-07-09 — Northeastern University — A Pose Estimation Method Based on RGB-D and IMU Information Fusion

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
CADENA, C.: "Simultaneous localization and mapping: Present, future, and the robust-perception age", IEEE Transactions on Robotics *
YAO Erliang: "Simultaneous localization and mapping algorithm for robots based on Vision-IMU", Chinese Journal of Scientific Instrument *
XU Kuan: "Research on binocular visual SLAM fusing IMU information", China Masters' Theses Full-text Database, Information Science and Technology *

Also Published As

Publication Number: CN110490900B (en) — Publication Date: 2022-04-19

Legal Events

Code — Title
PB01 — Publication
SE01 — Entry into force of request for substantive examination
GR01 — Patent grant
