



Technical Field

The invention belongs to the technical field of robot simultaneous localization and mapping (SLAM), and in particular relates to a method for designing a visual odometry for a mobile robot in dynamic scenes.

Background Art

Market demand for mobile robot functionality continues to grow, and related research has become a hot topic in recent years with broad development prospects. For a mobile robot, autonomous navigation is an important foundation for realizing advanced application functions and algorithms, while the robot's ability to estimate its own pose and to explore and perceive unknown environments is in turn the prerequisite for autonomous navigation and path planning. Therefore, Simultaneous Localization and Mapping (SLAM) technology, which addresses these two key problems, has substantial scientific research value and practical application significance.

With the improvement of hardware computing power and the development of computer vision, visual SLAM systems have overcome many difficulties; through the efforts of many researchers, solid results have been achieved and relatively mature system frameworks have been formed. However, some problems that affect the final localization and mapping results remain in practical applications and still require further optimization. Most existing solutions are based on the assumption of a stable, static environment, but real environments contain many uncontrollable factors such as dynamic obstacles, illumination changes, and mutual occlusion between objects, which lead to large deviations in the final results; in complex environments, both the localization and mapping accuracy and the robustness of the robot are relatively poor. To remove this limitation on application scenarios, the present invention optimizes and improves the relatively mature visual SLAM framework ORB-SLAM2 and proposes a visual odometry design method that combines semantic information and geometric constraints to address the dynamic-scene problem.
Summary of the Invention

The purpose of the present invention is to overcome the shortcomings and deficiencies of existing SLAM visual odometry schemes and to propose a visual odometry design method for mobile robots in dynamic scenes that combines semantic information and geometric constraints, with high mapping accuracy, strong robustness, and wide adaptability.

The technical scheme for realizing the object of the present invention is a visual odometry design method based on ORB-SLAM2 for dynamic scenes, comprising the following steps:

Step S1: an Intel RealSense depth camera acquires real-time image information, which is preprocessed by operations such as grayscale conversion;

Step S2: image ORB feature points are extracted using an adaptive-threshold method based on grid division;

Step S3: a YOLACT network is built and trained on the MS_COCO dataset, and instance segmentation is then performed on the image to obtain image semantic information;

Step S4: motion consistency detection is performed by combining the image semantic information with the L-K optical flow method, and dynamic feature points are coarsely filtered out;

Step S5: the feature points of two image frames are matched based on the PROSAC method, and the fundamental matrix F is estimated;

Step S6: the epipolar distance is computed from the fundamental matrix F to further finely filter out dynamic feature points, the initial pose of the robot is estimated from the selected keyframes, and the result is optimized by Bundle Adjustment.
Further, step S1 includes the following steps:

Step S11: an Intel RealSense D415 depth camera, which can simultaneously acquire image depth information and color information, is selected as the visual sensor. To speed up system computation, the acquired real-time images are preprocessed before feature point extraction, including operations such as noise removal and grayscale conversion.

Further, step S2 includes the following steps:

Step S21: to guarantee the scale invariance of the feature points, so that the same image can still be matched to the corresponding feature points after scaling, a scale pyramid is first constructed for the input image and ORB feature points are computed at each scale. The pyramid takes the original image acquired by the camera as level 0 and shrinks it level by level according to a scale factor, up to the top of the pyramid.

Step S22: to ensure that the extracted feature points are distributed more uniformly and reasonably, so that more comprehensive information can be obtained, each pyramid level is divided into a grid with a certain number of rows and columns, and the number of feature points to be pre-extracted in each cell and the initial FAST corner extraction threshold are set.

Step S23: after the first pre-extraction is completed with the initial threshold, if the number of feature points actually extracted in a cell is smaller than the set pre-extraction number, the threshold is changed and extraction continues; the process is repeated until the adaptive extraction of feature points in the cell is completed.
Further, step S3 includes the following steps:

Step S31: the YOLACT backbone is constructed from ResNet101 convolution modules and is mainly responsible for feature extraction from the image;

Step S32: a Feature Pyramid Network (FPN) is constructed to generate multi-scale feature maps, ensuring that target objects of different sizes can be detected;

Step S33: a Protonet branch is constructed to generate prototype masks and extract the important parts of the image to be processed;

Step S34: a Prediction Head branch is constructed to generate mask coefficients; a shared convolutional network is used for better real-time segmentation, and this step is performed in parallel with S33;

Step S35: the YOLACT network is trained using 18 common categories of indoor household objects from the MS_COCO (Microsoft Common Objects in Context) dataset as training samples.
Further, step S4 includes the following steps:

Step S41: the pixel velocity of a dynamic object is tracked and computed by the L-K optical flow method, and the dynamics of the object are analyzed by combining the computation result with the semantic information. If the object moves relative to the background, the feature points belonging to that object in the current frame are removed; if the change of the object's velocity vector relative to the previous frame is below the threshold, the associated feature points are retained. This completes the coarse filtering of dynamic feature points.
Further, step S5 includes the following steps:

Step S51: the minimum Euclidean distance and the evaluation function value of each feature point pair between the two frames are computed, and the feature points are sorted in descending order of the evaluation function value. Every eight feature points form a group; the quality sum of each group is computed and the groups are sorted, and the eight matching point pairs with the highest matching quality are selected to compute the fundamental matrix F;

Step S52: after the above eight matching point pairs are removed from the subset, the corresponding projection points of the remaining feature points in the subset are computed from the fundamental matrix;

Step S53: the error between each remaining feature point and its projection point is computed; if it is smaller than the set value, the point is regarded as an inlier. After the number of inliers is updated, the fundamental matrix F is recomputed and new inliers are obtained. If the number of iterations does not exceed the maximum value, F and the set of inliers are returned; otherwise, model estimation fails;

Step S54: the epipolar distance is computed; if it exceeds the set threshold, the error is large and there are dynamic objects nearby that need to be removed.
Further, step S6 includes the following steps:

Step S61: any point on the target object in the world coordinate system is expressed as a linearly weighted combination of four non-coplanar virtual control points and represented in the camera coordinate system, converting the problem into a 3D-3D one;

Step S62: the camera pose parameters, namely the rotation parameter R and the translation parameter t, are solved with the ICP (Iterative Closest Point) algorithm.

Step S63: due to the influence of observation noise, the estimation result may contain errors, so Bundle Adjustment (BA) is used to optimize the pose computation result. BA is a common nonlinear optimization method that optimizes the camera pose and the spatial point positions simultaneously; its main idea is to sum the errors to construct a least-squares problem and to search for the optimal camera pose that minimizes the error term.
Compared with the prior art, the beneficial effects of the present invention are:

(1) The visual odometry scheme proposed by the present invention is more robust. When the robot operates in a complex environment, the proposed scheme can largely avoid the influence of dynamic objects, retain static feature points, and effectively increase the accuracy of robot localization and mapping.

(2) Since the ORB feature extraction algorithm tends to produce densely clustered feature points, the present invention proposes a grid-based adaptive feature extraction method, which yields more uniformly distributed feature points and more comprehensive image information.

(3) To address the problem that the original system cannot obtain environmental semantic information, the present invention studies an instance segmentation method based on YOLACT. To obtain environmental semantic information, an instance segmentation thread is added to the original framework to segment the RGB images. YOLACT can provide semantic annotation of common household objects; besides being used in the visual odometry part, it can also be used for the subsequent construction of semantic maps.
Brief Description of the Drawings

FIG. 1 is an overall block diagram of the visual odometry designed by the present invention.

FIG. 2 is a flow chart of the tracking thread in the visual odometry designed by the present invention.

FIG. 3 is a schematic diagram of the grid-based adaptive-threshold feature extraction method proposed by the present invention.

FIG. 4 is a structural diagram of the YOLACT network constructed by the present invention.
Detailed Description of the Embodiments

To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and technical solutions.

The purpose of the present invention is to overcome the shortcomings and deficiencies of existing SLAM visual odometry schemes and to propose a visual odometry method for mobile robots based on ORB-SLAM2 for dynamic scenes. The method specifically comprises: acquiring real-time image information with an Intel RealSense depth camera and preprocessing it by operations such as grayscale conversion; extracting image feature points more comprehensively with an adaptive-threshold method built on the ORB (Oriented FAST and Rotated BRIEF) algorithm; training the YOLACT network with the MS_COCO dataset as samples and performing instance segmentation on the image to obtain image semantic information; coarsely filtering out dynamic feature points by combining the image semantic information with the L-K optical flow method; estimating the fundamental matrix F based on the progressive sample consensus (PROSAC) algorithm, then computing the epipolar distance and finely filtering out dynamic feature points; and finally selecting the filtered keyframes as the input of the local mapping thread. The key point of the present invention is to filter out the dynamic feature points of keyframes by combining environmental semantic information with geometric constraints, thereby effectively avoiding the influence of dynamic objects in the surrounding environment and improving the accuracy of robot localization and mapping in dynamic environments.
The specific steps of the method are described in detail below.

With reference to FIG. 1 and FIG. 2, a visual odometry method for mobile robots based on ORB-SLAM2 for dynamic scenes includes the following.

Step S1: acquire and preprocess images, which specifically includes the following steps.

Considering the requirements of the SLAM system, the Intel RealSense D415, which can simultaneously acquire image depth information and color information, is selected as the visual sensor. The Intel RealSense D415 depth camera is used to collect image information; to simplify system computation, the acquired real-time images are preprocessed before feature point extraction, including operations such as noise removal and grayscale conversion.
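To make the acquisition and preprocessing step concrete, a minimal sketch is given below, assuming the pyrealsense2 and OpenCV Python bindings; the stream resolution, frame rate, and Gaussian smoothing kernel are illustrative choices, not values fixed by the invention.

```python
import cv2
import numpy as np
import pyrealsense2 as rs

# Configure the RealSense D415 to stream color and depth frames.
pipeline = rs.pipeline()
config = rs.config()
config.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 30)
config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
pipeline.start(config)

try:
    frames = pipeline.wait_for_frames()
    color = np.asanyarray(frames.get_color_frame().get_data())
    depth = np.asanyarray(frames.get_depth_frame().get_data())

    # Preprocessing before feature extraction: grayscale conversion and mild denoising.
    gray = cv2.cvtColor(color, cv2.COLOR_BGR2GRAY)
    gray = cv2.GaussianBlur(gray, (3, 3), 0)
finally:
    pipeline.stop()
```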
Step S2: extract image feature points, which specifically includes the following steps.

Step S21: to guarantee the scale invariance of the feature points, so that the same image can still be matched to the corresponding feature points after scaling, a scale pyramid is first constructed for the input image and ORB feature points are computed at each scale. The pyramid takes the original image acquired by the camera as level 0 and shrinks it level by level according to the scale factor, up to the top of the pyramid. With the total number of pyramid levels nlevels set to 8, the reduced image is:
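From the variable definitions that follow, the intended relation appears to be the geometric scaling used in ORB-SLAM2:

$$I_k = \frac{I}{\mathrm{scaleFactor}^{\,k}}$$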
where I_k is the image size of the k-th pyramid level, I is the original image acquired by the camera, scaleFactor is the scaling factor, set to 1.2, and k denotes the pyramid level, with value range [1, nlevels-1];

The number of feature points extracted on each pyramid level is:
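From the variable definitions that follow, the intended allocation appears to be the geometric distribution used in ORB-SLAM2:

$$\mathrm{DesiredC}_i = \frac{N\,(1-\mathrm{InvSF})}{1-\mathrm{InvSF}^{\,n}}\cdot \mathrm{InvSF}^{\,i}$$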
where N is the total number of ORB feature points to be extracted, DesiredC_i is the desired number of feature points at the i-th pyramid level, the value range of i is [0, n-1], n is the total number of pyramid levels, usually set to 8, and InvSF is the reciprocal of the scale factor scaleFactor;

Step S22: to ensure that the extracted feature points are distributed more uniformly and reasonably, so that more comprehensive information can be obtained, an adaptive-threshold feature point extraction algorithm based on grid division is proposed, illustrated schematically in FIG. 3. Each pyramid level is divided into a grid with Cols_i columns and Rows_i rows, computed as follows:

where t is the grid division coefficient, and the total number of cells increases as t decreases; IRat_i denotes the ratio of grid rows to columns for the i-th level image, i.e. IRat_i = Rows/Cols.

The number of feature points to be pre-extracted in each cell, cDesC_i, and the initial FAST corner extraction threshold, iniTh, are then set and computed as follows:

where I(x) is the gray value of a pixel in the image, κ is the average gray value of all pixels, and sp is the total number of pixels in the image.

Step S23: after the first pre-extraction is completed with the initial threshold, if the number of feature points actually extracted in a cell is smaller than the set pre-extraction number, the threshold is changed and extraction continues; the process is repeated until the adaptive extraction of feature points in the cell is completed.
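A minimal sketch of this adaptive grid extraction is given below using OpenCV's FAST detector; the cell size, per-cell target, initial threshold, and relaxation step are illustrative assumptions standing in for the patent's cDesC_i and iniTh computations.

```python
import cv2

def adaptive_grid_fast(gray, cell=40, per_cell=5, ini_th=20, min_th=7, step=5):
    """Divide the image into cells and relax the FAST threshold per cell until
    at least `per_cell` corners are found (or the floor `min_th` is reached)."""
    keypoints = []
    h, w = gray.shape
    for y0 in range(0, h, cell):
        for x0 in range(0, w, cell):
            patch = gray[y0:y0 + cell, x0:x0 + cell]
            th, kps = ini_th, []
            while th >= min_th:
                kps = cv2.FastFeatureDetector_create(threshold=th).detect(patch)
                if len(kps) >= per_cell:
                    break
                th -= step                      # lower the threshold and retry
            for kp in kps:
                # shift cell-local coordinates back into full-image coordinates
                keypoints.append(cv2.KeyPoint(kp.pt[0] + x0, kp.pt[1] + y0, kp.size))
    return keypoints
```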
Step S3: obtain image semantic information, which specifically includes the following steps.

Step S31: the overall structure of the YOLACT network is shown in FIG. 4. The backbone uses ResNet101, which has five convolution modules, conv1 and conv2_x through conv5_x, corresponding to C1 through C5 in FIG. 4; the backbone is mainly responsible for feature extraction from the image;

Step S32: P3 through P7 in FIG. 4 form the FPN (Feature Pyramid Network), which generates multi-scale feature maps and ensures that target objects of different sizes can be detected. First, P5 is obtained from C5 through a convolution layer; then the P5 feature map is doubled in size by bilinear interpolation and added to the convolved C4 to obtain P4, and P3 is obtained by the same operation. In addition to this top-down pathway, P6 is generated by convolving and downsampling P5, and P7 is obtained by applying the same operation to P6. This completes the FPN, producing feature maps at different scales with richer features, which is more conducive to segmenting target objects of different sizes;
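For illustration, a minimal PyTorch sketch of this FPN topology is shown below; the C3-C5 channel widths (512/1024/2048, the usual ResNet101 values) and the 256-channel outputs are assumptions rather than parameters specified by the invention.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FPN(nn.Module):
    """Top-down FPN: P5 from C5; P4/P3 by bilinear upsampling plus lateral addition;
    P6/P7 by strided convolutions stacked on P5, as described for YOLACT."""
    def __init__(self, c_channels=(512, 1024, 2048), out_channels=256):
        super().__init__()
        self.lateral = nn.ModuleList(nn.Conv2d(c, out_channels, 1) for c in c_channels)
        self.smooth = nn.ModuleList(nn.Conv2d(out_channels, out_channels, 3, padding=1)
                                    for _ in c_channels)
        self.p6_conv = nn.Conv2d(out_channels, out_channels, 3, stride=2, padding=1)
        self.p7_conv = nn.Conv2d(out_channels, out_channels, 3, stride=2, padding=1)

    def forward(self, c3, c4, c5):
        p5 = self.lateral[2](c5)
        p4 = self.lateral[1](c4) + F.interpolate(p5, size=c4.shape[-2:],
                                                 mode="bilinear", align_corners=False)
        p3 = self.lateral[0](c3) + F.interpolate(p4, size=c3.shape[-2:],
                                                 mode="bilinear", align_corners=False)
        p3, p4, p5 = (s(p) for s, p in zip(self.smooth, (p3, p4, p5)))
        p6 = self.p6_conv(p5)
        p7 = self.p7_conv(F.relu(p6))
        return p3, p4, p5, p6, p7

# Example with dummy ResNet101-like activations for a 550x550 input.
c3, c4, c5 = (torch.randn(1, c, s, s) for c, s in [(512, 69), (1024, 35), (2048, 18)])
feats = FPN()(c3, c4, c5)
```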
Step S33: a Protonet branch is constructed to generate prototype masks. Its input is P3, its middle consists of several convolution layers, and its final output has k channels, each representing a prototype mask of size 138×138, through which the important parts of the image to be processed can be extracted;

Step S34: a Prediction Head branch is constructed to generate mask coefficients. A shared convolutional network is used for better real-time segmentation, and this step is performed in parallel with S33;

Step S35: the YOLACT network is trained using 18 common categories of indoor household objects from the MS_COCO (Microsoft Common Objects in Context) dataset as training samples.

MS_COCO is an open-source dataset released by Microsoft in 2014. It contains 91 categories of objects and scenes that are common in daily life, and stores three annotation types in JSON files: object instances, object keypoints, and image captions. Eighteen common categories of indoor household objects are selected from it as training samples.
Step S4: perform motion consistency detection, which specifically includes the following steps.

Step S41: the pixel velocity of a dynamic object is tracked and computed by the L-K optical flow method, and the dynamics of the object are analyzed by combining the computation result with the semantic information. If the object moves relative to the background, the feature points belonging to that object in the current frame are removed; if the object's velocity vector changes little from the previous frame, the associated feature points are retained. This completes the coarse filtering of dynamic feature points.
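A simplified sketch of this coarse filtering is given below, assuming OpenCV's pyramidal L-K tracker and a binary instance mask from the segmentation thread; summarizing per-object motion by the median flow inside the mask and comparing it to the median background flow is an illustrative choice, as is the threshold value.

```python
import cv2
import numpy as np

def coarse_filter_dynamic(prev_gray, cur_gray, keypoints, instance_mask, flow_th=1.5):
    """Drop keypoints on a segmented instance whose median L-K flow deviates from
    the background flow by more than `flow_th` pixels."""
    pts = np.float32([kp.pt for kp in keypoints]).reshape(-1, 1, 2)
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, cur_gray, pts, None)
    flow = (nxt - pts).reshape(-1, 2)
    tracked = status.ravel() == 1
    on_obj = np.array([instance_mask[int(kp.pt[1]), int(kp.pt[0])] > 0 for kp in keypoints])

    if not on_obj.any() or not (tracked & ~on_obj).any():
        return [kp for kp, keep in zip(keypoints, tracked) if keep]

    bg_flow = np.median(flow[tracked & ~on_obj], axis=0)
    obj_flow = np.median(flow[tracked & on_obj], axis=0)

    # Relative motion between the instance and the background -> treat it as dynamic.
    if np.linalg.norm(obj_flow - bg_flow) > flow_th:
        keep_mask = tracked & ~on_obj
    else:
        keep_mask = tracked
    return [kp for kp, keep in zip(keypoints, keep_mask) if keep]
```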
Step S5: match feature points, which specifically includes the following steps.

Step S51: for each feature point pair between the two frames, the minimum Euclidean distance d_min, the second-smallest Euclidean distance d_min2, and the evaluation function value q(u_x) are computed; the evaluation function is calculated as follows:

Its meaning is that if the set u_N contains N data points, the data points in the set satisfy:

Qualitatively, data points with a larger evaluation function value q(u_x) are ranked earlier in the set. The feature points are sorted in descending order of the evaluation function value; every eight feature points form a group, the quality sum of each group is computed and the groups are sorted, and the eight matching point pairs with the highest matching quality are selected to compute the fundamental matrix F;

Step S52: after the above eight matching point pairs are removed from the subset, the corresponding projection points of the remaining feature points in the subset are computed from the fundamental matrix F;

Step S53: the error e between each remaining feature point and its projection point is computed; if it is smaller than the maximum error value ε, the point is regarded as an inlier. After the number of inliers is updated, the fundamental matrix F is recomputed and new inliers are obtained. If the number of iterations does not exceed the maximum value, the fundamental matrix F and the set of inliers are returned; otherwise, model estimation fails;
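The sketch below illustrates the PROSAC-style procedure of steps S51-S53 with OpenCV; the Lowe distance ratio is used here as a stand-in for the patent's evaluation function q(u_x), and the progressive enlargement of the sampling pool is a simplified rendering of quality-ordered sampling.

```python
import cv2
import numpy as np

def prosac_like_fundamental(kp1, des1, kp2, des2, err_th=1.0, max_iter=200):
    """Estimate F from quality-ordered matches: draw 8-point hypotheses from the
    best-ranked pairs first and keep the hypothesis with the most inliers."""
    knn = cv2.BFMatcher(cv2.NORM_HAMMING).knnMatch(des1, des2, k=2)
    scored = []
    for pair in knn:
        if len(pair) == 2:
            m, n = pair
            scored.append((m.distance / max(n.distance, 1e-6), m))  # smaller ratio = better
    scored.sort(key=lambda t: t[0])
    p1 = np.float32([kp1[m.queryIdx].pt for _, m in scored])
    p2 = np.float32([kp2[m.trainIdx].pt for _, m in scored])

    best_F, best_in = None, None
    ones = np.ones((len(p1), 1), np.float32)
    x1, x2 = np.hstack([p1, ones]), np.hstack([p2, ones])
    for it in range(max_iter):
        pool = min(8 + it, len(p1))              # progressively enlarge the sampling pool
        idx = np.random.choice(pool, 8, replace=False)
        F, _ = cv2.findFundamentalMat(p1[idx], p2[idx], cv2.FM_8POINT)
        if F is None:
            continue
        l2 = x1 @ F.T                            # epipolar lines in the second image
        d = np.abs(np.sum(x2 * l2, axis=1)) / np.linalg.norm(l2[:, :2], axis=1)
        inliers = d < err_th
        if best_in is None or inliers.sum() > best_in.sum():
            best_F, best_in = F, inliers
    if best_in is not None and best_in.sum() >= 8:  # refine on all inliers
        best_F, _ = cv2.findFundamentalMat(p1[best_in], p2[best_in], cv2.FM_8POINT)
    return best_F, best_in
```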
Step S54: the epipolar distance is computed from the fundamental matrix F. Let p1 be the projection point on the camera imaging plane and p2 its matching point on the imaging plane of the current frame; the distance from the normalized coordinate x2 of p2 to the epipolar line l is then:
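With the epipolar line written as l = F x_1 = [A B C]^T, the distance appears to take the standard point-to-line form:

$$D = \frac{\lvert x_2^{\top} F x_1 \rvert}{\sqrt{A^2 + B^2}}$$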
where x1 is the normalized coordinate of p1, and [A B C]^T is the vector of coefficients of the epipolar line l.

If the computed result exceeds the set threshold θ, the error is large and there are dynamic objects around the corresponding three-dimensional world point that need to be removed. The threshold θ is calculated as follows:
Step S6: estimate the initial pose of the robot, which specifically includes the following steps.

Step S61: any point on the target object in the world coordinate system is expressed as a linearly weighted combination of four non-coplanar virtual control points and represented in the camera coordinate system, yielding a set of matched 3D points and thus converting the problem into a 3D-3D one;

Step S62: the camera pose parameters, namely the rotation parameter R and the translation parameter t that minimize the sum of squared errors, are solved with the ICP (Iterative Closest Point) algorithm.
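The 3D-3D alignment performed in this step once correspondences are fixed can be written in closed form; the SVD-based (Kabsch/Umeyama-style) sketch below minimizes the sum of squared point-to-point errors and is shown purely as an illustration, with assumed variable names.

```python
import numpy as np

def align_3d_3d(P_cam, P_world):
    """Solve R, t minimizing sum ||P_cam_i - (R @ P_world_i + t)||^2
    for matched 3D points; P_cam, P_world are (N, 3) arrays."""
    mu_c, mu_w = P_cam.mean(axis=0), P_world.mean(axis=0)
    Qc, Qw = P_cam - mu_c, P_world - mu_w
    H = Qw.T @ Qc                                    # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # guard against reflection
    R = Vt.T @ S @ U.T
    t = mu_c - R @ mu_w
    return R, t
```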
Step S63: due to the influence of observation noise, the estimation result may contain errors, so Bundle Adjustment (BA) is used to optimize the pose computation result. BA is a common nonlinear optimization method that optimizes the camera pose and the spatial point positions simultaneously; its main idea is to sum the errors to construct a least-squares problem and to search for the optimal camera pose that minimizes the error term.
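A much simplified, motion-only sketch of this refinement is shown below, using SciPy's least-squares solver and OpenCV's projection routine; a full BA would also optimize the 3D point positions, and the robust Huber loss is an assumption.

```python
import cv2
import numpy as np
from scipy.optimize import least_squares

def refine_pose_ba(points_3d, points_2d, K, rvec0, tvec0):
    """Refine one camera pose by minimizing total reprojection error (map points fixed)."""
    def residual(x):
        rvec, tvec = x[:3].reshape(3, 1), x[3:].reshape(3, 1)
        proj, _ = cv2.projectPoints(points_3d, rvec, tvec, K, None)
        return (proj.reshape(-1, 2) - points_2d).ravel()

    x0 = np.hstack([rvec0.ravel(), tvec0.ravel()])
    sol = least_squares(residual, x0, loss="huber", f_scale=1.0)  # robust to outliers
    return sol.x[:3].reshape(3, 1), sol.x[3:].reshape(3, 1)

# Usage sketch: rvec, tvec = refine_pose_ba(pts3d, pts2d, K, cv2.Rodrigues(R)[0], t)
```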
The above-described embodiments are only specific implementations of the present invention, used to illustrate the technical solutions of the present invention rather than to limit them, and the protection scope of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that any person familiar with the technical field may, within the technical scope disclosed by the present invention, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes, or make equivalent substitutions for some of the technical features therein; such modifications, changes, or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and shall all be covered within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.