CN114612494B - A design method for visual odometry of mobile robots in dynamic scenes - Google Patents

A design method for visual odometry of mobile robots in dynamic scenes

Info

Publication number
CN114612494B
CN114612494B
Authority
CN
China
Prior art keywords
feature points
image
dynamic
points
mobile robot
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210242660.5A
Other languages
Chinese (zh)
Other versions
CN114612494A (en)
Inventor
樊卫华
刘佳祺
宋辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Science and Technology
Original Assignee
Nanjing University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Science and Technology
Priority to CN202210242660.5A
Publication of CN114612494A
Application granted
Publication of CN114612494B
Status: Active
Anticipated expiration

Abstract

Translated from Chinese


The present invention discloses a design method for mobile robot visual odometry in dynamic scenes, comprising: acquiring real-time image information with an Intel RealSense depth camera and preprocessing it; extracting image feature points more comprehensively with an adaptive threshold method based on the ORB algorithm; training a YOLACT network on the MS_COCO dataset and performing instance segmentation on the image to obtain image semantic information; coarsely filtering out dynamic feature points by combining the image semantic information with the L-K optical flow method; estimating the fundamental matrix F with a progressive sample consensus (PROSAC) algorithm, then computing epipolar distances and finely filtering out dynamic feature points; and finally selecting the filtered keyframes as the input of the local mapping thread. The invention combines environmental semantic information with geometric constraints to filter out dynamic feature points from keyframes, effectively avoiding the influence of dynamic objects in the surrounding environment and improving the accuracy of robot localization and mapping in dynamic environments.

Description

Design method for mobile robot visual odometry in dynamic scenes
Technical Field
The invention belongs to the technical field of robot simultaneous localization and mapping (SLAM), and particularly relates to a design method for mobile robot visual odometry in dynamic scenes.
Background
Market demand for mobile robot functionality keeps growing, the related research has become a hot topic in recent years, and mobile robots have broad development prospects. For a mobile robot, autonomous navigation capability is an important basis for realizing various advanced application functions and algorithmic techniques, and the robot's ability to localize its own pose and to explore unknown environments is a precondition for autonomous navigation and path planning. Simultaneous Localization and Mapping (SLAM), the technology that addresses these two key problems, therefore has real scientific research value and practical application significance.
With the improvement of hardware computing power and the development of computer vision, visual SLAM systems have overcome many difficulties; through the efforts of many researchers, certain research results have been achieved and mature system frameworks have been formed. However, problems that affect the final localization and mapping results remain in practical applications and still need further optimization. Most existing schemes assume a stable, static environment, but real environments contain many uncontrollable factors such as dynamic obstacles, illumination changes, and mutual overlapping and occlusion of objects, which cause large deviations in the final result, so the localization and mapping accuracy and robustness of the robot in such complex environments are relatively poor. To remove this limitation on application scenarios, the invention optimizes and improves ORB-SLAM2, a relatively mature visual SLAM framework, and proposes a visual odometry design method combining semantic information and geometric constraints for dynamic scenes.
Disclosure of Invention
The invention aims to overcome the defects and shortcomings of conventional SLAM visual odometry schemes and provides a design method for mobile robot visual odometry in dynamic scenes that combines semantic information and geometric constraints; the method offers high mapping accuracy, strong robustness and wide adaptability.
The technical scheme that realizes the purpose of the invention is an ORB-SLAM2-based visual odometry design method for dynamic scenes, comprising the following steps:
Step S1: an Intel RealSense depth camera acquires real-time image information, which is preprocessed (e.g. grayscale conversion);
Step S2: extracting image ORB feature points using an adaptive threshold method based on grid division;
Step S3: constructing a YOLACT network, training it with the MS_COCO dataset as samples, and performing instance segmentation on the image to obtain image semantic information;
Step S4: performing motion consistency detection by combining image semantic information with the L-K optical flow method, and coarsely filtering out dynamic feature points;
Step S5: matching the feature points of two image frames based on the PROSAC method and estimating the fundamental matrix F;
Step S6: computing epipolar distances from the fundamental matrix F, finely filtering out dynamic feature points, estimating the initial pose of the robot from the selected keyframes, and optimizing the result with Bundle Adjustment.
Further, step S1 comprises the steps of:
Step S11: the vision sensor part selects an Intel RealSense D415 depth camera capable of simultaneously acquiring image depth information and color information; before feature points are extracted, the acquired real-time image is preprocessed to speed up system operation, including noise removal, grayscale conversion and similar operations.
Further, step S2 includes the steps of:
Step S21: to ensure the scale invariance of the feature points, so that the same image can still be matched to corresponding feature points after scaling, a scale pyramid is first constructed on the input image and ORB feature points are computed in images of different scales. The pyramid takes the original camera image as layer 0 and shrinks it layer by layer according to the scale factor until the top layer is reached.
Step S22: to make the distribution of extracted feature points more uniform and reasonable, so that more comprehensive information can be obtained, each pyramid level image is divided into a grid with a certain number of rows and columns, and the number of feature points to pre-extract in each grid cell and an initial FAST corner extraction threshold are set;
Step S23: after the first feature point pre-extraction with the initial threshold, if the number of feature points actually extracted in a grid cell is smaller than the set pre-extraction number, the threshold is changed and extraction continues; this process is repeated until the adaptive extraction of feature points within the grid is complete.
Further, step S3 includes the steps of:
Step S31: building the YOLACT backbone network with ResNet-101 convolution modules; the backbone is mainly responsible for image feature extraction;
Step S32: building a Feature Pyramid Network (FPN) to generate multi-scale feature maps so that target objects of different sizes can be detected;
Step S33: building a Protonet branch to generate prototype masks and extract the important parts of the image to be processed;
Step S34: building a Prediction Head branch to generate mask coefficients; a shared convolutional network is used for better real-time segmentation, and this step runs in parallel with step S33;
Step S35: training the YOLACT network with 18 common categories of indoor household objects from the MS_COCO (Microsoft Common Objects in Context) dataset as training samples.
Further, step S4 includes the steps of:
Step S41: the pixel velocity of a moving object is tracked and computed with the L-K optical flow method, and the dynamics of the object are analyzed by combining this result with the semantic information. If the object moves relative to the background, the feature points contained in that object in the current frame are removed; if the object's velocity vector changes by less than a threshold compared with the previous frame, the related feature points are retained. This completes the coarse filtering of dynamic feature points.
Further, step S5 includes the steps of:
Step S51: the minimum Euclidean distance and the evaluation function value of each feature point pair between the two frames are calculated, and the feature points are sorted in descending order of evaluation function value. The points are grouped eight at a time, the quality of each group is calculated and ranked, the 8 matching pairs with the highest matching quality are selected, and the fundamental matrix F is computed;
Step S52: after these 8 matching pairs are removed from the subset, the corresponding projection points of the remaining feature points in the subset are computed from the fundamental matrix;
Step S53: the error between each remaining feature point and its projection point is computed; if it is smaller than the set value, the point is identified as an inlier. After updating the inlier count, the fundamental matrix F is recomputed and new inliers are obtained. If the number of iterations does not exceed the maximum, F and the inlier set are returned; otherwise, model estimation fails;
Step S54: the epipolar distance is computed; if it exceeds the set threshold, the error is large and there are nearby dynamic objects that need to be removed.
Further, step S6 includes the steps of:
Step S61: four non-coplanar virtual control points are used to express any point on the target object in the world coordinate system as a linearly weighted combination of coordinates in the camera coordinate system, converting the problem into a 3D-3D one;
Step S62: the camera pose parameters, including the rotation parameter R and the translation parameter t, are obtained with the ICP (Iterative Closest Point) algorithm.
Step S63: because of measurement noise at the observed points, the estimation result may contain errors, so the pose calculation result is further optimized with Bundle Adjustment (BA). BA is a commonly used nonlinear optimization method that simultaneously optimizes the camera pose and the positions of space points; its main idea is to build a least-squares problem by summing the reprojection errors and to find the camera pose that minimizes this error term.
Compared with the prior art, the invention has the beneficial effects that:
(1) The visual odometry scheme provided by the invention is more robust. When the robot is in a complex environment, the scheme can largely avoid the influence of dynamic objects, keep the static feature points, and effectively increase the localization and mapping accuracy of the robot.
(2) To address the tendency of the ORB feature extraction algorithm to produce densely clustered feature points, the invention provides a grid-based adaptive feature extraction method that yields more uniformly distributed feature points and captures more comprehensive image information.
(3) To address the problem that the original system cannot acquire environmental semantic information, the invention investigates a YOLACT-based instance segmentation method. To obtain environmental semantic information, an instance segmentation thread is added to the original framework to segment the RGB image. YOLACT can semantically label common household objects and, besides serving the visual odometry part, can be used to build subsequent semantic maps.
Drawings
Fig. 1 is a general block diagram of the visual odometry of the present invention.
Fig. 2 is a flow chart of the tracking thread in the visual odometry of the present invention.
Fig. 3 is a schematic diagram of the grid-based adaptive-threshold feature extraction method of the present invention.
Fig. 4 is a diagram of a YOLACT network constructed in accordance with the present invention.
Detailed Description
To make the objects, technical schemes and advantages of the present invention more apparent, the invention is described in further detail below with reference to the drawings and the technical scheme.
The invention aims to overcome the defects and shortcomings of conventional SLAM visual odometry schemes and provides an ORB-SLAM2-based mobile robot visual odometry method for dynamic scenes. The method acquires real-time image information with an Intel RealSense depth camera and preprocesses it (e.g. grayscale conversion); extracts image feature points more comprehensively with an adaptive threshold method based on the ORB (Oriented FAST and Rotated BRIEF) algorithm; trains a YOLACT network on the MS_COCO dataset and performs instance segmentation on the image to obtain image semantic information; coarsely filters out dynamic feature points by combining the image semantic information with the L-K optical flow method; estimates the fundamental matrix F with the progressive sample consensus (PROSAC) algorithm, computes epipolar distances and finely filters out the dynamic feature points; and finally selects the filtered keyframes as the input of the local mapping thread. The key point of the invention is to filter out the dynamic feature points of keyframes by combining environmental semantic information with geometric constraints, thereby effectively avoiding the influence of dynamic objects in the surrounding environment and improving the accuracy of robot localization and mapping in dynamic environments.
Specific steps of the method are described in detail below.
Referring to Figs. 1 and 2, an ORB-SLAM2-based mobile robot visual odometry method for dynamic scenes includes the following steps.
Step S1: image acquisition and preprocessing, specifically as follows.
Considering the requirements of the SLAM system, an Intel RealSense D415, which can acquire image depth information and color information simultaneously, is selected as the vision sensor. Image information is acquired with the Intel RealSense D415 depth camera, and the acquired real-time image is preprocessed before feature points are extracted, including noise removal, grayscale conversion and other operations, to keep system operation simple.
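As a minimal sketch of this acquisition and preprocessing stage (an illustration, not the patent's exact implementation), the following Python/OpenCV snippet converts a color frame to grayscale and applies a small Gaussian blur as a stand-in for the unspecified noise-removal operation; the generic VideoCapture source is an assumption, since a RealSense camera would normally be read through pyrealsense2.

```python
import cv2
import numpy as np


def preprocess_frame(color_bgr: np.ndarray) -> np.ndarray:
    """Grayscale conversion plus simple denoising before feature extraction."""
    gray = cv2.cvtColor(color_bgr, cv2.COLOR_BGR2GRAY)
    # The patent only states "noise removal"; a 3x3 Gaussian blur is used here
    # as one simple choice of denoising filter.
    return cv2.GaussianBlur(gray, (3, 3), 0)


if __name__ == "__main__":
    cap = cv2.VideoCapture(0)  # hypothetical source; a RealSense pipeline would replace this
    ok, frame = cap.read()
    if ok:
        gray = preprocess_frame(frame)
        print(gray.shape, gray.dtype)
    cap.release()
```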
Step S2: image feature point extraction, specifically as follows.
Step S21: to ensure the scale invariance of the feature points, so that the same image can still be matched to corresponding feature points after scaling, a scale pyramid is first constructed on the input image and ORB feature points are computed in images of different scales. The original camera image is taken as layer 0 and is shrunk layer by layer according to the scale factor until the top of the pyramid is reached. With the total number of pyramid levels nlevels set to 8, the scaled image is:

Ik = I / scaleFactor^k

where Ik is the image at level k of the pyramid (the original image scaled down by scaleFactor^k), I is the original image acquired by the camera, scaleFactor is the scale factor (set to 1.2), and k is the pyramid level index with value range [1, nlevels-1];
The number of feature points to extract on each pyramid level is:

DesiredCi = N * (1 - InvSF) / (1 - InvSF^n) * InvSF^i

where N is the total number of ORB feature points to extract, DesiredCi is the expected number of feature points at pyramid level i, i ranges over [0, n-1], n is the total number of pyramid levels (generally set to 8), and InvSF is the reciprocal of the scale factor scaleFactor;
Step S22: to make the distribution of extracted feature points more uniform and reasonable, so that more comprehensive information can be obtained, an adaptive-threshold feature point extraction algorithm based on grid division is proposed; a schematic diagram is shown in Fig. 3. Each pyramid level image is divided into a grid with Colsi columns and Rowsi rows, computed as follows:
where t is the grid division coefficient (the total number of grid cells increases as t decreases) and IRati is the row-to-column ratio of the grid on the i-th level image, i.e. IRati = Rowsi / Colsi.
The number cDesCi of feature points to pre-extract in each grid cell and the initial FAST corner extraction threshold iniTh are then set, computed as follows:
where I(x) is the gray value of a pixel in the image, κ is the average gray value of all pixels, and sp is the total number of pixels in the image.
Step S23: after the first feature point pre-extraction with the initial threshold, if the number of feature points actually extracted in a grid cell is smaller than the set pre-extraction number, the threshold is changed and extraction continues; this process is repeated until the adaptive extraction of feature points within the grid is complete.
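The sketch below illustrates the idea of steps S21-S23 under stated assumptions: the per-level budget uses the geometric-series allocation given above, the grid is a fixed 8 x 8 (the patent derives Colsi and Rowsi from t and IRati, whose formula is not reproduced here), and the FAST threshold is lowered by a fixed step whenever a cell yields too few corners. The parameter values are placeholders, not the patent's.

```python
import cv2
import numpy as np


def per_level_counts(total: int, n_levels: int = 8, scale_factor: float = 1.2):
    """Distribute the total feature budget over pyramid levels (geometric series)."""
    inv_sf = 1.0 / scale_factor
    return [int(round(total * (1 - inv_sf) / (1 - inv_sf ** n_levels) * inv_sf ** i))
            for i in range(n_levels)]


def adaptive_grid_fast(gray: np.ndarray, desired: int, grid: int = 8,
                       ini_th: int = 20, min_th: int = 5, step: int = 5):
    """Detect FAST corners cell by cell, relaxing the threshold in sparse cells."""
    h, w = gray.shape
    cell_h, cell_w = h // grid, w // grid
    per_cell = max(1, desired // (grid * grid))
    keypoints = []
    for r in range(grid):
        for c in range(grid):
            cell = gray[r * cell_h:(r + 1) * cell_h, c * cell_w:(c + 1) * cell_w]
            th, kps = ini_th, []
            while th >= min_th:
                kps = cv2.FastFeatureDetector_create(threshold=th).detect(cell)
                if len(kps) >= per_cell:
                    break
                th -= step  # too few corners in this cell: lower the threshold and retry
            for kp in kps[:per_cell]:
                # Shift cell-local coordinates back into full-image coordinates.
                keypoints.append(cv2.KeyPoint(kp.pt[0] + c * cell_w,
                                              kp.pt[1] + r * cell_h, kp.size))
    return keypoints
```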
Step S3: image semantic information acquisition, specifically as follows.
Step S31: the overall structure of the YOLACT network is shown in Fig. 4. The backbone uses ResNet-101 with five convolution modules, conv1 and conv2_x through conv5_x, corresponding to C1 through C5 in Fig. 4; the backbone is mainly responsible for image feature extraction.
Step S32: P3 to P7 in Fig. 4 form the FPN (Feature Pyramid Network), which generates multi-scale feature maps so that target objects of different sizes can be detected. First, P5 is obtained from C5 through a convolution layer; P5 is then upsampled by a factor of two with bilinear interpolation and added to the convolved C4 to obtain P4, and P3 is obtained by the same operation. In the downward direction, P6 is generated by convolving and downsampling P5, and P7 is obtained by applying the same operation to P6. This completes the FPN: feature maps of different scales are generated, the features are richer, and target objects of different sizes are segmented more easily.
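As an illustration of this top-down construction, the following PyTorch module builds P3-P7 from backbone features C3-C5 in the way just described: 1x1 lateral convolutions, bilinear x2 upsampling with element-wise addition, and two stride-2 convolutions for P6 and P7. This is a minimal sketch; the channel counts, feature-map sizes and layer details are assumptions rather than the exact YOLACT configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleFPN(nn.Module):
    """Minimal FPN in the style described above: C3-C5 in, P3-P7 out."""

    def __init__(self, c3_ch: int, c4_ch: int, c5_ch: int, out_ch: int = 256):
        super().__init__()
        self.lat5 = nn.Conv2d(c5_ch, out_ch, kernel_size=1)  # C5 -> P5
        self.lat4 = nn.Conv2d(c4_ch, out_ch, kernel_size=1)  # lateral conv on C4
        self.lat3 = nn.Conv2d(c3_ch, out_ch, kernel_size=1)  # lateral conv on C3
        # Extra levels: P6 from P5 and P7 from P6, each by a stride-2 convolution.
        self.down6 = nn.Conv2d(out_ch, out_ch, kernel_size=3, stride=2, padding=1)
        self.down7 = nn.Conv2d(out_ch, out_ch, kernel_size=3, stride=2, padding=1)

    def forward(self, c3, c4, c5):
        p5 = self.lat5(c5)
        p4 = self.lat4(c4) + F.interpolate(p5, scale_factor=2, mode="bilinear",
                                           align_corners=False)
        p3 = self.lat3(c3) + F.interpolate(p4, scale_factor=2, mode="bilinear",
                                           align_corners=False)
        p6 = self.down6(p5)
        p7 = self.down7(F.relu(p6))
        return p3, p4, p5, p6, p7


if __name__ == "__main__":
    # Assumed feature-map shapes (batch 1, ResNet-like channel counts).
    c3, c4, c5 = (torch.randn(1, 512, 64, 64),
                  torch.randn(1, 1024, 32, 32),
                  torch.randn(1, 2048, 16, 16))
    for p in SimpleFPN(512, 1024, 2048)(c3, c4, c5):
        print(tuple(p.shape))
```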
Step S33: a Protonet branch is built to generate prototype masks. Its input is P3, its middle part consists of several convolution layers, and its final output has k channels, each representing a 138 x 138 prototype mask through which the important parts of the image to be processed can be extracted.
Step S34: a Prediction Head branch is built to generate mask coefficients. A shared convolutional network is used for better real-time segmentation performance, and this step runs in parallel with step S33.
Step S35: the YOLACT network is trained with 18 common categories of indoor household objects from the MS_COCO (Microsoft Common Objects in Context) dataset as training samples.
The MS_COCO dataset is an open-source dataset released by Microsoft in 2014; it contains 91 categories of objects and scenes common in daily life and stores three annotation types, object instances, object keypoints and image captions, in the form of JSON files. A custom subset of 18 common categories of indoor household objects is used as training samples.
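The segmentation output can then be attached to the feature points extracted in step S2. The sketch below assumes a hypothetical `instances` list of (class_name, boolean mask) pairs produced by the instance segmentation network; the actual YOLACT inference code is not reproduced, and the set of "potentially dynamic" categories shown is only an example, not the patent's 18-class list.

```python
import numpy as np

# Example categories treated as potentially movable; placeholder list only.
POTENTIALLY_DYNAMIC = {"person", "cat", "dog", "chair"}


def label_keypoints(keypoints, instances):
    """Attach a semantic label to each keypoint.

    keypoints: cv2.KeyPoint-like objects with a .pt attribute (x, y).
    instances: assumed list of (class_name, H x W boolean mask) pairs from the
    instance segmentation network; this interface is an assumption for illustration.
    """
    labels = []
    for kp in keypoints:
        x, y = int(round(kp.pt[0])), int(round(kp.pt[1]))
        label = "background"
        for class_name, mask in instances:
            if 0 <= y < mask.shape[0] and 0 <= x < mask.shape[1] and mask[y, x]:
                label = class_name
                break
        labels.append(label)
    return labels


def candidate_dynamic_indices(labels):
    """Indices of keypoints lying on objects that may move (checked further in step S4)."""
    return [i for i, lab in enumerate(labels) if lab in POTENTIALLY_DYNAMIC]
```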
Step S4: motion consistency detection, specifically as follows.
Step S41: the pixel velocity of a moving object is tracked and computed with the L-K optical flow method, and the dynamics of the object are analyzed by combining this result with the semantic information. If the object moves relative to the background, the feature points contained in that object in the current frame are removed; if the object's velocity vector changes little compared with the previous frame, the related feature points are retained. This completes the coarse filtering of dynamic feature points.
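A minimal sketch of this coarse filter, assuming grayscale frames, (N, 1, 2) float32 point coordinates, and the per-point semantic flags from the previous step. cv2.calcOpticalFlowPyrLK is the standard pyramidal L-K tracker; estimating the background motion as the median flow of points off movable objects and the fixed velocity threshold are simple stand-ins, not the patent's exact criterion.

```python
import cv2
import numpy as np


def coarse_filter_dynamic(prev_gray, cur_gray, pts, semantic_dynamic, vel_threshold=2.0):
    """Drop points whose L-K optical flow indicates motion relative to the background.

    pts: (N, 1, 2) float32 pixel coordinates in the previous frame.
    semantic_dynamic: (N,) boolean array, True where the point lies on a
    potentially movable object according to the instance segmentation.
    """
    next_pts, status, _err = cv2.calcOpticalFlowPyrLK(prev_gray, cur_gray, pts, None)
    status = status.reshape(-1).astype(bool)
    flow = np.linalg.norm((next_pts - pts).reshape(-1, 2), axis=1)

    # Approximate background motion with the median flow of points not on movable objects.
    static_sel = status & ~semantic_dynamic
    bg_flow = float(np.median(flow[static_sel])) if np.any(static_sel) else 0.0

    # Remove tracked points on movable objects whose flow deviates strongly from the background.
    moving = semantic_dynamic & (np.abs(flow - bg_flow) > vel_threshold)
    keep = status & ~moving
    return next_pts[keep], keep
```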
Step S5: feature point matching, specifically as follows.
Step S51: for each feature point pair between the two frames, the minimum Euclidean distance dmin, the second-smallest Euclidean distance dmin2 and an evaluation function value q(ux) are calculated; the evaluation function is computed as follows:
Its meaning is that if the set uN contains N data points, the data points in the set satisfy q(ui) >= q(uj) for i < j.
Qualitatively, data points with a larger evaluation function value q(ux) are ranked higher in the set. The feature points are sorted in descending order of evaluation function value, the quality of each group of eight points is calculated and the groups are ranked, the 8 matching pairs with the highest matching quality are selected, and the fundamental matrix F is computed;
Step S52: after these 8 matching pairs are removed from the subset, the corresponding projection points of the remaining feature points in the subset are computed from the fundamental matrix F;
Step S53: the error e between each remaining feature point and its projection point is computed; if e is smaller than the maximum error ε, the point is identified as an inlier. After updating the inlier count, the fundamental matrix F is recomputed and new inliers are obtained. If the number of iterations does not exceed the maximum, the fundamental matrix F and the inlier set are returned; otherwise, model estimation fails;
Step S54: the epipolar distance is computed from the fundamental matrix F. Let p1 be the projection point on the reference camera's imaging plane and p2 its matching point on the current frame's imaging plane; the epipolar line is l = F x1 = [A B C]^T, and the distance from the normalized coordinate x2 of p2 to l is:

d = |x2^T F x1| / sqrt(A^2 + B^2)

where x1 is the normalized coordinate of p1 and [A B C]^T is the coefficient vector of the epipolar line l.
If the computed distance exceeds the set threshold θ, the error is large and there are dynamic objects near the corresponding three-dimensional world point that need to be removed. The threshold θ is calculated as follows:
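Setting aside the computation of θ (whose formula is not reproduced above), the sketch below illustrates the estimation and fine filtering of steps S51-S54, with two stated substitutions: cv2.findFundamentalMat with the RANSAC flag stands in for the PROSAC-style ordered sampling described in the text, and theta is a fixed placeholder rather than the adaptively computed threshold.

```python
import cv2
import numpy as np


def epipolar_distances(F, pts1, pts2):
    """d = |x2^T F x1| / sqrt(A^2 + B^2), with [A B C]^T = F x1 the epipolar line.

    pts1, pts2: (N, 2) arrays of matched point coordinates in the two frames.
    """
    x1 = np.hstack([pts1, np.ones((len(pts1), 1))])   # homogeneous coordinates
    x2 = np.hstack([pts2, np.ones((len(pts2), 1))])
    lines = (F @ x1.T).T                               # epipolar lines [A B C]
    num = np.abs(np.sum(lines * x2, axis=1))           # |x2^T F x1|
    den = np.sqrt(lines[:, 0] ** 2 + lines[:, 1] ** 2)
    return num / np.maximum(den, 1e-12)


def fine_filter_dynamic(pts1, pts2, theta=1.0):
    """Estimate F robustly, then drop matches whose epipolar distance exceeds theta."""
    F, inlier_mask = cv2.findFundamentalMat(np.float64(pts1), np.float64(pts2),
                                            cv2.FM_RANSAC, 1.0, 0.99)
    if F is None:
        return None, np.zeros(len(pts1), dtype=bool)
    d = epipolar_distances(F, pts1, pts2)
    keep = (d <= theta) & (inlier_mask.reshape(-1) > 0)
    return F, keep
```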
Step S6: initial robot pose estimation, specifically as follows.
Step S61: four non-coplanar virtual control points are used to express any point on the target object in the world coordinate system as a linearly weighted combination of coordinates in the camera coordinate system; a set of matched 3D points is thereby obtained, converting the problem into a 3D-3D one.
Step S62: the camera pose parameters are solved with the ICP (Iterative Closest Point) algorithm, i.e. the rotation parameter R and translation parameter t that minimize the sum of squared errors.
Step S63: because of measurement noise at the observed points, the estimation result may contain errors, so the pose calculation result is further optimized with Bundle Adjustment (BA). BA is a commonly used nonlinear optimization method that simultaneously optimizes the camera pose and the positions of space points; its main idea is to build a least-squares problem by summing the reprojection errors and to find the camera pose that minimizes this error term.
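The sketch below illustrates the pose stage under stated assumptions: OpenCV's SOLVEPNP_EPNP solver, which also uses four virtual control points, stands in for the patent's control-point plus ICP formulation, and the Bundle Adjustment refinement is only indicated by its objective in a comment, since a full BA implementation (e.g. with g2o or Ceres) is beyond a short example.

```python
import cv2
import numpy as np


def estimate_pose_epnp(object_pts, image_pts, K, dist=None):
    """Initial camera pose from 3D-2D correspondences of the retained static points.

    object_pts: (N, 3) 3D points in the world frame (e.g. from the depth camera).
    image_pts:  (N, 2) matching pixel coordinates in the current frame.
    K: 3x3 camera intrinsic matrix; dist: distortion coefficients or None.
    Returns the rotation matrix R and translation vector t.
    """
    dist = np.zeros(5) if dist is None else dist
    ok, rvec, tvec = cv2.solvePnP(np.float64(object_pts), np.float64(image_pts),
                                  np.float64(K), dist, flags=cv2.SOLVEPNP_EPNP)
    if not ok:
        raise RuntimeError("EPnP pose estimation failed")
    R, _ = cv2.Rodrigues(rvec)
    # Bundle Adjustment would then refine (R, t) and the 3D points by minimizing
    # the summed squared reprojection error  sum_i || u_i - project(R, t, P_i) ||^2,
    # typically with g2o or Ceres in the full SLAM system; not implemented here.
    return R, tvec
```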
The foregoing embodiments are merely illustrative embodiments of the present invention, and not restrictive, and the scope of the invention is not limited to the foregoing embodiments, but it should be understood by those skilled in the art that any modification, variation or substitution of some technical features described in the foregoing embodiments may be made without departing from the spirit and scope of the technical solutions of the embodiments of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (8)

Translated from Chinese
1. A design method for visual odometry of a mobile robot in a dynamic scene, characterized by comprising the following steps:
Step S1: acquiring real-time image information and preprocessing it;
Step S2: extracting image ORB feature points using an adaptive threshold method based on grid division, specifically comprising the following steps:
Step S21: constructing a scale pyramid for the input image and computing ORB feature points in images of different scales;
Step S22: dividing each pyramid level image into a grid with the corresponding numbers of columns and rows, and setting the number of feature points to pre-extract in each grid cell and an initial FAST corner extraction threshold;
Step S23: after the first feature point pre-extraction with the initial threshold, if the number of feature points actually extracted in a grid cell is less than the set pre-extraction number, changing the threshold and continuing extraction, and repeating steps S21-S23 until the adaptive extraction of feature points within the grid is completed;
Step S3: building a YOLACT network, training it with the MS_COCO dataset as samples, and performing instance segmentation on the image to obtain image semantic information, specifically comprising the following steps:
Step S31: building the YOLACT backbone network with ResNet-101 convolution modules, mainly responsible for image feature extraction;
Step S32: building an FPN network to generate multi-scale feature maps, ensuring that target objects of different sizes can be detected;
Step S33: building a Protonet branch to generate prototype masks, through which the parts of interest in the image to be processed are extracted;
Step S34: building a Prediction Head branch to generate mask coefficients, using a shared convolutional network; this step runs in parallel with step S33;
Step S35: training the YOLACT network with 18 common categories of indoor household objects from the MS_COCO dataset as training samples;
Step S4: performing motion consistency detection by combining image semantic information with the L-K optical flow method, and coarsely filtering out dynamic feature points;
Step S5: matching the feature points of two image frames based on the PROSAC method and estimating the fundamental matrix;
Step S6: computing epipolar distances from the fundamental matrix, finely filtering out dynamic feature points, estimating the initial pose of the robot from the selected keyframes, and performing Bundle Adjustment optimization on the result.
2. The design method for mobile robot visual odometry in a dynamic scene according to claim 1, characterized in that in step S1 the real-time image information is acquired with an Intel RealSense depth camera, and the preprocessing includes noise removal and grayscale conversion.
3. The design method for mobile robot visual odometry in a dynamic scene according to claim 1, characterized in that the MS_COCO dataset stores three annotation types, namely object instances, object keypoints and image captions, in the form of JSON files.
4. The design method for mobile robot visual odometry in a dynamic scene according to claim 1, characterized in that step S4 is specifically: the pixel velocity of a moving object is tracked and computed with the L-K optical flow method, and the dynamics of the object are analyzed by combining the computation result with semantic information; if the object moves relative to the background, the feature points contained in that object in the current frame are removed; if the object's velocity vector changes by no more than a threshold compared with the previous frame, the related feature points are retained, completing the coarse filtering of dynamic feature points.
5. The design method for mobile robot visual odometry in a dynamic scene according to claim 1, characterized in that step S5 comprises the following steps:
Step S51: calculating the minimum Euclidean distance and the evaluation function value of each feature point pair between the two frames, and sorting the feature points in descending order of evaluation function value; grouping the points eight at a time, calculating and ranking the quality of each group, selecting the 8 matching pairs with the highest matching quality, and computing the fundamental matrix F;
Step S52: after removing these 8 matching pairs from the subset, computing the corresponding projection points of the remaining feature points in the subset from the fundamental matrix;
Step S53: computing the error between each remaining feature point and its projection point; if it is less than the set value, the point is identified as an inlier; after updating the inlier count, recomputing the fundamental matrix F and obtaining new inliers; if the number of iterations does not exceed the maximum, returning F and the inlier set; otherwise model estimation fails;
Step S54: computing the epipolar distance; if it exceeds the threshold, the error is large and there are nearby dynamic objects that need to be removed.
6. The design method for mobile robot visual odometry in a dynamic scene according to claim 1, characterized in that step S6 comprises the following steps:
Step S61: using four non-coplanar virtual control points to express any point on the target object in the world coordinate system as a linearly weighted combination of coordinates in the camera coordinate system, and then converting the problem into a 3D-3D one;
Step S62: solving the camera pose parameters, including the rotation parameter R and the translation parameter t, with the ICP algorithm;
Step S63: optimizing the pose calculation result with BA.
7. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that when the processor executes the program, the design method for mobile robot visual odometry in a dynamic scene according to any one of claims 1-6 is implemented.
8. A computer-readable storage medium on which a computer program is stored, characterized in that when the program is executed by a processor, the design method for mobile robot visual odometry in a dynamic scene according to any one of claims 1-6 is implemented.
CN202210242660.5A (priority 2022-03-11, filed 2022-03-11) - A design method for visual odometry of mobile robots in dynamic scenes - Active - CN114612494B (en)

Priority Applications (1)

Application Number: CN202210242660.5A (CN114612494B (en)) - Priority Date: 2022-03-11 - Filing Date: 2022-03-11 - Title: A design method for visual odometry of mobile robots in dynamic scenes

Applications Claiming Priority (1)

Application Number: CN202210242660.5A (CN114612494B (en)) - Priority Date: 2022-03-11 - Filing Date: 2022-03-11 - Title: A design method for visual odometry of mobile robots in dynamic scenes

Publications (2)

Publication Number: CN114612494A (en) - Publication Date: 2022-06-10
Publication Number: CN114612494B (en) - Publication Date: 2025-07-04

Family

ID=81863309

Family Applications (1)

Application Number: CN202210242660.5A (Active, CN114612494B (en)) - Priority Date: 2022-03-11 - Filing Date: 2022-03-11 - Title: A design method for visual odometry of mobile robots in dynamic scenes

Country Status (1)

Country: CN (1) - Link: CN114612494B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication Number - Priority Date - Publication Date - Assignee - Title
CN115731385A (en)* - 2022-11-22 - 2023-03-03 - 中国电子科技南湖研究院 - Image Feature Extraction Method, Device and SLAM System Based on Semantic Segmentation
CN118314162B (en)* - 2024-06-06 - 2024-08-30 - 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) - Dynamic visual SLAM method and device for time sequence sparse reconstruction
CN119313732B (en)* - 2024-09-29 - 2025-07-11 - 中国科学院国家空间科学中心 - An infrared/visible light fusion all-day autonomous positioning method and system guided by semantic information in dynamic scenes

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication Number - Priority Date - Publication Date - Assignee - Title
CN110378345B (en)* - 2019-06-04 - 2022-10-04 - 广东工业大学 - Dynamic scene SLAM method based on YOLACT instance segmentation model
NL2025452B1 (en)* - 2020-04-29 - 2021-11-09 - Navinfo Europe B.V. - System and method for 3D positioning a landmark in images from the real world
CN112991447B (en)* - 2021-03-16 - 2024-04-05 - 华东理工大学 - Visual positioning and static map construction method and system in dynamic environment
CN113916245B (en)* - 2021-10-09 - 2024-07-19 - 上海大学 - Semantic map construction method based on instance segmentation and VSLAM

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on a stereo-vision-based SLAM system in indoor dynamic scenes; 刘佳祺 (Liu Jiaqi); China Master's Theses Full-text Database, Information Science and Technology; 2024-02-15; full text *

Also Published As

Publication Number: CN114612494A (en) - Publication Date: 2022-06-10

Similar Documents

Publication - Title
CN114255238B (en) A 3D point cloud scene segmentation method and system integrating image features
CN114612494B (en) A design method for visual odometry of mobile robots in dynamic scenes
CN111553949B (en) Positioning and grabbing method for irregular workpiece based on single-frame RGB-D image deep learning
CN113065546A (en) A target pose estimation method and system based on attention mechanism and Hough voting
Lü et al. Comprehensive improvement of camera calibration based on mutation particle swarm optimization
CN112183506A (en) Human body posture generation method and system
CN102446356A (en) Parallel self-adaptive matching method for obtaining remote sensing images with uniformly distributed matching points
CN117994480A (en) Lightweight hand reconstruction and driving method
CN114882524A (en) Monocular three-dimensional gesture estimation method based on full convolution neural network
CN118071932A (en) Three-dimensional static scene image reconstruction method and system
CN117011380A (en) 6D pose estimation method of target object
CN115205654A (en) A Novel Monocular Vision 3D Object Detection Method Based on Keypoint Constraints
CN117611797A (en) An intelligent recognition system for template engineering structures based on deep learning
CN119338873B (en) A 3D point cloud registration method based on multiple convolutional networks
CN119006742B (en) Human body three-dimensional reconstruction method and system based on deep learning
CN114972948A (en) Neural detection network-based identification and positioning method and system
CN115008454A (en) An online hand-eye calibration method for robots based on multi-frame pseudo-label data enhancement
CN118967913A (en) A neural radiance field rendering method based on straight line constraints
CN116363329B (en) Three-dimensional image generation method and system based on CGAN and LeNet-5
CN117495883A (en) RGBD image instance segmentation method based on edge enhancement module
CN114782802B (en) Robot vision odometer method under dynamic scene
Wang et al. Precision visual recognition application study for 3c assembly robots
CN115880334A (en) Video object tracking method for automatic machine learning map fusion
Zhu et al. Rgb-d saliency detection based on cross-modal and multi-scale feature fusion
CN120411381B (en) Image three-dimensional point cloud generation method based on low-deviation feature extraction and enhancement

Legal Events

Code - Title
PB01 - Publication
SE01 - Entry into force of request for substantive examination
GR01 - Patent grant
