Technical Field
The invention belongs to the technical field of Simultaneous Localization and Mapping (SLAM), and specifically relates to a deep-learning-based perception enhancement and motion judgment algorithm, a storage medium, and a device.
Background
Simultaneous Localization and Mapping (SLAM) refers to a mobile robot using its onboard sensors to estimate its own pose and build a map of the environment without any prior information about its surroundings. Mainstream SLAM systems achieve high-precision localization and mapping in static environments, but their pose estimation and map construction degrade in complex environments. In particular, under dynamic motion blur the system struggles to identify scene objects correctly, or misidentifies them, and cannot reliably determine whether objects are moving; such dynamic objects significantly degrade SLAM localization accuracy. Existing approaches classify objects in an image as dynamic or static in order to label dynamic and static points, but a "dynamic" object is not necessarily in motion: it merely belongs to a potential dynamic region that is likely to move. Because the prior art cannot evaluate these potential dynamic regions using the global information provided by the captured images, static and dynamic points are often mislabeled, the motion judgment of objects in the image is insufficiently accurate, and the subsequent map construction suffers.
Summary of the Invention
The purpose of the present invention is to provide a deep-learning-based perception enhancement and motion judgment algorithm to solve the technical problem that, because the prior art cannot evaluate potential dynamic regions using the global information provided by the captured images, static and dynamic points are likely to be mislabeled, motion judgment of objects in the image is insufficiently accurate, and a complete, fully static map cannot be built.
The deep-learning-based perception enhancement and motion judgment algorithm comprises the following steps:
Step S1: accurately detect dynamically blurred objects through a perception enhancement network that fuses a blur-region attention module with an enhanced detection module;
Step S2: using the perceptron's recognition of objects in the image, divide the objects in the scene into high-dynamic, medium-dynamic, and low-dynamic classes according to the obtained semantic information;
Step S3: extract feature points from the image, and classify high-dynamic and medium-dynamic targets as potential dynamic regions for data association;
Step S4: screen the feature points of the potential dynamic regions by constructing a global conditional random field, and finally remove the feature points within a region that are judged to be dynamic.
Preferably, in step S4, the motion state of potential dynamic points is determined by constructing a global conditional random field (GCRF). The global observation information comprises each point's observations across different frames and its reprojection error. The constructed GCRF model is converted into a Gibbs energy function; minimizing the energy function E(x) yields the optimal label assignment for all points, and the globally optimal labels are obtained by minimizing the energy with an efficient mean-field approximation.
The energy function E(x) is:

$$E(x)=\sum_i \psi_u(x_i)+\sum_{i<j}\psi_p(x_i,x_j)$$

The energy function consists of two parts: a unary potential $\psi_u(x_i)$ and a binary (pairwise) potential $\psi_p(x_i,x_j)$, where i and j denote different nodes, and $x_i$ and $x_j$ denote the class labels of nodes i and j, respectively. The algorithm constructs a unary potential model to fit each vertex of the global conditional random field, and a binary potential model to fit the edges connecting the vertices.
Preferably, the unary potential function models the relationship between the feature point set and the observation field, as follows:

$$\psi_u(x_i)=-\ln\big((\alpha_i-\mu_\alpha)^2(\beta_i-\mu_\beta)^2(\gamma_i-\mu_\gamma)^2\big)$$

where $\psi_u(x_i)$ is the constructed unary potential, $\alpha_i$ is the reprojection error of the spatial point $P_i$, $\beta_i$ is the total number of observations of $P_i$, and $\gamma_i$ is the distance from the corresponding pixel to the epipolar line; the pixel is the projection of the spatial point onto the image. $\mu_\alpha$, $\mu_\beta$, and $\mu_\gamma$ denote the means of the reprojection error, the total observation count, and the pixel-to-epipolar-line distance, respectively.
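As a minimal numeric sketch of this prior (a hypothetical helper with illustrative toy values, not disclosed data; a small epsilon guards the logarithm when a statistic equals its mean):

```python
import numpy as np

def unary_potential(alpha, beta, gamma):
    """Per-point unary potential, following
    psi_u(x_i) = -ln((a_i - mu_a)^2 (b_i - mu_b)^2 (g_i - mu_g)^2),
    with alpha = reprojection errors, beta = observation counts,
    gamma = point-to-epipolar-line distances (one entry per point)."""
    mu_a, mu_b, mu_g = alpha.mean(), beta.mean(), gamma.mean()
    eps = 1e-12  # keeps the log finite when a value equals its mean
    return -np.log((alpha - mu_a) ** 2 * (beta - mu_b) ** 2
                   * (gamma - mu_g) ** 2 + eps)

# toy example: three tracked points (values are illustrative)
alpha = np.array([0.8, 2.5, 0.6])   # reprojection error (px)
beta = np.array([14.0, 3.0, 18.0])  # number of observations
gamma = np.array([0.4, 1.9, 0.3])   # epipolar distance (px)
print(unary_potential(alpha, beta, gamma))
```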
Preferably, the binary potential function constructs the relationship between the current node and its neighboring nodes, improving change-detection performance by modeling the spatial correlation of the feature points, as follows:

$$\psi_p(x_i,x_j)=\mu(x_i,x_j)\left[k_1\exp\!\left(-\frac{(\alpha_i-\alpha_j)^2}{2\sigma_\alpha^2}-\frac{(\beta_i-\beta_j)^2}{2\sigma_\beta^2}\right)+k_2\exp\!\left(-\frac{\lVert p_i-p_j\rVert^2}{2\sigma_\gamma^2}\right)\right]$$

where $\alpha_i$ and $\alpha_j$ denote the average reprojection errors of nodes i and j, $\beta_i$ and $\beta_j$ their observation counts, and $p_i$ and $p_j$ their pose parameters; $k_1$ and $k_2$ are a pair of weight parameters, and $\sigma_\alpha$, $\sigma_\beta$, and $\sigma_\gamma$ are constants controlling the shape and scale of the Gaussian kernels. $\mu(x_i,x_j)$ is the compatibility function:

$$\mu(x_i,x_j)=[x_i\neq x_j]$$

i.e., it equals 1 when the two labels differ and 0 when they agree (a Potts model).
Preferably, in step S1, the input feature map is fed into the improved channel attention module to obtain the channel-wise expression of the feature map, and a pixel-value parameter solving operation is applied to each obtained channel expression, thereby strengthening the expression of specific channels and regions of the feature map. The region weight change parameter $F_c$ obtained by the improved channel attention is computed as:

$$F_c=\eta\big(\Delta_1(\Delta_0(F^c_{avg}))+\Delta_1(\Delta_0(F^c_{max}))\big)$$

where $\eta$ denotes the Sigmoid function, $\Delta_0$ and $\Delta_1$ are the weights of the two linear layers, and $F^c_{avg}$ and $F^c_{max}$ are the results of the pixel-value parameter solving of the feature map, obtained with average pooling and max pooling, respectively. Finally, $F_c$ weights the input feature map channel by channel to produce the channel-dimension feature map.
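A PyTorch sketch of such a channel-attention branch is given below, assuming the pixel-value parameter solving reduces to global average and max pooling as the formula indicates; the class name and reduction ratio are illustrative choices, not disclosed values:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """F_c = eta(Delta_1(Delta_0(F_avg)) + Delta_1(Delta_0(F_max)));
    the two-layer MLP (Delta_0, Delta_1) is shared by both pooled
    descriptors, and eta is the Sigmoid function."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(  # Delta_0 then Delta_1
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels))
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):                    # x: (B, C, H, W)
        avg = self.mlp(x.mean(dim=(2, 3)))   # average-pooled descriptor
        mx = self.mlp(x.amax(dim=(2, 3)))    # max-pooled descriptor
        fc = self.sigmoid(avg + mx)[..., None, None]  # (B, C, 1, 1)
        return x * fc   # channel-by-channel weighting of the input
```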
The input feature map is also fed into the improved spatial attention module: after the pixel-value parameter solving, the results are concatenated and fused into a two-channel feature map, a convolution layer performs the convolution, and the resulting feature map is passed through a Sigmoid function to obtain the improved spatial attention region weight change parameter $F_s$:

$$F_s=\eta\big(f^{7\times 7}([F^s_{avg};F^s_{max}])\big)$$

where $\eta$ denotes the Sigmoid function, $f^{7\times 7}$ denotes convolution with a 7×7 kernel, and $F^s_{avg}$ and $F^s_{max}$ are the results of the pixel-value parameter solving of the feature map, obtained with average pooling and max pooling, respectively.
Connecting $F_c$ and $F_s$ with the input feature map yields the feature map $F_e$ weighted by the improved channel and spatial attention.
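A matching PyTorch sketch of the spatial branch under the same assumptions (the two pooled maps are channel-wise statistics and the 7×7 convolution follows the formula above); applying the two branches in sequence is one plausible realization of the weighted feature map $F_e$:

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """F_s = eta(f_7x7([F_avg; F_max])): the channel-wise mean and max
    maps are stacked into a two-channel map and convolved with a 7x7
    kernel, then passed through the Sigmoid function eta."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)  # f^{7x7}
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):                      # x: (B, C, H, W)
        avg = x.mean(dim=1, keepdim=True)      # (B, 1, H, W)
        mx, _ = x.max(dim=1, keepdim=True)     # (B, 1, H, W)
        fs = self.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * fs   # spatial weighting of the input feature map
```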
Preferably, in step S2, a target detection perceptron is added on top of the dual discriminators of the deblurring network. The perceptron processes the restored latent sharp image from the generator, downsamples it again through convolution blocks, and fuses its features with same-sized feature maps from the generator. The convolutions in the perceptron backbone are depthwise separable convolutions, each consisting of a 3×3 depthwise convolution followed by a 1×1 pointwise convolution, with a computational cost of

$$C_k \times C_k \times M \times C_w \times C_h + M \times N \times C_w \times C_h$$

where $C_k$ is the convolution kernel size, M and N are the numbers of input and output channels, and $C_w$ and $C_h$ are the width and height of the output feature matrix.
Preferably, in step S2, the perceptron's recognition of objects in the image is used to divide the objects in the scene into high-dynamic, medium-dynamic, and low-dynamic classes according to the obtained semantic information. Objects capable of autonomous motion are defined as high-dynamic objects; objects that may be either stationary or moving are defined as medium-dynamic objects; objects that do not move in most cases are defined as low-dynamic objects.
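One way to realize this three-level rating is a lookup from detector class labels to dynamic levels; the label set below is hypothetical and depends entirely on the detector's training vocabulary:

```python
# Hypothetical mapping from semantic labels to dynamic levels.
DYNAMIC_LEVEL = {
    "person": "high", "dog": "high",      # capable of autonomous motion
    "chair": "medium", "book": "medium",  # usually still, may be moved
    "monitor": "low", "desk": "low",      # almost never move
}

def classify_dynamics(label):
    # assumption: classes absent from the table default to low-dynamic
    return DYNAMIC_LEVEL.get(label, "low")

print(classify_dynamics("person"))  # -> high
```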
Preferably, in step S3, the high-dynamic and medium-dynamic objects from step S2 are classified as potential dynamic objects, while low-dynamic objects and the background are classified as static features. Feature points are extracted from the acquired images and their states are associated with the obtained semantic information: feature points inside potential dynamic object boxes are potential feature points, and feature points on low-dynamic objects and the background are static feature points.
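A minimal sketch of this association step, assuming detections arrive as axis-aligned boxes tagged with their dynamic level (all names and the toy inputs are illustrative):

```python
def associate(keypoints, boxes):
    """Split feature points into potential-dynamic and static sets.
    keypoints: list of (u, v) pixel coordinates; boxes: list of
    (x0, y0, x1, y1, level) with level in {"high", "medium", "low"}."""
    potential, static = [], []
    for (u, v) in keypoints:
        in_dynamic_box = any(
            x0 <= u <= x1 and y0 <= v <= y1 and level in ("high", "medium")
            for (x0, y0, x1, y1, level) in boxes)
        (potential if in_dynamic_box else static).append((u, v))
    return potential, static

pts = [(50, 60), (300, 200)]
dets = [(40, 40, 120, 160, "high")]  # one detected person, say
print(associate(pts, dets))  # first point is a potential feature point
```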
The present invention further provides a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the steps of the deep-learning-based perception enhancement and motion judgment algorithm described above are implemented.
The present invention further provides a computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, the steps of the deep-learning-based perception enhancement and motion judgment algorithm described above are implemented.
The invention has the following advantages:
1. The prior art lacks an effective judgment of the motion state of feature points in potential dynamic regions. This algorithm introduces a global conditional random field to evaluate potential dynamic regions: the GCRF refines the dynamic/static label assignments and removes dynamic points that were incorrectly labeled static, thereby improving the accuracy of camera pose estimation. Unary and binary potential functions are defined from three complementary sets of observation variables, and the globally optimal labels are obtained by constructing and minimizing the field energy function. Removing the feature points judged to be dynamic within potential dynamic regions reduces the influence of dynamic points on the system, so the accuracy of motion judgment for objects in the image is greatly improved.
2. Based on the semantic information obtained from the network, objects in the scene are rated and classified by dynamics: objects capable of autonomous motion are defined as high-dynamic; objects that are usually stationary but may move at any time are defined as medium-dynamic; objects that do not move in most cases are defined as low-dynamic. Feature points are extracted from the input image and the feature data are associated according to the semantic information. Feature points on low-dynamic objects and the background can be used directly for pose estimation, while medium- and high-dynamic objects are treated as potential dynamic objects whose feature points must undergo motion judgment so that truly dynamic points can be removed.
3. This algorithm proposes DYNET, a single-stage network fusing deblurring and target detection. The network comprises a backbone module that restores local image features, a blur-region attention module, and a perceptron module that enhances recognition. The backbone builds a feature pyramid (FPN) structure on top of MobileNetV2. To strengthen the deblurring effect, a blur-region attention module is introduced into the generator: feature maps rich in high-dimensional feature information are fed into the improved spatial and channel attention modules to reinforce the network's learning of blur features, increase the pixel weights of blurred regions, and raise the recognition rate of blurred objects. An improved target detection perceptron is introduced on top of the original dual discriminators, so that the perceptron detects objects in the image as soon as the network front end finishes deblurring.
Description of Drawings
Figure 1 is a schematic flow diagram of the deep-learning-based perception enhancement and motion judgment of the present invention.
Figure 2 is a flow chart of the deep-learning-based perception enhancement and motion judgment of the present invention.
Figure 3 is a structural diagram of the perception enhancement network designed by the present invention.
Figure 4 shows the result of the global conditional random field model proposed by the present invention.
Figure 5 is a comparison of a blurred image before and after restoration by the present invention.
Figure 6 compares the number of features and feature matches before and after image deblurring by the present invention.
Figure 7 is a histogram of feature-matching counts obtained by running the present invention on the TUM dataset.
Figure 8 compares detection accuracy before and after image restoration in the present invention.
Figure 9 compares dynamic point removal results obtained by running the present invention on the TUM dataset.
Figure 10 is the trajectory obtained by running the present invention on the W_half sequence of the TUM dataset.
Figure 11 is the trajectory obtained by running the present invention on the W_xyz sequence of the TUM dataset.
Figure 12 is the trajectory obtained by running the present invention on the W_rpy sequence of the TUM dataset.
Figure 13 shows the real experimental environment and its floor plan.
Figure 14 compares captured images before and after restoration by the present invention in a real scene.
Figure 15 compares object detection accuracy before and after image deblurring by the present invention in a real scene.
Figure 16 shows the dynamic point removal effect of the present invention in a real scene.
Figure 17 shows the trajectory of the mobile robot in the real scene.
Detailed Description of the Embodiments
The specific embodiments of the present invention are described in further detail below with reference to the accompanying drawings, to help those skilled in the art gain a more complete, accurate, and thorough understanding of the inventive concept and technical solution of the present invention.
Traditional SLAM algorithms struggle to identify objects correctly, or misidentify them, in dynamically blurred scenes, and cannot accurately judge the motion state of potential dynamic points, which ultimately degrades SLAM localization and map construction. This algorithm therefore proposes an improved SLAM algorithm comprising three stages: perception enhancement, potential dynamic point screening, and dynamic point removal based on a global conditional random field. In the perception enhancement stage, the DYNET network proposed here deblurs the input image and performs target detection. In the potential dynamic point screening stage, objects in the scene are divided into high, medium, and low dynamic levels using the obtained semantic information, image feature points are extracted, and the semantic and feature information are associated. In the dynamic point removal stage, a global conditional random field is constructed to evaluate the potential dynamic regions, and the feature points judged to be dynamic are removed. The solution of the present invention is obtained by improving existing SLAM technology with this principle.
Embodiment 1:
As shown in Figures 1-17, the present invention provides a deep-learning-based perception enhancement and motion judgment algorithm comprising the following steps:
Step S1: accurately detect dynamically blurred objects through a perception enhancement network that fuses a blur-region attention module with an enhanced detection module.
The blur-region attention module consists of two parts, improved channel attention and improved spatial attention. The input feature map is processed along the channel and spatial dimensions separately, and the resulting feature maps are concatenated and fused to produce the output $F_e$, which increases the pixel weights of blurred regions and improves the recognition rate and precision for blurred objects. The improved channel attention strengthens the expression of specific channels and regions of the feature map so that blur features are learned better; the improved spatial attention learns to optimize the pixel-value weights of blurred regions in the feature map, continuously adjusting the weights through backpropagation, thereby guiding the network model to focus on the regions containing blur and enabling accurate detection of dynamically blurred objects.
In step S1, the input feature map is fed into the improved channel attention module to obtain the channel-wise expression of the feature map, and the pixel-value parameter solving operation is applied to each channel expression, strengthening the expression of specific channels and regions of the feature map. The regional expressions $F^c_{avg}$ and $F^c_{max}$ obtained by the pixel-value parameter solving pass through a multilayer perceptron module that strengthens the correlation of the regional expressions and redistributes the regional weight change parameters, so that blur features are learned better. The regional weight change parameter $F_c$ obtained by the improved channel attention is computed as:

$$F_c=\eta\big(\Delta_1(\Delta_0(F^c_{avg}))+\Delta_1(\Delta_0(F^c_{max}))\big)$$

where $\eta$ denotes the Sigmoid function, $\Delta_0$ and $\Delta_1$ are the weights of the two linear layers, and $F^c_{avg}$ and $F^c_{max}$ are the results of the pixel-value parameter solving, obtained with average pooling and max pooling, respectively. Finally, $F_c$ weights the input feature map channel by channel to produce the channel-dimension feature map.
In this step, the input feature map is also fed into the improved spatial attention module: after the pixel-value parameter solving, the results are concatenated and fused into a two-channel feature map, a convolution layer performs the convolution, and the resulting feature map is passed through a Sigmoid function to obtain the improved spatial attention region weight change parameter $F_s$:

$$F_s=\eta\big(f^{7\times 7}([F^s_{avg};F^s_{max}])\big)$$

where $\eta$ denotes the Sigmoid function, $f^{7\times 7}$ denotes convolution with a 7×7 kernel, and $F^s_{avg}$ and $F^s_{max}$ are the results of the pixel-value parameter solving, obtained with average pooling and max pooling, respectively. Connecting $F_c$ and $F_s$ with the input feature map yields the feature map $F_e$ weighted by the improved channel and spatial attention.
Step S2: using the perceptron's recognition of objects in the image, divide the objects in the scene into high-dynamic, medium-dynamic, and low-dynamic classes according to the obtained semantic information.
In this step, a target detection perceptron is added on top of the dual discriminators of the deblurring network. The perceptron processes the restored latent sharp image from the generator, downsamples it again through convolution blocks, and fuses its features with same-sized feature maps from the generator, so that the network completes target detection while deblurring. To reduce the number of network parameters, the standard convolutions in the perceptron backbone are replaced with depthwise separable convolutions. A traditional convolution convolves the input feature map of each channel with its corresponding kernel and then superimposes the outputs, at a computational cost of

$$C_k \times C_k \times M \times N \times C_w \times C_h$$

A depthwise separable convolution splits this one-step operation into a 3×3 depthwise convolution followed by a 1×1 pointwise convolution, at a cost of

$$C_k \times C_k \times M \times C_w \times C_h + M \times N \times C_w \times C_h$$

The ratio $\alpha$ of the traditional cost to the depthwise separable cost is

$$\alpha=\frac{C_k^2 \times M \times N \times C_w \times C_h}{C_k^2 \times M \times C_w \times C_h + M \times N \times C_w \times C_h}=\frac{C_k^2 N}{C_k^2+N}$$

where $C_k$ is the convolution kernel size, M and N are the numbers of input and output channels, and $C_w$ and $C_h$ are the width and height of the output feature matrix. The kernel size is typically 3×3, so an ordinary convolution costs roughly 8-9 times as much as the improved network. This markedly reduces the computation required to recognize objects in the image and improves recognition efficiency.
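The cost formulas and the 8-9x figure can be checked directly; the layer dimensions below are arbitrary illustrative values:

```python
def conv_cost(ck, m, n, cw, ch):
    """Multiply-accumulate count of a standard Ck x Ck convolution."""
    return ck * ck * m * n * cw * ch

def dw_separable_cost(ck, m, n, cw, ch):
    """Ck x Ck depthwise convolution plus 1x1 pointwise convolution."""
    return ck * ck * m * cw * ch + m * n * cw * ch

# alpha = Ck^2 * N / (Ck^2 + N); with Ck = 3 and N = 256 the standard
# convolution costs about 8.7x more, matching the 8-9x figure above
ck, m, n, cw, ch = 3, 128, 256, 56, 56
print(conv_cost(ck, m, n, cw, ch) / dw_separable_cost(ck, m, n, cw, ch))
```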
In this step, the perceptron's recognition of objects in the image is used to divide the objects in the scene into high-dynamic, medium-dynamic, and low-dynamic classes according to the obtained semantic information. Objects with the ability to move autonomously, such as people and animals, are defined as high-dynamic objects; objects such as chairs and books are generally stationary but may move at any time, and such objects, which may be either stationary or moving, are defined as medium-dynamic objects; objects such as computers and desks do not move in most cases and are therefore defined as low-dynamic objects.
Step S3: extract feature points from the image, and classify high-dynamic and medium-dynamic targets as potential dynamic regions for data association.
The high-dynamic and medium-dynamic objects from step S2 are classified as potential dynamic objects, while low-dynamic objects and the background are classified as static features. Feature points are extracted from the acquired image and their states are associated with the obtained semantic information: feature points inside potential dynamic object boxes are potential feature points, and feature points on low-dynamic objects and the background are static feature points. Potential dynamic regions require the further motion judgment of step S4, whereas static feature points need no judgment and can be used directly for pose estimation and map construction.
Step S4: screen the feature points of the potential dynamic regions by constructing a global conditional random field, and finally remove the feature points within a region that are judged to be dynamic.
The motion state of potential dynamic points is judged by constructing a global conditional random field (GCRF). The GCRF refines the label results of dynamic and static points and removes dynamic points incorrectly labeled static, thereby improving the accuracy of camera pose estimation. In this model, the global observation information comprises each point's observations across different frames and its reprojection error. From this information the GCRF can autonomously learn the characteristics of dynamic and static points and use them to classify new points. The constructed probability model (the GCRF model) is converted into a Gibbs energy function; minimizing the energy function E(x) yields the optimal label assignment for all points. The energy function E(x) of the GCRF model proposed by this algorithm is:

$$E(x)=\sum_i \psi_u(x_i)+\sum_{i<j}\psi_p(x_i,x_j)$$

The energy function consists of two parts: a unary potential $\psi_u(x_i)$ and a binary potential $\psi_p(x_i,x_j)$, where i and j denote different nodes, and $x_i$ and $x_j$ denote the class labels of nodes i and j, respectively. The algorithm constructs a unary potential model to fit each vertex of the global conditional random field and a binary potential model to fit the edges connecting the vertices.
The unary potential models the relationship between the feature point set and the observation field. Three static-likelihood priors are defined from three complementary sets of observation variables, the overall static probability is given by weighting them, and the unary potential is constructed as follows:

$$\psi_u(x_i)=-\ln\big((\alpha_i-\mu_\alpha)^2(\beta_i-\mu_\beta)^2(\gamma_i-\mu_\gamma)^2\big)$$

where $\psi_u(x_i)$ is the constructed unary potential, $\alpha_i$ is the reprojection error of the spatial point $P_i$, $\beta_i$ is the total number of observations of $P_i$, and $\gamma_i$ is the distance from the corresponding pixel to the epipolar line. The pixel is the projection of the spatial point onto the image; $\mu_\alpha$, $\mu_\beta$, and $\mu_\gamma$ denote the means of the reprojection error, the total observation count, and the pixel-to-epipolar-line distance, respectively. A projective transformation is normally used to map three-dimensional spatial points onto two-dimensional pixels; this transformation is known as perspective projection, or the pinhole camera model.
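For reference, a small sketch of pinhole projection and the reprojection error it induces for one point (the intrinsics are illustrative TUM-like values, not parameters of the disclosed system):

```python
import numpy as np

def project(K, R, t, P):
    """Pinhole projection of a 3-D point P to pixel coordinates:
    p ~ K (R P + t), followed by division by depth."""
    p_img = K @ (R @ P + t)
    return p_img[:2] / p_img[2]

def reprojection_error(K, R, t, P, observed_uv):
    """Distance between the projected and the observed pixel."""
    return np.linalg.norm(project(K, R, t, P) - observed_uv)

K = np.array([[525.0, 0.0, 319.5],   # illustrative TUM-like intrinsics
              [0.0, 525.0, 239.5],
              [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.zeros(3)
print(reprojection_error(K, R, t, np.array([0.2, 0.1, 2.0]),
                         np.array([372.0, 266.0])))  # ~0.25 px
```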
The binary potential constructs the relationship between the current node and its neighboring nodes, improving change-detection performance by modeling the spatial correlation of the feature points. The invention fits the model with a bilateral Gaussian kernel whose two sides are an observation side and a positioning side. The Gaussian kernel on the observation side expresses that nodes with similar observation counts and average reprojection errors tend to belong to the same class with the same label, while points with different labels should differ noticeably in observation count and average reprojection error. The Gaussian kernel on the positioning side expresses that adjacent spatial points should belong to the same object and should therefore carry the same label; the positioning kernel (the Gaussian kernel of the positioning side) adds a penalty term for adjacent points with different labels. The binary potential thus encourages adjacent points to keep consistent labels:

$$\psi_p(x_i,x_j)=\mu(x_i,x_j)\left[k_1\exp\!\left(-\frac{(\alpha_i-\alpha_j)^2}{2\sigma_\alpha^2}-\frac{(\beta_i-\beta_j)^2}{2\sigma_\beta^2}\right)+k_2\exp\!\left(-\frac{\lVert p_i-p_j\rVert^2}{2\sigma_\gamma^2}\right)\right]$$

where $\alpha_i$ and $\alpha_j$ denote the average reprojection errors of nodes i and j, $\beta_i$ and $\beta_j$ their observation counts, and $p_i$ and $p_j$ their pose parameters; $k_1$ and $k_2$ are a pair of weight parameters, $\sigma_\alpha$, $\sigma_\beta$, and $\sigma_\gamma$ are constants controlling the shape and scale of the Gaussian kernels, and $\mu(x_i,x_j)$ is the compatibility function:

$$\mu(x_i,x_j)=[x_i\neq x_j]$$

i.e., 1 when the two labels differ and 0 when they agree (a Potts model).
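A numeric sketch of this bilateral kernel as reconstructed above (all weights and bandwidths are illustrative; the disclosed values are not given):

```python
import numpy as np

def pairwise_kernel(ai, aj, bi, bj, pi, pj,
                    k1=1.0, k2=1.0, sa=1.0, sb=5.0, sg=0.5):
    """Observation kernel over (average reprojection error, observation
    count) plus positioning kernel over point locations; k1, k2 and the
    bandwidths sa, sb, sg are illustrative."""
    obs = k1 * np.exp(-(ai - aj) ** 2 / (2 * sa ** 2)
                      - (bi - bj) ** 2 / (2 * sb ** 2))
    loc = k2 * np.exp(-np.sum((pi - pj) ** 2) / (2 * sg ** 2))
    return obs + loc

def compatibility(xi, xj):
    """Potts compatibility: penalize only disagreeing labels."""
    return 1.0 if xi != xj else 0.0

print(pairwise_kernel(0.8, 0.9, 12, 10,
                      np.array([0.10, 0.20, 1.50]),
                      np.array([0.12, 0.18, 1.52])))
```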
The globally optimal labels are obtained by minimizing the energy function with an efficient mean-field approximation.
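A toy mean-field solver for the two-label (static/dynamic) case, assuming the Potts compatibility reconstructed above; the unary values and kernel weights are illustrative:

```python
import numpy as np

def mean_field(unary, kernel, n_iters=10):
    """Mean-field approximation for a two-label CRF.
    unary: (N, 2) unary potentials; kernel: (N, N) pairwise kernel
    weights with a zero diagonal. Returns marginals Q of shape (N, 2)."""
    q = np.exp(-unary)
    q /= q.sum(axis=1, keepdims=True)   # initialize from the unaries
    for _ in range(n_iters):
        msg = kernel @ q                # kernel-weighted label mass
        # with Potts compatibility, each label is penalized by the
        # neighbours' mass on the *other* label: swap the two columns
        q = np.exp(-(unary + msg[:, ::-1]))
        q /= q.sum(axis=1, keepdims=True)
    return q

# toy problem: three points, strong coupling between all pairs
unary = np.array([[0.2, 2.0], [1.0, 1.1], [0.3, 1.8]])
kernel = np.array([[0.0, 2.0, 2.0], [2.0, 0.0, 2.0], [2.0, 2.0, 0.0]])
print(mean_field(unary, kernel).argmax(axis=1))  # 0 = static, 1 = dynamic
```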
Once the dynamic points have been removed in step S4, the algorithm has completed the motion judgment of the spatial points, achieving the screening and removal of dynamic points. The remaining static points can then be used for pose estimation and map construction.
The procedure of the above deep-learning-based perception enhancement and motion judgment algorithm is illustrated below with specific experiments.
Figure 5 compares blurred images before and after restoration by this algorithm on the public TUM dataset; scenes containing blur were selected from the dataset to verify the effect. As the figure shows, because the blur-region attention module strengthens the feature representation of blurred regions, the algorithm can restore fine-grained texture features in a coarse-to-fine manner and handle complex blur.
Figure 6 compares feature point extraction and feature matching on images before and after restoration on the public TUM dataset. After image quality is enhanced by the perception enhancement network, feature points are extracted and matched; both the number of feature points and the number of feature matches increase, and some erroneous feature points and matches are eliminated.
Figure 7 is a histogram of feature-match counts on the dynamic sequences W_half, W_xyz, and W_rpy of the public TUM dataset. The figure shows that after the blurred images are restored, the number of feature matches increases to some extent, indicating that this algorithm is robust in blurred environments.
Figure 8 compares target detection accuracy before and after deblurring on the public TUM dataset. Detecting the blurred images directly results in many missed and false detections, whereas this algorithm first restores the images with the perception enhancement network, improving detection accuracy.
Figure 9 compares feature extraction between this algorithm and other algorithms on the public TUM dataset. Traditional dynamic SLAM algorithms remove dynamic points using only a deep learning network, which causes some false removals; when too many points are falsely removed, the SLAM system loses tracking. This algorithm combines the deep learning network with the global conditional random field to judge dynamic points comprehensively; as the figure shows, it can screen out the truly dynamic points, preserving more feature information.
Figures 10 to 12 show the trajectories of this algorithm on the dynamic sequences W_half, W_xyz, and W_rpy of the public TUM dataset. The lines represent the ground-truth camera trajectory, the camera trajectory estimated by the SLAM method, and the trajectory error, respectively. Because the perception enhancement network and the GCRF-based motion judgment reduce the influence of dynamic objects on the system, the trajectory estimated by this algorithm is close to the ground truth.
Another set of experiments illustrates the method of this embodiment. A school conference room, 10 m × 5 m, was selected as the indoor experimental scene, as shown on the left of Figure 13. The right of Figure 13 shows the floor plan of the real scene, including a workbench; segment A-B is the pedestrian's back-and-forth route and segment C-D is the robot's route.
Figure 14 compares captured images before and after restoration in the real scene, demonstrating the algorithm's deblurring effect in practice. The originally blurred regions of the image are restored, the detail textures are effectively improved, the clear structure of blurred objects is reconstructed, and visual smearing in other regions is reduced.
Figure 15 compares object detection accuracy before and after image deblurring in the real scene, verifying the algorithm's effectiveness on image frames captured in a real environment. The figure shows that image restoration effectively improves target detection accuracy and reduces false matches.
Figure 16 shows the dynamic point removal effect of the present invention in the real scene. After the input image is processed by the perception enhancement network and potential dynamic regions are screened out from the semantic information, the algorithm constructs a global conditional random field to judge the actual motion. As the figure shows, feature point removal is more accurate than in traditional dynamic SLAM algorithms, false removals are reduced, and the robustness of the SLAM system in complex environments is improved.
Figure 17 shows the trajectory of the mobile robot in the real scene. As Figure 17 shows, in the dynamically blurred environment this algorithm detects blurred objects with the DYNET perception enhancement network, associates the data according to the obtained semantic information, and screens and removes dynamic points by constructing a global conditional random field, reducing false removals and retaining more feature points for pose estimation and map construction. The algorithm is therefore highly robust in dynamically blurred environments.
Embodiment 2:
Corresponding to Embodiment 1 of the present invention, Embodiment 2 of the present invention provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program implements the following steps according to the method of Embodiment 1:
Step S1: accurately detect dynamically blurred objects through a perception enhancement network that fuses a blur-region attention module with an enhanced detection module.
Step S2: using the perceptron's recognition of objects in the image, divide the objects in the scene into high-dynamic, medium-dynamic, and low-dynamic classes according to the obtained semantic information.
Step S3: extract feature points from the image, and classify high-dynamic and medium-dynamic targets as potential dynamic regions for data association.
Step S4: screen the feature points of the potential dynamic regions by constructing a global conditional random field, and finally remove the feature points within a region that are judged to be dynamic.
The above storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), and an optical disk.
For the specific limitations on the steps implemented after the program in the computer-readable storage medium is executed, refer to Embodiment 1; they are not described in detail here.
Embodiment 3:
Corresponding to Embodiment 1 of the present invention, Embodiment 3 of the present invention provides a computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the program, the following steps are implemented according to the method of Embodiment 1:
Step S1: accurately detect dynamically blurred objects through a perception enhancement network that fuses a blur-region attention module with an enhanced detection module.
Step S2: using the perceptron's recognition of objects in the image, divide the objects in the scene into high-dynamic, medium-dynamic, and low-dynamic classes according to the obtained semantic information.
Step S3: extract feature points from the image, and classify high-dynamic and medium-dynamic targets as potential dynamic regions for data association.
Step S4: screen the feature points of the potential dynamic regions by constructing a global conditional random field, and finally remove the feature points within a region that are judged to be dynamic.
For the specific limitations on the steps implemented by the computer device, refer to Embodiment 1; they are not described in detail here.
It should be noted that each block of the block diagrams and/or flowcharts in the drawings of the present invention, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
The present invention has been described above by way of example with reference to the accompanying drawings. Obviously, the specific implementation of the present invention is not limited by the above manner; any non-substantive improvements made using the inventive concept and technical solution of the present invention, or direct applications of the inventive concept and technical solution to other situations without improvement, all fall within the protection scope of the present invention.