CN111161217B - Conv-LSTM multi-scale feature fusion-based fuzzy detection method - Google Patents

Conv-LSTM multi-scale feature fusion-based fuzzy detection method

Info

Publication number
CN111161217B
CN111161217B (application CN201911258543.2A)
Authority
CN
China
Prior art keywords
conv
lstm
network
features
scale
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911258543.2A
Other languages
Chinese (zh)
Other versions
CN111161217A (en)
Inventor
黄睿
邢艳
叶何斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Civil Aviation University of China
Original Assignee
Civil Aviation University of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Civil Aviation University of China
Priority to CN201911258543.2A
Publication of CN111161217A
Application granted
Publication of CN111161217B
Legal status: Active (current)
Anticipated expiration


Abstract

The invention discloses a blur detection method based on Conv-LSTM multi-scale feature fusion, comprising the following steps: constructing a basic network architecture consisting of a multi-scale convolutional network and a Conv-LSTM sub-network, the convolutional network extracting image features at different scales and the sub-network performing feature fusion and refinement; fusing and reducing the dimensionality of the features extracted by the convolutional networks at different scales by bilinear interpolation and convolution; using the sequential nature of Conv-LSTM to refine the detection results and deep features; and computing a loss on the outputs of the five Conv-LSTM layers of the Conv-LSTM sub-network, the final loss value being the sum of the loss values of the five layers. The invention fuses features of different scales by bilinear interpolation plus convolution, refines the blur detection results with a Conv-LSTM network model, fuses low-level image features with high-level semantic information, and trains the network with a multi-layer loss function.

Description

Translated from Chinese
Blur detection method based on Conv-LSTM multi-scale feature fusion

Technical Field

The present invention relates to the field of blur detection, and in particular to a blur detection method based on Conv-LSTM (convolutional long short-term memory network) multi-scale feature fusion.

Background Art

Blur detection not only distinguishes blurred from sharp regions in an image, but also studies the deeper information contained in image blur. Applications include blur magnification, image deblurring, depth-of-field estimation, and image quality assessment.

Generally, a blurred image has smaller image gradients than a sharp image; a sharp image contains more high-frequency components, while a blurred image has fewer high-frequency components and a relatively concentrated frequency-domain distribution. Traditional blur detection methods rely on such hand-crafted features to extract information from blurred images. The main approaches include: using the discrete cosine transform (DCT) to convert the image from the spatial domain to the frequency domain, taking the frequency difference between blurred and sharp image patches as the blur detection feature; using image edge extraction operators (e.g., Sobel) to extract gradient information as the blur detection feature; and a class of methods that extract features via singular value decomposition (SVD), based on the principle that preserving image detail requires more non-zero singular values, that is, sharp images contain more non-zero singular values than blurred images. These methods perform poorly on homogeneous (flat) image regions and in the presence of image noise, and they do not capture the high-level semantic information in the image.

Recently, with the development of deep learning, convolutional neural networks have been widely applied to image detection. Deep learning methods use neural networks to extract image features and perform end-to-end learning from input images and labels, effectively addressing the inability of traditional hand-crafted features to capture semantic information and their poor detection of homogeneous regions. Zhao et al. [1] proposed a multi-stream bottom-top-bottom network (BTBNet) to address poor detection of homogeneous regions, with strong robustness to background clutter. Tang et al. [2] proposed a multi-scale deep feature fusion and refinement model named DeFusionNET, which addresses the fusion of low-level image features and high-level semantic information and uses cross-layer connections to refine the detection results.

Existing deep learning methods perform well on homogeneous image regions and are highly robust. However, most of them rely on a complex cross-connected network structure to refine the blur detection results, which retains useless features and prevents optimal performance.

References

[1] W. Zhao, F. Zhao, D. Wang, and H. Lu, "Defocus blur detection via multi-stream bottom-top-bottom fully convolutional network," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2018: 3080–3088.

[2] C. Tang, X. Zhu, X. Liu, and L. Wang, "DeFusionNET: Defocus blur detection via recurrently fusing and refining multi-scale deep features," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2019: 2700–2709.

Summary of the Invention

The present invention provides a blur detection method based on Conv-LSTM multi-scale feature fusion. The invention fuses features of different scales by bilinear interpolation plus convolution, refines the blur detection results with a Conv-LSTM network model, fuses low-level image features with high-level semantic information, and trains the network with a multi-layer loss function, as described below:

A blur detection method based on Conv-LSTM multi-scale feature fusion, the method comprising:

constructing a basic network architecture consisting of a multi-scale convolutional network and a Conv-LSTM sub-network, wherein the convolutional network extracts image features at different scales and the sub-network performs feature fusion and refinement;

fusing and reducing the dimensionality of the features extracted by the convolutional networks at different scales by bilinear interpolation and convolution;

using the sequential nature of Conv-LSTM to refine the detection results and deep features;

computing a loss on the outputs of the five Conv-LSTM layers of the Conv-LSTM sub-network, the final loss value being the sum of the loss values of the five layers.

The Conv-LSTM sub-network consists of five Conv-LSTM layers, corresponding to the five Conv modules of the multi-scale convolutional network; each Conv-LSTM layer receives the corresponding multi-scale fused features and the output of the previous Conv-LSTM module as input.

Further, fusing and reducing the dimensionality of the features of the convolutional networks at different scales by bilinear interpolation and convolution specifically comprises:

dividing the convolutional network into three scales according to the input image size, from largest to smallest: the original input image size, 0.8 times the original size, and 0.6 times the original size;

convolving the deep features output by the Conv modules of the three scale networks separately with a 1*1*64 convolution kernel to reduce their dimensionality; then upsampling the deep features extracted by the two smaller-scale networks by bilinear interpolation and joining them with the deep features extracted by the largest-scale network to obtain joint features;

finally, convolving the joint features with a 3*3*64 convolution kernel to achieve deep feature fusion across scales and channel dimensionality reduction, obtaining the multi-scale fused features of the Conv module.

The method further comprises:

pre-training with Gaussian-blurred images; the data augmentation methods during training include horizontal, vertical, and horizontal-vertical flipping, and random rotation by several angles about the image center.

Further, the method also comprises:

using the Keras deep learning framework with TensorFlow as the backend, first training the basic network architecture with the pre-training data; before pre-training, the convolutional network of each scale loads its weights trained on ImageNet;

then training the network with the augmented data, obtaining a trained network model on the corresponding dataset;

using this network model, a blurred image is input and the model outputs the blur detection result, ending the process.

The beneficial effects of the technical solution provided by the present invention are:

1. The method innovatively exploits the sequential nature of Conv-LSTM, taking the joint image features and blur features as the Conv-LSTM input to refine the multi-scale features and the blur detection results; experimental results show that the method outperforms current deep learning methods;

2. The invention performs blur detection with a multi-scale convolutional neural network combined with Conv-LSTM to refine the detection results, demonstrating excellent performance on benchmark datasets;

3. The invention fuses the multi-scale features obtained after convolutional dimensionality reduction by bilinear interpolation plus convolution, exploiting features of different scales and fully accounting for the scale sensitivity of blur.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic diagram of the network structure proposed by the present invention;

FIG. 2 is a schematic diagram of the detection results of the proposed method and other methods on the public dataset CUHK;

FIG. 3 is a schematic diagram of the detection results of the proposed method and other methods on the public dataset DUT.

DETAILED DESCRIPTION

To make the objectives, technical solutions, and advantages of the present invention clearer, embodiments of the present invention are described in further detail below.

Example 1

A blur detection method based on Conv-LSTM multi-scale feature fusion, referring to FIG. 1, comprises the following steps:

1. Basic network architecture

Referring to FIG. 1, the basic network architecture in this embodiment is divided into two sub-networks: a multi-scale convolutional network for feature extraction (the three left columns of FIG. 1), and a feature fusion and refinement network, namely the Conv-LSTM sub-network (the right column of FIG. 1).

The basic structure of each of the three scale convolutional networks (i.e., the multi-scale convolutional network) is the same as the VGG16 network (VGG16 mainly comprises five convolutional modules, Conv 1 to Conv 5, and two fully connected modules, FC6 and FC7; the VGG16 structure is well known to those skilled in the art and is not described in detail here). The present invention removes the last pooling layer (pool 5) and the fully connected layers FC6 and FC7 of VGG16. The three convolutional networks of different scales extract image features at different scales, fully accounting for the scale sensitivity of blur.
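As an illustration, such a truncated backbone might be built as follows in tf.keras; the helper name, the placeholder input size, and the use of the stock `tf.keras.applications.VGG16` model (whose `include_top=False` option drops FC6/FC7) are assumptions of this sketch, not the patented implementation:

```python
import tensorflow as tf

def make_vgg16_backbone(input_shape=(320, 320, 3)):
    """Truncated VGG16 feature extractor: pool 5 and FC6/FC7 are not used,
    and the outputs of the five Conv modules are exposed."""
    vgg = tf.keras.applications.VGG16(
        include_top=False,      # drops the FC6/FC7 fully connected layers
        weights="imagenet",     # ImageNet initialization, as in the description
        input_shape=input_shape)
    # Last convolutional layer of each of the five Conv modules; the final
    # pooling layer (block5_pool, i.e. pool 5) is simply never consumed.
    names = ["block1_conv2", "block2_conv2", "block3_conv3",
             "block4_conv3", "block5_conv3"]
    outputs = [vgg.get_layer(n).output for n in names]
    return tf.keras.Model(vgg.input, outputs, name="vgg16_backbone")
```

Three such extractors, fed with the input image resized to 1.0, 0.8, and 0.6 times its original size, would provide the per-scale deep features used below.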

In addition, before feature fusion in the multi-scale convolutional network, the features extracted by the two smaller-scale networks must be interpolated and dimensionality-reduced. The Conv-LSTM sub-network consists of five Conv-LSTM layers, corresponding to the multi-scale features extracted by the five Conv modules of the multi-scale convolutional network. Each Conv-LSTM module receives the corresponding multi-scale fused features and the blur features output by the previous Conv-LSTM module as input, and the sequential nature of Conv-LSTM is used to refine the results.

2. Fusing and reducing the dimensionality of the features of the convolutional networks at different scales by bilinear interpolation and convolution

The deep features output by the convolutional networks at different scales differ in size; when they are used as input to the downstream network, the differently sized deep features must be unified. This method upsamples the deep features of the two smaller-scale convolutional networks by bilinear interpolation. The detailed implementation is as follows:

First, the deep features output by the Conv modules of the three scale networks are convolved separately with a 1*1*64 convolution kernel to reduce their dimensionality; then the deep features extracted by the two smaller-scale networks are upsampled by bilinear interpolation and joined with the deep features of the largest-scale network to obtain joint deep features; finally, the joint deep features are convolved with a 3*3*64 kernel, fusing the deep features of different scales and reducing the channel dimensionality, yielding the multi-scale fused features of the Conv module at the same depth. Note that a 32-channel convolution kernel is used for the features output by the Conv 1 module (see the multi-scale feature fusion part of FIG. 1).
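A minimal sketch of this fusion block, assuming a tf.keras implementation with statically known feature sizes; the function name and the ReLU activations are illustrative assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers

def fuse_scales(f_100, f_080, f_060, channels=64):
    """Fuse same-depth deep features from the 1.0x, 0.8x and 0.6x networks.

    Per the description: a 1*1 convolution reduces each feature to `channels`
    maps (32 would be used for the Conv 1 module), the smaller-scale features
    are bilinearly upsampled to the largest scale, the three are joined, and
    a 3*3 convolution fuses them back down to `channels` maps.
    """
    h, w = f_100.shape[1], f_100.shape[2]   # spatial size at the largest scale
    reduced = []
    for f in (f_100, f_080, f_060):
        f = layers.Conv2D(channels, 1, padding="same", activation="relu")(f)
        # Bilinear resize; a no-op for the largest-scale feature itself.
        reduced.append(layers.Resizing(h, w, interpolation="bilinear")(f))
    joint = layers.Concatenate(axis=-1)(reduced)        # 3 * channels maps
    return layers.Conv2D(channels, 3, padding="same", activation="relu")(joint)
```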

3. Refining the blur features with Conv-LSTM

The input of each Conv-LSTM is the multi-scale deep features together with the blur features output by the previous Conv-LSTM module; in particular, the input of the fifth Conv-LSTM module contains only the multi-scale deep features. The sequential nature of Conv-LSTM is used to refine the blur features.

The working process of Conv-LSTM can be summarized as follows: the hidden state H_t and the cell state C_t are updated in sequence according to the values of the three gates, input i, output o, and forget f:

i_t = σ(W_xi * X_t + W_hi * H_{t-1} + W_ci ∘ C_{t-1} + b_i)

f_t = σ(W_xf * X_t + W_hf * H_{t-1} + W_cf ∘ C_{t-1} + b_f)

C_t = f_t ∘ C_{t-1} + i_t ∘ tanh(W_xc * X_t + W_hc * H_{t-1} + b_c)

o_t = σ(W_xo * X_t + W_ho * H_{t-1} + W_co ∘ C_t + b_o)

H_t = o_t ∘ tanh(C_t)

where X_t and H_t denote the deep features of the convolutional network and the refined blur features, respectively; * denotes the convolution operation; ∘ denotes the Hadamard product; σ and tanh denote the sigmoid and hyperbolic tangent activation functions, respectively; t denotes the time step; W are weights; b are biases; and the subscript c indexes the cell-state parameters.

In this method t = 3, and the input of the Conv-LSTM is the same at every time step, i.e., X_1 = X_2 = X_3.
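As a sketch, this refinement chain could be wired with Keras's ConvLSTM2D layer, which implements the gate equations above internally. Repeating each fused feature over t = 3 identical time steps follows the note above; the filter count, the upsample-and-concatenate joining of the previous layer's output, the 1-channel sigmoid side outputs, and the assumption of statically known feature sizes are all illustrative, not the patented design:

```python
import tensorflow as tf
from tensorflow.keras import layers

def refine_with_convlstm(fused_feats, steps=3, filters=32):
    """fused_feats: multi-scale fused features for Conv 5 down to Conv 1
    (deepest first), each of shape (batch, H_i, W_i, C_i)."""
    side_outputs, prev = [], None
    for feat in fused_feats:
        if prev is not None:
            # Join the previous layer's blur features with the current fused
            # features (upsampling + concatenation is an assumption here).
            prev = layers.Resizing(feat.shape[1], feat.shape[2],
                                   interpolation="bilinear")(prev)
            feat = layers.Concatenate(axis=-1)([feat, prev])
        # Repeat the same input over t = 3 time steps, as in the description.
        seq = layers.Lambda(lambda x, s=steps: tf.stack([x] * s, axis=1))(feat)
        prev = layers.ConvLSTM2D(filters, 3, padding="same",
                                 return_sequences=False)(seq)  # H_t at t = 3
        # One side output per Conv-LSTM layer, each supervised by the loss.
        side_outputs.append(layers.Conv2D(1, 1, activation="sigmoid")(prev))
    return side_outputs
```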

4. Multi-layer loss function calculation strategy

The present invention computes the loss separately on the outputs of the five Conv-LSTM layers of the Conv-LSTM sub-network, and the final loss value is the sum of the loss values of these five layers. The loss function for each layer consists of five components, with the formula:

ℓ(B, G) = ℓ_CE(B, G) + α1·ℓ_F(B, G) + α2·ℓ_P(B, G) + α3·ℓ_R(B, G) + α4·ℓ_MAE(B, G)

where B denotes the output and G denotes the label; α1 = α2 = α3 = α4 = 0.1; ℓ_CE is the cross entropy, the main component of the network's loss function, computed as follows:

ℓ_CE(B, G) = -(1/(W·H)) · Σ_x [ g_x·log(b_x + ε) + (1 - g_x)·log(1 - b_x + ε) ]

where b_x ∈ B and g_x ∈ G are pixel values; W denotes the input image width and H the input image height, in pixels; and ε is a regularization constant (1e-8 in this network).

ℓ_F, ℓ_P, and ℓ_R are computed analogously to the F-Measure, precision, and recall, respectively. Since larger values of these three metrics indicate a better model, the invention takes their negatives as loss terms so that minimizing the loss maximizes the metrics:

ℓ_P(B, G) = - Σ_x b_x·g_x / (Σ_x b_x + ε)

ℓ_R(B, G) = - Σ_x b_x·g_x / (Σ_x g_x + ε)

ℓ_F(B, G) = - (1 + β²)·P·R / (β²·P + R + ε), where P = -ℓ_P and R = -ℓ_R

ℓ_MAE is computed in the same way as the mean absolute error and reflects the similarity between the network output and the label:

ℓ_MAE(B, G) = (1/(W·H)) · Σ_x | b_x - g_x |
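Under the formulas above, the per-layer loss might be sketched as follows in TensorFlow; the β² value in the F-measure term and the resizing of side outputs to the label resolution are assumptions of this sketch:

```python
import tensorflow as tf

EPS = 1e-8  # regularization constant epsilon from the description

def layer_loss(b, g, alpha=0.1, beta_sq=0.3):
    """b: one side output, g: label; both in [0, 1], shape (batch, H, W, 1)."""
    ax = [1, 2, 3]                           # sum over the pixels of each image
    l_ce = -tf.reduce_mean(g * tf.math.log(b + EPS)
                           + (1.0 - g) * tf.math.log(1.0 - b + EPS))
    tp = tf.reduce_sum(b * g, axis=ax)
    precision = tp / (tf.reduce_sum(b, axis=ax) + EPS)
    recall = tp / (tf.reduce_sum(g, axis=ax) + EPS)
    f_measure = ((1.0 + beta_sq) * precision * recall
                 / (beta_sq * precision + recall + EPS))
    l_mae = tf.reduce_mean(tf.abs(b - g))
    # Metrics enter negated, so minimizing the loss maximizes the metrics.
    return (l_ce
            + alpha * tf.reduce_mean(-f_measure)
            + alpha * tf.reduce_mean(-precision)
            + alpha * tf.reduce_mean(-recall)
            + alpha * l_mae)

def total_loss(side_outputs, g):
    """Sum of the losses of the five Conv-LSTM layer outputs."""
    return tf.add_n([layer_loss(tf.image.resize(b, tf.shape(g)[1:3]), g)
                     for b in side_outputs])
```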

5. Pre-training with simulated data and expanding the training data

To strengthen the model's fitting ability, this method first pre-trains the model with Gaussian-blurred images. After pre-training is completed, the model is trained. The training data expansion methods include horizontal, vertical, and horizontal-vertical flipping, and random rotation by 11 angles about the image center.

6. Network training and testing

Network training and testing are based on the Keras deep learning framework with TensorFlow as the backend (well known to those skilled in the art and not described in detail here). The network proposed in steps 1 to 4 is first trained with the pre-training data from step 5; before pre-training begins, the three-scale feature extraction convolutional networks (i.e., the modified VGG16) load their weights trained on ImageNet; the network is then trained with the augmented data. A trained network model is obtained on the corresponding dataset. Using this network model, a blurred image is input and the model outputs the blur detection result B, ending the process.
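A compressed sketch of this two-stage schedule, assuming a constructed Keras model and prepared datasets; the optimizer, learning rate, and epoch counts are placeholder assumptions:

```python
import tensorflow as tf

def train_two_stage(model, loss_fn, pretrain_ds, finetune_ds,
                    pre_epochs=10, fit_epochs=20):
    """Stage 1: pre-train on synthetically blurred images; stage 2: fine-tune
    on the augmented benchmark data (CUHK / DUT in the description)."""
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-4), loss=loss_fn)
    model.fit(pretrain_ds, epochs=pre_epochs)   # synthetic Gaussian-blur data
    model.fit(finetune_ds, epochs=fit_epochs)   # augmented real data
    return model                                # predict() then yields result B
```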

In summary, the embodiment of the present invention uses a multi-scale convolutional network to extract deep features of images at different scales; the deep features are rescaled by bilinear interpolation and fused by convolution. The sequential nature of Conv-LSTM is used to refine the blur features, and a loss is computed on the output of each Conv-LSTM layer. Finally the output result is obtained and the process ends.

Example 2

The scheme of Example 1 is further described below with reference to FIG. 1 and a specific example:

When designing the network, this embodiment must consider how to effectively use features of different scales from the convolutional neural network to address the scale sensitivity of blur.

Specifically, the deep features output by the convolutional networks at the three scales differ in size; when used as Conv-LSTM input, the differently sized deep features must be unified. This method uses bilinear interpolation to upsample the deep features of the two smaller-scale convolutional networks.

In detail, take as an example one scale network of the multi-scale convolutional network, adapted from VGG16. First, the output of the VGG16 Conv 4 module (the last convolutional layer of each Conv module serves as the feature output layer, here conv4_3) is convolved with a 1*1*64 kernel to reduce the channel dimensionality of the deep features; the two smaller-scale deep features are then upsampled by bilinear interpolation to the size of the original scale; the deep features of the three scale networks are joined; finally, the joint deep features are convolved with a 3*3*64 kernel, fusing the deep features of different scales and again reducing the channel dimensionality to obtain the multi-scale fused features. These fused features are joined with the blur features output by the previous Conv-LSTM as the input of the next Conv-LSTM (the same computation is applied to the Conv 3 through Conv 1 modules). Note that the input of Conv-LSTM 5 consists only of the deep features output by the Conv 5 module, with no other input.

This embodiment uses Conv-LSTM to refine the blur features, fully exploiting the sequential nature of Conv-LSTM and achieving high accuracy on the blur detection problem.

This embodiment uses the multi-layer loss function as the supervisory signal to jointly train the network, making full use of the blur feature information of Conv-LSTM and improving the performance and robustness of the network. A diversified loss function composed of cross entropy, precision, recall, F-Measure, and MAE is used to ensure optimal network performance.

This embodiment takes VGG16 (used for image classification) and Conv-LSTM (used for short-term precipitation nowcasting) as the base architecture, removes the last pooling layer (pool_5) and the original FC6 and FC7 fully connected layers of VGG16, and adds new convolutional layers, including convolutional layers for dimensionality reduction and for feature fusion. The newly added convolutional layers and the Conv-LSTM sub-network are initialized with the Xavier method; the VGG16 sub-networks are initialized with model parameters pre-trained on ImageNet.

This embodiment applies data expansion methods tailored to the characteristics of blur data, including synthesis of blurred images and augmentation of the training set.

When synthesizing blurred images, the specific approach is: Gaussian blur (kernel_size = 7*7, sigma = 2) is applied separately to the top, bottom, left, and right halves of an image. Each image is processed iteratively 5 times, yielding 20 synthetic blurred images and corresponding labels. When augmenting the training set, the image is rotated about the intersection of its diagonals by 9 random angles plus the three angles 90, 180, and 270 degrees (guaranteed not to be among the 9 random angles), together with horizontal, vertical, and horizontal-vertical flips. The dataset is thus augmented to 15 times its original size, and the labels are processed in the same way as the images.
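A sketch of this synthesis step with OpenCV; that the Gaussian blur accumulates across the 5 iterations and that the label marks the blurred half with 1 are assumptions of the sketch:

```python
import cv2
import numpy as np

def synth_blur_pairs(img, iters=5, ksize=7, sigma=2):
    """Blur each of the four half-regions of `img` with a 7*7 Gaussian kernel
    (sigma = 2), iterating 5 times per half: 4 halves x 5 iterations = 20
    blurred images with matching labels."""
    h, w = img.shape[:2]
    halves = [(slice(0, h // 2), slice(0, w)),    # top
              (slice(h // 2, h), slice(0, w)),    # bottom
              (slice(0, h), slice(0, w // 2)),    # left
              (slice(0, h), slice(w // 2, w))]    # right
    pairs = []
    for rs, cs in halves:
        blurred = img.copy()
        label = np.zeros((h, w), np.uint8)
        label[rs, cs] = 1                 # assumed: 1 marks the blurred half
        for _ in range(iters):
            blurred[rs, cs] = cv2.GaussianBlur(blurred[rs, cs],
                                               (ksize, ksize), sigma)
            pairs.append((blurred.copy(), label.copy()))
    return pairs
```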

When training the model, this embodiment uses two datasets, CUHK and DUT, both augmented according to actual needs by the data augmentation method described above. When pre-training the model, 2000 images from three datasets, UCID, BSDS500, and PASCAL VOC 2008, are used to synthesize blurred images.

Example 3

The feasibility of the schemes in Examples 1 and 2 is verified below with reference to FIGS. 2 and 3, as described below:

The network of this embodiment is built according to the network structure shown in FIG. 1. Blurred data are first synthesized to pre-train the network; the training set is then expanded to produce the augmented training dataset, and the network is trained.

As can be seen from FIGS. 2 and 3, the blur detection results obtained by this embodiment are clearly superior to the other blur detection results. In FIGS. 2 and 3, the first column is the input image, the second column is the label, and the third column is the blur detection result of this embodiment; the remaining columns are detection results produced by other blur detection algorithms. The experimental comparisons in FIGS. 2 and 3 show that, compared with existing methods, the proposed method produces more accurate blur detection results with higher contrast, smoother boundaries, and better detection of details. FIG. 2 shows the detection results on the CUHK dataset, and FIG. 3 shows the results on the DUT dataset.

Those skilled in the art will understand that the accompanying drawings are only schematic diagrams of a preferred embodiment, and that the above serial numbers of the embodiments are for description only and do not indicate relative merit.

The above are only preferred embodiments of the present invention and are not intended to limit it; any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (5)

Translated from Chinese
1. A blur detection method based on Conv-LSTM multi-scale feature fusion, characterized in that the method comprises:
constructing a basic network architecture consisting of a multi-scale convolutional network and a Conv-LSTM sub-network, wherein the convolutional network is used to extract image features at different scales and the sub-network is used for feature fusion and refinement;
fusing and reducing the dimensionality of the features extracted by the convolutional networks at different scales by bilinear interpolation and convolution;
using the sequential nature of Conv-LSTM to refine the detection results and deep features;
computing a loss on the outputs of the five Conv-LSTM layers of the Conv-LSTM sub-network, the final loss value being the sum of the loss values of the five layers.
2. The blur detection method based on Conv-LSTM multi-scale feature fusion according to claim 1, characterized in that the Conv-LSTM sub-network consists of 5 Conv-LSTM layers corresponding to the 5 Conv modules of the multi-scale convolutional network, and each Conv-LSTM layer receives the corresponding multi-scale fused features and the output of the previous Conv-LSTM module as input.
3. The blur detection method based on Conv-LSTM multi-scale feature fusion according to claim 1, characterized in that fusing and reducing the dimensionality of the features of the convolutional networks at different scales by bilinear interpolation and convolution specifically comprises:
dividing the convolutional network into three scales according to the input image size, from largest to smallest: the original input image size, 0.8 times the original size, and 0.6 times the original size;
convolving the deep features output by the Conv modules of the three scale networks separately with a 1*1*64 convolution kernel to reduce their dimensionality; upsampling the deep features extracted by the two smaller-scale networks by bilinear interpolation, and then joining them with the deep features extracted by the largest-scale network to obtain joint features;
finally, convolving the joint features with a 3*3*64 convolution kernel to achieve deep feature fusion across scales and channel dimensionality reduction, obtaining the multi-scale fused features of the Conv module.
4. The blur detection method based on Conv-LSTM multi-scale feature fusion according to claim 1, characterized in that the method further comprises:
pre-training with Gaussian-blurred images, the data augmentation methods during training including horizontal, vertical, and horizontal-vertical flipping, and random rotation by several angles about the image center.
5. The blur detection method based on Conv-LSTM multi-scale feature fusion according to claim 1, characterized in that the method further comprises:
using the Keras deep learning framework with TensorFlow as the backend, first training the basic network architecture with the pre-training data, wherein before pre-training the convolutional network of each scale loads its weights trained on ImageNet;
then training the network with the augmented data to obtain a trained network model on the corresponding dataset;
using the network model, inputting a blurred image, the model outputting the blur detection result, ending the process.
CN201911258543.2A | 2019-12-10 | 2019-12-10 | Conv-LSTM multi-scale feature fusion-based fuzzy detection method | Active | CN111161217B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201911258543.2A | 2019-12-10 | 2019-12-10 | CN111161217B (en) Conv-LSTM multi-scale feature fusion-based fuzzy detection method

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201911258543.2A | 2019-12-10 | 2019-12-10 | CN111161217B (en) Conv-LSTM multi-scale feature fusion-based fuzzy detection method

Publications (2)

Publication Number | Publication Date
CN111161217A (en) | 2020-05-15
CN111161217B (en) | 2023-04-18

Family

ID=70556663

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201911258543.2A (Active) | CN111161217B (en) Conv-LSTM multi-scale feature fusion-based fuzzy detection method | 2019-12-10 | 2019-12-10

Country Status (1)

Country | Link
CN (1) | CN111161217B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN111950812B (en)* | 2020-08-31 | 2022-05-24 | 福建农林大学 | Method and device for automatically identifying and predicting rainfall
CN113377988A (en)* | 2021-05-20 | 2021-09-10 | 西安理工大学 | Incremental image retrieval method based on depth hashing and multi-feature fusion
CN113469427A (en)* | 2021-06-24 | 2021-10-01 | 国网山东省电力公司东营供电公司 | Convolutional LSTM-based photovoltaic power station day-ahead power generation prediction method and system
CN114881930B (en)* | 2022-04-07 | 2023-08-18 | 重庆大学 | 3D target detection method, device, equipment and storage medium based on dimensionality reduction positioning
CN117475091B (en)* | 2023-12-27 | 2024-03-22 | 浙江时光坐标科技股份有限公司 | High-precision 3D model generation method and system

Citations (3)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
WO2019144575A1 (en)* | 2018-01-24 | 2019-08-01 | 中山大学 | Fast pedestrian detection method and device
CN110322009A (en)* | 2019-07-19 | 2019-10-11 | 南京梅花软件系统股份有限公司 | Image prediction method based on multilayer convolutional long short-term memory neural network
CN110472634A (en)* | 2019-07-03 | 2019-11-19 | 中国民航大学 | Change detection method based on multi-scale deep feature difference fusion network

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US11207489B2 (en)* | 2015-10-23 | 2021-12-28 | HRL Laboratories, LLC | Enhanced brain-machine interfaces with neuromodulation

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
WO2019144575A1 (en)* | 2018-01-24 | 2019-08-01 | 中山大学 | Fast pedestrian detection method and device
CN110472634A (en)* | 2019-07-03 | 2019-11-19 | 中国民航大学 | Change detection method based on multi-scale deep feature difference fusion network
CN110322009A (en)* | 2019-07-19 | 2019-10-11 | 南京梅花软件系统股份有限公司 | Image prediction method based on multilayer convolutional long short-term memory neural network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
陈浩; 葛阳晨; 黄睿. CT system parameter calibration and imaging model based on the filtered back-projection reconstruction method. 科技创新导报 (Science and Technology Innovation Herald), 2018, (07), full text.*

Also Published As

Publication number | Publication date
CN111161217A (en) | 2020-05-15

Similar Documents

Publication | Title
CN111161217B (en) | Conv-LSTM multi-scale feature fusion-based fuzzy detection method
Zhang et al. | Remote sensing image super-resolution via mixed high-order attention network
CN104574336B (en) | Super-resolution image reconstruction system based on adaptive sub-mode dictionary selection
Yu et al. | A review of single image super-resolution reconstruction based on deep learning
CN101976435A (en) | Combination learning super-resolution method based on dual constraint
CN102402784B (en) | Human face image super-resolution method based on nearest feature line manifold learning
Cai et al. | Multiscale attentive image de-raining networks via neural architecture search
Pan et al. | Multi-stage feature pyramid stereo network-based disparity estimation approach for two to three-dimensional video conversion
Kim et al. | Constrained adversarial loss for generative adversarial network-based faithful image restoration
Zeng et al. | Self-attention learning network for face super-resolution
CN117576567A (en) | Remote sensing image change detection method using multi-level difference feature adaptive fusion
CN104036468A (en) | Super-resolution reconstruction method for single-frame images on basis of pre-amplification non-negative neighbor embedding
CN117726954B (en) | A method and system for segmenting land and sea in remote sensing images
CN115908314A (en) | A few-shot defect detection method based on a pixel-to-global bilateral guidance network
CN116665065A (en) | Cross attention-based high-resolution remote sensing image change detection method
Li et al. | Stereo superpixel segmentation via decoupled dynamic spatial-embedding fusion network
CN113298154B (en) | RGB-D image salient object detection method
Dong et al. | Hiding image with inception transformer
CN111382845A (en) | Template reconstruction method based on self-attention mechanism
CN113129237B (en) | Depth image deblurring method based on multi-scale fusion coding network
Wang et al. | From spatial to frequency domain: A pure frequency domain FDNet model for the classification of remote sensing images
Shen et al. | Spatial frequency modulation network for efficient image dehazing
CN113920015A (en) | An edge-preserving super-resolution reconstruction method for infrared images based on generative adversarial networks
CN117474765B (en) | DEM super-resolution reconstruction system based on reference image texture transfer
Wu et al. | Two-stage progressive residual dense attention network for image denoising

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
