Technical Field
The invention belongs to the field of image processing and relates to a salient object detection method for RGB-T images, in particular to an RGB-T image salient object detection method based on multi-level deep feature fusion, which can be used as an image preprocessing step in computer vision.
Background Art
Salient object detection aims to detect and segment the salient object regions in an image by means of a model or algorithm. As an image preprocessing step, salient object detection plays a vital role in vision tasks such as visual tracking, image recognition, image compression, and image fusion.
Existing salient object detection methods fall into two categories: traditional methods and deep-learning-based methods. Traditional salient object detection algorithms predict saliency from hand-crafted features such as color, texture, and orientation; they depend heavily on manually selected features, adapt poorly to different scenes, and perform poorly on complex datasets. With the wide application of deep learning, research on deep-learning-based salient object detection has made breakthrough progress, and detection performance has improved markedly compared with traditional saliency algorithms.
Most salient object detection methods, such as "Q. Hou, M. M. Cheng, X. Hu, et al. Deeply supervised salient object detection with short connections. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 41(4):815–828.", compute saliency values only from single-modality RGB images, so the scene information they obtain is limited; in challenging scenes with low illumination, low contrast, or cluttered backgrounds, it is difficult for them to detect salient objects completely and consistently.
To address this problem, salient object detection methods based on RGB-T images have been proposed. For example, "Li C, Wang G, Ma Y, et al. A Unified RGB-T Saliency Detection Benchmark: Dataset, Baselines, Analysis and A Novel Approach. arXiv preprint arXiv:1701.02829, 2017." discloses an RGB-T image salient object detection method based on manifold ranking: it exploits the complementary information of RGB and thermal infrared images, builds a cross-modality-consistent manifold ranking model, and computes the saliency value of each node with a two-stage graph. Under low illumination and low contrast it detects salient objects more accurately than salient object detection methods that take only RGB images as input.
However, this method takes region blocks as the basic detection unit, so obvious block artifacts appear in the saliency map, the segmentation boundary between object and background is inaccurate, and the interior of the object is non-uniform. In addition, the method is built on hand-crafted features, which cannot fully express the intrinsic characteristics of different images; the complementary information between images of different modalities is not fully exploited, and the improvement in detection performance is limited.
Summary of the Invention
Object of the invention: In view of the above deficiencies of the prior art, the object of the present invention is to propose an RGB-T image salient object detection method based on multi-level deep feature fusion, so as to improve the completeness and consistency of salient object detection in images of complex and changeable scenes. It mainly solves the problem that the prior art cannot detect salient objects completely and consistently in such scenes.
The key to realizing the present invention is multi-level deep feature extraction and fusion for RGB-T images: saliency is predicted by fusing the multi-level single-modality features extracted from the RGB and thermal infrared images. For an RGB or thermal infrared image, coarse multi-level features are extracted from different depths of the backbone network; adjacent-depth feature fusion modules are built to extract improved multi-level single-modality features; multi-branch group fusion modules are built to fuse the features of different modalities; fused output feature maps are obtained; the network is trained to obtain the model parameters; and the pixel-level saliency map of the RGB-T image is predicted.
Technical solution: An RGB-T image salient object detection method based on multi-level deep feature fusion comprises the following steps:
(1) Extract coarse multi-level features from the input image:
Extract the 5 levels of features located at different depths of the base network as coarse single-modality features of the image;
(2) Build adjacent-depth feature fusion modules to improve the single-modality features:
Build multiple adjacent-depth feature fusion modules, process the 5-level coarse single-modality features obtained in step (1) through these modules, and fuse the features coming from adjacent depths at 3 levels to obtain improved 3-level single-modality features;
(3) Build multi-branch group fusion modules to fuse multi-modality features:
Build a multi-branch group fusion module containing two fusion branches, and fuse the different single-modality features located at the same feature level among the improved 3-level single-modality features obtained in step (2) to obtain fused multi-modality features;
(4) Obtain the fused output feature map:
Fuse the different levels of the fused multi-modality features obtained in step (3) in reverse, level by level, to obtain multiple side-output feature maps, and fuse all side-output feature maps to obtain the fused output feature map;
(5) Train the algorithm network:
On the training dataset, apply a deep supervision learning mechanism to the side-output feature maps and the fused output feature map obtained in step (4), and complete the training of the algorithm network by minimizing a cross-entropy loss function to obtain the network model parameters;
(6) Predict the pixel-level saliency map of the RGB-T image:
On the test dataset, using the network model parameters obtained in step (5), apply sigmoid classification to the side-output feature maps and the fused output feature map obtained in step (4) to predict the pixel-level saliency map of the RGB-T image.
Further, the image described in step (1) is an RGB image or a thermal infrared image.
Further, the base network in step (1) is the VGG16 network.
Still further, building the adjacent-depth feature fusion modules described in step (2) includes the following steps:
(21) Denote the 5-level coarse single-modality features obtained in step (1) by $F_n^{k}$, $k=1,\ldots,5$, where n=1 or 2 denotes the RGB image or the thermal infrared image, respectively;
(22) Each adjacent-depth fusion module contains 3 convolution operations and 1 deconvolution operation, and is used to obtain the d-th level single-modality feature, d=1,2,3.
Still further, step (22) includes:
(221) Apply a convolution with a 3×3 kernel, stride 2 and parameters $\theta_{n,d}^{1}$, a convolution with a 1×1 kernel, stride 1 and parameters $\theta_{n,d}^{2}$, and a deconvolution with a 2×2 kernel, stride 1/2 and parameters $\gamma_{n,d}$ to the adjacent features $F_n^{d}$, $F_n^{d+1}$ and $F_n^{d+2}$, respectively;
(222) Concatenate these 3 levels of features and pass them through a convolution with a 1×1 kernel, stride 1 and parameters $\theta_{n,d}^{3}$ to obtain the 128-channel d-th level single-modality feature $R_n^{d}$. The adjacent-depth fusion module can be expressed as follows:
$$R_n^{d}=\varphi\Big(C\Big(\mathrm{Cat}\big(C(F_n^{d};\theta_{n,d}^{1},2),\,C(F_n^{d+1};\theta_{n,d}^{2},1),\,D(F_n^{d+2};\gamma_{n,d},1/2)\big);\theta_{n,d}^{3},1\Big)\Big)$$
where:
Cat(·) denotes the cross-channel concatenation operation, and C(*;θ,s) and D(*;γ,s) denote convolution and deconvolution operations with parameters θ or γ and stride s;
φ(·) is a ReLU activation function.
Further, the multi-branch group fusion module in step (3) fuses the different single modalities at the same feature level and includes two fusion branches, a multi-group fusion branch and a single-group fusion branch, wherein:
the multi-group fusion branch has 8 groups, while the single-group fusion branch has only one group;
each fusion branch outputs 64-channel features, and the output features of the two fusion branches are concatenated to obtain 128-channel multi-modality features.
Still further, building the multi-branch group fusion module described in step (3) and fusing the different single modalities at the same feature level in the multi-group fusion branch to obtain fused multi-modality features includes the following steps:
(31) The input single-modality features $R_1^{d}$ and $R_2^{d}$ are each split along the channel dimension into M groups with the same number of channels, giving the two feature sets $\{R_1^{d,m}\}_{m=1}^{M}$ and $\{R_2^{d,m}\}_{m=1}^{M}$, where:
M is a positive integer in the range 2≤M≤128;
(32) Next, the corresponding RGB and thermal infrared features from the m-th group of the two same-level feature sets are combined by a concatenation operation and then passed through two stacked convolutions, a 1×1 convolution with 64/M channels and a 3×3 convolution with 64/M channels, to fuse the cross-modality features within the group, each convolution operation being followed by a ReLU activation function;
(33) The outputs of the M groups are concatenated to obtain the output feature $H_{1,d}$ of the multi-group fusion branch, expressed as:
$$H_{1,d}=\mathrm{Cat}\big(G(\mathrm{Cat}(R_1^{d,1},R_2^{d,1});\omega_d^{1}),\,\ldots,\,G(\mathrm{Cat}(R_1^{d,M},R_2^{d,M});\omega_d^{M})\big)$$
where:
$G(\cdot;\omega_d^{m})$ denotes the above stacked convolution operation with ReLU activation functions,
$\omega_d^{m}$ denotes the fusion parameters of the m-th group.
Still further, building the multi-branch group fusion module described in step (3) and fusing the different single modalities at the same feature level in the single-group fusion branch to obtain fused multi-modality features includes the following steps:
(3a) The single-group fusion branch can be regarded as the special case of the multi-group fusion branch with M=1, expressed as:
$$H_{2,d}=G\big(\mathrm{Cat}(R_1^{d},R_2^{d});\omega_d^{0}\big)$$
where:
$H_{2,d}$ is the d-th level fused feature output of the single-group fusion branch;
$G(\cdot;\omega_d^{0})$ contains two stacked convolutions, a 1×1 convolution with 64 channels and a 3×3 convolution with 64 channels, each convolution operation being followed by a ReLU activation function;
$\omega_d^{0}$ denotes the fusion parameters of the single-group fusion branch;
(3b) The d-th level multi-branch group fusion feature $H_d$ is obtained by simply concatenating $H_{1,d}$ and $H_{2,d}$, expressed as:
$$H_d=\mathrm{Cat}(H_{1,d},H_{2,d}).$$
Beneficial effects: compared with the prior art, the RGB-T image salient object detection method based on multi-level deep feature fusion disclosed by the present invention has the following beneficial effects:
1) No manual feature design and extraction are required, and end-to-end pixel-level detection of RGB-T images is realized; the simulation results show that the present invention detects salient objects in images of complex and changeable scenes with a more complete and consistent effect.
2) The present invention improves the 5-level coarse single-modality features extracted from the backbone network by building multiple adjacent-depth feature fusion modules, obtaining 3-level single-modality features; these effectively capture the low-level details and high-level semantic information of the input image, while avoiding the sharp increase in overall network parameters caused by too many feature levels and thus reducing the difficulty of network training.
3) The present invention fuses features of different modalities by building a multi-branch group fusion module containing two fusion branches: the single-group fusion structure captures the cross-channel correlations among all the features of the different modalities from the RGB image and the thermal infrared image, while more salient features are extracted in the multi-group fusion branch, so cross-modality information from RGB and thermal infrared images can be captured effectively, which helps to detect more complete and consistent objects; at the same time, the fusion module requires fewer training parameters, which improves the detection speed of the algorithm.
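A rough weight count illustrates the parameter savings brought by the grouped branch, using the channel settings stated above (128-channel inputs per modality, 64-channel branch outputs, M=8) and ignoring biases; the snippet below is only an illustrative calculation, not part of the disclosed method:

```python
def conv_weights(c_in, c_out, k):
    """Number of weights of a k x k convolution, biases ignored."""
    return c_in * c_out * k * k

# Single-group branch: Cat(128 + 128) = 256 channels -> 64 (1x1 conv), then 64 -> 64 (3x3 conv).
single_group = conv_weights(256, 64, 1) + conv_weights(64, 64, 3)

# Multi-group branch with M = 8: each group fuses 16 + 16 = 32 channels -> 8 (1x1), then 8 -> 8 (3x3).
M = 8
per_group = conv_weights(32, 64 // M, 1) + conv_weights(64 // M, 64 // M, 3)
multi_group = M * per_group

print(single_group, multi_group)  # 53248 vs 6656: the grouped branch needs 8x fewer weights
```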
Description of the Drawings
Fig. 1 is a flowchart of the implementation of the RGB-T image salient object detection method based on multi-level deep feature fusion disclosed by the present invention;
Fig. 2 is a comparison of the simulation results of the present invention and the prior art on the RGB-thermal database;
Fig. 3a and Fig. 3b are simulation comparisons of the present invention and the prior art on the RGB-thermal database under two evaluation metrics, the P-R curve and the F-measure curve.
Detailed Description of the Embodiments
Specific embodiments of the present invention are described in detail below.
Referring to Fig. 1, the RGB-T image salient object detection method based on multi-level deep feature fusion includes the following steps:
Step 1) Extract coarse multi-level features from the input image:
For an RGB image or a thermal infrared image, extract the 5 levels of features located at different depths of the VGG16 network as coarse single-modality features, namely:
Conv1-2 (denoted $F_n^{1}$, containing 64 feature maps of size 256×256);
Conv2-2 (denoted $F_n^{2}$, containing 128 feature maps of size 128×128);
Conv3-3 (denoted $F_n^{3}$, containing 256 feature maps of size 64×64);
Conv4-3 (denoted $F_n^{4}$, containing 512 feature maps of size 32×32);
Conv5-3 (denoted $F_n^{5}$, containing 512 feature maps of size 16×16);
where n=1 or 2: n=1 denotes the RGB image and n=2 denotes the thermal infrared image;
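As a minimal sketch of this step (the patent implements the network in Caffe, so the PyTorch/torchvision backbone, the pretrained-weights argument and the layer indices below are illustrative assumptions, not the disclosed implementation), the five tap points can be read off a standard VGG16 feature extractor:

```python
import torch
import torchvision

class VGG16MultiLevel(torch.nn.Module):
    """Returns the 5 coarse single-modality features F^1..F^5 (conv1_2, conv2_2, conv3_3, conv4_3, conv5_3)."""
    def __init__(self):
        super().__init__()
        self.backbone = torchvision.models.vgg16(weights="IMAGENET1K_V1").features
        # Indices of the ReLU outputs following conv1_2, conv2_2, conv3_3, conv4_3, conv5_3.
        self.tap_points = {3, 8, 15, 22, 29}

    def forward(self, x):
        taps = []
        for i, layer in enumerate(self.backbone):
            x = layer(x)
            if i in self.tap_points:
                taps.append(x)
        # For a 256x256 input: 64x256x256, 128x128x128, 256x64x64, 512x32x32, 512x16x16.
        return taps

extractor_rgb = VGG16MultiLevel()
extractor_t = VGG16MultiLevel()
F_rgb = extractor_rgb(torch.randn(1, 3, 256, 256))  # n = 1: RGB image
F_t = extractor_t(torch.randn(1, 3, 256, 256))      # n = 2: thermal image (replicated to 3 channels, an assumption)
```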
Step 2) Build adjacent-depth feature fusion modules to improve the single-modality features:
Common multi-modal vision methods use the five levels of features directly as single-modality features; with so many feature levels this leads to a huge number of network parameters and makes network training harder. The present invention instead treats the 5 levels of features at different depths as coarse single-modality features and, by building multiple adjacent-depth feature fusion modules, obtains 3 levels of improved RGB image features or thermal infrared image features;
Each adjacent-depth fusion module contains 3 convolution operations and 1 deconvolution operation. Specifically, to obtain the d-th level single-modality feature, d=1,2,3, a convolution with a 3×3 kernel, stride 2 and parameters $\theta_{n,d}^{1}$, a convolution with a 1×1 kernel, stride 1 and parameters $\theta_{n,d}^{2}$, and a deconvolution with a 2×2 kernel, stride 1/2 and parameters $\gamma_{n,d}$ are first applied to the adjacent features $F_n^{d}$, $F_n^{d+1}$ and $F_n^{d+2}$, respectively, to ensure that the 3 adjacent levels of features from the backbone network have the same spatial resolution and the same number of feature channels (128 channels in the present invention); these 3 levels of features are then concatenated and passed through a convolution layer with a 1×1 kernel, stride 1 and parameters $\theta_{n,d}^{3}$ to obtain the 128-channel d-th level single-modality feature $R_n^{d}$. The adjacent-depth fusion module can be expressed as follows:
$$R_n^{d}=\varphi\Big(C\Big(\mathrm{Cat}\big(C(F_n^{d};\theta_{n,d}^{1},2),\,C(F_n^{d+1};\theta_{n,d}^{2},1),\,D(F_n^{d+2};\gamma_{n,d},1/2)\big);\theta_{n,d}^{3},1\Big)\Big)$$
where Cat(·) denotes the cross-channel concatenation operation, C(*;θ,s) and D(*;γ,s) denote convolution and deconvolution with parameters θ or γ and stride s, and φ(·) is a ReLU activation function;
As shown above, the d-th level RGB or thermal infrared single-modality feature $R_n^{d}$ simultaneously contains feature information from 3 levels of the backbone network, namely $F_n^{d}$ together with its adjacent deep features $F_n^{d+1}$ and $F_n^{d+2}$. This indicates that $R_n^{d}$ contains richer detail and semantic information, which helps to identify the object accurately. In addition, compared with simply merging $F_n^{d}$, $F_n^{d+1}$ and $F_n^{d+2}$, the feature $R_n^{d}$ carries more compact data: through adjacent-depth feature fusion, the redundant information in the coarsely extracted features is compressed in the improved features;
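A minimal PyTorch-style sketch of one adjacent-depth fusion module follows; the per-branch width of 128 channels before concatenation, the padding choices, and the module and parameter names are assumptions made for illustration, consistent with the kernel sizes, strides and 128-channel output stated above:

```python
import torch
import torch.nn as nn

class AdjacentDepthFusion(nn.Module):
    """Fuses the adjacent features F^d, F^(d+1), F^(d+2) into the improved 128-channel feature R^d."""
    def __init__(self, c_d, c_d1, c_d2, out_channels=128):
        super().__init__()
        self.down = nn.Conv2d(c_d, out_channels, kernel_size=3, stride=2, padding=1)    # 3x3, stride 2
        self.keep = nn.Conv2d(c_d1, out_channels, kernel_size=1, stride=1)              # 1x1, stride 1
        self.up = nn.ConvTranspose2d(c_d2, out_channels, kernel_size=2, stride=2)       # 2x2, stride 1/2
        self.fuse = nn.Conv2d(3 * out_channels, out_channels, kernel_size=1, stride=1)  # 1x1 after Cat
        self.relu = nn.ReLU(inplace=True)

    def forward(self, f_d, f_d1, f_d2):
        aligned = torch.cat([self.down(f_d), self.keep(f_d1), self.up(f_d2)], dim=1)
        return self.relu(self.fuse(aligned))  # R^d: 128 channels at the resolution of F^(d+1)

# d = 1: fuse Conv1-2 (64 ch), Conv2-2 (128 ch) and Conv3-3 (256 ch) into a 128 x 128 x 128 feature.
adf1 = AdjacentDepthFusion(64, 128, 256)
r1 = adf1(torch.randn(1, 64, 256, 256), torch.randn(1, 128, 128, 128), torch.randn(1, 256, 64, 64))
```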
Step 3) Build multi-branch group fusion modules to fuse multi-modality features:
The multi-branch group fusion module fuses the different single-modality features at the same feature level and contains two fusion branches, wherein:
the first fusion branch (also called the multi-group fusion branch) has M groups (M=8 in this embodiment); it mainly amplifies the role of each channel and reduces the number of network parameters;
the second fusion branch (also called the single-group fusion branch) has only one group; its main role is to fully capture the cross-channel correlations among all the input features of the different modalities. The two branches output features with the same number of channels (64 channels in this embodiment); therefore, the number of output feature channels of the multi-branch group fusion module is twice that of each fusion branch, and at the same time equals the number of channels of the RGB or thermal infrared image features input to the module (128 channels in this embodiment);
The multi-group fusion branch is built following the basic "split-transform-merge" idea. In the multi-group fusion branch, the input single-modality features $R_1^{d}$ and $R_2^{d}$ are each split along the channel dimension into M groups with the same number of channels (128/M), giving the two feature sets $\{R_1^{d,m}\}_{m=1}^{M}$ and $\{R_2^{d,m}\}_{m=1}^{M}$. Next, the corresponding RGB and thermal infrared features from the m-th group of the two same-level feature sets are combined by a concatenation operation and then passed through two stacked convolutions, a 1×1 convolution with 64/M channels and a 3×3 convolution with 64/M channels, to fuse the cross-modality features within the group; the first 1×1 convolution mainly reduces the number of feature channels, the second convolution mainly fuses the features, and each convolution operation is followed by a ReLU activation function. Finally, the outputs of the M groups are concatenated to obtain the output feature $H_{1,d}$ of the multi-group fusion branch, expressed as:
$$H_{1,d}=\mathrm{Cat}\big(G(\mathrm{Cat}(R_1^{d,1},R_2^{d,1});\omega_d^{1}),\,\ldots,\,G(\mathrm{Cat}(R_1^{d,M},R_2^{d,M});\omega_d^{M})\big)$$
where $G(\cdot;\omega_d^{m})$ denotes the above stacked convolution operation with ReLU activation functions and $\omega_d^{m}$ denotes the fusion parameters of the m-th group;
The single-group fusion branch can be regarded as the special case of the multi-group fusion branch with M=1, expressed as:
$$H_{2,d}=G\big(\mathrm{Cat}(R_1^{d},R_2^{d});\omega_d^{0}\big)$$
where $H_{2,d}$ is the d-th level fused feature output of the single-group fusion branch; $G(\cdot;\omega_d^{0})$ contains two stacked convolutions, a 1×1 convolution with 64 channels and a 3×3 convolution with 64 channels, which together fully capture the correlation information among all the input multi-modality features, each convolution operation being followed by a ReLU activation function; and $\omega_d^{0}$ denotes the fusion parameters of the single-group fusion branch;
Finally, after the multi-group fusion branch and the single-group fusion branch, the d-th level multi-branch group fusion feature $H_d$ is obtained by simply concatenating $H_{1,d}$ and $H_{2,d}$, expressed as:
$$H_d=\mathrm{Cat}(H_{1,d},H_{2,d})$$
As stated above, the multi-branch group fusion module captures, through the single-group fusion structure, the cross-channel correlations among all the features of the different modalities from the RGB image and the thermal infrared image, and at the same time extracts more salient features from the multi-group fusion branch. Therefore, through multiple multi-branch group fusion modules, multi-level fusion features based on multiple modalities are extracted; compared with commonly used fusion methods, they capture the cross-modality information of RGB and thermal infrared images more effectively and help to detect more complete and consistent objects. Owing to the idea of group convolution, the multi-branch group fusion module also requires fewer training parameters than the common fusion scheme of direct concatenation followed by a series of convolution and activation layers;
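A minimal PyTorch-style sketch of the multi-branch group fusion module follows; padding choices and the class and parameter names are illustrative assumptions, while the channel numbers and group count mirror those stated above (128-channel inputs per modality, two 64-channel branches, M=8):

```python
import torch
import torch.nn as nn

class MultiBranchGroupFusion(nn.Module):
    """Fuses same-level RGB and thermal features R_1^d, R_2^d (128 ch each) into H_d (128 ch)."""
    def __init__(self, in_channels=128, branch_channels=64, groups=8):
        super().__init__()
        self.groups = groups
        g_in = 2 * in_channels // groups   # concatenated RGB + thermal channels per group
        g_out = branch_channels // groups  # 64/M output channels per group
        # Multi-group branch: one small 1x1 -> 3x3 stack per group.
        self.group_fuse = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(g_in, g_out, kernel_size=1), nn.ReLU(inplace=True),
                nn.Conv2d(g_out, g_out, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            )
            for _ in range(groups)
        ])
        # Single-group branch: one 1x1 -> 3x3 stack over all channels.
        self.single_fuse = nn.Sequential(
            nn.Conv2d(2 * in_channels, branch_channels, kernel_size=1), nn.ReLU(inplace=True),
            nn.Conv2d(branch_channels, branch_channels, kernel_size=3, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, r1, r2):
        r1_groups = torch.chunk(r1, self.groups, dim=1)
        r2_groups = torch.chunk(r2, self.groups, dim=1)
        h1 = torch.cat(
            [fuse(torch.cat([a, b], dim=1)) for fuse, a, b in zip(self.group_fuse, r1_groups, r2_groups)],
            dim=1,
        )                                                   # H_{1,d}: 64 channels
        h2 = self.single_fuse(torch.cat([r1, r2], dim=1))   # H_{2,d}: 64 channels
        return torch.cat([h1, h2], dim=1)                   # H_d: 128 channels

mbgf = MultiBranchGroupFusion()
h_d = mbgf(torch.randn(1, 128, 128, 128), torch.randn(1, 128, 128, 128))
```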
Step 4) Obtain the fused output feature map:
The features of different levels are fused in reverse, level by level, to obtain multiple side-output feature maps $\{P_d\,|\,d=1,2,3\}$, expressed as:
$$T_3=D(H_3;\gamma_3,(1/2)^3),\qquad T_d=C\big(\mathrm{Cat}\big(D(H_d;\gamma_d,(1/2)^d),\,T_{d+1}\big);\theta_d^{1},1\big)\ (d=2,1),\qquad P_d=C\big(T_d;\theta_d^{2},1\big)$$
where $D(*;\gamma_d,(1/2)^d)$ is a deconvolution layer with a $2^d\times 2^d$ kernel, stride $(1/2)^d$ and parameters $\gamma_d$, which gives the fused features the same spatial resolution, and $C(*;\theta_d^{1},1)$ and $C(*;\theta_d^{2},1)$ are two convolution layers with 1×1 kernels, stride 1 and parameters $\theta_d^{1}$ and $\theta_d^{2}$, used respectively to fuse the features of different levels and to produce the side-output feature map of each level. After the level-by-level information transfer, we obtain 3 side-output feature maps $\{P_d\,|\,d=1,2,3\}$ whose size equals that of the input single-modality images;
The multi-level features are then merged by a concatenation operation and fused by a convolution operation $C(*;\theta_0,1)$ with a 1×1 kernel, stride 1 and parameters $\theta_0$ to generate the feature map $P_0$, expressed as:
$$P_0=C\big(\mathrm{Cat}(P_1,P_2,P_3);\theta_0,1\big)$$
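The sketch below shows one plausible PyTorch-style reading of this top-down fusion, under the assumption that each fused feature $H_d$ is first brought back to the input resolution by $D(*;\gamma_d,(1/2)^d)$ and that the side outputs are single-channel maps; the exact wiring and all names are assumptions, not the disclosed implementation:

```python
import torch
import torch.nn as nn

class TopDownFusion(nn.Module):
    """Upsamples H_1..H_3 to the input resolution, fuses them top-down, and emits P_0..P_3."""
    def __init__(self, channels=128):
        super().__init__()
        # D(*; gamma_d, (1/2)^d): a 2^d x 2^d deconvolution upsampling H_d back to the input size.
        self.up = nn.ModuleList(
            [nn.ConvTranspose2d(channels, channels, kernel_size=2 ** d, stride=2 ** d) for d in (1, 2, 3)]
        )
        self.fuse = nn.ModuleList([nn.Conv2d(2 * channels, channels, kernel_size=1) for _ in (1, 2)])
        self.side = nn.ModuleList([nn.Conv2d(channels, 1, kernel_size=1) for _ in (1, 2, 3)])
        self.final = nn.Conv2d(3, 1, kernel_size=1)  # C(*; theta_0, 1) applied to Cat(P_1, P_2, P_3)

    def forward(self, h1, h2, h3):
        t3 = self.up[2](h3)
        t2 = self.fuse[1](torch.cat([self.up[1](h2), t3], dim=1))
        t1 = self.fuse[0](torch.cat([self.up[0](h1), t2], dim=1))
        p1, p2, p3 = self.side[0](t1), self.side[1](t2), self.side[2](t3)
        p0 = self.final(torch.cat([p1, p2, p3], dim=1))
        return p0, p1, p2, p3

decoder = TopDownFusion()
outputs = decoder(torch.randn(1, 128, 128, 128), torch.randn(1, 128, 64, 64), torch.randn(1, 128, 32, 32))
```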
Step 5) Train the algorithm network:
On the training dataset, a deep supervision learning mechanism is adopted: the side-output feature maps and the fused output feature map $\{P_t\,|\,t=0,1,2,3\}$ are compared with the ground-truth map G, and the cross-entropy loss function L of the network model is computed:
$$L=-\sum_{t=0}^{3}\sum_{(i,j)}\Big[\beta\,G(i,j)\log\big(\sigma(P_t(i,j))\big)+(1-\beta)\big(1-G(i,j)\big)\log\big(1-\sigma(P_t(i,j))\big)\Big]$$
where $G(i,j)\in\{0,1\}$ is the value at position (i,j) of the ground-truth map G, $\sigma(P_t(i,j))$ is the value at position (i,j) of the probability map obtained by applying the operation $\sigma(P_t)$ to the feature map $P_t$, and σ(·) is a sigmoid activation function. In different images, the size of the region occupied by the salient object relative to the background region differs; to balance the foreground and background losses and improve the detection accuracy for salient objects of different sizes, a class-balance parameter β is used. β is the ratio of the number of background pixels in the ground-truth map to the total number of pixels in the ground-truth map, and can be expressed as:
$$\beta=\frac{N_b}{N_b+N_f}$$
where $N_b$ denotes the number of background pixels and $N_f$ denotes the number of foreground pixels;
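A compact sketch of this class-balanced cross-entropy, written in the same PyTorch style as the earlier snippets, is shown below; computing β over the whole batch and the small epsilon added for numerical stability are assumptions of the sketch:

```python
import torch

def balanced_cross_entropy(outputs, gt, eps=1e-8):
    """Class-balanced cross-entropy over the fused and side outputs {P_t | t = 0..3}.

    outputs: iterable of logit maps P_t of shape (N, 1, H, W); gt: binary float ground-truth map, same shape.
    beta = N_b / (N_b + N_f) weights the foreground term, (1 - beta) the background term.
    """
    n_total = gt.numel()
    n_fg = gt.sum()
    beta = (n_total - n_fg) / n_total
    loss = 0.0
    for p in outputs:
        prob = torch.sigmoid(p)
        loss = loss - (beta * gt * torch.log(prob + eps)
                       + (1.0 - beta) * (1.0 - gt) * torch.log(1.0 - prob + eps)).sum()
    return loss
```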
The present invention trains the network with a "3-step training method". In the first step, the branch network for RGB images is trained by minimizing the cross-entropy loss function; in this branch network the multi-branch group fusion modules are removed, and the multi-level visible-light image features output by the multiple adjacent-depth feature fusion modules are fed directly into the backward (top-down) pass to predict saliency. In the second step, the thermal infrared branch is built and trained in the same way as the RGB branch network in the first step. In the third step, based on the VGG16 backbone parameters and the adjacent-depth feature fusion module parameters obtained by the RGB and thermal infrared single-branch networks in the first two steps, the overall network for RGB-T image detection is trained to obtain the network model parameters;
When training the parameters of the thermal infrared single-modality branch network, no dataset for thermal infrared single-modality salient object detection is available. To make training possible, the present invention uses the R channel of the RGB image in place of thermal infrared single-modality data, because among the three channels of an RGB image the R-channel image is the closest to a thermal infrared image. The training datasets are constructed as follows:
The RGB images of the RGB-thermal dataset (one of every two) and the MSRA-B training dataset (one of every three) are used, forming a 1:2 data ratio, to train the RGB branch network model; correspondingly, the thermal infrared images of the RGB-thermal dataset (one of every two) and the R channel of the images in the MSRA-B training dataset (one of every three) are used, forming a 1:2 data ratio, to train the thermal infrared branch network model; for the RGB-T image multi-modality network model, the paired images of the RGB-thermal dataset (one pair of every two) are used for training;
During training, to avoid overfitting caused by too little training data, each image is rotated by 90°, 180° and 270° and flipped horizontally and vertically, expanding the original dataset to 8 times its size;
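A small sketch of an augmentation routine consistent with this description is given below; the specific combination of rotations and a horizontal mirror that yields exactly 8 variants is an assumption made for illustration:

```python
from PIL import Image

def eightfold_augment(img: Image.Image):
    """Returns 8 variants: the image and its horizontal mirror, each rotated by 0, 90, 180 and 270 degrees."""
    variants = []
    for base in (img, img.transpose(Image.FLIP_LEFT_RIGHT)):
        for angle in (0, 90, 180, 270):
            variants.append(base.rotate(angle, expand=True))
    return variants
```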
Step 6) Predict the pixel-level saliency map of the RGB-T image:
The half of the RGB-thermal dataset not used for training is used as test data. With the network model parameters obtained in step (5), the side-output feature maps and the fused output feature map obtained in step (4) are further classified; denoting all the output saliency maps of the network by $\{S_t\,|\,t=0,1,2,3\}$, $S_t$ can be expressed as follows:
$$S_t=\sigma(P_t)$$
where σ(·) is a sigmoid activation function;
Finally, $S_0$ is taken as the final predicted RGB-T saliency map.
The technical effects of the present invention are further described below in conjunction with simulation experiments:
1. Simulation conditions: all simulation experiments were carried out in the Ubuntu 16.04.5 environment using the Caffe deep learning framework, with Matlab R2014b software as the interface;
2. Simulation contents and result analysis:
Simulation 1
Salient object detection experiments were conducted on the public image database RGB-thermal with the present invention, existing RGB-image-based salient object detection methods, and existing RGB-T-image-based salient object detection algorithms, and some of the experimental results were compared visually, as shown in Fig. 2, where "RGB image" denotes the RGB image in the database used as experimental input, "T image" denotes the thermal infrared image paired with the RGB image used as experimental input, and GT denotes the manually annotated ground-truth map;
As can be seen from Fig. 2, compared with the prior art, the present invention suppresses the background better, achieves a more complete and consistent effect in salient object detection in complex scenes, and is closer to the manually annotated ground-truth map.
Simulation 2
The results obtained by conducting salient object detection experiments on the public image database RGB-thermal with the present invention, existing single-modality-image-based salient object detection methods, and existing RGB-T-image-based salient object detection algorithms were evaluated objectively with widely used evaluation metrics; the evaluation simulation results are shown in Fig. 3a and Fig. 3b, where:
Fig. 3a shows the results of evaluating the present invention and the prior art with the precision-recall (P-R) curve;
Fig. 3b shows the results of evaluating the present invention and the prior art with the F-measure curve;
As can be seen from Fig. 3a and Fig. 3b, compared with the prior art, the present invention achieves higher P-R and F-measure curves, which shows that the present invention detects salient objects with better consistency and completeness and fully demonstrates the effectiveness and superiority of the method of the present invention.
The embodiments of the present invention have been described in detail above. However, the present invention is not limited to the above embodiments, and various changes can be made within the scope of knowledge possessed by a person of ordinary skill in the art without departing from the gist of the present invention.