CN111860398A

Movatterモバイル変換

Info

Publication number: CN111860398A
Application number: CN202010737230.1A
Authority: CN
Inventors: 刘京; 田亮; 郭蔚; 杨烁今; 陈栋; 周丙寅
Original assignee: Hebei Normal University
Current assignee: Hebei Normal University
Priority date: 2020-07-28
Filing date: 2020-07-28
Publication date: 2020-10-30
Anticipated expiration: 2040-07-28
Also published as: CN111860398B

Abstract

The invention is suitable for the technical field of image processing, and discloses a method, a system and a terminal device for detecting a remote sensing image target, wherein the method comprises the following steps: acquiring a remote sensing image to be detected; inputting the remote sensing image to be detected into the trained parallel perception attention network model to obtain a plurality of output characteristic graphs with different scales; and carrying out target detection according to the output characteristic graphs with different scales to obtain a detection result. The invention carries out feature extraction through the parallel perception attention network model, not only can extract multi-scale, context and global features of the target, but also can extract correlation features among non-local targets and can extract direction sensitive target features.

Description

Translated fromChinese

遥感图像目标检测方法、系统及终端设备Remote sensing image target detection method, system and terminal device

技术领域technical field

本发明属于图像处理技术领域，尤其涉及一种遥感图像目标检测方法、系统及终端设备。The invention belongs to the technical field of image processing, and in particular relates to a remote sensing image target detection method, system and terminal device.

背景技术Background technique

目标检测是图像处理领域的重要研究内容，有着很高的实际应用价值，是国内外专家学者广泛关注的研究课题。并且，随着深度学习的发展，将深度学习应用于遥感图像的目标检测是大势所趋。Object detection is an important research content in the field of image processing. It has high practical application value and is a research topic that is widely concerned by experts and scholars at home and abroad. And, with the development of deep learning, it is an irresistible trend to apply deep learning to target detection in remote sensing images.

目前，基于深度学习的目标检测模型主要分为两大类。一类是以RCNN，Fast-RCNN，Faster-RCNN为代表的基于区域推荐的目标检测模型，该类目标检测模型经过由粗到细两步预测出被检测目标的边界框和类别，有较高的精确度但是检测速度慢；另一类是以YOLO，SSD为代表的基于回归的目标检测模型，该类模型直接预测出被检测目标的边界围框和类别而无需经过“先粗后细”的过程，有较快的检测速度但是检测精度一般。因此，现有技术无法兼顾目标检测速度和目标检测精度。At present, the target detection models based on deep learning are mainly divided into two categories. One is the target detection model based on region recommendation represented by RCNN, Fast-RCNN, Faster-RCNN. This type of target detection model predicts the bounding box and category of the detected target through two steps from coarse to fine. The accuracy is high but the detection speed is slow; the other is a regression-based target detection model represented by YOLO and SSD. This type of model directly predicts the bounding box and category of the detected target without going through "first thick and then thin". The process has a faster detection speed but the detection accuracy is average. Therefore, the prior art cannot take into account both the target detection speed and the target detection accuracy.

发明内容SUMMARY OF THE INVENTION

有鉴于此，本发明实施例提供了一种遥感图像目标检测方法、系统及终端设备，以解决现有技术无法兼顾目标检测速度和目标检测精度的问题。In view of this, the embodiments of the present invention provide a remote sensing image target detection method, system, and terminal device, so as to solve the problem that the target detection speed and target detection accuracy cannot be considered in the prior art.

本发明实施例的第一方面提供了一种遥感图像目标检测方法，包括：A first aspect of the embodiments of the present invention provides a remote sensing image target detection method, including:

获取待检测遥感图像；Obtain remote sensing images to be detected;

将待检测遥感图像输入训练后的并行感知注意力网络模型中，得到多个不同尺度的输出特征图；Input the remote sensing images to be detected into the trained parallel perceptual attention network model, and obtain multiple output feature maps of different scales;

根据多个不同尺度的输出特征图进行目标检测得到检测结果。The detection results are obtained by performing target detection according to multiple output feature maps of different scales.

本发明实施例的第二方面提供了一种遥感图像目标检测系统，包括：A second aspect of the embodiments of the present invention provides a remote sensing image target detection system, including:

获取模块，用于获取待检测遥感图像；an acquisition module for acquiring remote sensing images to be detected;

特征提取模块，用于将待检测遥感图像输入训练后的并行感知注意力网络模型中，得到多个不同尺度的输出特征图；The feature extraction module is used to input the remote sensing images to be detected into the trained parallel perception attention network model, and obtain multiple output feature maps of different scales;

目标检测模块，用于根据多个不同尺度的输出特征图进行目标检测得到检测结果。The target detection module is used to perform target detection according to multiple output feature maps of different scales to obtain detection results.

本发明实施例的第三方面提供了一种终端设备，包括存储器、处理器以及存储在存储器中并可在处理器上运行的计算机程序，处理器执行计算机程序时实现如第一方面所述遥感图像目标检测方法的步骤。A third aspect of the embodiments of the present invention provides a terminal device, including a memory, a processor, and a computer program stored in the memory and running on the processor. When the processor executes the computer program, the remote sensing as described in the first aspect is implemented. The steps of an image object detection method.

本发明实施例的第四方面提供了一种计算机可读存储介质，计算机可读存储介质存储有计算机程序，计算机程序被一个或多个处理器执行时实现如第一方面所述遥感图像目标检测方法的步骤。A fourth aspect of the embodiments of the present invention provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program, and when the computer program is executed by one or more processors, the remote sensing image target detection according to the first aspect is implemented steps of the method.

本发明实施例与现有技术相比存在的有益效果是：本发明实施例首先获取待检测遥感图像，然后将待检测遥感图像输入训练后的并行感知注意力网络模型中，得到多个不同尺度的输出特征图，最后根据多个不同尺度的输出特征图进行目标检测得到检测结果，本发明实施例通过并行感知注意力网络模型进行特征提取，不但可以提取目标的多尺度、上下文以及全局特征，而且可以提取非局部的目标间关联特征，还可以提取方向敏感的目标特征，使用通过并行感知注意力网络模型提取的多个尺度的输出特征图进行目标检测可以在保证较高检测准确率的同时提高检测速度，能够同时兼顾目标检测速度和目标检测精度。Compared with the prior art, the embodiment of the present invention has the following beneficial effects: the embodiment of the present invention first obtains the remote sensing image to be detected, and then inputs the remote sensing image to be detected into the trained parallel perceptual attention network model to obtain a plurality of different scales. Finally, according to the output feature maps of multiple different scales, the target detection is performed to obtain the detection results. In the embodiment of the present invention, the feature extraction is performed through the parallel perceptual attention network model, which can not only extract the multi-scale, context and global features of the target, In addition, non-local inter-target correlation features can be extracted, and direction-sensitive target features can also be extracted. Using the output feature maps of multiple scales extracted by the parallel perceptual attention network model for target detection can ensure high detection accuracy at the same time. The detection speed is improved, and the target detection speed and target detection accuracy can be taken into account at the same time.

附图说明Description of drawings

为了更清楚地说明本发明实施例中的技术方案，下面将对实施例或现有技术描述中所需要使用的附图作简单地介绍，显而易见地，下面描述中的附图仅仅是本发明的一些实施例，对于本领域普通技术人员来讲，在不付出创造性劳动性的前提下，还可以根据这些附图获得其他的附图。In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the following briefly introduces the accompanying drawings that need to be used in the description of the embodiments or the prior art. Obviously, the drawings in the following description are only for the present invention. In some embodiments, for those of ordinary skill in the art, other drawings can also be obtained according to these drawings without any creative effort.

图1是本发明一实施例提供的遥感图像目标检测方法的实现流程示意图；Fig. 1 is the realization flow schematic diagram of the remote sensing image target detection method provided by one embodiment of the present invention;

图2是本发明一实施例提供的并行感知注意力网络模型的示意图；2 is a schematic diagram of a parallel perceptual attention network model provided by an embodiment of the present invention;

图3是本发明一实施例提供的第一多尺度注意力子模块的示意图；3 is a schematic diagram of a first multi-scale attention sub-module provided by an embodiment of the present invention;

图4是本发明一实施例提供的第一上下文注意力子模块的示意图；4 is a schematic diagram of a first contextual attention sub-module provided by an embodiment of the present invention;

图5是本发明一实施例提供的第一通道注意力子模块的示意图；5 is a schematic diagram of a first channel attention sub-module provided by an embodiment of the present invention;

图6是本发明一实施例提供的第一尺度特征图的热力图像的示意图；6 is a schematic diagram of a thermal image of a first-scale feature map provided by an embodiment of the present invention;

图7是本发明一实施例提供的第一上下文特征图的热力图像的示意图；7 is a schematic diagram of a thermal image of a first context feature map provided by an embodiment of the present invention;

图8是本发明一实施例提供的第一通道特征图的热力图像的示意图；8 is a schematic diagram of a thermal image of a first channel feature map provided by an embodiment of the present invention;

图9是本发明又一实施例提供的遥感图像目标检测方法的实现流程示意图；FIG. 9 is a schematic flowchart of an implementation of a remote sensing image target detection method provided by another embodiment of the present invention;

图10是本发明一实施例提供的实验检测结果的示意图；10 is a schematic diagram of an experimental detection result provided by an embodiment of the present invention;

图11是本发明一实施例提供的遥感图像目标检测系统的示意框图；11 is a schematic block diagram of a remote sensing image target detection system provided by an embodiment of the present invention;

图12是本发明一实施例提供的终端设备的示意框图。FIG. 12 is a schematic block diagram of a terminal device according to an embodiment of the present invention.

具体实施方式Detailed ways

以下描述中，为了说明而不是为了限定，提出了诸如特定系统结构、技术之类的具体细节，以便透彻理解本申请实施例。然而，本领域的技术人员应当清楚，在没有这些具体细节的其它实施例中也可以实现本申请。在其它情况中，省略对众所周知的系统、装置、电路以及方法的详细说明，以免不必要的细节妨碍本申请的描述。In the following description, for the purpose of illustration rather than limitation, specific details such as a specific system structure and technology are set forth in order to provide a thorough understanding of the embodiments of the present application. However, it will be apparent to those skilled in the art that the present application may be practiced in other embodiments without these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.

为了说明本发明所述的技术方案，下面通过具体实施例来进行说明。In order to illustrate the technical solutions of the present invention, the following specific embodiments are used for description.

图1是本发明一实施例提供的遥感图像目标检测方法的实现流程示意图，为了便于说明，仅示出了与本发明实施例相关的部分。本发明实施例的执行主体可以是终端设备。如图1所示，该方法可以包括以下步骤：FIG. 1 is a schematic diagram of an implementation flowchart of a remote sensing image target detection method provided by an embodiment of the present invention. For convenience of description, only parts related to the embodiment of the present invention are shown. The execution subject of the embodiment of the present invention may be a terminal device. As shown in Figure 1, the method may include the following steps:

S101：获取待检测遥感图像。S101: Obtain a remote sensing image to be detected.

在本发明实施例中，可以通过现有方法获取待检测遥感图像。In the embodiment of the present invention, the remote sensing image to be detected may be obtained by using an existing method.

S102：将待检测遥感图像输入训练后的并行感知注意力网络模型中，得到多个不同尺度的输出特征图。S102: Input the remote sensing image to be detected into the trained parallel perceptual attention network model to obtain a plurality of output feature maps of different scales.

在本发明实施例中，首先构建并行感知注意力网络模型，然后通过训练集对构建的并行感知注意力网络模型进行训练，得到训练后的并行感知注意力网络模型。In the embodiment of the present invention, a parallel perceptual attention network model is first constructed, and then the constructed parallel perceptual attention network model is trained through a training set to obtain a trained parallel perceptual attention network model.

在本发明的一个实施例中，在对并行感知注意力网络模型进行训练的过程中，使用类别损失函数和回归损失函数，其中回归损失函数为距离交并比损失函数。In an embodiment of the present invention, in the process of training the parallel perceptual attention network model, a class loss function and a regression loss function are used, wherein the regression loss function is a distance intersection ratio loss function.

具体地，类别损失函数为：Specifically, the class loss function is:

距离交并比损失函数(Distance Intersection over Union，DIoU)为：The loss function of distance intersection over union (DIoU) is:

在距离交并比损失函数中，b，b^gt分别代表了锚边界框和标签边界框的中心点，p代表计算两个中心点的欧式距离，c代表可以同时覆盖锚边界框和标签边界框的最小矩形的对角线距离。因此DIoU中对锚边界框和标签边界框之间的归一化距离进行了建模。该损失函数在加快收敛的同时有助于提升小目标的检测准确率。In the distance intersection ratio loss function, b and b^gt represent the center points of the anchor bounding box and the label bounding box, respectively, p represents the Euclidean distance between the two center points, and c represents that the anchor bounding box and the label bounding box can be covered at the same time. The diagonal distance of the smallest rectangle. Therefore, the normalized distance between the anchor bounding box and the label bounding box is modeled in DIoU. The loss function helps to improve the detection accuracy of small targets while accelerating the convergence.

在本发明实施例中，采用距离交并比损失代替传统的回归损失，能够在加快训练速度的同时增强对小目标的检测准确率。In the embodiment of the present invention, the distance intersection ratio loss is used to replace the traditional regression loss, which can improve the detection accuracy of small targets while speeding up the training speed.

在本发明的一个实施例中，参见图2，并行感知注意力网络模型以残差网络为主干；In an embodiment of the present invention, referring to FIG. 2 , the parallel perceptual attention network model uses a residual network as the backbone;

并行感知注意力网络模型包括第一残差块B₁、第二残差块B₂、第三残差块B₃、第四残差块B₄、第一并行感知注意力模块、第二并行感知注意力模块、第三并行感知注意力模块和第四并行感知注意力模块；第一残差块B₁、第二残差块B₂、第三残差块B₃和第四残差块B₄的尺寸均不同；The parallel perceptual attention network model includes a first residual block B₁ , a second residual block B₂ , a third residual block B₃ , a fourth residual block B₄ , a first parallel perceptual attention module, a second parallel Perceptual attention module, third parallel perceptual attention module and fourth parallel perceptual attention module; first residual block B₁ , second residual block B₂ , third residual block B₃ and fourth residual block B₄ are all different sizes;

第一并行感知注意力模块以第一残差块B₁和第二残差块B₂为输入，输出第一融合特征图IB₁；第二并行感知注意力模块以第二残差块B₂和第三残差块B₃为输入，输出第二融合特征图IB₂；第三并行感知注意力模块以第三残差块B₃和第四残差块B₄为输入，输出第三融合特征图IB₃；第四并行感知注意力模块以第四残差块B₄为输入，输出第四融合特征图IB₄；The first parallel perceptual attention module takes the first residual block B₁ and the second residual block B₂ as input, and outputs the first fusion feature map IB₁ ; the second parallel perceptual attention module uses the second residual block B₂ and the third residual block B₃ as input, and output the second fusion feature map IB₂ ; the third parallel perceptual attention module takes the third residual block B₃ and the fourth residual block B₄ as input, and outputs the third fusion feature map IB₃ ; the fourth parallel perceptual attention module takes the fourth residual block B₄ as input, and outputs the fourth fusion feature map IB₄ ;

第四融合特征图IB₄经过可变形卷积得到第四尺度的输出特征图O₄；第三融合特征图IB₃经过可变形卷积之后与经过2倍上采样后的第四尺度的输出特征图O₄相加得到第三尺度的输出特征图O₃；第二融合特征图IB₂经过可变形卷积之后与经过2倍上采样后的第三尺度的输出特征图O₃相加得到第二尺度的输出特征图O₂；第一融合特征图IB₁经过可变形卷积之后与经过2倍上采样后的第二尺度的输出特征图O₂相加得到第一尺度的输出特征图O₁。The fourth fused feature map IB₄ obtains an output feature map O₄ of the fourth scale after deformable convolution; the third fused feature map IB₃ is subjected to deformable convolution and the output feature of the fourth scale after upsampling by 2 times. Figure O₄ is added to obtain the output feature map O₃ of the third scale; the second fusion feature map IB₂ after deformable convolution is added with the output feature map O₃ of the third scale after double upsampling to obtain the first The output feature map O₂ of the second scale; the first fusion feature map IB₁ is added to the output feature map O₂ of the second scale after twice upsampling after deformable convolution to obtain the output feature map O of the first scale₁ .

其中，并行感知注意力网络模型的主干采用残差网络ResNet-101。Among them, the backbone of the parallel perception attention network model adopts the residual network ResNet-101.

在各个残差块后面引入并行感知注意力模块，并经过融合操作得到各个融合特征图，然后使用可变形卷积精确提取位置敏感特征并采用多尺度融合策略得到四个不同尺度的输出特征图。A parallel perceptual attention module is introduced after each residual block, and each fusion feature map is obtained through fusion operation, and then the position-sensitive features are accurately extracted by deformable convolution and the multi-scale fusion strategy is used to obtain output feature maps of four different scales.

由于在遥感图像中目标角度多变，常规的卷积操作很容易提取到无关信息，为了减少无关特征对方向敏感目标的影响，本发明实施例在得到的各个尺度下融合后的特征图后使用可变形卷积操作，该操作通过为每一个采样位置预测一对x方向和y方向的偏移量来达到修正采样位置的目的，从而改变了传统的规则的采样结构，可以对任意形状的物体进行采样，能够加强对方向敏感目标的特征提取能力。Since the target angle in the remote sensing image is variable, the conventional convolution operation can easily extract irrelevant information. In order to reduce the influence of the irrelevant features on the direction-sensitive target, the embodiment of the present invention uses the obtained fused feature maps at various scales. Deformable convolution operation, this operation achieves the purpose of correcting the sampling position by predicting a pair of offsets in the x-direction and y-direction for each sampling position, thus changing the traditional regular sampling structure, which can be used for objects of any shape. Sampling can enhance the feature extraction ability of direction-sensitive targets.

具体地，在可变形卷积中，输入为H×W×C的特征图，H为特征图的高度，W为特征图的宽度，C为特征图的通道数，经过卷积操作后得到大小为H×W×2C的特征图，此时的通道数为原来的两倍，分别代表每个像素点在X方向和Y方向的偏移量，最后将输入图像中像素的索引与经过卷积得到的偏移值相加得到最终的特征图，在进行像素偏移时需要将偏移量设定为图片以内。由于在实际运算中偏移量通常为小数，不能直接作为偏移坐标，如果强行取整会引入很大的误差，为了避免这一误差，通常在实际操作中采用双线性插值法得到最终特征图。Specifically, in the deformable convolution, the input is the feature map of H×W×C, H is the height of the feature map, W is the width of the feature map, C is the number of channels of the feature map, and the size is obtained after the convolution operation. It is a feature map of H×W×2C, the number of channels at this time is twice the original, representing the offset of each pixel in the X direction and the Y direction, and finally the index of the pixel in the input image is convolved with The obtained offset values are added to obtain the final feature map. When performing pixel offset, the offset needs to be set within the picture. Since the offset is usually a decimal in the actual operation, it cannot be directly used as the offset coordinate. If the rounding is forced, a large error will be introduced. In order to avoid this error, the bilinear interpolation method is usually used in the actual operation to obtain the final feature. picture.

在本发明的一个实施例中，第一并行感知注意力模块、第二并行感知注意力模块和第三并行感知注意力模块的结构相同；In an embodiment of the present invention, the structures of the first parallel perceptual attention module, the second parallel perceptual attention module and the third parallel perceptual attention module are the same;

参见图2，第一并行感知注意力模块包括第一多尺度注意力子模块、第一上下文注意力子模块和第一通道注意力子模块；Referring to FIG. 2, the first parallel perceptual attention module includes a first multi-scale attention sub-module, a first contextual attention sub-module and a first channel attention sub-module;

第一多尺度注意力子模块以第一残差块B₁和第二残差块B₂为输入，输出第一尺度特征图E；The first multi-scale attention sub-module takes the first residual block B₁ and the second residual block B₂ as input, and outputs the first scale feature map E;

第一上下文注意力子模块以第一残差块B₁为输入，输出第一上下文特征图F；The first context attention sub-module takes the first residual block B₁ as input, and outputs the first context feature map F;

第一通道注意力子模块以第一残差块B₁为输入，输出第一通道特征图G；The first channel attention sub-module takes the first residual block B₁ as input, and outputs the first channel feature map G;

将第一尺度特征图E、第一上下文特征图F和第一通道特征图G进行融合，得到第一融合特征图IB₁。The first scale feature map E, the first context feature map F and the first channel feature map G are fused to obtain a first fused feature map IB₁ .

在本发明实施例中，第一并行感知注意力模块、第二并行感知注意力模块和第三并行感知注意力模块的结构相同，只是输入输出不同。第二并行感知注意力模块包括第二多尺度注意力子模块、第二上下文注意力子模块和第二通道注意力子模块；第三并行感知注意力模块包括第三多尺度注意力子模块、第三上下文注意力子模块和第三通道注意力子模块。In the embodiment of the present invention, the structures of the first parallel perceptual attention module, the second parallel perceptual attention module and the third parallel perceptual attention module are the same, but the input and output are different. The second parallel perceptual attention module includes a second multi-scale attention sub-module, a second contextual attention sub-module and a second channel attention sub-module; the third parallel perceptual attention module includes a third multi-scale attention sub-module, The third contextual attention submodule and the third channel attention submodule.

具体地，在第二并行感知注意力模块中，第二残差块B₂代替第一并行感知注意力模块中的第一残差块B₁的位置，第三残差块B₃代替第一并行感知注意力模块中的第二残差块B₂的位置。同理，在第三并行感知注意力模块中，第三残差块B₃代替第一并行感知注意力模块中的第一残差块B₁的位置，第四残差块B₄代替第一并行感知注意力模块中的第二残差块B₂的位置。Specifically, in the second parallel perceptual attention module, the second residual block B₂ replaces the position of the first residual block B₁ in the first parallel perceptual attention module, and the third residual block B₃ replaces the first residual block B 3 The position of the second residual block B₂ in the attention module is perceived in parallel. Similarly, in the third parallel perceptual attention module, the third residual block B₃ replaces the position of the first residual block B₁ in the first parallel perceptual attention module, and the fourth residual block B₄ replaces the first residual block B 4 The position of the second residual block B₂ in the attention module is perceived in parallel.

在本发明的一个实施例中，参见图3，在第一多尺度注意力子模块中，将第一残差块B₁进行卷积得到第一中间尺度特征图A，将第二残差块B2进行卷积得到第二中间尺度特征图B，将第二中间尺度特征图B进行矩阵变换后与第一中间尺度特征图A进行相乘操作得到第三中间尺度特征图，对第三中间尺度特征图进行归一化得到第一多尺度注意力权重图M，将第一多尺度注意力权重图M与第二中间尺度特征图B进行相乘操作得到第四中间尺度特征图，对第四中间尺度特征图进行上采样后与第一残差块B₁进行相加操作得到第一尺度特征图E；In an embodiment of the present invention, referring to FIG. 3 , in the first multi-scale attention sub-module, the first residual block B₁ is convolved to obtain the first intermediate-scale feature map A, and the second residual block B2 is convolved to obtain the second intermediate scale feature map B, and the second intermediate scale feature map B is subjected to matrix transformation and then multiplied with the first intermediate scale feature map A to obtain the third intermediate scale feature map. The feature map is normalized to obtain the first multi-scale attention weight map M, and the first multi-scale attention weight map M and the second intermediate-scale feature map B are multiplied to obtain the fourth intermediate-scale feature map. After upsampling, the intermediate scale feature map is added with the first residual block B₁ to obtain the first scale feature map E;

参见图4，在第一上下文注意力子模块中，将第一残差块B₁进行卷积分别得到第一中间上下文特征图K和第二中间上下文特征图D，将第二中间上下文特征图D进行矩阵变换后与第一中间上下文特征图K进行相乘操作得到第三中间上下文特征图，对第三中间上下文特征图进行归一化得到第一上下文注意力权重图P，将第一上下文注意力权重图P和第一残差块B₁进行相乘操作得到第四中间上下文特征图，将第四中间上下文特征图进行矩阵变换后与第一残差块B₁进行相加操作得到第一上下文特征图F；Referring to Fig. 4, in the first context attention sub-module, the_first residual block B1 is convolved to obtain the first intermediate context feature map K and the second intermediate context feature map D, respectively, and the second intermediate context feature map After matrix transformation, D is multiplied with the first intermediate context feature map K to obtain the third intermediate context feature map, and the third intermediate context feature map is normalized to obtain the first context attention weight map P. The attention weight map P and the first residual block B₁ are multiplied to obtain the fourth intermediate context feature map, and the fourth intermediate context feature map is subjected to matrix transformation and added to the first residual block B₁ to obtain the fourth intermediate context feature map. a context feature map F;

参见图5，在第一通道注意力子模块中，将第一残差块B₁进行矩阵变换后与第一残差块B₁进行相乘操作得到第一中间通道特征图，将第一中间通道特征图进行归一化得到第一通道注意力权重图Q，将第一通道注意力权重图Q与第一残差块B₁进行相乘得到第二中间通道特征图，将第二中间通道特征图进行矩阵变换后与第一残差块B₁进行相加操作得到第一通道特征图G。Referring to FIG. 5 , in the first channel attention sub-module, the first residual block B₁ is subjected to matrix transformation and then multiplied by the first residual block B₁ to obtain the first intermediate channel feature map, and the first intermediate channel feature map is The channel feature map is normalized to obtain the first channel attention weight map Q, and the first channel attention weight map Q is multiplied by the first residual block B₁ to obtain the second intermediate channel feature map, and the second intermediate channel After the feature map is transformed into a matrix, an addition operation is performed with the first residual block B₁ to obtain the first channel feature map G.

在本发明实施例中，给出了第一并行感知注意力模块包括的第一多尺度注意力子模块、第一上下文注意力子模块和第一通道注意力子模块的具体工作过程，由于第一并行感知注意力模块、第二并行感知注意力模块和第三并行感知注意力模块的结构相同，只是输入输出不同，在此不再具体描述第二并行感知注意力模块和第三并行感知注意力模块的具体过程。In the embodiment of the present invention, the specific working process of the first multi-scale attention sub-module, the first contextual attention sub-module and the first channel attention sub-module included in the first parallel perceptual attention module is given. The structures of the first parallel perception attention module, the second parallel perception attention module and the third parallel perception attention module are the same, but the input and output are different. The second parallel perception attention module and the third parallel perception attention module will not be described in detail here. The specific process of the force module.

具体地，在深度卷积神经网络中，不同尺度的特征图含有不同程度的结构和语义信息，在高级特征图中语义信息较为丰富，在低级特征图中结构信息较为丰富。然而这些信息对于检测遥感图像中的目标尤其是小目标是非常重要的，为了充分利用以上信息，本发明实施例提出了多尺度注意力模块从而增强了小目标的特征表达。Specifically, in a deep convolutional neural network, feature maps of different scales contain different degrees of structural and semantic information, with richer semantic information in high-level feature maps and richer structural information in low-level feature maps. However, such information is very important for detecting targets in remote sensing images, especially small targets. In order to make full use of the above information, the embodiment of the present invention proposes a multi-scale attention module to enhance the feature expression of small targets.

本发明实施例给出了第一多尺度注意力子模块的具体工作过程。其中，第一中间尺度特征图A和第二中间尺度特征图B是由第一残差块B₁和第二残差块B₂分别进行1×1卷积得到的注意力权重图，H和W分别代表第一残差块B₁的高度和宽度，第一残差块B₁的通道数用C表示。矩阵变换可以是矩阵转置。归一化可以为Softmax归一化。由于第二中间尺度特征图B处于较深的网络层中，所以第一中间尺度特征图A含有较丰富的结构信息，第一多尺度注意力权重图M蕴含着第一中间尺度特征图A对第二中间尺度特征图B的结构信息的先验，因此，通过第一多尺度注意力权重图M得到的第一尺度特征图E含有丰富的结构信息以及较深层的语义信息，有利于检测小尺度的目标。The embodiment of the present invention provides a specific working process of the first multi-scale attention sub-module. Among them, the first mid-scale feature map A and the second mid-scale feature map B are the attention weight maps obtained by the 1×1 convolution of the first residual block B₁ and the second residual block B₂ respectively, and H and W represents the height and width of the first residual block B₁ respectively, and the number of channels of the first residual block B₁ is represented by C. Matrix transformation can be matrix transpose. The normalization can be Softmax normalization. Since the second intermediate-scale feature map B is in a deeper network layer, the first intermediate-scale feature map A contains rich structural information, and the first multi-scale attention weight map M contains the first intermediate-scale feature map A pair of The prior of the structural information of the second intermediate-scale feature map B, therefore, the first-scale feature map E obtained through the first multi-scale attention weight map M contains rich structural information and deeper semantic information, which is conducive to the detection of small scale target.

在本发明的一个实施例中，第一多尺度注意力权重图M的计算公式为：In an embodiment of the present invention, the calculation formula of the first multi-scale attention weight map M is:

其中，i表示第i行，j表示第j列，N为第一残差块B₁的高度，A为第一中间尺度特征图，B为第二中间尺度特征图；Wherein, i represents the ith row, j represents the jth column, N is the height of the first residual block B₁ , A is the first intermediate scale feature map, and B is the second intermediate scale feature map;

第一尺度特征图E的计算公式为：The calculation formula of the first scale feature map E is:

其中，B₁为第一残差块，α为可学习的第一权重系数。Among them, B₁ is the first residual block, and α is the learnable first weight coefficient.

可选地，j的取值可以为从1至第一残差块B₁的宽度之间的正整数。Optionally, the value of j may be a positive integer from 1 to the width of the first residual block B₁ .

可选地，第一残差块B₁的高度与宽度相同。Optionally, the height and width of the_first residual block B1 are the same.

M_ji为第一多尺度注意力权重图M中的归一化权重系数，其衡量在各个尺度中第i个位置对第j个位置的影响，α为可学习的第一权重系数，用来权衡修正后的特征图和初始特征图。参见图6，图6展示了部分第一尺度特征图E的热力图像，从图像中可以看出更多的小的飞机区域被激活。M_ji is the normalized weight coefficient in the first multi-scale attention weight map M, which measures the influence of the i-th position on the j-th position in each scale, and α is a learnable first weight coefficient, used for Weigh the revised feature map with the original feature map. Referring to Figure 6, Figure 6 shows a thermal image of part of the first scale feature map E, from which it can be seen that more small aircraft regions are activated.

本发明实施例还给出了第一上下文注意力子模块的具体工作过程。上下文信息可以有效地区分前景信息和背景信息，有助于复杂背景下遥感图像目标检测，第一上下文注意力子模块将上下文信息嵌入注意力机制中以充分提取前后背景的关联信息进而加强网络的特征表达能力。其主要结构如图4所示。The embodiment of the present invention also provides a specific working process of the first contextual attention sub-module. Context information can effectively distinguish foreground information from background information, which is helpful for remote sensing image target detection under complex background. feature expressiveness. Its main structure is shown in Figure 4.

其中，在第一上下文注意力子模块，对第一残差块B₁分别进行7×7卷积得到第一中间上下文特征图K和第二中间上下文特征图D；在第二上下文注意力子模块中，对第二残差块B₂分别进行5×5卷积得到两个中间上下文特征图；在第三上下文注意力子模块中，对第三残差块B₃分别进行3×3卷积得到两个中间上下文特征图；在第四上下文注意力子模块中，对第四残差块B₄分别进行1×1卷积得到两个中间上下文特征图。Among them, in the first context attention sub-module, 7×7 convolution is performed on the first residual block B₁ to obtain the first intermediate context feature map K and the second intermediate context feature map D; in the second context attention sub-module In the module, 5 × 5 convolutions are performed on the second residual block B₂ to obtain two intermediate context feature maps; in the third context attention sub-module, 3 × 3 convolutions are performed on the third residual block B₃ respectively. product to obtain two intermediate context feature maps; in the fourth contextual attention sub-module, 1×1 convolution is performed on the_fourth residual block B4 to obtain two intermediate context feature maps.

第一上下文注意力权重图包含了各个尺度下目标带有非局部关联性的上下文信息对目标分类以及回归的贡献程度。第一上下文特征图增强了目标以及目标周围关联信息的表达。The first contextual attention weight map contains the contribution of the contextual information of the target with non-local relevance to target classification and regression at each scale. The first contextual feature map enhances the representation of the target and associated information around the target.

在本发明的一个实施例中，第一上下文注意力权重图P的计算公式为：In an embodiment of the present invention, the calculation formula of the first contextual attention weight map P is:

其中，K为第一中间上下文特征图，D为第二中间上下文特征图；Wherein, K is the first intermediate context feature map, and D is the second intermediate context feature map;

第一上下文特征图F的计算公式为：The calculation formula of the first context feature map F is:

其中，β为可学习的第二权重系数。Among them, β is a learnable second weight coefficient.

其中，P_ji为带有上下文信息的权重图中衡量第i个位置对第j个位置的权重影响系数；β为可学习的第二权重系数，用来权衡修正后的特征图和初始特征图。参见图7，图7展示了部分第一上下文特征图F的热力图像，从图像中可以看出更多目标周围的局部信息被激活。Among them, P_ji is the weight influence coefficient of the i-th position on the j-th position in the weight map with context information; β is the second learnable weight coefficient, which is used to weigh the revised feature map and the initial feature map . Referring to Fig. 7, Fig. 7 shows a thermal image of a part of the first context feature map F. From the image, it can be seen that more local information around the target is activated.

本发明实施例还给出了第一通道注意力子模块的具体工作过程。卷积神经网络的特征图每个通道有不同的类别和空间位置的全局信息，有些信息有利于目标的检测，有些信息不利于目标的检测，为了强化积极的响应，削弱消极的响应，本发明实施例提出了通道注意力子模块对通道间的相互关系以及特征图内部的非局部关联进行建模。具体过程可参见图5。The embodiment of the present invention also provides a specific working process of the attention sub-module of the first channel. Each channel of the feature map of the convolutional neural network has global information of different categories and spatial positions. Some information is beneficial to the detection of the target, and some information is not conducive to the detection of the target. In order to strengthen the positive response and weaken the negative response, the present invention Embodiments propose a channel attention sub-module to model inter-channel interrelationships and non-local associations within feature maps. The specific process can be seen in Figure 5.

在本发明的一个实施例中，第一通道注意力权重图Q的计算公式为：In an embodiment of the present invention, the calculation formula of the first channel attention weight map Q is:

其中，C为第一残差块B₁的通道数量；Wherein, C is the number of channels of the first residual block B₁ ;

第一通道特征图G的计算公式为：The calculation formula of the first channel feature map G is:

其中，γ为可学习的第三权重系数。Among them, γ is a learnable third weight coefficient.

其中Q_ji为通道i对通道j的响应系数，γ为可学习的第三权重系数，用来权衡修正后的特征图和初始特征图。参见图8，图8展示了部分第一通道特征图G的热力图像，从图像中可以看出更多与目标相关联的全局信息被激活。where Q_ji is the response coefficient of channel i to channel j, and γ is a learnable third weight coefficient, which is used to weigh the modified feature map and the initial feature map. Referring to Fig. 8, Fig. 8 shows a thermal image of part of the first channel feature map G, and it can be seen from the image that more global information associated with the target is activated.

在本发明的一个实施例中，第四并行感知注意力模块包括第四上下文注意力子模块和第四通道注意力子模块；In one embodiment of the present invention, the fourth parallel perceptual attention module includes a fourth contextual attention sub-module and a fourth channel attention sub-module;

第四上下文注意力子模块以第四残差块B₄为输入，输出第四上下文特征图；The fourth context attention sub-module takes the fourth residual block B₄ as input, and outputs the fourth context feature map;

第四通道注意力子模块以第四残差块B₄为输入，输出第四通道特征图；The fourth channel attention sub-module takes the fourth residual block B₄ as input, and outputs the fourth channel feature map;

将第四上下文特征图与第四通道特征图进行融合，得到第四融合特征图IB₄。The fourth context feature map and the fourth channel feature map are fused to obtain a fourth fused feature map IB₄ .

与前述三个感知注意力模块不同，第四并行感知注意力模块只包括上下文注意力子模块和通道注意力子模块，上下文注意力子模块和通道注意力子模块分别与前述描述的第一上下文注意力子模块和第一通道注意力子模块的工作过程类似，在此不再赘述。Different from the aforementioned three perceptual attention modules, the fourth parallel perceptual attention module only includes a contextual attention sub-module and a channel attention sub-module, which are respectively the same as the first context described above. The working process of the attention sub-module is similar to that of the first-channel attention sub-module, and will not be repeated here.

可选地，在S102之前，还可以包括：Optionally, before S102, it may further include:

对待检测遥感图像进行预处理得到预处理后的待检测遥感图像；The remote sensing image to be detected is preprocessed to obtain the preprocessed remote sensing image to be detected;

相应的，S102可以包括：Correspondingly, S102 may include:

将预处理后的待检测遥感图像输入训练后的并行感知注意力网络模型中，得到多个不同尺度的输出特征图。The preprocessed remote sensing images to be detected are input into the trained parallel perceptual attention network model, and multiple output feature maps of different scales are obtained.

S103：根据多个不同尺度的输出特征图进行目标检测得到检测结果。S103: Perform target detection according to a plurality of output feature maps of different scales to obtain a detection result.

在本发明实施例中，可以利用任何现有方法，根据多个不同尺度的输出特征图进行目标检测得到检测结果。In the embodiment of the present invention, any existing method may be used to perform target detection according to a plurality of output feature maps of different scales to obtain a detection result.

可选地，参见图9，在通过训练后的并行感知注意力网络模型进行特征提取之后，可以通过区域推荐网络，对齐、池化等操作以及进行非极大值抑制输出分类、定位结果等操作进行目标检测得到检测结果。Optionally, referring to FIG. 9 , after feature extraction is performed through the trained parallel perceptual attention network model, operations such as alignment, pooling, and non-maximum suppression output classification and positioning results can be performed through a regional recommendation network. Perform target detection to obtain detection results.

在本发明实施例中，并行感知注意力网络模型的设计细节参数如表1所示。In the embodiment of the present invention, the design details parameters of the parallel perceptual attention network model are shown in Table 1.

表1并行感知注意力网络模型的设计细节参数Table 1 Design details parameters of the parallel perception attention network model

通过实验验证本发明实施例的目标检测效果。The target detection effect of the embodiment of the present invention is verified by experiments.

实验使用的硬件和软件环境如下：The hardware and software environments used in the experiments are as follows:

CPU：Intel core i7 6700 3.30GHZ；GPU：p2000 5G；Memory：16G；操作系统：Ubuntu 16.04；开发环境：Tensorflow编程语言：Python 3.5；IDE：PycharmCPU: Intel core i7 6700 3.30GHZ; GPU: p2000 5G; Memory: 16G; Operating System: Ubuntu 16.04; Development Environment: Tensorflow Programming Language: Python 3.5; IDE: Pycharm

实验数据集：Experimental dataset:

实验所用数据集为两个遥感图像公共数据集：RSOD和UCAS-AOD，随机选取了其中汽车和飞机类别中的80％作为训练集，20％作为测试集。The datasets used in the experiment are two public datasets of remote sensing images: RSOD and UCAS-AOD. 80% of the car and airplane categories are randomly selected as the training set and 20% as the test set.

网络模型采用101层的残差网络作为主干网络，参数使用在ImageNet上与预练的权重进行初始化，图片输入大小统一调整为800x800像素，使用随机梯度下降法进行30000轮训练，初试学习率为0.001，经过15000轮后降为0.0001。在锚边界框选择上，使用四个尺度分别为32x32，64x64，128x128，256x256，长宽比为1:1，2:1和1:2锚边界框，这样的设置可以减少计算的同时保证较好的精度，IoU的阈值设置为0.7。The network model uses a 101-layer residual network as the backbone network, the parameters are initialized with the weights pre-trained on ImageNet, the image input size is uniformly adjusted to 800x800 pixels, and the stochastic gradient descent method is used for 30,000 rounds of training. The initial learning rate is 0.001 , which drops to 0.0001 after 15,000 rounds. In the anchor bounding box selection, four scales are used, 32x32, 64x64, 128x128, 256x256, and the aspect ratios are 1:1, 2:1 and 1:2 anchor bounding boxes. This setting can reduce the calculation while ensuring the comparison. Good accuracy, the threshold for IoU is set to 0.7.

表2与其他方法在平均精度和召回率上的检测结果对比Table 2 Comparison of detection results with other methods in terms of average precision and recall

实验结果：Experimental results:

实验的评价指标采用平均准确率和召回率。图10显示了本发明实施例的目标检测方法与当前主流的深度学习方法的检测结果对比，前三列分别展示了目标(飞机)在复杂背景下，在尺度较小时以及在遮挡情况下的检测结果，最后一列展示了各场景下汽车的检测结果，其中第一行为原始图片，第二行和第三行分别为基于回归的目标检测模型YOLO和SSD的检测结果，从框中可以看出其检测准确率不高，在复杂的场景下仍然有很多漏检情况。第四行和第五行是基于区域推荐的目标检测模型FPN和Faster-RCNN，从结果看出其检测准确率高于YOLO和SSD，可见该类方法对复杂场景下的目标有一定的鲁棒性，第六行为本发明实施例提供的方法，由于各个并行注意力模块的存在，可以使网络提取到更加丰富的目标的多尺度特征以及目标的非局部关联特征，较其他网络模型，本模型在复杂场景下表现良好，可以对图中所有目标进行有效检测。The evaluation indicators of the experiment are the average precision and recall. Figure 10 shows the comparison between the detection results of the target detection method according to the embodiment of the present invention and the current mainstream deep learning method. The first three columns respectively show the detection of the target (airplane) in a complex background, when the scale is small and in the case of occlusion As a result, the last column shows the detection results of cars in each scene. The first row is the original image, and the second and third rows are the detection results of the regression-based target detection models YOLO and SSD, respectively. It can be seen from the box that its The detection accuracy is not high, and there are still many missed detections in complex scenarios. The fourth and fifth lines are the target detection models FPN and Faster-RCNN based on regional recommendation. From the results, it can be seen that the detection accuracy is higher than that of YOLO and SSD. It can be seen that this type of method is robust to targets in complex scenes. , the sixth behavior is the method provided by the embodiment of the present invention. Due to the existence of each parallel attention module, the network can extract more abundant multi-scale features of the target and non-local correlation features of the target. Compared with other network models, this model is in It performs well in complex scenes and can effectively detect all objects in the graph.

表2显示了本发明实施例提供的方法与其他方法对汽车和飞机检测结果的准确率和召回率数值对比情况，相比于其他深度学习方法，本发明实施例提供的方法在汽车和飞机检测的平均准确率和召回率上平均提升7％，比当前最好的检测方法大约提高1％。Table 2 shows the numerical comparison of the accuracy and recall rate of the method provided by the embodiment of the present invention and other methods on the detection results of automobiles and airplanes. On average, the average precision and recall are improved by 7%, which is about 1% higher than the current state-of-the-art detection methods.

表3显示了本发明实施例提供的方法与其他方法检测速度的对比情况，从表3中可以看出将本发明实施例提供的方法中的网络模型作为目标检测的主干网络，可以达到约8.8FPS的检测速度，较上一模型提升3倍，且比主流基于区域推荐的网络模型的检测速度也有所提高。Table 3 shows the comparison of the detection speed between the method provided by the embodiment of the present invention and other methods. It can be seen from Table 3 that the network model in the method provided by the embodiment of the present invention is used as the backbone network for target detection, which can reach about 8.8 The detection speed of FPS is 3 times higher than that of the previous model, and the detection speed of the mainstream network model based on region recommendation is also improved.

表3与其他方法检的检测速度对比Table 3 Comparison of detection speed with other methods

实验还使用消融研究来验证各个子模块对检测结果的作用情况，从表4的消融研究数据来看，当模型只使用通道注意力子模块和上下文注意力子模块时，平均准确度提升了0.9％，当使用多尺度注意力子模块和通道注意力子模块时，平均准确率提升了2.1％，当使用上下文注意力子模块和多尺度注意力子模块时，提高了2.3％的平均精度，这表明多尺度和上下文的信息特征更有助于检测目标，当所有子模块都被使用时提高了3.7％的平均精度，由此可见各个子模块对于检测目标来说都是有效的。The experiment also uses the ablation study to verify the effect of each sub-module on the detection results. From the ablation study data in Table 4, when the model only uses the channel attention sub-module and the contextual attention sub-module, the average accuracy is improved by 0.9 %, the average accuracy is improved by 2.1% when using the multi-scale attention sub-module and the channel attention sub-module, and the average accuracy is improved by 2.3% when using the contextual attention sub-module and the multi-scale attention sub-module, This shows that multi-scale and contextual informative features are more helpful for detecting objects, and the average accuracy is improved by 3.7% when all sub-modules are used, which shows that each sub-module is effective for detecting objects.

表4各模块对检测结果的作用情况Table 4 The effect of each module on the test results

本发明实施例基于注意力机制提出了并行感知注意力网络模型(神经网络模型)来提升遥感图像目标检测的准确率和检测速度，该网络模型包括并行的多尺度注意力子模块，上下文注意力子模块以及通道注意力子模块。首先融合多个尺度下三个并行模块的输出，获得丰富的多尺度特征，上下文特征以及非局部的关联特征；然后在得到的融合后特征图中使用可变形卷积代替传统卷积，从而更好地提取方向敏感的物体特征；最后使用距离交并比损失代替传统的边界框损失，在加快模型收敛速度的同时获得了更精确的目标定位；实验结果验证了将该网络模型作为目标检测的主干网络可以有效地提高检测精确度和检测速度，同时对处于复杂场景下的目标亦有很好的检测效果。The embodiment of the present invention proposes a parallel perceptual attention network model (neural network model) based on the attention mechanism to improve the accuracy and detection speed of remote sensing image target detection. The network model includes parallel multi-scale attention sub-modules, context attention submodule as well as the channel attention submodule. First, the outputs of the three parallel modules at multiple scales are fused to obtain rich multi-scale features, contextual features and non-local associated features; then, the deformable convolution is used to replace the traditional convolution in the obtained fused feature map, so that the The direction-sensitive object features can be extracted well. Finally, the distance intersection ratio loss is used to replace the traditional bounding box loss, which can speed up the model convergence speed and obtain more accurate target positioning. The experimental results verify that the network model is used as a target detection method. The backbone network can effectively improve the detection accuracy and detection speed, and also has a good detection effect on targets in complex scenes.

应理解，上述实施例中各步骤的序号的大小并不意味着执行顺序的先后，各过程的执行顺序应以其功能和内在逻辑确定，而不应对本发明实施例的实施过程构成任何限定。It should be understood that the size of the sequence numbers of the steps in the above embodiments does not mean the sequence of execution, and the execution sequence of each process should be determined by its function and internal logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.

图11是本发明一实施例提供的遥感图像目标检测系统的示意框图，为了便于说明，仅示出与本发明实施例相关的部分。FIG. 11 is a schematic block diagram of a remote sensing image target detection system provided by an embodiment of the present invention. For convenience of description, only parts related to the embodiment of the present invention are shown.

在本发明实施例中，遥感图像目标检测系统110可以包括获取模块1101、特征提取模块1102和目标检测模块1103。In this embodiment of the present invention, the remote sensing imagetarget detection system 110 may include anacquisition module 1101 , afeature extraction module 1102 and atarget detection module 1103 .

其中，获取模块1101，用于获取待检测遥感图像；Wherein, theacquisition module 1101 is used to acquire remote sensing images to be detected;

特征提取模块1102，用于将待检测遥感图像输入训练后的并行感知注意力网络模型中，得到多个不同尺度的输出特征图；Thefeature extraction module 1102 is used to input the remote sensing images to be detected into the trained parallel perceptual attention network model to obtain a plurality of output feature maps of different scales;

目标检测模块1103，用于根据多个不同尺度的输出特征图进行目标检测得到检测结果。Thetarget detection module 1103 is configured to perform target detection according to a plurality of output feature maps of different scales to obtain a detection result.

可选地，在特征提取模块1102中，并行感知注意力网络模型以残差网络为主干；Optionally, in thefeature extraction module 1102, the parallel perceptual attention network model takes the residual network as the backbone;

并行感知注意力网络模型包括第一残差块、第二残差块、第三残差块、第四残差块、第一并行感知注意力模块、第二并行感知注意力模块、第三并行感知注意力模块和第四并行感知注意力模块；第一残差块、第二残差块、第三残差块和第四残差块的尺寸均不同；The parallel perceptual attention network model includes a first residual block, a second residual block, a third residual block, a fourth residual block, a first parallel perceptual attention module, a second parallel perceptual attention module, and a third parallel perceptual attention module. The perceptual attention module and the fourth parallel perceptual attention module; the sizes of the first residual block, the second residual block, the third residual block and the fourth residual block are all different;

第一并行感知注意力模块以第一残差块和第二残差块为输入，输出第一融合特征图IB1；第二并行感知注意力模块以第二残差块和第三残差块为输入，输出第二融合特征图；第三并行感知注意力模块以第三残差块和第四残差块为输入，输出第三融合特征图；第四并行感知注意力模块以第四残差块为输入，输出第四融合特征图；The first parallel perceptual attention module takes the first residual block and the second residual block as input, and outputs the first fusion feature map IB1; the second parallel perceptual attention module takes the second residual block and the third residual block as Input, output the second fusion feature map; the third parallel perceptual attention module takes the third residual block and the fourth residual block as input, and outputs the third fusion feature map; the fourth parallel perceptual attention module uses the fourth residual The block is the input, and the fourth fusion feature map is output;

第四融合特征图经过可变形卷积得到第四尺度的输出特征图；第三融合特征图经过可变形卷积之后与经过2倍上采样后的第四尺度的输出特征图相加得到第三尺度的输出特征图；第二融合特征图经过可变形卷积之后与经过2倍上采样后的第三尺度的输出特征图相加得到第二尺度的输出特征图；第一融合特征图IB1经过可变形卷积之后与经过2倍上采样后的第二尺度的输出特征图相加得到第一尺度的输出特征图。The fourth fused feature map undergoes deformable convolution to obtain the output feature map of the fourth scale; the third fused feature map undergoes deformable convolution and is added to the output feature map of the fourth scale after twice upsampling to obtain the third fused feature map. The output feature map of the scale; the second fusion feature map after deformable convolution is added to the output feature map of the third scale after 2 times upsampling to obtain the output feature map of the second scale; the first fusion feature map IB1 After deformable convolution, the output feature map of the first scale is obtained by adding the output feature map of the second scale after twice upsampling.

可选地，第一并行感知注意力模块、第二并行感知注意力模块和第三并行感知注意力模块的结构相同；Optionally, the structures of the first parallel perception attention module, the second parallel perception attention module and the third parallel perception attention module are the same;

第一并行感知注意力模块包括第一多尺度注意力子模块、第一上下文注意力子模块和第一通道注意力子模块；The first parallel perceptual attention module includes a first multi-scale attention sub-module, a first contextual attention sub-module and a first channel attention sub-module;

第一多尺度注意力子模块以第一残差块和第二残差块为输入，输出第一尺度特征图；The first multi-scale attention sub-module takes the first residual block and the second residual block as input, and outputs the first scale feature map;

第一上下文注意力子模块以第一残差块为输入，输出第一上下文特征图；The first context attention sub-module takes the first residual block as input, and outputs the first context feature map;

第一通道注意力子模块以第一残差块为输入，输出第一通道特征图；The first channel attention sub-module takes the first residual block as input, and outputs the first channel feature map;

将第一尺度特征图、第一上下文特征图和第一通道特征图进行融合，得到第一融合特征图IB1。The first scale feature map, the first context feature map, and the first channel feature map are fused to obtain a first fused feature map IB1.

可选地，在第一多尺度注意力子模块中，将第一残差块进行卷积得到第一中间尺度特征图，将第二残差块进行卷积得到第二中间尺度特征图，将第二中间尺度特征图进行矩阵变换后与第一中间尺度特征图进行相乘操作得到第三中间尺度特征图，对第三中间尺度特征图进行归一化得到第一多尺度注意力权重图，将第一多尺度注意力权重图与第二中间尺度特征图进行相乘操作得到第四中间尺度特征图，对第四中间尺度特征图进行上采样后与第一残差块进行相加操作得到第一尺度特征图；Optionally, in the first multi-scale attention sub-module, the first residual block is convolved to obtain a first intermediate-scale feature map, the second residual block is convolved to obtain a second intermediate-scale feature map, and the After the second intermediate-scale feature map is transformed into a matrix, it is multiplied with the first intermediate-scale feature map to obtain a third intermediate-scale feature map, and the third intermediate-scale feature map is normalized to obtain a first multi-scale attention weight map. Multiply the first multi-scale attention weight map and the second intermediate-scale feature map to obtain a fourth intermediate-scale feature map, and upsample the fourth intermediate-scale feature map and add it to the first residual block to obtain The first scale feature map;

在第一上下文注意力子模块中，将第一残差块进行卷积分别得到第一中间上下文特征图和第二中间上下文特征图，将第二中间上下文特征图进行矩阵变换后与第一中间上下文特征图进行相乘操作得到第三中间上下文特征图，对第三中间上下文特征图进行归一化得到第一上下文注意力权重图，将第一上下文注意力权重图和第一残差块进行相乘操作得到第四中间上下文特征图，将第四中间上下文特征图进行矩阵变换后与第一残差块进行相加操作得到第一上下文特征图；In the first context attention sub-module, the first residual block is convolved to obtain the first intermediate context feature map and the second intermediate context feature map, respectively, and the second intermediate context feature map is matrix-transformed with the first intermediate context feature map. The context feature map is multiplied to obtain the third intermediate context feature map, the third intermediate context feature map is normalized to obtain the first context attention weight map, and the first context attention weight map and the first residual block are processed. A multiplication operation is performed to obtain a fourth intermediate context feature map, and the fourth intermediate context feature map is subjected to matrix transformation and an addition operation is performed with the first residual block to obtain a first context feature map;

在第一通道注意力子模块中，将第一残差块进行矩阵变换后与第一残差块进行相乘操作得到第一中间通道特征图，将第一中间通道特征图进行归一化得到第一通道注意力权重图，将第一通道注意力权重图与第一残差块进行相乘得到第二中间通道特征图，将第二中间通道特征图进行矩阵变换后与第一残差块进行相加操作得到第一通道特征图。In the first channel attention sub-module, the first residual block is subjected to matrix transformation and then multiplied with the first residual block to obtain the first intermediate channel feature map, and the first intermediate channel feature map is normalized to obtain The first channel attention weight map is multiplied by the first channel attention weight map and the first residual block to obtain the second intermediate channel feature map. After matrix transformation, the second intermediate channel feature map is combined with the first residual block. An addition operation is performed to obtain the first channel feature map.

可选地，第一多尺度注意力权重图M的计算公式为：Optionally, the calculation formula of the first multi-scale attention weight map M is:

其中，i表示第i行，j表示第j列，N为第一残差块的高度，A为第一中间尺度特征图，B为第二中间尺度特征图；Wherein, i represents the ith row, j represents the jth column, N is the height of the first residual block, A is the first intermediate scale feature map, and B is the second intermediate scale feature map;

其中，B₁为第一残差块，α为可学习的第一权重系数；Among them, B₁ is the first residual block, and α is the learnable first weight coefficient;

第一上下文注意力权重图P的计算公式为：The calculation formula of the first context attention weight map P is:

其中，β为可学习的第二权重系数；Among them, β is a learnable second weight coefficient;

第一通道注意力权重图Q的计算公式为：The calculation formula of the first channel attention weight map Q is:

其中，C为第一残差块的通道数量；Among them, C is the number of channels of the first residual block;

可选地，第四并行感知注意力模块包括第四上下文注意力子模块和第四通道注意力子模块；Optionally, the fourth parallel perceptual attention module includes a fourth contextual attention sub-module and a fourth channel attention sub-module;

第四上下文注意力子模块以第四残差块为输入，输出第四上下文特征图；The fourth contextual attention sub-module takes the fourth residual block as input, and outputs the fourth contextual feature map;

第四通道注意力子模块以第四残差块为输入，输出第四通道特征图；The fourth channel attention sub-module takes the fourth residual block as input, and outputs the fourth channel feature map;

将第四上下文特征图与第四通道特征图进行融合，得到第四融合特征图。The fourth context feature map and the fourth channel feature map are fused to obtain a fourth fused feature map.

可选地，在对并行感知注意力网络模型进行训练的过程中，使用类别损失函数和回归损失函数，其中回归损失函数为距离交并比损失函数。Optionally, in the process of training the parallel perceptual attention network model, a class loss function and a regression loss function are used, wherein the regression loss function is a distance intersection ratio loss function.

所属领域的技术人员可以清楚地了解到，为了描述的方便和简洁，仅以上述各功能单元、模块的划分进行举例说明，实际应用中，可以根据需要而将上述功能分配由不同的功能单元、模块完成，即将所述遥感图像目标检测系统的内部结构划分成不同的功能单元或模块，以完成以上描述的全部或者部分功能。实施例中的各功能单元、模块可以集成在一个处理单元中，也可以是各个单元单独物理存在，也可以两个或两个以上单元集成在一个单元中，上述集成的单元既可以采用硬件的形式实现，也可以采用软件功能单元的形式实现。另外，各功能单元、模块的具体名称也只是为了便于相互区分，并不用于限制本申请的保护范围。上述装置中单元、模块的具体工作过程，可以参考前述方法实施例中的对应过程，在此不再赘述。Those skilled in the art can clearly understand that, for the convenience and simplicity of description, only the division of the above-mentioned functional units and modules is used as an example. The module is completed, that is, the internal structure of the remote sensing image target detection system is divided into different functional units or modules, so as to complete all or part of the functions described above. Each functional unit and module in the embodiment may be integrated in one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit, and the above-mentioned integrated units may adopt hardware. It can also be realized in the form of software functional units. In addition, the specific names of the functional units and modules are only for the convenience of distinguishing from each other, and are not used to limit the protection scope of the present application. For the specific working process of the units and modules in the above apparatus, reference may be made to the corresponding process in the foregoing method embodiments, and details are not described herein again.

图12是本发明一实施例提供的终端设备的示意框图。如图12所示，该实施例的终端设备120包括：一个或多个处理器1201、存储器1202以及存储在所述存储器1202中并可在所述处理器1201上运行的计算机程序1203。所述处理器1201执行所述计算机程序1203时实现上述各个遥感图像目标检测方法实施例中的步骤，例如图1所示的步骤S101至S103。或者，所述处理器1201执行所述计算机程序1203时实现上述遥感图像目标检测系统实施例中各模块/单元的功能，例如图11所示模块1101至1103的功能。FIG. 12 is a schematic block diagram of a terminal device according to an embodiment of the present invention. As shown in FIG. 12 , theterminal device 120 in this embodiment includes: one ormore processors 1201 , amemory 1202 , and acomputer program 1203 stored in thememory 1202 and executable on theprocessor 1201 . When theprocessor 1201 executes thecomputer program 1203, the steps in each of the foregoing embodiments of the remote sensing image target detection method are implemented, for example, steps S101 to S103 shown in FIG. 1 . Alternatively, when theprocessor 1201 executes thecomputer program 1203, the functions of each module/unit in the above-mentioned embodiment of the remote sensing image target detection system are implemented, for example, the functions of themodules 1101 to 1103 shown in FIG. 11 .

示例性地，所述计算机程序1203可以被分割成一个或多个模块/单元，所述一个或者多个模块/单元被存储在所述存储器1202中，并由所述处理器1201执行，以完成本申请。所述一个或多个模块/单元可以是能够完成特定功能的一系列计算机程序指令段，该指令段用于描述所述计算机程序1203在所述终端设备120中的执行过程。例如，所述计算机程序1203可以被分割成获取模块、特征提取模块和目标检测模块，各模块具体功能如下：Exemplarily, thecomputer program 1203 may be divided into one or more modules/units, and the one or more modules/units are stored in thememory 1202 and executed by theprocessor 1201 to complete the this application. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, and the instruction segments are used to describe the execution process of thecomputer program 1203 in theterminal device 120 . For example, thecomputer program 1203 can be divided into an acquisition module, a feature extraction module and a target detection module, and the specific functions of each module are as follows:

其它模块或者单元可参照图11所示的实施例中的描述，在此不再赘述。For other modules or units, reference may be made to the description in the embodiment shown in FIG. 11 , and details are not described herein again.

所述终端设备120可以是桌上型计算机、笔记本、掌上电脑及云端服务器等计算设备。所述终端设备120包括但不仅限于处理器1201、存储器1202。本领域技术人员可以理解，图12仅仅是终端设备120的一个示例，并不构成对终端设备120的限定，可以包括比图示更多或更少的部件，或者组合某些部件，或者不同的部件，例如所述终端设备120还可以包括输入设备、输出设备、网络接入设备、总线等。Theterminal device 120 may be a computing device such as a desktop computer, a notebook, a palmtop computer, and a cloud server. Theterminal device 120 includes, but is not limited to, theprocessor 1201 and thememory 1202 . Those skilled in the art can understand that FIG. 12 is only an example of theterminal device 120, and does not constitute a limitation on theterminal device 120. It may include more or less components than the one shown, or combine some components, or different Components, for example, theterminal device 120 may also include input devices, output devices, network access devices, buses, and the like.

所述处理器1201可以是中央处理单元(Central Processing Unit，CPU)，还可以是其他通用处理器、数字信号处理器(Digital Signal Processor，DSP)、专用集成电路(Application Specific Integrated Circuit，ASIC)、现场可编程门阵列(Field-Programmable Gate Array，FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。Theprocessor 1201 may be a central processing unit (Central Processing Unit, CPU), or other general-purpose processors, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), Field-Programmable Gate Array (FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.

所述存储器1202可以是所述终端设备120的内部存储单元，例如终端设备120的硬盘或内存。所述存储器1202也可以是所述终端设备120的外部存储设备，例如所述终端设备120上配备的插接式硬盘，智能存储卡(Smart Media Card,SMC)，安全数字(SecureDigital,SD)卡，闪存卡(Flash Card)等。进一步地，所述存储器1202还可以既包括终端设备120的内部存储单元也包括外部存储设备。所述存储器1202用于存储所述计算机程序1203以及所述终端设备120所需的其他程序和数据。所述存储器1202还可以用于暂时地存储已经输出或者将要输出的数据。Thememory 1202 may be an internal storage unit of theterminal device 120 , such as a hard disk or a memory of theterminal device 120 . Thememory 1202 may also be an external storage device of theterminal device 120, such as a pluggable hard disk, a smart memory card (Smart Media Card, SMC), and a Secure Digital (SD) card equipped on theterminal device 120. , Flash Card (Flash Card) and so on. Further, thememory 1202 may also include both an internal storage unit of theterminal device 120 and an external storage device. Thememory 1202 is used to store thecomputer program 1203 and other programs and data required by theterminal device 120 . Thememory 1202 may also be used to temporarily store data that has been output or will be output.

在上述实施例中，对各个实施例的描述都各有侧重，某个实施例中没有详述或记载的部分，可以参见其它实施例的相关描述。In the foregoing embodiments, the description of each embodiment has its own emphasis. For parts that are not described or described in detail in a certain embodiment, reference may be made to the relevant descriptions of other embodiments.

本领域普通技术人员可以意识到，结合本文中所公开的实施例描述的各示例的单元及算法步骤，能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行，取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能，但是这种实现不应认为超出本申请的范围。Those of ordinary skill in the art can realize that the units and algorithm steps of each example described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may implement the described functionality using different methods for each particular application, but such implementations should not be considered beyond the scope of this application.

在本申请所提供的实施例中，应该理解到，所揭露的遥感图像目标检测系统和方法，可以通过其它的方式实现。例如，以上所描述的遥感图像目标检测系统实施例仅仅是示意性的，例如，所述模块或单元的划分，仅仅为一种逻辑功能划分，实际实现时可以有另外的划分方式，例如多个单元或组件可以结合或者可以集成到另一个系统，或一些特征可以忽略，或不执行。另一点，所显示或讨论的相互之间的耦合或直接耦合或通讯连接可以是通过一些接口，装置或单元的间接耦合或通讯连接，可以是电性，机械或其它的形式。In the embodiments provided in this application, it should be understood that the disclosed remote sensing image target detection system and method may be implemented in other ways. For example, the embodiments of the remote sensing image target detection system described above are only illustrative. For example, the division of the modules or units is only a logical function division. In actual implementation, there may be other division methods, such as multiple divisions. Units or components may be combined or may be integrated into another system, or some features may be omitted, or not implemented. On the other hand, the shown or discussed mutual coupling or direct coupling or communication connection may be through some interfaces, indirect coupling or communication connection of devices or units, and may be in electrical, mechanical or other forms.

所述作为分离部件说明的单元可以是或者也可以不是物理上分开的，作为单元显示的部件可以是或者也可以不是物理单元，即可以位于一个地方，或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。The units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.

另外，在本申请各个实施例中的各功能单元可以集成在一个处理单元中，也可以是各个单元单独物理存在，也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现，也可以采用软件功能单元的形式实现。In addition, each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit. The above-mentioned integrated units may be implemented in the form of hardware, or may be implemented in the form of software functional units.

所述集成的模块/单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时，可以存储在一个计算机可读取存储介质中。基于这样的理解，本申请实现上述实施例方法中的全部或部分流程，也可以通过计算机程序来指令相关的硬件来完成，所述的计算机程序可存储于一计算机可读存储介质中，该计算机程序在被处理器执行时，可实现上述各个方法实施例的步骤。其中，所述计算机程序包括计算机程序代码，所述计算机程序代码可以为源代码形式、对象代码形式、可执行文件或某些中间形式等。所述计算机可读介质可以包括：能够携带所述计算机程序代码的任何实体或装置、记录介质、U盘、移动硬盘、磁碟、光盘、计算机存储器、只读存储器(ROM，Read-Only Memory)、随机存取存储器(RAM，Random Access Memory)、电载波信号、电信信号以及软件分发介质等。需要说明的是，所述计算机可读介质包含的内容可以根据司法管辖区内立法和专利实践的要求进行适当的增减，例如在某些司法管辖区，根据立法和专利实践，计算机可读介质不包括是电载波信号和电信信号。The integrated modules/units, if implemented in the form of software functional units and sold or used as independent products, may be stored in a computer-readable storage medium. Based on this understanding, the present application can implement all or part of the processes in the methods of the above embodiments, and can also be completed by instructing the relevant hardware through a computer program. The computer program can be stored in a computer-readable storage medium, and the computer When the program is executed by the processor, the steps of the foregoing method embodiments can be implemented. Wherein, the computer program includes computer program code, and the computer program code may be in the form of source code, object code, executable file or some intermediate form, and the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a U disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM, Read-Only Memory) , Random Access Memory (RAM, Random Access Memory), electric carrier signal, telecommunication signal and software distribution medium, etc. It should be noted that the content contained in the computer-readable media may be appropriately increased or decreased according to the requirements of legislation and patent practice in the jurisdiction, for example, in some jurisdictions, according to legislation and patent practice, the computer-readable media Excluded are electrical carrier signals and telecommunication signals.

以上所述实施例仅用以说明本申请的技术方案，而非对其限制；尽管参照前述实施例对本申请进行了详细的说明，本领域的普通技术人员应当理解：其依然可以对前述各实施例所记载的技术方案进行修改，或者对其中部分技术特征进行等同替换；而这些修改或者替换，并不使相应技术方案的本质脱离本申请各实施例技术方案的精神和范围，均应包含在本申请的保护范围之内。The above-mentioned embodiments are only used to illustrate the technical solutions of the present application, but not to limit them; although the present application has been described in detail with reference to the above-mentioned embodiments, those of ordinary skill in the art should understand that: it can still be used for the above-mentioned implementations. The technical solutions described in the examples are modified, or some technical features thereof are equivalently replaced; and these modifications or replacements do not make the essence of the corresponding technical solutions deviate from the spirit and scope of the technical solutions in the embodiments of the application, and should be included in the within the scope of protection of this application.

Claims

1. A remote sensing image target detection method is characterized by comprising the following steps:

acquiring a remote sensing image to be detected;

inputting the remote sensing image to be detected into the trained parallel perception attention network model to obtain a plurality of output characteristic graphs with different scales;

and carrying out target detection according to the output characteristic graphs with different scales to obtain a detection result.

2. The remote sensing image target detection method of claim 1, wherein the parallel perceptual attention network model is based on a residual error network;

the parallel perception attention network model comprises a first residual block, a second residual block, a third residual block, a fourth residual block, a first parallel perception attention module, a second parallel perception attention module, a third parallel perception attention module and a fourth parallel perception attention module; the first, second, third and fourth residual blocks all have different sizes;

the first parallel perception attention module takes the first residual block and the second residual block as input and outputs a first fusion feature map; the second parallel perception attention module takes the second residual block and the third residual block as input and outputs a second fusion feature map; the third parallel perceptual attention module takes the third residual block and the fourth residual block as input and outputs a third fused feature map; the fourth parallel perception attention module takes the fourth residual block as input and outputs a fourth fusion feature map;

the fourth fusion feature map is subjected to deformable convolution to obtain an output feature map of a fourth scale; after the third fusion feature map is subjected to deformable convolution, adding the third fusion feature map and the output feature map of the fourth scale subjected to 2 times of upsampling to obtain an output feature map of the third scale; the second fusion feature map is subjected to deformable convolution and then is added with the output feature map of the third scale subjected to 2 times of upsampling to obtain an output feature map of the second scale; and the first fusion feature map is subjected to deformable convolution and then is added with the output feature map of the second scale subjected to 2 times of upsampling to obtain an output feature map of the first scale.

3. The remote sensing image target detection method of claim 2, wherein the first parallel perceptual attention module, the second parallel perceptual attention module and the third parallel perceptual attention module have the same structure;

the first parallel perceptual attention module includes a first multi-scale attention submodule, a first contextual attention submodule, and a first channel attention submodule;

the first multi-scale attention submodule takes the first residual block and the second residual block as input and outputs a first scale feature map;

the first context attention submodule takes the first residual block as input and outputs a first context feature map;

the first channel attention submodule takes the first residual block as input and outputs a first channel characteristic diagram;

and fusing the first scale feature map, the first context feature map and the first channel feature map to obtain the first fused feature map.

4. The remote sensing image target detection method of claim 3, wherein in the first multi-scale attention sub-module, convolving the first residual block to obtain a first intermediate scale feature map, convolving the second residual block to obtain a second intermediate scale feature map, performing matrix transformation on the second intermediate scale feature map, multiplying the second intermediate scale feature map by the first intermediate scale feature map to obtain a third intermediate scale feature map, normalizing the third intermediate-scale feature map to obtain a first multi-scale attention weight map, multiplying the first multi-scale attention weight map and the second intermediate-scale feature map to obtain a fourth intermediate-scale feature map, performing upsampling on the fourth intermediate-scale feature map, and then performing addition operation on the upsampled fourth intermediate-scale feature map and the first residual block to obtain a first-scale feature map;

in the first context attention submodule, performing convolution on the first residual block to respectively obtain a first intermediate context feature map and a second intermediate context feature map, performing matrix transformation on the second intermediate context feature map, then performing multiplication operation on the second intermediate context feature map and the first intermediate context feature map to obtain a third intermediate context feature map, normalizing the third intermediate context feature map to obtain a first context attention weight map, performing multiplication operation on the first context attention weight map and the first residual block to obtain a fourth intermediate context feature map, performing matrix transformation on the fourth intermediate context feature map, and then performing addition operation on the fourth intermediate context feature map and the first residual block to obtain the first context feature map;

in the first channel attention submodule, performing matrix transformation on the first residual block, then performing multiplication operation on the first residual block and the first residual block to obtain a first intermediate channel feature map, normalizing the first intermediate channel feature map to obtain a first channel attention weight map, performing multiplication on the first channel attention weight map and the first residual block to obtain a second intermediate channel feature map, performing matrix transformation on the second intermediate channel feature map, and then performing addition operation on the second intermediate channel feature map and the first residual block to obtain the first channel feature map.

5. The method for detecting the target in the remote sensing image as claimed in claim 4, wherein the calculation formula of the first multi-scale attention weight map M is as follows:

wherein i represents the ith row, j represents the jth column, N is the height of the first residual block, a is the first intermediate-scale feature map, and B is the second intermediate-scale feature map;

the calculation formula of the first scale feature map E is as follows:

wherein, B₁For the first residual block, α is a learnable first weight coefficient;

the calculation formula of the first context attention weight map P is as follows:

wherein K is the first intermediate context feature map, and D is the second intermediate context feature map;

the calculation formula of the first context feature map F is as follows:

wherein β is a learnable second weight coefficient;

the calculation formula of the first channel attention weight map Q is as follows:

wherein C is the channel number of the first residual block;

the calculation formula of the first channel characteristic diagram G is as follows:

where γ is a learnable third weight coefficient.

6. The remote sensing image target detection method of claim 2, wherein the fourth parallel perceptual attention module comprises a fourth contextual attention submodule and a fourth channel attention submodule;

the fourth context attention submodule takes the fourth residual block as input and outputs a fourth context feature map;

the fourth channel attention submodule takes a fourth residual block as input and outputs a fourth channel characteristic diagram;

and fusing the fourth context feature map and the fourth channel feature map to obtain the fourth fused feature map.

7. The remote sensing image target detection method of any one of claims 1-6, characterized in that a class loss function and a regression loss function are used in the training process of the parallel perceptual attention network model, wherein the regression loss function is a distance cross-correlation loss function.

8. A remote sensing image target detection system, comprising:

the acquisition module is used for acquiring a remote sensing image to be detected;

the feature extraction module is used for inputting the remote sensing image to be detected into the trained parallel perception attention network model to obtain a plurality of output feature maps with different scales;

and the target detection module is used for carrying out target detection according to the output characteristic graphs with different scales to obtain a detection result.

9. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method for object detection of remote sensing images according to any of claims 1 to 7 when executing the computer program.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by one or more processors, carries out the steps of the method for object detection according to any one of claims 1 to 7.