Remote sensing image super-resolution reconstruction method based on meta-learning and Transformer

Info

Publication number
CN117593187A
Authority
CN
China
Prior art keywords
low, resolution, super, image, meta
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311579041.6A
Other languages
Chinese (zh)
Inventor
张浩鹏
魏小源
姜志国
谢凤英
赵丹培
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University
Priority to CN202311579041.6A
Publication of CN117593187A
Legal status: Pending

Abstract

The invention discloses a remote sensing image super-resolution reconstruction method based on meta-learning and Transformer. Low-resolution and high-resolution image pairs are obtained from an original remote sensing image dataset under multiple scale factors. From each low-resolution image, convolution-based dense residual attention features and vision-Transformer-based features are extracted and then fused. The high-resolution image is fed to a meta-upsampling module, which predicts by meta-learning an upsampling filter for the corresponding scale factor; with this filter, the vision-Transformer-based low-resolution features are mapped to a super-resolution image. The method exploits both the global semantic information and the local target information of the remote sensing image during super-resolution reconstruction, improving the perceptual quality of the result. Combined with meta-upsampling, a single super-resolution model solves multi-scale reconstruction, finally realizing a super-resolution reconstruction algorithm for arbitrary scale factors.

Description

Arbitrary-scale super-resolution reconstruction method for remote sensing images based on meta-learning and Transformer

Technical field

The invention relates to the technical field of pattern recognition and machine learning, and in particular to a method for arbitrary-scale super-resolution reconstruction of remote sensing images based on meta-learning and Transformer.

Background

Remote sensing images are pictures, acquired by various sensors, that carry the electromagnetic-wave information of ground objects; they come in many types, in large volumes, and cover wide spans of time and space. Spatial resolution is a key indicator of a remote sensing image: it is the smallest ground unit the image can distinguish, i.e. the length of one pixel in the image corresponds to a distance on the actual ground, and that distance determines the finest ground-object detail the image can contain. Improving the spatial resolution of remote sensing images is an important topic in the remote sensing field. Besides directly improving sensor performance at the hardware level, algorithms can be designed at the software level to raise the spatial resolution; this class of techniques is called super-resolution reconstruction.

Image super-resolution reconstruction (hereinafter SR reconstruction) restores a low-resolution image (hereinafter LR image) to a high-resolution image (hereinafter HR image) at a given scale factor, according to the task at hand; applied at the spatial scale of remote sensing images, it raises their spatial resolution. In SR reconstruction, magnifying an image by different scale factors produces results of different sizes: a change of scale factor changes the mapping rule from LR image to HR image, so for SR reconstruction different scale factors represent different tasks. Most SR reconstruction algorithms built around a single scale factor can only super-resolve images at that one factor, whereas real remote sensing SR tasks usually require magnification by various integer and non-integer factors. Training a model for every possible scale factor and storing them all on the computing platform would waste enormous compute and storage resources, so an SR reconstruction algorithm for arbitrary scale factors is far more practical for remote sensing image processing.

SR reconstruction of remote sensing images divides roughly into non-deep-learning (traditional) methods and deep-learning methods. Traditional remote sensing SR reconstruction is mainly interpolation based, e.g. nearest-neighbor, linear, bilinear and bicubic interpolation; there is also edge-preserving SR reconstruction, which aims to raise the resolution of remote sensing images while preserving edge information. Although computationally simple, traditional methods struggle to recover high-frequency detail in remote sensing images with complex textures. For integer scale factors within a small range they perform well and meet practical needs, but for non-integer and larger factors, such as 3.2, 5 or 20, they fall short.

With the development of deep learning, various neural-network-based SR reconstruction techniques for remote sensing images have been proposed. Convolutional neural networks are currently among the most commonly used; generative adversarial networks, widely applied to unsupervised image generation in recent years, have also performed well in remote sensing SR reconstruction and deliver reconstructions of higher perceptual quality. Compared with traditional methods, deep-network SR reconstruction clearly improves on model size, computing speed and reconstruction quality, but two major problems remain. First, existing models based on convolutional neural networks and generative adversarial networks handle remote sensing SR reconstruction at only a single scale factor and lack generalization across scales. Second, purely convolutional networks are easily affected by the rich ground-object targets and complex texture information of remote sensing images, so high-frequency noise and artifacts appear in the reconstruction and degrade its quality.

How to provide a remote sensing image SR reconstruction method that reconstructs at multiple scale factors while improving the perceptual quality of the results is therefore an urgent technical problem for those skilled in the art.

Summary of the invention

In view of the research status and problems described above, the invention provides an arbitrary-scale super-resolution reconstruction method for remote sensing images based on meta-learning and Transformer. A vision Transformer introduces the global semantic features of the remote sensing image into the SR reconstruction process, and a meta-upsampling module produces the upsampling filter for the corresponding scale factor, mapping the LR feature map to the size of the HR image to obtain the final SR output.

The method provided by the invention is based on a network architecture built from a low-resolution feature extraction network and a meta-upsampling module, and comprises the following steps:

S1: bicubically downsample the original remote sensing image dataset at a preset step over a given range of scale factors to obtain LR-HR image pairs at multiple scale factors; each LR-HR image pair comprises the LR image and the HR image derived from the same original remote sensing image;

S2: feed the LR image to the low-resolution feature extraction network to extract LR features, the LR features comprising convolution-based dense residual attention LR features and vision-Transformer-based LR features, and fuse the resulting two-branch LR features;

S3: feed the HR image to the meta-upsampling module, which predicts by meta-learning an upsampling filter for the corresponding scale factor; the upsampling filter maps the vision-Transformer-based LR features to the size of the HR image, yielding the SR image;

S4: compute a loss from the SR image and the HR image, and optimize the parameters of the low-resolution feature extraction network and the meta-upsampling module.

Preferably, the low-resolution feature extraction network in S2 comprises a dense residual attention network, and the extraction of the convolution-based dense residual attention LR features comprises:

the dense residual attention block outputs an LR feature map through pointwise convolution;

global information is extracted from the LR feature map by global average pooling, and the correlations between feature channels are then obtained by one-dimensional convolution;

the correlations are passed through a Sigmoid nonlinearity to obtain a weight vector; multiplying the weight vector with the input LR feature map weights the channels of the feature map, completing one application of channel attention and yielding the attention feature map.

Preferably, the low-resolution feature extraction network in S2 comprises a vision Transformer network, and the extraction of the vision-Transformer-based LR features comprises:

separating the input LR image by channel;

extracting a feature vector v_Trans for each channel separately;

flattening the feature vectors v_Trans from the different channels and merging them by channel to obtain the Transformer feature map F_Trans;

passing the Transformer feature map F_Trans through two convolution layers to output the vision-Transformer-based LR feature map F_LR used in S3.

Preferably, fusing the two-branch LR features in S2 comprises:

F_LR(i′,j′) = Ψ(F_Trans(i′,j′), F_RDCA(i′,j′)) = a·F_Trans(i′,j′) + b·F_RDCA(i′,j′)

where Ψ(·) is the feature fusion function, a and b are the weights of the Transformer feature map F_Trans and of the dense residual attention feature map F_RDCA respectively, and (i′,j′) is a pixel position in the LR feature map.

Preferably, S3 comprises:

computing an offset matrix from the HR image size and the currently input scale factor;

passing the offset matrix, which carries the scale-factor information, through a weight-prediction fully connected network to compute the convolution-kernel parameters of the upsampling filter, obtaining the upsampling filter for the corresponding scale factor;

the upsampling filter maps the vision-Transformer-based LR features to the HR image size, yielding the SR image.

Preferably, S4 comprises:

computing the L1 loss from the SR image and the HR image, and optimizing the parameters of the low-resolution feature extraction network and the meta-upsampling module by stochastic gradient descent; the L1 loss is computed as:

L = Σ |I_SR(i,j) − I_HR(i,j)|

where I_SR(i,j) is the pixel value at position (i,j) in the SR image I_SR, and I_HR(i,j) is the pixel value at position (i,j) in the HR image I_HR.

Compared with the prior art, the invention has the following beneficial effects:

The network architecture built from the low-resolution feature extraction network and the meta-upsampling module supports supervised training and full-reference quality assessment on LR-HR image pairs at multiple scale factors, meeting the needs of multi-scale SR reconstruction. Moreover, the invention extracts two-branch features from the input LR image with a dense residual attention network and a vision Transformer network, exploiting both the global semantic information and the local target information of the remote sensing image during SR reconstruction and improving the perceptual quality of the result. Combined with meta-upsampling, a single SR model solves multi-scale SR reconstruction, finally realizing an SR reconstruction algorithm for arbitrary scale factors.

Brief description of the drawings

To explain the embodiments of the invention or the technical solutions of the prior art more clearly, the drawings needed for describing them are briefly introduced below. Obviously, the drawings described below are merely embodiments of the invention; those of ordinary skill in the art can derive other drawings from them without creative effort.

Figure 1 is a flow chart of the arbitrary-scale super-resolution reconstruction method for remote sensing images based on meta-learning and Transformer provided by an embodiment of the invention;

Figure 2 shows the structure of the dense residual attention feature extraction network provided by an embodiment of the invention;

Figure 3 shows the structure of the channel attention layer provided by an embodiment of the invention;

Figure 4 shows the structure of the vision Transformer feature extraction network provided by an embodiment of the invention;

Figure 5 is the overall framework of the network architecture provided by an embodiment of the invention;

Figure 6 shows super-resolution results on the remote sensing satellite dataset provided by an embodiment of the invention.

Detailed description of embodiments

The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the invention.

The working principle of the invention is described in detail below with reference to the drawings.

As shown in Figure 1, an embodiment of the invention provides an arbitrary-scale super-resolution reconstruction method for remote sensing images based on meta-learning and Transformer. The roles of meta-learning and the Transformer are explained first:

Meta-learning is a strategy that adaptively changes a model's internal mechanism for different tasks. It differs from base learning, in which a prior bias determines the hypothesis space and the learning algorithm then seeks the optimal solution of a single task within that fixed space. Meta-learning instead aims to learn what makes an algorithm suit a task, and how to generalize the model to more kinds of tasks. Both a meta-upsampling module that dynamically changes the convolution-filter parameters in the upsampling stage and a meta-transfer method that uses meta-knowledge to obtain initial parameters in the pre-training stage can be applied to super-resolution reconstruction. Such applications aim to solve multi-scale SR reconstruction with a single SR model, and can reach performance comparable to, or even better than, single-scale SR.

The Transformer is a neural network architecture based on the self-attention mechanism. Self-attention lets the model attend to information at different positions of the input sequence and weight that information in its computation, better capturing long-range dependencies in the sequence. Remote sensing images contain complex texture details. Convolution-based neural networks can extract the local ground-object information of a remote sensing image and reconstruct target details within small regions, but they have difficulty fully exploiting global information and are easily disturbed by local noise, which degrades the final SR reconstruction. Using a vision Transformer to introduce the global semantic features of the remote sensing image into the SR reconstruction process mitigates the artifacts that convolutional networks cause in SR results and improves the perceptual quality of the reconstruction.

The embodiment of the invention is based on a network architecture built from a low-resolution feature extraction network and a meta-upsampling module, shown in Figure 5, and comprises the following steps:

S1: bicubically downsample the original remote sensing image dataset at a preset step over a given range of scale factors to obtain LR-HR image pairs at multiple scale factors; each LR-HR image pair comprises the LR image and the HR image derived from the same original remote sensing image;

S2: feed the LR image to the low-resolution feature extraction network to extract LR features, the LR features comprising convolution-based dense residual attention LR features and vision-Transformer-based LR features, and fuse the resulting two-branch LR features;

S3: feed the HR image to the meta-upsampling module, which predicts by meta-learning an upsampling filter for the corresponding scale factor; the upsampling filter maps the vision-Transformer-based LR features to the size of the HR image, yielding the SR image;

S4: compute a loss from the SR image and the HR image, and optimize the parameters of the low-resolution feature extraction network and the meta-upsampling module.

Note that high spatial resolution (HR) refers to the spatial resolution of the original ground-truth image; low spatial resolution (LR) refers to the theoretical spatial resolution after downsampling the ground-truth image by a given scale factor; and super-resolution (SR) refers to the theoretical spatial resolution after reconstructing the image at a given scale factor.

For example, if the original image has a spatial resolution of 8 m, the theoretical spatial resolution of the LR image after 2× downsampling is 16 m, and the theoretical spatial resolution after 8× SR reconstruction of the original image is 1 m.

In one embodiment, for the SR reconstruction task, supervised training and full-reference quality assessment require downsampling the training and test data by given scale factors to obtain LR-HR image pairs; and since the SR is multi-scale, image pairs at multiple scale factors are needed. The algorithm preprocesses the datasets with bicubic downsampling, using a step of 0.1 for the training data and 0.5 for the test data. The scale-factor range of the training set is 1.1 to 4.0; the scale factors of the test data can be set freely according to the available compute.

Preprocessing by bicubic downsampling means scaling the high-resolution images with bicubic interpolation (Bicubic).
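
As an illustration of this preprocessing step, the following is a minimal PyTorch sketch that builds LR-HR training pairs by bicubic downsampling over the stated range of 1.1 to 4.0 in steps of 0.1; the function and variable names are illustrative, not from the patent.

```python
import torch
import torch.nn.functional as F

def make_lr_hr_pairs(hr, scales):
    """hr: (N, C, H, W) ground-truth batch -> {scale: (lr, hr)} training pairs."""
    pairs = {}
    for r in scales:
        h, w = int(hr.shape[-2] / r), int(hr.shape[-1] / r)
        lr = F.interpolate(hr, size=(h, w), mode="bicubic", align_corners=False)
        pairs[r] = (lr, hr)
    return pairs

# training range 1.1 .. 4.0 in steps of 0.1, as stated in the text
train_scales = [round(1.1 + 0.1 * k, 1) for k in range(30)]
pairs = make_lr_hr_pairs(torch.rand(4, 3, 200, 200), train_scales)  # dummy batch
```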

In one embodiment, the feature extraction network used by the algorithm takes the LR image as input and extracts two-branch features: convolution-based dense residual attention features and vision-Transformer-based features. Specifically, the low-resolution feature extraction network comprises a dense residual attention network, a convolution-based network that emphasizes the local features of the remote sensing image during SR reconstruction and improves the reconstruction of local details in the SR image; its structure is shown in Figure 2. Channel attention is a mechanism that guides a convolutional network to focus on the more important channels of a feature map; the algorithm uses an efficient channel attention layer to weight the feature map along the channel direction.

The extraction of the convolution-based dense residual attention LR features comprises:

the dense residual attention block outputs an LR feature map through pointwise convolution;

global information is extracted from the input LR feature map by global average pooling, and the correlations between feature channels are then obtained by one-dimensional convolution;

the correlations are passed through a Sigmoid nonlinearity to obtain a weight vector; multiplying the weight vector with the input LR feature map weights the channels of the feature map, completing one application of channel attention and yielding the attention feature map.

In this embodiment, the computation is as follows:

s = F_eca(X, θ) = σ(Conv1D(GAP(X), θ))

Y = s·X

where X is the input LR feature map, Y is the feature map obtained by weighting X channel-wise, and s is the weight vector computed by the attention module from the parameters θ. In the channel attention layer, the input X first passes through global average pooling GAP(·) to extract global information, then through the one-dimensional convolution Conv1D(·) to obtain the correlations between feature channels, and finally through the Sigmoid function σ(·) to yield s. Multiplying s with X weights the channels of the feature map, i.e. completes one application of channel attention. The expanded structure of the channel attention layer is shown in Figure 3.
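
A minimal PyTorch sketch of this channel attention layer follows, implementing s = σ(Conv1D(GAP(X), θ)) and Y = sX in the efficient (ECA-style) form described; the 1-D kernel size k is an assumption, as the text does not specify it.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """ECA-style channel attention: s = sigmoid(Conv1D(GAP(X))), Y = s * X."""
    def __init__(self, k: int = 3):               # k: 1-D kernel size (assumed)
        super().__init__()
        self.gap = nn.AdaptiveAvgPool2d(1)         # GAP(.): global information
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)

    def forward(self, x):                          # x: (N, C, H, W) feature map
        s = self.gap(x).squeeze(-1).transpose(1, 2)        # (N, 1, C)
        s = torch.sigmoid(self.conv(s))                    # cross-channel weights
        s = s.transpose(1, 2).unsqueeze(-1)                # (N, C, 1, 1)
        return x * s                                       # channel-weighted map

y = ChannelAttention()(torch.rand(1, 64, 50, 50))  # output keeps input shape
```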

In one embodiment, the low-resolution feature extraction network in S2 comprises a vision Transformer network, a neural network architecture based on the self-attention mechanism. Unlike a convolutional network with fixed-size kernels, it first splits the input image into a series of small patches, each treated as a word, then converts the patches into vector representations by an embedding operation, which lets it handle input images of different sizes adaptively; the vision Transformer then feeds these vector representations into several Transformer encoders for feature extraction and information compression.

The extraction of the vision-Transformer-based LR features comprises:

separating the input LR image by channel;

extracting a feature vector v_Trans for each channel separately;

flattening the feature vectors v_Trans from the different channels and merging them by channel to obtain the Transformer feature map F_Trans;

passing the Transformer feature map F_Trans through two convolution layers to output the vision-Transformer-based LR feature map F_LR used in S3.

This branch structure both keeps the feature vectors from losing channel information during propagation and lets the network extract the global semantic features of the image channel by channel.

In this embodiment, the whole vision Transformer feature extraction process is described by:

v_Trans = Transformer(Embed(Chunk(I_LR)); γ)

F_Trans = Concat(View(v_Trans))

F_LR = Conv2D(Conv2D(F_Trans; α); β)

where Chunk(·) and Concat(·) denote separating and merging feature maps by channel, and Embed(·) and View(·) denote embedding an image into a sequence of patches and reshaping the sequence back into a feature map; none of these functions is parameterized. α, β and γ denote parameterized networks whose parameters are continually updated from the loss during training. Transformer(·) denotes the vision Transformer encoder; the structure of the feature extraction network is shown in Figure 4.
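
The sketch below illustrates the per-channel Transformer branch defined by these formulas (Chunk, Embed, Transformer, View, Concat, then two Conv2D layers). Patch size, embedding width, encoder depth and head count are illustrative assumptions, positional encoding is omitted, and the output here stays at the patch-reduced resolution, which the patent's actual network (Figure 4) may handle differently.

```python
import torch
import torch.nn as nn

class TransformerBranch(nn.Module):
    def __init__(self, patch=5, dim=64, depth=2, heads=4):
        super().__init__()
        self.embed = nn.Conv2d(1, dim, kernel_size=patch, stride=patch)  # Embed(.)
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)   # Transformer(.; gamma)
        self.head = nn.Sequential(                           # two Conv2D layers (alpha, beta)
            nn.Conv2d(3 * dim, dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(dim, dim, 3, padding=1))

    def forward(self, x):                    # x: (N, 3, H, W), H and W divisible by patch
        feats = []
        for c in x.chunk(3, dim=1):          # Chunk(.): split by channel
            t = self.embed(c)                # (N, dim, H/p, W/p) patch tokens
            n, d, h, w = t.shape
            v = self.encoder(t.flatten(2).transpose(1, 2))      # per-channel v_Trans
            feats.append(v.transpose(1, 2).reshape(n, d, h, w)) # View(.)
        f_trans = torch.cat(feats, dim=1)    # Concat(.): merge channels -> F_Trans
        return self.head(f_trans)            # F_LR (patch-reduced resolution here)

f_lr = TransformerBranch()(torch.rand(1, 3, 50, 50))  # -> (1, 64, 10, 10)
```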

In one embodiment, to exploit both the global semantic information and the local target information of the remote sensing image during SR reconstruction, the two-branch features are fused into the low-resolution feature map. The fusion is described by:

F_LR(i′,j′) = Ψ(F_Trans(i′,j′), F_RDCA(i′,j′)) = a·F_Trans(i′,j′) + b·F_RDCA(i′,j′)

where Ψ(·) is the feature fusion function, a and b are the weights of the Transformer feature map F_Trans and of the dense residual attention feature map F_RDCA respectively, and (i′,j′) is a pixel position in the LR feature map.
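
As a sketch, the fusion is just a pixel-wise weighted sum; since the text does not say whether a and b are fixed or learned, they are modeled here as learnable scalars.

```python
import torch
import torch.nn as nn

class Fuse(nn.Module):
    """F_LR = a * F_Trans + b * F_RDCA, with a and b as learnable scalars (assumed)."""
    def __init__(self):
        super().__init__()
        self.a = nn.Parameter(torch.tensor(0.5))
        self.b = nn.Parameter(torch.tensor(0.5))

    def forward(self, f_trans, f_rdca):       # both (N, C, h, w), same shape
        return self.a * f_trans + self.b * f_rdca
```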

In one embodiment, the meta-upsampling module in S3 achieves arbitrary-scale SR through weight prediction, position projection and feature mapping. The procedure comprises:

computing an offset matrix, of dimension HW×3, from the HR image size H and W and the currently input scale factor;

passing the offset matrix, which carries the scale-factor information, through the weight-prediction fully connected network to compute the convolution-kernel parameters of the upsampling filter, obtaining the upsampling filter for the corresponding scale factor;

the upsampling filter maps the vision-Transformer-based LR features to the HR image size, yielding the SR image.

In this embodiment, meta-upsampling is computed as:

I_SR(i,j) = Φ(F_LR(i′,j′), W(i,j))

W(i,j) = φ(v_ij; ω)

Φ(F_LR(i′,j′), W(i,j)) = F_LR(i′,j′)·W(i,j)

where I_SR(i,j) is the pixel value at position (i,j) in the SR image I_SR, F_LR(i′,j′) is the feature value at position (i′,j′) in the LR feature map, and Φ(·) is the feature mapping function that computes the pixel values of I_SR. W(i,j) is the weight the upsampling filter W assigns to pixel (i,j) of I_SR; φ(·) is a fully connected network with input v_ij and parameters ω, by which the weights of the upsampling filter W are predicted. v_ij is the offset vector of (i,j) relative to the LR pixel (i′,j′), computed from the scale factor r and the position (i,j), i.e. a row vector of the offset matrix. Position projection is implemented with the floor function, and r denotes the currently input scale factor.
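
The following sketch assembles the three functions: position projection with the floor function, weight prediction from the HW×3 offset matrix by a small fully connected network φ(·; ω), and feature mapping as a per-pixel product. The hidden width and the pointwise (1×1) filter are simplifying assumptions; a Meta-SR-style module would predict a k×k convolution kernel per output pixel.

```python
import torch
import torch.nn as nn

class MetaUpsample(nn.Module):
    def __init__(self, feat_ch=64, out_ch=3, hidden=256):
        super().__init__()
        self.out_ch = out_ch
        self.weight_net = nn.Sequential(               # phi(v_ij; omega)
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, feat_ch * out_ch))

    def forward(self, f_lr, r, out_h, out_w):          # f_lr: (N, feat_ch, h, w)
        n, c, h, w = f_lr.shape
        gi, gj = torch.meshgrid(torch.arange(out_h), torch.arange(out_w),
                                indexing="ij")
        src_i = (gi / r).floor().long().clamp(max=h - 1)   # position projection
        src_j = (gj / r).floor().long().clamp(max=w - 1)   # (floor function)
        v = torch.stack([gi / r - src_i, gj / r - src_j,   # offset matrix rows v_ij,
                         torch.full((out_h, out_w), 1.0 / r)], dim=-1)  # HW x 3
        wts = self.weight_net(v.view(-1, 3)).view(-1, self.out_ch, c)   # W(i, j)
        feat = f_lr[:, :, src_i, src_j]                    # gather F_LR(i', j')
        feat = feat.permute(0, 2, 3, 1).reshape(n, -1, c)
        sr = torch.einsum("npc,poc->npo", feat, wts)       # Phi(F_LR, W)
        return sr.reshape(n, out_h, out_w, self.out_ch).permute(0, 3, 1, 2)

sr = MetaUpsample()(torch.rand(1, 64, 50, 50), 3.2, 160, 160)  # non-integer scale
```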

In one embodiment, S4 comprises:

computing the L1 loss from the SR image and the HR image, and optimizing the parameters of the low-resolution feature extraction network and the meta-upsampling module, including the weight-prediction network parameters, by stochastic gradient descent; the L1 loss is computed as:

L = Σ |I_SR(i,j) − I_HR(i,j)|

where I_SR(i,j) is the pixel value at position (i,j) in the SR image I_SR, and I_HR(i,j) is the pixel value at position (i,j) in the HR image I_HR.
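
A minimal training step matching S4 might look as follows; the stand-in model, learning rate and tensor sizes are placeholders, not the patent's actual configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinySR(nn.Module):
    """Stand-in for the full two-branch network, just for the training-step demo."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 3, 3, padding=1)
    def forward(self, x, out_hw):
        return self.conv(F.interpolate(x, size=out_hw, mode="bicubic",
                                       align_corners=False))

model = TinySR()
opt = torch.optim.SGD(model.parameters(), lr=1e-4)     # stochastic gradient descent
l1 = nn.L1Loss(reduction="sum")                        # L = sum |I_SR - I_HR|

lr_img, hr_img = torch.rand(2, 3, 25, 25), torch.rand(2, 3, 50, 50)  # scale 2.0
loss = l1(model(lr_img, hr_img.shape[-2:]), hr_img)
opt.zero_grad(); loss.backward(); opt.step()           # one optimization step
```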

The technical effect of the invention is illustrated below with concrete experimental results.

The datasets used in the experiments include the DIV2K dataset, the AID dataset, and real remote sensing imagery from China's Macao science satellite. DIV2K and AID are used for training, and the Macao data serve as the test set. The Macao satellite images were randomly cropped to 50×50 pixels, and 75 crops rich in ground-object information and texture detail were selected to form the test set. Among these datasets, AID and the Macao satellite data are remote sensing data; AID has a spatial resolution of 0.5 m to 0.8 m, and the Macao satellite imagery has a spatial resolution of 8 m. The experimental results are shown in Figure 6.

The arbitrary-scale super-resolution reconstruction method for remote sensing images based on meta-learning and Transformer provided by the invention has been described in detail above. Specific examples have been used to explain its principle and implementation; the description of the embodiments is only meant to help readers understand the method and its core idea. At the same time, those of ordinary skill in the art may, following the idea of the invention, make changes to the specific implementation and scope of application. In summary, the contents of this specification should not be construed as limiting the invention.

In this document, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between those entities or operations. Moreover, the terms "comprise", "include" and any variants of them are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element qualified by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article or device that includes it.

Claims (6)

1. An arbitrary-scale super-resolution reconstruction method for remote sensing images based on meta-learning and Transformer, characterized in that it is based on a network architecture built from a low-resolution feature extraction network and a meta-upsampling module, and comprises the following steps:

S1: bicubically downsampling the original remote sensing image dataset at a preset step over a given range of scale factors to obtain low-resolution-high-resolution (LR-HR) image pairs at multiple scale factors, each LR-HR image pair comprising the low-resolution (LR) image and the high-resolution (HR) image derived from the same original remote sensing image;

S2: feeding the LR image to the low-resolution feature extraction network to extract LR features, the LR features comprising convolution-based dense residual attention LR features and vision-Transformer-based LR features, and fusing the resulting two-branch LR features;

S3: feeding the HR image to the meta-upsampling module, which predicts by meta-learning an upsampling filter for the corresponding scale factor, the upsampling filter mapping the vision-Transformer-based LR features to the size of the HR image to yield the super-resolution (SR) image;

S4: computing a loss from the SR image and the HR image, and optimizing the parameters of the low-resolution feature extraction network and the meta-upsampling module.

2. The method according to claim 1, characterized in that the low-resolution feature extraction network in S2 comprises a dense residual attention network, and the extraction of the convolution-based dense residual attention LR features comprises: the dense residual attention block outputting an LR feature map through pointwise convolution; extracting global information from the LR feature map by global average pooling, then obtaining the correlations between feature channels by one-dimensional convolution; and passing the correlations through a Sigmoid nonlinearity to obtain a weight vector, the weight vector being multiplied with the input LR feature map to weight the channels of the feature map, completing one application of channel attention and yielding the attention feature map.

3. The method according to claim 1, characterized in that the low-resolution feature extraction network in S2 comprises a vision Transformer network, and the extraction of the vision-Transformer-based LR features comprises: separating the input LR image by channel; extracting a feature vector v_Trans for each channel separately; flattening the feature vectors v_Trans from the different channels and merging them by channel to obtain the Transformer feature map F_Trans; and passing F_Trans through two convolution layers to output the vision-Transformer-based LR feature map F_LR used in S3.

4. The method according to claim 1, characterized in that fusing the two-branch LR features in S2 comprises:

F_LR(i′,j′) = Ψ(F_Trans(i′,j′), F_RDCA(i′,j′)) = a·F_Trans(i′,j′) + b·F_RDCA(i′,j′)

where Ψ(·) is the feature fusion function, a and b are the weights of the Transformer feature map F_Trans and of the dense residual attention feature map F_RDCA respectively, and (i′,j′) is a pixel position in the LR feature map.

5. The method according to claim 1, characterized in that S3 comprises: computing an offset matrix from the HR image size and the currently input scale factor; passing the offset matrix, which carries the scale-factor information, through a weight-prediction fully connected network to compute the convolution-kernel parameters of the upsampling filter, obtaining the upsampling filter for the corresponding scale factor; and the upsampling filter mapping the vision-Transformer-based LR features to the HR image size to yield the SR image.

6. The method according to claim 1, characterized in that S4 comprises: computing the L1 loss from the SR image and the HR image, and optimizing the parameters of the low-resolution feature extraction network and the meta-upsampling module by stochastic gradient descent, the L1 loss being computed as

L = Σ |I_SR(i,j) − I_HR(i,j)|

where I_SR(i,j) is the pixel value at position (i,j) in the SR image I_SR, and I_HR(i,j) is the pixel value at position (i,j) in the HR image I_HR.
CN202311579041.6A | Priority date 2023-11-24 | Filing date 2023-11-24 | Remote sensing image super-resolution reconstruction method based on meta-learning and Transformer | Pending | CN117593187A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202311579041.6A | 2023-11-24 | 2023-11-24 | Remote sensing image super-resolution reconstruction method based on meta-learning and Transformer (CN117593187A, en)

Publications (1)

Publication Number | Publication Date
CN117593187A | 2024-02-23

Family ID: 89909492

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202311579041.6A | Remote sensing image super-resolution reconstruction method based on meta-learning and Transformer (CN117593187A, en, Pending) | 2023-11-24 | 2023-11-24

Country Status (1)

Country | Link
CN | CN117593187A (en)

Cited By (4)

Publication Number | Priority Date | Publication Date | Assignee | Title
CN117853340A* | 2024-03-07 | 2024-04-09 | Beihang University | Remote sensing video super-resolution reconstruction method based on unidirectional convolution network and degradation modeling
CN117853340B* | 2024-03-07 | 2024-06-04 | Beihang University | Remote sensing video super-resolution reconstruction method based on unidirectional convolution network and degradation modeling
CN118657660A* | 2024-05-13 | 2024-09-17 | Southeast University | A computer vision algorithm driven ECC microstructure digitization method
CN118608389A* | 2024-05-24 | 2024-09-06 | 四川新视创伟超高清科技有限公司 | Real-time dynamic super-resolution image reconstruction method and reconstruction system

* Cited by examiner, † Cited by third party

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
