Technical Field

The invention belongs to the technical field of image processing, and specifically relates to an underwater image enhancement method based on a cascaded adaptive network.
Background Art
The ocean is closely connected to human life, and the exploration and development of marine resources are of great scientific and environmental significance. Underwater images carry rich information about marine resources and play a vital role in their development and utilization. In recent years, ocean engineering and research have come to rely increasingly on captured underwater images.

However, owing to the unique physical and chemical environment, underwater scenes suffer from refraction, absorption, and scattering of light. As a result, underwater images often exhibit low contrast, blurred details, and color distortion, which poses significant challenges to applications such as underwater archaeology, underwater target detection, and seabed exploration, and these problems have seriously restricted progress in the related fields. Solving the problems related to underwater image quality is therefore critical to advancing ocean research and exploration.

To address these challenging problems, many underwater image enhancement (UIE) methods have emerged. Traditional enhancement methods often modify pixel values directly, increasing saturation and brightness to enhance the image, which frequently leads to unsatisfactory results. Physics-based methods instead invert the underwater imaging model by estimating the medium transmission and other key underwater imaging parameters to obtain a clear image. However, the underwater environment is complex and the model assumptions do not always hold; moreover, errors in estimating multiple parameters often degrade the overall result.

With the rapid and remarkable progress of deep learning, many researchers now adopt deep-learning-based algorithms to enhance underwater images. Such methods usually require a large amount of training data, which helps improve the generalization ability of the model and yields impressive results. Most deep-learning-based methods directly learn the restoration mapping in a supervised manner on labeled datasets, without an explicit imaging model. However, many of these algorithms still struggle with insufficient improvement in image clarity and natural color recovery.
Summary of the Invention
In view of the deficiencies of the prior art, the present invention proposes an underwater image enhancement method based on a cascaded adaptive network, combined with adaptive fusion, to solve problems such as color cast and low contrast in underwater images. The degraded image is enhanced gradually by multiple sub-networks using a stage-by-stage optimization strategy, achieving finer and more comprehensive image enhancement. The method mainly consists of a detail recovery module, a color balance module, an attention feature fusion module, a contextual attention module, and a global color rendering module; the neural network is trained to obtain the optimal parameters, thereby enhancing underwater images.
In order to solve the above technical problems, the technical solution of the present invention is as follows:

An underwater image enhancement method based on a cascaded adaptive network, comprising the following steps:

S1. Acquire and preprocess an underwater image dataset containing distorted underwater images.

S2. Construct an underwater image enhancement model based on the cascaded adaptive network, the underwater image enhancement model including a detail recovery module, a color balance module, an attention feature fusion module, a contextual attention module, and a global color rendering module.

First, the distorted image is fed into the detail recovery module, which restores the detail components of the distorted image; the output of its first convolution unit is also used as the input of the subsequent contextual attention.

At the same time, the distorted image is fed into the color balance module, which removes the color cast present in the image. The output of the first convolution unit of the detail recovery module is fed into the contextual attention, which focuses on severely degraded regions.

The attention feature fusion module fuses the latent results generated by the detail recovery module and the color balance module, and the output of the contextual attention is introduced to guide the fusion.

Finally, the fused result is fed into the global color rendering module for overall adjustment of the image.
S3. Train the constructed underwater image enhancement network model based on the cascaded adaptive network to obtain enhanced underwater images.

Preferably, the specific method of restoring the detail components of the distorted image through the detail recovery module is as follows: the detail recovery module adopts an encoder-decoder network structure, and a detail guidance module is added to the decoder layers to effectively assist the high-level features in restoring image details. The encoder-decoder network encodes and decodes the image with convolution units (convolution layer + instance normalization layer + activation layer) and uses three residual blocks as the bottleneck to increase the capacity of the network. The encoding part captures low-level features, while the decoding part reconstructs higher-level features. The low-level features from the encoder and the high-level features from the decoder are fed into the detail guidance module, which first applies global average pooling to the low-level features of the encoder to convert them into a global information representation. Combining this global information with the high-level features introduces a broader context that guides the high-level features and allows them to accurately select the details that need enhancement. At the same time, a convolution operation is applied to the high-level features to maintain channel consistency; the global information generated from the low-level features passes through a 1×1 convolution, is multiplied with the convolved high-level features, and is then added to the convolved high-level features, so that the low-level features effectively guide the high-level features. In addition, the network takes the low-level features as the input of the subsequent contextual attention.
Preferably, the specific method of removing the color cast in the image through the color balance module is as follows: the color balance module increases the feature dimensionality by using 3D convolution. Compared with the original feature dimensions, the added dimension represents the depth information of the RGB channels, and the 3D convolution is used for channel balancing. First, the distorted image is split into 3 separate channels, and the weight-sharing property of 3D convolution is used to extract a consistent feature distribution from the different channels, with 16 feature maps extracted from each channel. The feature maps from the different channels are then concatenated by a transpose operation and divided into 16 groups of 3-channel feature maps. The feature maps are further processed by 3D convolution, and the multiple feature maps of the different groups are compressed along the depth dimension, finally producing 16 color-balanced images with distinctive feature information.
Preferably, the specific method of focusing on severely degraded regions through the contextual attention is as follows: the contextual attention further processes the low-level features from the detail recovery module. The attention structure adopts a hybrid design in which global and local branches are built to preserve the global and local aspects of perception, and finally the global and local context information obtained from the two branches is summed. The global branch includes: (a) a context modeling module, in which one path changes the feature dimensions while another path changes the feature dimensions and then passes through a softmax function; the product of the two yields the attention weights of all positions; (b) a feature transformation module that captures inter-channel dependencies by passing the output of the context modeling module through a sequentially connected 1×1 convolution layer, activation layer, and 1×1 convolution layer. The local branch uses two parallel 1×1 convolution structures to extract local context information, and one of the paths performs global average pooling before the convolution operation. Finally, a summation operation merges the features extracted from the two branches.
Preferably, the specific method of fusing the generated latent results through the attention feature fusion module is as follows: the attention feature fusion module is used to fuse the image detail and color balance results, and contextual attention is introduced to guide the fusion process so that the network can attend to global and local information at the same time. The module adaptively fuses the two latent results generated by the detail recovery module and the color balance module. Specifically, the results generated by the detail recovery module and the color balance module and the output of the contextual attention are first summed; the sum is passed through two parallel 1×1 convolution structures, one of which performs global average pooling before the convolution operation; the two outputs are added and passed through an activation function to obtain the weights corresponding to the detail recovery module and the color balance module; the results generated by the detail recovery module and the color balance module are then multiplied by these weights and summed to obtain the final fused result.
Preferably, the specific method of performing overall adjustment of the image through the global color rendering module is as follows: the global color rendering module divides the channels into 4 groups, three of which are adjusted by adjustment blocks, while the remaining group is not adjusted so as to retain the original image information. The entire adjustment uses 1×1 convolutions, transforming each pixel to adjust the overall image. The adjustment block (1×1 convolution layer + instance normalization layer + 1×1 convolution layer) adaptively determines, on a per-group basis, which groups should participate in the enhancement. The adjusted results of the 4 groups are then merged by a concatenation operation. Finally, the global color rendering module applies a 1×1 convolution to interactively process the feature information.
Preferably, the specific method of step S3 is as follows: during training, the output of the overall network is supervised, and the GT (ground-truth) image is cropped so that it has the same size as the input data; the training process of the network is judged by comparing the difference between the predicted image and the GT image and observing whether the sum of the loss values converges.
The present invention has the following features and beneficial effects:

With the above technical solution, the detail recovery module and the color balance module are used to solve the problems of blurred details and color distortion in underwater images, respectively. In addition, contextual attention is introduced to reuse the low-level features and strengthen the detail reconstruction capability of the network, and an attention-based fusion mechanism is incorporated to adaptively select the most representative features and merge the results from the two processing streams. The global color rendering module performs post-processing to enhance the image, improve the color perception of the network, and further fine-tune the colors. The flexible network structure, together with a reasonable loss function setting, yields cleaner, higher-quality underwater images. Compared with other underwater image enhancement methods, the present invention handles object details more finely and colors more appropriately, producing better results. Moreover, the constructed model is easy for engineering practitioners to understand, facilitating faster and better engineering deployment.
Brief Description of the Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings required in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention, and for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
Figure 1 is a diagram of the network method according to an embodiment of the present invention;
Figure 2 shows the structure of the detail recovery module;
Figure 3 shows the structure of the color balance module;
Figure 4 shows the structure of the contextual attention;
Figure 5 shows the structure of the attention feature fusion module;
Figure 6 shows the structure of the global color rendering module;
Figure 7 shows the test results of the present invention compared with the original images and the GT images.
Detailed Description of the Embodiments

It should be noted that the embodiments of the present invention and the features in the embodiments may be combined with each other as long as there is no conflict.
In the description of the present invention, it should be understood that the orientations or positional relationships indicated by the terms "center", "longitudinal", "lateral", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", etc. are based on the orientations or positional relationships shown in the drawings; they are intended only to facilitate and simplify the description of the present invention, and do not indicate or imply that the device or element referred to must have a specific orientation or be constructed and operated in a specific orientation, and therefore they should not be construed as limiting the present invention.

In the description of the present invention, it should also be noted that, unless otherwise expressly specified and limited, the terms "installed", "connected", and "connection" should be understood in a broad sense; for example, a connection may be a fixed connection, a detachable connection, or an integral connection; it may be a mechanical connection or an electrical connection; it may be a direct connection or an indirect connection through an intermediate medium, or an internal communication between two elements. For those of ordinary skill in the art, the specific meanings of the above terms in the present invention can be understood according to the specific circumstances.
The present invention provides an underwater image enhancement method based on a cascaded adaptive enhancement network, as shown in Figure 1, comprising the following steps:

S1. Acquire and preprocess an underwater image dataset containing distorted underwater images.

S2. Construct an underwater image enhancement model based on the cascaded adaptive network, the model including a detail recovery module, a color balance module, an attention feature fusion module, a contextual attention module, and a global color rendering module.
Specifically, the underwater image enhancement model comprises a coarse feature recovery stage, a feature aggregation stage, and a refinement stage. In the coarse feature recovery stage, the image enhancement process is divided into two tasks: restoring the colors and restoring the details of the degraded image. In the subsequent feature aggregation stage, an attention-based fusion module is introduced to combine the results of these two tasks and to emphasize severely degraded regions through feature attention. Finally, to further improve the performance of the model, the refinement stage continues to adjust the overall color of the image to ensure a more visually appealing result.
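As an illustration of how the three stages are connected, the following minimal Python sketch wires the five sub-networks together; the function and argument names are hypothetical, and each argument is assumed to be a callable sub-network:

```python
def enhance(x, detail_net, color_net, context_attn, fusion, renderer):
    """Cascaded forward pass; every argument except x is an assumed callable sub-network."""
    detail_out, low_feat = detail_net(x)         # coarse stage, task 1: detail restoration
    color_out = color_net(x)                     # coarse stage, task 2: color restoration
    attn = context_attn(low_feat)                # emphasize severely degraded regions
    fused = fusion(detail_out, color_out, attn)  # feature aggregation stage
    return renderer(fused)                       # refinement stage: global color rendering
```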
(1) Restoring the detail components of the distorted image through the detail recovery module, as shown in Figure 2.

The detail recovery module adopts an encoder-decoder network structure, and a detail guidance module is added to each decoder layer to effectively assist the high-level features in restoring image details. The network encodes and decodes the image with convolution units (3×3 convolution layer + instance normalization layer + activation layer) and uses three residual blocks as the bottleneck to increase the capacity of the network. The encoding part captures low-level features, while the decoding part reconstructs high-level features. The low-level features from the encoder and the high-level features from the decoder are fed into the detail guidance module, which first applies global average pooling to the low-level features to convert them into a global information representation. Combining this global information with the high-level features introduces a broader context that guides the high-level features and allows them to accurately select the details that need enhancement. At the same time, a 3×3 convolution is applied to the high-level features to maintain channel consistency; the global context information generated from the low-level features passes through a 1×1 convolution, is multiplied with the convolved high-level features, and is then added to the convolved high-level features, so that the low-level features effectively guide the high-level features. In addition, the network takes the low-level features as the input of the subsequent contextual attention.
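A minimal PyTorch sketch of the detail guidance step is given below; the class name and channel arguments are assumptions, and only the pooling, 1×1 convolution, multiplication, and addition path described above is shown:

```python
import torch.nn as nn
import torch.nn.functional as F

class DetailGuidanceModule(nn.Module):
    """Sketch: low-level encoder features guide high-level decoder features."""
    def __init__(self, low_ch, high_ch):
        super().__init__()
        self.proj = nn.Conv2d(low_ch, high_ch, kernel_size=1)                # 1x1 conv on the global information
        self.refine = nn.Conv2d(high_ch, high_ch, kernel_size=3, padding=1)  # 3x3 conv keeps channel consistency

    def forward(self, low, high):
        g = F.adaptive_avg_pool2d(low, 1)   # global average pooling -> global information representation
        g = self.proj(g)                    # match the high-level channel count
        h = self.refine(high)               # convolved high-level features
        return h * g + h                    # multiply by the global information, then add back
```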
It can be understood that an encoder composed of convolution units takes the preprocessed distorted underwater image as input and extracts 4 levels of features; adjacent levels are connected by max-pooling downsampling, the output sizes are 1/1, 1/1, 1/2, and 1/4 of the original image, and the channel numbers are 16, 64, 128, and 256, respectively.
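The encoder with the above channel configuration can be sketched as follows; the exact placement of the pooling layers and the activation choice are assumptions:

```python
import torch.nn as nn

def conv_unit(in_ch, out_ch):
    # convolution layer + instance normalization layer + activation layer
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.InstanceNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class Encoder(nn.Module):
    """Four feature levels with channels 16, 64, 128, 256 and sizes 1/1, 1/1, 1/2, 1/4."""
    def __init__(self):
        super().__init__()
        chs = [3, 16, 64, 128, 256]
        self.stages = nn.ModuleList(conv_unit(chs[i], chs[i + 1]) for i in range(4))
        self.pool = nn.MaxPool2d(2)

    def forward(self, x):
        feats = []
        for i, stage in enumerate(self.stages):
            if i >= 2:              # downsample before the third and fourth levels
                x = self.pool(x)
            x = stage(x)
            feats.append(x)
        return feats                # the low-level features are reused by the contextual attention
```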
(2) Removing the color cast in the image through the color balance module, as shown in Figure 3. Specifically, the color balance module increases the feature dimensionality by using 3D convolution; the added dimension represents the depth information of the RGB channels, and the 3D convolution is used for channel balancing. First, the image is split into 3 separate channels, and the weight-sharing property of 3D convolution is used to extract a consistent feature distribution from the different channels, with 16 feature maps extracted from each channel. A transpose operation then concatenates the feature maps from the different channels and divides them into 16 groups of 3-channel feature maps. The feature maps are further processed by 3D convolution, and the multiple feature maps of the different groups are compressed along the depth dimension, finally producing 16 color-balanced images with distinctive feature information.
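A compact sketch of this 3D-convolution color balance branch is shown below; the kernel sizes are assumptions, and the transpose/grouping step is realized implicitly by treating the RGB channels as the depth dimension of a 3D volume:

```python
import torch.nn as nn

class ColorBalanceModule(nn.Module):
    """Sketch: 3D convolutions with weights shared across the R, G and B channels."""
    def __init__(self):
        super().__init__()
        # 16 feature maps per color channel, extracted with weights shared over the depth (RGB) axis
        self.extract = nn.Conv3d(1, 16, kernel_size=(1, 3, 3), padding=(0, 1, 1))
        # compress the depth (RGB) dimension, yielding 16 color-balanced maps
        self.compress = nn.Conv3d(16, 16, kernel_size=(3, 3, 3), padding=(0, 1, 1))

    def forward(self, x):           # x: B x 3 x H x W
        v = x.unsqueeze(1)          # B x 1 x 3 x H x W: RGB becomes the depth dimension
        f = self.extract(v)         # B x 16 x 3 x H x W: 16 maps for each color channel
        f = self.compress(f)        # B x 16 x 1 x H x W: groups compressed along the depth
        return f.squeeze(2)         # B x 16 x H x W
```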
(3) Focusing on severely degraded regions through the contextual attention, as shown in Figure 4.

Further, the contextual attention further processes the low-level features from the detail recovery module. The attention structure adopts a hybrid design that creates global and local branches to preserve the global and local aspects of perception, and finally the global and local context information is summed. The global branch includes: (a) a context modeling module, in which one path uses reshape to change the feature dimensions while another path also uses reshape to change the feature dimensions and then passes through a softmax function; the product of the two yields the attention weights of all positions; (b) a feature transformation module that captures inter-channel dependencies by passing the output of the context modeling module through a 1×1 convolution, an activation, and another 1×1 convolution. The local branch uses two parallel 1×1 convolution layers to extract local context information, and one of the paths performs global average pooling before the convolution operation to incorporate some global information. Finally, a summation operation merges the features extracted from the two branches. The overall process of the contextual attention can be summarized as

CA(x) = Conv2(ReLU(Conv1(Σi αi·xi))) + Conv2(BN(Conv1(x))) + Conv2(BN(Conv1(Avgpool(x))))

where x is the input feature and αi are the weights of the global attention pooling; Σi αi·xi denotes the context modeling module, which combines the features of all positions through a weighted average with the weights αi to obtain the global context feature; Conv2(ReLU(Conv1(·))) denotes the feature transformation that captures the channel dependencies; and Conv2(BN(Conv1(·))) + Conv2(BN(Conv1(Avgpool(·)))) denotes the part of the contextual attention that extracts local context information. Conv(·) denotes the convolution part, ReLU(·) the activation part, BN(·) the batch normalization part, and Avgpool(·) the global average pooling.
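The hybrid global/local attention can be sketched as follows; the channel reduction ratio and the internal layout of the two local paths are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContextualAttention(nn.Module):
    """Sketch of the hybrid contextual attention: global context modeling plus a local branch."""
    def __init__(self, ch, reduction=4):
        super().__init__()
        self.attn = nn.Conv2d(ch, 1, kernel_size=1)          # per-position attention logits
        self.transform = nn.Sequential(                       # Conv2(ReLU(Conv1(.)))
            nn.Conv2d(ch, ch // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch // reduction, ch, 1),
        )
        self.local = nn.Sequential(nn.Conv2d(ch, ch, 1), nn.BatchNorm2d(ch), nn.Conv2d(ch, ch, 1))
        self.pooled = nn.Sequential(nn.Conv2d(ch, ch, 1), nn.BatchNorm2d(ch), nn.Conv2d(ch, ch, 1))

    def forward(self, x):
        b, c, h, w = x.shape
        alpha = torch.softmax(self.attn(x).view(b, 1, h * w), dim=-1)    # attention weights of all positions
        context = torch.bmm(x.view(b, c, h * w), alpha.transpose(1, 2))  # weighted average over positions
        global_ctx = self.transform(context.view(b, c, 1, 1))            # captures channel dependencies
        local_ctx = self.local(x) + self.pooled(F.adaptive_avg_pool2d(x, 1))
        return global_ctx + local_ctx                                    # sum of global and local context
```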
(4) The attention feature fusion module is used to fuse the image detail and color balance results, as shown in Figure 5. The attention feature fusion module introduces contextual attention to guide the fusion process, so that the network can attend to global and local information at the same time, and it adaptively fuses the two latent results generated by the detail recovery module and the color balance module. Specifically, the results generated by the detail recovery module and the color balance module and the output of the contextual attention are first summed; the sum is passed through two parallel 1×1 convolution structures, one of which performs global average pooling before the convolution operation; the two outputs are added and passed through an activation function to obtain the weights corresponding to the detail recovery module and the color balance module; the results generated by the detail recovery module and the color balance module are then multiplied by these weights and summed to obtain the final fused result. Therefore, the attention feature fusion can be expressed as:
weight = Conv2(BN(Conv1(x))) + Conv2(BN(Conv1(Avgpool(x))))
where + denotes element-wise summation and ⊗ denotes element-wise multiplication; x denotes the input feature (the sum of a, b, and c), a denotes the output of the detail recovery module, b denotes the output of the color balance module, and c denotes the output of the contextual attention. Conv(·) denotes the convolution part, ReLU(·) the activation part, BN(·) the batch normalization part, and Avgpool(·) the global average pooling.
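A possible realization of this fusion is sketched below; the sigmoid activation and the complementary (1 − weight) split between the two branches are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionFeatureFusion(nn.Module):
    """Sketch: fuse detail-recovery (a) and color-balance (b) results, guided by contextual attention (c)."""
    def __init__(self, ch):
        super().__init__()
        self.local = nn.Sequential(nn.Conv2d(ch, ch, 1), nn.BatchNorm2d(ch), nn.Conv2d(ch, ch, 1))
        self.glob = nn.Sequential(nn.Conv2d(ch, ch, 1), nn.BatchNorm2d(ch), nn.Conv2d(ch, ch, 1))

    def forward(self, a, b, c):
        x = a + b + c                                                # element-wise sum of the three inputs
        w = torch.sigmoid(self.local(x) + self.glob(F.adaptive_avg_pool2d(x, 1)))  # fusion weights (assumed sigmoid)
        return w * a + (1 - w) * b                                   # weighted sum of the two coarse results
```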
(5) Performing overall adjustment of the image through the global color rendering module, as shown in Figure 6.

Specifically, the global color rendering module divides the channels into 4 groups, three of which are adjusted by adjustment blocks, while the remaining group is not adjusted so as to retain the original image information. The entire adjustment uses 1×1 convolutions, transforming each pixel to adjust the overall image. The adjustment block (1×1 convolution layer + instance normalization layer + 1×1 convolution layer) adaptively determines, on a per-group basis, which groups should participate in the enhancement. The adjusted results of the 4 groups are then merged by a concat operation. Finally, the global color rendering module applies a 1×1 convolution to interactively process the feature information.
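The grouped adjustment can be sketched as follows; the 16-channel input and the 3-channel output of the final 1×1 convolution are assumptions:

```python
import torch
import torch.nn as nn

class GlobalColorRendering(nn.Module):
    """Sketch: three channel groups pass through adjustment blocks, the fourth is kept unchanged."""
    def __init__(self, ch=16):
        super().__init__()
        g = ch // 4
        self.adjust = nn.ModuleList(
            nn.Sequential(nn.Conv2d(g, g, 1), nn.InstanceNorm2d(g), nn.Conv2d(g, g, 1))
            for _ in range(3)
        )
        self.mix = nn.Conv2d(ch, 3, kernel_size=1)    # final 1x1 conv interacts across the merged features

    def forward(self, x):
        g1, g2, g3, g4 = torch.chunk(x, 4, dim=1)     # split the channels into 4 groups
        adjusted = [blk(g) for blk, g in zip(self.adjust, (g1, g2, g3))]
        out = torch.cat(adjusted + [g4], dim=1)       # concatenate adjusted groups with the untouched group
        return self.mix(out)
```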
S3. Train the constructed underwater image enhancement network model based on the cascaded adaptive network.

Specifically, during training, the output of the overall network is supervised, and the GT image is cropped so that it has the same size as the input data; the training process of the network is judged by comparing the difference between the predicted image and the GT image and observing whether the sum of the loss values converges.
The input data are uniformly resized to 256×256×3, the batch size is set to 16, the Adam optimizer is used to update the model parameters during training, and the initial learning rate is set to 1e-4.
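A minimal training-loop sketch with these settings is shown below; model, train_set, total_loss, and the number of epochs are assumed to be defined elsewhere:

```python
import torch
from torch.utils.data import DataLoader

num_epochs = 100                                              # assumed value
loader = DataLoader(train_set, batch_size=16, shuffle=True)   # inputs already resized to 256x256x3
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)     # Adam with initial learning rate 1e-4

for epoch in range(num_epochs):
    for distorted, gt in loader:
        pred = model(distorted)
        loss = total_loss(pred, gt)                           # sum of the L1, SSIM and perceptual losses
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```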
In the above technical solution, an L1 loss, an SSIM loss, and a perceptual loss are used in combination. The L1 loss computes the L1 distance between the predicted image and the GT. The SSIM loss imposes structural and texture similarity on the predicted image; for a pixel x it is defined as

SSIM(x) = ((2·μI·μĨ + C1)·(2·σIĨ + C2)) / ((μI² + μĨ² + C1)·(σI² + σĨ² + C2))

where the computation of SSIM(x) uses the neighborhood (patch) around pixel x; μI and σI are the mean and standard deviation of the patch of the predicted image, μĨ and σĨ are the mean and standard deviation of the corresponding GT patch, and σIĨ is their cross-covariance. An 11×11 patch around pixel x is used for the computation, with C1 = 0.01 and C2 = 0.03.
Setting E(x) = 1 − SSIM(x), the SSIM loss function can be written as

LSSIM = (1/N)·Σx E(x) = (1/N)·Σx (1 − SSIM(x)),

where the sum runs over all pixels x and N is the number of pixels.
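For illustration, the SSIM loss can be implemented as in the sketch below, which uses a uniform 11×11 window rather than the Gaussian window that is more common in SSIM implementations, with C1 and C2 taken from the text:

```python
import torch
import torch.nn.functional as F

def ssim_loss(pred, gt, window=11, C1=0.01, C2=0.03):
    """Mean of E(x) = 1 - SSIM(x) over all pixels, computed on 11x11 patches."""
    pad, ch = window // 2, pred.size(1)
    kernel = torch.ones(ch, 1, window, window, device=pred.device) / (window * window)
    mu_p = F.conv2d(pred, kernel, padding=pad, groups=ch)
    mu_g = F.conv2d(gt, kernel, padding=pad, groups=ch)
    var_p = F.conv2d(pred * pred, kernel, padding=pad, groups=ch) - mu_p ** 2
    var_g = F.conv2d(gt * gt, kernel, padding=pad, groups=ch) - mu_g ** 2
    cov = F.conv2d(pred * gt, kernel, padding=pad, groups=ch) - mu_p * mu_g
    ssim = ((2 * mu_p * mu_g + C1) * (2 * cov + C2)) / ((mu_p ** 2 + mu_g ** 2 + C1) * (var_p + var_g + C2))
    return (1 - ssim).mean()
```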
The perceptual loss is expressed as the distance between the feature representations of the predicted image and the ground-truth image, measuring the high-level perceptual and semantic differences between the images:

Lper = (1/N)·Σi Σj (1/(Cj·Hj·Wj))·‖φj(Ii) − φj(Ĩi)‖
where φj denotes the j-th convolution layer of the pre-trained VGG-19, Ii and Ĩi are the i-th predicted image and GT image in the batch, N is the number of images in each batch during training, and Cj·Hj·Wj denotes the dimensions of the j-th feature map of the VGG-19 network, with Cj, Hj, and Wj denoting the number, height, and width of the feature maps, respectively.
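The perceptual term and the combined objective can be sketched as follows; the selected VGG-19 layer indices and the use of a mean absolute feature difference are assumptions, and ssim_loss refers to the sketch above:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import vgg19

class PerceptualLoss(nn.Module):
    """Feature distance between the prediction and the GT in pre-trained VGG-19 feature space."""
    def __init__(self, layers=(3, 8, 17, 26)):            # assumed layer indices
        super().__init__()
        self.vgg = vgg19(pretrained=True).features.eval()
        for p in self.vgg.parameters():
            p.requires_grad = False
        self.layers = set(layers)

    def forward(self, pred, gt):
        loss, x, y = 0.0, pred, gt
        for j, layer in enumerate(self.vgg):
            x, y = layer(x), layer(y)
            if j in self.layers:
                loss = loss + torch.abs(x - y).mean()      # mean covers the 1/(Cj*Hj*Wj) and 1/N factors
        return loss

perceptual = PerceptualLoss()

def total_loss(pred, gt):
    # Sum of the three supervision terms; its convergence is monitored during training.
    return F.l1_loss(pred, gt) + ssim_loss(pred, gt) + perceptual(pred, gt)
```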
During training, the training process of the network is judged by observing whether the sum of the three loss values converges; if the sum converges, the training of the network is complete.
With the above technical solution, the detail recovery module and the color balance module are used to solve the problems of blurred details and color distortion in underwater images, respectively. In addition, contextual attention is introduced to reuse the low-level features and strengthen the detail reconstruction capability of the network, and an attention-based fusion mechanism is incorporated to adaptively select the most representative features and merge the results from the two processing streams. The global color rendering module performs post-processing to enhance the image, improve the color perception of the network, and further fine-tune the colors. The flexible network structure, together with a reasonable loss function setting, yields cleaner, higher-quality underwater images. As shown in Figure 7, compared with other underwater image enhancement methods, the present invention handles object details more finely and colors more appropriately, producing better results. Moreover, the constructed model is easy for engineering practitioners to understand, facilitating faster and better engineering deployment. Compared with the prior art, the structure is simple, and the method noticeably improves the fineness of detail processing and the color.
In addition, since the method can effectively remove color casts and improve image clarity, it has broad application value in underwater archaeology, underwater target detection, seabed exploration, and other applications.

The embodiments of the present invention have been described in detail above with reference to the accompanying drawings, but the present invention is not limited to the described embodiments. For those skilled in the art, various changes, modifications, substitutions, and variations made to these embodiments, including their components, without departing from the principle and spirit of the present invention still fall within the protection scope of the present invention.