Disclosure of Invention
The invention provides a satellite cloud image multi-label hash retrieval method based on an interpretable deep network. The extracted fusion feature contains the rich semantic information of the target cloud image, so the target hash code subsequently generated on this basis also carries rich semantics and is interpretable, which improves the reliability of the whole interpretable deep network and enables accurate retrieval of satellite cloud images.
In order to solve the above problems, the invention provides a satellite cloud image multi-label hash retrieval method based on an interpretable deep network, which comprises the following steps:
inputting the obtained target cloud image as an input item into a pre-trained interpretable deep network to obtain a target hash code of the target cloud image, wherein the interpretable deep network comprises a feature learning module for generating an interpretable fusion feature of the target cloud image and a hash learning module for generating the target hash code; the feature learning module comprises a global feature learning module and a local feature learning module, the global feature learning module being a unit that learns a single global semantic feature characterizing the target cloud image, and the local feature learning module being a unit that learns the local semantic features of a plurality of cloud types in different areas of the target cloud image; the output of the global feature learning module and the output of the local feature learning module are combined to obtain the fusion feature;
and carrying out similarity measurement between the target hash code and the hash code of each historical cloud image in the historical cloud image database, so as to retrieve the historical cloud images similar to the target cloud image.
The beneficial effects of the method are that the extracted fusion feature contains the rich semantic information of the target cloud image, and the target hash code subsequently generated on this basis is likewise rich in semantics and interpretable, so that the reliability of the whole interpretable deep network is improved and accurate retrieval of satellite cloud images is realized, which greatly helps meteorological workers carry out further research and benefits practical application.
Further, the hash learning module in the interpretable deep network generates the target hash code according to the fusion feature, including:
inputting the fusion feature as an input item into the pre-trained hash learning module to obtain a hash-like code corresponding to the fusion feature;
determining the target hash code of the target cloud image according to the hash-like code and a first preset relation;
the first preset relation is:
b = sign(d)
wherein d is the hash-like code and b is the target hash code.
In this scheme, the fusion feature contains rich semantic information, so the generated target hash code also contains rich semantic information; this makes the interpretable deep network interpretable and improves the reliability of the network's output results.
Further, the step of generating the fusion feature of the target cloud image by the feature learning module in the interpretable deep network includes:
inputting the target cloud image into a backbone network to obtain a deep feature map;
determining a global feature characterizing the global semantic information of the target cloud image according to the deep feature map and a preset global feature extraction strategy;
determining local features characterizing the local semantic information of a plurality of cloud types in different areas of the target cloud image based on the deep feature map and a preset local feature extraction strategy;
and splicing the global features and the local features to obtain fusion features.
Further, determining the global feature characterizing the global semantic information of the target cloud image according to the deep feature map and the preset global feature extraction strategy includes:
carrying out an average pooling operation on the deep feature map;
and inputting the result of the average pooling operation into a fully connected layer for feature dimension reduction, so as to obtain the global feature characterizing the global semantic information of the target cloud image.
In this scheme, the global feature is determined simply, accurately and reliably through the average pooling operation and the feature dimension reduction of the fully connected layer, which facilitates the determination of the subsequent fusion feature.
Further, determining the local features characterizing the local semantic information of a plurality of cloud types in different areas of the target cloud image based on the deep feature map and the preset local feature extraction strategy includes:
convolving the deep feature map to obtain a first sub-graph;
processing the first sub-graph according to a preset attention strategy to obtain a first attention map characterizing the mining result of the most distinctive region;
generating a first local feature map according to the first attention map and the first sub-graph;
and determining a first local feature characterizing the local semantic information of the cloud type under the first local feature map, so as to splice the global feature and the first local feature and thereby obtain the fusion feature corresponding to the target cloud image.
In this scheme, the design of the preset local feature extraction strategy ensures that the most distinctive region can be mined, so that the first local feature of this region is extracted and added into the fusion feature, thereby enriching the semantics of the fusion feature.
Further, processing the first sub-graph according to a preset attention policy to obtain a first attention map representing a mining result of the most distinctive region, including:
activating the first sub-graph to determine its corresponding activation feature map;
and performing a convolution operation and an aggregation normalization operation on the activation feature map in sequence, then processing the result with a Sigmoid activation function to obtain the first attention map.
In this scheme, the design of the preset attention strategy takes into account the attention mechanism, and can focus on the most distinctive area.
Further, after processing the first sub-graph according to a preset attention policy to obtain a first attention map representing a mining result of the most distinctive region, the method further includes:
S21, let o = 1;
S22, determining second pixel points based on the first attention map corresponding to the first sub-graph and a second preset relation, so as to obtain a suppression map composed of the second pixel points;
the second preset relation is:
wherein μk′ is the pixel value of the k-th second pixel point in the suppression map, μk is the pixel value of the k-th pixel point in the first attention map, a is a preset hyperparameter, μmean is the mean of the pixel values of all pixel points in the first attention map, and μstd is the standard deviation of the pixel values of all pixel points in the first attention map;
S23, determining an o-th second sub-graph according to the suppression map and the first sub-graph;
S24, processing the o-th second sub-graph according to the preset attention strategy to obtain an o-th second attention map, so as to generate an o-th second local feature map according to the o-th second attention map and the o-th second sub-graph, and further determining an o-th second local feature corresponding to the o-th second local feature map;
S25, let o = o + 1;
S26, judging whether o reaches a preset number of branches, the preset number of branches being an integer not less than 2; if not, entering S27, and if so, entering S29;
S27, determining new second pixel points according to the (o-1)-th second attention map and the second preset relation, so as to obtain a new suppression map corresponding to the new second pixel points;
S28, generating an o-th second sub-graph according to the new suppression map and the (o-1)-th second sub-graph, and returning to S24;
and S29, splicing the global feature, the first local feature and the obtained total of o-1 second local features to obtain the fusion feature corresponding to the target cloud image.
In this scheme, the mining of the remaining specific object regions beyond the most distinctive region is completed by generating a plurality of second local features, so that the resulting fusion feature carries rich semantic information, the final retrieval result gains a degree of interpretability, and the complex semantic content of the satellite cloud image is better described.
Further, the interpretable deep network is trained in advance based on a cloud image training set; the cloud image training set comprises N training cloud images, each training cloud image corresponds to at least one real image classification label, the total number of categories of the real image classification labels corresponding to all the training cloud images is C, and N and C are integers not less than 1. The training step of the interpretable deep network comprises:
inputting the training cloud images into the interpretable deep network to be trained to obtain hash-like code features;
inputting the hash-like code features into a classification layer to obtain corresponding image classification label predicted values;
and training the interpretable deep network by using the hash-like code features, the image classification label predicted values and a preset loss function;
The preset loss function is as follows:
Ltotal = λLcls + ηLq + νLb
wherein Ltotal is the training error, Lcls is the multi-label classification loss with first penalty parameter λ, Lq is the quantization loss with second penalty parameter η, and Lb is the bit balance loss with third penalty parameter ν;
wherein Lcls is determined based on a third preset relation, the third preset relation being:
wherein ŷi^n denotes the i-th predicted value among the image classification label predicted values corresponding to the n-th training cloud image, ωi is a weight value, yi^n is the i-th real value in the real image classification label corresponding to the n-th training cloud image, and σ is the Sigmoid activation function;
the Lq is determined based on a fourth preset relation, the fourth preset relation being:
wherein K is the preset hash code length, dn is the hash-like code feature corresponding to the n-th training cloud image, and e is a K-dimensional vector whose entries are all 1;
the Lb is determined based on a fifth preset relation, the fifth preset relation being:
wherein mean(dn) represents the mean value of dn.
In the scheme, multi-label supervision is introduced into the interpretable deep network, so that more compact hash codes can be generated, and the similarity retrieval efficiency is improved.
Detailed Description
In order that the above objects, features and advantages of the invention will be readily understood, a more particular description of the invention will be rendered by reference to specific embodiments thereof which are illustrated in the appended drawings.
Referring to fig. 1, fig. 1 is a flowchart of a method for searching multi-tag hash of satellite cloud images of an interpretable deep network according to the present invention.
The satellite cloud image multi-tag hash retrieval method of the interpretable deep network comprises the following steps:
S11, inputting the obtained target cloud image as an input item into a pre-trained interpretable deep network to obtain a target hash code of the target cloud image, wherein the interpretable deep network comprises a feature learning module for generating an interpretable fusion feature of the target cloud image and a hash learning module for generating the target hash code; the feature learning module comprises a global feature learning module and a local feature learning module, the global feature learning module being a unit that learns a single global semantic feature characterizing the target cloud image, and the local feature learning module being a unit that learns the local semantic features of a plurality of cloud types in different areas of the target cloud image; the output of the global feature learning module and the output of the local feature learning module are spliced to obtain the fusion feature;
and S12, performing similarity measurement between the target hash code and the hash code of each historical cloud image in the historical cloud image database, so as to retrieve the historical cloud images similar to the target cloud image.
Specifically, considering that different areas of a satellite cloud image often contain different cloud types, an interpretable deep network is designed. The network is trained in advance on a multi-label cloud image training set, overcoming the drawback that rich semantic information in the cloud image is ignored when the network is trained with only a single semantic label. The feature learning module of the interpretable deep network extracts a global feature and a plurality of local features to obtain the fusion feature; an attention branch network and a suppression module are applied in the extraction of each local feature, and the layer-by-layer suppression helps find a plurality of different complementary regions. The hash learning module of the interpretable deep network introduces a multi-label supervision mechanism, so the compact target hash code it generates can represent the content of the target cloud image with rich semantic information while reducing time and space costs. Similarity retrieval is then performed by computing the Hamming distance between the target hash code and the historical hash codes, which reduces to simple XOR operations and improves retrieval efficiency.
In addition, the method provided by the application has been extensively tested on the public satellite cloud image dataset LSCIDMR-V2; the mAP on LSCIDMR-V2 reaches 92.27%, showing that the method has excellent performance.
In summary, the application provides a satellite cloud image multi-label hash retrieval method based on an interpretable deep network. The extracted fusion feature contains the rich semantic information of the target cloud image, and the target hash code subsequently generated on this basis also carries rich semantics and is interpretable, so the reliability of the whole interpretable deep network is improved, meteorological workers can trust the retrieval results, accurate retrieval of satellite cloud images is realized, and practical application is facilitated.
As a preferred embodiment, the hash learning module in the interpretable deep network generates a target hash code from the fusion feature, including:
inputting the fusion feature as an input item into the pre-trained hash learning module to obtain a hash-like code corresponding to the fusion feature;
determining the target hash code of the target cloud image according to the hash-like code and the first preset relation;
the first preset relation is:
b = sign(d)
wherein d is the hash-like code and b is the target hash code.
Specifically, the compact target hash code can be generated for subsequent retrieval, and the target hash code also comprises rich semantic information, so that the interpretable deep network has interpretability, and the reliability of the output result of the network is improved.
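A minimal sketch of this quantization step in Python (assuming the standard sign(·) binarization commonly used for hash quantization; the code values below are toy illustrations, not data from the application):

```python
import numpy as np

def to_hash_code(d):
    """Binarize a real-valued hash-like code d into a {-1, +1} target hash code.

    Assumption: quantization is done with the standard sign() mapping,
    consistent with a quantization loss that pulls |d| toward 1.
    """
    d = np.asarray(d, dtype=np.float64)
    return np.where(d >= 0, 1.0, -1.0)

# Toy 16-bit hash-like code as the hash learning module might output it.
d = np.array([0.9, -0.7, 0.2, -0.1] * 4)
b = to_hash_code(d)
```

Because b is binary, comparing two such codes later reduces to cheap bitwise operations.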
As a preferred embodiment, the step of generating the fusion feature of the target cloud image by the feature learning module in the interpretable deep network includes:
inputting the target cloud image into a backbone network to obtain a deep feature map;
determining a global feature characterizing the global semantic information of the target cloud image according to the deep feature map and a preset global feature extraction strategy;
determining local features characterizing the local semantic information of a plurality of cloud types in different areas of the target cloud image based on the deep feature map and a preset local feature extraction strategy;
and splicing the global features and the local features to obtain fusion features.
Specifically, the backbone network includes, but is not limited to, ResNet; extracting features of the target cloud image with ResNet yields the deep feature map. Referring to fig. 3, the module generating the global feature GF is denoted the global feature learning module in fig. 3, and the modules generating the local features (including the first local feature LF1, the first second local feature LF2 and the second second local feature LF3) are denoted the local feature learning module.
As a preferred embodiment, determining the global feature characterizing the global semantic information of the target cloud image according to the deep feature map and the preset global feature extraction strategy includes:
carrying out an average pooling operation on the deep feature map;
and inputting the result of the average pooling operation into a fully connected layer for feature dimension reduction, so as to obtain the global feature characterizing the global semantic information of the target cloud image.
Specifically, the deep feature map passes through the average pooling operation and the fully connected layer in sequence, so the global feature is determined simply, accurately and reliably, which facilitates the determination of the subsequent fusion feature.
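The two steps above can be sketched with toy shapes (the 512-channel feature map and the 512 → 128 fully connected reduction are illustrative assumptions, not values from the application):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy deep feature map from the backbone: (channels, height, width).
feature_map = rng.standard_normal((512, 7, 7))

# Step 1: global average pooling over the spatial dimensions.
pooled = feature_map.mean(axis=(1, 2))      # shape (512,)

# Step 2: fully connected layer for feature dimension reduction
# (hypothetical 512 -> 128 weight matrix; real weights are learned).
W = rng.standard_normal((128, 512)) * 0.01
global_feature = W @ pooled                 # shape (128,)
```

The pooled vector summarizes the whole image, which is why this branch captures a single global semantic feature.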
As a preferred embodiment, determining the local features characterizing the local semantic information of a plurality of cloud types in different areas of the target cloud image based on the deep feature map and the preset local feature extraction strategy includes:
convolving the deep feature map to obtain a first sub-graph F1;
Processing the first sub-graph F1 according to a preset attention strategy to obtain a first attention map M1 representing the mining result of the most distinctive region;
generating a first local feature map LFM1 according to the first attention map M1 and the first sub-graph F1;
and determining a first local feature of the local semantic information representing the cloud type under the first local feature map LFM1 so as to splice the global feature and the first local feature and further obtain a fusion feature corresponding to the target cloud map.
Specifically, considering that a satellite cloud image often contains a plurality of cloud types, the features of these local areas need to be captured to generate more meaningful fusion features and hash codes. The preset local feature extraction strategy is therefore designed with an attention mechanism, which focuses more on the foreground of the image and highlights its salient parts, so that the features of the local areas are captured better.
To elaborate, the deep feature map is convolved with a 1×1 convolution to obtain the first sub-graph F1. Referring to fig. 2, fig. 2 is a schematic implementation diagram of the preset local feature extraction strategy of the present invention: the first sub-graph F1 is processed according to the preset attention strategy to obtain the first attention map M1, and an element-wise Hadamard product of the first sub-graph F1 and the first attention map M1 yields the first local feature map LFM1. The process of generating the first attention map M1 is denoted the attention mechanism in fig. 3.
The first local feature characterizing the local semantic information of the cloud type under the first local feature map LFM1 may be determined by carrying out an average pooling operation on the first local feature map LFM1 and inputting the result into a fully connected layer for feature dimension reduction, as shown in fig. 3.
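A shape-level sketch of this first local branch (all dimensions, and the 256 → 128 fully connected layer, are hypothetical stand-ins):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy first sub-graph F1 (channels, H, W) and first attention map M1 (H, W)
# with values in (0, 1), as a Sigmoid-based attention branch would produce.
F1 = rng.standard_normal((256, 7, 7))
M1 = 1.0 / (1.0 + np.exp(-rng.standard_normal((7, 7))))

# Element-wise (Hadamard) product; M1 broadcasts over the channel axis.
LFM1 = F1 * M1                               # first local feature map

# Average pooling + a hypothetical 256 -> 128 fully connected layer,
# mirroring the global branch, yields the first local feature LF1.
W_local = rng.standard_normal((128, 256)) * 0.01
LF1 = W_local @ LFM1.mean(axis=(1, 2))

# The fusion feature is the concatenation of global and local features
# (GF here is a random stand-in for the global branch output).
GF = rng.standard_normal(128)
fusion = np.concatenate([GF, LF1])
```

The Hadamard product reweights every spatial position of F1 by its attention value, which is what lets the branch focus on one distinctive region.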
As a preferred embodiment, processing the first sub-graph according to the preset attention strategy to obtain the first attention map characterizing the mining result of the most distinctive region includes:
activating the first sub-graph F1 to determine its corresponding activation feature map Fm;
and performing a convolution operation and an aggregation normalization operation on the activation feature map Fm in sequence, then processing the result with a Sigmoid activation function to obtain the first attention map M1.
Specifically, referring to fig. 2, activating the first sub-graph F1 to determine its corresponding activation feature map Fm may proceed as follows: the first sub-graph F1 sequentially passes through a 3×3 convolution layer, a batch normalization layer, a 1×1 convolution layer and another batch normalization layer, and is then activated with the ReLU activation function to obtain the activation feature map Fm; the activation feature map Fm then sequentially passes through a 1×1 convolution layer and a batch normalization layer, and the result is processed with a Sigmoid activation function to obtain the first attention map M1.
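The tail of this attention branch (1×1 convolution, batch normalization, Sigmoid) can be sketched as follows; the channel count is illustrative, batch normalization is reduced to a per-channel standardization, and the 3×3 convolution / ReLU stages that produce Fm are omitted:

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def batch_norm(x, eps=1e-5):
    # Toy inference-style normalization per channel over the spatial axes.
    mean = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def conv1x1(x, w):
    # A 1x1 convolution is a linear map over the channel axis.
    return np.einsum('oc,chw->ohw', w, x)

# Toy activated feature map Fm (channels, H, W), already ReLU-activated.
Fm = np.maximum(rng.standard_normal((64, 7, 7)), 0.0)

# 1x1 conv down to one channel, batch norm, then Sigmoid -> attention map M1.
w = rng.standard_normal((1, 64)) * 0.1
M1 = sigmoid(batch_norm(conv1x1(Fm, w)))[0]   # shape (7, 7), values in (0, 1)
```

The Sigmoid keeps every attention value strictly between 0 and 1, so the subsequent Hadamard product can only attenuate or preserve features, never flip their sign.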
As a preferred embodiment, after processing the first sub-graph according to the preset attention strategy to obtain the first attention map characterizing the mining result of the most distinctive region, the method further comprises:
S21, let o = 1;
S22, determining second pixel points based on the first attention map corresponding to the first sub-graph and a second preset relation, so as to obtain a suppression map composed of the second pixel points;
the second preset relation is:
wherein μk′ is the pixel value of the k-th second pixel point in the suppression map, μk is the pixel value of the k-th pixel point in the first attention map, a is a preset hyperparameter, μmean is the mean of the pixel values of all pixel points in the first attention map, and μstd is the standard deviation of the pixel values of all pixel points in the first attention map;
S23, determining an o-th second sub-graph according to the suppression map and the first sub-graph;
S24, processing the o-th second sub-graph according to the preset attention strategy to obtain an o-th second attention map, so as to generate an o-th second local feature map according to the o-th second attention map and the o-th second sub-graph, and further determining an o-th second local feature corresponding to the o-th second local feature map;
S25, let o = o + 1;
S26, judging whether o reaches a preset number of branches, the preset number of branches being an integer not less than 2; if not, entering S27, and if so, entering S29;
S27, determining new second pixel points according to the (o-1)-th second attention map and the second preset relation, so as to obtain a new suppression map corresponding to the new second pixel points;
S28, generating an o-th second sub-graph according to the new suppression map and the (o-1)-th second sub-graph, and returning to S24;
and S29, splicing the global feature, the first local feature and the obtained total of o-1 second local features to obtain the fusion feature corresponding to the target cloud image.
Specifically, it is further considered that although the most distinctive region can be mined through the first local feature, the remaining specific object regions in the complementary areas of the cloud image are sometimes ignored. That is, a contextual relationship exists between the cloud-like regions of a satellite cloud image, so simply erasing the most distinctive region may harm the mining of the remaining specific object regions. For this reason, the portions of the other specific object regions need to be enhanced while the previously most distinctive region is suppressed; accordingly, after the first attention map is obtained, the plurality of second local features is determined in the manner described above.
The preset hyperparameter a controls the degree to which the salient region is suppressed and the other activated regions are enhanced: the larger a is, the stronger the suppression and enhancement. Through this suppression-enhancement operation, the most distinctive region of the previous stage is partially suppressed, while the other activated complementary regions are correspondingly enhanced, maintaining the relationship between the activation regions of the previous stage and those generated in the subsequent stage.
It should be further noted that the specific value of the preset number of branches is set according to the training effect and is not limited to 3. In S23, the o-th second sub-graph may be determined from the suppression map and the first sub-graph by taking their element-wise Hadamard product. In S24, the o-th second local feature corresponding to the o-th second local feature map may be determined by carrying out an average pooling operation on the o-th second local feature map and inputting the result into a fully connected layer for feature dimension reduction, yielding the o-th second local feature characterizing the local semantic information of the cloud type under the o-th second local feature map.
Referring to fig. 3, where the case of a preset number of branches = 3 is taken as an example, it can be seen that the fusion feature at this point comprises the global feature GF, the first local feature LF1 and a total of 2 second local features (the first second local feature LF2 and the second second local feature LF3); in fig. 3, the process of generating the first second sub-graph F21 is denoted the suppression module.
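A sketch of one suppression-enhancement step. The second preset relation's formula is not reproduced in this text, so the form below, μ′k = 1 − a·(μk − μmean)/μstd, is an assumption chosen to match the description: pixels above the attention mean (the most distinctive region) get a factor below 1, activated pixels below the mean get a factor above 1, and a scales both effects:

```python
import numpy as np

rng = np.random.default_rng(3)

def suppress_enhance(attention, a=0.5):
    """Suppression map from an attention map.

    Assumed form (not from the source text):
        mu'_k = 1 - a * (mu_k - mu_mean) / mu_std
    which damps above-mean pixels and boosts below-mean ones.
    """
    mu_mean = attention.mean()
    mu_std = attention.std()
    return 1.0 - a * (attention - mu_mean) / mu_std

# Toy first attention map (Sigmoid outputs) and first sub-graph.
M1 = 1.0 / (1.0 + np.exp(-rng.standard_normal((7, 7))))
F1 = rng.standard_normal((256, 7, 7))

S1 = suppress_enhance(M1)
# First second sub-graph: Hadamard product of suppression map and F1 (S23).
F2_1 = F1 * S1
```

Repeating this on each stage's attention map (S27–S28) drives every new branch toward a region the previous branches have already down-weighted, which is how the complementary regions are found.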
As a preferred embodiment, the interpretable deep network is trained in advance based on a cloud image training set; the cloud image training set comprises N training cloud images, each training cloud image corresponds to at least one real image classification label, the total number of categories of the real image classification labels corresponding to all the training cloud images is C, and N and C are integers not less than 1. The training step of the interpretable deep network comprises:
inputting the training cloud images into the interpretable deep network to be trained to obtain hash-like code features;
inputting the hash-like code features into a classification layer to obtain corresponding image classification label predicted values;
and training the interpretable deep network by using the hash-like code features, the image classification label predicted values and a preset loss function;
the preset loss function is:
Ltotal=λLcls+ηLq+νLb
wherein Ltotal is the training error, Lcls is the multi-label classification loss with first penalty parameter λ, Lq is the quantization loss with second penalty parameter η, and Lb is the bit balance loss with third penalty parameter ν;
wherein Lcls is determined based on a third preset relation, the third preset relation being:
wherein ŷi^n denotes the i-th predicted value among the image classification label predicted values corresponding to the n-th training cloud image, ωi is a weight value, yi^n is the i-th real value in the real image classification label corresponding to the n-th training cloud image, and σ is the Sigmoid activation function;
Lq is determined based on a fourth preset relation:
wherein K is the preset hash code length, dn is the hash-like code feature corresponding to the n-th training cloud image, and e is a K-dimensional vector whose entries are all 1;
Lb is determined based on a fifth preset relation:
wherein mean(dn) represents the mean value of dn.
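Under the stated variable definitions, the three loss terms and their weighted sum can be sketched as follows. The exact formulas of the third to fifth preset relations are not reproduced in this text, so the weighted binary cross-entropy, L2 quantization and squared-mean bit-balance forms used here are assumptions chosen to be consistent with the descriptions; all tensor shapes are toy values:

```python
import numpy as np

rng = np.random.default_rng(4)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

N, C, K = 4, 6, 32  # training cloud images, label categories, hash code length

logits = rng.standard_normal((N, C))               # label predicted values
labels = (rng.random((N, C)) > 0.5).astype(float)  # multi-hot real labels
omega = np.ones(C)                                  # per-class weights
d = np.tanh(rng.standard_normal((N, K)))            # hash-like code features

# Multi-label classification loss: assumed weighted binary cross-entropy,
# treating each of the C labels as a yes/no problem via the Sigmoid.
p = sigmoid(logits)
L_cls = -np.mean(
    np.sum(omega * (labels * np.log(p) + (1 - labels) * np.log(1 - p)), axis=1)
)

# Quantization loss: assumed L2 pull of |d_n| toward e, the all-ones K-vector.
e = np.ones(K)
L_q = np.mean(np.sum((np.abs(d) - e) ** 2, axis=1))

# Bit balance loss: assumed squared mean of d_n, pushing each code toward
# an equal share of positive and negative bits.
L_b = np.mean(d.mean(axis=1) ** 2)

lam, eta, nu = 0.5, 0.5, 0.0002   # penalty parameters given in the text
L_total = lam * L_cls + eta * L_q + nu * L_b
```

Minimizing L_q makes the hash-like codes nearly binary before quantization, while L_b keeps the bits informative, which is what yields compact yet discriminative hash codes.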
It should be noted that the training cloud images in the cloud image training set are divided into batches and input to the interpretable deep network for training (each batch containing N training cloud images), with the batch size preferably set to 64. In addition, unlike general natural images, a satellite cloud image is a multispectral image whose different channels reflect different physical properties; in the application, the band images of channels 1, 2, 3 and 5 of the LSCIDMR-V2 dataset are selected as the training cloud images, where channels 1, 2 and 3 are visible light bands providing aerosol physical characteristics, and channel 5 is a near-infrared band providing cloud physical parameter data. Furthermore, in order to retain more details of the training cloud images and enhance the robustness of batch processing, a series of preprocessing operations may be performed on the input cloud images, such as resizing them to 256 × 256 and applying unified data normalization.
It can be understood that any one training cloud image corresponds to at least one real image classification label, so training of the interpretable deep network is completed using the multi-label-annotated training cloud images and the preset loss function to achieve the desired training result. The label vector corresponding to any training cloud image is {z1, z2, ..., zC}, zi ∈ {0, 1}, where 1 indicates that the training cloud image has the i-th real image classification label and 0 indicates that it does not. The first penalty parameter λ, the second penalty parameter η and the third penalty parameter ν balance the multi-label classification loss, the quantization loss and the bit balance loss; the preferred settings in this scheme are λ = 0.5, η = 0.5 and ν = 0.0002.
In addition, Lq, the quantization loss, measures the quantization error produced when the fusion feature is converted into a binary training hash code, and Lb, the bit balance loss, encourages each bit of the training hash code to be 0 or 1 with 50% probability. When designing the multi-label classification loss Lcls, the presence or absence of each possible real image classification label of a training cloud image is treated as a binary (yes/no) classification problem, wherein σ, the Sigmoid activation function, maps its input into the (0, 1) interval.
Therefore, the interpretable deep network is trained with the multi-label cloud image training set, and since each real image classification label carries whole-image semantic information of the cloud image, the labels serve as semantic clues that assist the interpretable deep network, improving the accuracy of the generated hash codes and endowing them with rich semantics. After training of the interpretable deep network is completed, it can be used to determine the hash code of each historical cloud image in the cloud image dataset; once the target hash code of a target cloud image is determined, the Hamming distances between the target hash code and the hash codes of the historical cloud images can be calculated and sorted, thereby realizing similarity retrieval and finding the historical cloud images most similar to the target cloud image.
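The retrieval step reduces to XOR and a per-code popcount; a sketch with a hypothetical 32-bit code and a random database of 1000 historical codes (stored here as 0/1 vectors for simplicity):

```python
import numpy as np

rng = np.random.default_rng(5)

K = 32                                           # hash code length
database = rng.integers(0, 2, size=(1000, K))    # historical cloud image codes
query = rng.integers(0, 2, size=K)               # target cloud image code

# Hamming distance: XOR marks differing bits, the row sum counts them.
dists = np.bitwise_xor(database, query).sum(axis=1)

# Rank historical cloud images by ascending Hamming distance; the top
# entries form the retrieval result.
top10 = np.argsort(dists, kind='stable')[:10]
```

Since each comparison is a fixed number of bit operations rather than a floating-point distance, this is what makes hash-based retrieval cheap in both time and storage.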
Although the present disclosure is described above, the scope of protection of the present disclosure is not limited thereto. Various changes and modifications may be made by one skilled in the art without departing from the spirit and scope of the disclosure, and these changes and modifications will fall within the scope of the invention.
It should also be noted that in this specification, relational terms such as first, second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily implying any actual such relationship or order between such entities or actions.