Disclosure of Invention
In order to solve the above technical problems, the invention provides a hyperspectral image classification method based on multi-feature fusion that can improve the accuracy of hyperspectral image classification.
In order to achieve the above purpose, the technical scheme of the invention is as follows:
a hyperspectral image classification method based on multi-feature fusion comprises the following steps:
acquiring a hyperspectral image and preprocessing the hyperspectral image to obtain a data set;
The method comprises constructing an initial classification model, a fusion module and a classifier, wherein the initial classification model comprises three parallel networks, namely an optimized ResNet network, an optimized 3D-CNN network and an optimized LSTM network, which extract a feature matrix H1, a feature matrix H2 and a feature matrix H3 respectively, and the fusion module performs feature fusion on the feature matrix H1, the feature matrix H2 and the feature matrix H3 to obtain a fused feature matrix H0;
inputting the data set into the initial classification model for training, calculating a loss function, and updating the model parameters with an Adam optimizer; the classification model is obtained once the loss function has decreased to convergence;
and inputting the hyperspectral image to be identified into a classification model to obtain a classification result of the hyperspectral image to be identified.
Preferably, the preprocessing comprises:
performing dimensionality reduction on the hyperspectral image using the PCA technique;
creating batched patch data from the dimension-reduced data using a window to obtain a data set, and dividing the data set into a training set and a testing set according to a preset proportion;
and storing the preprocessed data set, converting the data into the format and shape required by the inputs of the optimized ResNet network, the optimized 3D-CNN network and the optimized LSTM network.
Preferably, the method further comprises the following steps:
oversampling the minority-class label samples in the data set to obtain synthesized samples and generating a corresponding synthesized label set for them, the data set comprising a plurality of samples and their corresponding classification label set;
and randomly selecting a data enhancement mode, namely vertical flip, horizontal flip or rotation, to enhance the data set.
Preferably, the optimized ResNet network comprises a 2D convolution layer, a max pooling layer, a first residual block, a second residual block, a Transformer layer, a third residual block, a fourth residual block, a Transformer layer and a global average pooling layer connected in sequence, wherein each of the first, second, third and fourth residual blocks comprises two 2D convolution layers, a global average pooling layer and two FC layers connected in sequence, and each Transformer layer consists of a multi-head attention mechanism, two Dropout layers, a fully connected layer and residual connections.
Preferably, the optimized 3D-CNN includes a 3D convolution layer, a max pooling layer, an FC layer, an attention layer, and a Flatten layer connected in sequence.
Preferably, the optimized LSTM includes an LSTM layer, an FC layer, and a bidirectional additive attention layer added between the LSTM layer and the FC layer.
Preferably, the processing procedure of the bidirectional additive attention layer comprises the following steps:
taking the output of the LSTM layer as the input of the attention mechanism, performing self-attention through a Multiply layer, performing a fully connected operation through a Dense layer to obtain an attention weight vector, and compressing the attention weight vector into a one-dimensional vector using a Flatten layer;
normalizing the compressed attention weight vector by using a Softmax layer, and then sending the normalized attention weight vector into a Reshape layer to obtain a result tensor;
and multiplying the result tensor element-wise with the output of the LSTM layer, then summing the weighted results along the time dimension with a Lambda layer to obtain the final weighted sum output.
Preferably, the optimized ResNet network, the optimized 3D-CNN network and the optimized LSTM network all use categorical cross-entropy as the loss function.
Preferably, the classifier is a single FC layer, a support vector machine, a random forest or a decision tree.
Preferably, the accuracy is used as an evaluation index of the model in the training process.
Based on the technical scheme, the invention has the beneficial effects that:
1) In the data preprocessing stage, the invention first applies minority-class replication (oversampling) and data enhancement to counter the degradation of classification performance caused by weak categories and insufficient data volume; it then uses PCA to reduce the data dimensionality, removing redundant feature information and shortening computation time, and converts the data into image patches suitable for training the neural network models;
2) The method uses three relatively shallow neural networks to extract different characteristics of the hyperspectral image, adds different attention mechanisms and a Transformer layer to these networks to preserve their performance and optimize the feature representations, then fuses the three extracted features and inputs the fused features into a new FC classifier to obtain a confusion matrix, a classification report and the classified image; the accuracy of hyperspectral image classification is thereby effectively improved.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention.
As shown in fig. 1, the present embodiment provides a hyperspectral image classification method based on multi-feature fusion, which includes the following steps:
step 1, obtaining a hyperspectral image and preprocessing the hyperspectral image to obtain a data set;
Step 2, an initial classification model is constructed, wherein the initial classification model comprises three parallel networks, namely an optimized ResNet network, an optimized 3D-CNN network and an optimized LSTM network, used to extract a feature matrix H1, a feature matrix H2 and a feature matrix H3 respectively; a fusion module performs feature fusion on the feature matrix H1, the feature matrix H2 and the feature matrix H3 to obtain a fused feature matrix H0, and a classifier takes the fused feature matrix H0 as input to obtain a classification result;
Step 3, inputting the data set into an initial classification model for training, calculating a loss function, updating model parameters by an Adam optimizer, and obtaining a classification model when the loss function continuously descends until convergence;
and 4, inputting the hyperspectral image to be identified into a classification model to obtain a classification result of the hyperspectral image to be identified.
In one embodiment, the hyperspectral image classification method based on multi-feature fusion further provides a specific preprocessing process, which comprises the following steps:
performing dimensionality reduction on the hyperspectral image using the PCA technique;
creating batched patch data from the dimension-reduced data using a window to obtain a data set, and dividing the data set into a training set and a testing set according to a preset proportion;
oversampling the minority-class label samples in the data set to obtain synthesized samples and generating a corresponding synthesized label set for them, the data set comprising a plurality of samples and their corresponding classification label set;
randomly selecting a data enhancement mode, namely vertical flip, horizontal flip or rotation, to enhance the data set;
and storing the preprocessed data set, converting the data into the format and shape required by the inputs of the optimized ResNet network, the optimized 3D-CNN network and the optimized LSTM network.
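For illustration, a minimal Python sketch of this preprocessing flow is given below, using scikit-learn's PCA and train_test_split. The function name, the reflect padding, and the default window size, component count and split ratio are illustrative assumptions, not values fixed by the invention.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split

def preprocess(cube, labels, n_components=25, window=19, test_ratio=0.1):
    """PCA dimensionality reduction followed by windowed patch extraction."""
    h, w, bands = cube.shape
    reduced = PCA(n_components=n_components).fit_transform(cube.reshape(-1, bands))
    cube = reduced.reshape(h, w, n_components)

    # Pad the borders so every labeled pixel receives a full window.
    m = window // 2
    padded = np.pad(cube, ((m, m), (m, m), (0, 0)), mode="reflect")

    patches, y = [], []
    for i in range(h):
        for j in range(w):
            if labels[i, j] > 0:                 # skip unlabeled pixels
                patches.append(padded[i:i + window, j:j + window, :])
                y.append(labels[i, j] - 1)
    X, y = np.asarray(patches), np.asarray(y)
    return train_test_split(X, y, test_size=test_ratio, stratify=y)
```

Minority-class oversampling and the random flip/rotation enhancement would then be applied to the training split before the data are saved.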
In this embodiment, the data set is formatted to fit the optimized 3D-CNN network in the shape of (batch_size, depth, height, width, channels). Where batch_size represents the batch size, depth, height, and width represent the image depth, height, and width, respectively, and channels represents the number of input bands, i.e., the number of channels.
The data set is formatted to fit the optimized LSTM network in the shape of (batch_size, time_steps, channels), where batch_size is the batch size, time_steps is the length of the time sequence, i.e., the number of pixels treated as sequence steps, and channels represents the number of bands per pixel.
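A short sketch of these format conversions, assuming the 19×19×25 patches used in the embodiment below:

```python
import numpy as np

# A batch of preprocessed patches: (batch, height, width, bands)
X = np.random.rand(128, 19, 19, 25).astype("float32")

X_resnet = X                                # optimized ResNet:  (batch, 19, 19, 25)
X_cnn3d = X[..., np.newaxis]                # optimized 3D-CNN:  (batch, 19, 19, 25, 1)
X_lstm = X.reshape(len(X), 19, 19 * 25)     # optimized LSTM:    (batch, 19, 475)
```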
In one embodiment, the hyperspectral image classification method based on multi-feature fusion further provides a specific process of extracting features from an optimized ResNet network, an optimized 3D-CNN network and an optimized LSTM network, which comprises the following steps:
1) Features are extracted using the optimized ResNet network, which mainly extracts spectral features, along with texture features, semantic features and the like.
Specifically, the composition of the residual block in ResNet is modified as shown in FIG. 3, and an attention mechanism, the SE module (Squeeze-and-Excitation Module), and a Transformer block are added to the modified ResNet, as shown in FIG. 2. The SE module consists of one global average pooling layer (GAP) and two fully connected layers (FC); the Transformer block consists of a multi-head attention mechanism, Dropout, and fully connected layers using ReLU as the activation function.
The GAP layer aggregates the global information of each channel of the feature map and generates a channel weight vector; this weight vector is used to re-weight the feature map, enhancing useful features and suppressing irrelevant ones. The calculation formula is as follows:

U = (1/(H×W))·Σᵢ₌₁ᴴΣⱼ₌₁ᵂ X(i,j)

where U is the resulting feature vector, X is the input feature map, H and W represent the height and width of the input tensor, respectively, and i, j are the indices over H and W, respectively.
U is mapped to the feature vector Z through the first full connection layer (called the Squeeze operation), and the calculation formula is as follows:
Z=ReLU(W1·U+b1)
wherein W1 is the weight matrix of the first fully connected layer, and b1 is the bias vector of the first fully connected layer.
The second fully connected layer (called the Excitation operation) maps Z to a feature vector S, which is compressed to the range (0, 1) by a Sigmoid activation function to obtain the channel weight vector Ws; the calculation formulas are as follows:
S=W2·Z+b2
Wherein W2 is the weight matrix of the second fully connected layer, and b2 is the bias vector of the second fully connected layer.
Ws=Sigmoid(S)
The SE module finally generates its output by weighting each channel of the input feature tensor, with the calculation formula as follows:

X̃j = Wsj·Xj

where Xj represents the jth channel of the input feature tensor X, and Wsj is the jth element of the channel weight vector Ws.
The feature matrix H1 is then obtained after the last global average pooling layer.
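The SE module described above can be sketched in Keras as follows; the channel reduction ratio of the first FC layer is an assumption, since the text does not specify it.

```python
from tensorflow.keras import layers

def se_block(x, reduction=4):
    """Squeeze-and-Excitation re-weighting of a (batch, H, W, C) feature map."""
    ch = x.shape[-1]
    u = layers.GlobalAveragePooling2D()(x)                   # squeeze: U
    z = layers.Dense(ch // reduction, activation="relu")(u)  # Z = ReLU(W1·U + b1)
    s = layers.Dense(ch, activation="sigmoid")(z)            # Ws = Sigmoid(W2·Z + b2)
    s = layers.Reshape((1, 1, ch))(s)
    return layers.Multiply()([x, s])                         # X~j = Wsj · Xj
```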
2) Features are extracted using the optimized 3D-CNN network with the added global context attention mechanism, mainly spatial features, along with temporal features and the like.
Specifically, a global context attention mechanism is added on the basis of the 3D-CNN; the network structure, shown in FIG. 4, includes a 3D convolution layer (CONV3D), a max pooling layer, a fully connected layer (FC), a Flatten layer, and an attention layer (Attention), with ReLU used as the activation function throughout.
The global context attention mechanism calculates the attention weights from the input x and the weight matrix of a Dense layer and applies a Softmax normalization to them. The inputs are then weighted accordingly to yield a weighted feature vector (weighted_input), which is passed into the Flatten layer to yield the feature matrix H2.
A weight matrix W is initialized, W ∈ R^(d_model×ch), where W is the weight matrix of the layer, d_model is the dimension of the input vector, and ch is the feature dimension of the layer output.
The formulas for calculating the attention weight vector A are as follows:
score = xW ∈ R^(batch_size×sequence_length×ch)

A = Softmax(score) ∈ R^(batch_size×sequence_length×ch)

where x represents the input vector, score represents the attention score matrix, and A represents the attention weight vector.
The weighted sum formula is as follows:
weighted_input = A⊙x

where ⊙ denotes element-wise multiplication and weighted_input represents the weighted result, which is the final weighted output of the attention layer.
Finally, the feature matrix H2 is obtained after the output passes through the Flatten layer.
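A minimal Keras sketch of this attention layer is shown below. It assumes ch = d_model so that the element-wise product A⊙x is well defined, and normalizes the Softmax over the sequence axis; both choices are assumptions where the text is ambiguous.

```python
import tensorflow as tf
from tensorflow.keras import layers

class GlobalContextAttention(layers.Layer):
    """score = xW; A = Softmax(score); weighted_input = A * x (element-wise)."""
    def build(self, input_shape):
        d_model = int(input_shape[-1])
        # W in R^(d_model x ch); ch = d_model is assumed so A and x share a shape.
        self.W = self.add_weight(name="W", shape=(d_model, d_model),
                                 initializer="glorot_uniform")

    def call(self, x):
        score = tf.matmul(x, self.W)         # (batch, seq_len, ch)
        a = tf.nn.softmax(score, axis=1)     # normalize over sequence positions
        return a * x                         # weighted_input
```

Flattening the returned weighted_input then yields the feature matrix H2.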
3) The optimized LSTM network with the added bidirectional additive attention mechanism is used to extract features, mainly other, more ambiguous features.
Specifically, a bidirectional additive attention mechanism is added on the basis of LSTM, and the network structure is shown in FIG. 5, and comprises an LSTM layer, a full connection layer (FC) and a bidirectional additive attention layer. A schematic of the calculation of the bi-directional additive attention layer is shown in fig. 6.
The bidirectional additive attention layer first takes the output lstm_out of the LSTM layer as input and performs a self-attention computation through a Multiply layer to obtain a tensor attention_mul of shape [batch_size, sequence_length, units]. Next, a fully connected operation is performed by a Dense layer to produce an attention weight vector, which is compressed into one dimension using a Flatten layer. The attention weight vector is then normalized using a Softmax layer, yielding a tensor of shape [batch_size, sequence_length], which is converted by a Reshape layer into a tensor attention_out of shape [batch_size, sequence_length, 1]. Finally, attention_out and lstm_out are multiplied element-wise, and a Lambda layer sums the result along the time dimension, giving a weighted sum output attention_out of shape [batch_size, units]. This output is fed into the FC layer to obtain the feature matrix H3.
The self-attention calculation formula is as follows:
Attention_mul[i,j,k] = lstm_out[i,j,k] × lstm_out[i,j,k]

where i denotes the sample index, j denotes the position in the time sequence, and k denotes the feature dimension of the LSTM layer output.
The attention weight calculation formula is as follows:
Attention_weights[i,j] = Softmax(Attention_mul[i,j,:])
the Softmax function is used for normalizing the attention vector, and the obtained result is a probability distribution used for representing importance weights of all the positions.
The weighted sum output is calculated as follows:

attention_out[i,k] = Σⱼ₌₁^seq_len Attention_weights[i,j] × lstm_out[i,j,k]

where seq_len is the length of the time sequence, and the Σ summation over all positions j yields the final weighted sum output result.
And the output result is transmitted into the FC layer to obtain the characteristic matrix H3.
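The whole branch can be sketched in Keras as follows; the width of the Dense scoring layer (one unit per time step) and the absence of extra activations are assumptions consistent with the shapes stated above.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def lstm_branch(time_steps=19, channels=475, units=64):
    """LSTM followed by the bidirectional additive attention described above."""
    inp = layers.Input(shape=(time_steps, channels))
    lstm_out = layers.LSTM(units, return_sequences=True)(inp)   # (None, T, units)

    att = layers.Multiply()([lstm_out, lstm_out])               # self-attention
    att = layers.Dense(1)(att)                                  # (None, T, 1)
    att = layers.Flatten()(att)                                 # (None, T)
    att = layers.Softmax()(att)                                 # weights over time
    att = layers.Reshape((time_steps, 1))(att)                  # attention_out

    weighted = layers.Multiply()([att, lstm_out])               # (None, T, units)
    summed = layers.Lambda(lambda t: tf.reduce_sum(t, axis=1))(weighted)
    h3 = layers.Dense(units)(summed)                            # feature matrix H3
    return Model(inp, h3)
```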
In one embodiment, the hyperspectral image classification method based on multi-feature fusion further provides a specific process of feature fusion of the feature matrix H1, the feature matrix H2 and the feature matrix H3 to obtain a fused feature matrix H0, and the specific process comprises the following steps:
The feature matrix H1, the feature matrix H2 and the feature matrix H3 all have shapes [None, Z], where Z is the number of neurons of the last FC layer or global average pooling layer of the respective network. The fused feature matrix is a new feature matrix H0. A single-layer FC is used as the classifier to obtain the classification result, since an FC classifier is simple and interpretable; the classifier may alternatively be a Support Vector Machine (SVM), Random Forest (RF), Decision Tree (DT), etc.
In this embodiment, the loss functions of the neural networks all use categorical cross-entropy (categorical_crossentropy), which can be expressed as:
Categorical Cross-Entropy Loss = -∑(yi · log(pi))
where Categorical Cross-Entropy Loss is the value of the categorical cross-entropy loss function, yi is the one-hot encoding of the true label, and pi is the probability, obtained through the Softmax function, that the model predicts the sample to belong to the ith class.
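A small worked example of this loss for a three-class sample:

```python
import numpy as np

y_true = np.array([0.0, 1.0, 0.0])   # one-hot label: the sample belongs to class 2
p = np.array([0.1, 0.7, 0.2])        # Softmax probabilities predicted by the model

loss = -np.sum(y_true * np.log(p))   # only the true class contributes: -log(0.7)
print(round(loss, 3))                # 0.357
```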
The Adam optimizer is used to minimize the loss function when training the network, and the specific optimization flow of Adam is as follows:
1. Initialize the parameters: learning rate alpha = 0.0002, momentum parameter beta1 = 0.5, second moment estimation parameter beta2 = 0, a small constant epsilon = 1e-8 to prevent the denominator from becoming zero, momentum m = 0, second moment estimate v = 0, and iteration count t = 0.
2. Training iteration, updating parameter theta
Calculate the gradient g: g = ∂J(theta)/∂theta, where J(theta) is the loss function and ∂J(theta)/∂theta is its gradient with respect to the parameter theta;
update the momentum m: m = beta1·m + (1-beta1)·g;
update the second moment estimate v: v = beta2·v + (1-beta2)·g²;
correct the momentum bias: m̂ = m/(1-beta1^t);
correct the second moment bias: v̂ = v/(1-beta2^t);
update the parameter theta: theta = theta - alpha·m̂/(√v̂ + epsilon).
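One Adam update step can be written out as follows. The learning rate and beta1 follow the embodiment; beta2 = 0.999 is substituted as the conventional default, since the beta2 value printed above appears garbled, and should be treated as an assumption.

```python
import numpy as np

def adam_step(theta, g, m, v, t, alpha=2e-4, beta1=0.5, beta2=0.999, eps=1e-8):
    """One Adam update of parameter theta given gradient g (iteration t >= 1)."""
    m = beta1 * m + (1 - beta1) * g            # update momentum
    v = beta2 * v + (1 - beta2) * g ** 2       # update second moment estimate
    m_hat = m / (1 - beta1 ** t)               # bias-corrected momentum
    v_hat = v / (1 - beta2 ** t)               # bias-corrected second moment
    theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```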
In this example, to illustrate more specifically the implementation of the classification method of the invention and the superiority of its results, the method was run on the Indian Pines dataset; the Indian Pines image is 145×145 pixels, contains 220 bands in total, and has 16 ground object categories. Classification experiments using other single neural network methods were also run for comparison with the proposed method. FIG. 7 is the Indian Pines ground truth label map.
When using the method of the invention, k in PCA is first set to 25, reducing the data to 25 dimensions. Next, 19×19 patch data are created using windows, after which the oversampling method augments the weak classes, balancing the number of samples per class in the dataset. Finally, one of vertical flip, horizontal flip or rotation is selected for data enhancement and the data are saved; the final X_train, y_train, X_test and y_test shapes are [27697,19,19,25], [27697], [3075,19,19,25] and [3075], respectively. Table 1 shows the training and test data finally used: 30772 samples in total, 27697 in the training set and 3075 in the test set, with each label and its corresponding ground object category listed in Table 1.
TABLE 1
| Label (Label) | Ground object category | Training set | Test set |
| 1 | Alfalfa | 1728 | 14 |
| 2 | Corn-notill | 2000 | 428 |
| 3 | Corn-mintill | 1743 | 249 |
| 4 | Corn | 1660 | 71 |
| 5 | Grass-pasture | 1690 | 145 |
| 6 | Grass-trees | 1533 | 219 |
| 7 | Grass-pasture-mowed | 1720 | 8 |
| 8 | Hay-windrowed | 1675 | 143 |
| 9 | Oats | 1722 | 6 |
| 10 | Soybean-notill | 2040 | 292 |
| 11 | Soybean-mintill | 1718 | 737 |
| 12 | Soybean-clean | 1660 | 178 |
| 13 | Wheat | 1728 | 61 |
| 14 | Woods | 1770 | 380 |
| 15 | Buildings-Grass-Trees-Drives | 1620 | 116 |
| 16 | Stone-Steel-Towers | 1690 | 28 |
| Total | | 27697 | 3075 |
The data shape already meets the optimized ResNet network's input requirement, so no reshaping is needed. The first convolution layer of ResNet uses 64 convolution kernels of size 3×3 with a stride of 2. The max pooling layer uses a 3×3 pooling window with a stride of 2. In the residual blocks, the convolution layers all use 3×3 kernels with a stride of 1; the first two residual blocks use 64 kernels and the last two use 128. The first Transformer layer uses a head_num of 8 and an intermediate dimension of 64; the second uses a head_num of 8 and an intermediate dimension of 128. The final pooling layer outputs the feature matrix H1 with shape [None,128]. The total number of parameters is 881,328.
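A Keras sketch of one such Transformer layer is given below. It assumes the H×W feature map has first been reshaped to a sequence of H·W tokens, and the Dropout rate is an assumption, since the text does not state it.

```python
from tensorflow.keras import layers

def transformer_block(x, num_heads=8, key_dim=64, dropout=0.1):
    """Multi-head attention and FC sublayers, each with Dropout and a residual."""
    attn = layers.MultiHeadAttention(num_heads=num_heads, key_dim=key_dim)(x, x)
    attn = layers.Dropout(dropout)(attn)
    x = layers.Add()([x, attn])                          # first residual connection

    ff = layers.Dense(x.shape[-1], activation="relu")(x)
    ff = layers.Dropout(dropout)(ff)
    return layers.Add()([x, ff])                         # second residual connection
```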
To meet the input requirement of the optimized 3D-CNN network, the input data format is converted by adding a dimension of 1, i.e., the input shape is [None,19,19,25,1]. The first convolution layer in the 3D-CNN uses 64 convolution kernels of size 3×3 with a stride of 1; the second uses 128 kernels of size 3×3 with a stride of 1. The first and second pooling layers each use a 2×2 pooling window with a stride of 2. The feature matrix H2 finally output through the Flatten layer has shape [None,4480]. The total number of parameters is 311,312.
To meet the input requirement of the optimized LSTM network, the input data format is converted by merging the height and width of each sample, i.e., the input shape is [None,19,475]. The LSTM uses 64 neurons; the FC layer in the attention mechanism uses a Softmax activation function, and the last FC layer has 64 neurons. The output feature matrix H3 has shape [None,64]. The total number of parameters is 166,563.
Three FC classifiers are defined; the feature matrix H1, the feature matrix H2 and the feature matrix H3 are input into them, the outputs of the three classifiers are concatenated into a vector of size 48, and 16-class classification is performed through one final FC layer.
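A sketch of this fusion classifier, with the branch output sizes taken from the embodiment; the ReLU activation on the per-branch heads is an assumption.

```python
from tensorflow.keras import layers, Input, Model

n_classes = 16
h1 = Input(shape=(128,))     # from the optimized ResNet branch
h2 = Input(shape=(4480,))    # from the optimized 3D-CNN branch
h3 = Input(shape=(64,))      # from the optimized LSTM branch

# One FC head per branch; the 16-dimensional outputs concatenate into size 48.
heads = [layers.Dense(n_classes, activation="relu")(h) for h in (h1, h2, h3)]
fused = layers.Concatenate()(heads)
out = layers.Dense(n_classes, activation="softmax")(fused)

classifier = Model([h1, h2, h3], out)
classifier.compile(optimizer="adam", loss="categorical_crossentropy",
                   metrics=["accuracy"])
```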
Both the feature extraction networks and the classifier use the adaptive optimizer Adam to minimize the loss function, which is set to the categorical cross-entropy function (categorical_crossentropy), with a learning rate alpha of 0.0002, a batch_size of 128, and 50 epochs.
The image classification result is evaluated using the accuracy, calculated as follows:

Accuracy = (TP+TN)/(TP+TN+FP+FN)

where TP is the number of positive samples predicted as positive, TN is the number of negative samples predicted as negative, FP is the number of negative samples predicted as positive, and FN is the number of positive samples predicted as negative.
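For label vectors, this reduces to the fraction of correctly predicted samples:

```python
import numpy as np

def accuracy(y_true, y_pred):
    """(TP + TN) / (TP + TN + FP + FN) over all classes."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))
```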
The experiments were run on an Intel i7-12700KF 3.6 GHz CPU, an NVIDIA GeForce RTX 3070 GPU, and 16 GB of memory.
Experimental results: to illustrate the superiority of the method of the invention more precisely, the different neural network classification methods and their accuracies are shown in Table 2; the post-classification confusion matrix heat maps are shown in FIGS. 8-12.
TABLE 2
| Evaluation index / Method | VGG16 | ResNet16 | LSTM | 3D-CNN | Method of this embodiment |
| Accuracy (%) | 99.54 | 99.84 | 99.75 | 99.73 | 99.86 |
The results in table 2 demonstrate that the present example method can effectively improve the accuracy of hyperspectral image classification.
Referring to FIGS. 8 to 12, which show the confusion matrix heat maps and corresponding classified images for the different classification methods: FIG. 8 for classification with the VGG16 network, FIG. 9 for the ResNet network, FIG. 10 for the LSTM network, FIG. 11 for the 3D-CNN network, and FIG. 12 for the method of this embodiment. The classification results show that the method of this embodiment produces the fewest misclassifications and can improve classification precision.
It should be understood that, although the steps in the above-described flowcharts are shown in order as indicated by the arrows, these steps are not necessarily performed in order as indicated by the arrows. The steps are not strictly limited to the order of execution unless explicitly recited herein, and the steps may be executed in other orders. Moreover, at least some of the steps in the flowcharts described above may include a plurality of sub-steps or stages that are not necessarily performed at the same time, but may be performed at different times, and the order of execution of the sub-steps or stages is not necessarily sequential, but may be performed alternately or alternately with at least a part of the sub-steps or stages of other steps or other steps.
The foregoing is merely a preferred implementation of the multi-feature fusion-based hyperspectral image classification method disclosed in the present application, and is not intended to limit the embodiments of the present application, and various modifications and variations of the embodiments of the present application will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the embodiments of the present application should be included in the protection scope of the embodiments of the present application.