Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a multispectral image classification method based on a deep fusion residual network.
To achieve this purpose, the method comprises the following specific steps:
(1) inputting a multispectral image:
inputting the multispectral images of five ground object targets, wherein each ground object target comprises two multispectral images; the first multispectral image comprises 4 time phases, each containing 10 band images, and the second multispectral image comprises 9 band images;
(2) performing normalization processing on the image of each band of each multispectral image of each ground object target;
(3) obtaining a multispectral image matrix:
(3a) stacking the normalized images of all bands in the first multispectral image to obtain five first multispectral image matrices of size W1i × H1i × C1, where W1i represents the width of each band image in the first multispectral image, H1i represents the height of each band image in the first multispectral image, C1 represents the number of bands of the first multispectral image, C1 = 10, and i represents the serial number of the ground object target's multispectral image, i = 1, 2, 3, 4, 5;
(3b) stacking the normalized images of all bands in the second multispectral image to obtain five second multispectral image matrices of size W2i × H2i × C2, where W2i represents the width of each band image in the second multispectral image, H2i represents the height of each band image in the second multispectral image, C2 represents the number of bands of the second multispectral image, C2 = 9, and i represents the serial number of the ground object target's multispectral image, i = 1, 2, 3, 4, 5;
(4) acquiring a data set:
(4a) performing a sliding-window block extraction on the first multispectral image matrix of each of the first four ground object targets to obtain a training data set D1;
(4b) performing a sliding-window block extraction on the second multispectral image matrix of each of the first four ground object targets to obtain a training data set D2;
(4c) performing a sliding-window block extraction on the first multispectral image matrix of the fifth ground object target, and forming all image blocks into a test data set T1;
(4d) performing a sliding-window block extraction on the second multispectral image matrix of the fifth ground object target, and forming all image blocks into a test data set T2;
(5) building the deep fusion residual network:
(5a) constructing a 31-layer deep residual network;
(5b) constructing the feature fusion layer of the deep fusion residual network;
(5c) connecting a multi-class Softmax layer after the feature fusion layer to obtain the deep fusion residual network;
(6) training the deep fusion residual network:
(6a) inputting the training data set D1 into the deep residual network for supervised training;
(6b) inputting the training data set D2 into the deep residual network for supervised training;
(6c) fusing the feature vectors of the networks obtained by the two trainings to obtain the trained deep fusion residual network;
(7) classifying the test data sets:
(7a) inputting the test data set T1 into the trained deep fusion residual network and extracting the feature vector C1;
(7b) inputting the test data set T2 into the trained deep fusion residual network and extracting the feature vector C2;
(7c) fusing the feature vector C1 with the feature vector C2, inputting the fused vector into the multi-class Softmax layer of the deep fusion residual network to obtain the final classification result, and calculating the classification accuracy.
Compared with the prior art, the invention has the following advantages:
First, because the invention builds a deep fusion residual network and uses the deep residual network in the model to extract features from the multispectral images, feature extraction is self-learned and the features of the multispectral images can be extracted completely. The feature extraction method is not tied to a particular image type and can be used on various multispectral images, overcoming the complexity and time cost of manually selecting weak classifiers and designing ensemble methods in the prior art; the invention therefore has the advantage of generality.
Second, when training the deep fusion residual network, the invention performs supervised training separately on the different networks within the fusion residual network so that each learns the feature information of images captured by a different satellite, and then fuses the feature vectors. The feature learning procedure is therefore simple, overcoming the complicated computation of the prior art and the phenomenon of "same object, different spectra; different objects, same spectrum" caused by its semi-supervised training mode, and the invention can extract rich high-level multi-directional, multi-spectral, and multi-temporal feature information.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
The steps for the implementation of the present invention are described in detail below with reference to fig. 1.
Step 1, inputting a multispectral image.
The multispectral images of five ground object targets are input, where each ground object target comprises two multispectral images; the first multispectral image comprises 4 time phases, each containing 10 band images, and the second multispectral image comprises 9 band images.
Step 2, performing normalization processing on the image of each band of each multispectral image of each ground object target.
Each pixel value in each band image of the first multispectral image of the five ground object targets is divided by the maximum pixel value of that band's image over each time phase of the five ground object targets, giving the normalized pixel values of that band image; this yields the 10 normalized band images of the first multispectral image.
Each pixel value in each band image of the second multispectral image of the five ground object targets is divided by the maximum pixel value of that band's image over the five ground object targets, giving the normalized pixel values of that band image; this yields the 9 normalized band images of the second multispectral image.
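The per-band maximum normalization described above can be sketched as follows. This is an illustrative Python/NumPy sketch, not the patent's actual implementation: the function name and toy pixel values are assumptions, and for simplicity the maximum is taken over a single band image rather than over all five targets and time phases.

```python
import numpy as np

def normalize_bands(images):
    """Divide each band image by that band's maximum pixel value.

    images: list of 2-D arrays, one per band (shapes are illustrative).
    Returns a list of arrays with values scaled into [0, 1].
    """
    normalized = []
    for band in images:
        max_val = band.max()          # maximum pixel value of this band image
        normalized.append(band / max_val)
    return normalized

# Toy example: two "bands" of a small image
bands = [np.array([[0.0, 50.0], [100.0, 200.0]]),
         np.array([[10.0, 20.0], [30.0, 40.0]])]
norm = normalize_bands(bands)
print(norm[0].max(), norm[1].max())  # each band now peaks at 1.0
```

After this step every band image lies in [0, 1], so bands with very different dynamic ranges contribute comparably when stacked in step 3.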
Step 3, acquiring the multispectral image matrices.
Stacking the normalized images of all wave bands in the first multispectral image to obtain an image with the size of W1i×H1i×C1The five first multispectral image matrices of (1), wherein W1iRepresenting the width, H, of each band image in the first multispectral image1iRepresenting the height, C, of each band image in the first multispectral image1Number of bands, C, representing a first multispectral image1I is 10, i is the serial number of the multispectral image of the ground object target, i is 1,2,3,4, 5;
stacking the normalized images of each wave band in the second multispectral image to obtain an image with the size of W2i×H2i×C2The five second multispectral image matrices of (1), wherein W2iRepresenting the width, H, of each band image in the second multi-spectral image2iRepresenting the height, C, of each band image in the second multispectral image2Number of bands, C, representing a second multispectral image29, i represents the serial number of the multispectral image of the ground object, and i is 1,2,3,4, 5;
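The stacking of normalized band images into a W × H × C matrix can be sketched as follows (an illustrative Python/NumPy sketch; the function name and toy shapes are assumptions):

```python
import numpy as np

def stack_bands(band_images):
    """Stack C normalized band images of size W x H into a W x H x C matrix."""
    return np.stack(band_images, axis=-1)

# e.g. 9 bands of a 6 x 4 image stack into a 6 x 4 x 9 matrix,
# matching the W2i x H2i x C2 shape of the second multispectral image
bands = [np.zeros((6, 4)) for _ in range(9)]
matrix = stack_bands(bands)
print(matrix.shape)  # (6, 4, 9)
```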
Step 4, acquiring the data sets.
From the first multispectral image matrices of the first four ground object targets, the pixels carrying class labels are selected and divided into 10-channel image pixel blocks with a sliding window of 24 × 24 pixels; 50% of these pixel blocks are selected at random as the training data set D1.
From the second multispectral image matrices of the first four ground object targets, the pixels carrying class labels are selected and divided into 9-channel image pixel blocks with a sliding window of 24 × 24 pixels; 50% of these pixel blocks are selected at random as the training data set D2.
From the first multispectral image matrix of the fifth ground object target, the pixels carrying class labels are selected and divided into 10-channel image pixel blocks with a sliding window of 24 × 24 pixels; these blocks form the test data set T1.
From the second multispectral image matrix of the fifth ground object target, the pixels carrying class labels are selected and divided into 9-channel image pixel blocks with a sliding window of 24 × 24 pixels; these blocks form the test data set T2.
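The sliding-window block extraction described above can be sketched as follows. This is an illustrative Python/NumPy sketch: the function name, the non-overlapping stride of 24 pixels, and the toy image size are assumptions, and the label-based selection and the random 50% sampling are omitted.

```python
import numpy as np

def extract_patches(matrix, patch=24, stride=24):
    """Slide a patch x patch window over a W x H x C image matrix and
    return all complete blocks as an array of shape (N, patch, patch, C)."""
    W, H, C = matrix.shape
    blocks = []
    for x in range(0, W - patch + 1, stride):
        for y in range(0, H - patch + 1, stride):
            blocks.append(matrix[x:x + patch, y:y + patch, :])
    return np.stack(blocks)

img = np.zeros((48, 72, 10))   # toy 10-band image matrix
patches = extract_patches(img)
print(patches.shape)           # (6, 24, 24, 10): a 2 x 3 grid of blocks
```

Each resulting 24 × 24 × C block is one sample fed to the network's 24 × 24 input layer in step 5.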
Step 5, building the deep fusion residual network.
A 31-layer deep residual network is constructed as follows.
The first layer is the input layer; it takes a three-dimensional tensor of size 24 × 24 × 10, and the number of feature maps is set to 3.
The second layer is a convolution layer, a convolutional representation obtained by projecting the input-layer tensor; the number of feature maps is set to 64.
The third to eleventh layers form the first residual block (9 layers); the number of feature maps is set to 64.
The twelfth to fourteenth layers form the second residual block (3 layers); the number of feature maps is set to 128, with a shortcut connection.
The fifteenth to twentieth layers form the third residual block (6 layers); the number of feature maps is set to 128.
The twenty-first to twenty-third layers form the fourth residual block (3 layers); the number of feature maps is set to 256.
The twenty-fourth to twenty-ninth layers form the fifth residual block (6 layers); the number of feature maps is set to 256.
The thirtieth layer is a normalization layer, set to batch normalization.
The thirty-first layer is a pooling layer; the number of feature maps is set to 256.
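The layer budget above sums to the stated 31 layers, and the defining operation of each residual block is the shortcut connection y = relu(F(x) + x). This can be sketched as follows; it is an illustrative Python/NumPy sketch on plain vectors, where the single matrix multiply is only a stand-in for the block's convolution layers, not the patent's actual block.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, weight):
    """Identity-shortcut residual block on a feature vector:
    the block learns a residual F(x) and adds the input back,
    y = relu(F(x) + x), which keeps deep networks trainable."""
    fx = relu(weight @ x)      # stand-in for the block's conv layers
    return relu(fx + x)        # shortcut connection adds the input

# Layer budget of the described network:
# input(1) + conv(1) + blocks(9 + 3 + 6 + 3 + 6) + batch norm(1) + pool(1)
layers = 1 + 1 + 9 + 3 + 6 + 3 + 6 + 1 + 1
print(layers)  # 31

x = np.array([1.0, -2.0, 3.0])
w = np.zeros((3, 3))           # zero weights: F(x) = 0, so y = relu(x)
print(residual_block(x, w))    # [1. 0. 3.]
```

With zero weights the block reduces to the identity (after relu), illustrating why residual layers are easy to optimize: the network only has to learn the deviation from identity.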
The first multispectral image matrices of the multispectral images of the five ground object targets are input into the deep residual network to extract a first feature map, which is vectorized to obtain the first feature vector.
The second multispectral image matrices of the multispectral images of the five ground object targets are input into the deep residual network to extract a second feature map, which is vectorized to obtain the second feature vector.
The two feature vectors are fused to form the feature fusion layer of the deep fusion residual network.
A multi-class Softmax layer is connected after the feature fusion layer to obtain the deep fusion residual network.
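The feature fusion layer followed by the multi-class Softmax layer can be sketched as follows. This is an illustrative Python/NumPy sketch: the toy feature dimensions and the stand-in weight matrix are assumptions (a trained layer would have learned weights); the 17 outputs match the 17 ground object classes used later in the simulation.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the class logits."""
    e = np.exp(z - z.max())
    return e / e.sum()

def fuse_and_classify(c1, c2, weights):
    """Concatenate the two channels' feature vectors and pass the fused
    vector through a softmax layer to get class probabilities."""
    fused = np.concatenate([c1, c2])   # feature fusion layer
    return softmax(weights @ fused)    # multi-class Softmax layer

c1 = np.array([0.2, 0.8])              # toy first-channel features
c2 = np.array([0.5, 0.1, 0.4])         # toy second-channel features
w = np.ones((17, 5))                   # 17 classes, fused dimension 5
p = fuse_and_classify(c1, c2, w)
print(p.shape, round(p.sum(), 6))      # (17,) 1.0
```

The output is a probability distribution over the 17 classes; the predicted class is its argmax.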
Step 6, training the deep fusion residual network.
The training data set D1 is input into the deep residual network for supervised training.
The training data set D2 is input into the deep residual network for supervised training.
The trained deep fusion residual network is then obtained by fusing the feature vectors of the networks produced by the two trainings, as follows:
Step 1: the training data set D1 is input into the trained first-channel deep residual network, which extracts features from D1 to obtain the feature S1.
Step 2: the training data set D2 is input into the trained second-channel deep residual network, which extracts features from D2 to obtain the feature S2.
Step 3: the feature S1 and the feature S2 are fused, the fused features are input into the multi-class Softmax layer, and supervised training is performed to obtain the trained deep fusion residual network.
Step 7, classifying the test data sets.
The test data set T1 is input into the trained deep fusion residual network, and the feature vector C1 is extracted.
The test data set T2 is input into the trained deep fusion residual network, and the feature vector C2 is extracted.
The feature vector C1 and the feature vector C2 are fused, and the fused vector is input into the multi-class Softmax layer of the deep fusion residual network to obtain the final classification result; the classification accuracy is then calculated.
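The classification accuracy computed at the end of step 7 is the fraction of test samples whose predicted class matches the ground-truth label, which can be sketched as follows (an illustrative Python/NumPy sketch; the toy labels are assumptions):

```python
import numpy as np

def classification_accuracy(predicted, truth):
    """Fraction of test blocks whose predicted class equals the label."""
    predicted = np.asarray(predicted)
    truth = np.asarray(truth)
    return float((predicted == truth).mean())

# Toy class labels for 8 test blocks: 6 of 8 match
pred = [3, 1, 1, 0, 2, 2, 4, 4]
true = [3, 1, 0, 0, 2, 2, 4, 1]
print(classification_accuracy(pred, true))  # 0.75
```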
The effect of the invention can be further illustrated by the following simulation experiment:
1. Simulation conditions:
The simulation of the invention was run on a Hewlett-Packard Z840 with 8 GB of memory, in a TensorFlow software environment.
2. Simulation content:
In the simulation experiment of the invention, multispectral image data of the four areas Berlin, hong_kong, sao_paulo, and rome, each captured and imaged by both satellite sentinel_2 and satellite landsat_8, are used as the training data set to train the deep fusion residual network, and multispectral image data of the Paris area is used as the test data set to classify 17 classes of ground objects.
FIG. 2 is the real ground object label map of the Paris area; the ground object categories include dense high-rise buildings, dense mid-rise buildings, dense low-rise buildings, open high-rise buildings, open mid-rise buildings, open low-rise buildings, large low-rise buildings, sparsely distributed buildings, heavy industrial areas, dense forest, scattered trees, shrubs and short trees, low vegetation, bare rock, bare soil and sand, and water.
In simulation experiment 1 of the invention, the method of the invention is used: the multispectral image of the Paris area captured and imaged by satellite landsat_8 and the multispectral image of the Paris area captured and imaged by satellite sentinel_2 are first fused, and the fused multispectral images are then classified. The result is shown in FIG. 3, and the classification accuracies obtained by the three simulation methods are compared in Table 1.
In simulation experiments 2 and 3 of the invention, the prior-art deep residual network classification method is used to classify, respectively, the multispectral image of the Paris area captured and imaged by satellite landsat_8 and the multispectral image of the Paris area captured and imaged by satellite sentinel_2.
3. Simulation result analysis:
FIG. 3 shows the result of classifying the multispectral image of the Paris area using the method of the invention. Comparing the classification result map of FIG. 3 with the real ground object label map of FIG. 2 shows that the classification result obtained by the method of the invention is more accurate than the prior art.
The results of simulation experiments 2 and 3 are shown in Table 1. As Table 1 shows, inputting the multispectral image data captured by the two satellites into the deep fusion residual network to extract features improves the classification accuracy compared with processing the multispectral image data captured by a single satellite in a single-channel network, so the classification result is better than the prior art.
Table 1. Comparison of the classification accuracies obtained in the simulations
| Method | Accuracy |
| --- | --- |
| Method of the invention | 51.12% |
| Single-channel deep residual network (landsat_8 data) | 44.82% |
| Single-channel deep residual network (sentinel_2 data) | 45.63% |
As can be seen from Table 1, inputting the multispectral data of the two satellites into different channels to extract features improves the classification accuracy compared with the single-channel network of the prior art.
In conclusion, by introducing the deep fusion residual network and combining it with feature fusion, the invention extracts high-level multi-temporal, multi-spectral, and multi-directional features of the image, improves the feature representation capability, lets the model learn richer multispectral image features, and obtains better classification accuracy than the prior art.