Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a multispectral image classification method based on a deep fusion residual network.
To achieve this purpose, the method comprises the following specific steps:
(1) inputting a multispectral image:
inputting the multispectral images of five ground object targets, wherein each ground object target comprises two multispectral images; the first multispectral image comprises 4 time phases, each containing 10 band images, and the second multispectral image comprises 9 band images;
(2) performing normalization processing on the image of each band of each multispectral image of each ground object target;
(3) obtaining a multispectral image matrix:
(3a) stacking the normalized images of all bands in the first multispectral image to obtain five first multispectral image matrices of size W1i × H1i × C1, where W1i represents the width of each band image in the first multispectral image, H1i represents the height of each band image in the first multispectral image, C1 represents the number of bands of the first multispectral image, C1 = 10, and i represents the serial number of the ground object target's multispectral image, i = 1, 2, 3, 4, 5;
(3b) stacking the normalized images of all bands in the second multispectral image to obtain five second multispectral image matrices of size W2i × H2i × C2, where W2i represents the width of each band image in the second multispectral image, H2i represents the height of each band image in the second multispectral image, C2 represents the number of bands of the second multispectral image, C2 = 9, and i represents the serial number of the ground object target's multispectral image, i = 1, 2, 3, 4, 5;
(4) acquiring a data set:
(4a) performing a sliding-window block extraction on the first multispectral image matrix of each of the first four ground object targets to obtain a training data set D1;
(4b) performing a sliding-window block extraction on the second multispectral image matrix of each of the first four ground object targets to obtain a training data set D2;
(4c) performing a sliding-window block extraction on the first multispectral image matrix of the fifth ground object target, and forming all image blocks into a test data set T1;
(4d) performing a sliding-window block extraction on the second multispectral image matrix of the fifth ground object target, and forming all image blocks into a test data set T2;
(5) building the deep fusion residual network:
(5a) constructing a 31-layer deep residual network;
(5b) constructing the feature fusion layer of the deep fusion residual network;
(5c) connecting a multi-class Softmax layer after the feature fusion layer to obtain the deep fusion residual network;
(6) training the deep fusion residual network:
(6a) inputting the training data set D1 into the deep residual network for supervised training;
(6b) inputting the training data set D2 into the deep residual network for supervised training;
(6c) fusing the feature vectors of the networks obtained by the two trainings to obtain the trained deep fusion residual network;
(7) classifying the test data sets:
(7a) inputting the test data set T1 into the trained deep fusion residual network and extracting the feature vector C1;
(7b) inputting the test data set T2 into the trained deep fusion residual network and extracting the feature vector C2;
(7c) fusing the feature vector C1 with the feature vector C2, inputting the fused vector into the multi-class Softmax layer of the deep fusion residual network to obtain the final classification result, and calculating the classification accuracy.
Compared with the prior art, the invention has the following advantages:
First, because the invention builds a deep fusion residual network and uses the deep residual network in the model to extract features from the multispectral images, feature extraction is self-learned and the features of the multispectral images can be extracted completely. The feature extraction method is not tied to a particular image type and can be used on various multispectral images, overcoming the complexity and time cost of manually selecting weak classifiers and designing ensemble methods in the prior art; the invention therefore has the advantage of generality.
Second, when training the deep fusion residual network, the invention performs supervised training separately on the different networks within the fusion residual network so that each learns the feature information of images captured by a different satellite, and then fuses the feature vectors. The feature learning procedure is therefore simple, overcoming the complicated computation of the prior art and the phenomenon of "same object, different spectra; different objects, same spectrum" caused by its semi-supervised training mode, and the invention can extract rich high-level multi-directional, multi-spectral, and multi-temporal feature information.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
The steps for the implementation of the present invention are described in detail below with reference to fig. 1.
Step 1, inputting a multispectral image.
The multispectral images of five ground object targets are input, where each ground object target comprises two multispectral images; the first multispectral image comprises 4 time phases, each containing 10 band images, and the second multispectral image comprises 9 band images.
Step 2, performing normalization processing on the image of each band of each multispectral image of each ground object target.
Each pixel value in each band image of the first multispectral image of the five ground object targets is divided by the maximum pixel value of that band's image over each time phase of the five ground object targets, giving the normalized pixel values of that band image; this yields the 10 normalized band images of the first multispectral image.
Each pixel value in each band image of the second multispectral image of the five ground object targets is divided by the maximum pixel value of that band's image over the five ground object targets, giving the normalized pixel values of that band image; this yields the 9 normalized band images of the second multispectral image.
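The per-band maximum normalization described above can be sketched as follows. This is an illustrative Python/NumPy sketch, not the patent's actual implementation: the function name and toy pixel values are assumptions, and for simplicity the maximum is taken over a single band image rather than over all five targets and time phases.

```python
import numpy as np

def normalize_bands(images):
    """Divide each band image by that band's maximum pixel value.

    images: list of 2-D arrays, one per band (shapes are illustrative).
    Returns a list of arrays with values scaled into [0, 1].
    """
    normalized = []
    for band in images:
        max_val = band.max()          # maximum pixel value of this band image
        normalized.append(band / max_val)
    return normalized

# Toy example: two "bands" of a small image
bands = [np.array([[0.0, 50.0], [100.0, 200.0]]),
         np.array([[10.0, 20.0], [30.0, 40.0]])]
norm = normalize_bands(bands)
print(norm[0].max(), norm[1].max())  # each band now peaks at 1.0
```

After this step every band image lies in [0, 1], so bands with very different dynamic ranges contribute comparably when stacked in step 3.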
Step 3, acquiring the multispectral image matrices.
Stacking the normalized images of all wave bands in the first multispectral image to obtain an image with the size of W1i×H1i×C1The five first multispectral image matrices of (1), wherein W1iRepresenting the width, H, of each band image in the first multispectral image1iRepresenting the height, C, of each band image in the first multispectral image1Number of bands, C, representing a first multispectral image1I is 10, i is the serial number of the multispectral image of the ground object target, i is 1,2,3,4, 5;
stacking the normalized images of each wave band in the second multispectral image to obtain an image with the size of W2i×H2i×C2The five second multispectral image matrices of (1), wherein W2iRepresenting the width, H, of each band image in the second multi-spectral image2iRepresenting the height, C, of each band image in the second multispectral image2Number of bands, C, representing a second multispectral image29, i represents the serial number of the multispectral image of the ground object, and i is 1,2,3,4, 5;
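The stacking of normalized band images into a W × H × C matrix can be sketched as follows (an illustrative Python/NumPy sketch; the function name and toy shapes are assumptions):

```python
import numpy as np

def stack_bands(band_images):
    """Stack C normalized band images of size W x H into a W x H x C matrix."""
    return np.stack(band_images, axis=-1)

# e.g. 9 bands of a 6 x 4 image stack into a 6 x 4 x 9 matrix,
# matching the W2i x H2i x C2 shape of the second multispectral image
bands = [np.zeros((6, 4)) for _ in range(9)]
matrix = stack_bands(bands)
print(matrix.shape)  # (6, 4, 9)
```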
Step 4, acquiring the data sets.
From the first multispectral image matrices of the first four ground object targets, the pixels carrying class labels are selected and divided into 10-channel image pixel blocks with a sliding window of 24 × 24 pixels; 50% of these pixel blocks are selected at random as the training data set D1.
From the second multispectral image matrices of the first four ground object targets, the pixels carrying class labels are selected and divided into 9-channel image pixel blocks with a sliding window of 24 × 24 pixels; 50% of these pixel blocks are selected at random as the training data set D2.
From the first multispectral image matrix of the fifth ground object target, the pixels carrying class labels are selected and divided into 10-channel image pixel blocks with a sliding window of 24 × 24 pixels; these blocks form the test data set T1.
From the second multispectral image matrix of the fifth ground object target, the pixels carrying class labels are selected and divided into 9-channel image pixel blocks with a sliding window of 24 × 24 pixels; these blocks form the test data set T2.
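The sliding-window block extraction described above can be sketched as follows. This is an illustrative Python/NumPy sketch: the function name, the non-overlapping stride of 24 pixels, and the toy image size are assumptions, and the label-based selection and the random 50% sampling are omitted.

```python
import numpy as np

def extract_patches(matrix, patch=24, stride=24):
    """Slide a patch x patch window over a W x H x C image matrix and
    return all complete blocks as an array of shape (N, patch, patch, C)."""
    W, H, C = matrix.shape
    blocks = []
    for x in range(0, W - patch + 1, stride):
        for y in range(0, H - patch + 1, stride):
            blocks.append(matrix[x:x + patch, y:y + patch, :])
    return np.stack(blocks)

img = np.zeros((48, 72, 10))   # toy 10-band image matrix
patches = extract_patches(img)
print(patches.shape)           # (6, 24, 24, 10): a 2 x 3 grid of blocks
```

Each resulting 24 × 24 × C block is one sample fed to the network's 24 × 24 input layer in step 5.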
Step 5, building the deep fusion residual network.
A 31-layer deep residual network is constructed as follows.
The first layer is the input layer; it takes a three-dimensional tensor of size 24 × 24 × 10, and the number of feature maps is set to 3.
The second layer is a convolution layer, a convolutional representation obtained by projecting the input-layer tensor; the number of feature maps is set to 64.
The third to eleventh layers form the first residual block (9 layers); the number of feature maps is set to 64.
The twelfth to fourteenth layers form the second residual block (3 layers); the number of feature maps is set to 128, with a shortcut connection.
The fifteenth to twentieth layers form the third residual block (6 layers); the number of feature maps is set to 128.
The twenty-first to twenty-third layers form the fourth residual block (3 layers); the number of feature maps is set to 256.
The twenty-fourth to twenty-ninth layers form the fifth residual block (6 layers); the number of feature maps is set to 256.
The thirtieth layer is a normalization layer, set to batch normalization.
The thirty-first layer is a pooling layer; the number of feature maps is set to 256.
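The layer budget above sums to the stated 31 layers, and the defining operation of each residual block is the shortcut connection y = relu(F(x) + x). This can be sketched as follows; it is an illustrative Python/NumPy sketch on plain vectors, where the single matrix multiply is only a stand-in for the block's convolution layers, not the patent's actual block.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, weight):
    """Identity-shortcut residual block on a feature vector:
    the block learns a residual F(x) and adds the input back,
    y = relu(F(x) + x), which keeps deep networks trainable."""
    fx = relu(weight @ x)      # stand-in for the block's conv layers
    return relu(fx + x)        # shortcut connection adds the input

# Layer budget of the described network:
# input(1) + conv(1) + blocks(9 + 3 + 6 + 3 + 6) + batch norm(1) + pool(1)
layers = 1 + 1 + 9 + 3 + 6 + 3 + 6 + 1 + 1
print(layers)  # 31

x = np.array([1.0, -2.0, 3.0])
w = np.zeros((3, 3))           # zero weights: F(x) = 0, so y = relu(x)
print(residual_block(x, w))    # [1. 0. 3.]
```

With zero weights the block reduces to the identity (after relu), illustrating why residual layers are easy to optimize: the network only has to learn the deviation from identity.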
The first multispectral image matrices of the multispectral images of the five ground object targets are input into the deep residual network to extract a first feature map, which is vectorized to obtain the first feature vector.
The second multispectral image matrices of the multispectral images of the five ground object targets are input into the deep residual network to extract a second feature map, which is vectorized to obtain the second feature vector.
The two feature vectors are fused to form the feature fusion layer of the deep fusion residual network.
A multi-class Softmax layer is connected after the feature fusion layer to obtain the deep fusion residual network.
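The feature fusion layer followed by the multi-class Softmax layer can be sketched as follows. This is an illustrative Python/NumPy sketch: the toy feature dimensions and the stand-in weight matrix are assumptions (a trained layer would have learned weights); the 17 outputs match the 17 ground object classes used later in the simulation.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the class logits."""
    e = np.exp(z - z.max())
    return e / e.sum()

def fuse_and_classify(c1, c2, weights):
    """Concatenate the two channels' feature vectors and pass the fused
    vector through a softmax layer to get class probabilities."""
    fused = np.concatenate([c1, c2])   # feature fusion layer
    return softmax(weights @ fused)    # multi-class Softmax layer

c1 = np.array([0.2, 0.8])              # toy first-channel features
c2 = np.array([0.5, 0.1, 0.4])         # toy second-channel features
w = np.ones((17, 5))                   # 17 classes, fused dimension 5
p = fuse_and_classify(c1, c2, w)
print(p.shape, round(p.sum(), 6))      # (17,) 1.0
```

The output is a probability distribution over the 17 classes; the predicted class is its argmax.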
Step 6, training the deep fusion residual network.
The training data set D1 is input into the deep residual network for supervised training.
The training data set D2 is input into the deep residual network for supervised training.
The trained deep fusion residual network is then obtained by fusing the feature vectors of the networks produced by the two trainings, as follows:
Step 1: the training data set D1 is input into the trained first-channel deep residual network, which extracts features from D1 to obtain the feature S1.
Step 2: the training data set D2 is input into the trained second-channel deep residual network, which extracts features from D2 to obtain the feature S2.
Step 3: the feature S1 and the feature S2 are fused, the fused features are input into the multi-class Softmax layer, and supervised training is performed to obtain the trained deep fusion residual network.
Step 7, classifying the test data sets.
The test data set T1 is input into the trained deep fusion residual network, and the feature vector C1 is extracted.
The test data set T2 is input into the trained deep fusion residual network, and the feature vector C2 is extracted.
The feature vector C1 and the feature vector C2 are fused, and the fused vector is input into the multi-class Softmax layer of the deep fusion residual network to obtain the final classification result; the classification accuracy is then calculated.
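The classification accuracy computed at the end of step 7 is the fraction of test samples whose predicted class matches the ground-truth label, which can be sketched as follows (an illustrative Python/NumPy sketch; the toy labels are assumptions):

```python
import numpy as np

def classification_accuracy(predicted, truth):
    """Fraction of test blocks whose predicted class equals the label."""
    predicted = np.asarray(predicted)
    truth = np.asarray(truth)
    return float((predicted == truth).mean())

# Toy class labels for 8 test blocks: 6 of 8 match
pred = [3, 1, 1, 0, 2, 2, 4, 4]
true = [3, 1, 0, 0, 2, 2, 4, 1]
print(classification_accuracy(pred, true))  # 0.75
```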
The effect of the invention can be further illustrated by the following simulation experiment:
1. Simulation conditions:
The simulation of the invention was run on a Hewlett-Packard Z840 with 8 GB of memory, in a TensorFlow software environment.
2. Simulation content:
In the simulation experiment of the invention, multispectral image data of the four areas Berlin, hong_kong, sao_paulo, and rome, each captured and imaged by both satellite sentinel_2 and satellite landsat_8, are used as the training data set to train the deep fusion residual network, and multispectral image data of the Paris area is used as the test data set to classify 17 classes of ground objects.
FIG. 2 is the real ground object label map of the Paris area; the ground object categories include dense high-rise buildings, dense mid-rise buildings, dense low-rise buildings, open high-rise buildings, open mid-rise buildings, open low-rise buildings, large low-rise buildings, sparsely distributed buildings, heavy industrial areas, dense forest, scattered trees, shrubs and short trees, low vegetation, bare rock, bare soil and sand, and water.
In simulation experiment 1 of the invention, the method of the invention is used: the multispectral image of the Paris area captured and imaged by satellite landsat_8 and the multispectral image of the Paris area captured and imaged by satellite sentinel_2 are first fused, and the fused multispectral images are then classified. The result is shown in FIG. 3, and the classification accuracies obtained by the three simulation methods are compared in Table 1.
In simulation experiments 2 and 3 of the invention, the prior-art deep residual network classification method is used to classify, respectively, the multispectral image of the Paris area captured and imaged by satellite landsat_8 and the multispectral image of the Paris area captured and imaged by satellite sentinel_2.
3. Simulation result analysis:
FIG. 3 shows the result of classifying the multispectral image of the Paris area using the method of the invention. Comparing the classification result map of FIG. 3 with the real ground object label map of FIG. 2 shows that the classification result obtained by the method of the invention is more accurate than the prior art.
The results of simulation experiments 2 and 3 are shown in Table 1. As Table 1 shows, inputting the multispectral image data captured by the two satellites into the deep fusion residual network to extract features improves the classification accuracy compared with processing the multispectral image data captured by a single satellite in a single-channel network, so the classification result is better than the prior art.
Table 1. Comparison of the classification accuracies obtained in the simulations
| Method | Accuracy |
| --- | --- |
| Method of the invention | 51.12% |
| Single-channel deep residual network (landsat_8 data) | 44.82% |
| Single-channel deep residual network (sentinel_2 data) | 45.63% |
As can be seen from Table 1, inputting the multispectral data of the two satellites into different channels to extract features improves the classification accuracy compared with the single-channel network of the prior art.
In conclusion, by introducing the deep fusion residual network and combining it with feature fusion, the invention extracts high-level multi-temporal, multi-spectral, and multi-directional features of the image, improves the feature representation capability, lets the model learn richer multispectral image features, and obtains better classification accuracy than the prior art.