LU500715B1 - Hyperspectral Image Classification Method Based on Discriminant Gabor Network - Google Patents

Hyperspectral Image Classification Method Based on Discriminant Gabor Network

Info

Publication number
LU500715B1
LU500715B1 · LU500715A
Authority
LU
Luxembourg
Prior art keywords
hyperspectral image
gabor
loss
neural network
convolutional neural
Prior art date
Application number
LU500715A
Other languages
German (de)
Inventor
Kekun Huang
Yongzhu Xiong
Chuanxian Ren
Original Assignee
Univ Sun Yat Sen
Univ Jiaying
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Univ Sun Yat Sen, Univ Jiaying
Priority to LU500715A
Application granted
Publication of LU500715B1

Abstract

The invention discloses a hyperspectral image classification method based on a discriminant Gabor network, which improves the accuracy and efficiency of hyperspectral image classification. The method includes: obtaining original hyperspectral image data; performing a PCA transform on the original hyperspectral image data to obtain several principal components; extracting a data cube centered on each pixel, according to the window size, as a training sample; using a Gabor filter and a convolutional neural network to extract the features of the hyperspectral image; combining cross-entropy loss and triplet hard loss to train the designed convolutional neural network; and classifying new hyperspectral images by the trained network. The invention proposes a new convolutional operator that combines traditional Gabor filters and learnable filters, so that sufficiently deep features can be extracted by a network with fewer trainable parameters, which can be learned from a limited number of training samples.

Description

DESCRIPTION LU500715 Hyperspectral Image Classification Method Based on Discriminant Gabor Network
TECHNICAL FIELD This application relates to the field of hyperspectral image classification, and in particular to a method, apparatus, device and storage medium for classifying hyperspectral image based on Gabor filter and convolutional neural network.
BACKGROUND A hyperspectral image (HSI) contains hundreds of continuous bands in the ultraviolet, visible, and infrared regions, which effectively combine spatial and spectral information. HSI classification, i.e., classifying every pixel with a certain land-cover type, is the cornerstone of HSI analysis. It has a broad range of applications, including land-cover mapping, mineral exploration, water-pollution detection, and the monitoring of natural disasters and biological threats.
Many HSI classification algorithms have been proposed over the past decade, including subspace-based methods. In particular, convolutional neural network (CNN)-based methods are drawing increasing attention. CNN-based methods have shown good results, but still have some problems. First, to learn good features for HSI, the depth and the number of parameters of a CNN must be sufficiently large. However, because of the limited training samples for HSI, a complex network easily overfits. CNNs normally fail to handle large and unknown object transformations when the training data are insufficient. Second, most CNN-based HSI classification methods use only the traditional cross-entropy loss to train the network. However, as some samples have similar spectra but different labels, or vice versa, the cross-entropy loss alone is not enough to learn discriminative features.
On the other hand, Gabor filtering has attracted attention due to its ability to extract edges and textures at different scales and orientations without any training process, and its effectiveness for HSI classification has been shown. Because Gabor filters can extract good convolutional features without training, it is natural to incorporate Gabor filters into CNNs for HSI classification. However, most existing methods only use Gabor features instead of the original pixels as input to the network.
Thus, it can be seen that the existing techniques for hyperspectral image classification suffer from various problems, which lead to lower accuracy and robustness.
SUMMARY The purpose of this application embodiment is to propose a method, device, apparatus and storage medium for hyperspectral image classification based on Gabor filtering and convolutional neural network, to solve the problem that prior methods have lower accuracy and robustness when classifying hyperspectral images, which leads to difficulties in troubleshooting.
In order to solve the above technical problems, this application embodiment provides a hyperspectral image classification method based on Gabor filtering and convolutional neural network, using the technical solutions described below: obtaining original hyperspectral image data; performing a PCA transform on the original hyperspectral image data, obtaining 20 principal components; extracting a data cube as a training sample with each pixel as the center, according to the window size of 27×27; using a Gabor filter and convolutional neural network to extract the features of the hyperspectral image; combining cross-entropy loss and triplet hard loss to train the designed convolutional neural network; classifying new hyperspectral images by the trained network.
Further, the method of using a Gabor filter and convolutional neural network to extract the features of the hyperspectral image comprises: applying two different GMFs to convolute the cube data, including ReLU activation function, max pooling and batch normalization layers;
performing a standard convolution of spatial size 3 × 3; applying two full connection layers; using the output feature of the first full connection to calculate the triplet hard loss; using the output of the softmax layer after the second full connection to calculate the cross-entropy loss; combining the triplet hard loss and the cross-entropy loss to formulate the proposed loss.
Further, the method of applying two different GMFs to convolute the cube data comprises as follows: in the first convolution layer, a GMF with 16 fixed Gabor filters generated by two scales λ ∈ {8, 16} and eight orientations is used to convolute each channel, obtaining data with size 17 × 17 × 320, which is downsampled by max pooling with step 2 × 2, followed by 128 1 × 1 learnable filters and a ReLU activation function, resulting in data with size 9 × 9 × 128 for the next layer. This GMF is called GMF1.
In the second convolution layer, another GMF, called GMF2, with four fixed Gabor filters and one learnable 3 × 3 filter, is used to depthwise convolute the data, obtaining data with size 9 × 9 × 512 and 9 × 9 × 128, respectively. These are then merged into data with size 9 × 9 × 640 and convoluted by 128 1 × 1 learnable filters. After applying batch normalization and a ReLU activation function, data with size 9 × 9 × 128 are obtained for the next layer. In GMF2, the fixed Gabor filters are generated by λ = 8 and four orientations.
Further, the method of performing a standard convolution of spatial size 3 × 3 comprises as follows: applying a convolution with 64 learnable filters of spatial size 3 × 3, followed by batch normalization and a ReLU activation function, resulting in data of size 7 × 7 × 64; reshaping it to 3136 × 1.
Further, the method of applying two full connection layers comprises as follows: applying a full connection to obtain a feature of size 256 × 1, and then applying a full connection of size C × 1 and a softmax activation function to predict the class label, where C is the number of classes.
Further, the method of using the output feature of the first full connection to calculate the triplet hard loss comprises as follows: for each sample x_a, the triplet hard loss picks the most dissimilar sample with the same identity and the most similar sample with a different identity to obtain a triplet, and it suppresses the distance between the selected positive pairs and maximizes the gap between the selected negative pairs:

L_d = (1/N) Σ_{a=1}^{N} h(α + max_p D_ap − min_n D_an)

where D_ap = ‖f(x_a) − f(x_p)‖₂ is the Euclidean distance for the feature-embedding output from the network, (x_a, x_p) is a positive pair, (x_a, x_n) is a negative pair, h(x) = max(0, x) is the hinge loss function, and α = 5.0 is a margin to filter trivial pairs.
Further, the method of using the output of the softmax layer after the second full connection to calculate the cross-entropy loss comprises as follows: the cross-entropy loss can be formulated as:

L_c = −(1/N) Σ_{i=1}^{N} ⟨y_i, log ŷ_i⟩

where y_i is a one-hot vector indicating the true label for training sample x_i, and ŷ_i is the output of the softmax layer connected to the resulting feature for x_i.
Further, the method of combining the triplet hard loss and the cross-entropy loss to formulate the proposed loss comprises as follows: combining the triplet hard loss and the cross-entropy loss to formulate the proposed loss:

L_proposed = L_c + γ L_d

where γ is a given weight to balance the cross-entropy loss and the triplet hard loss. It is verified in experiments that it is easy to select an appropriate value of γ to obtain good performance. When γ = 0 or γ → ∞, the performance will be diminished.
In order to solve the above technical problems, the present application embodiment also provides a computer device employing the technical solution described below: a computer device comprises a GPU, a memory and a processor, the memory storing a computer program, and the processor implementing the steps of the hyperspectral image classification method proposed in the present application embodiment when executing the computer program.
Compared with the prior art, the present application embodiment mainly has the following beneficial effects: it proposes a new convolutional operator combining traditional Gabor filters and learnable filters, so that sufficiently deep features can be extracted by a network with fewer trainable parameters, which can be learned from a limited number of training samples. The Gabor filters can extract common features with different scales and orientations, while the learnable filters can learn some complementary features that Gabor filters cannot extract. It also proposes to introduce the local discriminant structure into the cross-entropy loss by combining the triplet hard loss, so that more discriminative features and an end-to-end system are learned at the same time. With limited training samples, the invention performs significantly better than other state-of-the-art HSI classification methods. Moreover, the invention is fast for both training and testing.
BRIEF DESCRIPTION OF THE FIGURES In order to more clearly illustrate the embodiments in this application, a brief description of the accompanying drawings required to be used in the description of the embodiments of this application will be given below. It will be apparent that the accompanying drawings in the following description are some embodiments of this application, and that other accompanying drawings may be obtained from these drawings without creative labor to a person of ordinary skill in the field.
Fig. 1 is a diagram of the proposed network architecture. First, PCA is applied to the original hyperspectral image and a patch centered on each pixel is extracted to create an input cube for the network. Then, two different GMFs are applied to convolute the data, including ReLU activation function, max pooling and batch normalization layers. After that, a standard convolution of spatial size 3 × 3 and two full connections are applied to predict the class label; note that the feature of size 256 × 1 is used to calculate the triplet hard loss, the output of the softmax layer is used to calculate the cross-entropy loss, and they are combined to formulate the proposed loss.
Fig. 2 is a diagram of the proposed Gabor mixture filter (GMF). In a GMF, the input channels are convoluted by some fixed Gabor filters and some learnable filters, followed by some learnable 1 × 1 filters to obtain the output channels. The Gabor filters can extract common features with different scales and orientations, while the learnable filters can capture some complementary features that Gabor filters cannot extract.
Fig. 3 is the Salinas dataset, including the false-color image, ground truth, and corresponding class names. It was gathered by the Airborne Visible Infrared Imaging Spectrometer (AVIRIS) sensor over Salinas Valley, California. The data comprised 512 × 217 pixels with a spatial resolution of 3.7 m and 204 bands after 20 water-absorption bands were removed. It contained 16 classes, including vegetables, bare soil, and vineyards.
Fig. 4 is the Indian Pines dataset, including the false-color image, ground truth, and corresponding class names. It was acquired by the AVIRIS sensor over the Indian Pines test site in Northwest Indiana. After removing the water-absorption bands, the image consisted of 200 spectral bands with 145 × 145 pixels. Its spectral range was from 0.4 to 2.5 µm with a spatial resolution of 20 m. It contained 16 classes, including alfalfa, corn, oats, wheat and woods.
Fig. 5 shows the classification maps of different methods using 30 training samples per class on the Salinas dataset. It can easily be seen that many regions of the classification map achieved by the invention are clearly more accurate than those of CNN-DR, CNN-Gabor and CNN-Capsule.
Fig. 6 shows the classification maps of different methods using 30 training samples per class on the Indian Pines dataset. It can easily be seen that many regions of the classification map achieved by the invention are clearly more accurate than those of CNN-DR, CNN-Gabor and CNN-Capsule.
Fig. 7 shows training loss vs. iterations for the invention on the Salinas dataset. Here, an iteration denotes the use of a batch of 400 samples to train the network. It can be seen that the loss converges after only 200 iterations on the Indian Pines and Salinas datasets, and 400 iterations on the Houston dataset.
DESCRIPTION OF THE INVENTION Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those of skill in the art to which the present application belongs; the terms used herein in the specification of the application are intended only for the purpose of describing specific embodiments and are not intended to limit the application; the terms "includes" and "has", and any variations thereof, are intended to cover non-exclusive inclusion. The terms "first", "second", etc. in the specification and claims of this application or in the accompanying drawings above are used to distinguish between different objects and are not intended to describe a particular order.
References herein to "embodiments" mean that particular features, structures or characteristics described in connection with the embodiments may be included in at least one embodiment of the present application. The occurrence of the phrase at various points in the specification does not necessarily mean the same embodiment, nor is it a separate or alternative embodiment that is mutually exclusive with other embodiments. It is understood, both explicitly and implicitly, by those of skill in the art that the embodiments described herein may be combined with other embodiments.
In order to provide those in the art with a better understanding of the present application embodiments, the following is a clear and complete description of the technical embodiments in the present application embodiments, in conjunction with the accompanying drawings.
Step 101, obtain original hyperspectral image data; specifically, an airborne spectrometer sensor is used to obtain a hyperspectral image of a certain area, the category labels of some pixels are obtained by field investigation, and the remaining pixels are of unknown category.
Step 102, perform PCA on the original hyperspectral image data, obtaining 20 principal components; specifically, suppose x_i ∈ R^(D×1) is a pixel including all bands of the hyperspectral image. Σ = Σ_i (x_i − m)(x_i − m)^T is the covariance matrix, where m = (1/N) Σ_i x_i. Then the PCA transform matrix is W = [µ_1, µ_2, …, µ_p], where µ_p is the eigenvector of Σ corresponding to the p-th biggest eigenvalue.
Step 103, extract a data cube as a training sample with each pixel as the center, according to the window size of 27×27.
Step 201, define the Gabor mixture filter (GMF); specifically, each channel of the input data is convoluted by some fixed Gabor filters and some learnable filters. Note that depthwise convolution is used here instead of normal convolution. The filtering result contains many channels, so some learnable 1 × 1 convolutions are applied to reduce the dimensions and learn more discriminative features.
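Before formalizing the GMF, a minimal sketch of steps 102 and 103 is given below in NumPy; the function names, the reflection padding at the image border, and the use of NumPy itself are illustrative assumptions rather than part of the patent.

```python
import numpy as np

def pca_reduce(hsi, n_components=20):
    """Step 102: project an (H, W, B) hyperspectral cube onto its
    first n_components principal components."""
    h, w, b = hsi.shape
    x = hsi.reshape(-1, b).astype(np.float64)
    x -= x.mean(axis=0)                        # center each band
    cov = np.cov(x, rowvar=False)              # B x B covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    return (x @ eigvecs[:, order]).reshape(h, w, n_components)

def extract_patches(pcs, window=27):
    """Step 103: yield one (window, window, 20) cube per pixel; border
    pixels are handled by reflection padding (an assumption)."""
    r = window // 2
    padded = np.pad(pcs, ((r, r), (r, r), (0, 0)), mode="reflect")
    h, w, _ = pcs.shape
    for i in range(h):
        for j in range(w):
            yield padded[i:i + window, j:j + window, :]
```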
Suppose x_s^(l) denotes the s-th channel of layer l in the network, and g_k denotes the k-th fixed Gabor filter. Each channel of the input data is convoluted by the fixed Gabor filters as follows:

y_s^(l,1),k = x_s^(l) * g_k    (1)

where the Gabor filter is

g(x, y; λ_k, θ_k, ψ, σ_k, γ) = exp(−(x′² + γ²y′²) / (2σ_k²)) · exp(i(2πx′/λ_k + ψ))    (2)

with x′ = x cos θ_k + y sin θ_k and y′ = −x sin θ_k + y cos θ_k; θ_k is the orientation angle of the Gabor kernel, λ_k is the wavelength of the sinusoidal factor, γ is the spatial aspect ratio, σ_k is the standard deviation of the Gaussian envelope, and ψ = 0 and ψ = −π/2 return the real and imaginary parts of the Gabor filter, respectively. The parameter σ_k is determined by λ_k and the spatial frequency bandwidth b as σ_k = (λ_k/π) · √(ln 2 / 2) · (2^b + 1)/(2^b − 1). For example, different scales λ_k ∈ {8, 16} and different orientations θ_k ∈ {0, π/8, 2π/8, 3π/8, 4π/8, 5π/8, 6π/8, 7π/8} are set, with b = 5 and γ = 1.
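The fixed filter bank of equation (2) can be generated as in the following sketch; the 11 × 11 kernel size is an assumption (it is consistent with a 27 × 27 input shrinking to 17 × 17 under valid convolution in the first layer, described below), as is the π/4 spacing of the GMF2 orientations, and ψ = 0 (real part) is used.

```python
import numpy as np

def gabor_kernel(lam, theta, psi=0.0, gamma=1.0, b=5, size=11):
    """Sampled Gabor filter from equation (2); sigma is derived from
    lambda and the spatial frequency bandwidth b. psi = 0 gives the
    real part, psi = -pi/2 the imaginary part."""
    sigma = lam / np.pi * np.sqrt(np.log(2) / 2) * (2**b + 1) / (2**b - 1)
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    x_r = x * np.cos(theta) + y * np.sin(theta)
    y_r = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_r**2 + (gamma * y_r)**2) / (2 * sigma**2))
    return envelope * np.cos(2 * np.pi * x_r / lam + psi)

# GMF1 bank: two scales x eight orientations = 16 fixed filters.
bank1 = [gabor_kernel(lam, k * np.pi / 8) for lam in (8, 16) for k in range(8)]
# GMF2 bank: lambda = 8 and four orientations (spacing assumed).
bank2 = [gabor_kernel(8, k * np.pi / 4) for k in range(4)]
```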
Supposing w_s^(l),i denotes the i-th learnable filter for x_s^(l), each channel of the input data is convoluted by the learnable filters as follows:

y_s^(l,2),i = x_s^(l) * w_s^(l),i    (3)

Then the convolutional results of the fixed Gabor filters and the learnable filters are merged:

Y^(l) = {∪_k y_s^(l,1),k, ∪_i y_s^(l,2),i}    (4)

There would be many channels in Y^(l), so 1 × 1 convolutions are further applied to reduce the dimension. The t-th channel of the next layer, x_t^(l+1), can be obtained as

x_t^(l+1) = f(Σ_s y_s^(l) * w_(s,t)^(l) + b_t^(l))    (5)

where y_s^(l) is the s-th channel of Y^(l), w_(s,t)^(l) is the learnable 1 × 1 filter applied to y_s^(l) to create output channel x_t^(l+1), b_t^(l) is the bias, and f is the activation function, such as the rectified linear unit f(x) = max(x, 0).
In a GMF, the input channels are convoluted by some fixed Gabor filters and some learnable filters, followed by some learnable 1 × 1 filters to obtain the output channels. The Gabor filters can extract common features with different scales and orientations, while the learnable filters can capture some complementary features that Gabor filters cannot extract.
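A PyTorch sketch of the GMF of equations (1)–(5) follows; the class name and the exact placement of the ReLU are assumptions, and the fixed Gabor kernels are stored as a non-trainable buffer.

```python
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F

class GMF(nn.Module):
    """Gabor mixture filter (Fig. 2): depthwise convolution of every
    input channel with fixed Gabor kernels and, optionally, learnable
    depthwise kernels, then a learnable 1 x 1 convolution."""
    def __init__(self, in_ch, out_ch, gabor_bank, n_learnable=0,
                 k=3, same_pad=True):
        super().__init__()
        g = torch.tensor(np.stack(gabor_bank), dtype=torch.float32)
        # replicate the bank so each input channel sees every kernel
        self.register_buffer("gabor", g.unsqueeze(1).repeat(in_ch, 1, 1, 1))
        self.same_pad = same_pad
        self.learnable = (nn.Conv2d(in_ch, in_ch * n_learnable, k,
                                    padding=k // 2, groups=in_ch)
                          if n_learnable else None)
        mixed = in_ch * (len(gabor_bank) + n_learnable)
        self.pointwise = nn.Conv2d(mixed, out_ch, 1)   # 1 x 1 mixing, eq. (5)

    def forward(self, x):
        pad = self.gabor.shape[-1] // 2 if self.same_pad else 0
        y = F.conv2d(x, self.gabor, padding=pad, groups=x.shape[1])  # eq. (1)
        if self.learnable is not None:
            y = torch.cat([y, self.learnable(x)], dim=1)             # eqs. (3)-(4)
        return F.relu(self.pointwise(y))                             # eq. (5)
```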
Step 202, using a Gabor filter and convolutional neural network to extract the features of the hyperspectral image.
Specifically, in the first convolution layer, a GMF with 16 fixed Gabor filters generated by two scales λ ∈ {8, 16} and eight orientations is used to convolute each channel, obtaining data with size 17 × 17 × 320, which is downsampled by max pooling with step 2 × 2, followed by 128 1 × 1 learnable filters and a ReLU activation function, resulting in data with size 9 × 9 × 128 for the next layer. This GMF is called GMF1.
In the second convolution layer, another GMF, called GMF2, with four fixed Gabor filters and one learnable 3 × 3 filter, is used to depthwise convolute the data, obtaining data with size 9 × 9 × 512 and 9 × 9 × 128, respectively. These are then merged into data with size 9 × 9 × 640 and convoluted by 128 1 × 1 learnable filters. After applying batch normalization and a ReLU activation function, data with size 9 × 9 × 128 are obtained for the next layer. In GMF2, the fixed Gabor filters are generated by λ = 8 and four orientations. In the third convolution layer, a convolution with 64 learnable filters of spatial size 3 × 3 is applied, followed by batch normalization and a ReLU activation function, resulting in data of size 7 × 7 × 64, which is reshaped to 3136 × 1.
After the three convolution layers, a full connection is applied to obtain a feature of size 256 × 1, followed by a full connection of size C × 1 and a softmax activation function to predict the class label, where C is the number of classes.
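Putting the pieces together, a sketch of the full network of Fig. 1 could look as follows, continuing the PyTorch sketch above. Two details are assumptions: the 2 × 2 pooling is applied after GMF1's 1 × 1 stage (the description pools between the Gabor stage and the 1 × 1 filters), and the batch-norm placement; the spatial sizes nevertheless match the description (27 → 17 → 9 → 9 → 7).

```python
class DGMFNet(nn.Module):
    """Sketch of the Fig. 1 architecture: GMF1 (16 fixed Gabor filters,
    valid convolution), 2 x 2 max pooling, GMF2 (4 fixed + 1 learnable
    depthwise filter), a 64-filter 3 x 3 convolution, then FC-256 and
    FC-C."""
    def __init__(self, n_classes, bank1, bank2, in_ch=20):
        super().__init__()
        self.gmf1 = GMF(in_ch, 128, bank1, same_pad=False)   # 27x27 -> 17x17
        self.pool = nn.MaxPool2d(2, ceil_mode=True)          # 17x17 -> 9x9
        self.bn1 = nn.BatchNorm2d(128)
        self.gmf2 = GMF(128, 128, bank2, n_learnable=1)      # stays 9x9
        self.bn2 = nn.BatchNorm2d(128)
        self.conv3 = nn.Conv2d(128, 64, 3)                   # 9x9 -> 7x7
        self.bn3 = nn.BatchNorm2d(64)
        self.fc1 = nn.Linear(7 * 7 * 64, 256)                # 3136 -> 256
        self.fc2 = nn.Linear(256, n_classes)                 # 256 -> C

    def forward(self, x):
        x = self.bn1(self.pool(self.gmf1(x)))
        x = self.bn2(self.gmf2(x))
        x = F.relu(self.bn3(self.conv3(x)))
        emb = self.fc1(x.flatten(1))       # 256-d feature for the triplet loss
        return emb, self.fc2(emb)          # logits; softmax is applied in the loss
```

On a batch of 27 × 27 × 20 input cubes this produces a 256-dimensional embedding and C logits, matching the feature sizes stated above.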
Step 301, using the output feature of the first full connection to calculate the triplet hard loss.
Specifically, for each sample x_a, the triplet hard loss picks the most dissimilar sample with the same identity and the most similar sample with a different identity to obtain a triplet, and it suppresses the distance between the selected positive pairs and maximizes the gap between the selected negative pairs:

L_d = (1/N) Σ_{a=1}^{N} h(α + max_p D_ap − min_n D_an)

where D_ap = ‖f(x_a) − f(x_p)‖₂ is the Euclidean distance for the feature-embedding output from the network, (x_a, x_p) is a positive pair, (x_a, x_n) is a negative pair, h(x) = max(0, x) is the hinge loss function, and α = 5.0 is a margin to filter trivial pairs.
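A batch-hard implementation of this loss, continuing the PyTorch sketch above (the masking details are standard but assumed):

```python
def triplet_hard_loss(emb, labels, margin=5.0):
    """For each anchor: farthest positive, nearest negative, then the
    hinge h(margin + max D_ap - min D_an), averaged over the batch."""
    dist = torch.cdist(emb, emb)                      # pairwise Euclidean distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)
    eye = torch.eye(len(labels), dtype=torch.bool, device=emb.device)
    hardest_pos = (dist * (same & ~eye)).max(dim=1).values
    inf = torch.full_like(dist, float("inf"))
    hardest_neg = torch.where(~same, dist, inf).min(dim=1).values
    return F.relu(margin + hardest_pos - hardest_neg).mean()
```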
Step 302, using the output of the softmax layer after the second full connection to calculate the cross-entropy loss.
Specifically, the cross-entropy loss can be formulated as:

L_c = −(1/N) Σ_{i=1}^{N} ⟨y_i, log ŷ_i⟩

where y_i is a one-hot vector indicating the true label for training sample x_i, and ŷ_i is the output of the softmax layer connected to the resulting feature for x_i.
Step 303, combining the triplet hard loss and the cross-entropy loss to formulate the proposed loss:

L_proposed = L_c + γ L_d

where γ is a given weight to balance the cross-entropy loss and the triplet hard loss. It will be verified in experiments that it is easy to select an appropriate value of γ to obtain good performance. When γ = 0 or γ → ∞, the performance will be diminished.
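In code, continuing the sketch, the combination is a single line; the value γ = 0.1 below is an illustrative assumption, since the patent only states that γ is a tunable weight:

```python
def proposed_loss(emb, logits, labels, gamma=0.1):
    """L_proposed = L_c + gamma * L_d."""
    l_c = F.cross_entropy(logits, labels)     # softmax + cross-entropy, step 302
    l_d = triplet_hard_loss(emb, labels)      # batch-hard triplet term, step 301
    return l_c + gamma * l_d
```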
Step 401, combining cross-entropy loss and triplet hard loss to train the designed convolutional neural network; specifically, to compute the discriminative loss, 400 samples were randomly selected as a training batch. If there were not enough training samples, all of the samples were selected as a training batch. The parameter α was fixed at 5.0 for all of the datasets. Then stochastic gradient descent (SGD) was used with 500 iterations, momentum of 0.99, and weight decay of 0.0001. A base learning rate of 0.001 was set initially; all of the convolutional layers were initialized using zero-mean Gaussian random variables with a standard deviation of √(2/(N_in + N_out)), where N_in is the number of input units and N_out is the number of output units in the weight tensor.
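A training-loop sketch with these settings is shown below; sample_batch is a hypothetical helper that draws 400 random training cubes with labels, and xavier_normal_ is used as the zero-mean Gaussian initializer with standard deviation √(2/(N_in + N_out)).

```python
model = DGMFNet(n_classes=16, bank1=bank1, bank2=bank2)

def init_weights(m):
    if isinstance(m, (nn.Conv2d, nn.Linear)):
        nn.init.xavier_normal_(m.weight)   # std = sqrt(2 / (N_in + N_out))
        if m.bias is not None:
            nn.init.zeros_(m.bias)

model.apply(init_weights)
opt = torch.optim.SGD(model.parameters(), lr=0.001,
                      momentum=0.99, weight_decay=0.0001)

for it in range(500):                      # 500 iterations, batch size 400
    x, y = sample_batch(train_cubes, train_labels, batch_size=400)  # hypothetical
    emb, logits = model(x)
    loss = proposed_loss(emb, logits, y)
    opt.zero_grad()
    loss.backward()
    opt.step()
```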
Step 501, classifying new hyperspectral images by the trained network; specifically, obtaining new hyperspectral image data; performing PCA on the hyperspectral image data, obtaining 20 principal components; extracting a data cube with each pixel as the center according to the preset window size 27×27; using the trained convolutional neural network to output the probability that the sample belongs to each category; the category of the pixel is identified as the category with the greatest probability.
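A minimal inference sketch, where test_cubes is assumed to be a tensor of patches built exactly as in step 103:

```python
model.eval()
with torch.no_grad():
    _, logits = model(test_cubes)          # shape (M, C)
    predicted = logits.argmax(dim=1)       # category with the greatest probability
```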
Step 601, evaluating the algorithm; specifically, the overall accuracy (OA), the average accuracy (AA) over categories, and the Kappa coefficient are used. The overall accuracy is defined as follows:

OA = (number of samples correctly classified) / (total sample size)

AA = (1/n) Σ_{i=1}^{n} OA_i

where OA_i = (number of correctly classified samples of class i) / (sample size of class i). The Kappa coefficient is defined as follows:

kappa = (OA − p_e) / (1 − p_e)

where p_e = Σ_i (actual sample size of class i × predicted sample size of class i) / (total sample size)².
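A NumPy sketch of these three metrics via a confusion matrix (the function name is an assumption):

```python
import numpy as np

def oa_aa_kappa(y_true, y_pred, n_classes):
    """Overall accuracy, average accuracy, and Kappa from step 601."""
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1                                  # rows: actual, cols: predicted
    n = cm.sum()
    oa = np.trace(cm) / n
    aa = (np.diag(cm) / cm.sum(axis=1)).mean()         # mean of per-class OA_i
    p_e = (cm.sum(axis=1) * cm.sum(axis=0)).sum() / n**2
    kappa = (oa - p_e) / (1 - p_e)
    return oa, aa, kappa
```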
From the above description of the implementation, it will be clear to those skilled in the art that the above embodiment method can be implemented with the help of software plus the necessary general hardware platform, or of course by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence, or the part that contributes to the prior art, may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, disk, CD-ROM) and includes a number of instructions to enable a terminal device (which may be a cell phone, computer, server, air conditioner, or network device, etc.) to perform the methods described in the embodiments of the present application.
Obviously, the embodiments described above are only a portion of the embodiments of the present application, not all of them, and the accompanying drawings give a preferred embodiment of the present application, but do not limit the patent scope of the present application.
The present application can be implemented in many different forms; these embodiments are provided for the purpose of providing a thorough and comprehensive understanding of the disclosure of the present application.
Notwithstanding the detailed description of this application with reference to the foregoing embodiments, it is still possible for a person skilled in the art to modify the technical solutions documented in each of the foregoing specific embodiments or to make equivalent substitutions for some of the technical features thereof.
Any equivalent structure made by using the content of this application specification and the accompanying drawings, which is directly or indirectly applied in other related technical fields, is equally within the scope of protection of this application patent.
Specifically, the present invention also provides a specific implementation description of classifying two hyperspectral images, as follows: the first dataset was gathered by the Airborne Visible Infrared Imaging Spectrometer (AVIRIS) sensor over Salinas Valley, California. The data comprised 512 × 217 pixels with a spatial resolution of 3.7 m and 204 bands after 20 water-absorption bands were removed. It contained 16 classes, including vegetables, bare soil, and vineyards. The false-color image, ground truth, and corresponding class names of the Salinas data are shown in Fig. 3.
The second dataset was acquired by the AVIRIS sensor over the Indian Pines test site in Northwest Indiana. After removing the water-absorption bands, the image consisted of 200 spectral bands with 145 × 145 pixels. Its spectral range was from 0.4 to 2.5 µm with a spatial resolution of 20 m. It contained 16 classes, including alfalfa, corn, oats, wheat and woods. The false-color image, ground truth, and corresponding class names of the Indian Pines data are shown in Fig. 4.
The method was compared to other deep learning based methods, including CNN with pixel-pair features (CNN-PPF), diverse region-based CNN (CNN-DR), CNN-C, spectral-spatial unified network (CNN-SSUN), random patches network based on CNN (CNN-RPNet), Gabor-CNN, convolutional capsule network (CNN-Capsule), similarity-based deep metric model (S-DMM), and morphological attribute profile cube and deep random forest (MAPC-DRF).
Table I Classification results of state-of-the-art methods on the Salinas dataset: class-specific accuracy, overall accuracy (OA), average accuracy (AA), and Kappa coefficient for CNN-PPF, CNN-DR, CNN-C, CNN-SSRN, CNN-SSUN, CNN-RPNet, CNN-Gabor, CNN-Capsule, S-DMM, MAPC-DRF, and the proposed DGMF.

Table II Classification results of state-of-the-art methods on the Indian Pines dataset: the same methods and metrics as in Table I.

Tables I and II show the class-specific accuracy, overall accuracy (OA), OA standard deviation (OA std), average accuracy (AA), and Kappa coefficient of the different methods on the Salinas and Indian Pines datasets, respectively. Here, 30 samples per class were randomly selected for training, and the rest were used for testing. If there were not enough samples for a certain class, 75% of its samples were randomly selected for training. From the results, it can be observed that: 1) CNN-PPF attains lower performance, for it only takes one-dimensional spectral information as the input to the CNN and cannot automatically learn spatial-spectral features. CNN-DR cannot improve the performance compared with CNN-PPF when using 30 training samples per class on the Indian Pines dataset. CNN-C achieves better accuracy on the Salinas and Indian Pines datasets, but it cannot improve the performance on the Houston dataset.
2) CNN-SSRN achieves better performance on the Salinas dataset, but its accuracy is not good on the Indian Pines dataset. CNN-SSUN improves the accuracy on the Houston and Indian Pines datasets but not on the Salinas dataset. CNN-RPNet attains good accuracy on the Salinas dataset, but its accuracy is not good on the Indian Pines dataset.
3) CNN-Gabor achieves good performance on the Salinas dataset but not on the Indian Pines dataset. CNN-Capsule improves the accuracy on the Salinas dataset compared with the above methods, but its accuracy is not good on the Indian Pines dataset.
4) S-DMM is not good enough on the Salinas and Indian Pines datasets. MAPC-DRF achieves better performance on the Salinas dataset but not on the Indian Pines dataset.
5) The invention attains the highest overall accuracy and smallest standard deviation for all of the datasets, which demonstrates its effectiveness. It improves the mean overall accuracy by 8.5, 6.0, 3.9, and 22 percent compared with CNN-DR, CNN-Gabor, CNN-Capsule, and MAPC-DRF, respectively. The average accuracy and Kappa coefficient of the invention are also significantly better than those of the other methods.
Table III Comparison of different numbers of training samples per class on the Salinas dataset: overall accuracy of the compared methods and the proposed DGMF.

Table III shows the comparison of different numbers of training samples per class on the Salinas dataset. According to the results, it can be found that the fewer the training samples, the more improvement the invention obtains. With fewer training samples, the accuracy of the invention decreases only slightly, while that of the other methods declines dramatically.
Fig. 5 and Fig. 6 show the classification maps of different methods using 30 training samples per class on the Salinas dataset and the Indian Pines dataset, respectively. It can easily be seen that many regions of the classification maps achieved by the invention are clearly more accurate than those of CNN-DR, CNN-Gabor and CNN-Capsule.
Table VI Comparison of related methods based on Gabor features: the method is compared with original Gabor, Gabor-NRS, Lowrank-Gabor, Gabor-CNN and GFDN. Here, 100 samples per class were randomly selected for training and the rest for testing, and 7 small classes were discarded for the Indian Pines dataset. The baseline method is the nearest-neighbor classifier based on the one-dimensional spectral feature. Table VI compares the overall accuracy and its standard deviation of the related methods. It can be found that the Gabor feature attains much higher accuracy than the baseline method, which shows the superiority of the Gabor feature for HSI classification. Gabor-NRS and Lowrank-Gabor significantly improve the accuracy compared with the plain Gabor feature. Gabor-CNN gets better performance, but not good enough. GFDN achieves good results on the Indian Pines and Salinas datasets, but is not good on the Houston dataset. The proposed DGMF method attains the best results for all of the datasets, which demonstrates its effectiveness.

Table V Comparison of training time (minutes) and testing time (seconds) on the Salinas and Indian Pines datasets for CNN-PPF, CNN-DR, CNN-Gabor, CNN-Capsule, and the proposed DGMF. The experiments were conducted on a single NVIDIA Titan X GPU in the Python language. The testing time is the total time for all of the testing samples. It can be found that the training process of the invention takes only a few minutes (6 minutes on Salinas and 4 minutes on Indian Pines for the proposed DGMF), while CNN-PPF takes a few hours and CNN-DR takes more than half an hour. CNN-Capsule is fast for training, but takes much more testing time due to its complex network. The testing process of the invention is the fastest of all of the compared methods.
Fig. 7 shows the training loss vs. iterations for the invention. Here, an iteration denotes the use of a batch of 400 samples to train the network. It can be found that the loss converges after only 200 iterations on the Indian Pines and Salinas datasets. So, the number of iterations is fixed to 500 for all of the datasets for training.

Claims (9)

CLAIMS
1. A hyperspectral image classification method based on Gabor filtering and convolutional neural network is characterized in that it comprises the steps of: obtaining original hyperspectral image data; performing a PCA transform on the original hyperspectral image data, obtaining 20 principal components; extracting a data cube as a training sample with each pixel as the center according to the window size of 27×27; using a Gabor filter and convolutional neural network to extract the features of the hyperspectral image; combining cross-entropy loss and triplet hard loss to train the designed convolutional neural network; classifying new hyperspectral images by the trained network.
2. The hyperspectral image classification method based on Gabor filtering and convolutional neural network according to claim 1 is characterized in that using a Gabor filter and convolutional neural network to extract the features of the hyperspectral image comprises as follows: applying two different GMFs to convolute the cube data, including ReLU activation function, max pooling and batch normalization layers; performing a standard convolution of spatial size 3×3; applying two full connection layers; using the output feature of the first full connection to calculate the triplet hard loss; using the output of the softmax layer after the second full connection to calculate the cross-entropy loss; combining the triplet hard loss and the cross-entropy loss to formulate the proposed loss.
3. The hyperspectral image classification method based on Gabor filtering and convolutional neural network according to claim 2 is characterized in that applying two different GMFs to convolute the cube data comprises as follows: in the first convolution layer, using a GMF with 16 fixed Gabor filters generated by two scales λ ∈ {8, 16} and eight orientations to convolute each channel, obtaining data with size 17 × 17 × 320, which is downsampled by max pooling with step 2 × 2, followed by 128 1 × 1 learnable filters and a ReLU activation function, resulting in data with size 9 × 9 × 128 for the next layer; this GMF is called GMF1; in the second convolution layer, using another GMF, called GMF2, with four fixed Gabor filters and one learnable 3 × 3 filter, to depthwise convolute the data, obtaining data with size 9 × 9 × 512 and 9 × 9 × 128, respectively; then merging them into data with size 9 × 9 × 640, and convoluting it by 128 1 × 1 learnable filters; after applying batch normalization and a ReLU activation function, getting data with size 9 × 9 × 128 for the next layer; in GMF2, the fixed Gabor filters are generated by λ = 8 and four orientations.
4. The hyperspectral image classification method based on Gabor filtering and convolutional neural network according to claim 2 is characterized in that performing a standard convolution of spatial size 3×3 comprises as follows: applying a convolution with 64 learnable filters of spatial size 3 × 3, followed by batch normalization and a ReLU activation function, resulting in data of size 7 × 7 × 64; reshaping it to 3136 × 1.
5. The hyperspectral image classification method based on Gabor filtering and convolutional neural network according to claim 2 is characterized in that applying two full connection layers comprises as follows: applying a full connection to obtain a feature of size 256 × 1, and then applying a full connection of size C × 1 and a softmax activation function to predict the class label, where C is the number of classes.
6. The hyperspectral image classification method based on Gabor filtering and convolutional neural network according to claim 2 is characterized in that using the output feature of the first full connection to calculate the triplet hard loss comprises as follows: for each sample x_a, the triplet hard loss picks the most dissimilar sample with the same identity and the most similar sample with a different identity to obtain a triplet, and it suppresses the distance between the selected positive pairs and maximizes the gap between the selected negative pairs:

L_d = (1/N) Σ_{a=1}^{N} h(α + max_p D_ap − min_n D_an)

where D_ap = ‖f(x_a) − f(x_p)‖₂ is the Euclidean distance for the feature-embedding output from the network, (x_a, x_p) is a positive pair, (x_a, x_n) is a negative pair, h(x) = max(0, x) is the hinge loss function, and α = 5.0 is a margin to filter trivial pairs.
7. The hyperspectral image classification method based on Gabor filtering and convolutional neural network according to claim 2 is characterized in that using the output of the softmax layer after the second full connection to calculate the cross-entropy loss comprises as follows: the cross-entropy loss can be formulated as:

L_c = −(1/N) Σ_{i=1}^{N} ⟨y_i, log ŷ_i⟩

where y_i is a one-hot vector indicating the true label for training sample x_i, and ŷ_i is the output of the softmax layer connected to the resulting feature for x_i.
8. The hyperspectral image classification method based on Gabor filtering and convolutional neural network according to claim 2 is characterized in that combining the triplet hard loss and the cross-entropy loss to formulate the proposed loss comprises as follows: combining the triplet hard loss and the cross-entropy loss to formulate the proposed loss:

L_proposed = L_c + γ L_d

where γ is a given weight to balance the cross-entropy loss and the triplet hard loss; it is verified in experiments that it is easy to select an appropriate value of γ to obtain good performance; when γ = 0 or γ → ∞, the performance will be diminished.
9. The hyperspectral image classification method based on Gabor filtering and convolutional neural network according to claim 1 is characterized in that classifying new hyperspectral images by the trained network comprises as follows: obtaining new hyperspectral image data; performing PCA on the hyperspectral image data, obtaining 20 principal components; extracting a data cube as a sample with each pixel as the center according to the preset window size 27×27; using the trained convolutional neural network to output the probability that the sample belongs to each category; the category of the pixel is identified as the category with the greatest probability.
LU500715A · 2021-10-08 · Hyperspectral Image Classification Method Based on Discriminant Gabor Network · LU500715B1 (en)

Priority Applications (1)

Application Number · Priority Date · Filing Date · Title
LU500715A · 2021-10-08 · 2021-10-08 · Hyperspectral Image Classification Method Based on Discriminant Gabor Network · LU500715B1 (en)

Applications Claiming Priority (1)

Application Number · Priority Date · Filing Date · Title
LU500715A · 2021-10-08 · 2021-10-08 · Hyperspectral Image Classification Method Based on Discriminant Gabor Network · LU500715B1 (en)

Publications (1)

Publication Number · Publication Date
LU500715B1 (en) · 2022-04-08

Family

ID=80999080

Family Applications (1)

Application Number · Title · Priority Date · Filing Date
LU500715A · Hyperspectral Image Classification Method Based on Discriminant Gabor Network · 2021-10-08 · 2021-10-08

Country Status (1)

Country · Link
LU (1) · LU500715B1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number · Priority date · Publication date · Assignee · Title
WO2023249874A1 (en) * · 2022-06-23 · 2023-12-28 · Longyear Tm, Inc. · Systems and methods for improved sample imaging


Similar Documents

Publication · Publication Date · Title
CN107992891B (en) · Multispectral remote sensing image change detection method based on spectral vector analysis
Bastin · Comparison of fuzzy c-means classification, linear mixture modelling and MLC probabilities as tools for unmixing coarse pixels
Chen et al. · A comprehensive approach to mode clustering
Du et al. · Spatial and spectral unmixing using the beta compositional model
CN107451614B (en) · Hyperspectral Classification Method Based on Fusion of Spatial Coordinates and Spatial Spectral Features
Plaza et al. · Spatial/spectral endmember extraction by multidimensional morphological operations
CN102646200B (en) · Image classifying method and system for self-adaption weight fusion of multiple classifiers
CN109492593B (en) · Hyperspectral image classification method based on principal component analysis network and space coordinates
Kim et al. · Color–texture segmentation using unsupervised graph cuts
CN112101271A (en) · Hyperspectral remote sensing image classification method and device
CN107145836B (en) · Hyperspectral image classification method based on stacked boundary identification self-encoder
CN109766858A (en) · Three-dimensional convolution neural network hyperspectral image classification method combined with bilateral filtering
CN106503739A (en) · The target in hyperspectral remotely sensed image svm classifier method and system of combined spectral and textural characteristics
CN101763507A (en) · Face recognition method and face recognition system
Huang et al. · Hyperspectral image classification via discriminant Gabor ensemble filter
CN114266961A (en) · Method for integrating, learning and classifying marsh vegetation stacks by integrating hyperspectral and multiband fully-polarized SAR images
CN109446894A (en) · The multispectral image change detecting method clustered based on probabilistic segmentation and Gaussian Mixture
CN106503727A (en) · A kind of method and device of classification hyperspectral imagery
CN111191700B (en) · Hyperspectral image dimension reduction method and device based on self-adaptive collaborative image discriminant analysis
CN102663740B (en) · SAR image change detection method based on image cutting
CN109034213B (en) · Method and system for hyperspectral image classification based on correlation entropy principle
CN112489089B (en) · A method for identifying and tracking ground moving targets on the ground of a miniature fixed-wing unmanned aerial vehicle
CN106529563A (en) · High-spectral band selection method based on double-graph sparse non-negative matrix factorization
Mohtashamian et al. · Automated plant species identification using leaf shape-based classification techniques: a case study on Iranian Maples
LU500715B1 (en) · Hyperspectral Image Classification Method Based on Discriminant Gabor Network

Legal Events

Date · Code · Title · Description
FG · Patent granted

Effective date: 20220408

