
Hyperspectral image classification method, hyperspectral image classification device, computer equipment and storage medium

Info

Publication number
CN117372772A
Authority
CN
China
Prior art keywords
training
classification
vit
classification result
test
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311411642.6A
Other languages
Chinese (zh)
Inventor
刘军
郭浩然
贺怡乐
王志辉
彭荧荧
李曼
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan University of Chinese Medicine
Original Assignee
Hunan University of Chinese Medicine
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan University of Chinese Medicine
Priority to CN202311411642.6A
Publication of CN117372772A
Legal status: Pending


Abstract

Translated from Chinese

The present application relates to a hyperspectral image classification method, apparatus, computer device, and storage medium. The method includes: converting the training samples in the training set from three-dimensional images to two-dimensional images to generate a new training set; training multiple ViT variant models on the new training set and, after each epoch, feeding the test set into the models to obtain a classification result for each test sample; for each ViT variant model, performing a majority vote over all classification results of each test sample and taking the result with the most votes as that sample's candidate classification result under that model; and optimizing the weights of the candidate classification results of the ViT variant models with a genetic algorithm, then re-running the majority vote over the candidate classification results to obtain the final classification result. Embodiments of the present application can reduce the dependence between training and test samples and improve classification accuracy.

Description

Translated from Chinese
A hyperspectral image classification method, apparatus, computer device and storage medium

Technical Field

The present application belongs to the technical field of hyperspectral image processing, and in particular relates to a hyperspectral image classification method, apparatus, computer device, and storage medium.

Background Art

Hyperspectral image classification refers to pixel-level classification of a hyperspectral scene, in which the spectral information of each pixel is an important basis for classification. Hyperspectral data processing can be broadly divided into spectral feature extraction and spatial-spectral feature extraction. At present, spectral feature extraction is widely used because of its stronger and clearer electromagnetic response across wavelengths, faster acquisition, and higher signal-to-noise ratio, while spatial features, as an important research component, have attracted increasing attention as the field of machine learning has matured.

In recent years, with the rapid development of deep learning, deep learning algorithms have been applied to supervised hyperspectral image classification. For example, one-dimensional convolutional neural networks (1D CNN) have been used to extract spectral features for hyperspectral image classification, and spatial-spectral fusion based on two-dimensional (2D CNN) and three-dimensional (3D CNN) convolutional neural networks has been used as well. Since semi-supervised and unsupervised learning do not rely entirely on label information for feature learning, they exploit the information in large amounts of unlabeled data to guide model construction. Wu et al. [WU H, PRASAD S. Semi-supervised deep learning using pseudo labels for hyperspectral image classification [J]. IEEE Transactions on Image Processing, 2018, 27(3): 1259-1270.] performed semi-supervised learning by generating pseudo labels; Huang et al. [HUANG B, GE L, CHEN G, et al. Nonlocal graph theory based transductive learning for hyperspectral image classification [J]. Pattern Recognition, 2021, 116: 107967.] used nonlocal graph theory based transductive learning; Li et al. [LI J, BIOUCAS-DIAS J M, PLAZA A. Semisupervised hyperspectral image classification using soft sparse multinomial logistic regression [J]. IEEE Geoscience and Remote Sensing Letters, 2013, 10(2): 318-322.] proposed a soft sparse multinomial logistic regression method for hyperspectral image classification; and Fang et al. [FANG B, LI Y, ZHANG H, et al. Collaborative learning of lightweight convolutional neural network and deep clustering for hyperspectral image semi-supervised classification with limited training samples [J]. ISPRS Journal of Photogrammetry and Remote Sensing, 2020, 161: 164-178.] combined a lightweight convolutional neural network with deep clustering in a collaborative learning scheme, enhancing the learning ability of the algorithm and improving classification accuracy.
In addition to conventional semi-supervised learning, some scholars have proposed the concept of few-shot learning and applied it to hyperspectral image classification. Among the many few-shot learning algorithms, spatial-spectral fusion is a commonly used technique: for a given sample pixel, the pixels in its N×N neighborhood are taken together as one sample, the spatial and spectral features of that sample are extracted and fused, and the result is fed into the designed classification algorithm.

However, existing hyperspectral image classification methods still have the following shortcomings:

1. Traditional spatial-spectral feature fusion algorithms require a large spatial neighborhood window, so training samples may overlap with test samples and therefore fail to be independent.

2. Traditional hyperspectral image classification relies on a single model, so the classification results are not sufficiently stable.

Summary of the Invention

The present application provides a hyperspectral image classification method, apparatus, computer device, and storage medium, aiming to solve, at least to a certain extent, one of the above technical problems in the prior art.

To solve the above problems, the present application provides the following technical solutions:

A hyperspectral image classification method, comprising:

obtaining a training set, a validation set, and a test set of a hyperspectral image;

converting the training samples in the training set from three-dimensional images to two-dimensional images using a spatial shuffle algorithm to generate a new training set;

training multiple ViT variant models based on the new training set and, after each epoch, inputting the test set into the multiple ViT variant models for testing to obtain the classification result of each test sample in the test set under each ViT variant model, wherein the multiple ViT variant models comprise at least two ViT variant models;

for each ViT variant model, performing a majority vote over all classification results of each test sample in the test set, and taking the classification result with the most votes as the candidate classification result of the corresponding test sample in the test set;

optimizing the weights of the candidate classification results of each ViT variant model by a genetic algorithm, re-running the majority vote over the candidate classification results of the test samples, and taking the classification result with the most votes as the final classification result of each test sample.

The technical solution adopted by the embodiments of the present application further includes: the obtaining of the training set, validation set, and test set of the hyperspectral image is specifically:

acquiring a hyperspectral image and, based on the ground truth map of the hyperspectral image, selecting training-sample pixels from each category according to a first set ratio, the remaining pixels serving as test samples;

selecting a set number of pixels from the training samples as validation samples according to a second set ratio, the remaining pixels serving as training samples, thereby obtaining the training set, validation set, and test set.

The technical solution adopted by the embodiments of the present application further includes: the converting of the training samples in the training set from three-dimensional images to two-dimensional images using the spatial shuffle algorithm is specifically:

obtaining the N*N neighborhood of each training sample in the training set to obtain an N*N*B three-dimensional data cube, where B is the number of bands;

keeping the position of the center pixel fixed and randomly shuffling the other pixels in the neighborhood, each shuffle converting the N*N*B three-dimensional data cube into a two-dimensional image of height N*N and width B;

performing the random shuffle operation K/M times for each training sample in each category, where K is the desired total number of samples per category and M is the number of training samples in that category, finally obtaining K two-dimensional images for each category;

after the random shuffle operation has been performed for all categories, obtaining a new training set consisting of C*K two-dimensional images, where C is the total number of training-sample categories in the new training set.

The technical solution adopted by the embodiments of the present application further includes: the performing, for each ViT variant model, of a majority vote over all classification results of each test sample in the test set, with the classification result with the most votes taken as the candidate classification result of the corresponding test sample, is specifically:

for each test sample in the test set, each ViT variant model generates a classification result after training in each epoch; the classification results of each test sample under each ViT variant model across all epochs are collected, a majority vote is performed for each test sample, the category with the most votes is taken as the candidate classification result of that test sample, and the post-voting predicted classification map is generated.

The technical solution adopted by the embodiments of the present application further includes: the optimizing of the weights of the candidate classification results of each ViT variant model by a genetic algorithm, the re-running of the majority vote over the candidate classification results of the test samples, and the taking of the classification result with the most votes as the final classification result of each test sample is specifically:

using the predicted classification maps obtained by the at least two ViT variant models after their respective majority votes as input data;

randomly generating a set number of chromosomes and randomly generating the binary code of each chromosome to form an initial population;

for each chromosome in the current population, first decoding it; then, for each validation sample in the validation set, taking its candidate classification results on the predicted classification maps of the at least two ViT variant models and superimposing them according to the weight of each model; once the weights of all models have been superimposed, following the majority voting strategy and taking the classification result with the most votes as the final classification result of that validation sample; after the majority vote has been completed over the whole validation set, computing the classification accuracy as the fitness function to obtain the fitness value of each chromosome;

selecting the P chromosomes with the largest fitness values and randomly selecting two chromosomes at a time for crossover: a crossover point is chosen at random, and the chromosome segments before and after the crossover point are exchanged between the two chromosomes to form two new chromosomes; this is iterated until enough new chromosomes have been formed to fill the population;

according to the set mutation probability, selecting each chromosome in the current population and flipping the binary digits at several randomly chosen positions to obtain the mutated chromosomes, forming a new population;

computing the fitness of each chromosome in the new population, saving the chromosome with the highest fitness, and taking any chromosome whose fitness exceeds the historical best as the optimal chromosome; the loop continues in this way, and when the number of generations reaches the specified limit, the algorithm ends, the optimal chromosome is output, and the optimal weights are obtained after decoding;

applying the optimal weights to the at least two ViT variant models, voting again on the candidate classification results of the test samples according to the majority voting principle, and outputting the final classification result of each test sample according to the voting result.

Another technical solution adopted by the embodiments of the present application is a hyperspectral image classification apparatus, comprising:

a data acquisition module for obtaining a training set, a validation set, and a test set of a hyperspectral image;

a data conversion module for converting the training samples in the training set from three-dimensional images to two-dimensional images using a spatial shuffle algorithm to generate a new training set;

a model training module for training multiple ViT variant models based on the new training set and, after each epoch, inputting the test set into the multiple ViT variant models for testing to obtain the classification result of each test sample under each ViT variant model, wherein the multiple ViT variant models comprise at least two ViT variant models;

a classification voting module for performing, for each ViT variant model, a majority vote over all classification results of each test sample in the test set, and taking the classification result with the most votes as the candidate classification result of the corresponding test sample;

a classification optimization module for optimizing the weights of the candidate classification results of each ViT variant model by a genetic algorithm, re-running the majority vote over the candidate classification results of the test samples, and taking the classification result with the most votes as the final classification result of each test sample.

The technical solution adopted by the embodiments of the present application further includes: the data conversion module converts the training samples in the training set from three-dimensional images to two-dimensional images using the spatial shuffle algorithm, specifically:

obtaining the N*N neighborhood of each training sample in the training set to obtain an N*N*B three-dimensional data cube, where B is the number of bands;

keeping the position of the center pixel fixed and randomly shuffling the other pixels in the neighborhood, each shuffle converting the N*N*B three-dimensional data cube into a two-dimensional image of height N*N and width B;

performing the random shuffle operation K/M times for each training sample in each category, where K is the desired total number of samples per category and M is the initial number of training samples in that category, finally obtaining K two-dimensional images for each category;

after the random shuffle operation has been performed for all categories, obtaining a new training set consisting of C*K two-dimensional images, where C is the total number of training-sample categories in the new training set.

The technical solution adopted by the embodiments of the present application further includes: the classification voting module performs, for each ViT variant model, a majority vote over all classification results of each test sample in the test set and takes the classification result with the most votes as the candidate classification result of the corresponding test sample, specifically:

for each test sample in the test set, each ViT variant model generates a classification result after training in each epoch; the classification results of each test sample under each ViT variant model across all epochs are collected, a majority vote is performed for each test sample, the category with the most votes is taken as the candidate classification result of that test sample, and the post-voting predicted classification map is generated.

Yet another technical solution adopted by the embodiments of the present application is a computer device comprising a processor and a memory coupled to the processor, wherein:

the memory stores program instructions for implementing the hyperspectral image classification method;

the processor is configured to execute the program instructions stored in the memory to control the hyperspectral image classification method.

Yet another technical solution adopted by the embodiments of the present application is a storage medium storing program instructions executable by a processor, the program instructions being used to execute the hyperspectral image classification method.

Compared with the prior art, the beneficial effects of the embodiments of the present application are as follows. The hyperspectral image classification method, apparatus, computer device, and storage medium of the embodiments use a spatial shuffle algorithm with a small spatial neighborhood window to preprocess the training samples, which reduces the dependence between training and test samples while enlarging the sample size. This greatly alleviates the overfitting caused by insufficient samples in deep learning and extracts more diverse spatial features, which helps improve classification accuracy. The preprocessed training set is then used to train multiple ViT variant models, which are optimized with a genetic algorithm; compared with a single classification model, this ensemble consistently achieves better classification results.

Brief Description of the Drawings

FIG. 1 is a flow chart of the hyperspectral image classification method according to an embodiment of the present application;

FIG. 2 is a flow chart of the genetic-algorithm-based ensemble strategy in an embodiment of the present application;

FIG. 3 is a schematic diagram of the classification results of Experiment 1;

FIG. 4 is a schematic diagram of the classification results of Experiment 2;

FIG. 5 is a schematic structural diagram of the hyperspectral image classification apparatus according to an embodiment of the present application;

FIG. 6 is a schematic structural diagram of the computer device according to an embodiment of the present application;

FIG. 7 is a schematic structural diagram of the storage medium according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. Based on the embodiments of the present application, all other embodiments obtained by persons of ordinary skill in the art without creative work fall within the scope of protection of the present application.

The terms "first", "second", and "third" in the present application are used for descriptive purposes only and shall not be understood as indicating or implying relative importance or implicitly specifying the number of technical features indicated. Thus, a feature qualified by "first", "second", or "third" may explicitly or implicitly include at least one such feature. In the description of the present application, "multiple" means at least two, such as two or three, unless otherwise clearly and specifically defined. All directional indications (such as up, down, left, right, front, back, ...) in the embodiments are used only to explain the relative positional relationships and motion of components in a particular posture (as shown in the drawings); if that posture changes, the directional indication changes accordingly. In addition, the terms "comprising" and "having" and any variations thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or computer device that comprises a series of steps or units is not limited to the listed steps or units, but optionally further includes unlisted steps or units, or other steps or units inherent to the process, method, product, or device.

Reference to an "embodiment" herein means that a particular feature, structure, or characteristic described in conjunction with the embodiment may be included in at least one embodiment of the present application. The appearance of the phrase in various places in the specification does not necessarily refer to the same embodiment, nor to an independent or alternative embodiment mutually exclusive with other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.

Please refer to FIG. 1, which is a flow chart of the hyperspectral image classification method according to an embodiment of the present application. The method comprises the following steps:

S100: acquiring a hyperspectral image and dividing its samples according to set ratios to obtain a training set, a validation set, and a test set;

In this step, the samples of the hyperspectral image are divided as follows: based on the ground truth map corresponding to the hyperspectral image, training-sample pixels are selected from each category according to a first set ratio, and the remaining pixels serve as test samples; then a certain number of pixels are selected from the training samples as validation samples according to a second set ratio, and the remaining pixels serve as training samples, thereby yielding the training set, validation set, and test set. The first and second set ratios include, but are not limited to, 5%, 10%, and so on, and can be set according to the actual application scenario.
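As an illustration of this per-class split, the following Python sketch selects pixel indices from a ground truth map. The function and parameter names are illustrative assumptions; the default ratios mirror the 9%/1%/90% split used in the experiments later in this document.

```python
import numpy as np

def split_samples(gt, train_ratio=0.10, val_fraction=0.10, seed=0):
    """Per-class split of labeled pixels into train/val/test index arrays.

    gt: 2-D ground-truth map, 0 = unlabeled, 1..C = class labels.
    train_ratio: the first set ratio; this fraction of each class forms
        the training pool, the rest becomes the test set.
    val_fraction: the second set ratio; this fraction of the training
        pool is moved to the validation set (0.10 of a 10% pool gives
        the 9%/1%/90% split used in the experiments below).
    """
    rng = np.random.default_rng(seed)
    train, val, test = [], [], []
    for c in np.unique(gt):
        if c == 0:                      # 0 marks unlabeled pixels
            continue
        idx = np.argwhere(gt == c)      # (row, col) positions of class c
        rng.shuffle(idx)
        n_pool = max(2, round(len(idx) * train_ratio))
        pool, rest = idx[:n_pool], idx[n_pool:]
        n_val = max(1, round(n_pool * val_fraction))
        val.append(pool[:n_val])
        train.append(pool[n_val:])
        test.append(rest)
    return (np.concatenate(train), np.concatenate(val), np.concatenate(test))
```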

S110: preprocessing the training samples in the training set with the spatial shuffle algorithm, converting them from three-dimensional images to two-dimensional images, and generating a new training set;

In this step: spatial-spectral feature fusion is a research hotspot in hyperspectral image classification. By processing the pixel information in the N*N neighborhood around the current pixel, the spatial features of the neighborhood are extracted and, together with the spectral features, enable high-accuracy pixel classification. In hyperspectral image classification research, a number of labeled sample pixels are selected, and the pixels within a certain neighborhood window of each labeled pixel are obtained. This raises two questions: how many sample pixels to select and how large a neighborhood window to use. The more sample pixels are selected, the higher the accuracy, but the lower the practicality, since large numbers of labeled pixels are often hard to obtain in real applications. The fewer sample pixels are selected, the harder the classification becomes, and deep learning models tend to overfit. The larger the neighborhood window, the higher the accuracy, but training samples and test samples may then fail to be independent. Choosing the smallest possible neighborhood window can eliminate this situation to a certain extent, but classification accuracy drops because fewer spatial features are available.

To address these problems, the embodiments of the present application preprocess the training samples with a spatial shuffle algorithm. After the neighborhood of each training sample is obtained, a random shuffle strategy scrambles the positions of all pixels except the center pixel; each random shuffle yields a new neighborhood sample, simulating potential pixel distribution patterns that may exist in the real world. For example, a 5*5 neighborhood yields up to 24! ≈ 6.2e+23 new neighborhood samples, which greatly alleviates the overfitting caused by insufficient samples in deep learning and extracts more diverse spatial features, helping to improve classification accuracy.

Further, the preprocessing procedure of the spatial shuffle algorithm in the embodiments of the present application is as follows. First, the N*N neighborhood of each training sample in the training set is obtained, giving an N*N*B three-dimensional data cube, where B is the number of bands. Then, with the position of the center pixel fixed, the other pixels in the neighborhood are randomly shuffled; each shuffle converts the N*N*B cube into a two-dimensional image of height N*N and width B. The random shuffle operation is performed K/M times for each training sample in each category, where K is the desired total number of samples per category (K = 100000 in the embodiments of the present application; the value can be set according to the actual application scenario) and M is the number of training samples in that category, finally yielding K two-dimensional images per category. After the random shuffle has been performed for all categories, a new training set consisting of C*K two-dimensional images is obtained, where C is the total number of training-sample categories.
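A minimal NumPy sketch of one such shuffle, under the shapes described above (the function name and the use of a NumPy generator are illustrative):

```python
import numpy as np

def spatial_shuffle(cube, rng):
    """One spatial-shuffle augmentation of a single training sample.

    cube: (N, N, B) neighborhood cube centered on a labeled pixel.
    Returns an (N*N, B) two-dimensional image in which the center pixel
    keeps its row while all other neighborhood pixels are permuted.
    """
    n, _, b = cube.shape
    flat = cube.reshape(n * n, b)              # one row per neighborhood pixel
    center = (n * n) // 2                      # row index of the center pixel
    others = np.delete(np.arange(n * n), center)
    rng.shuffle(others)                        # permute the non-center rows
    order = np.insert(others, center, center)  # re-insert the center row
    return flat[order]
```

Calling this K/M times for each of the M training samples of a class then yields the K two-dimensional images per category described above.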

S120: training the multiple ViT variant models on the new training set and, after each epoch, inputting the test set into the multiple ViT variant models for testing to obtain the classification result of each test sample in the test set under each ViT variant model;

In this step, the multiple ViT variant models comprise at least two ViT variant models; the embodiments of the present application are described with seven ViT variants as an example. Specifically, the seven ViT variant models are Vision Transformer, SimpleViT, CaiT, DeepViT, ViT with Patch Merger, Learnable Memory ViT, and Adaptive Token Sampling, as follows:

For the Vision Transformer model, the input image is divided into patches of a set size (for example 16x16, configurable according to the actual application scenario), and each patch is fed into an embedding layer that produces a vector (token) for each patch. A classification token, with the same dimension as the other tokens, is prepended to all tokens, and position information is added to the tokens. Finally, all tokens are fed into the Transformer Encoder, which is stacked L times, and the output of the classification token is fed into the MLP Head to obtain the final classification result.

The SimpleViT model uses a batch size of 1024, global average pooling (GAP/GMP, no class token), fixed sin-cos position embeddings, and RandAugment and Mixup data augmentation. The baseline architecture can be further improved with regularizers such as dropout and stochastic depth, SAM optimization, CutMix data augmentation, high-resolution fine-tuning, and knowledge distillation with strong teacher supervision.

For the CaiT model: to address the depth problem, CaiT proposes the following two improvements:

(1) After analyzing the interactions between different initializations, optimizations, and architectures, LayerScale is used to improve the training of deeper architectures. Formally, a learnable diagonal matrix, initialized close to but not exactly zero, is added to the branch of each residual block. Adding this simple layer after each residual block improves training dynamics, enabling deeper, higher-capacity image Transformers to be trained and to benefit from their depth. LayerScale significantly promotes convergence and improves the accuracy of deeper image Transformers while adding only thousands of parameters at training time (negligible relative to the total number of weights).

(2) Class-attention layers, similar to an encoder/decoder architecture, in which the Transformer layers that operate between patches are explicitly separated from class-attention layers dedicated to distilling the content of the processed patches into a single vector that can be fed to a linear classifier. This explicit separation avoids the conflicting objectives that otherwise guide the attention process when the class embedding is processed together with the patches.

For the DeepViT model: as Transformers become deeper, the attention maps gradually become similar or even nearly identical after certain layers; that is, in the top layers of a deep ViT model the feature maps tend to be the same. This indicates that in deeper ViTs the self-attention mechanism fails to learn effective representations and prevents the model from obtaining the expected performance gains. Based on this, DeepViT proposes a simple and effective method, Re-attention, which regenerates the attention maps at negligible computational and memory cost to increase their diversity across layers, making it feasible to train deeper ViT models with consistent performance improvements through minor modifications to existing ViT models. Re-attention stems from the observation that the similarity between the attention maps of different heads in the same block is relatively low even in the upper layers; the embodiments therefore multiply the multi-head attention maps by a learnable transformation matrix to obtain new maps, and the Re-attention layer increases the diversity of attention across layers. The improved ViT-32 achieves a 1.6% improvement on ImageNet-1K.

For the ViT with Patch Merger model: to keep large-scale models practical in real systems, their computational overhead must be reduced. Patch Merger is a simple module that reduces the number of tokens the network processes by merging patches (tokens) between two consecutive intermediate layers. The embodiments of the present application show that Patch Merger achieves significant speedups at different model scales and, after fine-tuning, matches the original performance both upstream and downstream.

For Learnable Memory ViT, the embodiments of the present application augment the Vision Transformer model with learnable memory tokens. This method allows the model to adapt to new tasks with very few parameters, while optionally retaining its capabilities on previously learned tasks. At each layer, a set of learnable embedding vectors called "memory tokens" is introduced to provide contextual information useful for a specific dataset. Compared with conventional head-only fine-tuning, only a small number of tokens per layer are needed to increase model accuracy, performing only slightly below expensive full fine-tuning. The embodiments also propose an attention-masking method that extends to new downstream tasks and reuses computation; besides being parameter-efficient, the model can execute both old and new tasks as part of a single inference at small incremental cost.

For the Adaptive Token Sampling model: a conventional Vision Transformer has high computational cost and a large number of parameters, making it unsuitable for deployment on edge devices. Although GFLOPs can be reduced by reducing the number of tokens in the network, there is no way to set the optimal tokens for different input images. In classification, not all image information is necessary; some pixels are redundant or irrelevant, and relevance depends on the image itself. The Adaptive Token Sampling model proposes an ATS module based on the self-attention matrix that scores tokens and removes redundant input information with minimal information loss, resolving the extra overhead introduced by DynamicViT and reducing the computational cost and parameter count of the Vision Transformer without requiring pre-training. Model accuracy is related to the number of input patches; conventional CNNs use pooling operations that gradually reduce the spatial resolution of the network and hence the accuracy of the model, while static sampling ignores important information or keeps redundant information. The Adaptive Token Sampling module is integrated into the self-attention layer of a Vision Transformer block: it first computes scores from the self-attention weights of the classification token in the self-attention layer, then applies inverse-transform sampling over the scores to select a subset of tokens, and finally soft-downsamples the output tokens to remove redundant information with minimal loss.

The embodiments of the present application train multiple ViT variant models on the preprocessed training set, which achieves consistently better classification results than a single classification model. After each training epoch, each trained model predicts and classifies the test samples to obtain a predicted classification map. The seven models above are only an example; other types of ViT models or deep learning models can be substituted.

S130: for each ViT variant model, performing a majority vote over all classification results of each test sample in the test set, and taking the classification result with the most votes as the candidate classification result of the corresponding test sample;

In this step, for each test sample in the test set, each ViT variant model generates an independent classification result per epoch. The classification results of each test sample across all epochs are collected, and the classification result with the most votes is taken as the candidate classification result of that test sample. After several epochs, a predicted classification map is generated: a majority vote is performed for each test sample, the category with the most votes becomes its candidate classification result, and the post-voting predicted classification map completes the prediction for all test samples.
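A sketch of this per-model vote, assuming the per-epoch predictions of one ViT variant are stacked into an integer array:

```python
import numpy as np

def majority_vote(preds):
    """Majority vote over the per-epoch predictions of one ViT variant.

    preds: (num_epochs, num_test_samples) array of integer class labels.
    Returns, for each test sample, the label it received most often
    across epochs, i.e. its candidate classification result.
    """
    num_classes = preds.max() + 1
    # For every sample (column), count how many epochs voted for each class.
    counts = np.apply_along_axis(
        lambda votes: np.bincount(votes, minlength=num_classes), 0, preds)
    return counts.argmax(axis=0)               # shape: (num_test_samples,)
```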

S140: optimizing the weights of the candidate classification results of each ViT variant model by a genetic algorithm, re-running the majority vote over the candidate classification results of the test samples, and taking the classification result with the most votes as the final classification result of each test sample;

In this step, FIG. 2 shows the flow chart of the genetic-algorithm-based ensemble strategy in an embodiment of the present application. The specific steps are:

Step 1: use the predicted classification maps obtained by the seven models after their respective majority votes as input data.

Step 2: create the initial population. The seven ViT variant models correspond to seven weight parameters, each represented by 6 binary bits, so each weight ranges from 0 to 63 and each chromosome consists of 42 binary bits, with every 6-bit segment corresponding to one of the seven weights. A set number of chromosomes (for example 100) is generated at random, and the binary code of each chromosome is generated at random, forming the initial population.
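A sketch of this encoding, with 7 segments of 6 bits each (helper names are illustrative):

```python
import numpy as np

N_MODELS, BITS = 7, 6          # 7 weights of 6 bits each: 42-bit chromosomes

def random_population(size, rng):
    """Random initial population of binary chromosomes."""
    return rng.integers(0, 2, size=(size, N_MODELS * BITS))

def decode(chrom):
    """Decode a 42-bit chromosome into 7 integer weights in [0, 63]."""
    segments = chrom.reshape(N_MODELS, BITS)
    powers = 2 ** np.arange(BITS - 1, -1, -1)   # most significant bit first
    return segments @ powers
```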

Step 3: fitness computation. Each chromosome in the current population is first decoded, segment by 6-bit segment, into seven decimal weights corresponding to the seven ViT variant models. Then, for each validation sample in the validation set, its candidate classification results on the predicted classification maps of the seven ViT variant models are superimposed according to each model's weight; for example, if the weights of the first and second models are 5 and 15, their classification results for that validation sample are repeated 5 and 15 times respectively. Once the weights of all models have been superimposed, the candidate classification result with the most votes becomes the classification result of that validation sample under the majority voting strategy. After the majority vote has been completed over the whole validation set, the classification accuracy is computed as the fitness function, giving each chromosome a fitness value. The classification accuracy includes, but is not limited to, the Kappa coefficient, overall accuracy, average accuracy, and CSI.
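The weighted vote and the fitness evaluation can be sketched as follows, reusing decode from the sketch above; overall accuracy is used as the fitness here, though the Kappa coefficient, average accuracy, or CSI would fit the same slot:

```python
import numpy as np

def weighted_vote(candidate_maps, weights):
    """Weighted majority vote across models.

    candidate_maps: (n_models, n_samples) candidate labels per model.
    weights: (n_models,) integer weights; each model's vote is counted
    'weight' times, matching the repetition scheme described above.
    """
    n_classes = candidate_maps.max() + 1
    scores = np.zeros((n_classes, candidate_maps.shape[1]))
    for preds, w in zip(candidate_maps, weights):
        scores[preds, np.arange(preds.size)] += w   # w votes per sample
    return scores.argmax(axis=0)

def fitness(chrom, candidate_maps, labels):
    """Accuracy of the weighted vote on the validation set."""
    final = weighted_vote(candidate_maps, decode(chrom))
    return (final == labels).mean()
```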

Step 4: compute the fitness values of all chromosomes in the current population and select the P chromosomes with the largest fitness values for the next step, so that the good characteristics of the parent generation are preserved.

Step 5: from the selected P chromosomes, randomly pick two chromosomes at a time for crossover. A crossover point is chosen at random, and the chromosome segments before and after the crossover point are exchanged between the two chromosomes to form two new chromosomes. This operation is iterated until enough new chromosomes have been formed to fill the population.
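A sketch of the single-point crossover, continuing the NumPy-based sketches above:

```python
import numpy as np

def crossover(parent_a, parent_b, rng):
    """Single-point crossover: swap the segments after a random point."""
    point = rng.integers(1, parent_a.size)    # crossover position, 1..41
    child_a = np.concatenate([parent_a[:point], parent_b[point:]])
    child_b = np.concatenate([parent_b[:point], parent_a[point:]])
    return child_a, child_b
```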

Step 6: according to the set mutation probability, select each chromosome in the current population and flip the binary digits at several randomly chosen positions, turning 0 into 1 or 1 into 0, to obtain the mutated chromosome. Once all chromosomes have been mutated, a new population is formed.
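The mutation step, sketched here with an independent per-bit flip probability; the text above speaks of flipping several randomly chosen positions, and the per-bit Bernoulli formulation is an assumed, commonly used equivalent:

```python
import numpy as np

def mutate(population, p_mut, rng):
    """Bit-flip mutation: each bit flips independently with probability p_mut."""
    flips = rng.random(population.shape) < p_mut
    return np.where(flips, 1 - population, population)
```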

Step 7: compute the fitness of each chromosome in the new population again, save the chromosome with the highest fitness, and take any chromosome whose fitness exceeds the historical best as the optimal chromosome. The loop continues in this way; when the number of generations reaches the specified limit, the algorithm ends and outputs the optimal chromosome, which is decoded to give the optimal weights.
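Putting the pieces together, a compact sketch of the evolution loop with elitist retention of the historically best chromosome (the population size, generation count, mutation rate, and top-half selection are illustrative choices):

```python
import numpy as np

def evolve(candidate_maps, labels, pop_size=100, n_gen=50, p_mut=0.01, seed=0):
    """Genetic-algorithm search for the optimal voting weights."""
    rng = np.random.default_rng(seed)
    pop = random_population(pop_size, rng)
    best, best_fit = None, -1.0
    for _ in range(n_gen):
        fits = np.array([fitness(c, candidate_maps, labels) for c in pop])
        if fits.max() > best_fit:                 # keep the historical best
            best, best_fit = pop[fits.argmax()].copy(), fits.max()
        elite = pop[np.argsort(fits)[-pop_size // 2:]]   # P fittest parents
        children = []
        while len(children) < pop_size:
            a, b = elite[rng.integers(0, len(elite), 2)]
            children.extend(crossover(a, b, rng))
        pop = mutate(np.array(children[:pop_size]), p_mut, rng)
    return decode(best)                           # the optimal weights
```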

Step 8: apply the optimal weights to the seven ViT variant models, vote again on the candidate classification results of the test samples according to the majority voting principle, and output the final classification results of the test samples according to the voting results.

Based on the above, the hyperspectral image classification method of the embodiments of the present application first preprocesses the training samples with the spatial shuffle algorithm using a small spatial neighborhood window, which minimizes the overlap between training and test samples while enlarging the sample size; this greatly alleviates the overfitting caused by insufficient samples in deep learning and extracts more diverse spatial features, helping to improve classification accuracy. The preprocessed training set is then used to train multiple ViT variant models, which are optimized with a genetic algorithm, yielding consistently better classification results.

To verify the feasibility and effectiveness of the embodiments of the present application, the following experiments were conducted on two publicly available hyperspectral datasets, Salinas Valley (SV) and University of Pavia (UP), using four traditional machine learning algorithms (multinomial logistic regression (MLR), support vector machines (SVM), extreme learning machines (ELM), and random forests (RF)), two convolutional neural network models (CNN2D and PPF), and seven ViT-based algorithms (ViT, SimpleViT, CaiT, DeepViT, ViT with Patch Merger, Learnable Memory ViT, and Adaptive Token Sampling), together with the pixel-wise and patch-wise modes of SpectralFormer. The method of the embodiments of the present application is named GAEns. The system platform used in the experiments is Ubuntu 16.04.1 LTS, the deep learning library is PyTorch 1.0, and the Python version is 3.6. The GPU is an NVIDIA TITAN XP with 12 GB of memory. The parameter settings of each algorithm are as follows:

MLR, SVM, and RF are implemented with the scikit-learn machine learning library for Python; ELM is implemented with the scikit-elm library for Python, using default parameters.

CNN2D uses a 5×5 patch size; its network structure follows the original paper: two 3×3 convolutional layers, each followed by a BN layer and a ReLU layer, connected to a fully connected layer for pixel-level classification. PPF also uses a 5×5 patch size, with the network structure of the original paper.

The seven ViT algorithms use the same backbone hyperparameters: dimension dim = 512, number of Transformer blocks depth = 6, number of heads in multi-head attention heads = 16, number of hidden-layer neurons in the multi-layer perceptron mlp_dim = 1024, dropout probability and embedding dropout probability both set to 0.1, patch size 12, and number of input channels channels = 1. The input is the 25×B image produced by the spatial shuffle with a 5×5 patch size, where B is the total number of original bands of the dataset. The original ViT paper recommends a patch size of 14 or 16; since the 25×B input image should split evenly into patchsize×patchsize blocks, a patch size of 12 is chosen, the 25×B input is resampled to 25×96, and the last row is discarded to obtain a 24×96 image as the final input. This loses a little input information, but since the University of Pavia dataset used in the experiments has only 103 bands while the other datasets have more than 103, the input images are resampled to 96 bands so that the same network architecture can be used everywhere while using as much of the data as possible. This setting can be adjusted for the actual usage scenario. The seven ViT variants have some specific hyperparameters: in CaiT, the depth of cross-attention of CLS tokens to patches, cls_depth, is set to 2 and layer_dropout to 0.05; in ViT with Patch Merger, patch_merge_layer and patch_merge_num_tokens are set to 6 and 8 respectively; in ATS, max_tokens_per_depth is set to (256, 128, 64, 32, 16, 8), a tuple that denotes the maximum number of tokens that any given layer should have (if a layer has more than this amount, it will undergo adaptive token sampling). In SpectralFormer, the patch size in patch-wise mode is set to 7, consistent with the original paper; results for a patch size of 5 are also given.
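The backbone hyperparameter names above match the vit-pytorch library, so the following sketch assumes that library (and that the installed version accepts a (height, width) tuple for image_size); the num_classes value is illustrative:

```python
import torch
from vit_pytorch import ViT  # assumed source of the seven ViT variants

model = ViT(
    image_size=(24, 96),   # spatially shuffled 5x5 neighborhood, 96 bands
    patch_size=12,         # 24x96 splits evenly into 12x12 patches
    num_classes=16,        # e.g. the 16 classes of the SV dataset below
    dim=512,
    depth=6,               # number of Transformer blocks
    heads=16,              # heads in multi-head attention
    mlp_dim=1024,          # hidden neurons of the multi-layer perceptron
    dropout=0.1,
    emb_dropout=0.1,       # dropout applied during the embedding step
    channels=1,            # single-channel two-dimensional input
)

x = torch.randn(4, 1, 24, 96)   # a dummy batch of four shuffled samples
logits = model(x)               # (4, 16) class scores
```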

Experiment 1:

The SV dataset is also acquired by the AVIRIS sensor. The image is 512×217 pixels with 204 effective bands and a spatial resolution of 3.7 m, and 16 land-cover classes are labelled. Table 1 below lists the names of the 16 classes together with the sample counts for the 9% training, 1% validation, and 90% test splits.

Table 1: SV dataset

The dataset is classified with all of the methods; the results of Experiment 1 are shown in Figure 3, where (a) is the original HSI; (b) ground truth; (c) MLR; (d) SVM; (e) RF; (f) ELM; (g) CNN2D; (h) PPF; (i) SF_pixel; (j) SF_patch5; (k) SF_patch7; (l) ViT; (m) SimpleViT; (n) CaiT; (o) DeepViT; (p) ViTPM; (q) LMViT; (r) ATSViT; (s) GAEns. As Figure 3 shows, the four machine learning methods and SF_pixel misclassify a large part of class 15, Vinyard-untrained, confusing it badly with class 8, Grapes-untrained. The patch-based SF methods and the spatial shuffle-based ViT methods mainly err by assigning class 8 to class 15, but to a much smaller degree than the four machine learning methods and SF_pixel. The CNN-based CNN2D and PPF also outperform the machine learning methods on these two classes. On the remaining classes all methods perform similarly, with the machine learning methods still showing some misclassification.

The objective evaluation results are given in Tables 2 and 3 below. The accuracy of the four machine learning methods and SF_pixel on class 15 is far below that of the other methods, and their accuracy on class 8 is also relatively low, consistent with the subjective visual evaluation. In terms of OA and the Kappa coefficient, the ViT-based methods using spatial shuffle are best overall, followed by CNN2D and PPF, and the patch-based SpectralFormer clearly outperforms the pixel-based SpectralFormer. Among all ViT variant methods, the GAEns ensemble strategy of the present embodiments is more accurate than any single ViT method, showing that the ensemble offers a consistent advantage.
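For reference, OA and the Kappa coefficient can be computed from predicted and true test labels as in the standard scikit-learn sketch below; the random stand-in labels are ours, not experimental data.

import numpy as np
from sklearn.metrics import accuracy_score, cohen_kappa_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 16, size=1000)                  # stand-in labels
y_pred = np.where(rng.random(1000) < 0.9, y_true,        # ~90% agreement
                  (y_true + 1) % 16)

print("OA:",    accuracy_score(y_true, y_pred))          # overall accuracy
print("Kappa:", cohen_kappa_score(y_true, y_pred))       # chance-corrected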

Table 2: Experiment 1 objective evaluation results 1

Table 3: Experiment 1 objective evaluation results 2

Experiment 2:

The UP dataset was acquired by the Reflective Optics System Imaging Spectrometer (ROSIS) sensor over part of the University of Pavia campus. The image is 610×340 pixels with 103 effective bands, a spatial resolution of 1.3 m, and a spectral range of 430-860 nm; nine land-cover classes are labelled in the ground truth. 9% of the samples are used for training, 1% for validation, and 90% for testing; the class names and sample counts are listed in Table 4 below:

Table 4: UP dataset

Figure 4 shows the classification maps of Experiment 2, where (a) is the original HSI; (b) ground truth; (c) MLR; (d) SVM; (e) RF; (f) ELM; (g) CNN2D; (h) PPF; (i) SF_pixel; (j) SF_patch5; (k) SF_patch7; (l) ViT; (m) SimpleViT; (n) CaiT; (o) DeepViT; (p) ViTPM; (q) LMViT; (r) ATSViT; (s) GAEns. As Figure 4 shows, the classification errors concentrate in classes 6 and 7. As with the SV dataset, the four machine learning algorithms and SF_pixel misclassify these two classes severely, while the CNN- and ViT-based methods do comparatively well; in particular, the ViT-based methods with the spatial shuffle algorithm have a lower misclassification rate than the SpectralFormer-based methods, again a consistent advantage.

The objective metrics in Tables 5 and 6 below reflect the subjective evaluation: the four machine learning algorithms and SF_pixel have very low accuracy on class 6 and do somewhat better on class 7, while the other methods perform better on both; for example, CNN2D and the spatial shuffle ViT-based methods and their ensemble all exceed 99% accuracy on these classes. The GAEns ensemble strategy of the present embodiments is again more accurate than every single ViT method, confirming its effectiveness.

Table 5: Experiment 2 objective evaluation results 1

Table 6: Experiment 2 objective evaluation results 2

Referring to Figure 5, a schematic diagram of a hyperspectral image classification apparatus according to an embodiment of the present application: the apparatus 40 comprises:

Data acquisition module 41: acquires the training, validation, and test sets of the hyperspectral image;

Data conversion module 42: converts the training samples in the training set from three-dimensional images to two-dimensional images with the spatial shuffle algorithm, generating a new training set;

Model training module 43: trains the multiple ViT variant models on the new training set and, after each epoch, feeds the test set to the models, obtaining the classification result of every test sample under every ViT variant model;

Classification voting module 44: for each ViT variant model, takes a majority vote over all classification results of each test sample in the test set and keeps the result with the most votes as that sample's candidate classification result under that model;

Classification optimization module 45: optimizes, with a genetic algorithm, the weights of the candidate classification results of the individual ViT variant models, re-runs the majority vote over the candidate results, and takes the result with the most votes as the final classification result of the test sample (a sketch of this weighted vote follows below).
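A minimal numpy sketch of the weighted vote that modules 44 and 45 cooperate on is given below; the genetic algorithm itself is represented only by its fitness function (validation accuracy of the weighted vote), and all names are illustrative rather than the application's actual code.

import numpy as np

def weighted_majority_vote(candidate_labels, weights, num_classes):
    # candidate_labels: (M, N) candidate label per ViT variant and sample;
    # weights: (M,) non-negative weight per variant. For each sample, the
    # class backed by the largest total weight wins.
    M, N = candidate_labels.shape
    scores = np.zeros((num_classes, N))
    for m in range(M):
        scores[candidate_labels[m], np.arange(N)] += weights[m]
    return scores.argmax(axis=0)

def fitness(weights, candidate_labels, y_val, num_classes):
    # GA fitness: accuracy of the weighted vote on the validation samples.
    pred = weighted_majority_vote(candidate_labels, weights, num_classes)
    return float((pred == y_val).mean())

# Toy usage: 7 variants, 5 samples, 3 classes, uniform weights.
labels = np.vstack([np.tile([0, 1, 2, 1, 0], (4, 1)),
                    np.tile([2, 1, 0, 1, 0], (3, 1))])
print(weighted_majority_vote(labels, np.ones(7), num_classes=3))  # [0 1 2 1 0]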

It should be noted that, since the information exchange and execution between the above devices/units rest on the same concept as the method embodiments of the present application, their specific functions and technical effects can be found in the method embodiment section and are not repeated here.

The apparatus provided in the embodiments of the present application can be applied in the foregoing method embodiments; for details, see the description of those embodiments, which is not repeated here.

Referring to Figure 6, a schematic diagram of a computer device according to an embodiment of the present application: the computer device 50 comprises:

a memory 51 storing executable program instructions; and

a processor 52 connected to the memory 51.

The processor 52 calls the executable program instructions stored in the memory 51 and performs the following steps: acquiring the training, validation, and test sets of hyperspectral images; converting the training samples in the training set from three-dimensional images to two-dimensional images with the spatial shuffle algorithm to generate a new training set; training the multiple ViT variant models (comprising at least two ViT variant models) on the new training set and, after each epoch, feeding the test set to the models to obtain the classification result of every test sample under every ViT variant model; for each ViT variant model, taking a majority vote over all classification results of each test sample and keeping the result with the most votes as the candidate classification result of that sample; and optimizing, with a genetic algorithm, the weights of the candidate classification results of the ViT variant models and re-running the majority vote over the candidate results to obtain the final classification result.
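To picture the 3D-to-2D conversion, the spatial shuffle step can be read as flattening each w×w×B neighbourhood into a (w², B) image whose row order is randomly permuted; the numpy sketch below is our reading of that step under this assumption, not the application's code.

import numpy as np

def spatial_shuffle(patch, rng):
    # patch: (w, w, B) neighbourhood around a labelled pixel. Returns a
    # (w*w, B) 2D image whose spatial positions (rows) are randomly
    # permuted, weakening the fixed spatial layout of the neighbourhood.
    w, _, B = patch.shape
    flat = patch.reshape(w * w, B)
    return flat[rng.permutation(w * w)]

rng = np.random.default_rng(0)
img2d = spatial_shuffle(np.random.rand(5, 5, 103), rng)
print(img2d.shape)                                    # (25, 103)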

The processor 52 may also be called a CPU (Central Processing Unit). It may be an integrated circuit chip with signal processing capability, or a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. A general-purpose processor may be a microprocessor or any conventional processor.

Referring to Figure 7, a schematic diagram of a storage medium according to an embodiment of the present application: the storage medium stores program instructions 61 capable of implementing the following steps: acquiring the training, validation, and test sets of hyperspectral images; converting the training samples in the training set from three-dimensional images to two-dimensional images with the spatial shuffle algorithm to generate a new training set; training the multiple ViT variant models (comprising at least two ViT variant models) on the new training set and, after each epoch, feeding the test set to the models to obtain the classification result of every test sample under every ViT variant model; for each ViT variant model, taking a majority vote over all classification results of each test sample and keeping the result with the most votes as the candidate classification result; and optimizing the weights of the candidate classification results with a genetic algorithm and re-running the majority vote to obtain the final classification result. The program instructions 61 may be stored in the above storage medium as a software product comprising instructions that cause a computer device (a personal computer, a server, a network computer device, etc.) or a processor to execute all or part of the steps of the methods of the embodiments of the present application. The storage medium includes media capable of storing program instructions, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc, as well as terminal computer devices such as computers, servers, mobile phones, and tablets. The server may be an independent server or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (CDN), and big data and artificial intelligence platforms.

In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the system embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in actual implementation; multiple units or components may be combined or integrated into another system, and some features may be omitted or not executed. Furthermore, the mutual coupling, direct coupling, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, apparatuses, or units, and may be electrical, mechanical, or of other forms.

In addition, the functional units in the embodiments of this application may be integrated into one processing unit, may exist physically as separate units, or two or more units may be integrated into one unit. The integrated unit may be implemented in hardware or as a software functional unit. The above are only embodiments of this application and do not limit its patent scope; any equivalent structural or process transformation made using the contents of the specification and drawings, applied directly or indirectly in other related technical fields, is likewise included within the scope of patent protection of this application.

Claims (10)

CN202311411642.6A (filed 2023-10-27): Hyperspectral image classification method, hyperspectral image classification device, computer equipment and storage medium (status: Pending)


Publications (1)

CN117372772A (en), published 2024-01-09

Family

ID=89394438


Cited By (1)

* Cited by examiner, † Cited by third party

CN119068068A* (priority 2024-10-30, published 2024-12-03), Institute of Quality Standards and Testing Technology, Yunnan Academy of Agricultural Sciences: Corn ear DUS trait identification method, device, equipment and storage medium


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
