Multi-modal ship image classification method based on multi-sequence Mamba
Technical Field
The invention belongs to the field of data identification, and particularly relates to a ship image identification method.
Background
The classification of ship images is one of the key technologies in the field of remote sensing image processing and analysis, and focuses on automatically classifying and identifying the scenes contained in ship images. With the rapid development of remote sensing technology, large amounts of high-resolution ship image data can be acquired, covering scenes such as cities, forests, farmland and oceans. Classifying these data is of great significance for resource management, environmental monitoring, disaster response and other applications.
Traditional ship classification methods rely mainly on manually designed features, which limits both accuracy and efficiency. In recent years, with the development of deep learning, convolutional neural networks (CNNs) have shown strong automatic feature learning capability in ship image classification tasks and have markedly improved classification accuracy and efficiency. However, existing methods still extract insufficient features from complex ship images, and their accuracy remains low.
Disclosure of Invention
The invention provides a multi-modal ship image classification method based on multi-sequence Mamba, which aims to solve the problems of insufficient feature extraction and low accuracy in existing classification methods.
The technical scheme of the invention is as follows:
A multi-modal ship image classification method based on multi-sequence Mamba inputs a natural light image and an infrared image of the same ship into a multi-modal classification model simultaneously to obtain a classification result;
the multi-modal classification model comprises a sequence conversion module, a cross-attention Mamba calculation module, an alternate-traversal Mamba calculation module, a spectral-spatial state fusion module and a classification module;
the sequence conversion module converts the input natural light image and the input infrared image into a corresponding natural light token sequence and infrared token sequence respectively;
the cross-attention Mamba calculation module obtains two groups of first enhancement features based on the natural light token sequence and the infrared token sequence;
the alternate-traversal Mamba calculation module obtains two groups of second enhancement features based on the two groups of first enhancement features;
the spectral-spatial state fusion module obtains a fusion feature based on the two groups of second enhancement features;
and the classification module obtains the classification result based on the fusion feature.
As a further improvement of the multi-modal ship image classification method based on multi-sequence Mamba, the sequence conversion module is an image block embedding module;
the natural light token sequence is obtained by applying the convolution operation of the sequence conversion module to the input natural light image and passing the result through the activation function of the sequence conversion module;
the infrared token sequence is obtained in the same way from the input infrared image.
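For example, denoting the input natural light image by $X_v$, the input infrared image by $X_r$, the activation function by $\sigma$ and the convolution operation by $\mathrm{Conv}$ (notation introduced here purely for illustration), the two token sequences may be written as:

$$S_v = \sigma\big(\mathrm{Conv}(X_v)\big), \qquad S_r = \sigma\big(\mathrm{Conv}(X_r)\big).$$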
As a further improvement of the multi-modal ship image classification method based on multi-sequence Mamba, given the natural light token sequence and the infrared token sequence, the processing procedure of the cross-attention Mamba calculation module is as follows:
step A-1, calculating a cross-attention score for each of the two token sequences, using the normalization operation, the activation function and the first linear layer of the cross-attention Mamba calculation module;
step A-2, performing forward-order SSM calculation on the two token sequences, using the normalization operation, the second linear layer of the cross-attention Mamba calculation module and the forward-order structured state space model;
step A-3, arranging the two token sequences in reverse order and performing reverse-order SSM calculation on the reversed sequences, using the second linear layer of the cross-attention Mamba calculation module and the reverse-order structured state space model;
step A-4, calculating the two groups of first enhancement features according to the cross-attention scores, the forward-order SSM calculation results and the reverse-order SSM calculation results.
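For illustration, with $S_v$ and $S_r$ the two token sequences, $\mathrm{Norm}$ the normalization operation, $\sigma$ the activation function, $\mathrm{Linear}_1$ and $\mathrm{Linear}_2$ the two linear layers, and $\mathrm{SSM}_f$ and $\mathrm{SSM}_b$ the forward-order and reverse-order structured state space models, steps A-1 to A-4 may for example take the following form, in which the cross-wise computation of the scores and the elementwise combination in step A-4 are illustrative assumptions:

$$A_v = \sigma\big(\mathrm{Linear}_1(\mathrm{Norm}(S_r))\big), \qquad A_r = \sigma\big(\mathrm{Linear}_1(\mathrm{Norm}(S_v))\big)$$
$$U_v = \mathrm{SSM}_f\big(\mathrm{Linear}_2(\mathrm{Norm}(S_v))\big), \qquad U_r = \mathrm{SSM}_f\big(\mathrm{Linear}_2(\mathrm{Norm}(S_r))\big)$$
$$V_v = \mathrm{SSM}_b\big(\mathrm{Linear}_2(\bar{S}_v)\big), \qquad V_r = \mathrm{SSM}_b\big(\mathrm{Linear}_2(\bar{S}_r)\big)$$
$$F_v = A_v \odot (U_v + V_v), \qquad F_r = A_r \odot (U_r + V_r)$$

where $\bar{S}_v$ and $\bar{S}_r$ are the reversed sequences, the reverse-order outputs are restored to the original order before the combination, and $F_v$, $F_r$ denote the two first enhancement features.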
As a further improvement of the multi-modal ship image classification method based on multi-sequence Mamba, given the two groups of first enhancement features output by the cross-attention Mamba calculation module, the processing procedure of the alternate-traversal Mamba calculation module is as follows:
step B-1, performing traversal ordering on the two groups of first enhancement features to obtain a combined feature sequence;
step B-2, performing attention score calculation on the combined feature sequence, using the normalization operation, the activation function and the first linear layer of the alternate-traversal Mamba calculation module;
step B-3, performing feature extraction on the combined feature sequence through the forward-order SSM and the reverse-order SSM respectively, using the second linear layer of the alternate-traversal Mamba calculation module, the forward-order structured state space model and the reverse-order structured state space model;
step B-4, calculating an enhanced feature according to the attention score and the two SSM feature-extraction results;
step B-5, splitting the enhanced feature into the two groups of second enhancement features by reversing the traversal order used in step B-1.
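For illustration, writing the two first enhancement features as $F_v$ and $F_r$ and their token-wise interleaving as $G$ (the interleaved traversal order and the elementwise combination below are illustrative assumptions), steps B-1 to B-5 may take the form:

$$G = \big[F_v^{(1)}, F_r^{(1)}, F_v^{(2)}, F_r^{(2)}, \ldots\big]$$
$$H = \sigma\big(\mathrm{Linear}_1(\mathrm{Norm}(G))\big)$$
$$P = \mathrm{SSM}_f\big(\mathrm{Linear}_2(\mathrm{Norm}(G))\big), \qquad Q = \mathrm{SSM}_b\big(\mathrm{Linear}_2(\mathrm{Norm}(\bar{G}))\big)$$
$$E = H \odot (P + Q)$$

where $\bar{G}$ is $G$ in reverse order and $Q$ is restored to the original order before the combination; de-interleaving $E$ yields the two second enhancement features $\hat{F}_v$ and $\hat{F}_r$.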
As a further improvement of the multi-modal ship image classification method based on multi-sequence Mamba, given the two groups of second enhancement features output by the alternate-traversal Mamba calculation module, the spectral-spatial state fusion module fuses them into a fusion feature using a normalization operation, a first fully connected layer, a second fully connected layer, an activation function and a probability output function.
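For illustration, with $\hat{F}_v$ and $\hat{F}_r$ the two second enhancement features, $\mathrm{FC}_1$ and $\mathrm{FC}_2$ the two fully connected layers and $\mathrm{Softmax}$ the probability output function, one possible form of the fusion (the additive pooling of the two features and the weighted-sum fusion rule are illustrative assumptions) is:

$$z = \sigma\big(\mathrm{FC}_1(\mathrm{Norm}(\hat{F}_v + \hat{F}_r))\big)$$
$$w = \mathrm{Softmax}\big(\mathrm{FC}_2(z)\big)$$
$$F_{\mathrm{fuse}} = w_1 \odot \hat{F}_v + w_2 \odot \hat{F}_r$$

where $F_{\mathrm{fuse}}$ denotes the fusion feature.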
As a further improvement of the multi-modal ship image classification method based on multi-sequence Mamba, the classification module passes the fusion feature obtained by the spectral-spatial state fusion module through the fully connected layer of the classification module and a probability output function to obtain a probability vector; each element of this vector represents the probability that the ship in the currently input images belongs to the corresponding category, and the category with the highest probability is taken as the classification result.
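For illustration, with $\mathrm{FC}$ the fully connected layer of the classification module and $\mathrm{Softmax}$ the probability output function, the classification step may be written as:

$$y = \mathrm{Softmax}\big(\mathrm{FC}(F_{\mathrm{fuse}})\big)$$

where the element of $y$ with the highest value indicates the predicted ship category.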
As a further improvement of the multi-modal ship image classification method based on multi-sequence Mamba, the training process of the multi-modal classification model is as follows:
constructing a training set in which the i-th sample consists of a natural light image, an infrared image and the true category of the ship, the natural light image and the infrared image having the same height, width and number of channels and showing the same ship, and Z denoting the number of samples;
inputting the samples of the training set into the multi-modal classification model, computing a cross-entropy loss between the classification result and the true category, and optimizing the network model by back-propagating this loss.
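For illustration, with $y_i$ the probability vector predicted for the i-th sample and $c_i$ its true category, the cross-entropy loss used for back-propagation can be written as:

$$\mathcal{L} = -\frac{1}{Z}\sum_{i=1}^{Z} \log y_i[c_i].$$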
Compared with the prior art, the invention has the following beneficial effects:
According to the invention, natural light images and infrared images, which differ considerably from each other, are used as model inputs. The two sets of features are first fused through the cross-attention Mamba calculation, so that each modality learns features of the other; they are then further fused through the alternate-traversal Mamba calculation to extract richer image representations; finally, the spectral-spatial state fusion combines and analyses the two sets of features, yielding a richer image representation and improving the accuracy of image classification.
Drawings
FIG. 1 is a schematic diagram of a framework of a ship classification model according to the present invention;
FIG. 2 is a schematic diagram of the cross-attention Mamba calculation module;
FIG. 3 is a schematic diagram of the alternate-traversal Mamba calculation module;
FIG. 4 is a schematic diagram of the spectral-spatial state fusion module.
Detailed Description
The technical scheme of the invention is described in detail below with reference to the accompanying drawings. It will be apparent that the described embodiments are only some, but not all, embodiments of the invention.
A multi-modal ship image classification method based on multi-sequence Mamba inputs a natural light image and an infrared image of the same ship into a multi-modal classification model simultaneously to obtain a classification result.
As shown in FIG. 1, the multi-modal classification model includes a sequence conversion module, a cross-attention Mamba calculation module, an alternate-traversal Mamba calculation module, a spectral-spatial state fusion module, and a classification module.
The sequence conversion module is used for respectively converting the input natural light image and the input infrared image into a corresponding natural light token sequence and an infrared token sequence.
In this embodiment, the sequence conversion module is an image block embedding module;
the natural light token sequence is obtained by applying the convolution operation of the sequence conversion module to the input natural light image and passing the result through the activation function of the sequence conversion module;
the infrared token sequence is obtained in the same way from the input infrared image.
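As a concrete illustration of this embodiment, the sequence conversion can be sketched as follows; the patch size, embedding dimension, image size and the SiLU activation are assumptions made only for this sketch.

```python
import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    """Sequence conversion module: converts an image into a token sequence (illustrative sketch)."""
    def __init__(self, in_channels=3, embed_dim=64, patch_size=8):
        super().__init__()
        # Non-overlapping image block embedding implemented as a strided convolution.
        self.conv = nn.Conv2d(in_channels, embed_dim, kernel_size=patch_size, stride=patch_size)
        self.act = nn.SiLU()  # assumed activation; the text only states "an activation function"

    def forward(self, x):                      # x: (B, C, H, W)
        x = self.act(self.conv(x))             # (B, D, H/p, W/p)
        return x.flatten(2).transpose(1, 2)    # token sequence: (B, L, D)

embed = PatchEmbed()
natural_light = torch.randn(1, 3, 64, 64)      # input natural light image
infrared = torch.randn(1, 3, 64, 64)           # input infrared image
S_v, S_r = embed(natural_light), embed(infrared)
print(S_v.shape, S_r.shape)                    # two token sequences of shape (1, 64, 64)
```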
The cross-attention Mamba calculation module obtains two groups of first enhancement features based on the natural light token sequence and the infrared token sequence.
As shown in FIG. 2, the processing procedure of the cross-attention Mamba calculation module is as follows.
Step A-1, a cross-attention score is calculated for each of the two token sequences, using the normalization operation, the activation function and the first linear layer of the cross-attention Mamba calculation module.
Step A-2, forward-order SSM calculation is performed on the two token sequences, using the normalization operation, the second linear layer of the cross-attention Mamba calculation module and the forward-order structured state space model.
Step A-3, the two token sequences are arranged in reverse order, and reverse-order SSM calculation is performed on the reversed sequences, using the second linear layer of the cross-attention Mamba calculation module and the reverse-order structured state space model.
Step A-4, the two groups of first enhancement features are calculated according to the cross-attention scores, the forward-order SSM calculation results and the reverse-order SSM calculation results; one group of first enhancement features is obtained for each modality.
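The following simplified sketch illustrates one possible realization of the cross-attention Mamba calculation module; the toy linear-recurrence scan stands in for the structured state space model, and the cross-wise computation of the attention scores and the multiplicative combination in step A-4 are assumptions of this sketch.

```python
import torch
import torch.nn as nn

def ssm_scan(x, decay=0.9):
    """Toy stand-in for a structured state space model: the linear recurrence
    h_t = decay * h_{t-1} + x_t, returned at every time step. x: (B, L, D)."""
    h = torch.zeros_like(x[:, 0])
    out = []
    for t in range(x.size(1)):
        h = decay * h + x[:, t]
        out.append(h)
    return torch.stack(out, dim=1)

class CrossAttentionMambaBlock(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.linear1 = nn.Linear(dim, dim)   # first linear layer (attention score branch)
        self.linear2 = nn.Linear(dim, dim)   # second linear layer (SSM branch)
        self.act = nn.SiLU()

    def forward(self, s_v, s_r):             # s_v, s_r: (B, L, D) token sequences
        # Step A-1: cross-attention scores (assumed: each score comes from the other modality).
        a_v = self.act(self.linear1(self.norm(s_r)))
        a_r = self.act(self.linear1(self.norm(s_v)))
        # Step A-2: forward-order SSM on each sequence.
        u_v = ssm_scan(self.linear2(self.norm(s_v)))
        u_r = ssm_scan(self.linear2(self.norm(s_r)))
        # Step A-3: reverse-order SSM on the flipped sequences, flipped back afterwards.
        v_v = ssm_scan(self.linear2(self.norm(s_v.flip(1)))).flip(1)
        v_r = ssm_scan(self.linear2(self.norm(s_r.flip(1)))).flip(1)
        # Step A-4: combine scores with both SSM results (assumed multiplicative gating).
        f_v = a_v * (u_v + v_v)
        f_r = a_r * (u_r + v_r)
        return f_v, f_r                      # two groups of first enhancement features

block = CrossAttentionMambaBlock()
f_v, f_r = block(torch.randn(1, 64, 64), torch.randn(1, 64, 64))
print(f_v.shape, f_r.shape)
```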
The alternate-traversal Mamba calculation module obtains two groups of second enhancement features based on the two groups of first enhancement features.
As shown in FIG. 3, the processing procedure of the alternate-traversal Mamba calculation module is as follows.
Step B-1, traversal ordering is performed on the two groups of first enhancement features to obtain a combined feature sequence.
Step B-2, attention score calculation is performed on the combined feature sequence, using the normalization operation, the activation function and the first linear layer of the alternate-traversal Mamba calculation module.
Step B-3, feature extraction is performed on the combined feature sequence through the forward-order SSM and the reverse-order SSM respectively, using the second linear layer of the alternate-traversal Mamba calculation module, the forward-order structured state space model and the reverse-order structured state space model.
Step B-4, an enhanced feature is calculated according to the attention score and the two SSM feature-extraction results.
Step B-5, the enhanced feature is split into the two groups of second enhancement features by reversing the traversal order used in step B-1.
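A corresponding sketch of the alternate-traversal Mamba calculation module is given below; the token-wise interleaving used as the traversal order, the toy linear-recurrence scan and the multiplicative combination in step B-4 are assumptions of this sketch.

```python
import torch
import torch.nn as nn

def ssm_scan(x, decay=0.9):
    """Toy stand-in for a structured state space model (simple linear recurrence)."""
    h = torch.zeros_like(x[:, 0])
    out = []
    for t in range(x.size(1)):
        h = decay * h + x[:, t]
        out.append(h)
    return torch.stack(out, dim=1)

class AlternateTraversalMambaBlock(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.linear1 = nn.Linear(dim, dim)
        self.linear2 = nn.Linear(dim, dim)
        self.act = nn.SiLU()

    def forward(self, f_v, f_r):                      # first enhancement features: (B, L, D)
        B, L, D = f_v.shape
        # Step B-1: traversal ordering - interleave the two features token by token.
        g = torch.stack((f_v, f_r), dim=2).reshape(B, 2 * L, D)
        # Step B-2: attention score over the combined sequence.
        h = self.act(self.linear1(self.norm(g)))
        # Step B-3: forward-order and reverse-order SSM feature extraction.
        p = ssm_scan(self.linear2(self.norm(g)))
        q = ssm_scan(self.linear2(self.norm(g.flip(1)))).flip(1)
        # Step B-4: combine the score with both SSM results (assumed multiplicative gating).
        e = h * (p + q)
        # Step B-5: undo the traversal order, splitting into the two second enhancement features.
        e = e.reshape(B, L, 2, D)
        return e[:, :, 0], e[:, :, 1]

block = AlternateTraversalMambaBlock()
f2_v, f2_r = block(torch.randn(1, 64, 64), torch.randn(1, 64, 64))
print(f2_v.shape, f2_r.shape)
```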
The spectral-spatial state fusion module obtains a fusion feature based on the two groups of second enhancement features.
As shown in FIG. 4, the spectral-spatial state fusion module fuses the two groups of second enhancement features into the fusion feature using a normalization operation, a first fully connected layer, a second fully connected layer, an activation function and a probability output function.
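The spectral-spatial state fusion can be sketched as follows; turning the two features into per-modality weights, the mean pooling and the weighted-sum fusion rule are assumptions of this sketch.

```python
import torch
import torch.nn as nn

class SpectralSpatialFusion(nn.Module):
    """Illustrative sketch: the two second enhancement features are mapped to per-modality
    weights by two fully connected layers and a probability output function, then fused."""
    def __init__(self, dim=64):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.fc1 = nn.Linear(dim, dim)
        self.fc2 = nn.Linear(dim, 2)      # one weight per modality (assumption)
        self.act = nn.SiLU()

    def forward(self, f2_v, f2_r):                        # (B, L, D) each
        summary = self.norm(f2_v + f2_r).mean(dim=1)      # (B, D) joint descriptor
        w = torch.softmax(self.fc2(self.act(self.fc1(summary))), dim=-1)  # (B, 2)
        return w[:, 0:1, None] * f2_v + w[:, 1:2, None] * f2_r            # fusion feature (B, L, D)

fusion = SpectralSpatialFusion()
fused = fusion(torch.randn(1, 64, 64), torch.randn(1, 64, 64))
print(fused.shape)
```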
The classification module obtains the classification result based on the fusion feature: the fusion feature is passed through the fully connected layer of the classification module and a probability output function to obtain a probability vector, each element of which represents the probability that the ship in the currently input images belongs to the corresponding category; the category with the highest probability is taken as the classification result.
Further, the training process of the multi-modal classification model is as follows:
A training set is constructed in which the i-th sample consists of a natural light image, an infrared image and the true category of the ship; the natural light image and the infrared image have the same height, width and number of channels, the ship shown in the two images of the same sample is the same ship, and Z denotes the number of samples.
The samples of the training set are input into the multi-modal classification model, a cross-entropy loss is computed between the classification result and the true category, and the network model is optimized by back-propagating this loss with an Adam optimizer.
After training, natural light images and infrared images of the same ship are input into the model at the same time, and classification results are obtained.
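For completeness, a minimal training-step sketch is given below; the stand-in backbone, class count, image size and learning rate are placeholders chosen only so that the snippet runs on its own, and would in practice be replaced by the modules described above.

```python
import torch
import torch.nn as nn

class MultiModalClassifier(nn.Module):
    """Placeholder model: in practice the backbone would be the sequence conversion,
    cross-attention Mamba, alternate-traversal Mamba and fusion modules sketched above."""
    def __init__(self, num_classes=10, dim=64):
        super().__init__()
        self.backbone = nn.Sequential(nn.Flatten(), nn.Linear(2 * 3 * 64 * 64, dim))
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x_v, x_r):
        features = self.backbone(torch.cat((x_v, x_r), dim=1))
        return self.head(features)           # logits; softmax is applied inside the loss

model = MultiModalClassifier()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

x_v = torch.randn(8, 3, 64, 64)              # batch of natural light images
x_r = torch.randn(8, 3, 64, 64)              # batch of infrared images of the same ships
labels = torch.randint(0, 10, (8,))          # true ship categories

optimizer.zero_grad()
loss = criterion(model(x_v, x_r), labels)    # cross-entropy between prediction and truth
loss.backward()                               # back-propagate the loss
optimizer.step()                              # Adam update
print(float(loss))
```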
It should be noted that it will be apparent to those skilled in the art that the present invention is not limited to the details of the above-described exemplary embodiments, but may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The scope of the invention is indicated by the appended claims rather than by the foregoing description.