Disclosure of Invention
The present invention aims to solve, at least to some extent, one of the technical problems in the above-described technology. Accordingly, a first object of the present invention is to provide a domain-adaptation-based remote sensing image semantic segmentation method which, by constructing an average teacher framework with an auxiliary prototype classifier, addresses the domain discrepancy problem in cross-domain semantic segmentation of remote sensing images, effectively extracts regions of interest from the target domain data, and achieves class-level alignment between the source domain and the target domain, thereby improving segmentation performance.
A second object of the present invention is to propose a computer readable storage medium.
A third object of the invention is to propose a computer device.
The fourth object of the invention is to provide a remote sensing image semantic segmentation device based on domain self-adaption.
To achieve the above object, an embodiment of the first aspect of the present invention provides a domain-adaptation-based remote sensing image semantic segmentation method, comprising the following steps: acquiring a remote sensing data set, wherein the remote sensing data set comprises labeled source domain data and unlabeled target domain data; constructing an average teacher framework with an auxiliary prototype classifier, wherein the framework comprises a teacher model and a student model; training the student model and optimizing its parameters with the remote sensing data set, wherein the teacher model updates its parameters by an exponential moving average and the auxiliary prototype classifier updates its weights by an exponential moving average; and inputting the unlabeled target domain data into the trained student model for point-by-point prediction to obtain the segmentation result corresponding to the unlabeled target domain data.
According to the domain-adaptation-based remote sensing image semantic segmentation method of the embodiment of the present invention, a remote sensing data set comprising labeled source domain data and unlabeled target domain data is first acquired; an average teacher framework with an auxiliary prototype classifier, comprising a teacher model and a student model, is then constructed; the student model is then trained and its parameters are optimized with the remote sensing data set, wherein the teacher model updates its parameters by an exponential moving average and the auxiliary prototype classifier updates its weights by an exponential moving average; finally, the unlabeled target domain data is input into the trained student model for point-by-point prediction to obtain the corresponding segmentation result. In this way, by constructing the average teacher framework with the auxiliary prototype classifier, the domain discrepancy problem in cross-domain semantic segmentation of remote sensing images can be addressed, regions of interest can be effectively extracted from the target domain data, and class-level alignment between the source domain and the target domain can be achieved, thereby improving segmentation performance.
In addition, the remote sensing image semantic segmentation method based on domain self-adaption provided by the embodiment of the invention can also have the following additional technical characteristics:
Optionally, constructing the average teacher framework with the auxiliary prototype classifier comprises: the student model comprises a feature encoder and a parameterized classifier, with DeepLabV2 as the network structure and ResNet-101 as the backbone of the student model, and the network structure and backbone of the teacher model are identical to those of the student model; memory banks of the corresponding categories are constructed, in the form of queues, for the source domain data and the target domain data respectively, so that the features output by the corresponding feature encoder, after embedding filtering, are stored as feature vectors of different categories in the memory banks of the corresponding categories; and the memory banks of corresponding categories of the source domain and the target domain are spliced, and the KMeans clustering algorithm is used to cluster out the prototype of each category to serve as the auxiliary prototype classifier.
Optionally, training the student model and optimizing its parameters with the remote sensing data set comprises: in the first epoch of training, the student model is trained and its parameters are optimized with the labeled source domain data, and the feature vectors of different categories of the source domain data are stored in the source domain memory bank; after the first epoch, the teacher model initializes its parameters with the parameters of the student model, and the source domain memory bank is clustered with the KMeans algorithm to obtain an initial prototype of each category as the auxiliary prototype classifier; in the second epoch, the student model with the auxiliary prototype classifier is trained with the labeled source domain data, the teacher model predicts on the target domain data to obtain pseudo-labels of the target domain, and the student model with the auxiliary prototype classifier is trained with the pseudo-labels of the target domain to update its parameters, while the feature vectors of different categories of the source domain data are stored in the source domain memory bank and those of the target domain data are stored in the target domain memory bank; after the second epoch, the teacher model updates its parameters by an exponential moving average of the parameters of the student model, and the memory banks of the same category in the source domain and the target domain are spliced and clustered with the KMeans algorithm to obtain the prototype of each category, which is used to update the auxiliary prototype classifier; and subsequent epochs are trained in the same manner as the second epoch, with the prototypes updated by an exponential moving average, until training is completed.
To achieve the above objective, a second aspect of the present invention provides a computer-readable storage medium having stored thereon a domain-adaptive-based remote sensing image semantic segmentation program, which when executed by a processor, implements the domain-adaptive-based remote sensing image semantic segmentation method as described above.
To achieve the above objective, an embodiment of a third aspect of the present invention provides a computer device, including a memory, a processor, and a computer program stored in the memory and capable of running on the processor, where the processor implements the domain-adaptive remote sensing image semantic segmentation method as described above when executing the program.
To achieve the above objective, an embodiment of the fourth aspect of the present invention provides a domain-adaptation-based remote sensing image semantic segmentation device, which comprises an acquisition module, a model construction module, a training module and a semantic segmentation module, wherein the acquisition module is used for acquiring a remote sensing data set, the remote sensing data set comprising labeled source domain data and unlabeled target domain data; the model construction module is used for constructing an average teacher framework with an auxiliary prototype classifier, the framework comprising a teacher model and a student model; the training module is used for training the student model and optimizing its parameters with the remote sensing data set, wherein the teacher model updates its parameters by an exponential moving average and the auxiliary prototype classifier updates its weights by an exponential moving average; and the semantic segmentation module is used for inputting the unlabeled target domain data into the trained student model for point-by-point prediction to obtain the segmentation result corresponding to the unlabeled target domain data.
According to the remote sensing image semantic segmentation device based on domain self-adaption, the problem of domain difference in a remote sensing image cross-domain semantic segmentation task can be solved by constructing the average teacher framework with the auxiliary prototype classifier, the region of interest can be effectively extracted from the target domain data, and the alignment of class layers between the source domain and the target domain is realized, so that the segmentation performance is improved.
In addition, the remote sensing image semantic segmentation device based on domain self-adaption provided by the embodiment of the invention can also have the following additional technical characteristics:
Optionally, constructing the average teacher framework with the auxiliary prototype classifier comprises: the student model comprises a feature encoder and a parameterized classifier, with DeepLabV2 as the network structure and ResNet-101 as the backbone of the student model, and the network structure and backbone of the teacher model are identical to those of the student model; memory banks of the corresponding categories are constructed, in the form of queues, for the source domain data and the target domain data respectively, so that the features output by the corresponding feature encoder, after embedding filtering, are stored as feature vectors of different categories in the memory banks of the corresponding categories; and the memory banks of corresponding categories of the source domain and the target domain are spliced, and the KMeans clustering algorithm is used to cluster out the prototype of each category to serve as the auxiliary prototype classifier.
Optionally, training the student model and optimizing its parameters with the remote sensing data set comprises: in the first epoch of training, the student model is trained and its parameters are optimized with the labeled source domain data, and the feature vectors of different categories of the source domain data are stored in the source domain memory bank; after the first epoch, the teacher model initializes its parameters with the parameters of the student model, and the source domain memory bank is clustered with the KMeans algorithm to obtain an initial prototype of each category as the auxiliary prototype classifier; in the second epoch, the student model with the auxiliary prototype classifier is trained with the labeled source domain data, the teacher model predicts on the target domain data to obtain pseudo-labels of the target domain, and the student model with the auxiliary prototype classifier is trained with the pseudo-labels of the target domain to update its parameters, while the feature vectors of different categories of the source domain data are stored in the source domain memory bank and those of the target domain data are stored in the target domain memory bank; after the second epoch, the teacher model updates its parameters by an exponential moving average of the parameters of the student model, and the memory banks of the same category in the source domain and the target domain are spliced and clustered with the KMeans algorithm to obtain the prototype of each category, which is used to update the auxiliary prototype classifier; and subsequent epochs are trained in the same manner as the second epoch, with the prototypes updated by an exponential moving average, until training is completed.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative and intended to explain the present invention and should not be construed as limiting the invention.
In order that the above-described aspects may be better understood, exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present invention are shown in the drawings, it should be understood that the present invention may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
In order to better understand the above technical solutions, the following detailed description will refer to the accompanying drawings and specific embodiments.
Fig. 1 is a flow chart of a domain-adaptation-based remote sensing image semantic segmentation method according to an embodiment of the present invention. As shown in Fig. 1, the method includes the following steps:
S101, acquiring a remote sensing data set, wherein the remote sensing data set comprises labeled source domain data and unlabeled target domain data.
The source domain data is Potsdam red, green and blue (RGB) band data, and the target domain data is Vaihingen near infrared, green and blue (IRGB) band data.
That is, the remote sensing dataset may be obtained by downloading over the network.
S102, constructing an average teacher framework with an auxiliary prototype classifier, wherein the average teacher framework with the auxiliary prototype classifier comprises a teacher model and a student model.
As one embodiment, constructing the average teacher framework with the auxiliary prototype classifier (Mean Teacher framework with an Auxiliary prototype classifier, MTA) includes:
The student model comprises a feature encoder and a parameterized classifier, with DeepLabV2 as the network structure and ResNet-101 as the backbone of the student model, and the network structure and backbone of the teacher model are identical to those of the student model;
memory banks of the corresponding categories are constructed, in the form of queues, for the source domain data and the target domain data respectively, so that the features output by the corresponding feature encoder, after embedding filtering, are stored as feature vectors of different categories in the memory banks of the corresponding categories;
It should be noted that the feature vectors extracted by the student model are filtered by the following embedding filter (Embedding Filter) mechanism, after which the memory bank is updated and old entries are deleted:
wherein ⟨·, ·⟩ denotes the inner product; for the target domain samples,
The memory banks of corresponding categories of the source domain and the target domain are spliced, and the KMeans clustering algorithm is used to cluster out the prototype of each category to serve as the auxiliary prototype classifier.
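The queue-style, per-category memory bank and the splicing step can be illustrated with a minimal sketch. It assumes a fixed capacity per category with first-in-first-out eviction; the names `ClassMemoryBank`, `push`, `get` and `splice` are hypothetical, and the embedding filter is represented only by a caller-supplied predicate `keep`, because its exact form is defined by the filtering mechanism of the embodiment and is not reproduced here.

```python
from collections import deque


class ClassMemoryBank:
    """Per-category FIFO queues of feature vectors (hypothetical helper)."""

    def __init__(self, num_classes, capacity):
        # One bounded queue per category; deque(maxlen=...) discards the
        # oldest entry automatically once the capacity is reached.
        self.queues = [deque(maxlen=capacity) for _ in range(num_classes)]

    def push(self, features, labels, keep=lambda feat, label: True):
        """Store each feature in the queue of its category if `keep` accepts it.

        `keep` stands in for the embedding filter (assumption)."""
        for feat, label in zip(features, labels):
            if keep(feat, label):
                self.queues[label].append(list(feat))

    def get(self, class_id):
        """Return the stored feature vectors of one category, oldest first."""
        return list(self.queues[class_id])


def splice(source_bank, target_bank, class_id):
    """Concatenate the source and target entries of one category."""
    return source_bank.get(class_id) + target_bank.get(class_id)


source_bank = ClassMemoryBank(num_classes=2, capacity=3)
target_bank = ClassMemoryBank(num_classes=2, capacity=3)
source_bank.push([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]], [0, 0, 1])
target_bank.push([[0.8, 0.2], [0.1, 0.9]], [0, 1])
merged_class0 = splice(source_bank, target_bank, 0)
```

Here `merged_class0` holds the two source vectors and the one target vector of category 0, which is the input that the clustering step would receive for that category.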
That is, as shown in Fig. 2, the target images (Target Images) are input to the teacher model (Teacher Model) after weak augmentation (Weak Augmentation) and to the student model (Student Model) after strong augmentation (Strong Augmentation), wherein the weak augmentation includes horizontal flipping, vertical flipping, image sharpening and color jittering, and the strong augmentation includes random rotation, shear mapping and translation. The teacher model and the student model each include a feature encoder (Feature Encoder) and a parameterized classifier (Parametric Classifier), and adopt DeepLabV2 as the network structure and ResNet-101 as the backbone. Pseudo-labels (Pseudo Labels) are obtained after the weakly augmented target images are input to the teacher model. After the strongly augmented target images are input to the student model, the prediction results of the parameterized classifier and of the auxiliary prototype classifier are each compared with the pseudo-labels from the teacher model to compute a cross-entropy loss (the loss of the parameterized classifier being denoted $L_t$). At the same time, after the target features output by the feature encoder pass through the embedding filter (Embedding Filter), the feature vectors of different categories are stored in the target memory bank (Target Memory Bank) of the corresponding category.
The weakly augmented source images (Source Images) are input to the student model, and the prediction results of the parameterized classifier and of the auxiliary prototype classifier are each compared with the labels of the source images to compute a cross-entropy loss (the loss of the parameterized classifier being denoted $L_s$). At the same time, after the source features output by the feature encoder pass through the embedding filter, the feature vectors of different categories are stored in the source memory bank (Source Memory Bank) of the corresponding category.
The feature vectors of the corresponding categories in the target memory bank and the source memory bank are spliced and then input into the KMeans clustering algorithm, which outputs the prototype of each category; the prototype is updated by an exponential moving average (Exponential Moving Average, EMA) and serves as the auxiliary prototype classifier.
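The clustering step can be sketched as follows. This is an illustration under stated assumptions, not the embodiment's implementation: the text does not give the number of clusters per category, so the sketch runs KMeans with `k=1` on each spliced category bank (which reduces to the centroid); the deterministic initialization and the inner-product scoring in `prototype_predict` are likewise assumptions, and all function names are hypothetical.

```python
def kmeans(points, k, iters=10):
    """Plain Lloyd's k-means on lists of floats; returns k centroids."""
    centers = [list(p) for p in points[:k]]  # deterministic init (assumption)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest center (squared Euclidean).
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            groups[dists.index(min(dists))].append(p)
        for j, group in enumerate(groups):
            if group:  # keep the old center if a cluster is empty
                centers[j] = [sum(col) / len(group) for col in zip(*group)]
    return centers


def class_prototypes(source_bank, target_bank):
    """One prototype per category from the spliced source and target banks."""
    prototypes = []
    for src, tgt in zip(source_bank, target_bank):
        merged = src + tgt
        prototypes.append(kmeans(merged, k=1)[0])  # k=1 is an assumption
    return prototypes


def prototype_predict(feature, prototypes):
    """Pick the category whose prototype has the largest inner product
    with the feature (scoring rule is an assumption)."""
    scores = [sum(f * e for f, e in zip(feature, proto)) for proto in prototypes]
    return scores.index(max(scores))


# Toy banks: index 0 and 1 are the two categories.
source_bank = [[[1.0, 0.0], [0.8, 0.2]], [[0.0, 1.0]]]
target_bank = [[[0.9, 0.1]], [[0.2, 0.8], [0.1, 0.9]]]
prototypes = class_prototypes(source_bank, target_bank)
predicted = prototype_predict([0.7, 0.3], prototypes)
```

Because each prototype is computed from features of both domains, a target feature and a source feature of the same category are scored against the same vector, which is the class-level alignment the text describes.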
The cross-entropy loss is calculated as follows:

$$L_{ce} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{k=1}^{K} y_{i,k}\,\log p_{i,k}$$

where $p$ denotes the prediction result of the parameterized classifier or of the auxiliary prototype classifier on the source domain or the target domain, $y$ denotes the ground-truth label of the source domain or the pseudo-label of the target domain, $N$ denotes the number of pixels, and $K$ denotes the number of categories.
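As a small worked illustration, the pixel-wise cross-entropy can be computed as below. The sketch assumes the standard form, namely the mean over the $N$ pixels of $-\sum_k y_k \log p_k$ with one-hot labels, so that only the probability of the labeled category contributes; the function name and the clamping constant `eps` are assumptions.

```python
import math


def cross_entropy(probs, labels, eps=1e-12):
    """Mean pixel-wise cross-entropy.

    probs:  N lists, each with K class probabilities for one pixel.
    labels: N class indices (ground-truth labels or pseudo-labels).
    """
    total = 0.0
    for p, y in zip(probs, labels):
        # With a one-hot label, the inner sum over K keeps only -log p[y].
        total -= math.log(max(p[y], eps))  # clamp to avoid log(0)
    return total / len(labels)


# Two pixels, two categories.
loss = cross_entropy([[0.5, 0.5], [0.25, 0.75]], [0, 1])
```

The same function applies to all four loss terms of the framework, since they differ only in which classifier produces `probs` and whether `labels` are source labels or teacher pseudo-labels.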
The optimization objective function of the student model is defined as:
In addition, the memory bank of each category of the source domain and of the target domain has a size of 16384×256, and the parameterized classifier consists of a convolution layer with a kernel size of 1, a padding of 0 and a stride of 1.
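A convolution with kernel size 1, padding 0 and stride 1 applies the same linear map independently at every pixel, which the following sketch makes explicit. The nested-list layout (height, width, channels), the function name `conv1x1` and the 32-bit storage used in the size estimate are assumptions for illustration.

```python
def conv1x1(feature_map, weights, bias):
    """Apply a 1x1 convolution (kernel 1, padding 0, stride 1).

    feature_map: H x W x C nested lists of floats.
    weights:     K x C, one weight vector per output category.
    bias:        K floats.
    Returns H x W x K logits; the spatial size is unchanged.
    """
    return [
        [
            [sum(w * x for w, x in zip(w_k, pixel)) + b_k
             for w_k, b_k in zip(weights, bias)]
            for pixel in row
        ]
        for row in feature_map
    ]


features = [[[1.0, 2.0], [3.0, 4.0]]]             # H=1, W=2, C=2
weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]    # K=3 categories
bias = [0.0, 0.0, 0.5]
logits = conv1x1(features, weights, bias)

# Size of one category's memory bank: 16384 vectors of dimension 256.
bank_values = 16384 * 256
bank_mib_float32 = bank_values * 4 / 2 ** 20      # assumes 32-bit floats
```

Under the 32-bit assumption, one category's bank of 16384 vectors with 256 dimensions each holds 4,194,304 values, or 16 MiB.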
The parameters of the teacher model are updated by the following exponential moving average:

$$\theta_T^{(l)} = \alpha\,\theta_T^{(l-1)} + (1-\alpha)\,\theta_S^{(l)}$$

where $\theta_T^{(l)}$ denotes the parameters of the teacher model after $l$ training epochs, $\theta_S^{(l)}$ denotes the parameters of the student model after $l$ training epochs, $l \ge 2$, and the smoothing coefficient is $\alpha = 0.99$; when $l = 1$, the teacher model is initialized with the parameters of the student model. The auxiliary prototype classifier consists of the class prototypes $e_1, \dots, e_K$, where $e_K$ denotes the prototype of class $K$.
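The teacher update can be sketched as below, assuming the usual exponential moving average form (the teacher keeps a fraction `alpha` of its previous parameters and takes the remainder from the student). Parameters are represented as flat lists of floats, and the function names are hypothetical.

```python
def ema_update(teacher, student, alpha=0.99):
    """Blend teacher and student parameters element by element."""
    return [alpha * t + (1.0 - alpha) * s for t, s in zip(teacher, student)]


def update_teacher(teacher, student, epoch, alpha=0.99):
    """Teacher parameters after the given epoch.

    After epoch 1 the teacher is initialized as a copy of the student;
    from epoch 2 onward it follows the exponential moving average.
    """
    if epoch == 1:
        return list(student)
    return ema_update(teacher, student, alpha)


teacher = update_teacher(None, [1.0, 2.0], epoch=1)     # copy of the student
teacher = update_teacher(teacher, [2.0, 4.0], epoch=2)  # roughly [1.01, 2.02]
```

With a smoothing coefficient of 0.99 the teacher moves only 1% of the way toward the student at each update, so its pseudo-labels change slowly even when the student parameters fluctuate.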
The exponential moving average update of the prototype $e_c$ of class $c$ is defined as:

$$e_c^{(l)} = \alpha\,e_c^{(l-1)} + (1-\alpha)\,\hat{e}_c^{(l)}$$

where $e_c^{(l)}$ denotes the prototype of class $c$ after the $l$-th training epoch, $\hat{e}_c^{(l)}$ denotes the class prototype obtained by KMeans clustering of the class-$c$ memory banks of the source domain and the target domain after the $l$-th training epoch, and $l \ge 3$; when $l = 2$, $e_c$ is obtained by KMeans clustering of the class-$c$ memory banks of the source domain and the target domain; when $l = 1$, $e_c$ is obtained by KMeans clustering of the class-$c$ memory bank of the source domain.
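The three cases of the prototype update (source-only clustering after epoch 1, source-plus-target clustering after epoch 2, exponential moving average from epoch 3) can be sketched as follows. Two simplifications are assumed and should be read as such: clustering is replaced by the centroid of the bank (equivalent to KMeans with a single cluster), and the smoothing coefficient `alpha` is taken to be a free parameter because its value for prototypes is not stated here. All names are hypothetical.

```python
def centroid(vectors):
    """Mean vector; stands in for KMeans with one cluster (assumption)."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def update_prototype(prev, source_feats, target_feats, epoch, alpha=0.99):
    """Prototype of one category after the given epoch."""
    if epoch == 1:
        # Only the source memory bank exists after the first epoch.
        return centroid(source_feats)
    clustered = centroid(source_feats + target_feats)
    if epoch == 2:
        # First prototype computed from both domains, no smoothing yet.
        return clustered
    # Epoch 3 onward: smooth the newly clustered prototype into the old one.
    return [alpha * p + (1.0 - alpha) * c for p, c in zip(prev, clustered)]


src = [[1.0, 0.0], [0.0, 1.0]]
tgt = [[1.0, 1.0], [2.0, 2.0]]
e1 = update_prototype(None, src, tgt, epoch=1)
e2 = update_prototype(e1, src, tgt, epoch=2)
e3 = update_prototype([0.0, 0.0], src, tgt, epoch=3, alpha=0.5)
```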
S103, training the student model and optimizing its parameters with the remote sensing data set, wherein the teacher model updates its parameters by an exponential moving average and the auxiliary prototype classifier updates its weights by an exponential moving average.
As one embodiment, training the student model and optimizing its parameters with the remote sensing data set includes the following. In the first epoch of training, the student model is trained and its parameters are optimized with the labeled source domain data, and the feature vectors of different categories of the source domain data are stored in the source domain memory bank. After the first epoch, the teacher model initializes its parameters with the parameters of the student model, and the source domain memory bank is clustered with the KMeans algorithm to obtain an initial prototype of each category as the auxiliary prototype classifier. In the second epoch, the student model with the auxiliary prototype classifier is trained with the labeled source domain data, the teacher model predicts on the target domain data to obtain pseudo-labels of the target domain, and the student model with the auxiliary prototype classifier is trained with the pseudo-labels of the target domain to update its parameters; during this training, the feature vectors of different categories of the source domain data are stored in the source domain memory bank and those of the target domain data are stored in the target domain memory bank. After the second epoch, the teacher model updates its parameters by an exponential moving average of the parameters of the student model, and the memory banks of the same category in the source domain and the target domain are spliced and clustered with the KMeans algorithm to obtain the prototype of each category, which is used to update the auxiliary prototype classifier. Subsequent epochs are trained in the same manner as the second epoch, with the prototypes updated by an exponential moving average, until training is completed.
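The epoch-by-epoch schedule described above can be summarized as a small lookup, shown here as a sketch. The function `epoch_plan` and its string labels are hypothetical and only name the steps; they do not implement the training itself.

```python
def epoch_plan(epoch):
    """Describe what is trained and updated in a given epoch (1-based)."""
    if epoch < 1:
        raise ValueError("epoch numbering starts at 1")
    if epoch == 1:
        return {
            # Only labeled source data and the parameterized classifier.
            "student_losses": ["source/parametric"],
            "banks_updated": ["source"],
            "teacher_after_epoch": "copy student parameters",
            "prototypes_after_epoch": "kmeans(source bank)",
        }
    return {
        # Source labels and teacher pseudo-labels, both classifiers.
        "student_losses": [
            "source/parametric", "source/auxiliary",
            "target/parametric", "target/auxiliary",
        ],
        "banks_updated": ["source", "target"],
        "teacher_after_epoch": "ema of student parameters",
        "prototypes_after_epoch": (
            "kmeans(source bank + target bank)" if epoch == 2
            else "ema of kmeans(source bank + target bank)"
        ),
    }


for epoch in (1, 2, 3):
    print(epoch, epoch_plan(epoch)["prototypes_after_epoch"])
```

The schedule is single-stage: the same loop runs from the first epoch to the last, and only the update rules after each epoch differ.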
It should be noted that, during training, stochastic gradient descent (SGD) is used as the optimizer, with the weight decay coefficient, the momentum and the initial learning rate set to 5e-4, 0.9 and 2.5e-4, respectively. The learning rate is gradually reduced with a polynomial decay strategy: the current learning rate equals the initial learning rate multiplied by $\left(1 - \frac{iter}{max\_iter}\right)^{power}$, where $iter$ is the current iteration, $max\_iter$ is the total number of iterations, and $power = 0.9$.
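The polynomial decay of the learning rate can be illustrated as below, assuming the common form in which the initial rate is multiplied by `(1 - iteration / max_iterations) ** power`. The total number of iterations is not given in the text, so the value 1000 in the example is purely illustrative.

```python
def poly_lr(base_lr, iteration, max_iterations, power=0.9):
    """Learning rate at `iteration` under polynomial decay."""
    return base_lr * (1.0 - iteration / max_iterations) ** power


base_lr = 2.5e-4          # initial learning rate from the text
max_iterations = 1000     # illustrative value (assumption)
lr_start = poly_lr(base_lr, 0, max_iterations)
lr_half = poly_lr(base_lr, 500, max_iterations)
lr_end = poly_lr(base_lr, max_iterations, max_iterations)
```

The rate starts at the initial value, falls to `base_lr * 0.5 ** 0.9` halfway through, and reaches zero at the last iteration.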
S104, inputting the unlabeled target domain data into a trained student model for point-by-point prediction to obtain a segmentation result corresponding to the unlabeled target domain data.
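Point-by-point prediction amounts to choosing, for every pixel, the category with the highest score from the trained student model. The sketch below assumes the student output is available as a nested list of per-pixel scores (height, width, categories); the function name `segment` is hypothetical.

```python
def segment(logits):
    """Per-pixel argmax over category scores.

    logits: H x W x K nested lists. Returns an H x W map of class indices.
    Ties resolve to the lowest class index.
    """
    return [
        [max(range(len(pixel)), key=pixel.__getitem__) for pixel in row]
        for row in logits
    ]


scores = [
    [[0.1, 0.9], [2.0, -1.0]],
    [[0.3, 0.3], [0.0, 5.0]],
]
label_map = segment(scores)
```

Only the student model is needed at this stage; the teacher model, the memory banks and the auxiliary prototype classifier are used during training.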
In summary, by constructing the average teacher framework with the auxiliary prototype classifier, the present invention can address the domain discrepancy problem in cross-domain semantic segmentation of remote sensing images and effectively extract regions of interest from the target domain data; compared with multi-stage unsupervised domain adaptation (UDA) methods, it requires neither complex training techniques nor multi-stage training strategies. It also achieves class-level alignment between the source domain and the target domain; compared with other prototype computation methods that use only the source domain or only the target domain, the prototypes obtained by clustering features of both domains are domain-invariant.
In order to achieve the above embodiments, an embodiment of the present invention provides a computer readable storage medium, on which a domain-adaptive-based remote sensing image semantic segmentation program is stored, which implements the domain-adaptive-based remote sensing image semantic segmentation method described above when executed by a processor.
According to the computer readable storage medium, the domain-adaptive remote sensing image semantic segmentation program is stored, so that the processor can realize the domain-adaptive remote sensing image semantic segmentation method when executing the domain-adaptive remote sensing image semantic segmentation program, and therefore, the domain difference problem in a remote sensing image cross-domain semantic segmentation task can be solved by constructing an average teacher framework with an auxiliary prototype classifier, the region of interest can be effectively extracted from target domain data, and the alignment of class layers between a source domain and a target domain is realized, so that the segmentation performance on the target domain data is effectively improved.
In order to achieve the above embodiments, the embodiments of the present invention provide a computer device, including a memory, a processor, and a computer program stored in the memory and capable of running on the processor, where when the processor executes the program, the method for semantic segmentation of a remote sensing image based on domain adaptation as described above is implemented.
According to the computer equipment provided by the embodiment of the invention, the domain-adaptive remote sensing image semantic segmentation program is stored through the memory, so that the domain-adaptive remote sensing image semantic segmentation method is realized when the processor executes the domain-adaptive remote sensing image semantic segmentation program, and therefore, the domain difference problem in a remote sensing image cross-domain semantic segmentation task can be solved by constructing an average teacher framework with an auxiliary prototype classifier, the region of interest can be effectively extracted from target domain data, and the alignment of class layers between a source domain and a target domain is realized, so that the segmentation performance on the target domain data is effectively improved.
In order to realize the above embodiment, the embodiment of the invention also provides a domain-adaptive remote sensing image semantic segmentation device, which, as shown in fig. 3, comprises an acquisition module 10, a model construction module 20, a training module 30 and a semantic segmentation module 40.
Specifically, the acquisition module 10 is used for acquiring a remote sensing data set, the remote sensing data set comprising labeled source domain data and unlabeled target domain data; the model construction module 20 is used for constructing an average teacher framework with an auxiliary prototype classifier, the framework comprising a teacher model and a student model; the training module 30 is used for training the student model and optimizing its parameters with the remote sensing data set, wherein the teacher model updates its parameters by an exponential moving average and the auxiliary prototype classifier updates its weights by an exponential moving average; and the semantic segmentation module 40 is used for inputting the unlabeled target domain data into the trained student model for point-by-point prediction to obtain the segmentation result corresponding to the unlabeled target domain data.
It should be noted that the description and the illustration of the domain-adaptive-based remote sensing image semantic segmentation method are also applicable to the domain-adaptive-based remote sensing image semantic segmentation device of the present embodiment, and are not described herein.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It should be noted that in the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. do not denote any order. These words may be interpreted as names.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.
In the description of the present invention, it should be understood that the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more such feature. In the description of the present invention, the meaning of "a plurality" is two or more, unless explicitly defined otherwise.
In the present invention, unless explicitly specified and limited otherwise, the terms "mounted," "connected," "secured," and the like are to be construed broadly, and may be, for example, fixedly connected, detachably connected, or integrally formed, mechanically connected, electrically connected, directly connected, indirectly connected via an intervening medium, or in communication between two elements or in an interaction relationship between two elements. The specific meaning of the above terms in the present invention can be understood by those of ordinary skill in the art according to the specific circumstances.
In the present invention, unless expressly stated or limited otherwise, a first feature "up" or "down" a second feature may be the first and second features in direct contact, or the first and second features in indirect contact via an intervening medium. Moreover, a first feature being "above," "over" and "on" a second feature may be a first feature being directly above or obliquely above the second feature, or simply indicating that the first feature is level higher than the second feature. The first feature being "under", "below" and "beneath" the second feature may be the first feature being directly under or obliquely below the second feature, or simply indicating that the first feature is less level than the second feature.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms should not be understood as necessarily being directed to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, the different embodiments or examples described in this specification and the features of the different embodiments or examples may be combined by those skilled in the art without contradiction.
While embodiments of the present invention have been shown and described above, it will be understood that the above embodiments are illustrative and not to be construed as limiting the invention, and that variations, modifications, alternatives and variations may be made to the above embodiments by one of ordinary skill in the art within the scope of the invention.