Summary of the invention
It is an object of the invention to provide an ensemble-learning remote sensing image classification method based on class weight vectors, so as to overcome the shortcomings of existing MCS (multiple classifier system) methods and achieve efficient, high-accuracy classification of remote sensing images.
To achieve the above object, the present invention adopts the following technical scheme:
An ensemble-learning remote sensing image classification method based on class weight vectors, comprising the following steps:
S1. Following the basic procedure of AdaBoost, initialize the weights of the samples in the sample set;
S2. Draw samples from the sample set by weighted sampling with replacement, and divide the drawn samples into training samples and test samples;
S3. Train base classifiers of several different classification algorithms on the training samples;
S4. Classify the test samples with the base classifiers from step S3, and obtain the class weight vector of each base classifier;
S5. Integrate the base classifiers by means of their class weight vectors to obtain an integrated classifier;
S6. Classify the samples in the sample set of step S2 with the integrated classifier from step S5 and compute the classification error; if the classification error is less than a set error threshold, execute step S7, otherwise execute step S8;
S7. Update the sample weights in the sample set and put the updated samples back into the sample set of step S2;
S8. Return to step S1;
S9. Set the number of AdaBoost iterations and repeat steps S2 to S7 until the AdaBoost iterations terminate, generating a series of integrated classifiers, each with a voting weight;
S10. For a remote sensing image to be classified, classify it with each of the integrated classifiers and their voting weights, and finally fuse all classification results by weighted voting.
Preferably, the classification algorithms include the C4.5 decision tree, support vector machine, artificial neural network, naive Bayes, K-nearest neighbors, logistic regression, minimum distance, expectation maximization, maximum likelihood, Mahalanobis distance and random forest;
In step S3, several different classification algorithms are arbitrarily chosen from the above algorithms to train the base classifiers.
Preferably, in step S4, the detailed process of obtaining the class weight vectors is as follows:
The sensitivity of each base classifier to the different classes is expressed by a class weight vector with components W_ij;
Let the base classifier set be M = {M_1, M_2, M_3, …, M_S}, where S is the number of base classifiers; the sample set be X = {X_1, X_2, X_3, …, X_N}, where N is the number of samples; and the class set be Ω = {ω_1, ω_2, ω_3, …, ω_C}, where C is the number of classes;
Further let W_i be the class weight vector of base classifier M_i, 1 ≤ i ≤ S, let t_ij be the number of test samples that base classifier M_i assigns to class ω_j, and let e_ij be the number of test samples that M_i wrongly assigns to class ω_j, 1 ≤ j ≤ C;
Then the weight of base classifier M_i for class ω_j is expressed as:
W_ij = 1 - E_ij (1)
where E_ij is the error rate of M_i on the test samples it assigns to class ω_j:
E_ij = e_ij / t_ij (2)
Accordingly, the class weight vector with which base classifier M_i votes to identify a test sample is expressed as:
W_i = (W_i1, W_i2, W_i3, …, W_iC) (3).
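The computation of formulas (1)-(3) can be sketched as follows (a minimal illustrative sketch; the function name and the handling of classes the classifier never predicts, i.e. t_ij = 0, are assumptions not spelled out in the disclosure):

```python
import numpy as np

def class_weight_vector(y_true, y_pred, n_classes):
    """Class weight vector W_i of one base classifier, per formulas (1)-(3).

    t_ij: number of test samples the classifier assigns to class j;
    e_ij: number of those assignments that are wrong;
    W_ij = 1 - e_ij / t_ij (taken as 0 when the classifier never predicts j).
    """
    w = np.zeros(n_classes)
    for j in range(n_classes):
        assigned = (y_pred == j)                  # samples assigned to class j
        t = assigned.sum()                        # t_ij
        e = (assigned & (y_true != j)).sum()      # e_ij
        w[j] = 1.0 - e / t if t > 0 else 0.0
    return w

# toy check: a classifier that predicts class 0 for everything
y_true = np.array([0, 0, 1, 1])
y_pred = np.array([0, 0, 0, 0])
print(class_weight_vector(y_true, y_pred, 2))  # → [0.5 0. ]
```

The returned vector is the W_i = (W_i1, …, W_iC) of formula (3), one entry per class.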
Preferably, let M′ denote the integrated classifier and x a sample; then the classification result of the integrated classifier M′ for sample x is:
M′(x) = argmax_{ω_j ∈ Ω} Σ_{i=1..S} W_ij · [M_i(x) = ω_j] (4)
where M′(x) denotes the result of applying the integrated classifier M′ to sample x, and [·] equals 1 when its condition holds and 0 otherwise;
W_ij is the weight of base classifier M_i for the j-th class;
M_i(x) denotes the result of applying base classifier M_i to sample x.
Preferably, in step S7, the sample weights in the sample set are updated as follows:
The weights of wrongly classified samples are increased, so as to raise the probability that they are chosen as training samples in the next AdaBoost iteration.
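The weight update described above can be illustrated with a small sketch (an assumed form: the weights of correctly classified samples are scaled down by a factor β in (0, 1) and all weights are renormalized, which is equivalent to raising the relative weight of wrongly classified samples; the names are illustrative, not from the disclosure):

```python
import numpy as np

def update_weights(W, correct, beta):
    """Shrink the weights of correctly classified samples by beta, then
    renormalize, so wrongly classified samples gain relative weight."""
    W = W * np.where(correct, beta, 1.0)
    return W / W.sum()

W = np.full(4, 0.25)                           # uniform initial weights
correct = np.array([True, True, True, False])  # sample 4 was misclassified
print(update_weights(W, correct, 0.5))         # → [0.2 0.2 0.2 0.4]
```

The misclassified sample's weight rises from 0.25 to 0.4, so it is more likely to be drawn in the next iteration.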
The present invention has the following advantages:
The present invention first divides the drawn samples into training samples and test samples, trains classifiers with different classification algorithms, and then classifies the test samples with these classifiers to obtain class weight vectors that adapt the classes to the different classifiers. Integrating the base classifiers of the different classification algorithms through these vectors yields an integrated classifier in which the classifiers' advantages complement one another. On this basis, the AdaBoost iterative method boosts the generated integrated classifiers into an integrated classifier with higher accuracy and a better integration result. The method achieves adaptivity between the classes and the classifiers, enhances the diversity of the base classifiers while integrating different classification algorithms, and ensures an effective improvement in remote sensing image classification accuracy.
Specific embodiment
The invention is described in further detail below with reference to the accompanying drawings and a specific embodiment:
As shown in Figure 1, the technical route of the invention is as follows:
The first and second layers generate base classifiers of different classification algorithms (a base classifier is a basic classifier of the multi-classifier ensemble and the main object integrated by the present invention), and an integrated classifier is generated from them by the class-weight-vector method.
This integrated classifier, i.e. the Ensemble Learning based on Weight Vector classifier, is abbreviated as the EL_WV classifier.
In the integrated classifier, the sensitivity of each base classifier to the different classes is expressed by a class weight vector with components W_ij.
Let the base classifier set be M = {M_1, M_2, M_3, …, M_S}, where S is the number of base classifiers; the sample set be X = {X_1, X_2, X_3, …, X_N}, where N is the number of samples; and the class set be Ω = {ω_1, ω_2, ω_3, …, ω_C}, where C is the number of classes.
Further let W_i be the class weight vector of base classifier M_i, 1 ≤ i ≤ S, let t_ij be the number of test samples that base classifier M_i assigns to class ω_j, and let e_ij be the number of test samples that M_i wrongly assigns to class ω_j, 1 ≤ j ≤ C.
Then the weight of base classifier M_i for class ω_j is expressed as:
W_ij = 1 - E_ij (1)
where E_ij is the error rate of M_i on the test samples it assigns to class ω_j:
E_ij = e_ij / t_ij (2)
Accordingly, the class weight vector with which base classifier M_i votes to identify a test sample is expressed as:
W_i = (W_i1, W_i2, W_i3, …, W_iC) (3).
Based on this, the solution procedure of the integrated classifier is obtained:
1. Extract a training sample set D_train and a test sample set D_test from the sample set D;
2. For i from 1 to S, execute steps 2-1 and 2-2, where i is the index of base classifier M_i;
2-1. Train base classifier M_i on the training set D_train;
2-2. Compute the class weight vector W_i of base classifier M_i from the samples of the test set D_test.
3. Let M′ denote the integrated classifier and x a sample; then the classification result of the integrated classifier M′ for sample x is:
M′(x) = argmax_{ω_j ∈ Ω} Σ_{i=1..S} W_ij · [M_i(x) = ω_j] (4)
where M′(x) denotes the result of applying the integrated classifier M′ to sample x, and [·] equals 1 when its condition holds and 0 otherwise;
W_ij is the weight of base classifier M_i for the j-th class;
M_i(x) denotes the result of applying base classifier M_i to sample x.
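The vote of formula (4) can be sketched as follows (illustrative names; W is the S×C matrix whose i-th row is the class weight vector W_i):

```python
import numpy as np

def el_wv_predict(base_preds, W):
    """Formula (4): M'(x) = argmax_j sum_i W_ij * [M_i(x) = omega_j].

    base_preds: the S base-classifier labels M_i(x) for one sample x;
    W: S x C matrix of class weight vectors (row i is W_i)."""
    S, C = W.shape
    scores = np.zeros(C)
    for i in range(S):
        j = base_preds[i]
        scores[j] += W[i, j]   # classifier i votes for its predicted class with weight W_ij
    return int(np.argmax(scores))

W = np.array([[0.9, 0.2],
              [0.4, 0.8],
              [0.5, 0.7]])
# classifiers 2 and 3 vote for class 1 (0.8 + 0.7 = 1.5), classifier 1 for class 0 (0.9)
print(el_wv_predict([0, 1, 1], W))  # → 1
```

Note that a base classifier's vote only counts with its weight for the class it actually predicted, so a classifier that is unreliable for class j contributes little when it predicts j.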
The third layer generates multiple EL_WV classifiers by AdaBoost iteration and integrates them by weighted voting, yielding the ELM_CWV classifier (Ensemble Learning Method with the use of Class Weight Vector).
Before presenting the detailed process steps of this embodiment of the invention, the AdaBoost method is introduced first.
AdaBoost first assigns every sample the same weight; the magnitude of a sample's weight determines the probability that the sample is selected into the training set used to generate a base classifier. Training samples are then drawn from the sample set by weighted sampling and a weak classifier is generated; the weak classifier classifies all samples, the error of the current classifier is computed, and from this error the weight of each sample is recomputed so that wrongly classified samples carry higher weight. In this way, in the next round of iteration the selected training samples concentrate on the samples that are easily misclassified, and the newly generated classifier pays these samples more attention.
Proceeding in this way over many iterations generates a series of weak classifiers. For an unknown entity, all weak classifiers classify it, and finally all classification results are fused by weighted voting.
Many versions of AdaBoost have been derived, of which the two most typical algorithms are AdaBoost.M1 and AdaBoost.M2: AdaBoost.M1 is mainly used for multi-class classification, while AdaBoost.M2 is mainly used for two-class classification. Remote sensing image classification is in general a multi-class problem, so applications of AdaBoost in automatic remote sensing interpretation are mainly based on AdaBoost.M1. AdaBoost in the present invention refers primarily to AdaBoost.M1; the specific algorithm is as follows:
Assume a sample set S = {(x_1, y_1), (x_2, y_2), …, (x_n, y_n)}, with x_p ∈ X and y_p ∈ Y,
where n is the total number of samples, X and Y respectively denote the feature space and the class labels of the samples, K is the number of iterations, and W(p) denotes the weight of sample p, p = 1, 2, …, n. The detailed procedure of the AdaBoost algorithm is:
Input: S; K; the weak classification algorithm WeakLearn;
Training:
1. Initialize the weight of each sample p in S: W(p) = 1/n;
2. For q = 1, …, K, execute the following steps:
3. Draw training samples from S by weighted sampling with replacement and call WeakLearn to train a classification predictor h_q: X → Y; compute the error ε_q of h_q according to formula (5):
ε_q = Σ_{p: h_q(x_p) ≠ y_p} W(p) (5)
4. If ε_q > 0.5, re-initialize the weights of the samples in sample set S to W(p) = 1/n and go to step 3;
5. Compute β_q = ε_q / (1 - ε_q) and update W_{q+1}(p) according to the following formula:
W_{q+1}(p) = (W_q(p) / Z_q) · β_q if h_q(x_p) = y_p, and W_{q+1}(p) = W_q(p) / Z_q otherwise (6)
where Z_q is the normalization factor that makes the sum of all sample weights equal to 1;
6. Compute λ_q = log(1/β_q) as the voting weight of h_q;
7. End.
Classification: according to formula (7), the final prediction for an entity x of unknown class is obtained by weighted voting:
h_fin(x) = argmax_{y ∈ Y} Σ_{q: h_q(x) = y} λ_q (7)
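The AdaBoost.M1 procedure above can be sketched in Python (a minimal sketch: the weak learner is a toy decision stump, and for determinism it minimizes the weighted error directly rather than being trained on a weighted resample as in step 3; all names and the clamping of ε_q = 0 are illustrative assumptions):

```python
import numpy as np

def weak_learn(X, y, w):
    """Toy WeakLearn: a one-feature threshold stump minimizing the weighted error."""
    best = None
    for f in range(X.shape[1]):
        for thr in np.unique(X[:, f]):
            for lab in (0, 1):
                pred = np.where(X[:, f] <= thr, lab, 1 - lab)
                err = w[pred != y].sum()
                if best is None or err < best[0]:
                    best = (err, f, thr, lab)
    _, f, thr, lab = best
    return lambda Z: np.where(Z[:, f] <= thr, lab, 1 - lab)

def adaboost_m1(X, y, K=5):
    n = len(y)
    W = np.full(n, 1.0 / n)                 # step 1: W(p) = 1/n
    hs, lams = [], []
    for _ in range(K):                      # step 2: q = 1..K
        h = weak_learn(X, y, W)             # step 3: train h_q
        eps = W[h(X) != y].sum()            # formula (5): weighted error
        if eps > 0.5:                       # step 4: re-initialize and retry
            W = np.full(n, 1.0 / n)
            continue
        eps = max(eps, 1e-10)               # clamp so beta > 0 when eps = 0
        beta = eps / (1.0 - eps)            # step 5
        W = W * np.where(h(X) == y, beta, 1.0)  # formula (6): down-weight correct samples
        W = W / W.sum()                         # Z_q normalization
        hs.append(h)
        lams.append(np.log(1.0 / beta))     # step 6: voting weight lambda_q
    return hs, lams

def predict(hs, lams, X, n_classes=2):
    """Formula (7): weighted vote over all weak classifiers."""
    votes = np.zeros((len(X), n_classes))
    for h, lam in zip(hs, lams):
        votes[np.arange(len(X)), h(X)] += lam
    return votes.argmax(axis=1)

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
hs, lams = adaboost_m1(X, y)
print(predict(hs, lams, X))  # → [0 0 1 1]
```

On this separable toy set every stump is perfect, so each λ_q is large and the weighted vote reproduces the labels; on harder data the per-round errors and voting weights differ.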
As shown in Figure 2, the ensemble-learning remote sensing image classification method based on class weight vectors is now described in detail:
S1. Following the basic procedure of AdaBoost, initialize the weights of the samples in sample set D.
S2. Draw a sample D_i from sample set D by weighted sampling with replacement, and divide the drawn sample into training samples D_ia and test samples D_ib, where D_ia is used to train the base classifiers and D_ib to determine the class weight vectors.
S3. Train base classifiers of several different classification algorithms on the training samples.
The classification algorithms include the C4.5 decision tree, support vector machine, artificial neural network, naive Bayes, K-nearest neighbors, logistic regression, minimum distance, expectation maximization, maximum likelihood, Mahalanobis distance, random forest, and so on.
Several different classification algorithms are arbitrarily chosen from the above algorithms to train the base classifiers.
The classification algorithms used in this embodiment may, for example, be the four algorithms C4.5 (the C4.5 decision tree), SVM (support vector machine), ANN (artificial neural network) and NB (naive Bayes).
D_ia is used to train the base classifiers C4.5_i, SVM_i, ANN_i and NB_i of C4.5, SVM, ANN and NB respectively.
S4. Classify the test samples D_ib with the base classifiers C4.5_i, SVM_i, ANN_i and NB_i to obtain the class weight vector of each base classifier; the specific computation has been described above.
S5. Integrate the base classifiers by means of their class weight vectors to obtain the integrated classifier EL_WV_i.
S6. Classify the samples in sample set D of step S2 with the integrated classifier EL_WV_i and compute the classification error; if the classification error is less than the set error threshold, execute step S7, otherwise execute step S8.
In this embodiment the error threshold is normally set to 0.5, so the condition is met when the classification error is less than 0.5.
S7. Update the sample weights in the sample set, that is, increase the weights of wrongly classified samples so as to raise the probability that they are chosen as training samples in the next AdaBoost iteration, and put the updated samples back into the sample set of step S2.
S8. Return to step S1 and re-initialize the weights of the samples in sample set D.
S9. Set the number of AdaBoost iterations and repeat steps S2 to S7 until the AdaBoost iterations terminate, generating a series of EL_WV integrated classifiers, each with a voting weight.
S10. For a remote sensing image to be classified, classify it with each of the EL_WV integrated classifiers in turn, and finally fuse all classification results by weighted voting with their voting weights.
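Putting steps S1-S10 together, the whole ELM_CWV procedure can be sketched with toy base learners (a nearest-centroid classifier and a decision stump stand in for C4.5/SVM/ANN/NB; every name, the 50/50 split of each draw into D_ia and D_ib, and the clamping when the error is zero are illustrative assumptions, not details fixed by the disclosure):

```python
import numpy as np

rng = np.random.default_rng(1)

def train_centroid(X, y):
    """Toy base algorithm 1: minimum-distance (nearest-centroid) classifier."""
    cents = {c: X[y == c].mean(axis=0) for c in np.unique(y)}
    labs = sorted(cents)
    return lambda Z: np.array(
        [labs[int(np.argmin([np.linalg.norm(z - cents[c]) for c in labs]))] for z in Z])

def train_stump(X, y):
    """Toy base algorithm 2: one-feature threshold stump."""
    best = None
    for f in range(X.shape[1]):
        for thr in np.unique(X[:, f]):
            for lab in (0, 1):
                pred = np.where(X[:, f] <= thr, lab, 1 - lab)
                err = np.mean(pred != y)
                if best is None or err < best[0]:
                    best = (err, f, thr, lab)
    _, f, thr, lab = best
    return lambda Z: np.where(Z[:, f] <= thr, lab, 1 - lab)

def class_weights(y_true, y_pred, C):
    """Formulas (1)-(3): W_ij = 1 - e_ij / t_ij, computed on the test half D_ib."""
    w = np.zeros(C)
    for j in range(C):
        t = (y_pred == j).sum()
        e = ((y_pred == j) & (y_true != j)).sum()
        w[j] = 1.0 - e / t if t else 0.0
    return w

def train_el_wv(X, y, C):
    """S2-S5: split the draw into D_ia / D_ib, train the bases, weight them by class."""
    half = len(y) // 2
    models = [tr(X[:half], y[:half]) for tr in (train_centroid, train_stump)]
    Wcls = [class_weights(y[half:], m(X[half:]), C) for m in models]
    def predict(Z):
        scores = np.zeros((len(Z), C))
        for m, w in zip(models, Wcls):
            p = m(Z)
            scores[np.arange(len(Z)), p] += w[p]   # formula (4)
        return scores.argmax(axis=1)
    return predict

def elm_cwv(X, y, C=2, K=8):
    """S1-S10: AdaBoost loop over EL_WV integrated classifiers."""
    n = len(y)
    W = np.full(n, 1.0 / n)                        # S1
    ensembles, lams = [], []
    for _ in range(K):                             # S9
        idx = rng.choice(n, size=n, p=W)           # S2: weighted draw with replacement
        el = train_el_wv(X[idx], y[idx], C)        # S3-S5
        eps = W[el(X) != y].sum()                  # S6: error of EL_WV_i
        if eps >= 0.5:                             # S8: re-initialize and retry
            W = np.full(n, 1.0 / n)
            continue
        eps = max(eps, 1e-10)
        beta = eps / (1.0 - eps)
        W = W * np.where(el(X) == y, beta, 1.0)    # S7: raise relative weight of errors
        W = W / W.sum()
        ensembles.append(el)
        lams.append(np.log(1.0 / beta))
    def predict(Z):                                # S10: weighted vote of the EL_WVs
        v = np.zeros((len(Z), C))
        for el, lam in zip(ensembles, lams):
            v[np.arange(len(Z)), el(Z)] += lam
        return v.argmax(axis=1)
    return predict

# two well-separated toy clusters standing in for remote sensing pixels
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [5, 5], [5, 6], [6, 5], [6, 6]], float)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
clf = elm_cwv(X, y)
print(clf(X))
```

In a real application the toy learners would be replaced by C4.5, SVM, ANN and NB models and X by per-pixel spectral features; the control flow of S1-S10 is unchanged.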
Of course, the classification algorithms in this embodiment are not limited to four kinds; three, five or even more may be used. Nor are they limited to the combination of the above four algorithms; other combinations of classification algorithms are also possible.
By integrating base classifiers of different classification algorithms, the present invention realizes complementary advantages among the different algorithms.
Moreover, by AdaBoost iteration the present invention enlarges the pool of base classifiers, enhances their diversity, realizes adaptivity between the classes and the base classifiers, and achieves the purpose of efficient, high-accuracy remote sensing image classification.
Of course, the above is only a preferred embodiment of the present invention, and the invention is not limited to the embodiment enumerated above. It should be noted that any equivalent substitution or obvious variation made by a person skilled in the art under the teaching of this specification falls within the essential scope of this specification and ought to be protected by the present invention.