Summary of the invention
The embodiment of the present invention provides a kind of Combining Multiple Classifiers and system. Examine by utilizing fullyConsider the genetic algorithm of fitness function structure complementary between grader overall performance and grader, fromAnd can optimize each member classifying device weight, finally obtain an optimum grader.
According to an aspect of the present invention, provide a kind of Combining Multiple Classifiers, comprising:
Training set data is inputted to N member classifying device to predict the outcome to generate corresponding N;
Using the weight vectors of member classifying device as current population, and extract the weight of current populationValue;
Predict the outcome and the weighted value of current population according to N, calculate between member's graderComplementary index Cd;
Calculate the corresponding integrated classification device of each population overall performance FP;
According to overall performance index FP and complementary index Cd, calculate the fitness of each populationFunctional value Fit;
According to the fitness function value Fit of each population, by the selection in genetic algorithm, intersectionProcess with variation, generate population of new generation using as current population;
Judge whether to meet predetermined end condition;
If do not meet predetermined end condition, carry out the step of the weighted value that extracts current population;
If meet predetermined end condition, choose the population with maximum adaptation degree functional value FitAs the weight of member classifying device, thereby construct optimum fusion grader.
In one embodiment, predict the outcome and the weighted value of current population according to N, meterThe step of complementary index Cd between the person's of being counted as grader comprises:
Calculate respectively N the comprehensive evaluation index F1-Neasure predicting the outcome separately;
In N member classifying device, by corresponding to comprehensive evaluation index F1-Neasure maximumMember classifying device is as benchmark grader;
In N member classifying device, calculate respectively other N-1 except benchmark graderThe Jaccard similarity of member classifying device and benchmark grader;
According to the weighted value of described Jaccard similarity and current population, calculate member grader itBetween complementary index Cd.
In one embodiment, utilize formula
Calculate the complementary index Cd between member's grader, wherein SindexFor benchmark graderIndex, simiBe the Jaccard similarity of i member classifying device and benchmark grader, wi' beThe weight of i member classifying device.
In one embodiment, calculate the corresponding integrated classification device of each population overall performanceThe step of FP comprises:
For each population, according to the integrated classification result of all training samples, calculate after mergingComprehensive evaluation index F1-Neasure, using the overall performance FP as corresponding integrated classification device.
In one embodiment, according to overall performance index FP and complementary index Cd, meterThe step of calculating the fitness function value Fit of each population comprises:
Utilize formula
Calculate fitness function value Fit.
According to a further aspect in the invention, provide a kind of Multiple Classifier Systems, comprise prediction knotFruit generation unit, population generation unit, weighted value extraction unit, complementary computing unit, entiretyPerformance computing unit, fitness computing unit, population updating block, recognition unit and optimal classificationDevice construction unit, wherein:
The generation unit that predicts the outcome, for inputting training set data N member classifying device with lifeBecome corresponding N to predict the outcome;
Population generation unit, for using the weight vectors of member classifying device as current population;
Weighted value extraction unit, for extracting the weighted value of current population;
Complementary computing unit, for predicting the outcome and the weighted value of current population according to N,Calculate the complementary index Cd between member's grader;
Overall performance computing unit, for calculating the corresponding integrated classification device of each population globalityEnergy FP;
Fitness computing unit, for according to overall performance index FP and complementary index Cd,Calculate the fitness function value Fit of each population;
Population updating block, for according to the fitness function value Fit of each population, by hereditySelection in algorithm, crossover and mutation processing, generate population of new generation using as current population;
Recognition unit, for judging whether to meet predetermined end condition; If do not meet predetermined endOnly condition, indicates weighted value extraction unit to carry out the operation of the weighted value that extracts current population;
Optimum classifier construction unit, for according to the judged result of recognition unit, is scheduled to if meetEnd condition, choose the population with maximum adaptation degree functional value Fit as member classifying deviceWeight, thereby construct optimum fusion grader.
In one embodiment, complementary computing unit is specifically for calculating respectively N prediction knotFruit comprehensive evaluation index F1-Neasure separately; In N member classifying device, will be corresponding toThe member classifying device of comprehensive evaluation index F1-Neasure maximum is as benchmark grader; At NIn member classifying device, calculate respectively other N-1 member classifying device except benchmark grader andThe Jaccard similarity of benchmark grader; According to described Jaccard similarity and current populationWeighted value, calculates the complementary index Cd between member's grader.
In one embodiment, complementary computing unit utilizes formula
Calculate the complementary index Cd between member's grader, wherein SindexFor benchmark graderIndex, simiBe the Jaccard similarity of i member classifying device and benchmark grader, wi' beThe weight of i member classifying device.
In one embodiment, overall performance computing unit is specifically for each population, according to instituteThere is the integrated classification result of training sample, calculate the comprehensive evaluation index F1-Neasure after merging,Using the overall performance FP as corresponding integrated classification device.
In one embodiment, fitness computing unit specifically utilizes formula
Calculate fitness function value Fit.
The present invention is by utilizing complementary structure between the overall performance of integrated classification device and member classifying deviceMake appropriateness function, thereby can optimize each member classifying device weight, finally obtain optimum dividingClass device.
Detailed description of the invention
Below in conjunction with the accompanying drawing in the embodiment of the present invention, to the technical scheme in the embodiment of the present inventionBe clearly and completely described, obviously, described embodiment is only that the present invention's part is realExecute example, instead of whole embodiment. Real to the description of at least one exemplary embodiment belowOn border, be only illustrative, never as any limit to the present invention and application or useSystem. Based on the embodiment in the present invention, those of ordinary skill in the art are not making creative laborThe every other embodiment obtaining under moving prerequisite, belongs to the scope of protection of the invention.
Unless illustrate in addition, otherwise the parts of setting forth in these embodiments and the phase of stepLayout, numeral expression formula and numerical value are not limited the scope of the invention.
, it should be understood that for convenience of description the chi of the various piece shown in accompanying drawing meanwhileVery little is not to draw according to actual proportionate relationship.
May not do in detail for the known technology of person of ordinary skill in the relevant, method and apparatusThin discussion, but in suitable situation, described technology, method and apparatus should be regarded as authorizing and sayA part for bright book.
In all examples with discussing shown here, any occurrence should be construed as merelyExemplary, instead of as restriction. Therefore, other example of exemplary embodiment can toolThere is different values.
It should be noted that: in similar label and letter accompanying drawing below, represent similar terms, therefore,Once be defined in an a certain Xiang Yi accompanying drawing, do not need it to carry out in accompanying drawing subsequentlyFurther discuss.
Fig. 1 is the schematic diagram of an embodiment of Combining Multiple Classifiers of the present invention. Wherein:
Step 101, inputs N member classifying device by training set data pre-to generate corresponding NSurvey result.
Step 102, using the weight vectors of member classifying device as current population.
Step 103, extracts the weighted value of current population.
Step 104, predicts the outcome and the weighted value of current population according to N, calculates member classifyingComplementary index Cd between device.
Preferably, calculating complementary index Cd can be specially:
(this is one to calculate respectively N the comprehensive evaluation index F1-Neasure predicting the outcome separatelyIndividually refer to according to accuracy rate Precision and the two comprehensive evaluation providing of recall rate RecallMark). In N member classifying device, by corresponding to comprehensive evaluation index F1-Neasure maximumMember classifying device is as benchmark grader. In N member classifying device, calculate respectively except benchmarkThe Jaccard similarity of the member classifying device of other N-1 outside grader and benchmark grader.According to the weighted value of described Jaccard similarity and current population, calculate between member's graderComplementary index Cd.
Preferably, can utilize formula
Calculate the complementary index Cd between member's grader, wherein SindexFor benchmark graderIndex, simiBe the Jaccard similarity of i member classifying device and benchmark grader, wi' beThe weight of i member classifying device.
Step 105, calculates the corresponding integrated classification device of each population overall performance FP.
Preferably, for each population, according to the integrated classification result of all training samples, calculateComprehensive evaluation index F1-Neasure after fusion, using the overall performance as corresponding integrated classification deviceFP。
Step 106, according to overall performance index FP and complementary index Cd, calculates each populationFitness function value Fit.
Preferably, utilize formula
Calculate fitness function value Fit.
Step 107, according to the fitness function value Fit of each population, by genetic algorithmSelect, crossover and mutation processing, generate population of new generation using as current population.
Step 108, judges whether to meet predetermined end condition. If do not meet predetermined termination barPart, performs step 103. If meet predetermined end condition, perform step 109.
Step 109, chooses the population with maximum adaptation degree functional value Fit as member classifying deviceWeight, thereby construct optimum fusion grader.
Thereby, by test set data input optimum fusion grader, to obtain optimum prediction result.
The Combining Multiple Classifiers providing based on the above embodiment of the present invention, merges point by utilizationComplementary structure appropriateness function between the overall performance of class device and member classifying device, thus can optimize eachMember classifying device weight, finally obtains an optimum grader.
Here it should be noted that, because genetic algorithm is that those skilled in the art understand, asWhat utilizes genetic algorithm to calculate, select, intersect or the processing that makes a variation not is inventive point of the present inventionPlace, does not therefore launch to describe here.
Fig. 2 is the schematic diagram of an embodiment of Multiple Classifier Systems of the present invention. As shown in Figure 2,It is single that this system can comprise that the generation unit 201 that predicts the outcome, population generation unit 202, weighted value extractUnit 203, complementary computing unit 204, overall performance computing unit 205, fitness computing unit206, population updating block 207, recognition unit 208 and optimum classifier construction unit 209. ItsIn:
The generation unit 201 that predicts the outcome, for training set data is inputted N member classifying device withGenerating corresponding N predicts the outcome.
Population generation unit 202, for using the weight vectors of member classifying device as current population.
Weighted value extraction unit 203, for extracting the weighted value of current population.
Complementary computing unit 204, for predicting the outcome and the weighted value of current population according to N,Calculate the complementary index Cd between member's grader.
Preferably, complementary computing unit 204 is for calculating respectively predict the outcome combining separately of NClose evaluation index F1-Neasure; In N member classifying device, will refer to corresponding to overall meritThe member classifying device of mark F1-Neasure maximum is as benchmark grader; At N member classifying deviceIn, calculate respectively other N-1 member classifying device and benchmark grader except benchmark graderJaccard similarity; According to the weighted value of described Jaccard similarity and current population, meterComplementary index Cd between the person's of being counted as grader.
Preferably, complementary computing unit 204 utilizes formula:
Calculate the complementary index Cd between member's grader, wherein SindexFor benchmark graderIndex, simiBe the Jaccard similarity of i member classifying device and benchmark grader, wi' beThe weight of i member classifying device.
Overall performance computing unit 205, whole for calculating the corresponding integrated classification device of each populationBody performance FP.
Preferably, overall performance computing unit 205 is specifically for each population, according to all trainingThe integrated classification result of sample, calculate merge after comprehensive evaluation index F1-Neasure, using asThe overall performance FP of corresponding integrated classification device.
Fitness computing unit 206, for according to overall performance index FP and complementary index Cd,Calculate the fitness function value Fit of each population.
Preferably, fitness computing unit 206 can utilize formula
Calculate fitness function value Fit.
Population updating block 207, for according to the fitness function value Fit of each population, passes throughSelection in genetic algorithm, crossover and mutation processing, generate population of new generation using as current population.
Recognition unit 208, for judging whether to meet predetermined end condition; If do not meet predeterminedEnd condition, indicate weighted value extraction unit 203 to carry out to extract the weighted value of current populationOperation.
Optimum classifier construction unit 209, for according to the judged result of recognition unit 208, ifMeet predetermined end condition, choose the population with maximum adaptation degree functional value Fit as one-tenthThe weight of member's grader, thus construct optimum fusion grader.
Thereby, by test set data input optimum fusion grader, can obtain optimum prediction result.
The Multiple Classifier Systems providing based on the above embodiment of the present invention, merges point by utilizationComplementary structure appropriateness function between the overall performance of class device and member classifying device, thus can optimize eachMember classifying device weight, finally obtains an optimum grader.
Below by a concrete example, the present invention will be described.
For example analyze for the China Telecom's warning against losing customers based on Spark internal memory Computational frame, this is realExecute example and be one with online number of days mean value, mean value, online duration trend, online for broadband access feeTraffic trends, online number of days trend, trend for broadband access fee, surfing flow mean value (g), onNet duration mean value (hour/day) is characteristic parameter, and the machine of whether tearing open is that the binary classification of label is askedTopic. As shown in Figure 3, concrete implementation step is as follows:
Step 301, inputs multiple member classifying device generation forecast results by training set data. ThenPerform step respectively 302 and step 304.
First sample distribution is analyzed, sample total is 1976789, wherein tears machine customer volume openBe 162451, only account for 8% of total amount. Because sample distribution is unbalanced, therefore first adoptC_SMOTE method has been carried out equalization processing to sample, makes positive and negative sample size roughly equal. ByIn the span difference of characteristic value, therefore adopt Z_Score method to carry out specification to sample characteristicsChange and process. Sample through above-mentioned processing is divided into training set and test set, training set is input toMultiple graders, obtain each grader predicting the outcome to each sample.
Step 302, using grader weight vector as population, produces multiple populations.
Step 303, extracts population weighted value. Then perform step 306 and step 307.
Step 304, calculates the F1_Measure value of each member classifying device.
Step 305, choosing the maximum member classifying device of F1_Measure value is benchmark grader,Calculate the Jaccard similarity of other grader and this benchmark grader.
Step 306, calculates the complementary Cd between each member classifying device. Then perform step 310.
Step 307, is normalized population weight vector.
Step 308, calculates the multi-categorizer Nearest Neighbor with Weighted Voting result of each training sample.
Step 309, the overall performance FP of calculating multi-categorizer Nearest Neighbor with Weighted Voting result.
, calculate the overall performance FP of integrated classification device voting results.
For example, can first obtain the classification results of integrated classification device, adopt following formula to calculate:
Wherein N is member classifying device sum, wi' be the normalized weight of i member classifying device, fi(x)Be the discriminant function of i member classifying device, T is classification thresholds.
According to the integrated classification result of all training samples, calculate after fusion(one according to accuracy rate Precision and recall rate Recall, the two provides F1-Measure oneIndividual comprehensive evaluation index), represent the overall performance FP of integrated classification device with it.
Step 310, calculates the fitness function value Fit of each population.
Step 311, selecting, intersect, make a variation to process according to fitness function value obtains new oneFor population.
Step 312, judges whether to meet end condition. If do not meet, return to step 303,To upgrade member's grader complementarity and integrated classification device overall performance; If meet, execution step313。
End condition can finish or reach predefined threshold value for iterations.
Step 313, selects the population of fitness function maximum wherein as the power of member classifying deviceHeavy, construct optimum fusion grader with this.
Step 314, by test set data input optimum fusion grader, to obtain predicting class label.
By implementing the present invention, can obtain following beneficial effect:
1) complementarity between fusion performance and the member classifying device of use multi-categorizer builds genetic algorithmFitness function, ensured between Multiple Classifier Fusion performance and member classifying device complementary equilibrium;
2) complementarity between the Jaccard measuring similarity member classifying device of use weighting, has reducedAmount of calculation has been taken into account the otherness of member classifying device simultaneously;
3) training sample is independently inputted different member classifying devices, and the training of each grader is independent, can concurrency strong.
One of ordinary skill in the art will appreciate that all or part of step that realizes above-described embodimentSuddenly can complete by hardware, also can carry out the hardware that instruction is relevant by program and complete, instituteThe program of stating can be stored in a kind of computer-readable recording medium, and the above-mentioned storage of mentioning is situated betweenMatter can be read-only storage, disk or CD etc.