Here, WOE denotes the weight of evidence of a given characteristic variable, DistrGoods denotes the proportion of "good" borrowers in the sample data that fall under that characteristic variable, and DistrBads denotes the corresponding proportion of "bad" borrowers. The larger a positive WOE, the lower the customer's risk of credit default; the larger the magnitude of a negative WOE, the higher that risk. WOE converts variables into a regular, information-bearing form, so that variables of different types can be handled in the same way. Converting variables to WOE also preserves degrees of freedom more effectively when the sample is small. WOE is therefore employed to compare different variables in a small sample data set. The information value evaluates the predictive capability of a characteristic variable and is calculated as follows:

IV = (DistrGoods - DistrBads) * WOE,

wherein IV denotes the information value of a given characteristic variable, DistrGoods denotes the proportion of "good" borrowers in the sample data that fall under the characteristic variable, DistrBads denotes the proportion of "bad" borrowers in the sample data that fall under the characteristic variable, and WOE denotes the weight of evidence of the characteristic variable.
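The calculation can be illustrated with a minimal sketch. It assumes the common convention WOE = ln(DistrGoods / DistrBads), which agrees with the sign behavior described above, and it assumes that the per-bin IV terms are summed to give the IV of the whole variable. The function name woe_iv and the toy counts are hypothetical, and every bin is assumed to contain at least one "good" and one "bad" borrower.

```python
import math

def woe_iv(good_counts, bad_counts):
    """Return per-bin (WOE, IV term) pairs and their summed IV.

    good_counts[i] and bad_counts[i] are the numbers of "good" and "bad"
    borrowers in bin i of one characteristic variable. Every count is
    assumed to be non-zero, otherwise the logarithm is undefined.
    """
    total_good = sum(good_counts)
    total_bad = sum(bad_counts)
    rows = []
    for g, b in zip(good_counts, bad_counts):
        distr_goods = g / total_good
        distr_bads = b / total_bad
        # Assumed convention: WOE = ln(DistrGoods / DistrBads).
        woe = math.log(distr_goods / distr_bads)
        # IV term as given in the text: (DistrGoods - DistrBads) * WOE.
        iv_term = (distr_goods - distr_bads) * woe
        rows.append((woe, iv_term))
    # Assumption: the IV of the variable is the sum of its per-bin terms.
    total_iv = sum(term for _, term in rows)
    return rows, total_iv

# Hypothetical variable with two bins.
rows, total_iv = woe_iv(good_counts=[80, 20], bad_counts=[20, 80])
for woe, iv_term in rows:
    print(f"WOE={woe:+.3f}  IV term={iv_term:.3f}")
print(f"IV={total_iv:.3f}")
```

In this toy example the first bin is dominated by "good" borrowers and receives a positive WOE, while the second receives a negative WOE; both bins contribute a non-negative IV term.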

As shown in fig. 2, the present embodiment provides a data processing apparatus for optimizing a credit evaluation model, based on the same inventive concept as the above-described data mining method for credit evaluation, including:

Preferably, as shown in fig. 3, the model training module specifically includes:

a first classification module, configured to segment the continuous variables in the training set using a decision tree algorithm, thereby converting them into discrete variables;

a second classification module, configured to classify the discrete variables in the training set using a clustering algorithm;

a variable merging module, configured to merge the variables according to the classification results and determine preliminary model characteristic values;

and a logistic regression module, configured to perform logistic regression on the sample data of the model characteristic values to establish a preliminary evaluation model.
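Two of these modules can be sketched in plain Python as an illustration only. The sketch assumes a one-level decision tree (a single split chosen by minimum weighted Gini impurity) for the segmentation step and batch gradient descent for the logistic regression; the clustering and variable merging steps are omitted. The function names, the learning rate, and the toy data are all hypothetical and are not taken from the embodiment.

```python
import math

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_split(values, labels):
    """Threshold of a one-level decision tree (minimum weighted Gini)."""
    pairs = sorted(zip(values, labels))
    best_threshold, best_score = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no boundary between equal values
        threshold = (pairs[i][0] + pairs[i - 1][0]) / 2.0
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold

def fit_logistic(xs, ys, lr=0.5, epochs=2000):
    """Fit p(y=1) = sigmoid(w*x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            grad_w += (p - y) * x
            grad_b += p - y
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

# Hypothetical toy data: a continuous variable and a default flag (1 = "bad").
income = [1.0, 1.5, 2.0, 2.5, 6.0, 7.0, 8.0, 9.0]
default = [1, 0, 1, 1, 0, 0, 0, 1]

threshold = best_split(income, default)
# Continuous variable converted into a discrete (binary) variable.
low_income = [1 if v <= threshold else 0 for v in income]
w, b = fit_logistic(low_income, default)
print(threshold, w, b)
```

A positive fitted weight w here means that falling into the lower segment raises the estimated probability of default in this toy data set.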

Preferably, the apparatus further comprises a replacement variable module configured for:

calculating Euclidean distances between variables;
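The distance calculation can be sketched as follows, treating each variable as a vector of observations over the same borrowers. How the distances are then used is not stated here; picking the nearest variable as the replacement is purely an assumption made for illustration, and both function names are hypothetical.

```python
import math

def euclidean_distance(var_a, var_b):
    """Euclidean distance between two variables observed on the same borrowers."""
    if len(var_a) != len(var_b):
        raise ValueError("variables must have the same number of observations")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(var_a, var_b)))

def nearest_variable(target_name, variables):
    """Name of the variable closest to `target_name`.

    Assumption: the closest variable is taken as the replacement candidate.
    """
    target = variables[target_name]
    others = {k: v for k, v in variables.items() if k != target_name}
    return min(others, key=lambda k: euclidean_distance(target, others[k]))

# Hypothetical variables, each observed on the same two borrowers.
variables = {"a": [0.0, 0.0], "b": [3.0, 4.0], "c": [1.0, 0.0]}
print(euclidean_distance(variables["a"], variables["b"]))
print(nearest_variable("a", variables))
```

In practice the variables would usually be scaled to comparable ranges first, since Euclidean distance is sensitive to units.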

Preferably, the apparatus further comprises a data completion module configured to: before the logistic regression is performed, complete the data of a model characteristic value if that model characteristic value of the borrower lacks data.

Preferably, the data completion module is specifically configured to:

Preferably, the data completion module is configured to:

Preferably, the data acquisition module may be further configured to acquire external statistical data; correspondingly, the data completion module is specifically configured to: if a model characteristic value of the borrower lacks data, supplement that model characteristic value according to the external statistical data.
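As a rough sketch of completion from external statistical data, the following assumes that the external data supply one fallback value per characteristic (for example a published average) and that missing data are represented as None. The function name, the record layout, and the sample figures are hypothetical.

```python
def complete_with_external(records, external_stats):
    """Fill missing (None) characteristic values from external statistics.

    `records` is a list of dicts, one per borrower; `external_stats` maps a
    characteristic name to the fallback value taken from external data.
    """
    completed = []
    for record in records:
        filled = dict(record)  # leave the original record untouched
        for name, value in record.items():
            if value is None and name in external_stats:
                filled[name] = external_stats[name]
        completed.append(filled)
    return completed

# Hypothetical borrowers and external statistics.
borrowers = [
    {"income": 5200.0, "age": 41},
    {"income": None, "age": 29},
]
external = {"income": 4800.0}
print(complete_with_external(borrowers, external))
```

Values that are present are never overwritten, and a missing value with no matching external statistic stays missing.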

Preferably, the apparatus further comprises a variable cleaning module configured to: calculate the information value of each variable before the logistic regression is performed; check each information value against a preset value threshold to judge whether the variable is valid; and exclude invalid characteristic variables from the logistic regression.
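The threshold check can be sketched as below. The sketch assumes that a variable counts as valid when its information value is at least the preset threshold; the threshold of 0.02 and the variable names are hypothetical placeholders, since the preset value is not given here.

```python
def select_valid_variables(iv_by_variable, threshold):
    """Return the names of variables whose information value meets the threshold.

    Assumption: IV >= threshold means the variable is valid; all other
    variables are left out of the logistic regression.
    """
    return [name for name, iv in iv_by_variable.items() if iv >= threshold]

# Hypothetical information values for three characteristic variables.
iv_by_variable = {"income": 0.31, "age": 0.08, "phone_brand": 0.004}
valid = select_valid_variables(iv_by_variable, threshold=0.02)
print(valid)
```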

The data mining apparatus for credit evaluation provided by this embodiment shares the inventive concept of the data mining method for credit evaluation described above and has the same beneficial effects, which are not repeated here.

Based on the same inventive concept as the above-described data mining method for credit evaluation, the present embodiment provides a computer-readable storage medium having a computer program stored thereon, characterized in that the program, when executed by a processor, implements the method described in any of the method embodiments.

Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the present invention, and they should be construed as being included in the following claims and description.

Claims

1. A data processing method for optimizing a credit evaluation model, comprising:

acquiring relevant information of a borrower as sample data;

dividing the sample data into a training set and a test set;

testing the preliminary evaluation model by using the test set;

if the test result meets the evaluation standard, finishing the training and determining a final evaluation model;

wherein performing data modeling using the training set to obtain a preliminary evaluation model comprises:

performing logistic regression on the sample data of the model characteristic value to establish a preliminary evaluation model;

before performing the logistic regression, the method further comprises:

if the model characteristic value of the borrower lacks data, the data of the model characteristic value is supplemented;

2. The method of claim 1, wherein determining the replacement variable comprises:

calculating Euclidean distances between variables;

3. The method of claim 1, wherein supplementing the data of the model characteristic value if the model characteristic value of the borrower lacks data comprises:

4. The method of claim 1, further comprising: acquiring external statistical data;

5. The method of claim 1, prior to performing logistic regression, further comprising:

calculating the information value of each variable;

excluding the invalid variables from the logistic regression.

6. A data processing apparatus for optimizing a credit evaluation model, comprising:

a model testing module, configured to test the preliminary evaluation model using the test set; if the test result does not meet the evaluation standard, the training set and the test set are divided again, and data modeling and testing are performed again using the re-divided training set and test set; if the test result meets the evaluation standard, the training is finished and a final evaluation model is determined;

the model training module specifically comprises:

7. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, carries out the method of any one of claims 1 to 5.