Disclosure of Invention
In order to solve the above problems, the present application provides a method for evaluating a consumption loan pre-authorization, comprising: collecting credit evaluation information of different users, and taking the credit evaluation information as a credit evaluation sample;
performing a risk assessment on the credit evaluation sample to determine corresponding first and second risk scores; wherein the first risk score is derived by an expert scoring system and the second risk score is derived by a user portrait scoring system;
clustering the credit evaluation samples to obtain clustering results, and determining the outlier distance of outlier samples in the clustering results relative to a clustering center according to the clustering results;
calculating a composite risk score for the user based on the outlier distance, the first risk score, and the second risk score;
adding a corresponding type label to the credit evaluation sample according to the comprehensive risk score, and performing supervised learning on the credit evaluation sample added with the type label according to a preset target default prediction model to obtain default probability of the user;
and constructing a credit line distribution model aiming at the default probability so as to determine the credit line of the corresponding user according to the credit line distribution model.
The embodiment of the application provides a consumption loan pre-granting credit evaluation device, which is characterized by comprising:
at least one processor; and,
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to:
collecting credit evaluation information of different users, and taking the credit evaluation information as a credit evaluation sample;
performing a risk assessment on the credit evaluation sample to determine corresponding first and second risk scores; wherein the first risk score is derived by an expert scoring system and the second risk score is derived by a user portrait scoring system;
clustering the credit evaluation samples to obtain clustering results, and determining the outlier distance of outlier samples in the clustering results relative to a clustering center according to the clustering results;
calculating a composite risk score for the user based on the outlier distance, the first risk score, and the second risk score;
adding a corresponding type label to the credit evaluation sample according to the comprehensive risk score, and performing supervised learning on the credit evaluation sample added with the type label according to a preset target default prediction model to obtain default probability of the user;
and constructing a credit line distribution model aiming at the default probability so as to determine the credit line of the corresponding user according to the credit line distribution model.
An embodiment of the present application provides a non-volatile computer storage medium, which stores computer-executable instructions, and is characterized in that the computer-executable instructions are configured to:
collecting credit evaluation information of different users, and taking the credit evaluation information as a credit evaluation sample;
performing a risk assessment on the credit evaluation sample to determine corresponding first and second risk scores; wherein the first risk score is derived by an expert scoring system and the second risk score is derived by a user portrait scoring system;
clustering the credit evaluation samples to obtain clustering results, and determining the outlier distance of outlier samples in the clustering results relative to a clustering center according to the clustering results;
calculating a composite risk score for the user based on the outlier distance, the first risk score, and the second risk score;
adding a corresponding type label to the credit evaluation sample according to the comprehensive risk score, and performing supervised learning on the credit evaluation sample added with the type label according to a preset target default prediction model to obtain default probability of the user;
and constructing a credit line distribution model aiming at the default probability so as to determine the credit line of the corresponding user according to the credit line distribution model.
The method for evaluating the consumption loan pre-granting credit provided by the application can bring the following beneficial effects:
risk assessment is performed on the credit of a user through both an expert scoring system and a user portrait scoring system, the outlier distance is determined through cluster analysis, and the comprehensive risk score is then obtained from the outlier distance, the first risk score and the second risk score, so that, compared with a traditional single scoring mode, the assessment error is reduced and the adaptability is better; machine learning techniques such as full-supervised learning and semi-supervised learning are integrated, credit assessment under a non-default sample is realized, and the stability of the output result is improved; and the pre-granted credit line distribution model based on the default probability measure can optimize the credit line of the user, improving the accuracy of the pre-granted credit line while minimizing the total expected loss.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be described in detail and completely with reference to the following specific embodiments of the present application and the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The technical solutions provided by the embodiments of the present application are described in detail below with reference to the accompanying drawings.
As shown in fig. 1, an evaluation method for consumer loan pre-credit provided in an embodiment of the present application includes:
S101: collecting credit evaluation information of different users, and taking the credit evaluation information as a credit evaluation sample.
The business database stores business information such as credit information and consumption information of different users. The credit evaluation information includes the basic attributes of the users, asset status, personal credit, income, consumption characteristics, enterprise data, bonus and penalty score items, and the like.
After determining the credit evaluation sample, the server preprocesses it: missing field information in the sample data is filled in, and the completed data is normalized, thereby improving the quality of the data sample.
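As an illustrative, non-limiting sketch of the preprocessing described above, the following Python fragment fills missing fields and then normalizes each field. The application does not specify the filling or normalization technique; mean imputation and min-max scaling are assumptions made here, and the field names `income` and `assets` are hypothetical.

```python
from statistics import mean

def fill_missing(samples, fields):
    """Replace None values in each field with the mean of the observed values (assumed strategy)."""
    filled = [dict(s) for s in samples]
    for f in fields:
        observed = [s[f] for s in samples if s.get(f) is not None]
        default = mean(observed) if observed else 0.0
        for s in filled:
            if s.get(f) is None:
                s[f] = default
    return filled

def min_max_normalize(samples, fields):
    """Scale each field to [0, 1]; a constant field maps to 0.0 (assumed strategy)."""
    scaled = [dict(s) for s in samples]
    for f in fields:
        values = [s[f] for s in samples]
        lo, hi = min(values), max(values)
        span = hi - lo
        for s in scaled:
            s[f] = (s[f] - lo) / span if span else 0.0
    return scaled

# Hypothetical credit evaluation samples with missing fields.
raw = [
    {"income": 5000.0, "assets": 20000.0},
    {"income": None, "assets": 40000.0},
    {"income": 9000.0, "assets": None},
]
FIELDS = ["income", "assets"]
clean = min_max_normalize(fill_missing(raw, FIELDS), FIELDS)
```

Filling is performed before normalization so that the imputed values take part in the scaling, matching the order stated in the embodiment.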
S102: performing a risk assessment on the credit assessment sample to determine corresponding first and second risk scores; wherein the first risk score is derived by an expert scoring system and the second risk score is derived by a user representation scoring system.
After data preprocessing of the credit evaluation sample, the server performs risk evaluation on the credit evaluation sample through a preset risk evaluation model to determine the corresponding first risk score and second risk score. The risk evaluation model may be an expert scoring system or a user portrait scoring system. The expert scoring system comprises scoring rules deployed according to expert experience or basic industry rules, and the user portrait scoring system is a scoring system driven by the historical portrait data of the user. Both the expert scoring system and the user portrait scoring system score the user according to the credit evaluation information of the user; the first risk score corresponds to the expert scoring system, and the second risk score corresponds to the user portrait scoring system.
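By way of a hedged example, an expert scoring system of the kind described above could be sketched as a rule table whose matching rules each contribute points to the first risk score. The rules, field names and point values below are hypothetical and are not taken from the application.

```python
# Hypothetical rule table: (description, predicate, points added to the risk score).
EXPERT_RULES = [
    ("no stable income", lambda u: u["monthly_income"] <= 0, 30),
    ("overdue records", lambda u: u["overdue_count"] > 0, 40),
    ("high debt ratio", lambda u: u["debt_ratio"] > 0.6, 20),
]

def expert_risk_score(user):
    """Sum the points of every rule that fires for this user."""
    return sum(points for _, predicate, points in EXPERT_RULES if predicate(user))

# Hypothetical user: two of the three rules fire.
user = {"monthly_income": 8000, "overdue_count": 2, "debt_ratio": 0.7}
first_risk_score = expert_risk_score(user)
```

A user portrait scoring system would produce the second risk score from historical portrait data in a comparable way; its internal model is not detailed in the application and is therefore not sketched here.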
S103: clustering the credit evaluation samples to obtain a clustering result, and determining the outlier distance of the outlier samples in the clustering result relative to the clustering center according to the clustering result.
After data preprocessing of the credit evaluation samples, the server clusters them through a cluster analysis algorithm (for example, a K-Means clustering method based on spatial distance) to obtain a final clustering result. For each cluster, the server determines the outlier distance of each outlier sample of the cluster relative to the clustering center, sums the outlier distances, and takes the sum as the final outlier distance of the credit evaluation sample.
It should be noted that S102 may be executed before S103, S103 may be executed before S102, or S102 and S103 may be executed simultaneously, which is not limited in the embodiment of the present application.
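The clustering step can be sketched, under stated assumptions, with a plain K-Means (Lloyd's algorithm) written in standard-library Python. The application does not define how outlier samples are selected; this sketch simply takes the Euclidean distance of every sample to its assigned clustering center as that sample's outlier distance. Seeding the centers with the first k points is likewise an assumption made for reproducibility.

```python
import math

def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def k_means(points, k, iterations=20):
    """Plain Lloyd's algorithm; the first k points seed the centers (an assumption)."""
    centers = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest center.
        assignment = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        # Move each center to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, assignment

def outlier_distances(points, centers, assignment):
    """Distance of each sample to the center of the cluster it belongs to."""
    return [dist(p, centers[a]) for p, a in zip(points, assignment)]

# Hypothetical two-dimensional, already preprocessed samples.
points = [(0.0, 0.0), (10.0, 10.0), (0.0, 2.0), (10.0, 12.0), (5.0, 0.0)]
centers, assignment = k_means(points, k=2)
distances = outlier_distances(points, centers, assignment)
```

In this toy data the sample `(5.0, 0.0)` lies farthest from its clustering center and therefore receives the largest outlier distance.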
S104: calculating the comprehensive risk score of the user according to the outlier distance, the first risk score and the second risk score.
After obtaining the outlier distance, the first risk score, and the second risk score, the server performs non-dimensionalization processing (e.g., Z-score standardization) on the first risk score, the second risk score, and the outlier distance. Non-dimensionalization, also called normalization or standardization of data, eliminates differences in unit, order of magnitude, and trend among different data, so that the data share the same unit, the same magnitude, and the same trend, which improves the evaluation accuracy of the data.
After the non-dimensionalization processing is completed, the server calculates a comprehensive risk score for each user according to the outlier distance, the first risk score and the second risk score. The higher the comprehensive risk score, the higher the credit risk level of the user.
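A minimal sketch of the Z-score non-dimensionalization mentioned above is given below. The use of the population standard deviation and the handling of a constant series (mapped to zeros) are assumptions of this sketch.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Standardize a series to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant series carries no ranking information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

# Hypothetical first risk scores of three users.
first_scores = [60.0, 20.0, 40.0]
standardized = z_scores(first_scores)
```

The same function would be applied separately to the first risk scores, the second risk scores and the outlier distances, so that the three quantities become comparable.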
Specifically, the comprehensive risk score of the user is determined jointly by the expert scoring system and the user portrait scoring system. The server determines a first weight and a second weight corresponding to the first risk score and the second risk score respectively, and then performs a weighted summation of the first risk score and the second risk score according to the first weight and the second weight; the result of this weighted summation is the weighted risk score of the user. The weighted summation result is multiplied by the outlier distance, and the product is the comprehensive risk score of the user. The above calculation process can be expressed by the following formula:
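$$\mathrm{RiskScore}_i = (\mathrm{UZS}_i \cdot w_1 + \mathrm{XCZS}_i \cdot w_2) \cdot \mathrm{SDis}_i$$
wherein $\mathrm{RiskScore}_i$ is the comprehensive risk score of the $i$-th user, $\mathrm{US}_i$ is the first risk score, $\mathrm{XCS}_i$ is the second risk score, and $\mathrm{Dis}_i$ is the outlier distance; after non-dimensionalization they become $\mathrm{UZS}_i$, $\mathrm{XCZS}_i$ and $\mathrm{SDis}_i$ respectively; the first weight is $w_1$ and the second weight is $w_2$.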
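The formula can be illustrated with a short Python sketch. The parameter names mirror the symbols of the formula; the numeric inputs and weights are hypothetical.

```python
def composite_risk_score(uzs, xczs, sdis, w1, w2):
    """RiskScore_i = (UZS_i * w1 + XCZS_i * w2) * SDis_i."""
    return (uzs * w1 + xczs * w2) * sdis

# Hypothetical standardized inputs and weights for one user.
score = composite_risk_score(uzs=1.2, xczs=0.8, sdis=1.5, w1=0.6, w2=0.4)
```

Here the weighted sum is 1.2 * 0.6 + 0.8 * 0.4 = 1.04, and multiplying by the standardized outlier distance 1.5 gives approximately 1.56. Because Z-scores may be negative, the sign of the product depends on both factors; the application does not state how such cases are handled.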
Compared with the traditional expert scoring mode based on a rule engine, the embodiment of the present application performs risk scoring based on the personal portrait of the user and additionally uses the outlier distance to calculate the final comprehensive risk score. This can alleviate problems such as inaccurate assessment and serious homogenization caused by a short business period and a lack of accumulated experience, and meets the personalized requirements of different financial institutions.
S105: adding a corresponding type label to the credit evaluation sample according to the comprehensive risk score, and performing supervised learning on the credit evaluation sample added with the type label according to a preset target default prediction model to obtain the default probability of the user.
In one embodiment, after obtaining the composite risk score, the server may add different type labels to the credit evaluation sample based on the composite risk score to divide the credit evaluation sample into default samples and non-default samples.
Specifically, default samples, first non-default samples, second non-default samples and unknown samples in the credit assessment samples are determined according to the comprehensive risk scores, and corresponding type labels are added to the default samples, the first non-default samples, the second non-default samples and the unknown samples. The samples with the comprehensive risk scores lower than the first preset threshold are default samples, the samples with the comprehensive risk scores higher than the first preset threshold are first non-default samples, the samples with the comprehensive risk scores higher than the second preset threshold are second non-default samples, and other samples except the default samples and the second non-default samples in the credit evaluation samples are unknown samples. It should be noted that the second preset threshold is greater than the first preset threshold.
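The threshold rules above can be sketched as follows. The sketch is a literal reading of the text as written: a sample may carry more than one label, because the first non-default range (above the first preset threshold) overlaps both the unknown range and the second non-default range. The label strings and threshold values are hypothetical.

```python
def type_labels(score, t1, t2):
    """Return the set of type labels for one sample, following the thresholds as written."""
    if t2 <= t1:
        raise ValueError("the second preset threshold must be greater than the first")
    labels = set()
    if score < t1:
        labels.add("default")
    if score > t1:
        labels.add("first_non_default")
    if score > t2:
        labels.add("second_non_default")
    # Everything that is neither a default nor a second non-default sample is unknown.
    if "default" not in labels and "second_non_default" not in labels:
        labels.add("unknown")
    return labels

# Hypothetical thresholds applied to three comprehensive risk scores.
low = type_labels(0.1, t1=0.3, t2=0.7)
middle = type_labels(0.5, t1=0.3, t2=0.7)
high = type_labels(0.9, t1=0.3, t2=0.7)
```

One possible use of the overlapping labels, offered only as an assumption, is that the default and first non-default labels feed a full-supervised model, while the default, second non-default and unknown labels feed a semi-supervised model.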
The server can perform full-supervised learning and semi-supervised learning on the credit assessment sample through a pre-trained target default prediction model (such as a full-supervised target default prediction model and a semi-supervised target default prediction model) to respectively obtain a first default probability and a second default probability of the user. After the first default probability and the second default probability are obtained, a third weight and a fourth weight which correspond to the full-supervision target default prediction model and the semi-supervision target default prediction model respectively are determined, and the first default probability and the second default probability are subjected to weighted summation according to the third weight and the fourth weight, so that the default probability of the user is obtained. According to the embodiment of the application, full-supervised learning and semi-supervised learning are integrated, the accuracy of credit assessment can be improved to the maximum extent, and the credit assessment under a non-default sample is realized.
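The weighted summation of the two default probabilities can be sketched as below. The requirement that the third and fourth weights sum to 1, which keeps the result a valid probability, is an assumption of this sketch and is not stated in the application; the numeric values are hypothetical.

```python
def fused_default_probability(p_full, p_semi, w3, w4):
    """Weighted sum of the full-supervised and semi-supervised default probabilities."""
    if abs(w3 + w4 - 1.0) > 1e-9:
        # Assumption: normalized weights, so the output stays within [0, 1].
        raise ValueError("weights are assumed to sum to 1")
    return w3 * p_full + w4 * p_semi

# Hypothetical model outputs and weights for one user.
p = fused_default_probability(p_full=0.10, p_semi=0.20, w3=0.7, w4=0.3)
```

With these inputs the fused default probability is 0.7 * 0.10 + 0.3 * 0.20, that is, approximately 0.13.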
S106: constructing a credit line distribution model for the default probability, so as to determine the credit line of the corresponding user according to the credit line distribution model.
A user whose default probability is higher than a third preset threshold is directly identified as a loan-refusal customer, and the existing credit line of the user is frozen. For target users whose default probability is lower than the third preset threshold, profit maximization of the pre-granted credit line can be pursued under the following constraint conditions:
wherein formula (4) is the decision target, the income is $s_i$, the default probability is $p_i$, the annual loan interest rate is $r_i$, the response probability is $re_i$, and the decision variable $x_i$ is the salary multiple of the user, which is an integer satisfying $L \le x_i \le U$; the total credit line is $T$, and the upper limit of a personal loan is $A$.
Because the total credit scale T is limited, it cannot be guaranteed that every user obtains a credit line. Therefore, under the heuristic principle of default-probability priority, a credit line distribution model can be constructed for the target users whose default probability is lower than the third preset threshold:
wherein $p_i$ is the default probability, the objective function is the decision target, $y_i$ is the decision variable, $s_i$ is the income of the user, $x_i$ is the salary multiple, $T_i$ is the total credit line, and $A$ is the upper limit of a personal loan. Constraint condition (1) represents the heuristic rule of default-probability priority, constraint condition (2) represents that the total credit line does not exceed $T_i$, and constraint condition (3) indicates whether credit can be pre-granted.
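One possible greedy realization of the default-probability-priority heuristic is sketched below. It is not the optimization model of the application, whose formulas are not reproduced in this text. The sketch assumes that the candidate line of a user is income times salary multiple, capped at the personal upper limit, and that a user whose candidate line no longer fits in the remaining budget is skipped while allocation continues; the function and field names are hypothetical.

```python
def allocate_credit(users, total_budget, personal_cap, p_threshold):
    """Greedy pass in ascending default probability; returns {user_id: granted_amount}."""
    # Only target users below the third preset threshold take part.
    eligible = [u for u in users if u["p"] < p_threshold]
    eligible.sort(key=lambda u: u["p"])  # lowest default probability first
    remaining = total_budget
    granted = {}
    for u in eligible:
        amount = min(u["income"] * u["multiple"], personal_cap)
        if amount <= remaining:      # decision variable y_i = 1
            granted[u["id"]] = amount
            remaining -= amount
        else:                        # decision variable y_i = 0
            granted[u["id"]] = 0.0
    return granted

# Hypothetical users: p is the default probability, multiple is the salary multiple.
users = [
    {"id": "a", "p": 0.02, "income": 5000.0, "multiple": 6},
    {"id": "b", "p": 0.30, "income": 9000.0, "multiple": 6},
    {"id": "c", "p": 0.05, "income": 8000.0, "multiple": 10},
    {"id": "d", "p": 0.08, "income": 4000.0, "multiple": 5},
]
limits = allocate_credit(users, total_budget=90000.0, personal_cap=50000.0, p_threshold=0.2)
```

In this example user `b` is excluded by the threshold, users `a` and `c` receive 30000 and 50000 respectively (the latter capped by the personal upper limit), and user `d` receives no line because only 10000 of the total remains.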
The establishment of the credit line distribution model is thus completed, and the credit line can be determined directly based on the credit line distribution model. After a user submits personal consumption credit application information, the server queries the credit evaluation information of the user from the historical business information in the business database according to the business requirement of the user, and, following the evaluation method described above, outputs the credit line of the user through the credit line distribution model according to the credit evaluation information.
Fig. 2 is a flowchart illustrating another method for evaluating a consumption loan pre-authorization provided in an embodiment of the present application. As shown in fig. 2, credit evaluation information of a user, such as enterprise data and basic attributes of the user, is collected to construct a corresponding credit evaluation sample, and the credit evaluation sample is subjected to data preprocessing. After preprocessing, on the one hand, user risk scores corresponding to different users are determined through an expert scoring system and a user portrait scoring system, wherein the user risk score comprises a first risk score and a second risk score. On the other hand, the preprocessed credit evaluation samples are clustered through a K-Means cluster analysis method to obtain the corresponding sample outlier distances. The comprehensive risk score of the user is calculated from a first weight w1 corresponding to the first risk score, a second weight w2 corresponding to the second risk score, the first risk score, the second risk score and the outlier distance. After the comprehensive risk score is obtained, the credit evaluation samples are divided into different types according to the comprehensive risk score and corresponding labels are added; full-supervised learning and semi-supervised learning are then carried out on the credit evaluation samples through pre-trained target default prediction models to obtain a corresponding first default probability and second default probability. The first default probability and the second default probability are weighted and summed according to the third weight and the fourth weight, which correspond to the full-supervised target default prediction model and the semi-supervised target default prediction model respectively, to obtain the final default probability.
And after the default probability is obtained, constructing a corresponding pre-granted credit line distribution model based on a heuristic principle of default probability priority, so as to determine the credit line of the corresponding user through the pre-granted credit line distribution model.
The above is the method embodiment proposed by the present application. Based on the same idea, some embodiments of the present application further provide a device and a non-volatile computer storage medium corresponding to the above method.
Fig. 3 is a schematic structural diagram of a consumer loan pre-credit assessment apparatus according to an embodiment of the present disclosure. As shown in fig. 3, the apparatus includes:
at least one processor; and,
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to cause the at least one processor to:
collecting credit evaluation information of different users, and taking the credit evaluation information as a credit evaluation sample;
performing a risk assessment on the credit assessment sample to determine corresponding first and second risk scores; wherein the first risk score is derived by an expert scoring system and the second risk score is derived by a user portrait scoring system;
clustering the credit evaluation samples to obtain clustering results, and determining the outlier distance of the outlier samples in the clustering results relative to the clustering center according to the clustering results;
calculating a comprehensive risk score of the user according to the outlier distance, the first risk score and the second risk score;
adding corresponding type labels to the credit evaluation samples according to the comprehensive risk scores, and performing supervised learning on the credit evaluation samples added with the type labels according to a preset target default prediction model to obtain default probabilities of users;
and constructing a credit line distribution model aiming at the default probability so as to determine the credit line of the corresponding user according to the credit line distribution model.
An embodiment of the present application provides a non-volatile computer storage medium, in which computer-executable instructions are stored, and the computer-executable instructions are configured to:
collecting credit evaluation information of different users, and taking the credit evaluation information as a credit evaluation sample;
performing a risk assessment on the credit assessment sample to determine corresponding first and second risk scores; wherein the first risk score is derived by an expert scoring system and the second risk score is derived by a user portrait scoring system;
clustering the credit evaluation samples to obtain clustering results, and determining the outlier distance of the outlier samples in the clustering results relative to the clustering center according to the clustering results;
calculating a comprehensive risk score of the user according to the outlier distance, the first risk score and the second risk score;
adding corresponding type labels to the credit evaluation samples according to the comprehensive risk scores, and performing supervised learning on the credit evaluation samples added with the type labels according to a preset target default prediction model to obtain default probabilities of users;
and constructing a credit line distribution model aiming at the default probability so as to determine the credit line of the corresponding user according to the credit line distribution model.
The embodiments in the present application are described in a progressive manner, and the same and similar parts among the embodiments can be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the device and media embodiments, the description is relatively simple as it is substantially similar to the method embodiments, and reference may be made to some descriptions of the method embodiments for relevant points.
The device and the medium provided by the embodiment of the application correspond to the method one to one, so the device and the medium also have the similar beneficial technical effects as the corresponding method, and the beneficial technical effects of the method are explained in detail above, so the beneficial technical effects of the device and the medium are not repeated herein.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.