Disclosure of Invention
In view of the above, there is a need to provide a loan amount calculation method, device, computer device and storage medium, which can accurately determine the amount of a personal loan, thereby reducing the risk of personal loan default and increasing the overall income of the personal loan.
A loan amount calculation method comprises the following steps:
calculating the difference value between the income limit data and the liability limit data to obtain the basic limit data of the object, wherein the income limit data is determined according to the financial information of the object, and the liability limit data is determined according to the credit information of the object;
calculating a risk coefficient by linear programming, using a pre-trained default probability model, a preset fractional span, a basic total amount calculated from the basic amount data, bad amount data determined from objects with overdue loans, and a preset risk constraint condition;
determining the level of the object in each preset dimension, and determining the layering coefficient of the object based on its level in each dimension;
and calculating the loan amount of the object by using the basic amount data, the risk coefficient and the layering coefficient.
In one embodiment, the method for training the default probability model includes:
analyzing and processing target variables determined through account age analysis to obtain characteristic variables, wherein the target variables comprise objects whose loans are overdue for longer than a first time and objects whose loans are not overdue;
performing model fitting on the characteristic variables by using a logistic regression algorithm, and performing model evaluation on the logistic regression model obtained by the model fitting;
and, when the model evaluation shows that the evaluation index is not lower than a first preset value and the stability index is not higher than a second preset value, taking the evaluated logistic regression model as the default probability model.
In one embodiment, analyzing and processing the target variable determined by account age analysis to obtain a characteristic variable includes:
acquiring information data of objects used for building the model, determining target variables of the information data through account age analysis, and acquiring modeling data for the target variables, wherein the modeling data comprises proprietary data and third-party data obtained with the authorization of the objects;
performing descriptive statistics on the modeling data;
performing data processing on the modeling data after the descriptive statistics to obtain the characteristic variables, wherein the data processing comprises: removing duplicate values, handling abnormal values, handling missing values, standardizing data, deriving features, binning variables, applying weight-of-evidence (WOE) conversion, and screening features according to the information values and correlation coefficients of the variables obtained through feature derivation.
In one embodiment, the feature screening according to the information values and the correlation coefficients of the variables derived by the features includes:
calculating an information value of the modeling data;
deleting the modeling data corresponding to the information value smaller than the first information threshold value or the information value larger than the second information threshold value;
calculating correlation coefficients of variables derived through the features and modeling data;
and, among the modeling data whose correlation coefficient is greater than a correlation coefficient threshold, retaining the modeling data with the largest information value.
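The screening steps above can be sketched as follows. This is a minimal illustration, not the implementation of the disclosure: the function names, the thresholds (0.02, 0.5, 0.7), the 0.5 smoothing floor for empty bins, and the greedy reading of "keep the variable with the largest information value among highly correlated variables" are all assumptions.

```python
import math

def information_value(bins):
    """Information value of one binned variable.

    bins: list of (good_count, bad_count) pairs, one pair per bin.
    """
    total_good = sum(good for good, _ in bins)
    total_bad = sum(bad for _, bad in bins)
    iv = 0.0
    for good, bad in bins:
        # Assumed smoothing: a floor of 0.5 avoids log(0) for empty cells.
        good_share = max(good, 0.5) / total_good
        bad_share = max(bad, 0.5) / total_bad
        # (good share - bad share) * weight of evidence of the bin
        iv += (good_share - bad_share) * math.log(good_share / bad_share)
    return iv

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant input."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def screen_features(features, iv_low=0.02, iv_high=0.5, corr_threshold=0.7):
    """features: dict mapping name -> {"iv": float, "values": list of numbers}."""
    # Step 1: drop variables whose IV is below the first or above the second threshold.
    candidates = {n: f for n, f in features.items() if iv_low <= f["iv"] <= iv_high}
    # Step 2: visit variables from largest to smallest IV and skip any variable
    # that is highly correlated with one already kept (which has a larger IV).
    selected = []
    for name in sorted(candidates, key=lambda n: candidates[n]["iv"], reverse=True):
        values = candidates[name]["values"]
        if all(abs(pearson(values, candidates[kept]["values"])) <= corr_threshold
               for kept in selected):
            selected.append(name)
    return selected
```

With five hypothetical variables where `b` is almost a multiple of `a`, and `d` and `e` fall outside the information value range, `screen_features` keeps `a` and one uncorrelated variable.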
In one embodiment, calculating the risk coefficient by linear programming, using the pre-trained default probability model, the preset fractional span, the basic total amount calculated from the basic amount data, the bad amount data determined from objects with overdue loans, and the preset risk constraint condition, comprises the following steps:
calculating the default probability of the object by using the pre-trained default probability model;
determining a score of the object based on the default probability and a conversion coefficient calculated from the fractional span;
determining a risk level of the object according to the score;
and calculating the basic total amount and the bad amount data within each risk level, and performing a linear programming calculation under the preset risk constraint condition to obtain the risk coefficient corresponding to each risk level.
In one embodiment, the step of calculating the basic total amount and bad amount data in the risk level and performing linear programming calculation according to a preset risk constraint condition to obtain a risk coefficient corresponding to the level includes:
calculating the sum of basic quota data of the object in each risk level to obtain basic total quota data of each level;
calculating the sum of basic limit data of objects with overdue loan in each risk level to obtain bad limit data of each level;
calculating to obtain a risk adjustment coefficient of each risk grade according to the basic total limit data and the bad limit data of each risk grade and a preset risk constraint condition;
calculating a total amount bad rate from the basic total quota data, the bad quota data and the risk adjustment coefficient of each risk level;
and taking, as the risk coefficient of each risk level, the risk adjustment coefficient for which the total amount bad rate is minimized while the risk constraint condition is satisfied.
In one embodiment, the preset risk constraints include:
the risk coefficients, arranged in order of risk level, decrease progressively;
the difference between adjacent risk coefficients is greater than or equal to a first preset difference;
the risk coefficients ranked first and second are greater than a first threshold, and the remaining risk coefficients are less than the first threshold;
and a quota ratio of each risk level is calculated from the risk adjustment coefficient of each risk level and the basic total quota data, the sum of the quota ratios of all risk levels being a first preset percentage.
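As an illustration only, the four constraints can be checked for a candidate set of coefficients as sketched below. The function names, the tolerance, and in particular the quota ratio form (coefficient times the level's basic total, divided by the basic total over all levels) are assumptions, as is the convention that coefficients are listed from the best risk level to the worst.

```python
def quota_ratios(coeffs, base_totals):
    """Assumed form of the quota ratio of level i: X_i * a_i / sum of all a_j."""
    total = sum(base_totals)
    return [x * a / total for x, a in zip(coeffs, base_totals)]

def satisfies_constraints(coeffs, base_totals, min_gap, first_threshold,
                          target_ratio, tol=1e-9):
    """Check candidate risk coefficients against the four constraints.

    coeffs and base_totals are ordered from the best risk level to the worst.
    """
    # Constraints 1 and 2: strictly decreasing, adjacent gap at least min_gap.
    for higher, lower in zip(coeffs, coeffs[1:]):
        if higher <= lower or higher - lower < min_gap - tol:
            return False
    # Constraint 3: the first two coefficients exceed the threshold, the rest fall below it.
    if any(x <= first_threshold for x in coeffs[:2]):
        return False
    if any(x >= first_threshold for x in coeffs[2:]):
        return False
    # Constraint 4: the quota ratios sum to the preset percentage.
    return abs(sum(quota_ratios(coeffs, base_totals)) - target_ratio) <= tol
```

A linear programming solver would search over all coefficient vectors that pass this check; the sketch only validates one candidate.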
In one embodiment, the quota ratio of each risk level is calculated from the risk adjustment coefficient of each risk level and the basic total quota data by the following formula:
the total amount bad rate is calculated from the basic total amount data, the bad amount data and the risk adjustment coefficient of each risk level by the following formula:
wherein $b_i$ is the bad amount data of the i-th risk level, $X_i$ is the risk coefficient of the i-th risk level, and $a_i$ is the basic total amount data of the i-th risk level.
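One form that is consistent with the symbol definitions above is sketched here. It is an assumption offered for illustration, not the formula of the disclosure.

```latex
% Assumed forms, given only as an illustration.
\text{quota ratio}_i = \frac{X_i \, a_i}{\sum_{j} a_j},
\qquad
\text{total amount bad rate} = \frac{\sum_{i} b_i X_i}{\sum_{i} a_i X_i}
```

Under these assumed forms, fixing the sum of the quota ratios at a preset percentage $p$ fixes the denominator $\sum_i a_i X_i = p \sum_j a_j$, so minimizing the total amount bad rate reduces to minimizing the linear expression $\sum_i b_i X_i$, which is why a linear programming method would apply.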
In one embodiment, the preset dimensions include: unit dimension, payroll level dimension, unit region dimension, generation month dimension, credit history length dimension, and historical repayment performance dimension.
In a second aspect, the present disclosure also provides a loan amount calculation apparatus, including:
the income amount determining module is used for determining income amount data according to the financial information of the object;
the liability credit limit determining module is used for determining liability credit limit data according to credit information of the object;
the basic limit calculation module is used for calculating the difference value of the income limit data and the liability limit data to obtain the basic limit data of the object;
the risk coefficient calculation module is used for calculating a risk coefficient through a pre-trained default probability model, a preset conversion coefficient, a basic total amount obtained through calculation according to basic amount data, bad amount data determined according to an overdue object of the loan, a preset risk constraint condition and linear programming;
the hierarchical coefficient determining module is used for determining the hierarchy of the object of each dimension according to the preset dimension and determining the hierarchical coefficient of the object based on the hierarchy of the object of each dimension;
and the loan amount calculation module is used for calculating the loan amount of the object by using the basic amount data, the risk coefficient and the layering coefficient.
In one embodiment of the apparatus, the apparatus further comprises an analysis processing module, a model evaluation module and a model determination module.
The analysis processing module is used for analyzing and processing target variables determined through account age analysis to obtain characteristic variables, wherein the target variables comprise objects whose loans are overdue for longer than a first time and objects whose loans are not overdue.
The model evaluation module is used for performing model fitting on the characteristic variables by using a logistic regression algorithm and performing model evaluation on the logistic regression model obtained by the model fitting.
The model determination module is used for taking the evaluated logistic regression model as the default probability model when the model evaluation performed by the model evaluation module shows that the evaluation index is not lower than a first preset value and the stability index is not higher than a second preset value.
In one embodiment of the apparatus, the analysis processing module includes a target variable determining module, a descriptive statistics module and a data processing module.
The target variable determining module is used for acquiring information data of objects used for building the model, determining target variables of the information data through account age analysis, and acquiring modeling data for the target variables, wherein the modeling data comprises proprietary data and third-party data obtained with the authorization of the objects.
The descriptive statistics module is used for performing descriptive statistics on the modeling data.
The data processing module is used for performing data processing on the modeling data after the descriptive statistics to obtain the characteristic variables, wherein the data processing comprises: removing duplicate values, handling abnormal values, handling missing values, standardizing data, deriving features, binning variables, applying weight-of-evidence (WOE) conversion, and screening features according to the information values and correlation coefficients of the variables obtained through feature derivation.
In one embodiment of the apparatus, the data processing module includes an information value calculating module, an information value deleting module, a correlation coefficient calculating module and an obtaining module.
The information value calculating module is used for calculating the information value of the modeling data.
The information value deleting module is used for deleting the modeling data whose information value is smaller than a first information threshold or larger than a second information threshold.
The correlation coefficient calculating module is used for calculating the correlation coefficients of the variables derived through the features and the modeling data.
The obtaining module is used for retaining, among the modeling data whose correlation coefficient is greater than the correlation coefficient threshold, the modeling data with the largest information value.
In one embodiment of the apparatus, the risk coefficient calculation module includes a default probability calculation module, a score determination module, a risk level matching module and a risk coefficient calculation module.
The default probability calculation module is used for calculating the default probability of the object by using the pre-trained default probability model.
The score determination module is used for determining the score of the object based on the default probability and a conversion coefficient calculated from the fractional span.
The risk level matching module is used for determining the risk level of the object according to the score.
The risk coefficient calculation module is used for calculating the basic total amount and the bad amount data within each risk level and performing a linear programming calculation under the preset risk constraint condition to obtain the risk coefficient corresponding to each risk level.
In one embodiment of the apparatus, the risk coefficient calculation module includes a basic total amount data calculation module, a bad amount data calculation module, a risk adjustment coefficient calculation module, a total amount bad rate calculation module and a risk coefficient determination module.
The basic total amount data calculation module is used for calculating the sum of the basic amount data of the objects in each risk level to obtain the basic total amount data of each risk level.
The bad amount data calculation module is used for calculating the sum of the basic amount data of the objects with overdue loans in each risk level to obtain the bad amount data of each risk level.
The risk adjustment coefficient calculation module is used for calculating the risk adjustment coefficient of each risk level according to the basic total amount data and the bad amount data of each risk level and the preset risk constraint condition.
The total amount bad rate calculation module is used for calculating the total amount bad rate from the basic total amount data, the bad amount data and the risk adjustment coefficient of each risk level.
The risk coefficient determination module is used for taking, as the risk coefficient of each risk level, the risk adjustment coefficient for which the total amount bad rate is minimized while the risk constraint condition is satisfied.
In one embodiment of the apparatus, the risk coefficient calculation module further includes a risk constraint condition setting module for setting the risk constraint condition, wherein the risk constraint condition comprises: the risk coefficients, arranged in order of risk level, decrease progressively; the difference between adjacent risk coefficients is greater than or equal to a first preset difference; the risk coefficients ranked first and second are greater than a first threshold, and the remaining risk coefficients are less than the first threshold; and a quota ratio of each risk level is calculated from the risk adjustment coefficient of each risk level and the basic total quota data, the sum of the quota ratios of all risk levels being a first preset percentage.
In one embodiment of the apparatus, the risk constraint condition setting module includes a quota ratio calculation module, which is used for calculating the quota ratio of each risk level from the risk adjustment coefficient of each risk level and the basic total quota data by the following formula:
wherein $X_i$ is the risk coefficient of the i-th risk level and $a_i$ is the basic total amount data of the i-th risk level.
In one embodiment of the apparatus, the total amount bad rate calculation module is further configured to calculate the total amount bad rate by the following formula:
wherein $b_i$ is the bad amount data of the i-th risk level, $X_i$ is the risk coefficient of the i-th risk level, and $a_i$ is the basic total amount data of the i-th risk level.
In one embodiment of the apparatus, the preset dimensions in the layering coefficient determining module include: unit dimension, payroll level dimension, unit region dimension, generation month dimension, credit history length dimension, and historical repayment performance dimension.
In a third aspect, the present disclosure also provides a computer device, including a memory and a processor, where the memory stores a computer program, and the processor implements the steps of the above method when executing the computer program.
In a fourth aspect, the present disclosure also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the above-described method.
In a fifth aspect, the present disclosure also provides a computer program product comprising a computer program which, when executed by a processor, performs the steps of the above method.
The loan amount calculation method, apparatus, computer device and storage medium described above combine financial data and credit information to determine the income limit and the liability limit, and from them the basic limit data. The risk coefficient of the object is calculated by linear programming using the default probability model, the preset fractional span, the basic total amount calculated from the basic limit data, the bad amount data determined from objects with overdue loans, and the preset risk constraint condition, which effectively reduces the total amount bad rate. The layering coefficient is determined through the preset dimensions and is used to further adjust the credit line, which effectively raises the credit lines of high-quality objects. Finally, the credit line of the object is determined from the layering coefficient, the risk coefficient and the basic limit data. In this way, client risk can be accurately assessed and the amount of a personal loan accurately determined, thereby reducing the risk of personal loan default and increasing the overall income of personal loans.
In addition, the characteristic variables are obtained by performing data analysis and data processing on the target variables used for model training. The characteristic variables cover multiple dimensions and are meaningful for model training, so a default probability model trained with them can estimate the default probability accurately.
Detailed Description
In order to make the objects, technical solutions and advantages of the present disclosure more clearly understood, the present disclosure is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the disclosure and are not intended to limit the disclosure.
In the embodiments herein, the term "and/or" merely describes an association between associated objects and indicates that three relationships may exist. For example, A and/or B may represent: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" herein generally indicates an "or" relationship between the preceding and following associated objects.
It should be noted that the terms "first," "second," and the like in the description and claims herein and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments herein described are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, apparatus, article, or device that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or device.
In recent years, as consumption levels have risen and consumption concepts have changed, the proportion of personal credit has kept increasing. At present, however, the following problems exist in determining the credit line of personal credit:
(1) Data and information are not comprehensive. Most existing credit measurement methods consider only the most basic data from a single source, such as existing assets, income and liabilities. Considering only such data can lead to excessive credit being granted, which greatly increases the risk of personal loan default.
(2) Existing limit calculation methods use the traditional scorecard method, which considers only the risk coefficient of the client and does not subdivide clients according to their specific circumstances. The lack of a limit calculation model based on client segmentation makes the calculated limit inaccurate, which in turn greatly increases the risk of personal loan default.
Therefore, in view of the above problems, the present disclosure provides a loan amount calculation method. This embodiment is described by taking application of the method to a terminal as an example. It is understood that the method may also be applied to a server, or to a system including a terminal and a server and implemented through interaction between the terminal and the server. The terminal may be, but is not limited to, a personal computer, a notebook computer, a smart phone or a tablet computer, and the server may be implemented as an independent server or as a server cluster consisting of a plurality of servers. In this embodiment, as shown in fig. 1, the method includes the following steps:
S102: calculating the difference between income limit data and liability limit data to obtain basic limit data of the object, wherein the income limit data is determined according to financial information of the object, and the liability limit data is determined according to credit information of the object.
The income limit data may be the income limit of the object determined from multi-dimensional data in the financial information. The liability limit data may be liability data of the object determined from the credit amount and the credit card amount in the credit information. The basic limit data may represent the most basic limit available to the object. The financial information may be information on the various expenses and income of the object acquired through financial institutions. The credit information may be personal credit information that is collected, organized and stored in a personal credit database established by a designated institution; the database provides credit report inquiry services for commercial banks and individuals, and provides related information services for monetary policy making, financial supervision and other purposes prescribed by laws and regulations. An object generally refers to a person or an individual business that is able to take out a loan.
Specifically, the financial information of the object is acquired from the People's Bank of China, the banking regulatory authority or UnionPay, and may include multi-dimensional data such as annual income, annual payroll, housing provident fund contributions and account transaction amounts. Multi-dimensional data can fit the personalized scenarios of different objects and take the characteristics of different products and scenarios into account, which is an advantage over single-dimensional data. The income limit data is calculated from the financial information and corresponding coefficients preset by those skilled in the art. The liability limit data is determined from information in the acquired credit information of the object, such as the total monthly repayment of existing consumer loans and the average monthly usage of the credit card limit. The liability limit data helps to examine the current liabilities of the object and to incorporate information from other financial institutions, so that excessive credit granting is avoided and risk is further reduced. In some embodiments, the liability limit data may be calculated as: liability limit data = total monthly repayment of existing consumer loans + average monthly usage of the credit card limit. The difference between the income limit data and the liability limit data is then calculated, and the basic limit data of the object is obtained from the difference.
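A minimal sketch of step S102 follows. The field names and the weights are hypothetical: the text only says that the income limit is calculated from the financial information and preset coefficients, so the weighted sum used here is an assumption.

```python
def income_limit(financial_info, weights):
    """Assumed form: weighted sum of the financial items, with preset weights."""
    return sum(weight * financial_info.get(name, 0.0)
               for name, weight in weights.items())

def liability_limit(monthly_loan_repayment_total, avg_monthly_card_usage):
    """Total monthly repayment of existing consumer loans plus the
    average monthly usage of the credit card limit."""
    return monthly_loan_repayment_total + avg_monthly_card_usage

def basic_limit(income, liability):
    """Basic limit data: income limit minus liability limit."""
    return income - liability

# Hypothetical example values.
info = {"annual_income": 120000.0, "provident_fund": 12000.0}
weights = {"annual_income": 0.5, "provident_fund": 1.0}
base = basic_limit(income_limit(info, weights), liability_limit(3000.0, 2000.0))
```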
S104: calculating a risk coefficient by linear programming, using a pre-trained default probability model, a preset fractional span, a basic total amount calculated from the basic amount data, bad amount data determined from objects with overdue loans, and a preset risk constraint condition.
The pre-trained default probability model may be a model obtained by training based on methods such as feature engineering and machine learning. Feature engineering is the process of applying a series of processing steps to raw data to refine it into features that can be used as input for algorithms and/or models; it is a process of representing and presenting data. In practice, feature engineering aims to remove impurities and redundancy from the raw data. Machine learning is an interdisciplinary branch of artificial intelligence that studies how algorithms can improve their performance through experience. Currently popular machine learning algorithms include gradient boosting trees (GBDT, LGBM, etc.), linear regression, naive Bayes, random forests, ensemble models and the like. Those skilled in the art can select a suitable machine learning algorithm according to the actual situation.
The fractional span may be data from which a conversion coefficient can be calculated, the conversion coefficient then being used to determine the level of the object. The bad amount data may be the basic amount data of objects whose loans are overdue. The risk constraint condition generally refers to the constraints used to construct the linear programming problem. The risk coefficient may be a coefficient for adjusting the credit limit of the object, chosen so that the amount bad rate is minimized.
Specifically, the default probability is calculated by the pre-trained default probability model, a conversion coefficient is calculated from the fractional span, and the default probability is converted into a corresponding score by the conversion coefficient. The risk level of the object is determined according to the score. The basic total amount calculated from the basic amount data and the bad amount data determined from objects with overdue loans are then used, together with the preset risk constraint condition, in a linear programming calculation to obtain the risk coefficient of each risk level.
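The conversion from default probability to score and risk level could look like the sketch below. It assumes the common scorecard convention in which the fractional span is the number of points by which the score drops when the odds of default double, so that the conversion coefficient is the span divided by ln 2. The base score, base odds and cut points are hypothetical values, not taken from the text.

```python
import math

def conversion_coefficient(score_span):
    """Assumed convention: the score changes by score_span when the odds double."""
    return score_span / math.log(2)

def probability_to_score(p_default, score_span=20.0, base_score=600.0,
                         base_odds=1 / 50):
    """Map a default probability to a score; a higher score means lower risk."""
    odds = p_default / (1.0 - p_default)
    factor = conversion_coefficient(score_span)
    return base_score - factor * math.log(odds / base_odds)

def risk_level(score, cut_points=(650.0, 600.0, 550.0)):
    """Level 1 is the best level; cut_points are hypothetical lower bounds."""
    for level, cut in enumerate(cut_points, start=1):
        if score >= cut:
            return level
    return len(cut_points) + 1
```

With these assumed defaults, an object whose odds of default equal the base odds scores about 600, and doubling the odds lowers the score by about 20 points.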
S106: determining the level of the object in each preset dimension, and determining the layering coefficient of the object based on its level in each dimension.
The preset dimensions may be dimensions determined according to region, payroll level, loan information and the like. The layering coefficient may generally be a coefficient capable of reducing the risk of the credit amount.
Specifically, the level of the object in each dimension is determined according to the unit dimension, the payroll level dimension, the unit region dimension, the generation month dimension, the credit history length dimension and the historical repayment performance dimension, and the layering coefficient of the object is determined according to its level in each dimension. Those skilled in the art may determine the layering coefficient of the object by a preset determination rule, or according to their own experience based on the preset dimensions.
S108: calculating the loan amount of the object by using the basic amount data, the risk coefficient and the layering coefficient.
Specifically, loan amount of the object = basic amount data × risk coefficient × layering coefficient.
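Step S108 reduces to a single multiplication, sketched below. The table of risk coefficients per risk level is hypothetical; in the method it would come from the linear programming calculation.

```python
# Hypothetical risk coefficients per risk level (level 1 is the best level).
RISK_COEFFICIENTS = {1: 1.2, 2: 1.1, 3: 0.9, 4: 0.7}

def loan_amount(basic_amount, risk_level, layering_coefficient,
                risk_coefficients=RISK_COEFFICIENTS):
    """Loan amount = basic amount data x risk coefficient x layering coefficient."""
    return basic_amount * risk_coefficients[risk_level] * layering_coefficient

# An object with a basic amount of 50000 in risk level 2 and a layering
# coefficient of 1.0 receives roughly 55000.
amount = loan_amount(50000.0, 2, 1.0)
```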
In the loan amount calculation method described above, financial data and credit information are combined to determine the income limit and the liability limit, and from them the basic amount data. The risk coefficient of the client is calculated by linear programming using the default probability model, the preset fractional span, the basic total amount calculated from the basic amount data, the bad amount data determined from objects with overdue loans, and the preset risk constraint condition, which effectively reduces the total amount bad rate. The layering coefficient is used to adjust the credit line, which effectively raises the credit lines of high-quality objects. Finally, the credit line of the object is determined from the layering coefficient, the risk coefficient and the basic amount data. In this way, client risk can be accurately assessed and the amount of a personal loan accurately determined, thereby reducing the risk of personal loan default and increasing the overall income of personal loans.
In one embodiment, as shown in fig. 2, the method for training the default probability model includes:
S202: analyzing and processing target variables determined through account age analysis to obtain characteristic variables, wherein the target variables comprise objects whose loans are overdue for longer than a first time and objects whose loans are not overdue.
Account age analysis, also known as vintage analysis, is widely applied in the financial credit industry. The analysis tracks credit accounts opened in different periods separately and compares them at the same account age, so as to understand the asset quality of accounts approved in different periods. The analysis processing may generally be a way of processing the target variables and may include both data analysis and data processing. Characteristic variables are typically obtained by processing the attributes of raw data into features that can be used to train the model.
Specifically, the characteristic variables are obtained by performing data analysis and data processing on the target variables determined by the account age analysis. The target variables may include objects whose loans are overdue for longer than a first time (generally defined as bad clients) and objects whose loans are not overdue (generally defined as good clients). It should be noted that, in this embodiment, the first time is a preset time that those skilled in the art may set according to the specific scenario, for example 30 days or 60 days, which is not limited in this embodiment. The first time in this embodiment is typically 30 days.
And S204, performing model fitting on the characteristic variables by using a logistic regression algorithm, and performing model evaluation on a logistic regression model obtained by model fitting.
Logistic regression, also called logistic regression analysis, is a generalized linear regression analysis model. Model fitting may be a supervised learning process: the relationship between inputs and outputs is learned from an existing data set (which may be a training set), and an optimal model is trained based on this relationship. By finding the relation between features and labels, the label of data that has features but no label can be judged more accurately.
Specifically, the obtained feature variables are divided into a training set and a verification set. The division may be random, or a time point may be set so that the feature variables before the time point form the training set and the feature variables after the time point form the verification set; alternatively, the feature variables before the time point may form the verification set and those after the time point the training set. The random division may split the training set and the verification set according to a preset ratio or by other means. After the division, model fitting, that is, model training, is performed on the training set by using a logistic regression algorithm, and a logistic regression model is obtained after fitting. Model evaluation is then performed on the obtained logistic regression model.
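The two division modes described above can be sketched as follows. This is a minimal illustration only: the record layout (a `date` field per sample), the 70% training ratio and the function names are assumptions and are not specified in this embodiment.

```python
import random
from datetime import date

def split_by_time(samples, cutoff):
    """Samples dated before `cutoff` form the training set, the rest the verification set."""
    train = [s for s in samples if s["date"] < cutoff]
    valid = [s for s in samples if s["date"] >= cutoff]
    return train, valid

def split_random(samples, train_ratio=0.7, seed=0):
    """Random division by a preset ratio (0.7 is an assumed example value)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    k = round(len(shuffled) * train_ratio)
    return shuffled[:k], shuffled[k:]

# Hypothetical samples: one record per month of 2020
samples = [{"date": date(2020, m, 1), "x": m} for m in range(1, 11)]
train, valid = split_by_time(samples, date(2020, 8, 1))
r_train, r_valid = split_random(samples)
```

A time-based split keeps the verification samples strictly later than the training samples, which is closer to how the model is used after deployment; a random split is simpler when the data covers a short period.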
S206, under the condition that the evaluation index is not lower than a first preset value and the stability index is not higher than a second preset value in model evaluation, the logistic regression model for model evaluation is a default probability model.
Model evaluation may generally include evaluating the accuracy of the model and evaluating the stability of the model. The evaluation index may generally be a KS (Kolmogorov-Smirnov) value, which evaluates the accuracy of the model by measuring how well the model separates positive and negative samples. The stability index may generally be a PSI (Population Stability Index) value, which evaluates the stability of the model.
Specifically, the KS value and the PSI value of the logistic regression model are calculated. In the model evaluation, when the KS value is not lower than the first preset value (which may be 0.35) and the PSI value is not higher than the second preset value (which may be 0.1), the logistic regression model under evaluation is taken as the default probability model.
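The acceptance check can be illustrated with the commonly used definitions of the two indices: KS as the maximum gap between the cumulative distributions of bad and good samples ordered by model output, and PSI as the sum of (actual share - expected share) x ln(actual share / expected share) over score bins. The function names and the small clipping constant are assumptions of this sketch; the thresholds 0.35 and 0.1 are the example values given above.

```python
import math

def ks_value(scores, labels):
    """Maximum gap between cumulative bad and good distributions (label 1 = bad)."""
    pairs = sorted(zip(scores, labels))
    n_bad = sum(labels)
    n_good = len(labels) - n_bad
    cum_bad = cum_good = 0
    ks = 0.0
    i = 0
    while i < len(pairs):
        current = pairs[i][0]
        # Consume all samples that share the same score before comparing
        while i < len(pairs) and pairs[i][0] == current:
            if pairs[i][1] == 1:
                cum_bad += 1
            else:
                cum_good += 1
            i += 1
        ks = max(ks, abs(cum_bad / n_bad - cum_good / n_good))
    return ks

def psi_value(expected, actual, eps=1e-6):
    """PSI over bins; `expected` and `actual` are per-bin shares that each sum to 1."""
    total = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)  # avoid log(0) for empty bins
        total += (a - e) * math.log(a / e)
    return total

def is_accepted(ks, psi, ks_min=0.35, psi_max=0.1):
    """True when KS is not lower than ks_min and PSI is not higher than psi_max."""
    return ks >= ks_min and psi <= psi_max
```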
The default probability model may take the standard logistic regression form:
$$p = \frac{1}{1 + e^{-w^{\mathsf{T}} x}}$$
where p is the default probability, x is the characteristic variable, and w is the coefficient of the characteristic variable.
If the KS value of the model under evaluation is lower than the first preset value and/or the PSI value is higher than the second preset value, the model is fitted again on the training set until the evaluation index is not lower than the first preset value and the stability index is not higher than the second preset value.
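As an illustration of the fitting step, a logistic regression model can be fitted by plain batch gradient descent. This is a sketch under stated assumptions, not the training procedure of this embodiment: the intercept term `w[0]`, the learning rate, the number of epochs and the toy data are all hypothetical.

```python
import math

def predict_default_probability(w, x):
    """p = 1 / (1 + exp(-z)) with z = w[0] + w[1]*x[0] + ...; w[0] is an assumed intercept."""
    z = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on the average log-loss."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_default_probability(w, xi) - yi
            grad[0] += err
            for k, v in enumerate(xi):
                grad[k + 1] += err * v
        w = [wk - lr * g / n for wk, g in zip(w, grad)]
    return w

# Hypothetical one-feature training set (label 1 = bad customer)
X = [[-2.0], [-1.0], [-0.5], [0.5], [1.0], [2.0]]
y = [0, 0, 1, 0, 1, 1]
w = fit_logistic(X, y)
```

In practice a library implementation would normally be used; the loop above only makes the relationship between the coefficients and the default probability explicit.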
In another embodiment, the default probability model may also use a credit scorecard to predict the default probability. A credit scorecard measures risk in the form of a credit score and can predict the probability of a loan becoming overdue within a future period. In principle it is a generalized linear model for a binary target variable: the variables are discretized, encoded by WOE (Weight of Evidence), and then fitted by logistic regression.
In this embodiment, data analysis and data processing are performed on the target variables to obtain the feature variables, which cover multiple dimensions and are meaningful for model training. Training the default probability model with these feature variables allows the model to estimate the default probability accurately.
In one embodiment, as shown in fig. 3, analyzing and processing the target variable determined through account age analysis to obtain the characteristic variable includes:
S302, acquiring information data of objects for establishing a model, determining a target variable of the information data through account age analysis, and acquiring modeling data in the target variable, wherein the modeling data comprises owned data and third-party data acquired after the object is authorized;
Specifically, historical loan objects are retrieved from a database of the financial institution, and objects meeting preset conditions are screened from them to serve as the objects for establishing the model. The customer standard, namely the target variable, is determined for these objects through vintage analysis as follows: customers whose loan is overdue for more than 30 days are defined as bad customers; customers whose loan is not overdue are defined as good customers; and customers whose loan is overdue but for no more than 30 days (0 to 30 days) are defined as uncertain customers and are temporarily removed from the objects for establishing the model. Modeling data in the target variable is then obtained, which may include owned data and third-party data obtained after the object is authorized. The owned data may include loan information of the object, such as the loan date and the repayment date. The third-party data obtained after the object is authorized may include credit investigation information, financial information and the like.
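The customer standard described above can be sketched as a simple labelling rule. The field name `max_days_overdue` and the treatment of exactly 30 days as uncertain are assumptions of this sketch.

```python
def label_customer(max_days_overdue, first_time_days=30):
    """Return 'bad', 'good' or 'uncertain' from the longest overdue period in days."""
    if max_days_overdue > first_time_days:
        return "bad"
    if max_days_overdue == 0:
        return "good"
    return "uncertain"  # overdue, but not beyond the first time

def modeling_population(customers):
    """Keep only good and bad customers; uncertain customers are temporarily removed."""
    labelled = [(c, label_customer(c["max_days_overdue"])) for c in customers]
    return [(c, lab) for c, lab in labelled if lab != "uncertain"]

# Hypothetical customers
customers = [
    {"id": 1, "max_days_overdue": 0},
    {"id": 2, "max_days_overdue": 12},
    {"id": 3, "max_days_overdue": 45},
]
kept = modeling_population(customers)
```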
And S304, performing descriptive statistics on the modeling data.
Descriptive statistics generally refers to characterizing data by means of tabulation, classification, graphs and summary figures. Descriptive statistical analysis statistically describes the data of all variables of the surveyed population, and mainly includes frequency analysis, central tendency analysis, dispersion analysis, distribution analysis and some basic statistical graphs.
Specifically, descriptive statistics are performed on the modeling data to evaluate the distribution of the values of each variable, identify extreme values, and the like.
S306, performing data processing on the modeling data subjected to descriptive statistics to obtain characteristic variables, wherein the data processing comprises the following steps: deleting repeated values, processing abnormal values, processing missing values, standardizing data, deriving features, binning variables, converting weights of evidence, and screening features according to information values and correlation coefficients of the variables derived through features.
The weight of evidence conversion may be WOE (Weight of Evidence) conversion, where WOE reflects the influence on the target variable of the independent variable taking a certain value. The information value (IV) reflects the degree of influence of the independent variable on the target variable. Both WOE and the information value measure the predictive power of a variable; the larger the value, the stronger the predictive power.
Specifically, duplicate value deletion, outlier processing, missing value processing, data standardization, feature derivation, variable binning and weight of evidence conversion are performed on the modeling data subjected to descriptive statistics, and feature screening is performed according to the information values and the correlation coefficients of the variables derived through features.
The feature derivation may include cross-comparison, that is, deriving related information from information in the target variable (for example, deriving related information from the address information provided by the object).
The variable binning may use an optimal binning method based on a tree model, or may use other optimal binning methods such as chi-square, and the binning method is not limited in this embodiment.
In one embodiment, as shown in fig. 4, the performing feature screening according to the information values and the correlation coefficients of the variables derived by the features includes:
S402, calculating an information value of the modeling data;
S404, deleting the modeling data whose information value is smaller than the first information threshold or larger than the second information threshold;
S406, calculating correlation coefficients between the variables derived through features and the modeling data;
S408, among the modeling data whose correlation coefficient is larger than the correlation coefficient threshold, obtaining the modeling data with the largest information value.
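Specifically, the WOE value of each bin may be calculated by the standard formula:
$$\mathrm{WOE}_i = \ln\left(\frac{\mathrm{Bad}_i / \mathrm{Bad}_T}{\mathrm{Good}_i / \mathrm{Good}_T}\right)$$
where $\mathrm{Bad}_i$ and $\mathrm{Good}_i$ are the numbers of bad and good customers in the i-th bin and $\mathrm{Bad}_T$ and $\mathrm{Good}_T$ are the total numbers of bad and good customers, and the information value of the modeling data is then calculated by the formula:
$$\mathrm{IV} = \sum_i \left(\frac{\mathrm{Bad}_i}{\mathrm{Bad}_T} - \frac{\mathrm{Good}_i}{\mathrm{Good}_T}\right)\mathrm{WOE}_i$$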
The modeling data whose information value is smaller than the first information threshold or larger than the second information threshold is deleted to obtain first modeling data; in some embodiments, the first information threshold may be 0.02 and the second information threshold may be 0.5. The correlation coefficients between the modeling variables and the variables derived through features are calculated by a correlation coefficient formula. The modeling data whose correlation coefficient is larger than the correlation coefficient threshold is recorded as second modeling data, and the modeling data with the largest information value in the second modeling data is obtained. This modeling data is combined with the first modeling data to obtain the final modeling data, namely the characteristic variables.
In this embodiment, screening the modeling variables by information value and correlation coefficient yields modeling variables with better predictive power, which improves the accuracy of the default probability calculated by the default probability model.
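Steps S402 to S408 can be sketched as follows. Several points are assumptions of this sketch rather than statements of this embodiment: WOE is taken as ln(bad share / good share), which is one common convention; every bin is assumed to contain at least one good and one bad customer; the correlation threshold 0.7 is hypothetical; and the handling of correlated variables is one possible reading, namely that among variables correlated above the threshold only the one with the largest information value is kept.

```python
import math

def woe_iv(bins):
    """bins: list of (good_count, bad_count) per bin. Returns (WOE per bin, IV)."""
    good_total = sum(g for g, _ in bins)
    bad_total = sum(b for _, b in bins)
    woes, iv = [], 0.0
    for g, b in bins:
        bad_share = b / bad_total
        good_share = g / good_total
        woe = math.log(bad_share / good_share)
        woes.append(woe)
        iv += (bad_share - good_share) * woe
    return woes, iv

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def screen_features(iv_by_var, columns, iv_low=0.02, iv_high=0.5, corr_max=0.7):
    """Drop variables outside [iv_low, iv_high], then keep the highest-IV variable
    of every group whose absolute correlation exceeds corr_max."""
    candidates = [v for v, iv in iv_by_var.items() if iv_low <= iv <= iv_high]
    kept = []
    for v in sorted(candidates, key=lambda name: iv_by_var[name], reverse=True):
        if all(abs(pearson(columns[v], columns[k])) <= corr_max for k in kept):
            kept.append(v)
    return kept

# Hypothetical variables: "b" is almost a multiple of "a"; "c" and "d" fail the IV thresholds
iv_by_var = {"a": 0.3, "b": 0.2, "c": 0.01, "d": 0.9, "e": 0.1}
columns = {"a": [1, 2, 3, 4], "b": [2, 4, 6, 8.5], "e": [1, -1, 1, -1]}
selected = screen_features(iv_by_var, columns)
```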
In one embodiment, as shown in fig. 5, the obtaining of the risk coefficient by using a trained default probability model, a preset fractional span, a basic total amount calculated according to the basic amount data, bad amount data determined according to the object whose loan is overdue, a preset risk constraint condition, and linear programming calculation includes:
s502, calculating the default probability of the object by utilizing the pre-trained default probability model.
S504, determining the score of the object based on the probability and a conversion coefficient obtained through fractional span calculation.
S506, determining the risk level of the subject according to the score.
And S508, calculating the basic total amount and the bad amount data in the risk grade, and performing linear programming calculation according to a preset risk constraint condition to obtain a risk coefficient corresponding to the grade.
Specifically, after the default probability is calculated by using a pre-trained default probability model, the conversion coefficient is calculated based on the preset fractional span and by the following formula:
where A and B are the calculated conversion coefficients, and PDO and p0 are preset specific point scores, both of which are constants; in some embodiments, PDO may be 20 and p0 may be 600.
After the conversion coefficient is calculated, the score of the corresponding object is determined by the following formula:
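For illustration, one widely used scorecard scaling derives the conversion coefficients from PDO and a base score and then maps the default probability to a score. The sketch below uses that conventional scaling and should not be read as the exact formulas of this embodiment. In particular, `BASE_ODDS` (the bad-to-good odds at which the score equals the base score) is an assumption that does not appear in the text; only PDO = 20 and the base score 600 are taken from the example values above.

```python
import math

PDO = 20.0           # example value from the text
P0 = 600.0           # example value from the text
BASE_ODDS = 1 / 20   # ASSUMPTION: odds (p / (1 - p)) at which the score equals P0

def conversion_coefficients(pdo=PDO, p0=P0, base_odds=BASE_ODDS):
    """Conventional scaling: B = PDO / ln 2, A = P0 + B * ln(base_odds)."""
    b = pdo / math.log(2)
    a = p0 + b * math.log(base_odds)
    return a, b

def score(p, a, b):
    """Score = A - B * ln(p / (1 - p)); a higher default probability gives a lower score."""
    odds = p / (1.0 - p)
    return a - b * math.log(odds)

A, B = conversion_coefficients()
```

With this scaling, doubling the odds of default lowers the score by exactly PDO points, which is why PDO acts as the score span.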
After the object score is calculated, the object is graded according to a preset grading standard and the object score, so that the risk level of the object is determined. The preset grading standard may include:
(1) In each risk level, the upper and lower boundaries of the score interval are positive numbers.
(2) The objects may be divided into 5 risk levels, which from high to low account for 20%, 25%, 30%, 20% and 5% of the objects, respectively.
(3) The overdue rate increases in sequence across the risk levels from high to low.
(4) The lift increases in sequence across the risk levels from high to low, wherein the calculation formula of the lift may be as follows:
In some embodiments, a specific risk rating table is shown in table 1:
TABLE 1 Risk ratings table
It should be noted that the data in table 1 are only examples. The cumulative number of overdue clients of risk level E is the number of overdue clients of level E; the cumulative number of overdue clients of risk level D is the sum of the numbers of overdue clients of levels E and D; and the cumulative number of overdue clients of each remaining level is obtained by analogy. The cumulative number of all clients is calculated in the same way as the cumulative number of overdue clients and is not described again here.
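The accumulation rule described for table 1 can be sketched as a running sum that starts at level E and moves up to level A. The counts below are hypothetical and are not the data of table 1.

```python
LEVELS = ["A", "B", "C", "D", "E"]

def cumulative_from_lowest(counts, levels=LEVELS):
    """Cumulative count per level, accumulated from the last level (E) upwards."""
    running = 0
    cumulative = {}
    for level in reversed(levels):
        running += counts[level]
        cumulative[level] = running
    return cumulative

# Hypothetical per-level counts
overdue_clients = {"A": 2, "B": 5, "C": 12, "D": 16, "E": 10}
all_clients = {"A": 200, "B": 250, "C": 300, "D": 200, "E": 50}

cum_overdue = cumulative_from_lowest(overdue_clients)
cum_all = cumulative_from_lowest(all_clients)
```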
In one embodiment, as shown in fig. 6, calculating the basic total amount and the bad amount data in the risk level and performing linear programming calculation according to a preset risk constraint condition to obtain a risk coefficient corresponding to the level includes:
S602, calculating the sum of the basic quota data of the object in each risk level to obtain the basic total quota data of each level.
And S604, calculating the sum of basic limit data of objects with overdue loans in each risk level to obtain bad limit data of each level.
And S606, calculating to obtain a risk adjustment coefficient of each risk level according to the basic total amount data and the bad amount data of each risk level and a preset risk constraint condition.
S608, calculating the total amount bad rate according to the basic total amount data, the bad amount data and the risk adjustment coefficient of each risk level.
S610, when the total amount bad rate is minimum and the risk constraint condition is satisfied, the corresponding risk adjustment coefficient is the risk coefficient corresponding to the risk level.
Linear programming is an important branch of operations research and is widely applied in military operations, economic analysis, operation management, engineering technology and other fields. It provides a scientific basis for making optimal decisions on the reasonable use of limited resources such as manpower, material resources and financial resources.
Specifically, the basic amount data of the objects in each level is obtained and summed to obtain the basic total amount data of the level; the objects with overdue loans in each level are obtained, and their basic amount data is summed to obtain the bad amount data of the level. The risk adjustment coefficient of each risk level is then calculated by linear programming from the basic total amount data and the bad amount data of each level and the preset risk constraint conditions. At this point the risk adjustment coefficient is only roughly calculated and the specific condition has not yet been satisfied. The total amount bad rate is calculated from the basic total amount data, the bad amount data and the risk adjustment coefficient of each risk level, and the risk adjustment coefficient is adjusted so that the total amount bad rate is minimized.
The amount bad rate before risk coefficient adjustment may be calculated as the sum of the bad amount data of all levels divided by the sum of the basic total amount data of all levels. Comparing it with the total amount bad rate shows that the amount bad rate before risk coefficient adjustment is higher than the total amount bad rate, so the overall amount bad rate can be reduced through the risk coefficient.
In this embodiment, the risk adjustment coefficient of each risk level is calculated and adjusted, and the risk adjustment coefficient corresponding to the minimum total amount bad rate is taken as the risk coefficient of the risk level. The purpose of solving the risk coefficient is to raise the amount of customers with lower default probability and reduce the amount of bad customers with high default probability, so that the total amount bad rate is reduced and the overdue risk is controlled.
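Steps S602 and S604 and the bad rate before adjustment can be sketched as follows. The record layout (`level`, `basic_amount`, `overdue`) and the sample values are assumptions used only for illustration.

```python
from collections import defaultdict

def aggregate_by_level(objects):
    """Return (basic total amount per level, bad amount per level)."""
    base_total = defaultdict(float)
    bad = defaultdict(float)
    for obj in objects:
        base_total[obj["level"]] += obj["basic_amount"]
        if obj["overdue"]:
            bad[obj["level"]] += obj["basic_amount"]
    return dict(base_total), dict(bad)

def bad_rate_before_adjustment(base_total, bad):
    """Sum of bad amounts of all levels divided by sum of basic total amounts of all levels."""
    return sum(bad.values()) / sum(base_total.values())

# Hypothetical objects
objects = [
    {"level": "A", "basic_amount": 50000.0, "overdue": False},
    {"level": "A", "basic_amount": 30000.0, "overdue": False},
    {"level": "E", "basic_amount": 10000.0, "overdue": True},
    {"level": "E", "basic_amount": 10000.0, "overdue": False},
]
base_total, bad = aggregate_by_level(objects)
rate_before = bad_rate_before_adjustment(base_total, bad)
```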
In one embodiment, the preset risk constraints include:
arranging the risk coefficients according to the risk grades, and gradually reducing the risk coefficients;
the difference between the adjacent risk coefficients is greater than or equal to a first preset difference;
the risk coefficients ranked first and second are greater than a first threshold, and the remaining risk coefficients are less than the first threshold;
and calculating the quota ratio of each risk level according to the risk coefficient of each risk level and the basic total quota data, wherein the sum of the quota ratios of each risk level is a first preset percentage.
Specifically, in some embodiments, for example, the risk coefficients ranked according to risk level A, B, C, D, E are X1, X2, X3, X4, and X5, respectively, then the preset risk constraint may be:
(1) The risk coefficients X1 to X5 decrease in sequence.
(2) Adjacent risk coefficients among X1 to X5 differ by 0.1 or more; for example, the difference between X1 and X2 is 0.1 or more, and the difference between X2 and X3 is 0.1 or more.
(3) The risk coefficients X1 and X2 are greater than 1, and the risk coefficients X3, X4 and X5 are less than or equal to 1.
(4) The quota ratio of each risk level is calculated from the risk coefficient and the basic total quota data of the level, and the sum of the quota ratios of all levels is one hundred percent.
In this embodiment, the risk coefficients of risk levels A and B, which have a low default probability, are adjusted to be greater than 1, so that the risk quota is increased after the coefficient adjustment; the risk coefficients of risk levels C, D and E, which have a high default probability, are adjusted to be less than 1, so that the risk quota can be reduced.
In one embodiment, the quota ratio of each risk level is calculated by adopting the risk adjustment coefficient of each risk level and the basic total quota data according to the following formula:
The total amount bad rate is calculated from the basic total amount data, the bad amount data and the risk adjustment coefficient of each risk level by the following formula:
$$\text{total amount bad rate} = \frac{\sum_i b_i X_i}{\sum_i a_i X_i}$$
where $b_i$ is the bad amount data of the i-th risk level, $X_i$ is the risk coefficient of the i-th risk level, and $a_i$ is the basic total amount data of the i-th risk level.
Specifically, the total amount bad rate may be the ratio of the sum, over all levels, of the bad amount data multiplied by the corresponding risk coefficient to the sum, over all levels, of the basic total amount data multiplied by the corresponding risk coefficient.
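The search for the risk coefficients can be illustrated with a small brute-force grid search that stands in for a linear programming solver. Several points are assumptions of this sketch and are not stated in this embodiment: the quota ratio of a level is taken as a_i * X_i divided by the sum of all a_i, so that "the ratios sum to one hundred percent" fixes the total amount after adjustment; the coefficients are bounded to the hypothetical range [0.3, 1.5]; and the grid step is 0.1. Under the assumed quota ratio the denominator of the total amount bad rate is constant, so minimizing the rate is a linear objective and a real LP solver could replace the grid search.

```python
from itertools import product

def total_bad_rate(a, b, x):
    """sum(b_i * X_i) / sum(a_i * X_i)."""
    return sum(bi * xi for bi, xi in zip(b, x)) / sum(ai * xi for ai, xi in zip(a, x))

def quota_ratios(a, x):
    # ASSUMPTION: ratio_i = a_i * X_i / sum(a); the text does not give this formula.
    total = sum(a)
    return [ai * xi / total for ai, xi in zip(a, x)]

def feasible(x, min_gap=0.1, threshold=1.0, lo=0.3, hi=1.5, tol=1e-9):
    """Constraints (1)-(3) of the example, plus assumed bounds lo and hi."""
    gaps_ok = all(x[i] - x[i + 1] >= min_gap - tol for i in range(len(x) - 1))
    top_ok = all(xi > threshold for xi in x[:2])
    rest_ok = all(xi <= threshold + tol for xi in x[2:])
    bounds_ok = all(lo - tol <= xi <= hi + tol for xi in x)
    return gaps_ok and top_ok and rest_ok and bounds_ok

def search_risk_coefficients(a, b, step=0.1, lo=0.3, hi=1.5):
    """Enumerate the first n-1 coefficients on a grid; the last one follows from
    constraint (4), i.e. sum(a_i * X_i) == sum(a_i)."""
    steps = int(round((hi - lo) / step))
    grid = [round(lo + k * step, 10) for k in range(steps + 1)]
    best_rate, best_x = None, None
    for head in product(grid, repeat=len(a) - 1):
        last = (sum(a) - sum(ai * xi for ai, xi in zip(a, head))) / a[-1]
        x = list(head) + [last]
        if not feasible(x, lo=lo, hi=hi):
            continue
        rate = total_bad_rate(a, b, x)
        if best_rate is None or rate < best_rate:
            best_rate, best_x = rate, x
    return best_rate, best_x

# Hypothetical basic total amounts and bad amounts for levels A..E
a = [100.0, 100.0, 100.0, 100.0, 100.0]
b = [1.0, 2.0, 5.0, 10.0, 20.0]
best_rate, best_x = search_risk_coefficients(a, b)
```

The rate found this way can be compared with sum(b) / sum(a), the bad rate before adjustment, to confirm that the coefficients lower the overall bad rate.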
In one embodiment, as shown in fig. 7, determining the hierarchy of the object in each dimension according to a preset dimension, and determining the layering coefficient of the object based on the hierarchy of the object in each dimension includes:
S701, determining the unit dimension level of the object according to the unit dimension.
Specifically, the unit dimension is set according to the nature of the unit (employer) of the object, and includes: state-owned enterprises, government agencies, public institutions, the military, private enterprises, foreign enterprises and the like. The unit level of the object is determined according to the unit to which the object belongs. Working for a state-owned enterprise, government agency, public institution or the military indicates that the object's employment is stable, so a higher unit level may be set accordingly.
S702, determining the payroll level of the object according to the payroll level dimension of the object.
Specifically, after the unit of the object is determined, grading is performed according to the payroll level of the object within the unit: the payroll disbursed on behalf of the unit to all of its employees is sorted and divided into a plurality of grades, and the grade of the object within the unit is then determined, wherein a higher grade indicates a higher payroll level of the object.
In some embodiments, the disbursed payroll may be divided into nine levels a, b, c, d, e, f, g, h and i, each matching a quantile range, see table 2; a higher quantile indicates a higher payroll level.
TABLE 2 payroll level hierarchy table
| Quantile range | Payroll level |
| --- | --- |
| Above the 90th quantile | a |
| 80-90 quantile | b |
| 70-80 quantile | c |
| 60-70 quantile | d |
| 50-60 quantile | e |
| 40-50 quantile | f |
| 30-40 quantile | g |
| 20-30 quantile | h |
| Below the 20th quantile | i |
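The mapping of table 2 can be sketched as a lookup from the object's quantile within its unit to a payroll level. How the quantile is computed (share of employees whose payroll does not exceed the object's) and how boundary values such as exactly 90 are assigned are assumptions of this sketch.

```python
def payroll_quantile(object_pay, unit_payrolls):
    """Percentage of employees in the unit whose payroll is not higher than the object's."""
    not_higher = sum(1 for pay in unit_payrolls if pay <= object_pay)
    return 100.0 * not_higher / len(unit_payrolls)

def payroll_level(quantile):
    """Map a quantile in [0, 100] to levels a..i following table 2."""
    lower_bounds = [(90, "a"), (80, "b"), (70, "c"), (60, "d"),
                    (50, "e"), (40, "f"), (30, "g"), (20, "h")]
    for lower, level in lower_bounds:
        if quantile >= lower:  # ASSUMPTION: a boundary value belongs to the upper range
            return level
    return "i"

# Hypothetical unit with ten employees
unit_payrolls = [3000, 3500, 4000, 4500, 5000, 5500, 6000, 7000, 8000, 12000]
level = payroll_level(payroll_quantile(8000, unit_payrolls))
```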
S703, determining the unit area level of the object according to the unit area dimension.
Specifically, the unit area dimension is determined according to the city where the unit of the object is located, and includes: first-tier city dimension, new first-tier city dimension, second-tier city dimension, third-tier city dimension, fourth-tier city dimension and fifth-tier city dimension. The unit area level is determined according to the city where the unit is located, wherein the first-tier city dimension and the new first-tier city dimension correspond to higher unit area levels.
S704, determining a payroll disbursement month level according to the payroll disbursement month dimension of the object.
Specifically, the number of months in which the object has received payroll in the last 12 months is counted; for example, if the object has received payroll in 3 of the last 12 months, the number of payroll disbursement months is 3. The payroll disbursement month dimension includes: extremely short (0-3 months), short (4-6 months), medium (6-9 months), long (9-11 months) and extremely long (12 months). The corresponding dimension is matched according to the number of months in which the object has received payroll in the last 12 months, thereby determining the payroll disbursement month level. The more payroll disbursement months, the higher the corresponding level.
S705, determining a credit hierarchy according to the credit history length dimension of the object.
Specifically, the credit history length dimension of the object is obtained by subtracting the application date of the object's first loan from the application date of the object's most recent loan. The credit history length dimension is divided into 5 grades from low to high according to the length of time, which may be: extremely short (grade A), short (grade B), medium (grade C), long (grade D) and extremely long (grade E). A person skilled in the art may set the grade boundaries according to experience, and the time length of each grade is not limited in this embodiment. A credit hierarchy is determined from the credit history length dimension. The shorter the credit history length, the higher the corresponding credit tier.
S706, determining a repayment dimension level according to the historical repayment performance dimension of the object.
Specifically, the historical repayment performance dimension is determined according to the repayment of the object's consumer loans, and may include: no settled history, normally settled (fewer than 3 loans), normally settled (3 loans or more) and normally settled (6 loans or more). Normally settled (3 loans or more) and normally settled (6 loans or more) correspond to higher repayment dimension levels.
S707, determining the hierarchical level of the object according to the unit dimension level, the payroll level, the unit area level, the payroll disbursement month level, the credit level and the repayment dimension level, and determining the layering coefficient according to the hierarchical level. The hierarchical levels may include: high-quality client 1, high-quality client 2, high-quality client 3, common client 4, common client 5 and common client 6, wherein a high-quality client obtains a higher risk quota after adjustment by the layering coefficient, and the risk quota of a common client is reduced after adjustment by the layering coefficient. The layering coefficient corresponding to each hierarchical level is set according to a preset rule or the experience of a person skilled in the art, as shown in table 3.
TABLE 3 hierarchical level and hierarchical coefficient correspondence
It should be understood that, although the steps in the flowcharts in the figures are shown in the order indicated by the arrows, the steps are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least a portion of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 8, there is provided a loan amount calculation apparatus 800 including: an income amount determining module 801, a liability amount determining module 802, a basic amount calculating module 803, a risk coefficient calculating module 804, a layering coefficient determining module 805 and a loan amount calculating module 806, wherein:
The income amount determining module 801 is configured to determine income amount data according to the financial information of the object.
The liability amount determining module 802 is configured to determine liability amount data according to the credit information of the object.
The basic amount calculating module 803 is configured to calculate the difference between the income amount data and the liability amount data to obtain the basic amount data of the object.
The risk coefficient calculating module 804 is configured to calculate a risk coefficient by linear programming using a pre-trained default probability model, a preset conversion coefficient, a basic total amount calculated according to the basic amount data, bad amount data determined according to objects whose loans are overdue, and a preset risk constraint condition.
The layering coefficient determining module 805 is configured to determine, according to a preset dimension, the hierarchy of the object in each dimension, and to determine the layering coefficient of the object based on the hierarchy of the object in each dimension.
The loan amount calculating module 806 is configured to calculate the loan amount of the object by using the basic amount data, the risk coefficient and the layering coefficient.
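How modules 803 and 806 fit together can be sketched as follows. The basic amount as the difference between income amount and liability amount follows the description above; combining the three quantities by multiplication is an assumption of this sketch, since the text only states that the loan amount is calculated by using them. All numeric values are hypothetical.

```python
def basic_amount(income_amount, liability_amount):
    """Module 803: basic amount = income amount - liability amount."""
    return income_amount - liability_amount

def loan_amount(basic, risk_coefficient, layering_coefficient):
    """Module 806. ASSUMPTION: the three inputs are combined by multiplication."""
    return basic * risk_coefficient * layering_coefficient

# Hypothetical object: risk level A (coefficient 1.2), high-quality client (coefficient 1.1)
basic = basic_amount(100000.0, 40000.0)
amount = loan_amount(basic, 1.2, 1.1)
```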
In one embodiment of the apparatus, the apparatus further comprises: the device comprises an analysis processing module, a model evaluation module and a model determination module;
and the analysis processing module is used for analyzing and processing the target variables determined through the account age analysis to obtain the characteristic variables, wherein the target variables comprise objects with loan overdue exceeding the first time and objects with loan not overdue.
And the model evaluation module is used for performing model fitting on the characteristic variables by using a logistic regression algorithm and performing model evaluation on the logistic regression model obtained by model fitting.
And the model determining module is used for determining the logistic regression model for model evaluation as the default probability model under the condition that the evaluation index is not lower than a first preset value and the stability index is not higher than a second preset value in the model evaluation performed by the model evaluation module.
In one embodiment of the apparatus, the analysis processing module includes: a target variable determining module, a descriptive statistics module and a data processing module;
and the target variable determining module is used for acquiring information data of the object for establishing the model, determining the target variable of the information data through account age analysis, and acquiring modeling data in the target variable, wherein the modeling data comprises owned data and third-party data acquired after the object is authorized.
And the descriptive statistics module is used for performing descriptive statistics on the modeling data.
The data processing module is used for carrying out data processing on the modeling data subjected to the descriptive statistics to obtain characteristic variables, and the data processing comprises the following steps: deleting repeated values, processing abnormal values, processing missing values, standardizing data, deriving features, binning variables, converting weights of evidence, and screening features according to information values and correlation coefficients of the variables derived through features.
In one embodiment of the apparatus, the data processing module comprises: the device comprises an information value calculating module, an information value deleting module, a correlation coefficient calculating module and an obtaining module;
and the information value calculating module is used for calculating the information value of the modeling data.
And the information value deleting module is used for deleting the modeling data corresponding to the information value smaller than the first information threshold value or the information value larger than the second information threshold value.
And the correlation coefficient calculation module is used for calculating the correlation coefficient of the variable derived from the characteristics and the modeling data.
And the acquisition module is used for acquiring the modeling data with the largest information value in the modeling data with the correlation coefficient larger than the correlation coefficient threshold value.
In one embodiment of the apparatus, the risk coefficient calculating module 804 comprises: a default probability calculation module, a score determination module, a risk grade matching module and a risk coefficient calculation module;
and the default probability calculation module is used for calculating the default probability of the object by utilizing a pre-trained default probability model.
And the score determining module is used for determining the score of the object based on the probability and the conversion coefficient obtained by the fractional span calculation.
And the risk grade matching module is used for determining the risk grade of the object according to the score.
And the risk coefficient calculation module is used for calculating the basic total amount and bad amount data in the risk level and performing linear programming calculation according to a preset risk constraint condition to obtain a risk coefficient corresponding to the level.
In one embodiment of the apparatus, the risk coefficient calculation module comprises a basic total amount data calculation module, a bad amount data calculation module, a risk adjustment coefficient calculation module, a total amount bad rate calculation module and a risk coefficient determination module.
The basic total amount data calculation module is configured to sum the basic amount data of the objects in each risk level to obtain the basic total amount data of that risk level.
The bad amount data calculation module is configured to sum the basic amount data of the objects with overdue loans in each risk level to obtain the bad amount data of that risk level.
The risk adjustment coefficient calculation module is configured to calculate the risk adjustment coefficient of each risk level according to the basic total amount data and the bad amount data of each risk level and the preset risk constraint condition.
The total amount bad rate calculation module is configured to calculate the total amount bad rate from the basic total amount data, the bad amount data and the risk adjustment coefficient of each risk level.
The risk coefficient determination module is configured to determine, when the total amount bad rate is at its minimum and the risk constraint condition is satisfied, the corresponding risk adjustment coefficient as the risk coefficient of the risk level.
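The per-level aggregation and the total amount bad rate can be sketched as follows. The aggregation follows the description directly. The form of the bad rate (coefficient-weighted bad amount divided by coefficient-weighted basic total amount) is an assumption, and the record layout and function names are hypothetical.

```python
from collections import defaultdict

def aggregate_by_level(objects):
    """objects: iterable of (risk_level, basic_amount, is_overdue) tuples.

    Returns two dicts keyed by risk level: the basic total amount, and the
    bad amount (basic amounts summed over overdue objects only).
    """
    basic_total = defaultdict(float)
    bad_amount = defaultdict(float)
    for level, amount, is_overdue in objects:
        basic_total[level] += amount
        # Adding 0.0 keeps an entry for levels without overdue objects.
        bad_amount[level] += amount if is_overdue else 0.0
    return dict(basic_total), dict(bad_amount)

def total_bad_rate(coefficients, basic_total, bad_amount):
    """Assumed form: sum(X_i * b_i) / sum(X_i * a_i) over all risk levels."""
    weighted_bad = sum(coefficients[lv] * bad_amount[lv] for lv in basic_total)
    weighted_total = sum(coefficients[lv] * basic_total[lv] for lv in basic_total)
    return weighted_bad / weighted_total

# Hypothetical records: (risk level, basic amount, loan overdue?)
objects = [(1, 100.0, False), (1, 50.0, True),
           (2, 80.0, False), (2, 20.0, True)]
basic_total, bad_amount = aggregate_by_level(objects)
rate = total_bad_rate({1: 1.0, 2: 1.0}, basic_total, bad_amount)
print(basic_total, bad_amount, rate)
```

A candidate set of risk adjustment coefficients would be evaluated with `total_bad_rate`, and the set giving the smallest value while meeting the risk constraint condition would be retained.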
In one embodiment of the apparatus, the risk coefficient calculation module further comprises a risk constraint condition setting module configured to set the risk constraint condition, wherein the risk constraint condition comprises: the risk coefficients, arranged in order of risk level, decrease successively; the difference between adjacent risk coefficients is greater than or equal to a first preset difference; the risk coefficients ranked first and second are greater than a first threshold, and the risk coefficients ranked after the second are less than the first threshold; and the amount ratio of each risk level is calculated from the risk adjustment coefficient and the basic total amount data of that risk level, the sum of the amount ratios of all risk levels being a first preset percentage.
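A minimal check of a candidate coefficient vector against these four constraints might look like the sketch below. The amount ratio is assumed here to be the coefficient times the basic total amount of the level, divided by the basic total amount summed over all levels. That form, the tolerance, and the function and parameter names are assumptions made for illustration.

```python
def amount_ratios(coefficients, basic_totals):
    """Assumed form of the amount ratio: X_i * a_i / sum_j a_j."""
    total = sum(basic_totals)
    return [x * a / total for x, a in zip(coefficients, basic_totals)]

def satisfies_constraints(coefficients, basic_totals, min_gap, threshold,
                          ratio_sum=1.0, tol=1e-9):
    """coefficients are ordered from the first-ranked risk level onward."""
    pairs = list(zip(coefficients, coefficients[1:]))
    # 1. Coefficients decrease from one risk level to the next.
    decreasing = all(prev > nxt for prev, nxt in pairs)
    # 2. Adjacent coefficients differ by at least the preset difference.
    gaps_ok = all(prev - nxt >= min_gap - tol for prev, nxt in pairs)
    # 3. The first two coefficients exceed the threshold, the rest fall below.
    top_two_ok = all(x > threshold for x in coefficients[:2])
    rest_ok = all(x < threshold for x in coefficients[2:])
    # 4. The amount ratios sum to the preset percentage.
    ratios = amount_ratios(coefficients, basic_totals)
    sum_ok = abs(sum(ratios) - ratio_sum) <= tol
    return decreasing and gaps_ok and top_two_ok and rest_ok and sum_ok

# Hypothetical candidate for four risk levels with equal basic totals.
ok = satisfies_constraints([1.3, 1.1, 0.9, 0.7], [100.0] * 4,
                           min_gap=0.2, threshold=1.0)
print(ok)  # True
```

With a preset percentage of 100%, the fourth constraint keeps the total adjusted amount equal to the total basic amount, so the coefficients only redistribute amount between risk levels.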
In one embodiment of the apparatus, the risk constraint condition setting module comprises an amount ratio calculation module configured to calculate the amount ratio of each risk level from the risk adjustment coefficient and the basic total amount data of each risk level by using the following formula:
where X_i is the risk coefficient of the i-th risk level, and a_i is the basic total amount data of the i-th risk level.
In one embodiment of the apparatus, the total amount bad rate calculation module is further configured to calculate the total amount bad rate by using the following formula:
where b_i is the bad amount data of the i-th risk level, X_i is the risk coefficient of the i-th risk level, and a_i is the basic total amount data of the i-th risk level.
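The two formulas referred to in the preceding embodiments are not reproduced in this text. One reading that is consistent with the symbol definitions is given below as an assumption only. Here n (the number of risk levels), r_i (the amount ratio of the i-th risk level), P (the first preset percentage) and R (the total amount bad rate) are symbols introduced for illustration.

```latex
% Assumed form of the amount ratio and of the constraint on its sum
r_i = \frac{X_i \, a_i}{\sum_{j=1}^{n} a_j},
\qquad
\sum_{i=1}^{n} r_i = P

% Assumed form of the total amount bad rate
R = \frac{\sum_{i=1}^{n} b_i \, X_i}{\sum_{i=1}^{n} a_i \, X_i}
```

Under this reading, the constraint on the sum of the amount ratios fixes the denominator of R at P times the sum of the a_j. Minimizing R then amounts to minimizing the linear expression formed by the sum of b_i X_i, which is consistent with solving for the risk coefficients by linear programming.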
In one embodiment of the apparatus, the preset dimensions include: unit dimension, salary level dimension, unit region dimension, generation month dimension, credit history length dimension, and historical repayment performance dimension.
For the specific implementation of the loan amount calculation device, reference may be made to the above embodiments of the loan amount calculation method, which are not repeated here. The modules in the loan amount calculation device may be implemented wholly or partially by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor in the computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can call them and execute the operations corresponding to the modules.
In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in fig. 9. The computer device includes a processor, a memory, and a network interface connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device stores data such as income amount data, liability amount data, risk coefficients and layering coefficients. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements a loan amount calculation method.
Those skilled in the art will appreciate that the structure shown in fig. 9 is a block diagram of only the part of the structure relevant to the disclosed solution and does not limit the computer device to which the disclosed solution is applied; a particular computer device may include more or fewer components than those shown, may combine certain components, or may have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the steps of the above-described method embodiments when executing the computer program.
In an embodiment, a computer-readable storage medium is provided, on which a computer program is stored, which computer program, when being executed by a processor, carries out the steps of the above-mentioned method embodiments.
In an embodiment, a computer program product is provided, comprising a computer program which, when being executed by a processor, carries out the steps of the above-mentioned method embodiments.
It should be noted that the object information (including but not limited to financial information and credit information) and the data (including but not limited to basic total amount data and bad amount data) involved in the present disclosure are all information and data authorized by the user or fully authorized by all parties.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database or other medium used in the embodiments provided by the present disclosure may include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical storage, or the like. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can take many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM).
The technical features of the above embodiments can be combined arbitrarily. For the sake of brevity, not all possible combinations of these technical features are described; however, any combination of them should be considered within the scope of this specification as long as the combined features do not contradict one another.
The above embodiments express only several implementations of the present disclosure. Their description is relatively specific and detailed, but it should not be construed as limiting the scope of the invention. It should be noted that those skilled in the art can make various changes and modifications without departing from the concept of the present disclosure, and these changes and modifications all fall within the scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the appended claims.