Embodiment
Embodiments of the present application are described in detail below, and examples of the embodiments are shown in the drawings, in which the same or similar reference numerals throughout denote the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary; they are intended to explain the application and should not be construed as limiting it.
In application scenarios for credit products, a user's credit model needs to be built from big data accumulated about that user over a long period. The samples used are the user's performance data in credit products, such as behavioral data in products with an installment-purchase function and in products with a use-first, repay-later function. The variables used are the relevant variables of each dimension of the user. The target the model fits is whether the user becomes overdue within a preset time period after drawing, and the credit model is built by fitting these data.
Because a credit product faces users directly, the model must be highly interpretable. In practice, models are therefore often built with a linear algorithm such as Logistic Regression (LR). However, a linear model is less complex than a nonlinear model and its performance tends to be lower, whereas a nonlinear model is more complex and performs better but is generally harder to interpret.
Therefore, the present application proposes a training method for a credit evaluation model: a nonlinear tree model is used to construct combined features, and a linear algorithm is then used to train the final model, yielding a model that performs better than a purely linear model while remaining interpretable. Specifically, the training method for the credit evaluation model, and the method and device for performing credit evaluation using the credit evaluation model, according to embodiments of the present application are described below with reference to the drawings.
Fig. 1 is a flow chart of a training method for a credit evaluation model according to one embodiment of the present application. As shown in Fig. 1, the training method for the credit evaluation model may include:
S110: obtain training original behavior data of training users in a business system.
Specifically, user information of a large number of training users may first be obtained, and the training original behavior data of those training users in the business system may then be obtained according to that user information. It should be noted that, in embodiments of the present application, the business system may be a system with a credit product, a system with a purchase function, or a system capable of reflecting user credit.
In embodiments of the present application, the training original behavior data may include, but is not limited to, website or web page click behavior (such as click time, count, and frequency), transaction data and behavior of the training users (such as information on the product paid for, the payment, and the payment method), and fund incremental data and behavior (such as the direction of fund flows and the amount of fund flows).
In addition, the user information may include, but is not limited to, a user name or user ID, an account name, and the like. For example, when the business system identifies a training user uniquely by user name, the user name of the training user may be obtained, and all training original behavior data of that training user in the business system may be obtained according to the user name; when the business system identifies a training user uniquely by account name, the account name of the training user may be obtained, and the training original behavior data of that training user in the current business system may be obtained according to the account name.
S120: extract training original features from the training original behavior data.
Specifically, data that exhibits characteristic behavior may be extracted from the training original behavior data and used as the training original features of that data. For example, the training original features may include purchase amount, web page/website click count, web page/website click frequency, fund flow, and the like.
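As a minimal sketch of step S120, the following Python turns a list of raw behavior events into the kinds of features named above. The event layout (`type`, `ts`, `amount`) and the function name `extract_features` are assumptions made purely for illustration; the application does not specify a data format.

```python
from collections import Counter

def extract_features(events):
    """Turn raw behavior events into a flat feature dict.

    Each event is a hypothetical dict such as
    {"type": "click", "ts": 10} or {"type": "purchase", "amount": 99.0}.
    """
    counts = Counter(e["type"] for e in events)
    click_times = [e["ts"] for e in events if e["type"] == "click"]
    # Length of the observed click window; 0 when fewer than two clicks exist.
    span = (max(click_times) - min(click_times)) if len(click_times) > 1 else 0
    return {
        "purchase_amount": sum(e.get("amount", 0.0) for e in events if e["type"] == "purchase"),
        "click_count": counts["click"],
        # Clicks per unit time over the window; 0.0 if the window is empty.
        "click_frequency": counts["click"] / span if span else 0.0,
        "fund_flow": sum(e.get("amount", 0.0) for e in events if e["type"] == "transfer"),
    }

events = [
    {"type": "click", "ts": 0},
    {"type": "click", "ts": 10},
    {"type": "purchase", "amount": 120.0},
    {"type": "transfer", "amount": -30.0},
]
features = extract_features(events)
```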
S130: perform feature combination on the training original features according to an iterative decision tree (GBDT, gradient boosting decision tree) model to generate corresponding training cross-combination features.
Specifically, in embodiments of the present application, a GBDT model may be trained on the training original features to build a GBDT model with N trees, where N is a positive integer. The N trees in the GBDT model are then used to mine the association relationships among the training original features, and finally the training original features are combined according to those association relationships to generate the training cross-combination features.
It can be understood that GBDT is an iterative decision tree algorithm composed of multiple decision trees, in which the conclusions of all trees are summed to give the final result. For example, each tree fits the residual of the preceding K trees, which can be understood as each tree depending on the result of the tree before it; a definite order among the trees must therefore be maintained. In this way, the multiple decision trees in the GBDT model make decision splits on the training original features, thereby finding the association relationships among the training original features, and the features that have an association relationship are combined to obtain the training cross-combination features.
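The residual-fitting idea can be illustrated with a deliberately simplified sketch. It assumes squared-error loss, a single numeric feature with at least two distinct values, depth-1 trees (stumps), and a fixed learning rate; all function names and parameters here are hypothetical and are not taken from the application.

```python
def fit_stump(xs, residuals):
    """Find the threshold split on one feature that minimises squared error."""
    best = None
    for t in sorted(set(xs))[:-1]:  # every distinct value except the largest
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return t, lm, rm  # (split value, left leaf output, right leaf output)

def predict_stump(stump, x):
    t, lm, rm = stump
    return lm if x <= t else rm

def fit_boosted(xs, ys, n_trees=3, lr=0.5):
    """Each new stump fits the residual left by the sum of the earlier stumps."""
    trees, preds = [], [0.0] * len(xs)
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        trees.append(stump)
        preds = [p + lr * predict_stump(stump, x) for p, x in zip(preds, xs)]
    return trees, preds

trees, preds = fit_boosted([1, 2, 3, 4], [0, 0, 1, 1])
```

Because every stump is trained on the residual of the running sum, the trees cannot be reordered or trained independently, which is the ordering constraint described above.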
In one embodiment of the present application, as shown in Fig. 2, the specific process of mining the association relationships among the training original features according to the N trees in the GBDT model, and combining the training original features according to those relationships to generate the training cross-combination features, may include the following steps:
S131: pass the sample data corresponding to the training original features through the N trees of the GBDT model in turn, until every sample has been assigned to a leaf node of each tree.
S132: for each tree in the GBDT model, combine the training original features corresponding to the path traversed from the root node of that tree to the leaf node, to generate the training cross-combination features.
Specifically, as shown in Fig. 3, every node other than a leaf node of each tree in the GBDT model corresponds to a split feature and a split value. If a sample's value for the split feature is greater than the node's split value, the sample is assigned to the right child of that node; otherwise it is assigned to the left child. Lower-level nodes are handled in the same way, until the sample falls into some leaf node. For example, taking the tree shown in Fig. 3, suppose a sample is assigned to leaf node No. 2; the combined feature formed is then [G<=g_v && I>i_v]=1. This is equivalent to cutting feature G into two segments at the value g_v and cutting feature I into two segments at the value i_v, then multiplying the segments of G and I to obtain 4 features, of which the feature formed by the first segment of G and the second segment of I has value 1 and the remaining features have value 0. It can be understood that a tree of depth 3, as shown in Fig. 3, yields the cross product of two features; similarly, a tree of depth 4 yields the cross product of three features.
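The traversal rule can be sketched as follows. The tree layout (root G, left child I, right child H), the split values, and the leaf numbering are assumptions chosen to mirror the description of Fig. 3; the nested-dict representation and the name `trace_path` are hypothetical.

```python
# Hypothetical tree: internal nodes are dicts, leaves are integer leaf ids.
TREE = {
    "feature": "G", "value": 5.0,
    "left": {"feature": "I", "value": 2.0, "left": 1, "right": 2},
    "right": {"feature": "H", "value": 7.0, "left": 3, "right": 4},
}

def trace_path(tree, sample):
    """Return (leaf_id, combined condition) for one sample's root-to-leaf path."""
    conditions = []
    node = tree
    while isinstance(node, dict):
        f, v = node["feature"], node["value"]
        if sample[f] > v:            # greater than the split value: right child
            conditions.append(f"{f}>{v}")
            node = node["right"]
        else:                        # otherwise: left child
            conditions.append(f"{f}<={v}")
            node = node["left"]
    return node, " && ".join(conditions)

leaf, combo = trace_path(TREE, {"G": 3.0, "I": 4.0, "H": 0.0})
```

With these assumed values the sample lands in leaf 2 and the combined condition has the same shape as [G<=g_v && I>i_v] in the example above.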
It can further be understood that each node of a tree corresponds to a split feature and a split value, and the split value divides that feature into two parts. For example, assume that samples whose value for the feature is less than or equal to the split value are assigned to the left of the node and samples whose value is greater than the split value are assigned to the right. The node can then be translated into a pair of 0/1 binary features: when G<=G_split (G_split being the split value of node G), F_G_L=1 and F_G_R=0; when G>G_split, F_G_L=0 and F_G_R=1. Splitting the lower-level nodes likewise forms the feature pairs F_I_L and F_I_R, and F_H_L and F_H_R. If G<=G_split, the sample falls into the left child node I, and multiplying the feature of the upper-level node by the features of the lower-level node gives two new features, F_G_L*F_I_L and F_G_L*F_I_R. If the sample finally falls into F_I_R, then F_G_L*F_I_R=1 and all remaining combined features are 0. In this way, the features whose product is 1 are combined, which yields combined features with an association relationship. It can be understood that the above implementation is only an example, a representation provided because it is suitable for running on a computer, and should not be construed as limiting the application.
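The binary-feature representation can be written out directly. This sketch covers only the left branch under node G; the helper names and the numeric split values are assumptions, while the feature names F_G_L, F_I_L, and F_I_R follow the text.

```python
def node_indicators(value, split):
    """Return the (left, right) pair of 0/1 features for one node."""
    left = 1 if value <= split else 0
    return left, 1 - left

def cross_features(sample, g_split, i_split):
    """Multiply the upper-level node's left indicator by the lower-level pair."""
    f_g_l, _f_g_r = node_indicators(sample["G"], g_split)
    f_i_l, f_i_r = node_indicators(sample["I"], i_split)
    return {
        "F_G_L*F_I_L": f_g_l * f_i_l,
        "F_G_L*F_I_R": f_g_l * f_i_r,
    }

crossed = cross_features({"G": 3.0, "I": 4.0}, g_split=5.0, i_split=2.0)
```

Exactly one product equals 1 for a sample that goes left at G, and both are 0 for a sample that goes right at G.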
Thus, the GBDT tree model is used to mine the associations among features and build the training cross-combination features, which improves the performance of the feature-combination stage and the efficiency of feature combination.
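Steps S131 and S132 taken over all N trees can be sketched as a one-hot encoding of the leaf each sample reaches in each tree, since a leaf uniquely identifies its root-to-leaf path. This encoding is an assumption about how the combined features might be laid out as a vector; the trees, feature names, and 0-based leaf ids below are hypothetical.

```python
def leaf_of(tree, sample):
    """Walk one tree; values greater than the split go right, others go left."""
    node = tree
    while isinstance(node, dict):
        node = node["right"] if sample[node["feature"]] > node["value"] else node["left"]
    return node

def encode(trees, leaves_per_tree, sample):
    """Concatenate one one-hot block per tree, marking the leaf that was reached."""
    vec = []
    for tree, n_leaves in zip(trees, leaves_per_tree):
        one_hot = [0] * n_leaves
        one_hot[leaf_of(tree, sample)] = 1
        vec.extend(one_hot)
    return vec

# Two hypothetical trees with 3 and 2 leaves (leaf ids are 0-based per tree).
TREES = [
    {"feature": "amount", "value": 100.0, "left": 0,
     "right": {"feature": "clicks", "value": 10, "left": 1, "right": 2}},
    {"feature": "clicks", "value": 5, "left": 0, "right": 1},
]
vector = encode(TREES, [3, 2], {"amount": 150.0, "clicks": 8})
```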
S140: train a logistic regression (Logistic Regression) model according to the training cross-combination features to build the credit evaluation model.
Specifically, after the training cross-combination features are generated, they may be used to train a linear Logistic Regression (LR) model to obtain the credit evaluation model.
To improve the performance of the credit evaluation model and the accuracy of the evaluation results obtained with it, further, in one embodiment of the present application, the Logistic Regression model may be trained according to both the training original features and the training cross-combination features to build the credit evaluation model. Specifically, after the training cross-combination features are obtained, they may be fed into the LR model together with the training original features for training, finally yielding the credit evaluation model, which is an interpretable linear model. It can be understood that, while preserving interpretability, this model performs better than the GBDT and LR models.
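Training the linear model on original features concatenated with cross-combination features can be sketched with plain batch gradient descent on log loss. The toy rows, the learning rate, and the epoch count are assumptions for illustration only; a production system would normally use an established LR implementation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_lr(rows, labels, epochs=500, lr=0.5):
    """Batch gradient descent on log loss; returns (weights, bias)."""
    w, b = [0.0] * len(rows[0]), 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            gw = [g + err * xi for g, xi in zip(gw, x)]
            gb += err
        w = [wi - lr * g / len(rows) for wi, g in zip(w, gw)]
        b -= lr * gb / len(rows)
    return w, b

# Each row: [one scaled original feature] + [two one-hot cross-combination features].
rows = [[0.1, 1, 0], [0.2, 1, 0], [0.8, 0, 1], [0.9, 0, 1]]
labels = [0, 0, 1, 1]  # 1 = overdue within the preset period (toy data)
weights, bias = train_lr(rows, labels)
```

Each learned weight attaches to one named feature or one root-to-leaf path, which is what keeps the resulting model readable as a linear scorecard.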
With the training method for a credit evaluation model of the embodiments of the present application, a nonlinear GBDT model is trained on the training original features to build the corresponding training cross-combination features, and a linear LR model is trained on the training cross-combination features to build the credit evaluation model, so that the credit evaluation model has both the high performance of a nonlinear model and the interpretability of a linear model.
Corresponding to the training methods for a credit evaluation model provided by the above embodiments, an embodiment of the present application also provides a training device for a credit evaluation model. Because the training device provided by this embodiment corresponds to the training methods provided by the above embodiments, the foregoing implementations of the training method also apply to the training device of this embodiment and are not described in detail here. Fig. 4 is a structural block diagram of a training device for a credit evaluation model according to one embodiment of the present application. As shown in Fig. 4, the training device for the credit evaluation model may include: an acquisition module 110, an extraction module 120, a generation module 130, and a training module 140.
The acquisition module 110 may be used to obtain training original behavior data of training users in a business system.
The extraction module 120 may be used to extract training original features from the training original behavior data.
The generation module 130 may be used to perform feature combination on the training original features according to an iterative decision tree (GBDT) model to generate corresponding training cross-combination features.
Specifically, in one embodiment of the present application, as shown in Fig. 5 and building on Fig. 4, the generation module 130 may include a training unit 131 and a generation unit 132.
The training unit 131 may be used to train the iterative decision tree (GBDT) model on the training original features to build a GBDT model with N trees, where N is a positive integer.
The generation unit 132 may be used to mine the association relationships among the training original features according to the N trees in the GBDT model, and to combine the training original features according to those association relationships to generate the training cross-combination features.
Specifically, in embodiments of the present application, the generation unit 132 may pass the sample data corresponding to the training original features through the N trees of the GBDT model in turn, until every sample has been assigned to a leaf node of each tree, and, for each tree in the GBDT model, combine the training original features corresponding to the path traversed from the root node of that tree to the leaf node, to generate the training cross-combination features.
The training module 140 may be used to train a logistic regression (Logistic Regression) model according to the training cross-combination features to build the credit evaluation model.
To improve the performance of the credit evaluation model and the accuracy of the evaluation results obtained with it, further, in one embodiment of the present application, the training module 140 may also be used to train the Logistic Regression model according to both the training original features and the training cross-combination features to build the credit evaluation model.
With the training device for a credit evaluation model of the embodiments of the present application, a nonlinear GBDT model is trained on the training original features to build the corresponding training cross-combination features, and a linear LR model is trained on the training cross-combination features to build the credit evaluation model, so that the credit evaluation model has both the high performance of a nonlinear model and the interpretability of a linear model.
The present application also proposes a credit evaluation method, which may use the credit evaluation model described in any of the above embodiments to perform credit evaluation on a user.
Fig. 6 is a flow chart of a credit evaluation method according to one embodiment of the present application. As shown in Fig. 6, the credit evaluation method may include:
S610: obtain original behavior data of a target user in a business system.
Specifically, user information of the target user may first be obtained, and the original behavior data of the target user in the business system may then be obtained according to that user information. It should be noted that, in embodiments of the present application, the business system may be a system with a credit product, a system with a purchase function, or a system capable of reflecting user credit.
In embodiments of the present application, the original behavior data may include, but is not limited to, website or web page click behavior (such as click time, count, and frequency), transaction data and behavior of the target user (such as information on the product paid for, the payment, and the payment method), and fund incremental data and behavior (such as the direction of fund flows and the amount of fund flows).
In addition, the user information may include, but is not limited to, a user name or user ID, an account name, and the like. For example, when the business system identifies the target user uniquely by user name, the user name of the target user may be obtained, and all original behavior data of the target user in the business system may be obtained according to the user name; when the business system identifies the target user uniquely by account name, the account name of the target user may be obtained, and the original behavior data of the target user in the current business system may be obtained according to the account name.
S620: extract original features from the original behavior data.
Specifically, data that exhibits characteristic behavior may be extracted from the original behavior data and used as the original features of that data. For example, the original features may include purchase amount, web page/website click count, web page/website click frequency, fund flow, and the like.
S630: perform feature combination on the original features according to a GBDT model to generate corresponding cross-combination features.
Specifically, in embodiments of the present application, a GBDT model may first be trained on the original features to build a GBDT model with N trees, where N is a positive integer. The association relationships among the original features may then be mined according to the N trees in the GBDT model, and the original features may be combined according to those association relationships to generate the cross-combination features.
It can be understood that GBDT is an iterative decision tree algorithm composed of multiple decision trees, in which the conclusions of all trees are summed to give the final result. For example, each tree fits the residual of the preceding K trees, which can be understood as each tree depending on the result of the tree before it; a definite order among the trees must therefore be maintained. In this way, the multiple decision trees in the GBDT model make decision splits on the original features, thereby finding the association relationships among the original features, and the features that have an association relationship are combined to obtain the cross-combination features.
In one embodiment of the present application, the specific process of generating the cross-combination features may include: passing the sample data corresponding to the original features through the N trees of the GBDT model in turn, until every sample has been assigned to a leaf node of each tree, and, for each tree in the GBDT model, combining the original features corresponding to the path traversed from the root node of that tree to the leaf node, to generate the cross-combination features. It can be understood that the way the cross-combination features are generated in the credit evaluation method of the embodiments of the present application follows the same principle as the way the training cross-combination features are generated in the above training method for the credit evaluation model; reference may be made to the description of that generation process above, which is not repeated here.
S640: perform prediction on the cross-combination features according to the credit evaluation model to obtain credit information of the target user.
Specifically, after the cross-combination features are generated, the LR model may be trained with the cross-combination features to obtain the credit evaluation model.
To improve the accuracy of the evaluation result, further, in one embodiment of the present application, prediction may be performed on the original features and the cross-combination features according to the credit evaluation model to obtain the credit information of the target user. Specifically, after the cross-combination features are obtained, they may be fed into the credit evaluation model together with the original features for credit prediction, giving the final prediction result, namely the credit information of the target user.
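Scoring a target user with an already trained linear model amounts to a weighted sum followed by a sigmoid. The weights, bias, and feature layout below are hypothetical placeholders, not values from the application.

```python
import math

def predict_overdue_probability(original, cross, weights, bias):
    """Score one user from original features plus cross-combination features."""
    x = list(original) + list(cross)
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical trained parameters: one original feature, two cross features.
WEIGHTS = [1.2, -0.8, 0.9]
BIAS = -0.3

low = predict_overdue_probability([0.1], [1, 0], WEIGHTS, BIAS)
high = predict_overdue_probability([0.9], [0, 1], WEIGHTS, BIAS)
```

Under these assumed parameters the first user scores below 0.5 and the second above it, and each term `w * xi` can be read off as that feature's contribution to the score.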
It should be noted that the credit evaluation method of the embodiments of the present application is implemented using the C language and the MPI parallel computing framework, MPI being a cross-language communication protocol, so that better performance can be achieved.
With the credit evaluation method of the embodiments of the present application, when credit evaluation prediction is performed for a target user, the original behavior data of the target user in the business system may first be obtained and the original features extracted from it; the original features are then trained through the GBDT tree model to obtain the corresponding cross-combination features; finally, the cross-combination features are fed into the credit evaluation model for prediction, giving the user information of the target user. That is, credit evaluation of the target user is performed with a credit evaluation model that has both high interpretability and high performance, so that the evaluation result is more reliable and more accurate.
Corresponding to the credit evaluation methods provided by the above embodiments, an embodiment of the present application also provides a credit evaluation device. Because the credit evaluation device provided by this embodiment corresponds to the credit evaluation methods provided by the above embodiments, the foregoing implementations of the credit evaluation method also apply to the credit evaluation device of this embodiment and are not described in detail here. Fig. 7 is a structural block diagram of a credit evaluation device according to one embodiment of the present application. It should be noted that the credit evaluation device of the embodiments of the present application performs credit evaluation on a target user by using the credit evaluation model described in any of the above embodiments.
As shown in Fig. 7, the credit evaluation device may include: an acquisition module 210, an extraction module 220, a generation module 230, and a prediction module 240.
The acquisition module 210 may be used to obtain original behavior data of a target user in a business system.
The extraction module 220 may be used to extract original features from the original behavior data.
The generation module 230 may be used to perform feature combination on the original features according to a GBDT model to generate corresponding cross-combination features.
Specifically, in one embodiment of the present application, as shown in Fig. 8 and building on Fig. 7, the generation module 230 may include a training unit 231 and a generation unit 232.
The training unit 231 may be used to train a GBDT model on the original features to build a GBDT model with N trees, where N is a positive integer.
The generation unit 232 may be used to mine the association relationships among the original features according to the N trees in the GBDT model, and to combine the original features according to those association relationships to generate the cross-combination features.
Specifically, in embodiments of the present application, the generation unit 232 may pass the sample data corresponding to the original features through the N trees of the GBDT model in turn, until every sample has been assigned to a leaf node of each tree, and, for each tree in the GBDT model, combine the original features corresponding to the path traversed from the root node of that tree to the leaf node, to generate the cross-combination features.
The prediction module 240 may be used to perform prediction on the cross-combination features according to the credit evaluation model to obtain credit information of the target user.
To improve the accuracy of the evaluation result, further, in one embodiment of the present application, the prediction module 240 may also be used to perform prediction on the original features and the cross-combination features according to the credit evaluation model to obtain the credit information of the target user.
With the credit evaluation device of the embodiments of the present application, when credit evaluation prediction is performed for a target user, the acquisition module may obtain the original behavior data of the target user in the business system, the extraction module extracts the original features from the original behavior data, the generation module trains the original features through the GBDT tree model to obtain the corresponding cross-combination features, and the prediction module feeds the cross-combination features into the credit evaluation model for prediction, giving the user information of the target user. That is, credit evaluation of the target user is performed with a credit evaluation model that has both high interpretability and high performance, so that the evaluation result is more reliable and more accurate.
In the description of the present application, "multiple" means at least two, for example two or three, unless otherwise specifically defined.
In the description of this specification, a description referring to the terms "one embodiment", "some embodiments", "example", "specific example", or "some examples" means that the specific features, structures, materials, or characteristics described in connection with that embodiment or example are included in at least one embodiment or example of the present application. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in a suitable manner in any one or more embodiments or examples. In addition, provided they do not conflict, those skilled in the art may combine the different embodiments or examples described in this specification and the features of those embodiments or examples.
Any process or method described in a flow chart or otherwise described herein may be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing the steps of a specific logical function or process. The scope of the preferred embodiments of the present application includes other implementations in which functions may be performed out of the order shown or discussed, including substantially simultaneously or in the reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the present application belong.
The logic and/or steps represented in a flow chart or otherwise described herein, which may for example be regarded as an ordered list of executable instructions for implementing logical functions, may be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, device, or apparatus (such as a computer-based system, a system including a processor, or another system that can fetch instructions from an instruction execution system, device, or apparatus and execute them). For the purposes of this specification, a "computer-readable medium" may be any device that can contain, store, communicate, propagate, or transport a program for use by, or in connection with, an instruction execution system, device, or apparatus. More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), a fiber-optic device, and portable compact disc read-only memory (CDROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program is printed, because the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it in a suitable way if necessary, and then stored in a computer memory.
It should be understood that each part of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods may be implemented with software or firmware that is stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they may be implemented with any one or a combination of the following techniques well known in the art: a discrete logic circuit having logic gates for implementing logic functions on data signals, an application-specific integrated circuit having suitable combinational logic gates, a programmable gate array (PGA), a field-programmable gate array (FPGA), and so on.
Those of ordinary skill in the art will understand that all or some of the steps of the above method embodiments may be carried out by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.
In addition, the functional units in the embodiments of the present application may be integrated in one processing module, or each unit may exist physically on its own, or two or more units may be integrated in one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented in the form of a software functional module and is sold or used as an independent product, it may also be stored in a computer-readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like. Although embodiments of the present application have been shown and described above, it should be understood that these embodiments are exemplary and should not be construed as limiting the application; those of ordinary skill in the art may change, modify, replace, and vary the above embodiments within the scope of the application.