Disclosure of Invention
The main purpose of the present application is to provide a domain decomposition machine-based model building method, apparatus, system and storage medium that improve the interaction among features in a model and the accuracy of the model.
In order to achieve the above object, the present application provides a domain decomposition machine-based model building method, which includes the steps of:
acquiring user information, converting classification features in the user information into corresponding binary features through one-hot encoding, and determining corresponding feature domains based on the binary features;
normalizing the numerical features in the user information, and constructing a corresponding domain decomposition machine model based on the normalized numerical features and each feature domain;
training the domain decomposition machine model, and constructing a corresponding credit score card model based on the trained domain decomposition machine model.
Optionally, the step of converting the classification features in the user information into corresponding binary features through one-hot encoding includes:
converting the classification features in the user information into corresponding binary features through one-hot encoding according to preset classification variables.
Optionally, the step of determining the corresponding feature domains based on the binary features includes:
grouping the binary features according to the data features of the user information, and assigning the binary features converted from the same data feature to the same feature domain, to obtain the corresponding feature domains.
Optionally, the step of constructing a corresponding domain decomposition machine model based on the normalized numerical features and each feature domain includes:
constructing a linear part and a nonlinear part of the model based on the normalized numerical features and each feature domain;
constructing the domain decomposition machine model based on the linear part and the nonlinear part.
Optionally, the step of training the domain decomposition machine model and constructing a corresponding credit score card model based on the trained domain decomposition machine model includes:
determining sample data, and solving the domain decomposition machine model according to the sample data and a preset loss function to obtain a parameter solution corresponding to the domain decomposition machine model;
constructing the linear part and the nonlinear part based on the parameter solution, and constructing a corresponding credit score card model based on the trained linear part and the trained nonlinear part.
Optionally, after the step of training the domain decomposition machine model and constructing the credit score card model based on the trained domain decomposition machine model, the method further includes:
in response to a query request of a user, determining user information to be evaluated according to the query request, and determining a default probability value corresponding to the user information to be evaluated through the domain decomposition machine model;
determining a credit score corresponding to the default probability value based on the credit score card model.
Optionally, the step of determining the credit score corresponding to the default probability value based on the credit score card model includes:
determining a basic score and a doubling probability score, and determining the credit score of the user to be evaluated through the credit score card model based on the basic score, the doubling probability score and the default probability value.
The embodiment of the present application further provides a domain decomposition machine-based model building apparatus, which includes:
a conversion module, configured to acquire user information and convert classification features in the user information into corresponding binary features through one-hot encoding;
a determining module, configured to determine corresponding feature domains based on the binary features;
a processing module, configured to normalize the numerical features in the user information;
a construction module, configured to construct a corresponding domain decomposition machine model based on the normalized numerical features and each feature domain;
the construction module being further configured to train the domain decomposition machine model and construct a corresponding credit score card model based on the trained domain decomposition machine model.
The embodiment of the present application further provides a domain decomposition machine-based model building system, which includes a memory, a processor, and a domain decomposition machine-based model building program stored in the memory and executable on the processor; when executed by the processor, the program implements the steps of the domain decomposition machine-based model building method described above.
In addition, to achieve the above object, the present application also provides a storage medium having a domain decomposition machine-based model building program stored thereon, which when executed by a processor implements the steps of the domain decomposition machine-based model building method as described above.
The embodiments of the application provide a domain decomposition machine-based model building method, apparatus, system and storage medium. User information is acquired, classification features in the user information are converted into corresponding binary features through one-hot encoding, and corresponding feature domains are determined based on the binary features; the numerical features in the user information are normalized, and a corresponding domain decomposition machine model is constructed based on the normalized numerical features and each feature domain; the domain decomposition machine model is trained, and a corresponding credit score card model is constructed based on the trained domain decomposition machine model. Because the credit score card model is built with a domain decomposition machine, the crossing of data features is taken into account and the model is built on the interactions among all nested variables, which improves the interaction among the data features in the credit score card model and thereby its accuracy.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
As shown in fig. 1, fig. 1 is a schematic system structure diagram of a hardware operating environment according to an embodiment of the present application. The system is a model building system based on a domain decomposition machine, and may include: a processor 1001 (such as a CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 is used to enable connection and communication between these components. The user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard), and may optionally also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory (e.g., a magnetic disk memory). The memory 1005 may alternatively be a storage device separate from the processor 1001.
Those skilled in the art will appreciate that the system architecture shown in FIG. 1 is not intended to be limiting of the system, and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components.
As shown in fig. 1, the memory 1005, as a storage medium (it should be noted that the storage medium in this application is a computer-readable storage medium), may include an operating system, a network communication module, a user interface module, and a domain decomposition machine-based model building program.
In the system shown in fig. 1, the network interface 1004 is mainly used for connecting to a backend server and performing data communication with the backend server; the user interface 1003 is mainly used for connecting to a client and performing data communication with the client; and the processor 1001 may be configured to call the domain decomposition machine-based model building program stored in the memory 1005 and perform the following operations:
acquiring user information, converting classification features in the user information into corresponding binary features through one-hot encoding, and determining corresponding feature domains based on the binary features;
normalizing the numerical features in the user information, and constructing a corresponding domain decomposition machine model based on the normalized numerical features and each feature domain;
training the domain decomposition machine model, and constructing a corresponding credit score card model based on the trained domain decomposition machine model.
Further, the processor 1001 may call the domain decomposition machine-based model building program stored in the memory 1005, and also perform the following operation:
converting the classification features in the user information into corresponding binary features through one-hot encoding according to preset classification variables.
Further, the processor 1001 may call the domain decomposition machine-based model building program stored in the memory 1005, and also perform the following operation:
grouping the binary features according to the data features of the user information, and assigning the binary features converted from the same data feature to the same feature domain, to obtain the corresponding feature domains.
Further, the processor 1001 may call the domain decomposition machine-based model building program stored in the memory 1005, and also perform the following operations:
constructing a linear part and a nonlinear part of the model based on the normalized numerical features and each feature domain;
constructing the domain decomposition machine model based on the linear part and the nonlinear part.
Further, the processor 1001 may call the domain decomposition machine-based model building program stored in the memory 1005, and also perform the following operations:
determining sample data, and solving the domain decomposition machine model according to the sample data and a preset loss function to obtain a parameter solution corresponding to the domain decomposition machine model;
constructing the linear part and the nonlinear part based on the parameter solution, and constructing a corresponding credit score card model based on the trained linear part and the trained nonlinear part.
Further, the processor 1001 may call the domain decomposition machine-based model building program stored in the memory 1005, and also perform the following operations:
in response to a query request of a user, determining user information to be evaluated according to the query request, and determining a default probability value corresponding to the user information to be evaluated through the domain decomposition machine model;
determining a credit score corresponding to the default probability value based on the credit score card model.
Further, the processor 1001 may call the domain decomposition machine-based model building program stored in the memory 1005, and also perform the following operation:
determining a basic score and a doubling probability score, and determining the credit score of the user to be evaluated through the credit score card model based on the basic score, the doubling probability score and the default probability value.
In this embodiment, user information is acquired, classification features in the user information are converted into corresponding binary features through one-hot encoding, and corresponding feature domains are determined based on the binary features; the numerical features in the user information are normalized, and a corresponding domain decomposition machine model is constructed based on the normalized numerical features and each feature domain; the domain decomposition machine model is trained, and a corresponding credit score card model is constructed based on the trained domain decomposition machine model. Because the credit score card model is built with a domain decomposition machine, the crossing of data features is taken into account and the model is built on the interactions among all nested variables, which improves the interaction among the data features in the credit score card model and thereby its accuracy.
The application provides a model construction method based on a domain decomposition machine, and referring to fig. 2, fig. 2 is a schematic flow diagram of a first embodiment of the model construction method based on the domain decomposition machine.
While a logical order is shown in the flow chart, in some cases, the steps shown or described may be performed in a different order than that shown or described herein.
The embodiment of the application takes a model building system based on a domain decomposition machine as an execution subject for illustration, and the model building method based on the domain decomposition machine comprises the following steps:
Step S10, acquiring user information, converting classification features in the user information into corresponding binary features through one-hot encoding, and determining corresponding feature domains based on the binary features.
It should be noted that, in this embodiment, the domain decomposition machine-based model building system is referred to as the model building system. The model building system obtains user information from a database, where the user information includes the available user-related information. The model building system then encodes each piece of user information according to the preset classification variables of the preset encoding and the classification features in the user information, to obtain the one-hot code corresponding to each piece of user information. The one-hot code is represented by a binary vector and can also be understood as a binary feature. The preset classification variables are set by service personnel and are not limited in this embodiment. The preset encoding in this embodiment is One-Hot encoding, i.e. one-bit-effective encoding, which uses an N-bit status register to encode N states: each state has its own independent register bit, and only one bit is valid at any time. One-Hot encoding represents categorical variables as binary vectors: the categorical values are first mapped to integer values, and each integer value is then represented as a binary vector that is all zeros except at the index of the integer, which is set to 1. The user information in this embodiment is "default or not; gender; age; city", and the user information of four users is shown in Table 1. With the preset classification variables "default or not; gender: male, female; age: [20,30], [31,40]; city: first-tier, second-tier, third-tier", the four pieces of user information are converted into the corresponding binary features shown in Table 2.
| Default | Gender | Age | City |
| --- | --- | --- | --- |
| 1 | Male | 20 | Second-tier |
| 1 | Female | 26 | Third-tier |
| 0 | Male | 31 | First-tier |
| 0 | Female | 35 | First-tier |
TABLE 1
TABLE 2
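As a minimal illustration of the conversion described above, the following sketch one-hot encodes the four users of Table 1. The dictionary keys, the category ordering and the helper names (`age_bucket`, `one_hot`, `encode_user`) are assumptions made for this example and are not taken from the application; the default flag is already 0/1 and is left as is.

```python
# Hypothetical rendering of the four users in Table 1.
USERS = [
    {"default": 1, "gender": "male", "age": 20, "city": "second-tier"},
    {"default": 1, "gender": "female", "age": 26, "city": "third-tier"},
    {"default": 0, "gender": "male", "age": 31, "city": "first-tier"},
    {"default": 0, "gender": "female", "age": 35, "city": "first-tier"},
]

# Preset classification variables: each data feature with its ordered categories.
CATEGORIES = {
    "gender": ["male", "female"],
    "age": ["[20,30]", "[31,40]"],
    "city": ["first-tier", "second-tier", "third-tier"],
}


def age_bucket(age):
    """Map a raw age to one of the preset age intervals."""
    if 20 <= age <= 30:
        return "[20,30]"
    if 31 <= age <= 40:
        return "[31,40]"
    raise ValueError(f"age {age} is outside the preset intervals")


def one_hot(value, categories):
    """Return a binary vector with a single 1 at the index of `value`."""
    index = categories.index(value)  # categorical value -> integer index
    return [1 if i == index else 0 for i in range(len(categories))]


def encode_user(user):
    """Concatenate the one-hot vectors for gender, age interval and city."""
    values = {
        "gender": user["gender"],
        "age": age_bucket(user["age"]),
        "city": user["city"],
    }
    vector = []
    for name, categories in CATEGORIES.items():
        vector.extend(one_hot(values[name], categories))
    return vector


ENCODED = [encode_user(user) for user in USERS]
```

Each encoded vector has seven bits in the order gender (2), age interval (2), city (3), with exactly one bit set per data feature.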
After the model construction system obtains the binary features corresponding to each piece of user information, it groups the binary features according to the data features in each piece of user information and assigns the binary features converted from the same data feature to the same feature domain, thereby obtaining the feature domain corresponding to each binary feature, as shown in Table 3.
TABLE 3
As shown in Table 3, according to the classification features in each piece of user information, the model construction system assigns the "default information" data feature to one feature domain (Field), the "gender information" and "age information" data features to one feature domain (User_Field), and the "city information" data feature to one feature domain (City_Field).
Further, step S10 includes:
Step S101, converting the classification features in the user information into corresponding binary features through one-hot encoding according to preset classification variables;
Step S102, grouping the binary features according to the data features of the user information, and assigning the binary features converted from the same data feature to the same feature domain, to obtain the corresponding feature domains.
Specifically, the model construction system encodes each piece of user information according to the preset classification variables of the one-hot encoding and the classification features in the user information, and converts the classification features in each piece of user information into corresponding binary features. The model construction system then groups the binary features according to the data features of the user information and assigns the binary features converted from the same data feature to the same feature domain, to obtain the corresponding feature domains.
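The grouping in step S102 can be sketched as a lookup from each binary feature to the feature domain of the data feature it came from. The feature names and the helper `build_field_index` below are hypothetical; the domain assignment (gender and age in User_Field, city in City_Field) follows the description of Table 3.

```python
from collections import defaultdict

# Binary features in encoding order, each tagged with its source data feature.
BINARY_FEATURES = [
    ("gender=male", "gender"),
    ("gender=female", "gender"),
    ("age=[20,30]", "age"),
    ("age=[31,40]", "age"),
    ("city=first-tier", "city"),
    ("city=second-tier", "city"),
    ("city=third-tier", "city"),
]

# Assumed mapping from data feature to feature domain.
FIELD_OF_DATA_FEATURE = {
    "gender": "User_Field",
    "age": "User_Field",
    "city": "City_Field",
}


def build_field_index(binary_features, field_of_data_feature):
    """Return (domain id of every binary feature, domain name -> member indices)."""
    field_ids = {}
    members = defaultdict(list)
    feature_fields = []
    for index, (_, data_feature) in enumerate(binary_features):
        field_name = field_of_data_feature[data_feature]
        # Assign the next free integer id the first time a domain is seen.
        field_id = field_ids.setdefault(field_name, len(field_ids))
        feature_fields.append(field_id)
        members[field_name].append(index)
    return feature_fields, dict(members)


FEATURE_FIELDS, FIELD_MEMBERS = build_field_index(
    BINARY_FEATURES, FIELD_OF_DATA_FEATURE
)
```

The list `FEATURE_FIELDS` plays the role of the domain index of each feature when the cross terms of the model are evaluated.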
Step S20, normalizing the numerical features in the user information, and constructing a corresponding domain decomposition machine model based on the normalized numerical features and each feature domain.
After obtaining the feature domains of each piece of user information, the model construction system normalizes the numerical features in the user information, constructs the corresponding linear part and nonlinear part from the normalized numerical features and each feature domain, and combines the linear part and the nonlinear part into the corresponding FFM (Field-aware Factorization Machines) model.
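The application does not state which normalization is used. As one possible reading, the sketch below applies min-max scaling to a numerical column; the function name `min_max_normalize` and the choice of min-max scaling are assumptions, and the ages from Table 1 serve only as sample values.

```python
def min_max_normalize(values):
    """Scale values linearly into [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


AGES = [20, 26, 31, 35]
NORMALIZED_AGES = min_max_normalize(AGES)
```

Other schemes, such as standardization to zero mean and unit variance, would fit the description equally well.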
It should be noted that the polynomial model is the most intuitive model containing feature combinations. In a polynomial model, the combination of features $x_i$ and $x_j$ is represented by $x_i x_j$; that is, the combined feature $x_i x_j$ is meaningful only when $x_i$ and $x_j$ are both non-zero. This embodiment discusses only the second-order polynomial domain decomposition machine model $y(x)$, as follows:

$$y(x) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle \mathbf{v}_{i,f_j}, \mathbf{v}_{j,f_i} \rangle \, x_i x_j$$

where $n$ represents the number of data features in the user information, $x_i$ is the feature value of the $i$-th data feature, $x_j$ is the feature value of the $j$-th data feature, and $f_j$ indicates the feature domain to which the $j$-th feature belongs. $w_0$, $w_i$ and $\mathbf{v}_{i,f_j}$ (the latent vector of the $i$-th feature with respect to feature domain $f_j$) are the model parameters.
$w_0 + \sum_{i=1}^{n} w_i x_i$ is the linear part of the domain decomposition machine model, and
$\sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle \mathbf{v}_{i,f_j}, \mathbf{v}_{j,f_i} \rangle \, x_i x_j$ is the nonlinear part of the domain decomposition machine model, i.e. the cross-feature part.
Further, for higher-order feature combinations, FM (Factorization Machines) and DNN (Deep Neural Networks) may be combined, i.e. FFM may be replaced with DeepFM.
Further, in step S20, constructing a corresponding domain decomposition machine model based on the normalized numerical features and each feature domain includes:
Step S201, constructing a linear part and a nonlinear part of the model based on the normalized numerical features and each feature domain;
Step S202, constructing the domain decomposition machine model based on the linear part and the nonlinear part.
Specifically, the model construction system normalizes the numerical features in the user information, constructs the corresponding linear part and nonlinear part from the normalized numerical features and each feature domain, and combines the linear part and the nonlinear part into the corresponding domain decomposition machine model.
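The combination of a linear part and a cross-feature part can be sketched in plain Python as follows. This is an illustrative evaluation of a second-order field-aware model, not the application's own implementation: the parameter layout `v[i][f]` (latent vector of feature `i` toward domain `f`), the toy parameter values, and the use of the logistic function to turn the raw output into a probability are all assumptions.

```python
import math


def ffm_score(x, fields, w0, w, v):
    """Evaluate w0 + sum_i w_i x_i + sum_{i<j} <v[i][f_j], v[j][f_i]> x_i x_j.

    x      : feature values (one-hot bits and normalized numerical features)
    fields : fields[j] is the feature domain id of feature j
    w0, w  : bias and linear weights (linear part)
    v      : v[i][f] is the latent vector of feature i toward domain f
    """
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))
    cross = 0.0
    n = len(x)
    for i in range(n):
        if x[i] == 0:
            continue  # a zero feature contributes no cross term
        for j in range(i + 1, n):
            if x[j] == 0:
                continue
            dot = sum(a * b for a, b in zip(v[i][fields[j]], v[j][fields[i]]))
            cross += dot * x[i] * x[j]
    return linear + cross


def default_probability(x, fields, w0, w, v):
    """Map the raw output to (0, 1) with the logistic function (an assumption)."""
    return 1.0 / (1.0 + math.exp(-ffm_score(x, fields, w0, w, v)))


# Toy example: three features, two feature domains, latent dimension 2.
X = [1, 0, 1]
FIELDS = [0, 0, 1]
W0 = 0.1
W = [0.2, 0.3, -0.4]
V = [
    [[1.0, 0.0], [0.5, 0.5]],
    [[0.0, 0.0], [0.0, 0.0]],
    [[2.0, 1.0], [0.0, 0.0]],
]
SCORE = ffm_score(X, FIELDS, W0, W, V)
```

In the toy example only features 0 and 2 are active, so the cross part reduces to the single inner product of `V[0][1]` and `V[2][0]`.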
Step S30, training the domain decomposition machine model, and constructing a corresponding credit score card model based on the trained domain decomposition machine model.
After the model construction system constructs the domain decomposition machine model, it determines sample data of users, trains the domain decomposition machine model through a preset algorithm in combination with the number of samples in the sample data and a preset loss function to obtain a parameter solution corresponding to the domain decomposition machine model, and then constructs a corresponding credit score card model from the trained domain decomposition machine model. The preset loss function in this embodiment is a logistic regression loss function with a regularization term. The preset algorithm includes, but is not limited to, stochastic gradient descent, batch gradient descent and mini-batch gradient descent; by default, this embodiment uses stochastic gradient descent. The logistic regression loss function with a regularization term is:
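$$\min_{w} \; \frac{\lambda}{2} \lVert w \rVert^{2} + \sum_{l=1}^{L} \log\left(1 + \exp\left(-t^{(l)} \, y(\mathbf{x}^{(l)})\right)\right)$$

where $L$ is the number of samples in the sample data, $\mathbf{x}^{(l)}$ and $t^{(l)} \in \{-1, +1\}$ are the feature vector and label of the $l$-th sample, $\lambda$ is the regularization coefficient, $w$ denotes all model parameters, and the other variables have the same meaning as in the domain decomposition machine model.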
Further, step S30 includes:
Step S301, determining sample data, and solving the domain decomposition machine model according to the sample data and a preset loss function to obtain a parameter solution corresponding to the domain decomposition machine model;
Step S302, constructing the linear part and the nonlinear part based on the parameter solution, and constructing a corresponding credit score card model based on the trained linear part and the trained nonlinear part.
Specifically, the model construction system determines sample data of users and trains the domain decomposition machine model by stochastic gradient descent, in combination with the number of samples in the sample data and the logistic regression loss function with a regularization term, to obtain a parameter solution corresponding to the domain decomposition machine model. The model construction system then constructs the linear part and the nonlinear part of the domain decomposition machine model from the parameter solution, and constructs the corresponding credit score card model from the trained linear part and the trained nonlinear part.
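A training loop of this kind can be sketched with plain stochastic gradient descent on an L2-regularized logistic loss. Everything below is an assumption made for illustration: the label encoding (+1 for default, -1 otherwise), the hyperparameters, the random initialization, and the toy samples, which reuse a hypothetical seven-bit encoding of the four users in Table 1.

```python
import math
import random


def ffm_score(x, fields, w0, w, v):
    """w0 + sum_i w_i x_i + sum_{i<j} <v[i][f_j], v[j][f_i]> x_i x_j."""
    score = w0 + sum(wi * xi for wi, xi in zip(w, x))
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            if x[i] and x[j]:
                dot = sum(a * b for a, b in zip(v[i][fields[j]], v[j][fields[i]]))
                score += dot * x[i] * x[j]
    return score


def log_loss(samples, fields, w0, w, v):
    """Mean logistic loss for labels in {-1, +1} (regularizer left out)."""
    total = 0.0
    for x, y in samples:
        total += math.log1p(math.exp(-y * ffm_score(x, fields, w0, w, v)))
    return total / len(samples)


def train_sgd(samples, fields, n_fields, k=2, lr=0.1, lam=1e-4, epochs=200, seed=0):
    """Fit bias, linear weights and latent vectors one sample at a time."""
    rng = random.Random(seed)
    n = len(samples[0][0])
    w0, w = 0.0, [0.0] * n
    v = [[[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_fields)]
         for _ in range(n)]
    order = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(order)
        for idx in order:
            x, y = samples[idx]
            # Derivative of log(1 + exp(-y * score)) with respect to score.
            g = -y / (1.0 + math.exp(y * ffm_score(x, fields, w0, w, v)))
            w0 -= lr * g  # bias is not regularized in this sketch
            for i in range(n):
                if x[i]:
                    w[i] -= lr * (g * x[i] + lam * w[i])
            for i in range(n):
                for j in range(i + 1, n):
                    if not (x[i] and x[j]):
                        continue
                    vi, vj = v[i][fields[j]], v[j][fields[i]]
                    scale = g * x[i] * x[j]
                    for d in range(k):
                        gi = scale * vj[d] + lam * vi[d]
                        gj = scale * vi[d] + lam * vj[d]
                        vi[d] -= lr * gi
                        vj[d] -= lr * gj
    return w0, w, v


# Toy samples: bits are gender (2), age interval (2), city (3); label +1 = default.
SAMPLES = [
    ([1, 0, 1, 0, 0, 1, 0], 1),
    ([0, 1, 1, 0, 0, 0, 1], 1),
    ([1, 0, 0, 1, 1, 0, 0], -1),
    ([0, 1, 0, 1, 1, 0, 0], -1),
]
FIELDS = [0, 0, 0, 0, 1, 1, 1]
W0, W, V = train_sgd(SAMPLES, FIELDS, n_fields=2)
```

The returned triple corresponds to the parameter solution: `W0` and `W` define the trained linear part, and `V` defines the trained nonlinear part.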
In this embodiment, user information is acquired, classification features in the user information are converted into corresponding binary features through one-hot encoding, and corresponding feature domains are determined based on the binary features; the numerical features in the user information are normalized, and a corresponding domain decomposition machine model is constructed based on the normalized numerical features and each feature domain; the domain decomposition machine model is trained, and a corresponding credit score card model is constructed based on the trained domain decomposition machine model. Because the credit score card model is built with a domain decomposition machine, the crossing of data features is taken into account and the model is built on the interactions among all nested variables, which improves the interaction among the data features in the credit score card model and thereby its accuracy.
Further, referring to fig. 3, fig. 3 is a schematic flow chart of another embodiment of the domain decomposition machine-based model building method of the present application. After step S30, the method further includes:
Step S40, in response to a query request of a user, determining user information to be evaluated according to the query request, and determining a default probability value corresponding to the user information to be evaluated through the domain decomposition machine model;
Step S50, determining a credit score corresponding to the default probability value based on the credit score card model.
Specifically, when a user needs to query his or her credit score through the model building system, the user enters the corresponding user information on the query input interface of the model building system; the user information is packaged into a query request, and the query request is triggered on the query input interface. After building the credit score card model, the model building system checks whether a query request sent by a user exists. If it detects such a request, the model building system responds to the query request and parses it to obtain the user information to be evaluated (the user information of the user whose credit score is to be queried) carried in the query request. The model building system then determines the basic score and the doubling probability score set in the model building system, inputs the user information to be evaluated into the domain decomposition machine model, and calculates the default probability value corresponding to the user information to be evaluated through the domain decomposition machine model. Finally, the model building system determines the credit score of the user through the credit score card model according to the basic score, the doubling probability score and the default probability value of the user. After the credit score of the user is obtained, the model building system outputs it to the display interface of the model building system.
Further, step S50 includes:
Step S501, determining a basic score and a doubling probability score, and determining the credit score of the user to be evaluated through the credit score card model based on the basic score, the doubling probability score and the default probability value.
Specifically, the model construction system determines the basic score and the doubling probability score set in the model construction system, inputs the user information to be evaluated into the domain decomposition machine model, calculates the default probability value corresponding to the user information to be evaluated from the data features in that information, and finally substitutes the basic score, the doubling probability score and the default probability value into the scoring formula of the credit score card model to determine the credit score of the user. The scoring formula is: credit score = basic score + doubling probability score × ln((1 − p)/p)/ln(2), where p is the default probability value.
It should be noted that the default probability value ranges from 0 to 1, and the number of decimal places is set by service personnel and is not limited in this embodiment. For example, with two decimal places the value may be 0.11, 0.15, 0.35, and so on; with three decimal places it may be 0.125, 0.254, 0.567, and so on.
In this embodiment, the model building system uses a basic score of 300 points and a doubling probability score of 50 points by default. With the default probability value P obtained from the domain decomposition machine model, the scoring formula gives a final credit score of 300 + 50 × ln((1 − P)/P)/ln(2).
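The scoring formula can be checked numerically with a short sketch. The function name `credit_score` and the guard against p = 0 or p = 1, where the logarithm is undefined, are assumptions added for the example; the defaults of 300 and 50 points are the values given in this embodiment.

```python
import math


def credit_score(p, base_score=300.0, pdo=50.0):
    """Return base_score + pdo * ln((1 - p) / p) / ln(2) for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("default probability must lie strictly between 0 and 1")
    return base_score + pdo * math.log((1.0 - p) / p) / math.log(2.0)


SCORE_EVEN_ODDS = credit_score(0.5)  # odds 1:1, logarithm is zero
SCORE_LOW_RISK = credit_score(0.2)   # non-default odds 4:1, two doublings
SCORE_HIGH_RISK = credit_score(0.8)  # non-default odds 1:4, two halvings
```

A default probability of 0.5 yields exactly the basic score, and every doubling of the odds of not defaulting adds one doubling probability score, so p = 0.2 gives about 400 points and p = 0.8 gives about 200 points.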
In the embodiment, the query request of the user is responded, the information of the user to be evaluated is determined according to the query request, the default probability value corresponding to the information of the user to be evaluated is determined through the domain decomposition machine model, and the credit score value corresponding to the default probability value is determined based on the credit score card model. Therefore, in the embodiment, the carried user information to be evaluated is directly determined according to the query request, the default probability value of the user is determined through the domain decomposition machine model, and finally the credit score value corresponding to the user is obtained through the basic score value, the doubling probability score value and the default probability value.
Further, referring to fig. 4, fig. 4 is a flow chart of determining a credit score based on a domain decomposition machine according to the present application. The model construction system outputs the credit score of a client (user) as follows. First, the model construction system obtains user information from a database, normalizes the continuous features and one-hot encodes the categorical features in the user information, and then, based on the FFM (domain decomposition machine) model, constructs the correspondence between features and domains for the processed user information, thereby constructing the credit score card model. Second, after the credit score card model is built, the model building system receives the input data of a user (the user information to be evaluated) and feeds it into the domain decomposition machine model, which calculates the default probability value P of the user; the credit score card model then converts the default probability value P into the corresponding credit score, which is output to the display interface of the model building system.
In addition, the present application further provides a domain decomposition machine-based model building apparatus, referring to fig. 5, fig. 5 is a schematic structural diagram of the domain decomposition machine-based model building apparatus according to the present application, where the domain decomposition machine-based model building apparatus includes:
theconversion module 10 is configured to acquire user information, and convert classification features in the user information into corresponding binary features through one-hot coding;
a determiningmodule 20, configured to determine corresponding feature domains based on the binary features;
theprocessing module 30 is configured to perform normalization processing on the numerical features in the user information;
theconstruction module 40 is used for constructing a corresponding domain decomposition machine model based on the numerical characteristics and the characteristic domains after the normalization processing;
thebuilding module 40 is further configured to train the domain decomposition machine model, and build a corresponding credit rating card model based on the trained domain decomposition machine model.
Further, the conversion module 10 is further configured to convert the classification features in the user information into corresponding binary features through one-hot coding according to preset classification variables.
Further, the determining module 20 includes:
a classification unit, configured to classify the binary features according to the data features of the user information, and to assign the binary features converted from the same data feature to the same feature domain, so as to obtain the corresponding feature domains.
Further, the construction module 40 is further configured to construct a linear part and a nonlinear part corresponding to the model based on the normalized numerical features and the feature domains;
the construction module 40 is further configured to construct the domain decomposition machine model based on the linear part and the nonlinear part;
the determining module 20 is further configured to determine sample data, and solve the domain decomposition machine model according to the sample data and a preset loss function to obtain a parameter solution corresponding to the domain decomposition machine model;
the construction module 40 is further configured to train the linear part and the nonlinear part based on the parameter solution, and construct a corresponding credit rating card model based on the trained linear part and nonlinear part;
the determining module 20 is further configured to respond to a query request of a user, determine user information to be evaluated according to the query request, and determine a default probability value corresponding to the user information to be evaluated through the domain decomposition machine model;
the determining module 20 is further configured to determine a credit score corresponding to the default probability value based on the credit rating card model;
the determining module 20 is further configured to determine a basic score and a doubling probability score, and determine the credit score of the user to be evaluated through the credit rating card model based on the basic score, the doubling probability score and the default probability value.
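As a hedged illustration of what these modules compute, the sketch below combines a linear part and a field-aware pairwise (nonlinear) part into a logit, maps it to a default probability with a sigmoid, and converts that probability into a score. It assumes the standard FFM formulation, assumes that the "doubling probability score" corresponds to the points-to-double-the-odds (PDO) parameter of a conventional scorecard, and assumes that the basic score is anchored at default odds of 1:1. None of these details are confirmed by the text above, and the names `ffm_logit`, `default_probability` and `credit_score` are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Triple = Tuple[int, int, float]  # (domain id, feature index, value)


def ffm_logit(x: List[Triple], w0: float, w: List[float],
              v: Dict[Tuple[int, int], List[float]]) -> float:
    """Linear part plus field-aware pairwise (nonlinear) part."""
    linear = w0 + sum(w[j] * val for _, j, val in x)
    pairwise = 0.0
    for a in range(len(x)):
        fa, ja, xa = x[a]
        for b in range(a + 1, len(x)):
            fb, jb, xb = x[b]
            # Feature ja uses its latent vector for domain fb, and vice versa.
            dot = sum(p * q for p, q in zip(v[(ja, fb)], v[(jb, fa)]))
            pairwise += dot * xa * xb
    return linear + pairwise


def default_probability(logit: float) -> float:
    """Sigmoid link from the model output to a default probability."""
    return 1.0 / (1.0 + math.exp(-logit))


def credit_score(p_default: float, base_score: float, pdo: float) -> float:
    """Assumed scorecard scaling: base_score at odds 1:1, and the score
    drops by pdo points each time the default odds double."""
    odds = p_default / (1.0 - p_default)
    return base_score - (pdo / math.log(2.0)) * math.log(odds)
```

Under these assumptions, a user with a default probability of 0.5 receives exactly the basic score, and a higher default probability always yields a lower score. A different anchor point or sign convention would only change the constants in `credit_score`.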
The specific implementation of the model building apparatus based on the domain decomposition machine of the present application is substantially the same as that of each embodiment of the model building method based on the domain decomposition machine, and is not described herein again.
In addition, the embodiment of the present application also provides a storage medium, on which a domain decomposition machine-based model building program is stored, and the domain decomposition machine-based model building program, when executed by a processor, implements the steps of the domain decomposition machine-based model building method as described above.
The specific implementation of the storage medium of the present application is substantially the same as that of each embodiment of the model construction method based on the domain decomposition machine, and is not described herein again.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The serial numbers of the above embodiments of the present application are merely for description and do not represent the relative merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and can certainly also be implemented by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solutions of the present application may be embodied in the form of a software product that is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for causing a system to perform the methods according to the embodiments of the present application.