Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to specific embodiments and the accompanying drawings.
It is to be noted that technical terms or scientific terms used in the embodiments of the present invention should have the ordinary meanings as understood by those having ordinary skill in the art to which the present disclosure belongs, unless otherwise defined. The use of "first," "second," and the like in this disclosure is not intended to indicate any order, quantity, or importance, but rather is used to distinguish one element from another. The word "comprising" or "comprises," and the like, means that the element or item listed before the word covers the element or item listed after the word and its equivalents, but does not exclude other elements or items. The terms "connected" or "coupled" and the like are not restricted to physical or mechanical connections, but may include electrical connections, whether direct or indirect. The terms "upper," "lower," "left," "right," and the like are used merely to indicate relative positional relationships; when the absolute position of the object being described changes, the relative positional relationships may change accordingly.
Model interpretation: in machine learning tasks, various models are proposed to model a problem. Beyond the model's direct output, a further understanding of the results is needed, for example, which features have the greatest impact on the model's output, and which factors determine the output for a particular prediction instance; this requires a corresponding interpretation of the model.
The interpretability of existing models treats the model as a whole and is divided into two types (a brief sketch contrasting the two follows the list):
local interpretability of the model: for a specific sample, analyzing the contribution of each feature to the final score;
global interpretability of the model: also known as feature importance, referring to the contribution of each feature to the model overall.
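By way of illustration only, the following sketch contrasts the two views. It assumes the LightGBM library and synthetic stand-in data (none of the names below come from the embodiments): `feature_importance()` exposes a global view, while `predict(..., pred_contrib=True)` exposes a per-sample local view.

```python
# Sketch: global vs. local interpretability of a GBDT model.
# Assumes lightgbm and numpy are installed; data are synthetic stand-ins.
import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = (X[:, 0] - X[:, 2] > 0).astype(int)
model = lgb.train({"objective": "binary", "verbosity": -1},
                  lgb.Dataset(X, label=y), num_boost_round=30)

# Global interpretability: contribution of each feature to the model overall
# (here, total split gain accumulated per feature).
print(model.feature_importance(importance_type="gain"))

# Local interpretability: contribution of each feature to one sample's score
# (the last column is the expected-value offset).
print(model.predict(X[:1], pred_contrib=True))
```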
Whether the interpretation is global or local, existing data model interpretation schemes are directed at the model as a whole and can therefore only interpret end-to-end model results, focusing mainly on the relative contribution sizes among features. Interpretability for transfer learning, by contrast, attends to the respective influence of the source domain and the target domain on the final model, and explains the transfer by explaining how the feature contribution values change between the two domains.
Transfer learning is a machine learning method in which a model developed for a task A is taken as a starting point and reused in the process of developing a model for a task B.
As shown in Fig. 1, a method for developing a transfer learning model includes the following steps:
Step 102: select a source task. Select a related predictive modeling problem with abundant data, for which there is a relationship among the input data, the output data, and the concepts learned from the mapping between the input and output data of the source task and the target task.
Step 104: develop a source model. A model is developed for the source task.
Step 106: reuse the model to obtain a target model. The model fitted to the source task is used as the learning starting point for the target task, which may involve using the source model in whole or in part, depending on the modeling technique used.
It can be seen that the model developed for task A serves as the starting point when developing the model for task B; how effective this kind of transfer learning is therefore matters greatly to users of transfer learning. It is thus necessary to develop a model interpretation implementation method for transfer learning.
Fig. 2 is a flowchart illustrating an embodiment of a transfer learning model interpretation implementation method according to an embodiment of the present invention.
As shown in Fig. 2, the transfer learning model interpretation implementation method includes:
Step 202: train with samples in the source domain to obtain a source model.
In one or more embodiments of the present specification, the samples in the source domain are user consumption behavior data generated by a first user group in a first time period, and the source model is used for predicting the probability that a target user performs a consumption behavior.
Step 204: explanatory data for the source model is recorded.
In one or more embodiments of the present description, the source model is a gradient boosting decision tree (GBDT) model. The explanatory data include the scores of the leaf nodes of each decision tree, the number of samples at each node of each decision tree, and the splitting gains of the splitting features at the intermediate nodes of each decision tree.
Optionally, the splitting features comprise at least one of the number of items clicked by the user, the item category, the item brand, the maximum number of pages browsed in a single session, and the average interaction time between the user and the items.
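As a hedged sketch of what recording such explanatory data could look like in practice, the following assumes LightGBM, whose `trees_to_dataframe()` exposes per-node split features, split gains, sample counts, and leaf values; `X_src` and `y_src` are synthetic stand-ins, not actual consumption-behavior data.

```python
# Sketch: recording the three kinds of explanatory data from a GBDT source
# model: leaf-node scores, per-node sample counts, and split gains.
import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(0)
X_src = rng.normal(size=(500, 5))                  # stand-in feature matrix
y_src = (X_src[:, 0] + X_src[:, 1] > 0).astype(int)

source_model = lgb.train({"objective": "binary", "verbosity": -1},
                         lgb.Dataset(X_src, label=y_src), num_boost_round=50)

nodes = source_model.trees_to_dataframe()          # one row per node per tree
is_leaf = nodes["split_feature"].isnull()
leaf_scores = nodes[is_leaf][["tree_index", "node_index", "value"]]
sample_counts = nodes[["tree_index", "node_index", "count"]]
split_gains = nodes[~is_leaf][["tree_index", "node_index",
                               "split_feature", "split_gain"]]
```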
Step 206: and on the basis of the source model, continuously training the source model by using samples in the target domain to obtain a target model.
The samples in the target domain are user consumption behavior data generated by a second user group in a second time period, and the target models are all used for predicting the consumption behavior occurrence probability of the target users.
Step 208: recording explanatory data of the target model.
In one or more embodiments of the present description, the target model is also a gradient boosting decision tree model; the explanatory data include the scores of the leaf nodes of each decision tree, the number of samples at each node of each decision tree, and the splitting gains of the splitting features at the intermediate nodes of each decision tree.
Optionally, the splitting features comprise at least one of the number of items clicked by the user, the item category, the item brand, the maximum number of pages browsed in a single session, and the average interaction time between the user and the items.
Step 210: and processing to obtain the interpretation result of the migration learning model according to the interpretative data of the source model and the interpretative data of the target model.
Therefore, the source model is trained in the source domain, the corresponding interpretative data are recorded, then the source model is continued in the target domain to obtain the target model, the corresponding interpretative data are recorded, and finally the interpretation result of the migration learning model can be analyzed by comparing the interpretative data, so that the effect of the migration learning is interpreted.
For example, in one or more embodiments of the present description, the source model and the target model are both gradient boosting decision tree models; the explanatory data include the scores of the leaf nodes of each decision tree, the number of samples at each node of each decision tree, and the splitting gains of the splitting features at the intermediate nodes of each decision tree. The explanatory data may include global explanatory data and local explanatory data; the global explanatory data may be calculated from the splitting gains of the splitting features at the intermediate nodes of each decision tree, and the local explanatory data may be calculated from the scores of the leaf nodes of each decision tree and the number of samples at each node of each decision tree. Optionally, the samples in the source domain and the target domain both have calibrated label values; the leaf node scores are determined by the gradient boosting decision tree model during training based on the calibrated label values of the samples divided into each leaf node. Thus, when the source model and the target model are gradient boosting decision tree models, the transfer learning model interpretation results can be obtained, and the transfer learning effect is thereby interpreted.
For example, in one or more embodiments of the present specification, the samples in the source domain are user consumption behavior data generated by a first user group in a first time period, the samples in the target domain are user consumption behavior data generated by a second user group in a second time period, and both the source model and the target model are used for predicting the probability that a target user performs a consumption behavior; the splitting features comprise at least one of the number of items clicked by the user, the item category, the item brand, the maximum number of pages browsed in a single session, and the average interaction time between the user and the items.
The first time period and the second time period may be different or the same, and the first user group and the second user group may be different or the same. However, when the first time period and the second time period are the same, the first user group and the second user group are different; when the first user group and the second user group are the same, the first time period and the second time period are different.
Optionally, the first and second time periods are different, and the first and second user groups are also different. For example, the user consumption behavior data of group A in the month before Double Eleven is taken as the source domain, and the user consumption behavior data of group B on the day of Double Eleven is taken as the target domain. After the source model is obtained by training on the source domain samples, training of the source model is continued with the target domain samples to obtain the target model. Thus, once the explanatory data are compared, the resulting interpretation of the transfer learning model can reflect the transfer learning effect from estimating everyday conversion rates to estimating conversion rates on a special date.
In one or more embodiments of the present description, after the interpretation result of the transfer learning model is obtained, the target model may be further adjusted based on that result, i.e., selectively fine-tuned on the input-output pairs in the target domain to adapt it to the target task.
Fig. 3 is a flowchart illustrating another embodiment of a transfer learning model interpretation implementation method according to an embodiment of the present invention.
As shown in Figs. 3 and 4, the transfer learning model interpretation implementation method includes:
Step 302: train with samples in the source domain to obtain the source model.
In one or more embodiments of the present description, the source model is a gradient boosting decision tree (GBDT) model. Referring to Fig. 4, in this step, a set of trees T_S is first trained on the source domain.
Step 304: explanatory data for the source model is recorded.
In one or more embodiments of the present specification, the explanatory data include the scores of the leaf nodes of each decision tree of the source model, the number of samples at each node of each decision tree, and the splitting gains of the splitting features at the intermediate nodes of each decision tree.
Optionally, the explanatory data may comprise global explanatory data GI_S and local explanatory data LI_S of the source model; the global explanatory data may be calculated from the splitting gains of the splitting features at the intermediate nodes of each decision tree, and the local explanatory data may be calculated from the scores of the leaf nodes of each decision tree and the number of samples at each node of each decision tree.
Step 306: modify the source model with the samples in the target domain to obtain a modified source model. Referring to Fig. 4, in this step, T_S is modified to obtain T_S'.
Optionally, the modification may adopt a model correction method commonly used in the art; details are not repeated here.
Step 308: recording explanatory data of the modified source model.
In one or more embodiments of the present specification, the explanatory data include the scores of the leaf nodes of each decision tree of the modified source model, the number of samples at each node of each decision tree, and the splitting gains of the splitting features at the intermediate nodes of each decision tree.
Optionally, the explanatory data may comprise global explanatory data GI_S' and local explanatory data LI_S' of the modified source model; the global explanatory data may be calculated from the splitting gains of the splitting features at the intermediate nodes of each decision tree, and the local explanatory data may be calculated from the scores of the leaf nodes of each decision tree and the number of samples at each node of each decision tree.
Step 310: on the basis of the modified source model, continue training with the samples in the target domain to obtain the target model. Referring to Fig. 4, in this step, several additional trees T_T are learned from the target domain data on the basis of T_S', yielding the final target model T_S'+T.
Step 312: recording explanatory data of the target model.
In one or more embodiments of the present description, the target model is also a gradient boosting decision tree model; the explanatory data include the scores of the leaf nodes of each decision tree of the target model, the number of samples at each node of each decision tree, and the splitting gains of the splitting features at the intermediate nodes of each decision tree.
Optionally, the explanatory data may include the global explanatory data GI_T and local explanatory data LI_T of the newly added portion T_T of the target model, as well as the global explanatory data GI_S'+T and local explanatory data LI_S'+T of the target model T_S'+T; the global explanatory data may be calculated from the splitting gains of the splitting features at the intermediate nodes of each decision tree, and the local explanatory data may be calculated from the scores of the leaf nodes of each decision tree and the number of samples at each node of each decision tree.
Step 314: process the explanatory data of the source model and the explanatory data of the target model to obtain the interpretation result of the transfer learning model.
In one or more embodiments of the present description, the global explanatory data is the feature splitting gain average of the gradient boosting decision tree model; the feature splitting gain average is the average of the splitting gains of all nodes, across all decision trees in the gradient boosting decision tree model, that take the target feature as their splitting feature; the splitting gain is the parameter value on which the splitting criterion of the decision tree is based.
For example, a decision tree may split based on the information gain of the information entropy, in which case the splitting gain is the information gain. As another example, if the decision tree splits based on the information gain ratio, the splitting gain is the information gain ratio. For a CART decision tree, the splitting gain may also be the Gini index. The splitting gain is therefore defined and calculated according to the type of decision tree and the chosen splitting criterion, and the specific gain type is not limited here.
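Continuing the earlier sketches (same assumptions: LightGBM and `trees_to_dataframe()`, with `source_model` from the sketch above), the feature splitting gain average could be computed roughly as follows; the actual gain values depend on the library's splitting criterion.

```python
# Sketch: global explanatory data GI as the per-feature average split gain,
# i.e. the mean of split_gain over all nodes splitting on each feature.
def feature_gain_average(booster):
    nodes = booster.trees_to_dataframe()
    splits = nodes.dropna(subset=["split_feature"])
    return splits.groupby("split_feature")["split_gain"].mean()

GI_S = feature_gain_average(source_model)  # source_model from earlier sketch
print(GI_S.sort_values(ascending=False))   # largest average gains first
```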
In one or more embodiments of the present specification, processing the explanatory data of the source model and the explanatory data of the target model to obtain the interpretation result of the transfer learning model includes:
calculating, according to the explanatory data of the source model and the explanatory data of the modified source model, a first change value between the global explanatory data GI_S of each splitting feature in the source model and the global explanatory data GI_S' in the modified source model; for example, if the global explanatory data is a feature splitting gain average, then for each splitting feature both GI_S and GI_S' evaluate to a numerical value, and subtracting one from the other gives the first change value;
and sorting the first change values of the splitting features to obtain a model interpretation result reflecting how the distribution of the core splitting features of the source domain's sample data set has changed.
Here, for example, the sorting is in ascending numerical order: splitting features ranked earlier are features whose distribution changed little, while those ranked later are features whose distribution changed greatly. The result thus reflects the distribution change of the core splitting features of the source domain's sample data set and guides subsequent model adjustment.
In one or more embodiments of the present specification, processing the explanatory data of the source model and the explanatory data of the target model to obtain the interpretation result of the transfer learning model includes:
calculating, according to the explanatory data of the source model and the explanatory data of the target model, a second change value between the global explanatory data GI_S of each splitting feature in the source model and the global explanatory data GI_S'+T in the target model; for example, if the global explanatory data is a feature splitting gain average, then for each splitting feature both GI_S and GI_S'+T evaluate to a numerical value, and subtracting one from the other gives the second change value;
and sorting the second change values of the splitting features to obtain a model interpretation result reflecting how the distribution of the core splitting features changed over the course of the transfer learning.
Here, for example, the sorting is in ascending numerical order: splitting features ranked earlier are features whose distribution changed little, while those ranked later are features whose distribution changed greatly. The result thus reflects the distribution change of the core splitting features of the overall sample data set during the transfer learning and provides guidance for subsequent model adjustment.
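The first and second change values above share the same computation pattern. Reusing `feature_gain_average()` and the boosters from the earlier sketches, a schematic computation might look like the following; taking the absolute difference before sorting is our own assumption, since the embodiments only specify a subtraction.

```python
# Sketch: change values between global explanatory data, sorted ascending
# so that features with small distribution change rank first.
GI_S = feature_gain_average(T_S)               # source model
GI_S_prime = feature_gain_average(T_S_prime)   # modified source model
GI_final = feature_gain_average(T_final)       # target model T_S'+T

# First change value: GI_S vs GI_S' (features absent from one model
# are treated as zero change here).
first_change = (GI_S - GI_S_prime).abs().fillna(0.0).sort_values()

# Second change value: GI_S vs GI_S'+T, computed analogously.
second_change = (GI_S - GI_final).abs().fillna(0.0).sort_values()
print(first_change, second_change, sep="\n")
```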
In one or more embodiments of the present specification, processing the explanatory data of the source model and the explanatory data of the target model to obtain the interpretation result of the transfer learning model includes:
sorting, according to the explanatory data of the target model and the explanatory data of the modified source model, the global explanatory data GI_T of each splitting feature in the part of the target model that remains after the modified source model is removed (namely the newly added portion T_T of the target model), to obtain a model interpretation result reflecting changes in, or additions to, the core splitting features of the target domain's sample data set, thereby guiding subsequent model adjustment.
In one or more embodiments of the present specification, processing the explanatory data of the source model and the explanatory data of the target model to obtain the interpretation result of the transfer learning model includes:
extracting local explanatory statistical data of the source model and local explanatory statistical data of the target model from the explanatory data of the source model and the explanatory data of the target model;
and, according to the local explanatory statistical data of the source model and the local explanatory statistical data of the target model, interpreting the influence of the transfer learning on a single piece of data to obtain a local model interpretation result.
Here, the local explanatory data may be calculated using a method commonly used in the art, and is not particularly limited herein.
In one or more embodiments of the present description, the local explanatory data may be calculated based on the scores of the leaf nodes of each decision tree and the number of samples at each node of each decision tree. The samples in the source domain and the target domain both have calibrated label values; the leaf node scores are determined by the gradient boosting decision tree model during training based on the calibrated label values of the samples divided into each leaf node. Optionally, the local explanatory data of a single piece of data may be calculated as follows (a concrete sketch follows the list):
selecting a sample;
determining the propagation path of the sample in the source model and/or the target model;
finding the leaf node at which the sample arrives and obtaining the score of that leaf node;
acquiring the splitting features and scores of all parent nodes on the propagation path, where the score of each parent node is determined from the scores of the leaf nodes of the decision tree in which that parent node is located; for example, the score of a parent node may be the average of the scores of its two children;
for each child node on the propagation path, determining the feature local increment at that child node from the score of the child node, the score of its parent node, and the splitting feature of the parent node; for example, the feature local increment is the score of the child node minus the score of the parent node;
acquiring the set of splitting features corresponding to all child nodes on the propagation path;
and obtaining the local explanatory data of the sample for each splitting feature by summing the feature local increments of the child nodes on the propagation path that correspond to the same splitting feature.
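To make the list above concrete, here is a self-contained sketch on a single hand-written decision tree (a dict-based stand-in, not any library's native format), using the convention suggested above that a parent's score is the average of its two children's scores.

```python
# Sketch: local explanatory data for one sample on one decision tree.
# Internal nodes carry a splitting feature index and threshold; leaves
# carry a score.

def node_score(node):
    """Score of a node: leaf value, or average of its two children."""
    if "score" in node:                       # leaf
        return node["score"]
    return 0.5 * (node_score(node["left"]) + node_score(node["right"]))

def local_explanation(tree, sample):
    """Sum feature local increments (child score minus parent score) along
    the sample's propagation path, grouped by splitting feature."""
    contributions = {}
    node = tree
    while "score" not in node:                # walk down until a leaf
        feature, parent_s = node["feature"], node_score(node)
        child = node["left"] if sample[feature] < node["threshold"] else node["right"]
        increment = node_score(child) - parent_s
        contributions[feature] = contributions.get(feature, 0.0) + increment
        node = child
    return contributions

# Hypothetical tree: splits on feature 0, then feature 1.
tree = {"feature": 0, "threshold": 0.5,
        "left": {"score": -1.0},
        "right": {"feature": 1, "threshold": 2.0,
                  "left": {"score": 0.5}, "right": {"score": 2.0}}}

print(local_explanation(tree, sample=[0.8, 3.0]))  # {0: 1.125, 1: 0.75}
```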
In one or more embodiments of the present specification, the samples in the source domain are user consumption behavior data generated by a first user group in a first time period, the samples in the target domain are user consumption behavior data generated by a second user group in a second time period, and both the source model and the target model are used for predicting the probability that a target user performs a consumption behavior; the splitting features comprise at least one of the number of items clicked by the user, the item category, the item brand, the maximum number of pages browsed in a single session, and the average interaction time between the user and the items.
As can be seen from the foregoing embodiments, one or more embodiments of the present specification provide an interpretation scheme for boosted tree model transfer learning, which can interpret transfer learning based on a boosted tree model and, at the same time, supports both local interpretation and global interpretation of the transferred model.
It should be noted that the method of the embodiments of the present invention may be executed by a single device, such as a computer or a server. The method may also be applied in a distributed scenario and completed through the cooperation of multiple devices. In such a distributed scenario, one of the multiple devices may perform only one or more steps of the method, and the devices interact with each other to complete the method.
Fig. 5 is a schematic structural block diagram illustrating an embodiment of a transfer learning model interpretation implementation apparatus provided in an embodiment of the present invention.
As shown in Fig. 5, the transfer learning model interpretation implementation apparatus includes:
a source model training module 401, configured to train with samples in a source domain to obtain a source model;
a source model data recording module 402, configured to record explanatory data of the source model;
a target model training module 403, configured to continue training the source model with samples in a target domain, on the basis of the source model, to obtain a target model;
a target model data recording module 404, configured to record explanatory data of the target model;
and an interpretation module 405, configured to process the explanatory data of the source model and the explanatory data of the target model to obtain the transfer learning model interpretation result.
As can be seen from the foregoing embodiments, one or more embodiments of the present specification provide an interpretation scheme for boosted tree model transfer learning, which can interpret transfer learning based on a boosted tree model and, at the same time, supports both local interpretation and global interpretation of the transferred model.
In one or more embodiments of the present description, the source model and the target model are both gradient boosting decision tree models; the explanatory data include the scores of the leaf nodes of each decision tree, the number of samples at each node of each decision tree, and the splitting gains of the splitting features at the intermediate nodes of each decision tree.
In one or more embodiments of the present description, the samples in the source domain and the target domain each have a calibrated label value; the leaf node scores are determined during training of the gradient boosting decision tree model based on the calibrated label values of the samples divided into each leaf node.
In one or more embodiments of the present specification, the apparatus further includes a source model modification module 406, configured to modify the source model with samples in the target domain to obtain a modified source model;
the source model data recording module 402 is further configured to record the explanatory data of the modified source model.
In one or more embodiments of the present specification, the interpretation module 405 is configured to:
calculate, according to the explanatory data of the source model and the explanatory data of the modified source model, a first change value between the global explanatory data of each splitting feature in the source model and the global explanatory data in the modified source model;
and sort the first change values of the splitting features to obtain a model interpretation result reflecting how the distribution of the core splitting features of the source domain's sample data set has changed.
In one or more embodiments of the present description, the interpretation module 405 is configured to:
calculate, according to the explanatory data of the source model and the explanatory data of the target model, a second change value between the global explanatory data of each splitting feature in the source model and the global explanatory data in the target model;
and sort the second change values of the splitting features to obtain a model interpretation result reflecting how the distribution of the core splitting features changed over the course of the transfer learning.
In one or more embodiments of the present specification, the interpretation module 405 is configured to:
sort, according to the explanatory data of the target model and the explanatory data of the modified source model, the global explanatory data of each splitting feature in the part of the target model that remains after the modified source model is removed, to obtain a model interpretation result reflecting changes in, or additions to, the core splitting features of the target domain's sample data set.
In one or more embodiments of the present description, the global explanatory data is the feature splitting gain average of the gradient boosting decision tree model; the feature splitting gain average is the average of the splitting gains of all nodes, across all decision trees in the gradient boosting decision tree model, that take the target feature as their splitting feature; the splitting gain is the parameter value on which the splitting criterion of the decision tree is based.
In one or more embodiments of the present description, the interpretation module 405 is configured to:
extract local explanatory statistical data of the source model and local explanatory statistical data of the target model from the explanatory data of the source model and the explanatory data of the target model;
and, according to the local explanatory statistical data of the source model and the local explanatory statistical data of the target model, interpret the influence of the transfer learning on a single piece of data to obtain a local model interpretation result.
In one or more embodiments of the present specification, the samples in the source domain are user consumption behavior data generated by a first user group in a first time period, the samples in the target domain are user consumption behavior data generated by a second user group in a second time period, and both the source model and the target model are used for predicting the probability that a target user performs a consumption behavior; the splitting features comprise at least one of the number of items clicked by the user, the item category, the item brand, the maximum number of pages browsed in a single session, and the average interaction time between the user and the items.
The apparatus in the foregoing embodiment is used for implementing the corresponding method in the foregoing embodiment, and has the beneficial effects of the corresponding method embodiment, which are not described herein again.
Fig. 6 is a schematic diagram illustrating a more specific hardware structure of an electronic device according to this embodiment. The device may include: a processor 501, a memory 502, an input/output interface 503, a communication interface 504, and a bus 505, wherein the processor 501, the memory 502, the input/output interface 503, and the communication interface 504 are communicatively connected to each other within the device via the bus 505.
The processor 501 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits, and is configured to execute related programs to implement the technical solutions provided in the embodiments of the present specification.
The memory 502 may be implemented in the form of a ROM (Read-Only Memory), a RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 502 may store an operating system and other application programs; when the technical solutions provided by the embodiments of the present specification are implemented in software or firmware, the relevant program code is stored in the memory 502 and called for execution by the processor 501.
The input/output interface 503 is used for connecting an input/output module to realize information input and output. The input/output module may be configured as a component within the device (not shown in the figure) or may be external to the device to provide the corresponding functions. Input devices may include a keyboard, mouse, touch screen, microphone, and various sensors; output devices may include a display, speaker, vibrator, indicator light, and the like.
The communication interface 504 is used for connecting a communication module (not shown in the figure) to implement communication interaction between this device and other devices. The communication module may communicate in a wired manner (such as USB or a network cable) or in a wireless manner (such as a mobile network, WiFi, or Bluetooth).
The bus 505 comprises a path that transfers information between the various components of the device, such as the processor 501, the memory 502, the input/output interface 503, and the communication interface 504.
It should be noted that although the above-described device shows only the processor 501, the memory 502, the input/output interface 503, the communication interface 504, and the bus 505, in specific implementations the device may also include other components necessary for normal operation. In addition, those skilled in the art will appreciate that the above-described device may include only the components necessary to implement the embodiments of the present specification, and need not include all of the components shown in the figure.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
Those of ordinary skill in the art will understand that the discussion of any embodiment above is merely exemplary and is not intended to imply that the scope of the disclosure, including the claims, is limited to these examples; within the spirit of the invention, technical features in the above embodiments or in different embodiments may be combined, steps may be implemented in any order, and there exist many other variations of the different aspects of the invention as described above, which are not provided in detail for the sake of brevity.
In addition, well known power/ground connections to Integrated Circuit (IC) chips and other components may or may not be shown within the provided figures for simplicity of illustration and discussion, and so as not to obscure the invention. Furthermore, devices may be shown in block diagram form in order to avoid obscuring the invention, and also in view of the fact that specifics with respect to implementation of such block diagram devices are highly dependent upon the platform within which the present invention is to be implemented (i.e., specifics should be well within purview of one skilled in the art). Where specific details (e.g., circuits) are set forth in order to describe example embodiments of the invention, it should be apparent to one skilled in the art that the invention can be practiced without, or with variation of, these specific details. Accordingly, the description is to be regarded as illustrative instead of restrictive.
While the present invention has been described in conjunction with specific embodiments thereof, many alternatives, modifications, and variations of these embodiments will be apparent to those skilled in the art in light of the foregoing description. For example, other memory architectures, such as Dynamic RAM (DRAM), may use the discussed embodiments.
The embodiments of the invention are intended to embrace all such alternatives, modifications, and variations as fall within the broad scope of the appended claims. Therefore, any omissions, modifications, equivalent substitutions, improvements, and the like made within the spirit and principles of the invention shall be included within the scope of the invention.