CN112668723B


Info

Publication number: CN112668723B
Application number: CN202011589671.8A
Authority: CN
Inventors: 李国琪
Original assignee: Hangzhou Hikvision Digital Technology Co Ltd
Current assignee: Hangzhou Hikvision Digital Technology Co Ltd
Priority date: 2020-12-29
Filing date: 2020-12-29
Publication date: 2024-01-02
Anticipated expiration: 2040-12-29
Also published as: CN112668723A

Abstract

The embodiment of the invention provides a machine learning method and a machine learning system. The method comprises the following steps: the decision end determines, according to a preset feature engineering policy, a target mapping relation identifier corresponding to a data set to be processed of the execution end, wherein the target mapping relation identifier represents a target mapping relation between the original features and the target features of the objects in the data set to be processed, and the target features are features obtained by performing feature engineering on the original features according to the preset feature engineering policy; the decision end sends the target mapping relation identifier to the execution end; the execution end maps the original features of the data set to be processed according to the target mapping relation represented by the target mapping relation identifier to obtain the target features of each object; and the execution end performs machine learning based on the target features of each object to obtain a model for processing objects of the same class as those objects. The development cost of machine learning can thereby be effectively reduced.

Description

Machine learning method and system

Technical Field

The present invention relates to the field of machine learning technologies, and in particular, to a machine learning method and system.

Background

An electronic device with machine learning capability may obtain a model through machine learning and use the model for inference, where the model represents a mapping relationship, learned from a data set, between features and results. However, in some application scenarios there may be no explicit mapping relationship between the features in the data set and the results, so it is difficult for the electronic device to learn from those features a model that effectively represents the mapping relationship; that is, machine learning is inefficient and inaccurate.

Therefore, in these application scenarios, feature engineering needs to be performed on the data set so that a more obvious mapping relationship exists between the features in the data set and the results. For convenience of description, the features of each object in the data set before feature engineering are referred to as original features, and the features of each object in the data set after feature engineering are referred to as target features.

However, the representation of the target features obtained by feature engineering may differ according to the machine learning framework used; for example, the representation under a Python (a programming language) framework may differ from that under a Java (a programming language) framework. To make the target features obtained by feature engineering applicable to different machine learning frameworks, a corresponding feature engineering method needs to be developed for each framework. For example, one feature engineering method is developed for a Python framework to obtain target features suitable for the Python framework, and another feature engineering method is developed for a Java framework to obtain target features suitable for the Java framework.

The development costs of machine learning are high due to the need to develop a number of different feature engineering methods.

Disclosure of Invention

The embodiment of the invention aims to provide a machine learning method so as to reduce the development cost of machine learning. The specific technical scheme is as follows:

in a first aspect of an embodiment of the present invention, there is provided a machine learning method, the method including:

the decision-making end determines a target mapping relation identifier corresponding to a data set to be processed of the execution end according to a preset feature engineering policy, wherein the target mapping relation identifier is used for representing a target mapping relation between original features of all objects in the data set to be processed and target features of all objects, and the target features are features obtained by carrying out feature engineering on the original features according to the preset feature engineering policy;

the decision end sends the target mapping relation identification to the execution end;

the executing end maps the original characteristics of the data set to be processed according to the target mapping relation represented by the target mapping relation identifier to obtain target characteristics of each object;

and the execution end performs machine learning based on the target characteristics of each object to obtain a model for processing the similar objects of each object.

In a possible embodiment, before determining, according to the preset feature engineering policy, the target mapping relationship identifier corresponding to the to-be-processed data set of the executing end, the method further includes:

the method comprises the steps that an execution end collects a mapping relation and a mapping relation identifier which are supported by the execution end to obtain mapping relation information, wherein the mapping relation information is used for representing a corresponding relation between the mapping relation supported by the execution end and the mapping relation identifier;

the execution end sends the mapping relation information to a decision end;

the decision terminal receives the mapping relation information sent by the execution terminal;

the determining the target mapping relation identifier corresponding to the data set to be processed of the execution end according to the preset characteristic engineering strategy comprises the following steps:

determining a target mapping relation between original features and target features of a data set to be processed of an execution end according to a preset feature engineering strategy;

and determining a mapping relation identifier corresponding to the target mapping relation as a target mapping relation identifier according to the corresponding relation represented by the mapping relation information.

In a possible embodiment, the determining, according to a preset feature engineering policy, a target mapping relationship identifier corresponding to a data set to be processed at an executing end includes:

Determining a target characteristic engineering strategy corresponding to the data set to be processed from a plurality of preset characteristic engineering strategies;

and determining a target mapping relation identifier corresponding to the data set to be processed of the execution end by adopting the target feature engineering strategy.

In a possible embodiment, the determining, from a plurality of different preset feature engineering policies, a target feature engineering policy corresponding to the data set to be processed includes:

determining, for each preset feature engineering policy, a pre-estimated score of the preset feature engineering policy, where the pre-estimated score is used to represent a degree of dispersion of feature values in each dimension in a target feature obtained by performing feature engineering on original features of a to-be-processed data set at an execution end according to the preset feature engineering policy, and the pre-estimated score is inversely related to the degree of dispersion;

and determining a preset characteristic engineering strategy with the highest estimated score as a target characteristic engineering strategy.

In a possible embodiment, after the mapping the original features of the to-be-processed dataset according to the target mapping relationship represented by the target mapping relationship identifier to obtain the target features of the to-be-processed dataset, the method further includes:

The decision end takes the target feature of the data set to be processed as the new original feature of the data set to be processed, and returns to execute the step of determining the target mapping relation identifier corresponding to the data set to be processed of the execution end according to the preset feature engineering strategy;

the machine learning based on the target features of the objects, to obtain a model for processing the same class objects of the objects, includes:

and performing machine learning based on the target characteristics of each object until a preset cycle ending condition is reached, and obtaining a model for processing the similar objects of each object.

In a second aspect of embodiments of the present invention, there is provided a machine learning system, the system comprising a decision-making end and an execution end;

the decision-making end is used for determining a target mapping relation identifier corresponding to a data set to be processed of the executing end according to a preset feature engineering strategy, wherein the target mapping relation identifier is used for representing a target mapping relation between original features of each object in the data set to be processed and target features of each object, and the target features are features obtained by carrying out feature engineering on the original features according to the preset feature engineering strategy; the target mapping relation identification is sent to the execution end;

The execution end is used for mapping the original characteristics of the data set to be processed according to the target mapping relation represented by the target mapping relation identifier to obtain target characteristics of each object; and performing machine learning based on the target characteristics of the objects to obtain a model for processing the similar objects of the objects.

In a possible embodiment, the executing end is further configured to collect a mapping relationship and a mapping relationship identifier that are supported by the executing end to obtain mapping relationship information, where the mapping relationship information is used to represent a correspondence relationship between the mapping relationship supported by the executing end and the mapping relationship identifier; sending the mapping relation information to a decision terminal;

the decision end is further used for receiving the mapping relation information sent by the execution end;

the decision-making end is specifically used for determining a target mapping relation between original features and target features of a data set to be processed of the execution end according to a preset feature engineering strategy;

In a possible embodiment, the decision-making end is specifically configured to determine, for each preset feature engineering policy, a pre-estimated score of the preset feature engineering policy, where the pre-estimated score is used to represent a degree of dispersion of feature values in each dimension of a target feature obtained by feature engineering of an original feature of a data set to be processed at the execution end according to the preset feature engineering policy, and the pre-estimated score is inversely related to the degree of dispersion;

In a possible embodiment, the decision-making end is further configured to use the target feature of the to-be-processed data set as a new original feature of the to-be-processed data set, and return to execute the step of determining, according to a preset feature engineering policy, the target mapping relationship identifier corresponding to the to-be-processed data set of the executing end;

the execution end is specifically configured to perform machine learning based on the target features of the objects until a preset cycle end condition is reached, so as to obtain a model for processing similar objects of the objects.

And returning to the step of executing the target mapping relation identification sent by the receiving decision end until the preset cycle ending condition is reached.

The embodiment of the invention has the beneficial effects that:

according to the machine learning method and system provided by the embodiment of the invention, the decision-making end can determine the target mapping relation identification according to the preset characteristic engineering strategy, and the execution end is guided to map the original characteristics in the data set by utilizing the target mapping relation identification, so that the characteristic engineering and the characteristic mapping are decomposed into two mutually independent steps, and the execution end can map the characteristics according to the machine learning framework used by the execution end, so that the obtained target characteristics can be suitable for the machine learning framework used by the execution end.

Of course, it is not necessary for any one product or method of practicing the invention to achieve all of the advantages set forth above at the same time.

Drawings

In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are necessary for the description of the embodiments or the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention and that other embodiments may be obtained according to these drawings without inventive effort for a person skilled in the art.

Fig. 1 is a schematic flow chart of a machine learning method according to an embodiment of the present invention;

Fig. 2 is a schematic flow chart of another machine learning method according to an embodiment of the present invention;

Fig. 3 is a schematic structural diagram of a machine learning system according to an embodiment of the present invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

In order to describe the machine learning method provided by the embodiment of the present invention more clearly, one possible application scenario of the method is described below by way of example. It will be understood that the following example is only one possible application scenario; in other possible embodiments, the machine learning method provided by the embodiment of the present invention may also be applied to other application scenarios, which is not limited in this embodiment.

If the user needs to obtain, through machine learning, a classification model for judging whether a person is overweight, data of a plurality of persons can be collected in advance to obtain a data set. The data set includes the original features of the plurality of persons, and the original features of each person may include the height and the weight of the person.

Assuming machine learning based on the original features of each person, the classification model obtained by machine learning can be theoretically used to represent the following mapping relationship:

R＝F(H，W)

where R is a classification result indicating whether a person is overweight (for example, the person may be considered overweight when R is greater than a preset threshold and not overweight when R is not greater than the preset threshold), H is the height of the person, W is the weight of the person, and F(H, W) is a mapping function. It will be understood that whether a person is overweight depends on both the height and the weight of the person. If it is assumed that whether a person is overweight is determined by the body mass index, then in one possible embodiment F(H, W) may ideally be expressed in the following form:

F(H，W)＝W/H²

If feature engineering is performed on the original features of each person (for example, if the user knows from experience that whether a person is overweight often depends on the body mass index), the body mass index of each person can be obtained as the target feature of that person by dividing the weight of the person by the square of the height of the person.

Machine learning is performed based on target features of each person, and a classification model obtained by machine learning can be theoretically used to represent the following mapping relationship:

R＝G(BMI)

where BMI is the body mass index of the person and G(BMI) is the mapping function. If it is again assumed that whether a person is overweight is determined by the body mass index, then in one possible embodiment G(BMI) may ideally be expressed in the following form:

G(BMI)＝BMI

It can be seen that G(BMI) has a simpler form than F(H, W), so machine learning based on the target features is more efficient and more readily yields an accurate model. For this reason, feature engineering is often performed on the original features in machine learning to improve the efficiency and accuracy of machine learning.
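By way of illustration only, the feature engineering in this example can be sketched in Python. The names `Person`, `bmi` and `is_overweight`, the units, and the threshold value 25.0 are assumptions made for this sketch and are not part of the embodiment.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Person:
    height_m: float   # original feature H (metres are an assumed unit)
    weight_kg: float  # original feature W (kilograms are an assumed unit)


def bmi(person: Person) -> float:
    """Target feature: weight divided by the square of height."""
    return person.weight_kg / person.height_m ** 2


def is_overweight(bmi_value: float, threshold: float = 25.0) -> bool:
    """G(BMI) compared with a preset threshold; 25.0 is illustrative only."""
    return bmi_value > threshold


people: List[Person] = [Person(1.80, 70.0), Person(1.60, 80.0)]
target_features = [bmi(p) for p in people]             # one value per person
labels = [is_overweight(b) for b in target_features]   # classification result
```

With the target feature in hand, the classifier reduces to a single comparison, whereas a model working on the original features would first have to discover the ratio of weight to squared height.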

However, dividing the weight of a person by the square of the height of the person is implemented differently under different machine learning frameworks. Illustratively, the function that implements the square calculation under a Java framework is a pow function, while the function that implements the square calculation under a Python framework is a power function. The user therefore has to develop corresponding code for the framework used by the execution end in order to implement feature engineering, which makes the development cost of machine learning high.

Referring to fig. 1, fig. 1 is a schematic flow chart of a machine learning method according to an embodiment of the present invention, which may include:

S101, the decision end determines, according to a preset feature engineering policy, a target mapping relation identifier corresponding to a data set to be processed of the execution end.

S102, the decision end sends the target mapping relation identification to the execution end.

And S103, the execution end maps the original characteristics of the data set to be processed according to the target mapping relation represented by the target mapping relation identifier to obtain the target characteristics of each object.

S104, the execution end performs machine learning based on the target characteristics of each object to obtain a model for processing the similar objects of each object.

With this embodiment, the decision end can determine the target mapping relation identifier according to the preset feature engineering policy and use the identifier to guide the execution end in mapping the original features in the data set, so that feature engineering and feature mapping are decomposed into two mutually independent steps. The execution end can map the features according to the machine learning framework it uses, so the target features obtained are suitable for that framework. In other words, the machine learning method provided by the embodiment of the application can obtain target features suitable for different machine learning frameworks according to the framework used by the execution end, without different feature engineering being developed for each framework, so the development cost of machine learning can be reduced.
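The division of labour in S101 to S104 can be sketched as follows. This is a minimal sketch under stated assumptions: the classes `DecisionEnd` and `ExecutionEnd`, the identifiers `ratio_to_square` and `sum`, and the toy threshold learner are all hypothetical, and sending the identifier (S102) is modelled as passing a string between two objects in the same process.

```python
from typing import Callable, Dict, List, Sequence

Row = Sequence[float]


class ExecutionEnd:
    """Holds the data set and the framework-specific operator implementations."""

    def __init__(self, dataset: List[Row], labels: List[int]) -> None:
        self.dataset = dataset
        self.labels = labels
        # Identifier -> local implementation (identifiers are hypothetical).
        self.operators: Dict[str, Callable[[Row], float]] = {
            "ratio_to_square": lambda r: r[1] / r[0] ** 2,
            "sum": lambda r: r[0] + r[1],
        }

    def map_features(self, identifier: str) -> List[float]:
        """S103: map original features using the operator the identifier names."""
        op = self.operators[identifier]
        return [op(row) for row in self.dataset]

    def learn(self, features: List[float]) -> float:
        """S104: toy model, a threshold halfway between the two class means."""
        pos = [f for f, y in zip(features, self.labels) if y == 1]
        neg = [f for f, y in zip(features, self.labels) if y == 0]
        return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


class DecisionEnd:
    def choose_identifier(self) -> str:
        """S101: stand-in for applying a preset feature engineering policy."""
        return "ratio_to_square"


execution = ExecutionEnd(
    dataset=[(1.80, 70.0), (1.60, 80.0), (1.70, 60.0), (1.75, 95.0)],
    labels=[0, 1, 0, 1],
)
identifier = DecisionEnd().choose_identifier()   # S101
features = execution.map_features(identifier)    # S102 (hand-over) and S103
threshold = execution.learn(features)            # S104
```

The decision end never touches an operator implementation; it only names one, which is what allows the execution end to swap in whatever its own framework provides.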

In S101, the target mapping relation identifier is used to represent a target mapping relation between the original features of each object in the data set to be processed and the target features of each object, where the target features are features obtained by performing feature engineering on the original features according to the preset feature engineering policy. The mapping relation may be represented in different manners according to the application scenario. In one possible embodiment, the target mapping relation identifier may be the operator name of a feature operator that implements the target mapping relation. For example, assuming that the target features are obtained by normalizing the original features, that is, the target mapping relation can be implemented by a normalization operator, the target mapping relation identifier may be the name of the normalization operator. In other possible embodiments, the target mapping relation identifier may also be a number or another identifier of the feature operator that implements the target mapping relation, which is not limited in this embodiment.

The objects in the data set to be processed may be objects such as personnel, vehicles, road identifications and the like according to different application scenes, and features included in original features of each object may be different according to different object types and different application scenes, for example, when the object is a person, the original features may include one or more of the features such as height, weight, age, face image, sex, voiceprint, wearing mask and the like of the person, and for example, when the object is a vehicle, the original features may include one or more of the features such as color, model, license plate number, outline and the like of the vehicle.

The decision-making end and the execution end can be two different entity devices, can be two different virtual devices, and can also be one of the two entity devices and the other one of the two virtual devices. When the decision end and the execution end are two virtual devices, the decision end and the execution end can be two virtual devices running on the same entity device or two virtual devices running on different entity devices.

One decision end can be connected to a plurality of execution ends, and one execution end can also be connected to a plurality of decision ends. For convenience of description, one decision end and one execution end are taken as an example below. The principle is the same for one decision end with a plurality of execution ends, a plurality of decision ends with one execution end, and a plurality of decision ends with a plurality of execution ends, so those cases are not described again here.

In S102, the decision-making end may send the target mapping relationship identifier to the execution end through a connection established with the execution end.

In S103, since the original feature is mapped by the execution end, the mapped target feature should be suitable for the machine learning framework used by the execution end. For example, assuming that the machine learning framework used by the execution end is a Python framework, the target features obtained theoretically are applicable to the Python framework.

It will be appreciated that although the manner in which a mapping is implemented differs between machine learning frameworks, the mapping implemented is theoretically the same. For example, normalization is implemented differently under a Python framework than under a Java framework, but both frameworks can theoretically perform normalization. Therefore, different machine learning frameworks can theoretically all map the original features of the data set to be processed according to the mapping relation represented by the target mapping relation identifier. That is, the target mapping relation identifier sent by the decision end can be correctly acted on by execution ends that use different machine learning frameworks.
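As a hedged illustration of this point, the sketch below registers two different implementations of min-max normalization under the same identifier. `BACKEND_A` and `BACKEND_B` are hypothetical stand-ins for two machine learning frameworks, and the identifier `normalize` is assumed for the example; both functions assume the input values are not all equal.

```python
from typing import Callable, Dict, List

Column = List[float]


def normalize_backend_a(values: Column) -> Column:
    """Min-max normalization written with built-ins and a comprehension."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def normalize_backend_b(values: Column) -> Column:
    """The same mapping written with explicit loops."""
    lo = hi = values[0]
    for v in values[1:]:
        lo, hi = min(lo, v), max(hi, v)
    span = hi - lo
    out: Column = []
    for v in values:
        out.append((v - lo) / span)
    return out


# Each backend resolves the same identifier to its own implementation.
BACKEND_A: Dict[str, Callable[[Column], Column]] = {"normalize": normalize_backend_a}
BACKEND_B: Dict[str, Callable[[Column], Column]] = {"normalize": normalize_backend_b}

column = [150.0, 160.0, 170.0, 190.0]
result_a = BACKEND_A["normalize"](column)
result_b = BACKEND_B["normalize"](column)
```

Both calls produce the same normalized column, so the decision end can send the identifier without knowing which backend will resolve it.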

In S104, the model obtained by machine learning may be used for performing different processing on the same type of objects as each object according to the application scenario and the actual requirement, for example, may be a classification model for determining the sex of the person, may be a detection model for detecting the road identifier in the image, may be a recognition model for recognizing the license plate number, or may be a model for performing other processing, which is not limited in this embodiment.

As analyzed above for S103, the decision end in the machine learning method provided by the embodiment of the present application does not need to know which machine learning framework the execution end uses, so execution ends using different machine learning frameworks can all obtain target features suitable for their own framework. Therefore, the machine learning method provided by the embodiment of the invention is widely applicable, and different machine learning methods do not need to be developed for different machine learning frameworks.

Referring to fig. 2, fig. 2 is another flow chart of a machine learning method according to an embodiment of the present invention, which may include:

S201, the execution end collects the mapping relations and the mapping relation identifiers that it supports to obtain mapping relation information.

The mapping relation information is used to represent the correspondence between the mapping relations supported by the execution end and the mapping relation identifiers. Taking the case in which a mapping relation is represented in the form of a feature operator as an example, the execution end can scan the feature operators of its machine learning framework and their names, such as normalization, standardization, addition, subtraction, multiplication, division and cross entropy, and use the resulting correspondence between feature operators and operator names as the mapping relation information.
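One way such a scan could look is sketched below. The class `FeatureOperators`, the `op_` naming prefix and the function `collect_mapping_info` are assumptions made for this example; a real framework would expose its operators in its own way.

```python
from typing import Callable, Dict, List


class FeatureOperators:
    """Hypothetical operator set offered by the execution end's framework."""

    @staticmethod
    def op_add(a: float, b: float) -> float:
        return a + b

    @staticmethod
    def op_divide(a: float, b: float) -> float:
        return a / b

    @staticmethod
    def op_multiply(a: float, b: float) -> float:
        return a * b


def collect_mapping_info(namespace: type, prefix: str = "op_") -> Dict[str, Callable]:
    """Scan a namespace and return {operator name: implementation}."""
    return {
        name[len(prefix):]: getattr(namespace, name)
        for name in sorted(dir(namespace))
        if name.startswith(prefix)
    }


mapping_info = collect_mapping_info(FeatureOperators)
# Only the names need to travel to the decision end; the callables stay local.
supported_identifiers: List[str] = list(mapping_info)
```

In this sketch the operator name doubles as the mapping relation identifier, which matches the embodiment in which the identifier is the operator name.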

S202, the decision end receives the mapping relation information sent by the execution end.

S203, the decision end determines a target mapping relation between original features and target features of the data set to be processed of the execution end according to a preset feature engineering strategy.

In one possible embodiment, the preset feature engineering policy may be a single feature engineering policy, such as any one of a Meta-Learning policy, an Expand-Reduce policy, a Hierarchical Organization of Transformations policy, and a Reinforcement Learning policy.

In other possible embodiments, the preset feature engineering policy may also comprise a plurality of feature engineering policies, such as the aforementioned Meta-Learning policy, Expand-Reduce policy, Hierarchical Organization of Transformations policy, and Reinforcement Learning policy. The preset feature engineering policies may include some or all of these four feature engineering policies, and in other possible embodiments may also include feature engineering policies other than these four.

When the preset feature engineering strategies include a plurality of feature engineering strategies, a target feature engineering strategy corresponding to the data set to be processed can be determined from a plurality of preset feature engineering strategies, and a target mapping relation corresponding to the data set to be processed at the executing end is determined by adopting the target feature engineering strategy.

The mode of determining the target feature engineering strategy corresponding to the data set to be processed from a plurality of preset feature engineering strategies can be different according to different application scenes, and different preset feature engineering strategies have different advantages, so that different preset feature engineering strategies can be selected according to actual requirements to determine the target mapping relation, and the target features obtained by the execution end can be suitable for different application scenes.

In one possible embodiment, an estimated score is determined for each preset feature engineering policy, where the estimated score represents the degree of dispersion of the feature values in each dimension of the target features that would be obtained by performing feature engineering on the original features of the data set to be processed of the execution end according to that policy, and the estimated score is inversely related to the degree of dispersion. The preset feature engineering policy with the highest estimated score is then determined as the target feature engineering policy.

With this embodiment, a suitable feature engineering policy can be selected from a plurality of built-in preset feature engineering policies to determine the target mapping relation, so the machine learning method provided by the embodiment of the invention can be applied to different application scenarios. At the same time, because a plurality of preset feature engineering policies are built in, the user does not need to write code manually, which improves the efficiency of feature engineering, reduces the labor cost consumed by feature engineering, and lowers the demands placed on the user.

It will be appreciated that features are used to distinguish between different objects, and thus if the degree of dispersion between the feature values of different objects is greater in a feature dimension, different objects may be better distinguished based on the feature values in that feature dimension. If the degree of dispersion between the feature values of different objects is small in a feature dimension, it is difficult to distinguish different objects according to the feature values in the feature dimension.

For example, assume that the objects are students and that one feature dimension is whether a red scarf is worn. Since both male and female students wear the red scarf, the degree of dispersion between the feature values of the objects in this dimension is small, and it is difficult to distinguish male students from female students according to whether a red scarf is worn. Assume that another feature dimension is whether a skirt is worn. Because of the school uniform design, male students do not wear skirts and female students do, so the feature values of male and female students differ in this dimension; that is, the degree of dispersion between the feature values of the objects in this dimension is large, and it is relatively easy to distinguish male students from female students according to whether a skirt is worn.

The degree of dispersion may be expressed differently in different embodiments. For example, it may be expressed in the form of an entropy value, and in one possible embodiment it may be expressed by the feature importance calculated by a random forest method.
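A minimal sketch of the entropy-based variant is given below. The embodiment states only that the estimated score is inversely related to the degree of dispersion; the concrete formula 1 / (1 + mean entropy), the function names, and the candidate target features for the hypothetical policies `policy_a` and `policy_b` are assumptions made for this sketch.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def entropy(values: Sequence) -> float:
    """Shannon entropy (in bits) of the empirical distribution of values."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def estimated_score(target_features: List[Sequence]) -> float:
    """Score that decreases as the mean per-dimension entropy grows."""
    dims = list(zip(*target_features))  # one tuple of values per dimension
    mean_dispersion = sum(entropy(d) for d in dims) / len(dims)
    return 1.0 / (1.0 + mean_dispersion)  # assumed form of the inverse relation


# Target features each hypothetical policy would produce (one row per object).
candidates: Dict[str, List[Sequence]] = {
    "policy_a": [(0,), (0,), (1,), (1,)],
    "policy_b": [(0,), (1,), (2,), (3,)],
}
scores = {name: estimated_score(feats) for name, feats in candidates.items()}
target_policy = max(scores, key=scores.get)  # highest estimated score wins
```

Any other monotonically decreasing function of the dispersion would satisfy the stated relation equally well; the reciprocal form is used here only because it keeps the score positive and bounded.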

S204, the decision end determines, according to the correspondence represented by the mapping relation information, the mapping relation identifier corresponding to the target mapping relation, and uses it as the target mapping relation identifier.
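The lookup in S204 can be sketched as follows. This is an illustrative assumption rather than the claimed implementation: the identifiers (`M001`, ...), the mapping relation names, and the function `find_target_identifier` are hypothetical.

```python
# Hypothetical mapping relation information reported by the execution end:
# mapping relation identifier -> name of a supported mapping relation.
mapping_relation_info = {
    "M001": "log_transform",
    "M002": "min_max_scale",
    "M003": "square",
}

def find_target_identifier(info, target_mapping):
    """Return the identifier whose mapping relation equals target_mapping."""
    for identifier, mapping in info.items():
        if mapping == target_mapping:
            return identifier
    raise KeyError(f"execution end does not support {target_mapping!r}")

# The decision end has chosen "min_max_scale" as the target mapping relation
# and only needs to send the short identifier to the execution end.
target_identifier = find_target_identifier(mapping_relation_info, "min_max_scale")
```

Only the identifier crosses the link between the two ends; the execution end resolves it to a mapping relation it already supports.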

When determining the target mapping relation according to the feature engineering strategy, the decision end may make the determination based on the data set to be processed itself or based on feature information of the data set to be processed. In an exemplary embodiment, if the bandwidth between the decision end and the execution end is sufficient and the transmission rate is high, the execution end may send the data set to be processed to the decision end; the decision end constructs basic information and meta features of the data set to be processed from the data set as the feature information of the data set, and determines the target mapping relation according to this feature information and a preset feature engineering strategy.

In another possible embodiment, the executing end may construct basic information and meta features of the data set to be processed according to the data set to be processed as feature information of the data set to be processed, and send the feature information to the decision end, where the decision end determines to obtain the target mapping relationship according to the feature information of the data set to be processed and a preset feature engineering policy.
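The following sketch shows one way the execution end might summarize a data set before sending it to the decision end. The text does not specify which basic information or meta features are used, so the row and column counts and the per-column mean and variance below are assumptions chosen purely for illustration, as is the name `build_feature_info`.

```python
from statistics import mean, pvariance

def build_feature_info(dataset):
    """dataset: list of rows, each row a list of numeric original features."""
    columns = list(zip(*dataset))
    return {
        # Assumed "basic information": size of the data set.
        "basic": {"num_objects": len(dataset), "num_features": len(columns)},
        # Assumed "meta features": simple per-column statistics.
        "meta": [{"mean": mean(col), "variance": pvariance(col)} for col in columns],
    }

dataset = [[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]]
feature_info = build_feature_info(dataset)
```

Sending a summary of this kind instead of the raw data set keeps the transmitted volume small when the bandwidth between the two ends is limited.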

S205, the decision end sends the target mapping relation identification to the execution end.

This step is the same as S102 described above, and reference may be made to the description of S102 described above, which is not repeated here.

S206, the executing end maps the original characteristics of the data set to be processed according to the target mapping relation represented by the target mapping relation identification, and the target characteristics of each object in the data set to be processed are obtained.

This step is the same as S103, and reference may be made to the description of S103, which is not repeated here.

S207, the execution end performs machine learning based on the target characteristics of each object to obtain a model for processing the similar objects of each object.

This step is the same as S104 described above, and reference may be made to the description of S104 described above, which is not repeated here.

It can be appreciated that in some possible application scenarios, if feature conversion is performed only once on the original features, the obtained target features may still lack an explicit association with the results. Therefore, in one possible embodiment, after the execution end maps the original features of the data set to be processed according to the target mapping relation represented by the target mapping relation identifier to obtain the target features of the data set to be processed, the decision end may take the target features of the data set to be processed as new original features, execute again the step of determining, according to the preset feature engineering strategy, the target mapping relation identifier corresponding to the data set to be processed of the execution end, and send the newly determined target mapping relation identifier to the execution end. The execution end then maps the new original features of the data set to be processed according to the target mapping relation represented by the newly determined target mapping relation identifier to obtain the target features of each object. This is repeated until a preset loop end condition is reached, for example after the loop has been completed 3 to 5 times, whereupon the execution end performs machine learning based on the latest target features of each object to obtain the model for processing objects of the same kind as each object. With this embodiment, the target features can have a more explicit association with the results.
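The repeated decision and mapping rounds can be sketched as a simple loop. Everything named here is hypothetical: `decide_identifier` stands in for the decision end, `SUPPORTED` for the mapping relations held by the execution end, and a fixed round count is assumed as the loop end condition.

```python
def decide_identifier(features, round_index):
    """Hypothetical decision end: pick a mapping identifier for this round."""
    return "double" if round_index % 2 == 0 else "add_one"

# Hypothetical mapping relations supported by the execution end.
SUPPORTED = {
    "double": lambda x: 2 * x,
    "add_one": lambda x: x + 1,
}

def run_rounds(original_features, max_rounds=3):
    features = original_features
    for round_index in range(max_rounds):   # assumed end condition: fixed count
        identifier = decide_identifier(features, round_index)  # decision end
        mapping = SUPPORTED[identifier]                         # execution end
        # The target features of this round become the next round's originals.
        features = [[mapping(v) for v in row] for row in features]
    return features  # latest target features, to be used for machine learning

latest = run_rounds([[1, 2], [3, 4]], max_rounds=3)
```

In a real system the decision step would depend on the current features or their feature information; the alternating rule is used here only to keep the sketch deterministic.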

In order to describe the machine learning method provided by the embodiment of the present invention more clearly, the feature engineering strategies mentioned in the foregoing S203 are described below. Since the individual feature engineering strategies are not the focus of the present invention, only a brief description is given here.

Meta-learning strategy: a prediction can be made directly on the original features of the data to be processed of the execution end by the meta model in the hyperparameter library of the decision end, so as to infer the target mapping relation.

Expand-Reduce strategy: the Expand-Reduce strategy can be divided into an Expand phase and a Reduce phase, where the Expand phase can be executed by the decision end or the execution end, and the Reduce phase is executed by the decision end.

In the Expand phase, k feature conversion functions (T1, T2, T3, …, Tk) may be invoked to perform feature conversion on the original features and generate new features, where T1 is the first feature conversion function, T2 is the second feature conversion function, and so on. For convenience of description, the original features are denoted (f1, f2, …, fn), where f1 is the first of the original features, f2 is the second, and so on. The newly generated features are (T1(f1), T1(f2), …, T1(fn), T2(f1), …, Tk(fn)). This phase is called the Expand phase because the newly generated features have k×n dimensions, compared with the n dimensions of the original features. It will be appreciated that if the Expand phase is performed by the decision end, the execution end needs to send the original features to the decision end.

In the Reduce phase, N features are selected from the newly generated k×n dimensions according to a preset screening strategy; the selection may be based on evaluation indexes such as accuracy and/or recall. The decision end records, for each selected feature, the corresponding feature operator and feature column identifier, and sends this correspondence to the execution end. The feature operator represents the feature conversion function corresponding to the selected feature, and the feature column identifier represents the original feature corresponding to the selected feature. For example, assuming that the selected N features include T2(f3), a feature operator representing the feature conversion function T2 and a feature column identifier representing the original feature f3 may be recorded, and the execution end may perform feature conversion on the original feature f3 using the feature conversion function T2 according to the recorded feature operator and feature column identifier, to obtain the feature T2(f3).
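The Expand and Reduce phases can be sketched together as follows. This is a simplified illustration under stated assumptions: the three operators in `OPERATORS` are hypothetical, and population variance is swapped in as a placeholder score because evaluating accuracy or recall would require training a model, which is outside the scope of this sketch.

```python
import math
from statistics import pvariance

# Hypothetical feature conversion functions T1..Tk, keyed by feature operator.
OPERATORS = {
    "T1": lambda x: x,
    "T2": lambda x: x * x,
    "T3": lambda x: math.log1p(x),
}

def expand(columns):
    """Expand phase: apply every operator to every column (k * n features)."""
    return {
        (op, col_id): [fn(v) for v in values]
        for op, fn in OPERATORS.items()
        for col_id, values in columns.items()
    }

def reduce_features(expanded, n_keep, score=pvariance):
    """Reduce phase: keep the n_keep highest-scoring (operator, column) pairs.

    The default score is a placeholder, not the accuracy/recall of the text.
    """
    ranked = sorted(expanded, key=lambda key: score(expanded[key]), reverse=True)
    return ranked[:n_keep]

def apply_selection(columns, selection):
    """Execution end: rebuild selected features from the recorded pairs."""
    return {(op, col): [OPERATORS[op](v) for v in columns[col]]
            for op, col in selection}

original = {"f1": [1.0, 2.0, 3.0], "f2": [5.0, 5.0, 5.0]}
expanded = expand(original)                      # k = 3, n = 2 -> 6 features
selection = reduce_features(expanded, n_keep=2)  # recorded (operator, column) pairs
target = apply_selection(original, selection)
```

Only the `(operator, column)` pairs in `selection` need to be sent to the execution end, which recomputes the selected features locally from its own original features.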

Hierarchical organization of transformations strategy: this strategy also includes an Expand phase and a phase similar to the Reduce phase described above. In the Expand phase, the original features may be expanded into multiple features; for example, assuming that the original features are represented in the form of a feature table, the feature table may be expanded into multiple feature tables. Each expanded feature is then used for training to obtain an evaluation value of that feature, such as AUC (area under the ROC curve) or accuracy, where the ROC curve is the receiver operating characteristic curve.

Reinforcement learning strategy: the principle is similar to that of the Hierarchical organization of transformations strategy described previously; the only difference is that the search is not a depth-first search (DFS) or breadth-first search (BFS), but is instead based on an MDP (Markov Decision Process).

Referring to fig. 3, fig. 3 is a schematic structural diagram of a machine learning system according to an embodiment of the present invention, which may include:

decision end 301 and execution end 302. It can be understood that the machine learning system shown in fig. 3 is only one possible structural schematic diagram of the machine learning system provided by the embodiment of the present invention; in other possible embodiments, the machine learning system provided by the embodiment of the present invention may also include a plurality of decision ends 301 and may also include a plurality of execution ends 302.

The decision-making end 301 is configured to determine, according to a preset feature engineering policy, a target mapping relationship identifier corresponding to a data set to be processed of the executing end 302, where the target mapping relationship identifier is used to represent a target mapping relationship between an original feature of each object in the data set to be processed and a target feature of each object, where the target feature is a feature obtained by performing feature engineering on the original feature according to the preset feature engineering policy; sending the target mapping relation identifier to the executing end 302;

The executing end 302 is configured to map the original features of the to-be-processed dataset according to the target mapping relationship represented by the target mapping relationship identifier, so as to obtain target features of the objects; and performing machine learning based on the target characteristics of the objects to obtain a model for processing the similar objects of the objects.

In a possible embodiment, the executing end 302 is further configured to collect the mapping relationships supported by the executing end and their mapping relationship identifiers, so as to obtain mapping relationship information, where the mapping relationship information is used to represent the correspondence between the mapping relationships supported by the executing end and the mapping relationship identifiers; and to send the mapping relationship information to the decision end 301;

the decision-making end 301 is further configured to receive the mapping relationship information sent by the executing end 302;

the decision-making end 301 is specifically configured to determine, according to a preset feature engineering policy, a target mapping relationship between an original feature and a target feature of a data set to be processed of the executing end;

In a possible embodiment, the decision-making end 301 is specifically configured to determine, from a plurality of preset feature engineering strategies, a target feature engineering strategy corresponding to the data set to be processed;

In a possible embodiment, the decision-making end 301 is specifically configured to determine, for each preset feature engineering policy, a pre-estimated score of the preset feature engineering policy, where the pre-estimated score is used to represent a degree of dispersion of feature values in each dimension of a target feature obtained by performing feature engineering on an original feature of a data set to be processed at an executing end according to the preset feature engineering policy, and the pre-estimated score is inversely related to the degree of dispersion;

In a possible embodiment, the decision-making end 301 is further configured to take the target feature of the to-be-processed dataset as a new original feature of the to-be-processed dataset, and return to execute the step of determining, according to a preset feature engineering policy, the target mapping relationship identifier corresponding to the to-be-processed dataset of the executing end 302;

The executing end 302 is specifically configured to perform machine learning based on the target features of the objects until a preset cycle end condition is reached, so as to obtain a model for processing the similar objects of the objects.

The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present invention are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example by wired means (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless means (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state disk (SSD)), etc.

It is noted that relational terms such as first and second are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.

In this specification, the embodiments are described in a related manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment mainly describes its differences from the other embodiments. In particular, since the system embodiments are substantially similar to the method embodiments, their description is relatively brief; for relevant details, refer to the description of the method embodiments.

The foregoing description is only of the preferred embodiments of the present invention and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention are included in the protection scope of the present invention.

Claims

1. A machine learning method, the method comprising:

the execution end performs machine learning based on the target characteristics of the objects to obtain a model for processing the similar objects of the objects;

Before determining the target mapping relation identifier corresponding to the data set to be processed of the execution end according to the preset characteristic engineering strategy, the method further comprises the following steps:

the execution end sends the mapping relation information to a decision end;

the determining the target mapping relation identifier corresponding to the data set to be processed at the executing end according to the preset characteristic engineering strategy comprises the following steps:

determining a mapping relation identifier corresponding to the target mapping relation according to the corresponding relation represented by the mapping relation information, and taking the mapping relation identifier as a target mapping relation identifier; or,

Determining a target mapping relation identifier corresponding to a data set to be processed of an execution end by adopting the target feature engineering strategy;

the determining the target feature engineering strategy corresponding to the data set to be processed from a plurality of different preset feature engineering strategies comprises the following steps:

2. The method according to claim 1, wherein after the mapping of the original features of the data set to be processed according to the target mapping relationship represented by the target mapping relationship identifier to obtain the target features of the data set to be processed, the method further comprises:

3. A machine learning system, wherein the machine learning system comprises a decision-making end and an execution end;

the execution end is used for mapping the original characteristics of the data set to be processed according to the target mapping relation represented by the target mapping relation identifier to obtain target characteristics of each object; performing machine learning based on the target characteristics of each object to obtain a model for processing the similar objects of each object;

the execution end is further used for collecting the mapping relations supported by the execution end and their mapping relation identifiers to obtain mapping relation information, wherein the mapping relation information is used for representing the corresponding relation between the mapping relations supported by the execution end and the mapping relation identifiers; and sending the mapping relation information to the decision end;

the decision end is specifically used for determining a target feature engineering strategy corresponding to the data set to be processed from a plurality of preset feature engineering strategies;

the decision end determining a target feature engineering strategy corresponding to the data set to be processed from a plurality of preset feature engineering strategies comprises: determining, for each preset feature engineering policy, a pre-estimated score of the preset feature engineering policy, where the pre-estimated score is used to represent a degree of dispersion of feature values in each dimension in a target feature obtained by performing feature engineering on original features of a to-be-processed data set at an execution end according to the preset feature engineering policy, and the pre-estimated score is inversely related to the degree of dispersion;

4. The system of claim 3, wherein the decision-making end is further configured to take a target feature of the data set to be processed as a new original feature of the data set to be processed, and return to execute the step of determining, according to a preset feature engineering policy, a target mapping relationship identifier corresponding to the data set to be processed at the executing end;