CN108446741B - Method, system and storage medium for evaluating the importance of machine learning hyperparameters - Google Patents

Method, system and storage medium for evaluating the importance of machine learning hyperparameters

Info

Publication number
CN108446741B
Authority
CN
China
Prior art keywords
data set
meta
hyperparameter
parameter
samples
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201810270934.5A
Other languages
Chinese (zh)
Other versions
CN108446741A (en
Inventor
孙运雷
魏倩
孔言
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China University of Petroleum East China
Original Assignee
China University of Petroleum East China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China University of Petroleum East China
Priority to CN201810270934.5A
Publication of CN108446741A
Application granted
Publication of CN108446741B
Expired - Fee Related
Anticipated expiration

Abstract

Translated from Chinese

The invention discloses a method, system and storage medium for evaluating the importance of machine learning hyperparameters. Different data sets are obtained from OpenML, meta-features are extracted to represent each data set, and performance data of the classification algorithm to be evaluated are collected under different hyperparameter configurations. Meta-features are also extracted to represent the target data set, and an increasing sequence of distances between the target data set and the historical data sets is obtained by computing the distances between meta-features. The performance data of the classification algorithm under different hyperparameters are then used to evaluate hyperparameter importance: following the ordered sequence of increasing distance, the proposed Relief and clustering algorithms are executed in turn on the first m historical data sets closest to the target data set, finally yielding the hyperparameter importance ranking of the classification algorithm to be evaluated, which guides the automated tuning process. The invention provides guidance for tuning the hyperparameters of a black-box classification algorithm, thereby saving time and improving efficiency.

Figure 201810270934

Description

Translated from Chinese
Method, system and storage medium for evaluating the importance of machine learning hyperparameters

Technical field

The present invention relates to a method, system and storage medium for evaluating the importance of machine learning hyperparameters.

Background

Machine learning provides important technical support for data processing and data classification. However, model selection and parameter tuning remain two major difficulties for users, which has given rise to automated machine learning systems. Such systems use automated machine learning algorithms to automate data preprocessing, algorithm selection and parameter tuning, improving the accuracy of classification and prediction while freeing users from the burdensome tasks of choosing algorithms and repeatedly tuning parameters.

Because the core of automated machine learning is automated algorithm selection and automated hyperparameter configuration, such a system reduces the machine learning process to the Combined Algorithm Selection and Hyper-parameter optimization (CASH) problem. CASH treats the choice of algorithm as a new root-level hyperparameter, thereby mapping the problem of selecting an algorithm together with its hyperparameter values onto the problem of selecting hyperparameter values alone. By also treating data preprocessing and feature selection techniques as hyperparameters, the system can select those techniques automatically. The resulting hyperparameter optimization problem can be solved with the classical Bayesian optimization algorithm, improving the accuracy of classification and prediction.
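As an illustration of the CASH formulation described above, the following minimal sketch treats the algorithm choice as a root-level hyperparameter with conditional child hyperparameters. The algorithm names, hyperparameter names and ranges are hypothetical placeholders, and uniform random sampling stands in for the Bayesian optimizer mentioned in the text.

```python
import random

# Hypothetical joint search space: the algorithm is the root-level choice,
# and each algorithm carries its own conditional hyperparameters.
SEARCH_SPACE = {
    "svm": {"C": (0.01, 100.0), "gamma": (1e-4, 1.0)},
    "random_forest": {"n_estimators": (10, 500), "max_features": (0.1, 1.0)},
}


def sample_configuration(space, rng):
    """Draw one joint (algorithm, hyperparameters) configuration."""
    algorithm = rng.choice(sorted(space))
    params = {}
    for name, (low, high) in space[algorithm].items():
        if isinstance(low, int) and isinstance(high, int):
            params[name] = rng.randint(low, high)   # integer-valued range
        else:
            params[name] = rng.uniform(low, high)   # real-valued range
    return {"algorithm": algorithm, **params}


config = sample_configuration(SEARCH_SPACE, random.Random(0))
```

Each sampled configuration contains only the hyperparameters that belong to the chosen algorithm, which is the sense in which algorithm selection is folded into hyperparameter selection.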

However, the hyperparameter configuration module of current automated machine learning systems is configured entirely by experience, or by adjusting the hyperparameters one by one according to the final results of repeated iterations. The drawbacks are that machine learning time is wasted, repeated iteration wastes computing resources, and adjusting every hyperparameter regardless of its importance wastes the user's time and effort.

Summary of the invention

The present invention is a method, system and storage medium for evaluating the importance of machine learning hyperparameters. The technical problem to be solved is how to accurately evaluate the hyperparameter importance of a machine learning algorithm and use it to guide automated hyperparameter configuration and to improve the interpretability of that configuration.

As a first aspect of the present invention:

A method for evaluating the importance of machine learning hyperparameters, comprising:

Step (1): Obtain from the open machine learning environment OpenML several new data sets of a type similar to the target data set, and extract a meta-feature vector for each new data set, so that each new data set is represented by its meta-feature vector;

Collect from the open machine learning environment OpenML performance data of the classification algorithm to be evaluated under different hyperparameter configurations;

Store the meta-feature vector of each new data set, together with the performance data for the different hyperparameter configurations, in the corresponding historical data set;

Step (2): Extract the meta-feature vector of the target data set to represent the target data set, compute the distance between the meta-feature vector of the target data set and that of each historical data set, and obtain the sequence of historical data sets ordered from nearest to farthest from the target data set;

Step (3): Execute the Relief-Cluster algorithm in turn on the first f historical data sets closest to the target data set: from the weight of each type of hyperparameter obtained by the Relief algorithm, compute the average weight of each type of hyperparameter, and use the average weights to obtain a preliminary importance ranking; use a clustering algorithm to further verify the accuracy of the importance evaluation; finally, obtain the hyperparameter importance ranking of the classification algorithm to be evaluated.

The method for evaluating the importance of machine learning hyperparameters further includes the following step:

Step (4): According to the obtained hyperparameter importance ranking of the classification algorithm to be evaluated, set the several highest-ranked hyperparameters, and then use the classification algorithm with those settings to classify the data to be classified.
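One way to read step (4) is that only the highest-ranked hyperparameters are searched while the rest keep their defaults. The sketch below assumes this reading; the function names, the `evaluate` callback and the use of uniform random search are illustrative assumptions, not part of the patent.

```python
import random


def top_k_hyperparameters(weights, k):
    """Names of the k hyperparameters with the largest importance weights."""
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]


def tune_top_k(weights, space, defaults, evaluate, k, n_trials, rng):
    """Random search over the top-k hyperparameters only.

    `evaluate` is a hypothetical callback returning a score to maximize;
    hyperparameters outside the top k stay at their default values.
    """
    tuned = top_k_hyperparameters(weights, k)
    best_cfg, best_score = dict(defaults), evaluate(defaults)
    for _ in range(n_trials):
        cfg = dict(defaults)
        for name in tuned:
            low, high = space[name]
            cfg[name] = rng.uniform(low, high)
        score = evaluate(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score
```

Restricting the search to k dimensions is what shrinks the search space; the choice of k is left to the user.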

In step (1), each data set $D_i$ is described as a vector of F meta-features.

In step (1), the meta-features include simple meta-features, statistical meta-features of the data set, and importance meta-features;

The simple meta-features include: the number of samples, the number of features, the number of classes, or the number of missing values of the data set;

The statistical meta-features of the data set include: the mean, the variance, or the kurtosis of the distance vector;

The importance meta-features include: the performance obtained by running machine learning algorithms on the data set.
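The simple and statistical meta-features listed above can be computed along the following lines. This is a minimal sketch under stated assumptions: the data set is a list of numeric rows with `None` marking missing values, the statistics are taken over all observed values rather than over a distance vector, and the function names are hypothetical. Importance (landmarking) meta-features are omitted because they require training a model.

```python
from statistics import mean, pvariance


def simple_meta_features(rows, labels):
    """Counts of samples, features, classes and missing values."""
    n_missing = sum(1 for row in rows for value in row if value is None)
    return {
        "n_samples": len(rows),
        "n_features": len(rows[0]) if rows else 0,
        "n_classes": len(set(labels)),
        "n_missing": n_missing,
    }


def kurtosis(values):
    """Population excess kurtosis; 0.0 when the variance is zero."""
    mu = mean(values)
    var = pvariance(values, mu)
    if var == 0:
        return 0.0
    fourth_moment = mean((v - mu) ** 4 for v in values)
    return fourth_moment / var ** 2 - 3.0


def statistical_meta_features(rows):
    """Mean, variance and kurtosis over all observed (non-missing) values."""
    values = [v for row in rows for v in row if v is not None]
    return {
        "mean": mean(values),
        "variance": pvariance(values),
        "kurtosis": kurtosis(values),
    }
```

Concatenating the values of both dictionaries in a fixed key order gives one possible meta-feature vector for a data set.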

In step (1), the performance of the classification algorithm to be evaluated under different hyperparameter configurations includes the misclassification rate or RMSE;

In addition, for many common algorithms, the open machine learning environment OpenML already contains very comprehensive performance data for different hyperparameter configurations on a variety of data sets; that is, for data set $D_i$, the hyperparameter configurations $\theta_i$ and the performance $y_i$ under the classification algorithm to be evaluated are collected:

Figure GDA0002223355370000022

For the target data set $D_{N'}$, the meta-feature vector $V_{N'}$ is extracted to represent the target data set. Based on the principle that dissimilar data sets also differ in the hyperparameter configurations an algorithm uses on them, the distances between meta-feature vectors are used to obtain the sequence of distances between the target data set and the historical data sets. For the first f historical data sets closest to the target data set, the performance data of the algorithm under different hyperparameters are used to evaluate hyperparameter importance;

The distance $d_{pn}(D_{N'}, D_i)$ between the target data set $D_{N'}$ and a historical data set $D_i$ is measured by the distance between their meta-feature vectors:

$$d_{pn}(D_{N'}, D_i) = \lVert V_{N'} - V_i \rVert_{pn}$$

where $V_{N'}$ is the meta-feature vector of data set $D_{N'}$, $V_i$ is the meta-feature vector of historical data set $D_i$, and $pn$ denotes the p-norm.

By comparing the distances between the meta-feature vector of the target data set and those of the historical data sets, the ordering π(1), ..., π(N) of the historical data sets from nearest to farthest from the target data set is obtained, where

Figure GDA0002223355370000031
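The distance and ordering just described can be sketched as follows. The function names and the dictionary layout of the historical meta-feature vectors are assumptions made for illustration; the default p = 2 is also an assumption, since the text does not fix p.

```python
def p_norm_distance(v_target, v_hist, p=2):
    """p-norm of the difference between two meta-feature vectors."""
    return sum(abs(a - b) ** p for a, b in zip(v_target, v_hist)) ** (1.0 / p)


def rank_historical(v_target, history, p=2):
    """Names of historical data sets, nearest to the target first.

    `history` maps a data set name to its meta-feature vector.
    """
    return sorted(history, key=lambda name: p_norm_distance(v_target, history[name], p))


def nearest_f(v_target, history, f, p=2):
    """The first f historical data sets in the nearest-first ordering."""
    return rank_historical(v_target, history, p)[:f]
```

In practice the meta-features would usually be rescaled to comparable ranges first, because raw counts such as the number of samples would otherwise dominate the distance; the patent text does not say whether such scaling is applied.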

Following the ordering π(1), ..., π(N) of the historical data sets from nearest to farthest from the target data set, the Relief-Cluster algorithm is executed in turn on the first f historical data sets closest to the target data set. First, hyperparameter importance is preliminarily evaluated from the average weight of each type of hyperparameter obtained by the Relief algorithm; then the r(C) index of the clustering algorithm is used to further verify the accuracy of the importance evaluation. These two steps are repeated m times, and the importance evaluation result with the largest r(C) is selected. The resulting hyperparameter importance ranking of the classification algorithm to be evaluated is then used to guide the automated tuning of that algorithm on the target data set.

Obtaining the weight of each type of hyperparameter with the Relief algorithm comprises:

A threshold is set according to the magnitude of the performance data under different hyperparameter configurations, and the performance data corresponding to the different hyperparameter configurations in the historical data set are divided into high-performance samples and low-performance samples. The Relief algorithm first randomly selects a sample $s_i$ from the performance data, and then selects, from the high-performance samples and from the low-performance samples, the sample nearest to $s_i$ in each;

The selected sample $s_j$ of the same class as $s_i$ is denoted M, and the selected sample $s_j$ of a different class from $s_i$ is denoted Q. The weight $w_h$ of each type of hyperparameter h is updated according to formula (1):

$$w_h = w_h - \mathrm{diff}(h, s_i, M)/rt + \mathrm{diff}(h, s_i, Q)/rt \tag{1}$$

$\mathrm{diff}(h, s_i, M)$ is the difference between the two samples $s_i$ and M on hyperparameter h;

$\mathrm{diff}(h, s_i, Q)$ is the difference between the two samples $s_i$ and Q on hyperparameter h;

The difference $\mathrm{diff}(h, s_i, s_j)$ between two samples $s_i$ and $s_j$ on hyperparameter h is defined as follows:

If hyperparameter h is a scalar-type hyperparameter,

Figure GDA0002223355370000032

If hyperparameter h is a numerical hyperparameter,

Figure GDA0002223355370000033

where $1 \le i \ne j \le m$ and $1 \le h \le ph$; $\max_h$ and $\min_h$ are the maximum and minimum values of hyperparameter h in the sample set; m is the number of samples; each sample contains ph hyperparameters; rt is the number of iterations, with rt > 1 to avoid the randomness of a single draw; $s_{ih}$ is the value of hyperparameter h in sample $s_i$, and $s_{jh}$ is the value of hyperparameter h in sample $s_j$.

Formula (1) shows that a hyperparameter that contributes strongly to high performance differs greatly between samples of different classes and little between samples of the same class, so hyperparameters with discriminating power have positive weights.

To avoid the randomness of a single draw, the process is iterated rt > 1 times, yielding the importance weight ranking of each type of hyperparameter.
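The Relief procedure above can be sketched as follows. Because the two diff definitions survive here only as formula images, this sketch assumes the standard Relief definitions: 0 or 1 for non-numeric hyperparameters depending on equality, and the absolute difference divided by $\max_h - \min_h$ for numeric ones. All function names, and the `higher_is_better` switch for error-rate metrics, are hypothetical.

```python
import random


def label_by_threshold(perf, threshold, higher_is_better=True):
    """True marks a high-performance sample, False a low-performance one."""
    return [(y >= threshold) == higher_is_better for y in perf]


def diff(h, a, b, numeric, ranges):
    """Assumed standard Relief difference of samples a and b on hyperparameter h."""
    if numeric[h]:
        low, high = ranges[h]
        return abs(a[h] - b[h]) / (high - low) if high > low else 0.0
    return 0.0 if a[h] == b[h] else 1.0


def distance(a, b, numeric, ranges):
    """Sample distance as the sum of per-hyperparameter differences."""
    return sum(diff(h, a, b, numeric, ranges) for h in range(len(a)))


def relief_weights(configs, labels, numeric, rt, rng):
    """Apply the update of formula (1) rt times and return the weights."""
    n_h = len(configs[0])
    ranges = [
        (min(c[h] for c in configs), max(c[h] for c in configs)) if numeric[h] else None
        for h in range(n_h)
    ]
    weights = [0.0] * n_h
    for _ in range(rt):
        i = rng.randrange(len(configs))
        s_i = configs[i]
        same = [c for j, c in enumerate(configs) if j != i and labels[j] == labels[i]]
        other = [c for j, c in enumerate(configs) if labels[j] != labels[i]]
        if not same or not other:
            continue  # no same-class or different-class neighbor for this draw
        m_hit = min(same, key=lambda c: distance(s_i, c, numeric, ranges))    # M
        q_miss = min(other, key=lambda c: distance(s_i, c, numeric, ranges))  # Q
        for h in range(n_h):
            weights[h] += (diff(h, s_i, q_miss, numeric, ranges)
                           - diff(h, s_i, m_hit, numeric, ranges)) / rt
    return weights
```

A hyperparameter whose value separates high-performance from low-performance configurations ends up with a positive weight, while one that varies within a class but not across classes is pushed negative.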

Using the clustering algorithm to further verify the accuracy of the hyperparameter importance evaluation comprises:

According to the importance weight ranking obtained for each type of hyperparameter, the hyperparameters in the top k types are clustered and the hyperparameter importance is computed. Let S be the hyperparameter sample set, T the size of the hyperparameter sample set, K the number of classes to which the hyperparameter samples belong, $p_{ik}$ the probability that a sample belongs to class k, $C_k$ the actual class label of a hyperparameter sample, and C the hyperparameter set. The importance measure r(C) on C is then expressed as:

Figure GDA0002223355370000041

Figure GDA0002223355370000042

Figure GDA0002223355370000043

where F(C) is the difference between the clustering result on hyperparameter set C and the class labels over the whole hyperparameter sample set, C is the hyperparameter set, $F_i(C)$ is the difference between the clustering result on hyperparameter set C and the class labels within each class, and $X_i$ is the set of hyperparameter samples of the i-th class.

The higher the value of r(C), the stronger the correlation between the clustering result and the actual class labels, and the greater the influence of hyperparameter set C on the classification. The importance evaluation result with the largest r(C) is selected.

The class labels are the high-performance and low-performance labels.
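The exact definition of r(C) is given only in the formula images, so it is not reproduced here. The sketch below substitutes a different, simpler agreement score, cluster purity after k-means, purely to illustrate the idea of checking whether clustering on a hyperparameter subset recovers the high/low performance labels. It assumes numeric hyperparameters, and every name in it is hypothetical.

```python
def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means, seeded with the first k distinct points."""
    centers = []
    for p in points:
        if p not in centers:
            centers.append(p)
        if len(centers) == k:
            break
    assign = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest center (squared Euclidean distance).
        assign = [
            min(range(len(centers)),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centers[c])))
            for p in points
        ]
        # Move each center to the mean of its members.
        for c in range(len(centers)):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assign


def purity(assign, labels):
    """Fraction of samples that carry the majority label of their cluster."""
    total = 0
    for c in set(assign):
        in_cluster = [l for a, l in zip(assign, labels) if a == c]
        total += max(in_cluster.count(l) for l in set(in_cluster))
    return total / len(labels)


def agreement_on_subset(configs, labels, subset, k=2):
    """Stand-in for r(C): cluster on the chosen hyperparameters, score by purity."""
    points = [tuple(float(cfg[h]) for h in subset) for cfg in configs]
    return purity(kmeans(points, k), labels)
```

A subset that truly drives performance should give a score near 1.0, while an irrelevant subset should score near the majority-class rate.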

As a second aspect of the present invention,

A system for evaluating the importance of machine learning hyperparameters, comprising: a memory, a processor, and computer instructions stored in the memory and run on the processor, wherein the computer instructions, when run by the processor, perform the steps of any of the above methods.

As a third aspect of the present invention,

A computer-readable storage medium carrying computer instructions which, when run by a processor, perform the steps of any of the above methods.

Beneficial effects of the present invention:

The invention can accurately evaluate the hyperparameter importance of a machine learning algorithm, which is used to guide automated hyperparameter configuration and to improve the interpretability of that configuration. It describes the hyperparameter importance of the machine learning algorithm itself, providing a useful reference and good interpretability for the hyperparameter configuration process. The technical problem this module focuses on is how to accurately evaluate the hyperparameter importance of a machine learning algorithm and use it to guide automated hyperparameter configuration and to improve the interpretability of that configuration.

(1) It saves resources and time: by supplying suitable prior knowledge and narrowing the search space, it gives the hyperparameter configuration process a degree of guidance and moves it away from the previous fully black-box state.

(2) It also lets users see intuitively which types of hyperparameters have the greater impact on algorithm performance.

Description of drawings

The accompanying drawings, which form part of this application, provide further understanding of the application; the illustrative embodiments of the application and their descriptions explain the application and do not unduly limit it.

Fig. 1 is the flow chart provided by the present invention;

Detailed description

It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the application. Unless otherwise specified, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.

The invention makes full use of the many data sets in the open machine learning environment OpenML and of the performance data of each data set under a variety of algorithms. It uses a meta-learning approach to compute the distance between the target data set and the historical data sets, and uses the Relief algorithm and a clustering algorithm to obtain the importance ranking of each type of hyperparameter of the classification algorithm to be evaluated. The ranking is then used to guide the automated tuning of that algorithm on the target data set. The invention supplies suitable prior knowledge and narrows the search space, giving the hyperparameter configuration process a degree of guidance and moving it away from the previous fully black-box state; it also lets users see intuitively which types of hyperparameters have the greater impact on algorithm performance.

As shown in Fig. 1, the present invention comprises the following steps:

Step A. Obtain different data sets from OpenML and extract meta-features for each data set, so that each data set can be represented by its meta-features; at the same time, collect the performance $y_i$ (for example, misclassification rate or RMSE) of the classification algorithm to be evaluated under different hyperparameter configurations $\theta_i$:

Figure GDA0002223355370000051

and store the meta-feature vector of each data set, together with the performance data for the different hyperparameter configurations, in the historical data set sample library;
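One possible in-memory layout for an entry of the historical data set sample library in step A is sketched below. The class name, field names and example values are hypothetical; the patent does not prescribe a storage format.

```python
from dataclasses import dataclass, field


@dataclass
class HistoricalDataset:
    """One library entry: a meta-feature vector plus its (config, performance) runs."""
    name: str
    meta_features: list
    runs: list = field(default_factory=list)

    def add_run(self, config, performance):
        """Record the performance y observed under hyperparameter configuration theta."""
        self.runs.append((dict(config), float(performance)))

    def as_samples(self, hyperparameter_names):
        """Return configurations as tuples in a fixed order, with their performances."""
        configs = [tuple(cfg[n] for n in hyperparameter_names) for cfg, _ in self.runs]
        perf = [y for _, y in self.runs]
        return configs, perf


entry = HistoricalDataset("example-dataset", [150.0, 4.0, 3.0, 0.0])
entry.add_run({"C": 1.0, "kernel": "rbf"}, 0.12)
entry.add_run({"C": 10.0, "kernel": "linear"}, 0.20)
configs, perf = entry.as_samples(["C", "kernel"])
```

Keeping configurations as fixed-order tuples makes them directly usable as the samples that the later Relief and clustering steps operate on.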

The meta-features extracted in step A fall into three main groups: simple meta-features (for example, the number of samples, features, classes and missing values of the data set), statistical meta-features of the data set (for example, the mean, the variance, and the kurtosis of the distance vector), and importance meta-features (for example, the performance obtained by running machine learning algorithms on the data set).

Step B. For the target data set in use, meta-features are likewise extracted to represent it. Based on the principle that dissimilar data sets also differ in the hyperparameter configurations an algorithm uses on them, the distances between meta-feature vectors are used to obtain the sequence of distances between the target data set and the historical data sets. For the first f historical data sets closest to the target data set, the performance data of the classification algorithm to be evaluated under different hyperparameters can be used to evaluate hyperparameter importance;

In step B, the distance between the target data set $D_{N'}$ and each historical data set $D_i$ (i = 1, 2, …, N) is measured by the distance between their meta-feature vectors, using the p-norm commonly used to measure the difference between data set meta-feature vectors: $d_{pn}(D_{N'}, D_i) = \lVert V_{N'} - V_i \rVert_{pn}$. By comparing the distances between the meta-feature vector of the target data set and those of the historical data sets, the ordering π(1), ..., π(N) of the historical data sets from nearest to farthest from the target data set is obtained, where

Figure GDA0002223355370000061

Step C. Following the ordered sequence of the historical data sets from nearest to farthest from the target data set, the proposed Relief-Cluster algorithm is executed in turn on the first f historical data sets closest to the target data set. First, hyperparameter importance is preliminarily evaluated from the average weight of each type of hyperparameter obtained by the Relief algorithm; then the r(C) index of the clustering algorithm is used to further verify the accuracy of the importance evaluation. These two steps are repeated m times, and the importance evaluation result with the largest r(C) is selected. The resulting hyperparameter importance ranking of the classification algorithm to be evaluated is then used to guide the automated tuning of that algorithm on the target data set.

In the present invention, step C specifically comprises the following steps:

Step C1. A threshold is set according to the magnitude of the performance data under different hyperparameter configurations, dividing the data into a high-performance class and a low-performance class. The Relief algorithm first randomly selects a sample $s_i$ from the hyperparameter sample set, and then selects from each of the two classes the sample nearest to $s_i$. The selected sample of the same class as $s_i$ is denoted M, and the selected sample of a different class from $s_i$ is denoted Q. The weight $w_h$ of each type of hyperparameter h is updated according to formula (1):

$$w_h = w_h - \mathrm{diff}(h, s_i, M)/rt + \mathrm{diff}(h, s_i, Q)/rt \tag{1}$$

In the above formula, the difference between two samples $s_i$ and $s_j$ ($1 \le i \ne j \le m$) on hyperparameter h ($1 \le h \le ph$) is defined as follows:

If hyperparameter h is a scalar-type hyperparameter,

Figure GDA0002223355370000062

If hyperparameter h is a numerical hyperparameter,

Figure GDA0002223355370000063

where $\max_h$ and $\min_h$ are the maximum and minimum values of hyperparameter h in the sample set.

Formula (1) shows that a hyperparameter that contributes strongly to high performance should differ greatly between samples of different classes and little between samples of the same class, so hyperparameters with discriminating power should have positive weights. To avoid the randomness of a single draw, the above process is iterated rt > 1 times.

Step C2. According to the importance weight ranking of each type of hyperparameter obtained in the previous step, the hyperparameters in the top k types are clustered and the importance is computed. Let S be the hyperparameter sample set, T the size of the hyperparameter sample set, K the number of classes to which the hyperparameter samples belong, $p_{ik}$ the probability that a sample belongs to class k, $C_k$ the actual class label of a hyperparameter sample, and C the hyperparameter subset. The importance measure r(C) on C can then be expressed as:

Figure GDA0002223355370000072

where F(C) is the difference between the clustering result on hyperparameter set C and the class labels over the whole hyperparameter sample set, C is the hyperparameter subset, $F_i(C)$ is the difference within each class, and $X_i$ is the set of hyperparameter samples of the i-th class. The higher the value of r(C), the stronger the correlation between the clustering result and the actual class labels, and the greater the influence of hyperparameter set C on the classification.

The above two steps are iterated m times and the hyperparameter importance ranking with the largest r(C) is selected. The resulting ranking is then used to guide the automated tuning of the classification algorithm to be evaluated on the target data set.

本发明中Relief-Cluster算法的流程图:The flow chart of the Relief-Cluster algorithm in the present invention:

输入:超参数样本集S,超参数类别数hc,取样/迭代次数rtInput: hyperparameter sample set S, number of hyperparameter categories hc, number of samples/iterations rt

输出:聚类评价指标r(C),超参数重要性权重矩阵WOutput: clustering evaluation index r(C), hyperparameter importance weight matrix W

Figure GDA0002223355370000073
Figure GDA0002223355370000073

Randomly select a sample s_i from S;

From the samples of the same class as s_i, select the nearest neighbor of s_i, denoted M;

From the samples of a different class from s_i, select the nearest neighbor of s_i, denoted N;

Update the hyperparameter importance weight vector W using formula (1);

Select a hyperparameter subset of size X;

Cluster the samples on the hyperparameter subset;

Calculate the correlation r(C) between the clustering result and the actual result;

From the m values of r(C), select the hyperparameter importance ranking corresponding to the largest value;

End
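
The Relief part of the steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: all hyperparameters are taken to be numeric, neighbors are found with a Manhattan distance on range-scaled values, every class contains at least two samples, and the function name `relief_weights` is hypothetical. The clustering step and r(C) are not covered here.

```python
import numpy as np

def relief_weights(X, y, rt, seed=0):
    """Sketch of the Relief loop: X is an (m, ph) array of numeric
    hyperparameter samples, y holds the class label of each sample."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0          # constant columns: avoid division by zero
    Xn = X / span                  # per-hyperparameter differences now lie in [0, 1]
    w = np.zeros(X.shape[1])
    for _ in range(rt):
        i = rng.integers(len(X))   # randomly select a sample s_i from S
        dist = np.abs(Xn - Xn[i]).sum(axis=1)
        dist[i] = np.inf           # never pick s_i as its own neighbor
        M = Xn[np.argmin(np.where(y == y[i], dist, np.inf))]  # nearest same-class sample
        N = Xn[np.argmin(np.where(y != y[i], dist, np.inf))]  # nearest other-class sample
        # weight update: penalize differences to M, reward differences to N
        w += (-np.abs(Xn[i] - M) + np.abs(Xn[i] - N)) / rt
    return w
```

A hyperparameter whose value separates the two classes ends up with a larger weight than one that varies mostly within a class.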

The above are merely preferred embodiments of the present application and are not intended to limit it. Those skilled in the art may make various modifications and changes to the present application. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present application shall fall within its scope of protection.

Claims (18)

1. A system for classifying data to be classified based on machine learning hyperparameter importance evaluation, characterized by comprising:
a historical data set acquisition module configured to: acquiring a plurality of new data sets similar to the target data set type from an open machine learning environment OpenML, and extracting meta-features from each new data set to enable each new data set to be represented by a meta-feature vector;
collecting data of the performance of a classification algorithm to be evaluated under different hyper-parameter configurations from an open machine learning environment OpenML;
storing the meta-feature vector of each new data set and the performance data corresponding to different hyper-parameter configurations in corresponding historical data sets;
a distance sequence acquisition module configured to: extracting meta-feature vectors of the target data set to represent the target data set, calculating the distance between the meta-feature vectors of the target data set and the meta-feature vectors of the historical data sets, and obtaining a distance sequence from near to far between the target data set and each historical data set;
an output module configured to: sequentially execute a Relief-Cluster algorithm on the first f historical data sets closest to the target data set: calculate the average weight of each type of hyperparameter from the weights of each type of hyperparameter obtained by a Relief algorithm, and preliminarily obtain the importance weight ranking of each type of hyperparameter from these average weights; further verify the accuracy of the hyperparameter importance evaluation using a clustering algorithm; and finally obtain the hyperparameter importance ranking of the classification algorithm to be evaluated;
a classification module configured to: set the several top-ranked parameters according to the obtained hyperparameter importance ranking of the classification algorithm to be evaluated, and then classify the data to be classified using the classification algorithm with the set parameters.
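
The averaging and ranking performed by the output module can be illustrated with a short sketch. It assumes that the average is taken across the f historical data sets, one Relief weight vector per data set; the function name `rank_hyperparameters` and the hyperparameter names in the usage note are hypothetical, and the clustering verification step is omitted.

```python
import numpy as np

def rank_hyperparameters(weights_per_dataset, names, top_k):
    """Average per-data-set Relief weights and rank hyperparameters.

    weights_per_dataset: (f, ph) array, one weight vector per historical data set.
    names: list of ph hyperparameter names.
    Returns the full ranking (most important first) and its first top_k entries.
    """
    W = np.asarray(weights_per_dataset, dtype=float)
    mean_w = W.mean(axis=0)                      # average weight per hyperparameter
    order = np.argsort(-mean_w, kind="stable")   # descending by average weight
    ranking = [names[i] for i in order]
    return ranking, ranking[:top_k]
```

For example, `rank_hyperparameters([[0.2, 0.8, -0.1], [0.4, 0.6, 0.1]], ["gamma", "C", "tol"], 2)` yields a ranking that starts with `"C"`, so a tuner would spend its budget on `C` and `gamma` first.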
2. The system of claim 1, wherein, in the historical data set acquisition module, each data set D_i is described as a vector of F meta-features:
Figure FDA0002276900670000011
3. The system of claim 1, wherein the meta-features in the historical data set acquisition module include: simple meta-features, statistical meta-features and importance meta-features of the data set;
the simple meta-features include: the number of data set samples, the number of features, the number of categories, or the number of missing values;
statistical meta-features of the data set, including: the kurtosis of the mean, variance, or distance vector;
the importance meta-feature comprises: performance obtained by running a machine learning algorithm on the data set.
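
As an illustration of the simple and statistical meta-features listed in this claim, the sketch below builds a small meta-feature vector for one data set. The choice of six entries, the pooling of mean and variance over all observed values, and the function name `simple_meta_features` are assumptions made for the example; the importance meta-features (performance of a learner run on the data set) are not computed here.

```python
import numpy as np

def simple_meta_features(X, y):
    """Return a meta-feature vector for a data set with feature matrix X
    (missing values encoded as NaN) and class labels y."""
    X = np.asarray(X, dtype=float)
    return np.array([
        X.shape[0],               # number of samples
        X.shape[1],               # number of features
        len(np.unique(y)),        # number of classes
        int(np.isnan(X).sum()),   # number of missing values
        np.nanmean(X),            # mean over all observed values
        np.nanvar(X),             # variance over all observed values
    ])
```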
4. The system of claim 1, wherein the performance of the classification algorithm to be evaluated in the historical data set acquisition module under different hyper-parameter configurations comprises: misclassification rate or RMSE.
5. The system of claim 1, wherein the distance between meta-feature vectors is used to measure the distance d_pn(D_N′, D_i) between the target data set D_{N+1} and a historical data set D_i:
d_pn(D_N′, D_i) = ||V_N′ - V_i||_pn
wherein V_N′ denotes the meta-feature vector of the target data set D_N′, V_i denotes the meta-feature vector of the historical data set D_i, and p denotes the p-norm;
and the distances between the meta-feature vector of the target data set and those of the historical data sets are compared to obtain an ordering sequence π(1) of the historical data sets by distance to the target data set, from near to far.
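
The distance computation and near-to-far ordering can be sketched as follows, assuming the meta-feature vectors are already available as numeric arrays of equal length. The function name `rank_historical_datasets` and the default p = 2 are assumptions for illustration; in practice the meta-features would usually be normalized first so that no single one dominates the norm.

```python
import numpy as np

def rank_historical_datasets(v_target, V_hist, p=2):
    """Order historical data sets by p-norm distance to the target.

    v_target: (F,) meta-feature vector of the target data set.
    V_hist:   (N, F) meta-feature vectors of the historical data sets.
    Returns the indices from nearest to farthest and the sorted distances.
    """
    v_target = np.asarray(v_target, dtype=float)
    V_hist = np.asarray(V_hist, dtype=float)
    d = np.linalg.norm(V_hist - v_target, ord=p, axis=1)  # one distance per data set
    order = np.argsort(d, kind="stable")
    return order, d[order]
```

The first f entries of the returned index array identify the historical data sets on which the Relief-Cluster algorithm would then be run.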
6. The system of claim 1, wherein,
the weight of each type of hyperparameter obtained by the Relief algorithm is determined by the following steps:
setting a threshold according to the magnitude of the performance data under different hyperparameter configurations, and dividing the performance data corresponding to different hyperparameter configurations in a historical data set into high-performance samples and low-performance samples; the Relief algorithm randomly selects a sample s_i from the performance data and then selects, from the high-performance samples and from the low-performance samples respectively, the sample nearest to s_i;
the sample s_j of the same class as s_i is denoted by M, and the sample s_j of a different class from s_i is denoted by Q; the weight w_h of each type of hyperparameter h is updated according to equation (1):
w_h = w_h - diff(h, s_i, M)/rt + diff(h, s_i, Q)/rt (1)
diff(h, s_i, M) denotes the difference between the two samples s_i and M on the hyperparameter h;
diff(h, s_i, Q) denotes the difference between the two samples s_i and Q on the hyperparameter h;
the difference diff(h, s_i, s_j) between two samples s_i and s_j on the hyperparameter h is defined as:
if the hyperparameter h is a scalar-type hyperparameter,
Figure FDA0002276900670000021
if the hyperparameter h is a numerical hyperparameter,
Figure FDA0002276900670000022
wherein 1 ≤ i ≤ j ≤ m and 1 ≤ h ≤ ph; max_h is the maximum value of the hyperparameter h in the sample set, min_h is the minimum value of the hyperparameter h in the sample set, m denotes the number of samples, each sample contains ph hyperparameters, rt denotes the number of iterations, rt > 1, s_ih denotes the value of the hyperparameter h in sample s_i, and s_jh denotes the value of the hyperparameter h in sample s_j.
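
The two diff formulas of this claim appear only as figures, so the sketch below substitutes the conventional Relief definitions as an assumption: 0 or 1 for a non-numerical hyperparameter depending on whether the values match, and |s_ih - s_jh| / (max_h - min_h) for a numerical one. Samples are represented as dictionaries, and the names `diff`, `update_weight` and `ranges` are hypothetical.

```python
def diff(h, s_i, s_j, ranges):
    """Difference between samples s_i and s_j on hyperparameter h.

    ranges maps each numerical hyperparameter to its (min_h, max_h) in the
    sample set; hyperparameters absent from ranges are treated as non-numerical.
    """
    a, b = s_i[h], s_j[h]
    if h in ranges:                       # numerical: range-normalized distance
        lo, hi = ranges[h]
        return abs(a - b) / (hi - lo) if hi > lo else 0.0
    return 0.0 if a == b else 1.0         # non-numerical: match / mismatch

def update_weight(w_h, h, s_i, M, Q, rt, ranges):
    """One application of equation (1) for hyperparameter h."""
    return w_h - diff(h, s_i, M, ranges) / rt + diff(h, s_i, Q, ranges) / rt
```

With `ranges = {"C": (0.0, 10.0)}`, a sample whose `C` is close to its same-class neighbor M and far from its other-class neighbor Q receives a positive update for `C`.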
7. A system for classifying data to be classified based on machine learning hyperparameter importance evaluation, characterized by comprising: a memory, a processor, and computer instructions stored on the memory and executed on the processor, the computer instructions, when executed by the processor, performing the following steps:
step (1): acquiring a plurality of new data sets similar to the target data set type from an open machine learning environment OpenML, and extracting meta-features from each new data set to enable each new data set to be represented by a meta-feature vector;
collecting data of the performance of a classification algorithm to be evaluated under different hyper-parameter configurations from an open machine learning environment OpenML;
storing the meta-feature vector of each new data set and the performance data corresponding to different hyper-parameter configurations in corresponding historical data sets;
step (2): extracting meta-feature vectors of the target data set to represent the target data set, calculating the distance between the meta-feature vectors of the target data set and the meta-feature vectors of the historical data sets, and obtaining a distance sequence from near to far between the target data set and each historical data set;
step (3): sequentially executing a Relief-Cluster algorithm on the first f historical data sets closest to the target data set: calculating the average weight of each type of hyperparameter from the weights of each type of hyperparameter obtained by a Relief algorithm, and preliminarily obtaining the importance weight ranking of each type of hyperparameter from these average weights; further verifying the accuracy of the hyperparameter importance evaluation using a clustering algorithm; and finally obtaining the hyperparameter importance ranking of the classification algorithm to be evaluated;
step (4): setting the several top-ranked parameters according to the obtained hyperparameter importance ranking of the classification algorithm to be evaluated, and then classifying the data to be classified using the classification algorithm with the set parameters.
8. The system of claim 7, wherein in step (1), each data set D_i is described as a vector of F meta-features:
Figure FDA0002276900670000031
9. The system of claim 7, wherein in step (1), the meta-features comprise: simple meta-features, statistical meta-features and importance meta-features of the data set;
the simple meta-features include: the number of data set samples, the number of features, the number of categories, or the number of missing values;
statistical meta-features of the data set, including: the kurtosis of the mean, variance, or distance vector;
the importance meta-feature comprises: performance obtained by running a machine learning algorithm on the data set.
10. The system of claim 7, wherein the performance of the classification algorithm to be evaluated in step (1) under different hyper-parameter configurations comprises: misclassification rate or RMSE.
11. The system of claim 7, wherein the distance between meta-feature vectors is used to measure the distance d_pn(D_N′, D_i) between the target data set D_{N+1} and a historical data set D_i:
d_pn(D_N′, D_i) = ||V_N′ - V_i||_pn
wherein V_N′ denotes the meta-feature vector of the target data set D_N′, V_i denotes the meta-feature vector of the historical data set D_i, and p denotes the p-norm;
and the distances between the meta-features of the target data set and those of the historical data sets are compared to obtain an ordering sequence π(1) of the historical data sets by distance to the target data set, from near to far.
12. The system of claim 7, wherein,
the weight of each type of hyperparameter obtained by the Relief algorithm is determined by the following steps:
setting a threshold according to the magnitude of the performance data under different hyperparameter configurations, and dividing the performance data corresponding to different hyperparameter configurations in a historical data set into high-performance samples and low-performance samples; the Relief algorithm randomly selects a sample s_i from the performance data and then selects, from the high-performance samples and from the low-performance samples respectively, the sample nearest to s_i;
the sample s_j of the same class as s_i is denoted by M, and the sample s_j of a different class from s_i is denoted by Q; the weight w_h of each type of hyperparameter h is updated according to equation (1):
w_h = w_h - diff(h, s_i, M)/rt + diff(h, s_i, Q)/rt (1)
diff(h, s_i, M) denotes the difference between the two samples s_i and M on the hyperparameter h;
diff(h, s_i, Q) denotes the difference between the two samples s_i and Q on the hyperparameter h;
the difference diff(h, s_i, s_j) between two samples s_i and s_j on the hyperparameter h is defined as:
if the hyperparameter h is a scalar-type hyperparameter,
Figure FDA0002276900670000041
if the hyperparameter h is a numerical hyperparameter,
Figure FDA0002276900670000042
wherein 1 ≤ i ≤ j ≤ m and 1 ≤ h ≤ ph; max_h is the maximum value of the hyperparameter h in the sample set, min_h is the minimum value of the hyperparameter h in the sample set, m denotes the number of samples, each sample contains ph hyperparameters, rt denotes the number of iterations, rt > 1, s_ih denotes the value of the hyperparameter h in sample s_i, and s_jh denotes the value of the hyperparameter h in sample s_j.
13. A computer readable storage medium having computer instructions embodied thereon, said computer instructions when executed by a processor performing the steps of:
step (1): acquiring a plurality of new data sets similar to the target data set type from an open machine learning environment OpenML, and extracting meta-features from each new data set to enable each new data set to be represented by a meta-feature vector;
collecting data of the performance of a classification algorithm to be evaluated under different hyper-parameter configurations from an open machine learning environment OpenML;
storing the meta-feature vector of each new data set and the performance data corresponding to different hyper-parameter configurations in corresponding historical data sets;
step (2): extracting meta-feature vectors of the target data set to represent the target data set, calculating the distance between the meta-feature vectors of the target data set and the meta-feature vectors of the historical data sets, and obtaining a distance sequence from near to far between the target data set and each historical data set;
step (3): sequentially executing a Relief-Cluster algorithm on the first f historical data sets closest to the target data set: calculating the average weight of each type of hyperparameter from the weights of each type of hyperparameter obtained by a Relief algorithm, and preliminarily obtaining the importance weight ranking of each type of hyperparameter from these average weights; further verifying the accuracy of the hyperparameter importance evaluation using a clustering algorithm; and finally obtaining the hyperparameter importance ranking of the classification algorithm to be evaluated;
step (4): setting the several top-ranked parameters according to the obtained hyperparameter importance ranking of the classification algorithm to be evaluated, and then classifying the data to be classified using the classification algorithm with the set parameters.
14. The medium of claim 13, wherein in step (1), each data set D_i is described as a vector of F meta-features:
Figure FDA0002276900670000051
15. The medium of claim 13, wherein in step (1), the meta-features comprise: simple meta-features, statistical meta-features and importance meta-features of the data set;
the simple meta-features include: the number of data set samples, the number of features, the number of categories, or the number of missing values;
statistical meta-features of the data set, including: the kurtosis of the mean, variance, or distance vector;
the importance meta-feature comprises: performance obtained by running a machine learning algorithm on the data set.
16. The medium of claim 13, wherein the performance of the classification algorithm under evaluation in step (1) under different hyper-parameter configurations comprises: misclassification rate or RMSE.
17. The medium of claim 13, wherein the distance between meta-feature vectors is used to measure the distance d_pn(D_N′, D_i) between the target data set D_{N+1} and a historical data set D_i:
d_pn(D_N′, D_i) = ||V_N′ - V_i||_pn
wherein V_N′ denotes the meta-feature vector of the target data set D_N′, V_i denotes the meta-feature vector of the historical data set D_i, and p denotes the p-norm;
and the distances between the meta-feature vector of the target data set and those of the historical data sets are compared to obtain an ordering sequence π(1) of the historical data sets by distance to the target data set, from near to far.
18. The medium of claim 13, wherein the weight of each type of hyperparameter obtained by the Relief algorithm is determined by the following steps:
setting a threshold according to the magnitude of the performance data under different hyperparameter configurations, and dividing the performance data corresponding to different hyperparameter configurations in a historical data set into high-performance samples and low-performance samples; the Relief algorithm randomly selects a sample s_i from the performance data and then selects, from the high-performance samples and from the low-performance samples respectively, the sample nearest to s_i;
the sample s_j of the same class as s_i is denoted by M, and the sample s_j of a different class from s_i is denoted by Q; the weight w_h of each type of hyperparameter h is updated according to equation (1):
w_h = w_h - diff(h, s_i, M)/rt + diff(h, s_i, Q)/rt (1)
diff(h, s_i, M) denotes the difference between the two samples s_i and M on the hyperparameter h;
diff(h, s_i, Q) denotes the difference between the two samples s_i and Q on the hyperparameter h;
the difference diff(h, s_i, s_j) between two samples s_i and s_j on the hyperparameter h is defined as:
if the hyperparameter h is a scalar-type hyperparameter,
if the hyperparameter h is a numerical hyperparameter,
Figure FDA0002276900670000062
wherein 1 ≤ i ≤ j ≤ m and 1 ≤ h ≤ ph; max_h is the maximum value of the hyperparameter h in the sample set, min_h is the minimum value of the hyperparameter h in the sample set, m denotes the number of samples, each sample contains ph hyperparameters, rt denotes the number of iterations, rt > 1, s_ih denotes the value of the hyperparameter h in sample s_i, and s_jh denotes the value of the hyperparameter h in sample s_j.
CN201810270934.5A | 2018-03-29 | 2018-03-29 | Method, system and storage medium for evaluating the importance of machine learning hyperparameters | Expired - Fee Related | CN108446741B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201810270934.5A (CN108446741B (en)) | 2018-03-29 | 2018-03-29 | Method, system and storage medium for evaluating the importance of machine learning hyperparameters

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201810270934.5A (CN108446741B (en)) | 2018-03-29 | 2018-03-29 | Method, system and storage medium for evaluating the importance of machine learning hyperparameters

Publications (2)

Publication Number | Publication Date
CN108446741A (en) | 2018-08-24
CN108446741B (en) | 2020-01-07

Family

ID=63197670

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201810270934.5A (Expired - Fee Related, CN108446741B (en)) | Method, system and storage medium for evaluating the importance of machine learning hyperparameters | 2018-03-29 | 2018-03-29

Country Status (1)

Country | Link
CN (1) | CN108446741B (en)

Legal Events

Code | Title | Description
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 2020-01-07
