CN117853180A

Info

Publication number: CN117853180A
Application number: CN202311841451.3A
Authority: CN
Inventors: 黄晶; 宋洁; 张平文
Original assignee: Peking University
Current assignee: Peking University
Priority date: 2023-12-28
Filing date: 2023-12-28
Publication date: 2024-04-09

Abstract

The invention discloses a data pricing method, apparatus, computing device and storage medium based on reinforcement learning. In the technical solution provided, raw data from multiple data providers is acquired to generate a data set; a sample value function and a feature value function are constructed and initialized; the training data used to train a prediction model is determined, and the model is trained to obtain a predictor; an estimated value is calculated on validation data and used as the reward value for the current sample value function and feature value function; iterative computation by the gradient descent method determines the parameters of the current sample value function and feature value function and the corresponding loss function; when a preset condition is met, the parameters of the two value functions are output and the data value is calculated. The complexity of the value calculation is thereby independent of the size of the training set, and the calculation is simple and accurate; the parameters and loss function of the two value functions are obtained by the gradient descent method, which finally yields a more reasonable data value.

Description

A data pricing method, apparatus, device and medium based on reinforcement learning

Technical Field

The present invention relates to the field of data trading, and in particular to a data pricing method, apparatus, computing device and computer storage medium based on reinforcement learning.

Background

With the continuous development of big data, machine learning and artificial intelligence, these technologies have gradually penetrated all aspects of social and economic life. Many enterprises are pursuing digital transformation on this basis and, in the process, have accumulated large amounts of data that have become valuable assets. Out of concern for data privacy and asset protection, enterprises are usually reluctant to sell source data directly, yet they also want to use other enterprises' data to improve their own data models. Collaborative training has therefore become an important direction for future data trading. In collaborative training, multiple enterprises jointly train one model, each contributing its own data, so as to obtain a better model. This form of data trading not only improves the efficiency of data use but can also protect enterprises' data privacy and assets through means such as federated learning. Collaborative training will therefore play an increasingly important role in data trading.

In a collaborative training transaction, data sets from different data owners contribute differently to the final machine learning model, so an algorithm is needed to evaluate each owner's contribution to the model and to distribute revenue on that basis. Data pricing for collaborative training scenarios is intended to solve this revenue distribution problem.

At present, one way to quantify the value of each data set is to take the gain in model performance obtained by adding that data set to training as its value, i.e. the leave-one-out (LOO) method. This method has two problems. First, if two identical data sets exist, the value that LOO assigns to each of them is 0. Second, a data set that is too small yields a negligible performance gain for the overall model, so the values produced by LOO in that case are essentially all 0; LOO therefore cannot value a single data sample or a small data set. Another approach is the Data-Shapley algorithm, which treats data samples or data sets as players in a cooperative game: each data sample or data set to be valued corresponds to one player, and the performance of a model trained on a subset of the data corresponds to the payoff produced by the cooperation of those players. However, the computational complexity of Data-Shapley grows exponentially with the size of the sample set, so it is too inefficient when dealing with complex samples.
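The first LOO problem can be illustrated with a minimal sketch. The utility function below is a hypothetical stand-in for "model performance" (it simply counts distinct records) and is not taken from the source; it is only meant to show why duplicated data sets receive a value of 0.

```python
def utility(datasets):
    """Toy stand-in for model performance: number of distinct records seen."""
    return len(set().union(*datasets))


def leave_one_out(datasets):
    """LOO value of each data set: full utility minus utility without it."""
    full = utility(datasets)
    return [
        full - utility(datasets[:i] + datasets[i + 1:])
        for i in range(len(datasets))
    ]


a = {1, 2, 3}
b = {1, 2, 3}  # exact duplicate of a
c = {4, 5}

loo_values = leave_one_out([a, b, c])
print(loo_values)  # [0, 0, 2]
```

Removing either duplicate leaves the utility unchanged, so both receive 0 even though the records they hold clearly matter.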

Recently, Yoon et al. proposed a meta-learning framework, DVRL (Data Valuation Using Reinforcement Learning), which uses a Data Value Estimator (DVE) to select "valuable" data for training and takes the probability that a data point is selected as that data point's value. When valuing multiple data sets, DVRL outperforms LOO and the Shapley family of algorithms based on cooperative games. However, DVRL currently addresses data pricing only for the scenario in which each data owner holds a subset of the samples with all features, whereas other situations arise in practice. For example, in enterprise credit granting in the guarantee industry, each of multiple data owners holds a subset of the features for all samples; in marketing scenarios, each company usually holds only a subset of the samples and a subset of the features. DVRL cannot currently value data effectively in these two situations.

Summary of the Invention

To solve the above technical problems, the present invention provides a data pricing method based on reinforcement learning, together with a corresponding data pricing apparatus, computing device and computer storage medium.

According to one aspect of the present invention, a data pricing method based on reinforcement learning is provided, the method comprising:

acquiring raw data from multiple data providers, and generating a data set based on the acquired raw data;

constructing a sample value function and a feature value function, and initializing the sample value function and the feature value function;

determining, according to the sample value function and the feature value function, the training data that participates in training a prediction model; training the prediction model on the training data to obtain a predictor; inputting validation data into the predictor, calculating an estimated value, and using the estimated value as the reward value of the current sample value function and feature value function;

performing iterative computation by the gradient descent method, determining, based on the reward value, the values of a first parameter in the current sample value function and a second parameter in the current feature value function, and determining the loss function corresponding to the first parameter and the second parameter;

when a preset condition is met, outputting the first parameter in the sample value function and the second parameter in the feature value function; and calculating the data value based on the first parameter and the second parameter.

In the above solution, acquiring raw data from multiple data providers and generating a data set based on the acquired raw data further comprises:

acquiring raw data from multiple data providers;

partitioning the raw data along the sample dimension and the feature dimension;

generating a data set in matrix form based on the partitioning result and the raw data.
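The assembly of provider data into one matrix can be sketched as follows. This is an illustration only: the provider format (a dict keyed by `(sample_id, feature_name)`), the function name `build_data_matrix`, and the use of `None` for cells that no provider supplies are all assumptions, since the source does not specify them.

```python
def build_data_matrix(providers):
    """Merge provider records into one matrix: rows = samples, columns = features.

    `providers` is a list of dicts mapping (sample_id, feature_name) -> value.
    Cells that no provider supplies are left as None (an assumption).
    """
    sample_ids = sorted({s for p in providers for (s, _) in p})
    feature_names = sorted({f for p in providers for (_, f) in p})
    row = {s: i for i, s in enumerate(sample_ids)}
    col = {f: j for j, f in enumerate(feature_names)}
    matrix = [[None] * len(feature_names) for _ in sample_ids]
    for p in providers:
        for (s, f), value in p.items():
            matrix[row[s]][col[f]] = value
    return sample_ids, feature_names, matrix


# Each provider holds some samples and some features.
provider_a = {("u1", "age"): 30, ("u2", "age"): 41}
provider_b = {("u1", "income"): 5.0, ("u3", "income"): 7.5}

sample_ids, feature_names, matrix = build_data_matrix([provider_a, provider_b])
print(sample_ids)     # ['u1', 'u2', 'u3']
print(feature_names)  # ['age', 'income']
print(matrix)         # [[30, 5.0], [41, None], [None, 7.5]]
```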

In the above solution, constructing and initializing the sample value function and the feature value function further comprises:

the sample value function contains a first parameter, the first parameter being an n×1 vector, where n is the number of samples in the data set, i.e. the number of rows of the data set;

the feature value function contains a second parameter, the second parameter being an m×1 vector, where m is the number of features in the data set, i.e. the number of columns of the data set.

In the above solution, determining the training data according to the sample value function and the feature value function, and training the prediction model on the training data to obtain the predictor, further comprises:

selecting, according to the sample value function, the training samples that participate in training the prediction model;

selecting, according to the feature value function, the training features that participate in training the prediction model;

determining the training data from the training samples and the training features; and training the prediction model on the training data to obtain the predictor.

In the above solution, the validation data is specified by the party using the scenario, or is obtained by sampling from the data set.

In the above solution, performing iterative computation by the gradient descent method, determining the values of the first parameter and the second parameter based on the reward value, and determining the corresponding loss function, further comprises:

based on the gradient descent method, repeating the determination of the training data and the calculation of the reward value multiple times to obtain multiple reward values, and determining the loss function corresponding to the first parameter and the second parameter at each iteration, where the loss function is:

$$J(W_1, W_2) = p_1(T_1(1) \mid W_1)\, p_2(T_2(1) \mid W_2)\, R(1) + p_1(T_1(2) \mid W_1)\, p_2(T_2(2) \mid W_2)\, R(2) + \dots + p_1(T_1(k) \mid W_1)\, p_2(T_2(k) \mid W_2)\, R(k)$$

where W₁ is the first parameter; W₂ is the second parameter; R(k) is the reward value of the k-th sampling; T₁(k) is an n×1 vector whose entries are 0 or 1, where 0 means the corresponding sample was not selected in this sampling and 1 means it was selected; T₂(k) is an m×1 vector whose entries are 0 or 1, where 0 means the corresponding feature was not selected in this sampling and 1 means it was selected; p₁(T₁(k)|W₁) is the probability that the sampled vector is T₁(k) when the current parameter is W₁; p₂(T₂(k)|W₂) is the probability that the sampled vector is T₂(k) when the current parameter is W₂; and

$$(W_1, W_2) = (W_1, W_2) + \gamma \cdot \mathrm{grad}\big(J(W_1, W_2)\big)$$

where γ is the learning rate and grad(J(W₁, W₂)) is the gradient of J with respect to (W₁, W₂).
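As a hedged worked step, the gradient in the update rule can be written out under two assumptions that the source does not state: the entries of each sampled vector are drawn independently (Bernoulli), and each selection probability is the sigmoid of the corresponding parameter entry, $i_x = \sigma(W_{1,x})$. The sampled vectors and rewards are treated as fixed when differentiating, and $j$ indexes the $k$ samplings. The case for $W_2$ is symmetric.

```latex
\begin{aligned}
p_1(T_1(j)\mid W_1) &= \prod_{x=1}^{n} i_x^{\,T_{1,x}(j)}\,(1-i_x)^{\,1-T_{1,x}(j)},
  \qquad i_x=\sigma(W_{1,x}),\\[4pt]
\frac{\partial \log p_1(T_1(j)\mid W_1)}{\partial W_{1,x}} &= T_{1,x}(j)-i_x,\\[4pt]
\frac{\partial J}{\partial W_{1,x}} &= \sum_{j=1}^{k}
  p_1(T_1(j)\mid W_1)\,p_2(T_2(j)\mid W_2)\,R(j)\,\bigl(T_{1,x}(j)-i_x\bigr).
\end{aligned}
```

The last line follows from $\partial p_1 / \partial W_{1,x} = p_1 \cdot \partial \log p_1 / \partial W_{1,x}$ applied to each term of $J$.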

In the above solution, the preset condition is that the number of iterations exceeds an iteration threshold, or that the update magnitude of the first parameter and the second parameter is smaller than a magnitude threshold;

calculating the data value based on the first parameter and the second parameter further comprises:

substituting the first parameter into the sample value function and the second parameter into the feature value function to obtain the sample output of the sample value function and the feature output of the feature value function;

multiplying the sample output by the feature output to obtain the data value.
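One possible reading of the final multiplication is sketched below. The source says only that the sample output (n×1) is multiplied by the feature output (m×1); interpreting this as an outer product that yields one value per data cell is an assumption, and the function name `cell_values` is hypothetical.

```python
def cell_values(sample_output, feature_output):
    """Outer product: value of cell (x, y) = i_x * f_y (assumed interpretation)."""
    return [[i * f for f in feature_output] for i in sample_output]


sample_output = [0.5, 1.0]          # i_1, i_2
feature_output = [0.5, 0.25, 1.0]   # f_1, f_2, f_3

values = cell_values(sample_output, feature_output)
print(values)  # [[0.25, 0.125, 0.5], [0.5, 0.25, 1.0]]
```

Under this reading the result has the same n×m shape as the data set, so a provider's holdings can be valued cell by cell regardless of which rows and columns it owns.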

According to another aspect of the present invention, a data pricing apparatus based on reinforcement learning is provided, comprising an acquisition module, an initialization module, a training and estimation module, an iteration module and a value calculation module, wherein:

the acquisition module is configured to acquire raw data from multiple data providers and to generate a data set based on the acquired raw data;

the initialization module is configured to construct a sample value function and a feature value function and to initialize them;

the training and estimation module is configured to determine, according to the sample value function and the feature value function, the training data that participates in training a prediction model; to train the prediction model on the training data to obtain a predictor; and to input validation data into the predictor, calculate an estimated value, and use the estimated value as the reward value of the current sample value function and feature value function;

the iteration module is configured to perform iterative computation by the gradient descent method, to determine, based on the reward value, the values of the first parameter in the current sample value function and the second parameter in the current feature value function, and to determine the loss function corresponding to the first parameter and the second parameter;

the value calculation module is configured to output the first parameter in the sample value function and the second parameter in the feature value function when the preset condition is met, and to calculate the data value based on the first parameter and the second parameter.

According to yet another aspect of the present invention, a computing device is provided, comprising a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface communicate with one another via the communication bus;

the memory is configured to store at least one executable instruction, and the executable instruction causes the processor to perform the operations corresponding to the above data pricing method based on reinforcement learning.

According to a further aspect of the present invention, a computer storage medium is provided, the storage medium storing at least one executable instruction that causes a processor to perform the operations corresponding to the above data pricing method based on reinforcement learning.

According to the technical solution provided by the present invention, raw data from multiple data providers is acquired and a data set is generated from it; a sample value function and a feature value function are constructed and initialized; the training data that participates in training the prediction model is determined according to the two value functions; the prediction model is trained on the training data to obtain a predictor; validation data is input into the predictor, an estimated value is calculated, and the estimated value is used as the reward value of the current sample value function and feature value function; iterative computation is performed by the gradient descent method, the values of the first parameter in the current sample value function and the second parameter in the current feature value function are determined based on the reward value, and the corresponding loss function is determined; when the preset condition is met, the first parameter and the second parameter are output, and the data value is calculated from them. The raw data is organized along the two dimensions of samples and features, and the value of a data point is characterized by the probability that it is selected according to the sample value function and the feature value function. The computational complexity does not depend on the size of the training set, and the calculation is simple and accurate, so the method can be extended to large data sets and complex data models. The parameter values of the two value functions are then computed iteratively by the gradient descent method to obtain a minimized loss function; the parameters are determined on that basis, the corresponding sample values and feature values are calculated, and a more reasonable data value is finally obtained.

Other features and advantages of the present invention will be set forth in the description that follows, and in part will become apparent from the description or be understood by practicing the invention. The objectives and other advantages of the invention can be realized and obtained by the structures particularly pointed out in the written description, the claims and the accompanying drawings.

The technical solution of the present invention is described in further detail below with reference to the accompanying drawings and embodiments.

Brief Description of the Drawings

The accompanying drawings are provided for a further understanding of the present invention and constitute a part of the specification. Together with the embodiments of the invention, they serve to explain the invention and do not limit it. In the drawings:

FIG. 1 is a schematic flow chart of a data pricing method based on reinforcement learning according to an embodiment of the present invention;

FIG. 2A shows one raw data partitioning case according to an embodiment of the present invention;

FIG. 2B shows another raw data partitioning case according to an embodiment of the present invention;

FIG. 2C shows a further raw data partitioning case according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of a method for determining the parameters of the value functions according to an embodiment of the present invention;

FIG. 4 is a schematic flow chart of a prediction model training method used in data value calculation according to an embodiment of the present invention;

FIG. 5 is a schematic flow chart of a data pricing method based on reinforcement learning according to another embodiment of the present invention;

FIG. 6 is a structural block diagram of a data pricing apparatus based on reinforcement learning according to an embodiment of the present invention;

FIG. 7 is a schematic structural diagram of a computing device according to an embodiment of the present invention.

Detailed Description

Preferred embodiments of the present invention are described below with reference to the accompanying drawings. It should be understood that the preferred embodiments described here serve only to illustrate and explain the invention and are not intended to limit it.

FIG. 1 is a schematic flow chart of a data pricing method based on reinforcement learning according to an embodiment of the present invention. The method comprises the following steps:

Step S101: acquire raw data from multiple data providers, and generate a data set based on the acquired raw data.

Specifically, raw data from multiple data providers is acquired; the raw data is partitioned along the sample dimension and the feature dimension; and a data set in matrix form is generated based on the partitioning result and the raw data.

Preferably, when the raw data provided by multiple data providers is partitioned along the sample dimension and the feature dimension, three cases may arise, as shown in FIGS. 2A to 2C.

The first case is shown in FIG. 2A, which shows one raw data partitioning case according to an embodiment of the present invention:

the data owners, from party A1 to party An, each hold a subset of the samples and all of the features.

The second case is shown in FIG. 2B, which shows another raw data partitioning case according to an embodiment of the present invention:

the data owners, from party A1 to party An, each hold all of the samples and a subset of the features.

The third case is shown in FIG. 2C, which shows a further raw data partitioning case according to an embodiment of the present invention:

the data owners, from party A1 to party An, each hold a subset of the samples and a subset of the features.

Step S102: construct a sample value function and a feature value function, and initialize them.

Specifically, the sample value function Ins Value(W₁) contains a first parameter W₁, which is an n×1 vector, where n is the number of samples in the data set, i.e. the number of rows of the data set.

Preferably, the output of the sample value function is also an n×1 vector (i₁, i₂, …, i_n), each element of which takes a value between 0 and 1.

Specifically, the feature value function Fea Value(W₂) contains a second parameter W₂, which is an m×1 vector, where m is the number of features in the data set, i.e. the number of columns of the data set.

Preferably, the output of the feature value function is also an m×1 vector (f₁, f₂, …, f_m), each element of which takes a value between 0 and 1.
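A minimal sketch of two such value functions follows. The source fixes only the shapes (n×1 and m×1) and the output range (0 to 1); the element-wise sigmoid used here and the zero initialization are assumptions chosen because they satisfy those constraints, and the names `ins_value` and `fea_value` merely mirror "Ins Value" and "Fea Value".

```python
import math


def sigmoid(z):
    """Map any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def ins_value(w1):
    """Sample value function: one selection probability per row (assumed form)."""
    return [sigmoid(w) for w in w1]


def fea_value(w2):
    """Feature value function: one selection probability per column (assumed form)."""
    return [sigmoid(w) for w in w2]


n, m = 4, 3          # rows (samples) and columns (features) of the data set
w1 = [0.0] * n       # first parameter, initialized to zero (assumption)
w2 = [0.0] * m       # second parameter, initialized to zero (assumption)

print(ins_value(w1))  # [0.5, 0.5, 0.5, 0.5]
print(fea_value(w2))  # [0.5, 0.5, 0.5]
```

With zero initialization every sample and feature starts with the same selection probability of 0.5, so no provider is favored before training begins.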

Step S103: according to the sample value function and the feature value function, determine the training data that participates in training the prediction model; train the prediction model on the training data to obtain a predictor; input validation data into the predictor, calculate an estimated value, and use the estimated value as the reward value of the current sample value function and feature value function.

Preferably, the validation data (Valid Data) is specified by the party using the scenario, or is obtained by sampling from the data set.

Step S104: perform iterative computation by the gradient descent method; based on the reward value, determine the values of the first parameter in the current sample value function and the second parameter in the current feature value function, and determine the loss function corresponding to the first parameter and the second parameter.

Step S105: when the preset condition is met, output the first parameter in the sample value function and the second parameter in the feature value function; calculate the data value based on the first parameter and the second parameter.

FIG. 3 is a schematic diagram of a method for determining the parameters of the value functions according to an embodiment of the present invention. As shown in FIG. 3:

the sample value function and the feature value function select the samples and features used for training from the data set, which determines the training data; training on that data produces a predictor; a reward value is generated from the predictor; based on the reward value, the sample value function and the feature value function are adjusted and their parameters are determined, which completes one iteration and determines the new training data.
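The loop described above can be sketched end to end in a few lines. This is an illustration under stated assumptions, not the patented implementation: selection probabilities are a sigmoid of the parameters, selections are independent Bernoulli draws, `reward_fn` is a hypothetical stand-in for "train a predictor and score it on validation data", and the hyperparameters are arbitrary. The update adds the gradient of J, following the sign printed in the source's update formula.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sample_mask(probs, rng):
    """Draw a 0/1 selection vector, one independent draw per entry."""
    return [1 if rng.random() < p else 0 for p in probs]


def mask_prob(mask, probs):
    """Probability of drawing exactly `mask` under independent Bernoulli draws."""
    prob = 1.0
    for t, p in zip(mask, probs):
        prob *= p if t else (1.0 - p)
    return prob


def fit_value_params(w1, w2, reward_fn, k=8, lr=0.5, max_iter=200, tol=1e-4, seed=0):
    """Iterate: sample selections, collect rewards, step (W1, W2) along grad J."""
    rng = random.Random(seed)
    w1, w2 = list(w1), list(w2)
    for _step in range(max_iter):            # stop: iteration threshold
        p_rows = [sigmoid(w) for w in w1]
        p_cols = [sigmoid(w) for w in w2]
        g1 = [0.0] * len(w1)
        g2 = [0.0] * len(w2)
        for _draw in range(k):               # k samplings per iteration
            t1 = sample_mask(p_rows, rng)
            t2 = sample_mask(p_cols, rng)
            weight = mask_prob(t1, p_rows) * mask_prob(t2, p_cols) * reward_fn(t1, t2)
            for x in range(len(w1)):
                g1[x] += weight * (t1[x] - p_rows[x])
            for y in range(len(w2)):
                g2[y] += weight * (t2[y] - p_cols[y])
        w1 = [w + lr * g for w, g in zip(w1, g1)]
        w2 = [w + lr * g for w, g in zip(w2, g2)]
        if max(abs(lr * g) for g in g1 + g2) < tol:   # stop: small update
            break
    return w1, w2


def toy_reward(t1, t2):
    """Hypothetical reward: the 'model' is only useful when row 0 is selected."""
    return 1.0 if t1[0] == 1 else 0.0


w1, w2 = fit_value_params([0.0] * 3, [0.0] * 2, toy_reward)
print([round(sigmoid(w), 3) for w in w1])
```

With this toy reward, the parameter for row 0 can only move upward, because every rewarded draw includes that row; in a real setting `reward_fn` would train the prediction model on the selected rows and columns and return its validation score.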

According to the data pricing method based on reinforcement learning provided in this embodiment, raw data from multiple data providers is acquired and a data set is generated from it; a sample value function and a feature value function are constructed and initialized; the training data that participates in training the prediction model is determined according to the two value functions; the prediction model is trained on the training data to obtain a predictor; validation data is input into the predictor, an estimated value is calculated, and the estimated value is used as the reward value of the current sample value function and feature value function; iterative computation is performed by the gradient descent method, the values of the first parameter in the current sample value function and the second parameter in the current feature value function are determined based on the reward value, and the corresponding loss function is determined; when the preset condition is met, the first parameter and the second parameter are output, and the data value is calculated from them. The raw data is organized along the two dimensions of samples and features, and the value of a data point is characterized by the probability that it is selected according to the sample value function and the feature value function. The computational complexity does not depend on the size of the training set, and the calculation is simple and accurate, so data valuation can be extended to large data sets and complex data models. Furthermore, the parameter values of the two value functions are computed iteratively by the gradient descent method to obtain a minimized loss function; the parameters are determined on that basis, the corresponding sample values and feature values are calculated, and a more reasonable data value is finally obtained, which better meets the need to quantify the data value of each data provider when there are multiple providers.

FIG. 4 is a schematic flow chart of a prediction model training method used in data value calculation according to an embodiment of the present invention. As shown in FIG. 4, the method comprises the following steps:

Step S401: according to the sample value function, select the training samples that participate in training the prediction model.

Specifically, the output of the sample value function is (i₁, i₂, …, i_n), where i_x is the probability (between 0 and 1) that the sample in row x participates in training; this probability determines whether the sample in row x of the data set is selected for training the prediction model.

Step S402: according to the feature value function, select the training features that participate in training the prediction model.

Specifically, the output of the feature value function is (f₁, f₂, …, f_m), where f_y is the probability (between 0 and 1) that the feature in column y participates in training; this probability determines whether the feature in column y of the data set is selected for training the prediction model.

步骤S403，基于训练样本和训练特征确定出参与预测模型训练的训练数据；根据训练数据对预测模型进行训练，得到预测器。Step S403: determine the training data involved in the prediction model training based on the training samples and the training features; train the prediction model according to the training data to obtain a predictor.

优选地，由于数据集合中，由样本确定数据集合中的行，由特征确定数据集合中的列，基于步骤S401及步骤S402选定的样本和特征，从数据集合中确定具体的训练数据。例如，步骤S401样本确定了第2行和第4行的样本参与预测模型训练，步骤S402确定了第1列和第3列的特征参与预测模型训练，则最终选择第2行第1列、第2行第3列、第4行第1列以及第4行第3列的数据作为训练数据，进行预测模型的训练；Preferably, since in the data set, the rows in the data set are determined by the samples, and the columns in the data set are determined by the features, specific training data is determined from the data set based on the samples and features selected in step S401 and step S402. For example, in step S401, the samples in the 2nd and 4th rows are determined to participate in the prediction model training, and in step S402, the features in the 1st and 3rd columns are determined to participate in the prediction model training, then the data in the 2nd row and 1st column, the 2nd row and 3rd column, the 4th row and 1st column, and the 4th row and 3rd column are finally selected as the training data to train the prediction model;

优选地，对预测模型进行训练可以采用神经网络算法、决策树算法等多种算法，在此不做限定。Preferably, the prediction model can be trained using a variety of algorithms such as a neural network algorithm and a decision tree algorithm, which are not limited here.

根据上述方法，可以通过样本价值函数和特征价值函数，基于两种价值函数输出结果中的参与训练的概率，确定实际参与训练的行和列，从数据集合中具体筛选参与预测模型训练的训练数据，简洁且直观地完成训练数据的筛选，有效地提升了训练数据确定过程的效率。According to the above method, the sample value function and the feature value function can be used to determine the rows and columns that actually participate in the training based on the probability of participating in the training in the output results of the two value functions, and the training data participating in the prediction model training can be specifically screened from the data set. The screening of training data can be completed concisely and intuitively, which effectively improves the efficiency of the training data determination process.
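The row and column screening of steps S401 to S403 can be sketched as follows. This is a minimal illustration, assuming that each probability i_x and f_y is used for an independent Bernoulli draw; the function name `select_training_data` and the toy data are hypothetical and are not taken from this embodiment.

```python
import numpy as np

def select_training_data(data, sample_probs, feature_probs, rng):
    """Draw row/column masks from the given probabilities and return the sub-matrix."""
    # One independent Bernoulli draw per row (sample) and per column (feature).
    row_mask = rng.random(data.shape[0]) < sample_probs
    col_mask = rng.random(data.shape[1]) < feature_probs
    rows = np.flatnonzero(row_mask)
    cols = np.flatnonzero(col_mask)
    # np.ix_ takes the cross product of the selected rows and columns.
    return data[np.ix_(rows, cols)], row_mask, col_mask

rng = np.random.default_rng(0)
data = np.arange(20, dtype=float).reshape(5, 4)
# Probabilities of exactly 0 or 1 make the draw deterministic and reproduce
# the example in the text: rows 2 and 4, columns 1 and 3 (1-based).
sample_probs = np.array([0.0, 1.0, 0.0, 1.0, 0.0])
feature_probs = np.array([1.0, 0.0, 1.0, 0.0])
subset, row_mask, col_mask = select_training_data(data, sample_probs, feature_probs, rng)
print(subset)
```

With probabilities strictly between 0 and 1, each call yields a different random subset, which is what allows the later iterations to compare the rewards of different selections.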

As shown in FIG. 5, the method includes the following steps:

Step S501: determine the first parameter in the sample value function and the second parameter in the feature value function.

Step S502: determine the training data according to the first parameter and the second parameter, and train the prediction model to obtain a predictor; use the predictor to obtain an estimated value as the reward value of the sample value function and the feature value function.

Step S503: update the first parameter and the second parameter by the gradient descent method according to the reward value, and calculate the loss function.

Specifically, based on the gradient descent method, the determination of the training data and the calculation of the reward value are repeated multiple times to obtain multiple reward values, and the loss function corresponding to the first parameter and the second parameter at each iteration is determined.

Preferably, the loss function is:

[formula not reproduced in the source text]

where W₁ is the first parameter; W₂ is the second parameter; R(k) is the reward value of the k-th iteration; T₁(k) is an n×1 vector whose entries take the value 0 or 1, where 0 means that the corresponding sample was not selected in this sampling and 1 means that it was selected; T₂(k) is an m×1 vector whose entries take the value 0 or 1, where 0 means that the corresponding feature was not selected in this sampling and 1 means that it was selected; p₁(T₁(k)|W₁) is the probability that the sampled vector is T₁(k) when the current parameter is W₁; and p₂(T₂(k)|W₂) is the probability that the sampled vector is T₂(k) when the current parameter is W₂; where

[formula not reproduced in the source text]

In this way, the first parameter and the second parameter are adjusted step by step.
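For orientation, the symbols defined above are compatible with a score-function (REINFORCE-style) objective. The following is one plausible form, stated as an assumption and not as the formula of this embodiment. It assumes that the entries of T₁(k) and T₂(k) are drawn independently, that i_x and f_y are the selection probabilities produced by the two value functions under W₁ and W₂, that K is the number of sampling rounds, and that R(k) is a quantity to be minimized (for a reward to be maximized, the sign flips).

```latex
\hat{L}(W_1, W_2)
  = \frac{1}{K} \sum_{k=1}^{K} R(k)
    \Bigl[ \log p_1\bigl(T_1(k) \mid W_1\bigr)
         + \log p_2\bigl(T_2(k) \mid W_2\bigr) \Bigr],
\qquad
p_1\bigl(T_1(k) \mid W_1\bigr)
  = \prod_{x=1}^{n} i_x^{\,T_{1,x}(k)} \,(1 - i_x)^{\,1 - T_{1,x}(k)},
\qquad
p_2\bigl(T_2(k) \mid W_2\bigr)
  = \prod_{y=1}^{m} f_y^{\,T_{2,y}(k)} \,(1 - f_y)^{\,1 - T_{2,y}(k)}.
```

Under these assumptions, the gradient of this expression with respect to W₁ and W₂ is an unbiased estimate of the gradient of the expected value of R, which is why a gradient descent step on it moves the selection probabilities toward subsets with better rewards.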

Step S504: determine whether the updated first parameter and second parameter meet a preset condition.

Specifically, if so, step S505 is executed; if not, step S502 is executed.

Preferably, the preset condition may be based on the number of iterations, or on the update amplitude of the first parameter and the second parameter; that is,

it is determined whether the number of iterations performed so far is greater than a preset number of iterations;

if so, the preset condition is met; if not, the preset condition is not met. For example, if the preset number of iterations is 8, the preset condition is met once the number of iterations exceeds 8.

Alternatively, it is determined whether the update amplitudes of the current first parameter and second parameter, relative to the first parameter and second parameter determined in the previous iteration, are all smaller than a preset amplitude threshold;

if so, the preset condition is met; if not, the preset condition is not met.
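The two alternative preset conditions of step S504 can be sketched as a single check. This is a hedged illustration: the function name `should_stop`, its argument names, and the use of the largest absolute element-wise change as the "update amplitude" are assumptions, since the text does not fix how the amplitude is measured.

```python
import numpy as np

def should_stop(iteration, w1, w2, prev_w1, prev_w2,
                max_iterations=None, amplitude_threshold=None):
    """Return True when either configured preset condition is met."""
    # Condition 1: the number of iterations exceeds the preset number.
    if max_iterations is not None and iteration > max_iterations:
        return True
    # Condition 2: every parameter update is smaller than the threshold.
    # Assumption: amplitude = largest absolute element-wise change.
    if amplitude_threshold is not None:
        delta1 = np.max(np.abs(w1 - prev_w1))
        delta2 = np.max(np.abs(w2 - prev_w2))
        return bool(delta1 < amplitude_threshold and delta2 < amplitude_threshold)
    return False

w = np.zeros(3)
# With a preset number of iterations of 8, iteration 9 stops and iteration 8 does not.
print(should_stop(9, w, w, w, w, max_iterations=8))
print(should_stop(8, w + 1.0, w, w, w, max_iterations=8))
```

Passing only one of the two thresholds corresponds to choosing one of the alternatives described above; passing both stops on whichever is met first.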

Step S505: output the first parameter in the current sample value function and the second parameter in the current feature value function; calculate the data value based on the first parameter and the second parameter.

Preferably, the first parameter is substituted into the sample value function and the second parameter into the feature value function, yielding the sample output of the sample value function and the feature output of the feature value function;

For example, if the samples determined from the data set according to the first parameter are numbered 1, 3, and 7, and the features determined according to the second parameter are numbered 5 and 8, then the output of the sample value function is (i₁, i₃, i₇) and the output of the feature value function is (f₅, f₈); on this basis, the data value is calculated as (i₁, i₃, i₇) × (f₅, f₈).
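One way to read the product (i₁, i₃, i₇) × (f₅, f₈) is as an outer product that assigns a value to each selected (sample, feature) cell. The sketch below follows that reading, which is an assumption, and the numeric probabilities are made up purely for illustration.

```python
import numpy as np

# Hypothetical outputs of the two value functions for the selected entries.
sample_out = np.array([0.9, 0.5, 0.2])   # stands in for (i1, i3, i7)
feature_out = np.array([0.8, 0.4])       # stands in for (f5, f8)

# Outer product: value[a, b] = sample_out[a] * feature_out[b],
# one value per (sample, feature) cell of the selected block.
value = np.outer(sample_out, feature_out)
print(value)
```

Summing the cells that belong to one data provider would then give a per-provider figure, although how cells are attributed to providers depends on how the data set was assembled.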

According to the above method, the parameter values in the two value functions are iteratively calculated by the gradient descent method to obtain the optimal parameter values and the minimized loss function, from which the corresponding sample value and feature value are calculated. The check on the number of iterations or on the parameter update amplitude determines when to stop iterating, which balances convergence of the parameter values against computational cost and yields a well-founded data value.
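Steps S501 to S505 can be put together in a short end-to-end sketch. Everything specific here is an assumption made for illustration: the value functions are taken to be element-wise sigmoids of the parameter vectors, the predictor is ordinary least squares, the reward is the validation mean squared error (so lower is better), a moving-average baseline is added to reduce variance, and stopping uses only a fixed iteration count. None of these choices is stated by this embodiment.

```python
import numpy as np

def sigmoid(z):
    # Clipping keeps np.exp from overflowing for extreme parameter values.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def validation_error(X, y, Xv, yv, rows, cols):
    """Fit least squares on the selected block; return MSE on validation data."""
    if rows.sum() == 0 or cols.sum() == 0:
        return float(np.mean(yv ** 2))  # empty selection: predict zero
    coef, *_ = np.linalg.lstsq(X[np.ix_(rows, cols)], y[rows], rcond=None)
    return float(np.mean((Xv[:, cols] @ coef - yv) ** 2))

def fit_value_parameters(X, y, Xv, yv, iterations=200, lr=0.5, seed=0):
    rng = np.random.default_rng(seed)
    w1 = np.zeros(X.shape[0])  # first parameter: one logit per sample (row)
    w2 = np.zeros(X.shape[1])  # second parameter: one logit per feature (column)
    baseline = None
    for _ in range(iterations):
        p1, p2 = sigmoid(w1), sigmoid(w2)
        t1 = rng.random(p1.size) < p1       # sampled T1(k)
        t2 = rng.random(p2.size) < p2       # sampled T2(k)
        r = validation_error(X, y, Xv, yv, t1, t2)  # reward R(k), lower is better
        if baseline is None:
            baseline = r
        advantage = r - baseline
        baseline = 0.9 * baseline + 0.1 * r
        # For a Bernoulli draw with probability sigmoid(w):
        # d/dw log p(t | w) = t - sigmoid(w).
        w1 -= lr * advantage * (t1.astype(float) - p1)
        w2 -= lr * advantage * (t2.astype(float) - p2)
    return sigmoid(w1), sigmoid(w2)

# Toy data: the target depends only on the first feature.
rng = np.random.default_rng(1)
X, Xv = rng.normal(size=(30, 3)), rng.normal(size=(20, 3))
y, yv = 2.0 * X[:, 0], 2.0 * Xv[:, 0]
sample_values, feature_values = fit_value_parameters(X, y, Xv, yv, iterations=50)
print(sample_values.shape, feature_values.shape)
```

Each iteration trains one predictor regardless of how the values are later read out, which matches the stated property that the valuation cost does not grow with a per-data-point retraining loop.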

FIG. 6 shows a structural block diagram of a reinforcement-learning-based data pricing device according to an embodiment of the present invention. As shown in FIG. 6, the device includes: an acquisition module 601, an initialization module 602, a training estimation module 603, an iteration module 604, and a value calculation module 605; wherein,

The acquisition module 601 is configured to acquire original data from multiple data providers and to generate a data set based on the acquired original data.

Specifically, the acquisition module 601 is further configured to:

The initialization module 602 is configured to construct a sample value function and a feature value function and to initialize the sample value function and the feature value function.

Specifically, the initialization module 602 is further configured to:

The training estimation module 603 is configured to determine, according to the sample value function and the feature value function, the training data participating in training of the prediction model; to train the prediction model on the training data to obtain a predictor; and to input validation data into the predictor, calculate an estimated value, and use the estimated value as the reward value of the current sample value function and feature value function.

Specifically, the training estimation module 603 is further configured to:

screen out, according to the sample value function, the training samples that participate in training of the prediction model;

screen out, according to the feature value function, the training features that participate in training of the prediction model;

The validation data is determined by the scenario user, or is obtained by sampling from the data set.

The iteration module 604 is configured to perform iterative calculation by the gradient descent method, to determine, based on the reward value, the values of the first parameter in the current sample value function and the second parameter in the current feature value function, and to determine the loss function corresponding to the first parameter and the second parameter.

Specifically, the iteration module 604 is further configured to:

The value calculation module 605 is configured to output, when a preset condition is met, the first parameter in the sample value function and the second parameter in the feature value function, and to calculate the data value based on the first parameter and the second parameter.

Specifically, the preset condition is that the number of iterations exceeds a count threshold, or that the update amplitudes of the first parameter and the second parameter are smaller than an amplitude threshold;

The value calculation module 605 is further configured to:

According to the reinforcement-learning-based data pricing device provided in this embodiment, original data from multiple data providers is acquired and a data set is generated from it; a sample value function and a feature value function are constructed and initialized; the training data participating in training of the prediction model is determined according to the sample value function and the feature value function; the prediction model is trained on the training data to obtain a predictor; validation data is input into the predictor to calculate an estimated value, which serves as the reward value of the current sample value function and feature value function; iterative calculation is performed by the gradient descent method, in which the values of the first parameter in the current sample value function and the second parameter in the current feature value function are determined based on the reward value, together with the loss function corresponding to the first parameter and the second parameter; when a preset condition is met, the first parameter in the sample value function and the second parameter in the feature value function are output; and the data value is calculated based on the first parameter and the second parameter.

The device organizes the original data along two dimensions, samples and features, and characterizes the value of a data point by the probability that the data point is selected, as given by the sample value function and the feature value function. The computational complexity of this valuation does not depend on the size of the training set, which keeps it simple and accurate and allows data value assessment to be extended to large data sets and complex data models. Further, the parameter values in the two value functions are iteratively calculated by the gradient descent method to minimize the loss function; the parameters are determined on this basis, the corresponding sample value and feature value are calculated, and a more reasonable data value is finally obtained.

The present invention also provides a non-volatile computer storage medium storing at least one executable instruction, where the executable instruction can execute the reinforcement-learning-based data pricing method of any of the above method embodiments.

FIG. 7 shows a schematic structural diagram of a computing device according to an embodiment of the present invention. The specific embodiments of the present invention do not limit the specific implementation of the computing device.

As shown in FIG. 7, the computing device may include: a processor 702, a communications interface 704, a memory 706, and a communication bus 708.

Wherein:

The processor 702, the communications interface 704, and the memory 706 communicate with one another via the communication bus 708.

The communications interface 704 is used to communicate with network elements of other devices, such as clients or other servers.

The processor 702 is used to execute a program 710, and specifically may execute the relevant steps of the above embodiments of the reinforcement-learning-based data pricing method.

Specifically, the program 710 may include program code, and the program code includes computer operation instructions.

The processor 702 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention. The one or more processors included in the computing device may be of the same type, such as one or more CPUs, or of different types, such as one or more CPUs together with one or more ASICs.

The memory 706 is used to store the program 710. The memory 706 may include high-speed RAM and may also include non-volatile memory, such as at least one disk memory.

The program 710 may specifically be used to cause the processor 702 to execute the reinforcement-learning-based data pricing method of any of the above method embodiments. For the specific implementation of each step in the program 710, refer to the descriptions of the corresponding steps and units in the above method embodiments, which are not repeated here. Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the devices and modules described above can be found in the corresponding process descriptions of the foregoing method embodiments and are not repeated here.

The algorithms and displays provided herein are not inherently related to any particular computer, virtual system, or other device. Various general-purpose systems may also be used with the teachings herein. From the above description, the structure required to construct such systems is apparent. In addition, the present invention is not directed to any particular programming language. It should be understood that the content of the present invention described herein may be implemented in various programming languages, and that the above description of a specific language is provided to disclose the best mode of the present invention.

Numerous specific details are set forth in the description provided herein. It is understood, however, that embodiments of the present invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques are not shown in detail so as not to obscure the understanding of this description.

Similarly, it should be understood that, in order to streamline the present disclosure and aid understanding of one or more of the various inventive aspects, various features of the present invention are sometimes grouped together in a single embodiment, figure, or description thereof in the above description of exemplary embodiments. However, this method of disclosure should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the claims reflect, inventive aspects lie in fewer than all features of a single foregoing disclosed embodiment. Therefore, the claims following the detailed description are hereby expressly incorporated into that detailed description, with each claim standing on its own as a separate embodiment of the present invention.

Those skilled in the art will appreciate that the modules of the device in an embodiment may be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units, or components of an embodiment may be combined into one module, unit, or component, and may furthermore be divided into multiple sub-modules, sub-units, or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, an equivalent, or a similar purpose.

In addition, those skilled in the art will appreciate that, although some embodiments described herein include certain features that are included in other embodiments and not others, combinations of features of different embodiments are meant to be within the scope of the present invention and to form different embodiments. For example, in the claims, any one of the claimed embodiments may be used in any combination.

The various component embodiments of the present invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art should understand that a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components according to the embodiments of the present invention. The present invention may also be implemented as a device or apparatus program (for example, a computer program and a computer program product) for executing part or all of the methods described herein. Such a program implementing the present invention may be stored on a computer-readable medium, or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.

Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from its spirit and scope. Thus, if these changes and modifications fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to include them.