


技术领域Technical Field
本申请涉及机器学习领域,具体而言,涉及一种模型训练方法、服务评估方法、装置、设备及存储介质。The present application relates to the field of machine learning, and in particular, to a model training method, a service evaluation method, an apparatus, a device, and a storage medium.
背景技术Background technique
半监督学习(Semi-Supervised Learning,SSL)是模式识别和机器学习领域研究的重点问题,是监督学习与无监督学习相结合的一种学习方法。半监督学习同时使用标记数据和大量的未标记数据来进行模式识别工作。Semi-Supervised Learning (SSL) is a key issue in the field of pattern recognition and machine learning, and it is a learning method that combines supervised learning with unsupervised learning. Semi-supervised learning uses both labeled data and large amounts of unlabeled data for pattern recognition work.
提升方法是一种常用的统计学习方法。在分类问题中,它可以通过改变训练样本的权重,学习多个分类器,并将这些分类器进行线性组合,提高分类的性能。The boosting method is a commonly used statistical learning method. In classification problems, it can learn multiple classifiers by changing the weight of training samples, and linearly combine these classifiers to improve the performance of classification.
目前在半监督联邦学习中,常采用神经网络模型,直接在迭代过程中对样本进行预测,但是神经网络模型的训练需要大量的训练样本,对于样本数量有限的情况下,其训练获得的联邦学习模型的性能较差。At present, semi-supervised federated learning often uses neural network models to predict samples directly during the iterative process, but training a neural network model requires a large number of training samples; when the number of samples is limited, the federated learning model obtained by such training performs poorly.
发明内容SUMMARY OF THE INVENTION
本申请实施例的目的在于提供一种模型训练方法、服务评估方法、装置、设备及存储介质,用以提高训练获得的模型的性能。The purpose of the embodiments of the present application is to provide a model training method, a service evaluation method, an apparatus, a device, and a storage medium, so as to improve the performance of a model obtained by training.
第一方面,本申请实施例提供了一种模型训练方法,其应用于联邦学习场景,该方法包括:根据至少两方的样本数据,计算所述至少两方所共有的相似矩阵;所述相似矩阵表征所述至少两方的样本数据之间的相似关系;所述样本数据包括有标签数据和无标签数据;且,所述样本数据包括多个用户分别对应的预设的用户信息;所述有标签数据对应的类标签信息用于表征是否为对应的用户提供服务;根据所述相似矩阵,计算每一个样本数据对应的置信度;所述置信度包括样本数据为有标签数据的第一概率以及样本数据为无标签数据的第二概率;根据每一个样本数据的置信度以及类标签信息确定多个弱分类器所分别对应的弱分类器权重以及弱分类函数;根据各个弱分类器权重以及弱分类函数,更新得到强分类器所对应的强分类函数;其中,所述弱分类器和所述强分类器均用于预测是否为用户提供服务,且所述强分类器的预测准确性高于所述弱分类器的预测准确性。这样,可以使用无标签样本数据进行相关计算,解决了相关技术中存在的训练获得的模型性能差的问题;并且可以基于相似矩阵减少样本数据的需求量,基于提升方法提升可解释性,基于置信度将无标签数据排除在当前训练过程之外,避免了干扰因素。继而,该方法改善了相关技术中存在的联邦学习场景下的监督或者半监督学习方法中的不足,具有更强的实用性。In a first aspect, an embodiment of the present application provides a model training method applied to a federated learning scenario. The method includes: calculating, according to sample data of at least two parties, a similarity matrix shared by the at least two parties, where the similarity matrix represents the similarity relationship between the sample data of the at least two parties, the sample data includes labeled data and unlabeled data, the sample data includes preset user information corresponding to a plurality of users, and the class label information corresponding to the labeled data indicates whether a service is provided to the corresponding user; calculating, according to the similarity matrix, a confidence level corresponding to each piece of sample data, where the confidence level includes a first probability that the sample data is labeled data and a second probability that the sample data is unlabeled data; determining, according to the confidence level and class label information of each piece of sample data, the weak classifier weights and weak classification functions corresponding to a plurality of weak classifiers; and updating, according to the weak classifier weights and weak classification functions, the strong classification function corresponding to a strong classifier, where both the weak classifiers and the strong classifier are used to predict whether to provide a service to a user, and the prediction accuracy of the strong classifier is higher than that of the weak classifiers. In this way, unlabeled sample data can be used in the relevant calculations, which solves the problem in the related art of poor performance of trained models; the similarity matrix reduces the amount of sample data required, the boosting method improves interpretability, and the confidence level excludes unlabeled data from the current training round, avoiding interference. The method thus remedies deficiencies of the supervised and semi-supervised learning methods in federated learning scenarios in the related art and has stronger practicability.
可选地,所述根据至少两方的样本数据,计算所述至少两方所共有的相似矩阵,包括:根据至少两方的样本数据,确定所述至少两方的样本数据之间的欧式距离;根据所述欧式距离,计算所述至少两方所共有的相似矩阵。这样,能够使样本数据之间的相似关系更加直观,易于实现。Optionally, calculating the similarity matrix shared by the at least two parties according to the sample data of the at least two parties includes: determining the Euclidean distance between the sample data of the at least two parties according to the sample data of the at least two parties; and calculating the similarity matrix shared by the at least two parties according to the Euclidean distance. In this way, the similarity relationship between the sample data becomes more intuitive and easy to implement.
可选地,所述联邦学习场景包括纵向学习场景,所述纵向学习场景中的任意一方的样本数据至少包括2个,以及所述根据至少两方的样本数据,确定所述至少两方的样本数据之间的欧式距离,包括:根据己方每一个样本数据所对应的己方特征值,确定任意两个己方特征值之间的差值;根据任意两个己方特征值之间的差值,确定己方特征差值所对应的己方平方和;接收对方平方和;所述对方平方和表征对方特征差值所对应的平方和;所述对方特征差值表征任意两个对方样本数据对应的对方特征值之间的差值;计算所述己方平方和以及所述对方平方和所对应的累加和的二次方根,得到所述欧式距离。这样,通过发送各参与方所对应的对方平方和,保证了对方的数据安全,继而可以应用于纵向学习场景。Optionally, the federated learning scenario includes a vertical (longitudinal) learning scenario in which any party has at least two pieces of sample data, and determining the Euclidean distance between the sample data of the at least two parties includes: determining, according to the own feature values corresponding to each piece of own sample data, the difference between any two own feature values; determining, according to those differences, the own sum of squares corresponding to the own feature differences; receiving the counterparty's sum of squares, which represents the sum of squares corresponding to the counterparty's feature differences, where a counterparty feature difference is the difference between the counterparty feature values of any two pieces of counterparty sample data; and calculating the square root of the accumulated total of the own sum of squares and the counterparty's sum of squares to obtain the Euclidean distance. In this way, since each participant only sends its sums of squares, the counterparty's data security is guaranteed, and the method can be applied to vertical learning scenarios.
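The exchange described above decomposes the squared Euclidean distance per party: each party sums squared feature differences over its own columns only, and only these aggregates are exchanged. A minimal sketch under that reading (function names and data layout are illustrative, not from the application):

```python
import math

def partial_squared_sums(features):
    """Each party computes, over its own feature columns only, the sum of
    squared differences between every pair of aligned samples i, j."""
    n = len(features)
    sums = {}
    for i in range(n):
        for j in range(n):
            sums[(i, j)] = sum((a - b) ** 2 for a, b in zip(features[i], features[j]))
    return sums

def joint_euclidean_distance(own_sq, other_sq):
    """Combine the locally computed squared sums: only aggregates leave
    each party, never the raw feature values."""
    return {pair: math.sqrt(own_sq[pair] + other_sq[pair]) for pair in own_sq}

# Party A holds feature [x1], party B holds features [x2, x3] of the same 2 users.
a_sums = partial_squared_sums([[1.0], [4.0]])
b_sums = partial_squared_sums([[0.0, 0.0], [0.0, 4.0]])
dist = joint_euclidean_distance(a_sums, b_sums)
# distance between user 0 and user 1: sqrt((1-4)^2 + 0^2 + 4^2) = sqrt(25) = 5
```

Because the counterparty only ever sees sums of squares of its peer's feature differences, no individual feature value is disclosed, which is the data-security property claimed for the vertical scenario.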
可选地,所述联邦学习场景包括纵向学习场景,所述纵向学习场景中的任意一方的样本数据至少包括2个,以及所述根据所述相似矩阵,计算每一个样本数据对应的置信度,包括:利用脉冲函数确定每一个己方样本数据是否有标签;若任一己方样本数据有标签,则根据第一预设表达式计算该己方样本数据对应的置信度;若任一己方样本数据无标签,则根据第二预设表达式计算该己方样本数据对应的置信度;所述第二预设表达式包括决策树对该己方样本数据的预测结果项。这样,通过判断己方样本数据是否存在标签,可以通过不同的表达式确定出对应的置信度,并且可以预测出无标签样本数据的标签信息,减少了在纵向学习场景下的样本需求量。Optionally, the federated learning scenario includes a vertical learning scenario in which any party has at least two pieces of sample data, and calculating the confidence level corresponding to each piece of sample data according to the similarity matrix includes: using an impulse function to determine whether each piece of own sample data is labeled; if a piece of own sample data is labeled, calculating its confidence level according to a first preset expression; and if a piece of own sample data is unlabeled, calculating its confidence level according to a second preset expression, where the second preset expression includes a prediction result term of a decision tree for that piece of own sample data. In this way, by judging whether own sample data is labeled, the corresponding confidence level can be determined through different expressions and the label information of unlabeled sample data can be predicted, reducing the number of samples required in the vertical learning scenario.
可选地,所述根据每一个样本数据的置信度以及类标签信息确定多个弱分类器所分别对应的弱分类器权重以及弱分类函数,包括:确定同一己方样本数据的所述第一概率和所述第二概率所对应的概率差值是否在预设差异范围内;若所述概率差值在所述预设差异范围内,抽取该己方样本数据;根据抽取的己方样本数据以及所述类标签信息训练多个弱分类器,得到多个弱分类器所分别对应的弱分类器权重以及弱分类函数。这样,可以基于己方样本数据的置信度确定出用于训练弱分类器的新样本数据,使训练得到的弱分类器更加适用于当前的纵向学习场景。Optionally, determining the weak classifier weights and weak classification functions corresponding to the plurality of weak classifiers according to the confidence level and class label information of each piece of sample data includes: determining whether the probability difference between the first probability and the second probability of the same piece of own sample data is within a preset difference range; if it is, extracting that piece of own sample data; and training a plurality of weak classifiers with the extracted own sample data and the class label information to obtain the weak classifier weights and weak classification functions corresponding to the weak classifiers. In this way, new sample data for training the weak classifiers can be determined based on the confidence of own sample data, making the trained weak classifiers better suited to the current vertical learning scenario.
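The selection step above gates samples by the gap between the two confidence probabilities. A sketch of one plausible reading (the exact range check and pseudo-labeling rule are assumptions; the text only specifies a preset difference range):

```python
def select_and_pseudolabel(p, q, threshold):
    """Keep samples whose confidence gap |p_i - q_i| passes the preset
    threshold, and pseudo-label them with the sign of p_i - q_i
    (a common boosting-style choice, assumed here).
    Returns (index, pseudo_label, weight) triples for weak-classifier training."""
    selected = []
    for i, (pi, qi) in enumerate(zip(p, q)):
        gap = pi - qi
        if abs(gap) >= threshold:
            selected.append((i, 1 if gap > 0 else -1, abs(gap)))
    return selected

# illustrative confidences for three samples
sel = select_and_pseudolabel([0.9, 0.4, 0.1], [0.1, 0.5, 0.8], threshold=0.3)
# samples 0 and 2 pass the gate; sample 1's small gap of 0.1 is filtered out
```

The surviving triples would then be fed to whatever weak learner (e.g., a decision stump) is being trained, with the gap magnitude usable as a sample weight.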
可选地,所述联邦学习场景包括横向学习场景,以及所述根据至少两方的样本数据,确定所述至少两方的样本数据之间的欧式距离,包括:根据接收到的第一对方加密特征值以及己方样本数据对应的特征值,利用欧式距离公式计算所述欧式距离;其中,所述第一对方加密特征值由对方通过全同态加密方式对对方样本数据所对应的特征值进行加密得到;或者根据接收到的第二对方加密特征值以及己方样本数据对应的特征值,利用欧式距离公式计算所述欧式距离;其中,所述第二对方加密特征值由对方通过半同态加密方式对对方样本数据所对应的特征值进行加密得到。这样,可以通过全同态加密方式或者半同态加密方式计算得到对应的欧式距离,在横向学习场景下保证了数据安全。Optionally, the federated learning scenario includes a horizontal learning scenario, and determining the Euclidean distance between the sample data of the at least two parties includes: calculating the Euclidean distance with the Euclidean distance formula from received first counterparty encrypted feature values and the feature values of own sample data, where the first counterparty encrypted feature values are obtained by the counterparty encrypting the feature values of its sample data with fully homomorphic encryption; or calculating the Euclidean distance with the Euclidean distance formula from received second counterparty encrypted feature values and the feature values of own sample data, where the second counterparty encrypted feature values are obtained by the counterparty encrypting the feature values of its sample data with semi-homomorphic (partially homomorphic) encryption. In this way, the Euclidean distance can be computed under fully homomorphic or semi-homomorphic encryption, guaranteeing data security in the horizontal learning scenario.
可选地,所述根据所述相似矩阵,计算每一个样本数据对应的置信度,包括:接收对方加密样本数据;利用脉冲函数确定每一个对方加密样本数据以及己方样本数据是否有标签;若任一样本数据有标签,则根据第一预设表达式计算该样本数据对应的置信度;若任一样本数据无标签,则根据第二预设表达式计算该样本数据对应的置信度;所述第二预设表达式包括决策树对该样本数据的预测结果项。这样,通过判断样本数据是否存在标签,可以通过不同的表达式确定出对应的置信度,并且可以预测出无标签样本数据的标签信息,减少了在横向学习场景下的样本需求量。Optionally, calculating the confidence level corresponding to each piece of sample data according to the similarity matrix includes: receiving the counterparty's encrypted sample data; using an impulse function to determine whether each piece of counterparty encrypted sample data and own sample data is labeled; if a piece of sample data is labeled, calculating its confidence level according to a first preset expression; and if a piece of sample data is unlabeled, calculating its confidence level according to a second preset expression, where the second preset expression includes a prediction result term of a decision tree for that piece of sample data. In this way, by judging whether sample data is labeled, the corresponding confidence level can be determined through different expressions and the label information of unlabeled sample data can be predicted, reducing the number of samples required in the horizontal learning scenario.
可选地,所述根据每一个样本数据的置信度以及类标签信息确定多个弱分类器所分别对应的弱分类器权重以及弱分类函数,包括:确定同一对方加密样本数据或者己方样本数据的所述第一概率和所述第二概率所对应的概率差值是否在预设差异范围内;若所述概率差值在所述预设差异范围内,抽取该样本数据;根据抽取的样本数据以及所述类标签信息训练多个弱分类器,得到多个弱分类器所分别对应的弱分类器权重以及弱分类函数。这样,可以基于任一样本数据的置信度确定出用于训练弱分类器的新样本数据,使训练得到的弱分类器更加适用于当前的横向学习场景。Optionally, determining the weak classifier weights and weak classification functions corresponding to the plurality of weak classifiers according to the confidence level and class label information of each piece of sample data includes: determining whether the probability difference between the first probability and the second probability of the same piece of counterparty encrypted sample data or own sample data is within a preset difference range; if it is, extracting that piece of sample data; and training a plurality of weak classifiers with the extracted sample data and the class label information to obtain the weak classifier weights and weak classification functions corresponding to the weak classifiers. In this way, new sample data for training the weak classifiers can be determined based on the confidence of any sample data, making the trained weak classifiers better suited to the current horizontal learning scenario.
第二方面,本申请实施例提供一种服务评估方法,包括:接收目标用户的用户信息;基于所述用户信息,利用第一方面所述的方法训练得到的强分类器预测是否为所述目标用户提供服务。In a second aspect, an embodiment of the present application provides a service evaluation method, including: receiving user information of a target user; and, based on the user information, using a strong classifier trained by the method of the first aspect to predict whether to provide a service to the target user.
第三方面,本申请实施例提供了一种基于提升方法的半监督学习装置,其可以应用于联邦学习场景,该装置包括:第一计算模块,用于根据至少两方的样本数据,计算所述至少两方所共有的相似矩阵;所述相似矩阵表征所述至少两方的样本数据之间的相似关系;所述样本数据包括有标签数据和无标签数据;且,所述样本数据包括多个用户分别对应的预设的用户信息;所述有标签数据对应的类标签信息用于表征是否为对应的用户提供服务;第二计算模块,用于根据所述相似矩阵,计算每一个样本数据对应的置信度;所述置信度包括样本数据为有标签数据的第一概率以及样本数据为无标签数据的第二概率;确定模块,用于根据每一个样本数据的置信度以及类标签信息确定多个弱分类器所分别对应的弱分类器权重以及弱分类函数;更新模块,用于根据各个弱分类器权重以及弱分类函数,更新得到强分类器所对应的强分类函数;其中,所述弱分类器和所述强分类器均用于预测是否为用户提供服务,且所述强分类器的预测准确性高于所述弱分类器的预测准确性。In a third aspect, an embodiment of the present application provides a semi-supervised learning apparatus based on a boosting method, which can be applied to a federated learning scenario. The apparatus includes: a first calculation module configured to calculate, according to sample data of at least two parties, a similarity matrix shared by the at least two parties, where the similarity matrix represents the similarity relationship between the sample data of the at least two parties, the sample data includes labeled data and unlabeled data, the sample data includes preset user information corresponding to a plurality of users, and the class label information corresponding to the labeled data indicates whether a service is provided to the corresponding user; a second calculation module configured to calculate, according to the similarity matrix, a confidence level corresponding to each piece of sample data, where the confidence level includes a first probability that the sample data is labeled data and a second probability that the sample data is unlabeled data; a determination module configured to determine, according to the confidence level and class label information of each piece of sample data, the weak classifier weights and weak classification functions corresponding to a plurality of weak classifiers; and an updating module configured to update, according to the weak classifier weights and weak classification functions, the strong classification function corresponding to a strong classifier, where both the weak classifiers and the strong classifier are used to predict whether to provide a service to a user, and the prediction accuracy of the strong classifier is higher than that of the weak classifiers.
第四方面,本申请实施例提供一种服务评估装置,包括:接收模块,用于接收目标用户的用户信息;预测模块,用于基于所述用户信息,利用第一方面所述的方法训练得到的强分类器预测是否为所述目标用户提供服务。In a fourth aspect, an embodiment of the present application provides a service evaluation apparatus, including: a receiving module configured to receive user information of a target user; and a prediction module configured to, based on the user information, use a strong classifier trained by the method of the first aspect to predict whether to provide a service to the target user.
第五方面,本申请实施例提供一种电子设备,包括处理器以及存储器,所述存储器存储有计算机可读取指令,当所述计算机可读取指令由所述处理器执行时,运行如上述第一方面提供的所述方法中的步骤。In a fifth aspect, an embodiment of the present application provides an electronic device, including a processor and a memory, where the memory stores computer-readable instructions that, when executed by the processor, perform the steps of the method provided in the first aspect above.
第六方面,本申请实施例提供一种计算机可读存储介质,其上存储有计算机程序,所述计算机程序被处理器执行时运行如上述第一方面提供的所述方法中的步骤。In a sixth aspect, an embodiment of the present application provides a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, the steps in the method provided in the first aspect above are executed.
本申请的其他特征和优点将在随后的说明书阐述,并且,部分地从说明书中变得显而易见,或者通过实施本申请实施例了解。本申请的目的和其他优点可通过在所写的说明书、权利要求书、以及附图中所特别指出的结构来实现和获得。Other features and advantages of the present application will be set forth in the description which follows, and, in part, will be apparent from the description, or may be learned by practice of the embodiments of the present application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description, claims, and drawings.
附图说明Description of the Drawings
为了更清楚地说明本申请实施例的技术方案,下面将对本申请实施例中所需要使用的附图作简单地介绍,应当理解,以下附图仅示出了本申请的某些实施例,因此不应被看作是对范围的限定,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他相关的附图。In order to explain the technical solutions of the embodiments of the present application more clearly, the following briefly introduces the accompanying drawings used in the embodiments of the present application. It should be understood that the following drawings only show some embodiments of the present application and therefore should not be regarded as limiting the scope; for those of ordinary skill in the art, other related drawings can also be obtained from these drawings without creative effort.
图1为本申请实施例提供的一种模型训练方法的流程图;1 is a flowchart of a model training method provided by an embodiment of the present application;
图2为本申请实施例提供的一种模型训练装置的结构框图;2 is a structural block diagram of a model training apparatus provided by an embodiment of the present application;
图3为本申请实施例提供的一种用于执行模型训练方法或服务评估方法的电子设备的结构示意图。FIG. 3 is a schematic structural diagram of an electronic device for executing a model training method or a service evaluation method according to an embodiment of the present application.
具体实施方式Detailed Description
下面将结合本申请实施例中附图,对本申请实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例仅仅是本申请一部分实施例,而不是全部的实施例。通常在此处附图中描述和示出的本申请实施例的组件可以以各种不同的配置来布置和设计。因此,以下对在附图中提供的本申请的实施例的详细描述并非旨在限制要求保护的本申请的范围,而是仅仅表示本申请的选定实施例。基于本申请的实施例,本领域技术人员在没有做出创造性劳动的前提下所获得的所有其他实施例,都属于本申请保护的范围。The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only a part of the embodiments of the present application, rather than all the embodiments. The components of the embodiments of the present application generally described and illustrated in the drawings herein may be arranged and designed in a variety of different configurations. Thus, the following detailed description of the embodiments of the application provided in the accompanying drawings is not intended to limit the scope of the application as claimed, but is merely representative of selected embodiments of the application. Based on the embodiments of the present application, all other embodiments obtained by those skilled in the art without creative work fall within the protection scope of the present application.
应注意到:相似的标号和字母在下面的附图中表示类似项,因此,一旦某一项在一个附图中被定义,则在随后的附图中不需要对其进行进一步定义和解释。同时,在本申请的描述中,术语“第一”、“第二”等仅用于区分描述,而不能理解为指示或暗示相对重要性。It should be noted that like numerals and letters refer to like items in the following figures, so once an item is defined in one figure, it does not require further definition and explanation in subsequent figures. Meanwhile, in the description of the present application, the terms "first", "second", etc. are only used to distinguish the description, and cannot be understood as indicating or implying relative importance.
应当说明的是,在不冲突的情况下,本申请中的实施例或者实施例中的技术特征可以进行结合。It should be noted that the embodiments of the present application or the technical features of the embodiments may be combined without conflict.
本申请提供一种基于提升方法的半监督学习方法、装置和电子设备;进一步地,通过对现有的基于提升方法的半监督学习方法进行改造,以提高训练获得的模型的性能。具体的,可以通过计算联邦学习场景中的各方样本数据之间的相似关系,并通过该相似关系训练得到多个弱分类器,以将各个弱分类器进行线性组合,得到可以应用于联邦学习场景的强分类器。这样,由于该强分类器是根据各方样本数据之间的相似关系训练得到的,因此,考虑到了样本数据之间的距离,从而可以在样本数据有限的情况下,获得性能较好的模型。The present application provides a semi-supervised learning method, apparatus, and electronic device based on a boosting method; further, the existing boosting-based semi-supervised learning method is adapted to improve the performance of the trained model. Specifically, the similarity relationship between the sample data of the parties in a federated learning scenario can be calculated, and a plurality of weak classifiers can be trained using this similarity relationship, so that the weak classifiers can be linearly combined to obtain a strong classifier applicable to the federated learning scenario. In this way, since the strong classifier is trained according to the similarity relationship between the parties' sample data, the distance between sample data is taken into account, so that a model with better performance can be obtained even when sample data is limited.
在一些应用场景中,上述基于提升方法的半监督学习方法可以应用于一方诸如手机、电脑等终端设备中,该终端设备可以接收其他各方终端设备发送的样本数据,并计算所有样本数据之间的相似关系,基于相似关系训练得到强分类器。这样,该强分类器可以应用于联邦学习场景,以对真实数据进行相关处理。在另一些应用场景中,上述基于提升方法的半监督学习方法可以应用于一方服务器中,该服务器可以对该方终端设备提供数据处理服务。进一步的,该服务器可以接收其他各方服务器发送的样本数据,并计算所有样本数据之间的相似关系,基于相似关系训练得到强分类器。示例性地,本申请以应用于一方服务器行文。In some application scenarios, the above boosting-based semi-supervised learning method can be applied to one party's terminal device, such as a mobile phone or a computer; the terminal device can receive sample data sent by the terminal devices of the other parties, calculate the similarity relationship among all the sample data, and train a strong classifier based on that similarity relationship. In this way, the strong classifier can be applied in federated learning scenarios to process real data. In other application scenarios, the method can be applied to one party's server, which provides data processing services to that party's terminal devices; further, the server can receive sample data sent by the servers of the other parties, calculate the similarity relationship among all the sample data, and train a strong classifier based on that similarity relationship. For illustration, this application is described as applied to one party's server.
以上相关技术中的方案所存在的缺陷,均是发明人在经过实践并仔细研究后得出的结果,因此,上述问题的发现过程以及下文中本发明实施例针对上述问题所提出的解决方案,都应该是发明人在本发明过程中对本发明做出的贡献。The defects of the solutions in the above related art are all results obtained by the inventors through practice and careful study. Therefore, the process of discovering the above problems, and the solutions proposed below in the embodiments of the present invention for those problems, should all be regarded as the inventors' contributions made in the course of the present invention.
请参考图1,其示出了本申请实施例提供的一种模型训练方法的流程图。如图1所示,该模型训练方法包括以下步骤101至步骤104。Please refer to FIG. 1, which shows a flowchart of a model training method provided by an embodiment of the present application. As shown in FIG. 1, the model training method includes the following steps 101 to 104.
步骤101,根据至少两方的样本数据,计算所述至少两方所共有的相似矩阵;所述相似矩阵表征所述至少两方的样本数据之间的相似关系;所述样本数据包括有标签数据和无标签数据;且,所述样本数据包括多个用户分别对应的预设的用户信息;所述有标签数据对应的类标签信息用于表征是否为对应的用户提供服务;Step 101: Calculate a similarity matrix shared by the at least two parties according to the sample data of the at least two parties; the similarity matrix represents the similarity relationship between the sample data of the at least two parties; the sample data includes labeled data and unlabeled data; and, the sample data includes preset user information corresponding to multiple users respectively; the class label information corresponding to the labeled data is used to indicate whether to provide services for the corresponding users;
在一些应用场景中,服务器可以接收至少一个参与方的样本数据,以与己方构成联邦学习场景。这样,服务器可以根据至少两方的样本数据,计算至少两方之间所共有的相似矩阵。这里,服务器所接收的样本数据可以根据实际联邦学习场景中所涉及的参与方的个数确定。例如,联邦学习场景中存在5个参与方,则一方服务器可以接收其他4个参与方的样本数据。In some application scenarios, the server may receive sample data of at least one participant to form a federated learning scenario with itself. In this way, the server can calculate the similarity matrix shared by the at least two parties according to the sample data of the at least two parties. Here, the sample data received by the server can be determined according to the number of participants involved in the actual federated learning scenario. For example, if there are 5 participants in a federated learning scenario, one server can receive sample data from the other 4 participants.
上述样本数据可以包括有标签数据和无标签数据,以使其能够应用于半监督学习方法。不同的参与方其对应的样本数据不同,并且,针对的服务,其选用的样本数据中的用户信息也不同,以贷款服务为例,参与方可以为某银行和移动运营商,银行对应的预设的用户信息包括用户的姓名、身份证号码、存款额度、是否贷款等信息,移动端对应的预设的用户信息包括用户姓名、用户身份证号码、与其他用户的通话频率及通话时长等信息。对于有标签数据,其对应的类标签信息为提供贷款或不提供贷款。以服务为是否向用户推送某商品信息为例,参与方可以为某购物平台和银行,某购物平台对应的样本数据中包含的预设的用户信息包括用户账号、手机号、收货地址、购买记录等信息;银行对应的样本数据中包含的预设的用户信息包括用户姓名、性别、年龄、手机号、银行卡账号、存款额度等信息。对于有标签数据,其对应的类标签信息为推送某商品或不推送某商品。The above sample data can include labeled data and unlabeled data, so that it can be used with a semi-supervised learning method. Different participants have different sample data, and the user information in the sample data selected for a given service also differs. Taking a loan service as an example, the participants can be a bank and a mobile operator: the preset user information corresponding to the bank includes the user's name, ID number, deposit amount, whether the user has a loan, and so on, while the preset user information corresponding to the mobile operator includes the user's name, ID number, and call frequency and call duration with other users. For labeled data, the corresponding class label information is whether or not to provide a loan. Taking as another example the service of whether to push information about a certain product to users, the participants can be a shopping platform and a bank: the preset user information in the shopping platform's sample data includes the user account, mobile phone number, delivery address, purchase records, and so on, while the preset user information in the bank's sample data includes the user's name, gender, age, mobile phone number, bank card account, deposit amount, and so on. For labeled data, the corresponding class label information is whether or not to push the product.
上述相似矩阵可以用于表征各个参与方的样本数据之间的相似关系。该相似关系例如可以包括欧式距离关系、余弦相似性关系等。也即,服务器可以通过计算各方的样本数据之间的诸如欧式距离、夹角余弦值等计算出各方所共有的相似矩阵。在一些应用场景中,在计算相似矩阵之前,可以先将各方的样本数据进行对齐处理,以得到各方所共有的相似矩阵。例如,在纵向学习场景中,可以确定参与方A与己方共有的样本数据的身份标识(例如,己方为电商,参与方A为银行,则可以用客户的手机设备号的哈希值作为样本数据的身份标识),然后可以找出所有样本数据的身份标识的交集,基于该交集即可计算参与方A与己方所共有的相似矩阵。The above similarity matrix can be used to represent the similarity relationship between the sample data of the participants. The similarity relationship may include, for example, a Euclidean distance relationship, a cosine similarity relationship, and the like; that is, the server can compute the similarity matrix shared by all parties by calculating, for example, the Euclidean distance or the cosine of the included angle between the parties' sample data. In some application scenarios, before calculating the similarity matrix, the sample data of the parties may first be aligned to obtain the shared similarity matrix. For example, in a vertical learning scenario, the identities of the sample data shared by participant A and the own party can be determined (for example, if the own party is an e-commerce platform and participant A is a bank, the hash value of the customer's mobile device number can be used as the identity of the sample data); the intersection of the identities of all sample data can then be found, and the similarity matrix shared by participant A and the own party can be calculated based on that intersection.
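The text leaves open how a Euclidean distance is turned into a similarity value. One common choice, assumed here purely for illustration (the application does not specify the kernel or the parameter sigma), is a Gaussian kernel over the pairwise distances:

```python
import math

def similarity_matrix(samples, sigma=1.0):
    """S[i][j] = exp(-||x_i - x_j||^2 / sigma^2): a Gaussian kernel is one
    common way to map Euclidean distances to a similarity matrix.
    (The kernel choice and sigma are illustrative assumptions.)"""
    n = len(samples)
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d2 = sum((a - b) ** 2 for a, b in zip(samples[i], samples[j]))
            S[i][j] = math.exp(-d2 / sigma ** 2)
    return S

S = similarity_matrix([[0.0, 0.0], [0.0, 1.0]])
# identical samples get similarity 1.0; similarity decays with distance
```

The resulting matrix is symmetric with ones on the diagonal, which matches the role the similarity matrix plays in the confidence calculation of step 102.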
步骤102,根据所述相似矩阵,计算每一个样本数据对应的置信度;所述置信度包括样本数据为有标签数据的第一概率以及样本数据为无标签数据的第二概率;Step 102: Calculate the confidence level corresponding to each sample data according to the similarity matrix; the confidence level includes a first probability that the sample data is labeled data and a second probability that the sample data is unlabeled data;
服务器在计算得到各方所共有的相似矩阵之后,可以根据该相似矩阵,进一步计算每一个样本数据所对应的置信度。这里,上述置信度可以包括样本数据为有标签数据的第一概率和为无标签数据的第二概率。After calculating the similarity matrix shared by all parties, the server may further calculate the confidence level corresponding to each sample data according to the similarity matrix. Here, the above confidence may include a first probability that the sample data is labeled data and a second probability that the sample data is unlabeled data.
也就是说,服务器可以根据相似矩阵,计算每一个样本数据所分别对应的第一概率和第二概率。That is, the server may calculate the first probability and the second probability corresponding to each sample data according to the similarity matrix.
在一些应用场景中,若第一概率以pi表示,则可以通过公式计算得到,其中,Si,j表征相似矩阵,i表示样本数据在相似矩阵中的行数;j表示样本数据在相似矩阵中的列数;c表示有标签样本和无标签样本的相对权重;H表示弱分类器输出值;nl表示有标签的样本数;nu表示无标签的样本数;δ(yj,1)表示脉冲函数,其基于样本数据y的值确定(当y值为1时,该脉冲函数的值为1;当y值为0时,该脉冲函数的值为0);y表示样本数据为有标签数据或者无标签数据;其中,y值为1时,可以视为其为有标签数据,y值为0时,可以视为其为无标签数据。In some application scenarios, if the first probability is denoted pi, it can be calculated by a formula in which Si,j denotes the similarity matrix; i denotes the row of the sample data in the similarity matrix; j denotes its column; c denotes the relative weight of labeled and unlabeled samples; H denotes the weak classifier output value; nl denotes the number of labeled samples; nu denotes the number of unlabeled samples; and δ(yj, 1) denotes the impulse function, whose value is determined by the value y of the sample data (when y is 1 the impulse function is 1; when y is 0 the impulse function is 0). Here y indicates whether the sample data is labeled or unlabeled: a y value of 1 can be regarded as labeled data, and a y value of 0 as unlabeled data.
在这些应用场景中,若第二概率以qi表示,则可以通过公式计算得到,其中,该公式中的各个参数与上述pi的公式中的参数含义相同或相似,此处不赘述。In these application scenarios, if the second probability is denoted qi, it can likewise be calculated by a formula whose parameters have the same or similar meanings as those in the formula for pi above, and are not repeated here.
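The pi/qi formulas are not reproduced above, but the parameters described (similarity matrix S, ensemble output H, labeled/unlabeled weight c, impulse function δ) match in structure the confidence definitions of the SemiBoost algorithm. The following is a sketch under that assumption; the constant C and the ±1 label encoding are illustrative, not taken from the application:

```python
import math

def confidences(S, labels, H, C=1.0):
    """SemiBoost-style confidences (an assumption about the elided formulas):
    labels[j] is +1/-1 for labeled samples and None for unlabeled ones;
    H[i] is the current ensemble output for sample i.
    Returns (p, q): p[i] is evidence for the positive class, q[i] for the negative."""
    n = len(labels)
    labeled = [j for j in range(n) if labels[j] is not None]
    unlabeled = [j for j in range(n) if labels[j] is None]
    p, q = [0.0] * n, [0.0] * n
    for i in range(n):
        for j in labeled:
            delta_pos = 1.0 if labels[j] == 1 else 0.0
            delta_neg = 1.0 if labels[j] == -1 else 0.0
            p[i] += S[i][j] * math.exp(-2.0 * H[i]) * delta_pos
            q[i] += S[i][j] * math.exp(2.0 * H[i]) * delta_neg
        for j in unlabeled:
            p[i] += (C / 2.0) * S[i][j] * math.exp(H[j] - H[i])
            q[i] += (C / 2.0) * S[i][j] * math.exp(H[i] - H[j])
    return p, q

S = [[1.0, 1.0], [1.0, 1.0]]
p, q = confidences(S, labels=[1, None], H=[0.0, 0.0], C=1.0)
# for the unlabeled sample 1: p[1] = 1.0 (labeled positive neighbour) + 0.5 = 1.5
```

A sample can then be pseudo-labeled with sign(p[i] - q[i]) and weighted by |p[i] - q[i]|, which is consistent with the confidence-gated sampling described for step 103.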
步骤103,根据每一个样本数据的置信度以及类标签信息确定多个弱分类器所分别对应的弱分类器权重以及弱分类函数;Step 103: Determine the weak classifier weights and weak classification functions corresponding to the multiple weak classifiers respectively according to the confidence level of each sample data and the class label information;
上述类标签信息例如可以通过数学符号函数(sign函数)确定。The above class label information can be determined by, for example, a mathematical sign function (sign function).
基于提升方法,服务器可以训练得到多个精度较差的弱分类器,然后可以组合多个弱分类器,得到一个精度较高的强分类器。在这过程中,可以通过每一个样本数据的置信度以及类标签信息训练弱分类器,以确定出每个弱分类器对应的弱分类权重以及弱分类函数,继而得到强分类器。在一些应用场景中,上述弱分类器例如可以包括逻辑回归模型、决策树模型等。Based on the boosting method, the server can train to obtain multiple weak classifiers with poor accuracy, and then combine multiple weak classifiers to obtain a strong classifier with higher accuracy. In this process, the weak classifier can be trained by the confidence of each sample data and the class label information, so as to determine the weak classification weight and weak classification function corresponding to each weak classifier, and then obtain the strong classifier. In some application scenarios, the above-mentioned weak classifier may include, for example, a logistic regression model, a decision tree model, and the like.
应当说明的是,本领域技术人员在得到联邦学习场景下每一个样本数据的置信度以及类标签信息之后,可以基于现有的提升方法中对多个弱分类器的训练过程实现上述步骤103,以得到对应的弱分类器权重以及弱分类函数,此处不赘述。It should be noted that, after obtaining the confidence and class label information of each sample data in the federated learning scenario, those skilled in the art can implement the above step 103 based on the training process of multiple weak classifiers in existing boosting methods, so as to obtain the corresponding weak classifier weights and weak classification functions, which will not be repeated here.
步骤104,根据各个弱分类器权重以及弱分类函数,更新得到强分类器所对应的强分类函数。Step 104: Update to obtain the strong classification function corresponding to the strong classifier according to each weak classifier weight and weak classification function.
在一些应用场景中,服务器根据各个弱分类器所分别对应的弱分类器权重以及弱分类函数,可以更新得到强分类器所对应的强分类函数。在这些应用场景中,若以at表示第t个弱分类器所对应的弱分类器权重,ht(x)表示第t个弱分类器所对应的弱分类函数,则可以利用公式H(x)←H(x)+at·ht(x)更新得到强分类函数。也即,强分类函数可以视为多个弱分类器所分别对应的弱分类器权重以及弱分类函数进行组合得到。可以理解的是,强分类器即为训练获得的模型。In some application scenarios, the server may update to obtain the strong classification function corresponding to the strong classifier according to the weak classifier weight and weak classification function respectively corresponding to each weak classifier. In these application scenarios, if at represents the weak classifier weight corresponding to the t-th weak classifier, and ht(x) represents the weak classification function corresponding to the t-th weak classifier, the formula H(x)←H(x)+at·ht(x) can be used to update and obtain the strong classification function. That is, the strong classification function can be regarded as a combination of the weak classifier weights and weak classification functions respectively corresponding to the multiple weak classifiers. It can be understood that the strong classifier is the model obtained by training.
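下述Python代码示意H(x)←H(x)+at·ht(x)的线性组合与预测过程(非申请文件原文;其中的弱分类器以简单的决策桩函数代替,仅用于说明组合方式)。The following Python snippet illustrates the linear combination H(x)←H(x)+at·ht(x) and the prediction step (not part of the original application; the weak classifiers are stand-in decision-stump functions used only to illustrate the combination).

```python
def strong_classifier(weights, weak_fns):
    # H(x) = sum_t a_t * h_t(x): accumulate each weak classifier's weighted vote
    def H(x):
        return sum(a * h(x) for a, h in zip(weights, weak_fns))
    return H

def predict(H, x):
    # final service decision taken as the sign of the strong classification function
    return 1 if H(x) >= 0 else -1

# two illustrative weak classifiers (decision stumps) and their weights
h1 = lambda x: 1 if x > 0 else -1
h2 = lambda x: 1 if x > 2 else -1
H = strong_classifier([0.7, 0.3], [h1, h2])
```

组合后的H(x)即为训练获得的模型输出。The combined H(x) is the output of the trained model.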
在本实施例中,通过上述步骤101至步骤104可以基于联邦学习场景下的样本数据训练得到强分类函数,在真实的联邦学习场景下能够保证各方的数据安全,继而可以将其应用于联邦学习场景中。In this embodiment, through the above steps 101 to 104, a strong classification function can be obtained by training based on the sample data in a federated learning scenario, the data security of all parties can be guaranteed in a real federated learning scenario, and the function can then be applied in the federated learning scenario.
在相关技术中,存在可以应用于联邦学习场景的监督学习方法,但是其需要使用全量标注数据(也即所有样本数据为有标签数据)学习用于预测的模型(例如本申请中的强分类器),这样,若样本数据中存在无标签数据,则无法实现该方法。In the related art, there are supervised learning methods that can be applied to federated learning scenarios, but they need to use fully labeled data (that is, all sample data are labeled data) to learn a model for prediction (such as the strong classifier in this application). In this way, if there is unlabeled data in the sample data, this method cannot be implemented.
进一步的,在相关技术中存在可以应用于联邦学习场景的半监督学习方法,但是其多是基于神经网络模型进行的,其可以直接在神经网络的迭代更新过程中对无标签的样本数据进行预测。这样,对于样本数据的需求量较大,并且可解释性较差。Further, there are semi-supervised learning methods in the related art that can be applied to federated learning scenarios, but most of them are based on neural network models, which directly predict unlabeled sample data in the iterative update process of the neural network. In this way, the demand for sample data is large and the interpretability is poor.
在本实施例中,可以使用无标签样本数据进行相关计算,解决了相关技术中存在的不能应用于无标签数据的场景的问题;并且可以基于相似矩阵减少样本数据的需求量,基于提升方法提升可解释性,基于置信度将无标签数据排除在当前训练过程之外,避免了干扰因素。继而,本实施例改善了相关技术中存在的联邦学习场景下的监督或者半监督学习方法中的不足,具有更强的实用性。In this embodiment, unlabeled sample data can be used for the relevant calculations, which solves the problem in the related art that such methods cannot be applied to scenarios with unlabeled data; moreover, the demand for sample data can be reduced based on the similarity matrix, the interpretability can be improved based on the boosting method, and unlabeled data can be excluded from the current training process based on the confidence, avoiding interfering factors. Therefore, this embodiment improves upon the deficiencies of the supervised or semi-supervised learning methods in federated learning scenarios in the related art, and has stronger practicability.
在一些可选的实现方式中,上述步骤101中可以通过以下子步骤计算相似矩阵。In some optional implementation manners, the similarity matrix may be calculated through the following sub-steps in the foregoing step 101.
子步骤1011:根据至少两方的样本数据,确定所述至少两方的样本数据之间的欧式距离;Sub-step 1011: According to the sample data of the at least two parties, determine the Euclidean distance between the sample data of the at least two parties;
在一些应用场景中,可以通过计算各方的样本数据之间的欧式距离,计算出相似矩阵。具体的,可以应用欧式距离公式进行计算。这里,若联邦学习场景中有2个参与方,则对应的欧式距离公式可以为 d=√(Σk(xAk−xBk)²),其中,xA、xB分别表示2个参与方相互对齐的样本数据,xAk、xBk表示对应样本数据的第k个特征值,d表示这两个样本数据之间的欧式距离。In some application scenarios, the similarity matrix can be calculated by calculating the Euclidean distance between the sample data of each party. Specifically, the Euclidean distance formula can be used for the calculation. Here, if there are two participants in the federated learning scenario, the corresponding Euclidean distance formula can be d = √(Σk(xAk − xBk)²), where xA and xB respectively represent the mutually aligned sample data of the two participants, xAk and xBk represent the k-th eigenvalue of the corresponding sample data, and d represents the Euclidean distance between the two sample data.
子步骤1012,根据所述欧式距离,计算所述至少两方所共有的相似矩阵。Sub-step 1012: Calculate the similarity matrix shared by the at least two parties according to the Euclidean distance.
服务器在计算出相互对齐的样本数据之间的欧式距离之后,可以整理各个欧式距离,继而可以得到相似矩阵。After calculating the Euclidean distances between the aligned sample data, the server can sort out the Euclidean distances, and then obtain a similarity matrix.
在本实现方式中,通过计算多个参与方之间的欧式距离,可以得到多个参与方所共有的相似矩阵,这样,能够使样本数据之间的相似关系更加直观,易于实现。In this implementation manner, by calculating the Euclidean distance between multiple participants, a similarity matrix shared by multiple participants can be obtained, which can make the similarity relationship between sample data more intuitive and easy to implement.
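下述Python代码示意由两两欧式距离整理得到相似矩阵的过程(非申请文件原文;申请文件仅说明“整理各个欧式距离”得到相似矩阵,此处采用高斯核S[i][j]=exp(−d²/σ²)仅作为一种常见的示例性假设)。The following Python snippet sketches how pairwise Euclidean distances can be organized into a similarity matrix (not part of the original application; the application only states that the distances are "organized" into the matrix, and the Gaussian kernel S[i][j]=exp(−d²/σ²) used here is merely one common, illustrative assumption).

```python
import math

def euclidean(a, b):
    # d = sqrt(sum_k (a_k - b_k)^2)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def similarity_matrix(samples, sigma=1.0):
    # organize all pairwise distances into a symmetric similarity matrix;
    # the Gaussian kernel choice here is an illustrative assumption
    n = len(samples)
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d = euclidean(samples[i], samples[j])
            S[i][j] = math.exp(-(d * d) / (sigma ** 2))
    return S
```

相距越近的样本,其相似度越接近1。The closer two samples are, the closer their similarity is to 1.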
在一些可选的实现方式中,所述联邦学习场景包括纵向学习场景,所述纵向学习场景中的任意一方的样本数据至少包括2个。In some optional implementation manners, the federated learning scenario includes a longitudinal learning scenario, and the sample data of any one of the longitudinal learning scenarios includes at least two pieces of sample data.
在一些应用场景中,当联邦学习场景为纵向学习场景时,每个参与方的样本数据的特征值均存在于己方,对方存在其他样本数据的特征值,则己方服务器可以基于现存的样本数据的特征值计算己方样本数据之间的欧式距离。In some application scenarios, when the federated learning scenario is a longitudinal learning scenario, the eigenvalues of each participant's sample data exist on its own side while the counterpart holds the eigenvalues of other sample data; the own server can then calculate the Euclidean distance between its own sample data based on the eigenvalues of the existing sample data.
这样,上述子步骤1011可以包括以下子步骤:In this way, the above sub-step 1011 may include the following sub-steps:
子步骤1,根据己方每一个样本数据所对应的己方特征值,确定任意两个己方特征值之间的差值。Sub-step 1: Determine the difference between any two own eigenvalues according to the own eigenvalues corresponding to each of the own sample data.
在一些应用场景中,服务器可以先确定出己方现存的每一个样本数据所对应的己方特征值。例如,样本数据为己方客户的订单信息时,该样本数据所对应的己方特征值可以为客户的购买数量。In some application scenarios, the server may first determine its own characteristic value corresponding to each existing sample data of its own. For example, when the sample data is the order information of one's own customer, the own characteristic value corresponding to the sample data may be the purchase quantity of the customer.
在这些应用场景中,服务器确定了己方每一个样本数据的己方特征值之后,可以确定任意两个己方特征值之间的差值。这里,确定该差值的原因为适应于欧式距离公式。In these application scenarios, after the server determines the own eigenvalue of each of its own sample data, it can determine the difference between any two own eigenvalues. Here, the reason for determining the difference is to adapt to the Euclidean distance formula.
子步骤2,根据任意两个己方特征值之间的差值,确定己方特征差值所对应的己方平方和;Sub-step 2, according to the difference between any two own eigenvalues, determine the own square sum corresponding to the own eigenvalue difference;
服务器确定了己方特征值之间的差值之后,可以进一步确定己方特征差值所对应的己方平方和。例如,在上述己方特征值为客户的购买数量时,可以先将任意两个客户的购买数量对应的数值作差,得到这两个客户所对应的己方特征差值,然后可以将得到的所有己方特征差值的平方进行累加,得到对应的己方平方和。这里,计算该己方平方和,同样是为了能够适用于欧式距离公式。After the server determines the differences between its own eigenvalues, it can further determine the own sum of squares corresponding to the own feature differences. For example, when the above own eigenvalue is the customer's purchase quantity, the values corresponding to the purchase quantities of any two customers can first be subtracted to obtain the own feature difference corresponding to the two customers, and then the squares of all the obtained own feature differences can be accumulated to obtain the corresponding own sum of squares. Here, the own sum of squares is likewise calculated so as to be applicable to the Euclidean distance formula.
子步骤3,接收对方平方和;所述对方平方和表征对方特征差值所对应的平方和;所述对方特征差值表征任意两个对方样本数据对应的对方特征值之间的差值;Sub-step 3: Receive the counterpart sum of squares; the counterpart sum of squares represents the sum of squares corresponding to the counterpart feature differences; the counterpart feature difference represents the difference between the counterpart eigenvalues corresponding to any two counterpart sample data;
在纵向学习场景中,由于需要保证各个参与方的数据安全。因此,对方不能将自己原始的其他样本数据的特征值直接发送给己方。所以,对方可以先计算得到自己任意两个样本数据所对应的对方特征值之间的差值,然后将对方特征差值的平方进行累加,以得到对方平方和。In the longitudinal learning scenario, the data security of each participant needs to be guaranteed. Therefore, the counterpart cannot directly send its original eigenvalues of other sample data to our side. Instead, the counterpart can first calculate the differences between the counterpart eigenvalues corresponding to any two of its sample data, and then accumulate the squares of these counterpart feature differences to obtain the counterpart sum of squares.
对方计算得到对方平方和之后,可以发送至己方服务器。继而己方服务器可以接收到该对方平方和。After the other party calculates the square sum of the other party, it can be sent to its own server. Then the own server can receive the opposite party sum of squares.
子步骤4,计算所述己方平方和以及所述对方平方和所对应的累加和的二次方根,得到所述欧式距离。Sub-step 4: Calculate the square root of the accumulated sum of the own sum of squares and the counterpart sum of squares to obtain the Euclidean distance.
服务器接收到对方平方和之后,可以计算己方平方和以及对方平方和所对应的累加和的二次方根,得到欧式距离。也即,在欧式距离公式的框架下,分别计算其所需的各个参数(己方平方和、对方平方和),继而能够根据欧式距离公式,计算得到欧式距离。After receiving the counterpart sum of squares, the server can calculate the square root of the accumulated sum of its own sum of squares and the counterpart sum of squares to obtain the Euclidean distance. That is, under the framework of the Euclidean distance formula, the required parameters (the own sum of squares and the counterpart sum of squares) are calculated separately, and then the Euclidean distance can be calculated according to the Euclidean distance formula.
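下述Python代码示意纵向场景中欧式距离的分步计算(非申请文件原文;各方仅交换本方特征差值的平方和,而不交换原始特征值,函数命名均为示例性的)。The following Python snippet sketches the step-by-step Euclidean distance computation in the longitudinal scenario (not part of the original application; each party exchanges only its local sum of squared feature differences, never raw feature values, and the function names are illustrative).

```python
import math

def local_sq_sum(features_i, features_j):
    # one party's contribution for a pair of aligned samples:
    # sum over its own feature dimensions of (difference)^2
    return sum((a - b) ** 2 for a, b in zip(features_i, features_j))

def joint_distance(own_sq_sum, counterpart_sq_sum):
    # Euclidean distance over the vertically split feature space:
    # square root of the accumulated sum of both parties' sums of squares
    return math.sqrt(own_sq_sum + counterpart_sq_sum)
```

己方只需收到对方的一个标量平方和即可完成计算。The own side only needs to receive a single scalar sum of squares from the counterpart to finish the computation.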
在本实现方式中,通过发送各参与方所对应的对方平方和,保证了对方的数据安全,继而可以应用于纵向学习场景。In this implementation manner, by sending the square sum of the counterparty corresponding to each participant, the data security of the counterparty is ensured, and then it can be applied to the vertical learning scenario.
应当说明的是,在计算己方平方和以及对方平方和所对应的累加和的二次方根时,由于己方数据是已知的,因此若对方仅有一个样本数据,则能够通过推导出的对方平方和,推导出对方的样本数据。因此,在纵向学习场景下,应当禁止只有一个样本数据参与计算的情况出现。It should be noted that when calculating the square root of the accumulated sum of the own sum of squares and the counterpart sum of squares, since one's own data is known, if the counterpart has only one sample data, the counterpart's sample data can be deduced from the derived counterpart sum of squares. Therefore, in the longitudinal learning scenario, the situation where only one sample data participates in the calculation should be prohibited.
在一些可选的实现方式中,所述联邦学习场景包括纵向学习场景,所述纵向学习场景中的任意一方的样本数据至少包括2个,以及上述步骤102可以包括以下子步骤:In some optional implementations, the federated learning scenario includes a longitudinal learning scenario, the sample data of any party in the longitudinal learning scenario includes at least two, and the above step 102 may include the following sub-steps:
子步骤1021A,利用脉冲函数确定每一个己方样本数据是否有标签;Sub-step 1021A, use the impulse function to determine whether each one's own sample data has a label;
在一些应用场景中,可以利用脉冲函数确定每一个己方样本数据是否有标签。这里,脉冲函数例如可以表示为上述的δ(yj,1)。In some application scenarios, the impulse function can be used to determine whether each own sample data has a label. Here, the impulse function can be expressed as, for example, the above-mentioned δ(yj ,1).
子步骤1022A,若任一己方样本数据有标签,则根据第一预设表达式计算该己方样本数据对应的置信度;Sub-step 1022A, if any one's own sample data has a label, calculate the confidence level corresponding to the one's own sample data according to the first preset expression;
在一些应用场景中,若确定了某一个己方样本数据存在标签,可以根据第一预设表达式计算该己方样本数据所对应的置信度。这里的第一预设表达式例如可以包括上述pi的公式中的第一项(也即 Σ(j=1…nl)Si,j·e^(−2H(xi))·δ(yj,1))以及qi的公式中的第一项(也即 Σ(j=1…nl)Si,j·e^(2H(xi))·δ(yj,−1))。也就是说,若第j个己方样本数据有标签,则可以直接使用pi的公式中的第一项计算出第一概率,使用qi的公式中的第一项计算出第二概率。In some application scenarios, if it is determined that a certain own sample data has a label, the confidence corresponding to the own sample data can be calculated according to the first preset expression. The first preset expression here may include, for example, the first term in the above formula for pi (that is, Σ(j=1…nl) Si,j·e^(−2H(xi))·δ(yj,1)) and the first term in the formula for qi (that is, Σ(j=1…nl) Si,j·e^(2H(xi))·δ(yj,−1)). That is to say, if the j-th own sample data has a label, the first term of the pi formula can be used directly to calculate the first probability, and the first term of the qi formula can be used to calculate the second probability.
子步骤1023A,若任一己方样本数据无标签,则根据第二预设表达式计算该己方样本数据对应的置信度;所述第二预设表达式包括决策树对该己方样本数据的预测结果项。Sub-step 1023A, if any one's own sample data has no label, calculate the confidence level corresponding to the one's own sample data according to a second preset expression; the second preset expression includes the prediction result of the decision tree on the one's own sample data item.
在一些应用场景中,若确定了某一个己方样本数据没有标签,可以根据第二预设表达式计算该己方样本数据所对应的置信度。这里的第二预设表达式例如可以包括上述pi的公式中的第二项(也即 (c/2)·Σ(j=1…nu)Si,j·e^(H(xj)−H(xi)))以及qi的公式中的第二项(也即 (c/2)·Σ(j=1…nu)Si,j·e^(H(xi)−H(xj)))。也就是说,若第j个己方样本数据没有标签,则需要依赖于决策树对该己方样本数据的预测结果H计算出第一概率和第二概率。In some application scenarios, if it is determined that a certain own sample data has no label, the confidence corresponding to the own sample data can be calculated according to the second preset expression. The second preset expression here may include, for example, the second term in the above formula for pi (that is, (c/2)·Σ(j=1…nu) Si,j·e^(H(xj)−H(xi))) and the second term in the formula for qi (that is, (c/2)·Σ(j=1…nu) Si,j·e^(H(xi)−H(xj))). That is to say, if the j-th own sample data has no label, the first probability and the second probability need to be calculated by relying on the prediction result H of the decision tree for the own sample data.
在本实现方式中,通过判断己方样本数据是否存在标签,可以通过不同的表达式确定出对应的置信度,并且可以预测出无标签样本数据的标签信息,减少了在纵向学习场景下的样本需求量。In this implementation, by judging whether the own sample data has a label, the corresponding confidence can be determined through different expressions, and the label information of unlabeled sample data can be predicted, which reduces the sample demand in the longitudinal learning scenario.
在一些可选的实现方式中,上述步骤103可以包括以下子步骤:In some optional implementations, the above step 103 may include the following sub-steps:
子步骤1031A,确定同一己方样本数据的所述第一概率和所述第二概率所对应的概率差值是否在预设差异范围内;Sub-step 1031A, determine whether the probability difference corresponding to the first probability and the second probability of the same own sample data is within a preset difference range;
在一些应用场景中,服务器可以通过计算得到的第一概率与第二概率的概率差值,训练得到多个弱分类器。具体的,服务器可以确定概率差值是否在预设差异范围内,并以此判断是否执行抽样操作。这里的预设差异范围例如可以包括大于10⁻³或者大于10⁻⁵等实质上不趋近于0的范围。In some application scenarios, the server may train multiple weak classifiers by using the calculated probability difference between the first probability and the second probability. Specifically, the server may determine whether the probability difference is within a preset difference range, and based on this, determine whether to perform a sampling operation. The preset difference range here may include, for example, a range that is greater than 10⁻³ or greater than 10⁻⁵, that is, a range that does not substantially approach 0.
子步骤1032A,若所述概率差值在所述预设差异范围内,抽取该己方样本数据;Sub-step 1032A, if the probability difference is within the preset difference range, extract the own sample data;
服务器若确定了概率差值在预设差异范围内,可以执行抽样操作。也即,若服务器确定了概率差值不趋近于0,则可以抽取该己方样本数据作为训练弱分类器的新样本数据。If the server determines that the probability difference is within the preset difference range, it may perform a sampling operation. That is, if the server determines that the probability difference does not approach 0, it can extract the own sample data as new sample data for training the weak classifier.
子步骤1033A,根据抽取的己方样本数据以及所述类标签信息训练多个弱分类器,得到多个弱分类器所分别对应的弱分类器权重以及弱分类函数。Sub-step 1033A, train a plurality of weak classifiers according to the extracted own sample data and the class label information, and obtain weak classifier weights and weak classification functions corresponding to the plurality of weak classifiers respectively.
服务器抽取了多个己方样本数据作为训练弱分类器的新样本数据之后,可以根据各个新样本数据以及对应的类标签信息训练多个弱分类器,以得到多个弱分类器所分别对应的弱分类器权重以及弱分类函数。After the server extracts multiple own sample data as new sample data for training the weak classifiers, it can train multiple weak classifiers according to each new sample data and the corresponding class label information, so as to obtain the weak classifier weights and weak classification functions respectively corresponding to the multiple weak classifiers.
在本实现方式中,可以基于己方样本数据的置信度确定出用于训练弱分类器的新样本数据,使训练得到的弱分类器更加适用于当前的纵向学习场景。In this implementation manner, new sample data for training the weak classifier can be determined based on the confidence of one's own sample data, so that the weak classifier obtained by training is more suitable for the current longitudinal learning scenario.
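下述Python代码示意基于置信度差值的抽样过程(非申请文件原文;以sign(pi−qi)作为伪标签、以|pi−qi|作为样本权重是常见的提升方法处理方式,此处作为示例性假设)。The following Python snippet sketches the confidence-difference-based sampling (not part of the original application; taking sign(pi−qi) as the pseudo label and |pi−qi| as the sample weight is a common boosting-style choice and is an illustrative assumption here).

```python
def sample_for_training(p, q, eps=1e-3):
    # keep sample i only if |p_i - q_i| falls in the preset difference
    # range, i.e. the difference is not numerically close to zero
    picked = []
    for i, (pi, qi) in enumerate(zip(p, q)):
        diff = pi - qi
        if abs(diff) > eps:
            label = 1 if diff > 0 else -1   # pseudo label from the dominant probability
            picked.append((i, label, abs(diff)))
    return picked
```

差值趋近于0的样本被排除在当前训练轮次之外。Samples whose difference approaches 0 are excluded from the current training round.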
在一些可选的实现方式中,所述联邦学习场景包括横向学习场景。此时,待计算的样本数据存在于不同的参与方中,因此,不同的参与方需要将自己的样本数据的特征值发送给一方,由该方进行计算。In some optional implementations, the federated learning scenario includes a lateral learning scenario. At this time, the sample data to be calculated exists in different participants, therefore, different participants need to send the characteristic values of their own sample data to one party for calculation.
这样,上述子步骤1011可以包括以下:根据接收到的第一对方加密特征值以及己方样本数据对应的特征值,利用欧式距离公式计算所述欧式距离;其中,所述第一对方加密特征值由对方通过全同态加密方式对对方样本数据所对应的特征值进行加密得到;或者根据接收到的第二对方加密特征值以及己方样本数据对应的特征值,利用欧式距离公式计算所述欧式距离;其中,所述第二对方加密特征值由对方通过半同态加密方式对对方样本数据所对应的特征值进行加密得到。In this way, the above sub-step 1011 may include the following: calculating the Euclidean distance by using the Euclidean distance formula according to the received first counterpart encrypted eigenvalues and the eigenvalues corresponding to the own sample data, where the first counterpart encrypted eigenvalues are obtained by the counterpart encrypting the eigenvalues corresponding to the counterpart sample data through fully homomorphic encryption; or calculating the Euclidean distance by using the Euclidean distance formula according to the received second counterpart encrypted eigenvalues and the eigenvalues corresponding to the own sample data, where the second counterpart encrypted eigenvalues are obtained by the counterpart encrypting the eigenvalues corresponding to the counterpart sample data through semi-homomorphic encryption.
在横向学习场景下时,由于一方参与者(对方)需要将自己的样本数据的特征值发送给另一方参与者(己方)。因此,为了数据安全,需要将自己的特征值进行加密。In the horizontal learning scenario, one participant (the counterpart) needs to send the eigenvalues of its own sample data to the other participant (one's own side). Therefore, for data security, the counterpart needs to encrypt its own eigenvalues.
在一些应用场景中,对方可以通过全同态加密方式对对方样本数据所对应的特征值进行加密,加密之后的特征值可以视为第一对方加密特征值。In some application scenarios, the counterparty may encrypt the eigenvalues corresponding to the counterparty's sample data by fully homomorphic encryption, and the encrypted eigenvalues may be regarded as the first counterparty encrypted eigenvalues.
服务器在接收到对方发送的第一加密特征值之后,可以联合己方样本数据对应的特征值,利用欧式距离公式计算得到样本数据之间的欧式距离。具体的,由于在全同态加密方式中,能够在不解密的情况下对密文数据进行计算。因此,可以基于欧式距离公式对第一加密特征值进行计算即可得到相对应的样本数据之间的欧式距离。After receiving the first encrypted eigenvalue sent by the other party, the server can combine the eigenvalues corresponding to its own sample data to calculate the Euclidean distance between the sample data by using the Euclidean distance formula. Specifically, in the fully homomorphic encryption method, the ciphertext data can be calculated without decryption. Therefore, the Euclidean distance between the corresponding sample data can be obtained by calculating the first encrypted feature value based on the Euclidean distance formula.
在另一些应用场景中,对方也可以通过半同态加密方式对对方样本数据所对应的特征值进行加密,加密之后的特征值可以视为第二对方加密特征值。In other application scenarios, the counterparty may also encrypt the eigenvalues corresponding to the counterparty's sample data by semi-homomorphic encryption, and the encrypted eigenvalues may be regarded as the second counterparty encrypted eigenvalues.
服务器在接收到对方发送的第二加密特征值之后,可以联合己方样本数据对应的特征值,利用欧式距离公式计算得到样本数据之间的欧式距离。After receiving the second encrypted eigenvalue sent by the other party, the server can combine the eigenvalues corresponding to its own sample data to calculate the Euclidean distance between the sample data by using the Euclidean distance formula.
具体的,若用xB表示对方样本数据的特征值,用xA表示己方样本数据的特征值,则利用半同态加密方式对对方样本数据的特征值进行加密之后得到的第二加密特征值可以用E(xB)表示。此时,可以通过欧式距离公式 d=√(Σk(xAk−xBk)²) 计算出欧式距离d。此时,为了适应于半同态加密方式,可以对该欧式距离公式进行平方,得到 d²=Σk(xAk−xBk)²,继而可以得到 d²=Σk((xAk)²−2·xAk·xBk+(xBk)²)。这样,便可以在半同态加密状态(例如加性半同态)得到与之对应的计算表达式 E(d²)=Σk(E((xBk)²)−2·xAk·E(xBk)+E((xAk)²)),其中E((xBk)²)例如可由对方一并加密后发送。Specifically, if xB represents the eigenvalues of the counterpart sample data and xA represents the eigenvalues of the own sample data, the second encrypted eigenvalues obtained after encrypting the eigenvalues of the counterpart sample data by semi-homomorphic encryption can be represented by E(xB). In this case, the Euclidean distance d can be calculated by the Euclidean distance formula d = √(Σk(xAk − xBk)²). At this time, in order to adapt to the semi-homomorphic encryption method, the Euclidean distance formula can be squared to obtain d² = Σk(xAk − xBk)², and then d² = Σk((xAk)² − 2·xAk·xBk + (xBk)²). In this way, the corresponding calculation expression can be obtained in the semi-homomorphic (for example, additively homomorphic) encrypted state: E(d²) = Σk(E((xBk)²) − 2·xAk·E(xBk) + E((xAk)²)), where E((xBk)²), for example, can also be encrypted and sent by the counterpart.
服务器得到半同态加密状态下的计算表达式之后,可以计算得到半同态加密状态下的欧式距离。此时,服务器可以将各个欧式距离发送至与之对应的参与方,由对方解密之后得到对应的欧式距离。After the server obtains the calculation expression in the semi-homomorphic encryption state, it can calculate the Euclidean distance in the semi-homomorphic encryption state. At this time, the server can send each Euclidean distance to the corresponding participant, and the corresponding Euclidean distance is obtained after decryption by the other party.
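下述Python代码以一个“模拟加法同态”类演示加密状态下d²的计算代数(非申请文件原文;真实部署应使用Paillier等半同态密码方案,此处的MockAdditiveHE仅为演示协议代数的假设性替身,并假设对方同时发送E(xBk)与E((xBk)²))。The following Python snippet demonstrates the algebra of computing d² in the encrypted state with a mock additively homomorphic class (not part of the original application; a real deployment would use a semi-homomorphic cryptosystem such as Paillier, the MockAdditiveHE here is a hypothetical stand-in used only to demonstrate the protocol algebra, and it is assumed the counterpart sends both E(xBk) and E((xBk)²)).

```python
class MockAdditiveHE:
    # stand-in for an additively homomorphic scheme: it supports ciphertext
    # addition and multiplication of a ciphertext by a plaintext scalar
    def encrypt(self, m):
        return ("enc", m)
    def decrypt(self, c):
        return c[1]
    def add(self, c1, c2):
        return ("enc", c1[1] + c2[1])
    def mul_plain(self, c, k):
        return ("enc", c[1] * k)

def encrypted_sq_distance(he, own, enc_other, enc_other_sq):
    # server side: E(d^2) = sum_k [ E(b_k^2) + (-2*a_k)*E(b_k) + E(a_k^2) ]
    acc = he.encrypt(0)
    for a, eb, eb2 in zip(own, enc_other, enc_other_sq):
        term = he.add(eb2, he.mul_plain(eb, -2 * a))
        term = he.add(term, he.encrypt(a * a))
        acc = he.add(acc, term)
    return acc
```

服务器随后将E(d²)发送给对方,由对方解密得到欧式距离的平方。The server then sends E(d²) to the counterpart, which decrypts it to obtain the squared Euclidean distance.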
在本实现方式中,可以通过全同态加密方式或者半同态加密方式计算得到对应的欧式距离,在横向学习场景下保证了数据安全。In this implementation manner, the corresponding Euclidean distance can be calculated through a fully homomorphic encryption method or a semi-homomorphic encryption method, which ensures data security in a horizontal learning scenario.
在一些可选的实现方式中,若在横向学习场景下,上述步骤102可以包括以下子步骤:In some optional implementation manners, in a horizontal learning scenario, the above step 102 may include the following sub-steps:
子步骤1021B,接收对方加密样本数据;Sub-step 1021B, receiving encrypted sample data from the other party;
在横向学习场景下,由于需要发送加密后的样本数据的特征值,所以服务器可以接收对方发送的对方加密样本数据,以保证对方的样本数据安全。这里,对方例如也可以使用同态加密方式或者半同态加密方式对对方的样本数据进行加密之后得到对方加密样本数据。In the horizontal learning scenario, since the eigenvalues of the encrypted sample data need to be sent, the server can receive the encrypted sample data sent by the other party to ensure the security of the other party's sample data. Here, the counterparty may also encrypt the counterparty's sample data by using a homomorphic encryption method or a semi-homomorphic encryption method, for example, to obtain the counterparty encrypted sample data.
子步骤1022B,利用脉冲函数确定每一个对方加密样本数据以及己方样本数据是否有标签;Sub-step 1022B, using the impulse function to determine whether each encrypted sample data of the other party and the sample data of the own party have a label;
服务器接收到对方加密样本数据之后,可以利用脉冲函数确定每一个对方加密样本数据以及己方样本数据是否有标签。这里,也可以对脉冲函数进行加密处理,以保证数据安全。例如可以对上述的δ(yj,1)进行处理,处理后的表达式例如可以为δ(yj,1)=E(y+1)*0.5、δ(yj,−1)=E(y−1)*0.5。After receiving the counterpart encrypted sample data, the server can use the impulse function to determine whether each counterpart encrypted sample data and each own sample data has a label. Here, the impulse function can also be encrypted to ensure data security. For example, the above δ(yj,1) can be processed, and the processed expressions can be, for example, δ(yj,1)=E(y+1)*0.5 and δ(yj,−1)=E(y−1)*0.5.
子步骤1023B,若任一样本数据有标签,则根据第一预设表达式计算该样本数据对应的置信度;Sub-step 1023B, if any sample data has a label, calculate the confidence level corresponding to the sample data according to the first preset expression;
服务器若检测到任意一个对方加密样本数据或者己方样本数据有标签时,可以根据第一预设表达式计算该样本数据所对应的置信度。If the server detects that any of the encrypted sample data of the other party or the sample data of its own has a tag, it can calculate the confidence level corresponding to the sample data according to the first preset expression.
这里,上述子步骤1023B的实现过程以及取得的技术效果可以与上述子步骤1022A相似,此处不赘述。Here, the implementation process of the above-mentioned sub-step 1023B and the obtained technical effect may be similar to those of the above-mentioned sub-step 1022A, and details are not described here.
子步骤1024B,若任一样本数据无标签,则根据第二预设表达式计算该样本数据对应的置信度;所述第二预设表达式包括决策树对该样本数据的预测结果项。Sub-step 1024B, if any sample data has no label, calculate the confidence level corresponding to the sample data according to a second preset expression; the second preset expression includes the prediction result item of the decision tree for the sample data.
服务器若检测到任意一个对方加密样本数据或者己方样本数据没有标签时,可以根据第二预设表达式计算该样本数据所对应的置信度。If the server detects that any of the encrypted sample data of the other party or the sample data of its own has no label, the server may calculate the confidence level corresponding to the sample data according to the second preset expression.
这里,上述子步骤1024B的实现过程以及取得的技术效果可以与上述子步骤1023A相似,此处不赘述。Here, the implementation process of the above sub-step 1024B and the obtained technical effect may be similar to the above-mentioned sub-step 1023A, and details are not described here.
应当说明的是,由于服务器当前计算的是对方加密样本数据,其可以通过同态加密方式或者半同态加密方式得到,因此,在实现上述子步骤1023B或者子步骤1024B时,将表达式中表征的对方数据对应替换为对方加密样本数据即可。It should be noted that, since what the server currently calculates is the counterpart encrypted sample data, which can be obtained by homomorphic encryption or semi-homomorphic encryption, when implementing the above sub-step 1023B or sub-step 1024B, the counterpart data represented in the expressions can be correspondingly replaced with the counterpart encrypted sample data.
在本实现方式中,通过判断任一对方加密样本数据或者己方样本数据是否存在标签,可以通过不同的表达式确定出对应的置信度,并且可以预测出无标签样本数据的标签信息,减少了在横向学习场景下的样本需求量。In this implementation, by judging whether any counterpart encrypted sample data or own sample data has a label, the corresponding confidence can be determined through different expressions, and the label information of unlabeled sample data can be predicted, which reduces the sample demand in the horizontal learning scenario.
在一些可选的实现方式中,若在横向学习场景下,上述步骤103可以包括以下子步骤:In some optional implementations, in a horizontal learning scenario, the above step 103 may include the following sub-steps:
子步骤1031B,确定同一对方加密样本数据或者己方样本数据的所述第一概率和所述第二概率所对应的概率差值是否在预设差异范围内;Sub-step 1031B, determine whether the probability difference corresponding to the first probability and the second probability of the encrypted sample data of the same counterparty or the sample data of one's own party is within a preset difference range;
子步骤1032B,若所述概率差值在所述预设差异范围内,抽取该样本数据;Sub-step 1032B, if the probability difference is within the preset difference range, extract the sample data;
子步骤1033B,根据抽取的样本数据以及所述类标签信息训练多个弱分类器,得到多个弱分类器所分别对应的弱分类器权重以及弱分类函数。Sub-step 1033B: Train a plurality of weak classifiers according to the extracted sample data and the class label information, and obtain weak classifier weights and weak classification functions corresponding to the plurality of weak classifiers respectively.
上述子步骤1031B至子步骤1033B的实现过程以及取得的技术效果可以与上述的子步骤1031A至子步骤1033A相似,此处不赘述。The implementation process and the obtained technical effects of the above sub-steps 1031B to 1033B may be similar to the above-mentioned sub-steps 1031A to 1033A, and will not be repeated here.
应当说明的是,由于服务器当前计算的是对方加密样本数据,其可以通过同态加密方式或者半同态加密方式得到的公钥计算,因此,在实现上述子步骤1031B至子步骤1033B时,将对应表达式中表征的对方数据对应替换为对方加密样本数据即可。It should be noted that, since what the server currently calculates is the counterpart encrypted sample data, it can perform calculations with the public key obtained through homomorphic encryption or semi-homomorphic encryption. Therefore, when implementing the above sub-steps 1031B to 1033B, the counterpart data represented in the corresponding expressions can be correspondingly replaced with the counterpart encrypted sample data.
在上述各实施例的基础上,本申请实施例提供一种服务评估方法。On the basis of the foregoing embodiments, the embodiments of the present application provide a service evaluation method.
在纵向场景预测中,该服务评估方法可以用于发起方和合作方,该方法具体为:In longitudinal scenario prediction, the service evaluation method can be used for initiators and partners, and the method is specifically:
每一个参与方接收目标用户的用户信息;Each participant receives the user information of the target user;
基于所述用户信息,每一个参与方利用上述各实施例提供的方法训练得到的强分类器共同预测是否为所述目标用户提供服务。Based on the user information, each participant uses the strong classifier trained by the methods provided in the above embodiments to jointly predict whether to provide services for the target user.
在横向场景预测中,该服务评估方法可以用于发起方,该方法具体为:In the horizontal scenario prediction, the service evaluation method can be used for the initiator, and the method is specifically:
接收目标用户的用户信息;Receive user information from target users;
基于所述用户信息,利用上述各实施例提供的方法训练得到的强分类器预测是否为所述目标用户提供服务。Based on the user information, a strong classifier trained by using the methods provided in the above embodiments predicts whether to provide a service for the target user.
可以理解的是,目标用户的用户信息所包含的字段与上述各实施例在对强分类器进行训练时所使用的用户信息的字段相同,此处不再赘述。It can be understood that the fields included in the user information of the target user are the same as the fields of the user information used in the training of the strong classifier in the foregoing embodiments, and details are not repeated here.
将目标用户的用户信息输入强分类器中,强分类器输出是否为该目标用户提供服务的预测结果。可以理解的是,该服务可以是是否为其提供贷款服务等。该强分类器可采用上述各实施例提供的训练方法训练获得,此处不再赘述。The user information of the target user is input into the strong classifier, and the strong classifier outputs a prediction result of whether to provide services for the target user. It can be understood that the service may be, for example, whether to provide a loan service for the user. The strong classifier can be obtained by training using the training methods provided in the above embodiments, and details are not described herein again.
请参考图2,其示出了本申请实施例提供的一种模型训练装置的结构框图,该基于提升方法的半监督学习装置可以是电子设备上的模块、程序段或代码。应理解,该装置与上述图1方法实施例对应,能够执行图1方法实施例涉及的各个步骤,该装置具体的功能可以参见上文中的描述,为避免重复,此处适当省略详细描述。Please refer to FIG. 2 , which shows a structural block diagram of a model training apparatus provided by an embodiment of the present application. The semi-supervised learning apparatus based on the boosting method may be a module, a program segment, or a code on an electronic device. It should be understood that the apparatus corresponds to the method embodiment of FIG. 1 and can perform various steps involved in the method embodiment of FIG. 1 . For specific functions of the apparatus, refer to the above description. To avoid repetition, the detailed description is appropriately omitted here.
可选地,上述基于提升方法的半监督学习装置包括第一计算模块201、第二计算模块202、确定模块203以及更新模块204。其中,第一计算模块201,用于根据至少两方的样本数据,计算所述至少两方所共有的相似矩阵;所述相似矩阵表征所述至少两方的样本数据之间的相似关系;所述样本数据包括有标签数据和无标签数据;且,所述样本数据包括多个用户分别对应的预设的用户信息;所述有标签数据对应的类标签信息用于表征是否为对应的用户提供服务;第二计算模块202,用于根据所述相似矩阵,计算每一个样本数据对应的置信度;所述置信度包括样本数据为有标签数据的第一概率以及样本数据为无标签数据的第二概率;确定模块203,用于根据每一个样本数据的置信度以及类标签信息确定多个弱分类器所分别对应的弱分类器权重以及弱分类函数;更新模块204,用于根据各个弱分类器权重以及弱分类函数,更新得到强分类器所对应的强分类函数;其中,所述弱分类器和所述强分类器均用于预测是否为用户提供服务,且所述强分类器的预测准确性高于所述弱分类器的预测准确性。Optionally, the above semi-supervised learning apparatus based on the boosting method includes a first calculation module 201, a second calculation module 202, a determination module 203, and an update module 204. The first calculation module 201 is configured to calculate, according to sample data of at least two parties, a similarity matrix common to the at least two parties; the similarity matrix represents the similarity relationship between the sample data of the at least two parties; the sample data includes labeled data and unlabeled data; and the sample data includes preset user information corresponding to a plurality of users; the class label information corresponding to the labeled data is used to represent whether the corresponding user is provided with services. The second calculation module 202 is configured to calculate, according to the similarity matrix, the confidence corresponding to each sample data; the confidence includes a first probability that the sample data is labeled data and a second probability that the sample data is unlabeled data. The determination module 203 is configured to determine, according to the confidence and class label information of each sample data, the weak classifier weights and weak classification functions respectively corresponding to multiple weak classifiers. The update module 204 is configured to update, according to each weak classifier weight and weak classification function, to obtain the strong classification function corresponding to the strong classifier; wherein both the weak classifiers and the strong classifier are used to predict whether to provide services for users, and the prediction accuracy of the strong classifier is higher than that of the weak classifiers.
可选地,第一计算模块201进一步用于:根据至少两方的样本数据,确定所述至少两方的样本数据之间的欧式距离;根据所述欧式距离,计算所述至少两方所共有的相似矩阵。Optionally, the first calculation module 201 is further configured to: determine, according to the sample data of at least two parties, the Euclidean distance between the sample data of the at least two parties; and calculate, according to the Euclidean distance, the similarity matrix common to the at least two parties.
可选地,所述联邦学习场景包括纵向学习场景,所述纵向学习场景中的任意一方的样本数据至少包括2个,以及第一计算模块201进一步用于:根据己方每一个样本数据所对应的己方特征值,确定任意两个己方特征值之间的差值;根据任意两个己方特征值之间的差值,确定己方特征差值所对应的己方平方和;接收对方平方和;所述对方平方和表征对方特征差值所对应的平方和;所述对方特征差值表征任意两个对方样本数据对应的对方特征值之间的差值;计算所述己方平方和以及所述对方平方和所对应的累加和的二次方根,得到所述欧式距离。Optionally, the federated learning scenario includes a longitudinal learning scenario, the sample data of any party in the longitudinal learning scenario includes at least two, and the first calculation module 201 is further configured to: determine, according to the own eigenvalue corresponding to each own sample data, the difference between any two own eigenvalues; determine, according to the difference between any two own eigenvalues, the own sum of squares corresponding to the own feature differences; receive the counterpart sum of squares, where the counterpart sum of squares represents the sum of squares corresponding to the counterpart feature differences, and the counterpart feature difference represents the difference between the counterpart eigenvalues corresponding to any two counterpart sample data; and calculate the square root of the accumulated sum of the own sum of squares and the counterpart sum of squares to obtain the Euclidean distance.
可选地,所述联邦学习场景包括纵向学习场景,所述纵向学习场景中的任意一方的样本数据至少包括2个,以及第二计算模块202进一步用于:利用脉冲函数确定每一个己方样本数据是否有标签;若任一己方样本数据有标签,则根据第一预设表达式计算该己方样本数据对应的置信度;若任一己方样本数据无标签,则根据第二预设表达式计算该己方样本数据对应的置信度;所述第二预设表达式包括决策树对该己方样本数据的预测结果项。Optionally, the federated learning scenario includes a longitudinal learning scenario, the sample data of any party in the longitudinal learning scenario includes at least two, and the second calculation module 202 is further configured to: use the impulse function to determine whether each own sample data has a label; if any own sample data has a label, calculate the confidence corresponding to the own sample data according to the first preset expression; and if any own sample data has no label, calculate the confidence corresponding to the own sample data according to the second preset expression, where the second preset expression includes a prediction result item of the decision tree for the own sample data.
可选地,确定模块203进一步用于:确定同一己方样本数据的所述第一概率和所述第二概率所对应的概率差值是否在预设差异范围内;若所述概率差值在所述预设差异范围内,抽取该己方样本数据;根据抽取的己方样本数据以及所述类标签信息训练多个弱分类器,得到多个弱分类器所分别对应的弱分类器权重以及弱分类函数。Optionally, the determination module 203 is further configured to: determine whether the probability difference corresponding to the first probability and the second probability of the same own sample data is within a preset difference range; if the probability difference is within the preset difference range, extract the own sample data; and train multiple weak classifiers according to the extracted own sample data and the class label information to obtain the weak classifier weights and weak classification functions respectively corresponding to the multiple weak classifiers.
可选地,所述联邦学习场景包括横向学习场景,以及第一计算模块201进一步用于:根据接收到的第一对方加密特征值以及己方样本数据对应的特征值,利用欧式距离公式计算所述欧式距离;其中,所述第一对方加密特征值由对方通过全同态加密方式对对方样本数据所对应的特征值进行加密得到;或者根据接收到的第二对方加密特征值以及己方样本数据对应的特征值,利用欧式距离公式计算所述欧式距离;其中,所述第二对方加密特征值由对方通过半同态加密方式对对方样本数据所对应的特征值进行加密得到。Optionally, the federated learning scenario includes a horizontal learning scenario, and the first calculation module 201 is further configured to: calculate the Euclidean distance by using the Euclidean distance formula according to the received first counterpart encrypted eigenvalues and the eigenvalues corresponding to the own sample data, where the first counterpart encrypted eigenvalues are obtained by the counterpart encrypting the eigenvalues corresponding to the counterpart sample data through fully homomorphic encryption; or calculate the Euclidean distance by using the Euclidean distance formula according to the received second counterpart encrypted eigenvalues and the eigenvalues corresponding to the own sample data, where the second counterpart encrypted eigenvalues are obtained by the counterpart encrypting the eigenvalues corresponding to the counterpart sample data through semi-homomorphic encryption.
Optionally, the second calculation module 202 is further configured to: receive counterpart encrypted sample data; determine, using an impulse function, whether each piece of counterpart encrypted sample data and each piece of local sample data is labeled; if a piece of sample data is labeled, calculate the confidence level corresponding to that sample data according to a first preset expression; and if a piece of sample data is unlabeled, calculate the confidence level corresponding to that sample data according to a second preset expression, where the second preset expression includes a prediction result term of a decision tree for that sample data.
Optionally, the determination module 203 is further configured to: determine whether the probability difference between the first probability and the second probability of the same piece of counterpart encrypted sample data or local sample data is within a preset difference range; if the probability difference is within the preset difference range, extract that sample data; and train a plurality of weak classifiers according to the extracted sample data and the class label information to obtain the weak classifier weights and weak classification functions respectively corresponding to the plurality of weak classifiers.
It should be noted that, as those skilled in the art can clearly understand, for convenience and brevity of description, for the specific working processes of the systems or apparatuses described above, reference may be made to the corresponding processes in the foregoing method embodiments; the description is not repeated here.
In another embodiment, an embodiment of the present application provides a service evaluation apparatus including: a receiving module configured to receive user information of a target user; and a prediction module configured to predict, based on the user information and using the strong classifier trained by the above method, whether to provide a service for the target user.
Please refer to FIG. 3, which is a schematic structural diagram of an electronic device for executing the semi-supervised learning method based on the boosting method provided by an embodiment of the present application. The electronic device may include at least one processor 301 (for example, a CPU), at least one communication interface 302, at least one memory 303, and at least one communication bus 304, where the communication bus 304 is used to implement direct connection and communication among these components. The communication interface 302 of the device in this embodiment of the present application is used to exchange signaling or data with other node devices. The memory 303 may be a high-speed RAM memory, or may be a non-volatile memory, for example, at least one disk memory. Optionally, the memory 303 may also be at least one storage apparatus located away from the aforementioned processor. The memory 303 stores computer-readable instructions; when the computer-readable instructions are executed by the processor 301, the electronic device can perform the method process shown in FIG. 1.
It can be understood that the structure shown in FIG. 3 is merely illustrative; the electronic device may further include more or fewer components than those shown in FIG. 3, or have a configuration different from that shown in FIG. 3. Each component shown in FIG. 3 may be implemented in hardware, software, or a combination thereof.
An embodiment of the present application provides a computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, the method process performed by the electronic device in the method embodiment shown in FIG. 1 can be executed.
An embodiment of the present application provides a computer program product. The computer program product includes a computer program stored on a non-transitory computer-readable storage medium, and the computer program includes program instructions; when the program instructions are executed by a computer, the computer can execute the methods provided by the above method embodiments. For example, the method can be applied to a federated learning scenario and includes: calculating, according to sample data of at least two parties, a similarity matrix shared by the at least two parties, where the similarity matrix represents the similarity relationship between the sample data of the at least two parties, and the sample data includes labeled data and unlabeled data; calculating, according to the similarity matrix, a confidence level corresponding to each piece of sample data, where the confidence level includes a first probability that the sample data is labeled data and a second probability that the sample data is unlabeled data; determining, according to the confidence level and the class label information of each piece of sample data, the weak classifier weights and weak classification functions respectively corresponding to a plurality of weak classifiers; and updating, according to the weak classifier weights and the weak classification functions, the strong classification function corresponding to a strong classifier.
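The update of the strong classification function from weak classifier weights and weak classification functions can be sketched with the standard boosted combination F(x) = sign(sum_t alpha_t * h_t(x)); this specific combination rule is an assumption for illustration, since the embodiments do not spell out the exact update expression.

```python
def strong_classify(x, alphas, weak_fns):
    """F(x) = sign(sum_t alpha_t * h_t(x)): the standard boosted combination,
    assumed here for illustration of the strong classification function."""
    score = sum(a * h(x) for a, h in zip(alphas, weak_fns))
    return 1 if score >= 0 else -1

# Three hypothetical weak classification functions voting on one sample;
# each returns +1 (provide the service) or -1 (do not).
weak_fns = [lambda x: 1, lambda x: -1, lambda x: 1]
alphas = [0.6, 0.2, 0.4]
label = strong_classify(None, alphas, weak_fns)  # 0.6 - 0.2 + 0.4 = 0.8, so +1
```

Because higher-weight weak classifiers dominate the vote, the combined prediction can be more accurate than any single weak classifier, which is the claimed relationship between the strong and weak classifiers.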
In the embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other manners. The apparatus embodiments described above are merely illustrative. For example, the division of the units is only a division by logical function, and in actual implementation there may be other division manners; for another example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some communication interfaces, apparatuses, or units, and may be in electrical, mechanical, or other forms.
In addition, the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
Furthermore, the functional modules in the embodiments of the present application may be integrated together to form an independent part, each module may exist alone, or two or more modules may be integrated to form an independent part.
In this document, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations.
The above descriptions are merely embodiments of the present application and are not intended to limit the protection scope of the present application. For those skilled in the art, various modifications and changes may be made to the present application. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present application shall fall within the protection scope of the present application.
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202210939512.9A (CN115186767A) | 2022-08-05 | 2022-08-05 | Model training method, service evaluation method, device, equipment and storage medium |
| Publication Number | Publication Date |
|---|---|
| CN115186767A (en) | 2022-10-14 |
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202210939512.9A (CN115186767A, pending) | 2022-08-05 | 2022-08-05 | Model training method, service evaluation method, device, equipment and storage medium |
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN105320967A (en)* | 2015-11-04 | 2016-02-10 | 中科院成都信息技术股份有限公司 | Multi-label AdaBoost integration method based on label correlation |
| CN111310938A (en)* | 2020-02-10 | 2020-06-19 | 深圳前海微众银行股份有限公司 | Semi-supervised horizontal federated learning optimization method, equipment and storage medium |
| CN112581265A (en)* | 2020-12-23 | 2021-03-30 | 百维金科(上海)信息科技有限公司 | Internet financial client application fraud detection method based on AdaBoost |
| CN112822005A (en)* | 2021-02-01 | 2021-05-18 | 福州大学 | Secure transfer learning system based on homomorphic encryption |
| WO2022072776A1 (en)* | 2020-10-01 | 2022-04-07 | Nec Laboratories America, Inc. | Voting-based approach for differentially private federated learning |
| Title |
|---|
| PAVAN KUMAR MALLAPRAGADA ET AL: "SemiBoost: Boosting for Semi-supervised Learning", IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, vol. 31, no. 11, 20 November 2009 (2009-11-20), pages 12 - 22* |
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||