CN111325550A

Info

Publication number: CN111325550A
Application number: CN201811528230.XA
Authority: CN
Inventors: 陈雪; 彭文新
Original assignee: China Mobile Communications Group Co Ltd; China Mobile Group Guangdong Co Ltd
Current assignee: China Mobile Communications Group Co Ltd; China Mobile Group Guangdong Co Ltd
Priority date: 2018-12-13
Filing date: 2018-12-13
Publication date: 2020-06-23

Abstract

Embodiments of the invention provide a method and a device for identifying fraudulent transaction behavior. The method comprises: for any business classification, inputting the characteristic behavior data of a user to be tested into each of a plurality of behavior recognition models corresponding to that business classification, and obtaining the candidate identification result output by each behavior recognition model; obtaining the identification result under that business classification based on the candidate identification results output by the behavior recognition models; and obtaining a comprehensive identification result based on the identification result under each service type. The method and device mitigate the overfitting and limited generalization of a single model, extend the model life cycle, and improve the accuracy of behavior recognition. In addition, because the comprehensive identification result is obtained from the identification results under multiple service types, fraud across multiple industries is covered, the fields and behaviors prone to fraud are mined comprehensively, and the results can guide the remediation of business vulnerabilities.

Description

Translated from Chinese

Method and device for identifying fraudulent transaction behavior

Technical field

Embodiments of the present invention relate to the technical field of Internet fraud identification, and in particular to a method and device for identifying fraudulent transaction behavior.

Background

With the development of technology and the involvement of capital, the Internet and the mobile Internet have grown rapidly. As businesses expand quickly, business loopholes attract the "wool party" (羊毛党, users who systematically exploit promotions and loopholes) to conduct fraudulent transactions. To combat the wool party and maintain market order, a variety of anti-fraud technologies have also flourished.

In terms of the business dimension, anti-fraud technology can be subdivided into products such as credit card application anti-fraud, credit application anti-fraud, payment anti-fraud, transaction anti-fraud, fake-order (order brushing) anti-fraud, merchant anti-fraud, and insider anti-fraud. In terms of the institutional dimension, anti-fraud technology is deployed in financial institutions, such as banks of all sizes, third-party payment institutions, consumer finance companies, microfinance institutions, insurers, and securities firms, as well as in non-financial institutions such as large e-commerce platforms, express delivery companies, game platforms, live-streaming platforms, short-video platforms, ride-hailing platforms, and food delivery platforms.

Current techniques for identifying fraudulent transaction behavior generally use a list of confirmed fraud cases as guidance and the collectable characteristic behavior data as input, build a fraud prediction model with data mining algorithms (such as decision trees, logistic regression, or naive Bayes), and score all users according to the model rules to obtain a fraud list. However, because these techniques address a single line of business, they cannot cover fraud in other businesses. Moreover, the mining algorithms all have problems with degree of fit and generalization ability, so they are not sufficient to cope with the wool party's ever-changing fraudulent behavior, and in some cases they seriously misclassify normal users as fraudulent.

Summary of the invention

Embodiments of the present invention provide a method and device for identifying fraudulent transaction behavior, to solve the problem that existing anti-fraud technology lacks comprehensiveness and accuracy because it covers a single line of business and generalizes poorly.

In a first aspect, an embodiment of the present invention provides a method for identifying fraudulent transaction behavior, including:

for any business classification, inputting the characteristic behavior data of the user to be tested into each of a plurality of behavior recognition models corresponding to that business classification, and obtaining the candidate identification result output by each behavior recognition model, wherein the behavior recognition models are trained on sample user characteristic behavior data and sample identification results;

obtaining the identification result under that business classification based on the candidate identification results output by the behavior recognition models;

obtaining a comprehensive identification result based on the identification result under each service type.

In a second aspect, an embodiment of the present invention provides a device for identifying fraudulent transaction behavior, including:

a model identification unit, configured to, for any business classification, input the characteristic behavior data of the user to be tested into each of a plurality of behavior recognition models corresponding to that business classification, and obtain the candidate identification result output by each behavior recognition model, wherein the behavior recognition models are trained on sample user characteristic behavior data and sample identification results;

a business identification unit, configured to obtain the identification result under that business classification based on the candidate identification results output by the behavior recognition models;

a comprehensive identification unit, configured to obtain a comprehensive identification result based on the identification result under each service type.

In a third aspect, an embodiment of the present invention provides an electronic device, including a processor, a communication interface, a memory, and a bus, wherein the processor, the communication interface, and the memory communicate with one another through the bus, and the processor can call logic instructions in the memory to perform the steps of the method provided in the first aspect.

In a fourth aspect, an embodiment of the present invention provides a non-transitory computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the method provided in the first aspect.

The method and device for identifying fraudulent transaction behavior provided by the embodiments of the present invention obtain the identification result under a single business classification from the candidate identification results output by each behavior recognition model, which mitigates the overfitting and limited generalization of a single model, extends the model life cycle, and improves the accuracy of behavior recognition. In addition, the comprehensive identification result is obtained from the identification results under multiple service types, which covers fraud across multiple industries, comprehensively mines the fields and behaviors prone to fraud, and can guide the remediation of business vulnerabilities.

Description of drawings

To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.

FIG. 1 is a schematic flowchart of a method for identifying fraudulent transaction behavior provided by an embodiment of the present invention;

FIG. 2 is a schematic flowchart of a method for identifying fraudulent transaction behavior provided by another embodiment of the present invention;

FIG. 3 is a schematic structural diagram of a device for identifying fraudulent transaction behavior provided by an embodiment of the present invention;

FIG. 4 is a schematic structural diagram of an electronic device provided by an embodiment of the present invention.

Detailed description

To make the purposes, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.

To address the lack of comprehensiveness and accuracy that results from existing anti-fraud technology covering a single line of business and generalizing poorly, an embodiment of the present invention provides a method for identifying fraudulent transaction behavior. FIG. 1 is a schematic flowchart of the method provided by an embodiment of the present invention. As shown in FIG. 1, the method includes:

110. For any business classification, input the characteristic behavior data of the user to be tested into each of a plurality of behavior recognition models corresponding to that business classification, and obtain the candidate identification result output by each behavior recognition model, wherein the behavior recognition models are trained on sample user characteristic behavior data and sample identification results.

Specifically, business classifications are the categories obtained by classifying fraudulent behavior along different business dimensions, and there may be several of them, for example e-commerce fraud, Internet finance fraud, operator fraud, credit loan fraud, express delivery fraud, and game fraud. The embodiments of the present invention do not limit the specific types or number of business classifications. The characteristic behavior data of the user to be tested serves as the basis for identifying fraudulent transactions; it may be call records, traffic usage behavior, Internet browsing behavior, and so on. A candidate identification result is the result that a behavior recognition model produces from the characteristic behavior data of the user to be tested, and it indicates whether that user has engaged in fraudulent transaction behavior. The candidate identification result may be a label (normal behavior or fraudulent behavior) or the probability that the user has engaged in fraudulent transaction behavior, which is not specifically limited in the embodiments of the present invention.

Here there are multiple business classifications, each corresponding to multiple behavior recognition models, and each behavior recognition model outputs one candidate identification result from the input characteristic behavior data of the user to be tested. For any business classification, the behavior recognition models under it may be trained on different sample user characteristic behavior data, with different neural network models or machine learning algorithms, or with both different data and different models or algorithms, which is not specifically limited in the embodiments of the present invention.

In addition, before step 110 is performed, the behavior recognition models can be trained in advance as follows. First, a large amount of sample user characteristic behavior data and the corresponding sample identification results are collected. Each item of sample user characteristic behavior data corresponds to exactly one sample identification result, which is labelled in advance and indicates whether the sample user, as determined by analyzing the corresponding data, has engaged in fraudulent transaction behavior. An initial model is then trained on the sample user characteristic behavior data and sample identification results to obtain a behavior recognition model. The initial model may be a single neural network model or a combination of several neural network models, and the embodiments of the present invention do not limit its type or structure.

120. For any business classification, obtain the identification result under that business classification based on the candidate identification results output by the behavior recognition models.

Specifically, for any business classification, the identification result under that classification is obtained from the multiple candidate identification results output by the multiple behavior recognition models corresponding to it. The identification result is a summary of the candidate identification results. It can be obtained by weighting the candidate identification results, by majority voting, or by averaging the candidate identification results and comparing the average with a preset threshold, which is not specifically limited in the embodiments of the present invention.
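Two of the aggregation options named above (weighting, and averaging followed by a threshold comparison) can be sketched as follows. This is a minimal illustration, assuming each candidate identification result is a fraud probability in [0, 1]; the function names, the example weights, and the default threshold of 0.5 are hypothetical and do not come from the patent.

```python
from typing import Sequence


def weighted_score(probs: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average of candidate fraud probabilities (weights are assumed inputs)."""
    if len(probs) != len(weights):
        raise ValueError("probs and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(p * w for p, w in zip(probs, weights)) / total


def mean_threshold(probs: Sequence[float], threshold: float = 0.5) -> int:
    """Return 1 (fraud) if the mean probability exceeds the threshold, else 0 (normal)."""
    if not probs:
        raise ValueError("at least one candidate result is required")
    return 1 if sum(probs) / len(probs) > threshold else 0


# Three candidate probabilities from three hypothetical models.
candidates = [0.9, 0.6, 0.2]
score = weighted_score(candidates, [0.5, 0.3, 0.2])
label = mean_threshold(candidates)
```

The weighted variant returns a score that still has to be compared with a threshold if a label is needed; the averaging variant makes that comparison directly.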

130. Obtain a comprehensive identification result based on the identification result under each service type.

Specifically, steps 110 and 120 are performed for every business classification to obtain the identification result under each service type. These identification results are then summarized to obtain a single result that integrates all service types, namely the comprehensive identification result. The comprehensive identification result can be obtained by weighting the identification results under the service types, by majority voting, or by averaging the identification results under the service types and comparing the average with a preset threshold, which is not specifically limited in the embodiments of the present invention.

The method provided by the embodiments of the present invention obtains the identification result under a single business classification from the candidate identification results output by each behavior recognition model, which mitigates the overfitting and limited generalization of a single model, extends the model life cycle, and improves the accuracy of behavior recognition. In addition, the comprehensive identification result is obtained from the identification results under multiple service types, which covers fraud across multiple industries, comprehensively mines the fields and behaviors prone to fraud, and can guide the remediation of business vulnerabilities.

Based on any of the above embodiments, the business classifications include at least one of e-commerce fraud, Internet finance fraud, operator fraud, and credit loan fraud, and the characteristic behavior data of the user to be tested includes at least one of the user's basic information, call records, traffic usage behavior, and Internet browsing behavior. Here, the basic information may be account information such as the mobile phone number of the user to be tested, and serves as the identifier of that user's characteristic behavior data.

Correspondingly, when the behavior recognition models are trained, the sample user characteristic behavior data likewise includes at least one of the sample user's basic information, call records, traffic usage behavior, and Internet browsing behavior. Here, the basic information may be account information such as the sample user's mobile phone number, and serves as the identifier of the sample user's characteristic behavior data.

Based on any of the above embodiments, before step 110 the method further includes: for any business classification, training the initial model with each of several algorithms on the sample user characteristic behavior data and sample identification results to obtain a plurality of behavior recognition models, wherein the algorithms include at least one of a logistic regression algorithm, a support vector machine algorithm, and a classification and regression tree algorithm.

Specifically, for any business classification, before the characteristic behavior data of the user to be tested is input into the behavior recognition models corresponding to that classification, those models must first be trained on the sample user characteristic behavior data and sample identification results. During training, the initial model is trained with different algorithms to obtain different behavior recognition models, with one model per algorithm. For example, the logistic regression algorithm yields a logistic regression model for behavior recognition, the support vector machine algorithm yields a support vector machine model, and the classification and regression tree algorithm yields a classification and regression tree model.
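The one-model-per-algorithm training described above might look like the following sketch, which uses scikit-learn as a stand-in toolkit (the patent does not name a library). The synthetic two-cluster data, the three unnamed features, the hyperparameters, and the function name `train_behavior_models` are all assumptions made for illustration; scikit-learn's `DecisionTreeClassifier` is used here as the CART implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def train_behavior_models(X, y, seed=0):
    """Train one behavior recognition model per algorithm on the same labelled samples."""
    models = {
        "LR": LogisticRegression(max_iter=1000),
        "SVM": SVC(kernel="rbf", random_state=seed),
        # scikit-learn decision trees implement an optimized version of CART.
        "CART": DecisionTreeClassifier(max_depth=4, random_state=seed),
    }
    for model in models.values():
        model.fit(X, y)
    return models


# Synthetic sample data (assumption): label 0 = normal, label 1 = fraud.
rng = np.random.default_rng(0)
normal = rng.normal(loc=0.0, scale=1.0, size=(100, 3))
fraud = rng.normal(loc=3.0, scale=1.0, size=(100, 3))
X = np.vstack([normal, fraud])
y = np.array([0] * 100 + [1] * 100)

models = train_behavior_models(X, y)

# Each model outputs one candidate identification result for a user to be tested.
user = np.array([[3.0, 3.0, 3.0]])
candidates = {name: int(model.predict(user)[0]) for name, model in models.items()}
```

Training every algorithm on the same samples is only one of the configurations the text allows; the models could equally be trained on different sample sets.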

Here, the logistic regression (LR) algorithm handles regression problems whose dependent variable is categorical, and is commonly used in fields such as data mining, automatic disease diagnosis, and economic forecasting. A support vector machine (SVM) is a supervised learning model, with associated learning algorithms, that analyzes data and recognizes patterns for classification and regression analysis. The classification and regression tree (CART) algorithm is a learning method that, given an input random variable X, learns the conditional probability distribution of the output random variable Y, and can be used for classification or regression.

With the method provided by the embodiments of the present invention, multiple behavior recognition models are trained for each business classification and the data dimensions are fully used, which mitigates the overfitting and limited generalization of a single model, extends the model life cycle, and improves the accuracy of behavior recognition.

Based on any of the above embodiments, step 120 specifically includes: based on the candidate identification result output by each behavior recognition model, counting the number of models whose candidate identification result is fraudulent transaction and the number of models whose candidate identification result is normal transaction; if the number of models reporting fraudulent transaction is greater than the number reporting normal transaction, setting the identification result under that business classification to fraudulent transaction; otherwise, setting it to normal transaction.

Specifically, each candidate identification result is either fraudulent transaction or normal transaction. Fraudulent transaction means that the user to be tested has engaged in fraudulent transaction behavior, and normal transaction means that the user has not. For any business classification, after the candidate identification result output by each behavior recognition model under that classification is obtained, the identification result under the classification is derived with a majority voting function. The candidate identification results are tallied to obtain the number of behavior recognition models that output fraudulent transaction and the number that output normal transaction, and the two counts are compared. If the number of models reporting fraudulent transaction is greater than the number reporting normal transaction, that is, fraudulent transaction receives more votes, the identification result under that business classification is fraudulent transaction under the majority voting principle. If the number of models reporting fraudulent transaction is less than or equal to the number reporting normal transaction, that is, fraudulent transaction receives no more votes than normal transaction, the identification result under that business classification is normal transaction; a tie is therefore resolved as normal transaction.

For example, suppose a business classification contains three behavior recognition models: an LR model, an SVM model, and a CART model. Suppose the LR and SVM models consider the user to be tested fraudulent, that is, their candidate identification results are fraudulent transaction, while the CART model gives the opposite result, that is, normal transaction. By majority voting, the user is judged to have engaged in fraud, and the identification result under that business classification is set to fraudulent transaction.
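The voting rule and the three-model example above can be reproduced with a short sketch. The label strings and the function name `category_result` are illustrative assumptions; the tie-breaking toward normal transaction follows the "otherwise" branch of the rule as stated in the text.

```python
from collections import Counter
from typing import Mapping

FRAUD = "fraud"
NORMAL = "normal"


def category_result(candidates: Mapping[str, str]) -> str:
    """Majority vote over candidate results; a tie is resolved as normal."""
    votes = Counter(candidates.values())
    return FRAUD if votes[FRAUD] > votes[NORMAL] else NORMAL


# LR and SVM report fraud, CART reports normal: two votes to one.
result = category_result({"LR": FRAUD, "SVM": FRAUD, "CART": NORMAL})
print(result)  # fraud

# With an even number of models, a one-to-one tie falls back to normal.
tie = category_result({"LR": FRAUD, "CART": NORMAL})
print(tie)  # normal
```

Using an odd number of models per business classification avoids ties altogether, which is one plausible reason the example uses three.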

Based on any of the above embodiments, step 130 specifically includes: weighting the identification result under each service type by the preset weight corresponding to that service type to obtain the comprehensive identification result.

Here, each service type corresponds to one preset weight, which is the weight assigned in advance to the identification result of that service type.

Based on any of the above embodiments, before step 130 the method further includes: obtaining the recall rate corresponding to each service type, and obtaining the preset weight corresponding to each service type based on its recall rate.

Specifically, the recall rate of a service type is obtained by validating the predicted identification results of that service type after all of its behavior recognition models have been trained. The higher the recall rate, the more important the identification result of that service type, and the higher its preset weight.
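Recall has the standard definition TP / (TP + FN): the share of truly fraudulent users that the service type's identification result flags as fraudulent. The sketch below computes it on a labelled validation set. The patent does not spell out the validation procedure, so the encoding (1 = fraud, 0 = normal), the function name, and the choice to return 0.0 when the validation set contains no fraud cases are assumptions.

```python
from typing import Sequence


def recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Recall = TP / (TP + FN), with 1 = fraud and 0 = normal."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    false_neg = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    positives = true_pos + false_neg
    # Assumption: with no fraud cases to recall, report 0.0 rather than fail.
    return true_pos / positives if positives else 0.0


# Validation labels and the per-category identification results for five users.
labels = [1, 1, 1, 0, 0]
predicted = [1, 0, 1, 1, 0]
r = recall(labels, predicted)  # 2 of the 3 fraud cases are found
```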

Let r1, r2, ..., rn be the recall rates of service type 1, service type 2, ..., service type n, respectively. The weight α corresponding to each service type is defined in terms of these recall rates (the defining formula is missing from this text).

α1, α2, ..., αn are the weights of service type 1, service type 2, ..., service type n, respectively. Based on these weights, the comprehensive identification result F is:

F = α1·f1 + α2·f2 + … + αn·fn;

where F is the probability that the user to be tested has engaged in fraudulent transaction behavior, and f1, f2, ..., fn are the identification results of service type 1, service type 2, ..., service type n, respectively.
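The weighted sum F can be illustrated with the sketch below. Because the formula that defines the weights is not available in this text, the sketch assumes the simplest normalization consistent with "higher recall, higher weight" and with F being a probability, namely αi = ri / (r1 + … + rn). That normalization is an assumption, not the patent's stated definition, and the function names and example numbers are hypothetical.

```python
from typing import List, Sequence


def recall_weights(recalls: Sequence[float]) -> List[float]:
    """ASSUMED weighting: alpha_i = r_i / sum(r), so that the weights sum to 1."""
    total = sum(recalls)
    if total <= 0:
        raise ValueError("at least one recall rate must be positive")
    return [r / total for r in recalls]


def comprehensive_result(results: Sequence[float], alphas: Sequence[float]) -> float:
    """F = alpha_1*f_1 + alpha_2*f_2 + ... + alpha_n*f_n."""
    if len(results) != len(alphas):
        raise ValueError("results and alphas must have the same length")
    return sum(a * f for a, f in zip(alphas, results))


# Recall rates of three service types and their identification results
# (1 = fraudulent transaction, 0 = normal transaction).
alphas = recall_weights([0.9, 0.6, 0.5])      # approximately [0.45, 0.30, 0.25]
F = comprehensive_result([1, 0, 1], alphas)   # approximately 0.70
```

With weights that sum to 1 and identification results in {0, 1}, F stays within [0, 1], which matches its reading as a probability of fraud.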

Based on any of the above embodiments, FIG. 2 is a schematic flowchart of a method for identifying fraudulent transaction behavior provided by another embodiment of the present invention. As shown in FIG. 2, the method is a matrix-type hybrid algorithm in which the horizontal direction is the multi-business dimension and the vertical direction is the multi-algorithm dimension. That is, the method covers multiple business classifications, and each business classification includes multiple behavior recognition models built with different algorithms. For example, business classification 1 includes a behavior recognition model LR1 built with the logistic regression algorithm, a behavior recognition model SVM1 built with a support vector machine, and a behavior recognition model CART1 built with a classification and regression tree.

The characteristic behavior data of the user to be tested is input into every behavior recognition model of every business classification, and the candidate identification result output by each model is obtained. Then, for each business classification, the candidate identification results output by its behavior recognition models are tallied under the majority voting principle to obtain the identification result of that business classification. For example, the candidate identification results output by the models in business classification 1 are tallied to obtain the identification result of business classification 1, namely identification result 1.

The identification results of the service types, namely identification result 1, identification result 2, ..., identification result n, are then weighted by the preset weights corresponding to the service types, namely α1, α2, ..., αn in FIG. 2, to obtain the comprehensive identification result.
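The matrix layout of FIG. 2 (business classifications in one direction, algorithms in the other) can be sketched end to end as a nested mapping. Everything concrete below is hypothetical: the stand-in `threshold_model` replaces real trained models, and the feature names, limits, category names, and weights are invented purely to make the flow runnable.

```python
from typing import Callable, Dict, Mapping, Tuple

Features = Mapping[str, float]
Model = Callable[[Features], int]  # returns 1 for fraud, 0 for normal


def threshold_model(feature: str, limit: float) -> Model:
    """Hypothetical stand-in for a trained model: flags fraud above a limit."""
    return lambda x: 1 if x[feature] > limit else 0


# Business classification -> algorithm -> model (the "matrix").
matrix: Dict[str, Dict[str, Model]] = {
    "e-commerce": {
        "LR1": threshold_model("orders_per_day", 20),
        "SVM1": threshold_model("orders_per_day", 30),
        "CART1": threshold_model("new_accounts", 3),
    },
    "operator": {
        "LR2": threshold_model("calls_per_day", 200),
        "SVM2": threshold_model("calls_per_day", 150),
        "CART2": threshold_model("data_mb_per_day", 5000),
    },
}
alphas = {"e-commerce": 0.6, "operator": 0.4}  # assumed preset weights


def identify(x: Features) -> Tuple[float, Dict[str, int]]:
    """Majority vote inside each classification, then a weighted sum across them."""
    per_category = {}
    for category, models in matrix.items():
        votes = [model(x) for model in models.values()]
        fraud_votes = sum(votes)
        per_category[category] = 1 if fraud_votes > len(votes) - fraud_votes else 0
    score = sum(alphas[c] * r for c, r in per_category.items())
    return score, per_category


user = {"orders_per_day": 25, "new_accounts": 5,
        "calls_per_day": 40, "data_mb_per_day": 300}
score, detail = identify(user)
```

Adding a business classification in this layout only requires a new entry in `matrix` and a matching weight, which reflects the extensibility the embodiment claims.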

The method provided by the embodiments of the present invention covers fraudulent behavior in multiple industries, and industry classifications can be added as required, so the recognition range is wide and the method is highly extensible. In addition, performing behavior recognition across multiple algorithms yields high recognition accuracy: fraudulent users can be identified effectively, the probability of misjudging ordinary users as fraudulent users is reduced, user experience is improved, and the marketing costs caused by misjudgment are saved.

Based on any of the above method embodiments, FIG. 3 is a schematic structural diagram of a fraudulent transaction behavior identification device provided by an embodiment of the present invention. The device includes a model identification unit 310, a business identification unit 320, and a comprehensive identification unit 330;

wherein the model identification unit 310 is configured to, for any business classification, input the characteristic behavior data of the user to be tested into a plurality of behavior recognition models corresponding to that business classification, and obtain the candidate recognition result output by each behavior recognition model; wherein the behavior recognition models are obtained by training based on sample user characteristic behavior data and sample recognition results;

the business identification unit 320 is configured to obtain the recognition result under that business classification based on the candidate recognition result output by each behavior recognition model;

the comprehensive identification unit 330 is configured to obtain a comprehensive recognition result based on the recognition result under each service type.

The device provided by the embodiments of the present invention obtains the recognition result under a single business classification from the candidate recognition results output by each behavior recognition model, which mitigates the overfitting and poor generalization of a single model, prolongs the model life cycle, and improves the accuracy of behavior recognition. In addition, the comprehensive recognition result is obtained from the recognition results under multiple service types, which effectively covers fraudulent behavior in multiple industries, comprehensively uncovers the fields and behaviors prone to fraud, and can guide the remediation of business vulnerabilities.

Based on any of the above embodiments, the business classification includes at least one of e-commerce fraud, Internet financial fraud, telecom operator fraud, and credit loan fraud; the characteristic behavior data of the user to be tested includes at least one of the user's basic information, call activity, data traffic usage behavior, and Internet browsing behavior.

Based on any of the above embodiments, the device further includes a training unit;

the training unit is configured to, for any business classification, train initial models with multiple algorithms respectively, based on the sample user characteristic behavior data and the sample recognition results, to obtain a plurality of the behavior recognition models; wherein the algorithms include at least one of a logistic regression algorithm, a support vector machine algorithm, and a classification and regression tree algorithm.

Based on any of the above embodiments, the business identification unit 320 is specifically configured to:

based on the candidate recognition result output by each behavior recognition model, count the number of models whose candidate recognition result is a fraudulent transaction and the number of models whose candidate recognition result is a normal transaction;

if the number of models whose candidate recognition result is a fraudulent transaction is greater than the number of models whose candidate recognition result is a normal transaction, set the recognition result under that business classification to fraudulent transaction; otherwise, set the recognition result under that business classification to normal transaction.
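The counting rule above can be sketched as below. The string labels `FRAUD` and `NORMAL` and the function name are illustrative choices, not part of the embodiment. Note that, as the rule is worded, a tie falls through to the "otherwise" branch and is treated as a normal transaction.

```python
from typing import Iterable

FRAUD = "fraud"
NORMAL = "normal"


def classify_by_vote(candidates: Iterable[str]) -> str:
    """Return FRAUD only when fraud votes strictly outnumber normal votes."""
    results = list(candidates)
    fraud_count = sum(1 for r in results if r == FRAUD)
    normal_count = sum(1 for r in results if r == NORMAL)
    # A tie (including the case of no votes) is resolved as NORMAL.
    return FRAUD if fraud_count > normal_count else NORMAL


# Hypothetical candidate results from three models of one business classification.
result_1 = classify_by_vote([FRAUD, NORMAL, FRAUD])
```

Using an odd number of models per business classification, as in the LR/SVM/CART example, avoids ties altogether.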

Based on any of the above embodiments, the comprehensive identification unit 330 is specifically configured to: weight the recognition result under each service type by the preset weight corresponding to that service type to obtain the comprehensive recognition result.

Based on any of the above embodiments, the device further includes a weight acquisition unit;

the weight acquisition unit is configured to obtain the recall rate corresponding to each service type, and to obtain the preset weight corresponding to each service type based on the recall rate corresponding to that service type.
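This passage does not say how a recall rate is converted into a preset weight. One plausible mapping, shown purely as an assumption, is to normalize the recall rates so that the weights sum to 1; the function name, service-type names, and recall values below are hypothetical.

```python
from typing import Dict


def weights_from_recall(recall_by_type: Dict[str, float]) -> Dict[str, float]:
    """Assumed mapping: weight of a service type = its recall / sum of recalls."""
    total = sum(recall_by_type.values())
    if total <= 0:
        raise ValueError("at least one recall rate must be positive")
    return {name: recall / total for name, recall in recall_by_type.items()}


# Hypothetical recall rates measured per service type on validation data.
recall_by_type = {"e_commerce": 0.75, "internet_finance": 0.5, "operator": 0.25}
preset_weights = weights_from_recall(recall_by_type)
```

Under this assumption, a service type whose models recover more of the known fraud cases contributes more to the comprehensive recognition result.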

FIG. 4 is a schematic diagram of the physical structure of an electronic device provided by an embodiment of the present invention. As shown in FIG. 4, the electronic device may include a processor 401, a communications interface 402, a memory 403, and a communication bus 404, wherein the processor 401, the communications interface 402, and the memory 403 communicate with one another through the communication bus 404. The processor 401 can call a computer program that is stored in the memory 403 and runnable on the processor 401 to execute the fraudulent transaction behavior identification method provided by the above embodiments, for example including: for any business classification, inputting the characteristic behavior data of the user to be tested into a plurality of behavior recognition models corresponding to that business classification, and obtaining the candidate recognition result output by each behavior recognition model, wherein the behavior recognition models are obtained by training based on sample user characteristic behavior data and sample recognition results; obtaining the recognition result under that business classification based on the candidate recognition result output by each behavior recognition model; and obtaining a comprehensive recognition result based on the recognition result under each service type.

In addition, the logic instructions in the memory 403 may be implemented in the form of software functional units and, when sold or used as an independent product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present invention, in essence, or the part that contributes to the prior art, or a part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Embodiments of the present invention further provide a non-transitory computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements the fraudulent transaction behavior identification method provided by the above embodiments, for example including: for any business classification, inputting the characteristic behavior data of the user to be tested into a plurality of behavior recognition models corresponding to that business classification, and obtaining the candidate recognition result output by each behavior recognition model, wherein the behavior recognition models are obtained by training based on sample user characteristic behavior data and sample recognition results; obtaining the recognition result under that business classification based on the candidate recognition result output by each behavior recognition model; and obtaining a comprehensive recognition result based on the recognition result under each service type.

The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement it without creative effort.

From the description of the above embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by means of software plus a necessary general-purpose hardware platform, and can of course also be implemented by hardware. Based on this understanding, the above technical solutions, in essence, or the part that contributes to the prior art, may be embodied in the form of a software product. The computer software product may be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments or in certain parts of the embodiments.

Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A method of identifying fraudulent transaction activity, comprising:

for any business classification, inputting the characteristic behavior data of a user to be tested into a plurality of behavior recognition models corresponding to the business classification, and obtaining a candidate recognition result output by each behavior recognition model; wherein the behavior recognition models are obtained by training based on sample user characteristic behavior data and sample recognition results;

obtaining a recognition result under the business classification based on the candidate recognition result output by each behavior recognition model;

and obtaining a comprehensive recognition result based on the recognition result under each service type.

2. The method of claim 1, wherein the business classification includes at least one of e-commerce fraud, Internet financial fraud, telecom operator fraud, and credit loan fraud;

the characteristic behavior data of the user to be tested comprises at least one of basic information, call activity, data traffic usage behavior, and Internet browsing behavior of the user to be tested.

3. The method according to claim 1, wherein, before the step of inputting, for any business classification, the characteristic behavior data of the user to be tested into the plurality of behavior recognition models corresponding to the business classification and obtaining the candidate recognition result output by each behavior recognition model, the method further comprises:

for any business classification, training initial models with multiple algorithms respectively, based on the sample user characteristic behavior data and the sample recognition results, to obtain the plurality of behavior recognition models;

wherein the algorithms comprise at least one of a logistic regression algorithm, a support vector machine algorithm, and a classification and regression tree algorithm.

4. The method according to claim 1, wherein obtaining the recognition result under the business classification based on the candidate recognition result output by each behavior recognition model specifically includes:

counting, based on the candidate recognition result output by each behavior recognition model, the number of models whose candidate recognition result is a fraudulent transaction and the number of models whose candidate recognition result is a normal transaction;

if the number of models whose candidate recognition result is a fraudulent transaction is greater than the number of models whose candidate recognition result is a normal transaction, setting the recognition result under the business classification to fraudulent transaction; otherwise, setting the recognition result under the business classification to normal transaction.

5. The method according to claim 1, wherein obtaining the comprehensive recognition result based on the recognition result under each service type specifically includes:

weighting the recognition result under each service type by the preset weight corresponding to that service type to obtain the comprehensive recognition result.

6. The method according to claim 5, wherein, before the step of weighting the recognition result under each service type by the preset weight corresponding to that service type to obtain the comprehensive recognition result, the method further comprises:

obtaining a recall rate corresponding to each service type;

and obtaining the preset weight corresponding to each service type based on the recall rate corresponding to that service type.

7. An apparatus for identifying fraudulent transaction activity, comprising:

a model identification unit, configured to, for any business classification, input the characteristic behavior data of a user to be tested into a plurality of behavior recognition models corresponding to the business classification and obtain a candidate recognition result output by each behavior recognition model; wherein the behavior recognition models are obtained by training based on sample user characteristic behavior data and sample recognition results;

a business identification unit, configured to obtain a recognition result under the business classification based on the candidate recognition result output by each behavior recognition model;

and a comprehensive identification unit, configured to obtain a comprehensive recognition result based on the recognition result under each service type.

8. The apparatus of claim 7, further comprising a training unit;

the training unit is configured to, for any business classification, train initial models with multiple algorithms respectively, based on the sample user characteristic behavior data and the sample recognition results, to obtain the plurality of behavior recognition models; wherein the algorithms comprise at least one of a logistic regression algorithm, a support vector machine algorithm, and a classification and regression tree algorithm.

9. An electronic device, comprising a processor, a communication interface, a memory and a bus, wherein the processor, the communication interface and the memory communicate with each other via the bus, and the processor can call logic instructions in the memory to execute the method according to any one of claims 1 to 6.

10. A non-transitory computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, carries out the method according to any one of claims 1 to 6.