
CN115659276A - Method and device for positioning abnormity, storage medium and electronic equipment - Google Patents



Publication number
CN115659276A
Authority
CN
China
Prior art keywords
data
dimension
combination
dimension value
detected
Legal status
Granted
Application number
CN202211275355.2A
Other languages
Chinese (zh)
Other versions
CN115659276B (en)
Inventor
陈超宇
余航
李建国
Current Assignee
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Events
Application filed by Alipay Hangzhou Information Technology Co Ltd
Priority to CN202211275355.2A
Publication of CN115659276A
Application granted
Publication of CN115659276B
Legal status: Active


Abstract


This specification discloses a method, device, storage medium and electronic equipment for anomaly localization. Data to be detected is obtained, and the dimension values contained in different data dimensions are combined with one another to obtain dimension value combinations. Then, for each dimension value combination, a degree of influence and a degree of discrimination are determined. The degree of influence characterizes the extent to which the data hit by the combination in the data to be detected causes the data to be detected to become abnormal; the degree of discrimination characterizes the difference between the actual distribution of the data hit by the combination in the data to be detected and the standard data distribution determined for that combination. A target dimension value combination is determined according to the degrees of influence and discrimination of the dimension value combinations, and anomaly localization is performed according to it, which improves the comprehensiveness and accuracy of locating the cause of an anomaly.

Figure 202211275355


Description

A method, device, storage medium and electronic equipment for anomaly localization

Technical field

This specification relates to the field of computer technology, and in particular to a method, device, storage medium and electronic equipment for anomaly localization.

Background

In practice there are many statistical indicators, such as the number of transactions or the number of page clicks, and an indicator can often be counted along several data dimensions. For example, the number of transactions can be counted by region, merchant, computer room, and so on. When a statistical indicator becomes abnormal, the cause usually has to be identified so that the problem can be repaired promptly and leakage of private data prevented, for example by determining which computer room has failed, or which region's data is responsible for the abnormal indicator.

In the prior art, the process of determining the possible cause of an abnormality in the total of a statistical indicator usually ends by locating a single dimension value under one dimension.

For example, when counting transactions, the platform can count the total number of transactions as well as the number of transactions for each dimension value. The region dimension may contain values such as region A and region B, and the computer-room dimension may contain values such as computer room 1 and computer room 2. The business platform can count the transactions generated in region A, and also the transactions supported by computer room 1. When the total number of transactions becomes abnormal (for example, a sudden surge or drop), the prior art can locate one computer room under the computer-room dimension as a possible cause of the abnormal total.

In reality, however, an abnormal total may be caused jointly by factors from several dimensions; for example, the cause may have to be located to a particular region together with a particular computer room. Troubleshooting in the prior-art manner may therefore be inaccurate, or may fail to identify all the factors that could cause the abnormality.

How to improve the accuracy and comprehensiveness of investigating the causes of anomalies is therefore an urgent problem.

Summary of the invention

This specification provides a method, device, storage medium and electronic equipment for anomaly localization, in order to improve the accuracy of anomaly detection.

This specification adopts the following technical solutions:

This specification provides a method for anomaly localization, including:

obtaining data to be detected, and combining the dimension values contained in different data dimensions with one another to obtain dimension value combinations, where each dimension value combination contains at least two dimension values and each dimension value comes from a different data dimension;

for each dimension value combination, determining, according to the data to be detected, a degree of influence and a degree of discrimination for that combination, where the degree of influence characterizes how much the abnormality of the data hit by the combination in the data to be detected contributes to the abnormality of the full set of data to be detected, and the degree of discrimination characterizes how much the actual data distribution of the data hit by the combination in the data to be detected differs from the standard data distribution corresponding to the combination;

determining a target dimension value combination according to the degrees of influence and discrimination of the dimension value combinations, and performing anomaly localization according to the target dimension value combination.

Optionally, determining the degree of influence of the dimension value combination according to the data to be detected includes:

predicting the data that should have been generated in the time period in which the data to be detected was generated, as first predicted data, and predicting the data that the dimension value combination should have generated in that time period, as second predicted data;

determining, according to the first predicted data, the amount of abnormal data within the data to be detected, as a first abnormal data amount, and determining, according to the second predicted data, the amount of abnormal data within the data hit by the dimension value combination in the data to be detected, as a second abnormal data amount;

determining the degree of influence of the dimension value combination according to the proportion of the first abnormal data amount accounted for by the second abnormal data amount.
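The influence computation above reduces to a ratio of two deviations. The following is a minimal sketch, assuming that each abnormal data amount is the signed difference between the predicted and the actual volume; the function name `impact_degree` and the fallback for a zero overall deviation are illustrative assumptions, not part of the specification. The example reuses the (region A, operator 1) figures from the detailed description.

```python
def impact_degree(actual_combo: float, predicted_combo: float,
                  actual_total: float, predicted_total: float) -> float:
    """Degree of influence of one dimension value combination (sketch).

    Assumes abnormal amount = predicted - actual (signed), so a surge and a
    drop both give a positive ratio when the combination moves with the total.
    """
    # First abnormal data amount: deviation of the whole data to be detected.
    first_abnormal = predicted_total - actual_total
    if first_abnormal == 0:
        # No overall deviation to attribute; returning 0.0 is an assumption.
        return 0.0
    # Second abnormal data amount: deviation of the data hit by the combination.
    second_abnormal = predicted_combo - actual_combo
    return second_abnormal / first_abnormal


# (region A, operator 1): actual 50 vs. predicted 100; total: actual 500 vs. predicted 700
print(impact_degree(50, 100, 500, 700))  # 0.25
```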

Optionally, determining the degree of discrimination of the dimension value combination according to the data to be detected includes:

predicting the data that should have been generated in the time period in which the data to be detected was generated, as first predicted data, and predicting the data that the dimension value combination should have generated in that time period, as second predicted data;

determining the actual data distribution according to the proportion of the data to be detected accounted for by the data hit by the dimension value combination, and determining the standard data distribution according to the proportion of the first predicted data accounted for by the second predicted data;

determining the degree of discrimination according to the actual data distribution and the standard data distribution.
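The specification does not say how the difference between the actual and the standard distribution is measured. The sketch below assumes one common choice from multidimensional root-cause analysis, the Jensen-Shannon-style term 0.5·(p·ln(2p/(p+q)) + q·ln(2q/(p+q))), where p is the combination's actual share and q its standard share. Both the measure and the names `discrimination` and `_kl_term` are assumptions made for illustration.

```python
import math


def _kl_term(x: float, m: float) -> float:
    """x * ln(x / m), using the convention 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x / m)


def discrimination(actual_combo: float, actual_total: float,
                   predicted_combo: float, predicted_total: float) -> float:
    """Degree of discrimination of one combination (sketch).

    Compares the actual share p with the standard share q using a
    Jensen-Shannon-style term; the choice of measure is an assumption.
    """
    p = actual_combo / actual_total        # actual data distribution
    q = predicted_combo / predicted_total  # standard data distribution
    m = (p + q) / 2
    if m == 0:
        return 0.0  # combination absent in both actual and predicted data
    return 0.5 * (_kl_term(p, m) + _kl_term(q, m))


print(discrimination(10, 100, 10, 100))  # 0.0 (identical shares)
print(discrimination(50, 500, 100, 700))  # small positive value
```

The term is zero when the two shares coincide and grows as they diverge, which matches the specification's reading of a larger difference as a larger departure from the ideal case.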

Optionally, predicting the data that should have been generated in the time period in which the data to be detected was generated, as first predicted data, and predicting the data that the dimension value combination should have generated in that time period, as second predicted data, includes:

obtaining data from time periods adjacent to that time period, as reference data;

obtaining the first predicted data and the second predicted data according to the reference data.
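The specification only states that both predictions are derived from reference data in adjacent time periods. A minimal sketch, assuming the prediction is a plain average over those periods (a real deployment might use a seasonal or time-series model instead); `predict_from_reference` and its parameters are hypothetical:

```python
from statistics import mean
from typing import Dict, List, Tuple

Combo = Tuple[str, ...]  # a dimension value combination


def predict_from_reference(
    reference_totals: List[float],
    reference_by_combo: Dict[Combo, List[float]],
) -> Tuple[float, Dict[Combo, float]]:
    """Forecast expected volumes from adjacent time periods (sketch).

    reference_totals: overall volume in each adjacent period (reference data).
    reference_by_combo: volume of each dimension value combination in the
        same adjacent periods.
    Simple averaging is an assumption, not the specified model.
    """
    first_predicted = mean(reference_totals)
    second_predicted = {combo: mean(vals) for combo, vals in reference_by_combo.items()}
    return first_predicted, second_predicted


first, second = predict_from_reference(
    [680, 700, 720],
    {("region A", "operator 1"): [90, 100, 110]},
)
print(first, second)
```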

Optionally, determining the target dimension value combination according to the degrees of influence and discrimination of the dimension value combinations includes:

for each determined dimension combination, screening candidate dimension value combinations under that dimension combination according to the degrees of influence of the dimension value combinations it contains, where a dimension combination includes at least two data dimensions;

determining the target dimension value combination according to the degrees of discrimination of the candidate dimension value combinations.

Optionally, screening candidate dimension value combinations under the dimension combination according to the degrees of influence of the dimension value combinations it contains includes:

sorting the dimension value combinations contained in the dimension combination in descending order of degree of discrimination and, in that order, adding each dimension value combination whose degree of influence is not less than a determined preset degree of influence to a set corresponding to the dimension combination, until the total degree of influence of the combinations added to the set is not less than a determined preset total, thereby obtaining a target set for the dimension combination;

taking the dimension value combinations in the target set as the candidate dimension value combinations under the dimension combination.
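The screening loop described above can be sketched as follows for a single dimension combination. `ComboScore`, `screen_candidates`, and the numeric scores are hypothetical; `preset_impact` and `preset_total` stand for the preset degree of influence and the preset total, whose derivation is left open here. The specification does not say what happens if the preset total is never reached, so the sketch simply returns what it collected (an assumption).

```python
from typing import List, NamedTuple, Tuple


class ComboScore(NamedTuple):
    combo: Tuple[str, ...]  # a dimension value combination
    impact: float           # degree of influence
    discrimination: float   # degree of discrimination


def screen_candidates(scores: List[ComboScore], preset_impact: float,
                      preset_total: float) -> List[ComboScore]:
    """Build the target set for ONE dimension combination (sketch)."""
    target_set: List[ComboScore] = []
    accumulated = 0.0
    # Visit combinations in descending order of discrimination.
    for score in sorted(scores, key=lambda s: s.discrimination, reverse=True):
        if score.impact >= preset_impact:
            target_set.append(score)
            accumulated += score.impact
            # Stop once the accumulated influence reaches the preset total.
            if accumulated >= preset_total:
                break
    return target_set


# Hypothetical scores for the dimension combination (region, operator)
scores = [
    ComboScore(("region A", "operator 1"), impact=0.25, discrimination=0.020),
    ComboScore(("region A", "operator 2"), impact=0.05, discrimination=0.050),
    ComboScore(("region B", "operator 1"), impact=0.60, discrimination=0.010),
    ComboScore(("region B", "operator 2"), impact=0.10, discrimination=0.001),
]
print([s.combo for s in screen_candidates(scores, preset_impact=0.1, preset_total=0.8)])
# [('region A', 'operator 1'), ('region B', 'operator 1')]
```

Because the loop is ordered by discrimination, a combination with enough influence but low discrimination is only admitted if the more discriminative combinations before it have not already covered the preset total.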

Optionally, determining the preset degree of influence and the preset total includes:

predicting the data that should have been generated in the time period in which the data to be detected was generated, as first predicted data, and determining, according to the first predicted data, the amount of abnormal data within the data to be detected, as a first abnormal data amount;

determining the preset degree of influence and the preset total according to the first abnormal data amount.

This specification provides a device for anomaly localization, including:

an acquisition module, configured to obtain data to be detected and to combine the dimension values contained in different data dimensions with one another to obtain dimension value combinations, where each dimension value combination contains at least two dimension values and each dimension value comes from a different data dimension;

a determination module, configured to determine, for each dimension value combination and according to the data to be detected, the degree of influence and the degree of discrimination of that combination, where the degree of influence characterizes how much the abnormality of the data hit by the combination in the data to be detected contributes to the abnormality of the full set of data to be detected, and the degree of discrimination characterizes how much the actual data distribution of the data hit by the combination differs from the standard data distribution corresponding to the combination;

a localization module, configured to determine a target dimension value combination according to the degrees of influence and discrimination of the dimension value combinations, and to perform anomaly localization according to the target dimension value combination.

This specification provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above method for anomaly localization.

This specification provides an electronic device including a memory, a processor, and a computer program stored in the memory and executable on the processor; the processor implements the above method for anomaly localization when executing the program.

At least one of the above technical solutions adopted in this specification can achieve the following beneficial effects:

In the anomaly localization method provided in this specification, data to be detected is obtained, and the dimension values contained in different data dimensions are combined into dimension value combinations, each containing at least two dimension values from different data dimensions. Then, for each dimension value combination, its degree of influence and degree of discrimination are determined according to the data to be detected. The degree of influence characterizes how much the abnormality of the data hit by the combination contributes to the abnormality of the full set of data to be detected; the degree of discrimination characterizes how much the actual distribution of the data hit by the combination in the data to be detected differs from the standard data distribution determined for that combination. A target dimension value combination is determined from the degrees of influence and discrimination, and anomaly localization is performed according to it.

As can be seen from the above, the method combines the degree of influence and the degree of discrimination of each dimension value combination to determine which combinations may have caused the abnormality in the data to be detected. Whereas the prior art usually locates only a single factor, this solution can locate the cause more comprehensively and accurately to several dimension value combinations, that is, to combinations of factors across multiple dimensions, which makes anomaly localization easier for the technical staff involved.

Brief description of the drawings

The drawings described here provide a further understanding of this specification and form part of it. The illustrative embodiments of this specification and their descriptions serve to explain this specification and do not unduly limit it. In the drawings:

FIG. 1 is a schematic flowchart of a method for anomaly localization in this specification;

FIG. 2 is a schematic diagram, provided in this specification, of screening dimension value combinations, taking the number of transactions as an example;

FIG. 3 is a schematic diagram of a device for anomaly localization provided in this specification;

FIG. 4 is a schematic diagram of an electronic device corresponding to FIG. 1 provided in this specification.

Detailed description

To make the purpose, technical solutions and advantages of this specification clearer, the technical solutions are described clearly and completely below with reference to specific embodiments and the corresponding drawings. The described embodiments are only some, not all, of the embodiments of this specification. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of this specification.

The technical solutions provided by the embodiments of this specification are described in detail below with reference to the drawings.

FIG. 1 is a schematic flowchart of a method for anomaly localization in this specification, which specifically includes the following steps:

S100: Obtain the data to be detected, and combine the dimension values contained in different data dimensions with one another to obtain dimension value combinations, where each dimension value combination contains at least two dimension values and each dimension value comes from a different data dimension.

In many business scenarios a business platform generates large amounts of data, and statistical indicators can be used to monitor whether the business is running normally. A statistical indicator here can be any indicator that measures business volume in a business scenario: for example, the number of transactions on the business platform, or the number of page clicks that users make on the business platform.

In some cases the business platform needs to determine what caused a fluctuation in the total of a statistical indicator. For example, if the total becomes abnormal, such as a sudden sharp rise or fall, the platform needs to locate the possible causes as quickly as possible. The method can equally be applied when the total changes by some amount (not necessarily abnormally) and the platform's staff need to find the cause of the change.

On this basis, the business platform can obtain the data to be detected and combine the dimension values contained in different data dimensions to obtain dimension value combinations, where each combination contains at least two dimension values, each from a different data dimension. A data dimension is a dimension used when counting the data to be detected; in other words, dimension values from several different data dimensions are combined to form a dimension value combination.

The concept of a dimension combination can also be introduced here. A dimension combination can contain several different data dimensions, and each dimension value combination under a dimension combination consists of one dimension value from each data dimension in that dimension combination; different dimension value combinations differ in at least some of their dimension values.
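Enumerating dimension combinations and the dimension value combinations under them amounts to taking subsets of the data dimensions and then the Cartesian product of their values. A sketch, assuming the dimension values are known in advance (here the values listed in Table 1 below); the function name `dimension_value_combinations` is hypothetical:

```python
from itertools import combinations, product
from typing import Dict, Iterator, List, Tuple


def dimension_value_combinations(
    dim_values: Dict[str, List[str]], min_size: int = 2
) -> Iterator[Tuple[Tuple[str, ...], Tuple[str, ...]]]:
    """Yield (dimension combination, dimension value combination) pairs.

    Each dimension value combination takes exactly one value from every data
    dimension in its dimension combination. min_size=2 matches "at least two
    dimension values"; min_size=1 also yields single-dimension combinations.
    """
    dims = list(dim_values)
    for size in range(min_size, len(dims) + 1):
        for dim_combo in combinations(dims, size):
            for values in product(*(dim_values[d] for d in dim_combo)):
                yield dim_combo, values


# Dimension values taken from Table 1
dims = {
    "region": ["region A", "region B"],
    "operator": ["operator 1", "operator 2", "operator 3"],
    "data center": ["data center 1", "data center 2", "data center 3"],
    "payment method": ["payment method 1", "payment method 2"],
}
print(sum(1 for _ in dimension_value_combinations(dims)))  # 133
```

Exhaustive enumeration grows multiplicatively with the number of values per dimension, so in practice the search space may need pruning, for example by only keeping combinations that actually occur in the data.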

The data to be detected can be business data related to a statistical indicator over a period of time. For example, when the statistical indicator is the number of transactions, the data to be detected can be the transaction data in the time period during which the transaction total is abnormal.

The data dimensions above can be dimensions defined manually for the data to be detected, or at least some of the dimensions related to the statistical indicator in the business scenario. Taking the number of transactions as the statistical indicator again, the data dimensions are illustrated in Table 1 below.

| Time | Region | Operator | Data center | Payment method | Number of transactions |
| --- | --- | --- | --- | --- | --- |
| 10:00 | Region A | Operator 1 | Data center 1 | Payment method 1 | 2 |
| 10:00 | Region A | Operator 2 | Data center 2 | Payment method 2 | 3 |
| 10:01 | Region B | Operator 3 | Data center 3 | Payment method 2 | 5 |

Table 1

As Table 1 shows, the data dimensions for the number-of-transactions indicator can include four dimensions: region, operator, data center and payment method. The region dimension contains values such as region A and region B; the operator dimension contains operator 1, operator 2, operator 3 and so on; the data center dimension contains data center 1, data center 2 and data center 3; and the payment method dimension contains payment method 1 and payment method 2. The data in Table 1 can be the data to be detected when monitoring the number of transactions; that is, the data to be detected is data related to a statistical indicator within a certain time period. In this example the number of transactions is counted directly by region, operator, data center and payment method. At 10:00, 2 transactions occurred in which users in region A paid with payment method 1 over operator 1, supported by data center 1 of the business platform, and 3 transactions occurred in which users in region A paid with payment method 2 over operator 2, supported by data center 2.

Still taking Table 1 as an example, and assuming that region, operator, data center and payment method are all the data dimensions used to count transactions, any several of these data dimensions can be combined into a dimension combination, and a transaction count can be obtained for every dimension value combination under that dimension combination. For example, combining the region and operator dimensions gives the dimension combination (region, operator), under which there can be dimension value combinations such as (region A, operator 1), (region A, operator 2), (region B, operator 1) and (region B, operator 2). The dimension values under a single dimension can also be treated as dimension value combinations, and the dimension values responsible for the abnormal indicator can be determined by computing the degrees of influence and discrimination described below. That is, (region) on its own can be regarded as a dimension combination, as can (operator), and a dimension value combination under the region dimension combination can be any single value of the region dimension, such as (region A) or (region B).
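The per-combination counts from which the degrees of influence and discrimination are computed can be obtained by grouping the rows of Table 1 on the dimensions of a dimension combination. A minimal sketch; the row layout and the name `counts_by_combination` are illustrative assumptions, and the time column is ignored on the assumption that all rows belong to the window being checked:

```python
from collections import Counter
from typing import List, Sequence

# Rows of Table 1 (time column omitted; all rows assumed to be in the checked window)
TABLE_1: List[dict] = [
    {"region": "region A", "operator": "operator 1", "data center": "data center 1",
     "payment method": "payment method 1", "transactions": 2},
    {"region": "region A", "operator": "operator 2", "data center": "data center 2",
     "payment method": "payment method 2", "transactions": 3},
    {"region": "region B", "operator": "operator 3", "data center": "data center 3",
     "payment method": "payment method 2", "transactions": 5},
]


def counts_by_combination(rows: List[dict], dims: Sequence[str]) -> Counter:
    """Sum the transaction count for every dimension value combination over `dims`."""
    totals: Counter = Counter()
    for row in rows:
        # The dimension value combination this row hits, e.g. ("region A", "operator 1")
        key = tuple(row[d] for d in dims)
        totals[key] += row["transactions"]
    return totals


print(dict(counts_by_combination(TABLE_1, ("region", "operator"))))
# {('region A', 'operator 1'): 2, ('region A', 'operator 2'): 3, ('region B', 'operator 3'): 5}
```

Passing a single dimension, such as `("region",)`, gives the single-dimension combinations mentioned above.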

S102: For each dimension value combination, determine, according to the data to be detected, the degree of influence and the degree of discrimination of that combination. The degree of influence characterizes how much the abnormality of the data hit by the combination in the data to be detected contributes to the abnormality of the full set of data to be detected; the degree of discrimination characterizes how much the actual data distribution of the data hit by the combination differs from the standard data distribution corresponding to the combination.

S103: Determine the target dimension value combination according to the degrees of influence and discrimination of the dimension value combinations, and perform anomaly localization according to the target dimension value combination.

Once the dimension value combinations have been determined, it is necessary to locate which of them may have caused the abnormality (or fluctuation) in the data to be detected; these are the target dimension value combinations. Anomaly localization, that is, locating the cause of the abnormality, can then be performed on the basis of the target dimension value combinations, since the combinations that are output are the likely causes. For example, if (region A, operator 1), (data center 1, payment method 1) and (operator 2, data center 1) are determined to be target dimension value combinations, they may be what caused the abnormality in the data to be detected, and further operations such as confirming the cause or repairing the fault can be carried out on the basis of these combinations.

Specifically, for each dimension value combination, the degree of influence and the degree of discrimination are determined from the data to be detected. The degree of influence characterizes how much the abnormality of the data hit by the combination affects the full set of data to be detected; the degree of discrimination characterizes how much the actual data distribution of the data hit by the combination differs from the standard data distribution corresponding to the combination. The target dimension value combination is then determined from these two measures, and anomaly localization is performed according to it.

The degree of influence can be calculated as the share of the total abnormal amount of the data to be detected that arises under the dimension value combination. For example, suppose the combination (region A, operator 1) actually produced 50 transactions, whereas under ideal conditions (that is, with no abnormality) it should have produced 100. Its own abnormal amount is then 100 - 50 = 50. Suppose further that counting the data to be detected gives 500 transactions in total, whereas 700 would have been produced under ideal conditions, so the total abnormal amount is 200. The degree of influence of (region A, operator 1) is then 50/200 = 25%.
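The arithmetic in this example can be checked with a short Python sketch. The function name `influence` is illustrative and not part of the specification; it simply evaluates the share of the total abnormal amount attributable to one combination, using signed differences so that a drop in both the combination and the total still gives a positive share.

```python
def influence(actual_ij: float, expected_ij: float,
              actual_total: float, expected_total: float) -> float:
    """Share of the total abnormal amount that arises under one combination."""
    total_abnormal = actual_total - expected_total
    if total_abnormal == 0:
        raise ValueError("no total abnormality: degree of influence is undefined")
    return (actual_ij - expected_ij) / total_abnormal


# (region A, operator 1): 50 actual vs. 100 expected transactions.
# Whole data to be detected: 500 actual vs. 700 expected transactions.
ep = influence(50, 100, 500, 700)
print(f"{ep:.0%}")  # 25%
```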

The degree of discrimination indirectly indicates how likely it is that the data under a dimension value combination is abnormal. As defined above, it is the degree of difference between the actual distribution of the data hit by the combination within the data to be detected and the combination's standard data distribution. The actual data distribution indirectly reflects the probability that data under the combination actually appears in the data to be detected, while the standard data distribution is the probability that data produced by the combination would appear in the overall data under ideal conditions. For example, if the statistical indicator of the data to be detected is the number of transactions, the actual data distribution is the share of the total transaction volume actually contributed by the combination, and the standard data distribution is the share it would contribute under standard conditions. The larger the difference between the two distributions, the further the transaction count under the combination deviates from the ideal; the smaller the difference, the closer it is to the ideal. The degree of discrimination can therefore also be used to decide whether a dimension value combination is a target dimension value combination.

For a given dimension value combination, both the degree of influence and the degree of discrimination can be computed by first predicting the data that should have been generated during the time period in which the data to be detected was generated (the first predicted data), and the data that the combination should have generated during that same period (the second predicted data).

The first predicted data gives the standard total business volume $F$, and the second predicted data gives the combination's standard business volume $F_{ij}$, where $ij$ denotes the $j$-th dimension value combination under the $i$-th dimension combination. The data to be detected gives the actual total business volume $A$, and the portion of it that corresponds to the combination gives the combination's actual business volume $A_{ij}$.

Here, the standard total business volume $F$ is the predicted total business volume (such as the number of transactions mentioned above) that the data to be detected would have under ideal conditions, and the standard business volume $F_{ij}$ is the predicted business volume that would be counted under the combination under ideal conditions.

To obtain the first and second predicted data, business data from a time period adjacent to the one covered by the data to be detected, or business data from a preset time period, can be obtained as comparison data, and the predicted total and the predicted per-combination business volume are forecast from it.
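The specification does not fix a forecasting model. As one simple assumption, the sketch below averages each combination's volume over several reference windows (adjacent or preset periods). `forecast_from_reference` and its input layout are hypothetical, and the windows are assumed to cover one dimension combination whose value combinations are mutually exclusive and exhaustive, so the per-combination forecasts add up to the total.

```python
from statistics import mean


def forecast_from_reference(reference_windows):
    """Moving-average forecast (an assumed model, for illustration only).

    reference_windows: list of dicts, one per reference time window, mapping a
    dimension value combination (tuple) to its business volume in that window.
    Returns (per-combination forecast F_ij, total forecast F).
    """
    keys = set().union(*reference_windows)
    f_ij = {k: mean(w.get(k, 0) for w in reference_windows) for k in keys}
    return f_ij, sum(f_ij.values())


windows = [
    {("region 1", "operator 1"): 100, ("region 1", "operator 2"): 300},
    {("region 1", "operator 1"): 110, ("region 1", "operator 2"): 290},
]
f_ij, f_total = forecast_from_reference(windows)
print(f_ij[("region 1", "operator 1")], f_total)
```

A production system would more likely use a seasonal or trend-aware model; the averaging here only shows where the reference data enters.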

Then, based on the first predicted data, the amount of abnormal data within the data to be detected is determined as the first abnormal data amount; based on the second predicted data, the amount of abnormal data within the portion of the data to be detected that corresponds to the combination is determined as the second abnormal data amount. The degree of influence of the combination is the ratio of the second abnormal data amount to the first.

In other words, the combination's second abnormal data amount is determined from the standard business volume $F_{ij}$ obtained from the second predicted data and the actual business volume $A_{ij}$ obtained from the data in the data to be detected that corresponds to the combination; the first abnormal data amount is determined from the predicted total business volume $F$ and the actual total business volume $A$ of the data to be detected; and the degree of influence is the proportion of the second abnormal data amount in the first.

The actual data distribution can be determined from the proportion of the data to be detected that is hit by the combination, and the standard data distribution from the proportion of the first predicted data accounted for by the second predicted data; the degree of discrimination is then determined from the actual and standard data distributions.

That is, the actual data distribution is the proportion of the actual total business volume $A$ accounted for by the combination's actual business volume $A_{ij}$, and the standard data distribution is the proportion of the standard total business volume $F$ accounted for by the standard business volume $F_{ij}$. The combination's degree of discrimination is then determined from these two distributions; specifically, it can be computed as a Jensen-Shannon (JS) divergence.

The degree of influence can be calculated as follows:

$$EP_{ij} = \frac{A_{ij} - F_{ij}}{A - F}$$

The degree of discrimination can be calculated as follows:

$$p_{ij} = \frac{F_{ij}}{F}$$

$$q_{ij} = \frac{A_{ij}}{A}$$

$$S_{ij} = \frac{1}{2}\left[p_{ij}\log\frac{2p_{ij}}{p_{ij}+q_{ij}} + q_{ij}\log\frac{2q_{ij}}{p_{ij}+q_{ij}}\right]$$

where $EP_{ij}$ is the degree of influence, $S_{ij}$ the degree of discrimination, $p_{ij}$ the standard data distribution, and $q_{ij}$ the actual data distribution.
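A minimal Python sketch of these formulas, assuming the per-combination JS-divergence term for $S_{ij}$, the natural logarithm (the base only rescales $S$), and the convention $0 \cdot \log 0 = 0$ for a combination with no traffic. The helper names `js_term` and `metrics` are illustrative.

```python
import math


def js_term(p: float, q: float) -> float:
    """Per-combination JS-divergence term S_ij, with 0 * log(0) taken as 0."""
    m = (p + q) / 2  # log(2p / (p + q)) == log(p / m)
    s = 0.0
    if p > 0:
        s += p * math.log(p / m)
    if q > 0:
        s += q * math.log(q / m)
    return 0.5 * s


def metrics(a_ij: float, f_ij: float, a_total: float, f_total: float):
    """Return (EP_ij, p_ij, q_ij, S_ij); assumes a_total != f_total."""
    ep = (a_ij - f_ij) / (a_total - f_total)
    p = f_ij / f_total  # standard data distribution
    q = a_ij / a_total  # actual data distribution
    return ep, p, q, js_term(p, q)


ep, p, q, s = metrics(50, 100, 500, 700)
print(f"EP={ep:.2f} p={p:.3f} q={q:.3f} S={s:.5f}")
```

$S_{ij}$ is zero exactly when $p_{ij} = q_{ij}$, so a combination whose share of traffic did not move scores no discrimination regardless of its size.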

Once the degrees of influence and discrimination of the dimension value combinations have been determined, the target dimension value combinations can be determined from them.

Both the degree of influence and the degree of discrimination must be considered when determining the target combinations, for two reasons. First, a combination with a high degree of influence is not necessarily a cause of the abnormality. For example, the transaction count under (Beijing) and those under (Beijing, China Unicom), (Beijing, China Mobile), (Beijing, QR code transfer), and (Beijing, China Unicom, QR code transfer) affect one another, so the degrees of influence of combinations that share a dimension value are interdependent; when several of them are high, influence alone may not reveal which combinations are the likely causes. Second, a combination with a high degree of discrimination but a low degree of influence may also not be a cause. For example, if a combination has a degree of discrimination of 80% but a degree of influence of only 0.2%, then even if it really is abnormal, it is not what caused the abnormality.

There are several ways to determine the target combinations. For example, every combination whose degree of influence and degree of discrimination both exceed certain values can be taken as a target combination. However, this requires computing both quantities for all combinations at once, which may be inefficient.

A method that improves efficiency to some extent is therefore given here: for each determined dimension combination, candidate dimension value combinations under it are screened out according to the degrees of influence of the dimension value combinations it contains, and the target combinations are then determined from the degrees of discrimination of the screened candidates. Here, a dimension combination includes at least one data dimension.

Specifically, during screening, the dimension value combinations contained in the dimension combination are sorted by degree of discrimination in descending order. In that order, each combination whose degree of influence is not less than a determined preset degree of influence is added to the set for the dimension combination, until the total degree of influence of the combinations in the set is not less than a determined preset total. The resulting set is the target set of the dimension combination, and the combinations in it are the candidate dimension value combinations under that dimension combination.
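The screening step for one dimension combination can be sketched in Python as follows. The function and parameter names are hypothetical, and the EP and S values in the usage lines are made up. If the running total never reaches the preset total, this sketch simply returns the combinations accepted so far; whether such a set should instead be discarded is not stated in the text and is left as an assumption.

```python
def screen_dimension_combination(combos, min_influence, min_total):
    """Screen candidate dimension value combinations for one dimension combination.

    combos: iterable of (value_combination, influence EP, discrimination S).
    min_influence: preset degree of influence (first threshold).
    min_total: preset total degree of influence (second threshold).
    """
    target_set, total = [], 0.0
    # Visit combinations from the most to the least discriminative.
    for value_combo, ep, _s in sorted(combos, key=lambda c: c[2], reverse=True):
        if ep >= min_influence:
            target_set.append(value_combo)
            total += ep
            if total >= min_total:
                break  # the remaining combinations are excluded
    return target_set


region = [(("region 1",), 0.8, 0.05), (("region 2",), 0.2, 0.01)]
print(screen_dimension_combination(region, 0.1, 0.67))  # [('region 1',)]
```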

In other words, this method screens the candidate combinations one dimension combination at a time. Because the dimension value combinations in one dimension combination are mutually exclusive, the business volumes under them add up to the total of the statistical indicator of the data to be detected, so their degrees of influence also add up. After the combinations are sorted by degree of discrimination in descending order, checking from the most discriminative one whether each combination's degree of influence reaches the first threshold (the preset degree of influence) excludes combinations with a small influence. Once the total degree of influence of the set reaches a certain value (the second threshold, that is, the preset total), the combinations not yet added are excluded as well. This amounts to filtering out combinations that have both a small influence and a low discrimination.

Mutually exclusive means that, within one dimension combination, the data under its different dimension value combinations do not intersect. For example, for the dimension combination (operator, region), the dimension value combinations may include (operator 1, region 1), (operator 2, region 1), (operator 1, region 2), and (operator 2, region 2). The transaction data under these combinations do not overlap, and if they are all of the dimension value combinations under that dimension combination, the sum of their transaction volumes equals the total transaction volume of the data to be detected.
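Because the combinations in one dimension combination partition the data, their signed abnormal amounts add up to $A - F$, so their degrees of influence sum to 1. The volumes below are made up purely to illustrate this property in Python.

```python
# Hypothetical (actual, expected) transaction counts for the partition
# induced by the dimension combination (operator, region).
partition = {
    ("operator 1", "region 1"): (40, 100),
    ("operator 2", "region 1"): (150, 200),
    ("operator 1", "region 2"): (110, 150),
    ("operator 2", "region 2"): (200, 250),
}
A = sum(actual for actual, _ in partition.values())      # actual total: 500
F = sum(expected for _, expected in partition.values())  # expected total: 700
ep = {k: (a - f) / (A - F) for k, (a, f) in partition.items()}
print(round(sum(ep.values()), 10))  # 1.0
```

This additivity is what makes a cumulative degree-of-influence threshold meaningful within one dimension combination.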

This way of screening dimension value combinations is similar to the one shown in FIG. 2 of this specification.

FIG. 2 is a schematic diagram of a method for screening dimension value combinations provided by this specification, taking the number of transactions as an example.

As FIG. 2 shows, when the total transaction volume is abnormal, the dimension value combinations under each dimension combination can be checked in turn to see whether any of them may be a target combination. In FIG. 2, the combinations under the region dimension (region 1, region 2) are checked first, then those under the machine room dimension (machine room 1, machine room 2), and finally those under the payment method dimension (payment method 1, payment method 2). The figure shows that the transaction volumes under machine room 1 and under payment method 1 may both be abnormal, so the next step is to check whether the dimension combination (machine room, payment method) contains a target combination.

Therefore, in practice, when screening candidate combinations, the dimension value combinations under dimension combinations with one dimension can be screened first, and the number of dimensions is then increased step by step: dimension combinations with two dimensions next, then three, and so on. If, when screening a single-dimension dimension combination, none of its dimension value combinations qualifies as a candidate, the corresponding data dimension is excluded from later screening; that is, subsequent dimension combinations should not include that data dimension.
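The level-by-level order and the pruning of data dimensions can be sketched in Python as below. `plan_dimension_combinations` and the `has_candidates` callback are hypothetical stand-ins for the screening step; the sketch prunes only on the single-dimension results, as the text describes, and the usage mirrors the FIG. 2 scenario.

```python
from itertools import combinations


def plan_dimension_combinations(dimensions, has_candidates, max_size):
    """Return dimension combinations to screen, grouped by number of dimensions.

    has_candidates(dim_combo) -> bool stands in for the screening step; a data
    dimension whose single-dimension screen yields no candidate is dropped.
    """
    levels = [[(d,) for d in dimensions]]
    kept = [d for d in dimensions if has_candidates((d,))]
    for size in range(2, max_size + 1):
        levels.append(list(combinations(kept, size)))
    return levels


dims = ["region", "machine room", "payment method"]
abnormal = {("machine room",), ("payment method",)}  # as in FIG. 2
levels = plan_dimension_combinations(dims, lambda c: c in abnormal, 3)
print(levels[1])  # [('machine room', 'payment method')]
```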

Once the sets for all dimension combinations have been determined, the dimension value combinations they contain are the candidate combinations. From these candidates, the combinations with the top-n degrees of discrimination are taken as the target combinations, where n is a set value; for example, the three combinations with the highest degrees of discrimination can be selected.
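Selecting the top-n candidates by degree of discrimination is a standard partial sort; a Python sketch using `heapq.nlargest` follows. The function name and the S values are illustrative only.

```python
import heapq


def top_n_targets(candidates, n=3):
    """candidates: dict mapping candidate combination -> degree of discrimination S."""
    return heapq.nlargest(n, candidates, key=candidates.get)


cands = {
    ("machine room 1",): 0.04,
    ("payment method 1",): 0.03,
    ("machine room 1", "payment method 1"): 0.06,
    ("region 1",): 0.001,
}
print(top_n_targets(cands))
```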

It should be noted that the first threshold (the preset degree of influence) and the second threshold (the preset total) can be adjusted adaptively according to the degree of abnormality of the data to be detected; that is, both thresholds can be determined from the abnormal amount of the data to be detected. The higher the total abnormal amount, the higher the two thresholds; the lower the total abnormal amount, the lower the thresholds.

As the method above shows, it combines the degree of influence and the degree of discrimination of each dimension value combination to determine which combinations may have caused the abnormality in the data to be detected. Whereas existing techniques usually locate only a single factor, this solution locates the cause more comprehensively and accurately to several dimension value combinations, that is, to combinations of factors across multiple dimensions, which makes it easier for technical staff to localize the abnormality.

It should be noted that the degree of discrimination of a dimension value combination is preferably computed as a JS divergence, although other calculation methods may also be used. JS divergence is preferred because, as the following proof shows, the resulting degree of discrimination has this property: when a dimension value combination is likely to be a target combination, its degree of discrimination is greater than that of any combination containing all of its dimension values (its child nodes) and greater than that of any combination containing only some of its dimension values (its parent nodes).

Before the proof, the concepts of child node and parent node are introduced. For a dimension value combination, a parent node is a combination that contains some, but not all, of its dimension values, and a child node is a combination that contains all of its dimension values plus at least one more. For example, for (region A, operator 1), the parent nodes are (region A) and (operator 1), and a child node is (region A, operator 1, payment method 1).
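The parent/child relation can be expressed directly on tuples of dimension values. This Python sketch assumes each dimension value is a distinct string (so a value identifies its dimension); the helper names are hypothetical.

```python
from itertools import combinations


def parent_nodes(combo):
    """All combinations formed from a non-empty proper subset of combo's values."""
    return [c for k in range(1, len(combo)) for c in combinations(combo, k)]


def is_child_node(candidate, combo):
    """True if candidate contains every value of combo plus at least one more."""
    return set(combo) < set(candidate)


node = ("region A", "operator 1")
print(parent_nodes(node))  # [('region A',), ('operator 1',)]
print(is_child_node(("region A", "operator 1", "payment method 1"), node))  # True
```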

Proof 1: the degree of discrimination $S$ of a cause node (a dimension value combination that caused the abnormality in the data to be detected, that is, a target dimension value combination) is greater than that of each of its parent nodes.

When a dimension value combination is the target combination, the abnormal change propagates from it (the child) up to its parent nodes. Let $\delta = A_{ij} - F_{ij}$ and assume $\delta < 0$; the case $\delta > 0$ is symmetric.

The change at a parent node equals the change at the child node, that is, $\delta_{parent} = \delta_{child}$, and clearly $p_{parent} > p_{cause}$ and $q_{parent} > q_{cause}$.

It then follows easily that the fluctuation at a parent node is smaller than that at the child node, that is:

$$p_{cause} - q_{cause} > p_{parent} - q_{parent}$$

$$\frac{p_{cause}}{q_{cause}} > \frac{p_{parent}}{q_{parent}} \qquad \text{(cond)}$$

$$S = \frac{1}{2}\left[p\log\frac{2p}{p+q} + q\log\frac{2q}{p+q}\right] \qquad (1)$$

By formula (1) and inequality (cond), the degree of discrimination $S$ is the sum of a front term $\frac{1}{2}p\log\frac{2p}{p+q}$ and a back term $\frac{1}{2}q\log\frac{2q}{p+q}$. The front term is greater than 0 and increases with both $p - q$ and $p/q$. When the cause occurs at a dimension value combination "cause" of some dimension combination, the front term of the cause node is greater than 0 and greater than the front term of its parent node. The back term is less than 0, and the back term of the root cause node is greater than that of the parent node. Therefore, the degree of discrimination of the combination "cause" is greater than that of its parent node. This completes the proof.

Proof 2: the degree of discrimination $S$ of a root cause node is greater than that of each of its child nodes.

Clearly, $S$ is homogeneous of degree one in $(p, q)$:

$$S(kp, kq) = k\,S(p, q), \qquad k > 0$$

When the root cause occurs, the $p$ and $q$ of each of its child nodes change in the same proportion as those of the root cause node, that is:

$$\frac{p_{child}}{p_{cause}} = \frac{q_{child}}{q_{cause}} = k, \qquad 0 < k < 1$$

Therefore,

$$S_{child} = k\,S_{cause} < S_{cause}$$

so the degree of discrimination of the root cause node is also greater than that of its child nodes.

This completes the proof.
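As a numeric sanity check of both proofs (not a substitute for them), the Python sketch below injects a drop of 50 at a hypothetical cause node (region A, operator 1) and compares $S$ for the cause, one parent (region A), and one child that carries 40% of the cause's traffic with the same proportional drop. All volumes are invented, and $S$ is the JS-divergence term with the natural logarithm.

```python
import math


def js_term(p, q):
    """S = 1/2 [p log(2p/(p+q)) + q log(2q/(p+q))], natural log, p, q > 0."""
    return 0.5 * (p * math.log(2 * p / (p + q)) + q * math.log(2 * q / (p + q)))


f_total, delta = 1000, -50      # expected total; drop injected at the cause
a_total = f_total + delta       # actual total: 950


def s_of(f, a):
    """Degree of discrimination of a node with expected f and actual a."""
    return js_term(f / f_total, a / a_total)


s_cause = s_of(100, 100 + delta)   # cause: (region A, operator 1)
s_parent = s_of(300, 300 + delta)  # parent: (region A), same delta
s_child = s_of(40, 20)             # child: 40% of the cause, same 50% drop

print(s_cause > s_parent, s_cause > s_child)  # True True
```

The child's $S$ comes out as exactly 0.4 times the cause's $S$ (up to rounding), which is the homogeneity step used in Proof 2.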

It should also be noted that, although the method above is mainly used to detect, among the dimension value combinations, the target combinations that caused the abnormality, it can also be used to detect whether a single dimension value caused the abnormality: simply replace the dimension value combinations with single dimension values. Determining such a single dimension value works essentially as described above and is not repeated here.

The above is the abnormality localization method provided by one or more embodiments of this specification. Based on the same idea, this specification also provides an abnormality localization apparatus, as shown in FIG. 3.

FIG. 3 is a schematic diagram of an abnormality localization apparatus provided by this specification, which specifically includes:

An obtaining module 301, configured to obtain the data to be detected and to combine the dimension values contained in different data dimensions with one another to obtain dimension value combinations, where each dimension value combination contains at least two dimension values, each from a different data dimension;

A determining module 302, configured to determine, for each dimension value combination and according to the data to be detected, the degree of influence and the degree of discrimination of that combination, where the degree of influence characterizes how much the abnormality in the data hit by the combination within the data to be detected contributes to the abnormality of the full data to be detected, and the degree of discrimination characterizes the degree of difference between the actual distribution of the data hit by the combination within the data to be detected and the standard data distribution of that combination;

A locating module 303, configured to determine the target dimension value combinations according to the degrees of influence and discrimination of the dimension value combinations, and to perform abnormality localization according to the target combinations.

Optionally, the determining module 302 is specifically configured to: predict the data that should be generated in the time period in which the data to be detected is generated, as the first predicted data, and predict the data that the dimension value combination should generate in that time period, as the second predicted data; determine, according to the first predicted data, the amount of abnormal data within the data to be detected, as the first abnormal data amount, and determine, according to the second predicted data, the amount of abnormal data within the data hit by the combination in the data to be detected, as the second abnormal data amount; and determine the degree of influence of the combination according to the proportion of the second abnormal data amount in the first abnormal data amount.

Optionally, the determining module 302 is specifically configured to: predict the data that should be generated in the time period in which the data to be detected is generated, as the first predicted data, and predict the data that the dimension value combination should generate in that time period, as the second predicted data; determine the actual data distribution according to the proportion of the data to be detected accounted for by the data hit by the combination, and determine the standard data distribution according to the proportion of the first predicted data accounted for by the second predicted data; and determine the degree of discrimination according to the actual data distribution and the standard data distribution.

Optionally, the determining module 302 is specifically configured to obtain data in a time period adjacent to the said time period as reference data, and to obtain the first predicted data and the second predicted data from the reference data.

Optionally, the locating module 303 is specifically configured to: for each determined dimension combination, screen out candidate dimension value combinations under it according to the degrees of influence of the dimension value combinations it contains, where the dimension combination includes at least two data dimensions; and determine the target dimension value combinations according to the degrees of discrimination of the candidates.

Optionally, the detection module 303 is specifically configured to: sort the dimension value combinations contained in the dimension combination in descending order of discrimination degree and, following that order, add each dimension value combination whose influence degree is not less than a determined preset influence degree to a set corresponding to the dimension combination, until the total influence degree of the dimension value combinations added to the set is not less than a determined preset total amount, thereby obtaining a target set corresponding to the dimension combination; and take the dimension value combinations in the target set as the candidate dimension value combinations under the dimension combination.
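
The screening rule above translates fairly directly into code. In this sketch each dimension value combination of one dimension combination carries a pre-computed (influence degree, discrimination degree) pair, the two thresholds are passed in, and the function also reports whether the preset total was reached, since the source does not say what happens when it is not. Names are hypothetical.

```python
from typing import Dict, Hashable, List, Tuple


def screen_candidates(
    scores: Dict[Hashable, Tuple[float, float]],
    preset_influence: float,
    preset_total: float,
) -> Tuple[List[Hashable], bool]:
    """Build the target set for one dimension combination.

    scores maps each dimension value combination to (influence degree, discrimination degree).
    Returns the target set and whether its total influence reached preset_total.
    """
    # Visit combinations from the most to the least discriminative.
    ordered = sorted(scores, key=lambda combo: scores[combo][1], reverse=True)
    target_set: List[Hashable] = []
    total_influence = 0.0
    for combo in ordered:
        influence, _ = scores[combo]
        if influence < preset_influence:
            continue  # too small an effect on the overall anomaly
        target_set.append(combo)
        total_influence += influence
        if total_influence >= preset_total:
            return target_set, True
    return target_set, False
```

For example, with scores {"A": (0.5, 0.1), "B": (0.05, 0.9), "C": (0.4, 0.5), "D": (0.3, 0.3)}, a preset influence degree of 0.1 and a preset total of 0.6, B is skipped despite ranking first by discrimination, and the result is (["C", "D"], True).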

Optionally, the detection module 303 is specifically configured to: predict the data that should have been generated within the time period in which the data to be detected was generated, as first prediction data, and determine, according to the first prediction data, the amount of abnormal data in the data to be detected, as a first abnormal data amount; and determine the preset influence degree and the preset total amount according to the first abnormal data amount.
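
The source states only that the preset influence degree and the preset total amount depend on the first abnormal data amount, not how. The sketch below shows one hypothetical rule: a combination must explain at least a fixed absolute deviation (min_abs_deviation), converted into a share of the overall deviation, and the coverage target is a fixed fraction no smaller than that share. Both parameters and their defaults are illustrative assumptions.

```python
from typing import Optional, Tuple


def preset_thresholds(
    actual_total: float,
    predicted_total: float,
    min_abs_deviation: float = 10.0,  # hypothetical: smallest deviation worth reporting
    coverage: float = 0.8,            # hypothetical: share of the anomaly to explain
) -> Optional[Tuple[float, float]]:
    """Return (preset influence degree, preset total amount), or None if nothing is abnormal."""
    first_abnormal = abs(actual_total - predicted_total)  # first abnormal data amount
    if first_abnormal == 0:
        return None
    # A larger overall anomaly lowers the share a single combination must explain.
    preset_influence = min(1.0, min_abs_deviation / first_abnormal)
    preset_total = max(preset_influence, coverage)
    return preset_influence, preset_total
```

With 1,000 records predicted and 700 observed (a deviation of 300) and min_abs_deviation=30, this rule yields a preset influence degree of 0.1 and a preset total amount of 0.8.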

This specification further provides a computer-readable storage medium storing a computer program, and the computer program can be used to execute the anomaly location method described above.

This specification further provides the schematic structural diagram of the electronic device shown in FIG. 4. As shown in FIG. 4, at the hardware level, the electronic device includes a processor, an internal bus, a network interface, a memory, and a non-volatile memory, and may of course also include hardware required by other services. The processor reads the corresponding computer program from the non-volatile memory into the memory and runs it to implement the anomaly location method described above. Of course, besides a software implementation, this specification does not exclude other implementations, such as logic devices or a combination of software and hardware; that is, the entity that executes the following processing flow is not limited to logic units and may also be hardware or a logic device.

In the 1990s, an improvement to a technology could be clearly distinguished as either a hardware improvement (for example, an improvement to a circuit structure such as a diode, transistor, or switch) or a software improvement (an improvement to a method flow). However, as technology has developed, many of today's improvements to method flows can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain a corresponding hardware circuit structure by programming an improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement to a method flow cannot be realized with a hardware entity module. For example, a programmable logic device (PLD), such as a field programmable gate array (FPGA), is an integrated circuit whose logic functions are determined by the user's programming of the device. Designers program a digital system themselves to "integrate" it onto a PLD, without asking a chip manufacturer to design and fabricate a dedicated integrated circuit chip. Moreover, instead of fabricating integrated circuit chips by hand, this programming is nowadays mostly done with "logic compiler" software, which is similar to the software compiler used in program development; the source code to be compiled must be written in a specific programming language called a hardware description language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); the most commonly used at present are VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog. Those skilled in the art should also understand that a hardware circuit implementing a logical method flow can easily be obtained simply by logically programming the method flow in one of the above hardware description languages and programming it into an integrated circuit.

The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (such as software or firmware) executable by the (micro)processor, logic gates, switches, an application specific integrated circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of a memory. Those skilled in the art also know that, besides implementing a controller purely as computer-readable program code, it is entirely possible to logically program the method steps so that the controller achieves the same functions in the form of logic gates, switches, ASICs, programmable logic controllers, embedded microcontrollers, and the like. Such a controller can therefore be regarded as a hardware component, and the means included in it for implementing various functions can also be regarded as structures within the hardware component. Alternatively, the means for implementing various functions can be regarded both as software modules implementing the method and as structures within the hardware component.

The systems, apparatuses, modules, or units described in the above embodiments may be implemented by a computer chip or entity, or by a product with a certain function. A typical implementing device is a computer. Specifically, the computer may be, for example, a personal computer, a laptop computer, a cellular phone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.

For ease of description, the above apparatus is described by dividing its functions into various units. Of course, when this specification is implemented, the functions of the units may be implemented in one or more pieces of software and/or hardware.

Those skilled in the art should understand that the embodiments of the present invention may be provided as methods, systems, or computer program products. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.

The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, which implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce a computer-implemented process; the instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

In a typical configuration, a computing device includes one or more processors (CPUs), an input/output interface, a network interface, and memory.

Memory may include volatile memory in computer-readable media, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.

Computer-readable media include permanent and non-permanent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.

It should also be noted that the terms "include", "comprise", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of additional identical elements in the process, method, article, or device that includes the element.

Those skilled in the art should understand that the embodiments of this specification may be provided as methods, systems, or computer program products. Accordingly, this specification may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, this specification may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.

This specification may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. This specification may also be practiced in distributed computing environments, in which tasks are performed by remote processing nodes connected through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage nodes.

The embodiments in this specification are described progressively: identical or similar parts of the embodiments may be understood by referring to one another, and each embodiment focuses on its differences from the others. In particular, the system embodiment is basically similar to the method embodiment and is therefore described relatively briefly; for relevant details, refer to the corresponding description of the method embodiment.

The foregoing descriptions are merely embodiments of this specification and are not intended to limit it. For those skilled in the art, this specification may have various modifications and variations. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of this specification shall fall within the scope of the claims of this specification.

Claims (10)

1. A method of anomaly location, comprising:
acquiring data to be detected, and combining the dimension values contained in different data dimensions with one another to obtain dimension value combinations, wherein each dimension value combination comprises at least two dimension values, each from a different data dimension;
for each dimension value combination, determining, according to the data to be detected, an influence degree and a discrimination degree corresponding to the dimension value combination, wherein the influence degree represents the extent to which the abnormality of the data hit by the dimension value combination in the data to be detected contributes to the abnormality of the full data to be detected, and the discrimination degree represents the degree of difference between the actual data distribution of the data hit by the dimension value combination in the data to be detected and the standard data distribution corresponding to the dimension value combination;
and determining a target dimension value combination according to the influence degree and the discrimination degree corresponding to each dimension value combination, and performing anomaly location according to the target dimension value combination.
2. The method according to claim 1, wherein determining, according to the data to be detected, the influence degree corresponding to the dimension value combination comprises:
predicting data that should be generated within the time period in which the data to be detected is generated, as first prediction data, and predicting data that the dimension value combination should generate within the time period, as second prediction data;
determining, according to the first prediction data, an abnormal data volume in the data to be detected as a first abnormal data volume, and determining, according to the second prediction data, an abnormal data volume in the data hit by the dimension value combination in the data to be detected as a second abnormal data volume;
and determining the influence degree corresponding to the dimension value combination according to the proportion of the second abnormal data volume in the first abnormal data volume.
3. The method according to claim 1, wherein determining, according to the data to be detected, the discrimination degree corresponding to the dimension value combination comprises:
predicting data that should be generated within the time period in which the data to be detected is generated, as first prediction data, and predicting data that the dimension value combination should generate within the time period, as second prediction data;
determining the actual data distribution according to the proportion of the data hit by the dimension value combination within the data to be detected, and determining the standard data distribution according to the proportion of the second prediction data in the first prediction data;
and determining the discrimination degree according to the actual data distribution and the standard data distribution.
4. The method according to claim 2 or 3, wherein predicting data that should be generated within the time period in which the data to be detected is generated as first prediction data, and predicting data that the dimension value combination should generate within the time period as second prediction data, comprises:
acquiring data in a time period adjacent to the time period as reference data;
and obtaining the first prediction data and the second prediction data according to the reference data.
5. The method according to claim 1, wherein determining the target dimension value combination according to the influence degree and the discrimination degree corresponding to each dimension value combination comprises:
for each determined dimension combination, screening out candidate dimension value combinations under the dimension combination according to the influence degrees corresponding to the dimension value combinations contained in the dimension combination, wherein the dimension combination comprises at least two data dimensions;
and determining a target dimension value combination according to the discrimination degree corresponding to each candidate dimension value combination.
6. The method according to claim 5, wherein screening out candidate dimension value combinations under the dimension combination according to the influence degrees corresponding to the dimension value combinations contained in the dimension combination comprises:
sorting the dimension value combinations contained in the dimension combination in descending order of discrimination degree and, following the sorted order, sequentially adding each dimension value combination whose influence degree is not less than a determined preset influence degree to a set corresponding to the dimension combination, until the total influence degree of the dimension value combinations added to the set is not less than a determined preset total amount, so as to obtain a target set corresponding to the dimension combination;
and taking the dimension value combinations in the target set as the candidate dimension value combinations under the dimension combination.
7. The method according to claim 6, wherein determining the preset influence degree and the preset total amount comprises:
predicting data that should be generated within the time period in which the data to be detected is generated, as first prediction data, and determining, according to the first prediction data, an abnormal data volume in the data to be detected as a first abnormal data volume;
and determining the preset influence degree and the preset total amount according to the first abnormal data volume.
8. An apparatus for anomaly location, comprising:
an acquisition module, configured to acquire data to be detected and combine the dimension values contained in different data dimensions with one another to obtain dimension value combinations, wherein each dimension value combination comprises at least two dimension values, each from a different data dimension;
a determining module, configured to determine, for each dimension value combination and according to the data to be detected, the influence degree and the discrimination degree corresponding to the dimension value combination, wherein the influence degree represents the extent to which the abnormality of the data hit by the dimension value combination in the data to be detected contributes to the abnormality of the full data to be detected, and the discrimination degree represents the degree of difference between the actual data distribution of the data hit by the dimension value combination in the data to be detected and the standard data distribution corresponding to the dimension value combination;
and a positioning module, configured to determine a target dimension value combination according to the influence degree and the discrimination degree corresponding to each dimension value combination, and to perform anomaly location according to the target dimension value combination.
9. A computer-readable storage medium storing a computer program that, when executed by a processor, implements the method according to any one of claims 1 to 7.
10. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the method according to any one of claims 1 to 7 when executing the program.
CN202211275355.2A | 2022-10-18 | 2022-10-18 | A method, device, storage medium and electronic device for locating abnormality | Active | CN115659276B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202211275355.2A (CN115659276B) | 2022-10-18 | 2022-10-18 | A method, device, storage medium and electronic device for locating abnormality

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202211275355.2A (CN115659276B) | 2022-10-18 | 2022-10-18 | A method, device, storage medium and electronic device for locating abnormality

Publications (2)

Publication Number | Publication Date
CN115659276A (en) | 2023-01-31
CN115659276B (en) | 2025-03-28

Family

ID=84989809

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202211275355.2A (Active, CN115659276B) | A method, device, storage medium and electronic device for locating abnormality | 2022-10-18 | 2022-10-18

Country Status (1)

Country | Link
CN (1) | CN115659276B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN118536009A (en)* | 2024-07-24 | 2024-08-23 | 湖北华中电力科技开发有限责任公司 (Hubei Huazhong Electric Power Technology Development Co., Ltd.) | Power data model construction method and system based on generative artificial intelligence

Citations (8)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN106815255A (en)* | 2015-11-27 | 2017-06-09 | 阿里巴巴集团控股有限公司 (Alibaba Group Holding Ltd.) | Method and device for detecting data access exceptions
US20170187737A1 (en)* | 2015-12-28 | 2017-06-29 | Le Holdings (Beijing) Co., Ltd. | Method and electronic device for processing user behavior data
CN108346011A (en)* | 2018-05-15 | 2018-07-31 | 阿里巴巴集团控股有限公司 (Alibaba Group Holding Ltd.) | Index fluctuation analysis method and device
WO2020039610A1 (en)* | 2018-08-20 | 2020-02-27 | 日本電信電話株式会社 (Nippon Telegraph and Telephone Corporation) | Abnormality factor deduction device, abnormality factor deduction method, and program
CN112015995A (en)* | 2020-09-29 | 2020-12-01 | 北京百度网讯科技有限公司 (Beijing Baidu Netcom Science and Technology Co., Ltd.) | Data analysis method, device, equipment and storage medium
CN112949983A (en)* | 2021-01-29 | 2021-06-11 | 北京达佳互联信息技术有限公司 (Beijing Dajia Internet Information Technology Co., Ltd.) | Root cause determination method and device
CN113553208A (en)* | 2021-07-19 | 2021-10-26 | 神策网络科技(北京)有限公司 (Sensors Data Network Technology (Beijing) Co., Ltd.) | Anomaly dimension determination method for data outliers
CN115018106A (en)* | 2021-03-04 | 2022-09-06 | 腾讯科技(深圳)有限公司 (Tencent Technology (Shenzhen) Co., Ltd.) | Anomaly analysis method, device, equipment and computer-readable storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
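Yang Yahui; Du Keming: "Detection and determination mechanism of network-wide abnormal traffic clusters" (全网异常流量簇的检测与确定机制), Journal of Computer Research and Development (计算机研究与发展), no. 11, 15 November 2009 (2009-11-15)*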

Also Published As

Publication number | Publication date
CN115659276B (en) | 2025-03-28

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
