Technical Field
The present invention relates to the field of information technology, and in particular to a data analysis method and apparatus for multi-dimensional data and a computer-readable storage medium.
Background
To understand and analyze the running status of services in real time, Internet companies usually attach as many attribute tags as possible when collecting monitoring data, such as UA (User Agent), network type, and geographic location. Tags describe the data from different angles, or dimensions. The descriptive information in the different dimensions gives the collected data strong expressive power and constitutes the multi-dimensional data of the collected data.
At present, fault localization using multi-dimensional data relies mainly on manually viewing and comparing the data of different dimensions to find, among all dimensions, the one that is clearly abnormal. Judging manually from multi-dimensional data when a fault occurs requires experienced staff, and because the judgment is a comprehensive one made only after inspecting the trend graphs of many data series, the process takes a long time. When there are many data dimensions, the localization time rises sharply, and the failure to locate the fault and stop the loss quickly leads to larger losses.
Summary of the Invention
Embodiments of the present invention provide a data analysis method and apparatus for multi-dimensional data and a computer-readable storage medium, so as to solve at least one or more technical problems in the prior art.
In a first aspect, an embodiment of the present invention provides a data analysis method for multi-dimensional data, including: acquiring a normal traffic value and an abnormal traffic value of each dimension in a dimension combination of the multi-dimensional data; inputting the dimension combination of the multi-dimensional data, together with the normal traffic value and the abnormal traffic value of the dimension combination, into a decision tree, and using the decision tree to screen out a suspected root-cause dimension from the dimension combination of the multi-dimensional data; calculating a contribution degree and a sub-dimension loss-degree consistency of the suspected root-cause dimension; and identifying, according to the calculated contribution degree and sub-dimension loss-degree consistency of the suspected root-cause dimension, whether the suspected root-cause dimension is a root-cause dimension, wherein the root-cause dimension is the data dimension corresponding to the root cause of the traffic loss.
With reference to the first aspect, in a first implementation of the first aspect, acquiring the normal traffic value and the abnormal traffic value of each dimension of the multi-dimensional data includes: monitoring the total traffic of the multi-dimensional data; and, if a traffic loss is detected in the total traffic of the multi-dimensional data within a preset time period, acquiring the normal traffic value and the abnormal traffic value of each dimension of the multi-dimensional data within the preset time period.
With reference to the first implementation of the first aspect, in a second implementation of the first aspect, acquiring the normal traffic value and the abnormal traffic value of each dimension of the multi-dimensional data within the preset time period includes: determining the difference between the acquired traffic data value of each dimension within the preset time period and the traffic data value of each dimension within a specified time period as the abnormal traffic value of that dimension.
With reference to the first implementation of the first aspect, in a third implementation of the first aspect, acquiring the normal traffic value and the abnormal traffic value of each dimension of the multi-dimensional data within the preset time period includes: counting the number of failed accesses in each dimension within the preset time period, wherein an access that receives no reply within the preset time period is treated as a failed access; and determining the number of failed accesses in each dimension as the abnormal traffic value of that dimension.
With reference to the first implementation of the first aspect, in a fourth implementation of the first aspect, acquiring the normal traffic value and the abnormal traffic value of each dimension of the multi-dimensional data within the preset time period includes: predicting the traffic data value of each dimension within the preset time period; and determining the difference between the acquired traffic data value of each dimension within the preset time period and the predicted traffic data value of each dimension within the preset time period as the abnormal traffic value of that dimension.
With reference to the first aspect, or the first, second, third, or fourth implementation of the first aspect, in a fifth implementation of the first aspect, using the decision tree to screen out the suspected root-cause dimension includes: taking the abnormal traffic value of a dimension combination of the multi-dimensional data as the weight of the dimension combination in the positive-example set, and taking the normal traffic value of the dimension combination as the weight of the dimension combination in the negative-example set; balancing the positive and negative sample weights so that they are comparable in the initial state; calculating the information gain ratio of each dimension according to the balanced positive and negative sample weights, selecting the dimension with the largest information gain ratio for splitting, and constructing the decision tree; and determining a path of the constructed decision tree as the suspected root-cause dimension.
With reference to the fifth implementation of the first aspect, in a sixth implementation of the first aspect, balancing the positive and negative sample weights includes: taking the product of the abnormal traffic value of the dimension combination of the multi-dimensional data and a balance coefficient as the weight of the dimension combination in the positive-example set, and taking the normal traffic value of the dimension combination as the weight of the dimension combination in the negative-example set, wherein the balance coefficient is the ratio of the sum of the normal traffic values of all dimensions of the multi-dimensional data to the sum of the abnormal traffic values of all dimensions.
With reference to the first aspect, or the first, second, third, or fourth implementation of the first aspect, in a seventh implementation of the first aspect, identifying whether the suspected root-cause dimension is a root-cause dimension according to the calculated contribution degree and sub-dimension loss-degree consistency includes: inputting the calculated contribution degree and sub-dimension loss-degree consistency of the suspected root-cause dimension into a classifier, which classifies whether the suspected root-cause dimension is a root-cause dimension.
In a second aspect, an embodiment of the present invention provides a data analysis apparatus for multi-dimensional data, including: a traffic acquisition unit configured to acquire a normal traffic value and an abnormal traffic value of each dimension in a dimension combination of the multi-dimensional data; a dimension screening unit configured to input the dimension combination of the multi-dimensional data, together with the normal traffic value and the abnormal traffic value of the dimension combination, into a decision tree, and to use the decision tree to screen out a suspected root-cause dimension from the dimension combination of the multi-dimensional data; a feature calculation unit configured to calculate a contribution degree and a sub-dimension loss-degree consistency of the suspected root-cause dimension; and an identification unit configured to identify, according to the calculated contribution degree and sub-dimension loss-degree consistency of the suspected root-cause dimension, whether the suspected root-cause dimension is a root-cause dimension, wherein the root-cause dimension is the data dimension corresponding to the root cause of the traffic loss.
With reference to the second aspect, in a first implementation of the second aspect, the traffic acquisition unit includes: a monitoring subunit configured to monitor the total traffic of the multi-dimensional data; and an acquisition subunit configured to acquire, if a traffic loss is detected in the total traffic of the multi-dimensional data within a preset time period, the normal traffic value and the abnormal traffic value of each dimension of the multi-dimensional data within the preset time period.
With reference to the first implementation of the second aspect, in a second implementation of the second aspect, the acquisition subunit is further configured to: determine the difference between the acquired traffic data value of each dimension within the preset time period and the traffic data value of each dimension within a specified time period as the abnormal traffic value of that dimension.
With reference to the first implementation of the second aspect, in a third implementation of the second aspect, the acquisition subunit is further configured to: count the number of failed accesses in each dimension within the preset time period, wherein an access that receives no reply within the preset time period is treated as a failed access; and determine the number of failed accesses in each dimension as the abnormal traffic value of that dimension.
With reference to the first implementation of the second aspect, in a fourth implementation of the second aspect, the acquisition subunit is further configured to: predict the traffic data value of each dimension within the preset time period; and determine the difference between the acquired traffic data value of each dimension within the preset time period and the predicted traffic data value of each dimension within the preset time period as the abnormal traffic value of that dimension.
With reference to the second aspect, or the first, second, third, or fourth implementation of the second aspect, in a fifth implementation of the second aspect, the dimension screening unit is further configured to: take the abnormal traffic value of a dimension combination of the multi-dimensional data as the weight of the dimension combination in the positive-example set, and take the normal traffic value of the dimension combination as the weight of the dimension combination in the negative-example set; balance the positive and negative sample weights so that they are comparable in the initial state; calculate the information gain ratio of each dimension according to the balanced positive and negative sample weights, select the dimension with the largest information gain ratio for splitting, and construct the decision tree; and determine a path of the constructed decision tree as the suspected root-cause dimension.
With reference to the fifth implementation of the second aspect, in a sixth implementation of the second aspect, balancing the positive and negative sample weights includes: taking the product of the abnormal traffic value of the dimension combination of the multi-dimensional data and a balance coefficient as the weight of the dimension combination in the positive-example set, and taking the normal traffic value of the dimension combination as the weight of the dimension combination in the negative-example set, wherein the balance coefficient is the ratio of the sum of the normal traffic values of all dimensions of the multi-dimensional data to the sum of the abnormal traffic values of all dimensions.
With reference to the second aspect, or the first, second, third, or fourth implementation of the second aspect, in a seventh implementation of the second aspect, the identification unit is further configured to: input the calculated contribution degree and sub-dimension loss-degree consistency of the suspected root-cause dimension into a classifier, which classifies whether the suspected root-cause dimension is a root-cause dimension.
In a third aspect, an embodiment of the present invention provides a data analysis apparatus for multi-dimensional data, including: one or more processors; and a storage device configured to store one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of the implementations of the first aspect.
In one possible design, the structure of the data analysis apparatus for multi-dimensional data includes a processor and a memory. The memory is configured to store a program that supports the apparatus in executing the data analysis method for multi-dimensional data of the first aspect, and the processor is configured to execute the program stored in the memory. The apparatus may further include a communication interface through which it communicates with other devices or with a communication network.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method according to any one of the implementations of the first aspect.
The above technical solution has the following advantages or beneficial effects: when a fault occurs, the root-cause dimension can be analyzed quickly from the multi-dimensional data of the fault indicator, which saves the time operations personnel spend locating the fault and reduces the loss caused by the fault.
The above summary is provided for the purpose of description only and is not intended to be limiting in any way. In addition to the illustrative aspects, implementations, and features described above, further aspects, implementations, and features of the present invention will be readily apparent from the drawings and the following detailed description.
Brief Description of the Drawings
In the drawings, unless otherwise specified, the same reference numerals denote the same or similar parts or elements throughout the several drawings. The drawings are not necessarily drawn to scale. It should be understood that these drawings depict only some embodiments disclosed in accordance with the present invention and should not be regarded as limiting the scope of the present invention.
FIG. 1 is an overall framework diagram of a data analysis method for multi-dimensional data according to an embodiment of the present invention;
FIG. 2 is a flowchart of the steps of a preferred embodiment of the data analysis method for multi-dimensional data provided by the present invention;
FIG. 3 is a schematic diagram of a decision tree of a data analysis method for multi-dimensional data according to an embodiment of the present invention;
FIG. 4a and FIG. 4b are schematic diagrams of the splitting process in constructing the decision tree of a data analysis method for multi-dimensional data according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of the complete set of suspected root-cause dimension combinations of a data analysis method for multi-dimensional data according to an embodiment of the present invention;
FIG. 6 is an overall framework diagram of a data analysis apparatus for multi-dimensional data according to an embodiment of the present invention;
FIG. 7 is a structural block diagram of a data analysis apparatus for multi-dimensional data according to another embodiment of the present invention;
FIG. 8 is a structural block diagram of a data analysis apparatus for multi-dimensional data according to another embodiment of the present invention.
Detailed Description
In the following, only certain exemplary embodiments are briefly described. As those skilled in the art will recognize, the described embodiments may be modified in various ways without departing from the spirit or scope of the present invention. Accordingly, the drawings and the description are to be regarded as illustrative in nature and not restrictive.
An embodiment of the present invention provides a data analysis method for multi-dimensional data. FIG. 1 is an overall framework diagram of the data analysis method for multi-dimensional data according to an embodiment of the present invention. As shown in FIG. 1, the method includes: step S110, acquiring a normal traffic value and an abnormal traffic value of each dimension in a dimension combination of the multi-dimensional data; step S120, inputting the dimension combination of the multi-dimensional data, together with the normal traffic value and the abnormal traffic value of the dimension combination, into a decision tree, and using the decision tree to screen out a suspected root-cause dimension from the dimension combination of the multi-dimensional data; step S130, calculating a contribution degree and a sub-dimension loss-degree consistency of the suspected root-cause dimension; and step S140, identifying, according to the calculated contribution degree and sub-dimension loss-degree consistency of the suspected root-cause dimension, whether the suspected root-cause dimension is a root-cause dimension, wherein the root-cause dimension is the data dimension corresponding to the root cause of the traffic loss.
The data analysis method for multi-dimensional data according to the embodiment of the present invention can be used to find the root-cause dimension among all dimensions when a fault occurs, the root-cause dimension being a dimension that is clearly abnormal. Two examples of locating the root-cause dimension in multi-dimensional data follow.
Example 1: the dimension combination includes province and carrier, where the carriers are, for example, China Unicom, China Mobile, and China Telecom. When service traffic suffers a loss, the traffic data of each dimension at the time of the fault is read in, and the root-cause dimension is located quickly from that data. For example, if China Telecom shows a large traffic loss, the localization result is that the clearly abnormal root-cause dimension is the carrier dimension.
Example 2: the dimension combination includes operating system, browser, and mobile communication technology, where the operating systems are, for example, Apple and Android; the browsers are, for example, Google Chrome, 360 Browser, and UC Browser; and the mobile communication technologies are, for example, 3G and 4G. After an application is released, the total data traffic is monitored. When the total traffic suffers a loss, a fault is judged to have occurred, the traffic data of each dimension at the time of the fault is read in, and the root-cause dimension is located quickly from that data. For example, if the application's traffic loss is clearly abnormal when Google Chrome is used, the localization result is that the root-cause dimension is the browser.
In a specific application, traffic monitoring software may be used to monitor network data traffic. When service traffic suffers a loss, the data analysis method for multi-dimensional data according to the embodiment of the present invention can be used to locate the root-cause dimension quickly, thereby shortening the time needed to stop the loss and reducing the loss caused by the fault.
According to one implementation of the data analysis method for multi-dimensional data of the present invention, acquiring the normal traffic value and the abnormal traffic value of each dimension of the multi-dimensional data includes: monitoring the total traffic of the multi-dimensional data; and, if a traffic loss is detected in the total traffic of the multi-dimensional data within a preset time period, acquiring the normal traffic value and the abnormal traffic value of each dimension of the multi-dimensional data within the preset time period.
In this implementation, the total data traffic is monitored, a fault is judged to have occurred when the total traffic suffers a loss, and the traffic data value of each dimension at the time of the fault is read in. The traffic data value includes both a normal traffic value and an abnormal traffic value; that is, the traffic data value is the sum of the normal traffic value and the abnormal traffic value. The abnormal traffic value of each dimension, which is the lost traffic, needs to be obtained in some way, for example by collection or by prediction.
According to one implementation of the data analysis method for multi-dimensional data of the present invention, acquiring the normal traffic value and the abnormal traffic value of each dimension of the multi-dimensional data within the preset time period includes: determining the difference between the acquired traffic data value of each dimension within the preset time period and the traffic data value of each dimension within a specified time period as the abnormal traffic value of that dimension.
In this implementation, the abnormal traffic value of each dimension is obtained by collection, which includes collecting the traffic that actually occurred. The size of the traffic drop can be computed from the actual traffic by taking its difference from the traffic data value of each dimension within the specified time period. For example, the difference between the traffic data value of each dimension in the current time period and that in the previous time period may be calculated. Optionally, the difference between the traffic data value of each dimension in the current time period and that in the same time period of the previous day may be calculated. In another optional embodiment, the difference between the traffic data value of each dimension in the current time period and that in the same time period several days earlier may be calculated, where the number of days can be specified, for example one week or one month.
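As a non-limiting illustration, the difference-based calculation described above may be sketched as follows. The function name abnormal_by_baseline, the dictionary layout keyed by dimension combination, the sample figures, and the clamping of negative differences to zero are all assumptions made for this sketch; the embodiment does not specify them.

```python
def abnormal_by_baseline(current, baseline):
    """Return the traffic drop per dimension combination.

    current  -- traffic data values collected in the preset (fault) period
    baseline -- traffic data values from the specified reference period,
                e.g. the same period of the previous day
    Both are dicts keyed by a dimension-combination tuple (hypothetical layout).
    """
    drop = {}
    for key, base_value in baseline.items():
        cur_value = current.get(key, 0)
        # Assumption: traffic that rose above the baseline counts as zero loss.
        drop[key] = max(base_value - cur_value, 0)
    return drop


# Hypothetical sample data: (province, carrier) -> request count.
baseline = {("Beijing", "Telecom"): 1000, ("Beijing", "Unicom"): 800}
current = {("Beijing", "Telecom"): 400, ("Beijing", "Unicom"): 790}

loss = abnormal_by_baseline(current, baseline)
```

Which reference period is used (previous period, previous day, previous week) only changes what is passed as baseline; the subtraction itself is unchanged.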
According to one implementation of the data analysis method for multi-dimensional data of the present invention, acquiring the normal traffic value and the abnormal traffic value of each dimension of the multi-dimensional data within the preset time period includes: counting the number of failed accesses in each dimension within the preset time period, wherein an access that receives no reply within the preset time period is treated as a failed access; and determining the number of failed accesses in each dimension as the abnormal traffic value of that dimension.
Specifically, another way of obtaining the abnormal traffic value of each dimension by collection is to count how many requests were not processed; the number of unprocessed requests is the number of failed accesses. If an access receives no reply, that is, the access request was not processed, the access is considered to have failed. The number of failed accesses in each dimension may be determined as the abnormal traffic value of that dimension. Likewise, an access that receives a reply is considered successful, and the number of successful accesses in each dimension is determined as the normal traffic value of that dimension.
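As a non-limiting illustration, the counting of failed and successful accesses may be sketched as follows. The record layout (a dict with a boolean "replied" field) and the function name count_traffic are assumptions made for this sketch only.

```python
from collections import Counter


def count_traffic(records, dimension):
    """Count successful and failed accesses per value of one dimension.

    records   -- iterable of dicts; each has the dimension fields plus a
                 boolean "replied" flag (hypothetical layout)
    dimension -- name of the dimension field to group by
    Returns (normal, abnormal) Counters keyed by dimension value.
    """
    normal, abnormal = Counter(), Counter()
    for record in records:
        value = record[dimension]
        if record["replied"]:
            normal[value] += 1    # reply received: successful access
        else:
            abnormal[value] += 1  # no reply in the period: failed access
    return normal, abnormal


# Hypothetical access log for the preset time period.
records = [
    {"carrier": "Telecom", "replied": False},
    {"carrier": "Telecom", "replied": False},
    {"carrier": "Telecom", "replied": True},
    {"carrier": "Unicom", "replied": True},
]

normal, abnormal = count_traffic(records, "carrier")
```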
According to one implementation of the data analysis method for multi-dimensional data of the present invention, acquiring the normal traffic value and the abnormal traffic value of each dimension of the multi-dimensional data within the preset time period includes: predicting the traffic data value of each dimension within the preset time period; and determining the difference between the acquired traffic data value of each dimension within the preset time period and the predicted traffic data value of each dimension within the preset time period as the abnormal traffic value of that dimension.
Obtaining the abnormal traffic value of each dimension by prediction includes predicting the traffic that would have occurred had no fault happened; the difference between this predicted traffic and the collected actual traffic is the abnormal traffic value, that is, the lost traffic. Specifically, the periodic variation of network traffic may be analyzed statistically, and the traffic data value of each dimension in the current time period may be predicted from information such as the time of day and/or user browsing behavior patterns. The difference between the predicted traffic data value and the actually collected traffic data value is taken as the abnormal traffic value.
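As a non-limiting illustration, the prediction-based calculation may be sketched as follows. The embodiment does not fix a prediction model; the predictor used here, which averages the same time slot over past periods, is a deliberately simple stand-in chosen for this sketch, and the names predict_traffic and abnormal_by_prediction are hypothetical.

```python
def predict_traffic(history):
    """Hypothetical predictor: mean of the same time slot in past periods."""
    return sum(history) / len(history)


def abnormal_by_prediction(observed, history_by_key):
    """Lost traffic = predicted (no-fault) traffic minus observed traffic.

    observed       -- dict: dimension combination -> collected traffic value
    history_by_key -- dict: dimension combination -> list of past values for
                      the same time slot (assumed non-empty)
    """
    lost = {}
    for key, history in history_by_key.items():
        predicted = predict_traffic(history)
        # Assumption: observed traffic above the prediction counts as zero loss.
        lost[key] = max(predicted - observed.get(key, 0), 0)
    return lost


# Hypothetical data: the same hour on three previous days.
history_by_key = {("Chrome",): [100, 110, 90], ("UC Browser",): [50, 50, 50]}
observed = {("Chrome",): 40, ("UC Browser",): 50}

lost = abnormal_by_prediction(observed, history_by_key)
```

Any forecasting model that accounts for periodicity could replace predict_traffic without changing the subtraction step.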
Fig. 2 is a flowchart of the steps of a preferred embodiment of the data analysis method for multi-dimensional data provided by the present invention. As shown in Fig. 2, according to one embodiment of the method, step S120 in Fig. 1, using the decision tree to screen out suspected root cause dimensions, includes: step S210, taking the abnormal traffic value of a dimension combination of the multi-dimensional data as the weight of that dimension combination in the positive example set, and taking the normal traffic value of the dimension combination as its weight in the negative example set; step S220, balancing the positive and negative sample weights so that they are comparable in the initial state; step S230, calculating the information gain ratio of each dimension from the balanced positive and negative sample weights, selecting the dimension with the largest information gain ratio for splitting, and constructing the decision tree; and step S240, determining the paths of the constructed decision tree as the suspected root cause dimensions.
A decision tree is a flowchart-like tree structure in which each internal (non-leaf) node represents a test on an attribute, each branch represents an outcome of the test, and each leaf node holds a class label. Once the decision tree has been built, a tuple with no given class label is classified by tracing a path from the root node to a leaf node, and that leaf node holds the prediction for the tuple.
The embodiment of the present invention screens out suspected root cause dimensions through the process of constructing a decision tree. The input features of the decision tree are the dimension combination of the accesses, for example province and operator, together with its normal traffic value and abnormal traffic value; the output is whether the dimension combination is a positive example, that is, a suspected root cause dimension. Model training yields a decision tree with good discriminating power, from which the complete set of suspected root cause dimension combinations, namely the decision tree paths, is obtained. The decision tree may be constructed with the C4.5 algorithm. Screening out suspected root cause dimensions reduces the computation required for the subsequent dimension feature calculation and root cause identification.
In step S210, a dimension combination d in the multi-dimensional data is regarded as one sample point. The number of failed accesses of dimension combination d, pvlost_d, that is, its abnormal traffic value, is taken as the weight of d in the positive example set, weight_positive_d; the number of successful accesses of dimension combination d, pv_d, that is, its normal traffic value, is taken as the weight of d in the negative example set, weight_negative_d.
According to one embodiment of the data analysis method for multi-dimensional data of the present invention, balancing the positive and negative sample weights in step S220 includes: taking the product of the abnormal traffic value of a dimension combination of the multi-dimensional data and a balance coefficient as the weight of that dimension combination in the positive example set, and taking the normal traffic value of the dimension combination as its weight in the negative example set, where the balance coefficient is the ratio of the sum of the normal traffic values of all dimensions of the multi-dimensional data to the sum of the abnormal traffic values of all dimensions.
To satisfy the assumption underlying the use of the information gain ratio to screen suspected root cause dimensions, namely that the information entropy of the initial state is maximal, the positive and negative sample weights must be balanced so that they are comparable in the initial state. In this embodiment, the final positive example weight is weight_positive_d' = pvlost_d * (pv_total / pvlost_total), and the final negative example weight is weight_negative_d' = pv_d.
For example, with only two dimension combinations, take pvlost_total = 1, pv_total = 100, pvlost_d1 = 1, pv_d1 = 10, pvlost_d2 = 0 and pv_d2 = 90:
The positive example weight of sample point d1 is weight_positive_d1 = pvlost_d1 * (pv_total / pvlost_total) = 100, and its negative example weight is weight_negative_d1 = pv_d1 = 10;
Similarly, the positive example weight of d2 is 0 and its negative example weight is 90. Overall, the positive example weight in the initial state is 100 and the negative example weight is 100, so the information entropy of the initial state is maximal.
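The balancing of step S220 and the two-combination example can be reproduced with the short sketch below. The function name `balance_weights` is introduced for illustration, and the sketch assumes that the total abnormal traffic is non-zero, which holds whenever a traffic loss has been detected.

```python
def balance_weights(pv, pvlost):
    """Balance positive and negative sample weights (step S220).

    pv:     normal traffic value per dimension combination
    pvlost: abnormal traffic value per dimension combination
    Assumes sum(pvlost.values()) > 0, i.e. a loss has actually occurred.
    """
    pv_total = sum(pv.values())
    pvlost_total = sum(pvlost.values())
    coefficient = pv_total / pvlost_total  # balance coefficient
    positive = {d: pvlost[d] * coefficient for d in pv}
    negative = dict(pv)
    return positive, negative

# Values from the two-combination example in the text.
pv = {"d1": 10, "d2": 90}
pvlost = {"d1": 1, "d2": 0}
positive, negative = balance_weights(pv, pvlost)
print(positive, negative)
```

After balancing, the positive weights and the negative weights both sum to 100, which is the condition under which the initial entropy is maximal.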
In step S230, the training stage constructs a decision tree from the given training data set. The decision tree may be built by training based on the C4.5 algorithm. Each split uses only one dimension. At each split, the information gain ratio contributed by each dimension is calculated, and the feature (that is, the dimension) whose information gain ratio is the largest and greater than 0 is greedily selected for splitting. Subtree generation stops when no dimension yields a gain greater than 0, which saves the computation for that subtree. In the final decision tree, the paths to nodes whose result is not a negative example are the suspected root cause dimensions; these node paths include non-leaf nodes.
For example, with only two dimensions, province takes the values Beijing and Shanghai, and operator takes the values Telecom and Unicom. Consider the case where Telecom is abnormal. The fault makes the positive example weight of Telecom (which is positively correlated with pvlost) very high, so Telecom deviates from the balanced position and its information entropy is lower than that of the other, relatively balanced dimensions. The negative example weight of Unicom is very high, so it likewise deviates from the balanced position and has low information entropy. As a result, the information gain ratio of the operator dimension is higher than that of the province dimension. The operator dimension is therefore selected for splitting, and the dimension combinations of the forms <province> and <province, operator> are no longer considered. Here the information gain ratio is the degree to which the mean information entropy is reduced. By analogy, the greedy method yields a set of dimension combinations that distinguish normal from abnormal well, with a clear pruning effect.
As another example, again with only two dimensions, province takes the values Beijing and Hebei, and operator takes the values Unicom and Telecom. Table 1 gives the traffic data values and weight values of the multi-dimensional data in this example. Table 1 shows four sample points: sample point d11, Beijing Unicom; sample point d12, Beijing Telecom; sample point d21, Hebei Unicom; and sample point d22, Hebei Telecom. According to the data in Table 1, the total abnormal traffic value pvlost_total is 100, the total normal traffic value pv_total is 1000, pvlost_d11 is 90 and pv_d11 is 100. It follows that the positive example weight of sample point d11 is weight_positive_d11 = pvlost_d11 * (pv_total / pvlost_total) = 900, and its negative example weight is weight_negative_d11 = pv_d11 = 100. Similarly, the positive example weight of d12 is 100 and its negative example weight is 80; the positive example weight of d21 is 0 and its negative example weight is 200; the positive example weight of d22 is 0 and its negative example weight is 620.
Table 1: Traffic data values and weight values of the multi-dimensional data
Fig. 3 is a schematic diagram of the decision tree of the data analysis method for multi-dimensional data according to an embodiment of the present invention; Fig. 4a and Fig. 4b are schematic diagrams of the splitting process used to construct that decision tree. Fig. 3 shows the decision tree constructed from the sample set data in Table 1. The specific splitting process of the decision tree in Fig. 3 is shown in Fig. 4a and Fig. 4b.
Fig. 4a is a schematic diagram of the first split of the decision tree. As shown in Fig. 4a, the first split divides node (1), the root node, by province into node (2) Beijing and node (3) Hebei. Specifically, the calculation uses the sample set data, that is, the normal and abnormal traffic values of sample points d11, d12, d21 and d22 in Table 1. If the split is made by province, the positive/negative ratio of Beijing is 1000/180 and that of Hebei is 0/820. If the split is made by operator, the positive/negative ratio of Telecom is 100/700 and that of Unicom is 900/300.
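The positive/negative ratios quoted above can be checked by summing the balanced weights of the four sample points per dimension value. The sketch below is illustrative only; the tuple layout of `SAMPLES` and the function name `aggregate` are assumptions, while the weights are the ones stated in the Table 1 example.

```python
from collections import defaultdict

# (dimension values, balanced positive weight, negative weight) for d11, d12, d21, d22.
SAMPLES = [
    ({"province": "Beijing", "operator": "Unicom"}, 900, 100),
    ({"province": "Beijing", "operator": "Telecom"}, 100, 80),
    ({"province": "Hebei", "operator": "Unicom"}, 0, 200),
    ({"province": "Hebei", "operator": "Telecom"}, 0, 620),
]

def aggregate(samples, dimension):
    """Sum positive and negative weights for each value of one dimension."""
    totals = defaultdict(lambda: [0, 0])
    for dims, positive, negative in samples:
        totals[dims[dimension]][0] += positive
        totals[dims[dimension]][1] += negative
    return {value: tuple(pair) for value, pair in totals.items()}

print(aggregate(SAMPLES, "province"))
print(aggregate(SAMPLES, "operator"))
```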
In this embodiment, the decision tree is built by training based on the C4.5 algorithm, which uses the information gain ratio to select attributes. Attribute selection measures are also called splitting rules, because they determine how the tuples at a given node are split. An attribute selection measure provides a ranking for each attribute describing the given training tuples, and the attribute with the best score is chosen as the splitting attribute for those tuples. When a decision tree is created, many branches reflect anomalies in the training data, and pruning is used to deal with this overfitting. Pruning is performed during construction of the decision tree: nodes containing very few elements may cause the constructed tree to overfit, and it may be better to disregard them.
In machine learning and feature engineering, the uncertainty of information can be expressed by entropy. For a random variable X taking finitely many values, if its probability distribution is:
$$P(X = x_i) = p_i, \quad i = 1, 2, \ldots, n$$
then the entropy of the random variable X can be described by the following formula:
$$H(X) = -\sum_{i=1}^{n} p_i \log_2 p_i$$
For example, if in a classification system the class label is c, taking the values c_1, c_2, …, c_n, where n is the total number of classes, then the entropy of this classification system is:
$$H(c) = -\sum_{i=1}^{n} P(c_i) \log_2 P(c_i)$$
Information gain is the reduction in entropy: the difference between the entropy of the sample set before a split and the entropy of the data subsets obtained by splitting on a feature. In other words, it is the information gained by the system once a feature X is fixed. When the overall distribution of feature X is fixed, the conditional entropy is H(c|X). The information gain brought to the system by fixing feature X is therefore: IG(X) = H(c) - H(c|X).
The information gain ratio is defined jointly by the information gain described above and the split information measure. The split information measure is the entropy H(X) of feature X, so the information gain ratio is:
$$\mathrm{GainRatio}(X) = \frac{IG(X)}{H(X)}$$
In the first split shown in Fig. 4a, the information gain ratio is calculated separately for a split by province and for a split by operator. Because the information gain ratio of the split by province is greater than that of the split by operator, the split by province is chosen, and node (1) is split into child node (2) Beijing and child node (3) Hebei.
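The comparison made in the first split can be checked numerically. The sketch below is an illustration under stated assumptions rather than the reference implementation: it uses base-2 logarithms, treats each sample point's balanced positive and negative weights as class mass, and uses the entropy of the weight shares of the child nodes as the split information. The names `SAMPLES`, `entropy` and `gain_ratio` are introduced for the example.

```python
import math
from collections import defaultdict

# (dimension values, balanced positive weight, negative weight) from the Table 1 example.
SAMPLES = [
    ({"province": "Beijing", "operator": "Unicom"}, 900.0, 100.0),
    ({"province": "Beijing", "operator": "Telecom"}, 100.0, 80.0),
    ({"province": "Hebei", "operator": "Unicom"}, 0.0, 200.0),
    ({"province": "Hebei", "operator": "Telecom"}, 0.0, 620.0),
]

def entropy(weights):
    """Shannon entropy (base 2) of a list of non-negative weights."""
    total = sum(weights)
    if total <= 0:
        return 0.0
    return -sum(w / total * math.log2(w / total) for w in weights if w > 0)

def gain_ratio(samples, dimension):
    """Information gain ratio obtained by splitting `samples` on `dimension`."""
    groups = defaultdict(lambda: [0.0, 0.0])
    for dims, positive, negative in samples:
        groups[dims[dimension]][0] += positive
        groups[dims[dimension]][1] += negative
    pos_total = sum(p for p, _ in groups.values())
    neg_total = sum(n for _, n in groups.values())
    total = pos_total + neg_total
    if total <= 0:
        return 0.0
    # Conditional entropy H(c|X): weighted mean of the child-node entropies.
    conditional = sum((p + n) / total * entropy([p, n]) for p, n in groups.values())
    # Split information H(X): entropy of the child-node weight shares.
    split_info = entropy([p + n for p, n in groups.values()])
    if split_info <= 0:
        return 0.0
    return (entropy([pos_total, neg_total]) - conditional) / split_info

province_ratio = gain_ratio(SAMPLES, "province")
operator_ratio = gain_ratio(SAMPLES, "operator")
print(f"province: {province_ratio:.3f}, operator: {operator_ratio:.3f}")
```

Under these assumptions the province ratio comes out at roughly 0.65 and the operator ratio at roughly 0.30, which agrees with the choice of province for the first split.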
In the second split shown in Fig. 4b, the splits of node (2) and node (3) are determined by calculating the information gain ratio in the same way as in the first split. Node (2) is split by operator into child node (4) Beijing Telecom and child node (5) Beijing Unicom. Node (3) is not split further, because the information gain ratio of a split by operator is 0. The resulting complete set of suspected root cause dimension combinations, that is, the decision tree paths, is shown in Fig. 5.
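The two splits can be put together in a small greedy construction. The sketch below is a simplified stand-in for C4.5 training, not the specification's implementation: it omits pruning and assumes base-2 entropy over the balanced weights. The rule used in `suspected_paths`, keeping a non-root node when its positive weight exceeds its negative weight, is an assumption about what "not a negative example" means; the contents of Fig. 5 are not available here.

```python
import math
from collections import defaultdict

# (dimension values, balanced positive weight, negative weight) from the Table 1 example.
SAMPLES = [
    ({"province": "Beijing", "operator": "Unicom"}, 900.0, 100.0),
    ({"province": "Beijing", "operator": "Telecom"}, 100.0, 80.0),
    ({"province": "Hebei", "operator": "Unicom"}, 0.0, 200.0),
    ({"province": "Hebei", "operator": "Telecom"}, 0.0, 620.0),
]

def entropy(weights):
    """Shannon entropy (base 2) of a list of non-negative weights."""
    total = sum(weights)
    if total <= 0:
        return 0.0
    return -sum(w / total * math.log2(w / total) for w in weights if w > 0)

def totals(samples):
    """Total positive and negative weight of a sample subset."""
    return sum(s[1] for s in samples), sum(s[2] for s in samples)

def group_by(samples, dimension):
    """Partition samples by their value in one dimension."""
    groups = defaultdict(list)
    for sample in samples:
        groups[sample[0][dimension]].append(sample)
    return groups

def gain_ratio(samples, dimension):
    """Information gain ratio of splitting `samples` on `dimension`."""
    pos, neg = totals(samples)
    total = pos + neg
    if total <= 0:
        return 0.0
    sizes, conditional = [], 0.0
    for subset in group_by(samples, dimension).values():
        p, n = totals(subset)
        sizes.append(p + n)
        conditional += (p + n) / total * entropy([p, n])
    split_info = entropy(sizes)
    if split_info <= 0:
        return 0.0
    return (entropy([pos, neg]) - conditional) / split_info

def build_tree(samples, dimensions, path=()):
    """Greedily split on the dimension with the largest positive gain ratio."""
    pos, neg = totals(samples)
    node = {"path": path, "positive": pos, "negative": neg, "split": None, "children": {}}
    if not dimensions:
        return node
    best = max(dimensions, key=lambda d: gain_ratio(samples, d))
    if gain_ratio(samples, best) <= 0:
        return node  # no dimension gives a positive gain: stop here
    node["split"] = best
    remaining = [d for d in dimensions if d != best]
    for value, subset in group_by(samples, best).items():
        node["children"][value] = build_tree(subset, remaining, path + ((best, value),))
    return node

def suspected_paths(node):
    """Paths of non-root nodes whose positive weight exceeds the negative weight (assumed rule)."""
    found = []
    if node["path"] and node["positive"] > node["negative"]:
        found.append(node["path"])
    for child in node["children"].values():
        found.extend(suspected_paths(child))
    return found

tree = build_tree(SAMPLES, ["province", "operator"])
for path in suspected_paths(tree):
    print(path)
```

With the Table 1 weights, the root is split by province, the Beijing node is split by operator, and the Hebei node is left unsplit, matching the description of Fig. 4a and Fig. 4b.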
After step S120 in Fig. 1, in which the decision tree is used to screen out suspected root cause dimensions, step S130, dimension feature value calculation, is performed. Two features are calculated for every suspected root cause dimension: the contribution degree and the sub-dimension loss consistency. The contribution degree can be calculated according to Formula 1, and the sub-dimension loss consistency can be measured by the coefficient of variation, as shown in Formula 2:
In Formula 1, pvlost_d is the loss value of dimension d and pvlost_total is the total loss value over all dimensions. The loss value is the abnormal traffic value.
In Formula 2, pv_d and pvlost_d are respectively the number of successes (normal traffic value) and the number of failures (abnormal traffic value) of dimension d; r_d is the degree of abnormality of dimension d; and {t_1, t_2, t_3, …, t_n} are the sub-dimensions of dimension d. For example, the sub-dimensions of the Beijing dimension are Beijing Unicom, Beijing Mobile and Beijing Telecom.
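Formula 1 and Formula 2 are not reproduced in this text, so the following sketch rests on explicit assumptions rather than on the original formulas: the contribution degree is taken as pvlost_d / pvlost_total, the degree of abnormality as r = pvlost / (pv + pvlost), and the consistency as the population coefficient of variation of r over the sub-dimensions. All function names are hypothetical. The sample values are the Beijing sub-dimensions from the Table 1 example, with pvlost_d12 = 10 implied by the stated weights.

```python
import statistics

def contribution(pvlost_d, pvlost_total):
    """Share of the total loss attributed to dimension d (assumed form of Formula 1)."""
    return pvlost_d / pvlost_total

def abnormality(pv, pvlost):
    """Degree of abnormality r of one dimension (assumed definition)."""
    return pvlost / (pv + pvlost)

def coefficient_of_variation(ratios):
    """Population standard deviation divided by the mean (assumed form of Formula 2)."""
    mean = statistics.mean(ratios)
    if mean == 0:
        return 0.0
    return statistics.pstdev(ratios) / mean

# Beijing and its sub-dimensions: (pv, pvlost) per sub-dimension.
sub_dimensions = {"Beijing Unicom": (100, 90), "Beijing Telecom": (80, 10)}
pvlost_beijing = sum(lost for _, lost in sub_dimensions.values())
beijing_contribution = contribution(pvlost_beijing, pvlost_total=100)
beijing_cv = coefficient_of_variation(
    [abnormality(pv, lost) for pv, lost in sub_dimensions.values()]
)
print(beijing_contribution, round(beijing_cv, 3))
```

A low coefficient of variation means the sub-dimensions lose traffic to a similar degree, which is what one would expect when the parent dimension itself is the root cause.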
According to one embodiment of the data analysis method for multi-dimensional data of the present invention, step S140, identifying whether the suspected root cause dimension is a root cause dimension according to its calculated contribution degree and sub-dimension loss consistency, includes: inputting the calculated contribution degree and sub-dimension loss consistency of the suspected root cause dimension into a classifier, which classifies whether the suspected root cause dimension is a root cause dimension.
In step S140, the contribution degree and sub-dimension loss consistency of each suspected root cause dimension are input into a linear binary classifier trained on historical data, which identifies root cause dimensions by classifying whether each dimension is a root cause dimension. The classifier is trained on historical data as follows: data from historical faults is obtained, and each dimension is labeled as one of two classes according to whether it is a root cause dimension, for example 0 for non-root cause and 1 for root cause. The two features of each dimension are calculated according to the steps above, and a machine learning classification algorithm, such as a decision tree or logistic regression, is used to train the binary classifier.
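As one possible realization of the binary classifier, the sketch below trains a small logistic regression with batch gradient descent in plain Python. It is an illustration only: the labeled samples are hypothetical toy values, not historical fault data, and the learning rate, epoch count and function names are assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, labels, learning_rate=0.5, epochs=2000):
    """Fit logistic regression on (contribution, consistency) feature pairs."""
    weights, bias = [0.0, 0.0], 0.0
    count = len(samples)
    for _ in range(epochs):
        grad_w, grad_b = [0.0, 0.0], 0.0
        for (x1, x2), y in zip(samples, labels):
            error = sigmoid(weights[0] * x1 + weights[1] * x2 + bias) - y
            grad_w[0] += error * x1
            grad_w[1] += error * x2
            grad_b += error
        weights[0] -= learning_rate * grad_w[0] / count
        weights[1] -= learning_rate * grad_w[1] / count
        bias -= learning_rate * grad_b / count
    return weights, bias

def is_root_cause(model, contribution, variation):
    """Classify one suspected dimension; True means root cause (label 1)."""
    weights, bias = model
    return sigmoid(weights[0] * contribution + weights[1] * variation + bias) >= 0.5

# Hypothetical labeled history: (contribution degree, coefficient of variation).
samples = [(0.9, 0.1), (0.8, 0.2), (0.85, 0.15), (0.1, 0.9), (0.2, 0.8), (0.3, 1.2)]
labels = [1, 1, 1, 0, 0, 0]  # 1 = root cause, 0 = non-root cause

model = train(samples, labels)
print(is_root_cause(model, 0.95, 0.05), is_root_cause(model, 0.05, 1.0))
```

Logistic regression is used here because it is linear and is one of the algorithms named in the text; a decision tree classifier would serve equally well.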
The multi-dimensional data analysis method of the embodiment of the present invention is not limited to fault localization; it is applicable to the analysis of any additive multi-dimensional data. Additive multi-dimensional data means that the total for a dimension equals the sum of the data of its sub-dimensions; for example, the data of the operator dimension equals the sum of the data for Unicom, Mobile, Telecom and so on.
In another aspect, an embodiment of the present invention provides a data analysis device for multi-dimensional data. Fig. 6 is an overall block diagram of the data analysis device for multi-dimensional data according to an embodiment of the present invention. As shown in Fig. 6, the device includes: a traffic acquisition unit 100, configured to obtain the normal traffic value and the abnormal traffic value of each dimension in the dimension combinations of the multi-dimensional data; a dimension screening unit 200, configured to input the dimension combinations of the multi-dimensional data, together with their normal and abnormal traffic values, into a decision tree, and to use the decision tree to screen out suspected root cause dimensions from those dimension combinations; a feature calculation unit 300, configured to calculate the contribution degree and the sub-dimension loss consistency of the suspected root cause dimensions; and an identification unit 400, configured to identify, according to the calculated contribution degree and sub-dimension loss consistency, whether a suspected root cause dimension is a root cause dimension, where a root cause dimension is the data dimension corresponding to the root cause of the traffic loss.
Fig. 7 is a structural block diagram of a data analysis device for multi-dimensional data according to another embodiment of the present invention. As shown in Fig. 7, according to one embodiment of the device, the traffic acquisition unit 100 includes: a monitoring subunit 110, configured to monitor the total traffic of the multi-dimensional data; and an acquisition subunit 120, configured to obtain the normal traffic value and the abnormal traffic value of each dimension of the multi-dimensional data within a preset time period if a traffic loss is detected in the total traffic of the multi-dimensional data within that preset time period.
According to one embodiment of the data analysis device for multi-dimensional data of the present invention, the acquisition subunit 120 is further configured to: determine the difference between the acquired traffic data value of each dimension within the preset time period and the traffic data value of each dimension within a specified time period as the abnormal traffic value of that dimension.
According to one embodiment of the data analysis device for multi-dimensional data of the present invention, the acquisition subunit 120 is further configured to: count the number of failed accesses in each dimension within the preset time period, where an access within the preset time period that receives no reply is treated as a failed access; and determine the number of failed accesses in each dimension as the abnormal traffic value of that dimension.
According to one embodiment of the data analysis device for multi-dimensional data of the present invention, the acquisition subunit 120 is further configured to: predict the traffic data value of each dimension within the preset time period; and determine the difference between the acquired traffic data value of each dimension within the preset time period and the predicted traffic data value of each dimension within the preset time period as the abnormal traffic value of that dimension.
According to one embodiment of the data analysis device for multi-dimensional data of the present invention, the dimension screening unit 200 is further configured to: take the abnormal traffic value of a dimension combination of the multi-dimensional data as the weight of that dimension combination in the positive example set, and take the normal traffic value of the dimension combination as its weight in the negative example set; balance the positive and negative sample weights so that they are comparable in the initial state; calculate the information gain ratio of each dimension from the balanced positive and negative sample weights, select the dimension with the largest information gain ratio for splitting, and construct the decision tree; and determine the paths of the constructed decision tree as the suspected root cause dimensions.
According to one embodiment of the data analysis device for multi-dimensional data of the present invention, balancing the positive and negative sample weights includes: taking the product of the abnormal traffic value of a dimension combination of the multi-dimensional data and a balance coefficient as the weight of that dimension combination in the positive example set, and taking the normal traffic value of the dimension combination as its weight in the negative example set, where the balance coefficient is the ratio of the sum of the normal traffic values of all dimensions of the multi-dimensional data to the sum of the abnormal traffic values of all dimensions.
Referring to Fig. 6, according to one embodiment of the data analysis device for multi-dimensional data of the present invention, the identification unit 400 is further configured to: input the calculated contribution degree and sub-dimension loss consistency of the suspected root cause dimension into a classifier, which classifies whether the suspected root cause dimension is a root cause dimension.
For the functions of the modules in the device of the embodiment of the present invention, refer to the corresponding description of the method above; they are not repeated here.
In another aspect, an embodiment of the present invention provides a data analysis device for multi-dimensional data, including: one or more processors; and a storage device for storing one or more programs. When the one or more programs are executed by the one or more processors, the one or more processors implement any of the data analysis methods for multi-dimensional data described above.
In one possible design, the structure of the data analysis device for multi-dimensional data includes a processor and a memory. The memory is used to store a program that supports the device in executing the above data analysis method for multi-dimensional data, and the processor is configured to execute the program stored in the memory. The device may further include a communication interface for communicating with other devices or a communication network.
Fig. 8 is a structural block diagram of a data analysis device for multi-dimensional data according to another embodiment of the present invention. As shown in Fig. 8, the device includes a memory 910 and a processor 920, and the memory 910 stores a computer program that can run on the processor 920. When the processor 920 executes the computer program, the data analysis method for multi-dimensional data of the above embodiments is implemented. There may be one or more memories 910 and one or more processors 920.
The data analysis device for multi-dimensional data further includes:
a communication interface 930, configured to communicate with external devices and exchange data with them.
The memory 910 may include high-speed RAM and may also include non-volatile memory, for example at least one magnetic disk memory.
If the memory 910, the processor 920 and the communication interface 930 are implemented independently, they may be connected to one another through a bus and communicate with one another over it. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus and so on. For ease of illustration, the bus is represented by a single thick line in Fig. 8, but this does not mean that there is only one bus or only one type of bus.
Optionally, in a specific implementation, if the memory 910, the processor 920 and the communication interface 930 are integrated on a single chip, they may communicate with one another through an internal interface.
In yet another aspect, an embodiment of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method of any of the above embodiments.
The above technical solution has the following advantages or beneficial effects: when a fault occurs, the root cause dimension can be analyzed quickly from the multi-dimensional data of the fault indicators, which saves the time operations staff spend locating the fault and reduces the loss caused by the fault.
In this specification, a description referring to the terms "one embodiment", "some embodiments", "example", "specific example", "some examples" or the like means that a specific feature, structure, material or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. Moreover, the specific features, structures, materials or characteristics described may be combined in a suitable manner in any one or more embodiments or examples. In addition, provided they do not contradict one another, those skilled in the art may combine the different embodiments or examples described in this specification, as well as the features of different embodiments or examples.
In addition, the terms "first" and "second" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance or as implicitly specifying the number of the technical features referred to. Thus, a feature qualified by "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality of" means two or more, unless expressly and specifically defined otherwise.
Any process or method description in a flowchart, or otherwise described herein, may be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing specific logical functions or steps of the process. The scope of the preferred embodiments of the present invention includes other implementations in which functions may be performed out of the order shown or discussed, including substantially concurrently or in the reverse order, depending on the functions involved, as will be understood by those skilled in the art to which the embodiments of the present invention pertain.
The logic and/or steps represented in a flowchart or otherwise described herein may, for example, be regarded as an ordered list of executable instructions for implementing logical functions, and may be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus, or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from an instruction execution system, apparatus, or device and execute them). For the purposes of this specification, a "computer-readable medium" may be any means that can contain, store, communicate, propagate, or transport a program for use by, or in connection with, an instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program is printed, because the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that the various parts of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods may be implemented by software or firmware that is stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or a combination of the following techniques known in the art may be used: a discrete logic circuit having logic gates for implementing logic functions on data signals, an application-specific integrated circuit having suitable combinational logic gates, a programmable gate array (PGA), a field-programmable gate array (FPGA), and so on.
Those of ordinary skill in the art will understand that all or some of the steps of the methods in the above embodiments may be carried out by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing module, each unit may exist physically on its own, or two or more units may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented as a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium. The storage medium may be a read-only memory, a magnetic disk, an optical disc, or the like.
The above are merely specific embodiments of the present invention, but the scope of protection of the present invention is not limited thereto. Any variation or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be determined by the scope of protection of the claims.
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201810400910.7A; CN108683530B (en) | 2018-04-28 | 2018-04-28 | Data analysis method, device and storage medium for multi-dimensional data |
| Publication Number | Publication Date |
|---|---|
| CN108683530A (en) | 2018-10-19 |
| CN108683530B (en) | 2021-06-01 |
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201810400910.7A; Active; CN108683530B (en) | Data analysis method, device and storage medium for multi-dimensional data | 2018-04-28 | 2018-04-28 |
| Country | Link |
|---|---|
| CN (1) | CN108683530B (en) |
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| EP3110198A2 (en)* | 2015-06-22 | 2016-12-28 | Accenture Global Services Limited | Wi-fi access points performance management |
| CN106874574A (en)* | 2017-01-22 | 2017-06-20 | 清华大学 | Mobile solution performance bottleneck analysis method and device based on decision tree |
| CN107025154A (en)* | 2016-01-29 | 2017-08-08 | 阿里巴巴集团控股有限公司 | The failure prediction method and device of disk |
| CN107154880A (en)* | 2016-03-03 | 2017-09-12 | 阿里巴巴集团控股有限公司 | system monitoring method and device |
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN109858821A (en)* | 2019-02-14 | 2019-06-07 | 金瓜子科技发展(北京)有限公司 | A kind of influence feature determines method, apparatus, equipment and medium |
| CN110009012A (en)* | 2019-03-20 | 2019-07-12 | 阿里巴巴集团控股有限公司 | A kind of risk specimen discerning method, apparatus and electronic equipment |
| CN110995524A (en)* | 2019-10-28 | 2020-04-10 | 北京三快在线科技有限公司 | Flow data monitoring method and device, electronic equipment and computer readable medium |
| CN110995524B (en)* | 2019-10-28 | 2022-06-14 | 北京三快在线科技有限公司 | Flow data monitoring method and device, electronic equipment and computer readable medium |
| CN111064614A (en)* | 2019-12-17 | 2020-04-24 | 腾讯科技(深圳)有限公司 | Fault root cause positioning method, device, equipment and storage medium |
| CN111314173B (en)* | 2020-01-20 | 2022-04-08 | 腾讯科技(深圳)有限公司 | Monitoring information abnormity positioning method and device, computer equipment and storage medium |
| CN111314173A (en)* | 2020-01-20 | 2020-06-19 | 腾讯科技(深圳)有限公司 | Monitoring information abnormity positioning method and device, computer equipment and storage medium |
| CN111241128A (en)* | 2020-01-21 | 2020-06-05 | 北京字节跳动网络技术有限公司 | Data processing method and device and electronic equipment |
| CN113220796A (en)* | 2020-01-21 | 2021-08-06 | 北京达佳互联信息技术有限公司 | Abnormal business index analysis method and device |
| CN113535444B (en)* | 2020-04-14 | 2023-11-03 | 中国移动通信集团浙江有限公司 | Abnormal motion detection method, device, computing equipment and computer storage medium |
| CN113535444A (en)* | 2020-04-14 | 2021-10-22 | 中国移动通信集团浙江有限公司 | Transaction detection method, transaction detection device, computing equipment and computer storage medium |
| CN111209179A (en)* | 2020-04-23 | 2020-05-29 | 成都四方伟业软件股份有限公司 | Method, device and system for collecting and analyzing system operation and maintenance data |
| CN112015995B (en)* | 2020-09-29 | 2024-08-16 | 北京百度网讯科技有限公司 | Method, device, equipment and storage medium for data analysis |
| CN112015995A (en)* | 2020-09-29 | 2020-12-01 | 北京百度网讯科技有限公司 | Data analysis method, device, equipment and storage medium |
| CN114371950A (en)* | 2020-10-15 | 2022-04-19 | 中国移动通信集团浙江有限公司 | Root cause positioning method and device for application service abnormity |
| CN114971110A (en)* | 2021-02-26 | 2022-08-30 | 腾讯科技(深圳)有限公司 | Method for determining root combination, related device, equipment and storage medium |
| CN113746798A (en)* | 2021-07-14 | 2021-12-03 | 清华大学 | A method for locating abnormal root causes of cloud network shared resources based on multi-dimensional analysis |
| CN114443336A (en)* | 2022-01-27 | 2022-05-06 | 北京达佳互联信息技术有限公司 | Abnormal root cause positioning method and device, electronic equipment and storage medium |
| CN114781822A (en)* | 2022-04-01 | 2022-07-22 | 深圳市创梦天地科技有限公司 | A data analysis method, system and related device |
| CN114900835A (en)* | 2022-04-20 | 2022-08-12 | 广州爱浦路网络技术有限公司 | Malicious traffic intelligent detection method and device and storage medium |
| CN115578078A (en)* | 2022-11-15 | 2023-01-06 | 云智慧(北京)科技有限公司 | Data processing method, device and equipment of operation and maintenance system |
| CN116227995A (en)* | 2023-02-06 | 2023-06-06 | 北京三维天地科技股份有限公司 | Index analysis method and system based on machine learning |
| CN116227995B (en)* | 2023-02-06 | 2023-09-12 | 北京三维天地科技股份有限公司 | Index analysis method and system based on machine learning |
| Publication | Title | Publication Date |
|---|---|---|
| CN108683530A (en) | Data analysing method, device and the storage medium of multi-dimensional data | |
| JP6822509B2 (en) | Data processing methods and electronic devices | |
| CN104021264B (en) | A kind of failure prediction method and device | |
| JP2021518024A (en) | How to generate data for machine learning algorithms, systems | |
| US20180113928A1 (en) | Multiple record linkage algorithm selector | |
| CN118761745B (en) | OA collaborative workflow optimization method applied to enterprise | |
| CN110147367B (en) | A method, system and electronic device for filling missing temperature data | |
| CN107168995B (en) | Data processing method and server | |
| CN108650684A (en) | A kind of correlation rule determines method and device | |
| CN113516174B (en) | Call chain anomaly detection method, computer device, and readable storage medium | |
| CN117891811B (en) | Customer data acquisition and analysis method and device and cloud server | |
| US20210182293A1 (en) | Candidate projection enumeration based query response generation | |
| US8909768B1 (en) | Monitoring of metrics to identify abnormalities in a large scale distributed computing environment | |
| CN115442242A (en) | Workflow arrangement system and method based on importance ordering | |
| CN113987186B (en) | Method and device for generating marketing scheme based on knowledge graph | |
| CN114510405B (en) | Index data evaluation method, apparatus, device, storage medium, and program product | |
| CN115796704A (en) | Goods and materials sampling inspection method and device based on LightGBM index model | |
| CN109977030A (en) | A kind of test method and equipment of depth random forest program | |
| CN118819941A (en) | Fault diagnosis method, device, equipment, storage medium and program product | |
| CN118917390A (en) | Knowledge base management system and method based on knowledge big model | |
| CN112243247B (en) | Base station optimization priority determination method, device and computing equipment | |
| CN109245948B (en) | Security-aware virtual network mapping method and device | |
| CN117422545A (en) | Credit risk identification method, apparatus, device and storage medium | |
| CN118331831A (en) | Application system performance evaluation method, device, electronic device and storage medium | |
| JP5640796B2 (en) | Name identification support processing apparatus, method and program |
| Code | Title |
|---|---|
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| GR01 | Patent grant |