CN106055452A - Method and apparatus for creating switch log template - Google Patents

Method and apparatus for creating switch log template

Publication number
CN106055452A
Authority
CN
China
Prior art keywords
log
message type
keywords
tree
switch
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610355129.3A
Other languages
Chinese (zh)
Other versions
CN106055452B (en
Inventor
董辉
宋磊
侯翔宇
孟伟彬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201610355129.3A
Publication of CN106055452A
Application granted
Publication of CN106055452B
Legal status: Active
Anticipated expiration

Abstract

The present application discloses a method and a device for creating a switch log template. In one embodiment, the method includes: obtaining the original logs of switches of one model; extracting the message type and the detailed message from each original log and, if no message type can be extracted, creating a cluster label to serve as the message type; segmenting the detailed message into keywords; reordering the keywords by word frequency so that high-frequency keywords come first; building multi-way trees from the reordered keywords, with each message type as the root of its own tree and the reordered keywords as the tree's nodes; and traversing each multi-way tree depth-first, creating from the paths in the tree the log templates corresponding to each message type of that switch model. The embodiment creates switch log templates, which are then used to compress the logs.

Description

Method and device for creating switch log template

Technical field

The present application relates to the field of computer technology, specifically to the field of Internet technology, and in particular to a method and device for creating a switch log template.

Background

Switch logs are important data that a switch generates throughout its service life. They reveal the state of the switch, including port flapping, protocol flapping, board failures, and power failures. Traditional device monitoring systems are built on switch logs: they match the logs against specific rules to locate faults and raise alarms.

With the explosive growth of Internet data and the expansion of services, many Internet companies are investing more and more in self-built networks, and an IDC (Internet Data Center) involves more and more network devices, switches in particular.

A typical case is an enterprise's self-built IDCs, which contain switches on a large scale. These devices come from different vendors, span many models, and are spread across multiple IDCs. So many devices produce switch logs in huge volume, which makes both mining alarm rules and troubleshooting difficult, so an automated means of compressing the logs is needed.

Summary of the invention

The purpose of the present application is to propose a method and device for creating a switch log template that solve the technical problems described in the background section above.

In a first aspect, the present application provides a method for creating a switch log template. The method includes: obtaining the original logs of switches of one model; extracting the message type and the detailed message from each original log and, if no message type can be extracted, creating a cluster label to serve as the message type; segmenting the detailed message into keywords; reordering the keywords by word frequency so that high-frequency keywords come first; building multi-way trees from the reordered keywords, with each message type as the root of its own tree and the reordered keywords as the tree's nodes; and traversing each multi-way tree depth-first and creating, from the paths in the tree, the log templates corresponding to each message type of that switch model.

In some embodiments, the method further includes: obtaining newly added logs of switches of one model; extracting the message type and the detailed message from each new log and, if no message type can be extracted, creating a cluster label to serve as the message type; filtering the new logs with the log templates; segmenting the detailed messages of the new logs that no log template matches into keywords; reordering the keywords by word frequency so that high-frequency keywords come first; building multi-way trees from the reordered keywords, with each message type as the root of its own tree and the reordered keywords as the tree's nodes; and traversing each multi-way tree depth-first and creating, from the paths in the tree, the new log templates corresponding to each message type of that switch model.

In some embodiments, if the number of child nodes of a node in the multi-way tree exceeds a node threshold, all child nodes of that node are deleted and the node itself serves as the last child node, that is, the end of its path.

In some embodiments, creating a cluster label as the message type includes: dividing the content of each log whose message type is unknown into five categories by semantics and assigning each category a weight, the five categories being digits only or digits with symbols; digits, letters, and symbols; symbols and letters; letters only; and symbols only; extracting the frequency of each of the five categories in the log and converting the log into a word frequency vector of fixed length five; and computing the similarity between the word frequency vector and each member of the set of known message types to obtain a set of similarity results. If the maximum similarity is greater than or equal to a preset similarity threshold, the log is assigned to the corresponding message type; if the maximum similarity is below the threshold, the word frequency vector is taken as a new message type.

In some embodiments, when one log template is a subset of another, nodes are tagged to indicate whether a given node is the end of a path.

In some embodiments, the log templates of each message type are sorted by tree depth, and when one log template is a subset of another, the log template with the greater tree depth is preferred for matching.

In a second aspect, the present application provides a device for creating a switch log template. The device includes: an acquisition unit configured to obtain the original logs of switches of one model; a parsing unit configured to extract the message type and the detailed message from each original log and, if no message type can be extracted, to create a cluster label to serve as the message type; a processing unit configured to segment the detailed message into keywords and to reorder the keywords by word frequency so that high-frequency keywords come first; and a creation unit configured to build multi-way trees from the reordered keywords, with each message type as the root of its own tree and the reordered keywords as the tree's nodes, and to traverse each multi-way tree depth-first and create, from the paths in the tree, the log templates corresponding to each message type of that switch model.

In some embodiments, the device is further configured to: obtain newly added logs of switches of one model; extract the message type and the detailed message from each new log and, if no message type can be extracted, create a cluster label to serve as the message type; filter the new logs with the log templates; segment the detailed messages of the new logs that no log template matches into keywords; reorder the keywords by word frequency so that high-frequency keywords come first; build multi-way trees from the reordered keywords, with each message type as the root of its own tree and the reordered keywords as the tree's nodes; and traverse each multi-way tree depth-first and create, from the paths in the tree, the new log templates corresponding to each message type of that switch model.

In some embodiments, the creation unit is further configured to: if the number of child nodes of a node in the multi-way tree exceeds a node threshold, delete all child nodes of that node, the node itself then serving as the last child node.

In some embodiments, creating a cluster label as the message type includes: dividing the content of each log whose message type is unknown into five categories by semantics and assigning each category a weight, the five categories being digits only or digits with symbols; digits, letters, and symbols; symbols and letters; letters only; and symbols only; extracting the frequency of each of the five categories in the log and converting the log into a word frequency vector of fixed length five; and computing the similarity between the word frequency vector and each member of the set of known message types to obtain a set of similarity results. If the maximum similarity is greater than or equal to a preset similarity threshold, the log is assigned to the corresponding message type; if the maximum similarity is below the threshold, the word frequency vector is taken as a new message type.

In some embodiments, the creation unit is further configured to: when one log template is a subset of another, tag nodes to indicate whether a given node is the end of a path.

In some embodiments, the creation unit is further configured to: sort the templates of each message type by tree depth and, when one log template is a subset of another, prefer the template with the greater tree depth for matching.

The method and device provided by the present application extract the message type and the detailed message from the original logs, build multi-way trees from the keywords in the detailed messages, and create from the paths in the trees the log templates corresponding to each message type of the switch, so that switch logs can be compressed through incremental iterative training.

Brief description of the drawings

Other features, objects, and advantages of the present application will become more apparent from the following detailed description of non-limiting embodiments, read with reference to the accompanying drawings:

FIG. 1 is an exemplary system architecture diagram to which the present application can be applied;

FIG. 2 is a flowchart of one embodiment of the method for creating a switch log template according to the present application;

FIG. 3 is a schematic diagram of the word frequency vector in the method for creating a switch log template according to the present application;

FIGS. 4a, 4b, and 4c are schematic diagrams of an application scenario of the method for creating a switch log template according to the present application;

FIG. 5 is a flowchart of another embodiment of the method for creating a switch log template according to the present application;

FIG. 6 is a schematic structural diagram of one embodiment of the device for creating a switch log template according to the present application;

FIG. 7 is a schematic structural diagram of a computer system suitable for implementing the server of the embodiments of the present application.

Detailed description

The present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the invention.

It should be noted that, where no conflict arises, the embodiments of the present application and the features of those embodiments may be combined with one another. The present application is described in detail below with reference to the drawings and in conjunction with the embodiments.

FIG. 1 shows an exemplary system architecture 100 to which embodiments of the method or the device for creating a switch log template of the present application can be applied.

As shown in FIG. 1, the system architecture 100 may include switches 101, 102, and 103, a network 104, and a server 105. The network 104 is the medium that provides communication links between the switches 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired links, wireless communication links, or fiber optic cables.

The logs of the switches 101, 102, and 103 are transmitted to the server 105 over the network 104. Clients of various log collection tools may be installed on the switches 101, 102, and 103, for example rsyslog, which can forward a device's internal log information to a remote log server.

The switches 101, 102, and 103 may be network devices that provide a dedicated electrical signal path between any two network nodes connected to the switch, including but not limited to Ethernet switches, Fast Ethernet switches, Gigabit Ethernet switches, FDDI switches, ATM switches, and Token Ring switches.

The server 105 may be a server that provides various services, for example a server that collects the logs of the switches 101, 102, and 103, structures the collected logs, and creates log templates.

It should be noted that the method for creating a switch log template provided by the embodiments of the present application is generally executed by the server 105, and accordingly the device for creating a switch log template is generally located in the server 105.

It should be understood that the numbers of switches, networks, and servers in FIG. 1 are merely illustrative. There may be any number of switches, networks, and servers, as the implementation requires.

Referring next to FIG. 2, a flow 200 of one embodiment of the method for creating a switch log template according to the present application is shown. The method includes the following steps:

Step 201: obtain the original logs of switches of one model.

In this embodiment, the electronic device on which the method runs (for example, the server shown in FIG. 1) may obtain the original logs from the switches over a wired or wireless connection.

Step 202: extract the message type and the detailed message from each original log; if no message type can be extracted, create a cluster label to serve as the message type.

In this embodiment, the message type and the detailed message are extracted from each original log, and if no message type can be extracted, a cluster label is created to serve as the message type. Variables must be removed from the detailed message.

In some optional implementations of this embodiment, creating a cluster label as the message type includes: dividing the content of each log whose message type is unknown into five categories by semantics and assigning each category a weight, the five categories being digits only or digits with symbols; digits, letters, and symbols; symbols and letters; letters only; and symbols only, as shown in Table 1; extracting the frequency of each of the five categories in the log and converting the log into a word frequency vector of fixed length five, as shown in FIG. 3; and computing the similarity between the word frequency vector and each member of the set of known message types to obtain a set of similarity results. If the maximum similarity is greater than or equal to a preset similarity threshold, the log is assigned to the corresponding message type; if the maximum similarity is below the threshold, the word frequency vector is taken as a new message type.

Table 1
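
The cluster-labeling procedure above can be sketched roughly as follows. This is only an illustration under stated assumptions: the source does not name the similarity measure, so cosine similarity is assumed here; the weights in `WEIGHTS`, the `threshold` default, the `cluster_N` label format, and the function names are all hypothetical; and tokens that mix digits and letters without any symbol, which the five listed categories do not cover, are folded into the second category.

```python
import math

# Hypothetical per-category weights. The source assigns them in Table 1,
# whose values are not available here, so every category gets 1.0.
WEIGHTS = [1.0, 1.0, 1.0, 1.0, 1.0]


def category(token: str) -> int:
    """Map a token to one of the five categories (index 0 to 4)."""
    has_digit = any(c.isdigit() for c in token)
    has_alpha = any(c.isalpha() for c in token)
    has_symbol = any(not c.isalnum() for c in token)
    if has_digit and has_alpha:
        return 1  # digits, letters and symbols (assumption: symbols optional)
    if has_digit:
        return 0  # digits only, or digits with symbols
    if has_alpha and has_symbol:
        return 2  # symbols and letters
    if has_alpha:
        return 3  # letters only
    return 4      # symbols only


def to_vector(log: str) -> list[float]:
    """Convert a log line into a weighted frequency vector of length five."""
    vector = [0.0] * 5
    for token in log.split():
        index = category(token)
        vector[index] += WEIGHTS[index]
    return vector


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity (an assumed choice of similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def assign_type(log: str, known: dict[str, list[float]],
                threshold: float = 0.9) -> str:
    """Return the best-matching known type, or register a new cluster label."""
    vector = to_vector(log)
    best_label, best_score = None, -1.0
    for label, reference in known.items():
        score = cosine(vector, reference)
        if score > best_score:
            best_label, best_score = label, score
    if best_label is not None and best_score >= threshold:
        return best_label
    new_label = f"cluster_{len(known)}"
    known[new_label] = vector  # the vector itself defines the new type
    return new_label
```

Starting from an empty `known` dictionary, the first log always opens a new cluster, and later logs with the same mix of token categories join it.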

Step 203: segment the detailed message into keywords.

In this embodiment, Lucene is used for word segmentation to obtain the keywords.

Step 204: reorder the keywords by word frequency so that high-frequency keywords come first.

In this embodiment, the keywords are reordered by word frequency, with high-frequency keywords placed first.
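
Steps 203 and 204 might look like the sketch below. It makes several assumptions: a simple regular expression stands in for the Lucene tokenizer, and by keeping only alphabetic words it also crudely drops numeric variables; word frequency is counted over the whole batch of messages; and ties are broken alphabetically so that the ordering is deterministic. The function names are hypothetical.

```python
import re
from collections import Counter


def tokenize(message: str) -> list[str]:
    """Stand-in tokenizer (not Lucene): keep lowercase alphabetic words only."""
    return re.findall(r"[a-z]+", message.lower())


def reorder_by_frequency(messages: list[str]) -> list[list[str]]:
    """Sort each message's keywords by descending frequency across all messages."""
    tokenized = [tokenize(m) for m in messages]
    frequency = Counter(t for tokens in tokenized for t in tokens)
    return [
        sorted(set(tokens), key=lambda t: (-frequency[t], t))
        for tokens in tokenized
    ]
```

Placing frequent keywords first means that messages of the same kind share a long common prefix, which is what lets them share a path when the tree is built.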

Step 205: build multi-way trees from the reordered keywords.

In this embodiment, the multi-way trees are built from the keywords reordered in step 204.
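
One way to picture step 205 is with nested dictionaries, one tree per message type. This is a minimal sketch, not the patented implementation; `build_forest` and the `(message_type, keywords)` input shape are assumptions made for illustration.

```python
def build_forest(logs):
    """Build one multi-way tree per message type.

    `logs` is an iterable of (message_type, ordered_keywords) pairs.
    Each tree is a nested dict: keyword -> subtree.
    """
    forest: dict[str, dict] = {}
    for message_type, keywords in logs:
        node = forest.setdefault(message_type, {})  # root for this type
        for keyword in keywords:
            node = node.setdefault(keyword, {})     # descend, creating nodes
    return forest
```

Logs that share a keyword prefix share the corresponding nodes, so two messages that differ only in their last keyword produce a single branch with two leaves.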

In some optional implementations of this embodiment, if the number of child nodes of a node in the multi-way tree exceeds a node threshold, all child nodes of that node are deleted and the node itself serves as the last child node. For example, if a node has more than 10 child nodes (an empirical value), all of its child nodes are cut off and the node becomes the last child node. The purpose is to prevent template bloat, because a message type generally has no more than 10 valid states. Take login logs as an example: apart from the user name, the keywords are identical, so without a limit on the number of child nodes, too many templates would express the same meaning.
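
The pruning rule can be sketched on a nested-dict tree (keyword -> subtree). The representation and the name `prune` are assumptions for illustration; the default of 10 is the empirical value quoted above.

```python
def prune(node: dict, max_children: int = 10) -> None:
    """Delete all children of any node that has more than `max_children`."""
    if len(node) > max_children:
        node.clear()  # the node now ends its path
        return
    for child in node.values():
        prune(child, max_children)
```

In the login example, a node followed by a dozen different user names loses all of those children, so a single template covers every user.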

Step 206: traverse each multi-way tree depth-first and create, from the paths in the tree, the log templates corresponding to each message type of that switch model.

In this embodiment, the multi-way trees built in step 205 are traversed depth-first, and the log templates corresponding to each message type of that switch model are created from the paths in the trees.
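
A depth-first traversal that turns every root-to-leaf path into a template could look like this. The nested-dict forest (message type -> keyword -> subtree) and the names `iter_paths` and `create_templates` are assumptions for illustration; templates that end at an internal node are not handled in this basic version.

```python
def iter_paths(node: dict, prefix: tuple = ()):
    """Yield every root-to-leaf keyword path below `node`, depth-first."""
    if not node:  # leaf reached: the accumulated prefix is one template
        if prefix:
            yield prefix
        return
    for keyword, child in node.items():
        yield from iter_paths(child, prefix + (keyword,))


def create_templates(forest: dict) -> dict:
    """Map each message type to the list of templates found in its tree."""
    return {msg_type: list(iter_paths(tree)) for msg_type, tree in forest.items()}
```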

In some optional implementations of this embodiment, when one log template is a subset of another, nodes are tagged to indicate whether a given node is the end of a path (such a node need not be the end of the longest path).

In some optional implementations of this embodiment, when one log template is a subset of another, the log template with the greater tree depth is preferred for matching. That is, when a node is not the end of the longest path, the longest matching path is used.
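
The end-of-path tag and the preference for deeper templates can be sketched together. The source does not spell out the matching rule, so this sketch assumes that a template matches a log when every template keyword occurs in the log; the class `Node` and the functions `insert`, `collect`, and `best_match` are hypothetical names.

```python
class Node:
    """Tree node with a tag marking whether some log ended here."""

    def __init__(self):
        self.children = {}
        self.is_end = False


def insert(root: Node, keywords) -> None:
    node = root
    for keyword in keywords:
        node = node.children.setdefault(keyword, Node())
    node.is_end = True  # tag the end of this path, even if it is not a leaf


def collect(node: Node, prefix: tuple = ()) -> list:
    """Depth-first collection of every tagged path, internal nodes included."""
    found = [prefix] if node.is_end and prefix else []
    for keyword, child in node.children.items():
        found.extend(collect(child, prefix + (keyword,)))
    return found


def best_match(templates, keywords):
    """Try deeper templates first; return the first one contained in the log."""
    present = set(keywords)
    for template in sorted(templates, key=len, reverse=True):
        if present.issuperset(template):
            return template
    return None
```

Without the tag, the shorter template would be lost as soon as a longer log extended its path; with it, both survive and the longer one wins whenever it applies.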

FIGS. 4a-4c are schematic diagrams of an application scenario of the method for creating a switch log template according to this embodiment. In FIG. 4a, the left side shows the logs after their keywords have been sorted by word frequency, and the right side shows the multi-way tree built from them. In FIG. 4b, the node "down" has more than 10 child nodes, so its child nodes are deleted and "down" becomes the last child node. In FIG. 4c, the node "up" is the end of one path and is tagged accordingly, but that path is not the longest: the path of the first template is longer than that of the second template. When templates are used to match logs, the first template is tried first.

The method provided by the above embodiment of the present application segments logs whose message type is known into keywords and builds multi-way trees from them, thereby creating switch log templates that can be used to compress logs through incremental iterative training. The log compression ratio can reach 2000:1.

Referring further to FIG. 5, a flow 500 of another embodiment of the method for creating a switch log template is shown. The flow 500 includes the following steps:

Step 501: obtain newly added logs of switches of one model.

In this embodiment, this step is essentially the same as step 201, except that it obtains newly added logs.

Step 502: extract the message type and the detailed message from each new log; if no message type can be extracted, create a cluster label to serve as the message type.

Step 502 is essentially the same as step 202 and is not described again here.

Step 503: filter the new logs with the log templates.

In this embodiment, the log templates created in step 206 are used to filter the new logs, and the logs that the existing templates cannot match are kept for incremental training.
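
The filtering in step 503 can be sketched as below, again assuming that a template matches a log when all of the template's keywords occur in the log. The function name `filter_unmatched` and the tuple-of-keywords template format are hypothetical.

```python
def filter_unmatched(templates, new_logs):
    """Return the new logs (as keyword lists) that no existing template matches."""
    unmatched = []
    for keywords in new_logs:
        present = set(keywords)
        if not any(present.issuperset(template) for template in templates):
            unmatched.append(keywords)
    return unmatched
```

Only the returned logs go through segmentation, reordering, and tree building again, so each training round handles just the messages that the current templates do not yet cover.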

Step 504: segment the detailed messages of the new logs that no log template matches into keywords.

Step 504 is essentially the same as step 203 and is not described again here.

Step 505: reorder the keywords by word frequency so that high-frequency keywords come first.

Step 506: build multi-way trees from the reordered keywords.

Step 507: traverse each multi-way tree depth-first and create, from the paths in the tree, the new log templates corresponding to each message type of that switch model.

Steps 505-507 are essentially the same as steps 204-206 and are not described again here.

As FIG. 5 shows, compared with the embodiment corresponding to FIG. 2, the flow 500 of the method for creating a switch log template in this embodiment highlights the step of filtering the new logs. The solution described in this embodiment can therefore compress logs through incremental iterative training.

进一步参考图6,作为对上述各图所示方法的实现,本申请提供了一种创建交换机日志模板的装置的一个实施例,该装置实施例与图2所示的方法实施例相对应,该装置具体可以应用于各种电子设备中。Further referring to FIG. 6 , as an implementation of the methods shown in the above figures, the present application provides an embodiment of a device for creating a switch log template, which corresponds to the method embodiment shown in FIG. 2 . The device can be specifically applied to various electronic devices.

如图6所示,本实施例所述的创建交换机日志模板的装置600包括:获取单元601、解析单元602、处理单元603和创建单元604。其中,获取单元601配置用于获取一个型号的交换机的原始日志;解析单元602配置用于获取所述原始日志中的消息类型和详细消息,如果获取不到消息类型,则创建聚类标签作为消息类型;处理单元603配置用于对所述详细消息进行分词,得到关键词;根据所述关键词的词频对所述关键词重新排序,将词频高的关键词排在前面;创建单元604配置用于根据所述重新排序的关键词创建多叉树,每个消息类型作为树的根节点,所述重新排序的关键词作为树的节点,每个消息类型对应一个多叉树;深度优先遍历所述多叉树,根据所述多叉树中的路径创建所述型号交换机的每个消息类型所对应的日志模板。As shown in FIG. 6 , the apparatus 600 for creating a switch log template described in this embodiment includes: an acquisition unit 601 , an analysis unit 602 , a processing unit 603 and a creation unit 604 . Wherein, the obtaining unit 601 is configured to obtain the original log of a switch of a model; the parsing unit 602 is configured to obtain the message type and detailed information in the original log, and if the message type cannot be obtained, create a clustering label as the message type; the processing unit 603 is configured to perform word segmentation on the detailed message to obtain keywords; reorder the keywords according to the word frequency of the keywords, and rank the keywords with high word frequency first; the creation unit 604 is configured to use Create a multi-fork tree according to the reordered keywords, each message type is used as the root node of the tree, the reordered keywords are used as the nodes of the tree, and each message type corresponds to a multi-fork tree; depth-first traversal of all The multi-fork tree, creating a log template corresponding to each message type of the model switch according to the path in the multi-fork tree.

在本实施例的一些可选的实现方式中,该创建交换机日志模板的装置600还配置用于:获取一个型号的交换机的新增日志;获取所述新增日志中的消息类型和详细消息,如果获取不到消息类型,则创建聚类标签作为消息类型;使用所述日志模板过滤所述新增日志;将所述日志模板无法匹配的新增日志中的详细消息进行分词,得到关键词;根据所述关键词的词频对所述关键词重新排序,将词频高的关键词排在前面;根据所述重新排序的关键词创建多叉树,每个消息类型作为树的根节点,所述重新排序的关键词作为树的节点,每个消息类型对应一个多叉树;深度优先遍历所述多叉树,根据所述多叉树中的路径创建所述型号交换机的每个消息类型所对应的新增日志模板。In some optional implementations of this embodiment, the device 600 for creating a switch log template is further configured to: obtain a new log of a switch model; obtain the message type and detailed message in the new log, If the message type cannot be obtained, create a clustering label as the message type; use the log template to filter the new log; segment the detailed message in the new log that cannot be matched by the log template to obtain keywords; According to the word frequency of the keyword, the keywords are reordered, and the keywords with high word frequency are arranged in front; a multi-fork tree is created according to the reordered keywords, and each message type is used as a root node of the tree, and the The reordered keywords are used as the nodes of the tree, and each message type corresponds to a multi-fork tree; the depth-first traverses the multi-fork tree, and creates the model corresponding to each message type of the switch according to the path in the multi-fork tree New log template for .

In some optional implementations of this embodiment, the creation unit 604 is further configured to: if the number of child nodes of a node in the multiway tree exceeds a node threshold, delete all child nodes of that node, with the node itself then serving as the last child node (that is, the end of its path).
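A minimal sketch of this pruning rule, assuming the tree is stored as nested dictionaries keyed by keyword; the function name `prune` and the parameter `max_children` are hypothetical. A node with many distinct children usually sits just before a variable field (an interface name, an address), which is why cutting there yields a reusable template.

```python
def prune(tree, max_children):
    """Drop all children of any node that has more than max_children children.

    tree maps a keyword (or the message type at the top level) to the dict of
    its children. The pruned node stays in place and becomes a leaf.
    """
    for children in tree.values():
        if len(children) > max_children:
            children.clear()
        else:
            prune(children, max_children)
    return tree


tree = {
    "LINK": {
        "Interface": {
            "eth1": {"down": {}},
            "eth2": {"up": {}},
            "eth3": {"up": {}},
        }
    }
}
pruned = prune(tree, max_children=2)
```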

In some optional implementations of this embodiment, creating a cluster label to serve as the message type includes: dividing the content of each log whose message type is unknown into five semantic categories, each assigned a weight value, the five categories being: digits only or digits with symbols; digits, letters, and symbols; symbols and letters; letters only; and symbols only; extracting the frequency of each of the five categories in the log, thereby converting the log into a word-frequency vector of fixed length five; and computing the similarity between the word-frequency vector and each member of the set of known message types to obtain a set of similarity results. If the maximum similarity is greater than or equal to a preset similarity threshold, the log is assigned to the corresponding message type; if the maximum similarity is less than the preset similarity threshold, the word-frequency vector is taken as a new message type.
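The labeling procedure can be illustrated as follows. Several details are assumptions, since the text does not fix them: cosine similarity is used as the similarity measure, all five weights are 1.0, the threshold is 0.9, tokens containing digits and letters but no symbols are folded into the second category, and new labels are named `cluster_N`. All function names are hypothetical.

```python
import math


def category(token):
    """Map a token to one of the five categories (index 0..4)."""
    has_digit = any(c.isdigit() for c in token)
    has_alpha = any(c.isalpha() for c in token)
    has_symbol = any(not c.isalnum() for c in token)
    if has_digit:
        # 0: digits only or digits with symbols; 1: digits, letters (and symbols).
        return 1 if has_alpha else 0
    if has_alpha:
        # 2: symbols and letters; 3: letters only.
        return 2 if has_symbol else 3
    return 4  # symbols only


def to_vector(detail, weights=(1.0, 1.0, 1.0, 1.0, 1.0)):
    # Assumption: uniform weights; the text assigns weights without giving values.
    vec = [0.0] * 5
    for token in detail.split():
        vec[category(token)] += 1.0
    return [w * v for w, v in zip(weights, vec)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def assign_label(detail, known, threshold=0.9):
    """Return the best-matching known label, or register the vector as a new one."""
    vec = to_vector(detail)
    best_label, best_sim = None, -1.0
    for label, ref in known.items():
        sim = cosine(vec, ref)
        if sim > best_sim:
            best_label, best_sim = label, sim
    if best_label is not None and best_sim >= threshold:
        return best_label
    new_label = f"cluster_{len(known)}"
    known[new_label] = vec
    return new_label


known_types = {}
first = assign_label("Fan 1 failed", known_types)
second = assign_label("Fan 2 recovered", known_types)
third = assign_label("%% ## !!", known_types)
```

Here the first two messages have the same category profile and share a label, while the symbols-only message is dissimilar to it and starts a new cluster.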

In some optional implementations of this embodiment, the creation unit 604 is further configured to: when one log template is a subset of another log template, label nodes to indicate whether a given node is the end of a path.
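One way to realize such a label is an end-of-path flag on each node, as in a trie. The sketch below is an assumption about the data structure, not the patent's own code; `Node`, `insert`, and `collect_templates` are hypothetical names.

```python
class Node:
    def __init__(self):
        self.children = {}
        self.is_end = False  # label: some template ends exactly at this node


def insert(root, keywords):
    """Add one keyword sequence and mark its final node as the end of a path."""
    node = root
    for word in keywords:
        node = node.children.setdefault(word, Node())
    node.is_end = True


def collect_templates(node, prefix=()):
    """Depth-first traversal that emits a template at every labeled node."""
    found = [prefix] if node.is_end else []
    for word in sorted(node.children):
        found.extend(collect_templates(node.children[word], prefix + (word,)))
    return found


root = Node()
insert(root, ("port", "down"))
insert(root, ("port", "down", "reason"))
result = collect_templates(root)
```

Without the flag, only root-to-leaf paths would be reported and the shorter template `("port", "down")` would be lost inside the longer one.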

In some optional implementations of this embodiment, the creation unit 604 is further configured to: sort the templates within each message type by tree depth and, when one log template is a subset of another log template, match preferentially against the template with the greater tree depth.
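A short sketch of depth-ordered matching, assuming a template is a tuple of keywords whose length equals its depth in the tree, and assuming (as an illustration only) that a template matches when all of its keywords appear in the message. The name `best_match` is hypothetical.

```python
def best_match(templates, detail):
    """Try deeper (longer) templates first and return the first that matches."""
    tokens = set(detail.split())
    for template in sorted(templates, key=len, reverse=True):
        if all(word in tokens for word in template):
            return template
    return None


candidates = [("port", "down"), ("port", "down", "reason")]
deep = best_match(candidates, "port 3 down reason timeout")
shallow = best_match(candidates, "port 3 down")
missing = best_match(candidates, "fan failed")
```

Trying the deeper template first keeps the more specific template from being shadowed by its own subset.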

Reference is now made to FIG. 7, which shows a schematic structural diagram of a computer system 700 suitable for implementing a server according to the embodiments of the present application.

As shown in FIG. 7, the computer system 700 includes a central processing unit (CPU) 701, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 702 or a program loaded from a storage section 708 into a random access memory (RAM) 703. The RAM 703 also stores various programs and data required for the operation of the system 700. The CPU 701, the ROM 702, and the RAM 703 are connected to one another via a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.

The following components are connected to the I/O interface 705: an input section 706 including a keyboard, a mouse, and the like; an output section 707 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, and the like; a storage section 708 including a hard disk and the like; and a communication section 709 including a network interface card such as a LAN card or a modem. The communication section 709 performs communication processing via a network such as the Internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 710 as needed, so that a computer program read from it can be installed into the storage section 708 as needed.

In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product that comprises a computer program tangibly embodied on a machine-readable medium, the computer program containing program code for performing the methods shown in the flowcharts. In such embodiments, the computer program may be downloaded and installed from a network via the communication section 709, and/or installed from the removable medium 711. When the computer program is executed by the central processing unit (CPU) 701, the functions defined in the method of the present application are performed.

The flowcharts and block diagrams in the drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functionality involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of such blocks, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.

The units described in the embodiments of the present application may be implemented in software or in hardware. The described units may also be provided in a processor; for example, a processor may be described as including an acquisition unit, a parsing unit, a processing unit, and a creation unit. The names of these units do not, in some cases, limit the units themselves; for example, the acquisition unit may also be described as "a unit for obtaining the original logs of switches of one model".

In another aspect, the present application further provides a non-volatile computer storage medium. The non-volatile computer storage medium may be the one included in the apparatus described in the above embodiments, or it may exist separately without being assembled into a terminal. The non-volatile computer storage medium stores one or more programs which, when executed by a device, cause the device to: obtain the original logs of switches of one model; obtain the message type and the detailed message from the original logs and, if no message type can be obtained, create a cluster label that serves as the message type; perform word segmentation on the detailed message to obtain keywords; reorder the keywords by word frequency so that higher-frequency keywords come first; create multiway trees from the reordered keywords, where each message type serves as the root node of a tree, the reordered keywords serve as the nodes of the tree, and each message type corresponds to one multiway tree; and traverse each multiway tree depth-first and create, from the paths in the tree, the log template corresponding to each message type of the switch model.

The above description covers only preferred embodiments of the present application and the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in the present application is not limited to technical solutions formed by the specific combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept, for example, technical solutions formed by replacing the above features with (but not limited to) technical features having similar functions disclosed in the present application.

Claims (12)

Translated from Chinese

1. A method for creating a switch log template, characterized in that the method comprises:
obtaining original logs of switches of one model;
obtaining the message type and the detailed message from the original logs and, if no message type can be obtained, creating a cluster label to serve as the message type;
performing word segmentation on the detailed message to obtain keywords;
reordering the keywords by word frequency, with higher-frequency keywords placed first;
creating multiway trees from the reordered keywords, wherein each message type serves as the root node of a tree, the reordered keywords serve as nodes of the tree, and each message type corresponds to one multiway tree; and
traversing the multiway tree depth-first and creating, from the paths in the multiway tree, the log template corresponding to each message type of the switch model.

2. The method for creating a switch log template according to claim 1, characterized in that the method further comprises:
obtaining newly added logs of switches of one model;
obtaining the message type and the detailed message from the newly added logs and, if no message type can be obtained, creating a cluster label to serve as the message type;
filtering the newly added logs using the log templates;
performing word segmentation on the detailed messages of the newly added logs that the log templates cannot match, to obtain keywords;
reordering the keywords by word frequency, with higher-frequency keywords placed first;
creating multiway trees from the reordered keywords, wherein each message type serves as the root node of a tree, the reordered keywords serve as nodes of the tree, and each message type corresponds to one multiway tree; and
traversing the multiway tree depth-first and creating, from the paths in the multiway tree, the newly added log template corresponding to each message type of the switch model.

3. The method for creating a switch log template according to claim 1 or 2, characterized in that, if the number of child nodes of a node in the multiway tree exceeds a node threshold, all child nodes of the node are deleted and the node serves as the last child node.

4. The method for creating a switch log template according to claim 1 or 2, characterized in that creating a cluster label to serve as the message type comprises:
dividing the content of each log whose message type is unknown into five semantic categories, each assigned a weight value, the five categories comprising: digits only or digits with symbols; digits, letters, and symbols; symbols and letters; letters only; and symbols only;
extracting the frequency of each of the five categories in the log, thereby converting the log into a word-frequency vector of fixed length five; and
computing the similarity between the word-frequency vector and the set of known message types to obtain a set of similarity results; if the maximum similarity is greater than or equal to a preset similarity threshold, assigning the log to the corresponding message type; and if the maximum similarity is less than the preset similarity threshold, taking the word-frequency vector as a new message type.

5. The method for creating a switch log template according to claim 1 or 2, characterized in that, when one log template is a subset of another log template, nodes are labeled to indicate whether a node is the end of a path.

6. The method for creating a switch log template according to claim 5, characterized in that the log templates within each message type are sorted by tree depth and, when one log template is a subset of another log template, the log template with the greater tree depth is preferentially used for matching.

7. An apparatus for creating a switch log template, characterized in that the apparatus comprises:
an acquisition unit configured to obtain original logs of switches of one model;
a parsing unit configured to obtain the message type and the detailed message from the original logs and, if no message type can be obtained, to create a cluster label to serve as the message type;
a processing unit configured to perform word segmentation on the detailed message to obtain keywords, and to reorder the keywords by word frequency, with higher-frequency keywords placed first; and
a creation unit configured to create multiway trees from the reordered keywords, wherein each message type serves as the root node of a tree, the reordered keywords serve as nodes of the tree, and each message type corresponds to one multiway tree; and to traverse the multiway tree depth-first and create, from the paths in the multiway tree, the log template corresponding to each message type of the switch model.

8. The apparatus for creating a switch log template according to claim 7, characterized in that the apparatus is further configured to:
obtain newly added logs of switches of one model;
obtain the message type and the detailed message from the newly added logs and, if no message type can be obtained, create a cluster label to serve as the message type;
filter the newly added logs using the log templates;
perform word segmentation on the detailed messages of the newly added logs that the log templates cannot match, to obtain keywords;
reorder the keywords by word frequency, with higher-frequency keywords placed first;
create multiway trees from the reordered keywords, wherein each message type serves as the root node of a tree, the reordered keywords serve as nodes of the tree, and each message type corresponds to one multiway tree; and
traverse the multiway tree depth-first and create, from the paths in the multiway tree, the newly added log template corresponding to each message type of the switch model.

9. The apparatus for creating a switch log template according to claim 7 or 8, characterized in that the creation unit is further configured to:
if the number of child nodes of a node in the multiway tree exceeds a node threshold, delete all child nodes of the node, the node serving as the last child node.

10. The apparatus for creating a switch log template according to claim 7 or 8, characterized in that creating a cluster label to serve as the message type comprises:
dividing the content of each log whose message type is unknown into five semantic categories, each assigned a weight value, the five categories comprising: digits only or digits with symbols; digits, letters, and symbols; symbols and letters; letters only; and symbols only;
extracting the frequency of each of the five categories in the log, thereby converting the log into a word-frequency vector of fixed length five; and
computing the similarity between the word-frequency vector and the set of known message types to obtain a set of similarity results; if the maximum similarity is greater than or equal to a preset similarity threshold, assigning the log to the corresponding message type; and if the maximum similarity is less than the preset similarity threshold, taking the word-frequency vector as a new message type.

11. The apparatus for creating a switch log template according to claim 7 or 8, characterized in that the creation unit is further configured to:
when one log template is a subset of another log template, label nodes to indicate whether a node is the end of a path.

12. The apparatus for creating a switch log template according to claim 11, characterized in that the creation unit is further configured to:
sort the templates within each message type by tree depth and, when one log template is a subset of another log template, preferentially use the template with the greater tree depth for matching.
CN201610355129.3A | 2016-05-25 | 2016-05-25 | Method and apparatus for creating switch log template | Active | CN106055452B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201610355129.3A (CN106055452B (en)) | 2016-05-25 | 2016-05-25 | Method and apparatus for creating switch log template

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201610355129.3A (CN106055452B (en)) | 2016-05-25 | 2016-05-25 | Method and apparatus for creating switch log template

Publications (2)

Publication Number | Publication Date
CN106055452A (en) | 2016-10-26
CN106055452B (en) | 2019-06-14

Family

ID=57175843

Family Applications (1)

Application Number | Status | Publication | Priority Date | Filing Date | Title
CN201610355129.3A | Active | CN106055452B (en) | 2016-05-25 | 2016-05-25 | Method and apparatus for creating switch log template

Country Status (1)

Country | Link
CN (1) | CN106055452B (en)


Also Published As

Publication number | Publication date
CN106055452B (en) | 2019-06-14


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
