The rule data, as distinct from the conditions, are obtained by evaluating the calculation formulas of the expert database model, and the data used in the calculation are cleaned from the original data by applying all the rules in the patent database model;

S212, generating an action table from the rules, and triggering an action when the calculated rule data meets the condition set for that action; the rule data is then used as a rule, i.e. a cluster rule is obtained.

Table 2 shows several examples of trigger conditions in the application layer:

TABLE 2

In the trigger condition of a cluster rule, the symbol "${N}" represents the rule value of a certain granularity (see STEP1 in Fig. 3), where N = 0 denotes the rule value of the current granularity, N = 1 denotes the rule value of the previous granularity, and so on;

The action may be defined as:

① triggering a problem: when a certain condition is met, the associated problem is triggered; this function is mainly used to make the analysis system analyze the problem automatically;

② generating a rule: when the condition is satisfied, the rule data is used as a rule. Such a rule is the same as a rule generated from an ordinary log, except that the rule data can serve as a rule only after the rule has been explicitly generated.

Terms such as continuous, failure rate and decline in the log description are keyword information filled in during the cluster cleaning process;
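The following is a minimal sketch of how an action table with "${N}" trigger conditions could be evaluated. The condition syntax (clauses of the form "${N} <operator> <number>" joined by "and"), the action names and the sample values are assumptions made for illustration; they are not taken from Table 2.

```python
import re
from typing import Callable, Dict, List, Tuple

# Hypothetical condition syntax: "${N} <op> <number>" clauses joined by "and".
_CLAUSE = re.compile(r"^\$\{(\d+)\}\s*(>=|<=|>|<|==)\s*(-?\d+(?:\.\d+)?)$")
_OPS: Dict[str, Callable[[float, float], bool]] = {
    ">": lambda a, b: a > b,
    "<": lambda a, b: a < b,
    ">=": lambda a, b: a >= b,
    "<=": lambda a, b: a <= b,
    "==": lambda a, b: a == b,
}


def condition_met(condition: str, history: List[float]) -> bool:
    """history[0] is the rule value of the current granularity,
    history[1] that of the previous granularity, and so on."""
    for clause in condition.split(" and "):
        m = _CLAUSE.match(clause.strip())
        if m is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        n, op, threshold = int(m.group(1)), m.group(2), float(m.group(3))
        if n >= len(history):
            return False  # that granularity has not been observed yet
        if not _OPS[op](history[n], threshold):
            return False
    return True


def run_actions(action_table: List[Tuple[str, str]], history: List[float]) -> List[str]:
    """Return the names of all actions whose trigger condition is met."""
    return [action for condition, action in action_table
            if condition_met(condition, history)]


# Hypothetical action table: (trigger condition, action name).
action_table = [
    ("${0} > 0.5 and ${1} > 0.5", "generate_rule"),
    ("${0} > 0.9", "trigger_problem"),
]
fired = run_actions(action_table, [0.6, 0.7, 0.1])
print(fired)
```

With the sample values, only the first condition holds, because the failure rate exceeds 0.5 in both the current and the previous granularity.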

(2) The method for cleaning the virtual layer log and the physical layer log comprises the following specific steps:

extracting cluster rules from the key information in the running logs of the virtual layer and the physical layer through a preprocessing model; the preprocessing model uses regular expression matching;

Table 3 lists one example of the regular expression matching approach:

TABLE 3

Object	Trigger condition	Action	Description	Cluster rule description
Cloud platform	Log keywords: port, exception	Generating cluster rules	Port anomaly	C1000
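As an illustration of the row in Table 3, the sketch below matches a log line against the keywords "port" and "exception" with a regular expression and returns the cluster rule C1000. The function name, the data layout and the sample log lines are hypothetical; only the object, keywords, description and rule identifier come from the table.

```python
import re
from typing import List, NamedTuple, Optional


class MatchRule(NamedTuple):
    obj: str             # object the log comes from, e.g. "Cloud platform"
    pattern: re.Pattern  # compiled trigger condition
    description: str
    cluster_rule: str    # identifier of the generated cluster rule


# The two lookaheads require both keywords to occur, in any order.
RULES: List[MatchRule] = [
    MatchRule(
        obj="Cloud platform",
        pattern=re.compile(r"^(?=.*\bport\b)(?=.*\bexception\b)", re.IGNORECASE),
        description="Port anomaly",
        cluster_rule="C1000",
    ),
]


def clean_log_line(obj: str, line: str) -> Optional[str]:
    """Return the cluster rule generated for this log line, or None."""
    for rule in RULES:
        if rule.obj == obj and rule.pattern.search(line):
            return rule.cluster_rule
    return None


# Hypothetical log lines.
hit = clean_log_line("Cloud platform", "ERROR: port 8080 exception, connection refused")
miss = clean_log_line("Cloud platform", "INFO: port 8080 opened")
print(hit, miss)
```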

S3, clustering analysis generates a first-level association rule:

This is a core step of the analysis system. As shown in STEP2 in Fig. 3, cluster analysis is performed on the application layer log, the virtual layer log and the physical layer log according to the keyword information defined in the cluster rules, forming the first-level association rules of the cluster labels (e.g., failure rate/address/message), i.e. the cluster rules;

Table 4 lists several examples of cluster rules:

TABLE 4

Cluster rules	Cluster rule description	Associated cluster	Number of occurrences
cRule1	Failure rate address assignment failure port	C1000,C0010,C0020	M times
cRule2	Failure rate message failure port	C1000,C0010,C0030	N times

Description of the cluster rules:

① cluster rule: the unique identifier of the rule, which is globally unique;

② cluster rule description: describes the function of the cluster and is auxiliary information;

③ associated clusters: the subordinate rules associated with the cluster. A cluster may be associated with several rules, separated by commas, and the table thereby describes the superior-subordinate relationship between rules. Because that relationship is not declared explicitly, mutual references must be ruled out: if rule A is associated with rule B and rule B is associated with rule A, expansion never terminates. To guard against this, the analysis system limits the number of layers of cluster rules it displays to at most N, where N is a positive integer (N ≥ 1). For example, with N = 5 the analysis system displays at most 5 layers of cluster rules, so that even a recursive reference has no catastrophic consequences.
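The layer limit can be sketched as a depth-bounded expansion of the associated clusters. The function and variable names are hypothetical; the mutually referencing rules A and B and the limit N = 5 follow the example in the text.

```python
from typing import Dict, List, Tuple


def expand_rule(rule: str,
                associated: Dict[str, List[str]],
                max_layers: int = 5) -> List[Tuple[int, str]]:
    """Expand a cluster rule into (layer, rule) pairs, depth first,
    never descending below max_layers layers."""
    if max_layers < 1:
        raise ValueError("max_layers must be a positive integer")
    out: List[Tuple[int, str]] = []

    def visit(name: str, layer: int) -> None:
        out.append((layer, name))
        if layer == max_layers:
            return  # layer limit reached: stop even if references continue
        for child in associated.get(name, []):
            visit(child, layer + 1)

    visit(rule, 1)
    return out


# Rule A and rule B reference each other; expansion still terminates.
cyclic = {"A": ["B"], "B": ["A"]}
layers = expand_rule("A", cyclic, max_layers=5)
print(layers)

# The cRule1 row of Table 4 expands to two layers.
table4 = {"cRule1": ["C1000", "C0010", "C0020"]}
print(expand_rule("cRule1", table4))
```

The limit bounds the output rather than detecting the cycle, which matches the text: a recursive reference is tolerated but cannot run away.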

Generally, the rules to be clustered are expected to form a tree, as shown in Fig. 4: each rule must belong to one or more levels, and a hierarchical clustering algorithm is run on the key attributes of the rule descriptions, with the cluster types and the number of clusters specified in advance.

In addition, clusters can be generated by nesting clusters, and analysis constraints can be set to prevent mutual references.

Through this clustering step, the raw data is converted into rule data that the analysis system can recognize.
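One possible reading of the hierarchical clustering described above is sketched here: rules are merged bottom-up by the similarity of their description keywords until a specified number of clusters remains. The Jaccard similarity, the single-linkage merge criterion, and all rule identifiers and keywords other than C1000 are assumptions for illustration, not the method fixed by the source.

```python
from itertools import combinations
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Similarity of two keyword sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_rules(keywords: Dict[str, Set[str]], n_clusters: int) -> List[Set[str]]:
    """Agglomerative (bottom-up) clustering with single linkage."""
    if n_clusters < 1:
        raise ValueError("n_clusters must be >= 1")
    clusters: List[Set[str]] = [{name} for name in sorted(keywords)]

    def similarity(c1: Set[str], c2: Set[str]) -> float:
        # Single linkage: similarity of the closest pair of members.
        return max(jaccard(keywords[x], keywords[y]) for x in c1 for y in c2)

    while len(clusters) > n_clusters:
        # Merge the most similar pair of clusters (i < j always holds).
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda p: similarity(clusters[p[0]], clusters[p[1]]))
        clusters[i] |= clusters[j]
        del clusters[j]
    return clusters


# Hypothetical description keywords per rule.
keywords = {
    "C1000": {"port", "exception"},
    "C1001": {"port", "timeout"},
    "C2000": {"cpu", "overload"},
    "C2001": {"cpu", "throttle"},
}
groups = cluster_rules(keywords, n_clusters=2)
print(groups)
```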

S4: frequent item set analysis generates a secondary association rule

Deep rule mining is performed, using the FP-tree frequent item set algorithm, on the object data sets of the cluster rules formed by the clustering in S3, generating secondary association rules (e.g., computing/CPU/memory/network/storage), i.e. frequent item set rules;

In this step, the analysis system applies the FP-tree frequent item set algorithm to the transaction data set of the cluster rules formed in S3 (see Fig. 4), i.e. the set of associated transactions in the cluster rules. The specific realization is as follows: the transaction data set is scanned twice, and the frequent items contained in each transaction are compressed and stored in the FP-tree in descending order of their support. When frequent patterns are searched for afterwards, the transaction data set does not need to be scanned again; only the FP-tree is searched. Frequent patterns are generated directly by calling the FP-Growth algorithm recursively, so no candidate patterns need to be generated at any point in the discovery process. Because the data set is scanned only twice and no candidates are generated, the FP-Growth algorithm avoids the repeated scans and candidate generation of the Apriori algorithm, and its execution efficiency is markedly better. Moreover, because this step builds on the cluster analysis, the accuracy and effectiveness of association rule mining are markedly improved, invalid association data is not mined, and mining efficiency improves further.
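The two scans described above can be sketched as follows. The transactions reuse the associated clusters of cRule1 and cRule2 from Table 4; the minimum support of 2 and the alphabetical tie-break are assumptions.

```python
from collections import Counter
from typing import Dict, List, Tuple


def two_scans(transactions: List[List[str]],
              min_support: int) -> Tuple[Dict[str, int], List[List[str]]]:
    # Scan 1: count the support of every item.
    support = Counter(item for t in transactions for item in set(t))
    frequent = {item: n for item, n in support.items() if n >= min_support}

    # Scan 2: drop infrequent items and order each transaction by
    # descending support (ties broken by name so the order is stable).
    ordered = [
        sorted((item for item in set(t) if item in frequent),
               key=lambda item: (-frequent[item], item))
        for t in transactions
    ]
    return frequent, ordered


transactions = [
    ["C1000", "C0010", "C0020"],  # associated clusters of cRule1
    ["C1000", "C0010", "C0030"],  # associated clusters of cRule2
]
frequent, ordered = two_scans(transactions, min_support=2)
print(frequent)
print(ordered)
```

The ordered transactions are what gets inserted into the FP-tree; because both share the same prefix, they compress into a single path.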

The specific method for generating the secondary association rule by frequent item set analysis comprises the following steps:

S12, building the FP-tree: scanning the item header table and the sorted log data set, and inserting each scanned entry under the cluster rule node to build the FP-tree; in practice, each insertion extends the FP-tree built so far, so when a new node appears during the scan, the corresponding item header table entry is linked to the new node, until all data have been inserted and the FP-tree is complete;

S13, mining the FP-tree: based on the FP-tree, the item header table and the node linked lists, mining upward in turn from the bottom item of the item header table, locating the FP-tree nodes that correspond to each item header table entry, i.e. finding the conditional pattern base, and mining recursively on the conditional pattern base to obtain the frequent item set rules;
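Steps S12 and S13 can be sketched as a compact FP-Growth implementation. This is an illustrative sketch, not the patented implementation: the class and function names are hypothetical, the item header table is represented as a dictionary of node lists, and the sample transactions reuse the associated clusters from Table 4 with an assumed minimum support of 2.

```python
from collections import Counter, defaultdict
from typing import Dict, FrozenSet, List, Optional, Tuple

Weighted = List[Tuple[List[str], int]]  # (items of a transaction, count)


class Node:
    def __init__(self, item: Optional[str], parent: Optional["Node"]) -> None:
        self.item = item
        self.parent = parent
        self.count = 0
        self.children: Dict[str, "Node"] = {}


def build_tree(weighted: Weighted, min_support: int):
    """S12: build the FP-tree; return (item header table, support counts)."""
    counts: Counter = Counter()
    for items, count in weighted:
        for item in set(items):
            counts[item] += count
    support = {i: c for i, c in counts.items() if c >= min_support}
    root = Node(None, None)
    header: Dict[str, List[Node]] = defaultdict(list)  # item -> linked nodes
    for items, count in weighted:
        ordered = sorted((i for i in set(items) if i in support),
                         key=lambda i: (-support[i], i))
        node = root
        for item in ordered:
            if item not in node.children:  # new node: link it from the header
                node.children[item] = Node(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += count
    return header, support


def fp_growth(weighted: Weighted, min_support: int,
              suffix: FrozenSet[str] = frozenset()) -> Dict[FrozenSet[str], int]:
    """S13: mine frequent item sets from the bottom of the header table up."""
    header, support = build_tree(weighted, min_support)
    result: Dict[FrozenSet[str], int] = {}
    bottom_up = sorted(support, key=lambda i: (-support[i], i), reverse=True)
    for item in bottom_up:
        pattern = suffix | {item}
        result[pattern] = support[item]
        # Conditional pattern base: the prefix path of every node of this item.
        base: Weighted = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent is not None and parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        result.update(fp_growth(base, min_support, pattern))
    return result


transactions = [["C1000", "C0010", "C0020"], ["C1000", "C0010", "C0030"]]
patterns = fp_growth([(t, 1) for t in transactions], min_support=2)
for itemset, count in sorted(patterns.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), count)
```

Each recursive call builds a conditional FP-tree from the conditional pattern base only, so the original transaction data set is never rescanned and no candidate item sets are enumerated.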

S5, rule storage: comparing the association rules produced by the frequent item set analysis, i.e. the frequent item set rules, with the association model rules in the expert database, finally identifying the valuable fault association rules and storing them persistently.

Before the frequent item set rules generated by the secondary association are added to the expert database, their accuracy needs to be verified. The specific method is: comparing the frequent item sets generated by the secondary association with the association rules provided by the expert database by similarity, listing the differences according to the comparison results, and confirming the rules manually; the rules that match successfully or are confirmed are stored in a database and serve in the next step as the rule base associated with the log data.
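The verification step can be sketched as a similarity comparison that either accepts a mined rule or lists its differences for manual confirmation. The source does not specify the similarity measure; Jaccard similarity, the threshold of 0.8, and the expert rule "E1" are assumptions for illustration.

```python
from typing import Dict, FrozenSet, List, Tuple


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def verify(mined: List[FrozenSet[str]],
           expert: Dict[str, FrozenSet[str]],
           threshold: float = 0.8):
    """Split mined rules into matched rules and rules needing manual review."""
    if not expert:
        raise ValueError("expert database contains no rules")
    matched: List[Tuple[FrozenSet[str], str]] = []
    to_confirm: List[Tuple[FrozenSet[str], str, Dict[str, set]]] = []
    for itemset in mined:
        # Closest expert rule by similarity.
        name, rule = max(expert.items(), key=lambda kv: jaccard(itemset, kv[1]))
        if jaccard(itemset, rule) >= threshold:
            matched.append((itemset, name))
        else:
            # List the differences so a person can confirm or reject the rule.
            diff = {"missing": set(rule - itemset), "extra": set(itemset - rule)}
            to_confirm.append((itemset, name, diff))
    return matched, to_confirm


expert = {"E1": frozenset({"C1000", "C0010"})}  # hypothetical expert rule
mined = [frozenset({"C1000", "C0010"}), frozenset({"C1000", "C0030"})]
matched, to_confirm = verify(mined, expert)
print(matched)
print(to_confirm)
```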

S6: fast matching of log data

According to the fault association rules formed in S5, the application layer log data is quickly associated with the virtual layer log data by means of the Drools rule engine, the Rete algorithm and the actual resource topology, and a root cause rule is obtained; for example, in Fig. 5, Root Rule1 is associated with the rules Rule16 and cRule1. The root cause of an abnormal fault is thereby detected from the logs.

As shown in Fig. 5, what matters most in the analysis result is the cause of the fault. Fast matching is performed by the Rete algorithm, a forward-chaining rule matching algorithm. Rete performs pattern matching by building a Rete network and exploits the temporal redundancy and structural similarity of rule-based systems to improve pattern matching efficiency; the problem finally matched is the root cause.

In a production system, the processed log data is called the log working memory, and each association rule used for the determination is divided into two parts, the LHS (left-hand side) and the RHS (right-hand side), which represent the premise and the conclusion respectively.

The main flow of the Rete algorithm is divided into the following steps:

① matching: finding the set of log working memory elements that satisfy the LHS;

② conflict resolution: selecting one rule whose conditions are satisfied;

③ executing: carrying out the content of the RHS;

④ returning to step ① and repeating the cycle.
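The four-step cycle can be sketched as a small forward-chaining loop. Note that this sketch re-matches every rule on each cycle; it does not build a Rete network, which would cache partial matches between cycles, and it does not use the Drools API. The rule contents are assumptions: the association of Root Rule1 with Rule16 and cRule1 follows Fig. 5, and the premise of cRule1 follows Table 4.

```python
from typing import FrozenSet, Iterable, List, NamedTuple, Set, Tuple


class Rule(NamedTuple):
    name: str
    lhs: FrozenSet[str]  # premise: facts that must all be in working memory
    rhs: str             # conclusion: fact asserted when the rule fires
    priority: int = 0


def run(rules: List[Rule], working_memory: Iterable[str]) -> Tuple[Set[str], List[str]]:
    memory = set(working_memory)
    fired: List[str] = []
    while True:
        # Step 1, matching: rules whose LHS is satisfied and that add a new fact.
        agenda = [r for r in rules if r.lhs <= memory and r.rhs not in memory]
        if not agenda:
            return memory, fired
        # Step 2, conflict resolution: highest priority first, then by name.
        chosen = min(agenda, key=lambda r: (-r.priority, r.name))
        # Step 3, executing: carry out the RHS by asserting its conclusion.
        memory.add(chosen.rhs)
        fired.append(chosen.name)
        # Step 4: loop back to matching.


rules = [
    Rule("RootRule1", frozenset({"Rule16", "cRule1"}), "RootRule1"),
    Rule("cRule1", frozenset({"C1000", "C0010", "C0020"}), "cRule1"),
]
memory, fired = run(rules, {"C1000", "C0010", "C0020", "Rule16"})
print(fired)
```

The loop terminates because every firing adds a fact that was not yet in working memory and the set of possible conclusions is finite.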

S7, fault warning: before a fault occurs, equipment faults are analyzed and early warnings are issued in real time, according to the requirements of real-time detection.

S8, fault tracing: after a fault occurs, historical log data is mined in reverse for the fault that has occurred (as reported by equipment monitoring), tracing the abnormal fault back to its source.

To implement the method, as shown in Fig. 6, the present invention further provides an embodiment of an apparatus for analyzing NFV virtualization device operation data, which includes:

a log acquisition module 1, used for acquiring logs generated by the virtual network equipment;

a data analysis module 2, used for cleaning the acquired log data, standardizing the log data, performing cluster analysis to generate first-level association rules, performing frequent item set analysis to generate secondary association rules, and identifying valuable fault association rules;

a fast matching module 3, used for quickly associating the application layer log data with the virtual layer log data by means of the Drools rule engine, the Rete algorithm and the actual resource topology according to the fault association rules, to obtain a root cause rule.

Further, the data analysis module 2 includes:

a data cleaning submodule 21, used for cleaning the acquired log data and standardizing the log data;

a model training submodule 22, used for performing cluster analysis on the log data to generate first-level association rules and performing frequent item set analysis to generate secondary association rules;

a verification submodule 23, used for verifying the accuracy of the frequent item set rules generated by the secondary association and identifying valuable fault association rules.

Furthermore, the analysis apparatus further comprises an early warning module 4, used for analyzing equipment faults and issuing early warnings in real time according to the requirements of real-time detection.

Further, the analysis apparatus further comprises a fault tracing module 5, used for mining historical log data in reverse for faults that have occurred, tracing the abnormal fault back to its source.

The embodiments described above are only preferred embodiments of the invention and do not exhaust its possible implementations. Obvious modifications made to the above by those of ordinary skill in the art do not depart from the spirit and scope of the present invention.

Claims

1. A method for analyzing NFV virtualization device operation data, characterized by comprising the following steps:

S1, log collection: obtaining logs generated by the virtual network device,

S6, fast matching of log data: according to the fault association rules, quickly associating the application layer log data with the virtual layer log data by means of the Drools rule engine, the Rete algorithm and the actual resource topology to obtain a root cause rule, so that the root cause of the abnormal fault is detected from the logs.

2. The method for analyzing the running data of the NFV virtualization device according to claim 1,

wherein, before a fault occurs, the method further comprises fault warning: analyzing equipment faults and issuing early warnings in real time according to the requirements of real-time detection.

3. The NFV virtualization device operation data analysis method according to claim 1 or 2,

wherein, after a fault occurs, the method further comprises: for faults that have occurred, mining historical log data in reverse, tracing the abnormal fault back to its source.

4. The method for analyzing the operational data of the NFV virtualization device as claimed in claim 1, wherein the logs in S1 include an application layer log, a virtual layer log and a physical layer log.

5. The method for analyzing the operational data of the NFV virtualization device according to claim 1, wherein in S3, the number of display layers of the cluster rules is further limited in the analysis system.

6. The method for analyzing the running data of the NFV virtualization device according to claim 1, wherein in S4, the specific method for generating the secondary association rule through frequent item set analysis includes:

7. An apparatus for analyzing operation data of an NFV virtualization device, comprising:

a log acquisition module, used for acquiring logs generated by the virtual network equipment; and a data analysis module, used for cleaning the acquired log data, standardizing the log data, performing cluster analysis to generate first-level association rules, performing frequent item set analysis to generate secondary association rules, and identifying valuable fault association rules;

8. The apparatus according to claim 7, wherein the data analysis module includes a data cleaning submodule, used for cleaning the acquired log data and standardizing the log data,

and a verification submodule, used for verifying the accuracy of the frequent item set rules generated by the secondary association and identifying valuable fault association rules.

9. The device for analyzing the operational data of the NFV virtualization apparatus according to claim 7, further comprising an early warning module, configured to perform real-time analysis and early warning on an apparatus fault according to a requirement of real-time detection.

10. The apparatus for analyzing operational data of an NFV virtualization device according to claim 7, further comprising a fault tracing module, used for mining historical log data in reverse for faults that have occurred, tracing the abnormal fault back to its source.