Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides an NFV virtualization device operation data analysis method and an NFV virtualization device operation data analysis device.
In order to achieve the purpose, the technical scheme adopted by the invention is as follows:
the first technical scheme is as follows:
an NFV virtualization device operation data analysis method comprises the following steps:
S1, log collection: obtaining logs generated by the virtual network device;
S2, log cleaning: the application layer log is calculated by adopting an expert library model and a cluster rule, and the virtual layer log and the physical layer log are defined by adopting a preprocessing model, so as to standardize the log data;
S3, cluster analysis generates a first-level association rule: performing cluster analysis on the application layer log, the virtual layer log and the physical layer log according to the keyword information defined in the cluster rule to form a first-level association rule of a cluster label;
S4, frequent item set analysis generates a secondary association rule: performing deep rule mining on the object data set in the cluster rule formed by clustering in S3 by adopting an FP-tree frequent item set algorithm, thereby generating a secondary association rule, namely a frequent item set rule;
S5, rule storage: comparing the frequent item set rule with the association model rules in the expert database, identifying the valuable fault association rules, and storing the valuable fault association rules in the expert database persistently;
S6, quick matching of log data: according to the fault association rule, the application layer log data and the virtual layer log data are quickly associated by using a drools rule engine, the rete algorithm and the actual resource topological relation, and a root cause rule is obtained, so that the root cause of the abnormal fault is detected through the logs.
Further, the method also comprises fault warning: before a fault occurs, equipment faults are analyzed and warned of in real time according to the requirement of real-time detection.
Further, the method also comprises fault tracking: after a fault occurs, historical log data is reversely mined for the fault that has already occurred, so as to trace the source of the abnormal fault.
Further, the logs described in S1 include an application layer log, a virtual layer log, and a physical layer log.
Further, in S3, the number of display layers of the cluster rule is also limited in the analysis system.
Further, in S4, the specific method for generating the secondary association rule through frequent item set analysis includes:
S11, establishing an item header table: scanning the transaction data set of the cluster rule formed in S3, finding the items whose support degree > the set threshold to form the item header table, and sorting them to obtain a sorted log data set;
S12, establishing the FP-Tree: scanning the item header table and the sorted log data set, and inserting all the scanned header table items and log data into cluster rule nodes to build an FP-Tree;
S13, mining the FP-Tree: based on the FP-Tree, the item header table and the node linked list, mining upwards in sequence from the bottom item of the item header table, finding the nodes of the FP-Tree that correspond to the header table item, namely finding the conditional pattern base, and mining recursively on the conditional pattern base to obtain the frequent item set rule.
The second technical scheme is as follows:
an NFV virtualization device operation data analysis device comprises:
a log acquisition module: used for acquiring logs generated by the virtual network equipment;
a data analysis module: used for cleaning the acquired log data, standardizing the log data, performing cluster analysis to generate a first-level association rule, performing frequent item set analysis to generate a secondary association rule, and identifying the valuable fault association rules;
a fast matching module: used for quickly associating the application layer log data with the virtual layer log data according to the fault association rule by using a drools rule engine, the rete algorithm and the actual resource topological relation, to obtain a root cause rule.
Further, the data analysis module comprises:
a data cleaning submodule: used for cleaning the acquired log data and standardizing the log data;
a model training submodule: used for performing cluster analysis on the log data to generate a first-level association rule, and performing frequent item set analysis to generate a secondary association rule;
a verification submodule: used for verifying the accuracy of the frequent item set rules generated by the secondary association and identifying the valuable fault association rules.
Further, the device also comprises an early warning module: used for analyzing and warning of equipment faults in real time according to the requirement of real-time detection.
Further, the device also comprises a fault tracking module: used for reversely mining historical log data for faults that have already occurred, so as to trace the source of the abnormal fault.
Compared with the prior art, the invention has the beneficial effects that:
1. The invention collects operation data in real time from NFV network element virtualization and cloud platform equipment, and first calculates and converts the collected operation data into the log information to be analyzed, thereby reducing the impact caused by the equipment recording and reporting a large number of logs.
2. In the invention, in the process of preprocessing the log data through cluster analysis, original data are gathered by a cluster analysis method for multidimensional data with a complex structure, so that the data with the complex structure is standardized.
3. In the invention, the dependency relationship among the data items is found, and the clustering analysis method is continuously adopted, so that the data items with close dependency relationship are removed or combined. Reliable data are provided for the next step of mining the log association rule; a frequent item set method is adopted in the secondary deep mining of the log association rule, so that the association rule among log data can be found, and the formation of a decision tree is further assisted; and comparing and analyzing the formed decision tree with an expert database to finally form an efficient and accurate association rule relation.
4. According to the invention, the matching rule engine associates and matches the log data quickly and accurately according to the association rule relation, so that the purpose of tracing the root cause of the fault is achieved.
5. By mining analysis and association processing of log data rules, the invention solves the common difficulty of fault location in a virtual scene, and brings the following benefits:
(1) the actual running condition of the current network equipment can be known in real time by analyzing the log data generated in real time in the equipment, so that the hidden trouble can be found in time, the suggestion of closed-loop measures can be taken in time, and the purposes of early warning and avoiding major problems are achieved.
(2) When the system fails, the fault can be rapidly delimited and located according to the association relation by analyzing the log data generated when the problem occurred; meanwhile, by tracing the problem of the network equipment back to its root and determining the range and possible causes of the fault, analysis by maintenance personnel is made easier and the fault condition of the network equipment can be mined and analyzed in greater depth.
(3) By means of an intelligent analysis processing algorithm and a machine learning method, the consumption of system processing resources by invalid log data and the missed processing of important log data are effectively avoided.
(4) Accurate problem location spares operation and maintenance personnel unnecessary trouble and keeps the actual fault handling efficiency from being affected. Because the root cause problem is quickly identified, invalid handling caused by frequent work-order dispatching across multiple systems is avoided, and the maintenance and operation cost of operators is effectively reduced.
(5) When a fault occurs across layers inside the NFV, or between the NFV and other equipment, manual maintenance requires multiple departments to be coordinated and multiple parties to cooperate in analysis, delimitation and location. The method and the device of the invention greatly reduce the communication cost and effectively improve the efficiency with which the departments handle problems.
Detailed Description
The technical solutions of the present invention will be described clearly and completely with reference to the accompanying drawings, and it should be understood that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the description of the present invention, it should be noted that the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", etc., indicate orientations or positional relationships based on those shown in the drawings, and are only for convenience of description and simplicity of description, and do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed in a particular orientation, and be operated, and thus, should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
In the description of the present invention, it should be noted that unless otherwise explicitly stated or limited, the terms "mounted," "connected," and "coupled" are to be construed broadly, e.g., as being fixedly connected, detachably connected, or integrally connected; they can be mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may denote internal communication between two elements. The specific meanings of the above terms in the present invention can be understood in specific cases by those skilled in the art.
The present invention will be described in further detail with reference to the accompanying drawings.
As shown in fig. 1 to 5, an embodiment of an operation data analysis method for an NFV virtualization device in the present invention includes:
S1, log collection: acquiring logs generated by virtual network equipment, wherein the logs comprise application layer (application server) logs, virtual layer (virtual machine/host) logs, physical layer (server/switch/router/firewall) logs and the like, and multiple log formats are supported;
The network element equipment generates log data in real time and reports it; after receiving the data, the acquisition device generates an internal statistical file and stores it in a directory, from which the cleaning component subsequently pulls the log file to clean the data. The specific storage method is as follows: the data generated by the service in real time is stored per service instance, each instance generates one file, and the file describes the actual operation data of that service instance. In practical application, in order to save storage space, the system compresses the data of a plurality of service instances into the same compressed packet according to the acquisition granularity, and parses each file by matching it against the format defined in the parsing model file, so as to obtain correct data.
Through the analysis of the step, the information such as the service instance, the module instance and the like of each file in the compressed packet can be obtained.
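As an illustration only, the parsing of a compressed packet described above might be sketched as follows. The ZIP container, the file naming convention `<service>_<module>_<timestamp>.log` and the function names are assumptions made for this sketch; the specification does not define the archive format or the parsing model file.

```python
import io
import re
import zipfile

# Hypothetical naming convention standing in for the parsing model file:
# <service instance>_<module instance>_<12-digit timestamp>.log
NAME_PATTERN = re.compile(r"(?P<service>[^_]+)_(?P<module>[^_]+)_(?P<ts>\d{12})\.log$")


def list_instances(archive_bytes):
    """Return (service instance, module instance, timestamp, text) per matching file."""
    records = []
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        for name in archive.namelist():
            match = NAME_PATTERN.search(name)
            if match is None:
                continue  # the file does not follow the assumed format, skip it
            text = archive.read(name).decode("utf-8")
            records.append((match["service"], match["module"], match["ts"], text))
    return records


def build_sample_archive():
    """Build an in-memory packet holding two service instances and one stray file."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("svc01_mod02_202401010015.log", "attempts=200 failures=12\n")
        archive.writestr("svc02_mod01_202401010015.log", "attempts=150 failures=3\n")
        archive.writestr("README.txt", "not a log file\n")
    return buffer.getvalue()


records = list_instances(build_sample_archive())
for service, module, ts, _ in records:
    print(service, module, ts)
```

Files that do not match the assumed format are skipped rather than rejected, so one malformed entry does not block the rest of the packet.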
S2, log cleaning: the application layer log is calculated by adopting an expert library model and a cluster rule, and the virtual layer log and the physical layer log are defined by adopting a preprocessing model, so as to standardize the log data;
Different logs are cleaned by different methods:
(1) the method for cleaning the application layer logs comprises the following specific steps:
S211, firstly, processing the logs by adopting an expert library model calculation formula to obtain a service model, wherein the service model comprises keywords and rule data;
the expert database model is a standard model established in advance, and comprises a rule name, a rule description, rule data, a summary example and the parameter characteristics of a file label;
the result of the preprocessing calculation is a type of data calculated according to the expert database model (see STEP0 in fig. 3); several examples of the expert database model calculation formula are given in table 1:
TABLE 1
What is obtained through the expert database model calculation formula is rule data, not conditions; the calculated data is cleaned out of the original data through all the rules in the expert database model;
S212, generating the rules into an action table, and triggering an action when the calculated rule data meets the condition set for that action; the rule data is then used as a rule, i.e. a cluster rule is obtained.
Table 2 shows examples of several trigger conditions in the application layer,
TABLE 2
In the triggering condition of the cluster Rule, the "${N}" symbol represents the Rule value of a certain granularity, such as STEP1 in fig. 3, where N = 0 indicates the Rule value of the current granularity, N = 1 indicates the Rule value of the previous granularity, and so on;
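A minimal sketch of how a trigger condition written with the "${N}" notation could be evaluated is given here. The condition syntax (comparisons joined implicitly by "and"), the sample thresholds and the function name `condition_met` are assumptions for illustration; the actual trigger conditions are those of table 2.

```python
import re

# One comparison term of the assumed syntax, e.g. "${1} >= 0.3"
TERM = re.compile(r"\$\{(\d+)\}\s*(>=|<=|>|<)\s*(\d+(?:\.\d+)?)")
OPS = {
    ">": lambda a, b: a > b,
    "<": lambda a, b: a < b,
    ">=": lambda a, b: a >= b,
    "<=": lambda a, b: a <= b,
}


def condition_met(condition, history):
    """Evaluate a condition against rule values ordered oldest to newest.

    history[-1] is the current granularity (${0}), history[-2] the previous
    granularity (${1}), and so on. All terms must hold (implicit AND).
    """
    terms = TERM.findall(condition)
    if not terms:
        raise ValueError("no ${N} comparison found in condition")
    for offset, op, threshold in terms:
        index = len(history) - 1 - int(offset)
        if index < 0:
            return False  # not enough granularities collected yet
        if not OPS[op](history[index], float(threshold)):
            return False
    return True


# Hypothetical condition: failure rate above 0.3 for two consecutive granularities
condition = "${0} > 0.3 and ${1} > 0.3"
print(condition_met(condition, [0.1, 0.4, 0.5]))
print(condition_met(condition, [0.4, 0.2, 0.5]))
```

Returning false when a referenced granularity has not been collected yet keeps the rule from firing on an incomplete history.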
The action may be defined as:
① triggering a problem: when a certain condition is met, the associated problem is triggered; this function is mainly used for triggering the analysis system to analyze the problem automatically;
② generating a rule: when the condition is met, the rule data is used as a Rule; such a Rule is the same as a Rule generated from a common log, except that the rule data can be used as a Rule only after the Rule has been explicitly generated.
The continuous/failure rate/decline and the like in the log description are keyword information filled in the cluster cleaning process;
(2) the method for cleaning the virtual layer log and the physical layer log comprises the following specific steps:
extracting cluster rules from key information of running log information of the virtual layer log and the physical layer log through a preprocessing model; the preprocessing model is a regular expression matching mode;
Table 3 lists one example of the regular expression matching mode:
TABLE 3
| Object | Trigger condition | Action | Description | Cluster rule description |
| Cloud platform | Log keywords: port, exception | Generating cluster rules | Port anomaly | C1000 |
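The regular expression matching of table 3 might be sketched as follows. The identifier C1000 and the keywords "port" and "exception" come from the table; the concrete pattern, the sample log lines and the function name are assumptions for illustration.

```python
import re

# (cluster rule, description, pattern). The pattern itself is an assumed
# rendering of the "port, exception" keywords in table 3.
PREPROCESSING_MODEL = [
    ("C1000", "Port anomaly", re.compile(r"\bport\b.*\bexception\b", re.IGNORECASE)),
]


def extract_cluster_rules(log_lines):
    """Return (cluster rule, description, line) for every line matching a pattern."""
    hits = []
    for line in log_lines:
        for rule_id, description, pattern in PREPROCESSING_MODEL:
            if pattern.search(line):
                hits.append((rule_id, description, line))
    return hits


# Hypothetical virtual layer log lines
sample = [
    "2024-01-01 00:15:02 vm-07 Port eth1 exception: link down",
    "2024-01-01 00:15:05 vm-07 heartbeat ok",
]
hits = extract_cluster_rules(sample)
for rule_id, description, line in hits:
    print(rule_id, description, "<-", line)
```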
S3, clustering analysis generates a first-level association rule:
This is a core step of the analysis system. As shown in STEP2 in fig. 3, cluster analysis is performed on the application layer log, the virtual layer log and the physical layer log according to the keyword information defined in the cluster rule, to form a first-level association rule of the cluster label (e.g., failure rate/address/message, etc.), i.e., a cluster rule;
table 4 lists several examples of cluster rules;
TABLE 4
| Cluster rules | Cluster rule description | Associated cluster | Number of occurrences |
| cRule1 | Failure rate address assignment failure port | C1000,C0010,C0020 | M times |
| cRule2 | Failure rate message failure port | C1000,C0010,C0030 | N times |
Description of the Cluster rules:
① cluster rule: the unique identifier of the rule, which is globally unique;
② cluster rule description: describes the function of the cluster, and is auxiliary information;
③ associated clusters: the subordinate rules associated with the cluster; a plurality of rules may be associated, separated by commas. The table describes the superior-subordinate relationship between rules. Because no superior-subordinate relationship between rules is explicitly enforced, it is necessary to ensure that no mutual-reference scenario exists, for example rule A being associated with rule B while rule B is associated with rule A, which would recurse endlessly. In order to avoid mutual reference in the association process of cluster rules, the number of display layers of the cluster rules in the analysis system is limited to be less than or equal to N, where N is a positive integer, i.e. N ≥ 1. For example, with N = 5 the analysis system displays at most 5 layers of cluster rules, so that even recursion does not have catastrophic consequences.
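The effect of the display layer limit can be sketched as follows: with N = 5, two rules that reference each other are expanded to at most five layers and the traversal stops. The dictionary representation of the associations and the function name `expand` are assumptions for this sketch.

```python
def expand(rule_id, associations, max_layers=5):
    """Expand the associated clusters of a rule, displaying at most max_layers layers.

    associations maps a rule to the list of its subordinate rules.
    Returns a list of (layer, rule) pairs in display order.
    """
    layers = []

    def walk(current, layer):
        layers.append((layer, current))
        if layer >= max_layers:
            return  # display limit reached, do not descend further
        for child in associations.get(current, []):
            walk(child, layer + 1)

    walk(rule_id, 1)
    return layers


# Mutual reference: rule A is associated with rule B and rule B with rule A
associations = {"A": ["B"], "B": ["A"]}
result = expand("A", associations, max_layers=5)
print(result)
```

The limit bounds the recursion but still displays the repeated rules; an explicit check for rules already on the current path could be added where repeats are unwanted.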
Generally, the rules that we want to cluster are tree-shaped, as shown in fig. 4, one rule must be attributed to one or more levels, and the hierarchical clustering algorithm is performed according to the description key attributes of the rule by specifying the cluster type and number of the clusters to be clustered.
Besides, clusters are generated by nesting of clusters, and some analysis constraints can be set for effective limitation in order to avoid mutual reference.
The raw data can be converted into regular data that can be recognized by the analysis system by this clustering step.
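As a deliberately simplified stand-in for the clustering step (it is not the hierarchical clustering algorithm itself), the sketch below groups the cluster labels observed in the same acquisition granularity and counts how often each combination occurs, which yields rows shaped like those of table 4. The granularity keys, the sample data and the function name are assumptions.

```python
from collections import Counter, defaultdict


def first_level_rules(hits):
    """Count each combination of cluster labels seen within one granularity.

    hits is an iterable of (granularity, cluster label) pairs.
    Returns a Counter mapping a sorted tuple of labels to its number of occurrences.
    """
    by_granularity = defaultdict(set)
    for granularity, cluster_id in hits:
        by_granularity[granularity].add(cluster_id)
    return Counter(tuple(sorted(ids)) for ids in by_granularity.values())


# Hypothetical cleaned log hits: (acquisition granularity, cluster label)
hits = [
    ("00:15", "C1000"), ("00:15", "C0010"), ("00:15", "C0020"),
    ("00:30", "C1000"), ("00:30", "C0010"), ("00:30", "C0030"),
    ("00:45", "C1000"), ("00:45", "C0010"), ("00:45", "C0020"),
]
rules = first_level_rules(hits)
for clusters, occurrences in rules.items():
    print(",".join(clusters), occurrences)
```

Each distinct combination plays the role of one cluster rule with its associated clusters and number of occurrences, and the per-granularity label sets can serve as the transaction data set for the next step.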
S4, frequent item set analysis generates a secondary association rule:
Performing deep rule mining on an object data set in a cluster rule formed by clustering in S3 by adopting an FP-tree frequent item set algorithm to generate a secondary association rule (such as calculation/CPU/memory/network/storage), namely a frequent item set rule;
This step mines the transaction data set of the cluster rule formed in S3 (see fig. 4), i.e., the set of associated transactions in the cluster rule, by the FP-tree frequent item set algorithm in the analysis system. The specific realization is as follows: the transaction data set is scanned twice, and the frequent items contained in each transaction are compressed and stored in the FP-Tree in descending order of their support degree. In the subsequent process of finding frequent patterns, the transaction data set does not need to be scanned again; only the FP-Tree is searched. Frequent patterns are generated directly by recursively calling the FP-Growth algorithm, so no candidate patterns need to be generated in the whole discovery process. Because the data set is scanned only twice and no candidate item sets are generated, the FP-Growth algorithm avoids the repeated data set scans and candidate generation of the Apriori algorithm, and its execution efficiency is markedly better than that of the Apriori algorithm. Meanwhile, because this step builds on the cluster analysis, the accuracy and effectiveness of association rule mining are improved, the mining of invalid association data is avoided, and the mining efficiency is further improved.
The specific method for generating the secondary association rule by analyzing the frequent item set comprises the following steps:
S11, establishing an item header table: scanning the transaction data set of the cluster rule formed in S3, finding the items whose support degree > the set threshold to form the item header table, and sorting them to obtain a sorted log data set;
S12, establishing the FP-Tree: scanning the item header table and the sorted log data set, and inserting all the scanned header table items and log data into cluster rule nodes to build an FP-Tree; in practical operation, each insertion builds on the FP-Tree established so far, so when a new node appears while scanning the item header table and the sorted log data set, the corresponding entry of the item header table is linked to the new node, until all data are inserted and the FP-Tree is complete;
S13, mining the FP-Tree: based on the FP-Tree, the item header table and the node linked list, mining upwards in sequence from the bottom item of the item header table, finding the nodes of the FP-Tree that correspond to the header table item, namely finding the conditional pattern base, and mining recursively on the conditional pattern base to obtain the frequent item set rule;
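Steps S11 to S13 can be sketched in compact form as follows. This is a minimal FP-Growth sketch under stated assumptions, not the implementation of the analysis system: transactions are lists of cluster labels, the threshold and sample data are hypothetical, and the names `Node`, `build_tree` and `fp_growth` are chosen for illustration.

```python
from collections import Counter, defaultdict


class Node:
    """One FP-Tree node: an item, a count and links to parent and children."""

    def __init__(self, item, parent):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}


def build_tree(transactions, threshold):
    """S11 and S12: build the item header table and the FP-Tree in two scans.

    transactions is a list of (items, count) pairs.
    """
    support = Counter()
    for items, count in transactions:          # first scan: support of each item
        for item in set(items):
            support[item] += count
    frequent = {i: c for i, c in support.items() if c > threshold}
    root = Node(None, None)
    header = defaultdict(list)                 # item -> node linked list
    for items, count in transactions:          # second scan: insert sorted items
        ordered = sorted((i for i in set(items) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)     # link header entry to the new node
            child.count += count
            node = child
    return header, frequent


def fp_growth(transactions, threshold, suffix=()):
    """S13: mine upwards from the bottom item of the header table, recursively."""
    header, frequent = build_tree(transactions, threshold)
    results = {}
    for item in sorted(frequent, key=lambda i: (frequent[i], i)):
        results[tuple(sorted(suffix + (item,)))] = frequent[item]
        base = []                              # conditional pattern base of the item
        for node in header[item]:
            path, parent = [], node.parent
            while parent is not None and parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        results.update(fp_growth(base, threshold, suffix + (item,)))
    return results


# Hypothetical transaction data set: cluster labels that occurred together
raw = [
    ["C1000", "C0010", "C0020"],
    ["C1000", "C0010", "C0030"],
    ["C1000", "C0010", "C0020"],
    ["C1000", "C0040"],
]
itemsets = fp_growth([(t, 1) for t in raw], threshold=1)
for itemset, support in sorted(itemsets.items()):
    print(itemset, support)
```

Items are kept only when their support is strictly greater than the threshold, following the "support degree > set threshold" wording of S11.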
S5, rule storage: comparing the association rule result of the frequent item set analysis, namely the frequent item set rule, with the association model rules in the expert database, finally identifying the valuable fault association rules, and storing them persistently.
Before the frequent item set rules generated by secondary association are collected into an expert database, the accuracy of the frequent item set rules generated by secondary association needs to be verified, and the specific method comprises the following steps: comparing the frequent item set generated by secondary association with association rules provided by an expert database according to similarity, listing differences according to comparison results, and manually confirming the rules; and storing the successfully matched or confirmed rules in a database, and providing the next step of use as a rule base associated with the log data.
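The verification described above might be sketched as follows. The specification does not name a similarity measure; Jaccard similarity over the item sets, the exact-match threshold and the function names are assumptions for this sketch.

```python
def jaccard(a, b):
    """Jaccard similarity of two item sets (assumed measure, not from the source)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0


def verify(mined_rules, expert_rules, threshold=1.0):
    """Split mined rules into matched rules and rules needing manual confirmation.

    For each unmatched rule, the closest expert rule and the differing items
    are listed so that a person can confirm or reject the rule.
    """
    matched, to_confirm = [], []
    for rule in mined_rules:
        best = max(expert_rules, key=lambda e: jaccard(rule, e), default=None)
        score = jaccard(rule, best) if best is not None else 0.0
        if score >= threshold:
            matched.append(rule)
        else:
            difference = sorted(set(rule) ^ set(best or ()))
            to_confirm.append((rule, best, difference))
    return matched, to_confirm


# Hypothetical mined frequent item sets and expert database rules
mined = [("C0010", "C1000"), ("C0010", "C0020", "C1000")]
expert = [("C1000", "C0010"), ("C0030",)]
matched, to_confirm = verify(mined, expert)
print("matched:", matched)
print("needs confirmation:", to_confirm)
```

Only the matched rules, plus those a person confirms from the second list, would then be stored in the rule base.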
S6, quick matching of log data:
According to the fault association Rule formed in S5, fast associating application layer log data with data of a virtual layer log by using a drools Rule engine and a rete algorithm and an actual resource topological relation, and obtaining a Root Rule, for example, in fig. 5, a Root Rule1 is associated with rules Rule16 and cRule1, so that a Root cause of an abnormal fault is detected from the log.
As shown in fig. 5, what matters most in the analysis result is the cause of the fault. Rapid matching is performed through the RETE algorithm, a fast forward-chaining rule matching algorithm. The RETE algorithm performs pattern matching by building a RETE network and exploits the temporal redundancy and structural similarity of rule-based systems to improve pattern matching efficiency; the problem that finally takes effect is the root cause.
In the production system, the processed log data is called the log working memory, and each association rule used for judgment is divided into two parts, LHS (left-hand side) and RHS (right-hand side), which represent the premise and the conclusion respectively.
The main flow of the RETE algorithm is divided into the following steps:
① matching: finding the set of log working memory elements that satisfy the LHS part;
② conflict resolution: selecting one rule whose conditions are satisfied;
③ executing: executing the content of the RHS;
④ returning to step ① and repeating the cycle.
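The match, conflict resolution and execute cycle listed above can be illustrated with a naive forward-chaining loop. This sketch re-evaluates every rule on each pass and therefore does not build the RETE network or use the drools API; it only shows the cycle. The rule names Rule16, cRule1 and Root Rule1 follow fig. 5, while the fact representation, the priorities and the extra rule are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    lhs: frozenset      # premise: facts that must all be in working memory
    rhs: str            # conclusion: fact asserted when the rule fires
    priority: int = 0   # assumed tie-breaker for conflict resolution


def run(rules, working_memory):
    """Repeat match, conflict resolution and execution until no rule applies."""
    memory = set(working_memory)
    fired = []
    while True:
        # 1. matching: rules whose LHS is satisfied and whose RHS is still new
        agenda = [r for r in rules if r.lhs <= memory and r.rhs not in memory]
        if not agenda:
            return memory, fired
        # 2. conflict resolution: pick one satisfied rule
        chosen = max(agenda, key=lambda r: r.priority)
        # 3. executing: assert the conclusion of the chosen rule
        memory.add(chosen.rhs)
        fired.append(chosen.name)
        # 4. loop back to matching


rules = [
    Rule("Root Rule1", frozenset({"Rule16", "cRule1"}), "root cause: Root Rule1", 10),
    Rule("Unrelated", frozenset({"cRule2"}), "root cause: other", 1),
]
# Hypothetical working memory: rules already matched from application layer
# logs (Rule16) and virtual layer logs (cRule1)
memory, fired = run(rules, {"Rule16", "cRule1"})
print(fired)
```

The loop terminates because every firing adds a fact that was not yet in working memory and the number of rules is finite.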
S7, fault warning: and before the fault occurs, analyzing and early warning the equipment fault in real time according to the requirement of real-time detection.
S8, fault tracing: after the fault occurs, historical log data is reversely mined for the fault (from equipment monitoring) which has occurred, and the source tracing of the abnormal fault problem is formed.
In order to implement the method, as shown in fig. 6, the present invention further shows an embodiment of an apparatus for analyzing NFV virtualization device operation data, which includes:
the log acquisition module 1: used for acquiring logs generated by the virtual network equipment;
the data analysis module 2: used for cleaning the acquired log data, standardizing the log data, performing cluster analysis to generate a first-level association rule, performing frequent item set analysis to generate a secondary association rule, and identifying the valuable fault association rules;
the fast matching module 3: used for quickly associating the application layer log data with the virtual layer log data according to the fault association rule by using a drools rule engine, the rete algorithm and the actual resource topological relation, to obtain a root cause rule.
Further, the data analysis module 2 includes:
a data cleaning submodule 21: used for cleaning the acquired log data and standardizing the log data;
a model training submodule 22: used for performing cluster analysis on the log data to generate a first-level association rule, and performing frequent item set analysis to generate a secondary association rule;
a verification submodule 23: used for verifying the accuracy of the frequent item set rules generated by the secondary association and identifying the valuable fault association rules.
Furthermore, the analysis device further comprises an early warning module 4 for performing real-time analysis and early warning on the equipment fault according to the requirement of real-time detection.
Further, the analysis device further comprises a fault tracking device 5: the method is used for reversely mining historical log data for faults which occur, and forming a source tracing for abnormal fault problems.
The embodiments described above are only preferred embodiments of the invention and are not exhaustive of the possible implementations of the invention. Any obvious modifications to the above would be obvious to those of ordinary skill in the art, but would not bring the invention so modified beyond the spirit and scope of the present invention.