Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth such as the particular system architecture, techniques, etc., in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It should be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
As used in the present description and the appended claims, the term "if" may be interpreted as "when" or "once" or "in response to a determination" or "in response to detection", depending on the context. Similarly, the phrase "if it is determined" or "if a [described condition or event] is detected" may be interpreted, depending on the context, as meaning "upon determination" or "in response to determination" or "upon detection of the [described condition or event]" or "in response to detection of the [described condition or event]".
Furthermore, the terms "first," "second," "third," and the like in the description of the present specification and in the appended claims, are used for distinguishing between descriptions and not necessarily for indicating or implying a relative importance.
Reference in the specification to "one embodiment" or "some embodiments" or the like means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," and the like in the specification are not necessarily all referring to the same embodiment, but mean "one or more but not all embodiments" unless expressly specified otherwise. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless expressly specified otherwise.
In the embodiments of the present application, the execution body of the flow is a terminal device. The terminal device includes, but is not limited to, a server, a computer, a smart phone, a tablet computer, or any other device capable of executing the method disclosed by the application. Fig. 1 shows a flow chart of a method for processing network packet loss in a cloud available area according to a first embodiment of the present application, which is described in detail below:
S101, respectively deploying a plurality of detection nodes on the available area network on each cloud, and acquiring multi-dimensional network packet loss index data of the available area network on each cloud by means of the detection nodes.
In this embodiment, in order to monitor the multi-dimensional network packet loss index data of the available area network on each cloud, a network monitoring system based on detection nodes is designed and deployed. By deploying a plurality of detection nodes in each available area on the cloud, the system collects and analyzes key performance indexes such as network packet loss, delay and jitter in real time, thereby providing data support for network optimization and fault detection.
Specifically, according to the network architecture and scale of the available area on the cloud, appropriate servers or virtual machines are selected as detection nodes. These nodes should have stable network connections and sufficient computing resources. Detection nodes are deployed near the critical network nodes of each available area on the cloud, so that the network conditions of the whole available area can be covered. Network monitoring tools such as ping, traceroute and iperf, together with customized network monitoring scripts, are installed on the detection nodes to send and receive network data packets and record the related index data.
The detection nodes are configured to periodically send ICMP packets, TCP/UDP data packets and the like to the target server and its intermediate routing nodes, and to collect multi-dimensional network performance index data including packet loss rate, delay, jitter, estimated available bandwidth and routing hop count. The collected network data are analyzed in real time by using cloud network analysis techniques to identify abnormal conditions such as network congestion, high delay and packet loss. The analysis results are displayed through visualization tools in the form of charts, trend graphs and the like, helping operation and maintenance staff to understand network conditions intuitively. According to the analysis results, network fault points such as network equipment faults and line faults are located rapidly. According to the identified network performance bottlenecks, optimization suggestions are provided, such as adjusting the network topology, increasing the bandwidth or optimizing the application program.
In this embodiment, key indexes such as network packet loss are monitored and analyzed in real time, so that network problems are found and solved in time and the stability and reliability of the network are improved. According to the monitoring data, the network can be managed and optimized at a fine granularity, for example by adjusting network bandwidth or optimizing routing strategies, so that the overall performance of the network is improved.
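The index collection described above can be illustrated with a minimal sketch. It assumes that one detection round yields a list of round-trip times in which a lost probe is recorded as None; the function name summarize_probe and the jitter definition (mean absolute difference between consecutive received round-trip times) are assumptions made for illustration and are not specified by the application.

```python
from statistics import mean

def summarize_probe(rtts_ms):
    """Summarize one detection round.

    rtts_ms: round-trip times in milliseconds, with None for probes
    that received no reply (counted as lost).
    """
    sent = len(rtts_ms)
    received = [r for r in rtts_ms if r is not None]
    loss_rate = (sent - len(received)) / sent if sent else 0.0
    delay = mean(received) if received else None
    # Assumed jitter definition: mean absolute difference between
    # consecutive received RTTs (one common choice among several).
    diffs = [abs(b - a) for a, b in zip(received, received[1:])]
    jitter = mean(diffs) if diffs else 0.0
    return {"loss_rate": loss_rate, "delay_ms": delay, "jitter_ms": jitter}

sample = [10.0, 12.0, None, 11.0, None]
print(summarize_probe(sample))
```

In a real deployment the round-trip times would come from tools such as ping or from customized probing scripts running on the detection nodes.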
In some embodiments, in step S101, the acquiring, by means of the detection nodes, of the network performance data of the available area network on each cloud further includes:
acquiring topology structure information and path dynamic change information of the available area network on each cloud, and determining an initial detection path and detection frequency parameters according to the topology structure information and the path dynamic change information;
performing modeling analysis on historical detection data of the available area network on each cloud by adopting an ARIMA time series analysis algorithm to obtain packet loss probability distributions under different paths and different frequencies;
and constructing an integer programming detection strategy optimization model, with minimizing the overall packet loss monitoring blind area as the optimization target, according to the initial detection path, the detection frequency parameters and the packet loss probability distributions under different paths and different frequencies, wherein the integer programming detection strategy optimization model is used for determining the optimal detection strategy of the available area network on each cloud.
In this embodiment, network topology information of the available area on the cloud is acquired, including the network nodes, the links and the connection relationships between them, and a network topology graph model is constructed. Dynamic change information of each path in the available area network on the cloud is monitored, including changes in performance indexes of the path such as delay, packet loss rate and bandwidth, and the path dynamic change data are recorded. According to the network topology structure information and the path dynamic change information, an initial network detection path covering the critical paths and nodes is automatically planned and generated by adopting a graph theory algorithm and a machine learning algorithm. By analyzing the statistical characteristics of the path dynamic change data, the paths are grouped by using a clustering algorithm to obtain a stable path group and an unstable path group. The detection frequency parameters are set adaptively for the different path groups: the stable path group is detected at a low frequency and the unstable path group at a high frequency, so that the detection efficiency is dynamically optimized.
Historical network detection data of the available area on the cloud are acquired, and the data are classified and sorted by network path and detection frequency. The classified historical detection data are modeled and analyzed by adopting an ARIMA time series analysis algorithm, and the packet loss probability distributions under different network paths and detection frequencies are obtained from the modeling results. If the packet loss probability of a certain network path or detection frequency exceeds a preset threshold, it is determined that the path or frequency carries a risk of network abnormality. For a path or frequency with a risk of network abnormality, the packet loss probability in a future period is predicted by adopting an exponential smoothing method.
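The final prediction step can be sketched as follows. This is only an illustration of simple exponential smoothing applied to a series of observed packet loss probabilities for one path; the ARIMA modeling stage is not reproduced here, and the smoothing factor alpha and the risk threshold are assumed values rather than parameters given by the application.

```python
def exp_smooth_forecast(series, alpha=0.5):
    """Simple exponential smoothing; returns the one-step-ahead forecast."""
    if not series:
        raise ValueError("series must not be empty")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]
    for x in series[1:]:
        # New level = weighted mix of the latest observation and the old level.
        level = alpha * x + (1 - alpha) * level
    return level

def at_risk(series, threshold, alpha=0.5):
    """Flag a path whose forecast loss probability exceeds the threshold."""
    return exp_smooth_forecast(series, alpha) > threshold

history = [0.0, 0.5, 1.0]  # hypothetical per-interval loss probabilities
print(exp_smooth_forecast(history), at_risk(history, threshold=0.6))
```

A larger alpha makes the forecast follow recent observations more closely, which may suit unstable paths, while a smaller alpha damps short spikes.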
The initial detection paths, the detection frequency parameters and the packet loss probability distribution data under different paths and frequencies are acquired and used as inputs for constructing the integer programming detection strategy optimization model. According to the acquired data, the detection strategy optimization model is constructed by adopting an integer programming algorithm, and the objective function is set to minimize the overall packet loss monitoring blind area. In the constructed model, the decision variables are the detection paths and the detection frequencies of the available area network on each cloud. By solving the model, the optimal combination of detection paths and detection frequencies of the available area network on each cloud, namely the optimal detection strategy, is obtained. If the solved optimal detection strategy meets the goal of minimizing the overall packet loss monitoring blind area, the strategy is applied to the packet loss monitoring of the available area network on each cloud. After the optimal detection strategy is applied, the packet loss probability distribution data of the available area network on each cloud continue to be acquired, and the integer programming detection strategy optimization model is dynamically updated. The optimal detection strategy is periodically re-solved from the updated model and applied to the available area network on the cloud, so as to adapt to changes in network conditions and keep the overall packet loss monitoring blind area at a minimum.
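A toy version of this optimization is sketched below. Exhaustive enumeration stands in for an integer programming solver, which is practical only for a handful of paths. The blind-area measure (a path detected f times per interval leaves a 1/(1+f) share of its loss probability unobserved), the detection budget and all names are hypothetical choices made for illustration; the application does not define them.

```python
from itertools import product

def best_strategy(loss_prob, freq_options, budget):
    """Enumerate integer frequency assignments and return the one with
    the smallest blind-area value that fits the detection budget.

    loss_prob: {path: estimated packet loss probability}
    freq_options: allowed detections per interval for a path, e.g. (0, 1, 2)
    budget: maximum total detections per interval
    """
    paths = sorted(loss_prob)
    best = None
    for freqs in product(freq_options, repeat=len(paths)):
        if sum(freqs) > budget:
            continue  # violates the resource constraint
        # Hypothetical blind-area measure, see the lead-in above.
        blind = sum(loss_prob[p] / (1 + f) for p, f in zip(paths, freqs))
        if best is None or blind < best[0]:
            best = (blind, dict(zip(paths, freqs)))
    return best

blind, strategy = best_strategy({"a": 0.2, "b": 0.05, "c": 0.1}, (0, 1, 2), budget=3)
print(strategy, round(blind, 4))
```

With these inputs the budget is concentrated on the paths with the highest loss probability. For realistic network sizes a dedicated integer programming solver would replace the enumeration.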
For example, in order to obtain the topology structure information and path dynamic change information of the available area network on each cloud, a network topology discovery tool such as Nmap may be used to scan the target network and obtain the device information, link information, routing information and the like in the network. At the same time, network traffic changes are continuously monitored, and a machine learning algorithm such as a support vector machine (SVM) is used to learn and predict the rules of path change, so that network topology dynamics are tracked in real time. Based on this information, an initial detection path can be constructed by a minimum spanning tree algorithm, so that the whole network is covered with the shortest total path length, and the initial detection frequency is set to once every 5 minutes according to factors such as network bandwidth and equipment load. After enough historical detection data have been obtained, the packet loss rate of each path is modeled by using the ARIMA time series analysis algorithm, and the packet loss probability distributions under different paths and frequencies are obtained through parameter estimation. Finally, an integer programming model is constructed with comprehensive path coverage and the lowest total packet loss rate as targets, and is solved to select the optimal set of detection paths and the detection frequency of each path, so that the packet loss monitoring blind area of the whole network is minimized under limited monitoring resources. With this technical scheme, the detection strategy can be optimized adaptively according to the dynamic characteristics of the network on the cloud, so that monitoring quality is ensured while cost is reduced and the reliability of the cloud network is improved.
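The minimum spanning tree step can be sketched with Kruskal's algorithm. The node names and link costs below are hypothetical, and the choice of Kruskal's algorithm is an assumption, since the application names only a minimum spanning tree algorithm in general.

```python
def minimum_spanning_tree(nodes, links):
    """Kruskal's algorithm.

    nodes: iterable of node identifiers
    links: iterable of (cost, u, v) tuples for undirected links
    Returns the selected links as (u, v, cost) tuples.
    """
    parent = {n: n for n in nodes}

    def find(n):
        # Union-find root lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    tree = []
    for cost, u, v in sorted(links):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # the link joins two separate components
            parent[root_u] = root_v
            tree.append((u, v, cost))
    return tree

links = [(1, "a", "b"), (4, "a", "c"), (2, "b", "c"), (3, "c", "d"), (5, "b", "d")]
tree = minimum_spanning_tree("abcd", links)
print(tree, sum(cost for _, _, cost in tree))
```

The resulting tree reaches every node using the links with the smallest total cost, which corresponds to an initial detection path that covers the whole network.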
S102, adopting a weighted fusion algorithm to respectively process multidimensional network packet loss index data of all detection nodes of the available area network on each cloud to obtain network packet loss comprehensive scores of the available area network on each cloud.
In this embodiment, in order to evaluate the network quality of the available area network on each cloud more accurately, especially its network packet loss condition, a network packet loss comprehensive scoring system based on a weighted fusion algorithm is designed and implemented. The system performs weighted fusion processing on the multi-dimensional network packet loss index data of all detection nodes in the available area network on each cloud, and finally obtains a comprehensive score that quantifies the network packet loss condition.
Specifically, the weight allocation principle of the weighted fusion algorithm is determined first; weights can be allocated according to factors such as the position, importance and historical performance of the detection nodes. For example, a detection node located on a critical path of the network may be given a higher weight. For each available area network on the cloud, the multi-dimensional network packet loss index data of all detection nodes in that network are taken as input. The data of each detection node are weighted according to the weight allocation principle, the weighted data are fused, and the network packet loss comprehensive score of the available area network on the cloud is calculated. The score may be expressed on a hundred-point scale or in another form to facilitate understanding and comparison. The calculated network packet loss comprehensive scores are displayed in visual form to the relevant personnel, such as operation and maintenance personnel and management personnel. Detailed scoring reports and data analysis are provided to help the relevant personnel understand network conditions and formulate corresponding optimization measures.
In this embodiment, the multi-dimensional data of the plurality of detection nodes are processed comprehensively by the weighted fusion algorithm, so that the network packet loss condition is reflected more completely and the evaluation accuracy is improved. The data of different detection nodes may differ or contain errors; the weighted fusion algorithm can handle such data flexibly according to the weight allocation principle, which reduces the influence of individual abnormal data on the overall evaluation result and enhances the robustness of the system.
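A minimal sketch of the fusion step is given below, assuming each detection node already has a packet loss score on a hundred-point scale and a weight reflecting its importance. The node names, scores and weights are hypothetical.

```python
def fuse_scores(node_scores, node_weights):
    """Weighted fusion of per-node scores into one score for the network.

    node_scores: {node_id: score}
    node_weights: {node_id: non-negative weight}
    """
    total_weight = sum(node_weights[n] for n in node_scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(node_scores[n] * node_weights[n] for n in node_scores)
    return weighted / total_weight

scores = {"n1": 80, "n2": 40, "n3": 60}
weights = {"n1": 2, "n2": 1, "n3": 1}  # n1 assumed to sit on a critical path
print(fuse_scores(scores, weights))
```

Dividing by the total weight keeps the fused score on the same scale as the node scores, so a single abnormal node with a low weight shifts the result only slightly.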
In some embodiments, in step S102, the processing, by adopting a weighted fusion algorithm, of the multi-dimensional network packet loss index data of all detection nodes of the available area network on each cloud to obtain the network packet loss comprehensive score of the available area network on each cloud specifically includes:
determining a weight coefficient of the network packet loss index data of each detection node by adopting an analytic hierarchy process;
performing weighted calculation on the network packet loss index data of each detection node according to a preset weighted fusion algorithm model and the weight coefficient to obtain the network packet loss comprehensive score of each detection node;
performing cluster analysis on the network packet loss comprehensive scores of all detection nodes by adopting a K-means clustering algorithm, and determining network packet loss level data of each detection node;
acquiring the network packet loss level data of all detection nodes of the available area network on each cloud, and obtaining the distribution of network packet loss levels among the detection nodes of the available area network on each cloud by performing frequency statistics on the network packet loss level data;
and performing weighted calculation on the distribution of network packet loss levels among the detection nodes according to preset network packet loss level weight coefficients to obtain the network packet loss comprehensive score of the available area network on each cloud.
In this embodiment, the network packet loss index data collected by the plurality of detection nodes, including indexes such as packet loss rate, delay and jitter, are acquired as the input of the analytic hierarchy process. A hierarchical structure model of the network packet loss indexes is constructed by the analytic hierarchy process, the relative importance weights of the indexes are determined, and the weight coefficient of the network packet loss index data of each detection node is calculated. According to the preset weighted fusion algorithm model, the weight coefficients are applied to the network packet loss index data of the corresponding detection node in a weighted calculation to obtain the network packet loss comprehensive score of each detection node.
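The weight calculation of the analytic hierarchy process can be sketched as follows. The principal eigenvector of the pairwise judgment matrix is approximated by power iteration, and the consistency ratio uses Saaty's random index values for matrices of order up to 5. The judgment matrix shown is a hypothetical, perfectly consistent one chosen so that the three weights come out in the ratio 5:3:2.

```python
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12}  # Saaty's RI values

def ahp_weights(matrix, iterations=100):
    """Return (weights, consistency_ratio) for a pairwise judgment matrix."""
    n = len(matrix)
    if n not in RANDOM_INDEX:
        raise ValueError("this sketch supports matrices of order 1 to 5")
    w = [1.0 / n] * n
    for _ in range(iterations):
        # Power iteration: multiply by the matrix, then renormalize.
        v = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(v)
        w = [x / total for x in v]
    # Estimate the principal eigenvalue from A*w compared with w.
    aw = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / w[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1) if n > 1 else 0.0
    ri = RANDOM_INDEX[n]
    cr = ci / ri if ri > 0 else 0.0
    return w, cr

# Hypothetical judgments: entry [i][j] states how much more important
# index i is than index j.
judgment = [
    [1.0, 5 / 3, 5 / 2],
    [3 / 5, 1.0, 3 / 2],
    [2 / 5, 2 / 3, 1.0],
]
weights, cr = ahp_weights(judgment)
print([round(x, 3) for x in weights], round(cr, 6))
```

A consistency ratio below 0.1 is the conventional acceptance criterion; expert judgments that fail it are usually revised and the calculation repeated.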
A K-means clustering algorithm is adopted, with the network packet loss comprehensive scores of all detection nodes as input, to divide the nodes into different clusters by iterative optimization, each cluster representing one network packet loss level. During clustering, the distance between each node score and each cluster center is calculated, each node is assigned to the cluster whose center is closest, and the cluster centers are updated repeatedly until the clustering result converges. The network packet loss level of each detection node is determined according to the clustering result, giving the network quality classification label of the node.
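The clustering loop described above can be sketched for one-dimensional scores as follows. The deterministic initialization (centers spread across the sorted scores) and the sample scores are assumptions made so that the example is reproducible; cluster index 0 is assumed to correspond to the lowest scores.

```python
def nearest(score, centers):
    """Index of the center closest to the score."""
    return min(range(len(centers)), key=lambda c: abs(score - centers[c]))

def kmeans_1d(scores, k=3, max_iter=100):
    """K-means on scalar scores; returns (centers, labels)."""
    if len(scores) < k:
        raise ValueError("need at least k scores")
    ordered = sorted(scores)
    # Assumed initialization: pick k centers spread across the sorted scores.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for s in scores:
            clusters[nearest(s, centers)].append(s)
        # Recompute each center as the mean of its members.
        updated = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
        if updated == centers:
            break  # converged: centers no longer move
        centers = updated
    labels = [nearest(s, centers) for s in scores]
    return centers, labels

scores = [5, 7, 6, 50, 52, 48, 90, 95, 92]
centers, labels = kmeans_1d(scores)
print(centers, labels)
```

Each label can then be mapped to a packet loss level name, for example 0 to low, 1 to medium and 2 to high.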
Information on all detection nodes in the available area on each cloud is acquired, including metadata such as the node ID and the available area in which the node is located, and is stored in a detection node information table. A network packet loss test is carried out periodically on each detection node to obtain the network packet loss rate data over a period of time; the packet loss rate data are converted into the corresponding packet loss levels according to preset packet loss rate level thresholds and stored in a detection node packet loss level data table. The detection node information table and the detection node packet loss level data table are joined along the available area dimension, and frequency statistics are performed on the packet loss level data of all detection nodes in each available area to obtain the number of nodes at each packet loss level.
The preset network packet loss level weight coefficients are acquired, and the network packet loss level of each detection node is determined. According to the distribution of network packet loss levels among the detection nodes, the network packet loss weighted score of each detection node is obtained by a weighted calculation method. The network packet loss weighted scores of the detection nodes in the same available area on the cloud are aggregated to obtain the network packet loss comprehensive score of that available area.
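The frequency statistics and level weighting can be sketched together as follows. The level names and the level weight coefficients (0.2, 0.3 and 0.5, summing to 1) are illustrative assumptions.

```python
from collections import Counter

def zone_score(node_levels, level_weights):
    """Weighted average of the packet loss level distribution in one area.

    node_levels: list with the packet loss level of each detection node
    level_weights: {level: weight coefficient}
    """
    total = len(node_levels)
    if total == 0:
        raise ValueError("no detection nodes in this available area")
    counts = Counter(node_levels)  # frequency statistics per level
    return sum(level_weights[level] * count / total
               for level, count in counts.items())

levels = ["low"] * 6 + ["medium"] * 3 + ["high"] * 1
weights = {"low": 0.2, "medium": 0.3, "high": 0.5}  # assumed coefficients
print(round(zone_score(levels, weights), 4))
```

Because the high level carries the largest weight, the score rises as a larger share of detection nodes falls into the high packet loss level.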
Illustratively, in order to determine the weight coefficient of the network packet loss index data of each detection node, an analytic hierarchy process may be adopted. First, indexes such as delay, jitter and retransmission rate are compared pairwise by means of expert scoring, and a judgment matrix is constructed. Then the weight vector of the indexes is calculated by the matrix eigenvalue method, and a consistency test is performed, repeating the scoring if necessary until the test is passed. For example, the finally calculated weight coefficients are 0.5 for delay, 0.3 for jitter and 0.2 for retransmission rate. On this basis, a weighted average model is used to perform weighted fusion calculation on the network packet loss index data of each detection node to obtain its comprehensive score. Next, a K-means clustering algorithm is used to perform cluster analysis on the comprehensive scores of all detection nodes: the Euclidean distance from each score sample to each cluster center is calculated, and the cluster centers are updated iteratively until the clustering result converges, so that the detection nodes are finally divided into low, medium and high packet loss levels. The frequency distribution of detection nodes at the different packet loss levels in each cloud available area is counted, and a weighted average of the frequency distribution is taken according to preset level weight coefficients (such as 0.2 for low, 0.3 for medium and 0.5 for high) to obtain the overall packet loss comprehensive score of the cloud available area network for the subsequent available area selection decision. In this way, the multi-dimensional indexes are converted into a one-dimensional score while the differing importance of the indexes and levels is taken into account, so that the packet loss level of the available area network on the cloud can be evaluated more comprehensively and reasonably.
S103, determining a network packet loss scoring threshold of the available area network on each cloud according to network area division, service type and time period configuration parameters, and determining whether a network packet loss fault event has occurred in the available area network on each cloud by comparing the network packet loss comprehensive score of each available area network with its network packet loss scoring threshold.
In this embodiment, in order to monitor the network packet loss condition of the available area network on the cloud more accurately and discover network packet loss fault events in time, a network packet loss fault monitoring system based on dynamic thresholds is designed and implemented. The system configures a specific network packet loss scoring threshold for each available area network on the cloud according to parameters such as network area division, service type and time period. By comparing the network packet loss comprehensive scores with these thresholds in real time, the system can automatically determine and report network packet loss fault events.
Specifically, the network is divided into different areas according to factors such as the physical location and topology of the available area network on the cloud; each area may have different network characteristics and service requirements. The different service types running on the cloud, such as Web services, database services and video streaming, are identified and classified, since different service types differ in their sensitivity to network packet loss. Considering the periodic variation of network traffic, time is divided into different periods (such as peak, off-peak and night periods), since network load and packet loss conditions differ between periods. Based on historical data and service requirements, a reasonable network packet loss scoring threshold is set for each available area network on the cloud, each service type and each time period. Detection nodes are deployed in the available area network on each cloud to collect multi-dimensional network performance index data such as network packet loss, delay and jitter in real time. The collected data are processed by the weighted fusion algorithm to obtain the network packet loss comprehensive score of the available area network on each cloud. The system compares the network packet loss comprehensive score of each available area network with the corresponding threshold in real time. When the score exceeds the threshold, the system automatically determines that a network packet loss fault event has occurred.
In this embodiment, by configuring a specific network packet loss scoring threshold for the available area network on each cloud and comparing the scores with the thresholds in real time, the system can discover network packet loss fault events more accurately and improve the fault discovery rate. The dynamic threshold mechanism automatically adjusts the thresholds as parameters such as network area division, service type and time period change, which reduces the false alarm rate and the missed alarm rate caused by an improperly set fixed threshold.
In some embodiments, in step S103, the determining of the network packet loss scoring threshold of the available area network on each cloud according to the network area division, service type and time period configuration parameters specifically includes:
in the available area network on each cloud, dividing a corresponding network area for each user, and determining a first scoring coefficient of each network area according to user characteristic information of the users corresponding to that network area;
determining a second scoring coefficient of each service type according to the sensitivity of the service type to network packet loss;
determining a third scoring coefficient of each service access time period according to the user traffic in that service access time period;
and determining the network packet loss scoring threshold of the available area network on each cloud from the first scoring coefficient, the second scoring coefficient and the third scoring coefficient by adopting a weighted average algorithm.
In this embodiment, user information in the available area network on each cloud is acquired, including data such as user characteristics and network usage requirements. According to the acquired user characteristic information, the users are grouped by a clustering algorithm, and users with similar characteristics are placed in the same network area. The user characteristic information in each network area is analyzed, and key characteristic parameters such as bandwidth requirement and delay sensitivity are extracted. The first scoring coefficient of each network area is calculated from the key characteristic parameters of the users in that area in combination with network resource availability.
Service type information and the corresponding network packet loss sensitivity data are acquired, the sensitivity level of each service type to network packet loss is determined according to preset sensitivity thresholds, and the corresponding second scoring coefficient is determined for each sensitivity level. Specifically, a machine learning algorithm is trained on historical data of service types and their network packet loss sensitivity to obtain a model relating service type to sensitivity; the model is used to dynamically predict the packet loss sensitivity of a new service type, from which the second scoring coefficient of the new service type is determined.
Service access log data are acquired, and information such as user access time and access traffic is extracted. The data are divided into different time periods, for example by hour or by day, according to access time. The total access traffic in each time period is counted, and the ratio of the traffic in each time period to the total traffic of the whole day is calculated. The traffic ratio of each time period is taken as the third scoring coefficient of that time period, and a mapping between time periods and scoring coefficients is established. When a service is subsequently scored, the corresponding third scoring coefficient is looked up in the mapping according to the current service access time period.
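The traffic ratio calculation can be sketched as follows, assuming the access log has already been reduced to (hour, traffic volume) pairs. The period boundaries in four_periods and the sample log are hypothetical.

```python
from collections import defaultdict

def four_periods(hour):
    """Map an hour of the day (0-23) to a period; boundaries are assumed."""
    if hour < 6:
        return "early_morning"
    if hour < 12:
        return "morning"
    if hour < 18:
        return "afternoon"
    return "evening"

def period_coefficients(access_log, period_of=four_periods):
    """Return {period: share of the whole-day traffic}.

    access_log: iterable of (hour, traffic_volume) pairs
    """
    traffic = defaultdict(float)
    for hour, volume in access_log:
        traffic[period_of(hour)] += volume
    total = sum(traffic.values())
    if total == 0:
        raise ValueError("access log contains no traffic")
    return {period: volume / total for period, volume in traffic.items()}

log = [(2, 100), (9, 300), (15, 300), (20, 300)]
coefficients = period_coefficients(log)
print(coefficients)
print(coefficients[four_periods(14)])  # lookup for a request arriving at 14:00
```

The returned dictionary plays the role of the mapping between time periods and third scoring coefficients, so a later lookup only needs the current access hour.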
The first scoring coefficient, the second scoring coefficient and the third scoring coefficient are taken as input parameters, and the network packet loss scoring threshold of the available area network on each cloud is calculated by a weighted average algorithm. If the network packet loss scoring threshold exceeds a preset warning value, it is determined that the network quality of the available area network on the cloud is at risk, and network optimization and adjustment are carried out. The available area networks on each cloud are evaluated and ranked according to their network packet loss scoring thresholds, and the available area networks with higher scoring thresholds are taken as preferred network resources for carrying the service requests of critical services and high-priority users.
Illustratively, in order to determine the first scoring coefficient of each network area, a fuzzy comprehensive evaluation method may be employed. First, a fuzzy judgment matrix is constructed from user characteristics such as bandwidth requirement, delay sensitivity and service importance. The qualitative indexes are then quantized into membership values between 0 and 1 by using membership functions. Finally, the comprehensive membership degree is calculated by a weighted average method and used as the first scoring coefficient of the network area. For example, if for the users in a certain area the membership degree of the bandwidth requirement is 0.8, that of the delay sensitivity is 0.6 and that of the service importance is 0.9, with weights of 0.3, 0.3 and 0.4 respectively, the first scoring coefficient is 0.8×0.3+0.6×0.3+0.9×0.4=0.78.
When the second scoring coefficient is determined, the packet loss sensitivity of the different service types can be scored according to expert experience, with scores between 0 and 10. The scores of the service types are then normalized, and the resulting weight coefficients are the second scoring coefficients. For example, if the packet loss sensitivity scores of the video service, the game service and the download service are 10, 8 and 6 respectively, the normalized second scoring coefficients are approximately 0.42, 0.33 and 0.25.
For the third scoring coefficient, the user traffic distribution in different time periods can be analyzed statistically, and a weight coefficient between 0 and 1 is assigned to each time period in combination with expert experience. For example, a day is divided into four time periods, such as early morning, afternoon and evening, with corresponding weight coefficients such as 0.1, 0.3 and 0.3. Finally, the packet loss scoring threshold of the available area network on each cloud is calculated by a weighted average model that combines the three scoring coefficients.
Assuming that the weights of the first, second and third scoring coefficients are 0.4, 0.35 and 0.25 respectively, the packet loss scoring threshold = first scoring coefficient×0.4 + second scoring coefficient×0.35 + third scoring coefficient×0.25. When the actual packet loss comprehensive score of an available area network exceeds this threshold, the network quality is poor and optimization adjustment is needed.
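The threshold calculation and the comparison against the comprehensive score can be sketched as follows. The sketch assumes that all three scoring coefficients and the comprehensive score lie on a common 0-to-1 scale and that the weights are 0.4, 0.35 and 0.25; the sample coefficient values are illustrative only.

```python
def loss_score_threshold(first, second, third, weights=(0.4, 0.35, 0.25)):
    """Weighted average of the three scoring coefficients."""
    w1, w2, w3 = weights
    total = w1 + w2 + w3
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return (w1 * first + w2 * second + w3 * third) / total

def is_fault(composite_score, threshold):
    """A fault event is flagged when the score exceeds the threshold."""
    return composite_score > threshold

# Illustrative inputs: area coefficient 0.78, service coefficient 0.42,
# time period coefficient 0.3.
threshold = loss_score_threshold(0.78, 0.42, 0.3)
print(round(threshold, 3), is_fault(0.6, threshold), is_fault(0.5, threshold))
```

With these inputs the threshold is 0.4×0.78 + 0.35×0.42 + 0.25×0.3 = 0.534, so a comprehensive score of 0.6 is flagged as a fault while a score of 0.5 is not.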
S104, determining packet loss fault event data of the available area network on the cloud in which a network packet loss fault event has occurred, and determining the packet loss event level according to the packet loss fault event data.
In this embodiment, after a network packet loss fault event occurs in an available area network on the cloud, a packet loss fault event level determining system is used to evaluate the influence range and severity of the fault more effectively. Based on the data of the packet loss fault event that has occurred, the system automatically determines the level of the packet loss event through a series of evaluation indexes and algorithms, so that corresponding countermeasures can be taken.
Specifically, when the system detects that a network packet loss fault event has occurred in an available area network on the cloud, the relevant fault data is collected immediately. Such data may include the time, the location (the specific availability zone), the type of traffic affected, the packet loss rate, the duration, the number of concurrent faults (e.g., increased delay, increased jitter), and so on. Here, the packet loss rate measures the proportion of data packets lost during network transmission; the duration is the length of time from the start to the end of the fault event; the affected service type is used to account for the different sensitivities of different service types to packet loss; and the number of concurrent faults is used to evaluate whether other network faults occurred in the same period and how the faults are related.
Based on these evaluation indexes, an evaluation algorithm is designed and implemented to determine the level of the packet loss fault event. The algorithm may employ methods such as weighted summation, fuzzy evaluation or machine learning to convert the values of the evaluation indexes into a composite score, and the fault event is then classified into a level (e.g., severe, general or mild) according to that score. Each level corresponds to different countermeasures and priorities. For example, a severe fault event may require immediately starting an emergency plan, notifying key users and deploying resources for emergency repair, while a mild fault event may only need to be recorded and monitored for subsequent optimization.
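A minimal sketch of the weighted-summation option is shown below. The indicator weights, the normalization caps and the level cut-offs (0.6 and 0.3) are hypothetical; the description does not specify them.

```python
from dataclasses import dataclass


@dataclass
class FaultEvent:
    loss_rate: float            # fraction of packets lost, 0..1
    duration_s: float           # seconds from start to end of the fault
    service_sensitivity: float  # 0..1, sensitivity of the affected service type
    concurrent_faults: int      # other faults observed in the same period


# Hypothetical weights and caps; not taken from the description.
WEIGHTS = {"loss": 0.4, "duration": 0.2, "sensitivity": 0.2, "concurrent": 0.2}
MAX_DURATION_S = 3600.0
MAX_CONCURRENT = 5


def composite_score(event):
    """Normalize each indicator to 0..1 and combine them by weighted sum."""
    indicators = {
        "loss": min(event.loss_rate, 1.0),
        "duration": min(event.duration_s / MAX_DURATION_S, 1.0),
        "sensitivity": min(event.service_sensitivity, 1.0),
        "concurrent": min(event.concurrent_faults / MAX_CONCURRENT, 1.0),
    }
    return sum(WEIGHTS[name] * value for name, value in indicators.items())


def event_level(score):
    """Map a composite score to a level using assumed cut-offs."""
    if score >= 0.6:
        return "severe"
    if score >= 0.3:
        return "general"
    return "mild"


level = event_level(composite_score(FaultEvent(0.5, 1800, 1.0, 2)))
```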
In this embodiment, by defining clear evaluation indexes and applying a systematic evaluation algorithm, the system can evaluate the level of a network packet loss fault event in an available area on the cloud more accurately, reducing the error and subjectivity of human judgment. According to the fault level, the system can allocate resources for fault handling in a reasonable way: more resources and manpower can be allocated to emergency repair of severe fault events, while mild fault events can be handled more flexibly.
In some embodiments, in step S104, the determining the packet loss severity of the network of the available area on the cloud where the network packet loss fault event has occurred, and determining the packet loss event level according to the packet loss severity specifically includes:
Acquiring historical packet loss data of an available area network on a cloud, acquiring packet loss occurrence event indexes according to the historical packet loss data, and generating an original packet loss event log through the packet loss occurrence event indexes;
According to a preset time window, carrying out statistical analysis on the original packet loss event log to obtain an average packet loss rate and a maximum packet loss rate in each time window;
Determining the severity of network packet loss faults according to the average packet loss rate and the maximum packet loss rate in all time windows, and forming a packet loss event training data set;
inputting the packet loss event training data set into a decision tree model for training to obtain a packet loss event grading model;
Acquiring packet loss fault event data of the available area network on the cloud in which a network packet loss fault event has occurred, inputting the packet loss fault event data into the packet loss event grading model for identification, and determining the packet loss event level of that available area network on the cloud.
In this embodiment, historical packet loss data of the available area network on the cloud is obtained, and the packet loss data of different time periods is statistically analyzed to obtain indicators such as the average packet loss rate and the packet loss occurrence frequency of each period. A judgment threshold for packet loss events is determined from the statistical results; if the packet loss rate or the packet loss occurrence frequency in a time period exceeds the preset threshold, a packet loss event is judged to have occurred in that period. The packet loss data of such periods is analyzed further, and key indicators such as the start and end time of the packet loss and the peak packet loss rate are extracted to generate the packet loss occurrence event indexes. The index data is classified with a clustering algorithm, and different types of packet loss events, such as packet loss caused by network congestion or by link failure, are identified from the clustering result. For each type of packet loss event, a corresponding detection model is constructed based on the characteristics and occurrence patterns of its packet loss data, and is used to monitor the network in real time and discover packet loss events promptly. After a packet loss event is detected, an original packet loss event log in a standard format is automatically generated from indicators such as the occurrence time and the packet loss rate, recording the detailed information of the event.
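The threshold-based detection and index extraction can be sketched as follows. The sketch assumes the historical data is a time-ordered list of (timestamp, packet loss rate) samples and treats each run of consecutive samples above the threshold as one event; the field names and the 0.05 threshold are illustrative.

```python
def detect_events(samples, threshold):
    """Group consecutive above-threshold samples into packet loss events.

    `samples` is a time-ordered list of (timestamp, loss_rate) pairs.
    Each event records its start time, end time and peak loss rate.
    """
    events = []
    current = None
    for timestamp, rate in samples:
        if rate > threshold:
            if current is None:
                current = {"start": timestamp, "end": timestamp, "peak": rate}
            else:
                current["end"] = timestamp
                current["peak"] = max(current["peak"], rate)
        elif current is not None:
            events.append(current)
            current = None
    if current is not None:  # an event still open at the end of the data
        events.append(current)
    return events


history = [(0, 0.01), (10, 0.06), (20, 0.12), (30, 0.02), (40, 0.07)]
event_log = detect_events(history, threshold=0.05)
```

Here the samples at 10 s and 20 s form one event with a peak of 0.12, and the sample at 40 s forms a second event.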
Preset time window parameters are obtained, and the start time and end time of each time window are determined. According to these parameters, the packet loss event records falling within each time window are screened from the original packet loss event log, and the total number of packet loss events and the total number of data packets in each window are counted. The average packet loss rate in each time window is calculated as: average packet loss rate = number of packet loss events in the window / total number of data packets in the window. The maximum packet loss rate in each time window is the largest packet loss rate observed in that window. The maximum packet loss rate is compared with a preset packet loss fault severity threshold; if it exceeds the threshold, a serious packet loss fault is considered to have occurred. A packet loss fault severity classification model is trained with a support vector machine algorithm, using the average packet loss rate and the maximum packet loss rate as features. A packet loss event classification model is trained with a decision tree algorithm on the same features to judge whether a packet loss event has occurred. The average packet loss rate, the maximum packet loss rate, the packet loss fault severity and the packet loss event label are then used as sample attributes to construct the packet loss event training data set.
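A rough sketch of the per-window statistics follows. It assumes each log record is a (timestamp, lost packets, total packets) tuple and reads the numerator of the average as the number of lost packets in the window; both the record layout and that reading are assumptions.

```python
from collections import defaultdict


def window_stats(records, window_s):
    """Average and maximum packet loss rate per fixed-size time window.

    `records` holds (timestamp_s, lost_packets, total_packets) tuples.
    The result is keyed by the start time of each window.
    """
    buckets = defaultdict(list)
    for timestamp, lost, total in records:
        buckets[int(timestamp // window_s)].append((lost, total))

    stats = {}
    for index in sorted(buckets):
        entries = buckets[index]
        lost_sum = sum(lost for lost, _ in entries)
        total_sum = sum(total for _, total in entries)
        rates = [lost / total for lost, total in entries if total > 0]
        stats[index * window_s] = {
            "avg_loss_rate": lost_sum / total_sum if total_sum else 0.0,
            "max_loss_rate": max(rates, default=0.0),
        }
    return stats


log = [(0, 1, 100), (30, 5, 100), (70, 0, 100), (90, 20, 100)]
per_window = window_stats(log, window_s=60)
```

The resulting (average, maximum) pairs are the features that the description feeds into the severity and event classifiers.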
The packet loss event grading model is constructed from the preprocessed training data set with a decision tree algorithm. The hyperparameters of the decision tree model are tuned by cross-validation to improve its classification accuracy. The trained decision tree model is then used to predict and classify newly generated packet loss event data. If the confidence of a prediction is lower than a preset threshold, retraining is triggered and the packet loss event grading model is updated.
Real-time packet loss data and related event information of the available area network on the cloud are continuously acquired through a network monitoring system, and the subsequent packet loss event grading flow is triggered when a packet loss fault is detected. Key characteristic parameters such as the time, duration and packet loss rate are extracted from the acquired packet loss fault event data and formatted into a standardized form for input into the packet loss event grading model. The standardized packet loss event data is input into the grading model, which automatically identifies and analyzes the severity of the packet loss event and outputs the corresponding grading result.
Further, the determining the packet loss event level of the available area network on the cloud where the network packet loss failure event has occurred further includes:
Acquiring a network topology structure of an available area network on a cloud with the packet loss event grade being a serious grade, and determining connection relations and network equipment information among all host nodes of the available area network on the cloud according to the network topology structure to form a network topology diagram;
Based on the network topology diagram, a graph neural network model is adopted to determine a fault propagation characteristic mode, and optimal network packet loss fault corresponding measures are determined according to the fault propagation characteristic mode.
In this embodiment, information on the network devices in an available area network on the cloud whose packet loss event level is serious is acquired, and a network topology model is constructed by analyzing the connection relationships between the devices. Based on the topology model, a graph theory algorithm analyzes the connectivity among the host nodes to obtain a connection relation matrix among the host nodes. The configuration information and alarm logs of the network devices are analyzed with a deep learning algorithm to judge whether each device is abnormal, and abnormal devices are marked in the topology diagram.
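The connectivity analysis can be illustrated with a plain adjacency matrix and a breadth-first search. This sketch covers only the graph-theory step, not the deep learning anomaly detection, and the node names and links are hypothetical.

```python
from collections import deque


def adjacency_matrix(nodes, links):
    """Symmetric 0/1 connection matrix for an undirected topology."""
    index = {name: i for i, name in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for a, b in links:
        matrix[index[a]][index[b]] = 1
        matrix[index[b]][index[a]] = 1
    return matrix


def reachable_from(nodes, matrix, start):
    """Set of nodes connected to `start`, found by breadth-first search."""
    index = {name: i for i, name in enumerate(nodes)}
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for j, connected in enumerate(matrix[index[current]]):
            if connected and nodes[j] not in seen:
                seen.add(nodes[j])
                queue.append(nodes[j])
    return seen


nodes = ["host-a", "host-b", "switch-1", "host-c"]
links = [("host-a", "switch-1"), ("host-b", "switch-1")]
matrix = adjacency_matrix(nodes, links)
component = reachable_from(nodes, matrix, "host-a")
```

In this example host-c has no link, so it falls outside the component that a fault on host-a could reach.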
A graph neural network model is constructed from the network topology diagram: network nodes and links are mapped to the nodes and edges of the graph, and the attributes of the nodes and edges include network attribute parameters such as network device type and link bandwidth. Characteristic parameters such as the fault propagation path, propagation time and influence range are extracted from historical network fault data, and the graph neural network model is trained on them to obtain the fault propagation characteristic mode. When a packet loss fault occurs in the network, the current network topology and fault alarm information are acquired and input into the trained graph neural network model to predict the propagation path and influence range of the fault. The fault knowledge base is then searched for entries matching the predicted fault propagation characteristic mode to obtain candidate fault response measures, such as link load balancing and dynamic routing. The candidate measures are evaluated with a reinforcement learning algorithm, and the measure with the highest evaluated benefit is obtained through repeated iteration. The optimal measure is issued to the network controller, and network forwarding rules and link policies are dynamically adjusted through software-defined networking technology, so that the network packet loss fault is quickly located and recovered. After the measure is applied, the fault state is continuously monitored and the recovery of network performance is verified.
S105, based on the packet loss event level and the attribute information of each operation and maintenance personnel, determining different network fault briefing generation rules and different network fault briefing pushing rules for each operation and maintenance personnel, and generating and pushing corresponding network fault briefs for each operation and maintenance personnel according to the different network fault briefing generation rules and the different network fault briefing pushing rules.
In this embodiment, in order to improve the response efficiency and processing capability of the operation and maintenance team to the network fault event, a personalized network fault briefing generation and pushing system is designed and implemented, and the system customizes different network fault briefing generation rules and pushing rules for each operation and maintenance person based on the determined packet loss event level and attribute information (such as skill expertise, responsibility scope, preference setting, etc.). By generating and pushing personalized network fault briefs, the system can ensure that operation and maintenance personnel acquire fault information related to responsibilities and interests of the operation and maintenance personnel in time, so that response and decision making can be performed more quickly.
Specifically, attribute information of the operation and maintenance personnel is collected, including but not limited to: skill expertise, that is, the knowledge and experience of the operation and maintenance personnel in specific technical or business fields; responsibility range, that is, the specific network areas, business types or system components for which they are responsible; and preference settings, that is, their personal preferences regarding brief content, format, pushing mode and the like.
Based on the packet loss event level and the attribute information of the operation and maintenance personnel, the system establishes personalized brief generation rules for each operation and maintenance person. The rules include: content screening, in which fault information relevant to the person is selected according to their responsibility range and skill expertise; priority sorting, in which multiple fault events are ordered according to the event level and their urgency with respect to the person's responsibilities; and format customization, in which the format, charts, colors and the like of the brief are customized according to the person's preference settings.
The system also establishes personalized brief pushing rules for each operation and maintenance person. The rules include: pushing time, in which a suitable pushing time is selected according to the person's working hours and preferences; pushing mode, in which the brief is pushed through channels such as mail, short message, instant messaging and mobile applications; and pushing frequency, in which a suitable pushing frequency is set according to the urgency of the fault event and the needs of the person.
According to the formulated personalized brief generation rules and pushing rules, the system generates a corresponding network fault brief for each operation and maintenance person and pushes it in the selected mode.
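The content screening, priority sorting and pushing-mode rules can be sketched as below. The data classes, field names and level ordering are hypothetical; formatting preferences and pushing time are left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class Engineer:
    name: str
    zones: set      # responsibility range (availability zones)
    channels: list  # preferred pushing modes, e.g. ["mail", "sms"]


@dataclass
class Fault:
    zone: str
    level: str      # "severe", "general" or "mild"
    summary: str


LEVEL_RANK = {"severe": 0, "general": 1, "mild": 2}


def build_brief(engineer, faults):
    """Keep faults in the engineer's zones and order them by severity."""
    relevant = [fault for fault in faults if fault.zone in engineer.zones]
    relevant.sort(key=lambda fault: LEVEL_RANK[fault.level])
    return {
        "to": engineer.name,
        "channels": engineer.channels,
        "lines": [f"[{f.level}] {f.zone}: {f.summary}" for f in relevant],
    }


faults = [
    Fault("az-1", "mild", "brief loss spike"),
    Fault("az-2", "severe", "sustained loss"),
    Fault("az-1", "severe", "link down"),
]
brief = build_brief(Engineer("alice", {"az-1"}, ["mail"]), faults)
```

Because Python's sort is stable, faults of the same level keep their original order within the brief.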
In this embodiment, by providing operation and maintenance personnel with fault information closely related to their responsibilities and interests, the system can improve the response efficiency of the operation and maintenance team to network fault events. Personnel can recognize and handle the fault events relevant to them more quickly, and unnecessary interference and delay are reduced. The personalized brief contains not only the basic information of the fault event but may also include analysis and suggestions tailored to the skill expertise of the recipient, helping them understand the fault more fully and make more accurate decisions.
In some embodiments, in the step S105, the determining, for each operation and maintenance person, a different network failure briefing generation rule and a different network failure briefing pushing rule based on the packet loss event level and attribute information of each operation and maintenance person specifically includes:
Acquiring attribute information of each operation and maintenance person in a pre-established operation and maintenance person information database, wherein the attribute information comprises a level, a professional scope and a responsibility attribute;
Performing cluster analysis on the attribute information by adopting a K-means algorithm, dividing operation and maintenance personnel with similar attribute information into the same category, and setting a corresponding network fault bulletin template for each category;
Taking the level, professional scope and responsibility attribute of operation and maintenance personnel and the historical packet loss event data as input features of a decision tree, taking personalized network fault briefing generation rules and network fault briefing pushing rules as output of the decision tree, and forming a network fault briefing rule setting model by training a decision tree model;
And according to the network fault briefing rule setting model and the network fault briefing template, determining different network fault briefing generation rules and different network fault briefing pushing rules for each operation and maintenance person through the packet loss event level and the attribute information of each operation and maintenance person.
In this embodiment, the pre-established operation and maintenance personnel information database is accessed, and the level, professional scope and responsibility attribute of each operation and maintenance person are extracted from it. The extracted attribute information is converted into numerical vectors to construct an operation and maintenance personnel attribute feature matrix. Cluster analysis is performed on the feature matrix with the K-means algorithm, which divides personnel with similar attributes into the same category by calculating the distance between each data point and the cluster centers. For each category obtained by clustering, the common attribute characteristics of its personnel are analyzed and a corresponding network fault brief template is set accordingly. When a network fault occurs, the corresponding personnel category is matched according to the fault type and the specialty to which the faulty device belongs, the network fault brief template of that category is obtained, and the fault information is automatically filled in to generate a specific fault brief.
Corresponding weight coefficients are obtained according to the level, professional scope and responsibility attribute of the operation and maintenance personnel, and the efficiency and accuracy with which different personnel have handled faults are counted from the historical packet loss event data. The personnel attributes and the historical data serve as the input features of the decision tree, and the personalized brief rules serve as its output. The decision tree model is constructed with the ID3 algorithm: the optimal partition attribute is selected by information gain, the data set at each decision node is partitioned into subsets according to the values of that attribute, and sub-trees are built recursively until all samples in a node belong to the same category or no further partition is possible. Personalized network fault brief generation rules and pushing rules are then derived from the paths of the decision tree.
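The attribute-selection step of ID3 can be sketched with entropy and information gain. The sample rows, attribute names and rule labels are invented for illustration; a full ID3 implementation would apply `best_attribute` recursively to each subset.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting the samples on `attribute`."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels, attributes):
    """Attribute with the largest information gain (ID3 split criterion)."""
    return max(attributes, key=lambda a: information_gain(rows, labels, a))


# Hypothetical samples: personnel attributes and the brief rule chosen for them.
rows = [
    {"level": "junior", "specialty": "network"},
    {"level": "junior", "specialty": "security"},
    {"level": "senior", "specialty": "network"},
    {"level": "senior", "specialty": "security"},
]
labels = ["detailed", "detailed", "summary", "summary"]
split = best_attribute(rows, labels, ["level", "specialty"])
```

In this toy data the level fully determines the rule, so splitting on it removes all uncertainty while splitting on specialty removes none.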
According to the network fault briefing rule setting model and the network fault brief template, personalized network fault brief content is dynamically generated in combination with the current packet loss event level and the attribute information of the operation and maintenance personnel related to the event. According to the pushing rules in the rule setting model and the personnel attributes, it is determined to which personnel the brief content should be pushed, forming a personalized network fault brief push list. The brief content is sent to the corresponding personnel according to the push list, and the rule setting model is dynamically optimized according to their feedback, so that the accuracy and timeliness of the network fault briefs improve continuously.
For example, in order to obtain the attribute information of the operation and maintenance personnel, a database containing the basic information of all operation and maintenance personnel may be built in advance. The record of each person in the database includes the level (such as junior, intermediate or senior), the professional scope (such as network, system or security) and the responsibility attribute (such as on-duty, inspection or emergency). The attribute information is then analyzed with the K-means clustering algorithm. The algorithm first randomly selects k cluster centers, then iteratively assigns each data point to the nearest cluster center and updates the positions of the cluster centers until the centers no longer change. Suppose the operation and maintenance personnel are divided into 5 categories, where the personnel in each category have similar attributes, for example an intermediate level and a network specialty. A different network fault brief template is set for each category, with the content and format of the template customized to the characteristics of the personnel in that category. Next, the personnel attributes and the historical packet loss event data are taken as features, and the personalized brief generation rules (for example, detailed brief content for junior network personnel, and a brief that emphasizes fault impact for senior security personnel) and brief pushing rules (for example, pushing within 5 minutes for serious faults and within 1 hour for general faults) are taken as the output of the decision tree, and the decision tree model is trained with the C5 algorithm.
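The K-means loop described above can be sketched in plain Python. The sketch assumes the personnel attributes have already been encoded as numeric tuples (the two-dimensional encoding below is hypothetical) and uses a seeded random generator so the run is repeatable.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, seed=0, max_iter=100):
    """Plain K-means on tuples of floats; returns (assignments, centers)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # random initial cluster centers
    assignments = None
    for _ in range(max_iter):
        # Assign every point to its nearest center.
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(point, centers[c]))
            for point in points
        ]
        # Unchanged assignments mean the centers would not move either.
        if new_assignments == assignments:
            break
        assignments = new_assignments
        # Move each center to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centers[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assignments, centers


# Hypothetical encoding: (level, specialty code) per person.
people = [(1.0, 0.0), (1.0, 0.1), (1.1, 0.0), (3.0, 1.0), (3.0, 0.9), (2.9, 1.0)]
assignments, centers = kmeans(people, k=2)
```

Each resulting cluster index would then be associated with one brief template.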
When a new packet loss fault occurs, the decision tree model determines the brief generation rule and pushing rule for each person according to the severity level of the fault and the attributes of the operation and maintenance personnel. A personalized network fault brief is then generated automatically from the preset brief template and sent to the relevant personnel in time according to the pushing rule, so that fault handling can begin promptly.
As an application example of this embodiment, as shown in FIG. 2, a network monitoring system (such as the open source Zabbix system) is deployed in the available areas and collects network packet loss data between available areas in real time through a simple full-mesh ping method. When the network packet loss of an available area on the cloud reaches a set threshold, the network monitoring system triggers an alarm and calls back the push platform interface. The push platform pulls the monitoring data from the network monitoring system through the interface, evaluates the pulled data against its preset thresholds and judgment conditions, outputs a network fault conclusion through calculation and comparison, and writes the conclusion into a database. Outputting a network fault conclusion triggers the pushing module, which generates and pushes a network fault brief. Before the brief is pushed, the database is queried for earlier fault conclusions and their push times are compared, so that the same conclusion is not pushed repeatedly within 5 minutes. Finally, a third-party webhook interface is called to push the network fault brief, and the operator is notified by telephone.
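The 5-minute duplicate suppression can be sketched as follows. An in-memory dictionary stands in for the database lookup, and the class name and conclusion strings are hypothetical; timestamps are passed in explicitly so the behavior is easy to check.

```python
class PushDeduplicator:
    """Suppress repeated pushes of the same conclusion within a time window."""

    def __init__(self, window_s=300):
        self.window_s = window_s
        self._last_pushed = {}  # conclusion text -> time of last push (seconds)

    def should_push(self, conclusion, now_s):
        """Return True and record the push unless it repeats too soon."""
        last = self._last_pushed.get(conclusion)
        if last is not None and now_s - last < self.window_s:
            return False
        self._last_pushed[conclusion] = now_s
        return True


dedup = PushDeduplicator()
first = dedup.should_push("az-1 to az-2 packet loss", now_s=0)
repeat = dedup.should_push("az-1 to az-2 packet loss", now_s=120)
other = dedup.should_push("az-3 to az-1 packet loss", now_s=130)
later = dedup.should_push("az-1 to az-2 packet loss", now_s=301)
```

Only a successful push updates the stored time, so a conclusion that keeps recurring is pushed again once the window measured from its last actual push has elapsed.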
Referring to fig. 3, an embodiment of the present invention provides a processing system 3 for packet loss of a network in a cloud availability zone, where the system 3 specifically includes:
the first processing module 301 is configured to deploy a plurality of detection nodes on the available area networks on each cloud respectively, and obtain multidimensional network packet loss index data of the available area network on each cloud according to the detection nodes;
The second processing module 302 is configured to respectively process multidimensional network packet loss index data of all detection nodes of the available area network on each cloud by adopting a weighted fusion algorithm, so as to obtain a network packet loss comprehensive score of the available area network on each cloud;
the third processing module 303 is configured to determine a network packet loss scoring threshold of the available area network on each cloud according to the network area division, the service type and the time period configuration parameter, and determine whether a network packet loss fault event occurs in the available area network on each cloud by comparing the respective network packet loss comprehensive score and the network packet loss scoring threshold of the available area network on each cloud;
A fourth processing module 304, configured to determine packet loss failure event data of an available area network on a cloud where a network packet loss failure event has occurred, and determine a packet loss event class according to the packet loss failure event data;
And a fifth processing module 305, configured to determine, for each operation and maintenance person, different network failure briefing generation rules and different network failure briefing pushing rules based on the packet loss event level and attribute information of each operation and maintenance person, generate, for each operation and maintenance person, a corresponding network failure briefing according to the different network failure briefing generation rules and the different network failure briefing pushing rules, and push.
It can be understood that, the content in the embodiment of the method for processing the packet loss of the available area network on the cloud shown in fig. 1 is suitable for the embodiment of the system for processing the packet loss of the available area network on the cloud, and the functions specifically realized by the embodiment of the system for processing the packet loss of the available area network on the cloud are the same as those of the embodiment of the method for processing the packet loss of the available area network on the cloud shown in fig. 1, and the achieved beneficial effects are the same as those of the embodiment of the method for processing the packet loss of the available area network on the cloud shown in fig. 1.
It should be noted that, because the content of information interaction and execution process between the above systems is based on the same concept as the method embodiment of the present invention, specific functions and technical effects thereof may be referred to in the method embodiment section, and will not be described herein.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the system is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, the specific names of the functional units and modules are only for distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working process of the units and modules in the above system may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.
Referring to fig. 4, the embodiment of the invention further provides a computer device 4, which comprises a memory 402, a processor 401 and a computer program 403 stored on the memory 402, wherein the computer program 403, when executed on the processor 401, implements the method for processing network packet loss of an available area on a cloud according to any one of the above embodiments.
The computer device 4 may be a desktop computer, a notebook computer, a palm computer, a cloud server, or the like. The computer device 4 may include, but is not limited to, a processor 401, a memory 402. It will be appreciated by those skilled in the art that fig. 4 is merely an example of computer device 4 and is not intended to limit computer device 4, and may include more or fewer components than shown, or may combine certain components, or may include different components, such as input-output devices, network access devices, etc.
The processor 401 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor.
The memory 402 may in some embodiments be an internal storage unit of the computer device 4, such as a hard disk or a memory of the computer device 4. In other embodiments, the memory 402 may be an external storage device of the computer device 4, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card or a flash card provided on the computer device 4. Further, the memory 402 may include both an internal storage unit and an external storage device of the computer device 4. The memory 402 is used to store an operating system, application programs, a boot loader, data and other programs, such as the program code of the computer program. The memory 402 may also be used to temporarily store data that has been output or is to be output.
An embodiment of the invention also provides a computer readable storage medium on which a computer program is stored, wherein the computer program, when run by a processor, implements the method for processing network packet loss of an available area on a cloud according to any one of the above embodiments.
In this embodiment, the integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on this understanding, all or part of the flow of the methods of the above embodiments may be implemented by a computer program instructing related hardware. The computer program may be stored in a computer readable storage medium and, when executed by a processor, may implement the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, or some intermediate form. The computer readable medium may include at least: any entity or device capable of carrying the computer program code to the computer device/terminal equipment, a recording medium, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, and a software distribution medium. Examples include a USB flash drive, a removable hard disk, a magnetic disk, and an optical disk. In some jurisdictions, in accordance with legislation and patent practice, computer readable media may not include electrical carrier signals and telecommunications signals.
In the foregoing embodiments, the description of each embodiment has its own emphasis. For parts that are not described or illustrated in detail in a particular embodiment, reference may be made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided by the present application, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other ways. For example, the apparatus/terminal device embodiments described above are merely illustrative: the division into modules or units is merely a division by logical function, and other divisions are possible in an actual implementation. For instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices, or units, and may be electrical, mechanical, or of another form.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.