Disclosure of Invention
To solve the problems in the prior art, the invention provides an intranet traffic detection method and system based on artificial intelligence.
The object of the invention is achieved by the following technical solution:
S1: capturing real-time network traffic data through a traffic probe and arranging the data in time order to obtain a time series; standardizing the time series to obtain a stationary time series; and establishing a network traffic model from the stationary time series;
S2: evaluating the intranet traffic based on the network traffic model; presetting an iteration period and a traffic prediction model; training and iterating the traffic prediction model according to the iteration period; and re-outputting a new optimal model and its adapted parameters to predict newly input real-time network traffic data, thereby obtaining predicted traffic data;
S3: defining a standardized anomaly coefficient sequence according to the deviation between the actual traffic of each period and the online dynamic threshold; setting an anomaly boundary trigger threshold from the standardized anomaly coefficient sequence; and constructing an anomaly detection model;
S4: taking the predicted traffic data output by the traffic prediction model as the input sequence of the anomaly detection model to obtain the degree of traffic anomaly and the anomaly duration.
Specifically, the standardization is performed as follows: the data are converted to a distribution with a mean of 0 and a standard deviation of 1 by subtracting the mean and dividing by the standard deviation.
Specifically, the network traffic model applies a differencing operation to the stationary time series obtained from the standardization, performs a unit root test on the result, and takes a time series whose test value is larger than a preset critical value as the output sequence of the network traffic model.
Specifically, the traffic prediction model obtains network traffic data samples through the network traffic model and filters the traffic feature matrix to obtain a filtered time series and a time series to be analyzed; the fitting model parameters are calculated from the time series to be analyzed, and the calculation formula is as follows:
Wherein A(m) is the fitted sequence, j is the element count in the sequence, T is the iteration period, fm is the regression term, Xj is the time series to be analyzed, bm is the random disturbance, and pm is the optimal model order.
Specifically, the online dynamic threshold is the real-time threshold boundary of the model, determined by screening target values from the sequence values predicted by each model, and the target value screening rule is as follows:
Wherein Xt is the sequence to be analyzed; the rule is further expressed in terms of the j-th predicted sequence value of the i-th simulation and the mean of the sequence under test; n is the prediction step length; I is a random positive integer in the predicted sequence; and j is the time count. The largest of the per-time maxima of the predicted sequences is selected as the upper limit of the online dynamic threshold, and the smallest of the per-time minima of the predicted sequences is selected as the lower limit of the online dynamic threshold.
Specifically, the anomaly detection model detects outliers based on the Grubbs algorithm, and the anomaly duration is calculated as follows:
O_{k-h~k} = [(i_{k-h} T_f + T_t) X_t + l, (i_k T_f + T_t) X_t + l],
Wherein k is the time count, O_{k-h~k} is the real time interval from the (k-h)-th anomaly to the k-th anomaly, h is the number of iteration periods over which the anomaly persists, i_{k-h} is the model prediction fluctuation deviation of the period corresponding to the (k-h)-th anomaly event, i_k is the model prediction fluctuation deviation of the period corresponding to the k-th anomaly event, T_f is the number of training periods, T_t is the number of prediction periods, X_t is the anomalous traffic data, and l is the anomaly distribution value.
An intranet traffic detection system based on artificial intelligence comprises a traffic processing module, a traffic prediction module, a traffic threshold control module, and an anomaly processing module.
The traffic processing module is used for capturing real-time network traffic data through the traffic probe and arranging the data in time order to obtain a time series; the time series is standardized to obtain a stationary time series, and a network traffic model is established from the stationary time series.
The traffic prediction module evaluates the intranet traffic based on the network traffic model, presets an iteration period and a traffic prediction model, trains and iterates the traffic prediction model according to the iteration period, and re-outputs a new optimal model and its adapted parameters to re-evaluate newly input real-time network traffic data.
The traffic threshold control module is used for defining a standardized anomaly coefficient sequence according to the deviation between the actual traffic of each period and the online dynamic threshold, setting an anomaly boundary trigger threshold from the standardized anomaly coefficient sequence, and constructing an anomaly detection model.
The anomaly processing module is used for taking the traffic data output by the traffic prediction model as the input sequence of the anomaly detection model to obtain the degree of traffic anomaly and the anomaly duration.
The beneficial effects of the invention are as follows:
(1) The probe obtains traffic monitoring data, so that the performance and trend of network traffic can be analyzed in real time, complete traffic information is provided, and very large traffic volumes can be accommodated; the attack type is determined through traffic analysis, the dynamic impact of a network attack is evaluated, and a basis is provided for security protection measures.
(2) Predicting the traffic data and detecting anomalies in it improves the monitoring system's capabilities for multi-source data integration and analysis, security threat perception, and early warning of risk information, and supports a network security monitoring system that offers real-time monitoring, risk prediction, and rapid early warning. Real-time anomaly detection and analysis of network traffic reduces the difficulty of secure system operation and maintenance, lowers the probability of internal security incidents, and strengthens the system's resistance to external intrusion.
Detailed Description
To further describe the technical means adopted by the invention to achieve its intended purpose and the effects obtained, the specific implementation, structure, features, and effects of the invention are described in detail below with reference to the accompanying drawings and the preferred embodiment.
Referring to Fig. 1, an intranet traffic detection method based on artificial intelligence includes:
S1: capturing real-time network traffic data through a traffic probe and arranging the data in time order to obtain a time series; standardizing the time series to obtain a stationary time series; and establishing a network traffic model from the stationary time series;
S2: evaluating the intranet traffic based on the network traffic model; presetting an iteration period and a traffic prediction model; training and iterating the traffic prediction model according to the iteration period; and re-outputting a new optimal model and its adapted parameters to predict newly input real-time network traffic data, thereby obtaining predicted traffic data;
S3: defining a standardized anomaly coefficient sequence according to the deviation between the actual traffic of each period and the online dynamic threshold; setting an anomaly boundary trigger threshold from the standardized anomaly coefficient sequence; and constructing an anomaly detection model;
S4: taking the predicted traffic data output by the traffic prediction model as the input sequence of the anomaly detection model to obtain the degree of traffic anomaly and the anomaly duration.
In this embodiment, the monitoring system with the probe is connected to a mirror port of the switch to collect network traffic in real time, and uses scripts written in a Python 3.6 environment to identify network attacks, malicious operations, and network anomalies online. A network flow is defined as a sequence of data packets that share the same values of source IP, destination IP, source port, destination port, and protocol. The traffic data features include the forward transmission time of data packets between two data flows, the reverse transmission time of data packets between two data flows, and the flow duration per second.
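The flow definition above can be illustrated with a minimal sketch that groups captured packets by their five-tuple. The `Packet` record, its field names, and the helper functions are hypothetical and are not taken from the embodiment; only the five-tuple key follows the text.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    """Hypothetical packet record; field names are assumptions."""
    timestamp: float  # capture time in seconds
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str
    size: int  # bytes


def group_into_flows(packets):
    """Group packets sharing the same five-tuple into flows."""
    flows = defaultdict(list)
    for p in packets:
        key = (p.src_ip, p.dst_ip, p.src_port, p.dst_port, p.protocol)
        flows[key].append(p)
    return dict(flows)


def flow_duration(flow_packets):
    """Time between the first and last packet of one flow, in seconds."""
    times = [p.timestamp for p in flow_packets]
    return max(times) - min(times)
```

Each key of the returned dictionary identifies one flow, and per-flow features such as the duration can then be computed from the packets stored under that key.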
Specifically, the standardization is performed as follows: the data are converted to a distribution with a mean of 0 and a standard deviation of 1 by subtracting the mean and dividing by the standard deviation.
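As a minimal sketch of this step, the function below subtracts the mean and divides by the standard deviation. The use of the population standard deviation is an assumption, since the text does not say whether the population or the sample form is intended; the function name is illustrative.

```python
from statistics import mean, pstdev


def standardize(series):
    """Rescale a series to mean 0 and standard deviation 1."""
    mu = mean(series)
    sigma = pstdev(series)  # assumption: population standard deviation
    if sigma == 0:
        raise ValueError("a constant series cannot be standardized")
    return [(x - mu) / sigma for x in series]
```

For example, the series 2, 4, 4, 4, 5, 5, 7, 9 has mean 5 and population standard deviation 2, so its first element maps to (2 - 5) / 2 = -1.5.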
Specifically, the network traffic model applies a differencing operation to the stationary time series obtained from the standardization, performs a unit root test on the result, and takes a time series whose test value is larger than a preset critical value as the output sequence of the network traffic model.
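The differencing operation can be sketched as below. The function name and the `order` parameter are illustrative. The unit root test itself is not implemented here; in practice it would typically come from a statistics library (for instance the `adfuller` function in statsmodels), which is an assumption about tooling rather than something the embodiment specifies.

```python
def difference(series, order=1):
    """Apply first differences `order` times: y[t] = x[t] - x[t-1]."""
    if order < 0:
        raise ValueError("order must be non-negative")
    out = list(series)
    for _ in range(order):
        # Each pass shortens the series by one element.
        out = [b - a for a, b in zip(out, out[1:])]
    return out
```

A quadratic trend such as 1, 4, 9, 16, 25 becomes constant after two passes, which illustrates how differencing of a suitable order removes a trend before the unit root test is applied.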
Specifically, the traffic prediction model obtains network traffic data samples through the network traffic model and filters the traffic feature matrix to obtain a filtered time series and a time series to be analyzed; the fitting model parameters are calculated from the time series to be analyzed, and the calculation formula is as follows:
Wherein A(m) is the fitted sequence, j is the element count in the sequence, T is the iteration period, fm is the regression term, Xj is the time series to be analyzed, bm is the random disturbance, and pm is the optimal model order.
In this embodiment, the ARIMA model is combined with the differencing operation: any non-stationary series obtained after standardization can be made stationary by differencing of a suitable order. Once a series passes the ADF test, ARMA model parameters can be fitted to it. The FARIMA model applies fractional differencing of order d to the time series, and after this stationarization the series is handled as an ARMA model. The method can describe the long-range correlation characteristics of the sample traffic while also representing the short-range correlation characteristics of the network well. Given a certain set of training samples, a well-fitting prediction model is obtained through supervised training of the regression model. The mean square error, root mean square error, and mean absolute percentage error are used for verification. Filters of different orders are applied to the standardized traffic time series to determine how the filter order affects filter performance.
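The three verification metrics named above can be sketched as follows. The function names are illustrative, and the choice to reject zero actual values in the percentage error is an assumption, since the text does not say how such values are handled.

```python
import math


def mse(actual, predicted):
    """Mean square error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(mse(actual, predicted))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        # Assumption: the ratio is undefined for zero traffic, so reject it.
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)
```

Comparing these metrics across candidate model orders is one way to select the order that fits the held-out traffic best.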
Specifically, the online dynamic threshold is the real-time threshold boundary of the model, determined by screening target values from the sequence values predicted by each model, and the target value screening rule is as follows:
Wherein Xt is the sequence to be analyzed; the rule is further expressed in terms of the j-th predicted sequence value of the i-th simulation and the mean of the sequence under test; n is the prediction step length; I is a random positive integer in the predicted sequence; and j is the time count. The largest of the per-time maxima of the predicted sequences is selected as the upper limit of the online dynamic threshold, and the smallest of the per-time minima of the predicted sequences is selected as the lower limit of the online dynamic threshold.
In this embodiment, the threshold boundary is treated as a simple binary classification that identifies whether the traffic data at a given point in time is an outlier. The interval between the upper and lower threshold limits is determined by the statistical confidence level.
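The selection of the threshold limits and the resulting binary classification can be sketched as below. This covers only the max-of-maxima and min-of-minima selection described in the text, not the full screening rule; the function names and the list-of-lists layout of the simulated predictions are assumptions.

```python
def dynamic_threshold(simulations):
    """Return (lower, upper) limits from several predicted sequences.

    `simulations` holds one predicted sequence per simulation run,
    all of the same length (one value per time step).
    """
    if not simulations or not simulations[0]:
        raise ValueError("at least one non-empty predicted sequence is required")
    if any(len(s) != len(simulations[0]) for s in simulations):
        raise ValueError("all predicted sequences must have the same length")
    # zip(*simulations) yields, for each time step, the values of all runs.
    per_time_max = [max(step) for step in zip(*simulations)]
    per_time_min = [min(step) for step in zip(*simulations)]
    return min(per_time_min), max(per_time_max)


def is_outlier(value, lower, upper):
    """Binary classification: True when the value lies outside the limits."""
    return value < lower or value > upper
```

With three simulated sequences covering three time steps, the limits are the smallest and largest values seen at any step, and an observed value is flagged only when it falls outside that band.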
Specifically, the anomaly detection model detects outliers based on the Grubbs algorithm, and the anomaly duration is calculated as follows:
O_{k-h~k} = [(i_{k-h} T_f + T_t) X_t + l, (i_k T_f + T_t) X_t + l],
Wherein k is the time count, O_{k-h~k} is the real time interval from the (k-h)-th anomaly to the k-th anomaly, h is the number of iteration periods over which the anomaly persists, i_{k-h} is the model prediction fluctuation deviation of the period corresponding to the (k-h)-th anomaly event, i_k is the model prediction fluctuation deviation of the period corresponding to the k-th anomaly event, T_f is the number of training periods, T_t is the number of prediction periods, X_t is the anomalous traffic data, and l is the anomaly distribution value.
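A minimal sketch of the two ideas in this passage follows: a single-outlier Grubbs check and the grouping of consecutive anomalous periods. It does not implement the interval formula above. The critical value is passed in by the caller (it is normally derived from the t-distribution for a chosen significance level and sample size), and all function names are illustrative.

```python
from statistics import mean, stdev


def grubbs_statistic(sample):
    """Return (G, index) for the point farthest from the sample mean."""
    if len(sample) < 3:
        raise ValueError("Grubbs' test needs at least three observations")
    mu = mean(sample)
    s = stdev(sample)  # sample standard deviation
    if s == 0:
        raise ValueError("all observations are identical")
    idx = max(range(len(sample)), key=lambda i: abs(sample[i] - mu))
    return abs(sample[idx] - mu) / s, idx


def grubbs_outlier(sample, critical_value):
    """Index of the outlier if G exceeds the supplied critical value, else None."""
    g, idx = grubbs_statistic(sample)
    return idx if g > critical_value else None


def anomaly_runs(flags):
    """Return inclusive (start, end) index pairs of consecutive True periods."""
    runs, start = [], None
    for i, flagged in enumerate(flags):
        if flagged and start is None:
            start = i
        elif not flagged and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(flags) - 1))
    return runs
```

Under this sketch, the number of periods over which an anomaly persists is simply `end - start` for each run; mapping a run to a real time interval would additionally require the period lengths used in the formula above.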
In this embodiment, the network traffic model is built over a long time scale and from massive operating data. Pattern recognition is performed on the massive data to summarize the normal communication traffic patterns. Following a machine learning approach, callable model parameters and structures are generated offline, so that when similar data are input they can be matched with high precision to the normal pattern they conform to. The information in the large database is memorized through an offline training process, and online prediction is performed based on the knowledge of the offline-trained model. The network background traffic data stored in the historical data server can be used as prior knowledge to train an offline LSTM model over a longer time span. The real-time traffic data to be detected are used as the input of the offline LSTM model, so that the pattern of normal background traffic of the industrial network in the next time period can be predicted online, and this pattern can further serve as a supplement to the dynamic online traffic threshold model.
An intranet traffic detection system based on artificial intelligence comprises a traffic processing module, a traffic prediction module, a traffic threshold control module, and an anomaly processing module.
The traffic processing module is used for capturing real-time network traffic data through the traffic probe and arranging the data in time order to obtain a time series; the time series is standardized to obtain a stationary time series, and a network traffic model is established from the stationary time series.
The traffic prediction module evaluates the intranet traffic based on the network traffic model, presets an iteration period and a traffic prediction model, trains and iterates the traffic prediction model according to the iteration period, and re-outputs a new optimal model and its adapted parameters to re-evaluate newly input real-time network traffic data.
The traffic threshold control module is used for defining a standardized anomaly coefficient sequence according to the deviation between the actual traffic of each period and the online dynamic threshold, setting an anomaly boundary trigger threshold from the standardized anomaly coefficient sequence, and constructing an anomaly detection model.
The anomaly processing module is used for taking the traffic data output by the traffic prediction model as the input sequence of the anomaly detection model to obtain the degree of traffic anomaly and the anomaly duration.
In this embodiment, Netlink is used for message communication between the modules. After the system removes redundant records from the collected traffic data, the data undergo statistical analysis such as port description association and comparison with traffic from the same period, and are then inserted into the back-end MySQL database. The web front end can retrieve the traffic information for the current period through an interface and display it, and in particular actively monitors ports with high traffic or zero traffic, so that maintenance personnel can promptly find abnormal port traffic caused by network faults or hidden hazards. On the web side, maintenance personnel can view the traffic displayed in real time and can query the traffic of one or more devices at one or more points in time as required.
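The query described above (traffic of one or more devices at one or more points in time, plus zero-traffic ports) can be sketched as follows. Python's built-in `sqlite3` module stands in for the MySQL back end so that the example is self-contained; the `port_traffic` table, its columns, and the function names are hypothetical and are not taken from the embodiment.

```python
import sqlite3


def create_store():
    """Create an in-memory stand-in for the traffic database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE port_traffic ("
        "device TEXT, port TEXT, period_start INTEGER, bytes INTEGER)"
    )
    return conn


def query_traffic(conn, devices, period_starts):
    """Traffic rows for the given devices at the given period start times."""
    if not devices or not period_starts:
        return []
    dev_marks = ",".join("?" * len(devices))
    time_marks = ",".join("?" * len(period_starts))
    sql = (
        "SELECT device, port, period_start, bytes FROM port_traffic "
        f"WHERE device IN ({dev_marks}) AND period_start IN ({time_marks}) "
        "ORDER BY device, port, period_start"
    )
    return conn.execute(sql, [*devices, *period_starts]).fetchall()


def zero_traffic_ports(conn, period_start):
    """Ports that reported zero bytes in the given period."""
    sql = (
        "SELECT device, port FROM port_traffic "
        "WHERE period_start = ? AND bytes = 0 ORDER BY device, port"
    )
    return conn.execute(sql, (period_start,)).fetchall()
```

Only the placeholder lists are built dynamically and the values themselves are always bound as parameters, which keeps the queries safe when device names come from the web front end.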
The present invention is not limited to the above embodiments; modifications and variations in detail can be made by those skilled in the art without departing from the scope of the present invention.