Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to Fig. 1, Fig. 1 illustrates an application environment of a big data based network security monitoring method according to an embodiment of the present invention. The network security monitoring method based on big data is applied to a network security monitoring scene. The scene comprises a server and a client connected through a network: the client sends access requests to the server and thereby generates traffic data, and the server monitors the real-time traffic data, filters the monitored data through a log collection and analysis framework to obtain a log file, and then analyzes the log file to judge whether an abnormal attack exists. The client may specifically be, but is not limited to, a smart device capable of network interaction, such as a mobile phone, a personal computer, a portable notebook, or a wearable smart device, and the server may specifically be implemented by an independent server or by a server cluster formed by multiple servers.
Referring to Fig. 2, Fig. 2 shows a big data based network security monitoring method according to an embodiment of the present invention, described by taking its application to the server in Fig. 1 as an example and detailed as follows:
S10: monitoring traffic information in real time, and analyzing the traffic information by deep packet inspection to obtain a log file corresponding to the traffic information.
Specifically, the traffic information of the distributed cluster servers is huge in volume. In this embodiment, the traffic information of the distributed cluster servers is monitored in real time, and the monitored traffic information is analyzed by deep packet inspection, so that invalid data is filtered out, valid data is classified according to preset rules, and a log file corresponding to the traffic information is generated.
Deep packet inspection (DPI) is a packet-based deep inspection technology: the device inspects and analyzes traffic and packet content at key points of the network, and can filter and control the inspected traffic according to predefined policies, thereby completing functions such as fine-grained service identification, service traffic flow-direction analysis, service traffic proportion statistics, and service proportion shaping on the link where the device is located.
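Purely as an illustration of rule-based filtering and classification (not a definitive implementation of the claimed method), the following minimal Python sketch matches monitored packets against predefined policy patterns; the rule names, patterns, and record format are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical predefined DPI policy: category name -> payload pattern.
POLICY = {
    "http_request": re.compile(rb"^(GET|POST|PUT|DELETE) "),
    "sql_keyword":  re.compile(rb"(?i)(union\s+select|or\s+1=1)"),
}

@dataclass
class Packet:
    src_ip: str
    dst_port: int
    payload: bytes

def classify(packet: Packet) -> Optional[str]:
    """Return the matched category, or None if the packet carries no valid data."""
    for category, pattern in POLICY.items():
        if pattern.search(packet.payload):
            return category
    return None  # invalid/uninteresting packets are filtered out

def to_log_line(packet: Packet, category: str) -> str:
    # One classified packet becomes one log record for later collection.
    return f"[I] {packet.src_ip} -> :{packet.dst_port} {category}"
```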
S20: acquiring the log file corresponding to the traffic information through a log analysis framework, and screening and filtering the log file corresponding to the traffic information to obtain a target log file.
Specifically, a preset log collection framework is deployed on each cluster server of the cluster end. The framework may specifically include the distributed publish-subscribe message system Kafka and a preset log collection system. The log file corresponding to the traffic information is collected through the preset log collection framework and then filtered to obtain the target log file.
Kafka is a high-throughput distributed publish-subscribe messaging system. Kafka provides message persistence through an on-disk data structure that can maintain stable performance for long periods even for TB-scale message storage, and can handle all of the action stream data of a consumer-scale web site.
Specifically, actions in the action stream data include, but are not limited to: web browsing, searching, and other user actions, which are a key ingredient in many social features on the modern web. Action stream data is typically handled by logging and log aggregation due to its throughput requirements.
For example, in one embodiment, the action stream collected by Kafka may include: logs generated by the running of each process on the server, logs generated by an administrator's operations on the server, processing logs of the server during running, and the like.
The preset log collection system includes, but is not limited to: Spark, Hadoop, Logstash, and Flume. Spark and Hadoop are relatively costly, whereas Logstash is a lightweight log collection and processing framework that can conveniently collect, and apply customized handling to, dispersed and diversified logs. Therefore, as a preferred mode, the log collection system used in the embodiment of the present invention is Flume or Logstash.
Flume provides the ability to simply process data and write it to various (customizable) data recipients. When the speed of data collection exceeds the speed at which data can be written, that is, when the collected information reaches a peak and is very large, even exceeding the writing capability of the system, Flume adjusts between the data producer and the data container to ensure that stable data can be provided between the two. Flume's pipeline is transaction-based, ensuring consistency of data during transmission and reception. At the same time, Flume is highly reliable and fault-tolerant, supports upgrading, and is easy to manage, which makes the log collection system highly available.
S30: performing feature analysis of abnormal attacks on the data in the target log file to obtain a feature detection result.
Specifically, for the obtained target log file, per-unit-time traffic analysis is performed for each source IP and target port contained in the target log file, and whether characteristics of abnormal attacks exist is judged in combination with the characteristics of the various abnormal attacks, so that a feature detection result is obtained.
Abnormal attacks include, but are not limited to: distributed denial of service (DDoS), Challenge Collapsar (CC) attacks, abnormal traffic, SQL injection, and so on.
The source IP refers to the IP address of any party accessing the server, and the target port is the port accessed by the source IP.
In this embodiment, each abnormal attack corresponds to its own traffic variation characteristic, and the detection of an abnormal attack is mainly a judgment made according to the variation trend of the traffic per unit time.
For the specific judgment of abnormal attacks, reference may be made to the description of steps S31 to S35, together with steps S351 to S353 or steps S354 to S355, and details are not repeated here to avoid repetition.
S40: if the feature detection result indicates that at least one abnormal attack feature exists, executing corresponding early warning measures.
Specifically, when the feature detection result indicates that at least one abnormal attack feature exists, a corresponding early warning measure is executed.
The server side is provided with early warning measures corresponding to different abnormal attacks, the corresponding early warning measures are determined according to the abnormal attack features contained in the feature detection result, and the specific early warning measures can be set according to actual needs and are not limited here.
In this embodiment, traffic information is monitored in real time and analyzed by deep packet inspection, the log file corresponding to the traffic information is obtained through the log analysis framework, and the log file is screened and filtered to obtain the target log file. In this way the effective log file is obtained accurately, the data volume, the amount of calculation, and the calculation time are reduced, and the subsequent processing can proceed smoothly.
Based on the embodiment corresponding to Fig. 2, a specific implementation of step S20, in which the log file corresponding to the traffic information is obtained through the log analysis framework and is screened and filtered to obtain the target log file, is described in detail below through a specific embodiment.
Referring to Fig. 3, Fig. 3 shows a specific implementation flow of step S20 provided in the embodiment of the present invention, which is detailed as follows:
S21: deploying the distributed publish-subscribe message system Kafka and the log analysis tool Logstash.
Specifically, a distributed publish-subscribe message system Kafka and a log analysis tool Logstash are respectively deployed on each cluster server at a cluster end.
The distributed publish-subscribe message system Kafka and the action stream data it collects are as described above in step S20, and the description is not repeated here to avoid repetition.
Currently common open-source log analysis tools for cluster management include: spark, hadoop, logstash, and the like, wherein Spark and Hadoop are relatively high in cost, and therefore the log analysis tool used in the embodiment of the present invention is Logstash.
Logstash is a lightweight log collection and processing framework that can conveniently collect scattered and diversified logs, perform customized processing, and transmit the logs to a specified location, such as a certain server or a certain file.
Further, Logstash can perform log filtering by configuring matching symbols.
S22: acquiring the log file corresponding to the traffic information in real time through the distributed publish-subscribe message system Kafka.
Specifically, each message acquired by the Kafka cluster has a category, called a Topic. Messages of different Topics are stored separately; the storage locations can be customized according to requirements and are recorded in an Offset, so a consumer only needs to specify the Topic of the messages it wants in order to acquire the data, without caring where the data is actually stored.
The Offset is an index record of storage locations and includes, but is not limited to: an Offset number, a message category, a server IP address, a storage location, and a message time.
For example, in a Kafka cluster consisting of server A, server B, and server C, the message categories received in a particular time period include two categories: fault record messages and debug record messages, with 23 fault record messages and 160 debug record messages. Kafka automatically selects the servers that store the fault record messages and the debug record messages according to the current states of server A, server B, and server C. For example, the fault record messages and the debug record messages are stored on server A and server B as follows: server A stores 6 fault record messages at the storage location "C:\temp\server_fault_2952.log" and 100 debug record messages at the storage location "C:\temp\server_debug_3623.log"; server B stores 17 fault record messages at the storage location "\min\server_fault_95.log" and 60 debug record messages at the storage location "C:\server_debug_532.log". The IP address of server A is 192.168.23.2, and one fault record message stored on it has the Offset: "number: 9562, log category: fault record message, server IP address: 192.168.23.2, storage location: C:\temp\server_fault_2952.log, message time: 2018-01-19 11:49:20".
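Purely as an illustration, one such Offset record could be represented in memory as the following Python dictionary; the field names are hypothetical and only mirror the example above.

```python
# Hypothetical in-memory representation of the Offset record from the example above.
offset_record = {
    "number": 9562,
    "log_category": "fault record message",
    "server_ip": "192.168.23.2",
    "storage_location": r"C:\temp\server_fault_2952.log",
    "message_time": "2018-01-19 11:49:20",
}
```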
It should be noted that Kafka adopts a decoupled design in which, unlike the original push-based publish-subscribe model, the producer pushes data to each Topic and the consumer pulls data from the Topic, which has the following advantages:
a) The load of the producer is decoupled from the load of the consumer.
b) The consumer obtains data according to its own needs, avoiding a large amount of unnecessary junk data being generated in the consumer cluster. Data is obtained through a fetch method, which provides an API (application programming interface) for obtaining the resource data together with a stronger and more flexible set of functions, so that a consumer can fetch data according to its own capability without being limited by the producer's server.
c) The consumer can customize the amount of data it consumes.
Understandably, these advantages enable Kafka to acquire and store, in real time, the log files generated by all real-time traffic.
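As a minimal sketch only, the snippet below shows how a consumer might subscribe to a log Topic and pull records using the third-party kafka-python client; the Topic name, broker address, and record format are assumptions made for illustration, not part of the claimed method.

```python
from kafka import KafkaConsumer  # third-party client: pip install kafka-python

# Assumed broker address and Topic name; adjust to the actual cluster.
consumer = KafkaConsumer(
    "server_logs",                      # Topic holding the traffic log records
    bootstrap_servers="192.168.23.2:9092",
    auto_offset_reset="earliest",       # start from the oldest retained record
    value_deserializer=lambda raw: raw.decode("utf-8", errors="replace"),
)

for record in consumer:
    # Each record carries the log line plus its partition offset, which can be
    # used later (step S23) to locate and window the data by time interval.
    print(record.offset, record.value)
```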
S23: screening and filtering the data of the log file by using the log analysis tool Logstash to obtain the target log file.
Specifically, Logstash is a lightweight log collection and processing framework characterized by conveniently collecting scattered and diversified logs. After Kafka stores the data of the cluster-end log files acquired in real time in a distributed manner at the preset user-defined storage locations, Logstash acquires the data of the log files according to application requirements and classifies and filters them to obtain the target log file. The specific implementation process is as follows:
a) Logstash acquires the Offsets within a preset time interval.
Due to the characteristics of Kafka described in step S22, Kafka can acquire and store the data of all log files in real time. For performance, Logstash presets a time interval when processing the data of these log files, and obtains the records of the relevant messages by acquiring all Offsets within the preset time interval.
For example, in one embodiment, the preset time interval for processing the data of the log file is 60 seconds; when starting to process the data of the log file, Logstash first acquires all Offsets within the 60 seconds preceding the current time.
b) Logstash acquires the corresponding log file according to the storage location recorded in the Offset.
As can be seen from the description of the Offset in step S22, each Offset includes the storage location of the message it describes, and the corresponding log file is acquired from that storage location.
For example, in one embodiment, the acquired Offset is specifically described as: "number: 9562, log category: fault record message, server IP address: 192.168.23.2, storage location: C:\temp\server_fault_2952.log, message time: 2018-01-25". It is readily understood that the recorded storage location is "C:\temp\server_fault_2952.log", and the corresponding log file is the "server_fault_2952.log" file under the "C:\temp\" directory on the server whose IP is "192.168.23.2".
c) Logstash classifies the record information in the log file.
After the log file is obtained, each piece of record information in the log file is classified.
Specifically, the log file contains at least one piece of record information. Each piece of record information contains, but is not limited to, the category, the number, and the specific event of the record, but these contents are not yet separated; the category of the record information is obtained by splitting the record.
For example, in one embodiment, a log file contains the record: "0009 [I] C:\Windows\system32\Macromed\Flash\activex.vch". Splitting this record with the Split function yields {"0009", "[I]", "C:\Windows\system32\Macromed\Flash\activex.vch"}, where "0009" is the number of the record, "[I]" is the category of the record, and "C:\Windows\system32\Macromed\Flash\activex.vch" is the specific event of the record.
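A minimal Python sketch of this splitting step is given below; the record format and whitespace delimiter are assumptions based on the example above.

```python
def split_record(raw_record: str):
    """Split one raw log record into (number, category, event).

    Assumes a whitespace-delimited record such as
    "0009 [I] C:\\Windows\\system32\\Macromed\\Flash\\activex.vch".
    """
    number, category, event = raw_record.split(maxsplit=2)
    return number, category, event

number, category, event = split_record(
    r"0009 [I] C:\Windows\system32\Macromed\Flash\activex.vch"
)
# number == "0009", category == "[I]", event is the specific event (here, a file path)
```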
d) Logstash acquires, from the record information of the log file, the records whose category matches the preset matching symbol, acquires the server IP address and the message time corresponding to each such record, and takes the record information, the server IP address, and the message time as the target log file.
Specifically, Logstash searches the record information of the log file for the category corresponding to the preset matching symbol, acquires the records whose category matches the preset matching symbol, and acquires, from the log file containing each such record, the server IP address and the message time corresponding to the record.
For example, the category of the record information may include at least one of "[D]", "[I]", "[W]", "[E]", and "[F]", which respectively correspond to "Debug", "Info", "Warn", "Error", and "Fatal". When the preset matching symbol is one or more of "Debug", "Info", "Warn", "Error", and "Fatal", the records whose categories match the matching symbol are obtained as the corresponding record information.
It should be noted that the types of the preset matching symbol and the record information can be set according to the needs of practical applications, and are not limited herein.
As a preferred mode, in this embodiment, the preset matching symbols and the categories of the record information may be defined according to the data type of the real-time traffic or the type of the accessed port.
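Continuing the sketch above, filtering the split records against a preset set of matchers might look as follows; the category names, record structure, and sample records are illustrative assumptions only.

```python
# Category markers and their names, per the example above (illustrative).
CATEGORY_NAMES = {"[D]": "Debug", "[I]": "Info", "[W]": "Warn", "[E]": "Error", "[F]": "Fatal"}

def filter_records(raw_records, matchers, server_ip, message_time):
    """Keep only records whose category matches the preset matchers, and attach
    the server IP address and message time to form target log entries."""
    target = []
    for raw in raw_records:
        number, category, event = raw.split(maxsplit=2)   # same split as above
        if CATEGORY_NAMES.get(category) in matchers:
            target.append({
                "number": number,
                "category": CATEGORY_NAMES[category],
                "event": event,
                "server_ip": server_ip,
                "message_time": message_time,
            })
    return target

# Example: keep only Error and Fatal records from one log file.
target_log = filter_records(
    [r"0009 [I] C:\Windows\system32\Macromed\Flash\activex.vch",
     r"0010 [E] connection reset by peer on port 8080"],
    matchers={"Error", "Fatal"},
    server_ip="192.168.23.2",
    message_time="2018-01-19 11:49:20",
)
```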
In this embodiment, the distributed publish-subscribe message system Kafka and the log analysis tool Logstash are deployed, the log file corresponding to the traffic information is obtained in real time through Kafka, and the data of the log file is then screened and filtered with Logstash to obtain the target log file. The required target log file is thus obtained effectively, the interference of excessive redundant data with the subsequent anomaly judgment is avoided, the amount of calculation is reduced, and the efficiency of the subsequent anomaly judgment is improved.
Based on the embodiment corresponding to Fig. 2, a specific implementation of step S30, performing the feature analysis of abnormal attacks on the data in the target log file to obtain the feature detection result, is described in detail below through a specific embodiment.
Referring to Fig. 4, Fig. 4 shows a specific implementation flow of step S30 provided in the embodiment of the present invention, which is detailed as follows:
S31: for each target port corresponding to the traffic information, acquiring, from the target log file, the basic data packet information of each source IP accessing the target port within a preset time interval.
Specifically, the cluster server includes a plurality of node servers, and each node server includes one or more ports. For each target port corresponding to the traffic information, the basic data packet information of each source IP accessing the target port within the preset time interval is acquired from the target log file.
The basic data packet information refers to information on the data traffic with which a source IP accesses the target port.
S32: summarizing the basic data packet information to obtain summarized data packet information of accesses to the target port within the preset time interval.
Specifically, the basic data packet information of each source IP is summarized to obtain the summarized data packet information of accesses to the target port within the preset time interval.
S33: for each source IP, calculating the ratio of the basic data packet information corresponding to the source IP to the summarized data packet information to obtain the frequency with which the source IP accesses the target port per unit time, and determining the information entropy within the preset time interval according to the frequencies.
Specifically, for each source IP, the ratio of the basic data packet information corresponding to the source IP to the summarized data packet information is calculated to obtain the frequency with which the source IP accesses the target port per unit time, and the information entropy within the preset time interval is determined according to the frequencies.
Information entropy is a quantitative measure of the uncertainty of information: which symbol a source will send is uncertain, and this uncertainty can be measured according to the probability of occurrence. If the probability of a symbol is large and it occurs often, its uncertainty is small; otherwise, its uncertainty is large.
In this embodiment, the distribution of the numbers of accesses by the source IPs to the target port within the preset time range is measured by the information entropy.
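A minimal sketch of steps S31 to S33 is given below, under the assumptions that the per-source-IP "basic data packet information" is reduced to a packet count and that the Shannon entropy formula is used; the patent does not fix the exact formula, so this is illustrative only.

```python
import math
from collections import Counter

def window_entropy(access_log):
    """access_log: iterable of (source_ip, target_port) events observed
    within one preset time interval for a single target port.

    Returns the information entropy of the per-source-IP access frequencies.
    """
    per_ip = Counter(src for src, _port in access_log)     # basic packet info (counts)
    total = sum(per_ip.values())                            # summarized packet info
    entropy = 0.0
    for count in per_ip.values():
        p = count / total                                   # frequency of this source IP
        entropy -= p * math.log2(p)
    return entropy

# Example: three IPs share the traffic fairly evenly -> relatively high entropy;
# a single IP dominating the window would drive the entropy toward zero.
events = [("10.0.0.1", 80)] * 4 + [("10.0.0.2", 80)] * 3 + [("10.0.0.3", 80)] * 3
print(round(window_entropy(events), 3))
```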
S34: storing the information entropies into a sliding window of length M, wherein each information entropy occupies 1 unit of the length of the sliding window.
Specifically, the information entropies are stored into a sliding window with the length of M, wherein each information entropy accounts for 1 unit of the length of the sliding window.
S35: judging, based on the information entropy and the sliding window, whether the source IP corresponding to the information entropy has abnormal traffic, and determining the feature detection result according to the judgment result.
Specifically, whether a source IP corresponding to the information entropy has a traffic anomaly is determined according to the information entropy and the sliding window, and a feature detection result is determined according to the determination result, and the specific process may refer to the description of steps S351 to S353, or refer to the description of steps S354 to S355, and is not described herein again to avoid repetition.
In this embodiment, for each target port corresponding to the traffic information, the basic data packet information of each source IP accessing the target port within the preset time interval is obtained from the target log file and summarized to obtain the summarized data packet information of accesses to the target port within that interval. Then, for each source IP, the ratio of its basic data packet information to the summarized data packet information is calculated to obtain the frequency with which the source IP accesses the target port per unit time, the information entropy within the preset time interval is determined according to the frequencies, and the information entropy is stored into a sliding window of length M, in which each information entropy occupies 1 unit of the length of the window. Finally, based on the information entropy and the sliding window, whether the source IP corresponding to the information entropy has abnormal traffic is judged, and the feature detection result is determined according to the judgment result. In this way, traffic abnormality is monitored rapidly for each IP, and when an abnormality is detected, the source IP causing it is obtained quickly for further processing, which is beneficial to improving the efficiency of network security monitoring.
Based on the embodiment corresponding to Fig. 4, a specific implementation of step S35, judging, based on the information entropy and the sliding window, whether the source IP corresponding to the information entropy has a traffic anomaly and determining the feature detection result according to the judgment result, is described in detail below.
Referring to Fig. 5, Fig. 5 shows a specific implementation flow of step S35 provided in the embodiment of the present invention, which is detailed as follows:
S351: when the number of information entropies stored in the sliding window reaches M, calculating the confidence value N and the average value V of the M information entropies, and determining a confidence interval [N-V, N+V] according to the confidence value and the average value.
Specifically, when the number of information entropies stored in the sliding window reaches M, the confidence value N and the average value V of the M information entropies are calculated, and the confidence interval [N-V, N+V] is determined according to the confidence value and the average value.
The confidence value and the average value of the information entropies can be calculated according to their respective formulas, which are not described in detail here.
S352: judging whether the information entropy of the source IP within the preset time interval falls within the confidence interval [N-V, N+V].
Specifically, after the confidence interval is determined, the information entropy of each source IP within the preset time interval is counted, and whether the information entropy of the source IP within the preset time interval falls within the confidence interval [N-V, N+V] is judged.
S353: if the information entropy of the source IP within the preset time interval falls within the confidence interval [N-V, N+V], determining that the feature detection result is that at least one abnormal attack feature exists.
Specifically, when the information entropy of the source IP within the preset time interval falls within the confidence interval [N-V, N+V], the access traffic of the source IP is determined to be abnormal, and at this point the feature detection result is determined to be that at least one abnormal attack feature exists.
In this embodiment, whether the traffic of the source IP within the preset time interval is normal is determined through the information entropy and the confidence interval, and whether an abnormal attack feature exists is then determined, so that anomaly detection is performed rapidly and its efficiency is improved.
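The following sketch illustrates steps S34 and S351 to S353. The text leaves the exact formulas for the confidence value N and the value V open; purely as an assumption for illustration, N is taken here as the mean of the M windowed entropies and V as their sample standard deviation, and the in-interval test of S353 is applied as written.

```python
import statistics
from collections import deque

class EntropyWindow:
    """Sliding window of the last M per-interval information entropies (step S34)."""

    def __init__(self, m: int):
        self.m = m
        self.window = deque(maxlen=m)

    def push(self, entropy: float) -> None:
        self.window.append(entropy)

    def is_abnormal(self, entropy: float):
        """Steps S351-S353: once M entropies are stored, build the confidence
        interval [N - V, N + V] and flag an entropy that falls inside it.
        N and V are assumed here to be the window mean and standard deviation."""
        if len(self.window) < self.m:
            return None                       # not enough history yet
        n = statistics.mean(self.window)      # assumed confidence value N
        v = statistics.stdev(self.window)     # assumed value V
        return (n - v) <= entropy <= (n + v)  # in-interval test, as stated in S353

# Usage: push one entropy per preset time interval, then test the newest value.
win = EntropyWindow(m=5)
for h in [3.1, 3.0, 3.2, 2.9, 3.1]:
    win.push(h)
print(win.is_abnormal(3.05))
```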
In an embodiment, step S35 further includes monitoring for abnormal results by using the stream computing framework Flink.
Referring to Fig. 6, Fig. 6 shows a specific implementation flow for monitoring abnormal results by using the stream computing framework Flink according to an embodiment of the present invention, which is detailed as follows:
S354: setting a state management point State for the information entropy and the sliding window in real time by adopting the stream computing framework Flink.
Specifically, in this embodiment the data traffic is continuously generated data. For a scene in which the latest data is produced continuously, processing it as streaming data can effectively increase the processing speed. The main characteristics of streaming data are that the data arrives in real time, the arrival order is independent and not controlled by the application system, and the data size is unknown in advance; stream processing provides strong analysis and processing capabilities for such data.
The stream computing framework Flink is an open-source stream processing framework developed by the Apache Software Foundation. Its core is a distributed streaming dataflow engine written in Java and Scala. Flink executes arbitrary dataflow programs in a data-parallel and pipelined manner, its pipelined runtime system can execute both batch and stream processing programs, and Flink provides checkpoints, savepoints, and a fault-tolerance mechanism.
In this embodiment, by means of the State, the information entropy calculated in real time and the data stored in the sliding window can be saved and distributed to the concurrency framework for concurrent calculation.
S355: reading the information entropy and the sliding window corresponding to the state management point State according to a preset time interval, judging, based on the information entropy and the sliding window, whether the source IP corresponding to the information entropy has abnormal traffic by adopting a preset judgment mode, and determining the feature detection result according to the judgment result.
Specifically, according to the preset time interval, the information entropy and the sliding window corresponding to the state management point State are read, whether the source IP corresponding to the information entropy has abnormal traffic is judged by adopting a preset judgment mode based on the information entropy and the sliding window, and the feature detection result is determined according to the judgment result.
It should be noted that the preset judgment mode may specifically be a newly added traffic anomaly judgment algorithm chosen according to the actual situation. For example, the preset judgment mode may determine whether an abnormal attack exists by judging whether the information entropy falls into the confidence interval; a newly added traffic anomaly judgment algorithm may determine whether an abnormal attack exists by calculating a traffic threshold per unit time, or determine whether the current traffic is abnormal by using a neural network model; or the preset judgment mode may be a traffic anomaly detection algorithm obtained by adjusting the judgment mode provided in steps S351 to S353 according to the actual situation. In this embodiment, the state management (State) characteristic of the stream computing framework Flink itself is used to temporarily store, in real time, the intermediate results of the operation (the information entropy and the information stored in the sliding window), and the preset judgment mode is then executed for judgment. There is no required order between the judgment mode provided in steps S351 to S353 and the preset judgment mode: only the judgment mode of steps S351 to S353 may be executed, only the preset judgment mode may be executed, or both may be executed simultaneously, which is not specifically limited here.
It should be understood that the preset judgment mode described here is only one specific example of this embodiment. In actual production, two or more preset judgment modes may be provided, one or more of them may be executed, and when multiple preset judgment modes are executed, they are executed concurrently.
It should be noted that the preset judgment mode is executed after the preset time interval elapses, so that the judgment mode can be updated in real time according to actual needs without affecting the normal operation of traffic monitoring.
In this embodiment, the stream computing framework Flink is adopted to store the state of the information entropy and the sliding window in real time, and anomaly detection is performed, according to the preset time interval, on the information entropy and the sliding window stored within that period to obtain the detection result. By combining the stream computing framework Flink with the state management point State and setting the time interval according to actual requirements for judging traffic anomalies, the network congestion easily caused during traffic peaks is prevented, and the efficiency and timeliness of real-time detection are improved.
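The sketch below illustrates the idea of steps S354 and S355 in plain Python rather than Flink's actual API: intermediate results (the latest entropy and the sliding window) are kept in a keyed state store, and every preset interval one or more pluggable judgment modes are run over that state. The names, structure, and the threshold-based judgment mode are assumptions made for illustration only.

```python
import time
from collections import deque

# Keyed state store, analogous in spirit to per-key state in a stream job:
# key = (source_ip, target_port), value = {"entropy": float, "window": deque}.
state = {}

def update_state(key, entropy, m=5):
    """Conceptual counterpart of S354: persist the latest entropy and window per key."""
    entry = state.setdefault(key, {"entropy": None, "window": deque(maxlen=m)})
    entry["entropy"] = entropy
    entry["window"].append(entropy)

def threshold_mode(entropy, window, limit=2.0):
    # One hypothetical "preset judgment mode": flag very low entropy outright.
    return entropy is not None and entropy < limit

def evaluate_all(judgment_modes, interval_seconds=60):
    """Conceptual counterpart of S355: every preset interval, read the stored state
    and run the configured judgment modes; modes can be swapped without touching
    the ingestion path."""
    while True:
        time.sleep(interval_seconds)
        for key, entry in state.items():
            for mode in judgment_modes:
                if mode(entry["entropy"], entry["window"]):
                    print(f"anomaly suspected for {key} by {mode.__name__}")
```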
Based on the embodiment corresponding to Fig. 6, a specific implementation of judging, by the preset judgment mode and based on the information entropy and the sliding window, whether the source IP corresponding to the information entropy has a traffic anomaly, and determining the feature detection result according to the judgment result, is explained below through a specific embodiment.
Referring to Fig. 7, Fig. 7 shows a specific implementation flow of judging, by the preset judgment mode and based on the information entropy and the sliding window, whether the source IP corresponding to the information entropy has a traffic anomaly and determining the feature detection result according to the judgment result, which is detailed as follows:
S3551: determining a traffic sequence for the preset time interval according to the acquired information entropy and the sliding window.
Specifically, the traffic information corresponding to each information entropy in the sliding window is acquired in sequence to obtain the traffic sequence for the preset time interval.
S3552: performing a wavelet transform on the traffic sequence to obtain the wavelet coefficients corresponding to the traffic sequence, and determining the target Hurst exponent corresponding to the traffic sequence according to the wavelet coefficients.
Specifically, normal traffic in the network exhibits self-similarity; that is, the sets of wavelet coefficients corresponding to multiple traffic sequences obtained by sampling traffic over the same duration and the same time slot are similar, and correspondingly, the Hurst exponents calculated from these sets of wavelet coefficients are also similar. Conversely, if abnormal traffic exists in the network within a certain preset duration, it crowds out the normal traffic and reduces the similarity of the traffic forwarded by the network device, so that the set of wavelet coefficients corresponding to the traffic sequence sampled at the preset time slot within that duration changes significantly compared with the set of wavelet coefficients corresponding to a normal traffic sequence sampled over the same duration and time slot, and correspondingly, the Hurst exponent also changes significantly.
The Hurst exponent is an exponent constructed based on rescaled range (R/S) analysis and is used as an index for judging whether time series data follow a random walk or a biased random walk process.
S3553: if the absolute value of the difference between the target Hurst exponent and a preset Hurst exponent is greater than a preset threshold, determining that the feature detection result is that abnormal traffic exists.
Specifically, when the absolute value of the difference between the target Hurst exponent and the preset Hurst exponent is greater than the preset threshold, it is determined that the feature detection result indicates that abnormal traffic exists.
The preset Hurst exponent is calculated from a normal traffic sequence.
The preset threshold value can be set according to actual requirements, and is usually between 0.1 and 0.4.
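A rough sketch of a wavelet-based Hurst estimate for steps S3551 to S3553 is given below. The text does not specify the estimator, so this uses a common variance-of-detail-coefficients regression via the PyWavelets package; the wavelet choice, the scaling relation, and the threshold value are assumptions.

```python
import numpy as np
import pywt  # PyWavelets: pip install PyWavelets

def hurst_from_wavelets(traffic_sequence, wavelet="db2"):
    """Estimate the Hurst exponent of a traffic sequence from its wavelet detail
    coefficients: log2(Var(d_j)) is regressed against the octave j, and for a
    stationary (fGn-like) sequence the slope is roughly 2H - 1."""
    coeffs = pywt.wavedec(np.asarray(traffic_sequence, dtype=float), wavelet)
    details = list(reversed(coeffs[1:]))   # reorder so j=1 is the finest level
    octaves, log_vars = [], []
    for j, d in enumerate(details, start=1):
        if len(d) > 1:                     # need at least two coefficients for a variance
            octaves.append(j)
            log_vars.append(np.log2(np.var(d)))
    slope, _ = np.polyfit(octaves, log_vars, 1)
    return (slope + 1.0) / 2.0

def traffic_is_abnormal(traffic_sequence, preset_hurst, threshold=0.2):
    """Step S3553: compare the target Hurst exponent with the preset one."""
    return abs(hurst_from_wavelets(traffic_sequence) - preset_hurst) > threshold
```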
In this embodiment, the traffic sequence for the preset time interval is determined according to the obtained information entropy and the sliding window, and the target Hurst exponent of the traffic sequence is then calculated and compared with the preset Hurst exponent to determine whether an anomaly exists, so that the traffic is analyzed rapidly and the efficiency of anomaly detection is improved.
In an embodiment, after step S40, the method for monitoring network security based on big data further includes:
and performing data analysis on the target log file and the abnormal attack characteristics, and constructing a visual chart, wherein the visual chart comprises at least one of a trend chart, a frequency chart, a proportion chart or a data table.
Specifically, after an anomaly is detected, a source IP and a target port corresponding to the abnormal traffic are obtained, and the traffic data of the source IP and the traffic data of the target port are visually displayed, so that managers can quickly and intuitively look up the anomaly.
The generated visualization chart specifically includes, but is not limited to: a trend chart, a frequency chart, a proportion chart, or a data table, which may be set according to actual requirements and are not limited here.
It should be noted that, after the visualization chart is generated, it can be sent to an interface of the monitoring end, and it can also be sent to a third-party communication platform as early warning prompt information.
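Only as an illustration of the visualization step, the sketch below plots a per-interval traffic trend for the offending source IP with matplotlib; the data, IP, port, and file name are made up for the example.

```python
import matplotlib.pyplot as plt

# Hypothetical per-interval packet counts for the source IP flagged as abnormal.
intervals = list(range(1, 11))                     # preset time intervals (e.g. minutes)
packets = [120, 118, 130, 125, 122, 640, 880, 910, 870, 860]

plt.plot(intervals, packets, marker="o")
plt.xlabel("time interval")
plt.ylabel("packets from 10.0.0.5 to port 80")
plt.title("Traffic trend of suspected abnormal source IP")
plt.savefig("trend_chart.png")                     # chart pushed to the monitoring interface
```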
In this embodiment, the visualization chart is generated by performing data analysis on the target log file and the abnormal attack features, so that administrators can consult it quickly.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
Fig. 8 is a schematic block diagram of a big data based network security monitoring apparatus in one-to-one correspondence with the big data based network security monitoring method of the above embodiment. As shown in Fig. 8, the big data based network security monitoring apparatus includes a traffic analysis module 10, a log screening module 20, an anomaly analysis module 30, and an anomaly early warning module 40. The functional modules are explained in detail as follows:
the traffic analysis module 10 is configured to monitor traffic information in real time, and to analyze the traffic information by deep packet inspection to obtain a log file corresponding to the traffic information;
the log screening module 20 is configured to obtain the log file corresponding to the traffic information through the log analysis framework, and to screen and filter the log file to obtain a target log file;
the anomaly analysis module 30 is configured to perform feature analysis of abnormal attacks on the data in the target log file to obtain a feature detection result;
and the anomaly early warning module 40 is configured to execute corresponding early warning measures if the feature detection result indicates that at least one abnormal attack feature exists.
Further, the log screening module 20 includes:
the system comprises a deployment unit, a service unit and a service unit, wherein the deployment unit is used for deploying a distributed publish-subscribe message system Kafka and a log analysis tool Logstash;
the log collection unit is used for acquiring a log file corresponding to the flow information in real time through a distributed publish-subscribe message system Kafka;
and the log analysis unit is used for screening and filtering the data of the log file by using a log analysis tool Logstash to obtain a target log file.
Further, the anomaly analysis module 30 includes:
a basic data packet obtaining unit, configured to obtain, from the target log file and for each target port corresponding to the traffic information, the basic data packet information of each source IP accessing the target port within a preset time interval;
a summarized data packet determining unit, configured to summarize the basic data packet information to obtain the summarized data packet information of accesses to the target port within the preset time interval;
an information entropy determining unit, configured to calculate, for each source IP, the ratio of the basic data packet information corresponding to the source IP to the summarized data packet information, to obtain the frequency with which the source IP accesses the target port per unit time, and to determine the information entropy within the preset time interval according to the frequencies;
an information entropy storage unit, configured to store the information entropies into a sliding window of length M, wherein each information entropy occupies 1 unit of the length of the sliding window;
and a result judging unit, configured to judge, based on the information entropy and the sliding window, whether the source IP corresponding to the information entropy has abnormal traffic, and to determine the feature detection result according to the judgment result.
Further, the result judging unit includes:
a confidence interval determining subunit, configured to calculate the confidence value N and the average value V of the M information entropies when the number of information entropies stored in the sliding window reaches M, and to determine a confidence interval [N-V, N+V] according to the confidence value and the average value;
a range comparison subunit, configured to judge whether the information entropy of the source IP within the preset time interval falls within the confidence interval [N-V, N+V];
and an abnormality determining subunit, configured to determine that the feature detection result is that at least one abnormal attack feature exists if the information entropy of the source IP within the preset time interval falls within the confidence interval [N-V, N+V].
Further, the result judging unit further includes:
an information storage subunit, configured to set a state management point State for the information entropy and the sliding window in real time by adopting the stream computing framework Flink;
and a preset judgment subunit, configured to read the information entropy and the sliding window corresponding to the state management point State according to a preset time interval, to judge, based on the information entropy and the sliding window, whether the source IP corresponding to the information entropy has abnormal traffic by adopting a preset judgment mode, and to determine the feature detection result according to the judgment result.
Further, the preset judgment subunit includes:
a sequence acquisition element, configured to determine a traffic sequence for the preset time interval according to the acquired information entropy and the sliding window;
an exponent calculation element, configured to perform a wavelet transform on the traffic sequence to obtain the wavelet coefficients corresponding to the traffic sequence, and to determine the target Hurst exponent corresponding to the traffic sequence according to the wavelet coefficients;
and an abnormality judgment element, configured to determine that the feature detection result is that abnormal traffic exists if the absolute value of the difference between the target Hurst exponent and the preset Hurst exponent is greater than a preset threshold.
The big data based network security monitoring apparatus further includes:
a real-time updating unit, configured to update the preset judgment mode according to an update instruction if an update instruction for the preset judgment mode is received.
For specific limitations of the big data based network security monitoring apparatus, reference may be made to the limitations of the big data based network security monitoring method above, which are not repeated here. All or part of the modules in the big data based network security monitoring apparatus may be implemented through software, hardware, or a combination thereof. The modules may be embedded in or independent of a processor in the computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can invoke and execute the operations corresponding to the modules.
Fig. 9 is a schematic diagram of a computer device provided by an embodiment of the invention. The computer device may be a server, and its internal structure may be as shown in Fig. 9. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The database of the computer device is used to store the log analysis framework and the preset judgment mode. The network interface of the computer device is used to communicate with external terminals through a network connection. The computer program, when executed by the processor, implements the big data based network security monitoring method.
In one embodiment, a computer device is provided, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor. When executing the computer program, the processor implements the steps of the big data based network security monitoring method of the above embodiments, such as steps S10 to S40 shown in Fig. 2. Alternatively, when executing the computer program, the processor implements the functions of each module/unit of the big data based network security monitoring apparatus of the above embodiments, such as the functions of modules 10 to 40 shown in Fig. 8. To avoid repetition, further description is omitted here.
It should be clear to those skilled in the art that, for convenience and simplicity of description, the foregoing division of the functional units and modules is only used for illustration, and in practical applications, the above function distribution may be performed by different functional units and modules as needed, that is, the internal structure of the apparatus may be divided into different functional units or modules to perform all or part of the above described functions.
In an embodiment, a computer-readable storage medium is provided, where a computer program is stored on the computer-readable storage medium, and the computer program is executed by a processor to implement the steps of the big data based network security monitoring method according to the above embodiment, or the computer program is executed by the processor to implement the functions of each module/unit in the big data based network security monitoring apparatus according to the above embodiment. To avoid repetition, further description is omitted here.
It is to be understood that the computer-readable storage medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, and the like.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present invention, and are intended to be included within the scope of the present invention.