Detailed Description
It should be noted that the embodiments of the present application and the features of those embodiments may be combined with each other provided they do not conflict. The present invention is described in detail below with reference to the attached drawings and in conjunction with the embodiments.
To help those skilled in the art better understand the technical solutions of the present application, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments derived by a person skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used to distinguish between similar elements and not necessarily to describe a particular sequential or chronological order. It should be understood that terms so used may be interchanged under appropriate circumstances so that the embodiments of the application described herein can be implemented. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, system, article, or apparatus.
An embodiment of the invention provides a traffic data processing method.
Fig. 1 is a flowchart of a traffic data processing method according to an embodiment of the present invention. As shown in Fig. 1, the method includes the following steps:
Step S102: when the traffic of a target object is detected to be abnormal based on a weekly baseline, detecting the traffic of the target object again using extended period baselines to obtain a re-detection result, wherein the traffic period corresponding to an extended period baseline is longer than one week;
Step S104: determining whether the re-detection result is abnormal;
Step S106: if the re-detection result is abnormal, outputting a first message, wherein the first message indicates that the traffic of the target object is abnormal, and wherein the re-detection result is not abnormal if the traffic of the target object successfully matches any one of the extended period baselines.
In this embodiment, when the traffic of the target object is detected to be abnormal based on the weekly baseline, the traffic is detected again using the extended period baselines, whose corresponding traffic period is longer than one week, to obtain a re-detection result. Whether the re-detection result is abnormal is then determined. If it is abnormal, a first message is output to indicate that the traffic of the target object is abnormal; if the traffic successfully matches any one of the extended period baselines, the re-detection result is not abnormal. This solves the problem of misjudgment caused by an unreasonably determined traffic period and improves the accuracy of abnormal-traffic determination.
In the embodiment of the invention, traffic that is found to be abnormal during detection based on the weekly baseline may in fact be traffic with a longer period, such as traffic that occurs once a month or once a quarter. Traffic whose period is longer than one week can therefore be checked again against the extended period baselines, and an anomaly prompt is issued only if the anomaly persists after that check. Adding the extended period baseline detection step reduces false alarms and allows long-period traffic to be identified among the traffic flagged as abnormal.
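The two-stage check in steps S102 to S106 can be sketched as follows. This is only an illustration under stated assumptions: each baseline is modeled as a hypothetical predicate over a traffic record, and the names `process_traffic`, `weekly`, `monthly_backup`, and the message text are not taken from the embodiment.

```python
from typing import Callable, Iterable, Optional

# Hypothetical model: a baseline is a predicate that returns True when the
# observed traffic matches it, i.e. looks normal against that baseline.
Baseline = Callable[[dict], bool]

FIRST_MESSAGE = "traffic of the target object is abnormal"  # illustrative text


def process_traffic(traffic: dict,
                    weekly_baseline: Baseline,
                    extended_baselines: Iterable[Baseline]) -> Optional[str]:
    """Return the first message if the traffic is abnormal, otherwise None."""
    # Primary detection against the weekly baseline.
    if weekly_baseline(traffic):
        return None
    # S102: re-detect against the extended period baselines.
    # S104: the re-detection result is not abnormal if any one of them matches.
    if any(baseline(traffic) for baseline in extended_baselines):
        return None
    # S106: still abnormal after re-detection, so output the first message.
    return FIRST_MESSAGE


# Illustrative baselines (assumed values): ordinary traffic stays under 1000
# bytes, and a job on the first day of each month may send up to 50000 bytes.
def weekly(t: dict) -> bool:
    return t["bytes"] < 1000


def monthly_backup(t: dict) -> bool:
    return t["day_of_month"] == 1 and t["bytes"] < 50000


print(process_traffic({"bytes": 20000, "day_of_month": 1}, weekly, [monthly_backup]))
print(process_traffic({"bytes": 20000, "day_of_month": 15}, weekly, [monthly_backup]))
```

In this sketch the first call returns `None`, because the large transfer on day 1 matches the monthly baseline, while the second call returns the first message, because the same transfer on day 15 matches no baseline.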
Optionally, if the re-detection result is not abnormal, the extended period baseline is updated based on the traffic of the target object.
If the traffic flagged as abnormal by the weekly baseline successfully matches an extended period baseline, the traffic is not abnormal, and the extended period baseline is updated with the current traffic data so that the baseline data becomes more accurate.
Optionally, before the traffic of the target object is detected again using the extended period baselines to obtain the re-detection result, it is determined whether an extended period baseline exists; if not, a second message is output, the second message indicating that no extended period baseline exists.
If no extended period baseline has been added, a prompt message may be issued to notify the user.
Optionally, after the second message is output, a prompt message is sent asking whether to add an extended period baseline; after an add instruction is received, an extended period baseline is added based on the add instruction.
That is, after the user has been told that no extended period baseline exists, the user is further asked whether to add one, and if the user issues an add instruction, the extended period baseline is added based on that instruction.
Optionally, after an instruction not to add is received, the currently detected traffic of the target object is determined to be abnormal traffic, and a traffic anomaly prompt is sent.
If the user does not add an extended period baseline, the weekly baseline detection result can be adopted directly as the current traffic detection result, and a traffic anomaly prompt is sent to notify the user that an anomaly exists.
In addition, if the weekly baseline detection result shows that the current traffic is not abnormal, the weekly baseline may be updated with the current traffic data to make the weekly baseline more accurate.
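The optional handling for a missing extended period baseline might look like the sketch below. The function name, the message texts, and the use of a callback `wants_to_add` to stand in for the user's add or not-add instruction are all assumptions made for illustration; seeding the new baseline directly from the current traffic record is likewise a simplification.

```python
from typing import Callable, List

SECOND_MESSAGE = "no extended period baseline exists"  # illustrative text
ANOMALY_PROMPT = "traffic anomaly detected"            # illustrative text


def handle_missing_extended_baseline(traffic: dict,
                                     extended_baselines: List[dict],
                                     wants_to_add: Callable[[], bool]) -> List[str]:
    """Return the messages shown to the user after a weekly-baseline anomaly.

    An empty list means extended period baselines exist, so the ordinary
    re-detection would run instead of this fallback path.
    """
    messages: List[str] = []
    if extended_baselines:
        return messages
    # No extended period baseline: output the second message.
    messages.append(SECOND_MESSAGE)
    # Prompt whether to add one; the callback models the user's instruction.
    if wants_to_add():
        # Add instruction: create a baseline from the current traffic.
        extended_baselines.append(dict(traffic))
    else:
        # Not-add instruction: keep the weekly result and report the anomaly.
        messages.append(ANOMALY_PROMPT)
    return messages


baselines: List[dict] = []
print(handle_missing_extended_baseline({"port": 443}, baselines, lambda: True))
print(baselines)
```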
An embodiment of the present invention also provides a preferred embodiment, on the basis of which the present application is described below.
This embodiment provides a baseline construction technique for an arbitrary period (less than one year), together with a traffic detection technique and system design based on such a baseline. An administrator can set an arbitrary period of less than one year for a specific kind of traffic; the system automatically learns and adjusts the baseline according to this period and performs abnormal-traffic detection according to it.
The embodiment mainly comprises an extended period baseline data structure, an extended period baseline management module, and an extended traffic anomaly detection process.
1. Extended period baseline data structure
A conventional weekly baseline, whose cycle is one week, takes a certain point in time within the week as its starting point. The time offset between the current time and the starting point is calculated, and the corresponding baseline traffic characteristics are looked up using this offset. The actual traffic characteristics are then compared with the baseline characteristics to determine whether the traffic is normal or abnormal. The simplified data structure of the weekly baseline is shown in the following table.
Table 1 Simplified data structure of the weekly baseline
The specific contents of TimeStamp and BaseLineData (baseline data) are not relevant to the present invention and are not defined in detail here. TimeStamp may be a relative time from the beginning of a cycle, in minutes, hours, or days, or it may be a time period defined by a start time and an end time. BaseLineData should include, but is not limited to, traffic identification data such as the IP addresses, ports, and services of the traffic, traffic characteristic data, traffic statistics data, and the like.
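The weekly lookup described above (compute the offset from the cycle start, find the baseline entry, compare) can be illustrated with a minimal sketch. The hourly slot granularity, the byte-count statistic, and the 50% tolerance are assumptions chosen for the example and are not specified by the embodiment.

```python
from datetime import datetime, timedelta
from typing import Dict

WEEK = timedelta(weeks=1)
SLOT = timedelta(hours=1)  # assumed TimeStamp granularity


def week_offset_slot(now: datetime, cycle_start: datetime) -> int:
    """Index of the hourly slot within the weekly cycle that `now` falls into."""
    offset = (now - cycle_start) % WEEK  # time offset within the current week
    return offset // SLOT


def is_normal(observed_bytes: float, baseline: Dict[int, float],
              slot: int, tolerance: float = 0.5) -> bool:
    """Compare observed traffic with the baseline value stored for the slot."""
    expected = baseline.get(slot)
    if expected is None:
        return False  # no baseline entry for this slot
    return abs(observed_bytes - expected) <= tolerance * expected


cycle_start = datetime(2024, 1, 1)  # a Monday at 00:00, chosen as the start
slot = week_offset_slot(datetime(2024, 1, 17, 9, 30), cycle_start)
print(slot)                                 # 57 (Wednesday, 09:00 to 10:00)
print(is_normal(1100, {57: 1000.0}, slot))  # True
print(is_normal(5000, {57: 1000.0}, slot))  # False
```

Because the offset is taken modulo one week, an event that recurs only monthly or quarterly has no stable slot in this table, which is the limitation the extended period baseline addresses.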
Because the TimeStamp in the weekly baseline table is recorded relative to the starting point of each weekly cycle, repeated events with a period longer than one week cannot be represented in that table. This embodiment therefore proposes an extended period baseline data structure that stores baseline data for any period of less than one year, as shown in the following table.
Table 2 Data structure of the extended period baseline
| TimeStamp | Pattern | BaseLineData |
| T1 | P1 | Data1 |
| T2 | P2 | Data2 |
| T3 | P3 | Data3 |
Each item in the data structure is defined as follows:
TimeStamp: the timestamp. Unlike the definition above, it records the time elapsed since the beginning of the first day of each year, from which the month, week, day of the week, hour, minute, and so on of the event time can be determined.
BaseLineData: baseline data, as defined above.
Pattern: the repetition pattern.
Pattern is a new field in the extended period baseline data structure that records the repetition pattern of the traffic corresponding to the current baseline. The values of Pattern include but are not limited to:
repeating on a certain day every two weeks,
repeating on a certain day of each month,
repeating on a working day at the beginning of each month,
repeating on a working day at the end of each month,
repeating every year on a calendar day.
The network traffic analysis system may add or delete supported repeating patterns according to the service operation characteristics of its target customers.
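One possible in-memory form of Table 2, with a check of whether a baseline is valid on a given day, is sketched below. The class and function names are hypothetical. The sketch also makes several simplifying assumptions: working days are Monday to Friday with no public holidays, the month-start and month-end patterns are read as the first and last working day, and the two-week pattern is anchored anew in each calendar year.

```python
from dataclasses import dataclass
from datetime import date, datetime, timedelta
from enum import Enum


class Pattern(Enum):
    BIWEEKLY_DAY = "a certain day every two weeks"
    MONTHLY_DAY = "a certain day of each month"
    FIRST_WORKDAY_OF_MONTH = "working day at the beginning of each month"
    LAST_WORKDAY_OF_MONTH = "working day at the end of each month"
    YEARLY_DATE = "calendar day each year"


@dataclass
class ExtendedBaseline:
    timestamp: timedelta  # TimeStamp: offset from 00:00 on the first day of the year
    pattern: Pattern      # Pattern: repetition pattern
    data: dict            # BaseLineData, left opaque here


def first_workday(year: int, month: int) -> date:
    d = date(year, month, 1)
    while d.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        d += timedelta(days=1)
    return d


def last_workday(year: int, month: int) -> date:
    next_month = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    d = next_month - timedelta(days=1)
    while d.weekday() >= 5:
        d -= timedelta(days=1)
    return d


def occurs_on(entry: ExtendedBaseline, day: date) -> bool:
    """Return True if the baseline's repetition pattern covers `day`."""
    ref = (datetime(day.year, 1, 1) + entry.timestamp).date()
    if entry.pattern is Pattern.BIWEEKLY_DAY:
        return (day - ref).days % 14 == 0
    if entry.pattern is Pattern.MONTHLY_DAY:
        return day.day == ref.day
    if entry.pattern is Pattern.FIRST_WORKDAY_OF_MONTH:
        return day == first_workday(day.year, day.month)
    if entry.pattern is Pattern.LAST_WORKDAY_OF_MONTH:
        return day == last_workday(day.year, day.month)
    return (day.month, day.day) == (ref.month, ref.day)  # YEARLY_DATE


# A baseline whose reference time is January 5 at 02:00, repeating monthly.
monthly = ExtendedBaseline(timedelta(days=4, hours=2), Pattern.MONTHLY_DAY, {})
print(occurs_on(monthly, date(2024, 3, 5)))  # True
print(occurs_on(monthly, date(2024, 3, 6)))  # False
```

A system that supports other repetition patterns would extend the `Pattern` enumeration and add the matching branch in `occurs_on`.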
2. Extended period baseline management module
Because the weekly baseline must discover and store a large amount of traffic characteristic data, the baselines in the weekly baseline data structure are obtained through automatic learning by the system and are adjusted in real time according to actual traffic; the administrator only needs to fine-tune the weekly baseline to make it usable.
The extended period baseline contains the traffic baselines that cannot be included in the weekly baseline. These are far fewer in number than the weekly baselines, and their creation, modification, and deletion require the direct participation of the administrator.
Baseline addition: if normal traffic cannot be covered by the weekly traffic baseline, the traffic detection module generates an abnormal traffic alarm for it. When handling such an alarm, the administrator can distinguish whether the traffic is truly abnormal or is normal traffic that repeats with a period longer than one week. In the latter case, the administrator may add the corresponding traffic characteristics, time of occurrence, and traffic repetition pattern to the extended period baseline through the baseline addition function of the management module. The baseline addition function needs to be integrated into the abnormal-traffic handling process of the network traffic analysis system.
Automatic baseline adjustment: like the baselines in the weekly baseline, the extended period baseline also needs to be adjusted automatically as actual traffic changes. The adjustment algorithm and implementation are the same as for the weekly baseline, so this function can reuse the system's existing implementation.
Baseline display: the baseline display function provides an operation and maintenance entry point for viewing and managing the baselines that have been added.
Manual baseline adjustment: the traffic represented by an extended period baseline may change as the underlying services change. The administrator can manually adjust the timestamps and repetition patterns of existing baselines to accommodate such changes.
Manual baseline deletion: because baselines in the extended period baseline have long repetition periods, a time-based automatic aging mechanism is difficult to implement. The management module can therefore provide a manual deletion function to help the administrator clear useless baselines.
The above design describes the basic functions of the management module; different network traffic analysis systems can implement a subset of these functions or add more management functions according to the needs of their customer groups.
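The five management functions listed above could be grouped as in the following sketch. All names are hypothetical, timestamps and patterns are kept as plain strings for brevity, and the automatic adjustment uses an exponential moving average purely as a stand-in, since the embodiment only states that the weekly baseline's adjustment algorithm is reused and does not specify it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class BaselineEntry:
    timestamp: str          # TimeStamp, kept opaque in this sketch
    pattern: str            # Pattern, e.g. "monthly"
    data: Dict[str, float]  # BaseLineData: named traffic statistics


@dataclass
class ExtendedBaselineManager:
    entries: Dict[int, BaselineEntry] = field(default_factory=dict)
    next_id: int = 1

    def add(self, timestamp: str, pattern: str, data: Dict[str, float]) -> int:
        """Baseline addition, driven by the administrator."""
        entry_id = self.next_id
        self.entries[entry_id] = BaselineEntry(timestamp, pattern, dict(data))
        self.next_id += 1
        return entry_id

    def auto_adjust(self, entry_id: int, observed: Dict[str, float],
                    alpha: float = 0.2) -> None:
        """Automatic adjustment (assumed algorithm: exponential moving average)."""
        data = self.entries[entry_id].data
        for key, value in observed.items():
            data[key] = (1 - alpha) * data.get(key, value) + alpha * value

    def show(self) -> List[Tuple[int, str, str]]:
        """Baseline display: list the baselines that have been added."""
        return [(i, e.timestamp, e.pattern) for i, e in sorted(self.entries.items())]

    def manual_adjust(self, entry_id: int, timestamp: Optional[str] = None,
                      pattern: Optional[str] = None) -> None:
        """Manual adjustment of the timestamp and/or repetition pattern."""
        entry = self.entries[entry_id]
        if timestamp is not None:
            entry.timestamp = timestamp
        if pattern is not None:
            entry.pattern = pattern

    def delete(self, entry_id: int) -> None:
        """Manual deletion of a baseline that is no longer useful."""
        del self.entries[entry_id]


manager = ExtendedBaselineManager()
bid = manager.add("01-05T02:00", "monthly", {"bytes": 1000.0})
manager.auto_adjust(bid, {"bytes": 2000.0}, alpha=0.5)
manager.manual_adjust(bid, pattern="quarterly")
print(manager.entries[bid].data, manager.show())
```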
Fig. 2 is a flowchart of traffic anomaly detection according to an embodiment of the present invention. As shown in Fig. 2, if traffic anomalies are detected and reported based only on the weekly baseline, a system administrator generally has two options for handling an anomaly caused by normal traffic with a repetition period longer than one week. One is to ignore the alarm, in which case the alarm will reappear in the next traffic period. The other is to treat the alarm as a false alarm, in which case the traffic's characteristics will be absorbed by the automatic baseline update function, making the weekly model inaccurate.
This embodiment therefore proposes an extended period baseline data structure and a management module for it. Fig. 3 is a flowchart of extended traffic anomaly detection according to an embodiment of the present invention. As shown in Fig. 3, if traffic is judged abnormal by the weekly baseline detection, the process goes on to consult the extended period baseline for secondary detection. The detection function first matches the network parameters of the traffic (IP, port, application, and so on) against the baselines. If no baseline matches, the query result is that no baseline is matched. If some baselines match, the function verifies one by one whether the time period of the traffic satisfies the repetition period defined in each baseline; the timestamp and repetition pattern in a baseline can be used to calculate the time periods in which that baseline is valid. If no time period matches, the query result is still that no baseline is matched. If a baseline's time period matches, the traffic is further compared with that baseline, and the comparison result is either normal traffic or abnormal traffic. This completes the detection function.
The final detection result is one of the following three:
If the traffic matches a baseline in the extended period baseline and is judged normal, that baseline is automatically updated with the characteristics of the traffic and the traffic detection process is complete.
If the traffic matches a baseline in the extended period baseline and is judged abnormal, the result is set to the abnormal state and awaits manual analysis by the administrator.
If no baseline in the extended period baseline is matched, the result is set to no-baseline abnormal and awaits manual analysis by the administrator.
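The secondary detection and its three outcomes can be summarized in a short sketch. The `Baseline` fields are hypothetical: `in_period` stands for the validity check derived from the timestamp and repetition pattern, and a single `max_bytes` threshold stands in for whatever traffic characteristics a real baseline would hold.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List


class Result(Enum):
    NORMAL = "normal"
    ABNORMAL = "abnormal"
    NO_BASELINE = "no-baseline abnormal"


@dataclass
class Baseline:
    network_params: dict                 # e.g. {"ip": ..., "port": ..., "app": ...}
    in_period: Callable[[object], bool]  # derived from TimeStamp and Pattern
    max_bytes: int                       # stand-in for the baseline characteristics


def secondary_detect(flow: dict, baselines: List[Baseline]) -> Result:
    # 1. Match the network parameters of the traffic against the baselines.
    candidates = [b for b in baselines
                  if all(flow.get(k) == v for k, v in b.network_params.items())]
    # 2. Verify one by one that the traffic time falls in a valid period.
    candidates = [b for b in candidates if b.in_period(flow["time"])]
    if not candidates:
        return Result.NO_BASELINE  # awaits manual analysis
    # 3. Compare the traffic with the baselines whose period matched.
    if any(flow["bytes"] <= b.max_bytes for b in candidates):
        return Result.NORMAL       # the matched baseline would be updated
    return Result.ABNORMAL         # awaits manual analysis


backup = Baseline({"ip": "10.0.0.5", "port": 22},
                  lambda t: t == "month-start", 50000)
flow = {"ip": "10.0.0.5", "port": 22, "time": "month-start", "bytes": 40000}
print(secondary_detect(flow, [backup]))                        # Result.NORMAL
print(secondary_detect({**flow, "bytes": 90000}, [backup]))    # Result.ABNORMAL
print(secondary_detect({**flow, "time": "mid-month"}, [backup]))  # Result.NO_BASELINE
```

Keeping "abnormal" and "no-baseline abnormal" as distinct results preserves the information the administrator needs later when deciding whether to add a new extended period baseline.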
The next step of the process is manual analysis of the anomaly by the administrator. This step does not need to be synchronized with real-time traffic: the administrator reviews abnormal events at a suitable time and, using the conclusion from the preceding process, judges whether a given abnormal traffic event is a real anomaly or normal traffic with a period longer than one week. In the latter case, the administrator adds an extended period baseline from the traffic's characteristic data using the baseline addition function of the extended period baseline management module.
The above fully presents the extended period baseline data structure, the functions of the extended period baseline management module, and the extended traffic detection process provided by this embodiment, which together enable the creation and maintenance of traffic baselines of arbitrary period and the secondary detection of abnormal traffic using the extended period baseline.
The application scenario of this embodiment is as follows: after the system is deployed, an administrator can establish an extended period baseline for a network behavior, so that outbound data transmission within the specified time period does not generate a false alarm, while suspicious data transmission in other time periods still generates an alarm.
The network traffic analysis system of this embodiment can record, track, and detect network traffic behaviors of any repetition period and is not limited to a period of one week. It can monitor long-period network behaviors as effectively as high-frequency ones, so that the administrator does not overlook defects in the design and implementation of the intranet system. It effectively reduces the false alarms and missed alarms generated for long-period network behaviors and improves the administrator's working efficiency and attention to real threats.
It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system, for example as a set of computer-executable instructions, and that, although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in an order different from that presented herein.
An embodiment of the invention provides a traffic data processing apparatus, which can be used to execute the traffic data processing method of the embodiment of the invention.
Fig. 4 is a schematic diagram of a traffic data processing apparatus according to an embodiment of the present invention. As shown in Fig. 4, the apparatus includes:
a detection unit 10, configured to detect the traffic of the target object again using extended period baselines when the traffic of the target object is detected to be abnormal based on the weekly baseline, and to obtain a re-detection result, where the traffic period corresponding to an extended period baseline is longer than one week;
a first judgment unit 20, configured to determine whether the re-detection result is abnormal; and
a first output unit 30, configured to output a first message when the re-detection result is abnormal, where the first message indicates that the traffic of the target object is abnormal, and where the re-detection result is not abnormal if the traffic of the target object successfully matches any one of the extended period baselines.
In this embodiment, when the traffic of the target object is detected to be abnormal based on the weekly baseline, the detection unit 10 detects the traffic again using the extended period baselines, whose corresponding traffic period is longer than one week, to obtain a re-detection result; the first judgment unit 20 determines whether the re-detection result is abnormal; and the first output unit 30 outputs a first message when the re-detection result is abnormal, the first message indicating that the traffic of the target object is abnormal. If the traffic successfully matches any one of the extended period baselines, the re-detection result is not abnormal. This solves the problem of misjudgment caused by an unreasonably determined traffic period and improves the accuracy of abnormal-traffic determination.
Optionally, the apparatus further comprises an updating unit, configured to update the extended period baseline based on the traffic of the target object when the re-detection result is not abnormal.
Optionally, the apparatus further comprises: a second judgment unit, configured to determine whether an extended period baseline exists before the traffic of the target object is detected again using the extended period baselines to obtain the re-detection result; and a second output unit, configured to output a second message if no extended period baseline exists, where the second message indicates that no extended period baseline exists.
The traffic data processing apparatus comprises a processor and a memory. The detection unit, the first judgment unit, and the other units are stored in the memory as program units, and the processor executes the program units stored in the memory to realize the corresponding functions.
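As a rough illustration of units stored as program units, the three mandatory units can be written as small classes composed by the apparatus. The class names mirror the unit names in the text, but their interfaces, the predicate model of a baseline, and the message text are assumptions made for this sketch.

```python
from typing import Callable, Iterable, List


class DetectionUnit:  # detection unit 10
    def __init__(self, extended_baselines: Iterable[Callable[[dict], bool]]):
        self.extended_baselines = list(extended_baselines)

    def redetect(self, traffic: dict) -> bool:
        """Re-detection result: True if any extended period baseline matches."""
        return any(b(traffic) for b in self.extended_baselines)


class FirstJudgmentUnit:  # first judgment unit 20
    @staticmethod
    def is_abnormal(matched: bool) -> bool:
        return not matched


class FirstOutputUnit:  # first output unit 30
    def __init__(self) -> None:
        self.sent: List[str] = []

    def output(self, target: str) -> None:
        self.sent.append(f"traffic of {target} is abnormal")  # the first message


class TrafficDataProcessingApparatus:
    def __init__(self, extended_baselines: Iterable[Callable[[dict], bool]]):
        self.detection = DetectionUnit(extended_baselines)
        self.judgment = FirstJudgmentUnit()
        self.first_output = FirstOutputUnit()

    def on_weekly_anomaly(self, target: str, traffic: dict) -> None:
        """Called once the weekly baseline has flagged the traffic as abnormal."""
        matched = self.detection.redetect(traffic)
        if self.judgment.is_abnormal(matched):
            self.first_output.output(target)


apparatus = TrafficDataProcessingApparatus([lambda t: t.get("day_of_month") == 1])
apparatus.on_weekly_anomaly("host-a", {"day_of_month": 1})
apparatus.on_weekly_anomaly("host-b", {"day_of_month": 9})
print(apparatus.first_output.sent)  # ['traffic of host-b is abnormal']
```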
The processor comprises a kernel, which calls the corresponding program unit from the memory. One or more kernels may be provided, and the accuracy of abnormal-traffic determination is improved by adjusting kernel parameters.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM); the memory includes at least one memory chip.
An embodiment of the present invention provides a storage medium on which a program is stored, wherein the program, when executed by a processor, implements the traffic data processing method.
An embodiment of the invention provides a processor configured to run a program, wherein the traffic data processing method is executed when the program runs.
An embodiment of the invention provides a device comprising at least one processor, and at least one memory and a bus connected to the processor; the processor and the memory communicate with each other through the bus; the processor is configured to call program instructions in the memory to execute the traffic data processing method. The device may be a server, a PC, a tablet (PAD), a mobile phone, or the like.
The present application further provides a computer program product which, when executed on a data processing device, is adapted to execute a program initialized with the following method steps: when the traffic of a target object is detected to be abnormal based on a weekly baseline, detecting the traffic of the target object again using extended period baselines to obtain a re-detection result, wherein the traffic period corresponding to an extended period baseline is longer than one week; determining whether the re-detection result is abnormal; and if the re-detection result is abnormal, outputting a first message, wherein the first message indicates that the traffic of the target object is abnormal, and wherein the re-detection result is not abnormal if the traffic of the target object successfully matches any one of the extended period baselines.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). The memory is an example of a computer-readable medium.
Computer-readable media include both permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The above are merely examples of the present application and are not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.