Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. Meanwhile, in the description of the present application, the terms "first", "second", and the like are used only for distinguishing the description, and are not to be construed as indicating or implying relative importance.
At present, in the process of network security analysis, unified association processing needs to be performed on network behavior data from multiple data sources. However, because the upper-layer network security analysis services and scenarios are numerous and complicated, the same logs are often reused for analysis in different dimensions as the analysis requirements expand.
However, the existing network security analysis methods not only waste computing resources, but also cannot reuse the same analysis scenario across a plurality of network security analysis services. For example, different network security analysis services are customized at their respective service levels and do not reuse common network security analysis methods.
In addition, the prior art also discloses a hierarchical processing technology based on data warehouse design, which can establish a subject-oriented, integrated, time-variant, but relatively stable data set from multiple heterogeneous data sources, and the data set can be used for supporting the management decision process. Moreover, during the construction of the data warehouse, the design can follow the extract-transform-load (ETL) processing procedure of data warehouse technology.
However, security log data has its own characteristics, so a general data processing procedure cannot support the network security analysis service well.
Based on this, the embodiment of the application provides a network security analysis scheme. By establishing a unified association processing technology for network behavior data of multiple data sources and extracting the processing steps common to different network security analysis services, a process for producing preprocessed network behavior data can be established, so that the preprocessed network behavior data can be used by different network security analysis services, thereby not only reducing repeated processing, but also improving the efficiency of network security analysis.
Referring to fig. 1, fig. 1 is a block diagram illustrating a system for network security analysis according to an embodiment of the present disclosure. The system shown in fig. 1 includes a network behavior data acquisition unit, a filtering unit, a grouping aggregation unit, a labeling processing unit, an association unit and a service application unit. The filtering unit is respectively connected with the network behavior data acquisition unit, the grouping aggregation unit and the labeling processing unit, and the association unit is respectively connected with the grouping aggregation unit, the labeling processing unit and the service application unit.
The network behavior data acquisition unit can be used for acquiring original network behavior data; the filtering unit can be used for filtering the original network behavior data to obtain filtered data; the grouping aggregation unit can be used for grouping and aggregating the filtered data to obtain intermediate data; the labeling processing unit can be used for labeling the data source corresponding to abnormal filtered data if the filtered data is determined to be abnormal in the process of grouping and aggregating the filtered data; the association unit can be used for performing association processing on the intermediate data to obtain preprocessed network behavior data; the service application unit can directly use the preprocessed network behavior data to determine the network security analysis result.
Therefore, because security log data (or network behavior data) has its own characteristics, the existing general data processing procedure cannot support security services well. The embodiment of the application, with full consideration of network security analysis, emphasizes the association processing of data, divides the system into a plurality of units of different types, takes the service implementation into account in each unit, and thereby provides better support for developing upper-layer services.
Referring to fig. 2, fig. 2 is a flowchart illustrating a method for obtaining preprocessed network behavior data according to an embodiment of the present disclosure. It should be understood that the method shown in fig. 2 may be performed by the network security analysis apparatus shown in fig. 5, where the network security analysis apparatus may be any device capable of performing the method, such as a personal computer, a server, or a network device, and the embodiment of the present application is not limited thereto. The method specifically includes the following steps:
step S210, acquiring original network behavior data of multiple data sources.
It should be understood that the data included in the raw network behavior data may be set according to actual requirements, and the embodiments of the present application are not limited thereto.
For example, the original network behavior data may include security event data, flow data, domain name data, and uniform resource locator (URL) log data. Meanwhile, external threat intelligence, WHOIS domain name registration data and the like can also be introduced into the original network behavior data.
Step S220, filtering the original network behavior data of the multiple data sources to obtain filtered data.
Specifically, for an upper-layer business application, a part of behavior log data irrelevant to the analysis direction can be filtered out. For example, for domain name data, common domain name resolution requests are filtered out, and therefore large noise influence on subsequent service analysis is avoided.
Step S230, grouping, aggregating and labeling the filtered data to obtain intermediate data.
It should be understood that grouping aggregation may also be referred to as grouping aggregation statistics, and labeling may also be referred to as label mapping.
It should also be understood that the specific process of grouping aggregation may also be set according to actual requirements, and the embodiments of the present application are not limited thereto.
For example, the grouping aggregation may be a Group By aggregation, where Group denotes grouping and By is followed by the name of the field to group on.
It should also be appreciated that, for grouping aggregation, the filtered data may be grouped and aggregated according to a preset service analysis dimension. The preset service analysis dimension may be set according to actual requirements, and the embodiment of the present application is not limited to this.
For example, the filtered data may be grouped and aggregated by the five-tuple of the flow data (source IP, destination IP, source port, destination port, protocol), and the aggregation also needs to compute the statistics required by the service.
It should be noted here that the filtered data may be divided according to a preset time window to obtain divided data, and then the divided data may be grouped and aggregated to obtain intermediate data.
It should also be understood that the time length corresponding to the preset time window may be set according to actual requirements, and the embodiment of the present application is not limited thereto.
For example, the length of the preset time window may be 5 minutes.
It should also be understood that during the aggregation process, the behavior data within each time window (or the divided data corresponding to each time window) may be labeled in advance. The labeling process may be set according to actual requirements, and the embodiment of the application is not limited thereto.
For example, if the traffic in the current time window is greater than the traffic threshold, it may be considered as an abnormal traffic behavior, and therefore, a tag field may be added to mark the communication behavior of the source IP and the destination IP as abnormal.
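The windowing, five-tuple grouping and threshold labeling described for step S230 can be sketched as follows. This is only a minimal illustration under stated assumptions: the record layout (`ts` in integer seconds, `packets`, `bytes`), the output field names, and the `BYTES_THRESHOLD` value are hypothetical and are not specified by the embodiment.

```python
from collections import defaultdict

WINDOW_SECONDS = 300         # 5-minute window, as in the example above
BYTES_THRESHOLD = 1_000_000  # hypothetical traffic threshold


def aggregate_flows(records, window=WINDOW_SECONDS, threshold=BYTES_THRESHOLD):
    """Sum packets and bytes per (time window, five-tuple) and tag heavy groups."""
    totals = defaultdict(lambda: [0, 0])  # key -> [packets, bytes]
    for r in records:
        start = r["ts"] - r["ts"] % window  # start of the window containing ts
        key = (start, r["sip"], r["dip"], r["sport"], r["dport"], r["proto"])
        totals[key][0] += r["packets"]
        totals[key][1] += r["bytes"]

    rows = []
    for key in sorted(totals):
        start, sip, dip, sport, dport, proto = key
        c_packets, c_bytes = totals[key]
        rows.append({
            "sip": sip, "dip": dip, "sport": sport, "dport": dport, "proto": proto,
            "start_time": start, "end_time": start + window,
            "c_packets": c_packets, "c_bytes": c_bytes,
            # Label field: marks the sip/dip communication as abnormal.
            "tag": "abnormal" if c_bytes > threshold else "normal",
        })
    return rows
```

Each returned row corresponds to one five-tuple in one time window, so a later analysis can read the totals and the label without revisiting the raw records.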
Step S240, performing association processing on the intermediate data to obtain the preprocessed network behavior data.
It should be understood that the specific method of association processing may be set according to actual requirements, and the embodiments of the present application are not limited thereto.
For example, the association process may be a scenario association method, a sequence association method, or an entity association method.
In order to facilitate understanding of step S240, the following description is made by specific embodiments.
Specifically, the processing before step S240 is limited to a single data source, whereas step S240 may need to perform association processing across different data sources. In combination with a security analysis scenario, an association processing method may be used to associate all network behavior data of the same analysis object (e.g., the same IP).
It should be noted here that, although fig. 2 shows the acquisition process of the preprocessed network behavior data, it should be understood by those skilled in the art that, in the process of practical application, it may directly use the preprocessed network behavior data shown in fig. 2 to perform network security analysis.
With continued reference to fig. 3, fig. 3 shows a flowchart of a network security analysis method provided in an embodiment of the present application. It should be understood that the method shown in fig. 3 may be performed by the network security analysis apparatus shown in fig. 5, where the network security analysis apparatus may be any device capable of performing the method, such as a personal computer, a server, or a network device, and the embodiment of the present application is not limited thereto. The method specifically includes the following steps:
step S310, a network security analysis request is obtained.
It should be understood that the specific request of the network security analysis request may be set according to actual requirements, and the embodiment of the present application is not limited thereto.
For example, the network security analysis request may be used to request an abnormal data source within a preset time period, may be used to request triple data, may be used to request quintuple data, and the like. The specific time period of the preset time period may be set according to actual requirements, and the embodiment of the application is not limited to this.
Step S320, searching target network behavior data corresponding to the network security analysis request from the preprocessed network behavior data according to the network security analysis request; wherein the preprocessing comprises at least one of filtering, grouping aggregation, labeling and association processing.
It should be understood that, according to the network security analysis request, a specific process of searching the target network behavior data corresponding to the network security analysis request from the preprocessed network behavior data may be set according to an actual requirement, and the embodiment of the present application is not limited thereto.
For example, when all of the network behavior data targeted by the network security analysis request is contained in the preprocessed network behavior data, the target network behavior data corresponding to the request can be searched directly from the preprocessed network behavior data. When only part of the network behavior data targeted by the request is contained in the preprocessed network behavior data, the target network behavior data corresponding to that part may be searched from the preprocessed network behavior data, and the network behavior data that is not matched needs to go through the process shown in fig. 2.
It should be noted that, although fig. 2 is described by taking preprocessing that includes filtering, grouping aggregation, labeling and association processing as an example, it should be understood by those skilled in the art that the preprocessing may include at least one of filtering, grouping aggregation, labeling and association processing, so that the efficiency of network security analysis can be improved by performing the relevant processing on the original network behavior data in advance.
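The full-hit and partial-hit cases above can be sketched as a lookup that falls back to preprocessing only for the unmatched part. This is an illustrative sketch, not the embodiment's implementation: the function names, the dictionary keyed by analysis object, and the choice to store newly processed data back for reuse are all assumptions.

```python
def find_target_data(requested_keys, preprocessed, preprocess):
    """Return data for requested_keys, preprocessing only the unmatched ones.

    preprocessed: dict mapping an analysis key (e.g. an IP) to preprocessed rows.
    preprocess:   callable standing in for the fig. 2 process; it takes a list
                  of keys and returns a dict of freshly preprocessed rows.
    """
    result = {}
    missing = []
    for key in requested_keys:
        if key in preprocessed:
            result[key] = preprocessed[key]  # served directly from preprocessed data
        else:
            missing.append(key)
    if missing:
        fresh = preprocess(missing)
        preprocessed.update(fresh)  # assumption: keep the result for later requests
        result.update(fresh)
    return result
```

With this shape, a request that is fully covered never triggers the preprocessing callable, which is where the saving in repeated processing comes from.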
Step S330, determining a network security analysis result according to the target network behavior data.
It should be understood that, according to the target network behavior data, the specific process of determining the network security analysis result may be set according to actual requirements, and the embodiment of the present application is not limited thereto.
For example, since the target network behavior data goes through the process shown in fig. 2, the network security analysis result can be determined directly according to the target network behavior data.
For another example, in the case that the target network behavior data is quintuple data and the network security analysis request requires triple data, the quintuple data may be summed over the fields that are not part of the triple to obtain the triple data, so as to obtain the network security analysis result.
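The roll-up from quintuple data to triple data can be sketched as below. The embodiment does not state which three fields form the triple; this sketch assumes (source IP, destination IP, protocol) and uses hypothetical field names for the aggregated rows.

```python
from collections import defaultdict


def rollup_to_triple(rows):
    """Sum quintuple rows over the port fields to obtain triple totals."""
    totals = defaultdict(lambda: [0, 0])  # triple -> [packets, bytes]
    for row in rows:
        key = (row["sip"], row["dip"], row["proto"])  # assumed triple
        totals[key][0] += row["c_packets"]
        totals[key][1] += row["c_bytes"]
    return {k: {"c_packets": p, "c_bytes": b} for k, (p, b) in totals.items()}
```

Because the quintuple rows already hold per-window sums, the triple result is obtained by adding a few aggregated rows instead of rescanning the original logs.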
Therefore, the embodiment of the application can establish a set of unified association processing technology aiming at the network behavior data of multiple data sources and can refine common points of processing of different network security analysis services, so that the process of the preprocessed network behavior data can be established, different network security analysis services can use the preprocessed network behavior data, the repeated processing process can be reduced, and the efficiency of network security analysis can be improved.
In addition, in a scenario where a plurality of services are analyzed simultaneously, the embodiment of the application enables the upper layer to reuse the data processing of the same analysis process across services.
In addition, for the same data source, the embodiment of the application does not need to compute the statistical result of the same field multiple times in isolation, thereby saving computing or storage resources.
In order to facilitate understanding of the embodiments of the present application, the following description will be given by way of specific examples.
Referring to fig. 4, fig. 4 is a specific process diagram illustrating a network security analysis method according to an embodiment of the present application. The network security analysis method shown in fig. 4 includes:
first, the original flow data and the original domain name data are filtered.
Specifically, for the original flow data and the original domain name data, the filtering process may be performed by setting respective white lists. The white list can be set according to the service requirement. For example, when analyzing abnormal domain name request behavior, the white list may be set to the top 10,000 commonly active domain names (TOP10000), and requests for these common domain names are filtered out.
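A white-list filter of this kind can be sketched as follows. The record layout (a `domain` key per request) and the normalization of case and trailing dots are assumptions made for illustration.

```python
def filter_by_whitelist(records, whitelist):
    """Drop domain name requests whose domain appears in the white list."""
    # Normalize once so that "Example.COM." and "example.com" compare equal.
    allowed = {d.lower().rstrip(".") for d in whitelist}
    return [r for r in records if r["domain"].lower().rstrip(".") not in allowed]
```

For example, with a white list containing `example.com`, a request for `Example.COM.` is dropped, while a request for a domain not in the list is kept for the later aggregation step.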
Subsequently, grouping aggregation and labeling are performed on the filtered data.
Specifically, a time window may be preset. The length of the time window is generally 5 minutes, and the window size may be determined according to the volume of incoming data and the processing performance. The window should be neither too large nor too small: too large a window causes data to accumulate before processing, while too small a window cannot achieve the purpose of association processing. The data stream is then received from the data bus and processed in real time according to the time window, where the real-time processing may include grouping aggregation statistics and labeling.
In addition, for the grouping aggregation statistics, different grouping strategies may be adopted for the flow data and the domain name data respectively, specifically:
for domain name data, the grouping policy may be de-duplication aggregation of records with the same source IP, requested domain name, and request type within a set time window, resulting in the data set of Table 1 below.
TABLE 1
Where the fields sip, domain and value_list are grouping fields, the field count is a statistical field, and the fields start_time and end_time identify the window of the aggregated analysis.
In addition, key values in the domain name data may be persisted and aggregated.
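The de-duplication aggregation of domain name requests can be sketched with a counter keyed by window and grouping fields. The field names `sip`, `domain`, `count`, `start_time` and `end_time` follow the text above; `qtype` (request type) and `ts` (integer seconds) are hypothetical names, and the `value_list` field is not modeled here.

```python
from collections import Counter


def aggregate_dns(requests, window=300):
    """Collapse identical (sip, domain, qtype) requests in each window into one row."""
    counts = Counter(
        (r["ts"] - r["ts"] % window, r["sip"], r["domain"], r["qtype"])
        for r in requests
    )
    return [
        {"sip": sip, "domain": domain, "qtype": qtype, "count": n,
         "start_time": start, "end_time": start + window}
        for (start, sip, domain, qtype), n in sorted(counts.items())
    ]
```

Repeated requests inside one window are reduced to a single row whose `count` records how many times the request occurred.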
For the flow data, grouping aggregation may be performed according to the five-tuple (i.e., source IP, source port, destination IP, destination port, protocol), to obtain the data set of Table 2 below.
TABLE 2
| Field | Meaning | Remarks |
| --- | --- | --- |
| sip | Source IP | Grouping field |
| dip | Destination IP | Grouping field |
| sport | Source port | Grouping field |
| dport | Destination port | Grouping field |
| proto_ | Protocol | Grouping field |
| start_time | Start time | |
| end_time | End time | |
| win_type | Window size type | 5 minutes |
| c_packets | Total number of packets in time window | |
| c_bytes | Total number of bytes in time window | |
Where the fields start_time and end_time identify the window of the aggregated analysis.
In addition, the packet counts and byte counts in the flow connection logs can be summed to obtain the totals within the window, and the summed values are stored and aggregated.
In addition, for the labeling processing, a preprocessing analysis is performed on the statistical data within a time window, and the service labels that are needed are added. For example, for the grouped statistical data of domain names, if frequent requests for TXT records are found, the requests from the source IP can be determined to be suspicious request behavior; for another example, for the flow data, if the total traffic of the requests in the time window, i.e., the total number of bytes, is determined to be greater than the preset threshold, the communication behavior between the source IP and the destination IP is marked as abnormal.
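The TXT-record rule mentioned above can be sketched as a labeling pass over aggregated domain name rows. The row layout (`sip`, `qtype`, `count`, `start_time`), the label value and the `txt_threshold` default are hypothetical; the embodiment does not define what counts as "frequent".

```python
def label_txt_requesters(rows, txt_threshold=50):
    """Tag every row of a source IP that sent many TXT requests in a window."""
    txt_counts = {}
    for row in rows:
        if row["qtype"].upper() == "TXT":
            key = (row["sip"], row["start_time"])
            txt_counts[key] = txt_counts.get(key, 0) + row["count"]

    labeled = []
    for row in rows:
        total = txt_counts.get((row["sip"], row["start_time"]), 0)
        tag = "suspicious_txt" if total > txt_threshold else ""
        labeled.append({**row, "tag": tag})  # copy the row, adding the label field
    return labeled
```

The label is attached per source IP and window, so an upper-layer service can select suspicious sources by reading the `tag` field alone.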
Subsequently, association processing may be performed on the two data sources, specifically including: the entity association method, which associates on the same analyzed security entity object (such as an IP address or a domain name), integrates the log data of that entity from two or more data sources to form an entity behavior log set, and then performs further analysis services on the entity behavior log set;
the time sequence association method, which associates a behavior according to the order in which it is observed in the two data sources. For example, after a source IP requests a domain name, a log of communication between the source IP and the resolved value corresponding to the domain name is generated within 5 seconds, so that time sequence association analysis is realized;
other association methods applicable to this step.
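The entity association and time sequence association methods can be sketched as follows, under stated assumptions: domain name rows carry `sip`, `answer` (the resolved value) and `ts`, flow rows carry `sip`, `dip` and `ts`, timestamps are in seconds, and the entity is the source IP. These names are illustrative only.

```python
from collections import defaultdict


def associate_by_entity(dns_rows, flow_rows):
    """Collect the logs of each source IP from both data sources."""
    entities = defaultdict(lambda: {"dns": [], "flow": []})
    for row in dns_rows:
        entities[row["sip"]]["dns"].append(row)
    for row in flow_rows:
        entities[row["sip"]]["flow"].append(row)
    return dict(entities)


def associate_by_time(dns_rows, flow_rows, max_gap=5):
    """Pair a domain name request with a flow to its resolved value within max_gap seconds."""
    pairs = []
    for d in dns_rows:
        for f in flow_rows:
            same_path = f["sip"] == d["sip"] and f["dip"] == d["answer"]
            if same_path and 0 <= f["ts"] - d["ts"] <= max_gap:
                pairs.append((d, f))
    return pairs
```

The nested loop keeps the sketch short; for large windows, indexing the flow rows by (`sip`, `dip`) first would avoid comparing every pair.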
Finally, for the service application, the upper-layer service application can directly use the associated data. Because the associated data has been pre-aggregated and pre-labeled in the preceding steps, part of the services can achieve their analysis capability without going down to the original data layer. Meanwhile, in the preprocessing described above, the common processing requirements of a plurality of different security services can be extracted and configured, so that different security service analyses can share the data processing, and the service application efficiency is improved.
It should be understood that the above network security analysis method is only exemplary, and those skilled in the art can make various changes, modifications or variations according to the above method and also fall within the protection scope of the present application.
Referring to fig. 5, fig. 5 is a block diagram illustrating a network security analysis apparatus 500 according to an embodiment of the present disclosure. It should be understood that the network security analysis apparatus 500 corresponds to the above method embodiments and is capable of executing the steps related to the above method embodiments. The specific functions of the network security analysis apparatus 500 may be referred to the above description, and the detailed description is appropriately omitted here to avoid redundancy. The network security analysis apparatus 500 includes at least one software function module that can be stored in a memory in the form of software or firmware, or be embedded in an Operating System (OS) of the network security analysis apparatus 500. Specifically, the network security analysis apparatus 500 includes:
an obtaining module 510, configured to obtain a network security analysis request;
a searching module 520, configured to search, according to the network security analysis request, target network behavior data corresponding to the network security analysis request from the preprocessed network behavior data; wherein the preprocessing comprises at least one of filtering, grouping aggregation, labeling and association processing;
a determining module 530, configured to determine a network security analysis result according to the target network behavior data.
In one possible embodiment, the network security analysis apparatus 500 further includes: a pre-acquisition module (not shown) for: filtering original network behavior data of various data sources to obtain filtered data; grouping, aggregating and labeling the filtered data to obtain intermediate data; and performing association processing on the intermediate data to obtain the preprocessed network behavior data.
In a possible embodiment, the pre-acquisition module is specifically configured to: dividing the filtered data according to a preset time window to obtain divided data; and grouping and aggregating the divided data to obtain intermediate data.
In a possible embodiment, the pre-acquisition module is specifically configured to: and in the process of grouping and aggregating the divided data, if the divided data is determined to be abnormal, labeling the data source corresponding to the divided data.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working process of the apparatus described above may refer to the corresponding process in the foregoing method, and will not be described in too much detail herein.
Referring to fig. 6, fig. 6 shows a block diagram of an electronic device 600 according to an embodiment of the present disclosure. The electronic device 600 may include a processor 610, a communication interface 620, a memory 630, and at least one communication bus 640. The communication bus 640 is used to enable direct connection and communication among these components. The communication interface 620 in the embodiment of the present application is used for performing signaling or data communication with other devices. The processor 610 may be an integrated circuit chip having signal processing capabilities. The processor 610 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components. The processor may implement or perform the various methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor, or the processor 610 may be any conventional processor or the like.
The memory 630 may be, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like. The memory 630 stores computer-readable instructions; when the computer-readable instructions are executed by the processor 610, the electronic device 600 may perform the steps of the above-described method embodiments.
The electronic device 600 may further include a memory controller, an input-output unit, an audio unit, and a display unit.
The memory 630, the memory controller, the processor 610, the peripheral interface, the input/output unit, the audio unit, and the display unit are electrically connected to each other directly or indirectly to realize data transmission or interaction. For example, these components may be electrically coupled to each other via one or more communication buses 640. The processor 610 is configured to execute executable modules stored in the memory 630. Also, the electronic device 600 is configured to perform the following method: acquiring a network security analysis request; according to the network security analysis request, searching target network behavior data corresponding to the network security analysis request from the preprocessed network behavior data; wherein the preprocessing comprises at least one of filtering, grouping aggregation, labeling and association processing; and determining a network security analysis result according to the target network behavior data.
The input/output unit is used for a user to provide input data, so as to realize the interaction between the user and the server (or the local terminal). The input/output unit may be, but is not limited to, a mouse, a keyboard, and the like.
The audio unit provides an audio interface to the user, which may include one or more microphones, one or more speakers, and audio circuitry.
The display unit provides an interactive interface (e.g. a user interface) between the electronic device and a user, or is used for displaying image data for the user's reference. In this embodiment, the display unit may be a liquid crystal display or a touch display. In the case of a touch display, the display can be a capacitive touch screen or a resistive touch screen that supports single-point and multi-point touch operations. Supporting single-point and multi-point touch operations means that the touch display can sense touch operations generated simultaneously at one or more positions on the touch display, and the sensed touch operations are sent to the processor for calculation and processing.
It will be appreciated that the configuration shown in FIG. 6 is merely illustrative and that the electronic device 600 may include more or fewer components than shown in FIG. 6 or have a different configuration than shown in FIG. 6. The components shown in fig. 6 may be implemented in hardware, software, or a combination thereof.
The present application also provides a storage medium having a computer program stored thereon, which, when executed by a processor, performs the method of the method embodiments.
The present application also provides a computer program product which, when run on a computer, causes the computer to perform the method of the method embodiments.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process of the system described above may refer to the corresponding process in the foregoing method, and will not be described in too much detail herein.
It should be noted that, in the present specification, the embodiments are all described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments may be referred to each other. For the device-like embodiment, since it is basically similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method can be implemented in other ways. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes. It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.