Disclosure of Invention
The invention aims to provide a Flink-based audit log rule matching method and system that can dynamically load user-defined rules, offer high data-processing parallelism and simple interface encapsulation, and facilitate secondary development and function iteration.
The embodiment of the invention is realized by the following steps:
in a first aspect, an embodiment of the present application provides an audit log rule matching method based on Flink, which includes:
a user adds a custom policy rule on a service page and sends the custom policy rule to a database for rule caching;
continuously pulling the rule data from the database through a Flink processing engine, and loading the rule data into a preset DAG model factory;
a DAG model factory generates a DAG model corresponding to the rule according to the rule data, and carries out topological sequencing on the DAG model to generate a topological result;
the DAG model factory sends the topology results corresponding to all the rules to a DAG model scheduler, delivers the Graph object containing the topology results to the Flink processing engine, and broadcasts the Graph object to the subtasks of each computing node through the Flink processing engine;
the execution window sequentially matches the audit log data according to the rules bound in the Graph object to obtain rule matching results, the rule matching results are sent to a DAG model scheduler, the DAG model scheduler updates a history matching result set according to the matching results, and the rules bound in the Graph object are switched;
and after the execution window is processed, all rule matching results in the window are sent to the downstream component.
Based on the first aspect, in some embodiments of the present invention, the step of adding the customized policy rule to the service page by the user and sending the customized policy rule to the database for rule caching includes:
a user adds a custom policy rule on a Web service page and issues the custom policy rule to a Redis database;
the Redis database receives the rule data, converts the rule data into a hash data structure, and then caches the hash data.
Based on the first aspect, in some embodiments of the present invention, the step of continuously pulling the rule data from the database through the Flink processing engine and loading the rule data into the preset DAG model factory includes:
the Flink processing engine starts a broadcast thread to periodically pull the rule data from the database and stores the rule data;
and carrying out data cleaning processing on the stored rule data and loading the rule data into a DAG model factory.
Based on the first aspect, in some embodiments of the present invention, the step of generating, by the DAG model factory, a DAG model corresponding to the rule according to the rule data, and performing topology ordering on the DAG model to generate a topology result includes:
decomposing the rule data into a plurality of subtask nodes by a DAG model factory, and constructing a DAG model corresponding to the rule according to the sequence among the subtask nodes;
and carrying out topology sequencing on the DAG model by using a depth-first traversal algorithm to obtain a topology sequence result corresponding to the rule.
Based on the first aspect, in some embodiments of the present invention, the step of performing topology sequencing on the DAG model by using a depth-first traversal algorithm to obtain a topology sequence result corresponding to the rule includes:
selecting a vertex with an in-degree of 0 from the DAG model and outputting the vertex;
starting in turn from that vertex's unvisited adjacent vertices, performing a depth-first traversal of the DAG model until all vertices have been traversed, and recording the visiting order;
and storing the access sequence into a corresponding queue to obtain a topological sequence result corresponding to the rule.
Based on the first aspect, in some embodiments of the present invention, the step of the execution window sequentially matching the audit log data according to the rule bound in the Graph object to obtain a rule matching result, and sending the rule matching result to the DAG model scheduler, where the DAG model scheduler updates the history matching result set according to the matching result, and switches the rule bound in the Graph object includes:
obtaining audit log data;
inquiring whether a matching rule corresponding to the data exists in a historical data rule matching result in an execution window according to the audit log data, and if so, directly sending the rule matching result to a downstream component; if not, matching the audit log data according to the rule bound in the Graph object to obtain a rule matching result and sending the rule matching result to a DAG model scheduler;
and the DAG model scheduler updates the rule matching result in the history matching result set according to the rule matching result and dynamically switches the rules bound in the Graph object.
Based on the first aspect, in some embodiments of the present invention, the step of obtaining audit log data includes:
receiving audit log data sent by each distributed log data acquisition agent;
putting the received audit log data into a distributed message queue system;
and obtaining audit log data from the distributed message queue system.
In a second aspect, an embodiment of the present application provides an audit log rule matching system based on Flink, which includes:
the rule temporary storage module is used for adding a user-defined policy rule on the service page by a user and transmitting the user-defined policy rule to the database for rule caching;
the rule data pulling module is used for pulling the rule data from the database continuously through the Flink processing engine and loading the rule data into a preset DAG model factory;
the DAG model generation module is used for generating a DAG model corresponding to the rule by the DAG model factory according to the rule data and carrying out topology sequencing on the DAG model to generate a topology result;
the broadcasting module is used for sending, by the DAG model factory, the topology results corresponding to all the rules to the DAG model scheduler, delivering the Graph object containing the topology results to the Flink processing engine, and broadcasting the Graph object to the subtasks of each computing node through the Flink processing engine;
the rule matching module is used for matching the audit log data by the execution window according to the rules bound in the Graph object in sequence to obtain rule matching results and sending the rule matching results to the DAG model scheduler, and the DAG model scheduler updates the history matching result set according to the matching results and switches the rules bound in the Graph object;
and the result sending module is used for sending all rule matching results in the window to the downstream component after the window execution processing is finished.
In a third aspect, an embodiment of the present application provides an electronic device, which includes a memory for storing one or more programs and a processor; the one or more programs, when executed by the processor, implement the method as described in any one of the first aspects above.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the method as described in any one of the above first aspects.
Compared with the prior art, the embodiment of the invention has at least the following advantages or beneficial effects:
the embodiment of the application provides a Flink-based audit log rule matching method and system. A user adds a custom policy rule on a service page, and the rule is sent to a database for caching. A Flink processing engine then continuously pulls the rule data from the database and loads it into a preset DAG model factory; the DAG model factory generates a DAG model corresponding to each rule according to the rule data and performs topological sorting on the DAG model to generate a topology result. The DAG model factory then sends the topology results corresponding to all the rules to the DAG model scheduler, delivers the Graph object containing the topology results to the Flink processing engine, and broadcasts the Graph object to the subtasks of each computing node through the Flink processing engine. Finally, the execution window matches the audit log data in turn according to the rules bound in the Graph object to obtain rule matching results and sends them to the DAG model scheduler; the DAG model scheduler updates the history matching result set according to the matching results and switches the rules bound in the Graph object, and once the execution window finishes processing, all rule matching results in the window are sent to a downstream component. Overall, the method and system are built on the Apache Flink distributed stream processing engine and achieve dynamic loading of user-defined rules through the DAG model, which further improves data-processing parallelism, meets second-level or even millisecond-level real-time requirements, and can raise the overall data-processing throughput of the cluster. The interface encapsulation is simple and supports direct calls from Scala and Java, which is developer-friendly and makes secondary development and function iteration more convenient.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Examples
Some embodiments of the present application will be described in detail below with reference to the accompanying drawings. The embodiments and features of the embodiments described below can be combined with one another without conflict.
Referring to fig. 1 to 3, an embodiment of the present application provides an audit log rule matching method based on Flink, where the method includes the following steps:
step S1: a user adds a user-defined strategy rule on a service page and sends the user-defined strategy rule to a database for rule caching;
in the above step, the user can add a custom policy rule on the Web service page and send it to the Redis database. The custom rule may include, among other things, a creation time, a user ID, the rule content, and an activation status. Because users can define the matching rules for audit log data themselves, rule customization is more flexible and audit policy rules can be adjusted and changed in time, enabling security auditing with fast processing and fast response. After the Redis database receives the rule data, it converts the data into a hash data structure for caching, so that the Apache Flink processing engine can subsequently pull the latest rule data in batches at high speed to perform rule matching on the audit log data. The Redis database can store data in a distributed manner, meeting requirements such as fault recovery and load balancing, and responds very quickly: it can execute roughly 110,000 write operations or 81,000 read operations per second, so the data in Redis always reflects the latest state when a user updates the rules. Redis supports six stored data types: strings, hashes, lists, sets, sorted sets, and HyperLogLogs (cardinality counters). On the one hand, this satisfies the need to store a variety of data structures; on the other hand, having few data types means fewer conversion rules and less judgment logic, so read/write speed is higher. In this embodiment, a hash structure may be used to store the rule data, which facilitates subsequent development.
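As a minimal sketch, a custom policy rule with the fields named above (creation time, user ID, rule content, activation status) can be flattened into the field-to-value map that a Redis hash expects. The field names and the `rule:<id>` key convention here are illustrative assumptions, not specified by the embodiment:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Flatten a custom policy rule into a Redis-style hash structure
// (field -> value map). Field names are illustrative; the embodiment
// only states that a rule carries creation time, user ID, rule
// content, and activation status.
public class RuleHash {
    public static Map<String, String> toHash(long createTime, String userId,
                                             String ruleContent, boolean enabled) {
        Map<String, String> hash = new LinkedHashMap<>();
        hash.put("createTime", Long.toString(createTime));
        hash.put("userId", userId);
        hash.put("ruleContent", ruleContent);
        hash.put("enabled", Boolean.toString(enabled));
        return hash;
    }
}
```

With a real Redis client such as Jedis, this map could be cached under a per-rule key, e.g. `jedis.hset("rule:" + ruleId, hash)`, so the Flink side can later fetch all rules in one batch.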
Step S2: continuously pulling the rule data from the database through a Flink processing engine, and loading the rule data into a preset DAG model factory;
in the above step, the Flink processing engine may start a thread to periodically pull the rule data from the Redis database in batches and store it in timestamp order, and then perform data cleaning on the stored rule data to update the locally cached rules, so that the rule information held by the Flink cluster stays consistent with the rule data submitted by users, supporting real-time processing of both bounded and unbounded data streams. After pulling the rule data, the Flink processing engine loads it into the preset DAG model factory, realizing dynamic loading of the custom rules. The DAG model can be designed differently according to the actual needs of the business being implemented.
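The refresh step above, reduced to its data handling, can be sketched as follows. The `Rule` record and its fields are assumptions for illustration, and "cleaning" is simplified here to dropping disabled or empty entries and ordering by timestamp:

```java
import java.util.Comparator;
import java.util.List;

// Sketch of the rule-refresh step: rules pulled from the cache are
// ordered by timestamp, and cleaning is reduced to discarding
// disabled or malformed entries so the local copy stays consistent
// with what users submitted. The Rule shape is an assumption.
public class RuleRefresher {
    public record Rule(String id, long timestamp, String content, boolean enabled) {}

    // Keep only enabled, non-empty rules, ordered by timestamp.
    public static List<Rule> clean(List<Rule> pulled) {
        return pulled.stream()
                .filter(r -> r.enabled() && r.content() != null && !r.content().isEmpty())
                .sorted(Comparator.comparingLong(Rule::timestamp))
                .toList();
    }
}
```

In a real deployment, a scheduled thread would call this after each periodic batch pull from Redis before handing the cleaned list to the DAG model factory.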
And step S3: a DAG model factory generates a DAG model corresponding to the rule according to the rule data, and carries out topological ordering on the DAG model to generate a topological result;
in the above step, the DAG model factory first decomposes the rule data into a plurality of subtask nodes (i.e., a plurality of subtasks) and constructs the DAG model corresponding to the rule according to the order among the subtask nodes. A DAG (Directed Acyclic Graph) is a directed graph data structure that can be used for problems such as dynamic programming, shortest paths, and data compression. The DAG model is then topologically sorted with a depth-first traversal algorithm to obtain the topological sequence corresponding to the rule. Topological sorting arranges the nodes of the DAG into a one-dimensional list in which every node's predecessors appear before the node itself, which guarantees the ordering among nodes, i.e., the integrity of the rule.
The specific process of topologically sorting the DAG model with a depth-first traversal algorithm to obtain the topological sequence corresponding to the rule is as follows: first, a vertex with an in-degree of 0 is selected from the DAG model and output; then, starting from that vertex's unvisited adjacent vertices, a depth-first traversal of the DAG model is performed until all vertices have been traversed; the visiting order is recorded and stored in a corresponding queue, yielding the topological sequence corresponding to the rule.
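The depth-first topological sort described above can be sketched as follows. This is one standard realization (reverse post-order DFS started from in-degree-0 vertices); the adjacency-list representation and string vertex names are illustrative assumptions:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DagTopoSort {
    // DFS-based topological sort over an adjacency-list DAG, started
    // from vertices with in-degree 0, emitting reverse post-order so
    // every vertex precedes all of its successors.
    public static List<String> sort(Map<String, List<String>> adj) {
        Map<String, Integer> inDeg = new HashMap<>();
        adj.keySet().forEach(v -> inDeg.putIfAbsent(v, 0));
        for (List<String> outs : adj.values())
            for (String w : outs) inDeg.merge(w, 1, Integer::sum);

        Deque<String> order = new ArrayDeque<>(); // filled in reverse post-order
        Set<String> visited = new HashSet<>();
        for (var e : inDeg.entrySet())
            if (e.getValue() == 0) dfs(e.getKey(), adj, visited, order);
        return new ArrayList<>(order);
    }

    private static void dfs(String v, Map<String, List<String>> adj,
                            Set<String> visited, Deque<String> order) {
        if (!visited.add(v)) return;
        for (String w : adj.getOrDefault(v, List.of())) dfs(w, adj, visited, order);
        order.addFirst(v); // pushed after all descendants: reverse post-order
    }
}
```

For the diamond graph a→{b,c}, b→d, c→d, the result places a first and d last, matching the predecessor-before-successor property the embodiment relies on.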
And step S4: the DAG model factory sends the topology results corresponding to all the rules to a DAG model scheduler, delivers the Graph object containing the topology results to the Flink processing engine, and broadcasts the Graph object to the subtasks of each computing node through the Flink processing engine;
in the above steps, the Graph object not only contains the topology sorting result corresponding to the rule model, but also includes a count accumulator, a version number, and the like, so that the Graph object can be delivered to the Flink processing engine to perform rule matching on the audit log data.
Specifically, on the one hand, the DAG model factory sends the obtained topology results for all the rules to the DAG model scheduler, and the scheduler dynamically switches the rule content bound in the Graph object held by each computing node's subtasks (i.e., switches the sub-rules in the Graph object), so that the audit log data is matched against every rule one by one, improving matching accuracy. On the other hand, the DAG model factory delivers the Graph objects generated for all the rules (i.e., the topology results corresponding to the rules) to the Flink processing engine, which broadcasts them to the subtasks of each computing node. During broadcasting, the Flink processing engine creates multiple threads at the same time and processes multiple subtasks in parallel: one thread corresponds to one subtask, and each subtask holds one Graph object. The execution window then performs rule matching on the audit log data based on the Graph object corresponding to that window. This parallel processing meets second-level or even millisecond-level real-time requirements, can raise the overall data-processing throughput of the cluster by more than 80%, and allows very large volumes of audit log data to be processed to meet the data requirements of upper-layer services.
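The one-thread-per-subtask, one-Graph-per-subtask idea can be illustrated with plain Java concurrency. This is not the actual Flink API (in Flink itself the broadcast state pattern, e.g. `BroadcastProcessFunction`, would distribute the rules); here a thread pool stands in for the subtasks, each holding its own copy of a simplified "Graph" (a rule list), and matching is reduced to a substring check:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative sketch (not actual Flink API) of the broadcast idea:
// each parallel subtask receives its own copy of the Graph (here just
// a rule list) and matches its partition of the audit logs
// independently, so no cross-thread state is shared.
public class BroadcastSketch {
    public static List<String> matchInParallel(List<String> logs,
                                               List<String> rules,
                                               int parallelism) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        List<Future<List<String>>> futures = new ArrayList<>();
        for (int p = 0; p < parallelism; p++) {
            final int part = p;
            List<String> graphCopy = new ArrayList<>(rules); // per-subtask copy
            futures.add(pool.submit(() -> {
                List<String> hits = new ArrayList<>();
                for (int i = part; i < logs.size(); i += parallelism)
                    for (String rule : graphCopy)
                        if (logs.get(i).contains(rule))
                            hits.add(logs.get(i) + "->" + rule);
                return hits;
            }));
        }
        List<String> all = new ArrayList<>();
        for (var f : futures) all.addAll(f.get()); // gather subtask results
        pool.shutdown();
        return all;
    }
}
```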
Step S5: and the execution window sequentially matches the audit log data according to the rules bound in the Graph object to obtain rule matching results and sends the rule matching results to the DAG model scheduler, the DAG model scheduler updates the history matching result set according to the matching results, and the rules bound in the Graph object are switched.
Referring to fig. 2, the process of performing rule matching to obtain a rule matching result is as follows:
step S5-1: obtaining audit log data;
in the above step, the execution window first obtains audit log data from other systems or upper-layer components. Specifically, the audit log data sent by each distributed log collection agent is received and placed into a distributed message queue system such as Kafka, and the audit log data is then obtained from the message queue.
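The ingestion path can be modeled minimally as follows. This is a stand-in, not Kafka client code: a `BlockingQueue` plays the role of the distributed message queue, agents push records in, and the window drains a batch out. All names here are illustrative:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Simplified stand-in for the log ingestion path: distributed
// collection agents push audit log records into a message queue
// (Kafka in the text; a BlockingQueue models it here), and the
// execution window drains records from it in batches.
public class LogIngestion {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();

    public void receiveFromAgent(String auditLog) { // agent -> queue
        queue.offer(auditLog);
    }

    public List<String> drainForWindow(int max) {   // queue -> window
        List<String> batch = new ArrayList<>();
        queue.drainTo(batch, max);
        return batch;
    }
}
```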
Step S5-2: inquiring whether a matching rule corresponding to the data exists in a historical data rule matching result in an execution window according to the audit log data, and if so, directly sending the rule matching result to a downstream component; if not, matching the audit log data according to the rule bound in the Graph object, and sending the rule matching result to a DAG model scheduler;
in the above step, the execution window queries, according to the audit log data, whether a matching rule corresponding to that data already exists in the window's historical rule matching results. If so, the rule matching result is sent directly to the downstream component, preventing the same audit log from being matched repeatedly; if not, a matching action is triggered: the execution window matches the audit log data according to the rules bound in the Graph object, obtains the rule matching result, and sends it to the DAG model scheduler, which maintains the matching state within the window.
Step S5-3: and the DAG model scheduler updates the rule matching result in the history matching result set according to the rule matching result and dynamically switches the rules bound in the Graph object.
In the above step, the DAG model scheduler, on the one hand, updates the historical rule matching results according to the new rule matching result and, on the other hand, dynamically switches the rules bound in the Graph object, so that the audit log data is progressively matched against all the rules, improving matching accuracy. After rule matching finishes, the execution window sends the rule matching results of the audit log data to the downstream component for subsequent operations. In addition, all matching rule information for an audit log data set can be extracted from the DAG model scheduler, which facilitates queries.
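The dedup-then-match logic of steps S5-2 and S5-3 can be sketched as a history-backed matcher. The key (the raw log string), the match predicate (a substring check), and the `NO_MATCH` sentinel are simplified stand-ins for the scheduler-maintained state described above:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the in-window dedup step: before running full rule
// matching, look up the log in the history result set; on a hit,
// reuse the cached result, otherwise match and record it. The key
// and match functions are simplified stand-ins.
public class HistoryMatcher {
    private final Map<String, String> history = new HashMap<>();
    public int cacheHits = 0;

    public String match(String log, List<String> rules) {
        String cached = history.get(log);
        if (cached != null) { cacheHits++; return cached; } // reuse prior result
        String result = rules.stream().filter(log::contains)
                             .findFirst().orElse("NO_MATCH");
        history.put(log, result); // update history matching result set
        return result;
    }
}
```

A second occurrence of the same log within the window is answered from the history map without touching the rule set, which is the repeated-matching avoidance the embodiment describes.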
In addition, the size and number of the execution windows may be adjusted according to the real-time and throughput requirements of the service to increase the speed of rule matching.
Step S6: and after the execution window is processed, sending all rule matching results in the window to a downstream component.
Based on the same inventive concept, the invention further provides a Flink-based audit log rule matching system; referring to fig. 4, fig. 4 is a structural block diagram of the Flink-based audit log rule matching system provided by the embodiment of the present application. The system comprises:
the rule temporary storage module 11 is used for adding, by a user, a custom policy rule on a service page and sending the custom policy rule to a database for rule caching;
the rule data pulling module 12 is used for continuously pulling the rule data from the database through the Flink processing engine and loading the rule data into a preset DAG model factory;
the DAG model generation module 13 is used for generating, by the DAG model factory, a DAG model corresponding to the rule according to the rule data and performing topological sorting on the DAG model to generate a topology result;
the broadcasting module 14 is used for sending, by the DAG model factory, the topology results corresponding to all the rules to the DAG model scheduler, delivering the Graph object containing the topology results to the Flink processing engine, and broadcasting the Graph object to the subtasks of each computing node through the Flink processing engine;
the rule matching module 15 is used for matching, by the execution window, the audit log data in turn according to the rules bound in the Graph object to obtain rule matching results and sending them to the DAG model scheduler, which updates the history matching result set according to the matching results and switches the rules bound in the Graph object;
and the result sending module 16 is used for sending all rule matching results in the window to the downstream component after the execution window finishes processing.
Referring to fig. 5, fig. 5 is a block diagram of an electronic device according to an embodiment of the present disclosure. The electronic device comprises a memory 1, a processor 2 and a communication interface 3, which are electrically connected with each other, directly or indirectly, to realize the transmission or interaction of data. For example, the components may be electrically connected to each other via one or more communication buses or signal lines. The memory 1 can be used for storing software programs and modules, such as the program instructions/modules corresponding to the Flink-based audit log rule matching system provided by the embodiment of the present application, and the processor 2 executes various functional applications and data processing by running the software programs and modules stored in the memory 1. The communication interface 3 may be used for communication of signaling or data with other node devices.
The memory 1 may be, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like.
The processor 2 may be an integrated circuit chip having signal processing capabilities. The processor 2 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
It will be appreciated that the configuration shown in fig. 5 is merely illustrative, and that the electronic device may include more or fewer components than shown in fig. 5, or have a different configuration from that shown in fig. 5. The components shown in fig. 5 may be implemented in hardware, software, or a combination thereof.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist alone, or two or more modules may be integrated to form an independent part.
The functions may be stored in a computer-readable storage medium if they are implemented in the form of software functional modules and sold or used as separate products. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.
It will be evident to those skilled in the art that the present application is not limited to the details of the foregoing illustrative embodiments, and that the present application may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the application being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.