CN116010600A - Log classification method, device, equipment and medium - Google Patents

Log classification method, device, equipment and medium

Info

Publication number
CN116010600A
Authority
CN
China
Prior art keywords
log
processed
information
preset
classification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310026537.4A
Other languages
Chinese (zh)
Other versions
CN116010600B (en)
Inventor
王世峰
张彩霞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Topsec Technology Co Ltd
Beijing Topsec Network Security Technology Co Ltd
Beijing Topsec Software Co Ltd
Original Assignee
Beijing Topsec Technology Co Ltd
Beijing Topsec Network Security Technology Co Ltd
Beijing Topsec Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Topsec Technology Co Ltd, Beijing Topsec Network Security Technology Co Ltd, Beijing Topsec Software Co Ltd
Priority to CN202310026537.4A
Publication of CN116010600A
Application granted
Publication of CN116010600B
Legal status: Active
Anticipated expiration

Abstract

Embodiments of the disclosure relate to a log classification method, device, equipment and medium. The method comprises: acquiring a log to be processed; when the log to be processed fails to match a preset classification condition, determining a target historical event associated with the log to be processed, where the target historical event is a historical event determined by aggregating a plurality of historical logs and includes supplementary information about those historical logs; and determining a classification result for the log to be processed according to the supplementary information of the target historical event. Embodiments of the disclosure can thus handle logs that cannot be classified by the preset classification conditions, which widens the application scenarios of the log classification method, reduces the number of logs classified manually, raises the degree of automation of log classification, and reduces the consumption of human resources.

Description

Log classification method, device, equipment and medium
Technical Field
The disclosure relates to the field of computer technology, and in particular, to a method, a device, equipment and a medium for classifying logs.
Background
While a computer program runs, it generates logs that record its running state. The number of logs is usually large, and processing each log manually consumes substantial human resources. Classifying the logs improves the efficiency with which developers subsequently process them and thereby saves human resources.
In the related art, logs are generally classified according to a single classification criterion. Such a method cannot classify logs that do not meet that criterion, so its applicability is limited; the unclassified logs must subsequently be classified manually, the degree of automation is low, and considerable human resources are consumed.
Disclosure of Invention
In order to solve the technical problems described above or at least partially solve the technical problems described above, the present disclosure provides a log classification method, apparatus, device, and medium.
The embodiment of the disclosure provides a log classification method, which comprises the following steps:
acquiring logs to be processed;
under the condition that the matching of the log to be processed and a preset classification condition fails, determining a target historical event associated with the log to be processed; the target historical event is a historical event determined based on aggregation of a plurality of historical logs, and the target historical event comprises supplementary information of the historical logs;
and determining the classification result of the log to be processed according to the supplementary information of the target historical event.
The embodiment of the disclosure also provides a log classification device, which comprises:
the acquisition module is used for acquiring logs to be processed;
the first determining module is used for determining a target historical event associated with the log to be processed under the condition that the log to be processed fails to be matched with a preset classification condition; the target historical event is a historical event determined based on aggregation of a plurality of historical logs, and the target historical event comprises supplementary information of the historical logs;
and the second determining module is used for determining the classification result of the log to be processed according to the supplementary information of the target historical event.
The embodiment of the disclosure also provides an electronic device, which comprises: a processor; a memory for storing the processor-executable instructions; the processor is configured to read the executable instructions from the memory and execute the instructions to implement a log classification method as provided in an embodiment of the disclosure.
The present disclosure also provides a computer-readable storage medium storing a computer program for executing the log classification method as provided by the embodiments of the present disclosure.
Compared with the prior art, the technical solution provided by the embodiments of the disclosure has the following advantages. The log classification scheme acquires a log to be processed; when the log to be processed fails to match a preset classification condition, determines a target historical event associated with the log to be processed, where the target historical event is a historical event determined by aggregating a plurality of historical logs and includes supplementary information about those historical logs; and determines a classification result for the log to be processed according to the supplementary information of the target historical event. With this solution, when the log to be processed cannot be classified by the preset classification conditions, the associated target historical event is determined and the log is classified based on the supplementary information in that event. This introduces a further basis for classification, so that log classification is no longer limited to the preset classification conditions and logs that cannot be classified by them can still be processed. The application scenarios of the log classification method are widened, the number of logs classified manually is reduced, the degree of automation of log classification is raised, and the consumption of human resources is reduced.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure.
In order to more clearly illustrate the embodiments of the present disclosure or the solutions in the prior art, the drawings that are required for the description of the embodiments or the prior art will be briefly described below, and it will be obvious to those skilled in the art that other drawings can be obtained from these drawings without inventive effort.
FIG. 1 is a schematic flow chart of a log classification method according to an embodiment of the disclosure;
FIG. 2 is a flow chart of another log classification method according to an embodiment of the disclosure;
FIG. 3 is a flow chart of a log classification method according to an embodiment of the disclosure;
FIG. 4 is a flow chart of a log classification method according to an embodiment of the disclosure;
FIG. 5 is a schematic structural diagram of a log classification device according to an embodiment of the disclosure;
FIG. 6 is a schematic structural diagram of an electronic device according to an embodiment of the disclosure.
Detailed Description
In order that the above objects, features and advantages of the present disclosure may be more clearly understood, a further description of aspects of the present disclosure will be provided below. It should be noted that, without conflict, the embodiments of the present disclosure and features in the embodiments may be combined with each other.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure, but the present disclosure may be practiced otherwise than as described herein; it will be apparent that the embodiments in the specification are only some, but not all, embodiments of the disclosure.
In order to solve the above-mentioned problems, embodiments of the present disclosure provide a log classification method, which is described below with reference to specific embodiments.
Step 101, obtaining a log to be processed.
The log to be processed may be an alarm log that needs to be classified. Logs to be processed may be of various types, which this embodiment does not limit; for example, the log to be processed may be one or more of a system log, an application log, and a security log.
In the embodiments of the disclosure, the log to be processed may be acquired in various ways, which this embodiment does not limit. For example, the log classification device may actively fetch the log to be processed, or it may passively receive it. In an alternative embodiment, a log collector collects logs from application systems, servers, terminals, and the like, and feeds the collected logs into the log classification device via system logging (Syslog), an application programming interface (API), files, or the like; the log classification device takes the logs so obtained as the logs to be processed.
Step 102, determining a target historical event associated with the log to be processed under the condition that the log to be processed fails to be matched with a preset classification condition; the target historical event is a historical event which is determined based on aggregation of a plurality of historical logs, and the target historical event comprises supplementary information of the historical logs.
The preset classification condition is a condition set in advance for classifying the log to be processed. Various preset classification conditions are possible, which this embodiment does not limit; for example, the preset classification condition may include one or more of a preset source IP address, a preset file hash value, and a preset destination IP address in the intelligence information. A history log is a log that precedes the log to be processed in time, that is, a log generated before the log to be processed. A historical event may include the aggregation result of history logs and supplementary information related to that aggregation result. The supplementary information further describes the aggregation result and is obtained by investigating the aggregation result automatically or manually. Various kinds of supplementary information are possible, which this embodiment does not limit; for example, the supplementary information may include one or more of computer virus sample information, pcap packet information, and vulnerability scanning information. In the embodiments of the disclosure, history logs may be aggregated into historical events using existing techniques, and the specific process is not described in detail here. The target historical event is a historical event selected from a plurality of candidate historical events.
In the embodiments of the disclosure, the log classification device first tries to classify the log to be processed according to the preset classification condition. If the classification result cannot be determined that way, the device obtains a plurality of candidate historical events and determines, as the target historical event, the candidate historical event that is associated with the log to be processed. Because the target historical event is formed by aggregating history logs and adding supplementary information, it contains supplementary information that the logs themselves do not.
In some embodiments of the disclosure, the failure of the log to be processed to match the preset classification condition includes at least one of the following: the log to be processed fails to match the preset source Internet Protocol address; the log to be processed fails to match the preset file hash value; the log to be processed fails to match the preset destination Internet Protocol address.
The source Internet Protocol (IP) address is the IP address from which data is sent. The preset source IP address is a source IP address preset as being in an abnormal state, and may be determined according to IP reputation; it includes but is not limited to malicious IP addresses, anonymous proxy IP addresses, and IP addresses of computers infected with computer viruses. The preset source IP address can be understood as an IP address that initiates network attacks.
The file hash value is the value obtained by performing a hash calculation on a file. Various hash algorithms may be used, which this embodiment does not limit; for example, the hash algorithm may be the Message-Digest Algorithm 5 (MD5). The hash value of a file serves as an identifier of the file and corresponds to the file one to one. The preset file hash value is a hash value obtained by hashing predetermined abnormal files such as computer viruses and malicious programs; it is also called a sample hash value.
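As a rough illustration of the file hash value described above, the following sketch computes an MD5 digest with Python's standard hashlib module and looks it up in a set of sample hash values. The names file_md5, PRESET_FILE_HASHES, and hash_matches, and the sample content, are assumptions made for illustration and do not come from the disclosure.

```python
import hashlib


def file_md5(data: bytes) -> str:
    """Return the hexadecimal MD5 digest of a file's content."""
    return hashlib.md5(data).hexdigest()


# Hypothetical sample hash values of known abnormal files.
PRESET_FILE_HASHES = {file_md5(b"malicious-sample")}


def hash_matches(data: bytes) -> bool:
    """True if the content's hash is one of the preset file hash values."""
    return file_md5(data) in PRESET_FILE_HASHES
```

MD5 appears here only because the text names it as an example; another hash function, such as SHA-256 via hashlib.sha256, could be substituted without changing the lookup.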
The destination IP address is the IP address at which data is received. The preset destination IP address is a destination IP address preset as being in an abnormal state, and may be determined according to indicators of compromise (IOC). In a network attack, after a node in the network has been attacked and taken over, that node sends local files to the node that initiated the attack; the preset destination IP address may therefore be the IP address of the node that initiated the attack.
In the embodiments of the disclosure, the log to be processed may first be normalized, for example by regular-expression-based parsing or the like, and the source IP address, file hash value, destination IP address, and so on contained in the log to be processed are determined through this normalization. If the log to be processed does not contain a source IP address, or its source IP address is not found among the preset source IP addresses, the log to be processed is determined to fail to match the preset source IP address. If the log to be processed does not contain a file hash value, or its file hash value is not found among the preset file hash values, the log to be processed is determined to fail to match the preset file hash value. If the log to be processed does not contain a destination IP address, or its destination IP address is not found among the preset destination IP addresses, the log to be processed is determined to fail to match the preset destination IP address.
Optionally, the log to be processed is determined to fail to match the preset classification condition when it fails to match the preset source IP address, fails to match the preset file hash value, and fails to match the preset destination IP address.
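The matching logic above can be sketched as follows. This is a minimal sketch under stated assumptions: the raw log is assumed to consist of key=value pairs, the field names src_ip, file_hash, and dst_ip are hypothetical, and the preset lists hold illustrative values only. An empty result corresponds to the optional case in which all three matches fail.

```python
import re

# Hypothetical preset lists (illustrative values, not from the disclosure).
PRESET_SOURCE_IPS = {"203.0.113.7"}
PRESET_FILE_HASHES = {"d41d8cd98f00b204e9800998ecf8427e"}
PRESET_DEST_IPS = {"198.51.100.9"}

# Assumed raw log layout: whitespace-separated key=value pairs.
FIELD_RE = re.compile(r"(\w+)=(\S+)")


def normalize(raw_log: str) -> dict:
    """Parse a raw log line into a field dictionary."""
    return dict(FIELD_RE.findall(raw_log))


def match_preset_conditions(log: dict) -> dict:
    """Return the network information that matched a preset list.

    A missing field, or a value absent from the preset list, counts as a
    failed match for that field. An empty dict means every match failed.
    """
    presets = {
        "src_ip": PRESET_SOURCE_IPS,
        "file_hash": PRESET_FILE_HASHES,
        "dst_ip": PRESET_DEST_IPS,
    }
    return {
        field: log[field]
        for field, allowed in presets.items()
        if log.get(field) in allowed
    }
```

For example, a log whose source IP address is on the preset list yields that address as the matched network information, while a log with no listed value yields an empty dictionary and would proceed to the historical-event lookup.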
In an alternative embodiment, determining the target historical event associated with the log to be processed includes: extracting first field information corresponding to a preset field in the log to be processed; acquiring a plurality of candidate historical events and extracting second field information corresponding to the preset field in each candidate historical event; and, if the information similarity between the first field information and the second field information is greater than a preset information threshold, determining that candidate historical event to be the target historical event.
A field is information of a particular dimension and can be understood as the key of a key-value pair. The preset field is a field specified in advance; it is not limited and may be, for example, any one of a file hash value field, a vulnerability field, a proof-of-concept (PoC) script field, a source IP field, and a destination IP field. The first field information is the information corresponding to the preset field in the log to be processed, and the second field information is the information corresponding to the preset field in the candidate historical event; both can be understood as values of key-value pairs. The preset information threshold is a preset minimum value of the information similarity.
In this embodiment, the normalized log to be processed is matched against the preset field to determine the first field information corresponding to the preset field in the log to be processed. The log classification device may read the candidate historical events from a historical event library and determine the second field information corresponding to the preset field in each candidate historical event. The information similarity between the first field information and the second field information is then calculated. This embodiment does not limit how the information similarity is calculated. For example, the text similarity between the first field information and the second field information may be calculated and used as the information similarity. Alternatively, if the preset field is a field with a regional attribute, such as an IP address or a domain name, a first region corresponding to the first field information and a second region corresponding to the second field information may be determined, the region distance between the first region and the second region may be calculated, and the region distance may be used as the information similarity.
After the information similarity is determined, it is compared with the preset information threshold. If the information similarity is greater than the preset information threshold, the first field information and the second field information are similar, the log to be processed and the candidate historical event are strongly related in the dimension of the preset field, and the candidate historical event is determined to be the target historical event associated with the log to be processed.
In this scheme, determining the target historical event associated with the log to be processed extends the information available for the log to be processed and provides a basis for the subsequent determination of the classification result.
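The selection of target historical events by field similarity can be sketched as below. The disclosure does not fix a similarity measure, so this sketch assumes text similarity computed with difflib.SequenceMatcher from the Python standard library; the field names, the threshold value 0.8, and the function names are likewise assumptions.

```python
from difflib import SequenceMatcher

# Hypothetical names for the preset fields listed in the text.
PRESET_FIELDS = ("file_hash", "vulnerability", "poc_script", "src_ip", "dst_ip")
INFO_THRESHOLD = 0.8  # assumed preset information threshold


def info_similarity(first: str, second: str) -> float:
    """Text similarity in [0, 1]; 1.0 means the strings are identical."""
    return SequenceMatcher(None, first, second).ratio()


def find_target_events(log: dict, candidate_events: list) -> list:
    """Return candidate events related to the log on any preset field."""
    targets = []
    for event in candidate_events:
        for field in PRESET_FIELDS:
            first = log.get(field)     # first field information
            second = event.get(field)  # second field information
            if first and second and info_similarity(first, second) > INFO_THRESHOLD:
                targets.append(event)
                break  # one sufficiently similar field is enough
    return targets
```

A candidate is kept as soon as one preset field exceeds the threshold, which mirrors the per-field comparison in the text; requiring several fields to agree would be a stricter variant.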
Step 103, determining a classification result of the log to be processed according to the supplementary information of the target historical event.
It will be appreciated that, because a historical event is formed by aggregating history logs and adding supplementary information, it contains supplementary information that the logs themselves do not. In the embodiments of the disclosure, after the target historical event associated with the log to be processed has been determined, the supplementary information in the target historical event is extracted, logs to be processed that have the same supplementary information are placed in the same category, and the classification result of the log to be processed is thereby determined. Note that the classification result may be determined from supplementary information of a single dimension or of multiple dimensions, which this embodiment does not limit.
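Grouping logs by the supplementary information of their target historical events might look like the following sketch. The event layout (a "supplement" dictionary) and the dimension names are hypothetical; passing one dimension or several corresponds to classifying on a single dimension or on multiple dimensions.

```python
from collections import defaultdict


def classify_by_supplement(pairs, dimensions):
    """Group log ids whose target events share the same supplementary info.

    pairs: iterable of (log_id, target_event) tuples.
    dimensions: names of the supplementary-information dimensions to compare.
    """
    groups = defaultdict(list)
    for log_id, event in pairs:
        supplement = event.get("supplement", {})
        # The category key is the tuple of values on the chosen dimensions.
        key = tuple(supplement.get(d) for d in dimensions)
        groups[key].append(log_id)
    return dict(groups)


# Illustrative data only.
event_a = {"supplement": {"virus_sample": "sample-A", "vuln_scan": "scan-1"}}
event_b = {"supplement": {"virus_sample": "sample-B", "vuln_scan": "scan-1"}}
pairs = [("log1", event_a), ("log2", event_b), ("log3", event_a)]
by_sample = classify_by_supplement(pairs, ("virus_sample",))
```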
The log classification method provided by the embodiments of the disclosure comprises: acquiring a log to be processed; when the log to be processed fails to match a preset classification condition, determining a target historical event associated with the log to be processed, where the target historical event is a historical event determined by aggregating a plurality of historical logs and includes supplementary information about those historical logs; and determining a classification result for the log to be processed according to the supplementary information of the target historical event. With this solution, when the log to be processed cannot be classified by the preset classification conditions, the associated target historical event is determined and the log is classified based on the supplementary information in that event. This introduces a further basis for classification, so that log classification is no longer limited to the preset classification conditions and logs that cannot be classified by them can still be processed. The application scenarios of the log classification method are widened, the number of logs classified manually is reduced, the degree of automation of log classification is raised, and the consumption of human resources is reduced.
In some embodiments of the disclosure, the log classification method further comprises: if the log to be processed has no associated target historical event, determining a target history log associated with the log to be processed, combining the target history log with the log to be processed to obtain a joint log, taking the joint log as a new log to be processed, and returning to the step of matching the new log to be processed against the preset classification condition, until a preset stop condition is met.
In this embodiment, if, for every preset field, the information similarity between the first field information of the log to be processed and the second field information of the candidate historical events is not greater than the preset information threshold, it is determined that no target historical event is associated with the log to be processed. A plurality of candidate history logs is then obtained, the log similarity between each candidate history log and the log to be processed is calculated, and the candidate history logs whose log similarity is greater than a preset log threshold are determined to be target history logs. The log classification device may calculate the text similarity between the log to be processed as a whole and the candidate history log as a whole and take the result as the log similarity. Alternatively, it may calculate the similarity between the log to be processed and the candidate history log for each information field, obtaining a plurality of similarities, and take the maximum of these as the log similarity.
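The two ways of computing log similarity described above can be sketched as follows, again assuming text similarity from difflib.SequenceMatcher as a stand-in for whatever measure an implementation chooses. The function names, the serialization of a log as sorted key=value text, and the threshold 0.8 are assumptions.

```python
from difflib import SequenceMatcher


def text_similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a, b).ratio()


def whole_log_similarity(log: dict, candidate: dict) -> float:
    """Compare the two logs as whole texts (fields serialized in key order)."""
    def as_text(d):
        return " ".join(f"{k}={v}" for k, v in sorted(d.items()))
    return text_similarity(as_text(log), as_text(candidate))


def max_field_similarity(log: dict, candidate: dict) -> float:
    """Compare field by field and keep the maximum similarity."""
    shared = log.keys() & candidate.keys()
    return max(
        (text_similarity(str(log[k]), str(candidate[k])) for k in shared),
        default=0.0,  # no shared field: treat as unrelated
    )


def find_target_history_logs(log, candidates, threshold=0.8,
                             similarity=max_field_similarity):
    """Return candidate history logs whose similarity exceeds the threshold."""
    return [c for c in candidates if similarity(log, c) > threshold]
```

Taking the maximum over fields makes a single strongly matching field sufficient, whereas the whole-text variant rewards overall resemblance; which is preferable depends on how uniform the log formats are.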
After a target history log associated with the log to be processed has been determined among the candidate history logs, the log to be processed and the target history log are combined into a single joint log that contains both. The joint log is taken as a new log to be processed, the method returns to matching the log to be processed against the preset classification condition, and the next step is determined according to the matching result, until a preset stop condition is met.
In some embodiments, the preset stop condition is met when: the new log to be processed successfully matches the preset classification condition; or a target historical event associated with the new log to be processed is determined; or it is determined that the new log to be processed fails to match the candidate history logs.
It can be understood that, if the new log to be processed successfully matches the preset classification condition, the classification result of the new log to be processed can be determined according to the preset classification condition, and that result is taken as the classification result of the original log to be processed.
If the new log to be processed fails to match the preset classification condition and a target historical event associated with it is found among the candidate historical events, the classification result of the new log to be processed can be determined according to the supplementary information in the target historical event, and that result is taken as the classification result of the original log to be processed.
If the new log to be processed fails to match the preset classification condition, no associated target historical event is found among the candidate historical events, and no associated target history log is found among the candidate history logs, the new log to be processed currently cannot be classified. The log classification device may then generate and send a classification failure prompt so that a user can process the original log to be processed manually.
In the above scheme, when the log to be processed has no associated target historical event, a joint log is formed from the log to be processed and its target history log, and matching against the preset classification condition continues on the basis of the joint log. This provides a further way, based on the target history log, of determining the classification result of the log to be processed, which further raises the classification success rate and the degree of automation of log classification and further reduces the consumption of human resources.
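The loop with its three stop conditions can be sketched as follows. This is a sketch under stated assumptions, not the disclosed implementation: a joint log is modeled as a list of member logs, and the three lookups are passed in as hypothetical callables. The history-log lookup is assumed never to return a log that is already a member of the joint log, since otherwise the loop would not terminate; the disclosure does not discuss this point.

```python
def classify_with_fallback(log, match_preset, find_target_event,
                           find_target_history_log):
    """Return (basis, value) for the original log to be processed."""
    joint = [log]  # assumed model: a joint log is a list of member logs
    while True:
        category = match_preset(joint)
        if category is not None:          # stop: preset condition matched
            return ("preset", category)
        event = find_target_event(joint)
        if event is not None:             # stop: target historical event found
            return ("event", event)
        history = find_target_history_log(joint)
        if history is None:               # stop: no history log matched
            return ("manual", None)       # hand over for manual processing
        joint = joint + [history]         # new log to be processed


# Toy lookups, for illustration only.
HISTORY_LOGS = [{"id": "h1", "user": "alice", "src_ip": "203.0.113.7"}]


def match_preset(joint):
    hit = any(m.get("src_ip") == "203.0.113.7" for m in joint)
    return "known-bad-source" if hit else None


def find_target_event(joint):
    return None  # this toy example has no historical events


def find_target_history_log(joint):
    users = {m["user"] for m in joint if "user" in m}
    for h in HISTORY_LOGS:
        if h not in joint and h.get("user") in users:
            return h
    return None
```

In the toy data, a log that carries only a user name cannot be classified directly, but merging it with a history log for the same user exposes a source IP address that matches the preset list.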
In some embodiments of the disclosure, the log classification method further comprises: when the log to be processed successfully matches the preset classification condition, determining the classification result of the log to be processed based on the successfully matched network information in the log to be processed, where the network information includes at least one of a source Internet Protocol address, a file hash value, and a destination Internet Protocol address.
In this embodiment, if the log to be processed successfully matches a preset source IP address in the preset classification condition, the log is placed in the category corresponding to that preset source IP address. The categories may correspond one to one with the preset source IP addresses, or a category may be the region to which the preset source IP address belongs. If the log to be processed successfully matches a preset destination IP address in the preset classification condition, the log is placed in the category corresponding to that preset destination IP address. Again, the categories may correspond one to one with the preset destination IP addresses, or a category may be the region to which the preset destination IP address belongs.
Optionally, if the log to be processed successfully matches a preset source IP address or a preset destination IP address in the preset classification condition, the log may be placed in a category determined jointly by its source IP address and its destination IP address.
If the log to be processed successfully matches a preset file hash value in the preset classification condition, the log is placed in the category corresponding to that preset file hash value; the categories may correspond one to one with the preset file hash values.
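One way to turn matched network information into a category, either one to one or by region, is sketched below using the standard ipaddress module. The region table, the field names, and the function names are hypothetical.

```python
import ipaddress

# Hypothetical region table: network -> region label.
REGION_BY_NETWORK = {
    ipaddress.ip_network("203.0.113.0/24"): "region-A",
    ipaddress.ip_network("198.51.100.0/24"): "region-B",
}


def region_of(ip: str) -> str:
    """Return the region label of an IP address, or 'unknown'."""
    addr = ipaddress.ip_address(ip)
    for network, region in REGION_BY_NETWORK.items():
        if addr in network:
            return region
    return "unknown"


def category_for_match(matched: dict, by_region: bool = False) -> tuple:
    """Build a category key from the successfully matched network information.

    With by_region=False each preset value gets its own category; with
    by_region=True IP addresses are replaced by their region, while file
    hash values always map one to one.
    """
    parts = []
    for field in ("src_ip", "dst_ip", "file_hash"):
        if field in matched:
            value = matched[field]
            if by_region and field != "file_hash":
                value = region_of(value)
            parts.append((field, value))
    return tuple(parts)
```

When both a source and a destination address are present in the matched information, the resulting key contains both, which corresponds to the optional joint classification by source and destination IP address.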
Optionally, determining the classification result of the log to be processed based on the successfully matched network information in the log to be processed includes: matching the log to be processed with preset intelligence information to determine association information of the log to be processed, wherein the association information includes stage information and/or initiator information; and determining the classification result of the log to be processed according to the association information and the network information.
Intelligence information, also called threat intelligence, is evidence-based knowledge that includes one or more of context information, mechanism information, indicator information, implication information, and actionable advice. The preset intelligence information is intelligence information configured in advance. The association information may be network security information associated with the log to be processed. The stage information characterizes the stage of the network attack to which the log to be processed corresponds, and includes one or more of reconnaissance stage information, weaponization stage information, payload delivery stage information, exploitation stage information, installation stage information, command and control stage information, and goal achievement stage information. The initiator information may be information about the initiator of the network attack, for example the initiator of an advanced persistent threat (APT) attack.
In the embodiment of the disclosure, the preset intelligence information may be matched against the source IP address and/or the destination IP address in the log to be processed to determine the stage information corresponding to the log to be processed and/or the initiator information of the initiator of the network attack.
Further, the log to be processed may be divided into the classification corresponding to its stage information, and the classifications may correspond to the stage information one to one. Alternatively, the log to be processed may be divided into the classification corresponding to its initiator information, and the classifications may correspond to the initiator information one to one. Alternatively, the log to be processed may be divided into a classification that is jointly determined by the stage information and the initiator information.
In an alternative embodiment, the classification result of the log to be processed may be determined together based on the network information of the log to be processed and the association information, for example, the log to be processed with the same source IP address, destination IP address, and stage information may be classified into the same classification. Alternatively, the logs to be processed having the same source IP address, destination IP address, file hash value, domain name, and initiator information may be divided into the same category.
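The grouping described above can be sketched as a composite-key lookup. This is a minimal illustration rather than the claimed implementation: the field names (`src_ip`, `dst_ip`, `stage`, and so on) and the sample addresses are assumptions, since the disclosure does not define a log schema.

```python
from collections import defaultdict

# Hypothetical field names; the disclosure does not fix a log schema.
STAGE_KEY = ("src_ip", "dst_ip", "stage")
INITIATOR_KEY = ("src_ip", "dst_ip", "file_hash", "domain", "initiator")


def group_logs(logs, key_fields):
    """Put logs that share the same values for key_fields into one classification."""
    groups = defaultdict(list)
    for log in logs:
        key = tuple(log.get(field) for field in key_fields)
        groups[key].append(log)
    return dict(groups)


# Sample logs use documentation-only addresses.
logs = [
    {"id": "A", "src_ip": "198.51.100.7", "dst_ip": "10.0.0.5", "stage": "delivery"},
    {"id": "B", "src_ip": "198.51.100.7", "dst_ip": "10.0.0.5", "stage": "delivery"},
    {"id": "C", "src_ip": "198.51.100.7", "dst_ip": "10.0.0.5", "stage": "installation"},
]
by_stage = group_logs(logs, STAGE_KEY)
```

Passing `INITIATOR_KEY` instead of `STAGE_KEY` would give the second variant, in which logs must also agree on file hash value, domain name, and initiator information.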
Optionally, the user may preset list screening information, where the list screening information includes one or more of a source IP address, a destination IP address, a source port, a destination port, and a domain name, and the log classification device may divide the logs to be processed that satisfy the list screening information into the same classification.
In this scheme, when the log to be processed is successfully matched with the preset classification condition, its classification result is determined from multiple dimensions such as the network information and the association information, which improves the accuracy of the classification result. Logs generated before a network attack can be classified through a successfully matched source IP address, logs generated during the attack through a successfully matched file hash value, and logs generated after the attack through a successfully matched destination IP address, so that the logs to be processed are classified comprehensively.
Fig. 2 is a flow chart of another log classification method according to an embodiment of the disclosure, as shown in fig. 2, where the log classification method includes:
Step 201, determining whether the log to be processed is successfully matched with the preset classification condition. If yes, go to step 202; otherwise, go to step 203.
Step 202, determining a classification result of the log to be processed according to a matching result of the log to be processed and the preset classification condition.
Step 203, determining whether the log to be processed has an associated target historical event. If yes, go to step 204; otherwise, go to step 205.
Step 204, determining a classification result of the log to be processed according to the supplementary information of the target historical event.
Step 205, determining whether the log to be processed has an associated target history log. If yes, go to step 206; otherwise, go to step 207.
Step 206, combining the log to be processed and the target history log into a combined log, and determining the combined log as a new log to be processed. Execution returns to step 201.
Step 207, ignoring the log to be processed.
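The control flow of steps 201 to 207 can be sketched as a loop. This is a sketch under stated assumptions: the three callbacks, the `supplementary` and `classification` keys, the dictionary merge used for step 206, and the `max_rounds` guard are all hypothetical and are not taken from the disclosure.

```python
def classify(log, match_conditions, find_event, find_history_log, max_rounds=10):
    """Return a classification label, or None when the log is ignored (step 207)."""
    for _ in range(max_rounds):  # assumed guard against endless merging
        matched = match_conditions(log)               # step 201
        if matched is not None:
            return matched                            # step 202
        event = find_event(log)                       # step 203
        if event is not None:
            return event["supplementary"]["classification"]  # step 204
        history = find_history_log(log)               # step 205
        if history is None:
            return None                               # step 207
        log = {**history, **log}                      # step 206: combined log
    return None


# Hypothetical data: one known source address and one stored history log.
KNOWN_SOURCES = {"203.0.113.9": "class-alpha"}
HISTORY = [{"session": "s1", "src_ip": "203.0.113.9"}]


def match_conditions(log):
    return KNOWN_SOURCES.get(log.get("src_ip"))


def find_event(log):
    return None  # no historical events in this small example


def find_history_log(log):
    for old in HISTORY:
        if old["session"] == log.get("session"):
            return old
    return None


result = classify({"session": "s1", "msg": "login failed"},
                  match_conditions, find_event, find_history_log)
```

In this example the first pass fails because the log carries no source IP address; combining it with the history log of the same session supplies the address, and the second pass matches.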
Fig. 3 is a flow chart of another log classification method according to an embodiment of the disclosure, as shown in fig. 3, in some embodiments of the disclosure, the log classification method further includes:
Step 301, determining a plurality of classification results corresponding to the plurality of logs to be processed.
In this embodiment, the log classification device may perform classification processing on each of the plurality of logs to be processed to obtain a classification result corresponding to each of the logs to be processed, so as to divide the plurality of logs to be processed into a plurality of classifications.
For example, if the logs to be processed include logs A, B, C, D, E, and F, then according to the above log classification method the classification results of logs A, B, and C are all classification α, and the classification results of logs D, E, and F are all classification β.
Step 302, dividing the multiple classification results into a first number of intermediate results according to the preset aggregation information, determining the intermediate results as new classification results, updating the preset aggregation information to obtain new preset aggregation information, and returning to divide the new classification results into a second number of new intermediate results according to the new preset aggregation information until the first number is the same as the second number.
The aggregation information is the information on which further aggregation of the classification results is based. The preset aggregation information is aggregation information configured in advance and is not limited in this embodiment; for example, it may be one or more of a source IP address, a destination IP address, a port, a protocol, and a data flow determined by Domain Name System (DNS) resolution. An intermediate result is a result obtained by aggregating classification results.
In this embodiment, after the plurality of classification results are determined, classification results that share the same preset aggregation information are merged into the same intermediate result, which yields a first number of intermediate results. The first number of intermediate results are then taken as new classification results, and the preset aggregation information is updated to new preset aggregation information. The new classification results that share the same new preset aggregation information are merged into the same intermediate result, which yields a second number of new intermediate results. If the first number equals the second number, the new classification results cannot be aggregated further on the basis of the new preset aggregation information, and aggregation stops. Otherwise, the aggregation has reduced the classification results to a smaller number of intermediate results, so the preset aggregation information continues to be updated and the classification results continue to be aggregated according to the updated preset aggregation information.
For example, if the classification results of logs A, B, and C are all classification α, and the classification results of logs D, E, and F are all classification β, classification α and classification β may be aggregated into classification γ according to the preset aggregation information.
In this scheme, the classification results are further aggregated, which improves the degree of aggregation of the logs to be processed.
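The iterative aggregation of step 302 can be sketched as follows. This is an illustration under assumptions: "updating the preset aggregation information" is modelled as moving to the next field in a hypothetical list, each classification result is a list of log dictionaries, and two results merge only when every log in each of them carries the same value for the current field.

```python
def merge_by(groups, field):
    """Merge groups that share one value of `field`; other groups stay separate."""
    merged = {}
    for index, group in enumerate(groups):
        values = {log[field] for log in group if field in log}
        # Groups with mixed or missing values get a unique key and are not merged.
        key = next(iter(values)) if len(values) == 1 else ("unmerged", index)
        merged.setdefault(key, []).extend(group)
    return list(merged.values())


def aggregate(groups, fields):
    """Aggregate round by round until the number of results stops changing."""
    count = None
    for field in fields:  # each round uses updated aggregation information
        groups = merge_by(groups, field)
        if len(groups) == count:  # first number equals second number: stop
            break
        count = len(groups)
    return groups


# Classification alpha (logs A, B, C) and beta (logs D, E, F), hypothetical data.
alpha = [{"id": i, "dst_ip": "10.0.0.5", "protocol": "tcp"} for i in "ABC"]
beta = [{"id": i, "dst_ip": "10.0.0.5", "protocol": "tcp"} for i in "DEF"]
gamma = aggregate([alpha, beta], ["dst_ip", "protocol"])
```

Here the first round merges α and β on the shared destination IP address, and the second round leaves the count unchanged, so the loop stops with a single result corresponding to classification γ.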
The log classification method in the embodiment of the present disclosure is further described below by way of a specific example. Fig. 4 is a flowchart of another log classification method according to an embodiment of the disclosure, as shown in fig. 4, where the log classification method includes:
Step 401, obtaining a log to be processed.
In this embodiment, the log classification device may collect system logs, application logs, security logs, and the like by active collection or passive reception, normalize the logs to be processed according to a preset data specification, and perform real-time correlation analysis and data analysis on the normalized logs.
In an alternative implementation manner, the log collector is used for collecting log data of equipment such as a service application system, a server, a terminal and the like, and the log to be processed is input into the log classification device through a system log (Syslog), an API, a file and the like.
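The normalization mentioned above could look like the following sketch. The alias table is purely hypothetical; the disclosure only says that logs are processed according to a preset data specification and does not describe that specification.

```python
# Hypothetical mapping from device-specific field names to a common schema.
FIELD_ALIASES = {
    "src": "src_ip",
    "source_address": "src_ip",
    "dst": "dst_ip",
    "destination_address": "dst_ip",
    "md5": "file_hash",
}


def normalize(raw):
    """Rename known fields, lower-case field names, and trim string values."""
    normalized = {}
    for name, value in raw.items():
        key = FIELD_ALIASES.get(name.lower(), name.lower())
        normalized[key] = value.strip() if isinstance(value, str) else value
    return normalized


record = normalize({"SRC": " 198.51.100.7 ", "dst": "10.0.0.5", "Port": 443})
```

Normalizing before matching means that later steps can look up a single field name regardless of which device produced the log.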
Step 402, matching the log to be processed with the IP reputation, the preset file hash value, and the indicator of compromise (IOC) information in the intelligence information. If the matching succeeds, step 403 is executed; if the matching fails, step 404 is executed.
Step 403, extracting the network information in the log to be processed, and classifying the log to be processed by the network information and/or the association information of the log to be processed, wherein the association information includes stage information and/or initiator information.
The source IP address in the log to be processed is matched with the IP reputation; if the matching succeeds, source attribute information in the log to be processed is extracted. The source attribute information includes, but is not limited to: autonomous system number (ASN) domain, geographical location of the source IP, whether a proxy is present, network egress type, historically bound domain names, currently bound domain name, port, and service.
The file hash value in the log to be processed is matched with the preset file hash value. If the matching succeeds, whether the file corresponding to the log to be processed is an abnormal file is determined according to the preset file corresponding to the preset file hash value; if it is an abnormal file, the abnormality type, the abnormal code family, whether the file has been used in a targeted attack, the network behavior, and the like are further determined. The abnormality types may include computer viruses, remote control programs, and the like; the network behavior may include abnormal access, network attacks, and the like.
The destination IP address in the log to be processed is matched with the IOC information; if the matching succeeds, destination attribute information in the log to be processed is extracted. The destination attribute information includes, but is not limited to: domain name, uniform resource locator (URL), IP, mailbox, and file hash value.
In this embodiment, logs to be processed that have the same extracted network information are divided into the same classification, which gives the classification result of each log to be processed. Alternatively, the stage information corresponding to each log to be processed is determined according to the intelligence information, and logs to be processed with the same stage information are divided into the same classification.
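The three lookups of steps 402 and 403 can be sketched with in-memory tables. Everything in the tables is made up for illustration (addresses, hash value, attribute names); a real deployment would query whatever intelligence source is configured, which the disclosure does not specify.

```python
# Hypothetical intelligence tables keyed by indicator value.
IP_REPUTATION = {"198.51.100.7": {"asn": "AS64500", "proxy": True}}
FILE_HASHES = {"f" * 32: {"abnormal": True, "type": "computer virus"}}
IOC_DESTINATIONS = {"203.0.113.20": {"domain": "c2.example"}}


def match_intelligence(log):
    """Return the attributes found for every indicator in the log that matches."""
    hits = {}
    if log.get("src_ip") in IP_REPUTATION:
        hits["source"] = IP_REPUTATION[log["src_ip"]]
    if log.get("file_hash") in FILE_HASHES:
        hits["file"] = FILE_HASHES[log["file_hash"]]
    if log.get("dst_ip") in IOC_DESTINATIONS:
        hits["destination"] = IOC_DESTINATIONS[log["dst_ip"]]
    return hits


hits = match_intelligence(
    {"src_ip": "198.51.100.7", "dst_ip": "10.0.0.5", "file_hash": "f" * 32}
)
```

An empty result corresponds to the failed match that sends the log on to step 404.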
Specific classification methods include, but are not limited to, the following:
Method one: determining the classification result of the log to be processed according to the source IP address and/or the destination IP address of the log to be processed.
If the source IP addresses and the destination IP addresses of multiple logs to be processed match the same kind of intelligence information, those logs are divided into the same classification. The same kind of intelligence information may be intelligence information for the same region or for the same initiator.
If the source IP addresses of multiple logs to be processed match the same kind of intelligence information, those logs are divided into the same classification.
If the destination IP addresses of multiple logs to be processed match the same kind of intelligence information, those logs are divided into the same classification.
Method two: determining the classification result of the log to be processed according to the stage information, the source IP address, and the destination IP address.
The stage information of each log to be processed is determined according to the intelligence information, and logs to be processed with the same source IP address, destination IP address, and stage information are divided into the same classification.
Method three: determining the classification result of the log to be processed according to the source IP address, the destination IP address, the file hash value, the domain name, and the initiator information.
The initiator information of each log to be processed is determined according to the intelligence information, and logs to be processed with the same source IP address, destination IP address, file hash value, domain name, and initiator information are divided into the same classification.
Method four: classifying the logs to be processed according to list screening information. Specifically, the logs to be processed that are successfully matched with the list screening information may be divided into the same classification, where the list screening information may include: source IP address, destination IP address, source port, destination port, and domain name.
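Method four can be sketched as a simple filter. The rule that a log matches only when every configured screening field agrees is an assumption, as are the field names; the disclosure does not say how partial matches are treated.

```python
def matches_screen(log, screen):
    """True when every field set in the screening information equals the log's value."""
    return all(log.get(field) == value for field, value in screen.items())


# Hypothetical screening information set by the user.
screen = {"dst_ip": "10.0.0.5", "dst_port": 445}
logs = [
    {"id": "A", "src_ip": "198.51.100.7", "dst_ip": "10.0.0.5", "dst_port": 445},
    {"id": "B", "src_ip": "198.51.100.8", "dst_ip": "10.0.0.5", "dst_port": 445},
    {"id": "C", "src_ip": "198.51.100.7", "dst_ip": "10.0.0.5", "dst_port": 80},
]
same_class = [log["id"] for log in logs if matches_screen(log, screen)]
```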
Step 404, matching the log to be processed with candidate historical events. If the log to be processed is successfully matched with a target historical event among the candidate historical events, the supplementary information in the target historical event is extracted and the log to be processed is classified according to the supplementary information. If the matching with the candidate historical events fails, the log to be processed is matched with candidate history logs. If it is successfully matched with a target history log among the candidate history logs, the log to be processed and the target history log are combined into an associated log, and the associated log is used as a new log to be processed and matched again with the IP reputation, the preset file hash value, and the IOC information in the intelligence information. If the matching with the candidate history logs fails, the log to be processed is ignored.
Optionally, a log to be processed that fails to match the IP reputation, the preset file hash value, and the IOC information in the intelligence information, or that is difficult to classify even after a successful match, may be input into an intelligent algorithm to assist the judgment. The intelligent algorithm may be obtained by training a neural network model such as a multi-layer feed-forward network trained with back propagation (BP).
Step 405, aggregating the multiple classification results into a new classification result according to the preset aggregation information.
After the plurality of classification results corresponding to the plurality of logs to be processed are determined, classification results with the same preset aggregation information are placed in the same classification and aggregated into new classification results, and this is repeated until the number of classification results before aggregation equals the number of new classification results after aggregation. A new classification result to be further aggregated may be determined as a population.
Step 406, determining the logs to be processed included in each classification according to the classification result of each log to be processed.
This scheme provides classification based on historical events and history logs and is not limited to classification by simple rules, which raises the success rate of classifying logs to be processed. Matching the logs to be processed against the preset source IP address, the preset file hash value, and the preset destination IP address improves classification quality and reduces the probability of classification errors. Deep classification and aggregation of the logs to be processed is achieved on the basis of intelligence matching: logs generated before an attack are aggregated through the preset source IP address, logs generated during the attack through the preset file hash value, and logs generated after the attack through the preset destination IP address, so that complete security event normalization can finally be achieved. Through the historical events and history logs, logs to be processed that cannot be classified on the basis of the preset classification condition can be analyzed and judged automatically.
Fig. 5 is a schematic structural diagram of a log classification device according to an embodiment of the disclosure, where the device may be implemented by software and/or hardware, and may be generally integrated in an electronic device. As shown in fig. 5, the apparatus includes:
an obtaining module 501, configured to obtain a log to be processed;
a first determining module 502, configured to determine a target historical event associated with the log to be processed if the log to be processed fails to match with a preset classification condition; the target historical event is a historical event determined based on aggregation of a plurality of history logs, and the target historical event comprises supplementary information of the history logs;
a second determining module 503, configured to determine a classification result of the log to be processed according to the supplementary information of the target historical event.
Optionally, the failure of matching the log to be processed with the preset classification condition includes at least one of the following: the log to be processed fails to match with a preset source internet protocol address; the log to be processed fails to be matched with a hash value of a preset file; and the matching of the log to be processed and a preset target Internet protocol address fails.
Optionally, the first determining module 502 is configured to:
Extracting first field information corresponding to a preset field in the log to be processed;
and acquiring a plurality of candidate historical events, extracting second field information corresponding to the preset field in the candidate historical event aiming at each candidate historical event, and determining the candidate historical event as the target historical event if the information similarity of the first field information and the second field information is larger than a preset information threshold value.
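The threshold test performed by the first determining module can be sketched as below. The disclosure does not define the similarity measure, so this sketch assumes the fraction of preset fields whose values are equal; the field list, the threshold of 0.5, and the event layout are likewise hypothetical.

```python
PRESET_FIELDS = ("src_ip", "dst_ip", "domain")  # assumed preset fields
THRESHOLD = 0.5                                  # assumed preset information threshold


def field_similarity(first, second):
    """Fraction of preset fields that are present and equal in both records."""
    same = sum(
        1 for f in PRESET_FIELDS
        if first.get(f) is not None and first.get(f) == second.get(f)
    )
    return same / len(PRESET_FIELDS)


def find_target_event(log, candidate_events):
    """Return the first candidate event whose similarity exceeds the threshold."""
    first = {f: log.get(f) for f in PRESET_FIELDS}        # first field information
    for event in candidate_events:
        second = {f: event.get(f) for f in PRESET_FIELDS}  # second field information
        if field_similarity(first, second) > THRESHOLD:
            return event
    return None


events = [
    {"name": "e1", "src_ip": "192.0.2.1", "dst_ip": "10.0.0.9", "domain": "a.example"},
    {"name": "e2", "src_ip": "198.51.100.7", "dst_ip": "10.0.0.5", "domain": "b.example"},
]
target = find_target_event({"src_ip": "198.51.100.7", "dst_ip": "10.0.0.5"}, events)
```

In the example, two of the three preset fields agree with event `e2`, so its similarity of about 0.67 exceeds the assumed threshold.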
Optionally, the apparatus further includes:
and the joint module is used for determining the target history log associated with the log to be processed if the log to be processed does not have the associated target history event, combining the target history log with the log to be processed to obtain a joint log, taking the joint log as a new log to be processed, and returning to match the new log to be processed with the preset classification condition until a preset stop condition is met.
Optionally, the preset stop condition being met includes: the new log to be processed is successfully matched with the preset classification condition; or a target historical event associated with the new log to be processed is determined; or the new log to be processed fails to match the candidate history logs.
Optionally, the apparatus further includes:
the third determining module is used for determining a classification result of the log to be processed based on the network information successfully matched in the log to be processed under the condition that the log to be processed is successfully matched with the preset classification condition; wherein the network information includes at least one of a source internet protocol address, a file hash value, and a target internet protocol address.
Optionally, the third determining module is configured to:
matching the log to be processed with preset intelligence information to determine the association information of the log to be processed; wherein the association information includes stage information and/or initiator information;
and determining a classification result of the log to be processed according to the association information and the network information.
Optionally, the apparatus further comprises:
a fourth determining module, configured to determine a plurality of classification results corresponding to the plurality of logs to be processed;
the processing module is used for dividing the plurality of classification results into a first number of intermediate results according to preset aggregation information, determining the intermediate results as new classification results, updating the preset aggregation information to obtain new preset aggregation information, and returning to divide the new classification results into a second number of new intermediate results according to the new preset aggregation information until the first number is the same as the second number.
The log classification device provided by the embodiment of the disclosure can execute the log classification method provided by any embodiment of the disclosure, and has the corresponding functional modules and beneficial effects of the execution method.
Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the disclosure. As shown in fig. 6, the electronic device 600 includes one or more processors 601 and memory 602.
The processor 601 may be a central processing unit (CPU) or other form of processing unit having log classification capability and/or instruction execution capability, and may control other components in the electronic device 600 to perform desired functions.
The memory 602 may include one or more computer program products that may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory (cache), and the like. The non-volatile memory may include, for example, read-only memory (ROM), hard disk, flash memory, and the like. One or more computer program instructions may be stored on the computer-readable storage medium and executed by the processor 601 to implement the log classification method and/or other desired functions of the embodiments of the present disclosure described above. Various contents such as an input signal, a signal component, a noise component, and the like may also be stored in the computer-readable storage medium.
In one example, the electronic device 600 may further include an input device 603 and an output device 604, which are interconnected by a bus system and/or other form of connection mechanism (not shown).
In addition, the input device 603 may also include, for example, a keyboard, a mouse, and the like.
The output device 604 may output various information to the outside, including the determined distance information, direction information, and the like. The output device 604 may include, for example, a display, speakers, a printer, and a communication network and remote output devices connected thereto, etc.
Of course, only some of the components of the electronic device 600 that are relevant to the present disclosure are shown in fig. 6, with components such as buses, input/output interfaces, etc. omitted for simplicity. In addition, the electronic device 600 may include any other suitable components depending on the particular application.
In addition to the methods and apparatus described above, embodiments of the present disclosure may also be a computer program product comprising computer program instructions which, when executed by a processor, cause the processor to perform the log classification method provided by the embodiments of the present disclosure.
The computer program product may write program code for performing the operations of embodiments of the present disclosure in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device, partly on a remote computing device, or entirely on the remote computing device or server.
Further, embodiments of the present disclosure may also be a computer-readable storage medium having stored thereon computer program instructions which, when executed by a processor, cause the processor to perform the log classification method provided by the embodiments of the present disclosure.
The computer-readable storage medium may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may include, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium would include the following: an electrical connection having one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
It should be noted that in this document, relational terms such as "first" and "second" and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The foregoing is merely a specific embodiment of the disclosure to enable one skilled in the art to understand or practice the disclosure. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the disclosure. Thus, the present disclosure is not intended to be limited to the embodiments shown and described herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (11)

Application number: CN202310026537.4A; Priority date: 2023-01-09; Filing date: 2023-01-09; Title: Log classification method, device, equipment and medium; Status: Active; Publication: CN116010600B (en)

Publications (2)

Publication Number; Publication Date
CN116010600A (en); 2023-04-25
CN116010600B (en); 2023-09-26

Family

ID=86026638

Country Status (1)

Country; Link
CN (1); CN116010600B (en)

Legal Events

Code; Title
PB01; Publication
SE01; Entry into force of request for substantive examination
GR01; Patent grant
