Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the prior art, detecting abnormal sessions requires a system built on very high-specification equipment to perform feature analysis and matching on traffic. Such a system nevertheless has low detection accuracy for abnormal sessions and requires a large amount of manpower to merge, compress and mine all the sessions, so the investment is large and the effect is poor.
In order to solve the above problem, the present invention provides an abnormal session detection method. Fig. 1 is a flowchart of the abnormal session detection method provided in the embodiment of the present invention. As shown in Fig. 1, the method includes:
S101, extracting a session in a data stream and device interconnection information corresponding to the session;
The device interconnection information includes at least: source address (IMSI), source port, destination address, destination port, protocol type, connection establishment time, periodicity, number of interconnections, etc.
S102, judging, according to the device interconnection information, whether the corresponding session exists in a preset white list;
The preset white list is a set of sessions conforming to a legal connection relationship, and it can be set manually or obtained by the system through data analysis.
S103, classifying the sessions which do not exist in the preset white list step by step according to session contents to obtain a session tree;
All the sessions are classified step by step according to information such as the source address, source port, destination address, destination port, transmission protocol, instruction of the session initiator and account number used, to obtain the session tree.
S104, calculating the data type confidence of each leaf node in the session tree according to the device interconnection information;
For each leaf node in the session tree, the credibility of the leaf node can be scored according to dimensional features in the device interconnection information such as the number of sessions, frequency, periodicity, instructions, number of sessions per single IP address, and the 24-hour distribution pattern.
S105, determining the session set corresponding to each leaf node whose data type confidence is greater than a preset confidence threshold as a latest white list;
S106, judging a session which does not exist in the latest white list to be an abnormal session.
If the data type confidence of a leaf node is greater than the preset confidence threshold, the sessions under that node conform to a legal connection relationship and the corresponding session set is determined as the latest white list; a session which does not belong to the latest white list is determined to be an abnormal session.
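Steps S103 to S106 can be illustrated with a minimal Python sketch. It assumes that each session is a plain dictionary, that the tree is built by grouping on an ordered list of fields, and that the data type confidence of a leaf is supplied by a caller-provided scoring function. The names `build_session_tree`, `iter_leaves` and `detect_abnormal`, the field names and the toy scorer are hypothetical and are not taken from the embodiment.

```python
def build_session_tree(sessions, levels):
    """Group sessions level by level; leaves are lists of sessions (S103)."""
    tree = {}
    for session in sessions:
        node = tree
        for field in levels[:-1]:
            node = node.setdefault(session[field], {})
        node.setdefault(session[levels[-1]], []).append(session)
    return tree


def iter_leaves(tree, path=()):
    """Yield (path, sessions) for every leaf node of the tree."""
    for key, child in tree.items():
        if isinstance(child, dict):
            yield from iter_leaves(child, path + (key,))
        else:
            yield path + (key,), child


def detect_abnormal(sessions, levels, score_leaf, threshold):
    """Return (latest_whitelist, abnormal) following S104 to S106."""
    tree = build_session_tree(sessions, levels)
    latest_whitelist, abnormal = [], []
    for _path, leaf_sessions in iter_leaves(tree):
        if score_leaf(leaf_sessions) > threshold:
            latest_whitelist.extend(leaf_sessions)
        else:
            abnormal.extend(leaf_sessions)
    return latest_whitelist, abnormal


# Hypothetical sessions that were not found in the preset white list.
sessions = [
    {"src": "imsi-1", "dst": "10.0.0.1", "dst_port": 1883},
    {"src": "imsi-1", "dst": "10.0.0.1", "dst_port": 1883},
    {"src": "imsi-9", "dst": "10.0.0.7", "dst_port": 4444},
]
# Toy scorer (an assumption): leaves with repeated sessions score higher.
whitelist, abnormal = detect_abnormal(
    sessions,
    ("src", "dst", "dst_port"),
    score_leaf=lambda leaf: 100 if len(leaf) > 1 else 0,
    threshold=80,
)
```

In this sketch the order of `levels` fixes the shape of the tree, so a different field order yields different leaves and may yield a different latest white list.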
As an optional implementation manner, Fig. 2 is a flowchart of a data type confidence calculation method provided by an embodiment of the present invention. As shown in Fig. 2, step S104 of calculating the data type confidence of each leaf node in the session tree according to the device interconnection information includes:
S1041, extracting and counting multi-dimensional feature information in the device interconnection information;
S1042, determining a session initial confidence corresponding to the feature information of each dimension according to the degree to which the feature information of that dimension conforms to the data type feature;
S1043, combining the plurality of session initial confidences to obtain the data type confidence.
Traffic can be roughly divided into two types: service data streams between production terminals and a local server, and management data streams between management terminals and a local server.
The service data stream between a production terminal and the local server includes state acquisition and reporting by the production terminal, operating instructions issued to the terminal by the local server computing center, and the like. It has the following characteristics:
1) High convergence of the source address (IMSI) and the destination address
2) The number of sessions follows a periodic 24-hour daily pattern
3) Access from a single source address is periodic
The management data stream between a management terminal and the local server mainly covers query and state synchronization, task synchronization, maintenance and the like performed by the management terminal. It has the following characteristics:
1) High convergence of the destination address and port
2) The source address is highly random over the session time sequence, since different internal personnel perform management operations at times of their own choosing
3) The number of sessions follows a periodic 24-hour daily pattern, concentrated mainly in the working hours of working days
For each leaf node, the corresponding multi-dimensional feature information is extracted according to the features of the actual data types, the feature information of each dimension is compared with the feature information of the actual data type, and an initial confidence is obtained from the similarity found in the comparison. After the initial confidences of all dimensions are obtained, the data type confidence of the session is calculated according to the weight corresponding to each dimension; the higher the data type confidence, the more likely the data belongs to that type.
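The weighted combination described above can be sketched in Python as follows. The embodiment does not state how a similarity is converted into an initial confidence, so the linear mapping from a similarity in [0, 1] to a value in [0, 100] is an assumption, as are the function names and the sample numbers.

```python
def initial_confidence(similarity):
    """Map a similarity in [0, 1] to an initial confidence in [0, 100].

    The linear mapping is an assumption; the text only says the initial
    confidence follows from the similarity of the compared features.
    """
    return 100.0 * min(1.0, max(0.0, similarity))


def data_type_confidence(similarities, weights):
    """Weighted combination of the per-dimension initial confidences."""
    if len(similarities) != len(weights):
        raise ValueError("one weight is required per dimension")
    return sum(w * initial_confidence(s) for s, w in zip(similarities, weights))


# Hypothetical similarities for three dimensions and their weights.
confidence = data_type_confidence([1.0, 0.5, 0.0], [0.5, 0.25, 0.25])
```

With weights that sum to 1, the result stays on the same 0 to 100 scale as the initial confidences, which keeps it directly comparable with a preset confidence threshold.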
As an alternative, all sessions may be preliminarily filtered before the session tree is generated. Because service data streams account for the bulk of the traffic and their characteristic ports are fixed and limited in number, the type of such a session can be quickly determined from these characteristics. Service types such as HTTP, MQTT and DNS recursion are all service data streams. Sessions determined in advance to be service traffic are added to the white list so that they do not take part in the subsequent session tree division step, which reduces the subsequent data processing amount.
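A preliminary filter of this kind might look like the following Python sketch. The embodiment names the service types but not the port numbers, so the table below uses the standard ports for HTTP (TCP 80), MQTT (TCP 1883) and DNS (UDP 53) purely as an assumed configuration; the function and field names are hypothetical.

```python
# Assumed characteristic ports; the embodiment names the service types
# (HTTP, MQTT, DNS) but does not list port numbers.
SERVICE_PORTS = {
    ("tcp", 80): "http",
    ("tcp", 1883): "mqtt",
    ("udp", 53): "dns",
}


def prefilter(sessions):
    """Split sessions into (service, remaining) before tree generation."""
    service, remaining = [], []
    for session in sessions:
        key = (session["protocol"], session["dst_port"])
        if key in SERVICE_PORTS:
            service.append(session)    # goes straight to the white list
        else:
            remaining.append(session)  # continues to session tree division
    return service, remaining


sessions = [
    {"protocol": "tcp", "dst_port": 1883},
    {"protocol": "udp", "dst_port": 53},
    {"protocol": "tcp", "dst_port": 4444},
]
service, remaining = prefilter(sessions)
```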
As an optional implementation manner, the multi-dimensional feature information in the device interconnection information includes at least two of the following: convergence information of the source address; convergence information of the destination address; convergence information of the destination port; frequency characteristic information of sessions accessed from a single source address; and periodic variation characteristic information of sessions from a single source address.
As a specific embodiment, each leaf node may be analyzed in five dimensions.
Dimension 1: convergence information of the source address. The source address of the service data stream is highly converged; the source address of the management data stream substantially converges within a limited address range.
Dimension 2: convergence information of the destination address. The destination address of the service data stream is highly converged; the destination addresses of the management data stream are highly random and converge within a limited set.
Dimension 3: convergence information of the destination port. The destination ports of the service data stream converge in a limited set; the destination ports of the management data stream converge in a limited set, and the set of destination addresses is substantially the same from day to day.
Dimension 4: frequency characteristic information of sessions accessed from a single source address. For the service data stream, the session frequency of a single source address is high and has a reference distribution; for the management data stream, the weekly or daily number of access sessions of a single source address is distributed substantially evenly.
Dimension 5: periodic variation characteristic information of sessions from a single source address. Sessions from a single source address in the service data stream follow a 24-hour daily periodic variation rule (baseline rule); sessions from a single source address in the management data stream also follow a 24-hour daily periodic variation rule (baseline rule), with nine-to-five, Monday to Friday, as the main traffic distribution period.
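The embodiment does not define how convergence or the daily pattern is measured. As one possible illustration, the Python sketch below measures convergence as the share of sessions covered by the most frequent values, and summarizes the daily pattern as a 24-bin count of session start hours. Both measures, and the names `convergence_ratio` and `hourly_profile`, are assumptions made for this example.

```python
from collections import Counter


def convergence_ratio(values, top_k=1):
    """Share of sessions covered by the top_k most frequent values."""
    if not values:
        return 0.0
    counts = Counter(values)
    covered = sum(count for _value, count in counts.most_common(top_k))
    return covered / len(values)


def hourly_profile(start_hours):
    """Count sessions per hour of day (0 to 23) for one source address."""
    counts = Counter(start_hours)
    return [counts.get(hour, 0) for hour in range(24)]


# Hypothetical destination addresses and session start hours.
destinations = ["10.0.0.1", "10.0.0.1", "10.0.0.1", "10.0.0.2"]
ratio = convergence_ratio(destinations)
profile = hourly_profile([9, 9, 14, 17])
```

A ratio close to 1 would correspond to the "highly converged" case, while a profile concentrated between hours 9 and 17 would correspond to the working-hours pattern described for the management data stream.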
As a specific implementation manner, the data type confidence of a leaf node is determined dimension by dimension as follows. First, the source address convergence of the leaf node is determined; if it meets the high-convergence feature, the dimension 1 initial confidence of the leaf node is set to 100 for the service data stream and to 0 for the management data stream. Second, the destination address convergence of the leaf node is determined; if it meets the high-convergence feature, the dimension 2 initial confidence of the leaf node is set to 100 for the service data stream and to 0 for the management data stream. Third, the destination port convergence of the leaf node is determined; if the destination ports converge in a limited set and the destination address sets differ greatly from day to day, the dimension 3 initial confidence of the leaf node is set to 100 for the service data stream and to 0 for the management data stream. Fourth, the frequency characteristic of sessions accessed from a single source address of the leaf node is determined; if the frequency is high and a reference distribution exists, the dimension 4 initial confidence of the leaf node is set to 100 for the service data stream and to 0 for the management data stream. Fifth, the periodic variation characteristic of sessions from a single source address of the leaf node is determined; if the sessions follow the 24-hour daily periodic variation rule (baseline rule) but do not have nine-to-five, Monday to Friday, as the main traffic distribution period, the dimension 5 initial confidence of the leaf node is set to 100 for the service data stream and to 0 for the management data stream.
The weights of dimensions 1 to 5 for the service data stream are 0.2, 0.2, 0.2, 0.3 and 0.1 respectively; according to the data type determination rule of the service data stream, the data type confidence that the leaf node belongs to the service data stream is: 0.2 × 100 + 0.2 × 100 + 0.2 × 100 + 0.3 × 100 + 0.1 × 100 = 100;
The weights of dimensions 1 to 5 for the management data stream are 0.2, 0.2, 0.2, 0.2 and 0.2 respectively; according to the data type determination rule of the management data stream, the data type confidence that the leaf node belongs to the management data stream is: 0.2 × 0 + 0.2 × 0 + 0.2 × 0 + 0.2 × 0 + 0.2 × 0 = 0;
The data type confidences for the two data stream types are then compared. If the confidence that the leaf node belongs to the service data stream is the larger of the two, that confidence, 100, is compared with the preset confidence threshold, 80. If the data type confidence of the leaf node is greater than the preset confidence threshold, the data type of the leaf node is determined to be the service data stream, which conforms to a legal connection relationship, and the sessions under the leaf node are added to the white list.
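The worked example can be reproduced with a short Python sketch. It assumes five weights per data stream type that sum to 1 (0.2, 0.2, 0.2, 0.3, 0.1 for the service data stream and 0.2 for every dimension of the management data stream), per-dimension initial confidences of 100 or 0, and a preset confidence threshold of 80; the function names are hypothetical.

```python
SERVICE_WEIGHTS = (0.2, 0.2, 0.2, 0.3, 0.1)      # assumed, sums to 1
MANAGEMENT_WEIGHTS = (0.2, 0.2, 0.2, 0.2, 0.2)   # assumed, sums to 1
CONFIDENCE_THRESHOLD = 80


def weighted_confidence(scores, weights):
    """Combine per-dimension initial confidences with their weights."""
    return sum(w * s for s, w in zip(scores, weights))


def classify_leaf(service_scores, management_scores):
    """Return (data_type, confidence, whitelisted) for one leaf node."""
    candidates = {
        "service": weighted_confidence(service_scores, SERVICE_WEIGHTS),
        "management": weighted_confidence(management_scores, MANAGEMENT_WEIGHTS),
    }
    # Keep the data stream type with the larger confidence.
    data_type = max(candidates, key=candidates.get)
    confidence = candidates[data_type]
    return data_type, confidence, confidence > CONFIDENCE_THRESHOLD


# Leaf node whose five dimensions all match the service data stream.
result = classify_leaf([100] * 5, [0] * 5)
```

Here `result` names the service data stream with a confidence of about 100, which exceeds the threshold, so the sessions under the leaf would be added to the white list.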
As an optional implementation manner, Fig. 3 is a flowchart of a method for determining an abnormal session according to data volume provided by an embodiment of the present invention. As shown in Fig. 3, after step S102 of judging, according to the device interconnection information, whether the corresponding session exists in the preset white list, the method further includes:
S108, acquiring the data volume of each session existing in the preset white list;
S109, comparing the data volume with a preset standard data volume;
S110, determining a session whose data volume is greater than the preset standard data volume to be an abnormal session.
For the sessions existing in the white list, the data volume of each session within a preset period is counted, and a session whose data volume is higher than the preset standard data volume is judged to be an abnormal session. In this way, sessions whose data volume increases suddenly because of equipment failure, network failure or software problems are also judged abnormal, which improves the accuracy of abnormal session detection and reduces missed detections.
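Steps S108 to S110 can be sketched in Python as follows, assuming that traffic for white-listed sessions arrives as `(session_id, byte_count)` records collected within one preset period. The record format, the function name and the sample figures are hypothetical.

```python
from collections import defaultdict


def abnormal_by_volume(records, standard_volume):
    """Return ids of sessions whose total volume exceeds the standard.

    `records` is an iterable of (session_id, byte_count) pairs observed
    within one preset period for sessions already in the white list.
    """
    totals = defaultdict(int)
    for session_id, byte_count in records:
        totals[session_id] += byte_count
    return sorted(sid for sid, total in totals.items() if total > standard_volume)


# Hypothetical traffic records for two white-listed sessions.
records = [("s1", 400), ("s2", 300), ("s1", 700), ("s2", 100)]
flagged = abnormal_by_volume(records, standard_volume=1000)
```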
As an optional implementation manner, after step S105 of determining, as the latest white list, the session set corresponding to the leaf node whose data type confidence is greater than the preset confidence threshold, the method further includes: S107, supplementing the latest white list into the preset white list.
In this embodiment, the preset white list is dynamically updated. After the latest white list is obtained, its data is added to the preset white list, and the supplemented preset white list is used as the basis for judging whether the next session has a legal connection.
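The dynamic update in S107 amounts to a set union that is carried over to the next detection round. A minimal Python sketch, assuming sessions are identified by hashable keys (the tuple layout and the function name are hypothetical):

```python
def supplement_whitelist(preset_whitelist, latest_whitelist):
    """Return the preset white list supplemented with the latest one (S107)."""
    return set(preset_whitelist) | set(latest_whitelist)


# Hypothetical session keys: (source address, destination address, port).
preset = {("imsi-1", "10.0.0.1", 1883)}
latest = {("imsi-2", "10.0.0.1", 1883)}

preset = supplement_whitelist(preset, latest)
# In the next round, a session with the newly added key is found directly.
known_next_round = ("imsi-2", "10.0.0.1", 1883) in preset
```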
As an optional implementation manner, Fig. 4 is a structural block diagram of an abnormal session detection apparatus provided in an embodiment of the present invention. As shown in Fig. 4, the present invention further provides an abnormal session detection apparatus, including:
an extracting module 100, configured to extract a session in a data stream and device interconnection information corresponding to the session;
The device interconnection information includes at least: source address (IMSI), source port, destination address, destination port, protocol type, connection establishment time, periodicity, number of interconnections, etc.
a judging module 200, configured to judge, according to the device interconnection information, whether the corresponding session exists in a preset white list;
The preset white list is a set of sessions conforming to a legal connection relationship, and it can be set manually or obtained by the system through data analysis.
a session tree generation module 300, configured to classify the sessions that do not exist in the preset white list step by step according to session contents to obtain a session tree;
All the sessions are classified step by step according to information such as the source address, source port, destination address, destination port, transmission protocol, instruction of the session initiator and account number used, to obtain the session tree.
a confidence calculation module 400, configured to calculate the data type confidence of each leaf node in the session tree according to the device interconnection information;
For each leaf node in the session tree, the credibility of the leaf node can be scored according to dimensional features in the device interconnection information such as the number of sessions, frequency, periodicity, instructions, number of sessions per single IP address, and the 24-hour distribution pattern.
a latest white list determining module 500, configured to determine, as a latest white list, the session set corresponding to each leaf node whose data type confidence is greater than a preset confidence threshold;
a first abnormal session determining module 600, configured to determine a session that does not exist in the latest white list to be an abnormal session.
If the data type confidence of a leaf node is greater than the preset confidence threshold, the sessions under that node conform to a legal connection relationship and the corresponding session set is determined as the latest white list; a session which does not belong to the latest white list is determined to be an abnormal session.
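As an illustration of what the extracting module 100 might produce and how the judging module 200 might consult the preset white list, the following Python sketch models the device interconnection information as a record and the white list as a set of connection keys. The field names, the choice of key fields and the class names are assumptions made for this example; the embodiment does not specify them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InterconnectionInfo:
    """Device interconnection information for one session (hypothetical layout)."""
    source_address: str              # for example an IMSI
    source_port: int
    destination_address: str
    destination_port: int
    protocol: str
    established_at: float            # connection establishment time, epoch seconds
    period_seconds: Optional[float]  # None when no periodicity was observed
    interconnection_count: int

    def connection_key(self):
        # Assumed key: fields that describe the connection relationship.
        return (self.source_address, self.destination_address,
                self.destination_port, self.protocol)


class JudgingModule:
    """Checks whether a session exists in the preset white list."""

    def __init__(self, preset_whitelist):
        self._whitelist = set(preset_whitelist)

    def exists(self, info):
        return info.connection_key() in self._whitelist


info = InterconnectionInfo("imsi-1", 50000, "10.0.0.1", 1883, "tcp",
                           1700000000.0, 60.0, 12)
judge = JudgingModule({("imsi-1", "10.0.0.1", 1883, "tcp")})
in_whitelist = judge.exists(info)
```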
As an optional implementation manner, Fig. 5 is a structural block diagram of a confidence calculation module provided by an embodiment of the present invention. As shown in Fig. 5, the confidence calculation module 400 includes:
a multi-dimensional feature information extraction submodule 4001, configured to extract and count multi-dimensional feature information in the device interconnection information;
an initial confidence determining submodule 4002, configured to determine a session initial confidence corresponding to the feature information of each dimension according to the degree to which the feature information of that dimension conforms to the data type feature;
a data type confidence determining submodule 4003, configured to combine the plurality of session initial confidences to obtain the data type confidence.
Traffic can be roughly divided into two types: service data streams between production terminals and a local server, and management data streams between management terminals and a local server.
The service data stream between a production terminal and the local server includes state acquisition and reporting by the production terminal, operating instructions issued to the terminal by the local server computing center, and the like. It has the following characteristics:
1) High convergence of the source address (IMSI) and the destination address
2) The number of sessions follows a periodic 24-hour daily pattern
3) Access from a single source address is periodic
The management data stream between a management terminal and the local server mainly covers query and state synchronization, task synchronization, maintenance and the like performed by the management terminal. It has the following characteristics:
1) High convergence of the destination address and port
2) The source address is highly random over the session time sequence, since different internal personnel perform management operations at times of their own choosing
3) The number of sessions follows a periodic 24-hour daily pattern, concentrated mainly in the working hours of working days
For each leaf node, the corresponding multi-dimensional feature information is extracted according to the features of the actual data types, the feature information of each dimension is compared with the feature information of the actual data type, and an initial confidence is obtained from the similarity found in the comparison. After the initial confidences of all dimensions are obtained, the data type confidence of the session is calculated according to the weight corresponding to each dimension; the higher the data type confidence, the more likely the data belongs to that type.
As an alternative, all sessions may be preliminarily filtered before the session tree is generated. Because service data streams account for the bulk of the traffic and their characteristic ports are fixed and limited in number, the type of such a session can be quickly determined from these characteristics. Service types such as HTTP, MQTT and DNS recursion are all service data streams. Sessions determined in advance to be service traffic are added to the white list so that they do not take part in the subsequent session tree division step, which reduces the subsequent data processing amount.
As an optional implementation manner, Fig. 6 is a schematic structural diagram of a data volume determining module, a comparing module and a second abnormal session determining module provided in the embodiment of the present invention. As shown in Fig. 6, the apparatus further includes:
a data volume determining module 700, configured to acquire the data volume of each session existing in the preset white list;
a comparing module 800, configured to compare the data volume with a preset standard data volume;
a second abnormal session determining module 900, configured to determine a session whose data volume is greater than the preset standard data volume to be an abnormal session.
As an optional implementation manner, the apparatus further includes:
a supplementing module, configured to supplement the latest white list into the preset white list.
In this embodiment, the preset white list is dynamically updated. After the latest white list is obtained, its data is added to the preset white list, and the supplemented preset white list is used as the basis for judging whether the next session has a legal connection.
As an optional embodiment, the present invention also provides a computer storage medium having a computer program stored thereon, which, when executed by a processor, implements the abnormal session detection method described above.
The storage medium stores the above software and includes, but is not limited to: optical disks, floppy disks, hard disks, erasable memory, etc.
The technical solution has the following beneficial effects: abnormal sessions that do not conform to a legal connection relationship are filtered out by setting a white list; after the sessions are divided into a session tree, the features of each leaf node are computed, the confidence of each leaf node is calculated, and the latest white list is determined according to the confidence. Because the white list is generated dynamically from the sessions, manpower is saved and the detection efficiency for abnormal sessions is improved. In addition, because the white list is dynamically updated as the sessions change, the scheme can detect each session accurately.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are merely exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.