Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art on the basis of these embodiments without inventive effort fall within the scope of the invention.
In the prior art, detecting abnormal sessions by feature analysis and matching on traffic requires a system with an extremely high hardware configuration, yet such a system detects abnormal sessions with low accuracy, and a large amount of manual work is needed to merge, compress and mine all sessions, so the investment is large and the results are poor.
In order to solve the above problems, the present invention provides an abnormal session detection method. Fig. 1 is a flowchart of the abnormal session detection method provided in an embodiment of the present invention. As shown in Fig. 1, the method includes:
S101, extracting the sessions in a data stream and the device interconnection information corresponding to each session;
The device interconnection information includes at least: source address (IMSI), source port, destination address, destination port, protocol type, connection establishment time, periodicity, number of interconnections, and the like.
S102, judging, according to the device interconnection information, whether the session exists in a preset whitelist;
The preset whitelist is a set of sessions that conform to legal connection relations; it can be set manually or obtained by the system through data analysis.
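Steps S101 and S102 can be sketched as a lookup of each extracted session against a set of allowed connections. This is only an illustrative sketch: the `Session` record, the choice of whitelist key (source address, destination address, destination port, protocol) and the sample addresses are all assumptions, not part of the described method.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Tuple

# Hypothetical whitelist key: (source address, destination address,
# destination port, protocol). The source port is left out on the
# assumption that it is ephemeral.
ConnectionKey = Tuple[str, str, int, str]


@dataclass(frozen=True)
class Session:
    src_addr: str   # source address, e.g. an IMSI
    src_port: int
    dst_addr: str
    dst_port: int
    protocol: str

    def key(self) -> ConnectionKey:
        return (self.src_addr, self.dst_addr, self.dst_port, self.protocol)


def split_by_whitelist(
    sessions: Iterable[Session], whitelist: FrozenSet[ConnectionKey]
) -> Tuple[List[Session], List[Session]]:
    """Return (sessions found in the whitelist, sessions not found)."""
    listed: List[Session] = []
    unlisted: List[Session] = []
    for s in sessions:
        (listed if s.key() in whitelist else unlisted).append(s)
    return listed, unlisted


# Made-up sample data for illustration only.
preset_whitelist = frozenset({("460001234567890", "10.0.0.5", 1883, "mqtt")})
sessions = [
    Session("460001234567890", 50512, "10.0.0.5", 1883, "mqtt"),
    Session("460009999999999", 40211, "10.0.0.9", 22, "ssh"),
]
listed, unlisted = split_by_whitelist(sessions, preset_whitelist)
```

Only the sessions in `unlisted` would go on to the session tree classification of S103.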
S103, classifying the sessions that do not exist in the preset whitelist step by step according to session content to obtain a session tree;
The sessions are classified step by step according to information such as the source address, source port, destination address, destination port, transmission protocol, instructions of the session initiator and the account used, to obtain the session tree.
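The step-by-step classification can be pictured as grouping the sessions by one attribute per tree level. The following is a minimal sketch; the attribute order in `LEVELS` and the dictionary-based session records are assumptions chosen for illustration, and the text does not prescribe a particular order.

```python
from collections import defaultdict
from typing import Any, Dict, Iterator, List, Sequence, Tuple, Union

SessionRecord = Dict[str, Any]
Tree = Union[Dict[Any, "Tree"], List[SessionRecord]]

# Assumed classification order, one attribute per tree level.
LEVELS = ("protocol", "dst_addr", "dst_port")


def build_session_tree(
    sessions: Sequence[SessionRecord], levels: Sequence[str] = LEVELS
) -> Tree:
    """Group sessions level by level; each leaf is a list of sessions."""
    if not levels:
        return list(sessions)
    groups: Dict[Any, List[SessionRecord]] = defaultdict(list)
    for s in sessions:
        groups[s[levels[0]]].append(s)
    return {
        value: build_session_tree(group, levels[1:])
        for value, group in groups.items()
    }


def leaf_nodes(
    tree: Tree, path: Tuple[Any, ...] = ()
) -> Iterator[Tuple[Tuple[Any, ...], List[SessionRecord]]]:
    """Yield (path from the root, sessions) for every leaf node."""
    if isinstance(tree, list):
        yield path, tree
    else:
        for value, child in tree.items():
            yield from leaf_nodes(child, path + (value,))


# Made-up sample data for illustration only.
sessions = [
    {"src_addr": "A", "dst_addr": "S1", "dst_port": 1883, "protocol": "mqtt"},
    {"src_addr": "B", "dst_addr": "S1", "dst_port": 1883, "protocol": "mqtt"},
    {"src_addr": "C", "dst_addr": "S2", "dst_port": 22, "protocol": "ssh"},
]
tree = build_session_tree(sessions)
leaves = dict(leaf_nodes(tree))
```

Each entry of `leaves` is one leaf node whose sessions are later scored together.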
S104, calculating the data type confidence of each leaf node in the session tree according to the device interconnection information;
Each leaf node in the session tree can be scored for credibility according to dimensional features in the device interconnection information, such as the number of sessions, frequency, periodicity, instructions, number of sessions per single IP address, and the regular 24-hour distribution.
S105, determining the session set corresponding to each leaf node whose data type confidence is greater than a preset confidence threshold as the latest whitelist;
S106, judging sessions that do not exist in the latest whitelist to be abnormal sessions.
If the data type confidence of a leaf node is greater than the preset confidence threshold, the sessions under that node conform to legal connection relations, and the corresponding session set is determined to be the latest whitelist; sessions that do not belong to the latest whitelist are determined to be abnormal sessions.
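Steps S105 and S106 amount to a threshold filter over the leaf nodes followed by a set difference. The sketch below assumes that each leaf node already has a confidence score and that sessions are identified by hashable IDs; the function name, the IDs and the default threshold of 80 (borrowed from the worked example later in this description) are illustrative assumptions.

```python
from typing import Dict, List, Set, Tuple


def detect_abnormal(
    leaf_sessions: Dict[str, List[str]],
    leaf_confidence: Dict[str, float],
    threshold: float = 80.0,
) -> Tuple[Set[str], Set[str]]:
    """Return (latest whitelist, abnormal sessions) as sets of session IDs."""
    latest_whitelist: Set[str] = set()
    for leaf, sessions in leaf_sessions.items():
        # S105: keep the sessions of leaves scoring above the threshold.
        if leaf_confidence.get(leaf, 0.0) > threshold:
            latest_whitelist.update(sessions)
    all_sessions = {s for group in leaf_sessions.values() for s in group}
    # S106: every remaining session is treated as abnormal.
    return latest_whitelist, all_sessions - latest_whitelist


# Made-up sample data for illustration only.
leaf_sessions = {"leaf-a": ["s1", "s2"], "leaf-b": ["s3"]}
leaf_confidence = {"leaf-a": 100.0, "leaf-b": 40.0}
latest_whitelist, abnormal = detect_abnormal(leaf_sessions, leaf_confidence)
```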
As an optional implementation manner, Fig. 2 is a flowchart of a data type confidence calculation method provided by an embodiment of the present invention. As shown in Fig. 2, S104, calculating the data type confidence of each leaf node in the session tree according to the device interconnection information, includes:
S1041, extracting and counting multidimensional feature information in the device interconnection information;
S1042, determining the session initial confidence corresponding to the feature information of each dimension according to the degree to which the feature information of that dimension matches the data type features;
S1043, combining the plurality of session initial confidences to obtain the data type confidence.
Traffic can be roughly divided into two types: service data flows between production terminals and the local server, and management data flows between management terminals and the local server.
Service data flows between production terminals and the local server include production terminal state acquisition and reporting, operation instructions issued to the terminals by the local server computing center, and the like. They have the following characteristics:
1) The source addresses (IMSI) and destination addresses are highly convergent;
2) The number of sessions follows a periodic 24-hour daily variation pattern;
3) Accesses from a single source address are periodic.
Management data flows between management terminals and the local server are mainly used for state query and synchronization, task synchronization, maintenance and the like. They have the following characteristics:
1) The destination addresses and ports are highly convergent;
2) The source addresses are highly random over the session time sequence, since different internal personnel perform management operations at times of their own choosing;
3) The number of sessions follows a periodic 24-hour daily variation pattern, concentrated mainly in the working hours of working days.
According to the characteristics of the actual data types, the corresponding multidimensional feature information is extracted for each leaf node, the feature information of each dimension is compared with the characteristics of each data type, and an initial confidence is obtained from the resulting similarity. After the feature information of all dimensions has been evaluated, the data type confidence of the session is calculated according to the weight of each dimension; the higher the data type confidence, the higher the probability that the session belongs to that data type.
As an alternative embodiment, all sessions may be preliminarily filtered before the session tree is generated. Service data flows account for the bulk of the traffic, and their standard ports are fixed and limited in number, so the type of such a session can be judged quickly from these characteristics. For example, sessions whose transmission protocol or service type is HTTP, MQTT or recursive DNS are all service data flows. Sessions judged in advance to be service flows are added to the whitelist so that they do not take part in the subsequent session tree division, which reduces the amount of data to be processed later.
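The preliminary filtering can be sketched as a lookup in a small table of known service protocols and their standard ports. The table `SERVICE_PORTS`, the record layout and the function name are assumptions made for illustration; the ports listed are the well-known defaults for HTTP, MQTT and DNS, and a real deployment would supply its own table.

```python
from typing import Any, Dict, List, Set, Tuple

# Hypothetical table of service protocols and their standard ports.
SERVICE_PORTS: Dict[str, Set[int]] = {
    "http": {80},
    "mqtt": {1883},
    "dns": {53},
}


def prefilter(
    sessions: List[Dict[str, Any]]
) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Split sessions into (pre-judged service flows, remaining sessions)."""
    service: List[Dict[str, Any]] = []
    remaining: List[Dict[str, Any]] = []
    for s in sessions:
        ports = SERVICE_PORTS.get(str(s["protocol"]).lower())
        if ports is not None and s["dst_port"] in ports:
            service.append(s)      # added to the whitelist directly
        else:
            remaining.append(s)    # goes on to the session tree division
    return service, remaining


# Made-up sample data for illustration only.
sessions = [
    {"protocol": "MQTT", "dst_port": 1883},
    {"protocol": "http", "dst_port": 8081},
    {"protocol": "ssh", "dst_port": 22},
]
service, remaining = prefilter(sessions)
```

Only `remaining` is passed to the session tree step, which is how the filtering reduces the later processing load.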
As an optional implementation manner, the multidimensional feature information in the device interconnection information includes at least two of the following: convergence information of the source address; convergence information of the destination address; convergence information of the destination port; frequency characteristics of sessions accessed from a single source address; and periodic variation characteristics of sessions from a single source address.
As a specific embodiment, each leaf node may be analyzed in five dimensions.
Dimension 1: convergence information of the source address. The source addresses of a service data flow are highly convergent; the source addresses of a management data flow substantially converge within a limited address range.
Dimension 2: convergence information of the destination address. The destination addresses of a service data flow are highly convergent; the destination addresses of a management data flow are highly random but converge into a finite set.
Dimension 3: convergence information of the destination port. The destination ports of a service data flow converge into a finite set; the destination ports of a management data flow converge into a finite set, and the daily destination address sets are substantially the same.
Dimension 4: frequency characteristics of sessions accessed from a single source address. In a service data flow, the frequency of sessions from a single source address is high and follows a reference distribution; in a management data flow, the weekly or daily frequency of sessions from a single source address is, over the statistical population, substantially evenly distributed.
Dimension 5: periodic variation characteristics of sessions from a single source address. Sessions from a single source address in a service data flow follow a 24-hour daily periodic variation pattern (baseline pattern); sessions from a single source address in a management data flow also follow a 24-hour daily periodic pattern (baseline pattern), with traffic concentrated mainly in the nine-to-five period from Monday to Friday.
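The description does not fix how convergence or the daily distribution is measured. As one possible illustration, convergence could be taken as the share of sessions covered by the most frequent values, and the daily pattern as a normalised 24-bin histogram of session start hours. Both metrics, the function names and the parameter `top_k` are assumptions introduced here, not the method's own definitions.

```python
from collections import Counter
from typing import Hashable, Iterable, List


def convergence(values: Iterable[Hashable], top_k: int = 3) -> float:
    """Share (0..1) of sessions covered by the top_k most frequent values.

    A result near 1 means the addresses or ports concentrate on a few
    values, i.e. they are highly convergent under this assumed metric.
    """
    items = list(values)
    if not items:
        return 0.0
    counts = Counter(items)
    covered = sum(n for _, n in counts.most_common(top_k))
    return covered / len(items)


def hourly_profile(start_hours: Iterable[int]) -> List[float]:
    """Normalised 24-bin histogram of session start hours (0..23)."""
    bins = [0] * 24
    for h in start_hours:
        bins[h % 24] += 1
    total = sum(bins)
    if total == 0:
        return [0.0] * 24
    return [b / total for b in bins]


# Made-up sample data for illustration only.
dst_addrs = ["S1"] * 8 + ["S2", "S3"]
dst_convergence = convergence(dst_addrs, top_k=1)
profile = hourly_profile([9, 9, 10, 14])
```

A profile of this kind could then be compared against a baseline profile for each data type to decide how well dimension 5 matches.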
As a specific implementation manner, the data type confidence of a leaf node is determined as follows. First, the source address convergence of the leaf node is determined; if it matches the high-convergence characteristic, the dimension 1 initial confidence of the leaf node is set to 100 for the service data flow and 0 for the management data flow. Second, the destination address convergence of the leaf node is determined; if it matches the high-convergence characteristic, the dimension 2 initial confidence is set to 100 for the service data flow and 0 for the management data flow. Third, the destination port convergence of the leaf node is determined; if the destination ports converge into a finite set and the daily destination address sets differ greatly, the dimension 3 initial confidence is set to 100 for the service data flow and 0 for the management data flow. Fourth, the single source address access session frequency of the leaf node is determined; if the frequency is high and follows a reference distribution, the dimension 4 initial confidence is set to 100 for the service data flow and 0 for the management data flow. Fifth, the periodic variation pattern of the single source address sessions of the leaf node is determined; if the sessions follow the 24-hour daily periodic variation pattern (baseline pattern) but are not concentrated mainly in the nine-to-five, Monday-to-Friday period, the dimension 5 initial confidence is set to 100 for the service data flow and 0 for the management data flow.
The weights of dimensions 1 to 5 for the service data flow are 0.2, 0.2, 0.2, 0.3 and 0.1 respectively. According to the data type determination rule for the service data flow, the data type confidence that the leaf node belongs to the service data flow is: 0.2×100 + 0.2×100 + 0.2×100 + 0.3×100 + 0.1×100 = 100;
The weights of dimensions 1 to 5 for the management data flow are each 0.2. According to the data type determination rule for the management data flow, the data type confidence that the leaf node belongs to the management data flow is: 0.2×0 + 0.2×0 + 0.2×0 + 0.2×0 + 0.2×0 = 0;
The data type confidences for the two data flow types are then compared. Here the confidence that the leaf node belongs to the service data flow is the larger, so this confidence of 100 is compared with the preset confidence threshold of 80. Because it exceeds the threshold, the data type of the leaf node is judged to be a service data flow that conforms to a legal connection relation, and the sessions under the leaf node are added to the whitelist.
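The worked example above can be reproduced as a weighted sum followed by a threshold comparison. In this sketch the weights, the initial confidences and the threshold of 80 come from the example; the function names and the dictionary layout are illustrative assumptions.

```python
from typing import Dict, Optional, Sequence

# Per-dimension weights from the worked example (dimensions 1 to 5).
SERVICE_WEIGHTS = (0.2, 0.2, 0.2, 0.3, 0.1)
MANAGEMENT_WEIGHTS = (0.2, 0.2, 0.2, 0.2, 0.2)
CONFIDENCE_THRESHOLD = 80.0


def data_type_confidence(
    initial: Sequence[float], weights: Sequence[float]
) -> float:
    """Weighted sum of the per-dimension initial confidences (0..100)."""
    if len(initial) != len(weights):
        raise ValueError("one weight is required per dimension")
    return sum(w * c for w, c in zip(weights, initial))


def classify_leaf(
    confidences: Dict[str, float], threshold: float = CONFIDENCE_THRESHOLD
) -> Optional[str]:
    """Return the best-scoring data type if it exceeds the threshold."""
    best = max(confidences, key=lambda name: confidences[name])
    return best if confidences[best] > threshold else None


# Initial confidences of the example leaf node for each data type.
confidences = {
    "service": data_type_confidence([100] * 5, SERVICE_WEIGHTS),
    "management": data_type_confidence([0] * 5, MANAGEMENT_WEIGHTS),
}
leaf_type = classify_leaf(confidences)
```

A return value of `None` would mean that no data type exceeds the threshold, in which case the sessions under the leaf node are not added to the whitelist.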
As an optional implementation manner, Fig. 3 is a flowchart of a method for determining an abnormal session according to data volume provided by an embodiment of the present invention. As shown in Fig. 3, after S102, judging whether the session exists in the preset whitelist according to the device interconnection information, the method includes:
S108, acquiring the data volume of each session that exists in the preset whitelist;
S109, comparing the data volume with a preset standard data volume;
S110, determining each session whose data volume is greater than the preset standard data volume to be an abnormal session.
For the sessions that exist in the whitelist, the data volume of each session within a preset period is counted, and any session whose data volume exceeds the preset standard data volume is judged to be an abnormal session. Sessions whose data volume surges because of equipment failure, network failure or software problems are thereby flagged as abnormal, which improves the accuracy of abnormal session detection and reduces missed detections.
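Steps S108 to S110 reduce to comparing each whitelisted session's data volume in the period against a single limit. A minimal sketch follows, assuming volumes are measured in bytes and keyed by a session ID; the function name, the units and the sample values are illustrative assumptions.

```python
from typing import Dict, List


def volume_anomalies(
    volume_by_session: Dict[str, int], standard_volume: int
) -> List[str]:
    """Return IDs of whitelisted sessions whose volume exceeds the limit."""
    return [
        session_id
        for session_id, volume in volume_by_session.items()
        if volume > standard_volume
    ]


# Made-up per-period byte counts for sessions already in the whitelist.
volumes = {"s1": 12_000, "s2": 950_000, "s3": 50_000}
abnormal_by_volume = volume_anomalies(volumes, standard_volume=50_000)
```

Because the comparison is strict, a session exactly at the standard volume (here `s3`) is not flagged, matching the "greater than" wording of S110.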
As an optional implementation manner, after S105, determining the session set corresponding to each leaf node whose data type confidence is greater than the preset confidence threshold as the latest whitelist, the method includes: S107, adding the latest whitelist to the preset whitelist.
In this embodiment, the preset whitelist is updated dynamically: after the latest whitelist is obtained, its entries are added to the preset whitelist, and the supplemented preset whitelist is used as the basis for judging whether subsequent sessions conform to a legal connection relation.
As an optional implementation manner, Fig. 4 is a block diagram of an abnormal session detection apparatus provided by an embodiment of the present invention. As shown in Fig. 4, the present invention further provides an abnormal session detection apparatus, including:
an extracting module 100, configured to extract the sessions in a data stream and the device interconnection information corresponding to each session;
The device interconnection information includes at least: source address (IMSI), source port, destination address, destination port, protocol type, connection establishment time, periodicity, number of interconnections, and the like.
a judging module 200, configured to judge, according to the device interconnection information, whether the session exists in a preset whitelist;
The preset whitelist is a set of sessions that conform to legal connection relations; it can be set manually or obtained by the system through data analysis.
a session tree generation module 300, configured to classify the sessions that do not exist in the preset whitelist step by step according to session content to obtain a session tree;
The sessions are classified step by step according to information such as the source address, source port, destination address, destination port, transmission protocol, instructions of the session initiator and the account used, to obtain the session tree.
a confidence calculating module 400, configured to calculate the data type confidence of each leaf node in the session tree according to the device interconnection information;
Each leaf node in the session tree can be scored for credibility according to dimensional features in the device interconnection information, such as the number of sessions, frequency, periodicity, instructions, number of sessions per single IP address, and the regular 24-hour distribution.
a latest whitelist determining module 500, configured to determine the session set corresponding to each leaf node whose data type confidence is greater than a preset confidence threshold as the latest whitelist;
a first abnormal session judging module 600, configured to judge sessions that do not exist in the latest whitelist to be abnormal sessions.
If the data type confidence of a leaf node is greater than the preset confidence threshold, the sessions under that node conform to legal connection relations, and the corresponding session set is determined to be the latest whitelist; sessions that do not belong to the latest whitelist are determined to be abnormal sessions.
As an alternative implementation manner, Fig. 5 is a block diagram of the confidence calculating module provided by an embodiment of the present invention. As shown in Fig. 5, the confidence calculating module 400 includes:
a multidimensional feature information extraction submodule 4001, configured to extract and count multidimensional feature information in the device interconnection information;
an initial confidence determining submodule 4002, configured to determine the session initial confidence corresponding to the feature information of each dimension according to the degree to which the feature information of that dimension matches the data type features;
a data type confidence determining submodule 4003, configured to combine the plurality of session initial confidences to obtain the data type confidence.
Traffic can be roughly divided into two types: service data flows between production terminals and the local server, and management data flows between management terminals and the local server.
Service data flows between production terminals and the local server include production terminal state acquisition and reporting, operation instructions issued to the terminals by the local server computing center, and the like. They have the following characteristics:
1) The source addresses (IMSI) and destination addresses are highly convergent;
2) The number of sessions follows a periodic 24-hour daily variation pattern;
3) Accesses from a single source address are periodic.
Management data flows between management terminals and the local server are mainly used for state query and synchronization, task synchronization, maintenance and the like. They have the following characteristics:
1) The destination addresses and ports are highly convergent;
2) The source addresses are highly random over the session time sequence, since different internal personnel perform management operations at times of their own choosing;
3) The number of sessions follows a periodic 24-hour daily variation pattern, concentrated mainly in the working hours of working days.
According to the characteristics of the actual data types, the corresponding multidimensional feature information is extracted for each leaf node, the feature information of each dimension is compared with the characteristics of each data type, and an initial confidence is obtained from the resulting similarity. After the feature information of all dimensions has been evaluated, the data type confidence of the session is calculated according to the weight of each dimension; the higher the data type confidence, the higher the probability that the session belongs to that data type.
As an alternative embodiment, all sessions may be preliminarily filtered before the session tree is generated. Service data flows account for the bulk of the traffic, and their standard ports are fixed and limited in number, so the type of such a session can be judged quickly from these characteristics. For example, sessions whose transmission protocol or service type is HTTP, MQTT or recursive DNS are all service data flows. Sessions judged in advance to be service flows are added to the whitelist so that they do not take part in the subsequent session tree division, which reduces the amount of data to be processed later.
As an optional implementation manner, Fig. 6 is a schematic structural diagram of a data volume determining module, a comparison module and a second abnormal session judging module provided in an embodiment of the present invention. As shown in Fig. 6, the apparatus further includes:
a data volume determining module 700, configured to acquire the data volume of each session that exists in the preset whitelist;
a comparison module 800, configured to compare the data volume with a preset standard data volume;
a second abnormal session judging module 900, configured to determine each session whose data volume is greater than the preset standard data volume to be an abnormal session.
As an alternative embodiment, the apparatus further includes:
a supplementing module, configured to add the latest whitelist to the preset whitelist.
In this embodiment, the preset whitelist is updated dynamically: after the latest whitelist is obtained, its entries are added to the preset whitelist, and the supplemented preset whitelist is used as the basis for judging whether subsequent sessions conform to a legal connection relation.
As an alternative embodiment, the present invention also provides a computer storage medium having stored thereon a computer program which when executed by a processor implements the abnormal session detection method described above.
The computer program is stored in the storage medium, which includes, but is not limited to: optical discs, floppy disks, hard disks, erasable memory, and the like.
The above technical solution has the following beneficial effects: abnormal sessions that do not conform to legal connection relations are filtered out by means of a whitelist; after the sessions are divided into a session tree, the features of each leaf node are computed, the confidence of each leaf node is calculated, and the latest whitelist is determined from the confidences. Because the whitelist is generated dynamically from the sessions, manual effort is saved and the efficiency of abnormal session detection is improved. In addition, because the whitelist is updated dynamically as sessions change, the solution can detect each session accurately.
The foregoing description of the embodiments is provided to illustrate the general principles of the invention and is not intended to limit the scope of the invention to the particular embodiments described. Any modifications, equivalents, improvements and the like made within the spirit and principles of the invention are intended to be included within its scope.